home.social

#queryengine — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #queryengine, aggregated by home.social.

  1. #ReleaseDay thi.ng/column-store is an in-memory database with customizable column types, extensible query engine, bitfield indexing for query acceleration, JSON serialization with optional RLE compression.

    The new version introduces support for arbitrarily nested queries and merging of sub-query results using a choice of AND (intersection/constraint), OR (union/alternative) or NAND/NOR (negated versions) semantics.

    docs.thi.ng/umbrella/column-st
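
The merge semantics described above can be sketched roughly as follows. This is not the thi.ng/column-store API: the `Query` type, the `term()` helper and `evaluate()` are hypothetical names, and plain `Set`s of row IDs stand in for whatever the library uses internally. NAND and NOR are modeled here as the complement of the AND/OR result relative to all row IDs.

```typescript
// Hypothetical sketch only; names and types are NOT the thi.ng/column-store API.
type RowSet = Set<number>;
type Op = "and" | "or" | "nand" | "nor";

// A query is either a leaf term (already resolved to matching row IDs)
// or a nested group of sub-queries merged with a single operator.
type Query =
  | { type: "term"; rows: RowSet }
  | { type: "group"; op: Op; children: Query[] };

const term = (rows: number[]): Query => ({ type: "term", rows: new Set(rows) });

const intersection = (a: RowSet, b: RowSet): RowSet => {
  const out: RowSet = new Set<number>();
  a.forEach((x) => {
    if (b.has(x)) out.add(x);
  });
  return out;
};

const union = (a: RowSet, b: RowSet): RowSet => {
  const out: RowSet = new Set<number>();
  a.forEach((x) => out.add(x));
  b.forEach((x) => out.add(x));
  return out;
};

// All rows in `all` which are not in `a`.
const complement = (all: RowSet, a: RowSet): RowSet => {
  const out: RowSet = new Set<number>();
  all.forEach((x) => {
    if (!a.has(x)) out.add(x);
  });
  return out;
};

// Recursively evaluate sub-queries, then merge their results.
const evaluate = (q: Query, all: RowSet): RowSet => {
  if (q.type === "term") return q.rows;
  const results = q.children.map((c) => evaluate(c, all));
  const isAnd = q.op === "and" || q.op === "nand";
  const start: RowSet = isAnd ? all : new Set<number>();
  const merged = results.reduce(
    (acc, r) => (isAnd ? intersection(acc, r) : union(acc, r)),
    start
  );
  return q.op === "nand" || q.op === "nor" ? complement(all, merged) : merged;
};
```

Because sub-queries are plain values in this sketch, a predefined group can be reused as a child of several larger queries.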

    Even without query nesting, each individual query term's results can now be merged as a union. The previous behavior only allowed for term result intersections, with each subsequent term further narrowing the total result set. This behavior limited the types of compound queries possible and therefore required multiple separate queries and additional user effort to merge results manually. Well, no more! :)

    docs.thi.ng/umbrella/column-st

    Another (intentional) side effect: Since queries are defined declaratively, creating complex queries is now much easier (and more legible) via composition and re-use of predefined sub-queries. The support for nesting also simplifies the creation of user-defined query DSLs (domain-specific languages).

    As before, for columns with bitmap indexing enabled, most of the query operators are extremely fast since only bit masks need to be combined and no actual row or column data is being visited. The latter is only necessary for predicate-based matchXXX() operators...
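
To illustrate why combining bit masks is cheap, here is a minimal sketch of a per-value bitmap index. It is an assumption-laden illustration, not the library's implementation: `buildIndex`, `andMasks`, `orMasks` and `rowsOf` are hypothetical names, and the real package may lay out its bitfields differently.

```typescript
// Hypothetical bitmap index sketch; names are illustrative, not the library's API.
type Mask = Uint32Array;

const emptyMask = (numRows: number): Mask =>
  new Uint32Array(Math.ceil(numRows / 32));

// Build one bit mask per distinct value: bit N is set if row N holds that value.
const buildIndex = (column: string[]): Map<string, Mask> => {
  const index = new Map<string, Mask>();
  column.forEach((value, row) => {
    let mask = index.get(value);
    if (!mask) {
      mask = emptyMask(column.length);
      index.set(value, mask);
    }
    mask[row >>> 5] |= 1 << (row & 31);
  });
  return index;
};

// Combine two masks word by word; no row or column data is visited.
const andMasks = (a: Mask, b: Mask): Mask => a.map((w, i) => w & b[i]);
const orMasks = (a: Mask, b: Mask): Mask => a.map((w, i) => w | b[i]);

// Expand a mask into row IDs (only needed once, for the final result).
const rowsOf = (mask: Mask): number[] => {
  const rows: number[] = [];
  for (let i = 0; i < mask.length; i++) {
    for (let bit = 0; bit < 32; bit++) {
      if (mask[i] & (1 << bit)) rows.push(i * 32 + bit);
    }
  }
  return rows;
};

// Two example columns of a 4-row table (made-up data).
const kinds = buildIndex(["img", "note", "img", "video"]);
const langs = buildIndex(["en", "en", "de", "en"]);
```

With these made-up columns, `rowsOf(andMasks(kinds.get("img")!, langs.get("en")!))` selects the rows that are both `img` and `en`, using one bitwise AND per 32 rows.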

    #ThingUmbrella #OpenSource #Database #QueryEngine #TypeScript #BitField

  2. #ReleaseSaturday — This week I've been working on extracting, refactoring & generalizing the minimal column store database I've been using for my personal knowledge/media management toolset, and I'm happy to share it with the world now:

    thi.ng/column-store

    This is an in-memory column store database with:

    - Customizable column storage types with configurable min/max cardinality, support for optional and/or tuple-values, default values
    - Support for custom column type implementations
    - Optional dictionary encoding of column values (memory & filesize saving)
    - Powerful extensible multi-term query engine with built-in OR/AND/NOR/NAND operators and predicate-based matchers (column, row, partial row). Queries can be pre-built and then executed as standard JS iterables
    - Optional bitfield indexing for dramatic query acceleration (esp. for complex multi-term queries)
    - Dynamic adding/removing of columns
    - JSON serialization with optional RLE compression (in my PKM dataset with ~20k items, the RLE compressed version is only 29% of the normal JSON serialization)
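
As a rough idea of how dictionary encoding and RLE reduce the size of a column, consider the sketch below. The post does not describe the package's actual serialization format, so the flat `[value, count, ...]` layout and the names `dictEncode`, `rleEncode` and `rleDecode` are assumptions made purely for illustration.

```typescript
// Hypothetical sketch; the real serialization format is not shown in the post.

// Dictionary encoding: store each distinct value once, reference it by index.
const dictEncode = (values: string[]): { dict: string[]; encoded: number[] } => {
  const dict: string[] = [];
  const ids = new Map<string, number>();
  const encoded = values.map((v) => {
    let id = ids.get(v);
    if (id === undefined) {
      id = dict.length;
      dict.push(v);
      ids.set(v, id);
    }
    return id;
  });
  return { dict, encoded };
};

// Run-length encoding into a flat [value, count, value, count, ...] array.
const rleEncode = (values: number[]): number[] => {
  const out: number[] = [];
  for (let i = 0; i < values.length; i++) {
    const n = out.length;
    if (n >= 2 && out[n - 2] === values[i]) out[n - 1]++;
    else out.push(values[i], 1);
  }
  return out;
};

// Inverse of rleEncode.
const rleDecode = (runs: number[]): number[] => {
  const out: number[] = [];
  for (let i = 0; i < runs.length; i += 2) {
    for (let j = 0; j < runs[i + 1]; j++) out.push(runs[i]);
  }
  return out;
};
```

RLE only pays off for columns with long runs of repeated values, which is why it makes sense as an optional step.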

    I hope the readme and code examples give a decent overview for now... I've been using the overall system for a couple of years now, but this new packaged version is still marked as _alpha_. Everything's still being worked on.

    Also, for those wondering what the point of all this is and why not just use SQLite etc.: I find there are many use cases for which a pure JSON-based approach is more than sufficient (without requiring extra tools and interfacing layers). The structure/storage model and the bitfield optimizations enable very fast query performance (compared to other JSON DBs I've tried in the past)...

    (Including all dependencies [only some other thi.ng packages], the entire DB package is ~6KB brotli'd, 19KB uncompressed...)

    #ThingUmbrella #TypeScript #JavaScript #JSON #Database #QueryEngine #RLE #SmallWeb

  3. Embedding policy enforcement directly into query engines gives AI agents fine‑grained, auditable control over data. Think row‑ and column‑level security, purpose‑binding, and seamless IAM integration—without sacrificing performance. Learn how this opens the path to trustworthy, open‑source AI. #PolicyEnforcement #QueryEngine #AIagents #RowLevelSecurity

    🔗 aidailypost.com/news/embedding

  5. Tags are sets. Many apps support tagging of content, but most of them (incl. Mastodon) treat tags only as singular/isolated topic filters, akin to a flat folder-based approach. But tagging can be so, so much more powerful when treating tags as sets and offering users the possibility to combine and query tagged content as sets (think Venn diagrams), i.e. allowing tags to be combined using AND/OR/NOT aka intersection/union/difference operations...

    Below is a simple query engine to do just that in ~40 lines of code (sans comments), incl. using an extensible interpreter for a simple Lisp-like S-Expression language to define arbitrarily complex nested tag queries (the code is actually lifted & simplified from my personal knowledge graph tooling, also talked about here recently[1]...)

    gist.github.com/postspectacula

    For example, the query:

    `(and (or 'Alps' 'PNW') (or 'LandscapePhotography' 'NaturePhotography') (not 'Monochrome'))`

    ...would select all items which have been tagged with `Alps` OR `PNW`, AND have at least one of the two photography tags given, but do NOT have the `Monochrome` tag.
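
To make the semantics of that query concrete, here is a minimal evaluator sketch. It is not the code from the linked gist and does not use thi.ng/lispy: the parsing step is skipped, the query is written directly as nested arrays, and the `TagQuery` type, `matches()` function and sample items are all hypothetical.

```typescript
// Hypothetical sketch; NOT the gist's code. The S-expression is assumed to be
// already parsed into nested arrays of the form [operator, ...args].
type TagQuery = string | [string, ...TagQuery[]];

// Returns true if an item's tag set satisfies the query.
const matches = (q: TagQuery, tags: Set<string>): boolean => {
  if (typeof q === "string") return tags.has(q);
  const [op, ...args] = q;
  switch (op) {
    case "and":
      return args.every((a) => matches(a, tags));
    case "or":
      return args.some((a) => matches(a, tags));
    case "not":
      return !args.some((a) => matches(a, tags));
    default:
      throw new Error("unknown operator: " + op);
  }
};

// The example query from the post, as nested arrays.
const query: TagQuery = [
  "and",
  ["or", "Alps", "PNW"],
  ["or", "LandscapePhotography", "NaturePhotography"],
  ["not", "Monochrome"],
];

// Made-up sample items.
const items = [
  { id: 1, tags: new Set(["Alps", "LandscapePhotography"]) },
  { id: 2, tags: new Set(["PNW", "NaturePhotography", "Monochrome"]) },
  { id: 3, tags: new Set(["Alps", "Architecture"]) },
  { id: 4, tags: new Set(["PNW", "NaturePhotography"]) },
];

const selected = items.filter((i) => matches(query, i.tags)).map((i) => i.id);
console.log(selected); // [1, 4]
```

Item 2 is rejected because of its `Monochrome` tag, and item 3 because it has neither photography tag.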

    Whilst this syntax is probably alien-looking to the average user, it'd be fairly straightforward to create visual/structural UIs for defining such queries (over the past 20 years I've done that myself several times already). Heck, even an SLM (small language model) could be used to translate natural language into such query expressions. What matters here is that most applications simply don't treat tags this way in their conceptual/data modeling. Imagine being able to use hashtags this way on Mastodon to assemble personalized timelines (and extend the system to not just deal with hashtags, but other post metadata/provenance too)...

    The code example illustrates how, with the right tools, such features are actually not hard to implement (or to integrate into existing apps). The example uses the following #ThingUmbrella packages for its key functionality:

    - thi.ng/associative: Set-theory operations, custom Map/Set data types (unused here)
    - thi.ng/lispy: Customizable/extensible S-expression parser, interpreter & runtime
    - thi.ng/oquery: Optimized object and array pattern query engine

    [1] mastodon.thi.ng/@toxi/11549755

    #Tagging #Sets #QueryEngine #Lisp #Syntax #Parser #Interpreter #TypeScript #JavaScript

  6. The Photon query engine, designed for SQL and other languages, is now available for lakehouse data systems on the major cloud platforms.
    Photon query engine for all lakehouse systems