home.social

#bitfield — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #bitfield, aggregated by home.social.

  1. #ReleaseDay thi.ng/column-store is an in-memory database with customizable column types, an extensible query engine, bitfield indexing for query acceleration, and JSON serialization with optional RLE compression.

    The new version introduces support for arbitrarily nested queries and merging of sub-query results using a choice of AND (intersection/constraint), OR (union/alternative) or NAND/NOR (negated versions) semantics.

    docs.thi.ng/umbrella/column-st

    Even without query nesting, the results of individual query terms can now be merged as a union. The previous behavior only allowed intersecting term results, with each subsequent term further narrowing the total result set. This limited the types of compound queries possible and therefore required multiple separate queries and additional user effort to merge results manually. Well, no more! :)

    docs.thi.ng/umbrella/column-st

    Another (intentional) side effect: Since queries are defined declaratively, creating complex queries is now much easier (and more legible) via composition and re-use of predefined sub-queries. The support for nesting also simplifies the creation of user-defined query DSLs (domain-specific languages).
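
    To illustrate the idea of composing declarative, nested queries from re-usable sub-queries, here's a minimal sketch. Note: the `Query` shape, `evaluate()` and the sample rows are hypothetical and only meant for illustration, they are NOT the actual thi.ng/column-store API (see the package docs for that). For simplicity, this version evaluates to plain sets of row IDs rather than bitfields.

    ```typescript
    // Hypothetical query shape for illustration, NOT the thi.ng/column-store API
    type Op = "and" | "or" | "nand" | "nor";
    type Term = { tag: string };
    type Query = Term | { op: Op; terms: Query[] };

    type Row = { id: number; tags: string[] };

    const isTerm = (q: Query): q is Term => "tag" in q;

    // Evaluate a (possibly nested) query to the set of matching row IDs
    function evaluate(q: Query, rows: Row[]): Set<number> {
        if (isTerm(q)) {
            return new Set(rows.filter((r) => r.tags.includes(q.tag)).map((r) => r.id));
        }
        // recursively evaluate sub-queries, then merge their results
        const parts = q.terms.map((t) => evaluate(t, rows));
        const all = rows.map((r) => r.id);
        const merged = all.filter((id) =>
            q.op === "and" || q.op === "nand"
                ? parts.every((p) => p.has(id)) // intersection
                : parts.some((p) => p.has(id)) // union
        );
        // NAND/NOR are the negated versions of AND/OR
        const negate = q.op === "nand" || q.op === "nor";
        return new Set(negate ? all.filter((id) => !merged.includes(id)) : merged);
    }

    // Predefined sub-queries can be re-used inside larger ones
    const anyDimension: Query = { op: "or", terms: [{ tag: "2d" }, { tag: "3d" }] };
    const query: Query = { op: "and", terms: [anyDimension, { tag: "typescript" }] };

    const rows: Row[] = [
        { id: 0, tags: ["3d", "typescript"] },
        { id: 1, tags: ["2d", "zig"] },
        { id: 2, tags: ["typescript"] },
    ];

    // Only row 0 has ("2d" OR "3d") AND "typescript"
    const result = Array.from(evaluate(query, rows));
    ```

    Because a query is just data, a sub-query like `anyDimension` can be defined once and embedded in any number of larger queries.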

    As before, for columns with bitmap indexing enabled, most of the query operators are extremely fast since only bit masks need to be combined and no actual row or column data is being visited. The latter is only necessary for predicate-based matchXXX() operators...
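
    A rough sketch of why combining bit masks is so cheap: if each query term yields a bit mask with one bit per row, then AND/OR/NAND/NOR are just word-wise bit operations and no row data is touched. The helpers below (`maskFromRows`, `rowsFromMask` etc.) and the fixed row count are assumptions for illustration, not the package's internals.

    ```typescript
    // One bit per row: bit i is set if row i matches the term.
    // Hypothetical helpers for illustration, not the column-store internals.
    type Mask = Uint32Array;

    const NUM_ROWS = 40;
    const NUM_WORDS = Math.ceil(NUM_ROWS / 32);

    const maskFromRows = (rows: number[]): Mask => {
        const mask = new Uint32Array(NUM_WORDS);
        for (const i of rows) mask[i >>> 5] |= 1 << (i & 31);
        return mask;
    };

    const rowsFromMask = (mask: Mask): number[] => {
        const rows: number[] = [];
        for (let i = 0; i < NUM_ROWS; i++) {
            if (mask[i >>> 5] & (1 << (i & 31))) rows.push(i);
        }
        return rows;
    };

    // Intersection and union, processing 32 rows per operation
    const and = (a: Mask, b: Mask): Mask => a.map((x, i) => x & b[i]);
    const or = (a: Mask, b: Mask): Mask => a.map((x, i) => x | b[i]);

    // Negation must clear the unused bits beyond NUM_ROWS in the last word
    const not = (a: Mask): Mask => {
        const res = a.map((x) => ~x);
        const rem = NUM_ROWS & 31;
        if (rem) res[NUM_WORDS - 1] &= (1 << rem) - 1;
        return res;
    };
    const nand = (a: Mask, b: Mask): Mask => not(and(a, b));
    const nor = (a: Mask, b: Mask): Mask => not(or(a, b));

    const a = maskFromRows([1, 5, 33]);
    const b = maskFromRows([5, 33, 39]);

    const both = rowsFromMask(and(a, b)); // rows 5 and 33
    const either = rowsFromMask(or(a, b)); // rows 1, 5, 33 and 39
    const neither = rowsFromMask(nor(a, b)).length; // 40 - 4 = 36 rows
    ```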

    #ThingUmbrella #OpenSource #Database #QueryEngine #TypeScript #BitField

  2. tl;dr Using thi.ng/column-store to accelerate tag intersection queries by a factor of 880x...

    Working on the static website generator/export plugin for my personal knowledge tool has been one of the main projects this past month. A key part of this setup is tagging, not just simple flat keywords/categories, but actually treating tags as sets. The system doesn't just allow browsing content by a single tag, but also supports adding (or removing) tags to narrow or widen the current topic. E.g. the combination of "3d + geometry + typescript" would select only works which have all three of these tags...

    In the local version of my tool there's no limit to the number of tags (and it also supports tag negation), but for the static site generation I have to limit the set size (due to combinatorial explosion) and pre-compute all possible permutations, then create HTML documents for each of these individual combinations which actually produce results.

    So far I have ~400 unique tags in use, meaning if I want to aim for a max set size of 3, there're theoretically ~64,000,000 possibilities to check[1]! For the roughly 3500 content items used for testing, a naive JS approach to filter the result array and only retain items matching the entire current permutation is so extremely slow that I stopped the process after 3.5 minutes just for the first 250k (aka 0.4%) of the 64 million permutations, i.e. at that rate the full process would have taken ~15 hours, pretty slow for an SSG... :)

    Naive approach 🫣:

    ```
    const permutation = ["3d", "geometry", "typescript"];
    // keep only items which have every tag of the current permutation
    const matches = results.filter((item) => permutation.every((tag) => item.tags.includes(tag)));
    ```

    But since I'm using thi.ng/column-store as my database, such queries can be optimized by a few orders of magnitude, since here these intersection queries are applied only to bitfields (explained in the pkg readme). This results in all 64+ million permutations being processed in just 62 seconds (1+ million per second). Quite the difference, i.e. ~880x faster than the above approach!
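
    For those curious about the principle, here's a minimal hand-rolled sketch of such a bitfield-based tag intersection. It does NOT use the thi.ng/column-store API (see the pkg readme for that); the sample items and the `index`, `popCount` and `countMatches` names are made up for illustration. The idea: build one bitfield per tag once (bit i = item i has that tag), then each permutation only needs a few bitwise ANDs plus a bit count, regardless of how many tags each item has.

    ```typescript
    type Item = { tags: string[] };

    // Hypothetical sample data, standing in for the real content items
    const items: Item[] = [
        { tags: ["3d", "geometry", "typescript"] },
        { tags: ["3d", "typescript"] },
        { tags: ["geometry", "typescript", "3d", "svg"] },
        { tags: ["2d", "geometry"] },
    ];

    // Build one bitfield per tag, done once up front (bit i = item i has the tag)
    const numWords = Math.ceil(items.length / 32);
    const index = new Map<string, Uint32Array>();
    items.forEach((item, i) => {
        for (const tag of item.tags) {
            let bits = index.get(tag);
            if (!bits) index.set(tag, (bits = new Uint32Array(numWords)));
            bits[i >>> 5] |= 1 << (i & 31);
        }
    });

    // Count the set bits in a 32-bit word (clears the lowest set bit per iteration)
    const popCount = (x: number): number => {
        let n = 0;
        while (x) {
            x &= x - 1;
            n++;
        }
        return n;
    };

    // Number of items matching ALL given tags: AND the bitfields, count the bits.
    // An empty tag list or an unknown tag yields 0 matches.
    const countMatches = (tags: string[]): number => {
        const fields = tags.map((tag) => index.get(tag));
        let count = 0;
        for (let w = 0; w < numWords; w++) {
            let word = fields.length ? 0xffffffff : 0;
            for (const f of fields) word &= f ? f[w] : 0;
            count += popCount(word);
        }
        return count;
    };

    // Items 0 and 2 have all three tags
    const numMatches = countMatches(["3d", "geometry", "typescript"]);
    ```

    For the SSG use case only the count matters at first (does this combination produce any results at all?), so the matching items only need to be resolved for the combinations which survive.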

    Also, of these 64 million initial possibilities, there're fewer unique ones (excluding duplicates and ignoring ordering), and currently only ~24,000 are actually producing a result. Still, that's 24,000 index pages to generate & host and that's, of course, far, far too many!

    So I will also have to spend more effort curating and severely reducing the tag vocabulary, at least the subset used for the website. On the other hand, I think this system will really help with browsing this large body/archive of work much more meaningfully than the boring single-tag/category approach most websites are offering. And it will do so without any backend (other than file hosting)...

    [1] Permutations = 400 + 400^2 + 400^3
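
    The numbers can be double-checked with a few lines of code. The first value follows the footnote's formula (ordered tuples, repetition allowed). The second is an upper bound for the "unique" combinations, under the assumption that "unique" means unordered sets of 1 to 3 distinct tags (the `choose` helper is just for illustration):

    ```typescript
    const numTags = 400;

    // Ordered tuples with repetition, as in the footnote: 400 + 400^2 + 400^3
    const ordered = numTags + numTags ** 2 + numTags ** 3;

    // Binomial coefficient C(n, k), computed multiplicatively
    // (each intermediate result is itself a binomial coefficient, i.e. an integer)
    const choose = (n: number, k: number): number => {
        let res = 1;
        for (let i = 1; i <= k; i++) res = (res * (n - k + i)) / i;
        return res;
    };

    // Unordered sets of distinct tags: C(400,1) + C(400,2) + C(400,3)
    const unique = choose(numTags, 1) + choose(numTags, 2) + choose(numTags, 3);

    console.log(ordered); // 64160400
    console.log(unique); // 10667000
    ```

    I.e. ignoring order and duplicates already shrinks the search space by about a factor of 6, before even checking which combinations produce results.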

    #ThingUmbrella #Tagging #Intersection #Query #Bitfield #WebDev #JavaScript #TypeScript #Optimization

  3. #HowToThing #023 — Responsive & reactive image gallery with tag-based Jaccard similarity ranking/filtering using thi.ng/bitfield, thi.ng/rstream & thi.ng/rdom

    A fairly common comment about #ThingUmbrella is that people often have little idea what some of the ~185 packages are even good/intended for and/or how to synthesize solutions from these small, individual building blocks. IMHO this is less about the packages themselves and more down to existing blank spots regarding the underlying concepts, algorithms and their potential role/utility in a larger problem domain... So I very much hope this new example is also useful in this respect!

    Alas, the full code for this got pretty long and contains a lot more UI stuff. I'm intending to develop this further for the new homepage to browse all ~135 #ThingUmbrella examples (and maybe even for parts of the thi.ng website itself)... For those of you interested in more "advanced" thi.ng/rdom examples, do check it out!

    Background info:
    en.wikipedia.org/wiki/Jaccard_

    Demo:
    demo.thi.ng/umbrella/related-i

    Full source code:
    github.com/thi-ng/umbrella/tre

    The important parts re: using compact binary encoding, bitfields & Jaccard similarity to find related items are here:

    github.com/thi-ng/umbrella/blo
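
    As a rough standalone illustration of the principle (NOT the code from the linked example and not the thi.ng/bitfield API): if each item's tags are encoded as a bitfield, the Jaccard similarity of two items is just the bit count of the AND divided by the bit count of the OR. The tag vocabulary, sample items and helper names below are all made up, and a single 32-bit word is assumed to be enough (i.e. max. 32 tags).

    ```typescript
    // Hypothetical tag vocabulary; each tag gets a fixed bit position
    const vocab = ["2d", "3d", "geometry", "svg", "typescript", "webgl"];

    // Encode a tag list as a bitfield (single 32-bit word, unknown tags are ignored)
    const encode = (tags: string[]): number =>
        tags.reduce((acc, tag) => {
            const bit = vocab.indexOf(tag);
            return bit < 0 ? acc : (acc | (1 << bit)) >>> 0;
        }, 0);

    // Count the set bits in a 32-bit word
    const popCount = (x: number): number => {
        let n = 0;
        while (x) {
            x &= x - 1;
            n++;
        }
        return n;
    };

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, here defined as 0 if both sets are empty
    const jaccard = (a: number, b: number): number => {
        const union = popCount(a | b);
        return union ? popCount(a & b) / union : 0;
    };

    const items = [
        { id: "a", tags: ["3d", "geometry", "typescript"] },
        { id: "b", tags: ["3d", "geometry", "webgl"] },
        { id: "c", tags: ["2d", "svg"] },
    ];
    const encoded = items.map((item) => ({ id: item.id, bits: encode(item.tags) }));

    // Rank all other items by similarity to the selected one (most similar first)
    const related = (id: string) => {
        const sel = encoded.find((x) => x.id === id)!;
        return encoded
            .filter((x) => x !== sel)
            .map((x) => ({ id: x.id, score: jaccard(sel.bits, x.bits) }))
            .sort((p, q) => q.score - p.score);
    };

    // "b" shares 2 of 4 distinct tags with "a" (score 0.5), "c" shares none (score 0)
    const ranking = related("a");
    ```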

    #ThingUmbrella #Tagging #Statistics #Similarity #Ranking #Bitfield #TypeScript #JavaScript #UI #Frontend #Reactive #Tutorial
