home.social

#hyperloglog — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #hyperloglog, aggregated by home.social.

  1. RE: wisskomm.social/@ioer/11589933

    I really took a deep dive into #datashader with this map: Locals & Tourists in Germany, derived from 67 million geo-social media posts (2007-2022). The data includes publicly shared posts from Instagram, Flickr, Twitter and iNaturalist.

    I had always wanted to create such a map, following in the footsteps of Eric Fischer's Locals & Tourists dataset from 2011 [1].

    I shared the code for producing this map here [2]. The repository is available here [3]. This includes some neat methods for various #geospatial processing tasks in #Python, such as exporting a datashader map to a #GeoTiff [4] with the help of #Xarray and #Rasterio.

    Finally, all of this was created in a privacy-preserving way using #HyperLogLog, which allowed me to share the code and abstracted data publicly for full reproducibility and transparency [6]. #FAIR

    Below you'll find the link to the (quite succinct) publication in Natur und Landschaft in Karten (#NuL) [5].

    [1]: flickr.com/photos/walkingsf/al
    [2]: code.ad.ioer.info/wip/digital_
    [3]: gitlab.hrz.tu-chemnitz.de/ad/d
    [4]: gitlab.hrz.tu-chemnitz.de/s739
    [5]: nul-online.de/article-7301410-
    [6]: doi.org/10.71830/VDMUWW
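
    The post doesn't spell out how HyperLogLog makes the data shareable. One common idea is sketched below, as an illustration only and not the author's pipeline (the actual code is in [3]): each grid cell keeps a fixed-size array of small register values derived from hashed user IDs, so raw identifiers never have to be stored or published. The cell IDs, user IDs, BLAKE2b as the hash, and 2**10 registers per cell are all assumptions.

```python
import hashlib
from collections import defaultdict

P = 10        # hypothetical precision: 2**P = 1024 registers per grid cell
M = 1 << P


def hash64(value: str) -> int:
    """64-bit hash of a string; BLAKE2b is an assumed stand-in."""
    digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def add(registers: bytearray, user_id: str) -> None:
    """Fold one user ID into a register array; the ID itself is not kept."""
    h = hash64(user_id)
    width = 64 - P
    idx = h >> width                        # top P bits select a register
    rest = h & ((1 << width) - 1)           # remaining 64 - P bits
    rank = width - rest.bit_length() + 1    # 1-based position of the leftmost 1-bit
    registers[idx] = max(registers[idx], rank)


def cell_sketches(posts):
    """Map each grid cell to a fixed-size register array from (cell, user) pairs."""
    sketches = defaultdict(lambda: bytearray(M))
    for cell_id, user_id in posts:
        add(sketches[cell_id], user_id)
    return dict(sketches)


# Hypothetical input: user u1 posts twice in the same cell.
posts = [("cell-17", "u1"), ("cell-17", "u2"), ("cell-17", "u1"), ("cell-42", "u3")]
shared = cell_sketches(posts)   # only these register arrays would be published
```

    Re-adding a user who is already counted leaves the registers unchanged, which is why the result approximates distinct users rather than posts. Note that register arrays are not a formal privacy guarantee on their own.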

  3. #FOSS breaks down barriers and makes innovation more accessible to everyone, worldwide. Roberto Luna Rojas from #Valkey shares why #opensource matters to him.

    Learn more about #vectors, #hyperloglog, #Redis, and how to improve your observability with key-value datastores: t.ly/ZnTNX

    #Linux #observability #kubernetes #softwarelibre #freesoftware

  4. Counting Millions of Things with Kilobytes
    A Hands-On Quarkus Tutorial for Scalable Unique Counting with HyperLogLog
    myfear.substack.com/p/quarkus-
    #Java #Quarkus #GitHub #HyperLogLog
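
    The "kilobytes" in the title follow from HLL's fixed memory: a sketch with m = 2**p registers has a standard error of about 1.04/sqrt(m) (from the original HLL analysis), no matter how many items it has seen. A small Python calculator for that trade-off, not taken from the Java tutorial; the 6-bit register width is an assumption that matches Redis's dense encoding.

```python
import math


def hll_profile(p: int, bits_per_register: int = 6) -> tuple[int, float]:
    """Return (sketch size in bytes, standard error) for 2**p registers.

    Assumes dense storage with fixed-width registers; 6 bits per register
    mirrors Redis's dense encoding and is an assumption here.
    """
    m = 1 << p
    size_bytes = (m * bits_per_register + 7) // 8   # round up to whole bytes
    std_error = 1.04 / math.sqrt(m)                 # HLL standard error
    return size_bytes, std_error


for p in (10, 12, 14):
    size, err = hll_profile(p)
    print(f"p={p}: {size} bytes, ~{err:.2%} standard error")
```

    With p = 14 this gives 12,288 bytes (12 KB) and roughly 0.81% standard error, whether the stream holds a thousand or a billion distinct items.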

  5. @bkastl Hm, feel you!

    I do work in this field myself and have always been a big fan of the Ethikrat (German Ethics Council) so far.

    Maybe she should become an advocate of #HyperLogLog.

    media.ccc.de/v/38c3-privacy-pr

    #38c3 #Patientenakte

  6. Completed the first assignment of #645 @CMUDB. HyperLogLog was an interesting data structure to learn about.

  7. Google's HyperLogLog++

    This is a follow-up to yesterday's post, "Redis's space-saving implementation of HyperLogLog". Redis's HyperLogLog implementation mentions Google's paper "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", which introduced HyperLogLog++ (HLL++).

    The paper proposes three main improvements. The first is using a 64-bit hash function:

    5.1 Using a 64 Bit Hash Fu

    blog.gslin.org/archives/2024/0

    #Computer #Murmuring #Programming #algorithm #data #google #hll #hyperloglog #structure
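
    The first HLL++ improvement, a 64-bit hash, changes how each hashed item is split into a register index and a rank. A minimal sketch assuming precision p, with BLAKE2b as a stand-in hash (the paper does not prescribe it). With 64 bits, hash collisions only matter at cardinalities far beyond 2^32, which is why HLL++ can drop the original paper's large-range correction.

```python
import hashlib


def hash64(item: str) -> int:
    """64-bit hash; BLAKE2b is a stand-in, not the hash used in the paper."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def split_hash(h: int, p: int) -> tuple[int, int]:
    """Split a 64-bit hash into (register index, rank) for 2**p registers.

    The top p bits select the register; the rank is the 1-based position of
    the leftmost 1-bit in the remaining 64 - p bits (64 - p + 1 if all zero).
    """
    width = 64 - p
    index = h >> width
    rest = h & ((1 << width) - 1)
    rank = width - rest.bit_length() + 1
    return index, rank


index, rank = split_hash(hash64("example"), p=14)
print(index, rank)
```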

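  12. Redis's space-saving implementation of HyperLogLog

    HyperLogLog (HLL) is a data structure and algorithm that solves the count-distinct problem statistically: it doesn't aim for an exact result, only an approximate count.

    The algorithm actually isn't that hard to understand. The original 2007 paper, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm", presents it like this:

    It can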

    blog.gslin.org/archives/2024/0

    #Computer #Murmuring #Software #algorithm #count #data #distinct #hyperloglog #problem #redis #structure
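
    The post refers to the algorithm as given in the 2007 paper. A minimal Python sketch of that estimator follows (this is not Redis's code). It assumes a 64-bit BLAKE2b hash in place of the paper's 32-bit hash, so only the small-range (linear counting) correction is kept, and p = 14, i.e. 16,384 registers, the same count Redis uses.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog following the 2007 paper's estimator.

    Assumption: a 64-bit BLAKE2b hash instead of the paper's 32-bit hash,
    so the large-range correction is omitted.
    """

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m for m >= 128, from the paper.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        width = 64 - self.p
        j = h >> width                        # register index from the top p bits
        w = h & ((1 << width) - 1)            # remaining bits
        rank = width - w.bit_length() + 1     # position of the leftmost 1-bit
        self.registers[j] = max(self.registers[j], rank)

    def count(self) -> float:
        # Raw estimate: alpha_m * m^2 / sum_j 2^(-M[j]).
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on the empty registers.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=14)
for i in range(200_000):
    hll.add(f"user-{i}")
print(round(hll.count()))
```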

  13. #HyperLogLog is super clever.

    It can count any number of unique values in constant space (i.e. without storing the values) within a specified margin of error.

    And HLLs can be merged to count the number of unique values across both sets! So you can quickly count something like "unique requests per day" and combine these into "per month" and "per year", without storing a year's worth of history.
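
    Merging works because each register only ever keeps a maximum, so the register-wise max of two sketches is exactly the sketch of the combined input. A minimal sketch assuming hypothetical visitor IDs, BLAKE2b hashing, and 2**12 registers:

```python
import hashlib

P = 12
M = 1 << P


def sketch(items) -> list[int]:
    """Build HLL registers for an iterable of strings (BLAKE2b hash, assumed)."""
    registers = [0] * M
    width = 64 - P
    for item in items:
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        j = h >> width                                          # register index
        rank = width - (h & ((1 << width) - 1)).bit_length() + 1
        registers[j] = max(registers[j], rank)
    return registers


def merge(*sketches: list[int]) -> list[int]:
    """Union of HLL sketches: register-wise maximum."""
    return [max(values) for values in zip(*sketches)]


# Hypothetical daily visitor IDs; the overlap (40,000 to 59,999) counts once.
monday = sketch(f"visitor-{i}" for i in range(0, 60_000))
tuesday = sketch(f"visitor-{i}" for i in range(40_000, 100_000))
two_days = merge(monday, tuesday)
```

    The merged registers are identical to a sketch built over all 100,000 IDs directly, so a monthly or yearly sketch can be assembled from stored daily sketches alone and then estimated with the usual HLL formula.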

  14. Fantastic explanation of the algorithm: youtube.com/watch?v=lJYufx0bfpw. What's great about this video is that it uses very basic concepts, so even non-programmers will understand it. On the other hand, CS terms like hash functions or sorted sets are mentioned in the fine print, so the video doesn't sound childish.

  15. #TIL about the #HyperLogLog algorithm, and I think it's a damn brilliant way to estimate the number of unique elements in a potentially gargantuan set of items while running in only O(n) time and O(1) space. The fact that variants of the algorithm can run in parallel makes it even more awesome!

    youtu.be/lJYufx0bfpw

    #algorithms #ComputerScience #SoME3 #mathematics #maths #statistics

  20. Something a little different today on the channel: HyperLogLog!

    It's one of my favorite algorithms, used to estimate the cardinality of a set. It's typically used in environments with very large datasets (spread across many servers in a cluster), where an exact distinct count would be very expensive.

    HLL takes a simple observation about coin-flipping probabilities and extends it to cardinality estimation. It's a really clever algorithm that provides a very fast and compact data structure with reasonably small errors (<2% across billions of unique elements, typically in just a few KB of memory).

    youtube.com/watch?v=lJYufx0bfp

    #programming #algorithm #hyperloglog #cardinality #datastructures
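
    The coin-flip observation can be checked directly. Treat each bit of a hash as a fair coin: a value whose hash starts with k zero bits is like flipping k tails in a row, which happens with probability 2^-k. Among n distinct values the longest such run is therefore around log2(n). A small sketch, assuming BLAKE2b as the hash and hypothetical item names:

```python
import hashlib


def leading_zeros64(item: str) -> int:
    """Leading zero bits of a 64-bit BLAKE2b hash (a stand-in hash function)."""
    h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
    return 64 - h.bit_length()


n = 1 << 16
zeros = [leading_zeros64(f"item-{i}") for i in range(n)]

# Like k tails in a row, a run of >= k leading zeros has probability 2^-k.
for k in range(1, 9):
    observed = sum(z >= k for z in zeros)
    print(f"k={k}: {observed} observed vs {n >> k} expected")

# The longest run seen grows roughly like log2(n): the core HLL observation.
print("longest run:", max(zeros), "log2(n):", n.bit_length() - 1)
```

    A single longest run is a very noisy estimate; HLL spreads items over many registers and averages them to bring the error down to the percent range.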