#hyperloglog — Public Fediverse posts on home.social

Alexander Dunkel @[email protected] · 2026-01-16 · 08:13 UTC

RE: https://wisskomm.social/@ioer/115899330915763542

I really took a deep dive into #datashader with this map: Locals & Tourists in Germany, derived from 67 million geo-social media posts (2007-2022). The data includes publicly shared posts from Instagram, Flickr, Twitter and iNaturalist.

I always wanted to create such a map, following in the footsteps of Eric Fischer's Locals & Tourists dataset from 2011 [1].

I shared the code for producing this map here [2]. The repository is available here [3]. This includes some neat methods for various #geospatial processing tasks in #Python, such as exporting a datashader map to a #GeoTiff [4] with the help of #Xarray and #Rasterio.

Finally, all of this was created in a privacy-preserving way using #HyperLogLog, which allowed me to share the code and abstracted data publicly for full reproducibility and transparency. [6] #FAIR

Below you'll find the link to the (quite succinct) publication in Natur und Landschaft in Karten (#NuL) [5].

[1]: https://www.flickr.com/photos/walkingsf/albums/72157624209158632
[2]: https://code.ad.ioer.info/wip/digital_traces_map/html/03_visualization.html
[3]: https://gitlab.hrz.tu-chemnitz.de/ad/digital_traces_map/
[4]: https://gitlab.hrz.tu-chemnitz.de/s7398234--tu-dresden.de/base_modules/-/blob/main/raster.py?ref_type=heads#L78
[5]: https://www.nul-online.de/article-7301410-1111/landschaft-und-natur-in-karten-.html
[6]: https://doi.org/10.71830/VDMUWW

#nul #datashader #geospatial #geotiff #xarray #rasterio

Coroot @[email protected] · 2025-09-12 · 18:13 UTC

#FOSS breaks down barriers and makes innovation more accessible to everyone, worldwide. Roberto Luna Rojas from #Valkey shares why #opensource matters to him.

Learn more about #vectors, #hyperloglog, #Redis, and how to improve your observability with key-value datastores: https://t.ly/ZnTNX

#Linux #observability #kubernetes #softwarelibre #freesoftware

#foss #valkey #opensource #vectors #hyperloglog #redis

Markus Eisele @[email protected] · 2025-08-07 · 06:18 UTC

Counting Millions of Things with Kilobytes
A Hands-On Quarkus Tutorial for Scalable Unique Counting with HyperLogLog
https://myfear.substack.com/p/quarkus-hyperloglog-unique-counting-java
#Java #Quarkus #GitHub #HyperLogLog

#java #quarkus #github #hyperloglog

𝐭𝐚𝐤𝐢𝐧𝐠 𝐥𝐮𝐜 (RN) @[email protected] · 2025-01-08 · 20:00 UTC

@bkastl Hm, feel you!

I do work in this field and have always been a big fan of the Ethics Council.

Maybe she should become an advocate of #HyperLogLog.

https://media.ccc.de/v/38c3-privacy-preserving-health-data-processing-is-possible

#38c3 #Patientenakte

#hyperloglog #38c3 #patientenakte

Rajiv Harlalka @rajivharlalka009 · 2024-09-16 · 13:41 UTC

Completed the first assignment of #645 @CMUDB. HyperLogLog was an interesting data structure to learn about.

#hyperloglog #presto

Gea-Suan Lin @[email protected] · 2024-03-20 · 21:45 UTC

Google's HyperLogLog++

This is a follow-up to yesterday's post, "Redis's space-saving implementation of HyperLogLog". The Redis HyperLogLog implementation mentions Google's paper "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", which proposes HyperLogLog++ (HLL++).

The paper proposes three main improvements. The first is the use of a 64-bit hash function:

5.1 Using a 64 Bit Hash Fu

https://blog.gslin.org/archives/2024/03/21/11709/google-%e7%9a%84-hyperloglog/

#Computer #Murmuring #Programming #algorithm #data #google #hll #hyperloglog #structure

#computer #murmuring #programming #algorithm #data #google

Gea-Suan Lin @[email protected] · 2024-03-19 · 22:37 UTC

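Redis's space-saving implementation of HyperLogLog

HyperLogLog (HLL) is a data structure and algorithm that solves the count-distinct problem statistically. It does not aim for an exact result, only an approximate count.

The algorithm is actually not hard to understand. The original 2007 paper, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm", presents it like this:

You can…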

https://blog.gslin.org/archives/2024/03/20/11705/redis-%e5%b0%8d-hyperloglog-%e7%9c%81%e7%a9%ba%e9%96%93%e7%9a%84%e5%af%a6%e4%bd%9c/

#Computer #Murmuring #Software #algorithm #count #data #distinct #hyperloglog #problem #redis #structure

#computer #murmuring #software #algorithm #count #data

Kornel @[email protected] · 2023-10-31 · 16:22 UTC

#HyperLogLog is super clever.

It can count any number of unique values in constant space (i.e. without storing the values) within a specified margin of error.

And HLLs can be merged to count the number of unique values across both sets! So you can quickly count something like "unique number of requests per day", and combine these into "per month", and "per year", without storing a year's worth of history.
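
A minimal Python sketch of why the merge works, assuming a simplified HLL layout. The register count `P`, the BLAKE2b-based `_hash64` helper and the `user-N` IDs are illustrative choices, not taken from any particular library. Each register keeps the maximum "rank" seen for its bucket, so taking the element-wise maximum of two register arrays gives exactly the registers you would get from sketching both inputs together.

```python
import hashlib

P = 10          # number of index bits (an assumed, illustrative value)
M = 1 << P      # number of registers


def _hash64(value):
    """Deterministic 64-bit hash taken from an 8-byte BLAKE2b digest."""
    digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def registers(values):
    """Build HLL registers: each one stores the largest rank seen in its bucket."""
    regs = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                       # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # 1-based position of the first 1-bit
        regs[idx] = max(regs[idx], rank)
    return regs


def merge(a, b):
    """Merge two sketches of the same size by element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


# Two "days" of hypothetical visitor IDs with some overlap.
day1 = [f"user-{i}" for i in range(0, 6000)]
day2 = [f"user-{i}" for i in range(4000, 10000)]

merged = merge(registers(day1), registers(day2))
both_days = registers(day1 + day2)
print(merged == both_days)
```

Because the merged registers are identical to those of the combined input, any estimate computed from them matches what a single sketch over both days would report, and only the small register arrays need to be kept.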

#hyperloglog

Tomasz Nurkiewicz @nurkiewicz · 2023-07-24 · 19:21 UTC

Fantastic explanation of the #HyperLogLog algorithm: https://www.youtube.com/watch?v=lJYufx0bfpw. What's great about this video is that it uses very basic concepts, so that even non-programmers will understand it. On the other hand, CS terms like hash functions or sorted sets are mentioned in fine print, so the video doesn't sound childish.

#hyperloglog

Eugene Alvin Villar 🇵🇭 @[email protected] · 2023-07-18 · 11:11 UTC

#TIL about the #HyperLogLog algorithm and I think it's a damn brilliant way to estimate the number of unique elements in a potentially gargantuan set of items while running in only O(n) time and O(1) space. The fact that variants of the algorithm can be done in parallel makes it even more awesome!

https://youtu.be/lJYufx0bfpw

#algorithms #ComputerScience #SoME3 #mathematics #maths #statistics

#til #hyperloglog #algorithms #computerscience #some3 #mathematics

Franck Pachot @[email protected] · 2023-07-02 · 07:52 UTC

Approximate COUNT(DISTINCT ...) in #postgresql and #yugabytedb https://dev.to/yugabyte/approximate-count-distinct-in-yugabytedb-and-postgresql-with-hyperloglog-1i13… #hyperloglog #skipscan #analyze

#postgresql #yugabytedb #hyperloglog #skipscan #analyze

Breaking Taps @[email protected] · 2023-06-28 · 17:05 UTC

Something a little different today on the channel: HyperLogLog!

It's one of my favorite algorithms, used to estimate cardinality of a set. Typically used in environments with very large datasets (spread across many servers in a cluster) where a true, accurate distinct count would be very expensive.

HLL uses a simple observation about coin flipping probabilities, and extends that to cardinality estimation. Really clever algo, and provides a very fast and compact data structure with reasonably small errors (<2% across billions of unique elements, typically in just a few KB of memory).
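
The coin-flipping idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code from the video or from any library: the function name `hll_estimate`, the BLAKE2b hash and the default `p=12` are choices made here. A hash whose remaining bits start with k zeros is as unlikely as k tails in a row, so the longest such run seen in each bucket hints at how many distinct items landed there. The per-bucket hints are combined with a harmonic mean, with linear counting as a fallback for small counts.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p registers."""
    m = 1 << p
    regs = [0] * m
    for v in values:
        digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")          # 64-bit hash
        idx = h >> (64 - p)                        # first p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)           # the "coin flips"
        rank = (64 - p) - rest.bit_length() + 1    # leading zeros + 1
        if rank > regs[idx]:
            regs[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)

    zeros = regs.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)             # linear counting for small sets
    return raw


# 300,000 items, but only 100,000 distinct ones.
stream = [i % 100_000 for i in range(300_000)]
estimate = hll_estimate(stream)
rel_error = abs(estimate - 100_000) / 100_000
print(f"estimate={estimate:.0f} relative error={rel_error:.2%}")
```

With `p=12` there are 4096 registers, so the expected standard error is about 1.04 / sqrt(4096) ≈ 1.6%. If each register is packed into 6 bits (an assumption about the storage layout), the whole sketch fits in 3 KB, which is consistent with the figures quoted above.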

https://www.youtube.com/watch?v=lJYufx0bfpw

#programming #algorithm #hyperloglog #cardinality #datastructures