#datafusion — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #datafusion, aggregated by home.social.
-
Missing RustWeek and want to hear me ramble about data processing in Rust? Head on over here ;) https://youtu.be/uvfiz7-blyU
-
I’m excited to be heading to Los Angeles next week for the Air Sensors International Conference (ASIC) 2026. I’ll be sharing new findings from our work evaluating low‑cost air sensors for data fusion.
Presentation: In-Situ Evaluation of Low-Cost PM2.5 Sensor Networks Using a Novel Distance-Based Method Quantifies Sensor Uncertainty and Reveals Episodic Variability in Performance
Session: Air Sensor Degradation and Operating in Challenging Environments
Time: Thursday, May 7th, 2026 | 1:30–3:00 PM
If you’ll be at ASIC, I’d love to connect and exchange ideas.
#ASIC2026 #airquality #lowcostsensors #airpollution #datafusion -
@thealexmerced thanks! Added to wish list in manning. Better 2buy there vs Amazon to get the ai features?
I guess Manning got rid of old option to buy coins 2 read individual pages? was a cool feature 2 bad.
Thanks for reminder about #datafusion i guess it & #polars have excellent #iceberg support & can be used from #rust
I was thinking about replacing a #pyspark glue job with a rust #lambda on #aws
Just found your excellent medium account. Best of luck at your upcoming talk!
-
If you work in the software development industry, please do yourself a favor and read @andygrove's book, "How Query Engines Work":
https://howqueryengineswork.com/
The book is free to read on the web, but if you like it, I'd suggest buying the eBook version too.
Andy Grove is the creator of #DataFusion and has been working with query engines for a long while.
-
How we rewrote the Trino core in Rust
CedrusData Engine is a lakehouse engine based on Trino. On real workloads, our product routinely outperforms other technologies (Trino, Doris, Dremio, StarRocks) by a factor of 1.5-3, with an even wider margin over the outdated Greenplum and Impala. These results come from continuous investment in developing the latest big data processing techniques. In this article I will talk about Project Oxide, one of our key initiatives of the past year, which rewrote the Trino core from Java to Rust.
-
We've been working on something exciting in the Arrow/DataFusion ecosystem, which finally shipped with yesterday's release of DataFusion. You can now use Run-End-Encoded arrays in group by clauses!
#opensource #apache #arrow #datafusion #performance #database
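For readers unfamiliar with run-end encoding, here is a minimal pure-Python sketch of the idea. It is not DataFusion's or Arrow's actual implementation. It assumes the Arrow-style layout in which each run stores a value plus the exclusive logical index where the run ends. The function names `run_end_encode` and `group_count` and the sample column are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import groupby


def run_end_encode(values):
    """Encode a sequence as (run_ends, run_values).

    run_ends[i] is the exclusive end index of run i, so the lengths
    of the runs are the differences between consecutive run ends.
    """
    run_ends, run_values = [], []
    position = 0
    for value, run in groupby(values):
        position += sum(1 for _ in run)
        run_ends.append(position)
        run_values.append(value)
    return run_ends, run_values


def group_count(run_ends, run_values):
    """Compute COUNT(*) ... GROUP BY value, touching each run once
    instead of each row (illustrative only)."""
    counts = Counter()
    start = 0
    for end, value in zip(run_ends, run_values):
        counts[value] += end - start  # add the whole run length at once
        start = end
    return dict(counts)


# Hypothetical sample column with long runs of repeated values.
column = ["a", "a", "a", "b", "b", "a", "c", "c", "c", "c"]
run_ends, run_values = run_end_encode(column)
counts = group_count(run_ends, run_values)
```

The point of the sketch is that the grouping loop runs once per run rather than once per row, which is where a run-end-encoded group key can save work when the data has long runs.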
-
Embedding User-Defined Indexes in Apache Parquet
https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
#HackerNews #Embedding #User-Defined #Indexes #in #Apache #Parquet #ApacheParquet #UserDefinedIndexes #DataFusion #BigData #Analytics
-
An improvement to the disk manager configuration in #apache #datafusion: it now uses a builder.
-
At comet speed: speeding up Spark without rewriting code
Hi, Habr! My name is Lev Makoveev. I am a junior data engineer at Kryptonite. In this article I want to share the results of a small study in which we tested the Apache DataFusion Comet query accelerator and got quite impressive results. Looking ahead, I will note that in some tests the speedup was more than tenfold!
https://habr.com/ru/companies/kryptonite/articles/902872/
#spark #apache #comet #DataFusion #большие_данные #анализ_данных #data_engineering #data_scientist #big_data #оптимизация
-
I wrote a blog post on how to write user-defined functions in Apache DataFusion. This includes how you can write Rust-backed Python functions that operate at full native speed with zero-copy operations on the data structures. Switching from pure Python functions to these types of UDFs can lead to 10x speed improvements.
https://datafusion.apache.org/blog/2024/11/19/datafusion-python-udf-comparisons/
-
The Apache Arrow ecosystem is an amazing enabler for a whole new range of database systems. Tonbo is a K/V store that uses Arrow+Parquet on storage systems ranging from OPFS (browser filesystem) to S3. And of course you get SQL for free with Apache DataFusion! #ApacheArrow #DataFusion #RustLang
https://tonbo.io/blog/introducing-tonbo -
Awesome reading list about Apache DataFusion. I started diving into it lately, hacking an extension to have Elasticsearch as a data source (aka a TableProvider). It's a wonderful piece of software and an impressive ecosystem. #DataFusion #RustLang https://datafusion.apache.org/user-guide/concepts-readings-events.html
-
Very nice introduction to Apache DataFusion and its internals. I started to dive into it lately, and it's really a wonderful piece of software. #datafusion #rustlang https://www.youtube.com/watch?v=iJhRbDFJjbg
-
And the code is up! Still an early prototype, but we can write SQL queries with DataFusion that join Elasticsearch indices with other sources like SQL databases, CSV files, Parquet data on S3, etc 🤯 DataFusion is powerful and fun to use! #elasticsearch #datafusion #rustlang https://github.com/swallez/elasticsearch-datafusion-tableprovider
-
Starting a "space time" week at Elastic, where we can work on whatever we want that is loosely related to our products. For me, this will be experimenting with Apache DataFusion to see how we can use/integrate it with Elasticsearch.
#elasticsearch #datafusion -
I recently worked on an update to the Apache DataFusion project. The goal of this project is to provide a fast, modular approach to building large-scale data processing.
The update adds significant improvements to the Python interface, to the point where I would now recommend giving it a try to people who haven't used it or haven't had success with it.
I wrote up a blog post about the changes, hosted on the Apache DataFusion site: https://datafusion.apache.org/blog/2024/08/20/python-datafusion-40.0.0/
-
I put in an open source PR today!
I've been playing around with Apache #DataFusion and wanted to see how it performed against a real-life, non-trivial problem I have. I ran into a blocker: one of the basic functions exposed in the underlying Rust code isn't exposed in Python, so I just wrote it myself and put up a PR. I've never really been much of an open source contributor, so I'm looking forward to seeing how this goes.
-
🌐 Five of our prototypes will be using advanced #DataFusion methods to bring EO products to the #nextlevel . Our scientists are at the forefront of developing a multi-modal architecture, leveraging Single Image Super Resolution #SIRS techniques fuelled by deep Convolutional Neural Networks #CNN and Generative Adversarial Networks #GAN 🖥
Explore further 👉 https://www.evo-land.eu/method/data-fusion/
-
My latest Rust project: I've been using my regular note-taking app for tracking my workouts instead of a weight lifting app. I still want to analyze that data a bit though, so now I've made a parser for my notes with #Pestrs, converting my lifts to Apache Arrow column batches, and slapped #Datafusion on top of it so I can query it with SQL. Thinking of embedding it in a Flutter app so I can take notes, make queries and create custom reports in the same app :D
-
If you're interested in #dataframes, it seems there are starting to be many of us on Mastodon: @jorisvandenbossche @marcogorelli (#pandas) @maartenbreddels (#vaex) @ritchie46 (#polars) @andygrove (#datafusion) @bkamins (#JuliaLang)
And probably others (feel free to comment with anyone I missed).
-
@datapythonista @jorisvandenbossche @marcogorelli @maartenbreddels @ritchie46 @andygrove @bkamins I'll tag along 🙂 Mostly interested in #polars & #dataFusion + #ArrowFlight & #ApacheBallista. #MalloyData too actually.
-
@goodthinkhunting
That's right. It starts so early, when the children in our care don't even begin to understand what waves this makes. And these not-so-obvious #Daten (data) are only the tip of the iceberg. Before long it's the university #Matrikelnummer (student ID number), and with it half the grade transcript on digital notice boards. Soon the #SteuerID (tax ID) will give us a universal #Personenkenziffer (personal identification number) for all government agencies.
#DataFusion
#DatenschutzIstMenschenschutz #DatenschutzIstKinderschutz -