#apacheiceberg — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #apacheiceberg, aggregated by home.social.
-
DuckDB Labs released #DuckLake 1.0 - a data lake format that stores table metadata in a SQL database, rather than spreading it across object storage files.
Key features:
• catalog-stored small updates
• improved sorting and partitioning
• compatibility with Iceberg-style data features
Learn more ⇨ https://bit.ly/48PsPIS
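The core idea — keeping table metadata in a transactional SQL database instead of scattering metadata files across object storage — can be sketched with stdlib `sqlite3`. This is a conceptual illustration only; the schema below is invented and is not DuckLake's actual catalog layout:

```python
import sqlite3

# Conceptual sketch: a catalog database tracks snapshots and data files,
# replacing the metadata files an Iceberg-style layout would scatter
# across object storage. (Illustrative schema, not DuckLake's.)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE snapshots (
        snapshot_id INTEGER PRIMARY KEY,
        table_name  TEXT NOT NULL,
        created_at  TEXT NOT NULL
    );
    CREATE TABLE data_files (
        snapshot_id INTEGER REFERENCES snapshots(snapshot_id),
        path        TEXT NOT NULL,
        row_count   INTEGER NOT NULL
    );
""")

# A "small update" becomes a single catalog transaction: one new snapshot
# row plus the files it adds, with no object-storage round-trips.
with con:
    con.execute("INSERT INTO snapshots VALUES (1, 'events', '2025-01-01')")
    con.execute("INSERT INTO data_files VALUES (1, 's3://lake/events/a.parquet', 1000)")

rows = con.execute(
    "SELECT path FROM data_files WHERE snapshot_id = 1"
).fetchall()
print(rows)  # [('s3://lake/events/a.parquet',)]
```

The win this models: a commit is one ACID transaction in the catalog database rather than a multi-step dance of writing and swapping metadata files.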
-
Lakehouse architectures allow multiple engines to run on shared data through open table formats like #ApacheIceberg.
But #SQL identifier resolution and catalog naming rules differ across engines - creating hidden interoperability failures.
In this #InfoQ article, Maninder Parmar explains why enforcing consistent naming conventions and cross-engine validation is critical.
📰 Read now: https://bit.ly/4902zeH
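The failure mode the article describes can be sketched in a few lines: engines fold unquoted identifiers differently, so one logical table name resolves to different catalog keys. The folding rules below are illustrative stand-ins, not any specific engine's behavior:

```python
# Sketch of the interoperability hazard: three engines, three folding
# rules, three different catalog keys for the "same" unquoted name.
# (Illustrative rules; real engines have more nuance, e.g. quoting.)
def fold_identifier(name: str, engine: str) -> str:
    rules = {
        "engine_lower": str.lower,    # folds unquoted names to lower case
        "engine_upper": str.upper,    # folds unquoted names to upper case
        "engine_exact": lambda s: s,  # case-sensitive lookup
    }
    return rules[engine](name)

name = "SalesData"
keys = {e: fold_identifier(name, e)
        for e in ("engine_lower", "engine_upper", "engine_exact")}
print(keys)  # three distinct keys for one logical table

# A cross-engine validation check of the kind the article argues for:
# enforce one convention (lower snake_case) so every engine's folding
# maps to the same catalog key.
def is_portable(name: str) -> bool:
    return name == name.lower() and name.isidentifier()

assert not is_portable("SalesData")
assert is_portable("sales_data")
```

Running a check like `is_portable` in CI against all table and namespace names is one cheap way to surface these mismatches before an engine silently misses a table.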
-
The Data Lakehouse Explained: Why Apache Iceberg Is Quietly Running the Show
https://techlife.blog/posts/data-lakehouse-iceberg
#ApacheIceberg #DataLakehouse #DataWarehouse #DataLake #Snowflake #ApacheSpark #DataEngineering
-
#Pinterest launched a next-gen CDC-based ingestion framework.
Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
• Latency cut from 24+ hours to 15 minutes
• Processing of only changed records
• Support for incremental updates & deletions
• Petabyte-scale data across 1,000+ pipelines
Win: optimized cost & efficiency!
Read the architectural deep dive on InfoQ 👉 https://bit.ly/4rMJB2H
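The "process only changed records" idea boils down to applying a stream of change events (upserts and deletes) to existing table state instead of rewriting everything. A minimal conceptual sketch — Pinterest's actual pipeline does this at petabyte scale with Kafka, Flink, Spark, and Iceberg:

```python
# Minimal sketch of CDC-style incremental apply: only the changed
# records are touched, with support for updates and deletions.
# (Conceptual model only, not Pinterest's implementation.)
def apply_changes(table: dict, changes: list) -> dict:
    for change in changes:
        key = change["key"]
        if change["op"] == "delete":
            table.pop(key, None)      # remove the record if present
        else:                          # "insert" and "update" both upsert
            table[key] = change["value"]
    return table

table = {1: "a", 2: "b"}
changes = [
    {"op": "update", "key": 1, "value": "a2"},
    {"op": "delete", "key": 2, "value": None},
    {"op": "insert", "key": 3, "value": "c"},
]
print(apply_changes(table, changes))  # {1: 'a2', 3: 'c'}
```

The latency win follows from the same principle: work is proportional to the change volume, not the table size, which is how a 24-hour batch rewrite can become a 15-minute incremental merge.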
-
Cloudflare has just launched the open beta of its Cloudflare Data Platform - a managed service for ingesting, storing & querying analytical data tables using open standards like Apache Iceberg.
🔍 Dive into the key insights on #InfoQ ⇨ https://bit.ly/49y1tIa
#CloudComputing #DataLake #DataAnalytics #ApacheIceberg #Cloudflare
-
scrapy-contrib-bigexporter 0.6.1 released: https://codeberg.org/ZuInnoTe/scrapy-contrib-bigexporters
Added: You can customize Iceberg table location
#scrapy #webscraping #bigdata #iceberg #apacheiceberg #opensource #python
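In Scrapy, exporters of this kind are wired up through the `FEEDS` setting in `settings.py`. The fragment below is a hypothetical sketch of what the new table-location option might look like — the `"iceberg"` format name and the `table_location` key are assumptions, so check the project README for the exact keys:

```python
# Hypothetical settings.py fragment: export scraped items to an Apache
# Iceberg table via scrapy-contrib-bigexporters. The format name and
# option keys below are illustrative assumptions, not the documented API.
FEEDS = {
    "warehouse": {
        "format": "iceberg",
        "item_export_kwargs": {
            # New in 0.6.1: a customizable table location (illustrative key)
            "table_location": "s3://my-bucket/warehouse/scraped_items",
        },
    },
}
```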
-
scrapy-contrib-bigexporter 0.6.0 released: https://codeberg.org/ZuInnoTe/scrapy-contrib-bigexporters
New: Export your webscraped items in Scrapy to Apache Iceberg tables with simple configuration
#scrapy #webscraping #bigdata #iceberg #apacheiceberg #opensource #python
-
Watching the re-indexing of an archival catalog backup of AtoM, I realized:
Indices populated with 18751 documents in 164.84 seconds.
19k Objects?
That's /nothing/ for a regular #bigDATA tech-tool. This is peanuts.
400,000 objects? Millions?! - According to the documentation of #ApacheIceberg #ObjectStore #Redis #KeyDB, etc.: **easy**
#DLTP & #GLAM: Storing and using those "objects" in key/value annotated filesystems with bigDATA tools:
**FUN!!**
-
Nimtable: Open-source web UI to browse and manage Apache Iceberg tables
https://github.com/nimtable/nimtable
#HackerNews #Nimtable #OpenSource #ApacheIceberg #WebUI #DataManagement #DatabaseTools
-
Paris: Apache Iceberg Paris Community Meetup #1, Le jeudi 19 juin 2025 de 18h00 à 21h30. https://www.agendadulibre.org/events/32653 #data #dataLakehouse #dataEngineer #dataScience #dataPlatform #dataWarehouse #apacheIceberg
-
Apache Iceberg: the Hadoop of the modern data stack? — https://blog.det.life/apache-iceberg-the-hadoop-of-the-modern-data-stack-c83f63a4ebb9
#HackerNews #ApacheIceberg #ModernDataStack #Hadoop #DataEngineering #BigData
-
🎃The October issue of #CheckpointChronicle is now out 🌟
It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write up of ngrok's data platform, nice detail of how SmartNews use #ApacheIceberg with Flink and #ApacheSpark, a good writeup from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!
-
👩💻 Hands-On with Catalogs in Flink SQL
🔧 In this second post in the series, @rmoff shows how to use Flink SQL with catalogs including #apacheHive, #JDBC, & #apacheIceberg. It also includes a closer look at the data structures within the Hive Metastore.
-
#Netflix created a new solution for incremental processing in its data platform, reducing computing costs and execution time.
Learn how Maestro #WorkflowEngine & #ApacheIceberg improve data freshness and accuracy: https://bit.ly/47G53vo
#InfoQ #SoftwareArchitecture #Database #DataPipelines #AI #ML
-
🔥⏲️ Fudge Sunday "Are You Gonna Go Parquet" A look at the past, present, and future of Apache Parquet
#apacheiceberg #apachespark #prestodb #prestosql #trino #aiops #mlops #artificialintelligence #ai #aiforgood #aiforall #aiandbusiness #datalake #datalakehouse #datalakes #insights #dataengineering #realtimeanalytics #realtimedata #dataintegration #platformengineering #watsonx #devx #developerexperience #newsletter #newsletters
-
So Tessellate inherits lots of support for various data formats from Cascading
https://github.com/cwensel/cascading
Even though #apacheparquet dropped Cascading support, we were able to port it over.
Now that Parquet is native to Cascading, it should be easier to add #apacheiceberg support.
This would allow #clusterless to convert data as it arrives into Iceberg continuously for use in #aws Athena or other data front-ends.
Anyone interested in a challenge?
-
The SQL-focused data analytics and BI platform Dremio Cloud is now available free of charge as a fully managed service.
Dremio makes its data lakehouse service free for everyone