home.social

#apacheiceberg — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #apacheiceberg, aggregated by home.social.

  1. DuckDB Labs released #DuckLake 1.0 - a data lake format that stores table metadata in a SQL database, rather than spreading it across object storage files.

    Key features:
    • catalog-stored small updates
    • improved sorting and partitioning
    • compatibility with Iceberg-style data features

    Learn more ⇨ bit.ly/48PsPIS

    #InfoQ #DuckDB #ApacheIceberg #AI #DataLake #DataStorage
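
The core idea here, keeping table metadata in a transactional SQL database rather than in metadata files scattered across object storage, can be sketched in a few lines. This is a conceptual toy using SQLite, not DuckLake's actual schema or API:

```python
import sqlite3

# Conceptual sketch only: DuckLake's real schema differs. The point is that
# snapshot/metadata records live as rows in a SQL database, so a "small update"
# is a single transactional INSERT instead of rewriting metadata files on
# object storage.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE snapshots (
    snapshot_id  INTEGER PRIMARY KEY,
    table_name   TEXT NOT NULL,
    data_file    TEXT NOT NULL,      -- data file the snapshot adds
    committed_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def commit_data_file(table_name, data_file):
    """Register a new data file as one atomic catalog transaction."""
    with conn:  # context manager commits (or rolls back) the transaction
        conn.execute(
            "INSERT INTO snapshots (table_name, data_file) VALUES (?, ?)",
            (table_name, data_file))

commit_data_file("events", "s3://bucket/events/part-000.parquet")
commit_data_file("events", "s3://bucket/events/part-001.parquet")

# Readers resolve the current table state with an ordinary SQL query.
files = [r[0] for r in conn.execute(
    "SELECT data_file FROM snapshots WHERE table_name = ? ORDER BY snapshot_id",
    ("events",))]
```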

  2. Lakehouse architectures allow multiple engines to run on shared data through open table formats like #ApacheIceberg.

    But #SQL identifier resolution and catalog naming rules differ across engines - creating hidden interoperability failures.

    In this #InfoQ article, Maninder Parmar explains why enforcing consistent naming conventions and cross-engine validation is critical.

    📰 Read now: bit.ly/4902zeH

    #RelationalDatabases #DataLake
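
Why identifier resolution bites: unquoted names are case-folded differently from engine to engine, so the same schema can resolve in one engine and fail in another. A minimal validation sketch of the "safe cross-engine subset" idea (the regex is a deliberately simplified assumption, not any engine's real grammar):

```python
import re

# Illustrative only: real engine rules are more involved. Unquoted identifiers
# are lowercased by some engines, uppercased or preserved by others, so the
# safe cross-engine subset is lowercase snake_case names that never need
# quoting anywhere.
SAFE_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")

def is_portable_identifier(name: str) -> bool:
    """True if `name` should resolve identically across SQL engines."""
    return bool(SAFE_IDENTIFIER.match(name))

assert is_portable_identifier("order_items")
assert not is_portable_identifier("OrderItems")   # case-folding differs
assert not is_portable_identifier("2024_sales")   # leading digit needs quoting
```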

  3. #Pinterest launched a next-gen CDC-based ingestion framework.

    Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
    • Latency cut from 24+ hours to 15 minutes
    • Processing of only changed records
    • Support for incremental updates & deletions
    • Petabyte-scale data across 1,000+ pipelines

    Win: optimized cost & efficiency!

    Read the architectural deep dive on InfoQ 👉 bit.ly/4rMJB2H

    #SoftwareArchitecture #ChangeDataCapture
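
The "processing of only changed records" part of a CDC pipeline boils down to applying a stream of upsert/delete events to the current table state instead of re-reading the whole table. A conceptual sketch, not Pinterest's implementation:

```python
# Conceptual sketch of the CDC apply step (not Pinterest's architecture):
# the sink consumes only the change events captured since the last run.
def apply_cdc_batch(table: dict, events: list) -> dict:
    """Apply a batch of (op, key, row) change events to a keyed snapshot."""
    for op, key, row in events:
        if op == "upsert":
            table[key] = row          # handles both inserts and updates
        elif op == "delete":
            table.pop(key, None)      # tolerate deletes of unknown keys
        else:
            raise ValueError(f"unknown op: {op}")
    return table

snapshot = {1: {"name": "pin-a"}, 2: {"name": "pin-b"}}
changes = [
    ("upsert", 2, {"name": "pin-b-renamed"}),  # update an existing row
    ("upsert", 3, {"name": "pin-c"}),          # insert a new row
    ("delete", 1, None),                       # remove a row
]
snapshot = apply_cdc_batch(snapshot, changes)
```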

  4. Cloudflare has just launched the open beta of its Cloudflare Data Platform - a managed service for ingesting, storing & querying analytical data tables using open standards like Apache Iceberg.

    🔍 Dive into the key insights on #InfoQ: bit.ly/49y1tIa

    #CloudComputing #DataLake #DataAnalytics #ApacheIceberg #Cloudflare

  5. Watching the re-indexing of an archival catalog backup of AtoM, I realized:

    Indices populated with 18751 documents in 164.84 seconds.

    19k Objects?
    That's /nothing/ for a regular #bigDATA tech-tool. This is peanuts.

    400,000 Objects?
    Millions?! - According to documentation of #ApacheIceberg #ObjectStore #Redis #KeyDB, etc: **easy**

    #DLTP & #GLAM: Storing and using those "objects" in key/value annotated filesystems with bigDATA tools:

    **FUN!!**
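
Quick arithmetic on the numbers in the post: 18,751 documents in 164.84 seconds is roughly 114 documents per second, which would put 400,000 objects at about an hour even at this modest rate:

```python
# Sanity-check the throughput implied by the post's indexing log line.
docs, seconds = 18751, 164.84
rate = docs / seconds                      # ~113.8 docs/sec
print(f"{rate:.1f} docs/sec")

# At that rate, the hypothetical 400,000-object catalog takes about an hour.
estimate_seconds = 400_000 / rate
print(f"{estimate_seconds / 3600:.1f} hours for 400,000 objects")
```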

  6. 🎃 The October issue of #CheckpointChronicle is now out 🌟

    It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write-up of ngrok's data platform, nice detail of how SmartNews use #ApacheIceberg with Flink and #ApacheSpark, a good write-up from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!

    dcbl.link/cc-oct242

  7. 👩‍💻 Hands-On with Catalogs in Flink SQL

    🔧 In this second post in the series, @rmoff shows how to use Flink SQL with catalogs including #apacheHive, #JDBC, & #apacheIceberg. It also includes a closer look at the data structures within the Hive Metastore.

    dcbl.link/flink-catalogs---2

    #dataEngineering #streamProcessing #SQL #openSource

  8. #Netflix created a new solution for incremental processing in its data platform, reducing computing costs and execution time.

    Learn how Maestro #WorkflowEngine & #ApacheIceberg improve data freshness and accuracy: bit.ly/47G53vo

    #InfoQ #SoftwareArchitecture #Database #DataPipelines #AI #ML
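
Incremental processing of this kind is typically watermark-driven: each run picks up only the partitions that arrived since the previous run instead of recomputing the full table. A hedged sketch of the pattern, not Maestro's actual mechanism (the aggregation is a placeholder):

```python
# Conceptual sketch of watermark-based incremental processing (not Netflix's
# Maestro implementation): keep a watermark, process only newer partitions.
def run_incremental(partitions: dict, watermark: str) -> tuple:
    """Process partitions newer than `watermark`; return (results, new_watermark)."""
    new_keys = sorted(k for k in partitions if k > watermark)
    results = {k: sum(partitions[k]) for k in new_keys}  # placeholder aggregation
    new_watermark = new_keys[-1] if new_keys else watermark
    return results, new_watermark

partitions = {"2024-01-01": [1, 2], "2024-01-02": [3], "2024-01-03": [4, 5]}
# Only the two partitions after the watermark are touched on this run.
results, wm = run_incremental(partitions, watermark="2024-01-01")
```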

  9. So Tessellate inherits lots of support for various data formats from Cascading
    github.com/cwensel/cascading

    Even though Parquet dropped Cascading support, we were able to port it over.

    Now that Parquet is native to Cascading, it should be easier to add support.

    This would allow data to be converted into Iceberg continuously as it arrives, for use in Athena or other data front-ends.

    Anyone interested in a challenge?

  10. Dremio Cloud, the SQL-focused data analysis and BI platform, is now available free of charge as a fully managed service.
    Dremio makes its data lakehouse service free for everyone