home.social

#apacheiceberg — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #apacheiceberg, aggregated by home.social.

  1. DuckDB Labs released #DuckLake 1.0 - a data lake format that stores table metadata in a SQL database, rather than spreading it across object storage files.

    Key features:
    • catalog-stored small updates
    • improved sorting and partitioning
    • compatibility with Iceberg-style data features

    Learn more ⇨ bit.ly/48PsPIS

    #InfoQ #DuckDB #ApacheIceberg #AI #DataLake #DataStorage
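
The core idea here, keeping table metadata in a transactional SQL database rather than in metadata files scattered across object storage, can be sketched in a few lines. This is a conceptual toy using SQLite, not DuckLake's actual schema or API:

```python
import sqlite3

# Conceptual sketch only: DuckLake's real schema differs. The point is that
# snapshot/metadata records live as rows in a SQL database, so a "small update"
# is a single transactional INSERT instead of rewriting metadata files on
# object storage.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE snapshots (
    snapshot_id  INTEGER PRIMARY KEY,
    table_name   TEXT NOT NULL,
    data_file    TEXT NOT NULL,      -- data file the snapshot adds
    committed_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def commit_data_file(table_name, data_file):
    """Register a new data file as one atomic catalog transaction."""
    with conn:  # context manager commits (or rolls back) the transaction
        conn.execute(
            "INSERT INTO snapshots (table_name, data_file) VALUES (?, ?)",
            (table_name, data_file))

commit_data_file("events", "s3://bucket/events/part-000.parquet")
commit_data_file("events", "s3://bucket/events/part-001.parquet")

# Readers resolve the current table state with an ordinary SQL query.
files = [r[0] for r in conn.execute(
    "SELECT data_file FROM snapshots WHERE table_name = ? ORDER BY snapshot_id",
    ("events",))]
```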

  2. Lakehouse architectures allow multiple engines to run on shared data through open table formats like #ApacheIceberg.

    But #SQL identifier resolution and catalog naming rules differ across engines - creating hidden interoperability failures.

    In this #InfoQ article, Maninder Parmar explains why enforcing consistent naming conventions and cross-engine validation is critical.

    📰 Read now: bit.ly/4902zeH

    #RelationalDatabases #DataLake
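
Why identifier resolution bites: unquoted names are case-folded differently from engine to engine, so the same schema can resolve in one engine and fail in another. A minimal validation sketch of the "safe cross-engine subset" idea (the regex is a deliberately simplified assumption, not any engine's real grammar):

```python
import re

# Illustrative only: real engine rules are more involved. Unquoted identifiers
# are lowercased by some engines, uppercased or preserved by others, so the
# safe cross-engine subset is lowercase snake_case names that never need
# quoting anywhere.
SAFE_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")

def is_portable_identifier(name: str) -> bool:
    """True if `name` should resolve identically across SQL engines."""
    return bool(SAFE_IDENTIFIER.match(name))

assert is_portable_identifier("order_items")
assert not is_portable_identifier("OrderItems")   # case-folding differs
assert not is_portable_identifier("2024_sales")   # leading digit needs quoting
```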

  3. #Pinterest launched a next-gen CDC-based ingestion framework.

    Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
    • Latency cut from 24+ hours to 15 minutes
    • Processing of only changed records
    • Support for incremental updates & deletions
    • Petabyte-scale data across 1,000+ pipelines

    Win: optimized cost & efficiency!

    Read the architectural deep dive on InfoQ 👉 bit.ly/4rMJB2H

    #SoftwareArchitecture #ChangeDataCapture
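
The "processing of only changed records" part of a CDC pipeline boils down to applying a stream of upsert/delete events to the current table state instead of re-reading the whole table. A conceptual sketch, not Pinterest's implementation:

```python
# Conceptual sketch of the CDC apply step (not Pinterest's architecture):
# the sink consumes only the change events captured since the last run.
def apply_cdc_batch(table: dict, events: list) -> dict:
    """Apply a batch of (op, key, row) change events to a keyed snapshot."""
    for op, key, row in events:
        if op == "upsert":
            table[key] = row          # handles both inserts and updates
        elif op == "delete":
            table.pop(key, None)      # tolerate deletes of unknown keys
        else:
            raise ValueError(f"unknown op: {op}")
    return table

snapshot = {1: {"name": "pin-a"}, 2: {"name": "pin-b"}}
changes = [
    ("upsert", 2, {"name": "pin-b-renamed"}),  # update an existing row
    ("upsert", 3, {"name": "pin-c"}),          # insert a new row
    ("delete", 1, None),                       # remove a row
]
snapshot = apply_cdc_batch(snapshot, changes)
```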

  4. Cloudflare has just launched the open beta of its Cloudflare Data Platform - a managed service for ingesting, storing & querying analytical data tables using open standards like Apache Iceberg.

    🔍 Dive into the key insights on #InfoQ: bit.ly/49y1tIa

    #CloudComputing #DataLake #DataAnalytics #ApacheIceberg #Cloudflare

  5. Watching the re-indexing of an archival catalog backup of AtoM, I realized:

    Indices populated with 18751 documents in 164.84 seconds.

    19k Objects?
    That's /nothing/ for a regular #bigDATA tech-tool. This is peanuts.

    400,000 Objects?
    Millions?! - According to documentation of #ApacheIceberg #ObjectStore #Redis #KeyDB, etc: **easy**

    #DLTP & #GLAM: Storing and using those "objects" in key/value annotated filesystems with bigDATA tools:

    **FUN!!**
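
Quick arithmetic on the numbers in the post: 18,751 documents in 164.84 seconds is roughly 114 documents per second, which would put 400,000 objects at about an hour even at this modest rate:

```python
# Sanity-check the throughput implied by the post's indexing log line.
docs, seconds = 18751, 164.84
rate = docs / seconds                      # ~113.8 docs/sec
print(f"{rate:.1f} docs/sec")

# At that rate, the hypothetical 400,000-object catalog takes about an hour.
estimate_seconds = 400_000 / rate
print(f"{estimate_seconds / 3600:.1f} hours for 400,000 objects")
```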

  6. 🎃 The October issue of #CheckpointChronicle is now out 🌟

    It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write-up of ngrok's data platform, nice detail of how SmartNews use #ApacheIceberg with Flink and #ApacheSpark, a good write-up from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!

    dcbl.link/cc-oct242

  7. 👩‍💻 Hands-On with Catalogs in Flink SQL

    🔧 In this second post in the series, @rmoff shows how to use Flink SQL with catalogs including #apacheHive, #JDBC, & #apacheIceberg. It also includes a closer look at the data structures within the Hive Metastore.

    dcbl.link/flink-catalogs---2

    #dataEngineering #streamProcessing #SQL #openSource

  8. #Netflix created a new solution for incremental processing in its data platform, reducing computing costs and execution time.

    Learn how Maestro #WorkflowEngine & #ApacheIceberg improve data freshness and accuracy: bit.ly/47G53vo

    #InfoQ #SoftwareArchitecture #Database #DataPipelines #AI #ML
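
Incremental processing of this kind is typically watermark-driven: each run picks up only the partitions that arrived since the previous run instead of recomputing the full table. A hedged sketch of the pattern, not Maestro's actual mechanism (the aggregation is a placeholder):

```python
# Conceptual sketch of watermark-based incremental processing (not Netflix's
# Maestro implementation): keep a watermark, process only newer partitions.
def run_incremental(partitions: dict, watermark: str) -> tuple:
    """Process partitions newer than `watermark`; return (results, new_watermark)."""
    new_keys = sorted(k for k in partitions if k > watermark)
    results = {k: sum(partitions[k]) for k in new_keys}  # placeholder aggregation
    new_watermark = new_keys[-1] if new_keys else watermark
    return results, new_watermark

partitions = {"2024-01-01": [1, 2], "2024-01-02": [3], "2024-01-03": [4, 5]}
# Only the two partitions after the watermark are touched on this run.
results, wm = run_incremental(partitions, watermark="2024-01-01")
```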

  9. So Tessellate inherits lots of support for various data formats from Cascading
    github.com/cwensel/cascading

    Even though Parquet dropped Cascading support, we were able to port it over.

    Now that Parquet is native to Cascading, it should be easier to add support.

    This would allow data to be converted into Iceberg continuously as it arrives, for use in Athena or other data front-ends.

    Anyone interested in a challenge?

  10. Dremio Cloud, the SQL-focused data analysis and BI platform, is now available free of charge as a fully managed service.
    Dremio makes its data lakehouse service free for everyone