home.social

#apacheiceberg — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #apacheiceberg, aggregated by home.social.

  1. DuckDB Labs released #DuckLake 1.0 - a data lake format that stores table metadata in a SQL database, rather than spreading it across object storage files.

    Key features:
    • catalog-stored small updates
    • improved sorting and partitioning
    • compatibility with Iceberg-style data features

    Learn more ⇨ bit.ly/48PsPIS

    #InfoQ #DuckDB #ApacheIceberg #AI #DataLake #DataStorage

  6. Lakehouse architectures allow multiple engines to run on shared data through open table formats like #ApacheIceberg.

    But #SQL identifier resolution and catalog naming rules differ across engines - creating hidden interoperability failures.

    In this #InfoQ article, Maninder Parmar explains why enforcing consistent naming conventions and cross-engine validation is critical.

    📰 Read now: bit.ly/4902zeH

    #RelationalDatabases #DataLake

  11. #Pinterest launched a next-gen CDC-based ingestion framework.

    Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
    • Latency cut from 24+ hours to 15 minutes
    • Processing of only changed records
    • Support for incremental updates & deletions
    • Petabyte-scale data across 1,000+ pipelines

    Win: optimized cost & efficiency!

    Read the architectural deep dive on InfoQ 👉 bit.ly/4rMJB2H

    #SoftwareArchitecture #ChangeDataCapture

  16. #AWS announced 2 new capabilities for #S3Tables!

    🔹 Intelligent-Tiering storage class that automatically optimizes costs based on access patterns
    🔹 Replication support that keeps Apache Iceberg table replicas consistent across AWS regions and accounts - no manual syncing required

    Find out more: bit.ly/4qgRn3Y

    #CloudComputing #S3 #ApacheIceberg #InfoQ

  21. #DuckDB now supports end-to-end interaction with Iceberg REST Catalogs directly in the browser - no infrastructure setup required.

    With DuckDB-Wasm, users can query, read, and write Iceberg tables seamlessly.

    Learn more: bit.ly/4qCTYoF

    #DataAnalytics #WebAssembly #ApacheIceberg #AI #InfoQ

  22. Cloudflare has just launched the open beta of its Cloudflare Data Platform - a managed service for ingesting, storing & querying analytical data tables using open standards like Apache Iceberg.

    🔍 Dive into the key insights on #InfoQ: bit.ly/49y1tIa

    #CloudComputing #DataLake #DataAnalytics #ApacheIceberg #Cloudflare

  25. #Netflix scaled Muse to handle trillion-row datasets!

    ➡️ Muse helps teams see which artwork & videos resonate with audiences.
    ➡️ To keep up with demand, Netflix redesigned the data layer, cutting query latencies by ~50% while keeping results accurate and responsive.

    🔗 Learn more: bit.ly/4gG3HGU

    #SoftwareArchitecture #DataBase #ApacheIceberg #InfoQ

  26. Watching the re-indexing of an archival catalog backup of AtoM, I realized:

    Indices populated with 18751 documents in 164.84 seconds.

    19k Objects?
    That's /nothing/ for a regular #bigDATA tech-tool. This is peanuts.

    400.000 Objects?
    Millions?! - According to documentation of #ApacheIceberg #ObjectStore #Redis #KeyDB, etc: **easy**

    #DLTP & #GLAM: Storing and using those "objects" in key/value annotated filesystems with bigDATA tools:

    **FUN!!**

  29. Amazon #S3 now supports sort and z-order compaction for #ApacheIceberg tables, promising reduced scan times & lower engine costs.

    Available for both S3 Tables and traditional S3 buckets via AWS Glue Data Catalog optimization.

    Dive into the details: bit.ly/3GyjxWQ

    #InfoQ #AWS #DataAnalytics

  30. 📢 Behold, the earth-shattering breakthrough of Nimtable: a web UI to *click* on Apache Iceberg tables! 🙄 Presumably because using command line tools is an insurmountable task for mere mortals. Or maybe it’s just a clever way to make clicking around a web interface the new rocket science. 🚀
    github.com/nimtable/nimtable #Nimtable #ApacheIceberg #WebUI #Innovation #TechNews #ClickAndGo #HackerNews #ngated

  31. "Centralize Your Data Lake: Apache Polaris Supports Apache Iceberg and Now Delta Lake"

    BTW 'Polaris' used to be the name of the UK nuclear deterrent pre-1996. 😬

    snowflake.com/en/engineering-b

    #ApacheIceberg #ApachePolaris #DataLake

  36. What happens when you marry database with ? You could query huge datasets fast and with 10x cheaper storage. Sounds promising, right?

    Join me tomorrow on the live stream to find out!

    May 20th, 11am PT / 20:00 CET:
    youtube.com/watch?v=VeyTL2JlWp0

  37. R2 Data Catalog: Managed Apache Iceberg tables with zero egress fees - Cloudflare

    The Iceberg wars are hotting up. AWS has some competition.

    blog.cloudflare.com/r2-data-ca

    #ApacheIceberg #DataAnalysis #Cloudflare