home.social

#apacheparquet — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #apacheparquet, aggregated by home.social.

  1. 1 Billion Row Challenge creator Gunnar Morling is back with #Hardwood - a zero-dependency, ultra-fast #Java parser for #ApacheParquet.

    • Uses page-level parallelization with Java Virtual Threads to maximize CPU core usage & scalable concurrency
    • Built AI-natively, where strong documentation accelerated development - while human oversight remained critical for code quality, API design & regression prevention

    🎧 Listen to the #InfoQ #podcast to learn more about Hardwood’s architecture, parallelization & performance optimization ⇨ bit.ly/4wP8jlR

    #SoftwareEngineering #AI

  6. Every #TimeSeriesDatabase is just a set of storage decisions:
    ➡️ Row layout
    ➡️ Compression timing
    ➡️ Partitioning strategy

    These choices often impact cost and query performance more than the database you pick.

    This #InfoQ article breaks down these fundamentals from first principles using #PostgreSQL & #ApacheParquet bit.ly/4fkDHlV

    #BigData #TimeSeriesData #Database
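
To make the first and third decisions in the list above concrete, here is a minimal Python sketch. It is not taken from the linked article: the sample readings and the helper names `to_columns` and `partition_by_day` are hypothetical. It only shows how row-oriented records can be pivoted into a columnar layout (the arrangement Parquet uses on disk) and grouped into daily partitions. Compression timing is not covered.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample data: one dict per reading (row-oriented layout).
readings = [
    {"ts": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), "sensor": "a", "value": 1.0},
    {"ts": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "sensor": "b", "value": 2.5},
    {"ts": datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc), "sensor": "a", "value": 1.2},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column (columnar layout)."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def partition_by_day(rows):
    """Group rows by the calendar day of their timestamp (time-based partitioning)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["ts"].date().isoformat()].append(row)
    return dict(partitions)
```

With the columnar form, a query that only needs `value` touches a single list, and with daily partitions a query for one day can skip every other group. Those two effects are the kind of cost and performance impact the post refers to.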

  11. Ah, yes, the riveting saga of cramming "user-defined indexes" into Apache Parquet files. 😴 Because who doesn’t love a story about exploiting footer metadata to do something nobody asked for? Next time, tell us how to alphabetize your sock drawer using ForestDB. 🧦📚
    datafusion.apache.org/blog/202 #userdefinedindexes #apacheparquet #footermetadata #techhumor #dataengineering #HackerNews #ngated

  15. 😱 Oh no, another *shocking* #security flaw! This time it's Apache Parquet, because who doesn't love a good #data #storage scare? 🙄 But wait, first you have to enable #JavaScript and #cookies to even read about it—because security is all about user inconvenience! 🍪🔐
    bleepingcomputer.com/news/secu #flaw #ApacheParquet #HackerNews #ngated

  16. Our latest post is out, check it out for the full details here 👉 opalsec.io/daily-news-update-s

    If you're short on time, here's a quick rundown of the key stories:

    🇦🇺 Australian Pension Funds Under Attack: A significant credential stuffing campaign hit multiple Aussie superannuation funds (Australian Super, REST, Hostplus, Insignia, ART) over the March 29-30 weekend. Attackers used stolen creds, likely targeting web portals and mobile apps, accessing accounts and unfortunately stealing funds in some cases (reports mention ~AU$500k from four Australian Super members alone). ASFA is coordinating the response. A stark reminder about password reuse and MFA effectiveness, especially during off-hours!

    🏛️ Shake-up at NSA/Cyber Command: Big news out of the US – Gen. Timothy Haugh has been fired from his dual-hat role leading the NSA and USCYBERCOM after just over a year. Deputy Director Wendy Noble is also reportedly out. Reasons are murky, but speculation points towards political motivations (linked to Laura Loomer's visit with President Trump). This raises questions about stability, the ongoing 'Cybercom 2.0' review, and the future of the dual-hat structure, especially with ongoing nation-state threats.

    ⏱️ Incident Response Speed vs. Backups: An interesting debate highlighted recently – while backups are vital for recovery, is rapid IR potentially even more critical? It’s a tough balancing act: contain fast (risking tipping off attackers/losing evidence) or investigate thoroughly while the breach continues? Emphasises the need for skilled responders and adequate tooling, not just relying on backups as a silver bullet.

    ⚠️ Critical RCE in Apache Parquet (CVE-2025-30065): Heads up, data folks! A CVSS 10.0 RCE vulnerability has been found in the Java library for the widely used Apache Parquet columnar storage format (versions up to 1.15.0). Given its use in Hadoop, AWS, Azure, GCP, and by major tech companies, the potential impact is huge. Patch to version 1.15.1 ASAP!

    📱 Pentagon Probes Defense Secretary's Signal Use: The DoD's Inspector General is investigating Defense Secretary Pete Hegseth's use of Signal for official business. This follows a report where a journalist was accidentally added to a Signal chat discussing sensitive airstrike details (targets, timing). Raises concerns about classified info on unclassified apps, need-to-know, and record-keeping compliance.

    The full blog post dives deeper into each of these stories and much more. Don't forget to sign up to our newsletter so you can get this daily wrap-up straight to your inbox!

    📨 opalsec.io/daily-news-update-s

    What are your biggest takeaways from this week's news? Let's discuss below!

    #CyberSecurity #InfoSec #ThreatIntel #DataBreach #CredentialStuffing #Ransomware #Phishing #Vulnerability #ApacheParquet #NSA #CyberCommand #IncidentResponse #CloudSecurity #NationalSecurity #Espionage #Privacy

  21. A critical flaw in Apache Parquet could let attackers run code remotely on your systems—rated a perfect 10.0 for severity. Is your big data framework safe? Read up on the fix and protect your data today.

    thedefendopsdiaries.com/addres

    #cve202530065
    #apacheparquet
    #rcevulnerability
    #bigdatasecurity
    #cybersecurity
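
As a rough aid for checking exposure, the sketch below compares a dependency version against the fixed release. The threshold comes from the rundown quoted earlier in this feed (patch to version 1.15.1); the function names are hypothetical, and the parser assumes plain dotted numeric versions such as `1.15.0`, so suffixes like `-SNAPSHOT` are not handled.

```python
# Fixed release named in the feed's summary of CVE-2025-30065.
FIXED_IN = (1, 15, 1)


def parse_version(text):
    """Turn a dotted version string such as '1.15.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """Return True when the version sorts before the fixed release."""
    return parse_version(version) < FIXED_IN
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.9.0"` would sort after `"1.15.1"` and be wrongly reported as patched.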

  24. For anyone in #datascience working with tabular data, and especially with #apacheparquet files (which I also recommend as a very fast way to store tabular data), the little application called "Tad" is a great way to view and scroll through your data between processing steps. tadviewer.com

    Open-source, plain, simple, and fast. It does its job very well for a quick scroll-through inspection (and for checking that your data was stored properly).

  25. So Tessellate inherits lots of support for various data formats from Cascading
    github.com/cwensel/cascading

    Even though #ApacheParquet dropped Cascading support, we were able to port it over.

    Now that Parquet is native to Cascading, it should be easier to add Iceberg support.

    This would allow data to be converted into Iceberg continuously as it arrives, for use in Athena or other data front-ends.

    Anyone interested in a challenge?

  26. A little more color on this announcement..
    fosstodon.org/@cwensel/1105490

    First, #ApacheParquet removed Cascading support, so I had to splice the original source into Cascading. But the ParquetScheme didn't fully honor type information, so there is a new TypedParquetScheme that has native support for JSON and Timestamps.

    Second, Parquet requires the Hadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331MB jar dependency with the AWS bundle.

  27. Been testing visidata (visidata.org/) for some time already, using it for quick CSV/TSV, #ApacheParquet, #sqlite data mangling/inspection.

    A few weeks ago, I learned about vdsql (github.com/visidata/vdsql), which extends the tool to include support for database backends, and it is just a breeze to use with, for example, #duckdb:

    $ vdsql really-big.duckdb

    I see some minor issues with 'duckdb-engine' and calculated columns (used by vdsql/ibis), but overall it works reasonably well.