home.social

#parquet — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #parquet, aggregated by home.social.

  1. When does #Iceberg beat #Parquet+projection on #AWSGlue, and when doesn't it?

    An end-to-end #ETL PoC on #AWS to find out: producer, #Kinesis, two #Firehose paths, two #Glue jobs, #Athena.

    🔮 Spoiler: how the data is read is the key to the choice.

    In the article: every choice with its why, plus a few gems from some Glue experience 😄

    alessandra.bilardi.net/diary/a

    #DiaryOfALazyDeveloper

  3. Saint-Étienne: a man dies after being beaten up in the middle of the street

    A 38-year-old man died during the night of Wednesday 22 to Thursday 23 April after…
    #SaintEtienne #FR #France #Actu #News #Europe #EU #Saint-Étienne #actu #Actualités #Auvergne-Rhône-Alpes #BAC #beaubrun #décès #europe #Loire #lynchage #parquet #Police #Pompiers #Républiquefrançaise #Rixe #saint-étienne
    europesays.com/fr/888891/

  4. Guide: How to work with the PARQUET format

    Last year we began publishing data in the «Если быть точным» catalog in Parquet format. It was created by engineers at Twitter and Cloudera in 2013, and today it has become the standard for storing analytical data: it is used by Google, Amazon, Netflix and most modern data platforms. In this guide we explain how to work efficiently with Parquet data using Python.

    habr.com/ru/articles/1013604/

    #parquet #python #анализ_данных

  5. 🐒 Ah, yes, the holy grail of nerd bragging rights: a 47M+ item #archive of Hacker News, now in the culinary delight format of #Parquet for all your "data chef" needs. 🍽️ Updated every 5 minutes, because clearly, what's more riveting than a play-by-play of techies' daily musings? Oh wait, I forgot: 🥱 anything else.
    huggingface.co/datasets/open-i #HackerNews #DataChef #TechieBraggingRights #DailyUpdates #HackerNews #ngated

  6. They ran a drug trafficking operation from prison: a network dismantled between the Loire and Puy-de-Dôme

    Fourteen suspected traffickers were taken into custody after their arrest between this Tuesday 3 …
    #SaintEtienne #FR #France #Actu #News #Europe #EU #Saint-Étienne #actu #Actualités #Auvergne-Rhône-Alpes #détention #Drogue #europe #LaTalaudière #maisond'arrêt #parquet #Prison #puy-de-dôme #Républiquefrançaise #Roanne #stupéfiant #trafic
    europesays.com/fr/780817/

  7. They ran a drug trafficking operation from prison: a network dismantled between the Loire and Puy-de-Dôme

    Fourteen suspected traffickers were taken into custody after their arrest between this Tuesday 3 …
    #SaintEtienne #FR #France #Actu #News #Europe #EU #Saint-Étienne #actu #Actualités #Auvergne-Rhône-Alpes #détention #Drogue #europe #LaTalaudière #maisond'arrêt #parquet #Prison #puy-de-dôme #Républiquefrançaise #Roanne #stupéfiant #trafic
    europesays.com/fr/777442/

  8. I tried loading a #Wikidata dump into #Parquet and querying it with #DuckDB: it takes less than an hour to extract all 19,939,182 entities that represent people, including the subclasses of wdt:Q5.
    Definitely better than my deserializer implemented in Go, which takes almost 8 hours to do the same thing.

  9. Munquet 0.2.1 just landed on Flathub 🚀

    Fixed a small race condition when canceling a conversion — turns out the process could finish right before you clicked “Yes” 😅

    Two lines later… all good.

    flathub.org/en/apps/io.gitlab.

    #Flatpak #GTK4 #OpenSource #Parquet #DataScience #Linux #Python #PyArrow
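
A race like the one described in this post (the worker finishing just before the user confirms cancellation) can usually be handled by re-checking the process state after the dialog returns. The following is a minimal sketch using Python's standard `subprocess` module. The helper name `cancel_if_running` and the two child commands are hypothetical and are not taken from Munquet's source.

```python
import subprocess
import sys


def cancel_if_running(proc: subprocess.Popen) -> str:
    """Cancel a worker process, tolerating the case where it already exited.

    Hypothetical helper: meant to be called after the user clicked "Yes".
    """
    # poll() returns None while the process is still running.
    if proc.poll() is not None:
        return "finished"  # Nothing left to cancel; treat it as a normal completion.
    proc.terminate()
    proc.wait()  # Reap the child so it does not linger as a zombie.
    return "cancelled"


# A child that exits right away: it is already done when "Yes" is clicked.
quick = subprocess.Popen([sys.executable, "-c", "pass"])
quick.wait()
quick_result = cancel_if_running(quick)

# A child that is still busy when the user confirms the cancellation.
slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
slow_result = cancel_if_running(slow)
```

Returning a status instead of raising lets the calling UI show "conversion completed" rather than an error when the cancel request arrives too late.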

  10. Munquet 0.2.0 is now available on Flathub 🎉

    ✨ Display real host paths via XDG Portal
    🛠 Introduced a .Devel Flatpak manifest for development builds

    Continuing to improve the Linux desktop data workflow 🚀

    flathub.org/en/apps/io.gitlab.

    #Flathub #Flatpak #XDGPortal #GTK4 #OpenSource #Parquet #Python #DataScience

  11. Munquet is now officially on Flathub 🎉

    A native Linux app to convert datasets into Apache Parquet using a PyArrow backend. Perfect for data science workflows, analytics, and anyone needing fast local conversions.

    Get it here: flathub.org/en/apps/io.gitlab.

    @gnome @xfce @kde @GTK @linux @flathub

    #apache #pyarrow #datascience #parquet #csv #OpenSource #Python #GNOME #GTK4 #Adwaita

  12. New entries in Awesome #Parquet

    - Munquet: A desktop tool to convert CSV files to Parquet

    - nail: A CLI tool for analyzing, transforming, and exploring data files

    - odbc2parquet: query an ODBC data source and write the result to parquet.

    - DataStudio (screenshot): a webapp to explore and visualize data, entirely in the browser.

    - a new "Parquet engineering" section that groups best practices for writing Parquet files

    github.com/severo/awesome-parq

  13. 🚀 Munquet — Convert, merge, rename & validate tabular data into Parquet, fully offline & batch-ready.

    GitLab: gitlab.com/zulfian1732/munquet

    Featured in: @severo 's Awesome Parquet: github.com/severo/awesome-parq 🙏

    #Parquet #OpenSource #Python #GNOME #GTK4 #Adwaita #PyArrow

  14. 🚀 Sneak peek Munquet!
    Convert, merge, rename, and validate tabular data safely into Parquet. Works offline, with batch processing and progress feedback.

    GitLab repo:

    gitlab.com/zulfian1732/munquet

    Flathub release coming soon!

    #Python #GTK4 #GNOME #PyArrow #Parquet #DataScience #Libadwaita

  15. "A criminal investigation takes time." The Lyon public prosecutor, clearly under pressure, remains extremely cautious about the circumstances of the death of Quentin D., in contrast to the thunderous statements made all the way up to the top of the State, notably by the Ministers of Justice and of the Interior.

    #Politique #Justice #Proces #Parquet #Lyon #Quentin #Macron #LFI

  16. splitting a big parquet file into chunks

    read (in parallel) every chunk and write again into two different output files

    #nextflow #parquet

  17. New in Awesome Parquet: the best practices for writing Parquet files (Parquet engineering 🪛 ).

    github.com/severo/awesome-parq

    It might become the most useful section.

    It's often hard to choose the best parameters: row group size, compression algorithm, whether to include statistics, whether to include indexes, whether to include bloom filters...

    Please send me other references (or open a PR), I'm eager to read more about optimizing Parquet files for specific (or general) use.

    #parquet
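
To make the row group size and statistics trade-off mentioned in this post concrete, here is a toy sketch in plain Python. It does not use a Parquet library: `RowGroup`, `write_row_groups` and `scan_equal` are hypothetical names, and the model only assumes that each row group stores min/max statistics that a reader can use to skip it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a row group of one integer column plus its statistics."""
    values: List[int]
    min_value: int
    max_value: int


def write_row_groups(values: List[int], row_group_size: int) -> List[RowGroup]:
    """Split a column into row groups and record min/max for each one."""
    groups = []
    for start in range(0, len(values), row_group_size):
        chunk = values[start:start + row_group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[int, int]:
    """Return (matching rows, rows actually scanned) for `value == target`."""
    matches = 0
    rows_scanned = 0
    for group in groups:
        # Statistics let the reader skip groups that cannot contain the target.
        if target < group.min_value or target > group.max_value:
            continue
        rows_scanned += len(group.values)
        matches += sum(1 for v in group.values if v == target)
    return matches, rows_scanned


data = list(range(1000))  # a sorted column, the best case for min/max pruning
small_result = scan_equal(write_row_groups(data, 100), 250)
large_result = scan_equal(write_row_groups(data, 1000), 250)
```

In this toy model the smaller row groups scan 100 rows instead of 1000 for the same lookup. Real files pay for that with more metadata and less effective compression per group, which is part of why the choice is hard.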

  18. So, we had to find a novel approach. It's the story I'm telling in this blog post. I hope you'll enjoy it.

    rednegra.net/blog/20260212-vir

    #parquet #ux #html #webdev #scroll #react

  19. #nextflow #parquet plugin version v0.2.2 is out!

    This update introduces powerful new splitting capabilities including by, file, and count options, bringing it in line with standard Nextflow splitters such as splitFasta.

    Specifically, the file option allows you to partition large datasets into smaller chunks, enabling seamless parallel processing

    Additionally, this version includes an experimental feature for reading files directly from S3

    Read more at nextflow-io.github.io/nf-parqu
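
As a generic illustration of count-based splitting, the sketch below chunks a stream of records in plain Python. It is not the plugin's implementation or API: `split_by_count` is a hypothetical function, and the `by` parameter is assumed to mean "records per chunk", as it does in Nextflow's standard splitters.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_by_count(records: Iterable[T], by: int) -> Iterator[List[T]]:
    """Yield consecutive chunks holding at most `by` records each."""
    if by <= 0:
        raise ValueError("by must be a positive integer")
    iterator = iter(records)
    while True:
        # islice consumes up to `by` records without loading the whole input.
        chunk = list(islice(iterator, by))
        if not chunk:
            return
        yield chunk


rows = [{"id": i} for i in range(7)]
chunks = list(split_by_count(rows, by=3))
chunk_sizes = [len(chunk) for chunk in chunks]
```

Each chunk can then be handed to a separate task, which is what makes the parallel processing mentioned in the post possible.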

  20. yesterday I published #nextflow #parquet plugin version 0.2.2-edge2

    a big refactor of the plugin to align it with other splitters

    now you can chunk a parquet file into smaller files using the `file` option, specify a batch size using `by` option, and so on

    happy to see how this plugin is gaining popularity

    nextflow-io.github.io/nf-parqu

  21. Wrote an article
    List of types available in Parquet (PhysicalType, LogicalType, ConvertedType) #Parquet - Qiita qiita.com/kotet/items/ef0faaf8

  22. 📣 R Consortium webinar: Scaling up data analysis in R with Arrow

    If “scaling R” has meant databases/clusters or rewriting everything, this session is for you. Dr Nic Crane (Arrow R maintainer; Apache Arrow PMC) will walk through practical, memory-efficient ways to work with larger datasets in R—plus why Parquet is a workflow upgrade and where DuckDB fits.

    Register:
    r-consortium.org/webinars/scal

  23. Using Apache Parquet? Found a TUI for you 👀📦

    🔍 **parqeye** — A TUI for inspecting Parquet data, schemas and metadata.

    💯 Browse tables, explore schemas, inspect row groups & view file stats.

    🦀 Written in Rust & built with @ratatui_rs

    ⭐ GitHub: github.com/kaushiksrini/parqeye

  24. Hello!

    Thinking about a better mzML to store proteomics data, but not convinced by the approach, I've converted it into:
    * Smaller data files (only 66% of the mzML original file) for the exact same data
    * Faster to read (25s for a big mzML vs 18s in mzcbor on the same computer)
    * Very quick random access to spectra (24.6577 ms for mzML vs 786.731 μs for mzcbor for the same operation using index)

    I'd like to share it if you are interested at

  25. The reason I made a sample dataset was that I thought querying the GeoPackage file from DuckDB was a bit sluggish. The query in the image took 2.56 s on the GeoPackage file. I then saved the entire dataset to a Parquet file (sorted by county and municipality) compressed with ZSTD. The same query takes 0.0140 s.

    Also the Parquet file is 141 MiB compared to 1.18 GiB for the GeoPackage file. The Parquet file is smaller than the original zip file with the GeoPackage file.

    #DuckDB #GeoPackage #Skogsstyrelsen #Parquet
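
One plausible contributor to the smaller file is that sorting places equal values next to each other, which compressors exploit. The sketch below only illustrates the direction of that effect. It uses `zlib` from the Python standard library as a stand-in for ZSTD, and the synthetic county names and `compressed_size` helper are invented for the example rather than taken from the Skogsstyrelsen dataset.

```python
import random
import zlib

random.seed(42)  # fixed seed so the synthetic column is reproducible

# Synthetic low-cardinality column: 50,000 rows drawn from 20 county names.
counties = [f"county-{i:02d}" for i in range(20)]
column = [random.choice(counties) for _ in range(50_000)]


def compressed_size(values):
    """Size in bytes of the newline-joined column after zlib compression."""
    payload = "\n".join(values).encode("utf-8")
    return len(zlib.compress(payload, level=6))


unsorted_size = compressed_size(column)
sorted_size = compressed_size(sorted(column))
```

Parquet also applies dictionary and run-length encoding per column before compression, so real ratios will differ from this toy measurement.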

  26. House with land - Guaira Paraguay 💓

    #Itati is a beautiful spot with a bathing #lake and natural swimming #pool. The area around the #Piramides 🔺 Naturales is considered one of the most #beautiful areas of Paraguay.

    open #living/dining area
    #guest toilet/shower
    #HWR
    #parquet flooring
    #Melgarejo 20 min
    #Planta Urbana 30 min

    199.000 €
    Rooms: 5
    Living space: 155m²
    Plot: 2.000m²
    #Guaira #Paraguay 🇵🇾

    bluehomes.com/PPY0057/en/House

    #homeforsale #realestate #auswandern

  28. We've created a way to display interactive maps in the browser, completely client-side!

    Drop your data in as or file, and your vector shapefile as a , and your map is ready to go!

    It's hosted on pages (so it's free!) but can be embedded anywhere

    Tutorial:
    odissei-soda.nl/tutorials/map-

    Example:
    sodascience.github.io/map-expl

    (we tried out @penpot in the design process!)

  29. How we built a 70 PB data store and don't plan to stop

    Hi, today I'll talk about how our team built a data processing and storage platform for training GenAI models at Sber, and how we grew to 70 PB of raw data. My name is Alexander, I work at Sber and spent two years developing this platform.

    habr.com/ru/companies/sberbank

    #Apache_Spark #apache_iceberg #parquet #s3 #big_data

  30. Your parquet floor is laid, but THE finishing touch is missing? SKIRTING BOARDS! 😱 Does cutting the corners terrify you?

    Don't panic! We've created THE beginner's guide to pro-quality finishes.

    #poserdesplinthes #bricolage #diy #renovation #travauxmaison #finition #parquet #astucebricolage #decorationinterieur #outillage #menuiserie

    lemagdesastuces.fr/comment-pos

  31. Laying the parquet was going fine... until you reached the first wall! 😱 Is cutting your nightmare?

    We've created THE guide to turn this stressful step into a game of precision.

    #parquet #bricolage #diy #renovation #travauxmaison #parquetflottant #sciesauteuse #astucebricolage #outillage #decorationinterieur #tutoriel

    lemagdesastuces.fr/comment-dec

  32. Online GeoParquet Visualizer: For day 7 of the #30DayMapChallenge on the topic of #accessibility, @DomeGIS released the #GeoParquet Visualizer. The GeoParquet Visualizer is a free and open-source web tool built with #MapLibre and #parquet-#wasm that lets users view, style, and share GeoParquet and Parquet datasets directly in the browser. spatialists.ch/posts/2025/11/1 #GIS #GISchat #geospatial #SwissGIS

  33. Query performance optimization: the powerful tandem of StarRocks and Apache Iceberg

    Apache Iceberg is a table format for data lakes with support for ACID, Schema Evolution, Hidden Partition and versioning, but with large metadata and when working over S3, query planning and latency suffer. Paired with StarRocks, we show how a distributed Job Plan, Manifest Cache, a CBO with histograms, Data Cache and materialized views bring lakehouse analytics up to DWH level: they reduce metadata overhead and speed up planning and execution, while writing back to Iceberg preserves a single source of truth. We cover Iceberg's architecture, typical bottlenecks and optimization practices on StarRocks 3.2–3.3, including the WeChat/Tencent case.

    habr.com/ru/articles/963410/

    #apache_iceberg #starrocks #lakehouse #data_analysis #data_lake #parquet #manifest #materialized_views

  34. Released scrapy-contrib-bigexporter 1.0.0 (codeberg.org/ZuInnoTe/scrapy-c) - additional export formats for the webscraping framework Scrapy.

    Migrated parquet export from fastparquet to pyarrow as fastparquet is deprecated (docs.dask.org/en/stable/change)

    Migrated orc export from pyorc to pyarrow to reduce the number of dependencies

    #scrapy #crawling #python #parquet #orc #pyarrow #webcrawling #scraping

  35. GEOMETRY has been a #Parquet logical type for 6 months now. The data is encoded as WKB.

    Hyparquet, a pure JavaScript Parquet library, now supports it as of version 1.19.0 by decoding geometry columns to GeoJSON geometries.

    You can try the hyparquet demo:

    hyparam.github.io/demos/hyparq

    or use the hyperparam CLI tool:

    ```
    hyp raw.githubusercontent.com/apac
    ```

    (install locally with `npm i -g hyperparam`).

    #geospatial #data #maps #geojson
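
To show what decoding WKB into a GeoJSON geometry involves, here is a minimal Python sketch for the simplest case, a 2D Point. It is not hyparquet's code (hyparquet is a JavaScript library). The function name `wkb_point_to_geojson` is hypothetical, and every other geometry type is deliberately rejected.

```python
import struct
from typing import Any, Dict


def wkb_point_to_geojson(wkb: bytes) -> Dict[str, Any]:
    """Decode a 2D WKB Point into a GeoJSON geometry dict."""
    # Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    if wkb[0] == 0:
        endian = ">"
    elif wkb[0] == 1:
        endian = "<"
    else:
        raise ValueError("invalid WKB byte-order flag")
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer.
    (geometry_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geometry_type != 1:
        raise ValueError("only 2D Point (type 1) is handled in this sketch")
    # Bytes 5-20 hold the x and y coordinates as two 64-bit floats.
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return {"type": "Point", "coordinates": [x, y]}


# Build a little-endian WKB Point by hand and decode it again.
sample = struct.pack("<BIdd", 1, 1, 2.5, 48.25)
geometry = wkb_point_to_geojson(sample)
```

A full decoder would also handle LineString, Polygon, the Multi* types and geometry collections, which follow the same header layout with nested counts.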

  36. 🗺️ Parquet with GEOMETRY type is not GeoParquet.

    In a new blog post, structured as an FAQ, I detail the differences between GeoParquet and the latest version of Parquet, which supports geospatial data via the GEOMETRY logical type.

    TL;DR: the two standards are orthogonal, compatible, and can be combined, with the only caveat that the columns must be encoded as WKB.

    👀 Ready for a deep dive?

    ➡️ ➡️ ➡️ rednegra.net/blog/20250925-par

    #parquet #geoparquet

  37. New nf-parquet version 0.2.1 deployed using new plugin repository

    The new way to publish plugins is interesting; once I've used it a little more, I'll write a post about it

    #Nextflow #parquet #apache_parquet

    registry.nextflow.io/plugins/n
