#dataframe — Public Fediverse posts on home.social

N-gated Hacker News @[email protected] · 2026-04-30 · 20:34 UTC

Oh look, another "groundbreaking" #blog post about #DuckDB from a self-proclaimed data wizard. 🙄 Apparently, the limitations of basic text queries are just too much for our hero, who bravely delves into the wild world of Full-Text Search. 🌟 Spoiler alert: it's as thrilling as watching paint dry on a data frame. 🥱
https://peterdohertys.website/blog-posts/full-text-search-w-duckdb.html #DataWizard #FullTextSearch #DataFrame #HackerNews #ngated

Habr @[email protected] · 2025-09-14 · 12:22 UTC

Polars: the "Pandas killer" cranked to the max

Hi everyone! My name is Alexander Andreev, and I'm a data engineer. Today I want to tell you about Polars, a potential replacement for Pandas, the data library that most data engineers and data scientists love. In this article I walk step by step from the history of Polars through code examples and the technical reasons for its performance, and at the end I link to all the benchmarks, tutorials, and further articles used in writing this overview-tutorial of this excellent library.

https://habr.com/ru/articles/946788/

#polars #pandas #data_engineering #data_science #data_analysis #dataframe #library #python #rust #dataset

r⁵py @[email protected] · 2025-04-29 · 10:00 UTC

Computing travel time matrices in r⁵py from @geopandas #DataFrame is two lines of code:

(1) create an r5py.TransportNetwork from @openstreetmap and #GTFS data

(2) turn it into an r5py.TravelTimeMatrix()

Try it out in #binder: https://r5py.readthedocs.io/stable/user-guide/user-manual/quickstart.html

gdmcbain @gdmcbain · 2025-04-01 · 05:41 UTC

Parsing CSV with units in the header · Issue #166 · hgrecco/pint-pandas

https://github.com/hgrecco/pint-pandas/issues/166

Now we can read a #csv file with a header like `time / s,mass / g` into #pandas and call `.pint.quantify()` to get a #dataframe in which the columns have #units as in #Pints !

Handy for CSV restricted to single-row headers, as in Confluence Databases and Microsoft Lists.
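
A rough sketch of the header-splitting idea, using only the standard library; not necessarily how the linked issue handles it. The helper name `split_unit_header` and the sample data are made up for illustration, and the ` / ` separator is assumed from the header shown above.

```python
import csv
import io


def split_unit_header(header, sep=" / "):
    """Split 'time / s' into ('time', 's'); a header without a unit gets ''."""
    name, found, unit = header.partition(sep)
    return (name.strip(), unit.strip() if found else "")


# Hypothetical sample data with a single-row "name / unit" header.
sample = "time / s,mass / g\n0.0,1.5\n1.0,2.5\n"

reader = csv.reader(io.StringIO(sample))
columns = [split_unit_header(h) for h in next(reader)]  # (name, unit) pairs
rows = [[float(v) for v in row] for row in reader]      # numeric body

print(columns)
print(rows)
```

In pandas, the `(name, unit)` pairs could become a two-level column index via `pandas.MultiIndex.from_tuples`, which, as far as I understand the pint-pandas docs, is the layout `.pint.quantify(level=-1)` reads units from.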

Habr @[email protected] · 2024-04-15 · 06:32 UTC

Spark. Query plans by example

Hi everyone! In this article we take a couple of tables as a basis and work through query plans in increasing complexity: from a plain SELECT to joins, window functions, and repartitioning. We look at how the types of plans differ from one another, what changes in them from query to query, and go through each line using a partitioned and a non-partitioned table as examples.

https://habr.com/ru/articles/807421/

#apache_spark #pyspark #sql #python #bigdata #data_engineering #explain #execution_plan #план_запроса #dataframe
