#dask — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #dask, aggregated by home.social.
-
Building an ML platform on Kubernetes: Yandex Cloud, JupyterHub, Dask and S3
Hi! I'm Alisa, a DevOps engineer at KTS. In this article I'll talk about one of our recent projects, in which we built infrastructure for a team of data engineers and analysts. I should say up front that this was not a platform for production model inference but a sandbox for research. In short, I'm sharing practical experience of building scalable infrastructure with autoscaling. If the topic is relevant to you, read on.
https://habr.com/ru/companies/kts/articles/1021976/
#yandex_cloud #jupyterhub #dask #s3 #kubernetes #ML #MLOps #devops
-
Eight high-performance Python libraries for the developer's toolbox
When Guido van Rossum introduced Python to the world in 1991, no one could have predicted the place the language would hold several decades later in web development, Data Science and Machine Learning. Python keeps evolving today: with a new generation of tools, the traditional limitations (performance, the GIL and the difficulty of parallel computing) are becoming a thing of the past. Hi, Habr! This is Lesha Zhiryakov. I lead backend development for the KION storefront, head the Python guild, and write for the MWS blog on Habr. I deal with the challenges of high-load systems every day, and over time I have built up a pool of tools that help solve critical problems of modern development, from data processing with Polars to dependency management with UV. In this piece I'll review Python libraries that let you build systems comparable in performance to Go and Rust.
https://habr.com/ru/companies/ru_mts/articles/968776/
#библиотеки #python #fastapi #litestar #polars #httpx #dask #Pydantic_V2 #ruff #Pithon_UV
-
This week I did the two-days-in-Bologna 🚀🚀
Packed with inspiring #talk sessions, gorgeous #gadget swag … and the people made it all truly unforgettable. ✨
I know what went on behind the scenes of the organization, and the folks at @grusp made everything perfect, light and carefree, even though it was far from easy 💪
It's always a pleasure to be accepted as a #speaker at their events ❤️
Until next time!
#DataAnalysis #Dask #Kubernetes #ParallelComputing #Scalability #AWS #DevSecOpsDay #ContainerDay
-
Are we ready for the @grusp events?
I'll be speaking at #ContainerDay25 among #SoftwareArchitect, #CloudEngineer and #PlatformSpecialist folks... a bit of an #outsider?
Far from it: today data travels in production, inside orchestrated clusters, and needs to be scaled, managed and monitored just like the rest of the software!
📍 See you in #Bologna!
-
Ray and Dask are Python libraries that help data scientists work faster with parallel processing. Dask excels at scalable data analysis with familiar pandas-like syntax, perfect for large datasets and ETL tasks. Ray shines in distributed ML training, hyperparameter tuning and model serving with built-in libraries like Ray Tune and Ray Serve. Choose Dask for data processing; Ray for ML pipelines. #DataScience #Python #MachineLearning #BigData #Ray #Dask #DataProcessing #ML https://www.kdnuggets.com/ray-or-dask-a-practical-guide-for-data-scientists
-
Thank you to #Kone & Mai and Tor Nessling Foundations for supporting this work. A quantitative work like this would not be possible without a robust suite of FOSS tools. My thanks to the maintainers of #QGIS, #pandas, #geopandas, #duckdb, #dask, #statsmodels, #jupyter and many more!
-
𝗚𝗲𝗼𝘀𝗽𝗮𝘁𝗶𝗮𝗹 𝗣𝘆𝘁𝗵𝗼𝗻 𝗧𝘂𝘁𝗼𝗿𝗶𝗮𝗹𝘀
SpatialThoughts provides tutorials which cover a broad range of geospatial topics and technologies, e.g., #GeoPandas, #XArray, #dask, and more. Each technology is described in a notebook with step-by-step explanation. Check it out.
https://www.geopythontutorials.com
-
As I said, there will be no meta-calendar this year, so you can enjoy this honey cake recipe without a thought for the season.
-
State of my toolchain 2024
An overview of the software, hardware and other personal productivity toolchains I’m using as of December 2024.
https://blog.kathyreid.id.au/2024/12/08/state-of-my-toolchain-2024/
-
Dask for time series analysis
Hi, Habr! Today we'll show how to analyze time series with Dask. Time series always come with plenty of hassle: big data, complex calculations. But Dask handles this very well.
-
Dear #gis users and #gischat . I just wrote my first (real) post on my blog. I tried to learn some modern frameworks, so I compared the execution speed of intersecting the buildings of a whole German state with their parcels and land use. I compared #Geopandas #duckdb #apachesedona and #dask GeoPandas.
Sedona and Dask-GeoPandas were the fastest. DuckDB had some problems. Btw: DuckDB did have the smallest memory footprint.
Here is the entry: https://sehheiden.github.io/posts/speed_comparision_gis_intersection/
-
I am moving all my computing libraries to #xarray, no regrets. It is a natural way to manipulate datasets of rectangular arrays, with named coordinates and dimensions: https://xarray.dev/
There are several possible backends, including #dask which allows lazy data loading.
I had the pleasure of meeting some of the devs last week, who showed me a preview of the upcoming `DataTree` structure which is going to make this library even more versatile!
-
[Translation] Lessons learned from scaling to multi-terabyte datasets
In this article I'll share the lessons I learned while working with multi-terabyte datasets. I'll explain what difficulties I ran into as the dataset grew and how I solved them. I've split the article into two parts: the first covers scaling on a single machine, the second covers scaling across many machines. Our goal is to make the most of the available resources and finish the tasks at hand as quickly as possible.
https://habr.com/ru/companies/magnus-tech/articles/834506/
#датасеты #big_data #joblib #машинное+обучение #параллелизация #spark #dask #виртуализация #инстансы #виртуальная_машина
-
Spent the morning playing with pystac-client and Dask. It's interesting for small areas but I still need to figure out how to scale it when working with huge extents.
#python #stac #remotesensing #dask #jalisco #mexico #laprimavera #incendios #wildfires
-
If, like me, you are confused about all the terminology around geospatial cloud computing and how it all fits together, I recommend this video. Great explanation! #STAC, #COG, #Zarr, #Dask, #AWS, #EarthEngine
https://www.youtube.com/watch?v=YPno-89l54Q
-
As part of my #PhD work, I recently had to perform computation on two very large files using @pandas_dev and I turned to #dask - a set of libraries on top of #pandas, aimed at scaling #python workloads from the laptop to the cluster.
Here's what I learned!
https://blog.kathyreid.id.au/2024/01/27/scaling-python-dask/
-
Good morning folks! It's been a while since I did one of my #TwitterMigration #Introduction #ConnectionList posts where I curate interesting people for you to follow on the #Fediverse :fediverse:
Today, I'd like you to meet:
@LMonteroSantos Lola is a #PhD #researcher at #EUI interested in #data #regulation, digital #economy and #AntiTrust, passionate about #DataScience and #programming. New to Mastodon, please make welcome 👋 🇪🇺
@danlockton is a #Professor at @TUEindhoven where he works in #design, #imagination and #climate #futures. He often posts interesting things around co-design and #collaboration 🇳🇱
@1sabelR is a #researcher @ANUResearch where she is into #SolarPunk and @scicomm 🇦🇺 She co-hosts the #SciBurst #podcast - worth a listen!
@timrichards is a #travel #writer based in #Naarm / #Melbourne in Australia, specialising in #rail 🇦🇺
@microstevens is a #DataScience facilitator at #UWMadison and she works in #OpenScience and #genomics 💻 🧬
@mrocklin does amazing things with #dask in #python, and I am very grateful in recent weeks for his posts and #StackOverflow responses. Thank you 🙏 🐍
@everythingopen is Australia's premier open #technology conference, covering #linux, #OpenSource, #OpenData, #OpenGov, #OpenGLAM, #OpenScience and everything else open. You should check it out! 🐧 🇦🇺
That's all for today - don't forget to share your own lists so we can more richly connect the :fediverse: and curate the conversations we want to have ❤️
-
So im almost finished with my first independent implementation of a standard and I want to write up the process bc it was surprisingly challenging and I learned a lot about how to write them.
I was purposefully experimenting with different methods of translation (eg. Adapter classes vs. pure functions in a build pipeline, recursive functions vs. flattening everything) so the code isnt as sleek as it could be. I had planned on this beforehand, but two major things I learned were a) not just isolating special cases, but making specific means to organize them and make them visible, and b) isolating different layers of the standard (eg. schema language is separate from models is separate from I/O) and not backpropagating special cases between layers.
This is also my first project thats fully in the "new style" of python thats basically a typed language with validating classes, and it makes you write differently but uniformly for the better - it's almost self-testing bc if all the classes validate in an end-to-end test then you know that shit is working as intended. Forcing yourself to deal with errors immediately is the way.
Lots more 2 say but anyway we're like 2 days of work away from a fully independent translation of #NWB to #LinkML that uses @pydantic models + #Dask for arrays. Schema extensions are now no-code: just write the schema (in nwb schema lang or linkml) and poof you can use it. Hoping this makes it way easier for tools to integrate with NWB, and my next step will be to put them in a SQL database and triple store so we can yno more easily share and grab smaller pieces of them and index across lots of datasets.
Then, uh, we'll bridge our data archives + notebooks with the fedi for a new kind of scholarly communication....
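For readers unfamiliar with the "typed language with validating classes" style mentioned above, here is a rough sketch of the idea. The post uses pydantic models; this sketch swaps in standard-library dataclasses so it runs without dependencies, and the `Recording` class and its fields are hypothetical, not taken from the NWB or LinkML schemas.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Recording:
    """Hypothetical model; the fields are invented for illustration."""
    name: str
    rate_hz: float
    samples: List[float]

    def __post_init__(self) -> None:
        # Validate at construction time so bad data fails immediately,
        # not somewhere downstream in the pipeline.
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.rate_hz <= 0:
            raise ValueError("rate_hz must be positive")
        if not all(isinstance(s, (int, float)) for s in self.samples):
            raise TypeError("samples must be numeric")


def duration_seconds(rec: Recording) -> float:
    # Safe to divide: a Recording cannot exist with rate_hz <= 0.
    return len(rec.samples) / rec.rate_hz


ok = Recording(name="probe0", rate_hz=2.0, samples=[0.1, 0.2, 0.3, 0.4])
print(duration_seconds(ok))

# An invalid instance is rejected before any other code can see it.
try:
    Recording(name="probe1", rate_hz=0.0, samples=[])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

This is the sense in which such code is "almost self-testing": if every object in an end-to-end run constructs successfully, each one has already passed its own checks.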