home.social

#mapreduce — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #mapreduce, aggregated by home.social.

  1. 🚀✨ Behold, the thrilling tale of querying 3 billion vectors—a journey where Vicki Boykis heroically attempts to decode Jeff Dean's cryptic wisdom on #mapreduce. Spoiler: It's basically a nerdy treasure hunt for semantically similar items, but with more floating-point numbers than your brain can handle. 💻🧠
vickiboykis.com/2026/02/21/que #HackerNews #VickiBoykis #treasureHunt #techJourney #floatingPoint #ngated

  2. Distributed computing in Apache Ignite 3

    The article explores the distributed-computing capabilities of Apache Ignite 3. I'll show how to deploy a cluster in Docker, deploy your own compute jobs, and compare Ignite 3 with the previous version. We'll also touch on Ignite's new capabilities as a full-fledged distributed platform, rather than just an in-memory cache.

    habr.com/ru/articles/954928/

    #distributed_computing #распределённые_вычисления #colocated_computations #коллокационные_вычисления #inmemory_database #java #apache_ignite_3 #mapreduce
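
    The colocated-computation idea the article covers can be shown in miniature, without Ignite at all: data is hash-partitioned by key across nodes, and a job is routed to the node that already owns the key's partition instead of moving the data to the job. A toy sketch in plain Python (the `put`/`execute_colocated` names are invented for illustration, not Ignite API):

    ```python
    # Toy colocated computation: route work to the partition that holds
    # the data, rather than shipping the data to the work.
    NUM_NODES = 3
    nodes = [{} for _ in range(NUM_NODES)]  # each dict stands in for a node's local storage

    def partition_for(key: str) -> int:
        # Deterministic key -> node mapping (a stand-in for hash partitioning).
        return sum(key.encode()) % NUM_NODES

    def put(key, value):
        nodes[partition_for(key)][key] = value

    def execute_colocated(key, job):
        # Run `job` against the local storage of the node that owns `key`.
        local = nodes[partition_for(key)]
        return job(key, local)

    put("user:1", [10, 20, 30])
    total = execute_colocated("user:1", lambda k, store: sum(store[k]))
    ```

    The payoff is the same as in a real cluster: the aggregation runs next to the data, and only the small result crosses the "network".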

  3. How we replaced hundreds of joins with a single real-time processing pipeline at 1M RPS

    How are discounts, user journeys, and huge datasets connected in Yandex Advertising? Hi, Habr! My name is Maxim Statsenko; I've been working with databases, and digging into them fiercely, since 2010, and with Big Data since 2016. I now work at Yandex in the DWH for search and advertising. We work with VERY large data: every day millions of users see Yandex ads, and our systems process enormous volumes of data. For advertising to work effectively, we need, at every moment, the most complete possible picture of an ad's life history, which means we have to somehow pass data from one event to the next inside the advertising funnel. I'll describe how we solved this problem.

    habr.com/ru/companies/oleg-bun

    #ytsaurus #mapreduce #olap #oltp #антифрод #распределенные_системы #оптимизация #обработка_данных #хранилища_данных
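
    The post's core idea, carrying an object's history along with the event stream instead of joining hundreds of tables after the fact, can be sketched as a keyed stateful processor. Plain Python, with invented event fields for illustration:

    ```python
    from collections import defaultdict

    # Per-key running state: instead of joining "impression", "click", and
    # "conversion" tables offline, each incoming event is enriched with the
    # funnel history accumulated so far for its ad_id.
    state = defaultdict(list)

    def process(event):
        history = state[event["ad_id"]]
        enriched = {**event, "history": list(history)}  # snapshot of prior events
        history.append(event["type"])
        return enriched

    stream = [
        {"ad_id": 42, "type": "impression"},
        {"ad_id": 42, "type": "click"},
        {"ad_id": 42, "type": "conversion"},
    ]
    out = [process(e) for e in stream]
    ```

    Each event leaves the pipeline already joined with its past, so downstream consumers never need the original join at all.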

  4. The SortMergeJoin operator in Apache Spark

    Let's look at how SortMergeJoin is implemented in Apache Spark, peeking into the source code on GitHub along the way. Spark is written in Scala, and the operator's entire logic is available in the project's open repository. Right here :) The first thing we'll examine is the case class constructor: 1. The SortMergeJoinExec constructor

    habr.com/ru/companies/gnivc/ar

    #spark #join #hadoop #bigdata #mapreduce
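
    The algorithm behind the operator is easy to state outside Spark: sort both sides by the join key, then advance two cursors in lockstep, emitting the cross product of each run of matching keys. A minimal Python sketch of an inner equi-join (no spilling, partitioning, or codegen, which is where the real Spark operator earns its keep):

    ```python
    def sort_merge_join(left, right, key):
        # Inner equi-join of two lists of dicts via sort-merge.
        left = sorted(left, key=lambda r: r[key])
        right = sorted(right, key=lambda r: r[key])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            lk, rk = left[i][key], right[j][key]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Find the run of equal keys on the right, then pair every
                # left row with that run (per-key cross product).
                j_end = j
                while j_end < len(right) and right[j_end][key] == lk:
                    j_end += 1
                while i < len(left) and left[i][key] == lk:
                    for jj in range(j, j_end):
                        out.append({**left[i], **right[jj]})
                    i += 1
                j = j_end
        return out

    users = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
    orders = [{"id": 1, "amt": 5}, {"id": 3, "amt": 9}, {"id": 1, "amt": 7}]
    rows = sort_merge_join(users, orders, "id")
    ```

    Once both sides are sorted, the merge itself is a single linear pass, which is why Spark prefers this strategy for large inputs that don't fit a broadcast hash join.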

  5. So, gonna write some stuff on #HDFS #MapReduce #yarn and maybe clustering. Also, #machinelearning was suggested, but I think that may be too broad a topic for this. I did cover machine learning in a blog back in 2023, but this time it's for the KB, not the blog: openlogic.com/blog/using-cassa

    Hmm, perhaps some sort of ML performance document (as in disk I/O, etc., not accuracy) would be good, but still, where to even start?

    If anyone has beginner resources, please share; I'll likely be pointing folks to them.

  6. The Hadoop ecosystem comprises various tools and frameworks designed to handle large-scale data processing and analytics. Let's discuss the core components, namely Hadoop, HBase, and Hive, along with other significant tools such as Pig, Sqoop, Flume, Oozie, and Zookeeper.

    linuxexpert.org/so-you-wanna-d

    #Hadoop #HBase #Hive #BigData #HadoopEcosystem #HDFS #MapReduce #YARN #Pig #Sqoop #Flume #Oozie #Zookeeper #DataProcessing #DataAnalytics #DataWarehousing #ETL #DataIngestion #Security
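
    MapReduce itself, the model HDFS and YARN grew up around, fits in a few lines: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase folds each group. A toy word count in plain Python, no Hadoop involved:

    ```python
    from collections import defaultdict
    from functools import reduce

    def map_phase(docs):
        # map: each document emits (word, 1) pairs
        for doc in docs:
            for word in doc.split():
                yield word, 1

    def shuffle(pairs):
        # shuffle: group values by key, as the framework does between phases
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)
        return groups

    def reduce_phase(groups):
        # reduce: fold each key's values into a single count
        return {k: reduce(lambda a, b: a + b, vs) for k, vs in groups.items()}

    counts = reduce_phase(shuffle(map_phase(["big data", "big deal"])))
    ```

    Everything the ecosystem adds, HDFS for storage, YARN for scheduling, Hive and Pig as query front ends, exists to run this same three-phase shape across thousands of machines.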
