home.social

#mapreduce — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #mapreduce, aggregated by home.social.

  1. 🚀✨ Behold, the thrilling tale of querying 3 billion vectors—a journey where Vicki Boykis heroically attempts to decode Jeff Dean's cryptic wisdom on #mapreduce. Spoiler: It's basically a nerdy treasure hunt for semantically similar items, but with more floating-point numbers than your brain can handle. 💻🧠
vickiboykis.com/2026/02/21/que #HackerNews #VickiBoykis #treasureHunt #techJourney #floatingPoint #ngated

  2. Distributed computing in Apache Ignite 3

    The article explores the distributed-computing capabilities of Apache Ignite 3. I'll show how to deploy a cluster in Docker, deploy your own compute jobs, and compare Ignite 3 with the previous version. We'll also touch on Ignite's new capabilities as a full-fledged distributed platform, rather than just an in-memory cache.

    habr.com/ru/articles/954928/

    #distributed_computing #распределённые_вычисления #colocated_computations #коллокационные_вычисления #inmemory_database #java #apache_ignite_3 #mapreduce
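
    The colocated-computation idea the article covers can be shown in miniature, without Ignite at all: data is hash-partitioned by key across nodes, and a job is routed to the node that already owns the key's partition instead of moving the data to the job. A toy sketch in plain Python (the `put`/`execute_colocated` names are invented for illustration, not Ignite API):

    ```python
    # Toy colocated computation: route work to the partition that holds
    # the data, rather than shipping the data to the work.
    NUM_NODES = 3
    nodes = [{} for _ in range(NUM_NODES)]  # each dict stands in for a node's local storage

    def partition_for(key: str) -> int:
        # Deterministic key -> node mapping (a stand-in for hash partitioning).
        return sum(key.encode()) % NUM_NODES

    def put(key, value):
        nodes[partition_for(key)][key] = value

    def execute_colocated(key, job):
        # Run `job` against the local storage of the node that owns `key`.
        local = nodes[partition_for(key)]
        return job(key, local)

    put("user:1", [10, 20, 30])
    total = execute_colocated("user:1", lambda k, store: sum(store[k]))
    ```

    The payoff is the same as in a real cluster: the aggregation runs next to the data, and only the small result crosses the "network".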

  3. How we replaced hundreds of joins with a single real-time processing pipeline at 1M RPS

    How are discounts, user journeys, and huge datasets connected in Yandex Advertising? Hi, Habr! My name is Maxim Statsenko; I've been working with databases, and digging into them fiercely, since 2010, and with Big Data since 2016. I now work at Yandex in the DWH for search and advertising. We work with VERY large data: every day millions of users see Yandex ads, and our systems process enormous volumes of data. For advertising to work effectively, we need, at every moment, the most complete possible picture of an ad's life history, which means we have to somehow pass data from one event to the next inside the advertising funnel. I'll describe how we solved this problem.

    habr.com/ru/companies/oleg-bun

    #ytsaurus #mapreduce #olap #oltp #антифрод #распределенные_системы #оптимизация #обработка_данных #хранилища_данных
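
    The post's core idea, carrying an object's history along with the event stream instead of joining hundreds of tables after the fact, can be sketched as a keyed stateful processor. Plain Python, with invented event fields for illustration:

    ```python
    from collections import defaultdict

    # Per-key running state: instead of joining "impression", "click", and
    # "conversion" tables offline, each incoming event is enriched with the
    # funnel history accumulated so far for its ad_id.
    state = defaultdict(list)

    def process(event):
        history = state[event["ad_id"]]
        enriched = {**event, "history": list(history)}  # snapshot of prior events
        history.append(event["type"])
        return enriched

    stream = [
        {"ad_id": 42, "type": "impression"},
        {"ad_id": 42, "type": "click"},
        {"ad_id": 42, "type": "conversion"},
    ]
    out = [process(e) for e in stream]
    ```

    Each event leaves the pipeline already joined with its past, so downstream consumers never need the original join at all.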

  4. The SortMergeJoin operator in Apache Spark

    Let's look at how SortMergeJoin is implemented in Apache Spark, peeking into the source code on GitHub along the way. Spark is written in Scala, and the operator's entire logic is available in the project's open repository. Right here :) The first thing we'll examine is the case class constructor: 1. The SortMergeJoinExec constructor

    habr.com/ru/companies/gnivc/ar

    #spark #join #hadoop #bigdata #mapreduce
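
    The algorithm behind the operator is easy to state outside Spark: sort both sides by the join key, then advance two cursors in lockstep, emitting the cross product of each run of matching keys. A minimal Python sketch of an inner equi-join (no spilling, partitioning, or codegen, which is where the real Spark operator earns its keep):

    ```python
    def sort_merge_join(left, right, key):
        # Inner equi-join of two lists of dicts via sort-merge.
        left = sorted(left, key=lambda r: r[key])
        right = sorted(right, key=lambda r: r[key])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            lk, rk = left[i][key], right[j][key]
            if lk < rk:
                i += 1
            elif lk > rk:
                j += 1
            else:
                # Find the run of equal keys on the right, then pair every
                # left row with that run (per-key cross product).
                j_end = j
                while j_end < len(right) and right[j_end][key] == lk:
                    j_end += 1
                while i < len(left) and left[i][key] == lk:
                    for jj in range(j, j_end):
                        out.append({**left[i], **right[jj]})
                    i += 1
                j = j_end
        return out

    users = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
    orders = [{"id": 1, "amt": 5}, {"id": 3, "amt": 9}, {"id": 1, "amt": 7}]
    rows = sort_merge_join(users, orders, "id")
    ```

    Once both sides are sorted, the merge itself is a single linear pass, which is why Spark prefers this strategy for large inputs that don't fit a broadcast hash join.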

  5. So, gonna write some stuff on #HDFS #MapReduce #yarn and maybe clustering. Also, #machinelearning was suggested, but I think that may be too broad a topic for this. I did cover machine learning in a blog back in 2023, but this time it's for the KB, not the blog: openlogic.com/blog/using-cassa

    Hmm, perhaps some sort of ML performance document (as in disk I/O, etc., not accuracy) would be good, but still, where to even start?

    If anyone has beginner resources, please share; I'll likely be pointing folks to them.

  6. The Hadoop ecosystem comprises various tools and frameworks designed to handle large-scale data processing and analytics. Let's discuss the core components, namely Hadoop, HBase, and Hive, along with other significant tools such as Pig, Sqoop, Flume, Oozie, and Zookeeper.

    linuxexpert.org/so-you-wanna-d

    #Hadoop #HBase #Hive #BigData #HadoopEcosystem #HDFS #MapReduce #YARN #Pig #Sqoop #Flume #Oozie #Zookeeper #DataProcessing #DataAnalytics #DataWarehousing #ETL #DataIngestion #Security
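
    MapReduce itself, the model HDFS and YARN grew up around, fits in a few lines: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase folds each group. A toy word count in plain Python, no Hadoop involved:

    ```python
    from collections import defaultdict
    from functools import reduce

    def map_phase(docs):
        # map: each document emits (word, 1) pairs
        for doc in docs:
            for word in doc.split():
                yield word, 1

    def shuffle(pairs):
        # shuffle: group values by key, as the framework does between phases
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)
        return groups

    def reduce_phase(groups):
        # reduce: fold each key's values into a single count
        return {k: reduce(lambda a, b: a + b, vs) for k, vs in groups.items()}

    counts = reduce_phase(shuffle(map_phase(["big data", "big deal"])))
    ```

    Everything the ecosystem adds, HDFS for storage, YARN for scheduling, Hive and Pig as query front ends, exists to run this same three-phase shape across thousands of machines.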
