#apachespark — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #apachespark, aggregated by home.social.
-
Treating SparkContext as a control tower shifts how you think about Spark: not just as an API, but as the coordinator for your entire distributed engine.
Read More: https://zalt.me/blog/2026/05/sparkcontext-control-tower
-
Join me Tuesday for my next Python Data Science & AI Full Throttle! https://deitel.com/PYDSFT
O'Reilly Media Pearson #deitel #python #machinelearning #deeplearning #NLP #datamining #ApacheSpark #BigData #IoT #GenAI
-
96% fewer out-of-memory (OOM) failures!
#Pinterest shared how it improved the reliability of its #ApacheSpark workloads.
By focusing on:
✅ Enhanced observability
✅ Configuration tuning
✅ Automatic memory retries
The changes addressed persistent job failures affecting recommendation systems and large-scale data processing.
Details here ⇨ https://bit.ly/4smqrQD
#SoftwareArchitecture #BigData #CostOptimization #Memory #DistributedSystems #Observability #InfoQ
-
The Data Lakehouse Explained: Why Apache Iceberg Is Quietly Running the Show
https://techlife.blog/posts/data-lakehouse-iceberg
#ApacheIceberg #DataLakehouse #DataWarehouse #DataLake #Snowflake #ApacheSpark #DataEngineering
-
Bellevue / Seattle area friends: I’m super stoked for next week’s Spark Community Spring (Friday Mar 13th: spooky 👻).
If you’ve ever wanted to contribute to Apache Spark, come hang out and get your first Spark PR started with Felix Cheung, Huaxin Gao, Devin Petersohn, and myself :)
We’ll help folks find starter issues, get their dev environments set up, and walk through the contribution process.
There will be free lunch, and if enough people show up… maybe even Taco Bell for an afternoon snack*.
#ApacheSpark #OSS #hackathon #freelunch #tacofridaymaaaaybe
(* Depends on attendance)
-
#Pinterest launched a next-gen CDC-based ingestion framework.
Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
• Latency cut from 24+ hours to 15 minutes
• Processing of only changed records
• Support for incremental updates & deletions
• Petabyte-scale data across 1,000+ pipelines
Win: optimized cost & efficiency!
Read the architectural deep dive on InfoQ 👉 https://bit.ly/4rMJB2H
-
#CaseStudy - Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.
A multi-layered quality framework - with automated checks, ML anomaly detection, and data contracts - ensures accurate financial metrics while handling millions of daily bookings.
Deep dive into the architecture here ⇨ https://bit.ly/4a109NP
-
Discover how Decathlon, one of the world’s leading sports retailers, adopted the #opensource library #Polars to optimize its data workflows.
By migrating from Apache Spark to Polars for small input datasets, Decathlon achieved:
• Significant speed gains
• Meaningful cost savings
👉 Learn more: https://bit.ly/4qmb2zc
-
#CaseStudy - #Lyft rearchitected its ML platform, LyftLearn, into a hybrid system!
Offline workloads now run on AWS SageMaker, while Kubernetes continues to power online model serving.
The result❓ Read #InfoQ and find out 👉 https://bit.ly/3Y3hTBG
-
ICYMI: Abe Sharp looks at Volcano, a #CNCF project that optimizes high-performance workloads on Kubernetes to avoid deadlocks
https://www.admin-magazine.com/Archive/2025/86/Application-aware-batch-scheduler?utm_source=mam
#Kubernetes #scheduler #Volcano #Queue #PodGroup #ApacheSpark #PyTorch #MachineLearning
-
Test your skills with our Data Engineering Quiz!
Tag a data nerd friend who’d ace it!
#DataEngineering #ETL #TechQuiz #DataNerd #EngineerLife #LearnWithFun #TechCareers #BigData #DataPipeline
#DataEngineerLife #SQL #ApacheSpark
#CareerInTech #LearnDataEngineering
#EngineeringQuiz #DataNerd #UpSkill
#TechCareers #DataDriven
-
Today is DBA Appreciation Day!
Bring your DBAs a cake and a coffee, please. And don't drop any tables in production, pretty please. It's the weekend ...
#PostgreSQL #SQLServer #Oracle #DB2 #MySQL #MariaDB #Snowflake #SQLite #Neo4j #Teradata #SAPHana #Aerospike #ApacheSpark #Clickhouse #Informix #WarehousePG #Greenplum #Adabas
-
🎃The October issue of #CheckpointChronicle is now out 🌟
It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write-up of ngrok's data platform, nice detail on how SmartNews uses #ApacheIceberg with Flink and #ApacheSpark, a good write-up from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!
-
anybody know if it is ok to run #apachespark and #apachehive on the same box? I have 969 #java processes on this #centos box, which seems like a lot, but not sure if it is actually a problem.
Something is certainly a problem.
-
Latest version of my Whisky clustering using Apache projects talk:
https://speakerdeck.com/paulk/groovy-whiskey
Tickets are still available for CoCEU. #apachecon #communityovercode #apachewayang #ApacheFlink #ApacheSpark #ApacheBeam #ApacheIgnite #ApacheCommons @ApacheGroovy #opensource #machinelearning #groovylang
-
🔥⏲️ Fudge Sunday "Are You Gonna Go Parquet" A look at the past, present, and future of Apache Parquet
#apacheiceberg #apachespark #prestodb #prestosql #trino #aiops #mlops #artificialintelligence #ai #aiforgood #aiforall #aiandbusiness #datalake #datalakehouse #datalakes #insights #dataengineering #realtimeanalytics #realtimedata #dataintegration #platformengineering #watsonx #devx #developerexperience #newsletter #newsletters
-
Curious to hear from folks, besides #apacheKafka, what are your favorite #streaming technologies? 🤔
#streamingdata #eventstreaming #streamingtechnology #apacheFlink #apacheSpark #apachePulsar
-
Claus Stadler is presenting their work behind SANSA: 'Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark' now at the Knowledge Graph Construction Workshop!
#ESWC2023 #KGCW2023 #RML #SPARQL #ApacheSpark @eswc_conf @aksw
-
“A really big deal”: Dolly is a free, open source, ChatGPT-style AI model
On Wednesday, Databricks released... - https://arstechnica.com/?p=1931693 #largelanguagemodels #machinelearning #textsynthesis #apachespark #databricks #eleutherai #finetuning #biz #pythia #dolly #llama #meta #ai
-
Data science & Groovy using #ApacheBeam #ApacheCamel #ApacheCommons @ApacheGroovy #ApacheIgnite #ApacheMXNet #ApacheOpennlp #ApacheSpark #ApacheWayang #Datumbox #deepjavalibrary #DeepNetts #EclipseDL4J #graalvm #gradle #stanfordnlp #TensorFlow #Tribuo #smile #tablesaw #opencsv
https://www.javaadvent.com/2022/12/groovy-and-data-science.html
Covers data manipulation, regression, clustering, classification, natural language processing & object detection #jsr381 #visrec #ai #ml #groovylang #neuralnets #deeplearning #javaadvent22
-
Cloudera Data Platform One bundles all the tools needed for data analysis and exploration as software as a service, built on the lakehouse architecture.
Data science: Cloudera launches all-in-one data service in the cloud
-
Big Data Tools 1.6, a plug-in for accessing Zeppelin notebooks, now also supports monitoring Apache Flink and integrates the Hive Metastore.
JetBrains' Big Data Tools 1.6 keeps an eye on Flink jobs
-
The new release of the data science software SystemDS introduces a federated backend for multi-tenancy and moves to Java 11 and Spark 3.
Data science: Apache SystemDS 3.0 gets a backend for multi-tenancy
-
Google and OpenMined are making the benefits of differential privacy available to the Python developer community as open source.
PipelineDP: Differential privacy framework for the Python universe
-
The extension for accessing Zeppelin notebooks and monitoring Spark and Hadoop applications is now available in version 1.0.
Big Data Tools: JetBrains plug-in for Apache Zeppelin leaves the preview phase
-
Databricks launches SQL Analytics - AI and data analytics company Databricks today announced the launch of SQL Analytics, a new service ... - http://feedproxy.google.com/~r/Techcrunch/~3/UMT86lb4A7s/ #artificialintelligence #businessintelligence #fishtownanalytics #machinelearning #datamanagement #dataprocessing #datawarehouse #dataanalysis #apachespark #datascience #information #enterprise #databricks #alighodsi #analytics #democrats #matillion #datalake #fivetran #tableau #looker #cloud
-
The tools for ETL processes, data pipeline orchestration, automation, and monitoring are integrated into the Cloudera Data Platform as a Spark service.
Cloudera launches cloud-native service for data engineering
-
The major release of the big data engine brings many improvements as well as new approaches that promise higher performance and better compatibility.
Apache Spark 3.0 delivers extended SQL functions and an update to the Python API
#ApacheSpark #BigData #DataStreaming #Databricks
-
Through the plug-in, IDE users get direct access to Zeppelin notebooks, Spark applications, and files on S3 storage instances.
Big Data Tools: JetBrains releases plug-in for IntelliJ, PyCharm, and DataGrip
#ApacheSpark #ApacheZeppelin #BigData #DataGrip #IntelliJIDEA #JetBrains #PyCharm
-
Google brings Cloud Dataproc to Kubernetes - Cloud Dataproc is probably one of the lesser-known products in Google Cloud’s portfolio, but it’s a ... more: http://feedproxy.google.com/~r/Techcrunch/~3/ojK5tQQznp4/ #apachesoftwarefoundation #cloudinfrastructure #dataprocessing #apachehadoop #apachespark #googlecloud #kubernetes #developer #apache #google #hadoop #cloud #gke