home.social

#apachespark — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #apachespark, aggregated by home.social.

  1. Treating SparkContext as a control tower shifts how you think about Spark: not just as an API, but as the coordinator for your entire distributed engine.

    Read More: zalt.me/blog/2026/05/sparkcont

    #ApacheSpark #SparkContext #distributed #systems
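
    A toy model of that control-tower framing (nothing here is Spark's real API; `ToyContext`, its `executors` list, and `run_job` are all invented for illustration):

```python
# Toy "control tower": like a SparkContext, it is the single driver-side object
# that knows every registered worker and dispatches partitioned work to them.
class ToyContext:
    def __init__(self, executors):
        self.executors = executors  # registered workers (just names here)

    def run_job(self, data, func):
        # Split the input into one partition per executor, then "run" each
        # partition and collect the results (serially here; Spark would ship
        # each partition to a remote executor and gather results back).
        n = len(self.executors)
        partitions = [data[i::n] for i in range(n)]
        return [func(p) for p in partitions]

ctx = ToyContext(executors=["exec-1", "exec-2"])
print(ctx.run_job(list(range(10)), sum))  # [20, 25]
```

    The point of the analogy: partitioning, scheduling, and result collection all flow through one coordinator object, which is why the post calls the SparkContext a control tower rather than just an API handle.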

  2. 96% fewer out-of-memory (OOM) failures!

    #Pinterest shared how it improved the reliability of its #ApacheSpark workloads.

    By focusing on:
    ✅ Enhanced observability
    ✅ Configuration tuning
    ✅ Automatic memory retries

    The changes addressed persistent job failures affecting recommendation systems and large-scale data processing.

    Details here ⇨ bit.ly/4smqrQD

    #SoftwareArchitecture #BigData #CostOptimization #Memory #DistributedSystems #Observability #InfoQ
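
    The kind of knobs such configuration tuning usually touches can be sketched as a plain config map; the property names below are real Spark settings, but the values are illustrative examples, not Pinterest's:

```python
# Illustrative Spark memory settings (real property names, example values only).
oom_mitigations = {
    "spark.executor.memory": "8g",          # executor JVM heap
    "spark.executor.memoryOverhead": "2g",  # off-heap headroom; too little is a classic OOM cause
    "spark.memory.fraction": "0.6",         # share of heap for execution + storage
    "spark.task.maxFailures": "4",          # task-level retries before the job fails
}

def memory_overhead_ratio(conf):
    """Overhead as a fraction of heap: a quick sanity check when chasing OOMs."""
    gb = lambda v: float(v.rstrip("g"))
    return gb(conf["spark.executor.memoryOverhead"]) / gb(conf["spark.executor.memory"])

print(memory_overhead_ratio(oom_mitigations))  # 0.25
```

    A ratio like this is the sort of signal enhanced observability surfaces: executors that OOM despite a large heap often just need more off-heap overhead, not more heap.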

  3. Bellevue / Seattle area friends: I’m super stoked for next week’s Spark Community Sprint (Friday, Mar 13th: spooky 👻).

    If you’ve ever wanted to contribute to Apache Spark, come hang out and get your first Spark PR started with Felix Cheung, Huaxin Gao, Devin Petersohn, and myself :)

    We’ll help folks find starter issues, get their dev environments set up, and walk through the contribution process.

    There will be free lunch, and if enough people show up… maybe even Taco Bell for an afternoon snack*.

    #ApacheSpark #OSS #hackathon #freelunch #tacofridaymaaaaybe

    luma.com/rrfvx0ey

    (* Depends on attendance)

  4. #Pinterest launched a next-gen CDC-based ingestion framework.

    Using #ApacheKafka, #ApacheFlink, #ApacheSpark & #ApacheIceberg, they achieved:
    • Latency cut from 24+ hours to 15 minutes
    • Processing of only changed records
    • Support for incremental updates & deletions
    • Petabyte-scale data across 1,000+ pipelines

    Win: optimized cost & efficiency!

    Read the architectural deep dive on InfoQ 👉 bit.ly/4rMJB2H

    #SoftwareArchitecture #ChangeDataCapture
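
    The "processing of only changed records" point boils down to apply-semantics like these. A minimal sketch in plain Python, not Pinterest's framework; the record shape (`op`, `id`, `row`) is invented for illustration:

```python
# Apply a batch of CDC change records to a keyed table: only the keys that
# actually changed are touched, and deletes are first-class operations.
def apply_cdc(table, changes):
    for rec in changes:
        if rec["op"] in ("insert", "update"):
            table[rec["id"]] = rec["row"]   # upsert the changed row
        elif rec["op"] == "delete":
            table.pop(rec["id"], None)      # incremental deletion
    return table

table = {1: "a", 2: "b"}
changes = [
    {"op": "update", "id": 1, "row": "a2"},
    {"op": "delete", "id": 2},
    {"op": "insert", "id": 3, "row": "c"},
]
print(apply_cdc(table, changes))  # {1: 'a2', 3: 'c'}
```

    At Pinterest's scale the same idea runs through Kafka/Flink/Spark into Iceberg tables, but the latency win comes from exactly this: touching three records instead of rescanning the whole table.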

  5. #CaseStudy - Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.

    A multi-layered quality framework - with automated checks, ML anomaly detection, and data contracts - ensures accurate financial metrics while handling millions of daily bookings.

    Deep dive into the architecture here ⇨ bit.ly/4a109NP

    #InfoQ #SoftwareArchitecture #AI #DataPipelines

  6. Discover how Decathlon, one of the world’s leading sports retailers, adopted the #opensource library #Polars to optimize its data workflows.

    By migrating from Apache Spark to Polars for small input datasets, Decathlon achieved:
    • Significant speedups
    • Meaningful cost savings

    👉 Learn more: bit.ly/4qmb2zc

    #InfoQ #AI #ApacheSpark
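
    One way to see why a single-node engine can beat Spark on small inputs is a back-of-envelope cost model: fixed orchestration overhead plus per-row work divided by parallelism. All numbers below are hypothetical, not Decathlon's measurements:

```python
# Toy cost model: total job time = fixed startup/orchestration overhead
# + (rows * per-row cost) / parallelism. On small inputs the fixed overhead
# of a distributed cluster dominates, so a single-node engine wins.
def job_time(rows, per_row_us, workers, startup_s):
    return startup_s + rows * per_row_us / 1e6 / workers

small = 1_000_000  # a "small" dataset by Spark standards

spark_t = job_time(small, per_row_us=2.0, workers=32, startup_s=30.0)  # hypothetical cluster overhead
polars_t = job_time(small, per_row_us=2.0, workers=8, startup_s=0.5)   # hypothetical single-node cost

print(spark_t, polars_t)  # 30.0625 0.75
```

    The crossover flips as `rows` grows large enough that per-row work swamps the startup term, which is why the migration targeted small input datasets specifically.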

  7. #CaseStudy - #Lyft rearchitected its ML platform, LyftLearn, into a hybrid system!

    Offline workloads now run on AWS SageMaker, while Kubernetes continues to power online model serving.

    The result❓ Read #InfoQ and find out 👉 bit.ly/3Y3hTBG

    #SoftwareArchitecture #AI #ML #ApacheSpark #Kubernetes

  8. Today is DBA Appreciation Day!

    Bring your DBAs a cake and a coffee, please. And don't drop any tables in production, pretty please. It's the weekend ...

    #PostgreSQL #SQLServer #Oracle #DB2 #MySQL #MariaDB #Snowflake #SQLite #Neo4j #Teradata #SAPHana #Aerospike #ApacheSpark #Clickhouse #Informix #WarehousePG #Greenplum #Adabas

  9. 🎃The October issue of #CheckpointChronicle is now out 🌟

    It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write-up of ngrok's data platform, a nice detailed look at how SmartNews uses #ApacheIceberg with Flink and #ApacheSpark, a good write-up from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!

    dcbl.link/cc-oct242

  10. anybody know if it is ok to run #apachespark and #apachehive on the same box? I have 969 #java processes on this #centos box, which seems like a lot, but not sure if it is actually a problem.

    Something is certainly a problem.

    #bigdata

  11. Claus Stadler is presenting their work behind SANSA: 'Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark' now at the Knowledge Graph Construction Workshop!

    @eswc_conf @aksw

  12. “A really big deal”: Dolly is a free, open source, ChatGPT-style AI model

    On Wednesday, Databricks released... - arstechnica.com/?p=1931693

    #largelanguagemodels #machinelearning #textsynthesis #apachespark #databricks #eleutherai #finetuning #biz #pythia #dolly #llama #meta #ai

  13. Cloudera Data Platform One bundles all the tools required for data analysis and exploration as software-as-a-service, built on the lakehouse architecture.

    Data Science: Cloudera launches an all-in-one data service in the cloud

  14. Big Data Tools 1.6, a plug-in for accessing Zeppelin notebooks, now also supports monitoring Apache Flink and integrates the Hive Metastore.

    JetBrains' Big Data Tools 1.6 keeps an eye on Flink jobs

  15. The new release of the data science software SystemDS introduces a federated backend for multi-tenancy and completes the update to Java 11 and Spark 3.

    Data Science: Apache SystemDS 3.0 gains a backend for multi-tenancy

  16. Google and OpenMined are making the benefits of differential privacy available to the Python developer community as open source.

    PipelineDP: a differential privacy framework for the Python universe

  17. The extension for accessing Zeppelin notebooks and monitoring Spark and Hadoop applications is now available in version 1.0.

    Big Data Tools: JetBrains' plug-in for Apache Zeppelin leaves the preview phase

  18. The tools for ETL processes, data pipeline orchestration, automation, and monitoring are integrated into the Cloudera Data Platform as a Spark service.

    Cloudera launches a cloud-native service for data engineering

  19. The major release of the big data engine brings many improvements, along with new approaches that promise higher performance and broader compatibility.

    Apache Spark 3.0 delivers extended SQL functions and an update to the Python API

    #ApacheSpark #BigData #DataStreaming #Databricks