home.social

#datapipelines — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datapipelines, aggregated by home.social.

  1. #LinkedIn has launched a unified integrations platform to standardize & reconcile hiring data across systems.

    • 72% faster onboarding
    • Improved data consistency and completeness
    • Scalable AI-driven hiring enabled via standardized schemas, orchestration workflows, and centralized data processing

    Learn more: bit.ly/48KFwof

    #SoftwareArchitecture #EvolutionaryArchitecture #DataPipelines #DataAnalytics #InfoQ

  6. #Confluent introduces a new approach in #ApacheKafka that moves schema IDs from message payloads to record headers.

    ✅ Simplify schema governance & evolution
    ✅ Improve compatibility across serialization formats
    ✅ Reduce coupling between data & metadata in event-driven architectures

    Read the deep dive on #InfoQ: bit.ly/4tF7Fot

    #ML #EventStreamProcessing #ProtocolBuffers #DataPipelines #DataAnalytics
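
To make the payload-versus-header distinction concrete, here is a minimal sketch in plain Python. The payload framing (a zero magic byte followed by a 4-byte big-endian schema ID) is the familiar Confluent wire format. The header key `schema-id` and all function names are hypothetical, chosen only for illustration; the post does not say which header name or encoding Confluent actually uses.

```python
from __future__ import annotations

import struct

MAGIC_BYTE = 0  # Confluent wire format: the framed payload starts with byte 0

# Hypothetical header key for illustration; the post does not name the real one.
SCHEMA_ID_HEADER = "schema-id"


def encode_in_payload(schema_id: int, body: bytes) -> bytes:
    """Payload framing: magic byte + 4-byte big-endian schema ID + serialized body."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + body


def decode_from_payload(value: bytes) -> tuple[int, bytes]:
    """Split a framed payload back into (schema_id, body)."""
    magic, schema_id = struct.unpack(">bI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, value[5:]


def encode_in_headers(schema_id: int, body: bytes) -> tuple[list[tuple[str, bytes]], bytes]:
    """Header framing: the body is left untouched; the ID travels as a record header."""
    headers = [(SCHEMA_ID_HEADER, struct.pack(">I", schema_id))]
    return headers, body


def decode_from_headers(headers: list[tuple[str, bytes]], value: bytes) -> tuple[int, bytes]:
    """Look the schema ID up in the headers; the payload needs no unwrapping."""
    for key, raw in headers:
        if key == SCHEMA_ID_HEADER:
            return struct.unpack(">I", raw)[0], value
    raise KeyError(f"record has no {SCHEMA_ID_HEADER!r} header")


if __name__ == "__main__":
    print(encode_in_payload(42, b"hello"))
    print(encode_in_headers(42, b"hello"))
```

With the header variant, a consumer that does not know about schema IDs can still read the payload bytes as they are, which is one way to picture the reduced coupling the post mentions.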

  11. 🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

    In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra—not just hobby projects.

    Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
    Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

    Key Lessons:
    ✅️ "It works on my laptop" isn't a strategy.
    ✅ Need IaC, partitioning, clustering, and strict error handling.
    ✅ dbt ensures reproducible, tested models.
    ✅ Infra is invisible work—if it breaks, your code fails.

    Take the leap! It’s challenging but by week 10, pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

    Thanks Data Talks Club team! On to the next challenge!

    My project: github.com/ammartin8/hard_driv

    #mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github

  16. In this #InfoQ article, Vignesh Durai explains how agentic and multimodal AI systems can be engineered using #ApacheCamel & #LangChain4j.

    The solution combines LLM-based reasoning, retrieval-augmented generation (RAG), and image classification.

    🔗 Read now: bit.ly/4sXdlcM

    #AI #LLMs #DataPipelines

  21. Astro CLI Touts Agent-Ready Airflow Access

    New Astro CLI feature lets agents control Airflow directly. See how this changes data workflows and what it means for developers starting 15 May 2024.

    #AstroCLI, #AirflowAPI, #AIDataEngineering, #DevOps, #DataPipelines

    newsletter.tf/astro-cli-agent-

  26. Astro CLI now lets AI agents control Airflow directly, a big step from 15 May 2024. This is like giving robots the keys to manage complex data tasks.

    #AstroCLI, #AirflowAPI, #AIDataEngineering, #DevOps, #DataPipelines
    newsletter.tf/astro-cli-agent-

  28. Diving deep into Spark batch processing!⚡️

    Learned how to:
    ✅ Optimize data pipelines with filtering, repartitioning & grouping
    ✅ Design efficient ETL pipelines with Spark
    ✅ Understand when and how to use partitioning strategies
    ✅ Use Google Cloud Storage (GCS) as a data source for Spark applications and configure Spark to read Parquet or other formats from GCS
    ✅ Visualize execution plans for efficient coding
    ✅ Review the Spark UI for performance monitoring

    💡 Key takeaway: One thing that amazes me about distributed computing is how we've gone from struggling with massive datasets to generating insights in near real-time. As an analyst who has dealt with long wait times when processing data, I find that Spark saves so much time: results arrive faster, and data-driven decisions can be made more quickly.

    Review my work here: github.com/ammartin8/data_engi
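
The filter, repartition, and group steps listed above can be pictured with a small stand-in written in plain Python. This is not the Spark API and not the author's code: the function names, the sample rows, and the use of `zlib.crc32` as the partitioning hash are all assumptions made for illustration. It only shows why filtering early and co-locating rows by key makes the grouping step cheap.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


def filter_rows(rows: list[dict], predicate) -> list[dict]:
    """Drop rows as early as possible so later stages move less data."""
    return [row for row in rows if predicate(row)]


def repartition(rows: list[dict], key: str, num_partitions: int) -> list[list[dict]]:
    """Hash-partition rows so that every row with the same key lands in the same partition."""
    partitions: list[list[dict]] = [[] for _ in range(num_partitions)]
    for row in rows:
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def group_sum(partitions: list[list[dict]], key: str, value: str) -> dict:
    """Aggregate each partition independently, then combine the per-partition results."""
    totals: dict = {}
    for partition in partitions:
        local: dict = defaultdict(float)
        for row in partition:
            local[row[key]] += row[value]
        # Safe to merge blindly: a key never appears in two partitions.
        totals.update(local)
    return totals


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        {"zone": "A", "amount": 10.0},
        {"zone": "B", "amount": 5.0},
        {"zone": "A", "amount": 2.5},
        {"zone": "C", "amount": -1.0},
    ]
    kept = filter_rows(rows, lambda row: row["amount"] > 0)
    print(group_sum(repartition(kept, "zone", 3), "zone", "amount"))
```

In Spark itself the same ideas are expressed through DataFrame operations, and the partition count becomes a tuning knob rather than a fixed argument.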

  29. What is Data Engineering? Tips, Tools, & Why It Matters

    Data engineering helps organizations collect, transform, and manage large volumes of raw data for analytics and decision-making. Reliable data pipelines, integration, and automation ensure high-quality data for business intelligence and machine learning.

    Learn key tips, tools, and best practices:
    hitechanalytics.com/blog/what-

    #DataEngineering #DataPipelines #DataIntegration #ETL

  30. Data pipelines influence your decisions (even without your knowing it). Verify the checks, involve tech, monitor the metadata. Reliable data = a reliable decision.

    #DataDriven #DecisionMaking #DataPipelines #DataEngineer #Data

    linkedin.com/posts/gabriel-cha

  31. OMG, Moldova! 🌍 Apparently, this tiny country is not just good at #Eurovision, but also at breaking data pipelines. 😂 Who knew geopolitical drama could sneak into our AWS #Redshift like a bad soap opera? 🎭📉
    avraam.dev/blog/moldova-broke- #Moldova #dataAWS #geopoliticaldrama #datapipelines #HackerNews #ngated

  35. Shifting Left delivers clean, reliable, and accessible data to everyone who needs it - right when they need it.

    The result? Less complexity, lower overhead, and far less break-fix work, freeing teams to focus on higher-value problems.

    At the core of a #ShiftLeft strategy are Data Products. They form the backbone of healthy data communication and ensure quality is built in - not patched on later.

    📖 Great insights from this #InfoQ article on rethinking the Medallion Architecture: bit.ly/3WHjxsf

    #SoftwareArchitecture #DataMesh #DataEngineering #DataLake #DataPipelines

  40. #CaseStudy - Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.

    A multi-layered quality framework - with automated checks, ML anomaly detection, and data contracts - ensures accurate financial metrics while handling millions of daily bookings.

    Deep dive into the architecture here ⇨ bit.ly/4a109NP

    #InfoQ #SoftwareArchitecture #AI #DataPipelines
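
As a rough picture of two of the layers named above, the sketch below pairs a data-contract check with a simple anomaly check. It is not Agoda's implementation: the contract fields, function names, and threshold are hypothetical, and a plain z-score stands in for the ML anomaly detection the post mentions.

```python
from __future__ import annotations

from statistics import mean, stdev

# Hypothetical contract for a booking record: field name -> required type.
CONTRACT = {"booking_id": str, "amount": float, "currency": str}


def contract_violations(record: dict, contract: dict = CONTRACT) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's metric if it lies more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return today != history[0]
    return abs(today - mean(history)) / sigma > threshold


if __name__ == "__main__":
    print(contract_violations({"booking_id": "b1", "amount": "10"}))
    print(is_anomalous([100.0, 102.0, 98.0, 101.0, 99.0], 150.0))
```

Running the contract check at the point of ingestion and the anomaly check on daily aggregates is one way to layer the two, so that malformed records and plausible-looking but unusual totals are caught by different mechanisms.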