home.social

#datapipelines — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datapipelines, aggregated by home.social.

  1. #LinkedIn has launched a unified integrations platform to standardize & reconcile hiring data across systems.

    • 72% faster onboarding
    • Improved data consistency and completeness
    • Scalable AI-driven hiring enabled via standardized schemas, orchestration workflows, and centralized data processing

    Learn more: bit.ly/48KFwof

    #SoftwareArchitecture #EvolutionaryArchitecture #DataPipelines #DataAnalytics #InfoQ
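The post does not say how LinkedIn's platform works internally. This is a minimal sketch of what "standardize & reconcile" can mean in practice. It assumes two hypothetical source systems, an ATS export and a CRM export with invented field names. Their rows are mapped onto one canonical candidate schema and then merged on a normalized email address. Every name here is illustrative and is not LinkedIn's schema or API.

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical canonical schema: illustrative fields only, not LinkedIn's real model.
@dataclass
class Candidate:
    email: str
    full_name: Optional[str] = None
    phone: Optional[str] = None
    stage: Optional[str] = None


def normalize_email(raw: str) -> str:
    return raw.strip().lower()


def from_ats(row: dict) -> Candidate:
    # Hypothetical ATS export: split first/last name plus a "status" field.
    name = " ".join(p for p in (row.get("first"), row.get("last")) if p) or None
    return Candidate(normalize_email(row["email"]), full_name=name, stage=row.get("status"))


def from_crm(row: dict) -> Candidate:
    # Hypothetical CRM export: a single "name" field and a "mobile" phone field.
    return Candidate(normalize_email(row["contact_email"]), full_name=row.get("name"), phone=row.get("mobile"))


def reconcile(records: list[Candidate]) -> dict[str, Candidate]:
    """Merge records that share an email; for each field the first non-empty value wins."""
    merged: dict[str, Candidate] = {}
    for rec in records:
        current = merged.get(rec.email)
        if current is None:
            merged[rec.email] = replace(rec)  # copy so the input records are not mutated
            continue
        current.full_name = current.full_name or rec.full_name
        current.phone = current.phone or rec.phone
        current.stage = current.stage or rec.stage
    return merged


ats_rows = [{"email": "Ada@Example.com", "first": "Ada", "last": "Lovelace", "status": "interview"}]
crm_rows = [{"contact_email": "ada@example.com ", "name": "Ada L.", "mobile": "+1-555-0100"}]

canonical = [from_ats(r) for r in ats_rows] + [from_crm(r) for r in crm_rows]
result = reconcile(canonical)
print(result["ada@example.com"])
# Candidate(email='ada@example.com', full_name='Ada Lovelace', phone='+1-555-0100', stage='interview')
```

Source-specific quirks live only in the small `from_*` adapters. Everything downstream, including reconciliation, analytics, and AI features, sees a single schema. A real system would need conflict rules smarter than "first non-empty value wins".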

  2. #Confluent introduces a new approach in #ApacheKafka that moves schema IDs from message payloads to record headers.

    ✅ Simplify schema governance & evolution
    ✅ Improve compatibility across serialization formats
    ✅ Reduce coupling between data & metadata in event-driven architectures

    Read the deep dive on #InfoQ: bit.ly/4tF7Fot

    #ML #EventStreamProcessing #ProtocolBuffers #DataPipelines #DataAnalytics
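For context, Confluent's classic Schema Registry wire format prefixes each serialized message with a magic byte (0) and a 4-byte big-endian schema ID. The sketch below contrasts that with carrying the ID in a record header instead. It uses only the standard library and no Kafka client. The header name `schema_id` and its 4-byte encoding are assumptions for illustration. The header format Confluent actually uses may differ.

```python
import struct

MAGIC_BYTE = 0  # the classic Confluent wire format starts every payload with 0x00


def encode_in_payload(schema_id: int, body: bytes) -> bytes:
    """Classic approach: magic byte + 4-byte big-endian schema ID prefixed to the payload."""
    return struct.pack(">BI", MAGIC_BYTE, schema_id) + body


def decode_from_payload(value: bytes) -> tuple[int, bytes]:
    magic, schema_id = struct.unpack(">BI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, value[5:]


# Header-based approach. "schema_id" and its 4-byte value are HYPOTHETICAL placeholders,
# not Confluent's real header format. Kafka headers are key/value pairs with byte values.
SCHEMA_HEADER = "schema_id"


def encode_in_headers(schema_id: int, body: bytes) -> tuple[list[tuple[str, bytes]], bytes]:
    """Header approach: the payload stays pure serialized data; metadata travels separately."""
    return [(SCHEMA_HEADER, struct.pack(">I", schema_id))], body


def decode_from_headers(headers: list[tuple[str, bytes]], value: bytes) -> tuple[int, bytes]:
    for key, raw in headers:
        if key == SCHEMA_HEADER:
            return struct.unpack(">I", raw)[0], value
    raise KeyError(f"missing {SCHEMA_HEADER!r} header")


body = b'{"order_id": 42}'
prefixed = encode_in_payload(7, body)
headers, clean = encode_in_headers(7, body)
print(len(prefixed) - len(body))               # 5 extra bytes inside the payload
print(clean == body)                           # True: payload untouched
print(decode_from_headers(headers, clean)[0])  # 7
```

With the header approach, the payload bytes are exactly what the serializer produced. Consumers and tools that know nothing about the registry can still read them, and schema lookup becomes purely a metadata concern.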

  3. 🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

    In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra—not just hobby projects.

    Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
    Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

    Key Lessons:
    ✅ "It works on my laptop" isn't a strategy.
    ✅ You need IaC, partitioning, clustering, and strict error handling.
    ✅ dbt ensures reproducible, tested models.
    ✅ Infra is invisible work: if it breaks, your code fails.

    Take the leap! It’s challenging, but by week 10, the pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

    Thanks, Data Talks Club team! On to the next challenge!

    My project: github.com/ammartin8/hard_driv

    #mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github
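Several of the lessons in this post (partitioning, strict error handling, tested models) can be illustrated in a few lines of Python. This is a hypothetical sketch, not code from the linked project. It assumes a CSV with Backblaze Drive Stats-style columns (`date`, `serial_number`, `model`, `capacity_bytes`, `failure`). Rows are grouped into daily partitions, and bad rows are rejected loudly by checks modeled on dbt's `not_null`, `unique`, and `accepted_values` tests. The sketch then computes an annualized failure rate (AFR) per model as failures / (drive days / 365) × 100.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# ASSUMED columns, modeled on Backblaze Drive Stats CSVs (real files add many SMART columns).
SAMPLE = """date,serial_number,model,capacity_bytes,failure
2024-01-01,S1,ModelA,4000787030016,0
2024-01-01,S2,ModelB,8001563222016,0
2024-01-02,S1,ModelA,4000787030016,0
2024-01-02,S2,ModelB,8001563222016,1
"""


def parse_row(row: dict) -> dict:
    """Strict parsing: fail loudly instead of silently dropping or coercing bad rows."""
    if not row["serial_number"]:
        raise ValueError("serial_number is null")  # like a dbt not_null test
    failure = int(row["failure"])
    if failure not in (0, 1):
        raise ValueError(f"failure must be 0 or 1, got {failure}")  # like dbt accepted_values
    return {
        "date": date.fromisoformat(row["date"]),
        "serial_number": row["serial_number"],
        "model": row["model"],
        "failure": failure,
    }


def load_partitioned(text: str) -> dict[date, list[dict]]:
    """Group rows by day, mimicking a date-partitioned warehouse table."""
    partitions: dict[date, list[dict]] = defaultdict(list)
    seen: set[tuple[date, str]] = set()
    for raw in csv.DictReader(io.StringIO(text)):
        row = parse_row(raw)
        key = (row["date"], row["serial_number"])
        if key in seen:
            raise ValueError(f"duplicate drive-day {key}")  # like a dbt unique test
        seen.add(key)
        partitions[row["date"]].append(row)
    return dict(partitions)


def annualized_failure_rate(partitions: dict[date, list[dict]]) -> dict[str, float]:
    """AFR (%) per model = failures / (drive days / 365) * 100."""
    drive_days: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for rows in partitions.values():
        for row in rows:
            drive_days[row["model"]] += 1
            failures[row["model"]] += row["failure"]
    return {m: failures[m] / (drive_days[m] / 365) * 100 for m in drive_days}


parts = load_partitioned(SAMPLE)
print(sorted(parts))                   # two daily partitions
print(annualized_failure_rate(parts))  # ModelB has 1 failure over 2 drive days
```

On four sample rows the AFR value is meaningless; the point is the shape of the pipeline. In the stack described above, partitioning and tests would normally live in the warehouse and in dbt rather than in application code.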

  4. OMG, Moldova! 🌍 Apparently, this tiny country is not just good at #Eurovision, but also at breaking data pipelines. 😂 Who knew geopolitical drama could sneak into our AWS #Redshift like a bad soap opera? 🎭📉
    avraam.dev/blog/moldova-broke- #Moldova #dataAWS #geopoliticaldrama #datapipelines #HackerNews #ngated

    Shifting Left delivers clean, reliable, and accessible data to everyone who needs it, right when they need it.

    The result? Less complexity, lower overhead, and far less break-fix work, freeing teams to focus on higher-value problems.

    At the core of a #ShiftLeft strategy are Data Products. They form the backbone of healthy data communication and ensure quality is built in, not patched on later.

    📖 Great insights from this #InfoQ article on rethinking the Medallion Architecture: bit.ly/3WHjxsf

    #SoftwareArchitecture #DataMesh #DataEngineering #DataLake #DataPipelines
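Here is a minimal sketch of the shift-left idea. It assumes a hypothetical `DataContract` that a producing team publishes with its data product. Records are validated when they are published, so malformed data is rejected at the source. Without this, it would have to be repaired later in the bronze, silver, or gold layers of a Medallion Architecture. The class and method names are invented for illustration and are not taken from the InfoQ article.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class DataContract:
    """HYPOTHETICAL contract a producing team publishes alongside its data product."""
    name: str
    version: int
    fields: dict[str, type]                 # required field -> expected Python type
    nullable: frozenset[str] = frozenset()  # fields that may legitimately be None

    def violations(self, record: dict[str, Any]) -> list[str]:
        problems = []
        for fname, ftype in self.fields.items():
            if fname not in record:
                problems.append(f"missing field {fname!r}")
            elif record[fname] is None:
                if fname not in self.nullable:
                    problems.append(f"{fname!r} is null")
            elif not isinstance(record[fname], ftype):
                problems.append(f"{fname!r} should be {ftype.__name__}")
        unknown = set(record) - set(self.fields)
        if unknown:
            problems.append(f"unexpected fields {sorted(unknown)}")
        return problems


@dataclass
class DataProduct:
    """Validates at the source ("shift left") instead of cleaning up in downstream layers."""
    contract: DataContract
    published: list[dict[str, Any]] = field(default_factory=list)
    rejected: list[tuple[dict[str, Any], list[str]]] = field(default_factory=list)

    def publish(self, record: dict[str, Any]) -> bool:
        problems = self.contract.violations(record)
        if problems:
            # The producer sees and fixes these; downstream consumers never receive them.
            self.rejected.append((record, problems))
            return False
        self.published.append(record)
        return True


orders = DataProduct(DataContract(
    name="orders", version=1,
    fields={"order_id": int, "amount": float, "coupon": str},
    nullable=frozenset({"coupon"}),
))
orders.publish({"order_id": 1, "amount": 9.99, "coupon": None})
orders.publish({"order_id": "2", "amount": 5.0, "coupon": "SPRING"})
print(len(orders.published), orders.rejected[0][1])
```

Returning the full list of violations, rather than raising on the first one, lets the producing team see every contract breach in a single pass.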

  9. "The release served as a crucial turning point for the project. Downloads from its GitHub repository increased, and more enterprises adopted the software. Encouraged by this growth, the team envisioned the next generation of Airflow: a modular architecture, a more modern user interface, and a “run anywhere, anytime” feature, enabling it to operate on premises, in the cloud, or on edge devices and handle event-driven and ad hoc scenarios in addition to scheduled tasks. The team delivered on this vision with the launch of Airflow 3.0 last April.

    “It was amazing that we managed to ‘rebuild the plane while flying it’ when we worked on Airflow 3—even if we had some temporary issues and glitches,” says Jarek Potiuk, one of the foremost contributors to Airflow and now a member of its project-management committee. “We had to refactor and move a lot of pieces of the software while keeping Airflow 2 running and providing some bug fixes for it.”

    Compared with Airflow’s second version, which Koka says had only a few hundred to a thousand downloads per month on GitHub, “now we’re averaging somewhere between 35 to 40 million downloads a month,” he says. The project’s community also soared, with more than 3,000 developers of all skill levels from around the world contributing to Airflow."

    spectrum.ieee.org/apache-airfl

    #AirFlow #ApacheAirflow #AirBnB #OpenSource #FLOSS #WorkflowOrchestrator #Python #DataPipelines

  10. Shifting Left isn’t just a buzzword; it’s the foundation for efficiency in your organization!

    By making clean, reliable, and accessible data available across your organization, you reduce complexity and unlock time to focus on higher-value work.

    💡 Data products are the foundation of this #ShiftLeft, enabling healthy, scalable data communication.

    📖 Dive into the details in the #InfoQ article: bit.ly/3WHjxsf

    #SoftwareArchitecture #DataMesh #DataLake #DataPipelines #ETL

  11. A #ShiftLeft approach to #DataProcessing relies on data products, which form the basis of data communication across the business.

    This addresses many flaws in traditional data processing and makes data more relevant, complete, and trustworthy.

    #InfoQ article: bit.ly/3WHjxsf

    #SoftwareArchitecture #DataMesh #DataLake #DataPipelines #ETL

  12. Fivetran hauls in $565M on $5.6B valuation, acquires competitor HVR for $700M - Fivetran, the data connectivity startup, had a big day today. For starters it anno... - feedproxy.google.com/~r/Techcr #mergersandacquisitions #andreessenhorowitz #fundings&exits #recentfunding #datapipelines #enterprise #startups #fivetran #cloud #exit #ma