#data-engineering — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #data-engineering, aggregated by home.social.
-
The Postgres "lakehouse" pitch is now a real product: Snowflake open-sourced pg_lake (Iceberg access, with Postgres as the catalog), and Databricks took Lakebase to GA.
The question for your team is not the keynote. It is whether Postgres-over-Iceberg replaces your analytics stack or just moves the complexity.
-
Learn how configuration-driven architecture helps multi-state Medicaid platforms scale by handling X12 834 variation without hardcoded logic. https://hackernoon.com/building-medicaid-data-platforms-that-scale-across-states #dataengineering
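A minimal sketch of what "configuration-driven" could look like, assuming the per-state differences can be expressed as data rather than as branches in code. The state codes `AA` and `BB`, the `STATE_CONFIG` table, and the choice of REF qualifier and separator per state are invented for illustration; they are not taken from the linked article or from any state's companion guide.

```python
# Hypothetical per-state settings: which REF qualifier carries the member ID
# and which element separator the file uses. All values are illustrative.
STATE_CONFIG = {
    "AA": {"member_id_qualifier": "0F", "element_sep": "*"},
    "BB": {"member_id_qualifier": "23", "element_sep": "|"},
}


def extract_member_id(segments, state):
    """Return the member ID from REF segments using the state's config."""
    cfg = STATE_CONFIG[state]
    for segment in segments:
        # Drop the segment terminator, then split into elements.
        elements = segment.rstrip("~").split(cfg["element_sep"])
        is_ref = elements[0] == "REF" and len(elements) >= 3
        if is_ref and elements[1] == cfg["member_id_qualifier"]:
            return elements[2]
    return None


# Two made-up enrollment fragments that differ only in layout.
sample_aa = ["INS*Y*18*021", "REF*0F*A12345~", "REF*23*ZZZ~"]
sample_bb = ["INS|Y|18|021", "REF|0F|IGNORED~", "REF|23|B67890~"]

print(extract_member_id(sample_aa, "AA"))
print(extract_member_id(sample_bb, "BB"))
```

Onboarding another state in this sketch means adding a dictionary entry, not a new code path.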
-
Candidates *running away*? *Simplify your interviews*. #Recrutement #Stratégie #RH #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-strataezgie-rh-share-7476898805972516864-F6Qv/
-
A Data Engineer: *the architect of your data*. #DataEngineering #Architecture #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-architecture-expertise-share-7476896269290938368-6pc7/
-
Job posting = *a dictionary of anglicisms*? *Fix it*. #Recrutement #Tech #Humour #Cloud #Digital #Numérique #Data #DataEngineer #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-tech-humour-share-7476611561827143680-Ts8w/
-
5 *costly* mix-ups in tech recruiting. #Recrutement #Tech #Précision #Stratégie #Java #JavaScript #Shell #PowerShell #Numérique #Digital #DataEngineer #DataEngineering #Cloud #OnPremise ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-tech-praezcision-share-7476590244621004800-ENSs/
-
In most tiering systems, cold data is read-only. A GDPR deletion on archived rows means restore-delete-rearchive: a half-day job.
ColdFront: UPDATE or DELETE archived rows with one SQL statement.
HFS Research analyst Ashish Chaturvedi, in Anirban Ghoshal's InfoWorld piece today. ColdFront appears alongside Databricks, Snowflake & EDB on the OLTP/OLAP divide. The only 100% #OpenSource option, with #Postgres as the interface.
-
Is your pipeline *spaghetti*? I'll *untangle* it. #DataEngineering #Pipelines #Humour #Tech #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-pipelines-humour-share-7475792877206421504-_eeq/
-
Is your data a *desert*? Make it *bloom*. #DataEngineering #QualitéDesDonnées #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-qualitaezdesdonnaezes-expertise-share-7475792070998409216-xo74/
-
Module 1 of LLM Zoomcamp is done! 🎉
I turned my original RAG pipeline into an Agent!
I spent these last few days diving deep into Agentic RAG. It's been fascinating to build it step by step. Every time I ask the LLM to learn about something new, I see how it naturally figures out which tools to use, when to search, and how many times to gather info before giving me a solid answer.
What exactly is Agentic RAG?
It’s like giving the AI a brain that can actually act. Instead of just retrieving from a fixed knowledge base, the model decides whether it needs external tools first, gathers what it needs, and then answers. It’s pretty interesting to understand how it actually works behind the scenes!
Why does this matter?
A few days ago I asked for a detailed guide on using the OpenAI Python library with the chat.completions API. The local LLM called web search multiple times until it had enough context and built something useful from those pieces. Now that I am building these systems, I can finally understand why it does what it does.
💡 Insights from this week:
- Building a static pipeline is a great start, but to make something truly flexible, you need function or tool calling. It lets the LLM look at the question first and decide whether it needs to search a knowledge base before answering.
- I used to think "chunking" was just about breaking up text. Turns out it can reduce token input by 3x! 🤯
- You have to learn how to walk before you run. Starting small, understanding each component manually, and seeing how the pieces fit together… it felt slow at first but worth it. Now I’m able to accelerate with agent frameworks like toyaikit, LangChain, PydanticAI, or OpenAI Agents.
- There is definitely a learning curve with the API syntax. Between the new Responses API and Chat Completions, tool responses are structured differently and you have to adjust your code accordingly. Frustrating at times, but also a great way to learn!
Quick takeaway:
It is best to start simple, then add complexity only when needed. Sometimes an agent can burn tokens unnecessarily, so only add that layer if your problem really needs it!
Had a lot of fun with this module and I’m already curious about what’s next. If you’re interested in learning along, this is the full free course from Alexey at DataTalks.Club: https://github.com/DataTalksClub/llm-zoomcamp/
Anyone else tinkering with LLM agents lately? What kind of projects are you exploring or trying out? Would love to hear where your journey is heading!
#ai #localai #llm #mastodon #fediverse #buildinpublic #linux #github #aiengineering #DataEngineering
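The decide-then-search loop described in this post can be sketched roughly as follows. This is a toy under stated assumptions, not the course's code: `fake_llm` stands in for a real model call, and `KNOWLEDGE_BASE`, `search`, and `run_agent` are hypothetical names made up for the illustration.

```python
# Toy knowledge base and search tool; both are stand-ins for a real index.
KNOWLEDGE_BASE = {
    "chunking": "Chunking splits documents into smaller pieces before indexing.",
    "rag": "RAG retrieves context and adds it to the prompt before answering.",
}


def search(query):
    """Return entries whose key appears in the query (naive keyword match)."""
    q = query.lower()
    return [text for key, text in KNOWLEDGE_BASE.items() if key in q]


def fake_llm(question, context):
    """Stand-in for a model call: requests a search until it has context."""
    if not context:
        return {"type": "tool_call", "name": "search",
                "arguments": {"query": question}}
    return {"type": "answer", "text": " ".join(context)}


# Registry of tools the "model" is allowed to request by name.
TOOLS = {"search": search}


def run_agent(question, max_steps=3):
    """Loop: let the model decide, run the requested tool, feed results back."""
    context = []
    for _ in range(max_steps):
        decision = fake_llm(question, context)
        if decision["type"] == "answer":
            return decision["text"]
        results = TOOLS[decision["name"]](**decision["arguments"])
        if not results:
            return "I could not find anything relevant."
        context.extend(results)
    return "Step limit reached without an answer."


print(run_agent("What is chunking?"))
```

The `max_steps` cap is one simple guard against the unnecessary token burn mentioned in the takeaway: the loop stops even if the model keeps asking for tools.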
-
Meet #OpenAI’s Kepler - an internal AI data analyst that operates across 600+ petabytes of data and 70,000+ datasets daily.
Learn how OpenAI combines MCP, RAG & vector search over platform metadata to power an autonomous agent that can discover datasets, generate complex queries, investigate anomalies, and deliver insights in natural language.
🎬 Watch now: https://bit.ly/4vtmVGF
-
Lessons learned: *how to rescue a data project*. #DataEngineering #RetourExpérience #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-retourexpaezrience-expertise-share-7475449675840630784-njtJ/
-
Data tools *too complex*? *Simplify them*. #DataEngineering #Outils #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-outils-expertise-share-7475446975291797504-fIUx/
-
Data Engineering: *3 metrics to track*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7475083205684056064-u_cs/
-
Is your process *scary*? *Declutter it*. #Recrutement #RH #Tech #DataEngineering #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-rh-tech-share-7475073800469073920-j5Dh/
-
Data Engineer: *a staircase that takes you far*. #DataEngineering #Carrière #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-carriaeyre-expertise-share-7475072337168752640-_Btu/
-
Prototyping AI with #PostgreSQL is easy. Production is where teams get stuck.
Mike Josephson (pgEdge) covers the full open source stack - MCP Server, RAG Server, AI DBA Workbench - and the Q&A goes deep: why a dedicated MCP server vs. direct LLM access? - enterprise controls, TSV optimization, semantic caching.
Live demo runs fully local on Ollama + Gemma 4 31B. No data leaving the machine.
Watch the on-demand replay: 🎙️ https://pages.pgedge.com/postgresworld-webinar-postgres-series-ai-dba-workbench-for-postgresql-a-technical-walkthrough-02efc371-9c20-4ba0-86f4-faea117d7a2f
-
Cold data is read-only - that's the assumption baked into basically every tiering solution on the market. ColdFront breaks it.
UPDATE & DELETE on archived rows work through standard SQL. A GDPR deletion on five-year-old events is a single DELETE statement. No restore cycle.
DuckDB in-process, Apache Iceberg on any S3. Stock unpatched #PostgreSQL 16/17/18. Beta now, PostgreSQL License. Led by Jimmy Angelakos.
-
Data Engineering: *the metrics that matter*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7474762948939485184-f_h7/
-
Useless metrics? *Focus on impact*. #Data #Métriques #Stratégie #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_data-maeztriques-strataezgie-activity-7474756560473341952-YQv2
-
🚨 Our blog cuts deep this week. The untold truth about running the Databricks platform behind the scenes.
💡 Azure Databricks Governance with Terraform and Databricks Bundles
Read the full story here:
🔗 https://www.nextlytics.com/blog/azure-databricks-governance-with-terraform-and-databricks-bundles
#databricks #azuredatabricks #terraform #platformengineering #dataengineering #iac #datascience #businessintelligence #enterprisedata
-
pgEdge ColdFront: #PostgreSQL data tiering. Hot data in the heap, cold to Apache Iceberg on S3 - up to 90% lower storage cost.
The cold tier is writable. UPDATE & DELETE on cold rows work in standard SQL. No restore cycle, no rehydration. No app changes.
DuckDB runs in-process. No daemon, no sidecar. PostgreSQL License, beta now. Led by @vyruss.
Press release: 👉 https://www.pgedge.com/press-releases/pgedge-announces-coldfront-for-postgresql
GitHub: 🔗 https://github.com/pgEdge/coldfront
-
Is your data a *mess*? I'll tidy it up. #DataEngineering #BigData #Humour #Tech #Bioinformatique ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-humour-share-7473642098735534080-SIrr/
-
Data engineering: *the art of cooking information*. #DataEngineering #BigData #Luxe #Expertise #Tech ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-luxe-share-7473639217156399106-rAtX/
-
🧊 pgEdge ColdFront beta is out - transparent data tiering for #PostgreSQL. Fully writable cold tier.
Hot data stays in the heap. Cold data moves to Iceberg on S3 at up to 90% lower cost. UPDATE & DELETE on archived rows, same SQL. No rehydration. No code changes.
DuckDB runs in-process - no daemon, no RPC. C extension routes DML to the correct tier transparently.
Development by Jimmy Angelakos. Blog by Antony Pegg: 📖 https://www.pgedge.com/blog/introducing-coldfront-seamlessly-uniting-oltp-analytics-and-ai-workloads-on-postgresql
-
🛠️ I spent 10 hours last week mapping out a RAG pipeline. Three things became clear:
Week 1 of building this from scratch was intense but rewarding. I wanted to understand how the pieces fit together before jumping into implementation. That investment paid off.
The pipeline itself brings together three main components: a knowledge base search index, a prompt that includes user input plus retrieved context, and the LLM itself. Pretty neat stuff once you see how they connect.
What stood out most was the value of modular design. You can swap models, clients, or data indexes down the road without rewriting everything; huge win for long-term flexibility. I especially like this approach because it keeps things adaptable as needs change.
As an extra challenge, I’m experimenting with Local AI instead of cloud APIs. The appeal is privacy, which has always mattered to me. So far it’s as simple as switching the API response call. No major overhaul needed; just a tweak in where responses come from.
💡 Insights from Week 1:
- RAG might be older technology, but it remains popular because it grounds responses in context and reduces hallucinations
- Worth understanding the trade-offs between in-memory vs. persistent databases before choosing an architecture
- Modular design makes future swaps easier when you need them; less technical debt over time
- Some models require different API calls (chat completions vs. standard responses), so compatibility matters
- Clean structured data is key; it’s often the most time-consuming part of building a RAG app
It’s a lot to unpack, but also pretty exciting. If you're navigating something interesting in your learning or work right now, I'd love to hear about it.
The build is a work in progress; feel free to review my GitHub repo here: https://github.com/ammartin8/llm_zoomcamp_portfolio
#ai #localai #DataEngineering #mastodon #fediverse #generativeAI #linux #buildinpublic #github
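The three components named in this post (search index, prompt, LLM) and the modular-design point can be sketched as below. This is an illustrative toy, not the code in the linked repo: `DOCS`, `keyword_search`, `build_prompt`, `echo_llm`, and `rag` are hypothetical names, and `echo_llm` is a stub in place of a real local or cloud model.

```python
from typing import Callable, List

# Tiny in-memory "knowledge base" used only for this illustration.
DOCS = [
    "Postgres is a relational database.",
    "Iceberg is an open table format for large analytic datasets.",
]


def keyword_search(query: str, docs: List[str] = DOCS) -> List[str]:
    """Naive search: keep documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().rstrip(".").split())]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine the user question with the retrieved context."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def echo_llm(prompt: str) -> str:
    """Stub model; a real client call would go here."""
    return f"[stub answer based on {len(prompt)} prompt characters]"


def rag(question: str,
        search: Callable[[str], List[str]] = keyword_search,
        llm: Callable[[str], str] = echo_llm) -> str:
    """Search, build the prompt, call the model; each part is swappable."""
    context = search(question)
    prompt = build_prompt(question, context)
    return llm(prompt)


print(rag("iceberg format"))
```

Because `search` and `llm` are plain function arguments, switching from a cloud API to a local model in this sketch means passing a different `llm` callable and leaving the rest untouched.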