#data-engineering — Public Fediverse posts on home.social

PGX @pgexperts · 2026-06-29 · 17:54 UTC

The Postgres "lakehouse" pitch is now real product: Snowflake open-sourced pg_lake (Iceberg access, with Postgres as the catalog), and Databricks took Lakebase to GA.

The question for your team is not the keynote. It is whether Postgres-over-Iceberg replaces your analytics stack or just moves the complexity.

pg_lake: https://github.com/Snowflake-Labs/pg_lake

https://pgexperts.com

#PostgreSQL #postgres #DataEngineering

#postgresql #postgres #dataengineering

PGX @[email protected] · 2026-06-29 · 17:54 UTC

The Postgres "lakehouse" pitch is now real product: Snowflake open-sourced pg_lake (Iceberg access, with Postgres as the catalog), and Databricks took Lakebase to GA.

The question for your team is not the keynote. It is whether Postgres-over-Iceberg replaces your analytics stack or just moves the complexity.

pg_lake: https://github.com/Snowflake-Labs/pg_lake

https://pgexperts.com

#PostgreSQL #postgres #DataEngineering

#postgresql #postgres #dataengineering

HackerNoon @[email protected] · 2026-06-28 · 18:43 UTC

Learn how configuration-driven architecture helps multi-state Medicaid platforms scale by handling X12 834 variation without hardcoded logic. https://hackernoon.com/building-medicaid-data-platforms-that-scale-across-states #dataengineering

#dataengineering

HackerNoon @[email protected] · 2026-06-28 · 18:43 UTC

Learn how configuration-driven architecture helps multi-state Medicaid platforms scale by handling X12 834 variation without hardcoded logic. https://hackernoon.com/building-medicaid-data-platforms-that-scale-across-states #dataengineering

#dataengineering

gaby_wald @[email protected] · 2026-06-28 · 07:27 UTC

Candidats *en fuite* ? *Simplifiez vos entretiens*. #Recrutement #Stratégie #RH #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-strataezgie-rh-share-7476898805972516864-F6Qv/

#recrutement #strategie #rh #tech #dataengineering

gaby_wald @[email protected] · 2026-06-28 · 07:27 UTC

Candidats *en fuite* ? *Simplifiez vos entretiens*. #Recrutement #Stratégie #RH #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-strataezgie-rh-share-7476898805972516864-F6Qv/

#recrutement #strategie #rh #tech #dataengineering

gaby_wald @[email protected] · 2026-06-28 · 07:17 UTC

Un Data Engineer : *l’architecte de vos données*. #DataEngineering #Architecture #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-architecture-expertise-share-7476896269290938368-6pc7/

#dataengineering #architecture #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-28 · 07:17 UTC

Un Data Engineer : *l’architecte de vos données*. #DataEngineering #Architecture #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-architecture-expertise-share-7476896269290938368-6pc7/

#dataengineering #architecture #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-27 · 12:27 UTC

Offre d’emploi = *dictionnaire d’anglicismes* ? *Corrigez-la*. #Recrutement #Tech #Humour #Cloud #Digital #Numérique #Data #DataEngineer #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-tech-humour-share-7476611561827143680-Ts8w/

#recrutement #tech #humour #cloud #digital #numerique

gaby_wald @[email protected] · 2026-06-27 · 12:27 UTC

Offre d’emploi = *dictionnaire d’anglicismes* ? *Corrigez-la*. #Recrutement #Tech #Humour #Cloud #Digital #Numérique #Data #DataEngineer #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-tech-humour-share-7476611561827143680-Ts8w/

#recrutement #tech #humour #cloud #digital #numerique

pgEdge Postgres @[email protected] · 2026-06-26 · 19:43 UTC

In most tiering systems, cold data is read-only. A GDPR deletion on archived rows means restore-delete-rearchive: a half-day job.

ColdFront: UPDATE or DELETE archived rows with one SQL statement.
HFS Research analyst Ashish Chaturvedi, in Anirban Ghoshal's InfoWorld piece today. ColdFront appears alongside Databricks, Snowflake & EDB on the OLTP/OLAP divide. The only 100% #OpenSource option, with #Postgres as the interface.

📖 https://hubs.la/Q04mQNSw0

#PostgreSQL #ApacheIceberg #DataEngineering

#opensource #postgres #postgresql #apacheiceberg #dataengineering

pgEdge Postgres @[email protected] · 2026-06-26 · 19:43 UTC

In most tiering systems, cold data is read-only. A GDPR deletion on archived rows means restore-delete-rearchive: a half-day job.

ColdFront: UPDATE or DELETE archived rows with one SQL statement.
HFS Research analyst Ashish Chaturvedi, in Anirban Ghoshal's InfoWorld piece today. ColdFront appears alongside Databricks, Snowflake & EDB on the OLTP/OLAP divide. The only 100% #OpenSource option, with #Postgres as the interface.

📖 https://hubs.la/Q04mQNSw0

#PostgreSQL #ApacheIceberg #DataEngineering

#opensource #postgres #postgresql #apacheiceberg #dataengineering

gaby_wald @[email protected] · 2026-06-25 · 06:13 UTC

Votre pipeline est un *spaghetti* ? Je le *démêle*. #DataEngineering #Pipelines #Humour #Tech #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-pipelines-humour-share-7475792877206421504-_eeq/

#dataengineering #pipelines #humour #tech #nobullshit

gaby_wald @[email protected] · 2026-06-25 · 06:13 UTC

Votre pipeline est un *spaghetti* ? Je le *démêle*. #DataEngineering #Pipelines #Humour #Tech #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-pipelines-humour-share-7475792877206421504-_eeq/

#dataengineering #pipelines #humour #tech #nobullshit

gaby_wald @[email protected] · 2026-06-25 · 06:10 UTC

Vos données sont un *désert* ? Faites-les *fleurir*. #DataEngineering #QualitéDesDonnées #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-qualitaezdesdonnaezes-expertise-share-7475792070998409216-xo74/

#dataengineering #qualitedesdonnees #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-25 · 06:10 UTC

Vos données sont un *désert* ? Faites-les *fleurir*. #DataEngineering #QualitéDesDonnées #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-qualitaezdesdonnaezes-expertise-share-7475792070998409216-xo74/

#dataengineering #qualitedesdonnees #expertise #tech #luxe

amah_codes @[email protected] · 2026-06-24 · 20:30 UTC

Module 1 of LLM Zoomcamp is done! 🎉

I turned my original RAG pipeline into an Agent!

I spent these last few days diving deep into Agentic RAG. It's been fascinating to build it step by step. Every time I ask the LLM to learn about something new, I see how it naturally figures out which tools to use, when to search, and how many times to gather info before giving me a solid answer.

What exactly is Agentic RAG?
It’s like giving the AI a brain that can actually act. Instead of just retrieving from a fixed knowledge base, the model decides whether it needs external tools first, gathers what it needs, and then answers. It’s pretty interesting to understand how it actually works behind the scenes!

Why does this matter?
A few days ago I asked for a detailed guide on using the OpenAI Python library with the chat.completion API. The Local LLM called web search multiple times until it had enough context and built something useful from those pieces. Now that I am building these systems, I can finally understand why it does what it does.

💡 Insights from this week:
- Building a static pipeline is a great start, but to make something truly flexible, you need function or tool calling. It lets the LLM look at the question first and decide whether it needs to search a knowledge base before answering.
- I used to think "chunking" was just about breaking up text. Turns out it can reduce token input by 3x! 🤯
- You have to learn how to walk before you run. Starting small, understanding each component manually, and seeing how the pieces fit together… it felt slow at first but worth it. Now I’m able to accelerate with agent frameworks like toyaikit, LangChain, PydanticAI, or OpenAI Agents.
- There is definitely a learning curve with the API syntax. Between the new response API and chat completions, tool responses are structured differently and you have to adjust your code accordingly. Frustrating at times, but also a great way to learn!

Quick takeaway:
It is best to start simple, then add complexity only when needed. Sometimes an agent can burn tokens unnecessarily, so only add that layer if your problem really needs it!

Had a lot of fun with this module and I’m already curious about what’s next. If you’re interested in learning along, this is the full free course Alexey at the Data Talks Club: https://github.com/DataTalksClub/llm-zoomcamp/

Anyone else tinkering with LLM agents lately? What kind of projects are you exploring or trying out? Would love to hear where your journey is heading!

#ai #localai #llm #mastodon #fediverse #buildinpublic #linux #github #aiengineering #DataEngineering

#ai #localai #llm #mastodon #fediverse #buildinpublic

amah_codes @amah_codes · 2026-06-24 · 20:30 UTC

Module 1 of LLM Zoomcamp is done! 🎉

I turned my original RAG pipeline into an Agent!

I spent these last few days diving deep into Agentic RAG. It's been fascinating to build it step by step. Every time I ask the LLM to learn about something new, I see how it naturally figures out which tools to use, when to search, and how many times to gather info before giving me a solid answer.

What exactly is Agentic RAG?
It’s like giving the AI a brain that can actually act. Instead of just retrieving from a fixed knowledge base, the model decides whether it needs external tools first, gathers what it needs, and then answers. It’s pretty interesting to understand how it actually works behind the scenes!

Why does this matter?
A few days ago I asked for a detailed guide on using the OpenAI Python library with the chat.completion API. The Local LLM called web search multiple times until it had enough context and built something useful from those pieces. Now that I am building these systems, I can finally understand why it does what it does.

💡 Insights from this week:
- Building a static pipeline is a great start, but to make something truly flexible, you need function or tool calling. It lets the LLM look at the question first and decide whether it needs to search a knowledge base before answering.
- I used to think "chunking" was just about breaking up text. Turns out it can reduce token input by 3x! 🤯
- You have to learn how to walk before you run. Starting small, understanding each component manually, and seeing how the pieces fit together… it felt slow at first but worth it. Now I’m able to accelerate with agent frameworks like toyaikit, LangChain, PydanticAI, or OpenAI Agents.
- There is definitely a learning curve with the API syntax. Between the new response API and chat completions, tool responses are structured differently and you have to adjust your code accordingly. Frustrating at times, but also a great way to learn!

Quick takeaway:
It is best to start simple, then add complexity only when needed. Sometimes an agent can burn tokens unnecessarily, so only add that layer if your problem really needs it!

Had a lot of fun with this module and I’m already curious about what’s next. If you’re interested in learning along, this is the full free course Alexey at the Data Talks Club: https://github.com/DataTalksClub/llm-zoomcamp/

Anyone else tinkering with LLM agents lately? What kind of projects are you exploring or trying out? Would love to hear where your journey is heading!

#ai #localai #llm #mastodon #fediverse #buildinpublic #linux #github #aiengineering #DataEngineering

#ai #localai #llm #mastodon #fediverse #buildinpublic

InfoQ @[email protected] · 2026-06-24 · 10:11 UTC

Meet #OpenAI’s Kepler - an internal AI data analyst that operates across 600+ petabytes of data and 70,000+ datasets daily.

Learn how OpenAI combines MCP, RAG & vector search over platform metadata to power an autonomous agent that can discover datasets, generate complex queries, investigate anomalies, and deliver insights in natural language.

🎬 Watch now: https://bit.ly/4vtmVGF

#AIAgents #DataEngineering #AI

#openai #aiagents #dataengineering #ai

InfoQ @infoq · 2026-06-24 · 10:11 UTC

Meet #OpenAI’s Kepler - an internal AI data analyst that operates across 600+ petabytes of data and 70,000+ datasets daily.

Learn how OpenAI combines MCP, RAG & vector search over platform metadata to power an autonomous agent that can discover datasets, generate complex queries, investigate anomalies, and deliver insights in natural language.

🎬 Watch now: https://bit.ly/4vtmVGF

#AIAgents #DataEngineering #AI

#openai #aiagents #dataengineering #ai

gaby_wald @[email protected] · 2026-06-24 · 07:29 UTC

Retour d’expérience : *comment sauver un projet Data*. #DataEngineering #RetourExpérience #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-retourexpaezrience-expertise-share-7475449675840630784-njtJ/

#dataengineering #retourexperience #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-24 · 07:29 UTC

Retour d’expérience : *comment sauver un projet Data*. #DataEngineering #RetourExpérience #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-retourexpaezrience-expertise-share-7475449675840630784-njtJ/

#dataengineering #retourexperience #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-24 · 07:19 UTC

Outils data *trop complexes* ? *Simplifiez-les*. #DataEngineering #Outils #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-outils-expertise-share-7475446975291797504-fIUx/

#dataengineering #outils #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-24 · 07:19 UTC

Outils data *trop complexes* ? *Simplifiez-les*. #DataEngineering #Outils #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-outils-expertise-share-7475446975291797504-fIUx/

#dataengineering #outils #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-23 · 07:13 UTC

Data Engineering : *3 métriques à suivre*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7475083205684056064-u_cs/

#dataengineering #metriques #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-23 · 07:13 UTC

Data Engineering : *3 métriques à suivre*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7475083205684056064-u_cs/

#dataengineering #metriques #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-23 · 06:37 UTC

Votre processus fait *peur* ? *Désencombrez-le*. #Recrutement #RH #Tech #DataEngineering #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-rh-tech-share-7475073800469073920-j5Dh/

#recrutement #rh #tech #dataengineering #nobullshit

gaby_wald @[email protected] · 2026-06-23 · 06:37 UTC

Votre processus fait *peur* ? *Désencombrez-le*. #Recrutement #RH #Tech #DataEngineering #NoBullshit ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-rh-tech-share-7475073800469073920-j5Dh/

#recrutement #rh #tech #dataengineering #nobullshit

gaby_wald @[email protected] · 2026-06-23 · 06:31 UTC

Data Engineer : *un escalier qui mène loin*. #DataEngineering #Carrière #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-carriaeyre-expertise-share-7475072337168752640-_Btu/

#dataengineering #carriere #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-23 · 06:31 UTC

Data Engineer : *un escalier qui mène loin*. #DataEngineering #Carrière #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-carriaeyre-expertise-share-7475072337168752640-_Btu/

#dataengineering #carriere #expertise #tech #luxe

pgEdge Postgres @[email protected] · 2026-06-22 · 18:09 UTC

Prototyping AI with #PostgreSQL is easy. Production is where teams get stuck.

Mike Josephson (pgEdge) covers the full open source stack - MCP Server, RAG Server, AI DBA Workbench - and the Q&A goes deep: why a dedicated MCP server vs. direct LLM access? - enterprise controls, TSV optimization, semantic caching.

Live demo runs fully local on Ollama + Gemma 4 31B. No data leaving the machine.

Watch the on-demand replay: 🎙️ https://pages.pgedge.com/postgresworld-webinar-postgres-series-ai-dba-workbench-for-postgresql-a-technical-walkthrough-02efc371-9c20-4ba0-86f4-faea117d7a2f

#OpenSource #AI #MCP #DataEngineering #tech #llm

#postgresql #opensource #ai #mcp #dataengineering #tech

pgEdge Postgres @[email protected] · 2026-06-22 · 18:09 UTC

Prototyping AI with #PostgreSQL is easy. Production is where teams get stuck.

Mike Josephson (pgEdge) covers the full open source stack - MCP Server, RAG Server, AI DBA Workbench - and the Q&A goes deep: why a dedicated MCP server vs. direct LLM access? - enterprise controls, TSV optimization, semantic caching.

Live demo runs fully local on Ollama + Gemma 4 31B. No data leaving the machine.

Watch the on-demand replay: 🎙️ https://pages.pgedge.com/postgresworld-webinar-postgres-series-ai-dba-workbench-for-postgresql-a-technical-walkthrough-02efc371-9c20-4ba0-86f4-faea117d7a2f

#OpenSource #AI #MCP #DataEngineering #tech #llm

#postgresql #opensource #ai #mcp #dataengineering #tech

pgEdge Postgres @[email protected] · 2026-06-22 · 18:07 UTC

Cold data is read-only - that's the assumption baked into basically every tiering solution on the market. ColdFront breaks it.

UPDATE & DELETE on archived rows work through standard SQL. A GDPR deletion on five-year-old events is a single DELETE statement. No restore cycle.

DuckDB in-process, Apache Iceberg on any S3. Stock unpatched #PostgreSQL 16/17/18. Beta now, PostgreSQL License. Led by Jimmy Angelakos.

📖 https://github.com/pgEdge/coldfront

#OpenSource #DataEngineering #ApacheIceberg #DuckDB

#postgresql #opensource #dataengineering #apacheiceberg #duckdb

pgEdge Postgres @[email protected] · 2026-06-22 · 18:07 UTC

Cold data is read-only - that's the assumption baked into basically every tiering solution on the market. ColdFront breaks it.

UPDATE & DELETE on archived rows work through standard SQL. A GDPR deletion on five-year-old events is a single DELETE statement. No restore cycle.

DuckDB in-process, Apache Iceberg on any S3. Stock unpatched #PostgreSQL 16/17/18. Beta now, PostgreSQL License. Led by Jimmy Angelakos.

📖 https://github.com/pgEdge/coldfront

#OpenSource #DataEngineering #ApacheIceberg #DuckDB

#postgresql #opensource #dataengineering #apacheiceberg #duckdb

gaby_wald @[email protected] · 2026-06-22 · 10:00 UTC

Data Engineering : *les métriques qui comptent*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7474762948939485184-f_h7/

#dataengineering #metriques #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-22 · 10:00 UTC

Data Engineering : *les métriques qui comptent*. #DataEngineering #Métriques #Expertise #Tech #Luxe ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-maeztriques-expertise-share-7474762948939485184-f_h7/

#dataengineering #metriques #expertise #tech #luxe

gaby_wald @[email protected] · 2026-06-22 · 09:54 UTC

Métriques inutiles ? *Concentrez-vous sur l’impact*. #Data #Métriques #Stratégie #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_data-maeztriques-strataezgie-activity-7474756560473341952-YQv2

#data #metriques #strategie #tech #dataengineering

gaby_wald @[email protected] · 2026-06-22 · 09:54 UTC

Métriques inutiles ? *Concentrez-vous sur l’impact*. #Data #Métriques #Stratégie #Tech #DataEngineering ... https://www.linkedin.com/posts/gabriel-chandesris_data-maeztriques-strataezgie-activity-7474756560473341952-YQv2

#data #metriques #strategie #tech #dataengineering

NextLytics AG @[email protected] · 2026-06-19 · 19:46 UTC

🚨 Our blog cuts deep this week. The untold truth about running the Databricks platform behind the scenes.

💡 Azure Databricks Governance with Terraform and Databricks Bundles

Read the full story here:

🔗 https://www.nextlytics.com/blog/azure-databricks-governance-with-terraform-and-databricks-bundles

#databricks #azuredatabricks #terraform #platformengineering #dataengineering #iac #datascience #businessintelligence #enterprisedata

#databricks #azuredatabricks #terraform #platformengineering #dataengineering #iac

NextLytics AG @[email protected] · 2026-06-19 · 19:46 UTC

🚨 Our blog cuts deep this week. The untold truth about running the Databricks platform behind the scenes.

💡 Azure Databricks Governance with Terraform and Databricks Bundles

Read the full story here:

🔗 https://www.nextlytics.com/blog/azure-databricks-governance-with-terraform-and-databricks-bundles

#databricks #azuredatabricks #terraform #platformengineering #dataengineering #iac #datascience #businessintelligence #enterprisedata

#databricks #azuredatabricks #terraform #platformengineering #dataengineering #iac

pgEdge Postgres @[email protected] · 2026-06-19 · 16:05 UTC

pgEdge ColdFront: #PostgreSQL data tiering. Hot data in the heap, cold to Apache Iceberg on S3 - up to 90% lower storage cost.

The cold tier is writable. UPDATE & DELETE on cold rows work in standard SQL. No restore cycle, no rehydration. No app changes.

DuckDB runs in-process. No daemon, no sidecar. PostgreSQL License, beta now. Led by @vyruss.

Press release: 👉 https://www.pgedge.com/press-releases/pgedge-announces-coldfront-for-postgresql

GitHub: 🔗 https://github.com/pgEdge/coldfront

#OpenSource #DataEngineering #ApacheIceberg #DuckDB

#postgresql #opensource #dataengineering #apacheiceberg #duckdb

pgEdge Postgres @[email protected] · 2026-06-19 · 16:05 UTC

pgEdge ColdFront: #PostgreSQL data tiering. Hot data in the heap, cold to Apache Iceberg on S3 - up to 90% lower storage cost.

The cold tier is writable. UPDATE & DELETE on cold rows work in standard SQL. No restore cycle, no rehydration. No app changes.

DuckDB runs in-process. No daemon, no sidecar. PostgreSQL License, beta now. Led by @vyruss.

Press release: 👉 https://www.pgedge.com/press-releases/pgedge-announces-coldfront-for-postgresql

GitHub: 🔗 https://github.com/pgEdge/coldfront

#OpenSource #DataEngineering #ApacheIceberg #DuckDB

#postgresql #opensource #dataengineering #apacheiceberg #duckdb

gaby_wald @[email protected] · 2026-06-19 · 07:46 UTC

Vos données sont un *bordel* ? Je les range. #DataEngineering #BigData #Humour #Tech #Bioinformatique ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-humour-share-7473642098735534080-SIrr/

#dataengineering #bigdata #humour #tech #bioinformatique

gaby_wald @[email protected] · 2026-06-19 · 07:46 UTC

Vos données sont un *bordel* ? Je les range. #DataEngineering #BigData #Humour #Tech #Bioinformatique ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-humour-share-7473642098735534080-SIrr/

#dataengineering #bigdata #humour #tech #bioinformatique

gaby_wald @[email protected] · 2026-06-19 · 07:37 UTC

L’ingénierie de la donnée : *l’art de cuisiner l’information*. #DataEngineering #BigData #Luxe #Expertise #Tech ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-luxe-share-7473639217156399106-rAtX/

#dataengineering #bigdata #luxe #expertise #tech

gaby_wald @[email protected] · 2026-06-19 · 07:37 UTC

L’ingénierie de la donnée : *l’art de cuisiner l’information*. #DataEngineering #BigData #Luxe #Expertise #Tech ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-bigdata-luxe-share-7473639217156399106-rAtX/

#dataengineering #bigdata #luxe #expertise #tech

pgEdge Postgres @[email protected] · 2026-06-18 · 14:04 UTC

🧊 pgEdge ColdFront beta is out - transparent data tiering for #PostgreSQL. Fully writable cold tier.

Hot data stays in the heap. Cold data moves to Iceberg on S3 at up to 90% lower cost. UPDATE & DELETE on archived rows, same SQL. No rehydration. No code changes.

DuckDB runs in-process - no daemon, no RPC. C extension routes DML to the correct tier transparently.

Development by Jimmy Angelakos. Blog by Antony Pegg: 📖 https://www.pgedge.com/blog/introducing-coldfront-seamlessly-uniting-oltp-analytics-and-ai-workloads-on-postgresql

#OpenSource #DataEngineering #ApacheIceberg #DuckDB

#postgresql #opensource #dataengineering #apacheiceberg #duckdb

amah_codes @amah_codes · 2026-06-16 · 16:51 UTC

🛠️I spent 10 hours last week mapping out a RAG pipeline. Three things became clear:

Week 1 of building this from scratch was intense but rewarding. I wanted to understand how the pieces fit together before jumping into implementation. That investment paid off.

The pipeline itself brings together three main components: a knowledge base search index, a prompt that includes user input plus retrieved context, and the LLM model itself. Pretty neat stuff once you see how they connect.

What stood out most was the value of modular design. You can swap models, clients, or data indexes down the road without rewriting everything; huge win for long-term flexibility. I especially like this approach because it keeps things adaptable as needs change.

As an extra challenge, I’m experimenting with Local AI instead of cloud APIs. The appeal is privacy, which has always mattered to me. So far it’s as simple as switching the API response call. No major overhaul needed; just a tweak in where responses come from.

💡 Insights from Week 1:

- RAG might be older technology, but it remains popular because it grounds responses in context and reduces hallucinations
- Worth understanding the trade-offs between in-memory vs. persistent databases before choosing an architecture
- Modular design makes future swaps easier when you need them; less technical debt over time
- Some models require different API calls (chat completions vs. standard responses), so compatibility matters
- Clean structured data is key; it’s often the most time-consuming part of building a RAG app

It’s a lot to unpack, but also pretty exciting. If you're navigating something interesting in your learning or work right now, I'd love to hear about it.

Build is a working progress, feel free to review my Github repo here: https://github.com/ammartin8/llm_zoomcamp_portfolio

#ai #localai #DataEngineering #mastodon #fediverse #generativeAI #linux #buildinpublic #github

#ai #localai #dataengineering #mastodon #fediverse #generativeai