“Sarah_Lea” — Fediverse search results on home.social

Sarah Lea @Sarah_Lea · 2025-03-11 · 19:37 UTC

Have you heard "it works on my machine"? Enter: containers. Learn how Docker ensures consistent ML models, data pipelines, and environments across any system in this article :blobcoffee: https://towardsdatascience.com/why-data-scientists-should-care-about-containers-and-stand-out-with-this-knowledge/

#docker #container #kubernetes #datascientist #dataengineering #dataengineers #datascience #virtualmachines

Sarah Lea @[email protected] · 2025-02-06 · 17:17 UTC

From Delivery to Smart Cities: Learn Pygame Simulation Basics :blobcoffee:

Try out a simple project with pygame to simplify more complex situations: https://medium.com/pythoneers/from-delivery-to-smart-cities-learn-pygame-simulation-basics-5b9cffcfe5f7

If you don't have the paid Medium version: https://open.substack.com/pub/sarahleaschrch/p/from-delivery-to-smart-cities-learn?utm_source=share&utm_medium=android&r=3khq41

#python #programming #beginnersguide #pygame #smartcity #parcelservice #energygrid

Sarah Lean @[email protected] · 2024-06-07 · 07:02 UTC

🚀 Exciting news!

I'm working on "Terraform for Ops: Automating Infrastructure Tasks" 📘.

This book is your guide to mastering Terraform and streamlining IT operations.

Sign up for updates and be the first to know when it's out! 👉 https://leanpub.com/terraformforops

#Terraform #DevOps #ITPros

Sarah Lea @Sarah_Lea · 2026-05-14 · 15:07 UTC

Regex vs. LLM for B2B document extraction. This week, I tried out both.

:blobcoffee: The rule-based pipeline with pytesseract + regex worked perfectly for Layout A. For Layout B? Every single field returned None.

:blobcoffee: Because "PO Number" and "Order Reference" are the same thing for a human. Not for a regex pattern.

:blobcoffee: The LLM-based approach (pytesseract + Ollama + LLaMA 3) extracted both layouts correctly, without touching a single rule. It even normalized the date format automatically.

:blobcoffee: But LLMs aren't always the right answer. If your documents are stable, speed matters at scale, or explainability is required, regex might still win.
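
A minimal sketch of why the rule-based route breaks on a second layout. It is not the code from the article: the two OCR strings, the pattern, and the `extract_po` helper are made up for illustration, and the pytesseract step is replaced by plain text so the snippet runs on its own.

```python
import re

# Hypothetical OCR output for two layouts (illustrative strings, not real data).
LAYOUT_A = "PO Number: 45001234\nDate: 2026-05-01"
LAYOUT_B = "Order Reference: 45001234\nDated 01.05.2026"

# A rule written against the label used in Layout A only.
PO_PATTERN = re.compile(r"PO Number:\s*(\d+)")


def extract_po(text):
    """Return the PO number if the Layout A label is present, else None."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_po(LAYOUT_A))  # finds the number after "PO Number:"
print(extract_po(LAYOUT_B))  # None: the label "Order Reference" never matches
```

Adding `Order Reference` to the pattern would fix this one case, but every new label needs another rule, which is the maintenance cost the comparison is about.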

Full comparison with code and trade-off breakdown on TDS: https://shorturl.at/v4gdl

#Python #DataScience #business #technology #dataengineering #LLM #Automation #OCR

Sarah Lea @[email protected] · 2026-04-05 · 19:01 UTC

Formatting in Word stole hours of my thesis work. So I built a different process for long documents.

:blobcoffee: OneNote as the thinking hub.
:blobcoffee: OneLatex as the translator: it turns the notebook into a clean, formatted PDF automatically.

My new article: a 7-step workflow + a free OneNote template.

:blobcoffee: 👉 http://bit.ly/4e4y7n4

#business #it #writing #productivity #thesis #onenote #word #latex #technology #student

Sarah Lea @[email protected] · 2026-02-10 · 02:22 UTC

Most ML issues are not model problems. They are data problems.

I retrained the same churn model twice.
Same code. Same path to the data.
Different result.

Why? Because of mutable data references.

:blobcoffee: I wrote a small Data Lake vs Data Lakehouse demo showing why versioned data makes ML debugging reproducible: https://tinyurl.com/lake-vs-lakehouse-medium

:blobcoffee: Friend-Link: https://medium.com/towards-artificial-intelligence/from-data-lake-to-data-lakehouse-why-ai-changes-the-rules-for-data-platforms-c78feab48e1c?sk=405811cbc10baa4622bcfcad90736ed4
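
A rough illustration of a mutable data reference, assuming a plain CSV file overwritten in place. This is not the linked demo: the file name `churn.csv`, the rows, and the `fingerprint` helper are hypothetical, and a content hash stands in for the table versioning a lakehouse would provide.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path):
    """Return a SHA-256 hash of the file contents, used here as a data version."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "churn.csv"  # hypothetical file name

    # First training run reads this content.
    data_path.write_text("customer_id,churned\n1,0\n2,1\n")
    first_run = fingerprint(data_path)

    # The file is overwritten in place; the path stays the same.
    data_path.write_text("customer_id,churned\n1,0\n2,1\n3,1\n")
    second_run = fingerprint(data_path)

# Same path, different data: the two runs did not train on the same input.
same_data = first_run == second_run
print(same_data)
```

Logging a fingerprint or a table version next to each trained model makes it possible to tell afterwards which data a run actually saw.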

#ai #machinelearning #data #lakehouse #warehouse #python #datalake #technology #regression

Sarah Lea @[email protected] · 2026-01-01 · 08:32 UTC

Chunk size in RAG systems defines the size of the text segments into which documents are split before embedding.

I wanted to understand the impact of three different chunk sizes, so I built a small RAG system to test it: https://towardsdatascience.com/chunk-size-as-an-experimental-variable-in-rag-systems/
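
To make the definition concrete, here is a small sketch of fixed-size chunking by character count. The `chunk_text` helper, the sample sentence, and the three sizes are assumptions for illustration, not the values or the splitter from the article; real pipelines often split by tokens and add overlap between chunks.

```python
def chunk_text(text, chunk_size):
    """Split text into consecutive segments of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


document = "Chunk size controls how much text each embedding has to represent."

# Hypothetical sizes, chosen only to show how the number of chunks changes.
for size in (16, 32, 64):
    chunks = chunk_text(document, size)
    print(size, len(chunks), chunks[0])
```

Smaller chunks give more, narrower segments to embed; larger chunks give fewer segments that each carry more context.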

:blobcoffee: Wishing you all a successful start to 2026

#ai #datascience #datascientist #ki #artificialintelligence #python #rag #towardsdatascience #programming #Technology

Sarah Lea @[email protected] · 2025-09-22 · 19:21 UTC

What’s a CSV Plot Agent? I wanted to create an agent that automatically analyzes and visualizes data from a CSV. I built it using LangChain and Streamlit (two Python frameworks).

:blobcoffee: Check out the step-by-step guide here: https://medium.com/towards-artificial-intelligence/csv-plot-agent-with-langchain-streamlit-your-introduction-to-data-agents-aa282ae970ff?sk=f9be8a191ca89eca28b4aacc45efa52f

:blobcoffee: Here’s the code in the GitHub repo: https://github.com/Sari95/CSV-Plot-Agent-with-LangChain-and-Streamlit

#python #langchain #programming #agenticai #ai #ki #data #datascience #datascientist #streamlit #agent

Sarah Lea @[email protected] · 2025-08-02 · 07:27 UTC

I asked ChatGPT to create a study plan and add the sessions directly to my calendar. It worked.

2025 has been called the year of AI agents and these two new ChatGPT modes show why:
:blobcoffee: Agents that research, act, and run tools on their own
:blobcoffee: Tutors that guide your thinking instead of just answering

What do you think about the two modes?

👉 https://medium.com/p/77e5477efe59

#chatgpt #openai #agent #agentai #agenticai #samaltman #technology

Sarah Lea @[email protected] · 2025-06-09 · 20:49 UTC

LangWHAT?
You've seen names like LangChain, LangGraph, LangFlow or LangSmith – but what’s really behind them?

:blobcoffee: LangChain helps us build LLM apps via modular code.

:blobcoffee: LangGraph adds branching logic and multi-agent workflows.

:blobcoffee: LangFlow lets us create flows with drag & drop.

:blobcoffee: LangSmith monitors and evaluates our LLM stack.

LangChain, LangGraph and LangSmith come from the same ecosystem. LangFlow is a visual builder developed independently by DataStax.

Tried both LangChain and LangFlow to build the same chatbot. Medium article coming shortly.

#LangChain #LangFlow #LLM #AI #KI #python #OpenSource #LangGraph #LangSmith #technology #chatbot #ollama

Sarah Lea @[email protected] · 2024-12-24 · 20:53 UTC

One of the most highlighted parts: "There is no need to move data. Data latency is minimised. Data can be transformed and analysed within a single platform."

This is one of the reasons for 'Why ETL-Zero' :blobcoffee:

https://towardsdatascience.com/why-etl-zero-understanding-the-shift-in-data-integration-as-a-beginner-d0cefa244154

#data #datascience #dataanalysis #dataanalytics #DataEngineering #sql #salesforce #etl #datawarehouse #datalake #datalakehouse #programming

Sarah Lea @[email protected] · 2024-12-12 · 01:09 UTC

In a data warehouse you store structured & organized data. In a data lake you can additionally store unstructured data. And what is a data lakehouse?

Think of a combination of the strengths of both previous data platforms. :blobcoffee:

https://towardsdatascience.com/sql-and-data-modelling-in-action-a-deep-dive-into-data-lakehouses-fcbab9a4b9c2

#data #DataEngineering #datalakehouse #datacenters #datawarehouse #datalake #datascience #sql

Sarah Lean @[email protected] · 2023-12-14 · 19:44 UTC

Planning some new content for the YouTube channel.

#ThinkPadThursday #LenovoIN

Sarah Lean @[email protected] · 2023-01-08 · 16:54 UTC

New home lab kit has arrived! 👌

#Intel #IntelNUC #HybridCloud #Azure #AzureArc #WindowsServer #WindowsServer2022

Sarah Lean @[email protected] · 2022-12-06 · 08:54 UTC

In today's blog post as part of the #FestiveTechCalendar, I am talking about Azure Stack HCI!

https://www.techielass.com/azure-stack-hci-the-best-of-the-cloud-and-on-premises/

#CloudFamily #MVPBuzz #AzureStackHCI #Hybrid

The Hollywood Reporter @[email protected] · 2026-02-14 · 18:20 UTC

Sarah Pidgeon Learned to Speak Her Mind by Playing Style Icon Carolyn Bessette Kennedy in ‘Love Story’
#TV #TVFeatures #FX #JFKJr #LoveStory #NextBigThing #RyanMurphy #SarahPidgeon #TheKennedys

https://www.hollywoodreporter.com/tv/tv-features/sarah-pidgeon-love-story-carolyn-bessette-style-interview-1236504303/

NewsletterTF @[email protected] · 2026-02-12 · 17:05 UTC

Sarah Ferguson Leaves TV Talk While Asked About Old Problem

https://newsletter.tf/sarah-ferguson-interview-walk-out-scandal/

Sarah Ferguson walked out of a TV talk years ago when asked about money for access. A video of this is now being shared.

#SarahFerguson, #RoyalFamily, #Scandal, #TVInterview, #PrinceAndrew

NewsletterTF @[email protected] · 2026-02-12 · 17:03 UTC

Sarah Ferguson Leaves TV Talk While Asked About Old Problem

A video shows Sarah Ferguson leaving a TV talk a long time ago. She was asked about an old problem where she was accused of taking money to help people meet Prince Andrew. She got upset and walked away.

https://newsletter.tf/sarah-ferguson-interview-walk-out-scandal/

#SarahFerguson, #RoyalFamily, #Scandal, #TVInterview, #PrinceAndrew