home.social

1000 results for “Sarah_Lea”

  1. Have you heard "it works on my machine"? Enter: containers. Learn how Docker keeps ML models, data pipelines, and environments consistent across systems in this article :blobcoffee: towardsdatascience.com/why-dat

  2. 🚀 Exciting news!

    I'm working on "Terraform for Ops: Automating Infrastructure Tasks" 📘.

    This book is your guide to mastering Terraform and streamlining IT operations.

    Sign up for updates and be the first to know when it's out! 👉 leanpub.com/terraformforops

    #Terraform #DevOps #ITPros

  4. Regex vs. LLM for B2B document extraction. This week, I tried out both.

    :blobcoffee: The rule-based pipeline with pytesseract + regex worked perfectly for Layout A. For Layout B? Every single field returned None.

    :blobcoffee: Because "PO Number" and "Order Reference" are the same thing for a human. Not for a regex pattern.

    :blobcoffee: The LLM-based approach (pytesseract + Ollama + LLaMA 3) extracted both layouts correctly, without touching a single rule. It even normalized the date format automatically.

    :blobcoffee: But LLMs aren't always the right answer. If your documents are stable, speed matters at scale, or explainability is required, regex might still win.

    Full comparison with code and trade-off breakdown on TDS: shorturl.at/v4gdl

    #Python #DataScience #business #technology #dataengineering #LLM #Automation #OCR
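
A minimal sketch of the Layout A vs. Layout B failure described in this post. The sample strings, the pattern, and the function names are hypothetical and are not taken from the linked article; the OCR step is assumed to have already produced plain text.

```python
import re

# Hypothetical OCR output for two layouts (invented sample text).
LAYOUT_A = "PO Number: 12345\nDate: 2024-03-01"
LAYOUT_B = "Order Reference: 12345\nDate: 01.03.2024"

# A rule written against Layout A only.
PO_PATTERN = re.compile(r"PO Number:\s*(\d+)")

# The same rule widened by hand to accept the second label.
PO_PATTERN_WIDE = re.compile(r"(?:PO Number|Order Reference):\s*(\d+)")


def extract_po(text, pattern=PO_PATTERN):
    """Return the captured number as a string, or None if the label is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_po(LAYOUT_A))                   # matches the Layout A label
    print(extract_po(LAYOUT_B))                   # no match, so None
    print(extract_po(LAYOUT_B, PO_PATTERN_WIDE))  # matches after widening the rule
```

Every new label needs another hand-written alternative in the pattern, which is the maintenance cost the post contrasts with the LLM-based approach.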

  8. Formatting in Word stole hours of my thesis work. So I built a different process for long documents.

    :blobcoffee: OneNote as the thinking hub.
    :blobcoffee: OneLatex as the translator: it turns the notebook into a clean, formatted PDF automatically.

    My new article: a 7-step workflow + a free OneNote template.

    :blobcoffee: 👉 bit.ly/4e4y7n4

    #business #it #writing #productivity #thesis #onenote #word #latex #technology #student

  9. Most ML issues are not model problems. They are data problems.

    I retrained the same churn model twice.
    Same code. Same path to the data.
    Different result.

    Why? Because of mutable data references.

    :blobcoffee: I wrote a small Data Lake vs Data Lakehouse demo showing why versioned data makes ML debugging reproducible: tinyurl.com/lake-vs-lakehouse-

    :blobcoffee: Friend-Link: medium.com/towards-artificial-

    #ai #machinelearning #data #lakehouse #warehouse #python #datalake #technology #regression
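
A small sketch of why "same path" does not mean "same data". It is not the demo from the linked article: the file name, the CSV rows, and the helper names are made up, and a content hash stands in for the table versioning a lakehouse format would provide.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def demo():
    """Write a file, overwrite it in place, and return both fingerprints."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "churn.csv"  # hypothetical training file
        data.write_text("customer_id,churned\n1,0\n2,1\n")
        first = fingerprint(data)
        # Same path, new content: a mutable data reference.
        data.write_text("customer_id,churned\n1,0\n2,1\n3,1\n")
        second = fingerprint(data)
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print(first == second)  # the path is unchanged, the fingerprints differ
```

Logging such a fingerprint (or a table version) next to each training run makes it possible to tell afterwards whether two runs really saw the same data.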

  14. Chunk size in RAG systems defines the size of the text segments into which documents are split before embedding.

    I wanted to understand the impact of three different chunk sizes, so I built a small RAG system to test it: towardsdatascience.com/chunk-s

    :blobcoffee: Wishing you all a successful start to 2026

    #ai #datascience #datascientist #ki #artificialintelligence #python #rag #towardsdatascience #programming #Technology
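
To make the chunk-size parameter concrete, here is a minimal sketch of a fixed-size splitter. It is an assumption for illustration, not the code from the linked article: it counts characters rather than tokens, and the `overlap` parameter is added here only because it commonly accompanies chunk size.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    sample = "abcdefghij"
    print(chunk_text(sample, 4))             # three chunks, the last one shorter
    print(chunk_text(sample, 4, overlap=2))  # smaller step, more chunks
```

Smaller chunks yield more, narrower segments to embed; larger chunks yield fewer segments that each carry more context.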

  15. What’s a CSV Plot Agent? I wanted to create an agent that automatically analyzes and visualizes data from a CSV. I built it using LangChain and Streamlit (two Python frameworks).

    :blobcoffee: Check out the step-by-step guide here: medium.com/towards-artificial-

    :blobcoffee: Here’s the code in the GitHub repo: github.com/Sari95/CSV-Plot-Age

    #python #langchain #programming #agenticai #ai #ki #data #datascience #datascientist #streamlit #agent

  16. I asked ChatGPT to create a study plan and add the sessions directly to my calendar. It worked.

    2025 has been called the year of AI agents and these two new ChatGPT modes show why:
    :blobcoffee: Agents that research, act, and run tools on their own
    :blobcoffee: Tutors that guide your thinking instead of just answering

    What do you think about the two modes?

    👉 medium.com/p/77e5477efe59

    #chatgpt #openai #agent #agentai #agenticai #samaltman #technology

  17. LangWHAT?
    You've seen names like LangChain, LangGraph, LangFlow or LangSmith – but what’s really behind them?

    :blobcoffee: LangChain helps us build LLM apps via modular code.

    :blobcoffee: LangGraph adds branching logic and multi-agent workflows.

    :blobcoffee: LangFlow lets us create flows with drag & drop.

    :blobcoffee: LangSmith monitors and evaluates our LLM stack.

    LangChain, LangGraph and LangSmith come from the same ecosystem. LangFlow is a visual builder developed independently by DataStax.

    Tried both LangChain and Langflow to build the same chatbot — Medium article coming shortly.

    #LangChain #LangFlow #LLM #AI #KI #python #OpenSource #LangGraph #LangSmith #technology #chatbot #ollama

  18. In a data warehouse you store structured & organized data. In a data lake you can additionally store unstructured data. So what is a data lakehouse?

    Think of a combination of the strengths of both previous data platforms. :blobcoffee:

    towardsdatascience.com/sql-and

    #data #DataEngineering #datalakehouse #datacenters #datawarehouse #datalake #datascience #sql

  19. Sarah Ferguson Leaves TV Talk While Asked About Old Problem

    newsletter.tf/sarah-ferguson-i

    Sarah Ferguson walked out of a TV talk years ago when asked about money for access. A video of this is now being shared.

    #SarahFerguson, #RoyalFamily, #Scandal, #TVInterview, #PrinceAndrew

  20. Sarah Ferguson Leaves TV Talk While Asked About Old Problem

    A video shows Sarah Ferguson leaving a TV talk a long time ago. She was asked about an old problem where she was accused of taking money to help people meet Prince Andrew. She got upset and walked away.

    newsletter.tf/sarah-ferguson-i

    #SarahFerguson, #RoyalFamily, #Scandal, #TVInterview, #PrinceAndrew