home.social

#data-science — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #data-science, aggregated by home.social.

  1. Regex vs. LLM for B2B document extraction. This week, I tried out both.

    :blobcoffee: The rule-based pipeline with pytesseract + regex worked perfectly for Layout A. For Layout B? Every single field returned None.

    :blobcoffee: Because "PO Number" and "Order Reference" are the same thing for a human. Not for a regex pattern.

    :blobcoffee: The LLM-based approach (pytesseract + Ollama + LLaMA 3) extracted both layouts correctly, without touching a single rule. It even normalized the date format automatically.

    :blobcoffee: But LLMs aren't always the right answer. If your documents are stable, speed matters at scale, or explainability is required, regex might still win.

    Full comparison with code and trade-off breakdown on TDS: shorturl.at/v4gdl

    #Python #DataScience #business #technology #dataengineering #LLM #Automation #OCR

  2. 1/5
    The "algorithmic ultimatum" has arrived. Prediction markets just pushed "Power Plants Day"—the total neutralization of the Iranian electrical grid—to a 94% probability for Friday, May 15. When the "smart money" hits this level of certainty, we aren't looking at a guess; we're looking at a countdown.
    #Geopolitics #DataScience #War #Economy #Intelligence #Technology #Politics #Defense #Strategy #GlobalNews

  3. 🌑 Astrobites (M. Ogborn, Penn State): wandering supermassive black holes can be revealed via tidal disruption events (TDEs) – stars ripped apart by an SMBH. The event AT2024tvd lies ~0.8 kpc off the host galaxy's nucleus – the SMBH likely drifted away. Future key tool: Rubin/LSST.

    📅 May 13, 2026
    👉 astrobites.org/2026/05/13/wand

    #Astronomy #DataScience #Science #Space

  4. 💧 Interstellar comet 3I/ATLAS carries water from another planetary system: the ALMA radio interferometer measured ~30× more semi-heavy water (HDO) than Solar System comets and ~40× more than Earth's oceans. It formed in extreme cold below 30 K. Published in Nature Astronomy.

    📅 April 24, 2026
    👉 almaobservatory.org/en/press-r

    #RadioAstronomy #Astronomy #DataScience #Science

  5. ☀️ A University of Sheffield team (R. Jain) has published in Solar Physics: an AI decodes the Sun's p-modes – acoustic waves carrying info from deep inside our star. 30 years of helioseismic data + machine learning = an independent forecaster of solar activity, key to protecting satellites & power grids.

    📅 May 12, 2026
    👉 phys.org/news/2026-05-scientis

    #DataScience #Astronomy #Science #Space

  6. 🔭 Citizen scientists from Backyard Worlds: Planet 9 have DOUBLED the known population of brown dwarfs! A new paper (Schneider et al., Astronomical Journal) reports 3,000+ motion-confirmed L & T dwarf candidates found by volunteers in WISE/NEOWISE-R data via Zooniverse — over the project's 10 years.

    📅 May 13, 2026
    👉 science.nasa.gov/get-involved/

    #CitizenScience #DataScience #Astronomy #Science
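
The failure described in post 1, where rules written for Layout A return None for every field of Layout B, can be illustrated with a minimal sketch. It assumes the OCR step has already produced plain text, so pytesseract is not called here. The sample strings, field names, and patterns are invented for illustration and are not taken from the linked article.

```python
import re

# Hypothetical OCR output for two layouts (invented sample text).
LAYOUT_A = "PO Number: 4500123\nDate: 2024-03-15\n"
LAYOUT_B = "Order Reference: 4500456\nIssued on 15.03.2024\n"

# Rules written against Layout A's labels only (assumed patterns).
RULES = {
    "po_number": re.compile(r"PO Number:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
}


def extract(text):
    """Return the first capture group per field, or None when a rule does not match."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


print(extract(LAYOUT_A))  # both fields are found
print(extract(LAYOUT_B))  # both fields are None: the labels differ
```

Supporting Layout B in this style would mean adding an alternative such as "Order Reference" to each pattern by hand, which is the maintenance cost the post contrasts with the LLM-based approach.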