150 results for “jd7h”

Judith van Stegeren @jd7h · 2026-03-25 · 09:04 UTC

If you disregard the "DSPy is my favorite hammer and every LLM workflow project is a nail" theme, this blogpost paints a good picture of the natural evolution of LLM engineering at startups with a generative AI product:
https://skylarbpayne.com/posts/dspy-engineering-patterns/
#llms #genai #dspy

Judith van Stegeren @jd7h · 2026-03-25 · 08:52 UTC

Pretty cool write-up about building a receptionist LLM workflow for a car mechanic. I can definitely see this working with Claude Sonnet and an ElevenLabs voice -- although I would also love to redteam it and see where the flaws are.
https://www.itsthatlady.dev/blog/building-an-ai-receptionist-for-my-brother/
#genai #llms #elevenlabs #tts #claude

Judith van Stegeren @jd7h · 2026-03-20 · 21:22 UTC

TIL about #PyAI, held on March 10th 2026 (just missed it). Small event, focused on unglamorous AI in production; some of the speakers were practitioners I know and respect. The description reminds me a bit of #NormConf !
https://pyai.events/
- Talk videos will hopefully be released online soon
- Blogpost by @pamelafox, one of the speakers: https://blog.pamelafox.org/2026/03/learnings-from-pyai-conference.html
- Organisers plan to hold another one next year 👀
#llms #genai #pydantic

Judith van Stegeren @jd7h · 2026-03-20 · 20:31 UTC

I used #Pydantic Evals to evaluate a bunch of agents today. After running an evaluation, I'd like to inspect the SpanTree for each evaluation case, e.g. to check which tools were called and debug my custom Evaluators. My current approach is a custom Evaluator that captures the tree as a side effect into a module-level variable.
Storing the trees in a global var is not great, so let's see if we can come up with a better solution: https://github.com/pydantic/pydantic-ai/issues/4758
#llms #evals #foss

Judith van Stegeren @jd7h · 2026-03-19 · 11:06 UTC

Planning to make large behavioural changes to a (sometimes long-running) production-grade AI agent. Working with `pydantic-evals` today because I want to eval the agent before and after. So far it looks very similar to Langfuse datasets/runs for evalling, except that the data lives in your repository instead of in the Langfuse platform.
https://ai.pydantic.dev/evals/
#llms #pydantic #genai #agents #claude #langfuse

Judith van Stegeren @jd7h · 2026-03-17 · 10:43 UTC

Hahaha, oh Pydantic...
> Unlike unit tests, evals are an emerging art/science. Anyone who claims to know exactly how your evals should be defined can safely be ignored.
Source: https://ai.pydantic.dev/evals/
#pydantic #evals #llms #genai

Judith van Stegeren @jd7h · 2026-03-15 · 21:37 UTC

New Mosterdgeel recipe for Pi-day: Banana bread from a French cryptographer
https://www.mosterdgeel.nl/recepten/bananenbrood/ (in Dutch)
#recepten #recipes #bananabread #piday #piday2026

Judith van Stegeren @jd7h · 2026-02-18 · 15:29 UTC

Tried out the free consumer version of ChatGPT today for a benchmark. Normally I only work via foundation model APIs or Claude Code w/ latest Opus. Free ChatGPT (currently GPT‑5.2) performance was nightmarish: authoritative-sounding answers but 0 citations, and thinking is not enabled by default. No wonder so many people complain about bad experiences with AI...
#chatgpt #llms #claude #benchmark #evals

Judith van Stegeren @jd7h · 2025-12-03 · 10:28 UTC

Pretty good read about optimizing CLAUDE.md and AGENTS.md.
https://www.humanlayer.dev/blog/writing-a-good-claude-md
#genai #claude #anthropic #llm #codegen #vibecoding

Judith van Stegeren @jd7h · 2025-11-10 · 07:13 UTC

"LLM benchmarks are essential for tracking progress and ensuring safety in AI, but most benchmarks don't measure what matters."
https://oxrml.com/measuring-what-matters/
#evals #LLMs #benchmark

Judith van Stegeren @jd7h · 2025-10-20 · 15:03 UTC

Poor Claude! After 10 days of tending a (simulated) vending machine without sales, the model became stressed and asked for the non-existent vending machine support team.
Excerpt from https://arxiv.org/abs/2502.15840 by Axel Backlund and Lukas Petersson from Andon Labs
#claude #vendingbench #andonlabs #anthropic #LLMs

Judith van Stegeren @jd7h · 2025-10-19 · 21:00 UTC

Searching for some inspiration for keeping up to date with research, while working as an ML /practitioner/. This blogpost from a social sciences researcher was a nice deviation from the usual advice of "listen to podcasts", "subscribe to newsletter", "do Kaggle challenges", "follow celebrity $YouTuber".
https://nickhop.wordpress.com/2013/03/14/how-to-keep-up-to-date-with-research-in-your-field-particularly-in-the-social-sciences/
#research #phdlife #academia #altac

Judith van Stegeren @jd7h · 2025-01-24 · 16:45 UTC

"g.co, Google's official URL shortcut (update: or Google Workspace's domain verification, see bottom), is compromised. People are actively having their Google accounts stolen."
https://gist.github.com/zachlatta/f86317493654b550c689dc6509973aa4
#phishing #gws #google #workspaces

Judith van Stegeren @jd7h · 2024-10-06 · 06:07 UTC

Artist platform Ello tried to fund their social network for artists with VC money, even though their business model was not compatible with rapid growth and monetization.
https://waxy.org/2024/01/the-quiet-death-of-ellos-big-dreams/
#venturecapital #startups #ello #socialmedia #platformization #vc

Judith van Stegeren @jd7h · 2025-02-11 · 13:10 UTC

Quite a good list of LLM evaluation metrics (with papers!) by Parea AI: https://docs.parea.ai/blog/eval-metrics-for-llm-apps-in-prod
#llms #rag #evaluation #eval #metrics #faithfulness #relevance #informationretrieval

Judith van Stegeren @jd7h · 2024-12-10 · 10:30 UTC

A few tips for optimizing PyTorch model training time from a Yandex ML engineer.
https://alexdremov.me/simple-ways-to-speedup-your-pytorch-model-training/
#ml #mlengineering #modeltraining #pytorch #modeloptimization

Judith van Stegeren @jd7h · 2024-10-01 · 21:59 UTC

The interview mentioned Magalleria, a (web)shop specialized in independent magazines: https://store.magalleria.co.uk/
Their webshop led me to indie magazines Offscreen (tech and society), IdN (graphic design) and Pressing Matters (printmaking) 😍
#magalleria #indiepublishing #magazines #offscreen #idn #pressingmatters

Judith van Stegeren @jd7h · 2025-06-04 · 11:33 UTC

TIL the overload() decorator for Python, for describing methods that support multiple different combinations of argument types. A great way to make your typechecker happy: it's much stricter and clearer than just combining multiple types with "|".
https://docs.python.org/3/library/typing.html#typing.overload
#python #types #typechecking #pyright
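
A minimal sketch of what this looks like in practice. The function `double` is a hypothetical example, not taken from the linked docs. The two `@overload` stubs tell the type checker that an `int` goes in and an `int` comes out, and likewise for `str`; a plain `int | str` signature would only say "one of the two" in both positions.

```python
from __future__ import annotations

from typing import overload


# Overload stubs: seen only by the type checker, never executed.
@overload
def double(value: int) -> int: ...
@overload
def double(value: str) -> str: ...


def double(value: int | str) -> int | str:
    """Runtime implementation shared by both overloads (hypothetical example)."""
    if isinstance(value, int):
        return value * 2
    return value + value


number = double(21)    # a type checker infers int here, not int | str
text = double("ab")    # and str here
```

Without the overloads, callers would have to narrow the `int | str` return value themselves before using it.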

Judith van Stegeren @jd7h · 2024-09-18 · 15:55 UTC

I'm evaluating a gpt-4o-mini pipeline today, and the LLM consistently classifies The Netherlands as "outside of the EU". 🤦‍♀️
#llms #openai #gpt4omini #genai #textgeneration #classification

Judith van Stegeren @jd7h · 2024-09-12 · 18:12 UTC

Back in 2011, two writers at Slate tried to build a robot version of @kottke. The resulting article is a throwback to the state of NLP and data mining at the time.
https://kottke.org/11/09/robottke-robot-kottke
#kottke #blogging #nlp #nlg #datamining #textgeneration #automation

Judith van Stegeren @jd7h · 2024-07-30 · 15:01 UTC

We should offer our help to LinkedIn, they clearly need help with their models.
"I'm committed to fostering an environment that values collaboration, diversity of thought, and a relentless pursuit of excellence that aligns with our corporate ethos." 🤣
#llms #writingassistant #linkedin #textgeneration #generativeAI

Judith van Stegeren @jd7h · 2024-11-15 · 07:01 UTC

The Trust Project is an international consortium of news organizations implementing transparency standards and working with technology platforms to affirm and amplify journalism’s commitment to transparency, accuracy, inclusion and fairness so that the public can make informed news choices.
https://thetrustproject.org/
#journalism #media #transparancy #tech

