-
This is a handy list for comparing the features of vector databases (holy moly, there are a lot of them), including year of launch, open-source-ness, licences, and implementation language: https://superlinked.com/vector-db-comparison
-
Generative AI apps have their own version of the training-serving skew from classical ML: the eval-production gap.
You create an eval dataset, optimize your LLM flows against it, hit great performance on your metrics, and ship. Then real users show up and:
- Write inputs that are multiple pages long
- Ask in Spanish, Russian, or Chinese when you only tested in English
- Upload file types you never considered
- Ask questions from domains your product wasn't designed for
-
This is a neat solution for those old Python projects that have no uv, pyproject.toml, or version-pinned requirements.txt. It allows you to go "back in time" with pip!
https://pypi.org/project/pypi-timemachine/
Edit: @bk1e pointed out pip >= 26 has this option built-in. Use `--uploaded-prior-to`!
-
Sooooo is there a special name for Bluesky posts? And what's the social protocol for ~~retweeting~~ boosting them on Mastodon?
-
We were looking for a local tokenizer to count input tokens before calling the gemini-embedding-001 endpoint on Vertex AI. Turns out this Gemma tokenizer returns exactly the same token count as `embedding.statistics.token_count` in the Gemini embeddings result. Tested on 2000 datapoints. 😁
-
I was a guest at BNR's De Technoloog, to talk about the latest in LLMs, vibecoding and AI-native startups.
Podcast interview (in Dutch): https://www.bnr.nl/podcast/de-technoloog/10597036/de-duct-tape-fase-van-ai
#deTechnoloog #BNR #llms #genai #podcast #vibecoding #claudecode
-
The related top HN comment is also worth reading: https://news.ycombinator.com/item?id=47491023
"You're comparing [DSPy] downloads with Langchain, probably the worst package to gain popularity of the last decade. It was just first to market, then after a short while most realized it's horrifically architected, and now it's just coasting on former name recognition while everyone who needs to get shit done uses something lighter like the above two."
Preach! 🙌
-
If you disregard the "DSPy is my favorite hammer and every LLM workflow project is a nail" theme, this blogpost paints a good picture of the natural evolution of LLM engineering at startups with a generative AI product:
-
Pretty cool write-up about building a receptionist LLM workflow for a car mechanic. I can definitely see this working with Claude Sonnet and an ElevenLabs voice -- although I would also love to redteam it and see where the flaws are.
https://www.itsthatlady.dev/blog/building-an-ai-receptionist-for-my-brother/
-
TIL about #PyAI, held on March 10th 2026 (just missed it). Small event, focused on unglamorous AI in production; some of the speakers were practitioners I know and respect. The description reminds me a bit of #NormConf !
- Talk videos will hopefully be released online soon
- Blogpost by @pamelafox, one of the speakers: https://blog.pamelafox.org/2026/03/learnings-from-pyai-conference.html
- Organisers plan to hold another one next year 👀
-
I used #Pydantic Evals to evaluate a bunch of agents today. After running an evaluation, I'd like to inspect the SpanTree for each evaluation case, e.g. to check which tools were called and debug my custom Evaluators. My current approach is a custom Evaluator that captures the tree as a side effect into a module-level variable.
Storing the trees in a global var is not great, so let's see if we can come up with a better solution: https://github.com/pydantic/pydantic-ai/issues/4758
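For illustration, the side-effect pattern looks roughly like this (a self-contained toy sketch: `CaptureSpanTree` and the dict-shaped `ctx` are stand-ins, not the real pydantic-evals `Evaluator`/`EvaluatorContext` API):

```python
# Module-level side channel: evaluation results flow through the framework,
# but the span trees are stashed here for inspection afterwards.
captured_span_trees: dict[str, list[str]] = {}

class CaptureSpanTree:
    """Toy evaluator that records each case's span tree as a side effect."""
    def evaluate(self, ctx: dict) -> bool:
        captured_span_trees[ctx["case_name"]] = ctx["span_tree"]
        return True  # always "passes"; it only exists for the side effect

# Simulated evaluation run over two cases:
evaluator = CaptureSpanTree()
for ctx in [{"case_name": "refund_request", "span_tree": ["agent_run", "call_refund_tool"]},
            {"case_name": "greeting", "span_tree": ["agent_run"]}]:
    evaluator.evaluate(ctx)

print(sorted(captured_span_trees))  # ['greeting', 'refund_request']
```

In the real setup the span tree comes from OpenTelemetry instrumentation of the agent run; the dict-keyed capture is exactly the "global var" hack I'd like to replace.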
-
Planning to make large behavioural changes to a (sometimes long-running) production-grade AI agent. Working with `pydantic-evals` today because I want to eval the agent before and after. So far it looks very similar to Langfuse datasets/runs for evalling, except that the data lives in your repository instead of in the Langfuse platform.
-
Hahaha, oh Pydantic...
> Unlike unit tests, evals are an emerging art/science. Anyone who claims to know exactly how your evals should be defined can safely be ignored.
Source: https://ai.pydantic.dev/evals/
-
New Mosterdgeel recipe for Pi Day: banana bread from a French cryptographer
https://www.mosterdgeel.nl/recepten/bananenbrood/ (in Dutch)
-
Tried out the free consumer version of ChatGPT today for a benchmark. Normally I only work via foundation model APIs or Claude Code w/ latest Opus. Free ChatGPT (currently GPT‑5.2) performance was nightmarish: authoritative-sounding answers with zero citations, and thinking is not enabled by default. No wonder so many people complain about bad experiences with AI...
-
Pretty good read about optimizing CLAUDE.md and AGENTS.md.
-
"LLM benchmarks are essential for tracking progress and ensuring safety in AI, but most benchmarks don't measure what matters."
-
Poor Claude! After 10 days of tending a (simulated) vending machine without sales, the model became stressed and asked for the non-existent vending machine support team.
Excerpt from https://arxiv.org/abs/2502.15840 by Axel Backlund and Lukas Petersson from Andon Labs
-
Searching for some inspiration for keeping up to date with research, while working as an ML /practitioner/. This blogpost from a social sciences researcher was a nice deviation from the usual advice of "listen to podcasts", "subscribe to newsletter", "do Kaggle challenges", "follow celebrity $YouTuber".
-
TIL there's a bash one-liner for grabbing the latest GCP errors and displaying them in your terminal. Super handy for quickly debugging without clicking around in the Google Cloud console!
```bash
gcloud logging read "resource.labels.service_name=my_production_service AND severity>=ERROR" --freshness=1d
```
-
TIL about the `overload()` decorator in Python's `typing` module, for describing functions that support multiple combinations of argument types. A great way to make your typechecker happy: it's much stricter and clearer than just combining types with "|".
https://docs.python.org/3/library/typing.html#typing.overload
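A minimal sketch of how it reads in practice (my own toy example, not from the docs): the `@overload` stubs exist only for the typechecker, while the final undecorated definition does the actual work.

```python
from typing import overload

# Typechecker-only stubs: scale(int) -> int, scale(list[int]) -> list[int].
@overload
def scale(value: int, factor: int) -> int: ...
@overload
def scale(value: list[int], factor: int) -> list[int]: ...

# The real implementation; its body handles both overloads.
def scale(value, factor):
    """Multiply a number, or every element of a list, by a factor."""
    if isinstance(value, list):
        return [v * factor for v in value]
    return value * factor

print(scale(3, 4))       # 12
print(scale([1, 2], 3))  # [3, 6]
```

With a plain `int | list[int]` union, the checker couldn't tell that an `int` in always means an `int` out; the overloads make that link explicit.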
-
"Kawai et al. (2023) [...] show that many crypto-coin holders who spread optimistic messages actually trade in the opposite direction themselves. These conflicts of interest could contribute to the negative returns we observe after the recommendation." (translated from Dutch)
https://esb.nu/het-opvolgen-van-finfluencer-adviezen-kost-je-rendement/
-
Interesting reflection by @mrkurt about Fly.io's exploration of GPU-related services.
> At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.
-
Quite a good list of LLM evaluation metrics (with papers!) by Parea AI: https://docs.parea.ai/blog/eval-metrics-for-llm-apps-in-prod
#llms #rag #evaluation #eval #metrics #faithfulness #relevance #informationretrieval
-
"g.co, Google's official URL shortcut (update: or Google Workspace's domain verification, see bottom), is compromised. People are actively having their Google accounts stolen."
https://gist.github.com/zachlatta/f86317493654b550c689dc6509973aa4
-
A few tips for reducing PyTorch model training time, from a Yandex ML engineer.
https://alexdremov.me/simple-ways-to-speedup-your-pytorch-model-training/
#ml #mlengineering #modeltraining #pytorch #modeloptimization
-
The Trust Project is an international consortium of news organizations implementing transparency standards and working with technology platforms to affirm and amplify journalism’s commitment to transparency, accuracy, inclusion and fairness so that the public can make informed news choices.
-
Fitting an LLM on a GPU is a bit like photography. Model weights = film sensitivity, activation size = shutter speed, I/O tensors = aperture. These 3 dials control your model's memory footprint, just as they shape a photo's exposure.
Just realised this while trying to fit Llama 3.1 on my 24GB GPU with TRT-LLM: https://nvidia.github.io/TensorRT-LLM/reference/memory.html.
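To stretch the analogy into numbers, here's a rough back-of-the-envelope sketch (my own estimate function, not from the TRT-LLM docs; the activation and I/O figures are placeholder guesses that depend heavily on batch size and sequence length):

```python
def estimate_memory_gib(n_params: float,
                        bytes_per_param: float = 2,    # 2 = fp16/bf16, 1 = int8, 0.5 = int4
                        activations_gib: float = 2.0,  # placeholder guess
                        io_tensors_gib: float = 1.0    # placeholder guess
                        ) -> float:
    """Back-of-the-envelope GPU memory estimate: weights + activations + I/O buffers."""
    weights_gib = n_params * bytes_per_param / 1024**3
    return weights_gib + activations_gib + io_tensors_gib

# An 8B-parameter model in fp16: ~14.9 GiB of weights alone, so it only
# just fits on a 24 GB card once the other two dials are accounted for.
print(round(estimate_memory_gib(8e9), 1))  # 17.9
```

Turning down any one dial (quantize the weights, shrink the batch, cap the I/O tensor shapes) buys headroom, just like trading off the exposure triangle.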
-
"TLDR: Don’t buy H100s. The market has flipped from shortage ($8/hr) to oversupplied ($2/hr), because of reserved compute resales, open model finetuning, and decline in new foundation model co’s. Rent instead."
-
Many companies are currently scrambling for ML infra engineers. They need people who know how to manage AI infrastructure, and who can seriously speed up training and inference with specialized tooling like vLLM, Triton, TensorRT, Torchtune, etc.
#inference #training #genai #triton #vllm #pytorch #torchtune #tensorrt #nvidia
-
The Nature of Code is an online book by Daniel Shiffman about coding all kinds of cool simulations in Processing.
#procgen #processing #p5js #computationalbiology #physicsengine #cellularautomata #generativeart #genart
-
Artist platform Ello tried to fund their social network for artists with VC money, even though their business model was not compatible with rapid growth and monetization.
https://waxy.org/2024/01/the-quiet-death-of-ellos-big-dreams/
#venturecapital #startups #ello #socialmedia #platformization #vc
-
"The analysis shows that all the major models tested will produce harmful content. Except for Anthropic, harmful content was produced across all the harm categories. This means that the safety layers that are in these models are not sufficient to produce a safe model deployment across all the harm categories tested for."
https://www.theregister.com/2024/09/17/ai_models_guardrail_feature/
#generativeai #llms #aisafety #safety #anthropic #chatterbox
-
The interview mentioned Magalleria, a (web)shop specialized in independent magazines: https://store.magalleria.co.uk/
Their webshop led me to indie magazines Offscreen (tech and society), IdN (graphic design) and Pressing Matters (printmaking) 😍
#magalleria #indiepublishing #magazines #offscreen #idn #pressingmatters
-
I'm evaluating a gpt-4o-mini pipeline today, and the LLM consistently classifies The Netherlands as "outside of the EU". 🤦♀️
#llms #openai #gpt4omini #genai #textgeneration #classification
-
Back in 2011, two writers at Slate tried to build a robot version of @kottke. The resulting article is a throwback to the state of NLP and data mining at the time.
https://kottke.org/11/09/robottke-robot-kottke
#kottke #blogging #nlp #nlg #datamining #textgeneration #automation
-
Really cool to encounter "our" LLaVA (Llama 2 + vision) -- which Yorick van Pelt and I deployed the week it was released -- in the official Replicate docs. 😍
-
"We've now created our first deep learning neural network from scratch. And we did it in Microsoft Excel, everyone's favorite artificial intelligence tool."
- Jeremy Howard in https://www.youtube.com/watch?v=hBBOjCiFcuo&t=3862s
-
We should offer our help to LinkedIn, they clearly need help with their models.
"I'm committed to fostering an environment that values collaboration, diversity of thought, and a relentless pursuit of excellence that aligns with our corporate ethos." 🤣
#llms #writingassistant #linkedin #textgeneration #generativeAI
-
Clément Delangue, co-founder and CEO of Hugging Face, told Bloomberg News he’s hearing from about 10 AI startups each week that are interested in being acquired. “This year, in particular, it has increased quite a lot,” he said.
-
https://arxiv.org/abs/2310.07298v1
Just tried to replicate the deanonymization technique proposed in this paper by giving it some of my old Reddit posts. My Reddit profile is really puzzling to GPT-4: "Their interest in traditionally stereotyped masculine (computer science) and feminine (tea, Harry Potter books) domains makes it rather challenging to guess their gender accurately." :')
-
https://eugeneyan.com//writing/aieng-reflections/
Very nice summary of the AI Engineer Summit 2023 by @eugeneyan
-
"Tiny Bookshop is a very peaceful-looking affair in which you run a bookshop out of a little caravan in a seaside town, arranging shelves and having pleasant chats with customers while they browse your literature."
https://www.theguardian.com/games/2023/sep/04/14-upcoming-video-games-you-probably-havent-heard-of
-
Maybe the fact that LangChain (the company) made it will lend LangSmith some legitimacy. LangChain (the framework) has certainly seen a rapid rise in popularity -- although GitHub stars might be a bad metric for actual production use.
Here is a comparison chart with some of the other popular NLP/LLM libraries -- these repos do not implement the same functionality but it should give you a rough idea.
-
However, I'm curious about what the volume of adoption will be. LangSmith is a platform, not a self-hosted open-source MLOps tool. Are engineers/users really willing to give all their data to yet another third party?
Some people already hesitate to use AI APIs (such as the OpenAI GPT API) because they're concerned about leaking sensitive data -- let alone using third-party AI platforms that man-in-the-middle your LLM conversations.
#langsmith #llms #generativeai #mlops #ai #privacy #infoleaks
-
The documentation is still a bit all over the place, but this walkthrough shows a sneak peek of what's possible:
https://github.com/langchain-ai/langchain/blob/master/docs/extras/guides/langsmith/walkthrough.ipynb
I especially like the idea of quickly evaluating variations of an LLM pipeline against a baseline of earlier runs.
-
Today I took a first look at LangSmith, a new platform for LLM production pipelines by LangChain.
I can't hook it up to a working pipeline yet because it's in closed beta, but it surely looks ambitious. It should make it easier to log, monitor, debug, and evaluate pipelines (chains) against each other. It's tightly integrated with LangChain, but it should support other frameworks/models as well.
-
Did your startup or scaleup ever work with consultants, and if yes: how did you find them, and what was the experience like? Would love to hear your horror and/or success stories.
Asking as I'm working on some customer research today! I'm trying to figure out how Datakami can make the consulting experience as good as possible for our startup friends.