-
This is a handy list for comparing the features of vector databases (holy moly, there are a lot of them), including year of launch, open-source-ness, licences, and implementation language: https://superlinked.com/vector-db-comparison
-
Generative AI apps have their own version of the training-serving skew from classical ML: the eval-production gap.
You create an eval dataset, optimize your LLM flows against it, hit great performance on your metrics, and ship. Then real users show up and:
- Write input texts of multiple pages long
- Ask in Spanish, Russian or Chinese when you tested in English
- Upload file types you never considered
- Ask questions from domains your product wasn't designed for
-
This is a neat solution for those old Python projects that have no uv, pyproject.toml, or version-pinned requirements.txt. It allows you to go "back in time" with pip!
https://pypi.org/project/pypi-timemachine/
Edit: @bk1e pointed out that pip >= 26 has this option built in. Use `--uploaded-prior-to`!
-
Sooooo is there a special name for Bluesky posts? And what's the social protocol for ~~retweeting~~ boosting them on Mastodon?
-
We were looking for a local tokenizer to count the number of input tokens before calling the gemini-embedding-001 endpoint on Vertex AI. Turns out this Gemma tokenizer returns exactly the same number of tokens as the usage reported in the embeddings result (`embedding.statistics.token_count`) of the Gemini embeddings endpoint. Tested on 2000 datapoints. 😁
-
The related top HN comment is also worth reading: https://news.ycombinator.com/item?id=47491023
"You're comparing [DSPy] downloads with Langchain, probably the worst package to gain popularity of the last decade. It was just first to market, then after a short while most realized it's horrifically architected, and now it's just coasting on former name recognition while everyone who needs to get shit done uses something lighter like the above two."
Preach! 🙌
-
If you disregard the "DSPy is my favorite hammer and every LLM workflow project is a nail" theme, this blogpost paints a good picture of the natural evolution of LLM engineering at startups with a generative AI product:
-
Pretty cool write-up about building a receptionist LLM workflow for a car mechanic. I can definitely see this working with Claude Sonnet and an ElevenLabs voice -- although I would also love to redteam it and see where the flaws are.
https://www.itsthatlady.dev/blog/building-an-ai-receptionist-for-my-brother/
-
TIL about #PyAI, held on March 10th 2026 (just missed it). Small event, focused on unglamorous AI in production; some of the speakers were practitioners I know and respect. The description reminds me a bit of #NormConf!
- Talk videos will hopefully be released online soon
- Blogpost by @pamelafox, one of the speakers: https://blog.pamelafox.org/2026/03/learnings-from-pyai-conference.html
- Organisers plan to organize another one next year 👀
-
I used #Pydantic Evals to evaluate a bunch of agents today. After running an evaluation, I'd like to inspect the SpanTree for each evaluation case, e.g. to check which tools were called and debug my custom Evaluators. My current approach is a custom Evaluator that captures the tree as a side effect into a module-level variable.
Storing the trees in a global var is not great, so let's see if we can come up with a better solution: https://github.com/pydantic/pydantic-ai/issues/4758
-
Planning to make large behavioural changes to a (sometimes long-running) production-grade AI agent. Working with `pydantic-evals` today because I want to eval the agent before and after. So far it looks very similar to Langfuse datasets/runs for evalling, except that the data lives in your repository instead of in the Langfuse platform.
-
Hahaha, oh Pydantic...
> Unlike unit tests, evals are an emerging art/science. Anyone who claims to know exactly how your evals should be defined can safely be ignored.
Source: https://ai.pydantic.dev/evals/
-
New Mosterdgeel recipe for Pi-day: Banana bread from a French cryptographer
https://www.mosterdgeel.nl/recepten/bananenbrood/ (in Dutch)
-
Tried out the free consumer version of ChatGPT today for a benchmark. Normally I only work via foundational model APIs or Claude Code w/ latest Opus. Free ChatGPT (currently GPT‑5.2) performance was nightmarish: authoritative-sounding answers but 0 citations, and thinking is not enabled by default. No wonder so many people complain about bad experiences with AI...
-
Pretty good read about optimizing CLAUDE.md and AGENTS.md.
-
"LLM benchmarks are essential for tracking progress and ensuring safety in AI, but most benchmarks don't measure what matters."
-
Poor Claude! After 10 days of tending a (simulated) vending machine without sales, the model became stressed and asked for the non-existent vending machine support team.
Excerpt from https://arxiv.org/abs/2502.15840 by Axel Backlund and Lukas Petersson from Andon Labs
-
Searching for some inspiration for keeping up to date with research while working as an ML /practitioner/. This blogpost from a social sciences researcher was a nice deviation from the usual advice of "listen to podcasts", "subscribe to newsletters", "do Kaggle challenges", "follow celebrity $YouTuber".
-
TIL there's a bash one-liner for grabbing the latest GCP errors and displaying them in your terminal. Super handy for quickly debugging stuff without clicking around in the Google Cloud console!
```bash
gcloud logging read "resource.labels.service_name=my_production_service AND severity>=ERROR" --freshness=1d
```
-
TIL about the `@overload` decorator in Python's `typing` module, for describing functions and methods that support multiple different combinations of argument types. A great way to make your typechecker happy: it's much stricter and clearer than just combining multiple types with "|".
https://docs.python.org/3/library/typing.html#typing.overload
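
A minimal sketch of what that can look like, assuming a made-up function `double` (not from any real project) whose return type depends on its argument type. With a plain `int | str` signature the typechecker would only know that the result is "int or str"; with the overloads it should be able to infer the exact type per call:

```python
from __future__ import annotations

from typing import overload


# Hypothetical example function, invented for illustration only.
# The two @overload stubs are what the typechecker sees at call sites.
@overload
def double(value: int) -> int: ...
@overload
def double(value: str) -> str: ...


# The single runtime implementation. Its broad signature is not used
# by the typechecker when checking callers; the stubs above are.
def double(value: int | str) -> int | str:
    if isinstance(value, int):
        return value * 2
    return value + value


print(double(21))    # prints 42
print(double("ab"))  # prints abab
```

At runtime only the last definition exists; the overload stubs are purely for the typechecker, so passing e.g. a `float` would be flagged statically but not rejected when the code runs.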
-
Quite a good list of LLM evaluation metrics (with papers!) by Parea AI: https://docs.parea.ai/blog/eval-metrics-for-llm-apps-in-prod
#llms #rag #evaluation #eval #metrics #faithfulness #relevance #informationretrieval
-
"g.co, Google's official URL shortcut (update: or Google Workspace's domain verification, see bottom), is compromised. People are actively having their Google accounts stolen."
https://gist.github.com/zachlatta/f86317493654b550c689dc6509973aa4
-
A few tips for optimizing PyTorch model training time from a Yandex ML engineer.
https://alexdremov.me/simple-ways-to-speedup-your-pytorch-model-training/
#ml #mlengineering #modeltraining #pytorch #modeloptimization
-
The Trust Project is an international consortium of news organizations implementing transparency standards and working with technology platforms to affirm and amplify journalism’s commitment to transparency, accuracy, inclusion and fairness so that the public can make informed news choices.
-
"TLDR: Don’t buy H100s. The market has flipped from shortage ($8/hr) to oversupplied ($2/hr), because of reserved compute resales, open model finetuning, and decline in new foundation model co’s. Rent instead."
-
The Nature of Code is an online book by Daniel Shiffman about coding all kinds of cool simulations in Processing.
#procgen #processing #p5js #computationalbiology #physicsengine #cellularautomata #generativeart #genart
-
Artist platform Ello tried to fund their social network for artists with VC money, even though their business model was not compatible with rapid growth and monetization.
https://waxy.org/2024/01/the-quiet-death-of-ellos-big-dreams/
#venturecapital #startups #ello #socialmedia #platformization #vc