#interpretableai — Public Fediverse posts on home.social

Bogdan Buduroiu @[email protected] · 2026-02-24 · 16:49 UTC

Steerling-8B, the first interpretable model that can trace any token it generates to its input context, concepts a human can understand, and its training data.

https://www.guidelabs.ai/post/steerling-8b-base-model-release/

#AI #InterpretableAI #DiffusionModel #DiffusionModels

Anand Philip @[email protected] · 2023-08-17 · 15:42 UTC

Stephen Hahn, Rico Zhu, Simon Mak, Cynthia Rudin, and Yue Jiang. 2023. An Interpretable, Flexible, and Interactive Probabilistic Framework for Melody Generation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23). Association for Computing Machinery, New York, NY, USA, 4089–4099. https://doi.org/10.1145/3580305.3599772 | I love #interpretableAI and generally the kinda stuff Cynthia Rudin produces. Made a few tunes using the tool and they are pretty damn good.

Wout Bittremieux @[email protected] · 2023-06-15 · 09:35 UTC

Our latest paper has now been published in #ImmunoInformatics! 🎉

Predicting #TCR #epitope binding is extremely challenging. 🤯 We used #InterpretableAI techniques to explore how these prediction models work, to achieve a deeper understanding of TCR–epitope interactions and learn how these computational tools can be improved. 🕵️

Publication: https://www.sciencedirect.com/science/article/pii/S2667119023000071

Ari Benjamin @[email protected] · 2023-05-10 · 15:03 UTC

Interpretable AI really wants to understand what neurons in LLMs are doing. But this effort is very likely to fail – and it's not the right approach to understand what AI is doing and why.

Like, today, there's weirdly a lot of press about how OpenAI just showed that "Language models can explain neurons in language models" (https://openai.com/research/language-models-can-explain-neurons-in-language-models). But look at the metrics – this was a failed effort. GPT-4 *cannot explain* what neurons in GPT-2 are doing.

More importantly, single-unit interpretability in LLMs is not the same as understanding what LLMs as a whole are doing and why. Even if you did understand when a handful of units activate, you would never be able to stitch these together into a general understanding of why an LLM says the words that it does.

LLMs may someday be able to explain themselves in plain language. But describing (in plain language) when each neuron fires is not going to get us there.

#interpretableAI #LLMs #openai

Arie van Deursen @[email protected] · 2023-01-05 · 20:13 UTC

“Why is it that neurons sometimes align with features and sometimes don't? Why do some models and tasks have many of these clean neurons, while they're vanishingly rare in others?

In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions.”

https://transformer-circuits.pub/2022/toy_model/index.html

#AnthropicAI #InterpretableAI #superposition
