home.social

#interpretableai — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #interpretableai, aggregated by home.social.

  1. Steerling-8B, the first interpretable model that can trace any token it generates to its input context, concepts a human can understand, and its training data.

    guidelabs.ai/post/steerling-8b

    #AI #InterpretableAI #DiffusionModel #DiffusionModels

  6. Stephen Hahn, Rico Zhu, Simon Mak, Cynthia Rudin, and Yue Jiang. 2023. An Interpretable, Flexible, and Interactive Probabilistic Framework for Melody Generation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23). Association for Computing Machinery, New York, NY, USA, 4089–4099. doi.org/10.1145/3580305.359977 | I love #interpretableAI and generally the kinda stuff Cynthia Rudin produces. Made a few tunes using the tool and they are pretty damn good.

  7. Our latest paper has now been published in #ImmunoInformatics! 🎉

    Predicting #TCR #epitope binding is extremely challenging. 🤯 We used #InterpretableAI techniques to explore how these prediction models work, to achieve a deeper understanding of TCR–epitope interactions and learn how these computational tools can be improved. 🕵️

    Publication: sciencedirect.com/science/arti

  8. Interpretable AI really wants to understand what neurons in LLMs are doing. But this effort is very likely to fail – and it's not the right approach to understand what AI is doing and why.

    Like, today, there's weirdly a lot of press about how OpenAI just showed that "Language models can explain neurons in language models" (openai.com/research/language-m). But look at the metrics – this was a failed effort. GPT-4 *cannot explain* what neurons in GPT-2 are doing.

    More importantly, single-unit interpretability in LLMs is not the same as understanding what LLMs as a whole are doing and why. Even if you did understand when a handful of units activate, you would never be able to stitch these together into a general understanding of why an LLM says the words that it does.

    LLMs may someday be able to explain themselves in plain language. But describing (in plain language) when each neuron fires is not going to get us there.

    #interpretableAI #LLMs #openai

  9. “Why is it that neurons sometimes align with features and sometimes don't? Why do some models and tasks have many of these clean neurons, while they're vanishingly rare in others?

    In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions.”

    transformer-circuits.pub/2022/

    #AnthropicAI #InterpretableAI #superposition
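
For readers unfamiliar with the setup quoted in post 9, here is a minimal sketch of what "a small ReLU network on sparse synthetic features" could look like. It is an illustration under stated assumptions, not the paper's code: the function names, the sizes (5 features, 2 hidden dimensions), the sparsity level, and the tied-weight form `ReLU(W^T W x + b)` are all choices made for this example, and the weights are random rather than trained.

```python
import random


def sparse_features(n_features, sparsity, rng):
    """Draw one synthetic input (hypothetical helper).

    Each feature is 0 with probability `sparsity`, otherwise uniform in [0, 1).
    """
    return [0.0 if rng.random() < sparsity else rng.random()
            for _ in range(n_features)]


def toy_forward(W, b, x):
    """Forward pass of an assumed tied-weight toy model.

    W has shape (n_hidden, n_features) with n_hidden < n_features, so the
    hidden layer has fewer dimensions than there are input features.
    h = W x, then x_hat = ReLU(W^T h + b).
    """
    n_hidden, n_features = len(W), len(x)
    h = [sum(W[i][j] * x[j] for j in range(n_features))
         for i in range(n_hidden)]
    return [max(0.0, sum(W[i][j] * h[i] for i in range(n_hidden)) + b[j])
            for j in range(n_features)]


rng = random.Random(0)
n_features, n_hidden = 5, 2  # more features than hidden dimensions
W = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)]
b = [0.0] * n_features
x = sparse_features(n_features, sparsity=0.9, rng=rng)
x_hat = toy_forward(W, b, x)  # reconstruction from the 2-dimensional bottleneck
```

Training `W` and `b` to minimize reconstruction error on many such inputs, and then checking which features survive the bottleneck, is the kind of experiment the quote describes; that training loop is left out here.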