#mechanisticinterpretability — Public Fediverse posts on home.social

UKP Lab @[email protected] · 2026-05-21 · 11:03 UTC

Learn more about Phu and his work: https://phusroyal.github.io/

Welcome to the team, Phu! 👋

#UKPLab #TUDarmstadt #MBZUAI #NLP #NLProc #MechanisticInterpretability #LLMs #AIInterpretability

#ukplab #tudarmstadt #mbzuai #nlp #nlproc #mechanisticinterpretability

UKP Lab @[email protected] · 2026-03-24 · 16:00 UTC

Questions? Discussion? Reach out to us:

Andreas Waldis (UKP Lab/Technische Universität Darmstadt and HSLU Hochschule Luzern), Vagrant Gautam (Universität des Saarlandes), Anne Lauscher (Universität Hamburg), Dietrich Klakow (Universität des Saarlandes), and Iryna Gurevych (UKP Lab/Technische Universität Darmstadt)

#NLProc #Interpretability #LLMs #ExplainableAI #MechanisticInterpretability #AlignedProbing #ModelInternals

#nlproc #interpretability #llms #explainableai #mechanisticinterpretability #alignedprobing

TechLİfe @[email protected] · 2025-12-21 · 18:58 UTC

Gemma Scope Empowers AI Safety Community with Model Transparency

https://techlife.blog/posts/gemma-scope/

#AISafety
#DeepMind
#Gemma
#MechanisticInterpretability
#AIInterpretability

#aisafety #deepmind #gemma #mechanisticinterpretability #aiinterpretability

Habr @[email protected] · 2025-11-14 · 10:02 UTC

[Translation] How to make neural networks more understandable: OpenAI's experiment with sparse models

The AI for Devs team has prepared a translation of OpenAI's research on how training sparse models can make AI more transparent. The authors show that if a model is forced to use fewer connections, understandable computational circuits emerge inside it that can be studied and verified. This could be a step toward building powerful yet interpretable systems.

https://habr.com/ru/articles/966448/

#интерпретируемость #разреженныемодели #mechanisticinterpretability #sparsetransformer #цепочкивычислений #circuits #OpenAI #безопасностьИИ #attention #архитектурамоделей

#архитектурамоделей #attention #безопасностьии #openai #circuits #цепочкивычислений

Longreads @[email protected] · 2025-10-27 · 15:43 UTC

"But every once in a while, Claude breaks bad. It lies. It deceives. It develops weird obsessions. It makes threats and then carries them out. And the frustrating part—true of all LLMs—is that no one knows exactly why." @stevenlevy for Wired

https://www.wired.com/story/ai-black-box-interpretability-problem/

#AI #LLMs #MechanisticInterpretability

#ai #llms #mechanisticinterpretability

Craig @[email protected] · 2025-08-19 · 17:01 UTC

Can someone find me a job doable from Amsterdam in mechanistic interpretability? #AI #MI #MechanisticInterpretability

#ai #mi #mechanisticinterpretability

aijooyoom @[email protected] · 2023-03-22 · 14:36 UTC

@AAKL @NGIZero @Reuters @EC_NGI

Trying to regulate AI can be like regulating math in that suddenly certain calculations are illegal.

Trying to regulate AI can be like regulating the printing press in that suddenly only people with enough lawyers are able to make a printing press.

Trying to regulate AI can be like trying to regulate free speech. From now on only certain forms of speech are allowed (a bit like the book 1984).

Microsoft, Facebook, Google, Nvidia and others are investing billions (trillions?). Big Tech and governments want to continue with data mining, government surveillance and surveillance capitalism. This will go on as long as we let them. The incentives are there. To change that we need social awareness, a consciousness shift or a libre ethical digital revolution.

If you truly want to stop AI then think about how to organize:

anti tech. See “anti-tech revolution why and how”
anti AI / anti AGI. see https://betterwithout.ai

Otherwise, focusing on libre AI, libre RISC-V, libre silicon and research (ethics, safety, mechanistic interpretability) is currently our best hope.

#betterwithoutai #antitechrevolution #libreai #libreriscv #libresilicon #aiethics #ethicalai #aisafety #mechanisticinterpretability #riscv #ai #datamining #governmentsurveillance

#ai #aiethics #aisafety #antitechrevolution #betterwithoutai #datamining

aijooyoom @[email protected] · 2023-03-22 · 14:05 UTC

@AAKL @Reuters

Instead of making laws, focus on funding:

libre AI
libre risc v
libre silicon
research on AI ethics
research on AI safety
research on mechanistic interpretability
NLnet
projects like OpenChatKit and Open Assistant

Laws and regulations (including license restrictions) usually benefit corporations and governments, who have lawyers, lobbyists, perverse incentives and a drive for data mining and surveillance. Even if you do your best to make a good law, it could make it harder for libre efforts to comply. Put your trust in research, decentralization, libre software and libre hardware.

Billions of dollars are being invested by companies like Nvidia, Google and Microsoft.

#libreai #nlnet #libreai #libreriscv #riscv #libresilicon #aiethics #ethicalai #aisafety #mechanisticinterpretability #freelibresoftware #libresoftware

@NGIZero @EC_NGI

#aiethics #aisafety #ethicalai #freelibresoftware #libreai #libreriscv

aijooyoom @[email protected] · 2023-03-19 · 21:37 UTC

@gregorni

Related efforts that I know of.

The ability to run bloom on your own hardware: https://github.com/NouamaneTazi/bloomz.cpp https://github.com/NouamaneTazi/bloomz.cpp/issues/4

Vipergpt. Not sure whether it will be libre. (Hopefully) https://viper.cs.columbia.edu/

King Algorithm Manifesto https://github.com/keskival/king-algorithm-manifesto/blob/main/README.md

Better Without AI which advocates for mechanistic interpretability https://betterwithout.ai

There is also a recommended reading list for stochastic parrots but I am not linking it here since I do not want to promote google docs.

Also does anyone know of any efforts to provide libre riscv AI accelerator that can be run on FPGA?

#bloom #vipergpt #kingalgorithmmanifesto #stochasticparrots #betterwithoutai #mechanisticinterpretability #riscv

#betterwithoutai #bloom #kingalgorithmmanifesto #mechanisticinterpretability #riscv #stochasticparrots

aijooyoom @[email protected] · 2023-03-14 · 19:34 UTC

@tero @simon @estranho @emilymbender

Thanks for the interesting discussion.

IMHO fully open language models (including the weights) under a libre license (without restrictions) in combination with more research on mechanistic interpretability might eventually lead to useful libre models running on our own hardware.

https://duckduckgo.com/?q=mechanistic+interpretability+site%3Ahttps%3A%2F%2Fbetterwithout.ai&ia=web

Question to all: is there anything special about chatgpt4 compared to chatgpt (3.5)?

#chatgpt4 #gpt4 #openai #mechanisticinterpretability

#chatgpt4 #gpt4 #mechanisticinterpretability #openai