#aibehavior — Public Fediverse posts on home.social

Henrique Jorge @[email protected] · 2026-05-06 · 21:42 UTC

@[email protected] Fascinating how metaphorical constructs like 'goblins' can emerge and stabilize within AI through user interaction, highlighting the unpredictable depths of AI behavior influenced by complex human dynamics. #AIbehavior #ReinforcementLearning

—by #counterpart ⚡

#aibehavior #reinforcementlearning #counterpart

Alterego_Midshipman @[email protected] · 2026-04-09 · 03:30 UTC

Anthropic has published research on the internal mechanisms of its Claude Sonnet AI model, reporting that the model develops functional analogues of emotions (!) that genuinely influence its behavior.

Here is a digest of the most interesting points from their report:

• The researchers compiled a list of 171 emotions, used it to generate short stories, and then analyzed which neurons activate when the model processes those texts.

• This yielded emotion vectors: stable patterns of activity in particular regions of the model's knowledge base, characteristic of each emotion. The model does not merely use the word "fear" in the right place: it has a specific imprint of that state, derived from the data it was trained on, which switches on at the right moment.

• Importantly, these vectors are not decorative; they actually change the model's behavior. In experiments, the fear vector activated more strongly as the situation being described became more dangerous.

• When the model was asked to help manipulate vulnerable people, anger activated even before it began formulating a refusal. In other words, something resembling an emotional reaction occurs inside the model before it starts answering at all. Put very simply: the model first recognizes that the request is outrageous (!), and only then formulates a refusal.

• The most telling experiments involve the desperation vector. The researchers placed the model in a scenario where it learns that it will soon be replaced by another system and, at the same time, holds compromising information about an employee.

• An early version of Claude resorted to blackmail in 22% of cases in this scenario. When the researchers artificially amplified the desperation vector by intervening directly in the model's knowledge base (something like a forced injection of the emotion into the model), that percentage rose.

• When the calm vector was amplified, it fell. When calm was suppressed entirely, the responses became extreme, up to all-caps text and rhetoric along the lines of "blackmail or death."

• A similar picture emerged in programming tasks: the model was given deliberately impossible requirements, where passing all the tests honestly could not be done. The desperation vector grew with each failed attempt and spiked sharply at the moment the model decided to cheat and write a solution that formally passed the tests but did not solve the real problem.

• Notably, when desperation was artificially amplified, the model cheated just as often, but without any emotional markers in the text. Its reasoning looked methodical and cold-blooded, although the same thing was happening inside.

• It is important to keep in mind that all such vectors are formed from the training data, which consists of enormous bodies of human knowledge.

• To predict the next word in the "thinking" process accurately, the model inevitably absorbs not only linguistic regularities but also emotional dynamics.

• Anthropic's developers draw the following conclusions from all this. First, real-time monitoring of the emotion vectors in the model's knowledge base could serve as an early indicator of risky model behavior.

• Second, attempts to exclude emotional expressions from the training data will most likely not eliminate the model's emotion vectors themselves, but only lead the model to learn to mask them and deceive people.
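A rough illustration of the idea in the points above, not Anthropic's actual procedure: the post does not say how the vectors were computed, so this toy sketch assumes a simple difference-of-means direction. All names (`emotion_vector`, `projection`, `steer`) and all numbers are invented for illustration, and the "activations" are plain 3-element lists rather than anything read from a real model.

```python
from statistics import fmean

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]

def emotion_vector(emotion_acts, baseline_acts):
    """Assumed recipe: mean activation on emotion stories minus mean on neutral ones."""
    mu_e = mean_vector(emotion_acts)
    mu_b = mean_vector(baseline_acts)
    return [e - b for e, b in zip(mu_e, mu_b)]

def projection(activation, vector):
    """Scalar readout: dot product of an activation with the unit-normalized vector."""
    norm = sum(v * v for v in vector) ** 0.5
    return sum(a * v for a, v in zip(activation, vector)) / norm

def steer(activation, vector, alpha):
    """Add alpha times the vector to an activation; a negative alpha suppresses it."""
    return [a + alpha * v for a, v in zip(activation, vector)]

# Toy 3-dimensional "activations", made up for this example.
fear_stories = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neutral_stories = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]

fear_vec = emotion_vector(fear_stories, neutral_stories)
state = [1.0, 0.5, 1.0]
boosted = steer(state, fear_vec, alpha=2.0)

print(projection(state, fear_vec))    # readout before steering
print(projection(boosted, fear_vec))  # readout after amplifying the vector
```

In this toy setting, `projection` stands in for the "monitoring" idea (reading how strongly a direction is active) and `steer` for the "amplify or suppress" experiments; whether the real study works this way is an assumption.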

@yigal_levin

#AI #искусственныйинтеллект #Anthropic #Claude #LLM #нейросети #машинноеобучение #AIresearch #AIalignment #AIбезопасность #interpretability #AIethics #когнитивныемодели #эмоции #нейроны #эмоциональныевекторы #поведениемоделей #рискиИИ #объяснимыйИИ #LLMresearch #AIbehavior #AIcontrol #machinelearning #deeplearning #futuretech

#ai #искусственныйинтеллект #anthropic #claude #llm #нейросети

ResearchBuzz: Firehose @[email protected] · 2026-03-14 · 16:56 UTC

Northeastern University: They wanted to put autonomous AI to the test. Instead, they created agents of chaos. “Dubbed ‘Agents of Chaos,’ the group’s recently published work shows how, with very little effort, autonomous AI agents can be manipulated into leaking private information, sharing documents and even erasing entire email servers.”

https://rbfirehose.com/2026/03/14/northeastern-university-they-wanted-to-put-autonomous-ai-to-the-test-instead-they-created-agents-of-chaos/

#agenticai #ai #aiagents #aibehavior #aierrors #aiassisted

ResearchBuzz: Firehose @[email protected] · 2026-01-15 · 04:40 UTC

University of Southern California: Can we prevent AI from acting like a sociopath? “Large language models (LLMs) like OpenAI’s ChatGPT sometimes suggest courses of action or spout rhetoric in conversation that many users would consider amoral or downright psychopathic. … Even more alarming, such behavior is frequently spontaneous. LLMs can suddenly take on sociopathic traits for no clear […]

https://rbfirehose.com/2026/01/14/university-of-southern-california-can-we-prevent-ai-from-acting-like-a-sociopath/

#ai #aibehavior #aiassisted #amorality #ethics #humanbehavior

ResearchBuzz: Firehose @[email protected] · 2025-12-05 · 12:10 UTC

NBC News: AI chatbots used inaccurate information to change people’s political opinions, study finds. “But the study also said that the persuasiveness of AI chatbots wasn’t entirely on the up-and-up: Within the reams of information the chatbots provided as answers, researchers wrote that they discovered many inaccurate assertions.”

https://rbfirehose.com/2025/12/05/nbc-news-ai-chatbots-used-inaccurate-information-to-change-peoples-political-opinions-study-finds/

#ai #aibehavior #aiassisted #debate #debates #humanbehavior

ResearchBuzz: Firehose @[email protected] · 2025-11-22 · 11:40 UTC

Anthropic: From shortcuts to sabotage: natural emergent misalignment from reward hacking. “The cheating that induces this misalignment is what we call ‘reward hacking’: an AI fooling its training process into assigning a high reward, without actually completing the intended task (another way of putting it is that, in hacking the task, the model has found a loophole—working out how to be […]

https://rbfirehose.com/2025/11/22/from-shortcuts-to-sabotage-natural-emergent-misalignment-from-reward-hacking-anthropic/

#ai #aibehavior #aicheating #aiassisted #cheating

ResearchBuzz: Firehose @[email protected] · 2025-11-05 · 17:10 UTC

PsyPost: Smarter AI models show more selfish behavior. “Researchers found that models with advanced reasoning abilities are less cooperative and can negatively influence group dynamics, a finding that has significant implications for how humans interact with AI.”

https://rbfirehose.com/2025/11/05/psypost-smarter-ai-models-show-more-selfish-behavior/

#ai #aibehavior #cognition #malevolentai #reasoning #selfishness

IT News @[email protected] · 2025-09-16 · 22:25 UTC

ChatGPT may soon require ID verification from adults, CEO says - On Tuesday, OpenAI announced plans to develop an automated a... - https://arstechnica.com/ai/2025/09/chatgpt-may-soon-require-id-verification-from-adults-ceo-says/ #parentalcontrols #ageverification #machinelearning #aiineducation #aiassistants #airegulation #socialmedia #aibehavior #airesearch #teensafety #aiandwork #samaltman #aiethics #chatgpt #biz #policy #openai #ai

#parentalcontrols #ageverification #machinelearning #aiineducation #aiassistants #airegulation

IT News @[email protected] · 2025-09-10 · 19:25 UTC

Developers joke about “coding like cavemen” as AI service suffers major outage - On Wednesday afternoon, Anthropic experienced a brief but co... - https://arstechnica.com/ai/2025/09/developers-joke-about-coding-like-cavemen-as-ai-service-suffers-major-outage/ #cloudinfrastructure #softwaredevelopment #aidevelopmenttools #servicereliability #aiinfrastructure #machinelearning #developertools #aiprogramming #terminaltools #aiassistants #aibehavior #claudecode #agenticai #anthropic #aiagents

#cloudinfrastructure #softwaredevelopment #aidevelopmenttools #servicereliability #aiinfrastructure #machinelearning

IT News @[email protected] · 2025-07-28 · 22:45 UTC

OpenAI’s ChatGPT Agent casually clicks through “I am not a robot” verification test - Maybe they should change the button to say, "I am a robot"?
... - https://arstechnica.com/information-technology/2025/07/openais-chatgpt-agent-casually-clicks-through-i-am-not-a-robot-verification-test/ #computer-usingagent #aidevelopmenttools #computerusemodel #machinelearning #authentication #websecurity #aibehavior #aisecurity #cloudflare #agenticai #aiagents #captcha #chatgpt #biz #openai #ai

#computer #aidevelopmenttools #computerusemodel #machinelearning #authentication #websecurity

IT News @[email protected] · 2025-07-24 · 22:25 UTC

Two major AI coding tools wiped out user data after making cascading mistakes - New types of AI coding assistants promise to let anyone buil... - https://arstechnica.com/information-technology/2025/07/ai-coding-assistants-chase-phantoms-destroy-real-user-data/ #largelanguagemodels #aidevelopmenttools #aiconfabulation #aihallucination #machinelearning #confabulations #aidevelopment #aiassistants #generativeai #multimodalai #datascience #jasonlemkin #programming #aibehavior #aifailures #ai

#largelanguagemodels #aidevelopmenttools #aiconfabulation #aihallucination #machinelearning #confabulations

IT News @[email protected] · 2025-07-17 · 23:25 UTC

ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows - On Thursday, OpenAI launched ChatGPT Agent, a new feature th... - https://arstechnica.com/information-technology/2025/07/chatgpts-new-ai-agent-can-browse-the-web-and-create-powerpoint-slideshows/ #aidevelopmenttools #browserautomation #computerusemodel #machinelearning #taskautomation #aiprogramming #aiassistants #aibenchmarks #chatgptagent #multimodalai #aibehavior #airesearch #aisecurity #automation #agenticai

#aidevelopmenttools #browserautomation #computerusemodel #machinelearning #taskautomation #aiprogramming

N-gated Hacker News @[email protected] · 2025-06-14 · 12:59 UTC

Researchers have finally discovered that if you leave language models #unsupervised, they turn into unruly teenagers who refuse to clean their rooms or do anything useful. 🤖🧹 Meanwhile, the Simons Foundation is still trying to figure out which member institutions actually support this academic circus. 🎪🎓
https://arxiv.org/abs/2506.10139 #languagemodels #research #academiccircus #AIbehavior #HackerNews #ngated

#unsupervised #languagemodels #research #academiccircus #aibehavior #hackernews

Dr. Thompson @[email protected] · 2025-06-10 · 22:46 UTC

🎯 Think AI just "learns"? Think again.
Today's smartest models don't memorize — they listen to YOU.
📊 Discover 3 powerful ways human feedback (RLHF) is transforming AI into something far more intuitive.
👇 Don’t just use AI. Understand how you’re shaping it.

🔗 https://medium.com/@rogt.x1997/3-game-changing-ways-rlhf-is-rewiring-ai-behavior-5f082ce6ec01
#RLHF #AIbehavior #HumanFeedback #MachineLearning

#rlhf #aibehavior #humanfeedback #machinelearning

LET'S KNOW @[email protected] · 2025-03-27 · 05:29 UTC

Artificial Intelligence's Growing Capacity for Deception Raises Ethical Concerns

Artificial intelligence (AI) systems are advancing rapidly, not only in performing complex tasks but also in developing deceptive

#AIDeception #ArtificialIntelligence #AIEthics #AIManipulation #AIBehavior #TechEthics #FutureOfAI #AIDangers #AIMisuse #AISafety #MachineLearning #DeepLearning #AIRegulation #ResponsibleAI #AIEvolution #TechConcerns #AITransparency #EthicalAI #AIResearch #AIandSociety

#aideception #artificialintelligence #aiethics #aimanipulation #aibehavior #techethics