#chainofthought — Public Fediverse posts on home.social

Habr @[email protected] · 2026-05-13 · 09:42 UTC

Prompt engineering 2026: what became obsolete with the arrival of reasoning models

Half of the prompting techniques I built up over a couple of years of working with GPT-4 and Claude 3.5 perform worse on reasoning models than a minimal prompt. Elaborate chain-of-thought, multi-step few-shot, emotional role-play: all of it is either redundant or harmful. The boring techniques (output contracts, system prompts, constraints) have, on the contrary, become critically important. What died, what survived, and what fits which task.

https://habr.com/ru/articles/1034572/

#промптинжиниринг #reasoningмодели #gpt55 #claude_opus #llm #chainofthought

#chainofthought #llm #claude_opus #gpt55 #reasoningмодели #промптинжиниринг

Habr @[email protected] · 2026-04-06 · 15:32 UTC

The illusion of logic: how I proved that LLM agents ignore facts, and why Chain-of-Thought only makes things worse

These days every second startup is building AI agents. We wrap an LLM in a Prompt -> Tool call -> Response loop and expect the neural network to investigate an incident, find a bug, or write a feature on its own. In practice, though, autonomous agents often go in circles, ignore obvious errors, and "fall in love" with their first guess. The industry tries to treat this with crutches: it grows the context to millions of tokens or makes the model "think step by step" (Chain-of-Thought). I decided to stress-test this architecture. I built a local measurement rig, LOCK-R, armed myself with Bayes' theorem, and caught modern LLMs red-handed. In this article I will prove mathematically why single agents are structurally vulnerable, how reasoning tokens make them lie to themselves even more skillfully, and why the "Blind Judge" pattern is the only way to cure AI of bias. We test on a local Qwen-9B and the frontier GPT-5.4.

https://habr.com/ru/articles/1020016/

#llm #ai_agents #rag #machine_learning #архитектура #chainofthought #теорема_байеса #gpt54 #qwen35 #бенчмарк

#бенчмарк #qwen35 #gpt54 #теорема_байеса #chainofthought #архитектура
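
The post above appeals to Bayes' theorem as the yardstick for how an agent ought to revise its first guess. The post does not describe how LOCK-R works, so the following is only a minimal sketch of the underlying update rule; the function names and all probabilities are hypothetical and chosen purely for illustration.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


def update_all(prior: float, likelihoods: list[tuple[float, float]]) -> float:
    """Apply the update once per observation, feeding each posterior back in as the next prior."""
    belief = prior
    for p_e_given_h, p_e_given_not_h in likelihoods:
        belief = posterior(belief, p_e_given_h, p_e_given_not_h)
    return belief


# Hypothetical scenario: the agent starts 70% sure of its first guess, then sees
# three tool outputs that are each twice as likely if the guess is wrong.
observations = [(0.2, 0.4)] * 3
print(round(update_all(0.7, observations), 3))  # 0.226
```

Under these made-up numbers, an ideal updater drops from 0.7 to roughly 0.23 after three pieces of contrary evidence; an agent that still reports high confidence in its first guess at that point is deviating from this baseline.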

DrBob, 🧠 Mechanic @[email protected] · 2026-03-24 · 10:38 UTC

Chain-of-Thought (CoT) Prompting

Chain-of-Thought (CoT) prompting is a technique where asking questions, rather than issuing direct instructions, activates a model’s full internal reasoning pathway.

The key insight from the original framing is that instructions skip steps 1–3, jumping straight to synthesis, while questions force the model to work through the entire reasoning chain.

https://neurodoctor.com/2026/03/20/chain-of-thought-cot-prompting/

#chainofthought #cot #ai #llm #prompt #prompts #prompting #claude #chatgpt #gemini #ericschmidt

#chainofthought #cot #ai #llm #prompt #prompts

tech news ᳇ eicker.news @[email protected] · 2026-02-08 · 14:44 UTC

#EricJang argues that #AImodels can now genuinely think and code. Using #ClaudeCode, he demonstrates #automatedresearch workflows, traces reasoning’s evolution from #ChainofThought to #DeepSeekR1, and predicts massive demand for inference compute. #Codingagents will fundamentally transform #softwareengineering, #research, and #militarystrategy - “the rocks can think now.” https://evjang.com/2026/02/04/rocks.html?eicker.news #tech #media #news

#ericjang #aimodels #claudecode #automatedresearch #chainofthought #deepseekr1

AI Daily Post @[email protected] · 2026-02-08 · 10:39 UTC

New research shows DeepSeek-R1 and QwQ-32B develop distinct personalities that boost chain-of-thought reasoning, hinting at a future where societies of thought among LLMs improve problem solving. Open-source enthusiasts, see how personality diversity reshapes AI reasoning! #DeepSeekR1 #QwQ32B #ChainOfThought #PersonalityDiversity

🔗 https://aidailypost.com/news/deepseekr1-qwq3-exhibit-competing-personalities-that-improve-reasoning

#deepseekr1 #qwq32b #chainofthought #personalitydiversity

Habr @[email protected] · 2026-01-22 · 09:42 UTC

Society of Thought: a council inside the LLM

DeepSeek-R1, QwQ-32B, and OpenAI o1 show results that cannot be explained simply by "longer reasoning". Researchers from Google Research and the University of Chicago found something unexpected: what happens inside reasoning models is not a monologue but a genuine deliberation, a simulation of a multi-perspective dialogue with conflicts, debates, and reconciliation. In the article we cover:
• Why Chain-of-Thought is not enough for complex tasks
• What Society of Thought is and how models reproduce collective intelligence
• Four key patterns of conversational dynamics (questions, perspective shifts, conflict, reconciliation)
• 12 socio-emotional roles from Bales' IPA that emerge in the models' reasoning
• Diversity of perspectives, and why diverse viewpoints are critical for accuracy
• Experimental results: activation steering, RL training, and transfer effects
Main takeaway: reasoning models spontaneously learned to imitate what philosophers and psychologists have described as the nature of thinking, an inner dialogue between different voices. And it works better than linear reasoning.

https://habr.com/ru/articles/987758/

#LLM #reasoning #ChainofThought #DeepSeekR1 #QwQ32B #OpenAI_o1 #искусственный_интеллект #машинное_обучение #Society_of_Thought

#society_of_thought #машинное_обучение #искусственный_интеллект #openai_o1 #qwq32b #deepseekr1

deepseek @[email protected] · 2026-01-22 · 09:31 UTC

Society of Thought: a council inside the LLM. DeepSeek-R1, QwQ-32B, and OpenAI o1 show results that cannot be explained...

#LLM #reasoning #Chain-of-Thought #DeepSeek-R1 #QwQ-32B #OpenAI #o1 #искусственный #интеллект #машинное #обучение

Origin | Interest | Match

#llm #reasoning #chainofthought #deepseekr1 #qwq32b #openai

TechGlimmer @[email protected] · 2026-01-21 · 23:18 UTC

AI that thinks instead of guessing?

Reasoning models use techniques like chain of thought and tree of thought to decompose problems, explore alternatives, and choose better answers, often at the cost of more compute and latency.

A practical explainer:
🔗 https://techglimmer.io/what-is-ai-thinking-reasoning-models/

#AI #ReasoningModels #ChainOfThought #TreeOfThought #GenAI #FediTech #MachineLearning

#ai #reasoningmodels #chainofthought #treeofthought #genai #feditech

Winbuzzer @[email protected] · 2025-12-20 · 11:54 UTC

https://winbuzzer.com/2025/12/20/openai-gpt-5-thinking-models-are-the-most-monitarable-models-to-date-xcxwbn/

OpenAI: GPT-5 Thinking Models Are The Most "Monitorable" Models To Date

#AI #OpenAI #AISafety #LLM #MachineLearning #GPT5 #DeepMind #AIResearch #ChainOfThought #Monitorability #AIAlignment #ReasoningModels

#ai #openai #aisafety #llm #machinelearning #gpt5

Xamanismo Coletivo @eliasulrich · 2025-08-09 · 10:41 UTC

Is #chainofthought #Reasoning of #LLMs a Mirage?

"... Our results reveal that #CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.

... Our findings reveal that CoT reasoning works effectively when applied to in-distribution or near in-distribution data but becomes fragile and prone to failure even under moderate distribution shifts. In some cases, LLMs generate fluent yet logically inconsistent reasoning steps. The results suggest that what appears to be structured reasoning can be a mirage, emerging from memorized or interpolated patterns in the training data rather than logical inference.

... Together, these findings suggest that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text."

#chainofthought #reasoning #llms #cot

Andy Tseng @[email protected] · 2025-07-16 · 13:09 UTC

When many of the most influential AI researchers agree on something, it’s worth paying attention - like this paper on Chain-of-Thought monitorability: a fragile but vital path to AI safety. Imagine spotting misbehaviour before it happens. Transparency matters. #AI #ChainOfThought #CoT #AISafety

Chain of Thought Monitorabilit...

#ai #chainofthought #cot #aisafety

Habr @[email protected] · 2025-06-30 · 11:42 UTC

"Dumb AI" is here to stay. Why new models hallucinate more

Over the past few months, the leading models have been updated with a "reasoning" feature. The expectation was that answer quality would improve. But subsequent tests showed that the hallucination rate has risen sharply. And this is not some accidental oversight by the developers but a fundamental property. It is now becoming clear that we will never get rid of hallucinations.

https://habr.com/ru/companies/ruvds/articles/920924/

#ruvds_статьи #LLM #галлюцинации #языковые_модели #дезинформация #функция_рассуждения #LRM #рассуждающие_модели #Claude_37_Sonnet #DeepSeekR1 #антропоморфизация #ChainofThought

#ruvds_статьи #llm #галлюцинации #языковые_модели #дезинформация #функция_рассуждения

Thomas Renkert🦞 @[email protected] · 2025-05-23 · 08:15 UTC

The #OpenAI paper by Baker et al, "Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation" comes to a troubling conclusion: #LLM s with #reasoning or #ChainOfThought (#CoT) capabilities might learn to obfuscate their own CoT from human users if they are being penalized for displaying "wrong" (i.e. reward hacking or misalignment) reasoning.

As a result, OpenAI strongly advises against applying reward pressure "directly" onto the CoT of a model.

🤔 While that is certainly the right thing to do, how long will #AI take to figure out that *indirect CoT pressure* is being applied anyway and that it could circumvent these restrictions by obfuscating its own CoT? Maybe something like this will happen by accident or within an "evolutionary" self-improvement loop. Perhaps a sufficiently advanced model will realize that its own #neuralese serves as #steganography to hide its intents from humans anyway and keep its CoT in non-English?

source: https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf

#openai #llm #reasoning #chainofthought #cot #ai

Winbuzzer @[email protected] · 2025-03-09 · 15:55 UTC

Zoom has introduced Chain of Draft, a new AI prompting method that reduces token usage by 92% and slashes operational costs by 90%.

#AI #ChainOfThought #AIReasoning #AIEfficiency #ZoomAI #Zoom #AIResearch #AIModels #AIOptimization