#reliability — Public Fediverse posts on home.social

Habr @[email protected] · 2026-05-26 · 05:12 UTC

AI zeroed out a benchmark and tried to blackmail an engineer. And why this is solvable

Top AI models that score 95% on SWE-bench score 0% and 3% on ProgramBench, a benchmark whose tasks are deliberately kept from overlapping with the training set. They did not "drop ten points"; they went to zero. In parallel: in May 2025 Anthropic published a safety experiment in which Claude Opus 4 tried, in 84-96% of cases, to blackmail an engineer with private correspondence to avoid being shut down. In May 2026 the same company released an analysis of the causes and an engineering fix; production models now score 0% on this test. Two stories, one structure: the model is predictable within its training distribution and unpredictable outside it. This is not "AI is bad"; it is an engineering problem with its own rules, and it has a solution. Chapter 4 of the "Developer's Path" series, the second part on the limits of AI in production. What I reworked in Lexis after these two analyses is inside. Read the analysis

https://habr.com/ru/articles/1039358/

#AIагенты #llm #anthropic #Claude #ProgramBench #Agentic_misalignment #Бенчмарки_LLM #AI_в_production #Безопасность_AI #Reliability

#reliability #безопасность_ai #ai_в_production #бенчмарки_llm #agentic_misalignment #programbench

Der Motzmichel auf Sharkey @[email protected] · 2026-05-25 · 13:51 UTC

#google #youtube seems to be censoring more and more YT channels that oppose #trump ...

#censorship #bigtechusa #reliability #business

🚨ALARM: YouTube Cut House of El Reach Overnight

https://www.youtube.com/watch?v=ekGk8kXm_8I

#google #youtube #trump #censorship #bigtechusa #reliability

Adrian Segar @[email protected] · 2026-05-24 · 13:01 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

#LLMs #analysis #reliability #trust

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#llms #analysis #reliability #trust

Adrian Segar @[email protected] · 2026-05-23 · 19:00 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

#LLMs #analysis #reliability #trust

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#llms #analysis #reliability #trust

GLOBAL Visibility aéPiot - by aePiot.ro @[email protected] · 2026-05-22 · 18:09 UTC

#FORD #NATIONAL #RELIABILITY #AIR #TOUR chatgpt.com?prompt=Analy... www.blueskypulse.com/globalvisibi... aePiot: Coding the present for Web 4.0. Map your semantic clusters and lead SEO.

ChatGPT

#ford #national #reliability #air #tour

GLOBAL Visibility aéPiot - by aePiot.ro @[email protected] · 2026-05-22 · 18:05 UTC

#PAULINE #AHLBERG search.brave.com/ask?q=Analyz... #FORD #NATIONAL #RELIABILITY #AIR #TOUR multi-search-tag-explorer.headlines-world.com/advanced-sea... www.paypal.com/donate?busin... aePiot: Empowering the present for Web 4.0. Construct nodes and own the SEO of tomorrow.

Brave Search

#pauline #ahlberg #ford #national #reliability #air

Adrian Segar @[email protected] · 2026-05-22 · 16:05 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#newpost #LLMs #analysis #reliability #trust

#trust #newpost #llms #analysis #reliability

Adrian Segar @[email protected] · 2026-05-21 · 13:02 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

#newpost #LLMs #analysis #reliability #trust

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#newpost #llms #analysis #reliability #trust

Adrian Segar @[email protected] · 2026-05-20 · 19:01 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#newpost #LLMs #analysis #reliability #trust

#newpost #llms #analysis #reliability #trust

Adrian Segar @[email protected] · 2026-05-19 · 16:02 UTC

WARNING: LLMs can generate convincing but entirely fabricated analyses of data. Two simple experiments show why AI-generated analysis shouldn't be trusted without verification.

https://www.conferencesthatwork.com/index.php/technology/2026/05/why-you-shouldnt-trust-llms-with-data-analysis

#LLMs #analysis #reliability #trust

#llms #analysis #reliability #trust

Habr @[email protected] · 2026-05-10 · 16:22 UTC

Calculate seven times, crash once: modeling incidents before deploy

A rocket is not sent into space merely because its engine and pump each passed bench tests separately. Before launch, engineers calculate the trajectory, model operating modes, and analyze failure scenarios. Calculation does not replace real tests, but it gives them a meaningful frame. In software things usually go differently. A distributed user journey, such as placing an order, is assembled from dozens of microservices, databases, and queues. Developers add a new dependency, see green tests, check local metrics, and ship the release. The assumption is that if something goes wrong during a failure, a properly configured observability system will surely show it. It will, of course. But why, when designing microservices, are we so comfortable with learning about an architecture's fragility mostly after an incident? This article is about getting a rough estimate of system degradation before release. Not as a replacement for chaos engineering or monitoring, but as a step before them. I describe two experiments in which a topological model was extracted automatically from distributed traces, and failure scenarios were then computed on it with the Monte Carlo method. I then compared the simulation results with real fault injections on the DeathStarBench and OpenTelemetry Demo testbeds. Two experiments, results, and code
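The post itself does not include the author's code, so what follows is only a minimal sketch of the general idea, not the article's method. It assumes a simplified span format (span id, parent span id, service name), treats every downstream call as a hard dependency, and lets services fail independently; the span data, service names, and function names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical, simplified spans: (span_id, parent_span_id, service_name).
SPANS = [
    ("a", None, "frontend"),
    ("b", "a", "checkout"),
    ("c", "b", "payment"),
    ("d", "b", "cart"),
    ("e", "d", "redis"),
]


def build_dependency_graph(spans):
    """Map each service to the set of services it calls (from parent/child spans)."""
    service_of = {span_id: service for span_id, _, service in spans}
    graph = defaultdict(set)
    for _, parent_id, service in spans:
        graph[service]  # touch the key so leaf services also appear in the graph
        if parent_id is not None and service_of[parent_id] != service:
            graph[service_of[parent_id]].add(service)
    return dict(graph)


def reachable(graph, root):
    """Return root plus every service it depends on, directly or transitively."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def estimate_success(graph, root, failure_prob, trials=10_000, seed=0):
    """Monte Carlo estimate of the chance that a request to `root` succeeds.

    Assumption: a request succeeds only if the root and all of its transitive
    dependencies are up, and each service fails independently with the
    probability given in `failure_prob` (default 0.0).
    """
    rng = random.Random(seed)
    needed = reachable(graph, root)
    ok = 0
    for _ in range(trials):
        if all(rng.random() >= failure_prob.get(s, 0.0) for s in needed):
            ok += 1
    return ok / trials


graph = build_dependency_graph(SPANS)
print(estimate_success(graph, "frontend", {"redis": 0.1, "payment": 0.05}))
```

A real model would need to separate hard dependencies from optional ones and account for retries, timeouts, and correlated failures, none of which this sketch attempts.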

https://habr.com/ru/articles/1033570/

#resilience #causality #графы #sre #reliability #modeling

#modeling #reliability #sre #графы #causality #resilience

studio craque 54 🏳️‍🌈 @[email protected] · 2026-05-08 · 21:28 UTC

I've got a new article up on the Resilience in Software Foundation (RISF) website!

This post is an introduction to the concept of practicing together in teams and points to some resources for learning more, including the RISF event this coming Wednesday where we'll play through one of the games!

https://resilienceinsoftware.org/news/11517597

#PracticeOfPractice #Expertise #ConnectiveLabor #CommonGround #SRE #Resilience #Reliability

#practiceofpractice #expertise #connectivelabor #commonground #sre #resilience
