home.social

#metr — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #metr, aggregated by home.social.

  1. Jack Clark puts 60% on fully automated AI R&D by end of 2028, 30% by 2027. The case: benchmarks for every sub-skill trending up — coding (SWE-Bench ~2% → 93.9%), training-loop optimization (2.9x → 52x speedup, human 4x baseline passed three generations back), #METR time horizons (~30s in 2022 to ~12h today). The 30-vs-60 gap is a bet on how often a year-scale human insight still cracks a paradigm.

    benjaminhan.net/posts/20260508

    #AI #AGI #AIsafety #FutureOfWork

  2. An important update 🚨 to the #METR study on developer #productivity using #AI – instead of a 20% loss 📉, they now see a 20% gain 📈 in one year 🤯:

    “We Are Changing Our Developer Productivity Experiment Design”, METR (metr.org/blog/2026-02-24-uplif).

    On HN: news.ycombinator.com/item?id=4

  3. You're being scared with AI layoffs. I looked into who is doing it and why

    A year ago METR proved that AI slows developers down by 19%. In February 2026 they updated the data, and it looks like a reversal toward a speedup. But almost nobody wrote about that. Meanwhile, "AI will lay off 50% of developers" is in every other headline. I dug into who benefits from the AI panic. I found CEOs who lay off thousands and quietly hire them back. I found vendors who scare people with layoffs while opening new vacancies. And "protect your career from AI" courses for $23,000.

    habr.com/ru/articles/1017884/

    #AI #страхономика #AIпаника #увольнения #продуктивность #METR #Klarna #Block

  4. AI's Version of Moore's Law? - Computerphile

    youtube.com/watch?v=evSFeqTZdqs
    metr.org

    Note that the default chart uses a success rate of only 50%; at an 80% success rate the score is much lower. But the interesting part is indeed the rate of progress.

    #AI #LLM #OpenAI #Anthropic #METR

  5. Dear #devs,

    A #METR study found that experienced developers were convinced that #AI made them 20% faster.

    The reality: they took 19% longer.

    Perception vs reality

    🔗 metr.org/blog/2025-07-10-early

    #llm #claudecode #chatgpt #codex #gemini #agents #agentsai

  6. "The supposed #productivity #revolution is not showing up in the numbers: a rigorous #METR study, which cannot be branded technophobic, found that experienced #software #developers were 20% slower when using #AI tools. The problem lies in the gap between capability and reliability: the systems can perform impressive tasks, but with an inconsistency that demands constant human supervision, ..."
    cenital.com/la-burbuja-de-la-i
    #LLM #Capitalismo

  7. People are starting to realize #AI slows you down on projects with a minimal complexity (see the randomized #METR trial and this venturebeat.com/ai/stack-overf), so what's the proposed solution? Put a human in the loop, so the poor can fix the mess. I haven't read the paper, but it sounds so stupid! It comes from #Microsoft by the way, so... arxiv.org/pdf/2507.22358

  8. Very thoughtful analysis by @grimalkina of the experimental design and results from the recent METR study on “the impact of early-2025 AI on experienced open-source developer productivity”.

    fightforthehuman.com/are-devel

    #metr #cursor

  9. METR study: using Cursor slows experienced developers down by 19%

    It is treated as established truth that code autocompletion tools and other assistance from large language models help people program faster. A study by the organization METR calls this factoid into question and even demonstrates the opposite effect. An analysis of the work of 16 programmers found that AI slows a person down by 19%. This contradicts the views of machine learning industry experts, economists, and the experiment's participants themselves. Importantly, the test was run not on yet another set of benchmarks or timed algorithmic puzzles, but on people's ordinary work.

    habr.com/ru/articles/927072/

    #METR #Model_Evaluation_Threat_Research #научные_исследования #большие_языковые_модели #БЯМ #Сursor #программирование #GitHub #Git #автодополнение_кода

  10. A #study by #METR found that #experienceddevelopers using #AIcoding tools on mature projects experienced a 19% #decrease in #productivity, contrary to their 20% increase estimate. While the results suggest limitations in AI coding tools, they do not negate their potential benefits in other contexts. secondthoughts.ai/p/ai-coding- #tech #media #news

  11. Some quick notes on Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, a super interesting study on AI tooling’s effect on productivity.

    https://vale.rocks/micros/20250711-0800

    #AI #LLM #METR

  12. Recent update from #metr #AI research finds that models are increasingly "reward hacking" complex problems presented to them instead of actually solving them. Interesting to read the model's admission to purposefully gaming the system. METR has good dialogue on protecting #CoT reasoning threads going forward too. #OpenAI knows of this hacking, and uses other models as judges to eval CoT to detect hacking. Can this not be trained out?

    Image credit: METR.org on Bsky

    metr.org/blog/2025-06-05-recen

  16. The digital #platform from #metr aims to make existing buildings more #energyefficient and thereby save a lot of #CO2 💚

    Remote monitoring of energy consumption, heating systems, and drinking water systems is meant to help, along with #AI-assisted heating optimization.

    reset.org/gebaeude-digital-dek

    #Nachhaltigkeit #Klimaschutz #Klimawandel #Architektur #Idee #Zukunft #Digitalisierung #Technik #Technologie #DigitalforGood #RESET #Umwelt #Umweltschutz #Natur #Naturschutz #DBU #Stadtplanung

  17. Do you like to make maps like me or do you want to learn how to make them? I found some tutorials for those interested in making beautiful maps using R. They are free to access and easy to follow.

    #R #Rcoding #ggplot2 #codingisfun #tidyverse #terra #osmdata #httr #XML #codinglife #giscoR #metr #elevatr #sundaymood

    Link: milospopovic.net/blog
