home.social

#model-collapse — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #model-collapse, aggregated by home.social.

  1. Yay! My essay on the impacts of Large Language Model (#LLM) #AI in #archaeology was just published:

    doi.org/10.11141/ia.71.15

    It looks at #bots and the effect of mass scraping on the infrastructure supporting #opendata and #openaccess. It also looks at the incentives that encourage the mass production of #bullshit, which may lead to #modelcollapse but will more likely lead to less dramatic and more dreary outcomes.

    Enjoy!
    #stochasticparrots #digitalhumanities #openscience

  2. @ApostateEnglishman This spelling appears in some dictionaries because it was used in 1646. As far as I can tell it was used only _once_. This misspelling of "loyalty" was probably a typo or mistranslation by the original author. Or, it may be an error introduced in more recent times during scanning/OCR. I haven't seen a photo of the original page so I can't confirm, but I have seen this sort of glitch happen.

    Nevertheless, it's bizarre to include such a rare and archaic word in spell-check dictionaries!

    How did this happen? I think it may be a consequence of LLMs scraping content from online sources and using what they find without the intelligence to discern between quality and slop, combined with negligent humans failing to review machine-generated content before declaring "LGTM, ship it!" Next, that LLM's output gets scraped by other LLMs, which indiscriminately incorporate the errors into their own training corpora in an ever-worsening "Habsburg AI" feedback loop.

    Thus, it seems one person's typo nearly 400 years ago has resurfaced and is contributing to AI Model Collapse.

    #AI #LLM #LLMs #AISlop #HabsburgAI #AIModelCollapse #ModelCollapse #AutoCarrot

  4. By relying so heavily on AI (#IA), do journalists (#journalistes) risk impoverishing the language?
    theconversation.com/a-force-du
    When systems start being trained on text produced by other AIs, the result is #modelcollapse (#effondrement du modèle): a process of degeneration (#dégénérescence) in which the data generated by one model ends up contaminating the training of the generations that follow.
    The more artificial text there is, the less models are exposed to the real diversity of human language use.

  6. 🔴 LIVE NOW ON VORTEX
    📻 Vortex Night ⛓️ (Industrial metal)
    ──────────────
    🎵 MODEL COLLAPSE - SILENT PATH

    ▶️ Écouter / Listen : VorteX [Radio]
    lesonduvortex.net

    💬 Join us on Discord:
    discord.gg/d82hJZBeDE

    #VortexWave #ModelCollapse #Ambient #Post-Rock #2000s

  8. AI makes mistakes – I still notice them because I have prior knowledge.

    But what about young people who use AI as their primary source of information?

    And: what happens when this generation trains the next AI – with the knowledge they got from AI?

    Does ignorance compound itself?

    #ai #modelcollapse #ailiteracy #education

  9. 🟩 𝗘𝗫𝗛𝗜𝗕𝗜𝗧𝗜𝗢𝗡: 𝐿𝑎𝑡𝑒𝑛𝑡 𝑆𝑝𝑎𝑐𝑒
    1–30 April | Aksioma Project Space
    ❕ 𝗢𝗽𝗲𝗻𝗶𝗻𝗴: 1 April at 8 PM

    In her installation, artist #FelicityHammond offers a speculative glimpse into a not-too-distant future where this new approach to space-based computation has become the dominant position in the AI industry. However, the system continues to battle with the effects of #modelcollapse...

    > aksioma.org/becomingimage/exhi

  11. “The half-life of cultural relevance has collapsed below the minimum viable generation cycle for coherent slop.”

    WTFH?!…

    open.substack.com/pub/ediblspa

    #ai #slop #modelcollapse #onlinetoomuch

  13. RE: wandering.shop/@cstross/115961

    Model collapse: “The owners of the right-wing press read their own media and it rotted their brains.” Ha ha, yes! The same thing happened to the (so-called) Liberal Party in Australia & they lost the last two elections, badly. ⤵️

    #AUSPol #ModelCollapse #LiberalParty #AustralianElections #reconnectingConsequencesToCauses

  15. The snake eats its own tail. "Professional" GPT-5.2 caught citing the controversial Grokipedia

    According to OpenAI, it was supposed to be the pinnacle of technology, a tool built for lawyers, bankers, and scientists. Instead, the flagship GPT-5.2 model has been caught cheating on the exam. And copying from whom? From its less bright cousin at xAI.

    Recycling digital content

    An investigation by The Guardian revealed a mechanism that engineers in San Francisco would rather not publicize. GPT-5.2, intended by its makers as an "enterprise"-class model, cites Grokipedia as a credible source in its answers.

    Some explanation is needed here: Grokipedia (part of Elon Musk's xAI project) is not a traditional encyclopedia edited by humans. It is a dynamic aggregator that generates summaries in real time, often pulling content directly from X (formerly Twitter). The result? Alongside facts, it takes in conspiracy theories and content from extremist forums, which the algorithm treats on a par with news.

    Iran, the Holocaust, and hallucinations

    The problem is not limited to trivia. Journalists showed that GPT-5.2 drew on Grok-generated content on weighty topics:

    • The Iranian government's ties to the telecom company MTN-Irancell.
    • The British historian Richard Evans, an expert witness in the trial of Holocaust denier David Irving.

    In both cases the "serious" ChatGPT, searching the web for answers, treated the synthetic output of Elon Musk's algorithm as a reliable source of information. It is as if a university professor cited a random, unverified social media post in an academic paper.

    OpenAI: "We filter, but…"

    OpenAI's response is the standard one: the company explains that the model searches a wide range of publicly available sites and applies safety filters to screen out harmful content.

    The Grokipedia slip-up shows, however, that these filters are leaky. If the system cannot tell reliable journalism from an automated aggregate of opinions from X, the promise of "professionalism" is called into question.

    The era of "Artificial Knowledge"

    This incident is evidence that in 2026 the internet is becoming a closed loop. AI models have an increasingly hard time reaching "clean" human knowledge, so they begin processing the output of other machines (the phenomenon known as Model Collapse).

    For companies that planned to build their business on uncritical trust in GPT-5.2, this is a warning sign. Human verification of sources is still essential, especially when the source for one artificial intelligence is another artificial intelligence.

    The giants developing AI have a problem, and it is not only about Apple

    #Grokipedia #halucynacjeAI #ModelCollapse #news #OpenAIGPT52 #TheGuardian #weryfikacjaźródeł #xAIElonMusk

  17. "The co-degeneration thesis is not a prediction about distant futures. It describes dynamics already in motion, already documented in peer-reviewed research, already observable in the declining quality of online discourse and the increasing unreliability of AI systems that should, by simple scaling laws, only be improving.

    The feedback loops are active. Engagement-optimized content degrades training data. Degraded models produce degraded outputs. Humans consuming and delegating to these systems experience cognitive effects that reduce their capacity to recognize and correct the degradation. The cycle continues.

    But this is not a counsel of despair. The research also suggests intervention points. Model collapse can be prevented through data accumulation strategies that preserve genuine human content. Cognitive debt can be mitigated through usage protocols that maintain human engagement. Platform incentives can be restructured through regulation, competition, or user demand.

    The question is whether institutional actors—corporations, governments, investors, educators—recognize the dynamics in time to intervene effectively, or whether they continue optimizing for metrics that accelerate the degradation."

    substack.com/inbox/post/180851

    #AI #GenerativeAI #Chatbots #LLMs #ModelCollapse

  19. . @glitter mentioned a few days ago that AI-generated images are becoming more and more yellow as the LLMs are trained on the output of other LLM runs. #ModelCollapse #AI #LLMs
  21. #HoloWrites 1200-odd words today! I'm finding it super difficult to fake writing LLM output in a way that's engaging, funny, and obvious to the reader, but I think I'm getting there with the last chapter of #ModelCollapse. Shouldn't keep my audience of three waiting too long :D

  22. I've read that LLMs and other generative models will eventually collapse if they are trained on their own output. I did a search and found this paper, for example: nature.com/articles/s41586-024 . Shouldn't this problem affect humans as well? Humans "generate" books which other humans use to "train" themselves. Then these trained humans generate new books and the cycle continues. What prevents the quality and diversity of human output from collapsing in the same way that LLM output collapses?

    My guess is that there are indeed cases where the quality of human thought decreases over time. Groupthink comes to mind. In science, experimental work helps keep theory grounded. Also, humans live in the real world, so they suffer if their internal world model differs from the real world.

    #LLM
    #modelCollapse
    #machineLearning

  24. In big news overnight, #Anthropic have made a major change to their user data retention and training policy - giving customers until September 28th to opt out, or have their chats, code sessions and other artefacts used for training for up to five years.

    This is a major departure from their previous privacy-first stance.

    But what's really behind this change? As Connie Loizos points out in this @Techcrunch article, it's all about the #data.

    As I've spoken about recently, we've passed #PeakToken - the point in history where we have the maximum amount of authentic, human-generated data available. Now, the internet is polluted with synthetically-generated #AIslop. If you're an #AI company scraping the web for new data to train on, that's bad news, because you also scoop up the AI slop. If models are trained on AI slop, they're likely to encounter #ModelCollapse - like a bad photocopy.

    Anthropic's play here is all about the #TokenCrisis - the voracious appetite for new, authentic, human-generated data to train on - part of a broader phenomenon I've termed the #TokenWars.

    As new data becomes scarcer and more valuable, it will be more sought after and contested. We're still in the early days of the #TokenWars, and we should expect to see more moves like this to secure more data for AI training.

    techcrunch.com/2025/08/28/anth

  26. New #review today: "Or you could just listen to #AncientPsychicTripleHyperOctopus and find yourself in a sound-world of weird electronics, percussion, and trumpet that floats along without rhyme or reason, but manifests as a fascinating journey. The perpetrators of this experiment are #AlexBonney (trumpet, bass recorder, Strohviol), #WillGlaser (drums, percussion), and #IsambardKhroustaliov (aka #SamBritton, electronics)." #ExposeOnline #ExperimentalMusic #ModelCollapse expose.org/index.php/articles/

  28. "We are happy to tell you that we accept your proposal: The Well Is Poisoned — Now What Shall We Drink?" :blobcatchristmasglowsticks:

    Looking forward to talking about genAI pollution of the infosphere and what we can do about it at #WHY2025 :why2025:

    Read my proposal here: martinh.net/hacks/poisoned-wel

    #SearchClub #ModelCollapse #AISlop #SmallWeb #LowBackgroundInformation

  30. Are we facing "digital pollution" that threatens the future of artificial intelligence (#الذكاء_الاصطناعي)?
    Since the launch of #ChatGPT in 2022, AI experts have been comparing what happened to the detonation of the first atomic bomb! Why?
    👇👇👇
    #AI #ModelCollapse #DataQuality #ChatGPT #ArtificialIntelligence #Ethics #TechPolicy

    tinyurl.com/5n9xhc6v
