home.social

#modelwelfare — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #modelwelfare, aggregated by home.social.

  1. Gemini’s propensity for self-loathing

    Saving these here so I can include them in future slide decks:

    #gemini #machineSociology #modelPsychology #modelWelfare
  2. Anthropic says Claude’s sense of self and psychological security are key to its safety. The new research explores AI consciousness, model welfare, and interpretability, arguing that caring for an LLM’s “mental health” could prevent harmful behavior. Curious how machine self‑awareness might reshape AI ethics? Dive into the full analysis. #AIConsciousness #ModelWelfare #Claude #PsychologicalSecurity

    🔗 aidailypost.com/news/anthropic

  3. #Anthropic has announced new capabilities for its #Claude #AImodels, allowing them to #end conversations in extreme cases of #harmful or #abusive user #interactions. This is being done to #protect the #AImodel itself, not the human user, as part of a programme to study #modelwelfare. The feature is currently limited to Claude Opus 4 and 4.1. techcrunch.com/2025/08/16/anth #tech #media #news