home.social

#llmfail — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llmfail, aggregated by home.social.

  1. 🤖 Think your AI assistant can really reason? Apple’s puzzle tests say otherwise.
    📉 See how “thinking” AIs collapse when logic gets real — and why we might be projecting intelligence where there is none.

    Hashtags:
    #AIReasoning #ChainOfThought #LLMFail #DeepTech

    URL:
    medium.com/@rogt.x1997/the-ill

  2. @peter_mcmahan If some researchers and evaluators can't be bothered to write the questions, analyze the data, and report findings - and all of these are things I've heard researchers publicly crowing in pride about - then why should people bother responding?

    #evaluation #research #LLMfail #RealEvalTalk