#bullshitbenchmark — Public Fediverse posts on home.social

A cool test of how much different #AI models #hallucinate: the #BullshitBenchmark

The #Claude and #Qwen models seem to push back more when confronted with nonsensical questions. The #OpenAI models do not fare well.

#ai #hallucinate #bullshitbenchmark #claude #qwen #openai
