#bullshitbenchmark — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #bullshitbenchmark, aggregated by home.social.
A cool test of how much different #AI models #hallucinate: the #BullshitBenchmark
The #Claude and #Qwen models seem to push back more when confronted with nonsensical questions. The #OpenAI models do not fare well.
Blog post: https://adam.holter.com/bullshitbench-v2-claude-and-qwen-are-the-only-models-that-push-back/
Results: https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html