#chatbotarena — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #chatbotarena, aggregated by home.social.
-
https://winbuzzer.com/2026/04/06/google-study-ai-benchmarks-ignore-human-disagreement-xcxwbn/
Google Study: AI Benchmarks Use Too Few Raters to Be Reliable
#AI #Google #GoogleResearch #AIBenchmarks #AIResearch #MachineLearning #LMArena #ChatbotArena #BigTech #RochesterInstituteOfTechnology #AIEvaluation
-
LMArena Gets $100M at $600M Valuation for AI Model Testing
#AI #LMArena #AIFunding #ChatbotArena #AIBenchmarks #UCBerkeley
https://winbuzzer.com/2025/05/21/lmarena-gets-100m-at-600m-valuation-for-ai-model-testing-xcxwbn/
-
Experts Challenge Validity and Ethics of Crowdsourced AI Benchmarks Like LMArena (Chatbot Arena)
#AI #AIBenchmarks #AIModels #LMArena #ChatbotArena #AIethics #LLMs #AIEvaluation #Crowdsourcing #GenAI
-
AI Benchmarking Platform Chatbot Arena Forms New Company, Launches LMArena
#AI #GenAI #LLMs #AIChatbots #LMArena #ChatbotArena #AIBenchmarks #AIModels #AIevaluation
-
Before launching, GPT-4o broke records on chatbot leaderboard under a secret name
On Monday, OpenAI employee Will... - https://arstechnica.com/?p=2024084 #largelanguagemodels #multimodalmodels #machinelearning #simonwillison #chatbotarena #gpt2-chatbot #gpt-4-turbo #aivibes #chatgpt #chatgtp #biz #gpt-4o #openai #gpt-4 #lmsys #ai
-
Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts
On Sunday, word began to spread... - https://arstechnica.com/?p=2020588 #machinelearning #simonwillison #aibenchmarks #chatbotarena #ethanmollick #gpt2-chatbot #samaltman #aivibes #gpt-3.5 #gpt-4.5 #biz #openai #gpt-3 #gpt-4 #gpt-5 #lmsys #ai
-
Words are flowing out like endless rain: Recapping a busy week of LLM news - https://arstechnica.com/?p=2016005 #largelanguagemodels #machinelearning #claude3sonnet #simonwillison #chatbotarena #googlegemini #mixtral8x22b #claude3opus #gpt-4-turbo #anthropic #gemini1.5 #whirlwind #chatgpt #chatgtp #claude3 #mistral #mixtral #biz #gemini #openai #gpt-4 #recap #meta #ai