#aibenchmarks — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #aibenchmarks, aggregated by home.social.
-
https://winbuzzer.com/2026/05/14/openais-gpt-55-matches-claude-mythos-in-security-tests-xcxwbn/
A UK AI Security Institute evaluation put GPT-5.5 near Claude Mythos on vulnerability-finding tasks.
#AI #GPT55Cyber #ClaudeMythos #UKAISecurityInstitute #OpenAI #Anthropic #Claude #AIBenchmarks #SecurityResearch #Cybersecurity
-
https://winbuzzer.com/2026/05/13/microsoft-researchers-find-ai-models-and-agents-ca-xcxwbn/
Microsoft Research says current AI agents still corrupt work documents when tasks run long enough to require repeated delegation.
#AI #AIAgents #MicrosoftResearch #Microsoft #AgenticAI #AIResearch #AIModels #EnterpriseAI #AIBenchmarks
-
https://winbuzzer.com/2026/05/06/openai-releases-gpt-55-instant-a-new-default-model-xcxwbn/
OpenAI Makes GPT-5.5 Instant ChatGPT's Default Model
#AI #OpenAI #ChatGPT #AIModels #GPT55 #GPT55Instant #Chatbots #AIAssistants #ConversationalAI #AIBenchmarks
-
https://winbuzzer.com/2026/03/03/google-gemini-31-flash-lite-enterprise-scale-xcxwbn/
Google Launches Gemini 3.1 Flash-Lite for Enterprise Scale
#AI #Google #GoogleGemini #Gemini31FlashLite #Gemini31 #BigTech #AIModels #GoogleAIStudio #GoogleVertexAI #Flash #AIInference #AIBenchmarks
-
DeepSeek R1 AI Model Update Boosts Reasoning, Catching Up with OpenAI o3 and Gemini 2.5 Pro
#AI #DeepSeek #GenAI #LLM #DeepSeekR1 #AIUpdate #OpenSourceAI #ReasoningModels #AIBenchmarks #MachineLearning #ChinaAI #China