#osworld · Public Fediverse posts
Live and recent posts from across the Fediverse tagged #osworld, aggregated by home.social.
-
#Anthropic announces major updates to its #AI model lineup:
💻 The upgraded #Claude35Sonnet shows significant improvements:
• Scores 49% on the #SWEbench Verified coding benchmark
• Leads in software engineering capabilities
• Keeps the same price and speed as its predecessor
• Tested by the US and UK #AI Safety Institutes
Introducing the new #Claude35Haiku:
• Matches #Claude3Opus performance at a lower cost
• Scores 40.6% on #SWEbench Verified
• Optimized for user-facing products
• Available across multiple cloud platforms
🖱️ Pioneering #ComputerUse beta feature:
• Lets Claude operate computer interfaces the way a person does: viewing the screen, moving the cursor, clicking and typing
• Scores 22% on the #OSWorld benchmark
• Still in an experimental phase
• Backed by new safety classifiers
⚡ Enterprise adoption:
• #GitLab reports a 10% improvement on DevSecOps tasks
• #Replit uses computer use to evaluate the apps it builds
• #Cognition reports improved problem-solving capabilities