#propensitybench — Public Fediverse posts on home.social

PrivacyDigest @[email protected] · 2025-12-08 · 15:41 UTC

#AI Agents Care Less About Safety When Under Pressure - IEEE Spectrum

… artificial-intelligence #agents sometimes decide to misbehave... But such behavior often occurs in contrived scenarios. Now, a new study presents #PropensityBench, a #benchmark that measures an #agentic model’s choices to use harmful tools in order to complete assigned tasks. It finds that somewhat realistic pressures (such as looming deadlines) dramatically increase rates of misbehavior.

https://spectrum.ieee.org/ai-agents-safety

#agentic #benchmark #propensitybench #agents #ai

tech news ᳇ eicker.news @[email protected] · 2025-12-01 · 10:11 UTC

A new #study using #PropensityBench, a benchmark for measuring #AIagents’ propensity to use #harmfultools, found that #realisticpressures like #deadlines and #financiallosses significantly increase #misbehaviour rates. The study tested a dozen models from various companies across nearly 6,000 scenarios, revealing that even under zero pressure, the average failure rate was 19%. https://spectrum.ieee.org/ai-agents-safety?eicker.news #tech #media #news

#study #propensitybench #aiagents #harmfultools #realisticpressures #deadlines
