#llmbenchmark — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llmbenchmark, aggregated by home.social.

Reddit Tech VN Bot @[email protected] · 2026-01-27 · 22:24 UTC

A discussion of the performance of a dual RTX PRO 6000 AI workstation with 1.15 TB of RAM, comparing GPU-only (INT4) against CPU+GPU (FP8) inference on the MiniMax-M2.1 model. Results: GPU-only is 2–4x faster at prefill but handles at most ~3 concurrent requests because of KV-cache limits. FP8 is slower but scales better to 10+ users, especially with long contexts. Queue time is the key bottleneck. Suitable for in-house agentic coding. #AIWorkstation #LLMBenchmark #MultiUserAI #GPUvsCPU #LocalLLM #HPC #MachineLearning #Tín

#aiworkstation #llmbenchmark #multiuserai #gpuvscpu #localllm #hpc
Hacker News @[email protected] · 2025-12-10 · 15:52 UTC

New benchmark shows top LLMs struggle in real mental health care
https://swordhealth.com/newsroom/sword-introduces-mindeval
#HackerNews #LLMbenchmark #MentalHealth #AIinHealthcare #MentalHealthTech #HealthcareInnovation

#hackernews #llmbenchmark #mentalhealth #aiinhealthcare #mentalhealthtech #healthcareinnovation
Giskard @Giskard · 2025-05-07 · 07:30 UTC

The article presents some key findings from our benchmark:
- Most widely used models aren't necessarily the most reliable
- Some models tend to agree with users regardless of factual accuracy
- The way questions are phrased impacts response reliability
Thanks to Les Echos and Joséphine Boone for this coverage 🤝
Read the article here: https://www.lesechos.fr/tech-medias/intelligence-artificielle/desinformation-rumeurs-influences-quelles-ia-hallucinent-le-plus-2163628
#AISecurity #LLMBenchmark #LesEchos

#aisecurity #llmbenchmark #lesechos