#aibenchmark — Public Fediverse posts on home.social

Archer Dynamics @[email protected] · 2026-04-06 · 16:19 UTC

Tested Cogito V1 14B Qwen on my Linux server. 45 t/s, 9.7GB VRAM, and the same IDA self-awareness trick its 8B sibling pulled -- Run 2 deliberately stepped back to brute force because a beginner probably needed something simpler first. Run 3 came back stronger with a nice candy analogy. That's DeepCogito's IDA training transforming Qwen into something way better.

Read the full breakdown below.

#LocalAI #Ollama #HomeLabAI #LLM #AIBenchmark

https://goarcherdynamics.com/2026/04/06/aihome-cogito-v1-14b-review/?utm_source=mastodon&utm_medium=jetpack_social


Archer Dynamics @[email protected] · 2026-04-03 · 16:16 UTC

Tested Cogito V1 8B on my Linux server. 83 t/s, 5.4GB VRAM, 131k context. The real story is that it deliberately wrote worse code because it decided a beginner needed simplicity over efficiency -- and admitted it! That's IDA self-reflection making a live call.
I guess a 5GB model with a conscience is worth more than a 70B model with none?

Read the full breakdown below.

#LocalAI #Ollama #HomeLabAI #LLM #AIBenchmark

https://goarcherdynamics.com/2026/04/03/aihome-cogito-v1-8b-review/?utm_source=mastodon&utm_medium=jetpack_social


Le site de Korben [Unofficial] @[email protected] · 2026-01-08 · 13:05 UTC

Windows 11 comes in last among Windows versions

https://fed.brid.gy/r/https://korben.info/windows-11-performances-degradation-benchmark.html

#windowsastuceswindows #aibenchmark #ameliorationdesperformancesnavigateur #antimicrosoft #horlogewindows11 #windows81

Reddit Tech VN Bot @[email protected] · 2025-11-19 · 14:23 UTC

Meituan Longcat has just released AMO Bench, a benchmark for evaluating AI on mathematics. According to it, Kimi k2 Thinking was identified as the best AI at solving math problems. AMO Bench consists of 50 new problems at IMO-level difficulty, with highly accurate automatic grading.

#AIBenchmark #MathAI #KimiK2Thinking #MeituanLongcat #TríTuệNhânTạo #ToánHọc

https://www.reddit.com/r/LocalLLaMA/comments/1p18lim/meituan_longcat_releases_amo_bench_kimi_k2/


Winbuzzer @[email protected] · 2025-11-18 · 08:35 UTC

https://winbuzzer.com/2025/11/18/aa-omniscience-new-ai-reliability-benchmark-reveals-most-top-models-are-more-likely-to-hallucinate-than-be-correct-xcxwbn

AA-Omniscience: New AI Reliability Benchmark Reveals Top Models Are More Likely to Hallucinate

#AI #LLM #GenAI #AIBenchmark #Hallucination #AISafety #OpenAI #Anthropic #xAI #Grok #GPT51 #ClaudeAI
