home.social

#benchmarking — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #benchmarking, aggregated by home.social.

  1. iai-callgrind gives you deterministic, instruction-count benchmarks. The catch: it needs Valgrind, and Valgrind has zero Apple Silicon support.

    Here's how I run them locally on an M-series Mac in a native arm64 container - seccomp trap and all.

    martinhicks.dev/articles/runni

  2. Measuring AI Security Effectiveness Proves Elusive

    Measuring AI security effectiveness is a complex challenge that can't be reduced to a single score or benchmark. Relying on benchmarks alone simply doesn't work when it comes to safeguarding AI systems.

    osintsights.com/measuring-ai-s

    #AiSecurity #ArtificialIntelligence #Benchmarking #SecurityEffectiveness #SoftwareSecurity

  3. RT @mr_r0b0t: Here is a very popular model that really benefits from putting your @NVIDIAAI Blackwell GPU/GB10 to proper use with NVFP4 and the @AlibabaQwen 3.6-27B native MTP. This was run on a single GB10. Full benchmark results and methods are below ⏬

    more at Arint.info

    #Benchmarking #BlackwellGPU #GB10 #NVFP4 #NVIDIAAI #Qwen3 #arint_info

    https://x.com/mr_r0b0t/status/2056953515092619474#m

  4. 🥳✨ Behold, the ultimate guide to running #LLMs that won't fry your toaster! In a shocking twist, they've actually ranked them by #performance instead of size—mind-blowing, right? 🎉 Because who doesn't want to spend eternity #benchmarking their #AI models instead of doing something productive? 😅
    github.com/Andyyyy64/whichllm #Toaster #Safety #Productivity #HackerNews #ngated

  9. If you want to know who's taller, you don't measure people hours apart with a precise ruler - you line them up side by side

    Denis Bazhenov (JetBrains) applies the same logic to microbenchmarking: instead of running implementations separately and comparing results, run them simultaneously on the same machine. Background noise affects both equally, and you measure relative performance directly.

    🔗 oxidizeconf.com/sessions/just_

    #Oxidize2026 #RustLang #Benchmarking #Perf #SystemsProgramming
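The paired idea in the post above can be sketched in a few lines. This is a minimal illustration, not the speaker's tool: it assumes that "simultaneously" can be approximated by timing the two implementations back to back in shuffled order, and the names `paired_benchmark`, `sum_loop` and `sum_builtin` are hypothetical.

```python
import random
import statistics
import time


def time_once(fn, arg):
    """Return wall-clock seconds for a single call of fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def paired_benchmark(baseline, candidate, arg, pairs=200, seed=0):
    """Time both functions back to back and collect per-pair differences.

    Both samples of a pair are taken within the same short window, so slow
    drift in background noise lands on both. The order inside each pair is
    shuffled (an assumption of this sketch) so that neither function always
    runs second with warmer caches.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(pairs):
        if rng.random() < 0.5:
            b = time_once(baseline, arg)
            c = time_once(candidate, arg)
        else:
            c = time_once(candidate, arg)
            b = time_once(baseline, arg)
        diffs.append(c - b)  # negative means the candidate was faster
    return diffs


# Two hypothetical implementations of the same computation.
def sum_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total


def sum_builtin(n):
    return sum(range(n))


diffs = paired_benchmark(sum_loop, sum_builtin, 20_000)
median_diff = statistics.median(diffs)
print(f"median(candidate - baseline) = {median_diff * 1e6:.1f} us over {len(diffs)} pairs")
```

Reporting the median of the per-pair differences, rather than comparing two separately computed averages, is what keeps shared noise from leaking into the result.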

  13. 😎 One of the coolest things about the F5 Academy last week was getting hands on with `warp` (from the MinIO team):

    🔗 github.com/minio/warp

    ✨ Built for S3 benchmarking but also great for testing the impact of infra changes...and pairs well with Prometheus and Grafana for observability.

    🤭 I enjoyed it enough to reproduce the lab stack on `localhost` to continue experimenting tonight.

    #F5Academy #S3 #benchmarking

  14. Our Eurographics short paper “ConJEB: A Large Elastic Contact Jet Engine Bracket Quadratic Program Dataset” is now available online!

    Current QP benchmark datasets don't contain large, sparse problems that occur in many graphics applications. ConJEB addresses this by creating analogous contact problems for every simulation in the SimJEB dataset.

    diglib.eg.org/handle/10.2312/e

    #Dataset #Simulation #QuadraticPrograms #Benchmarking #SimJEB #ConJEB

  15. 🚀 Oh, joy! Another *essential* benchmark, because what #AI truly needs is a way to flex its lambda calculus muscles. 🤖💪 #GitHub user #VictorTaelin hits us with #LamBench v1, solving the urgent problem of AI's speed in abstract math land. 🧠✨ #TotallyUseful
    victortaelin.github.io/lambenc #Benchmarking #AI #Performance #HackerNews #ngated

  16. QIMMA قِمّة and Its Impact on D…

    QIMMA is a leaderboard that ranks Arabic language models by quality criteria. It matters because it provides a standardized framework for evaluating the effectiveness and accuracy of these models, something that has been scarce until now.

    norvik.tech/news/analisis-qimm

    #Technology #Qimma #LlmArabe #EvaluacionTecnica #Benchmarking #NorvikTech #DesarrolloSoftware #TechInnovation

  20. How to get value from AI when you have no hardware

    Are you sure that deploying enterprise AI in an air-gapped environment requires supercomputers? We decided to test this and get reasonable quality out of a tiny model under the harshest conditions we could: CPU instead of GPU, air-gapped environment. The case study: teaching a tiny LLM to answer questions about Russia's gasification program. The article includes a step-by-step walkthrough, the LLLaMBA code for automating the benchmark, and ready-made configs. Repeat the experiment on your own data! Find out how we did it

    habr.com/ru/companies/gazpromc

    #llm #rag #benchmarking #gpu

  21. Over the weekend, I decided to benchmark some of the VPSes I host services on. I spun up new FreeBSD instances of varying sizes and benchmarked them using sysbench.

    The chart shows the results in three main categories using relative performance numbers, each normalized to "out of 100" for easy comparison... (1/2)

    #vps #freebsd #benchmark #benchmarking #vultr #lightsail
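One common way to get "out of 100" scores like those described above is to scale each category so that its best result equals 100. The sketch below assumes that scheme; the post does not say which normalization was used, and the instance names and figures are hypothetical, not taken from the chart.

```python
def normalize_to_100(scores, higher_is_better=True):
    """Scale scores so the best entry gets 100 and the rest are proportional."""
    if higher_is_better:
        best = max(scores.values())
        return {name: 100 * value / best for name, value in scores.items()}
    # For metrics such as latency, the smallest value is the best one.
    best = min(scores.values())
    return {name: 100 * best / value for name, value in scores.items()}


# Hypothetical numbers for illustration only.
cpu_events_per_sec = {"small": 500.0, "medium": 1000.0, "large": 2000.0}
latency_ms = {"small": 8.0, "medium": 4.0, "large": 2.0}

cpu_norm = normalize_to_100(cpu_events_per_sec)
lat_norm = normalize_to_100(latency_ms, higher_is_better=False)
print(cpu_norm)
print(lat_norm)
```

Normalizing per category makes categories with different units comparable on one chart, at the cost of hiding the absolute numbers.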

  22. Monday madness brings us a new SIGARCH blog and the call for tutorial & workshop proposals for IISWC. Check em out!

    "Beyond Qubits: A Systems View of Hybrid CV-DV Quantum Computing"
    by Yuan Liu, Zihan Chen, Shubdeep Mohapatra, Jim Furches, Zheng (Eddy) Zhang, Huiyang Zhou
    sigarch.org/beyond-qubits-a-sy

    The IISWC 2026 Tutorial & Workshop CFP is officially OPEN!
    📢 If you are working on simulation tools, evaluation methodologies, or emerging domains like AI/ML and Cloud, submit your session proposals.
    Sept 27, 2026 Boulder, Colorado Submit to: [email protected]
    iiswc.org/iiswc2026/cftw.html
    #IISWC2026 #ComputerArchitecture #Benchmarking #WorkloadCharacterization
