#benchmarking — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #benchmarking, aggregated by home.social.
-
iai-callgrind gives you deterministic, instruction-count benchmarks. The catch: it needs Valgrind, and Valgrind has zero Apple Silicon support.
Here's how I run them locally on an M-series Mac in a native arm64 container - seccomp trap and all.
#rust #rustlang #performance #benchmarking #applesilicon #valgrind
https://martinhicks.dev/articles/running-iai-callgrind-on-apple-silicon
-
Measuring AI Security Effectiveness Proves Elusive
Measuring AI security effectiveness is a complex challenge that can't be reduced to a single score or benchmark. Relying on benchmarks alone simply doesn't work when it comes to safeguarding AI systems.
#AiSecurity #ArtificialIntelligence #Benchmarking #SecurityEffectiveness #SoftwareSecurity
-
RT @mr_r0b0t: Here is a very popular model that really benefits from making proper use of your @NVIDIAAI Blackwell GPU/GB10 with NVFP4 and the native MTP of @AlibabaQwen 3.6-27B. This was run on a single GB10. Full benchmark results and methods are below ⏬
more at Arint.info
#Benchmarking #BlackwellGPU #GB10 #NVFP4 #NVIDIAAI #Qwen3 #arint_info
-
via #AIFoundry : How to run evals for the model router
https://ift.tt/XAF1Ivt
#ModelRouter #Foundry #Evals #Evaluations #LLM #AIModelRouting #PromptEngineering #ModelSelection #Latency #Cost #Quality #Benchmarking #OpenSource #GitHub #EvalRepo #Azure #AzureOpenAI #Claude #Fou… -
Stellantis Joins World Class Manufacturing Association
🕑2 min read As the automotive industry continues evolv…
#Netherlands #Nederland #NL #Europe #Europa #EU #Stellantis #Assembly #Automation #Automotive #Benchmarking #Chrysler #Dodge #Efficiency #Engineering #Factories #global #Improvement #Industry #Innovation #Jeep #Lean #Manufacturing #Operations #Performance #Production #productivity #Quality #Ram #WCM #WCMA #WorldClassManufacturingAssociation
https://www.europesays.com/netherlands/11866/ -
🥳✨ Behold, the ultimate guide to running #LLMs that won't fry your toaster! In a shocking twist, they've actually ranked them by #performance instead of size—mind-blowing, right? 🎉 Because who doesn't want to spend eternity #benchmarking their #AI models instead of doing something productive? 😅
https://github.com/Andyyyy64/whichllm #Toaster #Safety #Productivity #HackerNews #ngated -
If you want to know who's taller, you don't measure people hours apart with a precise ruler - you line them up side by side
Denis Bazhenov (JetBrains) applies the same logic to microbenchmarking: instead of running implementations separately and comparing results, run them simultaneously on the same machine. Background noise affects both equally, and you measure relative performance directly.
🔗 https://oxidizeconf.com/sessions/just_stand_them_next_to_each_other
#Oxidize2026 #RustLang #Benchmarking #Perf #SystemsProgramming
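A rough sketch of the side-by-side idea, in Python for brevity. It is not taken from the talk: the function name `paired_benchmark` and its parameters are hypothetical, and it interleaves the two implementations in random order within each round rather than running them truly concurrently, which is only one possible way to approximate "standing them next to each other". The point it illustrates is that the quantity collected is the per-round difference, not two independent averages.

```python
import random
import statistics
import time


def paired_benchmark(baseline, candidate, make_input, rounds=200):
    """Return the median per-round time difference (candidate - baseline) in seconds.

    Both functions run back to back on the same input in every round, in a
    random order, so drift and background noise hit them roughly equally.
    """
    diffs = []
    for _ in range(rounds):
        data = make_input()
        pair = [("baseline", baseline), ("candidate", candidate)]
        random.shuffle(pair)  # avoid always giving one side the warm cache
        elapsed = {}
        for name, fn in pair:
            start = time.perf_counter()
            fn(data)
            elapsed[name] = time.perf_counter() - start
        diffs.append(elapsed["candidate"] - elapsed["baseline"])
    return statistics.median(diffs)


if __name__ == "__main__":
    def sum_loop(xs):
        # Deliberately naive baseline to compare against the built-in sum().
        total = 0
        for x in xs:
            total += x
        return total

    diff = paired_benchmark(sum_loop, sum, lambda: list(range(10_000)))
    print(f"median difference: {diff * 1e6:.1f} µs (negative means the candidate is faster)")
```

The functions under test should not mutate their input, since both receive the same object within a round.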
-
Through the looking glass of benchmark hacking
https://poolside.ai/blog/through-the-looking-glass
#HackerNews #benchmarking #hacking #cybersecurity #technology #insights
-
https://www.europesays.com/afrique/97722/ Telecoms: Cameroon modernizes its digital regulation #Afrique #art #benchmarking #Cameroun #CoopérationRégionale #Cybersécurité #EconomieNumérique #GouvernanceNumérique #InfrastructuresNumeriques #InnovationTechnologique #NCC #Nigeria #ProtectionDesConsommateurs #régulation #SpectreDesFréquences #telecommunications
-
Hospital management: data transparency becomes a strategic success factor
LOGEX sees integrated data strategies as the key to running hospitals economically
#Benchmarking #Controlling #Datentransparenz #Kliniksteuerung #Krankenhausreform #LOGEX #Medizincontrolling -
https://www.europesays.com/uk/941105/ Luma opens Uni-1.1 API to image developers in Europe #Adobe #AIAdoption #AIEthics&Governance #AMD #ArtificialIntelligence(AI) #Benchmarking #ComputerVision #DeveloperTools #EU #Europe #Europe(European) #European #GenerativeAI(GenAI) #london #Luma #MachineLearning(ML) #MarketingTechnologies(MarTech) #PromptEngineering #SoftwareDevelopment #UnitedKingdomUK
-
😎 One of the coolest things about the F5 Academy last week was getting hands on with `warp` (from the MinIO team):
🔗 https://github.com/minio/warp
✨ Built for S3 benchmarking but also great for testing the impact of infra changes...and pairs well with Prometheus and Grafana for observability.
🤭 I enjoyed it enough to reproduce the lab stack on `localhost` to continue experimenting tonight.
-
Our Eurographics short paper “ConJEB: A Large Elastic Contact Jet Engine Bracket Quadratic Program Dataset” is now available online!
Current QP benchmark datasets don't contain the large, sparse problems that occur in many graphics applications. ConJEB addresses this by creating analogous contact problems for every simulation in the SimJEB dataset.
https://diglib.eg.org/handle/10.2312/egs20261017
#Dataset #Simulation #QuadraticPrograms #Benchmarking #SimJEB #ConJEB
-
I benchmarked Claude Code's caveman plugin against "be brief."
https://www.maxtaylor.me/articles/i-benchmarked-caveman-against-two-words
#HackerNews #benchmarking #caveman #plugin #Claude #Code #efficiency #tech #news
-
🚀 Oh, joy! Another *essential* benchmark, because what #AI truly needs is a way to flex its lambda calculus muscles. 🤖💪 #GitHub user #VictorTaelin hits us with #LamBench v1, solving the urgent problem of AI's speed in abstract math land. 🧠✨ #TotallyUseful
https://victortaelin.github.io/lambench/ #Benchmarking #AI #Performance #HackerNews #ngated -
QIMMA قِمّة and its impact on the D…
QIMMA is a leaderboard that ranks Arabic language models by quality criteria. It matters because it provides a standardized framework for evaluating the effectiveness and accuracy of these models, something that has been scarce until now.
https://norvik.tech/news/analisis-qimma-arabic-llm-leaderboard
#Technology #Qimma #LlmArabe #EvaluacionTecnica #Benchmarking #NorvikTech #DesarrolloSoftware #TechInnovation
-
How to get value from AI when you have no hardware
Are you sure that deploying corporate AI in a closed environment requires supercomputers? We decided to test this and get reasonable quality out of a tiny model under the harshest possible conditions: CPU instead of GPU, closed environment. The case: teaching a tiny LLM to answer questions about Russia's gasification program. The article includes a step-by-step walkthrough, the LLLaMBA code for automating the benchmark, and ready-made configs. Repeat the experiment on your own data! Find out how we did it
-
Over the weekend, I decided to benchmark some of the VPSes I host services on. I spun up new FreeBSD instances of varying sizes and benchmarked them using sysbench.
The chart shows the results in three main categories using relative performance numbers, each normalized to "out of 100" for easy comparison... (1/2)
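The post does not say how the "out of 100" normalization was done. One common approach, assumed here, is to scale each category so the best instance scores 100. The function name `normalize_to_100` and all the numbers below are hypothetical and only illustrate the arithmetic.

```python
def normalize_to_100(scores, higher_is_better=True):
    """Scale raw results so the best entry in the category scores 100."""
    if not scores:
        return {}
    if any(value <= 0 for value in scores.values()):
        raise ValueError("scores must be positive to be normalized")
    if higher_is_better:
        # Throughput-style metrics: divide by the largest value.
        best = max(scores.values())
        return {name: round(100 * value / best, 1) for name, value in scores.items()}
    # Latency-style metrics: the smallest value is the best one.
    best = min(scores.values())
    return {name: round(100 * best / value, 1) for name, value in scores.items()}


if __name__ == "__main__":
    # Made-up events-per-second figures for three instance sizes.
    cpu = {"small": 1200.0, "medium": 2400.0, "large": 4800.0}
    print(normalize_to_100(cpu))
    # {'small': 25.0, 'medium': 50.0, 'large': 100.0}

    # Made-up latencies in milliseconds, where lower is better.
    latency = {"small": 8.0, "large": 2.0}
    print(normalize_to_100(latency, higher_is_better=False))
    # {'small': 25.0, 'large': 100.0}
```

Normalizing per category keeps the categories comparable on one chart, but it hides absolute differences, so the raw numbers are still worth publishing alongside.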
-
Monday madness brings us a new SIGARCH blog and the call for tutorial & workshop proposals for IISWC. Check em out!
"Beyond Qubits: A Systems View of Hybrid CV-DV Quantum Computing"
by Yuan Liu, Zihan Chen, Shubdeep Mohapatra, Jim Furches, Zheng (Eddy) Zhang, Huiyang Zhou
https://www.sigarch.org/beyond-qubits-a-systems-view-of-hybrid-cv-dv-quantum-computing/
The IISWC 2026 Tutorial & Workshop CFP is officially OPEN!
📢 If you are working on simulation tools, evaluation methodologies, or emerging domains like AI/ML and Cloud, submit your session proposals.
Sept 27, 2026 Boulder, Colorado Submit to: [email protected]
https://iiswc.org/iiswc2026/cftw.html
#IISWC2026 #ComputerArchitecture #Benchmarking #WorkloadCharacterization -