#benchmarking — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #benchmarking, aggregated by home.social.
-
Through the looking glass of benchmark hacking
https://poolside.ai/blog/through-the-looking-glass
#HackerNews #benchmarking #hacking #cybersecurity #technology #insights
-
https://www.europesays.com/afrique/97722/ Telecoms: Cameroon modernizes its digital regulation #Afrique #art #benchmarking #Cameroun #CoopérationRégionale #Cybersécurité #EconomieNumérique #GouvernanceNumérique #InfrastructuresNumeriques #InnovationTechnologique #NCC #Nigeria #ProtectionDesConsommateurs #régulation #SpectreDesFréquences #telecommunications
-
https://www.europesays.com/uk/941105/ Luma opens Uni-1.1 API to image developers in Europe #Adobe #AIAdoption #AIEthics&Governance #AMD #ArtificialIntelligence(AI) #Benchmarking #ComputerVision #DeveloperTools #EU #Europe #Europe(European) #European #GenerativeAI(GenAI) #london #Luma #MachineLearning(ML) #MarketingTechnologies(MarTech) #PromptEngineering #SoftwareDevelopment #UnitedKingdomUK
-
😎 One of the coolest things about the F5 Academy last week was getting hands on with `warp` (from the MinIO team):
🔗 https://github.com/minio/warp
✨ Built for S3 benchmarking, but also great for testing the impact of infra changes... and pairs well with Prometheus and Grafana for observability.
🤭 I enjoyed it enough to reproduce the lab stack on `localhost` to continue experimenting tonight.
-
Our Eurographics short paper “ConJEB: A Large Elastic Contact Jet Engine Bracket Quadratic Program Dataset” is now available online!
Current QP benchmark datasets don't contain the large, sparse problems that occur in many graphics applications. ConJEB addresses this by creating analogous contact problems for every simulation in the SimJEB dataset.
https://diglib.eg.org/handle/10.2312/egs20261017
#Dataset #Simulation #QuadraticPrograms #Benchmarking #SimJEB #ConJEB
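For readers outside numerical optimization, one common textbook form of a quadratic program (QP) is shown below. The symbols are generic assumptions and are not taken from the paper. $Q$ is symmetric positive semidefinite, and the inequality rows $Ax \ge b$ are where contact (non-penetration) constraints would typically go. The post's point is that in graphics contact problems, $Q$ and $A$ are large and sparse, which is the regime existing QP benchmark sets under-represent. The exact formulation ConJEB uses is not stated here.

```latex
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\, x^\top Q\, x + c^\top x
\quad \text{subject to} \quad A x \ge b,
\qquad Q = Q^\top \succeq 0
```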
-
Monday madness brings us a new SIGARCH blog post and the call for tutorial & workshop proposals for IISWC. Check 'em out!
"Beyond Qubits: A Systems View of Hybrid CV-DV Quantum Computing"
by Yuan Liu, Zihan Chen, Shubdeep Mohapatra, Jim Furches, Zheng (Eddy) Zhang, Huiyang Zhou
https://www.sigarch.org/beyond-qubits-a-systems-view-of-hybrid-cv-dv-quantum-computing/
The IISWC 2026 Tutorial & Workshop CFP is officially OPEN!
📢 If you are working on simulation tools, evaluation methodologies, or emerging domains like AI/ML and Cloud, submit your session proposals.
Sept 27, 2026, Boulder, Colorado. Submit to: [email protected]
https://iiswc.org/iiswc2026/cftw.html
#IISWC2026 #ComputerArchitecture #Benchmarking #WorkloadCharacterization
-
Ah yes, the riveting world of #C++ hashmaps—where time stands still and somehow still takes forever. 😴🔄 After 3 years, Martin bravely ventures back to prove that the art of #benchmarking is the best way to waste your time while pretending to be productive. ⚙️📉
https://martin.ankerl.com/2022/08/27/hashmap-bench-01/ #hashmaps #productivity #timewasting #techhumor #HackerNews #ngated
-
Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
We present a cross-architecture evaluation of production LLM inference on AMD Inst...
#Computer #science #paper #AMD #Radeon #Instinct #MI325X #Benchmarking #LLM
-
ThinkPad #X230 of 2013 (gift from a friend),
#apple #imac 18.2 of 2017 (350 euros, 64 GB RAM, 27" Retina display),
#hetzner CX43 #vm (10 euros per month) online server,
#Supermicro (SM in table on right) X11-WTR SYS-5019P-WTR #Xeon Silver 4110 × 16 (basic parts 500 euros, 2nd hand)
CPU comparisons using #hardinfo2 #benchmarking #homelab
-
I built labeille to find CPython JIT crashes, but it's really a general "run real-world test suites at scale" platform.
It also works for:
— Checking which packages pass their tests on a new CPython version
— Testing free-threaded (no-GIL) CPython compatibility
— Measuring coverage.py or memray overhead across hundreds of packages
— Comparing CPython vs PyPy performance on real code
The registry of 350+ packages with install/test commands is the core.
-
I've been working on a new Python tool: labeille. Its main purpose is to look for CPython JIT crashes by running real-world test suites.
https://github.com/devdanzin/labeille
But it's grown a feature that might interest more people: benchmarking using PyPI packages.
How does that work?
labeille lets you run test suites in two different configurations: say, with coverage on and off, or memray on and off. Here's an example:
https://gist.github.com/devdanzin/63528343df98779b5fedf657bf8286cd
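The post doesn't show labeille's own commands (the gist has the real example). What follows is only a minimal Python sketch of the underlying A/B idea: time one workload under two configurations and report the ratio of median runtimes. The helpers `timed`, `compare`, and `command` are hypothetical names and are not labeille's API.

```python
import statistics
import subprocess
import time
from typing import Callable, Sequence


def timed(run: Callable[[], None]) -> float:
    """Run `run` once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def overhead_ratio(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Median treatment time divided by median baseline time (1.0 = no overhead)."""
    return statistics.median(treatment) / statistics.median(baseline)


def compare(baseline: Callable[[], None], treatment: Callable[[], None],
            repeats: int = 5) -> float:
    """Alternate baseline and treatment runs so slow drift on the machine
    affects both configurations, then return the overhead ratio."""
    base_times: list[float] = []
    treat_times: list[float] = []
    for _ in range(repeats):
        base_times.append(timed(baseline))
        treat_times.append(timed(treatment))
    return overhead_ratio(base_times, treat_times)


def command(cmd: list[str]) -> Callable[[], None]:
    """Wrap an argv list (no shell) as a callable that raises
    subprocess.CalledProcessError if the command exits non-zero."""
    def _run() -> None:
        subprocess.run(cmd, check=True, capture_output=True)
    return _run


# Hypothetical usage (assumes pytest and coverage.py are installed and the
# current directory contains a package's test suite):
#   import sys
#   plain = command([sys.executable, "-m", "pytest", "-q"])
#   covered = command([sys.executable, "-m", "coverage", "run", "-m", "pytest", "-q"])
#   print(f"coverage.py overhead: {compare(plain, covered):.2f}x")
```

This sketch makes two choices to damp machine noise: it alternates the two configurations and compares medians. A serious harness would also add warm-up runs, more repeats, and a per-package breakdown.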
-
OTelBench: AI struggles with simple SRE tasks (Opus 4.5 scores only 29%)
https://quesma.com/blog/introducing-otel-bench/
#ycombinator #benchmarking #opentelemetry #observability #llm #instrumentation #tracing
-
Companies Overpay 5-10x for LLMs Without Benchmarking Alternatives
Companies are wasting billions on expensive large language models (LLMs) without benchmarking them against specific needs, often o...
#AITrends #AI #cost #efficiency #large #language #models #LLM #benchmarking #overpayment #trap
-
Without benchmarking LLMs, you're likely overpaying 5-10x
https://karllorey.com/posts/without-benchmarking-llms-youre-overpaying
#HackerNews #LLMs #Benchmarking #Overpaying #AIInsights #CostEfficiency
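The selection rule behind the "overpaying" argument can be sketched in three steps. Score each candidate model on your own task set, keep the ones that clear a quality bar, and pick the cheapest. This is a minimal illustration, not the linked article's method. The model names, accuracies, and prices below are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float          # fraction of your own eval cases passed (0..1)
    usd_per_m_tokens: float  # blended price per million tokens


def cheapest_adequate(candidates: list[Candidate], min_accuracy: float) -> Candidate:
    """Return the lowest-priced candidate that meets the accuracy bar."""
    adequate = [c for c in candidates if c.accuracy >= min_accuracy]
    if not adequate:
        raise ValueError("no candidate meets the accuracy bar")
    return min(adequate, key=lambda c: c.usd_per_m_tokens)


def overpay_factor(current: Candidate, alternative: Candidate) -> float:
    """How many times more the current model costs than the alternative."""
    return current.usd_per_m_tokens / alternative.usd_per_m_tokens


# Made-up numbers for illustration only; replace with results from your own tasks.
models = [
    Candidate("frontier-large", accuracy=0.94, usd_per_m_tokens=15.0),
    Candidate("mid-size", accuracy=0.92, usd_per_m_tokens=3.0),
    Candidate("small", accuracy=0.81, usd_per_m_tokens=0.5),
]
best = cheapest_adequate(models, min_accuracy=0.90)
print(best.name, f"{overpay_factor(models[0], best):.1f}x")  # prints "mid-size 5.0x"
```

With these made-up numbers, the default model clears the bar, but so does one that costs a fifth as much. A fuller comparison would weight price by the tokens each model actually consumes per task, since models differ in verbosity.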
-
#throwback What started as a simple DBaaS comparison turned into a deep dive into PostgreSQL benchmarking.
🚀 Dirk Krautschick shares hard-earned lessons on tools, workloads, tuning, and real vs. synthetic benchmarks. Avoid common pitfalls and benchmark smarter.
▶️ Watch now! https://www.youtube.com/watch?v=aB5dNcpBI44&list=PL_m-TUcr7ZvnSBmPoxZvcB1lfy7C9eced&index=7
-
@Cyberagentur is launching HEGEMON, a research competition unique in Europe for evaluating and adapting foundation models for security-critical applications. Four teams are developing benchmarks and AI models for complex tasks in the geoinformation sector.
More: https://t1p.de/7ct97
#Cyberagentur #HEGEMON #KI #FoundationModels #Cybersicherheit #Benchmarking
-
The community is looking for the best benchmarking tool for LiteLLM AI gateways and models. Key criteria include TTFT, token output speed, accuracy, and testing under load. Do you know of any "plug and play" tool?
#AI #Benchmarking #LiteLLM #LLM #Tools #ArtificialIntelligence #ĐánhGiáAI #CôngCụAI #HọcMáy
https://www.reddit.com/r/LocalLLaMA/comments/1pduptm/best_current_benchmarking_tool/
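Two of the criteria above, TTFT (time to first token) and output-token speed, can be computed from the timestamps of a streamed response. This is a minimal sketch. It does not use LiteLLM or any existing benchmarking tool. It assumes you already record when the request was sent and when each token arrived, on the same clock (for example with `time.perf_counter()` around a streaming client). The function `stream_timing` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StreamTiming:
    ttft_s: float               # time to first token, in seconds
    output_tokens_per_s: float  # decode rate after the first token


def stream_timing(request_sent: float, token_arrivals: Sequence[float]) -> StreamTiming:
    """Compute TTFT and output throughput from timestamps in seconds.

    `token_arrivals` holds the arrival time of each streamed token, in order.
    Throughput counts the tokens after the first one over the span between the
    first and last token, so it excludes the queueing/prefill delay that TTFT
    already captures.
    """
    if not token_arrivals:
        raise ValueError("no tokens received")
    ttft = token_arrivals[0] - request_sent
    if len(token_arrivals) < 2:
        return StreamTiming(ttft, 0.0)  # a single token gives no decode rate
    decode_time = token_arrivals[-1] - token_arrivals[0]
    # All tokens at the same instant would mean an unbounded rate.
    rate = (len(token_arrivals) - 1) / decode_time if decode_time > 0 else float("inf")
    return StreamTiming(ttft, rate)


# Example: request at t=0.0 s, five tokens arriving between 0.5 s and 1.5 s.
timing = stream_timing(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(timing)  # StreamTiming(ttft_s=0.5, output_tokens_per_s=4.0)
```

Streaming APIs often deliver chunks rather than single tokens. In practice you would count the tokens in each chunk (for example with the model's tokenizer) instead of assuming one token per arrival.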
-
[Translation] GDPval: measuring AI model performance on real-world tasks
Our mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. As part of that mission, we aim to report as transparently as possible on how AI models are learning to help people in real life. That is why we are introducing GDPval, a new evaluation designed to track how well our models and those of other developers perform on economically valuable, practically relevant tasks. We named the metric GDPval because it is inspired by gross domestic product (GDP) as a key economic indicator, and its task set is based on typical roles in the industries that contribute most to GDP. People often speculate about AI's broad impact on society, but the clearest way to understand its potential is to look at what models can already do in practice. History shows that major technologies, from the internet to smartphones, took more than a decade to go from invention to mass adoption. Evaluations like GDPval help ground conversations about the future of AI in facts rather than guesswork, and make it possible to track model progress over time.
https://habr.com/ru/articles/962702/
#ai #llm #openai #gpt #genai #benchmark #benchmarking #chatgpt #open_ai
-
The @association is taking the next big step toward trustworthy AI! 12 projects from the UNLOCK call will build open, high-quality multimodal, and cross-domain benchmarks to test & compare AI models across science.
More 👉 https://helmholtz-imaging.de/news/helmholtz-launches-first-unlock-benchmarking-projects/
#AI #Benchmarking #Helmholtz #OpenScience #unlock @Helmholtz_HZI @helmholtz_hips @DKFZ @dzne @fzj @KIT_Karlsruhe @HelmholtzMunich @helmholtz_hmc @ufz @awi @DLR @hzbde @GFZ @hereon
-
Oh joy, yet another benchmark suite promising to revolutionize #coding with a dazzling cocktail of #languages nobody actually uses 😴. I mean, really, #Oberon and Luon? Are we fast yet, or just fast asleep? 😂🚀
https://github.com/rochus-keller/Are-we-fast-yet #benchmarking #revolution #Luon #fastasleep #HackerNews #ngated
-
Are-we-fast-yet: Benchmark suite in Oberon, C++, C, Pascal, Micron and Luon
https://github.com/rochus-keller/Are-we-fast-yet
#HackerNews #AreWeFastYet #Benchmarking #Oberon #C++ #C #Pascal #Micron #Luon
-
[Translation] An unexpected result: AI slows down experienced developers
We ran a randomized controlled trial (RCT) to assess how early-2025 AI tools affect the productivity of experienced open-source developers working in their own repositories. Surprisingly, developers took 19% longer to complete tasks when using AI tools than without them; in other words, AI slowed them down. We view this result as a snapshot of current AI capabilities in one applied setting. Because these systems continue to evolve rapidly, we plan to use a similar methodology in the future to track how much AI can speed up work in R&D automation [1]. Details are in the full version of the paper.
https://habr.com/ru/articles/936938/
#ai #ai_agent #ai_tools #benchmark #benchmarking #development #opensource #developer #ии #ии_помощник
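As a reading aid only (plain arithmetic on the headline figure, not the study's estimation method): if tasks take 19% longer with AI tools, throughput scales by the reciprocal. So "19% longer" corresponds to roughly 16% fewer tasks completed per unit of time.

```latex
\frac{T_{\text{AI}}}{T_{\text{no AI}}} = 1.19
\quad\Longrightarrow\quad
\frac{\text{tasks per hour}_{\text{AI}}}{\text{tasks per hour}_{\text{no AI}}}
= \frac{1}{1.19} \approx 0.84
```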