#avx512 — Public Fediverse posts on home.social

Habr @[email protected] · 2026-07-18 · 14:22 UTC

Benchmarking Sum: I "sped it up" with a loop and made it 4.7× slower

Dear readers, in this article I want to look at what happens inside values.Sum() in modern .NET and present my conclusions. Inside it I found vector instructions, overflow checking, and a list of processors for which the runtime deliberately limits the vector width. In earlier articles in this series, hand-written loops lost to the BCL at string search, the JIT eliminated bounds checks on its own, and foreach hid allocations. This case is more interesting: values.Sum() is LINQ, and LINQ is usually the first thing people replace with a loop when optimizing.

https://habr.com/ru/articles/1060470/

#benchmarkdotnet #linq #sum #simd #avx512 #векторизация #производительность #jit #ryujit #overflowexception
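The overflow checking mentioned above can be illustrated outside .NET. Below is a minimal C sketch that keeps several independent accumulator lanes, the way a SIMD register would. It detects signed overflow from sign bits: if r = a + b is computed with wraparound, overflow occurred exactly when a and b have the same sign and r has the opposite sign, i.e. when `(a ^ r) & (b ^ r)` is negative. The lane count `LANES` and the name `checked_sum_i32` are hypothetical, and this is not the BCL's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* hypothetical: 8 x int32 lanes, i.e. one 256-bit vector */

/* Wrapping add done in unsigned arithmetic, because signed overflow is UB in C. */
static int32_t wrap_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Sums values[0..n) into *out. Returns false if any partial sum overflowed int32.
   Partial sums are grouped per lane rather than left to right, so the set of
   rejected inputs is not identical to that of a scalar checked loop. */
bool checked_sum_i32(const int32_t *values, size_t n, int32_t *out) {
    int32_t acc[LANES] = {0};
    int32_t overflow = 0;  /* sign bit set => at least one add overflowed */
    size_t i = 0;

    /* Main loop: one "vector" of LANES elements per iteration. */
    for (; i + LANES <= n; i += LANES) {
        for (int l = 0; l < LANES; l++) {
            int32_t a = acc[l], b = values[i + l];
            int32_t r = wrap_add(a, b);
            overflow |= (a ^ r) & (b ^ r);
            acc[l] = r;
        }
    }

    /* Horizontal reduction of the lanes, still checked. */
    int32_t total = 0;
    for (int l = 0; l < LANES; l++) {
        int32_t r = wrap_add(total, acc[l]);
        overflow |= (total ^ r) & (acc[l] ^ r);
        total = r;
    }

    /* Scalar tail for the remaining n % LANES elements. */
    for (; i < n; i++) {
        int32_t r = wrap_add(total, values[i]);
        overflow |= (total ^ r) & (values[i] ^ r);
        total = r;
    }

    if (overflow < 0)
        return false;  /* where .NET's checked Sum would throw OverflowException */
    *out = total;
    return true;
}
```

Accumulating the overflow bits with OR and testing them once at the end keeps the hot loop free of branches. A vectorized version can use the same structure.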

Habr @[email protected] · 2026-07-15 · 19:22 UTC

Benchmarking string search: hand-written loops lose by 14× to 154×

Dear readers, in this article I want to talk about string search and present my conclusions. It started as an optimization: I compared searching for a character with a loop against string.IndexOf and got a severalfold difference. Along the way it turned out that, for this task, a server Xeon with AVX-512 is slower than a gaming desktop with AVX2. Below I break down both findings with disassembly and measurements. There are four stories, and each one has a question I was trying to answer:

https://habr.com/ru/articles/1059624/

#simd #avx2 #avx512 #benchmarkdotnet #indexof #searchvalues #ryujit #дизасм #производительность #бенчмарк
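One reason a library search beats a loop that handles one character per iteration is that the library examines many characters per step. As an illustration that needs no intrinsics, here is the classic SWAR ("SIMD within a register") trick in C, which tests 8 bytes at a time in a 64-bit word. This technique is swapped in for illustration and is not how string.IndexOf is implemented (the article's subject uses real vector registers). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of x is zero. */
static uint64_t has_zero_byte(uint64_t x) {
    return (x - ONES) & ~x & HIGHS;
}

/* Returns the index of the first byte equal to c in buf[0..n), or -1. */
ptrdiff_t find_byte_swar(const unsigned char *buf, size_t n, unsigned char c) {
    uint64_t pattern = ONES * c;  /* c broadcast into all 8 bytes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, 8);  /* alignment-safe 8-byte load */
        if (has_zero_byte(word ^ pattern))
            break;  /* a matching byte is somewhere in these 8 bytes */
    }
    /* Pinpoint the match inside the flagged word, or scan the tail. */
    for (; i < n; i++)
        if (buf[i] == c)
            return (ptrdiff_t)i;
    return -1;
}
```

XOR with the broadcast pattern turns "byte equals c" into "byte is zero", so one subtract, AND, and compare covers 8 characters. With AVX2 or AVX-512 the same idea covers 32 or 64 bytes per compare.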

N-gated Hacker News @[email protected] · 2026-06-27 · 09:16 UTC

Manticore’s high-speed #KNN quest is hilariously thwarted by the cunning duo of #JavaScript and #cookies. 🤖🍪 AVX-512 and batched distances might sound fancy, but apparently, they’re no match for a simple browser setting. 🚀🔒
https://medium.com/@s_nikolaev/faster-knn-search-in-manticore-2-pass-hnsw-batched-distances-and-avx-512-b85604647aab #Manticore #AVX512 #TechHumor #HackerNews #ngated

Hacker News @[email protected] · 2026-06-27 · 09:16 UTC

Faster KNN search in Manticore: 2-pass HNSW, batched distances, and AVX-512

https://medium.com/@s_nikolaev/faster-knn-search-in-manticore-2-pass-hnsw-batched-distances-and-avx-512-b85604647aab

#HackerNews #FasterKNN #Manticore #HNSW #AVX512 #MachineLearning

Hacker News @[email protected] · 2026-06-21 · 05:22 UTC

Zigzag Decoding with AVX-512

https://zeux.io/2026/06/17/zigzag-decoding-avx512/

#HackerNews #Zigzag #Decoding #AVX512 #HighPerformance #Computing #Optimization
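Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude get small codes: 0, -1, 1, -2, … become 0, 1, 2, 3, …. The standard scalar decode is sketched below in C. It is branch-free and needs only a shift, an AND, a negation, and an XOR per element, which is why it maps naturally onto SIMD lanes. This is a generic sketch, not the linked post's AVX-512 code, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode: (n >> 1) ^ -(n & 1), done in unsigned arithmetic. */
static int32_t zigzag_decode32(uint32_t n) {
    return (int32_t)((n >> 1) ^ (0u - (n & 1u)));
}

/* Encode: (v << 1) ^ (v >> 31), written on unsigned values to avoid UB on shifts. */
static uint32_t zigzag_encode32(int32_t v) {
    uint32_t u = (uint32_t)v;
    return (u << 1) ^ (0u - (u >> 31));
}

/* Branch-free loop body; a compiler may auto-vectorize this. */
void zigzag_decode_array(const uint32_t *in, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = zigzag_decode32(in[i]);
}
```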

Hacker News @[email protected] · 2026-02-21 · 07:00 UTC

The Evolution of x86 SIMD: From SSE to AVX-512

https://bgslabs.org/blog/evolution-of-x86-simd/

#HackerNews #x86 #SIMD #SSE #AVX512 #technology #evolution #programming

Habr @[email protected] · 2026-01-28 · 09:02 UTC

Speeding up computations in a DRS virtualization algorithm through vectorization

Rewriting a solution from Python to Go and getting a 35× speedup sounds nice. But you can go further: remember what modern processors can do, and widen Go's lead to 200×! The article is based on a talk for

https://habr.com/ru/companies/oleg-bunin/articles/980710/

#avx #avx2 #avx512 #бэкенд #разработка #go #golang #phyton #ускорение #ускорение_кода

N-gated Hacker News @[email protected] · 2026-01-19 · 03:51 UTC

AVX-512: a #magical #unicorn that only arrives with the right incantation and a sprinkle of fairy dust! 🦄✨ Who knew that making CPUs sweat like a marathon runner was the secret to true programmability? 😂💻
https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Programmability/ #AVX512 #CPU #programmability #tech #humor #HackerNews #ngated

Hacker News @[email protected] · 2026-01-19 · 03:51 UTC

AVX-512: First Impressions on Performance and Programmability

https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Programmability/

#HackerNews #AVX512 #Performance #Programmability #TechInsights

Hanno Rein @[email protected] · 2026-01-13 · 22:23 UTC

FWIW, the code uses #AVX512 in other places. That could be related (e.g. some alignment issue?). But I'm lost as to how to find the right spot, since the code the debuggers point to has no AVX512.
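Here is one concrete way alignment can matter. Aligned AVX-512 loads and stores such as `_mm512_load_ps` require a 64-byte-aligned address and can fault otherwise, while their unaligned counterparts such as `_mm512_loadu_ps` do not. The C sketch below shows how to allocate and check 64-byte alignment. It does not diagnose the crash in the post, and the helper names are hypothetical. It relies on C11 `aligned_alloc`, which MSVC does not provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 64  /* one 512-bit vector */

/* True if p is a multiple of alignment (alignment must be a power of two). */
static bool is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* C11 aligned_alloc requires the size to be a multiple of the alignment, so round up. */
static float *alloc_floats_aligned(size_t count) {
    if (count > (SIZE_MAX - SIMD_ALIGN) / sizeof(float))
        return NULL;  /* size computation would overflow */
    size_t bytes = count * sizeof(float);
    bytes = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    if (bytes == 0)
        bytes = SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, bytes);
}
```

Asserting `is_aligned(ptr, 64)` right before each aligned load is a cheap way to catch a misaligned pointer at its source rather than at the faulting instruction.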

Qiita - Popular articles @[email protected] · 2025-12-24 · 12:05 UTC

The port 5 AVX-512 unit in Intel CPUs is physically far away: literature survey
https://qiita.com/Terminus-IMRC/items/659d4fd502a96baab9c5?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items

#qiita #SIMD #avx512 #AVX_512

N-gated Hacker News @[email protected] · 2025-12-16 · 12:23 UTC

🎉 30 minutes of your life you'll never get back! 🚀 Why settle for regular #Unicode searches when you can dive into a 6,244-word #saga on AVX-512? 🤦‍♂️ #ICU, you're so #slow, but hey, at least you're thorough! 😏
https://ashvardanian.com/posts/search-utf8/ #AVX512 #Thorough #HackerNews #ngated

Hacker News @[email protected] · 2025-12-16 · 12:23 UTC

Full Unicode Search at 50× ICU Speed with AVX‑512

https://ashvardanian.com/posts/search-utf8/

#HackerNews #FullUnicodeSearch #AVX512 #ICUSpeed #UTF8Optimization #TechInnovation

Habr @[email protected] · 2025-11-26 · 08:12 UTC

The era of general-purpose CPUs is over: how to choose between Xeon 6 P-cores and E-cores

Hi! Maxim Bashmakov here. At Selectel we manufacture, assemble, and deploy into production

https://habr.com/ru/companies/selectel/articles/970218/

#selectel #xeon #xeon_6 #intel #AVX512 #AMX #HPC #ML

FCLC @[email protected] · 2025-10-13 · 20:34 UTC

Hard to believe, but 2 years ago to the day I gave my @easybuild TechTalk on #AVX10 and the history of #SIMD on x86!

Very fun looking back on the slides (available here: https://github.com/FCLC/Talks/blob/main/AVX10forHPC_small_v2.pdf)

And thinking back on the context that talk was written in.

Namely, the then-proposed AVX10.N/M spec, where M could be {128,256,512}.

#asm #AVX512 #AVX10

michabbb @[email protected] · 2025-07-18 · 17:01 UTC

🔧 Custom #Cachy kernel with BORE scheduler for superior system responsiveness 💻 CPU-specific optimizations: Auto-detects #AVX512 capable processors for 5-20% performance boost 🎮 Gaming excellence: One-click #Steam, #Lutris, #Heroic installation with #AMD advantage over #Nvidia
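Runtime detection of AVX-512 usually comes down to a CPUID check. With GCC or Clang on x86, the `__builtin_cpu_supports` builtin wraps that check. The sketch below tests the AVX-512 subsets required by the x86-64-v4 level (F, BW, CD, DQ, VL). Treating "AVX-512 capable" as x86-64-v4 is an assumption, and this is not how CachyOS itself performs the detection. The function name is hypothetical.

```c
#include <stdbool.h>

/* True if the CPU reports the AVX-512 subsets required by x86-64-v4. */
bool cpu_has_x86_64_v4_avx512(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* only required if called before constructors run */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512bw") &&
           __builtin_cpu_supports("avx512cd") &&
           __builtin_cpu_supports("avx512dq") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;  /* not x86, or a compiler without this builtin */
#endif
}
```

A program or installer can call this once at startup and pick either an AVX-512 code path (or package set) or a baseline one.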

Habr @[email protected] · 2025-06-30 · 02:02 UTC

On vectorized computation of the exponential function

How do you compute the exponential function quickly and with minimal error? We write vectorized code.

https://habr.com/ru/articles/923234/

#Simd #avx512 #параллельное_программирование #векторизация
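A common way to make exp vectorizable is range reduction. Write x = k·ln2 + r with integer k and |r| ≤ ln2/2, approximate e^r with a polynomial, and build 2^k directly from the exponent bits, so that every element goes through the same branch-free steps. Below is a minimal scalar C sketch of that scheme. It assumes double precision, a plain degree-11 Taylor polynomial (production code usually uses minimax coefficients), and inputs clamped to roughly [-708, 709]. It is not necessarily the linked article's approach, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cody-Waite split of ln2 (fdlibm constants): k * LN2_HI is exact for moderate k. */
static const double LN2_HI  = 6.93147180369123816490e-01;
static const double LN2_LO  = 1.90821492927058770002e-10;
static const double INV_LN2 = 1.44269504088896338700e+00;

/* 1/i! for i = 0..11: Taylor coefficients of e^r. */
static const double INV_FACT[12] = {
    1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24, 1.0 / 120, 1.0 / 720, 1.0 / 5040,
    1.0 / 40320, 1.0 / 362880, 1.0 / 3628800, 1.0 / 39916800};

double exp_sketch(double x) {
    if (x != x)
        return x;  /* NaN in, NaN out */
    x = x < -708.0 ? -708.0 : (x > 709.0 ? 709.0 : x);  /* keep 2^k a normal double */

    double y = x * INV_LN2;
    double k = (double)(int64_t)(y + (y >= 0.0 ? 0.5 : -0.5));  /* round to nearest */
    double r = (x - k * LN2_HI) - k * LN2_LO;                    /* |r| <= ~ln2/2 */

    /* e^r via Horner's rule on the degree-11 Taylor polynomial. */
    double p = INV_FACT[11];
    for (int i = 10; i >= 0; i--)
        p = p * r + INV_FACT[i];

    /* 2^k built from bits: biased exponent (k + 1023) in bits 52..62. */
    uint64_t bits = (uint64_t)((int64_t)k + 1023) << 52;
    double scale;
    memcpy(&scale, &bits, sizeof scale);
    return p * scale;
}

/* Same steps for every element, with no data-dependent branches except the clamp. */
void exp_sketch_array(const double *in, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = exp_sketch(in[i]);
}
```

In a SIMD version, each line maps to one vector operation (multiply, round, FMA, integer shift), and the clamp becomes min/max.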

David JONES @[email protected] · 2025-06-18 · 12:51 UTC

So here's an idea I had that I'm almost certainly not going to do anything with (so you should). With AVX-512 we have 16 × 32-bit lanes in a register. Let's pretend that's a 16-deep stack. The permute instruction lets us do a DROP and DUP (except you'd probably want to ROLL them, but whatever). I'm imagining that top-of-stack would always be lane 0; PUSHing something permutes all the lanes one higher and replaces lane 0. Now implement a FORTH.
#AVX512 #FORTH
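The lane-permute stack can be modeled portably before reaching for intrinsics. The C sketch below keeps 16 int32 lanes in an array and implements each stack operation as one permutation with a fixed index vector. Each permutation mirrors what a single `vpermd` (`_mm512_permutexvar_epi32`) would do on a zmm register: lane i receives lane idx[i]. The type and function names are hypothetical, a real version would hold the lanes in one register, and depth is tracked with a plain counter because the post leaves that open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEPTH 16  /* 16 x 32-bit lanes = one 512-bit register */

typedef struct {
    int32_t lane[DEPTH];  /* lane[0] is top-of-stack */
    int depth;            /* number of live entries */
} LaneStack;

/* Emulates vpermd: dst[i] = src[idx[i] & 15], all lanes at once. */
static void permute(LaneStack *s, const uint8_t idx[DEPTH]) {
    int32_t tmp[DEPTH];
    for (int i = 0; i < DEPTH; i++)
        tmp[i] = s->lane[idx[i] & (DEPTH - 1)];
    memcpy(s->lane, tmp, sizeof tmp);
}

/* Lane i takes lane i-1; lane 0 keeps its value (so this alone is DUP). */
static const uint8_t SHIFT_UP[DEPTH]   = {0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
/* Lane i takes lane i+1 (DROP). */
static const uint8_t SHIFT_DOWN[DEPTH] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 15};
/* Exchange lanes 0 and 1 (SWAP). */
static const uint8_t SWAP_01[DEPTH]    = {1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

void ls_push(LaneStack *s, int32_t v) {
    assert(s->depth < DEPTH);
    permute(s, SHIFT_UP);  /* everything moves one lane higher */
    s->lane[0] = v;        /* replace top-of-stack */
    s->depth++;
}

int32_t ls_pop(LaneStack *s) {  /* DROP, returning the dropped value */
    assert(s->depth > 0);
    int32_t top = s->lane[0];
    permute(s, SHIFT_DOWN);
    s->depth--;
    return top;
}

void ls_dup(LaneStack *s) {
    assert(s->depth > 0 && s->depth < DEPTH);
    permute(s, SHIFT_UP);
    s->depth++;
}

void ls_swap(LaneStack *s) {
    assert(s->depth >= 2);
    permute(s, SWAP_01);
}
```

In real AVX-512 code, PUSH could be one permute followed by a masked broadcast into lane 0 (for example `_mm512_mask_set1_epi32` with mask 1).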

Habr @[email protected] · 2025-06-09 · 05:32 UTC

A detailed overview of Galois fields

"Ask Jacobi or Gauss publicly to give their opinion, not as to the truth, but as to the importance of these theorems. Later there will be, I hope, some people who will find it to their advantage to decipher all this mess." These words ended the letter Évariste Galois wrote to his friend Auguste Chevalier two days before he died, aged 20, of wounds received in a duel. Neither Jacobi nor Gauss worked through his theorems, but 15 years later Joseph Liouville did and published Galois's work, which went on to become the foundation of modern algebra and is now known as Galois theory. In this article I will cover one part of that theory, Galois fields. They are used so widely in cryptography and error-correcting codes that Intel and AMD released a set of processor extensions for efficient operations on these fields. Note: if you have already used or implemented Galois fields, most of this article will probably not interest you, but the last sections may still have something new.

https://habr.com/ru/articles/916740/

#галуа #конечные_поля #avx512 #ридсоломон #aes
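The processor extensions referred to are presumably GFNI. Its GF2P8MULB instruction multiplies bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). Below is a minimal scalar C sketch of the same multiplication (shift-and-add with reduction), useful as a reference to check a vectorized version against. The function name is hypothetical, and this is not the article's code.

```c
#include <stdint.h>

/* Multiply a and b in GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t product = 0;
    while (b) {
        if (b & 1)
            product ^= a;        /* addition in GF(2^8) is XOR */
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);   /* multiply a by x */
        if (carry)
            a ^= 0x1B;           /* reduce: x^8 = x^4 + x^3 + x + 1 */
        b >>= 1;
    }
    return product;
}
```

The loop runs at most 8 times. GFNI performs this whole computation for every byte of a vector register in a single instruction.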

Hacker News @[email protected] · 2025-05-30 · 16:23 UTC

Beating Google's kernelCTF PoW using AVX512

https://anemato.de/blog/kctf-vdf

#HackerNews #Beating #kernelCTF #PoW #using #AVX512 #kernelCTF #AVX512 #PoW #Google #HackersNews

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 @[email protected] · 2025-05-14 · 10:04 UTC

#AMD #EPYC #4565P & #4585PX #Benchmarks Against #Xeon #6369P
For "conventional" #server workloads like web serving and databases, the EPYC 4005 series dominates.
With up to 16C/32TH, #AVX512, DDR5-5600 memory and other advantages, the EPYC 4005 series is the very easy answer for those that may be looking for affordable #HPC
The AMD #EPYC4005 series #CPU delivers excellent generational uplift over the EPYC 4004 series and outright obliterates the #Xeon6300 series
https://www.phoronix.com/review/amd-epyc-4585px-4565p-benchmarks

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 @[email protected] · 2025-04-04 · 12:39 UTC

#AMD #Ryzen9000 vs. #Intel #CoreUltra #ArrowLake On #Linux For Q1-2025 In ~400 Benchmarks
In cases where #AVX512 can be utilized, the Ryzen 9000 series is the definitive winner over the Intel Core Ultra Series 2 desktop processors. In some HPC applications the Core Ultra 9 285K with 24 physical cores does well in scenarios where SMP isn't leveraged.
Overall the #Zen5 based #Ryzen9 #9950X straight-up won 50% of the time with a first place finish.
https://www.phoronix.com/review/ryzen9000-core-ultra-linux613

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 @[email protected] · 2025-03-05 · 22:21 UTC

The Compelling #AVX512 Performance Advantage On #AMD #EPYC 9005 "Turin"
Across the workloads tested on this #EPYC9655 Supermicro server, AVX-512 yielded 1.57x the performance of the same hardware/software with AVX-512 forced off.
https://www.phoronix.com/review/amd-epyc-turin-avx512

Hacker News @[email protected] · 2025-03-01 · 05:42 UTC

Zen 5's AVX-512 Frequency Behavior — https://chipsandcheese.com/p/zen-5s-avx-512-frequency-behavior
#HackerNews #Zen5 #AVX512 #Frequency #Behavior #Chips #Architecture #Performance

camelcdr @[email protected] · 2024-11-18 · 16:44 UTC

Scaling an RGB image: https://godbolt.org/z/vMojsrhcG

GCC can only vectorize it on RVV, where it generates nice code with three indexed loads and a three-segment segmented store. It fails for AVX512/NEON.

clang manages something with AVX512, but you can barely call it vectorization.
The RVV codegen looks better, but it uses fixed length vectorization and seems to have miscalculated the best LMUL choice, which causes it to spill. You get better codegen if you set -mllvm --riscv-v-fixed-length-vector-lmul-max=4.

#RVV #AVX512 #NEON #gcc #llvm

OSTechNix @[email protected] · 2024-11-06 · 08:49 UTC

FFmpeg Sees 94x Performance Boost with Handwritten AVX-512 Code #ffmpeg #AVX512 #AssemblyCode #Opensource
https://ostechnix.com/ffmpeg-sees-94x-performance-boost-with-handwritten-avx-512-code/

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 @[email protected] · 2024-11-05 · 17:47 UTC

#FFmpeg devs boast of up to 94x performance boost after implementing handwritten #AVX512 assembly code
The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements -- from three to 94 times faster -- compared to standard implementations.
https://www.tomshardware.com/pc-components/cpus/ffmpeg-devs-boast-of-up-to-94x-performance-boost-after-implementing-handwritten-avx-512-assembly-code

Maxi 12x 💉 @[email protected] · 2024-11-05 · 14:19 UTC

Sounds too good to be true in general; I suspect that ultimately there's an Intel-funded PR stunt behind it.

AVX-512: #FFmpeg with 94x performance

https://www.golem.de/news/avx-512-ffmpeg-mit-94-facher-leistung-2411-190481.html

https://www.tomshardware.com/pc-components/cpus/ffmpeg-devs-boast-of-up-to-94x-performance-boost-after-implementing-handwritten-avx-512-assembly-code #AVX512

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 @[email protected] · 2024-10-24 · 19:35 UTC

#Intel #CoreUltra 9 285K "#ArrowLake" Delivers Strong #Linux Performance Review
Power efficiency improvements with Arrow Lake are real. The Core Ultra 9 285K averaged 136W, right in line with the Ryzen 9 9950X's 137W and much lower than the Core i9 14900K's 156W average. The Core Ultra 9 285K was very competitive, but for heavy #AVX512 workloads and other areas where Zen 5 delivered striking wins, the Ryzen 9 9950X and the ~$429 Ryzen 9 9900X can deliver great value.
https://www.phoronix.com/review/intel-core-ultra-9-285k-linux
