home.social

#avx512 — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #avx512, aggregated by home.social.

  1. europesays.com/ie/361479/ Intel Publishes “Granite Rapids-WS” Xeon 600 Turbo Frequencies, AVX-512 and AMX Slash Boost Speeds #600 #amx #Avx512 #avx2 #boost #Éire #frequencies #Granite #Https://wwwTechpowerupCom/346889/intelPublishesGraniteRapidsWsXeon600TurboFrequenciesAvx512AndAmxSlashBoostSpeeds #IE #intel #Ireland #News #publishes #RapidsWs #Slash #speeds #sse #Technology #Turbo #xeon #Xeon600

  2. europesays.com/uk/775965/ AMD Ryzen 10000 “Olympic Ridge” to Debut with 6/8/10/12/16/20/24-Core “Zen 6” SKUs #10000 #2Nm #6/8/10/12/16/20/24Core #AMD #Avx512 #ccd #debut #Flagship #Https://wwwTechpowerupCom/346560/amdRyzen10000OlympicRidgeToDebutWith681012162024CoreZen6Skus #News #Olympic #OlympicRidge #ridge #ryzen #Ryzen10000 #skus #Technology #TSMC #UK #UnitedKingdom #VCache #x86 #zen #Zen6

  3. Speeding up computation in a DRS virtualization algorithm through vectorization

    Rewriting a solution from Python to Go and getting a 35x speedup sounds nice. But you can go further, remember what modern processors are capable of, and stretch Go's lead to 200x! The article is based on a talk for

    habr.com/ru/companies/oleg-bun

    #avx #avx2 #avx512 #бэкенд #разработка #go #golang #phyton #ускорение #ускорение_кода

  4. AVX-512: a #magical #unicorn that only arrives with the right incantation and a sprinkle of fairy dust! 🦄✨ Who knew that making CPUs sweat like a marathon runner was the secret to true programmability? 😂💻
    shihab-shahriar.github.io//blo #AVX512 #CPU #programmability #tech #humor #HackerNews #ngated

  5. 🎉 30 minutes of your life you'll never get back! 🚀 Why settle for regular #Unicode searches when you can dive into a 6,244-word #saga on AVX-512? 🤦‍♂️ #ICU, you're so #slow, but hey, at least you're thorough! 😏
    ashvardanian.com/posts/search- #AVX512 #Thorough #HackerNews #ngated

  6. The era of universal CPUs is over: how to choose between Xeon 6 P-cores and E-cores

    Hi! This is Maxim Bashmakov. At Selectel we manufacture, assemble, and put into production

    habr.com/ru/companies/selectel

    #selectel #xeon #xeon_6 #intel #AVX512 #AMX #HPC #ML

  7. Hard to believe, but 2 years ago to the day I gave my @easybuild TechTalk on #AVX10 and the history of #SIMD on x86!

    Very fun looking back on the slides (available here: github.com/FCLC/Talks/blob/mai)

    And thinking back on the context that talk was written in.

    Namely at the then existing AVX10.N/M proposed spec, where M could be {128,256,512}

    #asm #AVX512 #AVX10

  8. 🔧 Custom #Cachy kernel with BORE scheduler for superior system responsiveness 💻 CPU-specific optimizations: Auto-detects #AVX512 capable processors for 5-20% performance boost 🎮 Gaming excellence: One-click #Steam, #Lutris, #Heroic installation with #AMD advantage over #Nvidia

  9. On vectorized computation of the exponential function

    How do you compute the exponential function quickly and with minimal error? We write vectorized code.

    habr.com/ru/articles/923234/

    #Simd #avx512 #параллельное_программирование #векторизация

  10. So here's an idea I had that I'm almost certainly not going to do anything with (so you should). With AVX-512 we have 16 x 32-bit registers. Let's pretend that's a 16-deep stack. The permute instructions let us do a DROP and DUP (except, you'd probably want to ROLL them, but whatever). I'm imagining that top-of-stack would always be register 0; PUSHing something permutes all the registers 1-higher and replaces register 0. Now implement a FORTH.
    #AVX512 #FORTH
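
A minimal sketch of the idea in post 10, written as a plain Python model rather than real AVX-512 code. It reads the post's "16 x 32-bit registers" as the sixteen 32-bit lanes of one 512-bit register, which is an interpretation, and every name here (`permute`, `push`, `drop`, `dup`, `swap`, `add`) is hypothetical. `permute` mimics a full-width lane permute where output lane i takes input lane idx[i].

```python
LANES = 16  # one 512-bit register viewed as sixteen 32-bit lanes (assumption)
MASK32 = 0xFFFFFFFF


def permute(src, idx):
    """Model of a lane permute: dst[i] = src[idx[i]]."""
    return [src[j % LANES] for j in idx]


# Index vectors are constants, so each stack word is a single permute.
SHIFT_UP = [0] + list(range(LANES - 1))           # lane i takes lane i-1, lane 0 keeps itself
SHIFT_DOWN = list(range(1, LANES)) + [LANES - 1]  # lane i takes lane i+1
SWAP_TOP = [1, 0] + list(range(2, LANES))


def dup(stack):
    # Shifting up while lane 0 keeps its value duplicates the top of stack.
    return permute(stack, SHIFT_UP)


def push(stack, value):
    # Shift every lane one higher, then overwrite lane 0 with the new value.
    out = permute(stack, SHIFT_UP)
    out[0] = value & MASK32
    return out


def drop(stack):
    return permute(stack, SHIFT_DOWN)


def swap(stack):
    return permute(stack, SWAP_TOP)


def add(stack):
    # Pop two values, push their 32-bit wrapped sum.
    total = (stack[0] + stack[1]) & MASK32
    out = drop(stack)
    out[0] = total
    return out
```

In this model, `2 3 +` is `add(push(push([0] * 16, 2), 3))`, which leaves the sum in lane 0. Anything pushed past depth 16 falls off the top lane, which is the price of the fixed-size register.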

  11. A detailed overview of Galois fields

    "Ask Jacobi or Gauss to give their opinion publicly, not on the truth but on the importance of these theorems. Later, I hope, there will be people who find it worth their while to make sense of all this chaos." These words closed the letter Évariste Galois wrote to his friend Auguste Chevalier two days before he died, at the age of 20, of wounds received in a duel. Neither Jacobi nor Gauss worked through his theorems, but 15 years later Joseph Liouville did, and he published Galois's papers, which went on to become the foundation of modern algebra and are known today as Galois theory. In this article I cover one part of that theory, Galois fields, which are used so widely in cryptography and redundant coding that Intel and AMD released a set of processor extensions for implementing operations over these fields efficiently. Note: if you have already used or implemented Galois fields, most of the article will probably not interest you, but the last sections may contain something new.

    habr.com/ru/articles/916740/

    #галуа #конечные_поля #avx512 #ридсоломон #aes
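
As a small illustration of what "operations over these fields" means, here is a hedged sketch of multiplication in GF(2^8). It is not taken from the linked article. It assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), chosen because the post is tagged #aes, and the function name `gf256_mul` is hypothetical.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes as polynomials over GF(2), reduced modulo `poly`.

    Addition in the field is XOR, so the product is built with the
    classic shift-and-add loop, using XOR in place of integer addition.
    """
    result = 0
    while b:
        if b & 1:        # current bit of b is set: add (XOR) the shifted a
            result ^= a
        a <<= 1          # multiply a by x
        if a & 0x100:    # degree reached 8: reduce by the field polynomial
            a ^= poly
        b >>= 1
    return result
```

With the AES polynomial, `gf256_mul(0x57, 0x83)` gives `0xC1`, the worked example in the AES specification (FIPS 197).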

  12. #AMD #EPYC #4565P & #4585PX #Benchmarks Against #Xeon #6369P
    For "conventional" #server workloads like web serving and databases, the EPYC 4005 series dominates.
    With up to 16C/32TH, #AVX512, DDR5-5600 memory and other advantages, the EPYC 4005 series is the very easy answer for those that may be looking for affordable #HPC
    The AMD #EPYC4005 series #CPU lineup delivers excellent generational uplift over the EPYC 4004 series and outright obliterates the #Xeon6300 series
    phoronix.com/review/amd-epyc-4

  13. #AMD #Ryzen9000 vs. #Intel #CoreUltra #ArrowLake On #Linux For Q1-2025 In ~400 Benchmarks
    In cases where #AVX512 can be utilized, the Ryzen 9000 series is the definitive winner over the Intel Core Ultra Series 2 desktop processors. In some HPC applications the Core Ultra 9 285K with 24 physical cores does well in scenarios where SMP isn't leveraged.
    Overall the #Zen5 based #Ryzen9 #9950X straight-up won 50% of the time with a first place finish.
    phoronix.com/review/ryzen9000-

  18. The Compelling #AVX512 Performance Advantage On #AMD #EPYC 9005 "Turin"
    Workloads tested on this #EPYC9655 Supermicro server with AVX-512 enabled yielded 1.57x the performance of the same hardware and software with AVX-512 forced off.
    phoronix.com/review/amd-epyc-t

  19. How much more convenient a PC is than a smartphone

    Laptops with free Libreboot firmware. The first generation of "advanced smartphone users" who have never worked at a computer has now grown up. They are finishing university and starting to look for jobs. People pull off amazing things on a smartphone. But they do not realize how wretched these devices are next to a full-fledged computer. A smartphone really is irreplaceable away from home or the office, on a hike or a trip: for navigation, photos and video, urgent messages, and so on. But when a proper computer is available, using a smartphone is for the most part foolish.

    habr.com/ru/companies/ruvds/ar

    #FFmpeg #AVX512 #ассемблер #оптимизация_софта #тяжеловесные_приложения #Google_Play #Apple_AppStore #монополия #Spotube #DeskHop #Slack_Dumper #тачскрины #смартфонизация #нормисы #ruvds_статьи

  20. Scaling an RGB image: godbolt.org/z/vMojsrhcG

    GCC can only vectorize it on RVV and generates nice code with three indexed loads and a three-segment segmented store. It fails for AVX512/NEON.

    clang manages something with AVX512, but you can barely call it vectorization.
    The RVV codegen looks better, but it uses fixed length vectorization and seems to have miscalculated the best LMUL choice, which causes it to spill. You get better codegen if you set -mllvm --riscv-v-fixed-length-vector-lmul-max=4.

    #RVV #AVX512 #NEON #gcc #llvm

  21. #FFmpeg devs boast of up to 94x performance boost after implementing handwritten #AVX512 assembly code
    The developers have created an optimized code path using the AVX-512 instruction set to accelerate specific functions within the FFmpeg multimedia processing library. By leveraging AVX-512, they were able to achieve significant performance improvements -- from three to 94 times faster -- compared to standard implementations.
    tomshardware.com/pc-components

  22. #Intel #CoreUltra 9 285K "#ArrowLake" Delivers Strong #Linux Performance Review
    Power efficiency improvements with Arrow Lake are real. Core Ultra 9 285K on average was at 136W, right in line with the 137W Ryzen 9 9950X and much lower than the 156W average of the Core i9 14900K. Core Ultra 9 285K was very competitive, but for those running a lot of #AVX512 workloads and other areas where Zen 5 was delivering striking wins, the Ryzen 9 9950X and the ~$429 Ryzen 9 9900X can deliver great value.
    phoronix.com/review/intel-core

  23. An interview with #AMD's #MikeClark, Father of Zen — 'Zen Daddy' says 3nm #Zen5 is coming fast; also talks compact cores for desktop
    AMD expands its Zen 5 architecture. Unlike Intel, which has to reduce clock speeds when its processors run #AVX512 workloads, AMD says these powerful instructions will run at the same clock speeds as standard integer operations. Clark also expanded on how the company achieved that feat and said that its #Zen5c cores can also run full AVX512.
    tomshardware.com/pc-components

  24. Turns out @cheese is a real 3D person!

    Here he sits down with Mike Clark, chief architect of Zen, to talk about AMD's latest microarchitecture, #zen5

    #HPC #x86 #microarchitecture #avx512

    From: @chipsandcheese
    techhub.social/@chipsandcheese

  26. 👉 AMD opens Computex 2024 with a flood of announcements on CPUs, GPUs, and AI
    AMD unveils the Ryzen 9000 processors based on the new Zen 5 architecture and the Ryzen AI 300 Series: performance gains for PCs and AI

    gomoot.com/amd-apre-il-compute

    #AI #ai300 #AM5 #amd #avx512 #blog #cdna4 #computex #copilotplus #copilot #cpu #epyc #gpu #ia #laptop #news #NPU #picks #processori #ryzen9 #ryzen9000 #tech #tecnologia #turin #xdna2 #zen5

  27. AVX-512's VPCOMPRESS instruction is so damn cool. For a simple array filtering problem (retain in-place only even 32-bit numbers out of a 256MiB array), it'll out-perform native C code by a factor of 10x. The C code executes at about 800MHz, while the AVX512 code executes at about 90MHz - it's just 100 times more productive with the cycles it executes.

    #avx512 #vectorization
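
To make the filtering pattern in post 27 concrete, here is a scalar Python model of the compress step. It only illustrates the semantics, says nothing about performance, and is not the poster's code. The 32-bit form of the instruction is VPCOMPRESSD; the helper names `compress` and `filter_even_in_place` and the 16-lane chunk size are assumptions made for illustration.

```python
def compress(values, mask):
    """Keep the lanes whose mask bit is set, packed contiguously in order."""
    return [v for v, keep in zip(values, mask) if keep]


def filter_even_in_place(data, lanes=16):
    """Retain only even numbers, processing `lanes` elements per step.

    Each step mirrors the vector loop: load a chunk, build a mask with a
    compare, compress the surviving lanes, store them at the write cursor.
    """
    write = 0
    for start in range(0, len(data), lanes):
        chunk = data[start:start + lanes]      # "load" up to 16 lanes
        mask = [(v & 1) == 0 for v in chunk]   # one mask bit per lane
        packed = compress(chunk, mask)         # survivors, densely packed
        data[write:write + len(packed)] = packed
        write += len(packed)                   # advance by the mask popcount
    del data[write:]                           # drop the unused tail
    return data
```

The write cursor never overtakes the read position, which is what makes the in-place variant safe: each chunk is read before anything is stored over it.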

  28. Holy shit, `VPSHLDQ` is so cool! On my laptop, the scalar `SHLD` on 64-bit GPRs is 1-3 cycles of latency. `VPSHLDQ` does the same thing (with a constant shift value, which is fine for my use case) on an 8x64-bit ZMM register with just 1 cycle of latency. I can perform the same operation 8-24 times faster!

    #simd #vectorization #avx512
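
For readers unfamiliar with the operation in post 28, the following is a rough Python model of what a double-precision left shift computes per 64-bit lane: concatenate two 64-bit values, shift the 128-bit result left by a constant, and keep the upper 64 bits. The names `shld64` and `vpshldq` are hypothetical stand-ins, and the model covers only the arithmetic, not latency or throughput.

```python
MASK64 = (1 << 64) - 1


def shld64(a, b, count):
    """Upper 64 bits of the 128-bit value (a:b) shifted left by count mod 64."""
    count &= 63
    wide = ((a & MASK64) << 64) | (b & MASK64)
    return ((wide << count) >> 64) & MASK64


def vpshldq(a_lanes, b_lanes, count):
    """Apply the same constant shift to every 64-bit lane (8 lanes per ZMM)."""
    return [shld64(a, b, count) for a, b in zip(a_lanes, b_lanes)]
```

For example, shifting by 8 moves the top byte of `b` into the bottom byte of the result, so `shld64(0x0123456789ABCDEF, 0xFEDCBA9876543210, 8)` is `0x23456789ABCDEFFE`.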

  29. AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

    Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries [..]. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size [..]. FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4."

    techpowerup.com/321201/amd-zen

    #AMD #Zen5 #CPU #FPU #AVX512 #Microarchitecture

  31. #Benchmarking The Experimental #Ubuntu #x86_64_v3 Build For Greater Performance On Modern #CPU
    With x86-64-v3 basically being #Intel #Haswell and #AMD Excavator or newer (with some exceptions like select Atoms), it would be really interesting too if #Canonical would consider an x86-64-v4 option for modern systems with #AVX512 support. It'd be really interesting to see at least an experimental Ubuntu x86-64-v4 build to see what that could mean for #servers and #HPC
    phoronix.com/review/ubuntu-x86

  32. How to optimize C code for x86 processors: the cache and memory subsystem, AVX-512 instructions

    My name is Andrey Bakshaev, and I am a lead software engineer at YADRO. My team develops and optimizes math libraries for the x86 architecture. Before that I worked at Intel for 15 years. A large part of my job was implementing image- and signal-processing algorithms in the fairly well-known IPP math library while using the processors' capabilities as efficiently as possible. I also studied how these algorithms performed on processors at an early design stage. In this article I share my experience optimizing low-level C code. We will look at the processor cache and memory subsystem and at the new AVX-512 instructions. We will work through an example of speeding up a byte-array copy and see how vectorized code cuts the running time of a widely used algorithm, table-based byte replacement, from 619 ms to 34 ms, that is, by roughly 18x.

    habr.com/ru/companies/yadro/ar

    #icelake #dsp #avx2 #avx512

  33. Hi friends! Very excited to announce that I'll be giving an @easybuild Tech Talk on the 13th of October on #AVX10!

    The Talk is titled "AVX10 for HPC:
    A reasonable solution to the 7 levels of AVX-512 folly"

    Registration is free, all #x86, #AVX, #AVX512, #SIMD, and #HPC experience levels welcome!

    The page is here: easybuild.io/tech-talks/008_av

    And you can register here! event.ugent.be/registration/eb

  34. Sanity check, but looks to me like LLVM (and by extension ICX) are slightly borked in this context?

    Code is a version of the intel AMX examples

    godbolt.org/z/MdxEK7jsb

    #AMX #AVX512

  35. If someone wants to learn assembly using the most up-to-date ISAs on real hardware that they own, what's actually available for a reasonable cost? Looking at x86, Arm and RISC-V
    Vector ISA: AVX512, SVE2, RISC-V V
    Matrix ISA: AMX, SME, RISC-V M

    x86_64:
    #AVX512 You're in relatively good shape. IceLake is a bargain, as is Tigerlake. Zen4 is pricier, but still accessible

    #AMX As of now, it's "SoonTM", mainly depending on what boards will cost for SPR-W. Chip, board and RAM, hopefully sub 1200

  36. If Xeon-W has #amx I'm super interested in it as a platform.

    If it doesn't? It becomes a little uninteresting for me specifically (since I already have #AVX512 Golden Cove, but with fewer cores/PCIe/mem)

    Also hoping that they don't segment AMX between the 2400 and 3400 series....

  37. Ok, props to Intel: with launching #sapphirerapids, they're also launching/releasing the specific *per port* throughput and latency per instruction.

    If you want to wring out every last clock cycle, this is the way to do it.

    #HPC #Xeon #avx512 #AVX512fp16 #sapphirerapids