home.social

#tflops — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #tflops, aggregated by home.social.

  1. New benchmark shows that larger CUDA tiles can cut Flash Attention throughput by 18‑43 % across sequence lengths. The study dives into kernel design, TFLOPS loss, and what it means for transformer model efficiency on NVIDIA GPUs. Open‑source researchers can use these insights to tune their kernels and reclaim performance. #FlashAttention #CUDATiles #GPUPerformance #TFLOPS

    🔗 aidailypost.com/news/large-cud

  4. 😂 Ah, the classic tale of tech sorcery where simply naming your kernel "cutlass" magically unlocks 100 #tflops of speed! Meanwhile, x.com is still busy booting you off your browser faster than you can say "incompatibility." 🏴‍☠️🔗📉
    twitter.com/cis_female/status/ #techhumor #cutlass #xcom #incompatibility #HackerNews #ngated

  5. #China's secretive #Tianhe 3 #supercomputer uses a homegrown hybrid #CPU and rivals US systems with 1.57 #Exaflops of performance. The #NUDT #MT3000 features a unique heterogeneous architecture that includes general-purpose CPU cores, 96 control cores, and 1,536 accelerator cores. The MT-3000 processor reportedly achieves 11.6 FP64 #TFLOPS of peak performance and demonstrates a power efficiency of 45.4 #GigaFLOPS/Watt at an operational frequency of 1.20 GHz. tomshardware.com/tech-industry #hpc #sanctions