  1. Amazing work from Sarah El-Kazdadi. The library that has become standard for applications needing small, dense matrix multiplies and tensor contractions uses JIT compilation, which was widely believed to be necessary for high performance in this domain. Sarah's new library is competitive or better without JIT (modulo a caveat about padding).

    sarah-ek.veganb.tw/blog/nano-g