#npsimd — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #npsimd, aggregated by home.social.
-
I have so many good ideas for polyfilling SIMD instructions in older instruction sets, and I don't know how to put them in my library properly.
You want PCMPEQQ but don't have SSE4.1? No worries, do a PCMPEQD, use PSHUFD to swap pairs of dwords, then PAND with the original result (3 cycles).
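A rough sketch of that PCMPEQD / PSHUFD / PAND sequence using Rust's `std::arch` intrinsics. The function names here are made up for illustration and are not `npsimd`'s API; on non-x86_64 targets the sketch just falls back to the scalar reference so it still compiles.

```rust
/// What PCMPEQQ computes: all-ones in each 64-bit lane where the lanes are equal.
pub fn cmpeq_epi64_reference(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [
        if a[0] == b[0] { u64::MAX } else { 0 },
        if a[1] == b[1] { u64::MAX } else { 0 },
    ]
}

/// SSE2-only emulation of PCMPEQQ (hypothetical helper name).
#[cfg(target_arch = "x86_64")]
pub fn cmpeq_epi64_polyfill(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    use std::arch::x86_64::*;
    let mut out = [0u64; 2];
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned
    // loads/store touch exactly the 16 bytes of each array.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // PCMPEQD: compare the four 32-bit halves independently.
        let eq32 = _mm_cmpeq_epi32(va, vb);
        // PSHUFD 0xB1: swap the two dwords inside each qword.
        let swapped = _mm_shuffle_epi32::<0b10_11_00_01>(eq32);
        // PAND: a qword is equal only if both of its dwords matched.
        let eq64 = _mm_and_si128(eq32, swapped);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, eq64);
    }
    out
}

/// Fallback so the sketch builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn cmpeq_epi64_polyfill(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    cmpeq_epi64_reference(a, b)
}
```

The interesting inputs are the ones where only one half of a qword differs, since a bare PCMPEQD would report a half-match there.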
PSRLB? Just PSRLD and PAND out some bits (2 cycles).
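Something like the following, again as a sketch with hypothetical function names rather than anything from `npsimd`. The shift count is taken at runtime, and counts of 8 or more are assumed to produce zero, matching how the real per-lane logical shifts behave.

```rust
/// Scalar reference for a per-byte logical right shift.
pub fn srl_epi8_reference(a: [u8; 16], n: u32) -> [u8; 16] {
    a.map(|b| b.checked_shr(n).unwrap_or(0))
}

/// SSE2-only emulation of the (nonexistent) PSRLB (hypothetical helper name).
#[cfg(target_arch = "x86_64")]
pub fn srl_epi8_polyfill(a: [u8; 16], n: u32) -> [u8; 16] {
    use std::arch::x86_64::*;
    // Bits that legitimately remain in each byte after shifting by n.
    let keep = 0xFFu8.checked_shr(n).unwrap_or(0);
    let mut out = [0u8; 16];
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned
    // load/store touch exactly the 16 bytes of each array.
    unsafe {
        let v = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        // PSRLD: shifts 32-bit lanes, so bits leak in from the next byte up.
        let shifted = _mm_srl_epi32(v, _mm_cvtsi32_si128(n as i32));
        // PAND: clear the leaked high bits in every byte.
        let masked = _mm_and_si128(shifted, _mm_set1_epi8(keep as i8));
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, masked);
    }
    out
}

/// Fallback so the sketch builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn srl_epi8_polyfill(a: [u8; 16], n: u32) -> [u8; 16] {
    srl_epi8_reference(a, n)
}
```

With an input of all `0x81` bytes and a shift of 1, the unmasked PSRLD result would carry each neighbour's low bit into bit 7; the PAND is what removes it.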
PSRAQ? Use PSRLQ, and OR the result with the negation of the shifted MSB (i.e. PSRLQ, PAND, PSUB, POR, 4 cycles). For PSRAB, do the same, but do an additional PAND (concurrently with the PAND / PSUB) to mask out overlapping high bits.
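Here is how the 64-bit case might look with `std::arch` intrinsics, reading "PSUB" as PSUBQ. The names are hypothetical and this is not `npsimd`'s code; clamping the count to 63 is my assumption, chosen so oversized counts fill the lane with the sign bit the way the narrower arithmetic shifts do.

```rust
/// Scalar reference for a per-qword arithmetic right shift.
pub fn sra_epi64_reference(a: [i64; 2], n: u32) -> [i64; 2] {
    a.map(|x| x >> n.min(63))
}

/// SSE2-only emulation of PSRAQ (hypothetical helper name).
#[cfg(target_arch = "x86_64")]
pub fn sra_epi64_polyfill(a: [i64; 2], n: u32) -> [i64; 2] {
    use std::arch::x86_64::*;
    let n = n.min(63);
    // After a logical shift by n, the original sign bit sits at bit 63 - n.
    let sign_pos = (1u64 << (63 - n)) as i64;
    let mut out = [0i64; 2];
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned
    // load/store touch exactly the 16 bytes of each array.
    unsafe {
        let v = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        // PSRLQ: logical shift, zero-filling from the top.
        let shifted = _mm_srl_epi64(v, _mm_cvtsi32_si128(n as i32));
        // PAND: isolate the shifted sign bit.
        let sign = _mm_and_si128(shifted, _mm_set1_epi64x(sign_pos));
        // PSUBQ: 0 - sign sets every bit from the sign position upward
        // when the sign bit was set, and is zero otherwise.
        let fill = _mm_sub_epi64(_mm_setzero_si128(), sign);
        // POR: merge the sign fill into the shifted value.
        let result = _mm_or_si128(shifted, fill);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, result);
    }
    out
}

/// Fallback so the sketch builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn sra_epi64_polyfill(a: [i64; 2], n: u32) -> [i64; 2] {
    sra_epi64_reference(a, n)
}
```

The byte-wide variant would follow the same shape with PSUBB, plus the extra PAND to drop bits shifted in from the neighbouring byte.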
Want a VPOPCNTB for cheap? Perform two PSHUFBs (one on the low bits, one on the high bits, both with masking) to popcount nibbles and add their results. Even older CPUs should be able to do that in 3 cycles. For VPLZCNTB / VPTZCNTB, use PMIN/PMAX instead of adding the results.
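A sketch of both nibble-table tricks in Rust. PSHUFB needs SSSE3, so this one checks for it at runtime and otherwise uses plain scalar code. The function names and the exact table contents are my own assumptions for illustration (in particular the sentinel value 8 for an all-zero nibble in the leading-zero tables); none of this is taken from `npsimd`.

```rust
/// Returns (per-byte popcount, per-byte leading-zero count) via nibble lookups.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn popcnt_lzcnt_ssse3(a: [u8; 16]) -> ([u8; 16], [u8; 16]) {
    use std::arch::x86_64::*;
    let (mut pc, mut lz) = ([0u8; 16], [0u8; 16]);
    // SAFETY: the caller guarantees SSSE3; loads/stores cover 16-byte arrays.
    unsafe {
        // Popcount of every 4-bit value.
        let pc_tab = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
        // Leading zeros contributed by the high nibble (8 = "nibble is zero").
        let lz_hi = _mm_setr_epi8(8, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0);
        // 4 + leading zeros of the low nibble (8 = "nibble is zero").
        let lz_lo = _mm_setr_epi8(8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4);
        let nib = _mm_set1_epi8(0x0F);

        let v = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let lo = _mm_and_si128(v, nib);
        let hi = _mm_and_si128(_mm_srli_epi16::<4>(v), nib);

        // Two PSHUFBs plus PADDB: popcount(byte) = popcount(lo) + popcount(hi).
        let sum = _mm_add_epi8(_mm_shuffle_epi8(pc_tab, lo), _mm_shuffle_epi8(pc_tab, hi));
        // Two PSHUFBs plus PMINUB: the high nibble wins unless it is zero.
        let min = _mm_min_epu8(_mm_shuffle_epi8(lz_hi, hi), _mm_shuffle_epi8(lz_lo, lo));

        _mm_storeu_si128(pc.as_mut_ptr() as *mut __m128i, sum);
        _mm_storeu_si128(lz.as_mut_ptr() as *mut __m128i, min);
    }
    (pc, lz)
}

/// Safe entry point: uses SSSE3 when available, scalar code otherwise.
pub fn popcnt_lzcnt_epi8(a: [u8; 16]) -> ([u8; 16], [u8; 16]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was just confirmed at runtime.
            return unsafe { popcnt_lzcnt_ssse3(a) };
        }
    }
    (
        a.map(|b| b.count_ones() as u8),
        a.map(|b| b.leading_zeros() as u8),
    )
}
```

A trailing-zero count would presumably use mirrored tables with the low nibble taking priority; that variant is left out here.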
-
Version 0.2.0 of `npsimd` is now published, with a new low-level API that supports runtime feature detection (currently only SSE2 is implemented). I'm going to slowly migrate all the existing functionality over to it, and then work on a better higher-level API. See <https://docs.rs/npsimd>!