SuperWord (Auto-Vectorization) - An Introduction


SIMD and Auto-Vectorization - Modern CPUs have a variety of SIMD (single instruction, multiple data) vector instructions (e.g. Intel's SSE and AVX, ARM's NEON and SVE). They make use of vector registers, which can hold multiple values of a type. For example, an AVX-512 register (512 bits) can hold 64 bytes, or 16 ints/floats, or 8 longs/doubles. A single vector instruction can thus load, store, add, multiply, etc. multiple values at once, usually at about the same cost (instructions per cycle, and latency) as the corresponding scalar (single-value) instruction...
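To make the register arithmetic above concrete, here is a minimal sketch of the kind of loop an auto-vectorizer can target. The class name VectorAddSketch and the method add are hypothetical names chosen for illustration, not taken from the original post. Every iteration is independent of the others, so a compiler is free to process several elements per instruction (16 ints at a time with a 512-bit register). Whether that actually happens depends on the JVM, the CPU, and the JIT's own heuristics, so treat this as a candidate loop rather than a guarantee.

```java
import java.util.Arrays;

public class VectorAddSketch {
    // Element-wise sum of two int arrays (hypothetical helper).
    // No iteration reads a value written by another iteration,
    // which is what makes the loop a candidate for vectorization.
    static int[] add(int[] a, int[] b) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        // 16 ints occupy 16 * 32 = 512 bits, i.e. one AVX-512 register.
        int[] a = new int[16];
        int[] b = new int[16];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 100;
        }
        System.out.println(Arrays.toString(add(a, b)));
    }
}
```

The scalar code stays the same either way: the source only describes what the hardware offers, and any vectorization would be applied by the compiler without changes to the loop.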

More at https://eme64.github.io/blog/2023/02/23/SuperWord-Introduction.html