> Any media processing can generally gain significant performance with SIMD inst...

janwas · on Aug 14, 2021

> Maybe modern codecs are seeing the writing on the wall and are increasing macroblock size for better parallel processing 10 years into the future

We did indeed do this for JPEG XL - the future is now :) 256x256 pixel groups are independently decodable (multi-core), each with >= 64-item (float) SIMD.
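As a rough sketch of how much multi-core parallelism independent 256x256 groups expose: the helper name `num_groups` and the 3840x2160 example size are made up for illustration and are not taken from the JPEG XL code.

```python
GROUP_DIM = 256  # group edge length in pixels, as stated in the comment above


def num_groups(width, height, group_dim=GROUP_DIM):
    """Count the group_dim x group_dim tiles needed to cover the image.

    Hypothetical helper: partial tiles at the right/bottom edges count as
    full groups, so both dimensions are rounded up.
    """
    groups_x = (width + group_dim - 1) // group_dim   # ceiling division
    groups_y = (height + group_dim - 1) // group_dim
    return groups_x * groups_y


# A 4K UHD frame splits into 15 x 9 groups that can be decoded in parallel.
print(num_groups(3840, 2160))  # 135
```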

cma · on Aug 14, 2021

AVX-512 lines up with 64-byte cache lines: one 512-bit register is exactly one line. It seems like it would be a huge change to go bigger.

dragontamer · on Aug 14, 2021

NVIDIA GPUs use 32-wide warps; AMD CDNA is 64 wide. With 32-bit lanes, that's 1024 bits and 2048 bits respectively.

Cache lines are probably 64 bytes wide to match a DRAM burst length of 8 (a 64-bit bus at burst length 8 transfers 64 bytes / 512 bits).
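The arithmetic behind those widths, as a quick sketch. The 32-bit lane size is an assumption (one 32-bit value per thread), and the function names are just for illustration.

```python
LANE_BITS = 32  # assumption: each SIMD lane / GPU thread handles a 32-bit value


def vector_bits(lanes, lane_bits=LANE_BITS):
    """Total width in bits of a vector with the given number of lanes."""
    return lanes * lane_bits


def burst_bytes(bus_bits, burst_length):
    """Bytes moved by one memory burst: bus width times transfers per burst."""
    return (bus_bits // 8) * burst_length


print(vector_bits(16))     # 512  (AVX-512: 16 float lanes)
print(vector_bits(32))     # 1024 (32-wide warp)
print(vector_bits(64))     # 2048 (64-wide wavefront)
print(burst_bytes(64, 8))  # 64   (64-bit bus, burst length 8)
```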