Another good example is using Morton Order for things like texture maps in graph...

psykotic · on Feb 19, 2011

Yes, that's texture swizzling.

It's useful any time you are accessing higher-rank arrays in a spatially coherent manner without any directional bias.

Bit interleaving only works with power-of-two dimensions. That's limiting. But for the cache benefits you don't need self-similarity at all levels. For less regularly sized arrays, you can play this trick with only some slices of lower bits, and then it's usually called tiling.

Scientific computing people also do this whenever they want to lay out a matrix in memory so that it efficiently supports both row-coherent and column-coherent access, which is required for basic operations like matrix multiplication. You can formulate tiled matrix multiplication in terms of smaller tile vs tile multiplications.

The earliest GPUs supported only square power-of-two-sized textures using swizzling. Later ones like the GeForce FX supported non-square, non-power-of-two textures using a simple memory unit with tiling. But the total number of tiles in use was a limited and non-scaleable resource, and once you exceeded the limit you fell off a performance cliff. Finally, the last two generations of GPUs (starting with the G80 microarchitecture in NVIDIA's case) have a more scaleable multi-level tiling approach that combines benefits of both swizzling and tiling although with higher hardware complexity than either of them.

My coworker wrote a great blog post on fast multi-level tiling in software: http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-s...

enjalot · on Feb 19, 2011

I came here to say this. Not just graphics, but indexing arrays in general on the GPU for use in OpenCL or CUDA. We are working on optimizing our neighbor searches using this.

A good book: Foundations of Multidimensional and Metric Data Structures by Hanan Samet 2006

A good paper: Interactive SPH Simulation and Rendering on the GPU by Goswami

gruseom · on Feb 19, 2011

Samet's book is the definitive reference, but I have trouble understanding his explanations. They do a good job of summarizing what's distinctive about each technique relative to the others, but a poor job of explaining what the techniques are. This is a pitfall of technical writing, especially by experts: it's easy to produce something that's intelligible only if you already know the material, which defeats the purpose (assuming your purpose is explanatory and not only compository). Still, the book is indispensable as a reference.

arnoldoMuller · on Feb 19, 2011

When I was building OBsearch.net I used this book extensively. I grant you that some explanations are not so detailed. On the other hand, the explanations of the most relevant techniques are quite good. Check for instance the section that explains LSH or the explanation about Navarro's SAT.

tom_b · on Feb 19, 2011

:-)

I remember using Morton Order for memory layout of dense matrix operations. Very handy and much better than row or column major order. At the time, I was using this for wavelet operations.

Some of the nastiest C code I ever wrote from a readability perspective. The experience permanently scarred me - prior to doing it, I wrote nice, clean C. After, I couldn't read my own code after a week . . .