Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Another good example is using Morton Order for things like texture maps in graphics - it is easy to derive a memory address by interleaving the bits of the x,y coords:

http://www.devmaster.net/forums/showthread.php?t=10125

http://en.wikipedia.org/wiki/Z-order_curve

From distant memory - switching to a variant of this from simple row major order textures gave > 20% increase in a scene rendering benchmark (software pipe, full scene, no game logic). Prior to that I had been doing daft things like having textures stored in column or row major order depending on their 'usual' orientation.



Yes, that's texture swizzling.

It's useful any time you are accessing higher-rank arrays in a spatially coherent manner without any directional bias.

Bit interleaving only works with power-of-two dimensions. That's limiting. But for the cache benefits you don't need self-similarity at all levels. For less regularly sized arrays, you can play this trick with only some slices of lower bits, and then it's usually called tiling.

Scientific computing people also do this whenever they want to lay out a matrix in memory so that it efficiently supports both row-coherent and column-coherent access, which is required for basic operations like matrix multiplication. You can formulate tiled matrix multiplication in terms of smaller tile vs tile multiplications.

The earliest GPUs supported only square power-of-two-sized textures using swizzling. Later ones like the GeForce FX supported non-square, non-power-of-two textures using a simple memory unit with tiling. But the total number of tiles in use was a limited and non-scaleable resource, and once you exceeded the limit you fell off a performance cliff. Finally, the last two generations of GPUs (starting with the G80 microarchitecture in NVIDIA's case) have a more scaleable multi-level tiling approach that combines benefits of both swizzling and tiling although with higher hardware complexity than either of them.

My coworker wrote a great blog post on fast multi-level tiling in software: http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-s...


I came here to say this. Not just graphics, but indexing arrays in general on the GPU for use in OpenCL or CUDA. We are working on optimizing our neighbor searches using this.

A good book: Foundations of Multidimensional and Metric Data Structures by Hanan Samet 2006

A good paper: Interactive SPH Simulation and Rendering on the GPU by Goswami


Samet's book is the definitive reference, but I have trouble understanding his explanations. They do a good job of summarizing what's distinctive about each technique relative to the others, but a poor job of explaining what the techniques are. This is a pitfall of technical writing, especially by experts: it's easy to produce something that's intelligible only if you already know the material, which defeats the purpose (assuming your purpose is explanatory and not only compository). Still, the book is indispensable as a reference.


When I was building OBsearch.net I used this book extensively. I grant you that some explanations are not so detailed. On the other hand, the explanations of the most relevant techniques are quite good. Check for instance the section that explains LSH or the explanation about Navarro's SAT.


:-)

I remember using Morton Order for memory layout of dense matrix operations. Very handy and much better than row or column major order. At the time, I was using this for wavelet operations.

Some of the nastiest C code I ever wrote from a readability perspective. The experience permanently scarred me - prior to doing it, I wrote nice, clean C. After, I couldn't read my own code after a week . . .




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: