Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It certainly wasn't the fastest general-purpose way to do copying for the decades of CPUs between the 8086 (late 1970s) and introduction of enhanced rep movsb (ERMSB) in Ivybridge (2012). For post-Ivybridge CPUs, I believe it was non optimal for some usage patterns for a while, too. It may be optimal in very recent Intel CPUs, but that's a far cry from saying it's been the fastest since introduction in the 8086.

E.g., Agner: https://www.agner.org/optimize/blog/read.php?i=372

E.g., Stackoverflow: https://stackoverflow.com/questions/43343231/enhanced-rep-mo...



Right. The follow-on to ERMSB, FSRM ("fast short rep movs"), which first appeared in Icelake, finally makes it consistently competitive with SW sequences¹.

¹ but you still want to branch around it when the length is zero ("when the length is zero?!" I hear you cry; it turns out that if you instrument actual systems, you will find that this happens shockingly often).


> FSRM finally makes it consistently competitive with SW sequences

Mateusz guzik says it's decent above 128 bytes, but that software sequences still win below that.


It depends on the exact distribution of sizes (and especially if the size[s] are statically knowable—e.g. if you are copying exactly 31 bytes, or something like an unknown size between 48 and 62 bytes, a SW sequence will still win), but it is now _competitive_ if not actually as fast (previously it was often 2-3x slower in that range, even when the length was not fixed).




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: