It certainly wasn't the fastest general-purpose way to do copying for the *decad...

stephencanon · on Dec 20, 2022

Right. The follow-on to ERMSB, FSRM ("fast short rep movs"), which first appeared in Ice Lake, finally makes it consistently competitive with SW sequences¹.

¹ but you still want to branch around it when the length is zero ("when the length is zero?!" I hear you cry; it turns out that if you instrument actual systems, you will find that this happens shockingly often).
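A minimal sketch of the footnote's advice, assuming GCC or Clang targeting x86-64. The helper name `copy_rep_movsb` is hypothetical, not from any particular libc. It tests for a zero length before issuing `rep movsb`, relies on the direction flag being clear (the SysV ABI guarantees this on function entry), and falls back to plain `memcpy` on other targets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with `rep movsb` on x86-64,
   branching around the instruction when n == 0. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
    if (n == 0)            /* the branch the footnote recommends */
        return ret;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep movsb copies RCX bytes from [RSI] to [RDI] and updates all three
       registers, so they are declared as in/out operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback on other targets */
#endif
    return ret;
}
```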

moonchild · on Dec 20, 2022

> FSRM finally makes it consistently competitive with SW sequences

Mateusz Guzik says it's decent above 128 bytes, but that software sequences still win below that.

stephencanon · on Dec 20, 2022

It depends on the exact distribution of sizes (and especially if the size[s] are statically knowable—e.g. if you are copying exactly 31 bytes, or something like an unknown size between 48 and 62 bytes, a SW sequence will still win), but it is now _competitive_ if not actually as fast (previously it was often 2-3x slower in that range, even when the length was not fixed).
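For bounded sizes like the ones above, a common software sequence copies the head and the tail of the buffer as two fixed-size, possibly overlapping chunks. A minimal sketch, assuming 16 <= n <= 64. The function name `copy_16_to_64` is hypothetical, and this is not taken from any specific libc.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: copy n bytes, 16 <= n <= 64, with no loop and no rep movsb.
   The head and tail chunks overlap when n is below twice the chunk size;
   the overlapping bytes are written twice with the same value, which is
   harmless. */
static void copy_16_to_64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    unsigned char head[32], tail[32];

    if (n <= 32) {                      /* e.g. exactly 31 bytes */
        memcpy(head, s, 16);
        memcpy(tail, s + n - 16, 16);
        memcpy(d, head, 16);
        memcpy(d + n - 16, tail, 16);
    } else {                            /* e.g. an unknown size in 48..62 */
        memcpy(head, s, 32);
        memcpy(tail, s + n - 32, 32);
        memcpy(d, head, 32);
        memcpy(d + n - 32, tail, 32);
    }
}
```

Each `memcpy` has a constant size, so GCC and Clang usually lower it to a few vector loads and stores. The whole copy is then one branch plus straight-line code, which is roughly why such sequences still beat `rep movsb` when the size range is known ahead of time.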