> Not really. The unused parts of the SIMD execution units are powered down
True, and that helps with frequency and power. What I meant by "wastes a lot of compute power" is this: say you have an AVX2-capable execution unit that can do 256/32 = 8 float adds in parallel. Only a single SSE instruction can be scheduled to it at a time, so via SSE you only get 128/32 = 4 float adds out of that execution unit.
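
To put numbers on it, here is a back-of-the-envelope sketch of the lane arithmetic. The helper `lanes` and the constant names are made up for illustration, and it assumes the simplest model: one instruction issued to that unit at a time and 32-bit floats. Real cores have several ports, so treat it as a rough picture only.

```python
# Rough lane arithmetic for one SIMD execution unit.
# Assumption: one instruction occupies the unit at a time, 32-bit floats.

def lanes(register_bits: int, element_bits: int = 32) -> int:
    """How many elements of element_bits fit in a register of register_bits."""
    return register_bits // element_bits

AVX_BITS = 256  # width of a YMM register
SSE_BITS = 128  # width of an XMM register

avx_adds = lanes(AVX_BITS)          # float adds per 256-bit instruction
sse_adds = lanes(SSE_BITS)          # float adds per 128-bit instruction
utilization = sse_adds / avx_adds   # share of the unit's lanes doing work

print(f"256-bit add: {avx_adds} floats per instruction")
print(f"128-bit add: {sse_adds} floats per instruction")
print(f"Lanes used by SSE on that unit: {utilization:.0%}")
```

So under this model, SSE code leaves half of the lanes of that unit idle, whether or not the idle half is powered down.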