Additionally, it looks like each subset of AVX-512 past the foundation (AVX-512F) is an optional feature and needs to be tested for. With only a few exceptions (usually as a result of bickering between Intel and AMD), each previous ISA extension implied you had everything that came before it too.
In practice this means you could pick a few entire subsets: Legacy+SSE2 is always there for 64-bit, maybe test up to SSE4.2 for another subset. Maybe switch everything from REX to VEX if it has AVX and AVX2. That's effectively three back ends, which is manageable. With AVX-512, everything beyond AVX-512F is a la carte, and that adds unwanted complexity for instruction selection in a compiler.
Just look at all the separate AVX feature flags from CPUID:
Although there are many AVX-512 features, actual implementations still break down into only a few subsets.
If you ignore the EOL Xeon Phi stuff (with different and incompatible ISAs), it was proceeding in a superset approach, but the Cascade Lake and Cooper Lake AI extensions kind of messed that up.
Basically you have the SKX subset and the ICL subset as the big important ones in the near future, unless you care about AI, in which case Cascade Lake is SKX + VNNI and Cooper Lake adds BF16 on top of that.
So in practice you'll target one of those subsets, nothing more fine-grained than that. Yes, you should still test for all the required extensions, but that part is easy.
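To make the "few subsets" idea concrete, here is a minimal sketch of the clumps as plain sets of flags. The flag spellings are assumed to match what Linux prints in /proc/cpuinfo, and the names `SKX`, `CLX`, `CPX`, `ICL` and `supported_clumps` are made up for illustration. The ICL set lists only the AVX-512 flags; a real check would probably add whatever else the code path uses.

```python
# Hypothetical clump definitions; flag names assumed to follow /proc/cpuinfo.
SKX = frozenset({"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"})
CLX = SKX | {"avx512_vnni"}   # Cascade Lake: SKX + VNNI
CPX = CLX | {"avx512_bf16"}   # Cooper Lake: Cascade Lake + BF16
ICL = SKX | {
    "avx512ifma", "avx512vbmi", "avx512_vbmi2",
    "avx512_vnni", "avx512_bitalg", "avx512_vpopcntdq",
}

CLUMPS = {"skx": SKX, "clx": CLX, "cpx": CPX, "icl": ICL}


def supported_clumps(cpu_flags):
    """Return the names of every clump whose flags are ALL present."""
    flags = set(cpu_flags)
    return sorted(name for name, required in CLUMPS.items() if required <= flags)
```

With this, `supported_clumps(ICL)` reports `icl`, `clx` and `skx` but not `cpx`, which is the "Cooper Lake broke the superset chain" point in code form.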
It's complicated, yes, but it doesn't approach the level of thinking about all 20 AVX-512 extensions individually and testing for them.
Basically, on the ground, it is about as complicated as, say, the few 128- and 256-bit extensions: there are only a few sets of functionality you have to care about (2 if you don't care about AI).
It's just that within those groups Intel decided to be very fine grained about the functionality, dividing the instructions among many flags (still, in a logical way).
So instead of the new generation just supporting SSE2, say, it supports 6 new flavors of AVX-512. My claim then is that this doesn't matter much: you can just think of all of those 6 as a unit, AVX-512-ICELAKE or whatever, because there are no CPUs that support a proper subset and there probably never will be (if there is, that's fine - you'll evaluate then if it makes sense for a new codepath).
Maybe I'm not making a good case that this is the same :).
Nah, you're making a good case. I think what you're trying to say is to test the CPUID bits for a consistent subset of AVX-512 flags and treat it all as one or two clumps. There was always going to be a fallback path for older subsets (SSE2-SSE4.2, AVX1-AVX2) anyways, so punt if it doesn't have all the features in a clump.
It's more like "Why do you care about ISA features?" Usually because you are trying to choose how many code paths to support for runtime ISA-based dispatching, or how many binaries to build when you build multiple versions of a binary (which may include compile-time dispatching).
So for that planning process, you only care about a few clumps. Then your CPUID testing strategy should still test all the required extensions, for completeness, and fall back as usual. Or something like that.
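A rough sketch of that strategy: walk an ordered list of code paths, take the first one whose required flags are all present, and punt to the baseline otherwise. The tier names, `pick_code_path` and `read_linux_cpu_flags` are hypothetical, the flag spellings are assumed to match /proc/cpuinfo, and the required sets are illustrative rather than exhaustive.

```python
# Best-first list of (code path, flags that must ALL be present). Illustrative only.
SKX = {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}
ICL = SKX | {"avx512ifma", "avx512vbmi", "avx512_vbmi2",
             "avx512_vnni", "avx512_bitalg", "avx512_vpopcntdq"}
TIERS = [
    ("avx512-icl", ICL),
    ("avx512-skx", SKX),
    ("avx2", {"avx", "avx2"}),
    ("sse4.2", {"ssse3", "sse4_1", "sse4_2"}),
]
BASELINE = "sse2"  # always available on 64-bit x86


def pick_code_path(cpu_flags):
    """Return the best tier that is fully supported, else the baseline."""
    flags = set(cpu_flags)
    for name, required in TIERS:
        if required <= flags:  # one missing flag means we punt to the next tier
            return name
    return BASELINE


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Linux-only helper (an assumption): returns an empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.partition(":")[2].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(pick_code_path(read_linux_cpu_flags()))
```

A CPU that reports AVX-512F but is missing, say, AVX-512VL simply lands on the AVX2 path, so the planning stays at "a few clumps" even though every bit is still checked.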
Just look at all the separate AVX feature flags from CPUID:
https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_F...
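For a sense of what "separate flags" means at the bit level, here is a small sketch that decodes raw CPUID leaf 7 register values into flag names. It covers only the AVX-512 bits discussed in this thread (plus the two Xeon Phi ones), not the full leaf; the function name and the choice to pass registers in as plain integers are assumptions for illustration, and the flag spellings are assumed to match /proc/cpuinfo.

```python
# Bit positions in CPUID leaf 7 (EAX=7). Partial table: AVX-512 bits only.
LEAF7_0_EBX = {
    16: "avx512f", 17: "avx512dq", 21: "avx512ifma",
    26: "avx512pf", 27: "avx512er",  # Xeon Phi only
    28: "avx512cd", 30: "avx512bw", 31: "avx512vl",
}
LEAF7_0_ECX = {
    1: "avx512vbmi", 6: "avx512_vbmi2", 11: "avx512_vnni",
    12: "avx512_bitalg", 14: "avx512_vpopcntdq",
}
LEAF7_1_EAX = {5: "avx512_bf16"}  # subleaf ECX=1


def decode_avx512_flags(ebx, ecx, eax_subleaf1=0):
    """Map raw register values to the set of AVX-512 flag names they advertise."""
    found = set()
    for reg, table in ((ebx, LEAF7_0_EBX), (ecx, LEAF7_0_ECX),
                       (eax_subleaf1, LEAF7_1_EAX)):
        for bit, name in table.items():
            if (reg >> bit) & 1:
                found.add(name)
    return found
```

Feeding the result into a subset check is then all the "testing for every required extension" amounts to.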
Between the performance problems and complexity, I think it'll be a while until AVX-512 is attractive.