Is there a trick to getting pytorch+cu121 and xformers to play nicely together? All the xformers packages I can find are built against torch==2.0.1+cu118.
Edit: After a bit more research, it looks like scaled dot product attention in PyTorch 2 provides much the same benefit as xformers without needing xformers at all. Nice.
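For anyone landing here later, a minimal sketch of what the built-in replacement looks like. `F.scaled_dot_product_attention` is a real PyTorch 2 API; the shapes below are just illustrative, and which fused kernel you get (flash, memory-efficient, or plain math fallback) depends on your hardware and dtype:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# One call replaces the manual softmax(Q K^T / sqrt(d)) V computation.
# PyTorch dispatches to a fused kernel when one is available, which is
# roughly the same optimization xformers' memory_efficient_attention gives.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # same shape as v: (2, 8, 128, 64)
```

On a CUDA build this picks the fused backends automatically; on CPU it silently falls back to the unfused math path, so the call is portable either way.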
PyTorch itself is wonkily packaged, though I'm sure they have their reasons. It goes to show that you can put a huge amount of effort into fixing this particular problem, one that everyone touching this technology runs into, and maintainers everywhere will still get nowhere with it. I don't think this is a "me" problem either: the demand for correctly packaged PyTorch is real, as all the easy-install UIs demonstrate.
CUDA and ROCm make this an intractable problem. There is basically no sane way to package everything users need, and the absolutely enormous, CUDA/ROCm-versioned PyTorch packages with missing libs are already a compromise.
TBH the whole ecosystem is not meant to be for end user inference anyway.
The two most popular stable diffusion UIs (automatic1111 and comfy) have longstanding issues with a few known but poorly documented bugs, like the Ada performance issue.
For instance, the torch.compile speedup we are talking about is (last I checked) totally irrelevant for those UIs, because they still use the Stability AI implementation rather than the Hugging Face diffusers package, which is the one checked for graph breaks. This may extend to SDXL as well.
These also allow `torch.compile` to function properly with dynamic input, which should net another 30%+ boost to SD.
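To make the dynamic-input point concrete, here is a hedged sketch. `attention_block` is a hypothetical stand-in for a real model module; the real API point is `torch.compile(..., dynamic=True)`, which asks the compiler for shape-polymorphic kernels so that changing the input resolution (a common thing in SD UIs) does not trigger a recompile:

```python
import torch

def attention_block(x):
    # Toy stand-in for a diffusion model block; any Tensor -> Tensor
    # function works with torch.compile.
    return torch.nn.functional.gelu(x @ x.transpose(-1, -2))

# dynamic=True requests shape-polymorphic compilation, so varying
# sequence lengths / resolutions reuse one compiled artifact instead
# of recompiling per shape. Compilation itself is lazy: it happens on
# the first call, not here.
compiled = torch.compile(attention_block, dynamic=True)
```

Whether the claimed 30%+ holds will depend on the model and GPU; the prerequisite is that the implementation being compiled is free of graph breaks, which is exactly what the diffusers codepath is checked for.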