
You can also run PyTorch cu121 nightly builds,

These also allow `torch.compile` to function properly with dynamic input, which should net another 30%+ boost to SD.
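As a minimal sketch of what "dynamic input" means here (using a toy module as a stand-in for a diffusion model, not SD itself):

```python
import torch

# Toy module standing in for a real model (placeholder, not Stable Diffusion).
model = torch.nn.Linear(64, 64)

# dynamic=True asks torch.compile to trace with symbolic shapes, so
# varying batch sizes don't trigger a fresh recompile for every new shape.
compiled = torch.compile(model, dynamic=True)

# Two different batch sizes go through the same compiled artifact.
a = compiled(torch.randn(4, 64))
b = compiled(torch.randn(16, 64))
```

Without `dynamic=True`, older torch.compile behavior could recompile (or fall back) whenever the input shape changed, which is common in SD workflows with varying resolutions and batch sizes.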



Is there a trick to getting pytorch+cu121 and xformers to play nicely together? All the xformers packages I can find are built for torch==2.0.1+cu118.

Edit: After a bit more research, it looks like scaled dot product attention in PyTorch 2 provides much the same benefit as xformers without needing xformers proper. Nice.
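For reference, the PyTorch 2 built-in is `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a fused kernel (Flash or memory-efficient attention) when the backend supports it:

```python
import torch
import torch.nn.functional as F

# Random Q/K/V tensors shaped (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Picks a fused implementation (Flash / memory-efficient) when available,
# otherwise falls back to the plain math implementation.
out = F.scaled_dot_product_attention(q, k, v)
```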


xformers has to match the PyTorch build. For PyTorch nightly, you need to build from source.

xformers still has a tiny performance benefit (especially at higher resolutions IIRC), but yeah, PyTorch's SDP is good.


This comment brings a tear to my eye.


The underlying problem is the community's decision to make users manage this in the first place.

This is an example of a setup.py that correctly installs the accelerated PyTorch for your platform:

https://github.com/comfyanonymous/ComfyUI/blob/9aeaac4af5e19...

As you can see, it was never merged, for philosophical reasons I believe. The author wanted to merge it earlier and then changed his mind.

Like why make end users deal with this at all? The ROI from a layperson choosing these details is very low.
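A setup/install script along those lines might look like this. This is a hypothetical sketch, not the code from the linked PR; the index URLs are the real PyTorch wheel indexes, but the detection logic is my own illustration:

```python
# Hypothetical sketch: choose a PyTorch wheel index based on the detected
# accelerator, so the end user never has to pick cu121 vs cu118 vs cpu.
import platform
import subprocess


def torch_index_url():
    """Return the extra index URL to pass to pip, or None for the default."""
    # Presence of nvidia-smi is a rough proxy for an NVIDIA GPU + driver.
    try:
        subprocess.run(["nvidia-smi"], capture_output=True, check=True)
        return "https://download.pytorch.org/whl/cu121"
    except (OSError, subprocess.CalledProcessError):
        pass
    if platform.system() == "Darwin":
        # macOS: the default wheel ships with MPS support built in.
        return None
    return "https://download.pytorch.org/whl/cpu"
```

The point is that all of this is mechanically decidable at install time; there is no reason a layperson should be choosing between wheel indexes by hand.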

Python has a packaging problem; this is well known. Fixing setuptools would be the highest-yield fix. Other package tooling can't install PyTorch either, for example: https://github.com/python-poetry/poetry/issues/6409#issuecom....

PyTorch itself is wonkily packaged. But I'm sure they have a good reason for this. Anyway, it goes to show that you can put a huge amount of effort into fixing this particular problem that everyone touching this technology has, and maintainers everywhere will still go nowhere with it. And I don't think this is a "me" problem, because there is so much demand for packaging PyTorch correctly - all the easy UIs, etc.


> But I'm sure they have a good reason for this.

CUDA and ROCm make this an intractable problem. Basically, there is no sane way to package everything users need, and the absolutely enormous, cuda/rocm-versioned PyTorch packages with missing libs are already a compromise.

TBH the whole ecosystem is not meant to be for end user inference anyway.


Sorry, no idea what you are talking about.

I am talking about dynamic shapes in torch.compile.

You seem to be talking about software packaging. You also make heavy use of the word "this" without it being clear what "this" is.


The two most popular Stable Diffusion UIs (automatic1111 and comfy) have longstanding, known but poorly documented bugs, like the Ada performance issue.

For instance, the torch.compile thing we are talking about is (last I checked) totally irrelevant for those UIs, because they still use the Stability AI implementation rather than Hugging Face's diffusers package, which is the one checked for graph breaks. This may extend to SDXL.


Pretty interesting. Using nightly + cu121 I'm getting 8.18 it/s, about another 5% improvement over the 7.78 it/s that cu118 gave.



