I'm surprised people don't know about this, as it has been common knowledge in the SD community [1] since October last year. Strictly speaking, you don't even need CUDA 11.8+ to get the speedup; cuDNN 8.6+ is sufficient, though you should use the newest versions for other reasons.
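
If you want to check what your install actually loads, here is a rough sketch. It assumes a PyTorch-based setup (my assumption, not something stated above), and `cudnn_at_least` is a made-up helper name. It decodes the integer that `torch.backends.cudnn.version()` returns, e.g. 8600 for cuDNN 8.6.0.

```python
def cudnn_at_least(version_int, major, minor):
    """Return True if an encoded cuDNN version is >= major.minor.

    Hypothetical helper for illustration only.
    cuDNN 8.x and earlier encode as major*1000 + minor*100 + patch (8.6.0 -> 8600);
    cuDNN 9+ encode as major*10000 + minor*100 + patch (9.1.0 -> 90100).
    """
    if version_int is None:  # no cuDNN available
        return False
    if version_int >= 10000:
        found = (version_int // 10000, (version_int % 10000) // 100)
    else:
        found = (version_int // 1000, (version_int % 1000) // 100)
    return found >= (major, minor)


if __name__ == "__main__":
    try:
        import torch  # assumption: the environment uses PyTorch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
    else:
        cudnn = torch.backends.cudnn.version()  # int, or None without cuDNN
        print("CUDA (build):", torch.version.cuda)
        print("cuDNN:", cudnn)
        print("cuDNN >= 8.6:", cudnn_at_least(cudnn, 8, 6))
```

This only reports the versions PyTorch was built against or loaded; it says nothing about whether you will actually see the speedup on your card.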
[1]: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issu...