Hacker News
saagarjha
10 months ago
on: Writing Speed-of-Light Flash Attention for 5090 in...
There's a 2x performance hit from the weird restriction on fp32 accumulation, plus the fact that the 5090 has "fake" Blackwell (no tcgen05), which limits the size and throughput of matrix multiplications through the tensor cores.
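Why not just accumulate in fp16 and skip the penalty? A minimal sketch of the precision problem, assuming a simplified model where the running sum is rounded to the accumulator type after every add (real tensor cores use wider internal adders and a different summation order, and the `accumulate` helper here is hypothetical, purely for illustration):

```python
import numpy as np

def accumulate(values, acc_dtype):
    """Sum values one at a time, rounding the running total to acc_dtype after each add.

    Simplified model only: it illustrates the fp16 vs fp32 precision gap,
    not how tensor-core hardware actually orders or rounds its additions.
    """
    acc = acc_dtype(0)
    for v in values:
        acc = acc_dtype(acc + acc_dtype(v))
    return float(acc)

# 4096 products that are each exactly 1.0, e.g. a dot product of two all-ones vectors.
products = [1.0] * 4096

print(accumulate(products, np.float16))  # fp16 running sum stalls at 2048
print(accumulate(products, np.float32))  # fp32 running sum reaches 4096
```

fp16 has only 11 significand bits, so 2049 isn't representable: 2048 + 1 rounds back to 2048 and every later add is lost. That's why attention kernels generally want fp32 accumulators even when it costs throughput.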