Hacker News
spockz
71 days ago
on:
Consistency diffusion language models: Up to 14x f...
You can run smaller models on smaller compute hardware and split the compute. For large models you need to be able to fit the whole model in memory to get any decent throughput.
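As a rough back-of-the-envelope sketch of the "fit the whole model in memory" point: weight memory is roughly parameter count times bytes per parameter. The function names, the model sizes, and the 24 GB device below are hypothetical illustrations, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_in_memory(num_params: float, bytes_per_param: float,
                   device_memory_gb: float) -> bool:
    """True if the weights alone fit on a single device.

    Assumption: KV cache, activations, and framework overhead are ignored,
    so a real deployment needs headroom beyond this.
    """
    return weight_memory_gb(num_params, bytes_per_param) <= device_memory_gb


if __name__ == "__main__":
    # Hypothetical sizes: 7B and 70B parameters at 2 bytes each (16-bit weights).
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        gb = weight_memory_gb(params, 2)
        print(name, gb, "GB, fits on a 24 GB device:", fits_in_memory(params, 2, 24))
```

Under these assumptions the smaller model fits on one device while the larger one has to be split or offloaded, which is where the throughput cost comes in.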
stavros
71 days ago
Ah interesting, I didn't realize MoE doesn't need to all run in the same place.