You can run smaller models on smaller compute hardware and split the compute. Fo... | Hacker News


spockz 71 days ago | on: Consistency diffusion language models: Up to 14x f...

You can run smaller models on smaller compute hardware and split the compute. For large models you need to be able to fit the whole model in memory to get any decent throughput.
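
A rough sketch of the memory point, not taken from the comment itself: the weights alone need roughly (parameter count × bytes per parameter) of memory. The model sizes, the 16-bit precision, and the 24 GiB device below are hypothetical values picked for illustration, and the estimate ignores activations, KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


def fits_in_memory(num_params: float, bytes_per_param: float, device_gib: float) -> bool:
    """True if the weights alone fit on a device with device_gib of memory."""
    return weight_memory_gib(num_params, bytes_per_param) <= device_gib


# Hypothetical sizes, chosen only for illustration.
DEVICE_GIB = 24  # assumed single-device memory
for name, params in [("7B", 7e9), ("70B", 70e9)]:
    need = weight_memory_gib(params, 2)  # 2 bytes per parameter (16-bit)
    verdict = "fits" if fits_in_memory(params, 2, DEVICE_GIB) else "does not fit"
    print(f"{name} at 16-bit: {need:.1f} GiB of weights, {verdict} in {DEVICE_GIB} GiB")
```

Under these assumptions the smaller model fits on one device while the larger one has to be split across devices or offloaded, which is where throughput tends to suffer.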

stavros 71 days ago

Ah interesting, I didn't realize MoE doesn't need to all run in the same place.
