I imagine (because there are not that many details on the Mill out there) that the power savings come mainly from:
- A post-compiler compilation step that adapts a binary for a specific chip, replacing a complex hardware control module with a one-time software run, so the only thing actually on the chip is the routing.
- A very nice set of primitives that push the non-determinism into data instead of flow control.
- Cheap interrupts, cheap memory access, and every other detail rethought to be cheap.
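
To make the second point a bit more concrete, here is a rough sketch of what "pushing non-determinism into data instead of flow control" can look like in general. This is not Mill code and says nothing about how the Mill actually does it; the function names `abs_branching` and `abs_select` are made up for illustration. The first version decides what to execute based on the data, the second always executes the same instructions and lets the data pick the result.

```python
def abs_branching(x: int) -> int:
    # Control flow depends on the data: which path runs is only
    # known once the comparison has been evaluated.
    if x < 0:
        return -x
    return x


def abs_select(x: int) -> int:
    # The condition becomes an ordinary value (0 or 1)...
    is_negative = int(x < 0)
    # ...and both candidate results are computed unconditionally.
    negated = -x
    # Arithmetic select: yields `negated` when is_negative == 1, else `x`.
    return is_negative * negated + (1 - is_negative) * x
```

Both functions return the same values; the difference is that the second one has a single, fixed instruction sequence, so the uncertainty lives in a value rather than in which path gets taken.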