You can't use more than one core with a single node process, so they spawn n processes for n cores. With a multithreaded runtime, this would not be required.
Yes, exactly. My suggestion was for the application to fork (CPUs - 1) workers, i.e. 1 master and 7 workers instead of 1 master, 3 workers and 3 stores, and have the workers manage their own key-value stores. Apparently each worker doesn't need a store, see the author's comment (https://news.ycombinator.com/item?id=7713561), so it looks good.
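For what it's worth, here is a rough sketch of the (CPUs - 1) layout, assuming Node's built-in cluster module. The names workerCount and start are made up for illustration and are not taken from the project under discussion.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: leave one core for the master,
// but always fork at least one worker.
function workerCount(cpuCount) {
  return Math.max(1, cpuCount - 1);
}

// Hypothetical entry point: the master forks the workers,
// and each worker process runs the supplied `work` function.
function start(work) {
  // cluster.isPrimary exists on newer Node versions; older ones only expose isMaster.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    const n = workerCount(os.cpus().length);
    for (let i = 0; i < n; i++) {
      cluster.fork();
    }
  } else {
    work();
  }
}
```

On an 8-core machine this gives 1 master and 7 workers. Calling something like start(() => { /* serve requests */ }) from the main script would kick it off; whether each worker should also own a store is a separate question.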