
If you know how many CPUs you are bringing up, then you can allocate a bunch of stacks contiguously and have the CPUs race to pick up the next one, say

  mov rsp, STACKSIZE          ; amount to advance the shared pointer by
  lock xadd [currstack], rsp  ; rsp = old [currstack]; [currstack] += STACKSIZE
Of course, the contention on that xadd is going to cost you (if not 10ms per CPU... probably?), and this presumes you aren’t using the kernel stack pointer for anything (like a stable CPU number). To fix that, you probably will need to traverse a CPU -> startup data map in assembly. But it’s a start (no pun intended), and is not as horrendous a hack as having multiple CPUs push the same return address onto the same stack.
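For anyone who wants to poke at the idea outside of early boot, here is a minimal userspace sketch of the same claim-by-fetch-add pattern using C11 atomics. It is only an analogue of the two instructions above, not boot code: NCPUS, the STACKSIZE value, the stacks array, claim_stack and stack_index are all hypothetical names and values picked for illustration. It also assumes each CPU wants the top of its region, since x86 stacks grow downward; in the assembly version that depends on what currstack is initialized to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS     4      /* hypothetical: CPU count known in advance */
#define STACKSIZE 4096   /* hypothetical: bytes of stack per CPU */

/* One contiguous region holding every CPU's stack. */
static _Alignas(16) unsigned char stacks[NCPUS * STACKSIZE];

/* Byte offset of the next unclaimed stack; plays the role of currstack. */
static atomic_size_t currstack = 0;

/* Claim the next free stack and return its top (highest address).
 * atomic_fetch_add returns the old value, just like lock xadd. */
static unsigned char *claim_stack(void)
{
    size_t old = atomic_fetch_add(&currstack, STACKSIZE);
    assert(old < sizeof stacks);   /* more claimers than stacks */
    return stacks + old + STACKSIZE;
}

/* Recover the arrival-order index (0, 1, 2, ...) from a stack top. */
static size_t stack_index(const unsigned char *top)
{
    return (size_t)(top - stacks) / STACKSIZE - 1;
}
```

Note that stack_index only tells you the order in which CPUs won the race, which is not a stable CPU number. That is the gap the CPU -> startup data map would have to fill.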


As an order-of-magnitude data point, my experience has been that a bunch of CPUs contending on the same xadd bottleneck at a throughput of roughly one operation per 50 to 100 nanoseconds.

But even if you allow an entire extra order of magnitude, at one xadd per microsecond, that's still 10,000 xadds over the course of 10 milliseconds, which is plenty for this use case, at least for now.
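Spelling out the arithmetic, taking the 10 ms window and the per-xadd costs quoted in this thread as given (these are rough figures from experience, not measurements):

```latex
\[
\frac{10\,\text{ms}}{1\,\mu\text{s}} = 10^{4}, \qquad
\frac{10\,\text{ms}}{100\,\text{ns}} = 10^{5}, \qquad
\frac{10\,\text{ms}}{50\,\text{ns}} = 2 \times 10^{5}.
\]
```

So the pessimistic estimate allows about 10,000 contended xadds in the window, and the 50 to 100 ns figure allows 100,000 to 200,000.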
