I guess it depends on the complexity of your distributed system (assuming you’re...

Galanwe · on Dec 30, 2019

I get your point, and I think we agree here. I mainly wanted to argue against the original statement, which was quite broad: “high performance machines running in a production environments will absolutely have swap disabled”. Would you agree to summarize both our arguments as follows:

- paging incurs seemingly random performance degradation of processes and should be avoided

- if you have some form of task queue/job distribution system that handles automatic re-runs, and you can afford to restart a process from scratch at no business cost, then disabling swap allows fail-fast behaviour

- otherwise, swap can be used as a safety net for programs that are better off slightly late than restarted from scratch

- both scenarios require sane monitoring of process behaviour: catching symptomatic failures/restarts in the fail-fast case, and recurring swap usage in the safety-net case
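The monitoring in the last point could start from the per-process VmSwap field that Linux exposes in /proc/<pid>/status. A minimal sketch, assuming a Linux host; parse_vmswap_kb and swapped_pids are hypothetical helper names, and threshold_kb is an illustrative parameter. It only takes a snapshot, so spotting recurring swap usage would mean running it periodically (or feeding the same field into whatever metrics agent you already run).

```python
from pathlib import Path


def parse_vmswap_kb(status_text: str) -> int:
    """Return the VmSwap value (in kB) from the text of /proc/<pid>/status.

    Returns 0 if the field is absent (kernel threads, for instance, have no Vm* lines).
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # Line looks like "VmSwap:      1234 kB"
            return int(line.split()[1])
    return 0


def swapped_pids(proc_root: str = "/proc", threshold_kb: int = 0) -> dict[int, int]:
    """Map PID -> kB swapped out, for every process above threshold_kb.

    Yields an empty dict if proc_root does not exist (e.g. on non-Linux systems).
    """
    result: dict[int, int] = {}
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            kb = parse_vmswap_kb(status.read_text())
        except (OSError, ValueError):
            # The process exited mid-scan or the file was unreadable; skip it.
            continue
        if kb > threshold_kb:
            result[int(status.parent.name)] = kb
    return result


if __name__ == "__main__":
    # Print the processes with the most swapped-out memory first.
    for pid, kb in sorted(swapped_pids().items(), key=lambda kv: -kv[1]):
        print(f"{pid}\t{kb} kB swapped")
```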

AdamJacobMuller · on Dec 30, 2019

> if you're working under tight latency requirements then paging can push you over that boundary

Even if you're not under tight requirements, swap can do strange things. I've actually seen situations where hitting swap, even trivially, can cause massive increases in latency.

I'm talking about jobs that took tens of milliseconds to complete suddenly taking several tens of seconds.

I've even seen some absurdly bad memory management, where Linux makes very poor choices about what to page out.
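One way to check whether a slow job actually hit the disk, rather than guessing, is to count the major page faults it incurred. A rough sketch, assuming a Unix system where Python's standard resource module is available; run_and_measure is a hypothetical helper. Major faults also come from file-backed pages (mmap'd files, executable text), so a nonzero count points at disk reads in general, not swap specifically.

```python
import resource
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_and_measure(job: Callable[[], T]) -> tuple[T, float, int]:
    """Run job() and return (result, wall-clock seconds, major page faults).

    A major fault means the kernel had to read a page in from disk (swap or a
    file mapping). The counter is process-wide (RUSAGE_SELF), so other threads
    running at the same time can inflate it.
    """
    before = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
    start = time.perf_counter()
    result = job()
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
    return result, elapsed, after - before


if __name__ == "__main__":
    value, seconds, majflt = run_and_measure(lambda: sum(range(1_000_000)))
    print(f"result={value} took {seconds * 1000:.1f} ms with {majflt} major faults")
```

A job whose runtime jumps by orders of magnitude while its major-fault count climbs from zero is a good hint that paging, not the job's own logic, is to blame.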

> Ultimately disabling paging is a really good tool for limiting the blast radius of bad behaviour

1000% agreed. Fail fast rather than fail slowly.
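The fail-fast side of this pairs with the task-queue case from the earlier comment: with swap disabled, a job that exhausts memory tends to be killed by the OOM killer instead of crawling, and the runner simply starts it again from scratch. A minimal sketch, assuming jobs are idempotent and safe to restart; run_with_restarts is a hypothetical helper, not the API of any particular queue system, and the timeout is an extra guard for jobs that are still alive but far slower than expected.

```python
import subprocess
import sys
from typing import Sequence


def run_with_restarts(cmd: Sequence[str], max_attempts: int = 3, timeout_s: float = 60.0) -> int:
    """Run cmd as a fresh process until it exits 0 or max_attempts is used up.

    Returns the last exit code: 0 on success, -1 if the last attempt timed out,
    or the child's return code otherwise (negative values mean it was killed by
    that signal, e.g. -9 for SIGKILL from the OOM killer).
    """
    code = -1
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired on timeout.
            code = subprocess.run(cmd, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            code = -1
        if code == 0:
            return 0
        print(f"attempt {attempt} failed with {code}", file=sys.stderr)
    return code
```

Starting a new process on each attempt, rather than retrying inside the same one, is what makes every restart genuinely from scratch.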