You're right that it's not _strictly_ memory consumption and that other criteria and overrides exist, but memory consumption is highly weighted.
Regarding SSH, if you enable sshd debug logging you can see that sshd sets its own score to the minimum possible [0], which is why your comment about sshd being targeted still doesn't make sense to me. I actually didn't know it was sshd doing this on its own until I ran this:
...which is fascinating and clever. That's when I checked the source code linked at [0]. I now finally have an answer as to why I've seen dmesg memory stat dumps display different oom_score_adj values for sshd. I always thought _something_ was smart enough to know that we don't want to risk killing sshd, but I didn't know what that _something_ was. It turns out it was the daemon itself.
Oct 13 23:20:38 server kernel: Mem-Info:
...
Oct 13 23:20:38 server kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Oct 13 23:20:38 server kernel: [ 2455]    60  2455  1778653   273114     710      10        0             0 mysqld
...
Oct 13 23:20:38 server kernel: [12085]   207 12085    22574      673      32       3        0             0 tlsmgr
Oct 13 23:20:38 server kernel: [ 4238]     0  4238     9234      518      20       3        0         -1000 systemd-udevd
Oct 13 23:20:38 server kernel: [12278]     0 12278    88107     5597     136       4        0             0 apache2
Oct 13 23:20:38 server kernel: [17222]     0 17222  1258983   142035     505       8        0             0 qemu-system-x86
...
Oct 13 23:20:38 server kernel: [21069]     0 21069     5033      487      14       4        0             0 bash
Oct 13 23:20:38 server kernel: [15935]     0 15935     7081      487      16       3        0         -1000 sshd
...
In retrospect it makes a lot of sense, especially considering sshd runs as root -- it has every ability to do that. And it's not like anything else would know the importance of sshd better than sshd itself.
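For reference, the mechanism itself is tiny. Here's a minimal sketch in C (not sshd's actual code) of what a daemon does to opt out: write -1000 to /proc/self/oom_score_adj. Any process can raise its own adjustment, but lowering it generally requires root (CAP_SYS_RESOURCE), which is why sshd can do this and most services can't:

    /* Minimal sketch (not sshd's actual code): a privileged daemon exempting
     * itself from the OOM killer. -1000 means "never pick me"; lowering the
     * value below its current setting requires CAP_SYS_RESOURCE. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (f == NULL) {
            perror("fopen /proc/self/oom_score_adj");
            return 1;
        }
        if (fprintf(f, "%d\n", -1000) < 0)
            perror("write oom_score_adj");
        return fclose(f) == 0 ? 0 : 1;
    }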
However, I still don't understand your comment about the Linux OOM killer wanting to kill sshd "first" (or _ever_, based on these renewed findings!). Can you elaborate?
If you have a large amount of memory, are doing decent amounts of IO, and Linux OOMs, the system becomes unresponsive for many minutes before killing any process. At that point, SSH sessions time out endlessly. A serial console stands a chance.
There's also the case where you're debugging AMI builds and need to fix GRUB or the init system without waiting 20 minutes for a new AMI build each time.
Also, the existing console log feature in AWS is insultingly not real time. It typically doesn't update at all unless you're within minutes of boot or trigger a reboot, and it only buffers something like 4 KB, so a reboot can easily replace the logs entirely. This really sucks when you're trying to get debug console output, so this feature finally solves that.
Why would Linux trigger the OOM killer if you have a large amount of memory available? Or, what did you mean by "large amount of memory"?
Also, why would an SSH session, which is entirely in memory, time out because of I/O thrashing? You can disconnect the hard drive that sshd and/or the OS is running from and your SSH connections to that machine won't break. If you run some commands that aren't cached in memory you'll naturally get critical I/O errors, but it won't cause a disconnect on the SSH layer.
By "large amount of memory", I meant systems with a large amount of memory that is nearly all in use. For example, a server with 256 GB of memory and 255.9 GB in use.
SSH is purely in memory; however, in order to allocate memory for it, Linux will pull "free" memory out of whatever heavily fragmented corners it can find it in, and it may even need to perform disk I/O to reclaim memory that is tied up in various disk caches.
People refer to this as a "livelock": Linux is going crazy doing lots of work, but from userspace the system looks completely frozen.
They've actually gone so far as to submit kernel patches for the newer PSI (pressure stall information) interfaces, which they use in oomd to better detect stalls caused by this thrashing.
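If your kernel is new enough (4.20+ with CONFIG_PSI), you can read the same signal oomd consumes straight out of /proc/pressure/memory. A rough sketch of polling it in C; the 10% threshold is an arbitrary example, not oomd's actual policy:

    /* Rough sketch: poll /proc/pressure/memory (PSI, kernel 4.20+) and warn
     * when the "full" 10-second average -- the share of time in which all
     * non-idle tasks were stalled on memory -- crosses an arbitrary example
     * threshold. Illustrates the interface only, not oomd's real policy. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        for (;;) {
            FILE *f = fopen("/proc/pressure/memory", "r");
            if (f == NULL) {
                perror("open /proc/pressure/memory");
                return 1;
            }
            char line[256];
            double avg10;
            while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, "full", 4) == 0 &&
                    sscanf(line, "full avg10=%lf", &avg10) == 1 &&
                    avg10 > 10.0) /* arbitrary example threshold */
                    fprintf(stderr, "memory stall: full avg10=%.2f%%\n", avg10);
            }
            fclose(f);
            sleep(5);
        }
    }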
> However I still don't understand your comment about the Linux OOM killer wanting to kill sshd "first" (or _ever_ based on these renewed findings!) Can you elaborate?
I suspect running low on memory can trigger symptoms that look like sshd failing.
sshd gets paged out (or something else you need for a successful login). Paging it back in becomes incredibly slow, since there's lots of I/O going on from all the other paging. Anything garbage-collected starts running GC constantly, using 100% CPU.
Then your attempt to SSH times out - and with no access to list running processes, one naturally concludes sshd has failed.
Yeah, maybe in addition to my survivorship bias from RHEL6-era images, I am mentally conflating samples from “we OOM killed sshd” and “we are swapping violently; we won’t get in via sshd”.
In fact, I would guess (especially given all this investigation!) that it’s much more likely that an inaccessible box is just under too much memory pressure for sshd to respond.
Amusingly, the answer is still the same: serial port! :).
Thanks again for all the pointers (to everyone in this thread).
Amusingly, this finally forced me to find bugs like this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1071290
(All processes started under a remote shell inherit the -1000 adjustment, which basically means "never shoot me".)
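If anyone's curious, the fix is essentially a save/restore pattern: the listener protects itself with -1000, and the forked per-session child writes the original value back before running the user's shell, so interactive processes don't inherit "never shoot me". A rough sketch in C (not OpenSSH's actual code):

    /* Rough sketch (not OpenSSH's actual code) of the save/restore pattern:
     * the parent daemon exempts itself from the OOM killer, and the forked
     * session child restores the original adjustment before exec'ing the
     * user's shell, so user processes stay killable. */
    #include <stdio.h>
    #include <unistd.h>

    static int read_adj(void)
    {
        int adj = 0;
        FILE *f = fopen("/proc/self/oom_score_adj", "r");
        if (f) {
            if (fscanf(f, "%d", &adj) != 1)
                adj = 0;
            fclose(f);
        }
        return adj;
    }

    static void write_adj(int adj)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (f) {
            fprintf(f, "%d\n", adj);
            fclose(f);
        }
    }

    int main(void)
    {
        int saved = read_adj();  /* remember whatever we started with */
        write_adj(-1000);        /* daemon: never OOM-kill the listener */

        /* ... accept a connection, fork a per-session child ... */
        if (fork() == 0) {
            write_adj(saved);    /* child: restore before exec'ing the shell */
            execl("/bin/sh", "sh", "-c", "cat /proc/self/oom_score_adj",
                  (char *)NULL);
            _exit(127);
        }
        return 0;
    }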
There are a few related to setting up the sshd adjustment itself as well.
So, looks like a config problem! Thanks for pointing this out.