I'd say your original statement is correct, and that the term "page fault" is overloaded. It can be used both for the trap into a TLB miss handler and for loading swapped-out data from disk. It's up to context to make it clear.
What you described, a TLB miss, is just called a TLB miss. The CPU will automatically find and read the appropriate PTE and load it into the TLB, just like it does on a cache miss.
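To make the "find and read the appropriate PTE" part concrete, here is a rough Python sketch of a hardware-style walk. It assumes the classic 32-bit x86 layout (10-bit directory index, 10-bit table index, 12-bit offset); the `ToyMMU` class, its dict-based tables, and the `walk_reads` counter are made up for illustration and do not model any real CPU.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
INDEX_BITS = 10                      # 10-bit directory index, 10-bit table index
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class ToyMMU:
    """Hypothetical model of an MMU with a hardware page-table walker."""

    def __init__(self, page_directory):
        # page_directory: dict of dir index -> page table,
        # where each page table is a dict of table index -> frame number.
        self.page_directory = page_directory
        self.tlb = {}          # virtual page number -> frame number
        self.walk_reads = 0    # memory reads issued by the walker

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        frame = self.tlb.get(vpn)
        if frame is None:              # TLB miss: no software runs here
            frame = self._walk(vpn)
            self.tlb[vpn] = frame      # cache the translation
        return (frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

    def _walk(self, vpn):
        dir_index = (vpn >> INDEX_BITS) & INDEX_MASK
        table_index = vpn & INDEX_MASK
        self.walk_reads += 1           # read the page-directory entry
        table = self.page_directory.get(dir_index)
        if table is None:
            raise LookupError("page fault: no page table")
        self.walk_reads += 1           # read the page-table entry (PTE)
        frame = table.get(table_index)
        if frame is None:
            raise LookupError("page fault: page not present")
        return frame


mmu = ToyMMU({1: {2: 0x1234}})
vaddr = (1 << 22) | (2 << 12) | 0xABC
paddr = mmu.translate(vaddr)           # miss: the walker does two reads
paddr_again = mmu.translate(vaddr)     # hit: no further reads
```

In this toy model the miss costs two extra memory reads and nothing else. Only when the walk finds no valid entry does it become a page fault that the OS has to handle.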
Right, I was assuming x86 (since that's what Linus was describing). Indeed, a lot of older RISCs had MMUs that needed a lot of "hand-holding" in software. I think it's fortunate that x86 didn't go this route, given the increasing cost of context switches: trapping to a software handler basically requires flushing the pipeline and switching to a completely different instruction stream, while an automatic TLB, like a cache, doesn't interfere when it misses. An OoO/superscalar design can continue to execute around the miss if there are other instructions that don't depend on it.
A software-managed TLB involves switching contexts and executing instructions in a TLB miss handler (the fetching of which could cause cache misses too), then switching back to the instruction that was interrupted. Compare that to just internally dispatching a memory read or two more, and you'll probably see why soft TLBs seem to have fallen out of favour; even if context switches could be done with no overhead, that extra cost of fetching, decoding, and executing instructions can't be recovered. (As that old saying goes, "The fastest way to do something is to not do it at all.")
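For contrast, a minimal sketch of the software-managed case, again in Python. Everything here is hypothetical (the `SoftTLBCPU` class, the flat `page_table` dict, the `traps` counter); it only mirrors the sequence described above: the hardware raises an exception, a handler made of ordinary instructions refills the TLB, and the interrupted access is restarted.

```python
class TLBMiss(Exception):
    """Raised by the toy hardware when a translation is not in the TLB."""

    def __init__(self, vpn):
        super().__init__(f"TLB miss on page {vpn:#x}")
        self.vpn = vpn


class SoftTLBCPU:
    """Hypothetical CPU whose TLB is refilled by an OS trap handler."""

    def __init__(self, page_table):
        self.page_table = page_table   # OS-owned: virtual page -> frame
        self.tlb = {}
        self.traps = 0                 # times the refill handler ran

    def _lookup(self, vaddr):
        vpn = vaddr >> 12
        if vpn not in self.tlb:
            raise TLBMiss(vpn)         # hardware does nothing but trap
        return (self.tlb[vpn] << 12) | (vaddr & 0xFFF)

    def _refill_handler(self, vpn):
        # Stands in for OS code that must be fetched, decoded, and executed.
        # A KeyError here would correspond to a genuine page fault.
        self.traps += 1
        self.tlb[vpn] = self.page_table[vpn]

    def load(self, vaddr):
        while True:
            try:
                return self._lookup(vaddr)
            except TLBMiss as miss:
                self._refill_handler(miss.vpn)
                # Loop around: restart the interrupted access.


cpu = SoftTLBCPU({0x5: 0x80})
first = cpu.load(0x5123)    # traps once, then the retry succeeds
second = cpu.load(0x5456)   # same page: TLB hit, no trap
```

The point of the sketch is the control flow: every miss takes a detour through handler code before the original access can complete, whereas a hardware walker stays inside the memory pipeline.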
Looking a bit more into it, MIPS is the most widely-used CPU that still has a "soft TLB". The other popular RISC, ARM, is automatic like x86.
Edit: sorry, I'm totally wrong. Now I am wondering what the case I described is called. It is the event when the page table walker is invoked.