Most software I've seen can be dramatically improved by removing exception handlers. (And then fixing the underlying problems).
On the server side I prefer a single exception handler: the OS. It will kill your process and release all memory, file handles, etc. Then I wrap the process in a bash script or other monitoring tool and have it restart immediately whenever it fails.
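The loop described above can be sketched in a few lines of Python (the original comment used bash; the `supervise` name and the restart cap are illustrative assumptions, not part of any real tool):

```python
import subprocess
import sys

# Crash-only supervision: run the server as a child process and restart
# it whenever it exits abnormally, letting the OS reclaim its memory and
# file handles each time.  The restart cap keeps the sketch finite.
def supervise(cmd, max_restarts=3):
    restarts = 0
    while restarts < max_restarts:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            break  # clean shutdown: don't restart
        restarts += 1
        print(f"process died (exit {result.returncode}); restarting",
              file=sys.stderr)
    return restarts

if __name__ == "__main__":
    # A child that always crashes, to exercise the restart loop.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"]))  # 3
```

In production you'd reach for systemd, runit, or a similar process supervisor rather than hand-rolling this, but the principle is the same.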
As for the server itself, I'll also add an automatic reboot to ensure nothing out of the ordinary persists for very long (like that PATH modification someone forgot to add in the appropriate place to get set on restart). If it's virtualized, I prefer to destroy and rebuild the entire server if possible.
One advantage of forking servers: kill and reboot the parent, and you don't lose in-flight connections.
That said, I do this as well; even the best-behaved daemon can get... funky... after a few months. Planned outages for a daemon restart are OK in my experience, particularly if you can fail over to other nodes as part of a rolling restart.
Of course, this refers to planned restarts, though forking servers helps with unplanned exceptions as well.
Don't you find performance suffers? AIUI this approach means you can only handle as many concurrent requests as you have processes, and the OS scheduler has less information to work with than if you were using threads.
Not typically: using forking daemons doesn't mean you can't also use threads. The ideal model probably uses both, so processes can be pinned to a processor while still using a thread per request. It's nice to have a library that abstracts the implementation details away for you, but it isn't necessary.
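A minimal sketch of that hybrid in Python, assuming a pool of worker processes that each serve their requests with a thread pool (`handle_request` and the batch shapes are illustrative stand-ins, not a real server):

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

# Illustrative per-request work.
def handle_request(req):
    return req * 2

def worker(batch):
    # Inside one process: a thread per request (capped by the pool).
    with ThreadPoolExecutor(max_workers=4) as threads:
        return list(threads.map(handle_request, batch))

if __name__ == "__main__":
    # A small pool of processes, which the OS scheduler can spread
    # across cores; each process runs its own threads.
    with Pool(processes=2) as procs:
        print(procs.map(worker, [[1, 2], [3, 4]]))  # [[2, 4], [6, 8]]
```

If one worker process dies, only the requests inside that process are affected; the other processes and their threads carry on.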
Right, but doesn't the error handling approach you describe mean allowing a whole process to fail whenever an error condition occurs, which would cause any requests that were being handled by other threads of that process to fail even though they were perfectly valid?
An exception in one thread shouldn't affect others - why would it?
I agree that unstructured use of threading primitives leaves you with an unpredictable system, but it's possible to build safer, higher-level abstractions and use those.
Because fundamentally the only reason to use threads rather than fork is to share memory. If an exception leaves shared memory in an undefined state then all threads that share that memory are in undefined states.
True, but why would an exception ever leave memory in an undefined state? I can imagine it corrupting a thread's own stack in some languages, but you wouldn't use a thread's stack for shared memory (at least, not without some construct that told other threads when it was safe to access it).
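One concrete way it happens, with no stack corruption involved: an exception thrown between the two halves of an invariant-preserving update. A sketch in Python (the `ledger`/`transfer` names and the simulated bug are illustrative):

```python
import threading

# Shared state whose invariant is debits == credits.
ledger = {"debits": 0, "credits": 0}
lock = threading.Lock()

def transfer(amount, fail=False):
    with lock:
        ledger["debits"] += amount
        if fail:
            # A bug fires between the two halves of the update.  The
            # `with` block dutifully releases the lock, so nothing
            # deadlocks -- but the invariant is now broken.
            raise RuntimeError("simulated bug")
        ledger["credits"] += amount

t = threading.Thread(target=transfer, args=(10, True))
t.start()
t.join()  # the exception dies with this thread; the process lives on

# Every other thread now sees debits != credits.
print(ledger)  # {'debits': 10, 'credits': 0}
```

The memory is perfectly well-defined at the language level; it's the application-level invariant that's broken, and every thread sharing that memory inherits the breakage.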
It's more of a question of what the semantics of failed requests are than concurrent clients.
The key is to segment your system so that processing is orthogonal to the implementation details of the networking protocol on a given system. Just because a particular OS drops TCP/IP connections when a process exits does not mean that every crash of your process has to drop client connections.
In the case of a webserver you can use something like mongrel2 or nginx to map physical connections to backend processes, so that a process restart doesn't mean a dropped connection or a failed request.
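With nginx, a sketch of that arrangement looks roughly like this (the directives are real nginx configuration; the ports and upstream name are assumptions):

```nginx
# nginx holds the client connection and proxies to backend workers.
upstream backend {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # If a worker dies mid-request, try the next worker instead of
        # failing the client connection.
        proxy_next_upstream error timeout;
    }
}
```

Note that by default nginx only retries requests it considers safe to replay; non-idempotent requests need explicit opt-in.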
Forcing your machines to reboot early and often makes you think about and deal with these problems rather than simply delaying them until one of your nodes dies and takes out a bunch of client connections anyway.
Fail-fast is one of the philosophies I love in Erlang. By using gen_server and supervisor, it's easy to write fail-fast software that works.
In most software I see, exception handlers are a myriad of poor implementations of "log a message, clean up state, and try/fail again" rather than the actual handling of a specific exceptional state. Many of the most frustrating bugs I've seen are introduced when an exception handler fails to clean up state and tries again with some kind of insane context or a leaked handle.
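The "log and try again without cleaning up" anti-pattern can be made concrete with a small Python sketch (`fragile_parse`, the `leaked` list, and the retry loop are hypothetical, written only to make the leak visible):

```python
import tempfile

leaked = []

def fragile_parse(path):
    f = open(path)       # resource acquired...
    leaked.append(f)     # (tracked here only so the leak is observable)
    raise ValueError("bad input")  # ...but never released

path = tempfile.NamedTemporaryFile(delete=False).name

for attempt in range(3):
    try:
        fragile_parse(path)
    except ValueError:
        pass  # "handled": no cleanup, just try again

print(len(leaked))  # 3 file objects, all still open
```

Crashing the process on the first failure would have cost one request and zero leaked handles; three "handled" retries cost three handles and still no successful request.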