Most software I've seen can be dramatically improved by removing exception handlers. (And then fixing the underlying problems).
On the server side I prefer a single exception handler: the OS. It will kill your process and release all memory, file handles, etc. Then I wrap the process in a bash script or other monitoring tool and have it restart immediately whenever it fails.
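The loop described above can be sketched in a few lines of Python (the original comment used bash; the `supervise` name and the restart cap are illustrative assumptions, not part of any real tool):

```python
import subprocess
import sys

# Crash-only supervision: run the server as a child process and restart
# it whenever it exits abnormally, letting the OS reclaim its memory and
# file handles each time.  The restart cap keeps the sketch finite.
def supervise(cmd, max_restarts=3):
    restarts = 0
    while restarts < max_restarts:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            break  # clean shutdown: don't restart
        restarts += 1
        print(f"process died (exit {result.returncode}); restarting",
              file=sys.stderr)
    return restarts

if __name__ == "__main__":
    # A child that always crashes, to exercise the restart loop.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"]))  # 3
```

In production you'd reach for systemd, runit, or a similar process supervisor rather than hand-rolling this, but the principle is the same.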
As for the server itself, I'll also add an automatic reboot to ensure nothing out of the ordinary persists for very long (like that PATH modification someone forgot to add in the appropriate place to get set on restart). If it's virtualized, I prefer to destroy and rebuild the entire server if possible.
One advantage of forking servers: kill and reboot the parent, and you don't lose in-flight connections.
That said, I do this as well; even the best-behaved daemon can get... funky... after a few months. Planned outages for a daemon restart are OK in my experience, particularly if you can fail over to other nodes as part of a rolling restart.
Of course, this refers to planned restarts, though forking servers helps with unplanned exceptions as well.
Don't you find performance suffers? AIUI this approach means you can only handle as many concurrent requests as you have processes, and the OS scheduler has less information to work with than if you were using threads.
Not typically: using forking daemons doesn't mean you can't also use threads. The ideal model probably uses both, so processes can be pinned to a processor while still using a thread per request. It's nice to have a library that abstracts the implementation details away for you, but it isn't necessary.
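A minimal sketch of that hybrid in Python, assuming a pool of worker processes that each serve their requests with a thread pool (`handle_request` and the batch shapes are illustrative stand-ins, not a real server):

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

# Illustrative per-request work.
def handle_request(req):
    return req * 2

def worker(batch):
    # Inside one process: a thread per request (capped by the pool).
    with ThreadPoolExecutor(max_workers=4) as threads:
        return list(threads.map(handle_request, batch))

if __name__ == "__main__":
    # A small pool of processes, which the OS scheduler can spread
    # across cores; each process runs its own threads.
    with Pool(processes=2) as procs:
        print(procs.map(worker, [[1, 2], [3, 4]]))  # [[2, 4], [6, 8]]
```

If one worker process dies, only the requests inside that process are affected; the other processes and their threads carry on.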
Right, but doesn't the error handling approach you describe mean allowing a whole process to fail whenever an error condition occurs, which would cause any requests that were being handled by other threads of that process to fail even though they were perfectly valid?
An exception in one thread shouldn't affect others - why would it?
I agree that unstructured use of threading primitives leaves you with an unpredictable system, but it's possible to build safer, higher-level abstractions and use those.
Because fundamentally the only reason to use threads rather than fork is to share memory. If an exception leaves shared memory in an undefined state then all threads that share that memory are in undefined states.
True, but why would an exception ever leave memory in an undefined state? I can imagine it corrupting a thread's own stack in some languages, but you wouldn't use a thread's stack for shared memory (at least, not without some construct that told other threads when it was safe to access it).
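One concrete way it happens, with no stack corruption involved: an exception thrown between the two halves of an invariant-preserving update. A sketch in Python (the `ledger`/`transfer` names and the simulated bug are illustrative):

```python
import threading

# Shared state whose invariant is debits == credits.
ledger = {"debits": 0, "credits": 0}
lock = threading.Lock()

def transfer(amount, fail=False):
    with lock:
        ledger["debits"] += amount
        if fail:
            # A bug fires between the two halves of the update.  The
            # `with` block dutifully releases the lock, so nothing
            # deadlocks -- but the invariant is now broken.
            raise RuntimeError("simulated bug")
        ledger["credits"] += amount

t = threading.Thread(target=transfer, args=(10, True))
t.start()
t.join()  # the exception dies with this thread; the process lives on

# Every other thread now sees debits != credits.
print(ledger)  # {'debits': 10, 'credits': 0}
```

The memory is perfectly well-defined at the language level; it's the application-level invariant that's broken, and every thread sharing that memory inherits the breakage.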
It's more of a question of what the semantics of failed requests are than concurrent clients.
The key is to segment your system so that processing is orthogonal to the implementation details of the networking protocol on a given system. Just because a particular OS drops TCP/IP connections when a process exits does not mean that every crash of your process has to drop client connections.
In the case of a webserver you can use something like mongrel2 or nginx to map physical connections to backend processes, so that a process restart doesn't mean a dropped connection or a failed request.
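With nginx, a sketch of that arrangement looks roughly like this (the directives are real nginx configuration; the ports and upstream name are assumptions):

```nginx
# nginx holds the client connection and proxies to backend workers.
upstream backend {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # If a worker dies mid-request, try the next worker instead of
        # failing the client connection.
        proxy_next_upstream error timeout;
    }
}
```

Note that by default nginx only retries requests it considers safe to replay; non-idempotent requests need explicit opt-in.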
Forcing your machines to reboot early and often makes you think about and deal with these problems rather than simply delaying them until one of your nodes dies and takes out a bunch of client connections anyway.
Fail-fast is one of the philosophies I love in Erlang. By using gen_server and supervisor, it's easy to write fail-fast software that works.
In most software I see, exception handlers are a myriad of poor implementations of "log a message, clean up state, and try/fail again" rather than the actual handling of a specific exceptional state. Many of the most frustrating bugs I've seen are introduced when an exception handler fails to clean up state and tries again with some kind of insane context or a leaked handle.
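The "log and try again without cleaning up" anti-pattern can be made concrete with a small Python sketch (`fragile_parse`, the `leaked` list, and the retry loop are hypothetical, written only to make the leak visible):

```python
import tempfile

leaked = []

def fragile_parse(path):
    f = open(path)       # resource acquired...
    leaked.append(f)     # (tracked here only so the leak is observable)
    raise ValueError("bad input")  # ...but never released

path = tempfile.NamedTemporaryFile(delete=False).name

for attempt in range(3):
    try:
        fragile_parse(path)
    except ValueError:
        pass  # "handled": no cleanup, just try again

print(len(leaked))  # 3 file objects, all still open
```

Crashing the process on the first failure would have cost one request and zero leaked handles; three "handled" retries cost three handles and still no successful request.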