Dyad: Minimal, portable async networking library for C (github.com/rxi)
97 points by api on Aug 20, 2014 | 46 comments


It uses select, which is notorious for poor performance (although it is relatively portable). Try not to use this project for anything high-load.


And anything that requires more than 1024 fds...


You can increase the FD_SETSIZE define if you need more than 1024 fds, but then performance gets very poor.


How are you supposed to do socket programming without select?


poll is fairly portable, but still has scaling problems. To get beyond scaling problems, you usually need something OS-specific, such as epoll (Linux), kqueue (BSDs of some kind), etc. One of the things I'd expect of a cross-platform networking library would be to use the best available abstraction, especially on major platforms such as Linux, and perhaps fall back to select on odd platforms. Of course, your docs need to point out what the behavior is.

The problem with poll/select is that you need to pass (transfer) to the kernel the entire list of events you're interested in, just to wait for a single event. Each time you re-enter your event loop you need to do this, and if you have thousands of connections, it will break down, because poll is O(number of connections).

epoll, for example, works around this by having an "epoll FD", which on the kernel side contains all the events you're interested in. You wait by just waiting on that epoll fd with epoll_wait, but you don't specify the events you're interested in when you wait: you do that ahead of time, with other system calls. This allows you to change the event list only when you need to, which is much less frequent than some data arrived from somewhere. The API is supposed to be O(1), instead of O(number of connections) per wait call.

My understanding is that kqueue works similarly, but I'm a Linux guy, so I can't really tell you.

select also has other problems w.r.t. FDs with high numbers.


yeah...epoll and kqueue in my experience are easily interchangeable. I built a server on FreeBSD and the port to Linux was straightforward. My first event-based socket usage was on Windows NT around 1999. When we ported the server parts to Linux, replacing with epoll was also straightforward.


Plus kqueue gives you ways to wait for other events (timeouts, signals, etc.), and epoll plus other Linux-specific calls does too. This simplifies your code a lot.


libev - http://software.schmorp.de/pkg/libev.html - is an existing library that provides async networking IO over a number of backends, including (I think) kqueue, epoll and, in the worst case, select.


this competes with libevent, libev, and libuv... all of which use the best method for the platform where it's installed.. so kqueue on BSD, epoll on Linux, etc.

That's one of the big reasons to use a lib for this.. so you get the best performance, without having to change your code to get it (or bother detecting which is best, etc).


I think the only viable (in terms of portability) alternative is poll. There's a pretty good comparison of the two written by the author of cURL [0]. Essentially poll is the same speed and doesn't have a hard-coded FD limit, but it runs on fewer platforms.

[0] http://daniel.haxx.se/docs/poll-vs-select.html


I sent patches in for curl about 10 years ago to switch from select to poll to get around the 1024 FD limit (which was a problem for multi-threaded servers that handled many sockets).


I like epoll over poll because you don't need a central point in your application that knows about all FDs; each component can manage its own FD registration with the OS.


> http://linux.die.net/man/7/socket

create a socket, bind, listen, connect, send etc.

Why do you need select?


You need select or a select-like function to know which sockets are readable/writable without tying up one thread per socket


I know what select/epoll and friends are for.

What's wrong with tying up one thread per socket?


"Context switching is expensive. My rule of thumb is that it'll cost you about 30µs of CPU overhead. This seems to be a good worst-case approximation. Applications that create too many threads that are constantly fighting for CPU time (such as Apache's HTTPd or many Java applications) can waste considerable amounts of CPU cycles just to switch back and forth between different threads."

http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-ma...

On 32-bit systems, you can also easily run out of address space for your thread stacks.


I don't trust those measurements. They never fully explored what is due to cache effects and what is due to context switching. You'd have heavy cache effects if you switched data between client contexts after select returns, too. So just because spinning on a futex shows a max of 30µs wasted CPU doesn't mean you won't waste that with select as well.

Also, in the case of an IO-bound thread, they are not just spinning on a futex aimlessly. There is a different mechanism, so they should really benchmark with a more characteristic workload.

Speaking of characteristic workloads, they should probably also have measured on a tickless kernel, since I saw they complained about time quanta and HZ=100. Recent kernels are tickless, so they'll behave differently (might even be worse).

> On 32-bit systems, you can also easily run out of address space for your thread stacks.

Well don't run large servers with so many threads on 32 bit systems ;-) Many database vendors don't even package for or support 32 bit versions of Linux.

Sorry, I haven't bought into the whole "async is always better" trend. Some (ex?) Senior Google engineer (Paul Tyma) agrees with me:

http://www.mailinator.com/tymaPaulMultithreaded.pdf

The async/select pattern is usually good where there is very little business logic: a router, proxy, simple web server and so on. In a large application, having a giant dispatch call at its center with callbacks branching out is not a healthy pattern.


In those cases I like coroutines/user-space threading. It gives you the reduced cost of having a single thread (or a few) without the heavy toll of callbacks.


Just to expand on the timing and cost argument:

When you have 10,000 tasks and about 8 cores (give or take a few), the number of context switches is very large. Switching in the kernel will happen mostly at the system call boundary of blocking IOs, and requires the scheduler to make a decision on what thread to wake up next and then change the running process.

This can be seen in function context_switch in https://github.com/torvalds/linux/blob/master/kernel/sched/c... without the arch-dependent components, and can hardly be compared in complexity and effort to switching between 4 and 8 registers in user-space.

The above still doesn't include any changes to the TLB and memory-protection tables, as I assume the OS optimizes those away when switching between two threads of the same program (an optimization I'm not sure actually happens).


Probably nothing (setting aside performance issues when you run into thousands of parallel connections).

I personally prefer the old-school async approach, because there you are forced to explicitly manage your connections' state, and the application/process-wide data access is inherently race-condition free. I'd use this as far as possible.

If you let your OS schedule threads, you obviously have to be careful that shared data is correctly locked or only changed atomically, but you get parallelism (especially for CPU-heavy tasks) for free. If you are used to doing these chores (I'm not), perfect! And your connection's state (or the state of required computations) can be arbitrarily complex (ugly?) and still quite elegantly hidden in your thread's stack.

So, I don't see that one approach is better than the other. For me the extremes are probably clear in favor of one or the other, with a large grey area in between.


Using async to avoid race conditions due to multiple threads was great 15-20 years ago when single CPU machines ruled, but now you need multiple threads or processes for concurrency. Using multiple processes is usually not an option due to lack of any shared state (and if you're trying to share state across processes you should probably just use threads).


In addition to the problems discussed in the other replies, it's (nearly?) impossible to correctly cancel an operation while blocked on a socket.


Have you tried:

* closing the socket (it would be from another, monitoring thread in this case)

* setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO)

* Would that be different from blocking on select with an infinite timeout? How do you cancel that? Or are you relying on other sockets getting a constant stream of data to wake you up?

* do something with an ALARM signal


Closing the socket results in a race condition. You might close the socket, and then another thread opens a new file or socket and happens to get the same fd as your socket used to hold. Now your reader reads from some random unaffiliated fd it shouldn't be touching, causing all sorts of havoc.

A timeout will work fine, but now you're polling, meaning you have an unpleasant tradeoff between efficiency and how long it takes for your thread to notice that it's dead.

Canceling select or any other multi-fd call is really easy. Create a pipe and add it to your fd set. Any time you want the thread to wake up (e.g. because you need to tell it that you're canceling something) you just write to the pipe.

Signals have a similar race condition as closing the socket. If the signal is delivered after you check for cancellation but before you enter the system call, you'll hang.


> Closing the socket results in a race condition.

That is true. To go more in-depth, you'd do shutdown first. But I think you have to be connected for that.

> Canceling select or any other multi-fd call is really easy. Create a pipe and add it to your fd set. Any time you want the thread to wake up (e.g. because you need to tell it that you're canceling something) you just write to the pipe.

That's a good way, agreed. But I would still use a select with 2 file descriptors per thread: one fd for the pipe and one for the socket itself. Each thread handles its own request and processing as needed, without having one global dispatch in the whole application. The pipe is exposed to the outside in case shutdown needs to be triggered (from another thread).


2 fds per thread works well. Unfortunately, this means you hit select's performance problems twice over, since the performance scales with the maximum fd you pass it, not the number of fds. But as long as that's OK for what you're doing, it's a nice way to arrange things.


poll, epoll, kqueue, sigio and other options depending on OS, etc.


Recommended additions to the docs / marketing material:

"portable" -- portable to what?

comparisons -- vs. libevent, libuv, etc.

in the sample program -- when will dyad_getStreamCount() drop to zero? will it ever? what exactly is a "stream" anyway and how does it relate to individual connections?


Here is a libevent echo server for comparison:

http://www.wangafu.net/~nickm/libevent-book/Ref8_listener.ht...

It looks like this library has only 2 weeks' worth of commits, so I'll reserve any criticism. My vote is still with libevent though!


dyad_writef is certainly not 'async' when using the 'r' input specifier (which seems to redirect input from a FILE* to the output stream). And the way dyad_writef achieves 'async' is to buffer the entire contents of the output in RAM.

So for 'r', why bother using the FILE* formatted IO if all you're going to do is pull single characters at a time? You can just use file descriptors.


Is it just me that wants to have complete control over stuff? I mean if you really have the need to use a library like libevent you most likely know why and could do your own handling rather easily.

I'm not saying there's no need for libraries like libevent, but IMO they're not needed for most applications. I might be biased because I want complete control and to know what happens; I think that's important. I don't want to use a library before I know what happens in the "background". When I know that and know what I need, only then might it be appropriate to use a library.


I'm curious what you mean by "complete control"? Does that mean less abstraction? Or do you want to avoid inversion of control? That is, you want to avoid using callback-based libraries?


For me, libevent just provides the boilerplate I'd otherwise be writing too often for fd registration and timeouts.

I could do it myself, but it would just end up looking more and more like libevent each time I did it.


A friend of mine wrote something similar for C++: https://github.com/Kosta-Github/http-cpp


Is it called Dyad for the reason I think it is called Dyad? ;-)


Very neat! Although after using C#'s TAP, callbacks sound like hell :)


Callbacks can be avoided using co-routines. I started a project that uses asynchronous sockets but they appear synchronous. With a bit of abstraction, it's really easy to develop network applications even in C:

https://github.com/reginaldl/librinoo


Didn't find this one before.

I wrote my own user-space-threading library called libwire, mostly out of dislike for the malloc-everywhere approach that is so common. The tradeoffs are different and the code is more verbose at times, but I like the fact that there are no mallocs in the code, at least not explicit ones. I do provide a memory pool, so it allocates memory, but it is bounded in size.

libwire: https://github.com/baruch/libwire

list of coroutine/user-space-threading libraries: https://github.com/baruch/libwire/wiki/Other-coroutine-libra...


Your approach looks very interesting - the examples are definitely much easier to read and understand over most async C code I have seen. How does it work internally?


All of the different coroutine/user-space-thread libraries work by switching the stack and the instruction pointer when the code is about to block on an async system call. So one coroutine would find that reading from the socket would block (by getting EWOULDBLOCK in errno), register to wait for this fd to become readable, and switch to another coroutine. There is always one coroutine that gets scheduled from time to time and uses select/poll/epoll to get all the fds that have data ready, then wakes up and schedules the coroutines that are waiting on each of these fds.

It's a very neat concept that allows for cleaner code (sequential rather than callback-hell) and use one or a few threads for many actions but without the overhead of thread-per-socket.


Isn't it effectively thread-per-socket except that the switches now only happen on an async system call? The difference, I guess, is you can optimize the context switch to be very small (and have smaller thread stacks) at the cost of giving up pre-emptive switching.


It reduces context switches in the kernel, which are more expensive, and reduces the memory resources needed for kernel threads, which are larger.

There are also some hidden gains in terms of TLB caches and other costs of kernel threads switching.

An additional advantage is that between user-space threads you have fewer locking problems, since they implicitly lock each other out between context-switch points; you only need locks to protect an area that spans several context-switch points.


It uses "user-space threads" similar to ucontext (swapcontext(3)). I have my own version: https://github.com/reginaldl/fcontext

Once the stack is allocated for a thread, context switches are almost as cheap as a function call.

RiNOO has an event-driven scheduler, based on epoll, which resumes/releases these user-space threads (which I call tasks) according to pending IOs. The library provides IO functions (read, write...) which use the RiNOO scheduler.

As a bonus, real threading is quite easy: you just need to run a scheduler per thread (see the examples with multi-threading).


That's really interesting - thanks.


Does not seem like it supports SSL sockets... or did I miss it?


There is no such thing as "ssl socket".



