Callbacks can be avoided using co-routines.
I started a project that uses asynchronous sockets but makes them appear synchronous. With a bit of abstraction, it's really easy to develop network applications, even in C:
I wrote my own user-space threading library called libwire, mostly because I dislike the malloc-everywhere approach that is so common. The tradeoffs are different and the code is more verbose at times, but I like that there are no mallocs in the code, at least no explicit ones. I do provide a memory pool, so it does allocate memory, but the allocation is bounded in size.
Your approach looks very interesting - the examples are definitely much easier to read and understand than most async C code I have seen. How does it work internally?
All of the different coroutine/user-space-threading libraries work by switching the stack and the instruction pointer when the code is about to block on an async system call. So a coroutine that finds a read from a socket would block (by getting EWOULDBLOCK in errno) registers to wait for that fd to become readable and switches to another coroutine. There is always one coroutine that gets scheduled from time to time and uses select/poll/epoll to find all the fds that have data on them; it then wakes up and schedules the coroutines waiting on each of those fds.
It's a very neat concept that allows for cleaner code (sequential rather than callback hell) and lets one or a few threads handle many connections, without the overhead of thread-per-socket.
Isn't it effectively thread-per-socket, except that the switches now only happen on an async system call? The difference, I guess, is that you can optimize the context switch to be very small (and use smaller thread stacks) at the cost of giving up pre-emptive switching.
It reduces context switches in the kernel, which are more expensive, and reduces the memory needed for kernel threads, whose stacks are larger.
There are also some hidden gains in terms of TLB caches and the other costs that kernel thread switching incurs.
An additional advantage is that user-space threads have fewer locking problems: they implicitly lock each other out between context-switch points, so you only need a lock when you have to protect state across several context-switch points.
Once the stack is allocated for a thread, a context switch is almost as cheap as a function call.
RiNOO has an event-driven scheduler, based on epoll, which resumes/releases these user-space threads (which I call tasks) according to pending IOs.
The library provides IO functions (read, write...) which use the RiNOO scheduler.
As a bonus, real threading is quite easy: you just need to run a scheduler per thread (see the multi-threading examples).
https://github.com/reginaldl/librinoo