Why is IRC distributed across multiple servers? (gist.github.com)
134 points by rain1 on Sept 12, 2021 | 76 comments


Just remember that "netsplits" exist in every distributed system, be it a chat app or a database. It's just the CAP theorem. IRC has chosen to sacrifice C (consistency).

The only thing that has changed in modern times is that P (partitions) is extremely rare in modern high-octane cloud infrastructure. Also, modern solutions often decide to sacrifice A (availability) by returning an error saying "we're aware of the problem and we're working on a solution". This is what happened quite recently when Google authentication went down and half of the internet went dark, while under the hood they had a simple out-of-quota situation on one of the replicas of their core authentication systems. The system was programmed to sacrifice A (availability) and reject all authentication requests.


> IRC has chosen to sacrifice C (consistency)

Hm? Hasn't it sacrificed partition tolerance? A netsplit is a partition.


It tolerates partitions just fine; I've been through many netsplits where folks just kept talking on our side of the split until the network healed.

Partition tolerance doesn't mean partitions don't affect the system, or that they can't happen. It just means the system has to choose whether to become unavailable or inconsistent (since it can't have both in the presence of a partition). IRC chooses to remain available, at the cost of losing messages for people on the wrong side of the split.


IRC netsplits are a great example of what the "split brain" problem looks like from the inside.


In CAP, the P happens whether you like it or not, and you get to choose between C-but-not-A or A-but-not-C.

IRC is an AP system. It stays up (+A) in a netsplit (+P) but the resulting servers are not consistent.


If it had sacrificed P, IRC would stop working in case of a netsplit. Instead it keeps working in an inconsistent state.


No, "stop working" is Availability.


As has been pointed out, in the real world, the CAP theorem comes down to "do I choose to offer service to everyone in the presence of a partition?"

Consider a replicated database. A replica that's partitioned from the others cannot receive or propagate updates and will become inconsistent with those on the other side of the partition.

If you allow that node to stay up, you're sacrificing consistency for availability; and you're "AP".

If you force that node down, you're sacrificing availability for consistency; and you're "CP".

In principle, you can choose "CA". That's equivalent to saying "I choose to offer no service at all in the presence of a partition", so that's kinda sorta strictly worse for most workloads than "CP" and therefore uncommon in practice.


> "netsplits" exist in every distributed system, be it a chat app or a database, it's just the CAP theorem

Well, let's not get carried away. Network partitions happen everywhere, but everything is not about the CAP theorem. The CAP theorem is a very specific model that a lot of apps (even ACID databases) don't conform to. Comparing IRC to the CAP theorem is like comparing it to ACID and saying, "IRC decided to sacrifice transaction integrity".

IRC didn't explicitly sacrifice the C in CAP, they designed a simple server protocol. They could have added a bunch of weirdness to hide splits from users, but it would have been unnecessarily complicated and not contributed significantly to the user experience.


I'm sorry but I don't think you realise how simple and fundamental the CAP theorem is. It's almost a tautology. And yes it applies fully.

The most basic case is if there's absolutely no method of exchanging information from point A to point B. Then agents at A and B will not be able to communicate. That's it. Any system built to facilitate information exchange will either have to deliver incomplete information (sacrificing C) or will have to refuse to operate (sacrificing A).

Now then, as I said, nowadays it's extremely unlikely that there's truly no connection between any two major Internet hubs (though it can happen, hello BGP). It still happens in specific systems that do not work on any method of information transfer but rather on specific methods of information transfer. The IRC example requires specific servers to be up, not just a functioning IP routing between the end clients. If some server is not up then (at least temporarily) from IRC's point of view there's no way to deliver information from A to B. The Google auth outage example requires (among most likely many other things) disk space availability on specific servers for information exchange to happen.


> I don't think you realise how simple and fundamental the CAP theorem is

May I recommend reading "A Critique of the CAP Theorem" by Martin Kleppmann, available as a PDF here: https://arxiv.org/abs/1509.05393

As that paper points out, your definition of CAP theorem is simplified and incomplete to the point of being wrong, as many are.

As it also points out, the CAP theorem doesn't really account for eventual consistency well.

I would argue that a chat protocol is a good place to perform eventual consistency, and those tradeoffs work well. During network partitions, have both sides of the partition continue to accept messages. Have the client mark messages with random unique IDs, and have each server mark messages with a server timestamp. The well-defined merge operation is now to sort by server-time and dedupe by message ID, such that if a message is sent to two servers it only displays once.
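
As a minimal sketch of that merge (the Message shape and field names here are just for illustration):

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Message:
      msg_id: str       # random unique ID assigned by the sending client
      server_ts: float  # timestamp assigned by whichever server accepted it
      text: str

  def merge_logs(side_a, side_b):
      # Sort by server time and drop duplicate IDs, so a message that was
      # delivered to both sides of the split shows up only once after the heal.
      seen, merged = set(), []
      for msg in sorted(side_a + side_b, key=lambda m: m.server_ts):
          if msg.msg_id not in seen:
              seen.add(msg.msg_id)
              merged.append(msg)
      return merged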

This doesn't work for IRC traditionally, since messages do not have unique IDs, and so no merge operation can deduplicate them, and servers do not store messages during netsplits (or at any time really), so they cannot be re-sent.

However, a similar system exists for other chat systems. Matrix is a federated system of multiple servers, and when partitions occur, each server will still accept new messages; later those messages will be made available to other servers and merged in at the appropriate time.

I think that the CAP theorem's results are less interesting if you consider application-level resolutions to network issues (i.e. eventual consistency), and, as I believe the paper also implies, trotting it out constantly when talking about practical systems gets old fast.


If you can always merge reordered edits/messages then CAP does not apply because you don't need C (as defined in CAP), you may instead talk about partitions/connectivity issues as if they were some anomalous sources of large latencies in the system. You have your own, different definition of C. There are some very very large scale systems out there that work under the assumption that any edits can arrive reordered, and it's OK for the observable properties of the system.

Here's what's "inconsistent" in an eventually consistent chat app: your typed responses might have been different had you seen in time what the other party has to say. To some degree the "computation" happens in your head. A "fully consistent" / "fully synchronous" chat app would sometimes refuse to send a message because the other party might have said something in the meantime. Like you'd expect from a fully-synchronous bank account balance handling system that wants to keep >= 0 balance at all times, rejecting overdraft transactions.

(And I agree that this is completely acceptable behavior for a chat app; we as people are built to tolerate this kind of a problem in async person to person communication; just pointing out what does C in CAP exactly mean; the "fully synchronous" chat app would be just an occasional pain in the ass with little benefit)


> As that paper points out, your definition of CAP theorem is simplified and incomplete to the point of being wrong, as many are.

How so? Grandparent's statement conveys something true and useful, as far as I can see.

> As it also point out, CAP theorem doesn't really account for eventual consistency well.

It doesn't need to. CAP will tell you that some responses are inconsistent, and that remains true and important in an eventually consistent system.

> I would argue that a chat protocol is a good place to perform eventual consistency, and those tradeoffs work well. During network partitions, have both sides of the partition continue to accept messages. Have the client mark messages with random unique IDs, and have each server mark messages with a server timestamp. The well-defined merge operation is now to sort by server-time and dedupe by message ID, such that if a message is sent to two servers it only displays once.

Messages aren't the issue, things like operator permissions and channel takeovers are. It's not as simple as you're making it sound (e.g. a common problem for servers following your algorithm was that someone who got operator permissions on the wrong side of the split would correctly lose them on the merge, but bans that they'd created during the split would stay in place, allowing them to prevent the legitimate operators of a channel from exercising their control).

> I think that CAP theorem's results are less interesting if you consider application-level resolutions to network issues (i.e. eventual consistency), and as I believe the paper also implies trotting it out constantly when talking about practical systems gets old fast.

I too am frustrated that we have to trot it out so often, but for the opposite reason: the CAP theorem should be part of the common baseline that everyone understands, but even that much gets disputed. People constantly want to believe that their new technique (such as eventual consistency) has magically solved all their problems. The fact that the CAP theorem is extremely simplistic is a strength, as it cuts through a lot of obfuscating nonsense; in my experience people who want to dismiss it are usually being naive about the impact that inconsistency will have on their system, just like in your IRC example above.


> As it also point out, CAP theorem doesn't really account for eventual consistency well.

Consistency is the first letter. Systems can be eventually available or eventually non-partitioned too.

You can promote two of the 'eventually's to 'now'. It ejects the other tautologically.


> The most basic case is if there's absolutely no method of exchanging information from point A to point B. Then agents at A and B will not be able to communicate. That's it.

That's not it. The most basic case is if there's no linearizability between A and B. A and B can continue communicating but fail the C in CAP if linearizability fails. Hence we shouldn't compare everything to CAP.


Well, from my historical reading of it, initially, IRC was a federated network of servers that were essentially one network, the way email is one network: there was no shared administration or anything. Anyone could run a server and jump into the network. Due to abuse, servers began restricting who they peered with, and it fractured into multiple networks.

So really, I suspect it was designed to be distributed and federated, and it just became what it is by accident.


Many other services also used to be like this. Think of Usenet, aka news. It is an effective model when you think of the Internet as a network of networks, from when there was a real difference between connecting to your local area network, metropolitan area network or even wide area network.

Actually, we have come quite far from those days, and full-speed point-to-point links between most points are somewhat realistic.


It was never open to attach your server to a network, unlike email. A server connection was way too powerful for that. You needed an existing server admin to allow your server to connect.


I wasn’t there but I have seen multiple histories say that there were servers that accepted connections from anyone (most famously eris.berkeley.edu but not only that one). For example, https://about.psyc.eu/IRC


What would be the abuse issues from open peering? How were we able to solve them for email, but not IRC?


We didn't, email spam exists to this day. The solution has been to ban entire swaths of domains and even IP ranges by chucking all mail from them into spam folders


The problem is mainly that on IRC messages coming from a server connection are assumed to never be lies. A server can do anything from disconnecting people to spoofing messages.

Which is in fact how services such as nickserv operate. If I don't log in to nickserv, nickserv will spoof a nickname change, blocking me from using the registered nickname.

Having server connections be freely available to everyone is worse than giving everyone OPER powers.


> One of the problems of having multiple servers is that netsplits can occur.

In the early/mid 1990s, the IRC servers in Australia would split from the IRC servers in the US all of the time (sometimes Europe would break from the US as well). The Internet connection between the US and Australia was slower and flakier back then. It made lots of sense for Australians to be on Australian IRC servers and Americans to be on US IRC servers, and to all be talking together when the link was working (the majority of the time) and to not be when the link broke (fairly regularly). The CAP theorem says something has to go in those cases, and the thing that went was consistency between US and Australian (or European) messages sent to a channel - the messages from the other side of the split would be dropped during the split.

I don't remember many technical netsplits on Freenode or Libera in recent years, so it is less of a thing now. IRC servers were always federated, so there was the original split of Anet and EFnet, and the Undernet split, then the EFnet/IRCnet split which revolved around those US/Europe/Australia issues. More recently there was the Freenode/Libera split.

IRC's model always worked for me.


Why are Linux distributions hosted on multiple mirror servers that they don't own?

1) money 2) availability 3) trust 4) security

1) If you don't have a lot of money, you take the servers you can get. Donated mirrors means you don't have to pay the bandwidth or hosting bills.

2) If you have multiple servers, it's less likely that one server going down will tank your project. When GitHub, AWS, or even Level3 has an outage, Linux distros keep on chugging like nothing happened. Traditional server maintenance is also easier when everyone can just switch to a different server.

3) Maintainers can use their PGP keys to create signed packages and downloads. Their public keys are distributed on mirrors, as well as embedded in the downloads they've signed. Once downloaded by users, the distribution can verify its own integrity. But how does the user know they started with the real maintainers' public key? The public key is distributed on a hundred geographically-distributed servers all owned by different people; the user can check them all. So other than compromising a maintainer's key, it's logistically impossible to compromise end-user security. (this one is more Linux-specific than IRC-specific)

4) If you only have one server & it gets compromised, it can be hard to tell. By comparing its operating state to the other servers, you can sometimes more quickly identify the compromise. And if you do find a compromise, you can remove the compromised server quickly, close the hole on the other servers, and start regenerating keys. It's an eventuality every large project should be prepared for, and IRC servers do get compromised. Linux mirrors don't matter in this regard, but the build servers etc do matter.

IRC comes from the same time and place, and has some (but not all) of the same considerations.


> via round robin DNS (meaning that when people resolve the DNS it gives them a random server from the set of 20 to connect to)

Most of the time, it's not simple round-robin but also geo-based. This means clients will get the IP addresses of the servers closest to them.


My experience with Freenode/Libera Chat is that they either don't implement geo DNS or don't do a very good job of it. I'm on the US west coast and lookups to irc.libera.chat often return servers in Europe.

Edit: Double-checking Libera Chat's website, I see that they have added regional hostnames, so I guess that's their solution.


If they're using AWS Route 53, your ISP's resolver needs to support EDNS.

Otherwise, your netblock might have been falsely advertised in the DNS provider's GeoIP database (e.g. MaxMind).


They're using Cloudflare. When I resolve them from the east coast, I got a San Francisco server once and a server in Budapest once. They have a server in Toronto, Ashburn, Montreal, and other places that are closer.

I know geodns works here since I use it for some of my own deployments.


Does IRC predate distributed state machines? Why can't the servers sync up the chat via Paxos or Raft?


Paxos was first described in 1989, but not popularized until long after: https://en.m.wikipedia.org/wiki/Paxos_(computer_science)

IRC, 1988: https://en.m.wikipedia.org/wiki/Internet_Relay_Chat

The earliest reference for Raft I can find is 2013.


Distributing state wasn't the goal on IRC, only relaying messages. If you miss a message you miss a message. You can use client-side tools (bots, bouncers, etc) to record state but the protocol itself doesn't care.


IRC is sized to run on '80s hardware. Storing messages wasn't possible because few nodes would have enough memory to store more than a handful of messages. The links were so slow that sending a backlog over on reconnect could stall the connection for a non-trivial amount of time.


Implementing Paxos would mean that stateful operations (like connecting to the server, joining a channel, or changing modes) become impossible on a server, or a group of servers, that have lost quorum.
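
To make that concrete with a toy sketch (nothing from an actual ircd, just the quorum rule a consensus protocol would impose):

  def can_accept_writes(reachable, total):
      # Quorum is a strict majority: the minority side of a split would have
      # to refuse stateful operations until the partition heals.
      return reachable > total // 2

  assert can_accept_writes(3, 5)       # majority side keeps working
  assert not can_accept_writes(2, 5)   # minority side goes unavailable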


Chat is not as complex as other distributed applications, so you probably don't need Raft. Both Paxos and Raft are very complex algorithms to implement.

A CRDT-based, append-only implementation is probably more than enough. Data is never modified, only added/removed, in typical chat workflows.
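
A rough sketch of what that could look like, assuming something 2P-set-shaped (adds and removes are both append-only, so merging replicas after a split is plain set union; the names are made up):

  class ChannelLog:
      def __init__(self):
          self.added = set()    # entry IDs ever added
          self.removed = set()  # tombstones: an ID removed anywhere stays removed

      def add(self, entry_id):
          self.added.add(entry_id)

      def remove(self, entry_id):
          self.removed.add(entry_id)

      def merge(self, other):
          # Union is idempotent, commutative and associative, so replicas
          # converge no matter the order in which they sync after a split.
          self.added |= other.added
          self.removed |= other.removed

      def live(self):
          return self.added - self.removed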

Reading Discord's engineering blog over the years, it looks like scaling the pub/sub for the consumers in large channels is a lot harder than the DB/store itself being distributed.


hard to do Paxos over large geographical distances efficiently, but... it's IRC, so..

I just assume it was from an earlier internet where distributed systems weren't as well understood. I don't think it necessarily predates Paxos but it definitely predates Paxos being a household name.


I can somewhat answer this. Apologies, this became a bit long winded and I have barely touched on several historical, technical and logistical reasons.

Part of the answer is historical and part of it was technical. IRC has been around for a very long time. As such, the earlier versions of the servers and daemons could not accept tens of thousands of client connections (epoll vs. select). The connections between servers are multiplexed and not directly related to the number of people connected to the server. There was also a matter of latency. Servers in a region would keep the messages local to that region, as only people in the same channel get the messages and it was less common to have people in the same channel all over the world. This also changed with time. If there was a split, you lost other regions. This was not always the case, so of course I am over-generalizing, since there were many different IRC networks designed by many different people. Being long-running services, I had seen a great deal of hesitation to re-architect anything on the fly on at least some of the networks, even after epoll and modern hardware made it possible to have tens of thousands of people on one server. Some of the smaller IRC networks indeed consolidated into fewer servers or a single server.

Another facet is logistics and ownership. Many of the bigger networks are comprised of servers owned and managed by different people and organizations. The servers are linked as a matter of trust. That trust can be revoked. Most of the early IRC networks were run by people doing this in their free time with their own money and/or limited resources. In some other cases, organizations prefer to have their own servers so that their own people do not suffer splits for their local communication. There are a myriad of other use-cases and reasons why some organizations had their own servers. Sometimes there was a need to give LocalOps special permissions that would not be permitted network-wide. Despite the technical capability to have fewer servers, some organizations are not going to give up their local nodes.

One issue not mentioned is permission losses on splits. The issue with splits and permission changes has more to do with the way services are integrated into IRC, or more specifically, aren't. Services are treated like bots with higher privilege and most if not all of them were not written to be multi-master. Rather than dealing with moving services around or pushing for read-only daemons, they just lived with the possibility that there would be splits and they would eventually resolve themselves. I personally would have preferred to see a more common integration with OpenLDAP. Some of the IRC daemons can use LDAP, but it is more of an after-thought, or bolt on. This would have allowed splits to occur without losing channel permissions and clients could be configured to quickly attach to another server in another region and that is just DNS management. This could have been further improved by amending or replacing the IRC RFC's to allow SRV records. This may have been done by now for all I know. I shut down my last public server some time ago.

There is a lot more to this than I could sum up on HN. Anyway, today you can fire up an IRCd of your choice on modern hardware and accept tens of thousands if not hundreds of thousands of people on a single server if you wish. It is technically possible. I would still design the network to have multiple servers, as you will eventually hit a bottleneck. If you really want to do this, you will have to de-tune the anti-DDoS countermeasures to allow the thundering herd to join your standby server, or make code changes to permit the thundering herd briefly on fail-over.


People whose only experience is with modern hardware and networks really have a hard time getting the first point. As somebody who started coding around the time IRC was created, hardware and networks are amazingly good compared to what we had at the time.

In the mid-90s, years after IRC was written, I set up a distributed system for the financial traders I was working for. Our between-cities links were 64 kb/s guaranteed and could burst all the way up to 256 kb/s. And those links were not super reliable. These were connecting systems with Pentium processors running at ~90 MHz with ~8 MB of RAM. They did very, very little compared with even the cheapest server slice you can get with AWS.

This is one of those things, like George Washington never knowing about dinosaurs, where it's just hard to comprehend how people thought back in the olden days.


Lots of correct and insightful information here, but I'd like to pick out one specific aspect.

> [...] clients could be configured to quickly attach to another server in another region and that is just DNS management. This could have been further improved by amending or replacing the IRC RFC's to allow SRV records. This may have been done by now for all I know.

To set the stage: Larger IRC networks balance their global servers. A DNS A query for irc.example.com will yield a list of geographically local servers, possibly shuffled on each query as well.

I know of at least one IRC network that refuses to send even the list of all geographically local servers, only sending a subset, as a measure to avoid trivial DDoS attacks if people don't go around collecting the DNS records ahead of time. I'm told that this actually works because the threat actors are not the sophisticated kind.

Incidentally, I have also noted that some networks will shuffle the order of A records for each query because the clients cannot be trusted to select a random DNS response. Considering something this trivial already doesn't work, I dread to imagine how badly a DNS SRV implementation would go wrong, considering it needs both sorting and a weighted random sampling[1] to really work.

[1] https://datatracker.ietf.org/doc/html/rfc2782 page 3 et seq.
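
For what it's worth, the selection RFC 2782 asks clients for looks roughly like this sketch (it glosses over the special handling of weight-0 records, so treat it as illustrative only):

  import random
  from dataclasses import dataclass

  @dataclass
  class Srv:
      priority: int
      weight: int
      port: int
      target: str

  def order_srv(records):
      # Lowest priority group first; within a group, repeatedly pick a record
      # with probability proportional to its weight.
      ordered = []
      for prio in sorted({r.priority for r in records}):
          group = [r for r in records if r.priority == prio]
          while group:
              pick = random.choices(group, weights=[r.weight or 1 for r in group])[0]
              ordered.append(pick)
              group.remove(pick)
      return ordered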


On some operating systems getaddrinfo sorts the DNS response by IPv6 distance, breaking load balancing!

https://access.redhat.com/solutions/22132


This is not exclusive to IPv6; I've seen it on v4 as well. If you've got short DNS TTLs and can return 2-4 records out of a larger pool, that can help, but if your TTLs are longer, you have to consider the handful of recursive DNS servers that serve a large number of users... You want to give them more records to balance that traffic better.

OTOH, current IRC usage numbers are pretty low, so a beefy single server should work, except for the disruption potential of single servers. Latency can be a bit of an issue too, depending on where your users are; not great if users are in South Asia and the only server is on the US east coast.


As someone who ran IRC servers in the 90s the technical limitation was the number of file descriptors. I think Linux at the time was limited to 1024 and the biggest server on our network was a DEC Alpha with 4096. The entire network (DALnet at the time) was in the 20-30k user range so we absolutely needed multiple servers.


there was also no efficient I/O multiplexing

an ircd with a few thousand clients was cpu bound on poll()/select()

/dev/poll and kqueue/epoll were game changing


It was actually possible to delegate subsets of descriptors to child processes doing the poll()/select(), making polling have the same time complexity as /dev/poll and kqueue/epoll, and avoid being CPU bound. Even better if you delegated cold subsets, and kept a hot subset in the main process.

But few knew the trick so it didn't catch on.


mind explaining how?

with poll()/select() I don't see how you can avoid checking every FD at least once (poll's fd counter aside), vs. epoll() only returning those in the desired state

(and I don't think you could do tricks like epoll_wait() on an epoll fd)


Sure.

Fork some child processes, and keep an AF_UNIX socketpair() open to them so you can pass them file descriptors with SCM_RIGHTS.

Have the main process divide up the fds it is waiting on into a "hot" subset and cold subsets of size at most N, and for each cold subset pick a child process P. fds can be moved between hot and cold at any time, and generally you will move them to hot after they have woken and been used, and move them to cold after a few consecutive poll-cycles where they were not ready. Don't move fds to cold subsets belonging to child processes that you don't want to wake, though.

When the main process is ready to "poll everything", have it iterate over each child process that is not already sleeping, and send a message over the socketpair(), containing a list of fd_set additions and removals to that child's wait-for subset, including the type of poll (read, write, etc).

For each fd where the child doesn't have the real file descriptor yet, pass that over the socketpair() as part of the message. (If threads are usable instead of processes, there's no need to send the file descriptor. But on old systems, the system threads were often implemented by userspace multiplexing with poll/select anyway, so it wasn't a good idea to use threads with this technique.)

As well as a list of changes, this message tells the child process to run poll/select on its subset, and then reply with the set of fds that are ready (and their readiness type).

After issuing all the child process messages, the main process does its own poll/select, to wait for hot fds and replies from the child processes.

The reason this has different scaling properties, despite the overheads, is that each child handles a limited size subset, messages scale with the amount of change activity not the size of sets, and ideally the "coldest" fds end up gathered together in child processes that continue to sleep between a large number of main process polls, so the number of active child processes and messages scales with the amount of change activity as well.

Keep in mind, even active fds are removed from the wait-for subset if they've recently reported they are ready and the poll loop hasn't read/written them yet. So it has similar algorithmic properties to epoll.

As a bonus in the case of select(), the fds in the child processes have smaller values than in the main process. So in addition to the number of fds polled per cycle scaling with the amount of activity instead of the total number of fds, the fd_set bitset size does not grow with the total number of fds either. In the main process the bitset size does grow, but it's possible to juggle fd values with dup2() to overcome that.
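
If it helps to make that concrete, here's a stripped-down sketch of the fd hand-off and the child loop. It leans on Python 3.9+'s socket.send_fds/recv_fds for SCM_RIGHTS purely for brevity (the real thing in that era would have been C), and it omits the parent's hot-set bookkeeping:

  import os, select, socket, struct

  def spawn_child():
      # SOCK_DGRAM keeps message boundaries on the control channel.
      parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
      if os.fork() == 0:
          parent_end.close()
          child_loop(child_end)
      child_end.close()
      return parent_end

  def delegate(ctrl, sock):
      # Parent: move a "cold" socket into this child's wait-for set, tagging it
      # with the parent's fd number so readiness reports can be mapped back.
      socket.send_fds(ctrl, [struct.pack("!I", sock.fileno())], [sock.fileno()])

  def child_loop(ctrl):
      watched = {}  # child-local fd -> parent's fd number
      while True:
          ready, _, _ = select.select([ctrl] + list(watched), [], [])
          for fd in ready:
              if fd is ctrl:
                  msg, fds, _, _ = socket.recv_fds(ctrl, 8, 1)
                  if not msg:          # empty datagram from the parent: shut down
                      os._exit(0)
                  watched[fds[0]] = struct.unpack("!I", msg)[0]
              else:
                  # Report readiness to the parent and stop watching this fd
                  # until it is handed back, mirroring the "remove from the
                  # wait-for subset once it reports ready" rule above.
                  ctrl.send(struct.pack("!I", watched.pop(fd)))
                  os.close(fd)

The parent then select()s on its hot fds plus one control socket per child, and re-delegates an fd once it has been serviced.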


thank you for the detailed response

that is indeed clever and I agree would practically improve the performance drastically for many (most?) workloads

worst case scenario (say all your sockets have approximately the same chance of waking up): the behaviour devolves into the same as for poll() though?

as a broadcast medium: IRC has some annoying qualities in this regard (inline PINGs at regular intervals, events visible by nearly all that trigger socket avalanches, etc)


It does not devolve into poll() scaling behaviour, even in your uniform probability scenario.

To picture it, maybe it helps to consider low and high uniform probabilities. If the uniform probability is low enough per wait call, epoll/kqueue/etc have low cost per wait call, but so does this child-select method, because most child processes don't wake on each wait call. If the uniform probability is high enough, epoll/kqueue/etc return most of the fds, scaling just like the cost of select/poll, and child-select. So epoll/kqueue/etc and child-select match at the extremes, modulo constant factors.

In general, with a constant-bounded subset size, you can say the cost of unnecessary fd polls is amortised over the necessary fd polls, because unnecessary fd polls only occur in child processes that wake due to a necessary fd poll. So the algorithmic big-O scaling of this method is still the same as epoll/kqueue/etc as the number of fds increases and the probability varies, whether it's uniform or non-uniform.

No guarantees about the constant factors, though. You'd still want to tune parameters, to have enough and not too many in each subset. Constant-bounded subset size ensures scaling but not necessarily optimal performance. You can relax the constant bound and still keep the worst-case big-O scaling, if it's adaptive in the right sort of way, to better handle non-uniform probabilities with fewer processes.


yes I see now, thanks


I'm pretty sure even back then you could edit the hard-coded limit in the source code and recompile. I remember us doing something like this as it was too expensive to just keep buying servers and our apps were connection-happy.


I recall going down this rabbit hole about fifteen years ago. It is, in principle, possible but you're going to be recompiling everything that could conceivably use an fd_set (so probably every single piece of code that links libc...) because it's a fixed size buffer that scales with FD_SETSIZE.


This was actually less of a problem than it seems in hindsight — back at that time, package managers were relatively new and not often used, as were dynamic libraries, so most things were just static binaries compiled from tarballs.


I know 15 years ago feels like it ought to be back in the dimly-veiled prehistory of Linux but actually we were on RHEL4, which was released in 2005.


Really? Thought they bumped it in the late 90s. I was running IRC servers mostly in the 1995-2000 timeframe and I remember the connection limit going away at one point? It’s been ages, but I largely quit using IRC once I went to college so it had to be sometime around the late 90s :)

Every network also seemed to have its own fork of ircd (and at least one rewrote it from scratch), so could have just been that the ones I used patched it in and others didn’t.


1024 was the max you could boost it to; 256 was the default as I recall. Linux 1.x was pretty bootstrappy.


> Services are treated like bots with higher privilege

A slight correction: Services normally link as a server to the network, which is how they get the higher privilege that they do (because only servers, not clients, get the ability to kill users from the network, etc).

And to add to this for others who may be curious: typically there is some special configuration on the IRC server side to allow the link, and some additional configuration to disallow clients from changing their nickname to names like "NickServ", etc (but to still allow the names when a server on the network broadcasts a user with that nick). Normal non-Services IRC bots, on the other hand, connect as regular clients.


Services also need to perform actions which aren't possible for ordinary users, like knowing when a user connects, forcibly changing a user's nick, or changing a user's permissions in a channel without being an operator in the channel.


Ah, netsplits were so eventful. I still remember the split-wars where groups would wait for a split to happen and gain operator permissions only to take over a channel on the merge [1].

[1] https://en.wikipedia.org/wiki/IRC_takeover#Riding_the_split


wouldn't ChanServ fix things once the split resolves?


ChanServ is a relatively modern function of IRC. For a good while, still to this day on some networks, services did not exist.


I did not experience that, but you're right: https://en.wikipedia.org/wiki/IRC_services#ChanServ


Not all networks have services. For example, that happened a lot on IRCnet, which doesn't have them (or maybe it does now?).


thanks, really appreciate this comment.


My pleasure. I am sure others could add a great deal more. There is a very long history and there are many pieces of history I am leaving out. A big part I left out is the individual server rate limits vs. network link rate limits and network topology and that is both a technical and logistical issue.


One positive thing I'd add - as a user - under logistics is high availability. Life is messy, servers go down planned or unplanned for whatever reasons - IRC networks are in a sense truly 'federated' in that the client will get a new server on reconnect attempts much like webservers behind a load balancer. You never have to worry about your 'home instance' being unavailable, as they're all your home instance. (I speak about the public networks like Libera or OFTC)


So its users can get fun netsplits.

I remember we would all try to get on the same server in our channel, but some less technical people would use a web client that assigned different ones every time.


One thing I haven’t seen covered is multiple servers = redundancy.

If a server goes down, having a net split is a lot better than having the entire network down.


I don't know if the numbers are realistic here. First and most importantly, messages are only sent to clients in the same chatroom, not server-wide. Second, 10% of users are only very rarely going to send messages at once; for "rarely" you can probably substitute "never". Third, these are simple, very small text messages where seconds of lag don't really matter; why would it be hard to manage tens of thousands of concurrent connections? WhatsApp crushed millions of connections on a single server back in 2012: https://web.archive.org/web/20140501234954/https://blog.what...


Because many of the early protocols, including IP, were designed with network failures in mind.


Because when IRC was popular servers and routes went down often and a single server couldn't handle all the users a network would have. Neither of those are a concern anymore.


In reality only the people in the same channel get sent the messages... if messages are spread between even a few channels, the actual numbers are much more manageable for one server.


The side effect was also great for communities on .net servers that didn't have services like user accounts and channels. Channel ops were battle-won and people who had them were much better at not sucking completely.


ChanOps have always been a problem. Anyone who becomes one, is a dictator for life. There is no recourse, the only option is to either be on their good side, or go to a different channel or network.

I like IRC as an open technology. I don't like the lack of accountability from the gatekeepers.

It is the same problem on online forums like reddit. If the mods do not look upon you with favour, you are banned, even if the rules have not been broken.


Yes, chanops make channels into properties of individuals, but without them they are property of the community that uses them.


Would be fun to revisit the old problems (like this one) with a modern toolset. Say golang with channels (not the /join type of channels) :p


So you can nick collide people, obviously.


Engineering IRC networks is so much fun.



