It's perfectly fine to use an EC system for many use cases. Caches are a perfect example. If one reader happens to get a stale entry from a cache, but that stale entry is sufficient for your task, why pay the price of transactions?
As an example, I built a system with exclusively commutative, restricted operations. That means I can look at an item and know, regardless of when that item was written, that only certain operations will ever be applied to it, and only in certain ways.
To serve a query, such as "Find me the item with an element X greater than 100", I don't need to find every item's consistent state of X. If an item's X from cache is, say, 50, and I know that X is restricted to only ever shrink in value, I don't need to hit the database for a consistent view of X.
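A minimal sketch of that idea in Python, assuming the application invariant that X only ever shrinks; the `cache` and `db` objects and their `get` calls here are hypothetical stand-ins, not any real API:

```python
# Hypothetical sketch: answering "is X > 100?" from a possibly stale cache,
# relying on the invariant that X only ever shrinks over time.

def x_greater_than_100(item_id, cache, db):
    stale_x = cache.get(item_id)  # may be arbitrarily stale
    if stale_x is not None and stale_x <= 100:
        # X can only shrink, so current X <= stale_x <= 100:
        # the stale value alone is enough to answer "no" definitively.
        return False
    # A stale X > 100 (or a cache miss) proves nothing: X may have shrunk
    # below 100 since the cache entry was written, so consult the database.
    return db.get(item_id) > 100
```

The payoff is that the "no" branch never touches the database at all, which is exactly the case the invariant makes safe.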
Application level constraints like this are much more powerful than database transactions and radically more efficient, which is why EC systems can perform so well.
To establish that using an EC system is "perfectly safe" for any given use case, you need stronger properties than mere EC, since EC by itself provides no safety guarantees at all - only an assertion that data will "eventually" be reconciled in some arbitrary way. One example of a stronger property is "strong eventual consistency", as provided by CRDTs (conflict-free replicated data types).
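For a concrete (if toy) picture of strong eventual consistency, here is a sketch of a G-Counter, one of the simplest CRDTs; the class and method names are illustrative, not from any particular library:

```python
# A grow-only counter (G-Counter) CRDT. Merge takes a per-replica max,
# which is commutative, associative, and idempotent - so all replicas
# converge to the same value no matter the order of merges.

class GCounter:
    def __init__(self):
        self.counts = {}  # replica_id -> increments observed at that replica

    def increment(self, replica_id, n=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + n

    def merge(self, other):
        # Element-wise max is a join on the per-replica counts.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())
```

Two replicas can increment independently, exchange state in any order (even redundantly), and still agree on the total once each has seen the other's state.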
Yep, exactly. It's actually pretty trivial for some work. If you can do that, you get the best of both worlds. Lots of queries can be answered with stale data, which means you drop the massive overhead of strong consistency and transactional workloads.
We do it to answer very correctness-sensitive questions around security, but it doesn't matter because stale answers are still valid if you know how those answers could have possibly been updated - and that's just an application invariant.
In that very specialized case this works. Does your system have any guarantees beyond eventual consistency, where "eventually" could mean hours or days?
My point is that in general, EC is not a feature. Nobody sets out hoping to find a database that provides EC. They usually set out to find a database that can be globally distributed and has strong ACID guarantees. When confronted with various cost constraints, they eventually settle for a system that makes trade-offs, where part of the price is EC. They then work around EC, usually not completely but enough that most of the time the system works fine. But EC is not in and of itself good or desirable; it’s just the lesser of several evils. Moreover, of the evils that it does compete with, it’s not necessarily the least, just the easiest to implement and, as a result, the most popular.
Your argument of “it works in this one case and it works well” is a bit of a straw man, in that no cache at all also works in some cases, but that doesn’t make it a general solution. I have successfully used an EC system for a decently sized (at the time, at least) dataset and it worked well, but only because that particular workflow naturally allowed for EC semantics (streaming updates every few seconds per source). But I sure as hell wouldn’t want to build a bank on EC.
The specialized case is the one where you can ensure a few application-level constraints about your data, and in my experience so far those constraints are extremely valuable. It's maybe a less traveled road, but not a difficult one.
The benefit is massive improvements to performance and reduction in complexity - you eliminate the need for a complex consensus system.
It is far more than "it works in this one case": EC removes a massive cost in databases that is often unnecessary - transactional logic - and in return gives huge improvements in other areas. Relevant to this thread, since the article is on caching: this trade-off is particularly desirable for caches.
This is, as another user mentioned, called strong eventual consistency.
Your comment compared EC to race conditions, which I think is quite a negative way to view it, so I wanted to point out that EC is not "strictly worse" or buggy or whatever.
I think you are saying what I am saying: in specialized cases EC is fine and a good cost saving measure.
My only addition to that is that its popularity may be due to the ease of developing EC databases versus ones with distributed consensus algorithms. Personally, I prefer to start with a system not based on EC and then add EC where necessary, whereas it sounds like you prefer to start with EC and add constraints. I think your approach is more popular, but in my personal work experience it leads to more fragile systems, which is why I advocate for at least understanding why that choice is being made.
EC systems aren’t inherently buggy. They just by themselves don’t include guarantees that you might find useful or desirable for general workloads.