Hacker Newsnew | past | comments | ask | show | jobs | submitlogin



SQLite is great but it's way overhyped and abused on HN. People are very eager to turn SQLite into a durable, distributed database and it's really not meant for that, and by going down that road instead of using something like MySQL or Postgres you're missing out on lots of important functionality and tooling.

I only say this because I have made this mistake at my previous startup. We built these really cool distributed databases on top of a similar storage engine (RocksDB) plus Kafka, but it ended up being more trouble than it was worth. We should have just used a battle-tested relational database instead.

Using SQLite for these applications is really fun, and it seems like a good idea on paper. But in practice I just don't think it's worth it. YMMV though.


So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

Also databases all use the same fundamental primitives and it's up to you to choose the level of abstraction you need. For example, FoundationDB is a durable distributed database that uses SQLite underneath as the storage layer but exposes an ordered key/value API, but then allows you to build your own relational DB on top.

If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices like CockroachDB/Yugabyte/TiDB/Memsql/etc that can serve the purpose instead of building your own.


It's actually quite similar. Both are embedded storage engines that are designed for a single node.

Actually, the case for RocksDB for backing a distributed data store is probably much stronger than SQLite given that it supports multiple concurrent writers.

SQLite lacks many important characteristics that one would expect a distributed data store to have. Row level locking is one obvious feature that's super important in a highly concurrent context (as mentioned, RocksDB has this). Want to backup your production DB? You're going to need to block all writes until the backup completes.

Additionally, features like profiling and replication are nonexistent or immature with SQLite. Rqlite and Litestream are super new relative to tools like Postgres and MySQL and you can't find a lot of people that know how to run them.

Also, you can't autoscale your app since your processes are now highly stateful. Sure, this is a problem with MySQL/Postgres too, but I can pay AWS or Google Cloud for a managed version that will abstract this problem away from me.

Most of these problems are solvable with enough net new software on top of SQLite. But... why? I think the only reason you'd subject yourself to such an architecture is because you want to learn (great!) or you're gunning for that next promotion and need to show off your system design skills :P


> So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

To me, I could make the connection in the sense that just like sqlite, rocksdb is an embedded store, while Kafka can be used to build a replicated log (log device).

> If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices...

Well, that was GP's point, too? In addition, they mention that existing DBMS like Postgres have way more breath and depth than a replicated sqlite can ever hope to have (which isn't really a controversial assertion at all, tbh).


I accept that you learned a lot about the limits of combining RockDB with Kafka, especially in the exact way you combined them.

This might have limited utility if the goal were to combine RocksDB with something else. And even less for SQLite and something else.

The big push of interest in SQLite serverside isn't driven by people who have never set up pgbounce, but rather by developers who have both read the SQLite docs very carefully and have used the library extensively, and know what it's good for.


I'm not sure why you concluded that SQLite is the problem when you built a "really cool distributed database" with Kafka. Distributed databases are complicated, Kafka's complicated.

If you're saying that a replicated Postgres setup would be simpler than what you're built, I agree; but SQLite+Litestream probably would be too.


Litestream is too much work if you're not using S3: replication over sftp. Even fossil has nicer no nonsense replication done over http/s. It's way easier to set up mysql with replication than manage unix accounts and public keys.


Is this any good? https://github.com/rqlite/rqlite

I've been looking for a turn key solution that is better than me running a single node Postgres instance "bare metal" or in a container.

postgres-operator seems cool but... k8s, pretty heavy I guess.


It’s the default storage engine for FoundationDB - not sure many would agree that isn’t a “durable, distributed database”.


For one thing, they're ripping it out because of its poor write parallelism https://youtu.be/nlus1Z7TVTI?t=271

But that's orthogonal to my point. As a user of FoundationDB, you're not programming directly against SQLite, so you aren't going to run into these issues as much since FoundationDB exposes different semantics and coordinates concurrency across many SQLite instances in parallel.

I think it's best to think of SQLite as a replacement for your filesystem, rather than a replacement for your relational DBMS.


SQLite has been cool forever. It was the underlying data store for my machine learning email filter POPFile 20 years ago!

https://en.wikipedia.org/wiki/POPFile https://getpopfile.org/browser/trunk/engine/POPFile/Database...


It's high-quality software too. It's well-commented and exceptionally well tested.[^1][^2]

> As of version 3.33.0 (2020-08-14), the SQLite library consists of approximately 143.4 KSLOC of C code. ... By comparison, the project has 640 times as much test code and test scripts - 91911.0 KSLOC.

I don't usually place much stock in those sort of counts, but 640x is notable.

It makes sense considering the wide variety of use-cases, from embedded devices to edge computing and everything in between.

[1]: https://www.sqlite.org/testing.html [2]: https://sqlite.org/src/dir?ci=trunk


I used POPFile!! It was awesome.


SQLite was originally great for desktop applications.

Problem is, there's still a huge market for these apps but everything has moved to the web (no one is making desktop apps anymore). So having a full-blown RDMS is overkill for these kind of app, and now SQLite is starting to fill these web app needs.

@sqlite - if you are reading this, any word on merging WAL2 and BEGIN CONCURRENT into main? There clearly is a new class of needs to do so in this world that has completely moved over to web app development (which introduces concurrency problems never experienced on desktop). Any thoughts of focusing more on these web related needs for SQLite (or maybe even fork your own code base to have a more enhanced SQLite version targeted at web needs)?


I think it’s long overdue. While SQLite certainly has its limitations, it’s a winner in many categories. Even for sites with mild traffic using ordinary SQLite in PHP like a decade ago, it was always nice to use for its simplicity and the performance was totally acceptable. In comparison, the memory usage of typical relational database servers was high enough to make it hard to fit on a single lowend VPS with the same data and traffic. (I found myself tuning MySQL, but I never needed to tune SQLite.)


The main thing for tuning SQLite will be how to open it, e.g. in write-ahead mode, to turn on foreign keys (this needs to be enabled manually), and whether it should wait to get a database lock on slower hardware before giving up. There's also some gotchas like if you mark an ID column as primary key, it'll use the rowid as key - which can be reused if a row is removed. So you need to explicitly set primary key AND autoincrement, else you're going to have a bad time. (https://www.sqlite.org/autoinc.html)


If you define a table with an integer primary key you get autoincrement as default at least in newer versions.


With the mileage (and attention) those new products are getting out of using SQLite, I think Richard Hipp deserves a lot more acknowledgement for creating such an amazing piece of software.


New products getting a lot of mileage out of sqlite is old-hat at this point. It one of those rare evergreen techs - pay attention for a while and this latest round of attention will die down for 6-12 months then someone else will start another round of "look how cool sqlite is".

At least that's been my observation since I started coming around here.


I'm wondering if we'll see some similar energy around non-sql embedded databases like leveldb or rocksdb


Right! SQLite is great, but those two are great as well. It seems like the energy should be around "hey, you should consider a local, maybe even in-memory, database for some things!" more so than specifically "SQLite is great" (though it is).


Well I don't think it's a good fit for regular service, exactly how do you handle 2 replicas of the same service talking to the same DB?

The fact that it's just a file on disk limits the usage.


Projects such as litestream and rqlite have this figured out.


rqlite author here, happy to answer any questions.


Mutliple writer on the same SQLLite?


Transactions, locks, queues, etc. No different than multiple app instances changing the same row in other databases.

Any state mutation is ultimately ordered in time and how that that ordering is accomplished depends on the abstractions you're using: in your app, network layer, database, etc.


Why would you use SQLlite once you start dealing with network, just use MySQl or PG.

It's just re-inventing the wheel badly, I need to read the details but basically you're using a tool SQLLite that was not designed to be used outside of a single app use case.


What context are you talking about here?

For Cloudflare, they're offering it because it's simple and lightweight, and they already have their Durable Objects product which serves as the transaction ordering mechanism and takes care of writes.

If you're doing it yourself then sure it's probably not the best fit but that's up to you to decide.


SQLLite was not meant to be used by multiple process so you have to build the missing parts yourself, 100% those have more limitations and issues than regular RDBMS that were built for it.


Is think one way to think about this is to have one database being tied to one replica (replicas could handle more than one database). Where (importantly) the idea would be one database for each user. You horizontally scale for the number of users, but each user is only using one end node.

It’s interesting because you have to consider how to scale your database as well as your application. The fact that you don’t have one central database opens up more possibilities. But it doesn’t work for all instances (such as a shared read-write data source for all users). For example, this approach wouldn’t work for something like Twitter (at least the original architecture).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: