Hacker News

> I use Gitea(Github clone) locally with SQLite running on ZFS where I can take atomic snapshots of both the SQLite database and git repositories.

ZFS snapshots may not result in a consistent backup of a SQLite database. You should run VACUUM INTO and then snapshot the copy, or use the SQLite Backup API.

See my other comment: https://www.sqlite.org/backup.html

Essentially, while you get a proper atomic snapshot of what's on disk, in-flight transactions won't be there, and you'll be at the mercy of a successful recovery from the journal/WAL, if you have one.
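The VACUUM INTO approach can be scripted right before the snapshot; here is a minimal sketch using Python's built-in sqlite3 module (the paths are hypothetical, and VACUUM INTO requires SQLite 3.27+):

```python
import sqlite3

# Produce a consistent, self-contained copy of the live database,
# then take the filesystem snapshot of (or ship off) the copy.
# VACUUM INTO requires SQLite >= 3.27; paths here are hypothetical.
src = sqlite3.connect("gitea.db")
src.execute("VACUUM INTO 'gitea-backup.db'")
src.close()
```

The resulting copy has no journal/WAL sidecar files to worry about, since VACUUM INTO writes a fully checkpointed database.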



When I back up a database, I expect to get a consistent snapshot of the database, which includes all committed transactions. Uncommitted transactions can't be included in this, since there is no guarantee they'd ever commit.

From your answer it sounds like I'd get such a snapshot (even if it relies on recovery from a journal internally). So I don't understand why you see a problem here. Is the recovery process unreliable?


Here's what SQLite has to say about the matter in their article "How To Corrupt An SQLite Database File" [0]:

> 1.2. Backup or restore while a transaction is active

> Systems that run automatic backups in the background might try to make a backup copy of an SQLite database file while it is in the middle of a transaction. The backup copy then might contain some old and some new content, and thus be corrupt.

> The best approach to make reliable backup copies of an SQLite database is to make use of the backup API that is part of the SQLite library. Failing that, it is safe to make a copy of an SQLite database file as long as there are no transactions in progress by any process. If the previous transaction failed, then it is important that any rollback journal (the -journal file) or write-ahead log (the -wal file) be copied together with the database file itself.

So no, it doesn't appear that it is safe in general to just copy the file whenever. If there are transactions in progress, things can go wrong. I don't quite understand why (isn't this the same as if a power failure happens, and SQLite is resistant to that?), but this is what the docs say.

It might be different with ZFS, since the backups are atomic (and this doc might assume that they are not), but I'm not 100% sure I would rely on it.

[0]: https://www.sqlite.org/howtocorrupt.html
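For what it's worth, the backup API those docs recommend is exposed directly in Python's sqlite3 module as Connection.backup; a sketch with hypothetical paths:

```python
import sqlite3

# Connection.backup() wraps SQLite's online backup API, which copies
# pages under SQLite's own locking, so the result is consistent even
# if another process writes during the copy. Paths are hypothetical.
src = sqlite3.connect("app.db")
dst = sqlite3.connect("app-backup.db")
with dst:
    src.backup(dst)  # available since Python 3.7
dst.close()
src.close()
```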


I think this warning only applies to non-atomic copies. Basic copy tools copy a file piece by piece, so different pieces can have different ages if the file is modified concurrently, which results in corrupt output.


The only problem with atomic filesystem snapshots is the time required to perform recovery, which can be avoided. The recovery process is of course reliable.


Does recovery take a long time? Do you have any numbers?

I'd have expected recovery to take time proportional to the size of recent/uncommitted transactions, which should be quick, even for large databases.


I don't have experience with large SQLite databases. I had PostgreSQL take over 40 minutes for recovery on a 2 TB database set to checkpoint every 30 minutes. How long it takes will depend on the WAL size, the frequency of checkpoints and the operations that were performed before a crash/snapshot.
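For context, the checkpoint interval mentioned above corresponds to settings like these in postgresql.conf (a sketch; the values are illustrative, not recommendations):

```
# postgresql.conf -- illustrative values only
checkpoint_timeout = 30min   # longer intervals mean more WAL to replay after a crash
max_wal_size = 4GB           # a checkpoint is also forced once this much WAL accumulates
```

Shortening checkpoint_timeout (or lowering max_wal_size) bounds the WAL that must be replayed, trading recovery time for more frequent checkpoint I/O.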


That's not how database backup works: either you use a tool like RMAN, which knows about internal Oracle database structures, or you must shut a database down in order to do a cold backup. As there is no such thing as database shutdown in SQLite's case, if you want a clean backup before doing a ZFS snapshot (and shipping it off, else it's not a backup), you must shut down the application which is writing to the SQLite database.


Since power-loss is (hopefully) equivalent to an atomic storage snapshot, a database which doesn't produce a consistent backup via snapshots can't be safe against power loss and thus lacks the durability property of ACID.

What's unsafe is using a naive file copy tool (e.g. `cp`), which copies a running database non-atomically.


Without taking the steps I specified, power loss is not an atomic event, because most databases nowadays rely on the fsync(2) system call to tell them when the I/O has completed. If the fsync(2) call is unreliable because it lies about completing the requested I/O, the RDBMS stands no chance of guaranteeing atomicity. I am sorry.


Are you sure? For PostgreSQL, an atomic snapshot is enough; it is only when you do not have access to atomic snapshots that you strictly need the online backup API. And yes, it will have to do crash recovery when you first use the backup, but if you cannot trust crash recovery you can't really trust your database at all.


You are correct; what I had in mind was the required recovery. I do trust recovery, but performing it can be very time-consuming.



