Hacker News

> I use Gitea(Github clone) locally with SQLite running on ZFS where I can take atomic snapshots of both the SQLite database and git repositories.

ZFS snapshots may not result in a consistent backup of a SQLite database. You should run VACUUM INTO and then snapshot the copy, or use the SQLite Backup API.

See my other comment: https://www.sqlite.org/backup.html

Essentially, while you get a proper atomic snapshot of what's on disk, in-flight transactions won't be there, and you'll be at the mercy of a successful recovery from the journal/WAL, if you have one.
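The VACUUM INTO approach can be scripted right before the snapshot; here is a minimal sketch using Python's built-in sqlite3 module (the paths are hypothetical, and VACUUM INTO requires SQLite 3.27+):

```python
import sqlite3

# Produce a consistent, self-contained copy of the live database,
# then take the filesystem snapshot of (or ship off) the copy.
# VACUUM INTO requires SQLite >= 3.27; paths here are hypothetical.
src = sqlite3.connect("gitea.db")
src.execute("VACUUM INTO 'gitea-backup.db'")
src.close()
```

The resulting copy has no journal/WAL sidecar files to worry about, since VACUUM INTO writes a fully checkpointed database.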



When I back up a database, I expect to get a consistent snapshot of the database, which includes all committed transactions. Uncommitted transactions can't be included in this, since there is no guarantee they'd ever commit.

From your answer it sounds like I'd get such a snapshot (even if it relies on recovery from a journal internally). So I don't understand why you see a problem here. Is the recovery process unreliable?


Here's what SQLite has to say about the matter in their article "How To Corrupt An SQLite Database File" [0]:

> 1.2. Backup or restore while a transaction is active

> Systems that run automatic backups in the background might try to make a backup copy of an SQLite database file while it is in the middle of a transaction. The backup copy then might contain some old and some new content, and thus be corrupt.

> The best approach to make reliable backup copies of an SQLite database is to make use of the backup API that is part of the SQLite library. Failing that, it is safe to make a copy of an SQLite database file as long as there are no transactions in progress by any process. If the previous transaction failed, then it is important that any rollback journal (the -journal file) or write-ahead log (the -wal file) be copied together with the database file itself.

So no, it doesn't appear that it is safe in general to just copy the file whenever. If there are transactions in progress, things can go wrong. I don't quite understand why (isn't this the same as if a power failure happens, and SQLite is resistant to that?), but this is what the docs say.

It might be different with ZFS, since the backups are atomic (and this doc might assume that they are not), but I'm not 100% sure I would rely on it.

[0]: https://www.sqlite.org/howtocorrupt.html
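For what it's worth, the backup API those docs recommend is exposed directly in Python's sqlite3 module as Connection.backup; a sketch with hypothetical paths:

```python
import sqlite3

# Connection.backup() wraps SQLite's online backup API, which copies
# pages under SQLite's own locking, so the result is consistent even
# if another process writes during the copy. Paths are hypothetical.
src = sqlite3.connect("app.db")
dst = sqlite3.connect("app-backup.db")
with dst:
    src.backup(dst)  # available since Python 3.7
dst.close()
src.close()
```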


I think this warning only applies to non-atomic copies. Basic copy tools copy a file piece by piece, so different pieces can have different ages if the file is modified concurrently, which results in corrupt output.


The only problem with atomic filesystem snapshots is the time required to perform recovery, which can be avoided. The recovery process is of course reliable.


Does recovery take a long time? Do you have any numbers?

I'd have expected recovery to take time proportional to the size of recent/uncommitted transactions, which should be quick, even for large databases.


I don't have experience with large SQLite databases. I had PostgreSQL take over 40 minutes for recovery on a 2 TB database set to checkpoint every 30 minutes. How long it takes will depend on the WAL size, the frequency of checkpoints and the operations that were performed before a crash/snapshot.
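For context, the checkpoint interval mentioned above corresponds to settings like these in postgresql.conf (a sketch; the values are illustrative, not recommendations):

```
# postgresql.conf -- illustrative values only
checkpoint_timeout = 30min   # longer intervals mean more WAL to replay after a crash
max_wal_size = 4GB           # a checkpoint is also forced once this much WAL accumulates
```

Shortening checkpoint_timeout (or lowering max_wal_size) bounds the WAL that must be replayed, trading recovery time for more frequent checkpoint I/O.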


That's not how database backup works: either you use a tool like RMAN, which knows about internal Oracle database structures, or you must shut a database down in order to do a cold backup. As there is no such thing as database shutdown in SQLite's case, if you want a clean backup before doing a ZFS snapshot (and shipping it off, else it's not a backup), you must shut down the application which is writing to the SQLite database.


Since power-loss is (hopefully) equivalent to an atomic storage snapshot, a database which doesn't produce a consistent backup via snapshots can't be safe against power loss and thus lacks the durability property of ACID.

What's unsafe is using a naive file copy tool (e.g. `cp`), which copies a running database non-atomically.


Without taking the steps I specified, power loss is not an atomic event, because most databases nowadays rely on the fsync(2) system call to tell them when the I/O has completed. If the fsync(2) call is unreliable because it lies about completing the requested I/O, the RDBMS stands no chance of guaranteeing atomicity. I am sorry.


Are you sure? For PostgreSQL, an atomic snapshot is enough; it is only when you do not have access to atomic snapshots that you strictly need the online backup API. And yes, it will have to do crash recovery when you first use the backup, but if you cannot trust crash recovery you can't really trust your database at all.


You are correct; what I had in mind was the required recovery. I do trust recovery, but performing it can be very time-consuming.



