LowDB – A flat JSON file database

DogeDogeDoge · on Aug 22, 2014

This has like zero fault tolerancy and by zero i mean none. Don't even push flush to disk... it has a level of a in memory hash of persistence and is totally insecure in terms of multithreading. As a Toy to show off yeah could work but... in the very end of the day. You can minus me but this is crap.

blablabla123 · on Aug 22, 2014

I think the idea is cool, actually I once worked on something similar in Go, very small code base. And adding flushing, caching and thread-safety is no black art. The charming thing here is that, once severe Data errors happen, one can easily recover from them using a simple Editor.

Imagine how to fix low-level data corruption on your <Insert your DBMS name> Server.

couchand · on Aug 22, 2014

once severe Data errors happen, one can easily recover from them using a simple Editor.

Unfortunately, in this case, you can't necessarily. One of the issues here is that holding on to handles to multiple collections doesn't do what it seems to, and also doesn't fail. Try running this:

    var low = require('lowdb');

    var foo = low('foo');
    low('bar');

    low.on('add', function(collection) {
        console.log("this should be foo:", collection);
    });

    foo.insert({ name: 'foo' });

typicode · on Aug 22, 2014

Actually, in my code I always start with low():

  low('foo').insert({ name: 'foo' })
  low('bar').insert({ name: 'bar' })
  low('foo').insert({ name: 'foo' })

So never really considered this way of writing or had this bug.

But I can see that it's a flaw and unexpected behavior. I'm adding this to the enhancements for the next version.

Thanks for spotting it :)

typicode · on Sept 4, 2014

Just a little update. I've released LowDB 0.4.0 which is a complete rewrite of the project.

It fixes some initial flaws/bugs and includes some new features. Also, writing is now fault-tolerant and fully asynchronous.

mlvljr · on Aug 22, 2014

How. Dare. You. (now please downvote me)

ThePhysicist · on Aug 21, 2014

If you're looking for something similar for Python, check out BlitzDB:

https://github.com/adewes/blitzdb

It's a serverless, flat-file, document-oriented database which supports indexing and transactions and comes with a built-in ORM layer and a rich query language modeled after MongoDB.

jamesmoss · on Aug 21, 2014

On a similar tip for PHP, check out my library Flywheel:

https://github.com/jamesmoss/flywheel

It's no way near as powerful as BlitzDB but I'm slowly adding features to it.

max0563 · on Aug 21, 2014

Or https://github.com/Max00355/LanderDB

ShinyCyril · on Aug 22, 2014

I've been using TinyDB[1] on Python for a little project - it seems to offer similar functionality.

[1] http://tinydb.readthedocs.org/en/latest/usage.html

andrewchoi · on Aug 21, 2014

Unsure if OP is the maintainer, but a quick point: Benchmarks without any indication of underlying hardware don't convey a huge amount of information.

rakoo · on Aug 21, 2014

I think the most important thing is to compare with other known dbs on the same hardware, though.

kasey_junk · on Aug 22, 2014

Not really. Different hardware constraints can make serious differences in performance metrics. Some systems perform better on high memory systems and low hd speed, others are the opposite.

louischatriot · on Aug 22, 2014

Yep, also missing on what collection sizes benchmarks are performed. If that's on a 1 doc, 10 docs or 10k docs collections the results are going to be very different. With these kidns of results I expect the dataset size to be quite small.

Also benchmarks should be runnable by users :)

couchand · on Aug 22, 2014

To be fair, it is in the repo, under the /benchmark folder. But don't go look at them, it will make you sad.

reidrac · on Aug 22, 2014

I was thinking almost the same, but to be fair that benchmark is useful to see how different operations compare (eg, delete is 29 times slower than read).

couchand · on Aug 21, 2014

Please don't use this for anything significant. Shared global state and synchronous file I/O pretty much guarantee that it will blow up in any non-trivial use.

hliyan · on Aug 22, 2014

Perhaps this could benefit from ArangoDB's (1) architecture -- using memory mapped files in append mode and writing to file asynchronously with a new revision number for each write. Or conversely, ArangoDB could benefit from a simple API like this one.

(1) https://www.arangodb.org

pizza · on Aug 21, 2014

What demarcates trivial and non-trivial applications? # of machines, guaranteeing xyz?

couchand · on Aug 21, 2014

When I say trivial I mean one machine, one user, no errors.

The shared global state (see here: https://github.com/typicode/lowdb/blob/481cf43d6b0a52c1cb996...) means that attempting to use more than one collection will cause issues but it will silently succeed. The synchronous file I/O will grind your process to a halt.

If you are serving a mock API for the purpose of local development you might be fine. If you're trying to build any web application at all, this isn't the right choice.

TheZenPsycho · on Aug 21, 2014

Do you think in theory that it is possible to create a flat file based system that is multi-thread/multiuser safe?

My somewhat ignorant inclination is to suggest that maildir fits this description.

couchand · on Aug 21, 2014

Certainly. SQLite is a great example of one. It handles the tradeoff elegantly: reads can be done in parallel but writes have to be sequential.

I don't believe maildir would be described as a flat file.

TheZenPsycho · on Aug 22, 2014

Well I suppose then maybe I have the wrong idea of what "flat file database" means. To you anyway. I would not actually describe sqlite as a flat file database system... Since the data cannot be read safely directly from the file, using a text editor, or -- since a flat file could be binary data, any arbitrary program that just knows the format. And "flat" implies there is no special data structure involved. I would think that would not be the case for sqlite given how well it performs, it must use some kind of binary tree or clever indexing system of some sort.

You must go via the actual sqlite process.

On the other hand maildir, composed of multiple ascii compatible text files, can be read by any email client- provided it follows the protocol for reading and writing of files in the maildir spec.

couchand · on Aug 22, 2014

The term "flat file database" conventionally refers to a database that is serialized to a single file, whether it is plain text or a binary. The emphasis here is on the single file aspect: it's really easy to transfer the entire database between machines.

jmulho · on Aug 22, 2014

Another meaning of "flat file database" is non-relational -- the kind of thing you might create if you were unaware of normal form and tried to cram the entire data model into a single table. This single table can reside in a single flat file. The file is flat as in not related to any other files. Here is an example "flat file" data model:

emp_id emp_fname emp_lname emp_phone1 emp_phone1_type emp_phone2 emp_phone1_type dept_id dept_manager dept_manager_phone1 dept_manager_phone1_type

This approach has all kinds of problems (solved 40+ years ago by Codd and others). Examples: how do you add a third phone for an employee. How do you keep the managers phone from getting out of sync across rows.

I took "flat JSON file database" to mean essentially the same thing. Something like this:

emp_id emp_JSON_object

JSON perhaps solves some issues (e.g. adding a third phone for an employee), but still suffers from most of the issues (e.g. keeping the managers phone from getting out of sync across employees). Plus it suffers from new issues, namely the fact that each JSON object could have its own layout (e.g. one of them could have a third phone) so there is a bunch of parsing overhead compared to say knowing that phone number is at a specific offset on every row.

mappu · on Aug 22, 2014

MyISAM serializes each table to a single file on disk (well, three files, one each for the definition, data and indexes).

Would you consider MySQL to be a flat file database?

couchand · on Aug 22, 2014

I wouldn't, but I'm not really an expert here. If every table is three files, that implies a non-trivial database would be dozens of files.

I suppose the other use of "flat file" I've heard is a flat directory with no sub-directories, but I've never heard that applied to a "flat file database".

xxs · on Aug 22, 2014

>>And "flat" implies there is no special data structure involved.

Flat files doesn't mean neither text file, nor human-readable. Text format, human-readable format is a terrible one to store data, esp. numbers.

TheZenPsycho · on Aug 22, 2014

>> since a flat file could be binary data...

dragonwriter · on Aug 22, 2014

SQLite is not a flat file database, its an inprocess database that doesn't use flat files.

couchand · on Aug 22, 2014

"As of 2011, one of the most popular flat file database engines is SQLite"

https://en.wikipedia.org/wiki/Flat_file_database#Contemporar...

dragonwriter · on Aug 22, 2014

"It is in Wikipedia" ⇏ "It is true"

SQLite files are not "flat files" by any definition commonly in use, including the "narrow" and "broader" definitions in that article. (Which, incidentally, is almost entirely unsourced fact claims, which violates Wikipedia's own quality standards.)

couchand · on Aug 22, 2014

Great, you take issue with these things. Do you have a source you can point to? Could you use your time improving Wikipedia or at least pointing us poor saps on HN to more detailed information? That seems like it would be more useful than looking for a minor UTF-8 character to "prove" your point.

What are the definitions in common use that you know? Is it possible there are others you aren't aware of?

mlvljr · on Aug 22, 2014

Should be noticed, the semantics of SQLite behavior in multithreaded mode is pretty cryptic, even to its long-term users (and others just silently give up, and hand-roll coarse-grained locks, I suspect):

http://sqlite.1065341.n5.nabble.com/think-I-need-better-erro...

couchand · on Aug 22, 2014

The linked discussion has one of the most succinct comments about error handling I've seen.

You handle the result codes you know what to do with, and everything that remains means your program should explode immediately.

mlvljr · on Aug 22, 2014

:) problem is, there's no clear understanding of how one can avoid the possibility of unwanted error codes ever returned in his program, but yes

xxs · on Aug 22, 2014

There is zero issue to have a flat file database that's all multi. MySQL InnoDB, the transactional one engine, can be a single file database. [Small exception - part of the schema is stored on two separates files for easier access, there won't be a big deal to store them in the big file. Transaction log is also not in the same file for obvious reasons]

If you want a full serialization of the memory to the 'flat' file - then no. It just makes no sense. Yet, a single file databases that's multi-user, mulch-threaded and mult-transaction is all viable.

efsavage · on Aug 22, 2014

How much someone is paying you to write it and how much it costs to replace are good criteria to start with.

jdp · on Aug 21, 2014

Related is tiny[0], an in-process document store that supports Mongo-style queries, as well as a style similar to CouchDB views. You can dump its contents to a JSON file.

Also interesting is PouchDB[1], another document storage library. It can be used with Node or in the browser through various backends (like IndexedDB), and can even replicate to CouchDB.

[0]: https://github.com/chjj/tiny [1]: http://pouchdb.com/

typicode · on Aug 22, 2014

Hi everyone,

First, thanks for all the interest, it’s quite sudden and unexpected.

Actually, LowDB is an extract from JSON Server, a mocking REST server based on plain JSON (https://github.com/typicode/json-server).

So, basically, it's not meant to be used in critical / intensive applications.

Instead, it's much more a new convenient way to store data in simple use cases.

Regarding file writing, if your database is small or if you don't run a cluster of Node processes, you should be fine.

Regarding benchmark, as someone pointed it out, it’s mainly to show that storing to JSON file is fast enough and to compare operations speed. I agree that it says nothing about other databases and LowDB doesn't try to be the fastest either, just fast enough. By the way, 'npm run benchmark' lets you run it on your machine.

However, keep in mind too that LowDB official release is quite recent so it should be improved over time.

Anyway, thanks for all the feedbacks and I hope you’ll have fun with it :)

bikamonki · on Aug 22, 2014

I do find a 'production' use case: I have a static site that uses Backbone to load a static json collection of pages. The json file itself is very small (less than 100k) and will not grow much with time. I built a small CMS to let me edit pages, the data for the CMS is dynamic (MongoDB), and I update the static json file after edit. I could use a CMS that uses LowDB instead to edit the json file directly. Keep in mind that this is one user making one or two edits per day at most. If your realize that the vast majority of small sites/blogs out there have similar requirements, then this tool makes sense. No?

ddevault · on Aug 21, 2014

I have a hard time believing that this will be performant for anything of signifiance.

mkoryak · on Aug 22, 2014

here is another flat file db that ive been eyeing: https://github.com/sergeyksv/tingodb

upward compatible with mongodb.

Ive also used nedb, which is ok but tends to delete the datastore when it runs out of disk space :P

louischatriot · on Aug 22, 2014

NeDB creator here. I'd like to know more about the issue you're describing, I've never heard about it and considering the test coverage I just don't see how it could happen.

Could you post an issue on https://github.com/louischatriot/nedb with your environment details and how to reproduce this?

mkoryak · on Aug 22, 2014

Sure, for some reason I thought this was a known issue. I saw it mentioned somewhere else on github - but its can't be that well known since you don't know about it. Ill open an issue this afternoon.

keyle · on Aug 22, 2014

Also, I've used nedb and highly recommend it. It's fast and comprehensive.

https://github.com/louischatriot/nedb

kolodny · on Aug 22, 2014

I always wondered in this situation:

    var topFiveSongs = low('songs')
      .where({published: true})
      .sortBy('views')
      .first(5)
      .value();

If any db engines figure when the dataset is really large and the limit is really small (no idea what the cutoff would be), if instead of sorting and giving the first five, it instead just looks for the 5 largest/smallest. Anyone have any idea on this?

Erwin · on Aug 22, 2014

PostgreSQL use a "top-N sort heapsort" for this. You can see that in effect when doing "EXPLAIN ANALYZE" on a query like this. Without an index you cannot avoid a full table scan. However the top-N heapsort avoids allocating memory for ALL the rows, since you only care about the first 5 ones

For example, I had a 6 million entry table sitting around, and asked for top 5 rows by an unindexed column. With LIMIT 5 applied it told me:

          Sort Method: top-N heapsort  Memory: 95kB

Without:

           Sort Method: quicksort  Memory: 577595kB

So if Postgres had to store all the 6 million rows sorted it would need 577 MB of work memory. If (see SHOW work_mem) it was below that that would lead to it writing a lot of temporary files to disk:

           Sort Method: external merge  Disk: 144072kB

Note how Postgres is more wasteful with memory usage than disk writes for temp storage.

Generally a query that requires you to do a full table scan on a large table should be used sparingly however.

delinka · on Aug 22, 2014

This is what an index is for on a SQL database. You declare an index on 'published' and and index on 'views' and the engine can now optimize selection of the records you requested. Without those indices, the engine must necessarily scan every row of the table at least once.

Yes, there are analyzers with the smarts to recognize that indices on particular fields would have helped a given query, but even if the engine's going to auto-create those for you, it then has to scan every row in the table to create the index that first time.

scotty79 · on Aug 22, 2014

Do you know any database that keeps the data in multiline human readable text files (like json, yaml, or even nicely formatted xml) but also provides some robustness, concurrent access by multiple users and indices to search the data fast?

My usecase would be to keep the data for a website in it and keep the datafiles themselves in repository so it can be backed up there, monitored and possibly merged.

nitrogen · on Aug 22, 2014

I don't know about indices, but at least at one point news.yc used flat files for its storage. IIRC an older version is included with the Arc language examples.

chippy · on Aug 21, 2014

Reminds me of the Ruby Standard Library Hash based datastore PStore: http://ruby-doc.org/stdlib-1.9.2/libdoc/pstore/rdoc/PStore.h... ... but without the query wrappers. Nice to see basic things done well.

andyl · on Aug 21, 2014

And ruby has YamlStore. I use PStore and YamlStore all the time - extremely simple and helpful. But - I wouldn't necessary call any of these tools 'databases', just persistent object stores.

jonny_eh · on Aug 21, 2014

Are all operations sychronous? If so, this kind of makes it a no-go for most uses with NodeJS.

eim_kasp · on Aug 22, 2014

Interesting concept, if using it in the right way can save you number of queries and add more speed to your application. Or could be great replacement, where you need reasonable ammounts of data saved, like contests or prelaunch signups, etc.

misiti3780 · on Aug 21, 2014

Serious question - why should I use this over MongoDB or any other NOSQL solution?

mkoryak · on Aug 22, 2014

You shouldnt. You should use this: https://github.com/sergeyksv/tingodb so that when you realize that you do need mongodb after all, you can just drop it in.

louischatriot · on Aug 22, 2014

Or NeDB which solve the same constraints :)

thristian · on Aug 21, 2014

It sounds like the NoSQL equivalent of SQLite. Which is to say, if you just want a local cache in a standalone program, rather than a network service (with all the maintenance and deployment ceremony that implies).

couchand · on Aug 21, 2014

If you just want a local cache in a standalone program, why not simply use a POJO? There's an awful lot of ceremony here and it doesn't seem to be adding any value.

josteink · on Aug 22, 2014

Sounds about as useful as the "XML-file only" databases some people decided was the smartest thing ever at the turn of the century.

You don't see those around anymore.

econic · on Aug 21, 2014

Anyone know if there is something similiar to this for Go?

fiatmoney · on Aug 21, 2014

https://github.com/HouzuoGuo/tiedot

Guillaume86 · on Aug 21, 2014

Nice, I use nedb in one if my projects, always good to have alternatives.

krazykringle · on Aug 22, 2014

Does it need to suck the entire database into memory to answer a query?

kpennell · on Aug 21, 2014

Is this similar to firebase?

couchand · on Aug 21, 2014

No, the storage backend here is the local filesystem.

korzun · on Aug 21, 2014

Benchmarking per 1000 iterations is pretty silly. My guess is that it will start to choke as soon as you have concurrency at a higher load.

jakejake · on Aug 22, 2014

I think showing a small benchmark like that somewhat indicates the target use case. I would just guess, it's something to use for single-user apps. Perhaps storing user preferences or something. Most likely an app that is storing data locally.

I think it's fairly safe to assume that this isn't going to be a wise choice for a server back-end.

derpalittlederp · on Aug 22, 2014

JSON files are my favorite NoSQL :D