Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
LowDB – A flat JSON file database (github.com/typicode)
148 points by ca98am79 on Aug 21, 2014 | hide | past | favorite | 70 comments


This has like zero fault tolerancy and by zero i mean none. Don't even push flush to disk... it has a level of a in memory hash of persistence and is totally insecure in terms of multithreading. As a Toy to show off yeah could work but... in the very end of the day. You can minus me but this is crap.


I think the idea is cool, actually I once worked on something similar in Go, very small code base. And adding flushing, caching and thread-safety is no black art. The charming thing here is that, once severe Data errors happen, one can easily recover from them using a simple Editor.

Imagine how to fix low-level data corruption on your <Insert your DBMS name> Server.


once severe Data errors happen, one can easily recover from them using a simple Editor.

Unfortunately, in this case, you can't necessarily. One of the issues here is that holding on to handles to multiple collections doesn't do what it seems to, and also doesn't fail. Try running this:

    var low = require('lowdb');

    var foo = low('foo');
    low('bar');

    low.on('add', function(collection) {
        console.log("this should be foo:", collection);
    });

    foo.insert({ name: 'foo' });


Actually, in my code I always start with low():

  low('foo').insert({ name: 'foo' })
  low('bar').insert({ name: 'bar' })
  low('foo').insert({ name: 'foo' })
So never really considered this way of writing or had this bug.

But I can see that it's a flaw and unexpected behavior. I'm adding this to the enhancements for the next version.

Thanks for spotting it :)


Just a little update. I've released LowDB 0.4.0 which is a complete rewrite of the project.

It fixes some initial flaws/bugs and includes some new features. Also, writing is now fault-tolerant and fully asynchronous.


How. Dare. You. (now please downvote me)


If you're looking for something similar for Python, check out BlitzDB:

https://github.com/adewes/blitzdb

It's a serverless, flat-file, document-oriented database which supports indexing and transactions and comes with a built-in ORM layer and a rich query language modeled after MongoDB.


On a similar tip for PHP, check out my library Flywheel:

https://github.com/jamesmoss/flywheel

It's no way near as powerful as BlitzDB but I'm slowly adding features to it.



I've been using TinyDB[1] on Python for a little project - it seems to offer similar functionality.

[1] http://tinydb.readthedocs.org/en/latest/usage.html


Unsure if OP is the maintainer, but a quick point: Benchmarks without any indication of underlying hardware don't convey a huge amount of information.


I think the most important thing is to compare with other known dbs on the same hardware, though.


Not really. Different hardware constraints can make serious differences in performance metrics. Some systems perform better on high memory systems and low hd speed, others are the opposite.


Yep, also missing on what collection sizes benchmarks are performed. If that's on a 1 doc, 10 docs or 10k docs collections the results are going to be very different. With these kidns of results I expect the dataset size to be quite small.

Also benchmarks should be runnable by users :)


To be fair, it is in the repo, under the /benchmark folder. But don't go look at them, it will make you sad.


I was thinking almost the same, but to be fair that benchmark is useful to see how different operations compare (eg, delete is 29 times slower than read).


Please don't use this for anything significant. Shared global state and synchronous file I/O pretty much guarantee that it will blow up in any non-trivial use.


Perhaps this could benefit from ArangoDB's (1) architecture -- using memory mapped files in append mode and writing to file asynchronously with a new revision number for each write. Or conversely, ArangoDB could benefit from a simple API like this one.

(1) https://www.arangodb.org


What demarcates trivial and non-trivial applications? # of machines, guaranteeing xyz?


When I say trivial I mean one machine, one user, no errors.

The shared global state (see here: https://github.com/typicode/lowdb/blob/481cf43d6b0a52c1cb996...) means that attempting to use more than one collection will cause issues but it will silently succeed. The synchronous file I/O will grind your process to a halt.

If you are serving a mock API for the purpose of local development you might be fine. If you're trying to build any web application at all, this isn't the right choice.


Do you think in theory that it is possible to create a flat file based system that is multi-thread/multiuser safe?

My somewhat ignorant inclination is to suggest that maildir fits this description.


Certainly. SQLite is a great example of one. It handles the tradeoff elegantly: reads can be done in parallel but writes have to be sequential.

I don't believe maildir would be described as a flat file.


Well I suppose then maybe I have the wrong idea of what "flat file database" means. To you anyway. I would not actually describe sqlite as a flat file database system... Since the data cannot be read safely directly from the file, using a text editor, or -- since a flat file could be binary data, any arbitrary program that just knows the format. And "flat" implies there is no special data structure involved. I would think that would not be the case for sqlite given how well it performs, it must use some kind of binary tree or clever indexing system of some sort.

You must go via the actual sqlite process.

On the other hand maildir, composed of multiple ascii compatible text files, can be read by any email client- provided it follows the protocol for reading and writing of files in the maildir spec.


The term "flat file database" conventionally refers to a database that is serialized to a single file, whether it is plain text or a binary. The emphasis here is on the single file aspect: it's really easy to transfer the entire database between machines.


Another meaning of "flat file database" is non-relational -- the kind of thing you might create if you were unaware of normal form and tried to cram the entire data model into a single table. This single table can reside in a single flat file. The file is flat as in not related to any other files. Here is an example "flat file" data model:

emp_id emp_fname emp_lname emp_phone1 emp_phone1_type emp_phone2 emp_phone1_type dept_id dept_manager dept_manager_phone1 dept_manager_phone1_type

This approach has all kinds of problems (solved 40+ years ago by Codd and others). Examples: how do you add a third phone for an employee. How do you keep the managers phone from getting out of sync across rows.

I took "flat JSON file database" to mean essentially the same thing. Something like this:

emp_id emp_JSON_object

JSON perhaps solves some issues (e.g. adding a third phone for an employee), but still suffers from most of the issues (e.g. keeping the managers phone from getting out of sync across employees). Plus it suffers from new issues, namely the fact that each JSON object could have its own layout (e.g. one of them could have a third phone) so there is a bunch of parsing overhead compared to say knowing that phone number is at a specific offset on every row.


MyISAM serializes each table to a single file on disk (well, three files, one each for the definition, data and indexes).

Would you consider MySQL to be a flat file database?


I wouldn't, but I'm not really an expert here. If every table is three files, that implies a non-trivial database would be dozens of files.

I suppose the other use of "flat file" I've heard is a flat directory with no sub-directories, but I've never heard that applied to a "flat file database".


>>And "flat" implies there is no special data structure involved.

Flat files doesn't mean neither text file, nor human-readable. Text format, human-readable format is a terrible one to store data, esp. numbers.


>> since a flat file could be binary data...


SQLite is not a flat file database, its an inprocess database that doesn't use flat files.


"As of 2011, one of the most popular flat file database engines is SQLite"

https://en.wikipedia.org/wiki/Flat_file_database#Contemporar...


"It is in Wikipedia" ⇏ "It is true"

SQLite files are not "flat files" by any definition commonly in use, including the "narrow" and "broader" definitions in that article. (Which, incidentally, is almost entirely unsourced fact claims, which violates Wikipedia's own quality standards.)


Great, you take issue with these things. Do you have a source you can point to? Could you use your time improving Wikipedia or at least pointing us poor saps on HN to more detailed information? That seems like it would be more useful than looking for a minor UTF-8 character to "prove" your point.

What are the definitions in common use that you know? Is it possible there are others you aren't aware of?


Should be noticed, the semantics of SQLite behavior in multithreaded mode is pretty cryptic, even to its long-term users (and others just silently give up, and hand-roll coarse-grained locks, I suspect):

http://sqlite.1065341.n5.nabble.com/think-I-need-better-erro...


The linked discussion has one of the most succinct comments about error handling I've seen.

You handle the result codes you know what to do with, and everything that remains means your program should explode immediately.


:) problem is, there's no clear understanding of how one can avoid the possibility of unwanted error codes ever returned in his program, but yes


There is zero issue to have a flat file database that's all multi. MySQL InnoDB, the transactional one engine, can be a single file database. [Small exception - part of the schema is stored on two separates files for easier access, there won't be a big deal to store them in the big file. Transaction log is also not in the same file for obvious reasons]

If you want a full serialization of the memory to the 'flat' file - then no. It just makes no sense. Yet, a single file databases that's multi-user, mulch-threaded and mult-transaction is all viable.


How much someone is paying you to write it and how much it costs to replace are good criteria to start with.


Related is tiny[0], an in-process document store that supports Mongo-style queries, as well as a style similar to CouchDB views. You can dump its contents to a JSON file.

Also interesting is PouchDB[1], another document storage library. It can be used with Node or in the browser through various backends (like IndexedDB), and can even replicate to CouchDB.

[0]: https://github.com/chjj/tiny [1]: http://pouchdb.com/


Hi everyone,

First, thanks for all the interest, it’s quite sudden and unexpected.

Actually, LowDB is an extract from JSON Server, a mocking REST server based on plain JSON (https://github.com/typicode/json-server).

So, basically, it's not meant to be used in critical / intensive applications.

Instead, it's much more a new convenient way to store data in simple use cases.

Regarding file writing, if your database is small or if you don't run a cluster of Node processes, you should be fine.

Regarding benchmark, as someone pointed it out, it’s mainly to show that storing to JSON file is fast enough and to compare operations speed. I agree that it says nothing about other databases and LowDB doesn't try to be the fastest either, just fast enough. By the way, 'npm run benchmark' lets you run it on your machine.

However, keep in mind too that LowDB official release is quite recent so it should be improved over time.

Anyway, thanks for all the feedbacks and I hope you’ll have fun with it :)


I do find a 'production' use case: I have a static site that uses Backbone to load a static json collection of pages. The json file itself is very small (less than 100k) and will not grow much with time. I built a small CMS to let me edit pages, the data for the CMS is dynamic (MongoDB), and I update the static json file after edit. I could use a CMS that uses LowDB instead to edit the json file directly. Keep in mind that this is one user making one or two edits per day at most. If your realize that the vast majority of small sites/blogs out there have similar requirements, then this tool makes sense. No?


I have a hard time believing that this will be performant for anything of signifiance.


here is another flat file db that ive been eyeing: https://github.com/sergeyksv/tingodb

upward compatible with mongodb.

Ive also used nedb, which is ok but tends to delete the datastore when it runs out of disk space :P


NeDB creator here. I'd like to know more about the issue you're describing, I've never heard about it and considering the test coverage I just don't see how it could happen.

Could you post an issue on https://github.com/louischatriot/nedb with your environment details and how to reproduce this?


Sure, for some reason I thought this was a known issue. I saw it mentioned somewhere else on github - but its can't be that well known since you don't know about it. Ill open an issue this afternoon.


Also, I've used nedb and highly recommend it. It's fast and comprehensive.

https://github.com/louischatriot/nedb


I always wondered in this situation:

    var topFiveSongs = low('songs')
      .where({published: true})
      .sortBy('views')
      .first(5)
      .value();
If any db engines figure when the dataset is really large and the limit is really small (no idea what the cutoff would be), if instead of sorting and giving the first five, it instead just looks for the 5 largest/smallest. Anyone have any idea on this?


PostgreSQL use a "top-N sort heapsort" for this. You can see that in effect when doing "EXPLAIN ANALYZE" on a query like this. Without an index you cannot avoid a full table scan. However the top-N heapsort avoids allocating memory for ALL the rows, since you only care about the first 5 ones

For example, I had a 6 million entry table sitting around, and asked for top 5 rows by an unindexed column. With LIMIT 5 applied it told me:

          Sort Method: top-N heapsort  Memory: 95kB
Without:

           Sort Method: quicksort  Memory: 577595kB
So if Postgres had to store all the 6 million rows sorted it would need 577 MB of work memory. If (see SHOW work_mem) it was below that that would lead to it writing a lot of temporary files to disk:

           Sort Method: external merge  Disk: 144072kB
Note how Postgres is more wasteful with memory usage than disk writes for temp storage.

Generally a query that requires you to do a full table scan on a large table should be used sparingly however.


This is what an index is for on a SQL database. You declare an index on 'published' and and index on 'views' and the engine can now optimize selection of the records you requested. Without those indices, the engine must necessarily scan every row of the table at least once.

Yes, there are analyzers with the smarts to recognize that indices on particular fields would have helped a given query, but even if the engine's going to auto-create those for you, it then has to scan every row in the table to create the index that first time.


Do you know any database that keeps the data in multiline human readable text files (like json, yaml, or even nicely formatted xml) but also provides some robustness, concurrent access by multiple users and indices to search the data fast?

My usecase would be to keep the data for a website in it and keep the datafiles themselves in repository so it can be backed up there, monitored and possibly merged.


I don't know about indices, but at least at one point news.yc used flat files for its storage. IIRC an older version is included with the Arc language examples.


Reminds me of the Ruby Standard Library Hash based datastore PStore: http://ruby-doc.org/stdlib-1.9.2/libdoc/pstore/rdoc/PStore.h... ... but without the query wrappers. Nice to see basic things done well.


And ruby has YamlStore. I use PStore and YamlStore all the time - extremely simple and helpful. But - I wouldn't necessary call any of these tools 'databases', just persistent object stores.


Are all operations sychronous? If so, this kind of makes it a no-go for most uses with NodeJS.


Interesting concept, if using it in the right way can save you number of queries and add more speed to your application. Or could be great replacement, where you need reasonable ammounts of data saved, like contests or prelaunch signups, etc.


Serious question - why should I use this over MongoDB or any other NOSQL solution?


You shouldnt. You should use this: https://github.com/sergeyksv/tingodb so that when you realize that you do need mongodb after all, you can just drop it in.


Or NeDB which solve the same constraints :)


It sounds like the NoSQL equivalent of SQLite. Which is to say, if you just want a local cache in a standalone program, rather than a network service (with all the maintenance and deployment ceremony that implies).


If you just want a local cache in a standalone program, why not simply use a POJO? There's an awful lot of ceremony here and it doesn't seem to be adding any value.


Sounds about as useful as the "XML-file only" databases some people decided was the smartest thing ever at the turn of the century.

You don't see those around anymore.


Anyone know if there is something similiar to this for Go?



Nice, I use nedb in one if my projects, always good to have alternatives.


Does it need to suck the entire database into memory to answer a query?


Is this similar to firebase?


No, the storage backend here is the local filesystem.


Benchmarking per 1000 iterations is pretty silly. My guess is that it will start to choke as soon as you have concurrency at a higher load.


I think showing a small benchmark like that somewhat indicates the target use case. I would just guess, it's something to use for single-user apps. Perhaps storing user preferences or something. Most likely an app that is storing data locally.

I think it's fairly safe to assume that this isn't going to be a wise choice for a server back-end.


JSON files are my favorite NoSQL :D




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: