This has like zero fault tolerance, and by zero I mean none. It doesn't even flush to disk... it has the persistence level of an in-memory hash and is totally unsafe in terms of multithreading. As a toy to show off, yeah, it could work, but at the end of the day... You can downvote me, but this is crap.
I think the idea is cool, actually I once worked on something similar in Go, very small code base. And adding flushing, caching and thread-safety is no black art. The charming thing here is that, once severe Data errors happen, one can easily recover from them using a simple Editor.
Imagine how to fix low-level data corruption on your <Insert your DBMS name> Server.
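To make the "flushing is no black art" point concrete, here is a minimal sketch of one common way to flush a JSON file safely: write to a temporary file, fsync it, then rename it over the real file. The helper names (saveAtomic, load) and the file name are made up for illustration; this is not how lowdb does it.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper (not part of lowdb): write the JSON to a temp file,
// fsync it, then rename it over the target so a reader never sees a
// half-written file.
function saveAtomic(file, data) {
  var tmp = file + '.tmp';
  var fd = fs.openSync(tmp, 'w');
  try {
    fs.writeSync(fd, JSON.stringify(data, null, 2));
    fs.fsyncSync(fd); // push the bytes to disk before the rename
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, file); // replaces the old file in one step on POSIX
}

// Hypothetical helper: read the whole file back.
function load(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Usage with a throwaway file in the OS temp directory.
var dbFile = path.join(os.tmpdir(), 'flatdb-demo-' + process.pid + '.json');
saveAtomic(dbFile, { songs: [{ title: 'a', views: 3 }] });
var loaded = load(dbFile);
console.log(loaded.songs.length);
fs.unlinkSync(dbFile); // clean up the demo file
```

Because the file stays pretty-printed JSON, the "fix it with an editor" recovery story still holds. This only covers flushing; it does nothing for concurrent writers.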
once severe Data errors happen, one can easily recover from them using a simple Editor.
Unfortunately, in this case, you can't necessarily. One of the issues here is that holding on to handles to multiple collections doesn't do what it seems to, and also doesn't fail. Try running this:
var low = require('lowdb');
var foo = low('foo');
low('bar');
low.on('add', function(collection) {
  console.log("this should be foo:", collection);
});
foo.insert({ name: 'foo' });
It's a serverless, flat-file, document-oriented database which supports indexing and transactions and comes with a built-in ORM layer and a rich query language modeled after MongoDB.
Not really. Different hardware constraints can make serious differences in performance metrics. Some systems perform better on high memory systems and low hd speed, others are the opposite.
Yep, it's also missing what collection sizes the benchmarks were run on. Whether that's a 1-doc, 10-doc or 10k-doc collection, the results are going to be very different. With these kinds of results I expect the dataset size to be quite small.
I was thinking almost the same, but to be fair that benchmark is useful to see how different operations compare (eg, delete is 29 times slower than read).
Please don't use this for anything significant. Shared global state and synchronous file I/O pretty much guarantee that it will blow up in any non-trivial use.
Perhaps this could benefit from ArangoDB's (1) architecture -- using memory mapped files in append mode and writing to file asynchronously with a new revision number for each write. Or conversely, ArangoDB could benefit from a simple API like this one.
When I say trivial I mean one machine, one user, no errors.
The shared global state (see here: https://github.com/typicode/lowdb/blob/481cf43d6b0a52c1cb996...) means that attempting to use more than one collection will cause issues but it will silently succeed. The synchronous file I/O will grind your process to a halt.
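For anyone wondering how shared state can "silently succeed", here is a hypothetical sketch of the failure mode. This is NOT lowdb's source; sharedHandle and ownHandle are invented names that only contrast a module-level variable with per-handle state.

```javascript
// Hypothetical illustration only; this is NOT lowdb's source.

// Variant 1: one module-level variable shared by every handle.
var current = null;
function sharedHandle(name) {
  current = name; // each call overwrites the previously selected collection
  return {
    insert: function (doc) { return { collection: current, doc: doc }; }
  };
}

// Variant 2: each handle closes over its own collection name.
function ownHandle(name) {
  return {
    insert: function (doc) { return { collection: name, doc: doc }; }
  };
}

var foo = sharedHandle('foo');
sharedHandle('bar');
var leaked = foo.insert({ name: 'foo' }); // goes to 'bar', and nothing throws

var foo2 = ownHandle('foo');
ownHandle('bar');
var kept = foo2.insert({ name: 'foo' }); // stays in 'foo'

console.log(leaked.collection, kept.collection);
```

The first variant never raises an error, which is what makes this kind of bug hard to notice.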
If you are serving a mock API for the purpose of local development you might be fine. If you're trying to build any web application at all, this isn't the right choice.
Well I suppose then maybe I have the wrong idea of what "flat file database" means. To you anyway. I would not actually describe sqlite as a flat file database system... Since the data cannot be read safely directly from the file, using a text editor, or -- since a flat file could be binary data, any arbitrary program that just knows the format. And "flat" implies there is no special data structure involved. I would think that would not be the case for sqlite given how well it performs, it must use some kind of binary tree or clever indexing system of some sort.
You must go via the actual sqlite process.
On the other hand, maildir, composed of multiple ASCII-compatible text files, can be read by any email client, provided it follows the protocol for reading and writing files in the maildir spec.
The term "flat file database" conventionally refers to a database that is serialized to a single file, whether it is plain text or a binary. The emphasis here is on the single file aspect: it's really easy to transfer the entire database between machines.
Another meaning of "flat file database" is non-relational -- the kind of thing you might create if you were unaware of normal form and tried to cram the entire data model into a single table. This single table can reside in a single flat file. The file is flat as in not related to any other files. Here is an example "flat file" data model:
This approach has all kinds of problems (solved 40+ years ago by Codd and others). Examples: how do you add a third phone for an employee? How do you keep the manager's phone from getting out of sync across rows?
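A small sketch of the out-of-sync problem, using made-up rows. The column names (phone1, phone2, mgr_name, mgr_phone) and the data are assumptions for illustration, not taken from any real schema.

```javascript
// Made-up rows: one flat table, with the manager's data repeated on every row.
var employees = [
  { emp_id: 1, name: 'Ann', phone1: '555-0101', phone2: '555-0102', mgr_name: 'Zoe', mgr_phone: '555-0199' },
  { emp_id: 2, name: 'Bob', phone1: '555-0103', phone2: null, mgr_name: 'Zoe', mgr_phone: '555-0199' }
];

// Zoe changes her number, but only one of the rows gets updated.
employees[0].mgr_phone = '555-0200';

// Collect the distinct phone numbers recorded for each manager.
function managerPhones(rows) {
  var seen = {};
  rows.forEach(function (r) {
    seen[r.mgr_name] = seen[r.mgr_name] || [];
    if (seen[r.mgr_name].indexOf(r.mgr_phone) === -1) {
      seen[r.mgr_name].push(r.mgr_phone);
    }
  });
  return seen;
}

var phones = managerPhones(employees);
console.log(phones); // Zoe now has two different numbers on file
```

In a normalized design the manager's phone would live in exactly one row, so there would be nothing to get out of sync.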
I took "flat JSON file database" to mean essentially the same thing. Something like this:
emp_id
emp_JSON_object
JSON perhaps solves some issues (e.g. adding a third phone for an employee), but still suffers from most of them (e.g. keeping the manager's phone from getting out of sync across employees). Plus it suffers from new issues, namely that each JSON object could have its own layout (e.g. one of them could have a third phone), so there is a bunch of parsing overhead compared to, say, knowing that the phone number is at a specific offset on every row.
I wouldn't, but I'm not really an expert here. If every table is three files, that implies a non-trivial database would be dozens of files.
I suppose the other use of "flat file" I've heard is a flat directory with no sub-directories, but I've never heard that applied to a "flat file database".
SQLite files are not "flat files" by any definition commonly in use, including the "narrow" and "broader" definitions in that article. (Which, incidentally, is almost entirely unsourced fact claims, which violates Wikipedia's own quality standards.)
Great, you take issue with these things. Do you have a source you can point to? Could you use your time improving Wikipedia or at least pointing us poor saps on HN to more detailed information? That seems like it would be more useful than looking for a minor UTF-8 character to "prove" your point.
What are the definitions in common use that you know? Is it possible there are others you aren't aware of?
It should be noted that the semantics of SQLite's behavior in multithreaded mode are pretty cryptic, even to its long-term users (and others just silently give up and hand-roll coarse-grained locks, I suspect):
There is no problem with having a flat-file database that's multi-everything. MySQL's InnoDB, the transactional engine, can be a single-file database. [Small exception: part of the schema is stored in two separate files for easier access, though it wouldn't be a big deal to store them in the big file. The transaction log is also not in the same file, for obvious reasons.]
If you want a full serialization of the memory to the 'flat' file - then no. It just makes no sense.
Yet a single-file database that's multi-user, multi-threaded and multi-transaction is entirely viable.
Related is tiny[0], an in-process document store that supports Mongo-style queries, as well as a style similar to CouchDB views. You can dump its contents to a JSON file.
Also interesting is PouchDB[1], another document storage library. It can be used with Node or in the browser through various backends (like IndexedDB), and can even replicate to CouchDB.
So, basically, it's not meant to be used in critical / intensive applications.
Instead, it's much more a new convenient way to store data in simple use cases.
Regarding file writing, if your database is small or if you don't run a cluster of Node processes, you should be fine.
Regarding the benchmark, as someone pointed out, it’s mainly there to show that storing to a JSON file is fast enough and to compare the speed of operations. I agree that it says nothing about other databases, and LowDB doesn't try to be the fastest either, just fast enough.
By the way, 'npm run benchmark' lets you run it on your machine.
However, keep in mind too that LowDB's official release is quite recent, so it should improve over time.
Anyway, thanks for all the feedback and I hope you’ll have fun with it :)
I do see a 'production' use case: I have a static site that uses Backbone to load a static JSON collection of pages. The JSON file itself is very small (less than 100k) and will not grow much with time. I built a small CMS to let me edit pages; the data for the CMS is dynamic (MongoDB), and I update the static JSON file after each edit. I could instead use a CMS that uses LowDB to edit the JSON file directly. Keep in mind that this is one user making one or two edits per day at most. If you realize that the vast majority of small sites/blogs out there have similar requirements, then this tool makes sense. No?
NeDB creator here. I'd like to know more about the issue you're describing, I've never heard about it and considering the test coverage I just don't see how it could happen.
Sure, for some reason I thought this was a known issue. I saw it mentioned somewhere else on GitHub, but it can't be that well known since you don't know about it. I'll open an issue this afternoon.
var topFiveSongs = low('songs')
  .where({published: true})
  .sortBy('views')
  .first(5)
  .value();
I wonder whether any DB engines figure out, when the dataset is really large and the limit is really small (no idea what the cutoff would be), that instead of sorting everything and returning the first five they can just look for the 5 largest/smallest. Anyone have any idea on this?
PostgreSQL uses a "top-N heapsort" for this. You can see it in effect by running "EXPLAIN ANALYZE" on a query like this. Without an index you cannot avoid a full table scan. However, the top-N heapsort avoids allocating memory for ALL the rows, since you only care about the first 5.
For example, I had a 6 million entry table sitting around, and asked for top 5 rows by an unindexed column. With LIMIT 5 applied it told me:
Sort Method: top-N heapsort Memory: 95kB
Without:
Sort Method: quicksort Memory: 577595kB
So if Postgres had to keep all 6 million rows sorted in memory, it would need about 577 MB of work memory. If work_mem (see SHOW work_mem) were below that, it would instead write a lot of temporary files to disk:
Sort Method: external merge Disk: 144072kB
Note how Postgres is more wasteful with memory usage than disk writes for temp storage.
Generally a query that requires you to do a full table scan on a large table should be used sparingly however.
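The general idea behind a top-N selection can be sketched in a few lines: scan every row once, but keep only the N best seen so far in a small min-heap. This is a generic illustration, not PostgreSQL's implementation; topN and the sample data are made up.

```javascript
// Return the n rows with the largest key(row), in descending order,
// while holding at most n rows in the heap at any time.
function topN(rows, n, key) {
  if (n <= 0) return [];
  var heap = []; // min-heap: heap[0] is the smallest row kept so far

  function swap(a, b) { var t = heap[a]; heap[a] = heap[b]; heap[b] = t; }

  function siftUp(i) {
    while (i > 0) {
      var p = (i - 1) >> 1;
      if (key(heap[p]) <= key(heap[i])) break;
      swap(p, i);
      i = p;
    }
  }

  function siftDown(i) {
    for (;;) {
      var l = 2 * i + 1, r = l + 1, m = i;
      if (l < heap.length && key(heap[l]) < key(heap[m])) m = l;
      if (r < heap.length && key(heap[r]) < key(heap[m])) m = r;
      if (m === i) break;
      swap(m, i);
      i = m;
    }
  }

  rows.forEach(function (row) {
    if (heap.length < n) {
      heap.push(row);
      siftUp(heap.length - 1);
    } else if (key(row) > key(heap[0])) {
      heap[0] = row; // evict the current minimum
      siftDown(0);
    }
  });

  // Only n rows are left, so this final sort is cheap.
  return heap.sort(function (a, b) { return key(b) - key(a); });
}

// Made-up data: 1000 songs with distinct view counts.
var songs = [];
for (var i = 0; i < 1000; i++) {
  songs.push({ id: i, views: (i * 7919) % 1009 });
}
var top5 = topN(songs, 5, function (s) { return s.views; });
console.log(top5);
```

It still touches every row (the full scan is unavoidable without an index), but memory stays proportional to N rather than to the table size.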
This is what an index is for on a SQL database. You declare an index on 'published' and an index on 'views' and the engine can now optimize selection of the records you requested. Without those indices, the engine must necessarily scan every row of the table at least once.
Yes, there are analyzers with the smarts to recognize that indices on particular fields would have helped a given query, but even if the engine's going to auto-create those for you, it then has to scan every row in the table to create the index that first time.
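As a rough illustration of that trade-off, here is a toy hash index built with a Map. Real SQL engines typically use B-trees and far more machinery; buildIndex and the sample rows are invented for this sketch.

```javascript
// Build a toy index: field value -> array of matching rows.
// Building it takes one full scan; later lookups skip the scan.
function buildIndex(rows, field) {
  var index = new Map();
  rows.forEach(function (row) {
    var k = row[field];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(row);
  });
  return index;
}

// Made-up rows.
var songs = [
  { title: 'a', published: true, views: 10 },
  { title: 'b', published: false, views: 99 },
  { title: 'c', published: true, views: 42 }
];

var byPublished = buildIndex(songs, 'published'); // the one-time scan
var published = byPublished.get(true) || [];      // no scan needed here
console.log(published.map(function (s) { return s.title; }));
```

The index also has to be kept up to date on every insert, update and delete, which is the cost you pay for the faster reads.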
Do you know any database that keeps the data in multiline human readable text files (like json, yaml, or even nicely formatted xml) but also provides some robustness, concurrent access by multiple users and indices to search the data fast?
My usecase would be to keep the data for a website in it and keep the datafiles themselves in repository so it can be backed up there, monitored and possibly merged.
I don't know about indices, but at least at one point news.yc used flat files for its storage. IIRC an older version is included with the Arc language examples.
And Ruby has YAML::Store. I use PStore and YAML::Store all the time: extremely simple and helpful. But I wouldn't necessarily call any of these tools 'databases', just persistent object stores.
Interesting concept. Used in the right way, it can save you a number of queries and add speed to your application. It could also be a great replacement where you need to save reasonable amounts of data, like contests or prelaunch signups, etc.
You shouldn't. You should use this: https://github.com/sergeyksv/tingodb so that when you realize that you do need MongoDB after all, you can just drop it in.
It sounds like the NoSQL equivalent of SQLite. Which is to say, if you just want a local cache in a standalone program, rather than a network service (with all the maintenance and deployment ceremony that implies).
If you just want a local cache in a standalone program, why not simply use a POJO? There's an awful lot of ceremony here and it doesn't seem to be adding any value.
I think showing a small benchmark like that somewhat indicates the target use case. I would just guess, it's something to use for single-user apps. Perhaps storing user preferences or something. Most likely an app that is storing data locally.
I think it's fairly safe to assume that this isn't going to be a wise choice for a server back-end.