At Couchbase we did a survey of developers (this was ages ago) and the biggest m...

somedudethere · on Aug 19, 2015

I'm in the opposite camp. I don't like NoSQL because of the flexible schema. I have to build tools to make sure my data is consistent or have error handling. Migrations ensure that whenever I pull down a version of the code, the database is in the right state.

I don't see too many use cases where being schemaless is a good thing outside of infrastructure ease of use. If you want to store arbitrary data in a table, JSONB in Postgres is very efficient and flexible in that regard, while allowing you to have a schema for the rest of the data (e.g. if you are collecting data you probably want a schema for things like name, email, timestamp, etc. and then have a JSONB field to jam in whatever you need).
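A rough sketch of that hybrid layout, for illustration only. It uses SQLite through Python's standard library as a stand-in for Postgres, so the free-form part is stored as JSON text rather than in a real JSONB column. The table name `signups`, the column `extra`, and the helper functions are all made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE signups (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,                 -- fixed, schema-enforced fields
        email      TEXT NOT NULL UNIQUE,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        extra      TEXT NOT NULL DEFAULT '{}'     -- free-form JSON (JSONB in Postgres)
    )
    """
)


def add_signup(name, email, **extra):
    """Insert a row; anything beyond name/email goes into the JSON column."""
    conn.execute(
        "INSERT INTO signups (name, email, extra) VALUES (?, ?, ?)",
        (name, email, json.dumps(extra)),
    )


def get_extra(email):
    """Return the free-form part of a signup as a Python dict."""
    row = conn.execute(
        "SELECT extra FROM signups WHERE email = ?", (email,)
    ).fetchone()
    return json.loads(row[0])


def rejects_duplicate_email():
    """The schema still guards the structured columns."""
    try:
        add_signup("Ada again", "ada@example.com")
    except sqlite3.IntegrityError:
        return True
    return False


add_signup("Ada", "ada@example.com", referrer="newsletter", beta=True)
```

The structured columns keep their constraints (`NOT NULL`, `UNIQUE`), while `extra` accepts whatever shape the application hands it.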

nstart · on Aug 20, 2015

Having worked in ERP, Ecommerce, Financial tech, and general SaaS-based tech stuff, I agree with you on the "not too many use cases where being schemaless is a good thing". In most cases it's a shortcut and an unnecessary tradeoff made by people to avoid a few technical issues like DB migrations (also a solved problem in many ways, as long as you don't attempt to reinvent the wheel). The only time I saw a good use for a NoSQL db was to store products in Ecommerce. Managing taxonomy and attributes was always a nightmare and everyone was constantly afraid of performance issues (being on Magento and battling the EAV system didn't help). It would have been great to have only the products stored on a NoSQL instance and the rest of the data on the traditional relational data store.

mkehrt · on Aug 19, 2015

As someone who spent several years studying programming languages, the thing that drives me crazy about traditional relational databases is the assumption that all data is tuple-structured. Much data is structured as unions of alternates or more complex things like maps. Shoehorning your data model into a tuple-based system is always possible, but often unnatural.

The place NoSQL shines is the acknowledgement that most data is complex. Of course it often also does away with ACIDity, which is a huge disaster (EDIT: the doing-away-with is a disaster, I mean).

duaneb · on Aug 20, 2015

I don't think this is correct. You can have a relational, columnar-stored key-value map that stores any values you want. Bonus: it's super easy to make these kinds of updates ACID. Of course, if you're maxing out the storage space, you're gonna have a rough time with indexes unless you take the EXACT SAME approach as you would with NoSQL.
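As a minimal sketch of that point (SQLite via Python's standard library, with a made-up `kv` table and `transfer` helper; nothing here is specific to a columnar store), a plain key-value table gets atomic multi-key updates from the database's ordinary transactions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [("alice", "10"), ("bob", "5")])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Credit first, so a failed debit has something to roll back.
        for key, delta in ((dst, amount), (src, -amount)):
            (value,) = conn.execute(
                "SELECT v FROM kv WHERE k = ?", (key,)
            ).fetchone()
            new_value = int(value) + delta
            if new_value < 0:
                raise ValueError(f"{key} would go negative")
            conn.execute(
                "UPDATE kv SET v = ? WHERE k = ?", (str(new_value), key)
            )


def snapshot():
    """Return the whole map as a dict."""
    return dict(conn.execute("SELECT k, v FROM kv ORDER BY k"))
```

If the debit fails, the earlier credit is rolled back with it, so readers never observe a half-applied update.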

I don't think there are any "inherent" problems to relational or NoSQL databases, but there are many tradeoffs. The tradeoff of NoSQL databases is that complexity gets very, very difficult to pull off in a distributed fashion. So throw 99% of the indices out the window, dumb your queries down, and cache any joins or scans as much as possible. The upside, I guess, is that the "schema" is pretty irrelevant if it's not your primary key (or secondary, in some databases). But, you lose joins, schemas, subqueries, orderings, many types of transactions, etc, etc, and a lot of "free" stuff that is really only "free" for small numbers of rows per table or strong assumptions about the data.

EDIT: Clarification, spelling.

parasubvert · on Aug 20, 2015

The relational model is extremely general; it's been argued (fairly successfully) by Date and Codd to be THE most general model (Graph being a close second). It's a rigorous approach to managing data with integrity.

I used to be a programming language oriented person, was big into data structures and objects, but then I read Date and my mind was blown at how beautiful and expressive the relational model is -- for its intended purpose (managing data for logical integrity and ad hoc queriability).

The main issues are:

1. Many implementations don't include some features, such as unions.

2. Certain things (tree traversal) have also been hard to express in older versions of SQL or older versions of Tutorial D (Chris Date's language that's closer to the model).

3. Sometimes you don't care about long term data management (i.e. ad hoc queriability and integrity), you just want programmatic data persistence with pre-baked access paths that are FAST.

4. Relational integrity features are often crude implementations that slow things down too much or require custom triggers.

5. Queriability in reality requires decent knowledge of the physical layout and indexing if you're going to make it performant.

6. Most relational databases have not been built in a cloud-native era where we assume distribution across ephemeral disks and compute.

So... great mathematical model, great way to think through and organize your data for no ambiguity, but the practical implementations leave a lot to be desired.

The problem is that "my data is too complex for the relational model" often means "I haven't thought through my data". Things like maps, unions, ordered sets, N-ary relationships, graphs and trees are actually quite straightforward to represent in relations. The challenge is that many of the lessons and arguments for this are trapped in books from the 70s-90s, not on the Web.
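To make that concrete, here is a small sketch (SQLite via Python's standard library; the `category` and `category_attr` tables and both helpers are invented for the example). A tree is stored as an adjacency list, a map as one row per entry, and the tree is walked with a recursive common table expression, which newer SQL dialects provide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- A tree as an adjacency list: each row points at its parent.
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)  -- NULL marks the root
    );
    -- A map per category: one row per (category, attribute) entry.
    CREATE TABLE category_attr (
        category_id INTEGER NOT NULL REFERENCES category(id),
        attr        TEXT NOT NULL,
        val         TEXT NOT NULL,
        PRIMARY KEY (category_id, attr)
    );
    INSERT INTO category VALUES
        (1, 'products', NULL), (2, 'books', 1), (3, 'fiction', 2), (4, 'music', 1);
    INSERT INTO category_attr VALUES
        (3, 'shelf', 'A4'), (3, 'format', 'paperback');
    """
)


def descendants(root_id):
    """Names in the subtree rooted at root_id (root included), sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, name) AS (
            SELECT id, name FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name
            FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT name FROM subtree ORDER BY name
        """,
        (root_id,),
    )
    return [name for (name,) in rows]


def attrs(category_id):
    """Rebuild the per-category map as a Python dict."""
    return dict(conn.execute(
        "SELECT attr, val FROM category_attr WHERE category_id = ?",
        (category_id,),
    ))
```

The primary key on `(category_id, attr)` is what makes `category_attr` behave like a map: each attribute can appear at most once per category.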

cmrdporcupine · on Aug 20, 2015

Agree, 100%. Early in my career I too was very enamored of object and graph databases (this is pre the 'nosql' buzz), but once I started reading Date and Codd (and the inflammatory Fabian Pascal) some lights started turning on in my brain.

Firstly, it should be made clear to people that SQL is not truly relational, and a lot of the things people dislike about it are nothing to do with the relational model and more to do with its late 70s, early 80s heritage. It was thrown together at a time when business systems were still very focused around COBOL.

The second thing that people are not picking up on is that the industry _already_ went through a pre-sql "nosql" phase in the 60s and 70s when network and hierarchical databases were the norm, and the relational model was developed to deal with the perceived faults those systems had: an enforced topology which could not easily be rearranged at query time, lack of rigor in modeling, lack of standardized modeling concepts and notation...

Finally, I did find in previous jobs certain uses for nosql systems -- very low latency high throughput quasi-realtime systems that deal with very small bits of simply structured data and need to distribute it widely across a cluster. For that I used Cassandra (tho I understand now that there are successors that are better).

What I don't get is the point of systems like Redis or MongoDB which don't offer a compelling distribution implementation and simply replace the fairly well understood quasi-relational model of SQL with their own ultimately inferior graph/network/hierarchical models.

parasubvert · on Aug 20, 2015

As a fan of Cassandra (though not for its relational model ;), what successors do you believe are better? Riak is the only one that comes to mind.

Btw, to me the point of Redis and Mongo is a very fast distributed dictionary. They're data structure servers, for when you want to persist and share data structures across processes, not "manage information". It depends on your goal.

mkehrt · on Aug 20, 2015

Oh, this is true, and I think I came off too harshly. The relational model is general enough to support (almost?) any data model, it certainly has a lot of advantages in terms of efficient implementation, and the math is elegant.

I just don't think that it naturally reflects the data structures people use, and we should be willing to make the computer do the work, rather than humans.

rhizome · on Aug 20, 2015

Where do you get that "most data is complex" in a way that leans away from relational DBs?