
You need some migrations anyway or you'll get cruft in the db, or worse. Think of old documents with extra fields your software doesn't use anymore, or without fields that are now required. Multiply that by embedded documents and you get a ton of problems that you can only solve by taking care of the data.

This happens even in development, before going live for the first time, and way more often as you keep changing the software. Even if you throw away the data every time, you still have to update the seeding scripts (in a relational db you have seeding + schema changes).

Anyway, what did you do? Did you keep using JSONB with that planner config setting or did you extract some data to ordinary columns?



Totally agree. At a previous job, some of the senior engineers decided to use MongoDB as the main data store, and doing migrations was among the worst things about it. I think some engineers envision that they'll just be able to do read repair and things will magically work. In practice, you can only really do read repair when you have a workload oriented to reading single records at a time and you have strict controls on concurrent access to prevent weird A-B-A errors with read repair. Complex aggregate queries are almost always impossible with read repair. Even with single records, read repair is still a pain in the ass. You often have to maintain unmarshalling code for several versions of a record format. In the end, one of the engineers ended up having to write an internal migration tool (which was of course strictly worse than migrations via Postgres, because schema changes required rewriting a table with update queries, so we ended up needing a bit of downtime). Even with the migration tool, shipping always required a lot of people on call, since the migration code was brittle and migrations would inevitably break during the release process.
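
To make the "unmarshalling code for several versions" point concrete, here's a rough sketch of what read repair tends to look like. Everything in it is made up for illustration (the `schema_version` field, the upgrade functions, the field names); it isn't the code from that job, just the general shape of the pattern.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]

# Hypothetical: the newest record format the application understands.
CURRENT_VERSION = 3


def _v1_to_v2(doc: Doc) -> Doc:
    # Hypothetical change: a single "name" field was split in two.
    out = dict(doc)
    first, _, last = out.pop("name", "").partition(" ")
    out["first_name"] = first
    out["last_name"] = last
    out["schema_version"] = 2
    return out


def _v2_to_v3(doc: Doc) -> Doc:
    # Hypothetical change: "email" became an expected field.
    out = dict(doc)
    out.setdefault("email", None)
    out["schema_version"] = 3
    return out


# One upgrade step per old version; every old format needs code forever.
UPGRADES: Dict[int, Callable[[Doc], Doc]] = {1: _v1_to_v2, 2: _v2_to_v3}


def read_repair(doc: Doc) -> Doc:
    """Upgrade a single document to the current format at read time.

    Assumes documents with no schema_version field are version 1.
    """
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc
```

Note that this only handles one record at a time. An aggregate query running inside the database never goes through `read_repair`, which is why it falls apart for anything beyond single-record reads, and writing the repaired document back safely needs the concurrency controls mentioned above.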

As for the above story, that engineer was sort of on his way out at the time, so I used the above method to provide query hints as a short-term fix. After he left, I was able to restructure the event data schema to make more use of columns. Some of the ancillary attributes that weren't used for row selection stayed as jsonb, but things like timestamp, event name, user id, etc. were moved to columns.
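
Roughly, the restructuring amounts to splitting each event into "fields used for row selection" and "everything else". A minimal sketch of that split, assuming the event arrives as a dict; the key names (`timestamp`, `event_name`, `user_id`) and the `split_event` helper are illustrative guesses, not the actual schema:

```python
import json
from typing import Any, Dict, Tuple

# Assumed key names for the attributes that get promoted to real columns.
PROMOTED_KEYS = ("timestamp", "event_name", "user_id")


def split_event(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Split an event into column values and a leftover JSON string.

    Promoted keys missing from the payload come back as None; the
    remaining attributes are serialized for a jsonb-style column.
    """
    columns = {key: payload.get(key) for key in PROMOTED_KEYS}
    rest = {k: v for k, v in payload.items() if k not in PROMOTED_KEYS}
    return columns, json.dumps(rest, sort_keys=True)
```

The point of doing it this way is that the planner gets ordinary column statistics for the fields you filter on, while the long tail of attributes stays schemaless.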



