
This is actually huge. A common problem when writing applications is that you want to INSERT if a key does not exist, else UPDATE. The right way to do this without an upsert is with a transaction. This will make life easier, though, as you can do it directly in one SQL statement.


Just "using a transaction" is insufficient. You must be prepared to handle the case that neither the INSERT nor the UPDATE succeeds (under READ COMMITTED isolation), or that the transaction fails (under REPEATABLE READ isolation or stricter), by retrying the transaction. And if latency is at all a concern to you, you must wrap this all in a stored procedure to avoid the round-trip time between the commands.

Hence this is more than just saving typing a couple lines -- this saves writing entire stupid loops to do what is conceptually a simple (and very common) operation.
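The loop in question is essentially the retry pattern from the Postgres documentation; a minimal sketch as a PL/pgSQL function (the `counters` table and its columns are hypothetical):

```sql
-- Assumes: CREATE TABLE counters (key text PRIMARY KEY, value int);
CREATE FUNCTION upsert_counter(k text, v int) RETURNS void AS $$
BEGIN
    LOOP
        -- Try to update an existing row first.
        UPDATE counters SET value = v WHERE key = k;
        IF found THEN
            RETURN;
        END IF;
        -- No row matched: try to insert. If a concurrent transaction
        -- inserted the same key first, we get unique_violation and
        -- loop back to retry the UPDATE.
        BEGIN
            INSERT INTO counters (key, value) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Fall through to the next loop iteration.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Note that catching only `unique_violation` (rather than all errors) is what keeps unrelated constraint failures from spinning the loop forever.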

Postgres gets better and better.


...and then you additionally have to verify that the specific reason both the update and the insert failed was concurrency. Otherwise you can make the mistake I did a couple of years ago, where I had transactions spinning against my database for weeks on end due to an unrelated constraint failure I had failed to notice, in transactions I kept retrying in a loop as part of my hacked-together upsert implementation (which is why my user ID numbers jumped from ~8m to ~460m... a meaningless mistake, but one I am reminded of every day by that "46" staring at me :/).


Is my conclusion correct that this implies sequence increments run outside the transaction?


Sequences in Postgres are updated even if a transaction is rolled back. It's pretty easy to see why that's important, especially when dealing with concurrent transactions.

You can't give out the same number to multiple transactions without causing a bunch of unique constraint violations. And you can't retroactively decrease a sequence number without affecting all the other transactions currently executing. You could in theory go back and plug in the gaps in a sequence, but it's more important to guarantee that a single thread of execution will always see increasing numbers than it is to guarantee that all numbers will eventually be handed out to a successful transaction.


Yes: http://www.postgresql.org/docs/9.4/static/functions-sequence...

> Important: To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values.
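The hole-leaving behavior described there is easy to reproduce (table name here is hypothetical):

```sql
CREATE TABLE t (id serial PRIMARY KEY, name text);

BEGIN;
INSERT INTO t (name) VALUES ('a');  -- consumes id 1 via nextval
ROLLBACK;                           -- the row is gone, but nextval is not undone

INSERT INTO t (name) VALUES ('b');  -- gets id 2, leaving a hole at 1
```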


Yes indeed.


It's fairly trivial to do with a savepoint, but it's definitely not performant, and requires some pretty stupid client-side error-handling.

It will also cause your Postgres logs to be littered with benign but misleading error messages (possibly these could be silenced, I never bothered to find out).
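For reference, the savepoint approach looks roughly like this (table name hypothetical); the error handling has to happen client-side, which is the stupid part:

```sql
BEGIN;
SAVEPOINT attempt;
INSERT INTO counters (key, value) VALUES ('k', 1);
-- If the INSERT raised unique_violation, the client must instead run:
--   ROLLBACK TO SAVEPOINT attempt;
--   UPDATE counters SET value = 1 WHERE key = 'k';
COMMIT;
```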

So, yes, this is great news.


Minor nit: it's performant and doesn't pollute logs to implement this on the server side. I've written macros to make gen'ing upsert functions easy, given a certain table pattern. But this definitely cleans that situation up as well as makes it accessible to more casual developers.


Yes, sorry, I was referring to doing it client-side.


I've seen a system in MySQL do "INSERT ... ON DUPLICATE KEY UPDATE" for ages, so is this that same pattern?


Yeah, it is the same as INSERT ... ON DUPLICATE KEY UPDATE, and similar to REPLACE in MySQL. Exciting to see it in Postgres!
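For comparison, the two syntaxes side by side (table and columns hypothetical):

```sql
-- MySQL:
INSERT INTO counters (id, hits) VALUES (1, 1)
    ON DUPLICATE KEY UPDATE hits = hits + 1;

-- PostgreSQL 9.5+ (the conflict target column must be listed explicitly):
INSERT INTO counters (id, hits) VALUES (1, 1)
    ON CONFLICT (id) DO UPDATE SET hits = counters.hits + 1;
```

One design difference: Postgres makes you name the unique index or column the conflict can occur on, rather than reacting to any duplicate key.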


I think the way MySQL handles "INSERT ... ON DUPLICATE KEY UPDATE" is to delete the old row and insert a new one if there's an auto-increment column (usually the ID column). Curious whether Postgres works the same way in this regard.


No, that's how REPLACE works -- which is a complete pain, to be honest! INSERT ... ON DUPLICATE KEY UPDATE works the way you'd hope it would, and keeps the auto-increment columns correct.


No, the auto increment columns are maintained as is.


That's great. Otherwise, it just wastes IDs unless one figures out a workaround for it.


Why do you need to use a transaction for that?

Do the update statement, followed by insert-where-not-exists. If the update doesn't match, 0 rows updated, and the insert works. If the update matches, then 0 rows inserted.


That's unfortunately not correct. The WHERE NOT EXISTS(...) will not "see" rows inserted by concurrently running transactions that have not committed yet.
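Concretely, the racy pattern looks like this (hypothetical table). Two sessions running it simultaneously for the same key can both find no existing row, both attempt the INSERT, and one of them fails with a unique violation:

```sql
-- Not safe under concurrency:
UPDATE counters SET hits = hits + 1 WHERE id = 1;

INSERT INTO counters (id, hits)
SELECT 1, 1
WHERE NOT EXISTS (SELECT 1 FROM counters WHERE id = 1);
```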



