> It is a common mistake to try to use files for locking, for example, instead ...

avar · on May 31, 2016

The flock() method is preferable when you don't need NFS because, as you say, the lock is released automatically if the process holding it dies.

This gets rid of all the edge cases with stale locks in one fell swoop.
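A minimal sketch of that behavior, assuming Linux with the util-linux flock(1) command. A background process takes an exclusive lock on a throwaway file and is then killed; a non-blocking `flock -n` attempt fails while the holder is alive and succeeds once it is gone. The variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: a flock(1) lock goes away when the process holding it dies.
# Assumes Linux with util-linux flock(1); the lock file is a throwaway temp file.
lockfile=$(mktemp)

# Holder: lock fd 9 exclusively, then *replace* the subshell with sleep so
# that the only process holding fd 9 is the one we are about to kill.
( flock -x 9 && exec sleep 30 ) 9>"$lockfile" &
holder=$!
sleep 1  # crude: give the holder time to take the lock

# -n makes flock(1) fail immediately instead of waiting for the lock.
if flock -n "$lockfile" true; then while_alive=free; else while_alive=held; fi

kill "$holder"                # simulate the holder crashing
wait "$holder" 2>/dev/null    # reap it; its descriptors, and so its lock, are gone

if flock -n "$lockfile" true; then after_death=free; else after_death=held; fi
echo "while holder alive: $while_alive; after it died: $after_death"
rm -f "$lockfile"
```

The `exec sleep` matters: if the subshell forked sleep as a child instead, the child would inherit fd 9 and keep the lock alive after its parent was killed.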

But, as you point out, if you want to do this over NFS, for example, you should create a file instead, and then you have to deal with stale locks.
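A sketch of the create-then-handle-staleness approach, using mkdir as the "create or fail" step (a common shell idiom; it is atomic on a local filesystem, but whether an exclusive create is reliable over NFS depends on the NFS version and server, so treat that as an assumption). The helper names and the age-based staleness rule are hypothetical, and GNU coreutils is assumed for `stat -c` and `touch -d`.

```shell
#!/bin/sh
# Hypothetical helpers: a directory is the lock; an old directory is "stale".
# Assumes GNU coreutils (`stat -c %Y`, `touch -d`).

# acquire_lock DIR MAX_AGE_SECONDS -> exit status 0 if we now own the lock
acquire_lock() {
  lock=$1 max_age=$2
  if mkdir "$lock" 2>/dev/null; then
    return 0                                  # created it, so we own it
  fi
  # Lock exists: break it only if it is older than max_age seconds.
  mtime=$(stat -c %Y "$lock" 2>/dev/null) || return 1
  age=$(( $(date +%s) - mtime ))
  if [ "$age" -gt "$max_age" ]; then
    # NOTE: racy if two waiters decide it is stale at the same moment.
    rmdir "$lock" 2>/dev/null
    mkdir "$lock" 2>/dev/null && return 0
  fi
  return 1
}

release_lock() { rmdir "$1"; }

# Example run against a throwaway path.
lockdir=$(mktemp -d)/job.lock
acquire_lock "$lockdir" 60 && first=ok || first=fail    # fresh lock: ours
acquire_lock "$lockdir" 60 && second=ok || second=fail  # still fresh: refused
touch -d '2 hours ago' "$lockdir"                       # pretend the holder died long ago
acquire_lock "$lockdir" 60 && third=ok || third=fail    # stale: broken and re-taken
release_lock "$lockdir"
echo "first=$first second=$second third=$third"
```

Breaking a stale lock this way is itself racy: two waiters can both judge the lock stale, and the second rmdir can delete the lock the first one has just re-created. That is exactly the kind of edge case flock() avoids.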

If you can avoid that at all, using flock() is generally better.

geofft · on May 31, 2016

http://0pointer.de/blog/projects/locking.html claims that flock() is less reliable over NFS: it returns success without actually locking anything on Linux < 2.6.12 and on "BSD" (I'm not sure which BSDs, or whether that's still true).

And my instinct is that in a networked scenario you're at least as worried about a whole machine dying or becoming unreachable (e.g. a network partition) as about a single process on it dying. A flock()-based lock doesn't clean itself up if the client is unreachable, does it?

avar · on May 31, 2016

Yes, as I pointed out, you don't want this if you're using NFS.

If I need multiple machines, I personally prefer processing things through something like a MySQL table with GET_LOCK() rather than NFS. It gives you flock()-like semantics: if a machine or client goes away, the lock taken with GET_LOCK() is freed automatically, because it only lives as long as that client's connection to the database.

The extra overhead of a database generally sucks way less than having to deal with stale locks.

In any NFS-based scenario you usually end up creating "task", "task.underway" and "task.done" files as locks, and re-enqueuing a task if its "underway" file is too old and there is no matching "done" file.
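A sketch of that convention in shell. The file layout follows the comment; the helper names, the use of noclobber (an exclusive create) for claiming a task, and the age threshold are assumptions for illustration, and GNU coreutils is assumed for `stat -c` and `touch -d`.

```shell
#!/bin/sh
# Hypothetical layout, one set of files per task NAME in a shared directory:
#   NAME           the queued task
#   NAME.underway  created by the worker that claimed it
#   NAME.done      created when that worker finishes
# Assumes GNU coreutils (`stat -c %Y`, `touch -d`).

# claim_task NAME -> status 0 only for the one worker that creates NAME.underway
claim_task() { ( set -C; : > "$1.underway" ) 2>/dev/null; }

# requeue_stale DIR MAX_AGE: drop old .underway markers that have no .done file
requeue_stale() {
  dir=$1 max_age=$2 now=$(date +%s)
  for marker in "$dir"/*.underway; do
    [ -e "$marker" ] || continue             # glob matched nothing
    task=${marker%.underway}
    [ -e "$task.done" ] && continue          # finished; leave it alone
    age=$(( now - $(stat -c %Y "$marker") ))
    if [ "$age" -gt "$max_age" ]; then
      echo "re-enqueuing $(basename "$task")"
      rm -f "$marker"                        # task becomes claimable again
    fi
  done
}

# Example run in a throwaway directory.
q=$(mktemp -d)
touch "$q/a" "$q/b" "$q/c"
claim_task "$q/a" && claimed=yes || claimed=no          # first claim wins
claim_task "$q/a" && again=yes || again=no              # second claim refused
claim_task "$q/b"; touch -d '2 hours ago' "$q/b.underway"   # worker vanished
claim_task "$q/c"; touch -d '2 hours ago' "$q/c.underway"; touch "$q/c.done"
requeue_stale "$q" 600
```

The age threshold has to be longer than any legitimate task can run, because a slow but live worker looks exactly like a dead one here; that ambiguity is what the GET_LOCK() variant avoids.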

You'd do the same with a MySQL table that you GET_LOCK() on, except that you can safely re-enqueue an "underway" task as soon as you manage to acquire its lock, since that means its consumer has gone away.

jonaf · on June 1, 2016

Technically, you're right that creating a file can be atomic. But using a file as a lock can be deceptive, and in my experience it's a common pitfall. I have seen a lot of shell scripts take this form:

  if [ ! -f "$FILE" ] ; then
    # another process can pass this same check before the touch below runs
    touch "$FILE"
    # do something dangerous, assuming I have a lock
    rm "$FILE"
  fi

The problem here, of course, is that checking whether the file exists and creating it are two separate steps. Another process (even a concurrent execution of the same script) could create $FILE after I've checked that it doesn't exist, and my touch will still succeed, so I (or any other process) happily proceed, thinking that no one else is running at the same time. In fact, if I start two executions of this script at about the same time, they can both pass the check and both execute the supposedly "synchronized" block.
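One way to close that window without flock(1) is to make the existence check and the creation a single step. A sketch, assuming a POSIX-style shell: with `set -C` (noclobber), `>` refuses to overwrite an existing file, and shells such as bash and dash implement that with an exclusive create, so at most one process can create the lock file. The lock path is hypothetical.

```shell
#!/bin/sh
# Real use needs one fixed path shared by every instance (e.g. under /var/lock);
# a fresh temp directory is used here only so the sketch runs anywhere.
FILE=$(mktemp -d)/myjob.lock

# Check-and-create in one step: with noclobber the redirection fails if
# $FILE already exists, so only one process can get past this line.
if ( set -C; echo "$$" > "$FILE" ) 2>/dev/null; then
  trap 'rm -f "$FILE"' EXIT        # remove the lock on normal exit...
  trap 'exit 1' HUP INT TERM       # ...and on the usual termination signals
  # do something dangerous, now actually holding the lock
  lock_state=acquired
else
  echo "another instance holds $FILE" >&2
  lock_state=busy
fi
```

This still leaves a stale $FILE behind if the script is killed with SIGKILL or the machine crashes, which is part of the extra work mentioned below.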

Of course, you don't have to use flock(1) to make this operation atomic. It just handles a lot of the extra work that I don't want to have to think about, work that remains even if I set `noclobber` or something like that.
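For comparison, a sketch of the same script using flock(1), assuming the util-linux version. The script opens the lock file on a spare descriptor and locks that descriptor, so the kernel releases the lock whenever the script exits or dies, and a leftover file on disk blocks no one. The lock path and fd 9 are arbitrary choices.

```shell
#!/bin/sh
# Assumes util-linux flock(1). FILE is a hypothetical lock path.
FILE=${FILE:-/tmp/myjob.lock}

exec 9>>"$FILE"          # open fd 9 on the lock file, creating it, never truncating
if flock -n 9; then      # -n: give up at once instead of waiting
  # do something dangerous, now actually holding the lock;
  # the lock lasts until fd 9 is closed, i.e. until this script exits or dies
  lock_state=acquired
else
  echo "another instance is running" >&2
  lock_state=busy
fi
```

The script deliberately never deletes $FILE. Removing a lock file that another process may already have opened lets a third process create a new file under the same name and lock it, so two processes would each believe they hold the lock.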