Huh, that first example is comparing apples and oranges.

This is the "slow" lock-free version from there:

  #include <atomic>

  std::atomic<unsigned long> sum;
  void do_work(size_t N, unsigned long* a) {
    for (size_t i = 0; i < N; ++i) sum += a[i];
  }
It performs an atomic read-modify-write on every iteration of the loop.

This is the "fast" mutex-based version:

  #include <mutex>

  unsigned long sum(0); std::mutex M;
  void do_work(size_t N, unsigned long* a) {
    unsigned long s = 0;
    for (size_t i = 0; i < N; ++i) s += a[i];
    std::lock_guard<std::mutex> L(M); sum += s;
  }
Here the sum is accumulated in a thread-local variable before being added to the shared total, so the lock is taken only once per call.

Of course the mutex-based version is faster. Try again with:

  #include <atomic>

  std::atomic<unsigned long> sum;
  void do_work(size_t N, unsigned long* a) {
    unsigned long s = 0;
    for (size_t i = 0; i < N; ++i) s += a[i];
    sum += s;
  }


I think one of the questions at the end covers this. The speaker admits the lock choice is a gimmick and reiterates that the time sink is related to thrashing on the cache line.


I haven't watched the video up to the end, but how do you think mutexes work under the hood? They use atomic operations, so the cache-line thrashing happens in both cases.

Edit: come to think of it, a mutex only does a compare-and-exchange when taking the lock, once per critical section, which I guess is much lighter on the cache line than an atomic add per element.
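To make that concrete, here is a minimal sketch of the idea, with a simple spinlock standing in for std::mutex (an assumption for illustration: real mutex implementations also park waiting threads in the kernel, but their uncontended fast path is likewise a single compare-and-exchange). With thread-local accumulation, each thread touches the shared cache line roughly once, not once per element:

  #include <atomic>
  #include <cassert>
  #include <cstddef>
  #include <thread>
  #include <vector>

  // Minimal spinlock: the uncontended fast path is one
  // compare-and-exchange per critical section.
  struct SpinLock {
    std::atomic<bool> locked{false};
    void lock() {
      bool expected = false;
      // Spin until we swap false -> true; only this CAS touches
      // the lock's cache line.
      while (!locked.compare_exchange_weak(expected, true,
                                           std::memory_order_acquire)) {
        expected = false;
      }
    }
    void unlock() { locked.store(false, std::memory_order_release); }
  };

  unsigned long sum = 0;
  SpinLock M;

  void do_work(size_t N, unsigned long* a) {
    unsigned long s = 0;
    for (size_t i = 0; i < N; ++i) s += a[i];  // thread-local, no sharing
    M.lock();
    sum += s;  // one contended update per thread, not per element
    M.unlock();
  }

  int main() {
    std::vector<unsigned long> data(1000, 1);
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
      threads.emplace_back(do_work, data.size(), data.data());
    for (auto& th : threads) th.join();
    assert(sum == 4000);  // 4 threads x 1000 elements of value 1
    return 0;
  }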



