In case anyone here doesn’t know, Paul McKenney was one of the main contributors to the RCU[1] implementation for the Linux kernel. So the guy knows a thing or two about concurrency.
Although, somewhat amusingly, neither C++ nor Rust (which basically just copies the C++ atomics and memory model) can correctly express RCU within the language itself. The leading proposal for C++ (at least AFAICT) is to just add RCU primitives to the library and make their correctness the implementation's concern:
> Although, somewhat amusingly, neither C++ nor Rust [...] can correctly express RCU within the language itself.
What do you mean exactly? Of course the required remote memory barrier needs to be a primitive, as it is provided by the OS and ultimately the hardware, but AFAIK the asymmetric synchronization needed by RCU can be modelled within C++ using a signal and an atomic signal fence.
You can't express the equivalent of Linux's rcu_dereference primitive in a way that avoids data races without using memory_order_acquire (or memory_order_consume, which is treated like memory_order_acquire in all implementations, and is fundamentally broken as I link to in my sibling reply). With a memory consistency model weaker than TSO (e.g. ARM, RISC-V, POWER), this imposes unnecessary overhead and eliminates some of the benefits of RCU.
To work around this flaw, you can use inline assembly, implementation-specific "compiler-only" barriers, etc. but that's outside of the language. It also forces the compiler to be much more pessimistic about optimizations than is actually necessary to preserve dependency ordering.
Can you provide any extra information about what is unable to be expressed? I’d like to look into what sort of constructs could be useful in this context.
The keyword to search for is "memory_order_consume", which was C++11's proposed solution to the problem that turned out to not work in practice. Here are some of the C++ WG documents describing the issues:
A lot of the problems center around the reliance on data dependencies for ordering, and the difficulty of expressing these requirements in a manner that is compatible with separate compilation and compiler optimizations.
memory_order_consume was added to the C/C++ memory model to support architectures like the Alpha, where indirecting through memory (which includes array indexing, etc.) need not create a true happens-before relationship. So you can have things like:
Assume: a = 1, p = &a, b = 0

  Thr. 1          | Thr. 2
  ----------------+--------
  b = 1           |
  memory_barrier  | i = *p
  p = &b          |

The result can be i = 0
Even though b=1 happens "before" p=&b within Thread 1, it is seen as happening after it by Thread 2. To overcome this, Thread 2 needs a barrier between the load of p from memory and the indirect load that assigns to i. Practically no other architectures have that quirk, and it only happens on Alpha because of its two separate cache banks.
Unfortunately, it has turned out to be infeasible to give memory_order_consume a proper weak semantics that sets it apart from memory_order_acquire, given e.g. how common it is for indirections to be altered by compiler optimizations. So Ordering::Consume does not, as of yet, exist in Rust.
The problem of the CPU reordering things only exists on architectures like Alpha. However, there's still the problem of the compiler reordering things (which exists on every architecture), and memory_order_consume was intended to solve both. As you can see from the original rationale document from 2010, this RCU use-case was one of the main motivations:
[1] https://en.wikipedia.org/wiki/Read-copy-update if you just want to know what it is and http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01... if you want the good stuff