In case anyone here doesn’t know, Paul McKenney was one of the main contributors to the RCU[1] implementation for the Linux kernel. So the guy knows a thing or two about concurrency.
Although, somewhat amusingly, neither C++ nor Rust (which basically just copies the C++ atomics and memory model) can correctly express RCU within the language itself. The leading proposal for C++ (at least AFAICT) is to just add RCU primitives to the library and make their correctness the implementation's concern:
> Although, somewhat amusingly, neither C++ nor Rust [...] can correctly express RCU within the language itself.
What do you mean exactly? Of course the required remote memory barrier needs to be a primitive, as it is provided by the OS and ultimately the hardware, but AFAIK the asymmetric synchronization needed by RCU can be modelled within C++ using a signal and an atomic signal fence.
You can't express the equivalent of Linux's rcu_dereference primitive in a way that avoids data races without using memory_order_acquire (or memory_order_consume, which is treated like memory_order_acquire in all implementations, and is fundamentally broken as I link to in my sibling reply). With a memory consistency model weaker than TSO (e.g. ARM, RISC-V, POWER), this imposes unnecessary overhead and eliminates some of the benefits of RCU.
To work around this flaw, you can use inline assembly, implementation-specific "compiler-only" barriers, etc. but that's outside of the language. It also forces the compiler to be much more pessimistic about optimizations than is actually necessary to preserve dependency ordering.
Can you provide any extra information about what is unable to be expressed? I’d like to look into what sort of constructs could be useful in this context.
The keyword to search for is "memory_order_consume", which was C++11's proposed solution to the problem that turned out to not work in practice. Here are some of the C++ WG documents describing the issues:
A lot of the problems center around the reliance on data dependencies for ordering, and the difficulty of expressing these requirements in a manner that is compatible with separate compilation and compiler optimizations.
memory_order_consume was added to the C/C++ memory model to support architectures like the Alpha, where indirecting through memory (which includes array indexing, etc.) need not create a true happens-before relationship. So you can have things like:
Assume: a = 1, p = &a, b = 0

  Thr. 1          | Thr. 2
  ----------------+--------
  b = 1           |
  memory_barrier  | i = *p
  p = &b          |

The result can be i = 0
Even though b=1 happens "before" p=&b within Thread 1, it is seen as happening after it by Thread 2. To overcome this, Thread 2 needs a barrier between the load of p from memory and the indirect load that assigns to i. Practically no other architectures have that quirk, and it only happens on Alpha because of its two separate cache banks.
Unfortunately, it has turned out to be infeasible to give memory_order_consume a proper weak semantics that sets it apart from memory_order_acquire, given e.g. how common it is for indirections to be altered by compiler optimizations. So Ordering::Consume does not, as of yet, exist in Rust.
The problem of the CPU reordering things only exists on architectures like Alpha. However, there's still the problem of the compiler reordering things (which exists on every architecture), and memory_order_consume was intended to solve both. As you can see from the original rationale document from 2010, this RCU use-case was one of the main motivations:
[1] https://en.wikipedia.org/wiki/Read-copy-update if you just want to know what it is and http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01... if you want the good stuff