Stupid RCU Tricks: Ordering of RCU Callback Invocation

Again, suppose a Linux-kernel task registers a pair of RCU callbacks, as follows:

call_rcu(&p->rcu, myfunc);
smp_mb();
call_rcu(&q->rcu, myfunc);

Given that these two callbacks are guaranteed to be registered in order, are they also guaranteed to be invoked in order?

Unfortunately, the answer is “no”. Here is one sequence of events that can result in misordering:

  1. The task executes the first call_rcu() while executing on CPU 0, which has a very large number of callbacks pending.
  2. While executing the smp_mb(), the task is preempted.
  3. The task resumes on CPU 1, which has no callbacks pending.
  4. The task executes the second call_rcu().
  5. A subsequent grace period completes, so that both callbacks are now ready for invocation.
  6. CPU 1 invokes the second callback almost immediately.
  7. CPU 0 must invoke all of its very large number of prior callbacks before it can invoke the first callback.

Thus, the callbacks are invoked out of order.

Please note that this is but one misordering scenario. There are many other ways that callbacks can be misordered, for example, courtesy of the fact that different CPUs become aware of the beginning and end of a given grace period at different times.

Of course, those familiar with the Linux-kernel RCU implementation will be quick to point out that it does provide an ordering guarantee if preemption is disabled:

preempt_disable();
call_rcu(&p->rcu, myfunc);
smp_mb();
call_rcu(&q->rcu, myfunc);
preempt_enable();

However, this is a consequence of the current implementation of rcu_barrier() rather than a hard-and-fast property of RCU in general. And this implementation may have to change, for example, if systems continue to increase in size and workloads continue to increase in variety, it might become necessary to load-balance RCU callbacks. This change would of course mean that callbacks could be misordered even if they were registered from a single task with preemption disabled. This would in turn require rcu_barrier() to be redesigned, but redesign appears to be par for the course for the Linux-kernel RCU implementation!

So what if your code needs RCU callbacks to be invoked in order? Here are some ways of making that happen:

  1. Use a single call_rcu() invocation to register a single callback that invokes myfunc() on both &p->rcu and &q->rcu. This of course requires that the single call_rcu() invocation be passed some data structure capable of tracking both pointers.
  2. Change the smp_mb() into an rcu_barrier(), which will force the ordering.
  3. Invoke synchronize_rcu() to wait for a grace period, directly invoke myfunc(&p->rcu) (possibly under local_bh_disable()), invoke another synchronize_rcu(), and finally directly invoke myfunc(&p->rcu) (again, possibly under local_bh_disable()).
  4. Rework myfunc() so that ordering is not required.

Given that the common case is for RCU callbacks to simply free memory, ordering should be the exception rather than the rule.