RCU-Protected Deletion

Again, the code fragment is as follows:

  1 rcu_read_lock();
  2 p = rcu_dereference(gp);
  3 if (must_delete(p)) {
  4   spin_lock(&my_lock);
  5   if (must_delete(p)) {
  6     kfree_rcu(p);
  7     rcu_assign_pointer(gp, q);
  8     already_deleted(p);
  9   }
 10   spin_unlock(&my_lock);
 11 } else {
 12   do_something_with(p);
 13 }
 14 rcu_read_unlock();

This is a bit unconventional: Normally lines 6 and 7 would be interchanged. But the RCU read-side critical section will prevent the structure from being freed until after the global pointer is NULLed out, right?

Yes it will, but that does not make the above code sequence safe. Far from it. To see this, consider the following sequence of events:

  1. CPU 0 invokes the kfree_rcu() on line 6 above, which starts a new grace period.
  2. CPU 0 takes a long-running interrupt.
  3. CPU 1 notes the new grace period, passes through a quiescent state, and then informs RCU that it has passed through a quiescent state for the new grace period.
  4. CPU 1 executes the above code, and spins on line 4 because CPU 0 holds the lock.
  5. CPU 1 takes a long-running interrupt.
  6. All the other CPUs pass through quiescent states, so that CPU 0 is now the only CPU blocking the grace period.
  7. CPU 0 returns from interrupt and executes the remainder of the above code, including the spin_unlock() on line 10 and the rcu_read_unlock() on line 14.
  8. CPU 0 is now in a quiescent state, so the grace period ends, and the structure passed to kfree_rcu() is now freed.
  9. CPU 1 returns from its interrupt and acquires the lock.
  10. CPU 1 accesses the already-freed structure on line 5. Arbitrarily bad things happen.

The moral of this story: Always prevent readers from obtaining new references to the structure before starting the grace period. Always!

In this case, it is necessary to interchange lines 6 and 7 so that the readers' path to the to-be-deleted data item is removed before the grace period is started. In other words, the rcu_assign_pointer() must precede the kfree_rcu(). The resulting corrected code is as follows:

  1 rcu_read_lock();
  2 p = rcu_dereference(gp);
  3 if (must_delete(p)) {
  4   spin_lock(&my_lock);
  5   if (must_delete(p)) {
  6     rcu_assign_pointer(gp, q);
  7     kfree_rcu(p);
  8     already_deleted(p);
  9   }
 10   spin_unlock(&my_lock);
 11 } else {
 12   do_something_with(p);
 13 }
 14 rcu_read_unlock();

Many thanks to Neil Brown for asking some excellent questions that led to this scenario.