Transactional Lock Elision and Priority Boosting

Real-time applications that use locking can be subject to priority inversion, where a low-priority thread holding a lock is preempted by a medium-priority CPU-bound thread. There must be at least one such CPU-bound medium-priority thread per CPU, because otherwise the low-priority thread would simply migrate to an idle CPU. Now that the low-priority process has been preempted indefinitely, if a high-priority thread runs and attempts to acquire the lock, it will block. It cannot acquire the lock until the low-priority thread releases it, the low-priority thread cannot release the lock until it gets a chance to run, and it cannot run while the medium-priority threads are running. In effect, the medium-priority threads are blocking the high-priority thread, hence the name “priority inversion”.

There are a number of ways to handle priority inversion, one of which is called priority inheritance. When the high-priority thread attempts to acquire the lock held by the low-priority thread, the low-priority thread inherits its high priority, which causes the low-priority thread to preempt a medium-priority thread. As soon as the low-priority thread releases the lock, it reverts back to its low priority, thus allowing the high-priority thread to proceed. This priority-inheritance procedure is sometimes called priority boosting.

But there are other uses for priority boosting beyond just avoiding priority inversion. For example, let's suppose that there is a low-priority thread that is required to make forward progress every so often, say roughly once per millisecond. Priority boosting can be used to accomplish this. The low-priority thread might be as follows:

  1   int i = 0;
  2 
  3   acquire_lock(&boost_lock[i]);
  4   for (;;) {
  5     acquire_lock(&boost_lock[!i]);
  6     release_lock(&boost_lock[i]);
  7     i = i ^ 1;
  8 
  9     do_something();
 10   }

This thread is always holding one of the two locks in the boost_lock[] array, acquiring them in alternating order while repeatedly invoking do_something(). This approach clearly depends on the low-priority thread acquiring its first lock before the system gets too busy for it to run, but this is usually easily arranged. In fact, this approach will be reasonably robust when running on operating systems with modest real-time capabilities, even on modern highly nondeterministic hardware.

Then a high-priority thread executing the following code would do the required priority boosting roughly once per millisecond:

  1   int i = 0;
  2 
  3   for (;;) {
  4     usleep(1000); /* sleep one millisecond. */
  5     acquire_lock(&boost_lock[i]);
  6     release_lock(&boost_lock[i]);
  7     i = i ^ 1;
  8   }

Every millisecond, the above code acquires one of the locks. If the low-priority thread is holding that lock, its priority is boosted. If the low-priority thread is holding the other lock, it is not boosted, but then we know that it executed at least once during the prior millisecond, as required.

What happens if HTM lock elision is applied? The low-priority process will be executing an infinite transaction, which will abort at some point, for example, the first time that the low-priority process is preempted. But once it is preempted, it must resume before it can fall back to locking. Until the low-priority process executes far enough to actually acquire the lock, the high-priority process's lock acquisitions will be ineffective, regardless of whether or not they are elided. Unfortunately, it is quite likely that the initialization-time quiet period has finished (why else did the low-priority process get preempted, after all?), so it is quite possible that the low-priority task will never get a chance to run.

It is important to note that this example also depends on timing: The developer has assumed that the low-priority process will acquire its first lock before the end of the first milliscond. As noted earlier, the probability of the developer getting away with this during the initialization-time quiet period is extremely high, so there might well be software harboring this form of lock-elision-unfriendly timing dependency that nevertheless runs reliably on modern hardware.

Please note that this is not to say that transactional lock elision is useless, or even that it is unsafe at any speed. It is just that significant caution is required when applying it to large poorly understood programs. After all, such programs just might well be relying on timing or real-time features that fail to work and play nicely with transactional lock elision.