Transactional Memory Everywhere?

I recently encountered a researcher who assured me that transactional memory (TM) would soon be the only synchronization mechanism, supplanting all the complex and error-prone synchronization mechanisms currently in use. This did surprise me a bit, given that TM has some difficulty with commonplace operations such as input and output, but I most certainly cannot fault the man's ambitions for TM.

Of course, anyone who has worked on a large software artifact will have encountered situations where the ability to do even very small and restricted transactions would be extremely helpful. Even the relatively slow software TM (STM) implementations could do well in such situations, as the competing deadlock-avoidance/recovery schemes can be quite complex and slow in and of themselves.
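For example, consider atomically moving an element from one linked list to another. With locking, this requires acquiring two locks in a deadlock-avoiding order; with TM, it is a single small transaction. The following is a minimal sketch, assuming a compiler supporting the GNU TM extensions (for example, gcc with -fgnu-tm); the list types are purely illustrative.

    /*
     * Minimal sketch: atomically move the first element of one list to
     * another.  Assumes a compiler supporting the GNU TM extensions
     * (for example, gcc -fgnu-tm); the list types are illustrative.
     */
    #include <stddef.h>

    struct elem {
            struct elem *next;
            int key;
    };

    struct list {
            struct elem *head;
    };

    /*
     * With locking, this would require acquiring both lists' locks in a
     * deadlock-avoiding order.  With TM, the whole move is one small
     * transaction that either commits or retries.
     */
    void move_first_elem(struct list *from, struct list *to)
    {
            __transaction_atomic {
                    struct elem *e = from->head;

                    if (e != NULL) {
                            from->head = e->next;
                            e->next = to->head;
                            to->head = e;
                    }
            }
    }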

That said, if TM is to achieve the researcher's great ambitions, it will need to interact with other mechanisms, be they I/O, system calls, or other synchronization primitives, with this last being critically important when converting legacy software to TM. To this end, let us look at the challenges that these other mechanisms pose to TM, along with possible resolutions to those challenges.

In short, although TM offers much promise for small changes to memory-only data structures, there are a number of issues with large-scale transactions in real software systems:

  1. I/O operations, especially RPCs, which cannot in general be rolled back (see the sketch following this list).
  2. Memory-mapping operations, particularly when the memory containing some of a given transaction's variables is unmapped from outside of that transaction.
  3. Multi-threaded transactions do not seem to be supported by most TM implementations, which means that TM does not generally support things like pthread_create().
  4. Extra-transactional accesses, whose behavior varies greatly from one TM implementation to another.
  5. Time delays, which can interact with the notion of atomicity in interesting and ill-defined ways.
  6. Interactions with locking, particularly reader-writer locking. These interactions are clearly of critical importance if TM is to be introduced into large existing software programs that use locking. Of course, the interactions between TM and RCU are a special interest of mine.
  7. Persistent transactions are an interesting possibility. There are forms of locking that can span address spaces, and that can survive reboots and even operating-system upgrades. Should TM offer similar persistence?
  8. Dynamic linking and loading of functions invoked from within transactions.
  9. Debugging transactions, especially setting breakpoints within transactions for hardware TM implementations.
  10. Transactions containing the exec() system call have interesting implications.
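To make the first of these issues concrete, here is a minimal sketch of the I/O problem, assuming gcc's -fgnu-tm implementation; the withdraw() function and its output are purely illustrative. As I understand the GNU TM implementation, an unsafe call such as printf() is rejected within __transaction_atomic because it cannot be rolled back; a relaxed transaction is accepted, but the runtime must then execute it irrevocably, serializing it against all other transactions and thereby giving up much of TM's concurrency.

    /*
     * Sketch of the I/O problem, assuming gcc -fgnu-tm.  The printf()
     * cannot be rolled back, so it cannot live in an atomic transaction;
     * a relaxed transaction is accepted, but the runtime must then make
     * it irrevocable, serializing it against all other transactions.
     */
    #include <stdio.h>

    static long balance = 100;

    void withdraw(long amount)
    {
            __transaction_relaxed {
                    if (balance >= amount) {
                            balance -= amount;
                            /*
                             * Irreversible side effect: a rollback cannot
                             * "unprint" this line, so the transaction must
                             * never be rolled back past it.
                             */
                            printf("withdrew %ld, balance now %ld\n",
                                   amount, balance);
                    }
            }
    }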

So, what can we conclude from this list?

  1. One interesting property of TM is the fact that transactions are subject to rollback and retry. This property underlies TM's difficulties with irreversible operations, including unbuffered I/O, RPCs, memory-mapping operations, time delays, and the exec() system call.
  2. Another interesting property of TM, noted by Shpeisman et al., is that TM intertwines the synchronization with the data it protects. This property underlies TM's issues with I/O, memory-mapping operations, extra-transactional accesses, and debugging breakpoints. In contrast, conventional synchronization primitives, including locking and RCU, maintain a clear separation between the synchronization primitives and the data that they protect, as the sketch following this list illustrates.
  3. One of the stated goals of many workers in the TM area is to ease parallelization of large sequential programs. As such, individual transactions are commonly expected to execute serially, which might do much to explain TM's issues with multithreaded transactions.
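The contrast in the second point is easy to see in code. Below is a minimal sketch, again assuming gcc -fgnu-tm and pthreads, with a purely illustrative counter. With locking, the synchronization object is a distinct entity that debuggers, tools, and out-of-band code can work with independently of the data; with TM, the data accesses themselves are the synchronization, so anything touching that data from outside a transaction interacts with the synchronization mechanism whether it intends to or not.

    #include <pthread.h>

    static long counter;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    /*
     * Locking: counter_lock is separate from counter.  A debugger or an
     * out-of-band observer can deal with the lock explicitly,
     * independently of the data that it protects.
     */
    void inc_locked(void)
    {
            pthread_mutex_lock(&counter_lock);
            counter++;
            pthread_mutex_unlock(&counter_lock);
    }

    /*
     * TM: there is no separate synchronization object.  The runtime (or
     * hardware) tracks the accesses to counter itself, so breakpoints,
     * debuggers, and extra-transactional accesses interact directly with
     * the synchronization mechanism.
     */
    void inc_tm(void)
    {
            __transaction_atomic {
                    counter++;
            }
    }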

What should TM researchers and developers do about all of this? One approach is to focus on TM in the small, concentrating on situations where hardware assist potentially provides substantial advantages over other synchronization primitives. This is in fact the approach Sun took with its Rock research CPU. Some TM researchers seem to agree with this approach, while others have much higher hopes for TM.

Of course, it is quite possible that TM will be able to take on larger problems, and this series of blog posts lists a few of the issues that must be resolved if TM is to achieve this lofty goal.

My personal hope is that everyone involved treats this as a learning experience. It appears to me that TM researchers have a great deal to learn from practitioners who have successfully built large software systems using traditional synchronization primitives.

And vice versa.