glossary.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809

% glossary.tex
% mainfile: perfbook.tex
% SPDX-License-Identifier: CC-BY-SA-3.0

\chapter{Glossary}
%
\Epigraph{Dictionaries are inherently circular in nature.}
	 {\emph{Self Reference in word definitions},
	        David~Levary~et~al.}

\begin{description}
\item[\IXG{Acquire Load}:]
	A read from memory that has acquire semantics.
	Normal use cases pair an acquire load with a release store,
	in which case if the load returns the value stored, then all
	code executed by the loading CPU after that acquire load will
	see the effects of all memory-reference instructions executed
	by the storing CPU prior to that release store.
	Acquiring a lock provides similar memory-ordering semantics,
	hence the ``acquire'' in ``acquire load''.
	(See also ``memory barrier'' and ``release store''.)
\item[\IXGh{Address}{dependency}:]
	The value returned by a load instruction is used to compute
	a later memory-reference instruction's address.
	Address dependencies provide weak memory ordering as described
	in \cref{sec:memorder:Address Dependencies}.
	However, because compilers do not understand them, address
	dependencies are fragile, so please pay close attention to the
	potential difficulties discussed in
	\cref{sec:memorder:Address- and Data-Dependency Difficulties}.
\item[\IXGr{Amdahl's Law}:]
	If sufficient numbers of CPUs are used to run a job that has both
	a sequential portion and a concurrent portion, performance and
	scalability will be limited by the overhead of the sequential
	portion.
\item[\IXGalt{Associativity}{Cache associativity}:]
	The number of cache lines that can be held simultaneously in
	a given cache, when all of these cache lines hash identically
	in that cache.
	A cache that could hold four cache lines for each possible
	hash value would be termed a ``four-way set-associative'' cache,
	while a cache that could hold only one cache line for each
	possible hash value would be termed a ``direct-mapped'' cache.
	A cache whose associativity was equal to its capacity would
	be termed a ``fully associative'' cache.
	Fully associative caches have the advantage of eliminating
	associativity misses, but, due to hardware limitations,
	fully associative caches are normally quite limited in size.
	The associativity of the large caches found on modern microprocessors
	typically range from two-way to eight-way.
\item[\IXGalth{Associativity Miss}{associativity}{cache miss}:]
	A cache miss incurred because the corresponding CPU has recently
	accessed more data hashing to a given set of the cache than will
	fit in that set.
	Fully associative caches are not subject to associativity misses
	(or, equivalently, in fully associative caches, associativity
	and capacity misses are identical).
\item[\IXG{Atomic}:]
	An operation is considered ``atomic'' if it is not possible to
	observe any intermediate state.
	For example, on most CPUs, a store to a properly aligned pointer
	is atomic, because other CPUs will see either the old value or
	the new value, but are guaranteed not to see some mixed value
	containing some pieces of the new and old values.
\item[\IXG{Atomic Read-Modify-Write Operation}:]
	An atomic operation that both reads and writes memory is
	considered an atomic read-modify-write operation, or atomic RMW
	operation for short.
	Although the value written usually depends on the value read,
	\co{atomic_xchg()} is the exception that proves this rule.
\item[\IXGh{Bounded}{Wait Free}:]
	A forward-progress guarantee in which every thread makes
	progress within a specific finite period of time, the specific
	time being the bound.
\item[\IXGh{Bounded Population-Oblivious}{Wait Free}:]
	A forward-progress guarantee in which every thread makes
	progress within a specific finite period of time, the specific
	time being the bound, where this bound is independent of the
	number of threads.
\item[\IXG{Cache}:]
	In modern computer systems, CPUs have caches in which to hold
	frequently used data.
	These caches can be thought of as hardware hash tables with very
	simple hash functions,
	but in which each hash bucket (termed a ``set'' by hardware types)
	can hold only a limited number of data items.
	The number of data items that can be held by each of a cache's hash
	buckets is termed the cache's ``associativity''.
	These data items are normally called ``cache lines'', which
	can be thought of a fixed-length blocks of data that circulate
	among the CPUs and memory.
\item[\IXG{Cache Coherence}:]
	A property of most modern SMP machines where all CPUs will
	observe a sequence of values for a given variable that is
	consistent with at least one global order of values for
	that variable.
	Cache coherence also guarantees that at the end of a group
	of stores to a given variable, all CPUs will agree
	on the final value for that variable.
	Note that cache coherence applies only to the series of values
	taken on by a single variable.
	In contrast, the memory consistency model for a given machine
	describes the order in which loads and stores to groups of
	variables will appear to occur.
	See \cref{sec:memorder:Cache Coherence}
	for more information.
\item[\IXG{Cache-Coherence Protocol}:]
	A communications protocol, normally implemented in hardware,
	that enforces memory consistency and ordering, preventing
	different CPUs from seeing inconsistent views of data held
	in their caches.
\item[\IXG{Cache Geometry}:]
	The size and associativity of a cache is termed its geometry.
	Each cache may be thought of as a two-dimensional array,
	with rows of cache lines (``sets'') that have the same hash
	value, and columns of cache lines (``ways'') in which every
	cache line has a different hash value.
	The associativity of a given cache is its number of
	columns (hence the name ``way''---a two-way set-associative
	cache has two ``ways''), and the size of the cache is its
	number of rows multiplied by its number of columns.
\item[\IXG{Cache Line}:]
	(1) The unit of data that circulates among the CPUs and memory,
	usually a moderate power of two in size.
	Typical cache-line sizes range from 16 to 256 bytes. \\
	(2) A physical location in a CPU cache capable of holding
	one cache-line unit of data. \\
	(3) A physical location in memory capable of holding one
	cache-line unit of data, but that it also aligned
	on a cache-line boundary.
	For example, the address of the first word of a cache line
	in memory will end in 0x00 on systems with 256-byte cache lines.
\item[\IXG{Cache Miss}:]
	A cache miss occurs when data needed by the CPU is not in
	that CPU's cache.
	The data might be missing because of a number of reasons,
	including:
	\begin{enumerate*}[(1)]
	\item This CPU has never accessed the data before
	(``startup'' or ``warmup'' miss),
	\item This CPU has recently accessed more
	data than would fit in its cache, so that some of the older
	data had to be removed (``capacity'' miss),
	\item This CPU
	has recently accessed more data in a given set\footnote{
		In hardware-cache terminology, the word ``set''
		is used in the same way that the word ``bucket''
		is used when discussing software caches.}
	than that set could hold (``associativity'' miss),
	\item Some other CPU has written to the data (or some other
	data in the same cache line) since this CPU has accessed it
	(``communication miss''), or
	\item This CPU attempted to write to a cache line that is
	currently read-only, possibly due to that line being replicated
	in other CPUs' caches.
	\end{enumerate*}
\item[\IXGalth{Capacity Miss}{capacity}{cache miss}:]
	A cache miss incurred because the corresponding CPU has recently
	accessed more data than will fit into the cache.
\item[CAS:]\glsuseriii{cas}
	Compare-and-swap operation, which is an atomic operation that
	takes a pointer, and old value, and a new value.
	If the pointed-to value is equal to the old value, it is atomically
	replaced with the new value.
	There is some variety in CAS API\@.
	One variation returns the actual pointed-to value, so that the
	caller compares the CAS return value to the specified old value,
	with equality indicating a successful CAS operation.
	Another variation returns a boolean success indication, in which
	case a pointer to the old value may be passed in, and if so,
	the old value is updated in the CAS failure case.
\item[\IXG{Clash Free}:]
	A forward-progress guarantee in which, in the absence of
	contention, at least one thread makes progress within a finite
	period of time.
\item[\IXGalth{Code Locking}{code}{locking}:]
	A simple locking design in which a ``global lock'' is used to protect
	a set of critical sections, so that access by a given thread
	to that set is
	granted or denied based only on the set of threads currently
	occupying the set of critical sections, not based on what
	data the thread intends to access.
	The scalability of a code-locked program is limited by the code;
	increasing the size of the data set will normally not increase
	scalability (in fact, will typically \emph{decrease} scalability
	by increasing ``lock contention'').
	Contrast with ``data locking''.
\item[\IXG{Combinatorial Explosion}:]
	Denotes the exponential increase in executions that
	formal-verification tools must analyze as problem size increases.
\item[\IXG{Combinatorial Implosion}:]
	Denotes the exponential decrease in executions that
	formal-verification tools must analyze when a given
	code fragment is partitioned.
\item[\IXGalth{Communication Miss}{communication}{cache miss}:]
	A cache miss incurred because some other CPU has written to
	the cache line since the last time this CPU accessed it.
\item[\IXG{Concurrent}:]
	In this book, a synonym of parallel.
	Please see \cref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?}
	on \cpageref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?}
	for a discussion of the recent distinction between these two
	terms.
\item[\IXGh{Control}{dependency}:]
	The value returned by a load instruction is used to determine
	whether or not a later store instruction is executed.
	Control dependencies provide weak memory ordering as described
	in \cref{sec:memorder:Control Dependencies}.
	However, because compilers do not understand them, control
	dependencies are exceedingly fragile, so please avoid using them.
	If severe performance requirements mean that you absolutely must
	use control dependencies, please carefully consider the potential
	calamities discussed in
	\cref{sec:memorder:Control-Dependency Calamities}.
	Also, please think carefully about alternative approaches that
	might permit you to meet your performance requirements without
	use of control dependencies.
\item[\IXG{Critical Section}:]
	A section of code guarded by some synchronization mechanism,
	so that its execution constrained by that primitive.
	For example, if a set of critical sections are guarded by
	the same global lock, then only one of those critical sections
	may be executing at a given time.
	If a thread is executing in one such critical section,
	any other threads must wait until the first thread completes
	before executing any of the critical sections in the set.
\item[\IXGh{Data}{dependency}:]
	The value returned by a load instruction is used to compute
	the value stored by a later store instruction.
	Data dependencies provide weak memory ordering as described
	in \cref{sec:memorder:Data Dependencies}.
	However, because compilers do not understand them, data
	dependencies are fragile, so please pay close attention to the
	potential difficulties discussed in
	\cref{sec:memorder:Address- and Data-Dependency Difficulties}.
\item[\IXGh{Data}{Locking}:]
	A scalable locking design in which each instance of a given
	data structure has its own lock.
	If each thread is using a different instance of the
	data structure, then all of the threads may be executing in
	the set of critical sections simultaneously.
	Data locking has the advantage of automatically scaling to
	increasing numbers of CPUs as the number of instances of
	data grows.
	Contrast with ``code locking''.
\item[\IXG{Data Race}:]
	A race condition in which several CPUs or threads access
	a variable concurrently, and in which at least one of those
	accesses is a store and at least one of those accesses
	is a plain access.
	It is important to note that while the presence of data races
	often indicates the presence of bugs, the absence of data races
	in no way implies the absence of bugs.
	(See ``Plain access'' and ``Race condition''.)
\item[\IXG{Deadlock}:]
	A failure mode in which each of several threads is unable to
	make progress until some other thread makes progress.
	For example, if two threads acquire a pair of locks in opposite
	orders, deadlock can result.
	More information is provided in
	\cref{sec:locking:Deadlock}.
\item[\IXG{Deadlock Free}:]
	A forward-progress guarantee in which, in the absence of
	failures, at least one thread makes progress within a finite
	period of time.
\item[\IXGh{Direct-Mapped}{Cache}:]
	A cache with only one way, so that it may hold only one cache
	line with a given hash value.
\item[\IXG{Efficiency}:]
	A measure of effectiveness normally expressed as a ratio
	of some metric actually achieved to some maximum value.
	The maximum value might be a theoretical maximum, but in
	parallel programming is often based on the corresponding
	measured single-threaded metric.
\item[\IXG{Embarrassingly Parallel}:]
	A problem or algorithm where adding threads does not significantly
	increase the overall cost of the computation, resulting in
	linear speedups as threads are added (assuming sufficient
	CPUs are available).
\item[\IXGalth{Energy Efficiency}{energy}{efficiency}:]
	Shorthand for ``energy-efficient use'' in which the goal is to
	carry out a given computation with reduced energy consumption.
	Sublinear scalability can be an obstacle to energy-efficient use
	of a multicore system.
\item[Epoch-Based Reclamation (EBR):]\glsuseriii{ebr}
	An \acr{rcu} implementation style put forward by
	\ppl{Keir}{Fraser}~\cite{KeirAnthonyFraserPhD,UCAM-CL-TR-579,KeirFraser2007withoutLocks}.
\item[\IXG{Existence Guarantee}:]
	An existence guarantee is provided by a synchronization mechanism
	that prevents a given dynamically allocated object from being
	freed for the duration of that guarantee.
	For example, \acr{rcu} provides existence guarantees for the duration
	of \acr{rcu} read-side critical sections.
	A similar but strictly weaker guarantee is provided by
	type-safe memory.
\item[\IXGh{Exclusive}{Lock}:]
	An exclusive lock is a mutual-exclusion mechanism that
	permits only one thread at a time into the
	set of critical sections guarded by that lock.
\item[\IXG{False Sharing}:]
	If two CPUs each frequently write to one of a pair of data items,
	but the pair of data items are located in the same cache line,
	this cache line will be repeatedly invalidated, ``ping-ponging''
	back and forth between the two CPUs' caches.
	This is a common cause of ``cache thrashing'', also called
	``cacheline bouncing'' (the latter most commonly in the Linux
	community).
	False sharing can dramatically reduce both performance and
	scalability.
\item[\IXG{Forward-Progress Guarantee}:]
	Algorithms or programs that guarantee that execution will
	progress at some rate under specified conditions.
	Academic forward-progress guarantees are grouped into a
	formal hierarchy shown in
	\cref{sec:advsync:Non-Blocking Synchronization}.
	A wide variety of practical forward-progress guarantees are
	provided by real-time systems, as discussed in
	\cref{sec:advsync:Parallel Real-Time Computing}.
\item[\IXG{Fragmentation}:]
	A memory pool that has a large amount of unused memory, but
	not laid out to permit satisfying a relatively small request
	is said to be fragmented.
	External fragmentation occurs when the space is divided up
	into small fragments lying between allocated blocks of memory,
	while internal fragmentation occurs when specific requests or
	types of requests have been allotted more memory than they
	actually requested.
\item[\IXGh{Fully Associative}{Cache}:]
	A fully associative cache contains only
	one set, so that it can hold any subset of
	memory that fits within its capacity.
\item[\IXG{Grace Period}:]
	A grace period is any contiguous time interval such that
	any \acr{rcu} read-side critical section that began before the
	start of that interval has
	completed before the end of that same interval.
	Many \acr{rcu} implementations define a grace period to be a
	time interval during which each thread has passed through at
	least one quiescent state.
	Since \acr{rcu} read-side critical sections by definition cannot
	contain quiescent states, these two definitions are almost
	always interchangeable.
\item[Hardware Transactional Memory (HTM):]\glsuseriii{htm}
	A transactional-memory system based on hardware instructions
	provided for this purpose, as discussed in
	\cref{sec:future:Hardware Transactional Memory}.
	(See ``Transactional memory''.)
\item[\IXG{Hazard Pointer}:]
	A scalable counterpart to a reference counter in which an
	object's reference count is represented implicitly by a count
	of the number of special hazard pointers referencing that object.
\item[\IXG{Heisenbug}:]
	A timing-sensitive bug that disappears from sight when you
	add print statements or tracing in an attempt to track it
	down.
\item[\IXG{Hot Spot}:]
	Data structure that is very heavily used, resulting in high
	levels of contention on the corresponding lock.
	One example of this situation would be a hash table with
	a poorly chosen hash function.
\item[\IXG{Humiliatingly Parallel}:]
	A problem or algorithm where adding threads significantly
	\emph{decreases} the overall cost of the computation, resulting in
	large superlinear speedups as threads are added (assuming sufficient
	CPUs are available).
\item[\IXG{Immutable}:]
	In this book, a synonym for read-mostly.
\item[\IXG{Invalidation}:]
	When a CPU wishes to write to a data item, it must first ensure
	that this data item is not present in any other CPUs' cache.
	If necessary, the item is removed from the other CPUs' caches
	via ``invalidation'' messages from the writing CPUs to any
	CPUs having a copy in their caches.
\item[IPI:]\glsuseriii{ipi}
	Inter-processor interrupt, which is an
	interrupt sent from one CPU to another.
	IPIs are used heavily in the Linux kernel, for example, within
	the scheduler to alert CPUs that a high-priority process is now
	runnable.
\item[IRQ:]\glsuseriii{irq}
	Interrupt request, often used as an abbreviation for ``interrupt''
	within the Linux kernel community, as in ``irq handler''.
\item[\IXG{Latency}:]
	The wall-clock time required for a given operation to complete.
\item[\IXG{Linearizable}:]
	A sequence of operations is ``linearizable'' if there is at
	least one global ordering of the sequence that is consistent
	with the observations of all CPUs and/or threads.
	Linearizability is much prized by many researchers, but less
	useful in practice than one might
	expect~\cite{AndreasHaas2012FIFOisnt}.
\item[\IXG{Livelock}:]
	A failure mode in which each of several threads is able to
	execute, but in which a repeating series of failed operations
	prevents any of the threads from making any useful forward progress.
	For example, incorrect use of conditional locking
	(for example, \co{spin_trylock()} in the Linux kernel)
	can result in livelock.
	More information is provided in
	\cref{sec:locking:Livelock and Starvation}.
\item[\IXG{Lock}:]
	A software abstraction that can be used to guard critical sections,
	as such, an example of a ``mutual exclusion mechanism''.
	An ``exclusive lock'' permits only one thread at a time into the
	set of critical sections guarded by that lock, while a
	``reader-writer lock'' permits any number of reading
	threads, or but one writing thread, into the set of critical
	sections guarded by that lock.
	(Just to be clear, the presence	of a writer thread in any of
	a given reader-writer lock's critical sections will prevent
	any reader from entering any of that lock's critical sections
	and vice versa.)
\item[\IXG{Lock Contention}:]
	A lock is said to be suffering contention when it is being
	used so heavily that there is often a CPU waiting on it.
	Reducing lock contention is often a concern when designing
	parallel algorithms and when implementing parallel programs.
\item[\IXG{Lock Free}:]
	A forward-progress guarantee in which at least one thread makes
	progress within a finite period of time.
\item[\IXG{Marked Access}:]
	A source-code memory access that uses a special function or
	macro, such as \co{READ_ONCE()}, \co{WRITE_ONCE()},
	\co{atomic_inc()}, and so on, in order to protect that access
	from compiler and/or hardware optimizations.
	In contrast, a plain access simply mentions the name of
	the object being accessed, so that in the following, line~2
	is the plain-access equivalent of line~1:
	\begin{VerbatimN}
	WRITE_ONCE(a, READ_ONCE(b) + READ_ONCE(c));
	a = b + c;
	\end{VerbatimN}
\item[\IXG{Memory}:]
	From the viewpoint of memory models, the main memory,
	caches, and store buffers in which values might be stored.
	However, this term is often used to denote the main memory
	itself, excluding caches and store buffers.
\item[\IXG{Memory Barrier}:]
	A compiler directive that might also include a special
	memory-barrier instruction.
	The purpose of a memory barrier is to order memory-reference
	instructions that executed before the memory barrier to precede
	those that will execute following that memory barrier.
	(See also ``read memory barrier'' and ``write memory barrier''.)
\item[\IXGh{Memory}{Consistency}:]
	A set of properties that impose constraints on the order in
	which accesses to groups of variables appear to occur.
	Memory consistency models range from sequential consistency,
	a very constraining model popular in academic circles, through
	process consistency, release consistency, and weak consistency.
\item[\IXGaltr{MESI Protocol}{MESI protocol}:]
	The
	cache-coherence protocol featuring
	modified, exclusive, shared, and invalid (MESI) states,
	so that this protocol is named after the states that the
	cache lines in a given cache can take on.
	A modified line has been recently written to by this CPU,
	and is the sole representative of the current value of
	the corresponding memory location.
	An exclusive cache line has not been written to, but this
	CPU has the right to write to it at any time, as the line
	is guaranteed not to be replicated into any other CPU's cache
	(though the corresponding location in main memory is up to date).
	A shared cache line is (or might be) replicated in some other
	CPUs' cache, meaning that this CPU must interact with those other
	CPUs before writing to this cache line.
	An invalid cache line contains no value, instead representing
	``empty space'' in the cache into which data from memory might
	be loaded.
\item[\IXGaltr{Moore's Law}{Moore's Law}:]
	A 1965 empirical projection by Gordon Moore that
	transistor density increases exponentially over
	time~\cite{GordonMoore1965MooresLaw}.
\item[\IXG{Mutual-Exclusion Mechanism}:]
	A software abstraction that regulates threads' access to
	``critical sections'' and corresponding data.
\item[NMI:]\glsuseriii{nmi}
	Non-maskable interrupt.
	As the name indicates, this is an extremely high-priority
	interrupt that cannot be masked.
	These are used for hardware-specific purposes such as profiling.
	The advantage of using NMIs for profiling is that it allows you
	to profile code that runs with interrupts disabled.
\item[\IXG{Non-Blocking}:]
	A group of academic forward-progress guarantees that includes
	bounded population-oblivious wait free,
	bounded wait free,
	wait free,
	lock free,
	obstruction free,
	clash free,
	starvation free, and
	deadlock free.
	See \cref{sec:advsync:Non-Blocking Synchronization}
	for more information.
\item[Non-Blocking Synchronization (NBS):]\glsuseriii{nbs}
	The use of algorithms, mechanisms, or techniques that provide
	non-blocking forward-progress guarantees.
	NBS is often used in a more restrictive sense of providing one
	of the stronger forward-progress guarantees, usually wait free or
	lock free, but sometimes also obstruction free.
	(See ``Non-blocking''.)
\item[NUCA:]\glsuseriii{nuca}
	Non-uniform cache architecture, where groups of CPUs share
	caches and/or store buffers.
	CPUs in a group can therefore exchange cache lines with each
	other much more quickly than they can with CPUs in other groups.
	Systems comprised of CPUs with hardware threads will generally
	have a NUCA architecture.
\item[NUMA:]\glsuseriii{numa}
	Non-uniform memory architecture, where memory is split into
	banks and each such bank is ``close'' to a group of CPUs,
	the group being termed a ``NUMA node''.
	An example NUMA machine is Sequent's NUMA-Q system, where
	each group of four CPUs had a bank of memory nearby.
	The CPUs in a given group can access their memory much
	more quickly than another group's memory.
\item[\IXGaltr{NUMA Node}{NUMA node}:]
	A group of closely placed CPUs and associated memory within
	a larger NUMA machines.
\item[\IXG{Obstruction Free}:]
	A forward-progress guarantee in which, in the absence of
	contention, every thread makes progress within a finite
	period of time.
\item[\IXG{Overhead}:]
	Operations that must be executed, but which do not contribute
	directly to the work that must be accomplished.
	For example, lock acquisition and release is normally considered
	to be overhead, and specifically to be synchronization overhead.
\item[\IXG{Parallel}:]
	In this book, a synonym of concurrent.
	Please see \cref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?}
	on \cpageref{sec:app:questions:What is the Difference Between ``Concurrent'' and ``Parallel''?}
	for a discussion of the recent distinction between these two
	terms.
\item[\IXG{Performance}:]
	Rate at which work is done, expressed as work per unit time.
	If this work is fully serialized, then the performance will
	be the reciprocal of the mean latency of the work items.
\item[\IXGr{Pipelined CPU}:]
	A CPU with a pipeline, which is
	an internal flow of instructions internal to the CPU that
	is in some way similar to an assembly line, with many of
	the same advantages and disadvantages.
	In the 1960s through the early 1980s, pipelined CPUs were the
	province of supercomputers, but started appearing in microprocessors
	(such as the 80486) in the late 1980s.
\item[\IXG{Plain Access}:]
	A source-code memory access that simply mentions the name of
	the object being accessed.
	(See ``Marked access''.)
\item[\IXGh{Process}{Consistency}:]
	A memory-consistency model in which each CPU's stores appear to
	occur in program order, but in which different CPUs might see
	accesses from more than one CPU as occurring in different orders.
\item[\IXG{Program Order}:]
	The order in which a given thread's instructions
	would be executed by a now-mythical ``in-order'' CPU that
	completely executed each instruction before proceeding to
	the next instruction.
	(The reason such CPUs are now the stuff of ancient myths
	and legends is that they were extremely slow.
	These dinosaurs were one of the many victims of
	Moore's-Law-driven increases in CPU clock frequency.
	Some claim that these beasts will roam the earth once again,
	others vehemently disagree.)
\item[\IXG{Quiescent State}:]
	In \acr{rcu}, a point in the code where there can be no references held
	to \acr{rcu}-protected data structures, which is normally any point
	outside of an \acr{rcu} read-side critical section.
	Any interval of time during which all threads pass through at
	least one quiescent state each is termed a ``grace period''.
\item[Quiescent-State-Based Reclamation (QSBR):]\glsuseriii{qsbr}
	An \acr{rcu} implementation style characterized by explicit quiescent
	states.
	In \acr{qsbr} implementations, read-side markers
	(\co{rcu_read_lock()} and \co{rcu_read_unlock()} in the Linux
	kernel) are no-ops~\cite{McKenney98,Slingwine95}.
	Hooks in other parts of the software (for example, the Linux-kernel
	scheduler) provide the quiescent states.
\item[\IXG{Race Condition}:]
	Any situation where multiple CPUs or threads can interact,
	though this term is often used in cases where such interaction
	is undesirable.
	(See ``Data race''.)
\item[\IXGaltr{RCU-Protected Data}{RCU-protected data}:]
	A block of dynamically allocated memory whose freeing will be
	deferred such that an \acr{rcu} grace period will elapse between the
	time that there were no longer any \acr{rcu}-reader-accessible pointers
	to that block and the time that that block is freed.
	This ensures that no \acr{rcu} readers will have access to that block at
	the time that it is freed.
\item[\IXGaltr{RCU-Protected Pointer}{RCU-protected pointer}:]
	A pointer to \acr{rcu}-protected data.
	Such pointers must be handled carefully, for example, any reader
	that intends to dereference an \acr{rcu}-protected pointer must
	use \co{rcu_dereference()} (or stronger) to load that pointer,
	and any updater must use \co{rcu_assign_pointer()} (or stronger)
	to store to that pointer.
	More information is provided in
	\cref{sec:memorder:Address- and Data-Dependency Difficulties}.
\item[\IXGalthmr{RCU Read-Side Critical Section}{RCU read-side}{critical section}:]
	A section of code protected by \acr{rcu}, for example, beginning with
	\co{rcu_read_lock()} and ending with \co{rcu_read_unlock()}.
	(See ``Read-side critical section''.)
\item[Read-Copy Update (RCU):]\glsuseriii{rcu}
	A synchronization mechanism that can be thought of as a replacement
	for reader-writer locking or reference counting.
	RCU provides extremely low-overhead access for readers, while
	writers incur additional overhead maintaining old versions
	for the benefit of pre-existing readers.
	Readers neither block nor spin, and thus cannot participate in
	deadlocks, however, they also can see stale data and can
	run concurrently with updates.
	RCU is thus best-suited for read-mostly situations where
	stale data can either be tolerated (as in routing tables)
	or avoided (as in the Linux kernel's System V IPC implementation).
\item[\IXGh{Read}{Memory Barrier}:]
	A memory barrier that is only guaranteed to affect the ordering
	of load instructions, that is, reads from memory.
	(See also ``memory barrier'' and ``write memory barrier''.)
\item[\IXG{Read Mostly}:]
	Read-mostly data is (again, as the name implies) rarely updated.
	However, it might be updated at any time.
\item[\IXG{Read Only}:]
	Read-only data is, as the name implies, never updated except
	by beginning-of-time initialization.
	In this book, a synonym for immutable.
\item[\IXGh{Read-Side}{Critical Section}:]
	A section of code guarded by read-acquisition of
	some reader-writer synchronization mechanism.
	For example, if one set of critical sections are guarded by
	read-acquisition of
	a given global reader-writer lock, while a second set of critical
	section are guarded by write-acquisition of that same reader-writer
	lock, then the first set of critical sections will be the
	read-side critical sections for that lock.
	Any number of threads may concurrently execute the read-side
	critical sections, but only if no thread is executing one of
	the write-side critical sections.
	(See also ``RCU read-side critical section''.)
\item[\IXGh{Reader-Writer}{Lock}:]
	A reader-writer lock is a mutual-exclusion mechanism that
	permits any number of reading
	threads, or but one writing thread, into the set of critical
	sections guarded by that lock.
	Threads attempting to write must wait until all pre-existing
	reading threads release the lock, and, similarly, if there
	is a pre-existing writer, any threads attempting to write must
	wait for the writer to release the lock.
	A key concern for reader-writer locks is ``fairness'':
	Can an unending stream of readers starve a writer or vice versa?
\item[\IXG{Real Time}:]
	A situation in which getting the correct result is not sufficient,
	but where this result must also be obtained within a given amount
	of time.
\item[\IXG{Reference Count}:]
	A counter that tracks the number of users of a given object or
	entity.
	Reference counters provide existence guarantees and are sometimes
	used to implement garbage collectors.
\item[\IXG{Release Store}:]
	A write to memory that has release semantics.
	Normal use cases pair an acquire load with a release store,
	in which case if the load returns the value stored, then all
	code executed by the loading CPU after that acquire load will
	see the effects of all memory-reference instructions executed
	by the storing CPU prior to that release store.
	Releasing a lock provides similar memory-ordering semantics,
	hence the ``release'' in ``release store''.
	(See also ``acquire load'' and ``memory barrier''.)
\item[\IXG{Scalability}:]
	A measure of how effectively a given system is able to utilize
	additional resources.
	For parallel computing, the additional resources are usually
	additional CPUs.
\item[\IXGh{Sequence}{Lock}:]
	A reader-writer synchronization mechanism in which readers
	retry their operations if a writer was present.
\item[\IXGh{Sequential}{Consistency}:]
	A memory-consistency model where all memory references appear to occur
	in an order consistent with
	a single global order, and where each CPU's memory references
	appear to all CPUs to occur in program order.
\item[Software Transactional Memory (HTM):]\glsuseriii{stm}
	A transactional-memory system capable running on computer systems
	without special hardware support.
	(See ``Transactional memory''.)
\item[\IXG{Starvation}:]
	A condition where at least one CPU or thread is unable to make
	progress due to an unfortunate series of resource-allocation
	decisions, as discussed in
	\cref{sec:locking:Livelock and Starvation}.
	For example, in a multisocket system, CPUs on one socket having
	privileged access to the data structure implementing a given lock
	could prevent CPUs on other sockets from ever acquiring that lock.
\item[\IXG{Starvation Free}:]
	A forward-progress guarantee in which, in the absence of
	failures, every thread makes progress within a finite
	period of time.
\item[\IXG{Store Buffer}:]
	A small set of internal registers used by a given CPU
	to record pending stores
	while the corresponding cache lines are making their
	way to that CPU\@.
	Also called ``store queue''.
\item[\IXG{Store Forwarding}:]
	An arrangement where a given CPU refers to its store buffer
	as well as its cache so as to ensure that the software sees
	the memory operations performed by this CPU as if they
	were carried out in program order.
\item[\IXGr{Superscalar CPU}:]
	A scalar (non-vector) CPU capable of executing multiple instructions
	concurrently.
	This is a step up from a pipelined CPU that executes multiple
	instructions in an assembly-line fashion---in a superscalar
	CPU, each stage of the pipeline would be capable of handling
	more than one instruction.
	For example, if the conditions were exactly right,
	the Intel Pentium Pro CPU from the mid-1990s could
	execute two (and sometimes three) instructions per clock cycle.
	Thus, a 200\,MHz Pentium Pro CPU could ``retire'', or complete the
	execution of, up to 400 million instructions per second.
\item[\IXG{Synchronization}:]
	Means for avoiding destructive interactions among CPUs or threads.
	Synchronization mechanisms include atomic RMW operations, memory
	barriers, locking, reference counting, hazard pointers, sequence
	locking, RCU, non-blocking synchronization, and transactional
	memory.
\item[\IXG{Teachable}:]
	A topic, concept, method, or mechanism that teachers believe that
	they understand completely and are therefore comfortable teaching.
\item[\IXG{Throughput}:]
	A performance metric featuring work items completed per unit time.
\item[Transactional Lock Elision (TLE):]\glsuseriii{tle}
	The use of transactional memory to emulate locking.
	Synchronization is instead carried out by conflicting accesses
	to the data to be protected by the lock.
	In some cases, this can increase performance because TLE
	avoids contention on the lock
	word~\cite{MartinPohlack2011HTM2TLE,Kleen:2014:SEL:2566590.2576793,PascalFelber2016rwlockElision,SeongJaePark2020HTMRCUlock}.
\item[Transactional Memory (TM):]\glsuseriii{tm}
	A synchronization mechanism that gathers groups of memory
	accesses so as to execute them atomically from the viewpoint
	of transactions on other CPUs or threads, discussed in
	\cref{sec:future:Transactional Memory,sec:future:Hardware Transactional Memory}.
\item[\IXG{Type-Safe Memory}:]
	Type-safe memory~\cite{Cheriton96a} is provided by a
	synchronization mechanism that prevents a given dynamically
	allocated object from changing to an incompatible type.
	Note that the object might well be freed and then reallocated, but
	the reallocated object is guaranteed to be of a compatible type.
	Within the Linux kernel, type-safe memory is provided within
	RCU read-side critical sections for memory allocated from slabs
	marked with the \co{SLAB_TYPESAFE_BY_RCU} flag.
	The strictly stronger existence guarantee also prevents freeing
	of the protected object.
\item[Unbounded Transactional Memory (UTM):]\glsuseriii{utm}
	A transactional-memory system based on hardware instructions
	provided for this purpose, but with special hardware or
	software capabilities that allow a given transaction to
	have a very large memory footprint.
	Such a system would at least partially avoid
	HTM's transaction-size limitations called out in
	\cref{sec:future:Transaction-Size Limitations}.
	(See ``Hardware transactional memory''.)
\item[\IXG{Unfairness}:]
	A condition where the progress of at least one CPU or thread
	is impeded by an unfortunate series of resource-allocation
	decisions, as discussed in
	\cref{sec:locking:Livelock and Starvation}.
	Extreme levels of unfairness are termed ``starvation''.
\item[\IXG{Unteachable}:]
	A topic, concept, method, or mechanism that the teacher does
	not understand well is therefore uncomfortable teaching.
\item[\IXGr{Vector CPU}:]
	A CPU that can apply a single instruction to multiple items of
	data concurrently.
	In the 1960s through the 1980s, only supercomputers had vector
	capabilities, but the advent of MMX in x86 CPUs and VMX in
	PowerPC CPUs brought vector processing to the masses.
\item[\IXG{Wait Free}:]
	A forward-progress guarantee in which every thread makes
	progress within a finite period of time.
\item[\IXGh{Write}{Memory Barrier}:]
	A memory barrier that is only guaranteed to affect the ordering
	of store instructions, that is, writes to memory.
	(See also ``memory barrier'' and ``read memory barrier''.)
\item[\IXGalth{Write Miss}{write}{cache miss}:]
	A cache miss incurred because the corresponding CPU attempted
	to write to a cache line that is read-only, most likely due
	to its being replicated in other CPUs' caches.
\item[\IXG{Write Mostly}:]
	Write-mostly data is (yet again, as the name implies) frequently
	updated.
\item[\IXGh{Write-Side}{Critical Section}:]
	A section of code guarded by write-acquisition of
	some reader-writer synchronization mechanism.
	For example, if one set of critical sections are guarded by
	write-acquisition of
	a given global reader-writer lock, while a second set of critical
	section are guarded by read-acquisition of that same reader-writer
	lock, then the first set of critical sections will be the
	write-side critical sections for that lock.
	Only one thread may execute in the write-side critical section
	at a time, and even then only if there are no threads are
	executing concurrently in any of the corresponding read-side
	critical sections.
\end{description}