\section{IOMMU device}\label{sec:Device Types / IOMMU Device}

The virtio-iommu device manages Direct Memory Access (DMA) from one or
more endpoints. It may act both as a proxy for physical IOMMUs managing
devices assigned to the guest, and as a virtual IOMMU managing emulated
and paravirtualized devices.

The driver first discovers endpoints managed by the virtio-iommu device
using platform specific mechanisms. It then sends requests to create
virtual address spaces and virtual-to-physical mappings for these
endpoints. In its simplest form, the virtio-iommu supports four request
types:

\begin{enumerate}
\item Create a domain and attach an endpoint to it.  \\
  \texttt{attach(endpoint = 0x8, domain = 1)}
\item Create a mapping between a range of guest-virtual and guest-physical
  addresses. \\
  \texttt{map(domain = 1, virt_start = 0x1000, virt_end = 0x1fff,
          phys = 0xa000, flags = READ)}

  Endpoint 0x8, for example a hardware PCI endpoint with BDF 00:01.0, can
  now read at addresses 0x1000-0x1fff. These accesses are translated
  into system-physical addresses by the IOMMU.

\item Remove the mapping.\\
  \texttt{unmap(domain = 1, virt_start = 0x1000, virt_end = 0x1fff)}

  Any access to addresses 0x1000-0x1fff by endpoint 0x8 would now be
  rejected.
\item Detach the device and remove the domain.\\
  \texttt{detach(endpoint = 0x8, domain = 1)}
\end{enumerate}

\subsection{Device ID}\label{sec:Device Types / IOMMU Device / Device ID}

23

\subsection{Virtqueues}\label{sec:Device Types / IOMMU Device / Virtqueues}

\begin{description}
\item[0] requestq
\item[1] eventq
\end{description}

\subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits}

\begin{description}
\item[VIRTIO_IOMMU_F_INPUT_RANGE (0)]
  Available range of virtual addresses is described in
    \field{input_range}.

\item[VIRTIO_IOMMU_F_DOMAIN_RANGE (1)]
  The number of domains supported is described in \field{domain_range}.

\item[VIRTIO_IOMMU_F_MAP_UNMAP (2)]
  Map and unmap requests are available.\footnote{Future extensions may add
  different modes of operations. At the moment, only
  VIRTIO_IOMMU_F_MAP_UNMAP is supported.}

\item[VIRTIO_IOMMU_F_BYPASS (3)]
  When not attached to a domain, endpoints downstream of the IOMMU
  can access the guest-physical address space.

\item[VIRTIO_IOMMU_F_PROBE (4)]
  The PROBE request is available.

\item[VIRTIO_IOMMU_F_MMIO (5)]
  The VIRTIO_IOMMU_MAP_F_MMIO flag is available.
\end{description}

\drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits}

The driver SHOULD accept any of the VIRTIO_IOMMU_F_INPUT_RANGE,
VIRTIO_IOMMU_F_DOMAIN_RANGE and VIRTIO_IOMMU_F_PROBE feature bits if
offered by the device.

\devicenormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits}

The device SHOULD offer feature bit VIRTIO_IOMMU_F_MAP_UNMAP.

\subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device / Device configuration layout}

The \field{page_size_mask} field is always present. The availability of
the other fields depends on the feature bits described in
\ref{sec:Device Types / IOMMU Device / Feature bits}.

\begin{lstlisting}
struct virtio_iommu_config {
  le64 page_size_mask;
  struct virtio_iommu_range_64 {
    le64 start;
    le64 end;
  } input_range;
  struct virtio_iommu_range_32 {
    le32 start;
    le32 end;
  } domain_range;
  le32 probe_size;
};
\end{lstlisting}

\drivernormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout}

The driver MUST NOT write to device configuration fields.

\devicenormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout}

The device MUST set at least one bit in \field{page_size_mask}, describing
the page granularity. The device MAY set more than one bit in
\field{page_size_mask}.

\subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Device initialization}

When the device is reset, endpoints are not attached to any domain.

If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
unattached endpoints are allowed and translated by the IOMMU using the
identity function. If the feature is not negotiated, any memory access
from an unattached endpoint fails. Upon attaching an endpoint in
bypass mode to a new domain, any memory access from the endpoint fails,
since the domain does not contain any mapping.

Future devices might support more modes of operation besides MAP/UNMAP.
Drivers verify that devices set VIRTIO_IOMMU_F_MAP_UNMAP and fail
gracefully if they don't.

\drivernormative{\subsubsection}{Device Initialization}{Device Types / IOMMU Device / Device Initialization}

The driver MUST NOT negotiate VIRTIO_IOMMU_F_MAP_UNMAP if it is incapable
of sending VIRTIO_IOMMU_T_MAP and VIRTIO_IOMMU_T_UNMAP requests.

If the VIRTIO_IOMMU_F_PROBE feature is negotiated, the driver SHOULD send a
VIRTIO_IOMMU_T_PROBE request for each endpoint before attaching the
endpoint to a domain.

\devicenormative{\subsubsection}{Device Initialization}{Device Types / IOMMU Device / Device Initialization}

If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
device SHOULD NOT let endpoints access the guest-physical address space.

\subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations}

The driver sends requests on the request virtqueue, notifies the device and
waits for the device to return the request with a status in the used ring.
All requests are split in two parts: one device-readable, one
device-writable.

\begin{lstlisting}
struct virtio_iommu_req_head {
  u8   type;
  u8   reserved[3];
};

struct virtio_iommu_req_tail {
  u8   status;
  u8   reserved[3];
};
\end{lstlisting}

Type may be one of:

\begin{lstlisting}
#define VIRTIO_IOMMU_T_ATTACH     1
#define VIRTIO_IOMMU_T_DETACH     2
#define VIRTIO_IOMMU_T_MAP        3
#define VIRTIO_IOMMU_T_UNMAP      4
#define VIRTIO_IOMMU_T_PROBE      5
\end{lstlisting}

A few general-purpose status codes are defined here.

\begin{lstlisting}
/* All good! Carry on. */
#define VIRTIO_IOMMU_S_OK         0
/* Virtio communication error */
#define VIRTIO_IOMMU_S_IOERR      1
/* Unsupported request */
#define VIRTIO_IOMMU_S_UNSUPP     2
/* Internal device error */
#define VIRTIO_IOMMU_S_DEVERR     3
/* Invalid parameters */
#define VIRTIO_IOMMU_S_INVAL      4
/* Out-of-range parameters */
#define VIRTIO_IOMMU_S_RANGE      5
/* Entry not found */
#define VIRTIO_IOMMU_S_NOENT      6
/* Bad address */
#define VIRTIO_IOMMU_S_FAULT      7
/* Insufficient resources */
#define VIRTIO_IOMMU_S_NOMEM      8
\end{lstlisting}

When the device fails to parse a request, for instance if a request is too
small for its type and the device cannot find the tail, then it is unable
to set \field{status}. In that case, it returns the buffers without
writing to them.

Range limits of some request fields are described in the device
configuration:

\begin{itemize}
\item \field{page_size_mask} contains the bitmask of all page sizes that
  can be mapped. The least significant bit set defines the page
  granularity of IOMMU mappings.

  The smallest page granularity supported by the IOMMU is one byte. It is
  legal for the driver to map one byte at a time if bit 0 of
  \field{page_size_mask} is set.

  Other bits in \field{page_size_mask} are hints and describe larger page
  sizes that the IOMMU device handles efficiently. For example, when the
  device stores mappings using a page table tree, it may be able to
  describe large mappings using a few leaf entries in intermediate tables,
  rather than using lots of entries in the last level of the tree.
  Creating mappings aligned on large page sizes can improve performance
  since they require fewer page table and TLB entries.

\item If the VIRTIO_IOMMU_F_DOMAIN_RANGE feature is offered,
  \field{domain_range} describes the values supported in a \field{domain}
  field. If the feature is not offered, any \field{domain} value is valid.

\item If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered,
  \field{input_range} contains the virtual address range that the IOMMU is
  able to translate. Any mapping request to virtual addresses outside of
  this range fails.

  If the feature is not offered, virtual mappings span over the whole
  64-bit address space (\texttt{start = 0, end = 0xffffffff ffffffff}).
\end{itemize}
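
The range limits above can be sketched in C. This is only an illustration:
the helper names \texttt{viommu_granule} and \texttt{viommu_range_ok} are
hypothetical, and the values are assumed to have already been converted
from little-endian to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Page granularity: the least significant bit set in page_size_mask.
 * The device sets at least one bit, so a zero mask is a caller error.
 */
static uint64_t viommu_granule(uint64_t page_size_mask)
{
    assert(page_size_mask != 0);
    /* x & -x isolates the lowest set bit (two's complement negation). */
    return page_size_mask & (~page_size_mask + 1);
}

/*
 * True if the inclusive range [start, end] lies within [lo, hi].
 * Usable for input_range (virt_start/virt_end) and, with start == end,
 * for checking a domain ID against domain_range.
 */
static bool viommu_range_ok(uint64_t start, uint64_t end,
                            uint64_t lo, uint64_t hi)
{
    return start <= end && start >= lo && end <= hi;
}
```

For instance, a mask advertising 4KiB and 2MiB pages (\texttt{0x201000})
yields a granularity of \texttt{0x1000}.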

\drivernormative{\subsubsection}{Device operations}{Device Types / IOMMU Device / Device operations}

The driver SHOULD set field \field{reserved} of struct
virtio_iommu_req_head to zero and MUST ignore field \field{reserved} of
struct virtio_iommu_req_tail.

When a device uses a buffer without having written to it (i.e.
used length is zero), the driver SHOULD interpret it as a request failure.
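
A driver might fold the two failure cases (nothing written by the device,
or a non-zero \field{status}) into one result, as in the following sketch.
The function name \texttt{viommu_req_result} and the 0/-1 return
convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_S_OK 0

struct virtio_iommu_req_tail {
    uint8_t status;
    uint8_t reserved[3];
};

/*
 * used_len is the used length reported by the device for the request.
 * Returns 0 if the request succeeded, -1 otherwise.
 */
static int viommu_req_result(size_t used_len,
                             const struct virtio_iommu_req_tail *tail)
{
    assert(tail != NULL);

    /* The device returned the buffer without writing to it. */
    if (used_len == 0)
        return -1;

    /* tail->reserved is deliberately not inspected. */
    return tail->status == VIRTIO_IOMMU_S_OK ? 0 : -1;
}
```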

If the VIRTIO_IOMMU_F_INPUT_RANGE feature is negotiated, the driver MUST
NOT send requests with \field{virt_start} less than
\field{input_range.start} or \field{virt_end} greater than
\field{input_range.end}.

If the VIRTIO_IOMMU_F_DOMAIN_RANGE feature is negotiated, the driver MUST
NOT send requests with \field{domain} less than \field{domain_range.start}
or greater than \field{domain_range.end}.

\devicenormative{\subsubsection}{Device operations}{Device Types / IOMMU Device / Device operations}

The device SHOULD set \field{status} to VIRTIO_IOMMU_S_OK if a request
succeeds.

If a request \field{type} is not recognized, the device SHOULD NOT write
the buffer and SHOULD set the used length to zero.

The device MUST ignore field \field{reserved} of struct
virtio_iommu_req_head and SHOULD set field \field{reserved} of struct
virtio_iommu_req_tail to zero.

\subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device operations / ATTACH request}

\begin{lstlisting}
struct virtio_iommu_req_attach {
  struct virtio_iommu_req_head head;
  le32 domain;
  le32 endpoint;
  u8   reserved[8];
  struct virtio_iommu_req_tail tail;
};
\end{lstlisting}

Attach an endpoint to a domain. \field{domain} uniquely identifies a
domain within the virtio-iommu device. If the domain doesn't exist in the
device, it is created. Semantics of the \field{endpoint} identifier are
platform specific, but the following rules apply:

\begin{itemize}
\item The endpoint ID uniquely identifies an endpoint from the
  virtio-iommu point of view. Multiple endpoints whose DMA transactions
  are not translated by the same virtio-iommu device can have the same
  endpoint ID. Endpoints whose DMA transactions may be translated by the
  same virtio-iommu device have different endpoint IDs.

\item On some platforms, it might not be possible to completely isolate
  two endpoints from each other. For example on a conventional PCI bus,
  endpoints can snoop DMA transactions from other endpoints on the same
  bus. Such limitations need to be communicated in a platform specific
  way.
\end{itemize}

Multiple endpoints can be attached to the same domain. An endpoint can be
attached to only one domain at a time. Endpoints attached to different
domains are isolated from each other.

\drivernormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request}

The driver SHOULD set \field{reserved} to zero.

The driver SHOULD ensure that endpoints that cannot be isolated from each
other are attached to the same domain.

\devicenormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request}

If the \field{reserved} field of an ATTACH request is not zero, the device
MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.

If the endpoint identified by \field{endpoint} doesn't exist, the device
MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_NOENT.

If another endpoint is already attached to the domain identified by
\field{domain}, then the device MAY attach the endpoint identified by
\field{endpoint} to the domain. If it cannot do so, the device MUST reject
the request and set \field{status} to VIRTIO_IOMMU_S_UNSUPP.

If the endpoint identified by \field{endpoint} is already attached to
another domain, then the device SHOULD first detach it from that domain
and attach it to the one identified by \field{domain}. In that case the
device SHOULD behave as if the driver issued a DETACH request with this
\field{endpoint}, followed by the ATTACH request. If the device cannot do
so, it MUST reject the request and set \field{status} to
VIRTIO_IOMMU_S_UNSUPP.

If properties of the endpoint (obtained with a PROBE request) are
compatible with properties of other endpoints already attached to the
requested domain, then the device SHOULD attach the endpoint. Otherwise
the device SHOULD reject the request and set \field{status} to
VIRTIO_IOMMU_S_UNSUPP.

A device that does not reject the request MUST attach the endpoint.

\subsubsection{DETACH request}\label{sec:Device Types / IOMMU Device / Device operations / DETACH request}

\begin{lstlisting}
struct virtio_iommu_req_detach {
  struct virtio_iommu_req_head head;
  le32 domain;
  le32 endpoint;
  u8   reserved[8];
  struct virtio_iommu_req_tail tail;
};
\end{lstlisting}

Detach an endpoint from a domain. When this request completes, the
endpoint cannot access any mapping from that domain anymore. If feature
VIRTIO_IOMMU_F_BYPASS has been negotiated, then once this request
completes all accesses from the endpoint are allowed and translated by the
IOMMU using the identity function.

After all endpoints have been successfully detached from a domain, it
ceases to exist and its ID can be reused by the driver for another domain.
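
The lifetime of a domain across ATTACH and DETACH requests can be modelled
with a small table, as sketched below. The model is hypothetical: it
assumes four endpoints with IDs 0 to 3, ignores mappings entirely, and
chooses to report VIRTIO_IOMMU_S_INVAL for a mismatched DETACH, which the
device is permitted but not obliged to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_S_OK     0
#define VIRTIO_IOMMU_S_INVAL  4
#define VIRTIO_IOMMU_S_NOENT  6

#define NUM_ENDPOINTS 4   /* hypothetical: endpoint IDs 0..3 exist */

struct viommu_model {
    bool     attached[NUM_ENDPOINTS];
    uint32_t domain[NUM_ENDPOINTS];
};

/* A domain exists for as long as at least one endpoint is attached. */
static bool model_domain_exists(const struct viommu_model *m, uint32_t domain)
{
    size_t i;

    for (i = 0; i < NUM_ENDPOINTS; i++)
        if (m->attached[i] && m->domain[i] == domain)
            return true;
    return false;
}

static uint8_t model_attach(struct viommu_model *m, uint32_t endpoint,
                            uint32_t domain)
{
    assert(m != NULL);
    if (endpoint >= NUM_ENDPOINTS)
        return VIRTIO_IOMMU_S_NOENT;
    /* Re-attaching behaves like a DETACH followed by the ATTACH. */
    m->attached[endpoint] = true;
    m->domain[endpoint] = domain;
    return VIRTIO_IOMMU_S_OK;
}

static uint8_t model_detach(struct viommu_model *m, uint32_t endpoint,
                            uint32_t domain)
{
    assert(m != NULL);
    if (endpoint >= NUM_ENDPOINTS)
        return VIRTIO_IOMMU_S_NOENT;
    if (!m->attached[endpoint] || m->domain[endpoint] != domain)
        return VIRTIO_IOMMU_S_INVAL;
    m->attached[endpoint] = false;
    return VIRTIO_IOMMU_S_OK;
}
```

In this model, a domain with two attached endpoints still exists after the
first DETACH and disappears after the second.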

\drivernormative{\paragraph}{DETACH request}{Device Types / IOMMU Device / Device operations / DETACH request}

The driver SHOULD set \field{reserved} to zero.

\devicenormative{\paragraph}{DETACH request}{Device Types / IOMMU Device / Device operations / DETACH request}

The device MUST ignore \field{reserved}.

If the endpoint identified by \field{endpoint} doesn't exist, then the
device MUST reject the request and set \field{status} to
VIRTIO_IOMMU_S_NOENT.

If the domain identified by \field{domain} doesn't exist, or if the
endpoint identified by \field{endpoint} isn't attached to this domain,
then the device MAY set the request \field{status} to
VIRTIO_IOMMU_S_INVAL.

The device MUST ensure that after being detached from a domain, the
endpoint cannot access any mapping from that domain.

\subsubsection{MAP request}\label{sec:Device Types / IOMMU Device / Device operations / MAP request}

\begin{lstlisting}
struct virtio_iommu_req_map {
  struct virtio_iommu_req_head head;
  le32  domain;
  le64  virt_start;
  le64  virt_end;
  le64  phys_start;
  le32  flags;
  struct virtio_iommu_req_tail tail;
};

/* Read access is allowed */
#define VIRTIO_IOMMU_MAP_F_READ   (1 << 0)
/* Write access is allowed */
#define VIRTIO_IOMMU_MAP_F_WRITE  (1 << 1)
/* Accesses are to memory-mapped I/O device */
#define VIRTIO_IOMMU_MAP_F_MMIO   (1 << 2)
\end{lstlisting}

Map a range of virtually-contiguous addresses to a range of
physically-contiguous addresses of the same size. After the request
succeeds, all endpoints attached to this domain can access memory in the
range $[virt\_start; virt\_end]$ (inclusive). For example, if an endpoint
accesses address $VA \in [virt\_start; virt\_end]$, the device (or the
physical IOMMU) translates the address: $PA = VA - virt\_start +
phys\_start$. If the access parameters are compatible with \field{flags}
(for instance, the access is write and \field{flags} are
VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows
the access to reach $PA$.

The range defined by \field{virt_start} and \field{virt_end} should be
within the limits specified by \field{input_range}. Given $phys\_end =
phys\_start + virt\_end - virt\_start$, the range defined by
\field{phys_start} and phys_end should be within the guest-physical
address space. This includes upper and lower limits, as well as any
carving of guest-physical addresses for use by the host. Guest physical
boundaries are set by the host in a platform specific way.

Availability and allowed combinations of \field{flags} depend on the
underlying IOMMU architectures. VIRTIO_IOMMU_MAP_F_READ and
VIRTIO_IOMMU_MAP_F_WRITE are usually implemented, although READ is
sometimes implied by WRITE. In addition, combinations such as ``WRITE and
not READ'' might not be supported.

The VIRTIO_IOMMU_MAP_F_MMIO flag is a memory type rather than a protection
flag. It is only available when the VIRTIO_IOMMU_F_MMIO feature has been
negotiated. Accesses to the mapping are not speculated, buffered, cached,
split into multiple accesses or combined with other accesses. It may be
used, for example, to map Message Signaled Interrupt doorbells when a
VIRTIO_IOMMU_RESV_MEM_T_MSI region isn't available. To trigger interrupts
the endpoint performs a direct memory write to another peripheral, the IRQ
chip.

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

\drivernormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request}

The driver SHOULD set undefined \field{flags} bits to zero.

\field{virt_end} MUST be strictly greater than \field{virt_start}.

The driver SHOULD set the VIRTIO_IOMMU_MAP_F_MMIO flag when the physical
range corresponds to memory-mapped device registers. The physical range
SHOULD have a single memory type: either normal memory or memory-mapped
I/O.

If it intends to allow read accesses from endpoints attached to
the domain, the driver MUST set the VIRTIO_IOMMU_MAP_F_READ flag.

If the VIRTIO_IOMMU_F_MMIO feature isn't negotiated, the driver MUST NOT
use the VIRTIO_IOMMU_MAP_F_MMIO flag.

\devicenormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request}

If \field{virt_start}, \field{phys_start} or (\field{virt_end} + 1) is
not aligned on the page granularity, the device SHOULD reject the request
and set \field{status} to VIRTIO_IOMMU_S_RANGE.
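
As a rough sketch, a device could implement the alignment check as
follows. The function name is hypothetical, and \texttt{granule} is
assumed to be the page granularity already derived from
\field{page_size_mask} (a single set bit).

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_IOMMU_S_OK     0
#define VIRTIO_IOMMU_S_RANGE  5

static uint8_t viommu_check_map_alignment(uint64_t granule,
                                          uint64_t virt_start,
                                          uint64_t virt_end,
                                          uint64_t phys_start)
{
    uint64_t low_bits;

    /* granule must be a single set bit (a power of two). */
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    low_bits = granule - 1;

    /*
     * virt_end is inclusive, so (virt_end + 1) is what gets checked.
     * If virt_end is UINT64_MAX the sum wraps to 0, which is aligned.
     */
    if ((virt_start & low_bits) || (phys_start & low_bits) ||
        ((virt_end + 1) & low_bits))
        return VIRTIO_IOMMU_S_RANGE;

    return VIRTIO_IOMMU_S_OK;
}
```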

If a mapping already exists in the requested range, the device SHOULD
reject the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.

If the device doesn't recognize a \field{flags} bit, it MUST reject the
request and set \field{status} to VIRTIO_IOMMU_S_INVAL.

If \field{domain} does not exist, the device SHOULD reject the request and
set \field{status} to VIRTIO_IOMMU_S_NOENT.

The device MUST NOT allow writes to a range mapped without the
VIRTIO_IOMMU_MAP_F_WRITE flag. However, if the underlying architecture
does not support write-only mappings, the device MAY allow reads to a
range mapped with VIRTIO_IOMMU_MAP_F_WRITE but not
VIRTIO_IOMMU_MAP_F_READ.

\subsubsection{UNMAP request}\label{sec:Device Types / IOMMU Device / Device operations / UNMAP request}

\begin{lstlisting}
struct virtio_iommu_req_unmap {
  struct virtio_iommu_req_head head;
  le32  domain;
  le64  virt_start;
  le64  virt_end;
  u8    reserved[4];
  struct virtio_iommu_req_tail tail;
};
\end{lstlisting}

Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. We define here
a mapping as a virtual region created with a single MAP request. All
mappings covered by the range $[virt\_start; virt\_end]$ (inclusive) are
removed.

The semantics of unmapping are specified in \ref{drivernormative:Device
Types / IOMMU Device / Device operations / UNMAP request} and
\ref{devicenormative:Device Types / IOMMU Device / Device operations /
UNMAP request}, and illustrated with the following requests, assuming each
example sequence starts with a blank address space. We define two
pseudocode functions \texttt{map(virt_start, virt_end) -> mapping} and
\texttt{unmap(virt_start, virt_end)}.

\begin{lstlisting}
(1) unmap(virt_start=0,
          virt_end=4)            -> succeeds, doesn't unmap anything

(2) a = map(virt_start=0,
            virt_end=9);
    unmap(0, 9)                  -> succeeds, unmaps a

(3) a = map(0, 4);
    b = map(5, 9);
    unmap(0, 9)                  -> succeeds, unmaps a and b

(4) a = map(0, 9);
    unmap(0, 4)                  -> fails, doesn't unmap anything

(5) a = map(0, 4);
    b = map(5, 9);
    unmap(0, 4)                  -> succeeds, unmaps a

(6) a = map(0, 4);
    unmap(0, 9)                  -> succeeds, unmaps a

(7) a = map(0, 4);
    b = map(10, 14);
    unmap(0, 14)                 -> succeeds, unmaps a and b
\end{lstlisting}

As illustrated by example (4), partially removing a mapping isn't
supported.
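
The seven examples above follow from a two-pass rule: first refuse the
request if any mapping would be split, then remove every mapping that lies
entirely inside the range. The following C sketch illustrates this rule
for a single domain. It is a toy model under stated assumptions: the
names, the fixed-size table and the boolean return values are all
hypothetical, and physical addresses and flags are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16   /* arbitrary limit for the sketch */

struct mapping {
    uint64_t start;       /* first virtual address, inclusive */
    uint64_t end;         /* last virtual address, inclusive */
    bool     used;
};

struct domain {
    struct mapping maps[MAX_MAPPINGS];
};

/* Create a mapping; fails if it overlaps an existing one or no slot is free. */
static bool domain_map(struct domain *d, uint64_t start, uint64_t end)
{
    size_t i, free_slot = MAX_MAPPINGS;

    for (i = 0; i < MAX_MAPPINGS; i++) {
        const struct mapping *m = &d->maps[i];

        if (!m->used) {
            if (free_slot == MAX_MAPPINGS)
                free_slot = i;
            continue;
        }
        if (start <= m->end && m->start <= end)
            return false;
    }
    if (free_slot == MAX_MAPPINGS)
        return false;
    d->maps[free_slot] = (struct mapping){ start, end, true };
    return true;
}

/* Remove all mappings covered by [start, end]; never splits a mapping. */
static bool domain_unmap(struct domain *d, uint64_t start, uint64_t end)
{
    size_t i;

    /* Pass 1: fail without side effects if a mapping would be split. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        const struct mapping *m = &d->maps[i];

        if (m->used && start <= m->end && m->start <= end &&
            (m->start < start || m->end > end))
            return false;
    }
    /* Pass 2: every overlapping mapping is now fully covered. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        struct mapping *m = &d->maps[i];

        if (m->used && m->start >= start && m->end <= end)
            m->used = false;
    }
    return true;
}

/* Number of mappings currently present in the domain. */
static size_t domain_count(const struct domain *d)
{
    size_t i, n = 0;

    for (i = 0; i < MAX_MAPPINGS; i++)
        n += d->maps[i].used ? 1 : 0;
    return n;
}
```

Checking for a split before removing anything is what lets example (4)
fail while leaving the address space untouched.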

This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been
negotiated.

\drivernormative{\paragraph}{UNMAP request}{Device Types / IOMMU Device / Device operations / UNMAP request}

The driver SHOULD set the \field{reserved} field to zero.

The range, defined by \field{virt_start} and \field{virt_end}, SHOULD
cover one or more contiguous mappings created with MAP requests. The range
MAY spill over unmapped virtual addresses.

The first address of a range MUST either be the first address of a mapping
or be outside any mapping. The last address of a range MUST either be the
last address of a mapping or be outside any mapping.

\devicenormative{\paragraph}{UNMAP request}{Device Types / IOMMU Device / Device operations / UNMAP request}

If the \field{reserved} field of an UNMAP request is not zero, the device
MAY set the request \field{status} to VIRTIO_IOMMU_S_INVAL, in which case
the device MAY perform the UNMAP operation.

If \field{domain} does not exist, the device SHOULD set the request
\field{status} to VIRTIO_IOMMU_S_NOENT.

If a mapping affected by the range is not covered in its entirety by the
range (the UNMAP request would split the mapping), then the device SHOULD
set the request \field{status} to VIRTIO_IOMMU_S_RANGE, and SHOULD NOT
remove any mapping.

If part of the range or the full range is not covered by an existing
mapping, then the device SHOULD remove all mappings affected by the range
and set the request \field{status} to VIRTIO_IOMMU_S_OK.

\subsubsection{PROBE request}\label{sec:Device Types / IOMMU Device / Device operations / PROBE request}

If the VIRTIO_IOMMU_F_PROBE feature bit is present, the driver sends a
VIRTIO_IOMMU_T_PROBE request for each endpoint that the virtio-iommu
device manages. This probe is performed before attaching the endpoint to
a domain.

\begin{lstlisting}
struct virtio_iommu_req_probe {
  struct virtio_iommu_req_head head;
  /* Device-readable */
  le32  endpoint;
  u8    reserved[64];

  /* Device-writable */
  u8    properties[probe_size];
  struct virtio_iommu_req_tail tail;
};
\end{lstlisting}

\begin{description}
\item[\field{endpoint}] has the same meaning as in ATTACH and DETACH
  requests.

\item[\field{reserved}] is used as padding, so that future extensions can
  add fields to the device-readable part.

\item[\field{properties}] contains a list of properties of the
  \field{endpoint}, filled by the device. The length of the
  \field{properties} field is \field{probe_size} bytes. Each property is
  described with a struct virtio_iommu_probe_property header, which may be
  followed by a value of size \field{length}.

\begin{lstlisting}
struct virtio_iommu_probe_property {
  le16 {
    type      : 12;
    reserved  : 4;
  };
  le16  length;
};
\end{lstlisting}

\end{description}

The driver allocates a buffer for the PROBE request, large enough to
accommodate \field{probe_size} bytes of \field{properties}. It writes
\field{endpoint} and adds the buffer to the request queue. The device
fills the \field{properties} field with a list of properties for this
endpoint.

The driver parses the first property by reading \field{type}, then
\field{length}. If the driver recognizes \field{type}, it reads and
handles the rest of the property. The driver then reads the next property,
which is located $(\field{length} + 4)$ bytes after the beginning of the
first one, and so on. The driver parses all properties until it reaches an
empty property (\field{type} is 0) or the end of \field{properties}.
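
The parsing loop might look like the following C sketch, which only counts
the RESV_MEM properties it finds. Several details are assumptions rather
than statements of this specification: the function names are
hypothetical, \field{type} is assumed to occupy the low 12 bits of the
first little-endian 16-bit word, and the walk stops early if a
\field{length} would run past the end of \field{properties}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_PROBE_T_RESV_MEM  1

#define PROBE_HDR_SIZE   4        /* le16 type/reserved + le16 length */
#define PROBE_TYPE_MASK  0x0fff   /* assumption: type is the low 12 bits */

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walk the property list and count the RESV_MEM properties. */
static size_t probe_count_resv_mem(const uint8_t *props, size_t probe_size)
{
    size_t off = 0, count = 0;

    assert(props != NULL);

    while (probe_size - off >= PROBE_HDR_SIZE) {
        uint16_t word = get_le16(props + off);
        uint16_t type = word & PROBE_TYPE_MASK;
        uint16_t reserved = word >> 12;
        uint16_t length = get_le16(props + off + 2);

        if (type == 0)
            break;      /* empty property: end of the list */
        if (length > probe_size - off - PROBE_HDR_SIZE)
            break;      /* value would overrun the buffer */

        /* Properties with non-zero reserved bits or unknown type are
         * skipped, but parsing continues with the next one. */
        if (reserved == 0 && type == VIRTIO_IOMMU_PROBE_T_RESV_MEM)
            count++;

        /* The next property starts (length + 4) bytes further. */
        off += PROBE_HDR_SIZE + (size_t)length;
    }
    return count;
}
```

The step always comes from \field{length}, never from a size implied by
\field{type}, so unknown properties can be skipped safely.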

Available property types are described in section
\ref{sec:Device Types / IOMMU Device / Device operations / PROBE properties}.

\drivernormative{\paragraph}{PROBE request}{Device Types / IOMMU Device / Device operations / PROBE request}

The size of \field{properties} MUST be \field{probe_size} bytes.

The driver SHOULD set field \field{reserved} of the PROBE request to zero.

If the driver doesn't recognize the \field{type} of a property, it SHOULD
ignore the property.

The driver SHOULD NOT deduce the property length from \field{type}.

The driver MUST ignore a property whose \field{reserved} field is not
zero.

If the driver ignores a property, it SHOULD continue parsing the list.

\devicenormative{\paragraph}{PROBE request}{Device Types / IOMMU Device / Device operations / PROBE request}

The device MUST ignore field \field{reserved} of a PROBE request.

If the endpoint identified by \field{endpoint} doesn't exist, then the
device SHOULD reject the request and set \field{status} to
VIRTIO_IOMMU_S_NOENT.

If the device does not offer the VIRTIO_IOMMU_F_PROBE feature, and if the
driver sends a VIRTIO_IOMMU_T_PROBE request, then the device SHOULD NOT
write the buffer and SHOULD set the used length to zero.

The device SHOULD set field \field{reserved} of a property to zero.

The device MUST write the size of a property without the struct
virtio_iommu_probe_property header, in bytes, into \field{length}.

When two properties follow each other, the device MUST put the second
property exactly $(\field{length} + 4)$ bytes after the beginning of the
first one.

If the \field{properties} list is smaller than \field{probe_size}, the
device SHOULD NOT write any property. It SHOULD reject the request and set
\field{status} to VIRTIO_IOMMU_S_INVAL.

If the device doesn't fill all \field{probe_size} bytes with properties,
it SHOULD fill the remaining bytes of \field{properties} with zeroes.

\subsubsection{PROBE properties}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties}

\begin{lstlisting}
#define VIRTIO_IOMMU_PROBE_T_RESV_MEM   1
\end{lstlisting}

\paragraph{Property RESV_MEM}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM}

The RESV_MEM property describes a chunk of reserved virtual memory. It may
be used by the device to describe virtual address ranges that cannot be
used by the driver, or that are special.

\begin{lstlisting}
struct virtio_iommu_probe_resv_mem {
  struct virtio_iommu_probe_property head;
  u8    subtype;
  u8    reserved[3];
  le64  start;
  le64  end;
};
\end{lstlisting}

Fields \field{start} and \field{end} describe the range of reserved virtual
addresses. \field{subtype} may be one of:

\begin{description}
  \item[VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)]
    These virtual addresses cannot be used in MAP requests. The region
    is reserved by the device, for example, if the platform needs to
    set up DMA mappings of its own.

  \item[VIRTIO_IOMMU_RESV_MEM_T_MSI (1)]
    This region is a doorbell for Message Signaled Interrupts (MSIs). It
    is similar to VIRTIO_IOMMU_RESV_MEM_T_RESERVED, in that the driver
    cannot map virtual addresses described by the property.

    In addition it provides information about MSI doorbells. If the
    endpoint doesn't have a VIRTIO_IOMMU_RESV_MEM_T_MSI property, then the
    driver creates an MMIO mapping to the doorbell of the MSI controller.
\end{description}

\drivernormative{\subparagraph}{Property RESV_MEM}{Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM}

The driver SHOULD NOT map any virtual address described by a
VIRTIO_IOMMU_RESV_MEM_T_RESERVED or VIRTIO_IOMMU_RESV_MEM_T_MSI property.

The driver MUST ignore \field{reserved}.

The driver SHOULD treat any \field{subtype} it doesn't recognize as if it
were VIRTIO_IOMMU_RESV_MEM_T_RESERVED.
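
Before sending a MAP request, a driver could test the requested range
against the reserved regions collected during PROBE. The sketch below
assumes the regions were stored in a hypothetical
\texttt{struct resv_region} array in host byte order, regardless of
\field{subtype}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct resv_region {
    uint64_t start;   /* first reserved virtual address, inclusive */
    uint64_t end;     /* last reserved virtual address, inclusive */
};

/* True if [virt_start, virt_end] intersects any of the n regions. */
static bool viommu_overlaps_resv(const struct resv_region *regions, size_t n,
                                 uint64_t virt_start, uint64_t virt_end)
{
    size_t i;

    assert(regions != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (virt_start <= regions[i].end && regions[i].start <= virt_end)
            return true;
    return false;
}
```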

\devicenormative{\subparagraph}{Property RESV_MEM}{Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM}

The device SHOULD set \field{reserved} to zero.

The device SHOULD NOT present more than one VIRTIO_IOMMU_RESV_MEM_T_MSI
property per endpoint.

The device SHOULD NOT present multiple RESV_MEM properties that overlap
each other for the same endpoint.

The device SHOULD reject a MAP request that overlaps a RESV_MEM region.

The device SHOULD NOT allow accesses from the endpoint to RESV_MEM regions
to affect any other component than the endpoint and the driver.

\subsubsection{Fault reporting}\label{sec:Device Types / IOMMU Device / Device operations / Fault reporting}

The device can report translation faults and other significant
asynchronous events on the event virtqueue. The driver initially populates
the queue with device-writable buffers. When the device needs to report
an event, it fills a buffer and notifies the driver. The driver consumes
the report and adds a new buffer to the virtqueue.

If no buffer is available, the device can either wait for one to be
consumed, or drop the event.

\begin{lstlisting}
struct virtio_iommu_fault {
  u8    reason;
  u8    reserved[3];
  le32  flags;
  le32  endpoint;
  le32  reserved1;
  le64  address;
};

#define VIRTIO_IOMMU_FAULT_F_READ     (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE    (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_ADDRESS  (1 << 8)
\end{lstlisting}

\begin{description}
  \item[\field{reason}] The reason for this report. It may have the
    following values:
    \begin{description}
      \item[VIRTIO_IOMMU_FAULT_R_UNKNOWN (0)] An internal error happened, or
        an error that cannot be described with the following reasons.
      \item[VIRTIO_IOMMU_FAULT_R_DOMAIN (1)] The endpoint attempted to
        access \field{address} without being attached to a domain.
      \item[VIRTIO_IOMMU_FAULT_R_MAPPING (2)] The endpoint attempted to
        access \field{address}, which wasn't mapped in the domain or
        didn't have the correct protection flags.
    \end{description}
  \item[\field{flags}] Information about the fault context.
  \item[\field{endpoint}] The endpoint causing the fault.
  \item[\field{reserved} and \field{reserved1}] Should be zero.
  \item[\field{address}] If VIRTIO_IOMMU_FAULT_F_ADDRESS is set, the
    address causing the fault.
\end{description}

When the fault is reported by a physical IOMMU, the fault reason may not
exactly match the reason of the original fault report. The device does its
best to find the closest match.

If the device encounters an internal error that wasn't caused by a
specific endpoint, it is unlikely that the driver would be able to do
anything other than print the fault and stop using the device, so reporting
the fault on the event queue isn't useful. In that case, we recommend
using the DEVICE_NEEDS_RESET status bit.

\drivernormative{\paragraph}{Fault reporting}{Device Types / IOMMU Device / Device operations / Fault reporting}

If the \field{reserved} field is not zero, the driver MUST ignore the
fault report.

The driver MUST ignore \field{reserved1}.

The driver MUST ignore undefined \field{flags}.

If the driver doesn't recognize \field{reason}, it SHOULD treat the fault
as if it were VIRTIO_IOMMU_FAULT_R_UNKNOWN.
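
These driver rules can be sketched as three small helpers. The structure
\texttt{struct fault_report} is a hypothetical host-byte-order copy of
struct virtio_iommu_fault, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_FAULT_R_UNKNOWN  0
#define VIRTIO_IOMMU_FAULT_R_DOMAIN   1
#define VIRTIO_IOMMU_FAULT_R_MAPPING  2

#define VIRTIO_IOMMU_FAULT_F_ADDRESS  (1 << 8)

/* Host-endian copy of a fault report taken from the event queue. */
struct fault_report {
    uint8_t  reason;
    uint8_t  reserved[3];
    uint32_t flags;
    uint32_t endpoint;
    uint32_t reserved1;
    uint64_t address;
};

/* A report with a non-zero reserved field is ignored; reserved1 is not
 * inspected at all. */
static bool fault_usable(const struct fault_report *f)
{
    assert(f != NULL);
    return f->reserved[0] == 0 && f->reserved[1] == 0 && f->reserved[2] == 0;
}

/* Unrecognized reasons are folded into VIRTIO_IOMMU_FAULT_R_UNKNOWN. */
static uint8_t fault_reason(const struct fault_report *f)
{
    assert(f != NULL);
    switch (f->reason) {
    case VIRTIO_IOMMU_FAULT_R_DOMAIN:
    case VIRTIO_IOMMU_FAULT_R_MAPPING:
        return f->reason;
    default:
        return VIRTIO_IOMMU_FAULT_R_UNKNOWN;
    }
}

/* address is only meaningful when the ADDRESS flag is set; undefined
 * flag bits are masked out by testing this bit alone. */
static bool fault_has_address(const struct fault_report *f)
{
    assert(f != NULL);
    return (f->flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) != 0;
}
```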

\devicenormative{\paragraph}{Fault reporting}{Device Types / IOMMU Device / Device operations / Fault reporting}

The device SHOULD set \field{reserved} and \field{reserved1} to zero.

The device SHOULD set undefined \field{flags} to zero.

The device SHOULD write a valid endpoint ID in \field{endpoint}.

The device MAY omit setting VIRTIO_IOMMU_FAULT_F_ADDRESS and writing
\field{address} in any fault report, regardless of the \field{reason}.

If a buffer is too small to contain the fault report\footnotemark, the
device SHOULD NOT use multiple buffers to describe it. The device MAY fall
back to using an older fault report format that fits in the buffer.

\footnotetext{This would happen for example if the device implements a
more recent version of this specification, whose fault report contains
additional fields.}