|
In e820_setup(), the memory region of MB_BIOS is [MB_BIOS_BEGIN, MB_BIOS_END],
so its memory size should be MB_BIOS_SIZE (= MB_BIOS_END - MB_BIOS_BEGIN + 1).
The same applies to BDA, EBDA, MB_BIOS and VGA_ROM in setup_bios().
In addition, setup_irq_handler() is changed slightly to avoid a
hard-coded value.
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
Link: https://lore.kernel.org/r/SY6P282MB373318D6241D56E074B040DFA3392@SY6P282MB3733.AUSP282.PROD.OUTLOOK.COM
Signed-off-by: Will Deacon <will@kernel.org>
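
The inclusive-range arithmetic described in this commit can be sketched as follows. This is a minimal illustration, not the kvmtool code: the helper name inclusive_range_size() is hypothetical, and the addresses in the usage note are example values.

```c
#include <stdint.h>

/*
 * Size in bytes of the inclusive range [begin, end].
 * Hypothetical helper; both end points belong to the region,
 * hence the "+ 1".
 */
uint64_t inclusive_range_size(uint64_t begin, uint64_t end)
{
	return end - begin + 1;
}
```

For example, a region spanning 0xf0000 to 0xfffff inclusive is 0x10000 bytes long, not 0xffff.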
|
|
We add the "--disable-sbi-sta" option to allow users to disable the SBI
steal-time extension for the Guest.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-11-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zfa extension is available expose it to the guest
via device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-10-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zvfh[min] extensions are available, expose them to the guest
via device tree so that the guest can use them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-9-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zihintntl extension is available expose it to the guest
via device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-8-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zfh[min] extensions are available, expose them to the guest
via device tree so that the guest can use them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-7-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the vector extensions are available, expose them to the guest
via device tree so that the guest can use them. This includes extensions
Zvbb, Zvbc, Zvkb, Zvkg, Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh,
and Zvkt.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the scalar extensions are available, expose them to the guest
via device tree so that the guest can use them. This includes extensions
Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zkr, Zksed, Zksh, and Zkt.
The Zkr extension requires SEED CSR emulation in user space so
we also add related KVM_EXIT_RISCV_CSR handling.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zbc extension is available expose it to the guest
via device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The absence of a __packed definition in kvm/compiler.h causes a build
failure after syncing kernel headers with Linux-6.8 because the
kernel header uapi/linux/virtio_pci.h uses __packed for structures.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We sync-up Linux headers to get latest KVM RISC-V headers having
Zbc, Scalar crypto, Vector crypto, Zfh[min], Zihintntl, Zvfh[min],
Zfa, and SBI steal-time support.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20240325153141.6816-2-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Our team found that a publicly known QEMU 9pfs security issue[1] also exists
in upstream kvmtool's 9pfs device. A privileged guest user can create
and access special device files (e.g., block device files) in the shared
folder, allowing a malicious user to access host devices and
achieve privilege escalation.
The virtio_p9_open() function in 9p.c only checks for the directory
attribute and does not check for special files. Special device files can
be filtered out in the device by checking the S_IFREG and S_IFDIR flag bits.
[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-2861
Link: https://lore.kernel.org/r/20240303183659.20656-1-ywsplz@gmail.com
Signed-off-by: Yanwu Shen <ywsPlz@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
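
The file-type filtering described in this commit can be sketched as a check on st_mode. This is an illustration under assumptions, not the actual 9p.c change: the helper name p9_mode_is_allowed() is hypothetical, and the real fix may structure the check differently.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: accept only regular files and directories.
 * Block and character devices, FIFOs, sockets and symlinks are
 * rejected, so a guest cannot open a host device node through the
 * shared folder.
 */
bool p9_mode_is_allowed(mode_t mode)
{
	return S_ISREG(mode) || S_ISDIR(mode);
}
```

A caller would typically obtain the mode from lstat() or fstat() on the host path and refuse the open when the helper returns false.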
|
|
According to the KVM API documentation (https://docs.kernel.org/virt/kvm/api.html),
the KVM_CREATE_PIT2 call is only valid after enabling in-kernel irqchip
support via KVM_CREATE_IRQCHIP.
Signed-off-by: Tengfei Yu <moehanabichan@gmail.com>
Link: https://lore.kernel.org/r/20240129123310.28118-1-moehanabichan@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since commit 74af1456dfa0, the virtio device emulation in KVMTOOL
calls irq__update_msix_route() upon guest poweroff, which crashes
KVMTOOL when the Guest uses PLIC emulation in user space. This is
because irq__update_msix_route() expects the irq_routing table to be
available, but the KVMTOOL PLIC emulation does not populate any
irq_routing entries.
Fixes: 74af1456dfa0 ("virtio: Cancel and join threads when exiting devices")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231130041633.78725-1-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The new SBI DBCN functions are forwarded by in-kernel KVM RISC-V module
to user-space so let us handle these calls in kvm_cpu_riscv_sbi() function.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-11-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Instead of hard-coding the mmu-type DT property, we should set it
based on satp_mode ONE_REG interface.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-10-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zicond extension is available expose it to the guest
via device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-9-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Smstateen extension is available expose it to the guest
via device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-8-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zicsr and Zifencei extensions are available, expose them to the
guest via device tree so that the guest can use them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-7-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zicntr and Zihpm extensions are available, expose them to the
guest via device tree so that the guest can use them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zba and Zbs extensions are available, expose them to the guest
via device tree so that the guest can use them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, the CPU_ISA_MAX_LEN is a fixed value so we will easily
run out of space when all possible ISA extensions supported by
KVM RISC-V are available.
Instead, let us make CPU_ISA_MAX_LEN depend upon the
isa_info_arr[] array size so that CPU_ISA_MAX_LEN automatically
adapts to the growing number of ISA extensions.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
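
Deriving a buffer length from the table size, as this commit describes, can be sketched as below. The shortened table contents, the struct layout and the per-extension budget ISA_EXT_NAME_BUDGET are assumptions made for illustration; only the names CPU_ISA_MAX_LEN and isa_info_arr[] come from the commit message.

```c
#include <stddef.h>

#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))

/* Hypothetical, shortened table; the real isa_info_arr[] is longer. */
struct isa_ext_info {
	const char *name;
};

const struct isa_ext_info isa_info_arr[] = {
	{ "zba" }, { "zbb" }, { "zbs" }, { "zicond" },
};

/* Assumed budget per extension (separator plus name), in bytes. */
#define ISA_EXT_NAME_BUDGET	16

/* Grows automatically whenever an entry is added to isa_info_arr[]. */
#define CPU_ISA_MAX_LEN	(ARRAY_SIZE(isa_info_arr) * ISA_EXT_NAME_BUDGET)
```

Because the length is computed at compile time from the array itself, adding an extension to the table cannot silently outgrow the buffer.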
|
|
Print the name of the ISA extension in the warning when generate_cpu_nodes()
drops it from the generated ISA string due to lack of space.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231128145628.413414-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
$ ./util/update_headers.sh ~/work/linux
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For RISC-V multilib toolchains, we must specify -mabi and -march
options when linking guest/init.
Fixes: 2e99678314c2 ("riscv: Initial skeletal support")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-7-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The KVM RISC-V kernel module supports the AIA in-kernel irqchip when
the underlying host has AIA support. We detect and use the AIA in-kernel
irqchip whenever possible; otherwise we fall back to the PLIC emulated
in user-space.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To use irqfd with in-kernel AIA irqchip, we add custom
irq__add_irqfd and irq__del_irqfd functions. This allows
us to defer actual KVM_IRQFD ioctl() until AIA irqchip
is initialized by KVMTOOL.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
There will be different types of irqchip:
1) PLIC emulated by user-space
2) AIA APLIC and IMSIC provided by in-kernel KVM module
To support this, we de-couple PLIC-specific code from generic
RISC-V code (such as FDT generation) so that we can easily add
other types of irqchip. As part of the PLIC de-coupling, we
introduce various riscv_irqchip_xyz global variables to describe
the chosen irqchip, so the PLIC is no longer required to register
itself using device__register().
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Svnapot extension is available expose it to the guest via
device tree so that guest can use it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We sync-up Linux headers to get latest KVM RISC-V headers having
V, Svnapot, AIA and other extensions.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20231118132847.758785-2-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
I'm experiencing a segmentation fault in lkvm, which may crash after
powering off a guest machine that uses a virtio network device.
The crash is hard to reproduce because it seems to happen only when the
guest machine powers off while extra virtio threads are still doing
work. When it happens, lkvm crashes in virtio_net_rx_thread() while
attempting to read guest physical memory that has already been unmapped.
I've isolated the problem: when lkvm exits, it unmaps the guest memory
while extra virtio device threads may still be executing.
I noticed that most virtio devices do not call pthread_cancel() and
pthread_join() to synchronize their extra threads on exit. To make sure
this happens, I added explicit calls to the virtio device exit function
for all virtio devices, which should cancel and join all threads before
guest physical memory is unmapped. This fixes the crash for me.
Signed-off-by: Eduardo Bart <edub4rt@gmail.com>
Link: https://lore.kernel.org/r/20231117170455.80578-2-edub4rt@gmail.com
[will: Added commit message from https://lore.kernel.org/all/CABqCASLWAZ5aq27GuQftWsXSf7yLFCKwrJxWMUF-fiV7Bc4LUA@mail.gmail.com/]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
KVM_PCI_CFG_AREA is registered with kvm__register_mmio() during pci__init(),
but it isn't deregistered during pci__exit().
Deregister KVM_PCI_CFG_AREA with kvm__deregister_mmio() in pci__exit().
Signed-off-by: Tan En De <ende.tan@starfivetech.com>
Link: https://lore.kernel.org/r/20230916052303.1003-1-ende.tan@starfivetech.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Avoid using VIRTIO_IRQ_{HIGH,LOW} which belong to a different
namespace. Instead define VIRTIO_PCI_ISR_QUEUE as a logical extension
of the VIRTIO_PCI_ISR_* namespace. Since this bit flag is missing from
a header imported verbatim from Linux, define it directly in pci.c.
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20230912151623.2558794-4-keirf@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The PCI ISR is defined in the virtio spec as a set of flags which can
be bitwise ORed together. Therefore we should avoid clearing
previously-set flags.
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20230912151623.2558794-3-keirf@google.com
Signed-off-by: Will Deacon <will@kernel.org>
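
The difference between overwriting and accumulating ISR flags can be sketched as follows. The bit values follow the virtio ISR status layout (bit 0 for queue notifications, bit 1 for configuration changes); the helper isr_raise() is hypothetical and only illustrates the bitwise OR described in this commit.

```c
#include <stdint.h>

#define VIRTIO_PCI_ISR_QUEUE	0x1	/* virtqueue notification */
#define VIRTIO_PCI_ISR_CONFIG	0x2	/* configuration change */

/*
 * Hypothetical helper: OR the new flag into the current ISR value so
 * that a flag which is already pending is not lost. A plain
 * assignment (isr = flag) would clear it.
 */
uint8_t isr_raise(uint8_t isr, uint8_t flag)
{
	return isr | flag;
}
```

With a queue notification already pending, raising a configuration change yields 0x3, so the guest sees both causes when it reads the register.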
|
|
The PCI legacy IRQ line is level triggered, but is treated as
edge triggered via kvm__irq_trigger() for signalling of config
changes.
Fix this by using kvm__irq_level(), as for queue signalling.
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20230912151623.2558794-2-keirf@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
It can be useful to disable all network devices, for example, to remove the
compat warning for the default network device when the guest does not
initialize it. This can be done by passing mode=none to the --network
command line option, but without in-depth knowledge of the code, there is
no way for the user to know this. Update the help message for -n/--network
to explain what mode=none does.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20230907171655.6996-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This reverts commit 15757e8e6441d83757c39046a6cdd3e4d74200ce.
Turns out there's a way to disable the default virtio-net device: pass
--network mode=none when running a VM.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20230907171655.6996-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently, we ensure that guest RAM alloc size is at least 2M for
THP which works well for RV64 but breaks hugepage support for RV32.
To fix this, we use 4M as hugepage size for RV32.
Fixes: 867159a7963b ("riscv: Implement Guest/VM arch functions")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-10-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
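
The size selection described in this commit can be sketched as a function of the register width. The helper riscv_hugepage_size() and its xlen parameter are hypothetical; the actual code may select the value at compile time instead.

```c
#include <stdint.h>

#define SZ_2M	(UINT64_C(2) << 20)
#define SZ_4M	(UINT64_C(4) << 20)

/*
 * Hypothetical helper: minimum guest RAM allocation size for
 * transparent hugepages. Sv32 (RV32) megapages are 4M, while
 * Sv39 and larger modes (RV64) use 2M megapages.
 */
uint64_t riscv_hugepage_size(unsigned int xlen)
{
	return xlen == 32 ? SZ_4M : SZ_2M;
}
```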
|
|
When the Ssaia extension is available expose it to the guest.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-9-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zicboz extension is available expose it to the guest.
Also provide the guest the size of the cache block through DT.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-8-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The zbb extension allows software to use basic bitmanip instructions.
Let us add the zbb extension to the Guest device tree whenever it is
supported by the host.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-7-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Let us follow alphabetical order for listing ISA extensions in
the isa_info_arr[] array.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We add "--disable-sbi-<xyz>" options to disable various SBI extensions
visible to the Guest. This allows users to disable deprecated/redundant
SBI extensions.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We add command-line parameters to set custom mvendorid, marchid, and
mimpid so that users can present a fake CPU type to the Guest/VM that
does not match the underlying Host CPU.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We sync-up Linux headers to get latest KVM RISC-V headers having
SBI extension enable/disable, Zbb, Zicboz, and Ssaia support.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The latest x86 UAPI headers use the __DECLARE_FLEX_ARRAY() macro, so let us
take this macro from the Linux UAPI header and add it to include/linux/stddef.h.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20230712163501.1769737-2-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Compat messages are there to print a warning when the user creates a virtio
device for the VM, but the guest doesn't initialize it.
This generally works great, except that kvmtool will always create a
virtio-net device, even if the user hasn't specified one, which means that
each time kvmtool loads a guest that doesn't probe the network interface,
the user will get the compat warning. This can get particularly annoying
when running kvm-unit-tests, which doesn't need to use a network interface,
and the virtio-net warning is displayed after each test.
Let's fix this by skipping the compat message in the case of the
automatically created virtio-net device. This lets kvmtool keep the compat
warnings as they are, but removes the false positive.
Even if the user is relying on kvmtool creating the default virtio-net
device, a missing network interface in the guest is very easy to
discover.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20230714152909.31723-1-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Calculate the guest RAM size based on a ratio proportional to the
number of pages available in the host, rather than the amount of
memory available in bytes. This is to ensure that the
result is always page-aligned.
If the result of get_ram_size() isn't aligned to the host page
size, it triggers an error in __kvm_set_memory_region(), called
via the KVM_SET_USER_MEMORY_REGION ioctl, which requires the size
to be page-aligned.
Fixes: 18bd8c3bd2a7 ("kvm tools: Don't use all of host RAM for guests by default")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20230717121232.3559948-4-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
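
Scaling the page count before converting to bytes keeps the result page-aligned, as this commit explains. The sketch below assumes a ratio of 4/5 purely for illustration; the function names and the ratio are not taken from get_ram_size().

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Assumed share of host RAM offered to the guest (illustrative). */
#define RAM_RATIO_NUM	4
#define RAM_RATIO_DEN	5

/*
 * Scale the number of pages first, then multiply by the page size.
 * The result is a whole number of pages, hence always page-aligned.
 */
uint64_t guest_ram_size(uint64_t nr_pages, uint64_t page_size)
{
	uint64_t guest_pages = nr_pages * RAM_RATIO_NUM / RAM_RATIO_DEN;

	return guest_pages * page_size;
}

/* Query the host through sysconf(); returns 0 if a query fails. */
uint64_t host_guest_ram_size(void)
{
	long nr_pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGE_SIZE);

	if (nr_pages < 0 || page_size < 0)
		return 0;

	return guest_ram_size((uint64_t)nr_pages, (uint64_t)page_size);
}
```

Applying the ratio to a byte count instead could produce a size that is not a multiple of the page size, which KVM_SET_USER_MEMORY_REGION rejects.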
|
|
Factor out getting the number of physical pages available for the
host into a separate function. This will be used in a subsequent
patch.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20230717121232.3559948-3-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Factor out getting the page size of the host into a separate
function. This will be used in a subsequent patch.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20230717121232.3559948-2-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a --loglevel command line argument, with the possible values 'error',
'warning', 'info' or 'debug', to control which messages kvmtool displays. The
argument functions similarly to the Linux kernel parameter, where lower
verbosity levels hide all messages with a higher verbosity (for example,
'warning' hides info and debug messages, but allows warning and error
messages).
The default level is 'info', to match the current behaviour. --debug has
been kept as a legacy option, which might be removed in the future.
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230707151119.81208-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
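
The filtering rule described in this commit can be sketched with an ordered enum. The identifiers below (enum loglevel, parse_loglevel(), loglevel_allows()) are hypothetical and chosen for illustration; only the four level names come from the commit message.

```c
#include <string.h>

/* Ordered from least to most verbose. */
enum loglevel {
	LOGLEVEL_ERROR,
	LOGLEVEL_WARNING,
	LOGLEVEL_INFO,
	LOGLEVEL_DEBUG,
};

/* Map a --loglevel argument to a level; returns -1 for unknown names. */
int parse_loglevel(const char *arg)
{
	static const char *const names[] = {
		"error", "warning", "info", "debug",
	};

	for (int i = 0; i < 4; i++) {
		if (strcmp(arg, names[i]) == 0)
			return i;
	}
	return -1;
}

/* A message is shown when it is no more verbose than the current level. */
int loglevel_allows(enum loglevel current, enum loglevel msg)
{
	return msg <= current;
}
```

With the level set to 'warning', error and warning messages pass the check while info and debug messages are hidden.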
|
|
pr_debug() is special, because it can be suppressed with a command line
argument, and because it needs to be a macro to capture the correct
filename, function name and line number. Display debug messages with the
prefix "Debug", to make it clear that those aren't informational messages.
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230707151119.81208-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To prepare for allowing finer control over the messages that kvmtool
displays, replace printf() and fprintf() with the pr_* macros.
Minor changes were made to fix coding style issues that were pet peeves for
the author. And use pr_err() in kvm_cpu__init() instead of pr_warning() for
fatal errors.
Also, fix the message when printing the exit code for KVM_EXIT_UNKNOWN by
removing the '0x' part, because it's printing a decimal number, not a
hexadecimal one (the format specifier is %llu, not %llx).
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230707151119.81208-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Of all the pr_* functions, pr_err() is the only function that returns a
value, which is -1. The code in parse_options is the only code that relies
on pr_err() returning a value, and that value must be exactly -1, because
it is being treated differently than the other return values.
This makes the code opaque, because it's not immediately obvious where that
value comes from, and fragile, as a change in the return value of pr_err
would break it.
Make pr_err() more like the other functions and don't return a value.
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230707151119.81208-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The MSI and MSI-X implementation is a bit complex, because it keeps
track of capability and vector states as seen by both the guest and the
host. Add a few comments about those states and rename them to something
more accurate.
What's called phys_state at the moment represents the software state
maintained by VFIO and kvmtool, rather than the physical MSI capability,
so host_state is more correct. To be consistent, rename virt_state to
guest_state as well.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
MSI vectors can be masked and unmasked individually when using the MSI-X
capability, or when the classic MSI capability supports Per-Vector
Masking. At the moment we incorrectly initialize the guest's view of the
vectors (virt_state) as masked, so when using a MSI capability without
Per-Vector Masking, the vectors are never unmasked and MSIs don't work.
Initialize them unmasked instead.
Since VFIO doesn't support per-vector masking we implement it by
disconnecting the irqfd, and keep track of it with the vector's
phys_state. Initially the irqfd is not connected so phys_state is
masked.
Reported-by: Vivek Gautam <vivek.gautam@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230628112331.453904-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Vhost interprets the VIRTIO_F_ACCESS_PLATFORM flag as if accesses need
to use vhost-iotlb, and since kvmtool does not implement vhost-iotlb,
vhost will fail to access the virtqueue.
This fix is preventive. Kvmtool does not set VIRTIO_F_ACCESS_PLATFORM at
the moment but the Arm CCA and pKVM changes will likely hit the issue
(as experienced with the CCA development tree), so we might as well fix
it now.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-18-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To signal a virtqueue, a kernel vhost worker writes an eventfd
registered by kvmtool with VHOST_SET_VRING_CALL. When MSIs are
supported, this eventfd is connected directly to KVM IRQFD to inject the
interrupt into the guest. However direct injection does not work when
MSIs are not supported. The virtio-mmio transport does not support MSIs
at all, and even with virtio-pci, the guest may use INTx if the irqchip
does not support MSIs (e.g. irqchip=gicv3 on arm64).
In this case, injecting the interrupt requires writing an ISR register
in virtio to signal that it is a virtqueue notification rather than a
config change. Add a thread that polls the vhost eventfd for interrupts,
and notifies the guest. When the guest configures MSIs, disable polling
on the eventfd and enable direct injection.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-17-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Both ioeventfd and ipc use an epoll thread in roughly the same way. In
order to add a new epoll user, factor the common bits into epoll.c.
Slight implementation changes, which shouldn't affect behavior:
* At the moment ioeventfd mixes file descriptor (for the stop event) and
pointers in the epoll_event.data union, which could in theory cause
aliasing. Use a pointer for the stop event instead. kvm-ipc uses only
file descriptors. It could be changed but since epoll.c compares the
stop event pointer first, the risk of aliasing with an fd is much
lower there.
* kvm-ipc uses EPOLLET, edge-triggered events, but having the stop event
level-triggered shouldn't make a difference.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-16-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
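
Identifying the stop event by pointer, so that the epoll_event.data union never mixes file descriptors and pointers, can be sketched as below. This is a self-contained Linux demo, not the code in epoll.c: the names stop_marker and epoll_stop_event_demo() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Unique address used only to recognise the stop event. */
static int stop_marker;

/*
 * Register a stop eventfd with data.ptr set to a marker address,
 * signal it, and check that epoll_wait() hands the same pointer back.
 * Returns 1 if the stop event was recognised, 0 if another event was
 * seen, -1 on error.
 */
int epoll_stop_event_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &stop_marker };
	struct epoll_event out;
	uint64_t one = 1;
	int ret = -1;
	int epfd, stop_fd;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	stop_fd = eventfd(0, 0);
	if (stop_fd < 0)
		goto out_close_epoll;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, stop_fd, &ev) < 0)
		goto out_close_stop;
	if (write(stop_fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		goto out_close_stop;
	if (epoll_wait(epfd, &out, 1, 1000) != 1)
		goto out_close_stop;

	/* Only pointers are compared; no fd is stored in the union. */
	ret = out.data.ptr == &stop_marker;

out_close_stop:
	close(stop_fd);
out_close_epoll:
	close(epfd);
	return ret;
}
```

Since every registered event carries a pointer, a small integer file descriptor can never be mistaken for a handler address or the other way round.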
|
|
vhost-net requires opening one file descriptor for each TX/RX queue
pair. At the moment kvmtool does not support multi-queue vhost: it
issues all vhost ioctls on the first pair, and the other pairs are
broken. Refuse to enable vhost when the user asks for multi-queue.
Using multi-queue vhost-net also requires creating the tap interface
with the 'multi_queue' parameter.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-15-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The suggested CONFIG options do not exist.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-14-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a few instructions for testing the devices. Testing devices like
vhost-scsi or vsock may seem daunting but is relatively easy.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-13-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move VHOST_GET_FEATURES to get_host_features() so the guest is aware of
what will actually be supported. This removes the invalid guess about
VIRTIO_NET_F_MRG_RXBUF (if vhost didn't support it, we shouldn't let the
guest negotiate it).
Note the masking of VHOST_NET_F_VIRTIO_NET_HDR when handing features to
vhost. Unfortunately the vhost-net driver interprets VIRTIO_F_ANY_LAYOUT
as VHOST_NET_F_VIRTIO_NET_HDR, which is specific to vhost and forces
vhost-net to supply the vnet header. Since this is done by tap, we don't
want to set the bit.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-12-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We should advertise to the guest only the features supported by vhost
and kvmtool. Then we should set in vhost only the features acked by the
guest. Move vhost feature query to get_host_features(), and vhost
feature setting to device start (after the guest has acked features).
This fixes vsock because we used to enable all vhost features including
VIRTIO_F_ACCESS_PLATFORM, which forces vhost to use vhost-iotlb and
isn't supported by kvmtool.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-11-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
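
The two-step negotiation described in this commit amounts to two bitmask intersections, sketched below. The feature bit numbers follow the virtio specification; the helper names are hypothetical, and the real code obtains the vhost mask through VHOST_GET_FEATURES rather than taking it as a parameter.

```c
#include <stdint.h>

#define VIRTIO_F_VERSION_1		32
#define VIRTIO_F_ACCESS_PLATFORM	33

/* Offer the guest only what both vhost and kvmtool support. */
uint64_t features_to_advertise(uint64_t vhost_features, uint64_t tool_features)
{
	return vhost_features & tool_features;
}

/* At device start, hand vhost only what the guest actually acked. */
uint64_t features_for_vhost(uint64_t advertised, uint64_t guest_acked)
{
	return advertised & guest_acked;
}
```

If vhost offers VIRTIO_F_ACCESS_PLATFORM but kvmtool does not, the bit is never advertised, so it cannot end up enabled in vhost.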
|
|
We should advertise to the guest only the features supported by vhost
and kvmtool. Then we should set in vhost only the features acked by the
guest. Move vhost feature query to get_host_features(), and vhost
feature setting to device start (after the guest has acked features).
This fixes scsi because we used to enable all vhost features including
VIRTIO_SCSI_F_T10_PI which changes the request layout and caused
inconsistency between guest and vhost.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-10-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The Linux guest does not find any target when 'max_target' is 0.
Initialize it to the maximum defined by virtio, "5.6.4 Device
configuration layout":
max_target SHOULD be less than or equal to 255.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-9-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The SCSI backend doesn't call disk_image__new() so the disk ops are
NULL. Check for this case on exit.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-8-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix and simplify the command-line parameter for virtio-scsi. Currently
passing a "scsi:xxxx" parameter without the second "tpgt" argument
causes kvmtool to segfault. But only the "wwpn" parameter is necessary.
The tpgt parameter is ignored and was never used upstream. See
linux/vhost_types.h:
* ABI Rev 0: July 2012 version starting point for v3.6-rc merge candidate +
* RFC-v2 vhost-scsi userspace. Add GET_ABI_VERSION ioctl usage
* ABI Rev 1: January 2013. Ignore vhost_tpgt field in struct vhost_scsi_target.
* All the targets under vhost_wwpn can be seen and used by guset.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-7-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The vhost driver expects virtqueues to be operational by the time we
call SET_ENDPOINT. We currently do it too early. Device start, which
happens when the driver writes the DRIVER_OK status, is a good time to
do this.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-6-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
All vhost devices should perform the same operations when initializing
the IRQFD. Move it to virtio/vhost.c
This fixes vsock, which didn't go through the irq__add_irqfd() helper
and couldn't be used on systems that require GSI translation (GICv2m).
Also correct notify_vq_gsi() in net.c, to check which virtqueue is being
configured. Since vhost only manages the data queues, we shouldn't try
to setup GSI routing for the control queue. This hasn't been a problem
so far because the Linux guest doesn't use IRQs for the control queue.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-5-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
All vhost devices perform the same operation when setting up the
ioeventfd. Move it to virtio/vhost.c
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The VHOST_VRING* ioctls are common to all device types, move them to
virtio/vhost.c
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move vhost owner and memory table setup to virtio/vhost.c.
This also fixes vsock and SCSI which did not support multiple memory
regions until now (vsock didn't allocate the right region size and would
trigger a buffer overflow).
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606130426.978945-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
On a 32-bit build GCC complains about the min() parameters:
include/linux/kernel.h:36:24: error: comparison of distinct pointer types lacks a cast [-Werror]
36 | (void) (&_min1 == &_min2); \
| ^~
virtio/rng.c:78:34: note: in expansion of macro 'min'
78 | iov[0].iov_len = min(iov[0].iov_len, 256UL);
| ^~~
Use min_t() instead
Fixes: bc23b9d9b152 ("virtio/rng: return at least one byte of entropy")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606143733.994679-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
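A minimal sketch of why min_t() helps: on a 32-bit build size_t is typically unsigned int while 256UL is unsigned long, so a type-checking min() rejects the pair. Casting both operands to one type first avoids the mismatch. The macro below is a simplified stand-in (it relies on the GNU C statement-expression extension) and clamp_len() is a hypothetical helper, not the code in virtio/rng.c.

```c
#include <stddef.h>

/* Simplified min_t(): evaluate both arguments as 'type', then compare. */
#define min_t(type, a, b) ({ type _a = (a); type _b = (b); _a < _b ? _a : _b; })

/* Hypothetical helper: cap a buffer length at 256 bytes. */
static size_t clamp_len(size_t len)
{
	return min_t(size_t, len, 256UL);
}
```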
|
|
GCC 13.1 complains about uninitialized value:
arm/kvm-cpu.c: In function 'kvm_cpu__arch_init':
arm/kvm-cpu.c:119:41: error: 'target' may be used uninitialized [-Werror=maybe-uninitialized]
119 | vcpu->cpu_compatible = target->compatible;
| ~~~~~~^~~~~~~~~~~~
arm/kvm-cpu.c:40:32: note: 'target' was declared here
40 | struct kvm_arm_target *target;
| ^~~~~~
This can't happen in practice (we call die() when no target is found), but
initialize the target variable earlier to make GCC happy.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606143733.994679-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When looking for the silent flag 's' in MAKEFLAGS we accidentally catch
variable definitions like "ARCH=mips" or "CROSS_COMPILE=/cross/...",
causing several test builds to be silent.
MAKEFLAGS contains the single-letter make flags (without the dash),
followed by flags that don't have a single-letter equivalent such as
"--warn-undefined-variables" (with the dashes), followed by "--" and
command-line variables. For example `make ARCH=mips -k' results in
MAKEFLAGS "k -- ARCH=mips". Running $(filter-out --%) on this does not
discard ARCH=mips, only "--". However adding $(firstword) ensures that
we run the filter either on the single-letter flags or on something
beginning with "--", and avoids silent builds.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230606143733.994679-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In virtio/scsi.c we had a small hack to avoid compiler warnings when not
using cross-endian support: we were assigning a variable to itself.
This upsets clang:
virtio/scsi.c:63:7: error: explicitly assigning value of variable of type
'struct virtio_device *' to itself [-Werror,-Wself-assign]
This hack was needed because we use *macros* to do the endianness
conversion, and for architectures like x86 the "dev" argument was removed
from the code.
Provide the endianness conversion functions as inline functions, which do
not suffer from the unused problem.
This requires isolating the "endian" parameter, because there were
*two* different structures used as the first argument (virtio_device and
virt_queue), *both* with an identically defined "endian" member.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20230525144827.679651-3-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
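The idea can be sketched as follows. An inline function always "uses" its parameters as far as the compiler is concerned, so no self-assignment trick is needed even when a build ignores the endian argument. All names here (ENDIAN_LE, ENDIAN_BE, guest_to_host_u16) are hypothetical and are not kvmtool's actual helpers.

```c
#include <stdbool.h>
#include <stdint.h>

enum { ENDIAN_LE, ENDIAN_BE };	/* hypothetical guest endianness values */

static inline uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Detect host byte order at run time by inspecting the first byte. */
static inline bool host_is_le(void)
{
	const uint16_t one = 1;
	return *(const uint8_t *)&one == 1;
}

/* Takes only the "endian" value, so either structure can supply it. */
static inline uint16_t guest_to_host_u16(int endian, uint16_t val)
{
	bool guest_is_le = (endian == ENDIAN_LE);
	return guest_is_le == host_is_le() ? val : swap16(val);
}
```

Passing the endian value rather than a structure pointer is what lets both virtio_device and virt_queue callers share one function.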
|
|
The "force-pci" and "virtio-legacy" option definitions were using '\0'
to initialise an unused ".argh" member, even though this is a string.
This triggers warnings with some compilers like clang.
Also, for some odd reason, the .argh member was not named explicitly in
the option helper macros initialisation, which made this problem harder
to locate.
Sanitise the option macros by always using designated initialisers for
each member, and use the correct empty string for the "force-pci" and
"virtio-legacy" options.
This fixes warnings (promoted to errors) when compiling with clang.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20230525144827.679651-2-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
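A small sketch of the designated-initialiser style, using a hypothetical, reduced option structure (the real kvmtool option struct and macros have more members, and the help strings below are made up). Naming every member makes it obvious that .argh is a string and gets "" rather than '\0'.

```c
#include <stddef.h>

/* Hypothetical, reduced option descriptor for illustration only. */
struct option_sketch {
	const char *long_name;
	const char *argh;	/* argument hint: a string, not a char */
	const char *help;
};

/* Every member is named explicitly, so a wrong type is easy to spot. */
#define OPT_SKETCH(l, h) { .long_name = (l), .argh = "", .help = (h) }

static const struct option_sketch sketch_opts[] = {
	OPT_SKETCH("force-pci", "hypothetical help text"),
	OPT_SKETCH("virtio-legacy", "hypothetical help text"),
};
```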
|
|
In contrast to the original v0.9 virtio spec (which was rather vague),
the virtio 1.0+ spec demands that a RNG request returns at least one
byte:
"The device MUST place one or more random bytes into the buffer, but it
MAY use less than the entire buffer length."
Our current implementation does not prevent returning zero bytes, which
upsets an assert in EDK II. /dev/urandom should always return at least
256 bytes of entropy, unless interrupted by a signal.
Repeat the read if that happens, and give up if that fails as well.
This makes sure we return some entropy and become spec compliant.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reported-by: Sami Mujawar <sami.mujawar@arm.com>
Link: https://lore.kernel.org/r/20230524112207.586101-3-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
At the moment we use /dev/random as the backing device to provide random
numbers to our virtio-rng implementation. The downside of doing so is
that it may block indefinitely - or return EAGAIN repeatedly in our case.
On one headless system without ample noise sources (no keyboard, mouse,
or network traffic) I measured 30 seconds to gain one byte of randomness.
At the moment EDK II insists on waiting for all of the requested random
bytes (for its EFI_RNG_PROTOCOL runtime service) to arrive, which held up
a Linux kernel boot for more than 10 minutes(!).
According to the Internet(TM), on Linux /dev/urandom provides the same
quality random numbers as /dev/random, it just does not block when the
entropy estimation algorithm suggests so. For all practical purposes the
recommendation is to just use /dev/urandom, QEMU did the switch as well
in 2019 [1].
Use /dev/urandom instead of /dev/random when opening the file descriptor
providing the randomness source for the virtio/rng implementation.
Due to a special behaviour documented on the urandom(4) manpage, a read
from /dev/urandom will never block, so we can drop the O_NONBLOCK flag.
[1] https://gitlab.com/qemu-project/qemu/-/commit/a2230bd778d8
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230524112207.586101-2-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arm code tries to align the memory allocation size to 2M to potentially
make use of the transparent hugepages. But this would be problematic if we
try to allocate from the hugetlbfs, where the allocation size could be more than
2M. Given we support up to 1G, let us leave it to the user to align the
requested memory when hugetlbfs is used.
Without the patch:
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ mount -t hugetlbfs -o pagesize=1G none /root/hugemem/
$ lkvm run -m 1024 --hugetlbfs /root/hugemem/ ...
# lkvm run -k ... -m 1024 -c 6
Fatal: Can't ftruncate for mem mapping size 1075838976
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230405110905.669217-1-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
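The size in the error message is exactly 1024 MiB plus 2 MiB, which is consistent with the 2M alignment padding described above. The arithmetic can be checked with a short sketch; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/* Requested RAM size plus room to align the mapping to 2M. */
static uint64_t padded_alloc_size(uint64_t ram_size)
{
	return ram_size + SZ_2M;
}

/* A hugetlbfs file can only be sized in whole huge pages. */
static bool fits_page_size(uint64_t size, uint64_t page_size)
{
	return size % page_size == 0;
}
```

For -m 1024 the padded size is 1075838976 bytes, which is not a multiple of 1G, while the unpadded 1024 MiB is.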
|
|
This is a follow-up patch for [0] which proposed the --force-pci option
for riscv. As per the discussion it was concluded to add a virtio-transport
option taking one of four values (pci, pci-legacy, mmio, mmio-legacy).
With this change force-pci and virtio-legacy are both deprecated and
arm's default transport changes from MMIO to PCI as agreed in [0].
This is also true for riscv.
Nothing changes for other architectures.
[0]: https://lore.kernel.org/all/20230118172007.408667-1-rkanwal@rivosinc.com/
Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com>
Link: https://lore.kernel.org/r/20230320143344.404307-1-rkanwal@rivosinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The default serial and rtc IO regions overlap with the PCI IO BAR
region, causing BAR 0 activation to fail. Move these devices
to the MMIO region, similar to ARM.
Given serial has been moved from 0x3f8 to 0x10000000, this
requires us to now pass earlycon=uart8250,mmio,0x10000000
from cmdline rather than earlycon=uart8250,mmio,0x3f8.
To avoid the need to change the address every time the tool
is updated, we can also just pass "earlycon" from cmdline
and guest then finds the type and base address by following
the Device Tree's stdout-path property.
Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203122934.18714-1-rkanwal@rivosinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
By default, the KVM RISC-V keeps all extensions available to VCPU
enabled and KVMTOOL does not disable any extension.
We add --disable-<xyz> command-line options in KVMTOOL RISC-V to
allow users to explicitly disable certain extensions if they don't
desire them.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-7-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When the Zicbom extension is available expose it to the guest.
Also provide the guest the size of the cache block through DT.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We'll need one of these helpers in the next patch in another file.
Let's proactively move them all now, since others may some day also
be useful.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The zihintpause extension allows software to use the PAUSE instruction to
reduce energy consumption while executing spin-wait code sequences. Add the
zihintpause extension to the device tree if it is supported by the host.
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Svinval extension allows the guest OS to perform range based TLB
maintenance efficiently. Add the Svinval extension to the device
tree if it is supported by the host.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We update all UAPI headers based on Linux-6.1-rc1 so that we can
use latest features.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20221018140854.69846-2-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
GCC Version:
gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1)
hw/i8042.c: In function ‘kbd_io’:
hw/i8042.c:153:19: error: ‘value’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
state.write_cmd = val;
~~~~~~~~~~~~~~~~^~~~~
hw/i8042.c:298:5: note: ‘value’ was declared here
u8 value;
^~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:508: hw/i8042.o] Error 1
Signed-off-by: hbuxiaofei <hbuxiaofei@gmail.com>
Link: https://lore.kernel.org/r/20221102080501.69274-1-hbuxiaofei@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Although the PCI Status register only contains read-only and
write-1-to-clear bits, we currently keep anything written there, which
can confuse a guest.
The problem was highlighted by recent Linux commit 6cd514e58f12 ("PCI:
Clear PCI_STATUS when setting up device"), which unconditionally writes
0xffff to the Status register in order to clear pending errors. Then the
EDAC driver sees the parity status bits set and attempts to clear them
by writing 0xc100, which in turn clears the Capabilities List bit.
Later on, when the virtio-pci driver starts probing, it assumes due to
missing capabilities that the device is using the legacy transport, and
fails to setup the device because of mismatched protocol.
Filter writes to the config space, keeping only those to writable
fields. Tighten the access size check while we're at it, to prevent
overflow. This is only a small step in the right direction, not a
foolproof solution, because a guest could still write both Command and
Status registers using a single 32-bit write. More work is needed for:
* Supporting arbitrary sized writes.
* Sanitizing accesses to capabilities, which are device-specific.
Also remove the old hack that filtered accesses. It was most likely
guarding against ROM BAR writes, which is now handled by the
pci_config_writable bitmap.
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20221020173452.203043-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
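The failure sequence can be reproduced with a small model of the 16-bit Status register. This is a sketch of write-1-to-clear semantics in general, not kvmtool's pci_config_writable bitmap; the function names are hypothetical, and the bit masks follow the standard PCI Status register layout.

```c
#include <stdint.h>

#define PCI_STATUS_CAP_LIST 0x0010	/* read-only: Capabilities List present */
#define PCI_STATUS_W1C_MASK 0xf900	/* error bits, cleared by writing 1 */

/* Old behaviour: keep whatever the guest wrote. */
static uint16_t status_write_naive(uint16_t cur, uint16_t val)
{
	(void)cur;
	return val;
}

/* Filtered behaviour: a write can only clear W1C bits, nothing else changes. */
static uint16_t status_write_w1c(uint16_t cur, uint16_t val)
{
	return (uint16_t)(cur & ~(val & PCI_STATUS_W1C_MASK));
}
```

Replaying the guest's two writes (0xffff, then 0xc100) against the naive version drops the Capabilities List bit, while the filtered version leaves it set.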
|
|
VIRTIO_RING_F_EVENT_IDX is a bit position value, but
virtio_init_device_vq populates vq->use_event_idx by ANDing this value
directly to vdev->features.
Fix the check for this flag in virtio_init_device_vq.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr>
Link: https://lore.kernel.org/r/20220929121858.156-1-dinhngoc.tu@irit.fr
Signed-off-by: Will Deacon <will@kernel.org>
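The difference between a bit position and a bit mask is easy to show in isolation. VIRTIO_RING_F_EVENT_IDX is bit 29 in the virtio specification; the two function names below are hypothetical and only contrast the buggy and fixed checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_RING_F_EVENT_IDX 29	/* a bit position, not a mask */

/* Buggy: ANDs the features with the number 29 (binary 11101). */
static bool uses_event_idx_buggy(uint64_t features)
{
	return (features & VIRTIO_RING_F_EVENT_IDX) != 0;
}

/* Fixed: convert the position to a mask before testing. */
static bool uses_event_idx(uint64_t features)
{
	return (features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) != 0;
}
```

The buggy check reports the feature as enabled whenever any of feature bits 0, 2, 3 or 4 is set, and misses it when only bit 29 is set.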
|
|
We have all MMIO devices under "/smb" DT node so the serial0 alias
path should have "/smb" prefix.
Fixes: 7c9aac003925 ("riscv: Generate FDT at runtime for Guest/VM")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220815101325.477694-6-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Sstc extension allows the guest OS to program the timer directly without
relying on the SBI call. The kernel detects the presence of Sstc extension
from the riscv,isa DT property. Add the Sstc extension to the device tree
if it is supported by the host.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220815101325.477694-5-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The Svpbmt extension allows PTE based memory attributes in page tables.
This extension also allows Guest/VM to use PTE based memory attributes
in VS-stage page tables so let us add it to the Guest/VM ISA string when KVM
RISC-V supports it.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220815101325.477694-4-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The riscv,isa DT property only contains single letter base extensions
until now. However, there are also multi-letter extensions which were
ratified recently. Add a mechanism to append those extension details
to the device tree so that guest can leverage those.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220815101325.477694-3-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
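One possible shape for such a mechanism is a table of extension names that are appended to the base ISA string when supported. This is a sketch under assumptions: the structure, the function name and the use of an underscore as separator (as in the RISC-V ISA naming convention) are illustrative, not a copy of kvmtool's FDT code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: extension name plus whether the host has it. */
struct isa_ext {
	const char *name;
	int supported;
};

/* Append "_<name>" for each supported extension to the base ISA string. */
static void build_isa_string(char *buf, size_t len, const char *base,
			     const struct isa_ext *ext, size_t n)
{
	snprintf(buf, len, "%s", base);
	for (size_t i = 0; i < n; i++) {
		size_t used = strlen(buf);

		if (!ext[i].supported)
			continue;
		snprintf(buf + used, len - used, "_%s", ext[i].name);
	}
}
```

Adding support for a further extension then only needs a new table entry.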
|
|
We update all UAPI headers based on Linux-6.0-rc1 so that we can
use latest features.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220815101325.477694-2-apatel@ventanamicro.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When a script is specified for a guest nic setup, we fork() and execl()
the script when it is time to execute it. However this is not
optimal, given we are running a VM. The fork() will trigger marking the
entire page-table of the current process as CoW, which will trigger
unmapping the entire stage2 page tables from the guest. Anyway, the
child process will exec the script as soon as we fork(), making all
these mm operations moot. Also, this operation could be problematic
for confidential compute VMs, where it may be expensive (and sometimes
destructive) to make changes to the stage2 page tables.
So, instead we could use vfork() and avoid the CoW and unmap of the stage2.
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220809124816.2880990-1-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
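A minimal sketch of the vfork()-then-exec pattern, assuming a script that takes a single argument. run_script() is a hypothetical helper, not kvmtool's network setup code. After vfork() the child borrows the parent's address space, so it must do nothing except exec or _exit().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'path' with one argument and return its exit status, or -1 on error. */
static int run_script(const char *path, const char *arg)
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: only exec or _exit are safe here. */
		execl(path, path, arg, (char *)NULL);
		_exit(127);	/* exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent is suspended until the child execs or exits, no page-table copy-on-write marking is needed for the short window before the exec.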
|
|
The arm, arm64, powerpc and riscv architectures require that libfdt is
installed on the system, however the library might not be available for
every architecture on the user's distro of choice. Or the static version of
the library, needed for the lkvm-static target, might be missing.
Fortunately, kvmtool has anticipated this situation and it includes
instructions to compile and install libfdt in the INSTALL file.
Unfortunately, those instructions do not always work (for example, because
the user is missing the needed permissions), leaving the user unable to
compile kvmtool.
As an alternative to installing libfdt system-wide, provide the
LIBFDT_DIR variable when compiling kvmtool. For example, when compiling
with the command:
$ make ARCH=<arch> CROSS_COMPILE=<cross_compile> LIBFDT_DIR=<dir>
kvmtool will link the executable against the static version of the library
located in LIBFDT_DIR/libfdt.a.
LIBFDT_DIR takes precedence over the system library, as there are valid
reasons to prefer a self-compiled library over the one that the distro
provides (like the system library being older).
Note that this will slightly increase the size of the executable. For the
arm64 architecture, the increase has been measured to be about 100KB, or
about 5% of the total executable size.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220722141448.168252-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Use calloc() to avoid uninitialized fields in the rng device.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20220722141731.64039-5-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since commit 2108c86d0623 ("virtio/pci: Signal INTx interrupts as level
instead of edge"), virtio uses level-triggered IRQs. Bring the modern
device up to date, by deasserting the IRQ line when the guest reads the
interrupt status register.
Fixes: 3bf79498e6d5 ("virtio: Add support for modern virtio-pci")
Reported-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20220722141731.64039-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Variables set on the command-line are not overridden by normal
assignments. So when passing ARCH=x86_64 on the command-line, build
fails:
Makefile:227: *** This architecture (x86_64) is not supported in kvmtool.
Use the 'override' directive to force the ARCH reassignment.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220722141731.64039-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When running kvmtool after updating without doing a make clean, one
might run into strange issues such as:
Warning: Failed init: symbol_init
Fatal: Initialisation failed
or worse. This happens because symbol.o is not automatically rebuilt
after a change of headers, because .symbol.o.d is not in the $(DEPS)
variable. So if the layout of struct kvm_config changes, for example,
symbol.o that was built for an older version will try to read
kvm->vmlinux from the wrong location in struct kvm, and lkvm will die.
Add all .d files to $(DEPS). Also include $(STATIC_DEPS) which was
previously set but not used.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220722141731.64039-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pvtime uses ARM_PVTIME_BASE instead of ARM_PVTIME_SIZE for the size of the
memory region given to the guest, which causes the following error when
creating a flash device (via the -F/--flash command line argument):
Error: RAM (read-only) region [2000000-27fffff] would overlap RAM region [1020000-203ffff]
The read-only region represents the guest memory where the flash image is
copied by kvmtool. The region starting at 0x102_0000 (ARM_PVTIME_BASE) is
the pvtime region, which should be 64K in size. kvmtool erroneously creates
the region to be ARM_PVTIME_BASE in size instead, and the last address
becomes:
ARM_PVTIME_BASE + ARM_PVTIME_BASE - 1 = 0x102_0000 + 0x102_0000 - 1 = 0x203_ffff
which corresponds to the end of the region from the error message.
Do the right thing and make the pvtime memory region ARM_PVTIME_SIZE = 64K
bytes, as it was intended.
Fixes: 7d4671e5d372 ("aarch64: Add stolen time support")
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220629103905.24480-1-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
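The address arithmetic above can be verified with a short sketch. The base, size and flash region values come from the message; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_PVTIME_BASE 0x1020000ULL
#define ARM_PVTIME_SIZE 0x10000ULL	/* 64K */

/* Last byte address of a region starting at 'base' with 'size' bytes. */
static uint64_t region_last(uint64_t base, uint64_t size)
{
	return base + size - 1;
}

/* Two inclusive ranges overlap if each starts before the other ends. */
static bool regions_overlap(uint64_t a_first, uint64_t a_last,
			    uint64_t b_first, uint64_t b_last)
{
	return a_first <= b_last && b_first <= a_last;
}
```

With the size mistakenly set to ARM_PVTIME_BASE the region ends at 0x203ffff and collides with the flash region [0x2000000, 0x27fffff]; with ARM_PVTIME_SIZE it ends at 0x102ffff and does not.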
|
|
VIRTIO_PCI_F_SIGNAL_MSI is not a virtio feature but an internal flag.
Change it to bool to avoid confusion.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-13-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
According to the virtio spec, all vectors must be initialized to
VIRTIO_MSI_NO_VECTOR (0xffff). In 4.1.5.1.2.1 "Device Requirements:
MSI-X Vector Configuration":
The device MUST return vector mapped to a given event, (NO_VECTOR if
unmapped) on read of config_msix_vector/queue_msix_vector.
Currently we return 0, which is a valid MSI vector. Return NO_VECTOR
instead.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-12-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
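A minimal sketch of the initialisation, assuming a hypothetical per-device state structure with a fixed number of queues (neither is kvmtool's actual layout). Zero-filled memory would otherwise read back as vector 0, which is a valid vector.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR 0xffff
#define NR_VQS 4	/* hypothetical queue count */

/* Hypothetical MSI-X vector state for one device. */
struct msix_state {
	uint16_t config_vector;
	uint16_t vq_vector[NR_VQS];
};

/* Mark every vector as unmapped, as required on init and reset. */
static void msix_state_reset(struct msix_state *s)
{
	s->config_vector = VIRTIO_MSI_NO_VECTOR;
	for (size_t i = 0; i < NR_VQS; i++)
		s->vq_vector[i] = VIRTIO_MSI_NO_VECTOR;
}
```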
|
|
Add modern MMIO transport to virtio, make it the default. Legacy transport
can be enabled with --virtio-legacy. The main change for MMIO is the queue
addresses. They are now 64-bit addresses instead of 32-bit PFNs. Apart
from that all changes for supporting modern devices are already
implemented.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-11-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To make space for the modern register layout, move the current code to
mmio-legacy.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-10-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add support for modern virtio-pci implementation (based on the 1.0 virtio
spec). We add a new transport, alongside MMIO and PCI-legacy. This is now
the default when selecting PCI, but users can still select the legacy
transport for all virtio devices by passing "--virtio-legacy" on the
command-line.
The main change in modern PCI is the way we address virtqueues, using
64-bit values instead of PFNs. To keep the queue configuration atomic the
device also gets a "queue enable" register. Configuration is also made
extensible by more feature bits and PCI capabilities. Scalability is
improved as well, as devices can have notification registers for each
virtqueue on separate pages. However this implementation keeps a single
notification register.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-9-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To make space for the more recent virtio version, move the legacy bits of
virtio-pci to a different file.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-8-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Modern virtio uses more than 32 bits of features. Bump the feature
bitfield size to 64 bits.
virtio_set_guest_features() changes in behavior because it will now be
called multiple times, each time the guest writes to a 32-bit slice of
the features.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-7-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
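How a 64-bit feature word can be assembled from two 32-bit guest writes is sketched below. The selector convention (0 for the low half, 1 for the high half) follows the modern virtio register layout; the function name is hypothetical and the real virtio_set_guest_features() may combine slices differently.

```c
#include <stdint.h>

/* Replace one 32-bit half of the feature word, selected by 'sel'. */
static uint64_t set_features_slice(uint64_t features, uint32_t sel, uint32_t val)
{
	unsigned int shift = sel ? 32 : 0;

	features &= ~(0xffffffffULL << shift);
	return features | ((uint64_t)val << shift);
}
```

Since the guest writes each half separately, any handler built on this is invoked once per slice rather than once per negotiation.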
|
|
We currently call VHOST_SET_BACKEND from notify_vq_gsi(), which can't
work with modern virtio because vhost checks that the virtqueue is
accessible when handling VHOST_SET_BACKEND, and the modern driver
initializes the MSIs before setting up the virtqueue. Move
VHOST_SET_BACKEND to init_vq().
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-6-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Legacy virtio drivers write to the I/O port BAR, and the modern virtio
device uses the MMIO BAR. Since vhost can only listen on one ioeventfd,
select the one that the guest will use.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-5-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The doorbell offset depends on the transport - virtio-legacy uses a
fixed offset, but modern virtio can have per-vq offsets. Add an offset
field to the virtio_pci structure.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-4-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Modern virtio will need to reuse this code when initializing a
virtqueue. It's not much, but still nicer to have next to exit_vq().
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-3-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
On exit_vq() and device reset, remove the MSI routes that were set up at
runtime.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220701142434.75170-2-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Allow the user to specify the RAM base address by using -m/--mem size@addr
command line argument. The base address must be above 2GB, as to not
overlap with the MMIO I/O region.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-13-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a new function, kvm__arch_default_ram_address(), which returns the
default address for guest RAM for each architecture.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-12-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
RAM initialization is unnecessarily split between kvm__init_ram() and
kvm__arch_init(). Move all code related to RAM initialization to
kvm__init_ram(), making the code easier to follow and to modify.
One thing to note is that the initialization order is slightly altered:
kvm__arch_enable_mte() and gic__create() are now called before mmap'ing the
guest RAM. That is perfectly fine, as they don't use the host's mapping of
the guest memory.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-11-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The kvm struct already contains a pointer to the configuration, which
contains both hugetlbfs_path and ram_size, so it is not necessary to pass
them as arguments to kvm__arch_init().
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-10-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Allow the user to use the standard B (bytes), K (kilobytes), M (megabytes),
G (gigabytes), T (terabytes) and P (petabytes) suffixes for memory size.
When none are specified, the default is megabytes.
Also raise an error if the user specifies 0 as the memory size, instead
of treating it as uninitialized, as kvmtool has done so far.
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-9-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
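The suffix handling can be sketched as follows. This is an illustration under assumptions, not kvmtool's parser: parse_mem_size() is a hypothetical name, only upper-case suffixes are accepted, and the size@addr form is not handled.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "<number>[B|K|M|G|T|P]" into bytes; no suffix means megabytes.
 * Returns 0 on success, -1 for malformed, zero or overflowing sizes.
 */
static int parse_mem_size(const char *s, uint64_t *bytes)
{
	char *end;
	unsigned long long n;
	unsigned int shift;

	if (*s < '0' || *s > '9')
		return -1;	/* reject empty input, signs and whitespace */

	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (*end) {
	case 'B': shift = 0; break;
	case 'K': shift = 10; break;
	case 'M': shift = 20; break;
	case 'G': shift = 30; break;
	case 'T': shift = 40; break;
	case 'P': shift = 50; break;
	case '\0': shift = 20; break;	/* default unit: megabytes */
	default: return -1;
	}

	if (*end != '\0' && end[1] != '\0')
		return -1;	/* trailing characters after the suffix */
	if (n == 0 || n > (UINT64_MAX >> shift))
		return -1;	/* zero is an error, as is overflow */

	*bytes = (uint64_t)n << shift;
	return 0;
}
```

Under these rules "512" and "512M" both mean 512 MiB, "2G" means 2 GiB, and "0" is rejected.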
|
|
The ARM_HIMAP_MAX_MEMORY() is a remnant of a time when KVM only supported
40 bits of IPA. There are no users left for this macro, so remove it.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For 32-bit guests, the maximum memory size is represented by the define
ARM_LOMAP_MAX_MEMORY, which ARM_MAX_MEMORY() returns.
For 64-bit guests, the RAM size is checked against the maximum allowed
by KVM in kvm__get_vm_type().
There are no users left for the ARM_MAX_MEMORY() macro, remove it.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For 64-bit guests, kvmtool exits with an error in kvm__get_vm_type() if
the memory size is larger than what KVM supports. For 32-bit guests, the
RAM size is silently rounded down to ARM_LOMAP_MAX_MEMORY in
kvm__arch_init().
Be consistent and exit with an error when the user has configured the
wrong RAM size for 32-bit guests.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Architectures are free to set their own command line options. Add an
architecture specific hook to validate these options.
For now, the hook does nothing, but it will be used in later patches.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
host_ram_size() uses sysconf() to calculate the available ram, and
sysconf() can fail. When that happens, host_ram_size() returns 0. kvmtool
warns the user when the configured VM ram size exceeds the size of the
host's memory, but doesn't take into account that host_ram_size() can
return 0. If the function returns zero, skip the warning.
Since this can only happen when the user sets the memory size (via the
-m/--mem command line argument), skip the check entirely if the user hasn't
set it. Move the check to kvm_run_validate_cfg(), as it checks for valid
user configuration.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
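The check described in the commit above might look roughly like the sketch below. The names host_ram_bytes() and should_warn_about_ram() are hypothetical, and _SC_PHYS_PAGES is assumed to be available, as it is with glibc on Linux.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the host RAM size in bytes, or 0 if sysconf() fails. */
uint64_t host_ram_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGE_SIZE);

	if (pages < 0 || page_size < 0)
		return 0;

	return (uint64_t)pages * (uint64_t)page_size;
}

/*
 * Warn only when the user set the size explicitly and the host size is
 * actually known; a host size of 0 means "could not be determined".
 */
bool should_warn_about_ram(bool user_set, uint64_t vm_ram, uint64_t host_ram)
{
	if (!user_set || host_ram == 0)
		return false;

	return vm_ram > host_ram;
}
```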
|
|
The user can specify the virtual machine memory size in MB, which is saved
in cfg->ram_size. kvmtool validates it against the host memory size,
converted from bytes to MB. ram_size is then converted to bytes, and this
is how it is used throughout the rest of kvmtool.
To avoid any confusion about the unit of measurement, especially once the
user is allowed to specify the unit of measurement, always use ram_size in
bytes.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The help text for the -m/--mem argument states that the guest memory size
is in MiB (mebibyte). MiB is the same thing as MB (megabyte), and indeed
this is how MB is used throughout kvmtool.
Replace MiB with MB, so people don't get the wrong idea and start
believing that for kvmtool a MB is 10^6 bytes instead of 2^20.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220616134828.129006-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The GICv2 DT binding describes the third cell in each interrupt
descriptor as holding the trigger type, but also the CPU mask that this
IRQ applies to, in bits [15:8]. However this is not the case for GICv3,
where we don't use a CPU mask in the third cell: a simple mask wouldn't
fit for the many more supported cores anyway.
At the moment we fill this CPU mask field regardless of the GIC type,
for the PMU and arch timer DT nodes. This is not only the wrong thing to
do in case of a GICv3, but also triggers UBSAN splats when using more
than 30 cores, as we do shifting beyond what a u32 can hold:
$ lkvm run -k Image -c 31 --pmu
arm/timer.c:13:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
arm/timer.c:13:38: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int'
arm/timer.c:13:43: runtime error: left shift of 2147483647 by 8 places cannot be represented in type 'int'
arm/aarch64/pmu.c:202:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
arm/aarch64/pmu.c:202:38: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int'
arm/aarch64/pmu.c:202:43: runtime error: left shift of 2147483647 by 8 places cannot be represented in type 'int'
Fix that by adding a function that creates the mask by looking at the
GIC type first, and returning zero when a GICv3 is used. Also we
explicitly check for the CPU limit again, even though this would be
done before already, when we try to create a GICv2 VM with more than 8
cores.
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220616145526.3337196-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
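A mask helper along the lines described in the commit above could be sketched as follows. The identifiers are hypothetical, and clamping the CPU count to 8 is one possible way to enforce the GICv2 limit; the actual patch may report an error instead.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not kvmtool's actual identifiers. */
enum gic_type { GIC_V2, GIC_V3 };

#define GICV2_MAX_CPUS		8
#define IRQ_CPU_MASK_SHIFT	8	/* the CPU mask lives in bits [15:8] */

uint32_t irq_cpu_mask(enum gic_type type, unsigned int nr_cpus)
{
	/* GICv3 does not encode a CPU mask in the third cell. */
	if (type != GIC_V2)
		return 0;

	/* Keep the shift below within the 8-bit field. */
	if (nr_cpus > GICV2_MAX_CPUS)
		nr_cpus = GICV2_MAX_CPUS;

	/* Unsigned arithmetic avoids the signed-overflow UBSAN splats. */
	return ((UINT32_C(1) << nr_cpus) - 1) << IRQ_CPU_MASK_SHIFT;
}
```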
|
|
The code for creating an MSI route is already duplicated between config
and virtqueue MSI. Modern virtio will need it as well, so move it to a
separate function.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-17-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The current virtio-block implementation assumes that buffers have a
specific layout (5.2.6.4 "Legacy Interface: Framing Requirements").
Modern virtio removes this layout constraint, so we have to be careful
when reading buffers. Note that since the Linux driver uses the same
layout as the legacy transport, arbitrary layouts were not actually
tested.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-16-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Our virtio-console implementation already supports ANY_LAYOUT, because
buffers are accessed with scatter-gather operations. Advertise the
VIRTIO_F_ANY_LAYOUT feature.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-15-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Modern virtio demands that devices do not make assumptions about the
buffer layouts. Currently the user network backend assumes that TX
packets are neatly split between virtio-net header and ethernet frame.
Modern virtio-net usually puts everything into one descriptor, but could
also split the buffer arbitrarily. Handle arbitrary buffer layouts and
advertise the VIRTIO_F_ANY_LAYOUT feature.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-14-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The virtio_net header contains a 'num_buffers' field, used when the
VIRTIO_NET_F_MRG_RXBUF feature is negotiated. The legacy driver does not
present this field when the feature is not negotiated. In that case the
header is 2 bytes smaller.
When using the modern virtio transport, the header always contains the
field and in addition the device MUST set it to 1 when the
VIRTIO_NET_F_MRG_RXBUF is not negotiated. Prepare for modern virtio
support by enabling this case once the 'legacy' flag is switched off.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-13-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
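The header-size rule from the commit above can be illustrated with a small sketch. struct vnet_hdr is a local mirror of the virtio-net header layout, and both function names are hypothetical; endianness conversion of num_buffers is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the virtio-net header layout, for illustration only. */
struct vnet_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
	uint16_t num_buffers;	/* absent for legacy without MRG_RXBUF */
};

/* Size of the header as seen by the driver. */
size_t vnet_hdr_size(bool legacy, bool mrg_rxbuf)
{
	if (legacy && !mrg_rxbuf)
		return offsetof(struct vnet_hdr, num_buffers);

	return sizeof(struct vnet_hdr);
}

/* Modern transport without MRG_RXBUF: the device must report one buffer. */
void vnet_hdr_fixup_rx(struct vnet_hdr *hdr, bool legacy, bool mrg_rxbuf)
{
	assert(hdr);

	if (!legacy && !mrg_rxbuf)
		hdr->num_buffers = 1;	/* byte order conversion omitted */
}
```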
|
|
The conversion of vnet header fields will be more difficult when
supporting the virtio ANY_LAYOUT feature. Since the uip backend doesn't
use the vnet header, and since tap can handle that conversion itself,
offload it to tap.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-12-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Existing IOV functions don't take the iovec size as parameter. This is
unfortunate because when parsing buffers split into header and body,
callers may want to know where the body starts in the iovec, after copying
the header. Add a function that does the same as memcpy_fromiovec, but
also allows the caller to iterate over the iovec.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-11-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
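A copy helper that also advances the iovec, as described in the commit above, might be sketched like this. The name iov_copy_and_advance() and its exact signature are assumptions made for illustration, not the function added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes from the iovec array into buf, advancing *iov and
 * *iovcount past the consumed data so the caller can continue with the
 * body. Returns the number of bytes that could not be copied.
 */
size_t iov_copy_and_advance(void *buf, struct iovec **iov, size_t *iovcount,
			    size_t len)
{
	char *dst = buf;

	assert(buf && iov && iovcount);

	while (len && *iovcount) {
		struct iovec *cur = *iov;
		size_t n = cur->iov_len < len ? cur->iov_len : len;

		if (n)
			memcpy(dst, cur->iov_base, n);
		dst += n;
		len -= n;

		/* Consume the copied bytes from the current element. */
		cur->iov_base = (char *)cur->iov_base + n;
		cur->iov_len -= n;

		/* Move on once the element is exhausted. */
		if (cur->iov_len == 0) {
			(*iov)++;
			(*iovcount)--;
		}
	}

	return len;
}
```

After the header has been copied out, *iov points at the first byte of the body, even when the header ended in the middle of an element.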
|
|
Now that devices have a status callback, they don't use
set_guest_features() anymore. The negotiated feature set is available in
struct virtio_device. Remove the callback from all devices.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-10-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Remove unused set_status() callback
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-9-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some legacy virtio drivers expect to read the device-specific config in
guest endianness (2.5.3 "Legacy Interface: A Note on Device
Configuration Space endian-ness").
Kvmtool doesn't know the guest endianness until it can probe a VCPU. So
the config fields start in host endianness, and are swapped once the
guest is running. Currently this is done in set_guest_features(), but
that is too late because the driver is allowed to read config fields
before setting feature bits (2.5.2 "Device Requirements: Device
Configuration Space"). In addition some devices don't swap the fields,
and those that do swap the fields do it every time the guest writes the
feature register, which can't work if a device gets reset more than
once.
Initialize the config on device reset. Do it on every reset because in
theory multiple guests could run with different endianness during the
lifetime of the device.
Notes:
* the balloon device uses little-endian (5.5.4.0.0.1 "Legacy Interface:
Device configuration layout").
* the vsock device was introduced after virtio 0.9.5, hence doesn't
describe a legacy interface, but the Linux driver allows to use the
legacy transport, and always reads the 64-bit guest_cid field as
little-endian.
* the specification does not describe the 9p device, but the Linux
driver uses guest-endian helpers.
* the specification does not explicitly forbid a driver from reading the
configuration at any time, but a driver must follow the sequence from
3.1.1 "Driver Requirements: Device Initialization", where the driver
is allowed to read the config after setting the DRIVER status bit. It
should therefore be safe to keep dealing with guest endianness only on
device reset, and not on the first config access.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-8-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
At the moment device-specific config access is tailored for a Linux
guest, that performs any access in 8 bits. But config access can have
any size, and modern virtio drivers must use the size of the accessed
field. Add helpers that generalize config accesses.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-7-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
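A generalized config accessor in the spirit of the commit above could look like the following sketch. The function name, the bounds check and the set of accepted access sizes are assumptions, not the helpers the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read or write len bytes at offset within a device config structure.
 * Returns false for unsupported sizes or out-of-bounds accesses.
 */
bool config_access(void *config, size_t config_size, size_t offset,
		   void *data, size_t len, bool is_write)
{
	assert(config && data);

	/* Accept the field sizes a driver may use: 1, 2, 4 or 8 bytes. */
	if (len != 1 && len != 2 && len != 4 && len != 8)
		return false;

	/* Written this way so that offset + len cannot overflow. */
	if (offset > config_size || len > config_size - offset)
		return false;

	if (is_write)
		memcpy((uint8_t *)config + offset, data, len);
	else
		memcpy(data, (uint8_t *)config + offset, len);

	return true;
}
```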
|
|
Modern virtio devices can use separate buffers for descriptors, available
and used rings. They can also use 64-bit addresses instead of 44-bit.
Rework the virtqueue initialization function to support modern virtio.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-6-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
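The address-width difference mentioned in the commit above comes from how the queue address is programmed. As a rough sketch with hypothetical function names: the legacy interface supplies a 32-bit page frame number, which with 4096-byte pages covers 44 bits, while the modern interface supplies each address as two 32-bit halves.

```c
#include <stdint.h>

/* Legacy: the driver writes a 32-bit page frame number. */
uint64_t legacy_queue_addr(uint32_t pfn, uint32_t page_size)
{
	return (uint64_t)pfn * page_size;
}

/* Modern: the driver writes the low and high halves of a 64-bit address. */
uint64_t modern_queue_addr(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```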
|
|
All virtio devices perform the same set of operations when initializing
their virtqueues. Move it to virtio core.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-5-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The core already tells us whether a device is being started or stopped.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-4-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Don't test for VIRTIO__STATUS_STOP right after setting it.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-3-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Not all toolchains used to know about VIRTIO_CONFIG_S_NEEDS_RESET, so we
left it out of the status mask. Now that we include our own version of
virtio_config.h and we'll need it for virtio 1.0, add it back.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Link: https://lore.kernel.org/r/20220607170239.120084-2-jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fixes the following compilation issue:
include/linux/kernel.h:5:10: fatal error: asm/kernel.h: No such file
or directory
5 | #include "asm/kernel.h"
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Dao Lu <daolu@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Fixes: 0febaae00bb6 ("Add cpumask functions")
Link: https://lore.kernel.org/r/20220524180030.1848992-1-daolu@rivosinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 45b4968e0de1 ("hw/serial: ARM/arm64: Use MMIO at higher addresses")
changed how the address for the UART is computed by using KVM_IOPORT_AREA.
The symbol is not defined for MIPS, which results in the following
compilation error:
hw/serial.c:21:27: error: ‘KVM_IOPORT_AREA’ undeclared here (not in a function); did you mean ‘KVM_MIPS_IOPORT_AREA’?
21 | #define serial_iobase_0 (KVM_IOPORT_AREA + 0x3f8)
| ^~~~~~~~~~~~~~~
hw/serial.c:29:27: note: in expansion of macro ‘serial_iobase_0’
29 | #define serial_iobase(nr) serial_iobase_##nr
| ^~~~~~~~~~~~~~
hw/serial.c:92:15: note: in expansion of macro ‘serial_iobase’
92 | .iobase = serial_iobase(0),
| ^~~~~~~~~~~~~
Before the commit, the serial was placed at addresses 0x3f8, 0x2f8,
0x3e8 and 0x2e8. However, MIPS puts the RAM at those addresses, up to
KVM_MMIO_START, which is 0x10000000. Meaning that serial device
emulation never worked, as those addresses were part of a valid memslot
representing memory. This has been the case since commit 7281a8db199b
("kvm tools, mips: Add MIPS support") from 2014.
A quick examination of the MIPS code reveals that the architecture relies
on hypercalls from the guest and the virtio console for input and output.
Since nobody complained about the missing serial device, assume that it is
indeed not needed and do not compile it for MIPS.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220525165704.186754-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 4639b72f61a3 ("arm64: Add --vcpu-affinity command line argument")
introduced the --vcpu-affinity command line argument to pin the VCPUs to a
given list of physical CPUs. Unfortunately, the affinity is set only for an
arm64 guest, leading to the following error when running a 32-bit guest on
a system with two or more PMUs:
KVM exit reason: 9 ("KVM_EXIT_FAIL_ENTRY")
Registers:
PC: 0x8000c608
PSTATE: 0x200000d3
SP_EL1: 0x0
LR: 0x0
*pc:
0x8000c608: 25 3f a0 e1 83 61 a0 e1
0x8000c610: 83 31 98 e7 04 10 82 e1
0x8000c618: 07 2c 81 e3 28 10 1b e5
0x8000c620: 03 20 82 e3 03 00 a0 e1
*lr:
Warning: unable to translate guest address 0x0 to host
0x00000000: <unknown>
0x00000008: <unknown>
0x00000010: <unknown>
0x00000018: <unknown>
# KVM compatibility warning.
virtio-net device was not detected.
While you have requested a virtio-net device, the guest kernel did not initialize it.
Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y enabled in .config.
# KVM session ended normally.
Make the error go away by setting the affinity of the VCPUs for both 32-bit
and 64-bit guests.
Fixes: 4639b72f61a3 ("arm64: Add --vcpu-affinity command line argument")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220525165704.186754-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit a08bb43a0c37 ("kvmtool: Copy Linux' up-to-date virtio headers")
copied in some of the virtio UAPI headers from the kernel tree, but
didn't include all of them, as we were relying on some of them being
provided by the distribution.
Now commit bc77bf49df6e ("stat: Add descriptions for new virtio_balloon
stat types") used some newer virtio balloon symbols, that some older
distros (e.g. Ubuntu 18.04) do not carry, which breaks compilation
there:
=======================
CC builtin-stat.o
builtin-stat.c: In function 'do_memstat':
builtin-stat.c:86:8: error: 'VIRTIO_BALLOON_S_HTLB_PGALLOC' undeclared (first use in this function); did you mean 'VIRTIO_BALLOON_S_AVAIL'?
case VIRTIO_BALLOON_S_HTLB_PGALLOC:
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
VIRTIO_BALLOON_S_AVAIL
builtin-stat.c:86:8: note: each undeclared identifier is reported only once for each function it appears in
=======================
To fix this include the remaining virtio headers (those that we actually
need for kvmtool at the moment), from Linux v5.18.0.
Fixes: bc77bf49df6e ("stat: Add descriptions for new virtio_balloon stat types")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220524150611.523910-5-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit a08bb43a0c37 ("kvmtool: Copy Linux' up-to-date virtio headers")
copied the kernel's virtio UAPI headers into the kvmtool tree, because
at the time some distros didn't include (all of) them in their kernel
headers package.
Let's update those copies, so that we can use newer features, if needed.
This syncs in the already existing copies of the headers from Linux
v5.18.0.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220524150611.523910-4-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We already have an update_headers.sh sync script, where we occasionally
update the KVM interface UAPI kernel headers into our tree.
So far this covered only the generic kvm.h, plus each architecture's
version of that file.
Commit bc77bf49df6e ("stat: Add descriptions for new virtio_balloon
stat types") used newer virtio symbols, which some older distros do not
include in their kernel headers package. To help fixing this and to
avoid similar problems in the future, add the virtio headers to our sync
script, so that we can get the same, up-to-date versions of the headers
easily.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220524150611.523910-3-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
At the time we pulled in virtio_mmio.h from the kernel tree (commit
a08bb43a0c37c "kvmtool: Copy Linux' up-to-date virtio headers"), this was
not an official UAPI header file, so wasn't stable and was not shipped
with distributions.
This has changed with Linux commit 51be7a9a261c ("virtio_mmio: expose
header to userspace"), so we can now use that file officially.
However, the names of some symbols changed before that, so we have to
adjust their usage in our source.
This pulls in virtio_mmio.h from Linux v5.18.0.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220524150611.523910-2-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch fixes an issue of having the stack be executable
for x86 builds by ensuring that the two objects bios-rom.o
and entry.o have the section .note.GNU-stack.
Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-7-martin.b.radev@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch checks for overflows in QUEUE_NOTIFY and QUEUE_SEL in
the PCI and MMIO operation handling paths. Further, the return
value type of get_vq_count is changed from int to uint since a negative
count carries no semantic meaning.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-6-martin.b.radev@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The handling of VIRTIO_PCI_O_CONFIG is prone to buffer access overflows.
This patch sanitizes this operation by using the newly added virtio op
get_config_size. Any access which goes beyond the config structure's
size is prevented and a failure is returned.
Additionally, PCI accesses which span more than a single byte are prevented
and a warning is printed because the implementation does not currently
support the behavior correctly.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-5-martin.b.radev@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Per the Linux user API, the struct virtio_9p_config "tag" field contains
the non-NULL terminated tag name and this is how the tag name is
copied by kvmtool in virtio_9p__register(). However, the memory allocation
for the struct is off by one, as it allocates memory for the tag name and
the NULL byte. Fix it by reducing the allocation by exactly one byte.
This also matches how the struct is allocated by QEMU tagged v7.0.0 in
virtio_9p_get_config().
Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/YnzhdgUwrLlqmzch@monolith.localdoman
Signed-off-by: Will Deacon <will@kernel.org>
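The allocation size discussed in the commit above can be illustrated with a short sketch. struct p9_config is a local stand-in for the UAPI structure, and p9_config_alloc() is a hypothetical helper rather than the code in virtio_9p__register().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the 9p config layout: a length plus a raw tag. */
struct p9_config {
	uint16_t tag_len;
	uint8_t  tag[];		/* tag bytes, not NUL-terminated */
};

struct p9_config *p9_config_alloc(const char *tag_name)
{
	size_t len;
	struct p9_config *cfg;

	assert(tag_name);

	len = strlen(tag_name);
	if (len > UINT16_MAX)
		return NULL;

	/* No "+ 1": the terminating NUL is not part of the config. */
	cfg = calloc(1, sizeof(*cfg) + len);
	if (!cfg)
		return NULL;

	cfg->tag_len = (uint16_t)len;
	memcpy(cfg->tag, tag_name, len);

	return cfg;
}
```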
|
|
The PCI access size type is changed from a signed type
to an unsigned type since the size is never expected to
be negative, and the type also matches the type in the
signature of virtio_pci__io_mmio_callback.
This change simplifies size checking in the next patch.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-4-martin.b.radev@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch verifies that adding the addr and length arguments
from an MMIO op do not overflow. This is necessary because the
arguments are controlled by the VM. The length may be set to
an arbitrary value by using the rep prefix.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-3-martin.b.radev@gmail.com
[will: Drop redundant o/f check in virtio_mmio_device_specific() per Alex]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a macro that prints a warning only once. This is useful where a
warning helps with debugging, but repeating it would pollute the log.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Link: https://lore.kernel.org/r/20220509203940.754644-2-martin.b.radev@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Unknown types would print the value with no descriptive text at all.
Add descriptions for all known stat types, and a default description
when the type is unknown.
Signed-off-by: Keir Fraser <keirf@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220520143706.550169-3-keirf@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The collect_stats hook dereferences the stats virtio queue without
checking that it has been initialised.
Signed-off-by: Keir Fraser <keirf@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220520143706.550169-2-keirf@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
KVM doesn't support the combination of MTE and an AArch32 guest, so do not
even try.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220520123844.127733-1-vladimir.murzin@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a new command line argument, --vcpu-affinity, to set the CPU affinity
for the VCPUs. The affinity is expressed as a cpulist and will apply to all
VCPU threads.
This gives the user a second option for choosing the PMU on a heterogeneous
system. The PMU setup code, when --vcpu-affinity is specified, will search
for the PMU associated with the CPUs specified with this command line
argument instead of the PMU associated with the CPU on which the main
thread is executing.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-12-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
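A cpulist such as "0-3,5" names individual CPUs and inclusive ranges separated by commas. The sketch below parses one into a 64-bit mask; the function name and the 64-CPU limit are simplifying assumptions, and the real code works on a cpumask sized by NR_CPUS. The resulting set would then be applied to each VCPU thread, for instance with sched_setaffinity().

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a cpulist into a bitmask. Returns 0 on success, -1 on error. */
int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s && mask);

	while (*s) {
		unsigned long lo, hi, cpu;
		char *end;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);

		/* Optional "-N" turns a single CPU into a range. */
		if (*end == '-') {
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
		}

		if (lo > hi || hi >= 64)
			return -1;

		for (cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;

		if (*end == ',') {
			end++;
			if (*end == '\0')	/* trailing comma */
				return -1;
		} else if (*end != '\0') {
			return -1;
		}
		s = end;
	}

	if (m == 0)
		return -1;

	*mask = m;
	return 0;
}
```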
|
|
The KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU ioctl is
used to assign a physical PMU to the events that KVM creates when emulating
the PMU for that VCPU. This is useful on heterogeneous systems, when there
is more than one hardware PMU present. All VCPUs must have the same PMU
assigned.
The assumption that is made in the implementation is that the user will pin
the kvmtool process on a set of CPUs that share the same PMU. This allows
kvmtool to set the same PMU for all VCPUs from the main thread, instead of
in the individual VCPU threads. If a VCPU thread migrates to a CPU which
has a different PMU than the CPU on which the main thread was executing
when the PMU was set, the KVM_RUN ioctl will fail with kvm_run.exit_reason
set to KVM_EXIT_FAIL_ENTRY, and kvm_run.fail_entry will be populated with
the physical CPU ID on which the VCPU tried to execute.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-11-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-10-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a handful of cpumask functions, some of which will be used when
dealing with different PMUs on heterogeneous systems.
The maximum number of CPUs in a system, NR_CPUS, which dictates the size of
the cpumask, has been taken from the Kconfig file for each architecture,
from Linux version 5.16.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-9-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
By the time kvmtool generates the DTB node for the PMU, the
KVM_ARM_VCPU_PMU_V3 VCPU feature is already set by kvm_cpu__arch_init().
KVM refuses to run a VCPU if the PMU hasn't been initialized. A PMU
cannot be initialized if the interrupt ID hasn't been set by userspace.
As a consequence, kvmtool will get an error if the interrupt ID has not
been set or if the PMU has not been initialized:
KVM_RUN failed: Invalid argument
To make debugging easier, exit with an error message as soon as one of the
PMU ioctls fails, instead of waiting until the VCPU is first run.
To avoid the repetition of assigning a new kvm_device_attr struct in the
main body of pmu__generate_fdt_nodes(), which hinders readability of the
function, move the struct to set_pmu_attr().
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
KVM for aarch32 does not exist anymore, PMUv3 is a hardware feature
present only on aarch64 CPUs, the command line option to enable the
feature for a VCPU is aarch64 specific, the PMU code is called only from
an aarch64 function and it compiles to an empty stub when ARCH=arm.
There is no reason to have the PMUv3 emulation code in the common code
area for arm and arm64, so move it to the arm64 directory, where it can
be expanded in the future without fear of breaking aarch32 support.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The ARM_VCPU_FEATURE_FLAGS() macro sets a feature bit in a rather
convoluted way: if cpu_id is 0, then bit KVM_ARM_VCPU_POWER_OFF is 0,
otherwise it is set to 1. There's really no need for this indirection,
especially considering that the macro has been changed to return the same
value for both the arm and arm64 architectures. Replace it with a simple
conditional statement in kvm_cpu__arch_init(), which makes it clearer to
understand.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
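The conditional that replaces the macro in the commit above amounts to something like the sketch below. The function name and the local VCPU_FEATURE_POWER_OFF constant are stand-ins (KVM_ARM_VCPU_POWER_OFF is feature bit 0 in the KVM UAPI), and the features array mimics the one in struct kvm_vcpu_init.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for KVM_ARM_VCPU_POWER_OFF, which is feature bit 0. */
#define VCPU_FEATURE_POWER_OFF	0

/* Only the boot CPU (cpu_id 0) starts powered on. */
void vcpu_select_power_state(uint32_t features[], unsigned long cpu_id)
{
	assert(features);

	if (cpu_id > 0)
		features[0] |= UINT32_C(1) << VCPU_FEATURE_POWER_OFF;
}
```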
|
|
KVM_CAP_ARM_EL1_32BIT and KVM_CAP_ARM_PMU_V3 are arm64 specific features.
They are set based on arm64 specific command line options and they target
arm64 hardware features. It makes little sense for kvmtool to set the
features in the code that is shared between arm and arm64. Move the logic
to set the feature bits to the arch specific function
kvm_cpu__select_features(), which is already used by arm64 to set other
arm64 specific features.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Trying to build a source file which included bitops.h, but didn't also
bring in the definition for __WORDSIZE (by including limits.h, for example)
would result in the following error:
include/linux/bitops.h:8:23: error: ‘__WORDSIZE’ undeclared (first use in this function)
8 | #define BITS_PER_LONG __WORDSIZE
| ^~~~~~~~~~
The symbol is defined in the bits/wordsize.h header file, include it.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add missing header stdbool.h to avoid errors like this one, which can
happen if the including file doesn't include stdbool.h:
include/linux/err.h:33:15: error: type defaults to ‘int’ in declaration of ‘bool’ [-Werror=implicit-int]
33 | static inline bool __must_check IS_ERR(__force const void *ptr)
| ^~~~
include/linux/err.h:33:15: error: variable ‘bool’ declared ‘inline’ [-Werror]
include/linux/err.h:33:1: error: ‘warn_unused_result’ attribute only applies to function types [-Werror=attributes]
33 | static inline bool __must_check IS_ERR(__force const void *ptr)
| ^~~~~~
include/linux/err.h:33:33: error: expected ‘,’ or ‘;’ before ‘IS_ERR’
33 | static inline bool __must_check IS_ERR(__force const void *ptr)
| ^~~~~~
include/linux/err.h:38:15: error: type defaults to ‘int’ in declaration of ‘bool’ [-Werror=implicit-int]
38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
| ^~~~
include/linux/err.h:38:15: error: variable ‘bool’ declared ‘inline’ [-Werror]
include/linux/err.h:38:1: error: ‘warn_unused_result’ attribute only applies to function types [-Werror=attributes]
38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
| ^~~~~~
include/linux/err.h:38:15: error: redundant redeclaration of ‘bool’ [-Werror=redundant-decls]
38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
| ^~~~
include/linux/err.h:33:15: note: previous declaration of ‘bool’ was here
33 | static inline bool __must_check IS_ERR(__force const void *ptr)
| ^~~~
include/linux/err.h:38:33: error: expected ‘,’ or ‘;’ before ‘IS_ERR_OR_NULL’
38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
| ^~~~~~~~~~~~~~
include/linux/err.h: In function ‘PTR_ERR_OR_ZERO’:
include/linux/err.h:58:6: error: implicit declaration of function ‘IS_ERR’ [-Werror=implicit-function-declaration]
58 | if (IS_ERR(ptr))
| ^~~~~~
include/linux/err.h:58:6: error: nested extern declaration of ‘IS_ERR’ [-Werror=nested-externs]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220412133231.35355-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
MTE has been supported in Linux since commit 673638f434ee ("KVM: arm64:
Expose KVM_ARM_CAP_MTE"), add support for it in kvmtool. MTE is enabled by
default.
Enabling the MTE capability incurs a cost, both in time (for each
translation fault the tags need to be cleared), and in space (the tags need
to be saved when a physical page is swapped out). This overhead is expected
to be negligible for most users, but for those cases where it matters
(like performance benchmarks), a --disable-mte option has been added.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220328103328.18768-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220328103328.18768-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The stolen time option is available only for aarch64 and is enabled by
default. Move the option that disables stolen time functionality into the
arch specific path.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220324154304.2572891-1-sebastianene@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This reverts commit bc0b99a2a74047707db73ba057743febf458fd90.
Thanks to some digging from Andre [1], we know that kvmtool commit
bc0b99a2a740 ("kvm tools: Filter out CPU vendor string") was intended
to work around a guest kernel bug resulting from kernel commit
5bbc097d8904 ("x86, amd: Disable GartTlbWlkErr when BIOS forgets it").
Critically, KVM does not implement the MC4 mask MSR and instead injects
a #GP into the guest. On guest kernels without commit d47cc0db8fd6
("x86, amd: Use _safe() msr access for GartTlbWlk disable code") this is
unexpected and causes a kernel oops.
Since the kernel has taken the position to fix the bug in the guest and
not KVM, there is no need for CPU vendor string filtering in kvmtool.
Vendor string filtering is highly problematic for feature discovery,
both in the kernel and userspace. As Andre noted, glibc depends on the
vendor string to discover CPU features at runtime [2]. This has been
generally innocuous, but as distributions begin to raise the minimum ISA
level, guest userspace will quickly crash and burn on kvmtool. Hiding the
vendor string also makes it impossible to test vendor-specific CPU
features in kvmtool guest kernels.
Given the fact that there are known dependencies in kernel and userspace
on the CPU vendor string, allow the guest to see the native CPU vendor
string. This has the potential to break certain guest kernels of 2011
vintage when running on an AMD Fam10h processor. Onus is on the guest to
update its kernel at this point.
Link: https://lore.kernel.org/kvm/20220311121042.010bbb30@donnerap.cambridge.arm.com/
Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/cpu-features.c;h=514226b37889;hb=HEAD#l398
Reported-by: Dongli Si <sidongli1997@gmail.com>
Suggested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220318204938.496840-1-oupton@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The command line argument disables the stolen time functionality when it is
specified.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220313161949.3565171-4-sebastianene@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch adds support for stolen time by sharing a memory region
with the guest which will be used by the hypervisor to store the stolen
time information. Reserve a 64KB MMIO memory region after the RTC peripheral
to be used by pvtime. The exact format of the structure stored by the
hypervisor is described in the ARM DEN0057A document.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220313161949.3565171-3-sebastianene@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move the vCPU structure initialisation before the target->init() call to
keep a reference to the kvm structure during init().
This is required by the pvtime peripheral to reserve a memory region
while the vCPU is being initialised.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20220313161949.3565171-2-sebastianene@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The "msi-parent" PCI root complex property describes the MSI parent of the
root complex. When the VM is created with a GICv2 or GICv3 irqchip
(--irqchip=gicv3 or --irqchip=gicv2), there is no MSI controller present on
the system and the corresponding phandle is not generated, leaving the
"msi-parent" property to point to a non-existing phandle. Skip creating the
"msi-parent" property when no MSI controller exists.
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220214165830.69207-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When loading a kernel image, kvmtool is nice enough to print a message
informing the user where the file was loaded in guest memory, which is very
useful for debugging. Do the same for the firmware image.
Commit e1c7c62afc7b ("arm: turn pr_info() into pr_debug() messages")
changed various pr_info() into pr_debug() messages to stop kvmtool from
cluttering stdout. Do the same when printing where the FDT has been copied
when loading a firmware image.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220214165830.69207-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Linux, besides CPIO, supports 7 different compressed formats for the initrd
(gzip, bzip2, LZMA, XZ, LZO, LZ4, ZSTD), but kvmtool only recognizes one of
them.
Remove the initrd magic check because:
1. It doesn't bring much to the end user, as the Linux kernel still
complains if the initrd is in an unknown format.
2. --kernel can be used to load something that is not a Linux kernel (like
a kvm-unit-tests test), in which case a format which is not supported by
a Linux kernel can still be perfectly valid. For example, kvm-unit-tests
load the test environment as an initrd in plain ASCII format.
3. It cuts down on the maintenance effort when new formats are added to
the Linux kernel. Not a big deal, since that doesn't happen very often,
but it's still an effort with very little gain (see point #1 above).
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220214165830.69207-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
It appears that the way INTx is emulated is "slightly" out of spec
in kvmtool. We happily inject an edge interrupt, even if the spec
mandates a level.
This doesn't change much for either the guest or userspace (only
KVM will have a bit more work tracking the EOI), but at least
this is correct.
Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220131160242.2665191-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When kvmtool boots a kernel, the dmesg will print the following message:
[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 APIC: 30
Fix this by setting initial_apicid to cpu_id.
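As an illustration only: the initial APIC ID is reported in bits 31:24 of EBX
for CPUID leaf 1, so the fix amounts to writing the vCPU index into that
field. The helper name set_initial_apicid() is hypothetical and the real
kvmtool code may be structured differently:

```c
#include <stdint.h>

/*
 * Hypothetical helper: place the vCPU index in CPUID.01H:EBX[31:24],
 * which holds the initial APIC ID, and keep the lower 24 bits intact.
 */
static uint32_t set_initial_apicid(uint32_t ebx, uint8_t cpu_id)
{
	return (ebx & 0x00ffffffU) | ((uint32_t)cpu_id << 24);
}
```

With the values from the warning above, an EBX whose top byte is 30 (0x1e)
would be rewritten so that the top byte is 1 for CPU1.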
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220216113735.52240-2-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When dev_hdr->dev_num is greater than one, the initialization of last_addr
is wrong. Fix it.
Fixes: f83cd16 ("kvm tools: irq: replace the x86 irq rbtree with the PCI device tree")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220216113735.52240-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch extends FDT generation to generate a PCI host DT node.
Of course, PCI host for Guest/VM is not useful at the moment
because it's mostly for PCI pass-through and we don't have
IOMMU and interrupt routing available for KVM RISC-V. In future,
we might be able to use PCI host for VirtIO PCI transport or
other software emulated PCI devices.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-9-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The kernel KVM RISC-V module will forward certain SBI calls
to user space. These forwarded SBI calls will usually be the
SBI calls which cannot be emulated in kernel space such as
PUTCHAR and GETCHAR calls.
This patch extends kvm_cpu__handle_exit() to handle SBI calls
forwarded to user space.
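A rough sketch of what handling such a forwarded call could look like. The
extension IDs 0x1 and 0x2 are the legacy console calls from the RISC-V SBI
specification; struct sbi_exit_sketch and handle_sbi_exit() are hypothetical
stand-ins and not the structures or functions used by kvmtool or the KVM
UAPI:

```c
#include <stdio.h>

/* Legacy SBI extension IDs, as defined by the RISC-V SBI specification. */
#define SBI_EXT_0_1_CONSOLE_PUTCHAR 0x1
#define SBI_EXT_0_1_CONSOLE_GETCHAR 0x2

/* Local stand-in for the SBI exit information (assumed layout). */
struct sbi_exit_sketch {
	unsigned long extension_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/* Hypothetical handler: returns 0 if the call was handled, -1 otherwise. */
static int handle_sbi_exit(struct sbi_exit_sketch *sbi, FILE *in, FILE *out)
{
	switch (sbi->extension_id) {
	case SBI_EXT_0_1_CONSOLE_PUTCHAR:
		/* The character to print is passed in the first argument. */
		fputc((int)sbi->args[0], out);
		sbi->ret[0] = 0;
		return 0;
	case SBI_EXT_0_1_CONSOLE_GETCHAR:
		/* fgetc() yields EOF (-1) when no character is available. */
		sbi->ret[0] = (unsigned long)fgetc(in);
		return 0;
	default:
		/* Not a call this sketch knows how to emulate. */
		return -1;
	}
}
```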
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-8-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We generate FDT at runtime for RISC-V Guest/VM so that KVMTOOL users
don't have to pass FDT separately via command-line parameters.
Also, we provide "--dump-dtb <filename>" command-line option to dump
generated FDT into a file for debugging purposes.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-7-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The PLIC (platform level interrupt controller) manages peripheral
interrupts in RISC-V world. The per-CPU interrupts are managed
using CPU CSRs hence virtualized in-kernel by KVM RISC-V.
This patch adds PLIC device emulation for KVMTOOL RISC-V.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
[For PLIC context CLAIM register emulation]
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-6-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch implements kvm_cpu__<xyz> Guest/VM VCPU arch functions.
These functions mostly deal with:
1. VCPU allocation and initialization
2. VCPU reset
3. VCPU show/dump code
4. VCPU show/dump registers
We also save RISC-V ISA, XLEN, and TIMEBASE frequency for each VCPU
so that it can be later used for generating Guest/VM FDT.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-5-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch implements all kvm__arch_<xyz> Guest/VM arch functions.
These functions mostly deal with:
1. Guest/VM RAM initialization
2. Updating terminals on character read
3. Loading kernel and initrd images
Firmware loading is not implemented currently because initially we
will be booting kernel directly without any bootloader. In future,
we will certainly support firmware loading.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-4-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch adds initial skeletal KVMTOOL RISC-V support which
just compiles for RV32 and RV64 host.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-3-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We sync-up all ABI headers with Linux-5.16-rc1 so that RISC-V
specific changes in include/linux/kvm.h are available.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Link: https://lore.kernel.org/r/20211119124515.89439-2-anup.patel@wdc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Command 'lvm version' works incorrectly.
It is expected to print:
# ./lvm version
# kvm tool [KVMTOOLS_VERSION]
but the KVMTOOLS_VERSION is missing:
# ./lvm version
# kvm tool
The KVMTOOLS_VERSION is defined in the KVMTOOLS-VERSION-FILE file which
is included at the end of the Makefile. CFLAGS is a 'simply expanded
variable', which means that it is only scanned once, when it is
defined. The definition of KVMTOOLS_VERSION at the end of the Makefile
is therefore never seen by CFLAGS, and '-DKVMTOOLS_VERSION=' remains empty.
Fix the bug by moving the '-include $(OUTPUT)KVMTOOLS-VERSION-FILE'
before the CFLAGS.
Signed-off-by: haibiao.xiao <xiaohaibiao331@outlook.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211210030708.288066-1-haibiao.xiao@zstack.io
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The interrupt pin cell in "interrupt-map" property
is defined only for legacy interrupts with a valid
range in [1-4] corresponding to INTA#..INTD#. And the
PCI endpoint devices that support advanced interrupt
mechanisms like MSI or MSI-X should not have an entry
with value 0 in "interrupt-map". This patch takes
care of this problem by avoiding redundant entries.
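A simplified sketch of the filtering described above. The descriptor
struct pci_dev_sketch and the function build_irq_map() are hypothetical and
only illustrate the idea of skipping devices whose interrupt pin is outside
[1-4]; the real code emits FDT cells rather than copying structs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device descriptor used only for this illustration. */
struct pci_dev_sketch {
	uint8_t dev_num;
	uint8_t irq_pin;   /* 1..4 => INTA#..INTD#, 0 => no legacy interrupt */
	uint8_t irq_line;
};

/*
 * Copy into map only the devices that have a valid legacy interrupt pin.
 * Returns the number of entries written; map must have room for n entries.
 */
static size_t build_irq_map(const struct pci_dev_sketch *devs, size_t n,
			    struct pci_dev_sketch *map)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		/* MSI/MSI-X only devices report pin 0 and get no entry. */
		if (devs[i].irq_pin < 1 || devs[i].irq_pin > 4)
			continue;
		map[count++] = devs[i];
	}

	return count;
}
```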
Signed-off-by: Sathyam Panda <sathyam.panda@arm.com>
Reviewed-by: Vivek Kumar Gautam <vivek.gautam@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211111120231.5468-1-sathyam.panda@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When allocating MMIO space for the MSI-X table, kvmtool rounds the
allocation to the host's page size to make it as easy as possible for the
guest to map the table to a page, if it wants to (and doesn't do BAR
reassignment, like the x86 architecture for example). However, the host's
page size can differ from the guest's on architectures which support
multiple page sizes. For example, arm64 supports three different page sizes,
and it is possible for the host to be using 4k pages, while the guest is
using 64k pages.
To make sure the allocation is always aligned to a guest's page size, round
it up to the maximum architectural page size. Do the same for the pending
bit array if it lives in its own BAR.
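The rounding can be sketched as follows. The constant MAX_PAGE_SIZE_SKETCH
and the helper names are assumptions made for this example; 64K is used
because it is the largest of the three page sizes arm64 supports, and each
MSI-X table entry is 16 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 64K is the largest page size supported by arm64. */
#define MAX_PAGE_SIZE_SKETCH 0x10000ULL

/* Round x up to a multiple of align, which must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: MMIO size to reserve for a table of nr_entries. */
static uint64_t msix_table_mmio_size(unsigned int nr_entries)
{
	/* Each MSI-X table entry is 16 bytes. */
	return align_up((uint64_t)nr_entries * 16, MAX_PAGE_SIZE_SKETCH);
}
```

Rounding to the maximum page size means a guest using 4k, 16k or 64k pages
can always map the table with a whole number of its own pages.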
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Now that we keep track of the real size of MSIX table and PBA, print an
error when the guest tries to write to an offset which is not inside the
correct regions.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When creating the MSIX table and PBA, kvmtool rounds up the table and
pending bit array sizes to the host's page size. Unfortunately, when doing
that, it doesn't take into account that the new size can exceed the device
BAR size, leading to hard to diagnose errors for certain configurations.
One theoretical example: PBA and table in the same 4k BAR, host's page size
is 4k. In this case, table->size = 4k, pba->size = 4k, map_size = 4k, which
means that pba->guest_phys_addr = table->guest_phys_addr + 4k, which is
outside of the 4k MMIO range allocated for both structures.
Another example, this time a real-world error that I encountered: happens
with a 64k host booting a 4k guest, an RTL8168 PCIE NIC assigned to the
guest. In this case, kvmtool sets table->size = 64k (because it's rounded
to the host's page size) and pba->size = 64k.
Truncated output of lspci -vv on the host:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 255
Region 0: I/O ports at 1000 [size=256]
Region 2: Memory at 40000000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
[..]
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
[..]
When booting the guest:
[..]
[ 0.207444] pci-host-generic 40000000.pci: host bridge /pci ranges:
[ 0.208564] pci-host-generic 40000000.pci: IO 0x0000000000..0x000000ffff -> 0x0000000000
[ 0.209857] pci-host-generic 40000000.pci: MEM 0x0050000000..0x007fffffff -> 0x0050000000
[ 0.211184] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x4fffffff] for [bus 00]
[ 0.212625] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
[ 0.213647] pci_bus 0000:00: root bus resource [bus 00]
[ 0.214429] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.215355] pci_bus 0000:00: root bus resource [mem 0x50000000-0x7fffffff]
[ 0.216676] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
[ 0.223771] pci 0000:00:00.0: reg 0x10: [io 0x6200-0x62ff]
[ 0.239765] pci 0000:00:00.0: reg 0x18: [mem 0x50010000-0x50010fff]
[ 0.244595] pci 0000:00:00.0: reg 0x20: [mem 0x50000000-0x50003fff]
[ 0.246331] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
[ 0.247278] pci 0000:00:01.0: reg 0x10: [io 0x6300-0x63ff]
[ 0.248212] pci 0000:00:01.0: reg 0x14: [mem 0x50020000-0x500200ff]
[ 0.249172] pci 0000:00:01.0: reg 0x18: [mem 0x50020400-0x500207ff]
[ 0.250450] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
[ 0.251392] pci 0000:00:02.0: reg 0x10: [io 0x6400-0x64ff]
[ 0.252351] pci 0000:00:02.0: reg 0x14: [mem 0x50020800-0x500208ff]
[ 0.253312] pci 0000:00:02.0: reg 0x18: [mem 0x50020c00-0x50020fff]
[ 0.254760] pci 0000:00:00.0: BAR 4: assigned [mem 0x50000000-0x50003fff] (1)
[ 0.255805] pci 0000:00:00.0: BAR 2: assigned [mem 0x50004000-0x50004fff] (2)
Warning: [10ec:8168] Error activating emulation for BAR 2
Warning: [10ec:8168] Error activating emulation for BAR 2
[ 0.260432] pci 0000:00:01.0: BAR 2: assigned [mem 0x50005000-0x500053ff]
Warning: [1af4:1000] Error activating emulation for BAR 2
Warning: [1af4:1000] Error activating emulation for BAR 2
[ 0.261469] pci 0000:00:02.0: BAR 2: assigned [mem 0x50005400-0x500057ff]
Warning: [1af4:1001] Error activating emulation for BAR 2
Warning: [1af4:1001] Error activating emulation for BAR 2
[ 0.262499] pci 0000:00:00.0: BAR 0: assigned [io 0x1000-0x10ff]
[ 0.263415] pci 0000:00:01.0: BAR 0: assigned [io 0x1100-0x11ff]
[ 0.264462] pci 0000:00:01.0: BAR 1: assigned [mem 0x50005800-0x500058ff]
Warning: [1af4:1000] Error activating emulation for BAR 1
Warning: [1af4:1000] Error activating emulation for BAR 1
[ 0.265481] pci 0000:00:02.0: BAR 0: assigned [io 0x1200-0x12ff]
[ 0.266397] pci 0000:00:02.0: BAR 1: assigned [mem 0x50005900-0x500059ff]
Warning: [1af4:1001] Error activating emulation for BAR 1
Warning: [1af4:1001] Error activating emulation for BAR 1
[ 0.267892] EINJ: ACPI disabled.
[ 0.269922] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
[ 0.271118] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
[ 0.274122] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.275930] printk: console [ttyS0] disabled
[ 0.276669] 1000000.U6_16550A: ttyS0 at MMIO 0x1000000 (irq = 13, base_baud = 115200) is a 16550A
[ 0.278058] printk: console [ttyS0] enabled
[ 0.278058] printk: console [ttyS0] enabled
[ 0.279304] printk: bootconsole [ns16550a0] disabled
[ 0.279304] printk: bootconsole [ns16550a0] disabled
[ 0.281252] 1001000.U6_16550A: ttyS1 at MMIO 0x1001000 (irq = 14, base_baud = 115200) is a 16550A
[ 0.282842] 1002000.U6_16550A: ttyS2 at MMIO 0x1002000 (irq = 15, base_baud = 115200) is a 16550A
[ 0.284611] 1003000.U6_16550A: ttyS3 at MMIO 0x1003000 (irq = 16, base_baud = 115200) is a 16550A
[ 0.286094] SuperH (H)SCI(F) driver initialized
[ 0.286868] msm_serial: driver initialized
[ 0.287890] [drm] radeon kernel modesetting enabled.
[ 0.288826] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 0.293321] loop: module loaded
KVM_SET_GSI_ROUTING: Invalid argument
At (1), the guest writes 0x50000000 into BAR 4 of the NIC (which holds
the MSIX table and PBA), expecting that will cover only 16k of address
space (the BAR size), up to 0x50003fff, inclusive. On the host side, in
vfio_pci_bar_activate(), kvmtool will actually register for MMIO
emulation the region 0x50000000-0x5000ffff (64k in total) for the MSIX
table and 0x50010000-0x5001ffff (another 64k) for the PBA (kvmtool set
table->size and pba->size to 64k when it aligned them to the host's page
size).
Then at step (2), the guest writes the next available address (from its
point of view) into BAR 2 of the NIC, which is 0x50004000. On the host
side, the PCI emulation layer will search all the regions that overlap with
the BAR address range (0x50004000-0x50004fff) and will find none because,
just like the guest, it uses the BAR size to check for overlaps. When
vfio_pci_bar_activate() is reached, kvmtool will try to register memory for
this region, but it is already registered for the MSIX table emulation and
fails.
The same scenario repeats for every following memory BAR, because the MSIX
table and PBA use memory from 0x50000000 to 0x5001ffff.
The error at the end, which finally terminates the VM, is caused by the
guest trying to write to a totally different BAR, which vfio-pci
interprets as a write to the MSI-X table because it falls in the 64k region
that was registered for emulation. The IRQ ID is not a valid SPI number and
gicv2m_update_routing() returns an error (and sets errno to EINVAL).
Fix this by aligning the table and PBA size to 8 bytes to allow for
qword accesses, like PCI 3.0 mandates.
For the sake of simplicity, the PBA offset in a BAR, in case of a shared
BAR, is kept the same as the offset of the physical device. One hopes that
the device respects the recommendations set forth in PCI LOCAL BUS
SPECIFICATION, REV. 3.0, section "MSI-X Capability and Table Structures".
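To make the arithmetic concrete, here is a small sketch of the size
calculation with 8 byte alignment, plus a check that a structure stays
inside its BAR. All helper names are hypothetical and this is not the code
from the patch; the 16 byte entry size and one pending bit per vector come
from the MSI-X definition:

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16	/* bytes per MSI-X table entry */

/* Round x up to a multiple of 8 bytes. */
static uint64_t align8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/* Size of the MSI-X table, rounded up to 8 bytes to allow qword accesses. */
static uint64_t msix_table_size(unsigned int nr_entries)
{
	return align8((uint64_t)nr_entries * MSIX_ENTRY_SIZE);
}

/* Size of the PBA: one pending bit per vector, rounded up to 8 bytes. */
static uint64_t msix_pba_size(unsigned int nr_entries)
{
	return align8(((uint64_t)nr_entries + 7) / 8);
}

/* Returns 1 if [offset, offset + size) lies entirely inside the BAR. */
static int fits_in_bar(uint64_t offset, uint64_t size, uint64_t bar_size)
{
	return offset <= bar_size && size <= bar_size - offset;
}
```

For the NIC in the lspci output (Count=4, table at offset 0 and PBA at
offset 0x800 of a 16K BAR), this gives a 64 byte table and an 8 byte PBA,
both of which fit in the BAR, whereas a 64k table does not.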
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The MSI-X capability defines a PBA offset, which is the offset of the PBA
array in the BAR that holds the array.
kvmtool uses the field "pba_offset" in struct msix_cap (which represents
the MSIX capability) to refer to the [PBA offset:BAR] field of the
capability; and the field "offset" in the struct vfio_pci_msix_pba to refer
to offset of the PBA array in the device descriptor created by the VFIO
driver.
As we're getting ready to add yet another field that represents an offset
to struct vfio_pci_msix_pba, try to avoid ambiguities by renaming the
struct's "offset" field to "fd_offset".
No functional change intended.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211012132510.42134-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|