Age | Commit message | Author | Files | Lines
2024-04-09 | x86: Fix some memory sizes when setting up bios (HEAD, master) | Sicheng Liu | 2 | -6/+9
In e820_setup(), the MB_BIOS memory region is the inclusive range [MB_BIOS_BEGIN, MB_BIOS_END], so its size should be MB_BIOS_SIZE (= MB_BIOS_END - MB_BIOS_BEGIN + 1). The same applies to BDA, EBDA, MB_BIOS and VGA_ROM in setup_bios(). In addition, setup_irq_handler() is changed slightly to avoid hard-coded values. Signed-off-by: Sicheng Liu <lsc2001@outlook.com> Link: https://lore.kernel.org/r/SY6P282MB373318D6241D56E074B040DFA3392@SY6P282MB3733.AUSP282.PROD.OUTLOOK.COM Signed-off-by: Will Deacon <will@kernel.org>
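The off-by-one described in this commit comes from treating an inclusive range as if it were half-open. A minimal sketch in C follows; the constant names mirror the commit message, but the values and the inclusive_size() helper are illustrative assumptions, not code copied from kvmtool.

```c
#include <stdint.h>

/* Illustrative values only: an inclusive region [BEGIN, END]. */
#define MB_BIOS_BEGIN 0x000f0000ULL
#define MB_BIOS_END   0x000fffffULL

/* Size of an inclusive range [begin, end]: both endpoints count. */
static inline uint64_t inclusive_size(uint64_t begin, uint64_t end)
{
	return end - begin + 1;
}

/* Hypothetical definition matching the formula in the commit message. */
#define MB_BIOS_SIZE inclusive_size(MB_BIOS_BEGIN, MB_BIOS_END)
```

Using END - BEGIN alone would report one byte less than the region actually covers.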
2024-04-09 | riscv: Allow disabling SBI STA extension for Guest | Anup Patel | 1 | -1/+4
We add the "--disable-sbi-sta" option to allow users to disable the SBI steal-time extension for the Guest. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-11-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add Zfa extension support | Anup Patel | 2 | -0/+4
When the Zfa extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-10-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add Zvfh[min] extensions support | Anup Patel | 2 | -0/+8
When the Zvfh[min] extensions are available, expose them to the guest via device tree so that the guest can use them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-9-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add Zihintntl extension support | Anup Patel | 2 | -0/+4
When the Zihintntl extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-8-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add Zfh[min] extensions support | Anup Patel | 2 | -0/+8
When the Zfh[min] extensions are available, expose them to the guest via device tree so that the guest can use them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-7-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add vector crypto extensions support | Anup Patel | 2 | -0/+40
When the vector crypto extensions are available, expose them to the guest via device tree so that the guest can use them. This includes extensions Zvbb, Zvbc, Zvkb, Zvkg, Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh, and Zvkt. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add scalar crypto extensions support | Anup Patel | 4 | -0/+88
When the scalar crypto extensions are available, expose them to the guest via device tree so that the guest can use them. This includes extensions Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zkr, Zksed, Zksh, and Zkt. The Zkr extension requires SEED CSR emulation in user space, so we also add the related KVM_EXIT_RISCV_CSR handling. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | riscv: Add Zbc extension support | Anup Patel | 2 | -0/+4
When the Zbc extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-04-09 | kvmtool: Fix absence of __packed definition | Anup Patel | 1 | -0/+2
The absence of a __packed definition in kvm/compiler.h causes a build failure after syncing kernel headers with Linux-6.8, because the kernel header uapi/linux/virtio_pci.h uses __packed for structures. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
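For readers unfamiliar with __packed, the sketch below shows one common way such a macro is defined and what it changes. It assumes a GCC- or Clang-compatible compiler; the two structures are hypothetical and exist only to illustrate the layout difference, and the definition is not claimed to be the exact one added to kvm/compiler.h.

```c
#include <stdint.h>

/* Fallback definition, assuming a GCC/Clang-compatible compiler. */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Hypothetical structure: packed, so no padding after 'id'. */
struct example_cap {
	uint8_t  id;
	uint32_t offset;
} __packed;

/* Same members without packing: the compiler may insert padding. */
struct example_cap_padded {
	uint8_t  id;
	uint32_t offset;
};
```

Without a definition of __packed, a header that uses it as in the first structure fails to compile, which is the failure the commit describes.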
2024-04-09 | Sync-up headers with Linux-6.8 for KVM RISC-V | Anup Patel | 5 | -91/+168
We sync-up Linux headers to get latest KVM RISC-V headers having Zbc, Scalar crypto, Vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa, and SBI steal-time support. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20240325153141.6816-2-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-03-04 | Fix 9pfs open device file security flaw | Yanwu Shen | 1 | -1/+12
Our team found that a public QEMU 9pfs security issue[1] also exists in upstream kvmtool's 9pfs device. A privileged guest user can create and access a special device file (e.g., a block device file) in the shared folder, allowing the malicious user to access the host device and achieve privilege escalation. The virtio_p9_open function in 9p.c only checks directory attributes, but does not check for special files. Special device files can be filtered out through the S_IFREG and S_IFDIR flag bits. [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-2861 Link: https://lore.kernel.org/r/20240303183659.20656-1-ywsplz@gmail.com Signed-off-by: Yanwu Shen <ywsPlz@gmail.com> Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | x86: Enable in-kernel irqchip before creating PIT | Tengfei Yu | 1 | -4/+4
As the KVM API documentation (https://docs.kernel.org/virt/kvm/api.html) states, the KVM_CREATE_PIT2 call is only valid after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. Signed-off-by: Tengfei Yu <moehanabichan@gmail.com> Link: https://lore.kernel.org/r/20240129123310.28118-1-moehanabichan@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Fix guest poweroff when using PLIC emulation | Anup Patel | 1 | -0/+32
Recently due to commit 74af1456dfa0, the virtio device emulation in KVMTOOL now calls irq__update_msix_route() upon guest poweroff which results in KVMTOOL crash when Guest uses PLIC emulation in user space. This is because irq__update_msix_route() expects the irq_routing table to be available but the KVMTOOL PLIC emulation does not populate any irq_routing entries. Fixes: 74af1456dfa0 ("virtio: Cancel and join threads when exiting devices devices") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231130041633.78725-1-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Handle SBI DBCN calls from Guest/VM | Anup Patel | 3 | -3/+73
The new SBI DBCN functions are forwarded by in-kernel KVM RISC-V module to user-space so let us handle these calls in kvm_cpu_riscv_sbi() function. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-11-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Set mmu-type DT property based on satp_mode ONE_REG interface | Anup Patel | 1 | -7/+37
Instead of hard-coding the mmu-type DT property, we should set it based on satp_mode ONE_REG interface. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-10-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
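As a rough illustration of deriving the property instead of hard-coding it, the sketch below maps a satp MODE value to a device tree "mmu-type" string. The MODE encodings come from the RISC-V privileged specification; the macro names, the helper name and the "riscv,none" fallback are assumptions made for this example, and reading the value through the ONE_REG interface is left out.

```c
#include <stddef.h>

/* satp MODE field encodings from the RISC-V privileged specification. */
#define SATP_MODE_SV32 1UL
#define SATP_MODE_SV39 8UL
#define SATP_MODE_SV48 9UL
#define SATP_MODE_SV57 10UL

/*
 * Map a satp mode value to a device tree "mmu-type" string.
 * Hypothetical helper; unknown modes fall back to "riscv,none".
 */
static const char *mmu_type_from_satp_mode(unsigned long mode)
{
	switch (mode) {
	case SATP_MODE_SV32:
		return "riscv,sv32";
	case SATP_MODE_SV39:
		return "riscv,sv39";
	case SATP_MODE_SV48:
		return "riscv,sv48";
	case SATP_MODE_SV57:
		return "riscv,sv57";
	default:
		return "riscv,none";
	}
}
```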
2024-02-09 | riscv: Add Zicond extension support | Anup Patel | 2 | -0/+4
When the Zicond extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-9-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Add Smstateen extension support | Anup Patel | 2 | -0/+4
When the Smstateen extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-8-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Add Zicsr and Zifencei extension support | Anup Patel | 2 | -0/+8
When the Zicsr and Zifencei extensions are available, expose them to the guest via device tree so that the guest can use them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-7-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Add Zicntr and Zihpm extension support | Anup Patel | 2 | -0/+8
When the Zicntr and Zihpm extensions are available, expose them to the guest via device tree so that the guest can use them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Add Zba and Zbs extension support | Anup Patel | 2 | -0/+8
When the Zba and Zbs extensions are available, expose them to the guest via device tree so that the guest can use them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | riscv: Make CPU_ISA_MAX_LEN depend upon isa_info_arr array size | Anup Patel | 1 | -1/+1
Currently, CPU_ISA_MAX_LEN is a fixed value, so we will easily run out of space when all possible ISA extensions supported by KVM RISC-V are available. Instead, let us make CPU_ISA_MAX_LEN depend upon the isa_info_arr[] array size so that CPU_ISA_MAX_LEN automatically adapts to the growing number of ISA extensions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
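One way to tie a buffer length to a table size is sketched below. The shortened extension table, the struct layout and the per-name budget of 16 bytes are assumptions for illustration; only the idea of deriving CPU_ISA_MAX_LEN from the isa_info_arr[] element count is taken from the commit message.

```c
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical, shortened extension table. */
struct isa_ext_info {
	const char *name;
};

static const struct isa_ext_info isa_info_arr[] = {
	{ "zba" }, { "zbb" }, { "zbs" }, { "zicond" },
};

/* Assumed budget of 16 bytes per extension name, separator included. */
#define ISA_EXT_NAME_BUDGET 16
#define CPU_ISA_MAX_LEN (ARRAY_SIZE(isa_info_arr) * ISA_EXT_NAME_BUDGET)
```

Adding an entry to the table then grows the limit at compile time, with no separate constant to keep in sync.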
2024-02-09 | riscv: Improve warning in generate_cpu_nodes() | Anup Patel | 1 | -1/+2
Let's print name of the ISA extension in warning if generate_cpu_nodes() drops the ISA extension from generated ISA string due to lack of space. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231128145628.413414-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 | Sync kernel headers with v6.7 to enable additional Risc-V extensions | Will Deacon | 5 | -0/+71
$ ./util/update_headers.sh ~/work/linux Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | riscv: Fix guest/init linkage for multilib toolchain | Anup Patel | 1 | -0/+2
For RISC-V multilib toolchains, we must specify -mabi and -march options when linking guest/init. Fixes: 2e99678314c2 ("riscv: Initial skeletal support") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-7-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | riscv: Use AIA in-kernel irqchip whenever KVM RISC-V supports | Anup Patel | 5 | -6/+251
The KVM RISC-V kernel module supports the AIA in-kernel irqchip when the underlying host has AIA support. We detect and use the AIA in-kernel irqchip whenever possible; otherwise we fall back to PLIC emulated in user-space. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | riscv: Add IRQFD support for in-kernel AIA irqchip | Anup Patel | 2 | -0/+84
To use irqfd with in-kernel AIA irqchip, we add custom irq__add_irqfd and irq__del_irqfd functions. This allows us to defer actual KVM_IRQFD ioctl() until AIA irqchip is initialized by KVMTOOL. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | riscv: Make irqchip support pluggable | Anup Patel | 6 | -49/+147
We will have different types of irqchip: 1) PLIC emulated by user-space, and 2) AIA APLIC and IMSIC provided by the in-kernel KVM module. To support this, we de-couple PLIC-specific code from generic RISC-V code (such as FDT generation) so that we can easily add other types of irqchip. As part of the PLIC de-coupling, we introduce various riscv_irqchip_xyz global variables to describe the chosen irqchip, hence PLIC is no longer required to register itself using device__register(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | riscv: Add Svnapot extension support | Anup Patel | 2 | -0/+4
When the Svnapot extension is available expose it to the guest via device tree so that guest can use it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | Sync-up header with Linux-6.6 for KVM RISC-V | Anup Patel | 3 | -4/+126
We sync-up Linux headers to get latest KVM RISC-V headers having V, Svnapot, AIA and other extensions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20231118132847.758785-2-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-11-21 | virtio: Cancel and join threads when exiting devices devices | Eduardo Bart | 9 | -1/+47
I'm experiencing a segmentation fault in lkvm, which may crash after powering off a guest machine that uses a virtio network device. The crash is hard to reproduce because it appears to happen only when the guest machine is powering off while extra virtio threads are still doing some work. When it happens, lkvm crashes in the function virtio_net_rx_thread while attempting to read invalid guest physical memory, because guest physical memory was unmapped. I've isolated the problem: when lkvm exits, it unmaps the guest memory while the extra virtio device threads may still be executing. I noticed that most virtio devices are not executing pthread_cancel + pthread_join to synchronize their extra threads when exiting. To make sure this happens, I added explicit calls to the virtio device exit function for all virtio devices, which should cancel and join all threads before unmapping guest physical memory, fixing the crash for me. Signed-off-by: Eduardo Bart <edub4rt@gmail.com> Link: https://lore.kernel.org/r/20231117170455.80578-2-edub4rt@gmail.com [will: Added commit message from https://lore.kernel.org/all/CABqCASLWAZ5aq27GuQftWsXSf7yLFCKwrJxWMUF-fiV7Bc4LUA@mail.gmail.com/] Signed-off-by: Will Deacon <will@kernel.org>
2023-09-18 | pci: Deregister KVM_PCI_CFG_AREA on pci__exit | Tan En De | 1 | -0/+1
KVM_PCI_CFG_AREA is registered with kvm__register_mmio during pci__init, but it isn't deregistered during pci__exit. This commit deregisters KVM_PCI_CFG_AREA with kvm__deregister_mmio in pci__exit. Signed-off-by: Tan En De <ende.tan@starfivetech.com> Link: https://lore.kernel.org/r/20230916052303.1003-1-ende.tan@starfivetech.com Signed-off-by: Will Deacon <will@kernel.org>
2023-09-18 | virtio/pci: Use consistent naming for the PCI ISR bit flags | Keir Fraser | 2 | -2/+5
Avoid using VIRTIO_IRQ_{HIGH,LOW} which belong to a different namespace. Instead define VIRTIO_PCI_ISR_QUEUE as a logical extension of the VIRTIO_PCI_ISR_* namespace. Since this bit flag is missing from a header imported verbatim from Linux, define it directly in pci.c. Signed-off-by: Keir Fraser <keirf@google.com> Link: https://lore.kernel.org/r/20230912151623.2558794-4-keirf@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-09-18 | virtio/pci: Treat PCI ISR as a set of bit flags | Keir Fraser | 1 | -2/+2
The PCI ISR is defined in the virtio spec as a set of flags which can be bitwise ORed together. Therefore we should avoid clearing previously-set flags. Signed-off-by: Keir Fraser <keirf@google.com> Link: https://lore.kernel.org/r/20230912151623.2558794-3-keirf@google.com Signed-off-by: Will Deacon <will@kernel.org>
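The difference between assigning and OR-ing a flag can be shown in a few lines. The bit positions follow the virtio ISR status layout (bit 0 for queue notifications, bit 1 for configuration changes); the macro and helper names are illustrative assumptions rather than kvmtool identifiers.

```c
#include <stdint.h>

/* Bit values follow the virtio PCI ISR layout; names are illustrative. */
#define ISR_QUEUE  0x1
#define ISR_CONFIG 0x2

/* Raise an ISR flag without dropping any flag that is already pending. */
static inline void isr_raise(uint8_t *isr, uint8_t flag)
{
	*isr |= flag;
}
```

With a plain assignment, raising ISR_CONFIG while ISR_QUEUE is still pending would silently lose the queue notification.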
2023-09-18 | virtio/pci: Level-trigger the legacy IRQ line in all cases | Keir Fraser | 1 | -1/+1
The PCI legacy IRQ line is level triggered, but is treated as edge triggered via kvm__irq_trigger() for signalling of config changes. Fix this by using kvm__irq_level(), as for queue signalling. Signed-off-by: Keir Fraser <keirf@google.com> Link: https://lore.kernel.org/r/20230912151623.2558794-2-keirf@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-09-18 | builtin-run: Document mode=none for -n/--network | Alexandru Elisei | 1 | -1/+2
It can be useful to disable all network devices, for example, to remove the compat warning for the default network device when the guest does not initialize it. This can be done by passing mode=none to the --network command line option, but without in-depth knowledge of the code, there is no way for the user to know this. Update the help message for -n/--network to explain what mode=none does. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20230907171655.6996-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-09-18 | Revert "virtio-net: Don't print the compat warning for the default device" | Alexandru Elisei | 1 | -4/+4
This reverts commit 15757e8e6441d83757c39046a6cdd3e4d74200ce. Turns out there's a way to disable the default virtio-net device: pass --network mode=none when running a VM. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20230907171655.6996-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Fix guest RAM alloc size computation for RV32 | Anup Patel | 1 | -3/+12
Currently, we ensure that guest RAM alloc size is at least 2M for THP which works well for RV64 but breaks hugepage support for RV32. To fix this, we use 4M as hugepage size for RV32. Fixes: 867159a7963b ("riscv: Implement Guest/VM arch functions") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-10-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
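The size difference can be sketched as follows: the smallest huge page is 2M with the RV64 paging modes and 4M with Sv32 on RV32. The helper names and the idea of passing XLEN as a runtime parameter are assumptions made to keep the example self-contained; the actual fix may well select the size at compile time.

```c
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_4M (4ULL << 20)

/* Huge page size assumed per XLEN: 2M for RV64, 4M for RV32 (Sv32). */
static uint64_t hugepage_size(unsigned int xlen)
{
	return xlen == 32 ? SZ_4M : SZ_2M;
}

/* Round 'value' up to a multiple of 'align' (a power of two). */
static uint64_t align_up(uint64_t value, uint64_t align)
{
	return (value + align - 1) & ~(align - 1);
}
```

Aligning with the 2M value on RV32 can produce an address that is not 4M-aligned, which is why a fixed 2M assumption breaks huge page support there.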
2023-07-20 | riscv: Add Ssaia extension support | Anup Patel | 2 | -0/+4
When the Ssaia extension is available expose it to the guest. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-9-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Add Zicboz extension support | Andrew Jones | 2 | -1/+14
When the Zicboz extension is available expose it to the guest. Also provide the guest the size of the cache block through DT. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-8-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Add zbb extension support | Anup Patel | 2 | -0/+4
The zbb extension allows software to use basic bitmanip instructions. Let us add the zbb extension to the Guest device tree whenever it is supported by the host. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-7-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Sort the ISA extension array alphabetically | Anup Patel | 1 | -2/+3
Let us follow alphabetical order for listing ISA extensions in the isa_info_arr[] array. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Allow disabling SBI extensions for Guest | Anup Patel | 3 | -9/+59
We add "--disable-sbi-<xyz>" options to disable various SBI extensions visible to the Guest. This allows users to disable deprecated/redundant SBI extensions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | riscv: Allow setting custom mvendorid, marchid, and mimpid | Anup Patel | 2 | -1/+37
We add command-line parameters to set custom mvendorid, marchid, and mimpid values so that users can present a fake CPU type to the Guest/VM which does not match the underlying Host CPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | Sync-up headers with Linux-6.4 | Anup Patel | 7 | -31/+286
We sync-up Linux headers to get latest KVM RISC-V headers having SBI extension enable/disable, Zbb, Zicboz, and Ssaia support. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | kvm tools: Add __DECLARE_FLEX_ARRAY() in include/linux/stddef.h | Anup Patel | 1 | -0/+16
The latest x86 UAPI headers use the __DECLARE_FLEX_ARRAY() macro, so let us take this macro from the Linux UAPI header and add it to include/linux/stddef.h. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20230712163501.1769737-2-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | virtio-net: Don't print the compat warning for the default device | Alexandru Elisei | 1 | -4/+4
Compat messages are there to print a warning when the user creates a virtio device for the VM, but the guest doesn't initialize it. This generally works great, except that kvmtool will always create a virtio-net device, even if the user hasn't specified one, which means that each time kvmtool loads a guest that doesn't probe the network interface, the user will get the compat warning. This can get particularly annoying when running kvm-unit-tests, which doesn't need to use a network interface, and the virtio-net warning is displayed after each test. Let's fix this by skipping the compat message in the case of the automatically created virtio-net device. This lets kvmtool keep the compat warnings as they are, but removes the false positive. Even if the user is relying on kvmtool creating the default virtio-net device, a missing network interface in the guest is very easy to discover. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20230714152909.31723-1-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | Apply scaling down the calculated guest ram size to the number of pages | Fuad Tabba | 1 | -8/+6
Calculate the guest ram size based on a ratio proportional to the number of pages available, rather than the amount of memory available in bytes, in the host. This is to ensure that the result is always page-aligned. If the result of get_ram_size() isn't aligned to the host page size, it triggers an error in __kvm_set_memory_region(), called via the KVM_SET_USER_MEMORY_REGION ioctl, which requires the size to be page-aligned. Fixes: 18bd8c3bd2a7 ("kvm tools: Don't use all of host RAM for guests by default") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230717121232.3559948-4-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
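Why scaling pages rather than bytes keeps the result aligned can be seen in a short sketch. The ratio of 0.8 and the helper name are assumptions for illustration; only the approach of applying the ratio to the page count comes from the commit message.

```c
#include <stdint.h>

/* Assumed fraction of host memory offered to the guest by default. */
#define RAM_SIZE_RATIO 0.8

/*
 * Scale the page count, not the byte count, so that the result
 * is always a whole number of pages.
 */
static uint64_t default_guest_ram_size(uint64_t host_pages, uint64_t page_size)
{
	uint64_t guest_pages = (uint64_t)(host_pages * RAM_SIZE_RATIO);

	return guest_pages * page_size;
}
```

On Linux the two inputs could come from sysconf(_SC_PHYS_PAGES) and sysconf(_SC_PAGE_SIZE). Applying the ratio to the byte total instead can leave a remainder that is not a multiple of the page size.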
2023-07-20 | Factor out getting the number of physical memory host pages | Fuad Tabba | 1 | -4/+10
Factor out getting the number of physical pages available for the host into a separate function. This will be used in a subsequent patch. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230717121232.3559948-3-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20 | Factor out getting the host page size | Fuad Tabba | 1 | -7/+13
Factor out getting the page size of the host into a separate function. This will be used in a subsequent patch. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230717121232.3559948-2-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-12 | Add --loglevel argument for the run command | Alexandru Elisei | 3 | -6/+44
Add the --loglevel command line argument, with the possible values of 'error', 'warning', 'info' or 'debug', to control what messages kvmtool displays. The argument functions similarly to the Linux kernel parameter, where lower verbosity levels hide all messages with a higher verbosity (for example, 'warning' hides info and debug messages, and allows warning and error messages). The default level is 'info', to match the current behaviour. --debug has been kept as a legacy option, which might be removed in the future. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20230707151119.81208-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
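The filtering rule described here can be sketched as an ordered enum plus a comparison. All identifiers below are hypothetical and chosen for the example; they are not kvmtool's own names.

```c
#include <string.h>

/* Hypothetical names; lower values are less verbose. */
enum loglevel {
	LOGLEVEL_ERROR,
	LOGLEVEL_WARNING,
	LOGLEVEL_INFO,
	LOGLEVEL_DEBUG,
};

/* Parse a --loglevel value; returns -1 for an unknown string. */
static int parse_loglevel(const char *arg)
{
	static const char *const names[] = { "error", "warning", "info", "debug" };
	int i;

	for (i = 0; i < 4; i++)
		if (strcmp(arg, names[i]) == 0)
			return i;
	return -1;
}

/* A message is shown when its level does not exceed the configured one. */
static int should_print(enum loglevel configured, enum loglevel msg)
{
	return msg <= configured;
}
```

With the level set to 'warning', error and warning messages pass the check while info and debug messages are suppressed.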
2023-07-12 | util: Use __pr_debug() instead of pr_info() to print debug messages | Alexandru Elisei | 2 | -1/+17
pr_debug() is special, because it can be suppressed with a command line argument, and because it needs to be a macro to capture the correct filename, function name and line number. Display debug messages with the prefix "Debug", to make it clear that those aren't informational messages. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20230707151119.81208-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-12 | Replace printf/fprintf with pr_* macros | Alexandru Elisei | 6 | -37/+37
To prepare for allowing finer control over the messages that kvmtool displays, replace printf() and fprintf() with the pr_* macros. Minor changes were made to fix coding style issues that were pet peeves for the author. And use pr_err() in kvm_cpu__init() instead of pr_warning() for fatal errors. Also, fix the message when printing the exit code for KVM_EXIT_UNKNOWN by removing the '0x' part, because it's printing a decimal number, not a hexadecimal one (the format specifier is %llu, not %llx). Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20230707151119.81208-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-12 | util: Make pr_err() return void | Alexandru Elisei | 3 | -15/+18
Of all the pr_* functions, pr_err() is the only function that returns a value, which is -1. The code in parse_options is the only code that relies on pr_err() returning a value, and that value must be exactly -1, because it is being treated differently than the other return values. This makes the code opaque, because it's not immediately obvious where that value comes from, and fragile, as a change in the return value of pr_err would break it. Make pr_err() more like the other functions and don't return a value. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20230707151119.81208-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-06 | vfio/pci: Clarify the MSI states | Jean-Philippe Brucker | 2 | -37/+57
The MSI and MSI-X implementations are a bit complex, because they keep track of capability and vector states as seen by both the guest and the host. Add a few comments about those states and rename them to something more accurate. What's called phys_state at the moment represents the software state maintained by VFIO and kvmtool, rather than the physical MSI capability, so host_state is more correct. To be consistent, rename virt_state to guest_state as well. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230628112331.453904-4-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-07-06 | vfio/pci: Initialize MSI vectors unmasked | Jean-Philippe Brucker | 1 | -1/+1
MSI vectors can be masked and unmasked individually when using the MSI-X capability, or when the classic MSI capability supports Per-Vector Masking. At the moment we incorrectly initialize the guest's view of the vectors (virt_state) as masked, so when using a MSI capability without Per-Vector Masking, the vectors are never unmasked and MSIs don't work. Initialize them unmasked instead. Since VFIO doesn't support per-vector masking we implement it by disconnecting the irqfd, and keep track of it with the vector's phys_state. Initially the irqfd is not connected so phys_state is masked. Reported-by: Vivek Gautam <vivek.gautam@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230628112331.453904-3-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 | virtio/vhost: Clear VIRTIO_F_ACCESS_PLATFORM | Jean-Philippe Brucker | 5 | -14/+19
Vhost interprets the VIRTIO_F_ACCESS_PLATFORM flag as if accesses need to use vhost-iotlb, and since kvmtool does not implement vhost-iotlb, vhost will fail to access the virtqueue. This fix is preventive. Kvmtool does not set VIRTIO_F_ACCESS_PLATFORM at the moment but the Arm CCA and pKVM changes will likely hit the issue (as experienced with the CCA development tree), so we might as well fix it now. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-18-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 | virtio/vhost: Support line interrupt signaling | Jean-Philippe Brucker | 6 | -22/+104
To signal a virtqueue, a kernel vhost worker writes an eventfd registered by kvmtool with VHOST_SET_VRING_CALL. When MSIs are supported, this eventfd is connected directly to KVM IRQFD to inject the interrupt into the guest. However direct injection does not work when MSIs are not supported. The virtio-mmio transport does not support MSIs at all, and even with virtio-pci, the guest may use INTx if the irqchip does not support MSIs (e.g. irqchip=gicv3 on arm64). In this case, injecting the interrupt requires writing an ISR register in virtio to signal that it is a virtqueue notification rather than a config change. Add a thread that polls the vhost eventfd for interrupts, and notifies the guest. When the guest configures MSIs, disable polling on the eventfd and enable direct injection. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-17-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 | Factor epoll thread | Jean-Philippe Brucker | 5 | -155/+151
Both ioeventfd and ipc use an epoll thread roughly the same way. In order to add a new epoll user, factor the common bits into epoll.c. Slight implementation changes which shouldn't affect behavior:
* At the moment ioeventfd mixes file descriptor (for the stop event) and pointers in the epoll_event.data union, which could in theory cause aliasing. Use a pointer for the stop event instead. kvm-ipc uses only file descriptors. It could be changed but since epoll.c compares the stop event pointer first, the risk of aliasing with an fd is much lower there.
* kvm-ipc uses EPOLLET, edge-triggered events, but having the stop event level-triggered shouldn't make a difference.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-16-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 | virtio/net: Warn about enabling multiqueue with vhost | Jean-Philippe Brucker | 1 | -0/+5
vhost-net requires opening one file descriptor for each TX/RX queue pair. At the moment kvmtool does not support multi-queue vhost: it issues all vhost ioctls on the first pair, and the other pairs are broken. Refuse to enable vhost when the user asks for multi-queue. Using multi-queue vhost-net also requires creating the tap interface with the 'multi_queue' parameter. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-15-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio: Fix messages about missing Linux configJean-Philippe Brucker2-2/+2
The suggested CONFIG options do not exist. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-14-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio: Document how to test the devicesJean-Philippe Brucker1-0/+141
Add a few instructions for testing the devices. Testing devices like vhost-scsi or vsock may seem daunting but is relatively easy. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-13-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/net: Fix feature selectionJean-Philippe Brucker1-10/+12
Move VHOST_GET_FEATURES to get_host_features() so the guest is aware of what will actually be supported. This removes the invalid guess about VIRTIO_NET_F_MRG_RXBUF (if vhost didn't support it, we shouldn't let the guest negotiate it). Note the masking of VHOST_NET_F_VIRTIO_NET_HDR when handing features to vhost. Unfortunately the vhost-net driver interprets VIRTIO_F_ANY_LAYOUT as VHOST_NET_F_VIRTIO_NET_HDR, which is specific to vhost and forces vhost-net to supply the vnet header. Since this is done by tap, we don't want to set the bit. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-12-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/vsock: Fix feature selectionJean-Philippe Brucker1-14/+20
We should advertise to the guest only the features supported by vhost and kvmtool. Then we should set in vhost only the features acked by the guest. Move vhost feature query to get_host_features(), and vhost feature setting to device start (after the guest has acked features). This fixes vsock because we used to enable all vhost features including VIRTIO_F_ACCESS_PLATFORM, which forces vhost to use vhost-iotlb and isn't supported by kvmtool. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-11-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
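The two-step negotiation described above reduces to bitmask intersections. The following is a minimal sketch, not kvmtool's code: offered_features() and vhost_features_to_set() are hypothetical helpers, and only the VIRTIO_F_ACCESS_PLATFORM bit number is taken from the virtio specification.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit position from the virtio specification. */
#define VIRTIO_F_ACCESS_PLATFORM 33

static_assert(VIRTIO_F_ACCESS_PLATFORM < 64, "feature bits fit in 64 bits");

/* Step 1: advertise only what both vhost and the VMM support. */
static uint64_t offered_features(uint64_t vhost_features, uint64_t tool_features)
{
	return vhost_features & tool_features;
}

/* Step 2 (device start): hand vhost only what the guest acked. */
static uint64_t vhost_features_to_set(uint64_t offered, uint64_t guest_acked)
{
	return offered & guest_acked;
}
```

With this ordering a feature the VMM does not implement, such as VIRTIO_F_ACCESS_PLATFORM here, never reaches vhost even if vhost reports it.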
2023-06-08virtio/scsi: Fix feature selectionJean-Philippe Brucker1-14/+19
We should advertise to the guest only the features supported by vhost and kvmtool. Then we should set in vhost only the features acked by the guest. Move vhost feature query to get_host_features(), and vhost feature setting to device start (after the guest has acked features). This fixes scsi because we used to enable all vhost features including VIRTIO_SCSI_F_T10_PI which changes the request layout and caused inconsistency between guest and vhost. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-10-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/scsi: Initialize max_targetJean-Philippe Brucker1-0/+1
The Linux guest does not find any target when 'max_target' is 0. Initialize it to the maximum defined by virtio, "5.6.4 Device configuration layout": max_target SHOULD be less than or equal to 255. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-9-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08disk/core: Fix segfault on exit with SCSIJean-Philippe Brucker1-2/+2
The SCSI backend doesn't call disk_image__new() so the disk ops are NULL. Check for this case on exit. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-8-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/scsi: Fix and simplify command-lineJean-Philippe Brucker3-14/+6
Fix and simplify the command-line parameter for virtio-scsi. Currently passing a "scsi:xxxx" parameter without the second "tpgt" argument causes kvmtool to segfault. But only the "wwpn" parameter is necessary. The tpgt parameter is ignored and was never used upstream. See linux/vhost_types.h: * ABI Rev 0: July 2012 version starting point for v3.6-rc merge candidate + * RFC-v2 vhost-scsi userspace. Add GET_ABI_VERSION ioctl usage * ABI Rev 1: January 2013. Ignore vhost_tpgt field in struct vhost_scsi_target. * All the targets under vhost_wwpn can be seen and used by guset. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-7-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/scsi: Move VHOST_SCSI_SET_ENDPOINT to device startJean-Philippe Brucker1-8/+7
The vhost driver expects virtqueues to be operational by the time we call SET_ENDPOINT. We currently do it too early. Device start, which happens when the driver writes the DRIVER_OK status, is a good time to do this. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-6-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/vhost: Factor notify_vq_gsi()Jean-Philippe Brucker5-63/+53
All vhost devices should perform the same operations when initializing the IRQFD. Move it to virtio/vhost.c This fixes vsock, which didn't go through the irq__add_irqfd() helper and couldn't be used on systems that require GSI translation (GICv2m). Also correct notify_vq_gsi() in net.c, to check which virtqueue is being configured. Since vhost only manages the data queues, we shouldn't try to setup GSI routing for the control queue. This hasn't been a problem so far because the Linux guest doesn't use IRQs for the control queue. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-5-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/vhost: Factor notify_vq_eventfd()Jean-Philippe Brucker5-28/+20
All vhost devices perform the same operation when setting up the ioeventfd. Move it to virtio/vhost.c Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-4-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/vhost: Factor vring operationJean-Philippe Brucker5-76/+38
The VHOST_VRING* ioctls are common to all device types, move them to virtio/vhost.c Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-3-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio: Factor vhost initializationJean-Philippe Brucker6-75/+42
Move vhost owner and memory table setup to virtio/vhost.c. This also fixes vsock and SCSI which did not support multiple memory regions until now (vsock didn't allocate the right region size and would trigger a buffer overflow). Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606130426.978945-2-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08virtio/rng: Fix build warning from min()Jean-Philippe Brucker1-1/+1
On a 32-bit build GCC complains about the min() parameters: include/linux/kernel.h:36:24: error: comparison of distinct pointer types lacks a cast [-Werror] 36 | (void) (&_min1 == &_min2); \ | ^~ virtio/rng.c:78:34: note: in expansion of macro 'min' 78 | iov[0].iov_len = min(iov[0].iov_len, 256UL); | ^~~ Use min_t() instead Fixes: bc23b9d9b152 ("virtio/rng: return at least one byte of entropy") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606143733.994679-4-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
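A simplified illustration of the fix above: on a 32-bit build iov_len is an unsigned int-sized size_t while 256UL is unsigned long, so a type-checking min() rejects the pair. The min_t() below is a reduced stand-in for the kernel-style macro (it evaluates its arguments more than once, which the real one avoids), and clamp_rng_len() is a hypothetical wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified min_t(): cast both operands to one type before comparing. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Clamp a request length to 256 bytes, as in the quoted virtio/rng.c line. */
static size_t clamp_rng_len(size_t len)
{
	size_t n = min_t(size_t, len, 256UL);

	assert(n <= 256);
	return n;
}
```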
2023-06-08arm/kvm-cpu: Fix new build warningJean-Philippe Brucker1-2/+1
GCC 13.1 complains about uninitialized value: arm/kvm-cpu.c: In function 'kvm_cpu__arch_init': arm/kvm-cpu.c:119:41: error: 'target' may be used uninitialized [-Werror=maybe-uninitialized] 119 | vcpu->cpu_compatible = target->compatible; | ~~~~~~^~~~~~~~~~~~ arm/kvm-cpu.c:40:32: note: 'target' was declared here 40 | struct kvm_arm_target *target; | ^~~~~~ This can't happen in practice (we call die() when no target is found), but initialize the target variable earlier to make GCC happy. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606143733.994679-3-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08Makefile: Refine -s handling in the make parametersJean-Philippe Brucker1-1/+1
When looking for the silent flag 's' in MAKEFLAGS we accidentally catch variable definitions like "ARCH=mips" or "CROSS_COMPILE=/cross/...", causing several test builds to be silent. MAKEFLAGS contains the single-letter make flags (without the dash), followed by flags that don't have a single-letter equivalent such as "--warn-undefined-variables" (with the dashes), followed by "--" and command-line variables. For example `make ARCH=mips -k' results in MAKEFLAGS "k -- ARCH=mips". Running $(filter-out --%) on this does not discard ARCH=mips, only "--". However adding $(firstword) ensures that we run the filter either on the single-letter flags or on something beginning with "--", and avoids silent builds. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230606143733.994679-2-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2023-06-05virtio: sanitise virtio endian wrappersAndre Przywara7-61/+71
In virtio/scsi.c we had a small hack to avoid compiler warnings when not using cross-endian support: we were assigning a variable to itself. This upsets clang: virtio/scsi.c:63:7: error: explicitly assigning value of variable of type 'struct virtio_device *' to itself [-Werror,-Wself-assign] This hack was needed because we use *macros* to do the endianness conversion, and for architectures like x86 the "dev" argument was removed from the code. Provide the endianness conversion functions as inline functions, which do not suffer from the unused problem. This requires isolating the "endian" parameter, because there were *two* different structures used as the first argument (virtio_device and virt_queue), *both* with an identically defined "endian" member. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20230525144827.679651-3-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-05option parsing: fix type of empty .argh parameterAndre Przywara3-5/+6
The "force-pci" and "virtio-legacy" option definitions were using '\0' to initialise an unused ".argh" member, even though this is a string. This triggers warnings with some compilers like clang. Also, for some odd reason, the .argh member was not named explicitly in the option helper macros initialisation, which made this problem harder to locate. Sanitise the option macros by always using designated initialisers for each member, and use the correct empty string for the "force-pci" and "virtio-legacy" options. This fixes warnings (promoted to errors) when compiling with clang. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20230525144827.679651-2-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-05virtio/rng: return at least one byte of entropyAndre Przywara1-3/+15
In contrast to the original v0.9 virtio spec (which was rather vague), the virtio 1.0+ spec demands that a RNG request returns at least one byte: "The device MUST place one or more random bytes into the buffer, but it MAY use less than the entire buffer length." Our current implementation does not prevent returning zero bytes, which upsets an assert in EDK II. /dev/urandom should always return at least 256 bytes of entropy, unless interrupted by a signal. Repeat the read if that happens, and give up if that fails as well. This makes sure we return some entropy and become spec compliant. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reported-by: Sami Mujawar <sami.mujawar@arm.com> Link: https://lore.kernel.org/r/20230524112207.586101-3-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
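The retry policy described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in virtio/rng.c: open_entropy_source() and read_entropy() are hypothetical names, and the single retry follows the wording of the message ("repeat the read ... and give up if that fails as well").

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the non-blocking-by-design entropy source. */
static int open_entropy_source(void)
{
	return open("/dev/urandom", O_RDONLY);
}

/*
 * Read up to len bytes of entropy. If the first read returns nothing
 * (for example because a signal interrupted it), try exactly once more.
 * Returns the number of bytes read (> 0), or -1 if both attempts failed.
 */
static ssize_t read_entropy(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(len > 0);
	n = read(fd, buf, len);
	if (n <= 0)
		n = read(fd, buf, len);	/* one retry, then give up */
	return n > 0 ? n : -1;
}
```

A device model would only complete the guest's request when the return value is positive, which satisfies the "one or more random bytes" requirement quoted above.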
2023-06-05virtio/rng: switch to using /dev/urandomAndre Przywara1-1/+1
At the moment we use /dev/random as the backing device to provide random numbers to our virtio-rng implementation. The downside of doing so is that it may block indefinitely - or return EAGAIN repeatedly in our case. On one headless system without ample noise sources (no keyboard, mouse, or network traffic) I measured 30 seconds to gain one byte of randomness. At the moment EDK II insists on waiting for all of the requested random bytes (for its EFI_RNG_PROTOCOL runtime service) to arrive, which held up a Linux kernel boot for more than 10 minutes(!). According to the Internet(TM), on Linux /dev/urandom provides the same quality random numbers as /dev/random, it just does not block when the entropy estimation algorithm suggests so. For all practical purposes the recommendation is to just use /dev/urandom; QEMU made the switch as well in 2019 [1]. Use /dev/urandom instead of /dev/random when opening the file descriptor providing the randomness source for the virtio/rng implementation. Due to a special behaviour documented on the urandom(4) manpage, a read from /dev/urandom will never block, so we can drop the O_NONBLOCK flag. [1] https://gitlab.com/qemu-project/qemu/-/commit/a2230bd778d8 Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20230524112207.586101-2-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-04-06arm: Do not add padding alignment for hugetlbfs backed memorySuzuki K Poulose1-1/+3
The arm code tries to align the memory allocation size to 2M to potentially make use of the transparent hugepages. But this would be problematic if we try to allocate from the hugetlbfs, where the allocation size could be more than 2M. Given we support up to 1G, let us leave it to the user to align the requested memory when hugetlbfs is used. Without the patch: $ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages $ mount -t hugetlbfs -o pagesize=1G none /root/hugemem/ $ lkvm run -m 1024 --hugetlbfs /root/hugemem/ ... # lkvm run -k ... -m 1024 -c 6 Fatal: Can't ftruncate for mem mapping size 1075838976 Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230405110905.669217-1-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
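The size in the error message can be checked by hand: 1075838976 is exactly 1024M plus 2M. The sketch below assumes, based only on that number, that the padding amounts to one extra 2M; padded_size() and hugepage_aligned() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/* Assumed padding: the requested RAM size plus 2M of alignment slack. */
static uint64_t padded_size(uint64_t ram_size)
{
	return ram_size + SZ_2M;
}

/* A hugetlbfs mapping must be a whole number of huge pages. */
static int hugepage_aligned(uint64_t size, uint64_t hugepage_size)
{
	assert(hugepage_size > 0);
	return size % hugepage_size == 0;
}
```

For -m 1024 with 1G pages the padded size is no longer a multiple of the huge page size, which is consistent with the ftruncate failure shown above.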
2023-03-24Add virtio-transport option and deprecate force-pci and virtio-legacy.Rajnesh Kanwal16-30/+61
This is a follow-up patch for [0] which proposed the --force-pci option for riscv. As per the discussion it was concluded to add a virtio-transport option accepting one of four values (pci, pci-legacy, mmio, mmio-legacy). With this change force-pci and virtio-legacy are both deprecated and arm's default transport changes from MMIO to PCI as agreed in [0]. This is also true for riscv. Nothing changes for other architectures. [0]: https://lore.kernel.org/all/20230118172007.408667-1-rkanwal@rivosinc.com/ Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com> Link: https://lore.kernel.org/r/20230320143344.404307-1-rkanwal@rivosinc.com Signed-off-by: Will Deacon <will@kernel.org>
2023-03-24riscv: Move serial and rtc from IO port space to MMIO area.Rajnesh Kanwal3-1/+14
The default serial and rtc IO region overlaps with the PCI IO BAR region, causing BAR 0 activation to fail. Move these devices to the MMIO region, as on ARM. Given serial has been moved from 0x3f8 to 0x10000000, this requires us to now pass earlycon=uart8250,mmio,0x10000000 from cmdline rather than earlycon=uart8250,mmio,0x3f8. To avoid the need to change the address every time the tool is updated, we can also just pass "earlycon" from cmdline and the guest then finds the type and base address by following the Device Tree's stdout-path property. Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20230203122934.18714-1-rkanwal@rivosinc.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08riscv: Add --disable-<xyz> options to allow user disable extensionsAnup Patel2-1/+25
By default, KVM RISC-V keeps all extensions available to the VCPU enabled and KVMTOOL does not disable any extension. We add --disable-<xyz> command-line options in KVMTOOL RISC-V to allow users to explicitly disable certain extensions if they don't want them. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-7-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08riscv: Add Zicbom extension supportAndrew Jones1-0/+11
When the Zicbom extension is available expose it to the guest. Also provide the guest the size of the cache block through DT. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08riscv: Move reg encoding helpers to kvm-cpu-arch.hAndrew Jones3-18/+19
We'll need one of these helpers in the next patch in another file. Let's proactively move them all now, since others may some day also be useful. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08riscv: Add zihintpause extension supportMayuresh Chitale1-0/+1
The zihintpause extension allows software to use the PAUSE instruction to reduce energy consumption while executing spin-wait code sequences. Add the zihintpause extension to the device tree if it is supported by the host. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08riscv: Add Svinval extension supportAnup Patel1-0/+1
The Svinval extension allows the guest OS to perform range based TLB maintenance efficiently. Add the Svinval extension to the device tree if it is supported by the host. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08Update UAPI headers based on Linux-6.1-rc1Anup Patel6-14/+46
We update all UAPI headers based on Linux-6.1-rc1 so that we can use latest features. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20221018140854.69846-2-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08hw/i8042: Fix value uninitialized in kbd_io()hbuxiaofei1-1/+1
GCC Version: gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1) hw/i8042.c: In function ‘kbd_io’: hw/i8042.c:153:19: error: ‘value’ may be used uninitialized in this function [-Werror=maybe-uninitialized] state.write_cmd = val; ~~~~~~~~~~~~~~~~^~~~~ hw/i8042.c:298:5: note: ‘value’ was declared here u8 value; ^~~~~ cc1: all warnings being treated as errors make: *** [Makefile:508: hw/i8042.o] Error 1 Signed-off-by: hbuxiaofei <hbuxiaofei@gmail.com> Link: https://lore.kernel.org/r/20221102080501.69274-1-hbuxiaofei@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08pci: Disable writes to Status registerJean-Philippe Brucker1-14/+40
Although the PCI Status register only contains read-only and write-1-to-clear bits, we currently keep anything written there, which can confuse a guest. The problem was highlighted by recent Linux commit 6cd514e58f12 ("PCI: Clear PCI_STATUS when setting up device"), which unconditionally writes 0xffff to the Status register in order to clear pending errors. Then the EDAC driver sees the parity status bits set and attempts to clear them by writing 0xc100, which in turn clears the Capabilities List bit. Later on, when the virtio-pci driver starts probing, it assumes due to missing capabilities that the device is using the legacy transport, and fails to setup the device because of mismatched protocol. Filter writes to the config space, keeping only those to writable fields. Tighten the access size check while we're at it, to prevent overflow. This is only a small step in the right direction, not a foolproof solution, because a guest could still write both Command and Status registers using a single 32-bit write. More work is needed for: * Supporting arbitrary sized writes. * Sanitizing accesses to capabilities, which are device-specific. Also remove the old hack that filtered accesses. It was most likely guarding against ROM BAR writes, which is now handled by the pci_config_writable bitmap. Reported-by: Pierre Gondois <pierre.gondois@arm.com> Tested-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20221020173452.203043-1-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
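The write-1-to-clear behaviour described above can be sketched as a small filter. This is an illustration only: pci_status_write() is a hypothetical helper rather than kvmtool's pci_config_writable bitmap, and the mask value reflects my reading of the standard PCI Status bit layout (bit 8 and bits 11 to 15 are write-1-to-clear, bit 4 is the read-only Capabilities List bit).

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS_CAP_LIST	0x0010	/* read-only: capabilities present */
#define PCI_STATUS_W1C_MASK	0xf900	/* assumed set of write-1-to-clear bits */

static_assert((PCI_STATUS_W1C_MASK & PCI_STATUS_CAP_LIST) == 0,
	      "the Capabilities List bit must not be clearable");

/*
 * Apply a guest write to the Status register: each 1 written to a
 * write-1-to-clear bit clears it; all other bits keep their value.
 */
static uint16_t pci_status_write(uint16_t status, uint16_t written)
{
	return (uint16_t)(status & ~(written & PCI_STATUS_W1C_MASK));
}
```

Under this model, writing 0xffff or 0xc100 leaves PCI_STATUS_CAP_LIST set, whereas simply storing the written value (the old behaviour) would replace the register contents with 0xc100 and drop that bit.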
2022-10-04virtio-net: Fix vq->use_event_idx flag checkTu Dinh Ngoc1-1/+1
VIRTIO_RING_F_EVENT_IDX is a bit position value, but virtio_init_device_vq populates vq->use_event_idx by ANDing this value directly to vdev->features. Fix the check for this flag in virtio_init_device_vq. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr> Link: https://lore.kernel.org/r/20220929121858.156-1-dinhngoc.tu@irit.fr Signed-off-by: Will Deacon <will@kernel.org>
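The difference between a bit position and a bit mask is easy to see in isolation. In this minimal sketch the constant value 29 comes from the virtio specification; the two helper functions are hypothetical and only contrast the buggy and fixed forms of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position (not a mask), as defined by the virtio specification. */
#define VIRTIO_RING_F_EVENT_IDX 29

static_assert(VIRTIO_RING_F_EVENT_IDX < 64, "feature bits fit in 64 bits");

/* Buggy form: ANDs the features with the number 29 (binary 11101). */
static bool use_event_idx_buggy(uint64_t features)
{
	return (features & VIRTIO_RING_F_EVENT_IDX) != 0;
}

/* Fixed form: turn the bit position into a mask before testing. */
static bool use_event_idx_fixed(uint64_t features)
{
	return (features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) != 0;
}
```

The buggy form reports the flag whenever any of feature bits 0, 2, 3 or 4 is set, and misses it when only bit 29 is set.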
2022-09-22riscv: Fix serial0 alias pathAnup Patel1-4/+8
We have all MMIO devices under "/smb" DT node so the serial0 alias path should have "/smb" prefix. Fixes: 7c9aac003925 ("riscv: Generate FDT at runtime for Guest/VM") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20220815101325.477694-6-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22riscv: Add Sstc extension supportAtish Patra1-0/+1
The Sstc extension allows the guest OS to program the timer directly without relying on the SBI call. The kernel detects the presence of the Sstc extension from the riscv,isa DT property. Add the Sstc extension to the device tree if it is supported by the host. Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20220815101325.477694-5-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22riscv: Add Svpbmt extension supportAnup Patel1-0/+1
The Svpbmt extension allows PTE based memory attributes in page tables. This extension also allows Guest/VM to use PTE based memory attributes in VS-stage page tables so let us add it to the Guest/VM ISA string when KVM RISC-V supports it. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20220815101325.477694-4-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22riscv: Append ISA extensions to the device treeAtish Patra3-11/+41
The riscv,isa DT property only contains single letter base extensions until now. However, there are also multi-letter extensions which were ratified recently. Add a mechanism to append those extension details to the device tree so that guest can leverage those. Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20220815101325.477694-3-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22Update UAPI headers based on Linux-6.0-rc1Anup Patel9-30/+301
We update all UAPI headers based on Linux-6.0-rc1 so that we can use latest features. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20220815101325.477694-2-apatel@ventanamicro.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-22net: Use vfork() instead of fork() for script executionSuzuki K Poulose1-1/+1
When a script is specified for a guest nic setup, we fork() and execl() the script when it is time to execute it. However this is not optimal, given we are running a VM. The fork() will trigger marking the entire page-table of the current process as CoW, which will trigger unmapping the entire stage2 page tables from the guest. Anyway, the child process will exec the script as soon as we fork(), making all these mm operations moot. Also, this operation could be problematic for confidential compute VMs, where it may be expensive (and sometimes destructive) to make changes to the stage2 page tables. So, instead we could use vfork() and avoid the CoW and unmap of the stage2. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220809124816.2880990-1-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
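The vfork()-then-exec pattern described above can be sketched as follows. run_script() is a hypothetical helper, not the function in kvmtool's net code; the key constraint it respects is that the vfork() child does nothing except execl() or _exit().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "path arg" in a child created with vfork(), so the parent's page
 * tables are not marked copy-on-write. Returns the child's exit status,
 * or -1 if it could not be started or did not exit normally.
 */
static int run_script(const char *path, const char *arg)
{
	pid_t pid;
	int status;

	assert(path != NULL && arg != NULL);
	pid = vfork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: only exec or _exit are safe after vfork(). */
		execl(path, path, arg, (char *)NULL);
		_exit(127);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent is suspended until the child execs or exits, which is acceptable here because the child execs immediately.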
2022-08-04Makefile: Introduce LIBFDT_DIR to specify libfdt locationAlexandru Elisei2-8/+33
The arm, arm64, powerpc and riscv architectures require that libfdt is installed on the system, however the library might not be available for every architecture on the user's distro of choice. Or the static version of the library, needed for the lkvm-static target, might be missing. Fortunately, kvmtool has anticipated this situation and it includes instructions to compile and install libfdt in the INSTALL file. Unfortunately, those instructions do not always work (for example, because the user is missing the needed permissions), leaving the user unable to compile kvmtool. As an alternative to installing libfdt system-wide, provide the LIBFDT_DIR variable when compiling kvmtool. For example, when compiling with the command: $ make ARCH=<arch> CROSS_COMPILE=<cross_compile> LIBFDT_DIR=<dir> kvmtool will link the executable against the static version of the library located in LIBFDT_DIR/libfdt.a. LIBFDT_DIR takes precedence over the system library, as there are valid reasons to prefer a self-compiled library over the one that the distro provides (like the system library being older). Note that this will slightly increase the size of the executable. For the arm64 architecture, the increase has been measured to be about 100KB, or about 5% of the total executable size. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220722141448.168252-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-08-04virtio/rng: Zero-initialize the deviceJean-Philippe Brucker1-1/+1
Use calloc() to avoid uninitialized fields in the rng device. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20220722141731.64039-5-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2022-08-04virtio/pci: Deassert IRQ line on ISR readJean-Philippe Brucker1-4/+1
Since commit 2108c86d0623 ("virtio/pci: Signal INTx interrupts as level instead of edge"), virtio uses level-triggered IRQs. Bring the modern device up to date, by deasserting the IRQ line when the guest reads the interrupt status register. Fixes: 3bf79498e6d5 ("virtio: Add support for modern virtio-pci") Reported-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20220722141731.64039-4-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2022-08-04Makefile: Fix ARCH overrideJean-Philippe Brucker1-2/+2
Variables set on the command-line are not overridden by normal assignments. So when passing ARCH=x86_64 on the command-line, build fails: Makefile:227: *** This architecture (x86_64) is not supported in kvmtool. Use the 'override' directive to force the ARCH reassignment. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220722141731.64039-3-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2022-08-04Makefile: Add missing build dependenciesJean-Philippe Brucker1-1/+2
When running kvmtool after updating without doing a make clean, one might run into strange issues such as: Warning: Failed init: symbol_init Fatal: Initialisation failed or worse. This happens because symbol.o is not automatically rebuilt after a change of headers, because .symbol.o.d is not in the $(DEPS) variable. So if the layout of struct kvm_config changes, for example, a symbol.o that was built for an older version will try to read kvm->vmlinux from the wrong location in struct kvm, and lkvm will die. Add all .d files to $(DEPS). Also include $(STATIC_DEPS) which was previously set but not used. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220722141731.64039-2-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm64: pvtime: Use correct region sizeAlexandru Elisei1-5/+5
pvtime uses ARM_PVTIME_BASE instead of ARM_PVTIME_SIZE for the size of the memory region given to the guest, which causes the following error when creating a flash device (via the -F/--flash command line argument): Error: RAM (read-only) region [2000000-27fffff] would overlap RAM region [1020000-203ffff] The read-only region represents the guest memory where the flash image is copied by kvmtool. The region starting at 0x102_0000 (ARM_PVTIME_BASE) is the pvtime region, which should be 64K in size. kvmtool erroneously creates the region to be ARM_PVTIME_BASE in size instead, and the last address becomes: ARM_PVTIME_BASE + ARM_PVTIME_BASE - 1 = 0x102_0000 + 0x102_0000 - 1 = 0x203_ffff which corresponds to the end of the region from the error message. Do the right thing and make the pvtime memory region ARM_PVTIME_SIZE = 64K bytes, as it was intended. Fixes: 7d4671e5d372 ("aarch64: Add stolen time support") Reported-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20220629103905.24480-1-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
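The arithmetic in the message above can be reproduced with a short sketch. The base, size and flash range are taken from the commit message; region_last() and regions_overlap() are hypothetical helpers, not kvmtool functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARM_PVTIME_BASE 0x1020000ULL	/* from the commit message */
#define ARM_PVTIME_SIZE 0x10000ULL	/* 64K, from the commit message */

/* Last byte of the inclusive region [base, base + size - 1]. */
static uint64_t region_last(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return base + size - 1;
}

/* True if the inclusive ranges [a0, a1] and [b0, b1] share any byte. */
static bool regions_overlap(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
	return a0 <= b1 && b0 <= a1;
}
```

Passing the base as the size yields an end address of 0x203ffff, which collides with the flash region at [0x2000000, 0x27fffff]; with the 64K size the region ends at 0x102ffff and no longer overlaps.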
2022-07-01virtio/pci: Remove VIRTIO_PCI_F_SIGNAL_MSIJean-Philippe Brucker2-7/+5
VIRTIO_PCI_F_SIGNAL_MSI is not a virtio feature but an internal flag. Change it to bool to avoid confusion. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-13-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio/pci: Initialize all vectors to VIRTIO_MSI_NO_VECTORJean-Philippe Brucker2-2/+4
According to the virtio spec, all vectors must be initialized to VIRTIO_MSI_NO_VECTOR (0xffff). In 4.1.5.1.2.1 "Device Requirements: MSI-X Vector Configuration": The device MUST return vector mapped to a given event, (NO_VECTOR if unmapped) on read of config_msix_vector/queue_msix_vector. Currently we return 0, which is a valid MSI vector. Return NO_VECTOR instead. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-12-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
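A minimal sketch of the initialization described above, assuming a hypothetical per-device structure (struct msix_state and MAX_VQS are illustrative names, not kvmtool's); only the VIRTIO_MSI_NO_VECTOR value comes from the message.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR	0xffff
#define MAX_VQS			8	/* hypothetical queue count */

/* Hypothetical holder for the vectors a guest can read back. */
struct msix_state {
	uint16_t config_vector;
	uint16_t vq_vector[MAX_VQS];
};

/* Mark every vector as unmapped; 0 would be a valid MSI vector. */
static void msix_state_reset(struct msix_state *s)
{
	unsigned int i;

	assert(s != 0);
	s->config_vector = VIRTIO_MSI_NO_VECTOR;
	for (i = 0; i < MAX_VQS; i++)
		s->vq_vector[i] = VIRTIO_MSI_NO_VECTOR;
}
```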
2022-07-01virtio: Add support for modern virtio-mmioJean-Philippe Brucker8-11/+195
Add modern MMIO transport to virtio, make it the default. Legacy transport can be enabled with --virtio-legacy. The main change for MMIO is the queue addresses. They are now 64-bit addresses instead of 32-bit PFNs. Apart from that all changes for supporting modern devices are already implemented. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-11-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio: Move MMIO transport to mmio-legacyJean-Philippe Brucker4-155/+165
To make space for the modern register layout, move the current code to mmio-legacy. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-10-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio: Add support for modern virtio-pciJean-Philippe Brucker15-19/+445
Add support for modern virtio-pci implementation (based on the 1.0 virtio spec). We add a new transport, alongside MMIO and PCI-legacy. This is now the default when selecting PCI, but users can still select the legacy transport for all virtio devices by passing "--virtio-legacy" on the command-line. The main change in modern PCI is the way we address virtqueues, using 64-bit values instead of PFNs. To keep the queue configuration atomic the device also gets a "queue enable" register. Configuration is also made extensible by more feature bits and PCI capabilities. Scalability is improved as well, as devices can have notification registers for each virtqueue on separate pages. However this implementation keeps a single notification register. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-9-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio: Move PCI transport to pci-legacyJean-Philippe Brucker4-236/+254
To make space for the more recent virtio version, move the legacy bits of virtio-pci to a different file. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-8-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio: Prepare for more feature bitsJean-Philippe Brucker10-14/+14
Modern virtio uses more than 32 bits of features. Bump the feature bitfield size to 64 bits. virtio_set_guest_features() changes in behavior because it will now be called multiple times, each time the guest writes to a 32-bit slice of the features. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-7-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
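The behavior change above, where the guest writes the 64-bit feature set one 32-bit slice at a time, can be sketched as follows. The function name and the selector convention (0 for the low half, 1 for the high half) are assumptions for illustration, not the signature of virtio_set_guest_features().

```c
#include <stdint.h>

/* Hypothetical helper: merge one 32-bit slice written by the guest into
 * the 64-bit feature word. sel 0 selects bits [31:0], sel 1 bits [63:32];
 * any other selector leaves the features untouched. */
static uint64_t set_feature_slice(uint64_t features, uint32_t sel, uint32_t val)
{
    unsigned int shift;

    if (sel > 1)
        return features;

    shift = sel * 32;
    features &= ~(0xffffffffULL << shift);
    features |= (uint64_t)val << shift;
    return features;
}
```

Because each write only replaces its own half, the complete set is known only after the guest has written both slices.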
2022-07-01virtio/net: Set vhost backend after queue addressJean-Philippe Brucker1-5/+6
We currently call VHOST_SET_BACKEND from notify_vq_gsi(), which can't work with modern virtio because vhost checks that the virtqueue is accessible when handling VHOST_SET_BACKEND, and the modern driver initializes the MSIs before setting up the virtqueue. Move VHOST_SET_BACKEND to init_vq(). Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-6-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio/pci: Use the correct eventfd for vhost notificationJean-Philippe Brucker1-4/+5
Legacy virtio drivers write to the I/O port BAR, and the modern virtio device uses the MMIO BAR. Since vhost can only listen on one ioeventfd, select the one that the guest will use. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-5-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio/pci: Make doorbell offset dynamicJean-Philippe Brucker2-5/+10
The doorbell offset depends on the transport - virtio-legacy uses a fixed offset, but modern virtio can have per-vq offsets. Add an offset field to the virtio_pci structure. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-4-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio: Extract init_vq() for PCI and MMIOJean-Philippe Brucker2-8/+30
Modern virtio will need to reuse this code when initializing a virtqueue. It's not much, but still nicer to have next to exit_vq(). Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-3-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01virtio/pci: Delete MSI routesJean-Philippe Brucker1-0/+14
On exit_vq() and device reset, remove the MSI routes that were set up at runtime. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220701142434.75170-2-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm64: Allow the user to specify the RAM base addressAlexandru Elisei7-10/+64
Allow the user to specify the RAM base address by using -m/--mem size@addr command line argument. The base address must be above 2GB, so as not to overlap with the MMIO I/O region. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-13-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01Introduce kvm__arch_default_ram_address()Alexandru Elisei7-0/+31
Add a new function, kvm__arch_default_ram_address(), which returns the default address for guest RAM for each architecture. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-12-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm/arm64: Consolidate RAM initialization in kvm__init_ram()Julien Grall1-26/+26
RAM initialization is unnecessarily split between kvm__init_ram() and kvm__arch_init(). Move all code related to RAM initialization to kvm__init_ram(), making the code easier to follow and to modify. One thing to note is that the initialization order is slightly altered: kvm__arch_enable_mte() and gic__create() are now called before mmap'ing the guest RAM. That is perfectly fine, as they don't use the host's mapping of the guest memory. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-11-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01kvm__arch_init: Remove hugetlbfs_path and ram_size as parametersJulien Grall7-14/+20
The kvm struct already contains a pointer to the configuration, which contains both hugetlbfs_path and ram_size, so it is not necessary to pass them as arguments to kvm__arch_init(). Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-10-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01builtin_run: Allow standard size specifiers for memorySuzuki K Poulose1-5/+54
Allow the user to use the standard B (bytes), K (kilobytes), M (megabytes), G (gigabytes), T (terabytes) and P (petabytes) suffixes for memory size. When none are specified, the default is megabytes. Also raise an error if the guest specifies 0 as the memory size, instead of treating it as uninitialized, as kvmtool has done so far. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-9-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
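One way to parse such a size string is sketched below. It follows the rules stated above (B/K/M/G/T/P suffixes, megabytes by default, zero rejected) but is not the kvmtool parser: parse_mem_size() is a hypothetical name, it accepts only upper-case suffixes, and it reports every error by returning 0.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert "<number>[B|K|M|G|T|P]" to bytes.
 * Returns 0 for a zero size, an unknown suffix, trailing characters
 * or a value that does not fit in 64 bits. */
static uint64_t parse_mem_size(const char *s)
{
    unsigned int shift = 20; /* no suffix means megabytes */
    unsigned long long val;
    char *end;

    errno = 0;
    val = strtoull(s, &end, 10);
    if (errno || end == s || val == 0)
        return 0;

    switch (*end) {
    case 'B': shift = 0;  end++; break;
    case 'K': shift = 10; end++; break;
    case 'M': shift = 20; end++; break;
    case 'G': shift = 30; end++; break;
    case 'T': shift = 40; end++; break;
    case 'P': shift = 50; end++; break;
    case '\0': break;
    default: return 0;
    }
    if (*end != '\0')
        return 0;

    /* Reject values that would overflow once shifted. */
    if (val > (UINT64_MAX >> shift))
        return 0;

    return (uint64_t)val << shift;
}
```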
2022-07-01arm/arm64: Kill the ARM_HIMAP_MAX_MEMORY() macroAlexandru Elisei1-1/+0
The ARM_HIMAP_MAX_MEMORY() macro is a remnant of a time when KVM only supported 40 bits of IPA. There are no users left for this macro, remove it. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm/arm64: Kill the ARM_MAX_MEMORY() macroAlexandru Elisei2-18/+0
For 32-bit guests, the maximum memory size is represented by the define ARM_LOMAP_MAX_MEMORY, which ARM_MAX_MEMORY() returns. For 64-bit guests, the RAM size is checked against the maximum allowed by KVM in kvm__get_vm_type(). There are no users left for the ARM_MAX_MEMORY() macro, remove it. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm/arm64: Fail if RAM size is too large for 32-bit guestsAlexandru Elisei3-1/+10
For 64-bit guests, kvmtool exits with an error in kvm__get_vm_type() if the memory size is larger than what KVM supports. For 32-bit guests, the RAM size is silently rounded down to ARM_LOMAP_MAX_MEMORY in kvm__arch_init(). Be consistent and exit with an error when the user has configured the wrong RAM size for 32-bit guests. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01builtin-run: Add arch hook to validate VM configurationAlexandru Elisei9-0/+29
Architectures are free to set their own command line options. Add an architecture specific hook to validate these options. For now, the hook does nothing, but it will be used in later patches. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01builtin-run: Rework RAM size validationAlexandru Elisei1-7/+13
host_ram_size() uses sysconf() to calculate the available ram, and sysconf() can fail. When that happens, host_ram_size() returns 0. kvmtool warns the user when the configured VM ram size exceeds the size of the host's memory, but doesn't take into account that host_ram_size() can return 0. If the function returns zero, skip the warning. Since this can only happen when the user sets the memory size (via the -m/--mem command line argument), skip the check entirely if the user hasn't set it. Move the check to kvm_run_validate_cfg(), as it checks for valid user configuration. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01builtin-run: Always use RAM size in bytesAlexandru Elisei3-13/+15
The user can specify the virtual machine memory size in MB, which is saved in cfg->ram_size. kvmtool validates it against the host memory size, converted from bytes to MB. ram_size is then converted to bytes, and this is how it is used throughout the rest of kvmtool. To avoid any confusion about the unit of measurement, especially once the user is allowed to specify the unit of measurement, always use ram_size in bytes. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01Use MB for megabytes consistentlyAlexandru Elisei2-3/+3
The help text for the -m/--mem argument states that the guest memory size is in MiB (mebibyte). MiB is the same thing as MB (megabyte), and indeed this is how MB is used throughout kvmtool. Replace MiB with MB, so people don't get the wrong idea and start believing that for kvmtool a MB is 10^6 bytes instead of 2^20. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-and-Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20220616134828.129006-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01arm: gic: fdt: fix PPI CPU mask calculationAndre Przywara4-5/+16
The GICv2 DT binding describes the third cell in each interrupt descriptor as holding the trigger type, but also the CPU mask that this IRQ applies to, in bits [15:8]. However this is not the case for GICv3, where we don't use a CPU mask in the third cell: a simple mask wouldn't fit for the many more supported cores anyway. At the moment we fill this CPU mask field regardless of the GIC type, for the PMU and arch timer DT nodes. This is not only the wrong thing to do in case of a GICv3, but also triggers UBSAN splats when using more than 30 cores, as we do shifting beyond what a u32 can hold: $ lkvm run -k Image -c 31 --pmu arm/timer.c:13:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' arm/timer.c:13:38: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' arm/timer.c:13:43: runtime error: left shift of 2147483647 by 8 places cannot be represented in type 'int' arm/aarch64/pmu.c:202:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' arm/aarch64/pmu.c:202:38: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' arm/aarch64/pmu.c:202:43: runtime error: left shift of 2147483647 by 8 places cannot be represented in type 'int' Fix that by adding a function that creates the mask by looking at the GIC type first, and returning zero when a GICv3 is used. Also we explicitly check for the CPU limit again, even though this would be done before already, when we try to create a GICv2 VM with more than 8 cores. Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220616145526.3337196-1-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
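The mask helper described above might look roughly like the sketch below. The enum, the macro names and the choice to clamp at 8 CPUs are assumptions for illustration; the message only says that the CPU limit is checked again, not how.

```c
#include <stdint.h>

enum gic_type { GIC_V2, GIC_V3 }; /* hypothetical enum for this sketch */

#define PPI_CPU_MASK_SHIFT 8 /* the mask lives in bits [15:8] of the third cell */
#define GICV2_MAX_CPUS     8 /* bits [15:8] can describe at most 8 CPUs */

/* Hypothetical helper: CPU mask for the third interrupt cell. GICv3 does
 * not use a mask there, so it gets zero. For GICv2 the CPU count is
 * clamped (an assumption) so the shift stays inside the 8-bit field, and
 * the arithmetic is unsigned to avoid the overflow UBSAN reports. */
static uint32_t ppi_cpu_mask(enum gic_type type, unsigned int nr_cpus)
{
    if (type != GIC_V2)
        return 0;

    if (nr_cpus > GICV2_MAX_CPUS)
        nr_cpus = GICV2_MAX_CPUS;

    return ((1U << nr_cpus) - 1) << PPI_CPU_MASK_SHIFT;
}
```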
2022-06-09virtio/pci: Factor MSI route creationJean-Philippe Brucker1-33/+27
The code for creating an MSI route is already duplicated between config and virtqueue MSI. Modern virtio will need it as well, so move it to a separate function. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-17-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/blk: Implement VIRTIO_F_ANY_LAYOUT featureJean-Philippe Brucker3-28/+60
The current virtio-block implementation assumes that buffers have a specific layout (5.2.6.4 "Legacy Interface: Framing Requirements"). Modern virtio removes this layout constraint, so we have to be careful when reading buffers. Note that since the Linux driver uses the same layout as the legacy transport, arbitrary layouts were not actually tested. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-16-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/console: Add VIRTIO_F_ANY_LAYOUT featureJean-Philippe Brucker1-1/+1
Our virtio-console implementation already supports ANY_LAYOUT, because buffers are accessed with scatter-gather operations. Advertise the VIRTIO_F_ANY_LAYOUT feature. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-15-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/net: Implement VIRTIO_F_ANY_LAYOUT featureJean-Philippe Brucker2-35/+57
Modern virtio demands that devices do not make assumptions about the buffer layouts. Currently the user network backend assumes that TX packets are neatly split between virtio-net header and ethernet frame. Modern virtio-net usually puts everything into one descriptor, but could also split the buffer arbitrarily. Handle arbitrary buffer layouts and advertise the VIRTIO_F_ANY_LAYOUT feature. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-14-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/net: Prepare for modern virtioJean-Philippe Brucker3-7/+21
The virtio_net header contains a 'num_buffers' field, used when the VIRTIO_NET_F_MRG_RXBUF feature is negotiated. The legacy driver does not present this field when the feature is not negotiated. In that case the header is 2 bytes smaller. When using the modern virtio transport, the header always contains the field and in addition the device MUST set it to 1 when the VIRTIO_NET_F_MRG_RXBUF is not negotiated. Prepare for modern virtio support by enabling this case once the 'legacy' flag is switched off. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-13-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
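The header-size rule above can be written as a small sketch. The 12-byte figure is the size of the virtio-net header including the 16-bit num_buffers field, and the legacy case without VIRTIO_NET_F_MRG_RXBUF is 2 bytes smaller, as the message states. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define VNET_HDR_LEN_WITH_NUM_BUFFERS 12 /* header including num_buffers */
#define VNET_HDR_LEN_LEGACY_NO_MRG    10 /* same header without num_buffers */

/* Hypothetical helper: only a legacy device that did not negotiate
 * VIRTIO_NET_F_MRG_RXBUF uses the short header. A modern device always
 * carries num_buffers (and must set it to 1 when the feature is off). */
static size_t vnet_hdr_len(bool legacy, bool mrg_rxbuf)
{
    if (legacy && !mrg_rxbuf)
        return VNET_HDR_LEN_LEGACY_NO_MRG;
    return VNET_HDR_LEN_WITH_NUM_BUFFERS;
}
```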
2022-06-09virtio/net: Offload vnet header endianness conversion to tapJean-Philippe Brucker1-20/+19
The conversion of vnet header fields will be more difficult when supporting the virtio ANY_LAYOUT feature. Since the uip backend doesn't use the vnet header, and since tap can handle that conversion itself, offload it to tap. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-12-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09Add memcpy_fromiovec_safeJean-Philippe Brucker2-0/+33
Existing IOV functions don't take the iovec size as parameter. This is unfortunate because when parsing buffers split into header and body, callers may want to know where the body starts in the iovec, after copying the header. Add a function that does the same as memcpy_fromiovec, but also allows iterating over the iovec. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-11-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
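The idea of copying a header out of an iovec while keeping track of where the body starts can be sketched as below. copy_from_iov() is a hypothetical stand-in with its own signature and return convention; it is not the memcpy_fromiovec_safe() added by this commit.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy len bytes from the iovec array into buf and
 * advance *iov / *iovcount past everything that was consumed, so the
 * caller can continue with the remaining data. Returns the number of
 * bytes that could not be copied (0 on success). */
static size_t copy_from_iov(void *buf, struct iovec **iov,
                            size_t *iovcount, size_t len)
{
    char *dst = buf;

    while (len > 0 && *iovcount > 0) {
        struct iovec *cur = *iov;
        size_t n = cur->iov_len < len ? cur->iov_len : len;

        if (n)
            memcpy(dst, cur->iov_base, n);
        dst += n;
        len -= n;

        /* Consume the copied bytes from the current element. */
        cur->iov_base = (char *)cur->iov_base + n;
        cur->iov_len -= n;

        /* Move to the next element once this one is empty. */
        if (cur->iov_len == 0) {
            (*iov)++;
            (*iovcount)--;
        }
    }

    return len;
}
```

After the call, *iov points at the first byte that was not copied, which is where the body starts when the header has just been read.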
2022-06-09virtio: Remove set_guest_features() device opJean-Philippe Brucker11-69/+2
Now that devices have a status callback, they don't use set_guest_features() anymore. The negotiated feature set is available in struct virtio_device. Remove the callback from all devices. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-10-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/console: Remove unused callbackJean-Philippe Brucker1-5/+0
Remove unused set_status() callback Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-9-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio: Fix device-specific config endiannessJean-Philippe Brucker10-63/+84
Some legacy virtio drivers expect to read the device-specific config in guest endianness (2.5.3 "Legacy Interface: A Note on Device Configuration Space endian-ness"). Kvmtool doesn't know the guest endianness until it can probe a VCPU. So the config fields start in host endianness, and are swapped once the guest is running. Currently this is done in set_guest_features(), but that is too late because the driver is allowed to read config fields before setting feature bits (2.5.2 "Device Requirements: Device Configuration Space"). In addition some devices don't swap the fields, and those that do swap the fields do it every time the guest writes the feature register, which can't work if a device gets reset more than once. Initialize the config on device reset. Do it on every reset because in theory multiple guests could run with different endianness during the lifetime of the device. Notes: * the balloon device uses little-endian (5.5.4.0.0.1 "Legacy Interface: Device configuration layout"). * the vsock device was introduced after virtio 0.9.5, hence doesn't describe a legacy interface, but the Linux driver allows using the legacy transport, and always reads the 64-bit guest_cid field as little-endian. * the specification does not describe the 9p device, but the Linux driver uses guest-endian helpers. * the specification does not explicitly forbid a driver from reading the configuration at any time, but a driver must follow the sequence from 3.1.1 "Driver Requirements: Device Initialization", where the driver is allowed to read the config after setting the DRIVER status bit. It should therefore be safe to keep dealing with guest endianness only on device reset, and not on the first config access. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-8-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio: Add config access helpersJean-Philippe Brucker4-65/+47
At the moment device-specific config access is tailored for a Linux guest, that performs any access in 8 bits. But config access can have any size, and modern virtio drivers must use the size of the accessed field. Add helpers that generalize config accesses. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-7-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio: Support modern virtqueue addressesJean-Philippe Brucker12-53/+86
Modern virtio devices can use separate buffers for descriptors, available and used rings. They can also use 64-bit addresses instead of 44-bit. Rework the virtqueue initialization function to support modern virtio. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-6-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
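The difference between the two address formats can be illustrated with a short sketch. Both helper names are hypothetical. The 44-bit limit falls out of a 32-bit page frame number multiplied by a 4096-byte page, while a modern device receives the address as two 32-bit halves.

```c
#include <stdint.h>

/* Hypothetical helper: legacy transports pass a 32-bit page frame
 * number. With 4096-byte pages the result has at most 32 + 12 = 44 bits. */
static uint64_t legacy_queue_addr(uint32_t pfn, uint32_t page_size)
{
    return (uint64_t)pfn * page_size;
}

/* Hypothetical helper: modern transports write a full 64-bit address
 * as a low and a high 32-bit register. */
static uint64_t modern_queue_addr(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```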
2022-06-09virtio: Factor virtqueue initializationJean-Philippe Brucker10-59/+34
All virtio devices perform the same set of operations when initializing their virtqueues. Move it to virtio core. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-5-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio/vsock: Remove redundant state trackingJean-Philippe Brucker1-5/+5
The core already tells us whether a device is being started or stopped. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-4-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio: Remove redundant testJean-Philippe Brucker1-2/+1
Don't test for VIRTIO__STATUS_STOP right after setting it. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-3-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-09virtio: Add NEEDS_RESET to the status maskJean-Philippe Brucker1-0/+1
Not all toolchains used to know about VIRTIO_CONFIG_S_NEEDS_RESET, so we left it out of the status mask. Now that we include our own version of virtio_config.h and we'll need it for virtio 1.0, add it back. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Link: https://lore.kernel.org/r/20220607170239.120084-2-jean-philippe.brucker@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26riscv: Add missing asm/kernel.h headerDao Lu1-0/+8
Fixes the following compilation issue: include/linux/kernel.h:5:10: fatal error: asm/kernel.h: No such file or directory 5 | #include "asm/kernel.h" Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Dao Lu <daolu@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Fixes: 0febaae00bb6 ("Add cpumask functions") Link: https://lore.kernel.org/r/20220524180030.1848992-1-daolu@rivosinc.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26mips: Do not emulate a serial deviceAlexandru Elisei2-2/+10
Commit 45b4968e0de1 ("hw/serial: ARM/arm64: Use MMIO at higher addresses") changed how the address for the UART is computed by using KVM_IOPORT_AREA. The symbol is not defined for MIPS, which results in the following compilation error: hw/serial.c:21:27: error: ‘KVM_IOPORT_AREA’ undeclared here (not in a function); did you mean ‘KVM_MIPS_IOPORT_AREA’? 21 | #define serial_iobase_0 (KVM_IOPORT_AREA + 0x3f8) | ^~~~~~~~~~~~~~~ hw/serial.c:29:27: note: in expansion of macro ‘serial_iobase_0’ 29 | #define serial_iobase(nr) serial_iobase_##nr | ^~~~~~~~~~~~~~ hw/serial.c:92:15: note: in expansion of macro ‘serial_iobase’ 92 | .iobase = serial_iobase(0), | ^~~~~~~~~~~~~ Before the commit, the serial was placed at addresses 0x3f8, 0x2f8, 0x3e8 and 0x2e8. However, MIPS puts the RAM at those addresses, up to KVM_MMIO_START, which is 0x10000000. Meaning that serial device emulation never worked, as those addresses were part of a valid memslot representing memory. This has been the case since commit 7281a8db199b ("kvm tools, mips: Add MIPS support") from 2014. A quick examination of the MIPS code reveals that the architecture relies on hypercalls from the guest and the virtio console for input and output. Since nobody complained about the missing serial device, assume that it is indeed not needed and do not compile it for MIPS. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220525165704.186754-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26arm64: Honor --vcpu-affinity for aarch32 guestsAlexandru Elisei1-10/+12
Commit 4639b72f61a3 ("arm64: Add --vcpu-affinity command line argument") introduced the --vcpu-affinity command line argument to pin the VCPUs to a given list of physical CPUs. Unfortunately, the affinity is set only for an arm64 guest, leading to the following error when running a 32-bit guest on a system with two or more PMUs: KVM exit reason: 9 ("KVM_EXIT_FAIL_ENTRY") Registers: PC: 0x8000c608 PSTATE: 0x200000d3 SP_EL1: 0x0 LR: 0x0 *pc: 0x8000c608: 25 3f a0 e1 83 61 a0 e1 0x8000c610: 83 31 98 e7 04 10 82 e1 0x8000c618: 07 2c 81 e3 28 10 1b e5 0x8000c620: 03 20 82 e3 03 00 a0 e1 *lr: Warning: unable to translate guest address 0x0 to host 0x00000000: <unknown> 0x00000008: <unknown> 0x00000010: <unknown> 0x00000018: <unknown> # KVM compatibility warning. virtio-net device was not detected. While you have requested a virtio-net device, the guest kernel did not initialize it. Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y enabled in .config. # KVM session ended normally. Make the error go away by setting the affinity of the VCPUs for both 32-bit and 64-bit guests. Fixes: 4639b72f61a3 ("arm64: Add --vcpu-affinity command line argument") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220525165704.186754-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26include: add new virtio uapi header filesAndre Przywara8-0/+1005
Commit a08bb43a0c37 ("kvmtool: Copy Linux' up-to-date virtio headers") copied in some of the virtio UAPI headers from the kernel tree, but didn't include all of them, as we were relying on some of them being provided by the distribution. Now commit bc77bf49df6e ("stat: Add descriptions for new virtio_balloon stat types") used some newer virtio balloon symbols, that some older distros (e.g. Ubuntu 18.04) do not carry, which breaks compilation there: ======================= CC builtin-stat.o builtin-stat.c: In function 'do_memstat': builtin-stat.c:86:8: error: 'VIRTIO_BALLOON_S_HTLB_PGALLOC' undeclared (first use in this function); did you mean 'VIRTIO_BALLOON_S_AVAIL'? case VIRTIO_BALLOON_S_HTLB_PGALLOC: ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ VIRTIO_BALLOON_S_AVAIL builtin-stat.c:86:8: note: each undeclared identifier is reported only once for each function it appears in ======================= To fix this include the remaining virtio headers (those that we actually need for kvmtool at the moment), from Linux v5.18.0. Fixes: bc77bf49df6e ("stat: Add descriptions for new virtio_balloon stat types") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220524150611.523910-5-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26include: update virtio UAPI headersAndre Przywara4-92/+305
Commit a08bb43a0c37 ("kvmtool: Copy Linux' up-to-date virtio headers") copied the kernel's virtio UAPI headers into the kvmtool tree, because at the time some distros didn't include (all of) them in their kernel headers package. Let's update those copies, so that we can use newer features, if needed. This syncs in the already existing copies of the headers from Linux v5.18.0. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220524150611.523910-4-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26util: include virtio UAPI headers in syncAndre Przywara1-0/+10
We already have an update_headers.sh sync script, where we occasionally update the KVM interface UAPI kernel headers into our tree. So far this covered only the generic kvm.h, plus each architecture's version of that file. Commit bc77bf49df6e ("stat: Add descriptions for new virtio_balloon stat types") used newer virtio symbols, which some older distros do not include in their kernel headers package. To help fix this and to avoid similar problems in the future, add the virtio headers to our sync script, so that we can get the same, up-to-date versions of the headers easily. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220524150611.523910-3-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-26update virtio_mmio.hAndre Przywara2-11/+52
At the time we pulled in virtio_mmio.h from the kernel tree (commit a08bb43a0c37c "kvmtool: Copy Linux' up-to-date virtio headers"), this was not an official UAPI header file, so wasn't stable and was not shipped with distributions. This has changed with Linux commit 51be7a9a261c ("virtio_mmio: expose header to userspace"), so we can now use that file officially. However before that the names of some symbols have changed, so we have to adjust their usage in our source. This pulls in virtio_mmio.h from Linux v5.18.0. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20220524150611.523910-2-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20kvmtool: Have stack be not executable on x86Martin Radev2-0/+10
This patch fixes an issue of having the stack be executable for x86 builds by ensuring that the two objects bios-rom.o and entry.o have the section .note.GNU-stack. Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-7-martin.b.radev@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20virtio: Check for overflows in QUEUE_NOTIFY and QUEUE_SELMartin Radev11-12/+39
This patch checks for overflows in QUEUE_NOTIFY and QUEUE_SEL in the PCI and MMIO operation handling paths. Further, the return value type of get_vq_count is changed from int to uint since negative doesn't carry any semantic meaning. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-6-martin.b.radev@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20virtio: Sanitize config accessesMartin Radev12-9/+119
The handling of VIRTIO_PCI_O_CONFIG is prone to buffer access overflows. This patch sanitizes this operation by using the newly added virtio op get_config_size. Any access which goes beyond the config structure's size is prevented and a failure is returned. Additionally, PCI accesses which span more than a single byte are prevented and a warning is printed because the implementation does not currently support the behavior correctly. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-5-martin.b.radev@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
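The bounds check described above can be sketched as follows. This is a minimal illustration, not the code from the patch: config_access_ok() and its parameters are hypothetical names, and config_size stands for whatever the get_config_size op returns for the device.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kvmtool's actual code): return true only if
 * the access [offset, offset + size) lies entirely inside a config
 * structure of config_size bytes. The comparison is arranged so that
 * offset + size is never computed and therefore cannot wrap around.
 */
static inline bool config_access_ok(uint32_t offset, uint32_t size,
				    uint32_t config_size)
{
	return size <= config_size && offset <= config_size - size;
}
```

A caller would reject the access and return a failure whenever config_access_ok() is false, instead of indexing into the config structure.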
2022-05-20virtio/9p: Fix virtio_9p_config allocation sizeMartin Radev1-1/+1
Per the Linux user API, the struct virtio_9p_config "tag" field contains the non-NULL terminated tag name and this is how the tag name is copied by kvmtool in virtio_9p__register(). However, the memory allocation for the struct is off by one, as it allocates memory for the tag name and the NULL byte. Fix it by reducing the allocation by exactly one byte. This also matches how the struct is allocated by QEMU tagged v7.0.0 in virtio_9p_get_config(). Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/YnzhdgUwrLlqmzch@monolith.localdoman Signed-off-by: Will Deacon <will@kernel.org>
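To illustrate the allocation size, here is a sketch using a stand-in structure. struct p9_config_sketch and both helpers are hypothetical and only mirror the shape described above (a length field followed by an unterminated tag); the sketch also ignores the byte order of the length field.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the config layout: a length followed by the tag bytes. */
struct p9_config_sketch {
	uint16_t tag_len;
	uint8_t tag[];		/* tag name, NOT NUL-terminated */
};

/* Size of the config space: header plus tag bytes, no trailing NUL. */
static inline size_t p9_config_size(const struct p9_config_sketch *cfg)
{
	return sizeof(*cfg) + cfg->tag_len;
}

/* Allocate exactly sizeof(header) + strlen(tag_name) bytes. */
static inline struct p9_config_sketch *p9_config_alloc(const char *tag_name)
{
	size_t len = strlen(tag_name);
	struct p9_config_sketch *cfg;

	if (len > UINT16_MAX)
		return NULL;

	cfg = calloc(1, sizeof(*cfg) + len);
	if (!cfg)
		return NULL;

	cfg->tag_len = (uint16_t)len;
	memcpy(cfg->tag, tag_name, len);	/* copies the name without its NUL */
	return cfg;
}
```

Allocating one extra byte would not corrupt memory, but it would make the allocated size disagree with the size implied by tag_len.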
2022-05-20virtio: Use u32 instead of int in pci_data_in/outMartin Radev1-4/+4
The PCI access size type is changed from a signed type to an unsigned type since the size is never expected to be negative, and the type also matches the type in the signature of virtio_pci__io_mmio_callback. This change simplifies size checking in the next patch. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-4-martin.b.radev@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20mmio: Sanitize addr and lenMartin Radev1-0/+4
This patch verifies that adding the addr and length arguments from an MMIO op do not overflow. This is necessary because the arguments are controlled by the VM. The length may be set to an arbitrary value by using the rep prefix. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-3-martin.b.radev@gmail.com [will: Drop redundant o/f check in virtio_mmio_device_specific() per Alex] Signed-off-by: Will Deacon <will@kernel.org>
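A wraparound check of this kind can be sketched as below. mmio_range_overflows() is a hypothetical name and the exact condition used in the patch may differ; the sketch only shows the usual idiom for unsigned addition.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kvmtool's actual code): detect whether
 * addr + len wraps around the 64-bit address space. Unsigned overflow
 * is well defined in C, so a wrapped sum is smaller than addr.
 */
static inline bool mmio_range_overflows(uint64_t addr, uint32_t len)
{
	return addr + len < addr;
}
```

An MMIO handler would bail out early when mmio_range_overflows() returns true, before using addr + len as the end of the accessed range.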
2022-05-20kvmtool: Add WARN_ONCE macroMartin Radev1-0/+10
Add a macro that prints a warning only once. This is useful in cases where a warning helps with debugging, but repeating it on every occurrence would pollute the log. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Link: https://lore.kernel.org/r/20220509203940.754644-2-martin.b.radev@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
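The idea behind such a macro can be sketched in portable C as follows. WARN_ONCE_SKETCH, warnings_printed and handle_access() are hypothetical names used only for illustration; the macro added by the patch may have a different shape and reports through kvmtool's own logging helpers rather than fprintf().

```c
#include <stdio.h>

/* Illustration only: counts how many warnings were actually emitted. */
static int warnings_printed;

/*
 * Print msg the first time cond is true at this call site, then stay
 * silent. The static flag is per expansion, so every call site gets
 * its own "already warned" state.
 */
#define WARN_ONCE_SKETCH(cond, msg)					\
	do {								\
		static int warned_;					\
		if ((cond) && !warned_) {				\
			warned_ = 1;					\
			warnings_printed++;				\
			fprintf(stderr, "Warning: %s\n", (msg));	\
		}							\
	} while (0)

/* Hypothetical caller: warn about unsupported access sizes only once. */
static inline void handle_access(int size)
{
	WARN_ONCE_SKETCH(size != 1, "multi-byte access not supported");
}
```

Calling handle_access(4) repeatedly prints the warning a single time, which keeps a misbehaving guest from flooding the log.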
2022-05-20stat: Add descriptions for new virtio_balloon stat typesKeir Fraser1-1/+16
Unknown types would print the value with no descriptive text at all. Add descriptions for all known stat types, and a default description when the type is unknown. Signed-off-by: Keir Fraser <keirf@google.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220520143706.550169-3-keirf@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20virtio/balloon: Fix a crash when collecting statsKeir Fraser1-1/+6
The collect_stats hook dereferences the stats virtio queue without checking that it has been initialised. Signed-off-by: Keir Fraser <keirf@google.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220520143706.550169-2-keirf@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-20aarch64: Give up with MTE for AArch32 guestVladimir Murzin1-0/+5
KVM doesn't support combination of MTE and AArch32 guest, so do not even try. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220520123844.127733-1-vladimir.murzin@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06arm64: Add --vcpu-affinity command line argumentAlexandru Elisei7-22/+118
Add a new command line argument, --vcpu-affinity, to set the CPU affinity for the VCPUs. The affinity is expressed as a cpulist and will apply to all VCPU threads. This gives the user a second option for choosing the PMU on a heterogeneous system. The PMU setup code, when --vcpu-affinity is specified, will search for the PMU associated with the CPUs specified with this command line argument instead of the PMU associated with the CPU on which the main thread is executing. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-12-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06arm64: Add support for KVM_ARM_VCPU_PMU_V3_SET_PMUAlexandru Elisei2-3/+148
The KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU ioctl is used to assign a physical PMU to the events that KVM creates when emulating the PMU for that VCPU. This is useful on heterogeneous systems, when there is more than one hardware PMU present. All VCPUs must have the same PMU assigned. The assumption that is made in the implementation is that the user will pin the kvmtool process on a set of CPUs that share the same PMU. This allows kvmtool to set the same PMU for all VCPUs from the main thread, instead of in the individual VCPU threads. If a VCPU thread migrates to a CPU which has a different PMU than the CPU on which the main thread was executing when the PMU was set, the KVM_RUN ioctl will fail with kvm_run.exit_reason set to KVM_EXIT_FAIL_ENTRY, and kvm_run.fail_entry will be populated with the physical CPU ID on which the VCPU tried to execute. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-11-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06update_headers.sh: Sync ABI headers with Linux v5.18-rc2Alexandru Elisei2-2/+24
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-10-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06Add cpumask functionsAlexandru Elisei14-0/+517
Add a handful of cpumask functions, some of which will be used when dealing with different PMUs on heterogeneous systems. The maximum number of CPUs in a system, NR_CPUS, which dictates the size of the cpumask, has been taken from the Kconfig file for each architecture, from Linux version 5.16. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-9-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
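As a rough picture of what a cpumask is, the sketch below stores one bit per CPU in an array of unsigned long. All names are hypothetical and the value 256 for SKETCH_NR_CPUS is an arbitrary choice for illustration, not one of the per-architecture NR_CPUS values mentioned above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS		256	/* arbitrary, for illustration */
#define SKETCH_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
#define SKETCH_LONGS \
	((SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* One bit per possible CPU; the mask size is dictated by NR_CPUS. */
struct cpumask_sketch {
	unsigned long bits[SKETCH_LONGS];
};

/* Mark a CPU as present in the mask. */
static inline void sketch_cpumask_set(struct cpumask_sketch *m,
				      unsigned int cpu)
{
	m->bits[cpu / SKETCH_BITS_PER_LONG] |=
		1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Check whether a CPU is present in the mask. */
static inline bool sketch_cpumask_test(const struct cpumask_sketch *m,
				       unsigned int cpu)
{
	return (m->bits[cpu / SKETCH_BITS_PER_LONG] >>
		(cpu % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

Callers are expected to pass a CPU number below SKETCH_NR_CPUS; the sketch does no range checking.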
2022-05-06arm64: Rework set_pmu_attr()Alexandru Elisei1-32/+16
By the time kvmtool generates the DTB node for the PMU, the KVM_ARM_VCPU_PMU_V3 VCPU feature is already set by kvm_cpu__arch_init(). KVM refuses to run a VCPU if the PMU hasn't been initialized. A PMU cannot be initialized if the interrupt ID hasn't been set by userspace. As a consequence, kvmtool will get an error if the interrupt ID has not been set or if the PMU has not been initialized: KVM_RUN failed: Invalid argument To make debugging easier, exit with an error message as soon as one of the PMU ioctls fails, instead of waiting until the VCPU is first run. To avoid the repetition of assigning a new kvm_device_attr struct in the main body of pmu__generate_fdt_nodes(), which hinders readability of the function, move the struct to set_pmu_attr(). Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06arm: Make the PMUv3 emulation code arm64 specificAlexandru Elisei4-12/+10
KVM for aarch32 does not exist anymore, PMUv3 is a hardware feature present only on aarch64 CPUs, the command line option to enable the feature for a VCPU is aarch64 specific, the PMU code is called only from an aarch64 function and it compiles to an empty stub when ARCH=arm. There is no reason to have the PMUv3 emulation code in the common code area for arm and arm64, so move it to the arm64 directory, where it can be expanded in the future without fear of breaking aarch32 support. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06arm: Get rid of the ARM_VCPU_FEATURE_FLAGS() macroAlexandru Elisei3-9/+5
The ARM_VCPU_FEATURE_FLAGS() macro sets a feature bit in a rather convoluted way: if cpu_id is 0, then bit KVM_ARM_VCPU_POWER_OFF is 0, otherwise is set to 1. There's really no need for this indirection, especially considering that the macro has been changed to return the same value for both the arm and arm64 architectures. Replace it with a simple conditional statement in kvm_cpu__arch_init(), which makes it clearer to understand. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
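The conditional described above amounts to something like the sketch below. vcpu_feature_word() and SKETCH_VCPU_POWER_OFF are hypothetical names; the constant stands in for KVM_ARM_VCPU_POWER_OFF, and the real code fills in the features array of the VCPU init structure rather than returning a value.

```c
#include <stdint.h>

/* Stand-in for KVM_ARM_VCPU_POWER_OFF (a bit number, assumed 0 here). */
#define SKETCH_VCPU_POWER_OFF	0

/*
 * Hypothetical helper: the boot CPU (cpu_id 0) starts powered on, every
 * secondary CPU starts with the power-off feature bit set.
 */
static inline uint32_t vcpu_feature_word(unsigned long cpu_id)
{
	uint32_t features = 0;

	if (cpu_id > 0)
		features |= UINT32_C(1) << SKETCH_VCPU_POWER_OFF;

	return features;
}
```

Written this way, the rule for secondary CPUs is visible at the call site instead of being hidden behind a macro.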
2022-05-06arm: Move arch specific VCPU features to the arch specific functionAlexandru Elisei3-11/+13
KVM_CAP_ARM_EL1_32BIT and KVM_CAP_ARM_PMU_V3 are arm64 specific features. They are set based on arm64 specific command line options and they target arm64 hardware features. It makes little sense for kvmtool to set the features in the code that is shared between arm and arm64. Move the logic to set the feature bits to the arch specific function kvm_cpu__select_features(), which is already used by arm64 to set other arm64 specific features. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06arm/arm64: pmu.h: Add missing header guardsAlexandru Elisei1-0/+4
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06linux/bitops.h: Include wordsize.h to provide the __WORDSIZE defineAlexandru Elisei1-0/+2
Trying to build a source file which included bitops.h, but didn't also bring in the definition for __WORDSIZE (by including limits.h, for example) would result in the following error: include/linux/bitops.h:8:23: error: ‘__WORDSIZE’ undeclared (first use in this function) 8 | #define BITS_PER_LONG __WORDSIZE | ^~~~~~~~~~ The symbol is defined in the bits/wordsize.h header file, include it. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06linux/err.h: Add missing stdbool.h includeAlexandru Elisei1-0/+2
Add missing header stdbool.h to avoid errors like this one, which can happen if the including file doesn't include stdbool.h: include/linux/err.h:33:15: error: type defaults to ‘int’ in declaration of ‘bool’ [-Werror=implicit-int] 33 | static inline bool __must_check IS_ERR(__force const void *ptr) | ^~~~ include/linux/err.h:33:15: error: variable ‘bool’ declared ‘inline’ [-Werror] include/linux/err.h:33:1: error: ‘warn_unused_result’ attribute only applies to function types [-Werror=attributes] 33 | static inline bool __must_check IS_ERR(__force const void *ptr) | ^~~~~~ include/linux/err.h:33:33: error: expected ‘,’ or ‘;’ before ‘IS_ERR’ 33 | static inline bool __must_check IS_ERR(__force const void *ptr) | ^~~~~~ include/linux/err.h:38:15: error: type defaults to ‘int’ in declaration of ‘bool’ [-Werror=implicit-int] 38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr) | ^~~~ include/linux/err.h:38:15: error: variable ‘bool’ declared ‘inline’ [-Werror] include/linux/err.h:38:1: error: ‘warn_unused_result’ attribute only applies to function types [-Werror=attributes] 38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr) | ^~~~~~ include/linux/err.h:38:15: error: redundant redeclaration of ‘bool’ [-Werror=redundant-decls] 38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr) | ^~~~ include/linux/err.h:33:15: note: previous declaration of ‘bool’ was here 33 | static inline bool __must_check IS_ERR(__force const void *ptr) | ^~~~ include/linux/err.h:38:33: error: expected ‘,’ or ‘;’ before ‘IS_ERR_OR_NULL’ 38 | static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr) | ^~~~~~~~~~~~~~ include/linux/err.h: In function ‘PTR_ERR_OR_ZERO’: include/linux/err.h:58:6: error: implicit declaration of function ‘IS_ERR’ [-Werror=implicit-function-declaration] 58 | if (IS_ERR(ptr)) | ^~~~~~ include/linux/err.h:58:6: error: nested extern declaration of ‘IS_ERR’ [-Werror=nested-externs] 
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220412133231.35355-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-04aarch64: Add support for MTEAlexandru Elisei6-0/+31
MTE has been supported in Linux since commit 673638f434ee ("KVM: arm64: Expose KVM_ARM_CAP_MTE"), add support for it in kvmtool. MTE is enabled by default. Enabling the MTE capability incurs a cost, both in time (for each translation fault the tags need to be cleared), and in space (the tags need to be saved when a physical page is swapped out). This overhead is expected to be negligible for most users, but for those cases where it matters (like performance benchmarks), a --disable-mte option has been added. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220328103328.18768-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-04update_headers.sh: Sync ABI headers with Linux v5.17Alexandru Elisei3-1/+41
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220328103328.18768-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-04Make --no-pvtime command argument arm specificSebastian Ene5-7/+6
The stolen time option is available only for aarch64 and is enabled by default. Move the option that disables stolen time functionality to the arch specific path. Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20220324154304.2572891-1-sebastianene@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-21Revert "kvm tools: Filter out CPU vendor string"Oliver Upton1-8/+0
This reverts commit bc0b99a2a74047707db73ba057743febf458fd90. Thanks to some digging from Andre [1], we know that kvmtool commit bc0b99a2a740 ("kvm tools: Filter out CPU vendor string") was intended to work around a guest kernel bug resulting from kernel commit 5bbc097d8904 ("x86, amd: Disable GartTlbWlkErr when BIOS forgets it"). Critically, KVM does not implement the MC4 mask MSR and instead injects a #GP into the guest. On guest kernels without commit d47cc0db8fd6 ("x86, amd: Use _safe() msr access for GartTlbWlk disable code") this is unexpected and causes a kernel oops. Since the kernel has taken the position to fix the bug in the guest and not KVM, there is no need for CPU vendor string filtering in kvmtool. Vendor string filtering is highly problematic for feature discovery, both in the kernel and userspace. As Andre noted, glibc depends on the vendor string to discover CPU features at runtime [2]. This has been generally innocuous, but as distributions begin to raise the minimum ISA guest userspace will quickly crash and burn on kvmtool. Hiding the vendor string also makes it impossible to test vendor-specific CPU features in kvmtool guest kernels. Given the fact that there are known dependencies in kernel and userspace on the CPU vendor string, allow the guest to see the native CPU vendor string. This has the potential to break certain guest kernels of 2011 vintage when running on an AMD Fam10h processor. Onus is on the guest to update its kernel at this point. Link: https://lore.kernel.org/kvm/20220311121042.010bbb30@donnerap.cambridge.arm.com/ Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/cpu-features.c;h=514226b37889;hb=HEAD#l398 Reported-by: Dongli Si <sidongli1997@gmail.com> Suggested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Oliver Upton <oupton@google.com> Link: https://lore.kernel.org/r/20220318204938.496840-1-oupton@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-21Add --no-pvtime command line argumentSebastian Ene1-0/+2
The command line argument disables the stolen time functionality when it is specified. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20220313161949.3565171-4-sebastianene@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-21aarch64: Add stolen time supportSebastian Ene8-2/+114
This patch adds support for stolen time by sharing a memory region with the guest which will be used by the hypervisor to store the stolen time information. Reserve a 64kb MMIO memory region after the RTC peripheral to be used by pvtime. The exact format of the structure stored by the hypervisor is described in the ARM DEN0057A document. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20220313161949.3565171-3-sebastianene@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-21aarch64: Populate the vCPU struct before target->init()Sebastian Ene1-7/+7
Move the vCPU structure initialisation before the target->init() call to keep a reference to the kvm structure during init(). This is required by the pvtime peripheral to reserve a memory region while the vCPU is being initialised. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sebastian Ene <sebastianene@google.com> Link: https://lore.kernel.org/r/20220313161949.3565171-2-sebastianene@google.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16arm: pci: Generate "msi-parent" property only with a MSI controllerAlexandru Elisei3-4/+9
The "msi-parent" PCI root complex property describes the MSI parent of the root complex. When the VM is created with a GICv2 or GICv3 irqchip (--irqchip=gicv3 or --irqchip=gicv2), there is no MSI controller present on the system and the corresponding phandle is not generated, leaving the "msi-parent" property to point to a non-existing phandle. Skip creating the "msi-parent" property when no MSI controller exists. Reported-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220214165830.69207-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16arm: Use pr_debug() to print memory layout when loading a firmware imageAlexandru Elisei1-3/+5
When loading a kernel image, kvmtool is nice enough to print a message informing the user where the file was loaded in guest memory, which is very useful for debugging. Do the same for the firmware image. Commit e1c7c62afc7b ("arm: turn pr_info() into pr_debug() messages") changed various pr_info() into pr_debug() messages to stop kvmtool from cluttering stdout. Do the same when printing where the FDT has been copied when loading a firmware image. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220214165830.69207-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16Remove initrd magic checkAlexandru Elisei1-22/+0
Linux, besides CPIO, supports 7 different compressed formats for the initrd (gzip, bzip2, LZMA, XZ, LZO, LZ4, ZSTD), but kvmtool only recognizes one of them. Remove the initrd magic check because: 1. It doesn't bring much to the end user, as the Linux kernel still complains if the initrd is in an unknown format. 2. --kernel can be used to load something that is not a Linux kernel (like a kvm-unit-tests test), in which case a format which is not supported by a Linux kernel can still be perfectly valid. For example, kvm-unit-tests load the test environment as an initrd in plain ASCII format. 3. It cuts down on the maintenance effort when new formats are added to the Linux kernel. Not a big deal, since that doesn't happen very often, but it's still an effort with very little gain (see point #1 above). Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220214165830.69207-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16virtio/pci: Signal INTx interrupts as level instead of edgeMarc Zyngier2-2/+2
It appears that the way INTx is emulated is "slightly" out of spec in kvmtool. We happily inject an edge interrupt, even if the spec mandates a level. This doesn't change much for either the guest or userspace (only KVM will have a bit more work tracking the EOI), but at least this is correct. Reported-by: Pierre Gondois <pierre.gondois@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220131160242.2665191-1-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16x86: Set the correct APIC IDMuchun Song1-2/+4
When kvmtool boots a kernel, the dmesg will print the following message: [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 APIC: 30 Fix this by setting up correct initial_apicid to cpu_id. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220216113735.52240-2-songmuchun@bytedance.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-16x86: Fix initialization of irq mptableMuchun Song1-1/+1
When dev_hdr->dev_num is greater than one, the initialization of last_addr is wrong. Fix it. Fixes: f83cd16 ("kvm tools: irq: replace the x86 irq rbtree with the PCI device tree") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20220216113735.52240-1-songmuchun@bytedance.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Generate PCI host DT nodeAnup Patel4-0/+115
This patch extends FDT generation to generate PCI host DT node. Of course, PCI host for Guest/VM is not useful at the moment because it's mostly for PCI pass-through and we don't have IOMMU and interrupt routing available for KVM RISC-V. In future, we might be able to use PCI host for VirtIO PCI transport or other software emulated PCI devices. Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-9-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Handle SBI calls forwarded to user spaceAnup Patel2-1/+96
The kernel KVM RISC-V module will forward certain SBI calls to user space. These forwarded SBI calls will usually be the SBI calls which cannot be emulated in kernel space such as PUTCHAR and GETCHAR calls. This patch extends kvm_cpu__handle_exit() to handle SBI calls forwarded to user space. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-8-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Generate FDT at runtime for Guest/VMAnup Patel6-0/+255
We generate FDT at runtime for RISC-V Guest/VM so that KVMTOOL users don't have to pass FDT separately via command-line parameters. Also, we provide "--dump-dtb <filename>" command-line option to dump generated FDT into a file for debugging purpose. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-7-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Add PLIC device emulationAnup Patel4-2/+526
The PLIC (platform level interrupt controller) manages peripheral interrupts in RISC-V world. The per-CPU interrupts are managed using CPU CSRs hence virtualized in-kernel by KVM RISC-V. This patch adds PLIC device emulation for KVMTOOL RISC-V. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> [For PLIC context CLAIM register emulation] Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-6-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Implement Guest/VM VCPU arch functionsAnup Patel2-7/+390
This patch implements kvm_cpu__<xyz> Guest/VM VCPU arch functions. These functions mostly deal with: 1. VCPU allocation and initialization 2. VCPU reset 3. VCPU show/dump code 4. VCPU show/dump registers We also save RISC-V ISA, XLEN, and TIMEBASE frequency for each VCPU so that it can be later used for generating Guest/VM FDT. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-5-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Implement Guest/VM arch functionsAnup Patel2-6/+134
This patch implements all kvm__arch_<xyz> Guest/VM arch functions. These functions mostly deal with: 1. Guest/VM RAM initialization 2. Updating terminals on character read 3. Loading kernel and initrd images Firmware loading is not implemented currently because initially we will be booting kernel directly without any bootloader. In future, we will certainly support firmware loading. Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-4-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14riscv: Initial skeletal supportAnup Patel13-5/+440
This patch adds initial skeletal KVMTOOL RISC-V support which just compiles for RV32 and RV64 host. Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-3-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14update_headers: Sync-up ABI headers with Linux-5.16-rc1Anup Patel4-14/+557
We sync-up all ABI headers with Linux-5.16-rc1 so that RISC-V specific changes in include/linux/kvm.h are available. Signed-off-by: Anup Patel <anup.patel@wdc.com> Link: https://lore.kernel.org/r/20211119124515.89439-2-anup.patel@wdc.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14Makefile: Calculate the correct kvmtool versionhaibiao.xiao1-2/+2
Command 'lvm version' works incorrectly. It is expected to print: # ./lvm version # kvm tool [KVMTOOLS_VERSION] but the KVMTOOLS_VERSION is missing: # ./lvm version # kvm tool The KVMTOOLS_VERSION is defined in the KVMTOOLS-VERSION-FILE file which is included at the end of Makefile. CFLAGS is a 'simply expanded variable', which means its value is only scanned once, at the point of assignment. So the definition of KVMTOOLS_VERSION at the end of Makefile is not seen when CFLAGS is expanded, and '-DKVMTOOLS_VERSION=' remains empty. I fixed the bug by moving the '-include $(OUTPUT)KVMTOOLS-VERSION-FILE' before the CFLAGS. Signed-off-by: haibiao.xiao <xiaohaibiao331@outlook.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211210030708.288066-1-haibiao.xiao@zstack.io Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14arm/pci: update interrupt-map only for legacy interruptsSathyam Panda1-0/+10
The interrupt pin cell in "interrupt-map" property is defined only for legacy interrupts with a valid range in [1-4] corresponding to INTA#..INTD#. And the PCI endpoint devices that support advanced interrupt mechanisms like MSI or MSI-X should not have an entry with value 0 in "interrupt-map". This patch takes care of this problem by avoiding redundant entries. Signed-off-by: Sathyam Panda <sathyam.panda@arm.com> Reviewed-by: Vivek Kumar Gautam <vivek.gautam@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211111120231.5468-1-sathyam.panda@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-13vfio/pci: Align MSIX Table and PBA size to guest maximum page sizeAlexandru Elisei6-2/+21
When allocating MMIO space for the MSI-X table, kvmtool rounds the allocation to the host's page size to make it as easy as possible for the guest to map the table to a page, if it wants to (and doesn't do BAR reassignment, like the x86 architecture for example). However, the host's page size can differ from the guest's on architectures which support multiple page sizes. For example, arm64 supports three different page sizes, and it is possible for the host to be using 4k pages, while the guest is using 64k pages. To make sure the allocation is always aligned to a guest's page size, round it up to the maximum architectural page size. Do the same for the pending bit array if it lives in its own BAR. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20211012132510.42134-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
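The rounding itself can be sketched as below. align_up() and SKETCH_MAX_PAGE_SIZE are hypothetical names; 64k is used because it is the largest arm64 page size, and the alignment must be a power of two for the mask trick to work.

```c
#include <stdint.h>

/* Largest arm64 page size, used here as the "maximum architectural" one. */
#define SKETCH_MAX_PAGE_SIZE	UINT64_C(0x10000)

/*
 * Round size up to the next multiple of align.
 * Assumes align is a power of two.
 */
static inline uint64_t align_up(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```

For instance, a four-entry MSI-X table occupies 64 bytes (16 bytes per entry), and align_up(64, SKETCH_MAX_PAGE_SIZE) reserves a full 64k region for it, which stays page aligned whichever page size the guest picks.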
2021-10-13vfio/pci: Print an error when offset is outside of the MSIX table or PBAAlexandru Elisei1-0/+9
Now that we keep track of the real size of MSIX table and PBA, print an error when the guest tries to write to an offset which is not inside the correct regions. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211012132510.42134-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-13vfio/pci: Rework MSIX table and PBA physical size allocationAlexandru Elisei2-28/+42
When creating the MSIX table and PBA, kvmtool rounds up the table and pending bit array sizes to the host's page size. Unfortunately, when doing that, it doesn't take into account that the new size can exceed the device BAR size, leading to hard to diagnose errors for certain configurations. One theoretical example: PBA and table in the same 4k BAR, host's page size is 4k. In this case, table->size = 4k, pba->size = 4k, map_size = 4k, which means that pba->guest_phys_addr = table->guest_phys_addr + 4k, which is outside of the 4k MMIO range allocated for both structures. Another example, this time a real-world error that I encountered: happens with a 64k host booting a 4k guest, an RTL8168 PCIE NIC assigned to the guest. In this case, kvmtool sets table->size = 64k (because it's rounded to the host's page size) and pba->size = 64k. Truncated output of lspci -vv on the host: 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 255 Region 0: I/O ports at 1000 [size=256] Region 2: Memory at 40000000 (64-bit, non-prefetchable) [size=4K] Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K] [..] Capabilities: [b0] MSI-X: Enable- Count=4 Masked- Vector table: BAR=4 offset=00000000 PBA: BAR=4 offset=00000800 [..] When booting the guest: [..] 
[ 0.207444] pci-host-generic 40000000.pci: host bridge /pci ranges: [ 0.208564] pci-host-generic 40000000.pci: IO 0x0000000000..0x000000ffff -> 0x0000000000 [ 0.209857] pci-host-generic 40000000.pci: MEM 0x0050000000..0x007fffffff -> 0x0050000000 [ 0.211184] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x4fffffff] for [bus 00] [ 0.212625] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00 [ 0.213647] pci_bus 0000:00: root bus resource [bus 00] [ 0.214429] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] [ 0.215355] pci_bus 0000:00: root bus resource [mem 0x50000000-0x7fffffff] [ 0.216676] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000 [ 0.223771] pci 0000:00:00.0: reg 0x10: [io 0x6200-0x62ff] [ 0.239765] pci 0000:00:00.0: reg 0x18: [mem 0x50010000-0x50010fff] [ 0.244595] pci 0000:00:00.0: reg 0x20: [mem 0x50000000-0x50003fff] [ 0.246331] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000 [ 0.247278] pci 0000:00:01.0: reg 0x10: [io 0x6300-0x63ff] [ 0.248212] pci 0000:00:01.0: reg 0x14: [mem 0x50020000-0x500200ff] [ 0.249172] pci 0000:00:01.0: reg 0x18: [mem 0x50020400-0x500207ff] [ 0.250450] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000 [ 0.251392] pci 0000:00:02.0: reg 0x10: [io 0x6400-0x64ff] [ 0.252351] pci 0000:00:02.0: reg 0x14: [mem 0x50020800-0x500208ff] [ 0.253312] pci 0000:00:02.0: reg 0x18: [mem 0x50020c00-0x50020fff] [ 0.254760] pci 0000:00:00.0: BAR 4: assigned [mem 0x50000000-0x50003fff] (1) [ 0.255805] pci 0000:00:00.0: BAR 2: assigned [mem 0x50004000-0x50004fff] (2) Warning: [10ec:8168] Error activating emulation for BAR 2 Warning: [10ec:8168] Error activating emulation for BAR 2 [ 0.260432] pci 0000:00:01.0: BAR 2: assigned [mem 0x50005000-0x500053ff] Warning: [1af4:1000] Error activating emulation for BAR 2 Warning: [1af4:1000] Error activating emulation for BAR 2 [ 0.261469] pci 0000:00:02.0: BAR 2: assigned [mem 0x50005400-0x500057ff] Warning: [1af4:1001] Error activating emulation for BAR 2 Warning: 
[1af4:1001] Error activating emulation for BAR 2 [ 0.262499] pci 0000:00:00.0: BAR 0: assigned [io 0x1000-0x10ff] [ 0.263415] pci 0000:00:01.0: BAR 0: assigned [io 0x1100-0x11ff] [ 0.264462] pci 0000:00:01.0: BAR 1: assigned [mem 0x50005800-0x500058ff] Warning: [1af4:1000] Error activating emulation for BAR 1 Warning: [1af4:1000] Error activating emulation for BAR 1 [ 0.265481] pci 0000:00:02.0: BAR 0: assigned [io 0x1200-0x12ff] [ 0.266397] pci 0000:00:02.0: BAR 1: assigned [mem 0x50005900-0x500059ff] Warning: [1af4:1001] Error activating emulation for BAR 1 Warning: [1af4:1001] Error activating emulation for BAR 1 [ 0.267892] EINJ: ACPI disabled. [ 0.269922] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver [ 0.271118] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver [ 0.274122] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.275930] printk: console [ttyS0] disabled [ 0.276669] 1000000.U6_16550A: ttyS0 at MMIO 0x1000000 (irq = 13, base_baud = 115200) is a 16550A [ 0.278058] printk: console [ttyS0] enabled [ 0.278058] printk: console [ttyS0] enabled [ 0.279304] printk: bootconsole [ns16550a0] disabled [ 0.279304] printk: bootconsole [ns16550a0] disabled [ 0.281252] 1001000.U6_16550A: ttyS1 at MMIO 0x1001000 (irq = 14, base_baud = 115200) is a 16550A [ 0.282842] 1002000.U6_16550A: ttyS2 at MMIO 0x1002000 (irq = 15, base_baud = 115200) is a 16550A [ 0.284611] 1003000.U6_16550A: ttyS3 at MMIO 0x1003000 (irq = 16, base_baud = 115200) is a 16550A [ 0.286094] SuperH (H)SCI(F) driver initialized [ 0.286868] msm_serial: driver initialized [ 0.287890] [drm] radeon kernel modesetting enabled. [ 0.288826] cacheinfo: Unable to detect cache hierarchy for CPU 0 [ 0.293321] loop: module loaded KVM_SET_GSI_ROUTING: Invalid argument At (1), the guest writes 0x50000000 into BAR 4 of the NIC (which holds the MSIX table and PBA), expecting that will cover only 16k of address space (the BAR size), up to 0x50003fff, inclusive. 
On the host side, in vfio_pci_bar_activate(), kvmtool will actually register for MMIO emulation the region 0x50000000-0x5000ffff (64k in total) for the MSIX table and 0x50010000-0x5001ffff (another 64k) for the PBA (kvmtool set table->size and pba->size to 64k when it aligned them to the host's page size). Then at step (2), the guest writes the next available address (from its point of view) into BAR 2 of the NIC, which is 0x50004000. On the host side, the PCI emulation layer will search all the regions that overlap with the BAR address range (0x50004000-0x50004fff) and will find none because, just like the guest, it uses the BAR size to check for overlaps. When vfio_pci_bar_activate() is reached, kvmtool will try to register memory for this region, but the registration fails because that range is already registered for the MSIX table emulation. The same scenario repeats for every following memory BAR, because the MSIX table and PBA use memory from 0x50000000 to 0x5001ffff. The error at the end, which finally terminates the VM, is caused by the guest trying to write to a totally different BAR, which vfio-pci interprets as a write to the MSI-X table because it falls in the 64k region that was registered for emulation. The IRQ ID is not a valid SPI number and gicv2m_update_routing() returns an error (and sets errno to EINVAL). Fix this by aligning the table and PBA size to 8 bytes to allow for qword accesses, like PCI 3.0 mandates. For the sake of simplicity, the PBA offset in a BAR, in case of a shared BAR, is kept the same as the offset of the physical device. One hopes that the device respects the recommendations set forth in PCI LOCAL BUS SPECIFICATION, REV. 3.0, section "MSI-X Capability and Table Structures". Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20211012132510.42134-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
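The size arithmetic described in the message above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: struct msix_layout, msix_compute_layout() and msix_layout_fits() are hypothetical names, and the 16-byte table entry size and one pending bit per vector are taken from the PCI 3.0 MSI-X layout. The alignment is a parameter so that the old behaviour (host page size) and the new one (8 bytes) can be compared for the RTL8168 values quoted above (Count=4, table at offset 0, PBA at offset 0x800, 16K BAR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u	/* bytes per MSI-X table entry */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical description of where the table and PBA sit in a BAR. */
struct msix_layout {
	uint64_t table_offset;
	uint64_t table_size;
	uint64_t pba_offset;
	uint64_t pba_size;
};

/* Compute emulated table and PBA sizes for a given size alignment. */
static struct msix_layout msix_compute_layout(uint32_t nr_entries,
					      uint64_t table_offset,
					      uint64_t pba_offset,
					      uint64_t align)
{
	struct msix_layout l = {
		.table_offset	= table_offset,
		.table_size	= align_up((uint64_t)nr_entries * MSIX_ENTRY_SIZE, align),
		.pba_offset	= pba_offset,
		/* One pending bit per vector, rounded up to whole bytes. */
		.pba_size	= align_up(((uint64_t)nr_entries + 7) / 8, align),
	};

	return l;
}

/* True if both structures end inside the BAR that holds them. */
static bool msix_layout_fits(const struct msix_layout *l, uint64_t bar_size)
{
	return l->table_offset + l->table_size <= bar_size &&
	       l->pba_offset + l->pba_size <= bar_size;
}
```

With a 64k alignment, msix_compute_layout(4, 0x0, 0x800, 0x10000) yields a 64k table that no longer fits the 16K BAR, which is the overlap described above. With an alignment of 8, the table is 64 bytes and the PBA is 8 bytes, and both stay inside the BAR.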
2021-10-13vfio/pci: Rename PBA offset in device descriptor to fd_offsetAlexandru Elisei2-4/+4
The MSI-X capability defines a PBA offset, which is the offset of the PBA array in the BAR that holds the array. kvmtool uses the field "pba_offset" in struct msix_cap (which represents the MSIX capability) to refer to the [PBA offset:BAR] field of the capability; and the field "offset" in the struct vfio_pci_msix_pba to refer to the offset of the PBA array in the device descriptor created by the VFIO driver. As we're getting ready to add yet another field that represents an offset to struct vfio_pci_msix_pba, try to avoid ambiguities by renaming the struct's "offset" field to "fd_offset". No functional change intended. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20211012132510.42134-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>