Age | Commit message | Author | Files | Lines
2019-11-22 | arm64: Add SVE support [HEAD, master] | Dave Martin | 4 | -0/+26
This patch enables the Scalable Vector Extension for the guest when the host supports it. This requires use of the new KVM_ARM_VCPU_FINALIZE ioctl before the vcpu is runnable, so a new hook kvm_cpu__configure_features() is added to provide an appropriate place to do this work. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-11-22 | arm/arm64: Factor out ptrauth vcpu feature setup | Dave Martin | 4 | -7/+14
In the interest of readability, factor out the vcpu feature setup for ptrauth into a separate function. Also, because aarch32 doesn't have this feature or the related command line options anyway, move the actual code into aarch64/. Since ARM_VCPU_PTRAUTH_FEATURE is only there to make the ptrauth feature setup code compile on arm, it is no longer needed: inline and remove it. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-11-22 | KVM: arm/arm64: Add a vcpu feature for pointer authentication | Amit Daniel Kachhap | 3 | -0/+10
This patch adds a runtime capability for KVM tool to enable Arm64 8.3 Pointer Authentication in the guest kernel. Two vcpu features, KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC], are supplied together to enable Pointer Authentication in the KVM guest after checking the capability. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> [merge new kernel headers] Signed-off-by: Will Deacon <will@kernel.org>
2019-11-22 | update_headers: Sync kvm UAPI headers with Linux 5.3 | Will Deacon | 6 | -19/+234
We're going to need updated headers for arm64 SVE and ptrauth support. Signed-off-by: Will Deacon <will@kernel.org>
2019-10-25 | virtio: Ensure virt_queue is always initialised | Will Deacon | 3 | -0/+3
Failing to initialise the virt_queue via virtio_init_device_vq() leaves, amongst other things, the endianness unspecified. On arm/arm64 this results in virtio_guest_to_host_uxx() treating the queue as big-endian and trying to translate bogus addresses:

    Warning: unable to translate guest address 0x80b8249800000000 to host

Ensure the virt_queue is always initialised by the virtio device during setup.

Cc: Marc Zyngier <maz@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
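The bogus address in that warning is what a little-endian value looks like after an unwanted byte swap. A minimal sketch, assuming a little-endian host; the helper name guest_to_host_u64() and the input 0x9824b880 are illustrative assumptions (the input is simply the warning's value byte-swapped back), not kvmtool code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a 64-bit guest value to host order on a
 * little-endian host. If the queue is wrongly flagged as big-endian,
 * a perfectly valid little-endian address gets byte-swapped.
 */
static uint64_t guest_to_host_u64(int queue_is_big_endian, uint64_t val)
{
	return queue_is_big_endian ? __builtin_bswap64(val) : val;
}
```

With the flag wrongly set, guest_to_host_u64(1, 0x9824b880) yields 0x80b8249800000000, which matches the address in the warning; with the flag clear, the value passes through unchanged.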
2019-08-02 | README: Update my email address | Julien Thierry | 1 | -1/+1
My @arm.com address is gonna stop working. Update README information with an address people can use to actually reach me. Signed-off-by: Julien Thierry <julien.thierry.kdev@gmail.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-07-03 | update_headers.sh: arm64: Copy sve_context.h if available | Dave Martin | 1 | -1/+13
The SVE KVM support for arm64 includes the additional backend header <asm/sve_context.h> from <asm/kvm.h>. So update this header if it is available. To avoid creating a sudden dependency on a specific minimum kernel version, ignore such optional headers if the source kernel tree doesn't have them. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-07-03 | update_headers.sh: Cleanly report failure on error | Dave Martin | 1 | -0/+2
If an intermediate step fails, update_headers.sh blindly continues and may return success status. To avoid errors going unnoticed when driving this script, exit and report failure status as soon as something goes wrong. For good measure, also fail on expansion of undefined shell variables to aid future maintainers. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-07-03 | update_headers.sh: Add missing shell quoting | Dave Martin | 1 | -5/+5
update_headers.sh can break if the current working directory has a funny name or if something odd is passed for LINUX_ROOT. In the interest of cleanliness, quote where appropriate. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-07-03 | README: Add maintainers section | Will Deacon | 1 | -0/+6
Julien has kindly offered to help maintain kvmtool, but it occurred to me that we don't actually provide any maintainer contact details in the repository as it stands. Add a brief "Maintainers" section to the README, immediately after the "Contributing" section so that people know who to nag about merging and reviewing patches. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-10 | run: Check for ghost socket file upon VM creation | Andre Przywara | 1 | -4/+27
Kvmtool creates a (debug) UNIX socket file for each VM, using its (possibly auto-generated) name as the filename. There is a check using access(), which bails out with an error message if a socket with that name already exists. Aside from this check being unnecessary, as the bind() call later would complain as well, this is also racy. But more annoyingly the bail out is not needed most of the time: an existing socket inode is most likely just an orphaned leftover from a previous kvmtool run, which just failed to remove that file, because of a crash, for instance. Upon finding such a collision, let's first try to connect to that socket, to detect if there is still a kvmtool instance listening on the other end. If that fails, this socket will never come back to life, so we can safely clean it up and reuse the name for the new guest. However if the connect() succeeds, there is an actual live kvmtool instance using this name, so not proceeding is the only option. This should never happen with the (PID based) automatically generated names, though. This avoids an annoying (and not helpful) error message and helps automated kvmtool runs to proceed in more cases. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
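The connect-then-clean-up idea can be sketched as follows. This is an illustration under assumptions, not the kvmtool implementation: the function names unix_socket_is_live() and reclaim_ghost_socket() are hypothetical, and only ECONNREFUSED is treated as proof that the socket file is orphaned.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Returns 1 if something is listening on the UNIX socket at `path`,
 * 0 if connect() was refused (orphaned socket file), -1 on other errors.
 */
static int unix_socket_is_live(const char *path)
{
	struct sockaddr_un addr;
	int fd, ret;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
		ret = 1;			/* a live instance answered */
	else
		ret = (errno == ECONNREFUSED) ? 0 : -1;

	close(fd);
	return ret;
}

/* Remove the socket file only when nobody is listening on it. */
static int reclaim_ghost_socket(const char *path)
{
	int live = unix_socket_is_live(path);

	if (live != 0)
		return live;		/* 1: name in use, -1: error */
	return unlink(path) == 0 ? 0 : -1;
}
```

A caller would proceed with bind() on the same name when reclaim_ghost_socket() returns 0 and refuse to start when it returns 1.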
2019-06-10 | list: Clean up ghost socket files | Andre Przywara | 1 | -3/+3
When kvmtool (or the host kernel) crashes or gets killed, we cannot automatically remove the socket file we created for that VM. A later call of "lkvm list" iterates over all those files and complains about those "ghost socket files", as there is no one listening on the other side. Also sometimes the automatic guest name generation happens to generate the same name again, so an unrelated "lkvm run" later complains and stops, which is bad for automation. As the only code doing a listen() on this socket is kvmtool upon VM *creation*, such an orphaned socket file will never come back to life, so we can as well unlink() those sockets in the code. This spares the user from doing it herself. We keep the message in the code to notify the user of this. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29 | virtio/blk: Avoid taking pointer to packed struct | Andre Przywara | 1 | -2/+2
clang and GCC9 refuse to compile virtio/blk.c with the following message:

    virtio/blk.c:161:37: error: taking address of packed member 'geometry' of class or structure 'virtio_blk_config' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
            struct virtio_blk_geometry *geo = &conf->geometry;

Since struct virtio_blk_geometry is in a kernel header, we can't do much about the packed attribute, but as Peter pointed out, the solution is rather simple: just get rid of the convenience variable and use the original struct member directly.

Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29 | vfio: rework vfio_irq_set payload setting | Andre Przywara | 1 | -8/+13
struct vfio_irq_set from the kernel headers contains a variable sized array to hold a payload. The vfio_irq_eventfd struct puts the "fd" member right after this, hoping it to automatically fit in the payload slot. But having a variable sized type not at the end of a struct is a GNU C extension, so clang will refuse to compile this. Solve this by somewhat doing the compiler's job and place the payload manually at the end of the structure. Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
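One portable way to place a payload after a header that ends in a variable-sized array is to size the buffer by hand and copy the payload into the trailing array. The sketch below uses a hypothetical stand-in struct, irq_set_hdr, rather than the real struct vfio_irq_set, and heap allocation is an assumption made for brevity; the actual patch may lay out the storage differently.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kernel struct ending in a flexible array. */
struct irq_set_hdr {
	uint32_t argsz;
	uint32_t count;
	uint8_t  data[];	/* variable-sized payload, must be last */
};

/* Allocate header plus room for one int payload and copy the fd into it. */
static struct irq_set_hdr *irq_set_with_fd(int fd)
{
	size_t size = sizeof(struct irq_set_hdr) + sizeof(int);
	struct irq_set_hdr *irq = calloc(1, size);

	if (!irq)
		return NULL;

	irq->argsz = (uint32_t)size;	/* total size, header + payload */
	irq->count = 1;
	memcpy(irq->data, &fd, sizeof(fd));
	return irq;
}
```

Because the payload lives inside the same allocation as the header, no struct member follows the flexible array, which is the construct clang rejects.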
2019-05-29 | vfio: remove unneeded test | Andre Przywara | 1 | -5/+0
clang complained that the comparison of a u8 variable against 256 is somewhat pointless. Just remove the check, as the condition will never hit. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29 | vfio: remove spurious ampersand | Andre Przywara | 1 | -1/+1
As clang rightfully pointed out, the ampersand in front of this member looks wrong. Remove it so we actually really compare against the count being 0. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | virtio/blk: sync I/O on reset | Jean-Philippe Brucker | 1 | -0/+2
Ensure that all requests are complete when resetting a virtqueue, by draining the AIO queue after stopping the submission thread. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | disk/aio: Add wait() disk operation | Jean-Philippe Brucker | 5 | -32/+70
Add a call into the disk layer to synchronize the AIO queue. Wait for all pending requests to complete. This will be necessary when resetting a virtqueue. The wait() operation isn't the same as flush(). A VIRTIO_BLK_T_FLUSH request ensures that any write request *that completed before the FLUSH is sent* is committed to permanent storage (e.g. written back from a write cache). But it doesn't do anything for requests that are still pending when the FLUSH is sent. Avoid introducing a mutex on the io_submit() and io_getevents() paths, because it can lead to 30% throughput drop on heavy FIO jobs. Instead manage an inflight counter using compare-and-swap operations, which is simple enough as the caller doesn't submit new requests while it waits for the AIO queue to drain. The __sync_fetch_and_* operations are a bit rough since they use full barriers, but that didn't seem to introduce a performance regression. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
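A lock-free inflight counter of the kind described can be sketched with the GCC __sync builtins. The struct and function names here are hypothetical, and the busy-wait with sched_yield() is an assumption chosen to keep the example short; it relies, as the commit notes, on the caller not submitting new requests while it waits.

```c
#include <sched.h>
#include <stdint.h>

/* Hypothetical per-disk AIO bookkeeping. */
struct aio_state {
	uint64_t inflight;	/* requests submitted but not yet completed */
};

/* Called on the submission path, after a request has been queued. */
static void aio_submitted(struct aio_state *s)
{
	__sync_fetch_and_add(&s->inflight, 1);
}

/* Called on the completion path, once per reaped event. */
static void aio_completed(struct aio_state *s)
{
	__sync_fetch_and_sub(&s->inflight, 1);
}

/* Adding 0 gives an atomic read with a full barrier. */
static uint64_t aio_inflight(struct aio_state *s)
{
	return __sync_fetch_and_add(&s->inflight, 0);
}

/* Block until every submitted request has completed. */
static void aio_wait(struct aio_state *s)
{
	while (aio_inflight(s) != 0)
		sched_yield();
}
```

No mutex is taken on either the submit or the completion path, which is the property the commit wants to preserve for throughput.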
2019-04-26 | disk/aio: Cancel AIO thread on cleanup | Jean-Philippe Brucker | 2 | -2/+4
If the AIO thread is still calling io_getevents() while the exit path calls io_destroy(), it will segfault. Wait for the thread to finish before destroying the context. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | disk/aio: Fix AIO thread | Jean-Philippe Brucker | 1 | -5/+16
Currently when the kernel completes a batch of AIO requests and signals it via eventfd, we retrieve at most AIO_MAX events (256), and ignore the rest. Call io_getevents() again in case more events are pending. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | disk/aio: Fix use of disk->async | Jean-Philippe Brucker | 5 | -21/+15
Add an 'async' attribute to disk_image_operations, that describes if they can submit async I/O or not. disk_image->async is now set iff CONFIG_HAS_AIO and the ops do use AIO. This fixes qcow1, which used to set async = 1 even though the qcow operations don't use AIO. The disk core would perform the read/write operation without pushing the completion onto the virtio queue, and the guest would be stuck waiting. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | disk/aio: Refactor AIO code | Jean-Philippe Brucker | 7 | -125/+167
Move all AIO code to a separate file, disk/aio.c, to remove as many #ifdefs as possible. Split the raw read/write disk ops into async and sync, and choose which ones to use depending on CONFIG_HAS_AIO. Note that we fix raw_image__close(), which incorrectly checked CONFIG_HAS_VIRTIO instead of CONFIG_HAS_AIO and closed an uninitialized disk->evt. A subsequent commit will complete this refactoring by fixing use of the 'async' disk attribute. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | guest: sync disk before shutting down | Jean-Philippe Brucker | 1 | -0/+1
sync() should be called before reboot(RB_AUTOBOOT), otherwise data written to disks might be lost. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | virtio/blk: Set VIRTIO_BLK_F_RO when the disk is read-only | Jean-Philippe Brucker | 3 | -3/+12
Since we don't currently tell the guest when the disk backend is read-only, it will report any inconsistent read after write as an error. An image may be read-only either because user requested it on the command-line, or because write support isn't implemented. Pass the read-only attribute using the VIRTIO_BLK_F_RO feature. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | qcow: Fix qcow1 exit fault | Jean-Philippe Brucker | 1 | -0/+1
Even though qcow1 doesn't use the refcount table, the cleanup path still attempts to iterate over its LRU list. Initialize the list to avoid a segfault on exit. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | brlock: fix build with KVM_BRLOCK_DEBUG | Julien Thierry | 3 | -6/+12
Build breaks when using KVM_BRLOCK_DEBUG because the header was seemingly conceived to be included in a single .c file... Fix this by moving the definition of the read/write lock into the kvm struct. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | brlock: Always pass argument to br_read_lock/unlock | Julien Thierry | 2 | -4/+4
The kvm argument is not passed to br_read_lock/unlock. This works for the barrier implementation because the argument is not used, but it breaks as soon as another lock implementation is used. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | Makefile: Only compile vesa for archs that need it | Julien Thierry | 1 | -1/+2
The vesa framebuffer is only used by architectures that explicitly require it (i.e. x86). Compile it out for architectures not using it, as its current implementation might not work for them. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | vfio-pci: Re-enable INTx mode when disable MSI/MSIX | Leo Yan | 1 | -6/+22
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode is disabled by default when MSI/MSIX mode is enabled. This logic breaks easily if the guest PCI driver detects that MSI/MSIX cannot work as expected and tries to roll back to INTx mode. In this case INTx mode has been disabled and there is no way to re-enable it, so neither INTx mode nor MSI/MSIX mode works in vfio.

The flow that triggers this issue is:

    vfio_pci_configure_dev_irqs()
      `-> vfio_pci_enable_intx()

    vfio_pci_enable_msis()
      `-> vfio_pci_disable_intx()

    vfio_pci_disable_msis()  => Guest PCI driver disables MSI

To fix this, when disabling MSI/MSIX we check whether INTx mode is available for the device; if the device supports INTx, re-enable it so that the device can fall back to it.

Since the vfio_pci_disable_intx() / vfio_pci_enable_intx() pair may be called multiple times, this patch uses 'intx_fd == -1' to denote that INTx is disabled; the two functions bail out directly when they detect that INTx has already been disabled or enabled, respectively.

Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
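The 'intx_fd == -1' sentinel makes the enable/disable pair safe to call repeatedly. A minimal sketch of that pattern, with hypothetical names (pci_dev_state, intx_enable() and friends) and an enable_calls counter added purely so the behaviour can be observed; the real functions also talk to VFIO, which is omitted here:

```c
/* Hypothetical device state; intx_fd == -1 means INTx is disabled. */
struct pci_dev_state {
	int intx_fd;
	int enable_calls;	/* counts real enable operations, for illustration */
};

static int intx_enable(struct pci_dev_state *d, int new_fd)
{
	if (d->intx_fd != -1)
		return 0;	/* already enabled: nothing to do */

	d->intx_fd = new_fd;
	d->enable_calls++;
	return 0;
}

static void intx_disable(struct pci_dev_state *d)
{
	if (d->intx_fd == -1)
		return;		/* already disabled: nothing to do */

	d->intx_fd = -1;
}

/* Enabling MSI turns INTx off; disabling MSI falls back to INTx. */
static void msi_enable(struct pci_dev_state *d)
{
	intx_disable(d);
}

static int msi_disable(struct pci_dev_state *d, int intx_fd)
{
	return intx_enable(d, intx_fd);
}
```

In this sketch a guest that enables MSI and later gives up on it ends with INTx enabled again, which is the fallback the commit restores.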
2019-04-26 | vfio-pci: Add new function for INTx one-time initialisation | Leo Yan | 1 | -27/+40
To support enabling INTx multiple times, we first need to extract the one-time initialisation and move the related code into a new function, vfio_pci_init_intx(); if INTx is later disabled and re-enabled, these one-time operations can be skipped. This patch moves the following three INTx one-time initialisation operations from vfio_pci_enable_intx() into vfio_pci_init_intx():

- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Setup pdev->intx_gsi.

Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-26 | vfio-pci: Release INTx's unmask eventfd properly | Leo Yan | 2 | -0/+3
The PCI device INTx uses the eventfd 'unmask_fd' to signal the deassertion of the line from guest to host, but this eventfd isn't released properly when INTx is disabled. This patch adds a field 'unmask_fd' to struct vfio_pci_device to store the unmask eventfd, and closes it when INTx is disabled. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-11 | arm: Auto-detect guest GIC type | Andre Przywara | 2 | -0/+17
At the moment kvmtool always tries to instantiate a virtual GICv2 interrupt controller for the guest, and fails with some scary error message if that doesn't work. The user then has to manually specify "--irqchip=gicv3", which is not really obvious. With the advent of more GICv3-only machines, let's try to be more clever and implement some auto-detection of the GIC type needed: We try gicv3-its, gicv3, gicv2m and gicv2, in that order. The first one succeeding wins. For GICv2 machines the first two will always fail. On GICv3 machines offering GICv2 compatibility we used to prefer a virtual GICv2 in the guest, but these days the GICv3 support both in guests and in KVM is equally mature and wide-spread, so we should use the GICv3 emulation for the guest as well. This algorithm is in effect if there is no explicit --irqchip parameter on the command line. We still allow the GIC type to be set explicitly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
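The "first one succeeding wins" probe order can be sketched as a loop over candidate types. All identifiers below (the enum values, gic_create_auto(), and the fake probe fake_gicv2_host_create()) are hypothetical and stand in for whatever creation call the tool actually uses:

```c
#include <stddef.h>

enum irqchip_type {
	IRQCHIP_AUTO,
	IRQCHIP_GICV2,
	IRQCHIP_GICV2M,
	IRQCHIP_GICV3,
	IRQCHIP_GICV3_ITS,
};

/* A probe returns 0 if the given GIC type could be created. */
typedef int (*gic_create_fn)(enum irqchip_type type);

/* Try gicv3-its, gicv3, gicv2m, gicv2 in that order; first success wins. */
static int gic_create_auto(gic_create_fn create, enum irqchip_type *chosen)
{
	static const enum irqchip_type order[] = {
		IRQCHIP_GICV3_ITS, IRQCHIP_GICV3,
		IRQCHIP_GICV2M, IRQCHIP_GICV2,
	};
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (create(order[i]) == 0) {
			*chosen = order[i];
			return 0;
		}
	}
	return -1;	/* no GIC type could be instantiated */
}

/* Fake probe modelling a GICv2-only host: both GICv3 variants fail. */
static int fake_gicv2_host_create(enum irqchip_type type)
{
	return (type == IRQCHIP_GICV2M || type == IRQCHIP_GICV2) ? 0 : -1;
}
```

On the fake GICv2-only host the loop skips the two GICv3 entries and settles on gicv2m, in line with the commit's remark that the first two attempts always fail on such machines.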
2019-02-08 | net/dhcp: avoid misleading strncpy | Andre Przywara | 1 | -1/+1
The code for copying an empty IP address into the DHCP opt buffer used strncpy, but passed the source length as the size argument. GCC 8.x complains about it. Since the source string is actually fixed, just revert to the old strcpy, which gives us the same level of security in this case but makes the compiler happy. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-08 | virtio: use strlcpy | Andre Przywara | 2 | -3/+5
GCC 8.x complains about improper usage of strncpy in virtio/net.c and virtio/scsi.c:

    In function 'virtio_scsi_init_one',
        inlined from 'virtio_scsi_init' at virtio/scsi.c:285:7:
    virtio/scsi.c:247:2: error: 'strncpy' specified bound 224 equals destination size [-Werror=stringop-truncation]
      strncpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this and the other occurrences in virtio/ by using strlcpy instead of strncpy.

Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
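The difference that matters here is that strncpy() leaves the destination unterminated when the source fills the whole bound, whereas a strlcpy-style copy always terminates. Since strlcpy() is not available in every C library, the sketch below defines its own equivalent under the hypothetical name copy_string(); it illustrates the semantics and is not the helper kvmtool ships.

```c
#include <stddef.h>
#include <string.h>

/*
 * strlcpy-style copy: copies at most size - 1 bytes and always
 * NUL-terminates when size > 0. Returns strlen(src), so a return
 * value >= size tells the caller that truncation happened.
 */
static size_t copy_string(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

Copying "abcdef" into a 4-byte buffer leaves "abc" plus the terminator and returns 6, so the caller can both use the result safely and detect that it was shortened.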
2019-02-08 | builtin-run: Replace strncpy calls with strlcpy | Andre Przywara | 1 | -2/+2
There are two uses of strncpy in builtin-run.c, where we don't make proper use of strncpy, so that GCC 8.x complains and aborts compilation. Replace those two calls with strlcpy(), which does the right thing in our case. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-08 | Makefile: support -s switch | Andre Przywara | 1 | -1/+5
"make -s" suppresses normal output, just shows warnings and errors. But since we explicitly override the make output with our fancy concise version, we miss out on this feature. Do as the kernel does and explicitly suppress every normal output when -s is given. This helps to spot warnings that scroll out of the terminal window too quickly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-08 | arm: fdt: add stdout-path to /chosen node | Andre Przywara | 3 | -0/+21
The DT spec describes the stdout-path property in the /chosen node to contain the DT path for a default device usable for outputting characters. The Linux kernel uses this for earlycon (without further parameters), other DT users might rely on this as well. Add a stdout-path property pointing to the "serial0" alias, then add an aliases node at the end of the FDT, containing the actual path. This allows the FDT generation code in hw/serial.c to set this string. Even when we use the virtio console, the serial console is still there and works, so we can expose this unconditionally. Putting the virtio console path in there will not work anyway. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-08 | kvmtool: 9p: fix overapping snprintf | Anisse Astier | 1 | -1/+8
GCC 8.2 gives this warning:

    virtio/9p.c: In function ‘virtio_p9_create’:
    virtio/9p.c:335:21: error: passing argument 1 to restrict-qualified parameter aliases with argument 4 [-Werror=restrict]
      ret = snprintf(dfid->path, size, "%s/%s", dfid->path, name);
                     ~~~~^~~~~~                 ~~~~~~~~~~

Fix it by allocating a temporary string with dfid->path content instead of overwriting it in-place, which is limited in glibc snprintf with the __restrict qualifier.

Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Anisse Astier <aastier@freebox.fr> Signed-off-by: Will Deacon <will.deacon@arm.com>
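The temporary-copy approach can be sketched as below. The function name path_append() is hypothetical, and returning -1 on truncation is an added assumption rather than something the commit describes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append "/name" to `path` (a buffer of `size` bytes) without passing
 * `path` as both the destination and a source of snprintf().
 * Returns 0 on success, -1 on allocation failure or truncation.
 */
static int path_append(char *path, size_t size, const char *name)
{
	size_t len = strlen(path) + 1;
	char *tmp = malloc(len);
	int ret;

	if (!tmp)
		return -1;
	memcpy(tmp, path, len);		/* private copy of the old path */

	ret = snprintf(path, size, "%s/%s", tmp, name);
	free(tmp);

	return (ret < 0 || (size_t)ret >= size) ? -1 : 0;
}
```

Because the source string now lives in a separate allocation, the destination no longer aliases any argument, which is what the restrict qualifier on snprintf() requires.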
2019-02-08 | virtio: fix warning on strncpy | Anisse Astier | 1 | -3/+3
GCC 8.2 gives this warning:

    virtio/net.c: In function ‘virtio_net__tap_init’:
    virtio/net.c:336:47: error: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
      strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
                                                   ^
    virtio/net.c:348:47: error: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
      strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
                                                   ^

Fix it by using sizeof of destination instead, even if they're the same size in this case.

Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Anisse Astier <aastier@freebox.fr> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-08 | builtin-run: Fix warning when resolving path | Anisse Astier | 1 | -1/+3
GCC 8.2 gives this warning:

    builtin-run.c: In function ‘kvm_run_write_sandbox_cmd.isra.1’:
    builtin-run.c:417:28: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
      snprintf(dst, len, "/host%s", resolved_path);
                              ^~   ~~~~~~~~~~~~~

It's because it understands that len is PATH_MAX, the same as resolved_path's size. This patch handles the case where the string is truncated, and fixes the warning.

Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Anisse Astier <aastier@freebox.fr> Signed-off-by: Will Deacon <will.deacon@arm.com>
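Handling the truncated case comes down to checking snprintf()'s return value against the buffer size. A minimal sketch, in which the function name host_path() and the choice to report truncation as -1 are assumptions for illustration:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "/host<resolved>" in dst. snprintf() returns the length the
 * full string would have had, so a value >= len means the output
 * was cut short. Returns 0 on success, -1 on error or truncation.
 */
static int host_path(char *dst, size_t len, const char *resolved)
{
	int n = snprintf(dst, len, "/host%s", resolved);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Checking the result both silences the format-truncation diagnostic and avoids silently using a clipped path.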
2019-02-01 | init: fix sysfs mount arguments | Dmitry Monakhov | 1 | -1/+1
It is not a good idea to pass an empty 'source' argument to mount(2), because libmount complains about an incorrect /proc/self/mountinfo structure. This affects many applications such as findmnt and umount. Let's add a fake source argument to the sysfs mount command, as we do with all other filesystems. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-01 | arm: Allow command line for firmware | Andre Przywara | 1 | -6/+3
When loading a firmware instead of a kernel, we can still pass on any *user-provided* command line, as /chosen/bootargs is a generic device tree feature. We just need to make sure not to pass our mangled-for-Linux version. This allows running "firmware" images that make use of a command line but are not Linux kernels, like kvm-unit-tests. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-01 | Makefile: Remove echoing of kvmtools version file | Andre Przywara | 1 | -1/+0
On every build we report the kvmtool "version" number, which isn't meaningful at all anymore. Remove the line from the KVMTOOLS-VERSION-GEN script to drop a pointless message. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-30 | arm: pmu: Improve PMU error reporting | Andre Przywara | 1 | -1/+1
The KVM ioctls mostly just return -1 in the error case, leaving the actual error code in errno. Change the output of the PMU error message to actually print this error code instead of the generic -1. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-30 | arm: turn pr_info() into pr_debug() messages | Andre Przywara | 3 | -10/+11
For whatever reason, on ARM/arm64 machines kvmtool greets us with quite elaborate messages:

    Info: Loaded kernel to 0x80080000 (18704896 bytes)
    Info: Placing fdt at 0x8fe00000 - 0x8fffffff
    Info: virtio-mmio.devices=0x200@0x10000:36
    Info: virtio-mmio.devices=0x200@0x10200:37
    Info: virtio-mmio.devices=0x200@0x10400:38

This is not really useful information for the casual user, so change those lines to use pr_debug(). This also fixes the long standing line ending issue for the mmio output.

Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio/console: Implement reset | Jean-Philippe Brucker | 1 | -12/+20
The virtio-console reset cancels all running jobs. Unfortunately we don't have a good way to prevent the term polling thread from getting in the way, read invalid data during reset and cause a segfault. To prevent this, move all handling of the Rx queue in the threadpool job. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio/p9: Implement reset | Jean-Philippe Brucker | 1 | -0/+16
The p9 reset cancels all running jobs and closes any open fid. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | threadpool: Add cancel() function | Jean-Philippe Brucker | 2 | -1/+26
When resetting a virtqueue, it is often necessary to make sure that the associated threadpool job isn't running anymore. Add a function to cancel a job. A threadpool job has three states: idle, queued and running. A job is queued when it is in the job list. It is running when it is out of the list, but its signal count is greater than zero. It is idle when it is both out of the list and its signal count is zero. The cancel() function simply waits for the job to be idle. It is up to the caller to make sure that the job isn't queued concurrently. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
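The three job states follow directly from two observable facts: whether the job is in the job list, and its signal count. A small sketch of that classification, using a hypothetical job_view struct rather than the real threadpool job type:

```c
#include <stdbool.h>

enum job_state { JOB_IDLE, JOB_QUEUED, JOB_RUNNING };

/* Hypothetical snapshot of the two facts that define a job's state. */
struct job_view {
	bool in_list;		/* job is linked into the job list */
	int  signalcount;	/* pending signals for this job */
};

static enum job_state job_state(const struct job_view *j)
{
	if (j->in_list)
		return JOB_QUEUED;	/* waiting for a worker */
	if (j->signalcount > 0)
		return JOB_RUNNING;	/* out of the list, still signalled */
	return JOB_IDLE;		/* out of the list, nothing pending */
}
```

Under this model a cancel operation amounts to waiting until job_state() reports JOB_IDLE, with the caller ensuring that nothing queues the job again in the meantime.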
2019-01-22 | virtio/blk: Reset virtqueue | Jean-Philippe Brucker | 1 | -22/+45
Move pthread creation to init_vq, and kill the thread in exit_vq. Initialize the virtqueue states at runtime. All in-flight I/O is canceled with the virtqueue pthreads, except for AIO threads, but after reading the code I'm not sure if AIO has ever worked anyway. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio/net: Implement device and virtqueue reset | Jean-Philippe Brucker | 1 | -0/+63
On exit_vq(), clean all resources allocated for the queue. When the device is reset, clean the backend. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio/net: Clean virtqueue state | Jean-Philippe Brucker | 1 | -53/+57
Currently the virtqueue state is mixed with the netdev state. Move it to a separate structure. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | net/uip: Add exit function | Jean-Philippe Brucker | 5 | -16/+108
When resetting the virtio-net queues, the UIP state needs to be reset as well. Stop all threads (one per TCP stream and UDP connection) and free memory. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio: Add reset() callback | Jean-Philippe Brucker | 6 | -11/+37
When the guest writes a status of 0, the device should be reset. Add a reset() callback for the transport layer to reset all queues and their ioeventfd. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio: Add exit_vq() callback | Jean-Philippe Brucker | 4 | -10/+54
Virtio allows individual virtqueues to be reset. For legacy devices, it's done by writing an address of 0 into the PFN register. Modern devices have an "enable" register. Add an exit_vq() callback to all devices. A lot more work is required by each device to clean up their virtqueue state, and by the core to reset things like MSI routes and ioeventfds. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio: Add get_vq() callback | Jean-Philippe Brucker | 10 | -27/+29
To ease future changes to the core, replace get_pfn_vq() with get_vq(). This way adding new generic operation on virtqueues won't require modifying every virtio device. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22 | virtio: Add get_vq_count() callback | Jean-Philippe Brucker | 8 | -0/+45
Modern virtio requires devices to report how many queues they support. Add an operation to query all devices about their capacities. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22virtio: Implement notify_statusJean-Philippe Brucker10-5/+79
Modern virtio requires proper status handling and reset. A "notify_status" callback is already present in the virtio ops, but isn't implemented by any device. Instead they currently use "set_guest_feature" to reset the device and deal with endianness. This isn't sufficient for proper device reset, so add the notify_status callback to all devices that need it. To add useful hints like "start" and "stop", extend the status variable to 32 bits. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> [Julien T: Remove VIRTIO_CONFIG_S_NEEDS_RESET from config mask, as it is a virtio v1+ macro and kvmtool only implements v0.9, so this macro should not be referenced for now] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22ioeventfd: Fix removal of ioeventfdJean-Philippe Brucker1-2/+4
Fix three bugs that prevent removal of ioeventfds in KVM. Store the flags in the right structure, check the datamatch parameter, and pass the fd to KVM. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22arm: Support firmware loadingJulien Thierry3-3/+77
Implement firmware image loading for arm and set the boot start address to the firmware address. Add an option for the user to specify where to map the firmware. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22builtin-run: Do not look for default kernel when firmware is providedJulien Thierry1-8/+16
When a firmware file is provided, kvmtool is not responsible for loading a kernel image. There is no reason for looking for a default kernel image when loading a firmware. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22arm: Move firmware functionJulien Thierry2-10/+10
The firmware loading/setup functions are in the fdt file, although they are not closely related to it. Move them to the file that does the main init/setup for memory. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22rtc: Initialize the Register D for MC146818 RTCSami Mujawar1-0/+8
Some software drivers check the VRT bit (BIT7) of Register D before using the MC146818 RTC. Initialize the VRT bit in rtc__init() to indicate that the RAM and time contents are valid. Signed-off-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
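The VRT initialisation described above can be sketched as follows. This is a minimal illustration, not the actual rtc__init() code: the struct name, the 128-byte cmos_data array and the helper names are assumptions made for the example; only the Register D index (0x0D) and the VRT bit position (bit 7) come from the MC146818 register layout.

```c
#include <stdint.h>
#include <string.h>

#define RTC_REG_D	0x0D		/* Register D index on the MC146818 */
#define RTC_REG_D_VRT	(1 << 7)	/* Valid RAM and Time bit */

/* Hypothetical device state: a flat array standing in for the CMOS RAM. */
struct rtc_sketch {
	uint8_t cmos_data[128];
};

/* Clear the emulated RAM, then mark RAM and time contents as valid. */
static void rtc_init_sketch(struct rtc_sketch *rtc)
{
	memset(rtc->cmos_data, 0, sizeof(rtc->cmos_data));
	rtc->cmos_data[RTC_REG_D] = RTC_REG_D_VRT;
}

/* What a guest driver would typically check before trusting the RTC. */
static int rtc_vrt_is_set(const struct rtc_sketch *rtc)
{
	return !!(rtc->cmos_data[RTC_REG_D] & RTC_REG_D_VRT);
}
```

A driver that reads Register D after rtc_init_sketch() would see 0x80 and proceed to use the clock.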
2019-01-22arm64: Correct ARM64_CORE_REG() size encodings for all core registersDave Martin1-2/+15
ARM64_CORE_REG() is currently only used to generate the KVM register IDs for registers that happen to be 64 bits in size, so KVM_REG_SIZE_U64 is hard-coded in the definition. To enable this macro to generate correct encodings for the FPSIMD registers too (which are a mix of 128-bit and 32-bit registers), this patch extends the macro to encode the correct size for each class of register in KVM_REG_ARM_CORE. The approach is crude, but because the KVM_REG_ARM_CORE ID arrangement is ABI, it's not expected to evolve. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22update_headers: Sync kvm UAPI headers with linux v5.0-rc2Dave Martin6-14/+461
The local copies of the kvm user API headers are getting stale. In preparation for some arch-specific updates, this patch reflects a re-run of util/update_headers.sh to pull in upstream updates from linux v5.0-rc2. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-22guest: Add generated file guest/guest_init.c to .gitignoreDave Martin1-0/+2
guest/guest_init.c is a generated file, but git doesn't currently ignore it. This can be annoying when running git status etc. This patch adds a suitable .gitignore entry for this file. Signed-off-by: Dave Martin <Dave.Martin@arm.com> [will: Do the same for guest/guest_pre_init.c] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-02kvm-cpu: Pause vCPU in signal handlerJulien Thierry1-3/+12
Currently, handling a pause signal only sets a state that will be checked at the beginning of the CPU run loop. At the checking point the vCPU sends the notification that it is actually paused, allowing the pause requester to confirm all vCPUs are paused. Receiving the pause signal during a KVM_RUN ioctl will make KVM exit to userspace. However, there is a small window between that check on cpu->paused and the execution of KVM_RUN where the signal has been received but the vCPU does not go back through the notification and starts KVM_RUN. Since there is no guarantee the vCPU will come back to userspace, the pause requester might deadlock. Perform the pause directly from the signal handler. This relies on a vCPU thread never receiving a pause signal while being paused, but such a scenario would have caused a deadlock for the pause requester anyway. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-02kvm: Do not pause already paused vcpusJulien Thierry2-4/+5
With the following sequence:

    kvm__pause();
    kvm__continue();
    kvm__pause();

There is a chance that not all paused threads have been resumed, and the second kvm__pause will attempt to pause them again. Since the paused thread is waiting to own the pause_lock, it won't write its second pause notification. kvm__pause will be waiting for that notification while owning pause_lock, so... deadlock.

The simple solution is not to try to pause threads that have not had the chance to resume.

Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-02virtio: Fix ordering of virt_queue__available()Jean-Philippe Brucker1-1/+8
After adding buffers to the virtio queue, the guest increments the avail index. It then reads the event index to check if it needs to notify the host. If the event index corresponds to the previous avail value, then the guest notifies the host. Otherwise it means that the host is still processing the queue and hasn't had a chance to increment the event index yet. Once it gets there, the host will see the new avail index and process the descriptors, so there is no need for a notification.

This is only guaranteed to work if both threads write and read the indices in the right order. Currently a barrier is missing from virt_queue__available(), and the host may not see an up-to-date value of event index after writing avail.

            HOST            |           GUEST
                            |
                            |  write avail = 1
                            |  mb()
                            |  read event -> 0
      write event = 0       |    == prev_avail -> notify
      read avail -> 1       |
                            |
      write event = 1       |
      read avail -> 1       |
      wait()                |  write avail = 2
                            |  mb()
                            |  read event -> 0
                            |    != prev_avail -> no notification

By adding a memory barrier on the host side, we ensure that it doesn't miss any notification.

Reviewed-By: Steven Price <steven.price@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
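The host-side ordering described above can be sketched with C11 atomics. This is an illustration under assumptions, not the kvmtool implementation: the struct and function names are hypothetical, the shared indices are modelled as plain atomic fields rather than guest memory, and atomic_thread_fence(memory_order_seq_cst) stands in for the mb() named in the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared parts of a virtqueue. */
struct vq_sketch {
	_Atomic uint16_t avail_idx;	/* written by the guest */
	_Atomic uint16_t avail_event;	/* written by the host */
	uint16_t last_avail_idx;	/* host-private progress counter */
};

/* Host side: publish how far we got, then check for new buffers. */
static bool vq_available_sketch(struct vq_sketch *vq)
{
	/* Tell the guest which avail index we have already seen. */
	atomic_store_explicit(&vq->avail_event, vq->last_avail_idx,
			      memory_order_relaxed);

	/*
	 * Full barrier: the event index store above must be visible
	 * before the avail index load below, otherwise the guest can
	 * read a stale event index and skip the notification.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&vq->avail_idx, memory_order_relaxed)
		!= vq->last_avail_idx;
}
```

The guest performs the mirror-image sequence (write avail, barrier, read event), so with a full barrier on both sides at least one of the two threads observes the other's write.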
2018-08-16ioport: unregister port device when unregistering portJulien Thierry1-0/+2
Ioports register bus devices when they are registered. These devices are not unregistered when the ioport entries containing their headers are unregistered. This results in dangling pointers in the device rb_tree. Unregister ioport bus devices when the ioport is unregistered. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-13Makefile: Try dynamic linkage for bfdJulien Thierry1-1/+7
On Debian Stretch/Ubuntu 14.04, the libbfd provided by libbfd-dev or binutils-dev packages does not like being linked statically. Add a dynamic linkage test when detecting libbfd. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-13Makefile: fix skipped dependenciesJulien Thierry1-10/+12
For some optional dependencies, both static and dynamic linking are tested. But if the first one being tested fails, the dependency is added to the NOTFOUND list and reported as being skipped while it might still be built with the other linkage. Add optional dependencies to NOTFOUND only if both linkages fail. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-06Fix subfolder dependency generationJean-Philippe Brucker1-1/+1
When building an object "foo.o", kvmtool also creates a ".foo.o.d" file, using the dependency generation feature of CPP. This file describes in Makefile format all headers included by foo.c. When one header is modified, make rebuilds all objects that include it. Dependency files in subfolders are currently ignored by make, because the target doesn't contain the right prefix. For example virtio/.blk.o.d has target "blk.o" instead of "virtio/blk.o". As a result, rebuilding kvmtool without first issuing a make clean can introduce sneaky bugs, where different objects use mismatched headers. To write the right targets in dependency files, add a -MT argument to CPP. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19vfio: check reserved regions before mapping DMAJean-Philippe Brucker1-0/+49
Use the new reserved_regions API to ensure that RAM doesn't overlap any reserved region. This prevents, for instance, mapping an MSI doorbell into the guest IPA space. For the moment we reject any overlap. In the future, we might carve reserved regions out of the guest physical space. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19Introduce reserved memory regionsJean-Philippe Brucker2-14/+64
When passing devices to the guest, there might be address ranges unavailable to the device. For instance, if address 0x10000000 corresponds to an MSI doorbell, any transaction from a device to that address will be directed to the MSI controller and might not even reach the IOMMU. In that case 0x10000000 is reserved by the physical IOMMU in the guest's physical space.

This patch introduces a simple API to register reserved ranges of addresses that should not or cannot be provided to the guest. For the moment it only checks that a reserved range does not overlap any user memory (we don't consider MMIO) and aborts otherwise.

It should be possible instead to poke holes in the guest-physical memory map and report them via the architecture's preferred route:

* ARM and PowerPC can add reserved-memory nodes to the DT they provide to the guest.
* x86 could poke holes in the memory map reported with e820. This requires to postpone creating the memory map until at least VFIO is initialized.
* MIPS could describe the reserved ranges with the "memmap=mm$ss" kernel parameter.

This would also require to call KVM_SET_USER_MEMORY_REGION for all memory regions at the end of kvmtool initialisation. Extra care should be taken to ensure we don't break any architecture, since they currently rely on having a linear address space with at most two memory blocks.

This patch doesn't implement any address space carving. If an abort is encountered, user can try to rebuild kvmtool with different addresses or change its IOMMU resv regions if possible.

Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
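The overlap check at the heart of this API can be sketched as below. The struct and function names are hypothetical and do not reflect the actual kvmtool interface; the sketch only shows the interval test between a reserved range and a block of user memory, assuming neither range wraps around the 64-bit address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a guest-physical address range. */
struct mem_range_sketch {
	uint64_t start;
	uint64_t size;
};

/* Return true if the two ranges share at least one address. */
static bool ranges_overlap_sketch(struct mem_range_sketch a,
				  struct mem_range_sketch b)
{
	/* Assumption: ranges are non-empty and do not wrap around. */
	assert(a.size > 0 && b.size > 0);
	assert(a.size - 1 <= UINT64_MAX - a.start);
	assert(b.size - 1 <= UINT64_MAX - b.start);

	uint64_t a_last = a.start + a.size - 1;
	uint64_t b_last = b.start + b.size - 1;

	return a.start <= b_last && b.start <= a_last;
}
```

With the MSI doorbell example from the text, a one-page reserved range at 0x10000000 does not overlap RAM placed at 0x80000000, so registration would succeed; a reserved range inside RAM would trigger the abort.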
2018-06-19vfio: Support non-mmappable regionsJean-Philippe Brucker3-33/+176
In some cases device regions don't support mmap. They can still be made available to the guest by trapping all accesses and forwarding reads or writes to VFIO. Such regions may be:

* PCI I/O port BARs.
* Sub-page regions, for example a 4kB region on a host with 64k pages.
* Similarly, sparse mmap regions. For example when VFIO allows to mmap fragments of a PCI BAR and forbids accessing things like MSI-X tables. We don't support the sparse capability at the moment, so trap these regions instead (if VFIO rejects the mmap).

Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19vfio-pci: add MSI supportJean-Philippe Brucker3-7/+195
Allow guests to use the MSI capability in devices that support it. Emulate the MSI capability, which is simpler than MSI-X as it doesn't rely on external tables. Reuse most of the MSI-X code. Guests may choose between MSI and MSI-X at runtime since we present both capabilities, but they cannot enable MSI and MSI-X at the same time (forbidden by PCI). Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19vfio-pci: add MSI-X supportJean-Philippe Brucker2-12/+691
Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let the kernel inject MSIs from a physical device directly into the guest.

It would be tempting to create the MSI routes at init time before starting vCPUs, when we can afford to exit gracefully. But some of it must be initialized when the guest requests it.

* On the KVM side, MSIs must be enabled after devices allocate their IRQ lines and irqchips are operational, which can happen until late_init.

* On the VFIO side, hardware state of devices may be updated when setting up MSIs. For example, when passing a virtio-pci-legacy device to the guest:

  (1) The device-specific configuration layout (in BAR0) depends on whether MSIs are enabled or not in the device. If they are enabled, the device-specific configuration starts at offset 24, otherwise it starts at offset 20.
  (2) Linux guest assumes that MSIs are initially disabled (doesn't actually check the capability). So it reads the device config at offset 20.
  (3) Had we enabled MSIs early, host would have enabled the MSI-X capability and device would return the config at offset 24.
  (4) The guest would read junk and explode.

Therefore we have to create MSI-X routes when the guest requests MSIs, and enable/disable them in VFIO when the guest pokes the MSI-X capability. We have to follow both physical and virtual state of the capability, which makes the state machine a bit complex, but I think it works.

An important missing feature is the absence of pending MSI handling. When a vector or the function is masked, we should rewire the IRQFD to a special thread that keeps note of pending interrupts (or just poll the IRQFD before recreating the route?). And when the vector is unmasked, one MSI should be injected if it was pending. At the moment no MSI is injected, we simply disconnect the IRQFD and all messages are lost.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19Add PCI device passthrough using VFIOJean-Philippe Brucker8-1/+973
Assigning devices using VFIO allows the guest to have direct access to the device, whilst filtering accesses to sensitive areas by trapping config space accesses and mapping DMA with an IOMMU.

This patch adds a new option to lkvm run: --vfio-pci=<BDF>. Before assigning a device to a VM, some preparation is required. As described in Linux Documentation/vfio.txt, the device driver needs to be changed to vfio-pci:

    $ dev=0000:00:00.0
    $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
    $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
    $ echo $dev > /sys/bus/pci/drivers_probe

Adding --vfio-pci=$dev to lkvm-run will pass the device to the guest. Multiple devices can be passed to the guest by adding more --vfio-pci parameters.

This patch only implements PCI with INTx. MSI-X routing will be added in a subsequent patch, and at some point we might add support for passing platform devices to guests.

Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19Add fls_long and roundup_pow_of_two helpersJean-Philippe Brucker1-0/+14
It's always nice to have a log2 handy, and the vfio-pci code will need to perform power of two allocation from an arbitrary size. Add fls_long and roundup_pow_of_two, based on the GCC builtin. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
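As an illustration of the two helpers named above, here is one plausible way to build them on the GCC builtin. This is a sketch, not necessarily the code that was merged: the use of __builtin_clzl, the handling of zero and the overflow assertion are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

/* Index (1-based) of the most significant set bit, or 0 if x is 0. */
static inline int fls_long(unsigned long x)
{
	/* __builtin_clzl(0) is undefined, so handle zero separately. */
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x) : 0;
}

/* Smallest power of two greater than or equal to x (0 stays 0). */
static inline unsigned long roundup_pow_of_two(unsigned long x)
{
	/* Anything above the top power of two cannot be rounded up. */
	assert(x <= (ULONG_MAX >> 1) + 1);
	return x ? 1UL << fls_long(x - 1) : 0;
}
```

For example, a 5-byte request rounds up to 8, while an 8-byte request stays at 8 because the helper looks at x - 1.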
2018-06-19Import VFIO headersJean-Philippe Brucker1-0/+719
To ensure consistency between kvmtool and the kernel, import the UAPI headers of the VFIO version we implement. This is from Linux v4.12. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19pci: add capability helpersJean-Philippe Brucker2-0/+27
Add a way to iterate over all capabilities in a config space. Add a search function for getting a specific capability. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
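A capability search of the kind described can be sketched as follows. The function name and the flat byte-array view of config space are assumptions for illustration; the offsets used (status register at 0x06 with the capability-list bit 0x10, capabilities pointer at 0x34, ID at +0 and next pointer at +1 in each capability) are the standard PCI layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SIZE_SKETCH	256
#define PCI_STATUS_OFF		0x06	/* status register, low byte */
#define PCI_STATUS_CAP_LIST_BIT	0x10	/* device has a capability list */
#define PCI_CAP_PTR_OFF		0x34	/* offset of the first capability */

/* Return the config-space offset of capability cap_id, or 0 if absent. */
static uint8_t pci_find_cap_sketch(const uint8_t cfg[PCI_CFG_SIZE_SKETCH],
				   uint8_t cap_id)
{
	int ttl = 48;	/* bound the walk in case the list loops */
	uint8_t pos;

	assert(cfg != NULL);

	if (!(cfg[PCI_STATUS_OFF] & PCI_STATUS_CAP_LIST_BIT))
		return 0;

	/* The two low bits of capability pointers are reserved. */
	pos = cfg[PCI_CAP_PTR_OFF] & 0xFC;
	while (pos && ttl--) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xFC;
	}

	return 0;
}
```

An iterator over all capabilities follows the same walk without the ID comparison.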
2018-06-19Extend memory bank API with memory typesJean-Philippe Brucker8-13/+84
Introduce memory types RAM and DEVICE, along with a way for subsystems to query the global memory banks. This is required by VFIO, which will need to pin and map guest RAM so that assigned devices can safely do DMA to it. Depending on the architecture, the physical map is made of either one or two RAM regions. In addition, this new memory types API paves the way to reserved memory regions introduced in a subsequent patch. For the moment we put vesa and ivshmem memory into the DEVICE category, so they don't have to be pinned. This means that physical devices assigned with VFIO won't be able to DMA to the vesa frame buffer or ivshmem. In order to do that, simply changing the type to "RAM" would work. But to keep the types consistent, it would be better to introduce flags such as KVM_MEM_TYPE_DMA that would complement both RAM and DEVICE type. We could then reuse the API for generating firmware information (that is, for x86 bios; DT supports reserved-memory description). Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19irq: add irqfd helpersJean-Philippe Brucker7-22/+135
Add helpers to add and remove IRQFD routing for both irqchips and MSIs. We have to make a special case of IRQ lines on ARM where the initialisation order goes like this:

(1) Devices reserve their IRQ lines
(2) VGIC is setup with VGIC_CTRL_INIT (in a late_init call)
(3) MSIs are reserved lazily, when the guest needs them

Since we cannot setup IRQFD before (2), store the IRQFD routing for IRQ lines temporarily until we're ready to submit them.

Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19pci: allow to specify IRQ type for PCI devicesJean-Philippe Brucker3-1/+11
Currently all our virtual device interrupts are edge-triggered. But we're going to need level-triggered interrupts when passing physical devices. Let the device configure its interrupt kind. Keep edge as default, to avoid changing existing users. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-19pci: add config operations callbacks on the PCI headerJean-Philippe Brucker2-72/+89
When implementing PCI device passthrough, we will need to forward config accesses from a guest to the VFIO driver. Add a private cfg_ops structure to the PCI header, and use it in the PCI config access functions. A read from the guest first calls into the device's cfg_ops.read, to let the backend update the local header before filling the guest register. Same happens for a write, we let the backend perform the write and replace the guest-provided register with whatever sticks, before updating the local header. Try to untangle the PCI config access logic while we're at it. Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> [JPB: moved to a separate patch] Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
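The read and write flow described above can be sketched like this. All names here are hypothetical (the real structure layout and callback signatures may differ); the sketch only shows the ordering: the backend runs first, then the local header and the guest register are reconciled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CFG_SIZE_SKETCH 256

struct pci_dev_sketch;

/* Hypothetical per-device config callbacks. */
struct cfg_ops_sketch {
	void (*read)(struct pci_dev_sketch *dev, uint8_t offset,
		     void *data, int size);
	void (*write)(struct pci_dev_sketch *dev, uint8_t offset,
		      void *data, int size);
};

struct pci_dev_sketch {
	uint8_t header[PCI_CFG_SIZE_SKETCH];	/* local copy of config space */
	struct cfg_ops_sketch cfg_ops;
};

static void cfg_read_sketch(struct pci_dev_sketch *dev, uint8_t offset,
			    void *data, int size)
{
	assert(size > 0 && (int)offset + size <= PCI_CFG_SIZE_SKETCH);

	/* Let the backend refresh the local header first. */
	if (dev->cfg_ops.read)
		dev->cfg_ops.read(dev, offset, data, size);

	/* Then fill the guest register from the local header. */
	memcpy(data, dev->header + offset, size);
}

static void cfg_write_sketch(struct pci_dev_sketch *dev, uint8_t offset,
			     void *data, int size)
{
	assert(size > 0 && (int)offset + size <= PCI_CFG_SIZE_SKETCH);

	/* The backend performs the write and leaves in data what stuck. */
	if (dev->cfg_ops.write)
		dev->cfg_ops.write(dev, offset, data, size);

	/* Only the value that stuck reaches the local header. */
	memcpy(dev->header + offset, data, size);
}

/* Example backend: pretend the hardware keeps only the upper nibbles. */
static void example_backend_write(struct pci_dev_sketch *dev, uint8_t offset,
				  void *data, int size)
{
	uint8_t *bytes = data;

	(void)dev;
	(void)offset;
	for (int i = 0; i < size; i++)
		bytes[i] &= 0xF0;
}
```

With example_backend_write installed, a guest write of 0xAB is stored as 0xA0, and a later read returns 0xA0, mirroring how read-only bits in a physical device would behave.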
2018-05-23arm/gic: move GICv2M gadget size into private headerAndre Przywara3-4/+2
The header files in arm/aarch*/include/asm/ are directly copied from Linux, so we can't just put our own definitions in there. Move the GICv2M MMIO frame size into a more private header, to avoid breaking the build once the header files are synced from Linux. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-23arm/gic: avoid GICv2m MMIO frame overlapAndre Przywara1-1/+1
Currently we accidentally overlap the GICv2m MMIO frame with the CPU interface region. Fix this by moving the v2m frame below the CPUI region. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-05-23arm/gic: remove extra 64K from ITS allocationAndre Przywara1-2/+2
The KVM_VGIC_V3_ITS_SIZE macro from the Linux API header file already covers the doorbell page, so we don't need to add that extra page size in our code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-06virtio/pci: Register a single eventfd for vhostJean-Philippe Brucker1-7/+5
Vhost supports a single eventfd as the kick mechanism. Registering a second one will override the first. To ensure vhost works with our virtio-pci, only register the kick eventfd that is used by the guest. Fixes: a508ea95f954 ("virtio/pci: Use port I/O for configuration registers by default") Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-06ioeventfd: Don't register on the PIO bus if the arch doesn't support itJean-Philippe Brucker6-4/+18
virtio/pci.c registers a notification ioeventfd on both PIO and MMIO buses. But architectures other than x86 cannot differentiate MMIO from PIO traps, and the kernel always calls kvm_io_bus_read/write with KVM_MMIO_BUS as argument. As a result kvmtool's ioeventfd isn't used with virtio PCI, because the kernel can't find it and all accesses to the doorbell return to userspace. To fix it, don't set the PIO flag if the architecture doesn't support it. Fixes: a508ea95f954 ("virtio/pci: Use port I/O for configuration registers by default") Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-06ioeventfd: Always add a new event to the listJean-Philippe Brucker1-12/+11
With vhost, the USER_POLL flag isn't passed to ioeventfd__add_event, so the function returns early and doesn't add the new event to the used_ioevents list. As a result ioeventfd__del_event doesn't remove the KVM event or free the structure. Always add the event to the list. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-19virtio: Clean up next_descJean-Philippe Brucker1-6/+4
The wmb() in next_desc seems out of place and the comments are inaccurate. Remove the unnecessary barrier and clean up next_desc(). next_desc() is called by virt_queue__get_head_iov() when filling the iov with descriptor addresses. It reads the descriptor's flag and next index. The virt_queue__get_head_iov() only reads the direct and indirect descriptors, and doesn't write any shared memory apart from the iov and cursors that will be read by the caller. As far as I can see, vhost (the kernel implementation of virtio device) does well without any barrier here, so I think it might be safe to remove. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-19virtio: Fix ordering of avail index and descriptor readJean-Philippe Brucker1-0/+8
One barrier seems to be missing from kvmtool's virtio implementation, between virt_queue__available() and virt_queue__pop(). In the following scenario "avail" represents the shared "available" structure in the virtio queue:

                Guest               |  Host
                                    |
    avail.ring[shadow] = desc_idx   |  while (avail.idx != shadow)
    smp_wmb()                       |    /* missing smp_rmb() */
    avail.idx = ++shadow            |    desc_idx = avail.ring[shadow++]

If the host observes the avail.idx write before the avail.ring update, then it will fetch the wrong desc_idx. Add the missing barrier.

This seems to fix the horrible bug I'm often seeing when running netperf in a guest (virtio-net + tap) on AMD Seattle. The TX thread reads the wrong descriptor index and either faults when accessing the TX buffer, or pushes the wrong index to the used ring. In that case the guest complains that "id %u is not a head!" and stops the queue.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
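The pairing between the guest's write barrier and the host's read barrier can be sketched with C11 atomics. This is an illustration, not kvmtool code: the struct, the function names and the ring size are hypothetical, and release/acquire ordering on the index stands in for the smp_wmb()/smp_rmb() pair from the scenario.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE_SKETCH 8

/* Hypothetical model of the shared "available" structure. */
struct avail_sketch {
	_Atomic uint16_t idx;
	uint16_t ring[RING_SIZE_SKETCH];
};

/* Guest side: fill the ring slot, then publish the new index. */
static void guest_publish_sketch(struct avail_sketch *avail, uint16_t desc_idx)
{
	uint16_t shadow = atomic_load_explicit(&avail->idx,
					       memory_order_relaxed);

	avail->ring[shadow % RING_SIZE_SKETCH] = desc_idx;
	/* Release ordering plays the role of smp_wmb(). */
	atomic_store_explicit(&avail->idx, (uint16_t)(shadow + 1),
			      memory_order_release);
}

/* Host side: check the index, then read the ring slot. */
static bool host_pop_sketch(struct avail_sketch *avail, uint16_t *last_avail,
			    uint16_t *desc_idx)
{
	assert(avail && last_avail && desc_idx);

	/* Acquire ordering plays the role of the missing smp_rmb(). */
	uint16_t idx = atomic_load_explicit(&avail->idx,
					    memory_order_acquire);
	if (idx == *last_avail)
		return false;

	*desc_idx = avail->ring[*last_avail % RING_SIZE_SKETCH];
	(*last_avail)++;
	return true;
}
```

Without the acquire on the host side, the load of the ring slot could be satisfied before the load of the index, which is the reordering the commit fixes.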
2018-01-29virtio/pci: Use port I/O for configuration registers by defaultJean-Philippe Brucker1-3/+3
Modern virtio PCI is allowed to use both memory and I/O BARs for the config space, but legacy devices must use I/O for BAR0, as specified by Virtio v1.0 cs04: 4.1.5.1.1.1 Legacy Interface: A Note on Device Layout Detection "Transitional devices MUST expose the Legacy Interface in I/O space in BAR0." What virtio calls "I/O space" is most certainly port I/O, as hinted by the discussion in 4.1.4 Virtio Structure PCI Capabilities, where it distinguishes "memory BARs" from "I/O BARs". This is also the conclusion made by SeaBIOS [1], which only looks for port I/O in BAR0 when driving a transitional device. I think MMIO was made the default by a463650caad6 ("kvm tools: pci: add MMIO interface to virtio-pci devices") to support ARM targets, but we support PIO as well as MMIO nowadays. So let's make the legacy virtio implementation comply with the specification and use port I/O for BAR0. [1] https://patchwork.kernel.org/patch/10038927/ Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-01-29virtio: Support drivers that don't negotiate VIRTIO_RING_F_EVENT_IDXJean-Philippe Brucker2-2/+17
Bad things happen when the VIRTIO_RING_F_EVENT_IDX feature isn't negotiated and we try to write the avail_event anyway. SeaBIOS, for example, stores internal data where avail_event should be [1].

Technically the Virtio specification doesn't forbid the device from writing the avail_event, and it's up to the driver to reserve space for it ("the transitional driver [...] MUST allocate the total number of bytes for the virtqueue according to [formula containing the avail event]"). But it doesn't hurt us to avoid writing avail_event, and kvmtool needs changes for interrupt suppression anyway, in order to comply with the spec. Indeed Virtio 1.0 cs04 says, in 2.4.7.2 Device Requirements: Virtqueue Interrupt Suppression:

"""
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
* The device MUST ignore the used_event value.
* After the device writes a descriptor index into the used ring:
  - If flags is 1, the device SHOULD NOT send an interrupt.
"""

So let's do that.

[1] https://patchwork.kernel.org/patch/10038931/

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
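The interrupt decision implied by the quoted requirement can be sketched as below. The function names and parameters are hypothetical; the flag value and the wrap-around comparison follow the definitions of VRING_AVAIL_F_NO_INTERRUPT and vring_need_event() in the Linux virtio_ring.h header.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Same arithmetic as vring_need_event() in Linux's virtio_ring.h. */
static bool need_event_sketch(uint16_t event_idx, uint16_t new_idx,
			      uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}

/* Decide whether the device should interrupt the guest. */
static bool should_signal_sketch(bool event_idx_negotiated,
				 uint16_t avail_flags, uint16_t used_event,
				 uint16_t new_used, uint16_t old_used)
{
	/*
	 * Without EVENT_IDX the used_event value is ignored and only
	 * the driver's flags field controls suppression.
	 */
	if (!event_idx_negotiated)
		return !(avail_flags & VRING_AVAIL_F_NO_INTERRUPT);

	return need_event_sketch(used_event, new_used, old_used);
}
```

Keeping the negotiated feature bits in the device structure is what makes the first branch possible in the core code.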
2018-01-29virtio: Save negotiated featuresJean-Philippe Brucker4-3/+16
We're going to need the feature bits negotiated between host and guest in the core code. Save them in the virtio_device structure. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-12-14virtio-console: Fix pthread_cond initialization raceJean-Philippe Brucker1-2/+1
When characters are input on the console before virtio_console is initialized, the term.c poll thread will get stuck in virtio_console__inject_interrupt, because it ends up doing pthread_cond_wait on the uninitialized poll_cond, which will hang indefinitely. As a result it becomes impossible to input characters into the guest, even when using serial instead of virtio console. Initialize poll_cond statically to prevent this race. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
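Static initialisation of the kind described can be sketched as follows. The variable and function names are hypothetical and the pending flag is an assumption added so that the wait has a predicate; the point is only that PTHREAD_COND_INITIALIZER makes the condition variable valid before any init function runs.

```c
#include <pthread.h>

/* Valid from program start: no window where a waiter sees garbage. */
static pthread_mutex_t poll_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond_sketch = PTHREAD_COND_INITIALIZER;
static int input_pending_sketch;

/* Producer: record that input arrived and wake the waiter. */
static void input_notify_sketch(void)
{
	pthread_mutex_lock(&poll_lock_sketch);
	input_pending_sketch = 1;
	pthread_cond_signal(&poll_cond_sketch);
	pthread_mutex_unlock(&poll_lock_sketch);
}

/* Consumer: block until input is pending, then consume it. */
static int input_wait_sketch(void)
{
	pthread_mutex_lock(&poll_lock_sketch);
	while (!input_pending_sketch)
		pthread_cond_wait(&poll_cond_sketch, &poll_lock_sketch);
	input_pending_sketch = 0;
	pthread_mutex_unlock(&poll_lock_sketch);
	return 1;
}
```

Because the initialiser is a compile-time constant, it does not matter whether the poll thread or the device init code touches the condition variable first.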
2017-11-03irq.h: fix compilation error due to missing bool typeAndre Przywara1-0/+1
Commit f6108d72e977 ("Add GICv2m support") introduced a bool return type, but did not include the respective header (this was probably part of a former prerequisite series). Fix this by including the header. Fixes: f6108d72e977cce00e7bc824acd1d73da8cc9729 ("Add GICv2m support") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03Add GICv2m supportJean-Philippe Brucker10-16/+236
GICv2m is a small extension to the GICv2 architecture, specified in the Server Base System Architecture (SBSA). It adds a set of registers to convert MSIs into SPIs, effectively enabling MSI support for pre-GICv3 platforms. Implement a GICv2m emulation entirely in userspace. Add a thin translation layer in irq.c to catch the MSI->SPI routing setup of the guest, and then transform irqfd injection of MSI into the associated SPI. There shouldn't be any significant runtime overhead compared to gicv3-its. The device can be enabled by passing "--irqchip gicv2m" to kvmtool. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03Prevent segfault when kvm_pause is called too earlyJean-Philippe Brucker1-1/+1
When kvm_pause is called early (from taking the rwlock), it segfaults because the CPU array is initialized slightly later. Fix this. This doesn't happen at the moment but the gicv2m patch will register an MMIO region, which requires br_write_lock. gicv2m is instantiated by kvm__arch_init from within core_init (level 0). The CPU array is initialized later in base_init (level 1). Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03builtin-run: Fix console= parameter concatenationJean-Philippe Brucker1-3/+3
Commit 5857730ceee5 ("builtin-run: Pass console= parameter based on active console") adds a console parameter to the kernel command line, but doesn't account for x86 kvm__arch_set_cmdline populating real_cmdline without adding a space. Fix the concatenation. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-25builtin-run: Pass console= parameter based on active consoleWill Deacon2-3/+21
x86 already does this in the backend, but doing it in the generic code means that it is possible to boot a defconfig arm64 kernel under kvmtool without having to specify any additional parameters at all. Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-24arm: Allow all terminal ports to be bi-directionalWei Chen1-4/+2
In kvmtool, the terminal has at most 4 term-devices. These term-devices can connect to serial8250 or virtio console ports. kvmtool has a loop thread that detects incoming data on these term-devices and then sends the data to the guest through serial8250 or virtio console ports. On x86, kvmtool allows reading data from all 4 term-devices. But on ARM, we only support reading data from the first term-device. The data from the other term-devices is ignored. Currently, we're adding kvmtool support to runv (a kind of hyper container) with the Hyperhq guys. Here we're using 3 serial ports in the guest to communicate with the host (container runtime). On x86 it works fine, but on ARM it does not, because we're using terminal 2 to send/receive control messages, but terminal 2 is unidirectional. So change kvm__arch_read_term for ARM to allow reading data from all term-devices. Signed-off-by: Wei Chen <Wei.Chen@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-09arm64: Allow random seed to be specified for KASLRWill Deacon3-1/+6
Fully fledged bootloaders should really be populating this from within the guest using virtio-rng, but having a way to specify it on the cmdline is useful for developers or users without a bootloader. Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-30Makefile: avoid using linker for embedding guest_init binariesMarc Zyngier3-14/+21
At the moment we use the linker to convert the compiled guest_init binary into an ELF object file, so it can be embedded into the kvmtool binary and accessed later easily at runtime. Now this has two problems: 1) This approach does not work for MIPS, because the linker defaults to a different ABI than the compiler, so the GCC generated object files are not compatible with this converted binary. 2) The size symbol as it's used at the moment in the object file is subject to relocation, which leads to wrong results when using PIE builds, which is now the default for some distributions. Fix those two problems at once by using some shell tools to create a C source file containing the guest_init binary, which then gets compiled into a proper object file with the normal compiler and its flags. The size of the guest init binaries is now simply a variable, which does not get mangled at all. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-30Makefile: properly express guest_init dependencyAndre Przywara1-9/+15
So far the generation of the guest_init binaries is not properly modelled in the Makefile: the intermediate object files are not targets. This leads to failures when those files get deleted. So (also in preparation for the upcoming rework) rework the dependency chain to have those intermediate files covered as well, which involves splitting the generation into two steps. On the way use automatic variables where applicable and remove the explicit listing of the guest_init targets, which are now covered by the final $(GUEST_OBJS) targets. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-30net: Check UFO offloading support for tap driverWei Chen1-34/+80
In Linux commit fb652fdfe83710da0ca13448a41b7ed027d0a984: https://www.spinics.net/lists/netdev/msg443562.html UFO support was removed. If we use tap mode for the network (--network mode=tap,tapif=...), we get the following error: "Warning: Config tap device TUNSETOFFLOAD error You have requested a TAP device, but creation of one has failed because: Invalid argument" So, if we're running with the latest kernel, we should remove TUN_F_UFO from TAP init. But if we're running with older kernels without the above commit, we would lose the UFO feature. In this case, we should check the kernel's UFO support status for the tap driver. The tap UFO state will be used in get_host_features to return the correct VIRTIO_NET features. If we defer the tap UFO support check to virtio_net__tap_init, it will be too late. So we separate the tap creation code from tap_init into a standalone function. This new function will be used in virtio_net_init to create the tap device and check the tap UFO support status at the very beginning. Signed-off-by: Wei Chen <Wei.Chen@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-30x86/kvm-cpu.c: don't include <asm/msr-index.h>Thomas Petazzoni1-1/+16
Since kernel commit 25dc1d6cc3082aab293e5dad47623b550f7ddd2a ("x86: stop exporting msr-index.h to userland"), <asm/msr-index.h> is no longer exported to userspace. Therefore, any toolchain built with kernel headers >= 4.12 will no longer have this header file, causing a build failure in kvmtool. As a replacement, this patch includes inside x86/kvm-cpu.c the necessary MSR_* definitions. Reviewed-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09ARM: fdt: Bump CPU_NAME_MAX_LEN to avoid silly GCC warningWill Deacon1-1/+1
GCC 7 warns about truncating the mpidr when we print the cpu_name into the device tree: arm/fdt.c: In function ‘setup_fdt’: arm/fdt.c:58:45: error: ‘%lx’ directive output may be truncated writing between 1 and 10 bytes into a region of size 7 [-Werror=format-truncation=] snprintf(cpu_name, CPU_NAME_MAX_LEN, "cpu@%lx", mpidr); Fix this by bumping the buffer to 15 bytes. We really only need 11 bytes, but GCC isn't smart enough to identify that we mask out the top bits of the MPIDR and the analysis just seems to be based on types. Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09kvmtool: makedev should be sourced from sysmacrosJeremy Linton2-1/+1
makedev() should be sourced from sys/sysmacros.h rather than sys/types.h. This is because glibc is moving away from having it available in types.h. https://patchwork.ozlabs.org/patch/611994/ Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm64: enable GICv3-ITS emulationAndre Przywara2-1/+3
With everything in place for the ITS emulation add a new option to the --irqchip parameter to allow the user to specify --irqchip=gicv3-its to enable the ITS emulation. This will trigger creating the FDT node and an ITS register frame to tell the kernel we want ITS emulation in the guest. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09extend GSI IRQ routing to take a device IDAndre Przywara4-6/+15
For ITS emulation we need the device ID along with the MSI payload and doorbell address to identify an MSI, so we need to put it in the GSI IRQ routing table too. There is a per-VM capability by which the kernel signals the need for a device ID, so check this and put the device ID into the routing table if needed. For PCI devices we take the bus/device/function triplet and add that to the routing setup call. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm: setup SPI IRQ routing tablesAndre Przywara1-0/+32
Since we soon start using GSI routing on ARM platforms too, we have to setup the initial SPI routing table. Before the first call to KVM_SET_GSI_ROUTING, the kernel holds this table internally, but this is overwritten with the ioctl, so we have to explicitly set it up here. The routing is actually not used for IRQs triggered by KVM_IRQ_LINE, but it needs to be here anyway. We use a simple 1:1 mapping. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09PCI: inject PCI device ID on MSI injectionAndre Przywara3-1/+11
The ITS emulation requires a unique device ID to be passed along the MSI payload when kvmtool wants to trigger an MSI in the guest. According to the proposed changes to the interface add the PCI bus/device/function triple to the structure passed with the ioctl. Check the respective capability before actually adding the device ID to the kvm_msi struct. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
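The commit message does not spell out how the bus/device/function triple is encoded. A minimal sketch, assuming the conventional 16-bit PCI requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0); `pci_bdf_to_devid` is a hypothetical name and kvmtool's actual helper may differ:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack bus/device/function into the conventional
 * PCI requester ID. The layout is an assumption for illustration, not
 * taken from the kvmtool source.
 */
static uint32_t pci_bdf_to_devid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);	/* 5-bit device, 3-bit function */

	return ((uint32_t)bus << 8) | ((uint32_t)dev << 3) | fn;
}
```

With this layout, device 1 function 0 on bus 0 maps to ID 8, and the value would only be placed in the MSI structure once the capability check mentioned above has succeeded.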
2017-06-09add kvm__supports_vm_extension()Andre Przywara2-0/+29
KVM capabilities can be per-VM, in this case the ioctl should be issued on the VM file descriptor, not on the system fd. Since this feature is guarded by a (system) capability itself, wrap the call into a function of its own. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm: FDT: create MSI controller DT nodeAndre Przywara3-2/+27
The ARM GICv3 ITS requires a separate device tree node to describe the ITS. Add this as a child to the GIC interrupt controller node to let a guest discover and use the ITS if the user requests it. Since we now need to specify #address-cells for the GIC node, we have to add two zeroes to the interrupt map to match that. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm: allow creation of an MSI register frame regionAndre Przywara2-0/+65
The GICv3 ITS expects a separate 64K page to hold ITS registers. Add a function to reserve such a page in the guest's I/O memory and use that for the ITS vGIC type. To cover the 64K page with the MSI doorbell (which directly follows the page with the register frames), we reserve this as well, although the guest is never expected to write into this. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm: allow vGICv3 emulationVladimir Murzin5-16/+3
KVM/arm recently got support for vGICv3 (and vITS), which is evident in the updated header file. So as now ARM has feature parity when it comes to the GIC emulation, we can remove the special defines we had in place to allow compilation for ARM(32). For simplicity we now use 64K sized GIC regions everywhere, as GICv3 mandates them. [Andre: some update, reword commit message] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09update public Linux headers for GICv3 ITS emulationAndre Przywara6-9/+117
The GICv3 ITS emulation brings some additions to the headers, so let's update kvmtool's version of the headers to Linux' v4.11-rc7-57. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09PCI: Only allocate IRQ routing entry when availableAndre Przywara1-3/+26
If we need to inject an MSI into the guest, we rely at the moment on a working GSI MSI routing functionality. However we can get away without IRQ routing, if the host supports MSI injection via the KVM_SIGNAL_MSI ioctl. So we try the GSI routing first, but if that fails due to a missing IRQ routing functionality, we fall back to KVM_SIGNAL_MSI (if that is supported). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09virtio: fix endianness check for vhost supportAndre Przywara2-3/+8
Currently we deny any VHOST_* functionality if the architecture supports guests with different endianness than the host. Most of the time even on those architectures the endianness of guest and host are the same, though, so we are denying the glory of VHOST needlessly. Switch from compile time determination to a run time scheme, which takes the actual endianness of the guest into account. For this we change the semantics of VIRTIO_ENDIAN_HOST to return the actual endianness of the host (the endianness of kvmtool at compile time, really). The actual check in vhost_net now compares this against the guest endianness. This enables vhost support on ARM and ARM64. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09MSI-X: update GSI routing after changed MSI-X configurationAndre Przywara3-10/+80
When we set up GSI routing to map MSIs to KVM's GSI numbers, we write the current device's MSI setup into the kernel routing table. However the device driver in the guest can use PCI configuration space accesses to change the MSI configuration (address and/or payload data). Whenever this happens after we have setup the routing table already, we must amend the previously sent data. So when MSI-X PCI config space accesses write address or payload, find the associated GSI number and the matching routing table entry and update the kernel routing table (only if the data has changed). This fixes vhost-net, where the queue's IRQFD was setup before the MSI vectors. To avoid issues, we ignore writes to the PBA region. The spec says: "Software should never write, and should only read Pending Bits. If software writes to Pending Bits, the result is undefined." Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09irq: move IRQ routing into irq.cAndre Przywara9-100/+114
The current IRQ routing code in x86/irq.c is mostly implementing a generic KVM interface which other architectures may use too. Move the code to set up an MSI route into the generic irq.c file and guard it with the KVM_CAP_IRQ_ROUTING capability to return an error if the kernel does not support interrupt routing. This also removes the dummy implementations for all other architectures and only leaves the x86 specific code in x86/irq.c. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09arm: use static DT phandle for the GICAndre Przywara8-15/+15
As KVM supports only one (v)GIC per guest and it's hard to imagine that we will ever need more than that, let's simplify the FDT generation by not passing that single, constant phandle around. Let's just reference that one global symbol from enum phandles instead. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-09FDT: use static phandlesAndre Przywara9-12/+41
The current implementation of fdt__alloc_phandle() suffers from being implemented in a static inline function situated in a header file. This will only create expected results within a single compilation unit. It seems a bit over the top to use a function to allocate phandles, when at the end of the day a phandle is just a unique identifier. To simplify things - especially with upcoming patches - we just introduce an enum per architecture to hold all possible phandle sources and use that instead of the dynamic allocation. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-24kvmtool: Switch serial input to raw modeMarc Zyngier1-0/+1
As I was trying to install a new VM using the Debian installer, I noticed that the return key would work just fine in a shell, but wouldn't do anything in the menu. Pretty annoying. Further investigation showed that the terminal was left in cooked mode, converting CR to LF, and thus giving the VM the wrong information. Clearing the ICRNL flag in the input flag set fixes the issue. Suggested-by: Dave Martin <dave.martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-02-17kvmtool: virtio-net: fix VIRTIO_NET_F_MRG_RXBUF usage in rx threadWill Deacon3-23/+42
When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature, the first buffer added to the used ring should indicate the total number of buffers used to hold the packet. Unfortunately, kvmtool has a number of issues when constructing these merged buffers: - Commit 5131332e3f1a ("kvmtool: convert net backend to support bi-endianness") introduced a strange loop counter, which resulted in hdr->num_buffers being set redundantly the first time round - When adding the buffers to the ring, we actually add them one-by-one, allowing the guest to see the header before we've inserted the rest of the data buffers... - ... which is made worse because we non-atomically increment the num_buffers count in the header each time we insert a new data buffer Consequently, the guest quickly becomes confused in its net rx code and the whole thing grinds to a halt. This is easily exemplified by trying to boot a root filesystem over NFS, which seldom succeeds. This patch resolves the issues by allowing us to insert items into the used ring without updating the index. Once the full payload has been added and num_buffers corresponds to the total size, we *then* publish the buffers to the guest. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-02-01virtio: Describe virtio coherency in DTRobin Murphy2-0/+2
We use cacheable accesses on our end of the virtio ring, so make sure the guest is aware of that, and thus doesn't try to use non-cacheable DMA buffers, by including the dma-coherent property on its DT node. Signed-off-by: Robin Murphy <robin.murphy@arm.com> [will: do the same for the PCI node for virtio-pci devices] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-12-20README: suggest a format.subjectprefixAndrew Jones1-2/+7
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-28kvmtool: 9p: fix a buffer overflow in rel_to_absG. Campana1-13/+16
Make use of get_full_path_helper() instead of sprintf. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-28kvmtool: 9p: fix check for snprintf truncation of full_pathG. Campana1-1/+1
The check on the return value of snprintf should reuse the size parameter, rather than take sizeof(full_path) as the bound. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
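A minimal sketch of the truncation check, assuming the destination buffer reaches the function as a pointer plus an explicit size (in which case `sizeof(full_path)` would measure the pointer, not the buffer). `join_path` is a hypothetical name, not the kvmtool function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<root>/<path>" in full_path.
 * snprintf() returns the length the full string would have had, so a
 * return value >= size means the output was truncated.
 * Returns 0 on success, -1 on error or truncation.
 */
static int join_path(char *full_path, size_t size,
		     const char *root, const char *path)
{
	assert(full_path != NULL && root != NULL && path != NULL);

	int ret = snprintf(full_path, size, "%s/%s", root, path);

	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

The comparison uses `>=` because the size passed to `snprintf` includes the terminating NUL, whereas its return value does not.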
2016-11-18kvmtool: 9p: refactor fixes with get_full_path()G. Campana1-75/+36
The code responsible for path verification is identical in several functions. Move it to a new function. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-18kvmtool: 9p: fix strcpy vulnerabilitiesG. Campana1-16/+55
Use strncpy instead of strcpy to avoid buffer overflow vulnerabilities. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> [will: keep strcpy when we've verified the size already] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-18kvmtool: 9p: fix sprintf vulnerabilitiesG. Campana1-11/+70
Use snprintf instead of sprintf to avoid buffer overflow vulnerabilities. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-18kvmtool: 9p: fix path traversal vulnerabilitiesG. Campana1-0/+55
A path traversal exists because the guest can send "../" sequences to the host 9p handlers. To fix this vulnerability, we ensure that path components sent by the guest don't contain "../" sequences. Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
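One way to express such a check is to walk the '/'-separated components and reject any that is exactly `..`. This is a sketch under that assumption, with a hypothetical function name; the actual kvmtool check may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check (not kvmtool's exact code): return false if any
 * '/'-separated component of a guest-supplied path is exactly "..".
 */
static bool path_is_safe(const char *path)
{
	assert(path != NULL);

	while (*path) {
		size_t len = strcspn(path, "/");  /* length of this component */

		if (len == 2 && path[0] == '.' && path[1] == '.')
			return false;

		path += len;
		if (*path == '/')
			path++;			  /* skip the separator */
	}
	return true;
}
```

Comparing whole components avoids rejecting harmless names such as `a..b` while still catching a trailing `..` that has no slash after it.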
2016-11-05kvmtool: Makefile: disable PIE build for bios and pre_initRiku Voipio1-2/+7
Latest Debian and Ubuntu GCC default to PIE code. Disable PIC for bios and PIE for pre_init. Since the flag -no-pie is not available on older GCC's, make use of flag only if the option is available. -fno-pic is more widely available and should be safe to enable unconditionally. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-08-09kvmtool: ARM: madvise mergeable and hugepage separatelyStefan Agner1-1/+4
The madvise behavior is not a bit field and hence can not be or'ed. Also madvise_behavior_valid checks the flag using a case statement hence only one behavior is supposed to be supplied. Call madvise twice, once for MERGEABLE and once for HUGEPAGE. Acked-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29kvmtool: remove redundant calls to lseekWill Deacon4-18/+0
open() sets the file offset to the beginning of the file, so there's no need for an explicit lseek when called in kvm__arch_load_kernel_image. Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-15kvmtool: devices.c: Update parent when inserting into rbtreeJames Morse1-0/+1
When walking the devices rbtree to insert a node, we must keep track of the parent node when we descend. If we skip this step, we always insert new nodes with a NULL parent, bypassing __rb_insert()s rebalance code. Things get worse when we come to walk the tree, as we can't move up a level. This isn't a problem in practice, as all devices appear to be inserted in-order, so our rbtree is actually a monochrome linked list. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
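The descent pattern can be illustrated on a plain binary search tree with parent pointers. This sketch leaves out rebalancing entirely and uses its own hypothetical `struct node`, not the kernel-style rbtree API that kvmtool uses.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int key;
	struct node *parent, *left, *right;
};

/*
 * Sketch: insert `n` into the tree rooted at *root. While descending,
 * remember the last node visited so the new node is linked with the
 * correct parent instead of NULL.
 */
static void tree_insert(struct node **root, struct node *n)
{
	assert(root != NULL && n != NULL);

	struct node **link = root, *parent = NULL;

	while (*link) {
		parent = *link;		/* the step the fix adds */
		link = n->key < parent->key ? &parent->left : &parent->right;
	}

	n->parent = parent;
	n->left = n->right = NULL;
	*link = n;
}
```

Without the `parent = *link` assignment, every node would be linked with a NULL parent, which is the situation the commit message describes.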
2016-06-14kvmtool/arm: Fix timer triggerMarc Zyngier1-4/+4
KVM exposes a level triggered timer to the guest, and yet kvmtool presents it as being edge-triggered in the DT. Let's fix it and match what the kernel exposes. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-09gitignore: fix cscope ignoringAndrew Jones1-1/+1
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-17kvmtool: add script for updating kernel headersAndre Przywara1-0/+35
From time to time (when new KVM kernel features get enabled in kvmtool), we need to update the public kernel headers from a recent Linux tree. Provide a script that makes sure we get the right files and that also covers every architecture. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-17kvmtool: headers: update to Linux v4.6 releaseAndre Przywara3-4/+40
Update our copy of the KVM header files to match the kernel's v4.6.0. This fixes the ARM PMU support, where the feature identifier was changed during the merge window due to a merge conflict. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-18kvmtool: Change readdir_r to readdirMichal Rostecki1-8/+8
readdir_r is deprecated[1] and usage of readdir is recommended. [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=7584a3f96de88d5eefe5d6c634515278cbfbf052 Signed-off-by: Michal Rostecki <michal.rostecki@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-14kvmtool: delegate exit/reboot responsibility to vcpu0Will Deacon3-25/+12
Our exit/reboot code is a bit of a mess: - Both kvm__reboot and kvm_cpu_exit send SIGKVMEXIT to running vcpus - When vcpu0 exits, the main thread starts executing destructors (exitcalls) whilst other vcpus may be running - The pause_lock isn't always held when inspecting is_running for a vcpu This patch attempts to fix these issues by restricting the exit/reboot path to vcpu0 and the main thread. In particular, a KVM_SYSTEM_EVENT will signal SIGKVMEXIT to vcpu0, which will join with the main thread and then tear down the other vcpus before invoking any destructor code. Acked-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-11Implement spapr pci for little endian systems.Balbir Singh1-11/+14
Port the spapr_pci implementation for ppc64le. Based on suggestions by Alexey Kardashevskiy <aik@ozlabs.ru> We should have always used phys_hi and 64 bit addr and size. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-11Implement H_SET_MODE for ppc64leBalbir Singh4-3/+82
Use the infrastructure for queuing a task to a specific vCPU and set ILE (Little Endian Interrupt Handling) on power via the h_set_mode hypercall. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-11Add basic infrastructure to run tasks on vCPUsMichael Ellerman7-0/+84
This patch adds kvm_cpu__run_on_all_cpus() to run a task on each vCPU. This infrastructure uses signals to signal the vCPU to allow a task to be added to each vCPU's task. The vCPU executes any pending tasks in the cpu run loop. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-11Add basic little endian support for ppc64le.Balbir Singh2-14/+15
Currently kvmtool works well/was designed for big endian ppc64 systems. This patch adds support for little endian systems. The system does not yet boot, as support for h_set_mode is required to help with exceptions in big endian mode -- first page fault. The support comes in the next patch of the series. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-16kvmtool/tests: fix iso build on debianRiku Voipio1-1/+9
Debian and some other distros don't provide mkisofs due to licensing concerns. xorrisofs from package xorriso provides a command-line compatible command in this case. Update the makefile of tests to pick xorrisofs if mkisofs is not available. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-11kvmtool: arm: Work around missing PMU on AArch32Marc Zyngier1-0/+4
We don't have PMU support on 32bit ARM just yet, so let's work around this the ugly way for now. Cc: Will Deacon <will.deacon@arm.com> Reported-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-02Documentation: remove documentation stubs and common-cmds.h generationAndre Przywara15-263/+19
Now that we have a manpage in place, we can remove the manpage-style text files from the Documentation directory. This allows us also to get rid of the crude common-cmds.h generation, which relied on these files and on a command-list.txt file. Instead include the version of that header file generated with the current HEAD into the source tree. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-02Add a rudimentary manpageAndre Przywara1-0/+226
The kvmtool documentation is somewhat lacking, also it is not easily accessible when living in the source tree only. Add a good ol' manpage to document at least the basic commands and their options. This level of documentation matches the one that is already there in the Documentation directory and should be subject to extension. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-02arm64: Add PMUv3 supportMarc Zyngier7-3/+91
In order to enable the in-kernel PMU emulation code, add a tiny bit of setup code that initializes the PMU on each CPU and populates the DT. The IRQ is hardcoded to PPI7 (INTID23) in order to match what QEMU does. The code is enabled when the --pmu option is passed to lkvm. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-02arm64: Update kernel includesMarc Zyngier2-5/+91
In order to enable the PMU support on arm64, update the copy of the kernel include files. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-01kvmtool: Makefile: fix indentation of warning stanzaMaciek Borzecki1-1/+1
If a static libc is not present in the system the build will fail with make complaining about commands starting before the first target. The patch fixes indentation of a warning about missing static libc, thus fixing the build. Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18arm: move kernel loading into arm/kvm.cAndre Przywara2-94/+89
For some reasons (probably to have easy access to the command line) the kernel loading for arm and arm64 was located in arm/fdt.c. Move the routines to kvm.c (where other architectures put it) to only have real device tree code in fdt.c. We use the pointer in struct kvm to access the command line string. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18arm/arm64: use read_file() in kernel and initrd loadingAndre Przywara1-22/+18
Use the new read_file() wrapper in our arm/arm64 kernel image loading function instead of the private implementation. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18x86: use read wrappers in kernel loadingAndre Przywara1-21/+14
Replace the unsafe read-loops in the x86 kernel image loading functions with our safe read_file() and read_in_full() wrappers. This should fix random fails in kernel image loading, especially from pipes and sockets. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18MIPS: use read wrappers in kernel loadingAndre Przywara1-18/+18
Replace the unsafe read-loops used in the MIPS kernel image loading with our safe read_file() and read_in_full() wrappers. This should fix random fails in kernel image loading, especially from pipes and sockets. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18powerpc: use read_file() in kernel and initrd loadingAndre Przywara1-16/+20
Replace the unsafe read-loops in the powerpc kernel image loading function with our new and safe read_file() wrapper. This should fix random fails in kernel image loading, especially from pipes and sockets. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-18provide generic read_file() implementationAndre Przywara2-0/+23
In various parts of kvmtool we simply try to read files into memory, but fail to do so in a safe way. The read(2) syscall can return early having only parts of the file read, or it may return -1 due to being interrupted by a signal (in which case we should simply retry). The ARM code seems to provide the only safe implementation, so take that as an inspiration to provide a generic read_file() function usable by every part of kvmtool. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
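The two failure modes named above (short reads and interruption by a signal) can be handled with a loop like the following. This is a sketch under stated assumptions, not necessarily what kvmtool's read_file() looks like; `read_all` is a hypothetical name and its return convention is chosen here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of a safe read loop: keep reading until end of file or until
 * the buffer is full, retry when a signal interrupts the call, and
 * accumulate short reads.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t read_all(int fd, void *buf, size_t max_size)
{
	assert(buf != NULL);

	char *p = buf;
	size_t total = 0;

	while (total < max_size) {
		ssize_t n = read(fd, p + total, max_size - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			return -1;
		}
		if (n == 0)			/* end of file */
			break;
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

Short reads are most common on pipes and sockets, which matches the loading failures the follow-up commits mention.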
2015-11-18Refactor kernel image loadingAndre Przywara6-58/+46
Let's face it: Kernel loading is quite architecture specific. Don't claim otherwise and move the loading routines into each architecture's responsibility. This introduces kvm__arch_load_kernel(), which each architecture can implement accordingly. Provide bzImage loading for x86 and ELF loading for MIPS as special cases for those architectures (removing the arch specific code from the generic kvm.c file on the way) and rename the existing "flat binary" loader functions for the other architectures to the new name. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-11kvmtool: Makefile: remove static dependency files when make cleanJames Hunt1-1/+4
After make lkvm-static & make clean, the dependency files for static objects (.xxx.static.o.d) are not removed. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com> Signed-off-by: James Hunt <james.o.hunt@intel.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-09kvmtool: Makefile: remove LDFLAGS from guest_init linkingAndre Przywara1-2/+2
Looking back at the HEAD from a few commits ago, it's obvious that using the LDFLAGS variable for linking the guest_init binary was rather pointless, as it was zeroed in the beginning and then never set. As guest_init is a rather special binary that does not cope well with arbitrary linker flags, let's reinstantiate the previous state by removing the LDFLAGS variable from those linking steps. This allows LDFLAGS to be used for linking the actual kvmtool binary only and helps to re-merge commit d0e2772b93a ("Makefile: allow overriding CFLAGS on the command line"). Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-05kvmtool: fix VM exit race attempting to pthread_kill an exited threadWill Deacon9-26/+36
lkvm currently suffers from a Segmentation Fault when exiting, which can also lead to the console not being cleaned up correctly after a VM exits. The issue is that (the misnamed) kvm_cpu__reboot function sends a SIGKVMEXIT to each vcpu thread, which causes those vcpu threads to exit once their main loops (kvm_cpu__start) detect that cpu->is_running is now false. The lack of synchronisation in this exit path means that a concurrent pause event (due to the br_write_lock in ioport__unregister) ends up sending SIGKVMPAUSE to an exited thread, resulting in a SEGV. This patch fixes the issue by moving kvm_cpu__reboot into kvm.c (renaming it in the process) where it can hold the pause_lock mutex across the reboot operation. This in turn makes it safe for the pause code to check the is_running field of each CPU before attempting to send a SIGKVMPAUSE signal. Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-04Revert "Makefile: allow overriding CFLAGS on the command line"Will Deacon1-7/+8
Riku Voipio reports a regression introduced by d0e2772b93ab ("Makefile: allow overriding CFLAGS on the command line"): | This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS | to something unsuitable for guest init. Revert the problematic patch for the moment, while we rethink how we'd like to support user-provided toolchain flags. This reverts commit d0e2772b93abcc8a66f83ed8ed248c94adabce4b. Conflicts: Makefile Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-02Makefile: consider LDFLAGS on feature tests and when linking executablesAndre Przywara1-15/+15
While we have an LDFLAGS variable in kvmtool's Makefile, it's not really used, either when doing the feature tests or when finally linking the lkvm executable. Add that variable to all the linking steps to allow the user to specify custom library directories or linker options on the command line. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-02Makefile: allow overriding CFLAGS on the command lineAndre Przywara1-8/+7
When a Makefile variable is set on the make command line, all Makefile-internal assignments to that very variable are _ignored_. Since we add quite some essential values to CFLAGS internally, specifying some CFLAGS on the command line will usually break the build (and not fix any include file problems you hoped to overcome with that). Somewhat against intuition GNU make provides the "override" directive to change this behavior; with that assignments in the Makefile get _appended_ to the value given on the command line. [1] Change any internal assignments to use that directive, so that a user can use: $ make CFLAGS=/path/to/my/include/dir to teach kvmtool about non-standard header file locations (helpful for cross-compilation) or to tweak other compiler options. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [1] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-11-02kvmtool/run: set a default cmdline if not setWilliam Dauchy1-1/+1
When starting with custom kernel and disk options, kernel_cmdline is NULL; this results in a segfault while trying to look for a string using `strstr`: __strstr_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strstr-sse2-unaligned.S:40 0x00000000004056bf in kvm_cmd_run_init (argc=<optimized out>, argv=<optimized out>) at builtin-run.c:608 0x000000000040639d in kvm_cmd_run (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-run.c:659 0x0000000000412b8f in handle_command (command=0x62bbc0 <kvm_commands>, argc=5, argv=0x7fffffffe840) at kvm-cmd.c:84 0x00007ffff7211b45 in __libc_start_main (main=0x403540 <main>, argc=6, argv=0x7fffffffe838, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe828) at libc-start.c:287 0x0000000000403962 in _start () This patch sets a minimal cmdline when kernel_cmdline is NULL. Fixes: 8a7163f3dbc7 ("kvmtool/run: append cfg.kernel_cmdline at the end of real_cmdline") Signed-off-by: William Dauchy <william@gandi.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
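The crash comes from handing a NULL pointer to strstr(), which is undefined behaviour. A minimal sketch of the guard follows; the helper name `cmdline_has_root` and the choice of the empty string as the fallback are assumptions for illustration, not the contents of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the user's command line mentions "root=". */
int cmdline_has_root(const char *kernel_cmdline)
{
	/* Assumed fallback: treat a missing command line as an empty one */
	if (!kernel_cmdline)
		kernel_cmdline = "";

	return strstr(kernel_cmdline, "root=") != NULL;
}
```

With the fallback in place, the search simply reports "not found" when no command line was given instead of dereferencing NULL.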
2015-10-28kvmtool: set 9p caching mode to support writable mmapsSasha Levin1-1/+1
9p doesn't support writable mmaps by default (when cache=none), set it to loose caching to allow for writable mmaps. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/run: do not overwrite /virt/initOleg Nesterov1-3/+7
To me kvm_setup_guest_init() behaviour looks "obviously wrong" and unfriendly because it always overwrites /virt/init. kvm_setup_guest_init() is also called when we are going to use this tree as a rootfs, and without another patch ("kvmtool/run: append cfg.kernel_cmdline at the end of real_cmdline") the user can't use "lkvm run -p init=my_init_path". This simply means that you can not use a customized init unless you patch kvmtool. Change extract_file() to do nothing if the file already exists. This should not affect do_setup() which calls kvm_setup_guest_init() only if make_dir(guestfs_name) creates the new/empty dir. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
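One way to get "do nothing if the file already exists" is to create the file with O_EXCL. This is only a sketch under that assumption; the function name `extract_if_missing`, its signature and the file mode are hypothetical, and the patch may test for the existing file differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch: write data to path unless the file already exists.
 * Returns 0 on success or if the file was already there, -1 on error.
 */
int extract_if_missing(const char *path, const void *data, size_t size)
{
	const char *p = data;
	int fd;

	/* O_EXCL makes open() fail with EEXIST instead of truncating the file */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0755);
	if (fd < 0)
		return errno == EEXIST ? 0 : -1;

	while (size > 0) {
		ssize_t ret = write(fd, p, size);

		if (ret < 0) {
			if (errno == EINTR)	/* interrupted: retry the write */
				continue;
			close(fd);
			return -1;
		}
		p += ret;
		size -= (size_t)ret;
	}

	return close(fd);
}
```

A second call with different data leaves the first file's contents untouched, which is the behaviour a customized /virt/init relies on.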
2015-10-27kvmtool/run: don't abuse "root=" parameter, don't pass "rw" to v9fs_mount()Oleg Nesterov1-1/+1
1. kvm_cmd_run_init() appends "root=/dev/root" to real_cmdline if cfg.using_rootfs == T. This doesn't hurt but makes no sense and looks confusing. We do not need to initialize the kernel's saved_root_name[] and "/dev/root" means nothing to name_to_dev_t(). We only need to pass this mount-tag to 9p but the kernel always uses dev_name="/dev/root" in mount_root() path, so we can safely remove this option from the command line. 2. "rw" in rootflags looks confusing too, it is silently ignored by v9fs_parse_options() and has no effect. We need to clear MS_RDONLY from root_mountflags, this is what the "standalone" kernel parameter correctly does. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool: add lkvm-static to gitignoreOleg Nesterov1-0/+1
add lkvm-static to gitignore Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/x86: implement guest_pre_initOleg Nesterov5-1/+58
Add the tiny x86/init.S which just mounts /host and execs /virt/init. NOTE: of course, the usage of CONFIG_GUEST_PRE_INIT is ugly, we need to cleanup this code. But I'd prefer to do this on top of this minimal/simple change. And I think this needs cleanups in any case, for example I think lkvm shouldn't abuse the "init=" kernel parameter at all. Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/build: introduce GUEST_PRE_INIT targetOleg Nesterov2-5/+21
This comes as a separate patch because I do not really understand /usr/bin/make, probably it should be updated. Change the main Makefile so that if an arch defines ARCH_PRE_INIT then we - build $GUEST_INIT without "-static" - add -DCONFIG_GUEST_PRE_INIT to $CFLAGS - build $ARCH_PRE_INIT as guest/guest_pre_init.o and embed it into lkvm the same as we do with guest/guest_init.o This also means that ARCH_PRE_INIT case doesn't depend on glibc-static, we can relax the SOURCE_STATIC check later. Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/setup: Introduce extract_file() helperOleg Nesterov1-10/+14
Turn kvm_setup_guest_init(guestfs_name) into a more generic helper, extract_file(guestfs_name, filename, data, size) and reimplement kvm_setup_guest_init() as a trivial wrapper. Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool: correct order of the vcpu destructorSasha Levin1-1/+1
The vcpu module is a core component which should be removed last, but the destructor was mistakenly marked as something that should be done first. This would cause the vcpu data to be freed before anything else had the chance to exit, while other code still assumed that the data was valid, causing use-after-free bugs. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/term: unexport term_set_tty, term_init, term_exitOleg Nesterov2-10/+8
According to git grep they can be static. term_got_escape can be static too, and we can even move it into term_getc(). "int term_escape_char" doesn't make sense, at least until we allow it to be redefined, so turn it into a preprocessor constant. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-27kvmtool/run: append cfg.kernel_cmdline at the end of real_cmdlineOleg Nesterov1-7/+6
This allows the user to always override the parameters set by lkvm. Say, currently 'lkvm run -p ro' doesn't work. To keep the current logic we need to change strstr("root=") to check cfg.kernel_cmdline, not real_cmdline. And perhaps we can even add a simple helper add_param(name, val) to make this all more consistent; it should only append "name=val" to real_cmdline if cfg.kernel_cmdline doesn't include this parameter. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
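The add_param(name, val) helper is only floated as an idea in this message. A rough sketch of what it could look like follows; the extra buffer and size arguments, the return convention and the plain strstr() match are all assumptions made so the example is self-contained.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: append " name=val" to real_cmdline unless the
 * user-supplied command line already mentions "name=".
 * real_cmdline must be a NUL-terminated string inside a buffer of `size` bytes.
 * Returns 0 on success (including "nothing to do"), -1 if a buffer is too small.
 */
int add_param(char *real_cmdline, size_t size, const char *user_cmdline,
	      const char *name, const char *val)
{
	char key[64];
	size_t used = strlen(real_cmdline);
	int n;

	n = snprintf(key, sizeof(key), "%s=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;

	/* The user already set this parameter: leave the choice to them */
	if (user_cmdline && strstr(user_cmdline, key))
		return 0;

	n = snprintf(real_cmdline + used, size - used, " %s=%s", name, val);
	if (n < 0 || (size_t)n >= size - used) {
		real_cmdline[used] = '\0';	/* undo a truncated append */
		return -1;
	}

	return 0;
}
```

Like the strstr("root=") check it generalizes, this matches substrings, so a parameter whose name merely ends in "root=" would also count as present.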
2015-10-09Add a link to the lwn.net articleSven Dowideit1-0/+3
Signed-off-by: Sven Dowideit <SvenDowideit@home.org.au> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-15Make static libc and guest-init functionality optional.Dimitri John Ledkov4-38/+23
If one typically only boots full disk-images, one wouldn't necessarily want to statically link glibc for the guest-init feature of kvmtool, as a statically linked glibc triggers heavy security maintenance. Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com> [will: moved all the guest_init handling into builtin_setup.c] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-10Makefile: relax arm testRiku Voipio1-1/+1
Currently Makefile accepts only armv7l.* When building kvmtool under 32bit personality on Aarch64 machines, uname -m reports "armv8l", so build fails. We expect doing 32bit arm builds in Aarch64 to become standard the same way people do i386 builds on x86_64 machines. Make the sed test a little more greedy so armv8l becomes acceptable. Acked-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-04Handle KVM_EXIT_SYSTEM_EVENT on any VCPUMark Rutland1-4/+9
When VCPU #0 exits (e.g. due to KVM_EXIT_SYSTEM_EVENT), it sends SIGKVMEXIT to all other VCPUs, waits for them to exit, then tears down any remaining context. The signalling of SIGKVMEXIT is critical to forcing VCPUs to shut down in response to a system event (e.g. PSCI SYSTEM_OFF). VCPUs other than VCPU #0 simply exit in kvm_cpu_thread without forcing other CPUs to shut down. Thus if a system event is taken on a VCPU other than VCPU #0, the remaining CPUs are left online. This results in KVM tool not exiting as expected when a system event is taken on a VCPU other than VCPU #0 (as may happen if the guest panics). Fix this by tearing down all CPUs upon a system event, regardless of the CPU on which the event occurred. While this means the VCPU thread will signal itself, and VCPU #0 will signal all other VCPU threads a second time, these are harmless. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-08-07README: Add section for where to send patches.Josh Triplett1-0/+7
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-08-07kvm__emulate_io: Don't fall through from IO in to IO out if no handlerJosh Triplett1-1/+1
If an IO port device has no io_in handler, kvm__emulate_io would fall through and call the io_out handler instead. Fix to only call the handler for the appropriate direction. If no handler exists, kvm__emulate_io will automatically treat it as an IO error (due to the default "ret = false"). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
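The dispatch rule described here (call only the handler for the requested direction, and let a default "ret = false" cover the missing-handler case) can be sketched as below. The types and names are simplified, hypothetical stand-ins, not kvmtool's actual structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kvmtool's types */
enum io_direction { IO_IN, IO_OUT };

struct io_ops {
	bool (*io_in)(void *data, int size);
	bool (*io_out)(void *data, int size);
};

bool emulate_io(const struct io_ops *ops, enum io_direction dir,
		void *data, int size)
{
	bool ret = false;	/* a missing handler is reported as an IO error */

	if (dir == IO_IN) {
		if (ops->io_in)
			ret = ops->io_in(data, size);
		/* no fall through into the io_out branch */
	} else {
		if (ops->io_out)
			ret = ops->io_out(data, size);
	}

	return ret;
}

/* Example handler that always succeeds */
bool always_ok(void *data, int size)
{
	(void)data;
	(void)size;
	return true;
}
```

For a device that registers only `always_ok` as its io_out handler, an IO_IN access now returns false rather than silently running the output handler.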
2015-08-07kvm__emulate_io: Don't call br_read_unlock() twice on IO errorJosh Triplett1-7/+4
The IO error path in kvm__emulate_io would call br_read_unlock(), then goto error, which would call br_read_unlock() again. Refactor the control flow to have only one exit path and one call to br_read_unlock(). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
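The single-exit-path shape can be illustrated with a counter standing in for the lock, so that an unbalanced unlock would show up as a nonzero depth. Everything named here is hypothetical; kvmtool's br_read_lock()/br_read_unlock() are not reproduced.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a read lock: counts outstanding holders */
static int lock_depth;

static void read_lock(void)   { lock_depth++; }
static void read_unlock(void) { lock_depth--; }

bool handle_io(bool device_found, bool handler_ok)
{
	bool ret = false;

	read_lock();

	if (!device_found)
		goto out;	/* error: jump to the single unlock below */

	ret = handler_ok;

out:
	read_unlock();		/* the only unlock, reached on every path */
	return ret;
}

/* Lets a caller check that lock and unlock calls stayed balanced */
int current_lock_depth(void)
{
	return lock_depth;
}
```

Because the error path no longer unlocks before jumping, the depth returns to zero after both successful and failing calls.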
2015-08-06kvmtool: Introduce downscript option for virtio-netFan Du2-12/+38
Add a downscript option that detaches the tap device from the bridge automatically when exiting, the reverse of what "script" does. Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-22avoid redefining PAGE_SIZEAndre Przywara1-0/+3
PAGE_SIZE may have been defined by the C library (musl-libc does that). So avoid redefining it here unconditionally, instead only use our definition if none has been provided by the libc. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
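The conditional definition amounts to an #ifndef guard. In this sketch the fallback value of 4096 and the `page_align` helper are assumptions for illustration; the header in which a given libc exposes PAGE_SIZE may also differ from the one included here.

```c
#include <limits.h>	/* a libc may already define PAGE_SIZE via its headers */

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096	/* assumed fallback, used only if the libc gave none */
#endif

/* Hypothetical helper: round a length up to a whole number of pages */
unsigned long page_align(unsigned long len)
{
	return (len + PAGE_SIZE - 1) & ~((unsigned long)PAGE_SIZE - 1);
}
```

The guard keeps the build free of redefinition warnings whichever definition ends up in effect.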
2015-07-22Makefile: avoid non-literal printf format string warningsAndre Przywara1-0/+1
The clang compiler by default dislikes non-literal format strings in *printf functions, so it complains about kvm__set_dir() in kvm.c and about the error reporting functions. Since a fix is not easy and the code itself is fine (just seems that the compiler is not smart enough to see that), let's just disable the warning. Since GCC knows about this option as well (it just doesn't have it enabled with -Wall), we can unconditionally add this to the warning options. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20remove KVM_CAP_MAX_VCPUS hackAndre Przywara1-8/+0
As we now have the header file in our repository, we can safely follow the recommendation in kvm.c and remove the hack adding the KVM_CAP_MAX_VCPUS macro. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20check for and use C library provided strlcpy and strlcatAndre Przywara4-0/+19
The musl-libc library provides implementations of strlcpy and strlcat, so introduce a feature check for it and only use the kvmtool implementation if there is no library support for it. This avoids clashes with the public definition. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20use <poll.h> instead of <sys/poll.h>Andre Przywara1-1/+1
The manpage of poll(2) states that the prototype of poll is defined in <poll.h>. Use that header file instead of <sys/poll.h> to allow compilation against musl-libc. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20Fix call to connect()Andre Przywara1-1/+1
According to the manpage and the prototype the second argument to connect(2) is a "const struct sockaddr*", so cast our protocol specific type back to the super type. This fixes compilation on musl-libc. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
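The cast in question looks like this for a UNIX domain socket. The helper name `connect_unix` and its error convention are hypothetical; the socket calls are the standard POSIX ones.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to a UNIX domain socket at the given path.
 * Returns the connected file descriptor, or -1 on error.
 */
int connect_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd;		/* int, so that negative error values are representable */

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length was checked above */

	/* connect() takes the generic type, so cast the protocol-specific one */
	if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

Passing `&addr` without the cast is a pointer type mismatch, which stricter libc headers and compilers reject.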
2015-07-20ui: remove pointless double const in keymap declarationsAndre Przywara2-2/+2
clang does not like two const specifiers in one declaration, so remove one to let clang compile kvmtool. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20Makefile: remove unneeded -s switch on compiling BIOS filesAndre Przywara1-5/+5
Stripping has no effect on object files, so having "-s -c" on the command line makes no sense. In fact clang complains about it and aborts with an error, so let's just remove the unneeded "-s" switch here. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20kvm-ipc: use proper type for file descriptorAndre Przywara1-1/+1
A socket (as any other file descriptor) is of type "int" to catch the negative error cases. Fix the declaration to allow errors to be detected. Found and needed by clang. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-20qcow: fix signedness bugsAndre Przywara1-4/+4
Some functions in qcow.c return u64, but are checked against < 0 because they want to check for the -1 error return value. Do an explicit comparison against -1 cast to the return type to express this properly. This was silently compiled out by gcc, but clang complained about it. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
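An unsigned value can never be less than zero, so a "< 0" check on a u64 is always false and the error goes unnoticed. A small sketch of the explicit comparison follows; `alloc_cluster` and `alloc_failed` are hypothetical names, not functions from qcow.c.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical allocator that signals failure by returning (u64)-1 */
u64 alloc_cluster(bool fail)
{
	if (fail)
		return (u64)-1;
	return 4096;
}

bool alloc_failed(u64 offset)
{
	/* "offset < 0" is always false for an unsigned type; compare explicitly */
	return offset == (u64)-1;
}
```

The explicit form keeps the -1 error convention while making the check meaningful for an unsigned return type.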
2015-07-20avoid casts when initializing structuresAndre Przywara10-12/+12
Due to our kernel heritage we have code in kvmtool that relies on the (still) implicit -std=gnu89 compiler switch. It turns out that this just affects some structure initialization, where we currently provide a cast to the type, which upsets GCC for anything beyond -std=gnu89 (for instance gnu99 or gnu11). We do need the casts when initializing structures that are not assigned to the same type, so we put it there explicitly. This allows us to compile with all the three GNU standards GCC currently supports: gnu89/90, gnu99 and gnu11. GCC threatens people with moving to gnu11 as the new default standard, so let's fix this sooner rather than later. (Compiling without GNU extensions still breaks and I don't bother to fix that without very good reasons.) Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-08arm: use new irqchip parameter to create different vGIC typesAndre Przywara4-3/+26
Currently we unconditionally create a virtual GICv2 in the guest. Add a --irqchip= parameter to let the user specify a different GIC type for the guest. When this parameter is omitted, it still defaults to --irqchip=gicv2. For now the only other supported type is --irqchip=gicv3. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [will: use pr_err instead of fprintf] Signed-off-by: Will Deacon <will.deacon@arm.com>