This patch enables the Scalable Vector Extension for the guest when
the host supports it.
This requires use of the new KVM_ARM_VCPU_FINALIZE ioctl before the
vcpu is runnable, so a new hook kvm_cpu__configure_features() is
added to provide an appropriate place to do this work.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In the interest of readability, factor out the vcpu feature setup
for ptrauth into a separate function.
Also, because aarch32 doesn't have this feature or the related
command line options anyway, move the actual code into aarch64/.
Since ARM_VCPU_PTRAUTH_FEATURE is only there to make the ptrauth
feature setup code compile on arm, it is no longer needed: inline
and remove it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch adds a runtime capability for KVM tool to enable Arm64 8.3
Pointer Authentication in the guest kernel. The two vcpu features
KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] are supplied together to enable
Pointer Authentication in the KVM guest after checking the capability.
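For illustration only, the vcpu-side half of this can be sketched in C. The sketch assumes the feature bit numbers from arm64's <asm/kvm.h> (5 and 6) and uses a local mirror of struct kvm_vcpu_init. The capability results are passed in as booleans instead of calling KVM_CHECK_EXTENSION, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as in arm64 <asm/kvm.h>, repeated so this builds anywhere. */
#define VCPU_PTRAUTH_ADDRESS	5
#define VCPU_PTRAUTH_GENERIC	6

/* Local mirror of the shape of struct kvm_vcpu_init. */
struct vcpu_init {
	uint32_t target;
	uint32_t features[7];
};

static void set_feature(struct vcpu_init *init, unsigned int bit)
{
	init->features[bit / 32] |= 1u << (bit % 32);
}

/*
 * has_addr/has_generic stand in for KVM_CHECK_EXTENSION results on
 * KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC.
 * KVM expects both features to be requested together, or neither.
 */
static bool enable_ptrauth(struct vcpu_init *init, bool has_addr,
			   bool has_generic)
{
	if (!has_addr || !has_generic)
		return false;
	set_feature(init, VCPU_PTRAUTH_ADDRESS);
	set_feature(init, VCPU_PTRAUTH_GENERIC);
	return true;
}
```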
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com> [merge new kernel headers]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We're going to need updated headers for arm64 SVE and ptrauth support.
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Failing to initialise the virt_queue via virtio_init_device_vq() leaves,
amongst other things, the endianness unspecified. On arm/arm64 this
results in virtio_guest_to_host_uxx() treating the queue as big-endian
and trying to translate bogus addresses:
Warning: unable to translate guest address 0x80b8249800000000 to host
Ensure the virt_queue is always initialised by the virtio device during
setup.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
My @arm.com address is gonna stop working. Update README information
with an address people can use to actually reach me.
Signed-off-by: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The SVE KVM support for arm64 includes the additional backend
header <asm/sve_context.h> from <asm/kvm.h>.
So update this header if it is available.
To avoid creating a sudden dependency on a specific minimum kernel
version, ignore such optional headers if the source kernel tree
doesn't have them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If an intermediate step fails, update_headers.sh blindly continues
and may return success status.
To avoid errors going unnoticed when driving this script, exit and
report failure status as soon as something goes wrong. For good
measure, also fail on expansion of undefined shell variables to aid
future maintainers.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
update_headers.sh can break if the current working directory has a
funny name or if something odd is passed for LINUX_ROOT.
In the interest of cleanliness, quote where appropriate.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Julien has kindly offered to help maintain kvmtool, but it occurred to
me that we don't actually provide any maintainer contact details in the
repository as it stands.
Add a brief "Maintainers" section to the README, immediately after the
"Contributing" section so that people know who to nag about merging and
reviewing patches.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Kvmtool creates a (debug) UNIX socket file for each VM, using its
(possibly auto-generated) name as the filename. There is a check using
access(), which bails out with an error message if a socket with that
name already exists.
Aside from this check being unnecessary, as the bind() call later would
complain as well, this is also racy. But more annoyingly the bail out is
not needed most of the time: an existing socket inode is most likely just
an orphaned leftover from a previous kvmtool run, which just failed to
remove that file, because of a crash, for instance.
Upon finding such a collision, let's first try to connect to that socket,
to detect if there is still a kvmtool instance listening on the other
end. If that fails, this socket will never come back to life, so we can
safely clean it up and reuse the name for the new guest.
However if the connect() succeeds, there is an actual live kvmtool
instance using this name, so not proceeding is the only option.
This should never happen with the (PID based) automatically generated
names, though.
This avoids an annoying (and not helpful) error message and helps
automated kvmtool runs to proceed in more cases.
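A minimal sketch of the collision handling described above, assuming a plain AF_UNIX stream socket. It treats EADDRINUSE from bind() as the collision signal (so no separate access() check) and uses connect() to tell a live owner from an orphaned file. listen_unix() and socket_is_live() are hypothetical names, not kvmtool's functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if some process accepts connections at addr, 0 otherwise. */
static int socket_is_live(const struct sockaddr_un *addr)
{
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
	int live;

	if (fd < 0)
		return 0;
	live = connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0;
	close(fd);
	return live;
}

/*
 * Create a listening socket at path. If the name is taken and nobody
 * answers connect(), the file is an orphan from an earlier run: unlink
 * it and retry once. Returns the fd, or -1 with errno set (EADDRINUSE
 * means a live instance owns the name).
 */
static int listen_unix(const char *path)
{
	struct sockaddr_un addr;
	int attempt;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(addr.sun_path, path);

	for (attempt = 0; attempt < 2; attempt++) {
		int err, fd = socket(AF_UNIX, SOCK_STREAM, 0);

		if (fd < 0)
			return -1;
		if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
		    listen(fd, 5) == 0)
			return fd;
		err = errno;
		close(fd);
		if (err != EADDRINUSE) {
			errno = err;
			return -1;
		}
		if (socket_is_live(&addr))
			break;		/* a running instance owns this name */
		unlink(path);		/* orphaned leftover: safe to remove */
	}
	errno = EADDRINUSE;
	return -1;
}
```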
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When kvmtool (or the host kernel) crashes or gets killed, we cannot
automatically remove the socket file we created for that VM.
A later call of "lkvm list" iterates over all those files and complains
about those "ghost socket files", as there is no one listening on
the other side. Also sometimes the automatic guest name generation
happens to generate the same name again, so an unrelated "lkvm run"
later complains and stops, which is bad for automation.
As the only code doing a listen() on this socket is kvmtool upon VM
*creation*, such an orphaned socket file will never come back to life,
so we may as well unlink() those sockets in the code. This spares the
user from doing it herself.
We keep the message in the code to notify the user of this.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
clang and GCC9 refuse to compile virtio/blk.c with the following message:
virtio/blk.c:161:37: error: taking address of packed member 'geometry' of class
or structure 'virtio_blk_config' may result in an unaligned pointer value
[-Werror,-Waddress-of-packed-member]
struct virtio_blk_geometry *geo = &conf->geometry;
Since struct virtio_blk_geometry is in a kernel header, we can't do much
about the packed attribute, but as Peter pointed out, the solution is
rather simple: just get rid of the convenience variable and use the
original struct member directly.
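The pattern can be sketched with simplified stand-ins for the kernel structures (struct blk_config, struct blk_geometry and set_geometry() are hypothetical names here): write through the member expression instead of through a pointer to the packed member.

```c
#include <stdint.h>

/* Simplified stand-ins for virtio_blk_geometry / virtio_blk_config. */
struct blk_geometry {
	uint16_t cylinders;
	uint8_t heads;
	uint8_t sectors;
};

struct blk_config {
	uint64_t capacity;
	uint32_t size_max;
	uint32_t seg_max;
	struct blk_geometry geometry;
	uint32_t blk_size;
} __attribute__((packed));

static void set_geometry(struct blk_config *conf, uint16_t cyl,
			 uint8_t heads, uint8_t sectors)
{
	/*
	 * "struct blk_geometry *geo = &conf->geometry" would trigger
	 * -Waddress-of-packed-member: a packed struct only guarantees
	 * 1-byte alignment. Going through the member expression lets the
	 * compiler emit accesses that are safe for that alignment.
	 */
	conf->geometry.cylinders = cyl;
	conf->geometry.heads = heads;
	conf->geometry.sectors = sectors;
}
```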
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
struct vfio_irq_set from the kernel headers contains a variable sized
array to hold a payload. The vfio_irq_eventfd struct puts the "fd"
member right after this, hoping it will automatically fit in the payload slot.
But having a variable sized type not at the end of a struct is a GNU C
extension, so clang will refuse to compile this.
Solve this by somewhat doing the compiler's job and placing the payload
manually at the end of the structure.
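One way to place the payload manually, sketched with a simplified stand-in for struct vfio_irq_set (irq_set, union irq_eventfd and irq_eventfd_init() are illustrative names): size a buffer for the header plus one int, overlay it with a union, and copy the eventfd into the flexible array.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct vfio_irq_set. */
struct irq_set {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t data[];		/* variable-sized payload */
};

/* Room for the header plus exactly one eventfd in the payload. */
#define IRQ_EVENTFD_SIZE (sizeof(struct irq_set) + sizeof(int))

/*
 * Unlike "struct { struct irq_set irq; int fd; }", a union never puts
 * anything after the flexible array, so no GNU extension is needed.
 */
union irq_eventfd {
	struct irq_set irq;
	uint8_t buffer[IRQ_EVENTFD_SIZE];
};

static void irq_eventfd_init(union irq_eventfd *e, uint32_t index, int fd)
{
	memset(e, 0, sizeof(*e));
	e->irq.argsz = sizeof(*e);
	e->irq.index = index;
	e->irq.count = 1;
	memcpy(e->irq.data, &fd, sizeof(fd));	/* place the payload by hand */
}
```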
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
clang complained that the comparison of a u8 variable against 256 is
somewhat pointless.
Just remove the check, as the condition will never hit.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
As clang rightfully pointed out, the ampersand in front of this member
looks wrong.
Remove it so that we actually compare the count against 0.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Ensure that all requests are complete when resetting a virtqueue, by
draining the AIO queue after stopping the submission thread.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add a call into the disk layer to synchronize the AIO queue. Wait for all
pending requests to complete. This will be necessary when resetting a
virtqueue.
The wait() operation isn't the same as flush(). A VIRTIO_BLK_T_FLUSH
request ensures that any write request *that completed before the FLUSH is
sent* is committed to permanent storage (e.g. written back from a write
cache). But it doesn't do anything for requests that are still pending
when the FLUSH is sent.
Avoid introducing a mutex on the io_submit() and io_getevents() paths,
because it can lead to 30% throughput drop on heavy FIO jobs. Instead
manage an inflight counter using compare-and-swap operations, which is
simple enough as the caller doesn't submit new requests while it waits for
the AIO queue to drain. The __sync_fetch_and_* operations are a bit rough
since they use full barriers, but that didn't seem to introduce a
performance regression.
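A sketch of the counter scheme, assuming a single global counter and a separate AIO thread that reaps completions (the helper names are hypothetical). It uses the __sync_fetch_and_* builtins mentioned above; the waiter simply yields until the count reaches zero.

```c
#include <sched.h>

/* Requests handed to io_submit() but not yet reaped by io_getevents(). */
static int aio_inflight;

/* Submission path: account for nr newly submitted requests. */
static void aio_account_submit(int nr)
{
	__sync_fetch_and_add(&aio_inflight, nr);
}

/* Completion thread: account for nr reaped requests. */
static void aio_account_complete(int nr)
{
	__sync_fetch_and_sub(&aio_inflight, nr);
}

/*
 * Drain the queue. Only valid while the caller submits nothing new;
 * completions are reaped concurrently by the AIO thread.
 */
static void aio_wait(void)
{
	while (__sync_fetch_and_add(&aio_inflight, 0) > 0)
		sched_yield();
}
```

Compared with a mutex around io_submit() and io_getevents(), the hot paths only pay for one atomic add or subtract each.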
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If the AIO thread is still calling io_getevents() while the exit path
calls io_destroy(), it will segfault. Wait for the thread to finish before
destroying the context.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently when the kernel completes a batch of AIO requests and signals it
via eventfd, we retrieve at most AIO_MAX events (256), and ignore the
rest. Call io_getevents() again in case more events are pending.
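The loop shape can be sketched as follows. fake_ring and fake_getevents() are hypothetical stand-ins for the AIO context and io_getevents(), so the example runs without libaio. The point is to keep fetching until a call returns fewer than AIO_MAX events.

```c
#define AIO_MAX 256

/* Hypothetical stand-in for the kernel's completion ring. */
struct fake_ring {
	int pending;	/* completed events not yet fetched */
};

/* Mimics io_getevents(ctx, 0, max, ...): returns up to max ready events. */
static int fake_getevents(struct fake_ring *ring, int max)
{
	int nr = ring->pending < max ? ring->pending : max;

	ring->pending -= nr;
	return nr;
}

/*
 * One eventfd wakeup may cover more than AIO_MAX completions, so keep
 * fetching until a batch comes back short. Returns events handled.
 */
static int reap_all(struct fake_ring *ring)
{
	int nr, total = 0;

	do {
		nr = fake_getevents(ring, AIO_MAX);
		total += nr;	/* real code completes each request here */
	} while (nr == AIO_MAX);

	return total;
}
```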
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add an 'async' attribute to disk_image_operations that describes whether
they can submit async I/O. disk_image->async is now set iff
CONFIG_HAS_AIO and the ops do use AIO.
This fixes qcow1, which used to set async = 1 even though the qcow
operations don't use AIO. The disk core would perform the read/write
operation without pushing the completion onto the virtio queue, and the
guest would be stuck waiting.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Move all AIO code to a separate file, disk/aio.c, to remove as many
#ifdefs as possible. Split the raw read/write disk ops into async and
sync, and choose which ones to use depending on CONFIG_HAS_AIO. Note that
we fix raw_image__close() which incorrectly checked CONFIG_HAS_VIRTIO
instead of CONFIG_HAS_AIO, and closed an uninitialized disk->evt. A
subsequent commit will complete this refactoring by fixing use of the
'async' disk attribute.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
sync() should be called before reboot(RB_AUTOBOOT), otherwise data written
to disks might be lost.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Since we don't currently tell the guest when the disk backend is
read-only, it will report any inconsistent read after write as an error.
An image may be read-only either because user requested it on the
command-line, or because write support isn't implemented. Pass the
read-only attribute using the VIRTIO_BLK_F_RO feature.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Even though qcow1 doesn't use the refcount table, the cleanup path still
attempts to iterate over its LRU list. Initialize the list to avoid a
segfault on exit.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Build breaks when using KVM_BRLOCK_DEBUG because the header was seemingly
conceived to be included in a single .c file...
Fix this by moving the definition of the read/write lock into the kvm
struct.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The kvm argument is not passed to br_read_lock/unlock. This happens to
work for the barrier implementation because the argument is unused there,
but it breaks as soon as another lock implementation is used.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The vesa framebuffer is only used by architectures that explicitly
require it (i.e. x86). Compile it out for architectures not using it, as
its current implementation might not work for them.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Since PCI forbids enabling INTx, MSI or MSIX at the same time, INTx mode
is disabled by default when MSI/MSIX mode is enabled. But this logic
easily breaks if the guest PCI driver detects that MSI/MSIX cannot work
as expected and tries to roll back to INTx mode. In this case INTx mode
has already been disabled and there is no chance to re-enable it, so
neither INTx mode nor MSI/MSIX mode works with vfio.
The detailed flow that leads to this issue is:
  vfio_pci_configure_dev_irqs()
    `-> vfio_pci_enable_intx()
  vfio_pci_enable_msis()
    `-> vfio_pci_disable_intx()
  vfio_pci_disable_msis()   => Guest PCI driver disables MSI
To fix this issue, when disabling MSI/MSIX we need to check whether INTx
mode is available for this device; if the device supports INTx, re-enable
it so that the device can fall back to it.
Since the vfio_pci_disable_intx() / vfio_pci_enable_intx() pair may be
called multiple times, this patch uses 'intx_fd == -1' to denote that
INTx is disabled; the two functions can then bail out directly when they
detect that INTx is already disabled or already enabled, respectively.
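A condensed sketch of the 'intx_fd == -1' convention, with a hypothetical struct dev_irqs and a bare eventfd standing in for the real VFIO wiring: each enable/disable is idempotent, and disabling MSI/MSIX falls back to INTx when the device supports it.

```c
#include <stdbool.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical, simplified per-device IRQ state. */
struct dev_irqs {
	bool intx_supported;	/* e.g. cached from VFIO_DEVICE_GET_IRQ_INFO */
	int intx_fd;		/* -1 means INTx is disabled */
};

static int intx_enable(struct dev_irqs *d)
{
	if (d->intx_fd != -1)
		return 0;		/* already enabled: nothing to do */
	d->intx_fd = eventfd(0, 0);	/* real code also wires it up via VFIO */
	return d->intx_fd < 0 ? -1 : 0;
}

static void intx_disable(struct dev_irqs *d)
{
	if (d->intx_fd == -1)
		return;			/* already disabled */
	close(d->intx_fd);
	d->intx_fd = -1;
}

/* Enabling MSI/MSIX turns INTx off, since only one mode may be active. */
static void msis_enable(struct dev_irqs *d)
{
	intx_disable(d);
}

/* Disabling MSI/MSIX re-enables INTx so the guest driver can fall back. */
static int msis_disable(struct dev_irqs *d)
{
	return d->intx_supported ? intx_enable(d) : 0;
}
```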
Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To support enabling INTx multiple times, we first need to extract the
one-time initialisation and move the related code into a new function,
vfio_pci_init_intx(); if INTx is later disabled and re-enabled, these
one-time operations can be skipped.
This patch moves the following three main INTx one-time initialisation
operations from vfio_pci_enable_intx() into vfio_pci_init_intx():
- Reserve 2 FDs for INTx;
- Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
- Setup pdev->intx_gsi.
Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The PCI device INTx uses the eventfd 'unmask_fd' to signal the deassertion
of the line from guest to host, but this eventfd isn't released properly
when INTx is disabled.
This patch adds a field 'unmask_fd' to struct vfio_pci_device for
storing the unmask eventfd, and closes it when INTx is disabled.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
At the moment kvmtool always tries to instantiate a virtual GICv2
interrupt controller for the guest, and fails with some scary error
message if that doesn't work.
The user then has to manually specify "--irqchip=gicv3", which is not
really obvious.
With the advent of more GICv3-only machines, let's try to be more
clever and implement some auto-detection of the GIC type needed:
We try gicv3-its, gicv3, gicv2m and gicv2, in that order. The first one
succeeding wins.
For GICv2 machines the first two will always fail.
On GICv3 machines offering GICv2 compatibility we used to prefer a
virtual GICv2 in the guest, but these days the GICv3 support both in
guests and in KVM is equally mature and wide-spread, so we should use
the GICv3 emulation for the guest as well.
This algorithm is in effect if there is no explicit --irqchip parameter
on the command line. We still allow the GIC type to be set explicitly.
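The probing order can be sketched as a table walk. Here host_mask is an assumption that models which irqchip types the host can create; kvmtool itself would attempt to create each type in turn. The enum and function names are illustrative.

```c
#include <stddef.h>

enum irqchip {
	IRQCHIP_GICV2,
	IRQCHIP_GICV2M,
	IRQCHIP_GICV3,
	IRQCHIP_GICV3_ITS,
	IRQCHIP_AUTO,		/* no --irqchip given */
};

/* Order tried in auto mode: the first type that works wins. */
static const enum irqchip probe_order[] = {
	IRQCHIP_GICV3_ITS, IRQCHIP_GICV3, IRQCHIP_GICV2M, IRQCHIP_GICV2,
};

/*
 * host_mask: bit n set means type n can be created on this host.
 * Returns the selected type, or -1 if nothing suitable exists.
 */
static int select_irqchip(enum irqchip requested, unsigned int host_mask)
{
	size_t i;

	if (requested != IRQCHIP_AUTO)	/* explicit choice: no fallback */
		return (host_mask & (1u << requested)) ? (int)requested : -1;

	for (i = 0; i < sizeof(probe_order) / sizeof(probe_order[0]); i++)
		if (host_mask & (1u << probe_order[i]))
			return probe_order[i];
	return -1;
}
```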
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The code for copying an empty IP address into the DHCP opt buffer used
strncpy, however used the source length as the size argument. GCC 8.x
complains about it.
Since the source string is fixed, just revert to the old strcpy,
which in this case gives us the same level of safety but keeps the
compiler happy.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GCC 8.x complains about improper usage of strncpy in virtio/net.c and
virtio/scsi.c:
In function 'virtio_scsi_init_one',
inlined from 'virtio_scsi_init' at virtio/scsi.c:285:7:
virtio/scsi.c:247:2: error: 'strncpy' specified bound 224 equals destination size [-Werror=stringop-truncation]
strncpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this and the other occurrences in virtio/ by using strlcpy instead
of strncpy.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
There are two calls to strncpy in builtin-run.c that don't use it
properly, so GCC 8.x complains and aborts compilation.
Replace those two calls with strlcpy(), which does the right thing in
our case.
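For reference, strlcpy() always NUL-terminates (given a non-zero size) and returns the length of the source, so truncation can be detected by comparing the result with the buffer size. The sketch below implements those BSD semantics under the hypothetical name xstrlcpy(), to avoid clashing with a libc that already provides strlcpy().

```c
#include <string.h>

/*
 * Copy at most size - 1 bytes of src into dst and always terminate it
 * (when size > 0). Returns strlen(src); a result >= size means the
 * copy was truncated.
 */
static size_t xstrlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

Unlike strncpy() with a bound equal to the destination size, this never leaves the buffer unterminated, which is what -Wstringop-truncation warns about.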
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
"make -s" suppresses normal output, just shows warnings and errors.
But since we explicitly override the make output with our fancy concise
version, we miss out on this feature.
Do as the kernel does and explicitly suppress all normal output when -s
is given. This helps to spot warnings that scroll out of the terminal
window too quickly.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The DT spec describes the stdout-path property in the /chosen node to
contain the DT path for a default device usable for outputting characters.
The Linux kernel uses this for earlycon (without further parameters),
other DT users might rely on this as well.
Add a stdout-path property pointing to the "serial0" alias, then add an
aliases node at the end of the FDT, containing the actual path. This
allows the FDT generation code in hw/serial.c to set this string.
Even when we use the virtio console, the serial console is still there
and works, so we can expose this unconditionally. Putting the virtio
console path in there will not work anyway.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GCC 8.2 gives this warning:
virtio/9p.c: In function ‘virtio_p9_create’:
virtio/9p.c:335:21: error: passing argument 1 to restrict-qualified parameter aliases with argument 4 [-Werror=restrict]
ret = snprintf(dfid->path, size, "%s/%s", dfid->path, name);
~~~~^~~~~~ ~~~~~~~~~~
Fix it by copying the dfid->path content into a temporary string instead
of overwriting it in place, which glibc's snprintf rules out through the
__restrict qualifier.
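A sketch of the fix pattern, assuming a fixed-size path buffer (path_append() is a hypothetical helper): copy the old path first so that snprintf()'s destination and source never alias, and report truncation instead of ignoring it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn path into "path/name" in place. Returns 0 on success, -1 on
 * allocation failure or if the result did not fit in size bytes.
 */
static int path_append(char *path, size_t size, const char *name)
{
	char *tmp = strdup(path);	/* private copy: no restrict aliasing */
	int ret;

	if (!tmp)
		return -1;
	ret = snprintf(path, size, "%s/%s", tmp, name);
	free(tmp);
	if (ret < 0 || (size_t)ret >= size)
		return -1;		/* truncated */
	return 0;
}
```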
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GCC 8.2 gives this warning:
virtio/net.c: In function ‘virtio_net__tap_init’:
virtio/net.c:336:47: error: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
^
virtio/net.c:348:47: error: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
^
Fix it by using sizeof of destination instead, even if they're the same
size in this case.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GCC 8.2 gives this warning:
builtin-run.c: In function ‘kvm_run_write_sandbox_cmd.isra.1’:
builtin-run.c:417:28: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
snprintf(dst, len, "/host%s", resolved_path);
^~ ~~~~~~~~~~~~~
This happens because GCC understands that len is PATH_MAX, the same as
resolved_path's size. This patch handles the case where the string is
truncated, and fixes the warning.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It is not a good idea to pass an empty 'source' argument to mount(2)
because libmount complains about an incorrect /proc/self/mountinfo
structure. This affects many applications such as findmnt, umount, etc.
Let's add a fake source argument to the sysfs mount command, as we do
with all other filesystems.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When loading a firmware instead of a kernel, we can still pass on any
*user-provided* command line, as /chosen/bootargs is a generic device tree
feature. We just need to make sure to not pass our mangled-for-Linux
version.
This allows running "firmware" images that make use of a command line
but are not Linux kernels, such as kvm-unit-tests.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
On every build we report the kvmtool "version" number, which isn't
meaningful at all anymore.
Remove the line from the KVMTOOLS-VERSION-GEN script to drop a
pointless message.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The KVM ioctls mostly just return -1 in the error case, leaving the
actual error code in errno.
Change the output of the PMU error message to actually print this error
code instead of the generic -1.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For whatever reason, on ARM/arm64 machines kvmtool greets us with rather
elaborate messages:
Info: Loaded kernel to 0x80080000 (18704896 bytes)
Info: Placing fdt at 0x8fe00000 - 0x8fffffff
Info: virtio-mmio.devices=0x200@0x10000:36
Info: virtio-mmio.devices=0x200@0x10200:37
Info: virtio-mmio.devices=0x200@0x10400:38
This is not really useful information for the casual user, so change
those lines to use pr_debug().
This also fixes the long standing line ending issue for the mmio output.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The virtio-console reset cancels all running jobs.
Unfortunately we don't have a good way to prevent the term polling thread
from getting in the way, read invalid data during reset and cause a
segfault. To prevent this, move all handling of the Rx queue into the
threadpool job.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The p9 reset cancels all running jobs and closes any open fid.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When resetting a virtqueue, it is often necessary to make sure that the
associated threadpool job isn't running anymore. Add a function to
cancel a job.
A threadpool job has three states: idle, queued and running. A job is
queued when it is in the job list. It is running when it is out of the
list, but its signal count is greater than zero. It is idle when it is
both out of the list and its signal count is zero. The cancel() function
simply waits for the job to be idle. It is up to the caller to make sure
that the job isn't queued concurrently.
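The three states and the cancel() wait can be sketched with a mutex and a condition variable. This is an assumption-laden model, not kvmtool's threadpool code: the struct and function names are hypothetical, and a real job would also carry a callback and live on a shared list.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified job. */
struct job {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	bool queued;		/* on the pending job list */
	int signalcount;	/* outstanding requests to run */
};

static bool job_is_idle(const struct job *j)
{
	return !j->queued && j->signalcount == 0;
}

/* Producer: request a run; queue the job only if it had no work yet. */
static void job_signal(struct job *j)
{
	pthread_mutex_lock(&j->lock);
	if (j->signalcount++ == 0)
		j->queued = true;
	pthread_mutex_unlock(&j->lock);
}

/* Worker: take the job off the list; it is now running. */
static void job_pop(struct job *j)
{
	pthread_mutex_lock(&j->lock);
	j->queued = false;
	pthread_mutex_unlock(&j->lock);
}

/* Worker: one run finished; requeue if signalled meanwhile, else idle. */
static void job_done(struct job *j)
{
	pthread_mutex_lock(&j->lock);
	if (--j->signalcount > 0)
		j->queued = true;
	else
		pthread_cond_broadcast(&j->idle);
	pthread_mutex_unlock(&j->lock);
}

/* Wait until the job is idle. The caller must stop signalling it first. */
static void job_cancel(struct job *j)
{
	pthread_mutex_lock(&j->lock);
	while (!job_is_idle(j))
		pthread_cond_wait(&j->idle, &j->lock);
	pthread_mutex_unlock(&j->lock);
}
```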
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Move pthread creation to init_vq, and kill the thread in exit_vq.
Initialize the virtqueue states at runtime.
All in-flight I/O is canceled with the virtqueue pthreads, except for AIO
threads, but after reading the code I'm not sure if AIO has ever worked
anyway.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
On exit_vq(), clean all resources allocated for the queue. When the device
is reset, clean the backend.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently the virtqueue state is mixed with the netdev state. Move it to a
separate structure.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When resetting the virtio-net queues, the UIP state needs to be reset as
well. Stop all threads (one per TCP stream and UDP connection) and free
memory.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When the guest writes a status of 0, the device should be reset. Add a
reset() callback for the transport layer to reset all queues and their
ioeventfd.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Virtio allows resetting individual virtqueues. For legacy devices, it's
done by writing an address of 0 into the PFN register. Modern devices have
an "enable" register. Add an exit_vq() callback to all devices. A lot more
work is required by each device to clean up their virtqueue state, and by
the core to reset things like MSI routes and ioeventfds.
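The two reset paths (per queue through a legacy PFN write of 0, and for the whole device through a status write of 0) can be sketched with a hypothetical, simplified device model. Here exit_vq() only stands in for the per-device cleanup callback.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_VQS 2

/* Hypothetical, simplified device model. */
struct vq {
	bool enabled;
	uint32_t pfn;
};

struct vdev {
	uint32_t status;
	struct vq vqs[NUM_VQS];
};

/* Stand-in for a device's exit_vq() callback: release queue resources. */
static void exit_vq(struct vdev *d, unsigned int idx)
{
	d->vqs[idx].enabled = false;
	d->vqs[idx].pfn = 0;
}

/* Legacy transport: writing 0 to the PFN register resets one queue. */
static int write_queue_pfn(struct vdev *d, unsigned int idx, uint32_t pfn)
{
	if (idx >= NUM_VQS)
		return -1;
	if (pfn == 0) {
		if (d->vqs[idx].enabled)
			exit_vq(d, idx);
		return 0;
	}
	d->vqs[idx].pfn = pfn;		/* stand-in for init_vq() */
	d->vqs[idx].enabled = true;
	return 0;
}

/* Writing a status of 0 resets the whole device, i.e. every queue. */
static void write_status(struct vdev *d, uint32_t status)
{
	unsigned int i;

	d->status = status;
	if (status == 0)
		for (i = 0; i < NUM_VQS; i++)
			if (d->vqs[i].enabled)
				exit_vq(d, i);
}
```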
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To ease future changes to the core, replace get_pfn_vq() with get_vq().
This way adding new generic operation on virtqueues won't require
modifying every virtio device.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Modern virtio requires devices to report how many queues they support. Add
an operation to query all devices about their capacities.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Modern virtio requires proper status handling and reset. A "notify_status"
callback is already present in the virtio ops, but isn't implemented by
any device. Instead they currently use "set_guest_feature" to reset the
device and deal with endianness. This isn't sufficient for proper device
reset, so add the notify_status callback to all devices that need it.
To add useful hints like "start" and "stop", extend the status variable to
32-bits.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
[Julien T: Remove VIRTIO_CONFIG_S_NEEDS_RESET from config mask, as
it is virtio v1+ macro and kvmtool only implements v0.9, this
macro should not be referenced for now]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Fix three bugs that prevent removal of ioeventfds in KVM. Store the
flags in the right structure, check the datamatch parameter, and pass
the fd to KVM.
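For illustration, KVM only removes an ioeventfd when the deassign request carries the same address, length, eventfd, PIO flag and datamatch setting as the original assignment, plus KVM_IOEVENTFD_FLAG_DEASSIGN. A sketch using struct kvm_ioeventfd from <linux/kvm.h>; fill_ioeventfd_deassign() is a hypothetical helper, and the filled struct would then go to the KVM_IOEVENTFD ioctl on the VM fd.

```c
#include <linux/kvm.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static void fill_ioeventfd_deassign(struct kvm_ioeventfd *kie, uint64_t addr,
				    uint32_t len, int fd, bool pio,
				    bool datamatch, uint64_t value)
{
	memset(kie, 0, sizeof(*kie));
	kie->addr = addr;
	kie->len = len;
	kie->fd = fd;		/* KVM looks the entry up by its eventfd */
	kie->flags = KVM_IOEVENTFD_FLAG_DEASSIGN;	/* flags in kie itself */
	if (pio)
		kie->flags |= KVM_IOEVENTFD_FLAG_PIO;
	if (datamatch) {	/* must match how it was assigned */
		kie->flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
		kie->datamatch = value;
	}
}
```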
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Implement firmware image loading for arm and set the boot start address
to the firmware address.
Add an option for the user to specify where to map the firmware.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When a firmware file is provided, kvmtool is not responsible for loading
a kernel image.
There is no reason for looking for a default kernel image when loading
a firmware.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The firmware loading/setup functions are in the fdt file, even though
they are not really related to it.
Move them to the file that does the main init/setup for memory.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Some software drivers check the VRT bit (BIT7) of Register D before
using the MC146818 RTC. Initialize the VRT bit in rtc__init() to
indicate that the RAM and time contents are valid.
Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
ARM64_CORE_REG() is currently only used to generate the KVM
register IDs for registers that happen to be 64 bits in size, so
KVM_REG_SIZE_U64 is hard-coded in the definition.
To enable this macro to generate correct encodings for the FPSIMD
registers too (which are a mix of 128-bit and 32-bit registers),
this patch extends the macro to encode the correct size for each
class of register in KVM_REG_ARM_CORE.
The approach is crude, but because the KVM_REG_ARM_CORE ID
arrangement is ABI, it's not expected to evolve.
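A sketch of the size selection. The encodings are copied from <linux/kvm.h> and arm64's <asm/kvm.h>, and struct core_regs is a local mirror of arm64's struct kvm_regs layout (an assumption, so the example builds on any 64-bit GCC/clang host): offsets before fp_regs are 64-bit registers, V0-V31 are 128-bit, and FPSR/FPCR are 32-bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodings as in <linux/kvm.h> and arm64 <asm/kvm.h>, copied locally. */
#define REG_ARM64	0x6000000000000000ULL
#define REG_ARM_CORE	(0x0010ULL << 16)
#define REG_SIZE_MASK	0x00f0000000000000ULL
#define REG_SIZE_U32	0x0020000000000000ULL
#define REG_SIZE_U64	0x0030000000000000ULL
#define REG_SIZE_U128	0x0040000000000000ULL

/* Local mirror of arm64's struct kvm_regs layout (assumption). */
struct fpsimd_regs {
	__uint128_t vregs[32];
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

struct core_regs {
	uint64_t regs[31], sp, pc, pstate;	/* user_pt_regs */
	uint64_t sp_el1, elr_el1, spsr[5];
	struct fpsimd_regs fp_regs;
};

/* Core register IDs index the struct in 32-bit words. */
#define CORE_REG_OFF(x)	(offsetof(struct core_regs, x) / sizeof(uint32_t))

static uint64_t core_reg_id(uint64_t off)
{
	uint64_t id = REG_ARM64 | REG_ARM_CORE | off;

	if (off < CORE_REG_OFF(fp_regs))
		id |= REG_SIZE_U64;	/* general-purpose and system regs */
	else if (off < CORE_REG_OFF(fp_regs.fpsr))
		id |= REG_SIZE_U128;	/* V0-V31 */
	else
		id |= REG_SIZE_U32;	/* FPSR, FPCR */
	return id;
}
```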
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The local copies of the kvm user API headers are getting stale.
In preparation for some arch-specific updates, this patch reflects
a re-run of util/update_headers.sh to pull in upstream updates from
linux v5.0-rc2.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
guest/guest_init.c is a generated file, but git doesn't currently
ignore it. This can be annoying when running git status etc.
This patch adds a suitable .gitignore entry for this file.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: Do the same for guest/guest_pre_init.c]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently, handling a pause signal only sets a state that will be
checked at the beginning of the CPU run loop. At the checking point the vCPU
sends the notification that it is actually paused allowing the pause
requester to confirm all vCPUs are paused.
Receiving the pause signal during a KVM_RUN ioctl will make KVM exit to
userspace. However, there is a small window between that check on
cpu->paused and the execution of KVM_RUN where the signal has been received
but the vCPU does not go back through the notification and starts KVM_RUN.
Since there is no guarantee the vCPU will come back to userspace, the
pause requester might deadlock.
Perform the pause directly from the signal handler. This relies on a vCPU
thread never receiving a pause signal while being pause, but such scenario
would have caused a deadlock for the pause requester anyway.
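A minimal sketch of pausing from the signal handler, assuming a pipe for the "paused" acknowledgements and a mutex that the requester holds for the duration of the pause. All names here (`pause_handler()`, `pause_all()`, `SIG_PAUSE`, ...) are hypothetical, and `nanosleep()` stands in for `ioctl(KVM_RUN)`. Note that POSIX does not list `pthread_mutex_*()` as async-signal-safe, so this is a simplification of the approach rather than a strictly portable pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define SIG_PAUSE SIGUSR1                /* hypothetical choice of signal */

static pthread_mutex_t pause_lock = PTHREAD_MUTEX_INITIALIZER;
static int pause_pipe[2];                /* vCPU -> requester "paused" acks */
static atomic_int stop_vcpu;

/* Runs on the vCPU thread: acknowledge immediately, then block until
 * the requester releases pause_lock. There is no flag to re-check in
 * the run loop, so no window in which the ack can be skipped. */
static void pause_handler(int sig)
{
	char c = 'p';
	ssize_t r;

	(void)sig;
	r = write(pause_pipe[1], &c, 1);
	(void)r;
	pthread_mutex_lock(&pause_lock);
	pthread_mutex_unlock(&pause_lock);
}

static void *vcpu_thread(void *arg)
{
	struct timespec ts = { 0, 1000000 };   /* 1 ms */

	(void)arg;
	while (!atomic_load(&stop_vcpu))
		nanosleep(&ts, NULL);              /* stand-in for ioctl(KVM_RUN) */
	return NULL;
}

static int setup_pause(void)
{
	struct sigaction sa = { 0 };

	sa.sa_handler = pause_handler;
	sigemptyset(&sa.sa_mask);
	if (pipe(pause_pipe) < 0)
		return -1;
	return sigaction(SIG_PAUSE, &sa, NULL);
}

/* Requester: hold pause_lock, signal each vCPU, wait for every ack. */
static void pause_all(const pthread_t *vcpus, int n)
{
	char c;

	pthread_mutex_lock(&pause_lock);
	for (int i = 0; i < n; i++)
		pthread_kill(vcpus[i], SIG_PAUSE);
	for (int i = 0; i < n; i++)
		while (read(pause_pipe[0], &c, 1) != 1)
			;
}

/* Releasing the lock lets every blocked handler return. */
static void continue_all(void)
{
	pthread_mutex_unlock(&pause_lock);
}
```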
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
With the following sequence:
kvm__pause();
kvm__continue();
kvm__pause();
There is a chance that not all paused threads have been resumed, and the
second kvm__pause will attempt to pause them again. Since the paused thread
is waiting to own the pause_lock, it won't write its second pause
notification. kvm__pause will be waiting for that notification while owning
pause_lock, so... deadlock.
The simple solution is not to try to pause threads that have not yet had
the chance to resume.
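The fix boils down to skipping vCPUs that are still marked as paused when counting how many acknowledgements to wait for. A minimal sketch, assuming a per-vCPU `paused` flag that is set on acknowledgement and cleared once the thread has really resumed; `struct vcpu`, `request_pause()` and the `signals_sent` counter (a stand-in for `pthread_kill()`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU state. */
struct vcpu {
	bool paused;        /* acked a pause and has not resumed yet */
	int signals_sent;   /* stand-in for pthread_kill() calls */
};

/* Signal only vCPUs that resumed since the last pause and return the
 * number of acknowledgements to wait for. A vCPU still marked paused
 * is blocked on pause_lock and would never send a second ack, so
 * waiting for it would deadlock; it stays paused anyway. */
static size_t request_pause(struct vcpu *cpus, size_t n)
{
	size_t expected = 0;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].paused)
			continue;
		cpus[i].signals_sent++;
		expected++;
	}
	return expected;
}
```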
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
After adding buffers to the virtio queue, the guest increments the avail
index. It then reads the event index to check if it needs to notify the
host. If the event index corresponds to the previous avail value, then
the guest notifies the host. Otherwise it means that the host is still
processing the queue and hasn't had a chance to increment the event
index yet. Once it gets there, the host will see the new avail index and
process the descriptors, so there is no need for a notification.
This is only guaranteed to work if both threads write and read the
indices in the right order. Currently virt_queue__available() lacks a
barrier between writing the event index and reading the avail index, so
the host may read a stale avail index while its event index update is not
yet visible to the guest.
HOST | GUEST
|
| write avail = 1
| mb()
| read event -> 0
write event = 0 | == prev_avail -> notify
read avail -> 1 |
|
write event = 1 |
read avail -> 1 |
wait() | write avail = 2
| mb()
| read event -> 0
| != prev_avail -> no notification
By adding a memory barrier on the host side, we ensure that it doesn't
miss any notification.
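The host-side ordering can be sketched with C11 atomics standing in for kvmtool's own barrier macros; `struct vq_shared` and `vq_available()` are hypothetical simplifications of the shared ring indices. The guest is assumed to do the mirror image (store avail, full fence, load event). With a sequentially consistent fence on each side, at least one of the two loads is guaranteed to observe the other side's store, which rules out the trace above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the shared indices. */
struct vq_shared {
	_Atomic uint16_t avail_idx;    /* written by the guest */
	_Atomic uint16_t avail_event;  /* written by the host */
};

/* Host: publish how far we have processed, then check for more work.
 * The full fence orders the store to avail_event before the load of
 * avail_idx (store->load), which no weaker barrier provides. */
static bool vq_available(struct vq_shared *q, uint16_t last_avail)
{
	atomic_store_explicit(&q->avail_event, last_avail, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* the host-side mb() */
	return atomic_load_explicit(&q->avail_idx, memory_order_relaxed) != last_avail;
}
```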
Reviewed-By: Steven Price <steven.price@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Ioports register bus devices when they are registered. These devices are
not unregistered when the ioport entries containing their headers are
unregistered, which leaves dangling pointers in the device rb_tree.
Unregister ioport bus devices when the ioport is unregistered.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
On Debian Stretch/Ubuntu 14.04, the libbfd provided by libbfd-dev or
binutils-dev packages does not like being linked statically.
Add a dynamic linkage test when detecting libbfd.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For some optional dependencies, both static and dynamic linking are tested.
But if the first one tested fails, the dependency is added to the
NOTFOUND list and reported as skipped, even though it might still be
built with the other linkage.
Add optional dependencies to NOTFOUND only if both linkages fail.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When building an object "foo.o", kvmtool also creates a ".foo.o.d" file,
using the dependency generation feature of CPP. This file describes in
Makefile format all headers included by foo.c. When one header is
modified, make rebuilds all objects that include it.
Dependency files in subfolders are currently ignored by make, because
the target doesn't contain the right prefix. For example virtio/.blk.o.d
has target "blk.o" instead of "virtio/blk.o". As a result, rebuilding
kvmtool without first issuing a make clean can introduce sneaky bugs,
where different objects use mismatched headers. To write the right
targets in dependency files, add a -MT argument to CPP.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use the new reserved_regions API to ensure that RAM doesn't overlap any
reserved region. This prevents, for instance, mapping an MSI doorbell
into the guest IPA space. For the moment we reject any overlap. In the
future, we might carve reserved regions out of the guest physical
space.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When passing devices to the guest, there might be address ranges
unavailable to the device. For instance, if address 0x10000000 corresponds
to an MSI doorbell, any transaction from a device to that address will be
directed to the MSI controller and might not even reach the IOMMU. In that
case 0x10000000 is reserved by the physical IOMMU in the guest's physical
space.
This patch introduces a simple API to register reserved ranges of
addresses that should not or cannot be provided to the guest. For the
moment it only checks that a reserved range does not overlap any user
memory (we don't consider MMIO) and aborts otherwise.
It should be possible instead to poke holes in the guest-physical memory
map and report them via the architecture's preferred route:
* ARM and PowerPC can add reserved-memory nodes to the DT they provide to
the guest.
* x86 could poke holes in the memory map reported with e820. This requires
postponing creation of the memory map until at least VFIO is initialized.
* MIPS could describe the reserved ranges with the "memmap=mm$ss" kernel
parameter.
This would also require calling KVM_SET_USER_MEMORY_REGION for all memory
regions at the end of kvmtool initialisation. Extra care should be taken
to ensure we don't break any architecture, since they currently rely on
having a linear address space with at most two memory blocks.
This patch doesn't implement any address space carving. If an abort is
encountered, the user can try to rebuild kvmtool with different addresses
or change the IOMMU reserved regions if possible.
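The check described above reduces to a half-open interval overlap test against every RAM bank. A minimal sketch, where `struct range` and `reserve_range()` are hypothetical; it returns -1 instead of aborting so that a caller can decide, and like the patch it ignores MMIO.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical range type: [start, start + size). */
struct range { uint64_t start, size; };

static bool ranges_overlap(const struct range *a, const struct range *b)
{
	/* Half-open intervals overlap iff each starts before the other ends. */
	return a->start < b->start + b->size && b->start < a->start + a->size;
}

/* Refuse a reserved range that intersects any RAM bank (MMIO ignored). */
static int reserve_range(const struct range *ram, size_t nr_ram,
			 const struct range *resv)
{
	for (size_t i = 0; i < nr_ram; i++) {
		if (ranges_overlap(&ram[i], resv)) {
			fprintf(stderr, "reserved range 0x%llx+0x%llx overlaps RAM\n",
				(unsigned long long)resv->start,
				(unsigned long long)resv->size);
			return -1;   /* the caller would abort here */
		}
	}
	return 0;
}
```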
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In some cases device regions don't support mmap. They can still be made
available to the guest by trapping all accesses and forwarding reads or
writes to VFIO. Such regions may be:
* PCI I/O port BARs.
* Sub-page regions, for example a 4kB region on a host with 64k pages.
* Similarly, sparse mmap regions. For example, when VFIO allows mmapping
fragments of a PCI BAR but forbids access to things like MSI-X tables.
We don't support the sparse capability at the moment, so trap these
regions instead (if VFIO rejects the mmap).
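Forwarding a trapped access amounts to a `pread()`/`pwrite()` on the VFIO device fd at the region's file offset (the `offset` reported by `VFIO_DEVICE_GET_REGION_INFO`) plus the offset of the access within the region. A minimal sketch; `struct trapped_region` and `region_access()` are hypothetical names, and a regular file can stand in for the device fd when trying it out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for a region that could not be mmapped. */
struct trapped_region {
	int device_fd;       /* VFIO device fd */
	off_t fd_offset;     /* region offset from VFIO_DEVICE_GET_REGION_INFO */
	uint64_t size;
};

/* Forward one trapped guest access to VFIO. Returns 0 on success. */
static int region_access(const struct trapped_region *r, uint64_t off,
			 void *data, size_t len, bool is_write)
{
	off_t pos = r->fd_offset + (off_t)off;
	ssize_t n;

	if (off > r->size || len > r->size - off)
		return -1;                       /* outside the region */
	n = is_write ? pwrite(r->device_fd, data, len, pos)
		     : pread(r->device_fd, data, len, pos);
	if (n < 0)
		perror("vfio region access");
	return n == (ssize_t)len ? 0 : -1;
}
```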
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Allow guests to use the MSI capability in devices that support it. Emulate
the MSI capability, which is simpler than MSI-X as it doesn't rely on
external tables. Reuse most of the MSI-X code. Guests may choose between
MSI and MSI-X at runtime since we present both capabilities, but they
cannot enable MSI and MSI-X at the same time (forbidden by PCI).
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
the kernel inject MSIs from a physical device directly into the guest.
It would be tempting to create the MSI routes at init time before starting
vCPUs, when we can afford to exit gracefully. But some of it must be
initialized when the guest requests it.
* On the KVM side, MSIs must be enabled after devices allocate their IRQ
lines and irqchips are operational, which may not happen until late_init.
* On the VFIO side, hardware state of devices may be updated when setting
up MSIs. For example, when passing a virtio-pci-legacy device to the
guest:
(1) The device-specific configuration layout (in BAR0) depends on
whether MSIs are enabled or not in the device. If they are enabled,
the device-specific configuration starts at offset 24, otherwise it
starts at offset 20.
(2) The Linux guest assumes that MSIs are initially disabled (it doesn't
actually check the capability), so it reads the device config at
offset 20.
(3) Had we enabled MSIs early, the host would have enabled the MSI-X
capability and the device would return the config at offset 24.
(4) The guest would read junk and explode.
Therefore we have to create MSI-X routes when the guest requests MSIs, and
enable/disable them in VFIO when the guest pokes the MSI-X capability. We
have to follow both physical and virtual state of the capability, which
makes the state machine a bit complex, but I think it works.
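The offset mismatch in steps (1) to (4) follows from the legacy virtio-pci layout: with MSI-X enabled, two extra 16-bit vector registers precede the device-specific config. Linux's `<linux/virtio_pci.h>` expresses this as `VIRTIO_PCI_CONFIG_OFF(msix_enabled)`; the sketch below redefines it locally under a different name, and `guest_reads_junk()` is a hypothetical helper.

```c
#include <stdbool.h>

/* Same definition as VIRTIO_PCI_CONFIG_OFF in <linux/virtio_pci.h>. */
#define LEGACY_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20)

/* The guest assumes MSI-X is off and reads at offset 20; the device
 * serves its config wherever its actual MSI-X state puts it. */
static bool guest_reads_junk(bool host_enabled_msix_early)
{
	unsigned int guest_off = LEGACY_CONFIG_OFF(false);
	unsigned int dev_off = LEGACY_CONFIG_OFF(host_enabled_msix_early);

	return guest_off != dev_off;
}
```

This is why the MSI-X state in VFIO has to follow what the guest programs instead of being set up eagerly.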
An important missing feature is pending MSI handling. When
a vector or the function is masked, we should rewire the IRQFD to a
special thread that keeps note of pending interrupts (or just poll the
IRQFD before recreating the route?). And when the vector is unmasked, one
MSI should be injected if it was pending. At the moment no MSI is
injected, we simply disconnect the IRQFD and all messages are lost.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Assigning devices using VFIO allows the guest to have direct access to the
device, whilst filtering accesses to sensitive areas by trapping config
space accesses and mapping DMA with an IOMMU.
This patch adds a new option to lkvm run: --vfio-pci=<BDF>. Before
assigning a device to a VM, some preparation is required. As described in
Linux Documentation/vfio.txt, the device driver needs to be changed to
vfio-pci:
$ dev=0000:00:00.0
$ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
$ echo $dev > /sys/bus/pci/drivers_probe
Adding --vfio-pci=$dev to lkvm run passes the device to the guest.
Multiple devices can be passed to the guest by adding more --vfio-pci
parameters.
This patch only implements PCI with INTx. MSI-X routing will be added in a
subsequent patch, and at some point we might add support for passing
platform devices to guests.
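The `<BDF>` argument uses the same `domain:bus:device.function` form as the sysfs path above. A minimal sketch of parsing it; `struct pci_bdf` and `parse_bdf()` are hypothetical, and whether kvmtool also accepts shorter forms (for example without the domain) is not covered here.

```c
#include <stdio.h>

/* Hypothetical parsed form of "DDDD:BB:dd.f", e.g. "0000:00:00.0". */
struct pci_bdf {
	unsigned int domain, bus, dev, fn;
};

static int parse_bdf(const char *s, struct pci_bdf *out)
{
	char trailing;
	int n = sscanf(s, "%4x:%2x:%2x.%1x%c",
		       &out->domain, &out->bus, &out->dev, &out->fn, &trailing);

	/* Exactly four fields, nothing after them, and a valid dev/fn. */
	if (n != 4 || out->dev > 0x1f || out->fn > 7)
		return -1;
	return 0;
}
```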
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It's always nice to have a log2 handy, and the vfio-pci code will need to
perform power of two allocation from an arbitrary size. Add fls_long and
roundup_pow_of_two, based on the GCC builtin.
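A minimal sketch of both helpers on top of `__builtin_clzl()`, which is undefined for 0 and therefore needs an explicit check. These definitions are illustrative and may differ in detail from the ones the patch adds.

```c
/* 1-based index of the most significant set bit; 0 when x == 0. */
static inline int fls_long(unsigned long x)
{
	return x ? (int)(sizeof(x) * 8) - __builtin_clzl(x) : 0;
}

/* Smallest power of two >= n. n must be non-zero and no larger than
 * the top bit of unsigned long, otherwise the shift overflows. */
static inline unsigned long roundup_pow_of_two(unsigned long n)
{
	return 1UL << fls_long(n - 1);
}
```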
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To ensure consistency between kvmtool and the kernel, import the UAPI
headers of the VFIO version we implement. This is from Linux v4.12.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add a way to iterate over all capabilities in a config space. Add a search
function for getting a specific capability.
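Capabilities form a linked list in config space: the list head is at offset 0x34 (valid when the Capabilities List bit is set in the Status register), and each entry starts with an ID byte followed by a "next" pointer. A minimal sketch of a search built on that walk. The constants are standard PCI offsets (named as in Linux's `<linux/pci_regs.h>`); `pci_find_cap()` and the loop bound of 48 are illustrative choices.

```c
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34

/* Return the config offset of capability cap_id, or 0 if absent. The
 * low two bits of each pointer are reserved and masked off, and the
 * iteration limit guards against malformed (looping) lists. */
static uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos;

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & ~3u;
	for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & ~3u;
	}
	return 0;
}
```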
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Introduce memory types RAM and DEVICE, along with a way for subsystems to
query the global memory banks. This is required by VFIO, which will need
to pin and map guest RAM so that assigned devices can safely do DMA to it.
Depending on the architecture, the physical map is made of either one or
two RAM regions. In addition, this new memory types API paves the way to
reserved memory regions introduced in a subsequent patch.
For the moment we put vesa and ivshmem memory into the DEVICE category, so
they don't have to be pinned. This means that physical devices assigned
with VFIO won't be able to DMA to the vesa frame buffer or ivshmem. In
order to do that, simply changing the type to "RAM" would work. But to
keep the types consistent, it would be better to introduce flags such as
KVM_MEM_TYPE_DMA that would complement both RAM and DEVICE type. We could
then reuse the API for generating firmware information (that is, for x86
bios; DT supports reserved-memory description).
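Type-filtered iteration over memory banks could look like the sketch below, where a consumer such as VFIO asks only for RAM banks to pin and map. `enum mem_type`, `struct mem_bank` and `for_each_mem_bank()` are hypothetical and do not reproduce kvmtool's actual API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; bit flags so callers can combine them. */
enum mem_type { MEM_TYPE_RAM = 1 << 0, MEM_TYPE_DEVICE = 1 << 1 };

struct mem_bank {
	uint64_t guest_phys, size;
	enum mem_type type;
};

/* Call fn for each bank whose type is in mask; stop at the first error.
 * Passing MEM_TYPE_RAM skips DEVICE banks such as vesa or ivshmem. */
static int for_each_mem_bank(const struct mem_bank *banks, size_t n,
			     unsigned int mask,
			     int (*fn)(const struct mem_bank *, void *),
			     void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!(banks[i].type & mask))
			continue;
		ret = fn(&banks[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: total size of the visited banks. */
static int sum_size(const struct mem_bank *b, void *data)
{
	*(uint64_t *)data += b->size;
	return 0;
}
```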
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add helpers to add and remove IRQFD routing for both irqchips and MSIs.
We have to make a special case of IRQ lines on ARM where the
initialisation order goes like this:
(1) Devices reserve their IRQ lines
(2) VGIC is set up with VGIC_CTRL_INIT (in a late_init call)
(3) MSIs are reserved lazily, when the guest needs them
Since we cannot set up IRQFD before (2), store the IRQFD routing for IRQ
lines temporarily until we're ready to submit them.
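The deferral can be sketched as a small pending queue that is flushed once the irqchip is initialised. `add_irqfd()`, `irqchip_late_init_done()` and the fixed-size array are hypothetical; `submit_route()` merely counts calls where real code would issue `ioctl(vm_fd, KVM_IRQFD, ...)`.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 32                 /* hypothetical limit */

struct irqfd_route { int gsi; int eventfd; };

static struct irqfd_route pending[MAX_PENDING];
static size_t nr_pending;
static bool irqchip_ready;
static int submitted;                  /* routes handed to "KVM" */

static int submit_route(const struct irqfd_route *r)
{
	(void)r;                           /* stand-in for the KVM_IRQFD ioctl */
	submitted++;
	return 0;
}

/* Before VGIC init, remember the route; afterwards, submit directly. */
static int add_irqfd(int gsi, int eventfd)
{
	struct irqfd_route r = { gsi, eventfd };

	if (irqchip_ready)
		return submit_route(&r);
	if (nr_pending == MAX_PENDING)
		return -1;
	pending[nr_pending++] = r;
	return 0;
}

/* Called from the late_init step that performs VGIC_CTRL_INIT. */
static int irqchip_late_init_done(void)
{
	irqchip_ready = true;
	for (size_t i = 0; i < nr_pending; i++)
		if (submit_route(&pending[i]))
			return -1;
	nr_pending = 0;
	return 0;
}
```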
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently all our virtual device interrupts are edge-triggered. But we're
going to need level-triggered interrupts when passing physical devices.
Let the device configure its interrupt kind. Keep edge as default, to
avoid changing existing users.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When implementing PCI device passthrough, we will need to forward config
accesses from a guest to the VFIO driver. Add a private cfg_ops structure
to the PCI header, and use it in the PCI config access functions.
A read from the guest first calls into the device's cfg_ops.read, to let
the backend update the local header before filling the guest register.
The same happens for a write: we let the backend perform the write and
replace the guest-provided register with whatever sticks, before updating
the local header.
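The read and write flow can be sketched as below. `struct cfg_ops`, `struct pci_dev` and the example backend are hypothetical shapes chosen for illustration, not kvmtool's real structures; the `hw[]` array stands in for the physical device behind VFIO, and callers are assumed to keep `offset + sz` within 256 bytes.

```c
#include <stdint.h>
#include <string.h>

struct pci_dev;

/* Hypothetical backend hooks. */
struct cfg_ops {
	void (*read)(struct pci_dev *dev, uint8_t offset, int sz);
	void (*write)(struct pci_dev *dev, uint8_t offset, void *data, int sz);
};

struct pci_dev {
	uint8_t hdr[256];      /* local copy of the config header */
	uint8_t hw[256];       /* stand-in for the physical config space */
	struct cfg_ops ops;
};

/* Guest read: let the backend refresh the local header first. */
static void cfg_read(struct pci_dev *dev, uint8_t offset, void *data, int sz)
{
	if (dev->ops.read)
		dev->ops.read(dev, offset, sz);
	memcpy(data, &dev->hdr[offset], sz);
}

/* Guest write: the backend writes and rewrites data with what stuck;
 * only then is the local header updated. */
static void cfg_write(struct pci_dev *dev, uint8_t offset, void *data, int sz)
{
	if (dev->ops.write)
		dev->ops.write(dev, offset, data, sz);
	memcpy(&dev->hdr[offset], data, sz);
}

/* Example backend: the low nibble of every written byte is hardwired to 0. */
static void hw_read(struct pci_dev *dev, uint8_t offset, int sz)
{
	memcpy(&dev->hdr[offset], &dev->hw[offset], sz);
}

static void hw_write(struct pci_dev *dev, uint8_t offset, void *data, int sz)
{
	uint8_t *bytes = data;

	for (int i = 0; i < sz; i++)
		dev->hw[offset + i] = bytes[i] & 0xf0;
	memcpy(data, &dev->hw[offset], sz);
}
```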
Try to untangle the PCI config access logic while we're at it.
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[JPB: moved to a separate patch]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The header files in arm/aarch*/include/asm/ are directly copied from
Linux, so we can't just put our own definitions in there.
Move the GICv2M MMIO frame size into a more private header, to avoid
breaking the build once the header files are synced from Linux.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently we accidentally overlap the GICv2m MMIO frame with the CPU
interface region. Fix this by moving the v2m frame below the CPUI region.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The KVM_VGIC_V3_ITS_SIZE macro from the Linux API header file already
covers the doorbell page, so we don't need to add that extra page size
in our code.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Vhost supports a single eventfd as the kick mechanism. Registering a
second one will override the first. To ensure vhost works with our
virtio-pci, only register the kick eventfd that is used by the guest.
Fixes: a508ea95f954 ("virtio/pci: Use port I/O for configuration registers by default")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
virtio/pci.c registers a notification ioeventfd on both PIO and MMIO
buses. But architectures other than x86 cannot differentiate MMIO from
PIO traps, and the kernel always calls kvm_io_bus_read/write with
KVM_MMIO_BUS as argument.
As a result kvmtool's ioeventfd isn't used with virtio PCI, because the
kernel can't find it and all accesses to the doorbell return to
userspace. To fix it, don't set the PIO flag if the architecture doesn't
support it.
Fixes: a508ea95f954 ("virtio/pci: Use port I/O for configuration registers by default")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
With vhost, the USER_POLL flag isn't passed to ioeventfd__add_event(),
so the function returns early and doesn't add the new event to the
used_ioevents list. As a result ioeventfd__del_event() doesn't remove the
KVM event or free the structure. Always add the event to the list.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The wmb() in next_desc seems out of place and the comments are
inaccurate. Remove the unnecessary barrier and clean up next_desc().
next_desc() is called by virt_queue__get_head_iov() when filling the iov
with descriptor addresses. It reads the descriptor's flags and next index.
virt_queue__get_head_iov() only reads the direct and indirect
descriptors, and doesn't write any shared memory except the iov and
cursors that will be read by the caller.
As far as I can see, vhost (the kernel implementation of virtio device)
does well without any barrier here, so I think it might be safe to remove.
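With the barrier removed, next_desc() only needs to read the descriptor's flags and next fields. A minimal sketch over a split-ring descriptor table; `VRING_DESC_F_NEXT` has the same value as in `<linux/virtio_ring.h>`, the struct mirrors the split-ring descriptor layout, and guest/host endianness conversion is omitted for brevity.

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1   /* same value as <linux/virtio_ring.h> */

/* Split-ring descriptor layout. */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Next descriptor index in the chain, or max when the chain ends.
 * Nothing shared is written here, so no write barrier is required. */
static uint16_t next_desc(const struct vring_desc *desc, uint16_t i, uint16_t max)
{
	if (!(desc[i].flags & VRING_DESC_F_NEXT))
		return max;
	return desc[i].next;
}
```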
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
One barrier seems to be missing from kvmtool's virtio implementation,
between virt_queue__available() and virt_queue__pop(). In the following
scenario "avail" represents the shared "available" structure in the virtio
queue:
Guest | Host
|
avail.ring[shadow] = desc_idx | while (avail.idx != shadow)
smp_wmb() | /* missing smp_rmb() */
avail.idx = ++shadow | desc_idx = avail.ring[shadow++]
If the host observes the avail.idx write before the avail.ring update,
then it will fetch the wrong desc_idx. Add the missing barrier.
This seems to fix the horrible bug I'm often seeing when running netperf
in a guest (virtio-net + tap) on AMD Seattle. The TX thread reads the
wrong descriptor index and either faults when accessing the TX buffer, or
pushes the wrong index to the used ring. In that case the guest complains
that "id %u is not a head!" and stops the queue.
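The guest/host pairing from the table above can be sketched with C11 fences: a release fence on the guest side plays the role of `smp_wmb()`, and an acquire fence on the host side plays the role of the missing `smp_rmb()`. `struct vring_avail_demo`, `push_avail()`, `pop_avail()` and `QSIZE` are hypothetical simplifications, not kvmtool code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define QSIZE 8   /* hypothetical queue size (power of two) */

struct vring_avail_demo {
	_Atomic uint16_t idx;
	uint16_t ring[QSIZE];
};

/* Guest: fill the ring slot, then publish it by bumping idx. */
static void push_avail(struct vring_avail_demo *avail, uint16_t *shadow,
		       uint16_t desc_idx)
{
	avail->ring[*shadow % QSIZE] = desc_idx;
	atomic_thread_fence(memory_order_release);          /* smp_wmb() */
	atomic_store_explicit(&avail->idx, ++(*shadow), memory_order_relaxed);
}

/* Host: return the next head, or -1 if none. The acquire fence keeps
 * the ring[] read from being satisfied before the idx read. */
static int pop_avail(struct vring_avail_demo *avail, uint16_t *shadow)
{
	if (atomic_load_explicit(&avail->idx, memory_order_relaxed) == *shadow)
		return -1;
	atomic_thread_fence(memory_order_acquire);          /* smp_rmb() */
	return avail->ring[(*shadow)++ % QSIZE];
}
```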
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Modern virtio PCI is allowed to use both memory and I/O BARs for the
config space, but legacy devices must use I/O for BAR0, as specified by
Virtio v1.0 cs04:
4.1.5.1.1.1 Legacy Interface: A Note on Device Layout Detection
"Transitional devices MUST expose the Legacy Interface in I/O space in
BAR0."
What virtio calls "I/O space" is most certainly port I/O, as hinted by the
discussion in 4.1.4 Virtio Structure PCI Capabilities, where it
distinguishes "memory BARs" from "I/O BARs". This is also the conclusion
made by SeaBIOS [1], which only looks for port I/O in BAR0 when driving a
transitional device.
I think MMIO was made the default by a463650caad6 ("kvm tools: pci: add
MMIO interface to virtio-pci devices") to support ARM targets, but we
support PIO as well as MMIO nowadays. So let's make the legacy virtio
implementation comply with the specification and use port I/O for BAR0.
[1] https://patchwork.kernel.org/patch/10038927/
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Bad things happen when the VIRTIO_RING_F_EVENT_IDX feature isn't
negotiated and we try to write the avail_event anyway. SeaBIOS, for
example, stores internal data where avail_event should be [1].
Technically the Virtio specification doesn't forbid the device from
writing the avail_event, and it's up to the driver to reserve space for it
("the transitional driver [...] MUST allocate the total number of bytes
for the virtqueue according to [formula containing the avail event]").
But it doesn't hurt us to avoid writing avail_event, and kvmtool needs
changes for interrupt suppression anyway, in order to comply with the
spec. Indeed Virtio 1.0 cs04 says, in 2.4.7.2 Device Requirements:
Virtqueue Interrupt Suppression:
"""
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
* The device MUST ignore the used_event value.
* After the device writes a descriptor index into the used ring:
- If flags is 1, the device SHOULD NOT send an interrupt.
"""
So let's do that.
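Both rules can be sketched in a few lines. `need_event()` uses the same wrap-around formula as Linux's `vring_need_event()`, and `VRING_AVAIL_F_NO_INTERRUPT` has the same value as in `<linux/virtio_ring.h>`; `should_signal()` and `update_avail_event()` are hypothetical helpers, not kvmtool's functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1   /* same value as <linux/virtio_ring.h> */

/* True if event_idx lies in (old_idx, new_idx], modulo 2^16. */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Without EVENT_IDX, used_event is ignored and only the flags hint counts. */
static bool should_signal(bool event_idx_negotiated, uint16_t avail_flags,
			  uint16_t used_event, uint16_t old_used, uint16_t new_used)
{
	if (!event_idx_negotiated)
		return !(avail_flags & VRING_AVAIL_F_NO_INTERRUPT);
	return need_event(used_event, new_used, old_used);
}

/* Only touch avail_event when EVENT_IDX was negotiated; otherwise the
 * driver may keep its own data there (as SeaBIOS does). */
static void update_avail_event(bool event_idx_negotiated, uint16_t *avail_event,
			       uint16_t last_avail)
{
	if (event_idx_negotiated)
		*avail_event = last_avail;
}
```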
[1] https://patchwork.kernel.org/patch/10038931/
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
We're going to need the feature bits negotiated between host and guest in
the core code. Save them in the virtio_device structure.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When characters are input on the console before virtio_console is
initialized, the term.c poll thread will get stuck in
virtio_console__inject_interrupt, because it ends up doing
pthread_cond_wait on the uninitialized poll_cond, which will hang
indefinitely. As a result it becomes impossible to input characters into
the guest, even when using serial instead of virtio console.
Initialize poll_cond statically to prevent this race.
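Static initialisation makes the condition variable valid before any init function runs, so an early waiter or signaller never touches an uninitialised object. A minimal sketch: `poll_cond` is the name from the text, while `poll_mutex`, `pending_input` and the two helpers are hypothetical.

```c
#include <pthread.h>

/* Valid from program start, with no init call required. */
static pthread_mutex_t poll_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static int pending_input;

static void inject_input(void)
{
	pthread_mutex_lock(&poll_mutex);
	pending_input++;
	pthread_cond_signal(&poll_cond);
	pthread_mutex_unlock(&poll_mutex);
}

/* Wait with a predicate loop to cope with spurious wakeups. */
static void wait_for_input(void)
{
	pthread_mutex_lock(&poll_mutex);
	while (!pending_input)
		pthread_cond_wait(&poll_cond, &poll_mutex);
	pending_input--;
	pthread_mutex_unlock(&poll_mutex);
}
```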
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit f6108d72e977 ("Add GICv2m support") introduced a bool return
type, but failed to include the respective header (this was probably
part of a former prerequisite series).
Fix this by including the header.
Fixes: f6108d72e977cce00e7bc824acd1d73da8cc9729 ("Add GICv2m support")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GICv2m is a small extension to the GICv2 architecture, specified in the
Server Base System Architecture (SBSA). It adds a set of registers that
convert MSIs into SPIs, effectively enabling MSI support for pre-GICv3
platforms.
Implement a GICv2m emulation entirely in userspace. Add a thin translation
layer in irq.c to catch the MSI->SPI routing setup of the guest, and then
transform irqfd injection of MSI into the associated SPI. There shouldn't
be any significant runtime overhead compared to gicv3-its.
The device can be enabled by passing "--irqchip gicv2m" to kvmtool.
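On GICv2m, a guest MSI is a write of the SPI number to the frame's `MSI_SETSPI_NS` register (offset 0x040, as in Linux's GICv2m driver). The translation step can therefore be sketched as below. `struct v2m_frame` and `v2m_msi_to_spi()` are hypothetical, and whether the SPI is then routed by its INTID or by an SPI index is an assumption left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define V2M_MSI_SETSPI_NS 0x040   /* doorbell offset within the v2m frame */

/* Hypothetical frame description: SPIs [spi_base, spi_base + nr_spis). */
struct v2m_frame {
	uint64_t base;
	uint32_t spi_base;
	uint32_t nr_spis;
};

/* Map an (address, data) MSI aimed at the doorbell to the SPI it names,
 * so the irqfd can inject that SPI instead. */
static bool v2m_msi_to_spi(const struct v2m_frame *f, uint64_t msi_addr,
			   uint32_t msi_data, uint32_t *spi)
{
	if (msi_addr != f->base + V2M_MSI_SETSPI_NS)
		return false;
	if (msi_data < f->spi_base || msi_data >= f->spi_base + f->nr_spis)
		return false;
	*spi = msi_data;
	return true;
}
```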
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When kvm_pause is called early (from taking the rwlock), it segfaults
because the CPU array is initialized slightly later. Fix this.
This doesn't happen at the moment but the gicv2m patch will register an
MMIO region, which requires br_write_lock. gicv2m is instantiated by
kvm__arch_init from within core_init (level 0). The CPU array is
initialized later in base_init (level 1).
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit 5857730ceee5 ("builtin-run: Pass console= parameter based on
active console") adds a console parameter to the kernel command line,
but doesn't account for x86 kvm__arch_set_cmdline populating
real_cmdline without adding a space. Fix the concatenation.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
x86 already does this in the backend, but doing it in the generic code
means that it is possible to boot a defconfig arm64 kernel under kvmtool
without having to specify any additional parameters at all.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In kvmtool, the terminal has at most 4 term-devices, which can be
connected to serial8250 or virtio console ports. kvmtool has a loop
thread that detects incoming data on these term-devices and then sends
the data to the guest through the serial8250 or virtio console ports. On
x86, kvmtool allows reading data from all 4 term-devices, but on ARM we
only support reading data from the first term-device; data from the other
term-devices is ignored.
We're currently adding kvmtool support to runv (a kind of hyper container)
with the Hyperhq developers. We use 3 serial ports in the guest to
communicate with the host (the container runtime). This works on x86 but
not on ARM, because we use terminal 2 to send and receive control
messages, and on ARM terminal 2 only works in one direction.
Therefore, change kvm__arch_read_term for ARM to allow reading data
from all term-devices.
Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Fully fledged bootloaders should really be populating this from within
the guest using virtio-rng, but having a way to specify it on the cmdline
is useful for developers or users without a bootloader.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
At the moment we use the linker to convert the compiled guest_init binary
into an ELF object file, so it can be embedded into the kvmtool binary
and easily accessed at runtime.
This has two problems:
1) This approach does not work for MIPS, because the linker defaults to
a different ABI than the compiler, so the GCC generated object files are
not compatible with this converted binary.
2) The size symbol as it's used at the moment in the object file is subject
to relocation, which leads to wrong results when using PIE builds, which is
now the default for some distributions.
Fix those two problems at once by using some shell tools to create a C
source file containing the guest_init binary, which then gets compiled into
a proper object file with the normal compiler and its flags.
The size of the guest init binaries is now simply a variable, which does
not get mangled at all.
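The generated source can be pictured as a byte array plus an ordinary size variable. The names and contents below are placeholders (the real file is produced by the build from the compiled guest_init binary). Because the size is stored as data, there is no linker-defined symbol value for a PIE relocation to alter.

```c
#include <stddef.h>

/* Placeholder contents: the real array holds the whole guest_init binary. */
const unsigned char guest_init_binary[] = {
	0x7f, 0x45, 0x4c, 0x46,   /* "\x7fELF", rest of the binary omitted */
};

/* A plain variable compiled with the normal compiler and flags. */
const size_t guest_init_binary_size = sizeof(guest_init_binary);
```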
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
So far the generation of the guest_init binaries is not properly
modelled in the Makefile: the intermediate object files are not targets.
This leads to failures when those files get deleted.
So (also in preparation for the upcoming rework) rework the dependency
chain to cover those intermediate files as well, which involves
splitting the generation into two steps.
On the way use automatic variables where applicable and remove the
explicit listing of the guest_init targets, which are now covered by
the final $(GUEST_OBJS) targets.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Linux commit fb652fdfe83710da0ca13448a41b7ed027d0a984
(https://www.spinics.net/lists/netdev/msg443562.html) removed UFO
support.
If we use tap mode for the network (--network mode=tap,tapif=...),
we get the following error:
"Warning: Config tap device TUNSETOFFLOAD error
You have requested a TAP device,
but creation of one has failed because: Invalid argument"
So when running on a recent kernel, we should remove TUN_F_UFO from
the TAP initialisation. But when running on an older kernel without the
above commit, we would then lose the UFO feature. Therefore, check
whether the kernel's tap driver supports UFO.
The tap UFO state is used in get_host_features to return the correct
VIRTIO_NET features. Deferring the tap UFO support check to
virtio_net__tap_init would be too late, so separate the tap creation
code from tap_init into a standalone function. This new function is
used in virtio_net_init to create the tap device and check the tap UFO
support status at the very beginning.
Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Since kernel commit 25dc1d6cc3082aab293e5dad47623b550f7ddd2a ("x86:
stop exporting msr-index.h to userland"), <asm/msr-index.h> is no
longer exported to userspace. Therefore, any toolchain built with
kernel headers >= 4.12 will no longer have this header file, causing a
build failure in kvmtool.
As a replacement, this patch includes inside x86/kvm-cpu.c the
necessary MSR_* definitions.
Reviewed-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
GCC 7 warns about truncating the mpidr when we print the cpu_name into
the device tree:
arm/fdt.c: In function ‘setup_fdt’:
arm/fdt.c:58:45: error: ‘%lx’ directive output may be truncated writing between 1 and 10 bytes into a region of size 7 [-Werror=format-truncation=]
snprintf(cpu_name, CPU_NAME_MAX_LEN, "cpu@%lx", mpidr);
Fix this by bumping the buffer to 15 bytes. We really only need 11 bytes,
but GCC isn't smart enough to identify that we mask out the top bits
of the MPIDR, and the analysis just seems to be based on types.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
makedev() should be sourced from sys/sysmacros.h rather than
sys/types.h. This is because glibc is moving away from having
it available in types.h.
https://patchwork.ozlabs.org/patch/611994/
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
With everything in place for the ITS emulation add a new option to the
--irqchip parameter to allow the user to specify --irqchip=gicv3-its
to enable the ITS emulation.
This will trigger creating the FDT node and an ITS register frame to
tell the kernel we want ITS emulation in the guest.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For ITS emulation we need the device ID along with the MSI payload
and doorbell address to identify an MSI, so we need to put it in the
GSI IRQ routing table too.
There is a per-VM capability by which the kernel signals the need for
a device ID, so check this and put the device ID into the routing
table if needed.
For PCI devices we take the bus/device/function triplet and add that
to the routing setup call.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Since we will soon start using GSI routing on ARM platforms too, we have
to set up the initial SPI routing table. Before the first call to
KVM_SET_GSI_ROUTING, the kernel holds this table internally, but this
is overwritten by the ioctl, so we have to explicitly set it up
here.
The routing is actually not used for IRQs triggered by KVM_IRQ_LINE,
but it needs to be here anyway. We use a simple 1:1 mapping.
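A minimal sketch of building the 1:1 table. `struct route_entry` is a simplified, hypothetical stand-in for `struct kvm_irq_routing_entry` (which nests the irqchip/pin pair in a union), and `ROUTING_IRQCHIP` mirrors `KVM_IRQ_ROUTING_IRQCHIP`. Since `KVM_SET_GSI_ROUTING` replaces the whole table, these entries would have to be part of every submission.

```c
#include <stdint.h>
#include <stdlib.h>

#define ROUTING_IRQCHIP 1   /* mirrors KVM_IRQ_ROUTING_IRQCHIP */

/* Simplified, hypothetical stand-in for struct kvm_irq_routing_entry. */
struct route_entry {
	uint32_t gsi;
	uint32_t type;
	uint32_t irqchip;
	uint32_t pin;
};

/* Identity table: GSI n -> irqchip 0, pin n. Caller frees the result. */
static struct route_entry *build_identity_routes(uint32_t nr_spis)
{
	struct route_entry *r = calloc(nr_spis, sizeof(*r));

	if (!r)
		return NULL;
	for (uint32_t i = 0; i < nr_spis; i++) {
		r[i].gsi = i;
		r[i].type = ROUTING_IRQCHIP;
		r[i].irqchip = 0;
		r[i].pin = i;
	}
	return r;
}
```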
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The ITS emulation requires a unique device ID to be passed along with the
MSI payload when kvmtool wants to trigger an MSI in the guest.
Following the proposed changes to the interface, add the PCI
bus/device/function triplet to the structure passed with the ioctl.
Check the respective capability before actually adding the device ID
to the kvm_msi struct.
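A minimal sketch of the device ID and the conditional fill. `pci_devid()` uses the standard PCI requester-ID packing (bus in bits 15:8, device in 7:3, function in 2:0). `struct msi_msg` is a simplified, hypothetical stand-in for `struct kvm_msi`, `MSI_VALID_DEVID` mirrors `KVM_MSI_VALID_DEVID`, and `need_devid` represents the result of the per-VM capability check (`KVM_CAP_MSI_DEVID` in Linux).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSI_VALID_DEVID (1u << 0)   /* mirrors KVM_MSI_VALID_DEVID */

/* Simplified, hypothetical stand-in for struct kvm_msi. */
struct msi_msg {
	uint32_t address_lo, address_hi, data;
	uint32_t flags;
	uint32_t devid;
};

/* Standard PCI requester ID: bus[15:8] | device[7:3] | function[2:0]. */
static uint32_t pci_devid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint32_t)bus << 8 | (uint32_t)(dev & 0x1f) << 3 | (fn & 0x7);
}

/* Only set the device ID when the kernel said it needs one. */
static void fill_msi(struct msi_msg *m, uint64_t addr, uint32_t data,
		     bool need_devid, uint32_t devid)
{
	m->address_lo = (uint32_t)addr;
	m->address_hi = (uint32_t)(addr >> 32);
	m->data = data;
	m->flags = 0;
	m->devid = 0;
	if (need_devid) {
		m->flags |= MSI_VALID_DEVID;
		m->devid = devid;
	}
}
```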
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
KVM capabilities can be per-VM; in this case the ioctl should be
issued on the VM file descriptor, not on the system fd.
Since this feature is guarded by a (system) capability itself, wrap
the call into a function of its own.
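One way such a wrapper might look. `KVM_CHECK_EXTENSION` and `KVM_CAP_CHECK_EXTENSION_VM` are real kernel interfaces, but `check_extension_vm()` and its signature are hypothetical. The wrapper asks the system fd whether per-VM checks are supported and only then directs the query at the VM fd.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Query 'cap' on the VM fd when the kernel supports per-VM checks,
 * otherwise on the system (/dev/kvm) fd. Returns the capability value,
 * or 0 if it is unsupported or the ioctl fails. */
static int check_extension_vm(int sys_fd, int vm_fd, int cap)
{
	int fd = sys_fd;
	int ret;

	/* KVM_CAP_CHECK_EXTENSION_VM is itself a system capability. */
	if (ioctl(sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_CHECK_EXTENSION_VM) > 0)
		fd = vm_fd;

	ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);
	return ret < 0 ? 0 : ret;
}
```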
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The ARM GICv3 ITS requires a separate device tree node to describe
the ITS. Add this as a child to the GIC interrupt controller node
to let a guest discover and use the ITS if the user requests it.
Since we now need to specify #address-cells for the GIC node, we
have to add two zeroes to the interrupt map to match that.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The GICv3 ITS expects a separate 64K page to hold ITS registers.
Add a function to reserve such a page in the guest's I/O memory and
use that for the ITS vGIC type.
To cover the 64K page with the MSI doorbell (which directly follows the
page with the register frames), we reserve it as well, although
the guest is never expected to write to it.
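The layout can be sketched as two consecutive 64K pages. In the GICv3 ITS, the doorbell register `GITS_TRANSLATER` sits at offset 0x40 of the second (translation) frame. The top-down allocator and the function names below are hypothetical, and the sketch assumes `*top` starts 64K-aligned.

```c
#include <stdint.h>

#define SZ_64K                 0x10000ULL
/* GITS_TRANSLATER lives at offset 0x40 of the ITS translation frame,
 * which is the second 64K page of the ITS region. */
#define GITS_TRANSLATER_OFFSET 0x40ULL

/* Hypothetical top-down MMIO allocator: carve the ITS control frame and
 * the translation frame (holding the doorbell) below *top. */
static uint64_t its_reserve(uint64_t *top)
{
	*top -= 2 * SZ_64K;
	return *top;
}

/* Guest physical address of the MSI doorbell for an ITS at its_base. */
static uint64_t its_doorbell(uint64_t its_base)
{
	return its_base + SZ_64K + GITS_TRANSLATER_OFFSET;
}
```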
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
KVM/arm recently got support for vGICv3 (and vITS), which is evident
in the updated header file. Now that ARM has feature parity when it
comes to the GIC emulation, we can remove the special defines we had
in place to allow compilation for ARM(32).
For simplicity we now use 64K sized GIC regions everywhere, as GICv3
mandates them.
[Andre: some updates, reword commit message]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The GICv3 ITS emulation brings some additions to the headers, so
let's update kvmtool's version of the headers to Linux v4.11-rc7-57.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If we need to inject an MSI into the guest, we currently rely on
working GSI MSI routing. However, we can get away without
IRQ routing, if the host supports MSI injection via the KVM_SIGNAL_MSI
ioctl.
So we try the GSI routing first, but if that fails due to a missing
IRQ routing functionality, we fall back to KVM_SIGNAL_MSI (if that is
supported).
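The fallback policy can be sketched with the two injection paths as function pointers, so it can run without `/dev/kvm`. The sketch assumes that a missing routing capability is reported as `-ENXIO`; the real error code and function names in kvmtool may differ. The example backends exist only to exercise the policy.

```c
#include <errno.h>
#include <stddef.h>

/* Each backend returns >= 0 on success or a negative errno. */
typedef int (*msi_inject_fn)(unsigned int msi_id);

/* Try GSI routing first; if the host lacks IRQ routing (assumed to be
 * reported as -ENXIO), fall back to direct injection when available. */
static int inject_msi(msi_inject_fn via_gsi_route,
		      msi_inject_fn via_signal_msi, unsigned int msi_id)
{
	int ret = via_gsi_route(msi_id);

	if (ret != -ENXIO || via_signal_msi == NULL)
		return ret;
	return via_signal_msi(msi_id);
}

/* Example backends used only to demonstrate the fallback. */
static int routing_missing(unsigned int id) { (void)id; return -ENXIO; }
static int routing_ok(unsigned int id)      { (void)id; return 0; }
static int signal_msi_ok(unsigned int id)   { (void)id; return 1; }
```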
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently we deny any VHOST_* functionality if the architecture
supports guests with a different endianness from the host. Most of the
time, even on those architectures, guest and host have the same
endianness, though, so we are needlessly denying the glory of VHOST.
Switch from compile time determination to a run time scheme, which
takes the actual endianness of the guest into account.
For this we change the semantics of VIRTIO_ENDIAN_HOST to return the
actual endianness of the host (the endianness of kvmtool at compile
time, really). The actual check in vhost_net now compares this against
the guest endianness.
This enables vhost support on ARM and ARM64.
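A minimal sketch of the runtime check. The sketch assumes that vhost interprets the rings in the host's byte order, with no cross-endian support. The enum and function names are hypothetical. The probe inspects the in-memory byte order of a known 16-bit value, which reflects how the kvmtool binary was compiled.

```c
#include <stdint.h>
#include <string.h>

enum endian { ENDIAN_LITTLE, ENDIAN_BIG };

/* Endianness of the running kvmtool binary, i.e. of the host. */
static enum endian host_endian(void)
{
	const uint16_t probe = 0x0102;
	uint8_t first_byte;

	memcpy(&first_byte, &probe, 1);
	return first_byte == 0x02 ? ENDIAN_LITTLE : ENDIAN_BIG;
}

/* Only allow vhost while the guest runs with the host's endianness. */
static int vhost_allowed(enum endian guest)
{
	return guest == host_endian();
}
```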
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When we set up GSI routing to map MSIs to KVM's GSI numbers, we
write the current device's MSI setup into the kernel routing table.
However the device driver in the guest can use PCI configuration space
accesses to change the MSI configuration (address and/or payload data).
Whenever this happens after we have already set up the routing table,
we must amend the previously sent data.
So when MSI-X PCI config space accesses write address or payload,
find the associated GSI number and the matching routing table entry
and update the kernel routing table (only if the data has changed).
This fixes vhost-net, where the queue's IRQFD was set up before the
MSI vectors.
To avoid issues, we ignore writes to the PBA region. The spec says:
"Software should never write, and should only read Pending Bits.
If software writes to Pending Bits, the result is undefined."
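The write handling can be sketched against the MSI-X table layout from the PCI spec: 16-byte entries holding the message address, upper address, data, and vector control. The sketch is hypothetical. It reports "route needs updating" only when the address or data actually changes, and it drops PBA writes. The struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 16-byte MSI-X table entry, as laid out by the PCI spec. */
struct msix_entry {
	uint32_t addr_lo;   /* 0x0: message address */
	uint32_t addr_hi;   /* 0x4: message upper address */
	uint32_t data;      /* 0x8: message data */
	uint32_t ctrl;      /* 0xc: vector control (mask bit) */
};

/* Apply an aligned 32-bit guest write to the MSI-X table. Returns true
 * only if the message address or data changed, i.e. the kernel routing
 * entry for this vector would need refreshing. */
static bool msix_table_write(struct msix_entry *table, unsigned int nr,
			     uint32_t off, uint32_t val)
{
	unsigned int idx = off / 16;
	struct msix_entry *e;
	uint32_t *slot;

	if (idx >= nr || (off & 3))
		return false;           /* out of range or unaligned: ignore */

	e = &table[idx];
	switch ((off % 16) / 4) {
	case 0: slot = &e->addr_lo; break;
	case 1: slot = &e->addr_hi; break;
	case 2: slot = &e->data; break;
	default:
		e->ctrl = val;          /* masking does not change the route */
		return false;
	}
	if (*slot == val)
		return false;           /* unchanged: no kernel update needed */
	*slot = val;
	return true;
}

/* Writes to the Pending Bit Array are dropped (undefined per the spec). */
static void msix_pba_write(uint32_t off, uint32_t val)
{
	(void)off;
	(void)val;
}
```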
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The current IRQ routing code in x86/irq.c is mostly implementing a
generic KVM interface which other architectures may use too.
Move the code to set up an MSI route into the generic irq.c file and
guard it with the KVM_CAP_IRQ_ROUTING capability to return an error
if the kernel does not support interrupt routing.
This also removes the dummy implementations for all other
architectures and only leaves the x86 specific code in x86/irq.c.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
As KVM supports only one (v)GIC per guest and it's hard to imagine that
we will ever need more than that, let's simplify the FDT generation by
not passing that single, constant phandle around.
Let's just reference that one global symbol from enum phandles instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The current implementation of fdt__alloc_phandle() suffers from being
implemented in a static inline function situated in a header file.
This will only create expected results within a single compilation
unit.
It seems a bit over the top to use a function to allocate phandles,
when at the end of the day a phandle is just a unique identifier.
To simplify things - especially with upcoming patches - we just
introduce an enum per architecture to hold all possible phandle sources
and use that instead of the dynamic allocation.
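An illustrative per-architecture phandle enum is shown below. The enumerator names are examples, not a definitive list. Value 0 is kept as a reserved placeholder, because a phandle of 0 is treated as "no phandle" in device trees. Every other phandle source is then a fixed, unique constant, and no allocator state can diverge between compilation units.

```c
/* Illustrative phandle sources for one architecture. */
enum phandles {
	PHANDLE_RESERVED = 0,  /* 0 is not a valid phandle */
	PHANDLE_GIC,           /* the single (v)GIC */
	PHANDLE_MSI,           /* e.g. an ITS / MSI controller node */
	PHANDLES_MAX
};
```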
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
As I was trying to install a new VM using the Debian installer,
I noticed that the return key would work just fine in a shell,
but wouldn't do anything in the menu. Pretty annoying.
Further investigation showed that the terminal was left in
cooked mode, converting CR to LF, and thus giving the VM
the wrong information.
Clearing the ICRNL flag in the input flag set fixes the issue.
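A sketch of the fix using the POSIX termios API. The `c_lflag` changes are an assumed baseline for a console that forwards keystrokes, not necessarily kvmtool's exact setup. Clearing `ICRNL` is the change the message describes. The function names are hypothetical.

```c
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Adjust console settings so keystrokes reach the guest unmodified.
 * The lflag line is an assumed baseline; clearing ICRNL is the fix:
 * deliver CR to the guest as CR instead of translating it to LF. */
static void console_termios(struct termios *t)
{
	t->c_lflag &= ~(ICANON | ECHO | ISIG);
	t->c_iflag &= ~ICRNL;
}

/* Apply the settings to a terminal fd (e.g. STDIN_FILENO). */
static int console_setup(int fd)
{
	struct termios t;

	if (tcgetattr(fd, &t) < 0)
		return -1;
	console_termios(&t);
	return tcsetattr(fd, TCSANOW, &t);
}
```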
Suggested-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature,
the first buffer added to the used ring should indicate the total number
of buffers used to hold the packet. Unfortunately, kvmtool has a number
of issues when constructing these merged buffers:
- Commit 5131332e3f1a ("kvmtool: convert net backend to support
bi-endianness") introduced a strange loop counter, which resulted in
hdr->num_buffers being set redundantly the first time round
- When adding the buffers to the ring, we actually add them one-by-one,
allowing the guest to see the header before we've inserted the rest
of the data buffers...
- ... which is made worse because we non-atomically increment the
num_buffers count in the header each time we insert a new data buffer
Consequently, the guest quickly becomes confused in its net rx code and
the whole thing grinds to a halt. This is easily exemplified by trying
to boot a root filesystem over NFS, which seldom succeeds.
This patch resolves the issues by allowing us to insert items into the
used ring without updating the index. Once the full payload has been
added and num_buffers corresponds to the total size, we *then* publish
the buffers to the guest.
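The "stage, then publish" scheme can be sketched with a simplified used ring. The structures are hypothetical reductions of the virtio layout, and the queue size is illustrative. Entries are written at `idx + offset` without moving `idx`, `num_buffers` is written once with the final count, and only then does a single index update expose every buffer of the packet. The C11 release fence stands in for the write barrier real code needs before updating the index.

```c
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SIZE 8   /* illustrative; must be a power of two */

struct used_elem {
	uint32_t id;       /* head descriptor index of the buffer */
	uint32_t len;      /* bytes written into it */
};

struct used_ring {
	uint16_t idx;      /* free-running index the guest polls */
	struct used_elem ring[QUEUE_SIZE];
};

struct net_hdr_mrg {
	uint16_t num_buffers;
};

/* Fill slot idx+offset without touching idx: invisible to the guest. */
static void used_stage(struct used_ring *u, uint16_t offset,
		       uint32_t id, uint32_t len)
{
	struct used_elem *e = &u->ring[(uint16_t)(u->idx + offset) % QUEUE_SIZE];

	e->id = id;
	e->len = len;
}

/* Expose 'count' staged entries with one index update. The fence keeps
 * the entry and header writes ordered before the idx store. */
static void used_publish(struct used_ring *u, uint16_t count)
{
	atomic_thread_fence(memory_order_release);
	u->idx += count;
}

/* Stage all buffers of one packet, set num_buffers once, then publish. */
static void rx_merged(struct used_ring *u, struct net_hdr_mrg *hdr,
		      const uint32_t *ids, const uint32_t *lens, uint16_t nbufs)
{
	for (uint16_t i = 0; i < nbufs; i++)
		used_stage(u, i, ids[i], lens[i]);
	hdr->num_buffers = nbufs;
	used_publish(u, nbufs);
}
```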
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
We use cacheable accesses on our end of the virtio ring, so make sure
the guest is aware of that, and thus doesn't try to use non-cacheable
DMA buffers, by including the dma-coherent property on its DT node.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: do the same for the PCI node for virtio-pci devices]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Make use of get_full_path_helper() instead of sprintf.
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The check on the return value of snprintf should reuse the size parameter,
rather than take sizeof(full_path) as the bound.
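A hypothetical helper showing the pattern. When `full_path` is a pointer parameter, `sizeof(full_path)` yields the size of the pointer rather than the buffer, so the truncation check must use the same `size` that was passed to `snprintf`. `join_path()` is illustrative and not kvmtool's actual function.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Join base and name into full_path, a buffer of 'size' bytes.
 * Returns 0 on success, -1 on error or truncation. */
static int join_path(char *full_path, size_t size,
		     const char *base, const char *name)
{
	int ret = snprintf(full_path, size, "%s/%s", base, name);

	/* Check against 'size', not sizeof(full_path) (a pointer here). */
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```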
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The code responsible for path verification is identical in several
functions. Move it to a new function.
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use strncpy instead of strcpy to avoid buffer overflow vulnerabilities.
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
[will: keep strcpy when we've verified the size already]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use snprintf instead of sprintf to avoid buffer overflow
vulnerabilities.
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
A path traversal exists because the guest can send "../" sequences to
the host 9p handlers. To fix this vulnerability, we ensure that path
components sent by the guest don't contain "../" sequences.
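A conservative sketch of such a check, written for illustration rather than copied from the patch. It rejects a bare `..`, a leading `../`, an embedded `/../`, and a trailing `/..`. It still accepts names that merely contain dots, such as `a..b` or `...`.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the guest-supplied path could step out of the export
 * root through a ".." component (hypothetical helper). */
static bool path_has_dotdot(const char *p)
{
	size_t len = strlen(p);

	if (strcmp(p, "..") == 0 || strncmp(p, "../", 3) == 0)
		return true;
	if (strstr(p, "/../") != NULL)
		return true;
	return len >= 3 && strcmp(p + len - 3, "/..") == 0;
}
```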
Signed-off-by: G. Campana <gcampana+kvm@quarkslab.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The latest Debian and Ubuntu GCC default to PIE code. Disable
PIC for bios and PIE for pre_init. Since the flag -no-pie
is not available on older GCCs, make use of the flag only if
the option is available. -fno-pic is more widely available
and should be safe to enable unconditionally.
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The madvise behavior is not a bit field and hence cannot be OR'ed.
Also, madvise_behavior_valid checks the flag using a case statement,
hence only one behavior is supposed to be supplied. Call madvise
twice, once for MERGEABLE and once for HUGEPAGE.
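A sketch of the two-call approach on an anonymous mapping. With the asm-generic values used on common Linux targets (`MADV_MERGEABLE` = 12, `MADV_HUGEPAGE` = 14), `12 | 14` evaluates to 14, so the OR'ed form would silently request only huge pages. `advise_guest_ram()` is a hypothetical name. Either call may legitimately fail, for example when KSM or THP is not built into the kernel.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Request each behaviour with its own madvise() call. Returns 0 if
 * both succeeded, -1 if either failed (failures are non-fatal hints). */
static int advise_guest_ram(void *addr, size_t len)
{
	int ret = 0;

	if (madvise(addr, len, MADV_MERGEABLE) < 0)
		ret = -1;
	if (madvise(addr, len, MADV_HUGEPAGE) < 0)
		ret = -1;
	return ret;
}
```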
Acked-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
open() sets the file offset to the beginning of the file, so there's no
need for an explicit lseek when called in kvm__arch_load_kernel_image.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When walking the devices rbtree to insert a node, we must keep track of the
parent node when we descend. If we skip this step, we always insert new
nodes with a NULL parent, bypassing __rb_insert()'s rebalance code.
Things get worse when we come to walk the tree, as we can't move up a
level. This isn't a problem in practice, as all devices appear to be
inserted in-order, so our rbtree is actually a monochrome linked list.
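The descent step can be sketched with a plain, unbalanced binary search tree standing in for the rbtree. The code remembers the last node visited as the parent, which is what the kernel-style `rb_link_node(node, parent, link)` needs before `rb_insert_color()` can rebalance. The node type and function are hypothetical.

```c
#include <stddef.h>

/* Plain BST node standing in for an rb_node (hypothetical). */
struct node {
	unsigned long key;
	struct node *parent, *left, *right;
};

/* Insert n under *root, tracking the parent during the descent so the
 * new node is linked to it rather than given a NULL parent. */
static void bst_insert(struct node **root, struct node *n)
{
	struct node **link = root, *parent = NULL;

	while (*link) {
		parent = *link;
		link = n->key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	n->left = n->right = NULL;
	*link = n;
}
```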
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
KVM exposes a level triggered timer to the guest, and yet kvmtool
presents it as being edge-triggered in the DT. Let's fix it and
match what the kernel exposes.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
From time to time (when new KVM kernel features get enabled in kvmtool),
we need to update the public kernel headers from a recent Linux tree.
Provide a script that makes sure we get the right files and that also
covers every architecture.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Update our copy of the KVM header files to match the kernel's v4.6.0.
This fixes the ARM PMU support, where the feature identifier was
changed during the merge window due to a merge conflict.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
readdir_r is deprecated[1] and usage of readdir is recommended.
[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=7584a3f96de88d5eefe5d6c634515278cbfbf052
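With plain `readdir()`, a NULL return means either end of directory or an error. The usual idiom is to clear `errno` before each call and inspect it afterwards. `count_entries()` is a hypothetical example of the idiom, not kvmtool code.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries in 'path' using readdir(). Returns -1 on error. */
static long count_entries(const char *path)
{
	DIR *d = opendir(path);
	long n = 0;
	int err;

	if (!d)
		return -1;
	for (;;) {
		errno = 0;              /* distinguish EOF from error */
		if (readdir(d) == NULL)
			break;
		n++;
	}
	err = errno;
	closedir(d);
	return err ? -1 : n;
}
```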
Signed-off-by: Michal Rostecki <michal.rostecki@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Our exit/reboot code is a bit of a mess:
- Both kvm__reboot and kvm_cpu_exit send SIGKVMEXIT to running vcpus
- When vcpu0 exits, the main thread starts executing destructors
(exitcalls) whilst other vcpus may be running
- The pause_lock isn't always held when inspecting is_running for
a vcpu
This patch attempts to fix these issues by restricting the exit/reboot
path to vcpu0 and the main thread. In particular, a KVM_SYSTEM_EVENT
will signal SIGKVMEXIT to vcpu0, which will join with the main thread
and then tear down the other vcpus before invoking any destructor code.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Port the spapr_pci implementation for ppc64le.
Based on suggestions by Alexey Kardashevskiy <aik@ozlabs.ru>
We should have always used phys_hi and 64 bit addr and size.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use the infrastructure for queuing a task to a specific vCPU and
set ILE (Little Endian Interrupt Handling) on POWER via the h_set_mode
hypercall.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds kvm_cpu__run_on_all_cpus() to run a task on each vCPU.
This infrastructure adds the task to each vCPU's task list and uses
signals to notify the vCPU. The vCPU executes any pending tasks
in its run loop.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently kvmtool is designed for, and works well on, big-endian ppc64 systems.
This patch adds support for little-endian systems.
The system does not boot yet, as support for h_set_mode is required to help
with exceptions in big-endian mode (the first page fault). That support comes in
the next patch of the series.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Debian and some other distros don't provide mkisofs due to
licensing concerns. xorrisofs from package xorriso provides
a command-line compatible command in this case. Update the
makefile of tests to pick xorrisofs if mkisofs is not available.
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
We don't have PMU support on 32bit ARM just yet, so let's work
around this the ugly way for now.
Cc: Will Deacon <will.deacon@arm.com>
Reported-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Now that we have a manpage in place, we can remove the manpage-style
text files from the Documentation directory.
This allows us also to get rid of the crude common-cmds.h generation,
which relied on these files and on a command-list.txt file.
Instead include the version of that header file generated with the
current HEAD into the source tree.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The kvmtool documentation is somewhat lacking, and it is not easily
accessible when it lives only in the source tree.
Add a good ol' manpage to document at least the basic commands and
their options.
This level of documentation matches the one that is already there in
the Documentation directory and should be subject to extension.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In order to enable the in-kernel PMU emulation code, add a tiny bit
of setup code that initializes the PMU on each CPU and populates
the DT. The IRQ is hardcoded to PPI7 (INTID23) in order to match
what QEMU does.
The code is enabled when the --pmu option is passed to lkvm.
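The PPI-to-INTID relation is a fixed offset. In the GIC, INTIDs 0-15 are SGIs, 16-31 are PPIs, and SPIs start at 32, so PPI7 is INTID 16 + 7 = 23. As far as the generic GIC binding goes, a DT `interrupts` property encodes the PPI by type (1) and the PPI-relative number (7). The helper below is purely illustrative.

```c
/* GIC INTID ranges: SGIs 0-15, PPIs 16-31, SPIs from 32. */
#define GIC_PPI_BASE 16u

/* Convert a PPI number (0-15) to its GIC interrupt ID. */
static unsigned int gic_ppi_to_intid(unsigned int ppi)
{
	return GIC_PPI_BASE + ppi;
}
```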
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In order to enable the PMU support on arm64, update the copy of the
kernel include files.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If a static libc is not present in the system the build will fail with
make complaining about commands starting before the first target. The
patch fixes indentation of a warning about missing static libc, thus
fixing the build.
Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For some reason (probably to have easy access to the command line)
the kernel loading for arm and arm64 was located in arm/fdt.c.
Move the routines to kvm.c (where other architectures put it) to
only have real device tree code in fdt.c. We use the pointer in
struct kvm to access the command line string.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use the new read_file() wrapper in our arm/arm64 kernel image loading
function instead of the private implementation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Replace the unsafe read-loops in the x86 kernel image loading
functions with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Replace the unsafe read-loops used in the MIPS kernel image loading
with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Replace the unsafe read-loops in the powerpc kernel image loading
function with our new and safe read_file() wrapper.
This should fix random failures in kernel image loading, especially
from pipes and sockets.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In various parts of kvmtool we simply try to read files into memory,
but fail to do so in a safe way. The read(2) syscall can return early
having only parts of the file read, or it may return -1 due to being
interrupted by a signal (in which case we should simply retry).
The ARM code seems to provide the only safe implementation, so take
that as an inspiration to provide a generic read_file() function
usable by every part of kvmtool.
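The safe read pattern can be sketched as follows. The function keeps calling `read(2)` until the requested length arrives, EOF is reached, or a real error occurs. It continues after short reads and retries on `EINTR`. `read_all()` is a hypothetical name written in the spirit of kvmtool's `read_in_full()`; it does not reproduce that function's actual signature.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to 'len' bytes, tolerating short reads and EINTR.
 * Returns the number of bytes read (less than len only at EOF), or -1. */
static ssize_t read_all(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t total = 0;

	while (total < len) {
		ssize_t n = read(fd, p + total, len - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted by a signal: retry */
			return -1;
		}
		if (n == 0)
			break;                  /* EOF */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```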
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Let's face it: Kernel loading is quite architecture specific. Don't
claim otherwise and move the loading routines into each
architecture's responsibility.
This introduces kvm__arch_load_kernel(), which each architecture can
implement accordingly.
Provide bzImage loading for x86 and ELF loading for MIPS as special
cases for those architectures (removing the arch specific code from
the generic kvm.c file on the way) and rename the existing "flat binary"
loader functions for the other architectures to the new name.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
After make lkvm-static & make clean, the dependency files for static
objects (.xxx.static.o.d) are not removed.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com>
Signed-off-by: James Hunt <james.o.hunt@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Looking back at the HEAD from a few commits ago, it's obvious that
using the LDFLAGS variable for linking the guest_init binary was
rather pointless, as it was zeroed in the beginning and then never
set.
As guest_init is a rather special binary that does not cope well with
arbitrary linker flags, let's reinstate the previous state by
removing the LDFLAGS variable from those linking steps. This allows
LDFLAGS to be used for linking the actual kvmtool binary only and
helps to re-merge commit d0e2772b93a ("Makefile: allow overriding
CFLAGS on the command line").
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
lkvm currently suffers from a Segmentation Fault when exiting, which can
also lead to the console not being cleaned up correctly after a VM exits.
The issue is that (the misnamed) kvm_cpu__reboot function sends a
SIGKVMEXIT to each vcpu thread, which causes those vcpu threads to exit
once their main loops (kvm_cpu__start) detect that cpu->is_running is
now false. The lack of synchronisation in this exit path means that a
concurrent pause event (due to the br_write_lock in ioport__unregister)
ends up sending SIGKVMPAUSE to an exited thread, resulting in a SEGV.
This patch fixes the issue by moving kvm_cpu__reboot into kvm.c
(renaming it in the process) where it can hold the pause_lock mutex
across the reboot operation. This in turn makes it safe for the pause
code to check the is_running field of each CPU before attempting to
send a SIGKVMPAUSE signal.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Riku Voipio reports a regression introduced by d0e2772b93ab ("Makefile:
allow overriding CFLAGS on the command line"):
| This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
| to something unsuitable for guest init.
Revert the problematic patch for the moment, while we rethink how we'd
like to support user-provided toolchain flags.
This reverts commit d0e2772b93abcc8a66f83ed8ed248c94adabce4b.
Conflicts:
Makefile
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
While we have an LDFLAGS variable in kvmtool's Makefile, it is used
neither when doing the feature tests nor when finally linking
the lkvm executable.
Add that variable to all the linking steps to allow the user to
specify custom library directories or linker options on the command
line.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When a Makefile variable is set on the make command line, all
Makefile-internal assignments to that very variable are _ignored_.
Since we add quite a few essential values to CFLAGS internally,
specifying some CFLAGS on the command line will usually break the
build (and not fix any include file problems you hoped to overcome
with that).
Somewhat against intuition, GNU make provides the "override" directive
to change this behavior; with it, assignments in the Makefile get
_appended_ to the value given on the command line. [1]
Change any internal assignments to use that directive, so that a user
can use:
$ make CFLAGS=-I/path/to/my/include/dir
to teach kvmtool about non-standard header file locations (helpful
for cross-compilation) or to tweak other compiler options.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[1] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When starting with custom kernel and disk options, kernel_cmdline is
NULL; it results in a segfault while trying to look for a string
using `strstr`:
__strstr_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strstr-sse2-unaligned.S:40
0x00000000004056bf in kvm_cmd_run_init (argc=<optimized out>, argv=<optimized out>) at builtin-run.c:608
0x000000000040639d in kvm_cmd_run (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-run.c:659
0x0000000000412b8f in handle_command (command=0x62bbc0 <kvm_commands>, argc=5, argv=0x7fffffffe840) at kvm-cmd.c:84
0x00007ffff7211b45 in __libc_start_main (main=0x403540 <main>, argc=6, argv=0x7fffffffe838, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffe828) at libc-start.c:287
0x0000000000403962 in _start ()
This patch sets a minimal cmdline when kernel_cmdline is NULL.
Fixes: 8a7163f3dbc7 ("kvmtool/run: append cfg.kernel_cmdline at the end of real_cmdline")
Signed-off-by: William Dauchy <william@gandi.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
9p doesn't support writable mmaps by default (when cache=none), so set it to
loose caching to allow for writable mmaps.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To me kvm_setup_guest_init() behaviour looks "obviously wrong" and
unfriendly because it always overwrites /virt/init.
kvm_setup_guest_init() is also called when we are going to use this
tree as a rootfs, and without another patch ("kvmtool/run: append
cfg.kernel_cmdline at the end of real_cmdline") the user can't use
"lkvm run -p init=my_init_path". This simply means that you can not
use a customized init unless you patch kvmtool.
Change extract_file() to do nothing if the file already exists. This
should not affect do_setup() which calls kvm_setup_guest_init() only
if make_dir(guestfs_name) creates the new/empty dir.
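A sketch of a "create only if absent" extract step. Opening with `O_CREAT | O_EXCL` makes the existence check and the creation a single atomic operation, so an existing user-provided file is never overwritten. The function name, return convention, and 0755 mode are assumptions for illustration, not kvmtool's actual code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 'data' to 'path' only if the file does not exist yet.
 * Returns 0 if written, 1 if an existing file was left untouched,
 * -1 on error. */
static int extract_file_if_absent(const char *path, const void *data,
				  size_t size)
{
	const char *p = data;
	size_t done = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0755);

	if (fd < 0)
		return errno == EEXIST ? 1 : -1;

	while (done < size) {
		ssize_t n = write(fd, p + done, size - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}
	return close(fd) == 0 ? 0 : -1;
}
```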
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
1. kvm_cmd_run_init() appends "root=/dev/root" to real_cmdline if
cfg.using_rootfs == T. This doesn't hurt but makes no sense and
looks confusing.
We do not need to initialize the kernel's saved_root_name[] and
"/dev/root" means nothing to name_to_dev_t().
We only need to pass this mount-tag to 9p but the kernel always
uses dev_name="/dev/root" in mount_root() path, so we can safely
remove this option from the command line.
2. "rw" in rootflags looks confusing too, it is silently ignored by
v9fs_parse_options() and has no effect.
We need to clear MS_RDONLY from root_mountflags, this is what the
"standalone" kernel parameter correctly does.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add lkvm-static to .gitignore.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add the tiny x86/init.S which just mounts /host and execs
/virt/init.
NOTE: of course, the usage of CONFIG_GUEST_PRE_INIT is ugly, we
need to cleanup this code. But I'd prefer to do this on top of
this minimal/simple change. And I think this needs cleanups in
any case, for example I think lkvm shouldn't abuse the "init="
kernel parameter at all.
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This comes as a separate patch because I do not really understand
/usr/bin/make; it should probably be updated.
Change the main Makefile so that if an arch defines ARCH_PRE_INIT
then we
- build $GUEST_INIT without "-static"
- add -DCONFIG_GUEST_PRE_INIT to $CFLAGS
- build $ARCH_PRE_INIT as guest/guest_pre_init.o and embed it
into lkvm the same as we do with guest/guest_init.o
This also means that ARCH_PRE_INIT case doesn't depend on glibc-static,
we can relax the SOURCE_STATIC check later.
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Turn kvm_setup_guest_init(guestfs_name) into a more generic helper,
extract_file(guestfs_name, filename, data, size) and reimplement
kvm_setup_guest_init() as a trivial wrapper.
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The vcpu module is a core component which should be removed last, but the
destructor was mistakenly marked as something that should be done first.
This would cause the vcpu data to be freed before anything else had the
chance to exit, while other code still assumed that data was valid,
causing use-after-frees.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
According to git grep they can be static.
term_got_escape can be static too, and we can even move it into
term_getc().
"int term_escape_char" doesn't make sense at least until we allow
to redefine it, turn it into preprocessor constant.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This allows the user to always override the parameters set by lkvm.
For example, currently 'lkvm run -p ro' doesn't work.
To keep the current logic we need to change strstr("root=") to check
cfg.kernel_cmdline, not real_cmdline. And perhaps we can even add a
simple helper add_param(name, val) to make this all more consistent;
it should only append "name=val" to real_cmdline if cfg.kernel_cmdline
doesn't include this parameter.
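The suggested `add_param(name, val)` helper might look like this sketch. The buffer and `user_cmdline` parameters are assumptions added so the function is self-contained. It appends " name=val" to the generated command line unless the user's command line already sets `name=`. Like the `strstr("root=")` check it generalizes, the substring test is naive and would also match, for example, `fakeroot=`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append "name=val" to real_cmdline (a buffer of 'size' bytes) unless
 * user_cmdline already contains "name=". Returns 0 or -1 on overflow. */
static int add_param(char *real_cmdline, size_t size, const char *user_cmdline,
		     const char *name, const char *val)
{
	char key[64];
	size_t used = strlen(real_cmdline);
	int n;

	n = snprintf(key, sizeof(key), "%s=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;
	if (user_cmdline && strstr(user_cmdline, key))
		return 0;                       /* the user's value wins */

	n = snprintf(real_cmdline + used, size - used, "%s%s=%s",
		     used ? " " : "", name, val);
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	return 0;
}
```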
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Signed-off-by: Sven Dowideit <SvenDowideit@home.org.au>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If one typically only boots full disk images, one wouldn't necessarily
want to statically link glibc for kvmtool's guest-init feature, as a
statically linked glibc triggers heavy security
maintenance.
Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com>
[will: moved all the guest_init handling into builtin_setup.c]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently the Makefile accepts only armv7l.*. When building kvmtool under a 32-bit
personality on AArch64 machines, uname -m reports "armv8l", so the build fails.
We expect 32-bit ARM builds on AArch64 to become standard, the same way
people do i386 builds on x86_64 machines.
Make the sed test a little more greedy so armv8l becomes acceptable.
Acked-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When VCPU #0 exits (e.g. due to KVM_EXIT_SYSTEM_EVENT), it sends
SIGKVMEXIT to all other VCPUs, waits for them to exit, then tears down
any remaining context. The signalling of SIGKVMEXIT is critical to
forcing VCPUs to shut down in response to a system event (e.g. PSCI
SYSTEM_OFF).
VCPUs other than VCPU #0 simply exit in kvm_cpu_thread without forcing
other CPUs to shut down. Thus if a system event is taken on a VCPU other
than VCPU #0, the remaining CPUs are left online. This results in KVM
tool not exiting as expected when a system event is taken on a VCPU
other than VCPU #0 (as may happen if the guest panics).
Fix this by tearing down all CPUs upon a system event, regardless of the
CPU on which the event occurred. While this means the VCPU thread will
signal itself, and VCPU #0 will signal all other VCPU threads a second
time, both are harmless.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
If an IO port device has no io_in handler, kvm__emulate_io would fall
through and call the io_out handler instead. Fix this so that only the
handler for the appropriate direction is called.
If no handler exists, kvm__emulate_io will automatically treat it as an
IO error (due to the default "ret = false").
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
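A minimal sketch of direction-aware dispatch with a default of `ret = false`. The `enum io_dir` values and `struct ioport_handlers` are hypothetical; KVM itself reports the direction as KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT in `kvm_run`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical direction constants for illustration. */
enum io_dir { IO_DIR_IN, IO_DIR_OUT };

struct ioport_handlers {
	bool (*io_in)(unsigned short port, void *data, int size);
	bool (*io_out)(unsigned short port, void *data, int size);
};

bool emulate_io(const struct ioport_handlers *ops, unsigned short port,
		void *data, int size, enum io_dir dir)
{
	bool ret = false;	/* no handler for this direction: IO error */

	if (dir == IO_DIR_IN) {
		if (ops->io_in)
			ret = ops->io_in(port, data, size);
	} else {
		if (ops->io_out)
			ret = ops->io_out(port, data, size);
	}
	return ret;
}

/* Example device that only implements writes. */
static bool demo_out(unsigned short port, void *data, int size)
{
	(void)port;
	(void)data;
	return size > 0;
}

const struct ioport_handlers out_only_device = {
	.io_in = NULL,
	.io_out = demo_out,
};
```

A read from `out_only_device` now fails instead of silently invoking `demo_out()`.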
|
|
The IO error path in kvm__emulate_io would call br_read_unlock(), then
goto error, which would call br_read_unlock() again. Refactor the
control flow to have only one exit path and one call to
br_read_unlock().
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
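The single-exit shape can be sketched as below. `read_lock()`/`read_unlock()` are hypothetical counting stand-ins for br_read_lock()/br_read_unlock(), used so an unbalanced unlock would show up as a nonzero depth:

```c
#include <stdbool.h>

/* Hypothetical stand-ins that track lock depth to expose imbalances. */
static int read_lock_depth;
static void read_lock(void) { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

bool emulate_io_single_exit(bool device_found, bool handler_result)
{
	bool ret = false;	/* default: treat as an IO error */

	read_lock();
	if (!device_found)
		goto out;	/* error path: no unlock here */
	ret = handler_result;
out:
	read_unlock();		/* the one and only unlock */
	return ret;
}
```

Every path, success or error, leaves `read_lock_depth` back at zero.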
|
|
Detach the tap device from the bridge automatically on exit, i.e. do
the reverse of what "script" does.
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
PAGE_SIZE may have been defined by the C library (musl-libc does that).
So avoid redefining it here unconditionally, instead only use our
definition if none has been provided by the libc.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
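A minimal sketch of the conditional definition. The sysconf()-based fallback value is an assumption for illustration and not necessarily kvmtool's own definition; `page_size()` is a hypothetical helper:

```c
#include <limits.h>	/* musl may define PAGE_SIZE here */
#include <unistd.h>

#ifndef PAGE_SIZE
/* Assumed fallback: query the page size at run time. */
#define PAGE_SIZE ((unsigned long)sysconf(_SC_PAGESIZE))
#endif

unsigned long page_size(void)
{
	return PAGE_SIZE;
}
```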
|
|
The clang compiler by default dislikes non-literal format strings
in *printf functions, so it complains about kvm__set_dir() in kvm.c
and about the error reporting functions.
Since a fix is not easy and the code itself is fine (it seems the
compiler is just not smart enough to see that), let's simply disable
the warning. Since GCC knows about this option as well (it just
doesn't enable it with -Wall), we can unconditionally add it
to the warning options.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
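For illustration, this is the kind of wrapper that trips a non-literal format-string warning (such as -Wformat-nonliteral): the format arrives as a parameter, so vsnprintf() never sees a string literal. `format_path()` is a hypothetical helper modelled loosely on a kvm__set_dir()-style function:

```c
#include <stdarg.h>
#include <stdio.h>

/* The caller supplies fmt, so the compiler cannot check it as a literal. */
int format_path(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

The code is correct as long as callers pass well-formed format strings, which is why silencing the warning is a reasonable trade-off here.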
|
|
As we now have the header file in our repository, we can safely follow
the recommendation in kvm.c and remove the hack adding the
KVM_CAP_MAX_VCPUS macro.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The musl-libc library provides implementations of strlcpy and strlcat,
so introduce a feature check for it and only use the kvmtool
implementation if there is no library support for it.
This avoids clashes with the public definition.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
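A sketch of how such a feature check might be consumed. `HAVE_STRLCPY` and `kvm_strlcpy()` are hypothetical names; when the macro is defined the library version (e.g. musl's) is used, otherwise a local implementation with the usual strlcpy semantics:

```c
#include <stddef.h>
#include <string.h>

/* HAVE_STRLCPY is a hypothetical macro set by a build-time feature check. */
size_t kvm_strlcpy(char *dest, const char *src, size_t size)
{
#ifdef HAVE_STRLCPY
	return strlcpy(dest, src, size);	/* library implementation */
#else
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';	/* always NUL-terminate */
	}
	return ret;			/* length of src, to detect truncation */
#endif
}
```

Wrapping under a different name sidesteps any clash with a public `strlcpy` declaration.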
|
|
The manpage of poll(2) states that the prototype of poll is defined
in <poll.h>. Use that header file instead of <sys/poll.h> to allow
compilation against musl-libc.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
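A short example using poll(2) via the portable `<poll.h>` header; `wait_readable()` is a hypothetical helper:

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable within timeout_ms, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```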
|
|
According to the manpage and the prototype, the second argument to
connect(2) is a "const struct sockaddr *", so cast our protocol-specific
type to that generic type.
This fixes compilation on musl-libc.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
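A minimal sketch of connecting to a UNIX socket with the cast the man page calls for. `unix_connect()` is a hypothetical helper; note that the descriptor is kept in an `int` so that a negative error return can be detected:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns a connected fd, or -1 on error. */
int unix_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	if (strlen(path) >= sizeof(addr.sun_path)) {
		close(fd);
		return -1;
	}
	strcpy(addr.sun_path, path);
	/* connect() takes a const struct sockaddr *: cast the specific type. */
	if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```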
|
|
clang does not like two const specifiers in one declaration, so
remove one to let clang compile kvmtool.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Stripping has no effect on object files, so having "-s -c" on the
command line makes no sense.
In fact clang complains about it and aborts with an error, so let's
just remove the unneeded "-s" switch here.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
A socket, like any other file descriptor, is of type "int" so that
negative error returns can be caught. Fix the declaration to allow
errors to be detected.
Found by clang, and needed to build with it.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Some functions in qcow.c return u64, but their results are checked
with < 0 because the callers want to catch the -1 error return value.
Compare explicitly against (u64)-1 to express this properly.
GCC silently compiled the check out (an unsigned value is never
negative), but clang complained about it.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
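A sketch of the pattern: `lookup_offset()` is a hypothetical function in the style of the qcow.c helpers, returning an offset or `(u64)-1` on error:

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical lookup: returns an offset, or (u64)-1 on error. */
u64 lookup_offset(int fail)
{
	return fail ? (u64)-1 : 4096;
}

/*
 * Broken form: "off < 0" is always false for an unsigned type, so the
 * compiler may drop the check. Fixed form: compare with the error value.
 */
int lookup_failed(u64 off)
{
	return off == (u64)-1;
}
```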
|
|
Due to our kernel heritage, we have code in kvmtool that relies on
the (still) implicit -std=gnu89 compiler switch.
It turns out that this only affects some structure initializations,
where we currently provide a cast to the type, which upsets GCC for
anything beyond -std=gnu89 (for instance gnu99 or gnu11).
We do still need the casts when initializing structures that are not
assigned to the same type, so we add them there explicitly.
This allows us to compile with all three GNU standards GCC
currently supports: gnu89/90, gnu99 and gnu11.
GCC is moving to gnu11 as its new default standard,
so let's fix this sooner rather than later.
(Compiling without GNU extensions still breaks, and I won't bother
fixing that without very good reasons.)
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
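An illustrative sketch of the two cases, using a hypothetical `struct point` and `POINT_ORIGIN` macro rather than kvmtool's own types: the initializer macro carries no cast, a declaration of the same type uses it directly, and an assignment adds the cast (a compound literal) explicitly at the use site:

```c
/* Hypothetical type and initializer macro for illustration. */
struct point {
	int x, y;
};

/* Plain brace list, no cast. */
#define POINT_ORIGIN { .x = 0, .y = 0 }

/*
 * Same type at declaration: braces alone are enough. The gnu89-era form
 *     static struct point start = (struct point) POINT_ORIGIN;
 * is the kind of cast described above as upsetting newer -std modes.
 */
static struct point start = POINT_ORIGIN;

/* Assignment: a brace list is not an expression, so the cast is explicit. */
void point_reset(struct point *p)
{
	*p = (struct point) POINT_ORIGIN;
}

struct point point_start(void)
{
	return start;
}
```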
|
|
Currently we unconditionally create a virtual GICv2 in the guest.
Add an --irqchip= parameter to let the user specify a different GIC
type for the guest; when this parameter is omitted, it still defaults to
--irqchip=gicv2.
For now the only other supported type is --irqchip=gicv3.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[will: use pr_err instead of fprintf]
Signed-off-by: Will Deacon <will.deacon@arm.com>
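A minimal sketch of mapping the option value to a GIC type, with gicv2 as the default when the parameter is absent. `enum irqchip_type` and `parse_irqchip()` are hypothetical names, not kvmtool's actual option-parsing code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical GIC type identifiers. */
enum irqchip_type { IRQCHIP_GICV2, IRQCHIP_GICV3, IRQCHIP_UNKNOWN };

enum irqchip_type parse_irqchip(const char *arg)
{
	if (arg == NULL || strcmp(arg, "gicv2") == 0)
		return IRQCHIP_GICV2;	/* default when --irqchip is omitted */
	if (strcmp(arg, "gicv3") == 0)
		return IRQCHIP_GICV3;
	return IRQCHIP_UNKNOWN;		/* caller reports this via pr_err() */
}
```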
|