Commit graph

74 commits

Author SHA1 Message Date
Mark Johnston 56a26fc1af libvmmapi: Conditionalize compilation of some functions
Hide definitions of several functions that currently don't have
implementatations in the arm64 vmm port.  In particular, add a
WITH_VMMAPI_SNAPSHOT preprocessor variable that can be used to enable
compilation of save/restore functions, and conditionalize compilation of
some functions only used by amd64 bhyve.  If in the long term they
remain amd64-only, they can move to vmmapi_machdep.c, but for now it's
not clear to me that that's the right thing to do.

MFC after:	2 weeks
Sponsored by:	Innovate UK
2024-04-10 11:17:56 -04:00
Mark Johnston 1855002ddf libvmmapi: Make vm_raise_msi() a common function
Currently, bhyve PCI emulation uses vm_lapic_msi() to raise an MSI in
the guest.  The arm64 port has a similar function, vm_raise_msi().
Add vm_raise_msi() on amd64 as well and have it simply call
vm_lapic_msi() so that bhyve can use a common, generically named
function.

Reviewed by:	corvink, andrew, jhb
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41752
2024-04-10 11:17:56 -04:00
Mark Johnston 5ec6c3007e libvmmapi: Add arm64 support
- Define wrappers for some MD ioctls.
- Provide a list of vmm device ioctls for cap_ioctl_limit().
- Disable use of the lowmem region.

Reviewed by:	corvink
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41005
2024-04-10 11:17:56 -04:00
Mark Johnston 7e0fa79412 libvmmapi: Make memory segment handling a bit more abstract
libvmmapi leaves a hole at [3GB, 4GB) in the guest physical address
space.  This hole is not used in the arm64 port, which maps everything
above 4GB.  This change makes the code a bit more general to accomodate
arm64 more naturally.  In particular:

- Remove vm_set_lowmem_limit(): it is unused and doesn't have
  well-defined constraints, e.g., nothing prevents a consumer from
  setting a lowmem limit above the highmem base.
- Define a constant for the highmem base and use that everywhere that
  the base is currently hard-coded.
- Make the lowmem limit a compile-time constant instead of a vmctx field.
- Store segment info in an array.
- Add vm_get_highmem_base(), for use in bhyve since the current value is
  hard-coded in some places.

No functional change intended.

Reviewed by:	corvink, jhb
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41004
2024-04-10 11:17:56 -04:00
Mark Johnston 3170dcaea9 libvmmapi: Move more amd64-specific ioctl wrappers to vmmapi_machdep.c
No functional change intended.

Reviewed by:	corvink, jhb
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41002
2024-04-10 11:17:56 -04:00
Mark Johnston e4656e10d1 libvmmapi: Move some ioctl wrappers to vmmapi_machdep.c
ioctls relating to segments and various x86-specific interrupt
controllers are easy candidates to move to vmmapi_machdep.c.

In vmmapi.h I'm just ifdefing MD prototypes for now.  We could instead
split vmmapi.h into multiple headers, e.g., vmmapi.h and
vmmapi_machdep.h, but it's not obvious to me yet that that's the right
approach.

No functional change intended.

Reviewed by:	corvink, jhb
MFC after:	2 weeks
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D40999
2024-04-10 11:17:55 -04:00
Warner Losh a2f733abcf lib: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:23:59 -07:00
Warner Losh b3e7694832 Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:16 -06:00
Mark Johnston e17eca3276 vmm: Avoid embedding cpuset_t ioctl ABIs
Commit 0bda8d3e9f ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI.  This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem.  Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure.  This data is
  only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
  structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
  vmexit info structure, and make VM_RUN copy it out separately.  Zero
  out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR:		271330
Reviewed by:	corvink, jhb (previous versions)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40113
2023-05-23 21:15:59 -04:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
John Baldwin 0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin 7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
John Baldwin d3956e4673 vmm: Use struct vcpu in the instruction emulation code.
This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code.  To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37161
2022-11-18 10:25:37 -08:00
John Baldwin 2b4fe856f4 bhyve: Remove unused vm and vcpu arguments from vm_copy routines.
The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37158
2022-11-18 10:25:36 -08:00
Mark Johnston 3e9b4532d1 libvmmapi: Provide an interface for limiting rights on the device fd
Currently libvmmapi provides a way to get a list of the allowed ioctls
on the vmm device file, so that bhyve can limit rights on the device
file fd.  The interface is rather strange: it allocates a copy of the
list but returns a const pointer, so the caller has to cast away the
const in order to free it without aggravating the compiler.

As far as I can see, there's no reason to make a copy of the array, but
changing vm_get_ioctls() to not do that would break compatibility.  So
this change just introduces a better interface: move all rights-limiting
logic into libvmmapi.

Any new operations on the fd should be wrapped by libvmmapi, so also
discourage use of vm_get_device_fd().  Currently bhyve uses it only when
limiting rights on the device fd.

No functional change intended.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D37098
2022-10-24 17:33:13 -04:00
Vitaliy Gusev 1228a047aa libvmm: add __BEGIN_DECLS/__END_DECLS for linking with c++ binaries
Reviewed by:	jhb, markj, imp
Sponsored by:	vStack
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35719
2022-07-11 15:58:42 -04:00
Vitaliy Gusev f0880ab791 libvmmapi: Add vm_close()
Currently there is no way to safely free a vm structure without
leaking the fd.  vm_destroy() closes the fd but also destroys the VM
whereas in some cases a VM needs to be opened (vm_open) and then
closed (vm_close).

Reviewed by:	jhb
Sponsored by:	vStack
Differential Revision:	https://reviews.freebsd.org/D35073
2022-06-30 14:21:57 -07:00
Robert Wing 3efc45f34e libvmm: constify vm_get_name()
Allows callers of vm_get_name() to retrieve the vm name without having
to allocate a buffer.

While in the vicinity, do minor cleanup in vm_snapshot_basic_metadata().

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D34290
2022-03-17 21:38:21 -08:00
Corvin Köhne e47fe3183e bhyve: add ROM emulation
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).

It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.

Differential Revision:	https://reviews.freebsd.org/D33129
Sponsored by:   Beckhoff Automation GmbH & Co. KG
MFC after:      1 month
2022-03-10 12:30:37 +01:00
D Scott Phillips f8a6ec2d57 bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by:	scottph
MFC After:	1 week
Sponsored by:	Intel Corporation
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D24066
2021-03-19 11:04:36 +08:00
Robert Wing 4f4065e0a2 libvmm: clean up vmmapi.h
struct checkpoint_op, enum checkpoint_opcodes, and
MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi
header.

They are used for the save/restore functionality that bhyve(8)
provides and are better suited in usr.sbin/bhyve/snapshot.h

Since bhyvectl(8) requires these, the Makefile for bhyvectl has been
modified to include usr.sbin/bhyve/snapshot.h

Reviewed by:    kevans, grehan
Differential Revision:  https://reviews.freebsd.org/D28410
2021-02-17 17:46:42 -09:00
John Baldwin 1925586e03 Honor the disabled setting for MSI-X interrupts for passthrough devices.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X.  This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	grehan, markj
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27212
2020-11-24 23:18:52 +00:00
Conrad Meyer 8a68ae80f6 vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).

Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24525
2020-05-15 15:54:22 +00:00
John Baldwin 483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
Rodney W. Grimes 01d822d33b Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse.  [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.

bhyvectl --get-cpu-topology option added.

Reviewed by:	grehan (maintainer, earlier version),
Reviewed by:	bcr (manpages)
Approved by:	bde (mentor), phk (mentor)
Tested by:	Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after:	1 week
Relnotes:	Y
Differential Revision:	https://reviews.freebsd.org/D9930
2018-04-08 19:24:49 +00:00
John Baldwin fc276d92ae Add a way to temporarily suspend and resume virtual CPUs.
This is used as part of implementing run control in bhyve's debug
server.  The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor.  Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).

The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server).  The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs.  The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.

Reviewed by:	avg, grehan
Tested on:	Intel (jhb), AMD (avg)
Differential Revision:	https://reviews.freebsd.org/D14466
2018-04-06 22:03:43 +00:00
John Baldwin 5f8754c077 Add a new variant of the GLA2GPA ioctl for use by the debug server.
Unlike the existing GLA2GPA ioctl, GLA2GPA_NOFAULT does not modify
the guest.  In particular, it does not inject any faults or modify
PTEs in the guest when performing an address space translation.

This is used by bhyve's debug server to read and write memory for
the remote debugger.

Reviewed by:	grehan
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D14075
2018-02-26 19:19:05 +00:00
John Baldwin 4f8666989a Add two new ioctls to bhyve for batch register fetch/store operations.
These are a convenience for bhyve's debug server to use a single
ioctl for 'g' and 'G' rather than a loop of individual get/set
ioctl requests.

Reviewed by:	grehan
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D14074
2018-02-22 00:39:25 +00:00
Pedro F. Giffuni 5e53a4f90f lib: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-26 02:00:33 +00:00
Bartek Rutkowski 00ef17befe Capsicum support for bhyve(8).
Adds Capsicum sandboxing to bhyve.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by:	grehan, oshogbo
Approved by:	emaste, grehan
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D8290
2017-02-14 13:35:59 +00:00
Pedro F. Giffuni 75f46cf6c8 lib: minor spelling fixes in comments.
No functional change.
2016-05-01 19:37:33 +00:00
Neel Natu 9b1aa8d622 Restructure memory allocation in bhyve to support "devmem".
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D2762
MFC after:	4 weeks
2015-06-18 06:00:17 +00:00
Neel Natu 9c4d547896 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00
Tycho Nightingale ef7c2a82ed Fix "MOVS" instruction memory to MMIO emulation. Currently updates to
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.

Add "MOVS" instruction support for the 'MMIO to MMIO' case.

Reviewed by:	neel
2015-04-01 00:15:31 +00:00
Neel Natu 009e2acba6 Fix a bug in libvmmapi 'vm_copy_setup()' where it would return success even if
the 'gpa' was in the guest MMIO region. This would manifest as a segmentation
fault in 'vm_map_copyin()' or 'vm_map_copyout()' because 'vm_map_gpa()' would
return NULL for this 'gpa'.

Fix this by calling 'vm_map_gpa()' in 'vm_copy_setup' and returning a failure
if the 'gpa' cannot be mapped. This matches the behavior of 'vm_copy_setup()'
in vmm.ko.

MFC after:	1 week
2015-01-19 06:51:04 +00:00
Neel Natu d087a39935 Simplify instruction restart logic in bhyve.
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
  as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
  as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision:	https://reviews.freebsd.org/D1526
Reviewed by:		grehan
MFC after:		2 weeks
2015-01-18 03:08:30 +00:00
Neel Natu 0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Neel Natu d37f2adb38 Fix fault injection in bhyve.
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
2014-07-24 01:38:11 +00:00
Neel Natu d665d229ce Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m
2014-07-23 04:28:51 +00:00
Neel Natu 091d453222 Handle nested exceptions in bhyve.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
2014-07-19 20:59:08 +00:00
Neel Natu be679db4cd Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
2014-06-24 02:02:51 +00:00
Neel Natu 5fcf252f41 Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained
by vmm.ko. This allows the virtual machine to be restarted without having
to destroy it first.

Reviewed by:	grehan
2014-06-07 21:36:52 +00:00
Neel Natu 95ebc360ef Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
Neel Natu 6303b65d35 Fix issue with restarting an "insb/insw/insl" instruction because of a page
fault on the destination buffer.

Prior to this change a page fault would be detected in vm_copyout(). This
was done after the I/O port access was done. If the I/O port access had
side-effects (e.g. reading the uart FIFO) then restarting the instruction
would result in incorrect behavior.

Fix this by validating the guest linear address before doing the I/O port
emulation. If the validation results in a page fault exception being injected
into the guest then the instruction can now be restarted without any
side-effects.
2014-05-26 18:21:08 +00:00
Neel Natu da11f4aa1d Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).
2014-05-24 23:12:30 +00:00
John Baldwin b3e9732a76 Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
  8 steerable pins configured via config space access to byte-wide
  registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
  pin and a PCI interrupt router pin.  When a PCI INTx interrupt is
  asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
  8259A pins (ISA IRQs) and initialize the interrupt line config register
  for the corresponding PCI function with the ISA IRQ as this matches
  existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
  configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
  PRT tables and return the appropriate table based on the configured
  routing configuration.  Note that if the lpc device is not configured, no
  routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
  to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
  pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
  the DSDT.  iasl(8) will fill in the actual number of elements, and
  this makes it simpler to generate a Package with a variable number of
  elements.

Reviewed by:	tycho
2014-05-15 14:16:55 +00:00
Neel Natu 0dd10c0047 Don't include the guest memory segments in the bhyve(8) process core dump.
This has not added a lot of value when debugging bhyve issues while greatly
increasing the time and space required to store the core file.

Passing the "-C" option to bhyve(8) will change the default and dump guest
memory in the core dump.

Requested by:	grehan
Reviewed by:	grehan
2014-05-13 16:40:27 +00:00
Neel Natu f0fdcfe247 Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with:	grehan@
2014-04-28 22:06:40 +00:00
Tycho Nightingale b96be57a2d Add support for emulating the slave PIC.
Reviewed by:	grehan, jhb
Approved by:	grehan (co-mentor)
2014-04-14 19:00:20 +00:00
Neel Natu b15a09c05e Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with:	grehan
2014-03-26 23:34:27 +00:00