Commit graph

475 commits

Author SHA1 Message Date
Peter Grehan fd19881492 Temporarily revert r282922 which bumped the max descriptors.
While there is no issued with the number of descriptors in
a virtio indirect descriptor, it's a guest's choice as to
whether indirect descriptors are used. For the case where
they aren't, the virtio block ring size is still 64 which
is less than the now reported max_segs of 67. This results
in an assertion in recent Linux guests even though it was
benign since they were using indirect descs.

The intertwined relationship between virtio ring size,
max seg size and blockif queue size will be addressed
in an upcoming commit, at which point the max descriptors
will again be bumped up to 67.
2015-05-21 04:19:22 +00:00
Peter Grehan 253396a378 Bump the size of the blockif scatter-gather list to 67.
The Windows virtio driver ignores the advertized seg_max
field and assumes the host can accept up to 67 segments
in indirect descriptors, triggering an assert in the bhyve
process.

No objection from:	mav
Reviewed by:	neel
Reported and tested by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-14 21:08:48 +00:00
Peter Grehan 604b521003 Set the subvendor field in config space to the vendor ID.
This is required by the Windows virtio drivers to correctly
match a device.

Submitted by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-13 17:38:07 +00:00
Neel Natu 4e43c1e8b5 Allow configuration of the sector size advertised to the guest.
The default behavior is to infer the logical and physical sector sizes from
the block device backend. However older versions of Windows only work with
specific logical/physical combinations:
- Vista and Windows 7:	512/512
- Windows 7 SP1:	512/512 or 512/4096

For this reason allow the sector size to be specified using the following
block device option: sectorsize=logical[/physical]

Reported by:	Leon Dang (ldang@nahannisys.com)
Reviewed by:	grehan
MFC after:	2 weeks
2015-05-12 00:30:39 +00:00
Peter Grehan be80efd491 Handling indirect descriptors is a capability of the host and
not one that needs to be negotiated. Use the host capabilities
field and not the negotiated field when verifying that indirect
descriptors are supported.

Found with the Redhat Windows viostor driver, which clears
the indirect capability in the negotiated caps and then starts
using them.

Reported and tested by: Leon Dang (ldang@nahannisys.com)
MFC after:   2 weeks
2015-05-11 21:24:10 +00:00
Baptiste Daroussin 3deada4168 Merge from HEAD 2015-05-07 23:18:23 +00:00
Neel Natu 1cba333329 Allow byte reads of AHCI registers.
This is needed to support Windows guests that use byte reads to access certain
AHCI registers (e.g. PxTFD.Status and PxTFD.Error).

Reviewed by:	grehan, mav
Reported by:	Leon Dang (ldang@nahannisys.com)
Differential Revision:	https://reviews.freebsd.org/D2469
MFC after:	2 weeks
2015-05-07 18:35:15 +00:00
Alexander Motin 79f1cdb4fb Add memory barrier to r281764.
While race at this point may cause only a single packet delay and so was
not really reproduced, it is better to not have it at all.

MFC after:	1 week
2015-05-06 18:04:31 +00:00
Neel Natu 9c4d547896 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00
Alexander Motin 5b3ee130e3 Reimplement queue freeze on error, added in r282429:
It is not required to use CLO to recover from task file error, it should
be enough to do only stop/start, that does not clear the PxTFD.STS.ERR.

MFC after:	13 days
2015-05-06 09:59:19 +00:00
Alexander Motin b208a147b9 Implement in-order execution of non-NCQ commands.
Using status updates in r282364, block queue on BSY, DRQ or ERR bits set.
This can be a performance penalization for non-NCQ commands, but it is
required for proper error recovery and standard compliance.

MFC after:	2 weeks
2015-05-04 19:55:01 +00:00
Baptiste Daroussin 7757a1b4dc Merge from head 2015-05-03 19:30:11 +00:00
Alexander Motin 9dba9460d9 Implement basic PxTFD.STS.BSY reporting.
MFC after:	2 weeks
2015-05-03 07:43:58 +00:00
Alexander Motin 1025d8e679 Initialize PxCMD on reset and make its read-only bits such.
MFC after:	2 weeks
2015-05-02 16:11:29 +00:00
Alexander Motin 52f224dfbf Handle ATA_SEND_FPDMA_QUEUED as NCQ in ahci_port_stop().
MFC after:	1 week
2015-05-02 14:43:37 +00:00
Neel Natu fd4e0d4c52 Advertise an additional memory BAR in the "dummy" device emulation.
This is useful for testing the MOVS emulation when both the source and
destination addresses are in the MMIO space.

MFC after:	1 week
2015-05-02 03:25:24 +00:00
Neel Natu f39630c2d6 Implement the century byte in the RTC. Some guests require this field to be
properly set.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-28 23:44:47 +00:00
Neel Natu 54335630a7 Don't allow guest to modify readonly bits in the PCI config 'status' register.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-24 19:15:38 +00:00
John Baldwin 179fa75e6e Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
Alexander Motin fdd86701e5 Don't set bits that should be zero for SATA devices.
Old value made Linux think that it is PATA device with SATA bridge.

MFC after:	2 weeks
2015-04-20 19:11:27 +00:00
Alexander Motin 910280e539 Report link as up if tap device is not specified (black hole).
MFC after:	2 weeks
2015-04-20 14:55:01 +00:00
Alexander Motin f2c58daab8 Report link as up only if we managed to open tap device.
It would be cool to report tap device status, but it has no such API.

MFC after:	2 weeks
2015-04-20 14:23:18 +00:00
Alexander Motin d9a6698393 Disable RX/TX queues notifications when not needed.
This reduces CPU load and doubles iperf throughput, reaching 2-3Gbit/s.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-04-20 10:29:42 +00:00
Alexander Motin bb1524af0c Workaround bhyve virtual disks operation on top of GEOM providers.
GEOM does not support scatter/gather lists in its I/Os.  Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size.  If such case is detected,
move the data through temporary sequential buffer.

MFC after:	2 weeks
2015-04-18 20:10:19 +00:00
Alexander Motin 0990a33089 Make virtual AHCI more careful with I/O lengths.
MFC after:	2 weeks
2015-04-17 20:20:55 +00:00
Neel Natu 77afcadd51 If the number of guest vcpus is less than '1' then flag it as an error.
MFC after:	1 week
2015-04-16 20:11:49 +00:00
Tycho Nightingale 3b65fbe4d1 Prior to aborting due to an ioport error, it is always interesting to
see what the guest's %rip is.

Reviewed by:	grehan
2015-04-15 18:49:03 +00:00
Baptiste Daroussin ea4a4d8a2e Fix overlinking in bhyve:
libvmmapi is actually needed to be linked to libutil, not bhyve nor bhyveload
2015-04-09 21:38:40 +00:00
Baptiste Daroussin 4bf53d0b46 Merge from HEAD 2015-04-03 23:23:09 +00:00
Tycho Nightingale 703e4974aa Prior to aborting due to an instruction emulation error, it is always
interesting to see what the guest's %rip and instruction bytes are.

Reviewed by:	grehan
2015-04-01 20:36:07 +00:00
Peter Grehan fed2d5edfc Move legacy interrupt allocation for virtio devices to common code.
There are a number of assumptions about legacy interrupts always
being available in virtio so don't allow back-ends to make the
decision to support them.

This fixes the issue seen with virtio-rnd on OpenBSD. MSI-x vectors
were not being used, and the virtio-rnd backend wasn't allocating a
legacy interrupt resulting in a bhyve assert and guest exit.

Reported by:	Julian Hsiao, madoka at nyanisore dot net
Reviewed by:	neel
MFC after:	1 week
2015-03-27 01:58:44 +00:00
Alexander Motin 8187174a9b Add missing variable initialization.
Reported by:	Coverity
CID:		1288938
MFC after:	3 days
2015-03-20 16:05:13 +00:00
Baptiste Daroussin 59fa1525e0 Merge from head 2015-03-17 19:10:51 +00:00
Alexander Motin cb5c792950 Report that we may have write cache, and that we do support FLUSH.
FreeBSD guest driver does not use that legacy flag, but Linux seems does.

MFC after:	2 weeks
2015-03-16 20:13:25 +00:00
Alexander Motin 54b7bb7626 Increase S/G list size of 32 to 33 entries.
32 entries are not enough for the worst case of misaligned 128KB request,
that made FreeBSD to chunk large quests in odd pieces.

MFC after:	2 weeks
2015-03-16 09:15:59 +00:00
Alexander Motin e365f36c32 Pre-allocate one extra request per processing thread.
Processing threads call callbacks before freeing requests.  As result,
new requests may arrive before old ones are freed.

MFC after:	2 weeks
2015-03-15 22:44:53 +00:00
Alexander Motin 811a355f1a According to Linux and QEMU, s/n equal to buffer is not zero-terminated.
This makes same s/n reported for both virtio and AHCI drivers.

MFC after:	2 weeks
2015-03-15 17:45:16 +00:00
Alexander Motin f2e62de7d9 Close potential race on blockif_close().
Reported by:	vangyzen
MFC after:	2 weeks
2015-03-15 16:18:03 +00:00
Alexander Motin 7315946b80 Fix networking problem after r280026.
I've missed that network driver sometimes returns taken request back to
available queue without processing.  Add new helper function for that case.

Reported by:	flo
MFC after:	2 weeks
2015-03-15 16:09:39 +00:00
Alexander Motin e72d4950e1 Give AHCI disk serial based on backing file path same as for virtio block.
It is still not good that they may intersect on different hosts, but that
is better then intersecting on the same host.

MFC after:	2 weeks
2015-03-15 15:29:03 +00:00
Alexander Motin 066a8f1411 Rewrite virtio block device driver to work asynchronously and use the block
I/O interface.

Asynchronous operation, based on r280026 change, allows to not block virtual
CPU during I/O processing, that on slow/busy storage can take seconds.
Use of recently improved block I/O interface allows to process multiple
requests same time, that improves random I/O performance on wide storages.

Benchmarks of virtual disk, backed by ZVOL on RAID10 pool of 4 HDDs, show
~3.5 times random read performance improvements, while no degradation on
linear I/O.  Guest CPU usage during test dropped from 100% to almost zero.

MFC after:	2 weeks
2015-03-15 14:57:11 +00:00
Alexander Motin fdb7e97f87 Modify virtqueue helpers added in r253440 to allow queuing.
Original virtqueue design allows queued and out-of-order processing, but
helpers added in r253440 suppose only direct blocking in-order one.
It could be fine for network, etc., but it is a huge limitation for storage
devices.
2015-03-15 11:37:07 +00:00
Baptiste Daroussin 7426d57242 Merge from head 2015-03-15 10:58:47 +00:00
Alexander Motin 7e8e553940 Block delete capability for read-only devices.
Submitted by:	neel
MFC after:	2 weeks
2015-03-15 08:09:56 +00:00
Alexander Motin 79565afed8 Give block I/O interface multiple (8) execution threads.
On parallel random I/O this allows better utilize wide storage pools.
To not confuse prefetcher on linear I/O, consecutive requests are executed
sequentially, following the same logic as was earlier implemented in CTL.

Benchmarks of virtual AHCI disk, backed by ZVOL on RAID10 pool of 4 HDDs,
show ~3.5 times random read performance improvements, while no degradation
on linear I/O.

MFC after:	2 weeks
2015-03-14 21:15:45 +00:00
Alexander Motin df57ec4933 Add checksums to identify data and NCQ command error log.
MFC after:	2 weeks
2015-03-14 14:06:37 +00:00
Alexander Motin b441dabf7e Slightly polish virtual AHCI CD reporting.
MFC after:	2 weeks
2015-03-14 12:18:26 +00:00
Alexander Motin fb329df8e4 Fix NOP and IDLE commands for virtual AHCI disks.
MFC after:	2 weeks
2015-03-14 10:38:25 +00:00
Alexander Motin 1fcb801948 Add support for NCQ variant of DSM TRIM for virtual AHCI disks.
The code is not really tested yet due to lack of initiator support.

Requested by:	imp
MFC after:	2 weeks
2015-03-14 09:46:43 +00:00
Alexander Motin 9009f43407 Improve NCQ errors reporting for virtual AHCI disks.
While this implementation is still not perfect, previous was just broken.

MFC after:	2 weeks
2015-03-14 08:45:54 +00:00
Alexander Motin dcd0c998a9 Remove incorrect SERR register setting.
At this point we have nothing to report through that register.

MFC after:	2 weeks
2015-03-13 21:01:25 +00:00
Alexander Motin 9463f47b3a Change prdbc value reporting.
MFC after:	2 weeks
2015-03-13 20:56:17 +00:00
Alexander Motin 295e61d6a3 Polish AHCI disk identify data and fix speed negotiation.
MFC after:	2 weeks
2015-03-13 20:14:35 +00:00
Alexander Motin 5f6b63de7a Add support for PIO variants of READ/WRITE commands for AHCI disks.
AHCI API hides all PIO specifics, so this functionality is almost free.

MFC after:	2 weeks
2015-03-13 18:35:38 +00:00
Alexander Motin f7c5bc2cfe Use ahci_write_fis_d2h() for commands completion.
MFC after:	2 weeks
2015-03-13 18:04:07 +00:00
Alexander Motin 0b9d25c935 Add DSM TRIM command support for virtual AHCI disks.
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE.  Virtual disks backed by files won't report this capability.

MFC after:	2 weeks
Relnotes:	yes
2015-03-13 16:43:52 +00:00
Alexander Motin f5f4836d62 Add variable initialization missed by me and clang.
Reported by:	grehan
MFC after:	2 weeks
2015-03-05 20:29:18 +00:00
Alexander Motin 371f1d88b6 Fix error translation broken in r279658.
Reported by:	grehan
MFC after:	2 weeks
2015-03-05 20:24:34 +00:00
Alexander Motin 2d678f1f4f Implement cache flush for ahci-hd and for virtio-blk over device.
MFC after:	2 weeks
2015-03-05 15:29:18 +00:00
Alexander Motin d951589ddb Add check for absent stripe size to r279652.
MFC after:	2 weeks
2015-03-05 13:52:30 +00:00
Alexander Motin 94682383d9 Report logical/physical sector sizes for virtual SATA disk.
MFC after:	2 weeks
2015-03-05 12:21:12 +00:00
Alexander Motin 297c4868dd Add support for TOPOLOGY feature of virtio block device.
Passing through physical block size/offset from underlying storage allows
guest to manage proper data and I/O alignment to improve performance.

MFC after:	2 weeks
2015-03-05 10:40:45 +00:00
Baptiste Daroussin c2e2d02cbe Make FreeBSD-bhyve an indivual package 2015-03-05 07:30:48 +00:00
Neel Natu 12f91c70a3 Emulate MSR 0xC0011024 when running on AMD processors.
OpenBSD guests test bit 0 of this MSR to detect whether the workaround for
erratum 721 has been applied.

Reported by:	Jason Tubnor (jason@tubnor.net)
MFC after:	1 week
2015-02-24 05:15:40 +00:00
Neel Natu c974767896 Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time.
The default remains localtime for compatibility with the original device model
in bhyve(8). This is required for OpenBSD guests which assume that the RTC
keeps UTC time.

Reviewed by:	grehan
Pointed out by:	Jason Tubnor (jason@tubnor.net)
MFC after:	2 weeks
2015-02-24 02:04:16 +00:00
Peter Grehan 65392c66a5 Don't close a block context if it couldn't be opened,
for example if the backing file doesn't exist,
avoiding a null deref.

Reviewed by:	neel
MFC after:	1 week.
2015-02-23 22:31:39 +00:00
Neel Natu d087a39935 Simplify instruction restart logic in bhyve.
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
  as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
  as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision:	https://reviews.freebsd.org/D1526
Reviewed by:		grehan
MFC after:		2 weeks
2015-01-18 03:08:30 +00:00
Neel Natu 0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Baptiste Daroussin c6db8143ed Convert usr.sbin to LIBADD
Reduce overlinking
2014-11-25 16:57:27 +00:00
Edward Tomasz Napierala aca4343c62 Fix improper .Fx macro usage.
Differential Revision:	https://reviews.freebsd.org/D1158
Reviewed by:	wblock@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-11-19 18:19:21 +00:00
Tycho Nightingale 48a9d8f214 To allow a request to be submitted from within the callback routine of
a completing one increase the total by 1 but don't advertise it.

Reviewed by:	grehan
2014-11-09 21:08:52 +00:00
Tycho Nightingale ae45750d6c Improve the ability to cancel an in-flight request by using an
interrupt, via SIGCONT, to force the read or write system call to
return prematurely.

Reviewed by:	grehan
2014-11-04 01:06:33 +00:00
Tycho Nightingale 26bf96112b If the start bit, PxCMD.ST, is cleared and nothing is in-flight then
PxCI, PxSACT, PxCMD.CCS and PxCMD.CR should be 0.

Reviewed by:	grehan
2014-11-03 12:55:31 +00:00
Neel Natu c17d4a83b8 Add a comment explaining the intent behind the I/O reservation [0x72-0x77]. 2014-10-26 21:17:44 +00:00
Neel Natu 160ef77abf Move the ACPI PM timer emulation into vmm.ko.
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).

Discussed with:	grehan
2014-10-26 04:44:28 +00:00
Neel Natu e1a172e1c2 IFC @r273214 2014-10-20 02:57:30 +00:00
Neel Natu 592cd7d3be Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX.
bhyve doesn't emulate the MSRs needed to support this feature at this time.

Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.

Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.

This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
2014-10-19 21:38:58 +00:00
Tycho Nightingale 3ef05c4677 Support stopping and restarting the AHCI command list via toggling
PxCMD.ST from '1' to '0' and back.  This allows the driver a chance to
recover if for instance a timeout occurred due to activity on the
host.

Reviewed by:	grehan
2014-10-17 11:37:50 +00:00
Neel Natu 2688a818a3 Don't advertise the Instruction Based Sampling feature because it requires
emulating a large number of MSRs.

Ignore writes to a couple more AMD-specific MSRs and return 0 on read.

This further reduces the unimplemented MSRs accessed by a Linux guest on boot.
2014-10-17 06:23:04 +00:00
Neel Natu 02904c45ab Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in
CPUID.80000001H:ECX.

Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
returning 0 on reads.

This further reduces the number of unimplemented MSRs hit by a Linux guest
during boot.
2014-10-17 03:04:38 +00:00
Neel Natu 913d54b96e Emulate the "Hardware Configuration" MSR when running on an AMD host.
This gets rid of the "TSC doesn't count with P0 frequency!" message when
booting a Linux guest.

Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.
2014-10-16 19:27:26 +00:00
Neel Natu ed6aacb51f IFC @r272887 2014-10-10 23:52:56 +00:00
Neel Natu 5295c3e61d Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
- MSR_PLATFORM_INFO
- MSR_TURBO_RATIO_LIMITx
- MSR_RAPL_POWER_UNIT

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:13:33 +00:00
Neel Natu 02c282e862 iasl(8) expects integer fields in data tables to be specified as hexadecimal
values. Therefore the bit width of the "PM Timer Block" was actually being
interpreted as 50-bits instead of the expected 32-bit.

This eliminates an error message emitted by a Linux 3.17 guest during boot:
"Invalid length for FADT/PmTimerBlock: 50, using default 32"

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:02:32 +00:00
Neel Natu 8ccb28efcd Implement the FLUSH operation in the virtio-block emulation.
This gets rid of the following error message during FreeBSD guest bootup:
"vtbd0: hard error cmd=flush fsbn 0"

Reported by:	rodrigc
Reviewed by:	grehan
2014-10-07 17:08:53 +00:00
Neel Natu 107af8f2ed IFC @r272481 2014-10-05 01:28:21 +00:00
Peter Grehan 8b58e6af3c Add new fields in the FADT, required by IASL 20140926-64.
The new IASL from the recent acpi-ca import will error out
if it doesn't see these new fields, which were previously
reserved.

Reported by:	lme
Reviewed by:	neel
2014-10-03 17:27:30 +00:00
Neel Natu 970388bf8d IFC @r272185 2014-09-27 22:15:50 +00:00
Peter Grehan 5ed6ab5baa Correct display of bhyve SMBIOS UUIDs with dmidecode by bumping the version.
The mixed little/big-endianness of SMBIOS UUIDs was clarified in v2.6
of the SMBIOS spec. dmidecode uses the reported version of SMBIOS to
determine the layout and what to byte-swap.

bhyve's SMBIOS reported as 2.4 though it implemented the 2.6-style of
memory layout. This resulted in dmidecode reporting a different
UUID than one passed in via the -U option.

Fix by exporting a version of 2.6.

Reviewed by:	tychon
Reported by:	julian
MFC after:	1 day
2014-09-23 01:17:22 +00:00
Neel Natu 8f02c5e456 IFC r271888.
Restructure MSR emulation so it is all done in processor-specific code.
2014-09-20 21:46:31 +00:00
Neel Natu b6cf6c8ca6 IFC @r271887 2014-09-20 06:27:37 +00:00
Neel Natu c3498942a5 Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
  switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
  before returning. This is an optimization for MSRs that are not used in
  host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
  the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by:	grehan
2014-09-20 02:35:21 +00:00
Neel Natu 4e27d36d38 IFC @r271694 2014-09-17 18:46:51 +00:00
Glen Barber 7fca1ad503 Update the bhyve(8) manual to reflect that it is no
longer considered 'experimental.'

Reviewed by:	grehan
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2014-09-17 16:45:20 +00:00
Neel Natu bbadcde418 Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by:	Anish Gupta (akgupt3@gmail.com)
Reviewed by:	grehan
2014-09-14 04:39:04 +00:00
Neel Natu 1aba8e7ff8 Initialize 'bc_rdonly' to the right value.
Note that independent of this change a readonly disk file would still be
opened O_RDONLY and protected from writes by the guest.

Reviewed by:	grehan
2014-09-11 21:15:20 +00:00
Peter Grehan 82560f19d0 Allow vtnet operation without merged rx buffers.
NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

Reviewed by:	neel, tychon
Phabric:	D745
MFC after:	3 days
2014-09-09 22:35:02 +00:00
Peter Grehan e18f344b9b Add a callback to be notified about negotiated features.
Submitted by:	luigi
Obtained from:	Vincenzo Maffione, Universita` di Pisa
MFC after:	3 days
2014-09-09 04:11:54 +00:00
Neel Natu 04da7226c4 Set the 'inst_length' to '0' early on before any error conditions are detected
in the emulation of the task switch. If any exceptions are triggered then the
guest %rip should point to instruction that caused the task switch as opposed
to the one after it.
2014-08-30 18:35:16 +00:00
Tycho Nightingale b297e71ede Fix a recursive lock acquisition in vi_reset_dev().
Reviewed by:	grehan
2014-08-22 13:01:22 +00:00
Neel Natu 33424543f2 Minor cleanup:
- Set 'pirq_cold' to '0' on the first PIRQ allocation.
- Make assertions stronger.

Reviewed by:	jhb
CR:		https://phabric.freebsd.org/D592
2014-08-13 00:14:26 +00:00
Neel Natu 12a6eb99a1 Support PCI extended config space in bhyve.
Add the ACPI MCFG table to advertise the extended config memory window.

Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.

Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.

CR:		https://phabric.freebsd.org/D505
Reviewed by:	jhb, grehan
Requested by:	tychon
2014-08-08 03:49:01 +00:00
Tycho Nightingale 42404fae46 Commands which encounter a fatal error shouldn't be marked as completed.
Furthermore, provide an indication of the current command so it can be
determined which one actually failed.

Reviewed by:	grehan
2014-07-30 18:47:31 +00:00
Neel Natu afd5e8ba88 Simplify the meaning of return values from the inout handlers. After this
change 0 means success and non-zero means failure.

This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.

CR:		D480
Reviewed by:	grehan, jhb
2014-07-25 20:18:35 +00:00
Neel Natu e84d8ebfcc Reduce the proliferation of VMEXIT_RESTART in task_switch.c.
This is in preparation for further simplification of the return values from
VM exit handlers in bhyve(8).
2014-07-24 05:31:57 +00:00
Neel Natu d37f2adb38 Fix fault injection in bhyve.
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
2014-07-24 01:38:11 +00:00
Neel Natu d665d229ce Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m
2014-07-23 04:28:51 +00:00
Neel Natu 091d453222 Handle nested exceptions in bhyve.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
2014-07-19 20:59:08 +00:00
Neel Natu 3d5444c864 Add emulation for legacy x86 task switching mechanism.
FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by:	glebius
2014-07-16 21:26:26 +00:00
Peter Grehan ad15140ee7 Use the blockif CHS routine to create fake CHS values,
and then populate them in the identity page.

This fixes a divide-by-zero error at probe time with NetBSD.

MFC after:	1 week.
2014-07-15 00:27:08 +00:00
Peter Grehan c4813fadf1 Add a call to synthesize a C/H/S value for block emulations
that require it (ahci). The algorithm used is from the VHD
specification.
2014-07-15 00:25:54 +00:00
Peter Grehan 18e32ebc89 Extend capabilities to 64-bits in preparation for some API changes.
The v1.0 virtio spec supports an extended size for guest/host
caps, but in practice 64-bits should last for a long time.
2014-07-05 02:38:53 +00:00
Peter Grehan f23a8ac1b9 Use correct flag for event index.
Submitted by:	luigi
Obtained from:	Vincenzo Maffione, Universita` di Pisa
MFC after:	1 week
2014-07-03 00:23:14 +00:00
Neel Natu 64fe72354c Add post-mortem debugging for "EPT Misconfiguration" VM-exit. This error
is hard to reproduce so try to collect all the breadcrumbs when it happens.

Reviewed by:	grehan
2014-06-27 18:00:38 +00:00
John Baldwin cde1f5b8a0 Sort command flags in usage output and the manpages. 2014-06-27 15:20:34 +00:00
Peter Grehan 62f17e92fe Set the version and date to fixed fields rather than using
preprocessor macros that don't allow reproducible builds.
As a side-effect, the date string is now spec-compliant.

root@bhyve:~ # dmidecode
# dmidecode 2.12
SMBIOS 2.4 present.
12 structures occupying 514 bytes.
Table at 0x000F101F.

Handle 0x0001, DMI type 0, 24 bytes
BIOS Information
        Vendor: BHYVE
        Version: 1.0
        Release Date: 03/14/2014

Submitted by:	des (original version)
Reviewed by:	tychon
MFC after:	1 week
2014-06-27 05:27:37 +00:00
John Baldwin 5749449d9b - Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
  option).
2014-06-26 20:12:38 +00:00
Neel Natu be679db4cd Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
2014-06-24 02:02:51 +00:00
Baptiste Daroussin 01c2b8ac0d use .Mt to mark up email addresses consistently (part2)
PR:		191174
Submitted by:	Franco Fichtner  <franco@lastsummer.de>
2014-06-20 09:57:27 +00:00
Neel Natu 79aad80d3c Fix typo and rename macro KDB_SYS_FLAG to KBD_SYS_FLAG.
Reviewed by:	tychon
2014-06-18 17:20:02 +00:00
Tycho Nightingale 67b6ffaad6 r267169 should apply to 64-bit BARs as well.
Reviewed by:	neel
2014-06-09 19:55:50 +00:00
Joel Dahl 087129d22c Remove blank lines. 2014-06-09 19:29:10 +00:00
Tycho Nightingale b6ae8b050b Some devices (e.g. Intel AHCI and NICs) support quad-word access to
register pairs where two 32-bit registers make up a larger logical
size.  Support those access by splitting the quad-word into two
double-words.

Reviewed by:	grehan
2014-06-06 16:18:37 +00:00
Neel Natu 26cdcdbebb Use MIN(a,b) from <sys/param.h> instead of rolling our own version.
Pointed out by:	grehan
2014-06-01 02:47:09 +00:00
Neel Natu 0be3798af5 Limit the maximum number of back-to-back iterations of a "rep; ins/outs"
to 16. This is arbitrary and is used to ensure that a vcpu goes back into
the vm_run() loop to process interrupts or rendezvous events in a timely
fashion.

Found with:	Coverity Scan
CID:		1216436
2014-06-01 02:13:07 +00:00
Neel Natu 95ebc360ef Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
Neel Natu 65ffa035a7 Add segment protection and limits violation checks in vie_calculate_gla()
for 32-bit x86 guests.

Tested using ins/outs executed in a FreeBSD/i386 guest.
2014-05-27 04:26:22 +00:00
Neel Natu 6303b65d35 Fix issue with restarting an "insb/insw/insl" instruction because of a page
fault on the destination buffer.

Prior to this change a page fault would be detected in vm_copyout(). This
was done after the I/O port access was done. If the I/O port access had
side-effects (e.g. reading the uart FIFO) then restarting the instruction
would result in incorrect behavior.

Fix this by validating the guest linear address before doing the I/O port
emulation. If the validation results in a page fault exception being injected
into the guest then the instruction can now be restarted without any
side-effects.
2014-05-26 18:21:08 +00:00
Neel Natu 5382c19d81 Do the linear address calculation for the ins/outs emulation using a new
API function 'vie_calculate_gla()'.

While the current implementation is simplistic it forms the basis of doing
segmentation checks if the guest is in 32-bit protected mode.
2014-05-25 00:57:24 +00:00
Neel Natu da11f4aa1d Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).
2014-05-24 23:12:30 +00:00
Neel Natu e813a87350 Consolidate all the information needed by the guest page table walker into
'struct vm_guest_paging'.

Check for canonical addressing in vmm_gla2gpa() and inject a protection
fault into the guest if a violation is detected.

If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to
point to the root of the page tables.
2014-05-24 20:26:57 +00:00
Neel Natu a7424861fb Check for alignment check violation when processing in/out string instructions. 2014-05-23 19:59:14 +00:00
Neel Natu d17b5104a9 Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by:	grehan
Reviewed by:	tychon (partially and a much earlier version)
2014-05-23 05:15:17 +00:00
John Baldwin b3e9732a76 Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
  8 steerable pins configured via config space access to byte-wide
  registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
  pin and a PCI interrupt router pin.  When a PCI INTx interrupt is
  asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
  8259A pins (ISA IRQs) and initialize the interrupt line config register
  for the corresponding PCI function with the ISA IRQ as this matches
  existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
  configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
  PRT tables and return the appropriate table based on the configured
  routing configuration.  Note that if the lpc device is not configured, no
  routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
  to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
  pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
  the DSDT.  iasl(8) will fill in the actual number of elements, and
  this makes it simpler to generate a Package with a variable number of
  elements.

Reviewed by:	tycho
2014-05-15 14:16:55 +00:00
Neel Natu 0dd10c0047 Don't include the guest memory segments in the bhyve(8) process core dump.
This has not added a lot of value when debugging bhyve issues while greatly
increasing the time and space required to store the core file.

Passing the "-C" option to bhyve(8) will change the default and dump guest
memory in the core dump.

Requested by:	grehan
Reviewed by:	grehan
2014-05-13 16:40:27 +00:00
Neel Natu ee2dbd023c abort(3) the process in response to a VMEXIT_ABORT. This usually happens in
response to an unhandled VM exit or an unexpected error so a core is useful.

Remove unused macro VMEXIT_SWITCH.

Reviewed by:	grehan
2014-05-12 23:35:10 +00:00
Neel Natu 2bd073e13b Disable the 'uart_drain()' callback when the emulated receive FIFO is full.
Failing to do this will cause the kevent(2) notification to trigger
continuously and the bhyve(8) mevent thread will hog the cpu until the
characters on the backend tty device are drained.

Also, make the uart backend file descriptor non-blocking to avoid a
select(2) before every byte read from that backend.

Reviewed by:	grehan
2014-05-05 23:54:13 +00:00
Neel Natu 9b6155a20c Modify the "-p" option to be more flexible when associating a 'vcpu' with
a 'hostcpu'. The new format of the argument string is "vcpu:hostcpu".

This allows pinning a subset of the vcpus if desired.

It also allows pinning a vcpu to more than a single 'hostcpu'.

Submitted by:	novel (initial version)
2014-05-05 18:06:35 +00:00
Neel Natu 067824256f Remove misleading "addcpu" in an error message emitted by fbsdrun_deletecpu().
Pointed out by:	novel
2014-05-05 16:35:37 +00:00
Neel Natu 09fd42cb88 Re-adding an event to a kqueue modifies the parameters of the original event.
However, if the original knote had been disabled then it is not automatically
re-enabled.

Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it.

Adding a kevent for the first time implicitly enables it so existing callers
of mevent_add() don't need to change.

Reviewed by:	grehan
2014-05-05 16:30:03 +00:00
Neel Natu b100acf254 Don't allow MPtable generation if there are multiple PCI hierarchies. This is
because there isn't a standard way to relay this information to the guest OS.

Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation.

If the virtual machine is using PCI devices on buses other than 0 then it can
still use ACPI tables to convey this information to the guest.

Discussed with:	grehan@
2014-05-02 04:51:31 +00:00
Neel Natu e50ce2aa06 Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with:	grehan@
Reviewed by:	jhb@, tychon@
2014-05-02 00:33:56 +00:00
Neel Natu 2cb97c9dd6 Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
2014-04-30 02:08:27 +00:00
Neel Natu c6a0cc2e21 Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by:	grehan@
2014-04-29 18:42:56 +00:00
Neel Natu f0fdcfe247 Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with:	grehan@
2014-04-28 22:06:40 +00:00
Peter Grehan 67e1705297 Implement legacy interrupts for the AHCI device emulation
according to the method outlined in the AHCI spec.

Tested with FreeBSD 9/10/11 with MSI disabled,
and also NetBSD/amd64 (lightly).

Reviewed by:	neel, tychon
MFC after:	3 weeks
2014-04-28 18:41:25 +00:00
Peter Grehan fcbec69157 Respect and track the enable bit in the PCI configuration address word.
Ignore writes, and return 0xff's, on config accesses when not set.
Behaviour now matches that seen on h/w.

Found with a NetBSD/amd64 guest.

Reviewed by:	tychon
MFC after:	3 weeks
2014-04-25 17:35:34 +00:00
Tycho Nightingale d42ea5731e Provide a very basic stub for the 8042 PS/2 keyboard controller.
Reviewed by:	jhb
Approved by:	neel (co-mentor)
2014-04-25 13:38:18 +00:00
Xin LI 994f858a8b Use calloc() in favor of malloc + memset.
Reviewed by:	neel
2014-04-22 18:55:21 +00:00
Tycho Nightingale 82c2c89084 Factor out common ioport handler code for better hygiene -- pointed
out by neel@.

Approved by:	neel (co-mentor)
2014-04-22 16:13:56 +00:00
Tycho Nightingale 1d6be92ac6 Fix ACPI DSDT indentation cosmetic breakage introduced in r264631 --
pointed out by jhb@.

Approved by:	grehan (co-mentor)
2014-04-18 16:01:19 +00:00
Tycho Nightingale d6aa08c3ef Respect the destination operand size of the 'Input from Port' instruction.
Approved by:	grehan (co-mentor)
2014-04-18 15:22:56 +00:00
Tycho Nightingale 79d6ca331e Add support for reading the PIT Counter 2 output signal via the NMI
Status and Control register at port 0x61.

Be more conservative about "catching up" callouts that were supposed
to fire in the past by skipping an interrupt if it was
scheduled too far in the past.

Restore the PIT ACPI DSDT entries and add an entry for NMISC too.

Approved by:	neel (co-mentor)
2014-04-18 00:02:06 +00:00
Tycho Nightingale b96be57a2d Add support for emulating the slave PIC.
Reviewed by:	grehan, jhb
Approved by:	grehan (co-mentor)
2014-04-14 19:00:20 +00:00
Tycho Nightingale 8b4a7f857b Constrain the amount of data returned to what is actually available
not the size of the buffer.

Approved by:	grehan (co-mentor)
2014-04-09 14:50:55 +00:00
John Baldwin 2cf4f7ef79 Handle single-byte reads from the bvmcons port (0x220) by returning
0xff.  Some guests may attempt to read from this port to identify
psuedo-PNP ISA devices.  (The ie(4) driver in FreeBSD/i386 is one
example.)

Reviewed by:	grehan
2014-04-08 21:02:03 +00:00
Peter Grehan 9d0c4e17d9 Add support for the virtio RNG entropy-source device.
Call through to /dev/random synchronously to fill
virtio buffers with RNG data.

Tested with FreeBSD-CURRENT and Ubuntu guests.

Submitted by:	Leon Dang
Discussed with:	markm
MFC after:	3 weeks
Sponsored by:	Nahanni Systems
2014-04-02 20:18:17 +00:00
Neel Natu b15a09c05e Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with:	grehan
2014-03-26 23:34:27 +00:00
Tycho Nightingale e883c9bb40 Move the atpit device model from userspace into vmm.ko for better
precision and lower latency.

Approved by:	grehan (co-mentor)
2014-03-25 19:20:34 +00:00
Neel Natu 0826d045cc Use 'cpuset_t' to represent the vcpus active in a virtual machine. 2014-03-20 18:15:37 +00:00
Tycho Nightingale 4feac03f2c Don't reissue in-flight commands.
Approved by:	neel (co-mentor)
2014-03-18 23:25:35 +00:00
Tycho Nightingale 7292923b49 Though there currently isn't a way to insert new media into an ATAPI
drive, at least pretend to support Asynchronous Notification (AN) to
avoid a guest needlessly polling for it.

Approved by:	grehan (co-mentor)
2014-03-16 12:33:40 +00:00
Tycho Nightingale 113d84c11d Support the bootloader's single 16-bit 'outw' access to the Divisor
Latch MSB and LSB registers.

Approved by:	neel (co-mentor)
2014-03-16 12:31:28 +00:00
Tycho Nightingale 762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Peter Grehan f4959d3537 Open the uart emulation's backing tty in non-blocking mode.
This fixes the issue of bhyve appearing to halt when using
nmdm ports for the console, until a connection is made to
the other end.

bhyveload already does this.

Reported by:	Many.
MFC after:	3 weeks.
2014-03-07 06:23:37 +00:00
Tycho Nightingale af5bfc53b8 Add SMBIOS support.
A new option, -U, can be used to set the UUID in the System
Information (Type 1) structure.  Manpage fix to follow.

Approved by:	grehan (co-mentor)
2014-03-04 17:12:06 +00:00
Neel Natu 9777ca203c Document the "-a" and "-x" options to match the changes in r262236.
Reviewed by:	grehan
2014-02-26 19:14:54 +00:00
Neel Natu dc50650607 Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with:	grehan, jhb
Reviewed by:	jhb
2014-02-26 00:52:05 +00:00
Peter Grehan 4258c52e29 Fix virtio spec URL.
Submitted by:	lwhsu
MFC after:	1 week
2014-02-21 22:45:35 +00:00
Tycho Nightingale 182d7debb9 Avoid clobbering the counter mode when issuing a latch command.
Approved by:	grehan (co-mentor)
2014-02-21 01:15:26 +00:00
Neel Natu 52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
Neel Natu 7a902ec0ec Add a check to validate that memory BARs of passthru devices are 4KB aligned.
Also, the MSI-x table offset is not required to be 4KB aligned so take this
into account when computing the pages occupied by the MSI-x tables.
2014-02-18 19:00:15 +00:00
John Baldwin a96b8b801a Tweak the handling of PCI capabilities in emulated devices to remove
the non-standard zero capability list terminator.   Instead, track
the start and end of the most recently added capability and use that
to adjust the previous capability's next pointer when a capability is
added and to determine the range of config registers belonging to
PCI capability registers.

Reviewed by:	neel
2014-02-18 03:00:20 +00:00
Neel Natu 06db1b4a59 Update bhyve(8) man page to describe the usage of the "-s" option to assign
bus numbers to emulated devices. Also add the restriction that the LPC bridge
emulation can only be configured on bus 0.

Reviewed by:	grehan@
2014-02-14 21:46:04 +00:00
Neel Natu d84882ca8f Allow PCI devices to be configured on all valid bus numbers from 0 to 255.
This is done by representing each bus as root PCI device in ACPI. The device
implements the _BBN method to return the PCI bus number to the guest OS.

Each PCI bus keeps track of the resources that is decodes for devices
configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows
are advertised to the guest via the _CRS object of the root device.

Bus 0 is treated specially since it consumes the I/O ports to access the
PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that
are consumed by devices on the LPC bus. For this reason the LPC bridge can
be configured only on bus 0.

The bus number can be specified using the following command line option
to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]"

Discussed with:	grehan@
Reviewed by:	jhb@
2014-02-14 21:34:08 +00:00
Tycho Nightingale 2a261121af Provide an indication a "PIO Setup Device to Host FIS" occurred while executing
the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands.

Also, provide an indication a "D2H Register FIS" occurred during a SET FEATURES
command.

Approved by:	grehan (co-mentor)
2014-02-12 00:32:14 +00:00
John Baldwin 1f82944f35 Mark the I/O ports used by the bhyve console and debug devices as system
resources.

MFC after:	1 week
2014-02-07 20:53:41 +00:00
John Baldwin 3cbf3585cb Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
  to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
  ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
  interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
  PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
  that share a slot attempt to distribute their INTx interrupts across
  the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
  and when the INTx DIS bit is set in a function's PCI command register.
  Either assert or deassert the associated I/O APIC pin when the
  state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by:	neel (7)
Reviewed by:	neel
MFC after:	2 weeks
2014-01-29 14:56:48 +00:00
John Baldwin d2bc4816c5 Remove support for legacy PCI devices. These haven't been needed since
support for LPC uart devices was added and it conflicts with upcoming
patches to add PCI INTx support.

Reviewed by:	neel
2014-01-27 22:26:15 +00:00
Tycho Nightingale 4e5f86e009 Fix issue with stale fields from a recycled request pulled off the freelist.
Approved by:	grehan (co-mentor)
2014-01-22 01:57:52 +00:00
Tycho Nightingale 40eb53f232 Increase the block-layer backend maximum number of requests to match
the AHCI command queue depth.  This allows a slew of commands issued
by a Linux guest to be absorbed without error.

Approved by:	grehan (co-mentor)
2014-01-22 01:56:49 +00:00
Peter Grehan d68f0bd618 Fix issue with the virtio descriptor region being truncated
if it was above 4GB. This was seen with CentOS 6.5 guests with
large RAM, since the block drivers are loaded late in the
boot sequence and end up allocating descriptor memory from
high addresses.

Reported by:	Michael Dexter
MFC after:	3 days
2014-01-09 07:17:21 +00:00
Remko Lodder a8be8e5ee3 virtio-block does not exist, the correct name is virtio-blk.
PR:		185573
Submitted by:	Allan Jude
Facilitated by:	Snow B.V.
MFC after:	3 days
2014-01-08 08:37:30 +00:00
Peter Grehan b1843e712e Cosmetic change - switch over to vertical SRCS to make it
easier to keep files in alpha order.

Reviewed by:	neel
2014-01-03 19:31:40 +00:00
John Baldwin e6c8bc291a Rework the DSDT generation code a bit to generate more accurate info about
LPC devices.  Among other things, the LPC serial ports now appear as
ACPI devices.
- Move the info for the top-level PCI bus into the PCI emulation code and
  add ResourceProducer entries for the memory ranges decoded by the bus
  for memory BARs.
- Add a framework to allow each PCI emulation driver to optionally write
  an entry into the DSDT under the \_SB_.PCI0 namespace.  The LPC driver
  uses this to write a node for the LPC bus (\_SB_.PCI0.ISA).
- Add a linker set to allow any LPC devices to write entries into the
  DSDT below the LPC node.
- Move the existing DSDT block for the RTC to the RTC driver.
- Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART
  devices.
- Add a "SuperIO" device under the LPC node to claim "system resources"
  aling with a linker set to allow various drivers to add IO or memory
  ranges that should be claimed as a system resource.
- Add system resource entries for the extended RTC IO range, the registers
  used for ACPI power management, the ELCR, PCI interrupt routing register,
  and post data register.
- Add various helper routines for generating DSDT entries.

Reviewed by:	neel (earlier version)
2014-01-02 21:26:59 +00:00
Neel Natu 0492757c70 Restructure the VMX code to enter and exit the guest. In large part this change
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.

Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.

Suggested by:	grehan@
2014-01-01 21:17:08 +00:00
John Baldwin 058e24d34b Extend the ACPI power management support to wire a virtual power button up
to SIGTERM when ACPI is enabled.  Sending SIGTERM to the hypervisor when an
ACPI-aware OS is running will now trigger a soft-off allowing for a graceful
shutdown of the guest.
- Move constants for ACPI-related registers to acpi.h.
- Implement an SMI_CMD register with commands to enable and disable ACPI.
  Currently the only change when ACPI is enabled is to enable the virtual
  power button via SIGTERM.
- Implement a fixed-feature power button when ACPI is enabled by asserting
  PWRBTN_STS in PM1_EVT when SIGTERM is received.
- Add support for EVFILT_SIGNAL events to mevent.
- Implement support for the ACPI system command interrupt (SCI) and assert
  it when needed based on the values in PM1_EVT.  Mark the SCI as active-low
  and level triggered in the MADT and MP Table.
- Mark PCI interrupts in the MP Table as active-low in addition to level
  triggered.

Reviewed by:	neel
2013-12-28 04:01:05 +00:00
John Baldwin cf952fe841 Use pthread_once() to replace a static integer initted flag.
Reviewed by:	neel
2013-12-28 03:21:15 +00:00
John Baldwin 6450da0774 Support soft power-off via the ACPI S5 state for bhyve guests.
- Implement the PM1_EVT and PM1_CTL registers required by ACPI.
  The PM1_EVT register is mostly a dummy as bhyve doesn't support any
  of the hardware-initiated events.  The only bit of PM1_CNT that is
  implemented are the sleep request bits (SPL_EN and SLP_TYP) which
  request a graceful power off for S5.  In particular, for S5, bhyve
  exits with a non-zero value which terminates the loop in vmrun.sh.
- Emulate the Reset Control register at I/O port 0xcf9 and advertise
  it as the reset register via ACPI.
- Advertise an _S5 package.
- Extend the in/out interface to allow an in/out handler to request
  that the hypervisor trigger a reset or power-off.
- While here, note that all vCPUs in a guest support C1 ("hlt").

Reviewed by:	neel (earlier version)
2013-12-24 16:14:19 +00:00
John Baldwin 330baf58c6 Extend the support for local interrupts on the local APIC:
- Add a generic routine to trigger an LVT interrupt that supports both
  fixed and NMI delivery modes.
- Add an ioctl and bhyvectl command to trigger local interrupts inside a
  guest.  In particular, a global NMI similar to that raised by SERR# or
  PERR# can be simulated by asserting LINT1 on all vCPUs.
- Extend the LVT table in the vCPU local APIC to support CMCI.
- Flesh out the local APIC error reporting a bit to cache errors and
  report them via ESR when ESR is written to.  Add support for asserting
  the error LVT when an error occurs.  Raise illegal vector errors when
  attempting to signal an invalid vector for an interrupt or when sending
  an IPI.
- Ignore writes to reserved bits in LVT entries.
- Export table entries the MADT and MP Table advertising the stock x86
  config of LINT0 set to ExtInt and LINT1 wired to NMI.

Reviewed by:	neel (earlier version)
2013-12-23 19:29:07 +00:00
Joel Dahl 6081b93c89 mdoc: nuke whitespace. 2013-12-23 15:00:15 +00:00
Neel Natu f80330a820 Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE
state before the requested state transition. This guarantees that there is
exactly one ioctl() operating on a vcpu at any point in time and prevents
unintended state transitions.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html

Reviewed by:	grehan
Reported by:	Markiyan Kushnir (markiyan.kushnir at gmail.com)
MFC after:	3 days
2013-12-22 20:29:59 +00:00
Neel Natu 851d84f1b5 Add an option to ignore accesses by the guest to unimplemented MSRs.
Also, ignore a couple of SandyBridge uncore PMC MSRs that Centos 6.4 writes
to during boot.

Reviewed by:	grehan
2013-12-19 22:27:28 +00:00
Neel Natu 55888cfaa2 Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to
'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively.

It should now be clear that these functions operate on passthru devices.
2013-12-18 03:58:51 +00:00
Neel Natu 4f8be175d5 Add an API to deliver message signalled interrupts to vcpus. This allows
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.

Submitted by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by:	grehan@
2013-12-16 19:59:31 +00:00
Joel Dahl 05f7cd8bce mdoc: sort SEE ALSO. 2013-12-15 08:52:16 +00:00
Peter Grehan b13e60da56 bhyve(8) man page.
mdoc formatting and much input and review from Warren Block (wblock@).

Reviewed by:	many
MFC after:	3 days
2013-12-13 08:31:13 +00:00
Neel Natu 1c05219285 If a vcpu disables its local apic and then executes a 'HLT' then spin down the
vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore
pending interrupts in the IRR if interrupts have been disabled by the guest.
The interrupt cannot be injected into the guest in any case so resuming it
is futile.

With this change "halt" from a Linux guest works correctly.

Reviewed by:	grehan@
Tested by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-12-07 22:18:36 +00:00
John Baldwin c71f0d951a Fix the processor table entry structure to use a fixed-width type for
32-bit fields so it is the correct size on amd64.  Remove a workaround
for the broken structure from bhyve(8).

MFC after:	1 week
2013-12-05 21:51:54 +00:00
Neel Natu b5b28fc9dc Add support for level triggered interrupt pins on the vioapic. Prior to this
commit level triggered interrupts would work as long as the pin was not shared
among multiple interrupt sources.

The vlapic now keeps track of level triggered interrupts in the trigger mode
register and will forward the EOI for a level triggered interrupt to the
vioapic. The vioapic in turn uses the EOI to sample the level on the pin and
re-inject the vector if the pin is still asserted.

The vhpet is the first consumer of level triggered interrupts and advertises
that it can generate interrupts on pins 20 through 23 of the vioapic.

Discussed with:	grehan@
2013-11-27 22:18:08 +00:00