Some implementations copy data to userspace, an operation which can in
principle fail. In preparation for adding a __result_use_check
annotation to copyin() and related functions, let implementations of
cpu_set_upcall() return an error, and check for errors when copying data
to user memory.
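As a rough user-space illustration of the pattern (not the kernel code
itself), the sketch below marks a stand-in copy routine with the
compiler's warn_unused_result attribute and propagates its error to the
caller. The names RESULT_USE_CHECK, mock_copyout() and
set_upcall_sketch() are hypothetical, and treating the annotation as a
warn_unused_result attribute is an assumption made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for a "result must be used" annotation (GCC/clang). */
#define	RESULT_USE_CHECK	__attribute__((__warn_unused_result__))

/*
 * Hypothetical user-space mock of a copy to user memory: fails with
 * EFAULT when the destination is NULL, otherwise copies and returns 0.
 */
static RESULT_USE_CHECK int
mock_copyout(const void *kaddr, void *uaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(uaddr, kaddr, len);
	return (0);
}

/*
 * Hypothetical upcall setup: stores a value in "user" memory and
 * returns any copy error to the caller instead of ignoring it.
 */
static int
set_upcall_sketch(void *user_slot, long value)
{
	int error;

	error = mock_copyout(&value, user_slot, sizeof(value));
	if (error != 0)
		return (error);
	return (0);
}
```

With the attribute in place, a caller that drops the return value of
mock_copyout() gets a compiler warning, which is the point of checking
the result at every call site first.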
Reviewed by: kib, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43100
Namely, switch from vm_fault_quick_hold() to pmap_extract() KPI to
translate gpa to hpa. Assert that the looked up hpa belongs to the wired
page, as it should be for the VM which is configured for pass-through
(this is theoretically a restriction that could be removed on newer
DMARs).
Noted by: alc
Reviewed by: alc, jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43140
These are in fact GPLv2 when distributed with the Linux kernel, but the
license also allows MIT if distributed separately. Add the markers to
avoid interference by automated tools.
Differential Revision: https://reviews.freebsd.org/D32796
Reviewed by: royger
This patch implements the interrupt blocking VM capability on AMD
CPUs. Implementing this capability allows the GDB stub to single-step
a virtual machine without landing inside interrupt handlers.
Reviewed by: jhb, corvink
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42299
This patch implements single-stepping for AMD CPUs using the RFLAGS.TF
single-stepping mechanism. The GDB stub requests single-stepping
using the VM_CAP_RFLAGS_TF capability. Setting this capability will
set the RFLAGS.TF bit on the selected vCPU, activate DB exception
intercepts, and activate POPF/PUSHF instruction intercepts. The
resulting DB exception is then caught by the IDT_DB vmexit handler and
bounced to userland where it is processed by the GDB stub. This patch
also makes sure that the value of the TF bit is correctly updated and
that it is not erroneously propagated into memory. Stepping over PUSHF
will cause the vm_handle_db function to correct the pushed RFLAGS
value and stepping over POPF will update the shadowed TF bit copy.
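The TF bookkeeping described above can be sketched roughly as follows.
This is a simplified illustration under stated assumptions, not the
vmm code: struct step_state_sketch and both helper functions are
hypothetical, and only the position of TF (bit 8 of RFLAGS) is taken
from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define	RFLAGS_TF	0x100ULL	/* trap flag, bit 8 of RFLAGS */

/* Hypothetical per-vCPU single-step state. */
struct step_state_sketch {
	bool	shadow_tf;	/* TF value the guest itself has set */
};

/*
 * After stepping over PUSHF: the pushed image carries the TF bit set
 * for the debugger, so rewrite it to the guest's own (shadowed) value.
 */
static uint64_t
fix_pushed_rflags_sketch(const struct step_state_sketch *st, uint64_t pushed)
{
	pushed &= ~RFLAGS_TF;
	if (st->shadow_tf)
		pushed |= RFLAGS_TF;
	return (pushed);
}

/*
 * After stepping over POPF: remember the TF value the guest loaded and
 * keep TF set in the live register so single-stepping continues.
 */
static uint64_t
handle_popped_rflags_sketch(struct step_state_sketch *st, uint64_t popped)
{
	st->shadow_tf = (popped & RFLAGS_TF) != 0;
	return (popped | RFLAGS_TF);
}
```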
Reviewed by: jhb
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42296
This patch adds support for software breakpoint vmexits on AMD SVM.
It implements the VM_CAP_BPT_EXIT used to enable software breakpoints.
When enabled, breakpoint vmexits are passed to userspace where they
are handled by the GDB stub.
Reviewed by: jhb
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42295
This patch refactors AMD SVM event reflection to allow events to be
propagated to userland, rather than always reflected into the guest.
This is necessary to implement some capabilities that request VMEXITs
when a specific exception occurs (e.g. VM_CAP_BPT_EXIT).
Reviewed by: jhb
Sponsored by: Google, Inc. (GSoC 2022)
Differential Revision: https://reviews.freebsd.org/D42405
In particular, this enables support for PCI config access for domains
(segments) other than 0.
Reported by: cperciva
Tested by: cperciva (m7i.metal-48xl AWS instance)
Reviewed by: imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D42828
Split out some bits of pcie_cfgregopen that only need to be executed
once into helper functions in preparation for supporting multiple MCFG
entries.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42829
This commit changes the API of pci_cfgreg(read|write) to add a domain
argument (referred to as a segment in ACPI parlance) (note that this
is not the same as a NUMA domain, but something PCI-specific). This
does not yet enable access to domains other than 0, but updates the
API to support domains.
Places that use hard-coded bus/slot/function addresses have been
updated to hardcode a domain of 0. A few places that have the PCI
domain (segment) available such as the acpi_pcib_acpi.c Host-PCI
bridge driver pass the PCI domain.
The hpt27xx(4) and hptnr(4) drivers fail to attach to a device not on
domain 0 since they provide APIs to their binary blobs that only
permit bus/slot/function addressing.
The x86 non-ACPI PCI bus drivers all hardcode a domain of 0 as they do
not support multiple domains.
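To illustrate what a domain-aware config accessor has to do, the sketch
below computes a memory-mapped config address from (domain, bus, slot,
func, reg) using the standard PCIe ECAM layout. The structure and
function names are hypothetical and the region table stands in for
MCFG-style entries; it is not the pci_cfgreg implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one config-space region. */
struct cfg_region_sketch {
	int		domain;		/* PCI domain (ACPI segment) */
	int		bus_start;	/* first bus covered */
	int		bus_end;	/* last bus covered */
	uint64_t	base;		/* base address of the region */
};

/* Look up the region for (domain, bus) and form the ECAM address. */
static int
cfg_addr_sketch(const struct cfg_region_sketch *regions, size_t nregions,
    int domain, int bus, int slot, int func, int reg, uint64_t *addrp)
{
	const struct cfg_region_sketch *r;
	size_t i;

	if (slot < 0 || slot > 31 || func < 0 || func > 7 ||
	    reg < 0 || reg > 0xfff)
		return (EINVAL);
	for (i = 0; i < nregions; i++) {
		r = &regions[i];
		if (r->domain != domain || bus < r->bus_start ||
		    bus > r->bus_end)
			continue;
		*addrp = r->base +
		    (((uint64_t)(bus - r->bus_start) << 20) |
		    ((uint64_t)slot << 15) | ((uint64_t)func << 12) |
		    (uint64_t)reg);
		return (0);
	}
	return (ENOENT);	/* no region covers this domain/bus */
}

/* Caller that only knows bus/slot/function: hardcodes domain 0. */
static int
cfg_addr_domain0_sketch(const struct cfg_region_sketch *regions,
    size_t nregions, int bus, int slot, int func, int reg, uint64_t *addrp)
{
	return (cfg_addr_sketch(regions, nregions, 0, bus, slot, func, reg,
	    addrp));
}
```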
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D42827
A precursor to merging them. The spacing differs quite a bit between
the i386 and amd64 hypercall headers, despite very similar content.
Consistently use tabs instead of spaces.
Reviewed by: royger
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
For the uncommon items: Go through the tree and remove SCCS tags that
didn't fit any nice pattern. Where they were in the neighborhood, other
SCM tags were also removed when they were detritus of long-ago CVS use
from the early days of the project. Some adjacent copyright strings were
removed (they duplicated the copyright notices in the file). This also
removed non-standard ways of omitting SCCS tags (usually by adding an
extra #if 0 somewhere).
After this commit, a number of strings tagged with the 'what' @(#)
prefix remain, but they are primarily copyright notices.
Sponsored by: Netflix
Remove ancient SCCS tags from the tree using automated scripting, with
two minor fixups to keep things compiling. All the common forms in the
tree were removed with a perl script.
Sponsored by: Netflix
We only want to produce syscall.mk for the main syscall table so default
to not producing it (send it to /dev/null) and add a syscalls.conf to
sys/kern to trigger the creation of sys/sys/syscall.mk. This eliminates
the need for entries in other syscalls.conf files and is a cleaner
pattern going forward.
Reviewed by: kevans, imp
Differential Revision: https://reviews.freebsd.org/D42663
All of these used the 'immediately at beginning' variation of the
BSD-2-Clause license. This wasn't an intentional decision, just what I
copied from a random file in the tree back in 2005.
The different arch bus.h files are a mix of BSD-2-Clause and
BSD-4-Clause that have various copyright holders (Charles M. Hannum,
Christopher G. Demetriou, The NetBSD Foundation and KATO Takenori), and
some of the content of these files was likely copied from there.
However, apart from the uncopyrightable interface lines, there are very
few comments. It's unclear if these comments are 'original material'
here to copyright, but to the extent that there is, license it under the
standard BSD-2-Clause copyright that's the norm for the project today.
In any event, the standard BSD-2-Clause is also closer to those
originals.
In addition, FreeBSD uses different type definitions than the original
NetBSD code in part. The comments that were copied have been copied a
lot, but appear in NetBSD's bus.h files in NetBSD 1.3.
While I'm here, assign the copyright, to the extent any exists from me,
to the FreeBSD Foundation. I just cut and pasted these into _bus.h from
the different machine files and those files have a rich history of
modification from the original imports from NetBSD over more than 25
years so it's tricky to say who, exactly, wrote each bit. Given the size
of the files, this seems like the best compromise. Also add an
acknowledgement to the NetBSD 1.3 bus.h files and their authors (there
were no additional FreeBSD authors listed in the various
sys/*/include/bus.h files). Finally, use the SPDX identifier instead of
multiple copies of the text.
Differential Revision: https://reviews.freebsd.org/D42532
Sponsored by: Netflix
With clang it expands to "inline"; clang in practice may inline
externally visible functions even without the hint. So just remove the
hints and let the compiler decide.
No functional change intended. pmap.o is identical before and after
this patch.
Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42446
The sysctl knob 'vm.pmap.allow_2m_x_ept' is a loader tunable and has a
public documentation entry in security(7), but it is fetched from the
kernel environment variable 'hw.allow_2m_x_ept'. That is inconsistent
and obscure.
As there is a public security advisory, FreeBSD-SA-19:25.mcepsc [1],
people may refer to it and use 'hw.allow_2m_x_ept', so keep the old
name for compatibility.
[1] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:25.mcepsc.asc
Reviewed by: kib
Fixes: c08973d09c Workaround for Intel SKL002/SKL012S errata
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42311
The following loader tunables have corresponding sysctl MIBs, but with
different names, probably for historical reasons. Prefer consistent
naming for them so that they are easier to read and maintain.
1. hw.vmm.l1d_flush -> hw.vmm.vmx.l1d_flush
2. hw.vmm.l1d_flush_sw -> hw.vmm.vmx.l1d_flush_sw
3. hw.vmm.vmx.use_apic_pir -> hw.vmm.vmx.cap.posted_interrupts
4. hw.vmm.vmx.use_apic_vid -> hw.vmm.vmx.cap.virtual_interrupt_delivery
5. hw.vmm.vmx.use_tpr_shadowing -> hw.vmm.vmx.cap.tpr_shadowing
Old names are kept for compatibility.
Meanwhile, add sysctl flag CTLFLAG_TUN to them so that `sysctl -T` will
report them correctly.
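The old-name compatibility described above can be pictured with the
following sketch. It is only an illustration: the environment table and
both functions are hypothetical mocks, and letting the new name win when
both are set is an assumption of this example rather than something the
commit states.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mock of one kernel environment entry. */
struct kenv_entry_sketch {
	const char	*name;
	const char	*value;
};

/* Returns 1 and stores the value if 'name' is set, 0 otherwise. */
static int
mock_getenv_int(const struct kenv_entry_sketch *env, size_t n,
    const char *name, int *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(env[i].name, name) == 0) {
			*out = atoi(env[i].value);
			return (1);
		}
	}
	return (0);
}

/*
 * Fetch a tunable by its new name, falling back to the old name.
 * Assumption: the new name takes precedence when both are present.
 */
static int
fetch_tunable_compat_sketch(const struct kenv_entry_sketch *env, size_t n,
    const char *new_name, const char *old_name, int *out)
{
	if (mock_getenv_int(env, n, new_name, out))
		return (1);
	return (mock_getenv_int(env, n, old_name, out));
}
```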
Reviewed by: corvink, jhb, kib, #bhyve
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D42251
To match the sysctl MIBs and document entries in security(7).
Fixes: 2dec2b4a34 amd64: flush L1 data cache on syscall return with an error
Fixes: 17edf152e5 Control for Special Register Buffer Data Sampling mitigation
Reviewed by: kib
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D42249
Make sure that we don't try to copy with a negative resid.
Make sure that we don't walk off the end of the iovec array.
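The two checks can be sketched in user space as follows. The structures
and the function are simplified, hypothetical stand-ins for a uio-style
copy loop and the error values are assumptions; this is not the code
touched by the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified I/O vector and request descriptor. */
struct iovec_sketch {
	void	*base;
	size_t	 len;
};

struct uio_sketch {
	struct iovec_sketch	*iov;		/* current vector */
	int			 iovcnt;	/* vectors remaining */
	long			 resid;		/* bytes remaining */
};

/* Copy up to n bytes from src into the buffers described by uio. */
static int
copy_to_uio_sketch(const char *src, size_t n, struct uio_sketch *uio)
{
	struct iovec_sketch *iov;
	size_t cnt;

	/* Check 1: never start copying with a negative resid. */
	if (uio->resid < 0)
		return (EINVAL);
	while (n > 0 && uio->resid > 0) {
		/* Check 2: never step past the last iovec. */
		if (uio->iovcnt <= 0)
			return (EFAULT);
		iov = uio->iov;
		cnt = iov->len;
		if (cnt == 0) {
			uio->iov++;
			uio->iovcnt--;
			continue;
		}
		if (cnt > n)
			cnt = n;
		if (cnt > (size_t)uio->resid)
			cnt = (size_t)uio->resid;
		memcpy(iov->base, src, cnt);
		iov->base = (char *)iov->base + cnt;
		iov->len -= cnt;
		uio->resid -= (long)cnt;
		src += cnt;
		n -= cnt;
	}
	return (0);
}
```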
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42098
At least KMSAN relies on zero-initialization of AP PCPU regions, see
commit 4b136ef259.
Prior to commit af1c6d3f30 these were allocated with allocpages() in
the amd64 pmap, which always returns zero-initialized memory.
Reviewed by: kib
Fixes: af1c6d3f30 ("amd64: do not leak pcpu pages")
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42241
Add a man page for pmap_kextract(9), with an alias to vtophys(9). This
man page is based on pmap_extract(9).
Add it as a cross reference in pmap(9), and add comments above the
function implementations.
Co-authored-by: Graham Perrin <grahamperrin@gmail.com>
Co-authored-by: mhorne
Sponsored by: The FreeBSD Foundation
Pull Request: https://github.com/freebsd/freebsd-src/pull/827
Use it wherever COMPAT_FREEBSD13 is currently specified.
Reviewed by: brooks, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42100
to allow ABIs to indicate that SIGSYS is needed. Mark all native
FreeBSD ABIs with the flag.
This implicitly marks Linux' ABIs as not delivering SIGSYS on invalid
syscall.
Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976
In particular, when the syscall number is too large, or when the syscall
is dynamic. For that, add a nosys_sysent structure to pass a fake sysent
to the syscall top code.
Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976
Applies only to bare-metal Zen2 processors. The system currently
automatically applies it to all of them.
Tunable/sysctl 'machdep.mitigations.zenbleed.enable' can be used to
forcibly enable or disable the mitigation at boot or run-time. Possible
values are:
0: Mitigation disabled
1: Mitigation enabled
2: Run the automatic determination.
Currently, value 2 is the default and has the same effect as value 1.
This might change in the future if we choose to take into account
microcode revisions in the automatic determination process.
The tunable/sysctl value is simply ignored on non-applicable CPU models,
which is useful to apply the same configuration on a set of machines
that do not all have Zen2 processors. Trying to set it to any integer
value not listed above is silently equivalent to setting it to value 2
(automatic determination).
The current mitigation state can be queried through sysctl
'machdep.mitigations.zenbleed.state', which returns "Not applicable",
"Mitigation enabled" or "Mitigation disabled". Note that this state is
not guaranteed to be accurate in case of intervening modifications of
the corresponding chicken bit directly via cpuctl(4) (this includes the
cpucontrol(8) utility). Resetting the desired policy through
'machdep.mitigations.zenbleed.enable' (possibly to its current value)
will reset the hardware state and ensure that the reported state is
again coherent with it.
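The mapping from the 'enable' setting to the reported state, as
described above, can be sketched like this. The function and constant
names are hypothetical; only the three values and the three state
strings come from the description, and the sketch ignores the cpuctl(4)
caveat.

```c
#include <stdbool.h>

/* Values accepted by the policy knob, per the description above. */
enum {
	ZB_DISABLE = 0,
	ZB_ENABLE = 1,
	ZB_AUTO = 2
};

/* Any integer other than 0 or 1 is treated as automatic determination. */
static int
zenbleed_normalize_sketch(int value)
{
	if (value == ZB_DISABLE || value == ZB_ENABLE)
		return (value);
	return (ZB_AUTO);
}

/*
 * State string for a given CPU and setting. 'applicable' stands for
 * "bare-metal Zen2 processor"; automatic currently behaves as enabled.
 */
static const char *
zenbleed_state_sketch(bool applicable, int value)
{
	if (!applicable)
		return ("Not applicable");
	switch (zenbleed_normalize_sketch(value)) {
	case ZB_DISABLE:
		return ("Mitigation disabled");
	case ZB_ENABLE:
	case ZB_AUTO:
	default:
		return ("Mitigation enabled");
	}
}
```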
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41817
This patch reverts the changes made in D19670 and fixes the original
issue by allocating and prepopulating a leaf page table page for wired
userspace 2M pages.
The original issue is an edge case that creates an unmapped, wired
region in userspace. Subsequent faults on this region can trigger wired
superpage creation, which leads to a panic in pmap_demote_pde_locked()
as the pmap does not create a leaf page table page for the wired
superpage. D19670 fixed this by disallowing preemptive creation of
wired superpage mappings, but that fix is currently interfering with an
ongoing effort of speeding up vm_map_wire for large, contiguous entries
(e.g. bhyve wiring guest memory).
Reviewed by: alc, markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41132
They are a bit more informative than raw hexadecimal values.
While here, sort existing defines of bits for AMD MSRs to match the address
order.
Reviewed by: kib, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41816
To help port the Linux emulation layer to new platforms, start using
Linux names for conditional builds instead of architecture-specific ifdefs.
MFC after: 1 week
Switch to using db_addr_t to hold frame pointer values until they are
verified to be suitably aligned.
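The idea of keeping a frame pointer in an integer type until it has been
checked can be sketched as below. The typedef, the frame layout and the
helper names are hypothetical stand-ins (addr_sketch_t plays the role of
an integer address type such as db_addr_t); this is not the ddb code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical integer address type, standing in for db_addr_t. */
typedef uintptr_t addr_sketch_t;

/* Hypothetical saved-frame layout: link to caller frame, return address. */
struct frame_sketch {
	struct frame_sketch	*next;
	uintptr_t		 ret;
};

/* A candidate frame address is usable only if non-zero and aligned. */
static bool
frame_addr_ok_sketch(addr_sketch_t fp)
{
	return (fp != 0 && (fp % _Alignof(struct frame_sketch)) == 0);
}

/*
 * Convert the integer to a frame pointer only after verification, so a
 * misaligned value is never dereferenced as a struct pointer.
 */
static struct frame_sketch *
frame_from_addr_sketch(addr_sketch_t fp)
{
	if (!frame_addr_ok_sketch(fp))
		return (NULL);
	return ((struct frame_sketch *)fp);
}
```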
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D41532