Make sure that we don't try to copy with a negative resid.
Make sure that we don't walk off the end of the iovec array.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42098
At least KMSAN relies on zero-initialization of AP PCPU regions, see
commit 4b136ef259.
Prior to commit af1c6d3f30 these were allocated with allocpages() in
the amd64 pmap, which always returns zero-initialized memory.
Reviewed by: kib
Fixes: af1c6d3f30 ("amd64: do not leak pcpu pages")
MFC after: 3 days
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D42241
Add a man page for pmap_kextract(9), with alias to vtophys(9). This man
page is based on pmap_extract(9).
Add it as cross reference in pmap(9), and add comments above the
function implementations.
Co-authored-by: Graham Perrin <grahamperrin@gmail.com>
Co-authored-by: mhorne
Sponsored by: The FreeBSD Foundation
Pull Request: https://github.com/freebsd/freebsd-src/pull/827
Use it wherever COMPAT_FREEBSD13 is currently specified.
Reviewed by: brooks, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42100
to allow ABIs to indicate that SIGSYS is needed. Mark all native
FreeBSD ABIs with the flag.
This implicitly marks Linux' ABIs as not delivering SIGSYS on invalid
syscall.
Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976
In particular, when the syscall number is too large, or when syscall is
dynamic. For that, add nosys_sysent structure to pass fake sysent to
syscall top code.
Reviewed by: dchagin, markj
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41976
Applies only to bare-metal Zen2 processors. The system currently
automatically applies it to all of them.
Tunable/sysctl 'machdep.mitigations.zenbleed.enable' can be used to
forcibly enable or disable the mitigation at boot or run-time. Possible
values are:
0: Mitigation disabled
1: Mitigation enabled
2: Run the automatic determination.
Currently, value 2 is the default and has identical effect as value 1.
This might change in the future if we choose to take into account
microcode revisions in the automatic determination process.
The tunable/sysctl value is simply ignored on non-applicable CPU models,
which is useful to apply the same configuration on a set of machines
that do not all have Zen2 processors. Trying to set it to any integer
value not listed above is silently equivalent to setting it to value 2
(automatic determination).
The current mitigation state can be queried through sysctl
'machdep.mitigations.zenbleed.state', which returns "Not applicable",
"Mitigation enabled" or "Mitigation disabled". Note that this state is
not guaranteed to be accurate in case of intervening modifications of
the corresponding chicken bit directly via cpuctl(4) (this includes the
cpucontrol(8) utility). Resetting the desired policy through
'machdep.mitigations.zenbleed.enable' (possibly to its current value)
will reset the hardware state and ensure that the reported state is
again coherent with it.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41817
This patch reverts the changes made in D19670 and fixes the original
issue by allocating and prepopulating a leaf page table page for wired
userspace 2M pages.
The original issue is an edge case that creates an unmapped, wired
region in userspace. Subsequent faults on this region can trigger wired
superpage creation, which leads to a panic in pmap_demote_pde_locked()
as the pmap does not create a leaf page table page for the wired
superpage. D19670 fixed this by disallowing preemptive creation of
wired superpage mappings, but that fix is currently interfering with an
ongoing effort of speeding up vm_map_wire for large, contiguous entries
(e.g. bhyve wiring guest memory).
Reviewed by: alc, markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41132
They are a bit more informative than raw hexadecimal values.
While here, sort existing defines of bits for AMD MSRs to match the address
order.
Reviewed by: kib, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41816
To help porting the Linux emulation layer to a new platforms start using
Linux names for conditional builds instead of architecture-specific ifdefs.
MFC after: 1 week
Switch to using db_addr_t to hold frame pointer values until they are
verified to be suitably aligned.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D41532
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.
Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32360
Adding a writev syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.
MFC after: 1 month
Adding a writev syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.
MFC after: 1 month
Adding a write syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.
MFC after: 1 month
Adding a write syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.
MFC after: 1 month
bhyve in a 13.x jail fails to boot guests with more than one vCPU
because they pass too small a buffer to VM_GET_CPUS, causing the ioctl
handler to return ERANGE. Handle this the same way as cpuset system
calls: make sure that the result can fit in the truncated space, and
relax the check on the cpuset buffer.
As a side effect, fix an insufficient bounds check on "size". The
signed/unsigned comparison with sizeof(cpuset_t) fails to exclude
negative values, so we can end up allocating impossibly large amounts of
memory.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41496
EARLY_AP_STARTUP was introduced in 2016 (commit fdce57a042) with note:
As a transition aid, the new behavior is moved under a new
kernel option (EARLY_AP_STARTUP). This will allow the option
to be turned off if need be during initial testing. I hope to
enable this on x86 by default in a followup commit ...
It was enabled by default, but became effectively mandatory (on x86)
some time later. Move it to DEFAULTS to avoid an unbootable system if
the option is left out of a custom kernel configuration file.
Reported by: wollman
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41352
Following the removal of general MIPS support, there's no longer a need
to have the AHB bus-frontend in place, which according to Linux sources
also isn't used with any non-MIPS SoCs. For simplicity, PCI bus support
is only made conditional on the main one again, i. e. device ath_pci is
removed, and built into the main module, i. e. if_ath_pci.ko obsoleted,
respectively.
Effectively, this reverts the following commits and associated changes:
dba9c85977e849bb3ecb
Approved by: adrian
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D41354
Because KASAN shadows the kernel image itself (KMSAN currently does
not), a shadow mapping of the boot stack must be created very early
during boot. pmap_san_enter() reserves a fixed number of pages for the
purpose of creating and mapping this shadow region.
After commit 789df254cc ("amd64: Use a larger boot stack"), it could
happen that this reservation is insufficient; this happens when
bootstack crosses a PAGE_SHIFT + KASAN_SHADOW_SCALE_SHIFT boundary.
Update the calculation to take into account the new size of the boot
stack.
Fixes: 789df254cc ("amd64: Use a larger boot stack")
Sponsored by: The FreeBSD Foundation
Hardware with more than 256 CPU cores is currently available and will
become increasingly common over FreeBSD 14's lifetime. Increase MAXCPU
in the amd64 GENERIC kernel configuration to 1024.
Earlier commits increased some related limits. These prerequisite
commits include at least:
- d7ed40243769 Increase MAX_APIC_ID safeguard to 0x800
- d1639e43c5 cpuset: increase userland maximum size to 1024
Global and allocated arrays sized by MAXCPU result in excessive bloat
on systems with lower core counts. In addition, some code used u_char
(8 bits) to hold a CPU index, which is not valid if MAXCPU is greater
than 256.
A number of recent commits addressed these sorts of issues, including
at least:
- 133935d26f pf: atomically increment state ids
- 74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
- 78cfa762eb callout: Move per-CPU callout state into the dpcpu region
- 42f722e721 amd64: store pcids pmap data in pcpu zone
- 9801e7c275 smp_topo: dynamically allocate group array
- 9fb6718d1b smp: Dynamically allocate the stoppcbs array
- 2bb16c6352 x86: retire use of intr_bind
There are some additional allocations still to be converted and
more scalability work is required to make effective use of very high
core count systems, but this change allows us to boot on these systems
and provides a Kernel Binary Interface (KBI) for the FreeBSD 14 release
that supports these configurations.
Special thanks to AMD for providing hardware to test these changes.
PR: 269572
Reviewed by: des
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36838
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels. Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work. Specifically, I
see occasional unexplicable crashes in userspace when PCIDs are enabled.
Apply a minimal patch to fix the problem. While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41230
On x86 Linux via AT_HWCAP2 the user controlled (by tunables) processor
capabilities are exposed.
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41165
MFC after: 2 weeks
With sanitizers enabled, it becomes possible to overflow the stack when
only a single page is used. Follow arm64's example and use the default
kernel stack size instead. This is a bit wasteful, but without a guard
page, overflow merely corrupts adjacent .bss entries and is thus
difficult to debug.
Note, with a GENERIC kernel we already consume over half of the
available boot stack space, see the review for an example.
Reviewed by: kib
Reported by: Jenkins
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41166
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971
The function pmap_pde_ept_executable() should not be conditionally
compiled based on VM_NRESERVLEVEL. It is required indirectly by
pmap_enter(..., psind=1) even when reservation-based allocation is
disabled at compile time.
Reviewed by: alc
MFC after: 1 week
At least an error from vcpu_lock_all() at line 553 would leak
memseg lock. There might be other cases as well.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D40981
When vcpu array is empty, function would return random value from
stack. What I observed was -1.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D40980
This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:. The
original motivation for this is that it quiets verbose warnings from
GCC 12. However, this version is also much easier to read and
allows the compiler to generate better code (e.g. the compiler can
use a GS: memory operand directly in other instructions such as IMUL
and CMP rather than always MOVing to a temporary register).
The one caveat is that the current approach is very inefficient at -O0
since the compiler expects to load the 0 base offset from a global
variable instead of assuming it is 0 (even with the const).
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40647
Use of compiler builtin ffs/ctz functions will result in optimized
instruction sequences when possible, and fall back to calling a function
provided by the compiler run-time library. We have slowly shifted our
platforms to take advantage of these builtins in 60645781d6 (arm64),
1c76d3a9fb (arm), 9e319462a0 (powerpc, partial).
Some platforms still rely on the libkern implementations of these
functions provided by libkern, namely riscv, powerpc (ffs*, flsll), and
i386 (ffsll and flsll). These routines are slow, as they perform a
linear search for the bit in question. Even on platforms lacking
dedicated bit-search instructions, such as riscv, the compiler library
will provide better-optimized routines, e.g. by using binary search.
Consolidate all definitions of these functions (whether currently using
builtins or not) to libkern.h. This should result in equivalent or
better performing routines in all cases.
One wart in all of this is the existing HAVE_INLINE_F*** macros, which
we use in a few places to conditionally avoid the slow libkern routines.
These aren't easily removed in one commit. For now, provide these
defines unconditionally, but marked for removal after subsequent
cleanup.
Removal of the now unused libkern routines will follow in the next
commit.
Reviewed by: dougm, imp (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40698
Since pmap_ps_enabled() is true by default, check it inside of
pmap_promote_pde() instead of at every call site.
Modify pmap_promote_pde() to return true if the promotion succeeded and
false otherwise. Use this return value in a couple places.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D40744
Stop requiring all of the PTEs to have the accessed bit set for superpage
promotion to occur. Given that change, add support for promotion to
pmap_enter_quick(), which does not set the accessed bit in the PTE that
it creates.
Since the final mapping within a superpage-aligned and sized region of a
memory-mapped file is typically created by a call to pmap_enter_quick(),
we now achieve promotions in circumstances where they did not occur
before, for example, the X server's read-only mapping of libLLVM-15.so.
See also https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40478
Due to fxsave area is os independent reimplement fxsave handmade code
using copying of a whole area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40443
MFC after: 2 weeks
Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have
tslog annotations in files which might be built from userland (i.e. in
subr_boot.c, which is built as part of the boot loader).
This reverts commit 59588a546f.
The change to subr_boot.c broke the libsa build because the TSLOG
macros have their own definitions for the boot loader -- I didn't
realize that the loader code used subr_boot.c.
I'm currently testing a fix and I'll revert this revert once I'm
satisfied that everything works, but I don't want to leave the
tree broken for too long.
This reverts commit 469cfa3c30.
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
* 535 us in kmem_malloc
* 450 us in pmap_zero_page
* 1720 us in pmap_growkernel
* 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
* 430 us in cpu_setregs calling load_cr0
Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40327
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
pmap_zero_page is responsible for 4.6 ms of the 25.0 ms of boot time.
This is not in fact time spent zeroing pages though; almost all of
that time is spent in a first-touch penalty, presumably due to the
host Linux kernel faulting in backing pages one by one.
There's probably a way to improve that by teaching Firecracker to
fault in all the VM's pages from the start rather than having them
faulted in one at a time, but that's outside of FreeBSD's control.
This commit adds a TSLOG_PAGEZERO option which enables TSLOG on the
amd64 pmap_zero_page function; it's a separate option (turned off
by default even if TSLOG is enabled) since zeroing pages happens
enough that it can easily fill the TSLOG buffer and prevent other
timing information from being recorded.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40326
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
hammer_time takes roughly 2740 us:
* 55 us in xen_pvh_parse_preload_data
* 20 us in boot_parse_cmdline_delim
* 20 us in boot_env_to_howto
* 15 us in identify_hypervisor
* 1320 us in link_elf_reloc
* 1310 us in relocate_file1 handling ef->rela
* 25 us in init_param1
* 30 us in dpcpu_init
* 355 us in initializecpu
* 255 us in initializecpu calling load_cr4
* 425 us in getmemsize
* 280 us in pmap_bootstrap
* 205 us in create_pagetables
* 10 us in init_param2
* 25 us in pci_early_quirks
* 60 us in cninit
* 90 us in kdb_init
* 105 us in msgbufinit
* 20 us in fpuinit
* 205 us elsewhere in hammer_time
Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and
load_cr4 loads the CR4 register, both of which trap to the hypervisor)
but others may deserve attention.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40325
The sysentvec sv_imgact_try was used by kern_exec() to allow
non-native ABI to fixup shell path according to ABI root directory.
Since the non-native ABI can now specify its root directory directly
to namei() via pwd_altroot() call this facility is not needed anymore.
Differential Revision: https://reviews.freebsd.org/D40092
MFC after: 2 month
The Barndinfo emul_path was used by the Elf image activator to fixup
interpreter file name according to ABI root directory. Since the
non-native ABI can now specify its root directory directly to namei()
via pwd_altroot() call this facility is not needed anymore.
Differential Revision: https://reviews.freebsd.org/D40091
MFC after: 2 month
Get rid of using register numbers which is undefined in libunwind
on x86_64.
Differential Revision: https://reviews.freebsd.org/D40156
MFC after: 1 month
Perhaps, this does not makes much sense as destroyng %rcx declared by
the x86_64 Linux syscall ABI. However,:
a) if we get a signal while we are in the kernel, we should restore
tf_rcx when preparing machine context for signal handlers.
b) the Linux world is strange, someone can depend on %rcx value
after syscall, something like go.
Differential Revision: https://reviews.freebsd.org/D40155
MFC after: 1 month
I agree, it would be great to avoid PCB_FULL_IRET, however we should
follow Linux system call ABI.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D40152
MFC after: 1 month
This avoids bloating the kernel image when MAXCPU is large.
A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer. Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.
PR: 269572
MFC after: never
Reviewed by: mjg, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39806
Commit 0bda8d3e9f ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.
Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.
PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
pmap_get_pcid() calls zpcpu_get() which is defined in pcpu.h.
It is unclear why we do not include that header but like right
above the change add another guard around pmap_get_pcid().
This allows some LinuxKPI headers to compile again.
Suggested by: markj
MFC after: 10 days
All UFS options work for ufs.ko.
Reviewed by: emaste, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39990
Failing to preserve pir_desc can result in pending interrupts being lost
on resume leading to a hung VM.
Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D35447
This patch fixes virtual machine single stepping on VMX hosts.
Currently, when using bhyve's gdb stub, each attempt at single-stepping
a vCPU lands in a timer interrupt. The current single-stepping mechanism
uses the Monitor Trap Flag feature to cause VMEXIT after a single
instruction is executed. Unfortunately, the SDM states that MTF causes
VMEXITs for the next instruction that gets executed, which is often not
what the person using the debugger expects. [1]
This patch adds a new VM capability that masks interrupts on a vCPU by
blocking interrupt injection and modifies the gdb stub to use the newly
added capability while single-stepping a vCPU.
[1] Intel SDM 26.5.2 Vol. 3C
Reviewed by: corvink, jbh
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39949
This existing helper function is preferable to the hand-rolled
calculation of the kstack bounds.
Make some small style improvements while here. Notably, rename every
instance of "r", the return address, to "ra". Tidy the includes in the
affected files.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39909
This is the MINIMAL config with SMP/NUMA options turned off.
Useful to ensure that UP configuration still builds, until it is removed
finally.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On amd64 ACPI is required to boot, it cannot work as a module, and we do
not build the ACPI module for long time.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If vmx or svm is disabled in BIOS or the device isn't supported by vmm,
modinit won't allocate these state save areas. As kmem_free panics when
passing a NULL pointer to it, loading the vmm kernel module causes a
panic too.
PR: 271251
Reviewed by: markj
Fixes: 74ac712f72 ("vmm: Dynamically allocate a couple of per-CPU state save areas")
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39974
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP. Other
pages are allocated after we know the number of cpus and their
assignments to the domains.
PCPUs are not accessed until they are initialized, which happens on AP
startup.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
The APs pcpu area is zeroed in init_secondary() by pcpu_init(), so the
early initialization in pmap_bootstrap() is nop.
Fixes: 42f722e721cd010ae5759a4b0d3b7b93c2b9cad2ESC
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
Cpuid is used to index the pmap->pm_pcids array only.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
to initialize pm_pcids array for a new user pmap
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
and rename the structure to pmap_pcid.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Prior the 2.6 kernel dev_t type was an unsigned short.
However, since the firs commit of the Linuxulator, mknod syscall get int dev
argument.
Also, there is some confusion here, while the kernel declares a dev_t type
as a 32-bit sized, the user-space dev_t type can be size of 64 bits, e.g.,
in the Glibc library.
To avoid confusion and to help porting of the Linuxulator to other platforms
use explicit l_dev_t for dev argument of mknod syscalls.
This avoids bloating the BSS when MAXCPU is large.
No functional change intended.
PR: 269572
Reviewed by: corvink, rew
Tested by: rew
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39805
This fixes the detection of pending interrupts when pirval is 0 and the
pending bit is set
More information how this situation occurs, can be found here:
c5b5f2d808/sys/amd64/vmm/intel/vmx.c (L4016-L4031)
Reviewed by: corvink, markj
Fixes: 02cc877968 ("Recognize a pending virtual interrupt while emulating the halt instruction.")
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D39620
The driver is enormous and rarely used.
text data bss dec hex filename
23076646 1870505 4415872 29363023 0x1c00b4f kernel.before
20017433 1870305 4416000 26303738 0x1915cfa kernel.after
People using the driver will need to add pmspcv_load="YES" to
their loader.conf.
Reviewed by: jhb
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39816
This is required to announce support for some accelerated AES
operations. AVX512BW indicates support for the AVX512-FP16 extension
and AVX512VL indicates support for the use of AVX512 instructions with
vector lengths smaller than 512 bits.
VAES and VPCLMULQDQ extensions indicate that VEX-prefixed AES-NI and
pclmulqdq instructions are supported.
All of these bits are needed for OpenSSL to use VAES to accelerate
AES-GCM transforms.
Reviewed by: corvink, kib, jhb
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D39781
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.
Differential Revision: https://reviews.freebsd.org/D39644
MFC after: 1 month
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not. And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).
Sponsored by: Netflix
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global and that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to an "privileged instruction fault".
The fix is to protect "use_xsave" from being initialized more than once.
Reported and reviewed by: stevek
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39660
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact. Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.
For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().
Reported by: andrew
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39556
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header. A
similar situation exists for the NR_EVENT_CHANNELS constant.
Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.
This was originally implemented by Julien Grall, but has been heavily
modified. The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around. This allows the
architecture to add function definitions which use struct xenisrc.
The original version only moved xi_intsrc into xen_arch_isrc_t. Moving
xi_vector was done by the submitter.
The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t. Those disappeared with the removal of PVHv1 support.
Copyright note. The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs. Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
- Adjust some line lengths
- Fix comment about NR_EVENT_CHANNELS after movement.
- Use #include instead of symlinks.
xen_intr_handle_upcall() has two interfaces. It needs to be called by
the x86 assembly code invoked by the APIC. Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.
Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface. Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().
When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled. This removes the need for
critical_enter()/critical_exit() when called this way.
The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.
Additionally driver_filter_t functions have no need to handle interrupt
counters. The intrcnt_add() calling function was reworked to match the
current situation. intrcnt_add() is now only called via one path.
The increment/decrement of curthread->td_intr_nesting_level had
previously been left out. Appears this was mostly harmless, but this
was noticed during implementation and has been added.
CONFIG_X86 is a leftover from use with Linux. While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.
Copyright note. xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs. xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.
sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface. Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later. As such the filename and header list
belong to Julien Grall, but what those were created for is later.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006