Commit graph

9243 commits

Author SHA1 Message Date
Dmitry Chagin 5bdd74cc05 linux(4): Drop the outdated comments about sixth register on i386 int0x80
This is well documented in the Linux syscall(2).

MFC after:		1 week
2023-10-10 12:33:22 +03:00
Dmitry Chagin 4fe779900b linux(4): Deduplicate SystemV IPC defines from amd64/linux
MFC after:		1 week
2023-10-04 21:18:45 +03:00
Dmitry Chagin 199e397e9b linux(4): Deorbit linux_nosys
Differential Revision:	https://reviews.freebsd.org/D41901
MFC after:		1 week
2023-10-03 10:38:03 +03:00
Dmitry Chagin 99abee8b7b linux(4): Regen for linux_nosys change
MFC after:		1 week
2023-10-03 10:38:03 +03:00
Dmitry Chagin 8e523be5a5 linux(4): Deorbit linux_nosys from syscalls.master
Differential Revision:	https://reviews.freebsd.org/D41902
MFC after:		1 week
2023-10-03 10:38:02 +03:00
Konstantin Belousov 7acc4240ce linuxolator: fix nosys() to not send SIGSYS
Reviewed by:	dchagin, markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41976
2023-10-03 01:30:53 +03:00
Konstantin Belousov b82b4ae752 sysentvec: add SV_SIGSYS flag
to allow ABIs to indicate that SIGSYS is needed.  Mark all native
FreeBSD ABIs with the flag.

This implicitly marks Linux' ABIs as not delivering SIGSYS on invalid
syscall.

Reviewed by:	dchagin, markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41976
2023-10-03 01:30:52 +03:00
Konstantin Belousov 39024a8914 syscalls: fix missing SIGSYS for several ENOSYS errors
In particular, when the syscall number is too large, or when syscall is
dynamic.  For that, add nosys_sysent structure to pass fake sysent to
syscall top code.

Reviewed by:	dchagin, markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41976
2023-10-03 01:30:52 +03:00
Konstantin Belousov 6b3bb233cd amd64 cpu_fetch_syscall_args_fallback(): fix whitespace
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41976
2023-10-03 01:30:52 +03:00
Olivier Certner ebaea1bcd2 x86: AMD Zen2: Zenbleed chicken bit mitigation
Applies only to bare-metal Zen2 processors.  The system currently
automatically applies it to all of them.

Tunable/sysctl 'machdep.mitigations.zenbleed.enable' can be used to
forcibly enable or disable the mitigation at boot or run-time.  Possible
values are:

    0: Mitigation disabled
    1: Mitigation enabled
    2: Run the automatic determination.

Currently, value 2 is the default and has identical effect as value 1.
This might change in the future if we choose to take into account
microcode revisions in the automatic determination process.

The tunable/sysctl value is simply ignored on non-applicable CPU models,
which is useful to apply the same configuration on a set of machines
that do not all have Zen2 processors.  Trying to set it to any integer
value not listed above is silently equivalent to setting it to value 2
(automatic determination).

The current mitigation state can be queried through sysctl
'machdep.mitigations.zenbleed.state', which returns "Not applicable",
"Mitigation enabled" or "Mitigation disabled".  Note that this state is
not guaranteed to be accurate in case of intervening modifications of
the corresponding chicken bit directly via cpuctl(4) (this includes the
cpucontrol(8) utility).  Resetting the desired policy through
'machdep.mitigations.zenbleed.enable' (possibly to its current value)
will reset the hardware state and ensure that the reported state is
again coherent with it.

Reviewed by:	kib
Sponsored by:   The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41817
2023-10-02 15:29:18 -04:00
Dmitry Chagin 28035f675b linux(4): Regen
MFC after:		1 week
2023-09-25 12:26:34 +03:00
Dmitry Chagin 0a16d3d14d linux(4): Update syscalls.master to 6.5
MFC after:		1 week
2023-09-25 12:24:58 +03:00
Mark Johnston f7c733e4fe amd64: Convert a cheap DIAGNOSTIC check to a KASSERT
MFC after:	1 week
2023-09-17 06:27:31 -04:00
Bojan Novković aa3bcaad51 amd64: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
This patch reverts the changes made in D19670 and fixes the original
issue by allocating and prepopulating a leaf page table page for wired
userspace 2M pages.

The original issue is an edge case that creates an unmapped, wired
region in userspace. Subsequent faults on this region can trigger wired
superpage creation, which leads to a panic in pmap_demote_pde_locked()
as the pmap does not create a leaf page table page for the wired
superpage. D19670 fixed this by disallowing preemptive creation of
wired superpage mappings, but that fix is currently interfering with an
ongoing effort of speeding up vm_map_wire for large, contiguous entries
(e.g. bhyve wiring guest memory).

Reviewed by:	alc, markj
Sponsored by:	Google, Inc. (GSoC 2023)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D41132
2023-09-17 06:27:22 -04:00
Olivier Certner 125bbadf60 x86: Add defines for workaround bits in AMD's MSR "Decode Configuration"
They are a bit more informative than raw hexadecimal values.

While here, sort existing defines of bits for AMD MSRs to match the address
order.

Reviewed by:	kib, emaste
Sponsored by:   The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41816
2023-09-14 16:24:48 +01:00
Dmitry Chagin ba90a31d08 linux(4): Cleanup includes under amd64/linux32
No functional changes.

MFC after:		1 week
2023-09-11 21:29:40 +03:00
Dmitry Chagin 68df2376e0 linux(4): Cleanup includes under amd64/linux
No functional changes.

MFC after:		1 week
2023-09-11 21:29:34 +03:00
Dmitry Chagin 2a1cf1b6b5 linux(4): Deduplicate mmap2
To help porting the Linux emulation layer to a new platforms start using
Linux names for conditional builds instead of architecture-specific ifdefs.

MFC after:		1 week
2023-09-05 21:16:39 +03:00
Dmitry Chagin 553b1a4e4e linux(4): Deduplicate mprotect, madvise
MFC after:		1 week
2023-09-05 21:15:52 +03:00
John Baldwin 4a9cd9fc22 amd64 db_trace: Reject unaligned frame pointers
Switch to using db_addr_t to hold frame pointer values until they are
verified to be suitably aligned.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D41532
2023-09-01 15:55:37 -07:00
John Baldwin d1e4c63d9e efirt_machdep.c: Trim some unused includes
Reviewed by:	imp, kib, markj
Differential Revision:	https://reviews.freebsd.org/D41596
2023-08-28 16:22:03 -07:00
John Baldwin 8173fa60dd efirt: Move comment about fpu_kern_enter to where it is called
Reviewed by:	imp, kib, andrew, markj
Differential Revision:	https://reviews.freebsd.org/D41576
2023-08-25 12:33:00 -07:00
Konstantin Belousov 74ccb8ecf6 Add cpu_sync_core()
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Konstantin Belousov 8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.

Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Dmitry Chagin 524c9accdc linux(4): Replace linux32_copyiniov by freebsd32_copyiniov
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin c987ff4d7b linux(4): Replace linux32_copyinuio by freebsd32_copyinuio
MFC after:		1 month
2023-08-20 10:36:32 +03:00
Dmitry Chagin 4e5f2eb0b6 Regen for readv syscall 2023-08-20 10:36:32 +03:00
Dmitry Chagin 5585afe642 linux(4): Prepare to retire linux32_copyinuio
MFC after:		1 month
2023-08-20 10:36:31 +03:00
Dmitry Chagin 4231b825ac linux(4): Add a dedicated writev syscall wrapper
Adding a writev syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:31 +03:00
Dmitry Chagin 1f9d71ee32 Regen for writev syscall 2023-08-20 10:36:31 +03:00
Dmitry Chagin aad4b799f7 linux(4): Add a writev syscall wrapper
Adding a writev syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:30 +03:00
Dmitry Chagin e58ff66464 linux(4): Add a write syscall wrapper
Adding a write syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:29 +03:00
Dmitry Chagin 89d270b28d Regen for write syscall 2023-08-20 10:36:29 +03:00
Dmitry Chagin 510f5c88f0 linux(4): Modify write syscall to match Linux
Adding a write syscall wrapper is needed due to Linux family of write
syscalls doesn't distinguish between in kernel blocking operations
and always returns EAGAIN while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:28 +03:00
Dmitry Chagin 3460fab5fc linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743 2023-08-18 13:12:02 +03:00
Mark Johnston 5635d5b61e vmm: Fix VM_GET_CPUS compatibility
bhyve in a 13.x jail fails to boot guests with more than one vCPU
because they pass too small a buffer to VM_GET_CPUS, causing the ioctl
handler to return ERANGE.  Handle this the same way as cpuset system
calls: make sure that the result can fit in the truncated space, and
relax the check on the cpuset buffer.

As a side effect, fix an insufficient bounds check on "size".  The
signed/unsigned comparison with sizeof(cpuset_t) fails to exclude
negative values, so we can end up allocating impossibly large amounts of
memory.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41496
2023-08-17 18:10:02 -04:00
Dmitry Chagin 270e01d468 linux(4): Fix leftovers after 2ff63af9 2023-08-17 23:54:00 +03:00
Dmitry Chagin 158b57295f linux(4): Regen for sendfile 2023-08-17 22:57:17 +03:00
Dmitry Chagin 5068387f42 linux(4): Use l_off_t type for offset argument in sendfile syscall
The off_t on Linux is a long, so it's non-functional change, just to
avoid confusing future readers.

MFC after:		1 month
2023-08-17 22:57:16 +03:00
Warner Losh 78d146160d sys: Remove $FreeBSD$: one-line bare tag
Remove /^\s*\$FreeBSD\$$\n/
2023-08-16 11:55:17 -06:00
Warner Losh 031beb4e23 sys: Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:58 -06:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh 71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh 2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Ed Maste a51f81c2e5 x86: move EARLY_AP_STARTUP into DEFAULTS
EARLY_AP_STARTUP was introduced in 2016 (commit fdce57a042) with note:

    As a transition aid, the new behavior is moved under a new
    kernel option (EARLY_AP_STARTUP). This will allow the option
    to be turned off if need be during initial testing. I hope to
    enable this on x86 by default in a followup commit ...

It was enabled by default, but became effectively mandatory (on x86)
some time later.  Move it to DEFAULTS to avoid an unbootable system if
the option is left out of a custom kernel configuration file.

Reported by:	wollman
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41352
2023-08-14 16:17:48 -04:00
Marius Strobl 37c8ee8847 ath(4): Remove MIPS AHB frontend and join PCI one w/ main support again
Following the removal of general MIPS support, there's no longer a need
to have the AHB bus-frontend in place, which according to Linux sources
also isn't used with any non-MIPS SoCs. For simplicity, PCI bus support
is only made conditional on the main one again, i. e. device ath_pci is
removed, and built into the main module, i. e. if_ath_pci.ko obsoleted,
respectively.
Effectively, this reverts the following commits and associated changes:
dba9c85977
e849bb3ecb

Approved by:	adrian
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D41354
2023-08-08 22:30:13 +02:00
Mark Johnston 78cc000cba amd64: Increase sanitizers' static shadow memory reservation
Because KASAN shadows the kernel image itself (KMSAN currently does
not), a shadow mapping of the boot stack must be created very early
during boot.  pmap_san_enter() reserves a fixed number of pages for the
purpose of creating and mapping this shadow region.

After commit 789df254cc ("amd64: Use a larger boot stack"), it could
happen that this reservation is insufficient; this happens when
bootstack crosses a PAGE_SHIFT + KASAN_SHADOW_SCALE_SHIFT boundary.
Update the calculation to take into account the new size of the boot
stack.

Fixes:		789df254cc ("amd64: Use a larger boot stack")
Sponsored by:	The FreeBSD Foundation
2023-08-04 12:38:24 -04:00
Dmitry Chagin b5c0b9555d linux(4): Regen for ioprio syscalls
MFC after:		1 month
2023-08-04 16:03:57 +03:00
Dmitry Chagin 1c83154e49 linux(4): Modify ioprio syscalls to match Linux
MFC after:		1 month
2023-08-04 16:03:55 +03:00
Ed Maste 9051987e40 amd64: Bump MAXCPU to 1024 (from 256)
Hardware with more than 256 CPU cores is currently available and will
become increasingly common over FreeBSD 14's lifetime.  Increase MAXCPU
in the amd64 GENERIC kernel configuration to 1024.

Earlier commits increased some related limits.  These prerequisite
commits include at least:

- d7ed40243769 Increase MAX_APIC_ID safeguard to 0x800
- d1639e43c5 cpuset: increase userland maximum size to 1024

Global and allocated arrays sized by MAXCPU result in excessive bloat
on systems with lower core counts.  In addition, some code used u_char
(8 bits) to hold a CPU index, which is not valid if MAXCPU is greater
than 256.

A number of recent commits addressed these sorts of issues, including
at least:

- 133935d26f pf: atomically increment state ids
- 74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
- 78cfa762eb callout: Move per-CPU callout state into the dpcpu region
- 42f722e721 amd64: store pcids pmap data in pcpu zone
- 9801e7c275 smp_topo: dynamically allocate group array
- 9fb6718d1b smp: Dynamically allocate the stoppcbs array
- 2bb16c6352 x86: retire use of intr_bind

There are some additional allocations still to be converted and
more scalability work is required to make effective use of very high
core count systems, but this change allows us to boot on these systems
and provides a Kernel Binary Interface (KBI) for the FreeBSD 14 release
that supports these configurations.

Special thanks to AMD for providing hardware to test these changes.

PR:		269572
Reviewed by:	des
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36838
2023-08-03 17:41:26 -04:00
Gordon Bergling 29eab3e4e0 linux(4): Fix two typos in source code comments
- s/decriptors/descriptors/

MFC after:	3 days
2023-08-02 11:55:30 +02:00
Mark Johnston 5ad29bc8d4 amd64: Fix TLB invalidation routines in !SMP kernels
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels.  Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work.  Specifically, I
see occasional unexplicable crashes in userspace when PCIDs are enabled.

Apply a minimal patch to fix the problem.  While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D41230
2023-07-30 11:12:35 -04:00
Alan Cox 3d7c37425e amd64 pmap: Catch up with pctrie changes
Recent changes to the pctrie code make it necessary to initialize the
kernel pmap's rangeset for PKU.
2023-07-28 15:13:13 -05:00
Dmitry Chagin 4281dab8bc linux(4): Add elf_hwcap2 to x86
On x86 Linux via AT_HWCAP2 the user controlled (by tunables) processor
capabilities are exposed.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41165
MFC after:		2 weeks
2023-07-28 11:56:59 +03:00
Mark Johnston 640e5cb304 kmsan: Add a comment explaining why KMSAN doesn't shadow above KERNBASE
Sponsored by:	The FreeBSD Foundation
2023-07-27 16:01:58 -04:00
Mark Johnston 789df254cc amd64: Use a larger boot stack
With sanitizers enabled, it becomes possible to overflow the stack when
only a single page is used.  Follow arm64's example and use the default
kernel stack size instead.  This is a bit wasteful, but without a guard
page, overflow merely corrupts adjacent .bss entries and is thus
difficult to debug.

Note, with a GENERIC kernel we already consume over half of the
available boot stack space, see the review for an example.

Reviewed by:	kib
Reported by:	Jenkins
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41166
2023-07-24 18:49:36 -04:00
Dmitry Chagin d9c2dc6bf1 linux(4): Regen for xattr syscalls
MFC after:		1 month
2023-07-22 14:03:32 +03:00
Dmitry Chagin 41f2c69ee3 linux(4): Modify xattr syscalls to match Linux
MFC after:		1 month
2023-07-22 14:03:31 +03:00
Kristof Provost 208fcb55e3 Fix MINIMAL build on amd64
amd64/include/counter.h uses KASSERT, but failed to include the
kassert.h header.
2023-07-14 09:18:43 +02:00
Doug Moore 3e04ae433f vm_radix_init: use initializer
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D40971
2023-07-14 01:49:55 -05:00
Yufeng Zhou 294c52d969 amd64 pmap: Fix compilation when superpage reservations are disabled
The function pmap_pde_ept_executable() should not be conditionally
compiled based on VM_NRESERVLEVEL. It is required indirectly by
pmap_enter(..., psind=1) even when reservation-based allocation is
disabled at compile time.

Reviewed by:	alc
MFC after:	1 week
2023-07-12 12:07:42 -05:00
Gleb Smirnoff 0d1ff2b04d vmm: don't leak locks exiting vmmdev_ioctl()
At least an error from vcpu_lock_all() at line 553 would leak
memseg lock.  There might be other cases as well.

Reviewed by:		corvink, markj
Differential Revision:	https://reviews.freebsd.org/D40981
2023-07-12 09:16:40 -07:00
Gleb Smirnoff 30f0328a32 vmm: don't return random error from vcpu_lock_all() if vcpu is empty
When vcpu array is empty, function would return random value from
stack.  What I observed was -1.

Reviewed by:		corvink, markj
Differential Revision:	https://reviews.freebsd.org/D40980
2023-07-12 09:16:40 -07:00
John Baldwin 2596008a0b amd64 pcpu.h: Add missing 'do' from do-while loop around __PCPU_SET.
Reported by:	mjg
Diagnosed by:	jrtc27
2023-07-08 12:59:26 -07:00
John Baldwin 2329393c61 amd64: Use __seg_gs to implement per-CPU data accesses.
This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:.  The
original motivation for this is that it quiets verbose warnings from
GCC 12.  However, this version is also much easier to read and
allows the compiler to generate better code (e.g. the compiler can
use a GS: memory operand directly in other instructions such as IMUL
and CMP rather than always MOVing to a temporary register).

The one caveat is that the current approach is very inefficient at -O0
since the compiler expects to load the 0 base offset from a global
variable instead of assuming it is 0 (even with the const).

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D40647
2023-07-07 13:06:55 -07:00
Mitchell Horne a89262079e Consistently provide ffs/fls using builtins
Use of compiler builtin ffs/ctz functions will result in optimized
instruction sequences when possible, and fall back to calling a function
provided by the compiler run-time library. We have slowly shifted our
platforms to take advantage of these builtins in 60645781d6 (arm64),
1c76d3a9fb (arm), 9e319462a0 (powerpc, partial).

Some platforms still rely on the libkern implementations of these
functions provided by libkern, namely riscv, powerpc (ffs*, flsll), and
i386 (ffsll and flsll). These routines are slow, as they perform a
linear search for the bit in question. Even on platforms lacking
dedicated bit-search instructions, such as riscv, the compiler library
will provide better-optimized routines, e.g. by using binary search.

Consolidate all definitions of these functions (whether currently using
builtins or not) to libkern.h. This should result in equivalent or
better performing routines in all cases.

One wart in all of this is the existing HAVE_INLINE_F*** macros, which
we use in a few places to conditionally avoid the slow libkern routines.
These aren't easily removed in one commit. For now, provide these
defines unconditionally, but marked for removal after subsequent
cleanup.

Removal of the now unused libkern routines will follow in the next
commit.

Reviewed by:	dougm, imp (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40698
2023-07-06 14:46:41 -03:00
Mark O'Donovan b0d3d44dfe qlnxe: add driver to amd64 NOTES
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
2023-07-01 11:06:59 -06:00
Alan Cox 0d2f98c2f0 amd64 pmap: Tidy up pmap_promote_pde() calls
Since pmap_ps_enabled() is true by default, check it inside of
pmap_promote_pde() instead of at every call site.

Modify pmap_promote_pde() to return true if the promotion succeeded and
false otherwise.  Use this return value in a couple places.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D40744
2023-06-24 13:09:04 -05:00
Alan Cox 34eeabff5a amd64/arm64 pmap: Stop requiring the accessed bit for superpage promotion
Stop requiring all of the PTEs to have the accessed bit set for superpage
promotion to occur.  Given that change, add support for promotion to
pmap_enter_quick(), which does not set the accessed bit in the PTE that
it creates.

Since the final mapping within a superpage-aligned and sized region of a
memory-mapped file is typically created by a call to pmap_enter_quick(),
we now achieve promotions in circumstances where they did not occur
before, for example, the X server's read-only mapping of libLLVM-15.so.

See also https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40478
2023-06-12 13:40:57 -05:00
Warner Losh 9121945d70 Regenerate sysent stuff after $FreeBSD$ removal
Sponsored by:		Netflix
2023-06-09 07:28:27 -06:00
Dmitry Chagin cbbac56091 linux(4): Preserve fpu xsave state across signal delivery on amd64
PR:			270247
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D40444
MFC after:		2 weeks
2023-06-09 01:33:26 +03:00
Dmitry Chagin 920184ed6e linux(4): In preparation for xsave refactor fxsave code on amd64
Due to fxsave area is os independent reimplement fxsave handmade code
using copying of a whole area.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D40443
MFC after:		2 weeks
2023-06-09 01:32:46 +03:00
Dmitry Chagin 84617f6fcc linux(4) rt_sendsig: Remove the use of caddr_t
Replace caddr_t by more appropriate char *.

MFC after:		2 weeks
2023-06-06 23:01:39 +03:00
Colin Percival 9d6ae1e3c2 Revert "Revert "tslog: Annotate some early boot functions""
Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have
tslog annotations in files which might be built from userland (i.e. in
subr_boot.c, which is built as part of the boot loader).

This reverts commit 59588a546f.
2023-06-04 22:49:38 -07:00
Xin LI 4d779448ad gve: Fix build on i386 and enable LINT builds.
Reviewed-by:	imp
Differential Revision: https://reviews.freebsd.org/D40419
2023-06-04 16:35:00 -07:00
Colin Percival 59588a546f Revert "tslog: Annotate some early boot functions"
The change to subr_boot.c broke the libsa build because the TSLOG
macros have their own definitions for the boot loader -- I didn't
realize that the loader code used subr_boot.c.

I'm currently testing a fix and I'll revert this revert once I'm
satisfied that everything works, but I don't want to leave the
tree broken for too long.

This reverts commit 469cfa3c30.
2023-06-04 11:39:45 -07:00
Colin Percival 45cc8519f5 tslog: Annotate parts of SYSINIT cpu
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
  * 535 us in kmem_malloc
    * 450 us in pmap_zero_page
  * 1720 us in pmap_growkernel
    * 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
  * 430 us in cpu_setregs calling load_cr0

Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40327
2023-06-04 10:16:35 -07:00
Colin Percival 2404380aac tslog: Optionally instrument pmap_zero_page
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
pmap_zero_page is responsible for 4.6 ms of the 25.0 ms of boot time.
This is not in fact time spent zeroing pages though; almost all of
that time is spent in a first-touch penalty, presumably due to the
host Linux kernel faulting in backing pages one by one.

There's probably a way to improve that by teaching Firecracker to
fault in all the VM's pages from the start rather than having them
faulted in one at a time, but that's outside of FreeBSD's control.

This commit adds a TSLOG_PAGEZERO option which enables TSLOG on the
amd64 pmap_zero_page function; it's a separate option (turned off
by default even if TSLOG is enabled) since zeroing pages happens
enough that it can easily fill the TSLOG buffer and prevent other
timing information from being recorded.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40326
2023-06-04 10:16:31 -07:00
Colin Percival 469cfa3c30 tslog: Annotate some early boot functions
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
hammer_time takes roughly 2740 us:
* 55 us in xen_pvh_parse_preload_data
  * 20 us in boot_parse_cmdline_delim
  * 20 us in boot_env_to_howto
* 15 us in identify_hypervisor
* 1320 us in link_elf_reloc
  * 1310 us in relocate_file1 handling ef->rela
* 25 us in init_param1
* 30 us in dpcpu_init
* 355 us in initializecpu
  * 255 us in initializecpu calling load_cr4
* 425 us in getmemsize
  * 280 us in pmap_bootstrap
    * 205 us in create_pagetables
* 10 us in init_param2
* 25 us in pci_early_quirks
* 60 us in cninit
* 90 us in kdb_init
* 105 us in msgbufinit
* 20 us in fpuinit
* 205 us elsewhere in hammer_time

Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and
load_cr4 loads the CR4 register, both of which trap to the hypervisor)
but others may deserve attention.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40325
2023-06-04 10:16:22 -07:00
Mark Johnston 18282c4772 sysarch: Add includes required for ktrcapfail() calls to be compiled
Reported by:	jfree
MFC after:	1 week
2023-06-01 17:18:23 -04:00
Mateusz Guzik 6217c2473d amd64: zero-pad register dumps on panic
de gustibus and so on

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-05-30 13:15:56 +00:00
Dmitry Chagin eb98f77910 linux(4): Regen for linux_execve
MFC after:		2 month
2023-05-29 12:18:30 +03:00
Dmitry Chagin 8340b03425 linux(4): Add a dedicated linux_exec_copyin_args()
Because Linux allows to exec binaries with 0 argc.

Reviewed by:		brooks
Differential Revision:	https://reviews.freebsd.org/D40148
MFC after:		2 month
2023-05-29 12:18:16 +03:00
Dmitry Chagin d706d02edb sysentvec: Retire sv_imgact_try as unneeded anymore
The sysentvec sv_imgact_try was used by kern_exec() to allow
non-native ABI to fixup shell path according to ABI root directory.
Since the non-native ABI can now specify its root directory directly
to namei() via pwd_altroot() call this facility is not needed anymore.

Differential Revision:	https://reviews.freebsd.org/D40092
MFC after:		2 month
2023-05-29 11:18:11 +03:00
Dmitry Chagin 57578deac7 Brandinfo: Retire emul_path as unneeded anymore
The Barndinfo emul_path was used by the Elf image activator to fixup
interpreter file name according to ABI root directory. Since the
non-native ABI can now specify its root directory directly to namei()
via pwd_altroot() call this facility is not needed anymore.

Differential Revision:	https://reviews.freebsd.org/D40091
MFC after:		2 month
2023-05-29 11:17:28 +03:00
Dmitry Chagin fd745e1db6 linux(4): Use pwd_altroot() to tell namei() about ABI root path
PR:			72920
Differential Revision:	https://reviews.freebsd.org/D40090
MFC after:		2 month
2023-05-29 11:16:46 +03:00
Dmitry Chagin 78c2e58fa5 linux(4): Fix stack unwinding across signal frame on x86_64
Get rid of using register numbers which is undefined in libunwind
on x86_64.

Differential Revision:	https://reviews.freebsd.org/D40156
MFC after:		1 month
2023-05-28 17:07:28 +03:00
Dmitry Chagin 037b60fb0f linux(4): Preserve %rcx (return address) like a Linux do
Perhaps, this does not makes much sense as destroyng %rcx declared by
the x86_64 Linux syscall ABI. However,:
a) if we get a signal while we are in the kernel, we should restore
   tf_rcx when preparing machine context for signal handlers.
b) the Linux world is strange, someone can depend on %rcx value
   after syscall, something like go.

Differential Revision:	https://reviews.freebsd.org/D40155
MFC after:		1 month
2023-05-28 17:06:47 +03:00
Dmitry Chagin 185bd9fa30 linux(4): Simplify %r10 restoring on amd64
Restore %r10 at system call entry to avoid doing this multiply times.

Differential Revision:	https://reviews.freebsd.org/D40154
MFC after:		1 month
2023-05-28 17:06:23 +03:00
Dmitry Chagin a463dd8108 linux(4): Add a comment explaining registers at syscall entry point on amd64
Differential Revision:	https://reviews.freebsd.org/D40153
MFC after:		1 month
2023-05-28 17:06:05 +03:00
Dmitry Chagin a99b890ecd linux(4): Drop a weird comment from linux_set_syscall_retval on amd64
I agree, it would be great to avoid PCB_FULL_IRET, however we should
follow Linux system call ABI.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D40152
MFC after:		1 month
2023-05-28 17:05:44 +03:00
Mark Johnston 9fb6718d1b smp: Dynamically allocate the stoppcbs array
This avoids bloating the kernel image when MAXCPU is large.

A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer.  Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.

PR:		269572
MFC after:	never
Reviewed by:	mjg, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39806
2023-05-25 18:09:55 -04:00
Mark Johnston e17eca3276 vmm: Avoid embedding cpuset_t ioctl ABIs
Commit 0bda8d3e9f ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI.  This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem.  Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure.  This data is
  only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
  structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
  vmexit info structure, and make VM_RUN copy it out separately.  Zero
  out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR:		271330
Reviewed by:	corvink, jhb (previous versions)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40113
2023-05-23 21:15:59 -04:00
Dmitry Chagin 1d76741520 linux(4): Implement ptrace_pokeusr for x86_64
Differential Revision:	https://reviews.freebsd.org/D40097
MFC after:		1 week
2023-05-18 20:02:35 +03:00
Dmitry Chagin 3d0addcd35 linux(4): Make ptrace_pokeusr machine dependent
Differential Revision:	https://reviews.freebsd.org/D40096
MFC after:		1 week
2023-05-18 20:01:12 +03:00
Dmitry Chagin dd2a6cd701 linux(4): Make ptrace_peekusr machine dependend
And partially implement it for x86_64.

Differential Revision:	https://reviews.freebsd.org/D40095
MFC after:		1 week
2023-05-18 20:00:12 +03:00
Piotr Pawel Stefaniak 411942a70e GENERIC: remove a stray space character 2023-05-13 21:31:49 +02:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Bjoern A. Zeeb 721b44ba5f amd64: pmap.h put a guard around a pcpu.h function
pmap_get_pcid() calls zpcpu_get() which is defined in pcpu.h.
It is unclear why we do not include that header but like right
above the change add another guard around pmap_get_pcid().
This allows some LinuxKPI headers to compile again.

Suggested by:	markj
MFC after:	10 days
2023-05-12 11:14:54 +00:00
Warner Losh 062a7b918f twe: Remove driver
Sponsored by:		Netflix
2023-05-10 22:24:12 -06:00
Konstantin Belousov bf864c3ed5 amd64 MINIMAL: SysV IPC syscalls are loadable
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Konstantin Belousov 0c1c5e36eb amd64 MINIMAL: remove UFS from compiled-in list
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Konstantin Belousov bba6249ae9 amd64 MINIMAL config: remove statements about UFS module
All UFS options work for ufs.ko.

Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Vitaliy Gusev c543e09f1f
bhyve: save/restore pir_desc
Failing to preserve pir_desc can result in pending interrupts being lost
on resume leading to a hung VM.

Reviewed by:		corvink, jhb
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D35447
2023-05-09 10:31:27 +02:00
Bojan Novković fefac54359
bhyve: fix vCPU single-stepping on VMX
This patch fixes virtual machine single stepping on VMX hosts.

Currently, when using bhyve's gdb stub, each attempt at single-stepping
a vCPU lands in a timer interrupt. The current single-stepping mechanism
uses the Monitor Trap Flag feature to cause VMEXIT after a single
instruction is executed. Unfortunately, the SDM states that MTF causes
VMEXITs for the next instruction that gets executed, which is often not
what the person using the debugger expects. [1]

This patch adds a new VM capability that masks interrupts on a vCPU by
blocking interrupt injection and modifies the gdb stub to use the newly
added capability while single-stepping a vCPU.

[1] Intel SDM 26.5.2 Vol. 3C

Reviewed by:		corvink, jbh
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D39949
2023-05-09 10:04:55 +02:00
Mitchell Horne aba91805aa hwpmc: use kstack_contains()
This existing helper function is preferable to the hand-rolled
calculation of the kstack bounds.

Make some small style improvements while here. Notably, rename every
instance of "r", the return address, to "ra". Tidy the includes in the
affected files.

Reviewed by:	jkoshy
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39909
2023-05-06 14:49:19 -03:00
Konstantin Belousov 38843fe0f2 amd64: add MINIMALUP config
This is the MINIMAL config with SMP/NUMA options turned off.
Useful to ensure that UP configuration still builds, until it is removed
finally.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-05-06 14:24:07 +03:00
Konstantin Belousov 3a8c69c1ff amd64 MINIMAL config: remove sentence about acpi
On amd64 ACPI is required to boot, it cannot work as a module, and we do
not build the ACPI module for long time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-05-06 14:24:07 +03:00
Konstantin Belousov 7c8e66ed8d amd64: convert UP code to dynamically allocated pmap->pm_pcid
Reported by:	peterj
Sponsored by:	The FreeBSD Foundation
2023-05-06 14:24:07 +03:00
Corvin Köhne b10e100d16
vmm: don't free unallocated memory
If vmx or svm is disabled in BIOS or the device isn't supported by vmm,
modinit won't allocate these state save areas. As kmem_free panics when
passing a NULL pointer to it, loading the vmm kernel module causes a
panic too.

PR:			271251
Reviewed by:		markj
Fixes:			74ac712f72 ("vmm: Dynamically allocate a couple of per-CPU state save areas")
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D39974
2023-05-05 15:34:00 +02:00
Igor Ostapenko 0167b5a793 sys/amd64/conf/FIRECRACKER: typo (compatiblity)
https://bugs.freebsd.org/269753

PR:                      269753
Reported by:             Igor Ostapenko
Approved by:             doc, src (delphij, imp, zlei)
Differential revision:   https://reviews.freebsd.org/D38741
2023-05-05 01:23:08 +01:00
John Baldwin 4961faaacc pmap_{un}map_io_transient: Use bool instead of boolean_t.
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D39920
2023-05-04 12:29:48 -07:00
John Baldwin 407f675718 imgact_elf: Change header_supported to return bool instead of boolean_t.
Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D39919
2023-05-04 12:29:29 -07:00
Konstantin Belousov 3582acbad3 amd64 mp_machdep.c: remove useless comment
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov af1c6d3f30 amd64: do not leak pcpu pages
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP.  Other
pages are allocated after we know the number of cpus and their
assignments to the domains.

PCPUs are not accessed until they are initialized, which happens on AP
startup.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov e704f88f3d amd64: initialize APs kpmap_store in init_secondary()
The APs pcpu area is zeroed in init_secondary() by pcpu_init(), so the
early initialization in pmap_bootstrap() is nop.

Fixes:	42f722e721cd010ae5759a4b0d3b7b93c2b9cad2ESC
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov 42f722e721 amd64: store pcids pmap data in pcpu zone
This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase.  Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.

Suggested by:	mjg
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:47 +03:00
Konstantin Belousov 9c8cbf3819 amd64 pmap_pcid_alloc(): pass a pointer to struct pmap_pcid instead of cpuid
Cpuid is used to index the pmap->pm_pcids array only.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:40 +03:00
Konstantin Belousov 9e0143694a amd64: add pmap_get_pcid() helper
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:35 +03:00
Konstantin Belousov 86b61ccb34 amd64 pmap: add pmap_pinit_pcids() helper
to initialize pm_pcids array for a new user pmap

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:29 +03:00
Konstantin Belousov 32bb28d8ad amd64: move definition of the struct pmap_pcids into _pmap.h
and rename the structure to pmap_pcid.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:20 +03:00
Dmitry Chagin 80d8a4a003 linux(4): Make struct stat64 to match Linux actual one 2023-04-28 11:55:04 +03:00
Dmitry Chagin cd0fca82bb linux(4): Regen for mknod syscall changes 2023-04-28 11:55:04 +03:00
Dmitry Chagin ca3333dd4a linux(4): Use Linux dev_t type for mknod syscalls dev argument
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Prior the 2.6 kernel dev_t type was an unsigned short.
However, since the firs commit of the Linuxulator, mknod syscall get int dev
argument.
Also, there is some confusion here, while the kernel declares a dev_t type
as a 32-bit sized, the user-space dev_t type can be size of 64 bits, e.g.,
in the Glibc library.
To avoid confusion and to help porting of the Linuxulator to other platforms
use explicit l_dev_t for dev argument of mknod syscalls.
2023-04-28 11:55:02 +03:00
Dmitry Chagin 19973638be linux(4): Move dev_t type declaration under /compat/linux
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Move it into the MI linux.h under /compat/linux.
2023-04-28 11:55:02 +03:00
Dmitry Chagin e0bfe0d62c linux(4): Make struct newstat to match actual Linux one
In the struct stat the st_dev, st_rdev are unsigned long.
2023-04-28 11:55:01 +03:00
Dmitry Chagin 023e688496 linux(4): Regen for struct l_old_stat changes 2023-04-28 11:55:01 +03:00
Dmitry Chagin 2370c7321f linux(4): Update syscalls.master to reflect struct l_old_stat 2023-04-28 11:54:59 +03:00
Dmitry Chagin 391fd1e1a1 linux(4): Mark old fstat syscal as unimplemented
It looks like the old fstat system call never been implemented.
2023-04-28 11:54:59 +03:00
Dmitry Chagin a408fc097f linux(4): Rename obsolete old struct l_stat to struct l_old_stat 2023-04-28 11:54:59 +03:00
Mark Johnston 74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
This avoids bloating the BSS when MAXCPU is large.

No functional change intended.

PR:		269572
Reviewed by:	corvink, rew
Tested by:	rew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39805
2023-04-26 10:08:42 -04:00
Vitaliy Gusev 0912408a28
vmm: fix HLT loop while vcpu has requested virtual interrupts
This fixes the detection of pending interrupts when pirval is 0 and the
pending bit is set

More information how this situation occurs, can be found here:
c5b5f2d808/sys/amd64/vmm/intel/vmx.c (L4016-L4031)

Reviewed by:		corvink, markj
Fixes:			02cc877968 ("Recognize a pending virtual interrupt while emulating the halt instruction.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39620
2023-04-26 10:38:46 +02:00
Mateusz Guzik 95e4f5ef7c x86: whack pmspcv from GENERIC
The driver is enormous and rarely used.

      text      data       bss        dec         hex   filename
  23076646   1870505   4415872   29363023   0x1c00b4f   kernel.before
  20017433   1870305   4416000   26303738   0x1915cfa   kernel.after

People using the driver will need to add pmspcv_load="YES" to
their loader.conf.

Reviewed by:	jhb
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39816
2023-04-25 18:09:44 +00:00
Mark Johnston 47cf1b37f4 vmm: Expose some more AVX512 CPUID bits to guests
This is required to announce support for some accelerated AES
operations.  AVX512BW indicates support for the AVX512-FP16 extension
and AVX512VL indicates support for the use of AVX512 instructions with
vector lengths smaller than 512 bits.

VAES and VPCLMULQDQ extensions indicate that VEX-prefixed AES-NI and
pclmulqdq instructions are supported.

All of these bits are needed for OpenSSL to use VAES to accelerate
AES-GCM transforms.

Reviewed by:	corvink, kib, jhb
MFC after:	2 weeks
Sponsored by:	Stormshield
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D39781
2023-04-25 13:35:14 -04:00
Dmitry Chagin 56c5230afd linux(4): Fix LINUX_AT_COUNT comments
Differential Revision:	https://reviews.freebsd.org/D39645
MFC after:		1 month
2023-04-22 22:16:43 +03:00
Dmitry Chagin 7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Warner Losh 559b94a122 syscall.master: Fix comments
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Dmitry Chagin de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Konstantin Belousov 617a11eab6 x86: initialize use_xsave once
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global and that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to an "privileged instruction fault".

The fix is to protect "use_xsave" from being initialized more than once.

Reported and reviewed by:	stevek
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39660
2023-04-19 02:22:28 +03:00
Konstantin Belousov 1e0e335b0f amd64: fix PKRU and swapout interaction
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
2023-04-15 02:53:59 +03:00
Julien Grall ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Henri Hennebert 71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Dmitry Chagin 50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin 1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Alexander V. Chernikov 3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00