9249 commits

Author SHA1 Message Date
Doug Moore 5dbf886104 x86: use order_base_2
Use order_base_2 in place of expressions involving fls.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D45536
2024-06-24 02:26:23 -05:00
Ryan Libby 6095f4b04c amd64 kernel __storeload_barrier: quiet gcc -Warray-bounds
Use a constant input operand instead of an output operand to tell the
compiler about OFFSETOF_MONITORBUF.  If we tell it we are writing to
*(u_int *)OFFSETOF_MONITORBUF, it rightly complains, but we aren't.  The
memory clobber already covers the necessary semantics for the compiler.
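A minimal userland sketch of the operand choice described above, assuming x86-64 and GCC/Clang extended asm. The offset is passed as an "i" constant input operand and printed with %c0, so no output operand claims a write to a fixed address. The "memory" clobber supplies the compiler-level ordering. BARRIER_OFFSET is a hypothetical stand-in for OFFSETOF_MONITORBUF: it is a stack-relative offset, not the kernel's actual target address. publish_then_check() is a hypothetical Dekker-style caller. Other architectures fall back to __atomic_thread_fence().

```c
#include <assert.h>

/*
 * Hypothetical stand-in for OFFSETOF_MONITORBUF: a small negative offset
 * from the stack pointer (inside the x86-64 SysV red zone).
 */
#define BARRIER_OFFSET	(-8)

static inline void
storeload_barrier(void)
{
#if defined(__x86_64__)
	/*
	 * A locked add of zero acts as a full fence on x86.  The offset is
	 * a constant *input* operand ("i"); %c0 prints it without the '$'
	 * prefix, producing "lock; addl $0,-8(%rsp)".  No output operand
	 * tells the compiler that a fixed address is written.  The "memory"
	 * clobber provides compiler ordering; "cc" covers the flags addl sets.
	 */
	__asm__ __volatile__("lock; addl $0,%c0(%%rsp)"
	    : : "i" (BARRIER_OFFSET) : "memory", "cc");
#else
	/* Portable fallback for non-x86 targets. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical Dekker-style use: publish our flag, then read theirs. */
static int
publish_then_check(volatile int *mine, volatile int *theirs)
{
	*mine = 1;
	storeload_barrier();	/* the store must be visible before the load */
	return (*theirs);
}
```

The stored value is unchanged by the locked add of zero, so only the ordering effect remains.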

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45694
2024-06-23 16:23:14 -07:00
Mark Johnston ddf0ed09bd sdt: Implement SDT probes using hot-patching
The idea here is to avoid a memory access and conditional branch per
probe site.  Instead, the probe is represented by an "unreachable"
unconditional function call.  asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record.  Each SDT probe carries a list
of tracepoints.

When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label.  The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general.  The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.

Per gallatin@ in D43504, this approach has less overhead when probes are
disabled.  To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case.  It could be re-added later if need be.

The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
   names.  The SDT macros let the programmer specify the function and
   module names, but this is really a bug and shouldn't have been
   allowed.  The intent was to be able to have the same probe in
   multiple functions and to let the user restrict which probes actually
   get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
   to include blocks of code in the out-of-line path.  For example:

	if (SDT_PROBES_ENABLED()) {
		int reason = CLD_EXITED;

		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
		SDT_PROBE1(proc, , , exit, reason);
	}

could be written as

	SDT_PROBE1_EXT(proc, , , exit, reason,
		int reason;

		reason = CLD_EXITED;
		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
	);

In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.

Reviewed by:	Domagoj Stolfa, avg
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D44483
2024-06-19 16:57:41 -04:00
Mark Johnston 46bb2dca53 kasan: Increase the size of the bootstrap PTP reservation
We were undercounting in the case where the boot stack crosses a 2MB
boundary, resulting in a panic during locore execution.
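The undercount is easiest to see with a little arithmetic. This is a minimal sketch, assuming one page-table page per 2MB region. NBPDR is defined locally using the amd64 name for that size. The number of page-table pages a range needs depends on how many 2MB regions it touches, not only on its length. ptps_for_range() is hypothetical and is not the kasan reservation code.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes mapped by one last-level page-table page (2MB), defined locally. */
#define NBPDR	(UINT64_C(1) << 21)

/*
 * Hypothetical helper: count the 2MB regions, and thus page-table pages,
 * touched by the half-open range [start, start + len).  Requires len > 0.
 */
static uint64_t
ptps_for_range(uint64_t start, uint64_t len)
{
	assert(len > 0);
	uint64_t first = start / NBPDR;			/* region of first byte */
	uint64_t last = (start + len - 1) / NBPDR;	/* region of last byte */
	return (last - first + 1);
}
```

For a 16KB range, an estimate based on size alone (rounding len up to a multiple of 2MB) always gives one page. However, ptps_for_range(0x3fe000, 0x4000) returns 2 because that range straddles the 4MB boundary.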

MFC after:	1 week
Fixes:	756bc3adc5 ("kasan: Create a shadow for the bootstack prior to hammer_time()")
2024-06-16 13:33:13 -04:00
Mark Johnston 4441dd4094 vm_phys: Fix a typo
Fixes:	b16b4c22d2 ("vm_page: Implement lazy page initialization")
Reported by:	Steffen Nurpmeso <steffen@sdaoden.eu>
2024-06-16 13:33:00 -04:00
Bojan Novković b53b21e8f8 amd64 pmap: Release PTP reference on leaf ptpage allocation failure
aa3bcaa fixed an edge case involving mlock() and superpage creation
by creating and inserting a leaf pagetable page for mlock'd superpages.
However, the code does not properly release the reference to the
pagetable page in the error handling path.
This commit fixes the issue by adding calls to 'pmap_abort_ptp'
in the error handling path.

Reported by: alc
Approved by: markj (mentor)
Fixes: aa3bcaa
Differential Revision: https://reviews.freebsd.org/D45577
2024-06-16 18:19:26 +02:00
Mark Johnston aede0d3bad amd64/vmm: Make vmm.h more self-contained
CTASSERT is defined in kassert.h, so include that here.  No functional
change intended.

MFC after:	1 week
2024-06-13 21:19:00 -04:00
Mark Johnston b16b4c22d2 vm_page: Implement lazy page initialization
FreeBSD's boot times have decreased to the point where vm_page array
initialization represents a significant fraction of the total boot time.
For example, when booting FreeBSD in Firecracker (a VMM designed to
support lightweight VMs) with 128MB and 1GB of RAM, vm_page
initialization consumes 9% (3ms) and 37% (21.5ms) of the kernel boot
time, respectively.  This is generally relevant in cloud environments,
where one wants to be able to spin up VMs as quickly as possible.

This patch implements lazy initialization of (most) page structures,
following a suggestion from cperciva@.  The idea is to introduce a new
free pool, VM_FREEPOOL_LAZYINIT, into which all vm_page structures are
initially placed.  For this to work, we need only initialize the first
free page of each chunk placed into the buddy allocator.  Then, early
page allocations draw from the lazy init pool and initialize vm_page
chunks (up to 16MB, 4096 pages) on demand.  Once APs are started, an
idle-priority thread drains the lazy init pool in the background to
avoid introducing extra latency in the allocator.  With this scheme,
almost all of the initialization work is moved out of the critical path.
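This is a deliberately simplified, single-threaded sketch of the on-demand idea. It assumes a flat array of page structures instead of free pools and the buddy allocator. A chunk of up to 4096 entries is initialized the first time any page in it is requested. A drain routine, standing in for the idle-priority thread, initializes whatever is left. All names here are hypothetical, and vm_phys's real VM_FREEPOOL_LAZYINIT handling is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES		10000
#define CHUNK_PAGES	4096	/* up to 16MB of 4KB pages per chunk */
#define NCHUNKS		((NPAGES + CHUNK_PAGES - 1) / CHUNK_PAGES)

/* Hypothetical stand-in for struct vm_page. */
struct page {
	size_t	pindex;
	bool	valid;
};

static struct page pages[NPAGES];
static bool chunk_ready[NCHUNKS];
static size_t nchunks_initialized;

/* Initialize one chunk of page structures, at most once. */
static void
init_chunk(size_t chunk)
{
	if (chunk_ready[chunk])
		return;
	size_t start = chunk * CHUNK_PAGES;
	size_t end = start + CHUNK_PAGES;
	if (end > NPAGES)
		end = NPAGES;
	for (size_t i = start; i < end; i++) {
		pages[i].pindex = i;
		pages[i].valid = true;
	}
	chunk_ready[chunk] = true;
	nchunks_initialized++;
}

/* Early allocation path: pay only for the chunk that is touched. */
static struct page *
lazy_get_page(size_t idx)
{
	assert(idx < NPAGES);
	init_chunk(idx / CHUNK_PAGES);
	return (&pages[idx]);
}

/* Background drain: initialize every chunk not yet touched. */
static void
drain_lazy_pool(void)
{
	for (size_t c = 0; c < NCHUNKS; c++)
		init_chunk(c);
}
```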

A couple of vm_phys operations require the pool to be drained before
they can run: vm_phys_find_range() and vm_phys_unfree_page().  However,
these are rare operations.  I believe that
vm_phys_find_freelist_contig() does not require any special treatment,
as it only ever accesses the first page in a power-of-2-sized free page
chunk, which is always initialized.

For now the new pool is only used on amd64 and arm64, since that's where
I can easily test and those platforms would get the most benefit.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D40403
2024-06-13 21:19:00 -04:00
Mark Johnston 69ccea1c89 vm_page: Let vm_page_init_page() take a pool parameter
This is useful for a subsequent patch which implements lazy
initialization of vm_page structures using a dedicated vm_phys free page
pool.

No functional change intended.

Reviewed by:	alc, kib, emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40399
2024-06-13 21:18:59 -04:00
Doug Moore 2c10bacdf4 rangeset: add next() iteration
Add a method rangeset_next to find the first range that starts at or
after a given value. Use it to rewrite pmap_pkru_same and
pmap_bti_same to avoid walking a page at a time over pages in no
range.
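This sketch shows the next() operation and the gap-skipping walk it enables. It assumes the ranges are kept in a sorted array of non-overlapping, non-empty half-open intervals. range_next() is a hypothetical analogue of rangeset_next(), not its implementation, and the real rangeset uses different storage. Each iteration jumps from the end of one range straight to the start of the next, instead of stepping a page at a time through addresses that belong to no range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type: [start, end), with end > start. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* First range whose start is >= addr, or NULL (binary search). */
static const struct range *
range_next(const struct range *r, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (r[mid].start < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n ? &r[lo] : NULL);
}

/* Visit ranges starting in [sva, eva), skipping the gaps between them. */
static size_t
count_ranges_starting_in(const struct range *r, size_t n, uint64_t sva,
    uint64_t eva)
{
	size_t count = 0;

	for (const struct range *p = range_next(r, n, sva);
	    p != NULL && p->start < eva; p = range_next(r, n, p->end))
		count++;
	return (count);
}
```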

Reviewed by:	andrew, kib
Differential Revision:	https://reviews.freebsd.org/D45511
2024-06-06 13:42:31 -05:00
Konstantin Belousov 9c5d7e4a0c pmap: move the smp_targeted_tlb_shootdown pointer stuff to amd64 pmap.h
Fixes:	bec000c9c1ef409989685bb03ff0532907befb4a
Sponsored by:	The FreeBSD Foundation
2024-06-06 08:15:08 +03:00
Souradeep Chakrabarti bec000c9c1 amd64: add a func pointer to tlb shootdown function
Make the TLB shootdown function a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). This allows
other implementations to override it in the future.
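This sketch shows the pattern with hypothetical names and a simplified signature. The real smp_targeted_tlb_shootdown() takes different arguments. Callers always invoke the pointer, which is statically initialized to the default implementation. An alternative implementation can reassign it during startup.

```c
#include <assert.h>

/* Hypothetical, simplified signature. */
typedef void (*tlb_shootdown_fn)(unsigned long va);

static int default_calls, alt_calls;

/* Stand-in for the system (IPI-based) shootdown. */
static void
default_tlb_shootdown(unsigned long va)
{
	(void)va;
	default_calls++;
}

/* Callers go through this pointer; it starts at the default. */
static tlb_shootdown_fn tlb_shootdown = default_tlb_shootdown;

/* Hypothetical alternative implementation installed at startup. */
static void
alt_tlb_shootdown(unsigned long va)
{
	(void)va;
	alt_calls++;
}
```

Because the default is the static initializer, nothing changes on systems that never install an override.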

Reviewed by:	kib
Tested by:	whu
Authored-by:    Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D45174
2024-06-05 12:25:05 +00:00
Doug Moore 9ff1462976 x86: simplify ceil(log2(x)) function
A function called mask_width in one place and log2 in the other
calculates its value in a more complex way than necessary. A simpler
implementation offered here saves a few bytes in the functions that
call it.
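The commit does not show either version, so this is only a sketch of two ways to compute ceil(log2(x)). The first is a loop that searches for the smallest power of two not less than x. The second is a closed form that takes the bit width of x - 1 using the GCC/Clang builtin __builtin_clz(). Both function names are hypothetical, and the loop is not the removed mask_width code. The loop form is valid only for x <= 2^31 when unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical loop form: smallest w with (1u << w) >= x. */
static int
ceil_log2_loop(unsigned int x)
{
	int w = 0;

	while ((1u << w) < x)
		w++;
	return (w);
}

/*
 * Hypothetical closed form: for x > 1, ceil(log2(x)) equals the number
 * of significant bits in x - 1.  x <= 1 maps to 0, which also avoids
 * the undefined __builtin_clz(0).
 */
static int
ceil_log2(unsigned int x)
{
	const int bits = (int)(sizeof(unsigned int) * CHAR_BIT);

	return (x <= 1 ? 0 : bits - __builtin_clz(x - 1));
}
```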

Reviewed by:	alc, avg
Differential Revision:	https://reviews.freebsd.org/D45483
2024-06-04 13:00:25 -05:00
Alan Cox f1d73aacdc pmap: Skip some superpage promotion attempts that will fail
Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter().  Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter().  Promotion will fail
because the base pages have differing protection.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45431
MFC after:	1 week
2024-06-04 00:38:05 -05:00
Doug Moore b0056b31e9 libkern: add ilog2 macro
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Eliminate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
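This is a minimal sketch of a type-generic ilog2. It assumes C11 _Generic and the GCC/Clang __builtin_clz family, and libkern's actual macro may be written differently. Dispatching on the argument type keeps 64-bit arguments from being truncated. The assert() stands in for the zero check described for GENERIC kernels, since log2(0) is undefined and so is __builtin_clz(0).

```c
#include <assert.h>
#include <limits.h>

/* floor(log2(x)) for x != 0, one helper per width (hypothetical names). */
static inline int
ilog2_int(unsigned int x)
{
	assert(x != 0);
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clz(x));
}

static inline int
ilog2_long(unsigned long x)
{
	assert(x != 0);
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clzl(x));
}

static inline int
ilog2_llong(unsigned long long x)
{
	assert(x != 0);
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clzll(x));
}

/* Select the helper by argument type; narrower types use ilog2_int. */
#define ilog2(x) _Generic((x),			\
	unsigned long long: ilog2_llong,	\
	long long: ilog2_llong,			\
	unsigned long: ilog2_long,		\
	long: ilog2_long,			\
	default: ilog2_int)(x)
```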

Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.

Reviewed by:	alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision:	https://reviews.freebsd.org/D45170
Differential Revision:	https://reviews.freebsd.org/D45235
2024-06-03 11:37:55 -05:00
Mitchell Horne deab57178f Adjust comments referencing vm_mem_init()
I cannot find a time where the function was not named this.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45383
2024-05-27 18:37:40 -03:00
Bojan Novković 0a44b8a56d vm: Simplify startup page dumping conditional
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstrapping the kernel to minidumps.

Reviewed by:	markj, mhorne
Approved by:	markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
2024-05-25 19:24:55 +02:00
Bojan Novković da76d349b6 uma: Deduplicate uma_small_alloc
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.

The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
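This sketch shows the override pattern with a simplified signature. The machine-independent default is compiled only when the architecture has not defined UMA_MD_SMALL_ALLOC. small_alloc() and its malloc() body are hypothetical stand-ins for uma_small_alloc() and its direct-map allocation.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * An architecture needing special handling would define this and
 * provide its own small_alloc() in machine-dependent code.
 */
/* #define UMA_MD_SMALL_ALLOC */

#ifndef UMA_MD_SMALL_ALLOC
/* Hypothetical machine-independent default used by most platforms. */
static void *
small_alloc(size_t bytes)
{
	/* Stand-in for allocating a page and returning its direct-map address. */
	return (malloc(bytes));
}
#endif
```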

Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45084
2024-05-25 19:24:46 +02:00
Ed Maste 9b1de7e484 vt/sc: retire logic to select vt(4) by default for UEFI boot
We previously defaulted to using sc(4) with a special case to prefer
vt(4) when booted via UEFI.  As vt(4) is now always the default we can
simplify this.

Reviewed by:	imp, kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45356
2024-05-25 11:00:35 -04:00
Lexi Winter bfd248f59d sys/amd64/conf/LINT-NOINET{6,}: don't set WITHOUT_INET{6,}_SUPPORT
Previously, it was necessary to set WITHOUT_INET_SUPPORT when building
the kernel without INET, and WITHOUT_INET6_SUPPORT when building the
kernel without INET6, or else the modules build would fail.  The
LINT-NOINET and LINT-NOINET6 configs did this using makeoptions.

After recent changes, this is no longer required, so remove these
makeoptions.  This avoids masking potential future build issues when
these aren't set.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1255
2024-05-24 22:21:25 -06:00
Henrich Hartzer 87bf0aaba8 Remove COMPAT_FREEBSD4/5/6/7/9 from MINIMAL and FIRECRACKER kernel configurations
FIRECRACKER is not a legacy config, so remove compatibility with really
old FreeBSD versions from it. MINIMAL has a similar history and a
limited target audience, which has little to no overlap with really old
binaries. Additional binary compatibility is easy to add to either of
these with the include directive, so this strikes a better balance.
Leave GENERIC alone.

PR: 231768
Signed-off-by: Henrich Hartzer <henrichhartzer@tuta.io>
Reviewed by: imp (MINIMAL), cperciva (FIRECRACKER)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1228
2024-05-23 14:30:57 -06:00
Warner Losh bedbaee805 syscalls: Regen for Linux emulator additions
2024-05-23 13:40:47 -06:00
Ricardo Branco 97add684f5 linux: Support POSIX message queues
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco 427db2c45e linux: Fix linux_mq_notify_args & linux_timer_create_args
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
John Baldwin 473c90ac04 uio: Use switch statements when handling UIO_READ vs UIO_WRITE
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.
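This sketch shows the switch style using a simplified enum uio_rw and a hypothetical move_data() helper rather than the real uiomove(), which works on struct uio and iovecs. Each direction gets its own case, so a port that adds constants to the enum, as CheriBSD does, adds a case instead of restructuring an if/else. The copy directions follow uiomove(): UIO_READ moves data from the kernel buffer to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's enum uio_rw. */
enum uio_rw { UIO_READ, UIO_WRITE };

/* Hypothetical helper: copy len bytes in the direction given by rw. */
static int
move_data(void *kbuf, void *ubuf, size_t len, enum uio_rw rw)
{
	switch (rw) {
	case UIO_READ:
		memcpy(ubuf, kbuf, len);	/* kernel -> caller */
		break;
	case UIO_WRITE:
		memcpy(kbuf, ubuf, len);	/* caller -> kernel */
		break;
	default:
		return (-1);			/* unknown direction */
	}
	return (0);
}
```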

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D45142
2024-05-10 13:43:36 -07:00
Mark Johnston e3333648b7 vmm: Start reconciling amd64 and arm64 copies of vmm_dev.c
Most of the code in vmm_dev.c and vmm.c can and should be shared between
amd64 and arm64 (and eventually riscv) rather than being duplicated.  To
the end of adding a shared implementation in sys/dev/vmm, this patch
eliminates most of the differences between the two copies of vmm_dev.c.

- Remove an unneeded cdefs.h include.
- Simplify the amd64 implementation of vcpu_unlock_one().
- Simplify the arm64 implementation of vcpu_lock_one().
- Pass buffer sizes to alloc_memseg() and get_memseg() on arm64.  On
  amd64 this is needed for compat ioctls, but these functions should be
  merged.
- Make devmem_mmap_single() stricter on arm64.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D44995
2024-05-08 12:11:03 -04:00
Warner Losh 04ea5e9f84 MINIMAL: Grow minimal to support ata, scsi and nvme
Until the boot loader automatically loads these things (including the
CAM dependency), we need to have them in the minimal kernel since they
are needed to boot. These aren't strictly required to be in the kernel,
since modules work, but are high enough demand items that until we sort
out boot loader automation, I'm adding them here. These devices are also
common in vm environments. The delta is relatively small in size. Once
the boot loader automation arrives, these and a lot of other things can
be trimmed. It's less than ideal, but is a good middle ground for the
moment.

Sponsored by:		Netflix
Reviewed by:		kevans, emaste
Differential Revision:	https://reviews.freebsd.org/D45012
2024-05-03 09:08:03 -06:00
Lexi Winter 8a8daeafaf sys/*/conf: do not use "../../conf/" when including std.*
Since config(8) searches sys/conf by default, there's no need to specify
the full relative path here; replace it by the filename alone.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
2024-04-23 15:13:31 -06:00
Lexi Winter 4f8f9d708e sys: add conf/std.debug, generic debugging options
The new sys/conf/std.debug contains the list of debugging options
enabled by default in -CURRENT, so they don't need to be listed
individually in every kernel config.

The enabled options are the set of all debug options which were enabled
for the GENERIC kernel on any platform.  This means some architectures
now have debugging options enabled in GENERIC which weren't previously
enabled:

- amd64: [1]
- arm64: [2]
- arm: [2], [3]
- i386: [1], [2]
- powerpc: [1], [2], [3]
- riscv: [2]

[1] ALT_BREAK_TO_DEBUGGER is now enabled.
[2] BUF_TRACKING, FULL_BUF_TRACKING, and QUEUE_MACRO_DEBUG_TRASH are now
    enabled.
[3] DEADLKRES is now enabled.

While here, move the documentation for the (commented out) K*SAN options
for amd64 from GENERIC to NOTES.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
2024-04-23 15:13:31 -06:00
Gordon Bergling 8b5c5cae92 vmm(4): Fix a typo in a kernel message
- s/cant/can't/

MFC after:	1 week
2024-04-21 09:44:18 +02:00
Brooks Davis 5d88a2aacf sysproto.h: sys/acl.h -> sys/types.h
In sysproto.h, stop including sys/acl.h as syscall definitions now use
__acl* types from sys/_types.h.  Add sys/types.h to provide types
previously provided by sys/param.h (via sys/acl.h).

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44467
2024-04-15 21:35:41 +01:00
Brooks Davis 6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h, which includes
sys/param.h, sys/queue.h, and vm/uma.h, which in turn bring in
sys/errno.h and sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
John Baldwin 1f38677ba4 x86 NOTES: Move shared options from amd64/i386 NOTES to x86 NOTES
While here, reorder some of the entries using headers more aligned
with sys/conf/NOTES.  Also add a pointer from the amd64/i386 NOTES
files to x86 NOTES.

The "extra" ACPI device drivers were only present in i386 NOTES
previously.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44787
2024-04-13 19:12:07 -07:00
John Baldwin 5ea0b89242 NOTES: Move ENABLE_ALART option to MI NOTES next to intpm device
This option is for this driver.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44786
2024-04-13 19:11:49 -07:00
John Baldwin b620daf633 x86 NOTES: Move NKPT and PMAP_SHPGPERPROC options to VM OPTIONS section
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44785
2024-04-13 19:11:21 -07:00
John Baldwin 717b22e18c x86 NOTES: Remove some obsolete comments
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44784
2024-04-13 19:11:06 -07:00
John Baldwin 1f678b6ba2 NOTES: Move the VirtIO entries to the MI NOTES file
While here, add virtio_gpu.

Reviewed by:	imp, emaste
Differential Revision:	https://reviews.freebsd.org/D44782
2024-04-13 19:10:27 -07:00
John Baldwin ff3569be6f NOTES: Move safe(4) to the MI NOTES file
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44780
2024-04-13 19:09:57 -07:00
John Baldwin 9c3fd2c1c7 NOTES: Move IEEE80211_DEBUG_REFCNT to the MI NOTES file
This option is not specific to amd64.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44779
2024-04-13 19:09:38 -07:00
Stephen J. Kiernan bfd2ce2a5a efidev: Allow for optionally including efidev and efirtc into the kernel
Require both "efirt" and "efidev" in order to build in efidev
Require both "efirt" and "efirtc" in order to build in efirtc

Update FIRECRACKER, GENERIC, and NOTES for amd64
Update NOTES and std.arm for arm64

Reviewed by:	imp
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44745
2024-04-12 13:30:32 -04:00
Elyes Haouas ef764e4801 vhpet: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:32 -06:00
Elyes Haouas 8d66b134f3 vmm/x86: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:30 -06:00
Elyes Haouas 33afe704bf sigtramp: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:27 -06:00
Elyes Haouas 8551c31b2e exception: Fix typos
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:25 -06:00
Elyes Haouas ca4ceadbe5 minidump_machdep: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:21 -06:00
Elyes Haouas b8d29d68c4 pmap: Fix typos
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:19 -06:00
Elyes Haouas 1eedb4e592 vmm: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:16 -06:00
Elyes Haouas 73bb5aea88 atomic: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:13 -06:00
Elyes Haouas f6df79ab8d msan: Fix typo
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
2024-04-11 11:28:09 -06:00
John Baldwin 8f7105a206 NOTES: Move NVMe entries to MI file
While here, adjust the sample setting for NVME_USE_NVD to use a
non-default setting as is typical in entries in NOTES.

Discussed with:	imp
Reviewed by:	manu
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44691
2024-04-09 15:02:58 -07:00