FreeBSD's boot times have decreased to the point where vm_page array
initialization represents a significant fraction of the total boot time.
For example, when booting FreeBSD in Firecracker (a VMM designed to
support lightweight VMs) with 128MB and 1GB of RAM, vm_page
initialization consumes 9% (3ms) and 37% (21.5ms) of the kernel boot
time, respectively. This is generally relevant in cloud environments,
where one wants to be able to spin up VMs as quickly as possible.
This patch implements lazy initialization of (most) page structures,
following a suggestion from cperciva@. The idea is to introduce a new
free pool, VM_FREEPOOL_LAZYINIT, into which all vm_page structures are
initially placed. For this to work, we need only initialize the first
free page of each chunk placed into the buddy allocator. Then, early
page allocations draw from the lazy init pool and initialize vm_page
chunks (up to 16MB, 4096 pages) on demand. Once APs are started, an
idle-priority thread drains the lazy init pool in the background to
avoid introducing extra latency in the allocator. With this scheme,
almost all of the initialization work is moved out of the critical path.
A couple of vm_phys operations require the pool to be drained before
they can run: vm_phys_find_range() and vm_phys_unfree_page(). However,
these are rare operations. I believe that
vm_phys_find_freelist_contig() does not require any special treatment,
as it only ever accesses the first page in a power-of-2-sized free page
chunk, which is always initialized.
For now the new pool is only used on amd64 and arm64, since that's where
I can easily test and those platforms would get the most benefit.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D40403
This is useful for a subsequent patch which implements lazy
initialization of vm_page structures using a dedicate vm_phys free page
pool.
No functional change intended.
Reviewed by: alc, kib, emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40399
Add a method rangeset_next to find the first range that starts at or
after a given value. Use it to rewrite pmap_pkru_same and
pmap_bti_same to avoid walking a page at a time over pages in no
range.
Reviewed by: andrew, kib
Differential Revision: https://reviews.freebsd.org/D45511
Make the tlb shootdown function as a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). It allows
other implemenations to overwrite in the future.
Reviewed by: kib
Tested by: whu
Authored-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D45174
A function called mask_width in one place and log2 in the other
calculates its value in a more complex way than necessary. A simpler
implementation offered here saves a few bytes in the functions that
call it.
Reviewed by: alc, avg
Differential Revision: https://reviews.freebsd.org/D45483
Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter(). Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter(). Promotion will fail
because the base pages have differing protection.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45431
MFC after: 1 week
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Elimininate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.
Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision: https://reviews.freebsd.org/D45170
Differential Revision: https://reviews.freebsd.org/D45235
I cannot find a time where the function was not named this.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstraping the kernel to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
We previously defaulted to using sc(4) with a special case to prefer
vt(4) when booted via UEFI. As vt(4) is now always the default we can
simplify this.
Reviewed by: imp, kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45356
Previously, it was necessary to set WITHOUT_INET_SUPPORT when building
the kernel without INET, and WITHOUT_INET6_SUPPORT when building the
kernel without INET6, or else the modules build would fail. The
LINT-NOINET and LINT-NOINET6 configs did this using makeoptions.
After recent changes, this is no longer required, so remove these
makeoptions. This avoids masking potential future build issues when
these aren't set.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1255
FIRECRACKER is not a legacy config, so remove the really old FreeBSD
versions from it. MINIMAL has a similar history, and limited target
audience which has little to no overlap with really old binaries. Either
of these is really easy to get additional binary compat with the include
directive, so balance things better. Leave GENERIC alone.
PR: 231768
Signed-off-by: Henrich Hartzer <henrichhartzer@tuta.io>
Reviewed by: imp (MINIMAL), cperciva (FIRECRACKER)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1228
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.
Reviewed by: kib, emaste
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D45142
Most of the code in vmm_dev.c and vmm.c can and should be shared between
amd64 and arm64 (and eventually riscv) rather than being duplicated. To
the end of adding a shared implementation in sys/dev/vmm, this patch
eliminates most of the differences between the two copies of vmm_dev.c.
- Remove an unneeded cdefs.h include.
- Simplify the amd64 implementation of vcpu_unlock_one().
- Simplify the arm64 implementation of vcpu_lock_one().
- Pass buffer sizes to alloc_memseg() and get_memseg() on arm64. On
amd64 this is needed for compat ioctls, but these functions should be
merged.
- Make devmem_mmap_single() stricter on arm64.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D44995
Until the boot loader automatically loads these things (including the
CAM dependency), we need to have them in the minimal kernel since they
are needed to boot. These aren't strictly required to be in the kernel,
since modules work, but are high enough demand items that until we sort
out boot loader automation, I'm adding them here. These devices are also
common in vm environments. The delta is relatively small in size. Once
the boot loader automation arrives, these and a lot of other things can
be trimmed. It's less than ideal, but is a good middle ground for the
moment.
Sponsored by: Netflix
Reviewed by: kevans, emaste
Differential Revision: https://reviews.freebsd.org/D45012
Since config(8) searches sys/conf by default, there's no need to specify
the full relative path here; replace it by the filename alone.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
The new sys/conf/std.debug contains the list of debugging options
enabled by default in -CURRENT, so they don't need to be listed
individually in every kernel config.
The enabled options are the set of all debug options which were enabled
for the GENERIC kernel on any platform. This means some architectures
now have debugging options enabled in GENERIC which weren't previously
enabled:
- amd64: [1]
- arm64: [2]
- arm: [2]. [3]
- i386: [1], [2]
- powerpc: [1], [2], [3]
- riscv: [2]
[1] ALT_BREAK_TO_DEBUGGER is now enabled.
[2] BUF_TRACKING, FULL_BUF_TRACKING, and QUEUE_MACRO_DEBUG_TRASH are now
enabled.
[3] DEADLKRES is now enabled.
While here, move the documentation for the (commented out) K*SAN options
for amd64 from GENERIC to NOTES.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
In sysproto.h, stop including sys/acl.h as syscall defintions now use
__acl* types from sys/_types.h. Add sys/types.h to provide types
previously provided by sys/param.h (via sys/acl.h).
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44467
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.
sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465
While here, reorder some of the entries using headers more aligned
with sys/conf/NOTES. Also add a pointer from the amd64/i386 NOTES
files to x86 NOTES.
The "extra" ACPI device drivers were only present in i386 NOTES
previously.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44787
Require both "efirt" and "efidev" in order to build in efidev
Require both "efirt" and "efirtc" in order to build in efirtc
Update FIRECRACKER, GENERIC, and NOTES for amd64
Update NOTES and std.arm for arm64
Reviewed by: imp
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44745
While here, adjust the sample setting for NVME_USE_NVD to use a
non-default setting as is typical in entries in NOTES.
Discussed with: imp
Reviewed by: manu
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44691
Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40678
In the next commit I will update syscalls.master to use struct __siginfo
(which actually exists) so this update will be needed to make
generated files (from make sysent) align.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44380
This callback shouldn't be modifying any of the arguments.
Reviewed by: imp, kib, emaste, jhb
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D44193