Use a constant input operand instead of an output operand to tell the
compiler about OFFSETOF_MONITORBUF. If we tell it we are writing to
*(u_int *)OFFSETOF_MONITORBUF, it rightly complains, but we aren't. The
memory clobber already covers the necessary semantics for the compiler.
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D45694
The idea here is to avoid a memory access and conditional branch per
probe site. Instead, the probe is represented by an "unreachable"
unconditional function call. asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record. Each SDT probe carries a list
of tracepoints.
When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label. The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general. The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.
Per gallatin@ in D43504, this approach has less overhead when probes are
disabled. To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case. It could be re-added later if need be.
The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
names. The SDT macros let the programmer specify the function and
module names, but this is really a bug and shouldn't have been
allowed. The intent was to be able to have the same probe in
multiple functions and to let the user restrict which probes actually
get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
to include blocks of code in the out-of-line path. For example:
if (SDT_PROBES_ENABLED()) {
int reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
SDT_PROBE1(proc, , , exit, reason);
}
could be written
SDT_PROBE1_EXT(proc, , , exit, reason,
int reason;
reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
);
In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D44483
We were undercounting in the case where the boot stack crosses a 2MB
boundary, resulting in a panic during locore execution.
MFC after: 1 week
Fixes: 756bc3adc5 ("kasan: Create a shadow for the bootstack prior to hammer_time()")
aa3bcaa fixed an edge case invloving mlock() and superpage creation
by creating and inserting a leaf pagetable page for mlock'd superpages.
However, the code does not properly release the reference to the
pagetable page in the error handling path.
This commit fixes the issue by adding calls to 'pmap_abort_ptp'
in the error handling path.
Reported by: alc
Approved by: markj (mentor)
Fixes: aa3bcaa
Differential Revision: https://reviews.freebsd.org/D45577
FreeBSD's boot times have decreased to the point where vm_page array
initialization represents a significant fraction of the total boot time.
For example, when booting FreeBSD in Firecracker (a VMM designed to
support lightweight VMs) with 128MB and 1GB of RAM, vm_page
initialization consumes 9% (3ms) and 37% (21.5ms) of the kernel boot
time, respectively. This is generally relevant in cloud environments,
where one wants to be able to spin up VMs as quickly as possible.
This patch implements lazy initialization of (most) page structures,
following a suggestion from cperciva@. The idea is to introduce a new
free pool, VM_FREEPOOL_LAZYINIT, into which all vm_page structures are
initially placed. For this to work, we need only initialize the first
free page of each chunk placed into the buddy allocator. Then, early
page allocations draw from the lazy init pool and initialize vm_page
chunks (up to 16MB, 4096 pages) on demand. Once APs are started, an
idle-priority thread drains the lazy init pool in the background to
avoid introducing extra latency in the allocator. With this scheme,
almost all of the initialization work is moved out of the critical path.
A couple of vm_phys operations require the pool to be drained before
they can run: vm_phys_find_range() and vm_phys_unfree_page(). However,
these are rare operations. I believe that
vm_phys_find_freelist_contig() does not require any special treatment,
as it only ever accesses the first page in a power-of-2-sized free page
chunk, which is always initialized.
For now the new pool is only used on amd64 and arm64, since that's where
I can easily test and those platforms would get the most benefit.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D40403
This is useful for a subsequent patch which implements lazy
initialization of vm_page structures using a dedicate vm_phys free page
pool.
No functional change intended.
Reviewed by: alc, kib, emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40399
Add a method rangeset_next to find the first range that starts at or
after a given value. Use it to rewrite pmap_pkru_same and
pmap_bti_same to avoid walking a page at a time over pages in no
range.
Reviewed by: andrew, kib
Differential Revision: https://reviews.freebsd.org/D45511
Make the tlb shootdown function as a pointer. By default, it still
points to the system function smp_targeted_tlb_shootdown(). It allows
other implemenations to overwrite in the future.
Reviewed by: kib
Tested by: whu
Authored-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D45174
A function called mask_width in one place and log2 in the other
calculates its value in a more complex way than necessary. A simpler
implementation offered here saves a few bytes in the functions that
call it.
Reviewed by: alc, avg
Differential Revision: https://reviews.freebsd.org/D45483
Implement a simple heuristic to skip pointless promotion attempts by
pmap_enter_quick_locked() and moea64_enter(). Specifically, when
vm_fault() calls pmap_enter_quick() to map neighboring pages at the end
of a copy-on-write fault, there is no point in attempting promotion in
pmap_enter_quick_locked() and moea64_enter(). Promotion will fail
because the base pages have differing protection.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45431
MFC after: 1 week
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Elimininate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.
Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision: https://reviews.freebsd.org/D45170
Differential Revision: https://reviews.freebsd.org/D45235
I cannot find a time where the function was not named this.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstraping the kernel to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
We previously defaulted to using sc(4) with a special case to prefer
vt(4) when booted via UEFI. As vt(4) is now always the default we can
simplify this.
Reviewed by: imp, kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45356
Previously, it was necessary to set WITHOUT_INET_SUPPORT when building
the kernel without INET, and WITHOUT_INET6_SUPPORT when building the
kernel without INET6, or else the modules build would fail. The
LINT-NOINET and LINT-NOINET6 configs did this using makeoptions.
After recent changes, this is no longer required, so remove these
makeoptions. This avoids masking potential future build issues when
these aren't set.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1255
FIRECRACKER is not a legacy config, so remove the really old FreeBSD
versions from it. MINIMAL has a similar history, and limited target
audience which has little to no overlap with really old binaries. Either
of these is really easy to get additional binary compat with the include
directive, so balance things better. Leave GENERIC alone.
PR: 231768
Signed-off-by: Henrich Hartzer <henrichhartzer@tuta.io>
Reviewed by: imp (MINIMAL), cperciva (FIRECRACKER)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1228
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.
Reviewed by: kib, emaste
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D45142
Most of the code in vmm_dev.c and vmm.c can and should be shared between
amd64 and arm64 (and eventually riscv) rather than being duplicated. To
the end of adding a shared implementation in sys/dev/vmm, this patch
eliminates most of the differences between the two copies of vmm_dev.c.
- Remove an unneeded cdefs.h include.
- Simplify the amd64 implementation of vcpu_unlock_one().
- Simplify the arm64 implementation of vcpu_lock_one().
- Pass buffer sizes to alloc_memseg() and get_memseg() on arm64. On
amd64 this is needed for compat ioctls, but these functions should be
merged.
- Make devmem_mmap_single() stricter on arm64.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D44995
Until the boot loader automatically loads these things (including the
CAM dependency), we need to have them in the minimal kernel since they
are needed to boot. These aren't strictly required to be in the kernel,
since modules work, but are high enough demand items that until we sort
out boot loader automation, I'm adding them here. These devices are also
common in vm environments. The delta is relatively small in size. Once
the boot loader automation arrives, these and a lot of other things can
be trimmed. It's less than ideal, but is a good middle ground for the
moment.
Sponsored by: Netflix
Reviewed by: kevans, emaste
Differential Revision: https://reviews.freebsd.org/D45012
Since config(8) searches sys/conf by default, there's no need to specify
the full relative path here; replace it by the filename alone.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
The new sys/conf/std.debug contains the list of debugging options
enabled by default in -CURRENT, so they don't need to be listed
individually in every kernel config.
The enabled options are the set of all debug options which were enabled
for the GENERIC kernel on any platform. This means some architectures
now have debugging options enabled in GENERIC which weren't previously
enabled:
- amd64: [1]
- arm64: [2]
- arm: [2]. [3]
- i386: [1], [2]
- powerpc: [1], [2], [3]
- riscv: [2]
[1] ALT_BREAK_TO_DEBUGGER is now enabled.
[2] BUF_TRACKING, FULL_BUF_TRACKING, and QUEUE_MACRO_DEBUG_TRASH are now
enabled.
[3] DEADLKRES is now enabled.
While here, move the documentation for the (commented out) K*SAN options
for amd64 from GENERIC to NOTES.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1124
In sysproto.h, stop including sys/acl.h as syscall defintions now use
__acl* types from sys/_types.h. Add sys/types.h to provide types
previously provided by sys/param.h (via sys/acl.h).
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44467
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.
sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465
While here, reorder some of the entries using headers more aligned
with sys/conf/NOTES. Also add a pointer from the amd64/i386 NOTES
files to x86 NOTES.
The "extra" ACPI device drivers were only present in i386 NOTES
previously.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44787
Require both "efirt" and "efidev" in order to build in efidev
Require both "efirt" and "efirtc" in order to build in efirtc
Update FIRECRACKER, GENERIC, and NOTES for amd64
Update NOTES and std.arm for arm64
Reviewed by: imp
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44745
While here, adjust the sample setting for NVME_USE_NVD to use a
non-default setting as is typical in entries in NOTES.
Discussed with: imp
Reviewed by: manu
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44691