The idea here is to avoid a memory access and conditional branch per
probe site. Instead, the probe is represented by an "unreachable"
unconditional function call. asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record. Each SDT probe carries a list
of tracepoints.
When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label. The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general. The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.
Per gallatin@ in D43504, this approach has less overhead when probes are
disabled. To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case. It could be re-added later if need be.
The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
names. The SDT macros let the programmer specify the function and
module names, but this is really a bug and shouldn't have been
allowed. The intent was to be able to have the same probe in
multiple functions and to let the user restrict which probes actually
get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
to include blocks of code in the out-of-line path. For example:
if (SDT_PROBES_ENABLED()) {
int reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
SDT_PROBE1(proc, , , exit, reason);
}
could be written
SDT_PROBE1_EXT(proc, , , exit, reason,
int reason;
reason = CLD_EXITED;
if (WCOREDUMP(signo))
reason = CLD_DUMPED;
else if (WIFSIGNALED(signo))
reason = CLD_KILLED;
);
In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.
Reviewed by: Domagoj Stolfa, avg
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D44483
Most of vmm.h is machine-independent. Simplify merging amd64 and arm64
vmm code by removing this machine-dependent routine from arm64's vmm.h.
No functional change intended.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D45557
FreeBSD's boot times have decreased to the point where vm_page array
initialization represents a significant fraction of the total boot time.
For example, when booting FreeBSD in Firecracker (a VMM designed to
support lightweight VMs) with 128MB and 1GB of RAM, vm_page
initialization consumes 9% (3ms) and 37% (21.5ms) of the kernel boot
time, respectively. This is generally relevant in cloud environments,
where one wants to be able to spin up VMs as quickly as possible.
This patch implements lazy initialization of (most) page structures,
following a suggestion from cperciva@. The idea is to introduce a new
free pool, VM_FREEPOOL_LAZYINIT, into which all vm_page structures are
initially placed. For this to work, we need only initialize the first
free page of each chunk placed into the buddy allocator. Then, early
page allocations draw from the lazy init pool and initialize vm_page
chunks (up to 16MB, 4096 pages) on demand. Once APs are started, an
idle-priority thread drains the lazy init pool in the background to
avoid introducing extra latency in the allocator. With this scheme,
almost all of the initialization work is moved out of the critical path.
A couple of vm_phys operations require the pool to be drained before
they can run: vm_phys_find_range() and vm_phys_unfree_page(). However,
these are rare operations. I believe that
vm_phys_find_freelist_contig() does not require any special treatment,
as it only ever accesses the first page in a power-of-2-sized free page
chunk, which is always initialized.
For now the new pool is only used on amd64 and arm64, since that's where
I can easily test and those platforms would get the most benefit.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D40403
This will be used when we add SVE support to reduce the registers
needed to be saved on context switch.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43305
Adjust the mair_el1 macro indentation to be consistent with the
surrounding macros.
Reviewed by: emaste
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45524
This will be used to implement parts of bhyve's gdb stub.
Three VM capabilities are added, similar to amd64 without monitor mode.
Two cause breakpoint and single-step exceptions to be raised to EL2 and
then down to bhyve. One lets the gdb stub mask hardware interrupts
while single-stepping, since otherwise the guest will handle a timer
interrupt before executing the target instruction and thus fail
to make progress.
Reviewed by: bnovkov, andrew
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44739
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor conditional
for adding pages allocated when bootstraping the kernel to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
It is inherited from arm, where the global exists and is used. No
functional change.
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45323
No functional change.
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45322
On systems configured with 16KB pages, this change creates 1GB page
mappings in the direct map where possible. Previously, the largest page
size that was used to implement the direct map was 32MB. Similarly, on
systems configured with 4KB pages, this change creates 32MB page
mappings, instead of 2MB, in the direct map where 1GB is too large.
Implement demotion on L2C (32MB/1GB) page mappings within the DMAP.
Update sysctl vm.pmap.kernel_maps to report on L2C page mappings.
Reviewed by: markj
Tested by: gallatin, Eliot Solomon <ehs3@rice.edu>
Differential Revision: https://reviews.freebsd.org/D45224
Add the pointer authentication registers to armreg.h. These will be
used to support pointer authentication in a kernel built with GCC.
Reviewed by: jhb
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45262
While clang can handle numbers with a UL suffix in assembly files
gcc/gas is unable to. Switch to use the UL macro for TCR_EL1 defines as
some are used in locore.S
Reviewed by: brooks, jhb
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45261
CONTEXTIDR_EL1 is used in debug and trace features to identify the
current process or context.
Reviewed by: andrew
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45173
Bits [5:0] of PMBSR_MSS encodes either Buffer Status Code (BSC) or Fault
Status Code (FSC) depending on PMBSR_EC value.
Add PMBSR_MSS_{BSC,FSC} to cover this field.
Reviewed by: andrew
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45172
When the register is not defined in Armv8.0 i.e. added in a later
extension, like SPE added in v8.2, the alternative name format of:
S<op0>_<op1>_C<crn>_C<crm>_<op2>
should be used; otherwise, calls to {READ,WRITE}_SPECIALREG() will
fail.
Use the MRS_REG_ALT_NAME() macro for SPE changing hex to decimal as
required by the macro.
Reviewed by: andrew
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45171
These can be used even when the compiler is too old for the register
to be included.
Reviewed by: Zachary Leaf <zachary.leaf@arm.com>
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D45176
When checking if a physical address is in the DMAP region we assume
all physical addresses between DMAP_MIN_PHYSADDR and DMAP_MAX_PHYSADDR
are able to be accesses through the DMAP. It may be the case that
there is device memory in this range that shouldn't be accessed through
the DMAP mappings.
Add a check to PHYS_IN_DMAP that the translated virtual address is a
valid kernel address. To support code that already checks the address
is valid add PHYS_IN_DMAP_RANGE.
PR: 278233
Reviewed by: alc, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44677
As with watchpoints allow the kernel debugger to set hardware
breakpoints on arm64.
These have been tested to work in both the ddb and gdb backends.
Reviewed by: jhb (earlier version)
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D44355
Some SoCs do not include a PSCI for power management and defer it to
something else instead. Add a CPU reset hook to account for this, and
use it in the psci driver.
Reviewed by: andrew
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44535
The ATTR_CONTIGUOUS bit within an L3 page table entry designates that
L3 page as being part of an aligned, physically contiguous collection
of L3 pages. For example, 16 aligned, physically contiguous 4 KB pages
can form a 64 KB superpage, occupying a single TLB entry. While this
change only creates ATTR_CONTIGUOUS mappings in a few places,
specifically, the direct map and pmap_kenter{,_device}(), it adds all
of the necessary code for handling them once they exist, including
demotion, protection, and removal. Consequently, new ATTR_CONTIGUOUS
usage can be added (and tested) incrementally.
Modify the implementation of sysctl vm.pmap.kernel_maps so that it
correctly reports the number of ATTR_CONTIGUOUS mappings on machines
configured to use a 16 KB base page size, where an ATTR_CONTIGUOUS
mapping consists of 128 base pages.
Additionally, this change adds support for creating L2 superpage
mappings to pmap_kenter{,_device}().
Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737
Correctly configure the free page queues and the reservation size when
the base page size is 16KB. In particular, the reservation size was
less than the L2 Block size, making L2 promotions and mappings all but
impossible.
Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737
Rather than try to detect when vfp_save_state is called by savectx use
a separate function that sets up the pcb as needed.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43304
To support recent extensions to the Arm architecture we may need to
store more or larger registers when sending a signal.
To support this create a list of these extra registers. Userspace that
needs to access a register in the signal handler can then walk the list
to find the correct register struct and read/write its contents.
Reviewed by: kib, markj (earlier version)
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43302
In the next commit I will update syscalls.master to use struct __siginfo
(which actually exists) so this update will be needed to make
generated files (from make sysent) align.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44380
No functional change, but this reduces diffs with CheriBSD downstream.
Reviewed by: andrew
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44344
No functional change, but this reduces diffs with CheriBSD downstream.
Reviewed by: andrew
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D44342
This is used to enable the guard page when an elf binary is built with
BTI instructions.
Reviewed by: markj
Sponsored by: Arm Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39453
Add a rangeset to the arm64 pmap to describe which address space needs
the Branch Target Identification (BTI) Guard Page flag set in the page
table.
On hardware that supports BTI the Guard Page flag tells the hardware
to raise an exception if the target of a BR* and BLR* instruction is
not an appropriate landing pad instruction.
To support this in userspace we need to know which address space
should be guarded. For this add a rangeset to the arm64 pmap when the
hardware supports BTI. The kernel can then use pmap_bti_set and
pmap_bti_clear mark and unmark which address space is guarded.
Sponsored by: Arm Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42328
To support virtual machines on arm64 add the vmm code. This is based on
earlier work by Mihai Carabas and Alexandru Elisei at University
Politehnica of Bucharest, with further work by myself and Mark Johnston.
All AArch64 CPUs should work, however only the GICv3 interrupt
controller is supported. There is initial support to allow the GICv2
to be supported in the future. Only pure Armv8.0 virtualisation is
supported, the Virtualization Host Extensions are not currently used.
With a separate userspace patch and U-Boot port FreeBSD guests are able
to boot to multiuser mode, and the hypervisor can be tested with the
kvm unit tests. Linux partially boots, but hangs before entering
userspace. Other operating systems are untested.
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
Sponsored by: University Politehnica of Bucharest
Differential Revision: https://reviews.freebsd.org/D37428
Add a macro to find which bits from far_el2 are needed to be copied
to get the full intermediate physical address (IPA).
The hpfar_el2 register only contains a 4k aligned fault address. We
need to include the lower bits from far_el2 if we need the full
faulting IPA.
Add a function to support devices that may need to know if the kernel
has enabled the Armv8.1 Virtulization Host Extensions (FEAT_VHE).
Some devices, e.g. the generic timer, will need to know, e.g. use a
different interrupt.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43973
This works identically to amd64. In particular, only the
bus_dma_bounce_impl busdma implementation handles KMSAN at the moment.
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43157
This is mostly a copy of amd64's msan.h, except that we currently do not
avoid shadowing the kernel itself, and we need a more restrictive upper
bound in kmsan_md_unsupported() to avoid probing non-existent shadow
mappings of device mappings.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43156
Both are the same size as the kernel map.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43154
pmap_san_bootstrap() doesn't really do much, and it was hard-coding the
the bootstrap stack size defined in locore.S. Moreover, the name is a
bit confusing given the existence of pmap_bootstrap_san(). Just remove
it and call kasan_init_early() directly like we do on amd64. It will
not be used by KMSAN in a forthcoming patch series.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43403
When booting a KMSAN kernel on an Ampere Altra, I've seen some boot time
hangs when the XHCI controller driver attempts to allocate memory for
32-bit DMA. The system boots fine with a GENERIC kernel; I believe that
the additional memory requirements of KMSAN push it over the edge. The
system has a bit less than 2GB of RAM below the 4GB boundary.
Allocate a new freelist to segregate memory below 4GB, as we do on
amd64, so that such memory allocation failures are less likely to occur.
Reviewed by: alc
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43503
The arm and arm64 implementations of dispatching IPIs via PIC_IPI_SEND
are almost identical, and entirely MI with the lone exception of a
single store barrier on arm64 (that is likely either redundant or needed
on arm too). Thus, de-duplicate this code by moving it to INTRNG as a
generic IPI glue framework. The ipi_* functions remain declared in MD
smp.h headers and implemented in MD code, but are trivial wrappers
around intr_ipi_send that could be made MI, at least for INTRNG ports,
at a later date.
Note that, whilst both arm and arm64 had an ii_send member in intr_ipi
to abstract over how to send interrupts,, they were always ultimately
using PIC_IPI_SEND, and so this complexity has been removed. A follow-up
commit will re-introduce the same flexibility by instead allowing a
device other than the root PIC to be registered as the IPI sender.
As part of this, strengthen a MAXCPU assertion that was missed in commit
2f0b059eea ("intrng: switch from MAXCPU to mp_ncpus") (which itself is
mis-titled).
Reviewed by: mmel, mhorne
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D35898
After removing filter functionality, the naming doesn't clearly
represent what the function does, so try to address this. Include some
code clarity and style improvements.
Create a common version in subr_busdma_bounce.c, used by most
implementations. powerpc still needs its own version of the function,
due to its dmat->iommu == NULL check.
No functional change intended.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42896
Without filter functions, we do not need to keep track of tag ancestry.
All inheritance of the parent tag's parameters occurs when creating the
new child tag.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42895