Commit graph

20161 commits

Author SHA1 Message Date
Ryan Libby eae1767d8f vfs: move __always_inline to canonical position
Ahead of including inline in __always_inline, move __always_inline to
where inline goes.

Reviewed by:	kib, olce
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45709
2024-06-24 10:05:58 -07:00
Ryan Libby 3c84b4b35f kern: move __always_inline to canonical position
Ahead of including inline in __always_inline, move __always_inline to
where inline goes.

Reviewed by:	kib, olce
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45708
2024-06-24 10:05:58 -07:00
Mark Johnston f45213c74c physmem: Correct a comment
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
2024-06-20 17:45:40 -04:00
Gleb Smirnoff 3beb43dd4f callout: assert that callout_init_*lock* functions are called with a lock
Quick grep around kernel confirms they all do.
2024-06-20 10:53:31 -07:00
Gleb Smirnoff 39afff09c5 callout: tidy up _callout_init_lock()
Separate function into assertive part and into assigning part.
Consistently use __func__ in the assertions.  Write the assigning code in
a declarative style.

The functional change is that we no longer validate flags in the
non-INVARIANT kernel.  The assertion that checks flags has been there for
17 years, so all code that calls with invalid flags must have been
filtered and fixed.
2024-06-20 10:53:31 -07:00
Gleb Smirnoff 95a9594adc mutex: add static qualifier to implementations previously declared static 2024-06-20 10:53:31 -07:00
Gleb Smirnoff aaef18e6ae rwlock: add static qualifier to implementations previously declared static 2024-06-20 10:53:31 -07:00
Mark Johnston 70c712a86d sdt: Support fetching the probe sixth argument with MI machinery
SDT calls dtrace_probe() directly, and this can be used to pass up to
five probe arguments directly.  To pass the sixth argument (SDT
currently doesn't support more than this), we use a hack: just add
additional parameters to the call and cast dtrace_probe accordingly.
This happens to work on amd64, but doesn't work in general.

Modify SDT to call dtrace_probe() after storing arguments beyond the
first five in thread-local storage.  Implement sdt_getargval() to fetch
extra argument values this way.  An alternative would be to use invop
handlers instead and make sdt_probe_func point to a breakpoint
instruction, so that one can extract arguments using the breakpoint
exception trapframe, but this makes the providers more expensive when
enabled and doesn't seem justified.  This approach works well unless we
want to add more than one or two more parameters to SDT probes, which
seems unlikely at present.

In particular, this fixes fetching the last argument of most ip and tcp
probes on arm64.
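
As a rough userland illustration of the stash-and-fetch idea described above (not the kernel code): the first five arguments travel in the call, and the sixth is parked in per-thread storage where a getter can find it. All names here (probe5, probe6, getargval, extra_args) are hypothetical, and C11 _Thread_local stands in for whatever per-thread storage the kernel actually uses.

```c
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel's SDT code. */
#define	EXTRA_ARGS_MAX	1

/* Per-thread stash for arguments beyond the first five. */
static _Thread_local uintptr_t extra_args[EXTRA_ARGS_MAX];

/* Records what the five-argument entry point received. */
static uintptr_t last_seen[5];

/* Stand-in for a probe entry point that only accepts five arguments. */
static void
probe5(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3, uintptr_t a4)
{
	last_seen[0] = a0;
	last_seen[1] = a1;
	last_seen[2] = a2;
	last_seen[3] = a3;
	last_seen[4] = a4;
}

/* Fire a six-argument probe: stash the sixth, pass the rest directly. */
static void
probe6(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3, uintptr_t a4,
    uintptr_t a5)
{
	extra_args[0] = a5;
	probe5(a0, a1, a2, a3, a4);
}

/* Fetch argument n (n >= 5) from the per-thread stash; 0 if out of range. */
static uintptr_t
getargval(int n)
{
	if (n < 5 || n >= 5 + EXTRA_ARGS_MAX)
		return (0);
	return (extra_args[n - 5]);
}
```

Because the stash is per-thread, two threads firing probes concurrently do not overwrite each other's extra argument.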

Reported by:	rwatson
Reviewed by:	Domagoj Stolfa
MFC after:	1 month
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D45648
2024-06-20 12:40:25 -04:00
Mark Johnston ddf0ed09bd sdt: Implement SDT probes using hot-patching
The idea here is to avoid a memory access and conditional branch per
probe site.  Instead, the probe is represented by an "unreachable"
unconditional function call.  asm goto is used to store the address of
the probe site (represented by a no-op sled) and the address of the
function call into a tracepoint record.  Each SDT probe carries a list
of tracepoints.

When the probe is enabled, the no-op sled corresponding to each
tracepoint is overwritten with a jmp to the corresponding label.  The
implementation uses smp_rendezvous() to park all other CPUs while the
instruction is being overwritten, as this can't be done atomically in
general.  The compiler moves argument marshalling code and the
sdt_probe() function call out-of-line, i.e., to the end of the function.

Per gallatin@ in D43504, this approach has less overhead when probes are
disabled.  To make the implementation a bit simpler, I removed support
for probes with 7 arguments; nothing makes use of this except a
regression test case.  It could be re-added later if need be.

The approach taken in this patch enables some more improvements:
1. We can now automatically fill out the "function" field of SDT probe
   names.  The SDT macros let the programmer specify the function and
   module names, but this is really a bug and shouldn't have been
   allowed.  The intent was to be able to have the same probe in
   multiple functions and to let the user restrict which probes actually
   get enabled by specifying a function name or glob.
2. We can avoid branching on SDT_PROBES_ENABLED() by adding the ability
   to include blocks of code in the out-of-line path.  For example:

	if (SDT_PROBES_ENABLED()) {
		int reason = CLD_EXITED;

		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
		SDT_PROBE1(proc, , , exit, reason);
	}

could be written

	SDT_PROBE1_EXT(proc, , , exit, reason,
		int reason;

		reason = CLD_EXITED;
		if (WCOREDUMP(signo))
			reason = CLD_DUMPED;
		else if (WIFSIGNALED(signo))
			reason = CLD_KILLED;
	);

In the future I would like to use this mechanism more generally, e.g.,
to remove branches and marshalling code used by hwpmc, and generally to
make it easier to add new tracepoint consumers without having to add
more conditional branches to hot code paths.
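
To show only the shape of a macro that accepts an out-of-line block, here is a minimal userland sketch. It deliberately swaps the hot-patching mechanism for an ordinary flag test, so it demonstrates the calling convention and not the zero-overhead property. PROBE1_EXT, probe_enabled, probe_fire and classify are hypothetical names, and __builtin_expect assumes GCC or Clang.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "this probe site has been patched in". */
static bool probe_enabled;

/* Records the argument the probe last received; -1 means "never fired". */
static int probe_last_arg = -1;

static void
probe_fire(int arg)
{
	probe_last_arg = arg;
}

/*
 * Hypothetical PROBE1_EXT(): the trailing statements are pasted into the
 * probe path, so "arg" is only computed when the probe actually fires.
 */
#define	PROBE1_EXT(arg, ...) do {				\
	if (__builtin_expect(probe_enabled, 0)) {		\
		__VA_ARGS__					\
		probe_fire(arg);				\
	}							\
} while (0)

/* Fire the probe with a reason code derived from the two inputs. */
static int
classify(int dumped, int signaled)
{
	PROBE1_EXT(reason,
		int reason;

		reason = 1;
		if (dumped)
			reason = 3;
		else if (signaled)
			reason = 2;
	);
	return (probe_last_arg);
}
```

The marshalling statements live inside the macro invocation, so the caller no longer needs its own enabled check around them.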

Reviewed by:	Domagoj Stolfa, avg
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D44483
2024-06-19 16:57:41 -04:00
Doug Rabson e97ad33a89 Add an implementation of the 9P filesystem
This is derived from swills@'s fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.

Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.

To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:

	vfs.root.mountfrom="p9fs:sharename"

for non-root filesystems add something like this to /etc/fstab:

	sharename /mnt p9fs rw 0 0

In both examples, substitute the share name used on the bhyve command
line.

The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations.  This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.

Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
2024-06-19 13:12:04 +01:00
Ryan Libby 0dc98b57f3 getblk: track "non-sterile" bufobj to avoid bo lock on miss if sterile
This is a scheme to avoid taking the bufobj lock and doing a second
lookup in the case where in getblk we do an unlocked lookup and find no
buf.  Was there really no buf, or were we in the middle of a reassignbuf
race?  By tracking any use of reassignbuf with a flag, we can know if
there can't have been a race because there has been no reassignbuf.
Because this scheme is spoiled on the first use of reassignbuf, it is
mostly only beneficial for cases where a certain vnode is never expected
to use dirty bufs at all.
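
A single-threaded sketch of the decision this scheme makes, under simplifying assumptions: the "bufobj" holds one slot, the lock is just a counter, and no memory ordering is modeled. The struct layout and the names nonsterile, reassign and lookup are hypothetical and only mirror the description above.

```c
#include <stdbool.h>
#include <stddef.h>

struct buf {
	long lblkno;
};

/* Hypothetical miniature of a bufobj: one slot, a flag, a lock counter. */
struct bufobj {
	struct buf *slot;	/* the only buf this toy object can hold */
	bool nonsterile;	/* set on first reassign, never cleared */
	int lock_acquisitions;	/* counts simulated lock round trips */
};

/* Lockless lookup: may miss a buf that is mid-reassign in the real thing. */
static struct buf *
lookup_unlocked(struct bufobj *bo, long lblkno)
{
	if (bo->slot != NULL && bo->slot->lblkno == lblkno)
		return (bo->slot);
	return (NULL);
}

/* Any reassign spoils sterility for the lifetime of the object. */
static void
reassign(struct bufobj *bo, struct buf *bp)
{
	bo->nonsterile = true;
	bo->slot = bp;
}

/* Trust an unlocked miss only while the object is still sterile. */
static struct buf *
lookup(struct bufobj *bo, long lblkno)
{
	struct buf *bp;

	bp = lookup_unlocked(bo, lblkno);
	if (bp != NULL || !bo->nonsterile)
		return (bp);
	/* A reassign may have raced with us: repeat under the lock. */
	bo->lock_acquisitions++;
	return (lookup_unlocked(bo, lblkno));
}
```

A miss on a sterile object returns without touching the lock, while the same miss after any reassign pays for the locked second lookup.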

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45571
2024-06-16 14:09:45 -07:00
Doug Moore 2a21cfe60f pctrie: avoid typecast
Have PCTRIE_RECLAIM_CALLBACK typecast one function pointer type to
another, to relieve the writer of the callback function from having
to cast its first argument from void* to member type.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D45586
2024-06-14 02:19:03 -05:00
Doug Moore d19851f002 subr_pctrie: add a word to a comment
No functional changes.
Reported by:	alc
2024-06-13 15:28:15 -05:00
Doug Moore a7f67ebd82 subr_rangeset: use pctrie_reclaim_cb in remove_all
Replace the lookup-remove loop in rangeset_remove_all with a call
to SWAP_PCTRIE_RECLAIM_CALLBACK, to eliminate repeated trie searches.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D45584
2024-06-13 13:52:25 -05:00
Doug Moore c0d0bc2bed subr_pctrie: add leaf callbacks to pctrie_reclaim
PCTRIE_RECLAIM frees all the interior nodes in a pctrie, but is little
used because most trie-destroyers want to free leaves of the tree
too. Add PCTRIE_RECLAIM_CALLBACK, with two extra arguments, a callback
function and an auxiliary argument, that is invoked on every non-NULL
leaf in the tree as the tree is destroyed.
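
The idea of destroying a tree while handing each leaf to a callback can be sketched on a toy two-way tree. This is a recursive simplification with hypothetical names (struct node, trie_reclaim_cb, count_leaf); it is not the pctrie layout or its reclaim loop.

```c
#include <stdlib.h>

/* Toy interior node: each slot holds a child node, a leaf, or nothing. */
struct node {
	struct node *child[2];	/* interior children, or NULL */
	void *leaf[2];		/* leaf payloads, or NULL */
};

typedef void leaf_cb_t(void *leaf, void *arg);

/* Free every interior node; call cb(leaf, arg) on every non-NULL leaf. */
static void
trie_reclaim_cb(struct node *n, leaf_cb_t *cb, void *arg)
{
	int i;

	if (n == NULL)
		return;
	for (i = 0; i < 2; i++) {
		trie_reclaim_cb(n->child[i], cb, arg);
		if (n->leaf[i] != NULL)
			cb(n->leaf[i], arg);
	}
	free(n);
}

/* Example callback: count leaves through the auxiliary argument. */
static void
count_leaf(void *leaf, void *arg)
{
	(void)leaf;
	(*(int *)arg)++;
}
```

A destroyer that owns its leaves would pass a callback that frees them, so one traversal tears down both the nodes and their contents.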

Reviewed by:	rlibby, kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D45565
2024-06-13 11:48:09 -05:00
Doug Moore 2c10bacdf4 rangeset: add next() iteration
Add a method rangeset_next to find the first range that starts at or
after a given value. Use it to rewrite pmap_pkru_same and
pmap_bti_same to avoid walking a page at a time over pages in no
range.
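
For illustration, the "first range starting at or after a value" query can be sketched over a sorted array with a binary search. The real rangeset is backed by a pctrie; struct range and range_next below are hypothetical names used only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the first range in the sorted, non-overlapping array r[0..n)
 * whose start is >= va, or NULL if there is none.
 */
static const struct range *
range_next(const struct range *r, size_t n, uint64_t va)
{
	size_t lo, hi, mid;

	lo = 0;
	hi = n;
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (r[mid].start < va)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n ? &r[lo] : NULL);
}
```

A caller walking an address interval can jump straight from the end of one range to the start of the next, instead of probing every page in the gap.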

Reviewed by:	andrew, kib
Differential Revision:	https://reviews.freebsd.org/D45511
2024-06-06 13:42:31 -05:00
Ryan Libby 780666c09b getblk: reduce time under bufobj lock
Use the new pctrie combined insert/lookup facility to reduce work and
time under the bufobj interlock when associating a buf with a vnode.

We now do one lookup in the dirty tree and one combined lookup/insert in
the clean tree instead of one lookup in dirty, two in clean, and then an
insert in clean.  We also avoid touching the possibly unrelated buf at
the tail of the queue.

Also correct an issue where the actual order of the tail queue depended
on the insertion order due to sign issues.

Reviewed by:	kib (previous version), dougm, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45395
2024-06-05 20:21:22 -07:00
Ryan Libby bbf81f4629 pctrie: add combined insert/lookup operations
In several places in code, we do a pctrie lookup followed by a pctrie
insert.  Provide a few flavors of combined lookup/insert.  This may save
a portion of the work from walking a large pctrie twice.

The general idea is that while we walk the trie during insert, we also
do the same kind of tracking work that we do during pctrie_lookup_ge or
pctrie_lookup_le, and we pass out a pctrie node from where such a lookup
may continue.
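
The saving from a combined operation can be sketched with a sorted array standing in for the trie: a single search finds both the insertion point and the neighbor that a separate greater-than lookup would have returned. struct keyset and keyset_insert_lookup_gt are hypothetical names, and the array is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SET_CAP	16

struct keyset {
	uint64_t key[SET_CAP];	/* kept sorted, no duplicates */
	size_t n;
};

/*
 * Insert k.  The same search also yields the smallest existing key
 * greater than k: *has_succ says whether one exists, *succ holds it.
 * Returns 0 on success, -1 if the set is full or k is already present.
 */
static int
keyset_insert_lookup_gt(struct keyset *s, uint64_t k, uint64_t *succ,
    int *has_succ)
{
	size_t lo, hi, mid;

	lo = 0;
	hi = s->n;
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (s->key[mid] < k)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (s->n == SET_CAP || (lo < s->n && s->key[lo] == k))
		return (-1);
	*has_succ = lo < s->n;
	if (*has_succ)
		*succ = s->key[lo];
	/* Shift the tail up by one slot and place k at its sorted position. */
	memmove(&s->key[lo + 1], &s->key[lo],
	    (s->n - lo) * sizeof(s->key[0]));
	s->key[lo] = k;
	s->n++;
	return (0);
}
```

The caller gets the successor for free, where a separate lookup followed by an insert would have searched the structure twice.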

Reviewed by:	dougm (previous version), kib (previous version), markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45394
2024-06-05 19:17:20 -07:00
Andrew Turner c2e0d56f5e arm64: Support BTI checking in most of the kernel
LLD has the -zbti-report=error argument to check if the BTI note is
present when linking. To allow for this to be used when linking the
kernel and modules:
 - Add the BTI note to the remaining assembly files
 - Mark ptrauth.c as protected by BTI
 - Disable -zbti-report for vmm hypervisor switching code as it's not
   used there.

The linux64 module doesn't build with the flag as it includes vdso code
that doesn't include the note.

Reviewed by:	imp, kib, emaste
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45466
2024-06-05 09:23:40 +00:00
Andrew Turner a5affc0c4c stats: Fix the build under gcc
Reviewed by:	brooks, imp
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D45302
2024-06-05 09:23:40 +00:00
Mitchell Horne 5df74441b3 devmap: eliminate unused arguments
The optional 'table' pointer is a legacy part of the interface, which
has been replaced by devmap_register_table()/devmap_add_entry(). The few
in-tree callers have already adapted to this, so it can be removed.

The 'l1pt' argument is already entirely unused within the function.

Reviewed by:	andrew, markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45319
2024-06-04 20:17:47 -03:00
Mitchell Horne 191e6a6049 physmem: zero entire array
As a convenience to callers, who might allocate the array on the stack.
An empty/zero-valued range indicates the end of the physmap entries.

Remove the now-redundant calls to bzero() at the call site.

Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45318
2024-06-04 20:17:13 -03:00
Gleb Smirnoff a9b55a6644 unix: use m_freemp() when disposing unix socket buffers
The new unix/dgram uses m_nextpkt linkage, while the old unix/stream
uses m_next linkage.  This fixes a memory leak.

Diagnosed by:		khng
Reviewed by:		khng, markj
PR:			279467
Fixes:			458f475df8
Differential Revision:	https://reviews.freebsd.org/D45478
MFC After:		1 week
2024-06-03 17:23:06 -07:00
Gleb Smirnoff badf44cc21 mbuf: provide m_freemp()
This function follows both m_nextpkt and m_next linkage freeing all mbufs.
Note that existing m_freem() follows only m_next.
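
The difference between the two linkages can be sketched with a stripped-down structure that keeps only the two pointers. The struct, pkt_free and pktq_free are hypothetical stand-ins, and the freed counter exists only so the effect is observable.

```c
#include <stdlib.h>

/* Hypothetical minimal mbuf: only the two linkage fields are modeled. */
struct mbuf {
	struct mbuf *m_next;	/* next mbuf in the same packet */
	struct mbuf *m_nextpkt;	/* first mbuf of the next packet */
};

static int freed;	/* number of mbufs released so far */

/* Free one packet: follow m_next only. */
static void
pkt_free(struct mbuf *m)
{
	struct mbuf *n;

	while (m != NULL) {
		n = m->m_next;
		free(m);
		freed++;
		m = n;
	}
}

/* Free a queue of packets: follow m_nextpkt, and m_next within each. */
static void
pktq_free(struct mbuf *m)
{
	struct mbuf *n;

	while (m != NULL) {
		n = m->m_nextpkt;
		pkt_free(m);
		m = n;
	}
}
```

Calling pkt_free() on the head of a multi-packet queue would release only the first packet and leave the rest unreachable.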

Reviewed by:		khng
Differential Revision:	https://reviews.freebsd.org/D45477
2024-06-03 17:23:06 -07:00
Ryan Libby 3ca6bf7929 db_show_buffer: minor cleanup
Do some light cleanup to make the output format more consistent for
readability.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45442
2024-06-03 11:35:28 -07:00
Doug Moore 749c249dc3 subr_pctrie: use ilog2(x) instead of fls(x)-1
In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks.  Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.
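
The source of the extra check can be sketched in userland. fls() must return 0 for a zero input, so it carries a test that a log2 helper with a nonzero precondition can omit. my_fls and my_ilog2 are hypothetical helpers, not the kernel's definitions, and __builtin_clz assumes GCC or Clang.

```c
#include <limits.h>

/* fls(): 1-based index of the most significant set bit, 0 for x == 0. */
static int
my_fls(unsigned int x)
{
	if (x == 0)
		return (0);
	return ((int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x));
}

/* ilog2(): floor(log2(x)); the caller guarantees x != 0. */
static int
my_ilog2(unsigned int x)
{
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clz(x));
}
```

For every nonzero x the two agree, my_fls(x) - 1 == my_ilog2(x), so the substitution changes nothing except the dropped zero test.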

Reviewed by:    alc
Differential Revision:  https://reviews.freebsd.org/D45330
2024-06-03 13:31:19 -05:00
Doug Moore e3537f9235 Revert "subr_pctrie: use ilog2(x) instead of fls(x)-1"
This reverts commit 574ef65069.
2024-06-03 13:07:42 -05:00
Doug Moore 574ef65069 subr_pctrie: use ilog2(x) instead of fls(x)-1
In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks.  Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D45330
2024-06-03 12:45:45 -05:00
Mitchell Horne deab57178f Adjust comments referencing vm_mem_init()
I cannot find a time where the function was not named this.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45383
2024-05-27 18:37:40 -03:00
Bojan Novković da76d349b6 uma: Deduplicate uma_small_alloc
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.

The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.

Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45084
2024-05-25 19:24:46 +02:00
Ed Maste 9b1de7e484 vt/sc: retire logic to select vt(4) by default for UEFI boot
We previously defaulted to using sc(4) with a special case to prefer
vt(4) when booted via UEFI.  As vt(4) is now always the default we can
simplify this.

Reviewed by:	imp, kevans
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45356
2024-05-25 11:00:35 -04:00
Ricardo Branco e30621d58f mqueue: Introduce kern_kmq_timedreceive & kern_kmq_timedsend
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco 289b2d6a79 mqueue: Export some functions to be used by Linuxulator
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco ddbfb544c6 mqueuefs: Relax restriction that path must begin with a slash
This is needed to support the Linux implementation, which discards the
leading slash when calling mq_open(2).

Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:46 -06:00
Ricardo Branco acb7a4deb2 mqueue: Add sysctl for default_maxmsg & default_msgsize and fix descriptions
Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
2024-05-23 13:40:45 -06:00
Konstantin Belousov f0a4dd6d46 mqueuefs: mark newly allocated vnode as constructed, under the lock
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-05-23 01:13:29 +03:00
Konstantin Belousov b6f4a3fa75 mqueuefs: uma_zfree() can be postponed until mqfs sx mi_lock is dropped
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-05-23 01:13:29 +03:00
Konstantin Belousov 63f18b37e0 mqueuefs: minor style pass
Also remove the unneeded inclusion of sys/cdefs.h.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-05-23 01:13:29 +03:00
Stephen J. Kiernan 56b2742130 Add function to OSD to get values without taking the lock.
There are some cases of OSD use where the value is initialized only
once, at a point after which subsequent accesses can be done safely
without taking the lock.

Reviewed by:	markj
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44631
2024-05-22 15:55:48 -04:00
Pawel Jakub Dawidek 61e3e1776d capsicum: SIGTRAP is delivered also on ECAPMODE error.
Approved by: oshogbo (mentor)
2024-05-21 21:51:50 -07:00
Elliott Mitchell cbcb9778dd kern/rman: mark rman get functions as taking constants
The arguments are left completely unchanged by these functions.  This
allows passing constant pointers for verifying ownership, but not
modifying the contents.

Reviewed by: imp,jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1224
2024-05-21 17:52:29 -06:00
Elliott Mitchell 996fa9fb4e kern/rman: update rman_make_alignment_flags()
The flsl() function makes use of hardware functionality to compute the
value faster than this loop.  The only deviation from flsl() is at 0.

Reviewed by: imp,jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1224
2024-05-21 17:52:27 -06:00
Elliott Mitchell 037946dc9b kern/rman: remove rman_reserve_resource_bound(), partially revert 13fb665772
rman_reserve_resource_bound() has never been used, whereas RF_ALIGNMENT
has several users.  Remove the unused function and leave the portion
that is actually used in place.

This partially reverts commit 13fb665772.

Reviewed by: imp,jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1224
2024-05-21 17:52:24 -06:00
Elliott Mitchell beb1165a01 kern/rman: update debugging lines in subr_rman.c
Rather than hard-code the function name, use __func__ instead.  Apply
some style and adjust indentation as appropriate.  Remove the no longer
required braces.

Reviewed by: imp,jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1224
2024-05-21 17:52:21 -06:00
Elliott Mitchell 973c32297f kern/rman: update DPRINTF() macro, avoid semicolon swallowing match function
Using a variadic macro allows passing everything properly to printf().
Using the do { } while(0) construct ensures the macro acts like any
other single statement.  This shows just how long some of this has
existed.
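
For reference, the general pattern looks roughly like the sketch below. This is a generic illustration, not the macro from subr_rman.c; debug_enabled and report() are hypothetical.

```c
#include <stdio.h>

static int debug_enabled = 1;	/* hypothetical runtime switch */

/*
 * Variadic form: every argument is forwarded to printf() unchanged.
 * The do { } while (0) wrapper makes the expansion a single statement
 * that still requires its trailing semicolon.
 */
#define	DPRINTF(...) do {					\
	if (debug_enabled)					\
		printf(__VA_ARGS__);				\
} while (0)

/*
 * Because the macro is one statement, the else below pairs with
 * "if (busy)" and not with the if hidden inside the macro.
 */
static int
report(int busy)
{
	if (busy)
		DPRINTF("%s: resource busy\n", __func__);
	else
		return (0);
	return (1);
}
```

A macro defined as a bare "if (debug_enabled) printf(...)" would capture that else, silently changing which branch returns.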

Reviewed by: imp,jhb
Pull Request: https://github.com/freebsd/freebsd-src/pull/1224
2024-05-21 17:52:15 -06:00
Mariusz Zaborski 408957613b Regen 2024-05-21 22:03:20 +02:00
Edward Tomasz Napierala 6b7e4254a2 capsicum: allow rfork(2) in capability mode
Reviewed by:	brooks, rwatson
MFC after:	4 days
Differential Revision:	https://reviews.freebsd.org/D45040
2024-05-21 22:02:36 +02:00
Ryan Libby a332ba32d4 getblk: fail faster with GB_LOCK_NOWAIT
If we asked not to wait on a lock, and then we failed to get a buf lock
because we would have had to wait, then just return the error.  This
avoids taking the bufobj lock and a second trip to lockmgr.

Reviewed by:	mckusick, kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45245
2024-05-21 10:21:50 -07:00
Ryan Libby b92cd6b294 lockmgr: make lockmgr_disowned public and use it
Reviewed by:	mckusick, kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D45248
2024-05-21 10:21:50 -07:00
Ricardo Branco 7975f57b7e uipc_shm: Fix double check for shmfd->shm_path
Reviewed by:	emaste, zlei
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1250
2024-05-21 09:39:53 -04:00