Commit graph

20150 commits

Author SHA1 Message Date
Elliott Mitchell 38c35248fe kern/intr: remove support for passing trap frame as argument
While otherwise a handy potential approach, getting the trap frame via
the argument isn't documented and isn't supposed to be used.  With all
uses removed, now remove support to end the mixed calling conventions.

Differential Revision: https://reviews.freebsd.org/D37688

Reviewed by: imp, mhorne
Pull Request: https://github.com/freebsd/freebsd-src/pull/1225
2024-05-10 15:33:24 -06:00
John Baldwin 473c90ac04 uio: Use switch statements when handling UIO_READ vs UIO_WRITE
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D45142
2024-05-10 13:43:36 -07:00
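
A minimal userspace sketch of the switch-based style the uio commit above describes. The enum, the function name, and the use of memcpy() in place of the kernel's copyout()/copyin() are all hypothetical stand-ins, not the kernel code; the point is only that a switch gives any additional enum constants an explicit place to be handled or rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's enum uio_rw. */
enum sketch_uio_rw { SKETCH_UIO_READ, SKETCH_UIO_WRITE };

/*
 * Move n bytes between a "kernel" buffer and a "user" buffer.
 * memcpy() stands in for copyout()/copyin(); this is not kernel code.
 */
static int
sketch_uiomove(enum sketch_uio_rw rw, char *kbuf, char *ubuf, size_t n)
{
	switch (rw) {
	case SKETCH_UIO_READ:
		/* A read moves data from the kernel buffer to the user. */
		memcpy(ubuf, kbuf, n);
		break;
	case SKETCH_UIO_WRITE:
		/* A write moves data from the user into the kernel buffer. */
		memcpy(kbuf, ubuf, n);
		break;
	default:
		/* Any constant added later must be handled explicitly. */
		return (EINVAL);
	}
	return (0);
}
```

With an if/else on UIO_READ, any value other than UIO_READ would silently take the write path; the default case makes that situation visible.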
Isaac Cilia Attard 6437872c1d New sysctl to disable NOMATCH until devmatch runs
Introduce hw.bus.devctl_nomatch_enabled and use it to suppress NOMATCH
until devmatch runs

There are a lot of NOMATCH events generated at boot. We also run devmatch
once during early boot to load unmatched devices. To avoid redundant
work, don't start generating NOMATCH events until after devmatch runs.
Set hw.bus.devctl_nomatch_enabled=1 just before we run devmatch. The
kernel will suppress NOMATCH events until this is set to true.

This saves about 170ms from the boot on aarch64 running atop Apple
M-series processors and the VMWare Fusion hypervisor.

Reviewed by:    imp, cperciva
MFC after:      3 days
Sponsored by:   Google Summer of Code
Pull Request:   https://github.com/freebsd/freebsd-src/pull/1213
2024-05-09 17:56:40 -07:00
Elliott Mitchell 9f3a552f9e intrng: switch flag arguments to unsigned
The flag variables behind these are all unsigned.  As such adjust the
declarations to match reality and reduce the number of mismatches.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1126
2024-05-09 17:14:38 -06:00
Elliott Mitchell a9e0f316b3 kern/intr: redeclare intr_setaffinity()'s third arg constant
This matches reality and allows removal of a __DECONST().

Fixes: 4c72d075a5 ("LinuxKPI: const argument to irq_set_affinity_hint()")
Fixes: 9b33b154b5 ("Add support to cpuset for binding hardware interrupts")
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1126
2024-05-09 17:14:35 -06:00
Elliott Mitchell cd04887b95 kern/intr: change ->ie_irq to unsigned
All architecture implementations actually want this to be unsigned.
In INTRNG the equivalent is overtly unsigned.  x86 and PowerPC merely avoid
the need to explicitly convert at several points.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1126
2024-05-09 17:14:33 -06:00
Mitchell Horne a77e1f0f81 busdma: better handling of small segment bouncing
Typically, when a DMA transaction requires bouncing, we will break up
the request into segments that are, at maximum, page-sized.

However, in the atypical case of a driver whose maximum segment size is
smaller than PAGE_SIZE, we end up inefficiently assigning each segment
its own bounce page. For example, the dwmmc driver has a maximum segment
size of 2048 (PAGE_SIZE / 2); a 4-page transfer ends up requiring 8
bounce pages in the current scheme.

We should attempt to batch segments into bounce pages more efficiently.
This is achieved by pushing all considerations of the maximum segment
size into the new _bus_dmamap_addsegs() function, which wraps
_bus_dmamap_addseg(). Thus we allocate the minimal number of bounce
pages required to complete the entire transfer, while still performing
the transfer with smaller-sized transactions.

For most drivers with a segment size >= PAGE_SIZE, this will have no
impact. For drivers like dwmmc mentioned above, this improves the memory
and performance efficiency when bouncing a large transfer.

Co-authored-by:	jhb
Reviewed by:	jhb
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45048
2024-05-07 13:02:57 -03:00
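
The bounce-page arithmetic in the busdma commit above can be checked with a small sketch. It assumes a 4096-byte page and a page-aligned transfer; the constant and function names are hypothetical and do not correspond to the busdma implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Ceiling division: how many chunks of size y are needed to cover x. */
static size_t
sketch_howmany(size_t x, size_t y)
{
	return ((x + y - 1) / y);
}

/* Old scheme: every segment (at most maxsegsz bytes) gets its own page. */
static size_t
bounce_pages_per_segment(size_t len, size_t maxsegsz)
{
	size_t seg;

	seg = maxsegsz < SKETCH_PAGE_SIZE ? maxsegsz : SKETCH_PAGE_SIZE;
	return (sketch_howmany(len, seg));
}

/* New scheme: segments are packed into pages, so only len matters. */
static size_t
bounce_pages_batched(size_t len)
{
	return (sketch_howmany(len, SKETCH_PAGE_SIZE));
}
```

For the dwmmc case in the message (maximum segment size 2048, 4-page transfer), the first function gives 8 pages and the second gives 4. For a driver whose maximum segment size is at least a page, both give the same count.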
Mitchell Horne 5604069824 busdma: deduplicate _bus_dmamap_addseg() function
It is functionally identical in all implementations, so move the
function to subr_busdma_bounce.c. The KASSERT present in the x86 version
is now enabled for all architectures. It should be universally
applicable.

Reviewed by:	jhb
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45047
2024-05-07 13:02:57 -03:00
Gleb Smirnoff 99b0270adc sockets: hide socket hhook(9)s under SOCKET_HHOOK
There are no in-tree consumers of these hooks.

Reviewed by:		stevek
Differential Revision:	https://reviews.freebsd.org/D44928
2024-05-06 12:49:29 -07:00
John Baldwin 51346bd594 mbuf: Add EXT_CTL for mbufs backed by a CTL backend buffer
This is somewhat similar to EXT_NET_DRV, but CTL isn't a network
driver.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44725
2024-05-02 16:38:30 -07:00
Mark Johnston d5eae57088 sysctl: Make sysctl_ctx_free() a bit safer
Clear the list before returning so that sysctl_ctx_free() can be called
more than once on the same list without side effects.  This simplifies
error handling in drivers; previously, drivers would have to be careful
to call sysctl_ctx_free() at most once to avoid a use-after-free.

While here, use TAILQ_FOREACH_SAFE in the loop which unregisters OIDs.

Reviewed by:	thj, emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45041
2024-05-02 15:42:28 -04:00
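
The idea behind the sysctl_ctx_free() change above, sketched in userspace with a hand-rolled singly linked list. The types and function names are hypothetical; the real code uses a TAILQ of sysctl context entries and also unregisters OIDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context list; not the kernel's sysctl_ctx_list. */
struct sketch_entry {
	struct sketch_entry *next;
};

struct sketch_ctx {
	struct sketch_entry *head;
};

/* Push one entry; returns 0 on success, -1 if allocation fails. */
static int
sketch_ctx_add(struct sketch_ctx *ctx)
{
	struct sketch_entry *e;

	e = malloc(sizeof(*e));
	if (e == NULL)
		return (-1);
	e->next = ctx->head;
	ctx->head = e;
	return (0);
}

/* Free every entry and leave the list empty; returns the number freed. */
static int
sketch_ctx_free(struct sketch_ctx *ctx)
{
	struct sketch_entry *e, *next;
	int n = 0;

	for (e = ctx->head; e != NULL; e = next) {
		next = e->next;	/* saved before free(), as *_FOREACH_SAFE does */
		free(e);
		n++;
	}
	ctx->head = NULL;	/* a second call now sees an empty list */
	return (n);
}
```

Without the final reset of the head pointer, a second call would walk freed memory, which is the use-after-free the message refers to.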
Josef 'Jeff' Sipek 0fe60dc655 fattime: fix fattime to timespec conversion of dates beyond 2106-02-06
It turns out that the only conversion issue was in fattime2timespec, where
multiplying the number of seconds in a day by the number of days overflowed
32-bit unsigned int for dates beyond 2106-02-07 06:28:15.

Casting one of the multiplicands as time_t forces a 64-bit multiplication on
systems where time_t is 64-bits and produces no binary changes on the one
remaining system with 32-bit time_t (namely i386).

Since the code is now tested & fixed, this change removes the fixme comments.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44755
2024-05-01 07:56:41 +03:00
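
The overflow described in the fattime fix above can be reproduced in isolation. This is a sketch with hypothetical function names, using int64_t in place of a 64-bit time_t; it is not the fattime2timespec code itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECONDS_PER_DAY 86400u

/* 32-bit product: wraps once days * 86400 exceeds UINT32_MAX. */
static uint32_t
days_to_seconds_narrow(uint32_t days)
{
	return (days * SKETCH_SECONDS_PER_DAY);
}

/* Casting one operand first forces a 64-bit multiplication. */
static int64_t
days_to_seconds_wide(uint32_t days)
{
	return ((int64_t)days * SKETCH_SECONDS_PER_DAY);
}
```

Day 49710 is the last day count whose product still fits in 32 bits; from day 49711 onward the narrow version wraps around while the wide version keeps the full value.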
Josef 'Jeff' Sipek 9d1396c346 fattime: make the test code check beyond 32-bit time_t limits
On systems that have a 64-bit time_t, the test code now exercises the whole
range of fattime.  To do this, this commit...

1. replaces the call to random() with two calls to arc4random() to
   generate a 33-bit number of seconds in order to cover the entire range of
   fattime [1970,2107].  (32-bits stops just short - in January 2106.)
   On systems with 32-bit time_t, the extra bits are discarded and only the
   time_t expressible range is tested.
2. casts time_t values passed to printf as longs and changes the format
   string to match.

Now, the test code builds, runs, and exercises what it can (i.e., the whole
fattime range or the 32-bit time_t subset of it) on both 32-bit and 64-bit
time_t systems.

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44754
2024-05-01 07:56:41 +03:00
Josef 'Jeff' Sipek 7b8b613d08 fattime: make the test code build again
This change...

1. replaces calls to timet2fattime/fattime2timet with calls to
   timespec2fattime/fattime2timespec.  The functions got renamed shortly
   after they landed in the kernel but the test code wasn't updated (see
   7ea93e912b).
2. adds a utc_offset stub.

With this, the test code builds and runs as a 32-bit binary (cc -Wall -O2
-m32 subr_fattime.c).

Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44753
2024-05-01 07:56:41 +03:00
Justin Hibbits 2cb4909011 cons: Add boot option to mute boot messages after banner
This is useful for embedded systems, where it provides feedback that the
kernel has booted, but avoids printing the probe messages.  If both
mutemsgs and verbose are set, verbose cancels the mute.

Additionally, this unmutes the console on panic, so a user can see what
happened leading up to the panic.

Obtained from:  Juniper Networks, Inc.
2024-04-30 16:23:47 -04:00
Andrew Gallatin 13a5a46c49 Fix new users of MAXPHYS and hide it from the kernel namespace
In cd85379104, kib made maxphys a load-time tunable.  This made
the #define MAXPHYS in sys/param.h  almost entirely obsolete, as
it could now be overridden by kern.maxphys at boot time, or by
opt_maxphys.h.

However, decades of tradition have led to several new, incorrect, uses
of MAXPHYS in other parts of the kernel, mostly by seasoned
developers.  I've corrected those uses here in a mechanical fashion,
and verified that it fixes a bug in the md driver that I was
experiencing.

Since using MAXPHYS is such an easy mistake to make, it is best to
hide it from the kernel namespace.  So I've moved its definition to
_maxphys.h, which is now included in param.h only for userspace.

That brings up the fact that lots of userspace programs use MAXPHYS
for different reasons, most of them probably wrong.  Userspace consumers
that really need to know the value of maxphys should probably be
changed to use the kern.maxphys sysctl.  But that's outside the scope
of this change.

Reviewed by: imp, jkim, kib, markj
Fixes: 30038a8b4e ("md: Get rid of the pbuf zone")
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44986
2024-04-30 15:29:06 -04:00
Konstantin Belousov 5b3e5c6ce3 kcmp_pget(): do not accept TIDs
Otherwise pget() might still look up and hold the current process.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-04-30 10:07:03 +03:00
Konstantin Belousov 1e01650a78 kcmp_pget(): add an assert that we did not hold the current process
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-04-30 10:06:52 +03:00
Mark Johnston d66399326c kthread: Set *tdptr earlier in kproc_kthread_add()
See commit ae77041e07 ("kthread: Set *newtdp earlier in
kthread_add1()") for details.  That commit was incomplete since
g_init()'s first call to kproc_kthread_add() will cause
kproc_kthread_add() to take the `*procptr == NULL` branch, which avoids
kthread_create().

To ensure that the thread pointer is initialized before the thread
starts running, we have to start the kernel process with RFSTOPPED.
We could perhaps go further and use RFSTOPPED only when tdptr != NULL,
but it's probably better to have consistent behaviour.

Reviewed by:	olce, kib
Reported by:	syzbot+e91e798f3c088215ace6@syzkaller.appspotmail.com
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44927
2024-04-25 09:35:38 -04:00
Gleb Smirnoff 19307b86d3 accept_filter: return different errors for non-listener and a busy socket
The fact that an accept filter needs to be cleared first before setting to
a different one isn't properly documented.  The requirement that the
socket must already be listening, although trivial, isn't documented
either.  At least return a more meaningful error than EINVAL for an
existing filter.  Cover this with a test case.
2024-04-24 21:55:58 -07:00
Brooks Davis 78101d437a syscalls.master: correct return type of {read,write}v
This was missed when read/write, etc were updated to return ssize_t.

Fixes:		2e83b28161 Fix a few syscall arguments to use size_t instead of u_int.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D44930
2024-04-24 20:48:46 +01:00
Konstantin Belousov 6b0cf2a237 vfs_lookup.c: only call ktrcapfail() if KTRACE is enabled
Reviewed by:	emaste, imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44931
2024-04-24 22:43:32 +03:00
Konstantin Belousov 66df81021e sys/namei.h: move NI_CAP_VIOLATION() macro from namei.h to vfs_lookup.c
Reviewed by:	emaste, imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44931
2024-04-24 22:43:31 +03:00
Mark Johnston 8ef2c02182 busdma: uma_zcreate() does not fail
No functional change intended.

MFC after:	1 week
2024-04-24 08:46:41 -04:00
Mark Johnston 1e607a0753 khelp: uma_zcreate() does not fail
No functional change intended.

MFC after:	1 week
2024-04-24 08:46:35 -04:00
Gleb Smirnoff a8acc2bf56 sockets: inherit SO_ACCEPTFILTER from listener to child
This is crucial for operation of accept_filter(9).  See added comment.

Fixes:	d29b95ecc0
2024-04-23 17:17:14 -07:00
Konstantin Belousov 53186bc143 sigqueue(2): add impl-specific flag __SIGQUEUE_TID
The flag allows the pid argument to designate a thread from the calling
process.  The flag value is carved from the high bit of the signal
number, which slightly changes the ABI of syscall.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:09 +03:00
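
A sketch of what "carving a flag from the high bit of the signal number" means in the sigqueue(2) commit above. The flag value 0x80000000 and all names below are assumptions made for illustration; consult sys/signal.h for the actual definition of __SIGQUEUE_TID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the high bit of a 32-bit signal argument. */
#define SKETCH_SIGQUEUE_TID 0x80000000u

/* Pack a signal number and the "id is a thread id" flag into one word. */
static uint32_t
sketch_encode_signum(int signo, bool id_is_tid)
{
	uint32_t v = (uint32_t)signo;

	if (id_is_tid)
		v |= SKETCH_SIGQUEUE_TID;
	return (v);
}

/* Recover the plain signal number by masking the flag off. */
static int
sketch_decode_signo(uint32_t v)
{
	return ((int)(v & ~SKETCH_SIGQUEUE_TID));
}

/* Report whether the flag bit was set. */
static bool
sketch_decode_is_tid(uint32_t v)
{
	return ((v & SKETCH_SIGQUEUE_TID) != 0);
}
```

Because the flag shares the argument with the signal number, a value that previously failed as an out-of-range signal now has a meaning, which is the ABI change the message mentions.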
Konstantin Belousov 0c11c1792b kern_thr.c: normalize includes
Remove extra sys/param.h, provided by sys/systm.h.
Order the rest alphabetically.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:07 +03:00
Konstantin Belousov 2effad53b4 kern_thr.c/kern_sig.c: remove sys/cdefs.h
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:05 +03:00
Konstantin Belousov 53e0938b0b kern_thread.c: remove unneeded include of sys/param.h
Handled by sys/systm.h already.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:03 +03:00
Mark Johnston 7a7063cc54 thread: Add a missing include of asan.h
I didn't notice this during testing because invariants-enabled kernels
implicitly include asan.h via kassert.h.

Reported by:	Lexi Winter <lexi@le-Fay.org>
Fixes:		800da341bc ("thread: Simplify sanitizer integration with thread creation")
2024-04-22 13:07:53 -04:00
Mark Johnston 800da341bc thread: Simplify sanitizer integration with thread creation
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA.  In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
  handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
  it.
- Avoid redundant shadow stack initialization: thread_alloc()
  initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
  though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by:	khng
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44891
2024-04-22 11:46:59 -04:00
Gordon Bergling 9576fc16ca uipc_domain: Fix a typo in a source code comment
- s/cant/can't/

MFC after:	3 days
2024-04-21 09:51:14 +02:00
Ka Ho Ng 68a3a7fc94 kasan: fix false-positive kasan_report upon thread reuse
In fork1(), if a thread is reused and thread_alloc_stack() is not
called, mark the reused thread's kstack pages clean in the KASAN shadow
buffer.

Sponsored by:	Juniper Networks, Inc.
MFC after:	3 days
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D44875
2024-04-19 15:18:27 -04:00
Mark Johnston e411b22736 uipc_shm: Fix a free() of an uninitialized variable
Reported by:	Coverity
CID:		1544043
Fixes:		b112232e4f ("uipc_shm: Copyin userpath for ktrace(2)")
2024-04-18 20:18:29 -04:00
Brooks Davis 1fd880742a libsys: add a libsys.h
This declares an API for libsys which currently consists of
__sys_<foo>() declarations for system call stubs and function pointer
typedefs of the form __sys_<foo>_t.  The vast majority of the
implementation resides in a generated _libsys.h which ensures that all
system call stub declarations match syscalls.master.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44387
2024-04-16 17:48:07 +01:00
Brooks Davis 6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Gleb Smirnoff e6a4b57239 mbuf: restore m_uiotombuf() feature of returning a zero length mbuf
PR:	278340
Fixes:	aba79b0f4a
2024-04-14 10:21:07 -07:00
Gleb Smirnoff 0020e1b617 Revert "sendfile: mark it explicitly as a TCP only feature"
This reverts commit 3b7aa842e2.
2024-04-10 11:28:11 -07:00
Olivier Certner afc10f8bba sys_procctl(): Make it clear that negative commands are invalid
An initial reading of the preamble of sys_procctl() gives the impression
that no test prevents a malicious user from passing a negative command
index (in 'uap->com'), which is soon used as an index into the static
array procctl_cmds_info[].

However, a closer examination leads to the conclusion that the existing
code is technically correct.  Indeed, the comparison of 'uap->com' to
the nitems() expression, which expands to a ratio of sizeof(), leads to
a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic
Conversions/Integer Promotions applied by '<=', because sizeof() returns
'size_t' values, and we define 'size_t' as an equivalent of 'unsigned
int' (which is not mandated by the standard, the latter allowing, e.g.,
integers of lower ranks).

With this conversion, negative values of 'uap->com' are automatically
ruled-out since they are converted to very big unsigned integers which
are caught by the test.  An analysis of assembly code produced by LLVM
16 on amd64 and practical tests confirm that no exploitation is possible.

However, the guard code as written is misleading to readers and might
trip up static analysis tools.  Make sure that negative values are
explicitly excluded so that it is immediately clear that EINVAL will be
returned in this case.

Build tested with clang 16 and GCC 12.

Approved by:    markj (mentor)
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2024-04-10 17:15:25 +02:00
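
The conversion argument in the sys_procctl() message above can be demonstrated with a small sketch. The table and function names are hypothetical and the checks are simplified; the point is that comparing an int against a sizeof()-based expression converts the int to whatever unsigned type size_t is, so a negative value is rejected either way, and the explicit form only makes that visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define sketch_nitems(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical command table; index 0 is reserved as invalid. */
static const char *const sketch_cmds[] = { NULL, "cmd_a", "cmd_b", "cmd_c" };

/* Implicit guard: com is converted to size_t, so -1 becomes huge. */
static int
check_implicit(int com)
{
	if (com == 0 || com >= sketch_nitems(sketch_cmds))
		return (EINVAL);
	return (0);
}

/* Explicit guard: negative values are visibly excluded up front. */
static int
check_explicit(int com)
{
	if (com <= 0 || (size_t)com >= sketch_nitems(sketch_cmds))
		return (EINVAL);
	return (0);
}
```

Both functions return the same result for every input; compilers typically flag the mixed-sign comparison in the first one with a sign-compare warning, which is the kind of noise the commit removes.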
Jake Freeland b112232e4f uipc_shm: Copyin userpath for ktrace(2)
If userpath is not SHM_ANON, then copy it in early so ktrace(2) can
record it. Without this change, ktrace(2) will attempt to strcpy a
userspace string and trigger a page fault.

Reported by:	syzbot+490b9c2a89f53b1b9779@syzkaller.appspotmail.com
Fixes:		0cd9cde767
Approved by:	markj (mentor)
Reviewed by:	markj
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D44702
2024-04-09 21:17:11 -05:00
Gleb Smirnoff 5716d902ae Revert "unix: new implementation of unix/stream & unix/seqpacket"
The regressions in aio(4) and kernel RPC aren't a 5 minute problem.

This reverts commit d80a97def9.
This reverts commit d1cbb17a87.
This reverts commit fb8a8333b4.
2024-04-09 13:15:47 -07:00
Stephen J. Kiernan 81b4d1c4d4 sockets: Add hhook in sonewconn for inheriting OSD specific data
Added HHOOK_SOCKET_NEWCONN and bumped HHOOK_SOCKET_LAST

Reviewed by:	glebius, tuexen
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44632
2024-04-08 21:31:34 -04:00
Gleb Smirnoff fb8a8333b4 unix: return immediately on MSG_OOB
Jumping to cleanup routines will work on uninitialized stack mc.

Fixes:	d80a97def9
Reported-by:	syzbot+4adf0b37849ea7723586@syzkaller.appspotmail.com
2024-04-08 17:09:16 -07:00
Gleb Smirnoff d1cbb17a87 unix: fix the ad hoc STAILQ_PREPEND()
If there is nothing to prepend, don't try STAILQ_INSERT_HEAD().

Fixes:	d80a97def9
Reported-by: syzbot+bb7f3d07c79b5faf8de8@syzkaller.appspotmail.com
2024-04-08 17:02:00 -07:00
Gleb Smirnoff d80a97def9 unix: new implementation of unix/stream & unix/seqpacket
Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
of SOCK_STREAM.  The change meets three goals: get rid of unix(4) specific
stuff in the generic socket code, provide faster and more robust unix/stream
sockets and bring unix/seqpacket much closer to specification.  Highlights
follow:

- The send buffer now is truly bypassed.  Previously it was always empty,
but the send(2) still needed to acquire its lock and do a variety of
tricks to be woken up in the right time while sleeping on it.  Now the
only two things we care about in the send buffer are the I/O sx(9) lock
that serializes operations and the value of so_snd.sb_hiwat, which we can read
without obtaining a lock.  The sleep of a send(2) happens on the mutex of
the receive buffer of the peer.  A bulk send/recv of data with large
socket buffers will make both syscalls just bounce between owning the
receive buffer lock and copyin(9)/copyout(9), no other locks would be
involved.

- The implementation uses new mchain structure to manipulate mbuf chains.
Note that this required converting to mchain two functions that are shared
with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding
a new shared one uipc_process_kernel_mbuf().  This induces some non-
functional changes in the unix/dgram code as well.  There is a space for
improvement here, as right now it is a mix of mchain and manually managed
mbuf chains.

- unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated
as a datagram socket by the generic socket code, now becomes a true stream
socket with record markers.

- unix/stream loses the sendfile(2) support.  This can be brought back,
but requires some work.  Let's first see if there is any interest in this
feature, other than purely academic.

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44151
2024-04-08 13:16:51 -07:00
Gleb Smirnoff aba79b0f4a mbuf: provide mc_uiotomc() a function to copy from uio(9) to mchain
Implement m_uiotombuf() as a wrapper around mc_uiotomc().  The M_EXTPG is
left untouched.  The m_uiotombuf() is left as a compat KPI.  New code
should use either mc_uiotomc() or m_uiotombuf_nomap().

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44150
2024-04-08 13:16:51 -07:00
Gleb Smirnoff 71f8702f49 mbuf: provide mc_get() that allocates struct mchain of given length
Implement m_getm2(), which is widely used via m_getm() macro, as a wrapper
around mc_get().  New code is advised to use mc_get().

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44149
2024-04-08 13:16:51 -07:00
Gleb Smirnoff fd01798fc4 mbuf: add mc_split() that works on two struct mchain
It preserves tail points and all length/memory accounting, so that caller
doesn't need to do any extra traversals.  It doesn't respect M_PKTHDR but
it may be improved if needed.  It respects M_EOR, though.  First consumer
will be the new unix(4) SOCK_STREAM and SOCK_SEQPACKET.

Also provide a much simpler mc_concat() that glues two chains back together.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D44148
2024-04-08 13:16:51 -07:00
Gleb Smirnoff 3b7aa842e2 sendfile: mark it explicitly as a TCP only feature
Back in 2015 when it turned non-blocking, it was working with PF_UNIX
and it may still work.  However, the usefulness of such application
of sendfile(2) is questionable.  Disable the feature while unix/stream
is under refactoring.

Relnotes:	yes
2024-04-08 13:16:51 -07:00
Jake Freeland aa32d7cbc9 ktrace: Record socket violations with KTR_CAPFAIL
Report restricted access to socket addresses and protocols while
Capsicum violation tracing with CAPFAIL_ADDR and CAPFAIL_PROTO.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40681
2024-04-07 18:52:51 -05:00
Jake Freeland 0cd9cde767 ktrace: Record namei violations with KTR_CAPFAIL
Report namei path lookups while Capsicum violation tracing with
CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic
capability mode behavior.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40680
2024-04-07 18:52:51 -05:00
Jake Freeland 6a4616a529 ktrace: Record signal violations with KTR_CAPFAIL
Report the delivery of signals to processes other than self while
Capsicum violation tracing with CAPFAIL_SIGNAL.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40679
2024-04-07 18:52:51 -05:00
Jake Freeland 05296a0ff6 ktrace: Record syscall violations with KTR_CAPFAIL
Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40678
2024-04-07 18:52:51 -05:00
Jake Freeland 96c8b3e509 ktrace: Record cpuset violations with KTR_CAPFAIL
Report Capsicum violations in the cpuset namespace with CAPFAIL_CPUSET.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40677
2024-04-07 18:52:51 -05:00
Jake Freeland 9bec841312 ktrace: Record detailed ECAPMODE violations
When a Capsicum violation occurs in the kernel, ktrace will now record
detailed information pertaining to the violation.

For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient
  sockaddr.

For all violations, the syscall and ABI is recorded.

kdump is also modified to display this new information to the user.

Reviewed by:	oshogbo, markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40676
2024-04-07 18:52:51 -05:00
Michael Tuexen 681711b77c uipc_socket: handle socket buffer locks in sopeeloff
PR:			278171
Reviewed by:		markj
Fixes:			a4fc41423f ("sockets: enable protocol specific socket buffers")
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D44640
2024-04-05 18:20:19 +02:00
Konstantin Belousov 235436d631 stop_all_proc(): skip traced or signal-stopped processes
Since thread_single(SINGLE_ALLPROC) ignores them since 9241ebc796,
and there is not much we can do for the debugger-controlled process.

Noted by:	olce
Reviewed by:	markj, olce
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44638
2024-04-05 17:52:39 +03:00
Mark Johnston 08f3d5b60c copy_file_range: Call vn_rdwr() at least once
This ensures that we invoke VOP_READ on the input file even if it's
empty, which in turn helps ensure that filesystems update the atime of
the file.

PR:		274615
Reviewed by:	olce, rmacklem, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D43524
2024-04-04 17:03:07 -04:00
Lawrence Stewart 7eb92c502e Reinstate returning EOVERFLOW from stats_v1_blob_clone()
a0993376ec (from D43179) subtly changed stats_v1_blob_clone() to stop
returning EOVERFLOW in the case where the user buffer is not large enough
to receive the entire statsblob. As a result, any consumers implemented to
retry on receiving EOVERFLOW instead give up after receiving an empty
statsblob header.

Fix by latching any errors recorded prior to copyout.

Reviewed by:	markj
Obtained from:	Netflix, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44585
Fixes:	a0993376ec ("stats: Check for errors from copyout()")
2024-04-03 12:58:26 +11:00
Mark Johnston 7ef5c19b21 kern linker: Don't invoke dtors without having invoked ctors
I have a kernel module which fails to load because of an unrecognized
relocation type.  link_elf_load_file() fails before the module's ctors
are invoked and it calls linker_file_unload(), which causes the module's
dtors to be executed, resulting in a kernel panic.

Add a flag to the linker file to ensure that dtors are not invoked if
unloading due to an error prior to ctors being invoked.

At the moment I only implemented this for link_elf_obj.c since
link_elf.c doesn't invoke dtors, but I refactored link_elf.c to make
them more similar.

Fixes:		9e575fadf4 ("link_elf_obj: Invoke fini callbacks")
Reviewed by:	zlei, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D44559
2024-03-31 14:15:11 -04:00
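
A sketch of the flag-based guard described in the kernel linker commit above. The structure, field, and function names are hypothetical and the load path is reduced to a single failure switch; this is not the link_elf_obj.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical linker file state; counters exist only for inspection. */
struct sketch_linker_file {
	bool ctors_invoked;
	int ctor_calls;
	int dtor_calls;
};

/* Run destructors only if constructors ran for this file. */
static void
sketch_unload(struct sketch_linker_file *lf)
{
	if (lf->ctors_invoked) {
		lf->dtor_calls++;
		lf->ctors_invoked = false;
	}
}

/* Load the file; an early failure unloads before ctors have run. */
static int
sketch_load(struct sketch_linker_file *lf, bool fail_before_ctors)
{
	if (fail_before_ctors) {
		sketch_unload(lf);	/* must not run dtors here */
		return (-1);
	}
	lf->ctor_calls++;
	lf->ctors_invoked = true;
	return (0);
}
```

The flag ties destructor execution to constructor execution rather than to the unload path itself, so every error exit before the constructors is covered by the same check.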
Alan Cox e0388a906c arm64: enable superpage mappings by pmap_mapdev{,_attr}()
In order for pmap_kenter{,_device}() to create superpage mappings,
either 64 KB or 2 MB, pmap_mapdev{,_attr}() must request appropriately
aligned virtual addresses.

Reviewed by:	markj
Tested by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D42737
2024-03-30 15:41:30 -05:00
Konstantin Belousov 9241ebc796 thread_single(9): decline external requests for traced or debugger-stopped procs
Debugger has the powers to cause unbound delay in single-threading,
which then blocks the threaded taskqueue.  The reproducer is
`truss -f timeout 2 sleep 10`.

Reported by:	mjg
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44523
2024-03-30 16:43:52 +02:00
Bojan Novković bdc903460b kern_ctf.c: Don't print out warning messages unconditionally
The kernel CTF loading routines print various warnings when attempting
to load CTF data from an ELF file. After the changes in c21bc6f3c2
those warnings are unnecessarily printed for each kernel module
that was compiled without CTF data.

The kernel linker already uses the bootverbose flag to conditionally
print CTF loading errors. This patch alters kern_ctf.c
routines to do the same.

Reported by:	Alexander@leidinger.net
Approved by:	markj (mentor)
Fixes: c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
2024-03-29 20:32:18 +01:00
Gleb Smirnoff 1a8d176432 inpcb: fully retire inp_ppcb pointer
Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f, e68b379244, 483fe96511) this pointer used to point
at it.

Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>.  The
exp-run detected a minimal use of the field in ports:
  * sysutils/lsof - patched upstream
  * net-mgmt/netdata  - patch accepted upstream
  * emulators/qemu-user-static - upstream master branch seems not using
    the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.

PR:			277659 (exp-run)
Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D44491
2024-03-29 12:18:32 -07:00
Bojan Novković 722b8e3cb6 Fix style nits in kern_linker.c
Reported by:	jrtc27
Fixes:	c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
Approved by:	markj (mentor)
2024-03-28 20:36:30 +01:00
Stephen J. Kiernan 2aee804c9e kerneldump: Add flag to indicate kernel core was successfully dumped
This allows for shutdown_final EVENTHANDLERs to know that a core dump
successfully occurred. Embedded systems may want to record this fact
or act on it.

Obtained from:	Juniper Networks, Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44542
2024-03-28 14:11:16 -04:00
Randall Stewart b7b78c1c16 Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold
HPTS inserts a softclock for system call return that optimizes performance. However when
no HPTS threads need the help (i.e. when they have less than 100 or so connections) then
there should be little work done i.e. check the counter and return instead of running through
all the threads getting locks etc.

Reported by:    eduardo
Reviewed by:    gallatin, glebius, tuexen
Tested by:      gallatin
Differential Revision: https://reviews.freebsd.org/D44420
2024-03-28 08:12:37 -04:00
Zhenlei Huang 1c7307cf67 kern linker: Make linker_file_add_dependency() void
The only possible return value has been zero since cee9542d51.

No functional change intended.

Reviewed by:	dfr
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44507
2024-03-27 12:02:32 +08:00
Zhenlei Huang 39450eba8e kern linker: Do not touch userrefs of the kernel file
A nonzero `userrefs` of a linker file indicates that the file, either
loaded from kldload(2) or preloaded, can be unloaded via kldunload(2).
As for the kernel file, it can be unloaded by the loader but should not
be after initialization.

This change fixes a regression from d9ce8a41ea, which inadvertently
increases `userrefs` of the kernel file.

Reviewed by:	dfr, dab, jhb
Fixes:	d9ce8a41ea kern_linker: Handle module-loading failures in preloaded .ko files
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D42530
2024-03-26 16:47:02 +08:00
Zhenlei Huang f43ff3e15c kern linker: Do not unload a module if it has dependants
Despite the name, linker_file_unload() will drop a reference and return
success when the module file has dependants, i.e. it has more than one
reference. When a user requests to unload such a module, the kernel
should reject the request unambiguously and immediately.

PR:		274986
Reviewed by:	dfr, dab, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D42527
2024-03-26 11:55:45 +08:00
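The behaviour described in the commit above can be sketched in a few lines. This is a minimal illustration only: `struct ex_linker_file`, `ex_user_unload()` and the choice of `EBUSY` as the error are assumptions made for the example, not the kernel's actual definitions.

```c
#include <errno.h>

/* Hypothetical stand-in for a linker file; only the reference count matters here. */
struct ex_linker_file {
	int refs;	/* one reference per holder, including dependants */
};

/*
 * Sketch of a user-initiated unload: refuse outright while other
 * references (dependants) remain, instead of silently dropping one
 * reference and reporting success.
 */
static int
ex_user_unload(struct ex_linker_file *lf)
{
	if (lf->refs > 1)
		return (EBUSY);	/* assumed error code for illustration */
	lf->refs--;
	return (0);
}
```

With two references the call fails and leaves the count untouched; with one reference it succeeds and the count drops to zero.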
Konstantin Belousov e0c92dd2b7 amd64: initialize td_frame stack area for init(8) main thread
Uninitialized td_frame mostly does not matter since all registers are
overwritten on exec to activate init(8).  The exception is the PSL_T bit
from %rflags, which might leak into the fresh init as garbage, causing
spurious SIGTRAPs delivered to init until the first syscall is executed.

Reviewed by:	emaste, jhb, jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44498
2024-03-26 04:01:38 +02:00
Gleb Smirnoff 15bfd7cf27 soreceive_dgram: use M_WAITOK when we don't hold any locks 2024-03-22 22:44:16 -07:00
Gleb Smirnoff 26389b308d soreceive_dgram: assert that a datagram has control or data 2024-03-22 22:44:16 -07:00
Mitchell Horne dc7ae2bc6f kern_ctf.c: fix linking with nooptions DDB
!DDB builds don't include the db_ctf_lookup_typename() symbol, so this
is a stop-gap to fix linking of the MINIMAL kernel config.

Reported by:	bapt
Fixes:		c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
2024-03-22 13:26:00 -03:00
Bojan Novković c21bc6f3c2 ddb: Add CTF-based pretty printing
Add basic CTF support and a CTF-powered pretty-printer to ddb.

The db_ctf.* files expose a basic interface for fetching type
data for ELF symbols, interacting with the CTF string table,
and translating type identifiers to type data.

The db_pprint.c file uses those interfaces to implement
a pretty-printer for all kernel ELF symbols.
The pretty-printer works with symbol names and arbitrary addresses:
pprint struct thread 0xffffffff8194ad90

Pretty-printing currently only works after the root filesystem
gets mounted because the CTF info is not available during
early boot.

Differential Revision:	https://reviews.freebsd.org/D37899
Approved by: markj (mentor)
2024-03-22 04:03:33 +01:00
Brooks Davis e07d37c705 sysent: regen 2024-03-19 23:13:27 +00:00
Brooks Davis 27676ae365 syscalls.master: use __acl_type_t
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44418
2024-03-19 23:13:27 +00:00
Brooks Davis d0efabdf15 syscalls.master: make __sys_fcntl take an intptr_t
The (optional) third argument of fcntl is sometimes a pointer so change
the type to intptr_t.  Update the libc-internal definition (actually used
by libthr) to take a fixed intptr_t argument rather than pretending it's
a variadic function.  (That worked because all supported architectures
pass variadic arguments as though the function was declared with those
types.  In CheriBSD that changes because variadic arguments are passed
via a bounded array.)

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44381
2024-03-19 23:13:26 +00:00
Brooks Davis cab73e5305 syscalls.master: struct siginfo -> struct __siginfo
struct siginfo doesn't exist, it's struct __siginfo (and siginfo_t).

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44380
2024-03-19 23:13:26 +00:00
Brooks Davis 7936d4e4d0 syscalls.master: align with sigfastblock declaration
sigfastblock is declared to take a void * argument in the manpage and in
headers so declare it that way and use SAL annotations to say it
interacts with a 32-bit word.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44379
2024-03-19 23:13:26 +00:00
Brooks Davis d8d4ed26c9 syscall.master: fix aio_suspend signature
It takes a `const struct iovec *iovp`.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44378
2024-03-19 23:13:26 +00:00
Brooks Davis 128443a9f2 syscalls.master: fix readv and writev iovp decl
Both take const struct iovec * and only read the values.

Reviewed by:	olce, kib
Differential Revision:	https://reviews.freebsd.org/D44377
2024-03-19 23:13:25 +00:00
Vijeyalakshumi Koteeswaran 60bc9617e7 kerneldump: add livedump_start_vnode(9)
livedump_start_vnode(9) is introduced such that the live minidump on the
system could take a vnode. This interface could be used to extend support
for the existing framework in downstream.

Bump __FreeBSD_version for introducing livedump_start_vnode(9).

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	khng
Differential Revision:	https://reviews.freebsd.org/D43471
2024-03-18 17:12:18 -04:00
Richard Scheffenegger b5a9299bb8 ktls: catch invalid parameters earlier
Move safety checks forward from ktls_session_create() to
ktls_copyin_tls_enable(). Prevents zero mallocs, and excessively
large kernel mallocs.

Reported-by:	syzbot+72022fa9163fa958b66c@syzkaller.appspotmail.com
Reported-by:	syzbot+8992893e13058ce0670a@syzkaller.appspotmail.com
Sponsored by:	NetApp, Inc.
X-NetApp-PR:	#79
Reviewed By:	tuexen
Differential Revision:	https://reviews.freebsd.org/D44364
2024-03-18 03:37:49 +01:00
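The idea in the commit above of rejecting bad lengths before any allocation happens can be sketched as follows. The function name and the `EX_MAX_KEY_LEN` bound are hypothetical and chosen only for illustration; the real checks and limits live in ktls_copyin_tls_enable().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical upper bound, for illustration only. */
#define	EX_MAX_KEY_LEN	64

/*
 * Validate a user-supplied length at copy-in time, so that neither a
 * zero-sized nor an excessively large allocation is ever attempted.
 */
static int
ex_check_key_len(size_t len)
{
	if (len == 0 || len > EX_MAX_KEY_LEN)
		return (EINVAL);
	return (0);
}
```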
Gleb Smirnoff d62c4607e8 sockets: remove unused KPIs to manipulate sockets
These KPIs were added in dd0e6c383a and in 15 years have had zero use.
They somewhat resemble what IfAPI does for struct ifnet.  But IfAPI does
that for the sake of a large collection of NIC drivers not being aware of
struct ifnet.  For sockets it is unclear what could be a large
collection of externally written kernel modules that need to use sockets
extensively and not be aware of their internals at the same time. This
isolation of knowledge of a structure requires a lot of work, and just
throwing in a few KPIs isn't helpful.

Reviewed by:		kib, olce, markj
Differential Revision:	https://reviews.freebsd.org/D44311
2024-03-18 08:50:30 -07:00
Mateusz Guzik b0aaf8beb1 Rename VM_LAST to more appropriate VM_GUEST_LAST
NFC

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-03-18 10:49:09 +00:00
Rick Macklem 89f1dcb3eb vfs_vnops.c: Use va_bytes >= va_size hint to avoid SEEK_DATA/SEEK_HOLE
vn_generic_copy_file_range() tries to maintain holes
in file ranges being copied, using SEEK_DATA/SEEK_HOLE
where possible.

Unfortunately SEEK_DATA/SEEK_HOLE operations can take
a long time under certain circumstances.
Although it is not currently possible to know if a file has
unallocated data regions, the case where va_bytes >= va_size
is a strong hint that there are no unallocated data regions.
This hint does not work well for file systems doing compression,
but since it is only a hint, it is still useful.

For the case of va_bytes >= va_size, avoid doing SEEK_DATA/SEEK_HOLE.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D44509
2024-03-14 17:35:32 -07:00
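The hint from the commit above reduces to a single comparison. The sketch below uses a hypothetical helper name; only the `va_bytes >= va_size` test itself comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the allocated byte count covers the whole file size,
 * which strongly suggests there are no unallocated regions and that the
 * SEEK_DATA/SEEK_HOLE probing can be skipped.  This is only a hint: on a
 * compressing file system va_bytes may be smaller than va_size even for
 * a file without holes, in which case the caller simply probes as before.
 */
static bool
ex_probably_no_holes(uint64_t va_bytes, uint64_t va_size)
{
	return (va_bytes >= va_size);
}
```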
John Baldwin 9dbf5b0e68 new-bus: Remove the 'rid' and 'type' arguments from BUS_RELEASE_RESOURCE
The public bus_release_resource() API still accepts both forms, but
the internal kobj method no longer passes the arguments.
Implementations which need the rid or type now use rman_get_rid() or
rman_get_type() to fetch the value from the allocated resource.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44131
2024-03-13 15:05:54 -07:00
John Baldwin 2baed46e85 new-bus: Remove the 'rid' and 'type' arguments from BUS_*ACTIVATE_RESOURCE
The public bus_activate/deactivate_resource() API still accepts both
forms, but the internal kobj methods no longer pass the arguments.
Implementations which need the rid or type now use rman_get_rid() or
rman_get_type() to fetch the value from the allocated resource.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44130
2024-03-13 15:05:54 -07:00
John Baldwin d77f2092ce new-bus: Remove the 'type' argument from BUS_MAP/UNMAP_RESOURCE
The public bus_map/unmap_resource() API still accepts both forms, but
the internal kobj methods no longer pass the argument.
Implementations which need the type now use rman_get_type() to fetch
the value from the allocated resource.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44129
2024-03-13 15:05:54 -07:00
John Baldwin fef01f0498 new-bus: Remove the 'type' argument from BUS_ADJUST_RESOURCE
The public bus_adjust_resource() API still accepts both forms, but the
internal kobj method no longer passes the argument.  Implementations
which need the type now use rman_get_type() to fetch the value from
the allocated resource.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44128
2024-03-13 15:05:54 -07:00
John Baldwin 9edb8d0aed new-bus: Introduce a simpler bus API for managing resources
Remove the 'type' and 'rid' arguments from the wrapper bus API
functions (e.g. bus_release_resource) that accept a struct resource.
The "new" versions extract the 'type' and/or 'rid' from the passed in
resource object via rman_get_type and rman_get_rid.

This commit adds the new API as functions with a _new suffix.  Wrapper
macros choose between the old and new functions based on the number of
arguments provided to the macro.  This commit does not change the ABI
but can be safely MFCd to older branches so long as older kernels use
rman_set_type when allocating resources.

Future commits will push the removal of these extraneous arguments
through the bus implementation.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44124
2024-03-13 15:05:53 -07:00
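Choosing between an old and a new function based on the number of macro arguments, as the commit above describes, is a common preprocessor idiom. The sketch below shows one way to do it; all names (`ex_release`, `ex_release_old`, `ex_release_new`, `struct ex_res`) are hypothetical and this is not the actual <sys/bus.h> implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct resource. */
struct ex_res {
	int type;
	int rid;
};

/* "Old" form: the caller repeats type and rid alongside the resource. */
static int
ex_release_old(void *dev, int type, int rid, struct ex_res *r)
{
	(void)dev; (void)type; (void)rid; (void)r;
	return (4);	/* argument count, returned only for demonstration */
}

/* "New" form: type and rid would be read from the resource itself. */
static int
ex_release_new(void *dev, struct ex_res *r)
{
	(void)dev; (void)r;
	return (2);
}

/*
 * EX_PICK always expands to its fifth argument.  Because the caller's
 * arguments come first, the candidate names shift so that the fifth slot
 * holds ex_release_new for two arguments and ex_release_old for four.
 * Other argument counts select EX_INVALID and fail to compile.
 */
#define	EX_PICK(_1, _2, _3, _4, NAME, ...)	NAME
#define	ex_release(...)							\
	EX_PICK(__VA_ARGS__, ex_release_old, EX_INVALID,		\
	    ex_release_new, EX_INVALID)(__VA_ARGS__)
```

A call such as `ex_release(dev, r)` expands to `ex_release_new(dev, r)`, while `ex_release(dev, type, rid, r)` expands to `ex_release_old(dev, type, rid, r)`, so existing callers keep compiling while new callers drop the redundant arguments.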
John Baldwin 1b9bcffff3 sys: Set the type of allocated bus resources
Use rman_set_type to set the type of allocated resources everywhere
rman_set_rid is currently called.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44123
2024-03-13 15:05:53 -07:00
John Baldwin b30a80b655 rman: Add rman_get/set_type
This permits associating a resource type (e.g. SYS_RES_MEMORY) with a
struct resource.

I considered adding a new field to struct rman to store the type and
only providing rman_get_type as an accessor.  However, changing
'struct rman' is an ABI breakage.  I might revisit this in main, but
the current approach is MFC'able.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44122
2024-03-13 15:05:53 -07:00
Richard Scheffenegger 85df11a1de ktls: deep copy tls_enable struct for in-kernel tcp consumers
Doing a deep copy of the keys early allows users of the
tls_enable structure to assume kernel memory.
This enables the socket options to be set by kernel threads.

Reviewed By:	#transport, tuexen, jhb, rrs
Sponsored by:	NetApp, Inc.
X-NetApp-PR:	#79
Differential Revision:	https://reviews.freebsd.org/D44250
2024-03-13 13:23:13 +01:00
John Baldwin f980f48f13 Revert "new-bus: Disable assertions for rman mismatches for activate/deactivate"
With recent fixes to the ACPI and pcib drivers to translate mapping
requests of child resources into mappings of sub-ranges of parent
resources these assertions should now be true.

This reverts commit ed88eef140.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D43691
2024-03-13 13:19:10 -07:00
Jason A. Harmening d56c175ac9 uipc_bindat(): Explicitly specify exclusive locking for the new vnode
When calling VOP_CREATE(), uipc_bindat() reuses the componentname
object from the preceding lookup operation, which is likely to specify
LK_SHARED.  Furthermore, the VOP_CREATE() interface technically only
requires the newly-created vnode to be returned with a shared lock.
However, the socket layer requires the new vnode to be locked exclusive
and asserts to that effect.

In most cases, this is not a practical concern because most if not
all base-layer filesystems (certainly FFS, ZFS, and msdosfs at least)
always return the vnode locked exclusive regardless of the lock flags.
However, it is an issue for unionfs which uses cn_lkflags to determine
how the new unionfs wrapper vnode should be locked.  While it would
be easy enough to work around this issue within unionfs itself, it
seems better for the socket layer to be explicit about its locking
requirements when issuing VOP_CREATE().

Reviewed by:		kib, olce
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44047
2024-03-09 19:48:02 -06:00
Jason A. Harmening fa26f46dc2 vn_lock_pair(): allow lkflags1/lkflags2 to be 0 if vp1/vp2 is NULL
It's a bit strange to require the caller to pass contrived lock flags
if the corresponding vnode is NULL, simply to appease the assertion
that exactly one of LK_SHARED or LK_EXCLUSIVE must be set.  On the
other hand, we still want to catch cases in which completely bogus
or corrupt flags are passed even if the corresponding vnode is NULL.
Therefore, specifically allow empty flags for lkflags1/lkflags2 iff
the respective vp1/vp2 param is NULL.

Reviewed by:		kib, olce
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44046
2024-03-09 19:41:45 -06:00
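The relaxed assertion described in the commit above can be expressed as a small predicate. The flag values and the function name below are hypothetical stand-ins for illustration, not the kernel's LK_* definitions or the real vn_lock_pair() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define	EX_LK_SHARED	0x1
#define	EX_LK_EXCLUSIVE	0x2

/*
 * Accept empty flags only when the vnode pointer is NULL; otherwise
 * require exactly one of the shared/exclusive bits, so bogus flags are
 * still caught even for a NULL vnode.
 */
static bool
ex_lkflags_ok(const void *vp, int lkflags)
{
	int mode = lkflags & (EX_LK_SHARED | EX_LK_EXCLUSIVE);

	if (vp == NULL && lkflags == 0)
		return (true);
	return (mode == EX_LK_SHARED || mode == EX_LK_EXCLUSIVE);
}
```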
Mark Johnston a58813fd70 ktrace: Fix the build when options KTRACE is not configured
MFC after:	1 week
Reported by:	John Nielsen <lists@jnielsen.net>
2024-03-09 00:33:55 -05:00