20082 commits

Author SHA1 Message Date
Mark Johnston d66399326c kthread: Set *tdptr earlier in kproc_kthread_add()
See commit ae77041e07 ("kthread: Set *newtdp earlier in
kthread_add1()") for details.  That commit was incomplete since
g_init()'s first call to kproc_kthread_add() will cause
kproc_kthread_add() to take the `*procptr == NULL` branch, which avoids
kthread_create().

To ensure that the thread pointer is initialized before the thread
starts running, we have to start the kernel process with RFSTOPPED.
We could perhaps go further and use RFSTOPPED only when tdptr != NULL,
but it's probably better to have consistent behaviour.

Reviewed by:	olce, kib
Reported by:	syzbot+e91e798f3c088215ace6@syzkaller.appspotmail.com
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44927
2024-04-25 09:35:38 -04:00
Gleb Smirnoff 19307b86d3 accept_filter: return different errors for non-listener and a busy socket
The fact that an accept filter needs to be cleared first before setting to
a different one isn't properly documented.  The requirement that the
socket must already be listening, although trivial, isn't documented
either.  At least return a more meaningful error than EINVAL for an
existing filter.  Cover this with a test case.
2024-04-24 21:55:58 -07:00
Brooks Davis 78101d437a syscalls.master: correct return type of {read,write}v
This was missed when read/write, etc were updated to return ssize_t.

Fixes:		2e83b28161 Fix a few syscall arguments to use size_t instead of u_int.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D44930
2024-04-24 20:48:46 +01:00
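As a userspace illustration of the return type that 78101d437a aligns syscalls.master with, the sketch below calls writev(2) on a pipe and keeps the result in an ssize_t. The function name sketch_writev_demo() is hypothetical, and this is only a usage example, not code from the commit.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write two buffers through a pipe with writev(2).
 * Returns the byte count reported by writev(), or -1 on failure.
 */
static ssize_t
sketch_writev_demo(void)
{
	char a[] = "hello ", b[] = "world";
	struct iovec iov[2];
	int fds[2];
	ssize_t n;	/* writev() returns ssize_t, not int */

	if (pipe(fds) != 0)
		return (-1);
	iov[0].iov_base = a;
	iov[0].iov_len = strlen(a);
	iov[1].iov_base = b;
	iov[1].iov_len = strlen(b);
	n = writev(fds[1], iov, 2);
	close(fds[0]);
	close(fds[1]);
	return (n);
}
```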
Konstantin Belousov 6b0cf2a237 vfs_lookup.c: only call ktrcapfail() if KTRACE is enabled
Reviewed by:	emaste, imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44931
2024-04-24 22:43:32 +03:00
Konstantin Belousov 66df81021e sys/namei.h: move NI_CAP_VIOLATION() macro from namei.h to vfs_lookup.c
Reviewed by:	emaste, imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44931
2024-04-24 22:43:31 +03:00
Mark Johnston 8ef2c02182 busdma: uma_zcreate() does not fail
No functional change intended.

MFC after:	1 week
2024-04-24 08:46:41 -04:00
Mark Johnston 1e607a0753 khelp: uma_zcreate() does not fail
No functional change intended.

MFC after:	1 week
2024-04-24 08:46:35 -04:00
Gleb Smirnoff a8acc2bf56 sockets: inherit SO_ACCEPTFILTER from listener to child
This is crucial for operation of accept_filter(9).  See added comment.

Fixes:	d29b95ecc0
2024-04-23 17:17:14 -07:00
Konstantin Belousov 53186bc143 sigqueue(2): add impl-specific flag __SIGQUEUE_TID
The flag allows the pid argument to designate a thread from the calling
process.  The flag value is carved from the high bit of the signal
number, which slightly changes the ABI of syscall.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:09 +03:00
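The flag encoding described in 53186bc143 can be pictured with a small userspace sketch. The constant SKETCH_SIGQUEUE_TID and both helper names are hypothetical stand-ins rather than the kernel's definitions; the sketch assumes only a 32-bit signal argument whose top bit carries the flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag; the real value is defined by the kernel. */
#define SKETCH_SIGQUEUE_TID 0x80000000u

/* True when the raw argument asks for the id to be treated as a thread id. */
static bool
sketch_targets_thread(uint32_t raw_signum)
{
	return ((raw_signum & SKETCH_SIGQUEUE_TID) != 0);
}

/* Strip the flag bit to recover the plain signal number. */
static int
sketch_signal_number(uint32_t raw_signum)
{
	return ((int)(raw_signum & ~SKETCH_SIGQUEUE_TID));
}
```

Because the flag shares the signal-number argument, a caller that never sets the high bit sees no difference, which is consistent with the message describing the ABI change as slight.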
Konstantin Belousov 0c11c1792b kern_thr.c: normalize includes
Remove extra sys/param.h, provided by sys/systm.h.
Order the rest alphabetically.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:07 +03:00
Konstantin Belousov 2effad53b4 kern_thr.c/kern_sig.c: remove sys/cdefs.h
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:05 +03:00
Konstantin Belousov 53e0938b0b kern_thread.c: remove unneeded include of sys/param.h
Handled by sys/systm.h already.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44867
2024-04-23 19:51:03 +03:00
Mark Johnston 7a7063cc54 thread: Add a missing include of asan.h
I didn't notice this during testing because invariants-enabled kernels
implicitly include asan.h via kassert.h.

Reported by:	Lexi Winter <lexi@le-Fay.org>
Fixes:		800da341bc ("thread: Simplify sanitizer integration with thread creation")
2024-04-22 13:07:53 -04:00
Mark Johnston 800da341bc thread: Simplify sanitizer integration with thread creation
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA.  In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
  handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
  it.
- Avoid redundant shadow stack initialization: thread_alloc()
  initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
  though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by:	khng
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44891
2024-04-22 11:46:59 -04:00
Gordon Bergling 9576fc16ca uipc_domain: Fix a typo in a source code comment
- s/cant/can't/

MFC after:	3 days
2024-04-21 09:51:14 +02:00
Ka Ho Ng 68a3a7fc94 kasan: fix false-positive kasan_report upon thread reuse
In fork1(), if a thread is reused and thread_alloc_stack() is not
called, mark the reused thread's kstack pages clean in the KASAN shadow
buffer.

Sponsored by:	Juniper Networks, Inc.
MFC after:	3 days
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D44875
2024-04-19 15:18:27 -04:00
Mark Johnston e411b22736 uipc_shm: Fix a free() of an uninitialized variable
Reported by:	Coverity
CID:		1544043
Fixes:		b112232e4f ("uipc_shm: Copyin userpath for ktrace(2)")
2024-04-18 20:18:29 -04:00
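The class of bug fixed by e411b22736 is easy to reproduce in miniature. The following is a userspace model with hypothetical names (sketch_open(), path), not the uipc_shm code: a buffer is allocated on only one branch, but a shared cleanup label frees it on every exit, so the pointer has to start out as NULL.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model: "path" is allocated only for non-anonymous objects,
 * yet the cleanup label frees it unconditionally.
 */
static int
sketch_open(const char *userpath, int anonymous)
{
	char *path = NULL;	/* initialized so the cleanup free() is always safe */
	size_t len;
	int error = 0;

	if (!anonymous) {
		if (userpath == NULL) {
			error = EFAULT;
			goto out;
		}
		len = strlen(userpath) + 1;
		path = malloc(len);
		if (path == NULL) {
			error = ENOMEM;
			goto out;
		}
		memcpy(path, userpath, len);
	}
	/* ... the real work would happen here ... */
out:
	free(path);		/* free(NULL) is a no-op */
	return (error);
}
```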
Brooks Davis 1fd880742a libsys: add a libsys.h
This declares an API for libsys which currently consists of
__sys_<foo>() declarations for system call stubs and function pointer
typedefs of the form __sys_<foo>_t.  The vast majority of the
implementation resides in a generated _libsys.h which ensures that all
system call stub declarations match syscalls.master.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44387
2024-04-16 17:48:07 +01:00
Brooks Davis 6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Gleb Smirnoff e6a4b57239 mbuf: restore m_uiotombuf() feature of returning a zero length mbuf
PR:	278340
Fixes:	aba79b0f4a
2024-04-14 10:21:07 -07:00
Gleb Smirnoff 0020e1b617 Revert "sendfile: mark it explicitly as a TCP only feature"
This reverts commit 3b7aa842e2.
2024-04-10 11:28:11 -07:00
Olivier Certner afc10f8bba sys_procctl(): Make it clear that negative commands are invalid
An initial reading of the preamble of sys_procctl() gives the impression
that no test prevents a malicious user from passing a negative command
index (in 'uap->com'), which is soon used as an index into the static
array procctl_cmds_info[].

However, a closer examination leads to the conclusion that the existing
code is technically correct.  Indeed, the comparison of 'uap->com' to
the nitems() expression, which expands to a ratio of sizeof(), leads to
a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic
Conversions/Integer Promotions applied by '<=', because sizeof() returns
'size_t' values, and we define 'size_t' as an equivalent of 'unsigned
int' (which is not mandated by the standard, the latter allowing, e.g.,
integers of lower ranks).

With this conversion, negative values of 'uap->com' are automatically
ruled-out since they are converted to very big unsigned integers which
are caught by the test.  An analysis of assembly code produced by LLVM
16 on amd64 and practical tests confirm that no exploitation is possible.

However, the guard code as written is misleading to readers and might
trip up static analysis tools.  Make sure that negative values are
explicitly excluded so that it is immediately clear that EINVAL will be
returned in this case.

Build tested with clang 16 and GCC 12.

Approved by:    markj (mentor)
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
2024-04-10 17:15:25 +02:00
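The reasoning in afc10f8bba can be checked with a small userspace sketch. The table sketch_cmds and both function names are hypothetical, and the bounds are simplified; the point is only that comparing a signed int against a sizeof()-based expression converts the int to an unsigned type, so a negative value is rejected either way, while the explicit form makes that visible.

```c
#include <errno.h>
#include <stddef.h>

#define nitems(x) (sizeof((x)) / sizeof((x)[0]))

/* Hypothetical command table; only its length matters here. */
static const int sketch_cmds[8] = { 0 };

/* Implicit form: "com" is converted to an unsigned type before comparing. */
static int
sketch_check_implicit(int com)
{
	if (com >= nitems(sketch_cmds))
		return (EINVAL);
	return (0);
}

/* Explicit form: the negative case is obvious to readers and analyzers. */
static int
sketch_check_explicit(int com)
{
	if (com < 0 || (size_t)com >= nitems(sketch_cmds))
		return (EINVAL);
	return (0);
}
```

Compilers typically flag the implicit form with a sign-comparison warning, which is the kind of noise the explicit check avoids.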
Jake Freeland b112232e4f uipc_shm: Copyin userpath for ktrace(2)
If userpath is not SHM_ANON, then copy it in early so ktrace(2) can
record it. Without this change, ktrace(2) will attempt to strcpy a
userspace string and trigger a page fault.

Reported by:	syzbot+490b9c2a89f53b1b9779@syzkaller.appspotmail.com
Fixes:		0cd9cde767
Approved by:	markj (mentor)
Reviewed by:	markj
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D44702
2024-04-09 21:17:11 -05:00
Gleb Smirnoff 5716d902ae Revert "unix: new implementation of unix/stream & unix/seqpacket"
The regressions in aio(4) and kernel RPC aren't a 5 minute problem.

This reverts commit d80a97def9.
This reverts commit d1cbb17a87.
This reverts commit fb8a8333b4.
2024-04-09 13:15:47 -07:00
Stephen J. Kiernan 81b4d1c4d4 sockets: Add hhook in sonewconn for inheriting OSD specific data
Added HHOOK_SOCKET_NEWCONN and bumped HHOOK_SOCKET_LAST

Reviewed by:	glebius, tuexen
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D44632
2024-04-08 21:31:34 -04:00
Gleb Smirnoff fb8a8333b4 unix: return immediately on MSG_OOB
Jumping to cleanup routines will work on uninitialized stack mc.

Fixes:	d80a97def9
Reported-by:	syzbot+4adf0b37849ea7723586@syzkaller.appspotmail.com
2024-04-08 17:09:16 -07:00
Gleb Smirnoff d1cbb17a87 unix: fix the ad hoc STAILQ_PREPEND()
If there is nothing to prepend, don't try STAILQ_INSERT_HEAD().

Fixes:	d80a97def9
Reported-by: syzbot+bb7f3d07c79b5faf8de8@syzkaller.appspotmail.com
2024-04-08 17:02:00 -07:00
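For d1cbb17a87, the guard against an empty source list can be sketched with the STAILQ macros from <sys/queue.h>. This is not the code from the commit: sketch_prepend() and the node type are hypothetical, and the sketch uses two STAILQ_CONCAT() calls rather than STAILQ_INSERT_HEAD(). It assumes both heads were set up with STAILQ_INIT().

```c
#include <stddef.h>
#include <sys/queue.h>

struct sketch_node {
	int value;
	STAILQ_ENTRY(sketch_node) link;
};
STAILQ_HEAD(sketch_list, sketch_node);

/*
 * Move every element of "src" in front of the elements of "dst".
 * With an empty "src" there is nothing to link in, so return early
 * and leave "dst" untouched.
 */
static void
sketch_prepend(struct sketch_list *dst, struct sketch_list *src)
{
	if (STAILQ_EMPTY(src))
		return;
	/* Append dst's elements to src, which leaves dst empty... */
	STAILQ_CONCAT(src, dst);
	/* ...then hand the combined list back to dst, emptying src. */
	STAILQ_CONCAT(dst, src);
}
```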
Gleb Smirnoff d80a97def9 unix: new implementation of unix/stream & unix/seqpacket
Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
of SOCK_STREAM.  The change meets three goals: get rid of unix(4) specific
stuff in the generic socket code, provide faster and more robust unix/stream
sockets and bring unix/seqpacket much closer to specification.  Highlights
follow:

- The send buffer now is truly bypassed.  Previously it was always empty,
but the send(2) still needed to acquire its lock and do a variety of
tricks to be woken up at the right time while sleeping on it.  Now the
only two things we care about in the send buffer are the I/O sx(9) lock
that serializes operations and the value of so_snd.sb_hiwat, which we can read
without obtaining a lock.  The sleep of a send(2) happens on the mutex of
the receive buffer of the peer.  A bulk send/recv of data with large
socket buffers will make both syscalls just bounce between owning the
receive buffer lock and copyin(9)/copyout(9), no other locks would be
involved.

- The implementation uses new mchain structure to manipulate mbuf chains.
Note that this required converting to mchain two functions that are shared
with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding
a new shared one uipc_process_kernel_mbuf().  This induces some non-
functional changes in the unix/dgram code as well.  There is a space for
improvement here, as right now it is a mix of mchain and manually managed
mbuf chains.

- unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated
as a datagram socket by the generic socket code, now becomes a true stream
socket with record markers.

- unix/stream loses the sendfile(2) support.  This can be brought back,
but requires some work.  Let's first see if there is any interest in this
feature, except purely academic.

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44151
2024-04-08 13:16:51 -07:00
Gleb Smirnoff aba79b0f4a mbuf: provide mc_uiotomc(), a function to copy from uio(9) to mchain
Implement m_uiotombuf() as a wrapper around mc_uiotomc().  The M_EXTPG is
left untouched.  The m_uiotombuf() is left as a compat KPI.  New code
should use either mc_uiotomc() or m_uiotombuf_nomap().

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44150
2024-04-08 13:16:51 -07:00
Gleb Smirnoff 71f8702f49 mbuf: provide mc_get() that allocates struct mchain of given length
Implement m_getm2(), which is widely used via m_getm() macro, as a wrapper
around mc_get().  New code is advised to use mc_get().

Reviewed by:		markj, tuexen
Differential Revision:	https://reviews.freebsd.org/D44149
2024-04-08 13:16:51 -07:00
Gleb Smirnoff fd01798fc4 mbuf: add mc_split() that works on two struct mchain
It preserves tail points and all length/memory accounting, so that caller
doesn't need to do any extra traversals.  It doesn't respect M_PKTHDR but
it may be improved if needed.  It respects M_EOR, though.  First consumer
will be the new unix(4) SOCK_STREAM and SOCK_SEQPACKET.

Also provide a much simpler mc_concat() that glues two chains back together.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D44148
2024-04-08 13:16:51 -07:00
Gleb Smirnoff 3b7aa842e2 sendfile: mark it explicitly as a TCP only feature
Back in 2015 when it turned non-blocking, it was working with PF_UNIX
and it may still work.  However, the usefulness of such an application
of sendfile(2) is questionable.  Disable the feature while unix/stream
is under refactoring.

Relnotes:	yes
2024-04-08 13:16:51 -07:00
Jake Freeland aa32d7cbc9 ktrace: Record socket violations with KTR_CAPFAIL
Report restricted access to socket addresses and protocols while
Capsicum violation tracing with CAPFAIL_ADDR and CAPFAIL_PROTO.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40681
2024-04-07 18:52:51 -05:00
Jake Freeland 0cd9cde767 ktrace: Record namei violations with KTR_CAPFAIL
Report namei path lookups while Capsicum violation tracing with
CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic
capability mode behavior.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40680
2024-04-07 18:52:51 -05:00
Jake Freeland 6a4616a529 ktrace: Record signal violations with KTR_CAPFAIL
Report the delivery of signals to processes other than self while
Capsicum violation tracing with CAPFAIL_SIGNAL.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40679
2024-04-07 18:52:51 -05:00
Jake Freeland 05296a0ff6 ktrace: Record syscall violations with KTR_CAPFAIL
Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40678
2024-04-07 18:52:51 -05:00
Jake Freeland 96c8b3e509 ktrace: Record cpuset violations with KTR_CAPFAIL
Report Capsicum violations in the cpuset namespace with CAPFAIL_CPUSET.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40677
2024-04-07 18:52:51 -05:00
Jake Freeland 9bec841312 ktrace: Record detailed ECAPMODE violations
When a Capsicum violation occurs in the kernel, ktrace will now record
detailed information pertaining to the violation.

For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient
  sockaddr.

For all violations, the syscall and ABI are recorded.

kdump is also modified to display this new information to the user.

Reviewed by:	oshogbo, markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40676
2024-04-07 18:52:51 -05:00
Michael Tuexen 681711b77c uipc_socket: handle socket buffer locks in sopeeloff
PR:			278171
Reviewed by:		markj
Fixes:			a4fc41423f ("sockets: enable protocol specific socket buffers")
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D44640
2024-04-05 18:20:19 +02:00
Konstantin Belousov 235436d631 stop_all_proc(): skip traced or signal-stopped processes
thread_single(SINGLE_ALLPROC) has ignored them since 9241ebc796,
and there is not much we can do for a debugger-controlled process.

Noted by:	olce
Reviewed by:	markj, olce
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44638
2024-04-05 17:52:39 +03:00
Mark Johnston 08f3d5b60c copy_file_range: Call vn_rdwr() at least once
This ensures that we invoke VOP_READ on the input file even if it's
empty, which in turn helps ensure that filesystems update the atime of
the file.

PR:		274615
Reviewed by:	olce, rmacklem, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D43524
2024-04-04 17:03:07 -04:00
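The control-flow change behind 08f3d5b60c can be modelled in userspace. Everything here is hypothetical (sketch_read_chunk() stands in for the per-chunk read and simply counts calls); the sketch shows only why a do/while loop issues one read even for a zero-length copy, whereas a plain while loop would issue none.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-chunk read; counts how often it runs. */
static int sketch_read_calls;

static size_t
sketch_read_chunk(size_t remaining, size_t chunk)
{
	sketch_read_calls++;
	return (remaining < chunk ? remaining : chunk);
}

/*
 * Copy "len" bytes in chunks.  The do/while form performs one read even
 * when len is 0, which is where a filesystem would get the chance to
 * update the access time of the input file.
 */
static size_t
sketch_copy(size_t len, size_t chunk)
{
	size_t copied = 0, n;

	do {
		n = sketch_read_chunk(len - copied, chunk);
		copied += n;
	} while (n > 0 && copied < len);
	return (copied);
}
```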
Lawrence Stewart 7eb92c502e Reinstate returning EOVERFLOW from stats_v1_blob_clone()
a0993376ec (from D43179) subtly changed stats_v1_blob_clone() to stop returning EOVERFLOW in the case where the user buffer is not large enough to receive the entire statsblob. As a result, consumers that are implemented to retry on receiving EOVERFLOW instead give up after receiving an empty statsblob header.

Fix by latching any errors recorded prior to copyout.

Reviewed by:	markj
Obtained from:	Netflix, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44585
Fixes:	a0993376ec ("stats: Check for errors from copyout()")
2024-04-03 12:58:26 +11:00
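The "latch errors prior to copyout" idea from 7eb92c502e can be sketched as follows. This is a userspace model, not the stats(3) code: sketch_copyout() is a hypothetical stand-in for copyout(9) that always succeeds, and sketch_clone() is an invented name. An error noticed before the copy is kept unless the copy itself fails.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copyout(9); always succeeds in this model. */
static int
sketch_copyout(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
	return (0);
}

/*
 * Copy as much of "blob" as fits into "dst".  An EOVERFLOW detected
 * before the copy is latched and only replaced if the copy fails.
 */
static int
sketch_clone(const char *blob, size_t bloblen, char *dst, size_t dstlen)
{
	size_t n = bloblen;
	int error = 0, copyerr;

	if (dstlen < bloblen) {
		error = EOVERFLOW;	/* remember the short buffer */
		n = dstlen;
	}
	copyerr = sketch_copyout(blob, dst, n);
	if (copyerr != 0)
		error = copyerr;	/* a copy failure takes precedence */
	return (error);
}
```

A caller that retries with a larger buffer on EOVERFLOW keeps working under this scheme, which is the behaviour the commit message says was lost.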
Mark Johnston 7ef5c19b21 kern linker: Don't invoke dtors without having invoked ctors
I have a kernel module which fails to load because of an unrecognized
relocation type.  link_elf_load_file() fails before the module's ctors
are invoked and it calls linker_file_unload(), which causes the module's
dtors to be executed, resulting in a kernel panic.

Add a flag to the linker file to ensure that dtors are not invoked if
unloading due to an error prior to ctors being invoked.

At the moment I only implemented this for link_elf_obj.c since
link_elf.c doesn't invoke dtors, but I refactored link_elf.c to make
them more similar.

Fixes:		9e575fadf4 ("link_elf_obj: Invoke fini callbacks")
Reviewed by:	zlei, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D44559
2024-03-31 14:15:11 -04:00
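The guard added by 7ef5c19b21 amounts to a single flag on the linker file. The struct and function names below are hypothetical and the model runs in userspace; it shows only that the unload path skips destructors when constructors never ran.

```c
#include <stdbool.h>

/* Hypothetical miniature of a linker file; field names are illustrative. */
struct sketch_file {
	bool ctors_invoked;	/* set once constructors have run */
	int ctor_runs;
	int dtor_runs;
};

static void
sketch_run_ctors(struct sketch_file *lf)
{
	lf->ctor_runs++;
	lf->ctors_invoked = true;
}

/* Unload path: destructors run only if constructors ran first. */
static void
sketch_unload(struct sketch_file *lf)
{
	if (!lf->ctors_invoked)
		return;
	lf->dtor_runs++;
	lf->ctors_invoked = false;
}
```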
Alan Cox e0388a906c arm64: enable superpage mappings by pmap_mapdev{,_attr}()
In order for pmap_kenter{,_device}() to create superpage mappings,
either 64 KB or 2 MB, pmap_mapdev{,_attr}() must request appropriately
aligned virtual addresses.

Reviewed by:	markj
Tested by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D42737
2024-03-30 15:41:30 -05:00
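The alignment requirement in e0388a906c can be illustrated with plain arithmetic. The selection policy below is an assumption made for the sketch (pick the largest of 2 MB, 64 KB, or a 4 KB base page that the mapping size reaches), and all names are hypothetical; the commit itself states only that appropriately aligned virtual addresses must be requested.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE (UINT64_C(4) * 1024)		/* assumes 4 KB base pages */
#define SKETCH_L3C_SIZE  (UINT64_C(64) * 1024)		/* 64 KB superpage */
#define SKETCH_L2_SIZE   (UINT64_C(2) * 1024 * 1024)	/* 2 MB superpage */

/* Pick the largest alignment the mapping is big enough to benefit from. */
static uint64_t
sketch_pick_align(uint64_t size)
{
	if (size >= SKETCH_L2_SIZE)
		return (SKETCH_L2_SIZE);
	if (size >= SKETCH_L3C_SIZE)
		return (SKETCH_L3C_SIZE);
	return (SKETCH_PAGE_SIZE);
}

/* Round "va" up to a power-of-two alignment. */
static uint64_t
sketch_align_up(uint64_t va, uint64_t align)
{
	return ((va + align - 1) & ~(align - 1));
}
```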
Konstantin Belousov 9241ebc796 thread_single(9): decline external requests for traced or debugger-stopped procs
A debugger has the power to cause unbounded delay in single-threading,
which then blocks the threaded taskqueue.  The reproducer is
`truss -f timeout 2 sleep 10`.

Reported by:	mjg
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44523
2024-03-30 16:43:52 +02:00
Bojan Novković bdc903460b kern_ctf.c: Don't print out warning messages unconditionally
The kernel CTF loading routines print various warnings when attempting
to load CTF data from an ELF file. After the changes in c21bc6f3c2
those warnings are unnecessarily printed for each kernel module
that was compiled without CTF data.

The kernel linker already uses the bootverbose flag to conditionally
print CTF loading errors. This patch alters kern_ctf.c
routines to do the same.

Reported by:	Alexander@leidinger.net
Approved by:	markj (mentor)
Fixes: c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
2024-03-29 20:32:18 +01:00
Gleb Smirnoff 1a8d176432 inpcb: fully retire inp_ppcb pointer
Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f, e68b379244, 483fe96511) this pointer used to point
at it.

Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>.  The
exp-run detected a minimal use of the field in ports:
  * sysutils/lsof - patched upstream
  * net-mgmt/netdata  - patch accepted upstream
  * emulators/qemu-user-static - upstream master branch seems not using
    the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.

PR:			277659 (exp-run)
Reviewed by:		tuexen
Differential Revision:	https://reviews.freebsd.org/D44491
2024-03-29 12:18:32 -07:00
Bojan Novković 722b8e3cb6 Fix style nits in kern_linker.c
Reported by:	jrtc27
Fixes:	c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
Approved by:	markj (mentor)
2024-03-28 20:36:30 +01:00
Stephen J. Kiernan 2aee804c9e kerneldump: Add flag to indicate kernel core was successfully dumped
This allows for shutdown_final EVENTHANDLERs to know that a core dump
successfully occurred. Embedded systems may want to record this fact
or act on it.

Obtained from:	Juniper Networks, Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44542
2024-03-28 14:11:16 -04:00
Randall Stewart b7b78c1c16 Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold
HPTS inserts a softclock for system call return that optimizes performance. However, when
no HPTS threads need the help (i.e. when they have fewer than 100 or so connections), then
there should be little work done, i.e. check the counter and return instead of running through
all the threads getting locks etc.

Reported by:    eduardo
Reviewed by:    gallatin, glebius, tuexen
Tested by:      gallatin
Differential Revision: https://reviews.freebsd.org/D44420
2024-03-28 08:12:37 -04:00