See commit ae77041e07 ("kthread: Set *newtdp earlier in
kthread_add1()") for details. That commit was incomplete since
g_init()'s first call to kproc_kthread_add() will cause
kproc_kthread_add() to take the `*procptr == NULL` branch, which avoids
kthread_create().
To ensure that the thread pointer is initialized before the thread
starts running, we have to start the kernel process with RFSTOPPED.
We could perhaps go further and use RFSTOPPED only when tdptr != NULL,
but it's probably better to have consistent behaviour.
Reviewed by: olce, kib
Reported by: syzbot+e91e798f3c088215ace6@syzkaller.appspotmail.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44927
The fact that an accept filter needs to be cleared first before setting to
a different one isn't properly documented. The requirement that the
socket needs already be listening, although trivial, isn't documented
either. At least return a more meaningful error than EINVAL for an
existing filter. Cover this with a test case.
This was missed when read/write, etc were updated to return ssize_t.
Fixes: 2e83b28161 Fix a few syscall arguments to use size_t instead of u_int.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D44930
The flag allows the pid argument to designate a thread from the calling
process. The flag value is carved from the high bit of the signal
number, which slightly changes the ABI of syscall.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44867
Remove extra sys/param.h, provided by sys/systm.h.
Order the rest alphabetically.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44867
I didn't notice this during testing because invariants-enabled kernels
implicitly include asan.h via kassert.h.
Reported by: Lexi Winter <lexi@le-Fay.org>
Fixes: 800da341bc ("thread: Simplify sanitizer integration with thread creation")
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA. In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.
This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
it.
- Avoid redundant shadow stack initialization: thread_alloc()
initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
through vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().
No functional change intended.
Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44891
In fork1(), if a thread is reused and thread_alloc_stack() is not
called, mark the reused thread's kstack pages clean in the KASAN shadow
buffer.
Sponsored by: Juniper Networks, Inc.
MFC after: 3 days
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D44875
This declares an API for libsys which currently consists of
__sys_<foo>() declarations for system call stubs and function pointer
typedefs of the form __sys_<foo>_t. The vast majority of the
implementation resides in a generated _libsys.h which ensures that all
system call stub declarations match syscalls.master.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44387
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.
sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465
An initial reading of the preamble of sys_procctl() gives the impression
that no test prevents a malicious user from passing a negative commands
index (in 'uap->com'), which is soon used as an index into the static
array procctl_cmds_info[].
However, a closer examination leads to the conclusion that the existing
code is technically correct. Indeed, the comparison of 'uap->com' to
the nitems() expression, which expands to a ratio of sizeof(), leads to
a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic
Conversions/Integer Promotions applied by '<=', because sizeof() returns
'size_t' values, and we define 'size_t' as an equivalent of 'unsigned
int' (which is not mandated by the standard, the latter allowing, e.g.,
integers of lower ranks).
With this conversion, negative values of 'uap->com' are automatically
ruled-out since they are converted to very big unsigned integers which
are caught by the test. An analysis of assembly code produced by LLVM
16 on amd64 and practical tests confirm that no exploitation is possible.
However, the guard code as written is misleading to readers and might
trip up static analysis tools. Make sure that negative values are
explicitly excluded so that it is immediately clear that EINVAL will be
returned in this case.
Build tested with clang 16 and GCC 12.
Approved by: markj (mentor)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
If userpath is not SHM_ANON, then copy it in early so ktrace(2) can
record it. Without this change, ktrace(2) will attempt to strcpy a
userspace string and trigger a page fault.
Reported by: syzbot+490b9c2a89f53b1b9779@syzkaller.appspotmail.com
Fixes: 0cd9cde767
Approved by: markj (mentor)
Reviewed by: markj
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44702
The regressions in aio(4) and kernel RPC aren't a 5 minute problem.
This reverts commit d80a97def9.
This reverts commit d1cbb17a87.
This reverts commit fb8a8333b4.
Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX
SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
of SOCK_STREAM. The change meets three goals: get rid of unix(4) specific
stuff in the generic socket code, provide a faster and robust unix/stream
sockets and bring unix/seqpacket much closer to specification. Highlights
follow:
- The send buffer now is truly bypassed. Previously it was always empty,
but the send(2) still needed to acquire its lock and do a variety of
tricks to be woken up in the right time while sleeping on it. Now the
only two things we care about in the send buffer is the I/O sx(9) lock
that serializes operations and value of so_snd.sb_hiwat, which we can read
without obtaining a lock. The sleep of a send(2) happens on the mutex of
the receive buffer of the peer. A bulk send/recv of data with large
socket buffers will make both syscalls just bounce between owning the
receive buffer lock and copyin(9)/copyout(9), no other locks would be
involved.
- The implementation uses new mchain structure to manipulate mbuf chains.
Note that this required converting to mchain two functions that are shared
with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding
a new shared one uipc_process_kernel_mbuf(). This induces some non-
functional changes in the unix/dgram code as well. There is a space for
improvement here, as right now it is a mix of mchain and manually managed
mbuf chains.
- unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated
as a datagram socket by the generic socket code, now becomes a true stream
socket with record markers.
- unix/stream loses the sendfile(2) support. This can be brought back,
but requires some work. Let's first see if there is any interest in this
feature, except purely academical.
Reviewed by: markj, tuexen
Differential Revision: https://reviews.freebsd.org/D44151
Implement m_uiotombuf() as a wrapper around mc_uiotomc(). The M_EXTPG is
left untouched. The m_uiotombuf() is left as a compat KPI. New code
should use either mc_uiotomc() or m_uiotombuf_nomap().
Reviewed by: markj, tuexen
Differential Revision: https://reviews.freebsd.org/D44150
Implement m_getm2(), which is widely used via m_getm() macro, as a wrapper
around mc_get(). New code is advised to use mc_get().
Reviewed by: markj, tuexen
Differential Revision: https://reviews.freebsd.org/D44149
It preserves tail points and all length/memory accounting, so that caller
doesn't need to do any extra traversals. It doesn't respect M_PKTHDR but
it may be improved if needed. It respects M_EOR, though. First consumer
will be the new unix(4) SOCK_STREAM and SOCK_SEQPACKET.
Also provide much more simple mc_concat() that glues two chains back.
Reviewed by: markj
Differentail Revision: https://reviews.freebsd.org/D44148
Back in 2015 when it turned non-blocking, it was working with PF_UNIX
and it may still work. However, the usefullness of such application
of sendfile(2) is questionable. Disable the feature while unix/stream
is under refactoring.
Relnotes: yes
Report the delivery of signals to processes other than self while
Capsicum violation tracing with CAPFAIL_SIGNAL.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40679
Report syscalls that are not allowed in capability mode with
CAPFAIL_SYSCALL.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40678
When a Capsicum violation occurs in the kernel, ktrace will now record
detailed information pertaining to the violation.
For example:
- When a namei lookup violation occurs, ktrace will record the path.
- When a signal violation occurs, ktrace will record the signal number.
- When a sendto(2) violation occurs, ktrace will record the recipient
sockaddr.
For all violations, the syscall and ABI is recorded.
kdump is also modified to display this new information to the user.
Reviewed by: oshogbo, markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40676
Since thread_single(SINGLE_ALLPROC) ignores them since 9241ebc796,
and there is not much we can do for the debugger-controlled process.
Noted by: olce
Reviewed by: markj, olce
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44638
This ensures that we invoke VOP_READ on the input file even if it's
empty, which in turn helps ensure that filesystems update the atime of
the file.
PR: 274615
Reviewed by: olce, rmacklem, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D43524
a0993376ec (from D43179) subtly changed stats_v1_blob_clone() to stop returning EOVERFLOW in the case where the user buffer is not large enough to receive the entire statsblob. This results in any consumers which are implemented to retry on receiving EOVERFLOW to instead give up after receiving an empty statsblob header.
Fix by latching any errors recorded prior to copyout.
Reviewed by: markj
Obtained from: Netflix, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44585
Fixes: a0993376ec ("stats: Check for errors from copyout()")
I have a kernel module which fails to load because of an unrecognized
relocation type. link_elf_load_file() fails before the module's ctors
are invoked and it calls linker_file_unload(), which causes the module's
dtors to be executed, resulting in a kernel panic.
Add a flag to the linker file to ensure that dtors are not invoked if
unloading due to an error prior to ctors being invoked.
At the moment I only implemented this for link_elf_obj.c since
link_elf.c doesn't invoke dtors, but I refactored link_elf.c to make
them more similar.
Fixes: 9e575fadf4 ("link_elf_obj: Invoke fini callbacks")
Reviewed by: zlei, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44559
In order for pmap_kenter{,_device}() to create superpage mappings,
either 64 KB or 2 MB, pmap_mapdev{,_attr}() must request appropriately
aligned virtual addresses.
Reviewed by: markj
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D42737
Debugger has the powers to cause unbound delay in single-threading,
which then blocks the threaded taskqueue. The reproducer is
`truss -f timeout 2 sleep 10`.
Reported by: mjg
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44523
The kernel CTF loading routines print various warnings when attempting
to load CTF data from an ELF file. After the changes in c21bc6f3c2
those warnings are unnecessarily printed for each kernel module
that was compiled without CTF data.
The kernel linker already uses the bootverbose flag to conditionally
print CTF loading errors. This patch alters kern_ctf.c
routines to do the same.
Reported by: Alexander@leidinger.net
Approved by: markj (mentor)
Fixes: c21bc6f3c2 ("ddb: Add CTF-based pretty printing")
Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f, e68b379244, 483fe96511) this pointer used to point
at it.
Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The
exp-run detected a minimal use of the field in ports:
* sysutils/lsof - patched upstream
* net-mgmt/netdata - patch accepted upstream
* emulators/qemu-user-static - upstream master branch seems not using
the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.
PR: 277659 (exp-run)
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44491
This allows for shutdown_final EVENTHANDLERs to know that a core dump
successfully occurred. Embedded systems may want to record this fact
or act on it.
Obtained from: Juniper Networks, Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D44542
HPTS inserts a softclock for system call return that optimizes performance. However when
no HPTS threads need the help (i.e. when they have less than 100 or so connections) then
there should be little work done i.e. check the counter and return instead of running through
all the threads getting locks etc.ptimize HPTS so that little work is done until we have a hpts
thread that is over the connection threshold.
Reported by: eduardo
Reviewed by: gallatin, glebius, tuexen
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D44420