system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-10-15 21:05:08 +00:00

Author	SHA1	Message	Date
Elliott Mitchell	38c35248fe	kern/intr: remove support for passing trap frame as argument While otherwise a handy potential approach, getting the trap frame via the argument isn't documented and isn't supposed to be used. With all uses removed, now remove support to end the mixed calling conventions. Differential Revision: https://reviews.freebsd.org/D37688 Reviewed by: imp, mhorne Pull Request: https://github.com/freebsd/freebsd-src/pull/1225	2024-05-10 15:33:24 -06:00
John Baldwin	473c90ac04	uio: Use switch statements when handling UIO_READ vs UIO_WRITE This is mostly to reduce the diff with CheriBSD which adds additional constants to enum uio_rw, but also matches the normal style used for uio_segflg. Reviewed by: kib, emaste Obtained from: CheriBSD Differential Revision: https://reviews.freebsd.org/D45142	2024-05-10 13:43:36 -07:00
Isaac Cilia Attard	6437872c1d	New sysctl to disable NOMATCH until devmatch runs Introduce hw.bus.devctl_nomatch_enabled and use it to suppress NOMATCH until devmatch runs There's a lot of NOMATCH events generated at boot. We also run devmatch once during early boot to load unmatched devices. To avoid redundant work, don't start generating NOMATCH events until after devmatch runs. Set hw.bus.devctl_nomatch_enabled=1 just before we run devmatch. The kernel will suppress NOMATCH events until this is set to true. This saves about 170ms from the boot on aarch64 running atop Apple M-series processors and the VMWare Fusion hypervisor. Reviewed by: imp, cperciva MFC after: 3 days Sponsored by: Google Summer of Code Pull Request: https://github.com/freebsd/freebsd-src/pull/1213	2024-05-09 17:56:40 -07:00
Elliott Mitchell	9f3a552f9e	intrng: switch flag arguments to unsigned The flag variables behind these are all unsigned. As such adjust the declarations to match reality and reduce the number of mismatches. Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1126	2024-05-09 17:14:38 -06:00
Elliott Mitchell	a9e0f316b3	kern/intr: redeclare intr_setaffinity()'s third arg constant This matches reality and allows removal of a __DECONST(). Fixes: `4c72d075a5` ("LinuxKPI: const argument to irq_set_affinity_hint()") Fixes: `9b33b154b5` ("Add support to cpuset for binding hardware interrupts") Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1126	2024-05-09 17:14:35 -06:00
Elliott Mitchell	cd04887b95	kern/intr: change ->ie_irq to unsigned All architecture implementations actually want this to be unsigned. INTRNG the equivalent is overtly unsigned. x86 and PowerPC merely avoid the need to explicitly convert at several points. Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1126	2024-05-09 17:14:33 -06:00
Mitchell Horne	a77e1f0f81	busdma: better handling of small segment bouncing Typically, when a DMA transaction requires bouncing, we will break up the request into segments that are, at maximum, page-sized. However, in the atypical case of a driver whose maximum segment size is smaller than PAGE_SIZE, we end up inefficiently assigning each segment its own bounce page. For example, the dwmmc driver has a maximum segment size of 2048 (PAGE_SIZE / 2); a 4-page transfer ends up requiring 8 bounce pages in the current scheme. We should attempt to batch segments into bounce pages more efficiently. This is achieved by pushing all considerations of the maximum segment size into the new _bus_dmamap_addsegs() function, which wraps _bus_dmamap_addseg(). Thus we allocate the minimal number of bounce pages required to complete the entire transfer, while still performing the transfer with smaller-sized transactions. For most drivers with a segment size >= PAGE_SIZE, this will have no impact. For drivers like dwmmc mentioned above, this improves the memory and performance efficiency when bouncing a large transfer. Co-authored-by: jhb Reviewed by: jhb MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45048	2024-05-07 13:02:57 -03:00
Mitchell Horne	5604069824	busdma: deduplicate _bus_dmamap_addseg() function It is functionally identical in all implementations, so move the function to subr_busdma_bounce.c. The KASSERT present in the x86 version is now enabled for all architectures. It should be universally applicable. Reviewed by: jhb MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45047	2024-05-07 13:02:57 -03:00
Gleb Smirnoff	99b0270adc	sockets: hide socket hhook(9)s under SOCKET_HHOOK There are no in-tree consumers of these hooks. Reviewed by: stevek Differential Revision: https://reviews.freebsd.org/D44928	2024-05-06 12:49:29 -07:00
John Baldwin	51346bd594	mbuf: Add EXT_CTL for mbufs backed by a CTL backend buffer This is somewhat similar to EXT_NET_DRV, but CTL isn't a network driver. Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44725	2024-05-02 16:38:30 -07:00
Mark Johnston	d5eae57088	sysctl: Make sysctl_ctx_free() a bit safer Clear the list before returning so that sysctl_ctx_free() can be called more than once on the same list without side effects. This simplifies error handling in drivers; previously, drivers would have to be careful to call sysctl_ctx_free() at most once to avoid a use-after-free. While here, use TAILQ_FOREACH_SAFE in the loop which unregisters OIDs. Reviewed by: thj, emaste MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D45041	2024-05-02 15:42:28 -04:00
Josef 'Jeff' Sipek	0fe60dc655	fattime: fix fattime to timespec conversion of dates beyond 2106-02-06 It turns out that the only conversion issue was in fattime2timespec, where multiplying the number of seconds in a day by the number of days overflowed 32-bit unsigned int for dates beyond 2106-02-07 06:28:15. Casting one of the multiplicands as time_t forces a 64-bit multiplication on systems where time_t is 64-bits and produces no binary changes on the one remaining system with 32-bit time_t (namely i386). Since the code is now tested & fixed, this change removes the fixme comments. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44755	2024-05-01 07:56:41 +03:00
Josef 'Jeff' Sipek	9d1396c346	fattime: make the test code check beyond 32-bit time_t limits On systems that have a 64-bit time_t, the test code now exercises the whole range of fattime. To do this, this commit... 1. replaces the call to random() with two calls to arc4random() to generate a 33-bit number of seconds in order to cover the entire range of fattime [1970,2107]. (32-bits stops just short - in January 2106.) On systems with 32-bit time_t, the extra bits are discarded and only the time_t expressible range is tested. 2. casts time_t values passed to printf as longs and changes the format string to match. Now, the test code builds, runs, and exercises what it can (i.e., the whole fattime range or the 32-bit time_t subset of it) on both 32-bit and 64-bit time_t systems. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44754	2024-05-01 07:56:41 +03:00
Josef 'Jeff' Sipek	7b8b613d08	fattime: make the test code build again This change... 1. replaces calls to timet2fattime/fattime2timet with calls to timespec2fattime/fattime2timespec. The functions got renamed shortly after they landed in the kernel but the test code wasn't updated (see `7ea93e912b`). 2. adds a utc_offset stub. With this, the test code builds and runs as a 32-bit binary (cc -Wall -O2 -m32 subr_fattime.c). Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44753	2024-05-01 07:56:41 +03:00
Justin Hibbits	2cb4909011	cons: Add boot option to mute boot messages after banner This is useful for embedded systems, where it provides feedback that the kernel has booted, but avoids printing the probe messages. If both mutemsgs and verbose are set, verbose cancels the mute. Additionally, this unmutes the console on panic, so a user can see what happened leading up to the panic. Obtained from: Juniper Networks, Inc.	2024-04-30 16:23:47 -04:00
Andrew Gallatin	13a5a46c49	Fix new users of MAXPHYS and hide it from the kernel namespace In `cd85379104`, kib made maxphys a load-time tunable. This made the #define MAXPHYS in sys/param.h almost entirely obsolete, as it could now be overridden by kern.maxphys at boot time, or by opt_maxphys.h. However, decades of tradition have led to several new, incorrect, uses of MAXPHYS in other parts of the kernel, mostly by seasoned developers. I've corrected those uses here in a mechanical fashion, and verified that it fixes a bug in the md driver that I was experiencing. Since using MAXPHYS is such an easy mistake to make, it is best to hide it from the kernel namespace. So I've moved its definition to _maxphys.h, which is now included in param.h only for userspace. That brings up the fact that lots of userspace programs use MAXPHYS for different reasons, most of them probably wrong. Userspace consumers that really need to know the value of maxphys should probably be changed to use the kern.maxphys sysctl. But that's outside the scope of this change. Reviewed by: imp, jkim, kib, markj Fixes: `30038a8b4e` ("md: Get rid of the pbuf zone") Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D44986	2024-04-30 15:29:06 -04:00
Konstantin Belousov	5b3e5c6ce3	kcmp_pget(): do not accept TIDs Otherwise pget() might still look up and hold the current process. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2024-04-30 10:07:03 +03:00
Konstantin Belousov	1e01650a78	kcmp_pget(): add an assert that we did not hold the current process Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2024-04-30 10:06:52 +03:00
Mark Johnston	d66399326c	kthread: Set tdptr earlier in kproc_kthread_add() See commit `ae77041e07` ("kthread: Set newtdp earlier in kthread_add1()") for details. That commit was incomplete since g_init()'s first call to kproc_kthread_add() will cause kproc_kthread_add() to take the `*procptr == NULL` branch, which avoids kthread_create(). To ensure that the thread pointer is initialized before the thread starts running, we have to start the kernel process with RFSTOPPED. We could perhaps go further and use RFSTOPPED only when tdptr != NULL, but it's probably better to have consistent behaviour. Reviewed by: olce, kib Reported by: syzbot+e91e798f3c088215ace6@syzkaller.appspotmail.com MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44927	2024-04-25 09:35:38 -04:00
Gleb Smirnoff	19307b86d3	accept_filter: return different errors for non-listener and a busy socket The fact that an accept filter needs to be cleared first before setting to a different one isn't properly documented. The requirement that the socket needs already be listening, although trivial, isn't documented either. At least return a more meaningful error than EINVAL for an existing filter. Cover this with a test case.	2024-04-24 21:55:58 -07:00
Brooks Davis	78101d437a	syscalls.master: correct return type of {read,write}v This was missed when read/write, etc were updated to return ssize_t. Fixes: `2e83b28161` Fix a few syscall arguments to use size_t instead of u_int. Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D44930	2024-04-24 20:48:46 +01:00
Konstantin Belousov	6b0cf2a237	vfs_lookup.c: only call ktrcapfail() if KTRACE is enabled Reviewed by: emaste, imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44931	2024-04-24 22:43:32 +03:00
Konstantin Belousov	66df81021e	sys/namei.h: move NI_CAP_VIOLATION() macro from namei.h to vfs_lookup.c Reviewed by: emaste, imp, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44931	2024-04-24 22:43:31 +03:00
Mark Johnston	8ef2c02182	busdma: uma_zcreate() does not fail No functional change intended. MFC after: 1 week	2024-04-24 08:46:41 -04:00
Mark Johnston	1e607a0753	khelp: uma_zcreate() does not fail No functional change intended. MFC after: 1 week	2024-04-24 08:46:35 -04:00
Gleb Smirnoff	a8acc2bf56	sockets: inherit SO_ACCEPTFILTER from listener to child This is crucial for operation of accept_filter(9). See added comment. Fixes: `d29b95ecc0`	2024-04-23 17:17:14 -07:00
Konstantin Belousov	53186bc143	sigqueue(2): add impl-specific flag __SIGQUEUE_TID The flag allows the pid argument to designate a thread from the calling process. The flag value is carved from the high bit of the signal number, which slightly changes the ABI of syscall. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867	2024-04-23 19:51:09 +03:00
Konstantin Belousov	0c11c1792b	kern_thr.c: normalize includes Remove extra sys/param.h, provided by sys/systm.h. Order the rest alphabetically. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867	2024-04-23 19:51:07 +03:00
Konstantin Belousov	2effad53b4	kern_thr.c/kern_sig.c: remove sys/cdefs.h Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867	2024-04-23 19:51:05 +03:00
Konstantin Belousov	53e0938b0b	kern_thread.c: remove unneeded include of sys/param.h Handled by sys/systm.h already. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44867	2024-04-23 19:51:03 +03:00
Mark Johnston	7a7063cc54	thread: Add a missing include of asan.h I didn't notice this during testing because invariants-enabled kernels implicitly include asan.h via kassert.h. Reported by: Lexi Winter <lexi@le-Fay.org> Fixes: `800da341bc` ("thread: Simplify sanitizer integration with thread creation")	2024-04-22 13:07:53 -04:00
Mark Johnston	800da341bc	thread: Simplify sanitizer integration with thread creation fork() may allocate a new thread in one of two ways: from UMA, or cached in a freed proc that was just allocated from UMA. In either case, KASAN and KMSAN need to initialize some state; in particular they need to initialize the shadow mapping of the new thread's stack. This is done differently between KASAN and KMSAN, which is confusing. This patch improves things a bit: - Add a new thread_recycle() function, which moves all kernel stack handling out of kern_fork.c, since it doesn't really belong there. - Then, thread_alloc_stack() has only one local caller, so just inline it. - Avoid redundant shadow stack initialization: thread_alloc() initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even though vm_thread_new() already did that. - Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc(). No functional change intended. Reviewed by: khng MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44891	2024-04-22 11:46:59 -04:00
Gordon Bergling	9576fc16ca	uipc_domain: Fix a typo in a source code comment - s/cant/can't/ MFC after: 3 days	2024-04-21 09:51:14 +02:00
Ka Ho Ng	68a3a7fc94	kasan: fix false-positive kasan_report upon thread reuse In fork1(), if a thread is reused and thread_alloc_stack() is not called, mark the reused thread's kstack pages clean in the KASAN shadow buffer. Sponsored by: Juniper Networks, Inc. MFC after: 3 days Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D44875	2024-04-19 15:18:27 -04:00
Mark Johnston	e411b22736	uipc_shm: Fix a free() of an uninitialized variable Reported by: Coverity CID: 1544043 Fixes: `b112232e4f` ("uipc_shm: Copyin userpath for ktrace(2)")	2024-04-18 20:18:29 -04:00
Brooks Davis	1fd880742a	libsys: add a libsys.h This declares an API for libsys which currently consists of __sys_<foo>() declarations for system call stubs and function pointer typedefs of the form __sys_<foo>_t. The vast majority of the implementation resides in a generated _libsys.h which ensures that all system call stub declarations match syscalls.master. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44387	2024-04-16 17:48:07 +01:00
Brooks Davis	6bb132ba1e	Reduce reliance on sys/sysproto.h pollution Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed. sys/sysproto.h currently includes sys/acl.h which currently includes sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in sys/errno.h sys/malloc.h. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44465	2024-04-15 21:35:40 +01:00
Gleb Smirnoff	e6a4b57239	mbuf: restore m_uiotombuf() feature of returning a zero length mbuf PR: 278340 Fixes: `aba79b0f4a`	2024-04-14 10:21:07 -07:00
Gleb Smirnoff	0020e1b617	Revert "sendfile: mark it explicitly as a TCP only feature" This reverts commit `3b7aa842e2`.	2024-04-10 11:28:11 -07:00
Olivier Certner	afc10f8bba	sys_procctl(): Make it clear that negative commands are invalid An initial reading of the preamble of sys_procctl() gives the impression that no test prevents a malicious user from passing a negative commands index (in 'uap->com'), which is soon used as an index into the static array procctl_cmds_info[]. However, a closer examination leads to the conclusion that the existing code is technically correct. Indeed, the comparison of 'uap->com' to the nitems() expression, which expands to a ratio of sizeof(), leads to a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic Conversions/Integer Promotions applied by '<=', because sizeof() returns 'size_t' values, and we define 'size_t' as an equivalent of 'unsigned int' (which is not mandated by the standard, the latter allowing, e.g., integers of lower ranks). With this conversion, negative values of 'uap->com' are automatically ruled-out since they are converted to very big unsigned integers which are caught by the test. An analysis of assembly code produced by LLVM 16 on amd64 and practical tests confirm that no exploitation is possible. However, the guard code as written is misleading to readers and might trip up static analysis tools. Make sure that negative values are explicitly excluded so that it is immediately clear that EINVAL will be returned in this case. Build tested with clang 16 and GCC 12. Approved by: markj (mentor) MFC after: 1 week Sponsored by: The FreeBSD Foundation	2024-04-10 17:15:25 +02:00
Jake Freeland	b112232e4f	uipc_shm: Copyin userpath for ktrace(2) If userpath is not SHM_ANON, then copy it in early so ktrace(2) can record it. Without this change, ktrace(2) will attempt to strcpy a userspace string and trigger a page fault. Reported by: syzbot+490b9c2a89f53b1b9779@syzkaller.appspotmail.com Fixes: `0cd9cde767` Approved by: markj (mentor) Reviewed by: markj MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D44702	2024-04-09 21:17:11 -05:00
Gleb Smirnoff	5716d902ae	Revert "unix: new implementation of unix/stream & unix/seqpacket" The regressions in aio(4) and kernel RPC aren't a 5 minute problem. This reverts commit `d80a97def9`. This reverts commit `d1cbb17a87`. This reverts commit `fb8a8333b4`.	2024-04-09 13:15:47 -07:00
Stephen J. Kiernan	81b4d1c4d4	sockets: Add hhook in sonewconn for inheriting OSD specific data Added HHOOK_SOCKET_NEWCONN and bumped HHOOK_SOCKET_LAST Reviewed by: glebius, tuexen Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D44632	2024-04-08 21:31:34 -04:00
Gleb Smirnoff	fb8a8333b4	unix: return immediately on MSG_OOB Jumping to cleanup routines will work on uninitialized stack mc. Fixes: `d80a97def9` Reported-by: syzbot+4adf0b37849ea7723586@syzkaller.appspotmail.com	2024-04-08 17:09:16 -07:00
Gleb Smirnoff	d1cbb17a87	unix: fix the ad hoc STAILQ_PREPEND() If there is nothing to prepend, don't try STAILQ_INSERT_HEAD(). Fixes: `d80a97def9` Reported-by: syzbot+bb7f3d07c79b5faf8de8@syzkaller.appspotmail.com	2024-04-08 17:02:00 -07:00
Gleb Smirnoff	d80a97def9	unix: new implementation of unix/stream & unix/seqpacket Provide protocol specific pr_sosend and pr_soreceive for PF_UNIX SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension of SOCK_STREAM. The change meets three goals: get rid of unix(4) specific stuff in the generic socket code, provide a faster and robust unix/stream sockets and bring unix/seqpacket much closer to specification. Highlights follow: - The send buffer now is truly bypassed. Previously it was always empty, but the send(2) still needed to acquire its lock and do a variety of tricks to be woken up in the right time while sleeping on it. Now the only two things we care about in the send buffer is the I/O sx(9) lock that serializes operations and value of so_snd.sb_hiwat, which we can read without obtaining a lock. The sleep of a send(2) happens on the mutex of the receive buffer of the peer. A bulk send/recv of data with large socket buffers will make both syscalls just bounce between owning the receive buffer lock and copyin(9)/copyout(9), no other locks would be involved. - The implementation uses new mchain structure to manipulate mbuf chains. Note that this required converting to mchain two functions that are shared with unix/dgram: unp_internalize() and unp_addsockcred() as well as adding a new shared one uipc_process_kernel_mbuf(). This induces some non-functional changes in the unix/dgram code as well. There is a space for improvement here, as right now it is a mix of mchain and manually managed mbuf chains. - unix/seqpacket previously marked as PR_ADDR & PR_ATOMIC and thus treated as a datagram socket by the generic socket code, now becomes a true stream socket with record markers. - unix/stream loses the sendfile(2) support. This can be brought back, but requires some work. Let's first see if there is any interest in this feature, except purely academical. Reviewed by: markj, tuexen Differential Revision: https://reviews.freebsd.org/D44151	2024-04-08 13:16:51 -07:00
Gleb Smirnoff	aba79b0f4a	mbuf: provide mc_uiotomc() a function to copy from uio(9) to mchain Implement m_uiotombuf() as a wrapper around mc_uiotomc(). The M_EXTPG is left untouched. The m_uiotombuf() is left as a compat KPI. New code should use either mc_uiotomc() or m_uiotombuf_nomap(). Reviewed by: markj, tuexen Differential Revision: https://reviews.freebsd.org/D44150	2024-04-08 13:16:51 -07:00
Gleb Smirnoff	71f8702f49	mbuf: provide mc_get() that allocates struct mchain of given length Implement m_getm2(), which is widely used via m_getm() macro, as a wrapper around mc_get(). New code is advised to use mc_get(). Reviewed by: markj, tuexen Differential Revision: https://reviews.freebsd.org/D44149	2024-04-08 13:16:51 -07:00
Gleb Smirnoff	fd01798fc4	mbuf: add mc_split() that works on two struct mchain It preserves tail points and all length/memory accounting, so that caller doesn't need to do any extra traversals. It doesn't respect M_PKTHDR but it may be improved if needed. It respects M_EOR, though. First consumer will be the new unix(4) SOCK_STREAM and SOCK_SEQPACKET. Also provide much more simple mc_concat() that glues two chains back. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D44148	2024-04-08 13:16:51 -07:00
Gleb Smirnoff	3b7aa842e2	sendfile: mark it explicitly as a TCP only feature Back in 2015 when it turned non-blocking, it was working with PF_UNIX and it may still work. However, the usefulness of such application of sendfile(2) is questionable. Disable the feature while unix/stream is under refactoring. Relnotes: yes	2024-04-08 13:16:51 -07:00
Jake Freeland	aa32d7cbc9	ktrace: Record socket violations with KTR_CAPFAIL Report restricted access to socket addresses and protocols while Capsicum violation tracing with CAPFAIL_ADDR and CAPFAIL_PROTO. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40681	2024-04-07 18:52:51 -05:00
Jake Freeland	0cd9cde767	ktrace: Record namei violations with KTR_CAPFAIL Report namei path lookups while Capsicum violation tracing with CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic capability mode behavior. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40680	2024-04-07 18:52:51 -05:00
Jake Freeland	6a4616a529	ktrace: Record signal violations with KTR_CAPFAIL Report the delivery of signals to processes other than self while Capsicum violation tracing with CAPFAIL_SIGNAL. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40679	2024-04-07 18:52:51 -05:00
Jake Freeland	05296a0ff6	ktrace: Record syscall violations with KTR_CAPFAIL Report syscalls that are not allowed in capability mode with CAPFAIL_SYSCALL. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40678	2024-04-07 18:52:51 -05:00
Jake Freeland	96c8b3e509	ktrace: Record cpuset violations with KTR_CAPFAIL Report Capsicum violations in the cpuset namespace with CAPFAIL_CPUSET. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40677	2024-04-07 18:52:51 -05:00
Jake Freeland	9bec841312	ktrace: Record detailed ECAPMODE violations When a Capsicum violation occurs in the kernel, ktrace will now record detailed information pertaining to the violation. For example: - When a namei lookup violation occurs, ktrace will record the path. - When a signal violation occurs, ktrace will record the signal number. - When a sendto(2) violation occurs, ktrace will record the recipient sockaddr. For all violations, the syscall and ABI is recorded. kdump is also modified to display this new information to the user. Reviewed by: oshogbo, markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40676	2024-04-07 18:52:51 -05:00
Michael Tuexen	681711b77c	uipc_socket: handle socket buffer locks in sopeeloff PR: 278171 Reviewed by: markj Fixes: `a4fc41423f` ("sockets: enable protocol specific socket buffers") MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D44640	2024-04-05 18:20:19 +02:00
Konstantin Belousov	235436d631	stop_all_proc(): skip traced or signal-stopped processes Since thread_single(SINGLE_ALLPROC) ignores them since `9241ebc796`, and there is not much we can do for the debugger-controlled process. Noted by: olce Reviewed by: markj, olce Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44638	2024-04-05 17:52:39 +03:00
Mark Johnston	08f3d5b60c	copy_file_range: Call vn_rdwr() at least once This ensures that we invoke VOP_READ on the input file even if it's empty, which in turn helps ensure that filesystems update the atime of the file. PR: 274615 Reviewed by: olce, rmacklem, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D43524	2024-04-04 17:03:07 -04:00
Lawrence Stewart	7eb92c502e	Reinstate returning EOVERFLOW from stats_v1_blob_clone() `a0993376ec` (from D43179) subtly changed stats_v1_blob_clone() to stop returning EOVERFLOW in the case where the user buffer is not large enough to receive the entire statsblob. This results in any consumers which are implemented to retry on receiving EOVERFLOW to instead give up after receiving an empty statsblob header. Fix by latching any errors recorded prior to copyout. Reviewed by: markj Obtained from: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44585 Fixes: `a0993376ec` ("stats: Check for errors from copyout()")	2024-04-03 12:58:26 +11:00
Mark Johnston	7ef5c19b21	kern linker: Don't invoke dtors without having invoked ctors I have a kernel module which fails to load because of an unrecognized relocation type. link_elf_load_file() fails before the module's ctors are invoked and it calls linker_file_unload(), which causes the module's dtors to be executed, resulting in a kernel panic. Add a flag to the linker file to ensure that dtors are not invoked if unloading due to an error prior to ctors being invoked. At the moment I only implemented this for link_elf_obj.c since link_elf.c doesn't invoke dtors, but I refactored link_elf.c to make them more similar. Fixes: `9e575fadf4` ("link_elf_obj: Invoke fini callbacks") Reviewed by: zlei, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44559	2024-03-31 14:15:11 -04:00
Alan Cox	e0388a906c	arm64: enable superpage mappings by pmap_mapdev{,_attr}() In order for pmap_kenter{,_device}() to create superpage mappings, either 64 KB or 2 MB, pmap_mapdev{,_attr}() must request appropriately aligned virtual addresses. Reviewed by: markj Tested by: gallatin Differential Revision: https://reviews.freebsd.org/D42737	2024-03-30 15:41:30 -05:00
Konstantin Belousov	9241ebc796	thread_single(9): decline external requests for traced or debugger-stopped procs Debugger has the powers to cause unbound delay in single-threading, which then blocks the threaded taskqueue. The reproducer is `truss -f timeout 2 sleep 10`. Reported by: mjg Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44523	2024-03-30 16:43:52 +02:00
Bojan Novković	bdc903460b	kern_ctf.c: Don't print out warning messages unconditionally The kernel CTF loading routines print various warnings when attempting to load CTF data from an ELF file. After the changes in `c21bc6f3c2` those warnings are unnecessarily printed for each kernel module that was compiled without CTF data. The kernel linker already uses the bootverbose flag to conditionally print CTF loading errors. This patch alters kern_ctf.c routines to do the same. Reported by: Alexander@leidinger.net Approved by: markj (mentor) Fixes: `c21bc6f3c2` ("ddb: Add CTF-based pretty printing")	2024-03-29 20:32:18 +01:00
Gleb Smirnoff	1a8d176432	inpcb: fully retire inp_ppcb pointer Before a protocol specific control block started to embed inpcb in self (see `0aa120d52f`, `e68b379244`, `483fe96511`) this pointer used to point at it. Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The exp-run detected a minimal use of the field in ports: * sysutils/lsof - patched upstream * net-mgmt/netdata - patch accepted upstream * emulators/qemu-user-static - upstream master branch seems not using the field anymore We can keep the field around for some time, but eventually it may be reused for something else. PR: 277659 (exp-run) Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D44491	2024-03-29 12:18:32 -07:00
Bojan Novković	722b8e3cb6	Fix style nits in kern_linker.c Reported by: jrtc27 Fixes: `c21bc6f3c2` ("ddb: Add CTF-based pretty printing") Approved by: markj (mentor)	2024-03-28 20:36:30 +01:00
Stephen J. Kiernan	2aee804c9e	kerneldump: Add flag to indicate kernel core was successfully dumped This allows for shutdown_final EVENTHANDLERs to know that a core dump successfully occurred. Embedded systems may want to record this fact or act on it. Obtained from: Juniper Networks, Inc. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44542	2024-03-28 14:11:16 -04:00
Randall Stewart	b7b78c1c16	Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold HPTS inserts a softclock for system call return that optimizes performance. However when no HPTS threads need the help (i.e. when they have less than 100 or so connections) then there should be little work done i.e. check the counter and return instead of running through all the threads getting locks etc. Reported by: eduardo Reviewed by: gallatin, glebius, tuexen Tested by: gallatin Differential Revision: https://reviews.freebsd.org/D44420	2024-03-28 08:12:37 -04:00
Zhenlei Huang	1c7307cf67	kern linker: Make linker_file_add_dependency() void The only possible return value has been zero since `cee9542d51`. No functional change intended. Reviewed by: dfr MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44507	2024-03-27 12:02:32 +08:00
Zhenlei Huang	39450eba8e	kern linker: Do not touch userrefs of the kernel file A nonzero `userrefs` of a linker file indicates that the file, either loaded from kldload(2) or preloaded, can be unloaded via kldunload(2). As for the kernel file, it can be unloaded by the loader but should not be after initialization. This change fixes regression from `d9ce8a41ea` which incidentally increases `userrefs` of the kernel file. Reviewed by: dfr, dab, jhb Fixes: `d9ce8a41ea` kern_linker: Handle module-loading failures in preloaded .ko files MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D42530	2024-03-26 16:47:02 +08:00
Zhenlei Huang	f43ff3e15c	kern linker: Do not unload a module if it has dependants Despite the name, linker_file_unload() will drop a reference and return success when the module file has dependants, i.e. it has more than one reference. When a user requests to unload such a module, the kernel should reject the request unambiguously and immediately. PR: 274986 Reviewed by: dfr, dab, jhb MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D42527	2024-03-26 11:55:45 +08:00
Konstantin Belousov	e0c92dd2b7	amd64: initialize td_frame stack area for init(8) main thread Uninitialized td_frame mostly does not matter since all registers are overwritten on exec to activate init(8). Except PSL_T bit from the %rflags which might leak into fresh init as garbage, causing spurious SIGTRAPs delivered to init until first syscall is executed. Reviewed by: emaste, jhb, jhibbits Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44498	2024-03-26 04:01:38 +02:00
Gleb Smirnoff	15bfd7cf27	soreceive_dgram: use M_WAITOK when we don't hold any locks	2024-03-22 22:44:16 -07:00
Gleb Smirnoff	26389b308d	soreceive_dgram: assert that a datagram has control or data	2024-03-22 22:44:16 -07:00
Mitchell Horne	dc7ae2bc6f	kern_ctf.c: fix linking with nooptions DDB !DDB builds don't include the db_ctf_lookup_typename() symbol, so this is a stop-gap to fix linking of the MINIMAL kernel config. Reported by: bapt Fixes: `c21bc6f3c2` ("ddb: Add CTF-based pretty printing")	2024-03-22 13:26:00 -03:00
Bojan Novković	c21bc6f3c2	ddb: Add CTF-based pretty printing Add basic CTF support and a CTF-powered pretty-printer to ddb. The db_ctf.* files expose a basic interface for fetching type data for ELF symbols, interacting with the CTF string table, and translating type identifiers to type data. The db_pprint.c file uses those interfaces to implement a pretty-printer for all kernel ELF symbols. The pretty-printer works with symbol names and arbitrary addresses: pprint struct thread 0xffffffff8194ad90 Pretty-printing currently only works after the root filesystem gets mounted because the CTF info is not available during early boot. Differential Revision: https://reviews.freebsd.org/D37899 Approved by: markj (mentor)	2024-03-22 04:03:33 +01:00
Brooks Davis	e07d37c705	sysent: regen	2024-03-19 23:13:27 +00:00
Brooks Davis	27676ae365	syscalls.master: use __acl_type_t Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44418	2024-03-19 23:13:27 +00:00
Brooks Davis	d0efabdf15	syscalls.master: make __sys_fcntl take an intptr_t The (optional) third argument of fcntl is sometimes a pointer so change the type to intptr_t. Update the libc-internal definition (actually used by libthr) to take a fixed intptr_t argument rather than pretending it's a variadic function. (That worked because all supported architectures pass variadic arguments as though the function was declared with those types. In CheriBSD that changes because variadic arguments are passed via a bounded array.) Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44381	2024-03-19 23:13:26 +00:00
Brooks Davis	cab73e5305	syscalls.master: struct siginfo -> struct __siginfo struct siginfo doesn't exist, it's struct __siginfo (and siginfo_t). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44380	2024-03-19 23:13:26 +00:00
Brooks Davis	7936d4e4d0	syscalls.master: align with sigfastblock declaration sigfastblock is declared to take a void * argument in the manpage and in headers so declare it that way and use SAL annotations to say it interacts with a 32-bit word. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44379	2024-03-19 23:13:26 +00:00
Brooks Davis	d8d4ed26c9	syscall.master: fix aio_suspend signature It takes a `const struct iovec *iovp`. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D44378	2024-03-19 23:13:26 +00:00
Brooks Davis	128443a9f2	syscalls.master: fix readv and writev iovp decl Both take const struct iovec * and only read the values. Reviewed by: olce, kib Differential Revision: https://reviews.freebsd.org/D44377	2024-03-19 23:13:25 +00:00
Vijeyalakshumi Koteeswaran	60bc9617e7	kerneldump: add livedump_start_vnode(9) livedump_start_vnode(9) is introduced such that the live minidump on the system could take a vnode. This interface could be used to extend support for the existing framework in downstream. Bump __FreeBSD_version for introducing livedump_start_vnode(9). Sponsored by: Juniper Networks, Inc. Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D43471	2024-03-18 17:12:18 -04:00
Richard Scheffenegger	b5a9299bb8	ktls: catch invalid parameters earlier Move safety checks forward from ktls_session_create() to ktls_copyin_tls_enable(). Prevents zero mallocs, and excessively large kernel mallocs. Reported-by: syzbot+72022fa9163fa958b66c@syzkaller.appspotmail.com Reported-by: syzbot+8992893e13058ce0670a@syzkaller.appspotmail.com Sponsored by: NetApp, Inc. X-NetApp-PR: #79 Reviewed By: tuexen Differential Revision: https://reviews.freebsd.org/D44364	2024-03-18 03:37:49 +01:00
Gleb Smirnoff	d62c4607e8	sockets: remove unused KPIs to manipulate sockets These KPIs were added in `dd0e6c383a` and through 15 years had zero use. They slightly remind what IfAPI does for struct ifnet. But IfAPI does that for the sake of large collection of NIC drivers not being aware of struct ifnet. For the sockets it is unclear what could be a large collection of externally written kernel modules that need extensively use sockets and not be aware of their internals at the same time. This isolation of a structure knowledge requires a lot of work, and just throwing in a few KPIs isn't helpful. Reviewed by: kib, olce, markj Differential Revision: https://reviews.freebsd.org/D44311	2024-03-18 08:50:30 -07:00
Mateusz Guzik	b0aaf8beb1	Rename VM_LAST to more appropriate VM_GUEST_LAST NFC Sponsored by: Rubicon Communications, LLC ("Netgate")	2024-03-18 10:49:09 +00:00
Rick Macklem	89f1dcb3eb	vfs_vnops.c: Use va_bytes >= va_size hint to avoid SEEK_DATA/SEEK_HOLE vn_generic_copy_file_range() tries to maintain holes in file ranges being copied, using SEEK_DATA/SEEK_HOLE where possible. Unfortunately SEEK_DATA/SEEK_HOLE operations can take a long time under certain circumstances. Although it is not currently possible to know if a file has unallocated data regions, the case where va_bytes >= va_size is a strong hint that there are no unallocated data regions. This hint does not work well for file systems doing compression, but since it is only a hint, it is still useful. For the case of va_bytes >= va_size, avoid doing SEEK_DATA/SEEK_HOLE. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44509	2024-03-14 17:35:32 -07:00
John Baldwin	9dbf5b0e68	new-bus: Remove the 'rid' and 'type' arguments from BUS_RELEASE_RESOURCE The public bus_release_resource() API still accepts both forms, but the internal kobj method no longer passes the arguments. Implementations which need the rid or type now use rman_get_rid() or rman_get_type() to fetch the value from the allocated resource. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44131	2024-03-13 15:05:54 -07:00
John Baldwin	2baed46e85	new-bus: Remove the 'rid' and 'type' arguments from BUS_*ACTIVATE_RESOURCE The public bus_activate/deactivate_resource() API still accepts both forms, but the internal kobj methods no longer pass the arguments. Implementations which need the rid or type now use rman_get_rid() or rman_get_type() to fetch the value from the allocated resource. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44130	2024-03-13 15:05:54 -07:00
John Baldwin	d77f2092ce	new-bus: Remove the 'type' argument from BUS_MAP/UNMAP_RESOURCE The public bus_map/unmap_resource() API still accepts both forms, but the internal kobj methods no longer pass the argument. Implementations which need the type now use rman_get_type() to fetch the value from the allocated resource. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44129	2024-03-13 15:05:54 -07:00
John Baldwin	fef01f0498	new-bus: Remove the 'type' argument from BUS_ADJUST_RESOURCE The public bus_adjust_resource() API still accepts both forms, but the internal kobj method no longer passes the argument. Implementations which need the type now use rman_get_type() to fetch the value from the allocated resource. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44128	2024-03-13 15:05:54 -07:00
John Baldwin	9edb8d0aed	new-bus: Introduce a simpler bus API for managing resources Remove the 'type' and 'rid' arguments from the wrapper bus API functions (e.g. bus_release_resource) that accept a struct resource. The "new" versions extract the 'type' and/or 'rid' from the passed in resource object via rman_get_type and rman_get_rid. This commit adds the new API as functions with a _new suffix. Wrapper macros choose between the old and new functions based on the number of arguments provided to the macro. This commit does not change the ABI but can be safely MFCd to older branches so long as older kernels use rman_set_type when allocating resources. Future commits will push the removal of these extraneous arguments through the bus implementation. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44124	2024-03-13 15:05:53 -07:00
John Baldwin	1b9bcffff3	sys: Set the type of allocated bus resources Use rman_set_type to set the type of allocated resources everywhere rman_set_rid is currently called. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44123	2024-03-13 15:05:53 -07:00
John Baldwin	b30a80b655	rman: Add rman_get/set_type This permits associating a resource type (e.g. SYS_RES_MEMORY) with a struct resource. I considered adding a new field to struct rman to store the type and only providing rman_get_type as an accessor. However, changing 'struct rman' is an ABI breakage. I might revisit this in main, but the current approach is MFC'able. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D44122	2024-03-13 15:05:53 -07:00
Richard Scheffenegger	85df11a1de	ktls: deep copy tls_enable struct for in-kernel tcp consumers Doing a deep copy of the keys early allows users of the tls_enable structure to assume kernel memory. This enables the socket options to be set by kernel threads. Reviewed By: #transport, tuexen, jhb, rrs Sponsored by: NetApp, Inc. X-NetApp-PR: #79 Differential Revision: https://reviews.freebsd.org/D44250	2024-03-13 13:23:13 +01:00
John Baldwin	f980f48f13	Revert "new-bus: Disable assertions for rman mismatches for activate/deactivate" With recent fixes to the ACPI and pcib drivers to translate mapping requests of child resources into mappings of sub-ranges of parent resources these assertions should now be true. This reverts commit `ed88eef140`. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D43691	2024-03-13 13:19:10 -07:00
Jason A. Harmening	d56c175ac9	uipc_bindat(): Explicitly specify exclusive locking for the new vnode When calling VOP_CREATE(), uipc_bindat() reuses the componentname object from the preceding lookup operation, which is likely to specify LK_SHARED. Furthermore, the VOP_CREATE() interface technically only requires the newly-created vnode to be returned with a shared lock. However, the socket layer requires the new vnode to be locked exclusive and asserts to that effect. In most cases, this is not a practical concern because most if not all base-layer filesystems (certainly FFS, ZFS, and msdosfs at least) always return the vnode locked exclusive regardless of the lock flags. However, it is an issue for unionfs which uses cn_lkflags to determine how the new unionfs wrapper vnode should be locked. While it would be easy enough to work around this issue within unionfs itself, it seems better for the socket layer to be explicit about its locking requirements when issuing VOP_CREATE(). Reviewed by: kib, olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44047	2024-03-09 19:48:02 -06:00
Jason A. Harmening	fa26f46dc2	vn_lock_pair(): allow lkflags1/lkflags2 to be 0 if vp1/vp2 is NULL It's a bit strange to require the caller to pass contrived lock flags if the corresponding vnode is NULL, simply to appease the assertion that exactly one of LK_SHARED or LK_EXCLUSIVE must be set. On the other hand, we still want to catch cases in which completely bogus or corrupt flags are passed even if the corresponding vnode is NULL. Therefore, specifically allow empty flags for lkflags1/lkflags2 iff the respective vp1/vp2 param is NULL. Reviewed by: kib, olce MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D44046	2024-03-09 19:41:45 -06:00
Mark Johnston	a58813fd70	ktrace: Fix the build when options KTRACE is not configured MFC after: 1 week Reported by: John Nielsen <lists@jnielsen.net>	2024-03-09 00:33:55 -05:00


20150 commits