Commit graph

578 commits

Author SHA1 Message Date
Jake Freeland 0cd9cde767 ktrace: Record namei violations with KTR_CAPFAIL
Report namei path lookups while Capsicum violation tracing with
CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic
capability mode behavior.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D40680
2024-04-07 18:52:51 -05:00
Konstantin Belousov 4a69fc16a5 Add membarrier(2)
This is an attempt at clean-room implementation of the Linux'
membarrier(2) syscall.  For documentation, you would need to read
both membarrier(2) Linux man page, the comments in Linux
kernel/sched/membarrier.c implementation and possibly look at
actual uses.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Jessica Clarke 94426d21bf pmc: Rework PROCEXEC event to support PIEs
Currently the PROCEXEC event only reports a single address, entryaddr,
which is the entry point of the interpreter in the typical dynamic case,
and used solely to calculate the base address of the interpreter. For
PDEs this is fine, since the base address is known from the program
headers, but for PIEs the base address varies at run time based on where
the kernel chooses to load it, and so pmcstat has no way of knowing the
real address ranges for the executable. This was less of an issue in the
past since PIEs were rare, but now they're on by default on 64-bit
architectures it's more of a problem.

To solve this, pass through what was picked for et_dyn_addr by the
kernel, and use that as the offset for the executable's start address
just as is done for everything in the kernel. Since we're changing this
interface, sanitise the way we determine the interpreter's base address
by passing it through directly rather than indirectly via the entry
point and having to subtract off whatever the ELF header's e_entry is
(and anything that wants the entry point in future can still add that
back on as needed; this merely changes the interface to directly provide
the underlying variables involved).

This will be followed up by a bump to the pmc major version.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D39595
2023-05-31 00:20:36 +01:00
Dmitry Chagin d706d02edb sysentvec: Retire sv_imgact_try as unneeded anymore
The sysentvec sv_imgact_try was used by kern_exec() to allow
non-native ABI to fixup shell path according to ABI root directory.
Since the non-native ABI can now specify its root directory directly
to namei() via pwd_altroot() call this facility is not needed anymore.

Differential Revision:	https://reviews.freebsd.org/D40092
MFC after:		2 month
2023-05-29 11:18:11 +03:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Doug Rabson 5eeb4f737f imgact_binmisc: Optionally pre-open the interpreter vnode
This allows the use of chroot and/or jail environments which depend on
interpreters registed with imgact_binmisc to use emulator binaries from
the host to emulate programs inside the chroot.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D37432
2022-12-08 14:32:03 +00:00
Mateusz Guzik 5b5b7e2ca2 vfs: always retain path buffer after lookup
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D36542
2022-09-17 09:10:38 +00:00
Konstantin Belousov 5e5675cb4b Remove struct proc p_singlethr member
It does not serve any purpose after we stopped doing
thread_single(SINGLE_ALLPROC) from stoppable user processes.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:34:30 +03:00
Kornel Dulęba 939f0b6323 Implement shared page address randomization
It used to be mapped at the top of the UVA.
If the randomization is enabled any address above .data section will be
randomly chosen and a guard page will be inserted in the shared page
default location.
The shared page is now mapped in exec_map_stack, instead of
exec_new_vmspace. The latter function is called before image activator
has a chance to parse ASLR related flags.
The KERN_PROC_VM_LAYOUT sysctl was extended to provide shared page
address.
The feature is enabled by default for 64 bit applications on all
architectures.
It can be toggled kern.elf64.aslr.shared_page sysctl.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35349
2022-07-18 16:27:37 +02:00
Kornel Dulęba 361971fbca Rework how shared page related data is stored
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35393
2022-07-18 16:27:32 +02:00
Konstantin Belousov 4493a13e3b Do not single-thread itself when the process single-threaded some another process
Since both self single-threading and remote single-threading rely on
suspending the thread doing thread_single(), it cannot be mixed: thread
doing thread_suspend_switch() might be subject to thread_suspend_one()
and vice versa.

In collaboration with:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D35310
2022-06-13 22:30:03 +03:00
Mateusz Guzik bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Kyle Evans 773fa8cd13 execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:

"At least one argument must be present in the array"

Previous to this version, it had been prefaced with the weakening phrase
"By convention."

Carry through and document it the rest of the way.  Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it.  Toss back EINVAL if we ended
up not copying in any args for *execve().

The manpage change can be considered "Obtained from: OpenBSD"

Reviewed by:	emaste, kib, markj (all previous version)
Differential Revision:	https://reviews.freebsd.org/D34045
2022-01-26 13:40:27 -06:00
Mark Johnston 1811c1e957 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:12:36 -05:00
Mark Johnston 758d98debe exec: Remove the stack gap implementation
ASLR stack randomization will reappear in a forthcoming commit.  Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.

No functional change intended, as the stack gap implementation is
currently disabled by default.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston 706f4a81a8 exec: Introduce the PROC_PS_STRINGS() macro
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro.  With stack address randomization the
ps_strings address is no longer fixed.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston 1544f5add8 Revert "kern_exec: Add kern.stacktop sysctl."
The current ASLR stack gap feature will be removed, and with that the
need for the kern.stacktop sysctl is gone.  All consumers have been
removed.

This reverts commit a97d697122.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:41:58 -05:00
Mark Johnston f04a096049 exec: Simplify sv_copyout_strings implementations a bit
Simplify control flow around handling of the execpath length and signal
trampoline.  Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33703
2021-12-31 12:50:15 -05:00
Mateusz Guzik 7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Konstantin Belousov be10c0a910 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
This improves compatibility with Linux.

Noted by:	Drew DeVault <sir@cmpwn.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32821
2021-11-03 18:00:42 +02:00
Konstantin Belousov e4ce23b238 fexecve(2): restore the attempts to calculate the executable path
vn_fullpath() call was not converted to pass newtextvp, instead it used
imgp->vp which is still NULL there.  As result vn_fullpath() always
returned EINVAL and execpath was recorded from the value of arg0.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-03 15:10:22 +02:00
Konstantin Belousov 1c69690319 Unmap shared page manually before doing vm_map_remove() on exit or exec
This allows the pmap_remove(min, max) call to see empty pmap and exploit
empty pmap optimization.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:59 +03:00
Konstantin Belousov 351d5f7fc5 exec: store parent directory and hardlink name of the binary in struct proc
While doing it, also move all the code to resolve pathnames and obtain
text vp and dvp, into single place.   Besides simplifying the code, it
avoids spurious vnode relocks and validates the explanation why
a transient text reference on the script vnode is not harmful.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:56 +03:00
Konstantin Belousov 0c10648fbb exec: provide right hardlink name in AT_EXECPATH
For this, use vn_fullpath_hardlink() to resolve executable name for
execve(2).

This should provide the right hardlink name, used for execution, instead
of random hardlink pointing to this binary.  Also this should make the
AT_EXECNAME reliable for execve(2), since kernel only needs to resolve
parent directory path, which should always succeed (except pathological
cases like unlinking a directory).

PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:31 +03:00
Konstantin Belousov 15bf81f354 struct image_params: use bool type for boolean members
Also re-align comments, and group booleans and char members together.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:21 +03:00
Konstantin Belousov 9d58243fbc do_execve(): switch boolean locals to use bool type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:16 +03:00
Konstantin Belousov 143dba3a91 kern_exec.c: style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:10 +03:00
Colin Percival 46dd801acb Add userland boot profiling to TSLOG
On kernels compiled with 'options TSLOG', record for each process ID:
* The timestamp of the fork() which creates it and the parent
process ID,
* The first path passed to execve(), if any,
* The first path resolved by namei, if any, and
* The timestamp of the exit() which terminates the process.

Expose this information via a new sysctl, debug.tslog_user.

On kernels lacking 'options TSLOG' (the default), no information is
recorded and the sysctl does not exist.

Note that recording namei is needed in order to obtain the names of
rc.d scripts being launched, as the rc system sources them in a
subshell rather than execing the scripts.

With this commit it is now possible to generate flamecharts of the
entire boot process from the start of the loader to the end of
/etc/rc.  The code needed to perform this processing is currently
found in github: https://github.com/cperciva/freebsd-boot-profiling

Reviewed by:	mhorne
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32493
2021-10-16 11:47:34 -07:00
Dawid Gorecki a97d697122 kern_exec: Add kern.stacktop sysctl.
With stack gap enabled top of the stack is moved down by a random
amount of bytes. Because of that some multithreaded applications
which use kern.usrstack sysctl to calculate address of stacks for
their threads can fail. Add kern.stacktop sysctl, which can be used
to retrieve address of the stack after stack gap is applied to it.
Returns value identical to kern.usrstack for processes which have
no stack gap.

Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31897
2021-10-15 10:21:55 +02:00
Dawid Gorecki 889b56c8cd setrlimit: Take stack gap into account.
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
2021-10-15 10:21:47 +02:00
Mateusz Guzik a0558fe90d Retire code added to support CloudABI
CloudABI was removed in cf0ee8738e
2021-10-10 18:24:29 +00:00
Konstantin Belousov b5cadc643e Make core dump writes interruptible with SIGKILL
This can be disabled by sysctl kern.core_dump_can_intr

Reported and tested by:	pho
Reviewed by:	imp, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32313
2021-10-08 03:21:43 +03:00
Justin Hibbits 63cb9308a7 Fix segment size in compressing core dumps
A core segment is bounded in size only by memory size.  On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core.  Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.

This dates back to the original refactor back in 2015 (r279801 /
aa14e9b7).

MFC after:	1 week
Sponsored by:	Juniper Networks, Inc.
2021-10-01 14:16:33 -05:00
Konstantin Belousov 397f188936 Remove SV_CAPSICUM
It was only needed for cloudabi

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31923
2021-09-22 00:18:44 +03:00
Andrew Turner b792434150 Create sys/reg.h for the common code previously in machine/reg.h
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).

Reviewed by:	imp, markj
Sponsored by:	DARPA, AFRL (original work)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19830
2021-08-30 12:50:53 +01:00
Dmitry Chagin af29f39958 umtx: Split umtx.h on two counterparts.
To prevent umtx.h polluting by future changes split it on two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel staff.

While here fix umtx_key_match style.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D31248
MFC after:		2 weeks
2021-07-29 12:41:29 +03:00
Dmitry Chagin 5fd9cd53d2 linux(4): Modify sv_onexec hook to return an error.
Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D30911
MFC after:		2 weeks
2021-07-20 09:56:25 +03:00
Dmitry Chagin 62ba4cd340 Call sv_onexec hook after the process VA is created.
For future use in the Linux emulation layer call sv_onexec hook right after
the new process address space is created. It's safe, as sv_onexec used only
by Linux abi and linux_on_exec() does not depend on a state of process VA.

Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D30899
MFC after:		2 weeks
2021-07-20 09:55:14 +03:00
Konstantin Belousov 28a66fc3da Do not call FreeBSD-ABI specific code for all ABIs
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI.  Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.

Requested by:	dchagin
Reviewed by:	dchagin, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Konstantin Belousov 71ab344524 Add sv_onexec_old() sysent hook for exec event
Unlike sv_onexec(), it is called from the old (pre-exec) sysentvec structure.
The old vmspace for the process is still intact during the call.

Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Edward Tomasz Napierala db8d680ebe procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared.  The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.

The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30939
2021-07-01 09:42:07 +01:00
Dmitry Chagin 615f22b2fb Add a link to the Elf_Brandinfo into the struc proc.
To allow the ABI to make a dicision based on the Brandinfo add a link
to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits
of Elf_Brandinfo flags is private to the ABI.

Note to MFC: it breaks KBI.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D30918
MFC after:		2 weeks
2021-06-29 20:15:08 +03:00
Konstantin Belousov 2d423f7671 sysent: allow ABI to disable setid on exec.
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:52 +03:00
Konstantin Belousov 19e6043a44 kern_exec.c: Add execve_nosetid() helper
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:41 +03:00
Edward Tomasz Napierala 3b9971c8da Clean up some of the core dumping code.
No functional changes.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30397
2021-05-25 16:30:32 +01:00
Mateusz Guzik 154f0ecc10 Fix tinderbox build after 1762f674cc ktrace commit. 2021-05-22 19:41:19 +00:00
Konstantin Belousov 1762f674cc ktrace: pack all ktrace parameters into allocated structure ktr_io_params
Ref-count the ktr_io_params structure instead of vnode/cred.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Edward Tomasz Napierala 33621dfc19 Refactor core dumping code a bit
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code.  Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30369
2021-05-22 09:59:00 +01:00
Mark Johnston f1c3adefd9 execve: Mark exec argument buffers
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns.  Mark them invalid when they are freed to the cache.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29460
2021-04-13 17:42:21 -04:00