Commit graph

73 commits

Author SHA1 Message Date
Olivier Certner 7fa08d4152 kern_racct.c: Don't compile if RACCT undefined
Just skip compiling this file if RACCT isn't defined.  This allows to
skip including headers that no code uses at all, and also to remove the
whole file's #ifdef/#endif bracketing.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   The FreeBSD Foundation
2023-11-22 14:17:17 -05:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Konstantin Belousov c6d31b8306 AST: rework
Make most AST handlers dynamically registered.  This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it.  For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit.  For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed.  There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it.  In fact, this
is already present behavior for hwpmc.ko and ufs.ko.  I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by:	markj
Tested by:	emaste (arm64), pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Dmitry Chagin 31d1b816fe sysent: Get rid of bogus sys/sysent.h include.
Where appropriate hide sysent.h under proper condition.

MFC after:	2 weeks
2022-05-28 20:52:17 +03:00
Gordon Bergling a9bee9c77a kern_racct: Fix a typo in a source code comment
- s/maxumum/maximum/

MFC after:	3 days
2022-02-06 17:28:27 +01:00
Dmitry Chagin af29f39958 umtx: Split umtx.h on two counterparts.
To prevent umtx.h polluting by future changes split it on two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel staff.

While here fix umtx_key_match style.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D31248
MFC after:		2 weeks
2021-07-29 12:41:29 +03:00
Alex Richardson fa2528ac64 Use atomic loads/stores when updating td->td_state
KCSAN complains about racy accesses in the locking code. Those races are
fine since they are inside a TD_SET_RUNNING() loop that expects the value
to be changed by another CPU.

Use relaxed atomic stores/loads to indicate that this variable can be
written/read by multiple CPUs at the same time. This will also prevent
the compiler from doing unexpected re-ordering.

Reported by:	GENERIC-KCSAN
Test Plan:	KCSAN no longer complains, kernel still runs fine.
Reviewed By:	markj, mjg (earlier version)
Differential Revision: https://reviews.freebsd.org/D28569
2021-02-18 14:02:48 +00:00
Edward Tomasz Napierala bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Edward Tomasz Napierala 30d158eecc Move racct/rctl throttling from userret() to ast(). There's no reason
for it to sit in the syscall fast path.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26368
2020-09-14 09:44:24 +00:00
Mateusz Guzik 6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Pawel Biernacki 7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik 88cc62e5a5 proc: eliminate the zombproc list
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21447
2019-08-28 16:18:23 +00:00
Mateusz Guzik 448db4f761 racct: add RACCT_ENABLED macro and racct_set_unlocked
This allows to remove PROC_LOCK/UNLOCK pairs spread thorought the kernel
only used to appease racct_set.

Sponsored by:	The FreeBSD Foundation
2018-12-07 16:47:34 +00:00
Mateusz Guzik eec8d0a378 Convert racct_enable to bool and annotate as __read_frequently
Sponsored by:	The FreeBSD Foundation
2018-11-29 05:17:16 +00:00
Mateusz Guzik 64cf6a62d4 Deinline racct throttling out of syscall exit path.
racct is not enabled by default and even when it is enabled processes are
typically not throttled. The order of checks is left unchanged since
racct_enable will be annotated as __read_frequently, while checking for the
flag in the processes would probably require an extra fetch.

Sponsored by:	The FreeBSD Foundation
2018-11-29 05:08:46 +00:00
Mateusz Guzik 1e9a1bf589 proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock
waitpid always takes proctree to evaluate the list, but only takes allproc
if it can reap. With this patch allproc is no longer taken, which helps during
poudriere -j 128.

Discussed with: kib
Sponsored by:	The FreeBSD Foundation
2018-11-29 02:52:08 +00:00
Mateusz Guzik a5ac8272c0 fork: remove avoidable proc lock/unlock pair
We don't have to access the process after making it runnable, so there
is no need to hold it either.

Sponsored by:	The FreeBSD Foundation
2018-11-22 21:29:36 +00:00
Andriy Gapon f87beb93e8 call racct_proc_ucred_changed() under the proc lock
The lock is required to ensure that the switch to the new credentials
and the transfer of the process's accounting data from the old
credentials to the new ones is done atomically.  Otherwise, some updates
may be applied to the new credentials and then additionally transferred
from the old credentials if the updates happen after proc_set_cred() and
before racct_proc_ucred_changed().

The problem is especially pronounced for RACCT_RSS because
- there is a strict accounting for this resource (it's reclaimable)
- it's updated asynchronously by the vm daemon
- it's updated by setting an absolute value instead of applying a delta

I had to remove a call to rctl_proc_ucred_changed() from
racct_proc_ucred_changed() and make all callers of latter call the
former as well.  The reason is that rctl_proc_ucred_changed, as it is
implemented now, cannot be called while holding the proc lock, so the
lock is dropped after calling racct_proc_ucred_changed.  Additionally,
I've added calls to crhold / crfree around the rctl call, because
without the proc lock there is no gurantee that the new credentials,
owned by the process, will stay stable.  That does not eliminate a
possibility that the credentials passed to the rctl will get stale.
Ideally, rctl_proc_ucred_changed should be able to work under the proc
lock.

Many thanks to kib for pointing out the above problems.

PR:		222027
Discussed with:	kib
No comment:	trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D15048
2018-04-20 13:08:04 +00:00
Pedro F. Giffuni 8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Andriy Gapon 937c1b0757 try to fix RACCT_RSS accounting
There could be a race between the vm daemon setting RACCT_RSS based on
the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to
zero.  In that case we can get a zombie process with non-zero RACCT_RSS.
If the process is jailed, that may break accounting for the jail.
There could be other consequences.

Fix this race in the vm daemon by updating RACCT_RSS only when a process
is in the normal state.  Also, make accounting a little bit more
accurate by refreshing the page resident count after calling
vm_pageout_map_deactivate_pages().
Finally, add an assert that the RSS is zero when a process is reaped.

PR:		210315
Reviewed by:	trasz
Differential Revision: https://reviews.freebsd.org/D9464
2017-02-14 13:54:05 +00:00
Edward Tomasz Napierala 5c93966020 Remove redundant KASSERT. 2017-01-22 15:35:51 +00:00
Edward Tomasz Napierala bbe4eb6d54 Get rid of rctl_lock; use racct_lock where appropriate. The fast paths
already required both of them, so having a separate rctl_lock didn't
buy us anything.

Reviewed by:	mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5914
2016-04-21 16:22:52 +00:00
Edward Tomasz Napierala 23e6fff29d Allocate RACCT/RCTL zones without UMA_ZONE_NOFREE; no idea why it was there
in the first place.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-15 13:34:59 +00:00
Edward Tomasz Napierala c1a43e73c5 Sort variable declarations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-15 11:55:29 +00:00
Edward Tomasz Napierala ae34b6ff96 Add four new RCTL resources - readbps, readiops, writebps and writeiops,
for limiting disk (actually filesystem) IO.

Note that in some cases these limits are not quite precise. It's ok,
as long as it's within some reasonable bounds.

Testing - and review of the code, in particular the VFS and VM parts - is
very welcome.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5080
2016-04-07 04:23:25 +00:00
Edward Tomasz Napierala 4c230cdafd Use proper locking macros in RACCT in RCTL.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-05 11:30:52 +00:00
Edward Tomasz Napierala 097e0da79d Fix mismerge.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 18:45:04 +00:00
Edward Tomasz Napierala 862d03fb7f Drop the 'resource' argument to racct_decay(); it wouldn't make sense
to iterate separately for each resource.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 18:36:10 +00:00
Edward Tomasz Napierala 659e74662f Call rctl_enforce() in all cases the resource usage goes up, even when called
from racct_*_force() functions.  It makes the "log" and "devctl" actions work
in those cases.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 17:28:55 +00:00
Edward Tomasz Napierala 0b9f1ecb87 Reorder the functions; no functional changes.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 17:21:55 +00:00
Edward Tomasz Napierala 7cea96f606 Reduce code duplication.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 17:17:32 +00:00
Edward Tomasz Napierala 1028719823 Reduce code duplication. There should be no (intended) functional changes.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-04-01 17:05:46 +00:00
Konstantin Belousov db57c70a5b Rename P_KTHREAD struct proc p_flag to P_KPROC.
I left as is an apparent bug in ntoskrnl_var.h:AT_PASSIVE_LEVEL()
definition.

Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
2016-02-09 16:30:16 +00:00
Mateusz Guzik 813361c140 fork: plug a use after free of the returned process
fork1 required its callers to pass a pointer to struct proc * which would
be set to the new process (if any). procdesc and racct manipulation also
used said pointer.

However, the process could have exited prior to do_fork return and be
automatically reaped, thus making this a use-after-free.

Fix the problem by letting callers indicate whether they want the pid or
the struct proc, return the process in stopped state for the latter case.

Reviewed by:	kib
2016-02-04 04:25:30 +00:00
Mark Johnston 3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Edward Tomasz Napierala af1a7b2526 Tweak comments.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-12-13 11:30:36 +00:00
Edward Tomasz Napierala a8e0740e85 Actually make the 'amount' argument to racct_adjust_resource() signed,
as it was always supposed to be.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-12-13 11:21:13 +00:00
Edward Tomasz Napierala 57e2ebdc08 Avoid useless relocking.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-12-13 11:08:29 +00:00
Edward Tomasz Napierala 15db3c0738 Speed up rctl operation with large rulesets, by holding the lock
during iteration instead of relocking it for each traversed rule.

Reviewed by:	mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D4110
2015-11-15 12:10:51 +00:00
Josh Paetzel 09766dd52a Fix a bug in the CPU % limiting code
If you attempt to set a pcpu limit that is higher than
110% using rctl (for instance, you want a jail to be
able to use 2 cores on your system so you set pcpu to
200%) the thing you are trying to limit becomes unthrottled.

PR:	189870
Submitted by:	dustinwenz@ebureau.com
Reviewed by:	trasz
MFC after:	1 week
2015-11-10 14:14:32 +00:00
Andriy Gapon 2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Jeremie Le Hen 3768a5dfb5 nit: Rename racct_alloc_resource to racct_adjust_resource.
This is more accurate as the amount can be negative.

MFC after:	2 weeks
2015-06-14 08:33:14 +00:00
Edward Tomasz Napierala ba8f0eb8fc Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
Edward Tomasz Napierala 4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
Konstantin Belousov 5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Mateusz Guzik dd2390be68 Convert racct stubs to inline functions.
This saves some symbols and function calls for kernel without RACCT.

MFC after:	1 week
2014-10-06 02:31:33 +00:00
Andriy Gapon d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao 54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00