/*-
 * SPDX-License-Identifier: BSD-4-Clause
 *
 * Copyright (C) 1994, David Greenman
 * Copyright (c) 1990, 1993
 *	The Regents of the University of California.  All rights reserved.
 * Copyright (c) 2007, 2022 The FreeBSD Foundation
 *
 * This code is derived from software contributed to Berkeley by
 * the University of Utah, and William Jolitz.
 *
 * Portions of this software were developed by A. Joseph Koshy under
 * sponsorship from the FreeBSD Foundation and Google, Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <sys/cdefs.h>
#include "opt_hwpmc_hooks.h"

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/limits.h>
#include <sys/lock.h>
#include <sys/msan.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/ktr.h>
#include <sys/resourcevar.h>
#include <sys/sched.h>
#include <sys/syscall.h>
#include <sys/syscallsubr.h>
#include <sys/sysent.h>
#include <sys/systm.h>
#include <sys/vmmeter.h>

#include <machine/cpu.h>

#ifdef VIMAGE
#include <net/vnet.h>
#endif

#ifdef HWPMC_HOOKS
#include <sys/pmckern.h>
#endif

#ifdef EPOCH_TRACE
#include <sys/epoch.h>
#endif

volatile uint32_t __read_frequently hpts_that_need_softclock = 0;

void (*tcp_hpts_softclock)(void);

/*
 * Define the code needed before returning to user mode, for trap and
 * syscall.
 */
void
userret(struct thread *td, struct trapframe *frame)
{
	struct proc *p = td->td_proc;

	CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
	    td->td_name);
	KASSERT((p->p_flag & P_WEXIT) == 0,
	    ("Exiting process returns to usermode"));
#ifdef DIAGNOSTIC
	/*
	 * Check that we called signotify() enough.  For
	 * multi-threaded processes, where signal distribution might
	 * change due to other threads changing sigmask, the check is
	 * racy and cannot be performed reliably.
	 * If current process is vfork child, indicated by P_PPWAIT, then
	 * issignal() ignores stops, so we block the check to avoid
	 * classifying pending signals.
	 */
	if (p->p_numthreads == 1) {
		PROC_LOCK(p);
		thread_lock(td);
		if ((p->p_flag & P_PPWAIT) == 0 &&
		    (td->td_pflags & TDP_SIGFASTBLOCK) == 0 &&
		    SIGPENDING(td) && !td_ast_pending(td, TDA_AST) &&
		    !td_ast_pending(td, TDA_SIG)) {
			thread_unlock(td);
			panic(
			    "failed to set signal flags for ast p %p "
			    "td %p td_ast %#x fl %#x",
			    p, td, td->td_ast, td->td_flags);
		}
		thread_unlock(td);
		PROC_UNLOCK(p);
	}
#endif

	/*
	 * Charge system time if profiling.
	 */
	if (__predict_false(p->p_flag & P_PROFIL))
		addupc_task(td, TRAPF_PC(frame), td->td_pticks * psratio);

#ifdef HWPMC_HOOKS
	if (PMC_THREAD_HAS_SAMPLES(td))
		PMC_CALL_HOOK(td, PMC_FN_THR_USERRET, NULL);
#endif
	/*
	 * Calling tcp_hpts_softclock() here allows us to avoid frequent,
	 * expensive callouts that trash the cache and lead to a much higher
	 * number of interrupts and context switches.  Testing on busy web
	 * servers at Netflix has shown that this improves CPU use by 7% over
	 * relying only on callouts to drive HPTS, and also results in idle
	 * power savings on mostly idle servers.
	 * This was inspired by the paper "Soft Timers: Efficient Microsecond
	 * Software Timer Support for Network Processing"
	 * by Mohit Aron and Peter Druschel.
	 */
	tcp_hpts_softclock();
	/*
	 * Let the scheduler adjust our priority etc.
	 */
	sched_userret(td);

	/*
	 * Check for misbehavior.
	 *
	 * In case there is a callchain tracing ongoing because of
	 * hwpmc(4), skip the scheduler pinning check.
	 * The hwpmc(4) subsystem, in fact, collects callchain
	 * information at the ast() checkpoint, which is past userret().
	 */
	WITNESS_WARN(WARN_PANIC, NULL, "userret: returning");
	KASSERT(td->td_critnest == 0,
	    ("userret: Returning in a critical section"));
	KASSERT(td->td_locks == 0,
	    ("userret: Returning with %d locks held", td->td_locks));
	KASSERT(td->td_rw_rlocks == 0,
	    ("userret: Returning with %d rwlocks held in read mode",
	    td->td_rw_rlocks));
	KASSERT(td->td_sx_slocks == 0,
	    ("userret: Returning with %d sx locks held in shared mode",
	    td->td_sx_slocks));
	KASSERT(td->td_lk_slocks == 0,
	    ("userret: Returning with %d lockmanager locks held in shared mode",
	    td->td_lk_slocks));
	KASSERT((td->td_pflags & TDP_NOFAULTING) == 0,
	    ("userret: Returning with pagefaults disabled"));
	if (__predict_false(!THREAD_CAN_SLEEP())) {
#ifdef EPOCH_TRACE
		epoch_trace_list(curthread);
#endif
		KASSERT(0, ("userret: Returning with sleep disabled"));
	}
	KASSERT(td->td_pinned == 0 || (td->td_pflags & TDP_CALLCHAIN) != 0,
	    ("userret: Returning with pinned thread"));
	KASSERT(td->td_vp_reserved == NULL,
	    ("userret: Returning with preallocated vnode"));
	KASSERT((td->td_flags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
	    ("userret: Returning with stop signals deferred"));
	KASSERT(td->td_vslock_sz == 0,
	    ("userret: Returning with vslock-wired space"));
#ifdef VIMAGE
	/* Unfortunately td_vnet_lpush needs VNET_DEBUG. */
	VNET_ASSERT(curvnet == NULL,
	    ("%s: Returning on td %p (pid %d, %s) with vnet %p set in %s",
	    __func__, td, p->p_pid, td->td_name, curvnet,
	    (td->td_vnet_lpush != NULL) ? td->td_vnet_lpush : "N/A"));
#endif
}

static void
ast_prep(struct thread *td, int tda __unused)
{
	VM_CNT_INC(v_trap);
	td->td_pticks = 0;
	if (td->td_cowgen != atomic_load_int(&td->td_proc->p_cowgen))
		thread_cow_update(td);
}

struct ast_entry {
	int	ae_flags;
	int	ae_tdp;
	void	(*ae_f)(struct thread *td, int ast);
};

_Static_assert(TDAI(TDA_MAX) <= UINT_MAX, "Too many ASTs");

static struct ast_entry ast_entries[TDA_MAX] __read_mostly = {
	[TDA_AST] = { .ae_f = ast_prep, .ae_flags = ASTR_UNCOND },
};

void
ast_register(int ast, int flags, int tdp,
    void (*f)(struct thread *, int asts))
{
	struct ast_entry *ae;

	MPASS(ast < TDA_MAX);
	MPASS((flags & ASTR_TDP) == 0 || ((flags & ASTR_ASTF_REQUIRED) != 0 &&
	    __bitcount(tdp) == 1));
	ae = &ast_entries[ast];
	MPASS(ae->ae_f == NULL);
	ae->ae_flags = flags;
	ae->ae_tdp = tdp;
	atomic_interrupt_fence();
	ae->ae_f = f;
}
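
/*
 * Illustrative sketch (not part of the original source; TDA_FOO and
 * foo_ast are hypothetical names): a subsystem typically registers its
 * handler once at initialization time, e.g.
 *
 *	static void
 *	foo_ast(struct thread *td, int tda __unused)
 *	{
 *		(void)td;	* handle the pending per-thread work *
 *	}
 *
 *	ast_register(TDA_FOO, ASTR_ASTF_REQUIRED, 0, foo_ast);
 *
 * after which scheduling TDA_FOO on a thread causes foo_ast() to run
 * from ast_handler() on that thread's next return to user mode.
 */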

/*
 * XXXKIB Note that the deregistration of an AST handler does not
 * drain threads possibly executing it, which affects unloadable
 * modules.  The issue is either handled by the subsystem using
 * handlers, or simply ignored.  Fixing the problem is considered not
 * worth the overhead.
 */
void
ast_deregister(int ast)
{
	struct ast_entry *ae;

	MPASS(ast < TDA_MAX);
	ae = &ast_entries[ast];
	MPASS(ae->ae_f != NULL);
	ae->ae_f = NULL;
	atomic_interrupt_fence();
	ae->ae_flags = 0;
	ae->ae_tdp = 0;
}

void
ast_sched_locked(struct thread *td, int tda)
{
	THREAD_LOCK_ASSERT(td, MA_OWNED);
	MPASS(tda < TDA_MAX);

	td->td_ast |= TDAI(tda);
}

void
ast_unsched_locked(struct thread *td, int tda)
{
	THREAD_LOCK_ASSERT(td, MA_OWNED);
	MPASS(tda < TDA_MAX);

	td->td_ast &= ~TDAI(tda);
}

void
ast_sched(struct thread *td, int tda)
{
	thread_lock(td);
	ast_sched_locked(td, tda);
	thread_unlock(td);
}

void
ast_sched_mask(struct thread *td, int ast)
{
	thread_lock(td);
	td->td_ast |= ast;
	thread_unlock(td);
}
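
/*
 * Illustrative sketch (not part of the original source; TDA_FOO is a
 * hypothetical flag): marking work for a thread is a single call,
 *
 *	ast_sched(td, TDA_FOO);
 *
 * which sets TDAI(TDA_FOO) in td->td_ast; the handler registered for
 * TDA_FOO then runs from ast_handler() the next time td returns to
 * user mode.
 */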

static bool
ast_handler_calc_tdp_run(struct thread *td, const struct ast_entry *ae)
{
	return ((ae->ae_flags & ASTR_TDP) == 0 ||
	    (td->td_pflags & ae->ae_tdp) != 0);
}

/*
 * Process an asynchronous software trap.
 */
static void
ast_handler(struct thread *td, struct trapframe *framep, bool dtor)
{
	struct ast_entry *ae;
	void (*f)(struct thread *td, int asts);
	int a, td_ast;
	bool run;

	if (framep != NULL) {
		kmsan_mark(framep, sizeof(*framep), KMSAN_STATE_INITED);
		td->td_frame = framep;
	}

	if (__predict_true(!dtor)) {
		WITNESS_WARN(WARN_PANIC, NULL, "Returning to user mode");
		mtx_assert(&Giant, MA_NOTOWNED);
		THREAD_LOCK_ASSERT(td, MA_NOTOWNED);

		/*
		 * This updates the td_ast for the checks below in one
		 * atomic operation with turning off all scheduled AST's.
		 * If another AST is triggered while we are handling the
		 * AST's saved in td_ast, the td_ast is again non-zero and
		 * ast() will be called again.
		 */
		thread_lock(td);
		td_ast = td->td_ast;
		td->td_ast = 0;
		thread_unlock(td);
	} else {
		/*
		 * The td thread's td_lock is not guaranteed to exist;
		 * the thread might not be initialized enough when its
		 * destructor is called.  It is safe to read and
		 * update td_ast without locking since the thread is
		 * not runnable or visible to other threads.
		 */
		td_ast = td->td_ast;
		td->td_ast = 0;
	}

	CTR3(KTR_SYSC, "ast: thread %p (pid %d, %s)", td, td->td_proc->p_pid,
	    td->td_proc->p_comm);
	KASSERT(framep == NULL || TRAPF_USERMODE(framep),
	    ("ast in kernel mode"));

	for (a = 0; a < nitems(ast_entries); a++) {
		ae = &ast_entries[a];
		f = ae->ae_f;
		if (f == NULL)
			continue;
		atomic_interrupt_fence();

		run = false;
		if (__predict_false(framep == NULL)) {
			if ((ae->ae_flags & ASTR_KCLEAR) != 0)
				run = ast_handler_calc_tdp_run(td, ae);
		} else {
			if ((ae->ae_flags & ASTR_UNCOND) != 0)
				run = true;
			else if ((ae->ae_flags & ASTR_ASTF_REQUIRED) != 0 &&
			    (td_ast & TDAI(a)) != 0)
				run = ast_handler_calc_tdp_run(td, ae);
		}
		if (run)
			f(td, td_ast);
	}
}

void
ast(struct trapframe *framep)
{
	struct thread *td;

	td = curthread;
	ast_handler(td, framep, false);
	userret(td, framep);
}

void
ast_kclear(struct thread *td)
{
	ast_handler(td, NULL, td != curthread);
}

const char *
syscallname(struct proc *p, u_int code)
{
	static const char unknown[] = "unknown";
	struct sysentvec *sv;

	sv = p->p_sysent;
	if (sv->sv_syscallnames == NULL || code >= sv->sv_size)
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
return (unknown);
|
2010-07-04 18:16:17 +00:00
|
|
|
return (sv->sv_syscallnames[code]);
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
}
|