freebsd-src/sys/kern/kern_switch.c
Matthew Dillon f96ad4c223 STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes.  Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature).  For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage.  Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, are strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

	* critical_enter() and critical_exit() for I386 now simply increment
	  and decrement curthread->td_critnest.  They no longer disable
	  hard interrupts.  When critical_exit() decrements the counter to
	  0 it effectively calls a routine to deal with whatever interrupts
	  were deferred during the time the code was operating in a critical
	  section.

	  Other architectures are unaffected.

	* fork_exit() has been conditionalized to remove MD assumptions for
	  the new code.  Old code will still use the old MD assumptions
	  with regard to hard interrupt disablement.  In STAGE-2 this will
	  be turned into a subroutine call into MD code rather than hardcoded
	  in MI code.

	  The new code places the burden of entering the critical section
	  in the trampoline code where it belongs.

	* I386: interrupts are now enabled while we are in a critical section.
	  The interrupt vector code has been adjusted to deal with the fact.
	  If it detects that we are in a critical section it currently defers
	  the interrupt by adding the appropriate bit to an interrupt mask.

	* In order to accomplish the deferral, icu_lock is required.  This
	  is i386-specific.  Thus icu_lock can only be obtained by mainline
	  i386 code while interrupts are hard disabled.  This change has been
	  made.

	* Because interrupts may or may not be hard disabled during a
	  context switch, cpu_switch() can no longer simply assume that
	  PSL_I will be in a consistent state.  Therefore, it now saves and
	  restores eflags.

	* FAST INTERRUPT PROVISION.  Fast interrupts are currently deferred.
	  The intention is to eventually allow them to operate either while
	  we are in a critical section or, if we are able to restrict the
	  use of sched_lock, while we are not holding the sched_lock.

	* ICU and APIC vector assembly for I386 cleaned up.  The ICU code
	  has been cleaned up to match the APIC code with regard to format
	  and macro availability.  Additionally, the code has been adjusted
	  to deal with deferred interrupts.

	* Deferred interrupts use a per-cpu boolean int_pending, and
	  masks ipending, spending, and fpending.  Because these are per-cpu
	  variables, it is not currently necessary to lock the bus cycles
	  that modify them.

	  Note that the same mechanism will enable preemption to be
	  incorporated as a true software interrupt without having to
	  further hack up the critical nesting code.

	* Note: the old critical_enter() code in kern/kern_switch.c is
	  currently #ifdef'd to be compatible with both the old and new
	  methodology.  In STAGE-2 it will be moved entirely to MD code.
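
The nesting counter and deferral mask described in the list above can be
modelled in user space roughly as follows.  This is only a sketch, not the
actual i386 vector code: td_critnest, int_pending and ipending are taken
from the text above, while every function name, the handled[] array and
the 32-bit mask width are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for curthread->td_critnest and the per-cpu deferral state. */
static int td_critnest;		/* critical section nesting depth */
static int int_pending;		/* nonzero if any interrupt was deferred */
static uint32_t ipending;	/* one bit per deferred interrupt source */
static int handled[32];		/* hypothetical per-IRQ delivery counters */

/* Hypothetical handler: just counts deliveries. */
static void
run_handler(int irq)
{
	handled[irq]++;
}

/* Deliver every deferred interrupt, lowest-numbered source first. */
static void
unpend_sketch(void)
{
	int irq;

	int_pending = 0;
	for (irq = 0; irq < 32; irq++) {
		if (ipending & (UINT32_C(1) << irq)) {
			ipending &= ~(UINT32_C(1) << irq);
			run_handler(irq);
		}
	}
}

/* Entering only bumps the counter; interrupts stay enabled. */
static void
critical_enter_sketch(void)
{
	td_critnest++;
}

/* Leaving the outermost section drains whatever was deferred. */
static void
critical_exit_sketch(void)
{
	if (--td_critnest == 0 && int_pending)
		unpend_sketch();
}

/* Models the vector code: defer inside a section, else run at once. */
static void
interrupt_arrives(int irq)
{
	if (td_critnest > 0) {
		ipending |= UINT32_C(1) << irq;
		int_pending = 1;
	} else
		run_handler(irq);
}

/* Nested section with one interrupt; returns deliveries of IRQ 3. */
int
critical_demo(void)
{
	critical_enter_sketch();
	critical_enter_sketch();
	interrupt_arrives(3);
	critical_exit_sketch();		/* still nested: stays deferred */
	assert(handled[3] == 0);
	critical_exit_sketch();		/* outermost exit: delivered now */
	return (handled[3]);
}
```

Calling critical_demo() once from a test program returns 1: the interrupt
is held back while the nesting count is nonzero and is delivered only when
the outermost critical_exit_sketch() brings the count back to 0.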

Performance issues:

	One of the purposes of this commit is to enhance critical section
	performance, specifically to greatly reduce bus overhead to allow
	the critical section code to be used to protect per-cpu caches.
	These caches, such as Jeff's slab allocator work, can potentially
	operate very quickly, which makes the savings from the new
	critical section code very significant.

	The second purpose of this commit is to allow architectures to
	enable certain interrupts while in a critical section.  Specifically,
	the intention is to eventually allow certain FAST interrupts to
	operate rather than defer.

	The third purpose of this commit is to begin to clean up the
	critical_enter()/critical_exit()/cpu_critical_enter()/
	cpu_critical_exit() API which currently has serious cross pollution
	in MI code (in fork_exit() and ast() for example).

	The fourth purpose of this commit is to provide a framework that
	allows kernel-preempting software interrupts to be implemented
	cleanly.  This is currently used for two forward interrupts in I386.
	Other architectures will have the choice of using this infrastructure
	or building the functionality directly into critical_enter()/
	critical_exit().

	Finally, this commit is designed to greatly improve the flexibility
	of various architectures to manage critical section handling,
	software interrupts, preemption, and other highly integrated
	architecture-specific details.
2002-02-26 17:06:21 +00:00


/*
 * Copyright (c) 2001 Jake Burkholder <jake@FreeBSD.org>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD$
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/ktr.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/queue.h>

/*
 * Global run queue.
 */
static struct runq runq;
SYSINIT(runq, SI_SUB_RUN_QUEUE, SI_ORDER_FIRST, runq_init, &runq)

/*
 * Wrappers which implement old interface; act on global run queue.
 */

struct thread *
choosethread(void)
{
	return (runq_choose(&runq)->ke_thread);
}

int
procrunnable(void)
{
	return runq_check(&runq);
}

void
remrunqueue(struct thread *td)
{
	runq_remove(&runq, td->td_kse);
}

void
setrunqueue(struct thread *td)
{
	runq_add(&runq, td->td_kse);
}

/*
 * XXX temporary until these routines are moved fully into MD areas
 */
#ifndef MACHINE_CRITICAL_ENTER

/* Critical sections that prevent preemption. */
void
critical_enter(void)
{
	struct thread *td;

	td = curthread;
	if (td->td_critnest == 0)
		td->td_savecrit = cpu_critical_enter();
	td->td_critnest++;
}

void
critical_exit(void)
{
	struct thread *td;

	td = curthread;
	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		cpu_critical_exit(td->td_savecrit);
	} else
		td->td_critnest--;
}

#endif

/*
 * Clear the status bit of the queue corresponding to priority level pri,
 * indicating that it is empty.
 */
static __inline void
runq_clrbit(struct runq *rq, int pri)
{
	struct rqbits *rqb;

	rqb = &rq->rq_status;
	CTR4(KTR_RUNQ, "runq_clrbit: bits=%#x %#x bit=%#x word=%d",
	    rqb->rqb_bits[RQB_WORD(pri)],
	    rqb->rqb_bits[RQB_WORD(pri)] & ~RQB_BIT(pri),
	    RQB_BIT(pri), RQB_WORD(pri));
	rqb->rqb_bits[RQB_WORD(pri)] &= ~RQB_BIT(pri);
}

/*
 * Find the index of the first non-empty run queue.  This is done by
 * scanning the status bits; a set bit indicates a non-empty queue.
 */
static __inline int
runq_findbit(struct runq *rq)
{
	struct rqbits *rqb;
	int pri;
	int i;

	rqb = &rq->rq_status;
	for (i = 0; i < RQB_LEN; i++)
		if (rqb->rqb_bits[i]) {
			pri = (RQB_FFS(rqb->rqb_bits[i]) - 1) +
			    (i << RQB_L2BPW);
			CTR3(KTR_RUNQ, "runq_findbit: bits=%#x i=%d pri=%d",
			    rqb->rqb_bits[i], i, pri);
			return (pri);
		}

	return (-1);
}

/*
 * Set the status bit of the queue corresponding to priority level pri,
 * indicating that it is non-empty.
 */
static __inline void
runq_setbit(struct runq *rq, int pri)
{
	struct rqbits *rqb;

	rqb = &rq->rq_status;
	CTR4(KTR_RUNQ, "runq_setbit: bits=%#x %#x bit=%#x word=%d",
	    rqb->rqb_bits[RQB_WORD(pri)],
	    rqb->rqb_bits[RQB_WORD(pri)] | RQB_BIT(pri),
	    RQB_BIT(pri), RQB_WORD(pri));
	rqb->rqb_bits[RQB_WORD(pri)] |= RQB_BIT(pri);
}

#ifdef INVARIANT_SUPPORT
/*
 * Return true if the specified process is already in the run queue.
 */
static __inline int
runq_find(struct runq *rq, struct kse *ke)
{
	struct kse *ke2;
	int i;

	mtx_assert(&sched_lock, MA_OWNED);
	for (i = 0; i < RQ_NQS; i++)
		TAILQ_FOREACH(ke2, &rq->rq_queues[i], ke_procq)
			if (ke2 == ke)
				return 1;
	return 0;
}
#endif

/*
 * Add the process to the queue specified by its priority, and set the
 * corresponding status bit.
 */
void
runq_add(struct runq *rq, struct kse *ke)
{
	struct rqhead *rqh;
	int pri;

#ifdef INVARIANTS
	struct proc *p = ke->ke_proc;
#endif
	if (ke->ke_flags & KEF_ONRUNQ)
		return;
	mtx_assert(&sched_lock, MA_OWNED);
	KASSERT(p->p_stat == SRUN, ("runq_add: proc %p (%s) not SRUN",
	    p, p->p_comm));
	KASSERT(runq_find(rq, ke) == 0,
	    ("runq_add: proc %p (%s) already in run queue", ke, p->p_comm));
	pri = ke->ke_thread->td_priority / RQ_PPQ;
	ke->ke_rqindex = pri;
	runq_setbit(rq, pri);
	rqh = &rq->rq_queues[pri];
	CTR4(KTR_RUNQ, "runq_add: p=%p pri=%d %d rqh=%p",
	    ke->ke_proc, ke->ke_thread->td_priority, pri, rqh);
	TAILQ_INSERT_TAIL(rqh, ke, ke_procq);
	ke->ke_flags |= KEF_ONRUNQ;
}

/*
 * Return true if there are runnable processes of any priority on the run
 * queue, false otherwise.  Has no side effects, does not modify the run
 * queue structure.
 */
int
runq_check(struct runq *rq)
{
	struct rqbits *rqb;
	int i;

	rqb = &rq->rq_status;
	for (i = 0; i < RQB_LEN; i++)
		if (rqb->rqb_bits[i]) {
			CTR2(KTR_RUNQ, "runq_check: bits=%#x i=%d",
			    rqb->rqb_bits[i], i);
			return (1);
		}
	CTR0(KTR_RUNQ, "runq_check: empty");

	return (0);
}

/*
 * Find and remove the highest priority process from the run queue.
 * If there are no runnable processes, the per-cpu idle process is
 * returned.  Will not return NULL under any circumstances.
 */
struct kse *
runq_choose(struct runq *rq)
{
	struct rqhead *rqh;
	struct kse *ke;
	int pri;

	mtx_assert(&sched_lock, MA_OWNED);
	if ((pri = runq_findbit(rq)) != -1) {
		rqh = &rq->rq_queues[pri];
		ke = TAILQ_FIRST(rqh);
		KASSERT(ke != NULL, ("runq_choose: no proc on busy queue"));
		KASSERT(ke->ke_proc->p_stat == SRUN,
		    ("runq_choose: process %d(%s) in state %d", ke->ke_proc->p_pid,
		    ke->ke_proc->p_comm, ke->ke_proc->p_stat));
		CTR3(KTR_RUNQ, "runq_choose: pri=%d kse=%p rqh=%p", pri, ke, rqh);
		TAILQ_REMOVE(rqh, ke, ke_procq);
		if (TAILQ_EMPTY(rqh)) {
			CTR0(KTR_RUNQ, "runq_choose: empty");
			runq_clrbit(rq, pri);
		}
		ke->ke_flags &= ~KEF_ONRUNQ;
		return (ke);
	}
	CTR1(KTR_RUNQ, "runq_choose: idleproc pri=%d", pri);

	return (PCPU_GET(idlethread)->td_kse);
}

/*
 * Initialize a run structure.
 */
void
runq_init(struct runq *rq)
{
	int i;

	bzero(rq, sizeof *rq);
	for (i = 0; i < RQ_NQS; i++)
		TAILQ_INIT(&rq->rq_queues[i]);
}

/*
 * Remove the process from the queue specified by its priority, and clear the
 * corresponding status bit if the queue becomes empty.
 */
void
runq_remove(struct runq *rq, struct kse *ke)
{
	struct rqhead *rqh;
	int pri;

	if (!(ke->ke_flags & KEF_ONRUNQ))
		return;
	mtx_assert(&sched_lock, MA_OWNED);
	pri = ke->ke_rqindex;
	rqh = &rq->rq_queues[pri];
	CTR4(KTR_RUNQ, "runq_remove: p=%p pri=%d %d rqh=%p",
	    ke, ke->ke_thread->td_priority, pri, rqh);
	KASSERT(ke != NULL, ("runq_remove: no proc on busy queue"));
	TAILQ_REMOVE(rqh, ke, ke_procq);
	if (TAILQ_EMPTY(rqh)) {
		CTR0(KTR_RUNQ, "runq_remove: empty");
		runq_clrbit(rq, pri);
	}
	ke->ke_flags &= ~KEF_ONRUNQ;
}
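
The status bitmap maintained by runq_setbit(), runq_clrbit() and
runq_findbit() above can be tried outside the kernel with a small
stand-alone model.  This is a sketch under stated assumptions: the SK_*
macros, struct sk_bits, sk_ffs() and the sizes (64 queues, 32-bit words)
are hypothetical stand-ins chosen for the example, not the definitions
from the kernel's run queue header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for the sketch; the kernel's values may differ. */
#define SK_NQS		64			/* number of queues */
#define SK_L2BPW	5			/* log2(bits per word) */
#define SK_BPW		(1 << SK_L2BPW)		/* bits per word */
#define SK_LEN		(SK_NQS / SK_BPW)	/* words in the bitmap */
#define SK_BIT(pri)	(UINT32_C(1) << ((pri) & (SK_BPW - 1)))
#define SK_WORD(pri)	((pri) >> SK_L2BPW)

struct sk_bits {
	uint32_t w[SK_LEN];
};

/* 1-based index of the lowest set bit, or 0 if no bit is set. */
static int
sk_ffs(uint32_t v)
{
	int n;

	if (v == 0)
		return (0);
	for (n = 1; (v & 1) == 0; n++)
		v >>= 1;
	return (n);
}

/* Mark queue pri as non-empty. */
static void
sk_setbit(struct sk_bits *b, int pri)
{
	b->w[SK_WORD(pri)] |= SK_BIT(pri);
}

/* Mark queue pri as empty. */
static void
sk_clrbit(struct sk_bits *b, int pri)
{
	b->w[SK_WORD(pri)] &= ~SK_BIT(pri);
}

/* Index of the first non-empty queue, or -1 if all are empty. */
static int
sk_findbit(const struct sk_bits *b)
{
	int i;

	for (i = 0; i < SK_LEN; i++)
		if (b->w[i] != 0)
			return ((sk_ffs(b->w[i]) - 1) + (i << SK_L2BPW));
	return (-1);
}
```

Setting bits 40 and 7 and then calling sk_findbit() yields 7, the lowest
index and therefore the highest priority queue; clearing 7 yields 40, and
clearing both yields -1.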