Commit graph

7268 commits

Author SHA1 Message Date
Daniel Eischen 4fc21c0947 Keep track of threads waiting in kse_release() to avoid a race
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues.  Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.

This commit should fix recent mysql hangs.

Reviewed by:	jhb, davidxu
Mysql'd by:	Robin P. Blanchard <robin.blanchard at gactr uga edu>
2004-04-28 20:36:53 +00:00
David Schultz 06afcd9d10 If the buffer supplied to kenv(KENV_DUMP, ...) isn't big enough,
return the number of bytes needed instead of 0.  The manpage claims
that we do this anyway.
2004-04-28 01:27:33 +00:00
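A minimal userland sketch (illustrative, not part of the commit) of the sizing idiom this fix supports, assuming the documented behaviour that a NULL buffer makes kenv(2) report the required size:

    #include <sys/types.h>
    #include <kenv.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* A NULL buffer asks how many bytes the full dump needs. */
        int needed = kenv(KENV_DUMP, NULL, NULL, 0);
        if (needed < 0) {
            perror("kenv");
            return (1);
        }
        char *buf = malloc(needed);
        if (buf == NULL)
            return (1);
        /* With this fix, a too-small buffer also reports the needed
           size instead of returning 0. */
        if (kenv(KENV_DUMP, NULL, buf, needed) < 0) {
            perror("kenv");
            return (1);
        }
        printf("kernel environment: %d bytes\n", needed);
        free(buf);
        return (0);
    }
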
Bosko Milekic 5a59cefcd1 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00
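An illustrative way (not from the commit) to flip the new knob from C with sysctlbyname(3); running `sysctl security.jail.allow_raw_sockets=1` on the host does the same thing:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int enable = 1, cur;
        size_t len = sizeof(cur);

        /* Default is 0: raw sockets are denied inside jails. */
        if (sysctlbyname("security.jail.allow_raw_sockets", &cur, &len,
            NULL, 0) == -1) {
            perror("sysctlbyname");
            return (1);
        }
        printf("allow_raw_sockets is %d\n", cur);

        /* Allow ping(8)/traceroute(8) in jails; run as root on the host. */
        if (sysctlbyname("security.jail.allow_raw_sockets", NULL, NULL,
            &enable, sizeof(enable)) == -1) {
            perror("sysctlbyname");
            return (1);
        }
        return (0);
    }
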
Pawel Jakub Dawidek 6c0ad4a77a Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like
in RELENG_4.

Pointed out by:	Alex Lyashkov <umka@sevinter.net>
2004-04-26 15:44:42 +00:00
Hiten Pandya 024035e822 The paper "Hashed Timers and Hierarchical Wheels: Data Structures for the
Efficient Implementation of a Timer Facility" was co-authored by T. Lauk,
not A. Lauk.

Adjust nearby whitespace.
2004-04-25 04:10:17 +00:00
Alan Cox 59c8bc40ce Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes
kmem_alloc_wait()) for mapping the image header.  On all machines with a
direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
2004-04-23 03:01:40 +00:00
David E. O'Brien 207a6c0dcb There was a thread on "unusually high load averages" when running under
sched_ule, in January 2004.  Looking at this, "pagezero" is (one of) the
culprit(s).  We had no provision for processes with P_NOLOAD set.  With
pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as
another-normal-process, thus the "expected-plus-one" load reported in
the above thread.

Submitted by:	Nikos Ntarmos <ntarmos@ceid.upatras.gr>
2004-04-22 21:37:46 +00:00
Pawel Jakub Dawidek 0c0c597faa Look out! vn_start_write() is able to return 0 and NULL 'mp'.
Submitted by:	Alex Lyashkov <shadow@psoft.net>
2004-04-22 15:40:27 +00:00
Bruce Evans 057e27959f Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending
on namespace pollution in <sys/vnode.h>.

Sorted includes.
2004-04-21 12:10:30 +00:00
Colin Percival 05641e82d7 1. Remove callout_stop binary compatibility.
2. Document that this means that kernel modules must be rebuilt.
3. While I'm here, fix my sorting error in callout.h

Requested by:	many [1], scottl [2], bde [3]
2004-04-20 15:49:31 +00:00
Mike Makonnen b9fb5d4286 If you're trying to find out if a thread is valid and in
the same process as the current thread it makes absolutely
no sense to lock the parent process through the pointer in
said thread.

Submitted by:	pho (with minor correction)
Pointy Hat To:	mtm
2004-04-19 14:20:01 +00:00
Luigi Rizzo 24665342d3 constify the last argument of m_copyback. 2004-04-18 13:01:28 +00:00
Bruce Evans 7b1fe905ef Fixed some style bugs in previous commit (mainly an insertion sort error
for declarations, and poorly worded messages).

Fixed some nearby style bugs (unsorted declarations).
2004-04-17 02:46:05 +00:00
John Baldwin 7870c3c61c - Enable (unmask) interrupt sources earlier in the ithread loop.
Specifically, we used to enable the source after locking sched_lock
  and just before we had already decided to do a context switch.
  This meant that an ithread could never process more than one interrupt
  per context switch.  Enabling earlier in the loop before sched_lock is
  acquired allows an ithread to handle multiple interrupts per context
  switch if interrupts fire very rapidly.  For the case of heavy interrupt
  load this can reduce the number of context switches (and thus overhead)
  as well as reduce interrupt latency.
- Now that we can handle multiple interrupts per context switch, add simple
  interrupt storm protection to threaded interrupts.  If X number of
  consecutive interrupts are triggered before the ithread voluntarily
  yields to another thread, then the interrupt thread will sleep with the
  associated interrupt source disabled (masked) for 1/10th of a second.
  The default value of X is 500, but it can be tweaked via the tunable/
  sysctl hw.intr_storm_threshold.  If an interrupt storm is detected, then
  a message is output to the kernel console on the first occurrence per
  interrupt thread.  Interrupt storm protection can be disabled completely
  by setting this value to 0.  There is no scientific reasoning for the
  1/10th of a second or 500 interrupts values, so they may require tweaking
  at some point in the future.

Tested by:	rwatson (an earlier version w/o the storm protection)
Tested by:	mux (reportedly made a machine with two PCI interrupts
		storming usable rather than hard locked)
Reviewed by:	imp
2004-04-16 20:25:40 +00:00
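Since the threshold is exposed as a loader tunable and sysctl, it can be inspected or changed from userland; a short sketch (illustrative, not part of the commit):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int thresh;
        size_t len = sizeof(thresh);

        if (sysctlbyname("hw.intr_storm_threshold", &thresh, &len,
            NULL, 0) == -1) {
            perror("sysctlbyname");
            return (1);
        }
        /* 0 means storm protection is disabled; the default is 500. */
        printf("hw.intr_storm_threshold = %d\n", thresh);
        return (0);
    }
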
Robert Watson d54efd4d31 At some point during the history of m_getcl(), MAC support began to
unconditionally initialize the mbuf header even if cluster allocation
failed, which could result in a NULL pointer dereference in low-memory
conditions.

PR:		kern/65548
Submitted by:	Stephan Uphoff <ups@tree.com>
2004-04-16 14:35:11 +00:00
Ruslan Ermilov 61f7581d08 Ensure that the poll_burst <= poll_burst_max constraint really holds.
Reviewed by:	luigi
2004-04-15 07:38:44 +00:00
Warner Losh 5e1d0a23bc Fix off by one error, twice.
Submitted by: Carlos Velasco (first one), jhb (second one)
2004-04-12 23:02:21 +00:00
Colin Percival 4a3b3dcb55 stop() no longer needs sched_lock held; in fact, holding sched_lock causes
a LOR against sleepq.  Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.

Reported by:	Scott Sipe <cscotts@mindspring.com>
Reviewed by:	rwatson, jhb
2004-04-12 15:56:05 +00:00
Maxime Henrion a0b5a67929 Put deprecated sysctl code inside BURN_BRIDGES. 2004-04-11 21:09:22 +00:00
Alan Cox 148b3f62a9 Use vm_page_hold() rather than vm_page_wire() for short-duration page
wiring.  The reason being that vm_page_hold() is cheaper.
2004-04-11 19:57:11 +00:00
Maxime Henrion 4ddd1e65d4 Remove a comment that complains about the lack of %qd, to justify
truncating a rlim_t to a long.  We have had %qd for some time now.
However, the correct format to use here is %jd and a cast to
intmax_t, so do this.
2004-04-10 11:08:16 +00:00
Peter Edwards 24554d00bc Plug minor memory leak of module_t structures when unloading a file
from the kernel.

Reviewed By: Doug Rabson (dfr@)
2004-04-09 15:27:38 +00:00
Olivier Houchard d50c87decf Spell "switches" a more conventional way. 2004-04-09 14:31:29 +00:00
Robert Watson 123f024b24 Compare pointers with NULL rather than using pointers as booleans in
if/for statements.  Assign pointers to NULL rather than typecast 0.
Compare pointers with NULL rather than 0.
2004-04-09 13:23:51 +00:00
Mike Silbersack e8410540b7 Fix a regression in my change which sends headers along with data; a
side effect of that change caused headers to not be sent if a 0 byte
file was passed to sendfile.  This change fixes that behavior, allowing
sendfile to send out the headers even with a 0 byte file again.

Noticed by:	Dirk Engling
2004-04-08 07:14:34 +00:00
Marcel Moolenaar ece267ba58 Do not assume that the initial thread (i.e. the thread with the ID
equal to the process ID) is still present when we dump a core. It
already may have been destroyed. In that case we would end up
dereferencing a NULL pointer, so specifically test for that as well.

Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
2004-04-08 06:37:00 +00:00
Colin Percival 49a74476a6 Add whitespace before comment blocks. (reported by njl)
Remove spurious whitespace, add indent protection, fix punctuation,
remove initialization of static variables to zero, put wakeup_ctr
and wakeup_needed in the correct order. (reported by bde)

This doesn't fix all the style bugs I introduced, but the remaining
style bugs make it easier for me to understand what I did here.
2004-04-08 02:03:49 +00:00
Warner Losh f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Colin Percival ec513ff759 Fix filt_timer* races: Finish initializing a knote before we pass it to
a callout, and use the new callout_drain API to make sure that a callout
has finished before we deallocate memory it is using.

PR:		kern/64121
Discussed with:	gallatin
2004-04-07 05:59:57 +00:00
Colin Percival 2c1bb20746 Introduce a callout_drain() function. This acts in the same manner as
callout_stop(), except that if the callout being stopped is currently
in progress, it blocks attempts to reset the callout and waits until the
callout is completed before it returns.

This makes it possible to clean up callout-using code safely, e.g.,
without potentially freeing memory which is still being used by a callout.

Reviewed by:	mux, gallatin, rwatson, jhb
2004-04-06 23:08:49 +00:00
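A kernel-side sketch (softc and names are illustrative, not from the commit) of the teardown pattern this enables:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/callout.h>

    struct foo_softc {
        struct callout fs_timer;
        /* ... state touched by the callout function ... */
    };

    static void
    foo_detach(struct foo_softc *sc)
    {
        /*
         * callout_stop() could return while the callout function is
         * still running; callout_drain() additionally waits for it to
         * finish, so freeing sc afterwards is safe.
         */
        callout_drain(&sc->fs_timer);
        free(sc, M_DEVBUF);
    }
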
John Baldwin 9000d57d57 Associate a simple count of waiters with each condition variable. The
count is protected by the mutex that protects the condition, so the count
does not require any extra locking or atomic operations.  It serves as an
optimization to avoid calling into the sleepqueue code at all if there are
no waiters.

Note that the count can get temporarily out of sync when threads sleeping
on a condition variable time out or are aborted.  However, it doesn't hurt
to call the sleepqueue code for either a signal or a broadcast when there
are no waiters, and the count is never out of sync in the opposite
direction unless we have more than INT_MAX sleeping threads.
2004-04-06 19:17:46 +00:00
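A sketch of the fast path this buys; the waiter count is the point of the commit, everything else here is illustrative rather than the actual kern_condvar.c code:

    #include <sys/param.h>
    #include <sys/condvar.h>

    /* The condition's mutex is held by the caller, so the count is stable. */
    void
    cv_signal_sketch(struct cv *cvp)
    {
        if (cvp->cv_waiters == 0)
            return;             /* no sleepers: skip the sleepqueue code */
        cvp->cv_waiters--;
        /* ... otherwise hand off to the sleep queue code to wake one ... */
    }
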
John Baldwin 535eb30962 Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code
to awaken all waiters when a contested mutex is released instead of just
the highest priority waiter.  If the various threads are awakened in
sequence then each thread may acquire and release the lock in question
without contention resulting in fewer expensive unlock and lock
operations.  The old behavior of waking just the highest priority waiter is
still used if this option is not specified.  Making the algorithm conditional
on a kernel option will allow us to benchmark both cases later and
determine which one should be used by default.

Requested by:	tanimura-san
2004-04-06 19:12:24 +00:00
John Baldwin ef2c0ba7e4 Rename turnstile_wakeup() to turnstile_broadcast() to make the naming
more consistent with other APIs. sleepq and cv's use signal/broadcast, and
msleep uses wakeup_one/wakeup.  Prior to this turnstiles were using a
signal/wakeup mixture.
2004-04-06 19:07:21 +00:00
Bruce Evans 295ed75297 Removed some less than useful comments:
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
  The markup should have been removed when the unused retval parameter
  was removed.
- don't comment on what routine suser() checks do.  Removed nearby
  excessive vertical whitespace.
2004-04-06 10:05:02 +00:00
Warner Losh 7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Doug Rabson 7d5ea13fcd Try not to crash instantly when signalling a libthr program to death. 2004-04-05 15:06:01 +00:00
Doug Rabson e2c8a799c1 Regen. 2004-04-05 10:17:23 +00:00
Doug Rabson 0b0a60fb43 Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks. 2004-04-05 10:15:53 +00:00
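A minimal userland sketch (illustrative) showing the difference for a caller: lgetfh(2) hands back a handle for the symlink itself, where getfh(2) would follow it (both require privilege):

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        fhandle_t fh;

        if (argc != 2) {
            fprintf(stderr, "usage: %s path\n", argv[0]);
            return (1);
        }
        /* Unlike getfh(2), this does not follow a trailing symlink. */
        if (lgetfh(argv[1], &fh) == -1) {
            perror("lgetfh");
            return (1);
        }
        printf("got a file handle for %s\n", argv[1]);
        return (0);
    }
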
Robert Watson 051bbf603a Detatch incorrect spellings of detach. 2004-04-04 19:15:45 +00:00
Jeff Roberson 37a35e4a60 - Use the proper constant in sched_interact_update(). Previously,
SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed.  This was
   causing the interactivity scaler to lose history at a more dramatic rate
   than intended.
2004-04-04 19:12:56 +00:00
Marcel Moolenaar 8c9b7b2c84 Create NT_PRSTATUS and NT_FPREGSET notes for each and every thread
in the process. This is required for proper debugging of corefiles
created by 1:1 or M:N threaded processes. Add an XXX comment where
we should actually call a function that dumps MD specific notes.
An example of a MD specific note is the NT_PRXFPREG note for SSE
registers.

Since BFD creates non-annotated pseudo-sections for the first PRSTATUS
and FPREGSET notes (non-annotated in the sense that the name of the
section does not contain the pid/tid), make sure those sections describe
the initial thread of the process (i.e. the thread whose tid equals the
pid). This is not strictly necessary, but makes sure that tools that use
the non-annotated section names will not change behaviour due to this
change.

The practical upshot of this all is that one can see the threads in
the debugger when looking at a corefile. For 1:1 threading this means
that *all* threads are visible.
2004-04-03 20:25:41 +00:00
Marcel Moolenaar fdcac92868 Assign thread IDs to kernel threads. The purpose of the thread ID (tid)
is twofold:
1. When a 1:1 or M:N threaded process dumps core, we need to put the
   register state of each of its kernel threads in the core file.
   This can only be done by differentiating the pid field in the
   respective note. For this we need the tid.
2. When thread support is present for remote debugging the kernel
   with gdb(1), threads need to be identified by an integer due to
   limitations in the remote protocol. This requires having a tid.

To minimize the impact of having thread IDs, threads that are created
as part of a fork (i.e. the initial thread in a process) will inherit
the process ID (i.e. tid=pid). Subsequent threads will have IDs larger
than PID_MAX to avoid interference with the pid allocation algorithm.
The assignment of tids is handled by thread_new_tid().

The thread ID allocation algorithm has been written with 3 assumptions
in mind:
1. IDs need to be created as fast as possible,
2. Reuse of IDs may happen instantaneously,
3. Someone else will write a better algorithm.
2004-04-03 15:59:13 +00:00
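A hypothetical sketch of the allocation rules listed above; this is not the actual thread_new_tid(), and the lock and counter names are made up:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>

    static struct mtx tid_lock_sketch;          /* set up with mtx_init() */
    static lwpid_t next_tid_sketch = PID_MAX + 1;

    /* The initial thread of a process simply inherits tid = pid;
       only subsequent threads come here. */
    lwpid_t
    thread_new_tid_sketch(void)
    {
        lwpid_t tid;

        mtx_lock(&tid_lock_sketch);
        tid = next_tid_sketch++;        /* wrap/reuse handling omitted */
        mtx_unlock(&tid_lock_sketch);
        return (tid);
    }
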
Alan Cox 121230a40d In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not.  Add a new parameter so that the caller can specify which is
the case.

Reported by:	dillon
2004-04-03 09:16:27 +00:00
Kris Kennaway c5af600675 Add missing comment terminator. 2004-04-02 04:57:40 +00:00
Julian Elischer 4f73277a35 The comment complained about not having a thread_unlink()
and did the work itself, but thread_unlink() has existed for a while... use it.
2004-04-02 01:01:34 +00:00
John Baldwin e43257aa7d Finish fixing up Alpha to work with an MP safe ptrace():
- ptrace_single_step() is no longer called with the proc lock held, so
  don't try to unlock it and then relock it.
- Push Giant down into proc_rwmem() instead of forcing all the consumers
  (including Alpha breakpoint support) to explicitly wrap calls to
  proc_rwmem() with Giant.

Tested by:	kensmith
2004-04-01 20:56:44 +00:00
Scott Long cd587b1397 Don't print out 'GIANT-LOCKED' for INTR_FAST drivers. 2004-04-01 07:18:42 +00:00
Pawel Jakub Dawidek 2fc0588da2 Remove sysctl kern.ps_argsopen, it is not very useful, one should use
security.bsd.see_other_uids instead.

Discussed with:	phk, rwatson
2004-04-01 00:10:45 +00:00
Pawel Jakub Dawidek 5e2c0c0b0e Remove ps_argsopen check. It was bogus in the past and was corrected
not quite well by me - if kern.ps_argsopen was set to 0, users weren't
permitted to see arguments of even own processes.
But kern.ps_argsopen is going away, so just remove this check and leave
security checks for p_cansee() function.
2004-04-01 00:08:20 +00:00
Julian Elischer 4ccbe07e84 Remove unused variable. 2004-03-31 08:20:44 +00:00
Robert Watson 8e44a7ec13 In sofree(), avoid nested declaration and initialization in
declaration.  Observe that initialization in declaration is
frequently incompatible with locking, not just a bad idea
due to style(9).

Submitted by:	bde
2004-03-31 03:48:35 +00:00
Robert Watson db48c0d254 Export uipc_connect2() from uipc_usrreq.c instead of unp_connect2(),
and consume that interface in portalfs and fifofs instead.  In the
new world order, unp_connect2() assumes that the unpcb mutex is
held, whereas uipc_connect2() validates that the passed sockets are
UNIX domain sockets, then grabs the mutex.

NB: the portalfs and fifofs code gets down and dirty with UNIX domain
sockets.  Maybe this is a bad thing.
2004-03-31 01:41:30 +00:00
Alan Cox 1dc10fceaa White space and wording changes to init_param3().
Mostly submitted by:	bde
2004-03-30 08:00:11 +00:00
Robert Watson fc3fcacf52 Prefer NULL to 0 when testing and assigning pointer values. 2004-03-30 02:16:25 +00:00
Peter Wemm 9a6a4cb50d Shorten some XXXKSE commentary 2004-03-29 22:46:54 +00:00
Peter Wemm 39d3505a30 Kill some XXXKSE's. vnlru/syncer are single threaded. 2004-03-29 22:45:33 +00:00
Peter Wemm b21126c6b3 Clean up the stub fake vnode locking implementations. The main reason this
stuff was here (NFS) was fixed by Alfred in November.  The only remaining
consumer of the stub functions was umapfs, which is horribly horribly
broken.  It has missed out on about the last 5 years worth of maintenance
that was done on nullfs (from which umapfs is derived).  It needs major
work to bring it up to date with the vnode locking protocol.  umapfs really
needs to find a caretaker to bring it into the 21st century.

Functions GC'ed:
vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
2004-03-29 22:41:21 +00:00
Robert Watson 181e65db5b Use a common return path for filt_soread() and filt_sowrite() to
simplify the impact of locking on these functions.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
2004-03-29 18:06:15 +00:00
Robert Watson 71c90a2944 In sofree(), move the caching of 'head' from 'so->so_head' to later in
the function, once it has been determined to be non-NULL, to simplify
locking on an earlier return.
2004-03-29 17:57:43 +00:00
Robert Watson 5a35e5f9af If debug.mpsafenet, initialize UNIX domain socket timeouts as MPSAFE;
otherwise, assert Giant in the callouts.
2004-03-29 17:00:05 +00:00
Robert Watson 627e4a9973 Conditionally acquire Giant when entering the sockets layer via the
socket-specific system calls based on debug.mpsafenet, rather than
acquiring Giant unconditionally.
2004-03-29 02:21:56 +00:00
Robert Watson 32903c86e7 Conditionally acquire Giant when entering the socket layer via file
descriptor operations based on debug.mpsafenet, rather than acquiring
Giant unconditionally.
2004-03-29 01:55:32 +00:00
Robert Watson 74041f5a10 When validating the length sum in recvit(), we failed to release
Giant on an error.  Add a Giant acquisition.

Reviewed by:	sam, bms
2004-03-29 01:37:06 +00:00
Robert Watson a1288c786e Conditionally assert Giant in fputsock() based on the value of
debug.mpsafenet.
2004-03-29 00:33:02 +00:00
Alan Cox e3b19536fb Revise the direct or optimized case to use uiomove_fromphys() by the reader
instead of ephemeral mappings using pmap_qenter() by the writer.  The
writer is still, however, responsible for wiring the pages, just not
mapping them.  Consequently, the allocation of KVA for the direct case is
unnecessary.  Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired.  The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.

Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings.  Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such i386.  Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
2004-03-27 19:50:23 +00:00
Marcel Moolenaar b2ae7ed72c Change the type of the various CPU masks to cpumask_t. Note that as
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
2004-03-27 18:21:24 +00:00
Mike Makonnen a73027fee9 Regen for libthr thread synchronization syscalls. 2004-03-27 14:34:17 +00:00
Mike Makonnen 0af67a2ef9 Use the proc lock to sleep on a libthr umtx. 2004-03-27 14:32:03 +00:00
Mike Makonnen 1713a51661 Separate thread synchronization from signals in libthr. Instead
use msleep() and wakeup_one().

Discussed with: jhb, peter, tjr
2004-03-27 14:30:43 +00:00
Pawel Jakub Dawidek 0b68054f9d - Add a description for vfs.usermount sysctl.
- Add the vfs_equalopts() function for mount options comparison.
  Now it looks much clearer.
- Style fixed.

In co-operation with:	bde
2004-03-27 08:39:28 +00:00
Pawel Jakub Dawidek 6c8cc8ec4b - Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts.
- Style fixed.

Submitted by:	bde
2004-03-27 08:09:00 +00:00
Pawel Jakub Dawidek 2c6040bbb7 We probably shouldn't allow users to mount file systems with MNT_SUIDDIR.
There should be no shell access when SUIDDIR is compiled in, but
better be sure.

Reviewed by:	rwatson
2004-03-26 21:12:14 +00:00
Alan Cox 2b63e7f397 Use uiomove_fromphys() instead of pmap_qenter() and pmap_qremove() in
proc_rwmem().
2004-03-24 23:35:04 +00:00
Warner Losh 9fc0327792 Conform to local file style and prefer (a && (b & flag)). 2004-03-24 16:49:37 +00:00
David E. O'Brien 0d50bcb36b Change the !MPSAFE boot string to something that doesn't potentially
scare users that the kernel won't run on MP systems.
2004-03-23 01:58:09 +00:00
Alfred Perlstein 12e9993f65 Emit a traceback when witness_trace is set and witness_warn() is
called and triggers (typically caused by sleeping with a non-sleepable
lock).

Reviewed by: jhb
2004-03-23 00:32:27 +00:00
David E. O'Brien f1c8692d0a Rather than display which interrupts are MPSAFE, display those that aren't.
This way we can take stock of the work to be done.  boot -v will note those
interrupts that are MPSAFE.
2004-03-22 22:36:11 +00:00
Paul Saab 2eada6bc8e Remove some netbsd debug code that crept into rev 1.116 2004-03-22 10:17:40 +00:00
David E. O'Brien b003da7938 Give more reasonable CPU time to the threads which are using scheduler
activation (i.e., applications that are using libpthread).  This is because
SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily,
which doesn't give a fair amount of CPU time to processes which are
using scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.

Further work will no doubt be done by jeffr at a later date.

Submitted by:	Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Reviewed by:	rwatson, freebsd-current@
2004-03-21 18:53:29 +00:00
Julian Elischer 84eef27df4 Massively up the (artificial) limit on system scope threads
in a process from 50 to 500

Also up the number of process scope threads allowed to be in the kernel
at one time from 150 to 1500 (per process)
2004-03-21 09:22:38 +00:00
Brian Feldman 150883179a Add the missing Giant when doing anything with VFS -- in this case,
releasing the ktrace vnode.
2004-03-18 18:15:58 +00:00
Jacques Vidrine 3dc19c4677 Verify more bits of the ELF header: the program header table
entry size and the ELF version.  Also, avoid a potential integer
overflow when determining whether the ELF header fits entirely
within the first page.

Reviewed by:	jdp

A panic when attempting to execute an ELF binary with a bogus program
header table entry size was

Reported by:	Christer Öberg <christer.oberg@texonet.com>
2004-03-18 16:33:05 +00:00
Alan Cox 9508f75c23 Revise socow_iodone() in light of recent sf_buf changes. Specifically,
use sf_buf_free() instead of sf_buf_mext() to consolidate all actions
that require the page queues lock in one critical section.  While I'm
here remove unnecessary splvm() and splx() calls.
2004-03-17 23:25:04 +00:00
John Baldwin b7e23e826c - Replace wait1() with a kern_wait() function that accepts the pid,
options, status pointer and rusage pointer as arguments.  It is up to
  the caller to copyout the status and rusage to userland if needed.  This
  lets us axe the 'compat' argument and hide all that functionality in
  owait(), by the way.  This also cleans up some locking in kern_wait()
  since it no longer has to drop locks around copyout() since all the
  copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
  use kern_wait() rather than wait1() or wait4().  This removes a bit
  more stackgap usage.

Tested on:	i386
Compiled on:	i386, alpha, amd64
2004-03-17 20:00:00 +00:00
Pawel Jakub Dawidek 9cdb62160b Fix information leakage.
Without this fix it is possible to cheat policies like:
- sysctl security.bsd.see_other_[gu]ids=0,
- mac_seeotheruids(4),
- jail(2)
and get full processes list with their arguments.

This problem has existed since revision 1.62 of kern_proc.c, when it was
introduced.

Reviewed by:	nectar, rwatson.
2004-03-17 13:19:43 +00:00
Colin Percival 018e32c194 Adjust the number of processes waiting on a semaphore properly if we're
woken up in the middle of sleeping.

PR:		misc/64347
Reviewed by:	tjr
MFC after:	7 days
2004-03-17 09:37:13 +00:00
Alan Cox 90ecfebd82 Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext().  Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical.  Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page.  That is now the purpose of
sf_buf_mext().  Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose ephemeral map cache.
2004-03-16 19:04:28 +00:00
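A kernel-side sketch (illustrative, not from the commit) of the resulting general-purpose cycle; the commit further up (2004-04-03) later adds a flags argument to sf_buf_alloc():

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/sf_buf.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    /* Map one wired page ephemerally, use it, then unmap it. */
    static void
    copy_from_page_sketch(vm_page_t m, char *dst, size_t len)
    {
        struct sf_buf *sf;

        sf = sf_buf_alloc(m);                   /* may sleep */
        bcopy((void *)sf_buf_kva(sf), dst, len);
        sf_buf_free(sf);                        /* undoes the mapping only */
    }
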
John Baldwin 27de234992 Remove a bogus assertion and readd it in a more correct location. A thread
might be enqueued on a sleep queue but not be asleep when the timeout fires
if it is blocked on a lock trying to check for pending signals before going
to sleep.  In the case of fixing up the TDF_TIMEOUT race, however, the
thread must be marked asleep.

Reported by:	kan (the bogus one)
2004-03-16 18:56:22 +00:00
Peter Grehan 721b6196d5 Add powerpc to temporary fix. The new cpu device claims all
'generic' OpenFirmware nexus nodes, since it uses bus_generic_probe.
Maybe the cpu device probe should be MD.
2004-03-16 13:34:50 +00:00
David Malone 31c7e8b05b Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.

While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
2004-03-16 10:46:42 +00:00
David Malone 1f325ae35e Get ready to mark open, creat and nosys as MPSAFE. 2004-03-16 10:41:23 +00:00
Tim J. Robbins 537370d0a4 Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.
2004-03-16 08:59:37 +00:00
Don Lewis a961520c13 Rename the wiredlen member of struct sysctl_req to validlen and always
set it to avoid the need for a bunch of code that tests whether or
not the lock member is set to REQ_WIRED in order to determine which
length member should be used.

Fix another bug in the oldlen return value code.

Fix a potential wired memory leak if a sysctl handler uses
sysctl_wire_old_buffer() and returns an EAGAIN error to trigger
a retry.
2004-03-16 06:53:03 +00:00
Don Lewis 8ac3e8e940 Don't bother calling vslock() and vsunlock() if oldlen is zero.
If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set
wiredlen to zero and return zero (success) so that the handler will
operate according to sysctl(3):
     The size of the buffer is given by the location specified by
     oldlenp before the call, and that location gives the amount
     of data copied after a successful call and after a call that
     returns with the error code ENOMEM.
The handler will return an ENOMEM error because the zero length
buffer will overflow.
2004-03-16 01:28:45 +00:00
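A kernel-side sketch of the pattern these fixes service (the handler, lock and table names are illustrative): wire the user buffer before taking locks, then copy out with SYSCTL_OUT():

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/sysctl.h>

    static struct mtx foo_lock;         /* set up with mtx_init() */
    static int foo_table[64];

    static int
    sysctl_foo_list(SYSCTL_HANDLER_ARGS)
    {
        int error;

        /* Wire the whole user buffer (len 0 means req->oldlen) up front. */
        error = sysctl_wire_old_buffer(req, 0);
        if (error != 0)
            return (error);
        mtx_lock(&foo_lock);
        error = SYSCTL_OUT(req, foo_table, sizeof(foo_table));
        mtx_unlock(&foo_lock);
        return (error);
    }
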
John Baldwin 6b55d75c44 Regen for ptrace being safe again. 2004-03-15 18:50:06 +00:00
John Baldwin 8ac61436e6 Drop the proc lock around calls to the MD functions ptrace_single_step(),
ptrace_set_pc(), and cpu_ptrace() so that those functions are free to
acquire Giant, sleep, etc.  We already do a PHOLD/PRELE around them so
that it is safe to sleep inside of these routines if necessary.  This
allows ptrace() to be marked MP safe again as it no longer triggers lock
order reversals on Alpha.

Tested by:	wilko
2004-03-15 18:48:28 +00:00
Pawel Jakub Dawidek 7f4704c01d Remove sysctl security.jail.list_allowed.
This functionality was a misfeature; the sysctl was added and turned off by
default just to check whether anybody would complain.

Reviewed by:	rwatson
2004-03-15 12:10:34 +00:00
Don Lewis ce8660e395 Revert to the original vslock() and vsunlock() API with the following
exceptions:
	Retain the recently added vslock() error return.

	The type of the len argument should be size_t, not u_int.

Suggested by:	bde
2004-03-15 06:42:40 +00:00
Poul-Henning Kamp bcfe6d8b26 Annual NTP kernel code spring-cleaning:
Use int64_t rather than long long for the fixpoint type.

Don't discard fractional nanosecond frequency correction.
2004-03-14 15:23:05 +00:00
Peter Wemm 8f650450c6 Set default HZ to 1024 for amd64. The comment in kern/tty.c doesn't
apply here because we have 64 bit longs and don't suffer the hz > 169
overflows.
2004-03-14 05:49:31 +00:00
Peter Wemm a5bdcb2a2f Make the process_exit eventhandler run without Giant. Add Giant hooks
in the two consumers that need it: processes using AIO and netncp.
Update docs.  Say that process_exec is called with Giant, but not to
depend on it.  All our consumers can handle it without Giant.
2004-03-14 02:06:28 +00:00
Peter Wemm 8a412f314e Move the process_fork event out from under Giant. This one is easy,
since there are no consumers in the tree.  Document this.
2004-03-14 01:48:32 +00:00
Peter Wemm 78c45c5d66 Regen for mpsafe kse_create() 2004-03-13 22:32:17 +00:00
Peter Wemm 37814395c1 Push Giant down a little further:
- no longer serialize on Giant for thread_single*() and family in fork,
  exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
  vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() is not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
  isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
2004-03-13 22:31:39 +00:00
Robert Watson 5d8dd01da2 Add annotations to mtx_lock(&Giant) in kern_select() and poll() that
we always grab Giant, even if we're actually only polling objects that
don't require giant.  Once socket locking is merged, there will be
strong motivation to fix this.
2004-03-13 05:58:57 +00:00
Bruce Evans 0249823ecb Align the offset in vn_rdwr_inchunks() so that at most the first and
the last chunk are misaligned relative to a MAXBSIZE byte boundary.
vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections
are usually perfectly misaligned relative to MAXBSIZE, and chunking
prevents the file system from doing much realigning.

This gives a surprisingly large speedup for core dumps -- from 50 to
13 seconds for a 512MB core dump here.  The pessimization was mostly
from an interaction of the misalignment with IO_DIRECT.  It increased
the number of i/o's for each chunk by a factor of 5 (3 writes and 2
read-before-writes instead of 1 write).
2004-03-13 02:56:27 +00:00
Tom Rhodes a122cca953 These are changes to allow the Intel C/C++ compiler (lang/icc) to be
used to build the kernel. They don't affect operation when gcc is used.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's; as
icc v8 may define __GNUC__, some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed that future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch have survived gcc compiles for a loooong time;
I use it on my desktop. An icc-compiled kernel has worked since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
Ruslan Ermilov 7700eb86e7 Do what the execve(2) manpage says and enforce what a Strictly
Conforming POSIX application should do by disallowing the argv
argument to be NULL.

PR:		kern/33738
Submitted by:	Marc Olzheim, Serge van den Boom
OK'ed by:	nectar
2004-03-12 21:06:20 +00:00
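A minimal userland sketch (illustrative) of what conforming callers already do; passing NULL instead of an argv array is now rejected:

    #include <unistd.h>
    #include <stdio.h>

    int
    main(void)
    {
        char *argv[] = { "ls", "-l", NULL };
        char *envp[] = { NULL };

        /* argv must be a NULL-terminated array, never NULL itself. */
        execve("/bin/ls", argv, envp);
        perror("execve");       /* reached only if execve() failed */
        return (1);
    }
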
Ken Smith db322c7eba This is a temporary fix to solve a regression issue on sparc64 that
is caused by the way sparc64 registers its CPUs.  Nate will work on
a real fix shortly.

Approved by:	njl
2004-03-12 20:35:21 +00:00
John Baldwin 1ed3e44f22 - Remove old sleep queues.
- Remove sleepqueue argument from sleepq_set_timeout() since it is not
  used.
2004-03-12 19:06:18 +00:00
John Baldwin 595bc82a1d Fixup a comment. 2004-03-12 19:05:46 +00:00
Dag-Erling Smørgrav 30a058027a Replace a manual check of a VMIO candidate with vn_canvmio(). This
silences an annoying warning in getblk() when VMIO'ing on a directory
vnode, which can happen when vfs.vmiodirenable is 1.

Bring the warning message in line with reality at the same time.

Submitted by:	hmp
2004-03-12 12:02:12 +00:00
Poul-Henning Kamp ceb58ca58f When I was a kid my work table was one cluttered mess and cleaning it up
was a rather overwhelming task.  I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.

Apply the same principle to the ffs/snapshot/softupdates code which has
leaked into specfs:  add yet another buf quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.

It's not pretty, but at least it's one less layer violated.
2004-03-11 18:50:33 +00:00
Poul-Henning Kamp 4d453ef101 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
Poul-Henning Kamp 2b348f7429 Remove unused mnt_reservedvnlist field. 2004-03-11 16:59:57 +00:00
Poul-Henning Kamp 651b11eaf2 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
Poul-Henning Kamp 8666b655b5 Correctly account for extra bits in unit numbers when looking for
next free unit.
2004-03-11 14:11:02 +00:00
Poul-Henning Kamp 9397290e76 Add clone_setup() function rather than rely on lazy initialization.
Requested by:	rwatson
2004-03-11 12:58:55 +00:00
John-Mark Gurney 0235bf0261 make sure we hold the filedesc lock when calling fdinit() when RFCFDG is set
on a call to rfork().

Submitted by:	Brian Buchanan
Semi-Reviewed by: rwatson
2004-03-10 00:27:36 +00:00
Nate Lawson 29f5b9a8c1 Hook CPUs up to newbus. CPUs will ultimately be a bus driver so that
multiple CPU-specific drivers can attach.  This is a work in progress
so children aren't supported yet.

Help from:	jhb
2004-03-09 03:37:21 +00:00
Robert Watson ce89352952 Mark loadaverage callout as CALLOUT_MPSAFE.
Reviewed by:	jhb
2004-03-08 22:01:19 +00:00
Pawel Jakub Dawidek dd604e2647 Add two new sysctls:
- security.bsd.hardlink_check_uid, when set, means that unprivileged
  users are not permitted to create hard links to files not
  owned by them,
- security.bsd.hardlink_check_gid, when set, means that unprivileged
  users are not permitted to create hard links to files owned
  by a group they don't belong to.

OK'ed by:	rwatson
2004-03-08 20:37:25 +00:00
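What the knobs mean in practice for an unprivileged caller, as a small sketch (illustrative paths; the exact errno is whatever link(2) reports when the check denies the operation):

    #include <unistd.h>
    #include <stdio.h>

    int
    main(void)
    {
        /*
         * With security.bsd.hardlink_check_uid=1, an unprivileged user
         * can only hard-link files they own, so linking to a root-owned
         * file such as /etc/passwd is expected to fail.
         */
        if (link("/etc/passwd", "/tmp/passwd.link") == -1) {
            perror("link");
            return (1);
        }
        printf("link created (check is off or we own the file)\n");
        return (0);
    }
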
Peter Wemm a69d88af52 Move a vref call outside of proc locks and Giant. By virtue of the fact
that we (p1) are currently running, we hold a reference on p_textvp which
means the vnode cannot go away.  p2 cannot run yet (and hence cannot exit)
so this should be safe to do at this point.  As a bonus, it removes a
block of under-Giant code that was there to support the vref.
2004-03-08 00:32:34 +00:00
Alan Cox 3eba15c12e Remove GIANT_REQUIRED from vunmapbuf(). 2004-03-07 00:37:18 +00:00
Alan Cox 5fadbfeac2 Giant is not required for vm_thread_new_altkstack(). 2004-03-07 00:06:32 +00:00
Alexander Kabaev ff85a3f0e1 Always call vn_finished_write after vn_start_write was called. All
occurences of 'goto done' after vn_start_write invocation were cleaning
up incompletely.
2004-03-06 04:09:54 +00:00
Peter Wemm 5750ee293d Add a missing part of jhb's previous commit. It looks like he had a
patch chunk rejected that he missed.  This would manifest as a lock
assertion panic at boot (Giant not locked in kern_fork.c).

Obtained from:  jhb
2004-03-06 00:44:59 +00:00
John Baldwin 6074439965 kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by:	many
2004-03-05 22:42:17 +00:00
John Baldwin 4ae89b957c - Push down Giant in exit() and wait().
- Push Giant down a bit in coredump() and call coredump() with the proc
  lock already held rather than unlocking it only to turn around and
  relock it.

Requested by:	peter
2004-03-05 22:39:53 +00:00
John Baldwin 8144e3b884 Lock Giant around the single threading code in exec() to satisfy an
assertion in the single threading code.
2004-03-05 22:38:26 +00:00
John Baldwin 5ce2f67858 - Grab a share lock of the proctree lock while looking for a pid due to the
process group and session dereferences.  Also, check that p_pgrp and
  p_session are not NULL before dereferencing them.
- Push down Giant in fork1().

Requested by:	peter
2004-03-05 22:37:32 +00:00
Don Lewis 169299398a Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
        src/sys/kern/kern_sysctl.c      1.150
        src/sys/vm/vm_extern.h          1.69
        src/sys/vm/vm_glue.c            1.190
        src/sys/vm/vm_mmap.c            1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup.  Make the mlock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock().  Move the most stringent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by:	Cy Schubert <Cy.Schubert AT komquats.com>
2004-03-05 22:03:11 +00:00
Robert Watson 8cbec0c8dd The roundrobin callout from sched_4bsd is MPSAFE, so set up the
callout as MPSAFE to avoid grabbing Giant.

Reviewed by:	jhb
2004-03-05 19:27:04 +00:00
Robert Watson 16df17d062 Put "failed to set signal flags properly for ast()" check under
DIAGNOSTIC instead of INVARIANTS.  INVARIANTS is intended for tests
that don't substantially change code flow or behavior (passive), but
this test required locking both the proc lock and scheduler lock
in order to execute.  It also appears to be a very advisory diagnostic
as opposed to an invariant violation.

Following discussion with:	bde
2004-03-05 17:35:28 +00:00
Poul-Henning Kamp 1e0e79c993 Just because the timecounter reads the same value on two successive
samples doesn't mean that nothing happened.
2004-03-04 14:14:23 +00:00
Bruce Evans c8564ad433 Fixed some style bugs (mainly English usage errors in comments). 2004-03-04 09:56:29 +00:00
Bruce Evans 01e3f3ae4f Fixed some style bugs (mainly misplaced comments, and totally disordered
declarations in acct_process()).
2004-03-04 09:47:09 +00:00
Robert Watson 0b759971a2 Remove unneeded label 'done2' from socket(). We now grab Giant
only around socreate(), and don't need it for file descriptor
accesses.

Submitted by:	sam
2004-03-04 01:57:48 +00:00
Dag-Erling Smørgrav 86b5e56351 Use different dummy wait channels to avoid panic in msleep().
Reviewed by:	jhb
2004-03-03 23:03:18 +00:00
John Baldwin efac7951fe Always assert that the passed in lock is the same as the saved lock in the
sleep queue now that the one abnormal case has been fixed.
2004-03-02 15:02:08 +00:00
John Baldwin 959c0c4122 Correct handling of PDROP in msleep() to just skip the mtx_lock() rather
than clear the lock pointer so that sleepq_add() still gets the correct
lock pointer and doesn't bogusly trip an assertion.
2004-03-02 14:58:33 +00:00
John Baldwin 707559e402 Check for TDF_SINTR before calling sleepq_abort() as there is a narrow
race in between sleepq_add() and sleepq_catch_signals() in that setting
td_wchan and TDF_SINTR is not atomic to sched_lock but only to the sleepq
lock.  This band-aid will stop assertion failures, but there is perhaps a
larger problem with the sleepq_add/sleepq_catch_signals race that I am not
sure how to solve.  For the signals case the race is harmless because we
always call cursig() after setting TDF_SINTR.  However, KSE doesn't do
anything in sleepq_catch_signals() to check that this race was lost, so I
am unsure if this race is harmful for this specific abort.
2004-03-01 23:07:58 +00:00
Robert Watson 746e5bf09b Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by:	sam
2004-03-01 03:14:23 +00:00
Scott Long 740d9ba692 Convert the other use of flags to mflags in soalloc(). 2004-03-01 01:14:28 +00:00
Robert Watson 2bc87dcfbe Modify soalloc() API so that it accepts a malloc flags argument rather
than a "waitok" argument.  Callers now passing M_WAITOK or M_NOWAIT
rather than 0 or 1.  This simplifies the soalloc() logic, and also
makes the waiting behavior of soalloc() more clear in the calling
context.

Submitted by:	sam
2004-02-29 17:54:05 +00:00
Poul-Henning Kamp 2cf6bdac50 Loudly announce WITNESS and DIAGNOSTIC options and warn about reduced
performance.
2004-02-29 16:56:54 +00:00
Poul-Henning Kamp 3d6e5ccb06 Make sure to disable the watchdog if we cannot honour the timeout. 2004-02-28 22:01:19 +00:00
Poul-Henning Kamp 4103b7652d Rename the WATCHDOG option to SW_WATCHDOG and make it use the
generic watchdog(9) interface.

Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.

Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
2004-02-28 20:56:35 +00:00
John Baldwin 44f3b09204 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep queues in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleeps no longer inherently bump the priority.  Instead, this is solely
  a property of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
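Consumers are unaffected by the rework; the familiar pattern (sketched below with illustrative names) still applies, with msleep()/cv_wait() now sharing the sleep queue machinery underneath:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/condvar.h>

    static struct mtx foo_mtx;          /* set up with mtx_init() */
    static struct cv foo_cv;            /* set up with cv_init() */
    static int foo_ready;

    static void
    foo_wait(void)
    {
        mtx_lock(&foo_mtx);
        while (!foo_ready)
            cv_wait(&foo_cv, &foo_mtx); /* sleeps on a sleep queue */
        mtx_unlock(&foo_mtx);
    }

    static void
    foo_post(void)
    {
        mtx_lock(&foo_mtx);
        foo_ready = 1;
        cv_signal(&foo_cv);
        mtx_unlock(&foo_mtx);
    }
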
John Baldwin e5bb601d87 Drop sched_lock around the wakeup of the parent process after setting
the process state to zombie when a process exits to avoid a lock order
reversal with the sleepqueue locks.  This appears to be the only place
that we call wakeup() with sched_lock held.
2004-02-27 18:39:09 +00:00