Commit graph

4932 commits

Author SHA1 Message Date
Poul-Henning Kamp 7e2d76ff05 Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes.  If better adjustment is needed
the NTP PLL should be used.
2002-04-26 10:06:26 +00:00
Brian Somers b94c4e9a93 Test if rootvnode is NULL rather than if rootdev is NODEV when determining
if there's a filesystem present.

rootdev can be NODEV in the NFS-mounted root scenario.

Discussed with: Harti Brandt <brandt@fokus.gmd.de>, iedowse
2002-04-26 09:52:54 +00:00
Mike Silbersack e1f1827f98 Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.  Previously, there was
no mechanism to kick out old sockets, leading to an easy DoS of
daemons using accept filtering.

Reviewed by:	alfred
MFC after:	3 days
2002-04-26 02:07:46 +00:00
Dag-Erling Smørgrav 521eb014c8 Add the mutex profiling lock to the witness list. This hopefully unbreaks
the MUTEX_PROFILING + WITNESS + !WITNESS_SKIPSPIN case.

Submitted by:	Hiten Pandya <hiten@uk.FreeBSD.org>
2002-04-25 22:48:40 +00:00
Bruce Evans 2c900f6451 Fixed some longstanding bugs in _getenv_static():
- malformed environment strings (ones without an '=') were not rejected.
  There shouldn't be any of these, but when the static environment is
  empty it always begins with one of these; this one should be considered
  as the terminator after the end of the environment, but it isn't.
- the comparison of the name being looked up with the name in the
  environment was fuzzy -- only the characters up to the length of the
  latter were compared, so _getenv_static("foobar") matched "foo=..."
  in the environment and everything matched "" in the empty environment.

MFC after:	3 days
2002-04-25 20:25:15 +00:00
Bruce Evans ff557fa1a9 Break the following implementation of panic(3):
#!bin/sh

	# Original version of this by Michael Reifenberger
	# <root@nihil.plaut.de>.

	mdconfig -d -u 11 >/dev/null 2>&1
	dd if=/dev/zero of=zz bs=1m count=1

	while :
	do
		mdconfig -a -t vnode -f zz -u 11
		fdisk -f - -iv /dev/md11 <<EOF1
		g c1 h64 s32
		p 1 165 0 2048
		a 1
	EOF1
		mdconfig -d -u 11
	done

Garbage pointers in __si_u were not cleared by destroy_dev().  Not
clearing si_disk made the above fatal because the disk layer uses
si_disk as a flag to indicate that the dev_t has been completely
initialized.  disk_destroy() clears si_disk for the parent dev_t
but doesn't get called for children.

Not fixed:
- setting the undocumented sysctl debug.free_devt should cause more
  complete destruction of the dev_t including clearing of __si_u, but
  actually causes the above to panic a little earlier.
- the loop leaks 10 memory allocations per iteration (4 DEVFS, 2 devbuf
  and 4 dev_t).

Reviewed by:	timeout by MAINTAINER after 3 months
2002-04-25 13:17:33 +00:00
Marcel Moolenaar d297ad160e Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.
2002-04-25 01:22:16 +00:00
Seigo Tanimura ce00aebe22 Free(9) should be Giant-free.
Suggested by:	jhb
2002-04-24 09:59:18 +00:00
Mike Silbersack c473d3e406 Remove sodropablereq - this function hasn't been used since the
syncache went in.

MFC after:	3 days
2002-04-24 04:11:08 +00:00
Jeffrey Hsu 4bc37205bc The cold and panicstr variables do not need to be protected by sched_lock.
Submitted by:	Jennifer Yang (yangjihui@yahoo.com)
Reviewed by:	jake & jhb in principle
2002-04-23 19:50:22 +00:00
Poul-Henning Kamp 708da94ef2 Add a basic sanity check on pointers passed to free(9).
Should be improved by:	jeff
2002-04-23 18:50:25 +00:00
Poul-Henning Kamp 00d70dec4e Don't call malloc(9) to allocate zero bytes softc data for devices. 2002-04-23 15:48:23 +00:00
Robert Watson 7a0776e477 Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR().  This simplifies code flow when inserting MAC hooks.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-23 01:27:38 +00:00
Alfred Perlstein ea5b39d029 Don't FILEDESC_LOCK around calls to falloc(). 2002-04-22 20:09:11 +00:00
Dag-Erling Smørgrav d397408818 Usage style sweep: spell "usage" with a small 'u'.
Also change one case of blatant __progname abuse (several more remain)
This commit does not touch anything in src/{contrib,crypto,gnu}/.
2002-04-22 13:44:47 +00:00
Poul-Henning Kamp 29f88f470e Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
2002-04-22 06:53:20 +00:00
Marcel Moolenaar 8420105927 Add function link_elf_get_gp(), specific to ia64 for now, to get
the DT_PLTGOT value. On ia64 this is the value of GP. We need this
to construct function descriptors, but the elf file structure is
not exported to MD code.

Note that the name of the function is based on the meaning that
DT_PLTGOT has on ia64. This may differ on other architectures. As
such, link_elf_get_gp() has a high level of MD to it. Renaming the
function to describe what DT_* value is returned makes it generic,
but also makes the MD code less clear and if we only need this on
ia64, then a general name for a specific function doesn't help.

In short: I don't know what is "right" at this time, so I'll go
with what I have.
2002-04-21 21:08:30 +00:00
Mark Murray bd41864183 Use protected names (_foo) to cutdown on boatloads of lint warnings. 2002-04-21 11:16:10 +00:00
Marcel Moolenaar 9daa5b147a GCC 3.x WARNS: Add a break to the default case. 2002-04-20 21:56:42 +00:00
Seigo Tanimura 1c2451c24d Push down Giant for setpgid(), setsid() and aio_daemon(). Giant protects only
malloc(9) and free(9).
2002-04-20 12:02:52 +00:00
Robert Watson 0510317039 Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file.  Originally,
these bits at the end looked more like style(9).  This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through.  Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-20 01:37:08 +00:00
Robert Watson 89e9e6e7c5 In sendfile(), use the vn_rdwr() helper function, rather than manually
constructing a struct aio and invoking VOP_READ() directly.  This cleans
up the code a little, but also has the advantage of making sure almost
all vnode read/write access in the kernel goes through the helper
function, meaning that instrumentation of that helper function can impact
almost all relevant read/write operations.  In this case, it permits us
to put MAC hooks into vn_rdwr() and not modify uipc_syscalls.c (yet).

In general, if helper vn_*() functions exist, they should be used in
preference to direct VOP's in system call service code.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-19 13:46:24 +00:00
Robert Watson 5a06cb0ca6 Divorce proc0 and proc1 credentials earlier; while this isn't technically
needed in the current code, in the MAC tree, create_init() relies on the
ability to modify the credentials present for initproc, and should not
perform that modification on a shared credential.  Pro-active diff
reduction against MAC changes that are in the queue; also facilitates
other work, including the capabilities implementation.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-04-19 13:35:53 +00:00
Poul-Henning Kamp 3bdd2d061a suser is Giant safe, so optimize a pointless case. 2002-04-19 09:20:13 +00:00
SUZUKI Shinsuke 88ff5695c1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
Jacques Vidrine e983a3762b When exec'ing a set[ug]id program, make sure that the stdio file descriptors
(0, 1, 2) are allocated by opening /dev/null for any which are not already
open.

Reviewed by:	alfred, phk
MFC after:	2 days
2002-04-19 00:45:29 +00:00
Maxime Henrion b48a4280fc Avoid calling malloc() or free() while holding the
kenv lock.

Reviewed by:	jake
2002-04-17 17:51:10 +00:00
Maxime Henrion d786139c76 Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up.  The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv().  freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by:	peter
2002-04-17 13:06:36 +00:00
Maxime Henrion fd448168b7 Add an entry for the kenv(2) syscall (code to follow).
Reviewed by: peter
2002-04-17 13:05:13 +00:00
Ian Dowse df99ca52f1 The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by:	simokawa
MFC after:	1 week
2002-04-17 01:07:29 +00:00
John Baldwin ba626c1db2 Lock proctree_lock instead of pgrpsess_lock. 2002-04-16 17:11:34 +00:00
John Baldwin 596325f154 - Lock proctree_lock instead of pgrpsess_lock.
- Use temporary variables to hold a pointer to a pgrp while we dink with it
  while not holding either the associated proc lock or proctree_lock.  It
  is in theory possible that p->p_pgrp could change out from under us.
2002-04-16 17:09:22 +00:00
John Baldwin c8b1829d8e - Lock proctree_lock instead of pgrpsess_lock.
- Simplify return logic of setsid() and setpgid().
2002-04-16 17:06:11 +00:00
John Baldwin ea97757a54 - Lock proctree_lock instead of pgrpsess_lock.
- Exclusively lock proctree_lock while calling leavepgrp().
2002-04-16 17:04:21 +00:00
John Baldwin f089b57070 - Merge the pgrpsess_lock and proctree_lock sx locks into one proctree_lock
sx lock.  Trying to get the lock order between these locks was getting
  too complicated as the locking in wait1() was being fixed.
- leavepgrp() now requires an exclusive lock of proctree_lock to be held
  when it is called.
- fixjobc() no longer gets a shared lock of proctree_lock now that it
  requires an xlock be held by the caller.
- Locking notes in sys/proc.h are adjusted to note that everything that
  used to be protected by the pgrpsess_lock is now protected by the
  proctree_lock.
2002-04-16 17:03:05 +00:00
Poul-Henning Kamp fe4dc7a6ee Remove two debug printfs which should never have been committed. 2002-04-15 21:08:51 +00:00
John Baldwin 38e0823392 You have to cast int64_t's to long long if you printf them with %lld.
This now compiles on alpha without a warning.

Pointy-hat to:	phk
2002-04-15 21:04:32 +00:00
Poul-Henning Kamp e1d970f181 Improve the implementation of adjtime(2).
Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.

In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder.  If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.

The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.

Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.

This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.
2002-04-15 12:23:11 +00:00
Poul-Henning Kamp b35c8f287d Take the "tickadj" element out of struct clockinfo. Our adjtime(2)
implementation is being changed and the very concept of tickadj will
no longer be meaningful.
2002-04-15 12:11:06 +00:00
Poul-Henning Kamp b9c6e8bdbd In the ntp_adjtime(2) syscall, return our actual estimate of unapplied
offset correction instead of the most recent offset applied.
2002-04-15 08:58:24 +00:00
Jeff Roberson 5e914b96b9 Finish adding support code for sysctl kern.mprof. This dumps some malloc
information related to bucket size effeciency.  Three things are printed on
each row:

Size is the size the user actually asked for rounded to 16 bytes.
Requests is the number of times this size was asked for.
Real Size is the size we actually handed out.

At the end the total memory used and total waste is displayed.  Currently my
system displays about 33% wasted memory.

The intent of this code is to gather statistics for tuning the malloc bucket
sizes.  It is not intended to be run with INVARIANTS and it is not entirely
mp safe.  It can be enabled via 'options MALLOC_PROFILE' which was commited
earlier.
2002-04-15 05:24:01 +00:00
Jeff Roberson 6f2671750e Remove malloc_type's ks_limit.
Updated the kmemzones logic such that the ks_size bitmap can be used as an
index into it to report the size of the zone used.

Create the kern.malloc sysctl which replaces the kvm mechanism to report
similar data.  This will provide an easy place for statistics aggregation if
malloc_type statistics become per cpu data.

Add some code ifdef'd under MALLOC_PROFILING to facilitate a tool for sizing
the malloc buckets.
2002-04-15 04:05:53 +00:00
Alfred Perlstein 46e12b42fe Don't allow one to trace an ancestor when already traced.
PR: kern/29741
Submitted by: Dave Zarzycki <zarzycki@FreeBSD.org>
Fix from: Tim J. Robbins <tim@robbins.dropbear.id.au>
MFC After: 2 weeks
2002-04-14 17:12:55 +00:00
Jeff Roberson 79a3e97054 Use VOP_GETVOBJECT instead of accessing the member directly. This fixed
an issue with nullfs and NAMEI shared.

Submitted by:	Alexander Kabaev
2002-04-14 10:18:48 +00:00
Alan Cox 24ab015f79 Regen 2002-04-14 05:33:58 +00:00
Alan Cox b0d97980f6 Remove the requirement that Giant be held around sigreturn(). 2002-04-14 05:31:47 +00:00
Alan Cox 00e731601d o Use aiocblist::fd_file in the AIO threads rather than recomputing
the file * from the calling process's descriptor table.
 o Eliminate sharing of the calling process's descriptor table
   with the AIO threads.
2002-04-14 03:04:19 +00:00
John Baldwin 9c1ab3e04a - Change killpg1()'s first argument to be a thread instead of a process so
we can use td_ucred.
- In killpg1(), the proc lock is sufficient to check if p_stat is SZOMB
  or not.  We don't need sched_lock.
- Close some races in psignal().  In psignal() there is a big switch
  statement based on p_stat.  All the different cases are assuming that
  the process (or thread) isn't going to change state out from under it.
  To ensure this is true, just lock sched_lock for the entire switch.  We
  practically held it the entire time already anyways.  This also
  simplifies the locking somewhat and actually results in fewer lock
  operations.
- Allow signotify() to be called with the sched_lock held since psignal()
  now does that.
- Use td_ucred in a couple of places.
2002-04-13 23:33:36 +00:00
John Baldwin bad56603ba - Change donice() to take a thread as the first argument instead of a
process so it can use td_ucred.
- Require the target process of donice() to be locked when donice() is
  called.
- Use td_ucred.
- Lock the target process of p_cansee() and while reading the credentials
  of a process.
- Change the logic of rtprio() slightly so it does it's copyin() if needed
  prior to locking the target process.
- rtprio() no longer needs Giant.  In theory with full KSE it would still
  need Giant to protect p_ucred of curproc for the p_canfoo() functions
  but p_canfoo() will be changing to using td_ucred of curthread before
  full KSE hits the tree.
2002-04-13 23:28:23 +00:00
John Baldwin 07f3485d5e - Change the algorithms of the syscalls to modify process credentials to
allocate a blank cred first, lock the process, perform checks on the
  old process credential, copy the old process credential into the new
  blank credential, modify the new credential, update the process
  credential pointer, unlock the process, and cleanup rather than trying
  to allocate a new credential after performing the checks on the old
  credential.
- Cleanup _setugid() a little bit.
- setlogin() doesn't need Giant thanks to pgrp/session locking and
  td_ucred.
2002-04-13 23:07:05 +00:00
John Baldwin a7ff744350 - Change the first argument of ktrcanset(), ktrsetchildren(), and ktrops()
to a thread pointer so that ktrcanset() can use td_ucred.
- Add some proc locking to partially protect p_tracep and p_traceflag.
2002-04-13 22:54:18 +00:00
Thomas Moestl 8db523989f Use pmap_extract() instead of pmap_kextract() to retrieve the physical
address associated with a user virtual address in
pipe_build_write_buffer().

Reviewed by:	alc
2002-04-13 20:09:06 +00:00
Jeroen Ruigrok van der Werven bcbf4411d6 Use the correct macros for F_SETFD/F_GETFD instead of magic numbers.
Reflect that fact in the manual page.

PR:		12723
Submitted by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
Approved by:	bde
MFC after:	2 weeks
2002-04-13 10:16:53 +00:00
Thomas Moestl de67a4bd91 Back out the last revision - it does not work correctly when one of
the pages in question is not in the top-level vm object, but in
one of the shadow ones.

Pointed out by: alc
Pointy hat to:	tmm
2002-04-13 00:03:07 +00:00
John Baldwin 6871a6c89e Rework ptrace(2) to be more locking friendly. We do any needed copyin()'s
and acquire the proctree_lock if needed first.  Then we lock the process
if necessary and fiddle with it as appropriate.  Finally we drop locks and
do any needed copyout's.  This greatly simplifies the locking.
2002-04-12 21:17:37 +00:00
Thomas Moestl 60f2606a7d Do not use pmap_kextract() to find out the physical address of a user
belong to a user virtual address; while this happens to work on some
architectures, it can't on sparc64, since user and kernel virtual
address spaces overlap there (the distinction between them is done via
separate address space identifiers).

Instead, look up the page in the vm_map of the process in question.

Reviewed by:	jake
2002-04-12 19:38:41 +00:00
Jeffrey Hsu 4037698769 Fix corner case where m_len was not being initialized.
Submitted by:	Maksim Yevmenkin <myevmenk@digisle.net>
MFC after:	1 week
2002-04-12 00:01:50 +00:00
John Baldwin b106d2f56a - Set the base priority of an ithread that has no handlers when we set its
normal priority.
- Lock sched_lock while we dink with the priorities.
- Remove a few extra blank lines.
2002-04-11 21:03:35 +00:00
Alan Cox ab9ab5702e Regen 2002-04-11 17:35:53 +00:00
Alan Cox a0805f6f7a Remove the requirement that Giant be held around osigreturn(). All platform-
specific implementations are MPSAFE.
2002-04-11 17:34:38 +00:00
John Baldwin 7edfb592df - Change settime() to take a thread as its first argument instead of a proc
so it can use td_ucred.
- Push Giant down into the end of settime() where we actually set the time
  on the timecounter and time of day clock.
- Remove Giant from clock_settime().
- Push Giant down in settimeofday() to just protect the 'tz' global
  variable.
2002-04-10 04:09:07 +00:00
John Baldwin 9522390c28 Display the recursion count in the lock_instance in the show locks
output.

Indirectly requested by:	peter
2002-04-10 01:25:11 +00:00
John Baldwin 9351347a17 Cosmetic fixup in output of lock types in show locks output. 2002-04-10 01:19:53 +00:00
Brian Somers f1e4a6e941 In linker_load_module(), check that rootdev != NODEV before calling
linker_search_module().

Without this, modules loaded from loader.conf that then try to load
in additional modules (such as digi.ko loading a card's BIOS) die
badly in the vn_open() called from linker_search_module().

It may be worth checking (KASSERTing?) that rootdev != NODEV in
vn_open() too.
2002-04-10 01:14:45 +00:00
Brian Somers 96987c74d6 Change linker_reference_module() so that it's passed a struct
mod_depend * (which may be NULL).  The only consumer of this
function at the moment is digi_loadmoduledata(), and that passes
a NULL mod_depend *.

In linker_reference_module(), check to see if we've already got
the required module loaded.  If we have, bump the reference count
and return that, otherwise continue the module search as normal.
2002-04-10 01:13:57 +00:00
John Baldwin 65c9b4303b - Change fill_kinfo_proc() to require that the process is locked when it
is called.
- Change sysctl_out_proc() to require that the process is locked when it
  is called and to drop the lock before it returns.  If this proves too
  complex we can change sysctl_out_proc() to simply acquire the lock at
  the very end and have the calling code drop the lock right after it
  returns.
- Lock the process we are going to export before the p_cansee() in the
  loop in sysctl_kern_proc() and hold the lock until we call
  sysctl_out_proc().
- Don't call p_cansee() on the process about to be exported twice in
  the aforementioned loop.
2002-04-09 20:10:46 +00:00
John Baldwin 9b28af9165 Whitespace changes to wrap long lines. 2002-04-09 20:01:16 +00:00
John Baldwin 6dc958b9ff We don't need Giant to read the pgrp ID since the proc lock has protected
p_pgrp since the pgrp locking went in.  We also don't need it to check for
invalid values in the options argument to wait1(), so push Giant down
slightly.
2002-04-09 20:00:40 +00:00
John Baldwin 16e7bc7b90 - Remove an early KSE diagnostic panic. The thread pointer here is always
curthread.
- We don't need Giant to do suser() checks now, so don't lock Giant until
  after the check.
2002-04-09 19:58:38 +00:00
John Baldwin 2b60cfc5ce Don't lock the ithread lock in ithread_create(). The ithread isn't on any
lists or in any tables yet so there are no other references to it, thus
we don't need to lock it.
2002-04-09 16:26:37 +00:00
Poul-Henning Kamp 1bdb20a68e Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.
2002-04-09 15:43:32 +00:00
Poul-Henning Kamp 7f086a0852 Rename DIOCGKERNELDUMP to DIOCSKERNELDUMP as it strictly speaking
is a "set" not a "get" operation.

Sponsored by:	DARPA & NAI Labs.
2002-04-09 10:04:09 +00:00
Jeff Roberson a59f8b9e6c Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this
behavior by default.  Also, change the options line to reflect this.

If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.

Demanded by:	obrien
2002-04-09 05:14:17 +00:00
Maxime Henrion a48ca36999 The fourth parameter to copystr() is a size_t, not an int.
Approved by:	peter
2002-04-08 21:14:19 +00:00
Poul-Henning Kamp 2dd527b3ac Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>.
Sponsored by:	DARPA & NAI Labs
2002-04-08 09:20:07 +00:00
Poul-Henning Kamp d39e457bba Put back dumppcb, but this time we put a comment to tell what it is for.
Brucifixion by:	bde
2002-04-08 06:59:13 +00:00
Alan Cox c0bf5caa74 Restructure aio_return() to eliminate duplicated code and facilitate Giant
push down.
2002-04-08 04:57:56 +00:00
Jeffrey Hsu 20504246d8 There's only one socket zone so we don't need to remember it
in every socket structure.
2002-04-08 03:04:22 +00:00
Maxime Henrion 9d8353732e o Change kernel_vmount() interface to be more convenient : pass two
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
  fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
  when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs
  refcount in the !MNT_UPDATE case.
2002-04-07 13:22:47 +00:00
David Malone cf4ce70bb3 Remove a comment which relates to the old name cache code, which
was replaced in 1997.

Approved by:	phk
2002-04-07 08:58:31 +00:00
Alan Cox ae124fc4bd Reduce the duplication of code for error handling in _aio_aqueue(). 2002-04-07 07:17:59 +00:00
Alan Cox 63a4964eec Change jobref and *ijoblist from int to long in order to avoid
a catastrophe after the 2^32nd AIO operation on 64-bit architectures.
2002-04-07 01:28:34 +00:00
Jake Burkholder 98281c99fc Remove a stale comment. 2002-04-06 08:44:04 +00:00
Jake Burkholder a9f5d33875 Include machine/ktr.h for sparc64 so we pick up KTR_CPU. 2002-04-06 08:43:17 +00:00
Jake Burkholder a30d7c60f6 Use CTASSERT rather than a runtime check to detect kinfo_proc size changes.
Remove the ugly yuck code to busy wait for 20 seconds.
2002-04-06 08:13:52 +00:00
Yoshihiro Takahashi d7ef6277af Added the new kernel dumping support for pc98. 2002-04-06 06:41:54 +00:00
Bruce Evans c78f394575 Updated a doubly stale comment about signotify(). Fixed a nearby long line. 2002-04-05 10:00:37 +00:00
Peter Wemm 911fc92344 Increase the size of the register stack storage on ia64 from 32K to 2MB so
that we can compile gcc.  This is a hack because it adds a fixed 2MB to
each process's VSIZE regardless of how much is really being used since
there is no grow-up stack support.  At least it isn't physical memory.
Sigh.

Add a sysctl to enable tweaking it for new processes.
2002-04-05 01:57:45 +00:00
Thomas Moestl d7f7792edf Add a generic implementation of inittodr() and resettodr(), as well as
a set of helper routines to deal with real-time clocks. The generic
functions access the clock diver using a kobj interface. This is intended
to reduce code reduplication and make it easy to support more than one
clock model on a single architecture.

This code is currently only used on sparc64, but it is planned to convert
the code of the other architectures to it later.
2002-04-04 23:39:10 +00:00
John Baldwin 6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
John Baldwin 0c88508a78 Change mtx_init() to now take an extra argument. The third argument is
the generic lock type for use with witness.  If this argument is NULL then
the lock name is used as the lock type.  Add a macro for a lock type name
for network driver locks.
2002-04-04 20:52:27 +00:00
John Baldwin 9939f0f11c Set the lock type equal to the lock name for now as all of the current
sx locks don't use very specific lock names.
2002-04-04 20:49:35 +00:00
John Baldwin b6396e1656 Add a new char * pointer lo_type to struct lock_object that is used to
point to a more generic name for a lock that is more suitable for use by
witness when grouping locks.  For example, although network driver locks
use the interface name for the name of each lock, they should all use the
same witness and be treated the same as witness.  Another example is that
all UMA zone locks should be treated the same.  The witness code has also
been updated to print out the lock type in addition to the lock name in a
few places where it is relevant.
2002-04-04 20:45:21 +00:00
Poul-Henning Kamp f67ad03a25 Delete the bogus d_boot[01] fields from struct disklabel.
This shrinks the size 4 bytes on alpha, down to the same 276 bytes
as all other platforms.

Construct a hack to make old ioctls work on new kernels.

Once world is recompiled only the new and correct sysctls will be
used.

This hack will become annoying around 1st of may to make people
rebuild their worlds and it will be gone before 5.0.
2002-04-04 20:34:48 +00:00
Bruce Evans 79065dba2a Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by:	luoqi (first part only, long ago, reorganized by me)
Reminded by:	dillon
2002-04-04 17:49:48 +00:00
Bruce Evans 179235b38b Optimized the check for unmasked pending signals in CURSIG() using a new
inline function sigsetmasked() and a new macro SIGPENDING().  CURSIG()
will soon be moved out of the normal path of execution for syscalls and
traps.  Then its efficiency will be less important but the new interfaces
will be useful for checking for unmasked pending signals in more places.

Submitted by:		luoqi (long ago, in a slightly different form)

Assert that sched_lock is not held in CURSIG().
2002-04-04 15:19:41 +00:00
Alan Cox 9b16adc1e7 o aio_process needn't fhold()/fdrop() the fp now that _aio_aqueue() and
aio_free_entry() do this.
 o Remove two unnecessary/unused variables from aio_process() and one field
   from aiocblist.
2002-04-04 02:13:20 +00:00
Alfred Perlstein 19a0f7e1be Avoid a lock order reversal by dropping the eventhandler_mutex earlier.
We get enough protection from the lock on the individual lists that we
aquire later.

Noticed/Tested by: Steven G. Kargl <kargl@troutmask.apl.washington.edu>
Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-04-04 00:52:03 +00:00
John Baldwin 7049932843 - Axe a stale comment. We haven't allowed the ucred pointer passed to
securelevel_*() to be NULL for a while now.
- Use KASSERT() instead of if (foo) panic(); to optimize the
  !INVARIANTS case.

Submitted by:	Martin Faxer <gmh003532@brfmasthugget.se>
2002-04-03 18:35:25 +00:00
Maxime Henrion bcc931752f Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount().
Reviewed by:	phk
2002-04-03 12:19:03 +00:00
Ruslan Ermilov 12c79eb288 Dike out a highly insecure UCONSOLE option.
TIOCCONS must be able to VOP_ACCESS() /dev/console to succeed.

Obtained from:	OpenBSD
2002-04-03 10:56:59 +00:00
Matthew Dillon d1b534dfc6 brelse() was improperly clearing B_DELWRI in the B_DELWRI|B_INVAL case
without removing the buffer from the vnode's dirty buffer list, which
can result in a panic in NFS.  Replaced the code with a call to bundirty()
which deals with it properly.

PR:		kern/36108, kern/36174
Submitted by:	various people
Special mention: to Danny Schales <dan@coes.LaTech.edu> for providing a core dump that helped me track this down.
MFC after:	1 day
2002-04-03 00:17:36 +00:00
Dag-Erling Smørgrav e633070431 Revert to open hashing. It makes the code simpler, and works farily well
even when the number of records approaches the size of the hash table.
Besides, the previous implementation (using linear probing) was broken :)

Also, use the newly introduced MTX_SYSINIT.
2002-04-02 23:26:32 +00:00
John Baldwin c53c013bae - Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
  needed for mutexes such as setting up thread0's contested lock list
  and initializing MI mutexes.  Change the various MD startup routines
  to call this function instead of duplicating all the code themselves.

Tested on:	alpha, i386
2002-04-02 22:19:16 +00:00
John Baldwin 7feefcd6ce Spelling police. 2002-04-02 20:44:30 +00:00
John Baldwin c08cf3c3e8 Enforce an implicit lock order of sleepable locks before non-sleepable
locks.
2002-04-02 19:27:21 +00:00
Andrew R. Reiter 72a492cacf - Add a mutex to lock the global securelevel value.
- Make use of MTX_SYSINIT() as the means to initialize our mutex lock.
2002-04-02 17:43:17 +00:00
Seigo Tanimura 2a60b9b951 Fix leakage of p_pgrp lock. 2002-04-02 17:12:06 +00:00
John Baldwin 48c343df5f Explicitly document how we implicitly enforce the lock order of sleep
locks before spin locks.
2002-04-02 16:51:20 +00:00
Andrew R. Reiter c27b56999e - Add MTX_SYSINIT and SX_SYSINIT as macro glue for allowing sx and mtx
locks to be able to setup a SYSINIT call.  This helps in places where
  a lock is needed to protect some data, but the data is not truly
  associated with a subsystem that can properly initialize it's lock.
  The macros use the mtx_sysinit() and sx_sysinit() functions,
  respectively, as the handler argument to SYSINIT().

Reviewed by: alfred, jhb, smp@
2002-04-02 16:05:43 +00:00
Dag-Erling Smørgrav b784ffe91a Instead of get_cyclecount(9), use nanotime(9) to record acquisition and
release times.  Measurements are made and stored in nanoseconds but
presented in microseconds, which should be sufficient for the locks for
which we actually want this (those that are held long and / or often).
Also, rename some variables and structure members to unit-agnostic names.
2002-04-02 14:42:01 +00:00
Poul-Henning Kamp 408ab1b875 Retire the bogus ioctl DIOCGPART in toto.
Once again we can notice that badly thought out hacks ferment and infect
far more code than initially expected.

Sponsored by:	DARPA and NAI Labs.
2002-04-02 11:52:13 +00:00
Marcel Moolenaar 7902451821 Don't compile the dummy dumpsys for ia64. 2002-04-02 10:55:40 +00:00
Robert Watson 3bd1da2958 Update comment regarding the locking of the sysctl tree.
Rename memlock to sysctllock, and MEMLOCK()/MEMUNLOCK() to SYSCTL_LOCK()/
SYSCTL_UNLOCK() and related changes to make the lock names make more
sense.

Submitted by:	Jonathan Mini <mini@haikugeek.com>
2002-04-02 05:50:07 +00:00
Alfred Perlstein 29a2c0cd09 Use sx locks instead of flags+tsleep locks.
Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-04-02 04:20:38 +00:00
Alfred Perlstein 28fe1a715e Use sx locks rather than lockmgr locks for eventhandlers.
Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-04-02 04:18:54 +00:00
Dag-Erling Smørgrav 6c35e80948 Mutex profiling code, conditional on the MUTEX_PROFILING option. Adds the
following sysctl variables:

  debug.mutex.prof.enable	    enable / disable profiling
  debug.mutex.prof.acquisitions	    number of mutex acquisitions recorded
  debug.mutex.prof.records	    number of acquisition points recorded
  debug.mutex.prof.maxrecords	    max number of acquisition points
  debug.mutex.prof.rejected	    number of rejections (due to full table)
  debug.mutex.prof.hashsize	    hash size
  debug.mutex.prof.collisions	    number of hash collisions
  debug.mutex.prof.stats	    profiling statistics

The code records four numbers for each acquisition point (identified by
source file name and line number): longest time held, total time held,
number of non-recursive acquisitions, average time held.  The measurements
are in clock cycles (as returned by get_cyclecount(9)); this may cause
measurements on some SMP systems to be unreliable.  This can probably be
worked around by replacing get_cyclecount(9) by some incarnation of
nanotime(9).

This work was derived from initial patches by eivind.
2002-04-02 00:01:49 +00:00
Matthew Dillon 182da8209d Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development.  Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures.  Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable.  Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks.  They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by:	jake
2002-04-01 23:51:23 +00:00
John Baldwin 44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
John Baldwin 4c44ad8ee5 Whitespace only change: use ANSI function declarations instead of K&R. 2002-04-01 20:13:31 +00:00
Poul-Henning Kamp c23cda8580 Extend a hack to also hack around PC98's definition of __i386__ 2002-04-01 20:13:03 +00:00
John Baldwin 4269e184e8 Fix style bug in previous commit. 2002-04-01 17:53:42 +00:00
Jake Burkholder 60a57b73ef ktr changes to improve performance and make writing a userland utility to
dump the trace buffer feasible.
- Remove KTR_EXTEND.  This changes the format of the trace entries when
  activated, making writing a userland tool which is not tied to a specific
  kernel configuration difficult.
- Use get_cyclecount() for timestamps.  nanotime() is much too heavy weight
  and requires recursion protection due to ktr traces occuring as a result
  of ktr traces.  KTR_VERBOSE may still require recursion protection, which
  is now conditional on it.
- Allow KTR_CPU to be overridden by MD code.  This is so that it is possible
  to trace early in startup before pcpu and/or curthread are setup.
- Add a version number for the ktr interface.  A userland tool can check this
  to detect mismatches.
- Use an array for the parameters to make decoding in userland easier.
- Add file and line recording to the non-extended traces now that the extended
  version is no more.

These changes will break gdb macros to decode the extended version of the
trace buffer which are floating around.  Users of these macros should either
use the show ktr command in ddb, or use the userland utility which can be run
on a core dump.

Approved by:	jhb
Tested on:	i386, sparc64
2002-04-01 05:35:26 +00:00
Poul-Henning Kamp 81661c94b6 Here follows the new kernel dumping infrastructure.
Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc.  (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed.  The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev.  To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented.  All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record.  The header record
is dumped in readable format in the .info file.  The kernel
is not saved.  Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by:   DARPA, NAI Labs
2002-03-31 22:37:00 +00:00
Poul-Henning Kamp 1f3a74b1b1 Implement the two "GEOM" ioctls DIOCGSECTORSIZE and DIOCGMEDIASIZE for
the non-GEOM code as well.  This simplifies the the kernel-dumping
and disk-management tools as less compatibility cruft will be needed.

Sponsored by:	DARPA and NAI Labs.
2002-03-31 21:17:12 +00:00
Alan Cox a5c0b1c020 Keep the reference to the file acquired in _aio_aqueue() until the operation
completes.  The reference is released in aio_free_entry().

Submitted by:	tegge
2002-03-31 20:17:56 +00:00
Alfred Perlstein 7b11fea64f Close some holes with p->p_args by NULL'ing out the p->p_args pointer
while holding the proc lock, and by holding the pargs structure when
accessing it from outside of the owner.

Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-03-31 10:33:12 +00:00
Poul-Henning Kamp 8d19a26558 Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.
2002-03-31 07:15:28 +00:00
Alan Cox 5e20c11f19 Add a local proc *p in exec_new_vmspace() to avoid repeated dereferencing
to obtain it.
2002-03-31 00:05:30 +00:00
Bruce Evans 4f1f485f34 Fixed handling of short reads in readdisklabel() and writedisklabel().
These functions use DEV_STRATEGY() which can easily return a short
count (with no error) for reads near EOF.  EOF happens for "disks" too
small to contain a label sector (mainly for empty slices).  The functions
didn't understand this at all, and looked for labels in the garbage
in the buffer beyond what DEV_STRATEGY() returned.  The recent UMA
changes combined with my local changes and configuration resulted in
the garbage often containing a valid but garbage label left over from
a previous call.

Bugs in EOF handling in -current limited the problem to "disks" with
size precisely LABELSECTOR sectors.  LABELSECTOR happens to be a very
unusual "disk" size since it is only 0 for non-i386 arches that don't
usually have disks with DOS MBRs.
2002-03-30 16:02:43 +00:00
Dan Moschuk e7876c0943 Nuke CV_DEBUG in favour of INVARIANTS.
Approved by: jhb
2002-03-30 03:52:52 +00:00
Jake Burkholder b454c6dd29 Style fixes purposefully left out of last commit. I checked the kse tree
and didn't see any changes that this conflicts with.
2002-03-29 16:45:03 +00:00
Jake Burkholder d0ce9a7e07 Remove abuse of intr_disable/restore in MI code by moving the loop in ast()
back into the calling MD code.  The MD code must ensure no races between
checking the astpening flag and returning to usermode.

Submitted by:	peter (ia64 bits)
Tested on:	alpha (peter, jeff), i386, ia64 (peter), sparc64
2002-03-29 16:35:26 +00:00
Seigo Tanimura 5cf4bcebbf The description of fd_mtx is "filedesc structure." 2002-03-29 11:26:05 +00:00
Matthew N. Dodd 32bc1098b2 Add resource_list_add_next() which returns the RID for the resource added. 2002-03-29 06:42:54 +00:00
Alfred Perlstein c1508b28c6 To remove nested include of sys/lock.h and sys/mutex.h from sys/proc.h
make the pargs_* functions into non-inlines in kern/kern_proc.c.

Requested by: bde
2002-03-28 18:12:27 +00:00
Poul-Henning Kamp 45609bea17 Get the magnitude of the NTP adjustment right. 2002-03-28 16:02:44 +00:00
Maxime Henrion daab5e2472 - Properly sync vfs_nmount() with changes that have be already done
in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240.
- flag2 is a low quality variable name, change it to kern_flag.
- strncpy NUL-terminates f_fstypename and f_mntonname since the strings
  have length <= <buffer length> - 1, so the explicit NUL-termination is
  bogus.
- M_ZERO'ing space for fstype and fspath is stupid since we never use the
  space beyond the end of the string.
- Do various style(9) cleanups in both functions.

Submitted by:	bde
Reviewed by:	phk
2002-03-28 13:47:32 +00:00
Alan Cox cd430164f1 Allow resursion on the pipe mutex because filt_piperead() and filt_pipewrite()
can be called both with and without the pipe mutex held.  (For example,
if called by pipeselwakeup(), it is held.  Whereas, if called by kqueue_scan(),
it is not.)

Reviewed by:	alfred
2002-03-27 21:47:50 +00:00
Alfred Perlstein 8899023f66 Make the reference counting of 'struct pargs' SMP safe.
There is still some locations where the PROC lock should be held
in order to prevent inconsistent views from outside (like the
proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be
fixed later.

Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-03-27 21:36:18 +00:00
Jeff Roberson f22a4b62f5 Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag.  Remove the dup_list and dup_ok code from subr_witness.  Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface.  The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by:	jhb
2002-03-27 09:23:41 +00:00
Matthew Dillon e6bbfd402d oops, forgot to commit this. td->td_savecrit = 0 replaced by API
call cpu_thread_link().
2002-03-27 08:26:37 +00:00
Jake Burkholder f2a79bb9b4 Make this compile.
Pointy hat to:	dillon
2002-03-27 06:44:32 +00:00
Matthew Dillon d74ac6819b Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit().  Cleanup the td_savecrit field by moving it
from MI to MD.  Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections.  This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit.  Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways.  This should
be temporary.

Reviewed by:	core
Approved by:	core
2002-03-27 05:39:23 +00:00
Bruce Evans c0f7f75fd7 "Fixed" -Wshadow warnings by changing the name of some function parameters
from `index' to `indx'.  The correct fix would be to not support or use
index().
2002-03-27 04:04:17 +00:00
Alan Cox cb100b25ce Remove an unnecessary and inconsistently used variable from exec_new_vmspace(). 2002-03-26 19:20:04 +00:00
Andrew R. Reiter dcce8874eb - Fixup a few style nits:
- return error -> return (error);
  - move a declaration to the top of the function.
  - become bug for bug compatible with if (error) lines.

Submitted by: bde
2002-03-26 18:07:10 +00:00
Maxime Henrion 17594b936b As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API.  This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation.  The mount(2) system call will disappear when all
filesystems will be converted to the new API.  Documentation will
be committed in a while.

Reviewed by:	phk
2002-03-26 15:33:44 +00:00
Bruce Evans 237e41fc58 Added used include of <sys/sx.h>. Don't depend on namespace pollution in
<sys/file.h>.
2002-03-26 01:09:51 +00:00
Bruce Evans ee99e978a3 Added used include of <sys/sx.h>. Don't depend on namespace pollution in
<sys/file.h> or <sys/socketvar.h>.
2002-03-25 21:52:04 +00:00
David E. O'Brien 0beb3ecc6c Commit work-around for panics when mounting FS's that are auto-loaded as
modules (ie. procfs.ko).

When the kernel loads dynamic filesystem module, it looks for any of the
VOP operations specified by the new filesystem that have not been registered
already by the currently known filesystems.  If any of such operations exist,
vfs_add_vnops function calls vfs_opv_recalc function, which rebuilds vop_t
vectors for each filesystem and sets all global pointers like ufs_vnops_p,
devfs_specop_p, etc to the new values and then frees the old pointers.  This
behavior is bad because there might be already active vnodes whose v_op fields
will be left pointing to the random garbage, leading to inevitable crash soon.

Submitted by:	Alexander Kabaev <ak03@gte.com>
2002-03-25 21:30:50 +00:00
Andrew R. Reiter 517f30c2c1 - Recommit the securelevel_gt() calls removed by commits rev. 1.84 of
kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the
  source of the recent panics occuring around kldloading file system
  support modules.

Requested by: rwatson
2002-03-25 18:26:34 +00:00
Poul-Henning Kamp aaead0dfe9 Modernize my email address. 2002-03-25 13:52:45 +00:00
Bruce Evans 70f52b4845 Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses.  Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
2002-03-24 05:09:11 +00:00
John Baldwin d846883bc4 Use td_ucred in several trivial syscalls and remove Giant locking as
appropriate.
2002-03-22 22:32:04 +00:00
John Baldwin f2ae7368ea Use explicit Giant locks and unlocks for rather than instrumented ones for
code that is still not safe.  suser() reads p_ucred so it still needs
Giant for the time being.  This should allow kern.giant.proc to be set
to 0 for the time being.
2002-03-22 21:02:02 +00:00
Robert Watson 29dc1288b0 Merge from TrustedBSD MAC branch:
Move the network code from using cr_cansee() to check whether a
    socket is visible to a requesting credential to using a new
    function, cr_canseesocket(), which accepts a subject credential
    and object socket.  Implement cr_canseesocket() so that it does a
    prison check, a uid check, and add a comment where shortly a MAC
    hook will go.  This will allow MAC policies to seperately
    instrument the visibility of sockets from the visibility of
    processes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-22 19:57:41 +00:00
Alfred Perlstein db51256707 When "cloning" a pipe's buffer bcopy the data after dropping the pipe's
lock as the data may be paged out and cause a fault.
2002-03-22 16:09:22 +00:00
Robert Watson 7906271f25 In sysctl, req->td is believed always to be non-NULL, so there's no need
to test req->td for NULL values and then do somewhat more bizarre things
relating to securelevel special-casing and suser checks.  Remove the
testing and conditional security checks based on req->td!=NULL, and insert
a KASSERT that td != NULL.  Callers to sysctl must always specify the
thread (be it kernel or otherwise) requesting the operation, or a
number of current sysctls will fail due to assumptions that the thread
exists.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
Discussed with:	bde
2002-03-22 14:58:27 +00:00
Robert Watson 4584bb3945 Since cred never appears to be passed into the securelevel calls as
NULL, turn warning printf's into panic's, since this call has been
restructured such that a NULL cred would result in a page fault anyway.

There appears to be one case where NULL is explicitly passed in in the
sysctl code, and this is believed to be in error, so will be modified.
Securelevels now always require a credential context so that per-jail
securelevels are properly implemented.

Obtained from:	TrustedBSD Project
Sponsored by:	NAI Labs
Discussed with:	bde
2002-03-22 14:49:12 +00:00
Andrew R. Reiter fe3240e9aa - Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments.  Supposedly something is broken, so
  this should be backed out until further investigation proves otherwise,
  or a proper fix can be provided.
2002-03-22 04:56:09 +00:00
Robert Watson 1b350b4542 Break out the "see_other_uids" policy check from the various
method-based inter-process security checks.  To do this, introduce
a new cr_seeotheruids(u1, u2) function, which encapsulates the
"see_other_uids" logic.  Call out to this policy following the
jail security check for all of {debug,sched,see,signal} inter-process
checks.  This more consistently enforces the check, and makes the
check easy to modify.  Eventually, it may be that this check should
become a MAC policy, loaded via a module.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-03-22 02:28:26 +00:00
Andrew R. Reiter e85b9ae9ac - Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson
2002-03-21 15:27:39 +00:00
Warner Losh cb9a238a8a Remove last two abuses of cpu_critical_{enter,exit} in the MI code.
Reviewed by: jake, jhb, rwatson
2002-03-21 06:11:09 +00:00
Benno Rice 565ab9395f Add a change mirroring that made to kern/subr_trap.c and others.
This makes kernel builds with DIAGNOSTIC work again.

Apparently forgotten by:	jhb
Might want to be checked by:	jhb
2002-03-21 02:47:51 +00:00
Jeff Roberson 59295dba57 UMA permited us to utilize the 'waitok' flag to soalloc. 2002-03-20 21:23:26 +00:00
John Baldwin 01c04d2de9 Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount.  Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case.  In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place.  This also removse td_cache_ucred from struct thread.  This is
still only done #ifdef DIAGNOSTIC.

[ missed this file in the previous commit ]

Tested on:	i386, alpha
2002-03-20 21:12:04 +00:00
John Baldwin c1a513c951 - Push down Giant into crfree() in the case that we actually free a ucred.
- Add a cred_free_thread() function (conditional on DIAGNOSTICS) that drops
  a per-thread ucred reference to be used in debugging code when leaving
  the kernel.
2002-03-20 21:00:50 +00:00
Andrew R. Reiter c457a4403a - Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
  low hanging fruit for the repair of securelevel checking of linking and
  unlinking files from within jails.  There is more to be done here.

Reviewed by: rwatson
2002-03-20 16:03:42 +00:00
Andrew R. Reiter dca9d05526 - Remove a semi-colon from after SYSINIT that was introduced in rev. 1.163. 2002-03-20 14:46:38 +00:00
Jeff Roberson 586c8b6b29 Add calls to uma_zone_set_max() to restore previously enforced limits. 2002-03-20 05:30:58 +00:00
Jeff Roberson 54d77689ed Backout part of my previous commit; I was wrong about vm_zone's handling of
limits on zones w/o objects.
2002-03-20 04:39:32 +00:00
Jeff Roberson 9e9d298a9b Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 04:11:52 +00:00
Jeff Roberson c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Alfred Perlstein 4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Alfred Perlstein 1f31a77ce8 don't generate files with __P. 2002-03-19 20:48:32 +00:00
Andrew R. Reiter 08a54da785 - Change a malloc / bzero pair to make use of the M_ZERO malloc(9) flag. 2002-03-19 15:41:21 +00:00
Peter Wemm 30171114b3 Fix a gcc-3.1+ warning.
warning: deprecated use of label at end of compound statement

ie: you cannot do this anymore:
switch(foo) {
....

default:
}
2002-03-19 11:02:06 +00:00
Peter Wemm 3ba30c18a2 Pacify gcc-3.1+, initialize two variables to avoid -Wuninitialized
warnings.
2002-03-19 10:57:40 +00:00
Peter Wemm a5e7c7da5e Fix warnings on gcc-3.1+ where __func__ is a const char * instead of a
string.
2002-03-19 10:56:46 +00:00
Jeff Roberson 8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Alfred Perlstein 4a950215ef Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
Bruce Evans 367b50a28f Fixed some printf format errors (hopefully all of the remaining daddr64_t
ones for GENERIC, and all others on the same line as those).  Reformat
the printfs if necessary to avoid new long lones or old format printf
errors.
2002-03-19 04:09:21 +00:00
Andrew R. Reiter 9b3851e9e3 - Lock down the ``module'' structure by adding an SX lock that is used by
all the global bits of ``module'' data.  This commit adds a few generic
  macros, MOD_SLOCK, MOD_XLOCK, etc., that are meant to be used as ways
  of accessing the SX lock.  It is also the first step in helping to lock
  down the kernel linker and module systems.

Reviewed by: jhb, jake, smp@
2002-03-18 07:45:30 +00:00
Kirk McKusick a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
Jake Burkholder ac59490b5e Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/
pmap_qremove.  pmap_kenter is not safe to use in MI code because it is not
guaranteed to flush the mapping from the tlb on all cpus.  If the process
in question is preempted and migrates cpus between the call to pmap_kenter
and pmap_kremove, the original cpu will be left with stale mappings in its
tlb.  This is currently not a problem for i386 because we do not use PG_G on
SMP, and thus all mappings are flushed from the tlb on context switches, not
just user mappings.  This is not the case on all architectures, and if PG_G
is to be used with SMP on i386 it will be a problem.  This was committed by
peter earlier as part of his fine grained tlb shootdown work for i386, which
was backed out for other reasons.

Reviewed by:	peter
2002-03-17 00:56:41 +00:00
Dag-Erling Smørgrav 8bc814e603 Implement PT_IO (read / write arbitrary amounts of data or text).
Submitted by:	Artur Grabowski <art@{blahonga,openbsd}.org>
Obtained from:	OpenBSD
2002-03-16 02:40:02 +00:00
Dag-Erling Smørgrav a888d317bb PT_[GS]ET{,DB,FP}REGS isn't really optional any more, since we have dummy
backend functions for those archs that don't support them.  I meant to do
this ages ago, but never got around to it.

Inspired by:	OpenBSD
2002-03-15 20:17:12 +00:00
Kirk McKusick 0d2af52141 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
Alfred Perlstein 628abf6c69 Giant pushdown for read/write/pread/pwrite syscalls.
kern/kern_descrip.c:
Aquire Giant in fdrop_locked when file refcount hits zero, this removes
the requirement for the caller to own Giant for the most part.

kern/kern_ktrace.c:
Aquire Giant in ktrgenio, simplifies locking in upper read/write syscalls.

kern/vfs_bio.c:
Aquire Giant in bwillwrite if needed.

kern/sys_generic.c
Giant pushdown, remove Giant for:
   read, pread, write and pwrite.
readv and writev aren't done yet because of the possible malloc calls
for iov to uio processing.

kern/sys_socket.c
Grab giant in the socket fo_read/write functions.

kern/vfs_vnops.c
Grab giant in the vnode fo_read/write functions.
2002-03-15 08:03:46 +00:00
Alfred Perlstein 3b018f572d Bug fixes:
Missed a place where the pipe sleep lock was needed in order to safely grab
Giant, fix it and add an assertion to make sure this doesn't happen again.

Fix typos in the PIPE_GET_GIANT/PIPE_DROP_GIANT that could cause the
wrong mutex to get passed to PIPE_LOCK/PIPE_UNLOCK.

Fix a location where the wrong pipe was being passed to
PIPE_GET_GIANT/PIPE_DROP_GIANT.
2002-03-15 07:18:09 +00:00
Alfred Perlstein 85f190e4d1 Fixes to make select/poll mpsafe.
Problem:
  selwakeup required calling pfind which would cause lock order
  reversals with the allproc_lock and the per-process filedesc lock.
Solution:
  Instead of recording the pid of the select()'ing process into the
  selinfo structure, actually record a pointer to the thread.  To
  avoid dereferencing a bad address all the selinfo structures that
  are in use by a thread are kept in a list hung off the thread
  (protected by sellock).  When a selwakeup occurs the selinfo is
  removed from that threads list, it is also removed on the way out
  of select or poll where the thread will traverse its list removing
  all the selinfos from its own list.

Problem:
  Previously the PROC_LOCK was used to provide the mutual exclusion
  needed to ensure proper locking, this couldn't work because there
  was a single condvar used for select and poll and condvars can
  only be used with a single mutex.
Solution:
  Introduce a global mutex 'sellock' which is used to provide mutual
  exclusion when recording events to wait on as well as performing
  notification when an event occurs.

Interesting note:
  schedlock is required to manipulate the per-thread TDF_SELECT
  flag, however if given its own field it would not need schedlock,
  also because TDF_SELECT is only manipulated under sellock one
  doesn't actually use schedlock for syncronization, only to protect
  against corruption.

Proc locks are no longer used in select/poll.

Portions contributed by: davidc
2002-03-14 01:32:30 +00:00
Brian Feldman 0e0af8ecda Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system.  To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe.  One more
user of lockmgr down, many more to go :)
2002-03-13 23:48:08 +00:00
Archie Cobbs 44a8ff315e Add realloc() and reallocf(), and make free(NULL, ...) acceptable.
Reviewed by:	alfred
2002-03-13 01:42:33 +00:00
Jeff Roberson 8de00f4a87 This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality.  Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes.  Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added.  It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable.  I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by:	jake, obrien
Approved by:	jake
2002-03-12 04:00:11 +00:00
Poul-Henning Kamp 417fb7f6fa Make the disk_clone() routine more robust for abuse.
Sneak in a trivial bit of the GEOM stuff while we're here anyway.
2002-03-11 08:08:02 +00:00
Seigo Tanimura 183ccde6c6 Stop abusing the pgrpsess_lock. 2002-03-11 07:53:13 +00:00
Seigo Tanimura aa3bf85c54 Do not lock the pgrpsess_lock exclusively across ttywait().
Spotted by:		David Wolfskill <david@catwhisker.org>
Investigated by:	rwatson
2002-03-11 07:51:08 +00:00
David Malone 6c75a65a00 Don't assign strcmp to a variable called err and then compare it
with zero, just compare strcmp with zero. This fixes the same bug
which Maxim just fixed and fixes some odd style too.

PR:		35712
Reviewed by:	arr
2002-03-10 23:12:43 +00:00
Maxim Sobolev 832af2d5ed Fix a breakage introduced in rev.1.75 (supposedly style cleanup), which results
in "missing dependencies" error when loading some kld modules. It is sad to
see how often these days style cleanus break doesn't broken things. Perhaps
people should recall good old principle: "don't fix it if it isn't broken".
2002-03-10 19:20:01 +00:00