Commit graph

522 commits

Author SHA1 Message Date
Don Lewis 8ded654028 Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set.  This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.

This is a MFC candidate for RELENG_5.

MFC after:	3 days
2004-08-20 19:21:47 +00:00
David E. O'Brien 78c37b0de8 s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g 2004-08-16 08:33:37 +00:00
John-Mark Gurney ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Robert Watson 87e83e7d4c In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However,
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL.  Otherwise,
accept that another thread got there first, and release the local storage.

Discussed with:	jmg
2004-08-11 01:27:53 +00:00
Nate Lawson c8c216d558 Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by:	truckman (earlier version)
2004-08-10 01:32:05 +00:00
David E. O'Brien 64298d52cc Put a cap on the auto-tuning of kern.maxvnodes.
Cap value chosen by:	scottl
2004-08-02 21:52:43 +00:00
Nate Lawson b1c8139147 Minor message cleanup. 2004-07-30 01:30:05 +00:00
Poul-Henning Kamp 3dfe213e61 Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
Colin Percival 56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Poul-Henning Kamp cf95b5c381 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
Alfred Perlstein 05656b6e2b put several of the options for DEBUG_VFS_LOCKS under control of sysctls. 2004-07-21 07:13:14 +00:00
Alfred Perlstein bb5faea34f Cleanup shutdown output. 2004-07-15 08:01:00 +00:00
Alfred Perlstein da6303bacc Tidy up system shutdown. 2004-07-15 04:29:48 +00:00
Alfred Perlstein f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Alfred Perlstein 7ae8ce5df1 Dump the actual bad values when this assertion is tripped. 2004-07-12 04:13:38 +00:00
Marcel Moolenaar 32240d082c Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
2004-07-10 21:47:53 +00:00
Alfred Perlstein 057589c485 fixup sysctl by fsid node 2004-07-08 06:11:36 +00:00
Alfred Perlstein ea0104b032 Introduce vfs_suser(), used to test if a user should have special privs
for a mount.
2004-07-06 09:37:43 +00:00
Alfred Perlstein c713aaaeca NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00
Don Lewis 27875d9c88 Unconditionally set last_work_seen while in the SYNCER_RUNNING state
so that last_work_seen has a reasonable value at the transition
to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened
to be zero at the time.

Initialize last_work_seen to zero as a safety measure in case the
syncer never ran in the SYNCER_RUNNING state.

Tested by:	phk
2004-07-05 21:32:01 +00:00
Don Lewis faf1b66d1d Rework syncer termination code:
Speed up the syncer when shutting down by sleeping for a shorter
    period of time instead of cranking up rushjob and using the
    normal one second sleep.

    Skip empty worklist slots when shutting down to avoid lengthy
    intervals of inactivity.

    Give I/O more time to complete between steps by not speeding the
    syncer quite as much.

    Terminate the syncer after one full pass through the worklist
    plus one second with the worklist containing nothing but syncer
    vnodes.

    Print an indication of shutdown progress to the console.

Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist
to be monitored.
2004-07-05 01:07:33 +00:00
Poul-Henning Kamp c555963fd1 Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE. 2004-07-04 22:33:22 +00:00
Alfred Perlstein 2d1dca73ee Pass the operation in with the fsidctl.
Remove some fsidctls that we will not be using.
Correct prototypes for fs sysctls.
2004-07-04 20:21:58 +00:00
Poul-Henning Kamp 7f6599fec6 Make the last commit handle non-phk root devices better. 2004-07-04 19:42:25 +00:00
Poul-Henning Kamp 1cbb1e02c4 Blocksize for I/O should be a property of the vnode and not found by groping
around in the vnodes surroundings when we allocate a block.

Assign a blocksize when we create a vnode, and yell a warning (and ignore it)
if we got the wrong size.

Please email all such warnings to me.
2004-07-04 12:49:04 +00:00
Alfred Perlstein 94ed9c8af5 Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace.  Currently only mount and unmount
of filesystems are signalled.  Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
2004-07-04 10:52:54 +00:00
Alfred Perlstein 903ac7c219 Revision 1.496 would not boot on my system due to
ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) ->
 insmntqueue(vp, mp = NULL) -> KASSERT -> panic

Make getnewvnode() only call insmntqueue() if the mountpoint parameter
is not NULL.
2004-07-04 10:19:15 +00:00
Poul-Henning Kamp e3c5a7a4dd When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
Don Lewis e06500dde5 When shutting down the syncer kernel thread, first tell it to run
faster and iterate to over its work list a few times in an attempt
to empty the work list before the syncer terminates.  This leaves
fewer dirty blocks to be written at the "syncing disks" stage and
keeps the the "giving up on N buffers" problem from being triggered
by the presence of a large soft updates work list at system shutdown
time.  The downside is that the syncer takes noticeably longer to
terminate.

Tested by:	"Arjan van Leeuwen" <avleeuwen AT piwebs DOT com>
Approved by:	mckusick
2004-07-01 23:59:19 +00:00
Poul-Henning Kamp f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp 89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Poul-Henning Kamp 170593a9b5 Remove a left over from userland buffer-cache access to disks. 2004-06-14 14:25:03 +00:00
Robert Watson 9e6127fe3b Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
Maxime Henrion a0b5a67929 Put deprecated sysctl code inside BURN_BRIDGES. 2004-04-11 21:09:22 +00:00
Warner Losh 7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Peter Wemm 39d3505a30 Kill some XXXKSE's. vnlru/syncer are single threaded. 2004-03-29 22:45:33 +00:00
Poul-Henning Kamp 4d453ef101 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
Poul-Henning Kamp 651b11eaf2 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
Alexander Kabaev ff85a3f0e1 Always call vn_finished_write after vn_start_write was called. All
occurences of 'goto done' after vn_start_write invocation were cleaning
up incompletely.
2004-03-06 04:09:54 +00:00
John Baldwin 44f3b09204 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
Don Lewis 47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
Poul-Henning Kamp ded67d0f77 Check for NODEV return from udev2dev() 2004-02-21 23:52:03 +00:00
Poul-Henning Kamp cd690b60de Device megapatch 6/6:
This is what we came here for:  Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list.  When the refcount drops, free it.

Check that cdevsw->d_version is correct.  If not, set all methods
to the dead_*() methods to prevent entrance into driver.  Print
warning on console to this effect.  The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.
2004-02-21 21:57:26 +00:00
Poul-Henning Kamp 816d62bbb9 Device megapatch 5/6:
Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev().  The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()
2004-02-21 21:32:15 +00:00
Alexander Kabaev 580ddfa64b More style fixes.
Obtained from:	bde
2004-01-05 23:40:46 +00:00
Alexander Kabaev b0fdf71656 style(9):
Add empty line before first code line in functions with no local
variables.
Properly terminate comment sentences.
Indent lines which are longer that 80 characters.
Move v_addpollinfo closer to the rest of poll-related functions.
Move DEBUG_VFS_LOCKS ifdefed block to the end of file.

Obtained from:	bde (partly)
2004-01-05 19:04:29 +00:00
Alexander Kabaev 3ff1b7c23f Cosmetics: strip '\n' from a string passed to Debugger(). 2004-01-04 03:42:20 +00:00
Bruce Evans 9efe7d9d83 v_vxproc was a bogus name for a thread (pointer). 2003-12-28 09:12:56 +00:00
Jeff Roberson 958557e9c7 - In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT.
Submitted by:	Stephan Uphoff <ups@stups.com>
2003-12-16 17:08:27 +00:00
Jeff Roberson d85213669b - When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by
reassigning their v_ops field to specfs, detaching from the mountpoint, etc.
   However, this is not sufficient.  If we vclean() the vnode the pages owned
   by the vnode are lost, potentially while buffers reference them.  Implement
   parts of vclean() seperately in vgonechrl() so that the pages and bufs
   associated with a device vnode are not destroyed while in use.
2003-12-16 17:05:05 +00:00