Commit graph

3120 commits

Author SHA1 Message Date
Jake Burkholder 4ef34f39ec Really fix USER_LDT. (Don't use currentldt as an L-value.) 2000-09-08 03:36:09 +00:00
Jason Evans 0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Jason Evans 62ae6c89ad Add KTR, a facility that logs kernel events in order to to facilitate
debugging.

Acquired from:	BSDi (BSD/OS)
Submitted by:	dfr, grog, jake, jhb
2000-09-07 01:29:44 +00:00
Don Lewis c5930ee45a Change the calls to panic() in uifree(), chgproccnt(), and chgsbsize()
to printf().  Any errors detected are not likely to be fatal, so it
should be safe to let things keep running.
2000-09-06 19:00:19 +00:00
Alfred Perlstein 34b94e8b82 Accept filter maintainance
Update copyrights.

Introduce a new sysctl node:
  net.inet.accf

Although acceptfilters need refcounting to be properly (safely) unloaded
as a temporary hack allow them to be unloaded if the sysctl
net.inet.accf.unloadable is set, this is really for developers who want
to work on thier own filters.

A near complete re-write of the accf_http filter:
  1) Parse check if the request is HTTP/1.0 or HTTP/1.1 if not dump
     to the application.
     Because of the performance implications of this there is a sysctl
     'net.inet.accf.http.parsehttpversion' that when set to non-zero
     parses the HTTP version.
     The default is to parse the version.
  2) Check if a socket has filled and dump to the listener
  3) optimize the way that mbuf boundries are handled using some voodoo
  4) even though you'd expect accept filters to only be used on TCP
     connections that don't use m_nextpkt I've fixed the accept filter
     for socket connections that use this.

This rewrite of accf_http should allow someone to use them and maintain
full HTTP compliance as long as net.inet.accf.http.parsehttpversion is
set.
2000-09-06 18:49:13 +00:00
Peter Wemm fc611b0634 Do not panic on an uninitialized VOP_xxx() call. This was meant as a
sanity check, but it is too easy to run into, eg: making an ACL syscall
when no filesystems have the ACL implementation enabled.

The original reason for the panic was that the VOP_ vector had not been
assigned and therefor could not be passed down the stack.. and there
was no point passing it down since nothing implemented it anyway.
vop_defaultop entries could not pass it on because it had a zero (unknown)
vector that was indistinguishable from another unknown VOP vector.

Anyway, we can do something reasonable in this case, we shouldn't need
to panic here as there is a reasonable recovery option (return EOPNOTSUPP
and dont pass it down the stack).

Requested by: rwatson
2000-09-06 17:51:54 +00:00
Robert Watson 728783c27a o Synchronize vaccess() capability access control checks with TrustedBSD
tree.

Obtained from:	TrustedBSD Project
2000-09-06 12:18:24 +00:00
David E. O'Brien 6b6821c771 The kernel is now known as `kernel.ko' and it and its matching modules
live in ``/boot/kernel/''.

Submitted by:	Hisashi Hiramoto <hiramoto@phys.chs.nihon-u.ac.jp>
2000-09-06 06:22:20 +00:00
Boris Popov 9548091b84 Ignore ELF files with 'interpreter' section because KLDs doesn't contain it.
Reviewed by:	peter
2000-09-06 02:21:43 +00:00
Don Lewis f535380cb6 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
Poul-Henning Kamp 64dc16df4a Move extern declaration of dead_vnodeop_p to a .h file.
Remove race condition in vn_isdisk().
2000-09-05 21:09:56 +00:00
Robert Watson e81c5f4307 o vn_extattr_set() will now call appropriate vn_start_write() and
vn_finished_write() if IO_NODELOCKED is not set.

Obtained from:	TrustedBSD Project
2000-09-05 03:15:02 +00:00
Robert Watson b4d0de586d o Remove commented out code which modified return values from
extattr_{get,set} syscalls in the face of partial reads or writes.

Obtained from:	TrustedBSD Project
2000-09-05 02:13:14 +00:00
Peter Wemm 58da4602af When we are picking the next available unit number, specifically say
what we picked.  Otherwise it is anybody's guess as to where the
device ended up.
2000-09-05 00:30:46 +00:00
Poul-Henning Kamp 97804a5c99 Update the NTP kernel PLL code to the 2000-08-29 version of Dave Mills
nanokernel.

The FreeBSD private mode hardpps Type 2 PLL has been removed.
2000-09-04 08:19:32 +00:00
Alan Cox b70158bae1 Make filt_aio() check the jobstate for JOBST_JOBBFINISHED (in addition
to JOBST_JOBFINISHED) in case the aio_read() or aio_write() was performed
via the high-performance physio method, i.e., aio_qphysio().
2000-09-04 07:56:32 +00:00
Peter Wemm 82acbcf57b kern_shutdown.c was more ANSI-C than K&R - remove the remnants of K&R
support with extreme prejudice.
2000-09-03 06:44:53 +00:00
Peter Wemm 87de370376 gcc knows that savectx() is potentially a setjmp style dual-return
function which may lead to stack lossage and clobbered variables.
This isn't the case here, but there is no way to tell gcc that.

Work around this in a kinda bizzare way, but it shuts gcc up.
2000-09-03 06:35:04 +00:00
Poul-Henning Kamp db90128160 Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf.  Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.
2000-09-02 19:17:34 +00:00
Don Lewis 8577117cc8 access() shouldn't diddle with the contents of a potentially shared
credential.  Create a temporary copy of the current credential and
modify the copy.

Submitted by:	tegge
2000-09-02 12:31:55 +00:00
Brian Feldman 468ddc8a44 Casts are needed to subtract u_longs.
Submitted by:	tor
2000-08-31 22:21:33 +00:00
Robert Watson c52396e365 o p_cansee() wasn't setting privused when suser() was required to override
kern.ps_showallprocs.  Apparently got lost in the merge process from
  the capability patches.  Now fixed.

Submitted by:	jdp
Obtained from:	TrustedBSD Project
2000-08-31 15:55:17 +00:00
Brian Feldman b6240737d5 Fix hangs caused by overzealous code removal.
Thanks, Nickolay, for figuring out this is the problem.

Submitted by:	Nickolay Dudorov <nnd@mail.nsk.ru>
2000-08-31 11:31:58 +00:00
Mike Smith 3e755f76d1 Make it possible to pass boot()'s flags to shutdown_nice() so that the
kernel can instigate an orderly shutdown but still determine the form of
that shutdown.  Make it possible eg. to cleanly shutdown and power off the
system under ACPI when the power button is pressed.
2000-08-31 00:08:50 +00:00
Robert Watson 387d2c036b o Centralize inter-process access control, introducing:
int p_can(p1, p2, operation, privused)

  which allows specification of subject process, object process,
  inter-process operation, and an optional call-by-reference privused
  flag, allowing the caller to determine if privilege was required
  for the call to succeed.  This allows jail, kern.ps_showallprocs and
  regular credential-based interaction checks to occur in one block of
  code.  Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
  and P_CAN_DEBUG.  p_can currently breaks out as a wrapper to a
  series of static function checks in kern_prot, which should not
  be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
  of manual checks, PRISON_CHECK(), P_TRESPASS(), and
  kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
  process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
  of ESRCH, further improving concealment of processes that should not
  be visible to other processes.  Also introduce new access checks to
  improve hiding of processes for procfs_lookup(), procfs_getattr(),
  procfs_readdir().  Correct a bug reported by bp concerning not
  handling the CREATE case in procfs_lookup().  Remove volatile flag in
  procfs that caused apparently spurious qualifier warnigns (approved by
  bde).

o Add comment noting that ktrace() has not been updated, as its access
  control checks are different from ptrace(), whereas they should
  probably be the same.  Further discussion should happen on this topic.

Reviewed by:	bde, green, phk, freebsd-security, others
Approved by:	bde
Obtained from:	TrustedBSD Project
2000-08-30 04:49:09 +00:00
Robert Watson c6fac29aff o Disable flagging of ASU in suser_xxx() authorization check. For the
time being, the ASU accounting flag will no longer be available, but
  may be reinstituted in the future once authorization have been redone.
  As it is, the kernel went through contortions in access control to
  avoid calling suser, which always set the flag.  This will also allow
  suser to accept const struct *{cred, proc} arguments.

Reviewed by:	bde, green, phk, freebsd-security, others
Approved by:	bde
Obtained from:	TrustedBSD Project
2000-08-30 04:35:32 +00:00
Brian Feldman 343079d9b2 Remove an extraneous setting of sb_hiwat. 2000-08-30 00:09:57 +00:00
Robert Watson 012c643d3e o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege.  Make vaccess() accept an
  additional optional argument, privused, to determine whether
  privilege was required for vaccess() to return 0.  Add commented
  out capability checks for reference.  Rename some variables to make
  it more clear which modes/uids/etc are associated with the object,
  and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
  privused argument.  Once additional patches are applied, suser()
  will no longer set ASU, so privused will permit passing of
  privilege information up the stack to the caller.

Reviewed by:	bde, green, phk, -security, others
Obtained from:	TrustedBSD Project
2000-08-29 14:45:49 +00:00
Brian Feldman 6aef685fbb Remove any possibility of hiwat-related race conditions by changing
the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and
a u_long target to set it to.  The whole thing is splnet().

This fixes a problem that jdp has been able to provoke.
2000-08-29 11:28:06 +00:00
Doug Rabson f80e454726 Add kobj_class_compile_static() to allow classes to be initialised
statically (i.e. without calling malloc). This allows kobj to be used
very early in the boot sequence.
2000-08-28 21:11:12 +00:00
Doug Rabson a4f9b116e3 * Remove a bogus call to kobj_init() from make_device().
* Add a non-empty implementation of root_print_child().
2000-08-28 21:08:12 +00:00
Marcel Moolenaar 3f4ab6537f Regen: fix prototypes for {o|}{g|s}etrlimit. 2000-08-28 07:56:38 +00:00
Marcel Moolenaar ae51d56ce1 Fix prototypes for {o|}{g|s}etrlimit. A recent change in the
Linuxulator caused this bug to trigger.
2000-08-28 07:50:44 +00:00
Alfred Perlstein c58b821e4c new sysctl 'kern.openfiles' (exports nfiles to userland) 2000-08-26 23:49:44 +00:00
Robert Watson 877dd71fc6 o Correct spelling of ufs_exttatr_find_attr -> ufs_extattr_find_attr
o Add "const" qualifier to attrname argument of various calls to remove
  warnings

Obtained from:	TrustedBSD Project
2000-08-26 22:00:58 +00:00
Marcel Moolenaar 31c8f3f0af Make this file compile again when COMPAT_43 has not been
defined. This boils down to conditionally compile the
old signal syscalls.

We might want to extend the types in syscalls.master to
make these syscalls conditionally on something more
appropriate than COMPAT_43.
2000-08-26 02:27:01 +00:00
Peter Wemm 9579e8c145 m_mballoc_wait() had a spl/tsleep race. mbufs can be freed in interrupt
context, which can cause a wakeup.. which can race with this.
2000-08-25 22:28:08 +00:00
Peter Wemm 12db06a04f If the config program found a hints file and included it as a fallback,
then treat it as such.  This isn't perfect, but should do for things
like GENERIC.  When in fallback mode, they will be used if there are NO
other hints.
2000-08-25 19:48:10 +00:00
Poul-Henning Kamp d8cd1501f2 Dang, a _clone routine escaped #ifdef DEVFS containment. 2000-08-24 15:59:44 +00:00
Poul-Henning Kamp a481b90b82 Fix panic when removing open device (found by bp@)
Implement subdirs.
 Build the full "devicename" for cloning functions.
 Fix panic when deleted device goes away.
 Collaps devfs_dir and devfs_dirent structures.
 Add proper cloning to the /dev/fd* "device-"driver.
 Fix a bug in make_dev_alias() handling which made aliases appear
  multiple times.
 Use devfs_clone to implement getdiskbyname()
 Make specfs maintain the stat(2) timestamps per dev_t
2000-08-24 15:36:55 +00:00
Brian Feldman 0a7d171157 Revert the suser -> suser_xxx change made previously. It was right
before.
2000-08-24 04:54:31 +00:00
Paul Saab 03f808c55a Add a sysctl which hides all process except those that belong to
the user asking for the process list.

Reviewed by:	peter
2000-08-23 21:41:25 +00:00
Poul-Henning Kamp 3f54a085a6 Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)
Remove old DEVFS support fields from dev_t.

  Make uid, gid & mode members of dev_t and set them in make_dev().

  Use correct uid, gid & mode in make_dev in disk minilayer.

  Add support for registering alias names for a dev_t using the
  new function make_dev_alias().  These will show up as symlinks
  in DEVFS.

  Use makedev() rather than make_dev() for MFSs magic devices to prevent
  DEVFS from noticing this abuse.

  Add a field for DEVFS inode number in dev_t.

  Add new DEVFS in fs/devfs.

  Add devfs cloning to:
        disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
        md(4), tun(4), bpf(4), fd(4)

  If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

  Add commented out DEVFS to GENERIC
2000-08-20 21:34:39 +00:00
Poul-Henning Kamp 4fe6d43729 Fix typo in last commit. 2000-08-20 11:46:39 +00:00
Poul-Henning Kamp e39c53eda5 Centralize the canonical vop_access user/group/other check in vaccess().
Discussed with: bde
2000-08-20 08:36:26 +00:00
David Malone a5c4836d39 Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR:		19866
Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	alfred (glanced at by others on -net)
2000-08-19 08:32:59 +00:00
Poul-Henning Kamp 39f70682ae Introduce vop_stdinactive() and make it the default if no vop_inactive
is declared.

Sort and prune a few vop_op[].
2000-08-18 10:01:02 +00:00
Brian Feldman 9b96968623 Fix a couple cases where p_trespass wasn't transitioned into place.
Make RTP_SET (rtprio) only accessible to real root, not root in jails.
2000-08-16 23:28:54 +00:00
Peter Wemm 37b087a645 Clean up some low level bootstrap code:
- stop using the evil 'struct trapframe' argument for mi_startup()
  (formerly main()).  There are much better ways of doing it.
- do not use prepare_usermode() - setregs() in execve() will do it
  all for us as long as the p_md.md_regs pointer is set.  (which is
  now done in machdep.c rather than init_main.c.  The Alpha port did it
  this way all along and is much cleaner).
- collect all the magic %cr0 etc register settings into one place and
  have the AP's call that instead of using magic numbers (!!) that keep
  changing over and over again.
- Make it safe to call kthread_create() earlier, including during the
  device probe sequence.  It doesn't need the callback mechanism that
  NetBSD's version uses.
- kthreads created this way are root-less as they exist before the root
  filesystem is mounted.  init(1) is set up so that it aquires the root
  pointers prior to running.  If other kthreads want filesystem acccess
  we can make this code more generic.
- set all threads start times once we have decided what time it is.
- init uses a trampoline rather than the evil prepare_usermode() hack.
- kern_descrip.c has a couple of tweaks to deal with forking when there
  is no rootdir or cwd etc.
- adjust the early SYSINIT() sequence so that a few prereqisites are in
  place. eg: make sure the run queue is initialized before doing forks.

With this, the USB code can easily create a kthread to do the device
tree discovery.  (I have tested it, it works nicely).

There are still some open issues before this is truely useful.
- tsleep() does not like working before the clock is running.  It
  sort-of tries to spin wait, but it can do more useful things now.
- stopping a kthread in kld code at unload time is "interesting" but
  we have a solution for that.

The Alpha code needs no changes for this.  It already uses pretty much the
same strategies, but a little cleaner.
2000-08-11 09:05:12 +00:00
Tor Egge 4428d39d63 Don't skip IOAPIC id conflict detection when only one pci bus is present.
PR:		20312
Reviewed by:	Steve Roome <steve@sse0691.bri.hp.com>
2000-08-10 17:33:24 +00:00