Commit graph

5800 commits

Author SHA1 Message Date
John Baldwin 04f4a16448 If the file descriptors passed into do_dup() are negative, return EBADF
instead of panicing.  Also, perform some of the simpler sanity checks on
the fds before acquiring the filedesc lock.

Approved by:	re
Reported by:	Dan Nelson <dan@emsphone.com> and others
2002-11-26 17:22:15 +00:00
Robert Watson 4d10c0ce5f Un-staticize mac_cred_mmapped_drop_perms() so that it may be used
by policy modules making use of downgrades in the MAC AST event.  This
is required by the mac_lomac port of LOMAC to the MAC Framework.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-26 17:11:57 +00:00
Alan Cox 2d21129db2 Acquire and release the page queues lock around pmap_remove_pages() because
it updates several of vm_page's fields.
2002-11-25 04:37:44 +00:00
Alan Cox 178949e021 Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by:	re
2002-11-23 19:10:31 +00:00
Maxime Henrion b19d9defef Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit().  This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit().  We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by:	re@
Suggested by:	jhb
2002-11-22 23:57:02 +00:00
Jeff Roberson 79acfc497b - Add the new sched_pctcpu() function to the sched_* api.
- Provide a routine in sched_4bsd to add this functionality.
 - Use sched_pctcpu() in kern_proc, which is the one place outside of
   sched_4bsd where the old pctcpu value was accessed directly.

Approved by:	re
2002-11-21 09:30:55 +00:00
Jeff Roberson 06439a04a1 - Move scheduler specific macros and defines out of proc.h
Approved by:	re
2002-11-21 09:14:13 +00:00
Jeff Roberson 148302c9c9 - Move FSCALE back to kern_sync. This is not scheduler specific.
- Create a new callout for lbolt and move it out of schedcpu().  This is not
   scheduler specific either.

Approved by:	re
2002-11-21 08:57:08 +00:00
Jeff Roberson de028f5a4a - Implement a mechanism for allowing schedulers to place scheduler dependant
data in the scheduler independant structures (proc, ksegrp, kse, thread).
 - Implement unused stubs for this mechanism in sched_4bsd.

Approved by:	re
Reviewed by:	luigi, trb
Tested on:	x86, alpha
2002-11-21 01:22:38 +00:00
Robert Watson 2555374c4f Introduce p_label, extensible security label storage for the MAC framework
in struct proc.  While the process label is actually stored in the
struct ucred pointed to by p_ucred, there is a need for transient
storage that may be used when asynchronous (deferred) updates need to
be performed on the "real" label for locking reasons.  Unlike other
label storage, this label has no locking semantics, relying on policies
to provide their own protection for the label contents, meaning that
a policy leaf mutex may be used, avoiding lock order issues.  This
permits policies that act based on historical process behavior (such
as audit policies, the MAC Framework port of LOMAC, etc) can update
process properties even when many existing locks are held without
violating the lock order.  No currently committed policies implement use
of this label storage.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-20 15:41:25 +00:00
Robert Watson a3df768b04 Merge kld access control checks from the MAC tree: these access control
checks permit policy modules to augment the system policy for permitting
kld operations.  This permits policies to limit access to kld operations
based on credential (and other) properties, as well as to perform checks
on the kld being loaded (integrity, etc).

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-19 22:12:42 +00:00
Robert Watson 293d2d2261 We leaked a process lock reference in the event an RFTHREAD process
leader wasn't exiting during a fork; instead, do remember to release
the lock avoiding lock order reversals and recursion panic.

Reported by:	"Joel M. Baldwin" <qumqats@outel.org>
2002-11-18 14:23:21 +00:00
David Xu bfd8325073 Make sure only update wall clock at upcall time, slightly reformat
code in kse_relase().
2002-11-18 12:28:15 +00:00
Alfred Perlstein ec63e12a03 During shutdown explain what the numbers following the 'syncing
disks' message mean, specifically, 'buffers remaining...'.
2002-11-18 02:41:03 +00:00
David Xu 8798d4f9c8 1. Support versioning and wall clock in kse mailbox,
also add rusage time in thread mailbox.
2. Minor change for thread limit code in thread_user_enter(),
   fix typo in kse_release() last I committed.

Reviewed by: deischen, mini
2002-11-18 01:59:31 +00:00
Julian Elischer 904f1b77cc include smp.h.
it is required by some code that was commented out until david's
last commit.
2002-11-17 23:26:42 +00:00
David Xu fdc5ecd24f 1.Add sysctls to control KSE resource allocation.
kern.threads.max_threads_per_proc
  kern.threads.max_groups_per_proc
2.Temporary disable borrower thread stash itself as
  owner thread's spare thread in thread_exit(). there
  is a race between owner thread and borrow thread:
  an owner thread may allocate a spare thread as this:
	if (td->td_standin == NULL)
		td->standin = thread_alloc();
  but thread_alloc() can block the thread, then a borrower
  thread would possible stash it self as owner's spare
  thread in thread_exit(), after owner is resumed, result
  is a thread leak in kernel, double check in owner can
  avoid the race, but it may be ugly and not worth to do.
2002-11-17 11:47:03 +00:00
David Xu db9b0729fc Rework last exiting thread in kse_release(), wait a signal and then
schedule an upcall and call thread_exit().
2002-11-17 10:12:00 +00:00
Jeff Roberson a9a088823e - Release the imgp vnode prior to freeing exec_map resources to avoid
deadlock.
2002-11-17 09:33:00 +00:00
Alfred Perlstein f51c1e897d Rework the sysconf(3) interaction with aio:
sysconf.c:
  Use 'break' rather than 'goto yesno' in sysconf.c so that we report a '0'
  return value from the kernel sysctl.

vfs_aio.c:
  Make aio reset its configuration parameters to -1 after unloading
  instead of 0.

posix4_mib.c:
  Initialize the aio configuration parameters to -1
  to indicate that it is not loaded.
  Add a facility (p31b_iscfg()) to determine if a posix4 facility has been
  initialized to avoid having to re-order the SYSINITs.
  Use p31b_iscfg() to determine if aio has had a chance to run yet which
  is likely if it is compiled into the kernel and avoid spamming its
  values.
  Introduce a macro P31B_VALID() instead of doing the same comparison over
  and over.

posix4.h:
  Prototype p31b_iscfg().
2002-11-17 04:15:34 +00:00
Alan Cox 4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alfred Perlstein 86d52125a2 Export the values for _SC_AIO_MAX and _SC_AIO_PRIO_DELTA_MAX via the p1003b
sysctl interface.
2002-11-16 06:38:07 +00:00
Daniel Eischen f3ec9000e9 Regenerate after adding system calls. 2002-11-16 06:36:56 +00:00
Daniel Eischen 2be05b70c9 Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin
2002-11-16 06:35:53 +00:00
Alfred Perlstein c844abc920 Call 'p31b_setcfg(CTL_P1003_1B_AIO_LISTIO_MAX, AIO_LISTIO_MAX)'
when AIO is initialized so that sysconf() gives correct results.

Reported by: Craig Rodrigues <rodrigc@attbi.com>
2002-11-16 04:22:55 +00:00
Alfred Perlstein b565fb9e6f headers should not really include "opt_foo.h" (in this case opt_posix.h).
remove it from the header and add it to the files that require it.
2002-11-15 22:55:06 +00:00
David Xu 1d2c5bd519 Return EWOULDBLOCK for last thread in kse_release().
Requested by: archie
2002-11-15 00:53:59 +00:00
Thomas Moestl 01ee43955c Make the msg_size, msg_bufx and msg_bufr memebers of struct msgbuf
signed, since they describe a ring buffer and signed arithmetic is
performed on them. This avoids some evilish casts.

Since this changes all but two members of this structure, style(9)
those remaining ones, too.

Requested by:	bde
Reviewed by:	bde (earlier version)
2002-11-14 16:11:12 +00:00
David Xu ca161eb6e9 In kse_release(), check if current thread is bound
and current kse mailbox was already initialized, also
prevent last thread from exiting unless we figure out
how to safely support null thread proc.
2002-11-14 06:06:45 +00:00
Robert Watson a96acd1ace Introduce a condition variable to avoid returning EBUSY when
the MAC policy list is busy during a load or unload attempt.
We assert no locks held during the cv wait, meaning we should
be fairly deadlock-safe.  Because of the cv model and busy
count, it's possible for a cv waiter waiting for exclusive
access to the policy list to be starved by active and
long-lived access control/labeling events.  For now, we
accept that as a necessary tradeoff.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-13 15:47:09 +00:00
Maxime Henrion 2bb95458bd Add support for the C99 %t format modifier. 2002-11-13 15:15:59 +00:00
Robert Watson 63b6f478ec Garbage collect mac_create_devfs_vnode() -- it hasn't been used since
we brought in the new cache and locking model for vnode labels.  We
now rely on mac_associate_devfs_vnode().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-12 04:20:36 +00:00
John Baldwin d2b28e078a Correct an assertion in the code to traverse the list of locks to find an
earlier acquired lock with the same witness as the lock currently being
acquired.  If we had released several earlier acquired locks after
acquiring enough locks to require another lock_list_entry bucket in the
lock list, then subsequent lock_list_entry buckets could contain only one
lock instance in which case i would be zero.

Reported by:	Joel M. Baldwin <qumqats@outel.org>
2002-11-11 16:36:20 +00:00
Robert Watson 2d43d24ed4 Garbage collect definition of M_MACOPVEC -- we no longer perform a
dynamic mapping of an operation vector into an operation structure,
rather, we rely on C99 sparse structure initialization.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-11 14:15:58 +00:00
Alan Cox d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Alfred Perlstein 29f194457c Fix instances of macros with improperly parenthasized arguments.
Verified by: md5
2002-11-09 12:55:07 +00:00
Robert Watson 6d7bdc8def Assign value of NULL to imgp->execlabel when imgp is initialized
in the ELF code.  Missed in earlier merge from the MAC tree.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 20:49:50 +00:00
Robert Watson 52378b8acd To reduce per-return overhead of userret(), call into
mac_thread_userret() only if PS_MACPEND is set in the process AST mask.
This avoids the cost of the entry point in the common case, but
requires policies interested in the userret event to set the flag
(protected by the scheduler lock) if they do want the event.  Since
all the policies that we're working with which use mac_thread_userret()
use the entry point only selectively to perform operations deferred
for locking reasons, this maintains the desired semantics.

Approved by:	re
Requested by:	bde
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 19:00:17 +00:00
Robert Watson 9fa3506ecd Add an explicit execlabel argument to exec-related MAC policy entry
points, rather than relying on policies to grub around in the
image activator instance structure.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 18:04:00 +00:00
Thomas Moestl 0fca57b8b8 Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.
2002-11-07 23:57:17 +00:00
John Baldwin 6274bdda4c - Use %j to print intmax_t values.
- Cast more daddr_t values to intmax_t when printing to quiet warnings.
2002-11-07 22:41:08 +00:00
John Baldwin d0e938f4f1 Use %z to quiet a warning. 2002-11-07 22:38:04 +00:00
Maxime Henrion a7a00d0546 - Fix a bunch of casts to long which were truncating off_t's.
- Remove the comments which were justifying this by the fact
that we don't have %q in the kernel, this was probably right
back in time, but we now have %q, and we even have better to
print those types (%j).
2002-11-07 21:56:05 +00:00
Maxime Henrion b65d1ba9dd - Use a better definition for MNAMELEN which doesn't require
to have one #ifdef per architecture.
- Change a space to a tab after a nearby #define.

Obtained from:	bde
2002-11-07 21:15:02 +00:00
Robert Watson f8f750c53e Do a bit more work in the aio code to simulate the credential environment
of the original AIO request: save and restore the active thread credential
as well as using the file credential, since MAC (and some other bits of
the system) rely on the thread credential instead of/as well as the
file credential.  In brief: cache td->td_ucred when the AIO operation
is queued, temporarily set and restore the kernel thread credential,
and release the credential when done.  Similar to ktrace credential
management.

Reviewed by:	alc
Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-07 20:46:37 +00:00
Kelly Yancey 04ac9b97b5 Spotted a couple of places where the socket buffer's counters were being
manipulated directly (rather than using sballoc()/sbfree()); update them
to tweak the new sb_ctl field too.

Sponsored by:	NTT Multimedia Communications Labs
2002-11-05 18:52:25 +00:00
Kelly Yancey 247a32f22a Fix filt_soread() to properly flag a kevent when a 0-byte datagram is
received.

Verified by:	dougb, Manfred Antar <null@pozo.com>
Sponsored by:	NTT Multimedia Communications Labs
2002-11-05 18:48:46 +00:00
Robert Watson 0c93266b9c Correct merge-o: disable the right execve() variation if !MAC 2002-11-05 18:04:50 +00:00
Robert Watson 670cb89bf4 Bring in two sets of changes:
(1) Permit userland applications to request a change of label atomic
    with an execve() via mac_execve().  This is required for the
    SEBSD port of SELinux/FLASK.  Attempts to invoke this without
    MAC compiled in result in ENOSYS, as with all other MAC system
    calls.  Complexity, if desired, is present in policy modules,
    rather than the framework.

(2) Permit policies to have access to both the label of the vnode
    being executed as well as the interpreter if it's a shell
    script or related UNIX nonsense.  Because we can't hold both
    vnode locks at the same time, cache the interpreter label.
    SEBSD relies on this because it supports secure transitioning
    via shell script executables.  Other policies might want to
    take both labels into account during an integrity or
    confidentiality decision at execve()-time.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 17:51:56 +00:00
Robert Watson 051c41caf1 Regen. 2002-11-05 17:48:04 +00:00
Robert Watson 21bb9ea225 Flesh out the definition of __mac_execve(): per earlier discussion,
it's essentially execve() with an optional MAC label argument.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 17:47:08 +00:00
Robert Watson 4443e9ff4a Assert that appropriate vnodes are locked in mac_execve_will_transition().
Allow transitioning to be twiddled off using the process and fs enforcement
flags, although at some point this should probably be its own flag.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 15:11:33 +00:00
Robert Watson ccafe7eb35 Hook up the mac_will_execve_transition() and mac_execve_transition()
entrypoints, #ifdef MAC.  The supporting logic already existed in
kern_mac.c, so no change there.  This permits MAC policies to cause
a process label change as the result of executing a binary --
typically, as a result of executing a specially labeled binary.

For example, the SEBSD port of SELinux/FLASK uses this functionality
to implement TE type transitions on processes using transitioning
binaries, in a manner similar to setuid.  Policies not implementing
a notion of transition (all the ones in the tree right now) require
no changes, since the old label data is copied to the new label
via mac_create_cred() even if a transition does occur.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 14:57:49 +00:00
Giorgos Keramidas 5f9ae8e026 Typo in comment: commmand -> command
Reviewed by:	jhb
2002-11-05 14:54:07 +00:00
Robert Watson 450ffb4427 Remove reference to struct execve_args from struct imgact, which
describes an image activation instance.  Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv.  With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.

There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 01:59:56 +00:00
Robert Watson e5e820fd1f Permit MAC policies to instrument the access control decisions for
system accounting configuration and for nfsd server thread attach.
Policies might use this to protect the integrity or confidentiality
of accounting data, limit the ability to turn on or off accounting,
as well as to prevent inappropriately labeled threads from becoming nfs
server threads.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-04 15:13:36 +00:00
Robert Watson 3da87a65c7 Remove mac_cache_fslabel_in_vnode sysctl -- with the new VFS/MAC
construction, labels are always cached.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-04 14:55:14 +00:00
Robert Watson 6201265be7 License clarification and wording changes: NAI has approved removal of
clause three, and NAI Labs now goes by the name Network Associates
Laboratories.
2002-11-04 01:42:39 +00:00
Robert Watson 4b8d5f2d97 Introduce mac_check_system_settime(), a MAC check allowing policies to
augment the system policy for changing the system time.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-03 02:39:42 +00:00
Robert Watson 01ce3b5661 Regen from yesterday's system call placeholder rename. 2002-11-02 23:54:36 +00:00
Alan Cox 151113a946 Catch up with the removal of the vm page buckets spin mutex. 2002-11-02 22:42:18 +00:00
Alan Cox 5ee0a409fc Revert the change in revision 1.77 of kern/uipc_socket2.c. It is causing
a panic because the socket's state isn't as expected by sofree().

Discussed with: dillon, fenner
2002-11-02 05:14:31 +00:00
Kelly Yancey 47baac87a6 Update the st_size reported via stat(2) to accurately reflect the amount
of data available to read for non-TCP sockets.

Reviewed by:	-net, -arch
Sponsored by:	NTT Multimedia Communications Labs
MFC after:	2 weeks
2002-11-01 21:31:13 +00:00
Kelly Yancey e0f640e82d Track the number of non-data chararacters stored in socket buffers so that
the data value returned by kevent()'s EVFILT_READ filter on non-TCP
sockets accurately reflects the amount of data that can be read from the
sockets by applications.

PR:		30634
Reviewed by:	-net, -arch
Sponsored by:	NTT Multimedia Communications Labs
MFC after:	2 weeks
2002-11-01 21:27:59 +00:00
Robert Watson 6cedb451fb Rename __execve_mac() to __mac_execve() for increased consistency
with other MAC system calls.

Requested by:	various (phk, gordont, jake, ...)
2002-11-01 21:00:02 +00:00
Robert Watson e686e5ae91 Add MAC checks for various kenv() operations: dump, get, set, unset,
permitting MAC policies to limit access to the kernel environment.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-01 20:46:53 +00:00
Poul-Henning Kamp 1fb14a47a1 Introduce malloc_last_fail() which returns the number of seconds since
malloc(9) failed last time.  This is intended to help code adjust
memory usage to the current circumstances.

A typical use could be:
	if (malloc_last_fail() < 60)
		reduce_cache_by_one();
2002-11-01 18:58:12 +00:00
Poul-Henning Kamp 38b0884cc3 Introduce a "time_uptime" global variable which holds the time since boot
in seconds.
2002-11-01 18:52:20 +00:00
David Xu adac9400a7 KSE-enabled processes only. 2002-10-31 08:00:51 +00:00
Robert Watson 5c8dd34218 Move to C99 sparse structure initialization for the mac_policy_ops
structure definition, rather than using an operation vector
we translate into the structure.  Originally, we used a vector
for two reasons:

(1) We wanted to define the structure sparsely, which wasn't
    supported by the C compiler for structures.  For a policy
    with five entry points, you don't want to have to stick in
    a few hundred NULL function pointers.

(2) We thought it would improve ABI compatibility allowing modules
    to work with kernels that had a superset of the entry points
    defined in the module, even if the kernel had changed its
    entry point set.

Both of these no longer apply:

(1) C99 gives us a way to sparsely define a static structure.

(2) The ABI problems existed anyway, due to enumeration numbers,
    argument changes, and semantic mismatches.  Since the going
    rule for FreeBSD is that you really need your modules to
    pretty closely match your kernel, it's not worth the
    complexity.

This submit eliminates the operation vector, dynamic allocation
of the operation structure, copying of the vector to the
structure, and redoes the vectors in each policy to direct
structure definitions.  One enourmous benefit of this change
is that we now get decent type checking on policy entry point
implementation arguments.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-30 18:48:51 +00:00
Robert Watson b914de36c0 While 'mode_t' seemed like a good idea for the access mode argument for
MAC access() and open() checks, the argument actually has an int type
where it becomes available.  Switch to using 'int' for the mode argument
throughout the MAC Framework and policy modules.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-30 17:56:57 +00:00
David Xu 8db2431f61 Check NULL thread mailbox pointer. 2002-10-30 05:09:29 +00:00
David Xu 7b290dd008 Style fixes. 2002-10-30 03:01:28 +00:00
David Xu 37fcb8bcc8 Don't forget to set syscall result. 2002-10-30 02:39:10 +00:00
David Xu 34e80e027d Add an actual implementation of kse_thr_interrupt() 2002-10-30 02:28:41 +00:00
Robert Watson 9a1b076af2 Minor comment typo fix.
Submitted by:	Wayne Morrison <tewok@tislabs.com>
2002-10-29 20:51:44 +00:00
David Malone 6bd34a1e6f The syscall names are string constants, so make them consts. 2002-10-29 15:47:06 +00:00
Robert Watson 6151efaa54 Trim extraneous #else and #endif MAC comments per style(9). 2002-10-28 21:17:53 +00:00
Robert Watson 8b3a843438 An inappropriate ASSERT slipped in during the recent merge of the
reboot checking; remove.
2002-10-28 18:53:53 +00:00
David Xu 72465621ff Close a race window in kse_create(): signal delivered after SIGPENDING call
but before we call kse_link().
2002-10-28 07:37:06 +00:00
Ian Dowse 4e08ccb2ff Fix a case in kern_rename() where a vn_finished_write() call was
missed. This bug has been present since the vn_start_write() and
vn_finished_write() calls were first added in revision 1.159. When
the case is triggered, any attempts to create snapshots on the
filesystem will deadlock and also prevent further write activity
on that filesystem.
2002-10-27 23:23:51 +00:00
Garrett Wollman c7047e5204 Change the way support for asynchronous I/O is indicated to applications
to conform to 1003.1-2001.  Make it possible for applications to actually
tell whether or not asynchronous I/O is supported.

Since FreeBSD's aio implementation works on all descriptor types, don't
call down into file or vnode ops when [f]pathconf() is asked about
_PC_ASYNC_IO; this avoids the need for every file and vnode op to know about
it.
2002-10-27 18:07:41 +00:00
Robert Watson 9e913ebd0a Centrally manage enforcement of {reboot,swapon,sysctl} using the
mac_enforce_system toggle, rather than several separate toggles.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-27 15:50:49 +00:00
Robert Watson d3fc69ee6a Implement mac_check_system_sysctl(), a MAC Framework entry point to
permit MAC policies to augment the security protections on sysctl()
operations.  This is not really a wonderful entry point, as we
only have access to the MIB of the target sysctl entry, rather than
the more useful entry name, but this is sufficient for policies
like Biba that wish to use their notions of privilege or integrity
to prevent inappropriate sysctl modification.  Affects MAC kernels
only.  Since SYSCTL_LOCK isn't in sysctl.h, just kern_sysctl.c,
we can't assert the SYSCTL subsystem lockin the MAC Framework.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-27 07:12:34 +00:00
Robert Watson a2ecb9b790 Hook up mac_check_system_reboot(), a MAC Framework entry point that
permits MAC modules to augment system security decisions regarding
the reboot() system call, if MAC is compiled into the kernel.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-27 07:03:29 +00:00
Robert Watson 03ce2c0c9b Merge from MAC tree: rename mac_check_vnode_swapon() to
mac_check_system_swapon(), to reflect the fact that the primary
object of this change is the running kernel as a whole, rather
than just the vnode.  We'll drop additional checks of this
class into the same check namespace, including reboot(),
sysctl(), et al.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-27 06:54:06 +00:00
Maxime Henrion 5b8ee62bc2 Fix a style nit. 2002-10-26 18:19:46 +00:00
Robert Watson 763bbd2f4f Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception.  For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system.  With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance.  This
also corrects sematics for shared vnode locks, which were not
previously present in the system.  This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form.  With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception.  We'll introduce a work around for this shortly.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-26 14:38:24 +00:00
Julian Elischer 053effc60e iBack out david's last commit. the suspension code needs to be called
for non KSE processes too.
2002-10-26 04:44:17 +00:00
David Xu 3139ada54c Move suspension checking code from userret() into thread_userret(). 2002-10-26 02:56:51 +00:00
David Xu 56a6a23ea6 Backout revision 1.48. 2002-10-26 01:26:36 +00:00
Robert Watson a67fe518a1 Comment describing the semantics of mac_late.
Trim trailing whitespace.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-25 20:45:27 +00:00
Maxime Henrion 4578a2e652 - Rename the DDB specific %z printf format to %y.
- Make DDB use %y instead of %z.
- Teach GCC about %y.
- Implement support for the C99 %z format modifier.

Approved by:	re@
Reviewed by:	peter
Tested on:	i386, sparc64
2002-10-25 19:41:32 +00:00
Peter Wemm 23eeeff7be Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves.  This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43.  Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too.  Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64.  Compiled on sparc64 (a few days ago).
Approved by: re
2002-10-25 19:10:58 +00:00
Poul-Henning Kamp df6b615a42 #include <geom/geom.h> to get proper prototypes. Contrary to my fears we
seem to have all the prerequisites already.

Call g_waitidle() as the first thing in vfs_mountroot() so that we have
it out of the way before we even decide if we should call .._ask() or
.._try().

Call the g_dev_print() function to provide better guidance for the
root-mount prompt.
2002-10-25 18:44:42 +00:00
David Xu ddc4f28155 suspend thread only when it can be interrupted. 2002-10-25 13:12:36 +00:00
David Xu 0cf609706f let thread_schedule_upcall() handle idle kse. 2002-10-25 12:50:31 +00:00
Poul-Henning Kamp fa669ab7b8 Disable the kernacc() check in mtx_validate() until such time that kernacc
does not require Giant.

This means that we may miss panics on a class of mutex programming bugs,
but only if running with a Chernobyl setting of debug-flags.

Spotted by:	Pete Carah <pete@ns.altadena.net>
2002-10-25 08:40:20 +00:00
Poul-Henning Kamp 0d6dc414b4 In vrele() we can actually have a VCHR with v_rdev == NULL if we
came from the bottom of addaliasu().  Don't panic.
2002-10-25 07:58:25 +00:00
Julian Elischer de4723f6e8 fix style-o 2002-10-25 07:17:07 +00:00
Julian Elischer 9d10277721 More work on the interaction between suspending and sleeping threads.
Also clean up some code used with 'single-threading'.

Reviewed by:	davidxu
2002-10-25 07:11:12 +00:00
Kirk McKusick 9ab73fd11a Within ufs, the ffs_sync and ffs_fsync functions did not always
check for and/or report I/O errors. The result is that a VFS_SYNC
or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in
the presence of a hard error writing a disk sector or in a filesystem
full condition. This patch ensures that I/O errors will always be
checked and returned.  This patch also ensures that every call to
VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes
appropriate action when an error is returned.

Sponsored by:   DARPA & NAI Labs.
2002-10-25 00:20:37 +00:00
David Xu 4c40dcd4d7 fix typo. 2002-10-25 00:13:46 +00:00
Julian Elischer 1434d3fe6f Extract out KSE specific code from machine specific code
so that there is ony one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitmatly happen).

Submitted by:	(parts) davidxu
2002-10-24 23:09:48 +00:00
Poul-Henning Kamp a2fb4feded Fix the spechash lock order reversal by keeping an updated sum
of v_usecount in the dev_t which vcount() can return without
locking any vnodes.

Seen by:	jhb
2002-10-24 19:38:56 +00:00
Poul-Henning Kamp 7c0c26b4c4 Make sure GEOM has stopped rattling the disks before we try to mount
the root filesystem, this may be implicated in the PC98 issue.
2002-10-24 19:26:08 +00:00
Poul-Henning Kamp 1f59664b68 Don't try to be cute and save a call/return by implementing a degenerate
vrele() inline.
2002-10-24 17:55:49 +00:00
David Xu 33862f40b0 respect TDF_SINTR, also for SINGLE_NO_EXIT threading mode, if a thread
was already suspended, do nothing.
2002-10-24 14:43:48 +00:00
David Xu 9991db0cb5 don't forget to remove kse from idle queue. 2002-10-24 09:16:46 +00:00
Julian Elischer 5c8329ed6c Move thread related code from kern_proc.c to kern_thread.c.
Add code to free KSEs and KSEGRPs on exit.
Sort KSE prototypes in proc.h.
Add the missing kse_exit() syscall.

ksetest now does not leak KSEs and KSEGRPS.

Submitted by:	(parts) davidxu
2002-10-24 08:46:34 +00:00
Dag-Erling Smørgrav f2c1ea8152 Whitespace cleanup. 2002-10-23 10:26:54 +00:00
Alexander Kabaev 96725dd01a Handle binaries with arbitrary number PT_LOAD sections, not only
ones with one text and one data section.

The text and data rlimit checks still needs to be fixed to properly
accout for additional sections.

Reviewed by:	peter (slightly different patch version)
2002-10-23 01:57:39 +00:00
John Baldwin 12f65109c8 Don't dereference the 'x' pointer if it is NULL, instead skip the
assignment.  The netsmb code likes to call these functions with a NULL
x argument a lot.

Reported by:	Vallo Kallaste <kalts@estpak.ee>
2002-10-22 18:44:59 +00:00
Robert Drehmel d08926b1f6 Change the `mutex_prof' structure to use three variables contained
in an anonymous structure as counters, instead of an array with
preprocessor-defined names for indices.  Remove the associated XXX-
comment.
2002-10-22 16:06:28 +00:00
Robert Watson 1cbfd977fd Introduce MAC_CHECK_VNODE_SWAPON, which permits MAC policies to
perform authorization checks during swapon() events; policies
might choose to enforce protections based on the credential
requesting the swap configuration, the target of the swap operation,
or other factors such as internal policy state.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-22 15:53:43 +00:00
Robert Watson 2789e47e2c Missed in previous merge: export sizeof(struct oldmac) rather than
sizeof(struct mac).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-22 15:33:33 +00:00
Robert Watson f7b951a8e0 Support the new MAC user API in kernel: modify existing system calls
to use a modified notion of 'struct mac', and flesh out the new variation
system calls (almost identical to existing ones except that they permit
a pid to be specified for process label retrieval, and don't follow
symlinks).  This generalizes the label API so that the framework is
now almost entirely policy-agnostic.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-22 14:29:47 +00:00
Robert Watson 5cb559a5e0 Regen. 2002-10-22 14:23:52 +00:00
Robert Watson aad1cdc852 Flesh out prototypes for __mac_get_pid, __mac_get_link, and
__mac_set_link, based on __mac_get_proc() except with a pid,
and __mac_get_file(), __mac_set_file() except that they do
not follow symlinks.  First in a series of commits to flesh
out the user API.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-22 14:22:24 +00:00
David Xu 81fd489272 detect idle kse correctly. 2002-10-22 02:27:19 +00:00
Kirk McKusick 9e4b381a54 This update removes a race between unmount and lookup. The lookup
locks the mount point directory while waiting for vfs_busy to clear.
Meanwhile the unmount which holds the vfs_busy lock tried to lock
the mount point vnode. The fix is to observe that it is safe for the
unmount to remove the vnode from the mount point without locking it.
The lookup will wait for the unmount to complete, then recheck the
mount point when the vfs_busy lock clears.

Sponsored by:	DARPA & NAI Labs.
2002-10-22 01:06:44 +00:00
Kirk McKusick e03486d198 This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20;  author: phk;  state: Exp;  lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by:	DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
Poul-Henning Kamp c177d125bf GEOM does not (and shall not) propagate flags like D_MEMDISK, so we will
revert to checking the name to determine if our root device is a ramdisk,
md(4) specifically to determine if we should attempt the root-mount RW

Sponsored by:	DARPA & NAI Labs.
2002-10-21 20:09:59 +00:00
Dag-Erling Smørgrav 6d0369001a Reduce the overhead of the mutex statistics gathering code, try to produce
shorter lines in the report, and clean up some minor style issues.
2002-10-21 18:48:28 +00:00
Olivier Houchard e3bf3aea25 One #include <sys/sysctl.h> should be enough.
Approved by:	mux (mentor)
2002-10-21 18:40:40 +00:00
Brooks Davis 29e1b85f97 Use if_printf(ifp, "blah") instead of
printf("%s%d: blah", ifp->if_name, ifp->if_xname).
2002-10-21 02:51:56 +00:00
Thomas Moestl 5775150869 Fix the calculations of the length of the unread message buffer
contents. The code was subtracting two unsigned ints, stored the
result in a log and expected it to be the same as of a signed
subtraction; this does only work on platforms where int and long
have the same size (due to overflows).
Instead, cast to long before the subtraction; the numbers are
guaranteed to be small enough so that there will be no overflows
because of that.
2002-10-20 23:13:05 +00:00
Poul-Henning Kamp 962414a120 We have memset() and memcpy() in the kernel now, so we don't need to
#define them to bzero and bcopy.

Spotted by:	FlexeLint
2002-10-20 22:33:42 +00:00
Julian Elischer 2f030624b1 Add an actual implementation of kse_wakeup()
Submitted by:	Davidxu
2002-10-20 21:08:47 +00:00
Thomas Moestl e381d2455b Add kernel dump support, based on the ia64 version (which was committed
as sparc64/sparc64/dump_machdep.c a while back).
Other than ia64 (which uses ELF), sparc64 uses a homegrown format for
the dumps (headers are required because the physical address and size of
the tsb must be noted, and because physical memory may be discontiguous);
ELF would not offer any advantages here.

Reviewed by:	jake
2002-10-20 17:03:15 +00:00
Poul-Henning Kamp ab33958276 #unifdef the code for checking blessed lock collisions until we need it.
Spotted by:	DARPA & NAI Labs.
2002-10-20 08:48:39 +00:00
Robert Watson a13c67da35 If MAC_MAX_POLICIES isn't defined, don't try to define it, just let the
compile fail.  MAC_MAX_POLICIES should always be defined, or we have
bigger problems at hand.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 03:41:09 +00:00
Peter Wemm 8556393bb2 Stake a claim on 418 (__xstat), 419 (__xfstat), 420 (__xlstat) 2002-10-19 22:25:31 +00:00
Peter Wemm c8447553b5 Grab 416/417 real estate before I get burned while testing again.
This is for the not-quite-ready signal/fpu abi stuff.  It may not see
the light of day, but I'm certainly not going to be able to validate it
when getting shot in the foot due to syscall number conflicts.
2002-10-19 22:09:23 +00:00
Robert Watson b614dd131a Add a new 'NOMACCHECK' flag to namei() NDINIT flags, which permits the
caller to indicate that MAC checks are not required for the lookup.
Similar to IO_NOMACCHECK for vn_rdwr(), this indicates that the caller
has already performed all required protections and that this is an
internally generated operation.  This will be used by the NFS server
code, as we don't currently enforce MAC protections against requests
delivered via NFS.

While here, add NOCROSSMOUNT to PARAMASK; apparently this was used at
one point for name lookup flag checking, but isn't any longer or it
would have triggered from the NFS server code passing it to indicate
that mountpoints shouldn't be crossed in lookups.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:25:51 +00:00
Robert Watson 3ab93f0958 Regen from addition of execve_mac placeholder. 2002-10-19 21:15:10 +00:00
Robert Watson bc5245d94c Add a placeholder for the execve_mac() system call, similar to SELinux's
execve_secure() system call, which permits a process to pass in a label
for a label change during exec.  This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process.  Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:06:57 +00:00
Robert Watson 89c61753a0 Drop in the MAC check for file creation as part of open().
Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:56:44 +00:00
Robert Watson 9aeffb2b28 Make sure to clear the 'registered' flag for MAC policies when they
unregister.  Under some obscure (perhaps demented) circumstances,
this can result in a panic if a policy is unregistered, and then someone
foolishly unregisters it again.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:30:12 +00:00
Robert Watson 7587203c2f Hook up most of the MAC entry points relating to file/directory/node
creation, deletion, and rename.  There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels.  In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same.  Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 20:25:57 +00:00
Marcel Moolenaar 1aeb23cdfa Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64
2002-10-19 19:16:03 +00:00
Marcel Moolenaar c143d6c24a Reduce code duplication by moving the common actions in
link_elf_init(), link_elf_link_preload_finish() and
link_elf_load_file() to link_elf_link_common_finish().
Since link_elf_init() did initializations as a side-effect
of doing the common actions, keep the initialization in
that function. Consequently, link_elf_add_gdb() is now also
called to insert the very first link_map() (ie the kernel).
2002-10-19 18:59:33 +00:00
Marcel Moolenaar 1720979bc5 Non-functional change in preparation of the next commit:
Move link_elf_add_gdb(), link_elf_delete_gdb() and link_elf_error()
near the top of the file. The *_gdb() functions are moved inside
the #ifdef DDB already present there.
2002-10-19 18:43:37 +00:00
Marcel Moolenaar f5b07e11ad In link_elf_load_file(), when SPARSE_MAPPING is defined and we
cannot allocate ef->object, we freed ef before bailing out with
an error. This is wrong because ef=lf and when we have an error
and lf is non-NULL (which holds if we try to alloc ef->object),
we free lf and thus ef as part of the bailing-out.
2002-10-19 05:01:54 +00:00
Alfred Perlstein 871de19fab Don't leak memory in semop(2). (Fix a bug I introduced in rev 1.55.)
Detective work by: jake
2002-10-19 02:07:35 +00:00
John Baldwin 6222047300 Do not lock the process when calling fdfree() (this would have recursed on
a non-recursive lock, the proc lock, before) since we don't need it to
change p_fd.
2002-10-18 17:45:41 +00:00
John Baldwin 6d345e2a45 fdfree() clears p_fd for us, no need to do it again. 2002-10-18 17:44:39 +00:00
John Baldwin 4562d72638 Don't lock the proc lock to clear p_fd. p_fd isn't protected by the proc
lock.
2002-10-18 17:42:28 +00:00
Kirk McKusick 3a096f6c09 Have lockinit() initialize the debugging fields of a lock
when DEBUG_LOCKS is defined.

Sponsored by:	DARPA & NAI Labs.
2002-10-18 01:34:10 +00:00
Kirk McKusick bc7bdd50c1 When the number of dirty buffers rises too high, the buf_daemon runs
to help clean up. After selecting a potential buffer to write, this
patch has it acquire a lock on the vnode that owns the buffer before
trying to write it. The vnode lock is necessary to avoid a race with
some other process holding the vnode locked and trying to flush its
dirty buffers. In particular, if the vnode in question is a snapshot
file, then the race can lead to a deadlock. To avoid slowing down the
buf_daemon, it does a non-blocking lock request when trying to lock
the vnode. If it fails to get the lock it skips over the buffer and
continues down its queue looking for buffers to flush.

Sponsored by:	DARPA & NAI Labs.
2002-10-18 01:29:59 +00:00