Commit graph

6210 commits

Author SHA1 Message Date
Robert Watson 6d1a6a9a9a mac_init_mbuf_tag() accepts malloc flags, not mbuf allocator flags, so
don't try and convert the argument flags to malloc flags, or we risk
implicitly requesting blocking and generating witness warnings.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-15 19:33:23 +00:00
Mike Silbersack 352d050e79 Add another MBUF_STRESS_TEST feature, m_defragrandomfailures.
When enabled, this causes m_defrag to randomly return NULL (following
its normal failure case so that extra memory leaks are not introduced.)

Code similar to this was used to find / fix a few bugs last week.
2003-04-15 02:14:43 +00:00
Robert Watson 225bff6f8b Move MAC label storage for mbufs into m_tags from the m_pkthdr structure,
returning some additional room in the first mbuf in a chain, and
avoiding feature-specific contents in the mbuf header.  To do this:

- Modify mbuf_to_label() to extract the tag, returning NULL if not
  found.

- Introduce mac_init_mbuf_tag() which does most of the work
  mac_init_mbuf() used to do, except on an m_tag rather than an
  mbuf.

- Scale back mac_init_mbuf() to perform m_tag allocation and invoke
  mac_init_mbuf_tag().

- Replace mac_destroy_mbuf() with mac_destroy_mbuf_tag(), since
  m_tag's are now GC'd deep in the m_tag/mbuf code rather than
  at a higher level when mbufs are directly free()'d.

- Add mac_copy_mbuf_tag() to support m_copy_pkthdr() and related
  notions.

- Generally change all references to mbuf labels so that they use
  mbuf_to_label() rather than &mbuf->m_pkthdr.label.  This
  required no changes in the MAC policies (yay!).

- Tweak mbuf release routines to not call mac_destroy_mbuf(),
  tag destruction takes care of it for us now.

- Remove MAC magic from m_copy_pkthdr() and m_move_pkthdr() --
  the existing m_tag support does all this for us.  Note that
  we can no longer just zero the m_tag list on the target mbuf,
  rather, we have to delete the chain because m_tag's will
  already be hung off freshly allocated mbuf's.

- Tweak m_tag copying routines so that if we're copying a MAC
  m_tag, we don't do a binary copy, rather, we initialize the
  new storage and do a deep copy of the label.

- Remove use of MAC_FLAG_INITIALIZED in a few bizarre places
  having to do with mbuf header copies previously.

- When an mbuf is copied in ip_input(), we no longer need to
  explicitly copy the label because it will get handled by the
  m_tag code now.

- No longer any weird handling of MAC labels in if_loop.c during
  header copies.

- Add MPC_LOADTIME_FLAG_LABELMBUFS flag to Biba, MLS, mac_test.
  In mac_test, handle the label==NULL case, since it can be
  dynamically loaded.

In order to improve performance with this change, introduce the notion
of "lazy MAC label allocation" -- only allocate m_tag storage for MAC
labels if we're running with a policy that uses MAC labels on mbufs.
Policies declare this intent by setting the MPC_LOADTIME_FLAG_LABELMBUFS
flag in their load-time flags field during declaration.  Note: this
opens up the possibility of post-boot policy modules getting back NULL
slot entries even though they have policy invariants of non-NULL slot
entries, as the policy might have been loaded after the mbuf was
allocated, leaving the mbuf without label storage.  Policies that cannot
handle this case must be declared as NOTLATE, or must be modified.

- mac_labelmbufs holds the current cumulative status as to whether
  any policies require mbuf labeling or not.  This is updated whenever
  the active policy set changes by the function mac_policy_updateflags().
  The function iterates the list and checks whether any have the
  flag set.  Write access to this variable is protected by the policy
  list; read access is currently not protected for performance reasons.
  This might change if it causes problems.

- Add MAC_POLICY_LIST_ASSERT_EXCLUSIVE() to permit the flags update
  function to assert appropriate locks.

- This makes allocation in mac_init_mbuf() conditional on the flag.

Reviewed by:	sam
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 20:39:06 +00:00
Robert Watson 10eeb10c63 Abstract access to the mbuf header label behind a new function,
mbuf_to_label().  This permits the vast majority of entry point code
to be unaware that labels are stored in m->m_pkthdr.label, such that
we can experiment storage of labels elsewhere (such as in m_tags).

Reviewed by:	sam
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 18:11:18 +00:00
Robert Watson aa65d9f538 Use MBTOM() to convert mbuf allocator flags to malloc() flags, rather
than using the same compare/substitute in many places.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-14 16:04:10 +00:00
Olivier Houchard 695d74f337 Use while (*controlp != NULL) instead of do ... while (*control != NULL)
There are valid cases where *controlp will be NULL at this point.

Discussed with:	dwmalone
2003-04-14 14:44:36 +00:00
Alan Cox de5ef10142 Update locking on the kernel_object to use the new macros. 2003-04-14 00:36:53 +00:00
Jake Burkholder 2373499592 Made vmspace0 non-static. Its useful to be able to identify a vmspace as
the kernel vmspace.
2003-04-13 21:29:11 +00:00
Alan Cox b077a36297 Lock some manipulations of the vm object's flags. 2003-04-13 19:36:18 +00:00
Poul-Henning Kamp 7c1d57b6e8 Since dynamic allocation of device major numbers so far have not
resulted in any earthquakes, civil wars or early onset hair-loss,
I think we can do without the printf announcing the assigned number.
2003-04-13 15:27:49 +00:00
Alan Cox e96c181d16 Use vm_object_pip_wait() rather than reimplementing it. 2003-04-13 05:10:44 +00:00
Jeff Roberson a5f099d0c4 - Unbreak priority prop. for timeshare threads. Always place something on
the current queue if its priority is really elevated.  This needs more work
   as there are cases where a next queue kse could be holding up what would
   be a curr queue kse, and thus hurting interactivity.  Also, when a thread
   with an elevated priority has its priority lowered it should be placed
   back on the next queue.
2003-04-12 22:33:24 +00:00
Jeff Roberson 9bca28a703 - Clean up some debug code left over from my earlier megacommit. 2003-04-12 07:28:36 +00:00
Jeff Roberson b5c4c4a7e5 - We only care about the base priority. Ignore the SCHED_FIFO_BIT so that
we dont get confused.

Reported and debugged by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2003-04-12 07:00:16 +00:00
David Xu f9b89f7e3e Style fix. 2003-04-12 02:54:46 +00:00
Kelly Yancey f563420e8d Fix race between a process registering a NOTE_EXIT EVFILT_PROC event and
the target process exiting which causes attempts to register the kevent
to randomly fail depending on whether the target runs to completion before
the parent can call kevent(2).  The bug actually effects EVFILT_PROC
events on any zombie process, but the most common manifestation is with
parents trying to monitor child processes.

MFC after:	2 weeks
Sponsored by:	NTT Multimedia Communications Labs
2003-04-12 01:57:04 +00:00
David Xu 5312b1c7fa Check SIG_HOLD action ealier to avoid missing test it in later code. 2003-04-12 00:38:47 +00:00
Jeff Roberson a22ec9d8f2 - Call sched_exit_{kse,thread} and sched_fork{kse,thread} so that thr works
with ULE.  This was not strictly required by sched_4bsd.
2003-04-11 19:24:37 +00:00
Jeff Roberson 141ad61c78 - Add sched_exit_*
- Call sched_exit_kse() from sched_exit() instead of implementing it here.
2003-04-11 19:24:00 +00:00
Jeff Roberson 58177de2de - Only select kseqs with more than one kse to steal. The running kse
is reflected in the load now and you can't very well migrate that.
2003-04-11 18:40:34 +00:00
Jeff Roberson c36ccfa22b - When migrating a kse from one kseq to the next actually insert it onto
the second kseq's run queue so that it is referenced by the kse when
   it is switched out.
 - Spell ksq_rslices properly.

Reported by:	Ian Freislich <ianf@za.uu.net>
2003-04-11 18:37:34 +00:00
Alan Cox a4c9ca4f83 The data in an sf_buf should not be modified by the mbuf system. Mark
the mbuf as read only.

Reviewed by:	gallatin
2003-04-11 07:02:36 +00:00
Jeff Roberson 15dc847e52 - Add a SYSCTL node for the ule scheduler.
- Allow user adjustable min and max time slices (suggested by hiten).
 - Change the SLP_RUN_MAX to 100ms from 2 seconds so that we learn whether a
   process is interactive or not much more quickly.
 - Place a process on the current run queue if it is interactive or if it is
   running at an interrupt thread priority due to priority prop.
 - Use the 'current' timeshare queue for interrupt threads, realtime threads,
   and idle threads that are running at higher priority due to priority prop.
   This fixes problems where priorities would have been elevated but we would
   not check the timeshare run queue until other lower priority tasks were
   no longer runnable.
 - Keep an array of loads indexed by the priority class as well as a global
   load.
 - Keep an bucket of nice values with a count of the number of kses currently
   runnable with that nice value.
 - Keep track of the minimum nice value of any running thread.
 - Remove the unused short term sleep accounting.  I was attempting to use
   this for load balancing but it didn't work out.
 - Define a kseq_print() for use with debugging.
 - Add KTR debugging at useful places so we can easily debug slice and
   priority assignment.
 - Decouple the runq assignment from the kseq assignment.  kseq_add now keeps
   track of statistics.  This is done so that the nice and load is still
   tracked for the currently running process.  Previously if a niced process
   was added while a non nice process was running the niced process would
   still get a slice since it was not aware of the unnice process.
 - Make adjustments for the sched api changes.
2003-04-11 03:47:14 +00:00
Jeff Roberson f7f9e7f34d - Catch up with sched api changes. 2003-04-11 03:39:48 +00:00
Jeff Roberson f6f230febe - Adjust sched hooks for fork and exec to take processes as arguments instead
of ksegs since they primarily operation on processes.
 - KSEs take ticks so pass the kse through sched_clock().
 - Add a sched_class() routine that adjusts a ksegrp pri class.
 - Define a sched_fork_{kse,thread,ksegrp} and sched_exit_{kse,thread,ksegrp}
   that will be used to tell the scheduler about new instances of these
   structures within the same process.  These will be used by THR and KSE.
 - Change sched_4bsd to reflect this API update.
2003-04-11 03:39:07 +00:00
Julian Elischer 060563ec50 Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.
2003-04-10 17:35:44 +00:00
Mike Barcroft 94d079eb1f Regen. 2003-04-09 02:57:29 +00:00
Mike Barcroft fd7a8150fb o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
Alan Cox b8831f8d68 Remove some dead code. 2003-04-08 18:24:28 +00:00
Dag-Erling Smørgrav fe58453891 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
Jake Burkholder a12efae1ea Merged from kern_thread.c 1.113, avoid a panic in cpu_throw when the first
thread of a multithreaded process exits.

This unrelated and possibly wrong change was not mentioned in the commit
message for kern_thread.c 1.113.
2003-04-08 08:13:47 +00:00
David Xu 36f7b36f8a Inherit blocked thread's context for upcall thread. 2003-04-08 07:45:56 +00:00
Peter Wemm 67db8b23c3 Search for "elf32 kernel" (and elf64) and "elf32 module" (and elf64)
as well as "elf kernel" and "elf module".  This is a precursor to
x86-64 support in the i386 loader so it can load an elf64 x86-64 kernel.
2003-04-06 05:20:00 +00:00
Alan Cox 0b556837a9 Remove an unnecessary trunc_page() from vmapbuf().
Reviewed by:	tegge
2003-04-06 00:40:54 +00:00
Alan Cox ef38cda165 Don't reinitialize fields that are already initialized by getpbuf(). 2003-04-05 23:02:58 +00:00
Alan Cox cdb06eda66 Sufficient access checks are performed by vmapbuf() that calling
useracc() is pointless.  Remove the call to useracc() from physio().

Reviewed by:	tegge
2003-04-05 21:19:58 +00:00
Alan Cox 06363906bc o Remove useracc() calls from aio_qphysio(); they are redundant
given the checks performed by vmapbuf().

Reviewed by:	tegge
2003-04-04 06:26:28 +00:00
Alan Cox 08468b6ad7 o Check the b_bufsize passed to vmapbuf() returning an error
if it is invalid.
 o Remove a debugging printf() from vmapbuf().

Suggested by:   tegge
2003-04-04 06:14:54 +00:00
Poul-Henning Kamp b0fc6220b8 Remove BIO_SETATTR from non-GEOM part of kernel as well. 2003-04-03 19:22:32 +00:00
Jeff Roberson a8949de20e - Keep seperate statistics and run queues for different scheduling classes.
- Treat each class specially in kseq_{choose,add,rem}.  Let the rest of the
   code be less aware of scheduling classes.
 - Skip the interactivity calculation for non TIMESHARE ksegrps.
 - Move slice and runq selection into kseq_add().  Uninline it now that it's
   big.
2003-04-03 00:29:28 +00:00
Peter Wemm cc66ebe2a9 Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup.  Currently its under
options LAZY_SWITCH.  What this does is avoid %cr3 reloads for short
context switches that do not involve another user process.  ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb.  However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still.  There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option.  I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff.  This was more
meant for the kthread case, while his was for interrupts.  Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place.  This allows us to catch the silly
case of doing a cpu_switch() to the current process.  This happens
uncomfortably often.  This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle).  This has been
implemented on i386 and (thanks to jake) sparc64.  The others will come
soon.  This is actually seperate to the lazy switch stuff.

Glanced at by:  jake, jhb
2003-04-02 23:53:30 +00:00
John Baldwin 6751370f6f Lock the process before sending it a SIGIO. Not doing so is a panic(2)
implementation with INVARIANTS.
2003-04-02 21:54:51 +00:00
Jeffrey Hsu c31548c820 Need to hold the same SMP lock for (knote) list traversal as for
list manipulation.  This lock also protects read-modify-write operations
on the pipe_state field.
2003-04-02 15:24:50 +00:00
Jeff Roberson 5053d272c2 - Make the interactivity calculator decay faster.
- Make the pcpu estimator update faster.
2003-04-02 08:22:33 +00:00
Jeff Roberson 98c9b132d1 - I meant divide by two and not shift by two in SCHED_PRI_NHALF. 2003-04-02 08:21:24 +00:00
Jake Burkholder cef57e7624 - Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by:	jeff
2003-04-02 08:02:27 +00:00
Jeff Roberson 245f3abfd5 - Add in support for KSEs with 0 slice values on the run queue. If we try
to select a KSE with a slice of 0 we will update its slice and insert it
   onto the next queue.
 - Pass the KSE instead of the ksegrp into sched_slice().  This more
   accurately reflects the behavior of the code.  Slices are granted to kses.
 - Add a function kseq_nice_min() which finds the smallest nice value
   assigned to the kseg of any KSE on the queue.
 - Rewrite the logic in sched_slice().  Add a large comment describing the
   new slice selection scheme.  To summarize, slices are assigned based on
   the nice value.  Priorities are still calculated based on the nice and
   interactivity of a process.  Slice sizes of 0 may be granted for KSEs
   whos nice is 20 or futher away from the lowest nice on the run queue.
   Other nice values are scaled across the range [min, min+20].  This fixes
   ULEs bad behavior with positively niced processes.
2003-04-02 06:46:43 +00:00
Jake Burkholder fc2fca74d8 - Fix UC_COPY_SIZE. Adding up the size of structure fields doesn't take
alignment into account.
- Return EJUSTRETURN from set_context on success to avoid clobbering the
  first 2 out registers with td_retval on sparc64.
2003-04-01 23:25:18 +00:00
Poul-Henning Kamp 817509273e #include <geom/geom_disk.h> 2003-04-01 19:00:38 +00:00
Poul-Henning Kamp af6ca7f4a9 Introduce bioq_flush() function. 2003-04-01 12:49:40 +00:00