what it is.
Be more correct in unbusying the mountpoint (especially before freeing it).
Remove support for mounting 'r' devices as root. You don't mount 'r'
devices anywhere else, and they're going away anyway.
Submitted by: bde
(kern.randompid), which is currently defaulted off. Use ARC4 (RC4) for our
random number generation, which will not get me executed for violating
crypto laws; a Good Thing(tm).
Reviewed and Approved by: bde, imp
commit to kern_synch.c:
----------------------------
revision 1.55
date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10
Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
----------------------------
The details are a little different in FreeBSD:
=== nice bug === Fixing this is the main point of this commit. We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor). However, clipping at all is fundamentally
bad. It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs. This will be fixed
later.
=== New schedclk() mechanism === We don't use the NetBSD schedclk()
(now schedclock()) mechanism. We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock. This is more accurate
and flexible.
=== Algorithm change === Same change.
=== Other bugs === The p_pctcpu bug was fixed long ago. We don't try as
hard to abstract functionality yet.
Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().
Agreed with by: dufault
commit to kern_synch.c:
----------------------------
revision 1.55
date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10
Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code
=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)
=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broke.
=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.
=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
----------------------------
The details are a little different in FreeBSD:
=== nice bug === Fixing this is the main point of this commit. We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor). However, clipping at all is fundamentally
bad. It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs. This will be fixed
later.
=== New schedclk() mechanism === We don't use the NetBSD schedclk()
(now schedclock()) mechanism. We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock. This is more accurate
and flexible.
=== Algorithm change === Same change.
=== Other bugs === The p_pctcpu bug was fixed long ago. We don't try as
hard to abstract functionality yet.
Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().
Agreed with by: dufault
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).
Agreed with in principle by: dufault
Add MD_ROOT and MD_ROOT_SIZE options to the md driver.
Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.
Add md driver to GENERIC, PCCARD and LINT.
This is a cleanup which removes the need for some of the worse hacks in
MFS: We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one. md is a true device, so it is less trouble.
This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well. If people have no other use for MFS, it can be removed from
the kernel.
with NetBSD and the Single Unix Specification v2.
This updates some structures with other, almost equivalent types and
effort is under way to get the whole more consistent.
Also removes a double definition of INET6 and some other clean-ups.
Reviewed by: green, bde, phk
Some part obtained from: NetBSD, SUSv2 specification
parameter a char ** instead of a const char **. This make these
kernel routines consistent with the corresponding libc userland
routines.
Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.
Reviewed by (in concept): phk, bde
for IPv6 yet)
With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.
Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.
Replace procfs.h:CHECKIO() macros with calls to p_trespass().
Only show command lines to process which can trespass on the target
process.
I've made a seperate version (c_index() etc) that use const/const, but
I'm not sure it's worth it considering there is one file in the tree
that uses index on const strings (kern_linker.c) and it's easily adjusted
to scan the strings directly (and is perhaps more efficient that way).
linked list to store the callbak routines. The patch converts the
lists to queue(3) TAILQs, making the code slightly clearer and ensuring
that callbacks are executed in FIFO order.
Man page also updated as necesary.
(discontinued use of M_TEMP malloc type while here anyway /phk)
Submitted by: Jake Burkholder jake@checker.org
PR: 14912
returned to user mode in the spare fields of the stat structure.
PR: kern/14966
Reviewed by: dillon@freebsd.org
Submitted by: Kelly Yancey kbyanc@posi.net
This fixes some nasty procfs problems for SMP, makes ps(1) run much faster,
and makes ps(1) even less dependent on /proc which will aid chroot and
jails alike.
To disable this facility and revert to previous behaviour:
sysctl -w kern.ps_arg_cache_limit=0
For full details see the current@FreeBSD.org mail-archives.
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.
Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.
This batch of changes compile to the same object files.
Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914
All Makefiles now use MACHINE_ARCH for the target architecture.
Unification is required for cross-building.
Tags added to:
sys/boot/Makefile
sys/boot/arc/loader/Makefile
sys/kern/Makefile
usr.bin/cpp/Makefile
usr.bin/gcore/Makefile
usr.bin/truss/Makefile
usr.bin/gcore/Makefile:
fixed typo: MACHINDE -> MACHINE_ARCH
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.
Unify spec_open() for bdev and cdev cases.
Remove the disabled bdev specific read/write code.
devsw_module_handler() indirectly and not use the chain arguments. To
eliminate this indirection via that function (which does nothing now)
without duplicating a modevent handler into all the routines that don't
presently have one, supply a NOP (do nothing, return OK) routine which
is functionally equivalent to what's there now. This is a hack and is
still wrong, because there doesn't appear to be anything to reclaim
resources on an unload of a module with one of these in it. I'm not
sure whether to make the NOP handler refuse a MOD_UNLOAD event or what.
We currently only search SCSI and IDE CDROMs; if there's felt to be a
need for supporting the very old and rare soundcard etc. drives for this
application they can be trivially added.
In order to achieve this, root filesystem mount is moved from
SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit
group. Now, modules which wish to usurp the default root mount
can use SI_ORDER_FIRST.
A compiled-in or preloaded MFS filesystem will become the root
filesystem unless the vfs.root.mountfrom environment variable refers
to a valid bootable device. This will normally only be the case when
the kernel and MFS image have been loaded from a disk which has a
valid /etc/fstab file. In this case, the variable should be manually
overridden in the loader, or the kernel booted with -a. In either
case "mfs:" should be supplied as the new value.
Also fix a typo in one DFLTROOT case that would not have compiled.
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.
A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.
These changes drastically simplify the machine-dependant parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.
be ignored by default by the df(1) program. This is used mostly to
avoid stat()-ing entries that do not represent "real" disk mount
points (such as those made by an automounter such as amd.) It is
also useful not to have to stat() these entries because it takes
longer to report them that for other file systems, being that these
mount points are served by a user-level file server and resulting in
several context switches. Worse, if the automounter is down
unexpectedly, a causal df(1) will hang in an interruptible way.
PR: kern/9764
Submitted by: Erez Zadok <ezk@cs.columbia.edu>
"rw" argument, rather than hijacking B_{READ|WRITE}.
Fix two bugs (physio & cam) resulting by the confusion caused by this.
Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.
This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
slightly older version of this code was tested by BDE and I.
Also fixes a lockup situation when kva gets too fragmented.
Remove the maxvmiobufspace variable and sysctl, they are no longer
used. Also cleanup (remove) #if 0 sections from prior commits.
This code is more of a hack, but presumably the whole buffer cache
implementation is going to be rewritten in the next year so it's no
big deal.
resource_list_release. This removes the dependancy on the
layout of ivars.
* Move set_resource, get_resource and delete_resource from
isa_if.m to bus_if.m.
* Simplify driver code by providing wrappers to those methods:
bus_set_resource(dev, type, rid, start, count);
bus_get_resource(dev, type, rid, startp, countp);
bus_get_resource_start(dev, type, rid);
bus_get_resource_count(dev, type, rid);
bus_delete_resource(dev, type, rid);
* Delete isa_get_rsrc and use bus_get_resource_start instead.
* Fix a stupid typo in isa_alloc_resource reported by Takahashi
Yoshihiro <nyan@FreeBSD.org>.
* Print a diagnostic message if we can't assign resources to a PnP
device.
* Change device_print_prettyname() so that it doesn't print
"(no driver assigned)-1" for anonymous devices.
can provide the correct context to each signal handler.
Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).
Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.
Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.
Reviewed by: marcel, jdp, bde