Commit graph

5003 commits

Author SHA1 Message Date
Jung-uk Kim 3a33908404 Regen for set_thread_area. 2007-03-30 00:08:21 +00:00
Jung-uk Kim 9c5b213e51 MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.
Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by:	emulation
2007-03-30 00:06:21 +00:00
Julian Elischer 6734f35eac Implement the openat() linux syscall
Submitted by:	Roman Divacky (rdivacky@)
MFC after:	2 weeks
2007-03-29 02:11:46 +00:00
Kris Kennaway 67eae018cb Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by:	jhb
2007-03-26 21:45:44 +00:00
Nate Lawson 0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Jung-uk Kim 2be4e4713a Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
John Baldwin d66ff27773 Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first.  nyan@ has already fixed all of the PC-98
drivers.
2007-03-21 15:36:38 +00:00
John Baldwin b8783b00f8 Add a new apic0 psuedo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system.  Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
2007-03-20 21:53:31 +00:00
John Baldwin 95a07592ee Add a new ram0 pseudo-device that claims memory resouces for physical
addresses corresponding to system RAM.  On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions.  On i386 ram0 uses the
dump_avail[] array.  Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.
2007-03-20 21:08:39 +00:00
Jung-uk Kim 2498f259d4 - Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capticalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.
2007-03-20 20:22:45 +00:00
John Baldwin ce533e82a2 Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related psuedo-devices are added at an order of 5.  acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.
2007-03-20 20:21:44 +00:00
John Baldwin 86f07bb052 MFi386 1.173: Display two new Intel feature bits. 2007-03-20 18:48:04 +00:00
Jung-uk Kim ab5916a526 Add another CPUID for AMD CPUs and fix style(9) while I am here. 2007-03-12 20:27:21 +00:00
Alan Cox c640357f04 Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
Alan Cox a2a90145e8 Completely eliminate "avail_start". It serves no useful purpose. 2007-03-10 20:26:43 +00:00
John Baldwin 07bb813626 Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf.  Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.
2007-03-09 15:49:57 +00:00
Mohan Srinivasan f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
Scott Long 94e6bc303f Don't increment total_bounced when doing no-op dmamap_sync ops. 2007-03-06 18:28:43 +00:00
John Baldwin 4c5bec1161 Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.
2007-03-06 17:16:47 +00:00
Alan Cox 8da3fc95a7 Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor.  This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks
2007-03-05 21:40:10 +00:00
John Baldwin aa7a005ee0 Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().
2007-03-05 20:35:17 +00:00
John Baldwin db181dc741 Add a simple device driver to "eat" any I/O APICs that show up as PCI
devices.

MFC after:	1 week
2007-03-05 16:22:49 +00:00
Jung-uk Kim a4e3bad794 MFP4: 115220, 115222
- Fix style(9) and reduce diff between amd64 and i386.
- Prefix Linuxulator macros with LINUX_ to prevent future collision.
2007-03-02 00:08:47 +00:00
Jung-uk Kim 6a5964d385 MFP4: 115094
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.

Reported by:	Scot Hetzel (swhetzel at gmail dot com)
		netchild
2007-02-27 02:08:01 +00:00
Alexander Leidinger 802e08a360 Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by:	"Scot Hetzel" <swhetzel@gmail.com>
2007-02-24 16:49:25 +00:00
John Baldwin 860d8e2312 Use ih_filter instead of ih_handler in a couple of places. This fixes
most INTR_FAST handlers on i386.

Reviewed by:	piso
2007-02-23 20:03:24 +00:00
Paolo Pisati ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Konstantin Belousov e277569ee2 MFi386 rev. 1.544 of i386/i386/pmap.c:
Rounding addr upwards to next 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.

Reported and tested by:	kris
Suggested by:	alc
MFC after:	1 week
2007-02-19 10:55:16 +00:00
Alan Cox ae0663a383 Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.
2007-02-18 06:33:02 +00:00
John Baldwin 9be403be00 Add bootverbose printfs to indicate which IDT vectors are assigned to MSI
interrupts.
2007-02-15 22:22:57 +00:00
Jung-uk Kim 4d93342633 Fix accidental removal of an empty line from the previous commit. 2007-02-15 01:20:43 +00:00
Jung-uk Kim da351821f7 Regen. 2007-02-15 01:15:31 +00:00
Jung-uk Kim 1e5ed8c1c2 MFP4: 113033
Port iopl(2) from i386.  This fixes LTP iopl01 and iopl02 on amd64.
2007-02-15 01:13:36 +00:00
Jung-uk Kim 10931a467a MFP4: 113025, 113146, 113177, 113203, 113500, 113546, 113570
- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files.  This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.
2007-02-15 00:54:40 +00:00
Brooks Davis 983f970981 Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by:	pjd
2007-02-09 19:03:18 +00:00
John Baldwin 71ddf30bd2 Don't send interrupts to CPUs disabled via lapic hints.
Reported by:	Ludger Bolmerg <lbolmerg ! web.de>
MFC after:	3 days
Pointy hat to:	jhb
2007-02-08 16:49:59 +00:00
Marcel Moolenaar 1d3aed33e8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00
Bruce Evans 1300fd67f3 Fixed some style bugs. Routine except:
- don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard
  FreeBSD implementaion detail which has nothing to do with GNUC.
2007-02-06 18:04:02 +00:00
Bruce Evans 3764a82377 Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary
variable to avoid invalid constraints in dead code.  Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.

Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only.  It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too.  This seems to work right even for -O0, but the -O0 case
has not been completely tested.

This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp).  For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files.  This was only tested for -O.

This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)

gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.
2007-02-06 16:21:09 +00:00
John Baldwin c632517124 Change GDB_BUFSZ to be large enough to hold a register dump where each
register takes 16 characters (64-bit register in hex).  In practice this
is a slight bit of overkill as 7 of the 56 registers are only 32-bit, but
having the buffer too small results in remote kgdb trashing kernel memory
when it connects.

PR:		amd64/108673
Submitted by:	Ravi Murty, Nikhil Rao @ Intel
MFC after:	3 days
2007-02-05 21:48:32 +00:00
Konstantin Belousov d0b2365eec Introduce some more SO_ option equivalents from Linux to FreeBSD.
The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky
2007-02-01 13:36:19 +00:00
Konstantin Belousov a9ccaccfc3 Fix LOR that occurs because proctree_lock was acquired while holding
emuldata lock by moving the code upwards outside the emul_lock coverage.

Submitted by: rdivacky
2007-02-01 13:27:52 +00:00
Konstantin Belousov 84fbdf86b3 MFi386: Use LINUX_SIG_VALID macro.
Submitted by: rdivacky
2007-02-01 13:24:40 +00:00
Joseph Koshy 20c71e39c3 Use a known good stack at the time of servicing an NMI --- reuse
the space allocated for the double fault handler since this space
is otherwise unused till the time a double fault occurs.

This change should have been committed alongside r1.127 of
"exception.S", but I somehow missed doing so.

Problem reported by:	jeff
Pointy hat to:		jkoshy
2007-01-27 18:13:24 +00:00
Jeff Roberson f0393f063a - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
Jeff Roberson 3c93ca7d2f - Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.
2007-01-23 08:38:39 +00:00
Bruce Evans 71799af2d5 Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize.  There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work.  Split it up a bit more and do the first part as
late as possible to document the necessary order.  The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split.  Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug.  The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
2007-01-23 08:01:20 +00:00
John Baldwin 5fe82bca57 Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
  MSI or MSI-X messages.  MSI requires allocating powerof2() messages for
  example where MSI-X does not.  To address this, split out the MSI-X
  support from pci_msi_count() and pci_alloc_msi() into new driver-visible
  functions pci_msix_count() and pci_alloc_msix().  As a result,
  pci_msi_count() now just returns a count of the max supported MSI
  messages for the device, and pci_alloc_msi() only tries to allocate MSI
  messages.  To get a count of the max supported MSI-X messages, use
  pci_msix_count().  To allocate MSI-X messages, use pci_alloc_msix().
  pci_release_msi() still handles both MSI and MSI-X messages, however.
  As a result of this change, drivers using the existing API will only
  use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
  values (and thus does not require all of the messages to have their
  MD vectors allocated as a group), some devices allow for "sparse" use
  of MSI-X message slots.  For example, if a device supports 8 messages
  but the OS is only able to allocate 2 messages, the device may make the
  best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
  than default of using the first N slots (or indicies) at 1 and 2.  To
  support this, add a new pci_remap_msix() function that a driver may call
  after a successful pci_alloc_msix() (but before allocating any of the
  SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
  assigned to different message indices.  For example, from the earlier
  example, after pci_alloc_msix() returned a value of 2, the driver would
  call pci_remap_msix() passing in array of integers { 1, 4 } as the
  new message indices to use.  The rid's for the SYS_RES_IRQ resources
  will always match the message indices.  Thus, after the call to
  pci_remap_msix() the driver would be able to access the first message
  in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
  SYS_RES_IRQ rid 4.  Note that the message slots/indices are 1-based
  rather than 0-based so that they will always correspond to the rid
  values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
  To support this API, a new PCIB_REMAP_MSIX() method was added to the
  pcib interface to change the message index for a single IRQ.

Tested by:	scottl
2007-01-22 21:48:44 +00:00
Alexander Leidinger d071f5048c MFp4 (113077, 113083, 113103, 113124, 113097):
Dont expose em->shared to the outside world before its properly
	initialized. Might not affect anything but its at least a better
	coding style.

	Dont expose em via p->p_emuldata until its properly initialized.
	This also enables us to get rid of some locking and simplify the
	code because we are workin on a local copy.

	In linux_fork and linux_vfork create the process in stopped state
	to be sure that the new process runs with fully initialized emuldata
	structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
	race that could result in never woken up process [2].

Reported by:	Scot Hetzel	[1]
Suggested by:	jhb		[2]
Reviewed by:	jhb (at least some important parts)
Submitted by:	rdivacky
Tested by:	Scot Hetzel (on amd64)

Change 2 comments (in the new code) to comply to style(9).

Suggested by:	jhb
2007-01-20 14:58:59 +00:00
Craig Rodrigues 5a09873361 Revert previous change.
Requested by:	kan
2007-01-18 05:46:32 +00:00
Craig Rodrigues e76c6d8cd3 Forward declare __pcpu as a pointer type instead of an array type to
eliminate GCC 4.1 error: "array type has incomplete element type".
2007-01-18 02:00:04 +00:00
Alexander Leidinger 973ac082f8 MFp4 (112893):
Make linux_vfork() actually work. This enables make to work again with 2.6.
It also fixes the LTP vfork tests.

Submitted by:	rdivacky
2007-01-14 16:20:37 +00:00
Warner Losh fed32d7544 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
John Baldwin cad688f011 Remove magic from rman_activate_resource() that uses the direct map at
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.

MFC after:	2 weeks
2007-01-11 19:40:19 +00:00
Jeff Roberson b31e373bf7 - Use the correct test in the ipi bitmask handler for IPI_PREEMPT so that
we actually issue preemptions.
 - Remove the #ifdef IPI_PREEMPTION so it is always compiled in.  Leave
   the option which optionally enables support in sched_4bsd.  sched_ule.c
   will soon use this functionality as a run time rather than compile time
   option.
 - Compare against the idlethread rather than the priority.  There are some
   idle prio tasks that we can preempt.

Discussed with:	ups
Tested on:	i386, amd64
2007-01-11 00:17:02 +00:00
Jung-uk Kim 5efc6c44ff Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors. 2007-01-09 19:23:22 +00:00
Alexander Leidinger 1c65504ca8 MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by:	rdivacky
2007-01-07 19:00:38 +00:00
Alexander Leidinger 4f383e20a9 MFi386 rev 1.56:
Bring the linux mmap code more into line with how linux (2.4.x) behaves.

Tested by:	Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC

Additionally to the i386 version always use PROT_EXEC in the mapping like the
previous version of the amd64 code did. We need to examinate this further to
decide what the right thing to do is. For now this fixes several problems in
the LTP test runs and should behave regarding PROT_EXEC like before.
2007-01-06 15:58:34 +00:00
Alexander Leidinger 99e9dcf022 regen after addition of linux_utimes and linux_rt_sigtimedwait 2006-12-31 13:20:31 +00:00
Alexander Leidinger c9447c7551 MFp4 (111746, 108671, 108945, 112352):
- add linux utimes syscall [1]
 - add linux rt_sigtimedwait syscall [2]

Submitted by:	"Scot Hetzel" <swhetzel@gmail.com> [1]
Submitted by:	Bruce Becker <hostmaster@whois.gts.net> [2]
PR:		93199 [2]
2006-12-31 13:16:00 +00:00
Bruce Evans f28e1c8f99 Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').
2006-12-29 15:29:49 +00:00
Bruce Evans 6c296ffa81 Fixed some style bugs (whitespace only). 2006-12-29 14:28:23 +00:00
Bruce Evans 7e4277e591 Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED.  The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors.  The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronismns in
sys/i386/i386/support.s.
2006-12-29 13:36:26 +00:00
Robert Watson e9e1341c06 Regenerate. 2006-12-29 01:17:09 +00:00
Robert Watson a46b391df7 Assign or clean up audit identifiers for a number of additional Linux
system calls on the amd64 architecture.

Some minor white space tweaks for consistency with other syscalls.master
files.

Obtained from:	TrustedBSD Project
2006-12-29 01:17:02 +00:00
Bruce Evans 276c702d8d Removed gratuitous cosmetic differences with the i386 version. This
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs.  These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them.  The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers.  I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).
2006-12-28 08:15:14 +00:00
Bruce Evans 26ab2d1d23 Avoid an instruction in atomic_cmpset_{int_long)() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%.  This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register.  We converted  to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax.  Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes;
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
  %al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
  for the main output operand, although this is not required.  The input
  and output operands that use the "a" constraint are now decoupled, and
  this makes things clearer except for the reason that the output register
  is hard-coded.  It is now just a hack to tell gcc that the input "a" has
  been clobbered without increasing the number of operands.
2006-12-27 20:26:00 +00:00
David Xu 4b0f4e9d9e Fix a panic when rebooting a SMP machine, when option STOP_NMI is used,
nmi handler is used to stop other processors, nmi hander calls trap(),
however, trap() now accepts a pointer rather than a reference, this was
changed by kmacy@.
2006-12-23 03:30:50 +00:00
Jung-uk Kim 77424f4177 MFP4: 109655
- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.
2006-12-20 20:17:35 +00:00
David Xu 4e32b7b3cc Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used a spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
Kip Macy 1726d94f4e Evidently neither GENERIC nor kan's config had isa in it :-0. As
Doug Barton says, "embrace the LINT".
2006-12-17 21:51:44 +00:00
Kip Macy e5f8d4099d Newer versions of gcc don't support treating structures passed by value
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.

Reviewed by: kan
Tested by: kan
2006-12-17 06:48:40 +00:00
Pyun YongHyeon 1f90cf9895 Add msk(4) to the list of drivers supported by GENERIC kernel. 2006-12-13 03:41:47 +00:00
John Baldwin 8964299ac8 Give Host-PCI bridge drivers their own pcib_alloc_msi() and
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
2006-12-12 19:27:01 +00:00
John Baldwin fde45e231a Sort function prototypes. 2006-12-12 19:24:45 +00:00
John Baldwin c304531851 Add a function to return the MD interrupt source cookie associated with
an interrupt event.  Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.
2006-12-12 19:20:19 +00:00
Maxim Sobolev efa43a53bd Allow machdep.cpu_idle_hlt to be set from the loader. This should allow
to workaround the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.

MFC after:	3 days
2006-12-06 18:27:17 +00:00
Julian Elischer ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Ruslan Ermilov 3cbc967ef7 Use a different bitmask for superpages' base address so that it
doesn't conflict with the PG_PDE_PAT bit.  (We still don't mask
off all the reserved bits but that's okay for now.)

Reviewed by:	alc
2006-12-05 11:31:33 +00:00
Alexander Leidinger 786e4fc47d MFP4 (110939):
MFi386: return EOPNOTSUPP for unknown module events.

Submitted by:	rdivacky
2006-12-03 21:06:07 +00:00
Alexander Leidinger 43d9d89b3f Sync with i386 (remove the LINUX stuff) now that the module is usable. 2006-12-03 21:02:09 +00:00
Bruce Evans b73057227b Optimized RTC accesses by avoiding null writes to the index register
and by only delaying when an RTC register is written to.  The delay
after writing to the data register is now not just a workaround.

This reduces the number of ISA accesses in the usual case from 4 to
1.  The usual case is 2 rtcin()'s for each RTC interrupt.  The index
register is almost always RTC_INTR for this.  The 3 extra ISA accesses
were 1 for writing the index and 2 for delays.  Some delays are needed
in theory, but in practice they now just slow down slow accesses some
more since almost eveyone including us does them wrong so modern systems
enforce sufficient delays in hardware.  I used to have the delays ifdefed
out, but with the index register optimization the delays are rarely
executed so the old magic ones can be kept or even implemented non-
magically without significant cost.

Optimizing RTC interrupt handling is more interesting than it used to
be because RTC interrupts are currently needed to fix the more efficient
apic timer interrupts on some systems.  apic_timer_hz is normally 2000
so the RTC interrupt rate needs to be 2048 to keep the apic timer
firing on such systems.  Without these changes, each RTC interrupt
normally took 10 ISA accesses (2 PIC accesses and 2 sets of 4 RTC
accesses).  Each ISA access takes 1-1.5uS so 10 of then at 2048 Hz
takes 2-3% of a CPU.  Now 4 of them take 0.8-1.2% of a CPU.
2006-12-03 03:49:28 +00:00
John Birrell e0b651251d Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
2006-11-30 04:17:05 +00:00
Ruslan Ermilov 34028cf7d1 Differentiate between data and instruction fetch in the fatal
page fault trap handler.

Reviewed by:	alc
2006-11-28 20:04:00 +00:00
Ruslan Ermilov ca830b9a74 Use a define instead of a "magic" value. 2006-11-23 21:37:04 +00:00
Ruslan Ermilov 7b0381568e Finish the PG_NX support at the pmap level.
Reviewed by:	alc
2006-11-23 21:36:02 +00:00
Ruslan Ermilov f27eb21694 It's been possible to build linprocfs as a module for some time now.
Submitted by:	rdivacky
2006-11-22 10:34:12 +00:00
Alan Cox da44960498 The global variable avail_end is redundant and only used once. Eliminate
it.  Make avail_start static to the pmap on amd64.  (It no longer exists
on other architectures.)
2006-11-19 20:54:58 +00:00
John Baldwin 81efc3d94c Add support for 8 byte hardware watches in long mode. Kernel hardware
watches support 8 byte watches.  For userland, we disallow 8 byte watches
for 32-bit tasks.
2006-11-17 20:27:01 +00:00
John Baldwin 7693afca4e - Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
  than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.
2006-11-17 19:20:32 +00:00
John Baldwin 5527d3ed75 Trim some noise from bootverbose:
- Drop the printf in intr_machdep.c when we assign an interrupt souce to
  a CPU.  Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
  has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
  ioapic pin, just return success without printing anything if the new
  value matches the current one.

MFC after:	2 weeks
2006-11-17 16:41:03 +00:00
John Baldwin 5d346a567c A few more style fixes. 2006-11-17 16:37:35 +00:00
John Baldwin 71f4007710 Various whitespace and style fixes. 2006-11-15 19:53:48 +00:00
John Baldwin 15f266289d Fix a typo that broke MSI (MSI-X worked fine) in the later revisions of
the MSI patches.
2006-11-15 18:40:00 +00:00
John Baldwin 4184900911 MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
  to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
  This function is used to allocate IDT vectors for a group of MSI
  messages.
- Add MSI and MSI-X PICs.  The PIC code here provides methods to manage
  edge-triggered MSI messages as x86 interrupt sources.  In addition to
  the PIC methods, msi.c also includes methods to allocate and release
  MSI and MSI-X messages.  For x86, we allow for up to 128 different
  MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
  16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
  drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
  ask the MSI PIC code to allocate resources and IDT vectors.

MFC after:	2 months
2006-11-13 22:23:34 +00:00
John Baldwin 818b0b4bdf Various fixes:
- Remove an extra entry from the array for 0x0f prefixed instruction groups.
  This fixes decoding of instructions where the second opcode >= 0x80.
- Add support for the 64-bit immediate mov instructions.
- When short_addr is enabled, don't parse the modr/m byte for a 16-bit
  address, but as a 32-bit address.
- Support %rip relative addressing.
- Don't print a displacement of 0 if there is a base or index register.

MFC after:	3 days
2006-11-13 21:14:54 +00:00
Ruslan Ermilov d77f5882e7 Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (16M kernel would require
additionally 8 page tables).
2006-11-13 20:33:54 +00:00
Ruslan Ermilov 26af9ac7d0 Fix a comment. 2006-11-13 06:26:57 +00:00
Alan Cox 44b8bd66f9 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
Ruslan Ermilov 9f70620442 Regen.
Forgotten by:	trhodes
2006-11-11 21:49:08 +00:00
Ruslan Ermilov 7eae4829bf Spelling. 2006-11-07 21:57:18 +00:00
Ruslan Ermilov 81490cbe6f Line up memory amount reporting that got broken when s/real/usable/. 2006-11-07 21:55:39 +00:00
John Baldwin 6ddd7e6a5a Add a new 'union l_sigval' to use in place of 'union sigval' in the
linux siginfo structure.  l_sigval uses a l_uintptr_t for sival_ptr so
that sival_ptr is the right size for linux32 on amd64.  Since no code
currently uses 'lsi_ptr' this is just a cosmetic nit rather than a bug
fix.
2006-11-07 18:53:49 +00:00
John Baldwin 3900a3be21 Remove duplicate IDTVEC macro definition, it's already defined in
<machine/intr_machdep.h>.
2006-11-07 18:46:33 +00:00
Robert Watson acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
John Birrell 8391a99bf7 Remove the KDTRACE option again because of the complaints about having
it as a default.

For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.

By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is the way I intended it to be.

Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.

I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.
2006-11-04 23:50:12 +00:00
John Birrell 1f80cd9398 Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTtrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.
2006-11-04 04:58:10 +00:00
John Birrell 3d068827c2 Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by:	scottl, jhb
MFC after:	2 weeks
2006-11-01 04:54:51 +00:00
Konstantin Belousov d4d2a400e4 Fix a typo resulting in truncated linux32 signal trampoline code copied
to the usermode. Usually, signal handler segfaulted on return.

Reviewed by:	jhb
MFC after:	3 days
2006-10-31 17:53:02 +00:00
Alexander Leidinger 96ed72ac81 regen after linux_io_* backout 2006-10-29 14:12:44 +00:00
Alexander Leidinger 3680a41902 Backout the linux aio stuff. Several problems where identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.

Requested by:	rwatson
Discussed with:	rwatson
2006-10-29 14:02:39 +00:00
Bruce Evans 6a70163fcc Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time.  Just don't default to using
the TSC if there are multiple active CPUs.  Also, don't default to
using the TSC if it is broken.  SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU.  The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range.  Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.
2006-10-29 09:48:44 +00:00
Bruce Evans 91b4d1bfc2 In the userland .mcount():
- Don't use a frame pointer.  Our callers need a frame pointer, but we
  could only use one to support things that aren't supported.  (These
  things are:
  - profiling of profiling
  - debugging of profiling.  The core ENTRY() macro doesn't support
    forcing a frame pointer for debugging, so don't do more here.)
- Ensure that we are in the text section and have normal alignment.
- Use the normal syntax for `.type'.
2006-10-28 13:12:06 +00:00
Alexander Leidinger c1ea90bfd3 regen (prctl addition) 2006-10-28 11:24:38 +00:00
Bruce Evans 43f0ea0a27 i386/include/profile.h:
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36.  Apparently, this case has never been reached even by lint.

Submitted by:	stefanf

{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.
2006-10-28 11:03:03 +00:00
Alexander Leidinger 955d762aca MFP4:
Implement prctl().

Submitted by:	rdivacky
Tested with:	LTP
2006-10-28 10:59:59 +00:00
Bruce Evans 853b92dacf In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were
still using the global label named "profil", and this worked accidentally
because all callers use the same name.
2006-10-28 07:59:11 +00:00
Bruce Evans 3a110062fd Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.
2006-10-28 06:38:51 +00:00
Bruce Evans 94450a83e8 Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities.  It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME().  CNAME() is easy enough to use in pure
  asm code, but using it in inline asm requires messy quoting.  The
  core pure asm code has been hacked on more and all uses of CNAME() in
  it have already gone away.  Just assume the elf convention here too.
- Removed now-uneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
2006-10-28 06:04:29 +00:00
Bruce Evans 447647908c Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
2006-10-27 14:17:50 +00:00
John Birrell 3750d1ecad Remove the KSE option now that it's in DEFAULTS on these arches/machines.
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.

Submitted by:	scottl@
2006-10-26 22:11:35 +00:00
John Birrell 013d6d8cb4 Add 'options KSE' to the kernel config DEFAULTS on all arches/machines
except sun4v.

This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.

Submitted by:	scottl@
2006-10-26 22:05:25 +00:00
John Birrell 8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Ruslan Ermilov 837f167eb2 Move "device splash" back to MI NOTES and "files", it's MI. 2006-10-23 13:23:14 +00:00
Alan Cox 43200cd3ed Eliminate unnecessary PG_BUSY tests. 2006-10-22 04:18:01 +00:00
Ruslan Ermilov 7971a9bc04 MFi386: 1.13: Fix booting with ps2 keyboards. 2006-10-21 12:52:46 +00:00
Dag-Erling Smørgrav c43ac89acc Move more MD devices and options out of MI NOTES. 2006-10-20 09:52:27 +00:00
Bruce Evans 045f738b58 Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among of the least
interesting special registers (far behind %cr3).  The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").
2006-10-20 09:44:21 +00:00
Dag-Erling Smørgrav c276283866 The VGA_DEBUG option only exists on {amd64,i386,ia64}.
Also remove 'device io' from amd64 NOTES; DEFAULTS takes care of it.
2006-10-20 08:56:26 +00:00
Warner Losh e54ad0a189 Remove references to pccard.conf 2006-10-19 05:17:55 +00:00
David Xu 5f641fc0fb o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
  umtx and wait channel.
o Rename casuptr to casuword.
2006-10-17 02:24:47 +00:00
John Baldwin b85360078a Add one more include to fix the case of !DDB and !atpic. 2006-10-16 21:40:46 +00:00
Hiroki Sato b84baf83b9 Add a newline to the printf().
Spotted by:	Peter Carah <pete@altadena.net>
MFC after:	3 days
2006-10-15 16:52:59 +00:00
Alexander Leidinger 95f2da66d3 regen (linux AIO stuff) 2006-10-15 14:24:10 +00:00
Alexander Leidinger 6a1162d4cd MFP4 (with some minor changes):
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
   "context"), but FreeBSD creates only one single AIO queue per process.
   My code maintains a request queue (STAILQ of queue(3)) per "context",
   and throws all AIO requests of all contexts owned by a process into
   the single FreeBSD per-process AIO queue.

   When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
   io_cancel(2), my code can pick out requests owned by the specified context
   from the single FreeBSD per-process AIO queue according to the per-context
   request queues maintained by my code.

2. The request queue maintained by my code stores contrast information between
   Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
   (struct aiocb). FreeBSD IO control block actually exists in userland memory
   space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
   needs to use Linux-specific "struct aio_ring", which is a partial mirror
   of context in user space. I would rather take the address of context in
   kernel as the context ID, but the io_getevents() of libaio forces me to
   take the address of the "ring" in user space as the context ID.

   To my surprise, one comment line in the file "io_getevents.c" of
   libaio-0.3.105 reads:

             Ben will hate me for this

REFERENCE:

1. Linux kernel source code:   http://www.kernel.org/pub/linux/kernel/v2.6/
   (include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages:         http://www.kernel.org/pub/linux/docs/manpages/
   (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort:   http://lse.sourceforge.net/io/aio.html
   The design notes:           http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
       http://rpmfind.net/linux/rpm2html/search.php?query=libaio
   Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle:              http://oss.oracle.com/projects/libaio-oracle/
   POSIX AIO implementation based on Linux AIO system calls (depending on
   libaio).
---snip---

Submitted by:	Li, Xiao <intron@intron.ac>
2006-10-15 14:22:14 +00:00
Alexander Leidinger 0a62e03542 MFP4 (106538 + 106541):
Implement CLONE_VFORK. This fixes the clone05 LTP test.

Submitted by:	rdivacky
2006-10-15 13:39:40 +00:00
Alexander Leidinger 2482245b0c Revert my previous commit, I mismerged this to the wrong place.
Pointy hat to:	netchild
2006-10-15 13:30:45 +00:00
Alexander Leidinger 21aed094a9 MFP4 (106541): Fix the clone05 test in the LTP.
Submitted by:	rdivacky
2006-10-15 13:25:23 +00:00
Alexander Leidinger 4b3583a354 MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64.
Submitted by:	rdivacky	[1]
2006-10-15 13:22:14 +00:00
John Baldwin d3998dcf2e Move the 2 additional #includes down into the #ifndef DEV_ATPIC section. 2006-10-13 17:31:57 +00:00
John Birrell e70cbcb5ba Attempt to fix the GENERIC kernel build which has been failing on
tinderbox for a couple of days.
2006-10-13 04:53:22 +00:00
John Baldwin 5d54487ef2 Fix nodevice atpic compile.
Pointy hat to:	jhb
2006-10-12 12:48:21 +00:00
John Baldwin 520ffff83e Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources.  This allows interrupt controllers
with no interrupt pics (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
  included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
  on resume.  This gets suspend/resume working with APIC on UP systems.
  SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by:	anholt (i386)
Submitted by:	Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after:	1 week
2006-10-10 23:23:12 +00:00
John Baldwin 6e20fe33ba Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT.
PR:		kern/99870
Submitted by:	jkim
MFC after:	3 days
2006-10-10 19:26:35 +00:00
Simon L. B. Nielsen 4517aab293 - Remove SCHED_ULE from GENERIC to better avoid foot-shooting by
unsuspecting users.
- Add a comment in NOTES about experimental status of SCHED_ULE.
- Make warning about experimental status in sched_ule(4) a bit
  stronger.

Suggested and reviewed by:	dougb
Discussed on:			developers
MFC after:			3 days
2006-10-05 20:31:58 +00:00
David Xu c6511aea86 Move some declaration of 32-bit signal structures into file
freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.
2006-10-05 01:56:11 +00:00
John Birrell 6825d60738 PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
Poul-Henning Kamp e5037a18a9 Use utc_offset() where applicable, and hide the internals of it
as static variables.
2006-10-02 18:23:37 +00:00
Poul-Henning Kamp b69f71eb29 Second part of a little cleanup in the calendar/timezone/RTC handling.
Split subr_clock.c in two parts (by repo-copy):
   subr_clock.c contains generic RTC and calendaric stuff. etc.
   subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c.  They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.
2006-10-02 15:42:02 +00:00
Poul-Henning Kamp f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Maxim Sobolev 2c473eaf67 Extend comment explaining why code is conditional at !defined(SCHED_ULE).
Suggested by:	ru
2006-09-27 22:09:35 +00:00
Maxim Sobolev 6e93c19e3d Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in ULE case. Should be
probably fixed in ULE code instead, but we have no real maintainer for
ULE to do it.

PR:		103697
2006-09-27 18:51:19 +00:00
Ruslan Ermilov 6c9fdda750 Added COMPAT_FREEBSD6 option. 2006-09-26 12:36:34 +00:00
David Xu 07a8ebcc75 Stop reloading %fs and %gs, since it causes the base address from
GDT to be loaded into FS.base and GS.base, these values of course
are not the values set by sysarch() with I386_SET_FSBASE and
I386_SET_GSBASE, the change fixed a crash for 32bit libthr after
signal handler returned and normal code is accessing thread pointer,
for example: movl %gs:8, %eax.
2006-09-23 13:42:09 +00:00
John Baldwin d72a078647 Update the ipmi(4) driver:
- Split out the communication protocols into their own files and use
  a couple of function pointers in the softc that the commuication
  protocols setup in their own attach routine.
- Add support for the SSIF interface (talking to IPMI over SMBus).
- Add an ACPI attachment.
- Add a PCI attachment that attaches to devices with the IPMI interface
  subclass.
- Split the ISA attachment out into its own file: ipmi_isa.c.
- Change the code to probe the SMBIOS table for an IPMI entry to just use
  pmap_mapbios() to map the table in rather than trying to setup a fake
  resource on an isa device and then activating the resource to map in the
  table.
- Make bus attachments leaner by adding attach functions for each
  communication interface (ipmi_kcs_attach(), ipmi_smic_attach(), etc.)
  that setup per-interface data.
- Formalize the model used by the driver to handle requests by adding an
  explicit struct ipmi_request object that holds the state of a given
  request and reply for the entire lifetime of the request.  By bundling
  the request into an object, it is easier to add retry logic to the various
  communication backends (as well as eventually support BT mode which uses
  a slightly different message format than KCS, SMIC, and SSIF).
- Add a per-softc lock and remove D_NEEDGIANT as the driver is now MPSAFE.
- Add 32-bit compatibility ioctl shims so you can use a 32-bit ipmitool
  on FreeBSD/amd64.
- Add ipmi(4) to i386 and amd64 NOTES.

Submitted by:	ambrisko (large portions of 2 and 3)
Sponsored by:	IronPort Systems, Yahoo!
MFC after:	6 days
2006-09-22 22:11:29 +00:00
Alexander Kabaev d9cb97ff9d Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and  __builtin_va_start was present in all GCC version 3.1 and
later.
2006-09-21 01:37:02 +00:00
Wojciech A. Koszek dec10b39fd Correct 'interrupt interrupt' -> 'interrupt' in the comment.
Requested by:	jhb
Approved by:	cognet (mentor)
2006-09-20 20:52:11 +00:00
David Xu 103c065406 Make cpu_set_upcall_kse() and cpu_set_user_tls() work for 32bit process. 2006-09-17 14:54:14 +00:00
John Baldwin 884ff1813f Add a new ddb command 'show lapic' to dump details about the local APIC
registers for the current CPU.

MFC after:	3 days
2006-09-11 20:12:42 +00:00
John Baldwin 5c15c7e71d Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30.  I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.

Patience from:	jmg
Pointy hat:	jhb
2006-09-11 20:10:42 +00:00
John Baldwin 9914a8cc7d - Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
  This is needed to cope with BIOSen that split up ports for system devices
  (like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
  x86 nexus drivers to populate the IRQ rman that manually coalesced the
  regions.

MFC after:	1 week
2006-09-11 19:31:52 +00:00
Alexander Leidinger bb59e63f8f Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Suggested by:	jhb
2006-09-09 16:25:25 +00:00
John Baldwin f6c48c1932 Use a single constant to define the sizes of the physmap[], phys_avail[],
and dump_avail[] arrays so they are in sync (previously it was possible
to store more entries in the physmap[] then we could store in phys_avail[],
which was pointless).  While I'm here, bump up the length of these tables
to hold 30 entries on amd64 and 16 on i386.  This allows machines with
fairly fragmented memory maps to boot ok (at least one machine would
not boot FreeBSD/i386 but would boot FreeBSD/amd64 because amd64 allowed
for more fragments).

MFC after:	3 days
2006-09-07 15:03:02 +00:00
Maxim Sobolev e7d33dcbc5 Unbreak in the case when device apic is compiled into non-SMP kernel.
Reported by:	jhay
MFC after:	2 weeks
2006-09-06 22:05:34 +00:00
Maxim Sobolev 23da540855 The FreeBSD by default "disables" hyper-threading cores, by not scheduling
any threads to them. However, it still counts those cores as "active but
permanently idle" when calculating system-wide CPUs statistics. It is
incorrect, since it skews statistics quite a bit and creates real problems
for certain types of applications (monitoring applications for example),
by making them believe that the system does have enough idle CPU resources,
while in fact it does not.

Correct the problem by not calling performance counting routines on "disabled"
cores. The cleaner solution would be to just disable APIC timer interrupts on
those cores completely, but ENOTIME here and it is not clear if the
additional complexity really worth minor performance gain.

Reviewed by:	ssouhlal
Sponsored by:	Sippy Software, Inc.
MFC after:	2 weeks
2006-09-05 17:15:24 +00:00
Alexander Leidinger 4038a816f8 MFi386 parts of rev 1.55 (modulo real MD parts):
- implement CLONE_PARENT semantic
 - lock proc in the currently disabled part of CLONE_THREAD

Submitted by:	rdivacky
2006-08-28 13:09:24 +00:00
David Xu 66e1c26dba Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.
2006-08-28 02:28:15 +00:00
Alexander Leidinger 084556f5d7 regen 2006-08-27 08:58:00 +00:00
Alexander Leidinger 835e506190 Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	Paul Mather <paul@gromit.dlib.vt.edu>
2006-08-27 08:56:54 +00:00
Alexander Leidinger 40f734dd0d Emulate what vfork does instead of using it in linux_vfork. This way
we can do the stuff we need to do with linux processes at fork and
don't panic the kernel at exit of the child.

Submitted by:	rdivacky
Tested with:	tst-vfork* (glibc regression tests)
Tested by:	netchild
2006-08-25 11:59:56 +00:00
Alexander Leidinger 1a28c0df09 Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	bsam
2006-08-20 13:50:27 +00:00
Alexander Leidinger 29ddc19bbf Get rid of some nested includes.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb
2006-08-19 15:13:01 +00:00
Alexander Leidinger 94cb2ecf79 Move some stuff into headers where they belong.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-17 21:06:48 +00:00
Alexander Leidinger c632e9d3cc Initialize the emul sx-lock.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-17 10:04:49 +00:00
David Xu 55fd8f9e8c Change xorq back to xorl.
Noticed by: bde
2006-08-16 22:22:28 +00:00
Alexander Leidinger 0eef2f8a4e Style fixes to comments.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-16 18:54:51 +00:00
David Xu 421cb5a9f2 Backout revision 1.117, xorl and xorq have same result, but xorq needs
longer decoding.
2006-08-15 22:43:02 +00:00
John Baldwin f8f1f7fb85 Regen to propogate <prefix>_AUE_<mumble> changes as well as the earlier
systrace changes.
2006-08-15 17:37:01 +00:00
John Baldwin df78f6d313 - Remove unused sysvec variables from various syscalls.conf.
- Send the systrace_args files for all the compat ABIs to /dev/null for
  now.  Right now makesyscalls.sh generates a file with a hardcoded
  function name, so it wouldn't work for any of the ABIs anyway.  Probably
  the function name should be configurable via a 'systracename' variable
  and the functions should be stored in a function pointer in the sysvec
  structure.
2006-08-15 17:25:55 +00:00
Alexander Leidinger 7c09e6c0bd Initialize the eventhandlers, mutexes and sx locks.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 14:58:15 +00:00
Alexander Leidinger 77b959aa51 add autogenerated systrace_args stuff for dtrace 2006-08-15 12:56:36 +00:00
Alexander Leidinger 9b44bfc556 Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
 - pid/tid mangling - complete
 - thread area - complete
 - futexes - complete with issues
 - clone() extension - complete with some possible minor issues
 - mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
   disabled when not build as part of the kernel with native FreeBSD mq*
   support (module support for this will come later)

Tested with:
 - linux-firefox - works, tested
 - linux-opera - works, tested
 - linux-realplay - doesnt work, issue with futexes
 - linux-skype - doesnt work, issue with futexes
 - linux-rt2-demo - works, tested
 - linux-acroread - doesnt work, unknown reason (coredump) and sometimes
   issue with futexes
 - various unix utilities in linux-base-gentoo3 and linux-base-fc4:
   everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
	sysctl compat.linux.osrelease=2.6.16
to switch back use
	sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by:			Google SoC 2006
Submitted by:			rdivacky
Some suggestions/help by:	jhb, kib, manu@NetBSD.org, netchild
2006-08-15 12:54:30 +00:00
Alexander Leidinger c107650561 regen 2006-08-15 12:51:45 +00:00
David Xu 2328274aec Because fuword on AMD64 returns 64bit long integer -1 on fault, clear
entire %rax to zero instead of only clearing %eax, otherwise it will
leave garbage data in upper 32 bits.
2006-08-15 12:45:51 +00:00
Alexander Leidinger b4359bd8e5 Add new syscalls in the linuxolator (only used when the sysctl
compat.linux.osrelease is changed to "2.6.16" or similar).

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 12:28:14 +00:00
Alan Cox c3b182d235 Eliminate an unnecessary initialization from trap_pfault() that also
happens to contain a style error.
2006-08-14 19:53:53 +00:00
John Baldwin d25fdf539e Don't try to preserve PAT bits in pmap_enter(). We currently on pages that
aren't mapped via pmap_enter() (KVA).  We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by:	alc
2006-08-14 15:39:41 +00:00
Alan Cox 3173042ecc It's not entirely obvious that PGEX_I must be zero if no-execute is neither
supported nor enabled.  Just to be sure, verify that no-execute is enabled
before passing VM_PROT_EXECUTE to vm_fault().

Suggested by: tegge@
2006-08-14 06:15:16 +00:00
John Baldwin 7e9f73f3ed First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
Alexander Leidinger 50e422f056 Add some more errno mappings (bsd -> linux) and a comment about the status..
Submitted by:	"Intron" <mag@intron.ac>
2006-08-10 22:05:25 +00:00
Alan Cox 079ba18aac Pass VM_PROT_EXECUTE to vm_fault() instead of VM_PROT_READ if the page
fault was caused by an instruction fetch.
2006-08-08 04:01:29 +00:00
Alan Cox 14aaab5329 Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().
2006-08-06 06:29:16 +00:00
Alan Cox f8883c0160 Define the additional page fault error codes that are implemented by amd64. 2006-08-02 16:24:23 +00:00
Alan Cox 78985e424a Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
David E. O'Brien a4755e0e13 Correct spelling of 3DNow!. 2006-08-01 01:23:39 +00:00
Marcel Moolenaar 302981e72a Remove sio(4) and related options from MI files to amd64, i386
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.
2006-07-29 18:38:54 +00:00
John Baldwin cb76d9b05c Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.
2006-07-28 20:22:58 +00:00
John Baldwin 91ce2694d1 Regen for MPSAFE flag removal. 2006-07-28 19:08:37 +00:00
John Baldwin af5bf12239 Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
John Baldwin e0b4add8d8 Various fixes to comments in the syscall master files including removing
cruft from the audit import and adding mention of COMPAT4 to freebsd32.
2006-07-28 18:55:18 +00:00