Commit graph

1903 commits

Author SHA1 Message Date
Konstantin Belousov 578113aaa3 Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics since r225840.

Submitted by:	alc
Tested by:	flo (sparc64)
MFC after:	2 weeks
2011-09-28 15:01:20 +00:00
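
A minimal sketch of the pattern this change depends on, in userspace C11 form; the structure and field names are illustrative, not the kernel's vm_page:

#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for a vm page; not the real vm_page layout. */
struct page {
	_Atomic uint8_t dirty;	/* one bit per DEV_BSIZE block */
};

/*
 * Mark blocks dirty without taking the page queues lock: the atomic
 * read-modify-write is safe against concurrent updaters by itself.
 */
static void
page_dirty_blocks(struct page *m, uint8_t mask)
{
	atomic_fetch_or_explicit(&m->dirty, mask, memory_order_relaxed);
}
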
Kip Macy 8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal() to kern_psignal(). By introducing this change now we will ease
future MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
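
The shape of the rename, as a hedged sketch; the stand-in types keep it self-contained, while the real entry points take the kernel's struct thread and per-syscall argument structures:

/* Minimal stand-ins so the sketch compiles on its own. */
struct thread { int td_retval[2]; int td_pid; };
struct getpid_args { char unused; };

/*
 * Native syscall entry points gain a sys_ prefix, which frees the
 * plain names for reuse when kernel code is built in userspace.
 * Compatibility entry points (linux_*, freebsd32_*) keep theirs.
 */
int
sys_getpid(struct thread *td, struct getpid_args *uap)
{
	(void)uap;
	td->td_retval[0] = td->td_pid;	/* illustrative body */
	return (0);
}
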
Konstantin Belousov 26ccf4f10f Inline syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by:	jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-11 16:05:09 +00:00
Konstantin Belousov 3407fefef6 Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic
flags field. Updates to the atomic flags are performed using atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify the aflags.

Document that changes to the flags field require only the page lock.

Introduce the vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
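
A sketch of the word-sized trick described above, assuming little-endian byte order (the kernel computes the shift per platform); flag values and names are illustrative:

#include <stdatomic.h>
#include <stdint.h>

#define	PGA_WRITEABLE	0x01	/* illustrative flag values */
#define	PGA_REFERENCED	0x02

struct page {
	void *object;		/* stands in for the other fields */
	uint8_t aflags;		/* updated only through the helper */
};

/*
 * Set bits in the byte-sized aflags with an atomic OR applied to the
 * naturally aligned 32-bit word containing it: lock-free and
 * non-blocking, as the commit message describes.
 */
static void
page_aflag_set(struct page *m, uint8_t bits)
{
	uintptr_t addr = (uintptr_t)&m->aflags;
	uint32_t val = (uint32_t)bits << ((addr & 3) * 8);

	atomic_fetch_or_explicit(
	    (_Atomic uint32_t *)(addr & ~(uintptr_t)3), val,
	    memory_order_relaxed);
}
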
Konstantin Belousov d98d0ce27a - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
  to VPO_UNMANAGED (and also making the flag protected by the vm object
  lock, instead of the vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED. As a consequence, pmap code can now use just
  VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
Marcel Moolenaar 3b6ecbbdf1 Fix kernel core dumps now that the kernel is using PBVM. The basic
problem to solve is that we don't have a fixed mapping from kernel
text to physical address so that libkvm can bootstrap itself. We
solve this by passing the physical address of the bootinfo structure
to the consumer as the entry point of the core file. This way,
libkvm can extract the PBVM page table information and locate the
kernel in the core file.
We also need to dump memory chunks of type loader data, because
those hold the kernel and the PBVM page table (among other things).

Approved by:	re (blanket)
2011-08-06 03:40:33 +00:00
Marcel Moolenaar 1af1866850 Follow-up commit: refactor pmap_kextract() to make it easier to
catch and debug issues like the one fixed in the previous commit:
Replace all return statements with goto statements so that we end
up at a single place with a value for the physical address.
Print a message for all unknown KVA addresses.

Approved by:	re (blanket)
2011-08-05 23:10:47 +00:00
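
The refactoring pattern, sketched with placeholder address checks; the region tests below are simplified, and the real pmap_kextract() knows about several more address classes:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t vm_offset_t;
typedef uint64_t vm_paddr_t;

/* Every branch assigns 'pa' and jumps to one exit, where unknown
 * addresses can be reported in a single place. */
static vm_paddr_t
kextract_sketch(vm_offset_t va)
{
	vm_paddr_t pa;

	if (va >= (7UL << 61)) {	/* region 7: direct-mapped */
		pa = va & ((1UL << 61) - 1);
		goto out;
	}
	if (va >= (6UL << 61)) {	/* region 6: direct, uncacheable */
		pa = va & ((1UL << 61) - 1);
		goto out;
	}
	pa = 0;
	printf("%s: unknown KVA %#jx\n", __func__, (uintmax_t)va);
out:
	return (pa);
}
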
Marcel Moolenaar 444b037cd2 Remove stray semicolon in pmap_kextract() that turned the conditional
"return (0)" into an unconditional one and as such broke PBVM address
queries -- such as during kernel core dumps.

Approved by:	re (blanket)
2011-08-05 23:05:46 +00:00
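
The bug class, reduced to a self-contained example; the lookup logic is a placeholder:

#include <assert.h>

static int
lookup_broken(unsigned long va, unsigned long base)
{
	if (va < base);		/* BUG: stray ';' makes the body empty */
		return (0);	/* so this return is unconditional */
}

static int
lookup_fixed(unsigned long va, unsigned long base)
{
	if (va < base)
		return (0);		/* only for out-of-range addresses */
	return (int)(va - base);	/* placeholder translation */
}

int
main(void)
{
	assert(lookup_broken(10, 5) == 0);	/* wrong answer */
	assert(lookup_fixed(10, 5) == 5);	/* expected one */
	return (0);
}
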
Attilio Rao 786ef92b7b Bump MAXCPU for amd64, ia64 and XLP mips appropriately.
From now on, the default values for FreeBSD will be a maximum of 64
supported CPUs on amd64 and ia64, and 128 for XLP. All the other architectures
seem already capped appropriately (with the exception of sparc64 which
needs further support on jalapeno flavour).

Bump __FreeBSD_version in order to reflect the KBI/KPI breakage introduced
during the infrastructure cleanup for supporting MAXCPU > 32. This
covers the retirement of cpumask_t too.

The switch is considered complete at the present time, so please
report immediately any bug you may experience that is traceable to
this area.

Requested by:	marcel, jchandra
Tested by:	pluknet, sbruno
Approved by:	re (kib)
2011-07-19 13:00:30 +00:00
Attilio Rao 732772c701 On 64-bit architectures size_t is 8 bytes, thus it should use 8 bytes
of storage.
Fix the sintrcnt/sintrnames specification accordingly.

No MFC is planned for this patch.

Reported, reviewed and tested by:	marcel
Approved by:	re (kib)
2011-07-19 12:41:57 +00:00
Attilio Rao 68b739cd6f Add the ability to specify the MAXCPU value from kernel configs.
This is going to help in cases like the mips flavours, where more
granular control over MAXCPU is wanted.

No MFC is planned for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
Attilio Rao 521ea19d1c - Remove the eintrcnt/eintrnames usage and introduce the concept of
  sintrcnt/sintrnames, which are symbols containing the sizes of the two
  tables.
- For amd64/i386, remove the storage of the intr* stuff from assembly
  files. This area can be improved considerably by applying the same
  change to other architectures, likely finding a unified approach among
  them, and moving the whole code to be MI. More work in this area is
  expected to happen fairly soon.

No MFC is planned for this patch.

Tested by:	pluknet
Reviewed by:	jhb
Approved by:	re (kib)
2011-07-18 15:19:40 +00:00
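
In sketch form, the difference between end-symbol arithmetic and an exported size symbol; the stand-in definitions keep it self-contained:

#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kernel tables and their new size symbols. */
static unsigned long intrcnt[64];
static const size_t sintrcnt = sizeof(intrcnt);

int
main(void)
{
	/*
	 * Previously: count = eintrcnt - intrcnt, with eintrcnt an end
	 * label emitted by assembly. Now the size is its own symbol,
	 * which is easier to make machine-independent.
	 */
	printf("%zu counters\n", sintrcnt / sizeof(intrcnt[0]));
	return (0);
}
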
John Baldwin 9c931b6fdc Enable NEW_PCIB by default on ia64.
Approved by:	re (kib), marcel
2011-07-18 14:05:14 +00:00
John Baldwin fbd3fc8fc7 Implement bus_adjust_resource() for the ia64 nexus driver.
Reviewed by:	marcel
Approved by:	re (kib)
2011-07-18 14:04:37 +00:00
Marcel Moolenaar 08f72ab5dc Don't assume pmap_mapdev() gets called only for memory mapped I/O
addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev()
rather excessively (number of calls) just to get a valid KVA. To
that end, have pmap_mapdev():
1.  Cache the last result so that we don't waste time on multiple
    consecutive invocations with the same PA/SZ.
2.  Find the memory descriptor that covers the PA and return NULL
    if none was found or when the PA is for a common DRAM address.
3.  Use either a region 6 or region 7 KVA, in accordance with the
    memory attribute.
2011-07-16 20:34:02 +00:00
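
Point 1 in sketch form; mapdev_slow() is a hypothetical stand-in for the descriptor walk of points 2 and 3:

#include <stddef.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;
typedef uint64_t vm_size_t;

void *mapdev_slow(vm_paddr_t, vm_size_t);	/* assumed helper */

static vm_paddr_t last_pa;
static vm_size_t last_sz;
static void *last_va;

/* Repeated calls with the same PA/SZ return the cached KVA. */
void *
mapdev_cached(vm_paddr_t pa, vm_size_t sz)
{
	if (last_va != NULL && pa == last_pa && sz == last_sz)
		return (last_va);
	last_va = mapdev_slow(pa, sz);
	last_pa = pa;
	last_sz = sz;
	return (last_va);
}
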
Marcel Moolenaar 5b927e4ea9 Don't send EOI to the CPU before we have handled the interrupt. This could
potentially trigger multiple pending interrupts for level-sensitive
interrupts.  However, the event timer interrupt does need EOI before
being handled to avoid missing clock events.

These conflicting requirements are handled by having the XIV handler
inform the dispatch code whether or not it sent EOI to the CPU. If not,
the dispatch code will do it. This allows handlers to send EOI before
doing potentially long-running activities, while still having a sensible
default behaviour.
2011-07-16 20:16:49 +00:00
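
The contract in sketch form; the handler type and helper names are illustrative, not the actual XIV interfaces:

#include <stdbool.h>

typedef bool xiv_handler_t(unsigned xiv);	/* true: handler sent EOI */

void cpu_send_eoi(void);			/* assumed helper */

/* The dispatcher sends EOI only when the handler did not. */
static void
xiv_dispatch(xiv_handler_t *handler, unsigned xiv)
{
	if (!handler(xiv))
		cpu_send_eoi();		/* sensible default behaviour */
}
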
Marcel Moolenaar 303c22c53b Add a few more helper functions for working with memory descriptors:
o   efi_md_find() - returns the md that covers the given address
o   efi_md_last() - returns the last md in the list
o   efi_md_prev() - returns the md that precedes the given md.
2011-07-16 19:56:07 +00:00
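
A sketch of efi_md_find() over a simplified descriptor; efi_md_first() and efi_md_next() are the assumed existing iterators:

#include <stddef.h>
#include <stdint.h>

/* Simplified EFI memory descriptor; the real one has more fields. */
struct efi_md {
	uint64_t md_phys;	/* first physical address covered */
	uint64_t md_pages;	/* length in 4K pages */
};

struct efi_md *efi_md_first(void);
struct efi_md *efi_md_next(struct efi_md *);

struct efi_md *
efi_md_find_sketch(uint64_t pa)
{
	struct efi_md *md;

	for (md = efi_md_first(); md != NULL; md = efi_md_next(md))
		if (pa >= md->md_phys &&
		    pa < md->md_phys + (md->md_pages << 12))
			return (md);
	return (NULL);		/* no descriptor covers pa */
}
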
Marcel Moolenaar 0eb8c1410a Implement basic support for memory attributes. At this time we only
distinguish between UC and WB memory so that we can map the page to
either a region 6 address (for UC) or a region 7 address (for WB).

This change is only now possible because previously we would map
regions 6 and 7 with 256MB translations and, on top of that, had the
kernel mapped in region 7 using a wired translation. The introduction
of the PBVM moved the kernel into its own region, which freed up region
7 and allowed us to revert to standard page-sized translations.

This commit introduces pmap_page_to_va(), which respects the attribute.
2011-07-08 16:30:54 +00:00
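
The mapping rule in sketch form: the top 3 bits of an ia64 virtual address select the region, so a physical address becomes a region 6 (UC) or region 7 (WB) KVA by prepending the region number. Names are illustrative:

#include <stdint.h>

#define	IA64_REGION_SHIFT	61	/* top 3 VA bits pick the region */

enum memattr { MEMATTR_UC, MEMATTR_WB };	/* illustrative */

static uint64_t
page_to_va(uint64_t pa, enum memattr attr)
{
	uint64_t region = (attr == MEMATTR_UC) ? 6 : 7;

	return ((region << IA64_REGION_SHIFT) | pa);
}
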
Marcel Moolenaar 6359509327 Disable PREEMPTION for now. See also PR ia64/147501. 2011-07-04 16:59:26 +00:00
Attilio Rao 470107b2f1 MFC 2011-07-04 11:13:00 +00:00
Alan Cox 80788b2a27 When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by:	kib
2011-07-02 23:42:04 +00:00
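
The iteration pattern in sketch form, using sys/queue.h; the flag value and page layout are illustrative:

#include <sys/queue.h>
#include <stdint.h>

#define	PG_MARKER	0x0001	/* illustrative value */

struct page {
	uint16_t flags;
	TAILQ_ENTRY(page) pageq;
};
TAILQ_HEAD(pagequeue, page);

/* Markers are scan placeholders, not pages: skip them explicitly
 * rather than hoping their zeroed fields look like an empty page. */
static void
queue_scan(struct pagequeue *pq, void (*visit)(struct page *))
{
	struct page *m;

	TAILQ_FOREACH(m, pq, pageq) {
		if (m->flags & PG_MARKER)
			continue;
		visit(m);
	}
}
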
Marcel Moolenaar c412551667 Change the management of nested faults by switching to physical
addressing while reading or writing the trap frame. It's not
possible to guarantee that the one translation cache entry that
we depend on is not going to get purged by the CPU. We already
know that global shootdowns (ptc.g and/or ptc.ga) can (and will)
cause multiple TC entries to get purged, and we initially tried to
handle that by serializing kernel entry with these operations.
However, we need to serialize kernel exit as well.

But even if we can serialize, it appears that CPU threads within
a core can affect each other's TC entries beyond the global
shootdown. This would mean serializing any and all translation
cache updates by the threads in a core with the kernel entry
and exit of any thread in that core. This is just too painful
and complicated.

Since we already properly coded for the 2 nested faults that we
can get, all we need to do is use those to obtain the physical
address of the trap frame, switch to physical mode and in that
way eliminate any further faults. Since the trap frame is already
aligned to a 1KB boundary to make sure we don't cross the page
boundary, this is safe to do.

We still need to serialize ptc.g or ptc.ga across CPUs because
the platform can only have 1 such operation outstanding at the
same time. We can now use a regular (spin) lock for this.

Also, it has been observed that we can get nested TLB faults
for region 7 virtual addresses. This was unexpected. For now,
we enhance the nested TLB fault handler to deal with those as
well, but it needs to be understood.
2011-06-30 20:34:55 +00:00
Attilio Rao 7b744f6b01 MFC 2011-06-30 10:19:43 +00:00
Alan Cox 6bbee8e28a Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
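
The core of the optimization in sketch form; the flag value and helper names are stand-ins for the kernel's:

#include <stdbool.h>

#define	OBJPR_NOTMAPPED	0x2	/* illustrative value */

struct page;
void pmap_remove_all_stub(struct page *);	/* assumed stand-in */

/*
 * With OBJPR_NOTMAPPED the caller asserts the range has no managed
 * mappings, so the per-page pmap call is skipped entirely.
 */
static void
page_remove_one(struct page *m, int options)
{
	if ((options & OBJPR_NOTMAPPED) == 0)
		pmap_remove_all_stub(m);
	/* ... then free or unwire the page ... */
}
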
Attilio Rao cfdfd32d34 MFC 2011-06-26 17:30:46 +00:00
Marcel Moolenaar 191eae8259 Oops. The sec field of struct bintime is *not* a 32-bit type.
It's time_t, which is 64 bits on ia64.
2011-06-25 17:58:35 +00:00
Marcel Moolenaar 3ebd36e3d0 Define the minimum fractional period in terms of hz. We know hz is
a magnitude smaller than itc_freq. A minimum period of 10*hz is
sufficient precision. As a side-effect, the number of clocks per
second, when the machine is idle, dropped by more than 50%.
Be anal and define the maximum period to be at least 4G seconds.
With a 64-bit counter and an ITC frequency that's expected to be
always less than 4GHz, it takes longer than that to wrap around.
2011-06-25 16:35:43 +00:00
Marcel Moolenaar dbd57cbf8f Replace the original copyright notice with my own. Everything in
this file is written by me and has no bearing on the initial or
original version.
2011-06-25 03:43:58 +00:00
Marcel Moolenaar 3f5deb80b7 Update copyright. 2011-06-25 03:37:40 +00:00
Marcel Moolenaar e920e3978e Switch to the event timers infrastructure. This includes:
o   Setting td_intr_frame to the XIV's trap frame because it's referenced
    by the ET event handler.
o   Signal EOI to the CPU before calling the registered XIV handlers.
    This prevents lost ITC interrupts, which cause starvation in one-shot
    mode.
o   Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o   Have the APs call cpu_initclocks() so as to limit the scattering of
    clock related initialization. cpu_initclocks() calls the <self>_bsp()
    or <self>_ap() version accordingly.
o   Uncomment the ET clock handling in cpu_idle().
o   Update the DDB 'show pcpu' output for the new MD fields.
o   Rewrite ia64_ih_clock() entirely. Note that we don't create as many
    clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
    We can only have 240 XIVs and we can have more CPUs than that. There's
    a single intrcnt index for the cumulative clock ticks and we keep per
    CPU counts in the PCPU stats structure.
o   Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).

Open issues:
o   Clock interrupts can still be lost. Some tweaking is still necessary.

Thanks to: mav@ for his support, feedback and explanations.

ET stats while committing:
eris% sysctl machdep.cpu | grep nclks

machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock                      108599         50
2011-06-25 02:15:14 +00:00
Attilio Rao de138ec703 MFC 2011-06-24 16:35:40 +00:00
Marcel Moolenaar ded49c2eae Unblock the outgoing thread after we have performed pmap_switch() to
switch the region registers. pmap_switch() returns the pmap for
which the region registers are currently programmed, and that pmap
needs to be re-programmed on the CPU where the outgoing thread gets
switched in. This change does not noticeably change anything or fix
known bugs, but does give me a warm fuzzy feeling by being more
correct.
2011-06-23 16:21:43 +00:00
Attilio Rao 9b571ec6b3 MFC 2011-06-22 19:42:32 +00:00
Alan Cox 9ed9322551 Use a non-standard page size that is supported. 2011-06-21 12:38:40 +00:00
Attilio Rao 5519971c21 MFC 2011-06-19 14:22:35 +00:00
Marcel Moolenaar 218b719b74 Improve on style(9) 2011-06-17 05:30:12 +00:00
Marcel Moolenaar acd1d4d28e Properly serialize the global shootdown with the instruction
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.
2011-06-17 04:26:03 +00:00
Attilio Rao 2b0320744a Remove the usage of pc_cpumask and pc_other_cpus from ia64.
Reviewed and tested by:	marcel
2011-06-15 07:29:20 +00:00
Marcel Moolenaar 52737ec557 Add the model number for the Montvale processor (marketed as Itanium 2 9100).
At this time we're missing just one: Tukwila (Itanium 2 9300).
2011-06-11 02:22:11 +00:00
Attilio Rao 5e9857e76b MFC 2011-06-07 08:24:29 +00:00
Marcel Moolenaar 28fb80aa8c Call set_cputicker() to have the time counter use the ITC register.
Note that the ITC frequency is fixed.
2011-06-07 01:06:49 +00:00
Attilio Rao 81c02539f1 MFC 2011-06-06 21:38:39 +00:00
Marcel Moolenaar e726a6b70c Improve cpu_idle():
o   cpu_idle_hook is expected to be called with interrupts
    disabled and re-enables interrupts on return.
o   sync with x86: don't idle when the CPU has runnable tasks
o   have callers of ia64_call_pal_static() disable interrupts
    and re-enable them afterwards.
o   add, but compile-out, support for idle mode. This will be
    enabled at some later time, after proper testing.
2011-06-06 19:06:15 +00:00
Attilio Rao 61b926921f MFC 2011-05-31 21:22:44 +00:00
Nathan Whitehorn d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
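
The data-structure change in self-contained form: insertion at the head reverses enumeration order, while insertion at the tail preserves it.

#include <sys/queue.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for a per-CPU record found in the device tree. */
struct cpu {
	int id;
	STAILQ_ENTRY(cpu) link;
};
STAILQ_HEAD(cpulist, cpu);

int
main(void)
{
	struct cpulist cpus = STAILQ_HEAD_INITIALIZER(cpus);
	struct cpu *c;
	int i;

	for (i = 0; i < 4; i++) {
		c = malloc(sizeof(*c));
		c->id = i;
		/* Tail insertion keeps device-tree order, so thread 0
		 * of each core comes up first. */
		STAILQ_INSERT_TAIL(&cpus, c, link);
	}
	STAILQ_FOREACH(c, &cpus, link)
		printf("start cpu %d\n", c->id);	/* 0 1 2 3 */
	return (0);
}
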
Attilio Rao c02f1527a9 MFC 2011-05-14 19:20:13 +00:00
Marcel Moolenaar 767ca6ed1a Prefer switching the memory stack from user to kernel *before* switching
the register stack. While the ordering doesn't matter, it creates an
invariant not previously there: the memory stack pointer will always be
larger than the register stack pointer. With this invariant in place,
it's easier to add instrumentation code that detects a stack overflow
because in such a scenario the memory stack and register stack
pointers have crossed each other.

Aside: basic kernel operation needs about half the stack size (~16K)
at most. We have plenty of head room on the kernel stack...
2011-05-14 14:55:15 +00:00
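
A sketch of the instrumentation the invariant makes possible; kstack_overflow_panic() is a hypothetical hook:

#include <stdint.h>

void kstack_overflow_panic(void);	/* hypothetical hook */

/*
 * The memory stack (sp) grows down while the register stack backing
 * store (bsp) grows up within the same allocation. With sp switched
 * first, sp > bsp always holds, so a crossing means an overflow.
 */
static inline void
kstack_check(uintptr_t sp, uintptr_t bsp)
{
	if (sp <= bsp)
		kstack_overflow_panic();
}
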
Marcel Moolenaar 65385d6d79 Sharpening the saw:
o   Clobber the register that holds the restart token immediately after
    crossing the restart point. This prevents false positives (i.e. a
    nested exception that we don't know can happen being treated as one
    we know about by virtue of a lingering restart token).
o   Now that the bootstrap kernel stack is free, switch onto it and call
    trap() for nested traps that we don't know about. In trap() we panic()
    so that we can analyze the condition.
2011-05-14 14:47:19 +00:00
Marcel Moolenaar 7fb64531d3 Be pedantic: mark the pcpu pointer (= register r13) itself as volatile. 2011-05-14 14:40:24 +00:00
Marcel Moolenaar dc03be9d67 Turn ia64_srlz_d() and ia64_srlz_i() into defines so that the code is
still correct when inlining is disabled.
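
The pattern, after the commit's description (see the tree for the real definitions): as inline functions these serializing instructions could end up behind a call when inlining is off, losing their exact placement in the instruction stream; plain defines keep the asm where it is written.

#define	ia64_srlz_d()	__asm __volatile("srlz.d")
#define	ia64_srlz_i()	__asm __volatile("srlz.i;;")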
2011-05-14 14:36:08 +00:00