Commit graph

5779 commits

Author SHA1 Message Date
H. Peter Anvin 30a0fb947a x86: correct the CPUID pattern for MSR_IA32_MISC_ENABLE availability
Impact: re-enable CPUID unmasking on affected processors

As far as I am capable of discerning from the documentation,
MSR_IA32_MISC_ENABLE should be available for all family 0xf CPUs, as
well as family 6 for model >= 0xd (newer Pentium M).

The documentation on this isn't ideal, so we need to be on the lookout
for errors, still.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-26 09:40:58 -08:00
Rakib Mullick 659d2618b3 x86: fix section mismatch warning
Here function vmi_activate calls a init function activate_vmi , which
causes the following section mismatch warnings:

  LD      arch/x86/kernel/built-in.o
WARNING: arch/x86/kernel/built-in.o(.text+0x13ba9): Section mismatch
in reference from the function vmi_activate() to the function
.init.text:vmi_time_init()
The function vmi_activate() references
the function __init vmi_time_init().
This is often because vmi_activate lacks a __init
annotation or the annotation of vmi_time_init is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0x13bd1): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_bsp_init()
The function vmi_activate() references
the function __devinit vmi_time_bsp_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_bsp_init is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0x13bdb): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_ap_init()
The function vmi_activate() references
the function __devinit vmi_time_ap_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_ap_init is wrong.

Fix it by marking vmi_activate() as __init too.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-26 14:27:18 +01:00
Ingo Molnar 99fb4d349d x86: unmask CPUID levels on Intel CPUs, fix
Impact: fix boot hang on pre-model-15 Intel CPUs

rdmsrl_safe() does not work in very early bootup code yet, because we
dont have the pagefault handler installed yet so exception section
does not get parsed. rdmsr_safe() will just crash and hang the bootup.

So limit the MSR_IA32_MISC_ENABLE MSR read to those CPU types that
support it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-26 12:36:24 +01:00
Eric Anholt ef5fa0ab24 x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that
gets UC behavior even in the presence of a WC MTRR covering the area in
question.  By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual
behavior the caller wanted (WC if you can manage it, UC otherwise).

This recovers the 40% performance improvement of using WC in the DRM
to upload vertex data.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-26 11:14:27 +01:00
Ingo Molnar e1b4d11436 x86: use standard PIT frequency
the RDC and ELAN platforms use slighly different PIT clocks, resulting in
a timex.h hack that changes PIT_TICK_RATE during build time. But if a
tester enables any of these platform support .config options, the PIT
will be miscalibrated on standard PC platforms.

So use one frequency - in a subsequent patch we'll add a quirk to allow
x86 platforms to define different PIT frequencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-25 16:57:47 +01:00
Peter Zijlstra 42ef73fe13 x86, mm: fix pte_free()
On -rt we were seeing spurious bad page states like:

Bad page state in process 'firefox'
page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
[<c043d0f3>] ? printk+0x14/0x19
[<c0272d4e>] bad_page+0x4e/0x79
[<c0273831>] free_hot_cold_page+0x5b/0x1d3
[<c02739f6>] free_hot_page+0xf/0x11
[<c0273a18>] __free_pages+0x20/0x2b
[<c027d170>] __pte_alloc+0x87/0x91
[<c027d25e>] handle_mm_fault+0xe4/0x733
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c0218875>] do_page_fault+0x36f/0x88a

This is the case where a concurrent fault already installed the PTE and
we get to free the newly allocated one.

This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
which is overlaid with the {private, mapping} struct.

union {
    struct {
        unsigned long private;
        struct address_space *mapping;
    };
    spinlock_t ptl;
    struct kmem_cache *slab;
    struct page *first_page;
};

Normally the spinlock is small enough to not stomp on page->mapping, but
PREEMPT_RT=y has huge 'spin'locks.

But lockdep kernels should also be able to trigger this splat, as the
lock tracking code grows the spinlock to cover page->mapping.

The obvious fix is calling pgtable_page_dtor() like the regular pte free
path __pte_free_tlb() does.

It seems all architectures except x86 and nm10300 already do this, and
nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
doesn't do SMP or simply doesnt do MMU at all or something.

Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
2009-01-23 18:42:06 +01:00
Markus Metzger ba2607fe9c x86, ds, bts: cleanup/fix DS configuration
Cleanup the cpuid check for DS configuration.

This also fixes a Corei7 CPUID enumeration bug.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-22 14:35:00 +01:00
Thomas Gleixner 336f6c322d debugobjects: add and use INIT_WORK_ON_STACK
Impact: Fix debugobjects warning

debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.

Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-01-22 10:02:07 +01:00
H. Peter Anvin 066941bd4e x86: unmask CPUID levels on Intel CPUs
Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware

Avuton Olrich reported early boot crashes with v2.6.28 and
bisected it down to dc1e35c6e9
("x86, xsave: enable xsave/xrstor on cpus with xsave support").

If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to
make all CPUID information available.  This is required for some
features to work, in particular XSAVE.

Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com>
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-22 09:24:02 +01:00
H. Peter Anvin bdf21a49ba x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
Impact: None (new bit definitions currently unused)

Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to
<asm/msr-index.h>.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-21 15:13:53 -08:00
Suresh Siddha 9597134218 x86: fix PTE corruption issue while mapping RAM using /dev/mem
Beschorner Daniel reported:
> hwinfo problem since 2.6.28, showing this in the oops:
>	Corrupted page table at address 7fd04de3ec00

Also, PaX Team reported a regression with this commit:

>	commit 9542ada803
>	Author: Suresh Siddha <suresh.b.siddha@intel.com>
>	Date:   Wed Sep 24 08:53:33 2008 -0700
>
>	    x86: track memtype for RAM in page struct

This commit breaks mapping any RAM page through /dev/mem, as the
reserve_memtype() was not initializing the return attribute type and as such
corrupting the PTE entry that was setup with the return attribute type.

Because of this bug, application mapping this RAM page through /dev/mem
will die with "Corrupted page table at address xxxx" message in the kernel
log and also the kernel identity mapping which maps the underlying RAM
page gets converted to UC.

Fix this by initializing the return attribute type before calling
reserve_ram_pages_type()

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 18:42:32 +01:00
Thomas Renninger 731f1872f4 x86: mtrr fix debug boot parameter
while looking at:

  http://bugzilla.kernel.org/show_bug.cgi?id=11541

I realized that the mtrr.show param cannot work, because
the code is processed much too early.

This patch:
 - Declares mtrr.show as early_param
 - Stays consistent with the previous param (which I doubt
   that it ever worked), so mtrr.show=1 would still work
 - Declares mtrr_show as initdata

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 12:26:42 +01:00
Suresh Siddha a1e46212a4 x86: fix page attribute corruption with cpa()
Impact: fix sporadic slowdowns and warning messages

This patch fixes a performance issue reported by Linus on his
Nehalem system. While Linus reverted the PAT patch (commit
58dab916df) which exposed the issue,
existing cpa() code can potentially still cause wrong(page attribute
corruption) behavior.

This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
various people reported.

In 64bit kernel, kernel identity mapping might have holes depending
on the available memory and how e820 reports the address range
covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
in the address range that is not listed by e820 entries, kernel identity
mapping will have a corresponding hole in its 1-1 identity mapping.

If cpa() happens on the kernel identity mapping which falls into these holes,
existing code fails like this:

	__change_page_attr_set_clr()
		__change_page_attr()
			returns 0 because of if (!kpte). But doesn't
			set cpa->numpages and cpa->pfn.
		cpa_process_alias()
			uses uninitialized cpa->pfn (random value)
			which can potentially lead to changing the page
			attribute of kernel text/data, kernel identity
			mapping of RAM pages etc. oops!

This bug was easily exposed by another PAT patch which was doing
cpa() more often on kernel identity mapping holes (physical range between
max_low_pfn_mapped and 4GB), where in here it was setting the
cache disable attribute(PCD) for kernel identity mappings aswell.

Fix cpa() to handle the kernel identity mapping holes. Retain
the WARN() for cpa() calls to other not present address ranges
(kernel-text/data, ioremap() addresses)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 12:24:54 +01:00
Ingo Molnar 552b8aa4d1 Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
This reverts commit 4217458daf.

Justin Madru bisected this commit, it was causing weird Firefox
crashes.

The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.

So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.

Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505

Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 09:43:18 +01:00
Andi Kleen e0a96129db x86: use early clobbers in usercopy*.c
Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions

Hugh Dickins noticed that strncpy_from_user() was miscompiled
in some circumstances with gcc 4.3.

Thanks to Hugh's excellent analysis it was easy to track down.

Hugh writes:

> Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
> except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
> and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
> might_fault() there, which hides the issue): using a
> gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
>
> It generates the following:
>
> 0000000000000000 <__strncpy_from_user>:
>    0:   48 89 d1                mov    %rdx,%rcx
>    3:   48 85 c9                test   %rcx,%rcx
>    6:   74 0e                   je     16 <__strncpy_from_user+0x16>
>    8:   ac                      lods   %ds:(%rsi),%al
>    9:   aa                      stos   %al,%es:(%rdi)
>    a:   84 c0                   test   %al,%al
>    c:   74 05                   je     13 <__strncpy_from_user+0x13>
>    e:   48 ff c9                dec    %rcx
>   11:   75 f5                   jne    8 <__strncpy_from_user+0x8>
>   13:   48 29 c9                sub    %rcx,%rcx
>   16:   48 89 c8                mov    %rcx,%rax
>   19:   c3                      retq
>
> Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
> (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
> Isn't it returning 0 when it ought to be returning strlen?

The asm constraints for the strncpy_from_user() result were missing an
early clobber, which tells gcc that the last output arguments
are written before all input arguments are read.

Also add more early clobbers in the rest of the file and fix 32-bit
usercopy.c in the same way.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[ since this API is rarely used and no in-kernel user relies on a 'len'
  return value (they only rely on negative return values) this miscompile
  was never noticed in the field. But it's worth fixing it nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 09:43:17 +01:00
Gary Hade f5495506c3 x86: remove kernel_physical_mapping_init() from init section
Impact: fix crash with memory hotplug enabled

kernel_physical_mapping_init() is called during memory hotplug
so it does not belong in the init section.

If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
the make command line, arch/x86/mm/init_64.c is compiled with
the -fno-inline-functions-called-once gcc option defeating
inlining of kernel_physical_mapping_init() within init_memory_mapping().

When kernel_physical_mapping_init() is not inlined it is placed
in the .init.text section according to the __init in it's current
declaration.  A later call to kernel_physical_mapping_init() during
a memory hotplug operation encounters an int3 trap because the
.init.text section memory has been freed.

This patch eliminates the crash caused by the int3 trap by moving the
non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 00:31:43 +01:00
Ingo Molnar bfa318ad52 fix: crash: IP: __bitmap_intersects+0x48/0x73
-tip testing found this crash:

> [   35.258515] calling  acpi_cpufreq_init+0x0/0x127 @ 1
> [   35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [   35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [   35.267554] PGD 0
> [   35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC

arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...

Switch it over to the much simpler constant-cpumask-pointers approach.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 00:17:01 +01:00
Mike Travis 7285908185 cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
Impact: use new work_on_cpu function to reduce stack usage

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().

Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.

Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-19 22:36:13 +01:00
Leonardo Potenza c7f8562a51 x86: fix section mismatch warnings in kernel/setup_percpu.c
The function setup_cpu_local_masks() has been marked __init, in
order to remove the following section mismatch messages:

WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.

WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.

WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.

WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.

Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 23:59:22 +01:00
Mike Travis b2b815d80a x86: put trigger in to detect mismatched apic versions
Impact: add debug warning

Fire off one message if two apic's discovered with different
apic versions. (this code is only called during CPU init)

The goal of this is to pave the way of the removal of the apic_version[]
array. We dont expect any apic version incompatibilities in the x86
landscape of systems [if so we dont handle them very well and probably
never will handle deep apic version assymetries well], but it's prudent
to have a debug check for one kernel cycle nevertheless.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 21:15:27 +01:00
Len Brown 88d998c264 Merge branch 'misc' into release 2009-01-16 14:45:34 -05:00
Masami Hiramatsu 5a4ccaf37f kprobes: check CONFIG_FREEZER instead of CONFIG_PM
Check CONFIG_FREEZER instead of CONFIG_PM because kprobe booster
depends on freeze_processes() and thaw_processes() when CONFIG_PREEMPT=y.

This fixes a linkage error which occurs when CONFIG_PREEMPT=y, CONFIG_PM=y
and CONFIG_FREEZER=n.

Reported-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-16 14:32:17 -05:00
Jan Beulich a3c6018e56 x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2)
Debugging and original patch from Nick Piggin <npiggin@suse.de>

The early fixmap pmd entry inserted at the very top of the KVA is causing the
subsequent fixmap mapping code to not provide physically linear pte pages over
the kmap atomic portion of the fixmap (which relies on said property to
calculate pte addresses).

This has caused weird boot failures in kmap_atomic much later in the boot
process (initial userspace faults) on a 32-bit PAE system with a larger number
of CPUs (smaller CPU counts tend not to run over into the next page so don't
show up the problem).

Solve this by attempting to clear out the page table, and copy any of its
entries to the new one. Also, add a bug if a nonlinear condition is encountered
and can't be resolved, which might save some hours of debugging if this fragile
scheme ever breaks again...

Once we have such logic, we can also use it to eliminate the early ioremap
trickery around the page table setup for the fixmap area. This also fixes
potential issues with FIX_* entries sharing the leaf page table with the early
ioremap ones getting discarded by early_ioremap_clear() and not restored by
early_ioremap_reset(). It at once eliminates the temporary (and configuration,
namely NR_CPUS, dependent) unavailability of early fixed mappings during the
time the fixmap area page tables get constructed.

Finally, also replace the hard coded calculation of the initial table space
needed for the fixmap area with a proper one, allowing kernels configured for
large CPU counts to actually boot.

Based-on: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16 13:47:04 +01:00
Linus Torvalds b5db0e3865 Revert "x86 PAT: remove CPA WARN_ON for zero pte"
This reverts commit 58dab916df, which
makes my Nehalem come to a nasty crawling almost-halt.  It looks like it
turns off caching of regular kernel RAM, with the understandable
slowdown of a few orders of magnitude as a result.

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:25:09 -08:00
Cliff Wickman 18c07cf530 x86, UV: cpu_relax in uv_wait_completion
The function uv_wait_completion() spins on reads of a memory-mapped
register, waiting for completion of BAU hardware replies.

It should call "cpu_relax()" between those reads to improve performance
on hyperthreaded configurations.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15 23:48:20 +01:00
Jan Beulich 4a13ad0bd8 x86: avoid early crash in disable_local_APIC()
E.g. when called due to an early panic.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15 23:48:19 +01:00
Linus Torvalds bca268565f Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
  [CVE-2009-0029] s390 specific system call wrappers
  [CVE-2009-0029] System call wrappers part 33
  [CVE-2009-0029] System call wrappers part 32
  [CVE-2009-0029] System call wrappers part 31
  [CVE-2009-0029] System call wrappers part 30
  [CVE-2009-0029] System call wrappers part 29
  [CVE-2009-0029] System call wrappers part 28
  [CVE-2009-0029] System call wrappers part 27
  [CVE-2009-0029] System call wrappers part 26
  [CVE-2009-0029] System call wrappers part 25
  [CVE-2009-0029] System call wrappers part 24
  [CVE-2009-0029] System call wrappers part 23
  [CVE-2009-0029] System call wrappers part 22
  [CVE-2009-0029] System call wrappers part 21
  [CVE-2009-0029] System call wrappers part 20
  [CVE-2009-0029] System call wrappers part 19
  [CVE-2009-0029] System call wrappers part 18
  [CVE-2009-0029] System call wrappers part 17
  [CVE-2009-0029] System call wrappers part 16
  [CVE-2009-0029] System call wrappers part 15
  ...
2009-01-14 19:58:40 -08:00
Harvey Harrison 74d96f0186 byteorder: make swab.h include asm/swab.h like a regular header
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14 19:56:50 -08:00
Sam Ravnborg 2ea038917b Revert "kbuild: strip generated symbols from *.ko"
This reverts commit ad7a953c52.

And commit: ("allow stripping of generated symbols under CONFIG_KALLSYMS_ALL")
            9bb482476c

These stripping patches has caused a set of issues:

1) People have reported compatibility issues with binutils due to
   lack of support for `--strip-unneeded-symbols' with objcopy 2.15.92.0.2
   Reported by: Wenji
2) ccache and distcc no longer works as expeced
   Reported by: Ted, Roland, + others
3) The installed modules increased a lot in size
   Reported by: Ted, Davej + others

Reported-by: Wenji Huang <wenji.huang@oracle.com>
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-01-14 21:38:20 +01:00
Suresh Siddha 5cca0cf15a x86, pat: fix reserve_memtype() for legacy 1MB range
Thierry Vignaud reported:
> http://bugzilla.kernel.org/show_bug.cgi?id=12372
>
> On P4 with an SiS motherboard (video card is a SiS 651)
> X server fails to start with error:
> xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid
> argument)

Here X is trying to map first 8KB of memory using /dev/mem. Existing
code treats first 0-4KB of memory as non-RAM and 4KB-8KB as RAM. Recent
code changes don't allow to map memory with different attributes
at the same time.

Fix this by treating the first 1MB legacy region as special and always
track the attribute requests with in this region using linear linked
list (and don't bother if the range is RAM or non-RAM or mixed)

Reported-and-tested-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14 20:14:45 +01:00
Heiko Carstens e55380edf6 [CVE-2009-0029] Rename old_readdir to sys_old_readdir
This way it matches the generic system call name convention.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:15 +01:00
venkatesh.pallipadi@intel.com 58dab916df x86 PAT: remove CPA WARN_ON for zero pte
Impact: reduce scope of debug check - avoid warnings

The logic to find whether identity map exists or not using
high_memory or max_low_pfn_mapped/max_pfn_mapped are not complete
as the memory withing the range may not be mapped if there is a
unusable hole in e820.

Specifically, on my test system I started seeing these warnings with
tools like hwinfo, acpidump trying to map ACPI region.

[   27.400018] ------------[ cut here ]------------
[   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
[   27.400821] Hardware name: X7DB8
[   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
[   27.401569] Modules linked in:
[   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec #586
[   27.402141] Call Trace:
[   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
[   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
[   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
[   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
[   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
[   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
[   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
[   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
[   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
[   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
[   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
[   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
[   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
[   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
[   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
[   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
[   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
[   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
[   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
[   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
[   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
[   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
[   27.408614] ---[ end trace 4b16ad70c09a602d ]---
[   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000

This is wih track_pfn_vma_new trying to keep identity map in sync.
The address cff6a000 is the ACPI region according to e820.

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
[    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
[    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)

And is not mapped as per init_memory_mapping.

[    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
[    0.000000] init_memory_mapping: 0000000100000000-0000000230000000

We can add logic to check for this. But, there can also be other holes in
identity map when we have 1GB of aligned reserved space in e820.

This patch handles it by removing the WARN_ON and returning a specific
error value (EFAULT) to indicate that the address does not have any
identity mapping.

The code that tries to keep identity map in sync can ignore
this error, with other callers of cpa still getting error here.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:02 +01:00
venkatesh.pallipadi@intel.com cdecff6864 x86 PAT: return compatible mapping to remap_pfn_range callers
Impact: avoid warning message, potentially solve 3D performance regression

Change x86 PAT code to return compatible memtype if the exact memtype that
was requested in remap_pfn_rage and friends is not available due to some
conflict.

This is done by returning the compatible type in pgprot parameter of
track_pfn_vma_new(), and the caller uses that memtype for page table.

Note that track_pfn_vma_copy() which is basically called during fork gets the
prot from existing page table and should not have any conflict. Hence we use
strict memtype check there and do not allow compatible memtypes.

This patch fixes the bug reported here:

  http://marc.info/?l=linux-kernel&m=123108883716357&w=2

Specifically the error message:

  X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
  got write-combining

Should go away.

Reported-and-bisected-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:02 +01:00
venkatesh.pallipadi@intel.com e4b866ed19 x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No fuctionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:01 +01:00
venkatesh.pallipadi@intel.com afc7d20c84 x86 PAT: consolidate old memtype new memtype check into a function
Impact: cleanup

Move the new memtype old memtype allowed check to header so that is can be
shared by other users. Subsequent patch uses this in pat.c in remap_pfn_range()
code path. No functionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:00 +01:00
Andi Kleen c8399943bd x86, generic: mark complex bitops.h inlines as __always_inline
Impact: reduce kernel image size

Hugh Dickins noticed that older gcc versions when the kernel
is built for code size didn't inline some of the bitops.

Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig like build:

    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 18:56:30 +01:00
Ingo Molnar 4a922a969c x86, cpufreq: remove leftover copymask_copy()
Impact: fix potential boot crash on MAXSMP

Remove code left over by:

  50c668d: Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read

That cmd.cpumask is not allocated anymore. No impact on default !MAXSMP
kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 16:11:00 +01:00
Ingo Molnar e8cea892df Revert "i386: add TRACE_IRQS_OFF for the nmi"
This reverts commit e0c7317557.

This patch was wrong, as lockdep (and thus the irq state tracer)
aren't nmi safe. People are already seeing lockdep warnings due
to this.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:36:59 +01:00
Ingo Molnar 50c668d678 Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write"
This reverts commit 7503bfbae8.

Dieter Ries reported bootup soft-hangs and bisected it back to
this commit, and reverting this commit gave him a working system.

The commit introduces work_on_cpu() use into the cpufreq code,
but that is subtly problematic from a lock hierarchy POV: the
hotplug-cpu lock is an highlevel lock that is taken before
lowlevel locks, and in this codepath we are called with the
policy lock taken.

Dieter did not have lockdep enabled so we dont have a nice stack
trace proof for this, but using work_on_cpu() in such a lowlevel
place certainly looks wrong, so we revert the patch.

work_on_cpu() needs to be reworked to be more generally usable.

Reported-by: Dieter Ries <clip2@gmx.de>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:24:23 +01:00
Jaswinder Singh Rajput 2bc1379712 x86: fix apic.c build error on latest git
Fix this by reintroducing asm/smp.h include in apic.c - later on
I will fix this by removing non-smp data from smp.h

Also fix the __inquire_remote_apic() prototype/inline.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:24:23 +01:00
Jaswinder Singh Rajput 4884d8e6a0 x86: fix mpparse.c build error on latest git
Fix this by reintroducing asm/smp.h include in mpparse.c - later on
I will fix this by removing non-smp data from smp.h.

Reported-by: Petr Titera <P.Titera@century.cz>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:24:22 +01:00
Andi Kleen f313e12308 x86: avoid theoretical vmalloc fault loop
Ajith Kumar noticed:

 I was going through the vmalloc fault handling for x86_64 and am unclear
 about the following lines in the vmalloc_fault() function.

 pgd = pgd_offset(current->mm ?: &init_mm, address);
 pgd_ref = pgd_offset_k(address);

 Here the intention is to get the pgd corresponding to the current process
 and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
 However, for kernel threads current->mm is NULL and hence pgd =
 pgd_offset(init_mm, address) = pgd_ref which means the fault handler
 returns without setting the pgd entry in the MM structure in the context
 of which the kernel thread has faulted.  This could lead to never-ending
 faults and busy looping of kernel threads like pdflush.  So, shouldn't the
 pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
 pgd_offset(current->active_mm ?: &init_mm, address);

We can use active_mm unconditionally because it should be always set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 19:24:21 +01:00
Peter Zijlstra 6d612b0f94 locking, hpet: annotate false positive warning
Alexander Beregalov reported that this warning is caused by the HPET code:

> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 14.318180 MHz counter
> ODEBUG: object is on stack, but not annotated
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:251 __debug_object_init+0x2a4/0x352()

> Bisected down to 26afe5f2fb
> (x86: HPET_MSI Initialise per-cpu HPET timers)

The commit is fine - but the on-stack workqueue entry needs annotation.

Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-12 13:33:20 +01:00
Ingo Molnar f45ac22ae2 Merge commit 'v2.6.29-rc1' into x86/urgent 2009-01-11 03:03:30 +01:00
Linus Torvalds 3d14bdad40 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86: fix section mismatch warnings in mcheck/mce_amd_64.c
  x86: offer frame pointers in all build modes
  x86: remove duplicated #include's
  x86: k8 numa register active regions later
  x86: update Alan Cox's email addresses
  x86: rename all fields of mpc_table mpc_X to X
  x86: rename all fields of mpc_oemtable oem_X to X
  x86: rename all fields of mpc_bus mpc_X to X
  x86: rename all fields of mpc_cpu mpc_X to X
  x86: rename all fields of mpc_intsrc mpc_X to X
  x86: rename all fields of mpc_lintsrc mpc_X to X
  x86: rename all fields of mpc_iopic mpc_X to X
  x86: irqinit_64.c init_ISA_irqs should be static
  Documentation/x86/boot.txt: payload length was changed to payload_length
  x86: setup_percpu.c fix style problems
  x86: irqinit_64.c fix style problems
  x86: irqinit_32.c fix style problems
  x86: i8259.c fix style problems
  x86: irq_32.c fix style problems
  x86: ioport.c fix style problems
  ...
2009-01-10 06:13:09 -08:00
Linus Torvalds 4e9b1c184c Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  [IA64] fix typo in cpumask_of_pcibus()
  x86: fix x86_32 builds for summit and es7000 arch's
  cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
  cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  cpumask: use cpumask_var_t in acpi-cpufreq.c
  cpumask: use work_on_cpu in acpi/cstate.c
  cpumask: convert struct cpufreq_policy to cpumask_var_t
  cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
  x86: cleanup remaining cpumask_t ops in smpboot code
  cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
  cpumask: update local_cpus_show to use new cpumask API
  ia64: cpumask fix for is_affinity_mask_valid()
2009-01-10 06:12:18 -08:00
Linus Torvalds c4295fbb60 x86: make 'constant_test_bit()' take an unsigned bit number
Ingo noticed that using signed arithmetic seems to confuse the gcc
inliner, and make it potentially decide that it's all too complicated.

(Yeah, yeah, it's a constant. It's always positive. Still..)

Based-on: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 12:49:50 -08:00
Andi Kleen 8659c406ad x86: only scan the root bus in early PCI quirks
We found a situation on Linus' machine that the Nvidia timer quirk hit on
a Intel chipset system.  The problem is that the system has a fancy Nvidia
card with an own PCI bridge, and the early-quirks code looking for any
NVidia bridge triggered on it incorrectly.  This didn't lead a boot
failure by luck, but the timer routing code selecting the wrong timer
first and some ugly messages.  It might lead to real problems on other
systems.

I checked all the devices which are currently checked for by early_quirks
and it turns out they are all located in the root bus zero.

So change the early-quirks loop to only scan bus 0.  This incidently also
saves quite some unnecessary scanning work, because early_quirks doesn't
go through all the non root busses.

The graphics card is not on bus 0, so it is not matched anymore.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 12:46:22 -08:00
Linus Torvalds 4ce5f24193 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
  powerpc/oprofile: fix whitespaces in op_model_cell.c
  powerpc/oprofile: IBM CELL: add SPU event profiling support
  powerpc/oprofile: fix cell/pr_util.h
  powerpc/oprofile: IBM CELL: cleanup and restructuring
  oprofile: make new cpu buffer functions part of the api
  oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
  ring_buffer: fix ring_buffer_event_length()
  oprofile: use new data sample format for ibs
  oprofile: add op_cpu_buffer_get_data()
  oprofile: add op_cpu_buffer_add_data()
  oprofile: rework implementation of cpu buffer events
  oprofile: modify op_cpu_buffer_read_entry()
  oprofile: add op_cpu_buffer_write_reserve()
  oprofile: rename variables in add_ibs_begin()
  oprofile: rename add_sample() in cpu_buffer.c
  oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
  oprofile: making add_sample_entry() inline
  oprofile: remove backtrace code for ibs
  oprofile: remove unused ibs macro
  oprofile: remove unused components in struct oprofile_cpu_buffer
  ...
2009-01-09 12:43:06 -08:00
Len Brown b2576e1d44 Merge branch 'linus' into release 2009-01-09 03:39:43 -05:00
Len Brown 3cc8a5f4ba Merge branch 'suspend' into release 2009-01-09 03:38:15 -05:00
Len Brown d0302bc62a Merge branch 'misc' into release
Conflicts:
	include/acpi/acpixf.h

Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 03:37:48 -05:00
Len Brown b8ef914e58 Merge branches 'release', 'bugzilla-11880', 'bugzilla-12037' and 'bugzilla-12257' into release 2009-01-09 03:37:11 -05:00
Zhao Yakui 237889bf0a ACPI : Use RSDT instead of XSDT by adding boot option of "acpi=rsdt"
On some boxes there exist both RSDT and XSDT table. But unfortunately
sometimes there exists the following error when XSDT table is used:
   a. 32/64X address mismatch
   b. The 32/64X FACS address mismatch

   In such case the boot option of "acpi=rsdt" is provided so that
RSDT is tried instead of XSDT table when the system can't work well.

http://bugzilla.kernel.org/show_bug.cgi?id=8246

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
cc:Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 01:41:58 -05:00
Zhao Yakui 13b40a1a06 ACPI: Avoid array address overflow when _CST MWAIT hint bits are set
The Cx Register address obtained from the _CST object is used as the MWAIT
hints if the register type is FFixedHW. And it is used to check whether
the Cx type is supported or not.

On some boxes the following Cx state package is obtained from _CST object:
    >{
                ResourceTemplate ()
                {
                    Register (FFixedHW,
                        0x01,               // Bit Width
                        0x02,               // Bit Offset
                        0x0000000000889759, // Address
                        0x03,               // Access Size
                        )
                },

                0x03,
                0xF5,
                0x015E }

   In such case we should use the bit[7:4] of Cx address to check whether
the Cx type is supported or not.

mask the MWAIT hint to avoid array address overflow

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by:Venki Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 01:28:01 -05:00
Fernando Carrijo c19a28e119 remove lots of double-semicolons
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:14 -08:00
Kyle McMartin 79f3b3cb7a x86, mtrr: fix types used in userspace exported header
Commit 932d27a791 exported some mtrr
structures without using the exportable __uX types, causing userspace
build failures.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-08 16:13:59 +01:00
Robert Richter d2852b932f Merge branch 'oprofile/ring_buffer' into oprofile/oprofile-for-tip 2009-01-08 14:27:34 +01:00
Harvey Harrison 9b4778f680 trivial: replace last usages of __FUNCTION__ in kernel
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 15:48:54 -08:00
Linus Torvalds b424e8d3b4 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
  PCI PM: Put PM callbacks in the order of execution
  PCI PM: Run default PM callbacks for all devices using new framework
  PCI PM: Register power state of devices during initialization
  PCI PM: Call pci_fixup_device from legacy routines
  PCI PM: Rearrange code in pci-driver.c
  PCI PM: Avoid touching devices behind bridges in unknown state
  PCI PM: Move pci_has_legacy_pm_support
  PCI PM: Power-manage devices without drivers during suspend-resume
  PCI PM: Add suspend counterpart of pci_reenable_device
  PCI PM: Fix poweroff and restore callbacks
  PCI: Use msleep instead of cpu_relax during ASPM link retraining
  PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
  PCI: PCIe portdrv: Rearrange code so that related things are together
  PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
  PCI: PCIe portdrv: Add kerneldoc comments to some core functions
  x86/PCI: Do not use interrupt links for devices using MSI-X
  net: sfc: Use pci_clear_master() to disable bus mastering
  PCI: Add pci_clear_master() as opposite of pci_set_master()
  PCI hotplug: remove redundant test in cpq hotplug
  PCI: pciehp: cleanup register and field definitions
  ...
2009-01-07 15:41:01 -08:00
Robert Richter 14f0ca8eae oprofile: make new cpu buffer functions part of the api
This patch creates the new functions

 oprofile_write_reserve()
 oprofile_add_data()
 oprofile_write_commit()

and makes them part of the oprofile api.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-01-07 22:48:15 +01:00
Robert Richter 1acda878e2 oprofile: use new data sample format for ibs
The new ring buffer implementation allows the storage of samples with
different size. This patch implements the usage of the new sample
format to store ibs samples in the cpu buffer. Until now, writing to
the cpu buffer could lead to incomplete sampling sequences since IBS
samples were transfered in multiple samples. Due to a full buffer,
data could be lost at any time. This can't happen any more since the
complete data is reserved in advance and then stored in a single
sample.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-01-07 22:47:23 +01:00
Robert Richter ae735e9964 oprofile: rework implementation of cpu buffer events
Special events such as task or context switches are marked with an
escape code in the cpu buffer followed by an event code or a task
identifier. There is one escape code per event. To make escape
sequences also available for data samples the internal cpu buffer
format must be changed. The current implementation does not allow the
extension of event codes since this would lead to collisions with the
task identifiers. To avoid this, this patch introduces an event mask
that allows the storage of multiple events with one escape code. Now,
task identifiers are stored in the data section of the sample. The
implementation also allows the usage of custom data in a sample. As a
side effect the new code is much more readable and easier to
understand.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-01-07 22:40:47 +01:00
Robert Richter fc81be8ca2 oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
This patch renames ibs_allowed to has_ibs. Varible name fits better
now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-01-07 22:34:21 +01:00
Linus Torvalds 57c44c5f6f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  trivial: chack -> check typo fix in main Makefile
  trivial: Add a space (and a comma) to a printk in 8250 driver
  trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
  trivial: Fix misspelling of "firmware" in powerpc Makefile
  trivial: Fix misspelling of "firmware" in usb.c
  trivial: Fix misspelling of "firmware" in qla1280.c
  trivial: Fix misspelling of "firmware" in a100u2w.c
  trivial: Fix misspelling of "firmware" in megaraid.c
  trivial: Fix misspelling of "firmware" in ql4_mbx.c
  trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
  trivial: Fix misspelling of "firmware" in ipw2100.c
  trivial: Fix misspelling of "firmware" in atmel.c
  trivial: Fix misspelled firmware in Kconfig
  trivial: fix an -> a typos in documentation and comments
  trivial: fix then -> than typos in comments and documentation
  trivial: update Jesper Juhl CREDITS entry with new email
  trivial: fix singal -> signal typo
  trivial: Fix incorrect use of "loose" in event.c
  trivial: printk: fix indentation of new_text_line declaration
  trivial: rtc-stk17ta8: fix sparse warning
  ...
2009-01-07 11:31:52 -08:00
Rafael J. Wysocki 16cf0ebc35 x86/PCI: Do not use interrupt links for devices using MSI-X
pcibios_enable_device() and pcibios_disable_device() don't handle
IRQs for devices that have MSI enabled and it should treat the
devices with MSI-X enabled in the same way.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:25 -08:00
Bjorn Helgaas 2b8c2efe44 x86/PCI: use dev_printk for PCI bus locality messages
Since pci_bus has a struct device, use dev_printk directly instead
of faking it by hand.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:19 -08:00
Bjorn Helgaas 904d6a3033 PCI: x86/visws: use generic INTx swizzle from PCI core
Use the generic pci_common_swizzle() instead of arch-specific code.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:16 -08:00
Bjorn Helgaas b1c86792a0 PCI: x86: use generic pci_swizzle_interrupt_pin()
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:54 -08:00
Bjorn Helgaas 12b955ff63 x86/PCI: minor logic simplications
Test "pin" immediately to simplify the subsequent code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:49 -08:00
Bjorn Helgaas f672c392b9 x86/PCI: use config space encoding for interrupt pins
Keep "pin" encoded as it is in the "Interrupt Pin" value in PCI config
space, i.e., 0=device doesn't use interrupts, 1=INTA, ..., 4=INTD.

This makes the bridge INTx swizzle match other architectures.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:49 -08:00
Bjorn Helgaas 878f2e50fd PCI: use config space encoding in pci_get_interrupt_pin()
This patch makes pci_get_interrupt_pin() return values encoded
the same way as the "Interrupt Pin" value in PCI config space,
i.e., 1=INTA, ..., 4=INTD.

pirq_bios_set() is the only in-tree caller of pci_get_interrupt_pin()
and pci_get_interrupt_pin() is not exported.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:48 -08:00
Jacob Pan 23a3600274 PCI: avoid early PCI mmconfig init if pci=noearly is given in cmdline
Early type 1 accesses can cause problems on some platforms, and
pci=noearly is supposed to prevent them from occurring.  However, early
mcfg probing code uses type 1 and  isn't protected by a check for
noearly.  This patch fixes that problem.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:46 -08:00
Bjorn Helgaas 0663a36284 x86/PCI: make PCI bus locality messages more meaningful
Change PCI bus locality messages so they have a bit more context
and look like the rest of PCI, e.g.,

    - bus 01 -> node 0
    - bus 04 -> node 0
    + pci 0000:01: bus on NUMA node 0
    + pci 0000:04: bus on NUMA node 0

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:45 -08:00
Ingo Molnar 104bafcfab PCI: Don't carp about BAR allocation failures in quiet boot
These are easy to trigger (more or less harmlessly) with multiple video
cards, since the ROM BAR will typically not be given any space by the
BIOS bridge setup.  No reason to punish quiet boot for this.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:42 -08:00
Arjan van de Ven e8de1481fd resource: allow MMIO exclusivity for device drivers
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.

This patch adds the option for device drivers to cause their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.

In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
drivers issues from userspace.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:32 -08:00
Andrew Patterson 0ef5f8f615 ACPI/PCI: PCI extended config _OSC support called when root bridge added
The _OSC capability OSC_EXT_PCI_CONFIG_SUPPORT is set when the root
bridge is added with pci_acpi_osc_support() if we can access PCI
extended config space.

This adds the function pci_ext_cfg_avail which returns true if we can
access PCI extended config space (offset greater than 0xff). It
currently only returns false if arch=x86 and raw_pci_ext_ops is not set
(which might happen if pci=nommcfg is set on the kernel command-line).

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:28 -08:00
Kay Sievers 1a9271331a PCI: struct device - replace bus_id with dev_name(), dev_set_name()
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".

To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.

We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:23 -08:00
Leonardo Potenza 51d7a1398d x86: fix section mismatch warnings in mcheck/mce_amd_64.c
Mark the function local_allocate_threshold_blocks() with __cpuinit,
in order to remove the following section mismatch messages:

WARNING: arch/x86/kernel/cpu/mcheck/built-in.o(.text+0x1363): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.

WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x1def): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0xef2b): Section mismatch in reference from the function local_allocate_threshold_blocks() to the function .cpuinit.text:allocate_threshold_blocks()
The function local_allocate_threshold_blocks() references
the function __cpuinit allocate_threshold_blocks().
This is often because local_allocate_threshold_blocks lacks a __cpuinit
annotation or the annotation of allocate_threshold_blocks is wrong.

All the callsites of this function are __cpuinit already, and all the
functions it calls are __cpuinit as well.

Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-07 12:23:35 +01:00
Ingo Molnar da4276b829 x86: offer frame pointers in all build modes
CONFIG_FRAME_POINTERS=y results in much better debug info for the
kernel (clear and precise backtraces), with the only drawback being
a ~1% increase in kernel size.

So offer it unconditionally and enable it by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-07 11:18:59 +01:00
Harvey Harrison 5d30a68388 x86: introduce asm/swab.h
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 18:10:27 -08:00
Linus Torvalds f94181da71 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: fix rcutorture bug
  rcu: eliminate synchronize_rcu_xxx macro
  rcu: make treercu safe for suspend and resume
  rcu: fix rcutree grace-period-latency bug on small systems
  futex: catch certain assymetric (get|put)_futex_key calls
  futex: make futex_(get|put)_key() calls symmetric
  locking, percpu counters: introduce separate lock classes
  swiotlb: clean up EXPORT_SYMBOL usage
  swiotlb: remove unnecessary declaration
  swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
  swiotlb: add support for systems with highmem
  swiotlb: store phys address in io_tlb_orig_addr array
  swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()
2009-01-06 17:10:04 -08:00
Masami Hiramatsu 1294156078 kprobes: add kprobe_insn_mutex and cleanup arch_remove_kprobe()
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.

This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:20 -08:00
Alexey Dobriyan f1883f86de Remove remaining unwinder code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Gabor Gombas <gombasg@sztaki.hu>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:11 -08:00
Matthew Wilcox ea43546750 atomic_t: unify all arch definitions
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h.  Move the type definition
to linux/types.h to break the loop.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:10 -08:00
Gary Hade c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Nick Piggin 1c0fe6e3bd mm: invoke oom-killer from page fault
Rather than have the pagefault handler kill a process directly if it gets
a VM_FAULT_OOM, have it call into the OOM killer.

With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
oom killing throttling, oom priority adjustment or selective disabling,
panic on oom, etc), it's silly to unconditionally kill the faulting
process at page fault time.  Create a hook for pagefault oom path to call
into instead.

Only converted x86 and uml so far.

[akpm@linux-foundation.org: make __out_of_memory() static]
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Ingo Molnar 0936912274 Merge branches 'x86/cleanups', 'x86/mpparse', 'x86/numa' and 'x86/uv' into x86/urgent 2009-01-06 17:39:52 +01:00
Huang Weiyi 9e9197370d x86: remove duplicated #include's
Removed duplicated #include's in:

  arch/x86/kernel/mpparse.c
  arch/x86/kernel/nmi.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 13:34:03 +01:00
Mike Travis 4d9f94319c x86: fix x86_32 builds for summit and es7000 arch's
Fix the following build errors reported by Yinghai Lu:

| In file included from arch/x86/mach-generic/summit.c:16:
| tip/linux-2.6/arch/x86/include/asm/summit/apic.h:
| In function 'cpu_mask_to_apicid_and':
| tip/linux-2.6/arch/x86/include/asm/summit/apic.h:179:
| error: 'GFP_ATOMIC' undeclared (first use in this function)

Reported-by: Yinghai Lu <Yinghai.Lu@Sun.COM>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 13:26:50 +01:00
Yinghai Lu 40bcc69b39 x86: k8 numa register active regions later
Impact: cleanup

don't register early, so we don't need to clear actived regions if it fail
to get node hash shift or wild set in nb config.

also remove nodeids array that is not needed

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 13:21:21 +01:00
Frederik Schwarzer 025dfdafe7 trivial: fix then -> than typos in comments and documentation
- (better, more, bigger ...) then -> (...) than

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:06 +01:00
Ingo Molnar fdbc0450df Merge branches 'core/futexes', 'core/locking', 'core/rcu' and 'linus' into core/urgent 2009-01-06 09:32:11 +01:00
Mike Travis e39ad415ac cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
Impact: use new cpumask API to reduce stack usage

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for read_measured_perf_ctrs().

Basically splits off the work function from get_measured_perf which is
run on the designated cpu.  Moves definition of struct perf_cur out of
function local namespace, and is used as the work function argument.
References in get_measured_perf use values in the perf_cur struct.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:43 +01:00
Mike Travis 7503bfbae8 cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
Impact: use new cpumask API to reduce stack usage

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().

Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:40 +01:00
Mike Travis 4d8bb53749 cpumask: use cpumask_var_t in acpi-cpufreq.c
Impact: cleanup, reduce stack usage, use new cpumask API.

Replace the cpumask_t in struct drv_cmd with a cpumask_var_t.  Remove unneeded
online_policy_cpus cpumask_t in acpi_cpufreq_target.  Update refs to use
new cpumask API.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:37 +01:00
Mike Travis c74f31c035 cpumask: use work_on_cpu in acpi/cstate.c
Impact: use new cpumask API to reduce stack usage

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for the acpi_processor_ffh_cstate_probe() function.

Basically splits acpi_processor_ffh_cstate_probe() into two functions, the
other being acpi_processor_ffh_cstate_probe_cpu which is the work function
run on the designated cpu.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:34 +01:00
Rusty Russell 835481d9bc cpumask: convert struct cpufreq_policy to cpumask_var_t
Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:31 +01:00
Rusty Russell 5cb0535f17 cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
Impact: cleanup

There's only one user, and it's a fairly easy conversion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:28 +01:00
Ingo Molnar d12418fdea Merge branch 'linus' into cpus4096 2009-01-06 09:04:32 +01:00