Commit graph

4898 commits

Author SHA1 Message Date
Alexander van Heukelum 2e04bc7656 i386: fix return to 16-bit stack from NMI handler
Returning to a task with a 16-bit stack requires special care: the iret
instruction does not restore the high word of esp in that case. The
espfix code fixes this, but currently is not invoked on NMIs. This means
that a running task gets the upper word of esp clobbered due intervening
NMIs. To reproduce, compile and run the following program with the nmi
watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
see that the high bits of esp contain garbage, while the low bits are
still correct.

This patch puts the espfix code back into the NMI code path.

The patch is slightly complicated due to the irqtrace infrastructure not
being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
Otherwise, the tail of the normal iret-code is correct for the nmi code
path too. To be able to share this code-path, the TRACE_IRQS_IRET was
move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
this code explicitly disables interrupts. This short interrupts-off
section is now not traced anymore. The return-to-kernel path now always
includes the preliminary test to decide if the espfix code should be
called. This is never the case, but doing it this way keeps the patch as
simple as possible and the few extra instructions should not affect
timing in any significant way.

 #define _GNU_SOURCE
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <asm/ldt.h>

int modify_ldt(int func, void *ptr, unsigned long bytecount)
{
        return syscall(SYS_modify_ldt, func, ptr, bytecount);
}

/* this is assumed to be usable */
 #define SEGBASEADDR 0x10000
 #define SEGLIMIT 0x20000

/* 16-bit segment */
struct user_desc desc = {
        .entry_number = 0,
        .base_addr = SEGBASEADDR,
        .limit = SEGLIMIT,
        .seg_32bit = 0,
        .contents = 0, /* ??? */
        .read_exec_only = 0,
        .limit_in_pages = 0,
        .seg_not_present = 0,
        .useable = 1
};

int main(void)
{
        setvbuf(stdout, NULL, _IONBF, 0);

        /* map a 64 kb segment */
        char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                        PROT_EXEC|PROT_READ|PROT_WRITE,
                        MAP_SHARED|MAP_ANONYMOUS, -1, 0);
        if (pointer == NULL) {
                printf("could not map space\n");
                return 0;
        }

        /* write ldt, new mode */
        int err = modify_ldt(0x11, &desc, sizeof(desc));
        if (err) {
                printf("error modifying ldt: %i\n", err);
                return 0;
        }

        for (int i=0; i<1000; i++) {
        asm volatile (
                "pusha\n\t"
                "mov %ss, %eax\n\t" /* preserve ss:esp */
                "mov %esp, %ebp\n\t"
                "push $7\n\t" /* index 0, ldt, user mode */
                "push $65536-4096\n\t" /* esp */
                "lss (%esp), %esp\n\t" /* switch to new stack */
                "push %eax\n\t" /* save old ss:esp on new stack */
                "push %ebp\n\t"
                "add $17*65536, %esp\n\t" /* set high bits */
                "mov %esp, %edx\n\t"

                "mov $10000000, %ecx\n\t" /* wait... */
                "1: loop 1b\n\t" /* ... a bit */

                "cmp %esp, %edx\n\t"
                "je 1f\n\t"
                "ud2\n\t" /* esp changed inexplicably! */
                "1:\n\t"
                "sub $17*65536, %esp\n\t" /* restore high bits */
                "lss (%esp), %esp\n\t" /* restore old ss:esp */
                "popa\n\t");

                printf("\rx%ix", i);
        }

        return 0;
}

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:35:09 -07:00
Cyrill Gorcunov 3f4c3955ea x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
Vegard Nossum reported:

[  503.576724] ACPI: Preparing to enter system sleep state S5
[  503.710857] Disabling non-boot CPUs ...
[  503.716853] Power down.
[  503.717770] ------------[ cut here ]------------
[  503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du)
[  503.717770] Hardware name: OptiPlex GX100
[  503.717770] Modules linked in:
[  503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443
[  503.717770] Call Trace:
[  503.717770]  [<c154d327>] ? printk+0x18/0x1a
[  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c10360fc>] warn_slowpath_common+0x6c/0xc0
[  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c1036165>] warn_slowpath_null+0x15/0x20
[  503.717770]  [<c1017358>] native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c1017173>] disconnect_bsp_APIC+0x63/0x100
[  503.717770]  [<c1019e48>] disable_IO_APIC+0xb8/0xc0
[  503.717770]  [<c1214231>] ? acpi_power_off+0x0/0x29
[  503.717770]  [<c1015e55>] native_machine_shutdown+0x65/0x80
[  503.717770]  [<c1015c36>] native_machine_power_off+0x26/0x30
[  503.717770]  [<c1015c49>] machine_power_off+0x9/0x10
[  503.717770]  [<c1046596>] kernel_power_off+0x36/0x40
[  503.717770]  [<c104680d>] sys_reboot+0xfd/0x1f0
[  503.717770]  [<c109daa0>] ? perf_swcounter_event+0xb0/0x130
[  503.717770]  [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120
[  503.717770]  [<c102dfc6>] ? finish_task_switch+0x56/0xd0
[  503.717770]  [<c154da1e>] ? schedule+0x49e/0xb40
[  503.717770]  [<c10444b0>] ? sys_kill+0x70/0x160
[  503.717770]  [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50
[  503.717770]  [<c10dd443>] ? sys_ioctl+0x63/0x70
[  503.717770]  [<c1003024>] sysenter_do_call+0x12/0x22
[  503.717770] ---[ end trace 8157b5d0ed378f15 ]---

|
| That's including this commit:
|
| commit 103428e57b
|Author: Cyrill Gorcunov <gorcunov@openvz.org>
|Date:   Sun Jun 7 16:48:40 2009 +0400
|
|    x86, apic: Fix dummy apic read operation together with broken MP handling
|

If we have apic disabled we don't even switch to APIC mode and do not
calling for connect_bsp_APIC. Though on SMP compiled kernel the
native_machine_shutdown does try to write the apic register anyway.

Fix it with explicit check if we really should touch apic registers.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090617181322.GG10822@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 20:24:39 +02:00
Huang Weiyi 8653f88ff9 x86: Remove duplicated #include's
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 19:02:35 +02:00
Prarit Bhargava fe955e5c79 x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
(cpuid family-6, model-f, stepping-4).  Resolves a situation where the NMI
would not enable on these processors.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: prarit@redhat.com
Cc: suresh.b.siddha@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:20:39 +02:00
Jaswinder Singh Rajput 8f7007aabe x86: apic/io_apic.c: dmar_msi_type should be static
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:16:08 +02:00
Figo.zhang 50a8d4d297 x86, io_apic.c: Work around compiler warning
This compiler warning:

  arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
  arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
  arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here

Is bogus as 'eu' is always initialized. But annotate it away by
initializing the variable, to make it easier for people to notice
real warnings. A compiler that sees through this logic will
optimize away the initialization.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
LKML-Reference: <1245248720.3312.27.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:13:25 +02:00
Cyrill Gorcunov 5ce4243dce x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
If APIC was disabled (for some reason) and as result
it's not even mapped we should not try to enable thermal
interrupts at all.

Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090615182633.GA7606@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:10:22 +02:00
Andi Kleen 203abd67b7 x86: mce: Handle banks == 0 case in K7 quirk
Vegard Nossum reported:

> I get an MCE-related crash like this in latest linus tree:
>
> [    0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.116396] CPU: L2 Cache: 512K (64 bytes/line)
> [    0.120570] mce: CPU supports 0 MCE banks
> [    0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010
> [    0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [    0.128001] PGD 0
> [    0.128001] Thread overran stack, or stack corrupted
> [    0.128001] Oops: 0002 [#1] PREEMPT SMP
> [    0.128001] last sysfs file:
> [    0.128001] CPU 0
> [    0.128001] Modules linked in:
> [    0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
> [    0.128001] RIP: 0010:[<ffffffff813b98ad>]  [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [    0.128001] RSP: 0018:ffffffff81595e38  EFLAGS: 00000246
> [    0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
> [    0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
> [    0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
> [    0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
> [    0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
> [    0.128001] FS:  0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000
> 00000000000
> [    0.128001] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [    0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
> [    0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [    0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff
> ff8152a4a0)
> [    0.128001] Stack:
> [    0.128001]  0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f
> 914
> [    0.128001]  ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8
> 69c
> [    0.128001]  5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb
> e6e
> [    0.128001] Call Trace:
> [    0.128001]  [<ffffffff813b869c>] identify_cpu+0x331/0x392
> [    0.128001]  [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
> [    0.128001]  [<ffffffff815a14ac>] check_bugs+0x1c/0x60
> [    0.128001]  [<ffffffff8159c075>] start_kernel+0x403/0x46e
> [    0.128001]  [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
> [    0.128001]  [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
> [    0.128001]  [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71

This happens on QEMU which reports MCA capability, but no banks.
Without this patch there is a buffer overrun and boot ops because
the code would try to initialize the 0 element of a zero length
kmalloc() buffer.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 08:59:45 +02:00
Ingo Molnar cc4949e1fd Merge branch 'linus' into x86/urgent
Merge reason: pull in latest to fix a bug in it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 08:59:10 +02:00
Cliff Wickman e2a7147640 x86: correct the conversion of EFI memory types
This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
in the e820 table as type E820_RESERVED.

(This patch replaces one called 'x86: vendor reserved memory type'.
 This version has been discussed a bit with Peter and Yinghai but not given
 a final opinion.)

Without this patch EFI_RESERVED_TYPE memory reservations may be
marked usable in the e820 table. There may be a collision between
kernel use and some reserver's use of this memory.

(An example use of this functionality is the UV system, which
 will access extremely large areas of memory with a memory engine
 that allows a user to address beyond the processor's range.  Such
 areas are reserved in the EFI table by the BIOS.
 Some loaders have a restricted number of entries possible in the e820 table,
 hence the need to record the reservations in the unrestricted EFI table.)

The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
on the kernel command line.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 17:47:32 -07:00
H. Peter Anvin 95ee14e437 x86: cap iomem_resource to addressable physical memory
iomem_resource is by default initialized to -1, which means 64 bits of
physical address space if 64-bit resources are enabled.  However, x86
CPUs cannot address 64 bits of physical address space.  Thus, we want
to cap the physical address space to what the union of all CPU can
actually address.

Without this patch, we may end up assigning inaccessible values to
uninitialized 64-bit PCI memory resources.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: stable@kernel.org
2009-06-16 17:47:31 -07:00
Ingo Molnar 8a4a6182fd Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-06-16 11:51:24 +02:00
Chris Wright 6a047d8b9e amd-iommu: resume cleanup
Now that enable_iommus() will call iommu_disable() for each iommu,
the call to disable_iommus() during resume is redundant.  Also, the order
for an invalidation is to invalidate device table entries first, then
domain translations.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-16 10:19:16 +02:00
Linus Torvalds 19035e5b5d Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: Logic to move non pinned timers
  timers: /proc/sys sysctl hook to enable timer migration
  timers: Identifying the existing pinned timers
  timers: Framework for identifying pinned timers
  timers: allow deferrable timers for intervals tv2-tv5 to be deferred

Fix up conflicts in kernel/sched.c and kernel/timer.c manually
2009-06-15 10:06:19 -07:00
Joerg Roedel 09067207f6 amd-iommu: set event buffer head and tail to 0 manually
These registers may contain values from previous kernels. So reset them
to known values before enable the event buffer again.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 16:06:48 +02:00
Chris Wright a8c485bb68 amd-iommu: disable cmd buffer and evt logging before reprogramming iommu
The IOMMU spec states that IOMMU behavior may be undefined when the
IOMMU registers are rewritten while command or event buffer is enabled.
Disable them in IOMMU disable path.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:53:45 +02:00
Chris Wright 42a49f965a amd-iommu: flush domain tlb when attaching a new device
When kexec'ing to a new kernel (for example, when crashing and launching
a kdump session), the AMD IOMMU may have cached translations.  The kexec'd
kernel, during initialization, will invalidate the IOMMU device table
entries, but not the domain translations.  These stale entries can cause
a device's DMA to fail, makes it rough to write a dump to disk when the
disk controller can't DMA ;-)

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:42:00 +02:00
Joerg Roedel 61d047be99 x86: disable IOMMUs on kernel crash
If the IOMMUs are still enabled when the kexec kernel boots access to
the disk is not possible. This is bad for tools like kdump or anything
else which wants to use PCI devices.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:20:40 +02:00
Joerg Roedel 0975904276 amd-iommu: disable IOMMU hardware on shutdown
When the IOMMU stays enabled the BIOS may not be able to finish the
machine shutdown properly. So disable the hardware on shutdown.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:20:40 +02:00
Thomas Gleixner 507fa3a3d8 x86: hpet: Mark per cpu interrupts IRQF_TIMER to prevent resume failure
timer interrupts are excluded from being disabled during suspend. The
clock events code manages the disabling of clock events on its own
because the timer interrupt needs to be functional before the resume
code reenables the device interrupts.

The hpet per cpu timers request their interrupt without setting the
IRQF_TIMER flag so suspend_device_irqs() disables them as well which
results in a fatal resume failure on the boot CPU.

Adding IRQF_TIMER to the interupt flags when requesting the hpet per
cpu timer interrupts solves the problem.

Reported-by: Benjamin S. <sbenni@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Benjamin S. <sbenni@gmx.de>
Cc: stable@kernel.org
2009-06-14 18:24:29 +02:00
Linus Torvalds a2ee2981ae Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
  x86, mce: Add boot options for corrected errors
  x86, mce: Fix mce printing
  x86, mce: fix for mce counters
  x86, mce: support action-optional machine checks
  x86, mce: define MCE_VECTOR
  x86, mce: rename mce_notify_user to mce_notify_irq
  x86: fix panic with interrupts off (needed for MCE)
  x86, mce: export MCE severities coverage via debugfs
  x86, mce: implement new status bits
  x86, mce: print header/footer only once for multiple MCEs
  x86, mce: default to panic timeout for machine checks
  x86, mce: improve mce_get_rip
  x86, mce: make non Monarch panic message "Fatal machine check" too
  x86, mce: switch x86 machine check handler to Monarch election.
  x86, mce: implement panic synchronization
  x86, mce: implement bootstrapping for machine check wakeups
  x86, mce: check early in exception handler if panic is needed
  x86, mce: add table driven machine check grading
  x86, mce: remove TSC print heuristic
  x86, mce: log corrected errors when panicing
  ...
2009-06-13 13:14:51 -07:00
Linus Torvalds 947ec0b0c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add empty suspend/resume device irq functions
  PM/Hibernate: Move NVS routines into a seperate file (v2).
  PM/Hibernate: Rename disk.c to hibernate.c
  PM: Separate suspend to RAM functionality from core
  Driver Core: Rework platform suspend/resume, print warning
  PM: Remove device_type suspend()/resume()
  PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  PM/Suspend: Do not shrink memory before suspend
  PM: Remove bus_type suspend_late()/resume_early() V2
  PM core: rename suspend and resume functions
  PM: Rename device_power_down/up()
  PM: Remove unused asm/suspend.h
  x86: unify power/cpu_(32|64).c
  x86: unify power/cpu_(32|64) copyright notes
  x86: unify power/cpu_(32|64) regarding restoring processor state
  x86: unify power/cpu_(32|64) regarding saving processor state
  x86: unify power/cpu_(32|64) global variables
  x86: unify power/cpu_(32|64) headers
  PM: Warn if interrupts are enabled during suspend-resume of sysdevs
  PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c
2009-06-12 13:17:27 -07:00
Linus Torvalds 4ddbac9898 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
  perf_counter: Add forward/backward attribute ABI compatibility
  perf record: Explicity program a default counter
  perf_counter: Remove PERF_TYPE_RAW special casing
  perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
  powerpc, perf_counter: Fix performance counter event types
  perf_counter/x86: Add a quirk for Atom processors
  perf_counter tools: Remove one L1-data alias
2009-06-12 13:16:52 -07:00
Alan Stern d161630297 PM core: rename suspend and resume functions
This patch (as1241) renames a bunch of functions in the PM core.
Rather than go through a boring list of name changes, suffice it to
say that in the end we have a bunch of pairs of functions:

	device_resume_noirq	dpm_resume_noirq
	device_resume		dpm_resume
	device_complete		dpm_complete
	device_suspend_noirq	dpm_suspend_noirq
	device_suspend		dpm_suspend
	device_prepare		dpm_prepare

in which device_X does the X operation on a single device and dpm_X
invokes device_X for all devices in the dpm_list.

In addition, the old dpm_power_up and device_resume_noirq have been
combined into a single function (dpm_resume_noirq).

Lastly, dpm_suspend_start and dpm_resume_end are the renamed versions
of the former top-level device_suspend and device_resume routines.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Magnus Damm e39a71ef80 PM: Rename device_power_down/up()
Rename the functions performing "_noirq" dev_pm_ops
operations from device_power_down() and device_power_up()
to device_suspend_noirq() and device_resume_noirq().

The new function names are chosen to show that the functions
are responsible for calling the _noirq() versions to finalize
the suspend/resume operation. The current function names do
not perform power down/up anymore so the names may be misleading.

Global function renames:
- device_power_down() -> device_suspend_noirq()
- device_power_up() -> device_resume_noirq()

Static function renames:
- suspend_device_noirq() -> __device_suspend_noirq()
- resume_device_noirq() -> __device_resume_noirq()

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Jaswinder Singh Rajput ce4b3c5547 PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c
One of the numbers in arch/x86/kernel/acpi/sleep.c is long, but it is
not annotated appropriately, so sparese warns about it.  Fix that.

[rjw: added the changelog.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:29 +02:00
Linus Torvalds 6d21491838 Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: setup cpu caches later on when interrupts are enabled
  slab,slub: don't enable interrupts during early boot
  slab: fix gfp flag in setup_cpu_cache()
  x86: make zap_low_mapping could be used early
  irq: slab alloc for default irq_affinity
  memcg: fix page_cgroup fatal error in FLATMEM
2009-06-12 09:52:30 -07:00
Linus Torvalds 7f3591cfac Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
  lguest: add support for indirect ring entries
  lguest: suppress notifications in example Launcher
  lguest: try to batch interrupts on network receive
  lguest: avoid sending interrupts to Guest when no activity occurs.
  lguest: implement deferred interrupts in example Launcher
  lguest: remove obsolete LHREQ_BREAK call
  lguest: have example Launcher service all devices in separate threads
  lguest: use eventfds for device notification
  eventfd: export eventfd_signal and eventfd_fget for lguest
  lguest: allow any process to send interrupts
  lguest: PAE fixes
  lguest: PAE support
  lguest: Add support for kvm_hypercall4()
  lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
  lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
  lguest: map switcher with executable page table entries
  lguest: fix writev returning short on console output
  lguest: clean up length-used value in example launcher
  lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
  lguest: beyond ARRAY_SIZE of cpu->arch.gdt
  ...
2009-06-12 09:32:26 -07:00
Linus Torvalds 65d52cc9d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param:
  module: cleanup FIXME comments about trimming exception table entries.
  module: trim exception table on init free.
  module: merge module_alloc() finally
  uml module: fix uml build process due to this merge
  x86 module: merge the rest functions with macros
  x86 module: merge the same functions in module_32.c and module_64.c
  uvesafb: improve parameter handling.
  module_param: allow 'bool' module_params to be bool, not just int.
  module_param: add __same_type convenience wrapper for __builtin_types_compatible_p
  module_param: split perm field into flags and perm
  module_param: invbool should take a 'bool', not an 'int'
  cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
2009-06-12 09:30:36 -07:00
Linus Torvalds db8e7f10ed Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Provide _sdata in the vmlinux.lds.S file
  x86: handle initrd that extends into unusable memory
2009-06-12 09:26:32 -07:00
Rusty Russell 61f4bc83fe lguest: optimize by coding restore_flags and irq_enable in assembler.
The downside of the last patch which made restore_flags and irq_enable
check interrupts is that they are now too big to be patched directly
into the callsites, so the C versions are always used.

But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
the registers.  In fact, we don't need any registers in the fast path,
so we can do better than this if we actually code them in assembler.

The results are in the noise, but since it's about the same amount of
code, it's worth applying.

1GB Guest->Host: input(suppressed),output(suppressed)
Before:
	Seconds: 0:16.53
	Packets: 377268,753673
	Interrupts: 22461,24297
	Notifications: 1(5245),21303(732370)
	Net IRQs triggered: 377023(245),42578(711095)

After:
	Seconds: 0:16.48
	Packets: 377289,753673
	Interrupts: 22281,24465
	Notifications: 1(5245),21296(732377)
	Net IRQs triggered: 377060(229),42564(711109)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:03 +09:30
Rusty Russell 5933048c69 module: cleanup FIXME comments about trimming exception table entries.
Everyone cut and paste this comment from my original one.  We now do
it generically, so cut the comments.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Amerigo Wang <amwang@redhat.com>
2009-06-12 21:47:05 +09:30
Amerigo Wang c398df30d5 module: merge module_alloc() finally
As Christoph Hellwig suggested, module_alloc() actually can be
unified for i386 and x86_64 (of course, also UML).

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: 'Ingo Molnar' <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:03 +09:30
Amerigo Wang 0fdc83b950 x86 module: merge the rest functions with macros
Merge the rest functions together, with proper preprocessing directives.
Finally remove module_{32|64}.c.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:01 +09:30
Amerigo Wang 2d5bf28fb9 x86 module: merge the same functions in module_32.c and module_64.c
Merge the same functions both in module_32.c and module_64.c into
module.c.

This is the first step to merge both of them finally.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:00 +09:30
Yong Wang dff5da6d09 perf_counter/x86: Add a quirk for Atom processors
The fixed-function performance counters do not work on current Atom
processors. Use the general-purpose ones instead.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090612080855.GA2286@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 13:48:32 +02:00
Yinghai Lu 55cd63676e x86: make zap_low_mapping could be used early
Only one cpu is there, just call __flush_tlb for it. Fixes the following boot
warning on x86:

  [    0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem)
  [    0.000000] virtual kernel memory layout:
  [    0.000000]     fixmap  : 0xffe17000 - 0xfffff000   (1952 kB)
  [    0.000000]     vmalloc : 0xf8615000 - 0xffe15000   ( 120 MB)
  [    0.000000]     lowmem  : 0xc0000000 - 0xf7e15000   ( 894 MB)
  [    0.000000]       .init : 0xc19a5000 - 0xc1a10000   ( 428 kB)
  [    0.000000]       .data : 0xc15da4bb - 0xc199af6c   (3842 kB)
  [    0.000000]       .text : 0xc1000000 - 0xc15da4bb   (5993 kB)
  [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
  [    0.000000] ------------[ cut here ]------------
  [    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0()
  [    0.000000] Hardware name: System Product Name
  [    0.000000] Modules linked in:
  [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504
  [    0.000000] Call Trace:
  [    0.000000]  [<c104aa16>] warn_slowpath_common+0x65/0x95
  [    0.000000]  [<c104aa58>] warn_slowpath_null+0x12/0x15
  [    0.000000]  [<c1073bbe>] smp_call_function_many+0x50/0x1b0
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c1073d4f>] smp_call_function+0x31/0x58
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c104f635>] on_each_cpu+0x26/0x65
  [    0.000000]  [<c10374b5>] flush_tlb_all+0x19/0x1b
  [    0.000000]  [<c1032ab3>] zap_low_mappings+0x4d/0x56
  [    0.000000]  [<c15d64b5>] ? printk+0x14/0x17
  [    0.000000]  [<c19b42a8>] mem_init+0x23d/0x245
  [    0.000000]  [<c19a56a1>] start_kernel+0x17a/0x2d5
  [    0.000000]  [<c19a5347>] ? unknown_bootoption+0x0/0x19a
  [    0.000000]  [<c19a5039>] __init_begin+0x39/0x41
  [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12 13:50:24 +03:00
Catalin Marinas 1260866a27 x86: Provide _sdata in the vmlinux.lds.S file
_sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.

[ Impact: add new global symbol ]

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
LKML-Reference: <20090511122105.26556.96593.stgit@pc1117.cambridge.arm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 09:21:33 +02:00
Yinghai Lu 12274e96b4 x86: use zalloc_cpumask_var in arch_early_irq_init
So we make sure MAXSMP gets a cleared cpumask

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11 20:04:36 -07:00
Yinghai Lu 8c5dd8f433 x86: handle initrd that extends into unusable memory
On a system where system memory (according e820) is not covered by
mtrr, mtrr_trim_memory converts a portion of memory to reserved, but
bootloader has already put the initrd in that range.

Thus, we need to have 64bit to use relocate_initrd too.

[ Impact: fix using initrd when mtrr_trim_memory happen ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
2009-06-11 15:19:13 -07:00
Ingo Molnar 0d5959723e Merge branch 'linus' into x86/mce3
Conflicts:
	arch/x86/kernel/cpu/mcheck/mce_64.c
	arch/x86/kernel/irq.c

Merge reason: Resolve the conflicts above.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 23:31:52 +02:00
Linus Torvalds 8a1ca8cedd Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
  perf_counter: Turn off by default
  perf_counter: Add counter->id to the throttle event
  perf_counter: Better align code
  perf_counter: Rename L2 to LL cache
  perf_counter: Standardize event names
  perf_counter: Rename enums
  perf_counter tools: Clean up u64 usage
  perf_counter: Rename perf_counter_limit sysctl
  perf_counter: More paranoia settings
  perf_counter: powerpc: Implement generalized cache events for POWER processors
  perf_counters: powerpc: Add support for POWER7 processors
  perf_counter: Accurate period data
  perf_counter: Introduce struct for sample data
  perf_counter tools: Normalize data using per sample period data
  perf_counter: Annotate exit ctx recursion
  perf_counter tools: Propagate signals properly
  perf_counter tools: Small frequency related fixes
  perf_counter: More aggressive frequency adjustment
  perf_counter/x86: Fix the model number of Intel Core2 processors
  perf_counter, x86: Correct some event and umask values for Intel processors
  ...
2009-06-11 14:01:07 -07:00
Linus Torvalds b640f042fa Merge branch 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  vgacon: use slab allocator instead of the bootmem allocator
  irq: use kcalloc() instead of the bootmem allocator
  sched: use slab in cpupri_init()
  sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()
  memcg: don't use bootmem allocator in setup code
  irq/cpumask: make memoryless node zero happy
  x86: remove some alloc_bootmem_cpumask_var calling
  vt: use kzalloc() instead of the bootmem allocator
  sched: use kzalloc() instead of the bootmem allocator
  init: introduce mm_init()
  vmalloc: use kzalloc() instead of alloc_bootmem()
  slab: setup allocators earlier in the boot sequence
  bootmem: fix slab fallback on numa
  bootmem: use slab if bootmem is no longer available
2009-06-11 12:25:06 -07:00
Linus Torvalds 6cd8e300b4 Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
  KVM: Prevent overflow in largepages calculation
  KVM: Disable large pages on misaligned memory slots
  KVM: Add VT-x machine check support
  KVM: VMX: Rename rmode.active to rmode.vm86_active
  KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
  KVM: Disable CR8 intercept if tpr patching is active
  KVM: Do not migrate pending software interrupts.
  KVM: inject NMI after IRET from a previous NMI, not before.
  KVM: Always request IRQ/NMI window if an interrupt is pending
  KVM: Do not re-execute INTn instruction.
  KVM: skip_emulated_instruction() decode instruction if size is not known
  KVM: Remove irq_pending bitmap
  KVM: Do not allow interrupt injection from userspace if there is a pending event.
  KVM: Unprotect a page if #PF happens during NMI injection.
  KVM: s390: Verify memory in kvm run
  KVM: s390: Sanity check on validity intercept
  KVM: s390: Unlink vcpu on destroy - v2
  KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
  KVM: s390: use hrtimer for clock wakeup from idle - v2
  KVM: s390: Fix memory slot versus run - v3
  ...
2009-06-11 10:03:30 -07:00
Yinghai Lu dad213aeb5 irq/cpumask: make memoryless node zero happy
Don't hardcode to node zero for early boot IRQ setup memory allocations.

[ penberg@cs.helsinki.fi: minor cleanups ]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:27:08 +03:00
Yinghai Lu 38c7fed2f5 x86: remove some alloc_bootmem_cpumask_var calling
Now that we set up the slab allocator earlier, we can get rid of some
alloc_bootmem_cpumask_var() calls in boot code.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:27:07 +03:00
Ingo Molnar 940010c5a3 Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/kernel/irqinit.c
	arch/x86/kernel/irqinit_64.c
	arch/x86/kernel/traps.c
	arch/x86/mm/fault.c
	include/linux/sched.h
	kernel/exit.c
2009-06-11 17:55:42 +02:00
Peter Zijlstra 8be6e8f3c3 perf_counter: Rename L2 to LL cache
The top (fastest) and last level (biggest) caches are the most
interesting ones, performance wise.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ Fixed the Nehalem LL table to LLC Reference/Miss events ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:17 +02:00
Peter Zijlstra f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Hidetoshi Seto 62fdac5913 x86, mce: Add boot options for corrected errors
This patch introduces three boot options (no_cmci, dont_log_ce
and ignore_ce) to control handling for corrected errors.

The "mce=no_cmci" boot option disables the CMCI feature.

Since CMCI is a new feature so having boot controls to disable
it will be a help if the hardware is misbehaving.

The "mce=dont_log_ce" boot option disables logging for corrected
errors. All reported corrected errors will be cleared silently.
This option will be useful if you never care about corrected
errors.

The "mce=ignore_ce" boot option disables features for corrected
errors, i.e. polling timer and cmci.  All corrected events are
not cleared and kept in bank MSRs.

Usually this disablement is not recommended, however it will be
a help if there are some conflict with the BIOS or hardware
monitoring applications etc., that clears corrected events in
banks instead of OS.

[ And trivial cleanup (space -> tab) for doc is included. ]

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 11:42:18 +02:00