Commit graph

9148 commits

Author SHA1 Message Date
Tejun Heo 98e5e1bf72 dump_stack: implement arch-specific hardware description in task dumps
x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.

* x86 show_regs() collects vendor, product and board strings and print
  them out with PID, comm and utsname.  Some of the information is
  printed again later in the same dump.

* warn_slowpath_common() explicitly accesses the DMI board and prints
  it out with "Hardware name:" label.  This applies to both x86 and
  ia64 but is irrelevant on all other archs.

* ia64 doesn't show DMI information on other non-WARN dumps.

This patch introduces arch-specific hardware description used by
dump_stack().  It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.

dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data.  It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message.  It is superset of the
information x86 show_regs() is using.  The function is called from x86
and ia64 boot code right after dmi_scan_machine().

This makes the explicit DMI handling in warn_slowpath_common()
unnecessary.  Removed.

show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs().  The next patch will unify show_regs() handling and
remove the duplication.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a0c3>] init_workqueues+0x35/0x505
  ...

v2: Use the same string as the debug message from dmi_present() which
    also contains BIOS information.  Move hardware name into its own
    line as warn_slowpath_common() did.  This change was suggested by
    Bjorn Helgaas.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo 196779b9b4 dump_stack: consolidate dump_stack() implementations and unify their behaviors
Both dump_stack() and show_stack() are currently implemented by each
architecture.  show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack().  On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.

The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().

There's no reason to require archs to implement two separate but mostly
identical functions.  It leads to unnecessary subtle information.

This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin.  Blackfin's
dump_stack() does something wonky that I don't understand.

Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information.  This is used
in blackfin.

This patch brings the following behavior changes.

* On some archs, an extra level in backtrace for show_stack() could be
  printed.  This is because the top frame was determined in
  dump_stack() on those archs while generic dump_stack() can't do that
  reliably.  It can be compensated by inlining dump_stack() but not
  sure whether that'd be necessary.

* Most archs didn't use to print debug info on dump_stack().  They do
  now.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Hardware name: empty
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a071>] init_workqueues+0x35/0x505
  ...

v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack().  This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c.  Because linkage is per objecct file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too.  v1
    The v1 patch broke build on blackfin due to this issue.  The build
    breakage was reported by Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo a77f2a4e6f x86: don't show trace beyond show_stack(NULL, NULL)
There are multiple ways a task can be dumped - explicit call to
dump_stack(), triggering WARN() or BUG(), through sysrq-t and so on.
Most of what gets printed is upto each architecture and the current
state is not particularly pretty.  Different pieces of information are
presented differently depending on which path the dump takes and which
architecture it's running on.  This is messy for no good reason and
makes it exceedingly difficult to add or modify debug information to
task dumps.

In all archs except for s390, there's nothing arch-specific about the
printed debug information.  This patchset updates all those archs to use
the same helpers to consistently print out the same debug information.

An example WARN dump after this patchset.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a0c3>] init_workqueues+0x35/0x505
  ...

And BUG dump.

 kernel BUG at kernel/workqueue.c:4841!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
 RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
 RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
 RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
  0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
  ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
 Call Trace:
  [<ffffffff81000312>] do_one_initcall+0x122/0x170
  [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  [<ffffffff81c4776e>] kernel_init+0xe/0xf0
  [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  ...

This patchset contains the following seven patches.

 0001-x86-don-t-show-trace-beyond-show_stack-NULL-NULL.patch
 0002-sparc32-make-show_stack-acquire-fp-if-_ksp-is-not-sp.patch
 0003-dump_stack-consolidate-dump_stack-implementations-an.patch
 0004-dmi-morph-dmi_dump_ids-into-dmi_format_ids-which-for.patch
 0005-dump_stack-implement-arch-specific-hardware-descript.patch
 0006-dump_stack-unify-debug-information-printed-by-show_r.patch
 0007-arc-print-fatal-signals-reduce-duplicated-informatio.patch

0001-0002 update stack dumping functions in x86 and sparc32 in
preparation.

0003 makes all arches except blackfin use generic dump_stack().
blackfin still uses the generic helper to print the same info.

0004-0005 properly abstract DMI identifier printing in WARN() and
show_regs() so that all dumps print out the information.  This enables
show_regs() to use the same debug info message.

0006 updates show_regs() of all arches to use a common generic helper
to print debug info.

0007 removes somem duplicate information from arc dumps.

While this patchset changes how debug info is printed on some archs,
the printed information is always superset of what used to be there.

This patchset makes task dump debug messages consistent and enables
adding more information.  Workqueue is scheduled to add worker
information including the workqueue in use and work item specific
description.

While this patch touches a lot of archs, it isn't too likely to cause
non-trivial conflicts with arch-specfic changes and would probably be
best to route together either through -mm.

x86 is tested but other archs are either only compile tested or not
tested at all.  Changes to most archs are generally trivial.

This patch:

show_stack(current or NULL, NULL) is used to print the backtrace of the
current task.  As trace beyond the function itself isn't of much
interest to anyone, don't show it by determining sp and bp in
show_stack()'s frame and passing them to show_stack_log_lvl().

This brings show_stack(NULL, NULL)'s behavior in line with
dump_stack().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:01 -07:00
Linus Torvalds 5a5a1bf099 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:

 - Add an Intel CMCI hotplug fix

 - Add AMD family 16h EDAC support

 - Make the AMD MCE banks code more flexible for virtual environments

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  amd64_edac: Add Family 16h support
  x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
  x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD
  x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper
2013-04-30 08:42:45 -07:00
Linus Torvalds 74c7d2f520 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "Small fixes and cleanups all over the map"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/setup: Drop unneeded include <asm/dmi.h>
  x86/olpc/xo1/sci: Don't call input_free_device() after input_unregister_device()
  x86/platform/intel/mrst: Remove cast for kmalloc() return value
  x86/platform/uv: Replace kmalloc() & memset with kzalloc()
2013-04-30 08:42:11 -07:00
Linus Torvalds 1e2f5b598a Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt update from Ingo Molnar:
 "Various paravirtualization related changes - the biggest one makes
  guest support optional via CONFIG_HYPERVISOR_GUEST"

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, wakeup, sleep: Use pvops functions for changing GDT entries
  x86, xen, gdt: Remove the pvops variant of store_gdt.
  x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
  x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.
  x86: Make Linux guest support optional
  x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
2013-04-30 08:41:21 -07:00
Linus Torvalds f9b3bcfbc4 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc smaller changes all over the map"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/iommu/dmar: Remove warning for HPET scope type
  x86/mm/gart: Drop unnecessary check
  x86/mm/hotplug: Put kernel_physical_mapping_remove() declaration in CONFIG_MEMORY_HOTREMOVE
  x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER
  x86/mm/numa: Simplify some bit mangling
  x86/mm: Re-enable DEBUG_TLBFLUSH for X86_32
  x86/mm/cpa: Cleanup split_large_page() and its callee
  x86: Drop always empty .text..page_aligned section
2013-04-30 08:40:35 -07:00
Linus Torvalds 01c7cd0ef5 Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perparatory x86 kasrl changes from Ingo Molnar:
 "This contains changes from the ongoing KASLR work, by Kees Cook.

  The main changes are the use of a read-only IDT on x86 (which
  decouples the userspace visible virtual IDT address from the physical
  address), and a rework of ELF relocation support, in preparation of
  random, boot-time kernel image relocation."

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF
  x86, relocs: Build separate 32/64-bit tools
  x86, relocs: Add 64-bit ELF support to relocs tool
  x86, relocs: Consolidate processing logic
  x86, relocs: Generalize ELF structure names
  x86: Use a read-only IDT alias on all CPUs
2013-04-30 08:37:24 -07:00
Linus Torvalds 39b2f8656e Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "Two small changes: a documentation update and a constification"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, early-printk: Update earlyprintk documentation (and kill x86 copy)
  x86: Constify a few items
2013-04-30 08:35:20 -07:00
Linus Torvalds df8edfa9af Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid changes from Ingo Molnar:
 "The biggest change is x86 CPU bug handling refactoring and cleanups,
  by Borislav Petkov"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, CPU, AMD: Drop useless label
  x86, AMD: Correct {rd,wr}msr_amd_safe warnings
  x86: Fold-in trivial check_config function
  x86, cpu: Convert AMD Erratum 400
  x86, cpu: Convert AMD Erratum 383
  x86, cpu: Convert Cyrix coma bug detection
  x86, cpu: Convert FDIV bug detection
  x86, cpu: Convert F00F bug detection
  x86, cpu: Expand cpufeature facility to include cpu bugs
2013-04-30 08:34:38 -07:00
Linus Torvalds 874f6d1be7 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/lib: Fix spelling, put space between a numeral and its units
  x86/lib: Fix spelling in the comments
  x86, quirks: Shut-up a long-standing gcc warning
  x86, msr: Unify variable names
  x86-64, docs, mm: Add vsyscall range to virtual address space layout
  x86: Drop KERNEL_IMAGE_START
  x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety
2013-04-30 08:34:07 -07:00
Linus Torvalds ab86e974f0 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Ingo Molnar:
 "The main changes in this cycle's merge are:

   - Implement shadow timekeeper to shorten in kernel reader side
     blocking, by Thomas Gleixner.

   - Posix timers enhancements by Pavel Emelyanov:

   - allocate timer ID per process, so that exact timer ID allocations
     can be re-created be checkpoint/restore code.

   - debuggability and tooling (/proc/PID/timers, etc.) improvements.

   - suspend/resume enhancements by Feng Tang: on certain new Intel Atom
     processors (Penwell and Cloverview), there is a feature that the
     TSC won't stop in S3 state, so the TSC value won't be reset to 0
     after resume.  This can be taken advantage of by the generic via
     the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
     recover/approximate sleep time, the main (and precise) clocksource
     can be used.

   - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
     CPUs the file goes beyond 4MB of size and thus the current
     simplistic seqfile approach fails.  Convert /proc/timer_list to a
     proper seq_file with its own iterator.

   - Cleanups and refactorings of the core timekeeping code by John
     Stultz.

   - International Atomic Clock time is managed by the NTP code
     internally currently but not exposed externally.  Separate the TAI
     code out and add CLOCK_TAI support and TAI support to the hrtimer
     and posix-timer code, by John Stultz.

   - Add deep idle support enhacement to the broadcast clockevents core
     timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
     clockevents feature (which will be utilized by future clockevents
     driver updates), which allows the use of IRQ affinities to avoid
     spurious wakeups of idle CPUs - the right CPU with an expiring
     timer will be woken.

   - Add new ARM bcm281xx clocksource driver, by Christian Daudt

   - ... various other fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  clockevents: Set dummy handler on CPU_DEAD shutdown
  timekeeping: Update tk->cycle_last in resume
  posix-timers: Remove unused variable
  clockevents: Switch into oneshot mode even if broadcast registered late
  timer_list: Convert timer list to be a proper seq_file
  timer_list: Split timer_list_show_tickdevices
  posix-timers: Show sigevent info in proc file
  posix-timers: Introduce /proc/PID/timers file
  posix timers: Allocate timer id per process (v2)
  timekeeping: Make sure to notify hrtimers when TAI offset changes
  hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
  hrtimer: Add expiry time overflow check in hrtimer_interrupt
  timekeeping: Shorten seq_count region
  timekeeping: Implement a shadow timekeeper
  timekeeping: Delay update of clock->cycle_last
  timekeeping: Store cycle_last value in timekeeper struct as well
  ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
  timekeeping: Simplify tai updating from do_adjtimex
  timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
  timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
  ...
2013-04-30 08:15:40 -07:00
Linus Torvalds 8700c95adb Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
 "This is a pretty large, multi-arch series unifying and generalizing
  the various disjunct pieces of idle routines that architectures have
  historically copied from each other and have grown in random, wildly
  inconsistent and sometimes buggy directions:

   101 files changed, 455 insertions(+), 1328 deletions(-)

  this went through a number of review and test iterations before it was
  committed, it was tested on various architectures, was exposed to
  linux-next for quite some time - nevertheless it might cause problems
  on architectures that don't read the mailing lists and don't regularly
  test linux-next.

  This cat herding excercise was motivated by the -rt kernel, and was
  brought to you by Thomas "the Whip" Gleixner."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  idle: Remove GENERIC_IDLE_LOOP config switch
  um: Use generic idle loop
  ia64: Make sure interrupts enabled when we "safe_halt()"
  sparc: Use generic idle loop
  idle: Remove unused ARCH_HAS_DEFAULT_IDLE
  bfin: Fix typo in arch_cpu_idle()
  xtensa: Use generic idle loop
  x86: Use generic idle loop
  unicore: Use generic idle loop
  tile: Use generic idle loop
  tile: Enter idle with preemption disabled
  sh: Use generic idle loop
  score: Use generic idle loop
  s390: Use generic idle loop
  powerpc: Use generic idle loop
  parisc: Use generic idle loop
  openrisc: Use generic idle loop
  mn10300: Use generic idle loop
  mips: Use generic idle loop
  microblaze: Use generic idle loop
  ...
2013-04-30 07:50:17 -07:00
Linus Torvalds 16fa94b532 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this development cycle were:

   - full dynticks preparatory work by Frederic Weisbecker

   - factor out the cpu time accounting code better, by Li Zefan

   - multi-CPU load balancer cleanups and improvements by Joonsoo Kim

   - various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  sched: Fix init NOHZ_IDLE flag
  sched: Prevent to re-select dst-cpu in load_balance()
  sched: Rename load_balance_tmpmask to load_balance_mask
  sched: Move up affinity check to mitigate useless redoing overhead
  sched: Don't consider other cpus in our group in case of NEWLY_IDLE
  sched: Explicitly cpu_idle_type checking in rebalance_domains()
  sched: Change position of resched_cpu() in load_balance()
  sched: Fix wrong rq's runnable_avg update with rt tasks
  sched: Document task_struct::personality field
  sched/cpuacct/UML: Fix header file dependency bug on the UML build
  cgroup: Kill subsys.active flag
  sched/cpuacct: No need to check subsys active state
  sched/cpuacct: Initialize cpuacct subsystem earlier
  sched/cpuacct: Initialize root cpuacct earlier
  sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
  sched/cpuacct: Clean up cpuacct.h
  sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
  sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
  sched/cpuacct: Add cpuacct_acount_field()
  sched/cpuacct: Add cpuacct_init()
  ...
2013-04-30 07:43:28 -07:00
Linus Torvalds e0972916e8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Features:

   - Add "uretprobes" - an optimization to uprobes, like kretprobes are
     an optimization to kprobes.  "perf probe -x file sym%return" now
     works like kretprobes.  By Oleg Nesterov.

   - Introduce per core aggregation in 'perf stat', from Stephane
     Eranian.

   - Add memory profiling via PEBS, from Stephane Eranian.

   - Event group view for 'annotate' in --stdio, --tui and --gtk, from
     Namhyung Kim.

   - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.

   - Add Ivy Bridge-EP uncore support, by Zheng Yan

   - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.

   - Add perf test entries for checking breakpoint overflow signal
     handler issues, from Jiri Olsa.

   - Add perf test entry for for checking number of EXIT events, from
     Namhyung Kim.

   - Add perf test entries for checking --cpu in record and stat, from
     Jiri Olsa.

   - Introduce perf stat --repeat forever, from Frederik Deweerdt.

   - Add --no-demangle to report/top, from Namhyung Kim.

   - PowerPC fixes plus a couple of cleanups/optimizations in uprobes
     and trace_uprobes, by Oleg Nesterov.

  Various fixes and refactorings:

   - Fix dependency of the python binding wrt libtraceevent, from
     Naohiro Aota.

   - Simplify some perf_evlist methods and to allow 'stat' to share code
     with 'record' and 'trace', by Arnaldo Carvalho de Melo.

   - Remove dead code in related to libtraceevent integration, from
     Namhyung Kim.

   - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
     sched lat' back working, by Arnaldo Carvalho de Melo

   - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
     de Melo.

   - Kill a bunch of die() calls, from Namhyung Kim.

   - Fix build on non-glibc systems due to libio.h absence, from Cody P
     Schafer.

   - Remove some perf_session and tracing dead code, from David Ahern.

   - Honor parallel jobs, fix from Borislav Petkov

   - Introduce tools/lib/lk library, initially just removing duplication
     among tools/perf and tools/vm.  from Borislav Petkov

  ... and many more I missed to list, see the shortlog and git log for
  more details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
  perf/x86/intel/P4: Robistify P4 PMU types
  perf/x86/amd: Fix AMD NB and L2I "uncore" support
  perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
  perf/x86: Check all MSRs before passing hw check
  perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
  perf/x86/intel: Add Ivy Bridge-EP uncore support
  perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
  perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
  uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
  uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
  uprobes/tracing: Change create_trace_uprobe() to support uretprobes
  uprobes/tracing: Make seq_printf() code uretprobe-friendly
  uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
  uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
  uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
  uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
  uprobes/tracing: Generalize struct uprobe_trace_entry_head
  uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
  uprobes/tracing: Kill the pointless seq_print_ip_sym() call
  uprobes/tracing: Kill the pointless task_pt_regs() calls
  ...
2013-04-30 07:41:01 -07:00
Thomas Gleixner d0380e6c3c early_printk: consolidate random copies of identical code
The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

[akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
[paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Rob Landley 0c0de199ce mkcapflags.pl: convert to mkcapflags.sh
Generate asm-x86/cpufeature.h with posix-2008 commands instead of perl.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowell@redhat.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:27 -07:00
Ingo Molnar 5ac2b5c272 perf/x86/intel/P4: Robistify P4 PMU types
Linus found, while extending integer type extension checks in the
sparse static code checker, various fragile patterns of mixed
signed/unsigned  64-bit/32-bit integer use in perf_events_p4.c.

The relevant hardware register ABI is 64 bit wide on 32-bit
kernels as  well, so clean it all up a bit, remove unnecessary
casts, and make sure we  use 64-bit unsigned integers in these
places.

[ Unfortunately this patch was not tested on real P4 hardware,
  those are pretty rare already. If this patch causes any
  problems on P4 hardware then please holler ... ]

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26 09:31:41 +02:00
Jean Delvare 06d219dc22 x86/setup: Drop unneeded include <asm/dmi.h>
arch/x86/kernel/setup.c includes <asm/dmi.h> but it doesn't look
like it needs it, <linux/dmi.h> is sufficient.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/1366881845.4186.65.camel@chaos.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2013-04-25 11:32:51 +02:00
Thomas Gleixner 6402c7dc2a Merge branch 'linus' into timers/core
Reason: Get upstream fixes before adding conflicting code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-24 20:33:54 +02:00
Jacob Shin 94f4db3590 perf/x86/amd: Fix AMD NB and L2I "uncore" support
Borislav Petkov reported a lockdep splat warning about kzalloc()
done in an IPI (hardirq) handler.

This is a real bug, do not call kzalloc() in a smp_call_function_single()
handler because it can schedule and crash.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <eranian@google.com>
Cc: <a.p.zijlstra@chello.nl>
Cc: <acme@ghostprotocols.net>
Cc: <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130421180627.GA21049@jshin-Toonie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-22 10:10:55 +02:00
Linus Torvalds 3125929454 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix offcore_rsp valid mask for SNB/IVB
  perf: Treat attr.config as u64 in perf_swevent_init()
2013-04-21 10:25:42 -07:00
Jacob Shin 0cf5f4323b perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
Support for NB counters, MSRs 0xc0010240 ~ 0xc0010247, got
moved to perf_event_amd_uncore.c in the following commit:

  c43ca5091a perf/x86/amd: Add support for AMD NB and L2I "uncore" counters

AMD Family 10h NB events (events 0xe0 ~ 0xff, on MSRs 0xc001000 ~
0xc001007) will still continue to be handled by perf_event_amd.c

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1366046483-1765-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 17:21:59 +02:00
George Dunlap a5ebe0ba3d perf/x86: Check all MSRs before passing hw check
check_hw_exists() has a number of checks which go to two exit
paths: msr_fail and bios_fail.  Checks classified as msr_fail
will cause check_hw_exists() to return false, causing the PMU
not to be used; bios_fail checks will only cause a warning to be
printed, but will return true.

The problem is that if there are both msr failures and bios
failures, and the routine hits a bios_fail check first, it will
exit early and return true, not finishing the rest of the msr
checks.  If those msrs are in fact broken, it will cause them to
be used erroneously.

In the case of a Xen PV VM, the guest OS has read access to all
the MSRs, but write access is white-listed to supported
features.  Writes to unsupported MSRs have no effect.  The PMU
MSRs are not (typically) supported, because they are expensive
to save and restore on a VM context switch.  One of the
"msr_fail" checks is supposed to detect this circumstance (ether
for Xen or KVM) and disable the harware PMU.

However, on one of my AMD boxen, there is (apparently) a broken
BIOS which triggers one of the bios_fail checks.  In particular,
MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set.
The guest kernel detects this because it has read access to all
MSRs, and causes it to skip the rest of the checks and try to
use the non-existent hardware PMU.  This minimally causes a lot
of useless instruction emulation and Xen console spam; it may
cause other issues with the watchdog as well.

This changset causes check_hw_exists() to go through all of the
msr checks, failing and returning false if any of them fail.
This makes sure that a guest running under Xen without a virtual
PMU will detect that there is no functioning PMU and not attempt
to use it.

This problem affects kernels as far back as 3.2, and should thus
be considered for backport.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 11:16:29 +02:00
Jacob Shin c43ca5091a perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
Add support for AMD Family 15h [and above] northbridge
performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared
across all cores that share a common northbridge.

Add support for AMD Family 16h L2 performance counters. MSRs
0xc0010230 ~ 0xc0010237 are shared across all cores that share a
common L2 cache.

We do not enable counter overflow interrupts. Sampling mode and
per-thread events are not supported.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 11:01:24 +02:00
Yan, Zheng e850f9c33c perf/x86/intel: Add Ivy Bridge-EP uncore support
The uncore subsystem in Ivy Bridge-EP is similar to Sandy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Ivy Bridge-EP uncore also
supports a few new events.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 11:01:24 +02:00
Yan, Zheng 46bdd90598 perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
The existing code assumes all Cbox and PCU events are using
filter, but actually the filter is event specific. Furthermore
the filter is sub-divided into multiple fields which are used
by different events.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Stephane Eranian <eranian@google.com>
2013-04-21 11:01:23 +02:00
Yan, Zheng 22cc4ccf63 perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
On -rt kfree() can schedule, but CPU_{STARTING,DYING} should be
atomic. So use a list to defer kfree until CPU_{ONLINE,DEAD}.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 10:59:01 +02:00
Ingo Molnar 73e21ce28d Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/cpu/perf_event_intel.c

Merge in the latest fixes before applying new patches, resolve the conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 10:57:33 +02:00
Linus Torvalds 830ac8524f Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kdump fixes from Peter Anvin:
 "The kexec/kdump people have found several problems with the support
  for loading over 4 GiB that was introduced in this merge cycle.  This
  is partly due to a number of design problems inherent in the way the
  various pieces of kdump fit together (it is pretty horrifically manual
  in many places.)

  After a *lot* of iterations this is the patchset that was agreed upon,
  but of course it is now very late in the cycle.  However, because it
  changes both the syntax and semantics of the crashkernel option, it
  would be desirable to avoid a stable release with the broken
  interfaces."

I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow.  But apparently I'm doing an -rc8 instead...

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kexec: use Crash kernel for Crash kernel low
  x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
  x86, kdump: Retore crashkernel= to allocate under 896M
  x86, kdump: Set crashkernel_low automatically
2013-04-20 18:40:36 -07:00
Linus Torvalds db93f8b420 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Three groups of fixes:

   1. Make sure we don't execute the early microcode patching if family
      < 6, since it would touch MSRs which don't exist on those
      families, causing crashes.

   2. The Xen partial emulation of HyperV can be dealt with more
      gracefully than just disabling the driver.

   3. More EFI variable space magic.  In particular, variables hidden
      from runtime code need to be taken into account too."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Verify the family before dispatching microcode patching
  x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
  x86,efi: Implement efi_no_storage_paranoia parameter
  efi: Export efi_query_variable_store() for efivars.ko
  x86/Kconfig: Make EFI select UCS2_STRING
  efi: Distinguish between "remaining space" and actually used space
  efi: Pass boot services variable info to runtime code
  Move utf16 functions to kernel core and rename
  x86,efi: Check max_size only if it is non-zero.
  x86, efivars: firmware bug workarounds should be in platform code
2013-04-20 18:38:48 -07:00
H. Peter Anvin 74c3e3fcf3 x86, microcode: Verify the family before dispatching microcode patching
For each CPU vendor that implements CPU microcode patching, there will
be a minimum family for which this is implemented.  Verify this
minimum level of support.

This can be done in the dispatch function or early in the application
functions.  Doing the latter turned out to be somewhat awkward because
of the ineviable split between the BSP and the AP paths, and rather
than pushing deep into the application functions, do this in
the dispatch function.

Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie
2013-04-19 16:36:03 -07:00
Ingo Molnar 5379f8c0d7 Add required support for AMD F16h to amd64_edac.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRcSI0AAoJEBLB8Bhh3lVKWjIP/3BdMQljadBzIm5xSU5Vt4lW
 1R22XrfNAwqdbjuwHGV7TPMJrjnV8zGQBsghALdqCT7xtRnl71ce7XVGbd1MblGc
 llTcOBlCp1v5xHbNaL1xmunvI7JuMushX69YRD/KOFoEaXo2iQDCjSpA+73K3d/g
 OHjFQGrZkuuNkdTEWiOo6cplDMwXKNzLlS1ccN7XzzKPQ3aA0Q/XwlUftjHdBfGP
 WCxahtsz60xCxJt1x4hy5/ocZT2WfLp/shSS/udsGcTl5g75Yjhz8RkTuOmNySzo
 KDwsSQaT7HBz+b3SirG1IHEzgsF2DKZqDziDlBt8qx2xcHnkp7DfF/rZajCOzoLy
 k6tdFI2FEj3JZqyCDVxFAP/xmct39S/arqFlLLAUIAQHbsVkrmIH6ZC9NGiSPq9M
 kEdOOl/G0CbfnPFGd1hYCQUdRzJc7m5hKT05WFRRexRdsLVoAluO8rPWHtbqlmRS
 lvYkrvVglnOTlN0jVN6xw9LaTkX/q3+kUFzu4ZPiDQYqVzw+XmKYYh+L9OTgnDqA
 rKZqljdDgwb5GoiexyKtvuJUt7dJ2KlDZoEZ7INLQvA4Im9EQnYW/RH48G+8qouJ
 XfSjxUET0fmapYzulQI8Ikhs/dbPt7FItUn/ipX7I9fCxMDJOyoXTqZb2S0FbyYa
 kdZIbxzNDja0cCCcZfOs
 =5BNU
 -----END PGP SIGNATURE-----

Merge tag 'edac_amd_f16h' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull AMD F16h support for amd64_edac from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19 13:03:08 +02:00
Aravind Gopalakrishnan 94c1acf2c8 amd64_edac: Add Family 16h support
Add code to handle DRAM ECC errors decoding for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-19 12:46:50 +02:00
K. Y. Srinivasan 7eff7ded02 x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
Install the Hyper-V specific interrupt handler only when needed. This would
permit us to get rid of the Xen check. Note that when the vmbus drivers invokes
the call to register its handler, we are sure to be running on Hyper-V.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1366299886-6399-1-git-send-email-kys@microsoft.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-18 08:59:20 -07:00
Yinghai Lu adbc742bf7 x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of
crashkernel_hign=X crashkernel_low=Y. As that could be extensible.

-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
	description in kernel-parameters.txt
     still keep crashkernel=X override any crashkernel=X,high
        crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
     checking in parse_crashkernel_simple() found by HATAYAMA.
-v5: Change delimiter back to , according to HPA.
     also separate parse_suffix from parse_simper according to vivek.
	so we can avoid @pos in that path.
-v6: Tight the checking about crashkernel=X,highblahblah,high
     found by HTYAYAMA.

Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17 12:35:33 -07:00
Yinghai Lu 55a20ee780 x86, kdump: Retore crashkernel= to allocate under 896M
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

-v2: let crashkernel=X override crashkernel_high=
    update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17 12:35:33 -07:00
Yinghai Lu c729de8fce x86, kdump: Set crashkernel_low automatically
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y.  This causes regression
if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
     also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.

Reported-by: WANG Chao <chaowang@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17 12:35:32 -07:00
Stephane Eranian f1923820c4 perf/x86: Fix offcore_rsp valid mask for SNB/IVB
The valid mask for both offcore_response_0 and
offcore_response_1 was wrong for SNB/SNB-EP,
IVB/IVB-EP. It was possible to write to
reserved bit and cause a GP fault crashing
the kernel.

This patch fixes the problem by correctly marking the
reserved bits in the valid mask for all the processors
mentioned above.

A distinction between desktop and server parts is introduced
because bits 24-30 are only available on the server parts.

This version of the  patch is just a rebase to perf/urgent tree
and should apply to older kernels as well.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: security@kernel.org
Cc: ak@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 15:02:06 +02:00
Borislav Petkov 1077c932db x86, CPU, AMD: Drop useless label
All we want to do is return from this function so stop jumping around
like a flea for no good reason.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-16 11:50:51 +02:00
Borislav Petkov 682469a5db x86, AMD: Correct {rd,wr}msr_amd_safe warnings
The idea with those routines is to slowly phase them out and not call
them on anything else besides K8. They even have a check for that which,
when called too early, fails. Let me explain:

It gets the cpuinfo_x86 pointer from the per_cpu array and when this
happens for cpu0, before its boot_cpu_data has been copied back to the
per_cpu array in smp_store_boot_cpu_info(), we get an empty struct and
thus the check fails.

Use boot_cpu_data directly instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-16 11:50:51 +02:00
Borislav Petkov 55a36b65ee x86: Fold-in trivial check_config function
Fold it into its single call site. No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-16 11:50:50 +02:00
Ingo Molnar b5210b2a34 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes updates from Oleg Nesterov:

 - "uretprobes" - an optimization to uprobes, like kretprobes are an optimization
   to kprobes. "perf probe -x file sym%return" now works like kretprobes.

 - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 11:04:10 +02:00
Wang YanQing 26bfc540f6 x86/mm/gart: Drop unnecessary check
The memblock_find_in_range() return value addr is guaranteed
to be within "addr + aper_size" and not beyond GART_MAX_ADDR.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20130416013734.GA14641@udknight
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 10:54:40 +02:00
Linus Torvalds 6c4c4d4bda Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
  x86/mm/cpa/selftest: Fix false positive in CPA self test
  x86/mm/cpa: Convert noop to functional fix
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
2013-04-14 11:13:24 -07:00
Linus Torvalds ae9f4939ba Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix error return code
  ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, use strlcpy() instead of strncpy()
  perf: Fix strncpy() use, always make sure it's NUL terminated
  perf: Fix ring_buffer perf_output_space() boundary calculation
  perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
2013-04-14 11:10:44 -07:00
Anton Arapov 791eca1010 uretprobes/x86: Hijack return address
Hijack the return address and replace it with a trampoline address.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-04-13 15:31:55 +02:00
Konrad Rzeszutek Wilk 357d122670 x86, xen, gdt: Remove the pvops variant of store_gdt.
The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
 x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
 x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated - there are other mechanism by which the GDT is
saved and reloaded during early resume path.

Hence we do not need to worry about the pvops call-chain for saving the
GDT and can and can eliminate it. The other areas where the store_gdt is
used are never going to be hit when running under the pvops platforms.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 15:40:38 -07:00
Konrad Rzeszutek Wilk 84e70971e6 x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
During the ACPI S3 suspend, we store the GDT in the wakup_header (see
wakeup_asm.s) field called 'pmode_gdt'.

Which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit
counterpart. We only use the early bootstrap once (wakeup_gdt) and
do various checks in real mode.

After the checks are completed, we load the saved GDT ('pmode_gdt') and
continue on with the resume (by heading to startup_32 in trampoline_32.S) -
which quickly jumps to what was saved in 'pmode_entry'
aka 'wakeup_pmode_return'.

The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
saved in do_suspend_lowlevel initially). After that it ends up calling
the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT
twice.

Here is the call chain:
 wakeup_start
       |- lgdtl wakeup_gdt [the work-around broken BIOSes]
       |
       | - lgdtl pmode_gdt [the real one]
       |
       \-- startup_32 (in trampoline_32.S)
              \-- wakeup_pmode_return (in wakeup_32.S)
                       |- lgdtl saved_gdt [the real one]
                       \-- ret_point
                             |..
                             |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that
along with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and
cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_32.S is invoked which
restores the contents of most registers. Naturally the resume path benefits
from already being in 32-bit mode, so it does not have to reload the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note
that the restoration of the restore image page-tables is done prior to
this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT
which is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 15:40:17 -07:00
Kees Cook 4eefbe792b x86: Use a read-only IDT alias on all CPUs
Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
This primarily removes the IDT from being a target for arbitrary memory
write attacks, and has the added benefit of also not leaking the kernel
base offset, if it has been relocated.

We already did this on vendor == Intel and family == 5 because of the
F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or
not.  Since the workaround was so cheap, there simply was no reason to
be very specific.  This patch extends the readonly alias to all CPUs,
but does not activate the #PF to #UD conversion code needed to deliver
the proper exception in the F0 0F case except on Intel family 5
processors.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
Cc: Eric Northup <digitaleric@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-11 13:53:19 -07:00