linux

mirror of https://github.com/torvalds/linux synced 2024-10-26 05:09:47 +00:00

Author	SHA1	Message	Date
Linus Torvalds	1b195b170d	Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 * 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: Improve the "Early log buffer exceeded" error message kmemleak: fix sparse warning for static declarations kmemleak: fix sparse warning over overshadowed flags kmemleak: move common painting code together kmemleak: add clear command support kmemleak: use bool for true/false questions kmemleak: Do no create the clean-up thread during kmemleak_disable() kmemleak: Scan all thread stacks kmemleak: Don't scan uninitialized memory when kmemcheck is enabled kmemleak: Ignore the aperture memory hole on x86_64 kmemleak: Printing of the objects hex dump kmemleak: Do not report alloc_bootmem blocks as leaks kmemleak: Save the stack trace for early allocations kmemleak: Mark the early log buffer as __initdata kmemleak: Dump object information on request kmemleak: Allow rescheduling during an object scanning	2009-09-11 09:16:22 -07:00
Ben Hutchings	5367b6887e	x86: Fix code patching for paravirt-alternatives on 486 As reported in <http://bugs.debian.org/511703> and <http://bugs.debian.org/515982>, kernels with paravirt-alternatives enabled crash in text_poke_early() on at least some 486-class processors. The problem is that text_poke_early() itself uses inline functions affected by paravirt-alternatives and so will modify instructions that have already been prefetched. Pentium and later processors will invalidate the prefetched instructions in this case, but 486-class processors do not. Change sync_core() to limit prefetching on 486-class (and 386-class) processors, and move the call to sync_core() above the call to the modifiable local_irq_restore(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> LKML-Reference: <1252547631.3423.134.camel@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-10 16:50:19 -07:00
James Morris	a3c8b97396	Merge branch 'next' into for-linus	2009-09-11 08:04:49 +10:00
Marcelo Tosatti	6ba6617875	KVM guest: do not batch pte updates from interrupt context Commit `b8bcfe997e` made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously. https://bugzilla.redhat.com/show_bug.cgi?id=518022 Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 18:10:50 +03:00
Glauber Costa	a20316d2aa	KVM guest: fix bogus wallclock physical address calculation The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Cc: stable@kernel.org Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:58 +03:00
Jiri Slaby	748df9a4c6	x86/PCI: pci quirks, fix pci refcounting Stanse found a pci reference leak in quirk_amd_nb_node. Instead of putting nb_ht, there is a put of dev passed as an argument. http://stanse.fi.muni.cz/ Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 14:11:02 -07:00
Alex Williamson	80286879c2	PCI iommu: iommu=pt is a valid early param This avoids a "Malformed early option 'iommu'" on boot when trying to use pass-through mode. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-09-09 13:29:28 -07:00
Ingo Molnar	ed011b22ce	Merge commit 'v2.6.31-rc9' into tracing/core Merge reason: move from -rc5 to -rc9. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:11:42 +02:00
Ingo Molnar	695a461296	Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu	2009-09-04 14:44:16 +02:00
Yinghai Lu	0d96b9ff74	x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus Otherwise, system with apci id lifting will have wrong apicid in /proc/cpuinfo. and use that in srat_detect_node(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <4A998CCA.1040407@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:55:29 +02:00
markus.t.metzger@intel.com	1653192f51	x86, perf_counter, bts: Do not allow kernel BTS tracing for now Kernel BTS tracing generates too much data too fast for us to handle, causing the kernel to hang. Fail for BTS requests for kernel code. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zjilstra@chello.nl> LKML-Reference: <20090902140616.901253000@intel.com> [ This is really a workaround - but we want BTS tracing in .32 so make sure we dont regress. The lockup should be fixed ASAP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:40 +02:00
markus.t.metzger@intel.com	596da17f94	x86, perf_counter, bts: Correct pointer-to-u64 casts On 32bit, pointers in the DS AREA configuration are cast to u64. The current (long) cast to avoid compiler warnings results in a signed 64bit address. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090902140615.305889000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:39 +02:00
markus.t.metzger@intel.com	747b50aaf7	x86, perf_counter, bts: Fail if BTS is not available Reserve PERF_COUNT_HW_BRANCH_INSTRUCTIONS with sample_period == 1 for BTS tracing and fail, if BTS is not available. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090902140612.943801000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 09:26:39 +02:00
Jeremy Fitzhardinge	53f824520b	x86/i386: Put aligned stack-canary in percpu shared_aligned section Pack aligned things together into a special section to minimize padding holes. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <4AA035C0.9070202@goop.org> [ queued up in tip:x86/asm because it depends on this commit: x86/i386: Make sure stack-protector segment base is cache aligned ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 07:10:31 +02:00
Andreas Herrmann	5a925b4282	x86, sched: Workaround broken sched domain creation for AMD Magny-Cours Current sched domain creation code can't handle multi-node processors. When switching to power_savings scheduling errors show up and system might hang later on (due to broken sched domain hierarchy): # echo 0 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-5 level MC groups: 0 1 2 3 4 5 domain 1: span 0-23 level NODE groups: 0-5 6-11 18-23 12-17 ... # echo 1 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-11 level MC groups: 0 1 2 3 4 5 6 7 8 9 10 11 ERROR: parent span is not a superset of domain->span domain 1: span 0-5 level CPU ERROR: domain->groups does not contain CPU0 groups: 6-11 (__cpu_power = 12288) ERROR: groups don't span domain->span domain 2: span 0-23 level NODE groups: ERROR: domain->cpu_power not set ERROR: groups don't span domain->span ... Fixing all aspects of power-savings scheduling for Magny-Cours needs some larger changes in the sched domain creation code. As a short-term and temporary workaround avoid the problems by extending "the worst possible hack" ;-( and always use llc_shared_map on AMD Magny-Cours when MC domain span is calculated. With this I get: # echo 1 >> /sys/devices/system/cpu/sched_mc_power_savings CPU0 attaching sched-domain: domain 0: span 0-5 level MC groups: 0 1 2 3 4 5 domain 1: span 0-5 level CPU groups: 0-5 (__cpu_power = 6144) domain 2: span 0-23 level NODE groups: 0-5 (__cpu_power = 6144) 6-11 (__cpu_power = 6144) 18-23 (__cpu_power = 6144) 12-17 (__cpu_power = 6144) ... I.e. no errors during sched domain creation, no system hangs, and also mc_power_savings scheduling works to a certain extend. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:14 -07:00
Andreas Herrmann	cb9805ab5b	x86, mcheck: Use correct cpumask for shared bank4 This fixes threshold_bank4 support on multi-node processors. The correct mask to use is llc_shared_map, representing an internal node on Magny-Cours. We need to create 2 sets of symlinks for sibling shared banks -- one set for each internal node, symlinks of each set should target the first core on same internal node. Currently only one set is created where all symlinks are targeting the first core of the entire socket. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:08 -07:00
Andreas Herrmann	a326e948c5	x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors L3 cache size, associativity and shared_cpu information need to be adapted to show information for an internal node instead of the entire physical package. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:10:03 -07:00
Andreas Herrmann	4a376ec3a2	x86: Fix CPU llc_shared_map information for AMD Magny-Cours Construct entire NodeID and use it as cpu_llc_id. Thus internal node siblings are stored in llc_shared_map. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-03 15:09:59 -07:00
Jeremy Fitzhardinge	1ea0d14e48	x86/i386: Make sure stack-protector segment base is cache aligned The Intel Optimization Reference Guide says: In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay. - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles. [...] Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost. We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: <stable@kernel.org> LKML-Reference: <4AA01893.6000507@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-03 21:30:51 +02:00
Joerg Roedel	2b681fafcc	Merge branch 'amd-iommu/pagetable' into amd-iommu/2.6.32 Conflicts: arch/x86/kernel/amd_iommu.c	2009-09-03 17:14:57 +02:00
Joerg Roedel	03362a05c5	Merge branch 'amd-iommu/passthrough' into amd-iommu/2.6.32 Conflicts: arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c	2009-09-03 16:34:23 +02:00
Joerg Roedel	85da07c409	Merge branches 'gart/fixes', 'amd-iommu/fixes+cleanups' and 'amd-iommu/fault-handling' into amd-iommu/2.6.32	2009-09-03 16:32:00 +02:00
Joerg Roedel	4751a95134	x86/amd-iommu: Initialize passthrough mode when requested This patch enables the passthrough mode for AMD IOMMU by running the initialization function when iommu=pt is passed on the kernel command line. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:46 +02:00
Joerg Roedel	a1ca331c8a	x86/amd-iommu: Don't detach device from pt domain on driver unbind This patch makes sure a device is not detached from the passthrough domain when the device driver is unloaded or does otherwise release the device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:46 +02:00
Joerg Roedel	21129f786f	x86/amd-iommu: Make sure a device is assigned in passthrough mode When the IOMMU driver runs in passthrough mode it has to make sure that every device not assigned to an IOMMU-API domain must be put into the passthrough domain instead of keeping it unassigned. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:45 +02:00
Joerg Roedel	eba6ac60ba	x86/amd-iommu: Align locking between attach_device and detach_device This patch makes the locking behavior between the functions attach_device and __attach_device consistent with the locking behavior between detach_device and __detach_device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:44 +02:00
Joerg Roedel	aa879fff5d	x86/amd-iommu: Fix device table write order The V bit of the device table entry has to be set after the rest of the entry is written to not confuse the hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:43 +02:00
Joerg Roedel	0feae533dd	x86/amd-iommu: Add passthrough mode initialization functions When iommu=pt is passed on kernel command line the devices should run untranslated. This requires the allocation of a special domain for that purpose. This patch implements the allocation and initialization path for iommu=pt. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:42 +02:00
Joerg Roedel	2650815fb0	x86/amd-iommu: Add core functions for pd allocation/freeing This patch factors some code of protection domain allocation into seperate functions. This way the logic can be used to allocate the passthrough domain later. As a side effect this patch fixes an unlikely domain id leakage bug. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:15:34 +02:00
Joerg Roedel	ac0101d396	x86/dma: Mark iommu_pass_through as __read_mostly This variable is read most of the time. This patch marks it as such. It also documents the meaning the this variable while at it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:13:46 +02:00
Joerg Roedel	abdc5eb3d6	x86/amd-iommu: Change iommu_map_page to support multiple page sizes This patch adds a map_size parameter to the iommu_map_page function which makes it generic enough to handle multiple page sizes. This also requires a change to alloc_pte which is also done in this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:11:17 +02:00
Joerg Roedel	a6b256b413	x86/amd-iommu: Support higher level PTEs in iommu_page_unmap This patch changes fetch_pte and iommu_page_unmap to support different page sizes too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:11:08 +02:00
Joerg Roedel	8f7a017ce0	x86/amd-iommu: Use 2-level page tables for dma_ops domains The driver now supports a dynamic number of levels for IO page tables. This allows to reduce the number of levels for dma_ops domains by one because a dma_ops domain has usually an address space size between 128MB and 4G. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:49 +02:00
Joerg Roedel	bad1cac28a	x86/amd-iommu: Remove bus_addr check in iommu_map_page The driver now supports full 64 bit device address spaces. So this check is not longer required. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:48 +02:00
Joerg Roedel	8c8c143cdc	x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX This change allows to remove these old macros later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:47 +02:00
Joerg Roedel	8bc3e12742	x86/amd-iommu: Change alloc_pte to support 64 bit address space This patch changes the alloc_pte function to be able to map pages into the whole 64 bit address space supported by AMD IOMMU hardware from the old limit of 2**39 bytes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:46 +02:00
Joerg Roedel	50020fb632	x86/amd-iommu: Introduce increase_address_space function This function will be used to increase the address space size of a protection domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:46 +02:00
Joerg Roedel	04bfdd8406	x86/amd-iommu: Flush domains if address space size was increased Thist patch introduces the update_domain function which propagates the larger address space of a protection domain to the device table and flushes all relevant DTEs and the domain TLB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:45 +02:00
Joerg Roedel	407d733e30	x86/amd-iommu: Introduce set_dte_entry function This function factors out some logic of attach_device to a seperate function. This new function will be used to update device table entries when necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:44 +02:00
Joerg Roedel	6a0dbcbe4e	x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices This patch adds a generic variant of amd_iommu_flush_all_devices function which flushes only the DTEs for a given protection domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:43 +02:00
Joerg Roedel	a6d41a4027	x86/amd-iommu: Use fetch_pte in amd_iommu_iova_to_phys Don't reimplement the page table walker in this function. Use the generic one. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:42 +02:00
Joerg Roedel	38a76eeeaf	x86/amd-iommu: Use fetch_pte in iommu_unmap_page Instead of reimplementing existing logic use fetch_pte to walk the page table in iommu_unmap_page. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:41 +02:00
Joerg Roedel	9355a08186	x86/amd-iommu: Make fetch_pte aware of dynamic mapping levels This patch changes the fetch_pte function in the AMD IOMMU driver to support dynamic mapping levels. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 16:03:34 +02:00
Joerg Roedel	6a1eddd2f9	x86/amd-iommu: Reset command buffer if wait loop fails Instead of a panic on an comletion wait loop failure, try to recover from that event from resetting the command buffer. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:56:19 +02:00
Joerg Roedel	b26e81b871	x86/amd-iommu: Panic if IOMMU command buffer reset fails To prevent the driver from doing recursive command buffer resets, just panic when that recursion happens. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:35 +02:00
Joerg Roedel	a345b23b79	x86/amd-iommu: Reset command buffer on ILLEGAL_COMMAND_ERROR On an ILLEGAL_COMMAND_ERROR the IOMMU stops executing further commands. This patch changes the code to handle this case better by resetting the command buffer in the IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:34 +02:00
Joerg Roedel	93f1cc67cf	x86/amd-iommu: Add reset function for command buffers This patch factors parts of the command buffer initialization code into a seperate function which can be used to reset the command buffer later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:34 +02:00
Joerg Roedel	d586d7852c	x86/amd-iommu: Add function to flush all DTEs on one IOMMU This function flushes all DTE entries on one IOMMU for all devices behind this IOMMU. This is required for command buffer resetting later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:55:23 +02:00
Joerg Roedel	e0faf54ee8	x86/amd-iommu: fix broken check in amd_iommu_flush_all_devices The amd_iommu_pd_table is indexed by protection domain number and not by device id. So this check is broken and must be removed. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	ae908c22aa	x86/amd-iommu: Remove redundant 'IOMMU' string The 'IOMMU: ' prefix is not necessary because the DUMP_printk macro already prints its own prefix. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	4c6f40d4e0	x86/amd-iommu: replace "AMD IOMMU" by "AMD-Vi" This patch replaces the "AMD IOMMU" printk strings with the official name for the hardware: "AMD-Vi". Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:56 +02:00
Joerg Roedel	f2430bd104	x86/amd-iommu: Remove some merge helper code This patch removes some left-overs which where put into the code to simplify merging code which also depends on changes in other trees. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:49:55 +02:00
Joerg Roedel	e394d72aa8	x86/amd-iommu: Introduce function for iommu-local domain flush This patch introduces a function to flush all domain tlbs for on one given IOMMU. This is required later to reset the command buffer on one IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 15:41:34 +02:00
Joerg Roedel	945b4ac44e	x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR This patch adds code to dump the command which caused an ILLEGAL_COMMAND_ERROR raised by the IOMMU hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 14:28:04 +02:00
Joerg Roedel	e3e59876e8	x86/amd-iommu: Dump fault entry on DTE error This patch adds code to dump the content of the device table entry which caused an ILLEGAL_DEV_TABLE_ENTRY error from the IOMMU hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-09-03 14:17:08 +02:00
Ingo Molnar	f76bd108e5	Merge branch 'perfcounters/urgent' into perfcounters/core Merge reason: We are going to modify a place modified by perfcounters/urgent. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-02 21:42:59 +02:00
David Howells	ee18d64c1f	KEYS: Add a keyctl to install a process's session keyring on its parent [try #6 ] Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-09-02 21:29:22 +10:00
Ingo Molnar	936e894a97	Merge commit 'v2.6.31-rc8' into x86/txt Conflicts: arch/x86/kernel/reboot.c security/Kconfig Merge reason: resolve the conflicts, bump up from rc3 to rc8. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-02 08:17:56 +02:00
Shane Wang	69575d3886	x86, intel_txt: clean up the impact on generic code, unbreak non-x86 Move tboot.h from asm to linux to fix the build errors of intel_txt patch on non-X86 platforms. Remove the tboot code from generic code init/main.c and kernel/cpu.c. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-01 18:25:07 -07:00
Prarit Bhargava	1a8e42fa81	[CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module. Create a blacklist for processors that should not load the acpi-cpufreq module. The initial entry in the blacklist function is the Intel 0f68 processor. It's specification update mentions errata AL30 which implies that cpufreq should not run on this processor. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-09-01 12:45:20 -04:00
Mark Langsdorf	db39d5529d	[CPUFREQ] Powernow-k8: Enable more than 2 low P-states Remove an obsolete check that used to prevent there being more than 2 low P-states. Now that low-to-low P-states changes are enabled, it prevents otherwise workable configurations with multiple low P-states. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Tested-by: Krists Krilovs <pow@pow.za.net> Signed-off-by: Dave Jones <davej@redhat.com>	2009-09-01 12:45:20 -04:00
Catalin Marinas	acde31dc46	kmemleak: Ignore the aperture memory hole on x86_64 This block is allocated with alloc_bootmem() and scanned by kmemleak but the kernel direct mapping may no longer exist. This patch tells kmemleak to ignore this memory hole. The dma32_bootmem_ptr in dma32_reserve_bootmem() is also ignored. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2009-09-01 11:12:32 +01:00
H. Peter Anvin	ff55df53df	x86, msr: Export the register-setting MSR functions via /dev/*/msr Make it possible to access the all-register-setting/getting MSR functions via the MSR driver. This is implemented as an ioctl() on the standard MSR device node. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <petkovbb@gmail.com>	2009-08-31 16:16:04 -07:00
H. Peter Anvin	0cc0213e73	x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT For some reason, the _safe MSR functions returned -EFAULT, not -EIO. However, the only user which cares about the return code as anything other than a boolean is the MSR driver, which wants -EIO. Change it to -EIO across the board. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-08-31 15:15:23 -07:00
Borislav Petkov	6b0f43ddfa	x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit `fbd8b1819e` turns off the bit for /proc/cpuinfo. However, a proper/full fix would be to additionally turn off the bit in the CPUID output so that future callers get correct CPU features info. Do that by basically reversing what the BIOS wrongfully does at boot. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1251705011-18636-3-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:29 -07:00
Borislav Petkov	177fed1ee8	x86, msr: Rewrite AMD rd/wrmsr variants Switch them to native_{rd,wr}msr_safe_regs and remove pv_cpu_ops.read_msr_amd. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:28 -07:00
Borislav Petkov	132ec92f3f	x86, msr: Add rd/wrmsr interfaces with preset registers native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-31 15:14:26 -07:00
Thomas Gleixner	e11dadabf4	x86: apic namespace cleanup boot_cpu_physical_apicid is a global variable and used as function argument as well. Rename the function arguments to avoid confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 21:30:47 +02:00
Thomas Gleixner	bc07844a33	x86: Distangle ioapic and i8259 The proposed Moorestown support patches use an extra feature flag mechanism to make the ioapic work w/o an i8259. There is a much simpler solution. Most i8259 specific functions are already called dependend on the irq number less than NR_IRQS_LEGACY. Replacing that constant by a read_mostly variable which can be set to 0 by the platform setup code allows us to achieve the same without any special feature flags. That trivial change allows us to proceed with MRST w/o doing a full blown overhaul of the ioapic code which would delay MRST unduly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 19:23:09 +02:00
Thomas Gleixner	3f4110a48a	x86: Add Moorestown early detection Moorestown MID devices need to be detected early in the boot process to setup and do not call x86_default_early_setup as there is no EBDA region to reserve. [ Copied the minimal code from Jacobs latest MRST series ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@intel.com>	2009-08-31 11:09:40 +02:00
Pan, Jacob jun	162bc7ab01	x86: Add hardware_subarch ID for Moorestown x86 bootprotocol 2.07 has introduced hardware_subarch ID in the boot parameters provided by FW. We use it to identify Moorestown platforms. [ tglx: Cleanup and paravirt fix ] Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 11:09:40 +02:00
Thomas Gleixner	47a3d5da70	x86: Add early platform detection Platforms like Moorestown require early setup and want to avoid the call to reserve_ebda_region. The x86_init override is too late when the MRST detection happens in setup_arch. Move the default i386 x86_init overrides and the call to reserve_ebda_region into a separate function which is called as the default of a switch case depending on the hardware_subarch id in boot params. This allows us to add a case for MRST and let MRST have its own early setup function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 11:09:40 +02:00
Thomas Gleixner	dd0a70c8f9	x86: Move tsc_init to late_time_init We do not need the TSC before late_time_init. Move the tsc_init to the late time init code so we can also utilize HPET for calibration (which we claimed to do but never did except in some older kernel version). This also helps Moorestown to calibrate the TSC with the AHBT timer which needs to be initialized in late_time_init like HPET. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:47 +02:00
Thomas Gleixner	2d826404f0	x86: Move tsc_calibration to x86_init_ops TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:47 +02:00
Thomas Gleixner	47926214d8	x86: Replace the now identical time_32/64.c by time.c Remove the redundant copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	ef4512882d	x86: time_32/64.c unify profile_pc The code is identical except for the formatting and a useless #ifdef. Make it the same. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	08047c4f17	x86: Move calibrate_cpu to tsc.c Move the code where it's only user is. Also we need to look whether this hardwired hackery might interfere with perfcounters. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	454ede7eeb	x86: Make timer setup and global variables the same in time_32/64.c The timer and timer irq setup code is identical in 32 and 64 bit. Make it the same formatting as well. Also add the global variables under the necessary ifdefs to both files. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	0be6939422	x86: Remove mca bus ifdef from timer interrupt MCA_bus is constant 0 when CONFIG_MCA=n. So the compiler removes that code w/o needing an extra #ifdef Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	64fcbac1f3	x86: Simplify timer_ack magic in time_32.c Let the compiler optimize the timer_ack magic away in the 32bit timer interrupt and put the same code into time_64.c. It's optimized out for CONFIG_X86_IO_APIC on 32bit and for 64bit because timer_ack is const 0 in both cases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	dd3e6e8c6e	x86: Prepare unification of time_32/64.c Unify the top comment and the includes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	ecce85089e	x86: Remove do_timer hook This is a left over of the old x86 sub arch support. Remove it and open code it like we do in time_64.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	845b3944bb	x86: Add timer_init to x86_init_ops The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	736decac64	x86: Move percpu clockevents setup to x86_init_ops paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	f1d7062a23	x86: Move xen_post_allocator_init into xen_pagetable_setup_done We really do not need two paravirt/x86_init_ops functions which are called in two consecutive source lines. Move the only user of post_allocator_init into the already existing pagetable_setup_done function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	030cb6c00d	x86: Move paravirt pagetable_setup to x86_init_ops Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	6f30c1ac3f	x86: Move paravirt banner printout to x86_init_ops Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	42bbdb43b1	x86: Replace ARCH_SETUP by a proper x86_init_ops ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	428cf9025b	x86: Move traps_init to x86_init_ops Replace the quirks by a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	66bcaf0bde	x86: Move irq_init to x86_init_ops irq_init is overridden by x86_quirks and by paravirts. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	d9112f4302	x86: Move pre_intr_init to x86_init_ops Replace the quirk machinery by a x86_init_ops function which defaults to the standard implementation. This is also a preparatory patch for Moorestown support which needs to replace the default init_ISA_irqs as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	b3f1b617f4	x86: Move get/find_smp_config to x86_init_ops Replace the quirk machinery by a x86_init_ops function which defaults to the standard implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Jan Beulich	47d25003cb	x86: Fix earlyprintk=dbgp for machines without NX Since parse_early_param() may (e.g. for earlyprintk=dbgp) involve calls to page table manipulation functions (here set_fixmap_nocache()), NX hardware support must be determined before calling that function (so that __supported_pte_mask gets properly set up). But the call after parse_early_param() can also not go away, as that will honor eventual command line specified disabling of the NX functionality. ( This will then just result in whatever mappings got established during parse_early_param() having the NX bit set despite it being disabled on the command line, but I think that's tolerable). Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <4A97F3BD02000078000121B9@vpn.id2.novell.com> [ merged to x86/pat to resolve a conflict. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-29 15:47:32 +02:00
Ingo Molnar	eebc57f73d	Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI tree needs this cleanup, pull it into the APIC branch as well so that there's no interactions. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-29 09:31:47 +02:00
Feng Tang	efafc8b213	x86: add arch-specific SFI support arch/x86/kernel/sfi.c serves the dual-purpose of supporting the SFI core with arch specific code, as well as a home for the arch-specific code that uses SFI. analogous to ACPI, drivers/sfi/Kconfig is pulled in by arch/x86/Kconfig Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2009-08-28 19:57:34 -04:00
Feng Tang	2a4ab640d3	ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n Some IO-APIC routines are ACPI specific now, but need to be exposed when CONFIG_ACPI=n for the benefit of SFI. Remove #ifdef ACPI around these routines: io_apic_get_unique_id(int ioapic, int apic_id); io_apic_get_version(int ioapic); io_apic_get_redir_entries(int ioapic); Move these routines from ACPI-specific boot.c to io_apic.c: uniq_ioapic_id(u8 id) mp_find_ioapic() mp_find_ioapic_pin() mp_register_ioapic() Also, since uniq_ioapic_id() is now no longer static, re-name it to io_apic_unique_id() for consistency with the other public io_apic routines. For simplicity, do not #ifdef the resulting code ACPI \|\| SFI, thought that could be done in the future if it is important to optimize the !ACPI !SFI IO-APIC x86 kernel for size. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2009-08-28 19:57:27 -04:00
Thomas Gleixner	7285dd7fd3	clocksource: Resolve cpu hotplug dead lock with TSC unstable Martin Schwidefsky analyzed it: To register a clocksource the clocksource_mutex is acquired and if necessary timekeeping_notify is called to install the clocksource as the timekeeper clock. timekeeping_notify uses stop_machine which needs to take cpu_add_remove_lock mutex. Starting a new cpu is done with the cpu_add_remove_lock mutex held. native_cpu_up checks the tsc of the new cpu and if the tsc is no good clocksource_change_rating is called. Which needs the clocksource_mutex and the deadlock is complete. The solution is to replace the TSC via the clocksource watchdog mechanism. Mark the TSC as unstable and schedule the watchdog work so it gets removed in the watchdog thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com>	2009-08-28 20:25:24 +02:00
Thomas Gleixner	90e1c6969d	x86: Move oem_bus_info to x86_init_ops Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:53 +02:00
Thomas Gleixner	52fdb56846	x86: Move mpc_oem_pci_bus to x86_init_ops Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	72302142e1	x86: Move smp_read_mpc_oem to x86_init_ops. Move smp_read_mpc_oem from quirks to x86_init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	fd6c666149	x86: Move mpc_apic_id to x86_init_ops The mpc_apic_id setup is handled by a x86_quirk. Make it a x86_init_ops function with a default implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	de93410310	x86: Move ioapic_ids_setup to x86_init_ops 32bit and also the numaq code have special requirements on the ioapic_id setup. Convert it to a x86_init_ops function and get rid of the quirks and #ifdefs Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	f4848472cd	x86: Sanitize smp_record and move it to x86_init_ops The x86 quirkification introduced an extra ugly hackery with a variable pointer in the mpparse code. If the pointer is initialized then it is dereferenced and the variable set to 0 or incremented. Create a x86_init_ops function and let the affected numaq code hold the function. Default init is a setup noop. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	6b18ae3e2f	x86: Move memory_setup to x86_init_ops memory_setup is overridden by x86_quirks and by paravirts with weak functions and quirks. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	816c25e7d4	x86: Add reserve_ebda_region to x86_init_ops reserve_ebda_region needs to be called befor start_kernel. Moorestown needs to override it. Make it a x86_init_ops function and initialize it with the default reserve_ebda_region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	8fee697d99	x86: Add request_standard_resources to x86_init The 32bit and the 64bit code are slighty different in the reservation of standard resources. Also the upcoming Moorestown support needs its own version of that. Add it to x86_init_ops and initialize it with the 64bit default. 32bit overrides it in early boot. Now moorestown can add it's own override w/o sprinkling the code with more #ifdefs Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	f7cf5a5b8c	x86: Add probe_roms to x86_init probe_roms is only used on 32bit. Add it to the x86_init ops and remove the #ifdefs. Default initializer is x86_init_noop() which is overridden in the 32bit boot code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	57844a8f8e	x86: Add x86_init infrastructure The upcoming Moorestown support brings the embedded world to x86. The setup code of x86 has already a couple of hooks which are either x86_quirks or paravirt ops. Some of those setup hooks are pretty convoluted like the timer setup and the tsc calibration code. But there are other places which could do with a cleanup. Instead of having inline functions/macros which are modified at compile time I decided to introduce x86_init ops which are unconditional in the code and make it clear that they can be changed either during compile time or in the early boot process. The function pointers are initialized by default functions which can be noops so that the pointer can be called unconditionally in the most cases. This also allows us to remove 32bit/64bit, paravirt and other #ifdeffery. paravirt guests are just a hardware platform in the setup code, so we should treat them as such and not hide all behind multiple layers of indirection and compile time dependencies. It's more obvious that x86_init.timers.timer_init() is a function pointer than the late_time_init = choose_time_init() obscurity. It's also way simpler to grep for x86_init.timers.timer_init and find all the places which modify that function pointer instead of analyzing weak functions, macros and paravirt indirections. Note. This is not a general paravirt_ops replacement. It just will move setup related hooks which are potentially useful for other platform setup purposes as well out of the paravirt domain. Add the base infrastructure without any functionality. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
Thomas Gleixner	4152f93508	Merge branch 'sched/clock' into x86/cleanups Reason: The tsc init cleanup depends on sched_clock_init moving past late_time_init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:06:24 +02:00
H. Peter Anvin	b855192c08	Merge branch 'x86/urgent' into x86/pat Reason: Change to is_new_memtype_allowed() in x86/urgent Resolved semantic conflicts in: arch/x86/mm/pat.c arch/x86/mm/ioremap.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 17:24:28 -07:00
Jason Baron	57421dbbdc	tracing: Convert event tracing code to use NR_syscalls Convert the syscalls event tracing code to use NR_syscalls, instead of FTRACE_SYSCALL_MAX. NR_syscalls is standard accross most arches, and reduces code confusion/complexity. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <9b4f1a84ecae57cc6599412772efa36f0d2b815b.1251146513.git.jbaron@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:30:02 +02:00
Jason Baron	a5a2f8e2ac	tracing: Define NR_syscalls for x86_64 Express the available number of syscalls in a standard way by defining NR_syscalls. The common way to define it is to place its definition in asm/unistd.h However, the number of syscalls is defined using __NR_syscall_max in x86-64 after building a dynamic header file "asm-offsets.h" The source file that generates this header, asm-offsets-64.c includes unistd.h, then if we want to express NR_syscalls from __NR_syscall_max in unistd.h only after generating the dynamic header file, we need a watchguard. If unistd.h is included from asm-offsets-64.c, then we are generating asm-offset.h which defines __NR_syscall_max. At this time, we don't want to (we can't) define NR_syscalls, then we do nothing. Otherwise we define NR_syscalls because we know asm-offsets.h has been generated. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Josh Stone <jistone@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anwin <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <20090826160910.GB2658@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 21:29:58 +02:00
Cyrill Gorcunov	d3a247bfb2	x86, apic: Slim down stack usage in early_init_lapic_mapping() As far as I see there is no external poking of mp_lapic_addr in this procedure which could lead to unpredited changes and require local storage unit for it. Lets use it plain forward. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090826171324.GC4548@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 20:26:23 +02:00
Hidetoshi Seto	680b6cfd3c	x86, mce: CE in last bank prevents panic by unknown MCE If MCE handler is called but none of mces_seen have machine check event which might signal the MCE (i.e. event higher than MCE_KEEP_SEVERITY), panic with "Machine check from unknown source" will be taken since the MCE is assumed to be signaled from external agent or so. Usually mces_seen never point MCE_KEEP_SEVERITY event such as CE. But it can happen because initial value of mces_seen is accidentally modified by mce_no_way_out() - in case if mce_no_way_out() run through all banks and the last bank has the CE, mces_seen points the CE and the "panic by unknown" will not be taken. This patch fixes this undesired behavior, and clarifies the logic. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com> LKML-Reference: <4A94E244.3020301@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>	2009-08-26 20:21:11 +02:00
Yinghai Lu	295594e9cf	x86: Fix vSMP boot crash 2.6.31-rc7 does not boot on vSMP systems: [ 8.501108] CPU31: Thermal monitoring enabled (TM1) [ 8.501127] CPU 31 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8 [ 8.650254] CPU31: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz stepping 04 [ 8.710324] Brought up 32 CPUs [ 8.713916] Total of 32 processors activated (162314.96 BogoMIPS). [ 8.721489] ERROR: parent span is not a superset of domain->span [ 8.727686] ERROR: domain->groups does not contain CPU0 [ 8.733091] ERROR: groups don't span domain->span [ 8.737975] ERROR: domain->cpu_power not set [ 8.742416] Ravikiran Thirumalai bisected it to: \| commit `2759c3287d` \| x86: don't call read_apic_id if !cpu_has_apic The problem is that on vSMP systems the CPUID derived initial-APICIDs are overlapping - so we need to fall back on hard_smp_processor_id() which reads the local APIC. Both come from the hardware (influenced by firmware though) so it's a tough call which one to trust. Doing the quirk expresses the vSMP property properly and also does not affect other systems, so we go for this solution instead of a revert. Reported-and-Tested-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shai Fultheim <shai@scalex86.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4A944D3C.5030100@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 10:13:17 +02:00
Cyrill Gorcunov	5051fd6977	x86, e820: Guard against array overflowed in __e820_add_region() Better to be paranoid against unpredicted nr_map modifications. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090824175551.146070377@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:17:47 +02:00
Cyrill Gorcunov	ffc438366c	x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources() alloc_bootmem() already panics on allocation failure. There is no need to check the result. Also there is a way to unbind global variable from its body and use it as a parameter which allow us to simplify ioapic_init_mappings as well -- "for" cycle already uses nr_ioapics as a conditional variable and there is no need to check if ioapic_setup_resources was returning NULL again. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090824175551.493629148@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:16:38 +02:00
Cyrill Gorcunov	8f3e1df48b	x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant We already have APIC_DEFAULT_PHYS_BASE so just to be consistent. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090824175550.927946757@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-26 08:16:38 +02:00
Josh Stone	1c569f0264	tracing: Create generic syscall TRACE_EVENTs This converts the syscall_enter/exit tracepoints into TRACE_EVENTs, so you can have generic ftrace events that capture all system calls with arguments and return values. These generic events are also renamed to sys_enter/exit, so they're more closely aligned to the specific sys_enter_foo events. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-5-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:41:48 +02:00
H. Peter Anvin	e8a2eb47e6	Merge commit 'origin/x86/urgent' into x86/asm	2009-08-25 15:40:29 -07:00
Josh Stone	9741987586	tracing: Move tracepoint callbacks from declaration to definition It's not strictly correct for the tracepoint reg/unreg callbacks to occur when a client is hooking up, because the actual tracepoint may not be present yet. This happens to be fine for syscall, since that's in the core kernel, but it would cause problems for tracepoints defined in a module that hasn't been loaded yet. It also means the reg/unreg has to be EXPORTed for any modules to use the tracepoint (as in SystemTap). This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces DEFINE_TRACE_FN which stores the callbacks in struct tracepoint. The callbacks are used now when the active state of the tracepoint changes in set_tracepoint & disable_tracepoint. This also introduces TRACE_EVENT_FN, so ftrace events can also provide registration callbacks if needed. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:36:41 +02:00
Josh Stone	6670000119	tracing: Rename FTRACE_SYSCALLS for tracepoints s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g The syscall enter/exit tracing is no longer specific to just ftrace, so they now have names that reflect their tie to tracepoints instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-26 00:17:35 +02:00
Linus Torvalds	44afa9a4b8	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevent: Prevent dead lock on clockevents_lock timers: Drop write permission on /proc/timer_list	2009-08-25 11:24:04 -07:00
Linus Torvalds	9f459fadbb	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix build with older binutils and consolidate linker script x86: Fix an incorrect argument of reserve_bootmem() x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile xen: rearrange things to fix stackprotector x86: make sure load_percpu_segment has no stackprotector i386: Fix section mismatches for init code with !HOTPLUG_CPU x86, pat: Allow ISA memory range uncacheable mapping requests	2009-08-25 11:23:25 -07:00
Roel Kluin	005155b1f6	x86: Fix x86_model test in es7000_apic_is_cluster() For the x86_model to be greater than 6 or less than 12 is logically always true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-25 15:58:12 +02:00
Jan Beulich	c62e43202e	x86: Fix build with older binutils and consolidate linker script binutils prior to 2.17 can't deal with the currently possible situation of a new segment following the per-CPU segment, but that new segment being empty - objcopy misplaces the .bss (and perhaps also the .brk) sections outside of any segment. However, the current ordering of sections really just appears to be the effect of cumulative unrelated changes; re-ordering things allows to easily guarantee that the segment following the per-CPU one is non-empty, and at once eliminates the need for the bogus data.init2 segment. Once touching this code, also use the various data section helper macros from include/asm-generic/vmlinux.lds.h. -v2: fix !SMP builds. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: <sam@ravnborg.org> LKML-Reference: <4A94085D02000078000119A5@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-25 15:54:16 +02:00
Alexey Dobriyan	10f02d1168	x86: uv: Clean up uv_ptc_init(), use proc_create() create_proc_entry() is getting duhprecated. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: cpw@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 12:26:46 +02:00
Ingo Molnar	5f9ece0240	Merge commit 'v2.6.31-rc7' into x86/cleanups Merge reason: we were on -rc1 before - go up to -rc7 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 12:25:54 +02:00
Ingo Molnar	8a517c514d	Merge commit 'v2.6.31-rc7' into x86/cpu	2009-08-23 11:18:47 +02:00
H. Peter Anvin	5400743db5	x86, mtrr: make mtrr_aps_delayed_init static bool mtr_aps_delayed_init was declared u32 and made global, but it only ever takes boolean values and is only ever used in arch/x86/kernel/cpu/mtrr/main.c. Declare it "static bool" and remove external references. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2009-08-21 17:00:02 -07:00
Suresh Siddha	d0af9eed5a	x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init SDM Vol 3a section titled "MTRR considerations in MP systems" specifies the need for synchronizing the logical cpu's while initializing/updating MTRR. Currently Linux kernel does the synchronization of all cpu's only when a single MTRR register is programmed/updated. During an AP online (during boot/cpu-online/resume) where we initialize all the MTRR/PAT registers, we don't follow this synchronization algorithm. This can lead to scenarios where during a dynamic cpu online, that logical cpu is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical HT sibling continue to run (also with cache disabled because of cr0.cd=1 on its sibling). Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly (because of some VMX performance optimizations) and the above scenario (with one logical cpu doing VMX activity and another logical cpu coming online) can result in system crash. Fix the MTRR initialization by doing rendezvous of all the cpus. During boot and resume, we delay the MTRR/PAT init for APs till all the logical cpu's come online and the rendezvous process at the end of AP's bringup, will initialize the MTRR/PAT for all AP's. For dynamic single cpu online, we synchronize all the logical cpus and do the MTRR/PAT init on the AP that is coming online. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-21 16:25:55 -07:00
Jan Beulich	8b5a10fc6f	x86: properly annotate alternatives.c Some of the NOPs tables aren't used on 64-bits, quite some code and data is needed post-init for module loading only, and a couple of functions aren't used outside that file (i.e. can be static, and don't need to be exported). The change to __INITDATA/__INITRODATA is needed to avoid an assembler warning. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4A8BC8A00200007800010823@vpn.id2.novell.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-21 15:30:12 -07:00
john stultz	da15cfdae0	time: Introduce CLOCK_REALTIME_COARSE After talking with some application writers who want very fast, but not fine-grained timestamps, I decided to try to implement new clock_ids to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE which returns the time at the last tick. This is very fast as we don't have to access any hardware (which can be very painful if you're using something like the acpi_pm clocksource), and we can even use the vdso clock_gettime() method to avoid the syscall. The only trade off is you only get low-res tick grained time resolution. This isn't a new idea, I know Ingo has a patch in the -rt tree that made the vsyscall gettimeofday() return coarse grained time when the vsyscall64 sysctrl was set to 2. However this affects all applications on a system. With this method, applications can choose the proper speed/granularity trade-off for themselves. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: nikolag@ca.ibm.com Cc: Darren Hart <dvhltc@us.ibm.com> Cc: arjan@infradead.org Cc: jonathan@jonmasters.org LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-21 21:43:46 +02:00
Thomas Gleixner	8cab02dc3c	x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown This basically reverts commit `1a0c009ac` (x86: unregister PIT clocksource when PIT is disabled) because the problem which was tried to address with that patch has been solved by commit `3f68535ada` (clocksource: sanity check sysfs clocksource changes). The problem addressed by the original patch is that PIT could be selected as clocksource after the system switched the PIT off or set the PIT into one shot mode which would result in complete timekeeping wreckage. Now with the sysfs sanity check in place PIT cannot be selected again when the system is in oneshot mode. The system will not switch to one shot mode as long as PIT is installed because PIT is not suitable for one shot. The shutdown case which happens when the lapic timer is installed is covered by the fact that init_pit_clocksource() is called after the lapic timer take over and then does not install the PIT clocksource at all. We should have done the sanity checks back then, but ... This also solves the locking problem which was reported vs. the clocksource rework. LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-21 21:13:37 +02:00
Linus Torvalds	83d349f35e	x86: don't send an IPI to the empty set of CPU's The default_send_IPI_mask_logical() function uses the "flat" APIC mode to send an IPI to a set of CPU's at once, but if that set happens to be empty, some older local APIC's will apparently be rather unhappy. So just warn if a caller gives us an empty mask, and ignore it. This fixes a regression in 2.6.30.x, due to commit `4595f9620` ("x86: change flush_tlb_others to take a const struct cpumask"), documented here: http://bugzilla.kernel.org/show_bug.cgi?id=13933 which causes a silent lock-up. It only seems to happen on PPro, P2, P3 and Athlon XP cores. Most developers sadly (or not so sadly, if you're a developer..) have more modern CPU's. Also, on x86-64 we don't use the flat APIC mode, so it would never trigger there even if the APIC didn't like sending an empty IPI mask. Reported-by: Pavel Vilim <wylda@volny.cz> Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de> Cc: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-21 09:23:57 -07:00
Xiao Guangrong	8126dec327	x86: Fix system crash when loading with "reservetop" parameter The system will die if the kernel is booted with "reservetop" parameter, in present code, parse "reservetop" parameter after early_ioremap_init(), and some function still use early_ioremap() after it. The problem is, "reservetop" parameter can modify 'FIXADDR_TOP', then the virtual address got by early_ioremap() is base on old 'FIXADDR_TOP', but the page mapping is base on new 'FIXADDR_TOP', it will occur page fault, and the IDT is not prepare yet, so, the system is dead. So, put parse_early_param() in the front of early_ioremap_init() in this patch. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: yinghai@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A8D402F.4080805@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-21 16:40:30 +02:00
Ingo Molnar	cbcb340cb6	Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent	2009-08-20 12:05:24 +02:00
Jeremy Fitzhardinge	5416c26635	x86: make sure load_percpu_segment has no stackprotector load_percpu_segment() is used to set up the per-cpu segment registers, which are also used for -fstack-protector. Make sure that the load_percpu_segment() function doesn't have stackprotector enabled. [ Impact: allow percpu setup before calling stack-protected functions ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-08-19 17:09:21 -07:00
Suresh Siddha	f833bab87f	clockevent: Prevent dead lock on clockevents_lock Currently clockevents_notify() is called with interrupts enabled at some places and interrupts disabled at some other places. This results in a deadlock in this scenario. cpu A holds clockevents_lock in clockevents_notify() with irqs enabled cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled cpu C doing set_mtrr() which will try to rendezvous of all the cpus. This will result in C and A come to the rendezvous point and waiting for B. B is stuck forever waiting for the spinlock and thus not reaching the rendezvous point. Fix the clockevents code so that clockevents_lock is taken with interrupts disabled and thus avoid the above deadlock. Also call lapic_timer_propagate_broadcast() on the destination cpu so that we avoid calling smp_call_function() in the clockevents notifier chain. This issue left us wondering if we need to change the MTRR rendezvous logic to use stop machine logic (instead of smp_call_function) or add a check in spinlock debug code to see if there are other spinlocks which gets taken under both interrupts enabled/disabled conditions. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com> Cc: "Brown Len" <len.brown@intel.com> LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-19 18:15:10 +02:00
Linus Torvalds	77f312a96d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: use the right flag for get_vm_area() percpu, sparc64: fix sparse possible cpu map handling init: set nr_cpu_ids before setup_per_cpu_areas()	2009-08-18 19:41:05 -07:00
Jan Beulich	78b89ecd73	i386: Fix section mismatches for init code with !HOTPLUG_CPU Commit `0e83815be7` changed the section the initial_code variable gets allocated in, in an attempt to address a section conflict warning. This, however created a new section conflict when building without HOTPLUG_CPU. The apparently only (reasonable) way to address this is to always use __REFDATA. Once at it, also fix a second section mismatch when not using HOTPLUG_CPU. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4A8AE7CD020000780001054B@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-18 17:52:35 +02:00
Yinghai Lu	b7f42ab2e2	x86, apic: Move dmar_table_init() out of enable_IR() On an x2apic system, we got: [ 1.818072] ------------[ cut here ]------------ [ 1.820376] WARNING: at kernel/lockdep.c:2461 lockdep_trace_alloc+0xa5/0xe9() [ 1.835282] Hardware name: ASSY, [ 1.839006] Modules linked in: [ 1.841253] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip-03926-g39aaa80-dirty #510 [ 1.858056] Call Trace: [ 1.859913] [<ffffffff810d13aa>] ? lockdep_trace_alloc+0xa5/0xe9 [ 1.876270] [<ffffffff81093f37>] warn_slowpath_common+0x8d/0xd0 [ 1.879132] [<ffffffff81093fa1>] warn_slowpath_null+0x27/0x3d [ 1.896823] [<ffffffff810d13aa>] lockdep_trace_alloc+0xa5/0xe9 [ 1.900659] [<ffffffff810cf5a0>] ? lock_release_holdtime+0x2f/0x199 [ 1.917188] [<ffffffff81167a3c>] kmem_cache_alloc_notrace+0x42/0x111 [ 1.922320] [<ffffffff8106fe8c>] ? reserve_memtype+0x152/0x518 [ 1.938137] [<ffffffff8106f8b1>] ? pat_pagerange_is_ram+0x4a/0x91 [ 1.941730] [<ffffffff8106fe8c>] reserve_memtype+0x152/0x518 [ 1.958115] [<ffffffff8106ce62>] __ioremap_caller+0x1dd/0x30f [ 1.975507] [<ffffffff81ce2c5c>] ? acpi_os_map_memory+0x2a/0x47 [ 1.978987] [<ffffffff8106d0fd>] ioremap_nocache+0x2a/0x40 [ 2.031400] [<ffffffff810d0364>] ? trace_hardirqs_off+0x20/0x36 [ 2.036096] [<ffffffff81ce2c5c>] acpi_os_map_memory+0x2a/0x47 [ 2.046263] [<ffffffff815cd642>] acpi_tb_verify_table+0x3d/0x85 [ 2.050349] [<ffffffff81d34af7>] ? _spin_unlock_irqrestore+0x50/0x76 [ 2.067327] [<ffffffff815ccad6>] acpi_get_table_with_size+0x64/0xd9 [ 2.070860] [<ffffffff81d34af7>] ? _spin_unlock_irqrestore+0x50/0x76 [ 2.088000] [<ffffffff825c88d5>] dmar_table_detect+0x33/0x70 [ 2.092047] [<ffffffff825c8a01>] dmar_table_init+0x43/0x428 [ 2.106854] [<ffffffff825a7537>] enable_IR+0x1c/0x8d [ 2.110256] [<ffffffff825a7624>] enable_IR_x2apic+0x7c/0x19e [ 2.127139] [<ffffffff825a4876>] native_smp_prepare_cpus+0x139/0x3b8 [ 2.145175] [<ffffffff8259678d>] kernel_init+0x71/0x1da [ 2.148913] [<ffffffff8104305a>] child_rip+0xa/0x20 [ 2.152349] [<ffffffff810429fc>] ? restore_args+0x0/0x30 [ 2.167931] [<ffffffff8259671c>] ? kernel_init+0x0/0x1da [ 2.171671] [<ffffffff81043050>] ? child_rip+0x0/0x20 [ 2.187607] ---[ end trace a7919e7f17c0a725 ]--- Venkatesh Pallipadi said: \| Looks like the problem started with this commit \| \| commit `ce69a78450` \| Author: Gleb Natapov <gleb@redhat.com> \| Date: Mon Jul 20 15:24:17 2009 +0300 \| \| x86/apic: Enable x2APIC without interrupt remapping under KVM \| \| Before this commit, dmar_table_init() was getting called \| with interrupts enabled and after this commit, it is getting \| called with interrupts disabled. so try to move out dmar_table_init out of that function. Analyzed-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> LKML-Reference: <4A899F3C.2050104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-17 20:22:56 +02:00
H. Peter Anvin	62a3207b8c	x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE On 32 bits, we can have CONFIG_ACPI_SLEEP set without implying CONFIG_X86_TRAMPOLINE. In that case, we simply do not need to mark the trampoline as a MAC region. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Shane Wang <shane.wang@intel.com> Cc: Joseph Cihula <joseph.cihula@intel.com>	2009-08-17 11:16:16 -07:00
Ingo Molnar	e412cd257e	x86, mce: Don't initialize MCEs on unknown CPUs An older test-box started hanging at the following point during bootup: [ 0.022996] Mount-cache hash table entries: 512 [ 0.024996] Initializing cgroup subsys debug [ 0.025996] Initializing cgroup subsys cpuacct [ 0.026995] Initializing cgroup subsys devices [ 0.027995] Initializing cgroup subsys freezer [ 0.028995] mce: CPU supports 5 MCE banks I've bisected it down to commit `4efc0670` ("x86, mce: use 64bit machine check code on 32bit"), which utilizes the MCE code on 32-bit systems too. The problem is caused by this detail in my config: # CONFIG_CPU_SUP_INTEL is not set This disables the quirks in mce_cpu_quirks() but still enables MCE support - which then hangs due to the missing quirk workaround needed on this CPU: if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0) mce_banks[0].init = 0; The safe solution is to not initialize MCEs if we dont know on what CPU we are running (or if that CPU's support code got disabled in the config). Also be a bit more defensive on 32-bit systems: dont do a boot-time dump of pending MCEs not just on the specific system that we found a problem with (Pentium-M), but earlier ones as well. Now this problem is probably not common and disabling CPU support is rare - but still being more defensive in something we turned on for a wide range of CPUs is prudent. Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-17 13:28:25 +02:00
Bartlomiej Zolnierkiewicz	c7f6fa4411	x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog): MCE 0 HARDWARE ERROR. This is NOT a software problem! Please contact your hardware vendor CPU 0 BANK 1 MCG status: MCi status: Error overflow Uncorrected error Error enabled Processor context corrupt MCA: Data CACHE Level-1 UNKNOWN Error STATUS f200000000000195 MCGSTATUS 0 [ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error) and f200000000000115 (... READ Error). To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump content of STATUS MSR before it is cleared during initialization. ] Since the bogus MCE results in a kernel taint (which in turn disables lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs by default ("mce=bootlog" boot parameter can be be used to get the old behavior). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-17 10:17:02 +02:00
Leonardo Potenza	52459ab913	x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.c The function uv_acpi_madt_oem_check() has been marked __init, the struct apic_x2apic_uv_x has been marked __refdata. The aim is to address the following section mismatch messages: WARNING: arch/x86/kernel/apic/built-in.o(.data+0x1368): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, WARNING: arch/x86/kernel/built-in.o(.data+0x68e8): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, WARNING: arch/x86/built-in.o(.text+0x7b36f): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_ioremap() The function uv_acpi_madt_oem_check() references the function __init early_ioremap(). This is often because uv_acpi_madt_oem_check lacks a __init annotation or the annotation of early_ioremap is wrong. WARNING: arch/x86/built-in.o(.text+0x7b38d): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_iounmap() The function uv_acpi_madt_oem_check() references the function __init early_iounmap(). This is often because uv_acpi_madt_oem_check lacks a __init annotation or the annotation of early_iounmap is wrong. WARNING: arch/x86/built-in.o(.data+0x8668): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary() The variable apic_x2apic_uv_x references the function __cpuinit uv_wakeup_secondary() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, Signed-off-by: Leonardo Potenza <lpotenza@inwind.it> LKML-Reference: <200908161855.48302.lpotenza@inwind.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-16 19:44:13 +02:00
Hugh Dickins	4e5c25d405	x86, mce: therm_throt: Don't log redundant normality `0d01f31439` "x86, mce: therm_throt - change when we print messages" removed redundant announcements of "Temperature/speed normal". They're not worth logging and remove their accompanying "Machine check events logged" messages as well from the console. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dmitry Torokhov <dtor@mail.ru> LKML-Reference: <Pine.LNX.4.64.0908161544100.7929@sister.anvils> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-16 17:25:41 +02:00
Ingo Molnar	be750231ce	Merge branch 'perfcounters/urgent' into perfcounters/core Conflicts: kernel/perf_counter.c Merge reason: update to latest upstream (-rc6) and resolve the conflict with urgent fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-15 12:06:12 +02:00
Cliff Wickman	3ef12c3c97	x86: Fix UV BAU destination subnode id The SGI UV Broadcast Assist Unit is used to send TLB shootdown messages to remote nodes of the system. The header of the message must contain the subnode id of the block in the receiving hub that handles such messages. It should always be 0x10, the id of the "LB" block. It had previously been documented as a "must be zero" field. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <E1Mc1x7-0005Ce-6t@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-15 11:58:02 +02:00
Martin Schwidefsky	d4f587c67f	timekeeping: Increase granularity of read_persistent_clock() The persistent clock of some architectures (e.g. s390) have a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system change the read_persistent_clock function to return a struct timespec. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.013873340@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-15 10:55:46 +02:00
Martin Schwidefsky	1be3967948	timekeeping: Move reset of cycle_last for tsc clocksource to tsc change_clocksource resets the cycle_last value to zero then sets it to a value read from the clocksource. The reset to zero is required only for the TSC clocksource to make the read_tsc function work after a resume. The reason is that the TSC read function uses cycle_last to detect backwards going TSCs. In the resume case cycle_last contains the TSC value from the last update before the suspend. On resume the TSC starts counting from 0 again and would trip over the cycle_last comparison. This is subtle and surprising. Move the reset to a resume function in the tsc code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.142191175@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-15 10:55:45 +02:00
H. Peter Anvin	58c41d2825	x86, intel_txt: Factor out the code for S3 setup S3 sleep requires special setup in tboot. However, the data structures needed to do such setup are only available if CONFIG_ACPI_SLEEP is enabled. Abstract them out as much as possible, so we can have a single tboot_setup_sleep() which either is a proper implementation or a stub which simply calls BUG(). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Shane Wang <shane.wang@intel.com> Cc: Joseph Cihula <joseph.cihula@intel.com>	2009-08-14 12:14:19 -07:00
Tejun Heo	4518e6a0c0	x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA Embedding percpu first chunk allocator can now handle very sparse unit mapping. Use embedding allocator instead of lpage for 64bit NUMA. This removes extra TLB pressure and the need to do complex and fragile dancing when changing page attributes. For 32bit, using very sparse unit mapping isn't a good idea because the vmalloc space is very constrained. 32bit NUMA machines aren't exactly the focus of optimization and it isn't very clear whether lpage performs better than page. Use page first chunk allocator for 32bit NUMAs. As this leaves setup_pcpu_*() functions pretty much empty, fold them into setup_per_cpu_areas(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <andi@firstfloor.org>	2009-08-14 15:00:52 +09:00
Tejun Heo	c8826dd538	percpu: update embedding first chunk allocator to handle sparse units Now that percpu core can handle very sparse units, given that vmalloc space is large enough, embedding first chunk allocator can use any memory to build the first chunk. This patch teaches pcpu_embed_first_chunk() about distances between cpus and to use alloc/free callbacks to allocate node specific areas for each group and use them for the first chunk. This brings the benefits of embedding allocator to NUMA configurations - no extra TLB pressure with the flexibility of unified dynamic allocator and no need to restructure arch code to build memory layout suitable for percpu. With units put into atom_size aligned groups according to cpu distances, using large page for dynamic chunks is also easily possible with falling back to reuglar pages if large allocation fails. Embedding allocator users are converted to specify NULL cpu_distance_fn, so this patch doesn't cause any visible behavior difference. Following patches will convert them. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 15:00:52 +09:00
Tejun Heo	fb435d5233	percpu: add pcpu_unit_offsets[] Currently units are mapped sequentially into address space. This patch adds pcpu_unit_offsets[] which allows units to be mapped to arbitrary offsets from the chunk base address. This is necessary to allow sparse embedding which might would need to allocate address ranges and memory areas which aren't aligned to unit size but allocation atom size (page or large page size). This also simplifies things a bit by removing the need to calculate offset from unit number. With this change, there's no need for the arch code to know pcpu_unit_size. Update pcpu_setup_first_chunk() and first chunk allocators to return regular 0 or -errno return code instead of unit size or -errno. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David S. Miller <davem@davemloft.net>	2009-08-14 15:00:51 +09:00
Tejun Heo	fd1e8a1fe2	percpu: introduce pcpu_alloc_info and pcpu_group_info Till now, non-linear cpu->unit map was expressed using an integer array which maps each cpu to a unit and used only by lpage allocator. Although how many units have been placed in a single contiguos area (group) is known while building unit_map, the information is lost when the result is recorded into the unit_map array. For lpage allocator, as all allocations are done by lpages and whether two adjacent lpages are in the same group or not is irrelevant, this didn't cause any problem. Non-linear cpu->unit mapping will be used for sparse embedding and this grouping information is necessary for that. This patch introduces pcpu_alloc_info which contains all the information necessary for initializing percpu allocator. pcpu_alloc_info contains array of pcpu_group_info which describes how units are grouped and mapped to cpus. pcpu_group_info also has base_offset field to specify its offset from the chunk's base address. pcpu_build_alloc_info() initializes this field as if all groups are allocated back-to-back as is currently done but this will be used to sparsely place groups. pcpu_alloc_info is a rather complex data structure which contains a flexible array which in turn points to nested cpu_map arrays. * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to help dealing with pcpu_alloc_info. * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info, generalized and renamed to pcpu_build_alloc_info(). @cpu_distance_fn may be NULL indicating that all cpus are of LOCAL_DISTANCE. * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info, generalized and renamed to pcpu_dump_alloc_info(). It now also prints which group each alloc unit belongs to. * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the separate parameters. All first chunk allocators are updated to use pcpu_build_alloc_info() to build alloc_info and call pcpu_setup_first_chunk() with it. This has the side effect of packing units for sparse possible cpus. ie. if cpus 0, 2 and 4 are possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4. * x86 setup_pcpu_lpage() is updated to deal with alloc_info. * sparc64 setup_per_cpu_areas() is updated to build alloc_info. Although the changes made by this patch are pretty pervasive, it doesn't cause any behavior difference other than packing of sparse cpus. It mostly changes how information is passed among initialization functions and makes room for more flexibility. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>	2009-08-14 15:00:51 +09:00
Tejun Heo	3cbc856527	percpu: add @align to pcpu_fc_alloc_fn_t pcpu_fc_alloc_fn_t is about to see more interesting usage, add @align parameter. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 15:00:50 +09:00
Tejun Heo	9a7737691e	percpu: drop @static_size from first chunk allocators First chunk allocators assume percpu areas have been linked using one of PERCPU_*() macros and depend on __per_cpu_load symbol defined by those macros, so there isn't much point in passing in static area size explicitly when it can be easily calculated from __per_cpu_start and __per_cpu_end. Drop @static_size from all percpu first chunk allocators and helpers. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 15:00:50 +09:00
Tejun Heo	f58dc01ba2	percpu: generalize first chunk allocator selection Now that all first chunk allocators are in mm/percpu.c, it makes sense to make generalize percpu_alloc kernel parameter. Define PCPU_FC_* and set pcpu_chosen_fc using early_param() in mm/percpu.c. Arch code can use the set value to determine which first chunk allocator to use. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 15:00:50 +09:00
Tejun Heo	00ae4064b1	percpu: rename 4k first chunk allocator to page Page size isn't always 4k depending on arch and configuration. Rename 4k first chunk allocator to page. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Howells <dhowells@redhat.com>	2009-08-14 15:00:49 +09:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Tejun Heo	74d46d6b2d	percpu, sparc64: fix sparse possible cpu map handling percpu code has been assuming num_possible_cpus() == nr_cpu_ids which is incorrect if cpu_possible_map contains holes. This causes percpu code to access beyond allocated memories and vmalloc areas. On a sparc64 machine with cpus 0 and 2 (u60), this triggers the following warning or fails boot. WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240() Modules linked in: Call Trace: [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240 [00000000004b1840] map_vm_area+0x20/0x60 [00000000004b1950] __vmalloc_area_node+0xd0/0x160 [0000000000593434] deflate_init+0x14/0xe0 [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0 [00000000005844f0] crypto_alloc_base+0x50/0xa0 [000000000058b898] alg_test_comp+0x18/0x80 [000000000058dad4] alg_test+0x54/0x180 [000000000058af00] cryptomgr_test+0x40/0x60 [0000000000473098] kthread+0x58/0x80 [000000000042b590] kernel_thread+0x30/0x60 [0000000000472fd0] kthreadd+0xf0/0x160 ---[ end trace 429b268a213317ba ]--- This patch fixes generic percpu functions and sparc64 setup_per_cpu_areas() so that they handle sparse cpu_possible_map properly. Please note that on x86, cpu_possible_map() doesn't contain holes and thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu>	2009-08-14 13:20:53 +09:00
Linus Torvalds	3493e84de6	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Report the cloning task as parent on perf_counter_fork() perf_counter: Fix an ipi-deadlock perf: Rework/fix the whole read vs group stuff perf_counter: Fix swcounter context invariance perf report: Don't show unresolved DSOs and symbols when -S/-d is used perf tools: Add a general option to enable raw sample records perf tools: Add a per tracepoint counter attribute to get raw sample perf_counter: Provide hw_perf_counter_setup_online() APIs perf list: Fix large list output by using the pager perf_counter, x86: Fix/improve apic fallback perf record: Add missing -C option support for specifying profile cpu perf tools: Fix dso__new handle() to handle deleted DSOs perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available perf report: Show the tid too in -D perf record: Fix .tid and .pid fill-in when synthesizing events perf_counter, x86: Fix generic cache events on P6-mobile CPUs perf_counter, x86: Fix lapic printk message	2009-08-13 12:24:33 -07:00
H. Peter Anvin	81e2d7b30d	x86, intel_txt: tboot.c needs <asm/fixmap.h> arch/x86/kernel/tboot.c needs <asm/fixmap.h>. In most configurations that ends up getting implicitly included, but not in all, causing build failures in some configurations. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joseph Cihula <joseph.cihula@intel.com> Cc: Shane Wang <shane.wang@intel.com>	2009-08-12 05:45:34 -07:00
Ingo Molnar	04da8a43da	perf_counter, x86: Fix/improve apic fallback Johannes Stezenbach reported that his Pentium-M based laptop does not have the local APIC enabled by default, and hence perfcounters do not get initialized. Add a fallback for this case: allow non-sampled counters and return with an error on sampled counters. This allows 'perf stat' to work out of box - and allows 'perf top' and 'perf record' to fall back on a hrtimer based sampling method. ( Passing 'lapic' on the boot line will allow hardware sampling to occur - but if the APIC is disabled permanently by the hardware then this fallback still allows more systems to use perfcounters. ) Also decouple perfcounter support from X86_LOCAL_APIC. -v2: fix typo breaking counters on all other systems ... Reported-by: Johannes Stezenbach <js@sig21.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-12 14:12:49 +02:00
Ondrej Zary	e8055139d9	x86: Fix oops in identify_cpu() on CPUs without CPUID Kernel is broken for x86 CPUs without CPUID since 2.6.28. It crashes with NULL pointer dereference in identify_cpu(): 766 generic_identify(c); 767 768--> if (this_cpu->c_identify) 769 this_cpu->c_identify(c); this_cpu is NULL. This is because it's only initialized in get_cpu_vendor() function, which is not called if the CPU has no CPUID instruction. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> LKML-Reference: <200908112000.15993.linux@rainbow-software.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-12 11:49:41 +02:00
Jason Baron	0ac676fb50	tracing: Convert x86_64 mmap and uname to use DEFINE_SYSCALL A number of syscalls are not using 'DEFINE_SYSCALL'. I'm not sure why. Convert x86_64 uname and mmap to use DEFINE_SYSCALL. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-11 20:35:29 +02:00
Jason Baron	64c12e0444	tracing: Add individual syscalls tracepoint id support The current state of syscalls tracepoints generates only one event id for every syscall events. This patch associates an id with each syscall trace event, so that we can identify each syscall trace event using the 'perf' tool. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-11 20:35:28 +02:00
Jason Baron	a871bd33a6	tracing: Add syscall tracepoints add two tracepoints in syscall exit and entry path, conditioned on TIF_SYSCALL_FTRACE. Supports the syscall trace event code. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-11 20:35:26 +02:00
Jason Baron	066e0378c2	tracing: Call arch_init_ftrace_syscalls at boot Call arch_init_ftrace_syscalls at boot, so we can determine early the set of syscalls for the syscall trace events. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-11 20:35:26 +02:00
Jason Baron	eeac19a7ef	tracing: Map syscall name to number Add a new function to support translating a syscall name to number at runtime. This allows the syscall event tracer to map syscall names to number. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-11 20:35:25 +02:00
Ingo Molnar	89034bc2c7	Merge branch 'linus' into tracing/core Conflicts: kernel/trace/trace_events_filter.c We use the tracing/core version. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 14:19:09 +02:00
Kevin Winchester	fbd8b1819e	x86: Clear incorrectly forced X86_FEATURE_LAHF_LM flag Due to an erratum with certain AMD Athlon 64 processors, the BIOS may need to force enable the LAHF_LM capability. Unfortunately, in at least one case, the BIOS does this even for processors that do not support the functionality. Add a specific check that will clear the feature bit for processors known not to support the LAHF/SAHF instructions. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Acked-by: Borislav Petkov <petkovbb@googlemail.com> LKML-Reference: <4A80A5AD.2000209@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 13:34:54 +02:00
Ingo Molnar	f64ccccb8a	perf_counter, x86: Fix generic cache events on P6-mobile CPUs Johannes Stezenbach reported that 'perf stat' does not count cache-miss and cache-references events on his Pentium-M based laptop. This is because we left them blank in p6_perfmon_event_map[], fill them in. Reported-by: Johannes Stezenbach <js@sig21.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 11:35:26 +02:00
Ingo Molnar	3c581a7f94	perf_counter, x86: Fix lapic printk message Instead of this garbled bootup on UP Pentium-M systems: [ 0.015048] Performance Counters: [ 0.016004] no Local APIC, try rebooting with lapicno PMU driver, software counters only. Print: [ 0.015050] Performance Counters: [ 0.016004] no APIC, boot with the "lapic" boot parameter to force-enable it. [ 0.017003] no PMU driver, software counters only. Cf: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 10:58:25 +02:00
Dmitry Torokhov	0d01f31439	x86, mce: therm_throt - change when we print messages My Latitude d630 seems to be handling thermal events in SMI by lowering the max frequency of the CPU till it cools down but still leaks the "everything is normal" events. This spams the console and with high priority printks. Adjust therm_throt driver to only print messages about the fact that temperatire returned back to normal when leaving the throttling state. Also lower the severity of "back to normal" message from KERN_CRIT to KERN_INFO. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Acked-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <20090810051513.0558F526EC9@mailhub.coreip.homeip.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 09:54:17 +02:00
Huang Ying	bf783f9f7d	x86, mce: Fake panic support for MCE testing If "fake panic" mode is turned on, just log panic message instead of go real panic. This is used for testing only, so that the test suite can check for the correct panic message and do regression testing for MCE would go panic. This patch is based on x86-tip.git/mce. ChangeLog: v5: - Rebased on x86-tip.git/mce v4: - Move config file from sysfs to debugfs Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:59:12 -07:00
Huang Ying	5be9ed251f	x86, mce: Move debugfs mce dir creating to mce.c Because more debugfs files under mce dir will be create in mce.c. ChangeLog: v5: - Rebased on x86-tip.git/mce Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:58:53 -07:00
Huang Ying	0dcc66851f	x86, mce: Support specifying raise mode for software MCE injection Raise mode include raising as exception or raising as poll, it is specified via the mce.inject_flags field. This can be used to specify raise mode of UCNA, which is UC error but raised not as exception. And this can be used to test the filter code of poll handler or exception handler too. For example, enforce a poll raise mode for a fatal MCE. ChangeLog: v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:58:41 -07:00
Huang Ying	5b7e88edc6	x86, mce: Support specifying context for software mce injection The cpu context is specified via the new mce.inject_flags fields. This allows more realistic machine check testing in different situations. "RANDOM" context is implemented via NMI broadcasting to add randomization to testing. AK: Fix NMI broadcasting check. Fix 32-bit building. Some race fixes. Move to module. Various changes ChangeLog: v3: - Re-based on latest x86-tip.git/mce4 - Fix 32-bit building v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:58:27 -07:00
Shunichi Fuji	3e03bbeac5	x86: Add reboot quirk for every 5 series MacBook/Pro Reboot does not work on my MacBook Pro 13 inch (MacBookPro5,5) too. It seems all unibody MacBook and MacBookPro require PCI reboot handling, i guess. Following model/machine ID list shows unibody MacBook/Pro have the 5 series of model number: http://www.everymac.com/systems/by_capability/macs-by-machine-model-machine-id.html Signed-off-by: Shunichi Fuji <palglowr@gmail.com> Cc: Ozan Çağlayan <ozan@pardus.org.tr> LKML-Reference: <30046e3b0908101134p6487ddbftd8776e4ddef204be@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-10 20:59:42 +02:00
Linus Torvalds	b6e61eef4f	x86: Fix serialization in pit_expect_msb() Wei Chong Tan reported a fast-PIT-calibration corner-case: \| pit_expect_msb() is vulnerable to SMI disturbance corner case \| in some platforms which causes /proc/cpuinfo to show wrong \| CPU MHz value when quick_pit_calibrate() jumps to success \| section. I think that the real issue isn't even an SMI - but the fact that in the very last iteration of the loop, there's no serializing instruction _after_ the last 'rdtsc'. So even in the absense of SMI's, we do have a situation where the cycle counter was read without proper serialization. The last check should be done outside the outer loop, since _inside_ the outer loop, we'll be testing that the PIT has the right MSB value has the right value in the next iteration. So only the _last_ iteration is special, because that's the one that will not check the PIT MSB value any more, and because the final 'get_cycles()' isn't serialized. In other words: - I'd like to move the PIT MSB check to after the last iteration, rather than in every iteration - I think we should comment on the fact that it's also a serializing instruction and so 'fences in' the TSC read. Here's a suggested replacement. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: "Tan, Wei Chong" <wei.chong.tan@intel.com> Tested-by: "Tan, Wei Chong" <wei.chong.tan@intel.com> LKML-Reference: <B28277FD4E0F9247A3D55704C440A140D5D683F3@pgsmsx504.gar.corp.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-10 19:56:57 +02:00
Akinobu Mita	c7425314c7	x86: Introduce GDT_ENTRY_INIT(), initialize bad_bios_desc statically Fully initialize bad_bios_desc statically instead of doing some fields statically and some dynamically. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090809080350.GA4765@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-10 11:10:52 +02:00
Arnd Bergmann	a8ad568dd8	dma-ops: Remove flush_write_buffers() in dma-mapping-common.h This moves flush_write_buffers() in asm-generic/dma-mapping-common.h to arch/x86/kernel/pci-nommu.c. The purpose of this patch is that, we can avoid defining NULL flush_write_buffers() on IA64 and SPARC. dma-mapping-common.h is used by X86 and IA64 (and SPARC soon) but only X86 with CONFIG_X86_OOSTORE or CONFIG_X86_PPRO_FENCE actually uses flush_write_buffers(). CONFIG_X86_OOSTORE or CONFIG_X86_PPRO_FENCE is usable with only kernel/pci-nommu.c (that is, not usable with other X86 IOMMU implementations such as SWIOTLB, VT-d, etc) so we can safely move flush_write_buffers() in asm-generic/dma-mapping-common.h to arch/x86/kernel/pci-nommu.c. The further discussion is: http://lkml.org/lkml/2009/6/28/104 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: davem@davemloft.net Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1249872797-1314-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-10 09:34:57 +02:00
Marcin Slusarz	9f51e24ee8	x86: Use printk_once() Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> LKML-Reference: <1249847649-11631-6-git-send-email-marcin.slusarz@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-09 22:28:34 +02:00
Markus Metzger	30dd568c91	x86, perf_counter, bts: Add BTS support to perfcounters Implement a performance counter with: attr.type = PERF_TYPE_HARDWARE attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS attr.sample_period = 1 Using branch trace store (BTS) on x86 hardware, if available. The from and to address for each branch can be sampled using: PERF_SAMPLE_IP for the from address PERF_SAMPLE_ADDR for the to address [ v2: address review feedback, fix bugs ] Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-09 13:04:14 +02:00
Roel Kluin	fdb8a42742	x86: fix buffer overflow in efi_init() If the vendor name (from c16) can be longer than 100 bytes (or missing a terminating null), then the null is written past the end of vendor[]. Found with Parfait, http://research.sun.com/projects/parfait/ Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com>	2009-08-09 01:08:42 -07:00
Ingo Molnar	72c4d85302	x86: Introduce GDT_ENTRY_INIT(), fix APM This crash: [ 0.891983] calling cache_sysfs_init+0x0/0x1ee @ 1 [ 0.897251] initcall cache_sysfs_init+0x0/0x1ee returned 0 after 405 usecs [ 0.904019] calling mce_init_device+0x0/0x242 @ 1 [ 0.909124] initcall mce_init_device+0x0/0x242 returned 0 after 347 usecs [ 0.915815] calling apm_init+0x0/0x38d @ 1 [ 0.919967] apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac) [ 0.926813] general protection fault: 0000 [#1] [ 0.927269] last sysfs file: [ 0.927269] Modules linked in: [ 0.927269] [ 0.927269] Pid: 271, comm: kapmd Not tainted (2.6.31-rc3-00100-gd520da1-dirty #311) System Product Name [ 0.927269] EIP: 00c0:[<000082b2>] EFLAGS: 00010002 CPU: 0 [ 0.927269] EIP is at 0x82b2 [ 0.927269] EAX: 0000530e EBX: 00000000 ECX: 00000102 EDX: 00000000 [ 0.927269] ESI: 00000000 EDI: f6a4bf44 EBP: 67890000 ESP: f6a4beec [ 0.927269] DS: 00c8 ES: 0000 FS: 0000 GS: 0000 SS: 0068 [ 0.927269] Process kapmd (pid: 271, ti=f6a4a000 task=f7142280 task.ti=f6a4a000) [ 0.927269] Stack: [ 0.927269] 0000828d 02160000 00b88092 f6a4bf3c c102a63d 00000060 f6a4bf3c f6a4bf44 [ 0.927269] <0> 0000007b 0000007b 00000000 00000000 00000000 00000000 560aae9e 00000000 [ 0.927269] <0> 00000200 f705fd74 00000000 c102af70 f6a4bf60 c102a6ec 0000530e 00000000 [ 0.927269] Call Trace: [ 0.927269] [<c102a63d>] ? __apm_bios_call_simple+0x7d/0x110 [ 0.927269] [<c102af70>] ? apm+0x0/0x6a0 [ 0.927269] [<c102a6ec>] ? apm_bios_call_simple+0x1c/0x50 [ 0.927269] [<c102b3f5>] ? apm+0x485/0x6a0 [ 0.927269] [<c1038e7a>] ? finish_task_switch+0x2a/0xb0 [ 0.927269] [<c164a69e>] ? schedule+0x31e/0x480 [ 0.927269] [<c102af70>] ? apm+0x0/0x6a0 [ 0.927269] [<c102af70>] ? apm+0x0/0x6a0 [ 0.927269] [<c1052654>] ? kthread+0x74/0x80 [ 0.927269] [<c10525e0>] ? kthread+0x0/0x80 [ 0.927269] [<c101d627>] ? kernel_thread_helper+0x7/0x10 [ 0.927269] Code: Bad EIP value. [ 0.927269] EIP: [<000082b2>] 0x82b2 SS:ESP 0068:f6a4beec [ 0.927269] ---[ end trace a7919e7f17c0a725 ]--- [ 0.927269] Kernel panic - not syncing: Fatal exception [ 0.927269] Pid: 271, comm: kapmd Tainted: G D 2.6.31-rc3-00100-gd520da1-dirty #311 Is caused by an incorrect GDT_ENTRY_INIT() conversion in the apm code, as noticed by hpa. Reported-by: Ingo Molnar <mingo@elte.hu> Noticed-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090808094905.GA2954@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-08 17:47:12 +02:00
Akinobu Mita	1e5de18278	x86: Introduce GDT_ENTRY_INIT() GDT_ENTRY_INIT is static initializer of desc_struct. We already have similar macro GDT_ENTRY() but it's static initializer for u64 and it cannot be used for desc_struct. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090718151219.GD11294@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-08 17:44:11 +02:00
Cyrill Gorcunov	f3d1915a86	x86, ioapic: Panic on irq-pin binding only if needed Though the most time we are to panic on irq-pin allocation fails, for PCI interrupts it's not the case and we could continue operate even if irq-pin allocation failed. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090805200931.GB5319@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-08 17:20:03 +02:00
Ozan Çağlayan	498cdbfbcf	x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci MacBookPro5,1 is not able to reboot unless reboot=pci is set. This patch forces it through a DMI quirk specific to this device. Signed-off-by: Ozan Çağlayan <ozan@pardus.org.tr> LKML-Reference: <1249403971-6543-1-git-send-email-ozan@pardus.org.tr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-08 17:09:11 +02:00
Yinghai Lu	087d7e56de	x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus found a system where x2apic reports an MSI-X irq initialization failure: [ 302.859446] igbvf 0000:81:10.4: enabling device (0000 -> 0002) [ 302.874369] igbvf 0000:81:10.4: using 64bit DMA mask [ 302.879023] igbvf 0000:81:10.4: using 64bit consistent DMA mask [ 302.894386] igbvf 0000:81:10.4: enabling bus mastering [ 302.898171] igbvf 0000:81:10.4: setting latency timer to 64 [ 302.914050] reserve_memtype added 0xefb08000-0xefb0c000, track uncached-minus, req uncached-minus, ret uncached-minus [ 302.933839] reserve_memtype added 0xefb28000-0xefb29000, track uncached-minus, req uncached-minus, ret uncached-minus [ 302.940367] alloc irq_desc for 265 on node 4 [ 302.956874] alloc kstat_irqs on node 4 [ 302.959452] alloc irq_2_iommu on node 0 [ 302.974328] igbvf 0000:81:10.4: irq 265 for MSI/MSI-X [ 302.977778] alloc irq_desc for 266 on node 4 [ 302.980347] alloc kstat_irqs on node 4 [ 302.995312] free_memtype request 0xefb28000-0xefb29000 [ 302.998816] igbvf 0000:81:10.4: Failed to initialize MSI-X interrupts. ... it turns out that when trying to enable MSI-X, __assign_irq_vector(new, cfg_new, apic->target_cpus()) can not get vector because for x2apic target-cpus returns cpumask_of(0) Update that to online_mask like xapic. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4A785AFF.3050902@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-08 17:04:58 +02:00
David Woodhouse	a131bc1855	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 Pull fixes in from 2.6.31 so that people testing the iommu-2.6.git tree no longer trip over bugs which were already fixed (sorry, Horms).	2009-08-08 11:26:15 +01:00
Frederic Weisbecker	07868b086c	tracing/function-graph-tracer: Drop the useless nmi protection The function graph tracer used to have a protection against NMI while entering a function entry tracing. But this is useless now, this tracer is reentrant and the ring buffer supports the NMI tracing. We can then drop this protection. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2009-08-06 07:28:05 +02:00
Gleb Natapov	ce69a78450	x86/apic: Enable x2APIC without interrupt remapping under KVM KVM would like to provide x2APIC interface to a guest without emulating interrupt remapping device. The reason KVM prefers guest to use x2APIC is that x2APIC interface is better virtualizable and provides better performance than mmio xAPIC interface: - msr exits are faster than mmio (no page table walk, emulation) - no need to read back ICR to look at the busy bit - one 64 bit ICR write instead of two 32 bit writes - shared code with the Hyper-V paravirt interface Included patch changes x2APIC enabling logic to enable it even if IR initialization failed, but kernel runs under KVM and no apic id is greater than 255 (if there is one spec requires BIOS to move to x2apic mode before starting an OS). -v2: fix build -v3: fix bug causing compiler warning Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Sheng Yang <sheng@linux.intel.com> Cc: "avi@redhat.com" <avi@redhat.com> LKML-Reference: <20090720122417.GR5638@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-05 14:28:50 +02:00
Cyrill Gorcunov	9910887af8	x86, apic: Drop redundant bit assignment cpu_has_apic has already investigated boot_cpu_data X86_FEATURE_APIC bit for being clear if condition is triggered. So there is no need to clear this bit second time. Signed-off-by: Cyrill Gorcuno v <gorcunov@openvz.org> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> LKML-Reference: <20090722205259.GE15805@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-05 10:30:52 +02:00
Cyrill Gorcunov	a7428cd2ef	x86, ioapic: Throw BUG instead of NULL dereference Instead of plain NULL deref we better throw error message with a backtrace. Actually we need more gracious error handling here. Meanwhile leave it as is. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org LKML-Reference: <20090801075435.769301745@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-05 10:30:50 +02:00
Cyrill Gorcunov	2977fb3ffc	x86, ioapic: Introduce for_each_irq_pin() helper This allow us to save a few lines of code. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org LKML-Reference: <20090801075435.597863129@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-05 10:30:49 +02:00
Alok Kataria	7d5b005652	x86: Fix VMI && stack protector With CONFIG_STACK_PROTECTOR turned on, VMI doesn't boot with more than one processor. The problem is with the gs value not being initialized correctly when registering the secondary processor for VMI's case. The patch below initializes the gs value for the AP to __KERNEL_STACK_CANARY. Without this the secondary processor keeps on taking a GP on every gs access. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: <stable@kernel.org> # for v2.6.30.x LKML-Reference: <1249425262.18955.40.camel@ank32.eng.vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-05 10:20:29 +02:00
David Woodhouse	19943b0e30	intel-iommu: Unify hardware and software passthrough support This makes the hardware passthrough mode work a lot more like the software version, so that the behaviour of a kernel with 'iommu=pt' is the same whether the hardware supports passthrough or not. In particular: - We use a single si_domain for the pass-through devices. - 32-bit devices can be taken out of the pass-through domain so that they don't have to use swiotlb. - Devices will work again after being removed from a KVM guest. - A potential oops on OOM (in init_context_pass_through()) is fixed. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-08-04 16:19:23 +01:00

... 2 3 4 5 6 ...

5455 commits