linux

mirror of https://github.com/torvalds/linux synced 2024-09-27 06:50:51 +00:00

Author	SHA1	Message	Date
Adam Buchbinder	6070d81eb5	tree-wide: fix misspelling of "definition" in comments "Definition" is misspelled "defintion" in several comments; this patch fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 23:41:47 +01:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Ian Campbell	f6eafe3665	xen: call clock resume notifier on all CPUs tick_resume() is never called on secondary processors. Presumably this is because they are offlined for suspend on native and so this is normally taken care of in the CPU onlining path. Under Xen we keep all CPUs online over a suspend. This patch papers over the issue for me but I will investigate a more generic, less hacky, way of doing to the same. tick_suspend is also only called on the boot CPU which I presume should be fixed too. Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-12-03 11:14:55 -08:00
Jeremy Fitzhardinge	6aaf5d633b	xen: use iret for return from 64b kernel to 32b usermode If Xen wants to return to a 32b usermode with sysret it must use the right form. When using VCGF_in_syscall to trigger this, it looks at the code segment and does a 32b sysret if it is FLAT_USER_CS32. However, this is different from __USER32_CS, so it fails to return properly if we use the normal Linux segment. So avoid the whole mess by dropping VCGF_in_syscall and simply use plain iret to return to usermode. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Jan Beulich <jbeulich@novell.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:54 -08:00
Jeremy Fitzhardinge	499d19b82b	xen: register runstate info for boot CPU early printk timestamping uses sched_clock, which in turn relies on runstate info under Xen. So make sure we set it up before any printks can be called. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:53 -08:00
Ian Campbell	028896721a	xen: register runstate on secondary CPUs The commit "xen: re-register runstate area earlier on resume" caused us to never try and setup the runstate area for secondary CPUs. Ensure that we do this... Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:52 -08:00
Ian Campbell	f350c7922f	xen: register timer interrupt with IRQF_TIMER Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents correct operation of stop_machine on multi-processor systems and breaks suspend. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:52 -08:00
Ian Campbell	fa24ba62ea	xen: correctly restore pfn_to_mfn_list_list after resume pvops kernels >= 2.6.30 can currently only be saved and restored once. The second attempt to save results in: ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000 ERROR Internal error: Failed to map/save the p2m frame list I finally narrowed it down to: commit `cdaead6b4e` Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Date: Fri Feb 27 15:34:59 2009 -0800 xen: split construction of p2m mfn tables from registration Build the p2m_mfn_list_list early with the rest of the p2m table, but register it later when the real shared_info structure is in place. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> The unforeseen side-effect of this change was to cause the mfn list list to not be rebuilt on resume. Prior to this change it would have been rebuilt via xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list(). Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend(). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:51 -08:00
Jeremy Fitzhardinge	3905bb2aa7	xen: restore runstate_info even if !have_vcpu_info_placement Even if have_vcpu_info_placement is not set, we still need to set up the runstate area on each resumed vcpu. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:51 -08:00
Ian Campbell	be012920ec	xen: re-register runstate area earlier on resume. This is necessary to ensure the runstate area is available to xen_sched_clock before any calls to printk which will require it in order to provide a timestamp. I chose to pull the xen_setup_runstate_info out of xen_time_init into the caller in order to maintain parity with calling xen_setup_runstate_info separately from calling xen_time_resume. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-12-03 11:14:50 -08:00
Ingo Molnar	d103d01e4b	Merge branch 'perf/probes' into perf/core Merge reason: add these fixes to 'perf probe'. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 20:11:38 +01:00
Ingo Molnar	26fb20d008	Merge branch 'perf/mce' into perf/core Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 20:11:06 +01:00
Mikael Pettersson	7d1849aff6	x86, apic: Enable lapic nmi watchdog on AMD Family 11h The x86 lapic nmi watchdog does not recognize AMD Family 11h, resulting in: NMI watchdog: CPU not supported As far as I can see from available documentation (the BKDM), family 11h looks identical to family 10h as far as the PMU is concerned. Extending the check to accept family 11h results in: Testing NMI watchdog ... OK. I've been running with this change on a Turion X2 Ultra ZM-82 laptop for a couple of weeks now without problems. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: <stable@kernel.org> LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 16:25:15 +01:00
Jens Axboe	220d0b1dbf	Merge branch 'master' into for-2.6.33	2009-12-03 13:49:39 +01:00
Xiaotian Feng	57fea8f7ab	x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency pci_get_device will increase the ref count of found device. Although we're going to reset soon, we should use pci_dev_put to decrease the ref count for consistency. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1259838400-23833-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 12:17:55 +01:00
Darrick J. Wong	4528752f49	x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree On a multi-node x3950M2 system, there's a slight oddity in the PCI device tree for all secondary nodes: 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) ...as compared to the primary node: 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01) \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) In both nodes, the LSI RAID controller hangs off a CalIOC2 device, but on the secondary nodes, the BIOS hides the VGA device and substitutes the device tree ending with the disk controller. It would seem that Calgary devices don't necessarily appear at the top of the PCI tree, which means that the current code to find the Calgary IOMMU that goes with a particular device is buggy. Rather than walk all the way to the top of the PCI device tree and try to match bus number with Calgary descriptor, the code needs to examine each parent of the particular device; if it encounters a Calgary with a matching bus number, simply use that. Otherwise, we BUG() when the bus number of the Calgary doesn't match the bus number of whatever's at the top of the device tree. Extra note: This patch appears to work correctly for the x3950 that came before the x3950 M2. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Corinna Schultz <coschult@us.ibm.com> Cc: <stable@kernel.org> LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 11:44:05 +01:00
Avi Kivity	d5696725b2	KVM: VMX: Fix comparison of guest efer with stale host value update_transition_efer() masks out some efer bits when deciding whether to switch the msr during guest entry; for example, NX is emulated using the mmu so we don't need to disable it, and LMA/LME are handled by the hardware. However, with shared msrs, the comparison is made against a stale value; at the time of the guest switch we may be running with another guest's efer. Fix by deferring the mask/compare to the actual point of guest entry. Noted by Marcelo. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:34:20 +02:00
Avi Kivity	3548bab501	KVM: Drop user return notifier when disabling virtualization on a cpu This way, we don't leave a dangling notifier on cpu hotunplug or module unload. In particular, module unload leaves the notifier pointing into freed memory. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:26 +02:00
Sheng Yang	046d87103a	KVM: VMX: Disable unrestricted guest when EPT disabled Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest supported processors. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:25 +02:00
Avi Kivity	eb3c79e64a	KVM: x86 emulator: limit instructions to 15 bytes While we are never normally passed an instruction that exceeds 15 bytes, smp games can cause us to attempt to interpret one, which will cause large latencies in non-preempt hosts. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:25 +02:00
Jan Kiszka	3cfc3092f4	KVM: x86: Add KVM_GET/SET_VCPU_EVENTS This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:25 +02:00
Avi Kivity	65ac726404	KVM: VMX: Report unexpected simultaneous exceptions as internal errors These happen when we trap an exception when another exception is being delivered; we only expect these with MCEs and page faults. If something unexpected happens, things probably went south and we're better off reporting an internal error and freezing. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:24 +02:00
Avi Kivity	a9c7399d6c	KVM: Allow internal errors reported to userspace to carry extra data Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:24 +02:00
Jan Kiszka	4f926bf291	KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance, avoid triggering the WARN_ON in kvm_queue_exception if there is already an exception pending and reject such invalid requests. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:24 +02:00
Marcelo Tosatti	2204ae3c96	KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic Otherwise kvm might attempt to dereference a NULL pointer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:23 +02:00
Marcelo Tosatti	3ddea128ad	KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP. Also serialize multiple accesses with kvm->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:23 +02:00
Avi Kivity	92c0d90015	KVM: VMX: Remove vmx->msr_offset_efer This variable is used to communicate between a caller and a callee; switch to a function argument instead. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:23 +02:00
Marcelo Tosatti	5f5c35aad5	KVM: MMU: update invlpg handler comment Large page translations are always synchronized (either in level 3 or level 2), so its not necessary to properly deal with them in the invlpg handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:23 +02:00
Marcelo Tosatti	7c93be44a4	KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from outside guest context. Similarly pdptrs are updated via load_pdptrs. Let kvm_set_cr3 perform the update, removing it from the vcpu_run fast path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:22 +02:00
Gleb Natapov	1655e3a3dc	KVM: remove duplicated task_switch check Probably introduced by a bad merge. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:22 +02:00
Avi Kivity	26bb0981b3	KVM: VMX: Use shared msr infrastructure Instead of reloading syscall MSRs on every preemption, use the new shared msr infrastructure to reload them at the last possible minute (just before exit to userspace). Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be reloaded as well). [jan: fix slot index missing indirection] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:22 +02:00
Avi Kivity	18863bdd60	KVM: x86 shared msr infrastructure The various syscall-related MSRs are fairly expensive to switch. Currently we switch them on every vcpu preemption, which is far too often: - if we're switching to a kernel thread (idle task, threaded interrupt, kernel-mode virtio server (vhost-net), for example) and back, then there's no need to switch those MSRs since kernel threasd won't be exiting to userspace. - if we're switching to another guest running an identical OS, most likely those MSRs will have the same value, so there's little point in reloading them. - if we're running the same OS on the guest and host, the MSRs will have identical values and reloading is unnecessary. This patch uses the new user return notifiers to implement last-minute switching, and checks the msr values to avoid unnecessary reloading. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Avi Kivity	44ea2b1758	KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area Currently MSR_KERNEL_GS_BASE is saved and restored as part of the guest/host msr reloading. Since we wish to lazy-restore all the other msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using the common code. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Eduardo Habkost	3ce672d484	KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization The svm_set_cr0() call will initialize save->cr0 properly even when npt is enabled, clearing the NW and CD bits as expected, so we don't need to initialize it manually for npt_enabled anymore. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Eduardo Habkost	18fa000ae4	KVM: SVM: Reset cr0 properly on vcpu reset svm_vcpu_reset() was not properly resetting the contents of the guest-visible cr0 register, causing the following issue: https://bugzilla.redhat.com/show_bug.cgi?id=525699 Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine with paging enabled, making the vcpu get a pagefault exception while trying to run it. Instead of setting vmcb->save.cr0 directly, the new code just resets kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0(). kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Eduardo Habkost	fa40052ca0	KVM: VMX: Use macros instead of hex value on cr0 initialization This should have no effect, it is just to make the code clearer. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:21 +02:00
Glauber Costa	afbcf7ab8d	KVM: allow userspace to adjust kvmclock offset When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:19 +02:00
Jan Kiszka	6be7d3062b	KVM: SVM: Cleanup NMI singlestep Push the NMI-related singlestep variable into vcpu_svm. It's dealing with an AMD-specific deficit, nothing generic for x86. Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> arch/x86/include/asm/kvm_host.h \| 1 - arch/x86/kvm/svm.c \| 12 +++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:19 +02:00
Jan Kiszka	94fe45da48	KVM: x86: Fix guest single-stepping while interruptible Commit 705c5323 opened the doors of hell by unconditionally injecting single-step flags as long as guest_debug signaled this. This doesn't work when the guest branches into some interrupt or exception handler and triggers a vmexit with flag reloading. Fix it by saving cs:rip when user space requests single-stepping and restricting the trace flag injection to this guest code position. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:19 +02:00
Ed Swierk	ffde22ac53	KVM: Xen PV-on-HVM guest support Support for Xen PV-on-HVM guests can be implemented almost entirely in userspace, except for handling one annoying MSR that maps a Xen hypercall blob into guest address space. A generic mechanism to delegate MSR writes to userspace seems overkill and risks encouraging similar MSR abuse in the future. Thus this patch adds special support for the Xen HVM MSR. I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell KVM which MSR the guest will write to, as well as the starting address and size of the hypercall blobs (one each for 32-bit and 64-bit) that userspace has loaded from files. When the guest writes to the MSR, KVM copies one page of the blob from userspace to the guest. I've tested this patch with a hacked-up version of Gerd's userspace code, booting a number of guests (CentOS 5.3 i386 and x86_64, and FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices. [jan: fix i386 build warning] [avi: future proof abi with a flags field] Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:18 +02:00
Jan Kiszka	94c30d9ca6	KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check This (broken) check dates back to the days when this code was shared across architectures. x86 has IOMEM, so drop it. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:18 +02:00
Marcelo Tosatti	9fb41ba896	KVM: VMX: fix handle_pause declaration There's no kvm_run argument anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:18 +02:00
Zachary Amsden	6b7d7e762b	KVM: x86: Harden against cpufreq If cpufreq can't determine the CPU khz, or cpufreq is not compiled in, we should fallback to the measured TSC khz. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:18 +02:00
Mark Langsdorf	565d0998ec	KVM: SVM: Support Pause Filter in AMD processors New AMD processors (Family 0x10 models 8+) support the Pause Filter Feature. This feature creates a new field in the VMCB called Pause Filter Count. If Pause Filter Count is greater than 0 and intercepting PAUSEs is enabled, the processor will increment an internal counter when a PAUSE instruction occurs instead of intercepting. When the internal counter reaches the Pause Filter Count value, a PAUSE intercept will occur. This feature can be used to detect contended spinlocks, especially when the lock holding VCPU is not scheduled. Rescheduling another VCPU prevents the VCPU seeking the lock from wasting its quantum by spinning idly. Experimental results show that most spinlocks are held for less than 1000 PAUSE cycles or more than a few thousand. Default the Pause Filter Counter to 3000 to detect the contended spinlocks. Processor support for this feature is indicated by a CPUID bit. On a 24 core system running 4 guests each with 16 VCPUs, this patch improved overall performance of each guest's 32 job kernbench by approximately 3-5% when combined with a scheduler algorithm thati caused the VCPU to sleep for a brief period. Further performance improvement may be possible with a more sophisticated yield algorithm. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:17 +02:00
Zhai, Edwin	4b8d54f972	KVM: VMX: Add support for Pause-Loop Exiting New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution control fields: PLE_Gap - upper bound on the amount of time between two successive executions of PAUSE in a loop. PLE_Window - upper bound on the amount of time a guest is allowed to execute in a PAUSE loop If the time, between this execution of PAUSE and previous one, exceeds the PLE_Gap, processor consider this PAUSE belongs to a new loop. Otherwise, processor determins the the total execution time of this loop(since 1st PAUSE in this loop), and triggers a VM exit if total time exceeds the PLE_Window. * Refer SDM volume 3b section 21.6.13 & 22.1.3. Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP is sched-out after hold a spinlock, then other VPs for same lock are sched-in to waste the CPU time. Our tests indicate that most spinlocks are held for less than 212 cycles. Performance tests show that with 2X LP over-commitment we can get +2% perf improvement for kernel build(Even more perf gain with more LPs). Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:17 +02:00
Joerg Roedel	d36f19e9ec	KVM: SVM: Remove nsvm_printk debugging code With all important informations now delivered through tracepoints we can savely remove the nsvm_printk debugging code for nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:17 +02:00
Joerg Roedel	532a46b989	KVM: SVM: Add tracepoint for skinit instruction This patch adds a tracepoint for the event that the guest executed the SKINIT instruction. This information is important because SKINIT is an SVM extenstion not yet implemented by nested SVM and we may need this information for debugging hypervisors that do not yet run on nested SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	ec1ff79084	KVM: SVM: Add tracepoint for invlpga instruction This patch adds a tracepoint for the event that the guest executed the INVLPGA instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	236649de33	KVM: SVM: Add tracepoint for #vmexit because intr pending This patch adds a special tracepoint for the event that a nested #vmexit is injected because kvm wants to inject an interrupt into the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:16 +02:00
Joerg Roedel	17897f3668	KVM: SVM: Add tracepoint for injected #vmexit This patch adds a tracepoint for a nested #vmexit that gets re-injected to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	d8cabddf7e	KVM: SVM: Add tracepoint for nested #vmexit This patch adds a tracepoint for every #vmexit we get from a nested guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	0ac406de8f	KVM: SVM: Add tracepoint for nested vmrun This patch adds a dedicated kvm tracepoint for a nested vmrun. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Joerg Roedel	cd3ff653ae	KVM: SVM: Move INTR vmexit out of atomic code The nested SVM code emulates a #vmexit caused by a request to open the irq window right in the request function. This is a bug because the request function runs with preemption and interrupts disabled but the #vmexit emulation might sleep. This can cause a schedule()-while-atomic bug and is fixed with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:15 +02:00
Alexander Graf	8d23c46624	KVM: SVM: Notify nested hypervisor of lost event injections If event_inj is valid on a #vmexit the host CPU would write the contents to exit_int_info, so the hypervisor knows that the event wasn't injected. We don't do this in nested SVM by now which is a bug and fixed by this patch. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:14 +02:00
Glauber Costa	e3267cbbbf	KVM: x86: include pvclock MSRs in msrs_to_save For a while now, we are issuing a rdmsr instruction to find out which msrs in our save list are really supported by the underlying machine. However, it fails to account for kvm-specific msrs, such as the pvclock ones. This patch moves then to the beginning of the list, and skip testing them. Cc: stable@kernel.org Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:14 +02:00
Jan Kiszka	91586a3b7d	KVM: x86: Rework guest single-step flag injection and filtering Push TF and RF injection and filtering on guest single-stepping into the vender get/set_rflags callbacks. This makes the whole mechanism more robust wrt user space IOCTL order and instruction emulations. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:14 +02:00
Marcelo Tosatti	a68a6a7282	KVM: x86: disable paravirt mmu reporting Disable paravirt MMU capability reporting, so that new (or rebooted) guests switch to native operation. Paravirt MMU is a burden to maintain and does not bring significant advantages compared to shadow anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:14 +02:00
Jan Kiszka	355be0b930	KVM: x86: Refactor guest debug IOCTL handling Much of so far vendor-specific code for setting up guest debug can actually be handled by the generic code. This also fixes a minor deficit in the SVM part /wrt processing KVM_GUESTDBG_ENABLE. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:14 +02:00
Juan Quintela	201d945bcf	KVM: remove pre_task_link setting in save_state_to_tss16 Now, also remove pre_task_link setting in save_state_to_tss16. commit `b237ac37a1` Author: Gleb Natapov <gleb@redhat.com> Date: Mon Mar 30 16:03:24 2009 +0300 KVM: Fix task switch back link handling. CC: Gleb Natapov <gleb@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:13 +02:00
Zachary Amsden	3230bb4707	KVM: Fix hotplug of CPUs Both VMX and SVM require per-cpu memory allocation, which is done at module init time, for only online cpus. Backend was not allocating enough structure for all possible CPUs, so new CPUs coming online could not be hardware enabled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:13 +02:00
Zachary Amsden	e6732a5af9	KVM: Fix printk name error in svm.c Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:13 +02:00
Zachary Amsden	0cca790753	KVM: Kill the confusing tsc_ref_khz and ref_freq variables They are globals, not clearly protected by any ordering or locking, and vulnerable to various startup races. Instead, for variable TSC machines, register the cpufreq notifier and get the TSC frequency directly from the cpufreq machinery. Not only is it always right, it is also perfectly accurate, as no error prone measurement is required. On such machines, when a new CPU online is brought online, it isn't clear what frequency it will start with, and it may not correspond to the reference, thus in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure it is set before running on a VCPU. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:12 +02:00
Zachary Amsden	b820cc0ca2	KVM: Separate timer intialization into an indepedent function Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:12 +02:00
Joerg Roedel	e935d48e1b	KVM: SVM: Remove remaining occurences of rdtscll This patch replaces them with native_read_tsc() which can also be used in expressions and saves a variable on the stack in this case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:12 +02:00
Joerg Roedel	33527ad7e1	KVM: SVM: don't copy exit_int_info on nested vmrun The exit_int_info field is only written by the hardware and never read. So it does not need to be copied on a vmrun emulation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:11 +02:00
Joerg Roedel	7fcdb5103d	KVM: SVM: reorganize svm_interrupt_allowed This patch reorganizes the logic in svm_interrupt_allowed to make it better to read. This is important because the logic is a lot more complicated with Nested SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:11 +02:00
Huang Weiyi	bfc33beaed	KVM: remove duplicated #include Remove duplicated #include('s) in arch/x86/kvm/lapic.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:10 +02:00
Alexander Graf	10474ae894	KVM: Activate Virtualization On Demand X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:10 +02:00
Marcelo Tosatti	e8b3433a5c	KVM: SVM: remove needless mmap_sem acquision from nested_svm_map nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since gfn_to_page / get_user_pages are responsible for it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:10 +02:00
Mohammed Gamal	80ced186d1	KVM: VMX: Enhance invalid guest state emulation - Change returned handle_invalid_guest_state() to return relevant exit codes - Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit() - Return to userspace instead of repeatedly trying to emulate instructions that have already failed Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-12-03 09:32:09 +02:00
Mohammed Gamal	abcf14b560	KVM: x86 emulator: Add pusha and popa instructions This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting MINIX with invalid guest state emulation on. [marcelo: remove unused variable] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:09 +02:00
Mohammed Gamal	94677e61fd	KVM: x86 emulator: Add missing decoder flags for 'or' instructions Add missing decoder flags for or instructions (0xc-0xd). Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:09 +02:00
Avi Kivity	bfd99ff5d4	KVM: Move assigned device code to own file Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:09 +02:00
Avi Kivity	367e1319b2	KVM: Return -ENOTTY on unrecognized ioctls Not the incorrect -EINVAL. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:08 +02:00
Gleb Natapov	680b3648ba	KVM: Drop kvm->irq_lock lock from irq injection path The only thing it protects now is interrupt injection into lapic and this can work lockless. Even now with kvm->irq_lock in place access to lapic is not entirely serialized since vcpu access doesn't take kvm->irq_lock. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:08 +02:00
Gleb Natapov	eba0226bdf	KVM: Move IO APIC to its own lock The allows removal of irq_lock from the injection path. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:08 +02:00
Gleb Natapov	136bdfeee7	KVM: Move irq ack notifier list to arch independent code Mask irq notifier list is already there. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:07 +02:00
Gleb Natapov	3e71f88bc9	KVM: Maintain back mapping from irqchip/pin to gsi Maintain back mapping from irqchip/pin to gsi to speedup interrupt acknowledgment notifications. [avi: build fix on non-x86/ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:07 +02:00
Gleb Natapov	1a6e4a8c27	KVM: Move irq sharing information to irqchip level This removes assumptions that max GSIs is smaller than number of pins. Sharing is tracked on pin level not GSI level. [avi: no PIC on ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:06 +02:00
Gleb Natapov	79c727d437	KVM: Call pic_clear_isr() on pic reset to reuse logic there Also move call of ack notifiers after pic state change. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:06 +02:00
Avi Kivity	851ba6922a	KVM: Don't pass kvm_run arguments They're just copies of vcpu->run, which is readily accessible. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:06 +02:00
Mohammed Gamal	d8769fedd4	KVM: x86 emulator: Introduce No64 decode option Introduces a new decode option "No64", which is used for instructions that are invalid in long mode. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:05 +02:00
Mohammed Gamal	0934ac9d13	KVM: x86 emulator: Add 'push/pop sreg' instructions [avi: avoid buffer overflow] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:05 +02:00
Avi Kivity	58988b07cf	Merge remote branch 'tip/x86/entry' into kvm-updates/2.6.33 Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:30:06 +02:00
Hidetoshi Seto	fe5ed91ddc	x86, mce: don't restart timer if disabled Even it is in error path unlikely taken, add_timer_on() at CPU_DOWN_FAILED* needs to be skipped if mce_timer is disabled. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-02 21:27:32 -08:00
Jan Beulich	99063c0bce	x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h This at once also gets the alignment specification right for x86-64. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4B0FF8F80200007800022708@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 11:39:45 +01:00
Jan Beulich	01be50a308	x86/alternatives: Check replacementlen <= instrlen at build time Having run into the run-(boot-)time check a couple of times lately, I finally took time to find a build-time check so that one doesn't need to analyze the register/stack dump and resolve this (through manual lookup in vmlinux) to the offending construct. The assembler will emit a message like "Error: value of <num> too large for field of 1 bytes at <offset>", which while not pointing out the source location still makes analysis quite a bit easier. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4B0FF8AA0200007800022703@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 11:39:45 +01:00
Masami Hiramatsu	e859cf8656	x86: Fix comments of register/stack access functions Fix typos and some redundant comments of register/stack access functions in asm/ptrace.h. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Wenji Huang <wenji.huang@oracle.com> Cc: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> LKML-Reference: <20091201000222.7669.7477.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu> Suggested-by: Wenji Huang <wenji.huang@oracle.com>	2009-12-02 10:22:22 +01:00
Suresh Siddha	6d20792e85	x86: Remove unnecessary mdelay() from cpu_disable_common() fixup_irqs() already has a mdelay(). Remove the extra and unnecessary mdelay() from cpu_disable_common(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233335.232177348@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 10:11:02 +01:00
Suresh Siddha	1c83995b6c	x86, ioapic: Document another case when level irq is seen as an edge In the case when cpu goes offline, fixup_irqs() will forward any unhandled interrupt on the offlined cpu to the new cpu destination that is handling the corresponding interrupt. This interrupt forwarding is done via IPI's. Hence, in this case also level-triggered io-apic interrupt will be seen as an edge interrupt in the cpu's APIC IRR. Document this scenario in the code which handles this case by doing an explicit EOI to the io-apic to clear remote IRR of the io-apic RTE. Requested-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233335.143970505@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 10:11:01 +01:00
Suresh Siddha	c29d9db338	x86, ioapic: Fix the EOI register detection mechanism Maciej W. Rozycki reported: > 82093AA I/O APIC has its version set to 0x11 and it > does not support the EOI register. Similarly I/O APICs > integrated into the 82379AB south bridge and the 82374EB/SB > EISA component. IO-APIC versions below 0x20 don't support EOI register. Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic version as 0x2. This is an error with documentation and these ICH chips use io-apic's of version 0x20 and indeed has a working EOI register for the io-apic. Fix the EOI register detection mechanism to check for version 0x20 and beyond. And also, a platform can potentially have io-apic's with different versions. Make the EOI register check per io-apic. Reported-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233335.065361533@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 10:11:01 +01:00
Maciej W. Rozycki	ca64c47cec	x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq When the level-triggered interrupt is seen as an edge interrupt, we try to clear the remoteIRR explicitly (using either an io-apic eoi register when present or through the idea of changing trigger mode of the io-apic RTE to edge and then back to level). But this explicit try also needs to happen before we try to migrate the irq. Otherwise irq migration attempt will fail anyhow, as it postpones the irq migration to a later attempt when it sees the remoteIRR in the io-apic RTE still set. Signed-off-by: "Maciej W. Rozycki" <macro@linux-mips.org> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233334.975416130@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 10:11:00 +01:00
Frederic Weisbecker	1cedae7290	hw-breakpoints: Keep track of user disabled breakpoints When we disable a breakpoint through dr7, we unregister it right away, making us lose track of its corresponding address register value. It means that the following sequence would be unsupported: - set address in dr0 - enable it through dr7 - disable it through dr7 - enable it through dr7 because we lost the address register value when we disabled the breakpoint. Don't unregister the disabled breakpoints but rather disable them. Reported-by: "K.Prasad" <prasad@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259735536-9236-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-02 09:59:03 +01:00
David S. Miller	ff9c38bba3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c	2009-12-01 22:13:38 -08:00
Herbert Xu	8386324381	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-12-01 15:16:22 +08:00
H. Peter Anvin	ccef086454	x86, mm: Correct the implementation of is_untracked_pat_range() The semantics the PAT code expect of is_untracked_pat_range() is "is this range completely contained inside the untracked region." This means that checkin `8a27138924` was technically wrong, because the implementation needlessly confusing. The sane interface is for it to take a semiclosed range like just about everything else (as evidenced by the sheer number of "- 1"'s removed by that patch) so change the actual implementation to match. Reported-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-30 21:33:51 -08:00
Helight.Xu	9eaa192d89	x86: Fix a section mismatch in arch/x86/kernel/setup.c copy_edd() should be __init. warning msg: WARNING: vmlinux.o(.text+0x7759): Section mismatch in reference from the function copy_edd() to the variable .init.data:boot_params The function copy_edd() references the variable __initdata boot_params. This is often because copy_edd lacks a __initdata annotation or the annotation of boot_params is wrong. Signed-off-by: ZhenwenXu <helight.xu@gmail.com> LKML-Reference: <4B139F8F.4000907@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-30 11:16:49 -08:00
Thomas Gleixner	b8b7d791a8	x86: Use -maccumulate-outgoing-args for sane mcount prologues commit `746357d` (x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage) uses -mtune=generic to work around the function prologue problem with mcount on -march=pentium-mmx and others. Jakub pointed out that we can use -maccumulate-outgoing-args instead which is selected by -mtune=generic and prevents the problem without losing the -march specific optimizations. Pointed-out-by: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@kernel.org	2009-11-28 15:08:30 +01:00
Thomas Gleixner	18ed61da98	x86: hpet: Make WARN_ON understandable Andrew complained rightly that the WARN_ON in hpet_next_event() is confusing and the code comment not really helpful. Change it to WARN_ONCE and print the reason in clear text. Change the comment to explain what kind of hardware wreckage we deal with. Pointed-out-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>	2009-11-27 20:37:41 +01:00
Joerg Roedel	492667dacc	x86/amd-iommu: Remove amd_iommu_pd_table The data that was stored in this table is now available in dev->archdata.iommu. So this table is not longer necessary. This patch removes the remaining uses of that variable and removes it from the code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:37 +01:00
Joerg Roedel	8eed983334	x86/amd-iommu: Move reset_iommu_command_buffer out of locked code This patch removes the ugly contruct where the iommu->lock must be released while before calling the reset_iommu_command_buffer function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:37 +01:00
Joerg Roedel	b00d3bcff4	x86/amd-iommu: Cleanup DTE flushing code This patch cleans up the code to flush device table entries in the IOMMU. With this chance the driver can get rid of the iommu_queue_inv_dev_entry() function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:36 +01:00
Joerg Roedel	3fa43655d8	x86/amd-iommu: Introduce iommu_flush_device() function This patch adds a function to flush a DTE entry for a given struct device and replaces iommu_queue_inv_dev_entry calls with this function where appropriate. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:35 +01:00
Joerg Roedel	7f760ddd70	x86/amd-iommu: Cleanup attach/detach_device code This patch cleans up the attach_device and detach_device paths and fixes reference counting while at it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:35 +01:00
Joerg Roedel	7c392cbe98	x86/amd-iommu: Keep devices per domain in a list This patch introduces a list to each protection domain which keeps all devices associated with the domain. This can be used later to optimize certain functions and to completly remove the amd_iommu_pd_table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:34 +01:00
Joerg Roedel	241000556f	x86/amd-iommu: Add device bind reference counting This patch adds a reference count to each device to count how often the device was bound to that domain. This is important for single devices that act as an alias for a number of others. These devices must stay bound to their domains until all devices that alias to it are unbound from the same domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:33 +01:00
Joerg Roedel	657cbb6b6c	x86/amd-iommu: Use dev->arch->iommu to store iommu related information This patch changes IOMMU code to use dev->archdata->iommu to store information about the alias device and the domain the device is attached to. This allows the driver to get rid of the amd_iommu_pd_table in the future. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:32 +01:00
Joerg Roedel	8793abeb78	x86/amd-iommu: Remove support for domain sharing This patch makes device isolation mandatory and removes support for the amd_iommu=share option. This simplifies the code in several places. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:32 +01:00
Joerg Roedel	171e7b3739	x86/amd-iommu: Rearrange dma_ops related functions This patch rearranges two dma_ops related functions so that their forward declarations are not longer necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:31 +01:00
Joerg Roedel	308973d3b9	x86/amd-iommu: Move some pte allocation functions in the right section This patch moves alloc_pte() and fetch_pte() into the page table handling code section so that the forward declarations for them could be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:30 +01:00
Joerg Roedel	87a64d5238	x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc This function doesn't use the parameter anymore so it can be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:30 +01:00
Joerg Roedel	98fc5a693b	x86/amd-iommu: Use get_device_id and check_device where appropriate The logic of these two functions is reimplemented (at least in parts) in places in the code. This patch removes these code duplications and uses the functions instead. As a side effect it moves check_device() to the helper function code section. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:29 +01:00
Joerg Roedel	71c70984e5	x86/amd-iommu: Move find_protection_domain to helper functions This is a helper function and when its placed in the helper function section we can remove its forward declaration. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:28 +01:00
Joerg Roedel	94f6d190ee	x86/amd-iommu: Simplify get_device_resources() With the previous changes the get_device_resources function can be simplified even more. The only important information for the callers is the protection domain. This patch renames the function to get_domain() and let it only return the protection domain for a device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:20:21 +01:00
Joerg Roedel	15898bbcb4	x86/amd-iommu: Let domain_for_device handle aliases If there is no domain associated to a device yet and the device has an alias device which already has a domain, the original device needs to have the same domain as the alias device. This patch changes domain_for_device to handle this situation and directly assigns the alias device domain to the device in this situation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:17:09 +01:00
Joerg Roedel	f3be07da53	x86/amd-iommu: Remove iommu specific handling from dma_ops path This patch finishes the removal of all iommu specific handling code in the dma_ops path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:17:08 +01:00
Joerg Roedel	cd8c82e875	x86/amd-iommu: Remove iommu parameter from __(un)map_single With the prior changes this parameter is not longer required. This patch removes it from the function and all callers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:17:08 +01:00
Joerg Roedel	576175c250	x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs Since the assumption that an dma_ops domain is only bound to one IOMMU was given up we need to make alloc_new_range aware of it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:17:01 +01:00
Joerg Roedel	680525e06d	x86/amd-iommu: Remove iommu parameter from dma_ops_domain_(un)map The parameter is unused in these function so remove it from the parameter list. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:31 +01:00
Joerg Roedel	f99c0f1c75	x86/amd-iommu: Use check_device in get_device_resources Every call-place of get_device_resources calls check_device before it. So call it from get_device_resources directly and simplify the code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:30 +01:00
Joerg Roedel	420aef8a3a	x86/amd-iommu: Use check_device for amd_iommu_dma_supported The check_device logic needs to include the dma_supported checks to be really sure. Merge the dma_supported logic into check_device and use it to implement dma_supported. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:30 +01:00
Joerg Roedel	318afd41d2	x86/amd-iommu: Make np-cache a global flag The non-present cache flag was IOMMU local until now which doesn't make sense. Make this a global flag so we can remove the lase user of 'struct iommu' in the map/unmap path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:29 +01:00
Joerg Roedel	09b4280439	x86/amd-iommu: Reimplement flush_all_domains_on_iommu() This patch reimplements the function flush_all_domains_on_iommu to use the global protection domain list. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:28 +01:00
Joerg Roedel	e3306664eb	x86/amd-iommu: Reimplement amd_iommu_flush_all_domains() This patch reimplementes the amd_iommu_flush_all_domains function to use the global protection domain list instead of flushing every domain on every IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:28 +01:00
Joerg Roedel	aeb26f5533	x86/amd-iommu: Implement protection domain list This patch adds code to keep a global list of all protection domains. This allows to simplify the resume code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:27 +01:00
Joerg Roedel	601367d76b	x86/amd-iommu: Remove iommu_flush_domain function This iommu_flush_tlb_pde function does essentially the same. So the iommu_flush_domain function is redundant and can be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:26 +01:00
Joerg Roedel	dcd1e92e40	x86/amd-iommu: Use __iommu_flush_pages for tlb flushes This patch re-implements iommu_flush_tlb functions to use the __iommu_flush_pages logic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:26 +01:00
Joerg Roedel	6de8ad9b9e	x86/amd-iommu: Make iommu_flush_pages aware of multiple IOMMUs This patch extends the iommu_flush_pages function to flush the TLB entries on all IOMMUs the domain has devices on. This basically gives up the former assumption that dma_ops domains are only bound to one IOMMU in the system. For dma_ops domains this is still true but not for IOMMU-API managed domains. Giving this assumption up for dma_ops domains too allows code simplification. Further it splits out the main logic into a generic function which can be used by iommu_flush_tlb too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 14:16:18 +01:00
Joerg Roedel	0518a3a458	x86/amd-iommu: Add function to complete a tlb flush This patch adds a function to the AMD IOMMU driver which completes all queued commands an all IOMMUs a specific domain has devices attached on. This is required in a later patch when per-domain flushing is implemented. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 11:45:50 +01:00
Joerg Roedel	c459611424	x86/amd-iommu: Add per IOMMU reference counting This patch adds reference counting for protection domains per IOMMU. This allows a smarter TLB flushing strategy. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 11:45:50 +01:00
Joerg Roedel	bb52777ec4	x86/amd-iommu: Add an index field to struct amd_iommu This patch adds an index field to struct amd_iommu which can be used to lookup it up in an array. This index will be used in struct protection_domain to keep track which protection domain has devices behind which IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 11:45:49 +01:00
Joerg Roedel	bf3118c127	x86/amd-iommu: Update copyright headers This patch updates the copyright headers in the relevant AMD IOMMU driver files to match the date of the latest changes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 11:45:49 +01:00
Joerg Roedel	6a9401a7ac	x86/amd-iommu: Separate internal interface definitions This patch moves all function declarations which are only used inside the driver code to a seperate header file. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-27 11:45:48 +01:00
Frederic Weisbecker	5fa10b28e5	hw-breakpoints: Use struct perf_event_attr to define user breakpoints In-kernel user breakpoints are created using functions in which we pass breakpoint parameters as individual variables: address, length and type. Although it fits well for x86, this just does not scale across archictectures that may support this api later as these may have more or different needs. Pass in a perf_event_attr structure instead because it is meant to evolve as much as possible into a generic hardware breakpoint parameter structure. Reported-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-27 06:22:58 +01:00
Xiaotian Feng	dd4377b02d	x86/pat: Trivial: don't create debugfs for memtype if pat is disabled If pat is disabled (boot with nopat), there's no need to create debugfs for it, it's empty all the time. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 15:05:03 +01:00
Jack Steiner	918bc960dc	x86: SGI UV: Map low MMR ranges Explicitly mmap the UV chipset MMR address ranges used to access blade-local registers. Although these same MMRs are also mmaped at higher addresses, the low range is more convenient when accessing blade-local registers. The low range addresses always alias to the local blade regardless of the blade id. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091125162018.GA25445@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:52:36 +01:00
Brian Gerst	8ec6993d9f	x86, 64-bit: Set data segments to null after switching to 64-bit mode This prevents kernel threads from inheriting non-null segment selectors, and causing optimizations in __switch_to() to be ineffective. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Tim Blechmann <tim@klingt.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jan Beulich <JBeulich@novell.com> LKML-Reference: <1259165856-3512-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:44:30 +01:00
Ingo Molnar	64b028b226	x86: Clean up the loadsegment() macro Make it readable in the source too, not just in the assembly output. No change in functionality. Cc: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:38:52 +01:00
Brian Gerst	79b0379cee	x86: Optimize loadsegment() Zero the input register in the exception handler instead of using an extra register to pass in a zero value. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:33:58 +01:00
Hidetoshi Seto	767df1bdd8	x86, mce: Add __cpuinit to hotplug callback functions The mce_disable_cpu() and mce_reenable_cpu() are called only from mce_cpu_callback() which is marked as __cpuinit. So these functions can be __cpuinit too. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <ak@linux.intel.com> LKML-Reference: <4B0E3C4E.4090809@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:29:41 +01:00
Mike Travis	9b3660a55a	x86: Limit number of per cpu TSC sync messages Limit the number of per cpu TSC sync messages by only printing to the console if an error occurs, otherwise print as a DEBUG message. The info message "Skipping synchronization ..." is only printed after the last cpu has booted. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091118002222.181053000@alcatraz.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 10:17:45 +01:00
Frederic Weisbecker	2c31b7958f	x86/hw-breakpoints: Don't lose GE flag while disabling a breakpoint When we schedule out a breakpoint from the cpu, we also incidentally remove the "Global exact breakpoint" flag from the breakpoint control register. It makes us losing the fine grained precision about the origin of the instructions that may trigger breakpoint exceptions for the other breakpoints running in this cpu. Reported-by: Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259211878-6013-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 09:29:22 +01:00
Frederic Weisbecker	605bfaee90	hw-breakpoints: Simplify error handling in breakpoint creation requests This simplifies the error handling when we create a breakpoint. We don't need to check the NULL return value corner case anymore since we have improved perf_event_create_kernel_counter() to always return an error code in the failure case. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Prasad <prasad@linux.vnet.ibm.com> LKML-Reference: <1259210142-5714-3-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 09:29:21 +01:00
Ilya Loginov	2d4dc890b5	block: add helpers to run flush_dcache_page() against a bio and a request's pages Mtdblock driver doesn't call flush_dcache_page for pages in request. So, this causes problems on architectures where the icache doesn't fill from the dcache or with dcache aliases. The patch fixes this. The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid pointless empty cache-thrashing loops on architectures for which flush_dcache_page() is a no-op. Every architecture was provided with this flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is equal 1 or do nothing otherwise. See "fix mtd_blkdevs problem with caches on some architectures" discussion on LKML for more information. Signed-off-by: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Peter Horton <phorton@bitbox.co.uk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-26 09:16:19 +01:00
Ingo Molnar	67f2de0bf9	x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks This warning: [ 847.140022] rb_producer D 0000000000000000 5928 519 2 0x00000000 [ 847.203627] BUG: using smp_processor_id() in preemptible [00000000] code: khungtaskd/517 [ 847.207360] caller is show_stack_log_lvl+0x2e/0x241 [ 847.210364] Pid: 517, comm: khungtaskd Not tainted 2.6.32-rc8-tip+ #13761 [ 847.213395] Call Trace: [ 847.215847] [<ffffffff81413bde>] debug_smp_processor_id+0x1f0/0x20a [ 847.216809] [<ffffffff81015eae>] show_stack_log_lvl+0x2e/0x241 [ 847.220027] [<ffffffff81018512>] show_stack+0x1c/0x1e [ 847.223365] [<ffffffff8107b7db>] sched_show_task+0xe4/0xe9 [ 847.226694] [<ffffffff8112f21f>] check_hung_task+0x140/0x199 [ 847.230261] [<ffffffff8112f4a8>] check_hung_uninterruptible_tasks+0x1b7/0x20f [ 847.233371] [<ffffffff8112f500>] ? watchdog+0x0/0x50 [ 847.236683] [<ffffffff8112f54e>] watchdog+0x4e/0x50 [ 847.240034] [<ffffffff810cee56>] kthread+0x97/0x9f [ 847.243372] [<ffffffff81012aea>] child_rip+0xa/0x20 [ 847.246690] [<ffffffff81e43494>] ? restore_args+0x0/0x30 [ 847.250019] [<ffffffff81e43083>] ? _spin_lock+0xe/0x10 [ 847.253351] [<ffffffff810cedbf>] ? kthread+0x0/0x9f [ 847.256833] [<ffffffff81012ae0>] ? child_rip+0x0/0x20 Happens because on preempt-RCU, khungd calls show_stack() with preemption enabled. Make sure we are not preemptible while walking the IRQ and exception stacks on 64-bit. (32-bit stack dumping is preemption safe.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 08:29:10 +01:00
Ingo Molnar	b803090615	x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details Make the initialization more readable, plus tidy up a few small visual details as well. No change in functionality. LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 08:24:33 +01:00
Tejun Heo	28b4e0d86a	x86: Rename global percpu symbol dr7 to cpu_dr7 Percpu symbols now occupy the same namespace as other global symbols and as such short global symbols without subsystem prefix tend to collide with local variables. dr7 percpu variable used by x86 was hit by this. Rename it to cpu_dr7. The rename also makes it more consistent with its fellow cpu_debugreg percpu variable. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>, Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20091125115856.GA17856@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>	2009-11-25 14:30:04 +01:00
FUJITA Tomonori	273bee27fa	x86: Fix iommu=soft boot option iommu=soft boot option forces the kernel to use swiotlb. ( This has the side-effect of enabling the swiotlb over the GART if this boot option is provided. This is the desired behavior of the swiotlb boot option and works like that for all other hw-IOMMU drivers. ) Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: yinghai@kernel.org LKML-Reference: <20091125084611O.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-25 10:12:51 +01:00
Lin Ming	2263576cfc	ACPICA: Add post-order callback to acpi_walk_namespace The existing interface only has a pre-order callback. This change adds an additional parameter for a post-order callback which will be more useful for bus scans. ACPICA BZ 779. Also update the external calls to acpi_walk_namespace. http://www.acpica.org/bugzilla/show_bug.cgi?id=779 Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-11-24 21:31:10 -05:00
Bjorn Helgaas	f6e1d8cc38	x86/PCI: MMCONFIG: add lookup function This patch factors out the search for an MMCONFIG region, which was previously implemented in both mmconfig_32 and mmconfig_64. No functional change. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:36 -08:00
Bjorn Helgaas	8c57786ad3	x86/PCI: MMCONFIG: clean up printks No functional change; just tidy up printks and make them more consistent with the rest of PCI. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:30 -08:00
Bjorn Helgaas	ba2afbabfc	x86/PCI: MMCONFIG: add pci_mmconfig_remove() to remove MMCONFIG region This is only used internally now, but eventually will be used in the hot-remove path to remove the MMCONFIG region associated with a host bridge. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:24 -08:00
Bjorn Helgaas	ff097ddd4a	x86/PCI: MMCONFIG: manage pci_mmcfg_region as a list, not a table This changes pci_mmcfg_region from a table to a list, to make it easier to add and remove MMCONFIG regions for PCI host bridge hotplug. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:14 -08:00
Bjorn Helgaas	987c367b4e	x86/PCI: MMCONFIG: remove typeof so we can use a list This replaces "typeof(pci_mmcfg_config[0])" with the actual type because I plan to convert pci_mmcfg_config to a list, and then "pci_mmcfg_config[0]" won't mean anything. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:08 -08:00
Bjorn Helgaas	3f0f550392	x86/PCI: MMCONFIG: add virtual address to struct pci_mmcfg_region The virtual address is only used for x86_64, but it's so much simpler to manage it as part of the pci_mmcfg_region that I think it's worth wasting a pointer per MMCONFIG region on x86_32. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:30:01 -08:00
Bjorn Helgaas	2f2a8b9c90	x86/PCI: MMCONFIG: trivial is_mmconf_reserved() interface simplification Since pci_mmcfg_region contains the struct resource, no need to pass the pci_mmcfg_region and the resource start/size. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:49 -08:00
Bjorn Helgaas	56ddf4d3cf	x86/PCI: MMCONFIG: add resource to struct pci_mmcfg_region This patch adds a resource and corresponding name to the MMCONFIG structure. This makes allocation simpler (we can allocate the resource and name at the same time we allocate the pci_mmcfg_region), and gives us a way to hang onto the resource after inserting it. This will be needed so we can release and free it when hot-removing a host bridge. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:41 -08:00
Bjorn Helgaas	95cf1cf0c5	x86/PCI: MMCONFIG: use pointer to simplify pci_mmcfg_config[] structure access No functional change, but simplifies a future patch to convert the table to a list. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:34 -08:00
Bjorn Helgaas	d7e6b66fe8	x86/PCI: MMCONFIG: rename pci_mmcfg_region structure members This only renames the struct pci_mmcfg_region members; no functional change. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:24 -08:00
Bjorn Helgaas	d215a9c8b4	x86/PCI: MMCONFIG: use a private structure rather than the ACPI MCFG one This adds a struct pci_mmcfg_region with a little more information than the struct acpi_mcfg_allocation used previously. The acpi_mcfg structure is defined by the spec, so we can't change it. To begin with, struct pci_mmcfg_region is basically the same as the ACPI MCFG version, but future patches will add more information. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:17 -08:00
Bjorn Helgaas	df5eb1d67e	x86/PCI: MMCONFIG: add PCI_MMCFG_BUS_OFFSET() to factor common expression This factors out the common "bus << 20" expression used when computing the MMCONFIG address. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:11 -08:00
Bjorn Helgaas	f7ca698487	x86/PCI: MMCONFIG: reject MMCONFIG apertures at address zero Since all MMCONFIG regions go through pci_mmconfig_add(), we can test the address once there. If the caller supplies an address of zero, we never insert it in the pci_mmcfg_config[] table, so no need to test it elsewhere. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:29:03 -08:00
Bjorn Helgaas	463a5df175	x86/PCI: MMCONFIG: simplify tests for empty pci_mmcfg_config table We never set pci_mmcfg_config unless we increment pci_mmcfg_config_num, so there's no need to test both pci_mmcfg_config_num and pci_mmcfg_config. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:28:56 -08:00
Bjorn Helgaas	7da7d360ae	x86/PCI: MMCONFIG: centralize MCFG structure management This patch encapsulate pci_mmcfg_config[] updates. All alloc/free is now done in pci_mmconfig_add() and free_all_mcfg(), so all updates to pci_mmcfg_config[] and pci_mmcfg_config_num are in those two functions. This replaces the previous sequence of extend_mmcfg(), fill_one_mmcfg() with the single pci_mmconfig_add() interface. This interface is currently static but will eventually be used in the host bridge hot-add path. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:28:51 -08:00
Bjorn Helgaas	d3578ef7aa	x86/PCI: MMCONFIG: step through MCFG table, not pci_mmcfg_config[] Step through the ACPI MCFG table, not pci_mmcfg_config[]. No functional change, but simplifies future patches that encapsulate pci_mmcfg_config[]. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:28:43 -08:00
Bjorn Helgaas	e823d6ff58	x86/PCI: MMCONFIG: count MCFG structures with local variable Use a local variable, not pci_mmcfg_config_num, to count MCFG entries. No functional change, but simplifies future changes. Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:28:37 -08:00
Bjorn Helgaas	5663b1b963	x86/PCI: MMCONFIG: remove unused definitions Reviewed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:28:30 -08:00
Yinghai Lu	67f241f457	x86/pci: seperate x86_pci_rootbus_res_quirks from amd_bus.c Those functions are used by intel_bus.c so seperate them to another file. and make amd_bus a bit smaller. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:25:59 -08:00
Jiri Kosina	7b7a785942	PCI: fix comment typo in bus_numa.h Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:25:20 -08:00
Alex Chiang	2ed7a806d8	x86/PCI: remove early PCI pr_debug statements commit db635adc turned -DDEBUG for x86/pci on when CONFIG_PCI_DEBUG is set. In general, I agree with that change. However, it exposes a bunch of very low level PCI debugging in the early x86 path, such as: 0 reading 2 from a: ffff 1 reading 2 from a: ffff 2 reading 2 from a: ffff 3 reading 2 from a: 300 3 reading 2 from 0: 1002 3 reading 2 from 2: 515e These statements add a lot of noise to the boot and aren't likely to be necessary even when handling random upstream bug reports. [In contrast, statements such as these: pci 0000:02:04.0: found [14e4:164a] class 000200 header type 00 pci 0000:02:04.0: reg 10: [mem 0xf8000000-0xf9ffffff 64bit] pci 0000:02:04.0: reg 30: [mem 0x00000000-0x0001ffff pref] are indeed useful when remote debugging users' machines] Remove the noisy printks and save electrons everywhere. Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-24 15:25:19 -08:00
Yinghai Lu	5bf65b9ba6	x86, mtrr: Fix sorting of mtrr after subtracting In some cases we can coalesce MTRR entries after cleanup; this may allow us to have more entries. As such, introduce clean_sort_range to to sort and coaelsce the MTRR entries. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B0BB9A3.5020908@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-24 13:06:16 -08:00
Thomas Renninger	e2f74f355e	[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface This interface is mainly intended (and implemented) for ACPI _PPC BIOS frequency limitations, but other cpufreq drivers can also use it for similar use-cases. Why is this needed: Currently it's not obvious why cpufreq got limited. People see cpufreq/scaling_max_freq reduced, but this could have happened by: - any userspace prog writing to scaling_max_freq - thermal limitations - hardware (_PPC in ACPI case) limitiations Therefore export bios_limit (in kHz) to: - Point the user that it's the BIOS (broken or intended) which limits frequency - Export it as a sysfs interface for userspace progs. While this was a rarely used feature on laptops, there will appear more and more server implemenations providing "Green IT" features like allowing the service processor to limit the frequency. People want to know about HW/BIOS frequency limitations. All ACPI P-state driven cpufreq drivers are covered with this patch: - powernow-k8 - powernow-k7 - acpi-cpufreq Tested with a patched DSDT which limits the first two cores (_PPC returns 1) via _PPC, exposed by bios_limit: # echo 2200000 >cpu2/cpufreq/scaling_max_freq # cat cpu/cpufreq/scaling_max_freq 2600000 2600000 2200000 2200000 # #scaling_max_freq shows general user/thermal/BIOS limitations # cat cpu/cpufreq/bios_limit 2600000 2600000 2800000 2800000 # #bios_limit only shows the HW/BIOS limitation CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com> CC: Len Brown <lenb@kernel.org> CC: davej@codemonkey.org.uk CC: linux@dominikbrodowski.net Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-24 13:33:34 -05:00
Rusty Russell	1cce76c2ac	[CPUFREQ] use an enum for speedstep processor identification The "unsigned int processor" everywhere confused Rusty, leading to breakage when he passed in smp_processor_id(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-24 13:33:34 -05:00
Krzysztof Helt	db2820dd54	[CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used Set the transition latency to value smaller than CPUFREQ_ETERNAL so governors other than "performance" work (like the "ondemand" one). The value is found in "AMD PowerNow! Technology Platform Design Guide for Embedded Processors" dated December 2000 (AMD doc #24267A). There is the answer to one of FAQs on page 40 which states that suggested complete transition period is 200 us. Tested on K6-2+ CPU with K6-3 core (model 13, stepping 4). Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-24 13:33:33 -05:00
Rusty Russell	b8cbe7e82e	[CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c It's still mugging the current process's cpumask, but as comment in `1ff6e97f1d` says, it's not a trivial fix. So, at least we can use a cpumask_var_t to do the Wrong Thing the Right Way :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: cpufreq@vger.kernel.org Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-24 13:33:33 -05:00
Harald Welte	d77b819745	[CPUFREQ] Enable ACPI PDC handshake for VIA/Centaur CPUs In commit `0de51088e6`, we introduced the use of acpi-cpufreq on VIA/Centaur CPU's by removing a vendor check for VENDOR_INTEL. However, as it turns out, at least the Nano CPU's also need the PDC (processor driver capabilities) handshake in order to activate the methods required for acpi-cpufreq. Since arch_acpi_processor_init_pdc() contains another vendor check for Intel, the PDC is not initialized on VIA CPU's. The resulting behavior of a current mainline kernel on such systems is: acpi-cpufreq loads and it indicates CPU frequency changes. However, the CPU stays at a single frequency This trivial patch ensures that init_intel_pdc() is called on Intel and VIA/Centaur CPU's alike. Signed-off-by: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-24 13:33:32 -05:00
Stephane Eranian	1261a02a0c	perf_events, x86: Fix validate_event bug The validate_event() was failing on valid event combinations. The function was assuming that if x86_schedule_event() returned 0, it meant error. But x86_schedule_event() returns the counter index and 0 is a perfectly valid value. An error is returned if the function returns a negative value. Furthermore, validate_event() was also failing for event groups because the event->pmu was not set until after hw_perf_event_init(). Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: perfmon2-devel@lists.sourceforge.net Cc: eranian@gmail.com LKML-Reference: <4b0bdf36.1818d00a.07cc.25ae@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> -- arch/x86/kernel/cpu/perf_event.c \| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)	2009-11-24 19:23:48 +01:00
Yinghai Lu	b24c2a925a	x86: Move find_smp_config() earlier and avoid bootmem usage Move the find_smp_config() call to before bootmem is initialized. Use reserve_early() instead of reserve_bootmem() in it. This simplifies the code, we only need to call find_smp_config() once and can remove the now unneeded reserve parameter from x86_init_mpparse::find_smp_config. We thus also reduce x86's dependency on bootmem allocations. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B0BB9F2.70907@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-24 12:10:51 +01:00
H. Peter Anvin	eb41c8be89	x86, platform: Change is_untracked_pat_range() to bool; cleanup init - Change is_untracked_pat_range() to return bool. - Clean up the initialization of is_untracked_pat_range() -- by default, we simply point it at is_ISA_range() directly. - Move is_untracked_pat_range to the end of struct x86_platform, since it is the newest field. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
H. Peter Anvin	65f116f5f1	x86: Change is_ISA_range() into an inline function Change is_ISA_range() from a macro to an inline function. This makes it type safe, and also allows it to be assigned to a function pointer if necessary. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
H. Peter Anvin	8a27138924	x86, mm: is_untracked_pat_range() takes a normal semiclosed range is_untracked_pat_range() -- like its components, is_ISA_range() and is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the PAT code called it as if it took a closed range (>=, <=). Fix. Although this is a bug, I believe it is non-manifest, simply because none of the callers will call this with non-page-aligned addresses. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
H. Peter Anvin	55a6ca2547	x86, mm: Call is_untracked_pat_range() rather than is_ISA_range() Checkin `fd12a0d69a` made the PAT untracked range a platform configurable, but missed on occurrence of is_ISA_range() which still refers to PAT-untracked memory, and therefore should be using the configurable. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
Borislav Petkov	27c13ecec4	x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes display_cacheinfo() doesn't display anything anymore and it is used to detect CPU cache sizes. Rename it accordingly. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <20091121130145.GA31357@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-23 11:59:53 -08:00
Jack Steiner	fd12a0d69a	x86: UV SGI: Don't track GRU space in PAT GRU space is always mapped as WB in the page table. There is no need to track the mappings in the PAT. This also eliminates the "freeing invalid memtype" messages when the GRU space is unmapped. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com> [ v2: fix build failure ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 19:47:42 +01:00
Dimitri Sivanich	581f202bcd	x86: UV RTC: Always enable RTC clocksource Always enable the RTC clocksource on UV systems. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20091120214826.GA20016@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 19:41:30 +01:00
Jan Beulich	0444c9bd0c	x86: Tighten conditionals on MCE related statistics irq_thermal_count is only being maintained when X86_THERMAL_VECTOR, and both X86_THERMAL_VECTOR and X86_MCE_THRESHOLD don't need extra wrapping in X86_MCE conditionals. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Yong Wang <yong.y.wang@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <4B06AFA902000078000211F8@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 19:40:06 +01:00
Cliff Wickman	e38e2af1c5	x86: SGI UV: Fix BAU initialization A memory mapped register that affects the SGI UV Broadcast Assist Unit's interrupt handling may sometimes be unintialized. Remove the condition on its initialization, as that condition can be randomly satisfied by a hardware reset. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <E1NBGB9-0005nU-Dp@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 19:12:50 +01:00
K.Prasad	ba6909b719	hw-breakpoint: Attribute authorship of hw-breakpoint related files Attribute authorship to developers of hw-breakpoint related files. Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091123154713.GA5593@in.ibm.com> [ v2: moved it to latest -tip ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 18:18:29 +01:00
Jiri Kosina	68ee87164e	crypto: ghash-clmulni-intel - Put proper .data section in place Lbswap_mask, Lpoly and Ltwo_one should clearly belong to .data section, not .text. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-11-23 20:19:47 +08:00
Huang Ying	564ec0ec05	crypto: ghash-clmulni-intel - Use gas macro for PCLMULQDQ-NI and PSHUFB Old binutils do not support PCLMULQDQ-NI and PSHUFB, to make kernel can be compiled by them, .byte code is used instead of assembly instructions. But the readability and flexibility of raw .byte code is not good. So corresponding assembly instruction like gas macro is used instead. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-11-23 19:55:22 +08:00
Joerg Roedel	be83129771	x86/amd-iommu: attach devices to pre-allocated domains early For some devices the ACPI table may define unity map requirements which must me met when the IOMMU is enabled. So we need to attach devices to their domains as early as possible so that these mappings are in place when needed. This patch assigns the domains right after they are allocated. Otherwise this can result in I/O page faults before a driver binds to a device and BIOS is still using it. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-23 12:54:17 +01:00
Huang Ying	b369e52123	crypto: aesni-intel - Use gas macro for AES-NI instructions Old binutils do not support AES-NI instructions, to make kernel can be compiled by them, .byte code is used instead of AES-NI assembly instructions. But the readability and flexibility of raw .byte code is not good. So corresponding assembly instruction like gas macro is used instead. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-11-23 19:54:06 +08:00
Joerg Roedel	9f800de38b	x86/amd-iommu: un__init iommu_setup_msi This function may be called on the resume path and can not be dropped after booting. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-11-23 12:45:25 +01:00
Jan Beulich	0e7810be30	x86: Suppress stack overrun message for init_task init_task doesn't get its stack end location set to STACK_END_MAGIC, and hence the message is confusing rather than helpful in this case. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4B06AEFE02000078000211F4@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 11:45:34 +01:00
Ingo Molnar	6e3d8330ae	perf events: Do not generate function trace entries in perf code Decreases perf overhead when function tracing is enabled, by about 50%. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 10:19:20 +01:00
Yinghai Lu	d9c2d5ac6a	x86, numa: Use near(er) online node instead of roundrobin for NUMA CPU to node mapping is set via the following sequence: 1. numa_init_array(): Set up roundrobin from cpu to online node 2. init_cpu_to_node(): Set that according to apicid_to_node[] according to srat only handle the node that is online, and leave other cpu on node without ram (aka not online) to still roundrobin. 3. later call srat_detect_node for Intel/AMD, will use first_online node or nearby node. Problem is that setup_per_cpu_areas() is not called between 2 and 3, the per_cpu for cpu on node with ram is on different node, and could put that on node with two hops away. So try to optimize this and add find_near_online_node() and call init_cpu_to_node(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 10:06:24 +01:00
Yinghai Lu	021428ad14	x86, numa, bootmem: Only free bootmem on NUMA failure path In the NUMA bootmem setup failure path we freed nodedata_phys incorrectly. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 10:00:48 +01:00
Yinghai Lu	163d3866cf	x86: apic: Print out SRAT table APIC id in hex Make it consistent with APIC MADT print out, for big systems APIC id in hex is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 09:56:30 +01:00
Yinghai Lu	37ef2a3029	x86: Re-get cfg_new in case reuse/move irq_desc When irq_desc is moved, we need to make sure to use the right cfg_new. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 09:56:05 +01:00
Yinghai Lu	e670761f12	x86: apic: Remove not needed #ifdef Suresh made dmar_table_init() already have that protection. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 09:54:15 +01:00
Yinghai Lu	44280733e7	x86: Change crash kernel to reserve via reserve_early() use find_e820_area()/reserve_early() instead. -v2: address Eric's request, to restore original semantics. will fail, if the provided address can not be used. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> LKML-Reference: <4B09E2F9.7040403@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 09:09:23 +01:00
Ingo Molnar	96200591a3	Merge branch 'tracing/hw-breakpoints' into perf/core Conflicts: arch/x86/kernel/kprobes.c kernel/trace/Makefile Merge reason: hw-breakpoints perf integration is looking good in testing and in reviews, plus conflicts are mounting up - so merge & resolve. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-21 14:07:23 +01:00
Masami Hiramatsu	6f5f67267d	x86: insn decoder test checks objdump version Check objdump version before using it for insn decoder build test, because some older objdump can't decode AVX code correctly. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Jim Keniston <jkenisto@us.ibm.com> LKML-Reference: <20091120171314.6715.30390.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-20 23:01:04 -08:00
Masami Hiramatsu	80509e27e4	x86: Fix insn decoder test typos Fix postest_verbose to posttest_verbose, and add posttest_64bit option for CONFIG_64BIT != y, since old command just passed '-' instead of '-n' when CONFIG_64BIT is not set. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Jim Keniston <jkenisto@us.ibm.com> LKML-Reference: <20091120171307.6715.66099.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-20 22:59:36 -08:00
Thomas Gleixner	746357d6a5	x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage When the kernel is compiled with -pg for tracing GCC 4.4.x inserts stack alignment of a function _before_ the mcount prologue if the -march=pentium-mmx is set and -mtune=generic is not set. This breaks the assumption of the function graph tracer which expects that the mcount prologue push %ebp mov %esp, %ebp is the first stack operation in a function because it needs to modify the function return address on the stack to trap into the tracer before returning to the real caller. The generated code is: push %edi lea 0x8(%esp),%edi and $0xfffffff0,%esp pushl -0x4(%edi) push %ebp mov %esp,%ebp so the tracer modifies the copy of the return address which is stored after the stack alignment and therefor does not trap the return which in turn breaks the call chain logic of the tracer and leads to a kernel panic. Aside of the fact that the generated code is horrible for no good reason other -march -mtune options generate the expected: push %ebp mov %esp,%ebp and $0xfffffff0,%esp which does the same and keeps everything intact. After some experimenting we found out that this problem is restricted to gcc4.4.x and to the following -march settings: i586, pentium, pentium-mmx, k6, k6-2, k6-3, winchip-c6, winchip2, c3, geode By adding -mtune=generic the code generator produces always the expected code. So forcing -mtune=generic when CONFIG_FUNCTION_GRAPH_TRACER=y is not pretty, but at the moment the only way to prevent that the kernel trips over gcc-shrooms induced code madness. Most distro kernels have CONFIG_X86_GENERIC=y anyway which forces -mtune=generic as well so it will not impact those. References: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109 http://lkml.org/lkml/2009/11/19/17 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.0911200206570.24119@localhost.localdomain> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com>, Cc: Jeff Law <law@redhat.com> Cc: gcc@gcc.gnu.org Cc: David Daney <ddaney@caviumnetworks.com> Cc: Andrew Haley <aph@redhat.com> Cc: Richard Guenther <richard.guenther@gmail.com> Cc: stable@kernel.org	2009-11-20 14:06:46 +01:00
Masami Hiramatsu	ce64c62074	x86: Instruction decoder test should generate build warning Since some instructions are not decoded correctly by older versions of objdump, it may cause false positive error in insn decoder posttest. This changes build error of insn decoder test to build warning. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091116230631.5250.41579.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-19 21:40:13 +01:00
David S. Miller	3505d1a9fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/sfc/sfe4001.c drivers/net/wireless/libertas/cmd.c drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/rtl8187se/Kconfig drivers/staging/rtl8192e/Kconfig	2009-11-18 22:19:03 -08:00
Jan Beulich	350f8f5631	x86: Eliminate redundant/contradicting cache line size config options Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT (with inconsistent defaults), just having the latter suffices as the former can be easily calculated from it. To be consistent, also change X86_INTERNODE_CACHE_BYTES to X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA to account for last level cache line size (which here matters more than L1 cache line size). Finally, make sure the default value for X86_L1_CACHE_SHIFT, when X86_GENERIC is selected, is being seen before that for the individual CPU model options (other than on x86-64, where GENERIC_CPU is part of the choice construct, X86_GENERIC is a separate option on ix86). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ravikiran Thirumalai <kiran@scalex86.org> Acked-by: Nick Piggin <npiggin@suse.de> LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-19 04:58:34 +01:00
Thomas Gleixner	070e5c3f99	x86: vmiclock: Fix printk format clockevents.mult became u32. Fix the printk format. Pointed-out-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-11-18 12:31:06 +01:00
Thomas Gleixner	6dbfe5a57d	x86: Fixup last users of irq_chip->typename The typename member of struct irq_chip was kept for migration purposes and is obsolete since more than 2 years. Fix up the leftovers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-11-18 11:45:29 +01:00
Rusty Russell	8dca15e408	[CPUFREQ] speedstep-ich: fix error caused by `394122ab14` "[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c" changed the code to mistakenly pass the current cpu as the "processor" argument of speedstep_get_frequency(), whereas it should be the type of the processor. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14340 Based on a patch by Dave Mueller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Dominik Brodowski <linux@brodo.de> Reported-by: Dave Mueller <dave.mueller@gmx.ch> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-17 23:15:04 -05:00
John Villalovos	293afe44d7	[CPUFREQ] acpi-cpufreq: blacklist Intel 0f68: Fix HT detection and put in notification message Removing the SMT/HT check, since the Errata doesn't mention Hyper-Threading. Adding in a printk, so that the user knows why acpi-cpufreq refuses to load. Also, once system is blacklisted, don't repeat checks to see if blacklisted. This also causes the message to only be printed once, rather than for each CPU. Signed-off-by: John L. Villalovos <john.l.villalovos@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-17 23:15:03 -05:00
Roel Kluin	c53614ec17	[CPUFREQ] powernow-k8: Fix test in get_transition_latency() Not makes it a bool before the comparison. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-17 23:15:03 -05:00
Krzysztof Helt	f7f3cad060	[CPUFREQ] longhaul: select Longhaul version 2 for capable CPUs There is a typo in the longhaul detection code so only Longhaul v1 or Longhaul v3 is selected. The Longhaul v2 is not selected even for CPUs which are capable of. Tested on PCChips Giga Pro board. Frequency changes work and the Longhaul v2 detects that the board is not capable of changing CPU voltage. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Dave Jones <davej@redhat.com>	2009-11-17 23:15:03 -05:00
Yinghai Lu	508d85c2c6	x86: When cleaning MTRRs, do not fold WP into UC The current MTRR code treats WP as a form of UC. This really isn't desirable behaviour, except possibly in the case of severe MTRR shortage. Disable this, to allow legitimate uses of WP to remain unmolested. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-17 10:26:53 -08:00
Lin Ming	0696b711e4	timekeeping: Fix clock_gettime vsyscall time warp Since commit `0a544198` "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-11-17 11:52:34 +01:00
Ingo Molnar	a7b63425a4	Merge branch 'perf/core' into perf/probes Resolved merge conflict in tools/perf/Makefile Merge reason: we want to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 10:17:47 +01:00
Andreas Herrmann	8cc2361bd0	x86: ucode-amd: Move family check to microcde_amd.c's init function ... to avoid useless trial to load firmware on systems with unsupported AMD CPUs. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Mike Travis <travis@sgi.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Mohr <andi@lisas.de> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091117070638.GA27691@alberich.amd.com> [ v2: changed BUG_ON() to WARN_ON() ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 10:10:40 +01:00
Eric W. Biederman	bb9074ff58	Merge commit 'v2.6.32-rc7' Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler gets a small bug fix and the sysctl tree where I am removing all sysctl strategy routines.	2009-11-17 01:01:34 -08:00
Ingo Molnar	123bf0e2ed	x86: gart: Clean up the code a bit Clean up various small stylistic details in the GART code. No functionality changed. Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: muli@il.ibm.com Cc: joerg.roedel@amd.com LKML-Reference: <1258287594-8777-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:57:00 +01:00
FUJITA Tomonori	1f7564ca83	x86: Calgary: Remove unnecessary DMA_ERROR_CODE usage This cleans up iommu_alloc() a bit and removes unnecessary DMA_ERROR_CODE usage. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: muli@il.ibm.com Cc: joerg.roedel@amd.com LKML-Reference: <1258287594-8777-4-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:53:21 +01:00
FUJITA Tomonori	8fd524b355	x86: Kill bad_dma_address variable This kills bad_dma_address variable, the old mechanism to enable IOMMU drivers to make dma_mapping_error() work in IOMMU's specific way. bad_dma_address variable was introduced to enable IOMMU drivers to make dma_mapping_error() work in IOMMU's specific way. However, it can't handle systems that use both swiotlb and HW IOMMU. SO we introduced dma_map_ops->mapping_error to solve that case. Intel VT-d, GART, and swiotlb already use dma_map_ops->mapping_error. Calgary, AMD IOMMU, and nommu use zero for an error dma address. This adds DMA_ERROR_CODE and converts them to use it (as SPARC and POWER does). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: muli@il.ibm.com Cc: joerg.roedel@amd.com LKML-Reference: <1258287594-8777-3-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:53:21 +01:00
FUJITA Tomonori	42109197eb	x86: gart: Add own dma_mapping_error function GART IOMMU is the only user of bad_dma_address variable. This patch converts GART to use the newer mechanism, fill in ->mapping_error() in struct dma_map_ops, to make dma_mapping_error() work in IOMMU specific way. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: muli@il.ibm.com Cc: joerg.roedel@amd.com LKML-Reference: <1258287594-8777-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:53:20 +01:00
Ingo Molnar	99f4c9de2b	Merge commit 'v2.6.32-rc7' into core/iommu Merge reason: Add fixes we'll depend on. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:51:07 +01:00
Masami Hiramatsu	35039eb6b1	x86: Show symbol name if insn decoder test failed Show symbol name if insn decoder test find a difference. This will help us to find out where the issue is. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091116230624.5250.49813.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:16:50 +01:00
Masami Hiramatsu	d65ff75fbe	x86: Add verbose option to insn decoder test Add verbose option to insn decoder test. This dumps decoded instruction when building kernel with V=1. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091116230618.5250.18762.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 07:16:48 +01:00
H. Peter Anvin	5bd085b5fb	x86: remove "extern" from function prototypes in <asm/proto.h> Function prototypes don't need "extern", and it is generally frowned upon to have them. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-16 13:55:31 -08:00
Kees Cook	4b0f3b81eb	x86, mm: Report state of NX protections during boot It is possible for x86_64 systems to lack the NX bit either due to the hardware lacking support or the BIOS having turned off the CPU capability, so NX status should be reported. Additionally, anyone booting NX-capable CPUs in 32bit mode without PAE will lack NX functionality, so this change provides feedback for that case as well. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>	2009-11-16 13:44:59 -08:00
H. Peter Anvin	4763ed4d45	x86, mm: Clean up and simplify NX enablement The 32- and 64-bit code used very different mechanisms for enabling NX, but even the 32-bit code was enabling NX in head_32.S if it is available. Furthermore, we had a bewildering collection of tests for the available of NX. This patch: a) merges the 32-bit set_nx() and the 64-bit check_efer() function into a single x86_configure_nx() function. EFER control is left to the head code. b) eliminates the nx_enabled variable entirely. Things that need to test for NX enablement can verify __supported_pte_mask directly, and cpu_has_nx gives the supported status of NX. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Chris Wright <chrisw@sous-sol.org> LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:59 -08:00
H. Peter Anvin	583140afb9	x86, pageattr: Make set_memory_(x\|nx) aware of NX support Make set_memory_x/set_memory_nx directly aware of if NX is supported in the system or not, rather than requiring that every caller assesses that support independently. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Starling <tstarling@wikimedia.org> Cc: Hannes Eder <hannes@hanneseder.net> LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:58 -08:00
H. Peter Anvin	a7c4c0d934	x86, sleep: Always save the value of EFER Always save the value of EFER, regardless of the state of NX. Since EFER may not actually exist, use rdmsr_safe() to do so. v2: check the return value from rdmsr_safe() instead of relying on the output values being unchanged on error. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@tuxonice.net> LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:57 -08:00
H. Peter Anvin	8a50e5135a	x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX Use symbolic constants rather than hard-coded values when setting EFER.NX in head_32.S, and do a more rigorous test for the validity of the response when probing for the extended CPUID range. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:56 -08:00
Cyrill Gorcunov	e79c65a97c	x86: io-apic: IO-APIC MMIO should not fail on resource insertion If IO-APIC base address is 1K aligned we should not fail on resourse insertion procedure. For this sake we define IO_APIC_SLOT_SIZE constant which should cover all IO-APIC direct accessible registers. An example of a such configuration is there http://marc.info/?l=linux-kernel&m=118114792006520 \| \| Quoting the message \| \| IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 \| IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47 \| IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71 \| IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95 \| IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119 \| Reported-by: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20091116151426.GC5653@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-16 16:37:10 +01:00
Frederic Weisbecker	3c93ca00ee	x86: Add missing might_fault() checks to copy_{to,from}_user() On x86-64, copy_[to\|from]_user() rely on assembly routines that never call might_fault(), making us missing various lockdep checks. This doesn't apply to __copy_from,to_user() that explicitly handle these calls, neither is it a problem in x86-32 where copy_to,from_user() rely on the "__" prefixed versions that also call might_fault(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1258382538-30979-1-git-send-email-fweisbec@gmail.com> [ v2: fix module export ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-16 16:09:52 +01:00
Prarit Bhargava	303fc0870f	x86: AMD Northbridge: Verify NB's node is online Fix panic seen on some IBM and HP systems on 2.6.32-rc6: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c [...] [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5 [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9 [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89 [<ffffffff812b9b4f>] driver_attach+0x19/0x1b [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d [<ffffffff812ba1e7>] driver_register+0x98/0x109 [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3 [<ffffffff81072776>] ? up_read+0x26/0x2a [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp] [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp] [<ffffffff8100a073>] do_one_initcall+0x6d/0x185 [<ffffffff8108d765>] sys_init_module+0xd3/0x236 [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b I put in a printk and commented out the set_dev_node() call when and got this output: quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3 I.e. the issue appears to be that the HW has set val to a valid value, however, the system is only configured for a single node -- 0, the others are offline. Check to see if the node is actually online before setting the numa node for an AMD northbridge in quirk_amd_nb_node(). Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: bhavna.sarathy@amd.com Cc: jbarnes@virtuousgeek.org Cc: andreas.herrmann3@amd.com LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com> [ v2: clean up the code and add comments ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-16 15:43:05 +01:00
Thomas Gleixner	411462f62a	x86: Fix printk format due to variable type change clockevents.mult became u32. Fix the printk format. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-11-16 11:55:20 +01:00
Hiroshi Shimamoto	62ad33f670	x86: Don't put iommu_shutdown_noop() in init section It causes kernel panic on shutdown or reboot. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <4B00BC8E.50801@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-16 08:58:51 +01:00
Thomas Gleixner	dc186ad741	workqueue: Add debugobjects support Add debugobject support to track the life time of work_structs. While at it, remove duplicate definition of INIT_DELAYED_WORK_ON_STACK(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-11-16 01:09:48 +09:00
Ingo Molnar	39dc78b651	Merge commit 'v2.6.32-rc7' into perf/core Merge reason: pick up perf fixlets Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:50:41 +01:00
Jan Beulich	1472248583	x86-64: __copy_from_user_inatomic() adjustments This v2.6.26 commit: `ad2fc2c`: x86: fix copy_user on x86 rendered __copy_from_user_inatomic() identical to copy_user_generic(), yet didn't make the former just call the latter from an inline function. Furthermore, this v2.6.19 commit: `b885808`: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic converted the return type of __copy_to_user_inatomic() from unsigned long to int, but didn't do the same to __copy_from_user_inatomic(). Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: <v.mayatskih@gmail.com> LKML-Reference: <4AFD5778020000780001F8F4@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:29:47 +01:00
FUJITA Tomonori	f4131c6259	x86: Make calgary_iommu_init() static This makes calgary_iommu_init() static and moves it to remove the forward declaration. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: muli@il.ibm.com LKML-Reference: <20091114212603U.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:04:14 +01:00
FUJITA Tomonori	6959450e56	swiotlb: Remove duplicate swiotlb_force extern declarations Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: tony.luck@intel.com LKML-Reference: <1258199198-16657-4-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:03:10 +01:00
FUJITA Tomonori	94a15564ac	x86: Move iommu_shutdown_noop to x86_init.c iommu_init_noop() is in arch/x86/kernel/x86_init.c but iommu_shutdown_noop() in arch/x86/include/asm/iommu.h. This moves iommu_shutdown_noop() to x86_init.c for consistency. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1258199198-16657-3-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:03:10 +01:00
FUJITA Tomonori	a3b28ee109	x86: Set dma_ops to nommu_dma_ops by default We set dma_ops to nommu_dma_ops at two different places for x86_32 and x86_64. This unifies them by setting dma_ops to nommu_dma_ops by default. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1258199198-16657-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:03:09 +01:00
Ingo Molnar	68efa37df7	hw-breakpoints, x86: Fix modular KVM build This build error: arch/x86/kvm/x86.c:3655: error: implicit declaration of function 'hw_breakpoint_restore' Happens because in the CONFIG_KVM=m case there's no 'CONFIG_KVM' define in the kernel - it's CONFIG_KVM_MODULE in that case. Make the prototype available unconditionally. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> LKML-Reference: <1258114575-32655-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-14 15:32:53 +01:00
Ingo Molnar	31c997cac7	x86: Fix cpu_devs[] initialization in early_cpu_init() Yinghai Lu noticed that this commit: `0388423`: x86: Minimise printk spew from per-vendor init code mistakenly left out the initialization of cpu_devs[] in the !PROCESSOR_SELECT case. Fix it. Reported-by: Yinghai Lu <yinghai@kernel.org> Cc: Dave Jones <davej@redhat.com> LKML-Reference: <20091113203000.GA19160@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-14 10:36:50 +01:00
Roland Dreier	b01c845f0f	x86: Remove CPU cache size output for non-Intel too As Dave Jones said about the output in intel_cacheinfo.c: "They aren't useful, and pollute the dmesg output a lot (especially on machines with many cores). Also the same information can be trivially found out from userspace." Give the generic display_cacheinfo() function the same treatment. Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Dave Jones <davej@redhat.com> Cc: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <adaocn6dp99.fsf_-_@roland-alpha.cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-14 01:51:18 +01:00
Dave Jones	0388423dba	x86: Minimise printk spew from per-vendor init code In the default case where the kernel supports all CPU vendors, we currently print out a bunch of not useful messages on every system. 32-bit: KERNEL supported cpus: Intel GenuineIntel AMD AuthenticAMD NSC Geode by NSC Cyrix CyrixInstead Centaur CentaurHauls Transmeta GenuineTMx86 Transmeta TransmetaCPU UMC UMC UMC UMC 64-bit: KERNEL supported cpus: Intel GenuineIntel AMD AuthenticAMD Centaur CentaurHauls Given that "what CPUs does the kernel support" isn't useful for the "support everything" case, we can suppress these printk's. Signed-off-by: Dave Jones <davej@redhat.com> LKML-Reference: <20091113203000.GA19160@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-14 01:18:05 +01:00
Matthew Garrett	d9b263528e	x86, setup: Store the boot cursor state Add a field to store the boot cursor state and implement this for VGA on x86. This can then be used to set the default policy for the boot console. Signed-off-by: Matthew Garrett <mjg@redhat.com> LKML-Reference: <1258142222-16092-1-git-send-email-mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-13 14:23:11 -08:00
Dave Jones	15cd8812ab	x86: Remove the CPU cache size printk's They aren't really useful, and they pollute the dmesg output a lot (especially on machines with many cores). Also the same information can be trivially found out from userspace. Reported-by: Mike Travis <travis@sgi.com> Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091112231542.GA7129@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-13 09:14:55 +01:00
Eric W. Biederman	24a065624d	sysctl x86: Remove dead binary sysctl support Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name and .strategy members of sysctl tables are dead code. Remove them. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-12 02:05:04 -08:00
Hiroshi Shimamoto	db48cccc7c	perf_event, x86: Annotate init functions and data Annotate init functions and data with __init and __initconst. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@gmail.com> LKML-Reference: <4AFB721E.8070203@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-12 09:18:36 +01:00
Hidetoshi Seto	cffd377e58	x86, mce: Fix __init annotations The intel_init_thermal() is called from resume path, so it cannot be marked as __init. OTOH mce_banks_init() is only called from __mcheck_cpu_cap_init() which is marked as __cpuinit, so it can be also marked as __cpuinit. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Yong Wang <yong.y.wang@linux.intel.com> LKML-Reference: <4AFBB0B8.2070501@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-12 09:17:11 +01:00
Linus Torvalds	55871bdd03	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86/PCI: Adjust GFP mask handling for coherent allocations PCI ASPM: fix oops on root port removal	2009-11-11 11:34:14 -08:00
Linus Torvalds	605f37504f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd-ucode: Check UCODE_MAGIC before loading the container file x86: Fix error return sequence in __ioremap_caller() x86: Add Phoenix/MSC BIOSes to lowmem corruption list	2009-11-11 11:29:10 -08:00
Yinghai Lu	196cf0d67a	x86: Make sure wakeup trampoline code is below 1MB Instead of using bootmem, try find_e820_area()/reserve_early(), and call acpi_reserve_memory() early, to allocate the wakeup trampoline code area below 1M. This is more reliable, and it also removes a dependency on bootmem. -v2: change function name to acpi_reserve_wakeup_memory(), as suggested by Rafael. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: pm list <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4AFA210B.3020207@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-11 20:14:32 +01:00
Andreas Herrmann	9f15226e75	x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle When switching a CPU offline/online and then doing suspend/resume, ucode is not updated on this CPU. This is due to the microcode_fini_cpu() call which frees uci->mc when setting the CPU offline: static void microcode_fini_cpu_amd(int cpu) { struct ucode_cpu_info uci = ucode_cpu_info + cpu; vfree(uci->mc); uci->mc = NULL; } When the CPU is set online uci->mc is still NULL because no ucode update is required. Finally this prevents ucode update when resuming after suspend: static enum ucode_state microcode_resume_cpu(int cpu) { struct ucode_cpu_info uci = ucode_cpu_info + cpu; if (!uci->mc) return UCODE_NFOUND; ... } Fix is to check whether uci->mc is valid before microcode_resume_cpu() is called. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: dimm <dmitry.adamushko@gmail.com> LKML-Reference: <20091111190329.GF18592@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-11 20:06:54 +01:00
FUJITA Tomonori	b18485e7ac	swiotlb: Remove the swiotlb variable usage POWERPC doesn't expect it to be used. This fixes the linux-next build failure reported by Stephen Rothwell: lib/swiotlb.c: In function 'setup_io_tlb_npages': lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function) Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: peterz@infradead.org LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-11 16:51:18 +01:00
Yong Wang	ce6b5d768c	x86: Mark the thermal init functions __init Mark the thermal init functions __init so that the init memory can be freed. Signed-off-by: Yong Wang <yong.y.wang@intel.com> LKML-Reference: <20091111075125.GA17900@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-11 12:33:32 +01:00
Randy Dunlap	e84446de5c	x86 VSDO: Fix Kconfig help COMPAT_VDSO has 2 help text blocks, but kconfig only uses the last one found, so merge the 2 blocks. It would be real nice if kconfig would warn about this. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <4AF9FB6C.70003@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-11 07:26:41 +01:00
Dimitri Sivanich	200a9ae280	x86: Remove asm/apicnum.h arch/x86/include/asm/apicnum.h is not referenced anywhere anymore. Its definitions appear in apicdef.h. Remove it. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Mike Travis <travis@sgi.com> LKML-Reference: <20091110195835.GA4393@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 22:07:35 +01:00
Dave Jones	e02e0e1a13	x86: Fix typo in Intel CPU cache size descriptor I double-checked the datasheet. One of the existing descriptors has a typo: it should be 2MB not 2038 KB. Signed-off-by: Dave Jones <davej@redhat.com> Cc: <stable@kernel.org> # .3x.x: `85160b9`: x86: Add new Intel CPU cache size descriptors Cc: <stable@kernel.org> # .3x.x LKML-Reference: <20091110200120.GA27090@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 21:52:32 +01:00
Dave Jones	85160b92fb	x86: Add new Intel CPU cache size descriptors The latest rev of Intel doc AP-485 details new cache descriptors that we don't yet support. 12MB, 18MB and 24MB 24-way assoc L3 caches. Signed-off-by: Dave Jones <davej@redhat.com> LKML-Reference: <20091110184924.GA20337@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 20:06:16 +01:00
Ingo Molnar	b4941a9a60	x86: Add iommu_init to x86_init_ops, fix build Most of the time x86_init.h is included in pci-dma.c - but not always, leading to this rare build failure: arch/x86/kernel/pci-dma.c:296: error: 'x86_init' undeclared (first use in this function) So include asm/x86_init.h explicitly. Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 14:37:58 +01:00
FUJITA Tomonori	72d03802b8	x86, 32-bit: Fix swiotlb boot crash Ingo Molnar reported this boot crash: [ 8.655620] pata_amd 0000:00:06.0: version 0.4.1 [ 8.660286] BUG: unable to handle kernel NULL pointer dereference at 00000034 [ 8.663572] IP: [<c100617b>] dma_supported+0x3b/0xa4 [ 8.663572] *pde = 00000000 Initialize dma_ops properly in the 32-bit case. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 14:11:32 +01:00
FUJITA Tomonori	75f1cdf1dd	x86: Handle HW IOMMU initialization failure gracefully If HW IOMMU initialization fails (Intel VT-d often does this, typically due to BIOS bugs), we fall back to nommu. It doesn't work for the majority since nowadays we have more than 4GB memory so we must use swiotlb instead of nommu. The problem is that it's too late to initialize swiotlb when HW IOMMU initialization fails. We need to allocate swiotlb memory earlier from bootmem allocator. Chris explained the issue in detail: http://marc.info/?l=linux-kernel&m=125657444317079&w=2 The current x86 IOMMU initialization sequence is too complicated and handling the above issue makes it more hacky. This patch changes x86 IOMMU initialization sequence to handle the above issue cleanly. The new x86 IOMMU initialization sequence are: 1. we initialize the swiotlb (and setting swiotlb to 1) in the case of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by the boot option, we finish here. 2. we call the detection functions of all the IOMMUs 3. the detection function sets x86_init.iommu.iommu_init to the IOMMU initialization function (so we can avoid calling the initialization functions of all the IOMMUs needlessly). 4. if the IOMMU initialization function doesn't need to swiotlb then sets swiotlb to zero (e.g. the initialization is sucessful). 5. if we find that swiotlb is set to zero, we free swiotlb resource. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:32:07 +01:00
FUJITA Tomonori	ad32e8cb86	swiotlb: Defer swiotlb init printing, export swiotlb_print_info() This enables us to avoid printing swiotlb memory info when we initialize swiotlb. After swiotlb initialization, we could find that we don't need swiotlb. This patch removes the code to print swiotlb memory info in swiotlb_init() and exports the function to do that. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com Cc: tony.luck@intel.com Cc: benh@kernel.crashing.org LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp> [ -v2: merge up conflict ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:32:00 +01:00
FUJITA Tomonori	9d5ce73a64	x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook This changes detect_intel_iommu() to set intel_iommu_init() to iommu_init hook if detect_intel_iommu() finds the IOMMU. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp> [ -v2: build fix for the !CONFIG_DMAR case ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:31:36 +01:00
FUJITA Tomonori	ea1b0d3945	x86: amd_iommu: Convert amd_iommu_detect() to use iommu_init hook This changes amd_iommu_detect() to set amd_iommu_init to iommu_init hook if amd_iommu_detect() finds the AMD IOMMU. We can kill the code to check if we found the IOMMU in amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init() only when it found the IOMMU. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-5-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:31:30 +01:00
FUJITA Tomonori	de957628ce	x86: GART: Convert gart_iommu_hole_init() to use iommu_init hook This changes gart_iommu_hole_init() to set gart_iommu_init() to iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU. We can kill the code to check if we found the IOMMU in gart_iommu_init() since gart_iommu_hole_init() sets gart_iommu_init() only when it found the IOMMU. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-4-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:31:23 +01:00
FUJITA Tomonori	d7b9f7be21	x86: Calgary: Convert detect_calgary() to use iommu_init hook This changes detect_calgary() to set init_calgary() to iommu_init hook if detect_calgary() finds the Calgary IOMMU. We can kill the code to check if we found the IOMMU in init_calgary() since detect_calgary() sets init_calgary() only when it found the IOMMU. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com LKML-Reference: <1257849980-22640-3-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:31:15 +01:00
FUJITA Tomonori	d07c1be069	x86: Add iommu_init to x86_init_ops We call the detections functions of all the IOMMUs then all their initialization functions. The latter is pointless since we don't detect multiple different IOMMUs. What we need to do is calling the initialization function of the detected IOMMU. This adds iommu_init hook to x86_init_ops so if an IOMMU detection function can set its initialization function to the hook. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: chrisw@sous-sol.org Cc: dwmw2@infradead.org Cc: joerg.roedel@amd.com Cc: muli@il.ibm.com LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:31:07 +01:00
Andreas Herrmann	1a74357066	x86: ucode-amd: Convert printk(KERN_...) to pr_(...) Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: dimm <dmitry.adamushko@gmail.com> LKML-Reference: <20091110110920.GJ30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:15:50 +01:00
Andreas Herrmann	14c569425a	x86: ucode-amd: Don't warn when no ucode is available for a CPU revision There is no point in warning when there is no ucode available for a specific CPU revision. Currently the container-file, which provides the AMD ucode patches for OS load, contains only a few ucode patches. It's already clearly indicated by the printed patch_level whenever new ucode was available and an update happened. So the warning message is of no help but rather annoying on systems with many CPUs. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: dimm <dmitry.adamushko@gmail.com> LKML-Reference: <20091110110825.GI30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:15:49 +01:00
Andreas Herrmann	d1c84f79a6	x86: ucode-amd: Load ucode-patches once and not separately of each CPU This also implies that corresponding log messages, e.g. platform microcode: firmware: requesting amd-ucode/microcode_amd.bin show up only once on module load and not when ucode is updated for each CPU. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: dimm <dmitry.adamushko@gmail.com> LKML-Reference: <20091110110723.GH30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 12:15:48 +01:00
Frederic Weisbecker	59d8eb53ea	hw-breakpoints: Wrap in the KVM breakpoint active state check Wrap in the cpu dr7 check that tells if we have active breakpoints that need to be restored in the cpu. This wrapper makes the check more self-explainable and also reusable for any further other uses. Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>	2009-11-10 11:23:43 +01:00
Frederic Weisbecker	9f6b3c2c30	hw-breakpoints: Fix broken a.out format dump Fix the broken a.out format dump. For now we only dump the ptrace breakpoints. TODO: Dump every perf breakpoints for the current thread, not only ptrace based ones. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>	2009-11-10 11:23:05 +01:00
Xiaotian Feng	2fb8f4e6a8	x86: pat: Remove ioremap_default() Commit: `b6ff32d`: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated reserve_memtype() and pat_x_mtrr_type, this made ioremap_default() same as ioremap_cache(). Remove the redundant function and change the only caller to use ioremap_cache. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257845005-7938-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 10:34:05 +01:00
Xiaotian Feng	83ea05ea69	x86: pat: Clean up req_type special case for reserve_memtype() Commit: `b6ff32d`: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated code in pat_x_mtrr_type() and reserve_memtype(), which removed the special case (req_type is -1) for the PAT-enabled part. We should also change comments and the PAT-disabled part. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 10:34:04 +01:00
Joe Perches	41855b7754	x86: GART: pci-gart_64.c: Use correct length in strncmp Signed-off-by: Joe Perches <joe@perches.com> Cc: <stable@kernel.org> # .3x.x LKML-Reference: <1257818330.12852.72.camel@Joe-Laptop.home> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 06:05:39 +01:00
Yong Wang	a2202aa292	x86: Under BIOS control, restore AP's APIC_LVTTHMR to the BSP value On platforms where the BIOS handles the thermal monitor interrupt, APIC_LVTTHMR on each logical CPU is programmed to generate a SMI and OS must not touch it. Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all the LVT entries except the mask bit. Essentially this results in all LVT entries including the thermal monitoring interrupt set to masked (clearing the bios programmed value for APIC_LVTTHMR). And this leads to kernel take over the thermal monitoring interrupt on AP's but not on BSP (leaving the bios programmed value only on BSP). As a result of this, we have seen system hangs when the thermal monitoring interrupt is generated. Fix this by reading the initial value of thermal LVT entry on BSP and if bios has taken over the control, then program the same value on all AP's and leave the thermal monitoring interrupt control on all the logical cpu's to the bios. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org	2009-11-10 05:57:55 +01:00
Cyrill Gorcunov	7abc075313	x86: apic: Do not use stacked physid_mask_t We should not use physid_mask_t as a stack based variable in apic code. This type depends on MAX_APICS parameter which may be huge enough. Especially it became a problem with apic NOOP driver which is portable between 32 bit and 64 bit environment (where we have really huge MAX_APICS). So apic driver should operate with pointers and a caller in turn should aware of allocation physid_mask_t variable. As a side (but positive) effect -- we may use already implemented physid_set_mask_of_physid function eliminating default_apicid_to_cpu_present completely. Note that physids_coerce and physids_promote turned into static inline from macro (since macro hides the fact that parameter is being interpreted as unsigned long, make it explicit). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091109220659.GA5568@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 05:52:07 +01:00
Andreas Herrmann	6e18da75c2	x86, amd-ucode: Remove needless log messages Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091029134742.GD30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 05:49:28 +01:00
Borislav Petkov	506f90eeae	x86, amd-ucode: Check UCODE_MAGIC before loading the container file Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20091029134552.GC30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 05:46:09 +01:00
Huang Ying	fd650a6394	x86: Generate .byte code for some new instructions via gas macro It will take some time for binutils (gas) to support some newly added instructions, such as SSE4.1 instructions or the AES-NI instructions found in upcoming Intel CPU. To make the source code can be compiled by old binutils, .byte code is used instead of the assembly instruction. But the readability and flexibility of raw .byte code is not good. This patch solves the issue of raw .byte code via generating it via assembly instruction like gas macro. The syntax is as close as possible to real assembly instruction. Some helper macros such as MODRM is not a full feature implementation. It can be extended when necessary. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2009-11-09 13:52:26 -05:00
Cyrill Gorcunov	f4a70c5537	x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit In fact it's never get used on x86-64 (for 64 bit platform we use differ technique to enumerate io-units). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20091108131645.GD5300@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 19:46:17 +01:00
Cyrill Gorcunov	4343fe1024	x86, ioapic: Use snrpintf while set names for IO-APIC resourses We should be ready that one day MAX_IO_APICS may raise its number. To prevent memory overwrite we're to use safe snprintf while set IO-APIC resourse name. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20091108155431.GC25940@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 17:06:23 +01:00
Cyrill Gorcunov	46dc281b1b	x86, apic: Use PAGE_SIZE instead of numbers The whole page is reserved for IO-APIC fixmap due to non-cacheable requirement. So lets note this explicitly instead of playing with numbers. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Maciej W. Rozycki <macro@linux-mips.org> LKML-Reference: <20091108155356.GB25940@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 17:06:22 +01:00
Jan Beulich	eb647138ac	x86/PCI: Adjust GFP mask handling for coherent allocations Rather than forcing GFP flags and DMA mask to be inconsistent, GFP flags should be determined even for the fallback device through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags(). This restores 64-bit behavior as it was prior to commits `8965eb1938` and `4a367f3a9d` (not sure why there are two of them), where GFP_DMA was forced on for 32-bit, but not for 64-bit, with the slight adjustment that afaict even 32-bit doesn't need this without CONFIG_ISA. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Takashi Iwai <tiwai@suse.de> LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-08 07:44:30 -08:00
Frederic Weisbecker	24f1e32c60	hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events This patch rebase the implementation of the breakpoints API on top of perf events instances. Each breakpoints are now perf events that handle the register scheduling, thread/cpu attachment, etc.. The new layering is now made as follows: ptrace kgdb ftrace perf syscall \ \| / / \ \| / / / Core breakpoint API / / \| / \| / Breakpoints perf events \| \| Breakpoints PMU ---- Debug Register constraints handling (Part of core breakpoint API) \| \| Hardware debug registers Reasons of this rewrite: - Use the centralized/optimized pmu registers scheduling, implying an easier arch integration - More powerful register handling: perf attributes (pinned/flexible events, exclusive/non-exclusive, tunable period, etc...) Impact: - New perf ABI: the hardware breakpoints counters - Ptrace breakpoints setting remains tricky and still needs some per thread breakpoints references. Todo (in the order): - Support breakpoints perf counter events for perf tools (ie: implement perf_bpcounter_event()) - Support from perf tools Changes in v2: - Follow the perf "event " rename - The ptrace regression have been fixed (ptrace breakpoint perf events weren't released when a task ended) - Drop the struct hw_breakpoint and store generic fields in perf_event_attr. - Separate core and arch specific headers, drop asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h - Use new generic len/type for breakpoint - Handle off case: when breakpoints api is not supported by an arch Changes in v3: - Fix broken CONFIG_KVM, we need to propagate the breakpoint api changes to kvm when we exit the guest and restore the bp registers to the host. Changes in v4: - Drop the hw_breakpoint_restore() stub as it is only used by KVM - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a module - Restore the breakpoints unconditionally on kvm guest exit: TIF_DEBUG_THREAD doesn't anymore cover every cases of running breakpoints and vcpu->arch.switch_db_regs might not always be set when the guest used debug registers. (Waiting for a reliable optimization) Changes in v5: - Split-up the asm-generic/hw-breakpoint.h moving to linux/hw_breakpoint.h into a separate patch - Optimize the breakpoints restoring while switching from kvm guest to host. We only want to restore the state if we have active breakpoints to the host, otherwise we don't care about messed-up address registers. - Add asm/hw_breakpoint.h to Kbuild - Fix bad breakpoint type in trace_selftest.c Changes in v6: - Fix wrong header inclusion in trace.h (triggered a build error with CONFIG_FTRACE_SELFTEST Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jan Kiszka <jan.kiszka@web.de> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Avi Kivity <avi@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org>	2009-11-08 15:34:42 +01:00
Tejun Heo	2ae8bb75db	x86: Fix iommu=nodac parameter handling iommu=nodac should forbid dac instead of enabling it. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Matteo Frigo <athena@fftw.org> Cc: <stable@kernel.org> # .32.x and older LKML-Reference: <4AE5B52A.4050408@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 13:19:05 +01:00
FUJITA Tomonori	338bac527e	x86: Use x86_platform for iommu_shutdown This patch cleans up pci_iommu_shutdown() a bit to use x86_platform (similar to how IA64 initializes an IOMMU driver). This adds iommu_shutdown() to x86_platform to avoid calling every IOMMUs' shutdown functions in pci_iommu_shutdown() in order. The IOMMU shutdown functions are platform specific (we don't have multiple different IOMMU hardware) so the current way is pointless. An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown function if necessary. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: joerg.roedel@amd.com LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 13:12:26 +01:00
Randy Dunlap	0420101c07	x86: k8.h: Add struct bootnode k8.h uses struct bootnode but does not #include a header file for it, so provide a simple declaration for it. arch/x86/include/asm/k8.h:13: warning: 'struct bootnode' declared inside parameter list arch/x86/include/asm/k8.h:13: warning: its scope is only this definition or declaration, which is probably not what you want Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 12:55:38 +01:00
Xiaotian Feng	de2a47cf2b	x86: Fix error return sequence in __ioremap_caller() kernel missed to free memtype if get_vm_area_caller failed in __ioremap_caller. This patch introduces error path to fix this and cleans up the repetitive error return sequences that contributed to the creation of the bug. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1257389031-20429-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 12:48:58 +01:00
Rusty Russell	0d0fbbddcc	x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t This makes the declarations match the definitions, which already use 'struct cpumask'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <200911052245.41803.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 11:58:38 +01:00
Masami Hiramatsu	c12a229bc5	x86: Remove unused thread_return label from switch_to() Remove unused thread_return label from switch_to() macro on x86-64. Since this symbol cuts into schedule(), backtrace at the latter half of schedule() was always shown as thread_return(). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> LKML-Reference: <20091105160359.5181.26225.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 11:57:13 +01:00
Simon Kagstrom	f1b291d4c4	x86: Add Phoenix/MSC BIOSes to lowmem corruption list We have a board with a Phoenix/MSC BIOS which also corrupts the low 64KB of RAM, so add an entry to the table. Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net> LKML-Reference: <20091106154404.002648d9@marrow.netinsight.se> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-11-06 14:49:39 -08:00
Bjorn Helgaas	ea7f1b6ee9	x86/PCI: remove 64-bit division The roundup() caused a build error (undefined reference to `__udivdi3'). We're aligning to power-of-two boundaries, so it's simpler to just use ALIGN() anyway, which avoids the division. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-11-06 13:59:34 -08:00
Eric W. Biederman	c3359fbce4	sysctl: x86 Use the compat_sys_sysctl Now that we have a generic 32bit compatibility implementation there is no need for x86 to implement it's own. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-06 03:53:58 -08:00
Frederic Weisbecker	2da3e160cb	hw-breakpoint: Move asm-generic/hw_breakpoint.h to linux/hw_breakpoint.h We plan to make the breakpoints parameters generic among architectures. For that it's better to move the asm-generic header to a generic linux header. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-11-05 23:48:01 +01:00

... 4 5 6 7 8 ...

9710 commits