linux

mirror of https://github.com/torvalds/linux synced 2024-10-23 11:46:42 +00:00

Author	SHA1	Message	Date
Anton Blanchard	8aa6d35929	powerpc: Move kdump default base address to half RMO size on 64bit We are seeing boot failures on some very large boxes even with commit `b5416ca9f8` (powerpc: Move kdump default base address to 64MB on 64bit). This patch halves the RMO so both kernels get about the same amount of RMO memory. On large machines this region will be at least 256MB, so each kernel will get 128MB. We cap it at 256MB (small SLB size) since some early allocations need to be in the bolted SLB region. We could relax this on machines with 1TB SLBs in a future patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-08-05 14:47:56 +10:00
David Ahern	b59a1bfcc2	powerpc/perf: Disable pagefaults during callchain stack read Panic observed on an older kernel when collecting call chains for the context-switch software event: [<b0180e00>]rb_erase+0x1b4/0x3e8 [<b00430f4>]__dequeue_entity+0x50/0xe8 [<b0043304>]set_next_entity+0x178/0x1bc [<b0043440>]pick_next_task_fair+0xb0/0x118 [<b02ada80>]schedule+0x500/0x614 [<b02afaa8>]rwsem_down_failed_common+0xf0/0x264 [<b02afca0>]rwsem_down_read_failed+0x34/0x54 [<b02aed4c>]down_read+0x3c/0x54 [<b0023b58>]do_page_fault+0x114/0x5e8 [<b001e350>]handle_page_fault+0xc/0x80 [<b0022dec>]perf_callchain+0x224/0x31c [<b009ba70>]perf_prepare_sample+0x240/0x2fc [<b009d760>]__perf_event_overflow+0x280/0x398 [<b009d914>]perf_swevent_overflow+0x9c/0x10c [<b009db54>]perf_swevent_ctx_event+0x1d0/0x230 [<b009dc38>]do_perf_sw_event+0x84/0xe4 [<b009dde8>]perf_sw_event_context_switch+0x150/0x1b4 [<b009de90>]perf_event_task_sched_out+0x44/0x2d4 [<b02ad840>]schedule+0x2c0/0x614 [<b0047dc0>]__cond_resched+0x34/0x90 [<b02adcc8>]_cond_resched+0x4c/0x68 [<b00bccf8>]move_page_tables+0xb0/0x418 [<b00d7ee0>]setup_arg_pages+0x184/0x2a0 [<b0110914>]load_elf_binary+0x394/0x1208 [<b00d6e28>]search_binary_handler+0xe0/0x2c4 [<b00d834c>]do_execve+0x1bc/0x268 [<b0015394>]sys_execve+0x84/0xc8 [<b001df10>]ret_from_syscall+0x0/0x3c A page fault occurred walking the callchain while creating a perf sample for the context-switch event. To handle the page fault the mmap_sem is needed, but it is currently held by setup_arg_pages. (setup_arg_pages calls shift_arg_pages with the mmap_sem held. shift_arg_pages then calls move_page_tables which has a cond_resched at the top of its for loop - hitting that cond_resched is what caused the context switch.) This is an extension of Anton's proposed patch: https://lkml.org/lkml/2011/7/24/151 adding case for 32-bit ppc. Tested on the system that first generated the panic and then again with latest kernel using a PPC VM. I am not able to test the 64-bit path - I do not have H/W for it and 64-bit PPC VMs (qemu on Intel) is horribly slow. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-08-05 14:47:56 +10:00
Anton Blanchard	fbafd72815	powerpc: Clean up some panic messages in prom_init Add a newline to the panic messages in make_room. Also fix a comment that suggested our chunk size is 4Mb. It's 1MB. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-08-05 14:47:55 +10:00
Anton Blanchard	966728dd88	powerpc: Fix device tree claim code I have a box that fails in OF during boot with: DEFAULT CATCH!, exception-handler=fff00400 at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002 ie "IBM,Logh". OF got corrupted with a device tree string. Looking at make_room and alloc_up, we claim the first chunk (1 MB) but we never claim any more. mem_end is always set to alloc_top which is the top of our available address space, guaranteeing we will never call alloc_up and claim more memory. Also alloc_up wasn't setting alloc_bottom to the bottom of the available address space. This doesn't help the box to boot, but we at least fail with an obvious error. We could relocate the device tree in a future patch. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-08-05 14:47:54 +10:00
Scott Wood	26ee97672e	powerpc: Return the_cpu_ spec from identify_cpu Commit `af9eef3c7b` caused cpu_setup to see the_cpu_spec, rather than the source struct. However, on 32-bit, the return value of identify_cpu was being used for feature fixups, and identify_cpu was returning the source struct. So if cpu_setup patches the feature bits, the update won't affect the fixups. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-08-05 14:47:54 +10:00
Linus Torvalds	3960ef326a	Merge branch 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: ARM: Consolidate the clkdev header files ARM: set vga memory base at run-time ARM: convert PCI defines to variables ARM: pci: make pcibios_assign_all_busses use pci_has_flag ARM: remove unnecessary mach/hardware.h includes pci: move microblaze and powerpc pci flag functions into asm-generic powerpc: rename ppc_pci__flags to pci__flags Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h	2011-07-26 17:12:10 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Linus Torvalds	184475029a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits) drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c powerpc/85xx: fix mpic configuration in CAMP mode powerpc: Copy back TIF flags on return from softirq stack powerpc/64: Make server perfmon only built on ppc64 server devices powerpc/pseries: Fix hvc_vio.c build due to recent changes powerpc: Exporting boot_cpuid_phys powerpc: Add CFAR to oops output hvc_console: Add kdb support powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon powerpc/irq: Quieten irq mapping printks powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs powerpc: Add mpt2sas driver to pseries and ppc64 defconfig powerpc: Disable IRQs off tracer in ppc64 defconfig powerpc: Sync pseries and ppc64 defconfigs powerpc/pseries/hvconsole: Fix dropped console output hvc_console: Improve tty/console put_chars handling powerpc/kdump: Fix timeout in crash_kexec_wait_realmode powerpc/mm: Fix output of total_ram. powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards powerpc: Correct annotations of pmu registration functions ... Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and drivers/cpufreq	2011-07-25 22:59:39 -07:00
Linus Torvalds	45b583b10a	Merge 'akpm' patch series * Merge akpm patch series: (122 commits) drivers/connector/cn_proc.c: remove unused local Documentation/SubmitChecklist: add RCU debug config options reiserfs: use hweight_long() reiserfs: use proper little-endian bitops pnpacpi: register disabled resources drivers/rtc/rtc-tegra.c: properly initialize spinlock drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time() drivers/rtc: add support for Qualcomm PMIC8xxx RTC drivers/rtc/rtc-s3c.c: support clock gating drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200 init: skip calibration delay if previously done misc/eeprom: add eeprom access driver for digsy_mtc board misc/eeprom: add driver for microwire 93xx46 EEPROMs checkpatch.pl: update $logFunctions checkpatch: make utf-8 test --strict checkpatch.pl: add ability to ignore various messages checkpatch: add a "prefer __aligned" check checkpatch: validate signature styles and To: and Cc: lines checkpatch: add __rcu as a sparse modifier checkpatch: suggest using min_t or max_t ... Did this as a merge because of (trivial) conflicts in - Documentation/feature-removal-schedule.txt - arch/xtensa/include/asm/uaccess.h that were just easier to fix up in the merge than in the patch series.	2011-07-25 21:00:19 -07:00
Amerigo Wang	c5f41752fd	notifiers: sys: move reboot notifiers into reboot.h It is not necessary to share the same notifier.h. This patch already moves register_reboot_notifier() and unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c. [amwang@redhat.com: make allyesconfig succeed on ppc64] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:14 -07:00
Linus Torvalds	d3ec4844d4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) fs: Merge split strings treewide: fix potentially dangerous trailing ';' in #defined values/expressions uwb: Fix misspelling of neighbourhood in comment net, netfilter: Remove redundant goto in ebt_ulog_packet trivial: don't touch files that are removed in the staging tree lib/vsprintf: replace link to Draft by final RFC number doc: Kconfig: `to be' -> `be' doc: Kconfig: Typo: square -> squared doc: Konfig: Documentation/power/{pm => apm-acpi}.txt drivers/net: static should be at beginning of declaration drivers/media: static should be at beginning of declaration drivers/i2c: static should be at beginning of declaration XTENSA: static should be at beginning of declaration SH: static should be at beginning of declaration MIPS: static should be at beginning of declaration ARM: static should be at beginning of declaration rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check Update my e-mail address PCIe ASPM: forcedly -> forcibly gma500: push through device driver tree ... Fix up trivial conflicts: - arch/arm/mach-ep93xx/dma-m2p.c (deleted) - drivers/gpio/gpio-ep93xx.c (renamed and context nearby) - drivers/net/r8169.c (just context changes)	2011-07-25 13:56:39 -07:00
Linus Torvalds	fcda12e7f6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: modpost: Fix modpost's license checking V3 module: add /sys/module/<name>/uevent files module: change attr callbacks to take struct module_kobject modules: make arch's use default loader hooks modules: add default loader hook implementations param: fix return value handling in param_set_*	2011-07-24 09:54:54 -07:00
Linus Torvalds	5fabc487c9	Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) KVM: IOMMU: Disable device assignment without interrupt remapping KVM: MMU: trace mmio page fault KVM: MMU: mmio page fault support KVM: MMU: reorganize struct kvm_shadow_walk_iterator KVM: MMU: lockless walking shadow page table KVM: MMU: do not need atomicly to set/clear spte KVM: MMU: introduce the rules to modify shadow page table KVM: MMU: abstract some functions to handle fault pfn KVM: MMU: filter out the mmio pfn from the fault pfn KVM: MMU: remove bypass_guest_pf KVM: MMU: split kvm_mmu_free_page KVM: MMU: count used shadow pages on prepareing path KVM: MMU: rename 'pt_write' to 'emulate' KVM: MMU: cleanup for FNAME(fetch) KVM: MMU: optimize to handle dirty bit KVM: MMU: cache mmio info on page fault path KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code KVM: MMU: do not update slot bitmap if spte is nonpresent KVM: MMU: fix walking shadow page table KVM guest: KVM Steal time registration ...	2011-07-24 09:07:03 -07:00
Jonas Bonn	66574cc054	modules: make arch's use default loader hooks This patch removes all the module loader hook implementations in the architecture specific code where the functionality is the same as that now provided by the recently added default hooks. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-07-24 22:06:04 +09:30
Linus Torvalds	4d4abdcb1d	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits) perf: Remove the nmi parameter from the oprofile_perf backend x86, perf: Make copy_from_user_nmi() a library function perf: Remove perf_event_attr::type check x86, perf: P4 PMU - Fix typos in comments and style cleanup perf tools: Make test use the preset debugfs path perf tools: Add automated tests for events parsing perf tools: De-opt the parse_events function perf script: Fix display of IP address for non-callchain path perf tools: Fix endian conversion reading event attr from file header perf tools: Add missing 'node' alias to the hw_cache[] array perf probe: Support adding probes on offline kernel modules perf probe: Add probed module in front of function perf probe: Introduce debuginfo to encapsulate dwarf information perf-probe: Move dwarf library routines to dwarf-aux.{c, h} perf probe: Remove redundant dwarf functions perf probe: Move strtailcmp to string.c perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END tracing/kprobe: Update symbol reference when loading module tracing/kprobes: Support module init function probing kprobes: Return -ENOENT if probe point doesn't exist ...	2011-07-22 16:44:39 -07:00
Linus Torvalds	acb41c0f92	Merge branch 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: pci/of: Consolidate pci_bus_to_OF_node() pci/of: Consolidate pci_device_to_OF_node() x86/devicetree: Use generic PCI <-> OF matching microblaze/pci: Move the remains of pci_32.c to pci-common.c microblaze/pci: Remove powermac originated cruft pci/of: Match PCI devices to OF nodes dynamically	2011-07-22 14:54:02 -07:00
Benjamin Herrenschmidt	50d2a4223b	powerpc: Copy back TIF flags on return from softirq stack We already did it for hard IRQs but it looks like we forgot to do it for softirqs. Without this, we would lose flags such as TIF_NEED_RESCHED set using current_thread_info() by something running of a softirq. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-22 13:38:58 +10:00
Benjamin Herrenschmidt	4b575f3e8a	Merge remote-tracking branch 'jwb/next' into next	2011-07-22 13:16:41 +10:00
Andrew Gabbasov	9974eec2b8	powerpc: Exporting boot_cpuid_phys Kernel loadable module can use hard_smp_processor_id() if building with SMP kernel. In order to make it work for UP kernels too, boot_cpuid_phys symbol (which is what hard_smp_processor_id() macro resolves to in non-SMP configuration) must be exported. Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:34 +10:00
Michael Neuling	5115a026ce	powerpc: Add CFAR to oops output Now we have the CFAR saved add it to the oops output. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:33 +10:00
Anton Blanchard	8896293422	powerpc/irq: Quieten irq mapping printks HFI creates interrupts each time a window is setup. This results in a lot of messages in the kernel log buffer: irq: irq 199007 on host null mapped to virtual irq 351 This box has over 3500 of them, causing more important kernel messages to be overwritten. We can get at this information via debugfs now so we may as well turn it into a pr_debug. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:07 +10:00
Michael Neuling	63f21a56f1	powerpc/kdump: Fix timeout in crash_kexec_wait_realmode The existing code it pretty ugly. How about we clean it up even more like this? From: Anton Blanchard <anton@samba.org> We check for timeout expiry in the outer loop, but we also need to check it in the inner loop or we can lock up forever waiting for a CPU to hit real mode. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:05 +10:00
Dmitry Eremin-Solenikov	77c2342a57	powerpc: Correct annotations of pmu registration functions This fixes the following warning: WARNING: arch/powerpc/kernel/built-in.o(.text+0x29768): Section mismatch in reference from the function .register_power_pmu() to the function .cpuinit.text:.power_pmu_notifier() The function .register_power_pmu() references the function __cpuinit .power_pmu_notifier(). This is often because .register_power_pmu lacks a __cpuinit annotation or the annotation of .power_pmu_notifier is wrong. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:12:40 +10:00
Mathias Krause	0e0ebdb9c2	powerpc: Remove redundant set_fs(USER_DS) The address limit is already set in flush_old_exec() so this set_fs(USER_DS) is redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:12:40 +10:00
Dave Kleikamp	9661534d6a	powerpc/47x: allow kernel to be loaded in higher physical memory The 44x code (which is shared by 47x) assumes the available physical memory begins at 0x00000000. This is not necessarily the case in an AMP environment. Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be loaded into a higher memory range. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 10:34:24 -04:00
Rob Herring	0e47ff1ce6	powerpc: rename ppc_pci__flags to pci__flags This renames pci flags functions and enums in preparation for creating generic version in asm-generic/pci-bridge.h. The following search and replace is done: s/ppc_pci_/pci_/ s/PPC_PCI_/PCI_/ Direct accesses to ppc_pci_flag variable are replaced with helper functions. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org>	2011-07-12 09:28:04 -05:00
Dave Kleikamp	91b191c71e	powerpc/44x: don't use tlbivax on AMP systems Since other OS's may be running on the other cores don't use tlbivax Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 09:21:55 -04:00
Paul Mackerras	9e368f2915	KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:59 +03:00
Paul Mackerras	969391c58a	powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to indicate that we have a usable hypervisor mode, and another to indicate that the processor conforms to PowerISA version 2.06. We also add another bit to indicate that the processor conforms to ISA version 2.01 and set that for PPC970 and derivatives. Some PPC970 chips (specifically those in Apple machines) have a hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode is not useful in the sense that there is no way to run any code in supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1 bits in HID4 are always 0, and we use that as a way of detecting that hypervisor mode is not useful. Where we have a feature section in assembly code around code that only applies on POWER7 in hypervisor mode, we use a construct like END_FTR_SECTION_IFSET(CPU_FTR_HVMODE \| CPU_FTR_ARCH_206) The definition of END_FTR_SECTION_IFSET is such that the code will be enabled (not overwritten with nops) only if all bits in the provided mask are set. Note that the CPU feature check in __tlbie() only needs to check the ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called if we are running bare-metal, i.e. in hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:58 +03:00
Paul Mackerras	aa04b4cc5b	KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	371fefd6f2	KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	a8606e20e4	KVM: PPC: Handle some PAPR hcalls in the kernel This adds the infrastructure for handling PAPR hcalls in the kernel, either early in the guest exit path while we are still in real mode, or later once the MMU has been turned back on and we are in the full kernel context. The advantage of handling hcalls in real mode if possible is that we avoid two partition switches -- and this will become more important when we support SMT4 guests, since a partition switch means we have to pull all of the threads in the core out of the guest. The disadvantage is that we can only access the kernel linear mapping, not anything vmalloced or ioremapped, since the MMU is off. This also adds code to handle the following hcalls in real mode: H_ENTER Add an HPTE to the hashed page table H_REMOVE Remove an HPTE from the hashed page table H_READ Read HPTEs from the hashed page table H_PROTECT Change the protection bits in an HPTE H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table H_SET_DABR Set the data address breakpoint register Plus code to handle the following hcalls in the kernel: H_CEDE Idle the vcpu until an interrupt or H_PROD hcall arrives H_PROD Wake up a ceded vcpu H_REGISTER_VPA Register a virtual processor area (VPA) The code that runs in real mode has to be in the base kernel, not in the module, if KVM is compiled as a module. The real-mode code can only access the kernel linear mapping, not vmalloc or ioremap space. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:55 +03:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
Paul Mackerras	3c42bf8a71	KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu There are several fields in struct kvmppc_book3s_shadow_vcpu that temporarily store bits of host state while a guest is running, rather than anything relating to the particular guest or vcpu. This splits them out into a new kvmppc_host_state structure and modifies the definitions in asm-offsets.c to suit. On 32-bit, we have a kvmppc_host_state structure inside the kvmppc_book3s_shadow_vcpu since the assembly code needs to be able to get to them both with one pointer. On 64-bit they are separate fields in the PACA. This means that on 64-bit we don't need to copy the kvmppc_host_state in and out on vcpu load/unload, and in future will mean that the book3s_hv code doesn't need a shadow_vcpu struct in the PACA at all. That does mean that we have to be careful not to rely on any values persisting in the hstate field of the paca across any point where we could block or get preempted. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:53 +03:00
Paul Mackerras	923c53caea	powerpc: Set up LPCR for running guest partitions In hypervisor mode, the LPCR controls several aspects of guest partitions, including virtual partition memory mode, and also controls whether the hypervisor decrementer interrupts are enabled. This sets up LPCR at boot time so that guest partitions will use a virtual real memory area (VRMA) composed of 16MB large pages, and hypervisor decrementer interrupts are disabled. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:52 +03:00
Paul Mackerras	b01c8b54a1	powerpc, KVM: Rework KVM checks in first-level interrupt handlers Instead of branching out-of-line with the DO_KVM macro to check if we are in a KVM guest at the time of an interrupt, this moves the KVM check inline in the first-level interrupt handlers. This speeds up the non-KVM case and makes sure that none of the interrupt handlers are missing the check. Because the first-level interrupt handlers are now larger, some things had to be move out of line in exceptions-64s.S. This all necessitated some minor changes to the interrupt entry code in KVM. This also streamlines the book3s_32 KVM test. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:48 +03:00
Liu Yu	dd9ebf1f94	KVM: PPC: e500: Add shadow PID support Dynamically assign host PIDs to guest PIDs, splitting each guest PID into multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS]. Use both PID0 and PID1 so that the shadow PIDs for the right mode can be selected, that correspond both to guest TID = zero and guest TID = guest PID. This allows us to significantly reduce the frequency of needing to invalidate the entire TLB. When the guest mode or PID changes, we just update the host PID0/PID1. And since the allocation of shadow PIDs is global, multiple guests can share the TLB without conflict. Note that KVM does not yet support the guest setting PID1 or PID2 to a value other than zero. This will need to be fixed for nested KVM to work. Until then, we enforce the requirement for guest PID1/PID2 to stay zero by failing the emulation if the guest tries to set them to something else. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:39 +03:00
Scott Wood	4cd35f675b	KVM: PPC: e500: Save/restore SPE state This is done lazily. The SPE save will be done only if the guest has used SPE since the last preemption or heavyweight exit. Restore will be done only on demand, when enabling MSR_SPE in the shadow MSR, in response to an SPE fault or mtmsr emulation. For SPEFSCR, Linux already switches it on context switch (non-lazily), so the only remaining bit is to save it between qemu and the guest. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:32 +03:00
Scott Wood	ecee273fc4	KVM: PPC: booke: use shadow_msr Keep the guest MSR and the guest-mode true MSR separate, rather than modifying the guest MSR on each guest entry to produce a true MSR. Any bits which should be modified based on guest MSR must be explicitly propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in kvmppc_set_msr(). While we're modifying the guest entry code, reorder a few instructions to bury some load latencies. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:32 +03:00
Scott Wood	c51584d52e	powerpc/e500: SPE register saving: take arbitrary struct offset Previously, these macros hardcoded THREAD_EVR0 as the base of the save area, relative to the base register passed. This base offset is now passed as a separate macro parameter, allowing reuse with other SPE save areas, such as used by KVM. Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:31 +03:00
yu liu	685659ee70	powerpc/e500: Save SPEFCSR in flush_spe_to_thread() giveup_spe() saves the SPE state which is protected by MSR[SPE]. However, modifying SPEFSCR does not trap when MSR[SPE]=0. And since SPEFSCR is already saved/restored in _switch(), not all the callers want to save SPEFSCR again. Thus, saving SPEFSCR should not belong to giveup_spe(). This patch moves SPEFSCR saving to flush_spe_to_thread(), and cleans up the caller that needs to save SPEFSCR accordingly. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:30 +03:00
Jiri Kosina	b7e9c223be	Merge branch 'master' into for-next Sync with Linus' tree to be able to apply pending patches that are based on newer code already present upstream.	2011-07-11 14:15:55 +02:00
Kumar Gala	6471fc6630	powerpc: Dont require a dma_ops struct to set dma mask The only reason to require a dma_ops struct is to see if it has implemented set_dma_mask. If not we can fall back to setting the mask directly. This resolves an issue with how to sequence the setting of a DMA mask for platform devices. Before we had an issue in that we have no way of setting the DMA mask before the various low level bus notifiers get called that might check it (swiotlb). So now we can do: pdev = platform_device_alloc("foobar", 0); dma_set_mask(&pdev->dev, DMA_BIT_MASK(37)); platform_device_add(pdev); And expect the right thing to happen with the bus notifiers get called via platform_device_add. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-07-08 00:21:36 -05:00
Kumar Gala	314b02f503	powerpc: implement arch_setup_pdev_archdata We have a long standing issues with platform devices not have a valid dma_mask pointer. This hasn't been an issue to date as no platform device has tried to set its dma_mask value to a non-default value. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-07-08 00:21:36 -05:00
Becky Bruce	3160b09796	powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKE This is used to round-robin TLBCAM entries. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-07-08 00:21:34 -05:00
Avi Kivity	4dc0da8696	perf: Add context field to perf_event The perf_event overflow handler does not receive any caller-derived argument, so many callers need to resort to looking up the perf_event in their local data structure. This is ugly and doesn't scale if a single callback services many perf_events. Fix by adding a context parameter to perf_event_create_kernel_counter() (and derived hardware breakpoints APIs) and storing it in the perf_event. The field can be accessed from the callback as event->overflow_handler_context. All callers are updated. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:38 +02:00
Peter Zijlstra	89d6c0b5bd	perf, arch: Add generic NODE cache events Add a NODE level to the generic cache events which is used to measure local vs remote memory accesses. Like all other cache events, an ACCESS is HIT+MISS, if there is no way to distinguish between reads and writes do reads only etc.. The below needs filling out for !x86 (which I filled out with unsupported events). I'm fairly sure ARM can leave it like that since it doesn't strike me as an architecture that even has NUMA support. SH might have something since it does appear to have some NUMA bits. Sparc64, PowerPC and MIPS certainly want a good look there since they clearly are NUMA capable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Anton Blanchard <anton@samba.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:38 +02:00
Peter Zijlstra	a8b0ca17b8	perf: Remove the nmi parameter from the swevent and overflow interface The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:35 +02:00
Peter Zijlstra	4f8b50bbbe	irq_work, ppc: Fix up arch hooks Commit `e360adbe29` ("irq_work: Add generic hardirq context callbacks") fouled up the ppc bit, not properly naming the arch specific function that raises the 'self-IPI'. Cc: Huang Ying <ying.huang@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: stable@kernel.org # 37+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:02:22 +02:00
Michael Ellerman	ac5f89c7d8	powerpc: Add jump label support This patch adds support for the new "jump label" feature. Unlike x86 and sparc we just merrily patch the code with no locks etc, as far as I know this is safe, but I'm not really sure what the x86/sparc code is protecting against so maybe it's not. I also don't see any reason for us to implement the poke_early() routine, even though sparc does. [BenH: Updated the patch to upstream generic changes] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-01 13:48:55 +10:00

1 2 3 4 5 ...

2986 commits