Commit graph

2504 commits

Author SHA1 Message Date
Paul Mackerras c1aa687d49 powerpc: Clean up obsolete code relating to decrementer and timebase
Since the decrementer and timekeeping code was moved over to using
the generic clockevents and timekeeping infrastructure, several
variables and functions have been obsolete and effectively unused.
This deletes them.

In particular, wakeup_decrementer() is no longer needed since the
generic code reprograms the decrementer as part of the process of
resuming the timekeeping code, which happens during sysdev resume.
Thus the wakeup_decrementer calls in the suspend_enter methods for
52xx platforms have been removed.  The call in the powermac cpu
frequency change code has been replaced by set_dec(1), which will
cause a timer interrupt as soon as interrupts are enabled, and the
generic code will then reprogram the decrementer with the correct
value.

This also simplifies the generic_suspend_en/disable_irqs functions
and makes them static since they are not referenced outside time.c.
The preempt_enable/disable calls are removed because the generic
code has disabled all but the boot cpu at the point where these
functions are called, so we can't be moved to another cpu.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:26:16 +10:00
Paul Mackerras 8fd63a9ea7 powerpc: Rework VDSO gettimeofday to prevent time going backwards
Currently it is possible for userspace to see the result of
gettimeofday() going backwards by 1 microsecond, assuming that
userspace is using the gettimeofday() in the VDSO.  The VDSO
gettimeofday() algorithm computes the time in "xsecs", which are
units of 2^-20 seconds, or approximately 0.954 microseconds,
using the algorithm

	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec

and then converts the time in xsecs to seconds and microseconds.

The kernel updates the tb_orig_stamp and stamp_xsec values every
tick in update_vsyscall().  If the length of the tick is not an
integer number of xsecs, then some precision is lost in converting
the current time to xsecs.  For example, with CONFIG_HZ=1000, the
tick is 1ms long, which is 1048.576 xsecs.  That means that
stamp_xsec will advance by either 1048 or 1049 on each tick.
With the right conditions, it is possible for userspace to get
(timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
slightly late in updating the vdso_datapage, and then for stamp_xsec
to advance by 1048 when the kernel does update it, and for userspace
to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
integer truncation.  The result is that time appears to go backwards
by 1 microsecond.

To fix this we change the VDSO gettimeofday to use a new field in the
VDSO datapage which stores the nanoseconds part of the time as a
fractional number of seconds in a 0.32 binary fraction format.
(Or put another way, as a 32-bit number in units of 0.23283 ns.)
This is convenient because we can use the mulhwu instruction to
convert it to either microseconds or nanoseconds.

Since it turns out that computing the time of day using this new field
is simpler than either using stamp_xsec (as gettimeofday does) or
stamp_xtime.tv_nsec (as clock_gettime does), this converts both
gettimeofday and clock_gettime to use the new field.  The existing
__do_get_tspec function is converted to use the new field and take
a parameter in r7 that indicates the desired resolution, 1,000,000
for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
function is then unused and is deleted.

The new algorithm is

	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction

with 'now' in units of 2^-32 seconds.  That is then converted to
seconds and either microseconds or nanoseconds with

	seconds = now >> 32
	partseconds = ((now & 0xffffffff) * resolution) >> 32

The 32-bit VDSO code also makes a further simplification: it ignores
the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
a timebase frequency of 1GHz or less and an update interval of no
more than 10ms, the upper 32 bits of tb_to_xs will be at least
4503599, so the error from ignoring the low 32 bits will be at most
2.2ns, which is more than an order of magnitude less than the time
taken to do gettimeofday or clock_gettime on our fastest processors,
so there is no possibility of seeing inconsistent values due to this.

This also moves update_gtod() down next to its only caller, and makes
update_vsyscall use the time passed in via the wall_time argument rather
than accessing xtime directly.  At present, wall_time always points to
xtime, but that could change in future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:26:16 +10:00
Benjamin Herrenschmidt 5f07aa7524 Merge commit 'paulus-perf/master' into next 2010-07-09 11:25:48 +10:00
Paul E. McKenney c2be05481f powerpc: Fix default_machine_crash_shutdown #ifdef botch
crash_kexec_wait_realmode() is defined only if CONFIG_PPC_STD_MMU_64
and CONFIG_SMP, but is called if CONFIG_PPC_STD_MMU_64 even if !CONFIG_SMP.
Fix the conditional compilation around the invocation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08 18:11:45 +10:00
Johannes Berg 3cd8519248 powerpc: Fix logic error in fixup_irqs
When SPARSE_IRQ is set, irq_to_desc() can
return NULL. While the code here has a
check for NULL, it's not really correct.
Fix it by separating the check for it.

This fixes CPU hot unplug for me.

Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08 18:11:44 +10:00
Anton Blanchard 33ad5e4b6c powerpc: Linux cannot run with 0 cores
If we configure with CONFIG_SMP=n or set NR_CPUS less than the number of
SMT threads we will set the max cores property to 0 in the
ibm,client-architecture-support structure. On new versions of firmware that
understand this property it obliges and terminates our partition.

Use DIV_ROUND_UP so we handle not only the CONFIG_SMP=n case but also the
case where NR_CPUS isn't a multiple of the number of SMT threads.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08 18:11:42 +10:00
Stephen Rothwell 5afd878a95 powerpc: Fix compile errors in prom_init_check for gcc 4.5
Just whitelist these extra compiler generated symbols.
Fixes these errors:

Error: External symbol '_restgpr0_14' referenced from prom_init.c
Error: External symbol '_restgpr0_20' referenced from prom_init.c
Error: External symbol '_restgpr0_22' referenced from prom_init.c
Error: External symbol '_restgpr0_24' referenced from prom_init.c
Error: External symbol '_restgpr0_25' referenced from prom_init.c
Error: External symbol '_restgpr0_26' referenced from prom_init.c
Error: External symbol '_restgpr0_27' referenced from prom_init.c
Error: External symbol '_restgpr0_28' referenced from prom_init.c
Error: External symbol '_restgpr0_29' referenced from prom_init.c
Error: External symbol '_restgpr0_31' referenced from prom_init.c
Error: External symbol '_savegpr0_14' referenced from prom_init.c
Error: External symbol '_savegpr0_20' referenced from prom_init.c
Error: External symbol '_savegpr0_22' referenced from prom_init.c
Error: External symbol '_savegpr0_24' referenced from prom_init.c
Error: External symbol '_savegpr0_25' referenced from prom_init.c
Error: External symbol '_savegpr0_26' referenced from prom_init.c
Error: External symbol '_savegpr0_27' referenced from prom_init.c
Error: External symbol '_savegpr0_28' referenced from prom_init.c
Error: External symbol '_savegpr0_29' referenced from prom_init.c
Error: External symbol '_savegpr0_31' referenced from prom_init.c

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08 18:11:39 +10:00
Matt Evans 219a92a4c4 powerpc/perf_event: Fix for power_pmu_disable()
When power_pmu_disable() removes the given event from a particular index into
cpuhw->event[], it shuffles down higher event[] entries.  But, this array is
paired with cpuhw->events[] and cpuhw->flags[] so should shuffle them
similarly.

If these arrays get out of sync, code such as power_check_constraints() will
fail.  This caused a bug where events were temporarily disabled and then failed
to be re-enabled; subsequent code tried to write_pmc() with its (disabled) idx
of 0, causing a message "oops trying to write PMC0".  This triggers this bug on
POWER7, running a miss-heavy test:

  perf record -e L1-dcache-load-misses -e L1-dcache-store-misses ./misstest

Signed-off-by: Matt Evans <matt@ozlabs.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08 18:11:37 +10:00
Paul Mackerras d09ec73871 powerpc, hw_breakpoint: Tell generic code we have no instruction breakpoints
At present, hw_breakpoint_slots() returns 1 regardless of what
type of breakpoint is specified in the type argument.  Since we
don't define CONFIG_HAVE_MIXED_BREAKPOINTS_REGS, there are
separate values for TYPE_INST and TYPE_DATA, and hw_breakpoint_slots()
returns 1 for both, effectively advertising instruction breakpoint
support which doesn't exist.

This fixes it by making hw_breakpoint_slots return 1 for TYPE_DATA
and 0 for TYPE_INST.  This moves hw_breakpoint_slots() from the
powerpc hw_breakpoint.h to hw_breakpoint.c because the definitions
of TYPE_INST and TYPE_DATA aren't available in <asm/hw_breakpoint.h>.
They are defined in <linux/hw_breakpoint.h> but we can't include
that header in <asm/hw_breakpoint.h>, and nor can we rely on
<linux/hw_breakpoint.h> being included before <asm/hw_breakpoint.h>.
Since hw_breakpoint_slots() is only called at boot time, there is
no performance impact from making it a real function rather than
a static inline.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-30 13:54:58 +10:00
Paul Mackerras 76b0f13376 powerpc, hw_breakpoint: Cooperate better with other single-steppers
The code we had to clear the MSR_SE bit was not doing anything because
the caller (ultimately single_step_exception() in traps.c) had already
cleared.  Instead of trying to leave MSR_SE set if the TIF_SINGLESTEP
flag is set (which indicates that the process is being single-stepped
by ptrace), we instead return NOTIFY_DONE in that case, which means
the caller will generate a SIGTRAP for the process.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-23 15:46:55 +10:00
Paul Mackerras 574cb24899 powerpc, hw_breakpoint: Fix off-by-one in checking access address
The code would accept an access to an address one byte past the end
of the requested range as legitimate, due to having a "<=" rather than
a "<".  This fixes that and cleans up the code a bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-23 15:42:43 +10:00
K.Prasad e3e94084ad powerpc, hw_breakpoint: Discard extraneous interrupt due to accesses outside symbol length
Many a times, the requested breakpoint length can be less than the
fixed breakpoint length i.e. 8 bytes supported by PowerPC 64-bit
server (Book III S) processors.  This could lead to extraneous
interrupts resulting in false breakpoint notifications.  This
detects and discards such interrupts for non-ptrace requests.
We don't change ptrace behaviour to avoid breaking compatability.

[Suggestion from Paul Mackerras <paulus@samba.org> to add a new flag in
'struct arch_hw_breakpoint' to identify extraneous interrupts]

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-22 19:40:51 +10:00
K.Prasad 06532a6743 powerpc, hw_breakpoint: Enable hw-breakpoints while handling intervening signals
A signal delivered between a hw_breakpoint_handler() and the
single_step_dabr_instruction() will not have the breakpoint active
while the signal handler is running -- the signal delivery will
set up a new MSR value which will not have MSR_SE set, so we
won't get the signal step interrupt until and unless the signal
handler returns (which it may never do).

To fix this, we restore the breakpoint when delivering a signal --
we clear the MSR_SE bit and set the DABR again.  If the signal
handler returns, the DABR interrupt will occur again when the
instruction that we were originally trying to single-step gets
re-executed.

[Paul Mackerras <paulus@samba.org> pointed out the need to do this.]

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-22 19:40:50 +10:00
K.Prasad 2538c2d08f powerpc, hw_breakpoint: Handle concurrent alignment interrupts
If an alignment interrupt occurs on an instruction that is being
single-stepped, the alignment interrupt handler currently handles
the single-step condition by unconditionally sending a SIGTRAP to
the process.  Other synchronous interrupts that result in the
instruction being emulated do likewise.

With hw_breakpoint support, the hw_breakpoint code needs to be able
to intercept these single-step events as well as those where the
instruction executes normally and a trace interrupt happens.

Fix this by making emulate_single_step() use the existing
single_step_exception() function instead of calling _exception()
directly.  We then make single_step_exception() use the abstracted
clear_single_step() rather than clearing bits in the MSR image
directly so that emulate_single_step() will continue to work
correctly on Book 3E processors.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-22 19:40:50 +10:00
K.Prasad 5aae8a5370 powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors
Implement perf-events based hw-breakpoint interfaces for PowerPC
64-bit server (Book III S) processors.  This allows access to a
given location to be used as an event that can be counted or
profiled by the perf_events subsystem.

This is done using the DABR (data breakpoint register), which can
also be used for process debugging via ptrace.  When perf_event
hw_breakpoint support is configured in, the perf_event subsystem
manages the DABR and arbitrates access to it, and ptrace then
creates a perf_event when it is requested to set a data breakpoint.

[Adopted suggestions from Paul Mackerras <paulus@samba.org> to
- emulate_step() all system-wide breakpoints and single-step only the
  per-task breakpoints
- perform arch-specific cleanup before unregistration through
  arch_unregister_hw_breakpoint()
]

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-06-22 19:40:50 +10:00
Milton Miller bd2b64a12b powerpc: rtas_flash needs to use rtas_data_buf
When trying to flash a machine via the update_flash command, Anton received the
following error:

    Restarting system.
    FLASH: kernel bug...flash list header addr above 4GB

The code in question has a comment that the flash list should be in
the kernel data and therefore under 4GB:

        /* NOTE: the "first" block list is a global var with no data
         * blocks in the kernel data segment.  We do this because
         * we want to ensure this block_list addr is under 4GB.
         */

Unfortunately the Kconfig option is marked tristate which means the variable
may not be in the kernel data and could be above 4GB.

Instead of relying on the data segment being below 4GB, use the static
data buffer allocated by the kernel for use by rtas.  Since we don't
use the header struct directly anymore, convert it to a simple pointer.

Reported-By: Anton Blanchard <anton@samba.org>
Signed-Off-By: Milton Miller <miltonm@bga.com
Tested-By: Anton Blanchard <anton@samba.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-06-15 15:02:37 +10:00
Christoph Hellwig f1ba9a5b2a powerpc: Unconditionally enabled irq stacks
Irq stacks provide an essential protection from stack overflows through
external interrupts, at the cost of two additionals stacks per CPU.

Enable them unconditionally to simplify the kernel build and prevent
people from accidentally disabling them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-06-15 15:02:37 +10:00
Matt Evans b636f1379e powerpc/kexec: Wait for online/possible CPUs only.
kexec_perpare_cpus_wait() iterates i through NR_CPUS to check
paca[i].kexec_state of each to make sure they have quiesced.
However now we have dynamic PACA allocation, paca[NR_CPUS] is not necessarily
valid and we overrun the array;  spurious "cpu is not possible, ignoring"
errors result.  This patch iterates for_each_online_cpu so stays
within the bounds of paca[] -- and every CPU is now 'possible'.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-06-15 15:02:33 +10:00
Linus Torvalds eda054770e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: clear bridge resource range if BIOS assigned bad one
  PCI: hotplug/cpqphp, fix NULL dereference
  Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
  PCI: change resource collision messages from KERN_ERR to KERN_INFO
2010-06-11 14:15:44 -07:00
Yinghai Lu 837c4ef13c PCI: clear bridge resource range if BIOS assigned bad one
Yannick found that video does not work with 2.6.34.  The cause of this
bug was that the BIOS had assigned the wrong range to the PCI bridge
above the video device.  Before 2.6.34 the kernel would have shrunk
the size of the bridge window, but since
  d65245c PCI: don't shrink bridge resources
the kernel will avoid shrinking BIOS ranges.

So zero out the old range if we fail to claim it at boot time; this will
cause us to allocate a new range at startup, restoring the 2.6.34
behavior.

Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009.

Reported-by: Yannick <yannick.roehlly@free.fr>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-11 13:24:51 -07:00
Ananth N Mavinakayanahalli db97bc7f99 powerpc/kprobes: Remove resume_execution() in kprobes
emulate_step() in kprobe_handler() would've already determined if the
probed instruction can be emulated. We single-step in hardware only if
the instruction couldn't be emulated. resume_execution() therefore is
superfluous -- all we need is to fix up the instruction pointer after
single-stepping.

Thanks to Paul Mackerras for catching this.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-06-02 17:50:37 +10:00
Linus Torvalds aef4b9aaae Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Don't export cvt_fd & _df when CONFIG_PPC_FPU is not set
  powerpc/44x: icon: select SM502 and frame buffer console support
  powerpc/85xx: Add P1021MDS board support
  powerpc/85xx: Change MPC8572DS camp dtses for MSI sharing
  powerpc/fsl_msi: add removal path and probe failing path
  powerpc/fsl_msi: enable msi sharing through AMP OSes
  powerpc/fsl_msi: enable msi allocation in all banks
  powerpc/fsl_msi: fix the conflict of virt_msir's chip_data
  powerpc/fsl_msi: Add multiple MSI bank support
  powerpc/kexec: Add support for FSL-BookE
  powerpc/fsl-booke: Move the entry setup code into a seperate file
  powerpc/fsl-booke: fix the case where we are not in the first page
  powerpc/85xx: Enable support for ports 3 and 4 on 8548 CDS
  powerpc/fsl-booke: Add hibernation support for FSL BookE processors
  powerpc/e500mc: Implement machine check handler.
  powerpc/44x: Add basic ICON PPC440SPe board support
  powerpc/44x: Fix UART clocks on 440SPe
  powerpc/44x: Add reset-type to katmai.dts
  powerpc/44x: Adding PCI-E support for PowerPC 460SX based SOC.
2010-06-01 14:13:14 -07:00
Linus Torvalds 1f73897861 Merge branch 'for-35' of git://repo.or.cz/linux-kbuild
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
  kbuild: Revert part of e8d400a to resolve a conflict
  kbuild: Fix checking of scm-identifier variable
  gconfig: add support to show hidden options that have prompts
  menuconfig: add support to show hidden options which have prompts
  gconfig: remove show_debug option
  gconfig: remove dbg_print_ptype() and dbg_print_stype()
  kconfig: fix zconfdump()
  kconfig: some small fixes
  add random binaries to .gitignore
  kbuild: Include gen_initramfs_list.sh and the file list in the .d file
  kconfig: recalc symbol value before showing search results
  .gitignore: ignore *.lzo files
  headerdep: perlcritic warning
  scripts/Makefile.lib: Align the output of LZO
  kbuild: Generate modules.builtin in make modules_install
  Revert "kbuild: specify absolute paths for cscope"
  kbuild: Do not unnecessarily regenerate modules.builtin
  headers_install: use local file handles
  headers_check: fix perl warnings
  export_report: fix perl warnings
  ...
2010-06-01 08:55:52 -07:00
Benjamin Herrenschmidt a7fed9f736 powerpc: Don't export cvt_fd & _df when CONFIG_PPC_FPU is not set
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-31 11:51:54 +10:00
Benjamin Herrenschmidt ecca1a34be Merge commit 'kumar/next' into next
Conflicts:
	arch/powerpc/sysdev/fsl_msi.c
2010-05-31 10:01:50 +10:00
FUJITA Tomonori 712d3e22a8 powerpc: remove unnecessary sync_single_range_* in swiotlb_dma_ops
sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Sebastian Andrzej Siewior b3df895aeb powerpc/kexec: Add support for FSL-BookE
This adds support kexec on FSL-BookE where the MMU can not be simply
switched off. The code borrows the initial MMU-setup code to create the
identical mapping mapping. The only difference to the original boot code
is the size of the mapping(s) and the executeable address.
The kexec code maps the first 2 GiB of memory in 256 MiB steps. This
should work also on e500v1 boxes.
SMP support is still not available.

(Kumar: Added minor change to build to ifdef CONFIG_PPC_STD_MMU_64 some
code that was PPC64 specific)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-24 21:25:32 -05:00
Sebastian Andrzej Siewior 7c08ce718f powerpc/fsl-booke: Move the entry setup code into a seperate file
This patch only moves the initial entry code which setups the mapping
from what ever to KERNELBASE into a seperate file. No code change has
been made here.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-24 14:01:12 -05:00
Sebastian Andrzej Siewior 2289d2d1a8 powerpc/fsl-booke: fix the case where we are not in the first page
During boot we change the mapping a few times until we have a "defined"
mapping. During this procedure a small 4KiB mapping is created and after
that one a 64MiB. Currently the offset of the 4KiB page in that we run
is zero because the complete startup up code is in first page which
starts at RPN zero.
If the code is recycled and moved to another location then its execution
will fail because the start address in 64 MiB mapping is computed
wrongly. It does not consider the offset to the page from the begin of
the memory.
This patch fixes this. Usually (system boot) r25 is zero so this does
not change anything unless the code is recycled.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-24 14:00:53 -05:00
Grant Likely cf9b59e9d3 Merge remote branch 'origin' into secretlab/next-devicetree
Merging in current state of Linus' tree to deal with merge conflicts and
build failures in vio.c after merge.

Conflicts:
	drivers/i2c/busses/i2c-cpm.c
	drivers/i2c/busses/i2c-mpc.c
	drivers/net/gianfar.c

Also fixed up one line in arch/powerpc/kernel/vio.c to use the
correct node pointer.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-22 00:36:56 -06:00
Grant Likely 4018294b53 of: Remove duplicate fields from of_platform_driver
.name, .match_table and .owner are duplicated in both of_platform_driver
and device_driver.  This patch is a removes the extra copies from struct
of_platform_driver and converts all users to the device_driver members.

This patch is a pretty mechanical change.  The usage model doesn't change
and if any drivers have been missed, or if anything has been fixed up
incorrectly, then it will fail with a compile time error, and the fixup
will be trivial.  This patch looks big and scary because it touches so
many files, but it should be pretty safe.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
2010-05-22 00:10:40 -06:00
Grant Likely 597b9d1e44 drivercore: Add of_match_table to the common device drivers
OF-style matching can be available to any device, on any type of bus.
This patch allows any driver to provide an OF match table when CONFIG_OF
is enabled so that drivers can be bound against devices described in
the device tree.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-22 00:10:40 -06:00
Grant Likely cb6dc512b7 arch/powerpc: Move dma_mask from of_device into pdev_archdata
By moving dma_mask into pdev_archdata, and adding archdata to
struct of_device, it makes it possible to substitute of_device
with struct platform_device, which is a stepping stone to
removing the of_platform bus entirely.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-22 00:10:40 -06:00
Linus Torvalds 98edb6ca41 Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
  KVM: x86: Add missing locking to arch specific vcpu ioctls
  KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
  KVM: MMU: Segregate shadow pages with different cr0.wp
  KVM: x86: Check LMA bit before set_efer
  KVM: Don't allow lmsw to clear cr0.pe
  KVM: Add cpuid.txt file
  KVM: x86: Tell the guest we'll warn it about tsc stability
  x86, paravirt: don't compute pvclock adjustments if we trust the tsc
  x86: KVM guest: Try using new kvm clock msrs
  KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  KVM: x86: add new KVMCLOCK cpuid feature
  KVM: x86: change msr numbers for kvmclock
  x86, paravirt: Add a global synchronization point for pvclock
  x86, paravirt: Enable pvclock flags in vcpu_time_info structure
  KVM: x86: Inject #GP with the right rip on efer writes
  KVM: SVM: Don't allow nested guest to VMMCALL into host
  KVM: x86: Fix exception reinjection forced to true
  KVM: Fix wallclock version writing race
  KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
  KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
  ...
2010-05-21 17:16:21 -07:00
Linus Torvalds 79c4581262 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (92 commits)
  powerpc: Remove unused 'protect4gb' boot parameter
  powerpc: Build-in e1000e for pseries & ppc64_defconfig
  powerpc/pseries: Make request_ras_irqs() available to other pseries code
  powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity
  powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim
  powerpc: Use smt_snooze_delay=-1 to always busy loop
  powerpc: Remove check of ibm,smt-snooze-delay OF property
  powerpc/kdump: Fix race in kdump shutdown
  powerpc/kexec: Fix race in kexec shutdown
  powerpc/kexec: Speedup kexec hash PTE tear down
  powerpc/pseries: Add hcall to read 4 ptes at a time in real mode
  powerpc: Use more accurate limit for first segment memory allocations
  powerpc/kdump: Use chip->shutdown to disable IRQs
  powerpc/kdump: CPUs assume the context of the oopsing CPU
  powerpc/crashdump: Do not fail on NULL pointer dereferencing
  powerpc/eeh: Fix oops when probing in early boot
  powerpc/pci: Check devices status property when scanning OF tree
  powerpc/vio: Switch VIO Bus PM to use generic helpers
  powerpc: Avoid bad relocations in iSeries code
  powerpc: Use common cpu_die (fixes SMP+SUSPEND build)
  ...
2010-05-21 11:17:05 -07:00
Anton Vorontsov 90103f932f powerpc/fsl-booke: Add hibernation support for FSL BookE processors
This is started as swsusp_32.S modifications, but the amount of #ifdefs
made the whole file horribly unreadable, so let's put the support into
its own separate file.

The code should be relatively easy to modify to support 44x BookEs as
well, but since I don't have any 44x to test, let's confine the code to
FSL BookE. (The only FSL-specific part so far is 'flush_dcache_L1'.)

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-21 07:41:53 -05:00
Scott Wood fe04b11215 powerpc/e500mc: Implement machine check handler.
Most of the MSCR bit assigments are different in e500mc versus
e500, and they are now write-one-to-clear.

Some e500mc machine check conditions are made recoverable (as long as
they aren't stuck on), most notably L1 instruction cache parity errors.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-05-21 07:41:52 -05:00
FUJITA Tomonori 99ec28f183 powerpc: Remove unused 'protect4gb' boot parameter
'protect4gb' boot parameter was introduced to avoid allocating dma
space acrossing 4GB boundary in 2007 (the commit
569975591c).

In 2008, the IOMMU was fixed to use the boundary_mask parameter per
device properly. So 'protect4gb' workaround was removed (the
383af9525b). But somehow I messed the
'protect4gb' boot parameter that was used to enable the
workaround.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:13 +10:00
Anton Blanchard b878dc0059 powerpc: Use smt_snooze_delay=-1 to always busy loop
Right now if we want to busy loop and not give up any time to the hypervisor
we put a very large value into smt_snooze_delay. This is sometimes useful
when running a single partition and you want to avoid any latencies due
to the hypervisor or CPU power state transitions. While this works, it's a bit
ugly - how big a number is enough now we have NO_HZ and can be idle for a very
long time.

The patch below makes smt_snooze_delay signed, and a negative value means loop
forever:

echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay

This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
I'm cc-ing Nathan just to be sure.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:12 +10:00
Anton Blanchard dd04c63c96 powerpc: Remove check of ibm,smt-snooze-delay OF property
I'm not sure why we have code for parsing an ibm,smt-snooze-delay OF
property. Since we have a smt-snooze-delay= boot option and we can
also set it at runtime via sysfs, it should be safe to get rid of
this code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:11 +10:00
Michael Neuling 60adec6226 powerpc/kdump: Fix race in kdump shutdown
When we are crashing, the crashing/primary CPU IPIs the secondaries to
turn off IRQs, go into real mode and wait in kexec_wait.  While this
is happening, the primary tears down all the MMU maps.  Unfortunately
the primary doesn't check to make sure the secondaries have entered
real mode before doing this.

On PHYP machines, the secondaries can take a long time shutting down
the IRQ controller as RTAS calls are need.  These RTAS calls need to
be serialised which resilts in the secondaries contending in
lock_rtas() and hence taking a long time to shut down.

We've hit this on large POWER7 machines, where some secondaries are
still waiting in lock_rtas(), when the primary tears down the HPTEs.

This patch makes sure all secondaries are in real mode before the
primary tears down the MMU.  It uses the new kexec_state entry in the
paca.  It times out if the secondaries don't reach real mode after
10sec.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:11 +10:00
Michael Neuling 1fc711f7ff powerpc/kexec: Fix race in kexec shutdown
In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to
kexec_smp_down().  kexec_smp_down() calls kexec_smp_wait() which sets
the hw_cpu_id() to -1.  The primary does this while leaving IRQs on
which means the primary can take a timer interrupt which can lead to
the IPIing one of the secondary CPUs (say, for a scheduler re-balance)
but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU
-1... Kaboom!

We are hitting this case regularly on POWER7 machines.

There is also a second race, where the primary will tear down the MMU
mappings before knowing the secondaries have entered real mode.

Also, the secondaries are clearing out any pending IPIs before
guaranteeing that no more will be received.

This changes kexec_prepare_cpus() so that we turn off IRQs in the
primary CPU much earlier.  It adds a paca flag to say that the
secondaries have entered the kexec_smp_down() IPI and turned off IRQs,
rather than overloading hw_cpu_id with -1.  This new paca flag is
again used to in indicate when the secondaries has entered real mode.

It also ensures that all CPUs have their IRQs off before we clear out
any pending IPI requests (in kexec_cpu_down()) to ensure there are no
trailing IPIs left unacknowledged.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:11 +10:00
Anton Blanchard 095c7965f4 powerpc: Use more accurate limit for first segment memory allocations
Author: Milton Miller <miltonm@bga.com>

On large machines we are running out of room below 256MB. In some cases we
only need to ensure the allocation is in the first segment, which may be
256MB or 1TB.

Add slb0_limit and use it to specify the upper limit for the irqstack and
emergency stacks.

On a large ppc64 box, this fixes a panic at boot when the crashkernel=
option is specified (previously we would run out of memory below 256MB).

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:10 +10:00
Anton Blanchard 5d7a87217d powerpc/kdump: Use chip->shutdown to disable IRQs
I saw this in a kdump kernel:

IOMMU table initialized, virtual merging enabled
Interrupt 155954 (real) is invalid, disabling it.
Interrupt 155953 (real) is invalid, disabling it.

ie we took some spurious interrupts. default_machine_crash_shutdown tries
to disable all interrupt sources but uses chip->disable which maps to
the default action of:

static void default_disable(unsigned int irq)
{
}

If we use chip->shutdown, then we actually mask the IRQ:

static void default_shutdown(unsigned int irq)
{
        struct irq_desc *desc = irq_to_desc(irq);

        desc->chip->mask(irq);
        desc->status |= IRQ_MASKED;
}

Not sure why we don't implement a ->disable action for xics.c, or why
default_disable doesn't mask the interrupt.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:10 +10:00
Anton Blanchard 0644079410 powerpc/kdump: CPUs assume the context of the oopsing CPU
We wrap the crash_shutdown_handles[] calls with longjmp/setjmp, so if any
of them fault we can recover. The problem is we add a hook to the debugger
fault handler hook which calls longjmp unconditionally.

This first part of kdump is run before we marshall the other CPUs, so there
is a very good chance some CPU on the box is going to page fault. And when
it does it hits the longjmp code and assumes the context of the oopsing CPU.
The machine gets very confused when it has 10 CPUs all with the same stack,
all thinking they have the same CPU id. I get even more confused trying
to debug it.

The patch below adds crash_shutdown_cpu and uses it to specify which cpu is
in the protected region. Since it can only be -1 or the oopsing CPU, we don't
need to use memory barriers since it is only valid on the local CPU - no other
CPU will ever see a value that matches it's local CPU id.

Eventually we should switch the order and marshall all CPUs before doing the
crash_shutdown_handles[] calls, but that is a bigger fix.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:10 +10:00
Maxim Uvarov 426b6cb478 powerpc/crashdump: Do not fail on NULL pointer dereferencing
Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:09 +10:00
Sonny Rao 5b339bdf16 powerpc/pci: Check devices status property when scanning OF tree
We ran into an issue where it looks like we're not properly ignoring a
pci device with a non-good status property when we walk the device tree
and instanciate the Linux side PCI devices.

However, the EEH init code does look for the property and disables EEH
on these devices. This leaves us in an inconsistent where we are poking
at a supposedly bad piece of hardware and RTAS will block our config
cycles because EEH isn't enabled anyway.

Signed-of-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:09 +10:00
Brian King a1263c7144 powerpc/vio: Switch VIO Bus PM to use generic helpers
Switch to use the generic power management helpers.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:09 +10:00
Milton Miller abb17f9c3a powerpc: Use common cpu_die (fixes SMP+SUSPEND build)
Configuring a powerpc 32 bit kernel for both SMP and SUSPEND turns on
CPU_HOTPLUG to enable disable_nonboot_cpus to be called by the common
suspend code.  Previously the definition of cpu_die for ppc32 was in
the powermac platform code, causing it to be undefined if that platform
as not selected.

arch/powerpc/kernel/built-in.o: In function 'cpu_idle':
arch/powerpc/kernel/idle.c:98: undefined reference to 'cpu_die'

Move the code from setup_64 to smp.c and rename the power mac
versions to their specific names.

Note that this does not setup the cpu_die pointers in either
smp_ops (request a given cpu die) or ppc_md (make this cpu die),
for other platforms but there are generic versions in smp.c.

Reported-by: Matt Sealey <matt@genesi-usa.com>
Reported-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:08 +10:00
Michael Ellerman 7358650e9e powerpc/rtasd: Don't start event scan if scan rate is zero
There appear to be Pegasos systems which have the rtas-event-scan
RTAS tokens, but on which the event scan always fails. They also
have an event-scan-rate property containing 0, which means call
event scan 0 times per minute.

So interpret a scan rate of 0 to mean don't scan at all. This fixes
the problem on the Pegasos machines and makes sense as well.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:29:39 +10:00