Commit graph

377568 commits

Author SHA1 Message Date
Vineet Gupta 220f535408 xtensa: Flat DeviceTree copy not future-safe
flat DT copy code calls bootmem allocator with @align = 0.
This is probably OK with legacy allocator which xtensa uses right now,
but this will panic right away with memblock allocator

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Marc Gauthier <marc@tensilica.com>
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:18:57 -07:00
Max Filippov a99e07ee5e xtensa: check TLB sanity on return to userspace
- check that user TLB mappings correspond to the current page table;
- check that TLB mapping VPN is in the kernel/user address range
  in accordance with its ASID.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:18:56 -07:00
Max Filippov c5a771d067 xtensa: adjust boot parameters address when INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX is selected
The virtual address of boot parameters chain is passed to the kernel via
a2 register. Adjust it in case it is remapped during MMUv3 -> MMUv2
mapping change, i.e. when it is in the first 128M.

Also fix interpretation of initrd and FDT addresses passed in the boot
parameters: these are physical addresses.

Cc: stable@vger.kernel.org
Reported-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:18:56 -07:00
Baruch Siach 661b40b036 xtensa: bootparams: fix typo
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:18:55 -07:00
Baruch Siach 77b4dcdbfa xtensa: tell git to ignore generated .dtb files
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:18:55 -07:00
Baruch Siach e3f432919f xtensa: ccount based sched_clock
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:38 -07:00
Baruch Siach 925f5532e8 xtensa: ccount based clockevent implementation
Reused some code from a preliminary implementation by Max Fillippov.

Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:38 -07:00
Baruch Siach 8102f47ab5 xtensa: consolidate ccount access routines
Use get_ccount everywhere; remove xtensa_get_ccount.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:37 -07:00
Baruch Siach e504c4b607 xtensa: cleanup ccount frequency tracking
Remove unused nsec_per_ccount, and rename ccount_per_jiffy to ccount_preq.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:37 -07:00
Baruch Siach ed9dfed62c xtensa: timex.h: remove unused symbols
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:36 -07:00
Baruch Siach f47a3ca224 xtensa: tell git to ignore copied zlib source files
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-07-08 01:11:35 -07:00
Chris Zankel 033d777f54 Merge branch 'timers-core-for-linus' of https://git.kernel.org/cgit/linux/kernel/git/tip/tip into tst5 2013-07-08 01:10:26 -07:00
Thomas Gleixner 73b0cd674c hrtimer: Remove unused variable
Sigh, should have noticed myself.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-06 10:34:00 +02:00
Thomas Gleixner 5ec2481b7b hrtimers: Move SMP function call to thread context
smp_call_function_* must not be called from softirq context.

But clock_was_set() which calls on_each_cpu() is called from softirq
context to implement a delayed clock_was_set() for the timer interrupt
handler. Though that almost never gets invoked. A recent change in the
resume code uses the softirq based delayed clock_was_set to support
Xens resume mechanism.

linux-next contains a new warning which warns if smp_call_function_*
is called from softirq context which gets triggered by that Xen
change.

Fix this by moving the delayed clock_was_set() call to a work context.

Reported-and-tested-by: Artem Savkov <artem.savkov@gmail.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>,
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: xen-devel@lists.xen.org
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-05 17:25:58 +02:00
Thomas Gleixner 332962f2c8 clocksource: Reselect clocksource when watchdog validated high-res capability
Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource, which prevented
that a non highres capable clocksource is used when the system already
switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_freqency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC);
 	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
a reselection of the clocksource, when it qualifies TSC as a valid
high res clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.

To avoid that the clocksource_done_booting() code, which is about to
install the first real clocksource anyway, needs to go through
clocksource_select and tick_oneshot_notify() pointlessly, split out
the clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.

Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-05 11:09:28 +02:00
Thomas Gleixner 2b0f89317e Merge branch 'timers/posix-cpu-timers-for-tglx' of
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04 23:11:22 +02:00
KOSAKI Motohiro fa18f7bde3 posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
When tsk->signal->cputimer->running is 1, signal->cputimer (i.e. per process
timer account) and tsk->sum_sched_runtime (i.e. per thread timer account)
increase at the same pace because update_curr() increases both accounting.

However, there is one exception. When thread exiting, __exit_signal() turns
over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
signal->cputimer accounting.

This inconsistency makes POSIX timer wake up too early. This patch fixes it.

Original-patch-by: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-07-04 18:02:30 +02:00
Frederic Weisbecker a0b2062b09 posix_timers: fix racy timer delta caching on task exit
When a task exits, we perform a caching of the remaining cputime delta
before expiring of its timers.

This is done from the following places:

* When the task is reaped. We iterate through its list of
  posix cpu timers and store the remaining timer delta to
  the timer struct instead of the absolute value.
  (See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )

* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
  If the timer's task is considered dying when watched from these
  places, the same conversion from absolute to relative expiry time
  is performed. Then the given task's reference is released.
  (See clear_dead_task() ).

The relevance of this caching is questionable but this is another
and deeper debate.

The big issue here is that these two sources of caching don't mix
up very well together.

More specifically, the caching can easily be done twice, resulting
in a wrong delta as it gets spuriously substracted a second time by
the elapsed clock. This can happen in the following scenario:

1) The task exits and gets reaped: we call posix_cpu_timers_exit()
   and the absolute timer expiry values are converted to a relative
   delta.

2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
   clear_dead_task() because  tsk->exit_state == EXIT_DEAD.
   The delta gets substracted again by the elapsed clock and we return
   a wrong result.

To fix this, just remove the caching done on task reaping time.  It
doesn't bring much value on its own.  The caching done from
posix_cpu_timer_get/schedule is enough.

And it would also be hard to get it really right: we could make it put and
clear the target task in the timer struct so that readers know if they are
dealing with a relative cached of absolute value.  But it would be racy.
The only safe way to do it would be to lock the itimer->it_lock so that we
know nobody reads the cputime expiry value while we modify it and its
target task reference.  Doing so would involve some funny workarounds to
avoid circular lock against the sighand lock.  There is just no reason to
maintain this.

The user visible effect of this patch can be observed by running the
following code: it creates a subthread that launches a posix cputimer
which expires after 10 seconds. But then the subthread only busy loops for 2
seconds and exits. The parent reaps the subthread and read the timer value.
Its expected value should the be the initial timer's expiration value
minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:

	#include <sys/time.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <time.h>
	#include <pthread.h>

	static timer_t id;
	static struct itimerspec val = { .it_value.tv_sec = 10, }, new;

	static void *thread(void *unused)
	{
		int err;
		struct timeval start, end, diff;

		timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
		if (err < 0) {
			perror("Can't create timer\n");
			return NULL;
		}

		/* Arm 10 sec timer */
		err = timer_settime(id, 0, &val, NULL);
		if (err < 0) {
			perror("Can't set timer\n");
			return NULL;
		}

		/* Exit after 2 seconds of execution */
		gettimeofday(&start, NULL);
	        do {
			gettimeofday(&end, NULL);
			timersub(&end, &start, &diff);
		} while (diff.tv_sec < 2);

		return NULL;
	}

	int main(int argc, char **argv)
	{
		pthread_t pthread;
		int err;

		err = pthread_create(&pthread, NULL, thread, NULL);
		if (err) {
			perror("Can't create thread\n");
			return -1;
		}
		pthread_join(pthread, NULL);
		/* Just wait a little bit to make sure the child got reaped */
		sleep(1);
		err = timer_gettime(id, &new);
		if (err)
			perror("Can't get timer value\n");
		printf("%d %ld\n", new.it_value.tv_sec, new.it_value.tv_nsec);

		return 0;
	}

Before the patch:

       $ ./posix_cpu_timers
       6 2278074

After the patch:

      $ ./posix_cpu_timers
      8 1158766

Before the patch, the elapsed time got two more seconds spuriously accounted.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:54:42 +02:00
Frederic Weisbecker 76cdcdd979 posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
In order to re-arm a timer after it fired, we take a sample of the current
process or thread cputime.

If the task is dying though, we don't arm anything but we cache the
remaining timer expiration delta for further reads.

Something similar is performed in posix_cpu_timer_get() but here we forget
to take the process wide cputime sample before caching it.

As a result we are storing random stack content, leading every further
reads of that timer to return junk values.

Fix this by taking the appropriate sample in the case of process wide
timers.

This probably doesn't matter much in practice because, at this stage, the
thread is the last one in the group and we reached exit_notify().  This
implies that we called exit_itimers() and there should be no more timers
to handle for that task.

So this is likely dead code anyway but let's fix the current logic
and the warning that came along:

    kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
    kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function

Then we can start to think further about cleaning up that code.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen@asianux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:20:20 +02:00
Frederic Weisbecker 0bc4b0cf15 selftests: add basic posix timers selftests
Add some initial basic tests on a few posix timers interface such as
setitimer() and timer_settime().

These simply check that expiration happens in a reasonable timeframe after
expected elapsed clock time (user time, user + system time, real time,
...).

This is helpful for finding basic breakages while hacking
on this subsystem.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:20:03 +02:00
Frederic Weisbecker 2473f3e7a9 posix_cpu_timers: consolidate expired timers check
Consolidate the common code amongst per thread and per process timers list
on tick time.

List traversal, expiry check and subsequent updates can be shared in a
common helper.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:19:23 +02:00
Frederic Weisbecker 1a7fa510b3 posix_cpu_timers: consolidate timer list cleanups
Cleaning up the posix cpu timers on task exit shares some common code
among timer list types, most notably the list traversal and expiry time
update.

Unify this in a common helper.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:18:37 +02:00
Frederic Weisbecker 55ccb616a6 posix_cpu_timer: consolidate expiry time type
The posix cpu timer expiry time is stored in a union of two types: a 64
bits field if we rely on scheduler precise accounting, or a cputime_t if
we rely on jiffies.

This results in quite some duplicate code and special cases to handle the
two types.

Just unify this into a single 64 bits field.  cputime_t can always fit
into it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2013-07-03 16:16:20 +02:00
Thomas Gleixner 07bd117290 tick: Sanitize broadcast control logic
The recent implementation of a generic dummy timer resulted in a
different registration order of per cpu local timers which made the
broadcast control logic go belly up.

If the dummy timer is the first clock event device which is registered
for a CPU, then it is installed, the broadcast timer is initialized
and the CPU is marked as broadcast target.

If a real clock event device is installed after that, we can fail to
take the CPU out of the broadcast mask. In the worst case we end up
with two periodic timer events firing for the same CPU. One from the
per cpu hardware device and one from the broadcast.

Now the problem is that we have no way to distinguish whether the
system is in a state which makes broadcasting necessary or the
broadcast bit was set due to the nonfunctional dummy timer
installment.

To solve this we need to keep track of the system state seperately and
provide a more detailed decision logic whether we keep the CPU in
broadcast mode or not.

The old decision logic only clears the broadcast mode, if the newly
installed clock event device is not affected by power states.

The new logic clears the broadcast mode if one of the following is
true:

  - The new device is not affected by power states.

  - The system is not in a power state affected mode

  - The system has switched to oneshot mode. The oneshot broadcast is
    controlled from the deep idle state. The CPU is not in idle at
    this point, so it's safe to remove it from the mask.

If we clear the broadcast bit for the CPU when a new device is
installed, we also shutdown the broadcast device when this was the
last CPU in the broadcast mask.

If the broadcast bit is kept, then we leave the new device in shutdown
state and rely on the broadcast to deliver the timer interrupts via
the broadcast ipis.

Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-02 14:26:45 +02:00
Thomas Gleixner 1f73a9806b tick: Prevent uncontrolled switch to oneshot mode
When the system switches from periodic to oneshot mode, the broadcast
logic causes a possibility that a CPU which has not yet switched to
oneshot mode puts its own clock event device into oneshot mode without
updating the state and the timer handler.

CPU0				CPU1
				per cpu tickdev is in periodic mode
				and switched to broadcast

Switch to oneshot mode
 tick_broadcast_switch_to_oneshot()
  cpumask_copy(tick_oneshot_broacast_mask,
	       tick_broadcast_mask);

  broadcast device mode = oneshot

				Timer interrupt
						
				irq_enter()
				 tick_check_oneshot_broadcast()
				  dev->set_mode(ONESHOT);

				tick_handle_periodic()
				 if (dev->mode == ONESHOT)
				   dev->next_event += period;
				   FAIL.

We fail, because dev->next_event contains KTIME_MAX, if the device was
in periodic mode before the uncontrolled switch to oneshot happened.

We must copy the broadcast bits over to the oneshot mask, because
otherwise a CPU which relies on the broadcast would not been woken up
anymore after the broadcast device switched to oneshot mode.

So we need to verify in tick_check_oneshot_broadcast() whether the CPU
has already switched to oneshot mode. If not, leave the device
untouched and let the CPU switch controlled into oneshot mode.

This is a long standing bug, which was never noticed, because the main
user of the broadcast x86 cannot run into that scenario, AFAICT. The
nonarchitected timer mess of ARM creates a gazillion of differently
broken abominations which trigger the shortcomings of that broadcast
code, which better had never been necessary in the first place.

Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-02 14:26:45 +02:00
Thomas Gleixner c9b5a266b1 tick: Make oneshot broadcast robust vs. CPU offlining
In periodic mode we remove offline cpus from the broadcast propagation
mask. In oneshot mode we fail to do so. This was not a problem so far,
but the recent changes to the broadcast propagation introduced a
constellation which can result in a NULL pointer dereference.

What happens is:

CPU0			CPU1
			idle()
			  arch_idle()
			    tick_broadcast_oneshot_control(OFF);
			      set cpu1 in tick_broadcast_force_mask
			  if (cpu_offline())
			     arch_cpu_dead()

cpu_dead_cleanup(cpu1)
 cpu1 tickdevice pointer = NULL

broadcast interrupt
  dereference cpu1 tickdevice pointer -> OOPS

We dereference the pointer because cpu1 is still set in
tick_broadcast_force_mask and tick_do_broadcast() expects a valid
cpumask and therefor lacks any further checks.

Remove the cpu from the tick_broadcast_force_mask before we set the
tick device pointer to NULL. Also add a sanity check to the oneshot
broadcast function, so we can detect such issues w/o crashing the
machine.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: athorlton@sgi.com
Cc: CAI Qian <caiqian@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-02 14:26:44 +02:00
Linus Torvalds 8bb495e3f0 Linux 3.10 2013-06-30 15:13:29 -07:00
Linus Torvalds f0277dce1b Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull another powerpc fix from Benjamin Herrenschmidt:
 "I mentioned that while we had fixed the kernel crashes, EEH error
  recovery didn't always recover...  It appears that I had a fix for
  that already in powerpc-next (with a stable CC).

  I cherry-picked it today and did a few tests and it seems that things
  now work quite well.  The patch is also pretty simple, so I see no
  reason to wait before merging it."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/eeh: Fix fetching bus for single-dev-PE
2013-06-30 15:08:15 -07:00
Linus Torvalds 4b483802fd SCSI fixes on 20130626
This is a set of seven bug fixes.  Several fcoe fixes for locking problems,
 initiator issues and a VLAN API change, all of which could eventually lead to
 data corruption, one fix for a qla2xxx locking problem which could lead to
 multiple completions of the same request (and subsequent data corruption) and
 a use after free in the ipr driver.  Plus one minor MAINTAINERS file update
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRy9qEAAoJEDeqqVYsXL0MIoUH/RkCyVWZqj2utTMXg0olPxm/
 GE2OC5X944mBrMeY0wSfEj58NQoo9cRiXGTxZLl+X0lRHbTU4CCGUAAEu/0o5N/M
 Gqtmk3Gn+iY819F0gYqs/En4IjLPsifonXaamHA1341NlNjDqb6IQdOQN8qjlfQT
 aDebRuzS/z5jFO8pviem+GD2FDVSCdkM24SeJLxqxoyuOR77W/3n6bjlVY1jkCoI
 lFA5k9OeqQRiEGqR+Da2nLWmCPt85R+qjNzxTqFmF2gh+Z/VW1hwAcjWz+/xSc4O
 V7d/ZN9Qhk31PY3oQ1Q1jJU0fW95bqOo6dZvGrLkOVc1FkaFgjMF7RQuA8rVxRI=
 =xWnQ
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of seven bug fixes.  Several fcoe fixes for locking
  problems, initiator issues and a VLAN API change, all of which could
  eventually lead to data corruption, one fix for a qla2xxx locking
  problem which could lead to multiple completions of the same request
  (and subsequent data corruption) and a use after free in the ipr
  driver.  Plus one minor MAINTAINERS file update"

(only six bugfixes in this pull, since I had already pulled the fcoe API
fix directly from Robert Love)

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] ipr: Avoid target_destroy accessing memory after it was freed
  [SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
  MAINTAINERS: Fix fcoe mailing list
  libfc: extend ex_lock to protect all of fc_seq_send
  libfc: Correct check for initiator role
  libfcoe: Fix Conflicting FCFs issue in the fabric
2013-06-30 15:06:25 -07:00
Gavin Shan ea461abf61 powerpc/eeh: Fix fetching bus for single-dev-PE
While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.

Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-30 14:08:34 +10:00
Linus Torvalds 6c355beafd Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "We discovered some breakage in our "EEH" (PCI Error Handling) code
  while doing error injection, due to a couple of regressions.  One of
  them is due to a patch (37f02195be "powerpc/pci: fix PCI-e devices
  rescan issue on powerpc platform") that, in hindsight, I shouldn't
  have merged considering that it caused more problems than it solved.

  Please pull those two fixes.  One for a simple EEH address cache
  initialization issue.  The other one is a patch from Guenter that I
  had originally planned to put in 3.11 but which happens to also fix
  that other regression (a kernel oops during EEH error handling and
  possibly hotplug).

  With those two, the couple of test machines I've hammered with error
  injection are remaining up now.  EEH appears to still fail to recover
  on some devices, so there is another problem that Gavin is looking
  into but at least it's no longer crashing the kernel."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pci: Improve device hotplug initialization
  powerpc/eeh: Add eeh_dev to the cache during boot
2013-06-29 17:02:48 -07:00
Olof Johansson 8d5bc1a6ac ARM: dt: Only print warning, not WARN() on bad cpu map in device tree
Due to recent changes and expecations of proper cpu bindings, there are
now cases for many of the in-tree devicetrees where a WARN() will hit
on boot due to badly formatted /cpus nodes.

Downgrade this to a pr_warn() to be less alarmist, since it's not a
new problem.

Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN
without this, the others do not.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29 17:00:40 -07:00
Guenter Roeck 7846de406f powerpc/pci: Improve device hotplug initialization
Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.

This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, ie typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.

The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.

To fix the inconsistencies, add new function pcibios_add_device.
Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
is not complete, and from pcibios_add_device if bus setup is complete.

With this change, device setup code is moved back into device initialization,
and called exactly once for both static and hot plugged devices.

[ This also fixes a regression introduced by the above patch which
  causes dev->irq to be overwritten under some cirumstances after
  MSIs have been enabled for the device which leads to crashes due
  to the MSI core "hijacking" dev->irq to store the base MSI number
  and not the LSI. --BenH
]

Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-30 08:46:46 +10:00
Linus Torvalds 133841cab7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a crash in the crypto layer exposed by an SCTP test tool"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algboss - Hold ref count on larval
2013-06-29 11:34:18 -07:00
Linus Torvalds 6554431937 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm/qxl fix from Dave Airlie:
 "Bad me forgot an access check, possible security issue, but since this
  is the first kernel with it, should be fine to just put it in now"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/qxl: add missing access check for execbuffer ioctl
2013-06-29 11:32:05 -07:00
Mathieu Desnoyers 706b23bde2 Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation
This __put_user() could be used by unprivileged processes to write into
kernel memory.  The issue here is that even if copy_siginfo_to_user()
fails, the error code is not checked before __put_user() is executed.

Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
so it has not hit a stable release yet.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29 11:29:08 -07:00
Linus Torvalds bd2931b5cf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "This is a recently spotted regression in the snapshot behavior...

  It turns out several tests weren't being run in the nightlies so this
  took a while to spot"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: send snapshot context with writes
2013-06-29 10:31:15 -07:00
Linus Torvalds 63edbce160 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ubifs fixes from Al Viro:
 "A couple of ubifs readdir/lseek race fixes.  Stable fodder, really
  nasty..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  UBIFS: fix a horrid bug
  UBIFS: prepare to fix a horrid bug
2013-06-29 10:30:31 -07:00
Linus Torvalds a61aef7fc0 MN10300 changes 2013-06-28
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIVAwUAUc20KxOxKuMESys7AQIjwQ/9HVdEuh3fGHnagqq3N18n+hvLIeczAFpf
 8DO2YvIEdi+qw3AQoUexryeRjAPpwAWcyujdDXArC/VPzNjl15QgUhIiTjeBD07Y
 8Y9Iwr5PtKJOAvbw6YE45OfpY1vPBk9VpsqQjreaSRszjZroC5ApMwyvL0Lxb85V
 rWF+sIaFW+1tc+BT6Qaj9JiAR5A0SDRcD3H65jVRENvekFrCY0mqS+ay5LOK5GFi
 wuK2hmSnvm7+aJukVuBYIwkq1ZvoKpcOAjMSet1P9TzEf6lC4Xs3vGX4LoRkuATU
 jQVRpyxDIbdyDzqhbGcsfC+ydipr4RkqyeCs04itpO75D0EJ0ByZVwHPoRMj8ZG6
 CSEno2PrcLMo4a7O+rdCxOpGxqZrRViEBI9nVQfVft/yw0ZWzxLFqrjOFOS9sbWW
 GePY0B4DN68pq9q/wkdLRskw/rWzMTnDN078HemVf3T60MZxTUjJEh0EXpmq/qd1
 0e7Yrrs1WYokh0oEvvjVZg1kQj9jYBsglgKIiDiyqbVKhPwoTXLYKdui5vLu7Djp
 j/qZsqFXh1k3edepoqp/INVXJuoyAJpkbOh2ruSr5I1jMVG7byq4Zo6TCJOoLT/h
 Hs4wZCXni6oWPircrS6SrTpjKyCVYCsDHX8efu+551DzWi/oJ1z147hMXks628HI
 ohJoCnRj5Lg=
 =5e/L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300

Pull two MN10300 fixes from David Howells:
 "The first fixes a problem with passing arrays rather than pointers to
  get_user() where __typeof__ then wants to declare and initialise an
  array variable which gcc doesn't like.

  The second fixes a problem whereby putting mem=xxx into the kernel
  command line causes init=xxx to get an incorrect value."

* tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
  mn10300: Use early_param() to parse "mem=" parameter
  mn10300: Allow to pass array name to get_user()
2013-06-29 10:28:52 -07:00
Linus Torvalds a75930c633 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "Correct an ordering issue in the tick broadcast code.  I really wish
  we'd get compensation for pain and suffering for each line of code we
  write to work around dysfunctional timer hardware."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Fix tick_broadcast_pending_mask not cleared
2013-06-29 10:27:19 -07:00
Linus Torvalds 82d0b80ad6 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
 "One more fix for a recently discovered bug"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Disable monitoring on setuid processes for regular users
2013-06-29 10:26:50 -07:00
Artem Bityutskiy 605c912bb8 UBIFS: fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes an
security holes.

This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.

I tested this patch by writing a user-space program which runds readdir and
seek in parallell. I could easily crash the kernel without these patches, but
could not crash it with these patches.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
Artem Bityutskiy 33f1a63ae8 UBIFS: prepare to fix a horrid bug
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it.  But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.

In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to returning inconsistent data: directory entry names
may correspond to incorrect file positions.

So here we introduce a local variable 'pos', read 'file->f_pose' once at very
the beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:45:37 +04:00
David Vrabel 47433b8c9d x86: xen: Sync the CMOS RTC as well as the Xen wallclock
Adjustments to Xen's persistent clock via update_persistent_clock()
don't actually persist, as the Xen wallclock is a software only clock
and modifications to it do not modify the underlying CMOS RTC.

The x86_platform.set_wallclock hook is there to keep the hardware RTC
synchronized. On a guest this is pointless.

On Dom0 we can use the native implementaion which actually updates the
hardware RTC, but we still need to keep the software emulation of RTC
for the guests up to date. The subscription to the pvclock_notifier
allows us to emulate this easily. The notifier is called at every tick
and when the clock was set.

Right now we only use that notifier when the clock was set, but due to
the fact that it is called periodically from the timekeeping update
code, we can utilize it to emulate the NTP driven drift compensation
of update_persistant_clock() for the Xen wall (software) clock.

Add a 11 minutes periodic update to the pvclock_gtod notifier callback
to achieve that. The static variable 'next' which maintains that 11
minutes update cycle is protected by the core code serialization so
there is no need to add a Xen specific serialization mechanism.

[ tglx: Massaged changelog and added a few comments ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:07 +02:00
David Vrabel 5584880e44 x86: xen: Sync the wallclock when the system time is set
Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source (using the sync_cmos_clock() work).
If a guest is started before NTP is synchronized it may see an
incorrect wallclock time.

Use the pvclock_gtod notifier chain to receive a notification when the
system time has changed and update the wallclock to match.

This chain is called on every timer tick and we want to avoid an extra
(expensive) hypercall on every tick.  Because dom0 has historically
never provided a very accurate wallclock and guests do not expect one,
we can do this simply: the wallclock is only updated if the clock was
set.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
David Vrabel 780427f0e1 timekeeping: Indicate that clock was set in the pvclock gtod notifier
If the clock was set (stepped), set the action parameter to functions
in the pvclock gtod notifier chain to non-zero.  This allows the
callee to only do work if the clock was stepped.

This will be used on Xen as the synchronization of the Xen wallclock
to the control domain's (dom0) system time will be done with this
notifier and updating on every timer tick is unnecessary and too
expensive.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-4-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
David Vrabel 04397fe94a timekeeping: Pass flags instead of multiple bools to timekeeping_update()
Instead of passing multiple bools to timekeeping_updated(), define
flags and use a single 'action' parameter.  It is then more obvious
what each timekeeping_update() call does.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-3-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
David Vrabel 0eb0716514 xen: Remove clock_was_set() call in the resume path
commit 359cdd3f866(xen: maintain clock offset over save/restore) added
a clock_was_set() call into the xen resume code to propagate the
system time changes. With the modified hrtimer resume code, which
makes sure that all cpus are notified this call is not longer necessary.

[ tglx: Separated it from the hrtimer change ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk  <konrad.wilk@oracle.com>
Cc: John Stultz  <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
David Vrabel 7c4c3a0f18 hrtimers: Support resuming with two or more CPUs online (but stopped)
hrtimers_resume() only reprograms the timers for the current CPU as it
assumes that all other CPUs are offline at this point in the resume
process. If other CPUs are online then their timers will not be
corrected and they may fire at the wrong time.

When running as a Xen guest, this assumption is not true.  Non-boot
CPUs are only stopped with IRQs disabled instead of offlining them.
This is a performance optimization as disabling the CPUs would add an
unacceptable amount of additional downtime during a live migration (>
200 ms for a 4 VCPU guest).

hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...)
as the other CPUs will be stopped with IRQs disabled.  Instead, defer
the call to the next softirq.

[ tglx: Separated the xen change out ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk  <konrad.wilk@oracle.com>
Cc: John Stultz  <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
Akira Takeuchi e3f12a5304 mn10300: Use early_param() to parse "mem=" parameter
This fixes the problem that "init=" options may not be passed to kernel
correctly.

parse_mem_cmdline() of mn10300 arch gets rid of "mem=" string from
redboot_command_line. Then init_setup() parses the "init=" options from
static_command_line, which is a copy of redboot_command_line, and keeps
the pointer to the init options in execute_command variable.

Since the commit 026cee0 upstream (params: <level>_initcall-like kernel
parameters), static_command_line becomes overwritten by saved_command_line at
do_initcall_level(). Notice that saved_command_line is a command line
which includes "mem=" string.

As a result, execute_command may point to weird string by the length of
"mem=" parameter.
I noticed this problem when using the command line like this:

    mem=128M console=ttyS0,115200 init=/bin/sh

Here is the processing flow of command line parameters.
    start_kernel()
      setup_arch(&command_line)
         parse_mem_cmdline(cmdline_p)
           * strcpy(boot_command_line, redboot_command_line);
           * Remove "mem=xxx" from redboot_command_line.
           * *cmdline_p = redboot_command_line;
      setup_command_line(command_line) <-- command_line is redboot_command_line
        * strcpy(saved_command_line, boot_command_line)
        * strcpy(static_command_line, command_line)
      parse_early_param()
        strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
        parse_early_options(tmp_cmdline);
          parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param);
      parse_args("Booting ..", static_command_line, ...);
        init_setup() <-- save the pointer in execute_command
      rest_init()
        kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);

At this point, execute_command points to "/bin/sh" string.

    kernel_init()
      kernel_init_freeable()
        do_basic_setup()
          do_initcalls()
            do_initcall_level()
              (*) strcpy(static_command_line, saved_command_line);

Here, execute_command gets to point to "200" string !!

Signed-off-by: David Howells <dhowells@redhat.com>
2013-06-28 16:53:03 +01:00