linux/kernel
Jason Wessel 9c106c119e softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
The touch_nmi_watchdog() routine on x86 ultimately calls
touch_softlockup_watchdog().  The problem is that to touch the
softlockup watchdog, the cpu_clock code has to be called which could
involve multiple cpu locks and can lead to a hard hang if one of the
locks is held by a processor that is not going to return anytime soon
(such as could be the case with kgdb or perhaps even with some other
kind of exception).

This patch causes the public version of the
touch_softlockup_watchdog() to defer the cpu clock access to a later
point.

The test case for this problem is to use the following kernel config
options:

CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="V1F100I100000"

It should be noted that kgdb test suite and these options were not
available until 2.6.26-rc2, so it was necessary to patch the kgdb
test suite during the bisection.

I would consider this patch a regression fix because the problem first
appeared in commit 27ec440779 when some
logic was added to try to periodically sync the clocks.  It was
possible to work around this particular problem by simply not
performing the sync anytime the system was in a critical context.
This was ok until commit 3e51f33fcc,
which added config option CONFIG_HAVE_UNSTABLE_SCHED_CLOCK and some
multi-cpu locks to sync the clocks.  It became clear that accessing
this code from an nmi was the source of the lockups.  Avoiding the
access to the low level clock code from an code inside the NMI
processing also fixed the problem with the 27ec44... commit.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 09:45:38 +02:00
..
irq genirq: reenable a nobody cared disabled irq when a new driver arrives 2008-05-02 13:40:34 +02:00
power
time clocksource: allow read access to available/current_clocksource 2008-05-03 18:11:48 +02:00
.gitignore
acct.c
audit.c [patch 1/1] audit_send_reply(): fix error-path memory leak 2008-05-17 03:30:22 -04:00
audit.h
audit_tree.c [PATCH] list_for_each_rcu must die: audit 2008-05-17 03:30:23 -04:00
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c capabilities: remain source compatible with 32-bit raw legacy capability support. 2008-05-31 16:36:16 -07:00
cgroup.c cgroups: remove node_ prefix_from ns subsystem 2008-05-24 09:56:14 -07:00
cgroup_debug.c
compat.c
configs.c
cpu.c
cpuset.c cpuset: limit the input of cpuset.sched_relax_domain_level 2008-06-19 09:45:36 +02:00
delayacct.c
dma.c
exec_domain.c
exit.c signals: fix sigqueue_free() vs __exit_signal() race 2008-05-24 09:56:10 -07:00
extable.c
fork.c [PATCH] dup_fd() fixes, part 1 2008-05-16 17:22:26 -04:00
futex.c Removal of FUTEX_FD 2008-05-05 08:18:45 -07:00
futex_compat.c
hrtimer.c hrtimer: remove duplicate helper function 2008-05-03 18:11:48 +02:00
itimer.c
kallsyms.c
Kconfig.hz
Kconfig.preempt
kexec.c
kfifo.c
kgdb.c kgdb: use common ascii helpers and put_unaligned_be32 helper 2008-05-28 12:49:56 -05:00
kmod.c
kprobes.c kprobes: fix error checking of batch registration 2008-06-12 18:05:40 -07:00
ksysfs.c
kthread.c
latencytop.c
lockdep.c
lockdep_internals.h
lockdep_proc.c
Makefile sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2008-05-05 23:56:18 +02:00
marker.c
module.c modules: proper cleanup of kobject without CONFIG_SYSFS 2008-05-23 13:09:33 +10:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c
params.c
pid.c
pid_namespace.c
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c
printk.c
profile.c
ptrace.c make generic sys_ptrace unconditional 2008-05-01 10:21:54 -07:00
rcuclassic.c
rcupdate.c
rcupreempt.c rcupreempt: remove export of rcu_batches_completed_bh 2008-06-19 09:45:37 +02:00
rcupreempt_trace.c
rcutorture.c
relay.c splice: fix sendfile() issue with relay 2008-05-28 14:49:27 +02:00
res_counter.c
resource.c
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c cpuset: limit the input of cpuset.sched_relax_domain_level 2008-06-19 09:45:36 +02:00
sched_clock.c sched: fix sched_clock_cpu() 2008-05-29 11:29:19 +02:00
sched_debug.c revert ("sched: fair-group: SMP-nice for group scheduling") 2008-05-29 11:28:57 +02:00
sched_fair.c sched: stop wake_affine from causing serious imbalance 2008-05-29 11:29:20 +02:00
sched_features.h
sched_idletask.c sched: make rt_sched_class, idle_sched_class static 2008-05-05 23:56:17 +02:00
sched_rt.c revert ("sched: fair-group: SMP-nice for group scheduling") 2008-05-29 11:28:57 +02:00
sched_stats.h show_schedstat(): fix memleak 2008-05-29 11:25:15 +02:00
seccomp.c
semaphore.c Revert "semaphore: fix" 2008-05-10 20:43:22 -07:00
signal.c posix timers: discard SI_TIMER signals on exec 2008-05-26 10:37:07 -07:00
softirq.c
softlockup.c softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression 2008-06-19 09:45:38 +02:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c stop_machine: make stop_machine_run more virtualization friendly 2008-05-23 13:09:34 +10:00
sys.c sys_prctl(): fix return of uninitialized value 2008-05-24 09:56:13 -07:00
sys_ni.c
sysctl.c [PATCH] avoid multiplication overflows and signedness issues for max_fds 2008-05-16 17:22:52 -04:00
sysctl_check.c
taskstats.c
test_kprobes.c
time.c Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timeconst.pl Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timer.c
tsacct.c
uid16.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
wait.c
workqueue.c