linux/kernel
David Miller b9f8fcd55b sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
Relax stable-sched-clock architectures to not save/disable/restore
hardirqs in cpu_clock().

The background is that I was trying to resolve a sparc64 perf
issue when I discovered this problem.

On sparc64 I implement pseudo NMIs by simply running the kernel
at IRQ level 14 when local_irq_disable() is called, this allows
performance counter events to still come in at IRQ level 15.

This doesn't work if any code in an NMI handler does
local_irq_save() or local_irq_disable() since the "disable" will
kick us back to cpu IRQ level 14 thus letting NMIs back in and
we recurse.

The only path which that does that in the perf event IRQ
handling path is the code supporting frequency based events.  It
uses cpu_clock().

cpu_clock() simply invokes sched_clock() with IRQs disabled.

And that's a fundamental bug all on it's own, particularly for
the HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
sched_clock() code interrupting the local IRQ disable code
sections of it.

Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
disabling done by cpu_clock() is just pure overhead and
completely unnecessary.

So the core problem is that sched_clock() is not NMI safe, but
we are invoking it from NMI contexts in the perf events code
(via cpu_clock()).

A less important issue is the overhead of IRQ disabling when it
isn't necessary in cpu_clock().

CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
affected by this patch.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091213.182502.215092085.davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 09:04:36 +01:00
..
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
power PM / Runtime: Export the PM runtime workqueue 2009-12-06 16:17:56 +01:00
time hrtimer: Tune hrtimer_interrupt hang logic 2009-12-10 13:08:11 +01:00
trace Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-11 20:47:44 -08:00
.gitignore
acct.c bsdacct: switch credentials for writing to the accounting file 2009-08-24 11:33:40 +10:00
async.c async: Fix lack of boot-time console due to insufficient synchronization 2009-06-08 12:31:53 -07:00
audit.c Audit: send signal info if selinux is disabled 2009-09-24 03:50:26 -04:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_tree.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_watch.c Audit: reorganize struct audit_watch to save 8 bytes 2009-09-24 03:50:25 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Audit: rearrange audit_context to save 16 bytes per struct 2009-09-24 03:50:26 -04:00
backtracetest.c
bounds.c
capability.c remove CONFIG_SECURITY_FILE_CAPABILITIES compile option 2009-11-24 15:06:47 +11:00
cgroup.c cgroup: fix strstrip() misuse 2009-10-29 07:39:25 -07:00
cgroup_freezer.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
compat.c signals: implement sys_rt_tgsigqueueinfo 2009-04-30 19:24:24 +02:00
configs.c
cpu.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-12 11:34:10 -08:00
cpuset.c sched: Fix balance vs hotplug race 2009-12-06 21:10:56 +01:00
cred-internals.h
cred.c creds_are_invalid() needs to be exported for use by modules: 2009-09-23 11:02:26 -07:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
exec_domain.c
exit.c tty: Move the leader test in disassociate 2009-12-11 15:18:08 -08:00
extable.c
fork.c Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block 2009-12-08 08:19:16 -08:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex.c futex: Take mmap_sem for get_user_pages in fault_in_user_writeable 2009-12-08 14:59:36 +01:00
futex_compat.c futex: Fix compat_futex to be same as futex for REQUEUE_PI 2009-08-10 15:41:12 +02:00
groups.c groups: move code to kernel/groups.c 2009-06-16 19:47:48 -07:00
hrtimer.c hrtimer: move timer stats helper functions to hrtimer.c 2009-12-10 13:08:11 +01:00
hung_task.c softlockup: Fix hung_task_check_count sysctl 2009-11-27 06:21:57 +01:00
hw_breakpoint.c hw-breakpoints: Modify breakpoints without unregistering them 2009-12-09 09:48:20 +01:00
itimer.c itimers: Fix racy writes to cpu_itimer fields 2009-11-18 16:32:12 +01:00
kallsyms.c hw-breakpoints: Fix broken hw-breakpoint sample module 2009-11-10 11:23:29 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
Kconfig.preempt
kexec.c kexec: fix omitting offset in extended crashkernel syntax 2009-07-29 19:10:34 -07:00
kfifo.c kfifo: Use "const" definitions 2009-09-19 13:13:17 -07:00
kgdb.c kgdb: Always process the whole breakpoint list on activate or deactivate 2009-12-11 08:43:20 -06:00
kmod.c security: report the module name to security_module_request 2009-11-10 09:33:46 +11:00
kprobes.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-05 15:30:21 -08:00
ksysfs.c
kthread.c sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c 2009-11-03 07:25:00 +01:00
latencytop.c
lockdep.c lockdep: Avoid out of bounds array reference in save_trace() 2009-12-10 08:29:33 +01:00
lockdep_internals.h lockdep: BFS cleanup 2009-07-24 10:53:29 +02:00
lockdep_proc.c seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
lockdep_states.h
Makefile Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2009-12-08 08:02:38 -08:00
module.c modules: don't export section names of empty sections via sysfs 2009-12-02 15:38:25 -08:00
mutex-debug.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
mutex-debug.h
mutex.c mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
mutex.h
notifier.c kprobes: Fix to add __kprobes to notify_die 2009-08-30 03:08:26 +02:00
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c nsproxy: extract create_nsproxy() 2009-06-18 13:03:56 -07:00
panic.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-08 12:16:35 -07:00
params.c param: fix setting arrays of bool 2009-10-29 08:56:20 +10:30
perf_event.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-11 20:47:30 -08:00
pid.c mm: also use alloc_large_system_hash() for the PID hash table 2009-09-22 07:17:38 -07:00
pid_namespace.c pidns: deny CLONE_PARENT|CLONE_NEWPID combination 2009-09-24 07:21:04 -07:00
pm_qos_params.c pm_qos: clean up racy global "name" variable 2009-10-14 15:31:10 +02:00
posix-cpu-timers.c posix-cpu-timers: optimize and document timer_create callback 2009-11-18 12:36:05 +01:00
posix-timers.c time: Introduce CLOCK_REALTIME_COARSE 2009-08-21 21:43:46 +02:00
printk.c Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-05 09:50:22 -08:00
profile.c kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file 2009-09-20 20:15:40 +02:00
ptrace.c ptrace: __ptrace_detach: do __wake_up_parent() if we reap the tracee 2009-09-24 07:20:59 -07:00
rcupdate.c rcu: Re-arrange code to reduce #ifdef pain 2009-11-22 18:58:16 +01:00
rcutiny.c rcu: Eliminate unneeded function wrapping 2009-11-22 18:58:16 +01:00
rcutorture.c rcu: Add expedited grace-period support for preemptible RCU 2009-12-03 11:35:25 +01:00
rcutree.c rcu: Add expedited grace-period support for preemptible RCU 2009-12-03 11:35:25 +01:00
rcutree.h rcu: Add expedited grace-period support for preemptible RCU 2009-12-03 11:35:25 +01:00
rcutree_plugin.h rcu: Add expedited grace-period support for preemptible RCU 2009-12-03 11:35:25 +01:00
rcutree_trace.c rcu: Add expedited grace-period support for preemptible RCU 2009-12-03 11:35:25 +01:00
relay.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
res_counter.c memcg: some modification to softlimit under hierarchical memory reclaim. 2009-10-01 16:11:13 -07:00
resource.c resources: when allocate_resource() fails, leave resource untouched 2009-11-04 13:06:46 -08:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock() 2009-08-06 05:50:21 +02:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: Use rcu in sched_get_rr_param() 2009-12-14 17:11:35 +01:00
sched_clock.c sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2009-12-15 09:04:36 +01:00
sched_cpupri.c sched: Add new prio to cpupri before removing old prio 2009-08-02 14:26:09 +02:00
sched_cpupri.h
sched_debug.c sched: Remove forced2_migrations stats 2009-12-10 20:32:39 +01:00
sched_fair.c sched: Update normalized values on user updates via proc 2009-12-09 10:04:02 +01:00
sched_features.h sched: Discard some old bits 2009-12-09 10:03:07 +01:00
sched_idletask.c sched: Use pr_fmt() and pr_<level>() 2009-12-13 08:13:55 +01:00
sched_rt.c sched: Protect sched_rr_get_param() access to task->sched_class 2009-12-09 10:01:07 +01:00
sched_stats.h
seccomp.c
semaphore.c
signal.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-05 15:30:21 -08:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 2009-12-08 07:38:50 -08:00
slow-work.h SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
smp.c generic-ipi: Add smp_call_function_any() 2009-11-18 14:52:25 +01:00
softirq.c rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls 2009-11-02 16:06:37 +01:00
softlockup.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
spinlock.c locking: Reduce ifdefs in kernel/spinlock.c 2009-11-13 20:53:28 +01:00
srcu.c rcu: Add synchronize_srcu_expedited() 2009-10-26 09:40:30 +01:00
stacktrace.c
stop_machine.c
sys.c Merge branch 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-09 08:07:17 -08:00
sys_ni.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
sysctl.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-12 11:34:10 -08:00
sysctl_binary.c sysctl binary: Reorder the tests to process wild card entries first. 2009-11-12 01:42:31 -08:00
sysctl_check.c ipv4 05/05: add sysctl to accept packets with local source addresses 2009-12-03 12:14:38 -08:00
taskstats.c genetlink: make netns aware 2009-07-12 14:03:27 -07:00
test_kprobes.c
time.c Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-08 19:27:08 -08:00
timeconst.pl
timer.c Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-23 09:46:15 -07:00
tracepoint.c trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
tsacct.c
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user-return-notifier.c core: Clean up user return notifers use of per_cpu 2009-12-02 10:22:59 +01:00
user.c uids: Prevent tear down race 2009-11-02 16:02:39 +01:00
user_namespace.c
utsname.c utsns: extract creeate_uts_ns() 2009-06-18 13:03:55 -07:00
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
wait.c locking, sched: Give waitqueue spinlocks their own lockdep classes 2009-08-10 14:43:09 +02:00
workqueue.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2009-12-10 09:35:44 -08:00