linux/kernel
Paul E. McKenney a682604838 rcu: Teach RCU that idle task is not quiscent state at boot
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
and function-name language.  ;-)

The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state.  This in turn causes
RCU to prematurely end grace periods during this time.

This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().

Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1.  This allows boot-time code that
calls synchronize_rcu() to proceed normally.  Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence.  Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.

In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.

Changes since v4:

o	Changes the name of the introduced function and variable to
	be less emotional.  ;-)

Changes since v3:

o	WARN_ON(nr_context_switches() > 0) to verify that RCU
	switches out of boot-time mode before the first context
	switch, as suggested by Nick Piggin.

Changes since v2:

o	Created rcu_blocking_is_gp() internal-to-RCU API that
	determines whether a call to synchronize_rcu() is itself
	a grace period.

o	The definition of rcu_blocking_is_gp() for rcuclassic and
	rcutree checks to see if but a single CPU is online.

o	The definition of rcu_blocking_is_gp() for rcupreempt
	checks to see both if but a single CPU is online and if
	the system is still in early boot.

	This allows rcupreempt to again work correctly if running
	on a single CPU after booting is complete.

o	Added check to rcupreempt's synchronize_sched() for there
	being but one online CPU.

Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.

Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 04:08:14 +01:00
..
irq Merge branch 'core/xen' into x86/urgent 2009-02-04 14:54:56 +01:00
power PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
time hrtimers: allow the hot-unplugging of all cpus 2009-01-30 22:35:29 +01:00
trace tracing: limit the number of loops the ring buffer self test can make 2009-02-18 22:50:01 -05:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: use list_move_tail 2009-02-08 10:00:26 -08:00
audit.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
audit_tree.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditfilter.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditsc.c make sure that filterkey of task,always rules is reported 2009-01-04 15:14:42 -05:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup.c cgroups: fix possible use after free 2009-02-18 15:37:54 -08:00
cgroup_debug.c cgroups: fix probable race with put_css_set[_taskexit] and find_css_set 2008-10-20 08:52:38 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Allow times and time system calls to return small negative values 2009-01-06 15:59:13 -08:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c stop_machine/cpu hotplug: fix disable_nonboot_cpus 2009-01-07 11:36:14 -08:00
cpuset.c cpuset: fix possible deadlock in async_rebuild_sched_domains 2009-01-19 02:44:00 +01:00
cred-internals.h CRED: Inaugurate COW credentials 2008-11-14 10:39:23 +11:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c schedstat: consolidate per-task cpu runtime stats 2008-12-18 13:54:01 +01:00
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c kernel/dma.c: remove a CVS keyword 2008-10-16 11:21:30 -07:00
exec_domain.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
exit.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
extable.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
fork.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-02-11 08:24:32 -08:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex.c futex: fix reference leak 2009-02-11 18:24:08 +01:00
futex_compat.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
hrtimer.c hrtimer: prevent negative expiry value after clock_was_set() 2009-01-30 22:35:34 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Revert "kbuild: strip generated symbols from *.ko" 2009-01-14 21:38:20 +01:00
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz sched: fix SCHED_HRTICK dependency 2008-07-28 14:37:38 +02:00
Kconfig.preempt rcu: provide RCU options on non-preempt architectures too 2008-12-25 09:31:28 +01:00
kexec.c PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c kmod: fix varargs kernel-doc 2009-01-06 15:59:27 -08:00
kprobes.c kprobes: check CONFIG_FREEZER instead of CONFIG_PM 2009-01-16 14:32:17 -05:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
latencytop.c KSYM_SYMBOL_LEN fixes 2008-12-10 08:01:54 -08:00
lockdep.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
lockdep_internals.h lockdep: build fix 2008-08-13 12:55:10 +02:00
lockdep_proc.c lockstat: contend with points 2008-10-20 15:43:10 +02:00
Makefile PM: fix build for CONFIG_PM unset 2009-02-21 14:17:17 -08:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c modules: Use a better scheme for refcounting 2009-02-02 19:17:55 -08:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: __used is needed for function referenced only from inline asm 2008-11-24 10:00:28 +01:00
mutex.h
notifier.c Merge commit 'v2.6.28-rc6' into core/debug 2008-11-26 08:22:50 +01:00
ns_cgroup.c ns_cgroup: remove unused spinlock 2009-01-08 08:31:02 -08:00
nsproxy.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
panic.c oops: increment the oops UUID every time we oops 2009-01-06 15:59:12 -08:00
params.c Fix compile warning in kernel/params.c 2008-10-23 12:09:00 -07:00
pid.c pid: generalize task_active_pid_ns 2009-01-08 08:31:12 -08:00
pid_namespace.c pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits 2008-09-02 19:21:38 -07:00
pm_qos_params.c pm_qos_requirement might sleep 2008-09-02 19:21:40 -07:00
posix-cpu-timers.c timers: more consistently use clock vs timer 2009-02-13 13:04:05 +01:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c PM: Fix suspend_console and resume_console to use only one semaphore 2009-02-21 14:17:18 -08:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
rcuclassic.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupdate.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupreempt.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcupreempt_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcutorture.c rcu: fix bug in rcutorture system-shutdown code 2009-01-07 23:36:25 +01:00
rcutree.c rcu: Teach RCU that idle task is not quiscent state at boot 2009-02-26 04:08:14 +01:00
rcutree_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
relay.c relay: fix lock imbalance in relay_late_setup_files 2009-01-18 20:29:35 +01:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c hrtimer: convert kernel/* to the new hrtimer apis 2008-09-05 21:35:13 -07:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: cpu hotplug fix 2009-02-12 11:57:36 +01:00
sched_clock.c sched_clock: prevent scd->clock from moving backwards, take #2 2008-12-31 09:53:21 +01:00
sched_cpupri.c sched: fix section mismatch 2009-01-06 11:07:15 +01:00
sched_cpupri.h sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_debug.c sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()" 2009-01-11 02:40:32 +01:00
sched_fair.c sched: revert recent sync wakeup changes 2009-02-11 14:43:35 +01:00
sched_features.h sched: backward looking buddy 2008-11-05 10:30:14 +01:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c sched_rt: don't use first_cpu on cpumask created with cpumask_and 2009-02-01 10:49:52 +01:00
sched_stats.h timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
seccomp.c
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
smp.c generic-ipi: use per cpu data for single cpu ipi calls 2009-01-30 18:31:08 +01:00
softirq.c Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-03 12:04:39 -08:00
softlockup.c softlock: fix false panic which can occur if softlockup_thresh is reduced 2009-01-14 11:48:07 +01:00
spinlock.c lockdep: spin_lock_nest_lock(), checkpatch fixes 2008-08-13 13:56:51 +02:00
srcu.c
stacktrace.c stacktrace: provide save_stack_trace_tsk() weak alias 2008-12-25 11:44:43 +01:00
stop_machine.c stop_machine: introduce stop_machine_create/destroy. 2009-01-05 08:40:14 +10:30
sys.c revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" 2009-02-05 12:56:47 -08:00
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sysctl.c mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches 2009-02-11 14:25:35 -08:00
sysctl_check.c [XFS] remove restricted chown parameter from xfs linux 2008-10-30 18:30:09 +11:00
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
tracepoint.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
tsacct.c mm: introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting 2009-01-06 15:59:09 -08:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user.c User namespaces: Only put the userns when we unhash the uid 2009-02-13 08:07:40 -08:00
user_namespace.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
utsname.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname_sysctl.c sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu: Use our own workqueue. 2009-01-19 22:36:07 +01:00