Commit graph

857 commits

Author SHA1 Message Date
Joel Fernandes (Google) 0392bebebf rcu: Add multiple in-flight batches of kfree_rcu() work
During testing, it was observed that amount of memory consumed due
kfree_rcu() batching is 300-400MB. Previously we had only a single
head_free pointer pointing to the list of rcu_head(s) that are to be
freed after a grace period. Until this list is drained, we cannot queue
any more objects on it since such objects may not be ready to be
reclaimed when the worker thread eventually gets to drainin g the
head_free list.

We can do better by maintaining multiple lists as done by this patch.
Testing shows that memory consumption came down by around 100-150MB with
just adding another list. Adding more than 1 additional list did not
show any improvement.

Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Code style and initialization handling. ]
[ paulmck: Fix field name, reported by kbuild test robot <lkp@intel.com>. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Joel Fernandes 569d767087 rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
Because the ->monitor_todo field is always protected by krcp->lock,
this commit downgrades from xchg() to non-atomic unmarked assignment
statements.

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Update to include early-boot kick code. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:24:31 -08:00
Byungchul Park a35d16905e rcu: Add basic support for kfree_rcu() batching
Recently a discussion about stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1]
which led to another discussion how to prepare for this situation.

This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read lesser data since it does not
need to deal with large rcu_head(s) any longer.

Torture tests follow in the next patch and show improvements of around
5x reduction in number of  grace periods on a 16 CPU system. More
details and test data are in that patch.

There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and may not be handed yet to the RCU
machinery in fact, the monitor may not have even run yet to do the
queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
to wait for those kfree_rcu()s that are already made. So this means a
kfree_rcu() followed by an rcu_barrier() does not imply that memory will
be freed once rcu_barrier() returns.

Another implication is higher active memory usage (although not
run-away..) until the kfree_rcu() flooding ends, in comparison to
without batching. More details about this are in the second patch which
adds an rcuperf test.

Finally, in the near future we will get rid of kfree_rcu() special casing
within RCU such as in rcu_do_batch and switch everything to just
batching. Currently we don't do that since timer subsystem is not yet up
and we cannot schedule the kfree_rcu() monitor as the timer subsystem's
lock are not initialized. That would also mean getting rid of
kfree_call_rcu_nobatch() entirely.

[1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
[2] https://lkml.org/lkml/2017/12/19/824

Cc: kernel-team@android.com
Cc: kernel-team@lge.com
Co-developed-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:17:03 -08:00
Paul E. McKenney c30fe54189 rcu: Mark non-global functions and variables as static
Each of rcu_state, rcu_rnp_online_cpus(), rcu_dynticks_curr_cpu_in_eqs(),
and rcu_dynticks_snap() are used only in the kernel/rcu/tree.o translation
unit, and may thus be marked static.  This commit therefore makes this
change.

Reported-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-12-12 10:24:52 -08:00
Sebastian Andrzej Siewior 90326f0521 rcu: Use CONFIG_PREEMPTION where appropriate
The config option `CONFIG_PREEMPT' is used for the preemption model
"Low-Latency Desktop". The config option `CONFIG_PREEMPTION' is enabled
when kernel preemption is enabled which is true for the preemption model
`CONFIG_PREEMPT' and `CONFIG_PREEMPT_RT'.

Use `CONFIG_PREEMPTION' if it applies to both preemption models and not
just to `CONFIG_PREEMPT'.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: rcu@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:51 -08:00
Marco Elver 6cf539a87a rcu: Fix data-race due to atomic_t copy-by-value
This fixes a data-race where `atomic_t dynticks` is copied by value. The
copy is performed non-atomically, resulting in a data-race if `dynticks`
is updated concurrently.

This data-race was found with KCSAN:
==================================================================
BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter

write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3:
 atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline]
 rcu_dynticks_snap kernel/rcu/tree.c:310 [inline]
 dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984
 force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286
 rcu_gp_fqs kernel/rcu/tree.c:1601 [inline]
 rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653
 rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799
 kthread+0x1b5/0x200 kernel/kthread.c:255
 <snip>

read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7:
 rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline]
 rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870
 irq_enter+0x5/0x50 kernel/softirq.c:347
 <snip>

Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
==================================================================

Signed-off-by: Marco Elver <elver@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:56 -08:00
Paul E. McKenney 8dcdfb7096 Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD
doc.2019.10.29a: RCU documentation updates.
fixes.2019.10.30a: RCU miscellaneous fixes.
nohz.2019.10.28a: RCU NO_HZ and NO_HZ_FULL updates.
replace.2019.10.30a: Replace rcu_swap_protected() with rcu_replace().
torture.2019.10.05a: RCU torture-test updates.

lkmm.2019.10.05a: Linux kernel memory model updates.
2019-10-30 08:47:13 -07:00
Joel Fernandes (Google) 05ef9e9eb3 rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
The RCU-specific resched_cpu() function sends a resched IPI to the
specified CPU, which can be used to force the tick on for a given
nohz_full CPU.  This is needed when this nohz_full CPU is looping in the
kernel while blocking the current grace period.  However, for the tick
to actually be forced on in all cases, that CPU's rcu_data structure's
->rcu_urgent_qs flag must be set beforehand.  This commit therefore
causes rcu_implicit_dynticks_qs() to set this flag prior to invoking
resched_cpu() on a holdout nohz_full CPU.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-30 08:34:35 -07:00
Paul E. McKenney dd7dafd1ad rcu: Make kernel-mode nohz_full CPUs invoke the RCU core processing
If a nohz_full CPU is idle or executing in userspace, it makes good sense
to keep it out of RCU core processing.  After all, the RCU grace-period
kthread can see its quiescent states and all of its callbacks are
offloaded, so there is nothing for RCU core processing to do.

However, if a nohz_full CPU is executing in kernel space, the RCU
grace-period kthread cannot do anything for it, so such a CPU must report
its own quiescent states.  This commit therefore makes nohz_full CPUs
skip RCU core processing only if the scheduler-clock interrupt caught
them in idle or in userspace.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney ed93dfc6bc rcu: Confine ->core_needs_qs accesses to the corresponding CPU
Commit 671a63517c ("rcu: Avoid unnecessary softirq when system
is idle") fixed a bug that could result in an indefinite number of
unnecessary invocations of the RCU_SOFTIRQ handler at the trailing edge
of a scheduler-clock interrupt.  However, the fix introduced off-CPU
stores to ->core_needs_qs.  These writes did not conflict with the
on-CPU stores because the CPU's leaf rcu_node structure's ->lock was
held across all such stores.  However, the loads from ->core_needs_qs
were not promoted to READ_ONCE() and, worse yet, the code loading from
->core_needs_qs was written assuming that it was only ever updated by
the corresponding CPU.  So operation has been robust, but only by luck.
This situation is therefore an accident waiting to happen.

This commit therefore takes a different approach.  Instead of clearing
->core_needs_qs from the grace-period kthread's force-quiescent-state
processing, it modifies the rcu_pending() function to suppress the
rcu_sched_clock_irq() function's call to invoke_rcu_core() if there is no
grace period in progress.  This avoids the infinite needless RCU_SOFTIRQ
handlers while still keeping all accesses to ->core_needs_qs local to
the corresponding CPU.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Joel Fernandes (Google) 516e5ae0c9 rcu: Reset CPU hints when reporting a quiescent state
In some cases, tracing shows that need_heavy_qs is still set even though
urgent_qs was cleared upon reporting of a quiescent state.  One such
case is when the softirq reports that a CPU has passed quiescent state.

Commit 671a63517c ("rcu: Avoid unnecessary softirq when system is
idle") fixed a bug where core_needs_qs was not being cleared.  In order
to avoid running into similar situations with the urgent-grace-period
flags, this commit causes rcu_disable_urgency_upon_qs(), previously
rcu_disable_tick_upon_qs(), to clear the urgency hints, ->rcu_urgent_qs
and ->rcu_need_heavy_qs.  Note that it is possible for CPUs to go
offline with these urgency hints still set.  This is handled because
rcu_disable_urgency_upon_qs() is also invoked during the online process.

Because these hints can be cleared both by the corresponding CPU and by
the grace-period kthread, this commit also adds a number of READ_ONCE()
and WRITE_ONCE() calls.

Tested overnight with rcutorture running for 60 minutes on all
configurations of RCU.

Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
[ paulmck: Clear urgency flags in rcu_disable_urgency_upon_qs(). ]
[ paulmck: Remove ->core_needs_qs from the set cleared at quiescent state. ]
[ paulmck: Make rcu_disable_urgency_upon_qs static per kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney b200a04895 rcu: Force nohz_full tick on upon irq enter instead of exit
There is interrupt-exit code that forces on the tick for nohz_full CPUs
failing to respond to the current grace period in a timely fashion.
However, this code must compare ->dynticks_nmi_nesting to the value 2
in the interrupt-exit fastpath.  This commit therefore moves this code
to the interrupt-entry fastpath, where a lighter-weight comparison to
zero may be used.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney 66e4c33b51 rcu: Force tick on for nohz_full CPUs not reaching quiescent states
CPUs running for long time periods in the kernel in nohz_full mode
might leave the scheduling-clock interrupt disabled for then full
duration of their in-kernel execution.  This can (among other things)
delay grace periods.  This commit therefore forces the tick back on
for any nohz_full CPU that is failing to pass through a quiescent state
upon return from interrupt, which the resched_cpu() will induce.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
[ paulmck: Clear ->rcu_forced_tick as reported by Joel Fernandes testing. ]
[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-28 07:02:21 -07:00
Paul E. McKenney 79ba7ff5a9 rcutorture: Emulate dyntick aspect of userspace nohz_full sojourn
During an actual call_rcu() flood, there would be frequent trips to
userspace (in-kernel call_rcu() floods must be otherwise housebroken).
Userspace execution on nohz_full CPUs implies an RCU dyntick idle/not-idle
transition pair, so this commit adds emulation of that pair.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney 96926686de rcu: Make CPU-hotplug removal operations enable tick
CPU-hotplug removal operations run the multi_cpu_stop() function, which
relies on the scheduler to gain control from whatever is running on the
various online CPUs, including any nohz_full CPUs running long loops in
kernel-mode code.  Lack of the scheduler-clock interrupt on such CPUs
can delay multi_cpu_stop() for several minutes and can also result in
RCU CPU stall warnings.  This commit therefore causes CPU-hotplug removal
operations to enable the scheduler-clock interrupt on all online CPUs.

[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
[ paulmck: Apply simplifications suggested by Frederic Weisbecker. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney 366237e7b0 stop_machine: Provide RCU quiescent state in multi_cpu_stop()
When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU
CPU stall warning.  This can be misleading because what is instead needed
is information on whatever task is blocking multi_cpu_stop().  This commit
therefore inserts an RCU quiescent state into the multi_cpu_stop()
function's waitloop.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:05 -07:00
Paul E. McKenney 6a949b7af8 rcu: Force on tick when invoking lots of callbacks
Callback invocation can run for a significant time period, and within
CONFIG_NO_HZ_FULL=y kernels, this period will be devoid of scheduler-clock
interrupts.  In-kernel execution without such interrupts can cause all
manner of malfunction, with RCU CPU stall warnings being but one result.

This commit therefore forces scheduling-clock interrupts on whenever more
than a few RCU callbacks are invoked.  Because offloaded callback invocation
can be preempted, this forcing is withdrawn on each context switch.  This
in turn requires that the loop invoking RCU callbacks reiterate the forcing
periodically.

[ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ]
[ paulmck: Remove NO_HZ_FULL check per Frederic Weisbecker feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:04 -07:00
Linus Torvalds 7e67a85999 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler is getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve CPU isolation housekeeping CPU allocation NUMA locality.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuilds the topology, or when it gets deadline-throttled while it's
   being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq->lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
2019-09-16 17:25:49 -07:00
Ingo Molnar 563c4f85f9 Merge branch 'sched/rt' into sched/core, to pick up -rt changes
Pick up the first couple of patches working towards PREEMPT_RT.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-16 14:05:04 +02:00
Eric Dumazet cfcdef5e30 rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
Bimodal behavior of rcu_do_batch() is not really suited to Google
applications like gfe servers.

When a process with millions of sockets exits, closing all files
queues two rcu callbacks per socket.

This eventually reaches the point where RCU enters an emergency
mode, where rcu_do_batch() do not return until whole queue is flushed.

Each rcu callback lasts at least 70 nsec, so with millions of
elements, we easily spend more than 100 msec without rescheduling.

Goal of this patch is to avoid the infamous message like following
"need_resched set for > 51999388 ns (52 ticks) without schedule"

We dynamically adjust the number of elements we process, instead
of 10 / INFINITE choices, we use a floor of ~1 % of current entries.

If the number is above 1000, we switch to a time based limit of 3 msec
per batch, adjustable with /sys/module/rcutree/parameters/rcu_resched_ns

Signed-off-by: Eric Dumazet <edumazet@google.com>
[ paulmck: Forward-port and remove debug statements. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney 23651d9b96 rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
The rcutree_migrate_callbacks() invokes rcu_advance_cbs() on both the
offlined CPU's ->cblist and that of the surviving CPU, then merges
them.  However, after the merge, and of the offlined CPU's callbacks
that were not ready to be invoked will no longer be associated with a
grace-period number.  This commit therefore invokes rcu_advance_cbs()
one more time on the merged ->cblist in order to assign a grace-period
number to these callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:38:24 -07:00
Paul E. McKenney d1b222c6be rcu/nocb: Add bypass callback queueing
Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs
takes advantage of unrelated grace periods, thus reducing the memory
footprint in the face of floods of call_rcu() invocations.  However,
the ->cblist field is a more-complex rcu_segcblist structure which must
be protected via locking.  Even though there are only three entities
which can acquire this lock (the CPU invoking call_rcu(), the no-CBs
grace-period kthread, and the no-CBs callbacks kthread), the contention
on this lock is excessive under heavy stress.

This commit therefore greatly reduces contention by provisioning
an rcu_cblist structure field named ->nocb_bypass within the
rcu_data structure.  Each no-CBs CPU is permitted only a limited
number of enqueues onto the ->cblist per jiffy, controlled by a new
nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to
about 16 enqueues per millisecond (16 * 1000 / HZ).  When that limit is
exceeded, the CPU instead enqueues onto the new ->nocb_bypass.

The ->nocb_bypass is flushed into the ->cblist every jiffy or when
the number of callbacks on ->nocb_bypass exceeds qhimark, whichever
happens first.  During call_rcu() floods, this flushing is carried out
by the CPU during the course of its call_rcu() invocations.  However,
a CPU could simply stop invoking call_rcu() at any time.  The no-CBs
grace-period kthread therefore carries out less-aggressive flushing
(every few jiffies or when the number of callbacks on ->nocb_bypass
exceeds (2 * qhimark), whichever comes first).  This means that the
no-CBs grace-period kthread cannot be permitted to do unbounded waits
while there are callbacks on ->nocb_bypass.  A ->nocb_bypass_timer is
used to provide the needed wakeups.

[ paulmck: Apply Coverity feedback reported by Colin Ian King. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:37:32 -07:00
Paul E. McKenney 6608c3a027 rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node
structure's ->lock, and only afterwards does rcu_advance_cbs_nowake()
check to see if it is possible to advance callbacks without potentially
needing to awaken the grace-period kthread.  Given that the no-awaken
check can be done locklessly, this commit reverses the order, so that
rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node
structure's ->lock and rcu_advance_cbs_nowake() checks the grace-period
state before conditionally acquiring that lock, thus reducing the number
of needless acquistions of the leaf rcu_node structure's ->lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney 7f36ef82e5 rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
Currently, the code provides an extra wakeup for the no-CBs grace-period
kthread if one of its CPUs is generating excessive numbers of callbacks.
But satisfying though it is to wake something up when things are going
south, unless the thing being awakened can actually help solve the
problem, that extra wakeup does nothing but consume additional CPU time,
which is exactly what you don't want during a call_rcu() flood.

This commit therefore avoids doing anything if the corresponding
no-CBs callback kthread is going full tilt.  Otherwise, if advancing
callbacks immediately might help and if the leaf rcu_node structure's
lock is immediately available, this commit invokes a new variant of
rcu_advance_cbs() that advances callbacks only if doing so won't require
awakening the grace-period kthread (not to be confused with any of the
no-CBs grace-period kthreads).

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney 921bb5fad1 rcu/nocb: Use build-time no-CBs check in rcu_pending()
Currently, rcu_pending() invokes rcu_segcblist_is_offloaded() even
in CONFIG_RCU_NOCB_CPU=n kernels, which cannot possibly be offloaded.
Given that rcu_pending() is on a fastpath, it makes sense to check for
CONFIG_RCU_NOCB_CPU=y before invoking rcu_segcblist_is_offloaded().
This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney c1ab99d66e rcu/nocb: Use build-time no-CBs check in rcu_core()
Currently, rcu_core() invokes rcu_segcblist_is_offloaded() each time it
needs to know whether the current CPU is a no-CBs CPU.  Given that it is
not possible to change the no-CBs status of a CPU after boot, and given
that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n
kernels, this repeated runtime invocation wastes CPU.  This commit
therefore created a const on-stack variable to allow this check to be
done only once per rcu_core() invocation.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney ec5ef87bac rcu/nocb: Use build-time no-CBs check in rcu_do_batch()
Currently, rcu_do_batch() invokes rcu_segcblist_is_offloaded() each time
it needs to know whether the current CPU is a no-CBs CPU.  Given that it
is not possible to change the no-CBs status of a CPU after boot, and given
that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n
kernels, this per-callback invocation wastes CPU.  This commit therefore
created a const on-stack variable to allow this check to be done only
once per rcu_do_batch() invocation.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney c035280f17 rcu/nocb: Remove obsolete nocb_q_count and nocb_q_count_lazy fields
This commit removes the obsolete nocb_q_count and nocb_q_count_lazy
fields, also removing rcu_get_n_cbs_nocb_cpu(), adjusting
rcu_get_n_cbs_cpu(), and making rcutree_migrate_callbacks() once again
disable the ->cblist fields of offline CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney 5d6742b377 rcu/nocb: Use rcu_segcblist for no-CBs CPUs
Currently the RCU callbacks for no-CBs CPUs are queued on a series of
ad-hoc linked lists, which means that these callbacks cannot benefit
from "drive-by" grace periods, thus suffering needless delays prior
to invocation.  In addition, the no-CBs grace-period kthreads first
wait for callbacks to appear and later wait for a new grace period,
which means that callbacks appearing during a grace-period wait can
be delayed.  These delays increase memory footprint, and could even
result in an out-of-memory condition.

This commit therefore enqueues RCU callbacks from no-CBs CPUs on the
rcu_segcblist structure that is already used by non-no-CBs CPUs.  It also
restructures the no-CBs grace-period kthread to be checking for incoming
callbacks while waiting for grace periods.  Also, instead of waiting
for a new grace period, it waits for the closest grace period that will
cause some of the callbacks to be safe to invoke.  All of these changes
reduce callback latency and thus the number of outstanding callbacks,
in turn reducing the probability of an out-of-memory condition.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney e83e73f5b0 rcu/nocb: Leave ->cblist enabled for no-CBs CPUs
As a first step towards making no-CBs CPUs use the ->cblist, this commit
leaves the ->cblist enabled for these CPUs.  The main reason to make
no-CBs CPUs use ->cblist is to take advantage of callback numbering,
which will reduce the effects of missed grace periods which in turn will
reduce forward-progress problems for no-CBs CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney 85f69b3212 rcu/nocb: Check for deferred nocb wakeups before nohz_full early exit
In theory, a timer is used to defer wakeups of no-CBs grace-period
kthreads when the wakeup cannot be done safely directly from the
call_rcu().  In practice, the one-jiffy delay is not always consistent
with timely callback invocation under heavy call_rcu() loads.  Therefore,
there are a number of checks for a pending deferred wakeup, including
from the scheduling-clock interrupt.  Unfortunately, this check follows
the rcu_nohz_full_cpu() early exit, which renders it useless on such CPUs.

This commit therefore moves the check for the pending deferred no-CB
wakeup to precede the rcu_nohz_full_cpu() early exit.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney c00045be32 rcu/nocb: Make rcutree_migrate_callbacks() start at leaf rcu_node structure
Because rcutree_migrate_callbacks() is invoked infrequently and because
an exact snapshot of the grace-period state might save some callbacks a
second trip through a grace period, this function has used the root
rcu_node structure.  However, this safe-second-trip optimization
happens only if rcutree_migrate_callbacks() races with grace-period
initialization, so it is not worth the added mental load.  This commit
therefore makes rcutree_migrate_callbacks() start with the leaf rcu_node
structures, as is done elsewhere.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney 750d7f6a43 rcu/nocb: Add checks for offloaded callback processing
This commit is a preparatory patch for offloaded callbacks using the
same ->cblist structure used by non-offloaded callbacks.  It therefore
adds rcu_segcblist_is_offloaded() calls where they will be needed when
!rcu_segcblist_is_enabled() no longer flags the offloaded case.  It also
adds checks in rcu_do_batch() to ensure that there are no missed checks:
Currently, it should not be possible for offloaded execution to reach
rcu_do_batch(), though this will change later in this series.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Paul E. McKenney ce5215c134 rcu/nocb: Use separate flag to indicate offloaded ->cblist
RCU callback processing currently uses rcu_is_nocb_cpu() to determine
whether or not the current CPU's callbacks are to be offloaded.
This works, but it is not so good for cache locality.  Plus use of
->cblist for offloaded callbacks will greatly increase the frequency
of these checks.  This commit therefore adds a ->offloaded flag to the
rcu_segcblist structure to provide a more flexible and cache-friendly
means of checking for callback offloading.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-13 14:35:49 -07:00
Peter Zijlstra 130d9c331b rcu/tree: Fix SCHED_FIFO params
A rather embarrasing mistake had us call sched_setscheduler() before
initializing the parameters passed to it.

Fixes: 1a763fd7c6 ("rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
2019-08-08 09:09:30 +02:00
Thomas Gleixner 01b1d88b09 rcu: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.

Switch the conditionals in RCU to use CONFIG_PREEMPTION.

That's the first step towards RCU on RT. The further tweaks are work in
progress. This neither touches the selftest bits which need a closer look
by Paul.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190726212124.210156346@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31 19:03:35 +02:00
Juri Lelli 1a763fd7c6 rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region
sched_setscheduler() needs to acquire cpuset_rwsem, but it is currently
called from an invalid (atomic) context by rcu_spawn_gp_kthread().

Fix that by simply moving sched_setscheduler_nocheck() call outside of
the atomic region, as it doesn't actually require to be guarded by
rcu_node lock.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-8-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25 15:55:03 +02:00
Paul E. McKenney 11ca7a9d54 Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD
consolidate.2019.05.28a: RCU flavor consolidation cleanups and optmizations.
doc.2019.05.28a: Documentation updates.
fixes.2019.06.13a: Miscellaneous fixes.
srcu.2019.05.28a: SRCU updates.
sync.2019.05.28a: RCU-sync flavor consolidation.
torture.2019.05.28a: Torture-test updates.
2019-06-19 09:21:46 -07:00
Paul E. McKenney d5a9a8c3bc rcu: Set a maximum limit for back-to-back callback invocation
Currently, if a CPU has more than 10,000 callbacks pending, it will
increase rdp->blimit to LONG_MAX.  If you are lucky, LONG_MAX is only
about two billion, but this is still a bit too many callbacks to invoke
back-to-back while otherwise ignoring the world.

This commit therefore sets a maximum limit of DEFAULT_MAX_RCU_BLIMIT,
which is set to 10,000, for rdp->blimit.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 09:02:57 -07:00
Joel Fernandes (Google) eddded8012 rcu: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle()
It would be good to combine the dynticks and dynticks_nesting counters
in order to simplify the code.  Unfortunately, there are concerns
about usermode upcalls appearing to RCU as half of an interrupt, as
Byungchul learned [1].  The "half" in "half interrupt" is due to an
unpaired rcu_irq_enter(): Normally, each rcu_irq_enter() has a later
call to rcu_irq_exit().

Out of an abundance of caution, Paul added warnings [2] in the RCU
code which if not fired by 2021 will be interpreted as meaning that
this half-interrupt scenario cannot happen any more, thus permitting
simplification of this code.

In the meantime, this commit makes the following changes:

(1) Combining these two counters requires that rcu_rrupt_from_idle()
    is invoked only from hard-interrupt contexts as discussed here [3].
    This commit therefore adds the required lockdep_assert_in_irq()
    to check this constraint.

(2) Furthermore, rcu_rrupt_from_idle() is not explicit about how it
    is using the counters which can lead to weird future bugs. This
    commit therefore adds comments indicating the meaning and use of
    each counter.

(3) Lastly, this commit checks for counter underflows as another check
    that half interrupts don't occur.  (Previously, the function would
    simply return true upon underflow.)

All these checks checks are NOOPs if PROVE_LOCKING (and thus PROVE_RCU)
are disabled.

[1] https://lore.kernel.org/patchwork/patch/952349/
[2] Commit e11ec65cc8 ("rcu: Add warning to detect half-interrupts")
[3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/

Cc: byungchul.park@lge.com
Cc: kernel-team@android.com
Cc: rcu@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Paul E. McKenney 43e903ad3e rcu: Inline invoke_rcu_callbacks() into its sole remaining caller
This commit saves a few lines of code by inlining invoke_rcu_callbacks()
into its sole remaining caller.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:49 -07:00
Sebastian Andrzej Siewior 48d07c04b4 rcu: Enable elimination of Tree-RCU softirq processing
Some workloads need to change kthread priority for RCU core processing
without affecting other softirq work.  This commit therefore introduces
the rcutree.use_softirq kernel boot parameter, which moves the RCU core
work from softirq to a per-CPU SCHED_OTHER kthread named rcuc.  Use of
SCHED_OTHER approach avoids the scalability problems that appeared
with the earlier attempt to move RCU core processing to from softirq
to kthreads.  That said, kernels built with RCU_BOOST=y will run the
rcuc kthreads at the RCU-boosting priority.

Note that rcutree.use_softirq=0 must be specified to move RCU core
processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
  from RCU core processing, in contrast to softirq->rcuc transition
  in old mainline RCU priority boosting. ]
[ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock()
  while holding rq or pi locks, also possibly fixing a pre-existing latent
  bug involving raise_softirq()-induced wakeups. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-25 14:50:46 -07:00
Linus Torvalds d2d8b14604 The major changes in this tracing update includes:
- Removing of non-DYNAMIC_FTRACE from 32bit x86
 
  - Removing of mcount support from x86
 
  - Emulating a call from int3 on x86_64, fixes live kernel patching
 
  - Consolidated Tracing Error logs file
 
 Minor updates:
 
  - Removal of klp_check_compiler_support()
 
  - kdb ftrace dumping output changes
 
  - Accessing and creating ftrace instances from inside the kernel
 
  - Clean up of #define if macro
 
  - Introduction of TRACE_EVENT_NOP() to disable trace events based on config
    options
 
 And other minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r
 ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw=
 =RVQo
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major changes in this tracing update includes:

   - Removal of non-DYNAMIC_FTRACE from 32bit x86

   - Removal of mcount support from x86

   - Emulating a call from int3 on x86_64, fixes live kernel patching

   - Consolidated Tracing Error logs file

  Minor updates:

   - Removal of klp_check_compiler_support()

   - kdb ftrace dumping output changes

   - Accessing and creating ftrace instances from inside the kernel

   - Clean up of #define if macro

   - Introduction of TRACE_EVENT_NOP() to disable trace events based on
     config options

  And other minor fixes and clean ups"

* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  x86: Hide the int3_emulate_call/jmp functions from UML
  livepatch: Remove klp_check_compiler_support()
  ftrace/x86: Remove mcount support
  ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
  tracing: Simplify "if" macro code
  tracing: Fix documentation about disabling options using trace_options
  tracing: Replace kzalloc with kcalloc
  tracing: Fix partial reading of trace event's id file
  tracing: Allow RCU to run between postponed startup tests
  tracing: Fix white space issues in parse_pred() function
  tracing: Eliminate const char[] auto variables
  ring-buffer: Fix mispelling of Calculate
  tracing: probeevent: Fix to make the type of $comm string
  tracing: probeevent: Do not accumulate on ret variable
  tracing: uprobes: Re-enable $comm support for uprobe events
  ftrace/x86_64: Emulate call function while updating in breakpoint handler
  x86_64: Allow breakpoints to emulate call instructions
  x86_64: Add gap to int3 to allow for call emulation
  tracing: kdb: Allow ftdump to skip all but the last few entries
  tracing: Add trace_total_entries() / trace_total_entries_cpu()
  ...
2019-05-15 16:05:47 -07:00
Linus Torvalds 0968621917 Printk changes for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
 lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
 EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
 r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
 FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
 uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
 lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
 waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
 wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
 =WZfC
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Paul E. McKenney 6cdbc07a5a Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD
consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups.
doc.2019.03.26b: Documentation updates.
fixes.2019.03.26b: Miscellaneous fixes.
srcu.2019.03.26b: SRCU updates.
stall.2019.03.26b: RCU CPU stall warning updates.
torture.2019.03.26b: Torture-test updates.
2019-04-09 08:08:13 -07:00
Sakari Ailus d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Yafang Shao 4f5fbd78a7 rcu: validate arguments for rcu tracepoints
When CONFIG_RCU_TRACE is not set, all these tracepoints are defined as
do-nothing macro.
We'd better make those inline functions that take proper arguments.

As RCU_TRACE() is defined as do-nothing marco as well when
CONFIG_RCU_TRACE is not set, so we can clean it up.

Link: http://lkml.kernel.org/r/1553602391-11926-4-git-send-email-laoar.shao@gmail.com

Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:51 -04:00
Paul E. McKenney b51bcbbf16 rcu: Move forward-progress checkers into tree_stall.h
This commit further consolidates stall-warning functionality by moving
forward-progress checkers into kernel/rcu/tree_stall.h, updating a
comment or two while in the area.  More specifically, this commit moves
show_rcu_gp_kthreads(), rcu_check_gp_start_stall(), rcu_fwd_progress_check(),
sysrq_rcu, sysrq_show_rcu(), sysrq_rcudump_op, and rcu_sysrq_init() from
kernel/rcu/tree.c to kernel/rcu/tree_stall.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney 7ac1907c9e rcu: Move irq-disabled stall-warning checking to tree_stall.h
The rcu_iw_handler() function's sole purpose in life is to indicate
whether a stalled CPU had interrupts disabled, so it belongs in
kernel/rcu/tree_stall.h.  This commit therefore makes that move,
clarifying its header comment while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney 32255d51b6 rcu: Move RCU CPU stall-warning code out of tree.c
This commit completes the process of consolidating the code for RCU CPU
stall warnings for normal grace periods by moving the remaining such
code from kernel/rcu/tree.c to kernel/rcu/tree_stall.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney 10462d6f58 rcu: Move RCU CPU stall-warning code out of update.c
The RCU CPU stall-warning code for normal grace periods is currently
scattered across three files, due to earlier Tiny RCU support for RCU
CPU stall warnings and for old Kconfig options that have long since
been retired.  Given that it is hard for the lead RCU maintainer to
find relevant stall-warning code, it would be good to consolidate it.
This commit starts this process by moving stall-warning code from
kernel/rcu/update.c to a new kernel/rcu/tree_stall.h file.

Note that the definitions of rcu_cpu_stall_suppress and
rcu_cpu_stall_timeout must remain in kernel/rcu/update.h to provide
compatibility for kernel boot parameter lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Zhouyi Zhou 5d8a752e31 rcu: Fix force_qs_rnp() header comment
Previously, threads blocked on offlining CPUS were migrated to the
root rcu_node structure, thus requiring RCU priority boosting on this
structure.  However, since commit d19fb8d1f3 ("rcu: Don't migrate
blocked tasks even if all corresponding CPUs offline"), RCU does not
migrate blocked tasks.  Consequently, RCU no longer does RCU priority
boosting on the root rcu_node structure as of commit 1be0085b51 ("rcu:
Don't initiate RCU priority boosting on root rcu_node").

This commit therefore brings comments for the force_qs_rnp() function's
header comment in line with this new no-root-boosting reality.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[ paulmck: Also remove obsolete comment on suppressing new grace periods. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney 85f2b60c43 rcu: Update jiffies_to_sched_qs and adjust_jiffies_till_sched_qs() comments
This commit better documents the jiffies_to_sched_qs default-value
strategy used by adjust_jiffies_till_sched_qs()

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Neeraj Upadhyay 6973032a60 rcu: Default jiffies_to_sched_qs to jiffies_till_sched_qs
The current code only calls adjust_jiffies_till_sched_qs() if
jiffies_till_sched_qs is left at its default value, so when the
jiffies_till_sched_qs kernel-boot parameter actually is specified,
jiffies_to_sched_qs will be left with the value zero, which
will result in useless slowdowns of cond_resched().  This commit
therefore changes rcu_init_geometry() to unconditionally invoke
adjust_jiffies_till_sched_qs(), which ensures that jiffies_to_sched_qs
will be initialized in all cases, thus maintaining good cond_resched()
performance.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Neeraj Upadhyay 0f58d2ac2c rcu: Fix self-wakeups for grace-period kthread
The current rcu_gp_kthread_wake() function uses in_interrupt()
and thus does a self-wakeup from all interrupt contexts, including
the pointless case where the GP kthread happens to be running with
bottom halves disabled, along with the impossible case where the GP
kthread is running within an NMI handler (you are not supposed to invoke
rcu_gp_kthread_wake() from within an NMI handler.  This commit therefore
replaces the in_interrupt() with in_irq(), so that the self-wakeups
happen only from handlers for hardware interrupts and softirqs.
This also makes the code match the comment.

Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26 14:37:49 -07:00
Akira Yokosawa b2eb85b49a rcu: Move common code out of if-else block
As the result of recent addition of "rdp->core_needs_qs = false;" in
the "if" block, now both branches of the if-else have the same
assignment.

Factor it out and reduce line count.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-03-26 14:37:49 -07:00
Liu Song 3ffe3d1adc rcu: Set rcutree.kthread_prio sysfs access to read-only
The rcutree.kthread_prio kernel-boot parameter is used to set the
priority for boost (rcub), per-CPU (rcuc), and grace-period (rcu_preempt
or rcu_sched) kthreads.  It is also used by rcutorture to check whether
it is possible to meaningfully test RCU priority boosting.  However,
all of these cases will either ignore or be confused by any post-boot
changes to rcutree.kthread_prio.

Note that the user really can change the priorities of all of these
kthreads using chrt, given sufficient privileges.  Therefore, the
read-write nature of sysfs access to rcutree.kthread_prio is thus at
best an attractive nuisance.

This commit therefore changes sysfs access to rcutree.kthread_prio to
be read-only.

Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Joel Fernandes (Google) 671a63517c rcu: Avoid unnecessary softirq when system is idle
When there are no callbacks pending on an idle system, I noticed that
RCU softirq is continuously firing. During this the cpu_no_qs is set to
false, and core_needs_qs is set to true indefinitely. This causes
rcu_process_callbacks to be repeatedly called, even though the node
corresponding to the CPU has that CPU's mask bit cleared and the system
is idle. I believe the race is when such mask clearing is done during
idle CPU scan of the quiescent state forcing stage in the kthread
instead of the softirq. Since the rnp mask is cleared, but the flags on
the CPU's rdp are not cleared, the CPU thinks it still needs to report
to core RCU.

Cure this by clearing the core_needs_qs flag when the CPU detects that
its node is already updated which will avoid the unwanted softirq raises
to the benefit of real-time systems.

Test: Ran rcutorture for various tree RCU configs.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Paul E. McKenney e85e6a21b2 rcu: Unconditionally expedite during suspend/hibernate
The rcu_pm_notify() function refuses to switch to/from expedited grace
periods on systems with more than 256 CPUs due to the serialized
initialization of expedited grace periods.  However, expedited grace
periods are now initialized in parallel, removing this concern.
This commit therefore removes the checks from rcu_pm_notify(), so that
expedited grace periods are used unconditionally during suspend/resume
and hibernate/wake operations.

As always, real-time workloads wishing to completely avoid expedited
grace periods can use the rcupdate.rcu_normal= kernel parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:37:49 -07:00
Linus Torvalds 203b6609e0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Lots of tooling updates - too many to list, here's a few highlights:

   - Various subcommand updates to 'perf trace', 'perf report', 'perf
     record', 'perf annotate', 'perf script', 'perf test', etc.

   - CPU and NUMA topology and affinity handling improvements,

   - HW tracing and HW support updates:
      - Intel PT updates
      - ARM CoreSight updates
      - vendor HW event updates

   - BPF updates

   - Tons of infrastructure updates, both on the build system and the
     library support side

   - Documentation updates.

   - ... and lots of other changes, see the changelog for details.

  Kernel side updates:

   - Tighten up kprobes blacklist handling, reduce the number of places
     where developers can install a kprobe and hang/crash the system.

   - Fix/enhance vma address filter handling.

   - Various PMU driver updates, small fixes and additions.

   - refcount_t conversions

   - BPF updates

   - error code propagation enhancements

   - misc other changes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
  perf script python: Add Python3 support to syscall-counts-by-pid.py
  perf script python: Add Python3 support to syscall-counts.py
  perf script python: Add Python3 support to stat-cpi.py
  perf script python: Add Python3 support to stackcollapse.py
  perf script python: Add Python3 support to sctop.py
  perf script python: Add Python3 support to powerpc-hcalls.py
  perf script python: Add Python3 support to net_dropmonitor.py
  perf script python: Add Python3 support to mem-phys-addr.py
  perf script python: Add Python3 support to failed-syscalls-by-pid.py
  perf script python: Add Python3 support to netdev-times.py
  perf tools: Add perf_exe() helper to find perf binary
  perf script: Handle missing fields with -F +..
  perf data: Add perf_data__open_dir_data function
  perf data: Add perf_data__(create_dir|close_dir) functions
  perf data: Fail check_backup in case of error
  perf data: Make check_backup work over directories
  perf tools: Add rm_rf_perf_data function
  perf tools: Add pattern name checking to rm_rf
  perf tools: Add depth checking to rm_rf
  perf data: Add global path holder
  ...
2019-03-06 07:59:36 -08:00
Masami Hiramatsu c13324a505 x86/kprobes: Prohibit probing on functions before kprobe_int3_handler()
Prohibit probing on the functions called before kprobe_int3_handler()
in do_int3(). More specifically, ftrace_int3_handler(),
poke_int3_handler(), and ist_enter(). And since rcu_nmi_enter() is
called by ist_enter(), it also should be marked as NOKPROBE_SYMBOL.

Since those are handled before kprobe_int3_handler(), probing those
functions can cause a breakpoint recursion and crash the kernel.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998793571.31052.11301258949601150994.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13 08:16:39 +01:00
Paul E. McKenney e7ffb4eb9a Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD
doc.2019.01.26a:  Documentation updates.
fixes.2019.01.26a:  Miscellaneous fixes.
sil.2019.01.26a:  Removal of a few more spin_is_locked() instances.
spdx.2019.02.09a:  Add SPDX identifiers to RCU files
srcu.2019.01.26a:  SRCU updates.
torture.2019.01.26a: Torture-test updates.
2019-02-09 08:47:52 -08:00
Paul E. McKenney 22e4092531 rcu/tree: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Update .h file SPDX comment format per Joe Perches. ]
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:44:10 -08:00
Sebastian Andrzej Siewior e81baf4cb1 srcu: Remove srcu_queue_delayed_work_on()
srcu_queue_delayed_work_on() disables preemption (and therefore CPU
hotplug in RCU's case) and then checks based on its own accounting if a
CPU is online. If the CPU is online it uses queue_delayed_work_on()
otherwise it fallbacks to queue_delayed_work().
The problem here is that queue_work() on -RT does not work with disabled
preemption.

queue_work_on() works also on an offlined CPU. queue_delayed_work_on()
has the problem that it is possible to program a timer on an offlined
CPU. This timer will fire once the CPU is online again. But until then,
the timer remains programmed and nothing will happen.

Add a local timer which will fire (as requested per delay) on the local
CPU and then enqueue the work on the specific CPU.

RCUtorture testing with SRCU-P for 24h showed no problems.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:36:42 -08:00
Paul E. McKenney 39abefe743 rcu: Repair rcu_nmi_exit() docbook header
This commit removes the "@irq" argument from the rcu_nmi_exit() docbook
header, given that this function now has no arguments.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:23 -08:00
Paul E. McKenney fb60e533be rcu: Rename rcu_process_callbacks() to rcu_core() for Tree RCU
Although the name rcu_process_callbacks() still makes sense for Tiny
RCU, where most of what it does is invoke callbacks, it no longer makes
much sense for Tree RCU, especially given that the actually callback
invocation is relegated to rcu_do_batch(), or, for no-CBs CPUs, to the
rcuo kthreads.  Especially in the latter case, rcu_process_callbacks()
has very little to do with actual callbacks.  A better description of
this function is that it performs RCU's core processing.

This commit therefore changes the name of Tree RCU's rcu_process_callbacks()
function to rcu_core(), which also has the virtue of being consistent with
the existing invoke_rcu_core() function.

While in the area, the header comment is reworked.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:22 -08:00
Paul E. McKenney c98cac603f rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq()
The name rcu_check_callbacks() arguably made sense back in the early
2000s when RCU was quite a bit simpler than it is today, but it has
become quite misleading, especially with the advent of dyntick-idle
and NO_HZ_FULL.  The rcu_check_callbacks() function is RCU's hook into
the scheduling-clock interrupt, and is now but one of many ways that
callbacks get promoted to invocable state.

This commit therefore changes the name to rcu_sched_clock_irq(),
which is the same number of characters and clearly indicates this
function's relation to the rest of the Linux kernel.  In addition, for
the sake of consistency, rcu_flavor_check_callbacks() is also renamed
to rcu_flavor_sched_clock_irq().

While in the area, the header comments for both functions are reworked.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:21 -08:00
Paul E. McKenney 7a968bb26a Merge branches 'consolidate.2019.01.26a' and 'fwd.2019.01.26a' into HEAD
consolidate.2019.01.26a: RCU flavor consolidation cleanups.
fwd.2019.01.26a: RCU grace-period forward-progress fixes.
2019-01-25 15:32:01 -08:00
Zhang, Jun 13dc7d0c7a rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()
Currently, __note_gp_changes() checks to see if the rcu_node structure's
->gp_seq_needed is greater than or equal to that of the rcu_data
structure, and if so, updates the rcu_data structure's ->gp_seq_needed
field.  This results in a useless store in the case where the two fields
are equal.

This commit therefore carries out this store only in the case where the
rcu_node structure's ->gp_seq_needed is strictly greater than that of
the rcu_data structure.

Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/88DC34334CA3444C85D647DBFA962C2735AD5F77@SHSMSX104.ccr.corp.intel.com
2019-01-25 15:30:00 -08:00
Zhang, Jun 1d1f898df6 rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread.  Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.

Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call.  In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context.  Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.

This race window is quite narrow, but it actually did happen during real
testing.  It would of course need to be fixed even if it was strictly
theoretical in nature.

This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.

Fixes: 48a7639ce8 ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
Co-developed-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
  !in_serving_softirq() to avoid redundant wakeups and to also handle the
  interrupt-handler scenario as well as the softirq-handler scenario that
  actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
2019-01-25 15:29:59 -08:00
Paul E. McKenney 2ccaff10f7 rcu: Add sysrq rcu_node-dump capability
Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period.  This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state.  The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:59 -08:00
Paul E. McKenney 3b6505fd8e rcu: Protect rcu_check_gp_kthread_starvation() access to ->gp_flags
The rcu_check_gp_kthread_starvation() function can be invoked without
holding locks, so the access to the rcu_state structure's ->gp_flags
field must be protected with READ_ONCE().  This commit therefore adds
this protection.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:58 -08:00
Paul E. McKenney fd897573fa rcu: Improve diagnostics for failed RCU grace-period start
If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time.  This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:57 -08:00
Paul E. McKenney 9cf422a8e7 rcu: Accommodate zero jiffies_till_first_fqs and kthread kicking
It is perfectly fine to set the rcutree.jiffies_till_first_fqs boot
parameter to zero, in fact, this can be useful on specialty systems that
usually have at least one idle CPU and that need fast grace periods.
This is because this setting causes the RCU grace-period kthread to
scan for idle threads immediately after grace-period initialization,
as opposed to waiting several jiffies to do so.

It is also perfectly fine to set the rcutree.rcu_kick_kthreads kernel
parameter, which gives the RCU grace-period kthread an extra wakeup
if it doesn't make progress for a period of three times the setting of
the rcutree.jiffies_till_first_fqs boot parameter.  This is of course
problematic when the value of this parameter is zero, as it can result
in unnecessary wakeup IPIs along with unnecessary WARN_ONCE() invocations.

This commit therefore defers kthread kicking for at least two jiffies,
regardless of the setting of rcutree.jiffies_till_first_fqs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:29:53 -08:00
Paul E. McKenney 260e1e4fd8 rcu: Discard separate per-CPU callback counts
Back when there were multiple flavors of RCU, it was necessary to
separately count lazy and non-lazy callbacks for each CPU.  These counts
were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly
idle CPU should be allowed to sleep before handling its RCU callbacks.
But now that there is only one flavor, the callback counts for a given
CPU's sole rcu_data structure are the counts for that CPU.

This commit therefore removes the rcu_data structure's ->nonlazy_posted
and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted()
and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's
->all_lazy field to record the laziness state at the beginning of the
latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall
warnings accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:30 -08:00
Paul E. McKenney e5bc3af773 rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu()
Now that rcu_blocking_is_gp() makes the correct immediate-return
decision for both PREEMPT and !PREEMPT, a single implementation of
synchronize_rcu() will work correctly under both configurations.
This commit therefore eliminates a few lines of code by consolidating
the two implementations of synchronize_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:28 -08:00
Paul E. McKenney c97058d033 rcu: Eliminate RCU_BH_FLAVOR and RCU_SCHED_FLAVOR
Now that the RCU flavors have been consolidated, RCU_BH_FLAVOR and
RCU_SCHED_FLAVOR are no longer used.  This commit therefore saves a
few lines by removing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:25 -08:00
Paul E. McKenney cd920e5a34 rcu: Inline force_quiescent_state() into rcu_force_quiescent_state()
Given that rcu_force_quiescent_state() is a simple wrapper around
force_quiescent_state(), this commit saves a few lines of code by
inlining force_quiescent_state() into rcu_force_quiescent_state(),
and changing all references to force_quiescent_state() to instead
invoke rcu_force_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:24 -08:00
Paul E. McKenney ad368d15b0 rcu: Rename and comment changes due to only one rcuo kthread per CPU
Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads()
is quite misleading.  It no longer ever creates more than one kthread,
and it does so only for the specified CPU.  This commit therefore changes
this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also
fixes up a similar issue in its header comment while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:23 -08:00
Paul E. McKenney c51d7b5e6c rcutorture: Print time since GP end upon forward-progress failure
If rcutorture's forward-progress tests fail while a grace period is not
in progress, it is useful to print the time since the last grace period
ended as a way to detect failure to launch a new grace period.  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:40 -08:00
Paul E. McKenney 8dd3b54689 rcutorture: Print GP age upon forward-progress failure
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:38 -08:00
Paul E. McKenney bfcfcffc5f rcu: Print per-CPU callback counts for forward-progress failures
This commit prints out the non-zero per-CPU callback counts when a
forware-progress error (OOM event) occurs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix a pair of uninitialized locals spotted by kbuild test robot. ]
2018-12-01 12:45:38 -08:00
Paul E. McKenney 903ee83d91 rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
The RCU CPU stall warnings print an estimate of the total number of
RCU callbacks queued in the system, but this estimate leaves out
the callbacks queued for nocbs CPUs.  This commit therefore introduces
rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for
both nocbs and normal CPUs, and uses this new function as needed.

This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function
that returns the number of callbacks for nocbs CPUs or zero otherwise,
and also uses this function in place of direct access to ->nocb_q_count
while in the area (fewer characters, you see).

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:37 -08:00
Paul E. McKenney e0aff97355 rcutorture: Dump grace-period diagnostics upon forward-progress OOM
This commit adds an OOM notifier during rcutorture forward-progress
testing.  If this notifier is invoked, it dumps out some grace-period
state to help debug the forward-progress problem.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:36 -08:00
Paul E. McKenney eaaf055f27 Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD
bug.2018.11.12a:  Get rid of BUG_ON() and friends
consolidate.2018.12.01a:  Continued RCU flavor-consolidation cleanup
doc.2018.11.12a:  Documentation updates
fixes.2018.11.12a:  Miscellaneous fixes
initrd.2018.11.08b:  Automate creation of rcutorture initrd
sil.2018.11.12a:  Remove more spin_unlock_wait() calls
2018-12-01 12:43:16 -08:00
Paul E. McKenney 0a89e5a402 rcu: Trace end of grace period before end of grace period
Currently, rcu_gp_cleanup() traces the end of the old grace period after
the old grace period has officially ended.  This might make intuitive
sense, but it also makes for confusing event-trace output because the
"end" trace displays not the old but instead the new grace-period number.
This commit therefore traces the end of an old grace period just before
that grace period officially ends.

Reported-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Zhouyi Zhou 2320bda26d rcu: Adjust the comment of function rcu_is_watching
Because RCU avoids interrupting idle CPUs, rcu_is_watching() is used to
test whether or not it is currently legal to run RCU read-side critical
sections on this CPU.  However, the first sentence and last sentences
of current comment for rcu_is_watching have opposite meaning of what
is expected.  This commit therefore fixes this header comment.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney c669c014d1 rcu: Add jiffies-since-GP-activity to show_rcu_gp_kthreads()
This commit adds a printout of the number of jiffies since the last time
that the RCU grace-period kthread did any processing.  This can be useful
when tracking down forward-progress issues.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney 691960197e rcu: Add state name to show_rcu_gp_kthreads() output
This commit adds the name of the RCU grace-period state to
the show_rcu_gp_kthreads() output in order to ease debugging.
This commit also moves gp_state_getname() up in the code so that
show_rcu_gp_kthreads() can use it.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney 791416c471 rcu: Parameterize rcu_check_gp_start_stall()
In order to debug forward-progress stalls, it is necessary to check
for excessively delayed grace-period starts.  This is currently done
for RCU CPU stall warnings by rcu_check_gp_start_stall(), which checks
to see if the start of a requested grace period has been delayed by an
RCU CPU stall warning period.  Because rcutorture will need to check
for the time consumed by an RCU forward-progress delay, this commit
promotes gpssdelay from a local variable to a formal parameter.  It is
not necessary to export rcu_check_gp_start_stall() because rcutorture
will access it via a wrapper function.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney b3c1d9ec7c rcu: Avoid double multiply by HZ
The rcu_check_gp_start_stall() function multiplies the return value
from rcu_jiffies_till_stall_check() by HZ, but the units are already
in jiffies.  This commit therefore avoids the need for introduction of
a jiffies-squared unit by removing the extraneous multiplication.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney 08543bda42 rcu: Eliminate BUG_ON() for kernel/rcu/tree.c
The tree.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:41:57 -08:00
Paul E. McKenney b56ada1209 Merge branches 'doc.2018.08.30a', 'dynticks.2018.08.30b', 'srcu.2018.08.30b' and 'torture.2018.08.29a' into HEAD
doc.2018.08.30a: Documentation updates
dynticks.2018.08.30b: RCU flavor consolidation updates and cleanups
srcu.2018.08.30b: SRCU updates
torture.2018.08.29a: Torture-test updates
2018-08-30 16:12:53 -07:00
Paul E. McKenney e0fcba9ac0 srcu: Make call_srcu() available during very early boot
Event tracing is moving to SRCU in order to take advantage of the fact
that SRCU may be safely used from idle and even offline CPUs.  However,
event tracing can invoke call_srcu() very early in the boot process,
even before workqueue_init_early() is invoked (let alone rcu_init()).
Therefore, call_srcu()'s attempts to queue work fail miserably.

This commit therefore detects this situation, and refrains from attempting
to queue work before rcu_init() time, but does everything else that it
would have done, and in addition, adds the srcu_struct to a global list.
The rcu_init() function now invokes a new srcu_init() function, which
is empty if CONFIG_SRCU=n.  Otherwise, srcu_init() queues work for
each srcu_struct on the list.  This all happens early enough in boot
that there is but a single CPU with interrupts disabled, which allows
synchronization to be dispensed with.

Of course, the queued work won't actually be invoked until after
workqueue_init() is invoked, which happens shortly after the scheduler
is up and running.  This means that although call_srcu() may be invoked
any time after per-CPU variables have been set up, there is still a very
narrow window when synchronize_srcu() won't work, and this window
extends from the time that the scheduler starts until the time that
workqueue_init() returns.  This can be fixed in a manner similar to
the fix for synchronize_rcu_expedited() and friends, but until someone
actually needs to use synchronize_srcu() during this window, this fix
is added churn for no benefit.

Finally, note that Tree SRCU's new srcu_init() function invokes
queue_work() rather than the queue_delayed_work() function that is
invoked post-boot.  The reason is that queue_delayed_work() will (as you
would expect) post a timer, and timers have not yet been initialized.
So use of queue_work() avoids the complaints about use of uninitialized
spinlocks that would otherwise result.  Besides, some delay is already
provide by the aforementioned fact that the queued work won't actually
be invoked until after the scheduler is up and running.

Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:10:19 -07:00
Mike Galbraith 894d45bbf7 rcu: Convert rcu_state.ofl_lock to raw_spinlock_t
1e64b15a4b ("rcu: Fix grace-period hangs due to race with CPU offline")
added spinlock_t ofl_lock to the rcu_state structure, then takes it with
preemption disabled during CPU offline, which gives the -rt patchset's
sleeping spinlock heartburn.

This commit therefore converts ->ofl_lock to raw_spinlock_t.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2018-08-30 16:03:54 -07:00
Paul E. McKenney 8d8a9d0e7e rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed
The rcu_data structure's ->dynticks_fqs is incremented but never
accesses.  Its ->cond_resched_completed field isn't used at all.
This commit therefore removes both fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:53 -07:00
Paul E. McKenney dc5a4f2932 rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks
This commit move ->dynticks from the rcu_dynticks structure to the
rcu_data structure, replacing the field of the same name.  It also updates
the code to access ->dynticks from the rcu_data structure and to use the
rcu_data structure rather than following to now-gone ->dynticks field
to the now-gone rcu_dynticks structure.  While in the area, this commit
also fixes up comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:52 -07:00
Paul E. McKenney 4c5273bf2b rcu: Switch dyntick nesting counters to rcu_data structure
This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from
the rcu_dynticks structure and updates the code to access them from the
rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:51 -07:00
Paul E. McKenney 2dba13f0b6 rcu: Switch urgent quiescent-state requests to rcu_data structure
This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the
rcu_dynticks structure and updates the code to access them from the
rcu_data structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:50 -07:00
Paul E. McKenney df63fa5bc1 rcu: Convert "1UL << x" to "BIT(x)"
This commit saves a few characters by converting "1UL << x" to "BIT(x)".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:46 -07:00
Paul E. McKenney fced9c8cfe rcu: Avoid resched_cpu() when rescheduling the current CPU
The resched_cpu() interface is quite handy, but it does acquire the
specified CPU's runqueue lock, which does not come for free.  This
commit therefore substitutes the following when directing resched_cpu()
at the current CPU:

	set_tsk_need_resched(current);
	set_preempt_need_resched();

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-08-30 16:03:45 -07:00
Paul E. McKenney d3052109c0 rcu: More aggressively enlist scheduler aid for nohz_full CPUs
Because nohz_full CPUs can leave the scheduler-clock interrupt disabled
even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to
enlist the scheduler's aid in extracting a quiescent state from such CPUs.
This commit therefore more aggressively uses resched_cpu() on nohz_full
CPUs that fail to pass through a quiescent state in a timely manner.
By default, the resched_cpu() beating starts 300 milliseconds into the
quiescent state.

While in the neighborhood, add a ->last_fqs_resched field to the rcu_data
structure in order to rate-limit resched_cpu() calls from the RCU
grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:44 -07:00
Paul E. McKenney c06aed0e31 rcu: Compute jiffies_till_sched_qs from other kernel parameters
The jiffies_till_sched_qs value used to determine how old a grace period
must be before RCU enlists the help of the scheduler to force a quiescent
state on the holdout CPU.  Currently, this defaults to HZ/10 regardless of
system size and may be set only at boot time.  This can be a problem for
very large systems, because if the values of the jiffies_till_first_fqs
and jiffies_till_next_fqs kernel parameters are left at their defaults,
they are calculated to increase as the number of CPUs actually configured
on the system increases.  Thus, on a sufficiently large system, RCU would
enlist the help of the scheduler before the grace-period kthread had a
chance to scan for idle CPUs, which wastes CPU time.

This commit therefore allows jiffies_till_sched_qs to be set, if desired,
but if left as default, computes is as jiffies_till_first_fqs plus twice
jiffies_till_next_fqs, thus allowing three force-quiescent-state scans
for idle CPUs.  This scales with the number of CPUs, providing sensible
default values.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:43 -07:00
Paul E. McKenney 7e28c5af4e rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure
The ->rcu_qs_ctr counter was intended to allow providing a lightweight
report of a quiescent state to all RCU flavors.  But now that there is
only one flavor of RCU in any one running kernel, there is no point in
having this feature.  This commit therefore removes the ->rcu_qs_ctr
field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field
from the rcu_data structure.  This results in the "rqc" option to the
rcu_fqs trace event no longer being used, so this commit also removes the
"rqc" description from the header comment.

While in the neighborhood, this commit also causes the forward-progress
request .rcu_need_heavy_qs be set one jiffies_till_sched_qs interval
later in the grace period than the first setting of .rcu_urgent_qs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:42 -07:00
Paul E. McKenney a0ef9ec241 rcu: Provide improved interrupt-from-idle check in rcu_check_callbacks()
The patch making need_resched() respond to urgent RCU-QS needs used
is_idle_task(current) to detect an interrupt from idle, which does work
reasonably, but is (in theory at least) vulnerable to loops containing
need_resched() invoked from within RCU_NONIDLE() or its tracepoint
equivalent.  This commit therefore moves rcu_is_cpu_rrupt_from_idle()
to a place from which rcu_check_callbacks() can invoke it and replaces
the is_idle_task(current) with rcu_is_cpu_rrupt_from_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney 92aa39e9dc rcu: Make need_resched() respond to urgent RCU-QS needs
The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
need for an RCU quiescent state from the force-quiescent-state processing
within the grace-period kthread to context switches and to cond_resched().
Unfortunately, such urgent needs are not communicated to need_resched(),
which is sometimes used to decide when to invoke cond_resched(), for
but one example, within the KVM vcpu_run() function.  As of v4.15, this
can result in synchronize_sched() being delayed by up to ten seconds,
which can be problematic, to say nothing of annoying.

This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
rcu_check_callbacks(), which is invoked from the scheduling-clock
interrupt handler.  If the current task is not an idle task and is
not executing in usermode, a context switch is forced, and either way,
the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
task is an idle task, then RCU's dyntick-idle code will detect the
quiescent state, so no further action is required.  Similarly, if the
task is executing in usermode, other code in rcu_check_callbacks() and
its called functions will report the corresponding quiescent state.

Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney dd46a7882c rcu: Inline _rcu_barrier() into its sole remaining caller
Because rcu_barrier() is a one-line wrapper function for _rcu_barrier()
and because nothing else calls _rcu_barrier(), this commit inlines
_rcu_barrier() into rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:39 -07:00
Paul E. McKenney 395a2f097e rcu: Define rcu_all_qs() only in !PREEMPT builds
Now that rcu_all_qs() is used only in !PREEMPT builds, move it to
tree_plugin.h so that it is defined only in those builds.  This in
turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT
builds, but it is simply marked __maybe_unused in order to keep it
near the rest of the dyntick-idle code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:37 -07:00
Paul E. McKenney 49918a54e6 rcu: Clean up flavor-related definitions and comments in tree.c
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:34 -07:00
Paul E. McKenney de3875d302 rcu: Remove now-unused rcutorture APIs
This commit removes rcu_sched_get_gp_seq(), rcu_bh_get_gp_seq(),
rcu_exp_batches_completed_sched(), rcu_sched_force_quiescent_state(),
and rcu_bh_force_quiescent_state(), which are no longer used because
rcutorture no longer does "rcu_bh" and "rcu_sched" torture types.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:30 -07:00
Paul E. McKenney a8bb74acd8 rcu: Consolidate RCU-sched update-side function definitions
This commit saves a few lines by consolidating the RCU-sched function
definitions at the end of include/linux/rcupdate.h.  This consolidation
also makes it easier to remove them all when the time comes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:26 -07:00
Paul E. McKenney 4c7e9c1434 rcu: Consolidate RCU-bh update-side function definitions
This commit saves a few lines by consolidating the RCU-bh function
definitions at the end of include/linux/rcupdate.h.  This consolidation
also makes it easier to remove them all when the time comes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:25 -07:00
Paul E. McKenney c3854a055b rcu: Pull rcu_gp_kthread() FQS loop into separate function
The rcu_gp_kthread() function is long and deeply indented, so this
commit pulls the loop that repeatedly invokes rcu_gp_fqs() into a new
rcu_gp_fqs_loop() function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:24 -07:00
Paul E. McKenney 4e95020cdd rcu: Inline increment_cpu_stall_ticks() into its sole caller
Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks()
a trivial one-line function with only one caller.  This commit therefore
inlines it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:23 -07:00
Paul E. McKenney 8ff0b90780 rcu: Fix typo in force_qs_rnp()'s parameter's parameter
Pointers to rcu_data structures should be named rdp, not rsp.  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:23 -07:00
Paul E. McKenney eb7a665388 rcu: Eliminate initialization-time use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_cpu_starting() and rcu_init_one().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:22 -07:00
Paul E. McKenney ec9f5835f7 rcu: Eliminate RCU-barrier use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces rsp
with &rcu_state in rcu_barrier_callback(), rcu_barrier_func(), and
_rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:22 -07:00
Paul E. McKenney 67a0edbf3c rcu: Eliminate quiescent-state and grace-period-nonstart use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_report_qs_rnp(), force_quiescent_state(), and
rcu_check_gp_start_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:21 -07:00
Paul E. McKenney 3c779dfef2 rcu: Eliminate callback-invocation/invocation use of rsp
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it.  This commit therefore replaces rsp with
&rcu_state in rcu_do_batch(), invoke_rcu_callbacks(), and __call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:21 -07:00
Paul E. McKenney 9cbc5b9702 rcu: Eliminate grace-period management code use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces
rsp with &rcu_state in rcu_start_this_gp(), rcu_accelerate_cbs(),
__note_gp_changes(), rcu_gp_init(), rcu_gp_fqs(), rcu_gp_cleanup(),
rcu_gp_kthread(), and rcu_report_qs_rsp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:20 -07:00
Paul E. McKenney 4c6ed43708 rcu: Eliminate stall-warning use of rsp
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it.  This commit therefore replaces rsp
with &rcu_state in print_other_cpu_stall(), print_cpu_stall(), and
check_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:20 -07:00
Paul E. McKenney 7cba4775ba rcu: Restructure rcu_check_gp_kthread_starvation()
This commit removes the rsp and gpa local variables, repurposes the j
local variable and adds a gpk (GP kthread) local to improve readability.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:19 -07:00
Paul E. McKenney f7dd7d44fd rcu: Simplify rcutorture_get_gp_data()
This commit restructures rcutorture_get_gp_data() to take advantage of
the fact that there is only one flavor of RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:19 -07:00
Paul E. McKenney b97d23c51c rcu: Remove for_each_rcu_flavor() flavor-traversal macro
Now that there is only ever a single flavor of RCU in a given kernel
build, there isn't a whole lot of point in having a flavor-traversal
macro.  This commit therefore removes it and converts calls to it to
straightline code, inlining trivial functions as appropriate.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:18 -07:00
Paul E. McKenney 88d1bead85 rcu: Remove rcu_data structure's ->rsp field
Now that there is only one rcu_state structure, there is no need for the
rcu_data structure to indicate which it corresponds to.  This commit
therefore removes the rcu_data structure's ->rsp field, replacing all
remaining uses of it with &rcu_state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:17 -07:00
Paul E. McKenney aedf4ba984 rcu: Remove rsp parameter from rcu_node tree accessor macros
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's rcu_node
tree's accessor macros.  This commit therefore removes the rsp parameter
from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp
local variables while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:16 -07:00
Paul E. McKenney 63d4c8c979 rcu: Remove rsp parameter from expedited grace-period functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions.  This commit therefore removes the rsp parameter
from the code in kernel/rcu/tree_exp.h, and removes all of the
rsp local variables while in the area.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:14 -07:00
Paul E. McKenney 4580b0541b rcu: Remove rsp parameter from no-CBs CPU functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions.  This commit therefore removes the rsp parameter
from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(),
rcu_organize_nocb_kthreads(), rcu_nocb_cpu_needs_barrier(), and
rcu_nohz_full_cpu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:13 -07:00
Paul E. McKenney b21ebed951 rcu: Remove rsp parameter from print_cpu_stall_info()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_cpu_stall_info().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:12 -07:00
Paul E. McKenney 81ab59a3ad rcu: Remove rsp parameter from dump_blkd_tasks() and friend
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:10 -07:00
Paul E. McKenney a2887cd85f rcu: Remove rsp parameter from rcu_print_detail_task_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_print_detail_task_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:09 -07:00
Paul E. McKenney b8bb1f63cf rcu: Remove rsp parameter from rcu_init_one() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_init_one() and rcu_dump_rcu_node_tree().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:09 -07:00
Paul E. McKenney 53b46303da rcu: Remove rsp parameter from rcu_boot_init_percpu_data() and friends
There now is only one rcu_state structure in a given build of
the Linux kernel, so there is no need to pass it as a parameter
to RCU's functions.  This commit therefore removes the rsp
parameter from rcu_boot_init_percpu_data(), rcu_init_percpu_data(),
rcu_cleanup_dying_idle_cpu(), and rcu_migrate_callbacks().  While in
the neighborhood, line the last three into rcutree_prepare_cpu(),
rcu_report_dead() and rcutree_migrate_callbacks(), respectively.
This also gets rid of the for_each_rcu_flavor() calls that were in
those tree functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:08 -07:00
Paul E. McKenney 8344b871b1 rcu: Remove rsp parameter from _rcu_barrier() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
_rcu_barrier_trace() and _rcu_barrier().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:08 -07:00
Paul E. McKenney 98ece508b5 rcu: Remove rsp parameter from __rcu_pending()
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's functions.
This commit therefore removes the rsp parameter from __rcu_pending(),
and also inlines it into rcu_pending(), removing the for_each_rcu_flavor()
while in the neighborhood..

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:07 -07:00
Paul E. McKenney 5c7d89676b rcu: Remove rsp parameter from __call_rcu() and friend
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__call_rcu_core() and __call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:07 -07:00
Paul E. McKenney b049fdf8e3 rcu: Remove rsp parameter from __rcu_process_callbacks()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__rcu_process_callbacks(), and also inlines it into rcu_process_callbacks(),
removing the for_each_rcu_flavor() while in the neighborhood.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:06 -07:00
Paul E. McKenney b96f9dc4fb rcu: Remove rsp parameter from rcu_check_gp_start_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_gp_start_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:06 -07:00
Paul E. McKenney e9ecb780fe rcu: Remove rsp parameter from force-quiescent-state functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
force_qs_rnp() and force_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:05 -07:00
Paul E. McKenney 5bb5d09cc4 rcu: Remove rsp parameter from rcu_do_batch()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_do_batch().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:05 -07:00
Paul E. McKenney 780cd59083 rcu: Remove rsp parameter from CPU hotplug functions
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_cleanup_dying_cpu() and rcu_cleanup_dead_cpu().  And, as long as
we are in the neighborhood, inlines them into rcutree_dying_cpu() and
rcutree_dead_cpu(), respectively.  This also eliminates a pair of
for_each_rcu_flavor() loops.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:04 -07:00
Paul E. McKenney 8087d3e3c4 rcu: Remove rsp parameter from rcu_check_quiescent_state()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_quiescent_state().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:04 -07:00
Paul E. McKenney 0854a05c9f rcu: Remove rsp parameter from rcu_gp_kthread() and friends
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_init(), rcu_gp_fqs_check_wake(), rcu_gp_fqs(), rcu_gp_cleanup(),
and rcu_gp_kthread().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:03 -07:00
Paul E. McKenney 22212332c1 rcu: Remove rsp parameter from rcu_gp_slow()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_slow().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:03 -07:00
Paul E. McKenney 15cabdffbb rcu: Remove rsp parameter from note_gp_changes()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:02 -07:00
Paul E. McKenney c7e48f7ba3 rcu: Remove rsp parameter from __note_gp_changes()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
__note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:01 -07:00
Paul E. McKenney 834f56bf54 rcu: Remove rsp parameter from rcu_advance_cbs()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_advance_cbs().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:01 -07:00
Paul E. McKenney c6e09b97b9 rcu: Remove rsp parameter from rcu_accelerate_cbs_unlocked()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_accelerate_cbs_unlocked().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:00 -07:00
Paul E. McKenney 02f501423d rcu: Remove rsp parameter from rcu_accelerate_cbs()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_accelerate_cbs().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:03:00 -07:00
Paul E. McKenney 532c00c97f rcu: Remove rsp parameter from rcu_gp_kthread_wake()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_kthread_wake().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:59 -07:00
Paul E. McKenney 3481f2eab0 rcu: Remove rsp parameter from rcu_future_gp_cleanup()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_future_gp_cleanup().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:59 -07:00
Paul E. McKenney ea12ff2b7d rcu: Remove rsp parameter from check_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
check_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:58 -07:00
Paul E. McKenney 4e8b8e08f9 rcu: Remove rsp parameter from print_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:58 -07:00
Paul E. McKenney a91e7e58b1 rcu: Remove rsp parameter from print_other_cpu_stall()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
print_other_cpu_stall().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:57 -07:00
Paul E. McKenney e1741c69d4 rcu: Remove rsp parameter from rcu_stall_kick_kthreads()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_stall_kick_kthreads().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:57 -07:00
Paul E. McKenney 33dbdbf025 rcu: Remove rsp parameter from rcu_dump_cpu_stacks()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_dump_cpu_stacks().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:56 -07:00
Paul E. McKenney 8fd119b652 rcu: Remove rsp parameter from rcu_check_gp_kthread_starvation()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_check_gp_kthread_starvation().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:56 -07:00
Paul E. McKenney ad3832e974 rcu: Remove rsp parameter from record_gp_stall_check_time()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
record_gp_stall_check_time().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:55 -07:00
Paul E. McKenney 336a4f6c45 rcu: Remove rsp parameter from rcu_get_root()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_get_root().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:55 -07:00
Paul E. McKenney de8e87305a rcu: Remove rsp parameter from rcu_gp_in_progress()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_gp_in_progress().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:54 -07:00
Paul E. McKenney 33085c469a rcu: Remove rsp parameter from rcu_report_qs_rdp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rdp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:53 -07:00
Paul E. McKenney 139ad4da5a rcu: Remove rsp parameter from rcu_report_unblock_qs_rnp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_unblock_qs_rnp(), which is particularly appropriate in
this case given that this parameter is no longer used.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:53 -07:00
Paul E. McKenney aff4e9ede5 rcu: Remove rsp parameter from rcu_report_qs_rsp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rsp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:52 -07:00
Paul E. McKenney b50912d0b5 rcu: Remove rsp parameter from rcu_report_qs_rnp()
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions.  This commit therefore removes the rsp parameter from
rcu_report_qs_rnp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:51 -07:00
Paul E. McKenney 2280ee5a7d rcu: Remove rcu_data_p pointer to default rcu_data structure
The rcu_data_p pointer references the default set of per-CPU rcu_data
structures, that is, those that call_rcu() uses, as opposed to
call_rcu_bh() and sometimes call_rcu_sched().  But there is now only one
set of per-CPU rcu_data structures, so that one set is by definition
the default, which means that the rcu_data_p pointer no longer serves
any useful purpose.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:51 -07:00
Paul E. McKenney 16fc9c600b rcu: Remove rcu_state_p pointer to default rcu_state structure
The rcu_state_p pointer references the default rcu_state structure,
that is, the one that call_rcu() uses, as opposed to call_rcu_bh()
and sometimes call_rcu_sched().  But there is now only one rcu_state
structure, so that one structure is by definition the default, which
means that the rcu_state_p pointer no longer serves any useful purpose.
This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:50 -07:00
Paul E. McKenney da1df50d16 rcu: Remove rcu_state structure's ->rda field
The rcu_state structure's ->rda field was used to find the per-CPU
rcu_data structures corresponding to that rcu_state structure.  But now
there is only one rcu_state structure (creatively named "rcu_state")
and one set of per-CPU rcu_data structures (creatively named "rcu_data").
Therefore, uses of the ->rda field can always be replaced by "rcu_data,
and this commit makes that change and removes the ->rda field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:49 -07:00
Paul E. McKenney ec5dd444b6 rcu: Eliminate rcu_state structure's ->call field
The rcu_state structure's ->call field references the corresponding RCU
flavor's call_rcu() function.  However, now that there is only ever one
rcu_state structure in a given build of the Linux kernel, and that flavor
uses plain old call_rcu(), there is not a lot of point in continuing to
have the ->call field.  This commit therefore removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:48 -07:00
Paul E. McKenney 358be2d368 rcu: Remove RCU_STATE_INITIALIZER()
Now that a given build of the Linux kernel has only one set of rcu_state,
rcu_node, and rcu_data structures, there is no point in creating a macro
to declare and compile-time initialize them.  This commit therefore
just does normal declaration and compile-time initialization of these
structures.  While in the area, this commit also removes #ifndefs of
the no-longer-ever-defined preprocessor macro RCU_TREE_NONCORE.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:47 -07:00
Paul E. McKenney 45975c7d21 rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds
Now that RCU-preempt knows about preemption disabling, its implementation
of synchronize_rcu() works for synchronize_sched(), and likewise for the
other RCU-sched update-side API members.  This commit therefore confines
the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines
RCU-sched's update-side API members in terms of those of RCU-preempt.

This means that any given build of the Linux kernel has only one
update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds
and RCU-sched for CONFIG_PREEMPT=n builds.  This in turn means that kernels
built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
2018-08-30 16:02:45 -07:00
Paul E. McKenney 4cf439a200 rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:43 -07:00
Paul E. McKenney 2bbfc25b09 rcu: Drop "wake" parameter from rcu_report_exp_rdp()
The rcu_report_exp_rdp() function is always invoked with its "wake"
argument set to "true", so this commit drops this parameter.  The only
potential call site that would use "false" is in the code driving the
expedited grace period, and that code uses rcu_report_exp_cpu_mult()
instead, which therefore retains its "wake" parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:43 -07:00
Paul E. McKenney 82fcecfa81 rcu: Update comments and help text for no more RCU-bh updaters
This commit updates comments and help text to account for the fact that
RCU-bh update-side functions are now simple wrappers for their RCU or
RCU-sched counterparts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:42 -07:00
Paul E. McKenney 65cfe3583b rcu: Define RCU-bh update API in terms of RCU
Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.

In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:40 -07:00
Paul E. McKenney d28139c4e9 rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe
One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks.  One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.

This must be done carefully.  For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.

This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem.  This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs().  The latter call handles
any deferred quiescent states.

Note that __do_softirq() still invokes rcu_bh_qs().  It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
2018-08-30 16:02:38 -07:00
Paul E. McKenney e11ec65cc8 rcu: Add warning to detect half-interrupts
RCU's dyntick-idle code is written to tolerate half-interrupts, that it,
either an interrupt that invokes rcu_irq_enter() but never invokes the
corresponding rcu_irq_exit() on the one hand, or an interrupt that never
invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
on the other.  These things really did happen at one time, as evidenced
by this ca-2011 LKML post:

http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com

The reason why RCU tolerates half-interrupts is that usermode helpers
used exceptions to invoke a system call from within the kernel such that
the system call did a normal return (not a return from exception) to
the calling context.  This caused rcu_irq_enter() to be invoked without
a matching rcu_irq_exit().  However, usermode helpers have since been
rewritten to make much more housebroken use of workqueues, kernel threads,
and do_execve(), and therefore should no longer produce half-interrupts.
No one knows of any other source of half-interrupts, but then again,
no one seems insane enough to go audit the entire kernel to verify that
half-interrupts really are a relic of the past.

This commit therefore adds a pair of WARN_ON_ONCE() calls that will
trigger in the presence of half interrupts, which the code will continue
to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
mid-2021, then perhaps RCU can stop handling half-interrupts, which
would be a considerable simplification.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-08-30 16:02:36 -07:00
Paul E. McKenney 3e31009898 rcu: Defer reporting RCU-preempt quiescent states when disabled
This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled.  These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation.  Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.

This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.

Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption.  In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling.  Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.

In theory, this change lifts a long-standing restriction that required
that if interrupts were disabled across a call to rcu_read_unlock()
that the matching rcu_read_lock() also be contained within that
interrupts-disabled region of code.  Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks.  This may allow some code simplification that
might reduce interrupt latency a bit.  Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.

Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels.  This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh.  Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched.  This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.

Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
  from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
  response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
  via rcu_preempt_deferred_qs(). ]
2018-08-30 16:02:34 -07:00
Byungchul Park cf7614e13c rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
When entering or exiting irq or NMI handlers, the current code uses
->dynticks_nmi_nesting to detect if it is in the outermost handler,
that is, the one interrupting or returning to an RCU-idle context (the
idle loop or nohz_full usermode execution).  When entering the outermost
handler via an interrupt (as opposed to NMI), it is necessary to invoke
rcu_dynticks_task_exit() just before the CPU is marked non-idle from an
RCU perspective and to invoke rcu_cleanup_after_idle() just after the
CPU is marked non-idle.  Similarly, when exiting the outermost handler
via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just
before marking the CPU idle and to invoke rcu_dynticks_task_enter()
just after marking the CPU idle.

The decision to execute these four functions is currently taken in
rcu_irq_enter() and rcu_irq_exit() as follows:

   rcu_irq_enter()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_enter()
         /* A conditional branch with ->dynticks */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_irq_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_exit()
         /* A conditional branch with ->dynticks_nmi_nesting */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter()
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */

This works, but the conditional branches in rcu_irq_enter() and
rcu_irq_exit() are redundant with those in rcu_nmi_enter() and
rcu_nmi_exit(), respectively.  Redundant branches are not something
we want in the to/from-idle fastpaths, so this commit refactors
rcu_{nmi,irq}_{enter,exit}() so they use a common inlined function passed
a constant argument as follows:

   rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true)
      /* A conditional branch with ->dynticks */

   rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true)
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false)
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false)
      /* A conditional branch with ->dynticks_nmi_nesting */

The combination of the constant function argument and the inlining allows
the compiler to discard the conditionals that previously controlled
execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(),
rcu_prepare_for_idle(), and rcu_dynticks_task_enter().  This reduces both
the to-idle and from-idle path lengths by two conditional branches each,
and improves readability as well.

This commit also changes order of execution from this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	trace_rcu_dyntick();
	rcu_cleanup_after_idle();

To this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	rcu_cleanup_after_idle();
	trace_rcu_dyntick();

In other words, the calls to rcu_cleanup_after_idle() and
trace_rcu_dyntick() are reversed.  This has no functional effect because
the real concern is whether a given call is before or after the call to
rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
CPU and after that call RCU is watching.

A similar switch in calling order happens on the idle-entry path, with
similar lack of effect for the same reasons.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Applied Steven Rostedt feedback. ]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:00:46 -07:00
Linus Torvalds f7951c33f0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:

 - Cleanup and improvement of NUMA balancing

 - Refactoring and improvements to the PELT (Per Entity Load Tracking)
   code

 - Watchdog simplification and related cleanups

 - The usual pile of small incremental fixes and improvements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  watchdog: Reduce message verbosity
  stop_machine: Reflow cpu_stop_queue_two_works()
  sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
  sched/numa: Use group_weights to identify if migration degrades locality
  sched/numa: Update the scan period without holding the numa_group lock
  sched/numa: Remove numa_has_capacity()
  sched/numa: Modify migrate_swap() to accept additional parameters
  sched/numa: Remove unused task_capacity from 'struct numa_stats'
  sched/numa: Skip nodes that are at 'hoplimit'
  sched/debug: Reverse the order of printing faults
  sched/numa: Use task faults only if numa_group is not yet set up
  sched/numa: Set preferred_node based on best_cpu
  sched/numa: Simplify load_too_imbalanced()
  sched/numa: Evaluate move once per node
  sched/numa: Remove redundant field
  sched/debug: Show the sum wait time of a task group
  sched/fair: Remove #ifdefs from scale_rt_capacity()
  sched/core: Remove get_cpu() from sched_fork()
  sched/cpufreq: Clarify sugov_get_util()
  sched/sysctl: Remove unused sched_time_avg_ms sysctl
  ...
2018-08-13 11:25:07 -07:00
Paul E. McKenney 18952651da Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEAD
fixes1.2018.07.12b: Post-gp_seq miscellaneous fixes
torture1.2018.07.12b: Post-gp_seq torture-test updates
2018-07-12 15:42:41 -07:00
Joel Fernandes (Google) 4babd855fd rcutorture: Add support to detect if boost kthread prio is too low
When rcutorture is built in to the kernel, an earlier patch detects
that and raises the priority of RCU's kthreads to allow rcutorture's
RCU priority boosting tests to succeed.

However, if rcutorture is built as a module, those priorities must be
raised manually via the rcutree.kthread_prio kernel boot parameter.
If this manual step is not taken, rcutorture's RCU priority boosting
tests will fail due to kthread starvation.  One approach would be to
raise the default priority, but that risks breaking existing users.
Another approach would be to allow runtime adjustment of RCU's kthread
priorities, but that introduces numerous "interesting" race conditions.
This patch therefore instead detects too-low priorities, and prints a
message and disables the RCU priority boosting tests in that case.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:08 -07:00
Paul E. McKenney 6bea2cc5a9 rcu: Remove rcutorture test version and sequence number
Back when RCU had a debugfs interface, there was a test version and
sequence number that allowed associating debugfs data with a particular
test run, where the test run started with modprobe and ended with rmmod,
which was how tests were run back on the old ABAT system within IBM.
But rcutorture testing no longer runs on ABAT, and there is no longer an
RCU debugfs interface, so there is no longer any need for test versions
and sequence numbers.

This commit therefore removes the rcutorture_record_test_transition()
and rcutorture_record_progress() functions, and along with them the
rcutorture_testseq and rcutorture_vernum variables that they update.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:42:01 -07:00
Joel Fernandes (Google) c7cd161ecb rcu: Assign higher prio to RCU threads if rcutorture is built-in
The rcutorture RCU priority boosting tests fail even with CONFIG_RCU_BOOST
set because rcutorture's threads run at the same priority as the default
RCU kthreads (RT class with priority of 1).

This patch checks if RCU torture is built into the kernel and if so,
assigns RT priority 1 to the RCU threads, allowing the rcutorture boost
tests to pass.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:26 -07:00
Byungchul Park 67abb96cbf rcu: Check the range of jiffies_till_{first,next}_fqs when setting them
Currently, the range of jiffies_till_{first,next}_fqs are checked and
adjusted on and on in the loop of rcu_gp_kthread on runtime.

However, it's enough to check them only when setting them, not every
time in the loop. So make them handled on a setting time via sysfs.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:22 -07:00
Paul E. McKenney 47199a0812 rcu: Add diagnostics for rcutorture writer stall warning
This commit adds any in-the-future ->gp_seq_needed fields to the
diagnostics for an rcutorture writer stall warning message.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:22 -07:00
Paul E. McKenney b06ae25a1e rcu: Use RCU CPU stall timeout for rcu_check_gp_start_stall()
Currently, rcu_check_gp_start_stall() waits for one second after the first
request before complaining that a grace period has not yet started.  This
was desirable while testing the conversion from ->future_gp_needed[] to
->gp_seq_needed, but it is a bit on the hair-trigger side for production
use under heavy load.  This commit therefore makes this wait time be
exactly that of the RCU CPU stall warning, allowing easy adjustment of
both timeouts to suit the distribution or installation at hand.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:20 -07:00
Paul E. McKenney 51fbb910f5 rcu: Remove __maybe_unused from rcu_cpu_has_callbacks()
The rcu_cpu_has_callbacks() function is now used in all configurations,
so this commit removes the __maybe_unused.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:19 -07:00
Paul E. McKenney 95394e69c4 rcu: Remove "inline" from panic_on_rcu_stall() and rcu_blocking_is_gp()
These functions are in kernel/rcu/tree.c, which is not an include file,
so there is no problem dropping the "inline", especially given that these
functions are nowhere near a fastpath.  This commit therefore delegates
the inlining decision to the compiler by dropping the "inline".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:18 -07:00
Paul E. McKenney 3b57a3994f rcu: Inline rcu_dynticks_momentary_idle() into its sole caller
The rcu_dynticks_momentary_idle() function is invoked only from
rcu_momentary_dyntick_idle(), and neither function is particularly
large.  This commit therefore saves a few lines by inlining
rcu_dynticks_momentary_idle() into rcu_momentary_dyntick_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:16 -07:00
Paul E. McKenney 6f56f714db rcu: Improve RCU-tasks naming and comments
The naming and comments associated with some RCU-tasks code make
the faulty assumption that context switches due to cond_resched()
are voluntary.  As several people pointed out, this is not the case.
This commit therefore updates function names and comments to better
reflect current reality.

Reported-by: Byungchul Park <byungchul.park@lge.com>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:15 -07:00
Joe Perches a7538352da rcu: Use pr_fmt to prefix "rcu: " to logging output
This commit also adjusts some whitespace while in the area.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
2018-07-12 15:39:13 -07:00
Byungchul Park 07f27570dc rcu: Improve rcu_note_voluntary_context_switch() reporting
We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
is called, no matter whether it actually be scheduled or not. However,
it currently doesn't report the quiescent state when the task enters
into __schedule() as it's called with preempt = true. So make it report
the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
called.

And in TINY_RCU, even though the quiescent state of rcu_bh also should
be reported when the tick interrupt comes from user, it doesn't. So make
it reported.

Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
reported when the tick interrupt comes from not only user but also idle,
as an extended quiescent state.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Simplify rcutiny portion given no RCU-tasks for !PREEMPT. ]
2018-07-12 15:39:12 -07:00
Paul E. McKenney f2e2df5978 rcu: Add diagnostics for offline CPUs failing to report QS
CPUs are expected to report quiescent states when coming online and
when going offline, and grace-period initialization is supposed to
handle any race conditions where a CPU's ->qsmask bit is set just after
it goes offline.  This commit adds diagnostics for the case where an
offline CPU nevertheless has a grace period waiting on it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:10 -07:00
Paul E. McKenney fea3f222d3 rcu: Record ->gp_state for both phases of grace-period initialization
Grace-period initialization first processes any recent CPU-hotplug
operations, and then initializes state for the new grace period.  These
two phases of initialization are currently not distinguished in debug
prints, but the distinction is valuable in a number of debug situations.
This commit therefore introduces two new values for ->gp_state,
RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:09 -07:00
Paul E. McKenney 5773894231 rcu: Add CPU online/offline state to dump_blkd_tasks()
Interactions between CPU-hotplug operations and grace-period
initialization can result in dump_blkd_tasks().  One of the first
debugging actions in this case is to search back in dmesg to work
out which of the affected rcu_node structure's CPUs are online and to
determine the last CPU-hotplug operation affecting any of those CPUs.
This can be laborious and error-prone, especially when console output
is lost.

This commit therefore causes dump_blkd_tasks() to dump the state of
the affected rcu_node structure's CPUs and the last grace period during
which the last offline and online operation affected each of these CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:09 -07:00
Paul E. McKenney e05121ba5b rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path
Now that quiescent states for newly offlined CPUs are reported either
when that CPU goes offline or at the end of grace-period initialization,
the CPU-hotplug failsafe in the force-quiescent-state code path is no
longer needed.

This commit therefore removes this failsafe.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:07 -07:00
Paul E. McKenney 17a8212b8d rcu: Remove failsafe check for lost quiescent state
Now that quiescent-state reporting is fully event-driven, this commit
removes the check for a lost quiescent state from force_qs_rnp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:06 -07:00
Paul E. McKenney f34f2f5852 rcu: Move grace-period pre-init delay after pre-init
The main race with the early part of grace-period initialization appears
to be with CPU hotplug.  To more fully open this race window, this commit
moves the rcu_gp_slow() from the beginning of the early initialization
loop to follow that loop, thus widening the race window, especially for
the rcu_node structures that are initialized last.  This commit also
expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
delay in the grace period, but concentrated in the spot where it will
do the most good.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:06 -07:00
Paul E. McKenney 1e64b15a4b rcu: Fix grace-period hangs due to race with CPU offline
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.	CPU 1 goes offline.

2.	Because CPU 1 is the only CPU in the system blocking the current
	grace period, the grace period ends as soon as
	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
	returns.

3.	At this point, the leaf rcu_node structure's ->lock is no longer
	held: rcu_report_qs_rnp() has released it, as it must in order
	to awaken the RCU grace-period kthread.

4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
	field still records CPU 1 as being online.  This is absolutely
	necessary because the scheduler uses RCU (in this case on the
	wake-up path while awakening RCU's grace-period kthread), and
	->qsmaskinitnext contains RCU's idea as to which CPUs are online.
	Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
	bit from ->qsmaskinitnext would result in a lockdep-RCU splat
	due to RCU being used from an offline CPU.

5.	RCU's grace-period kthread awakens, sees that the old grace period
	has completed and that a new one is needed.  It therefore starts
	a new grace period, but because CPU 1's leaf rcu_node structure's
	->qsmaskinitnext field still shows CPU 1 as being online, this new
	grace period is initialized to wait for a quiescent state from the
	now-offline CPU 1.

6.	Without the fail-safe force-quiescent-state checks, there would
	be no quiescent state from the now-offline CPU 1, which would
	eventually result in RCU CPU stall warnings and memory exhaustion.

It would be good to get rid of the special fail-safe quiescent-state
propagation checks, and thus it would be good to fix things so that
the above scenario cannot happen.  This commit therefore adds a new
->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
across the applying of buffered online and offline operations to the
rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
when buffering a new offline operation.  This prevents rcu_gp_init()
from acquiring the leaf rcu_node structure's lock during the interval
between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
which releases ->lock and the re-acquisition of that same lock.
This in turn prevents the failure scenario outlined above, and will
hopefully eventually allow removal of the offline-CPU checks from the
force-quiescent-state code path.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:04 -07:00
Paul E. McKenney ec2c29765a rcu: Fix grace-period hangs from mid-init task resume
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.	A task running on a given CPU is preempted in its RCU read-side
	critical section.

2.	That CPU goes offline, and there are now no online CPUs
	corresponding to that CPU's leaf rcu_node structure.

3.	The rcu_gp_init() function does the first phase of grace-period
	initialization, and sets the aforementioned leaf rcu_node
	structure's ->qsmaskinit field to all zeroes.  Because there
	is a blocked task, it does not propagate the zeroing of either
	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.

4.	The task resumes on some other CPU and exits its critical section.
	There is no grace period in progress, so the resulting quiescent
	state is not reported up the tree.

5.	The rcu_gp_init() function does the second phase of grace-period
	initialization, which results in the leaf rcu_node structure
	being initialized to expect no further quiescent states, but
	with that structure's parent expecting a quiescent-state report.

	The parent will never receive a quiescent state from this leaf
	rcu_node structure, so the grace period will hang, resulting in
	RCU CPU stall warnings.

It would be good to get rid of the special fail-safe quiescent-state
propagation checks.  This commit therefore checks the leaf rcu_node
structure's ->wait_blkd_tasks field during grace-period initialization.
If this flag is set, the rcu_report_qs_rnp() is invoked to immediately
report the possible quiescent state.  While in the neighborhood, this
commit also report quiescent states for any CPUs that went offline between
the two phases of grace-period initialization, thus reducing grace-period
delays and hopefully eventually allowing removal of offline-CPU checks
from the force-quiescent-state code path.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:04 -07:00
Paul E. McKenney 99990da1b3 rcu: Suppress more involved false-positive preempted-task splats
Consider the following sequence of events in a PREEMPT=y kernel:

1.	All but one of the CPUs corresponding to a given leaf rcu_node
	structure go offline.  Each of these CPUs clears its bit in that
	structure's ->qsmaskinitnext field.

2.	A new grace period starts, and rcu_gp_init() scans the leaf
	rcu_node structures, applying CPU-hotplug changes since the
	start of the previous grace period, including those changes in
	#1 above.  This copies each leaf structure's ->qsmaskinitnext
	to its ->qsmask field, which represents the CPUs that this new
	grace period will wait on.  Each copy operation is done holding
	the corresponding leaf rcu_node structure's ->lock, and at the
	end of this scan, rcu_gp_init() holds no locks.

3.	The last CPU corresponding to #1's leaf rcu_node structure goes
	offline, clearing its bit in that structure's ->qsmaskinitnext
	field, but not touching the ->qsmaskinit field.  Note that
	rcu_gp_init() is not currently holding any locks!  This CPU does
	-not- report a quiescent state because the grace period has not
	yet initialized itself sufficiently to have set any bits in any
	of the leaf rcu_node structures' ->qsmask fields.

4.	The rcu_gp_init() function continues initializing the new grace
	period, copying each leaf rcu_node structure's ->qsmaskinit
	field to its ->qsmask field while holding the corresponding ->lock.
	This sets the ->qsmask bit corresponding to #3's CPU.

5.	Before the grace period ends, #3's CPU comes back online.
	Because te grace period has not yet done any force-quiescent-state
	scans (which would report a quiescent state on behalf of any
	offline CPUs), this CPU's ->qsmask bit is still set.

6.	A task running on the newly onlined CPU is preempted while in
	an RCU read-side critical section.  Because this CPU's ->qsmask
	bit is net, not only does this task queue itself on the leaf
	rcu_node structure's ->blkd_tasks list, it also sets that
	structure's ->gp_tasks pointer to reference it.

7.	The grace period started in #1 above comes to an end.  This
	results in rcu_gp_cleanup() being invoked, which, among other
	things, checks to make sure that there are no tasks blocking the
	just-ended grace period, that is, that all ->gp_tasks pointers
	are NULL.  The ->gp_tasks pointer corresponding to the task
	preempted in #3 above is non-NULL, which results in a splat.

This splat is a false positive.  The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU went offline, or (2) The task started its RCU read-side
critical section on some other CPU, but then it would have had to have
been preempted before migrating to this CPU, which would mean that it
would have instead queued itself on that other CPU's rcu_node structure.
RCU's grace periods thus are working correctly.  Or, more accurately,
that remaining bugs in RCU's grace periods are elsewhere.

This commit eliminates this false positive by adding code to the end
of rcu_cpu_starting() that reports a quiescent state to RCU, which has
the side-effect of clearing that CPU's ->qsmask bit, preventing the
above scenario.  This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.  Nevertheless,
this commit does -not- remove the need for the force-quiescent-state
scans to check for offline CPUs, given that a CPU might remain offline
indefinitely.  And without the checks in the force-quiescent-state scans,
the grace period would also persist indefinitely, which could result in
hangs or memory exhaustion.

Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -after- the setting of this CPU's bit in the leaf rcu_node
structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU will complain
bitterly about quiescent states coming from an offline CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:03 -07:00
Paul E. McKenney fece27760f rcu: Suppress false-positive preempted-task splats
Consider the following sequence of events in a PREEMPT=y kernel:

1.	All CPUs corresponding to a given rcu_node structure go offline.
	A new grace period starts just after the CPU-hotplug code path
	does its synchronize_rcu() for the last CPU, so at least this
	CPU is present in that structure's ->qsmask.

2.	Before the grace period ends, a CPU comes back online, and not
	just any CPU, but the one corresponding to a non-zero bit in
	the leaf rcu_node structure's ->qsmask.

3.	A task running on the newly onlined CPU is preempted while in
	an RCU read-side critical section.  Because this CPU's ->qsmask
	bit is net, not only does this task queue itself on the leaf
	rcu_node structure's ->blkd_tasks list, it also sets that
	structure's ->gp_tasks pointer to reference it.

4.	The grace period started in #1 above comes to an end.  This
	results in rcu_gp_cleanup() being invoked, which, among other
	things, checks to make sure that there are no tasks blocking the
	just-ended grace period, that is, that all ->gp_tasks pointers
	are NULL.  The ->gp_tasks pointer corresponding to the task
	preempted in #3 above is non-NULL, which results in a splat.

This splat is a false positive.  The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU was all the way offline, or (2) The task started its
RCU read-side critical section on some other CPU, but then it would
have had to have been preempted before migrating to this CPU, which
would mean that it would have instead queued itself on that other CPU's
rcu_node structure.

This commit eliminates this false positive by adding code to the end
of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
which has the side-effect of clearing that CPU's ->qsmask bit, preventing
the above scenario.  This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.

Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -before- the clearing of this CPU's bit in the leaf
rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
will complain bitterly about quiescent states coming from an offline CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:02 -07:00
Paul E. McKenney 5554788e1d rcu: Suppress false-positive offline-CPU lockdep-RCU splat
The rcu_lockdep_current_cpu_online() function currently checks only the
RCU-sched data structures to determine whether or not RCU believes that a
given CPU is offline.  Unfortunately, there are multiple flavors of RCU,
which means that there is a short window of time during which the various
flavors disagree as to whether or not a given CPU is offline.  This can
result in false-positive lockdep-RCU splats in which some other flavor
of RCU tries to do something based on its view that the CPU is online,
only to get hit with a lockdep-RCU splat because RCU-sched instead
believes that the CPU is offline.

This commit therefore changes rcu_lockdep_current_cpu_online() to scan
all RCU flavors and to consider a given CPU to be online if any of the
RCU flavors believe it to be online, thus preventing these false-positive
splats.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:02 -07:00
Paul E. McKenney 928164351e rcu: Prevent useless FQS scan after all CPUs have checked in
The force_qs_rnp() function checks for ->qsmask being all zero, that is,
all CPUs for the current rcu_node structure having already passed through
quiescent states.  But with RCU-preempt, this is not sufficient to report
quiescent states further up the tree, so there are further checks that
can initiate RCU priority boosting and also for races with CPU-hotplug
operations.  However, if neither of these further checks apply, the code
proceeds to carry out a useless scan of an all-zero ->qsmask.

This commit therefore adds code to release the current rcu_node
structure's lock and continue on to the next rcu_node structure, thereby
avoiding this useless scan.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:01 -07:00
Paul E. McKenney 91f63ced7d rcu: Replace smp_wmb() with smp_store_release() for stall check
This commit gets rid of the smp_wmb() in record_gp_stall_check_time()
in favor of an smp_store_release().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:01 -07:00
Paul E. McKenney 77cfc7bf24 rcu: Fix typo and add additional debug
This commit fixes a typo and adds some additional debugging to the
message emitted when a task blocking the current grace period is listed
as blocking it when either that grace period ends or the next grace
period begins.  This commit also reformats the console message for
readability.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:39:00 -07:00
Paul E. McKenney c74859d1eb rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditions
If rcu_report_unblock_qs_rnp() is invoked on something other than
preemptible RCU or if there are still preempted tasks blocking the
current grace period, something went badly wrong in the caller.
This commit therefore adds WARN_ON_ONCE() to these conditions, but
leaving the legitimate reason for early exit (rnp->qsmask != 0)
unwarned.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:59 -07:00
Paul E. McKenney 8d672fa6bf rcu: Make rcu_init_new_rnp() stop upon already-set bit
Currently, rcu_init_new_rnp() walks up the rcu_node combining tree,
setting bits in the ->qsmaskinit fields on the way up.  It walks up
unconditionally, regardless of the initial state of these bits.  This is
OK because only the corresponding RCU grace-period kthread ever tests
or sets these bits during runtime.  However, it is also pointless, and
it increases both memory and lock contention (albeit only slightly), so
this commit stops the walk as soon as an already-set bit is encountered.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:59 -07:00
Paul E. McKenney c50cbe535c rcu: Fix an obsolete ->qsmaskinit comment
Back in the old days, when grace-period initialization blocked CPU
hotplug, the ->qsmaskinit mask was indeed updated at the time that
a given CPU went offline.  However, with the deferral of these updates
until the beginning of the next grace period in commit 0aa04b055e
("rcu: Process offlining and onlining only at grace-period start"),
it is instead ->qsmaskinitnext that gets updated at that time.

This commit therefore updates the obsolete comment.  It also fixes
punctuation while on the topic of comments mentioning ->qsmaskinit.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:58 -07:00
Paul E. McKenney 962aff03c3 rcu: Clean up handling of tasks blocked across full-rcu_node offline
Commit 0aa04b055e ("rcu: Process offlining and onlining only at
grace-period start") deferred handling of CPU-hotplug events until the
start of the next grace period, but consider the following sequence
of events:

1.	A task is preempted within an RCU-preempt read-side critical
	section.

2.	The CPU that this task was running on goes offline, along with all
	other CPUs sharing the corresponding leaf rcu_node structure.

3.	The task resumes execution.

4.	One of those CPUs comes back online before a new grace period starts.

In step 2, the code in the next rcu_gp_init() invocation will (correctly)
defer removing the leaf rcu_node structure from the upper-level bitmasks,
and will (correctly) set that structure's ->wait_blkd_tasks field.  During
the ensuing interval, RCU will (correctly) track the tasks preempted on
that structure because they must block any subsequent grace period.

In step 3, the code in rcu_read_unlock_special() will (correctly) remove
the task from the leaf rcu_node structure.  From this point forward, RCU
need not pay attention to this structure, at least not until one of the
corresponding CPUs comes back online.

In step 4, the code in the next rcu_gp_init() invocation will
(incorrectly) invoke rcu_init_new_rnp().  This is incorrect because
the corresponding rcu_cleanup_dead_rnp() was never invoked.  This is
nevertheless harmless because the upper-level bits are still set.
So, no harm, no foul, right?

At least, all is well until a little further into rcu_gp_init()
invocation, which will notice that there are no longer any tasks blocked
on the leaf rcu_node structure, conclude that there is no longer anything
left over from step 2's offline operation, and will therefore invoke
rcu_cleanup_dead_rnp().  But this invocation of rcu_cleanup_dead_rnp()
is for the beginning of the earlier offline interval, and the previous
invocation of rcu_init_new_rnp() is for the end of that same interval.
That is right, they are invoked out of order.

That cannot be good, can it?

It turns out that this is not a (correctness!) problem because
rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
are online, and refuses to do anything if so.  In other words, in the
case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
order, they both have no effect.

But this is at best an accident waiting to happen.

This commit therefore adds logic to rcu_gp_init() so that
rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
order, and so that neither are invoked at all in cases where RCU had to
pay attention to the leaf rcu_node structure during the entire time that
all corresponding CPUs were offline.

And, while in the area, this commit reduces confusion by using formal
parameters rather than local variables that just happen to have the same
value at that particular point in the code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:58 -07:00
Joel Fernandes (Google) 226ca5e766 rcu: Identify grace period is in progress as we advance up the tree
There's no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel lock loop. Its
sufficient if we just checked it in the start, and then subsequently
checked the internal nodes as we advanced up the combining tree. This
also makes sense because the grace-period updates propogate from the
root to the leaf, so there's a chance we may find a grace period has
started as we advance up, lets check for the same.

Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:57 -07:00
Joel Fernandes (Google) df2bf8f7f7 rcu: Use better variable names in funnel locking loop
The funnel locking loop in rcu_start_this_gp uses rcu_root as a
temporary variable while walking the combining tree. This causes a
tiresome exercise of a code reader reminding themselves that rcu_root
may not be root. Lets just call it rnp, and rename other variables as
well to be more appropriate.

Original patch: https://patchwork.kernel.org/patch/10396577/

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix name in comment as well. ]
2018-07-12 15:38:57 -07:00
Joel Fernandes b73de91d6a rcu: Rename the grace-period-request variables and parameters
The name 'c' is used for variables and parameters holding the requested
grace-period sequence number.  However it is no longer very meaningful
given the conversions from ->gpnum and (especially) ->completed to
->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.

Previous patch discussion is at:
https://patchwork.kernel.org/patch/10396579/

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney 3d18469a2b rcu: Regularize resetting of rcu_data wrap indicator
The rcu_data structure's ->gpwrap indicator is currently reset only
when the CPU in question detects a new grace period.  This is in theory
sufficient because any CPU that has been out of action for long enough
that its ->gpwrap indicator is set is guaranteed to see both the end
of an old grace period and the start of a new one.

However, the current code leaves a short window during which the ->gpwrap
indicator has been reset but the corresponding ->gp_seq counter has not
yet been brought up to date.  This is harmless because interrupts are
disabled, but it is likely to (at the very least) cause confusion.

This commit therefore moves the resetting of ->gpwrap to follow the
updating of ->gp_seq.  While in the area, it also resets ->gp_seq_needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney d72193123c rcutorture: Correctly handle grace-period sequence wrap
The new ->gq_seq grace-period sequence numbers must be shifted down,
which give artifacts when these numbers wrap.  This commit therefore
enables rcutorture and rcuperf to handle grace-period sequence numbers
even if they do wrap.  It does this by allowing a special subtraction
function to be specified, and this function subtracts before shifting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:55 -07:00
Paul E. McKenney 2e3e5e5501 rcu: Make rcu_start_this_gp() check for grace period already started
In the old days of ->gpnum and ->completed, the code requesting a new
grace period checked to see if that grace period had already started,
bailing early if so.  The new-age ->gp_seq approach instead checks
whether the grace period has already finished.  A compensating change
pushed the requested grace period down to the bottom of the tree, thus
reducing lock contention and even eliminating it in some cases.  But why
not further reduce contention, especially on large systems, by doing both,
especially given that the cost of doing both is extremely small?

This commit therefore adds a new rcu_seq_started() function that checks
whether a specified grace period has already started.  It then uses
this new function in place of rcu_seq_done() in the rcu_start_this_gp()
function's funnel locking code.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:54 -07:00
Joel Fernandes (Google) 5ca0905f67 rcu: Fix cpustart tracepoint gp_seq number
The "cpustart" trace event shows a stale gp_seq. This is because it uses
rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
function. This commit therefore instead uses rnp->gp_seq.

An alternative fix would be to update rdp->gp_seq earlier, but this would
break RCU's detection of the beginning of a new-to-this-CPU grace period.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Joel Fernandes (Google) 5b55072f22 rcu: Produce last "CleanupMore" trace only if late-breaking request
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
response to late-arriving grace-period requests even if the grace period
was already requested. This makes "CleanupMore" show up an extra time (in
addition to once for each rcu_node structure that was previously marked
with the request), and for no good reason.  This commit therefore avoids
emitting this trace message unless the the only request for this next
grace period arrived during or after the cleanup scan of the rcu_node
structures.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Paul E. McKenney a2165e4168 rcu: Don't funnel-lock above leaf node if GP in progress
The old grace-period start code would acquire only the leaf's rcu_node
structure's ->lock if that structure believed that a grace period was
in progress.  The new code advances to the leaf's parent in this case,
needlessly acquiring then leaf's parent's ->lock.  This commit therefore
checks the grace-period state after marking the leaf with the need for
the specified grace period, and if the leaf believes that a grace period
is in progress, takes an early exit.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
2018-07-12 15:38:52 -07:00
Paul E. McKenney e44e73ca47 rcu: Make simple callback acceleration refer to rdp->gp_seq_needed
Now that the rcu_data structure contains ->gp_seq_needed, create an
rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
see if new callbacks' required grace period has already been requested.
If so, update the callback list locally and again locklessly.  (Though
interrupts must be and are disabled to avoid racing with conflicting
updates in interrupt handlers.)

Otherwise, call rcu_accelerate_cbs() as before.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:49 -07:00
Paul E. McKenney ff3bb6f4d0 rcu: Remove ->gpnum and ->completed
Now that everything has been converted to use ->gp_seq instead of
->gpnum and ->completed, this commit removes ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:48 -07:00
Paul E. McKenney fee5997c17 rcu: Convert rcu_fqs tracepoint to ->gp_seq
This commit makes the rcu_fqs tracepoint use ->gp_seq instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney db023296f0 rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney abd13fdd95 rcu: Convert rcu_future_grace_period tracepoint to gp_seq
This commit makes the rcu_future_grace_period tracepoint use gp_seq
instead of ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:44 -07:00
Paul E. McKenney 477351f782 rcu: Convert rcu_grace_period tracepoint to gp_seq
This commit makes the rcu_grace_period tracepoint use gp_seq instead
of ->gpnum or ->completed.  It also introduces a "cpuofl-bgp" string to
less obscurely indicate when a CPU has gone offline while a grace period
is waiting on it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:43 -07:00
Paul E. McKenney ab5e869c1f rcu: Make rcu_nocb_wait_gp() check if GP already requested
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see
if the current CPU already knows about the needed grace period having
already been requested.  If so, it avoids acquiring the corresponding
leaf rcu_node structure's ->lock, thus decreasing contention.  This
optimization is intended for cases where either multiple leader rcuo
kthreads are running on the same CPU or these kthreads are running on
a non-offloaded (e.g., housekeeping) CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ]
[ paulmck: Fix caching of furthest-future requested grace period. ]
2018-07-12 15:38:42 -07:00
Paul E. McKenney 7a1d0f23ad rcu: Move from ->need_future_gp[] to ->gp_seq_needed
One problem with the ->need_future_gp[] array is that the grace-period
assignment of each element changes as the grace periods complete.
This means that it is necessary to hold a lock when checking this
array to learn if a given grace period has already been requested.
This increase lock contention, which is the opposite of helpful.
This commit therefore replaces the ->need_future_gp[] with a single
->gp_seq_needed value and keeps it updated in the rcu_data structure.

This will enable reliable lockless checking of whether or not a given
grace period has already been requested.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:37:48 -07:00
Paul E. McKenney aebc82644b rcutorture: Convert rcutorture_get_gp_data() to ->gp_seq
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq.  This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:57 -07:00
Paul E. McKenney 471f87c3d9 rcu: Make RCU CPU stall warnings use ->gp_seq
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:56 -07:00
Paul E. McKenney 29365e563b rcu: Convert grace-period requests to ->gp_seq
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq.  The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array.  The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function.  The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq.  The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap().  The rcu_advance_cbs()
function replaces an access to >completed with one to ->gp_seq and adds
some temporary warnings.  The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().

The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed.  Their purpose is to locate code who
might still be using ->gpnum and ->completed.  (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:55 -07:00
Paul E. McKenney d43a5d32e1 rcu: Convert ->completedqs to ->gp_seq
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:54 -07:00
Paul E. McKenney 8aa670cdac rcu: Convert ->rcu_iw_gpnum to ->gp_seq
This commit switches the interrupt-disabled detection mechanism to
->gp_seq.  This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:53 -07:00
Paul E. McKenney ba04107fc9 rcu: Move rcu_gp_in_progress() to ->gp_seq
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum.  The READ_ONCE() invocations are buried
in rcu_seq_current().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:52 -07:00
Paul E. McKenney e05720b097 rcu: Move rcu_implicit_dynticks_qs() to ->gp_seq
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:51 -07:00
Paul E. McKenney a66ae8ae35 rcu: Convert rcu_gpnum_ovf() to ->gp_seq
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:50 -07:00
Paul E. McKenney 67e14c1e39 rcu: Move RCU's grace-period-change code to ->gp_seq
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend: trace as suggested by Joel Fernandes. ]
2018-07-12 14:27:50 -07:00
Paul E. McKenney e4be81a2ed rcu: Convert conditional grace-period primitives to ->gp_seq
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq.  Note that this also introduces a full
memory barrier in the already-done paths off cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() were insufficiently strong in some situations where
these two functions were called just as the grace period ended.  In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.

Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:49 -07:00
Paul E. McKenney c9a24e2d0c rcu: Make quiescent-state reporting use ->gp_seq
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq.  In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported.  Failing to catch these races would result in too-short
grace periods, hence the checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney 78c5a67f17 rcu: Convert rcu_check_gp_kthread_starvation() to GP sequence number
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney 17ef2fe97c rcu: Make rcutorture's batches-completed API use ->gp_seq
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed.  This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:47 -07:00
Paul E. McKenney dee4f42298 rcu: Move rcu_gp_slow() to ->gp_seq
This commit moves rcu_gp_slow() to ->gp_seq.  This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney de30ad512a rcu: Introduce grace-period sequence numbers
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney 18390aeae7 rcu: Make rcu_gp_cleanup() write only once to ->gp_flags
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests.  This commit therefore adds an else-clause
to avoid this double write.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:25:17 -07:00
Paul E. McKenney 26d950a945 rcu: Diagnostics for grace-period startup hangs
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second.  This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
2018-07-12 14:24:42 -07:00
Paul E. McKenney 8c42b1f39f rcu: Exclude near-simultaneous RCU CPU stall warnings
There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU.  This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.

This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters).  This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:56 -07:00
Paul E. McKenney 4bc8d55574 rcu: Add debugging info to assertion
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in
rcu_gp_cleanup() triggers (inexplicably, of course) every so often.
This commit therefore extracts more information.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Peter Zijlstra b3dae109fa sched/swait: Rename to exclusive
Since swait basically implemented exclusive waits only, make sure
the API reflects that.

  $ git grep -l -e "\<swake_up\>"
		-e "\<swait_event[^ (]*"
		-e "\<prepare_to_swait\>" | while read file;
    do
	sed -i -e 's/\<swake_up\>/&_one/g'
	       -e 's/\<swait_event[^ (]*/&_exclusive/g'
	       -e 's/\<prepare_to_swait\>/&_exclusive/g' $file;
    done

With a few manual touch-ups.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: bigeasy@linutronix.de
Cc: oleg@redhat.com
Cc: paulmck@linux.vnet.ibm.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-06-20 11:35:56 +02:00
Peter Zijlstra f64c6013a2 rcu/x86: Provide early rcu_cpu_starting() callback
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.

RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.

Fix this by initializing RCU way early in the MTRR case.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22 16:12:26 -07:00
Paul E. McKenney 22df7316ac Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD
exp.2018.05.15a: Parallelize expedited grace-period initialization.
fixes.2018.05.15a: Miscellaneous fixes.
lock.2018.05.15a: Decrease lock contention on root rcu_node structure,
	which is a step towards merging RCU flavors.
torture.2018.05.15a: Torture-test updates.
2018-05-15 10:33:05 -07:00
Paul E. McKenney a458360af6 rcu: Drop early GP request check from rcu_gp_kthread()
Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time.  This commit therefore removes
this check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:31:04 -07:00