linux/kernel
Tejun Heo e5ced8ebb1 cgroup_freezer: replace freezer->lock with freezer_mutex
After 96d365e0b8 ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e0b8 ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-13 11:26:31 -04:00
..
debug mm: per-thread vma caching 2014-04-07 16:35:53 -07:00
events Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-04-05 13:20:43 -07:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq genirq: Export symbol no_action() 2014-03-22 11:33:09 +01:00
locking lglock: map to spinlock when !CONFIG_SMP 2014-04-07 16:36:14 -07:00
power kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
printk printk: fix one circular lockdep warning about console_lock 2014-04-03 16:21:08 -07:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 11:21:19 -07:00
sched Merge branch 'akpm' (incoming from Andrew) 2014-04-07 16:38:06 -07:00
time kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
acct.c
async.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
auditfilter.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
auditsc.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-04-03 09:26:18 -07:00
cgroup.c cgroup: introduce task_css_is_root() 2014-05-13 11:26:27 -04:00
cgroup_freezer.c cgroup_freezer: replace freezer->lock with freezer_mutex 2014-05-13 11:26:31 -04:00
compat.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 12:51:41 -07:00
configs.c
context_tracking.c context_tracking: Wrap static key check into more intuitive function name 2013-12-02 20:43:14 +01:00
cpu.c CPU hotplug: Provide lockless versions of callback registration functions 2014-03-20 13:43:40 +01:00
cpu_pm.c
cpuset.c Merge branch 'akpm' (incoming from Andrew) 2014-04-03 16:22:16 -07:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process 2014-04-07 16:36:06 -07:00
extable.c asmlinkage: Make main_extable_sort_needed visible 2014-02-13 18:13:22 -08:00
fork.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex.c futex: update documentation for ordering guarantees 2014-04-12 17:57:51 -07:00
futex_compat.c compat: Get rid of (get|put)_compat_time(val|spec) 2014-02-02 14:09:12 -08:00
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hrtimer.c timer: Remove code redundancy while calling get_nohz_timer_target() 2014-03-20 12:35:46 +01:00
hung_task.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
irq_work.c perf/x86: Warn to early_printk() in case irq_work is too slow 2014-02-21 21:49:07 +01:00
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
kexec.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kmod.c execve: use 'struct filename *' for executable name passing 2014-02-05 12:54:53 -08:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kthread.c kthread: ensure locality of task_struct allocations 2014-04-03 16:20:49 -07:00
latencytop.c
Makefile Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c modules: use raw_cpu_write for initialization of per cpu refcount. 2014-04-07 16:36:14 -07:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
notifier.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-26 06:35:13 -08:00
nsproxy.c
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c kernel/panic.c: display reason at end + pr_emerg 2014-04-07 16:36:08 -07:00
params.c params: improve standard definitions 2013-12-04 14:09:46 +10:30
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
ptrace.c kernel/compat: convert to COMPAT_SYSCALL_DEFINE 2014-03-06 15:35:10 +01:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c res_counter: remove interface for locked charging and uncharging 2014-04-07 16:35:54 -07:00
resource.c kernel/resource.c: make reallocate_resource() static 2014-04-03 16:21:07 -07:00
seccomp.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
signal.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
smp.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
smpboot.c
smpboot.h
softirq.c softirq: Add linux/irq.h to make it compile again 2014-03-19 11:28:14 +01:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys.c mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE 2014-04-07 16:35:52 -07:00
sys_ni.c fs, kernel: permit disabling the uselib syscall 2014-04-03 16:21:05 -07:00
sysctl.c hung_task: check the value of "sysctl_hung_task_timeout_sec" 2014-04-07 16:36:07 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
torture.c rcutorture: Gracefully handle NULL cleanup hooks 2014-02-23 09:04:39 -08:00
tracepoint.c This includes the final patch to clean up and fix the issue with the 2014-04-12 13:06:10 -07:00
tsacct.c
uid16.c
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user-return-notifier.c
user.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
user_namespace.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one 2014-04-03 16:20:58 -07:00
workqueue.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
workqueue_internal.h