linux/mm
Dennis Zhou (Facebook) 59b57717ff blkcg: delay blkg destruction until after writeback has finished
Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkcg.
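
The ordering above can be sketched as a small userspace model. This is
only an illustration of the refcount gating described in this message,
not the kernel code: struct blkcg_model and every model_* helper are
hypothetical names, the counters are plain ints rather than the
kernel's refcount types, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the cleanup ordering; not kernel code. */
struct blkcg_model {
	int cgwb_refcnt;	/* base reference plus one per live cgwb */
	int css_refcnt;		/* one css reference held by each blkg */
	int nr_blkgs;
	bool blkgs_destroyed;
	bool freed;
};

static void model_init(struct blkcg_model *b, int nr_blkgs, int nr_cgwbs)
{
	/* Assumption: at least one blkg, so the css count can reach zero. */
	assert(nr_blkgs > 0 && nr_cgwbs >= 0);
	b->cgwb_refcnt = 1 + nr_cgwbs;
	b->css_refcnt = nr_blkgs;
	b->nr_blkgs = nr_blkgs;
	b->blkgs_destroyed = false;
	b->freed = false;
}

/* Step 3: the last css reference going away frees the blkcg. */
static void model_css_put(struct blkcg_model *b)
{
	assert(b->css_refcnt > 0);
	if (--b->css_refcnt == 0)
		b->freed = true;
}

/* Step 2: destroy the blkgs, releasing the css reference each one holds. */
static void model_destroy_blkgs(struct blkcg_model *b)
{
	b->blkgs_destroyed = true;
	for (int i = 0; i < b->nr_blkgs; i++)
		model_css_put(b);
}

/* Called when a cgwb finishes writeback, and once for the base reference. */
static void model_cgwb_put(struct blkcg_model *b)
{
	assert(b->cgwb_refcnt > 0);
	if (--b->cgwb_refcnt == 0)
		model_destroy_blkgs(b);
}

/* Step 1: offline only puts back the base cgwb_refcnt. */
static void model_css_offline(struct blkcg_model *b)
{
	model_cgwb_put(b);
}
```

In this model, calling model_css_offline() while a cgwb is still
outstanding leaves the blkgs in place. Only the final model_cgwb_put()
triggers blkg destruction and, through the css references, the free.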

It seems that in the past blk-throttle was hard to follow in how it
took data from a blkg while associating with current. The
simplification and unification of what blk-throttle does is what
exposed this race.

Fixes: 08e18eab0c ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:56 -06:00
..
kasan kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN 2018-08-17 16:20:30 -07:00
backing-dev.c blkcg: delay blkg destruction until after writeback has finished 2018-08-31 14:48:56 -06:00
balloon_compaction.c
bootmem.c
cleancache.c
cma.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
cma.h
cma_debug.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
compaction.c
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-08-17 16:20:30 -07:00
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
gup_benchmark.c
highmem.c
hmm.c mm, oom: distinguish blockable mode for mmu notifiers 2018-08-22 10:52:44 -07:00
huge_memory.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
hugetlb.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
interval_tree.c
Kconfig mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP 2018-08-17 16:20:32 -07:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
kmemleak-test.c
kmemleak.c
ksm.c include/linux/compiler*.h: make compiler-*.h mutually exclusive 2018-08-22 17:31:34 -07:00
list_lru.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
maccess.c
madvise.c
Makefile
memblock.c mm/memblock.c: replace u64 with phys_addr_t where appropriate 2018-08-17 16:20:30 -07:00
memcontrol.c mm, oom: introduce memory.oom.group 2018-08-22 10:52:45 -07:00
memfd.c
memory-failure.c mm: soft-offline: close the race against page allocation 2018-08-23 18:48:43 -07:00
memory.c mm/cow: don't bother write protecting already write-protected pages 2018-08-25 13:15:03 -07:00
memory_hotplug.c mm/page_alloc: Introduce free_area_init_core_hotplug 2018-08-22 10:52:45 -07:00
mempolicy.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c
migrate.c mm: soft-offline: close the race against page allocation 2018-08-23 18:48:43 -07:00
mincore.c
mlock.c
mm_init.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mmap.c mm, oom: remove oom_lock from oom_reaper 2018-08-22 10:52:44 -07:00
mmu_context.c
mmu_notifier.c mm, oom: distinguish blockable mode for mmu notifiers 2018-08-22 10:52:44 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c mm: provide a fallback for PAGE_KERNEL_EXEC for architectures 2018-08-17 16:20:29 -07:00
oom_kill.c Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
page-writeback.c mm/page-writeback.c: update stale account_page_redirty() comment 2018-08-17 16:20:30 -07:00
page_alloc.c mm: soft-offline: close the race against page allocation 2018-08-23 18:48:43 -07:00
page_counter.c
page_ext.c mm/page_ext.c: constify lookup_page_ext() argument 2018-08-17 16:20:28 -07:00
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c /proc/meminfo: add percpu populated pages count 2018-08-22 10:52:45 -07:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
slab.c
slab.h mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB 2018-08-17 16:20:30 -07:00
slab_common.c mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB 2018-08-17 16:20:30 -07:00
slob.c
slub.c
sparse-vmemmap.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
sparse.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
swap.c
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c
swapfile.c mm/swapfile.c: put_swap_page: share more between huge/normal code path 2018-08-22 10:52:44 -07:00
truncate.c
usercopy.c
userfaultfd.c
util.c mm/util: add kernel-doc for kvfree 2018-08-23 18:48:43 -07:00
vmacache.c mm, vmacache: hash addresses based on pmd 2018-08-17 16:20:32 -07:00
vmalloc.c mm: provide a fallback for PAGE_KERNEL_EXEC for architectures 2018-08-17 16:20:29 -07:00
vmpressure.c
vmscan.c mm: fix page_freeze_refs and page_unfreeze_refs in comments 2018-08-22 10:52:44 -07:00
vmstat.c
workingset.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: make several functions and a struct static 2018-08-17 16:20:30 -07:00
zswap.c