linux/mm
Tong Tiangen d256d1cd8d mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs()
We found a softlock issue in our test, analyzed the logs, and found that
the relevant CPU call trace as follows:

CPU0:
  _do_fork
    -> copy_process()
      -> write_lock_irq(&tasklist_lock)  //Disable irq,waiting for
      					 //tasklist_lock

CPU1:
  wp_page_copy()
    ->pte_offset_map_lock()
      -> spin_lock(&page->ptl);        //Hold page->ptl
    -> ptep_clear_flush()
      -> flush_tlb_others() ...
        -> smp_call_function_many()
          -> arch_send_call_function_ipi_mask()
            -> csd_lock_wait()         //Waiting for other CPUs respond
	                               //IPI

CPU2:
  collect_procs_anon()
    -> read_lock(&tasklist_lock)       //Hold tasklist_lock
      ->for_each_process(tsk)
        -> page_mapped_in_vma()
          -> page_vma_mapped_walk()
	    -> map_pte()
              ->spin_lock(&page->ptl)  //Waiting for page->ptl

We can see that CPU1 waiting for CPU0 respond IPI,CPU0 waiting for CPU2
unlock tasklist_lock, CPU2 waiting for CPU1 unlock page->ptl. As a result,
softlockup is triggered.

For collect_procs_anon(), what we're doing is task list iteration, during
the iteration, with the help of call_rcu(), the task_struct object is freed
only after one or more grace periods elapse. the logic as follows:

release_task()
  -> __exit_signal()
    -> __unhash_process()
      -> list_del_rcu()

  -> put_task_struct_rcu_user()
    -> call_rcu(&task->rcu, delayed_put_task_struct)

delayed_put_task_struct()
  -> put_task_struct()
  -> if (refcount_sub_and_test())
     	__put_task_struct()
          -> free_task()

Therefore, under the protection of the rcu lock, we can safely use
get_task_struct() to ensure a safe reference to task_struct during the
iteration.

By removing the use of tasklist_lock in task list iteration, we can break
the softlock chain above.

The same logic can also be applied to:
 - collect_procs_file()
 - collect_procs_fsdax()
 - collect_procs_ksm()

Link: https://lkml.kernel.org/r/20230828022527.241693-1-tongtiangen@huawei.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-05 11:11:52 -07:00
..
damon merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
kasan kasan, slub: fix HW_TAGS zeroing with slub_debug 2023-07-08 09:29:32 -07:00
kfence - An extensive rework of kexec and crash Kconfig from Eric DeVolder 2023-08-29 14:53:51 -07:00
kmsan mm: kmsan: use helper macros PAGE_ALIGN and PAGE_ALIGN_DOWN 2023-08-21 13:37:29 -07:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c
cma.c dma-maping updates for Linux 6.6 2023-08-29 20:32:10 -07:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
dmapool.c
dmapool_test.c
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c
failslab.c
filemap.c mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
hugetlb.c hugetlb: add documentation for vma_kernel_pagesize() 2023-08-24 16:20:31 -07:00
hugetlb_cgroup.c
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
Kconfig.debug
khugepaged.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
kmemleak.c Rename kmemleak_initialized to kmemleak_late_initialized 2023-08-21 13:38:02 -07:00
ksm.c mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
list_lru.c
maccess.c
madvise.c swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
Makefile - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm: disable kernelcore=mirror when no mirror memory 2023-08-21 13:37:43 -07:00
memcontrol.c memcontrol: ensure memcg acquired by id is properly set up 2023-09-05 10:13:45 -07:00
memfd.c revert "memfd: improve userspace warnings for missing exec-related flags". 2023-09-05 11:11:52 -07:00
memory-failure.c mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
memory-tiers.c memory tier: use helper macro __ATTR_RW() 2023-08-18 10:12:38 -07:00
memory.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
memory_hotplug.c mm/memory_hotplug: embed vmem_altmap details in memory block 2023-08-21 13:37:49 -07:00
mempolicy.c mm: convert prep_transhuge_page() to folio_prep_large_rmappable() 2023-08-21 14:28:43 -07:00
mempool.c
memremap.c
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
mm_init.c mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE 2023-08-21 13:37:47 -07:00
mm_slot.h
mmap.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mmap_lock.c
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mremap.c mm/huge pud: use transparent huge pud helpers only with CONFIG_TRANSPARENT_HUGEPAGE 2023-08-18 10:12:54 -07:00
msync.c
nommu.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
oom_kill.c mm: remove redundant K() macro definition 2023-08-21 13:37:44 -07:00
page-writeback.c mm: remove folio_account_redirty 2023-08-21 14:52:16 +02:00
page_alloc.c mm: add large_rmappable page flag 2023-08-21 14:28:44 -07:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c
page_io.c zswap: make zswap_load() take a folio 2023-08-21 13:37:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_ext: use page_ext_data helper in page_owner 2023-08-21 13:37:27 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/percpu.c: print error message too if atomic alloc failed 2023-08-25 08:04:59 -07:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c filemap: Allow __filemap_get_folio to allocate large folios 2023-07-24 18:04:30 -04:00
rmap.c mm/swap: stop using page->private on tail pages for THP_SWAP 2023-08-24 16:20:28 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
show_mem.c mm,thp: no space after colon in Mem-Info fields 2023-08-21 13:38:01 -07:00
shrinker_debug.c Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" 2023-06-19 13:19:34 -07:00
shuffle.c
shuffle.h
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab_common.c dma-maping updates for Linux 6.6 2023-08-29 20:32:10 -07:00
slub.c mm/slub: remove freelist_dereference() 2023-07-14 09:57:21 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparse: remove redundant judgments from macro for_each_present_section_nr 2023-08-18 10:12:14 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
swapfile.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c
userfaultfd.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
util.c rcu: dump vmalloc memory info safely 2023-09-05 10:13:45 -07:00
vmalloc.c mm/vmalloc: add a safer version of find_vm_area() for debug 2023-09-05 10:13:45 -07:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
vmstat.c mm/vmstat: remove unused page_ext.h from vmstat 2023-08-21 13:37:30 -07:00
workingset.c mm: memcg: use rstat for non-hierarchical stats 2023-08-24 16:20:18 -07:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
zswap.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00