linux/mm
Waiman Long 41eb5df1cb mm: memcg/slab: properly set up gfp flags for objcg pointer array
Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.

Since the merging of the new slab memory controller in v5.9, the page
structure stores a pointer to the objcg pointer array for slab pages.  When
the slab has no used objects, it can be freed in free_slab(), which will
call kfree() to free the objcg pointer array allocated in
memcg_alloc_page_obj_cgroups().  If the objcg pointer array happens to be
the last used object in its slab, that slab may then be freed as well,
which may cause kfree() to be called again.

With the right workload, the slab cache may be set up in a way that allows
the recursive kfree() calling loop to nest deep enough to cause a kernel
stack overflow and panic the system.  In fact, we have a reproducer that
can cause a kernel stack overflow on an s390 system involving kmalloc-rcl-256
and kmalloc-rcl-128 slabs with the following kfree() loop recursively
called 74 times:

  [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560
  [ 285.520740] [<000000000ec43466>] __free_slab+0xc6/0x228
  [ 285.520741] [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0
  [ 285.520742] [<000000000ec432fc>] kfree+0x4bc/0x560
	:

While investigating this issue, I also found an issue on the allocation
side.  If the objcg pointer array happens to come from the same slab, or a
circular dependency linkage is formed with multiple slabs, those affected
slabs can never be freed again.

This patch series addresses these two issues by introducing a new set of
kmalloc-cg-<n> caches split from the kmalloc-<n> caches.  The new set will
only contain non-reclaimable and non-DMA objects that are accounted in
memory cgroups, whereas the old set is now for unaccounted objects only.
With this split, all the objcg pointer arrays will come from the
kmalloc-<n> caches, but those caches will never hold any objects that need
an objcg pointer array of their own.  As a result, both the deeply nested
kfree() calls and the unfreeable slab problem are gone.
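
The cache split described above can be modelled with a small userspace
sketch.  Everything in it is hypothetical: the DEMO_* flag values, the enum
and demo_kmalloc_type() are illustrative stand-ins, not the kernel's
definitions, and the DMA-before-reclaimable priority is an assumption.  It
only shows the intended routing: accounted, non-reclaimable, non-DMA
requests go to the kmalloc-cg-<n> set, while a request with no special
bits (such as an objcg pointer array) lands in plain kmalloc-<n>.

```c
/* Hypothetical flag bits for illustration; not the kernel's values. */
#define DEMO_DMA          (1u << 0)
#define DEMO_RECLAIMABLE  (1u << 1)
#define DEMO_ACCOUNT      (1u << 2)

/* Hypothetical model of the four kmalloc cache sets named in the text. */
enum demo_cache_type {
    DEMO_KMALLOC_NORMAL,   /* kmalloc-<n>: unaccounted objects        */
    DEMO_KMALLOC_CGROUP,   /* kmalloc-cg-<n>: accounted objects       */
    DEMO_KMALLOC_RECLAIM,  /* kmalloc-rcl-<n>: reclaimable objects    */
    DEMO_KMALLOC_DMA       /* DMA caches                              */
};

/*
 * Pick a cache set from the request flags.  DMA and reclaimable
 * requests keep their own sets; only the remaining accounted requests
 * are routed to the cgroup set.
 */
static inline enum demo_cache_type demo_kmalloc_type(unsigned int flags)
{
    if (flags & DEMO_DMA)
        return DEMO_KMALLOC_DMA;
    if (flags & DEMO_RECLAIMABLE)
        return DEMO_KMALLOC_RECLAIM;
    if (flags & DEMO_ACCOUNT)
        return DEMO_KMALLOC_CGROUP;
    return DEMO_KMALLOC_NORMAL;
}
```

In this model, an objcg pointer array allocated with none of the three
bits set can never share a cache set with the accounted objects it
describes, which is the property that breaks the dependency cycle.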

This patch (of 4):

Since the merging of the new slab memory controller in v5.9, the page
structure may store a pointer to an obj_cgroup pointer array for slab pages.
Currently, only the __GFP_ACCOUNT bit is masked off when that array is
allocated.  However, the array is not readily reclaimable and doesn't need
to come from the DMA buffer.  So those GFP bits (__GFP_RECLAIMABLE and
__GFP_DMA) should be masked off as well.

Do the flag bit clearing in memcg_alloc_page_obj_cgroups() to make sure
that it is applied consistently no matter where it is called from.
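
As a rough illustration of the bit clearing, here is a minimal userspace
sketch.  The demo_gfp_t type, the DEMO_GFP_* values, the mask name and
objcg_array_gfp() are all hypothetical stand-ins chosen for this example;
they are not copied from the kernel headers or from the patch itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's gfp_t. */
typedef uint32_t demo_gfp_t;

/* Illustrative bit positions only; not the kernel's actual values. */
#define DEMO_GFP_DMA          ((demo_gfp_t)1u << 0)
#define DEMO_GFP_RECLAIMABLE  ((demo_gfp_t)1u << 1)
#define DEMO_GFP_ACCOUNT      ((demo_gfp_t)1u << 2)
#define DEMO_GFP_ZERO         ((demo_gfp_t)1u << 3)

/* Bits that must not be inherited by the pointer-array allocation. */
#define DEMO_OBJCGS_CLEAR_MASK \
    (DEMO_GFP_DMA | DEMO_GFP_RECLAIMABLE | DEMO_GFP_ACCOUNT)

/*
 * Derive the flags for the pointer-array allocation from the flags of
 * the slab allocation that triggered it.  Unrelated bits pass through
 * unchanged; the three masked bits are always cleared.
 */
static inline demo_gfp_t objcg_array_gfp(demo_gfp_t gfp)
{
    return gfp & ~DEMO_OBJCGS_CLEAR_MASK;
}
```

Clearing the bits in one helper, rather than at each call site, mirrors
the point made above: every caller gets the same treatment without having
to remember the mask.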

Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
Fixes: 286e04b8ed ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:49 -07:00
..
kasan mm/slub, kunit: add a KUnit test for SLUB debugging functionality 2021-06-29 10:53:46 -07:00
kfence mm, slub: change run-time assertion in kmalloc_index() to compile-time 2021-06-29 10:53:46 -07:00
backing-dev.c writeback, cgroup: release dying cgwbs by switching attached inodes 2021-06-29 10:53:48 -07:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c mm: use proper type for cma_[alloc|release] 2021-05-05 11:27:24 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
debug.c mm/debug: improve memcg debugging 2021-02-24 13:38:27 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() 2021-06-29 10:53:47 -07:00
dmapool.c mm/dmapool: switch from strlcpy to strscpy 2021-04-30 11:20:39 -07:00
early_ioremap.c mm/early_ioremap.c: use __func__ instead of function name 2021-02-26 09:41:02 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup.c mm: gup: pack has_pinned in MMF_HAS_PINNED 2021-06-29 10:53:48 -07:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
highmem.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split 2021-06-16 09:24:42 -07:00
hugetlb.c mm, futex: fix shared futex pgoff on shmem huge page 2021-06-24 19:40:54 -07:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-16 09:24:42 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm/ioremap: fix iomap_max_page_shift 2021-05-14 19:41:32 -07:00
Kconfig mm,memory_hotplug: allocate memmap from the added memory range 2021-05-05 11:27:26 -07:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
kmemleak.c mm/kmemleak: fix possible wrong memory scanning period 2021-06-29 10:53:47 -07:00
ksm.c ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()" 2021-05-14 19:41:32 -07:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
Makefile mm,memory_hotplug: add kernel boot option to enable memmap_on_memory 2021-05-05 11:27:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: guard hugepage pud's usage 2021-04-16 16:10:37 -07:00
memblock.c memblock: remove return value of memblock_free_all() 2021-02-22 13:01:23 -08:00
memcontrol.c mm: memcg/slab: properly set up gfp flags for objcg pointer array 2021-06-29 10:53:49 -07:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory-failure.c mm/hwpoison: do not lock page again when me_huge_page() successfully recovers 2021-06-24 19:40:54 -07:00
memory.c mm: free idle swap cache page after COW 2021-06-29 10:53:49 -07:00
memory_hotplug.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
mempolicy.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
mempool.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
memremap.c mm/memremap.c: fix improper SPDX comment style 2021-04-30 11:20:37 -07:00
memtest.c
migrate.c mm, thp: use head page in __migration_entry_wait() 2021-06-16 09:24:42 -07:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
mmap_lock.c mm: mmap_lock: use local locks instead of disabling preemption 2021-06-29 10:53:47 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
mremap.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c mm/vmalloc: remove vwrite() 2021-05-07 00:26:34 -07:00
oom_kill.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
page-writeback.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
page_alloc.c mm/page_alloc: correct return value of populated elements if bulk array is populated 2021-06-29 10:53:45 -07:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-24 19:40:53 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-09 13:58:38 +00:00
percpu-vm.c mm/vmalloc: remove unmap_kernel_range 2021-04-30 11:20:40 -07:00
percpu.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Implement readahead_control pageset expansion 2021-04-23 10:14:29 +01:00
rmap.c mm/thp: fix page_address_in_vma() on file THP tails 2021-06-16 09:24:42 -07:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm/shmem: fix shmem_swapin() race with swapoff 2021-06-29 10:53:49 -07:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
slab.h mm: memcg/slab: properly set up gfp flags for objcg pointer array 2021-06-29 10:53:49 -07:00
slab_common.c mm: slub: move sysfs slab alloc/free interfaces to debugfs 2021-06-29 10:53:47 -07:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c mm/slub: add taint after the errors are printed 2021-06-29 10:53:47 -07:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparse: fix check_usemap_section_nr warnings 2021-06-16 09:24:43 -07:00
swap.c mm: fix some typos and code style problems 2021-05-07 00:26:33 -07:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: delete meaningless forward declarations 2021-06-29 10:53:49 -07:00
swap_state.c swap: check mapping_empty() for swap cache before being freed 2021-06-29 10:53:49 -07:00
swapfile.c mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info() 2021-06-29 10:53:49 -07:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-16 09:24:42 -07:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c userfaultfd: hugetlbfs: fix new flag usage in error path 2021-05-22 15:09:07 -10:00
util.c mm/util.c: fix typo 2021-05-05 11:27:25 -07:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc: unbreak kasan vmalloc support 2021-06-24 19:40:54 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
vmstat.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
workingset.c mm: stop accounting shadow entries 2021-05-05 11:27:19 -07:00
z3fold.c mm: fix some typos and code style problems 2021-05-07 00:26:33 -07:00
zbud.c mm: set the sleep_mapped to true for zbud and z3fold 2021-02-26 09:41:01 -08:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zswap.c mm/zswap.c: switch from strlcpy to strscpy 2021-05-05 11:27:27 -07:00