linux/mm
Dave Chinner 1c00f93686 mm: lift gfp_kmemleak_mask() to gfp.h
Patch series "mm: fix nested allocation context filtering".

This patchset is the followup to the comment I made earlier today:

https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/

Tl;dr: Memory allocations that are done inside the public memory
allocation API need to obey the reclaim recursion constraints placed on
the allocation by the original caller, including the "don't track
recursion for this allocation" case defined by __GFP_NOLOCKDEP.

These nested allocations are generally in debug code that is tracking
something about the allocation (kmemleak, KASAN, etc) and so are
allocating private kernel objects that only that debug system will use.

Neither the page-owner code nor the stack depot code get this right.  They
also also clear GFP_ZONEMASK as a separate operation, which is completely
redundant because the constraint filter applied immediately after
guarantees that GFP_ZONEMASK bits are cleared.

kmemleak gets this filtering right.  It preserves the allocation
constraints for deadlock prevention and clears all other context flags
whilst also ensuring that the nested allocation will fail quickly,
silently and without depleting emergency kernel reserves if there is no
memory available.

This can be made much more robust, immune to whack-a-mole games and the
code greatly simplified by lifting gfp_kmemleak_mask() to
include/linux/gfp.h and using that everywhere.  Also document it so that
there is no excuse for not knowing about it when writing new debug code
that nests allocations.

Tested with lockdep, KASAN + page_owner=on and kmemleak=on over multiple
fstests runs with XFS.


This patch (of 3):

Any "internal" nested allocation done from within an allocation context
needs to obey the high level allocation gfp_mask constraints.  This is
necessary for debug code like KASAN, kmemleak, lockdep, etc that allocate
memory for saving stack traces and other information during memory
allocation.  If they don't obey things like __GFP_NOLOCKDEP or
__GFP_NOWARN, they produce false positive failure detections.

kmemleak gets this right by using gfp_kmemleak_mask() to pass through the
relevant context flags to the nested allocation to ensure that the
allocation follows the constraints of the caller context.

KASAN recently was foudn to be missing __GFP_NOLOCKDEP due to stack depot
allocations, and even more recently the page owner tracking code was also
found to be missing __GFP_NOLOCKDEP support.

We also don't wan't want KASAN or lockdep to drive the system into OOM
kill territory by exhausting emergency reserves.  This is something that
kmemleak also gets right by adding (__GFP_NORETRY | __GFP_NOMEMALLOC |
__GFP_NOWARN) to the allocation mask.

Hence it is clear that we need to define a common nested allocation filter
mask for these sorts of third party nested allocations used in debug code.
So to start this process, lift gfp_kmemleak_mask() to gfp.h and rename it
to gfp_nested_mask(), and convert the kmemleak callers to use it.

Link: https://lkml.kernel.org/r/20240430054604.4169568-1-david@fromorbit.com
Link: https://lkml.kernel.org/r/20240430054604.4169568-2-david@fromorbit.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-19 14:40:44 -07:00
..
damon mm/damon/core: fix return value from damos_wmark_metric_value 2024-05-11 15:41:36 -07:00
kasan fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
kfence mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
kmsan mm: kmsan: implement kmsan_memmove() 2024-04-25 21:07:02 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-04-25 20:56:42 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
debug.c mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() 2024-05-05 17:53:31 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: test pmd_leaf() behavior with pmd_mkinvalid() 2024-05-07 10:37:00 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
folio-compat.c mm: remove __set_page_dirty_nobuffers() 2024-04-25 20:56:25 -07:00
gup.c mm/gup: fix hugepd handling in hugetlb rework 2024-05-07 10:37:01 -07:00
gup_test.c
gup_test.h
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c mm/treewide: replace pXd_huge() with pXd_leaf() 2024-04-25 20:55:46 -07:00
huge_memory.c thp: remove HPAGE_PMD_ORDER minimum assertion 2024-05-07 10:37:02 -07:00
hugetlb.c mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp 2024-05-11 15:41:37 -07:00
hugetlb_cgroup.c mm/hugetlb: assert hugetlb_lock in __hugetlb_cgroup_commit_charge 2024-05-05 17:53:41 -07:00
hugetlb_vmemmap.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hwpoison-inject.c mm/memory-failure: convert shake_page() to shake_folio() 2024-05-05 17:53:45 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm/vmscan: remove ignore_references argument of reclaim_pages() 2024-05-07 10:37:02 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c mm: simplify thp_vma_allowable_order 2024-05-05 17:53:53 -07:00
kmemleak.c mm: lift gfp_kmemleak_mask() to gfp.h 2024-05-19 14:40:44 -07:00
ksm.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
list_lru.c mm/zswap: stop lru list shrinking when encounter warm region 2024-02-22 10:24:54 -08:00
maccess.c
madvise.c mm/vmscan: remove ignore_references argument of reclaim_pages() 2024-05-07 10:37:02 -07:00
Makefile The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c cxl fixes for 6.8-rc6 2024-02-24 15:53:40 -08:00
memcontrol.c memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order 2024-05-11 15:41:37 -07:00
memfd.c mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-03-04 17:01:21 -08:00
memory-failure.c memory-failure: remove calls to page_mapping() 2024-05-05 17:53:48 -07:00
memory-tiers.c memory tier: create CPUless memory tiers after obtaining HMAT info 2024-05-05 17:53:26 -07:00
memory.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
memory_hotplug.c mm/hugetlb: rename dissolve_free_huge_pages() to dissolve_free_hugetlb_folios() 2024-05-05 17:53:35 -07:00
mempolicy.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mempool.c mempool: hook up to memory allocation profiling 2024-04-25 20:55:56 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c mm: convert hugetlb_page_mapping_lock_write to folio 2024-05-05 17:53:46 -07:00
migrate_device.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mincore.c
mlock.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mm_init.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mm_slot.h
mmap.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mmap_lock.c
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mmu_notifier: remove the .change_pte() callback 2024-04-11 13:18:36 -04:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mm: support multi-size THP numa balancing 2024-04-25 20:56:30 -07:00
mremap.c mm: remove "prot" parameter from move_pte() 2024-04-25 20:56:24 -07:00
msync.c
nommu.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page-writeback.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
page_alloc.c mm: page_alloc: allowing mTHP compaction to capture the freed page directly 2024-05-05 17:53:37 -07:00
page_counter.c
page_ext.c mm: make page_ext_get() take a const argument 2024-04-25 20:56:14 -07:00
page_idle.c
page_io.c mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters 2024-05-05 17:53:35 -07:00
page_isolation.c mm: page_isolation: prepare for hygienic freelists 2024-04-25 20:56:04 -07:00
page_owner.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: support userfault wr-protect entries 2024-05-05 17:53:41 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h mm: percpu: add codetag reference into pcpuobj_ext 2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: percpu: enable per-cpu allocation tagging 2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rmap.c mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED 2024-05-11 15:41:35 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-03-26 11:07:23 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
slab_common.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
slub.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
sparse-vmemmap.c
sparse.c mm/sparse: guard the size of mem_section is power of 2 2024-05-05 17:53:40 -07:00
swap.c mm: add kernel-doc for folio_mark_accessed() 2024-05-05 17:53:50 -07:00
swap.h mm/swap: fix race when skipping swapcache 2024-02-20 14:20:48 -08:00
swap_cgroup.c
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: remove struct page from get_shadow_from_swap_cache 2024-04-25 20:56:40 -07:00
swapfile.c mm/swapfile: mark racy access on si->highest_bit 2024-05-05 17:53:57 -07:00
truncate.c mm: convert pagecache_isize_extended to use a folio 2024-04-25 20:56:43 -07:00
usercopy.c
userfaultfd.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
util.c mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
vmalloc.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm/vmscan: remove ignore_references argument of reclaim_folio_list() 2024-05-07 10:37:02 -07:00
vmstat.c iommu: observability of the IOMMU allocations 2024-04-15 14:31:47 +02:00
workingset.c mm: cleanup WORKINGSET_NODES in workingset 2024-05-07 10:36:59 -07:00
z3fold.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zswap.c mm: zswap: remove same_filled module params 2024-05-05 17:53:38 -07:00