linux/mm
Barry Song 9d57090e73 mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
Patch series "mm: enable large folios swap-in support", v9.

Currently, we support mTHP swapout but not swapin.  This means that once
mTHP is swapped out, it will come back as small folios when swapped in. 
This is particularly detrimental for devices like Android, where more than
half of the memory is in swap.

The lack of mTHP swapin functionality makes mTHP a showstopper in
scenarios that heavily rely on swap.  This patchset introduces mTHP
swap-in support.  It starts with synchronous devices similar to zRAM,
aiming to benefit as many users as possible with minimal changes.


This patch (of 3):

There could be a corner case where the first entry is non-zeromap, but a
subsequent entry is zeromap.  In this case, we should not let
swap_read_folio_zeromap() return false since we will still read corrupted
data.

Additionally, the iteration of test_bit() is unnecessary and can be
replaced with bitmap operations, which are more efficient.

We can adopt the style of swap_pte_batch() and folio_pte_batch() to
introduce swap_zeromap_batch() which seems to provide the greatest
flexibility for the caller.  This approach allows the caller to either
check if the zeromap status of all entries is consistent or determine the
number of contiguous entries with the same status.

Since swap_read_folio() can't handle reading a large folio that's
partially zeromap and partially non-zeromap, we've moved the code to
mm/swap.h so that others, like those working on swap-in, can access it.

Link: https://lkml.kernel.org/r/20240908232119.2157-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240908232119.2157-2-21cnbao@gmail.com
Fixes: 0ca0c24e32 ("mm: store zero pages to be swapped out in a bitmap")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Usama Arif <usamaarif642@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 01:07:01 -07:00
..
damon mm/damon/tests/core-kunit: skip damon_test_nr_accesses_to_accesses_bp() if aggr_interval is zero 2024-09-09 16:39:14 -07:00
kasan kasan: fix bad call to unpoison_slab_object 2024-06-24 20:52:09 -07:00
kfence kfence: save freeing stack trace at calling time instead of freeing time 2024-09-01 20:26:12 -07:00
kmsan kmsan: do not pass NULL pointers as 0 2024-07-03 19:30:26 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma.c mm/cma: add cma_{alloc,free}_folio() 2024-09-03 21:15:36 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath() 2024-09-03 21:15:47 -07:00
debug.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c
fail_page_alloc.c mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC 2024-07-17 21:05:18 -07:00
failslab.c mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB 2024-07-17 21:05:18 -07:00
filemap.c mm: replace xa_get_order with xas_get_order where appropriate 2024-09-09 16:39:17 -07:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup.c mm/gup: detect huge pfnmap entries in gup-fast 2024-09-17 01:06:58 -07:00
gup_test.c
gup_test.h
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm/fork: accept huge pfnmap entries 2024-09-17 01:06:58 -07:00
hugetlb.c mm/codetag: fix pgalloc_tag_split() 2024-09-09 16:39:18 -07:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm: z3fold: deprecate CONFIG_Z3FOLD 2024-09-17 01:07:00 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c mm,tmpfs: consider end of file write in shmem_is_huge 2024-09-09 16:39:12 -07:00
kmemleak.c mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space 2024-09-03 21:15:38 -07:00
ksm.c mm: remove PageSwapCache 2024-09-03 21:15:44 -07:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c mseal: replace can_modify_mm_madv with a vma variant 2024-09-03 21:15:41 -07:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c
memblock.c mm: rework accept memory helpers 2024-09-01 20:26:07 -07:00
memcontrol-v1.c memcg: initiate deprecation of pressure_level 2024-09-01 20:26:21 -07:00
memcontrol-v1.h memcg: cleanup with !CONFIG_MEMCG_V1 2024-09-17 01:07:00 -07:00
memcontrol.c mm: clean up mem_cgroup_iter() 2024-09-09 16:39:16 -07:00
memfd.c mm/gup: introduce memfd_pin_folios() for pinning memfd folios 2024-07-12 15:52:09 -07:00
memory-failure.c mm: migrate: add isolate_folio_to_list() 2024-09-03 21:15:59 -07:00
memory-tiers.c memory tier: fix deadlock warning while onlining pages 2024-09-03 21:15:56 -07:00
memory.c mm: support poison recovery from copy_present_page() 2024-09-17 01:07:00 -07:00
memory_hotplug.c mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation 2024-09-03 21:15:59 -07:00
mempolicy.c mm,memcg: provide per-cgroup counters for NUMA balancing operations 2024-09-03 21:15:36 -07:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c mm/codetag: add pgalloc_tag_copy() 2024-09-09 16:39:18 -07:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c Random number generator updates for Linux 6.11-rc1. 2024-07-24 10:29:50 -07:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap.c mm: care about shadow stack guard gap when getting an unmapped area 2024-09-09 16:39:13 -07:00
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm/mprotect: replace can_modify_mm with can_modify_vma 2024-09-03 21:15:41 -07:00
mremap.c mm/mremap: replace can_modify_mm with can_modify_vma 2024-09-03 21:15:41 -07:00
mseal.c mm: remove can_modify_mm() 2024-09-03 21:15:42 -07:00
msync.c
nommu.c mm: remove follow_page() 2024-09-01 20:26:01 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page-writeback.c mm:page-writeback: use folio_next_index() helper in writeback_iter() 2024-09-03 21:15:46 -07:00
page_alloc.c mm/codetag: fix pgalloc_tag_split() 2024-09-09 16:39:18 -07:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
page_isolation.c mm: remove migration for HugePage in isolate_single_pageblock() 2024-09-03 21:15:40 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
pagewalk.c mm/pagewalk: check pfnmap for folio_walk_start() 2024-09-17 01:06:58 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c percpu: remove pcpu_alloc_size() 2024-09-01 20:26:04 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
rmap.c mm: introduce a pageflag for partially mapped folios 2024-09-09 16:39:04 -07:00
rodata_test.c
secretmem.c
shmem.c mm: replace xa_get_order with xas_get_order where appropriate 2024-09-09 16:39:17 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem.c: report alloc tags in human readable units 2024-09-17 01:07:00 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
slab_common.c mm: krealloc: clarify valid usage of __GFP_ZERO 2024-09-03 21:15:37 -07:00
slub.c mm, slub: do not call do_slab_free for kfence object 2024-07-30 11:50:00 +02:00
sparse-vmemmap.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
sparse.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
swap.c mm: remove isolate_lru_page() 2024-09-09 16:38:59 -07:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: return the folio from swapin_readahead 2024-09-01 20:26:05 -07:00
swapfile.c swap: convert swapon() to use a folio 2024-09-09 16:38:58 -07:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c mm,tmpfs: consider end of file write in shmem_is_huge 2024-09-09 16:39:12 -07:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma.c mm/vma: return the exact errno in vms_gather_munmap_vmas() 2024-09-17 01:07:00 -07:00
vma.h mm: make vma_prepare() and friends static and internal to vma.c 2024-09-03 21:15:54 -07:00
vma_internal.h mm: remove duplicated include in vma_internal.h 2024-09-01 20:26:02 -07:00
vmalloc.c mm/vmalloc.c: use "high-order" in description non 0-order pages 2024-09-09 16:39:17 -07:00
vmpressure.c
vmscan.c mm: introduce a pageflag for partially mapped folios 2024-09-09 16:39:04 -07:00
vmstat.c mm: split underused THPs 2024-09-09 16:39:04 -07:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c zsmalloc: use all available 24 bits of page_type 2024-09-03 21:15:43 -07:00
zswap.c mm: remove code to handle same filled pages 2024-09-03 21:15:47 -07:00