linux/mm
Michal Hocko 94723aafb9 mm: unclutter THP migration
THP migration is hacked into the generic migration with rather
surprising semantic.  The migration allocation callback is supposed to
check whether the THP can be migrated at once and if that is not the
case then it allocates a simple page to migrate.  unmap_and_move then
fixes that up by spliting the THP into small pages while moving the head
page to the newly allocated order-0 page.  Remaning pages are moved to
the LRU list by split_huge_page.  The same happens if the THP allocation
fails.  This is really ugly and error prone [1].

I also believe that split_huge_page to the LRU lists is inherently wrong
because all tail pages are not migrated.  Some callers will just work
around that by retrying (e.g.  memory hotplug).  There are other pfn
walkers which are simply broken though.  e.g. madvise_inject_error will
migrate head and then advances next pfn by the huge page size.
do_move_page_to_node_array, queue_pages_range (migrate_pages, mbind),
will simply split the THP before migration if the THP migration is not
supported then falls back to single page migration but it doesn't handle
tail pages if the THP migration path is not able to allocate a fresh THP
so we end up with ENOMEM and fail the whole migration which is a
questionable behavior.  Page compaction doesn't try to migrate large
pages so it should be immune.

This patch tries to unclutter the situation by moving the special THP
handling up to the migrate_pages layer where it actually belongs.  We
simply split the THP page into the existing list if unmap_and_move fails
with ENOMEM and retry.  So we will _always_ migrate all THP subpages and
specific migrate_pages users do not have to deal with this case in a
special way.

[1] http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com

Link: http://lkml.kernel.org/r/20180103082555.14592-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
..
kasan
backing-dev.c mm/vmscan: don't mess with pgdat->flags in memcg reclaim 2018-04-11 10:28:30 -07:00
balloon_compaction.c
bootmem.c
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup.c
gup_benchmark.c
highmem.c
hmm.c mm/hmm.c: remove superfluous RCU protection around radix tree lookup 2018-04-11 10:28:31 -07:00
huge_memory.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
hugetlb.c
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
interval_tree.c
Kconfig
Kconfig.debug
khugepaged.c memcg, thp: do not invoke oom killer on thp charges 2018-04-11 10:28:31 -07:00
kmemleak-test.c
kmemleak.c
ksm.c mm/ksm.c: fix inconsistent accounting of zero pages 2018-04-11 10:28:31 -07:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c memcg: fix per_node_info cleanup 2018-04-11 10:28:32 -07:00
memory-failure.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
memory.c
memory_hotplug.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempolicy.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempool.c
memtest.c
migrate.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c sched/numa: avoid trapping faults and attempting migration of file-backed dirty pages 2018-04-11 10:28:31 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c
slab.c
slab.h
slab_common.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap.c
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c mm/swapfile.c: make pointer swap_avail_heads static 2018-04-11 10:28:32 -07:00
truncate.c
usercopy.c
userfaultfd.c
util.c mm: treat indirectly reclaimable memory as free in overcommit logic 2018-04-11 10:28:29 -07:00
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c mm: memcg: make sure memory.events is uptodate when waking pollers 2018-04-11 10:28:31 -07:00
vmstat.c
workingset.c
z3fold.c mm/z3fold.c: use gfpflags_allow_blocking 2018-04-11 10:28:31 -07:00
zbud.c
zpool.c
zsmalloc.c
zswap.c