linux/mm
Huang Ying 2628bd6fc0 mm, swap: fix race between swap count continuation operations
One page may store a set of entries of the sis->swap_map
(swap_info_struct->swap_map) in multiple swap clusters.

If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
multiple pages will be used to store the set of entries of the
sis->swap_map.  And the pages are linked with page->lru.  This is called
swap count continuation.  To access the pages which store the set of
entries of the sis->swap_map simultaneously, previously, sis->lock is
used.  But to improve the scalability of __swap_duplicate(), swap
cluster lock may be used in swap_count_continued() now.  This may race
with add_swap_count_continuation() which operates on a nearby swap
cluster, in which the sis->swap_map entries are stored in the same page.

The race can cause wrong swap count in practice, thus cause unfreeable
swap entries or software lockup, etc.

To fix the race, a new spin lock called cont_lock is added to struct
swap_info_struct to protect the swap count continuation page list.  This
is a lock at the swap device level, so the scalability isn't very well.
But it is still much better than the original sis->lock, because it is
only acquired/released when swap count continuation is used.  Which is
considered rare in practice.  If it turns out that the scalability
becomes an issue for some workloads, we can split the lock into some
more fine grained locks.

Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
Fixes: 235b621767 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-03 07:39:19 -07:00
..
kasan License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
backing-dev.c mm/backing-dev.c: fix an error handling path in 'cgwb_create()' 2017-09-11 14:16:44 -06:00
balloon_compaction.c mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY 2017-09-08 18:26:46 -07:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c
cma.c mm/cma.c: take __GFP_NOWARN into account in cma_alloc() 2017-10-13 16:18:32 -07:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compaction.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
debug.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fadvise.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC 2017-10-03 17:54:24 -07:00
frame_vector.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
frontswap.c
gup.c mm/device-public-memory: device memory cache coherent with CPU 2017-09-08 18:26:46 -07:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm/hmm: avoid bloating arch that do not make use of HMM 2017-09-08 18:26:46 -07:00
huge_memory.c mm/huge_memory.c: deposit page table when copying a PMD migration entry 2017-11-03 07:39:19 -07:00
hugetlb.c userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size 2017-11-03 07:39:19 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h mm, oom: do not rely on TIF_MEMDIE for memory reserves access 2017-09-06 17:27:30 -07:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
Kconfig mm/hmm: avoid bloating arch that do not make use of HMM 2017-09-08 18:26:46 -07:00
Kconfig.debug
khugepaged.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmemcheck.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmemleak-test.c
kmemleak.c mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects 2017-07-06 16:24:34 -07:00
ksm.c ksm: fix unlocked iteration over vmas in cmp_and_merge_page() 2017-10-03 17:54:23 -07:00
list_lru.c mm: memcontrol: use vmalloc fallback for large kmem memcg arrays 2017-10-03 17:54:25 -07:00
maccess.c
madvise.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memblock.c mm/memblock.c: reversed logic in memblock_discard() 2017-08-25 16:12:46 -07:00
memcontrol.c net: memcontrol: defer call to mem_cgroup_sk_alloc() 2017-10-09 20:55:01 -07:00
memory-failure.c x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages 2017-08-17 10:30:49 +02:00
memory.c mm/device-public-memory: fix edge case in _vm_normal_page() 2017-10-03 17:54:25 -07:00
memory_hotplug.c mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long 2017-10-03 17:54:26 -07:00
mempolicy.c mm/mempolicy: fix NUMA_INTERLEAVE_HIT counter 2017-10-13 16:18:32 -07:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mincore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mlock.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm_init.c
mmap.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: kill invalidate_page 2017-08-31 16:13:00 -07:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mremap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
oom_kill.c mm, oom_reaper: skip mm structs with mmu notifiers 2017-10-03 17:54:24 -07:00
page-writeback.c mm: rename global_page_state to global_zone_page_state 2017-09-06 17:27:29 -07:00
page_alloc.c mm, page_alloc: add scheduling point to memmap_init_zone 2017-10-03 17:54:25 -07:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_idle.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_io.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_poison.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_vma_mapped.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pagewalk.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: replace area map allocator with bitmap 2017-07-26 17:41:05 -04:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c
percpu.c mm, percpu: add support for __GFP_NOWARN flag 2017-10-19 13:13:49 +01:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c
rmap.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c mm: treewide: remove GFP_TEMPORARY allocation flag 2017-09-13 18:53:16 -07:00
slab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slab.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slab_common.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swapfile.c mm, swap: fix race between swap count continuation operations 2017-11-03 07:39:19 -07:00
truncate.c mm/truncate.c: fix THP handling in invalidate_mapping_pages() 2017-07-10 16:32:32 -07:00
usercopy.c
userfaultfd.c userfaultfd: shmem: wire up shmem_mfill_zeropage_pte 2017-09-06 17:27:28 -07:00
util.c mm: rename global_page_state to global_zone_page_state 2017-09-06 17:27:29 -07:00
vmacache.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
vmalloc.c Revert "vmalloc: back off when the current task is killed" 2017-10-13 16:18:32 -07:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
vmstat.c mm: consider the number in local CPUs when reading NUMA stats 2017-09-08 18:26:47 -07:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix stale list handling 2017-10-03 17:54:24 -07:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: change stat type parameter to int 2017-09-08 18:26:47 -07:00
zswap.c mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() 2017-07-06 16:24:35 -07:00