linux/Documentation
Mel Gorman bbbecb35a4 mm/page_alloc: delete vm.percpu_pagelist_fraction
Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2.

The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and neither takes the
zone size into account or the number of CPUs local to a zone.  With larger
zones and more CPUs per node, the contention is getting worse.
Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both batch
and high values means that the sysctl can reduce zone lock contention but
also increase allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static.  It
also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but is also
driven by the fact that large batches of page freeing via release_pages()
often shows zone contention as a major part of the problem.  Another is a
bug report based on an older kernel where a multi-terabyte process can
takes several minutes to exit.  A workaround was to use
vm.percpu_pagelist_fraction to increase the pcp->high value but testing
indicated that a production workload could not use the same values because
of an increase in allocation latencies.  Unfortunately, I cannot reproduce
this test case myself as the multi-terabyte machines are in active use but
it should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite.
pcp only works with order-0 which is useless for SLUB (when using high
orders) and THP (unconditionally).  To store high-order pages on PCP, the
pcp->high values need to be increased first.

This patch (of 6):

The vm.percpu_pagelist_fraction is used to increase the batch and high
limits for the per-cpu page allocator (PCP).  The intent behind the sysctl
is to reduce zone lock acquisition when allocating/freeing pages but it
has a problem.  While it can decrease contention, it can also increase
latency on the allocation side due to unreasonably large batch sizes.
This leads to games where an administrator adjusts
percpu_pagelist_fraction on the fly to work around contention and
allocation latency problems.

This series aims to alleviate the problems with zone lock contention while
avoiding the allocation-side latency problems.  For the purposes of
review, it's easier to remove this sysctl now and reintroduce a similar
sysctl later in the series that deals only with pcp->high.

Link: https://lkml.kernel.org/r/20210525080119.5455-1-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210525080119.5455-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:54 -07:00
..
ABI libnvdimm fixes for 5.13-rc2 2021-05-15 08:32:51 -07:00
accounting
admin-guide mm/page_alloc: delete vm.percpu_pagelist_fraction 2021-06-29 10:53:54 -07:00
arm It's been a relatively busy cycle in docsland, though more than usually 2021-04-26 13:22:43 -07:00
arm64 Assorted arm64 fixes and clean-ups, the most important: 2021-05-07 12:11:05 -07:00
block Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
bpf bpf: Document the pahole release info related to libbpf in bpf_devel_QA.rst 2021-04-23 17:11:58 -07:00
cdrom docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars 2021-05-11 11:00:17 -06:00
core-api A few late-arriving documentation fixes, including some oprofile cleanup, a 2021-05-06 08:33:54 -07:00
cpu-freq
crypto
dev-tools kasan: test: improve failure message in KUNIT_EXPECT_KASAN_FAIL() 2021-06-29 10:53:52 -07:00
devicetree USB fixes for 5.13-rc6 2021-06-12 12:34:49 -07:00
doc-guide
driver-api usb: Restore the usb_header label 2021-05-21 14:41:37 +02:00
fault-injection
fb Documentation: Add leading slash to some paths 2021-03-31 13:49:19 -06:00
features powerpc updates for 5.13 2021-04-30 12:22:28 -07:00
filesystems erofs: update documentation about data compression 2021-05-11 16:47:15 +08:00
firmware-guide Documentation: firmware-guide: gpio-properties: Add note to SPI CS case 2021-04-28 19:11:13 +02:00
firmware_class
fpga
gpu drm-misc-next for 5.13: 2021-04-07 17:32:12 +10:00
hid Documentation: Add leading slash to some paths 2021-03-31 13:49:19 -06:00
hwmon docs: hwmon: tmp103.rst: fix bad usage of UTF-8 chars 2021-05-11 11:00:18 -06:00
i2c
ia64
ide
iio
infiniband
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-05-06 23:37:55 -07:00
isdn
kbuild Kconfig updates for v5.13 2021-04-29 14:32:00 -07:00
kernel-hacking
leds Documentation: Add leading slash to some paths 2021-03-31 13:49:19 -06:00
litmus-tests
livepatch
locking
m68k
maintainer
mhi
mips
misc-devices dw-xdata-pcie: Update outdated info and improve text format 2021-04-14 19:47:28 +02:00
netlabel
networking docs: networking: device_drivers: fix bad usage of UTF-8 chars 2021-05-11 11:00:18 -06:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
power power supply and reset changes for the v5.13 series 2021-04-28 15:43:58 -07:00
powerpc powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls 2021-05-21 00:58:03 +10:00
process Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
RCU
riscv riscv: Ensure BPF_JIT_REGION_START aligned with PMD size 2021-06-18 21:10:05 -07:00
s390 s390/pci: expose UID uniqueness guarantee 2021-04-05 11:30:57 +02:00
scheduler sched,doc: sched_debug_verbose cmdline should be sched_verbose 2021-05-06 15:33:26 +02:00
scsi for-5.13/block-2021-04-27 2021-04-28 14:27:12 -07:00
security Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
sh
sound
sparc
sphinx
sphinx-static
spi spi: Updates for v5.13 2021-04-26 16:32:11 -07:00
staging
target
timers Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
trace Documentation: trace: Add documentation for TRBE 2021-04-06 16:05:38 -06:00
translations docs/zh_CN: Remove obsolete translation file 2021-05-10 15:14:31 -06:00
usb USB fixes for 5.13-rc2 2021-05-16 09:55:05 -07:00
userspace-api Documentation: seccomp: Fix user notification documentation 2021-05-28 09:54:12 -07:00
virt Bugfixes, including a TLB flush fix that affects processors 2021-06-09 13:09:57 -07:00
vm mm/slub: clarify verification reporting 2021-06-16 09:24:42 -07:00
w1
watchdog
x86 x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
xtensa
.gitignore
arch.rst
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
COPYING-logo
docutils.conf
dontdiff kbuild: generate Module.symvers only when vmlinux exists 2021-04-25 05:17:02 +09:00
index.rst
Kconfig
logo.gif
Makefile
memory-barriers.txt
SubmittingPatches
watch_queue.rst