linux/arch/x86/kernel
Roman Gushchin cf11e85fc0 mm: hugetlb: optionally allocate gigantic hugepages using cma
Commit 944d9fec8d ("hugetlb: add support for gigantic page allocation
at runtime") has added the run-time allocation of gigantic pages.

However it actually works only at early stages of the system loading,
when the majority of memory is free.  After some time the memory gets
fragmented by non-movable pages, so the chances to find a contiguous 1GB
block are getting close to zero.  Even dropping caches manually doesn't
help a lot.

At large scale rebooting servers in order to allocate gigantic hugepages
is quite expensive and complex.  At the same time keeping some constant
percentage of memory in reserved hugepages even if the workload isn't
using it is a big waste: not all workloads can benefit from using 1 GB
pages.

The following solution can solve the problem:
1) On boot time a dedicated cma area* is reserved. The size is passed
   as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed using the
   cma allocator and the dedicated cma area

In this case gigantic hugepages can be allocated successfully with a
high probability, however the memory isn't completely wasted if nobody
is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
etc.

* On a multi-node machine a per-node cma area is allocated on each node.
  Following gigantic hugetlb allocation are using the first available
  numa node if the mask isn't specified by a user.

Usage:
1) configure the kernel to allocate a cma area for hugetlb allocations:
   pass hugetlb_cma=10G as a kernel argument

2) allocate hugetlb pages as usual, e.g.
   echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

If the option isn't enabled or the allocation of the cma area failed,
the current behavior of the system is preserved.

x86 and arm-64 are covered by this patch, other architectures can be
trivially added later.

The patch contains clean-ups and fixes proposed and implemented by Aslan
Bakirov and Randy Dunlap.  It also contains ideas and suggestions
proposed by Rik van Riel, Michal Hocko and Mike Kravetz.  Thanks!

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Aslan Bakirov <aslan@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
..
acpi x86: ACPI: fix CPU hotplug deadlock 2020-04-04 16:28:24 +02:00
apic Updates for the interrupt subsystem: 2020-03-30 17:35:14 -07:00
cpu SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
fpu x86/pkeys: Add check for pkey "overflow" 2020-02-24 20:25:21 +01:00
kprobes x86/optprobe: Fix OPTPROBE vs UACCESS 2020-03-20 13:06:22 +01:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
alternative.c x86/alternatives: Mark text_poke_loc_init() static 2020-03-25 12:42:35 +01:00
amd_gart_64.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
amd_nb.c x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessor 2020-03-17 10:25:58 +01:00
apb_timer.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
aperture_64.c x86/gart: Exclude GART aperture from kcore 2019-03-23 12:11:49 +01:00
apm_32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 2019-05-24 17:39:02 +02:00
asm-offsets.c efi/x86: Avoid using code32_start 2020-03-08 09:58:17 +01:00
asm-offsets_32.c x86 entry code updates: 2020-03-30 19:14:28 -07:00
asm-offsets_64.c x86/entry: Move max syscall number calculation to syscallhdr.sh 2020-03-21 16:03:21 +01:00
audit_64.c
bootflag.c
check.c
cpuid.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 142 2019-05-30 11:25:17 -07:00
crash.c x86/crash: Use resource_size() 2020-01-09 14:40:03 +01:00
crash_core_32.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
crash_core_64.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
crash_dump_32.c
crash_dump_64.c fs/core/vmcore: Move sev_active() reference to x86 arch code 2019-08-09 22:52:10 +10:00
devicetree.c
doublefault_32.c x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit 2019-11-26 22:00:04 +01:00
dumpstack.c x86/kasan: Print original address on #GP 2019-12-31 13:15:38 +01:00
dumpstack_32.c x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area 2019-11-26 21:53:34 +01:00
dumpstack_64.c x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-11-05 00:51:35 +01:00
e820.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
early-quirks.c x86/intel: Disable HPET on Intel Ice Lake platforms 2019-11-29 12:17:58 +01:00
early_printk.c efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation 2019-02-04 08:27:30 +01:00
ebda.c
eisa.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243 2019-06-19 17:09:07 +02:00
espfix_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
ftrace.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 09:44:15 -08:00
ftrace_32.S x86/ftrace: Get rid of function_hook 2019-10-25 10:52:22 +02:00
ftrace_64.S New tracing features: 2019-11-27 11:42:01 -08:00
head32.c
head64.c x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area 2019-10-11 18:38:15 +02:00
head_32.S x86/boot: Remove KEEP_SEGMENTS support 2020-02-22 23:37:37 +01:00
head_64.S x86/asm/64: Change all ENTRY+END to SYM_CODE_* 2019-10-18 11:58:26 +02:00
hpet.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
hw_breakpoint.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
i8237.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
i8253.c x86/timer: Skip PIT initialization on modern chipsets 2019-06-29 11:35:35 +02:00
i8259.c
idt.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:22:57 -07:00
ima_arch.c EFI updates for v5.7: 2020-02-26 15:21:22 +01:00
io_delay.c x86/io_delay: Define IO_DELAY macros in C instead of Kconfig 2019-05-24 08:46:06 +02:00
ioport.c x86/iopl: Include prototype header for ksys_ioperm() 2020-02-17 16:36:53 +01:00
irq.c x86/irq: Remove useless return value from do_IRQ() 2020-02-27 14:48:40 +01:00
irq_32.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_64.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_work.c
irqflags.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
irqinit.c x86: Replace setup_irq() by request_irq() 2020-03-21 15:15:47 +01:00
itmt.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
jailhouse.c x86/jailhouse: Only enable platform UARTs if available 2019-10-10 15:43:59 +02:00
jump_label.c x86/jump_label: Move 'inline' keyword placement 2020-03-27 11:05:41 +01:00
kdebugfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kexec-bzimage64.c efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specific 2020-02-23 21:59:42 +01:00
kgdb.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
ksysfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kvm.c KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis 2020-02-28 10:34:25 +01:00
kvmclock.c x86/vdso: Use generic VDSO clock mode storage 2020-02-17 14:40:23 +01:00
ldt.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
livepatch.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
machine_kexec_32.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
machine_kexec_64.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
Makefile Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2020-04-02 14:49:46 -07:00
mmconf-fam10h_64.c
module.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mpparse.c x86/boot: Fix memory leak in default_get_smp_config() 2019-07-16 23:13:48 +02:00
msr.c x86/msr: Restrict MSR access when the kernel is locked down 2019-08-19 21:54:16 -07:00
nmi.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
nmi_selftest.c
paravirt-spinlocks.c
paravirt.c x86/ioperm: Add new paravirt function update_io_bitmap() 2020-02-29 12:43:09 +01:00
paravirt_patch.c x86/paravirt: Standardize 'insn_buff' variable names 2019-04-29 16:05:49 +02:00
pci-dma.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
pci-iommu_table.c
pci-swiotlb.c dma-mapping: fix filename references 2019-09-03 08:36:30 +02:00
pcspeaker.c
perf_regs.c perf/x86/regs: Check reserved bits 2019-06-24 19:19:24 +02:00
platform-quirks.c
pmem.c
probe_roms.c
process.c Support for "split lock" detection: 2020-03-30 19:35:52 -07:00
process.h x86: Use the correct SPDX License Identifier in headers 2019-10-01 20:31:35 +02:00
process_32.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
process_64.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
ptrace.c x86/ptrace: Document FSBASE and GSBASE ABI oddities 2019-11-26 22:00:12 +01:00
pvclock.c x86/vdso: Use generic VDSO clock mode storage 2020-02-17 14:40:23 +01:00
quirks.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
reboot.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
reboot_fixups_32.c
relocate_kernel_32.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
relocate_kernel_64.S x86/kexec: Make relocate_kernel_64.S objtool clean 2020-03-25 18:28:28 +01:00
resource.c
rtc.c
setup.c mm: hugetlb: optionally allocate gigantic hugepages using cma 2020-04-10 15:36:21 -07:00
setup_percpu.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
signal.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
signal_compat.c
smp.c x86/smp: Move smp_function_call implementations into IPI code 2019-07-25 16:12:01 +02:00
smpboot.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
stacktrace.c x86 user stack frame reads: switch to explicit __get_user() 2020-02-15 17:26:26 -05:00
step.c
sys_ia32.c x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments 2020-03-21 16:03:24 +01:00
sys_x86_64.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
sysfb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
sysfb_efi.c x86/sysfb_efi: Add quirks for some devices with swapped width and height 2019-07-22 10:47:11 +02:00
sysfb_simplefb.c x86/sysfb: Fix check for bad VRAM size 2020-01-20 10:57:53 +01:00
tboot.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
time.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
tls.c x86/tls: Fix possible spectre-v1 in do_get_thread_area() 2019-06-27 23:48:04 +02:00
tls.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193 2019-05-30 11:29:21 -07:00
topology.c x86/smp: Replace cpu_up/down() with add/remove_cpu() 2020-03-25 12:59:35 +01:00
trace_clock.c
tracepoint.c
traps.c Support for "split lock" detection: 2020-03-30 19:35:52 -07:00
tsc.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
tsc_msr.c x86 timer updates: 2020-03-30 19:55:39 -07:00
tsc_sync.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
umip.c Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 08:58:08 -08:00
unwind_frame.c x86/stackframe/32: Provide consistent pt_regs 2019-06-25 10:23:47 +02:00
unwind_guess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
unwind_orc.c x86/unwind/orc: Fix !CONFIG_MODULES build warning 2020-01-07 08:15:11 +01:00
uprobes.c x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments 2019-10-27 09:00:28 +01:00
verify_cpu.S x86/asm: Annotate local pseudo-functions 2019-10-18 10:04:04 +02:00
vm86_32.c x86: switch save_v86_state() to unsafe_put_user() 2020-03-18 20:36:01 -04:00
vmlinux.lds.S Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 10:51:12 -07:00
vsmp_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 346 2019-06-05 17:37:08 +02:00
x86_init.c A set of fixes for X86: 2020-02-09 12:11:12 -08:00