mirror of
https://github.com/torvalds/linux
synced 2024-07-23 19:49:59 +00:00
280 lines
9.9 KiB
Plaintext
# SPDX-License-Identifier: GPL-2.0-only
config PAGE_EXTENSION
	bool "Extend memmap on extra space for more information on page"
	help
	  Extend memmap on extra space for more information on page. This
	  could be used for debugging features that need to insert an extra
	  field for every page. The extension saves memory because this
	  extra space is allocated only when the boot-time configuration
	  enables a feature that needs it.

config DEBUG_PAGEALLOC
	bool "Debug page memory allocations"
	depends on DEBUG_KERNEL
	depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
	select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
	help
	  Unmap pages from the kernel linear mapping after free_pages().
	  Depending on runtime enablement, this results in a small or large
	  slowdown, but helps to find certain types of memory corruption.

	  Also, the state of page tracking structures is checked more often as
	  pages are being allocated and freed, as unexpected state changes
	  often happen for the same reasons as memory corruption (e.g. double
	  free, use-after-free). The error reports for these checks can be
	  augmented with stack traces of the last allocation and freeing of
	  the page, when PAGE_OWNER is also selected and enabled on boot.

	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
	  fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). Additionally, this option cannot
	  be enabled in combination with hibernation as that would result in
	  incorrect warnings of memory corruption after a resume because free
	  pages are not saved to the suspend image.

	  By default this option will have a small overhead, e.g. by not
	  allowing the kernel mapping to be backed by large pages on some
	  architectures. Even bigger overhead comes when the debugging is
	  enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
	  command line parameter.

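	  For example, in a kernel built without
	  DEBUG_PAGEALLOC_ENABLE_DEFAULT, a command line like the following
	  might be used to enable the checks at boot. It also adds
	  allocation and free stack traces to the error reports, assuming
	  PAGE_OWNER is also built in:

	    debug_pagealloc=on page_owner=on
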
config DEBUG_PAGEALLOC_ENABLE_DEFAULT
	bool "Enable debug page memory allocations by default?"
	depends on DEBUG_PAGEALLOC
	help
	  Enable debug page memory allocations by default? This value
	  can be overridden by debug_pagealloc=off|on.

config DEBUG_SLAB
	bool "Debug slab memory allocations"
	depends on DEBUG_KERNEL && SLAB
	help
	  Say Y here to have the kernel do limited verification on memory
	  allocation as well as poisoning memory on free to catch use of freed
	  memory. This can make kmalloc/kfree-intensive workloads much slower.

config SLUB_DEBUG
	default y
	bool "Enable SLUB debugging support" if EXPERT
	depends on SLUB && SYSFS && !SLUB_TINY
	select STACKDEPOT if STACKTRACE_SUPPORT
	help
	  SLUB has extensive debug support features. Disabling these can
	  result in significant savings in code size. While /sys/kernel/slab
	  will still exist (with SYSFS enabled), it will not provide e.g. cache
	  validation.

config SLUB_DEBUG_ON
	bool "SLUB debugging on by default"
	depends on SLUB && SLUB_DEBUG
	select STACKDEPOT_ALWAYS_INIT if STACKTRACE_SUPPORT
	default n
	help
	  Boot with debugging on by default. SLUB boots by default with
	  the runtime debug capabilities switched off. Enabling this is
	  equivalent to specifying the "slub_debug" parameter on boot.
	  This option offers no finer-grained control, such as the
	  per-option or per-cache selection possible with slub_debug=xxx.
	  SLUB debugging may be switched off in a kernel built with
	  CONFIG_SLUB_DEBUG_ON by specifying "slub_debug=-".

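	  As an illustration, based on the syntax described in
	  Documentation/mm/slub.rst (see it for the full list of flags),
	  the first boot parameter below enables sanity checks, red zoning
	  and poisoning for all caches. The second enables only sanity
	  checks, and only for the dentry cache:

	    slub_debug=FZP
	    slub_debug=F,dentry
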
config PAGE_OWNER
	bool "Track page owner"
	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
	select DEBUG_FS
	select STACKTRACE
	select STACKDEPOT
	select PAGE_EXTENSION
	help
	  This keeps track of which call chain owns each page, which may
	  help to find bare alloc_page(s) leaks. Even if you include this
	  feature in your build, it is disabled by default. You should pass
	  "page_owner=on" as a boot parameter in order to enable it. Eats
	  a fair amount of memory if enabled. See tools/mm/page_owner_sort.c
	  for a user-space helper.

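	  A typical workflow, following Documentation/mm/page_owner.rst and
	  assuming debugfs is mounted at /sys/kernel/debug, is to build the
	  helper, boot with "page_owner=on", run the workload being
	  debugged, and then dump and sort the collected records:

	    cd tools/mm && make page_owner_sort
	    cat /sys/kernel/debug/page_owner > page_owner_full.txt
	    ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
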
	  If unsure, say N.

config PAGE_TABLE_CHECK
	bool "Check for invalid mappings in user page tables"
	depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK
	select PAGE_EXTENSION
	help
	  Check that an anonymous page is not being mapped twice with
	  read-write permissions. Check that anonymous and file pages are
	  not being erroneously shared. Since the checking is performed at
	  the time entries are added to and removed from user page tables,
	  leaking, corruption and double mapping problems are detected
	  synchronously.

	  If unsure say "n".

config PAGE_TABLE_CHECK_ENFORCED
	bool "Enforce the page table checking by default"
	depends on PAGE_TABLE_CHECK
	help
	  Always enable page table checking. By default the page table checking
	  is disabled, and can be optionally enabled via the page_table_check=on
	  kernel parameter. This config enforces that page table checking is
	  always enabled.

	  If unsure say "n".

config PAGE_POISONING
	bool "Poison pages after freeing"
	help
	  Fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). The filling of the memory helps
	  reduce the risk of information leaks from freed data. This does
	  have a potential performance impact if enabled with the
	  "page_poison=1" kernel boot option.

	  Note that "poison" here is not the same thing as the "HWPoison"
	  for CONFIG_MEMORY_FAILURE. This is software poisoning only.

	  If you are only interested in sanitization of freed pages without
	  checking the poison pattern on alloc, you can boot the kernel with
	  "init_on_free=1" instead of enabling this.

	  If unsure, say N.

config DEBUG_PAGE_REF
	bool "Enable tracepoint to track down page reference manipulation"
	depends on DEBUG_KERNEL
	depends on TRACEPOINTS
	help
	  This is a feature to add tracepoints for tracking down page reference
	  manipulation. This tracking is useful to diagnose functional failures
	  due to migration failures caused by page reference mismatches. Be
	  careful when enabling this feature because it adds about 30 KB to the
	  kernel code. However, the runtime performance overhead is virtually
	  nil until the tracepoints are actually enabled.

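	  For example, assuming tracefs is mounted at /sys/kernel/tracing,
	  the events (grouped under the "page_ref" trace system) might be
	  enabled and streamed with:

	    echo 1 > /sys/kernel/tracing/events/page_ref/enable
	    cat /sys/kernel/tracing/trace_pipe
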
config DEBUG_RODATA_TEST
	bool "Testcase for marking rodata read-only"
	depends on STRICT_KERNEL_RWX
	help
	  This option enables a testcase for setting rodata read-only.

config ARCH_HAS_DEBUG_WX
	bool

config DEBUG_WX
	bool "Warn on W+X mappings at boot"
	depends on ARCH_HAS_DEBUG_WX
	depends on MMU
	select PTDUMP_CORE
	help
	  Generate a warning if any W+X mappings are found at boot.

	  This is useful for discovering cases where the kernel is leaving W+X
	  mappings after applying NX, as such mappings are a security risk.

	  Look for a message in dmesg output like this:

	    <arch>/mm: Checked W+X mappings: passed, no W+X pages found.

	  or like this, if the check failed:

	    <arch>/mm: Checked W+X mappings: failed, <N> W+X pages found.

	  Note that even if the check fails, your kernel is possibly
	  still fine, as W+X mappings are not a security hole in
	  themselves; rather, they make the exploitation of other
	  unfixed kernel bugs easier.

	  There is no runtime or memory usage effect of this option
	  once the kernel has booted up; it's a one-time check.

	  If in doubt, say "Y".

config GENERIC_PTDUMP
	bool

config PTDUMP_CORE
	bool

config PTDUMP_DEBUGFS
	bool "Export kernel pagetable layout to userspace via debugfs"
	depends on DEBUG_KERNEL
	depends on DEBUG_FS
	depends on GENERIC_PTDUMP
	select PTDUMP_CORE
	help
	  Say Y here if you want to show the kernel pagetable layout in a
	  debugfs file. This information is only useful for kernel developers
	  who are working in architecture-specific areas of the kernel.
	  It is probably not a good idea to enable this feature in a production
	  kernel.

	  If in doubt, say N.

config HAVE_DEBUG_KMEMLEAK
	bool

config DEBUG_KMEMLEAK
	bool "Kernel memory leak detector"
	depends on DEBUG_KERNEL && HAVE_DEBUG_KMEMLEAK
	select DEBUG_FS
	select STACKTRACE if STACKTRACE_SUPPORT
	select KALLSYMS
	select CRC32
	select STACKDEPOT
	select STACKDEPOT_ALWAYS_INIT if !DEBUG_KMEMLEAK_DEFAULT_OFF
	help
	  Say Y here if you want to enable the memory leak
	  detector. The memory allocation/freeing is traced in a way
	  similar to Boehm's conservative garbage collector, the
	  difference being that the orphan objects are not freed but
	  only shown in /sys/kernel/debug/kmemleak. Enabling this
	  feature introduces overhead to memory allocations. See
	  Documentation/dev-tools/kmemleak.rst for more details.

	  Enabling DEBUG_SLAB or SLUB_DEBUG may increase the chances
	  of finding leaks due to the poisoning of slab objects.

	  In order to access the kmemleak file, debugfs needs to be
	  mounted (usually at /sys/kernel/debug).

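	  For example, after mounting debugfs if needed, the commands below
	  trigger a manual scan, read the reported leaks, and then clear
	  the list of current leak suspects. See
	  Documentation/dev-tools/kmemleak.rst for the full set of commands
	  accepted by the kmemleak file.

	    mount -t debugfs nodev /sys/kernel/debug/
	    echo scan > /sys/kernel/debug/kmemleak
	    cat /sys/kernel/debug/kmemleak
	    echo clear > /sys/kernel/debug/kmemleak
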
config DEBUG_KMEMLEAK_MEM_POOL_SIZE
	int "Kmemleak memory pool size"
	depends on DEBUG_KMEMLEAK
	range 200 1000000
	default 16000
	help
	  Kmemleak must track all the memory allocations to avoid
	  reporting false positives. Since memory may be allocated or
	  freed before kmemleak is fully initialised, it uses a static pool
	  of metadata objects to track such callbacks. After kmemleak is
	  fully initialised, this memory pool acts as an emergency one
	  if slab allocations fail.

config DEBUG_KMEMLEAK_DEFAULT_OFF
	bool "Default kmemleak to off"
	depends on DEBUG_KMEMLEAK
	help
	  Say Y here to disable kmemleak by default. It can then be enabled
	  on the command line via kmemleak=on.

config DEBUG_KMEMLEAK_AUTO_SCAN
	bool "Enable kmemleak auto scan thread on boot up"
	default y
	depends on DEBUG_KMEMLEAK
	help
	  Depending on the CPU, kmemleak scan may be CPU intensive and can
	  stall user tasks at times. This option enables/disables automatic
	  kmemleak scan at boot up.

	  Say N here to disable the kmemleak auto scan thread and stop
	  automatic scanning. Disabling this option disables automatic
	  reporting of memory leaks.

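	  The kmemleak debugfs file also accepts commands for controlling
	  the scan thread at runtime, assuming debugfs is mounted at
	  /sys/kernel/debug. For example, the following commands stop the
	  thread, restart it, and set the scan period to 600 seconds,
	  respectively:

	    echo scan=off > /sys/kernel/debug/kmemleak
	    echo scan=on > /sys/kernel/debug/kmemleak
	    echo scan=600 > /sys/kernel/debug/kmemleak
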
	  If unsure, say Y.

config PER_VMA_LOCK_STATS
	bool "Statistics for per-vma locks"
	depends on PER_VMA_LOCK
	default y
	help
	  Collect statistics about handling page faults under per-VMA
	  locks and report them in /proc/vmstat.