linux/mm
Jeff Xu 8be7258aad mseal: add mseal syscall
The new mseal() is an syscall on 64 bit CPU, and with following signature:

int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

Following input during RFC are incooperated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
  Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
..
damon mm/damon/core: fix return value from damos_wmark_metric_value 2024-05-11 15:41:36 -07:00
kasan fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
kfence mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
kmsan mm: kmsan: implement kmsan_memmove() 2024-04-25 21:07:02 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c
bootmem_info.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-04-25 20:56:42 -07:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
debug.c mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() 2024-05-05 17:53:31 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: test pmd_leaf() behavior with pmd_mkinvalid() 2024-05-07 10:37:00 -07:00
dmapool.c
dmapool_test.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
folio-compat.c mm: remove __set_page_dirty_nobuffers() 2024-04-25 20:56:25 -07:00
gup.c mm/gup: fix hugepd handling in hugetlb rework 2024-05-07 10:37:01 -07:00
gup_test.c
gup_test.h
highmem.c
hmm.c mm/treewide: replace pXd_huge() with pXd_leaf() 2024-04-25 20:55:46 -07:00
huge_memory.c thp: remove HPAGE_PMD_ORDER minimum assertion 2024-05-07 10:37:02 -07:00
hugetlb.c mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp 2024-05-11 15:41:37 -07:00
hugetlb_cgroup.c mm/hugetlb: assert hugetlb_lock in __hugetlb_cgroup_commit_charge 2024-05-05 17:53:41 -07:00
hugetlb_vmemmap.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/memory-failure: convert shake_page() to shake_folio() 2024-05-05 17:53:45 -07:00
init-mm.c
internal.h mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
Kconfig.debug
khugepaged.c mm: simplify thp_vma_allowable_order 2024-05-05 17:53:53 -07:00
kmemleak.c mm: lift gfp_kmemleak_mask() to gfp.h 2024-05-19 14:40:44 -07:00
ksm.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
list_lru.c
maccess.c
madvise.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
Makefile mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
mapping_dirty_helpers.c
memblock.c
memcontrol.c memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order 2024-05-11 15:41:37 -07:00
memfd.c
memory-failure.c memory-failure: remove calls to page_mapping() 2024-05-05 17:53:48 -07:00
memory-tiers.c memory tier: create CPUless memory tiers after obtaining HMAT info 2024-05-05 17:53:26 -07:00
memory.c mm: simplify and improve print_vma_addr() output 2024-05-22 14:37:23 -07:00
memory_hotplug.c mm/hugetlb: rename dissolve_free_huge_pages() to dissolve_free_hugetlb_folios() 2024-05-05 17:53:35 -07:00
mempolicy.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mempool.c mempool: hook up to memory allocation profiling 2024-04-25 20:55:56 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c
migrate.c mm: convert hugetlb_page_mapping_lock_write to folio 2024-05-05 17:53:46 -07:00
migrate_device.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mincore.c
mlock.c mm: add pmd_folio() 2024-04-25 20:56:19 -07:00
mm_init.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
mm_slot.h
mmap.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
mmap_lock.c
mmu_gather.c
mmu_notifier.c mmu_notifier: remove the .change_pte() callback 2024-04-11 13:18:36 -04:00
mmzone.c
mprotect.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
mremap.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
mseal.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
msync.c
nommu.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page-writeback.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
page_alloc.c mm: page_alloc: allowing mTHP compaction to capture the freed page directly 2024-05-05 17:53:37 -07:00
page_counter.c
page_ext.c mm: make page_ext_get() take a const argument 2024-04-25 20:56:14 -07:00
page_idle.c
page_io.c mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters 2024-05-05 17:53:35 -07:00
page_isolation.c mm: page_isolation: prepare for hygienic freelists 2024-04-25 20:56:04 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm/page_table_check: support userfault wr-protect entries 2024-05-05 17:53:41 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
pagewalk.c
percpu-internal.h mm: percpu: add codetag reference into pcpuobj_ext 2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: percpu: enable per-cpu allocation tagging 2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c
ptdump.c
readahead.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rmap.c mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED 2024-05-11 15:41:35 -07:00
rodata_test.c
secretmem.c
shmem.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-03-26 11:07:23 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker.c
shrinker_debug.c
shuffle.c
shuffle.h
slab.h The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
slab_common.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
slub.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
sparse-vmemmap.c
sparse.c mm/sparse: guard the size of mem_section is power of 2 2024-05-05 17:53:40 -07:00
swap.c mm: add kernel-doc for folio_mark_accessed() 2024-05-05 17:53:50 -07:00
swap.h
swap_cgroup.c
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: remove struct page from get_shadow_from_swap_cache 2024-04-25 20:56:40 -07:00
swapfile.c getting rid of bogus set_blocksize() uses, switching it 2024-05-21 08:34:51 -07:00
truncate.c mm: convert pagecache_isize_extended to use a folio 2024-04-25 20:56:43 -07:00
usercopy.c
userfaultfd.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
util.c mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
vmalloc.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
vmpressure.c
vmscan.c mm/vmscan: remove ignore_references argument of reclaim_folio_list() 2024-05-07 10:37:02 -07:00
vmstat.c iommu: observability of the IOMMU allocations 2024-04-15 14:31:47 +02:00
workingset.c mm: cleanup WORKINGSET_NODES in workingset 2024-05-07 10:36:59 -07:00
z3fold.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zswap.c mm: zswap: remove same_filled module params 2024-05-05 17:53:38 -07:00