Mirror of https://github.com/torvalds/linux (synced 2024-07-21 10:41:44 +00:00)

commit 0b32d436c0

Jeff Xu's implementation of the mseal() syscall.

-----BEGIN PGP SIGNATURE-----

iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZlDhVAAKCRDdBJ7gKXxA
jqDSAP0aGY505ka3+ffe6e5OP7W7syKjXHLy84Hp2t6YWnU+6QEA86qcXnfOI7HB
7FPy+fa9sMm6BfAAZPkYnICAgVpbBAw=
=Q3vf
-----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall

@@ -20,6 +20,7 @@ System calls
    futex2
    ebpf/index
    ioctl/index
+   mseal

 Security-related interfaces
 ===========================

--- /dev/null
+++ b/Documentation/userspace-api/mseal.rst
@@ -0,0 +1,199 @@
.. SPDX-License-Identifier: GPL-2.0

=====================
Introduction of mseal
=====================

:Author: Jeff Xu <jeffxu@chromium.org>

Modern CPUs support memory permissions such as the RW and NX bits. Memory
permissions improve the security stance against memory corruption bugs: an
attacker can’t just write to arbitrary memory and point the code to it,
because the memory has to be marked with the X bit, or else an exception
will be raised.

Memory sealing additionally protects the mapping itself against
modifications. This is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages, and
applications can additionally seal security critical data at runtime.

A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].

User API
========
mseal()
-------
The mseal() syscall has the following signature:

``int mseal(void *addr, size_t len, unsigned long flags)``

**addr/len**: virtual memory address range.

The address range set by ``addr``/``len`` must meet:
   - The start address must be in an allocated VMA.
   - The start address must be page aligned.
   - The end address (``addr`` + ``len``) must be in an allocated VMA.
   - There must be no gap (unallocated memory) between the start and end
     address.

The ``len`` will be page aligned implicitly by the kernel.

**flags**: reserved for future use; must be 0 for now.

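A C library may not ship an mseal() wrapper yet, so one way to try the call is through syscall(2). The following is a minimal sketch, not part of this patch: the helper names ``try_mseal`` and ``seal_one_page`` are hypothetical, and the fallback number 462 is an assumption taken from most of the syscall tables in this commit (one table assigns 572 instead).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
/* Assumption: 462, as in most syscall tables of this commit. */
#define __NR_mseal 462
#endif

/*
 * Hypothetical helper: seal [addr, addr + len).
 * Returns 0 on success or a negative errno value, for example -ENOSYS on
 * kernels without mseal() or -EPERM on 32-bit kernels.
 */
static int try_mseal(void *addr, size_t len)
{
	/* flags is reserved and must be 0 */
	if (syscall(__NR_mseal, addr, len, 0UL) == 0)
		return 0;
	return -errno;
}

/* Map one anonymous read-only page and attempt to seal it. */
static int seal_one_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -errno;
	return try_mseal(p, (size_t)page);
}
```

Note that the raw syscall reports failures through ``errno``; the sketch converts them to the negative values used in this document.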
**return values**:

- ``0``: Success.

- ``-EINVAL``:
    - Invalid input ``flags``.
    - The start address (``addr``) is not page aligned.
    - Address range (``addr`` + ``len``) overflow.

- ``-ENOMEM``:
    - The start address (``addr``) is not allocated.
    - The end address (``addr`` + ``len``) is not allocated.
    - A gap (unallocated memory) between start and end address.

- ``-EPERM``:
    - Sealing is supported only on 64-bit CPUs; 32-bit is not supported.

- For the above error cases, users can expect the given memory range to be
  unmodified, i.e. no partial update.

- There might be other internal errors/cases not listed here, e.g.
  an error during merging/splitting VMAs, or the process reaching the
  maximum number of supported VMAs. In those cases, partial updates to the
  given memory range could happen. However, those cases should be rare.

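The ``-EINVAL`` conditions above depend only on the arguments, so a caller can check them before issuing the syscall. The sketch below is illustrative only: ``mseal_args_check`` is a hypothetical user-space helper, and its rounding of ``len`` is an assumption based on the documented implicit page alignment. The ``-ENOMEM`` cases cannot be checked this way because they depend on the current mappings of the process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical pre-check mirroring the documented -EINVAL cases.
 * page_size must be a power of two.
 */
static int mseal_args_check(uintptr_t addr, size_t len, unsigned long flags,
			    size_t page_size)
{
	uintptr_t mask = (uintptr_t)page_size - 1;
	uintptr_t aligned_len;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (flags != 0)
		return -EINVAL;		/* flags are reserved */
	if (addr & mask)
		return -EINVAL;		/* start must be page aligned */

	/* Round len up to a page boundary (assumed kernel behavior). */
	aligned_len = ((uintptr_t)len + mask) & ~mask;
	if (len != 0 && aligned_len == 0)
		return -EINVAL;		/* rounding len up overflowed */
	if (addr + aligned_len < addr)
		return -EINVAL;		/* addr + len overflowed */

	return 0;
}
```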
**Blocked operations after sealing**:

- Unmapping, moving to another location, and shrinking the size, via
  munmap() and mremap(). These operations can leave an empty space, which
  could then be replaced with a VMA with a new set of attributes.

- Moving or expanding a different VMA into the current location,
  via mremap().

- Modifying a VMA via mmap(MAP_FIXED).

- Size expansion, via mremap(). This does not appear to pose any
  specific risks to sealed VMAs. It is blocked anyway because
  the use case is unclear. In any case, users can rely on
  merging to expand a sealed VMA.

- mprotect() and pkey_mprotect().

- Some destructive madvise() behaviors (e.g. MADV_DONTNEED)
  for anonymous memory, when users don't have write permission to the
  memory. Those behaviors can alter region contents by discarding pages,
  effectively a memset(0) for anonymous memory.

The kernel will return -EPERM for blocked operations.

For blocked operations, one can expect the given address range to be
unmodified, i.e. no partial update. Note that this is different from existing
mm system call behaviors, where partial updates are made until an error is
found and returned to userspace. To give an example:

Assume the following code sequence:

- ptr = mmap(NULL, 8192, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
- munmap(ptr + 4096, 4096);
- ret1 = mprotect(ptr, 8192, PROT_READ);
- mseal(ptr, 4096, 0);
- ret2 = mprotect(ptr, 8192, PROT_NONE);

ret1 will be -ENOMEM, and the page at ptr is updated to PROT_READ.

ret2 will be -EPERM, and the page remains PROT_READ.

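The sequence can be tried from user space with a short program. This is a sketch under stated assumptions, not part of the patch: ``partial_update_demo`` and ``struct demo_result`` are hypothetical names, the syscall number 462 is taken from most tables in this commit, and the page size is queried at run time instead of assuming 4096. On a kernel without mseal() the seal step fails (typically with ENOSYS) and the second mprotect() then behaves like the first.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: value from this commit's tables */
#endif

struct demo_result {
	int err1;	/* errno of the first mprotect(), 0 on success */
	int seal_err;	/* errno of mseal(), 0 on success */
	int err2;	/* errno of the second mprotect(), 0 on success */
};

static struct demo_result partial_update_demo(void)
{
	struct demo_result r = { 0, 0, 0 };
	long page = sysconf(_SC_PAGESIZE);
	char *ptr;

	assert(page > 0);
	ptr = mmap(NULL, 2 * (size_t)page, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED) {
		r.err1 = r.seal_err = r.err2 = errno;
		return r;
	}

	/* Punch a hole: only the first page stays mapped. */
	munmap(ptr + page, (size_t)page);

	/* Unsealed: the first page is updated, then the hole is reported. */
	if (mprotect(ptr, 2 * (size_t)page, PROT_READ) != 0)
		r.err1 = errno;

	if (syscall(__NR_mseal, ptr, (size_t)page, 0UL) != 0)
		r.seal_err = errno;

	/* Sealed: rejected up front, nothing in the range is changed. */
	if (mprotect(ptr, 2 * (size_t)page, PROT_NONE) != 0)
		r.err2 = errno;

	return r;
}
```

When the seal succeeds, ``err1`` is expected to be ENOMEM and ``err2`` EPERM, matching ret1 and ret2 above.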
**Note**:

- mseal() only works on 64-bit CPUs, not on 32-bit CPUs.

- Users can call mseal() multiple times; calling mseal() on already sealed
  memory is a no-op (not an error).

- munseal() is not supported.

Use cases:
==========
- glibc:
  The dynamic linker, while loading ELF executables, can apply sealing to
  non-writable memory segments.

- Chrome browser: protect some security sensitive data structures.

Notes on which memory to seal:
==============================

It might be important to note that sealing changes the lifetime of a mapping,
i.e. the sealed mapping won’t be unmapped until the process terminates or the
exec system call is invoked. Applications can apply sealing to any virtual
memory region from userspace, but it is crucial to thoroughly analyze the
mapping's lifetime prior to applying the sealing.

For example:

- aio/shm

  aio/shm can call mmap()/munmap() on behalf of userspace, e.g. ksys_shmdt() in
  shm.c. The lifetimes of those mappings are not tied to the lifetime of the
  process. If those mappings are sealed from userspace, then munmap() will fail,
  causing leaks in VMA address space during the lifetime of the process.

- Brk (heap)

  Currently, userspace applications can seal parts of the heap by calling
  malloc() and mseal().
  Let's assume the following calls from user space:

  - ptr = malloc(size);
  - mprotect(ptr, size, RO);
  - mseal(ptr, size, 0);
  - free(ptr);

  Technically, before mseal() was added, the user could already change the
  protection of the heap by calling mprotect(RO). As long as the user changes
  the protection back to RW before free(), the memory range can be reused.

  With mseal() in the picture, however, the heap is then partially sealed.
  The user can still free the memory, but it remains RO. If the address
  is reused by the heap manager for another malloc, the process might crash
  soon after. Therefore, it is important not to apply sealing to any memory
  that might get recycled.

  Furthermore, even if the application never calls free() for the ptr,
  the heap manager may invoke the brk system call to shrink the size of the
  heap. In the kernel, the brk-shrink will call munmap(). Consequently,
  depending on the location of the ptr, the outcome of brk-shrink is
  nondeterministic.


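One way to stay clear of the heap pitfalls described above is to place data that should be sealed in its own anonymous mapping, which the heap manager never recycles. The following is a minimal sketch, assuming the data never needs to be freed before exit or exec; ``publish_sealed`` is a hypothetical helper name and 462 is the syscall number from most tables in this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: value from this commit's tables */
#endif

/*
 * Hypothetical helper: copy len bytes into a private mapping outside the
 * heap, make it read-only, then try to seal it. Returns the mapping, or
 * NULL on failure. *sealed reports whether mseal() succeeded.
 */
static const void *publish_sealed(const void *src, size_t len, int *sealed)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t map_len;
	void *p;

	assert(page > 0);
	if (len == 0)
		return NULL;
	/* Round up to whole pages (overflow for huge len is not handled). */
	map_len = (len + (size_t)page - 1) & ~((size_t)page - 1);

	p = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, src, len);
	if (mprotect(p, map_len, PROT_READ) != 0) {
		munmap(p, map_len);
		return NULL;
	}

	/* Sealing may fail (e.g. ENOSYS); the data is still read-only. */
	*sealed = (syscall(__NR_mseal, p, map_len, 0UL) == 0);
	return p;
}
```

Once the seal succeeds the mapping cannot be unmapped again, so this pattern only suits data whose lifetime matches that of the process.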
Additional notes:
=================
As Jann Horn pointed out in [3], there are still a few ways to write
to RO memory, which is, in a way, by design. Those cases are not covered
by mseal(). If applications want to block such cases, sandbox tools (such as
seccomp, LSM, etc.) might be considered.

Those cases are:

- Write to read-only memory through /proc/self/mem interface.
- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
- userfaultfd.

The idea that inspired this patch comes from Stephen Röttger’s work in V8
CFI [4]. Chrome browser in ChromeOS will be the first user of this API.

Reference:
==========
[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274

[2] https://man.openbsd.org/mimmutable.2

[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com

[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc

@@ -501,3 +501,4 @@
 569	common	lsm_get_self_attr		sys_lsm_get_self_attr
 570	common	lsm_set_self_attr		sys_lsm_set_self_attr
 571	common	lsm_list_modules		sys_lsm_list_modules
+572	common	mseal				sys_mseal

@@ -475,3 +475,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls		462
+#define __NR_compat_syscalls		463
 #endif

 #define __ARCH_WANT_SYS_CLONE

@@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
 __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)

 /*
  * Please add new compat syscalls above this comment and update

@@ -461,3 +461,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -467,3 +467,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -400,3 +400,4 @@
 459	n32	lsm_get_self_attr		sys_lsm_get_self_attr
 460	n32	lsm_set_self_attr		sys_lsm_set_self_attr
 461	n32	lsm_list_modules		sys_lsm_list_modules
+462	n32	mseal				sys_mseal

@@ -376,3 +376,4 @@
 459	n64	lsm_get_self_attr		sys_lsm_get_self_attr
 460	n64	lsm_set_self_attr		sys_lsm_set_self_attr
 461	n64	lsm_list_modules		sys_lsm_list_modules
+462	n64	mseal				sys_mseal

@@ -449,3 +449,4 @@
 459	o32	lsm_get_self_attr		sys_lsm_get_self_attr
 460	o32	lsm_set_self_attr		sys_lsm_set_self_attr
 461	o32	lsm_list_modules		sys_lsm_list_modules
+462	o32	mseal				sys_mseal

@@ -460,3 +460,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -548,3 +548,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -464,3 +464,4 @@
 459	common	lsm_get_self_attr	sys_lsm_get_self_attr	sys_lsm_get_self_attr
 460	common	lsm_set_self_attr	sys_lsm_set_self_attr	sys_lsm_set_self_attr
 461	common	lsm_list_modules	sys_lsm_list_modules	sys_lsm_list_modules
+462	common	mseal			sys_mseal		sys_mseal

@@ -464,3 +464,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -507,3 +507,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -466,3 +466,4 @@
 459	i386	lsm_get_self_attr		sys_lsm_get_self_attr
 460	i386	lsm_set_self_attr		sys_lsm_set_self_attr
 461	i386	lsm_list_modules		sys_lsm_list_modules
+462	i386	mseal				sys_mseal

@@ -383,6 +383,7 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

 #
 # Due to a historical design error, certain syscalls are numbered differently

@@ -432,3 +432,4 @@
 459	common	lsm_get_self_attr		sys_lsm_get_self_attr
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
+462	common	mseal				sys_mseal

@@ -821,6 +821,7 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
 			unsigned long prot, unsigned long pgoff,
 			unsigned long flags);
+asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long flags);
 asmlinkage long sys_mbind(unsigned long start, unsigned long len,
 				unsigned long mode,
 				const unsigned long __user *nmask,

@@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)

+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463

 /*
  * 32 bit systems traditionally used different

@@ -196,6 +196,7 @@ COND_SYSCALL(migrate_pages);
 COND_SYSCALL(move_pages);
 COND_SYSCALL(set_mempolicy_home_node);
 COND_SYSCALL(cachestat);
+COND_SYSCALL(mseal);

 COND_SYSCALL(perf_event_open);
 COND_SYSCALL(accept4);

@@ -43,6 +43,10 @@ ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
 endif

+ifdef CONFIG_64BIT
+mmu-$(CONFIG_MMU)	+= mseal.o
+endif
+
 obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   maccess.o page-writeback.o folio-compat.o \
 			   readahead.o swap.o truncate.o vmscan.o shrinker.o \

@@ -1435,6 +1435,43 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 			  int priority);

+#ifdef CONFIG_64BIT
+/* VM is sealed, in vm_flags */
+#define VM_SEALED	_BITUL(63)
+#endif
+
+#ifdef CONFIG_64BIT
+static inline int can_do_mseal(unsigned long flags)
+{
+	if (flags)
+		return -EINVAL;
+
+	return 0;
+}
+
+bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+		unsigned long end);
+bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+		unsigned long end, int behavior);
+#else
+static inline int can_do_mseal(unsigned long flags)
+{
+	return -EPERM;
+}
+
+static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+		unsigned long end)
+{
+	return true;
+}
+
+static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+		unsigned long end, int behavior)
+{
+	return true;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
 			struct shrinker *shrinker, const char *fmt, va_list ap)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1401,6 +1401,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *  -EIO    - an I/O error occurred while paging in data.
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
+ *  -EPERM  - memory is sealed.
  */
 int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
 {
@@ -1444,6 +1445,15 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	start = untagged_addr_remote(mm, start);
 	end = start + len;

+	/*
+	 * Check if the address range is sealed for do_madvise().
+	 * can_modify_mm_madv assumes we have acquired the lock on MM.
+	 */
+	if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
+		error = -EPERM;
+		goto out;
+	}
+
 	blk_start_plug(&plug);
 	switch (behavior) {
 	case MADV_POPULATE_READ:
@@ -1456,6 +1466,8 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 		break;
 	}
 	blk_finish_plug(&plug);
+
+out:
 	if (write)
 		mmap_write_unlock(mm);
 	else

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1255,6 +1255,16 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;

+	/*
+	 * addr is returned from get_unmapped_area,
+	 * There are two cases:
+	 * 1> MAP_FIXED == false
+	 *	unallocated memory, no need to check sealing.
+	 * 2> MAP_FIXED == true
+	 *	sealing is checked inside mmap_region when
+	 *	do_vmi_munmap is called.
+	 */
+
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(mm);
 		if (pkey < 0)
@@ -2727,6 +2737,14 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (end == start)
 		return -EINVAL;

+	/*
+	 * Check if memory is sealed before arch_unmap.
+	 * Prevent unmapping a sealed VMA.
+	 * can_modify_mm assumes we have acquired the lock on MM.
+	 */
+	if (unlikely(!can_modify_mm(mm, start, end)))
+		return -EPERM;
+
 	 /* arch_unmap() might do unmaps itself.  */
 	arch_unmap(mm, start, end);

@@ -2789,7 +2807,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	}

 	/* Unmap any existing mapping in the area */
-	if (do_vmi_munmap(&vmi, mm, addr, len, uf, false))
+	error = do_vmi_munmap(&vmi, mm, addr, len, uf, false);
+	if (error == -EPERM)
+		return error;
+	else if (error)
 		return -ENOMEM;

 	/*
@@ -3139,6 +3160,14 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;

+	/*
+	 * Check if memory is sealed before arch_unmap.
+	 * Prevent unmapping a sealed VMA.
+	 * can_modify_mm assumes we have acquired the lock on MM.
+	 */
+	if (unlikely(!can_modify_mm(mm, start, end)))
+		return -EPERM;
+
 	arch_unmap(mm, start, end);
 	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
 }

@@ -32,6 +32,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/memory-tiers.h>
+#include <uapi/linux/mman.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
@@ -744,6 +745,15 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		}
 	}

+	/*
+	 * checking if memory is sealed.
+	 * can_modify_mm assumes we have acquired the lock on MM.
+	 */
+	if (unlikely(!can_modify_mm(current->mm, start, end))) {
+		error = -EPERM;
+		goto out;
+	}
+
 	prev = vma_prev(&vmi);
 	if (start > vma->vm_start)
 		prev = vma;

--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -902,7 +902,25 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;

+	/*
+	 * In mremap_to().
+	 * Move a VMA to another location, check if src addr is sealed.
+	 *
+	 * Place can_modify_mm here because mremap_to()
+	 * does its own checking for address range, and we only
+	 * check the sealing after passing those checks.
+	 *
+	 * can_modify_mm assumes we have acquired the lock on MM.
+	 */
+	if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
+		return -EPERM;
+
 	if (flags & MREMAP_FIXED) {
+		/*
+		 * In mremap_to().
+		 * VMA is moved to dst address, and munmap dst first.
+		 * do_munmap will check if dst is sealed.
+		 */
 		ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
 		if (ret)
 			goto out;
@@ -1061,6 +1079,19 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}

+	/*
+	 * Below is shrink/expand case (not mremap_to())
+	 * Check if src address is sealed, if so, reject.
+	 * In other words, prevent shrinking or expanding a sealed VMA.
+	 *
+	 * Place can_modify_mm here so we can keep the logic related to
+	 * shrink/expand together.
+	 */
+	if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
+		ret = -EPERM;
+		goto out;
+	}
+
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..

--- /dev/null
+++ b/mm/mseal.c
@@ -0,0 +1,307 @@
// SPDX-License-Identifier: GPL-2.0
/*
 *  Implement mseal() syscall.
 *
 *  Copyright (c) 2023,2024 Google, Inc.
 *
 *  Author: Jeff Xu <jeffxu@chromium.org>
 */

#include <linux/mempolicy.h>
#include <linux/mman.h>
#include <linux/mm.h>
#include <linux/mm_inline.h>
#include <linux/mmu_context.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
#include "internal.h"

static inline bool vma_is_sealed(struct vm_area_struct *vma)
{
	return (vma->vm_flags & VM_SEALED);
}

static inline void set_vma_sealed(struct vm_area_struct *vma)
{
	vm_flags_set(vma, VM_SEALED);
}

/*
 * check if a vma is sealed for modification.
 * return true, if modification is allowed.
 */
static bool can_modify_vma(struct vm_area_struct *vma)
{
	if (unlikely(vma_is_sealed(vma)))
		return false;

	return true;
}

static bool is_madv_discard(int behavior)
{
	/* MADV_* values are enumerated, not bit flags: compare, don't mask. */
	return behavior == MADV_FREE || behavior == MADV_DONTNEED ||
	       behavior == MADV_DONTNEED_LOCKED || behavior == MADV_REMOVE ||
	       behavior == MADV_DONTFORK || behavior == MADV_WIPEONFORK;
}

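The MADV_* behaviors are plain enumerated values rather than independent bit flags, which matters for a membership test such as is_madv_discard(). The user-space sketch below contrasts a mask test with a value comparison. The helper names are hypothetical, and the fallback definitions assume the generic Linux values in case the C library headers lack them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks for older C library headers (generic Linux values). */
#ifndef MADV_FREE
#define MADV_FREE 8
#endif
#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18
#endif
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/* Mask-style test: treats the advice values as if they were bit flags. */
static bool discard_by_mask(int behavior)
{
	return behavior &
	       (MADV_FREE | MADV_DONTNEED | MADV_DONTNEED_LOCKED |
		MADV_REMOVE | MADV_DONTFORK | MADV_WIPEONFORK);
}

/* Comparison-style test: matches only the listed advice values. */
static bool discard_by_value(int behavior)
{
	switch (behavior) {
	case MADV_FREE:
	case MADV_DONTNEED:
	case MADV_DONTNEED_LOCKED:
	case MADV_REMOVE:
	case MADV_DONTFORK:
	case MADV_WIPEONFORK:
		return true;
	default:
		return false;
	}
}
```

With the generic values, MADV_WILLNEED (3) shares low bits with MADV_REMOVE (9) and MADV_DONTFORK (10), so the mask test classifies it as discarding while the comparison does not.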
static bool is_ro_anon(struct vm_area_struct *vma)
{
	/* check anonymous mapping. */
	if (vma->vm_file || vma->vm_flags & VM_SHARED)
		return false;

	/*
	 * check for non-writable:
	 * PROT=RO or PKRU is not writeable.
	 */
	if (!(vma->vm_flags & VM_WRITE) ||
		!arch_vma_access_permitted(vma, true, false, false))
		return true;

	return false;
}

||||||
|
/*
|
||||||
|
* Check if the vmas of a memory range are allowed to be modified.
|
||||||
|
* the memory range can have a gap (unallocated memory).
|
||||||
|
* return true, if it is allowed.
|
||||||
|
*/
|
||||||
|
bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
|
||||||
|
{
|
||||||
|
struct vm_area_struct *vma;
|
||||||
|
|
||||||
|
VMA_ITERATOR(vmi, mm, start);
|
||||||
|
|
||||||
|
/* going through each vma to check. */
|
||||||
|
for_each_vma_range(vmi, vma, end) {
|
||||||
|
if (unlikely(!can_modify_vma(vma)))
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Allow by default. */
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Check if the vmas of a memory range are allowed to be modified by madvise.
|
||||||
|
* the memory range can have a gap (unallocated memory).
|
||||||
|
* return true, if it is allowed.
|
||||||
|
*/
|
||||||
|
bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
|
||||||
|
int behavior)
|
||||||
|
{
|
||||||
|
struct vm_area_struct *vma;
|
||||||
|
|
||||||
|
VMA_ITERATOR(vmi, mm, start);
|
||||||
|
|
||||||
|
if (!is_madv_discard(behavior))
|
||||||
|
return true;
|
||||||
|
|
||||||
|
/* going through each vma to check. */
|
||||||
|
for_each_vma_range(vmi, vma, end)
|
||||||
|
if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
|
||||||
|
return false;
|
||||||
|
|
||||||
|
/* Allow by default. */
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
|
||||||
|
struct vm_area_struct **prev, unsigned long start,
|
||||||
|
unsigned long end, vm_flags_t newflags)
|
||||||
|
{
|
||||||
|
int ret = 0;
|
||||||
|
vm_flags_t oldflags = vma->vm_flags;
|
||||||
|
|
||||||
|
if (newflags == oldflags)
|
||||||
|
goto out;
|
||||||
|
|
||||||
|
vma = vma_modify_flags(vmi, *prev, vma, start, end, newflags);
|
||||||
|
if (IS_ERR(vma)) {
|
||||||
|
ret = PTR_ERR(vma);
|
||||||
|
goto out;
|
||||||
|
}
|
||||||
|
|
||||||
|
set_vma_sealed(vma);
|
||||||
|
out:
|
||||||
|
*prev = vma;
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Check for do_mseal:
|
||||||
|
* 1> start is part of a valid vma.
|
||||||
|
* 2> end is part of a valid vma.
|
||||||
|
* 3> No gap (unallocated address) between start and end.
|
||||||
|
* 4> map is sealable.
|
||||||
|
*/
|
||||||
|
static int check_mm_seal(unsigned long start, unsigned long end)
|
||||||
|
{
|
||||||
|
struct vm_area_struct *vma;
|
||||||
|
unsigned long nstart = start;
|
||||||
|
|
||||||
|
VMA_ITERATOR(vmi, current->mm, start);
|
||||||
|
|
||||||
|
/* going through each vma to check. */
|
||||||
|
for_each_vma_range(vmi, vma, end) {
|
||||||
|
if (vma->vm_start > nstart)
|
||||||
|
/* unallocated memory found. */
|
||||||
|
return -ENOMEM;
|
||||||
|
|
||||||
|
if (vma->vm_end >= end)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
nstart = vma->vm_end;
|
||||||
|
}
|
||||||
|
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Apply sealing.
|
||||||
|
*/
|
||||||
|
static int apply_mm_seal(unsigned long start, unsigned long end)
|
||||||
|
{
|
||||||
|
unsigned long nstart;
|
||||||
|
struct vm_area_struct *vma, *prev;
|
||||||
|
|
||||||
|
VMA_ITERATOR(vmi, current->mm, start);
|
||||||
|
|
||||||
|
vma = vma_iter_load(&vmi);
|
||||||
|
/*
|
||||||
|
* Note: check_mm_seal should have already checked the ENOMEM case.
|
||||||
|
* so vma should not be null, same for the other ENOMEM cases.
|
||||||
|
*/
|
||||||
|
prev = vma_prev(&vmi);
|
||||||
|
if (start > vma->vm_start)
|
||||||
|
prev = vma;
|
||||||
|
|
||||||
|
nstart = start;
|
||||||
|
for_each_vma_range(vmi, vma, end) {
|
||||||
|
int error;
|
||||||
|
unsigned long tmp;
|
||||||
|
vm_flags_t newflags;
|
||||||
|
|
||||||
|
newflags = vma->vm_flags | VM_SEALED;
|
||||||
|
tmp = vma->vm_end;
|
||||||
|
if (tmp > end)
|
||||||
|
tmp = end;
|
||||||
|
error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
|
||||||
|
if (error)
|
||||||
|
return error;
|
||||||
|
nstart = vma_iter_end(&vmi);
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* mseal(2) seals the VM's meta data from
|
||||||
|
* selected syscalls.
|
||||||
|
*
|
||||||
|
* addr/len: VM address range.
|
||||||
|
*
|
||||||
|
* The address range by addr/len must meet:
|
||||||
|
* start (addr) must be in a valid VMA.
|
||||||
|
* end (addr + len) must be in a valid VMA.
|
||||||
|
* no gap (unallocated memory) between start and end.
|
||||||
|
* start (addr) must be page aligned.
|
||||||
|
*
|
||||||
|
* len: len will be page aligned implicitly.
|
||||||
|
*
|
||||||
|
* The VMA operations below are blocked after sealing.
|
||||||
|
* 1> Unmapping, moving to another location, and shrinking
|
||||||
|
* the size, via munmap() and mremap(), can leave an empty
|
||||||
|
* space, therefore can be replaced with a VMA with a new
|
||||||
|
* set of attributes.
|
||||||
|
* 2> Moving or expanding a different vma into the current location,
|
||||||
|
* via mremap().
|
||||||
|
* 3> Modifying a VMA via mmap(MAP_FIXED).
|
||||||
|
* 4> Size expansion, via mremap(), does not appear to pose any
|
||||||
|
* specific risks to sealed VMAs. It is included anyway because
|
||||||
|
* the use case is unclear. In any case, users can rely on
|
||||||
|
* merging to expand a sealed VMA.
|
||||||
|
* 5> mprotect and pkey_mprotect.
|
||||||
|
* 6> Some destructive madvise() behavior (e.g. MADV_DONTNEED)
|
||||||
|
* for anonymous memory, when users don't have write permission to the
|
||||||
|
* memory. Those behaviors can alter region contents by discarding pages,
|
||||||
|
* effectively a memset(0) for anonymous memory.
|
||||||
|
*
|
||||||
|
* flags: reserved.
|
||||||
|
*
|
||||||
|
* return values:
|
||||||
|
* zero: success.
|
||||||
|
* -EINVAL:
|
||||||
|
* invalid input flags.
|
||||||
|
* start address is not page aligned.
|
||||||
|
* Address range (start + len) overflow.
|
||||||
|
* -ENOMEM:
|
||||||
|
* addr is not a valid address (not allocated).
|
||||||
|
* end (start + len) is not a valid address.
|
||||||
|
* a gap (unallocated memory) between start and end.
|
||||||
|
* -EPERM:
|
||||||
|
* - On 32-bit architectures, sealing is not supported.
|
||||||
|
* Note:
|
||||||
|
* user can call mseal(2) multiple times, adding a seal on an
|
||||||
|
* already sealed memory is a no-action (no error).
|
||||||
|
*
|
||||||
|
* unseal() is not supported.
|
||||||
|
*/
|
||||||
|
static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
|
||||||
|
{
|
||||||
|
size_t len;
|
||||||
|
int ret = 0;
|
||||||
|
unsigned long end;
|
||||||
|
struct mm_struct *mm = current->mm;
|
||||||
|
|
||||||
|
ret = can_do_mseal(flags);
|
||||||
|
if (ret)
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
start = untagged_addr(start);
|
||||||
|
if (!PAGE_ALIGNED(start))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
len = PAGE_ALIGN(len_in);
|
||||||
|
/* Check to see whether len was rounded up from small -ve to zero. */
|
||||||
|
if (len_in && !len)
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
end = start + len;
|
||||||
|
if (end < start)
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (end == start)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
if (mmap_write_lock_killable(mm))
|
||||||
|
return -EINTR;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* First pass, this helps to avoid
|
||||||
|
* partial sealing in case of error in input address range,
|
||||||
|
* e.g. ENOMEM error.
|
||||||
|
*/
|
||||||
|
ret = check_mm_seal(start, end);
|
||||||
|
if (ret)
|
||||||
|
goto out;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Second pass, this should succeed, unless there are errors
|
||||||
|
* from vma_modify_flags, e.g. merge/split error, or process
|
||||||
|
* reaching the max supported VMAs, however, those cases shall
|
||||||
|
* be rare.
|
||||||
|
*/
|
||||||
|
ret = apply_mm_seal(start, end);
|
||||||
|
|
||||||
|
out:
|
||||||
|
mmap_write_unlock(current->mm);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
SYSCALL_DEFINE3(mseal, unsigned long, start, size_t, len, unsigned long,
|
||||||
|
flags)
|
||||||
|
{
|
||||||
|
return do_mseal(start, len, flags);
|
||||||
|
}
|
2
tools/testing/selftests/mm/.gitignore
vendored
|
@@ -47,3 +47,5 @@ mkdirty
|
||||||
va_high_addr_switch
|
va_high_addr_switch
|
||||||
hugetlb_fault_after_madv
|
hugetlb_fault_after_madv
|
||||||
hugetlb_madv_vs_map
|
hugetlb_madv_vs_map
|
||||||
|
mseal_test
|
||||||
|
seal_elf
|
||||||
|
|
|
@@ -59,6 +59,8 @@ TEST_GEN_FILES += mlock2-tests
|
||||||
TEST_GEN_FILES += mrelease_test
|
TEST_GEN_FILES += mrelease_test
|
||||||
TEST_GEN_FILES += mremap_dontunmap
|
TEST_GEN_FILES += mremap_dontunmap
|
||||||
TEST_GEN_FILES += mremap_test
|
TEST_GEN_FILES += mremap_test
|
||||||
|
TEST_GEN_FILES += mseal_test
|
||||||
|
TEST_GEN_FILES += seal_elf
|
||||||
TEST_GEN_FILES += on-fault-limit
|
TEST_GEN_FILES += on-fault-limit
|
||||||
TEST_GEN_FILES += pagemap_ioctl
|
TEST_GEN_FILES += pagemap_ioctl
|
||||||
TEST_GEN_FILES += thuge-gen
|
TEST_GEN_FILES += thuge-gen
|
||||||
|
|
1894
tools/testing/selftests/mm/mseal_test.c
Normal file
File diff suppressed because it is too large
179
tools/testing/selftests/mm/seal_elf.c
Normal file
|
@@ -0,0 +1,179 @@
|
||||||
|
// SPDX-License-Identifier: GPL-2.0
|
||||||
|
#define _GNU_SOURCE
|
||||||
|
#include <sys/mman.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <sys/time.h>
|
||||||
|
#include <sys/resource.h>
|
||||||
|
#include <stdbool.h>
|
||||||
|
#include "../kselftest.h"
|
||||||
|
#include <syscall.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <sys/ioctl.h>
|
||||||
|
#include <sys/vfs.h>
|
||||||
|
#include <sys/stat.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* These definitions are needed when building manually with gcc.
|
||||||
|
* gcc -I ../../../../usr/include -DDEBUG -O3 seal_elf.c -o seal_elf
|
||||||
|
*/
|
||||||
|
#define FAIL_TEST_IF_FALSE(c) do {\
|
||||||
|
if (!(c)) {\
|
||||||
|
ksft_test_result_fail("%s, line:%d\n", __func__, __LINE__);\
|
||||||
|
goto test_end;\
|
||||||
|
} \
|
||||||
|
} \
|
||||||
|
while (0)
|
||||||
|
|
||||||
|
#define SKIP_TEST_IF_FALSE(c) do {\
|
||||||
|
if (!(c)) {\
|
||||||
|
ksft_test_result_skip("%s, line:%d\n", __func__, __LINE__);\
|
||||||
|
goto test_end;\
|
||||||
|
} \
|
||||||
|
} \
|
||||||
|
while (0)
|
||||||
|
|
||||||
|
|
||||||
|
#define TEST_END_CHECK() {\
|
||||||
|
ksft_test_result_pass("%s\n", __func__);\
|
||||||
|
return;\
|
||||||
|
test_end:\
|
||||||
|
return;\
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef u64
|
||||||
|
#define u64 unsigned long long
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/*
|
||||||
|
* define sys_xyz wrappers to call the syscalls directly.
|
||||||
|
*/
|
||||||
|
static int sys_mseal(void *start, size_t len)
|
||||||
|
{
|
||||||
|
int sret;
|
||||||
|
|
||||||
|
errno = 0;
|
||||||
|
sret = syscall(__NR_mseal, start, len, 0);
|
||||||
|
return sret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void *sys_mmap(void *addr, unsigned long len, unsigned long prot,
|
||||||
|
unsigned long flags, unsigned long fd, unsigned long offset)
|
||||||
|
{
|
||||||
|
void *sret;
|
||||||
|
|
||||||
|
errno = 0;
|
||||||
|
sret = (void *) syscall(__NR_mmap, addr, len, prot,
|
||||||
|
flags, fd, offset);
|
||||||
|
return sret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline int sys_mprotect(void *ptr, size_t size, unsigned long prot)
|
||||||
|
{
|
||||||
|
int sret;
|
||||||
|
|
||||||
|
errno = 0;
|
||||||
|
sret = syscall(__NR_mprotect, ptr, size, prot);
|
||||||
|
return sret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static bool seal_support(void)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
void *ptr;
|
||||||
|
unsigned long page_size = getpagesize();
|
||||||
|
|
||||||
|
ptr = sys_mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
|
||||||
|
if (ptr == (void *) -1)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
ret = sys_mseal(ptr, page_size);
|
||||||
|
if (ret < 0)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
const char somestr[4096] = {"READONLY"};
|
||||||
|
|
||||||
|
static void test_seal_elf(void)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
FILE *maps;
|
||||||
|
char line[512];
|
||||||
|
uintptr_t addr_start, addr_end;
|
||||||
|
char prot[5];
|
||||||
|
char filename[256];
|
||||||
|
unsigned long page_size = getpagesize();
|
||||||
|
unsigned long long ptr = (unsigned long long) somestr;
|
||||||
|
char *somestr2 = (char *)somestr;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Modify the protection of readonly somestr
|
||||||
|
*/
|
||||||
|
if (((unsigned long long)ptr % page_size) != 0)
|
||||||
|
ptr = (unsigned long long)ptr & ~(page_size - 1);
|
||||||
|
|
||||||
|
ksft_print_msg("somestr = %s\n", somestr);
|
||||||
|
ksft_print_msg("change protection to rw\n");
|
||||||
|
ret = sys_mprotect((void *)ptr, page_size, PROT_READ|PROT_WRITE);
|
||||||
|
FAIL_TEST_IF_FALSE(!ret);
|
||||||
|
*somestr2 = 'A';
|
||||||
|
ksft_print_msg("somestr is modified to: %s\n", somestr);
|
||||||
|
ret = sys_mprotect((void *)ptr, page_size, PROT_READ);
|
||||||
|
FAIL_TEST_IF_FALSE(!ret);
|
||||||
|
|
||||||
|
maps = fopen("/proc/self/maps", "r");
|
||||||
|
FAIL_TEST_IF_FALSE(maps);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* apply sealing to elf binary
|
||||||
|
*/
|
||||||
|
while (fgets(line, sizeof(line), maps)) {
|
||||||
|
if (sscanf(line, "%lx-%lx %4s %*x %*x:%*x %*u %255[^\n]",
|
||||||
|
&addr_start, &addr_end, prot, filename) == 4) {
|
||||||
|
if (strlen(filename)) {
|
||||||
|
/*
|
||||||
|
* seal the mapping if read only.
|
||||||
|
*/
|
||||||
|
if (strstr(prot, "r-")) {
|
||||||
|
ret = sys_mseal((void *)addr_start, addr_end - addr_start);
|
||||||
|
FAIL_TEST_IF_FALSE(!ret);
|
||||||
|
ksft_print_msg("sealed: %lx-%lx %s %s\n",
|
||||||
|
addr_start, addr_end, prot, filename);
|
||||||
|
if ((uintptr_t) somestr >= addr_start &&
|
||||||
|
(uintptr_t) somestr < addr_end)
|
||||||
|
ksft_print_msg("mapping for somestr found\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
fclose(maps);
|
||||||
|
|
||||||
|
ret = sys_mprotect((void *)ptr, page_size, PROT_READ | PROT_WRITE);
|
||||||
|
FAIL_TEST_IF_FALSE(ret < 0);
|
||||||
|
ksft_print_msg("somestr is sealed, mprotect is rejected\n");
|
||||||
|
|
||||||
|
TEST_END_CHECK();
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char **argv)
|
||||||
|
{
|
||||||
|
bool test_seal = seal_support();
|
||||||
|
|
||||||
|
ksft_print_header();
|
||||||
|
ksft_print_msg("pid=%d\n", getpid());
|
||||||
|
|
||||||
|
if (!test_seal)
|
||||||
|
ksft_exit_skip("sealing not supported, check CONFIG_64BIT\n");
|
||||||
|
|
||||||
|
ksft_set_plan(1);
|
||||||
|
|
||||||
|
test_seal_elf();
|
||||||
|
|
||||||
|
ksft_finished();
|
||||||
|
}
|