linux/fs/proc
Rik van Riel d2cd9ede6e mm,fork: introduce MADV_WIPEONFORK
Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
in the child process after fork.  This differs from MADV_DONTFORK in one
important way.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes.  The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

MADV_WIPEONFORK only works on private, anonymous VMAs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
 - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
   check, which is too slow without a PID cache)
 - PKCS#11 API reinitialization check (mandated by specification)
 - glibc's upcoming PRNG (reseed after fork)
 - OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-inialized PRNG in
every child process are pretty obvious.  However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

[akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCártaigh <colm@allcosts.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
..
array.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
base.c mm: add /proc/pid/smaps_rollup 2017-09-06 17:27:30 -07:00
cmdline.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
consoles.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
cpuinfo.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
devices.c block: order /proc/devices by major number 2017-07-17 15:42:20 +02:00
fd.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
fd.h proc: unsigned file descriptors 2016-09-27 18:47:38 -04:00
generic.c fs/proc/generic.c: switch to ida_simple_get/remove 2017-07-10 16:32:34 -07:00
inode.c fs/proc/inode.c: remove cast from memory allocation 2017-05-08 17:15:10 -07:00
internal.h mm: add /proc/pid/smaps_rollup 2017-09-06 17:27:30 -07:00
interrupts.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
Kconfig fs, proc: add help for CONFIG_PROC_CHILDREN 2015-07-17 16:39:52 -07:00
kcore.c fs/proc: kcore: use kcore_list type to check for vmalloc/module address 2017-06-20 12:42:57 +01:00
kmsg.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
loadavg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h> 2017-03-02 08:42:34 +01:00
Makefile fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2 2016-08-03 12:45:23 -04:00
meminfo.c mm: rename global_page_state to global_zone_page_state 2017-09-06 17:27:29 -07:00
namespaces.c pidns: expose task pid_ns_for_children to userspace 2017-05-08 17:15:12 -07:00
nommu.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
page.c mm: fix KPF_SWAPCACHE in /proc/kpageflags 2017-02-07 12:08:32 -08:00
proc_net.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
proc_sysctl.c Merge branch 'akpm' (patches from Andrew) 2017-07-13 12:38:49 -07:00
proc_tty.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
root.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
self.c vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
softirqs.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
stat.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
task_mmu.c mm,fork: introduce MADV_WIPEONFORK 2017-09-06 17:27:30 -07:00
task_nommu.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
thread_self.c vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
uptime.c sched/cputime: Convert kcpustat to nsecs 2017-02-01 09:13:47 +01:00
version.c fs/proc: don't use module_init for non-modular core code 2014-01-23 16:37:02 -08:00
vmcore.c userfaultfd: non-cooperative: add event for memory unmaps 2017-02-24 17:46:55 -08:00