Commit graph

206 commits

Author SHA1 Message Date
Linus Torvalds 642e7fd233 Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
 "System calls are interaction points between userspace and the kernel.
  Therefore, system call functions such as sys_xyzzy() or
  compat_sys_xyzzy() should only be called from userspace via the
  syscall table, but not from elsewhere in the kernel.

  At least on 64-bit x86, it will likely be a hard requirement from
  v4.17 onwards to not call system call functions in the kernel: It is
  better to use use a different calling convention for system calls
  there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
  which then hands processing over to the actual syscall function. This
  means that only those parameters which are actually needed for a
  specific syscall are passed on during syscall entry, instead of
  filling in six CPU registers with random user space content all the
  time (which may cause serious trouble down the call chain). Those
  x86-specific patches will be pushed through the x86 tree in the near
  future.

  Moreover, rules on how data may be accessed may differ between kernel
  data and user data. This is another reason why calling sys_xyzzy() is
  generally a bad idea, and -- at most -- acceptable in arch-specific
  code.

  This patchset removes all in-kernel calls to syscall functions in the
  kernel with the exception of arch/. On top of this, it cleans up the
  three places where many syscalls are referenced or prototyped, namely
  kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"

* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
  bpf: whitelist all syscalls for error injection
  kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
  kernel/sys_ni: sort cond_syscall() entries
  syscalls/x86: auto-create compat_sys_*() prototypes
  syscalls: sort syscall prototypes in include/linux/compat.h
  net: remove compat_sys_*() prototypes from net/compat.h
  syscalls: sort syscall prototypes in include/linux/syscalls.h
  kexec: move sys_kexec_load() prototype to syscalls.h
  x86/sigreturn: use SYSCALL_DEFINE0
  x86: fix sys_sigreturn() return type to be long, not unsigned long
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
  fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
  fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
  fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
  kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
  kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
  ...
2018-04-02 21:22:12 -07:00
Dominik Brodowski c84d0791df ipc: add shmctl syscall/compat_syscall wrappers
Provide ksys_shmctl() and compat_ksys_shmctl() wrappers to avoid in-kernel
calls to these syscalls. The ksys_ prefix denotes that these functions are
meant as a drop-in replacement for the syscalls. In particular, they use
the same calling convention as sys_shmctl() and compat_sys_shmctl().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:25 +02:00
Dominik Brodowski da1e274434 ipc: add shmdt syscall wrapper
Provide ksys_shmdt() wrapper to avoid in-kernel calls to this syscall.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_shmdt().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:25 +02:00
Dominik Brodowski 65749e0bb5 ipc: add shmget syscall wrapper
Provide ksys_shmget() wrapper to avoid in-kernel calls to this syscall.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_shmget().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:24 +02:00
Mike Kravetz 3d942ee079 ipc/shm.c: add split function to shm_vm_ops
If System V shmget/shmat operations are used to create a hugetlbfs
backed mapping, it is possible to munmap part of the mapping and split
the underlying vma such that it is not huge page aligned.  This will
untimately result in the following BUG:

  kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
  CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E 4.15.0-10-generic #11-Ubuntu
  NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
  REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E (4.15.0-10-generic)
  MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER: 20040000
  CFAR: c00000000036ee44 SOFTE: 1
  NIP __unmap_hugepage_range+0xa4/0x760
  LR __unmap_hugepage_range_final+0x28/0x50
  Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
  ---[ end trace ee88f958a1c62605 ]---

This bug was introduced by commit 31383c6865 ("mm, hugetlbfs:
introduce ->split() to vm_operations_struct").  A split function was
added to vm_operations_struct to determine if a mapping can be split.
This was mostly for device-dax and hugetlbfs mappings which have
specific alignment constraints.

Mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back to the
original vm_ops if needed.  Add such a split function to shm_vm_ops.

Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
Fixes: 31383c6865 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-28 13:42:05 -10:00
Philippe Mikoyan 87ad4b0d85 ipc: fix ipc data structures inconsistency
As described in the title, this patch fixes <ipc>id_ds inconsistency when
<ipc>ctl_stat executes concurrently with some ds-changing function, e.g.
shmat, msgsnd or whatever.

For instance, if shmctl(IPC_STAT) is running concurrently
with shmat, following data structure can be returned:
{... shm_lpid = 0, shm_nattch = 1, ...}

Link: http://lkml.kernel.org/r/20171202153456.6514-1-philippe.mikoyan@skat.systems
Signed-off-by: Philippe Mikoyan <philippe.mikoyan@skat.systems>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:46 -08:00
Linus Torvalds fa7f578076 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a bit more MM

 - procfs updates

 - dynamic-debug fixes

 - lib/ updates

 - checkpatch

 - epoll

 - nilfs2

 - signals

 - rapidio

 - PID management cleanup and optimization

 - kcov updates

 - sysvipc updates

 - quite a few misc things all over the place

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  EXPERT Kconfig menu: fix broken EXPERT menu
  include/asm-generic/topology.h: remove unused parent_node() macro
  arch/tile/include/asm/topology.h: remove unused parent_node() macro
  arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
  arch/sh/include/asm/topology.h: remove unused parent_node() macro
  arch/ia64/include/asm/topology.h: remove unused parent_node() macro
  drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
  mm: add infrastructure for get_user_pages_fast() benchmarking
  sysvipc: make get_maxid O(1) again
  sysvipc: properly name ipc_addid() limit parameter
  sysvipc: duplicate lock comments wrt ipc_addid()
  sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
  initramfs: use time64_t timestamps
  drivers/watchdog: make use of devm_register_reboot_notifier()
  kernel/reboot.c: add devm_register_reboot_notifier()
  kcov: update documentation
  Makefile: support flag -fsanitizer-coverage=trace-cmp
  kcov: support comparison operands collection
  kcov: remove pointless current != NULL check
  kernel/panic.c: add TAINT_AUX
  ...
2017-11-17 16:56:17 -08:00
Davidlohr Bueso 39c96a1b96 sysvipc: duplicate lock comments wrt ipc_addid()
The comment in msgqueues when using ipc_addid() is quite useful imo.
Duplicate it for shm and semaphores.

Link: http://lkml.kernel.org/r/20170831172049.14576-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:04 -08:00
Linus Torvalds ca5b857cb0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff, really no common topic here"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: grab the lock instead of blocking in __fd_install during resizing
  vfs: stop clearing close on exec when closing a fd
  include/linux/fs.h: fix comment about struct address_space
  fs: make fiemap work from compat_ioctl
  coda: fix 'kernel memory exposure attempt' in fsync
  pstore: remove unneeded unlikely()
  vfs: remove unneeded unlikely()
  stubs for mount_bdev() and kill_block_super() in !CONFIG_BLOCK case
  make vfs_ustat() static
  do_handle_open() should be static
  elf_fdpic: fix unused variable warning
  fold destroy_super() into __put_super()
  new helper: destroy_unused_super()
  fix address space warnings in ipc/
  acct.h: get rid of detritus
2017-11-17 12:54:01 -08:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Linus Torvalds 6aa211e8ce fix address space warnings in ipc/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-11 13:41:41 -04:00
Al Viro b776e4b1a9 fix a typo in put_compat_shm_info()
"uip" misspelled as "up"; unfortunately, the latter happens to be
a function and gcc is happy to convert it to void *...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-25 20:41:46 -04:00
Will Deacon 58aff0af75 ipc/shm: Fix order of parameters when calling copy_compat_shmid_to_user
Commit 553f770ef7 ("ipc: move compat shmctl to native") moved the
compat IPC syscall handling into ipc/shm.c and refactored the struct
accessors in the process. Unfortunately, the call to
copy_compat_shmid_to_user when handling a compat {IPC,SHM}_STAT command
gets the arguments the wrong way round, passing a kernel stack address
as the user buffer (destination) and the user buffer as the kernel stack
address (source).

This patch fixes the parameter ordering so the buffers are accessed
correctly.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-20 23:27:48 -04:00
Linus Torvalds cc73fee0ba Merge branch 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
 "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
  Dinamani"

* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  utimes: Make utimes y2038 safe
  ipc: shm: Make shmid_kernel timestamps y2038 safe
  ipc: sem: Make sem_array timestamps y2038 safe
  ipc: msg: Make msg_queue timestamps y2038 safe
  ipc: mqueue: Replace timespec with timespec64
  ipc: Make sys_semtimedop() y2038 safe
  get rid of SYSVIPC_COMPAT on ia64
  semtimedop(): move compat to native
  shmat(2): move compat to native
  msgrcv(2), msgsnd(2): move compat to native
  ipc(2): move compat to native
  ipc: make use of compat ipc_perm helpers
  semctl(): move compat to native
  semctl(): separate all layout-dependent copyin/copyout
  msgctl(): move compat to native
  msgctl(): split the actual work from copyin/copyout
  ipc: move compat shmctl to native
  shmctl: split the work from copyin/copyout
2017-09-14 17:37:26 -07:00
Guillaume Knispel 0cfb6aee70 ipc: optimize semget/shmget/msgget for lots of keys
ipc_findkey() used to scan all objects to look for the wanted key.  This
is slow when using a high number of keys.  This change adds an rhashtable
of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

Other (more micro) benchmark results, by the author: On an i5 laptop, the
following loop executed right after a reboot took, without and with this
change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            3.5         4.9   µs            3.5         4.9
     10            7.6         8.6   µs            3.7         4.7
     32           16.2        15.9   µs            4.3         5.3
    100           72.9        41.8   µs            3.7         4.7
   1000        5,630.0       502.0   µs             *           *
  10000    1,340,000.0     7,240.0   µs             *           *
  31900   17,600,000.0    22,200.0   µs             *           *

 *: unreliable measure: high variance

The duration for a lookup-only usage was obtained by the same loop once
the keys are present:

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            2.1         2.5   µs            2.1         2.5
     10            4.5         4.8   µs            2.2         2.3
     32           13.0        10.8   µs            2.3         2.8
    100           82.9        25.1   µs             *          2.3
   1000        5,780.0       217.0   µs             *           *
  10000    1,470,000.0     2,520.0   µs             *           *
  31900   17,400,000.0     7,810.0   µs             *           *

Finally, executing each semget() in a new process gave, when still
summing only the durations of these syscalls:

creation:
                 total       total
   KEYS        without        with

      1            3.7         5.0   µs
     10           32.9        36.7   µs
     32          125.0       109.0   µs
    100          523.0       353.0   µs
   1000       20,300.0     3,280.0   µs
  10000    2,470,000.0    46,700.0   µs
  31900   27,800,000.0   219,000.0   µs

lookup-only:
                 total       total
   KEYS        without        with

      1            2.5         2.7   µs
     10           25.4        24.4   µs
     32          106.0        72.6   µs
    100          591.0       352.0   µs
   1000       22,400.0     2,250.0   µs
  10000    2,510,000.0    25,700.0   µs
  31900   28,200,000.0   115,000.0   µs

[1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Cc: Marc Pardo <marc.pardo@supersonicimagine.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:51 -07:00
Deepa Dinamani 7ff2819e8d ipc: shm: Make shmid_kernel timestamps y2038 safe
time_t is not y2038 safe. Replace all uses of
time_t by y2038 safe time64_t.

Similarly, replace the calls to get_seconds() with
y2038 safe ktime_get_real_seconds().
Note that this preserves fast access on 64 bit systems,
but 32 bit systems need sequence counters.

The syscall interfaces themselves are not changed as part of
the patch. They will be part of a different series.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-03 20:24:29 -04:00
Kees Cook ade9f91b32 ipc: add missing container_of()s for randstruct
When building with the randstruct gcc plugin, the layout of the IPC
structs will be randomized, which requires any sub-structure accesses to
use container_of().  The proc display handlers were missing the needed
container_of()s since the iterator is passing in the top-level struct
kern_ipc_perm.

This would lead to crashes when running the "lsipc" program after the
system had IPC registered (e.g. after starting up Gnome):

  general protection fault: 0000 [#1] PREEMPT SMP
  ...
  RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
  ...
  Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
  ...

Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
Fixes: 3859a271a0 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Al Viro a78ee9ed2f shmat(2): move compat to native
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-15 20:46:47 -04:00
Al Viro 28327fae62 ipc: make use of compat ipc_perm helpers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-15 20:46:45 -04:00
Al Viro 553f770ef7 ipc: move compat shmctl to native
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-15 20:46:42 -04:00
Al Viro 9ba720c186 shmctl: split the work from copyin/copyout
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-15 20:46:41 -04:00
Kees Cook 42e618f77d ipc/shm: remove special shm_alloc/free
There is nothing special about the shm_alloc/free routines any more, so
remove them to make code more readable.

[manfred@colorfullife.com: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()]
Link: http://lkml.kernel.org/r/20170525185107.12869-18-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Kees Cook 3d3653f973 ipc: move atomic_set() to where it is needed
Only after ipc_addid() has succeeded will refcounting be used, so move
initialization into ipc_addid() and remove from open-coded *_alloc()
routines.

Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Manfred Spraul a2642f8770 ipc/shm.c: avoid ipc_rcu_putref for failed ipc_addid()
Loosely based on a patch from Kees Cook <keescook@chromium.org>:
 - id and error can be merged
 - if operations before ipc_addid() fail, then use call_rcu() directly.

The difference is that call_rcu is used for failures after
security_shm_alloc(), to continue to guaranteed an rcu delay for
security_sem_free().

Link: http://lkml.kernel.org/r/20170525185107.12869-15-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Kees Cook 3e0c24042e ipc/shm: avoid ipc_rcu_alloc()
Instead of using ipc_rcu_alloc() which only performs the refcount bump,
open code it.  This also allows for shmid_kernel structure layout to be
randomized in the future.

Link: http://lkml.kernel.org/r/20170525185107.12869-11-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:01 -07:00
Kees Cook 66470b1817 ipc/shm: do not use ipc_rcu_free()
Avoid using ipc_rcu_free, since it just re-finds the original structure
pointer.  For the pre-list-init failure path, there is no RCU needed,
since it was just allocated.  It can be directly freed.

Link: http://lkml.kernel.org/r/20170525185107.12869-7-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:01 -07:00
Manfred Spraul dba4cdd39e ipc: merge ipc_rcu and kern_ipc_perm
ipc has two management structures that exist for every id:
 - struct kern_ipc_perm, it contains e.g. the permissions.
 - struct ipc_rcu, it contains the rcu head for rcu handling and the
   refcount.

The patch merges both structures.

As a bonus, we may save one cacheline, because both structures are
cacheline aligned.  In addition, it reduces the number of casts, instead
most codepaths can use container_of.

To simplify code, the ipc_rcu_alloc initializes the allocation to 0.

[manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
  Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:01 -07:00
Jeff Layton 0f41074a65 fs: remove call_fsync helper function
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-05 18:44:23 -04:00
Davidlohr Bueso f0cb88026f ipc/shm: some shmat cleanups
Clean up early flag and address some minutia.

Link: http://lkml.kernel.org/r/1486673582-6979-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:12 -07:00
Linus Torvalds 94e877d0fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile two from Al Viro:

 - orangefs fix

 - series of fs/namei.c cleanups from me

 - VFS stuff coming from overlayfs tree

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: Use RCU for destroy_inode
  vfs: use helper for calling f_op->fsync()
  mm: use helper for calling f_op->mmap()
  vfs: use helpers for calling f_op->{read,write}_iter()
  vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
  vfs: extract common parts of {compat_,}do_readv_writev()
  vfs: wrap write f_ops with file_{start,end}_write()
  vfs: deny copy_file_range() for non regular files
  vfs: deny fallocate() on directory
  vfs: create vfs helper vfs_tmpfile()
  namei.c: split unlazy_walk()
  namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
  lookup_fast(): clean up the logics around the fallback to non-rcu mode
  namei: fold unlazy_link() into its sole caller
2017-03-02 15:20:00 -08:00
Davidlohr Bueso 95e91b831f ipc/shm: Fix shmat mmap nil-page protection
The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Mike Rapoport 897ab3e0c4 userfaultfd: non-cooperative: add event for memory unmaps
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
  Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
  Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Miklos Szeredi 0eb8af4916 vfs: use helper for calling f_op->fsync()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Miklos Szeredi f74ac01520 mm: use helper for calling f_op->mmap()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Shailesh Pandey 63980c80e1 ipc/shm.c: coding style fixes
This patch fixes below warnings:

  WARNING: Missing a blank line after declarations
  WARNING: Block comments use a trailing */ on a separate line
  ERROR: spaces required around that '=' (ctx:WxV)

Above warnings were reported by checkpatch.pl

Link: http://lkml.kernel.org/r/1478604980-18062-1-git-send-email-p.shailesh@samsung.com
Signed-off-by: Shailesh Pandey <p.shailesh@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Kirill A. Shutemov 4595ef88d1 shmem: make shmem_inode_info::lock irq-safe
We are going to need to call shmem_charge() under tree_lock to get
accoutning right on collapse of small tmpfs pages into a huge one.

The problem is that tree_lock is irq-safe and lockdep is not happy, that
we take irq-unsafe lock under irq-safe[1].

Let's convert the lock to irq-safe.

[1] https://gist.github.com/kiryl/80c0149e03ed35dfaf26628b8e03cdbc

Link: http://lkml.kernel.org/r/1466021202-61880-34-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Hugh Dickins c01d5b3007 shmem: get_unmapped_area align huge page
Provide a shmem_get_unmapped_area method in file_operations, called at
mmap time to decide the mapping address.  It could be conditional on
CONFIG_TRANSPARENT_HUGEPAGE, but save #ifdefs in other places by making
it unconditional.

shmem_get_unmapped_area() first calls the usual mm->get_unmapped_area
(which we treat as a black box, highly dependent on architecture and
config and executable layout).  Lots of conditions, and in most cases it
just goes with the address that chose; but when our huge stars are
rightly aligned, yet that did not provide a suitable address, go back to
ask for a larger arena, within which to align the mapping suitably.

There have to be some direct calls to shmem_get_unmapped_area(), not via
the file_operations: because of the way shmem_zero_setup() is called to
create a shmem object late in the mmap sequence, when MAP_SHARED is
requested with MAP_ANONYMOUS or /dev/zero.  Though this only matters
when /proc/sys/vm/shmem_huge has been set.

Link: http://lkml.kernel.org/r/1466021202-61880-29-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Michal Hocko 91f4f94ea3 ipc, shm: make shmem attach/detach wait for mmap_sem killable
shmat and shmdt rely on mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Kirill A. Shutemov 1ac0b6dec6 ipc/shm: handle removed segments gracefully in shm_mmap()
remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped.  It breaks expectations of
IPC subsystem.

Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):

	#define _GNU_SOURCE
	#include <stdlib.h>
	#include <sys/ipc.h>
	#include <sys/mman.h>
	#include <sys/shm.h>

	#define PAGE_SIZE 4096

	int main()
	{
		int id;
		void *p;

		id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
		p = shmat(id, NULL, 0);
		shmctl(id, IPC_RMID, NULL);
		remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);

	        return 0;
	}

The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().

[1] http://github.com/google/syzkaller

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Yaowei Bai 2954e440be ipc/shm.c: is_file_shm_hugepages() can be boolean
Make is_file_shm_hugepages() return bool to improve readability due to
this particular function only using either one or zero as its return
value.

No functional change.

Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Linus Torvalds b9a5322779 Initialize msg/shm IPC objects before doing ipc_addid()
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state.  Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f03:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-30 12:48:40 -04:00
Davidlohr Bueso d0edd85283 ipc: convert invalid scenarios to use WARN_ON
Considering Linus' past rants about the (ab)use of BUG in the kernel, I
took a look at how we deal with such calls in ipc.  Given that any errors
or corruption in ipc code are most likely contained within the set of
processes participating in the broken mechanisms, there aren't really many
strong fatal system failure scenarios that would require a BUG call.
Also, if something is seriously wrong, ipc might not be the place for such
a BUG either.

1. For example, recently, a customer hit one of these BUG_ONs in shm
   after failing shm_lock().  A busted ID imho does not merit a BUG_ON,
   and WARN would have been better.

2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
   I don't see how we can hit this anyway -- at least it should be IS_ERR.
    The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
   first and foremost.  We could also probably drop this check altogether.
    Either way, it does not merit a BUG_ON.

3. No ->fault() callback for the fs getting the corresponding page --
   seems selfish to make the system unusable.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Stephen Smalley e1832f2923 ipc: use private shmem or hugetlbfs inodes for shm segments.
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> #3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> #2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Morten Stevens <mstevens@fedoraproject.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Davidlohr Bueso 55b7ae5016 ipc: rename ipc_obtain_object
...  to ipc_obtain_object_idr, which is more meaningful and makes the code
slightly easier to follow.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:58 -07:00
Davidlohr Bueso c5c8975b2e ipc,shm: move BUG_ON check into shm_lock
Upon every shm_lock call, we BUG_ON if an error was returned, indicating
racing either in idr or in shm_destroy.  Move this logic into the locking.

[akpm@linux-foundation.org: simplify code]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:58 -07:00
Linus Torvalds 9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Joe Perches 7f032d6ef6 ipc: remove use of seq_printf return value
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 16:35:24 -07:00
David Howells 75c3cfa855 VFS: assorted weird filesystems: d_inode() annotations
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:58 -04:00
Dave Hansen 07a46ed27d shmdt: use i_size_read() instead of ->i_size
Andrew Morton noted

	http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org

that the shmdt uses inode->i_size outside of i_mutex being held.
There is one more case in shm.c in shm_destroy().  This converts
both users over to use i_size_read().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00