linux/fs
Jérôme Glisse 5042db43cc mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory.  Reasons for HMM and
migration to device memory is explained with HMM core patch.

This patch deals with device memory that is un-addressable memory (ie CPU
can not access it).  Hence we do not want those struct page to be manage
like regular memory.  That is why we extend ZONE_DEVICE to support
different types of memory.

A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type.  There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type.  All specific code
path are protect with test against the memory type.

Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).

The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.

The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU).  This callback is responsible to migrate the page back to system
main memory.  Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.

If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.

[arnd@arndb.de: fix warning]
  Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:46 -07:00
..
9p Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
adfs
affs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
afs Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
autofs4 Fix up over-eager 'wait_queue_t' renaming 2017-07-10 11:40:19 -07:00
befs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
bfs bfs: fix sanity checks for empty files 2017-07-12 16:26:00 -07:00
btrfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
cachefiles sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
ceph Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
cifs enable xattr support for smb3 and also a bugfix 2017-09-07 16:06:14 -07:00
coda fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
configfs configfs: Introduce config_item_get_unless_zero() 2017-06-12 13:20:20 +02:00
cramfs
crypto block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
debugfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
devpts pty: Repair TIOCGPTPEER 2017-08-24 13:23:03 -07:00
dlm File locking related changes for v4.14 2017-09-06 13:43:26 -07:00
ecryptfs ecryptfs: convert to file_write_and_wait in ->fsync 2017-08-03 14:20:22 -04:00
efivarfs VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
efs
exofs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
exportfs
ext2 dax: use common 4k zero page for dax mmap reads 2017-09-06 17:27:24 -07:00
ext4 Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
f2fs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
fat
freevxfs
fscache mm: remove nr_pages argument from pagevec_lookup{,_range}() 2017-09-06 17:27:27 -07:00
fuse Writeback error handling fixes for v4.14 2017-09-06 14:11:03 -07:00
gfs2 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
hfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hfsplus Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
hostfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hpfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hugetlbfs mm: remove nr_pages argument from pagevec_lookup{,_range}() 2017-09-06 17:27:27 -07:00
isofs isofs: Delete an unnecessary variable initialisation in isofs_read_inode() 2017-08-23 18:54:03 +02:00
jbd2 Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
jffs2 fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
jfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
kernfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
lockd sunrpc: mark all struct svc_version instances as const 2017-07-13 15:58:03 -04:00
minix Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
ncpfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
nfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nfs_common
nfsd annotate RWF_... flags 2017-08-31 17:32:38 -04:00
nilfs2 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nls
notify fsnotify: make dnotify_fsnotify_ops const 2017-08-30 16:02:48 +02:00
ntfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
ocfs2 Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
omfs omfs: Implement show_options 2017-07-06 03:31:46 -04:00
openpromfs
orangefs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
overlayfs overlayfs, locking: Remove smp_mb__before_spinlock() usage 2017-08-10 12:29:02 +02:00
proc mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory 2017-09-08 18:26:46 -07:00
pstore Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps" 2017-08-17 16:29:19 -07:00
qnx4
qnx6
quota Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
ramfs mm: make pagevec_lookup() update index 2017-09-06 17:27:26 -07:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 14:53:17 -07:00
romfs
squashfs
sysfs sysfs: be careful of error returns from ops->show() 2017-04-08 17:33:32 +02:00
sysv mm: drop "wait" parameter from write_one_page() 2017-07-05 18:44:22 -04:00
tracefs VFS: Don't use save/replace_mount_options if not using generic_show_options 2017-07-06 03:31:46 -04:00
ubifs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
udf fs-udf: Delete an error message for a failed memory allocation in two functions 2017-08-16 16:43:23 +02:00
ufs Writeback error handling fixes (pile #1) 2017-07-07 18:39:15 -07:00
xfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
aio.c fs: aio: fix the increment of aio-nr and counting against aio-max-nr 2017-09-07 12:28:28 -04:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c This series has the ultimate goal of providing a sane stack rlimit when 2017-09-07 20:35:29 -07:00
binfmt_elf_fdpic.c binfmt: Introduce secureexec flag 2017-08-01 12:03:05 -07:00
binfmt_em86.c
binfmt_flat.c exec: Rename bprm->cred_prepared to called_set_creds 2017-08-01 12:02:48 -07:00
binfmt_misc.c fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
binfmt_script.c
block_dev.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
buffer.c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
char_dev.c char_dev: order /proc/devices by major number 2017-07-17 15:28:50 +02:00
compat.c fs/compat.c: trim unused includes 2017-04-17 12:52:27 -04:00
compat_binfmt_elf.c
compat_ioctl.c media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers 2017-09-05 08:25:07 -04:00
coredump.c
dax.c dax: initialize variable pfn before using it 2017-09-06 17:27:24 -07:00
dcache.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
dcookies.c
direct-io.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
drop_caches.c
eventfd.c There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
eventpoll.c epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() 2017-09-01 13:07:35 -07:00
exec.c exec: Consolidate pdeath_signal clearing 2017-08-01 12:03:14 -07:00
fcntl.c vfs: fix flock compat thinko 2017-07-07 13:48:18 -07:00
fhandle.c fhandle: move compat syscalls from compat.c 2017-04-17 12:52:26 -04:00
file.c fs/file.c: replace alloc_fdmem() with kvmalloc() alternative 2017-07-06 16:24:30 -07:00
file_table.c fs: new infrastructure for writeback error handling and reporting 2017-07-06 07:02:25 -04:00
filesystems.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
fs-writeback.c writeback: rework wb_[dec|inc]_stat family of functions 2017-07-12 16:26:05 -07:00
fs_pin.c sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
fs_struct.c
inode.c xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
internal.h xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
ioctl.c
iomap.c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
Kconfig fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more 2017-07-12 16:26:00 -07:00
Kconfig.binfmt
libfs.c fs: convert __generic_file_fsync to use errseq_t based reporting 2017-07-06 07:02:29 -04:00
locks.c locks: restore a warn for leaked locks on close 2017-07-21 13:57:31 -04:00
Makefile
mbcache.c ext4: xattr inode deduplication 2017-06-22 11:44:55 -04:00
mount.h Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
mpage.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
namei.c Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
namespace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
no-block.c
nsfs.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
open.c Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
pipe.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-05-23 08:41:17 -05:00
pnode.h
posix_acl.c
proc_namespace.c
read_write.c annotate RWF_... flags 2017-08-31 17:32:38 -04:00
readdir.c readdir: move compat syscalls from compat.c 2017-04-17 12:52:24 -04:00
select.c fs/select: Fix memory corruption in compat_get_fd_set() 2017-08-28 16:09:19 -07:00
seq_file.c mm: introduce kv[mz]alloc helpers 2017-05-08 17:15:12 -07:00
signalfd.c sched/wait: Rename wait_queue_t => wait_queue_entry_t 2017-06-20 12:18:27 +02:00
splice.c fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
stack.c
stat.c fs: Provide __inode_get_bytes() 2017-08-17 22:06:03 +02:00
statfs.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
super.c quota: Convert dqio_mutex to rwsem 2017-08-17 18:52:48 +02:00
sync.c Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
timerfd.c timerfd: Use get_itimerspec64() and put_itimerspec64() 2017-06-30 04:14:38 -04:00
userfaultfd.c userfaultfd: provide pid in userfault msg - add feat union 2017-09-06 17:27:29 -07:00
utimes.c utimes: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
xattr.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00