linux/fs
Collin Fijalkovich eb6ecbed0a mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs
Transparent huge pages are supported for read-only non-shmem files, but
are only used for vmas with VM_DENYWRITE.  This condition ensures that
file THPs are protected from writes while an application is running
(ETXTBSY).  Any existing file THPs are then dropped from the page cache
when a file is opened for write in do_dentry_open().  Since sys_mmap
ignores MAP_DENYWRITE, this constrains the use of file THPs to vmas
produced by execve().

Systems that make heavy use of shared libraries (e.g.  Android) are unable
to apply VM_DENYWRITE through the dynamic linker, preventing them from
benefiting from the resultant reduced contention on the TLB.

This patch reduces the constraint on file THPs allowing use with any
executable mapping from a file not opened for write (see
inode_is_open_for_write()).  It also introduces additional conditions to
ensure that files opened for write will never be backed by file THPs.

Restricting the use of THPs to executable mappings eliminates the risk
that a read-only file later opened for write would encounter significant
latencies due to page cache truncation.

The ld linker flag '-z max-page-size=(hugepage size)' can be used to
produce executables with the necessary layout.  The dynamic linker must
map these file's segments at a hugepage size aligned vma for the mapping
to be backed with THPs.

Comparison of the performance characteristics of 4KB and 2MB-backed
libraries follows; the Android dex2oat tool was used to AOT compile an
example application on a single ARM core.

4KB Pages:
==========

count              event_name            # count / runtime
598,995,035,942    cpu-cycles            # 1.800861 GHz
 81,195,620,851    raw-stall-frontend    # 244.112 M/sec
347,754,466,597    iTLB-loads            # 1.046 G/sec
  2,970,248,900    iTLB-load-misses      # 0.854122% miss rate

Total test time: 332.854998 seconds.

2MB Pages:
==========

count              event_name            # count / runtime
592,872,663,047    cpu-cycles            # 1.800358 GHz
 76,485,624,143    raw-stall-frontend    # 232.261 M/sec
350,478,413,710    iTLB-loads            # 1.064 G/sec
    803,233,322    iTLB-load-misses      # 0.229182% miss rate

Total test time: 329.826087 seconds

A check of /proc/$(pidof dex2oat64)/smaps shows THPs in use:

/apex/com.android.art/lib64/libart.so
FilePmdMapped:      4096 kB

/apex/com.android.art/lib64/libart-compiler.so
FilePmdMapped:      2048 kB

Link: https://lkml.kernel.org/r/20210406000930.3455850-1-cfijalkovich@google.com
Signed-off-by: Collin Fijalkovich <cfijalkovich@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Song Liu <song@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:29 -07:00
..
9p 9p for 5.13-rc1 2021-05-07 11:18:52 -07:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
afs netfslib fixes 2021-06-25 09:41:29 -07:00
autofs autofs: should_expire() argument is guaranteed to be positive 2021-03-24 14:14:27 -04:00
befs fs/befs: Delete obsolete TODO file 2021-03-30 16:54:49 -07:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs for-5.13-rc6-tag 2021-06-18 16:39:03 -07:00
cachefiles fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
ceph ceph: fix error handling in ceph_atomic_open and ceph_lookup 2021-06-22 13:53:28 +02:00
cifs cifs: change format of CIFS_FULL_KEY_DUMP ioctl 2021-05-27 15:26:32 -05:00
coda coda: fix reference counting in coda_file_mmap error path 2021-04-23 14:42:39 -07:00
configfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
cramfs
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-04-26 08:51:23 -07:00
debugfs debugfs: Fix debugfs_read_file_str() 2021-06-04 15:01:08 +02:00
devpts
dlm fs: dlm: fix missing unlock on error in accept_from_sock() 2021-03-29 13:28:18 -05:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs
erofs erofs: fix 1 lcluster-sized pcluster for big pcluster 2021-05-13 15:58:46 +08:00
exfat mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
exportfs
ext2 fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
ext4 fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
f2fs f2fs: return EINVAL for hole cases in swap file 2021-05-12 07:38:00 -07:00
fat mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
freevxfs
fscache fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
fuse mm: move page dirtying prototypes from mm.h 2021-06-29 10:53:48 -07:00
gfs2 iomap: use __set_page_dirty_nobuffers 2021-06-29 10:53:48 -07:00
hfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
hfsplus mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
hostfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
hpfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
hugetlbfs mm/hugetlb: expand restore_reserve_on_error functionality 2021-06-16 09:24:42 -07:00
iomap iomap: use __set_page_dirty_nobuffers 2021-06-29 10:53:48 -07:00
isofs isofs: fix fall-through warnings for Clang 2021-05-06 19:24:13 -07:00
jbd2 ext4: fix debug format string warning 2021-04-09 23:32:16 -04:00
jffs2 This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
jfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
kernfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
lockd
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: fix test for whether we can skip read when writing beyond EOF 2021-06-21 21:24:07 +01:00
nfs NFSv4: Fix second deadlock in nfs4_evict_inode() 2021-06-03 10:14:42 -04:00
nfs_common NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs 2021-03-22 10:19:00 -04:00
nfsd NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
nilfs2 mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
nls
notify fanotify: fix copy_event_to_user() fid error clean up 2021-06-14 12:16:37 +02:00
ntfs ntfs: fix validity check for file name attribute 2021-06-29 10:53:45 -07:00
ocfs2 mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs openpromfs: don't do unlock_new_inode() until the new inode is set up 2021-03-12 22:15:22 -05:00
orangefs orangefs: leave files in the page cache for a few micro seconds at least 2021-04-29 08:06:05 -04:00
overlayfs overlayfs update for 5.13 2021-04-30 15:17:08 -07:00
proc fs/proc/kcore: use page_offline_(freeze|thaw) 2021-06-30 20:47:28 -07:00
pstore printk changes for 5.13 2021-04-27 18:09:44 -07:00
qnx4
qnx6
quota quota: Use 'hlist_for_each_entry' to simplify code 2021-05-10 16:27:49 +02:00
ramfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
reiserfs treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
romfs
squashfs squashfs: add option to panic on errors 2021-06-29 10:53:46 -07:00
sysfs
sysv mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
tracefs tracing: Fix various typos in comments 2021-03-23 14:08:18 -04:00
ubifs This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
udf mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
ufs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxsf: don't allow to change the inode type 2021-03-12 22:15:00 -05:00
verity fsverity: relax build time dependency on CRYPTO_SHA256 2021-04-22 17:31:32 +10:00
xfs fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
zonefs mm: move page dirtying prototypes from mm.h 2021-06-29 10:53:48 -07:00
aio.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_elf.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_elf_fdpic.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_em86.c
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-13 11:27:30 -08:00
binfmt_script.c
block_dev.c mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
buffer.c mm/writeback: move __set_page_dirty() to core mm 2021-06-29 10:53:48 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Limit what can interrupt coredumps 2021-06-10 14:02:29 -07:00
d_path.c constify dentry argument of dentry_path()/dentry_path_raw() 2021-03-21 11:43:58 -04:00
dax.c dax: fix ENOMEM handling in grab_mapping_entry() 2021-06-29 10:53:47 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-09 14:54:23 -07:00
drop_caches.c
eventfd.c
eventpoll.c fs/epoll: restore waking from ep_done_scan() 2021-05-06 19:24:13 -07:00
exec.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
fcntl.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
fhandle.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
file.c Merge branch 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-03 11:05:28 -07:00
file_table.c
filesystems.c
fs-writeback.c writeback, cgroup: release dying cgwbs by switching attached inodes 2021-06-29 10:53:48 -07:00
fs_context.c
fs_parser.c vfs: fs_parser: clean up kernel-doc warnings 2021-04-30 11:20:35 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c mm: remove nrexceptional from inode: remove BUG_ON 2021-05-05 11:27:20 -07:00
internal.h idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
io-wq.c io-wq: Fix UAF when wakeup wqe in hash waitqueue 2021-05-26 09:03:56 -06:00
io-wq.h io_uring/io-wq: close io-wq full-stop gap 2021-05-25 19:39:58 -06:00
io_uring.c io_uring: add feature flag for rsrc tags 2021-06-10 16:33:51 -06:00
ioctl.c vfs: add fileattr ops 2021-04-12 15:04:23 +02:00
Kconfig mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
Kconfig.binfmt binfmt_flat: allow not offsetting data start 2021-04-19 09:56:37 +10:00
kernel_read_file.c
libfs.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
locks.c Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, 2021-05-05 13:44:19 -07:00
Makefile netfs: Provide readahead and readpage netfs helpers 2021-04-23 10:14:32 +01:00
mbcache.c
mount.h
mpage.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
namei.c fs.idmapped.helpers.v5.13 2021-04-27 12:49:42 -07:00
namespace.c fs/mount_setattr: tighten permission checks 2021-05-12 14:13:16 +02:00
no-block.c
nsfs.c
open.c mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs 2021-06-30 20:47:29 -07:00
pipe.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-08 15:18:43 +01:00
posix_acl.c
proc_namespace.c
read_write.c
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c
select.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
seq_file.c seq_file: Add a seq_bprintf function 2021-04-27 15:50:15 -07:00
signalfd.c signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo 2021-05-18 16:20:54 -05:00
splice.c for-5.12/block-2021-02-17 2021-02-21 11:02:48 -08:00
stack.c
stat.c fs: fix reporting supported extra file attributes for statx() 2021-04-17 23:03:50 -04:00
statfs.c s390,alpha: switch to 64-bit ino_t 2021-02-13 17:17:53 +01:00
super.c fs,security: Add sb_delete hook 2021-04-22 12:22:11 -07:00
sync.c
timerfd.c
userfaultfd.c userfaultfd/shmem: advertise shmem minor fault support 2021-06-30 20:47:27 -07:00
utimes.c
xattr.c xattr: fix kernel-doc for mnt_userns and vfs xattr helpers 2021-03-23 11:20:26 +01:00