linux/fs
Roman Gushchin 4ade5867b4 writeback, cgroup: do not switch inodes with I_WILL_FREE flag
Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9.

When an inode is getting dirty for the first time it's associated with a
wb structure (see __inode_attach_wb()).  It can later be switched to
another wb (if e.g.  some other cgroup is writing a lot of data to the
same inode), but otherwise stays attached to the original wb until being
reclaimed.

The problem is that the wb structure holds a reference to the original
memory and blkcg cgroups.  So if an inode has been dirty once and later is
actively used in read-only mode, it has a good chance to pin down the
original memory and blkcg cgroups forever.  This is often the case with
services bringing data for other services, e.g.  updating some rpm
packages.

In the real life it becomes a problem due to a large size of the memcg
structure, which can easily be 1000x larger than an inode.  Also a really
large number of dying cgroups can raise different scalability issues, e.g.
making the memory reclaim costly and less effective.

To solve the problem inodes should be eventually detached from the
corresponding writeback structure.  It's inefficient to do it after every
writeback completion.  Instead it can be done whenever the original memory
cgroup is offlined and writeback structure is getting killed.  Scanning
over a (potentially long) list of inodes and detach them from the
writeback structure can take quite some time.  To avoid scanning all
inodes, attached inodes are kept on a new list (b_attached).  To make it
less noticeable to a user, the scanning and switching is performed from a
work context.

Big thanks to Jan Kara, Dennis Zhou, Hillf Danton and Tejun Heo for their
ideas and contribution to this patchset.

This patch (of 8):

If an inode's state has I_WILL_FREE flag set, the inode will be freed
soon, so there is no point in trying to switch the inode to a different
cgwb.

I_WILL_FREE was ignored since the introduction of the inode switching, so
it looks like it doesn't lead to any noticeable issues for a user.  This
is why the patch is not intended for a stable backport.

Link: https://lkml.kernel.org/r/20210608230225.2078447-1-guro@fb.com
Link: https://lkml.kernel.org/r/20210608230225.2078447-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:47 -07:00
..
9p 9p for 5.13-rc1 2021-05-07 11:18:52 -07:00
adfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
affs idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
afs netfslib fixes 2021-06-25 09:41:29 -07:00
autofs autofs: should_expire() argument is guaranteed to be positive 2021-03-24 14:14:27 -04:00
befs fs/befs: Delete obsolete TODO file 2021-03-30 16:54:49 -07:00
bfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
btrfs for-5.13-rc6-tag 2021-06-18 16:39:03 -07:00
cachefiles fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
ceph ceph: fix error handling in ceph_atomic_open and ceph_lookup 2021-06-22 13:53:28 +02:00
cifs cifs: change format of CIFS_FULL_KEY_DUMP ioctl 2021-05-27 15:26:32 -05:00
coda coda: fix reference counting in coda_file_mmap error path 2021-04-23 14:42:39 -07:00
configfs treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
cramfs cramfs: use %pD instead of messing with file_dentry()->d_name 2021-01-05 23:02:47 -05:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-04-26 08:51:23 -07:00
debugfs debugfs: Fix debugfs_read_file_str() 2021-06-04 15:01:08 +02:00
devpts
dlm fs: dlm: fix missing unlock on error in accept_from_sock() 2021-03-29 13:28:18 -05:00
ecryptfs fs: ecryptfs: remove BUG_ON from crypt_scatterlist 2021-05-13 18:32:26 +02:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs
erofs erofs: fix 1 lcluster-sized pcluster for big pcluster 2021-05-13 15:58:46 +08:00
exfat exfat: speed up iterate/lookup by fixing start point of traversing cluster chain 2021-04-27 20:45:07 +09:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
ext4 Miscellaneous ext4 bug fixes for v5.13 2021-06-06 14:24:13 -07:00
f2fs f2fs: return EINVAL for hole cases in swap file 2021-05-12 07:38:00 -07:00
fat fs: fat: fix spelling typo of values 2021-05-07 00:26:34 -07:00
freevxfs
fscache fscache, cachefiles: Add alternate API to use kiocb for read/write to cache 2021-04-23 10:14:32 +01:00
fuse Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
gfs2 Revert "gfs2: Fix mmap locking for write faults" 2021-06-01 23:16:42 +02:00
hfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
hfsplus hfsplus: prevent corruption in shrinking truncate 2021-05-14 19:41:32 -07:00
hostfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-02 09:14:01 -07:00
hpfs hpfs: replace one-element array with flexible-array member 2021-05-06 19:24:13 -07:00
hugetlbfs mm/hugetlb: expand restore_reserve_on_error functionality 2021-06-16 09:24:42 -07:00
iomap mm/filemap: fix readahead return types 2021-05-14 19:41:32 -07:00
isofs isofs: fix fall-through warnings for Clang 2021-05-06 19:24:13 -07:00
jbd2 ext4: fix debug format string warning 2021-04-09 23:32:16 -04:00
jffs2 This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
jfs jfs: convert to fileattr 2021-04-12 15:04:29 +02:00
kernfs idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
lockd SUNRPC: Make trace_svc_process() display the RPC procedure symbolically 2021-01-25 09:36:23 -05:00
minix fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
netfs netfs: fix test for whether we can skip read when writing beyond EOF 2021-06-21 21:24:07 +01:00
nfs NFSv4: Fix second deadlock in nfs4_evict_inode() 2021-06-03 10:14:42 -04:00
nfs_common NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs 2021-03-22 10:19:00 -04:00
nfsd NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
nilfs2 nilfs2: fix memory leak in nilfs_sysfs_delete_device_group 2021-06-24 19:40:53 -07:00
nls
notify fanotify: fix copy_event_to_user() fid error clean up 2021-06-14 12:16:37 +02:00
ntfs ntfs: fix validity check for file name attribute 2021-06-29 10:53:45 -07:00
ocfs2 ocfs2: remove redundant initialization of variable ret 2021-06-29 10:53:46 -07:00
omfs fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
openpromfs openpromfs: don't do unlock_new_inode() until the new inode is set up 2021-03-12 22:15:22 -05:00
orangefs orangefs: leave files in the page cache for a few micro seconds at least 2021-04-29 08:06:05 -04:00
overlayfs overlayfs update for 5.13 2021-04-30 15:17:08 -07:00
proc proc: only require mm_struct for writing 2021-06-15 10:47:51 -07:00
pstore printk changes for 5.13 2021-04-27 18:09:44 -07:00
qnx4
qnx6
quota quota: Use 'hlist_for_each_entry' to simplify code 2021-05-10 16:27:49 +02:00
ramfs ramfs: support O_TMPFILE 2021-02-24 13:38:26 -08:00
reiserfs treewide: remove editor modelines and cruft 2021-05-07 00:26:34 -07:00
romfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
squashfs squashfs: add option to panic on errors 2021-06-29 10:53:46 -07:00
sysfs sysfs: Support zapping of binary attr mmaps 2021-01-12 14:26:31 +01:00
sysv fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
tracefs tracing: Fix various typos in comments 2021-03-23 14:08:18 -04:00
ubifs This pull request contains changes for JFFS2, UBI and UBIFS 2021-05-04 18:08:40 -07:00
udf useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
ufs useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxsf: don't allow to change the inode type 2021-03-12 22:15:00 -05:00
verity fsverity: relax build time dependency on CRYPTO_SHA256 2021-04-22 17:31:32 +10:00
xfs xfs: bunmapi has unnecessary AG lock ordering issues 2021-05-27 08:11:24 -07:00
zonefs \n 2021-04-29 11:06:13 -07:00
aio.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
anon_inodes.c fs: anon_inodes: rephrase to appropriate kernel-doc 2021-01-15 12:17:25 -05:00
attr.c ima: handle idmapped mounts 2021-01-24 14:27:20 +01:00
bad_inode.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
binfmt_aout.c
binfmt_elf.c coredump: don't bother with do_truncate() 2021-03-08 10:21:11 -05:00
binfmt_elf_fdpic.c coredump: don't bother with do_truncate() 2021-03-08 10:21:11 -05:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: allow not offsetting data start 2021-04-19 09:56:37 +10:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-13 11:27:30 -08:00
binfmt_script.c
block_dev.c block-5.13-2021-05-22 2021-05-22 07:40:34 -10:00
buffer.c Merge branch 'akpm' (patches from Andrew) 2021-05-05 13:50:15 -07:00
char_dev.c
compat_binfmt_elf.c get rid of COMPAT_ELF_EXEC_PAGESIZE 2021-01-06 08:42:51 -05:00
coredump.c coredump: Limit what can interrupt coredumps 2021-06-10 14:02:29 -07:00
d_path.c constify dentry argument of dentry_path()/dentry_path_raw() 2021-03-21 11:43:58 -04:00
dax.c dax: fix ENOMEM handling in grab_mapping_entry() 2021-06-29 10:53:47 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-09 14:54:23 -07:00
drop_caches.c
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c fs/epoll: restore waking from ep_done_scan() 2021-05-06 19:24:13 -07:00
exec.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
fcntl.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
fhandle.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
file.c Merge branch 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-05-03 11:05:28 -07:00
file_table.c epoll: take epitem list out of struct file 2020-10-25 20:02:08 -04:00
filesystems.c
fs-writeback.c writeback, cgroup: do not switch inodes with I_WILL_FREE flag 2021-06-29 10:53:47 -07:00
fs_context.c
fs_parser.c vfs: fs_parser: clean up kernel-doc warnings 2021-04-30 11:20:35 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c init: handle idmapped mounts 2021-01-24 14:27:19 +01:00
inode.c mm: remove nrexceptional from inode: remove BUG_ON 2021-05-05 11:27:20 -07:00
internal.h idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
io-wq.c io-wq: Fix UAF when wakeup wqe in hash waitqueue 2021-05-26 09:03:56 -06:00
io-wq.h io_uring/io-wq: close io-wq full-stop gap 2021-05-25 19:39:58 -06:00
io_uring.c io_uring: add feature flag for rsrc tags 2021-06-10 16:33:51 -06:00
ioctl.c vfs: add fileattr ops 2021-04-12 15:04:23 +02:00
Kconfig NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
Kconfig.binfmt binfmt_flat: allow not offsetting data start 2021-04-19 09:56:37 +10:00
kernel_read_file.c fs/kernel_file_read: Add "offset" arg for partial reads 2020-10-05 13:37:04 +02:00
libfs.c libfs: fix kernel-doc for mnt_userns 2021-03-23 11:20:25 +01:00
locks.c Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, 2021-05-05 13:44:19 -07:00
Makefile netfs: Provide readahead and readpage netfs helpers 2021-04-23 10:14:32 +01:00
mbcache.c
mount.h mount: make {lock,unlock}_mount_hash() static 2021-01-24 14:29:34 +01:00
mpage.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
namei.c fs.idmapped.helpers.v5.13 2021-04-27 12:49:42 -07:00
namespace.c fs/mount_setattr: tighten permission checks 2021-05-12 14:13:16 +02:00
no-block.c
nsfs.c
open.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
pipe.c fs: delete repeated words in comments 2021-02-24 13:38:26 -08:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-08 15:18:43 +01:00
posix_acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
proc_namespace.c fs: introduce MOUNT_ATTR_IDMAP 2021-01-24 14:43:45 +01:00
read_write.c teach sendfile(2) to handle send-to-pipe directly 2021-01-25 23:29:36 -05:00
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c ioctl: handle idmapped mounts 2021-01-24 14:27:19 +01:00
select.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
seq_file.c seq_file: Add a seq_bprintf function 2021-04-27 15:50:15 -07:00
signalfd.c signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo 2021-05-18 16:20:54 -05:00
splice.c for-5.12/block-2021-02-17 2021-02-21 11:02:48 -08:00
stack.c
stat.c fs: fix reporting supported extra file attributes for statx() 2021-04-17 23:03:50 -04:00
statfs.c s390,alpha: switch to 64-bit ino_t 2021-02-13 17:17:53 +01:00
super.c fs,security: Add sb_delete hook 2021-04-22 12:22:11 -07:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: add UFFDIO_CONTINUE ioctl 2021-05-05 11:27:22 -07:00
utimes.c utimes: handle idmapped mounts 2021-01-24 14:27:18 +01:00
xattr.c xattr: fix kernel-doc for mnt_userns and vfs xattr helpers 2021-03-23 11:20:26 +01:00