linux/fs
Nikolay Borisov 4d14c5cde5 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070eac ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-03-02 17:17:09 +01:00
..
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
adfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
affs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
afs rxrpc: Fix deadlock around release of dst cached on udp tunnel 2021-01-29 21:38:11 -08:00
autofs file: Replace ksys_close with close_fd 2020-12-10 12:42:59 -06:00
befs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
bfs bfs: don't use WARNING: string when it's just info. 2020-12-15 22:46:18 -08:00
btrfs btrfs: don't flush from btrfs_delayed_inode_reserve_metadata 2021-03-02 17:17:09 +01:00
cachefiles cachefiles: Drop superfluous readpages aops NULL check 2021-01-20 11:33:51 -08:00
ceph libceph, ceph: disambiguate ceph_connection_operations handlers 2021-01-04 17:31:32 +01:00
cifs cifs: report error instead of invalid when revalidating a dentry fails 2021-02-05 13:17:48 -06:00
coda docs: filesystems: convert coda.txt to ReST 2020-05-05 09:22:21 -06:00
configfs configfs: fix kernel-doc markup issue 2020-11-14 10:22:45 +01:00
cramfs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
crypto f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
debugfs debugfs: remove return value of debugfs_create_devm_seqfile() 2020-10-30 08:37:39 +01:00
devpts
dlm fs: dlm: check on existing node address 2020-11-10 12:14:20 -06:00
ecryptfs ecryptfs: fix uid translation for setxattr on security.capability 2021-01-26 01:47:14 +00:00
efivarfs efivarfs: revert "fix memory leak in efivarfs_create()" 2020-11-25 16:55:02 +01:00
efs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
erofs erofs: avoid using generic_block_bmap 2020-12-10 11:07:40 +08:00
exfat exfat: Avoid allocating upcase table using kcalloc() 2020-12-22 12:31:17 +09:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 ext2: Fix fall-through warnings for Clang 2020-11-23 10:36:53 +01:00
ext4 A number of bug fixes for ext4: 2021-01-15 14:54:24 -08:00
f2fs f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
fat [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
freevxfs
fscache Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-06-03 16:27:18 -07:00
fuse fuse: fix bad inode 2020-12-10 15:33:14 +01:00
gfs2 gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only 2020-12-03 17:04:41 +01:00
hfs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hfsplus fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hostfs fix hostfs_open() use of ->f_path.dentry 2020-12-21 21:42:29 -05:00
hpfs [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
hugetlbfs mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page 2021-02-05 11:03:47 -08:00
iomap iomap: support REQ_OP_ZONE_APPEND 2021-02-09 00:52:19 +01:00
isofs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
jbd2 jbd2: add a helper to find out number of fast commit blocks 2020-12-17 13:30:45 -05:00
jffs2 jffs2: Fix NULL pointer dereference in rp_size fs option parsing 2020-12-13 21:57:21 +01:00
jfs jfs: Fix array index bounds check in dbAdjTree 2020-11-13 16:03:07 -06:00
kernfs kernfs: wire up ->splice_read and ->splice_write 2021-01-21 18:30:28 +01:00
lockd fs/lockd: convert comma to semicolon 2020-12-16 07:57:37 -05:00
minix [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
nfs pNFS/NFSv4: Improve rejection of out-of-order layouts 2021-01-24 20:52:31 -05:00
nfs_common nfs_common: need lock during iterate through the list 2020-12-09 09:38:34 -05:00
nfsd nfsd4: readdirplus shouldn't return parent of export 2021-01-12 08:54:14 -05:00
nilfs2 fs/nilfs2: remove some unused macros to tame gcc 2020-12-15 22:46:17 -08:00
nls treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
notify fanotify: Fix sys_fanotify_mark() on native x86-32 2020-12-28 11:58:59 +01:00
ntfs fs/ntfs: remove unused variable attr_len 2020-12-15 12:13:37 -08:00
ocfs2 ocfs2: ratelimit the 'max lookup times reached' notice 2020-12-15 12:13:37 -08:00
omfs fs: omfs: use kmemdup() rather than kmalloc+memcpy 2020-09-22 23:39:45 -04:00
openpromfs
orangefs orangefs: add splice file operations 2020-12-16 16:14:08 -05:00
overlayfs ovl: implement volatile-specific fsync error behaviour 2021-01-28 10:22:48 +01:00
proc proc_sysctl: fix oops caused by incorrect command parameters 2021-01-24 10:34:53 -08:00
pstore Tracing updates for 5.11 2020-12-17 13:22:17 -08:00
qnx4 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
qnx6 [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
quota \n 2020-12-17 11:00:37 -08:00
ramfs ramfs: fix nommu mmap with gaps in the page cache 2020-10-16 11:11:22 -07:00
reiserfs reiserfs: add check for an invalid ih_entry_count 2020-11-26 16:57:28 +01:00
romfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
squashfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
sysfs sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output 2020-10-02 12:02:30 +02:00
sysv [PATCH] reduce boilerplate in fsid handling 2020-09-18 16:45:50 -04:00
tracefs
ubifs This pull request contains changes for JFFS2, UBI and UBIFS: 2020-12-17 17:46:34 -08:00
udf udf: fix the problem that the disc content is not displayed 2021-01-18 12:06:33 +01:00
ufs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
unicode unicode: Add utf8_casefold_hash 2020-09-10 14:03:31 -07:00
vboxsf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2020-10-15 15:11:56 -07:00
verity Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-12-14 12:18:19 -08:00
xfs New code for 5.11: 2020-12-18 12:50:18 -08:00
zonefs zonefs: select CONFIG_CRC32 2021-01-04 09:06:42 +09:00
aio.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
anon_inodes.c
attr.c
bad_inode.c fs: move the fiemap definitions out of fs.h 2020-06-03 23:16:55 -04:00
binfmt_aout.c exec: Rename flush_old_exec begin_new_exec 2020-05-07 16:55:47 -05:00
binfmt_elf.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
binfmt_elf_fdpic.c binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot 2020-10-16 11:11:21 -07:00
binfmt_em86.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_flat.c binfmt_flat: revert "binfmt_flat: don't offset the data start" 2020-08-24 08:49:13 +10:00
binfmt_misc.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_script.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
block_dev.c Revert "block: simplify set_init_blocksize" to regain lost performance 2021-01-27 09:14:12 -07:00
buffer.c for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
char_dev.c vfs: allow unprivileged whiteout creation 2020-05-14 16:44:23 +02:00
compat_binfmt_elf.c elf: Expose ELF header on arch_setup_additional_pages() 2020-10-26 13:46:47 +01:00
coredump.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
d_path.c fs: fix NULL dereference due to data race in prepend_path() 2020-10-14 14:54:45 -07:00
dax.c mm: simplify follow_pte{,pmd} 2020-12-15 22:46:19 -08:00
dcache.c fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set 2020-12-10 17:33:17 -05:00
dcookies.c
direct-io.c \n 2020-10-15 15:03:10 -07:00
drop_caches.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c epoll: add syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
exec.c Merge branch 'parisc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2020-12-16 12:10:40 -08:00
fcntl.c fcntl: Fix potential deadlock in send_sig{io, urg}() 2020-11-05 07:44:15 -05:00
fhandle.c
file.c kernel/io_uring: cancel io_uring before task works 2020-12-30 19:36:54 -07:00
file_table.c epoll: take epitem list out of struct file 2020-10-25 20:02:08 -04:00
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-10 15:36:22 -07:00
fs-writeback.c fs: fix lazytime expiration handling in __writeback_single_inode() 2021-01-13 17:26:21 +01:00
fs_context.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
fs_parser.c fs_parse: mark fs_param_bad_value() as static 2020-10-13 18:38:27 -07:00
fs_pin.c
fs_struct.c vfs: Use sequence counter with associated spinlock 2020-07-29 16:14:27 +02:00
fs_types.c
fsopen.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
init.c init: add an init_dup helper 2020-08-04 21:02:38 -04:00
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-12-25 10:54:29 -08:00
internal.h for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
io-wq.c io-wq: kill now unused io_wq_cancel_all() 2020-12-20 10:47:42 -07:00
io-wq.h io-wq: kill now unused io_wq_cancel_all() 2020-12-20 10:47:42 -07:00
io_uring.c io_uring: drop mm/files between task_work_submit 2021-02-04 12:42:58 -07:00
ioctl.c fs: remove ksys_ioctl 2020-07-31 08:16:01 +02:00
Kconfig tmpfs: support 64-bit inums per-sb 2020-08-07 11:33:24 -07:00
Kconfig.binfmt treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
kernel_read_file.c fs/kernel_file_read: Add "offset" arg for partial reads 2020-10-05 13:37:04 +02:00
libfs.c f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
locks.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
Makefile Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
mbcache.c
mount.h mnt: Use generic ns_common::count 2020-08-19 14:14:19 +02:00
mpage.c fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
namei.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-12-25 10:54:29 -08:00
namespace.c umount(2): move the flag validity checks first 2021-01-04 15:31:58 -05:00
no-block.c
nsfs.c nsproxy: attach to namespaces via pidfds 2020-05-13 11:41:22 +02:00
open.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
pipe.c fs/pipe: allow sendfile() to pipe again 2021-01-25 12:32:26 -08:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-04-27 10:37:14 -04:00
pnode.h fs/namespace.c: WARN if mnt_count has become negative 2020-12-10 17:33:17 -05:00
posix_acl.c vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK 2020-06-08 11:04:19 -07:00
proc_namespace.c proc mountinfo: make splice available again 2020-12-27 12:00:36 -08:00
read_write.c Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
readdir.c fs: remove ksys_getdents64 2020-07-31 08:16:00 +02:00
remap_range.c vfs: verify source area in vfs_dedupe_file_range_one() 2020-12-14 15:26:13 +01:00
select.c poll: fix performance regression due to out-of-line __put_user() 2021-01-08 11:06:29 -08:00
seq_file.c fix return values of seq_read_iter() 2020-11-15 22:12:53 -05:00
signalfd.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
splice.c io_uring-5.10-2020-10-24 2020-10-24 12:40:18 -07:00
stack.c
stat.c fs: remove KSTAT_QUERY_FLAGS 2020-09-26 22:55:05 -04:00
statfs.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
super.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
sync.c overlayfs update for 5.8 2020-06-09 15:40:50 -07:00
timerfd.c
userfaultfd.c userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob 2020-12-15 12:13:46 -08:00
utimes.c fs: expose utimes_common 2020-07-31 08:16:01 +02:00
xattr.c vfs: move cap_convert_nscap() call into vfs_setxattr() 2020-12-14 15:26:13 +01:00