linux/fs
Ye Bin 23e3d7f706 jbd2: fix a potential race while discarding reserved buffers after an abort
we got issue as follows:
[   72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
[   72.826847] EXT4-fs (sda): Remounting filesystem read-only
fallocate: fallocate failed: Read-only file system
[   74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
[   74.793597] ------------[ cut here ]------------
[   74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
[   74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
[   74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
[   74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
[   74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
[   74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
[   74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
[   74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
[   74.807301] FS:  0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
[   74.808338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
[   74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   74.811897] Call Trace:
[   74.812241]  <TASK>
[   74.812566]  __jbd2_journal_refile_buffer+0x12f/0x180
[   74.813246]  jbd2_journal_refile_buffer+0x4c/0xa0
[   74.813869]  jbd2_journal_commit_transaction.cold+0xa1/0x148
[   74.817550]  kjournald2+0xf8/0x3e0
[   74.819056]  kthread+0x153/0x1c0
[   74.819963]  ret_from_fork+0x22/0x30

Above issue may happen as follows:
        write                   truncate                   kjournald2
generic_perform_write
 ext4_write_begin
  ext4_walk_page_buffers
   do_journal_get_write_access ->add BJ_Reserved list
 ext4_journalled_write_end
  ext4_walk_page_buffers
   write_end_fn
    ext4_handle_dirty_metadata
                ***************JBD2 ABORT**************
     jbd2_journal_dirty_metadata
 -> return -EROFS, jh in reserved_list
                                                   jbd2_journal_commit_transaction
                                                    while (commit_transaction->t_reserved_list)
                                                      jh = commit_transaction->t_reserved_list;
                        truncate_pagecache_range
                         do_invalidatepage
			  ext4_journalled_invalidatepage
			   jbd2_journal_invalidatepage
			    journal_unmap_buffer
			     __dispose_buffer
			      __jbd2_journal_unfile_buffer
			       jbd2_journal_put_journal_head ->put last ref_count
			        __journal_remove_journal_head
				 bh->b_private = NULL;
				 jh->b_bh = NULL;
				                      jbd2_journal_refile_buffer(journal, jh);
							bh = jh2bh(jh);
							->bh is NULL, later will trigger null-ptr-deref
				 journal_free_journal_head(jh);

After commit 96f1e09745, we no longer hold the j_state_lock while
iterating over the list of reserved handles in
jbd2_journal_commit_transaction().  This potentially allows the
journal_head to be freed by journal_unmap_buffer while the commit
codepath is also trying to free the BJ_Reserved buffers.  Keeping
j_state_lock held while trying extends hold time of the lock
minimally, and solves this issue.

Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-21 14:21:30 -04:00
..
9p Revert "fs/9p: search open fids first" 2022-01-30 22:13:37 +09:00
adfs fs/adfs: remove unneeded variable make code cleaner 2022-01-20 08:52:55 +02:00
affs affs: use bdev_nr_sectors instead of open coding it 2021-10-18 14:43:22 -06:00
afs proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs for-5.17-rc4-tag 2022-02-15 09:14:05 -08:00
cachefiles netfs, cachefiles: Add a method to query presence of data in the cache 2022-02-01 10:29:18 -06:00
ceph ceph: set pool_ns in new inode layout for async creates 2022-01-26 20:17:50 +01:00
cifs cifs: fix confusing unneeded warning message on smb2.1 and earlier 2022-02-16 17:16:49 -06:00
coda coda: bump module version to 7.2 2021-11-09 10:02:51 -08:00
configfs fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
cramfs cramfs: use bdev_nr_bytes instead of open coding it 2021-10-18 14:43:22 -06:00
crypto fscrypt: improve a few comments 2021-10-25 19:11:50 -07:00
debugfs debugfs: lockdown: Allow reading debugfs files that are not world readable 2022-01-06 15:47:41 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
dlm driver core changes for 5.17-rc1 2022-01-12 11:11:34 -08:00
ecryptfs fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
efivarfs
efs
erofs erofs: fix small compressed files inlining 2022-02-04 12:37:12 +08:00
exfat exfat: fix missing REQ_SYNC in exfat_update_bhs() 2022-01-10 11:00:04 +09:00
exportfs
ext2 fsdax: shift partition offset handling into the file systems 2021-12-04 08:58:54 -08:00
ext4 ext4: update the cached overhead value in the superblock 2022-04-14 22:39:00 -04:00
f2fs Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
fat FAT: use io_schedule_timeout() instead of congestion_wait() 2022-01-20 08:52:54 +02:00
freevxfs
fscache fscache: Fix the volume collision wait condition 2022-01-21 21:36:28 +00:00
fuse virtio,vdpa,qemu_fw_cfg: features, cleanups, fixes 2022-01-18 10:05:48 +02:00
gfs2 gfs2 fixes: 2022-02-11 11:36:32 -08:00
hfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hfsplus hfsplus: use struct_group_attr() for memcpy() region 2022-01-20 08:52:54 +02:00
hostfs hostfs: Fix writeback of dirty pages 2021-12-21 21:44:27 +01:00
hpfs treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
hugetlbfs hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() 2022-01-15 16:30:30 +02:00
iomap xfs, iomap: limit individual ioend chain lengths in writeback 2022-01-26 09:19:20 -08:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-10-19 12:51:02 +02:00
jbd2 jbd2: fix a potential race while discarding reserved buffers after an abort 2022-04-21 14:21:30 -04:00
jffs2 Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
jfs Just one JFS patch 2021-11-03 09:23:25 -07:00
kernfs kernfs: prevent early freeing of root node 2021-12-03 14:36:21 +01:00
ksmbd ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
lockd Notable bug fixes: 2022-02-02 10:14:31 -08:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: Make ops->init_rreq() optional 2022-01-21 21:36:28 +00:00
nfs NFS: Do not report writeback errors in nfs_getattr() 2022-02-16 15:15:22 -05:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd Notable bug fixes: 2022-02-09 09:56:57 -08:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
nls
notify fanotify: Fix stale file descriptor in copy_event_to_user() 2022-02-01 12:52:07 +01:00
ntfs fs/ntfs/attrib.c: fix one kernel-doc comment 2022-01-15 16:30:24 +02:00
ntfs3 mm: remove cleancache 2022-01-22 08:33:38 +02:00
ocfs2 ocfs2: fix a deadlock when commit trans 2022-01-30 09:56:58 +02:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2021-12-31 14:37:43 -05:00
overlayfs overlayfs fixes for 5.17-rc3 2022-02-01 11:23:02 -08:00
proc fs/proc: task_mmu.c: don't read mapcount for migration entry 2022-02-11 17:55:00 -08:00
pstore pstore update for v5.17-rc1 2022-01-10 11:48:37 -08:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: make dquot_quota_sync return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs reiserfs: don't use congestion_wait() 2021-11-18 11:52:22 +01:00
romfs
smbfs_common smb3: add new defines from protocol specification 2022-01-18 16:50:47 -06:00
squashfs squashfs: provide backing_dev_info in order to disable read-ahead 2022-01-15 16:30:24 +02:00
sysfs fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() 2021-10-05 16:35:05 +02:00
sysv sysv: use BUILD_BUG_ON instead of runtime check 2021-11-09 10:02:52 -08:00
tracefs Tracing updates for 5.17: 2022-01-16 10:15:32 +02:00
ubifs ubifs: read-only if LEB may always be taken in ubifs_garbage_collect 2021-12-23 22:30:38 +01:00
udf udf: Restore i_lenAlloc when inode expansion fails 2022-01-24 14:45:02 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs Fixes for 5.17-rc3: 2022-02-05 09:21:55 -08:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-17 16:56:35 +09:00
aio.c aio: move aio sysctl to aio.c 2022-01-22 08:33:34 +02:00
anon_inodes.c fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() 2021-09-19 22:35:37 -04:00
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c fs/binfmt_elf: fix PT_LOAD p_align values for loaders 2022-02-11 17:55:00 -08:00
binfmt_elf_fdpic.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c fs/buffer: Convert __block_write_begin_int() to take a folio 2021-12-16 15:49:51 -05:00
char_dev.c
compat_binfmt_elf.c
coredump.c fs/coredump: move coredump sysctls into its own file 2022-01-22 08:33:36 +02:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c dax: remove the copy_from_iter and copy_to_iter methods 2021-12-18 08:04:53 -08:00
dcache.c fs: move dcache sysctls to its own file 2022-01-22 08:33:36 +02:00
direct-io.c fs: get rid of the res2 iocb->ki_complete argument 2021-10-25 10:36:24 -06:00
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c eventpoll: simplify sysctl declaration with register_sysctl() 2022-01-22 08:33:35 +02:00
exec.c fs/coredump: move coredump sysctls into its own file 2022-01-22 08:33:36 +02:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file.c fget: clarify and improve __fget_files() implementation 2021-12-13 10:55:30 -08:00
file_table.c fs/file_table: fix adding missing kmemleak_not_leak() 2022-02-17 10:23:19 -08:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs-writeback.c fscache rewrite 2022-01-12 13:45:12 -08:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-18 09:23:19 +02:00
fs_parser.c fs_parse: allow parameter value to be empty 2021-12-09 14:09:36 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: move inode sysctls to its own file 2022-01-22 08:33:35 +02:00
internal.h fs/buffer: Convert __block_write_begin_int() to take a folio 2021-12-16 15:49:51 -05:00
io-wq.c io_uring-5.17-2022-01-21 2022-01-21 16:07:21 +02:00
io-wq.h Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
io_uring.c mm: io_uring: allow oom-killer from io_uring_setup 2022-02-07 08:44:01 -07:00
ioctl.c fs/ioctl: remove unnecessary __user annotation 2022-01-15 16:30:25 +02:00
Kconfig ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c unicode: clean up the Kconfig symbol confusion 2022-01-20 19:57:24 -05:00
locks.c fs: move locking sysctls where they are used 2022-01-22 08:33:36 +02:00
Makefile Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
mbcache.c
mount.h
mpage.c mm: remove cleancache 2022-01-22 08:33:38 +02:00
namei.c \n 2022-01-28 17:51:31 +02:00
namespace.c fs: add kernel doc for mnt_{hold,unhold}_writers() 2022-02-14 08:35:32 +01:00
no-block.c
nsfs.c
open.c fs: support mapped mounts of mapped filesystems 2021-12-05 10:28:57 +01:00
pipe.c fs: move pipe sysctls to is own file 2022-01-22 08:33:36 +02:00
pnode.c
pnode.h
posix_acl.c fs: support mapped mounts of mapped filesystems 2021-12-05 10:28:57 +01:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
read_write.c fs: remove leftover comments from mandatory locking removal 2021-10-26 12:20:50 -04:00
readdir.c
remap_range.c fs: Convert vfs_dedupe_file_range_compare to folios 2022-01-08 00:28:41 -05:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-11 09:03:05 -08:00
seq_file.c seq_file: move seq_escape() to a header 2021-11-09 10:02:52 -08:00
signalfd.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-01-30 08:59:47 -08:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
sysctls.c fs: move namespace sysctls and declare fs base directory 2022-01-22 08:33:36 +02:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c mm: move anon_vma declarations to linux/mm_inline.h 2022-01-15 16:30:27 +02:00
utimes.c
xattr.c