linux/fs
Damien Le Moal fe9da61ffc zonefs: fix synchronous direct writes to sequential files
Commit 16d7fd3cfa ("zonefs: use iomap for synchronous direct writes")
changes zonefs code from a self-built zone append BIO to using iomap for
synchronous direct writes. This change relies on the iomap submit BIO
callback to change the write BIO built by iomap to a zone append BIO.
However, this change overlooked the fact that a write BIO may be very
large as it is split when issued. The change from a regular write to a
zone append operation for the built BIO can result in a block layer
warning as zone append BIOs are not allowed to be split.

WARNING: CPU: 18 PID: 202210 at block/bio.c:1644 bio_split+0x288/0x350
Call Trace:
? __warn+0xc9/0x2b0
? bio_split+0x288/0x350
? report_bug+0x2e6/0x390
? handle_bug+0x41/0x80
? exc_invalid_op+0x13/0x40
? asm_exc_invalid_op+0x16/0x20
? bio_split+0x288/0x350
bio_split_rw+0x4bc/0x810
? __pfx_bio_split_rw+0x10/0x10
? lockdep_unlock+0xf2/0x250
__bio_split_to_limits+0x1d8/0x900
blk_mq_submit_bio+0x1cf/0x18a0
? __pfx_iov_iter_extract_pages+0x10/0x10
? __pfx_blk_mq_submit_bio+0x10/0x10
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? mark_held_locks+0x9e/0xe0
__submit_bio+0x1ea/0x290
? __pfx___submit_bio+0x10/0x10
? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
submit_bio_noacct_nocheck+0x675/0xa20
? __pfx_bio_iov_iter_get_pages+0x10/0x10
? __pfx_submit_bio_noacct_nocheck+0x10/0x10
iomap_dio_bio_iter+0x624/0x1280
__iomap_dio_rw+0xa22/0x18a0
? lock_is_held_type+0xe3/0x140
? __pfx___iomap_dio_rw+0x10/0x10
? lock_release+0x362/0x620
? zonefs_file_write_iter+0x74c/0xc80 [zonefs]
? down_write+0x13d/0x1e0
iomap_dio_rw+0xe/0x40
zonefs_file_write_iter+0x5ea/0xc80 [zonefs]
do_iter_readv_writev+0x18b/0x2c0
? __pfx_do_iter_readv_writev+0x10/0x10
? inode_security+0x54/0xf0
do_iter_write+0x13b/0x7c0
? lock_is_held_type+0xe3/0x140
vfs_writev+0x185/0x550
? __pfx_vfs_writev+0x10/0x10
? __handle_mm_fault+0x9bd/0x1c90
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? find_held_lock+0x2d/0x110
? lock_release+0x362/0x620
? __up_read+0x1ea/0x720
? do_pwritev+0x136/0x1f0
do_pwritev+0x136/0x1f0
? __pfx_do_pwritev+0x10/0x10
? syscall_enter_from_user_mode+0x22/0x90
? lockdep_hardirqs_on+0x7d/0x100
do_syscall_64+0x58/0x80

This error depends on the hardware used, specifically on the max zone
append bytes and max_[hw_]sectors limits. Tests using AMD Epyc machines
that have low limits did not reveal this issue while runs on Intel Xeon
machines with larger limits trigger it.
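
The dependence on device limits can be sketched with a small user-space
model. This is only an illustration, not kernel code: struct model_limits,
model_needs_split() and model_would_warn() are hypothetical names, and
using the smaller of the two limits as the split threshold for a zone
append is a simplifying assumption, not how the block layer computes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the queue limits named above.
 * All sizes are in 512-byte sectors.
 */
struct model_limits {
	uint32_t max_sectors;             /* largest single request */
	uint32_t max_zone_append_sectors; /* largest single zone append */
};

enum model_op { MODEL_OP_WRITE, MODEL_OP_ZONE_APPEND };

/* A BIO larger than the applicable limit must be split when issued. */
bool model_needs_split(const struct model_limits *lim, enum model_op op,
		       uint32_t bio_sectors)
{
	uint32_t max = lim->max_sectors;

	assert(max > 0);
	/* Assumption: a zone append is bounded by the smaller limit. */
	if (op == MODEL_OP_ZONE_APPEND && lim->max_zone_append_sectors < max)
		max = lim->max_zone_append_sectors;
	return bio_sectors > max;
}

/*
 * Splitting a regular write is allowed. Needing to split a zone append
 * is the case that corresponds to the bio_split() warning.
 */
bool model_would_warn(const struct model_limits *lim, enum model_op op,
		      uint32_t bio_sectors)
{
	return op == MODEL_OP_ZONE_APPEND &&
	       model_needs_split(lim, op, bio_sectors);
}
```

In this model, a write BIO that exceeds the limit is simply split, while
the same BIO converted to a zone append hits the warning case. Whether a
given write is large enough to exceed the limit depends on the limit
values, which matches the hardware dependence described above.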

Manually splitting the zone append BIO using bio_split_rw() can solve
this issue but also requires issuing the fragment BIOs synchronously
with submit_bio_wait(), to avoid potential reordering of the zone append
BIO fragments, which would lead to data corruption. That is, this
solution is not better than using regular write BIOs which are subject
to serialization using zone write locking at the IO scheduler level.
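
Why reordered fragments corrupt data can be shown with a toy model of a
sequential zone. This is a sketch under stated assumptions: struct
model_zone and model_zone_append() are hypothetical names, and the zone
is reduced to a small in-memory buffer with a write pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ZONE_SIZE 16

/* Hypothetical model of a sequential zone with its write pointer. */
struct model_zone {
	char data[MODEL_ZONE_SIZE + 1]; /* +1 for a terminating NUL */
	size_t wp;                      /* current write pointer */
};

/*
 * Zone append: the caller does not choose the location. The data lands
 * at the write pointer as it is when the command executes. Returns the
 * offset that was actually written.
 */
size_t model_zone_append(struct model_zone *z, const char *buf)
{
	size_t len = strlen(buf);
	size_t off = z->wp;

	assert(off + len <= MODEL_ZONE_SIZE);
	memcpy(z->data + off, buf, len);
	z->wp += len;
	z->data[z->wp] = '\0';
	return off;
}
```

If the two fragments "AAAA" and "BBBB" of one write execute in order, the
zone holds "AAAABBBB". If they execute in the opposite order, the zone
holds "BBBBAAAA", so the file content no longer matches what was written.
Issuing the fragments one at a time and waiting for each avoids this, at
the cost of serializing them.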

Given this, fix the issue by removing zone append support and using
regular write BIOs for synchronous direct writes. This allows preserving
the use of iomap and having identical synchronous and asynchronous
sequential file write paths. Zone append support will be reintroduced
later through io_uring commands to ensure that the needed special
handling is done correctly.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 16d7fd3cfa ("zonefs: use iomap for synchronous direct writes")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-08-10 12:59:47 +09:00
9p fs/9p: Remove unused extern declaration 2023-07-20 19:21:48 +00:00
adfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
affs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
afs afs: Fix accidental truncation when storing data 2023-07-04 12:24:32 -07:00
autofs arch/*/configs/*defconfig: Replace AUTOFS4_FS by AUTOFS_FS 2023-07-29 14:08:22 -07:00
befs befs: Replace all non-returning strlcpy with strscpy 2023-05-30 16:42:00 -07:00
bfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
btrfs for-6.5-rc3-tag 2023-07-27 11:44:08 -07:00
cachefiles v6.5/vfs.file 2023-06-26 10:14:36 -07:00
ceph vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
coda vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
configfs fs: consolidate duplicate dt_type helpers 2023-04-03 09:23:54 +02:00
cramfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
crypto fscrypt: Replace 1-element array with flexible array 2023-05-23 19:46:09 -07:00
debugfs debugfs: Correct the 'debugfs_create_str' docs 2023-05-31 19:02:14 +01:00
devpts devpts: simplify two-level sysctl registration for pty_kern_table 2023-03-13 12:36:34 +01:00
dlm dlm for 6.5 2023-06-29 13:27:50 -07:00
ecryptfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
efivarfs efivarfs: expose used and total size 2023-05-17 18:21:34 +02:00
efs
erofs erofs: drop unnecessary WARN_ON() in erofs_kill_sb() 2023-08-01 16:12:24 +08:00
exfat vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
exportfs vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
ext2 \n 2023-06-29 13:39:51 -07:00
ext4 ext4: fix rbtree traversal bug in ext4_mb_use_preallocated 2023-07-23 08:21:14 -04:00
f2fs f2fs update for 6.5-rc1 2023-07-05 14:14:37 -07:00
fat splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
freevxfs There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
fscache fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work() 2023-01-30 12:51:54 +00:00
fuse fuse update for 6.5 2023-07-19 11:00:27 -07:00
gfs2 gfs2 fixes 2023-07-04 11:45:16 -07:00
hfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
hfsplus splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
hostfs Landlock updates for v6.5-rc1 2023-06-27 17:10:27 -07:00
hpfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
hugetlbfs hugetlb: revert use of page_cache_next_miss() 2023-06-23 16:59:32 -07:00
iomap iomap: micro optimize the ki_pos assignment in iomap_file_buffered_write 2023-07-17 08:49:57 -07:00
isofs
jbd2 jbd2: remove __journal_try_to_free_buffer() 2023-07-10 23:09:21 -04:00
jffs2 for-6.5/splice-2023-06-23 2023-06-26 11:52:12 -07:00
jfs vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
kernfs driver core changes for 6.5-rc1 2023-07-03 12:56:23 -07:00
lockd NFS client updates for Linux 6.5 2023-07-01 14:38:25 -07:00
minix splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
netfs Move netfs_extract_iter_to_sg() to lib/scatterlist.c 2023-06-08 13:42:33 +02:00
nfs NFS client updates for Linux 6.5 2023-07-01 14:38:25 -07:00
nfs_common NFSv4.2: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:52 -07:00
nfsd nfsd-6.5 fixes: 2023-08-03 09:26:34 -07:00
nilfs2 for-6.5/block-2023-06-23 2023-06-26 12:47:20 -07:00
nls fs/nls: make load_nls() take a const parameter 2023-07-25 00:30:02 -05:00
notify fanotify: disallow mount/sb marks on kernel internal pseudo fs 2023-07-04 13:29:29 +02:00
ntfs vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
ntfs3 driver ntfs3 for linux 6.5 2023-07-07 14:59:38 -07:00
ocfs2 vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
omfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
openpromfs
orangefs orangefs: Provide a splice-read wrapper 2023-05-24 08:42:16 -06:00
overlayfs vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
proc proc: fix missing conversion to 'iterate_shared' 2023-08-06 15:08:35 +02:00
pstore pstore updates for v6.5-rc1 2023-06-27 21:21:32 -07:00
qnx4 qnx4: credit contributors in CREDITS 2023-03-14 12:56:30 -06:00
qnx6 qnx6: credit contributor and mark filesystem orphan 2023-03-14 12:56:30 -06:00
quota quota: fix warning in dqgrab() 2023-06-05 16:50:30 +02:00
ramfs - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
reiserfs - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
romfs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
smb smb: client: fix dfs link mount against w2k8 2023-08-02 13:36:12 -05:00
squashfs squashfs: fix cache race with migration 2023-07-08 09:29:30 -07:00
sysfs sysfs: Skip empty folders creation 2023-06-15 13:37:53 +02:00
sysv for-6.5/splice-2023-06-23 2023-06-26 11:52:12 -07:00
tracefs fs: port ->mkdir() to pass mnt_idmap 2023-01-19 09:24:26 +01:00
ubifs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
udf \n 2023-06-29 13:39:51 -07:00
ufs splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
unicode unicode: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
vboxsf vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
verity fsverity: improve documentation for builtin signature support 2023-06-20 22:47:55 -07:00
xfs xfs: convert flex-array declarations in xfs attr shortform objects 2023-07-17 08:48:56 -07:00
zonefs zonefs: fix synchronous direct writes to sequential files 2023-08-10 12:59:47 +09:00
aio.c fs/aio: Stop allocating aio rings from HIGHMEM 2023-06-15 09:22:23 +02:00
anon_inodes.c
attr.c nfs: use vfs setgid helper 2023-03-30 08:51:48 +02:00
bad_inode.c fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
binfmt_elf.c Merge branch 'expand-stack' 2023-06-28 20:35:21 -07:00
binfmt_elf_fdpic.c binfmt: Slightly simplify elf_fdpic_map_file() 2023-05-30 15:49:46 -07:00
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-02 13:57:04 -08:00
binfmt_script.c
buffer.c \n 2023-06-29 13:39:51 -07:00
char_dev.c vfs: Replace all non-returning strlcpy with strscpy 2023-05-15 09:42:01 +02:00
compat_binfmt_elf.c
coredump.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
d_path.c fs: d_path: include internal.h 2023-05-17 09:16:59 +02:00
dax.c dax: enable dax fault handler to report VM_FAULT_HWPOISON 2023-06-26 07:54:23 -06:00
dcache.c
direct-io.c - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
drop_caches.c
eventfd.c eventfd: show the EFD_SEMAPHORE flag in fdinfo 2023-06-15 09:22:23 +02:00
eventpoll.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
exec.c \n 2023-06-29 13:31:44 -07:00
fcntl.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
fhandle.c fsnotify: move fsnotify_open() hook into do_dentry_open() 2023-06-12 10:43:45 +02:00
file.c fs: rely on ->iterate_shared to determine f_pos locking 2023-08-06 15:08:36 +02:00
file_table.c fs: move cleanup from init_file() into its callers 2023-07-02 13:15:49 +02:00
filesystems.c
fs-writeback.c writeback: move wb_over_bg_thresh() call outside lock section 2023-06-09 16:25:14 -07:00
fs_context.c fs: avoid empty option when generating legacy mount string 2023-06-07 21:49:55 +02:00
fs_parser.c ext4: journal_path mount options should follow links 2022-12-01 10:46:54 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
inode.c fs: don't assume arguments are non-NULL 2023-07-04 10:21:11 +02:00
internal.h v6.5/vfs.file 2023-06-26 10:14:36 -07:00
ioctl.c fs: port inode_owner_or_capable() to mnt_idmap 2023-01-19 09:24:29 +01:00
Kconfig smb: move client and server files to common directory fs/smb 2023-05-24 16:29:21 -05:00
Kconfig.binfmt
kernel_read_file.c
libfs.c fs: factor out a direct_write_fallback helper 2023-06-09 16:25:53 -07:00
locks.c filelocks: use mount idmapping for setlease permission check 2023-03-09 22:36:12 +01:00
Makefile for-6.5/block-2023-06-23 2023-06-26 12:47:20 -07:00
mbcache.c ext4: fix deadlock due to mbcache entry corruption 2022-12-08 21:49:25 -05:00
mnt_idmapping.c fs: move mnt_idmap 2023-01-19 09:24:30 +01:00
mount.h
mpage.c mpage: use folios in bio end_io handler 2023-04-18 16:30:02 -07:00
namei.c fs: no need to check source 2023-07-04 10:20:29 +02:00
namespace.c v6.5/vfs.mount 2023-06-26 10:27:04 -07:00
nsfs.c kill the last remaining user of proc_ns_fget() 2023-04-20 22:55:35 -04:00
open.c open: make RESOLVE_CACHED correctly test for O_TMPFILE 2023-08-06 15:08:35 +02:00
pipe.c pipe: check for IOCB_NOWAIT alongside O_NONBLOCK 2023-05-12 17:17:27 +02:00
pnode.c fs: allow to mount beneath top mount 2023-05-19 04:30:22 +02:00
pnode.h fs: allow to mount beneath top mount 2023-05-19 04:30:22 +02:00
posix_acl.c acl: don't depend on IOP_XATTR 2023-03-06 09:59:20 +01:00
proc_namespace.c tty, proc, kernfs, random: Use copy_splice_read() 2023-05-24 08:42:16 -06:00
read_write.c splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c fs: use UB-safe check for signed addition overflow in remap_verify_area 2023-05-24 11:03:59 +02:00
select.c
seq_file.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
signalfd.c
splice.c splice, net: Fix splice_to_socket() for O_NONBLOCK socket 2023-07-26 21:56:06 -07:00
stack.c
stat.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
statfs.c statfs: enforce statfs[64] structure initialization 2023-05-17 15:20:17 +02:00
super.c \n 2023-06-29 13:39:51 -07:00
sync.c
sysctls.c sysctl: Refactor base paths registrations 2023-05-23 21:43:26 -07:00
timerfd.c
userfaultfd.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
utimes.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
xattr.c fs: don't call posix_acl_listxattr in generic_listxattr 2023-05-17 15:25:20 +02:00