linux/fs
Jeff Moyer 491caa4363 ext4: fix race between sync and completed io work
The following command line will leave the aio-stress process unkillable
on an ext4 file system (in my case, mounted on /mnt/test):

aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2

This is using the aio-stress program from the xfstests test suite.
That particular command line tells aio-stress to do random writes to
20 files from 20 threads (one thread per file).  The files are NOT
preallocated, so you will get writes to random offsets within the
file, thus creating holes and extending i_size.  It also opens the
file with O_DIRECT and O_SYNC.

On to the problem.  When an I/O requires unwritten extent conversion,
it is queued onto the completed_io_list for the ext4 inode.  Two code
paths will pull work items from this list.  The first is the
ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
which is called via the fsync path (and O_SYNC handling, as well).
There are two issues I've found in these code paths.  First, if the
fsync path beats the work routine to a particular I/O, the work
routine will free the io_end structure!  It does not take into account
the fact that the io_end may still be in use by the fsync path.  I've
fixed this issue by adding yet another IO_END flag, indicating that
the io_end is being processed by the fsync path.

The second problem is that the work routine will make an assignment to
io->flag outside of the lock.  I have witnessed this result in a hang
at umount.  Moving the flag setting inside the lock resolved that
problem.

The problem was introduced by commit b82e384c7b ("ext4: optimize
locking for end_io extent conversion"), which first appeared in 3.2.
As such, the fix should be backported to that release (probably along
with the unwritten extent conversion race fix).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
CC: stable@kernel.org
2012-03-05 10:29:52 -05:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs 2012-01-10 15:09:01 -08:00
adfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
affs affs: propagate umode_t 2012-01-03 22:55:04 -05:00
afs switch ->create() to umode_t 2012-01-03 22:54:53 -05:00
autofs4 autofs4 - fix deal with autofs4_write races 2012-01-13 08:30:49 -08:00
befs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
bfs switch ->create() to umode_t 2012-01-03 22:54:53 -05:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2012-01-28 17:00:19 -08:00
cachefiles fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2012-01-13 10:29:21 -08:00
cifs CIFS: Rename *UCS* functions to *UTF16* 2012-01-18 22:32:33 -06:00
coda coda: switch coda_cnode_make() to sane API as well, clean coda_lookup() 2012-01-10 11:13:16 -05:00
configfs configfs: convert to umode_t 2012-01-03 22:54:57 -05:00
cramfs Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
debugfs kernel-doc: fix new warnings in debugfs 2012-01-24 10:47:41 -08:00
devpts devpts: fix double-free on mount failure 2012-01-08 20:19:03 -05:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm 2012-01-10 14:55:55 -08:00
ecryptfs eCryptfs: move misleading function comments 2012-01-25 15:10:53 -08:00
efs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
exofs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2012-01-09 12:51:01 -08:00
exportfs
ext2 ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls 2012-01-11 13:39:02 +01:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-01-09 12:51:21 -08:00
ext4 ext4: fix race between sync and completed io work 2012-03-05 10:29:52 -05:00
fat Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb 2012-01-09 12:09:47 -08:00
freevxfs fs: propagate umode_t, misc bits 2012-01-03 22:55:10 -05:00
fscache
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2012-01-12 12:39:21 -08:00
gfs2 GFS2: Fix nlink setting on inode creation 2012-01-11 12:35:05 +00:00
hfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
hfsplus hfsplus: creation of hidden dir on mount can fail 2012-01-10 17:48:52 -05:00
hostfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
hpfs switch ->mknod() to umode_t 2012-01-03 22:54:54 -05:00
hppfs vfs: for usbfs, etc. internal vfsmounts ->mnt_sb->s_root == ->mnt_root 2012-01-03 22:52:41 -05:00
hugetlbfs mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
isofs isofs: inode leak on mount failure 2012-01-09 10:48:11 -05:00
jbd jbd: Issue cache flush after checkpointing 2012-01-11 13:36:57 +01:00
jbd2 ext4: remove the journal=update mount option 2012-02-20 17:53:04 -05:00
jffs2 MTD pull for 3.3 2012-01-10 13:45:22 -08:00
jfs Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm 2012-01-08 13:10:57 -08:00
lockd module_param: make bool parameters really bool (drivers & misc) 2012-01-13 09:32:20 +10:30
logfs Pull request from git://github.com/prasad-joshi/logfs_upstream.git 2012-01-31 09:23:59 -08:00
minix Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-01-08 12:19:57 -08:00
ncpfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
nfs NFS client bugfixes and cleanups for Linux 3.3 (pull 2) 2012-01-16 15:08:13 -08:00
nfs_common
nfsd Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux 2012-01-14 12:26:41 -08:00
nilfs2 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm 2012-01-08 13:10:57 -08:00
nls NLS: raname "maxlen" to "maxout" in UTF conversion routines 2011-11-26 19:58:47 -08:00
notify fsnotify: don't BUG in fsnotify_destroy_mark() 2012-01-14 18:01:42 -08:00
ntfs module_param: avoid bool abuse, add bint for special cases. 2012-01-13 09:32:17 +10:30
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm 2012-01-10 14:55:55 -08:00
omfs omfs: propagate umode_t 2012-01-03 22:55:01 -05:00
openpromfs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
proc proc: clear_refs: do not clear reserved pages 2012-01-23 08:38:48 -08:00
pstore pstore: gracefully handle NULL pstore_info functions 2011-11-18 13:49:00 -08:00
qnx4 qnx4: don't leak ->BitMap on late failure exits 2012-01-19 13:54:36 -05:00
quota quota: Pass information that quota is stored in system file to userspace 2012-01-12 13:09:09 +01:00
ramfs pohmelfs: propagate umode_t 2012-01-03 22:55:07 -05:00
reiserfs reiserfs: don't lock root inode searching 2012-01-10 16:30:54 -08:00
romfs MTD pull for 3.3 2012-01-10 13:45:22 -08:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next 2012-01-13 10:34:57 -08:00
sysfs sysfs: Complain bitterly about attempts to remove files from nonexistent directories. 2012-01-24 12:12:32 -08:00
sysv vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
ubifs UBIFS: fix non-debug configuration build 2012-01-15 13:46:02 +02:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-01-09 12:51:21 -08:00
ufs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
xfs xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() 2012-01-25 11:01:31 -06:00
aio.c Unused iocbs in a batch should not be accounted as active. 2012-01-13 20:39:44 -08:00
anon_inodes.c
attr.c switch is_sxid() to umode_t 2012-01-03 22:55:11 -05:00
bad_inode.c switch ->mknod() to umode_t 2012-01-03 22:54:54 -05:00
binfmt_aout.c
binfmt_elf.c fs: binfmt_elf: create Kconfig variable for PIE randomization 2012-01-10 16:30:51 -08:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c bio: change some signed vars to unsigned 2011-11-16 09:21:50 +01:00
block_dev.c vfs: cache request_queue in struct block_device 2012-01-12 20:13:12 -08:00
buffer.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
char_dev.c char_dev.c: fix up some whitespace errors 2011-12-13 11:18:17 -08:00
compat.c switch open and mkdir syscalls to umode_t 2012-01-03 22:55:19 -05:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-01-15 12:49:56 -08:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2012-01-13 10:29:21 -08:00
dcookies.c
direct-io.c dio: optimize cache misses in the submission path 2012-01-12 20:13:12 -08:00
drop_caches.c
eventfd.c
eventpoll.c epoll: limit paths 2012-01-12 20:13:04 -08:00
exec.c tracepoint: add tracepoints for debugging oom_score_adj 2012-01-10 16:30:44 -08:00
fcntl.c
fhandle.c vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
fifo.c
file.c
file_table.c vfs: prevent remount read-only if pending removes 2012-01-06 23:20:13 -05:00
filesystems.c vfs: convert fs_supers to hlist 2012-01-03 22:52:39 -05:00
fs-writeback.c Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-01-10 16:59:59 -08:00
fs_struct.c
generic_acl.c
inode.c vfs: remove printk from set_nlink() 2012-01-17 16:39:47 -05:00
internal.h vfs: protect remounting superblock read-only 2012-01-06 23:20:12 -05:00
ioctl.c vfs: fix up ENOIOCTLCMD error handling 2012-01-05 15:40:12 -08:00
ioprio.c block, cfq: unlink cfq_io_context's immediately 2011-12-14 00:33:39 +01:00
Kconfig Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2012-01-09 12:51:01 -08:00
Kconfig.binfmt fs: binfmt_elf: create Kconfig variable for PIE randomization 2012-01-10 16:30:51 -08:00
libfs.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
locks.c vfs: fix handling of lock allocation failure in lease-break case 2011-12-26 10:25:26 -08:00
Makefile Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
mbcache.c
mount.h vfs: keep list of mounts for each superblock 2012-01-06 23:20:12 -05:00
mpage.c fs: remove unneeded plug in mpage_readpages() 2012-01-12 09:19:54 +01:00
namei.c audit: do not call audit_getname on error 2012-01-17 16:17:01 -05:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-01-08 13:21:22 -08:00
no-block.c
open.c switch security_path_chmod() to struct path * 2012-01-06 23:16:53 -05:00
pipe.c pipe: fail cleanly when root tries F_SETPIPE_SZ with big size 2012-01-12 20:13:04 -08:00
pnode.c vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
pnode.h vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
posix_acl.c
proc_namespace.c vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
read_write.c
read_write.h
readdir.c
select.c
seq_file.c constify seq_file stuff 2012-01-03 22:52:40 -05:00
signalfd.c
splice.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
stack.c
stat.c
statfs.c vfs: new helper - vfs_ustat() 2012-01-03 22:53:07 -05:00
super.c wake up s_wait_unfrozen when ->freeze_fs fails 2012-01-17 16:38:47 -05:00
sync.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
timerfd.c
utimes.c
xattr.c vfs: mnt_drop_write_file() 2012-01-03 22:52:40 -05:00
xattr_acl.c