linux/fs/xfs
Dave Chinner d3bc815afb xfs: punch new delalloc blocks out of failed writes inside EOF.
When a partial write inside EOF fails, it can leave delayed
allocation blocks lying around because they don't get punched back
out. This leads to assert failures like:

XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 847

when evicting inodes from the cache. This can be trivially triggered
by xfstests 083, which takes between 5 and 15 executions on a 512
byte block size filesystem to trip over this. Debugging shows a
failed write due to ENOSPC calling xfs_vm_write_failed such as:

[ 5012.329024] ino 0xa0026: vwf to 0x17000, sze 0x1c85ae

and no action is taken on it. This leaves behind a delayed
allocation extent that has no page covering it and no data in it:

[ 5015.867162] ino 0xa0026: blks: 0x83 delay blocks 0x1, size 0x2538c0
[ 5015.868293] ext 0: off 0x4a, fsb 0x50306, len 0x1
[ 5015.869095] ext 1: off 0x4b, fsb 0x7899, len 0x6b
[ 5015.869900] ext 2: off 0xb6, fsb 0xffffffffe0008, len 0x1
                                    ^^^^^^^^^^^^^^^
[ 5015.871027] ext 3: off 0x36e, fsb 0x7a27, len 0xd
[ 5015.872206] ext 4: off 0x4cf, fsb 0x7a1d, len 0xa

So the delayed allocation extent is one block long at offset
0x16c00. Tracing shows that a bigger write:

xfs_file_buffered_write: size 0x1c85ae offset 0x959d count 0x1ca3f ioflags

allocates the block, and then fails with ENOSPC trying to allocate
the last block on the page, leading to a failed write with stale
delalloc blocks on it.

Because we've had an ENOSPC when trying to allocate 0x16e00, it
means that we are never goinge to call ->write_end on the page and
so the allocated new buffer will not get marked dirty or have the
buffer_new state cleared. In other works, what the above write is
supposed to end up with is this mapping for the page:

    +------+------+------+------+------+------+------+------+
      UMA    UMA    UMA    UMA    UMA    UMA    UND    FAIL

where:  U = uptodate
        M = mapped
        N = new
        A = allocated
        D = delalloc
        FAIL = block we ENOSPC'd on.

and the key point being the buffer_new() state for the newly
allocated delayed allocation block. Except it doesn't - we're not
marking buffers new correctly.

That buffer_new() problem goes back to the xfs_iomap removal days,
where xfs_iomap() used to return a "new" status for any map with
newly allocated blocks, so that __xfs_get_blocks() could call
set_buffer_new() on it. We still have the "new" variable and the
check for it in the set_buffer_new() logic - except we never set it
now!

Hence that newly allocated delalloc block doesn't have the new flag
set on it, so when the write fails we cannot tell which blocks we
are supposed to punch out. WHy do we need the buffer_new flag? Well,
that's because we can have this case:

    +------+------+------+------+------+------+------+------+
      UMD    UMD    UMD    UMD    UMD    UMD    UND    FAIL

where all the UMD buffers contain valid data from a previously
successful write() system call. We only want to punch the UND buffer
because that's the only one that we added in this write and it was
only this write that failed.

That implies that even the old buffer_new() logic was wrong -
because it would result in all those UMD buffers on the page having
set_buffer_new() called on them even though they aren't new. Hence
we shoul donly be calling set_buffer_new() for delalloc buffers that
were allocated (i.e. were a hole before xfs_iomap_write_delay() was
called).

So, fix this set_buffer_new logic according to how we need it to
work for handling failed writes correctly. Also, restore the new
buffer logic handling for blocks allocated via
xfs_iomap_write_direct(), because it should still set the buffer_new
flag appropriately for newly allocated blocks, too.

SO, now we have the buffer_new() being set appropriately in
__xfs_get_blocks(), we can detect the exact delalloc ranges that
we allocated in a failed write, and hence can now do a walk of the
buffers on a page to find them.

Except, it's not that easy. When block_write_begin() fails, it
unlocks and releases the page that we just had an error on, so we
can't use that page to handle errors anymore. We have to get access
to the page while it is still locked to walk the buffers. Hence we
have to open code block_write_begin() in xfs_vm_write_begin() to be
able to insert xfs_vm_write_failed() is the right place.

With that, we can pass the page and write range to
xfs_vm_write_failed() and walk the buffers on the page, looking for
delalloc buffers that are either new or beyond EOF and punch them
out. Handling buffers beyond EOF ensures we still handle the
existing case that xfs_vm_write_failed() handles.

Of special note is the truncate_pagecache() handling - that only
should be done for pages outside EOF - pages within EOF can still
contain valid, dirty data so we must not punch them out of the
cache.

That just leaves the xfs_vm_write_end() failure handling.
The only failure case here is that we didn't copy the entire range,
and generic_write_end() handles that by zeroing the region of the
page that wasn't copied, we don't have to punch out blocks within
the file because they are guaranteed to contain zeros. Hence we only
have to handle the existing "beyond EOF" case and don't need access
to the buffers on the page. Hence it remains largely unchanged.

Note that xfs_getbmap() can still trip over delalloc blocks beyond
EOF that are left there by speculative delayed allocation. Hence
this bug fix does not solve all known issues with bmap vs delalloc,
but it does fix all the the known accidental occurances of the
problem.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:36 -05:00
..
Kconfig
kmem.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
kmem.h xfs: use a normal shrinker for the dquot freelist 2012-02-10 12:38:09 -06:00
Makefile xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
mrlock.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
time.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
uuid.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
uuid.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs.h xfs: don't expect xfs headers to be in subdirectories 2011-08-12 13:57:55 -05:00
xfs_acl.c xfs: fix acl count validation in xfs_acl_from_disk() 2011-12-16 15:17:42 -06:00
xfs_acl.h xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set 2011-08-01 02:35:04 -04:00
xfs_ag.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_alloc.c xfs: fix fstrim offset calculations 2012-03-27 16:07:03 -05:00
xfs_alloc.h xfs: fix fstrim offset calculations 2012-03-27 16:07:03 -05:00
xfs_alloc_btree.c xfs: remove leftovers of the old btree tracing code 2011-07-13 13:43:50 +02:00
xfs_alloc_btree.h
xfs_aops.c xfs: punch new delalloc blocks out of failed writes inside EOF. 2012-05-14 16:20:36 -05:00
xfs_aops.h xfs: log file size updates at I/O completion time 2012-03-13 16:30:49 -05:00
xfs_attr.c xfs: add lots of attribute trace points 2012-03-27 17:18:21 -05:00
xfs_attr.h
xfs_attr_leaf.c xfs: add lots of attribute trace points 2012-03-27 17:18:21 -05:00
xfs_attr_leaf.h
xfs_attr_sf.h
xfs_bit.c
xfs_bit.h
xfs_bmap.c xfs: fix deadlock in xfs_rtfree_extent 2012-03-22 15:31:06 -05:00
xfs_bmap.h xfs: cleanup xfs_bmap.h 2011-10-11 21:15:07 -05:00
xfs_bmap_btree.c xfs: remove leftovers of the old btree tracing code 2011-07-13 13:43:50 +02:00
xfs_bmap_btree.h
xfs_btree.c xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF 2011-10-11 21:15:09 -05:00
xfs_btree.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_buf.c xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_buf.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_buf_item.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_buf_item.h
xfs_da_btree.c xfs: add lots of attribute trace points 2012-03-27 17:18:21 -05:00
xfs_da_btree.h xfs: remove the dead XFS_DABUF_DEBUG code 2011-07-13 13:43:50 +02:00
xfs_dfrag.c xfs: split in-core and on-disk inode log item fields 2012-03-13 17:08:17 -05:00
xfs_dfrag.h
xfs_dinode.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_dir2.c xfs: get rid of open-coded S_ISREG(), etc. 2011-07-26 15:05:16 -04:00
xfs_dir2.h xfs: reshuffle dir2 headers 2011-07-13 13:43:48 +02:00
xfs_dir2_block.c xfs: clean up minor sparse warnings 2012-03-14 13:21:17 -05:00
xfs_dir2_data.c xfs: reshuffle dir2 headers 2011-07-13 13:43:48 +02:00
xfs_dir2_format.h xfs: cleanup struct xfs_dir2_free 2011-07-13 13:43:48 +02:00
xfs_dir2_leaf.c xfs: introduce xfs_bmapi_read() 2011-10-11 21:15:03 -05:00
xfs_dir2_node.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-07-25 13:56:39 -07:00
xfs_dir2_priv.h xfs: reshuffle dir2 headers 2011-07-13 13:43:48 +02:00
xfs_dir2_sf.c xfs: reshuffle dir2 headers 2011-07-13 13:43:48 +02:00
xfs_discard.c xfs: fix fstrim offset calculations 2012-03-27 16:07:03 -05:00
xfs_discard.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_dquot.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_dquot.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_dquot_item.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_dquot_item.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_error.c xfs: Convert remaining cmn_err() callers to new API 2011-03-07 10:08:35 +11:00
xfs_error.h xfs: kill support/debug.[ch] 2011-03-07 10:09:35 +11:00
xfs_export.c xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels 2011-12-06 10:46:23 -06:00
xfs_export.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_extfree_item.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_extfree_item.h xfs: Pull EFI/EFD handling out from under the AIL lock 2010-12-20 11:59:49 +11:00
xfs_file.c xfs: push the ilock into xfs_zero_eof 2012-05-14 16:20:20 -05:00
xfs_filestream.c xfs: rename allocation range fields in struct xfs_bmalloca 2011-10-11 21:15:06 -05:00
xfs_filestream.h
xfs_fs.h xfs: consolidate & clarify mount sanity checks 2011-07-08 11:32:51 -05:00
xfs_fs_subr.c xfs: remove the i_size field in struct xfs_inode 2012-01-17 15:08:53 -06:00
xfs_fsops.c xfs: Check the return value of xfs_buf_get() 2011-10-11 21:15:01 -05:00
xfs_fsops.h xfs: ensure log covering transactions are synchronous 2011-01-11 20:28:17 -06:00
xfs_globals.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_ialloc.c xfs: propagate umode_t 2012-01-03 22:55:00 -05:00
xfs_ialloc.h xfs: propagate umode_t 2012-01-03 22:55:00 -05:00
xfs_ialloc_btree.c xfs: remove leftovers of the old btree tracing code 2011-07-13 13:43:50 +02:00
xfs_ialloc_btree.h
xfs_iget.c xfs: remove log item from AIL in xfs_iflush after a shutdown 2012-05-14 16:20:25 -05:00
xfs_inode.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_inode.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_inode_item.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_inode_item.h xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_inum.h xfs: cleanup shortform directory inode number handling 2011-07-08 14:35:03 +02:00
xfs_ioctl.c xfs: Fix open flag handling in open_by_handle code 2012-03-22 15:56:52 -05:00
xfs_ioctl.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_ioctl32.c xfs: clean up minor sparse warnings 2012-03-14 13:21:17 -05:00
xfs_ioctl32.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_iomap.c xfs: use shared ilock mode for direct IO writes by default 2012-05-14 16:20:21 -05:00
xfs_iomap.h
xfs_iops.c xfs: push the ilock into xfs_zero_eof 2012-05-14 16:20:20 -05:00
xfs_iops.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_itable.c xfs: don't cache inodes read through bulkstat 2012-03-26 17:19:08 -05:00
xfs_itable.h
xfs_linux.h xfs: revert to using a kthread for AIL pushing 2011-10-11 11:02:49 -05:00
xfs_log.c xfs: allow assigning the tail lsn with the AIL lock held 2012-05-14 16:20:26 -05:00
xfs_log.h xfs: allow assigning the tail lsn with the AIL lock held 2012-05-14 16:20:26 -05:00
xfs_log_cil.c xfs: Do background CIL flushes via a workqueue 2012-05-14 16:20:34 -05:00
xfs_log_priv.h xfs: Do background CIL flushes via a workqueue 2012-05-14 16:20:34 -05:00
xfs_log_recover.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_log_recover.h
xfs_message.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_message.h treewide: use __printf not __attribute__((format(printf,...))) 2011-10-31 17:30:54 -07:00
xfs_mount.c xfs: implement freezing by emptying the AIL 2012-05-14 16:20:27 -05:00
xfs_mount.h xfs: Do background CIL flushes via a workqueue 2012-05-14 16:20:34 -05:00
xfs_mru_cache.c xfs: convert to alloc_workqueue() 2011-02-01 11:42:43 +01:00
xfs_mru_cache.h
xfs_qm.c xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_qm.h xfs: remove the global xfs_Gqm structure 2012-03-14 12:06:32 -05:00
xfs_qm_bhv.c xfs: remove the global xfs_Gqm structure 2012-03-14 12:06:32 -05:00
xfs_qm_syscalls.c xfs: remove the per-filesystem list of dquots 2012-03-14 11:53:34 -05:00
xfs_quota.h Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage 2012-02-03 11:32:20 -06:00
xfs_quota_priv.h xfs: use per-filesystem radix trees for dquot lookup 2012-03-14 11:09:06 -05:00
xfs_quotaops.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_rename.c vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link} 2012-03-20 21:29:32 -04:00
xfs_rtalloc.c xfs: fix deadlock in xfs_rtfree_extent 2012-03-22 15:31:06 -05:00
xfs_rtalloc.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_rw.c xfs: clean up xfs_ioerror_alert 2011-10-11 21:15:10 -05:00
xfs_rw.h xfs: clean up xfs_ioerror_alert 2011-10-11 21:15:10 -05:00
xfs_sb.h xfs: kill the unused XFS_BB_FSB_OFFSET macro 2012-02-02 17:08:04 -06:00
xfs_stats.c xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_stats.h xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_super.c xfs: Do background CIL flushes via a workqueue 2012-05-14 16:20:34 -05:00
xfs_super.h xfs: remove the global xfs_Gqm structure 2012-03-14 12:06:32 -05:00
xfs_sync.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_sync.h xfs: log timestamp updates 2012-03-13 17:01:15 -05:00
xfs_sysctl.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_sysctl.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_trace.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_trace.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_trans.c xfs: split and cleanup xfs_log_reserve 2012-02-22 22:37:04 -06:00
xfs_trans.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_trans_ail.c xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_trans_buf.c xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_trans_dquot.c xfs: remove the global xfs_Gqm structure 2012-03-14 12:06:32 -05:00
xfs_trans_extfree.c xfs: Pull EFI/EFD handling out from under the AIL lock 2010-12-20 11:59:49 +11:00
xfs_trans_inode.c xfs: split in-core and on-disk inode log item fields 2012-03-13 17:08:17 -05:00
xfs_trans_priv.h xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_trans_space.h
xfs_types.h xfs: exact busy extent tracking 2011-04-28 13:18:04 -05:00
xfs_utils.c vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link} 2012-03-20 21:29:32 -04:00
xfs_utils.h xfs: propagate umode_t 2012-01-03 22:55:00 -05:00
xfs_vnode.h xfs: remove remaining scraps of struct xfs_iomap 2012-03-15 13:40:16 -05:00
xfs_vnodeops.c vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link} 2012-03-20 21:29:32 -04:00
xfs_vnodeops.h xfs: remove remaining scraps of struct xfs_iomap 2012-03-15 13:40:16 -05:00
xfs_xattr.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00