linux/fs/xfs
Dave Chinner 07428d7f0c xfs: fix attr tree double split corruption
In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2.  The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty.  The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

     +--------+     +--------+
     |   L1   |     |   L2   |
     | fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |
     | fsp: 0 |     | fsp: N |
     |--------|     |--------|
     | attr A |     | attr 1 |
     |--------|     |--------|
     | attr B |     | attr 2 |
     |--------|     |--------|
     ..........     ..........
     |--------|     |--------|
     | attr X |     | attr n |
     +--------+     +--------+


So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:


     +--------+     +--------+     +--------+
     |   L1   |     |   L3   |     |   L2   |
     | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|
     | attr A |     | attr X |     | attr 1 |
     |--------|     +--------+     |--------|
     | attr B |                    | attr 2 |
     |--------|                    |--------|
     ..........                    ..........
     |--------|                    |--------|
     | attr W |                    | attr n |
     +--------+                    +--------+


And we track that the original attribute is now at L3:0.

We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:


     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     + INCOMP +     +--------+     |--------|
     | attr B |     +--------+                    | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.

This is where we are supposed to end up with this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     +--------+     + INCOMP +     |--------|
     | attr B |                    +--------+     | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1.  On a debug kernel, this assert fails like so:

XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.

xfs_repair will report something like:

bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0

The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.

With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.

Hopefully this table will show this in a foramt that is a bit easier
to understand:

Split		old attr index		new attr index
		vanilla	patched		vanilla	patched
before 1st	L1:26	L1:26		N/A	N/A
after 1st	L3:0	L3:0		L1:26	L1:26
after 2nd	L4:0	L3:0		L4:1	L4:0
                ^^^^			^^^^
		wrong			wrong

The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-13 14:45:29 -06:00
..
Kconfig
kmem.c xfs: switch to proper __bitwise type for KM_... flags 2012-05-29 23:28:32 -04:00
kmem.h xfs: switch to proper __bitwise type for KM_... flags 2012-05-29 23:28:32 -04:00
Makefile xfs: remove xfs_iget.c 2012-10-17 13:42:25 -05:00
mrlock.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
time.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
uuid.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
uuid.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs.h xfs: don't expect xfs headers to be in subdirectories 2011-08-12 13:57:55 -05:00
xfs_acl.c userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2012-09-18 01:01:35 -07:00
xfs_acl.h xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set 2011-08-01 02:35:04 -04:00
xfs_ag.h xfs: add EOFBLOCKS inode tagging/untagging 2012-11-08 14:20:44 -06:00
xfs_alloc.c xfs: move allocation stack switch up to xfs_bmapi_allocate 2012-10-18 17:42:48 -05:00
xfs_alloc.h xfs: move allocation stack switch up to xfs_bmapi_allocate 2012-10-18 17:42:48 -05:00
xfs_alloc_btree.c xfs: invalidate allocbt blocks moved to the free list 2012-11-07 14:13:30 -06:00
xfs_alloc_btree.h xfs: struct xfs_buf_log_format isn't variable sized. 2012-07-01 14:50:04 -05:00
xfs_aops.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
xfs_aops.h Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. 2012-07-22 11:00:55 -05:00
xfs_attr.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_attr.h
xfs_attr_leaf.c xfs: fix attr tree double split corruption 2012-11-13 14:45:29 -06:00
xfs_attr_leaf.h xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_attr_sf.h
xfs_bit.c
xfs_bit.h
xfs_bmap.c xfs: move allocation stack switch up to xfs_bmapi_allocate 2012-10-18 17:42:48 -05:00
xfs_bmap.h xfs: move allocation stack switch up to xfs_bmapi_allocate 2012-10-18 17:42:48 -05:00
xfs_bmap_btree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_bmap_btree.h
xfs_btree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_btree.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_buf.c xfs: fix race while discarding buffers [V4] 2012-08-29 15:01:11 -05:00
xfs_buf.h xfs: fix race while discarding buffers [V4] 2012-08-29 15:01:11 -05:00
xfs_buf_item.c xfs: fix buffer shudown reference count mismatch 2012-11-07 15:26:53 -06:00
xfs_buf_item.h xfs: support discontiguous buffers in the xfs_buf_log_item 2012-07-01 14:50:06 -05:00
xfs_da_btree.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_da_btree.h xfs: fix comment typo of struct xfs_da_blkinfo. 2012-07-22 10:34:42 -05:00
xfs_dfrag.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
xfs_dfrag.h
xfs_dinode.h xfs: fix typo in comment of xfs_dinode_t. 2012-06-14 12:28:26 -05:00
xfs_dir2.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2.h xfs: reshuffle dir2 headers 2011-07-13 13:43:48 +02:00
xfs_dir2_block.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2_data.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2_format.h xfs: cleanup struct xfs_dir2_free 2011-07-13 13:43:48 +02:00
xfs_dir2_leaf.c xfs: factor buffer reading from xfs_dir2_leaf_getdents 2012-07-01 14:50:08 -05:00
xfs_dir2_node.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2_priv.h xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2_sf.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_discard.c xfs: check for possible overflow in xfs_ioc_trim 2012-08-23 14:48:44 -05:00
xfs_discard.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_dquot.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_dquot.h xfs: on-stack delayed write buffer lists 2012-05-14 16:20:31 -05:00
xfs_dquot_item.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_dquot_item.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_error.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_error.h xfs: kill support/debug.[ch] 2011-03-07 10:09:35 +11:00
xfs_export.c xfs: remove xfs_iget.c 2012-10-17 13:42:25 -05:00
xfs_export.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_extent_busy.c xfs: make xfs_extent_busy_trim not static 2012-05-14 16:21:04 -05:00
xfs_extent_busy.h xfs: make xfs_extent_busy_trim not static 2012-05-14 16:21:04 -05:00
xfs_extfree_item.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_extfree_item.h xfs: Pull EFI/EFD handling out from under the AIL lock 2010-12-20 11:59:49 +11:00
xfs_file.c xfs: xfs_sync_data is redundant. 2012-10-17 12:01:25 -05:00
xfs_filestream.c xfs: rename allocation range fields in struct xfs_bmalloca 2011-10-11 21:15:06 -05:00
xfs_filestream.h
xfs_fs.h xfs: add minimum file size filtering to eofblocks scan 2012-11-08 15:32:29 -06:00
xfs_fs_subr.c xfs: remove the i_size field in struct xfs_inode 2012-01-17 15:08:53 -06:00
xfs_fsops.c xfs: report projid32bit feature in geometry call 2012-11-07 15:27:27 -06:00
xfs_fsops.h xfs: ensure log covering transactions are synchronous 2011-01-11 20:28:17 -06:00
xfs_globals.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_ialloc.c xfs: Update inode alloc comments 2012-11-02 17:13:12 -05:00
xfs_ialloc.h xfs: remove the alloc_done argument to xfs_dialloc 2012-07-29 16:00:31 -05:00
xfs_ialloc_btree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_ialloc_btree.h
xfs_icache.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_icache.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_inode.c xfs: create helper to check whether to free eofblocks on inode 2012-11-08 15:22:34 -06:00
xfs_inode.h xfs: create helper to check whether to free eofblocks on inode 2012-11-08 15:22:34 -06:00
xfs_inode_item.c xfs: check for stale inode before acquiring iflock on push 2012-06-21 14:20:06 -05:00
xfs_inode_item.h xfs: pass shutdown method into xfs_trans_ail_delete_bulk 2012-05-14 16:20:33 -05:00
xfs_inum.h xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_ioctl.c xfs: add inode id filtering to eofblocks scan 2012-11-08 15:29:14 -06:00
xfs_ioctl.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_ioctl32.c xfs: Convert to new freezing code 2012-07-31 09:45:48 +04:00
xfs_ioctl32.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_iomap.c xfs: add EOFBLOCKS inode tagging/untagging 2012-11-08 14:20:44 -06:00
xfs_iomap.h xfs: kill xfs_iomap 2010-12-16 16:05:51 -06:00
xfs_iops.c xfs: add EOFBLOCKS inode tagging/untagging 2012-11-08 14:20:44 -06:00
xfs_iops.h xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
xfs_itable.c xfs: remove xfs_iget.c 2012-10-17 13:42:25 -05:00
xfs_itable.h
xfs_linux.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_log.c xfs: only update the last_sync_lsn when a transaction completes 2012-10-17 13:43:35 -05:00
xfs_log.h xfs: xfs_quiesce_attr() should quiesce the log like unmount 2012-10-17 13:39:14 -05:00
xfs_log_cil.c xfs: rename log structure to xlog 2012-06-21 14:21:11 -05:00
xfs_log_priv.h xfs: sync work is now only periodic log work 2012-10-17 11:53:29 -05:00
xfs_log_recover.c xfs: fix reading of wrapped log data 2012-11-07 15:27:17 -06:00
xfs_log_recover.h
xfs_message.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_message.h treewide: use __printf not __attribute__((format(printf,...))) 2011-10-31 17:30:54 -07:00
xfs_mount.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_mount.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_mru_cache.c xfs: convert to alloc_workqueue() 2011-02-01 11:42:43 +01:00
xfs_mru_cache.h
xfs_qm.c xfs: remove xfs_iget.c 2012-10-17 13:42:25 -05:00
xfs_qm.h xfs: remove the global xfs_Gqm structure 2012-03-14 12:06:32 -05:00
xfs_qm_bhv.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_qm_syscalls.c xfs: support a tag-based inode_ag_iterator 2012-11-08 15:20:38 -06:00
xfs_quota.h Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage 2012-02-03 11:32:20 -06:00
xfs_quota_priv.h xfs: use per-filesystem radix trees for dquot lookup 2012-03-14 11:09:06 -05:00
xfs_quotaops.c userns: Convert qutoactl 2012-09-18 01:01:39 -07:00
xfs_rename.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_rtalloc.c xfs: remove xfs_iget.c 2012-10-17 13:42:25 -05:00
xfs_rtalloc.h xfs: Remove the macro XFS_BUF_PTR 2011-07-25 15:03:13 -05:00
xfs_sb.h xfs: kill the unused XFS_BB_FSB_OFFSET macro 2012-02-02 17:08:04 -06:00
xfs_stats.c xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_stats.h xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_super.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_super.h xfs: xfs_sync_data is redundant. 2012-10-17 12:01:25 -05:00
xfs_sysctl.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_sysctl.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_trace.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_trace.h xfs: create function to scan and clear EOFBLOCKS inodes 2012-11-08 15:25:40 -06:00
xfs_trans.c xfs: Convert to new freezing code 2012-07-31 09:45:48 +04:00
xfs_trans.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
xfs_trans_ail.c xfs: re-enable xfsaild idle mode and fix associated races 2012-07-29 16:27:57 -05:00
xfs_trans_buf.c xfs: add discontiguous buffer support to transactions 2012-07-01 14:50:06 -05:00
xfs_trans_dquot.c userns: Convert quota netlink aka quota_send_warning 2012-09-18 01:01:40 -07:00
xfs_trans_extfree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_trans_inode.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_trans_priv.h xfs: re-enable xfsaild idle mode and fix associated races 2012-07-29 16:27:57 -05:00
xfs_trans_space.h
xfs_types.h xfs: struct xfs_buf_log_format isn't variable sized. 2012-07-01 14:50:04 -05:00
xfs_utils.c xfs: remove the alloc_done argument to xfs_dialloc 2012-07-29 16:00:31 -05:00
xfs_utils.h xfs: propagate umode_t 2012-01-03 22:55:00 -05:00
xfs_vnode.h xfs: remove remaining scraps of struct xfs_iomap 2012-03-15 13:40:16 -05:00
xfs_vnodeops.c xfs: make xfs_free_eofblocks() non-static, return EAGAIN on trylock failure 2012-11-08 15:24:26 -06:00
xfs_vnodeops.h xfs: make xfs_free_eofblocks() non-static, return EAGAIN on trylock failure 2012-11-08 15:24:26 -06:00
xfs_xattr.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00