linux/fs/btrfs
Filipe Manana 56f23fdbb6 Btrfs: fix file/data loss caused by fsync after rename and new inode
If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.

Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:

   # Scenario 1

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir -p /mnt/a/x
   echo "hello" > /mnt/a/x/foo
   echo "world" > /mnt/a/x/bar
   sync
   mv /mnt/a/x /mnt/a/y
   mkdir /mnt/a/x
   xfs_io -c fsync /mnt/a/x
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and
   the directory "y" does not exist nor do the files "foo" and
   "bar" exist anywhere (neither in "y" nor in "x", nor the root
   nor anywhere).

   # Scenario 2

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/a
   echo "hello" > /mnt/a/foo
   sync
   mv /mnt/a/foo /mnt/a/bar
   echo "world" > /mnt/a/foo
   xfs_io -c fsync /mnt/a/foo
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and the
   file "bar" does not exists anymore. A file with the name "foo"
   exists and it matches the second file we created.

Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/testdir
   btrfs subvolume snapshot /mnt /mnt/testdir/snap
   btrfs subvolume delete /mnt/testdir/snap
   rmdir /mnt/testdir
   mkdir /mnt/testdir
   xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir
   <power failure>

   The next time the fs is mounted the log replay procedure fails because
   it attempts to delete the snapshot entry (which has dir item key type
   of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
   resulting in the following error that causes mount to fail:

   [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
   [52174.512570] ------------[ cut here ]------------
   [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
   [52174.514681] BTRFS: Transaction aborted (error -2)
   [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
   [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G        W       4.5.0-rc6-btrfs-next-27+ #1
   [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
   [52174.524053]  0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
   [52174.524053]  0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
   [52174.524053]  00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
   [52174.524053] Call Trace:
   [52174.524053]  [<ffffffff81264e93>] dump_stack+0x67/0x90
   [52174.524053]  [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2
   [52174.524053]  [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50
   [52174.524053]  [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff8118f5e9>] ? iput+0xb0/0x284
   [52174.524053]  [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
   [52174.524053]  [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs]
   [52174.524053]  [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs]
   [52174.524053]  [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs]
   [52174.524053]  [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs]
   [52174.524053]  [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs]
   [52174.524053]  [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs]
   [52174.524053]  [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs]
   [52174.524053]  [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffff8119590f>] do_mount+0x8a6/0x9e8
   [52174.524053]  [<ffffffff811358dd>] ? strndup_user+0x3f/0x59
   [52174.524053]  [<ffffffff81195c65>] SyS_mount+0x77/0x9f
   [52174.524053]  [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b
   [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).

Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:

  * fstests: generic test for fsync after renaming directory
    https://patchwork.kernel.org/patch/8694281/

  * fstests: generic test for fsync after renaming file
    https://patchwork.kernel.org/patch/8694301/

  * fstests: add btrfs test for fsync after snapshot deletion
    https://patchwork.kernel.org/patch/8670671/

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-04-06 17:01:44 -07:00
..
tests btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
acl.c Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-01-18 12:44:40 -08:00
async-thread.c btrfs: async-thread: Fix a use-after-free error for trace 2016-01-25 16:50:26 -08:00
async-thread.h btrfs: async_thread: Fix workqueue 'max_active' value when initializing 2015-08-31 11:46:40 -07:00
backref.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
backref.h btrfs: cleanup, remove inode_item_info helper 2015-01-14 19:23:47 +01:00
btrfs_inode.h btrfs: put delayed item hook into inode 2016-01-07 14:26:58 +01:00
check-integrity.c btrfs: Fix misspellings in comments. 2016-03-14 15:05:02 +01:00
check-integrity.h
compression.c Btrfs: remove no longer used function extent_read_full_page_nolock() 2016-02-03 19:27:10 +00:00
compression.h btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
ctree.c btrfs: fallback to vmalloc in btrfs_compare_tree 2016-04-04 16:29:22 +02:00
ctree.h btrfs: Fix misspellings in comments. 2016-03-14 15:05:02 +01:00
delayed-inode.c btrfs: Print Warning only if ENOSPC_DEBUG is enabled 2016-03-14 14:59:54 +01:00
delayed-inode.h btrfs: properly set the termination value of ctx->pos in readdir 2016-02-11 07:01:59 -08:00
delayed-ref.c btrfs: drop null testing before destroy functions 2016-02-18 11:46:03 +01:00
delayed-ref.h btrfs: better packing of btrfs_delayed_extent_op 2016-01-07 14:26:58 +01:00
dev-replace.c btrfs: Reset IO error counters before start of device replacing 2016-04-04 16:29:22 +02:00
dev-replace.h Btrfs: fix lockdep deadlock warning due to dev_replace 2016-02-23 13:10:10 +01:00
dir-item.c Btrfs: make xattr replace operations atomic 2014-11-20 17:20:07 -08:00
disk-io.c Merge branch 'misc-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 2016-03-24 17:36:13 -07:00
disk-io.h Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent-tree.c btrfs: Output more info for enospc_debug mount option 2016-04-04 16:29:22 +02:00
extent_io.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
extent_io.h Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
extent_map.c btrfs: Fix misspellings in comments. 2016-03-14 15:05:02 +01:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
file-item.c btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums 2016-03-21 07:25:44 -07:00
file.c Btrfs: Improve FL_KEEP_SIZE handling in fallocate 2016-04-04 16:25:28 +02:00
free-space-cache.c Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
free-space-cache.h btrfs: constify remaining structs with function pointers 2016-01-07 15:01:14 +01:00
free-space-tree.c Revert "btrfs: synchronize incompat feature bits with sysfs files" 2016-01-29 08:19:37 -08:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c btrfs: LLVMLinux: Remove VLAIS 2014-10-14 10:51:22 +02:00
hash.h
inode-item.c Btrfs: consolidate btrfs_error() to btrfs_std_error() 2015-09-29 16:30:00 +02:00
inode-map.c Btrfs: Show a warning message if one of objectid reaches its highest value 2016-03-11 17:12:35 +01:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c Btrfs: fix deadlock between direct IO reads and buffered writes 2016-03-01 08:23:37 -08:00
ioctl.c Btrfs: don't use src fd for printk 2016-04-04 16:29:22 +02:00
Kconfig rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h btrfs: fix lockups from btrfs_clear_path_blocking 2014-11-19 10:34:35 -08:00
lzo.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h btrfs: cleanup 64bit/32bit divs, compile time constants 2015-03-03 17:23:57 +01:00
ordered-data.c btrfs: Fix misspellings in comments. 2016-03-14 15:05:02 +01:00
ordered-data.h Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
orphan.c
print-tree.c btrfs: teach print_leaf about temporary item subtypes 2016-02-11 16:15:43 +01:00
print-tree.h
props.c btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
props.h
qgroup.c btrfs: Add qgroup tracing 2016-04-04 16:29:22 +02:00
qgroup.h btrfs: qgroup: Check if qgroup reserved space leaked 2015-10-21 18:41:10 -07:00
raid56.c btrfs: raid56: Use raid_write_end_io for scrub 2016-01-20 07:22:18 -08:00
raid56.h Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation 2015-08-09 07:34:26 -07:00
rcu-string.h
reada.c Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
relocation.c Btrfs: fix invalid reference in replace_path 2016-04-04 16:29:18 +02:00
root-tree.c btrfs: Replace CURRENT_TIME by current_fs_time() 2016-02-18 11:46:03 +01:00
scrub.c btrfs: scrub: silence an uninitialized variable warning 2016-03-11 17:21:59 +01:00
send.c btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c
super.c btrfs: rename btrfs_print_info to btrfs_print_mod_info 2016-03-11 17:12:46 +01:00
sysfs.c btrfs: sysfs: check initialization state before updating features 2016-01-27 05:40:10 -08:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
transaction.h btrfs: preallocate path for snapshot creation at ioctl time 2016-01-07 15:20:55 +01:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c Btrfs: fix file/data loss caused by fsync after rename and new inode 2016-04-06 17:01:44 -07:00
tree-log.h Btrfs: fix unreplayable log after snapshot delete + parent dir fsync 2016-03-01 08:23:25 -08:00
ulist.c btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
ulist.h btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
uuid-tree.c Btrfs: make btrfs_search_forward return with nodes unlocked 2014-09-17 13:38:02 -07:00
volumes.c btrfs: Fix misspellings in comments. 2016-03-14 15:05:02 +01:00
volumes.h Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
xattr.c Btrfs: fix listxattrs not listing all xattrs packed in the same item 2016-03-01 08:23:41 -08:00
xattr.h btrfs: Use xattr handler infrastructure 2015-12-06 21:34:14 -05:00
zlib.c btrfs: constify structs with op functions or static definitions 2015-02-16 18:48:44 +01:00