linux/fs/xfs
Dave Chinner 9d11001420 xfs: limit iclog tail updates
From the department of "generic/482 keeps on giving", we bring you
another tail update race condition:

iclog:
	S1			C1
	+-----------------------+-----------------------+
				 S2			EOIC

Two checkpoints in a single iclog. One is complete, the other just
contains the start record and overruns into a new iclog.

Timeline:

Before S1:	Cache flush, log tail = X
At S1:		Metadata stable, write start record and checkpoint
At C1:		Write commit record, set NEED_FUA
		Single iclog checkpoint, so no need for NEED_FLUSH
		Log tail still = X, so no need for NEED_FLUSH

After C1,
Before S2:	Cache flush, log tail = X
At S2:		Metadata stable, write start record and checkpoint
After S2:	Log tail moves to X+1
At EOIC:	End of iclog, more journal data to write
		Releases iclog
		Not a commit iclog, so no need for NEED_FLUSH
		Writes log tail X+1 into iclog.

At this point, the iclog has tail X+1 and NEED_FUA set. There has
been no cache flush for the metadata between X and X+1, and the
iclog writes the new tail permanently to the log. THis is sufficient
to violate on disk metadata/journal ordering.

We have two options here. The first is to detect this case in some
manner and ensure that the partial checkpoint write sets NEED_FLUSH
when the iclog is already marked NEED_FUA and the log tail changes.
This seems somewhat fragile and quite complex to get right, and it
doesn't actually make it obvious what underlying problem it is
actually addressing from reading the code.

The second option seems much cleaner to me, because it is derived
directly from the requirements of the C1 commit record in the iclog.
That is, when we write this commit record to the iclog, we've
guaranteed that the metadata/data ordering is correct for tail
update purposes. Hence if we only write the log tail into the iclog
for the *first* commit record rather than the log tail at the last
release, we guarantee that the log tail does not move past where the
the first commit record in the log expects it to be.

IOWs, taking the first option means that replay of C1 becomes
dependent on future operations doing the right thing, not just the
C1 checkpoint itself doing the right thing. This makes log recovery
almost impossible to reason about because now we have to take into
account what might or might not have happened in the future when
looking at checkpoints in the log rather than just having to
reconstruct the past...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-29 09:27:29 -07:00
..
libxfs xfs: logging the on disk inode LSN can make it go backwards 2021-07-29 09:27:29 -07:00
scrub xfs: detect misaligned rtinherit directory extent size hints 2021-07-15 09:58:42 -07:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c xfs: remove kmem_realloc() 2020-09-06 18:05:51 -07:00
kmem.h xfs: Remove kmem_zalloc_large() 2020-09-15 20:52:41 -07:00
Makefile xfs: refactor log recovery item sorting into a generic dispatch structure 2020-05-08 08:49:58 -07:00
mrlock.h
xfs.h
xfs_acl.c xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_acl.h fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
xfs_aops.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
xfs_aops.h
xfs_attr_inactive.c xfs: Add delay ready attr remove routines 2021-06-01 10:49:47 -07:00
xfs_attr_list.c xfs: rename and simplify xfs_bmap_one_block 2021-04-15 09:35:50 -07:00
xfs_bio_io.c xfs: async blkdev cache flush 2021-06-21 10:05:51 -07:00
xfs_bmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_bmap_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_bmap_util.c New code for 5.14: 2021-07-02 14:30:27 -07:00
xfs_bmap_util.h
xfs_buf.c xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_buf.h xfs: remove ->b_offset handling for page backed buffers 2021-06-07 11:49:50 +10:00
xfs_buf_item.c xfs: remove dead stale buf unpin handling code 2021-06-21 10:14:24 -07:00
xfs_buf_item.h xfs: move the buffer retry logic to xfs_buf.c 2020-09-15 20:52:38 -07:00
xfs_buf_item_recover.c xfs: Enforce attr3 buffer recovery order 2021-07-29 09:27:29 -07:00
xfs_dir2_readdir.c xfs: remove XFS_IFINLINE 2021-04-15 09:35:51 -07:00
xfs_discard.c xfs: convert allocbt cursors to use perags 2021-06-02 10:48:24 +10:00
xfs_discard.h
xfs_dquot.c xfs: move the XFS_IFEXTENTS check into xfs_iread_extents 2021-04-15 09:35:50 -07:00
xfs_dquot.h xfs: refactor default quota grace period setting code 2020-09-15 20:52:40 -07:00
xfs_dquot_item.c xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_dquot_item.h xfs: factor out quotaoff intent AIL removal and memory free 2020-03-18 08:12:23 -07:00
xfs_dquot_item_recover.c xfs: rename the ondisk dquot d_flags to d_type 2020-07-28 20:24:14 -07:00
xfs_error.c xfs: add error injection for per-AG resv failure 2021-03-25 16:47:53 -07:00
xfs_error.h xfs: xfs_buf_corruption_error should take __this_address 2020-03-12 07:58:12 -07:00
xfs_export.c xfs: Fix fall-through warnings for Clang 2021-05-26 14:51:26 -05:00
xfs_export.h
xfs_extent_busy.c xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extent_busy.h xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extfree_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_extfree_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_file.c New code for 5.14: 2021-07-02 14:30:27 -07:00
xfs_filestream.c xfs: move xfs_perag_get/put to xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_filestream.h xfs: move the di_flags field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_fsmap.c xfs: remove agno from btree cursor 2021-06-02 10:48:24 +10:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c xfs: shorten the shutdown messages to a single line 2021-06-21 10:14:13 -07:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_health.c xfs: drop IDONTCACHE on inodes when we mark them sick 2021-06-08 09:30:20 -07:00
xfs_icache.c xfs: fix type mismatches in the inode reclaim functions 2021-06-21 10:12:46 -07:00
xfs_icache.h xfs: fix type mismatches in the inode reclaim functions 2021-06-21 10:12:46 -07:00
xfs_icreate_item.c xfs: Remove kmem_zone_zalloc() usage 2020-07-28 20:24:14 -07:00
xfs_icreate_item.h
xfs_inode.c xfs: reset child dir '..' entry when unlinking child 2021-07-15 09:58:42 -07:00
xfs_inode.h xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_inode_item.c xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_inode_item.h xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_inode_item_recover.c xfs: logging the on disk inode LSN can make it go backwards 2021-07-29 09:27:29 -07:00
xfs_ioctl.c xfs: don't expose misaligned extszinherit hints to userspace 2021-07-15 09:58:42 -07:00
xfs_ioctl.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl32.c xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_ioctl32.h xfs: convert to fileattr 2021-04-12 15:04:29 +02:00
xfs_iomap.c xfs: Fix fall-through warnings for Clang 2021-05-26 14:51:26 -05:00
xfs_iomap.h
xfs_iops.c xfs: clean up open-coded fs block unit conversions 2021-06-01 12:53:59 -07:00
xfs_iops.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_itable.c xfs: move the di_crtime field to struct xfs_inode 2021-04-07 14:37:05 -07:00
xfs_itable.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_iwalk.c xfs: use perag for ialloc btree cursors 2021-06-02 10:48:24 +10:00
xfs_iwalk.h
xfs_linux.h xfs: async blkdev cache flush 2021-06-21 10:05:51 -07:00
xfs_log.c xfs: limit iclog tail updates 2021-07-29 09:27:29 -07:00
xfs_log.h xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_log_cil.c xfs: fix ordering violation between cache flushes and tail updates 2021-07-29 09:27:28 -07:00
xfs_log_priv.h xfs: need to see iclog flags in tracing 2021-07-29 09:27:29 -07:00
xfs_log_recover.c xfs: force the log offline when log intent item recovery fails 2021-06-21 10:14:24 -07:00
xfs_message.c xfs: refactor ratelimited buffer error messages into helper 2020-05-07 08:27:46 -07:00
xfs_message.h once: implement DO_ONCE_LITE for non-fast-path "do once" functionality 2021-06-28 15:54:57 -07:00
xfs_mount.c xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes 2021-06-21 10:14:24 -07:00
xfs_mount.h xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_mru_cache.c xfs: set WQ_SYSFS on all workqueues in debug mode 2021-02-03 09:18:49 -08:00
xfs_mru_cache.h
xfs_ondisk.h xfs: rename struct xfs_legacy_ictimestamp 2021-04-22 18:29:25 -07:00
xfs_pnfs.c xfs: move the di_size field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_pnfs.h
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm.c xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_qm.h xfs: move the quotaoff dqrele inode walk into xfs_icache.c 2021-06-03 15:56:02 -07:00
xfs_qm_bhv.c xfs: move the di_projid field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_qm_syscalls.c xfs: move the quotaoff dqrele inode walk into xfs_icache.c 2021-06-03 15:56:02 -07:00
xfs_quota.h xfs: remove xfs_qm_vop_chown_reserve 2021-02-03 09:18:49 -08:00
xfs_quotaops.c xfs: move the di_nblocks field to struct xfs_inode 2021-04-07 14:37:03 -07:00
xfs_refcount_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_refcount_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_reflink.c xfs: convert refcount btree cursor to use perags 2021-06-02 10:48:24 +10:00
xfs_reflink.h xfs: move helpers that lock and unlock two inodes against userspace IO 2020-07-06 10:46:57 -07:00
xfs_rmap_item.c treewide: Change list_sort to use const pointers 2021-04-08 16:04:22 -07:00
xfs_rmap_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_rtalloc.c xfs: fix an integer overflow error in xfs_growfs_rt 2021-07-15 09:58:42 -07:00
xfs_rtalloc.h xfs: remove xfs_buf_t typedef 2020-12-16 16:07:34 -08:00
xfs_stats.c xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_super.h xfs: remove xfs_blkdev_issue_flush 2021-06-21 10:05:46 -07:00
xfs_symlink.c xfs: get rid of xfs_dir_ialloc() 2021-06-02 10:48:24 +10:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: consolidate the eofblocks and cowblocks workers 2021-02-03 09:18:49 -08:00
xfs_sysfs.c
xfs_sysfs.h xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init 2020-08-07 11:50:17 -07:00
xfs_trace.c xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_trace.h xfs: need to see iclog flags in tracing 2021-07-29 09:27:29 -07:00
xfs_trans.c xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_trans.h xfs: xfs_log_force_lsn isn't passed a LSN 2021-06-21 10:12:33 -07:00
xfs_trans_ail.c xfs: delete duplicated words + other fixes 2020-08-05 08:49:58 -07:00
xfs_trans_buf.c xfs: Fix fall-through warnings for Clang 2021-05-26 14:51:26 -05:00
xfs_trans_dquot.c xfs: shut down the filesystem if we screw up quota reservation 2021-02-03 09:18:49 -08:00
xfs_trans_priv.h xfs: refactor adding recovered intent items to the log 2020-05-08 08:50:00 -07:00
xfs_xattr.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00