linux/fs/xfs
Dave Chinner 1ca191576f xfs: Don't use unwritten extents for DAX
DAX has a page fault serialisation problem with block allocation.
Because it allows concurrent page faults and does not have a page
lock to serialise faults to the same page, it can get two concurrent
faults to the page that race.

When two read faults race, this isn't a huge problem as the data
underlying the page is not changing and so "detect and drop" works
just fine. The issues are to do with write faults.

When two write faults occur, we serialise block allocation in
get_blocks() so only one faul will allocate the extent. It will,
however, be marked as an unwritten extent, and that is where the
problem lies - the DAX fault code cannot differentiate between a
block that was just allocated and a block that was preallocated and
needs zeroing. The result is that both write faults end up zeroing
the block and attempting to convert it back to written.

The problem is that the first fault can zero and convert before the
second fault starts zeroing, resulting in the zeroing for the second
fault overwriting the data that the first fault wrote with zeros.
The second fault then attempts to convert the unwritten extent,
which is then a no-op because it's already written. Data loss occurs
as a result of this race.

Because there is no sane locking construct in the page fault code
that we can use for serialisation across the page faults, we need to
ensure block allocation and zeroing occurs atomically in the
filesystem. This means we can still take concurrent page faults and
the only time they will serialise is in the filesystem
mapping/allocation callback. The page fault code will always see
written, initialised extents, so we will be able to remove the
unwritten extent handling from the DAX code when all filesystems are
converted.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 12:37:00 +11:00
..
libxfs xfs: introduce BMAPI_ZERO for allocating zeroed extents 2015-11-03 12:27:22 +11:00
Kconfig xfs: require 64-bit sector_t 2014-07-30 09:12:05 +10:00
kmem.c xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
kmem.h xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
Makefile libxfs: add xfs_bit.c 2015-07-29 11:52:08 +10:00
mrlock.h
uuid.c
uuid.h
xfs.h
xfs_acl.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_acl.h xfs: move acl structures to xfs_format.h 2014-11-28 14:24:37 +11:00
xfs_aops.c xfs: Don't use unwritten extents for DAX 2015-11-03 12:37:00 +11:00
xfs_aops.h xfs: fix inode size update overflow in xfs_map_direct() 2015-11-03 12:27:22 +11:00
xfs_attr.h
xfs_attr_inactive.c Merge branch 'xfs-misc-fixes-for-4.2-3' into for-next 2015-06-23 08:49:01 +10:00
xfs_attr_list.c xfs: pass attr geometry to attr leaf header conversion functions 2015-04-13 11:26:02 +10:00
xfs_bmap_util.c xfs: introduce BMAPI_ZERO for allocating zeroed extents 2015-11-03 12:27:22 +11:00
xfs_bmap_util.h xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate 2015-03-25 15:08:56 +11:00
xfs_buf.c xfs: updates for 4.3-rc1 2015-09-07 13:28:32 -07:00
xfs_buf.h dax: move DAX-related functions to a new header 2015-09-08 15:35:28 -07:00
xfs_buf_item.c Merge branch 'xfs-misc-fixes-for-4.3-3' into for-next 2015-08-25 10:13:35 +10:00
xfs_buf_item.h xfs: fix non-debug build warnings 2015-08-25 10:05:13 +10:00
xfs_dir2_readdir.c xfs: stop holding ILOCK over filldir callbacks 2015-08-19 10:33:00 +10:00
xfs_discard.c xfs: pass mp to XFS_WANT_CORRUPTED_GOTO 2015-02-23 22:39:08 +11:00
xfs_discard.h
xfs_dquot.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_dquot.h xfs: fix implicit bool to int conversion 2015-01-09 10:48:58 +11:00
xfs_dquot_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_dquot_item.h xfs: remove the quotaoff log format from the quotaoff log item 2013-12-13 11:34:08 +11:00
xfs_error.c xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_error.h xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_export.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_export.h
xfs_extent_busy.c xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_extfree_item.h xfs: fix efi/efd error handling to avoid fs shutdown hangs 2015-08-19 09:51:16 +10:00
xfs_file.c xfs: fix inode size update overflow in xfs_map_direct() 2015-11-03 12:27:22 +11:00
xfs_filestream.c xfs: clean up XFS_MIN_FREELIST macros 2015-06-22 10:13:30 +10:00
xfs_filestream.h xfs: add filestream allocator tracepoints 2014-04-23 07:11:52 +10:00
xfs_fsops.c xfs: growfs not aware of sb_meta_uuid 2015-08-19 10:31:41 +10:00
xfs_fsops.h
xfs_globals.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_icache.c xfs: add mssing inode cache attempts counter increment 2015-08-28 14:50:56 +10:00
xfs_icache.h xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_icreate_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_icreate_item.h
xfs_inode.c Merge branch 'xfs-misc-fixes-for-4.3-3' into for-next 2015-08-25 10:13:35 +10:00
xfs_inode.h xfs: clean up inode lockdep annotations 2015-08-19 10:32:49 +10:00
xfs_inode_item.c xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_inode_item.h xfs: remove the inode log format from the inode log item 2013-12-13 11:34:05 +11:00
xfs_ioctl.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_ioctl.h
xfs_ioctl32.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_ioctl32.h xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_iomap.c xfs: Don't use unwritten extents for DAX 2015-11-03 12:37:00 +11:00
xfs_iomap.h xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten 2015-01-09 10:48:12 +11:00
xfs_iops.c xfs: fix error gotos in xfs_setattr_nonsize 2015-08-28 14:51:10 +10:00
xfs_iops.h xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_itable.c xfs: fix btree cursor error cleanups 2015-08-19 10:00:53 +10:00
xfs_itable.h xfs: bulkstat chunk formatting cursor is broken 2014-11-07 08:30:30 +11:00
xfs_linux.h xfs: remove xfs_caddr_t 2015-06-22 09:45:10 +10:00
xfs_log.c Merge branch 'xfs-efi-rework' into for-next 2015-08-19 10:10:47 +10:00
xfs_log.h xfs: don't leave EFIs on AIL on mount failure 2015-08-19 09:58:36 +10:00
xfs_log_cil.c xfs: close xc_cil list_empty() races with cil commit sequence 2015-07-29 11:51:01 +10:00
xfs_log_priv.h xfs: don't leave EFIs on AIL on mount failure 2015-08-19 09:58:36 +10:00
xfs_log_recover.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_message.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_message.h
xfs_mount.c xfs: clean up root inode properly on mount failure 2015-08-19 10:00:28 +10:00
xfs_mount.h xfs: introduce BMAPI_ZERO for allocating zeroed extents 2015-11-03 12:27:22 +11:00
xfs_mru_cache.c xfs: xfs_mru_cache_insert() should use GFP_NOFS 2015-03-25 14:57:53 +11:00
xfs_mru_cache.h xfs: embedd mru_elem into parent structure 2014-04-23 07:11:51 +10:00
xfs_pnfs.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_pnfs.h xfs: unlock i_mutex in xfs_break_layouts 2015-04-13 11:38:29 +10:00
xfs_qm.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_qm.h xfs: Convert to using ->get_state callback 2015-03-04 16:06:36 +01:00
xfs_qm_bhv.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_qm_syscalls.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_quota.h xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_quotaops.c xfs: Add support for Q_SETINFO 2015-03-04 16:06:38 +01:00
xfs_rtalloc.c xfs: add missing bmap cancel calls in error paths 2015-08-19 10:01:40 +10:00
xfs_rtalloc.h xfs: combine xfs_rtmodify_summary and xfs_rtget_summary 2014-09-09 11:58:42 +10:00
xfs_stats.c xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_stats.h xfs: support the XFS_BTNUM_FINOBT free inode btree type 2014-04-24 16:00:52 +10:00
xfs_super.c xfs: updates for 4.3-rc1 2015-09-07 13:28:32 -07:00
xfs_super.h xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_symlink.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_symlink.h
xfs_sysctl.c xfs: remove deprecated sysctls 2015-01-09 10:47:43 +11:00
xfs_sysctl.h xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.h xfs: add debug sysfs attribute set 2014-09-09 11:52:42 +10:00
xfs_trace.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trace.h xfs: huge page fault support 2015-09-08 15:35:28 -07:00
xfs_trans.c xfs: return committed status from xfs_trans_roll() 2015-08-19 09:50:13 +10:00
xfs_trans.h xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_trans_ail.c xfs: remove __psint_t and __psunsigned_t 2015-06-22 09:43:32 +10:00
xfs_trans_buf.c xfs: only trace buffer items if they exist 2015-02-10 09:23:40 +11:00
xfs_trans_dquot.c xfs: Clean up xfs_trans_dup_dqinfo 2015-06-01 10:50:00 +10:00
xfs_trans_extfree.c xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_trans_inode.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trans_priv.h xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_xattr.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00