linux/fs
Alex Elder 26be88087a libceph: change how "safe" callback is used
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reportedin:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:18:52 -07:00
..
9p fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
adfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
affs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
afs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
autofs4 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
befs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
bfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-04-14 10:52:54 -07:00
cachefiles
ceph libceph: change how "safe" callback is used 2013-05-01 21:18:52 -07:00
cifs cifs: Allow passwords which begin with a delimitor 2013-04-10 15:54:14 -05:00
coda fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
configfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
cramfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
debugfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
devpts fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
dlm hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ecryptfs ecryptfs: close rmmod race 2013-04-09 14:08:16 -04:00
efs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
exofs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
exportfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ext2 ext2: Fix BUG_ON in evict() on inode deletion 2013-03-13 15:23:44 +01:00
ext3 ext3: Fix format string issues 2013-03-11 22:05:56 +01:00
ext4 ext4: fix big-endian bugs which could cause fs corruptions 2013-04-03 12:37:17 -04:00
f2fs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fat fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
freevxfs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
fscache hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
fuse fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
gfs2 GFS2: Issue discards in 512b sectors 2013-04-05 17:55:13 +01:00
hfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hfsplus hfsplus: fix potential overflow in hfsplus_file_truncate() 2013-04-17 16:10:45 -07:00
hostfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-03-13 15:47:50 -07:00
hpfs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
hppfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hugetlbfs hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) 2013-04-17 16:10:44 -07:00
isofs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
jbd
jbd2 jbd2: fix use after free in jbd2_journal_dirty_metadata() 2013-03-11 13:24:56 -04:00
jffs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
jfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
lockd Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux 2013-02-28 18:02:55 -08:00
logfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
minix fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
ncpfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
nfs NFS client bugfixes for Linux 3.9 2013-04-10 10:26:49 -07:00
nfs_common nfs_common: Update the translation between nfsv3 acls linux posix acls 2013-02-13 06:15:14 -08:00
nfsd nfsd4: reject "negative" acl lengths 2013-03-26 16:18:27 -04:00
nilfs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
nls
notify hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ntfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
ocfs2 fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
omfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
openpromfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
proc kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
pstore A few fixes to reduce places where pstore might hang 2013-02-21 09:38:18 -08:00
qnx4 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
qnx6 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
quota quota: add missing use of dq_data_lock in __dquot_initialize 2013-03-11 22:05:56 +01:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
reiserfs reiserfs: Fix warning and inode leak when deleting inode with xattrs 2013-03-29 17:08:43 +01:00
romfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
squashfs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
sysfs sysfs fixes for 3.9-rc4 2013-03-28 15:52:14 -07:00
sysv fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
ubifs UBIFS: make space fixup work in the remount case 2013-03-14 11:20:22 +02:00
udf fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
ufs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
xfs - Fix for a potential infinite loop which was introduced in 4d559a3bcb 2013-03-19 15:17:40 -07:00
aio.c aio: fix possible invalid memory access when DEBUG is enabled 2013-04-26 07:56:18 -07:00
anon_inodes.c get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero 2013-02-26 02:46:11 -05:00
attr.c
bad_inode.c
binfmt_aout.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_elf.c fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() 2013-04-17 16:10:44 -07:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
binfmt_em86.c
binfmt_flat.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_misc.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c Revert "block: add missing block_bio_complete() tracepoint" 2013-04-18 09:00:26 -07:00
block_dev.c loop: prevent bdev freeing while device in use 2013-04-01 15:48:47 -07:00
buffer.c Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block 2013-02-28 12:52:24 -08:00
char_dev.c
compat.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
compat_binfmt_elf.c
compat_ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
coredump.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
coredump.h
dcache.c Nest rename_lock inside vfsmount_lock 2013-03-26 18:25:57 -04:00
dcookies.c
direct-io.c fs: Fix possible use-after-free with AIO 2013-02-22 23:31:36 -05:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
fcntl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
fhandle.c
fifo.c
file.c locking: Various static lock initializer fixes 2013-02-19 08:42:45 +01:00
file_table.c cache the value of file_inode() in struct file 2013-03-01 19:48:30 -05:00
filesystems.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fs-writeback.c 2 writeback fixes 2013-02-28 13:21:44 -08:00
fs_struct.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
generic_acl.c
inode.c vfs: Revert spurious fix to spinning prevention in prune_icache_sb 2013-04-13 16:13:55 -07:00
internal.h Don't bother with redoing rw_verify_area() from default_file_splice_from() 2013-03-21 13:11:11 -04:00
ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ioprio.c
Kconfig
Kconfig.binfmt
libfs.c
locks.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlink 2013-03-08 09:03:07 -08:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-04-09 12:22:49 -07:00
no-block.c
open.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
pipe.c vfs: fix pipe counter breakage 2013-03-12 08:29:17 -07:00
pnode.c vfs: Carefully propogate mounts across user namespaces 2013-03-27 07:50:05 -07:00
pnode.h vfs: Carefully propogate mounts across user namespaces 2013-03-27 07:50:05 -07:00
posix_acl.c
proc_namespace.c
read_write.c vfs/splice: Fix missed checks in new __kernel_write() helper 2013-03-27 09:24:02 -07:00
read_write.h
readdir.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
select.c
seq_file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
signalfd.c
splice.c Don't bother with redoing rw_verify_area() from default_file_splice_from() 2013-03-21 13:11:11 -04:00
stack.c
stat.c switch vfs_getattr() to struct path 2013-02-26 02:46:08 -05:00
statfs.c
super.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sync.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
timerfd.c compat: restore timerfd settime and gettime compat syscalls 2013-03-02 09:35:13 -05:00
utimes.c
xattr.c
xattr_acl.c