qemu/block
Fam Zheng a77fd4bb29 block: Fix bdrv_drain in coroutine
Calling a nested aio_poll() from a coroutine is a bad idea. When
bdrv_drain is called in a coroutine, this patch replaces its aio_poll
loop with a BH.

For example, the bdrv_drain() in mirror.c can hang when a guest-issued
request is waiting on it in qemu_co_mutex_lock().

In this case the mirror coroutine has just finished a request, and the
block job is about to complete. It calls bdrv_drain(), which waits for
the other coroutine, a scsi-disk request, to complete. The deadlock
happens when the latter is in turn waiting in qemu_co_mutex_lock() for
the former to yield or terminate. The state flow is as follows
(assuming a qcow2 image):

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain
    while (has tracked request)
      aio_poll()

In the scsi-disk coroutine, qemu_co_mutex_lock() never returns because
the mirror coroutine is blocked in aio_poll(blocking=true).

With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain.enter
>   schedule BH
>   qemu_coroutine_yield()
>                                  qcow2:qemu_co_mutex_lock.return
>                                  ...
                                   tracked request end
    ...
    (resumed from BH callback)
  bdrv_drain.return
  ...
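
The two timelines above can be replayed with a small toy model in C.
This is only a sketch under simplifying assumptions, not the code from
this patch: ToyState and the toy_* functions are hypothetical names, and
the model assumes that the scsi-disk coroutine becomes runnable only
once the mirror coroutine yields.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two timelines; none of these names are QEMU APIs. */
typedef struct {
    int  tracked_requests; /* requests still in flight on the device */
    bool waiter_queued;    /* scsi-disk coroutine woken by the mutex unlock,
                              runnable only once the mirror coroutine yields */
} ToyState;

/* State right before bdrv_drain: the scsi-disk read is tracked and has been
 * queued for wakeup by the qcow2 qemu_co_mutex_unlock(). */
static ToyState toy_state_before_drain(void)
{
    ToyState s = { .tracked_requests = 1, .waiter_queued = true };
    return s;
}

/* The mirror coroutine gives up control, so the queued waiter finally runs
 * and its tracked request ends. */
static void toy_mirror_yields(ToyState *s)
{
    if (s->waiter_queued) {
        assert(s->tracked_requests > 0);
        s->waiter_queued = false;
        s->tracked_requests--;
    }
}

/* Old behaviour: poll from inside the mirror coroutine. Polling never yields
 * the coroutine, so the waiter never runs. max_polls stands in for "forever".
 * Returns true if the drain finished. */
static bool toy_drain_nested_poll(ToyState *s, int max_polls)
{
    for (int i = 0; i < max_polls && s->tracked_requests > 0; i++) {
        /* aio_poll(): nothing can make progress here */
    }
    return s->tracked_requests == 0;
}

/* New behaviour: schedule a BH, then yield. The yield lets the waiter run;
 * the BH callback then finds nothing in flight and resumes the mirror.
 * Returns true if the drain finished. */
static bool toy_drain_bh_and_yield(ToyState *s)
{
    /* "schedule BH", then "qemu_coroutine_yield()" */
    toy_mirror_yields(s);
    /* "resumed from BH callback" */
    return s->tracked_requests == 0;
}
```

Starting from toy_state_before_drain(), toy_drain_nested_poll() returns
false however many polls are allowed, while toy_drain_bh_and_yield()
returns true and leaves no tracked request behind.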

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
File               Date                        Last commit
-------------------------------------------------------------
accounting.c       2016-01-20 13:36:23 +01:00  block: Clean up includes
archipelago.c      2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
backup.c           2016-03-30 12:16:03 +02:00  block: Remove bdrv_(set_)enable_write_cache()
blkdebug.c         2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
blkreplay.c        2016-03-30 12:15:57 +02:00  replay: introduce block devices record/replay
blkverify.c        2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
block-backend.c    2016-03-30 12:16:03 +02:00  block: Remove BDRV_O_CACHE_WB
bochs.c            2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
cloop.c            2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
commit.c           2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
crypto.c           2016-04-05 17:23:21 +02:00  crypto: Avoid memory leak on failure
curl.c             2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
dirty-bitmap.c     2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
dmg.c              2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
gluster.c          2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
io.c               2016-04-11 16:59:09 +01:00  block: Fix bdrv_drain in coroutine
iscsi.c            2016-03-30 12:16:02 +02:00  iscsi: Support BDRV_REQ_FUA
linux-aio.c        2016-01-20 13:36:23 +01:00  block: Clean up includes
Makefile.objs      2016-03-30 12:15:57 +02:00  replay: introduce block devices record/replay
mirror.c           2016-03-30 12:16:03 +02:00  block: Remove bdrv_(set_)enable_write_cache()
nbd-client.c       2016-04-05 11:46:52 +02:00  nbd: don't request FUA on FLUSH
nbd-client.h       2016-03-30 12:16:02 +02:00  nbd: Support BDRV_REQ_FUA
nbd.c              2016-03-30 12:16:02 +02:00  nbd: Support BDRV_REQ_FUA
nfs.c              2016-03-30 16:50:39 -04:00  block/nfs: add missing #include "qemu/cutils.h"
null.c             2016-03-30 12:16:04 +02:00  block/null-{co,aio}: Implement get_block_status()
parallels.c        2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
qapi.c             2016-03-30 12:16:02 +02:00  block/qapi: Use blk_enable_write_cache()
qcow.c             2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
qcow2-cache.c      2016-01-20 13:36:23 +01:00  block: Clean up includes
qcow2-cluster.c    2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
qcow2-refcount.c   2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
qcow2-snapshot.c   2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
qcow2.c            2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
qcow2.h            2015-12-18 14:34:43 +01:00  qcow2: Add function for refcount order amendment
qed-check.c        2016-01-20 13:36:23 +01:00  block: Clean up includes
qed-cluster.c      2016-01-20 13:36:23 +01:00  block: Clean up includes
qed-gencb.c        2016-01-20 13:36:23 +01:00  block: Clean up includes
qed-l2-cache.c     2016-01-20 13:36:23 +01:00  block: Clean up includes
qed-table.c        2016-01-20 13:36:23 +01:00  block: Clean up includes
qed.c              2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
qed.h              2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
quorum.c           2016-03-17 16:43:30 +01:00  quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
raw-aio.h          2016-03-22 22:20:16 +01:00  include/qemu/iov.h: Don't include qemu-common.h
raw-posix.c        2016-03-30 11:59:32 +02:00  block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host
raw-win32.c        2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
raw_bsd.c          2016-03-30 12:16:02 +02:00  raw: Support BDRV_REQ_FUA
rbd.c              2016-03-22 22:20:17 +01:00  util: move declarations out of qemu-common.h
sheepdog.c         2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
snapshot.c         2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
ssh.c              2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
stream.c           2016-03-29 19:54:49 +01:00  -----BEGIN PGP SIGNATURE-----
throttle-groups.c  2016-01-20 13:36:23 +01:00  block: Clean up includes
vdi.c              2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
vhdx-endian.c      2016-01-20 13:36:23 +01:00  block: Clean up includes
vhdx-log.c         2016-03-22 22:20:15 +01:00  include/qemu/osdep.h: Don't include qapi/error.h
vhdx.c             2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
vhdx.h             2014-12-12 15:42:22 +00:00  block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec
vmdk.c             2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
vpc.c              2016-03-30 12:16:01 +02:00  block: Always set writeback mode in blk_new_open()
vvfat.c            2016-03-30 12:16:03 +02:00  block: Remove BDRV_O_CACHE_WB
win32-aio.c        2016-01-20 13:36:23 +01:00  block: Clean up includes
write-threshold.c  2016-01-20 13:36:23 +01:00  block: Clean up includes