system/qemu - HydraGit

mirror of https://gitlab.com/qemu-project/qemu synced 2024-09-16 01:03:31 +00:00

Author	SHA1	Message	Date
Max Reitz	bb8160eb78	block/cor: Drop cor_co_truncate() No other filter driver has a .bdrv_co_truncate() implementation, and there is no need to because the general block layer code can handle it just as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:51 +01:00
Max Reitz	6b7e8f8b1c	block: Handle filter truncation like native impl. Make the filter truncation (passing it through to bs->file) a first-class citizen and handle it exactly as if it was the filter driver's native implementation of .bdrv_co_truncate(). I do not see a reason not to, it makes the code a bit shorter, and may be even more correct because this gets us to finish the write_req that we prepared before (may be important to e.g. bring dirty bitmaps to the correct size). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:45 +01:00
Max Reitz	e40e6e88f6	qcow2: Fix v3 snapshot table entry compliancy qcow2 v3 images require every snapshot table entry to have at least 16 bytes of extra data. If they do not, let qemu-img check -r all fix it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:09 +01:00
Max Reitz	d2b1d1ec73	qcow2: Repair snapshot table with too many entries The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (65536 entries) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:08 +01:00
Max Reitz	099febf3ac	qcow2: Fix overly long snapshot tables We currently refuse to open qcow2 images with overly long snapshot tables. This patch makes qemu-img check -r all drop all offending entries past what we deem acceptable. The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (64 MB) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:04 +01:00
Max Reitz	624143355c	qcow2: Keep track of the snapshot table length When repairing the snapshot table, we truncate entries that have too much extra data. This frees up space that we do not have to count towards the snapshot table size. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	f91f1f159b	qcow2: Fix broken snapshot table entries The only case where we currently reject snapshot table entries is when they have too much extra data. Fix them with qemu-img check -r all by counting it as a corruption, reducing their extra_data_size, and then letting qcow2_check_fix_snapshot_table() do the rest. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	fe446b5da2	qcow2: Add qcow2_check_fix_snapshot_table() qcow2_check_read_snapshot_table() can perform consistency checks, but it cannot fix everything. Specifically, it cannot allocate new clusters, because that should wait until the refcount structures are known to be consistent (i.e., after qcow2_check_refcounts()). Thus, it cannot call qcow2_write_snapshots(). Do that in qcow2_check_fix_snapshot_table(), which is called after qcow2_check_refcounts(). Currently, there is nothing that would set result->corruptions, so this is a no-op. A follow-up patch will change that. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:01 +01:00
Max Reitz	8bc584fe03	qcow2: Separate qcow2_check_read_snapshot_table() Reading the snapshot table can fail. That is a problem when we want to repair the image. Therefore, stop reading the snapshot table in qcow2_do_open() in check mode. Instead, add a new function qcow2_check_read_snapshot_table() that reads the snapshot table at a later point. In the future, we want to handle errors here and fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:00 +01:00
Max Reitz	0a85af351d	qcow2: Write v3-compliant snapshot list on upgrade qcow2 v3 requires every snapshot table entry to have two extra data fields: The 64-bit VM state size, and the virtual disk size. Both are optional for v2 images, so they may not be present. qcow2_upgrade() therefore should update the snapshot table to ensure all entries have these extra data fields. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347 Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:52 +01:00
Max Reitz	722efb0c7c	qcow2: Put qcow2_upgrade() into its own function This does not make sense right now, but it will make sense once we need to do more than to just update s->qcow_version. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:20 +01:00
Max Reitz	e0314b56b2	qcow2: Make qcow2_write_snapshots() public Updating the snapshot list will be useful when upgrading a v2 image to v3, so we will need to call this function in qcow2.c. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:17 +01:00
Max Reitz	fcf9a6b728	qcow2: Keep unknown extra snapshot data The qcow2 specification says to ignore unknown extra data fields in snapshot table entries. Currently, we discard it whenever we update the image, which is a bit different from "ignore". This patch makes the qcow2 driver keep all unknown extra data fields when updating an image's snapshot table. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-5-mreitz@redhat.com [mreitz: Adjusted comments as proposed by Eric] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:52:57 +01:00
Max Reitz	ecf6c7c0c1	qcow2: Add Error ** to qcow2_read_snapshots() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:09 +01:00
Max Reitz	d8fa8442ad	qcow2: Use endof() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:08 +01:00
Max Reitz	f93c3add3a	mirror: Do not dereference invalid pointers mirror_exit_common() may be called twice (if it is called from mirror_prepare() and fails, it will be called from mirror_abort() again). In such a case, many of the pointers in the MirrorBlockJob object will already be freed. This can be seen most reliably for s->target, which is set to NULL (and then dereferenced by blk_bs()). Cc: qemu-stable@nongnu.org Fixes: `737efc1eda` Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014153931.20699-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:49:37 +01:00
Maxim Levitsky	e87a09d625	block/nvme: add support for discard Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-3-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:35 +01:00
Maxim Levitsky	e0dd95e373	block/nvme: add support for write zeros Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:30 +01:00
Vladimir Sementsov-Ogievskiy	0e2402452f	block/block-copy: increase buffered copy request No reason to limit buffered copy to one cluster. Let's allow up to 1 MiB. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	7f739d0e53	block/block-copy: add memory limit Currently total allocation for parallel requests to block-copy instance is unlimited. Let's limit it to 128 MiB. For now block-copy is used only in backup, so actually we limit total allocation for backup job. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	e332a726da	block/block-copy: refactor copying Merge copying code into one function block_copy_do_copy, which only calls bdrv_ io functions and don't do any synchronization (like dirty bitmap set/reset). Refactor block_copy() function so that it takes full decision about size of chunk to be copied and does all the synchronization (checking intersecting requests, set/reset dirty bitmaps). It will help: - introduce parallel processing of block_copy iterations: we need to calculate chunk size, start async chunk copying and go to the next iteration - simplify synchronization improvement (like memory limiting in further commit and reducing critical section (now we lock the whole requested range, when actually we need to lock only dirty region which we handle at the moment)) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	b3b7036afb	block/block-copy: limit copy_range_size to 16 MiB Large copy range may imply memory allocation and large io effort, so using 2G copy range request may be bad idea. Let's limit it to 16 MiB. It also helps the following patch to refactor copy-with-offload fallback to copy-with-bounce-buffer. Note, that total memory usage of backup is still not limited, it will be fixed in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	3816edd2cb	block/block-copy: allocate buffer in block_copy_with_bounce_buffer Move bounce_buffer allocation block_copy_with_bounce_buffer. This commit simplifies further work on implementing copying by larger chunks (of different size) and further asynchronous handling of block_copy iterations (with help of block/aio_task API). Allocation works fast, a lot faster than disk io, so it's not a problem that we now allocate/free bounce_buffer more times. And we anyway will have to allocate several bounce_buffers for parallel execution of loop iterations in future. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	994b44ab20	Revert "mirror: Only mirror granularity-aligned chunks" This reverts commit `9adc1cb49a`. "mirror: Only mirror granularity-aligned chunks" Since previous commit unaligned chunks are supported by do_sync_target_write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	dbdf699cad	block/mirror: support unaligned write in active mirror Prior `9adc1cb49a` do_sync_target_write had a bug: it reset aligned-up region in the dirty bitmap, which means that we may not copy some bytes and assume them copied, which actually leads to producing corrupted target. So `9adc1cb49a` forced dirty bitmap granularity to be request_alignment for mirror-top filter, so we are not working with unaligned requests. However forcing large alignment obviously decreases performance of unaligned requests. This commit provides another solution for the problem: if unaligned padding is already dirty, we can safely ignore it, as 1. It's dirty, it will be copied by mirror_iteration anyway 2. It's dirty, so skipping it now we don't increase dirtiness of the bitmap and therefore don't damage "synchronicity" of the write-blocking mirror. If unaligned padding is not dirty, we just write it, no reason to touch dirty bitmap if we succeed (on failure we'll set the whole region ofcourse, but we loss "synchronicity" on failure anyway). Note: we need to disable dirty_bitmap, otherwise we will not be able to see in do_sync_target_write bitmap state before current operation. We may of course check dirty bitmap before the operation in bdrv_mirror_top_do_write and remember it, but we don't need active dirty bitmap for write-blocking mirror anyway. New code-path is unused until the following commit reverts `9adc1cb49a`. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191011090711.19940-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	b30168647f	block/block-backend: add blk_co_pwritev_part Add blk write function with qiov_offset parameter. It's needed for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	5c511ac375	block/mirror: simplify do_sync_target_write do_sync_target_write is called from bdrv_mirror_top_do_write after write/discard operation, all inside active_write/active_write_settle protecting us from mirror iteration. So the whole area is dirty for sure, no reason to examine dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Kevin Wolf	5e97855052	qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which requires s->lock to be taken to protect its accesses to the refcount table and refcount blocks. However, nothing in this code path actually took the lock. This could cause the same cache entry to be used by two requests at the same time, for different tables at different offsets, resulting in image corruption. As it would be preferable to base the detection on consistent data (even though it's just heuristics), let's take the lock not only around the qcow2_get_refcount() calls, but around the whole function. This patch takes the lock in qcow2_co_block_status() earlier and asserts in qcow2_detect_metadata_preallocation() that we hold the lock. Fixes: `69f47505ee` Cc: qemu-stable@nongnu.org Reported-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-10-25 15:18:55 +02:00
Vladimir Sementsov-Ogievskiy	8ccf458af5	block/backup: drop dead code from backup_job_create After commit `00e30f05de`, there is no more "goto error" points after job creation, so after "error:" @job is always NULL and we don't need roll-back job creation. Reported-by: Coverity (CID 1406402) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-25 15:15:01 +02:00
Vladimir Sementsov-Ogievskiy	f7651539d8	block/nbd: nbd reconnect Implement reconnect. To achieve this: 1. add new modes: connecting-wait: means, that reconnecting is in progress, and there were small number of reconnect attempts, so all requests are waiting for the connection. connecting-nowait: reconnecting is in progress, there were a lot of attempts of reconnect, all requests will return errors. two old modes are used too: connected: normal state quit: exiting after fatal error or on close Possible transitions are: * -> quit connecting-* -> connected connecting-wait -> connecting-nowait (transition is done after reconnect-delay seconds in connecting-wait mode) connected -> connecting-wait 2. Implement reconnect in connection_co. So, in connecting-* mode, connection_co, tries to reconnect unlimited times. 3. Retry nbd queries on channel error, if we are in connecting-wait state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191009084158.15614-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-10-22 09:22:07 -05:00
Vladimir Sementsov-Ogievskiy	4dd09f6223	qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	f6333cbf8b	block/qcow2-bitmap: fix and improve qcow2_reopen_bitmaps_rw - Correct check for write access to file child, and in correct place (only if we want to write). - Support reopen rw -> rw (which will be used in following commit), for example, !bdrv_dirty_bitmap_readonly() is not a corruption if bitmap is marked IN_USE in the image. - Consider unexpected bitmap as a corruption and check other combinations of in-image and in-RAM bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190927122355.7344-9-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	644ddbb754	block/qcow2-bitmap: do not remove bitmaps on reopen-ro qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all readonly. But the latter don't work, as qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing. It's OK for inactivation but bad idea for reopen-ro. And this leads to the following bug: Assume we have persistent bitmap 'bitmap0'. Create external snapshot bitmap0 is stored and therefore removed Commit snapshot now we have no bitmaps Do some writes from guest () they are not marked in bitmap Shutdown Start bitmap0 is loaded as valid, but it is actually broken! It misses writes () Incremental backup it will be inconsistent So, let's stop removing bitmaps on reopen-ro. But don't rejoice: reopening bitmaps to rw is broken too, so the whole scenario will not work after this patch and we can't enable corresponding test cases in 260 iotests still. Reopening bitmaps rw will be fixed in the following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	bd429a884c	block/qcow2-bitmap: drop qcow2_reopen_bitmaps_rw_hint() The function is unused, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	f88676c149	block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Firstly, no reason to optimize failure path. Then, function name is ambiguous: it checks for readonly and similar things, but someone may think that it will ignore normal bitmaps which was just unchanged, and this is in bad relation with the fact that we should drop IN_USE flag for unchanged bitmaps in the image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	ef9041a7b8	block/dirty-bitmap: refactor bdrv_dirty_bitmap_next bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	1e63830160	block/dirty-bitmap: drop BdrvDirtyBitmap.mutex mutex field is just a pointer to bs->dirty_bitmap_mutex, so no needs to store it in BdrvDirtyBitmap when we have bs pointer in it (since previous patch). Drop mutex field. Constantly use bdrv_dirty_bitmaps_lock/unlock in block/dirty-bitmap.c to make it more obvious that it's not per-bitmap lock. Still, for simplicity, leave bdrv_dirty_bitmap_lock/unlock functions as an external API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	5deb6cbd1f	block/dirty-bitmap: add bs link Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	767db3aad8	block/dirty-bitmap: drop meta Drop meta bitmaps, as they are unused. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	d2c3080e41	block/qcow2: proper locking on bitmap add/remove paths qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since `0a6c86d024`. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	b56a1e3175	block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap It's more comfortable to not deal with local_err. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	85cc8a4f6b	block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.c block/dirty-bitmap.c seems to be more appropriate for it and bdrv_remove_persistent_dirty_bitmap already in it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Max Reitz	d1b9d19f99	qcow2: Limit total allocation range to INT_MAX When the COW areas are included, the size of an allocation can exceed INT_MAX. This is kind of limited by handle_alloc() in that it already caps avail_bytes at INT_MAX, but the number of clusters still reflects the original length. This can have all sorts of effects, ranging from the storage layer write call failing to image corruption. (If there were no image corruption, then I suppose there would be data loss because the .cow_end area is forced to be empty, even though there might be something we need to COW.) Fix all of it by limiting nb_clusters so the equivalent number of bytes will not exceed INT_MAX. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Alberto Garcia	f2208fdc5b	block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK The BDRV_REQ_NO_FALLBACK flag means that an operation should only be performed if it can be offloaded or otherwise performed efficiently. However a misaligned write request requires a RMW so we should return an error and let the caller decide how to proceed. This hits an assertion since commit `c8bb23cbdb` if the required alignment is larger than the cluster size: qemu-img create -f qcow2 -o cluster_size=2k img.qcow2 4G qemu-io -c "open -o driver=qcow2,file.align=4k blkdebug::img.qcow2" \ -c 'write 0 512' qemu-io: block/io.c:1127: bdrv_driver_pwritev: Assertion `!(flags & BDRV_REQ_NO_FALLBACK)' failed. Aborted The reason is that when writing to an unallocated cluster we try to skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK instead, resulting in a write request that is too small (2KB cluster size vs 4KB required alignment). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	e4ec5ad464	replay: add BH oneshot event for block layer Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	c8aa7895eb	replay: don't drain/flush bdrv queue while RR is working In record/replay mode bdrv queue is controlled by replay mechanism. It does not allow saving or loading the snapshots when bdrv queue is not empty. Stopping the VM is not blocked by nonempty queue, but flushing the queue is still impossible there, because it may cause deadlocks in replay mode. This patch disables bdrv_drain_all and bdrv_flush_all in record/replay mode. Stopping the machine when the IO requests are not finished is needed for the debugging. E.g., breakpoint may be set at the specified step, and forcing the IO requests to finish may break the determinism of the execution. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	3c6c4348f2	block: implement bdrv_snapshot_goto for blkreplay This patch enables making snapshots with blkreplay used in block devices. This function is required to make bdrv_snapshot_goto without calling .bdrv_open which is not implemented. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Peter Lieven	6caaad46de	block/vhdx: add check for truncated image files qemu is currently not able to detect truncated vhdx image files. Add a basic check if all allocated blocks are reachable at open and report all errors during bdrv_co_check. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Maxim Levitsky	e99754b42e	nbd: add empty .bdrv_reopen_prepare Fixes commit job / qemu-img commit, when commiting qcow2 file which is based on nbd export. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1718727 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190930213820.29777-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	00e30f05de	block/backup: use backup-top instead of write notifiers Drop write notifiers and use filter node instead. = Changes = 1. Add filter-node-name argument for backup qmp api. We have to do it in this commit, as 257 needs to be fixed. 2. There are no more write notifiers here, so is_write_notifier parameter is dropped from block-copy paths. 3. To sync with in-flight requests at job finish we now have drained removing of the filter, we don't need rw-lock. 4. Block-copy is now using BdrvChildren instead of BlockBackends 5. As backup-top owns these children, we also move block-copy state into backup-top's ownership. = Iotest changes = 56: op-blocker doesn't shoot now, as we set it on source, but then check on filter, when trying to start second backup. To keep the test we instead can catch another collision: both jobs will get 'drive0' job-id, as job-id parameter is unspecified. To prevent interleaving with file-posix locks (as they are dependent on config) let's use another target for second backup. Also, it's obvious now that we'd like to drop this op-blocker at all and add a test-case for two backups from one node (to different destinations) actually works. But not in these series. 141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk check inside qmp_blockdev_del. But we've dropped block-copy blk objects, so no more blk objects on source bs (job blk is on backup-top filter bs). New message is from op-blocker, which is the next check in qmp_blockdev_add. 257: The test wants to emulate guest write during backup. They should go to filter node, not to original source node, of course. Therefore we need to specify filter node name and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	7df7868b96	block: introduce backup-top filter driver Backup-top filter caches write operations and does copy-before-write operations. The driver will be used in backup instead of write-notifiers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	0f4b02b73e	block/block-copy: split block_copy_set_callbacks function Split block_copy_set_callbacks out of block_copy_state_new. It's needed for further commit: block-copy will use BdrvChildren of backup-top filter, so it will be created from backup-top filter creation function. But callbacks will still belong to backup job and will be set in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	843670f30f	block/backup: move write_flags calculation inside backup_job_create This is logic-less refactoring, which simplifies further patch, as we'll need write_flags for backup-top filter creation and backup-top should be created before block job creation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-3-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	a6ffe1998c	block/backup: move in-flight requests handling from backup to block-copy Move synchronization mechanism to block-copy, to be able to use one block-copy instance from backup job and backup-top filter in parallel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	d924559953	qapi: query-blockstat: add driver specific file-posix stats A block driver can provide a callback to report driver-specific statistics. file-posix driver now reports discard statistics Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	1c45036636	file-posix: account discard operations This will help to identify how many of the user-issued discard operations (accounted on a device level) have actually suceeded down on the host file (even though the numbers will not be exactly the same if non-raw format driver is used (e.g. qcow2 sending metadata discards)). Note that these numbers will not include discards triggered by write-zeroes + MAY_UNMAP calls. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-9-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	f344446654	block: add empty account cookie type Each block_acct_done/failed call is designed to correspond to a previous block_acct_start call, which initializes the stats cookie. However sometimes it is not the case, e.g. some error paths might report the same cookie twice because it is hard to accurately track if the cookie was reported yet or not. This patch cleans the cookie after report. (Note: block_acct_failed/done without a previous block_acct_start at all should be avoided. Uninitialized cookie might hold a garbage value and there is still "< BLOCK_MAX_IOTYPE" assertion for that) It will be particularly useful in ide code where it's hard to keep track whether the request done its accounting or not: in the following patch of the series, trim requests will do the accounting separately. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	159f85ddc8	qapi: add unmap to BlockDeviceStats Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	beb5f5450d	block: move block_copy from block/backup.c to separate file Split block_copy to separate file, to be cleanly shared with backup-top filter driver in further commits. It's a clean movement, the only change is drop "static" from interface functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	0e23e382b7	block/backup: fix block-comment style We need to fix comment style around block-copy functions before further moving them to separate file to satisfy checkpatch. But do more: fix all comments style. Also, seems like doubled first asterisk is not forbidden, but drop it too for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	2c8074c453	block/backup: introduce BlockCopyState Split copying code part from backup to "block-copy", including separate state structure and function renaming. This is needed to share it with backup-top filter driver in further commits. Notes: 1. As BlockCopyState keeps own BlockBackend objects, remaining job->common.blk users only use it to get bs by blk_bs() call, so clear job->commen.blk permissions set in block_job_create and add job->source_bs to be used instead of blk_bs(job->common.blk), to keep it more clear which bs we use when introduce backup-top filter in further commit. 2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better as interface to BlockCopyState 3. Split is not very clean: there left some duplicated fields, backup code uses some BlockCopyState fields directly, let's postpone it for further improvements and keep this comment simpler for review. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	372c67ea61	block/backup: improve comment about image fleecing Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	0bd0c44372	block/backup: split shareable copying part from backup_do_cow Split copying logic which will be shared with backup-top filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	1048ddf0a3	block/backup: fix backup_cow_with_offload for last cluster We shouldn't try to copy bytes beyond EOF. Fix it. Fixes: `9ded4a0114` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	981fb5810a	block/backup: fix max_transfer handling for copy_range Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we are trying to find aligned size which satisfy both source and target. Also, don't ignore too small max_transfer. In this case seems safer to disable copy_range. Fixes: `9ded4a0114` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	d710cf575a	block/qcow2: introduce parallel subrequest handling in read and write It improves performance for fragmented qcow2 images. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	6aa7a2631b	block/qcow2: refactor qcow2_co_pwritev_part Similarly to previous commit, prepare for parallelizing write-loop iterations. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	88f468e546	block/qcow2: refactor qcow2_co_preadv_part Further patch will run partial requests of iterations of qcow2_co_preadv in parallel for performance reasons. To prepare for this, separate part which may be parallelized into separate function (qcow2_co_preadv_task). While being here, also separate encrypted clusters reading to own function, like it is done for compressed reading. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	6e9b225f73	block: introduce aio task pool Common interface for aio task loops. To be used for improving performance of synchronous io loops in qcow2, block-stream, copy-on-read, and may be other places. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Max Reitz	8644476e51	block: Skip COR for inactive nodes We must not write data to inactive nodes, and a COR is certainly something we can simply not do without upsetting anyone. So skip COR operations on inactive nodes. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191001174827.11081-2-mreitz@redhat.com Message-Id: <20191001174827.11081-2-mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-10-08 14:28:25 +01:00
Kevin Wolf	05f4aced65	block/snapshot: Restrict set of snapshot nodes Nodes involved in internal snapshots were those that were returned by bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all nodes that are either the root node of a BlockBackend or monitor-owned nodes. With the typical -drive use, this worked well enough. However, in the typical -blockdev case, the user defines one node per option, making all nodes monitor-owned nodes. This includes protocol nodes etc. which often are not snapshottable, so "savevm" only returns an error. Change the conditions so that internal snapshot still include all nodes that have a BlockBackend attached (we definitely want to snapshot anything attached to a guest device and probably also the built-in NBD server; snapshotting block job BlockBackends is more of an accident, but a preexisting one), but other monitor-owned nodes are only included if they have no parents. This makes internal snapshots usable again with typical -blockdev configurations. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com>	2019-10-04 11:52:40 +02:00
Philippe Mathieu-Daudé	31e404151b	cutils: Move size_to_str() from "qemu-common.h" to "qemu/cutils.h" "qemu/cutils.h" contains various qemu_strtosz_*() functions useful to convert strings to size. It seems natural to have the opposite usage (from size to string) there too. The function definition is already in util/cutils.c. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190903120555.7551-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-09-19 11:57:34 +02:00
Maxim Levitsky	603fbd076c	block/qcow2: refactor encryption code * Change the qcow2_co_{encrypt\|decrypt} to just receive full host and guest offsets and use this function directly instead of calling do_perform_cow_encrypt (which is removed by that patch). * Adjust qcow2_co_encdec to take full host and guest offsets as well. * Document the qcow2_co_{encrypt\|decrypt} arguments to prevent the bug fixed in former commit from hopefully happening again. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190915203655.21638-3-mlevitsk@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [mreitz: Let perform_cow() return the error value returned by qcow2_co_encrypt(), as proposed by Vladimir] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:36:22 +02:00
Maxim Levitsky	38e7d54bdc	block/qcow2: Fix corruption introduced by commit `8ac0f15f33` This fixes subtle corruption introduced by luks threaded encryption in commit `8ac0f15f33` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922 The corruption happens when we do a write that * writes to two or more unallocated clusters at once * doesn't fully cover the first sector * doesn't fully cover the last sector * uses luks encryption In this case, when allocating the new clusters we COW both areas prior to the write and after the write, and we encrypt them. The above mentioned commit accidentally made it so we encrypt the second COW area using the physical cluster offset of the first area. The problem is that offset_in_cluster in do_perform_cow_encrypt can be larger that the cluster size, thus cluster_offset will no longer point to the start of the cluster at which encrypted area starts. Next patch in this series will refactor the code to avoid all these assumptions. In the bugreport that was triggered by rebasing a luks image to new, zero filled base, which lot of such writes, and causes some files with zero areas to contain garbage there instead. But as described above it can happen elsewhere as well Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190915203655.21638-2-mlevitsk@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:35:02 +02:00
Max Reitz	c34dc07f9f	curl: Check curl_multi_add_handle()'s return code If we had done that all along, debugging would have been much simpler. (Also, I/O errors are better than hangs.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:12 +02:00
Max Reitz	bfb23b480a	curl: Handle success in multi_check_completion Background: As of cURL 7.59.0, it verifies that several functions are not called from within a callback. Among these functions is curl_multi_add_handle(). curl_read_cb() is a callback from cURL and not a coroutine. Waking up acb->co will lead to entering it then and there, which means the current request will settle and the caller (if it runs in the same coroutine) may then issue the next request. In such a case, we will enter curl_setup_preadv() effectively from within curl_read_cb(). Calling curl_multi_add_handle() will then fail and the new request will not be processed. Fix this by not letting curl_read_cb() wake up acb->co. Instead, leave the whole business of settling the AIOCB objects to curl_multi_check_completion() (which is called from our timer callback and our FD handler, so not from any cURL callbacks). Reported-by: Natalie Gavrielov <ngavrilo@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-7-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9abaf9fc47	curl: Report only ready sockets Instead of reporting all sockets to cURL, only report the one that has caused curl_multi_do_locked() to be called. This lets us get rid of the QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are only safe when the current element is removed in each iteration. If it possible for the list to be concurrently modified, we cannot guarantee that only the current element will be removed. Therefore, we must not use QLIST_FOREACH_SAFE() here. Fixes: `ff5ca1664a` Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9dbad87d25	curl: Pass CURLSocket to curl_multi_do() curl_multi_do_locked() currently marks all sockets as ready. That is not only inefficient, but in fact unsafe (the loop is). A follow-up patch will change that, but to do so, curl_multi_do_locked() needs to know exactly which socket is ready; and that is accomplished by this patch here. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	948403bcb1	curl: Check completion in curl_multi_do() While it is more likely that transfers complete after some file descriptor has data ready to read, we probably should not rely on it. Better be safe than sorry and call curl_multi_check_completion() in curl_multi_do(), too, just like it is done in curl_multi_read(). With this change, curl_multi_do() and curl_multi_read() are actually the same, so drop curl_multi_read() and use curl_multi_do() as the sole FD handler. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	007f339b10	curl: Keep *socket until the end of curl_sock_cb() This does not really change anything, but it makes the code a bit easier to follow once we use @socket as the opaque pointer for aio_set_fd_handler(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	0487861685	curl: Keep pointer to the CURLState in CURLSocket A follow-up patch will make curl_multi_do() and curl_multi_read() take a CURLSocket instead of the CURLState. They still need the latter, though, so add a pointer to it to the former. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190910124136.10565-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Nir Soffer	1bbbf32d5f	block: Use QEMU_IS_ALIGNED Replace instances of: (n & (BDRV_SECTOR_SIZE - 1)) == 0 And: (n & ~BDRV_SECTOR_MASK) == 0 With: QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE) Which reveals the intent of the code better, and makes it easier to locate the code checking alignment. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827185913.27427-2-nsoffer@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 14:48:30 +02:00
Alberto Garcia	bf3d78ae55	qcow2: Stop overwriting compressed clusters one by one handle_alloc() tries to find as many contiguous clusters that need copy-on-write as possible in order to allocate all of them at the same time. However, compressed clusters are only overwritten one by one, so let's say that we have an image with 1024 consecutive compressed clusters: qemu-img create -f qcow2 hd.qcow2 64M for f in `seq 0 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done In this case trying to overwrite the whole image with one large write request results in 1024 separate allocations: qemu-io -c "write 0 64M" hd.qcow2 This restriction comes from commit `095a9c58ce` from 2008. Nowadays QEMU can overwrite multiple compressed clusters just fine, and in fact it already does: as long as the first cluster that handle_alloc() finds is not compressed, all other compressed clusters in the same batch will be overwritten in one go: qemu-img create -f qcow2 hd.qcow2 64M qemu-io -c "write -z 0 64k" hd.qcow2 for f in `seq 64 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done Compared to the previous one, overwriting this image on my computer goes from 8.35s down to 230ms. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: John Snow <jsnow@redhat.com Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Philippe Mathieu-Daudé	d90d5cae2b	block/create: Do not abort if a block driver is not available The 'blockdev-create' QMP command was introduced as experimental feature in commit `b0292b851b`, using the assert() debug call. It got promoted to 'stable' command in `3fb588a0f2`, but the assert call was not removed. Some block drivers are optional, and bdrv_find_format() might return a NULL value, triggering the assertion. Stable code is not expected to abort, so return an error instead. This is easily reproducible when libnfs is not installed: ./configure [...] module support no Block whitelist (rw) Block whitelist (ro) libiscsi support yes libnfs support no [...] Start QEMU: $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait Send the 'blockdev-create' with the 'nfs' driver: $ ( cat << 'EOF' {'execute': 'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} EOF ) \| socat STDIO UNIX:/tmp/qemu.qmp {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}} {"return": {}} QEMU crashes: $ gdb qemu-system-x86_64 core Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 0x00007ffff510957f in raise () at /lib64/libc.so.6 #1 0x00007ffff50f3895 in abort () at /lib64/libc.so.6 #2 0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6 #4 0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69 #5 0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314 #6 0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131 #7 0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174 With this patch applied, QEMU returns a QMP error: {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}} Cc: qemu-stable@nongnu.org Reported-by: Xu Tian <xutian@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Peter Lieven	d2c6becbe0	block/nfs: add support for nfs_umount libnfs recently added support for unmounting. Add support in Qemu too. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:15 +02:00
Peter Lieven	601dc65597	block/nfs: tear down aio before nfs_close nfs_close is a sync call from libnfs and has its own event handler polling on the nfs FD. Avoid that both QEMU and libnfs are intefering here. CC: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:14 +02:00
Max Reitz	1a37e31244	vpc: Return 0 from vpc_co_create() on success blockdev_create_run() directly uses .bdrv_co_create()'s return value as the job's return value. Jobs must return 0 on success, not just any nonnegative value. Therefore, using blockdev-create for VPC images may currently fail as the vpc driver may return a positive integer. Because there is no point in returning a positive integer anywhere in the block layer (all non-negative integers are generally treated as complete success), we probably do not want to add more such cases. Therefore, fix this problem by making the vpc driver always return 0 in case of success. Suggested-by: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Kevin Wolf	effecce6bc	file-posix: Fix has_write_zeroes after NO_FALLBACK If QEMU_AIO_NO_FALLBACK is given, we always return failure and don't even try to use the BLKZEROOUT ioctl. In this failure case, we shouldn't disable has_write_zeroes because we didn't learn anything about the ioctl. The next request might not set QEMU_AIO_NO_FALLBACK and we can still use the ioctl then. Fixes: `738301e117` Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-09-10 08:58:43 +02:00
Max Reitz	b2c6f23f4a	block/file-posix: Reduce xfsctl() use This patch removes xfs_write_zeroes() and xfs_discard(). Both functions have been added just before the same feature was present through fallocate(): - fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March 2011); xfs_discard() was added in December 2010. - fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June 2014); xfs_write_zeroes() was added in November 2013. Nowadays, all systems that qemu runs on should support both fallocate() features (RHEL 7's kernel does). xfsctl() is still useful for getting the request alignment for O_DIRECT, so this patch does not remove our dependency on it completely. Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the file is shorter than the specified range (because ZERO_RANGE does not increase the file length). ftruncate() may yield and then discard data that parallel write requests have written past the EOF in the meantime. Dropping the function altogether fixes the bug. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: `50ba5b2d99` Reported-by: Lukáš Doktor <ldoktor@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Vladimir Sementsov-Ogievskiy	bb0c940993	job: drop job_drain In job_finish_sync job_enter should be enough for a job to make some progress and draining is a wrong tool for it. So use job_enter directly here and drop job_drain with all related staff not used more. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: John Snow <jsnow@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Alberto Garcia	b70d08205b	qcow2: Fix the calculation of the maximum L2 cache size The size of the qcow2 L2 cache defaults to 32 MB, which can be easily larger than the maximum amount of L2 metadata that the image can have. For example: with 64 KB clusters the user would need a qcow2 image with a virtual size of 256 GB in order to have 32 MB of L2 metadata. Because of that, since commit `b749562d98` we forbid the L2 cache to become larger than the maximum amount of L2 metadata for the image, calculated using this formula: uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); The problem with this formula is that the result should be rounded up to the cluster size because an L2 table on disk always takes one full cluster. For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if the last 32 KB of those are not going to be used. However QEMU rounds the numbers down and only creates 2 cache tables (128 KB), which is not enough for the image. A quick test doing 4KB random writes on a 1280 MB image gives me around 500 IOPS, while with the correct cache size I get 16K IOPS. Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Eric Blake	f061656cc3	nbd: Implement client use of NBD FAST_ZERO The client side is fairly straightforward: if the server advertised fast zero support, then we can map that to BDRV_REQ_NO_FALLBACK support. A server that advertises FAST_ZERO but not WRITE_ZEROES is technically broken, but we can ignore that situation as it does not change our behavior. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 16:03:26 -05:00
Andrey Shinkevich	294682cc3a	block: workaround for unaligned byte range in fallocate() Revert the commit `118f99442d` 'block/io.c: fix for the allocation failure' and use better error handling for file systems that do not support fallocate() for an unaligned byte range. Allow falling back to pwrite in case fallocate() returns EINVAL. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <1566913973-15490-1-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-09-05 16:01:31 -05:00
Eric Blake	df18c04edf	nbd: Use g_autofree in a few places Thanks to our recent move to use glib's g_autofree, I can join the bandwagon. Getting rid of gotos is fun ;) There are probably more places where we could register cleanup functions and get rid of more gotos; this patch just focuses on the labels that existed merely to call g_free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-2-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 15:52:45 -05:00
Stefan Hajnoczi	236094c738	file-posix: fix request_alignment typo Fixes: `a6b257a08e` ("file-posix: Handle undetectable alignment") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190827101328.4062-1-stefanha@redhat.com Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	bedb8bb419	vmdk: Reject invalid compressed writes Compressed writes generally have to write full clusters, not just in theory but also in practice when it comes to vmdk's streamOptimized subformat. It currently is just silently broken for writes with non-zero in-cluster offsets: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 4096 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec) read failed: Invalid argument (The technical reason is that vmdk_write_extent() just writes the incomplete compressed data actually to offset 4k. When reading the data, vmdk_read_extent() looks at offset 0 and finds the compressed data size to be 0, because that is what it reads from there. This yields an error.) For incomplete writes with zero in-cluster offsets, the error path when reading the rest of the cluster is a bit different, but the result is the same: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec) read failed: Invalid argument (Here, vmdk_read_extent() finds the data and then sees that the uncompressed data is short.) It is better to reject invalid writes than to make the user believe they might have succeeded and then fail when trying to read it back. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190815153638.4600-5-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	cdc0dd2586	vmdk: Use bdrv_dirname() for relative extent paths This makes iotest 033 pass with e.g. subformat=monolithicFlat. It also turns a former error in 059 into success. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190815153638.4600-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Nir Soffer	3a20013fbb	block: posix: Always allocate the first block When creating an image with preallocation "off" or "falloc", the first block of the image is typically not allocated. When using Gluster storage backed by XFS filesystem, reading this block using direct I/O succeeds regardless of request length, fooling alignment detection. In this case we fallback to a safe value (4096) instead of the optimal value (512), which may lead to unneeded data copying when aligning requests. Allocating the first block avoids the fallback. Since we allocate the first block even with preallocation=off, we no longer create images with zero disk size: $ ./qemu-img create -f raw test.raw 1g Formatting 'test.raw', fmt=raw size=1073741824 $ ls -lhs test.raw 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw And converting the image requires additional cluster: $ ./qemu-img measure -f raw -O qcow2 test.raw required size: 458752 fully allocated size: 1074135040 When using format like vmdk with multiple files per image, we allocate one block per file: $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat $ ls -lhs test*.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 353 Aug 27 03:23 test.vmdk I did quick performance test for copying disks with qemu-img convert to new raw target image to Gluster storage with sector size of 512 bytes: for i in $(seq 10); do rm -f dst.raw sleep 10 time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw done Here is a table comparing the total time spent: Type Before(s) After(s) Diff(%) --------------------------------------- real 530.028 469.123 -11.4 user 17.204 10.768 -37.4 sys 17.881 7.011 -60.7 We can see very clear improvement in CPU usage. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827010528.8818-2-nsoffer@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Vladimir Sementsov-Ogievskiy	5396234b96	block/qcow2: implement .bdrv_co_pwritev(_compressed)_part Implement and use new interface to get rid of hd_qiov. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-13-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-13-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00

1 2 3 4 5 ...

4570 commits