qemu/migration
Peter Xu 910c164736 migration/postcopy: Fix high frequency sync
With current code base I can observe extremely high sync count during
precopy, as long as one enables postcopy-ram=on before switchover to
postcopy.

To provide some context of when QEMU decides to do a full sync: it checks
must_precopy (which implies "data must be sent during precopy phase"), and
as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) QEMU will kick off the slow/exact sync.

However, when postcopy is enabled (even if still during precopy phase), RAM
only reports all pages as can_postcopy, and report must_precopy==0.  Then
"must_precopy <= threshold_size" mostly always triggers and enforces a slow
sync for every call to migration_iteration_run() when postcopy is enabled
even if not used.  That is insane.

It turns out it was a regress bug introduced in the previous refactoring in
8.0 as reported by Nina [1]:

  (a) c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")

Then a workaround patch is applied at the end of release (8.0-rc4) to fix it:

  (b) 28ef5339c3 ("migration: fix ram_state_pending_exact()")

However that "workaround" was overlooked when during the cleanup in this
9.0 release in this commit..

  (c) b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")

Then the issue was re-exposed as reported by Nina [1].

The problem with (b) is that it only fixed the case for RAM, rather than
all the rest of iterators.  Here a slow sync should only be required if all
dirty data (precopy+postcopy) is less than the threshold_size that QEMU
calculated.  It is even debatable whether a sync is needed when switched to
postcopy.  Currently ram_state_pending_exact() will be mostly noop if
switched to postcopy, and that logic seems to apply too for all the rest of
iterators, as sync dirty bitmap during a postcopy doesn't make much sense.
However let's leave such change for later, as we're in rc phase.

So rather than reusing commit (b), this patch provides the complete fix for
all iterators.  When at it, cleanup a little bit on the lines around.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1565

Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Fixes: b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240320214453.584374-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-22 12:12:08 -04:00
..
block-dirty-bitmap.c Replace "iothread lock" with "BQL" in comments 2024-01-08 10:45:43 -05:00
block.c migration: Skip only empty block devices 2024-03-13 07:33:41 -04:00
block.h migration: disable auto-converge during bulk block migration 2017-09-27 11:27:14 +01:00
channel-block.c io: follow coroutine AioContext in qio_channel_yield() 2023-09-07 20:32:11 -05:00
channel-block.h migration: introduce a QIOChannel impl for BlockDriverState VMState 2022-06-22 19:33:43 +01:00
channel.c migration: Fix migration_channel_read_peek() error path 2024-01-04 09:52:42 +08:00
channel.h migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
colo-failover.c migration/colo: Improve an x-colo-lost-heartbeat error message 2023-02-23 14:10:17 +01:00
colo.c migration: privatize colo interfaces 2024-03-11 16:28:59 -04:00
dirtyrate.c system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() 2024-01-08 10:45:43 -05:00
dirtyrate.h migration/calc-dirty-rate: millisecond-granularity period 2023-10-10 08:03:50 +08:00
exec.c migration: simplify exec migration functions 2024-03-04 07:12:40 +01:00
exec.h migration: convert exec backend to accept MigrateAddress. 2023-11-02 11:35:04 +01:00
fd.c migration: Revert mapped-ram multifd support to fd: URI 2024-03-22 12:12:08 -04:00
fd.h migration: Revert mapped-ram multifd support to fd: URI 2024-03-22 12:12:08 -04:00
file.c migration: Revert mapped-ram multifd support to fd: URI 2024-03-22 12:12:08 -04:00
file.h migration: Fix iocs leaks during file and fd migration 2024-03-14 11:39:08 -04:00
global_state.c migration 1st pull for 9.0 2024-01-05 13:35:25 +00:00
meson.build migration/multifd: Implement zero page transmission on the multifd thread. 2024-03-11 16:57:09 -04:00
migration-hmp-cmds.c migration/multifd: Add new migration option zero-page-detection. 2024-03-11 16:57:05 -04:00
migration-stats.c migration: migration_rate_limit_reset() don't need the QEMUFile 2023-10-31 08:44:33 +01:00
migration-stats.h migration: Remove transferred atomic counter 2023-10-31 08:44:33 +01:00
migration.c migration/postcopy: Fix high frequency sync 2024-03-22 12:12:08 -04:00
migration.h migration: purge MigrationState from public interface 2024-03-11 16:28:59 -04:00
multifd-zero-page.c migration/multifd: Implement zero page transmission on the multifd thread. 2024-03-11 16:57:09 -04:00
multifd-zlib.c * Add missing ERRP_GUARD() statements in functions that need it 2024-03-12 16:55:42 +00:00
multifd-zstd.c migration/multifd: Implement zero page transmission on the multifd thread. 2024-03-11 16:57:09 -04:00
multifd.c migration: Revert mapped-ram multifd support to fd: URI 2024-03-22 12:12:08 -04:00
multifd.h migration/multifd: Implement zero page transmission on the multifd thread. 2024-03-11 16:57:09 -04:00
options.c * Add missing ERRP_GUARD() statements in functions that need it 2024-03-12 16:55:42 +00:00
options.h migration/multifd: Add new migration option zero-page-detection. 2024-03-11 16:57:05 -04:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c error: Move ERRP_GUARD() to the beginning of the function 2024-03-12 11:45:45 +01:00
postcopy-ram.h migration: remove error from notifier data 2024-02-28 11:31:28 +08:00
qemu-file.c migration: Report error when shutdown fails 2024-03-11 14:41:40 -04:00
qemu-file.h migration/qemu-file: add utility methods for working with seekable channels 2024-03-01 15:42:04 +08:00
ram-compress.c migration: Rename ram_compressed_pages() to compress_ram_pages() 2023-10-30 17:41:55 +01:00
ram-compress.h migration: Rename ram_compressed_pages() to compress_ram_pages() 2023-10-30 17:41:55 +01:00
ram.c migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page. 2024-03-11 16:57:09 -04:00
ram.h migration/multifd: Allow clearing of the file_bmap from multifd 2024-03-11 16:56:52 -04:00
rdma.c migration/rdma: Fix a memory issue for migration 2024-03-11 14:41:40 -04:00
rdma.h migration: convert rdma backend to accept MigrateAddress 2023-11-02 11:35:03 +01:00
savevm.c migration: export migration_is_running 2024-03-11 16:28:59 -04:00
savevm.h migration: Add .save_prepare() handler to struct SaveVMHandlers 2023-09-11 08:34:06 +02:00
socket.c migration/multifd: Drop unnecessary helper to destroy IOC 2024-02-28 11:31:28 +08:00
socket.h migration/multifd: Drop unnecessary helper to destroy IOC 2024-02-28 11:31:28 +08:00
target.c migration: Add migration prefix to functions in target.c 2023-09-11 08:34:06 +02:00
threadinfo.c migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
threadinfo.h migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
tls.c migration: Drop unused parameter for migration_tls_client_create() 2023-05-03 11:24:20 +02:00
tls.h migration: Drop unused parameter for migration_tls_client_create() 2023-05-03 11:24:20 +02:00
trace-events migration/multifd: Implement zero page transmission on the multifd thread. 2024-03-11 16:57:09 -04:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c Move CPU softfloat unions to cpu-float.h 2022-04-06 14:31:43 +02:00
vmstate.c migration: Make VMStateDescription.subsections const 2023-12-29 11:17:30 +11:00
xbzrle.c migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
xbzrle.h migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
yank_functions.c migration/yank: Use channel features 2024-01-29 11:02:12 +08:00
yank_functions.h migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00