system/qemu - HydraGit

mirror of https://gitlab.com/qemu-project/qemu synced 2024-11-05 20:35:44 +00:00

Author	SHA1	Message	Date
Gavin Shan	1e493be587	migration: Add last stage indicator to global dirty log The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap need to be collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:50 +02:00
Peter Xu	56adee407f	kvm: dirty-ring: Fix race with vcpu creation It's possible that we want to reap a dirty ring on a vcpu that is during creation, because the vcpu is put onto list (CPU_FOREACH visible) before initialization of the structures. In this case: qemu_init_vcpu x86_cpu_realizefn cpu_exec_realizefn cpu_list_add <---- can be probed by CPU_FOREACH qemu_init_vcpu cpus_accel->create_vcpu_thread(cpu); kvm_init_vcpu map kvm_dirty_gfns <--- kvm_dirty_gfns valid Don't try to reap dirty ring on vcpus during creation or it'll crash. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756 Reported-by: Xiaohui Li <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-04 18:46:46 +02:00
Mads Ynddal	412ae12647	gdbstub: move update guest debug to accel ops Continuing the refactor of `a48e7d9e52` (gdbstub: move guest debug support check to ops) by removing hardcoded kvm_enabled() from generic cpu.c code, and replace it with a property of AccelOpsClass. Signed-off-by: Mads Ynddal <m.ynddal@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230207131721.49233-1-mads@ynddal.dk> [AJB: add ifdef around update_guest_debug_ops, fix brace] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230302190846.2593720-27-alex.bennee@linaro.org> Message-Id: <20230303025805.625589-30-richard.henderson@linaro.org>	2023-03-07 20:44:09 +00:00
David Woodhouse	e16aff4cc2	kvm/i386: Add xen-evtchn-max-pirq property The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI devices. Allow it to be increased if the user desires. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 09:09:22 +00:00
David Woodhouse	6f43f2ee49	kvm/i386: Add xen-gnttab-max-frames property Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 09:07:52 +00:00
David Woodhouse	61491cf441	i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support This just initializes the basic Xen support in KVM for now. Only permitted on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap overlay, event channel, grant tables and other stuff will exist. There's no point having the basic hypercall support if nothing else works. Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used later by support devices. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 08:22:49 +00:00
Philippe Mathieu-Daudé	2459d4209f	accel/kvm: Silent -Wmissing-field-initializers warning Silent when compiling with -Wextra: ../accel/kvm/kvm-all.c:2291:17: warning: missing field 'num' initializer [-Wmissing-field-initializers] { NULL, } ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221220143532.24958-3-philmd@linaro.org>	2023-02-27 22:29:01 +01:00
Philippe Mathieu-Daudé	55b5b8e928	gdbstub: Use vaddr type for generic insert/remove_breakpoint() API Both insert/remove_breakpoint() handlers are used in system and user emulation. We can not use the 'hwaddr' type on user emulation, we have to use 'vaddr' which is defined as "wide enough to contain any #target_ulong virtual address". gdbstub.c doesn't require to include "exec/hwaddr.h" anymore. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221216215519.5522-4-philmd@linaro.org>	2023-02-27 22:29:01 +01:00
Markus Armbruster	aa09b3d5f8	stats: Move QMP commands from monitor/ to stats/ This moves these commands from MAINTAINERS section "QMP" to new section "Stats". Status is Orphan. Volunteers welcome! Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-23-armbru@redhat.com>	2023-02-04 07:56:54 +01:00
David Hildenbrand	f39b7d2b96	kvm: Atomic memslot updates If we update an existing memslot (e.g., resize, split), we temporarily remove the memslot to re-add it immediately afterwards. These updates are not atomic, especially not for KVM VCPU threads, such that we can get spurious faults. Let's inhibit most KVM ioctls while performing relevant updates, such that we can perform the update just as if it would happen atomically without additional kernel support. We capture the add/del changes and apply them in the notifier commit stage instead. There, we can check for overlaps and perform the ioctl inhibiting only if really required (-> overlap). To keep things simple we don't perform additional checks that wouldn't actually result in an overlap -- such as !RAM memory regions in some cases (see kvm_set_phys_mem()). To minimize cache-line bouncing, use a separate indicator (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock while performing both actions (removing+re-adding). We have to wait until all IOCTLs were exited and block new ones from getting executed. This approach cannot result in a deadlock as long as the inhibitor does not hold any locks that might hinder an IOCTL from getting finished and exited - something fairly unusual. The inhibitor will always hold the BQL. AFAIK, one possible candidate would be userfaultfd. If a page cannot be placed (e.g., during postcopy), because we're waiting for a lock, or if the userfaultfd thread cannot process a fault, because it is waiting for a lock, there could be a deadlock. However, the BQL is not applicable here, because any other guest memory access while holding the BQL would already result in a deadlock. Nothing else in the kernel should block forever and wait for userspace intervention. Note: pause_all_vcpus()/resume_all_vcpus() or start_exclusive()/end_exclusive() cannot be used, as they either drop the BQL or require to be called without the BQL - something inhibitors cannot handle. We need a low-level locking mechanism that is deadlock-free even when not releasing the BQL. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-01-11 09:59:39 +01:00
Emanuele Giuseppe Esposito	a27dd2de68	KVM: keep track of running ioctls Using the new accel-blocker API, mark where ioctls are being called in KVM. Next, we will implement the critical section that will take care of performing memslots modifications atomically, therefore preventing any new ioctl from running and allowing the running ones to finish. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-01-11 09:59:39 +01:00
Markus Armbruster	d1c81c3496	qapi: Use returned bool to check for failure (again) Commit `012d4c96e2` changed the visitor functions taking Error ** to return bool instead of void, and the commits following it used the new return value to simplify error checking. Since then a few more uses in need of the same treatment crept in. Do that. All pretty mechanical except for * balloon_stats_get_all() This is basically the same transformation commit `012d4c96e2` applied to the virtual walk example in include/qapi/visitor.h. * set_max_queue_size() Additionally replace "goto end of function" by return. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221121085054.683122-10-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2022-12-14 16:19:35 +01:00
Chenyi Qiang	e2e69f6bb9	i386: add notify VM exit support There are cases that malicious virtual machine can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. Notify VM exit is introduced to mitigate such kind of attacks, which will generate a VM exit if no event window occurs in VM non-root mode for a specified amount of time (notify window). A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space so that the user can query the capability and set the expected notify window when creating VMs. The format of the argument when enabling this capability is as follows: Bit 63:32 - notify window specified in qemu command Bit 31:0 - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to enable the feature.) Users can configure the feature by a new (x86 only) accel property: qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n The default option of notify-vmexit is run, which will enable the capability and do nothing if the exit happens. The internal-error option raises a KVM internal error if it happens. The disable option does not enable the capability. The default value of notify-window is 0. It is valid only when notify-vmexit is not disabled. The valid range of notify-window is non-negative. It is even safe to set it to zero since there's an internal hardware threshold to be added to ensure no false positive. Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated that would set this bit), which means VM context is corrupted. It would be reflected in the flags of KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM internal error unconditionally. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-11 09:36:00 +02:00
Chenyi Qiang	5f8a6bce1f	kvm: expose struct KVMState Expose struct KVMState out of kvm-all.c so that the field of struct KVMState can be accessed when defining target-specific accelerator properties. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-11 09:36:00 +02:00
Paolo Bonzini	3dba0a335c	kvm: allow target-specific accelerator properties Several hypervisor capabilities in KVM are target-specific. When exposed to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they should not be available for all targets. Add a hook for targets to add their own properties to -accel kvm, for now no such property is defined. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-10 09:23:16 +02:00
Alex Bennée	c7f1c53735	accel/kvm: move kvm_update_guest_debug to inline stub Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220929114231.583801-47-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	a48e7d9e52	gdbstub: move guest debug support check to ops This removes the final hard coding of kvm_enabled() in gdbstub and moves the check to an AccelOps. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-46-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	ae7467b1ac	gdbstub: move breakpoint logic to accel ops As HW virtualization requires specific support to handle breakpoints lets push out special casing out of the core gdbstub code and into AccelOpsClass. This will make it easier to add other accelerator support and reduces some of the stub shenanigans. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-45-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	3b7a93880a	gdbstub: move sstep flags probing into AccelClass The support of single-stepping is very much dependent on support from the accelerator we are using. To avoid special casing in gdbstub move the probing out to an AccelClass function so future accelerators can put their code there. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-44-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Paolo Bonzini	21adec30f6	kvm: fix memory leak on failure to read stats descriptors Reported by Coverity as CID 1490142. Since the size is constant and the lifetime is the same as the StatsDescriptors struct, embed the struct directly instead of using a separate allocation. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-18 09:17:40 +02:00
Paolo Bonzini	52281c6d11	KVM: use store-release to mark dirty pages as harvested The following scenario can happen if QEMU sets more RESET flags while the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU: CPU0 CPU1 CPU2 ------------------------ ------------------ ------------------------ fill gfn0 store-rel flags for gfn0 fill gfn1 store-rel flags for gfn1 load-acq flags for gfn0 set RESET for gfn0 load-acq flags for gfn1 set RESET for gfn1 do ioctl! -----------> ioctl(RESET_RINGS) fill gfn2 store-rel flags for gfn2 load-acq flags for gfn2 set RESET for gfn2 process gfn0 process gfn1 process gfn2 do ioctl! etc. The three load-acquire in CPU0 synchronize with the three store-release in CPU2, but CPU0 and CPU1 are only synchronized up to gfn1 and CPU1 may miss gfn2's fields other than flags. The kernel must be able to cope with invalid values of the fields, and userspace will invoke the ioctl once more. However, once the RESET flag is cleared on gfn2, it is lost forever, therefore in the above scenario CPU1 must read the correct value of gfn2's fields. Therefore RESET must be set with a store-release, that will synchronize with KVM's load-acquire in CPU1. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-18 09:17:40 +02:00
Paolo Bonzini	4802bf910e	KVM: dirty ring: add missing memory barrier The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid. If the read of the fields are not ordered after the read of the flag, QEMU might see stale values. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 08:37:04 +02:00
Paolo Bonzini	a9197ad210	kvm: fix segfault with query-stats-schemas and -M none -M none creates a guest without a vCPU, causing the following error: $ ./qemu-system-x86_64 -qmp stdio -M none -accel kvm {execute:qmp_capabilities} {"return": {}} {execute: query-stats-schemas} Segmentation fault (core dumped) Fix it by not querying the vCPU stats if first_cpu is NULL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-18 14:08:24 +02:00
Cornelia Huck	47c182fe8b	kvm: don't use perror() without useful errno perror() is designed to append the decoded errno value to a string. This, however, only makes sense if we called something that actually sets errno prior to that. For the callers that check for split irqchip support that is not the case, and we end up with confusing error messages that end in "success". Use error_report() instead. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20220728142446.438177-1-cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-29 00:15:02 +02:00
Peter Maydell	d12dd9c7ee	accel/kvm: Avoid Coverity warning in query_stats() Coverity complains that there is a codepath in the query_stats() function where it can leak the memory pointed to by stats_list. This can only happen if the caller passes something other than STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no callsite does. Enforce this assumption using g_assert_not_reached(), so that if we have a future bug we hit the assert rather than silently leaking memory. Resolves: Coverity CID 1490140 Fixes: `cc01a3f4ca` ("kvm: Support for querying fd-based stats") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-22 19:01:44 +02:00
Peter Maydell	5288bee45f	* Boolean statistics for KVM * Fix build on Haiku Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging * Boolean statistics for KVM * Fix build on Haiku # gpg: Signature made Tue 19 Jul 2022 10:32:34 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: util: Fix broken build on Haiku kvm: add support for boolean statistics monitor: add support for boolean statistics Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-07-21 11:13:01 +01:00
Hyman Huang(黄勇)	baa609832e	softmmu/dirtylimit: Implement virtual CPU throttle Setup a negative feedback system when vCPU thread handling KVM_EXIT_DIRTY_RING_FULL exit by introducing throttle_us_per_full field in struct CPUState. Sleep throttle_us_per_full microseconds to throttle vCPU if dirtylimit is in service. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-07-20 12:15:08 +01:00
Hyman Huang(黄勇)	4a06a7cc05	accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function Introduce kvm_dirty_ring_size util function to help calculate dirty ring full time. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-07-20 12:15:08 +01:00
Hyman Huang(黄勇)	1667e2b97b	accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping Add a non-required argument 'CPUState' to kvm_dirty_ring_reap so that it can cover single vcpu dirty-ring-reaping scenario. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-07-20 12:15:08 +01:00
Paolo Bonzini	105bb7cdbe	kvm: add support for boolean statistics The next version of Linux will introduce boolean statistics, which can only have 0 or 1 values. Convert them to the new QAPI fields added in the previous commit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-18 18:51:32 +02:00
Miaoqian Lin	f696b74b15	accel: kvm: Fix memory leak in find_stats_descriptors This function doesn't release descriptors in one error path, result in memory leak. Call g_free() to release it. Fixes: `cc01a3f4ca` ("kvm: Support for querying fd-based stats") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Message-Id: <20220624063159.57411-1-linmq006@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-08 11:03:36 +02:00
Paolo Bonzini	cf7405bc02	qmp: add filtering of statistics by name Allow retrieving only a subset of statistics. This can be useful for example in order to plot a subset of the statistics many times a second: KVM publishes ~40 statistics for each vCPU on x86; retrieving and serializing all of them would be useless. Another use will be in HMP in the following patch; implementing the filter in the backend is easy enough that it was deemed okay to make this a public interface. Example: { "execute": "query-stats", "arguments": { "target": "vcpu", "vcpus": [ "/machine/unattached/device[2]", "/machine/unattached/device[4]" ], "providers": [ { "provider": "kvm", "names": [ "l1d_flush", "exits" ] } ] } } { "return": { "vcpus": [ { "path": "/machine/unattached/device[2]", "providers": [ { "provider": "kvm", "stats": [ { "name": "l1d_flush", "value": 41213 }, { "name": "exits", "value": 74291 } ] } ] }, { "path": "/machine/unattached/device[4]", "providers": [ { "provider": "kvm", "stats": [ { "name": "l1d_flush", "value": 16132 }, { "name": "exits", "value": 57922 } ] } ] } ] } } Extracted from a patch by Mark Kanda. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-14 16:50:30 +02:00
Paolo Bonzini	068cc51d42	qmp: add filtering of statistics by provider Allow retrieving the statistics from a specific provider only. This can be used in the future by HMP commands such as "info sync-profile" or "info profile". The next patch also adds filter-by-provider capabilities to the HMP equivalent of query-stats, "info stats". Example: { "execute": "query-stats", "arguments": { "target": "vm", "providers": [ { "provider": "kvm" } ] } } The QAPI is a bit more verbose than just a list of StatsProvider, so that it can be subsequently extended with filtering of statistics by name. If a provider is specified more than once in the filter, each request will be included separately in the output. Extracted from a patch by Mark Kanda. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-14 16:50:30 +02:00
Paolo Bonzini	467ef823d8	qmp: add filtering of statistics by target vCPU Introduce a simple filtering of statistics, that allows to retrieve statistics for a subset of the guest vCPUs. This will be used for example by the HMP monitor, in order to retrieve the statistics for the currently selected CPU. Example: { "execute": "query-stats", "arguments": { "target": "vcpu", "vcpus": [ "/machine/unattached/device[2]", "/machine/unattached/device[4]" ] } } Extracted from a patch by Mark Kanda. Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-14 16:50:30 +02:00
Mark Kanda	cc01a3f4ca	kvm: Support for querying fd-based stats Add support for querying fd-based KVM stats - as introduced by Linux kernel commit: cb082bfab59a ("KVM: stats: Add fd-based API to read binary stats data") This allows the user to analyze the behavior of the VM without access to debugfs. Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-14 16:50:30 +02:00
Peter Maydell	9323e79f10	Fix 'writeable' typos We have about 30 instances of the typo/variant spelling 'writeable', and over 500 of the more common 'writable'. Standardize on the latter. Change produced with: sed -i -e 's/\([Ww][Rr][Ii][Tt]\)[Ee]\([Aa][Bb][Ll][Ee]\)/\1\2/g' $(git grep -il writeable) and then hand-undoing the instance in linux-headers/linux/kvm.h. Most of these changes are in comments or documentation; the exceptions are: * a local variable in accel/hvf/hvf-accel-ops.c * a local variable in accel/kvm/kvm-all.c * the PMCR_WRITABLE_MASK macro in target/arm/internals.h * the EPT_VIOLATION_GPA_WRITABLE macro in target/i386/hvf/vmcs.h (which is never used anywhere) * the AR_TYPE_WRITABLE_MASK macro in target/i386/hvf/vmx.h (which is never used anywhere) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Message-id: 20220505095015.2714666-1-peter.maydell@linaro.org	2022-06-08 19:38:47 +01:00
Marc-André Lureau	8e3b0cbb72	Replace qemu_real_host_page variables with inlined functions Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:38 +02:00
Marc-André Lureau	ee3eb3a7ce	Replace TARGET_WORDS_BIGENDIAN Convert the TARGET_WORDS_BIGENDIAN macro, similarly to what was done with HOST_BIG_ENDIAN. The new TARGET_BIG_ENDIAN macro is either 0 or 1, and thus should always be defined to prevent misuse. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-8-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:37 +02:00
Marc-André Lureau	e03b56863d	Replace config-time define HOST_WORDS_BIGENDIAN Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-06 10:50:37 +02:00
Markus Armbruster	b21e238037	Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>	2022-03-21 15:44:44 +01:00
Longpeng(Mike)	def4c5570c	kvm/msi: do explicit commit when adding msi routes We invoke the kvm_irqchip_commit_routes() for each addition to MSI route table, which is not efficient if we are adding lots of routes in some cases. This patch lets callers invoke the kvm_irqchip_commit_routes(), so the callers can decide how to optimize. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20220222141116.2091-3-longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-15 11:26:20 +01:00
Philippe Mathieu-Daudé	3919635582	accel: Introduce AccelOpsClass::cpus_are_resettable() Add cpus_are_resettable() to AccelOps, and implement it for the KVM accelerator. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220207075426.81934-12-f4bug@amsat.org>	2022-03-06 13:15:42 +01:00
Philippe Mathieu-Daudé	ad7d684dfd	accel: Introduce AccelOpsClass::cpu_thread_is_idle() Add cpu_thread_is_idle() to AccelOps, and implement it for the KVM / WHPX accelerators. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220207075426.81934-11-f4bug@amsat.org>	2022-03-06 13:15:42 +01:00
Maxim Levitsky	fd2ddd1689	kvm: add support for KVM_GUESTDBG_BLOCKIRQ Use the KVM_GUESTDBG_BLOCKIRQ debug flag if supported. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [Extracted from Maxim's patch into a separate commit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211111110604.207376-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-10 09:47:18 +01:00
Maxim Levitsky	12bc5b4cd5	gdbstub, kvm: let KVM report supported singlestep flags Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [Extracted from Maxim's patch into a separate commit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20211111110604.207376-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-10 09:47:18 +01:00
Hyman Huang(黄勇)	7786ae40ba	KVM: introduce dirty_pages and kvm_dirty_ring_enabled dirty_pages is used to calculate dirtyrate via dirty ring, when enabled, kvm-reaper will increase the dirty pages after gfns being dirtied. kvm_dirty_ring_enabled shows if kvm-reaper is working. dirtyrate thread could use it to check if measurement can base on dirty ring feature. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <fee5fb2ab17ec2159405fc54a3cff8e02322f816.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-11-01 22:56:43 +01:00
Philippe Mathieu-Daudé	773ab6cb16	target/i386/kvm: Restrict SEV stubs to x86 architecture SEV is x86-specific, no need to add its stub to other architectures. Move the stub file to target/i386/kvm/. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211007161716.453984-5-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-13 10:47:49 +02:00
Peter Xu	142518bda5	memory: Name all the memory listeners Provide a name field for all the memory listeners. It can be used to identify which memory listener is which. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210817013553.30584-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Michael Tokarev	7916b5fc8c	target/i386: spelling: occured=>occurred, mininum=>minimum Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Message-Id: <20210818141352.417716-1-mjt@msgid.tls.msk.ru> [lv: add mininum=>minimum in subject] Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-09-16 11:51:23 +02:00
Fabiano Rosas	380e49297c	kvm: ppc: Print meaningful message on KVM_CREATE_VM failure PowerPC has two KVM types (HV, PR) that translate into three kernel modules: kvm.ko - common kvm code kvm_hv.ko - kvm running with MSR_HV=1 or MSR_HV|PR=0 in a nested guest. kvm_pr.ko - kvm running in usermode MSR_PR=1. Since the two KVM types can both be running at the same time, this creates a situation in which it is possible for one or both of the modules to fail to initialize, leaving the generic one behind. This leads QEMU to think it can create a guest, but KVM will fail when calling the type-specific code: ioctl(KVM_CREATE_VM) failed: 22 Invalid argument qemu-kvm: failed to initialize KVM: Invalid argument Ideally this would be solved kernel-side, but it might be a while until we can get rid of one of the modules. So in the meantime this patch tries to make this less confusing for the end user by adding a more elucidative message: ioctl(KVM_CREATE_VM) failed: 22 Invalid argument PPC KVM module is not loaded. Try 'modprobe kvm_hv'. [dwg: Fixed error in #elif which failed compile on !ppc hosts] Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210722141340.2367905-1-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-29 10:59:49 +10:00
Peter Xu	dcafa24827	KVM: Fix dirty ring mmap incorrect size due to renaming accident Found this when I wanted to try the per-vcpu dirty rate series out, then I found that it's not really working and it can quickly hang death a guest. I found strange errors (e.g. guest crash after migration) happens even without the per-vcpu dirty rate series. When merging dirty ring, probably no one notice that the trivial renaming diff [1] missed two existing references of kvm_dirty_ring_sizes; they do matter since otherwise we'll mmap() a shorter range of memory after the renaming. I think it didn't SIGBUS for me easily simply because some other stuff within qemu mmap()ed right after the dirty rings (e.g. when testing 4096 slots, it aligned with one small page on x86), so when we access the rings we've been reading/writting to random memory elsewhere of qemu. Fix the two sizes when map/unmap the shared dirty gfn memory. [1] https://lore.kernel.org/qemu-devel/dac5f0c6-1bca-3daf-e5d2-6451dbbaca93@redhat.com/ Cc: Hyman Huang <huangy81@chinatelecom.cn> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210609014355.217110-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:12 +02:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Peter Xu	b4420f198d	KVM: Dirty ring support KVM dirty ring is a new interface to pass over dirty bits from kernel to the userspace. Instead of using a bitmap for each memory region, the dirty ring contains an array of dirtied GPAs to fetch (in the form of offset in slots). For each vcpu there will be one dirty ring that binds to it. kvm_dirty_ring_reap() is the major function to collect dirty rings. It can be called either by a standalone reaper thread that runs in the background, collecting dirty pages for the whole VM. It can also be called directly by any thread that has BQL taken. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-11-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Peter Xu	a81a592698	KVM: Disable manual dirty log when dirty ring enabled KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only useful for KVM_GET_DIRTY_LOG. Skip enabling it for kvm dirty ring. More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages initially, which is against how kvm dirty ring is used - there's no way for kvm dirty ring to re-protect a page before it's notified as being written first with a GFN entry in the ring! So when KVM_DIRTY_LOG_INITIALLY_SET is enabled with dirty ring, we'll see silent data loss after migration. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-10-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Peter Xu	2ea5cb0a47	KVM: Add dirty-ring-size property Add a parameter for dirty gfn count for dirty rings. If zero, dirty ring is disabled. Otherwise dirty ring will be enabled with the per-vcpu gfn count as specified. If dirty ring cannot be enabled due to unsupported kernel or illegal parameter, it'll fallback to dirty logging. By default, dirty ring is not enabled (dirty-gfn-count default to 0). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-9-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	563d32ba9b	KVM: Cache kvm slot dirty bitmap size Cache it too because we'll reference it more frequently in the future. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-8-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	29b7e8be76	KVM: Simplify dirty log sync in kvm_set_phys_mem kvm_physical_sync_dirty_bitmap() on the whole section is inaccurate, because the section can be a superset of the memslot that we're working on. The result is that if the section covers multiple kvm memslots, we could be doing the synchronization for multiple times for each kvmslot in the section. With the two helpers that we just introduced, it's very easy to do it right now by calling the helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-7-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	2c20b27eed	KVM: Provide helper to sync dirty bitmap from slot to ramblock kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an awkward way from the MemoryRegionSection that passed in from the caller. The truth is for each KVMSlot the ramblock offset never change for the lifecycle. Cache the ramblock offset for each KVMSlot into the structure when the KVMSlot is created. With that, we can further simplify kvm_physical_sync_dirty_bitmap() with a helper to sync KVMSlot dirty bitmap to the ramblock dirty bitmap of a specific KVMSlot. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-6-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	e65e5f50db	KVM: Provide helper to get kvm dirty log Provide a helper kvm_slot_get_dirty_log() to make the function kvm_physical_sync_dirty_bitmap() clearer. We can even cache the as_id into KVMSlot when it is created, so that we don't even need to pass it down every time. Since at it, remove return value of kvm_physical_sync_dirty_bitmap() because it should never fail. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	ea776d15ad	KVM: Create the KVMSlot dirty bitmap on flag changes Previously we have two places that will create the per KVMSlot dirty bitmap: 1. When a newly created KVMSlot has dirty logging enabled, 2. When the first log_sync() happens for a memory slot. The 2nd case is lazy-init, while the 1st case is not (which is a fix of what the 2nd case missed). To do explicit initialization of dirty bitmaps, what we're missing is to create the dirty bitmap when the slot changed from not-dirty-track to dirty-track. Do that in kvm_slot_update_flags(). With that, we can safely remove the 2nd lazy-init. This change will be needed for kvm dirty ring because kvm dirty ring does not use the log_sync() interface at all. Also move all the pre-checks into kvm_slot_init_dirty_bitmap(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Peter Xu	a2f77862ff	KVM: Use a big lock to replace per-kml slots_lock Per-kml slots_lock will bring some trouble if we want to take all slots_lock of all the KMLs, especially when we're in a context that we could have taken some of the KML slots_lock, then we even need to figure out what we've taken and what we need to take. Make this simple by merging all KML slots_lock into a single slots lock. Per-kml slots_lock isn't anything that helpful anyway - so far only x86 has two address spaces (so, two slots_locks). All the rest archs will be having one address space always, which means there's actually one slots_lock so it will be the same as before. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Paolo Bonzini	70cbae429e	KVM: do not allow setting properties at runtime Only allow accelerator properties to be set when the accelerator is being created. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:45 +02:00
Thomas Huth	ee86213aa3	Do not include exec/address-spaces.h if it's not really necessary Stop including exec/address-spaces.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-5-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:51 +02:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
David Edmondson	56567da376	accel: kvm: clarify that extra exit data is hexadecimal When dumping the extra exit data provided by KVM, make it clear that the data is hexadecimal. At the same time, zero-pad the output. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210428142431.266879-1-david.edmondson@oracle.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Paolo Bonzini	26dbec410e	Revert "accel: kvm: Add aligment assert for kvm_log_clear_one_slot" This reverts commit `3920552846`. Thomas Huth reported a failure with CentOS 6 guests: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-16 14:30:30 -04:00
Andrew Jones	516fc0a081	accel: kvm: Fix kvm_type invocation Prior to commit `f2ce39b4f0` a MachineClass kvm_type method only needed to be registered to ensure it would be executed. With commit `f2ce39b4f0` a kvm-type machine property must also be specified. hw/arm/virt relies on the kvm_type method to pass its selected IPA limit to KVM, but this is not exposed as a machine property. Restore the previous functionality of invoking kvm_type when it's present. Fixes: `f2ce39b4f0` ("vl: make qemu_get_machine_opts static") Signed-off-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 20210310135218.255205-2-drjones@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-12 12:47:11 +00:00
Keqian Zhu	3920552846	accel: kvm: Add aligment assert for kvm_log_clear_one_slot The parameters start and size are transfered from QEMU memory emulation layer. It can promise that they are TARGET_PAGE_SIZE aligned. However, KVM needs they are qemu_real_page_size aligned. Though no caller breaks this aligned requirement currently, we'd better add an explicit assert to avoid future breaking. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201217014941.22872-3-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 11:41:54 +01:00
Keqian Zhu	e0a8f99355	accel: kvm: Fix memory waste under mismatch page size When handle dirty log, we face qemu_real_host_page_size and TARGET_PAGE_SIZE. The first one is the granule of KVM dirty bitmap, and the second one is the granule of QEMU dirty bitmap. As qemu_real_host_page_size >= TARGET_PAGE_SIZE (kvm_init() enforced it), misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste memory. For example, when qemu_real_host_page_size is 64K and TARGET_PAGE_SIZE is 4K, it wastes 93.75% (15/16) memory. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20201217014941.22872-2-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 11:41:54 +01:00
Tom Lendacky	92a5199b29	sev/i386: Don't allow a system reset under an SEV-ES guest An SEV-ES guest does not allow register state to be altered once it has been measured. When an SEV-ES guest issues a reboot command, Qemu will reset the vCPU state and resume the guest. This will cause failures under SEV-ES. Prevent that from occuring by introducing an arch-specific callback that returns a boolean indicating whether vCPUs are resettable. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Message-Id: <1ac39c441b9a3e970e9556e1cc29d0a0814de6fd.1611682609.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-16 17:15:39 +01:00
Paolo Bonzini	b2f73a0784	sev/i386: Allow AP booting under SEV-ES When SEV-ES is enabled, it is not possible modify the guests register state after it has been initially created, encrypted and measured. Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the hypervisor cannot emulate this because it cannot update the AP register state. For the very first boot by an AP, the reset vector CS segment value and the EIP value must be programmed before the register has been encrypted and measured. Search the guest firmware for the guest for a specific GUID that tells Qemu the value of the reset vector to use. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-16 17:15:39 +01:00
Thomas Huth	38e0b7904e	accel/kvm/kvm-all: Fix wrong return code handling in dirty log code The kvm_vm_ioctl() wrapper already returns -errno if the ioctl itself returned -1, so the callers of kvm_vm_ioctl() should not check for -1 but for a value < 0 instead. This problem has been fixed once already in commit `b533f658a9` but that commit missed that the ENOENT error code is not fatal for this ioctl, so the commit has been reverted in commit `50212d6346` since the problem occurred close to a pending release at that point in time. The plan was to fix it properly after the release, but it seems like this has been forgotten. So let's do it now finally instead. Resolves: https://bugs.launchpad.net/qemu/+bug/1294227 Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210129084354.42928-1-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 14:43:55 +01:00
David Gibson	ec78e2cda3	confidential guest support: Move SEV initialization into arch specific code While we've abstracted some (potential) differences between mechanisms for securing guest memory, the initialization is still specific to SEV. Given that, move it into x86's kvm_arch_init() code, rather than the generic kvm_init() code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2021-02-08 16:57:38 +11:00
David Gibson	c9f5aaa6bc	sev: Add Error ** to sev_kvm_init() This allows failures to be reported richly and idiomatically. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2021-02-08 16:57:38 +11:00
David Gibson	e0292d7c62	confidential guest support: Rework the "memory-encryption" property Currently the "memory-encryption" property is only looked at once we get to kvm_init(). Although protection of guest memory from the hypervisor isn't something that could really ever work with TCG, it's not conceptually tied to the KVM accelerator. In addition, the way the string property is resolved to an object is almost identical to how a QOM link property is handled. So, create a new "confidential-guest-support" link property which sets this QOM interface link directly in the machine. For compatibility we keep the "memory-encryption" property, but now implemented in terms of the new property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2021-02-08 16:57:38 +11:00
David Gibson	aacdb84413	sev: Remove false abstraction of flash encryption When AMD's SEV memory encryption is in use, flash memory banks (which are initialed by pc_system_flash_map()) need to be encrypted with the guest's key, so that the guest can read them. That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM state.. except, that it doesn't really abstract much at all. For starters, the only call site is in code specific to the 'pc' family of machine types, so it's obviously specific to those and to x86 to begin with. But it makes a bunch of further assumptions that need not be true about an arbitrary confidential guest system based on memory encryption, let alone one based on other mechanisms: * it assumes that the flash memory is defined to be encrypted with the guest key, rather than being shared with hypervisor * it assumes that that hypervisor has some mechanism to encrypt data into the guest, even though it can't decrypt it out, since that's the whole point * the interface assumes that this encrypt can be done in place, which implies that the hypervisor can write into a confidential guests's memory, even if what it writes isn't meaningful So really, this "abstraction" is actually pretty specific to the way SEV works. So, this patch removes it and instead has the PC flash initialization code call into a SEV specific callback. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2021-02-08 16:57:38 +11:00
Claudio Fontana	b86f59c715	accel: replace struct CpusAccel with AccelOpsClass This will allow us to centralize the registration of the cpus.c module accelerator operations (in accel/accel-softmmu.c), and trigger it automatically using object hierarchy lookup from the new accel_init_interfaces() initialization step, depending just on which accelerators are available in the code. Rename all tcg-cpus.c, kvm-cpus.c, etc to tcg-accel-ops.c, kvm-accel-ops.c, etc, matching the object type names. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20210204163931.7358-18-cfontana@suse.de> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-02-05 10:24:15 -10:00
Jiaxun Yang	eb8b1a797a	accel/kvm: avoid using predefined PAGE_SIZE As per POSIX specification of limits.h [1], OS libc may define PAGE_SIZE in limits.h. PAGE_SIZE is used in included kernel uapi headers. To prevent collosion of definition, we discard PAGE_SIZE from defined by libc and take QEMU's variable. [1]: https://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20210118063808.12471-8-jiaxun.yang@flygoat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-01-20 10:46:54 +01:00
Zenghui Yu	4054adbdd2	kvm: Take into account the unaligned section size when preparing bitmap The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the start and the size of the given range of pages. We have been careful to handle the unaligned cases when performing CLEAR on one slot. But it seems that we forget to take the unaligned size case into account when preparing bitmap for the interface, and we may end up clearing dirty status for pages outside of [start, start + size). If the size is unaligned, let's go through the slow path to manipulate a temp bitmap for the interface so that we won't bother with those unaligned bits at the end of bitmap. I don't think this can happen in practice since the upper layer would provide us with the alignment guarantee. I'm not sure if kvm-all could rely on it. And this patch is mainly intended to address correctness of the specific algorithm used inside kvm_log_clear_one_slot(). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Message-Id: <20201208114013.875-1-yuzenghui@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-15 12:52:05 -05:00
Paolo Bonzini	f2ce39b4f0	vl: make qemu_get_machine_opts static Machine options can be retrieved as properties of the machine object. Encourage that by removing the "easy" accessor to machine options. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-15 12:51:55 -05:00
Elena Afanasova	f9b4908895	accel/kvm: add PIO ioeventfds only in case kvm_eventfds_allowed is true Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Message-Id: <20201017210102.26036-1-eafanasova@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-11-03 18:55:13 +00:00
Dr. David Alan Gilbert	d0a92b353e	kvm: kvm_init_vcpu take Error pointer Clean up the error handling in kvm_init_vcpu so we can see what went wrong more easily. Make it take an Error ** and fill it out with what failed, including the cpu id, so you can tell if it only fails at a given ID. Replace the remaining DPRINTF by a trace. This turns a: kvm_init_vcpu failed: Invalid argument into: kvm_init_vcpu: kvm_get_vcpu failed (256): Invalid argument and with the trace you then get to see: 19049@1595520414.310107:kvm_init_vcpu index: 169 id: 212 19050@1595520414.310635:kvm_init_vcpu index: 170 id: 256 qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (256): Invalid argument which makes stuff a lot more obvious. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200723160915.129069-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-05 16:41:22 +02:00
Claudio Fontana	e0715f6abc	kvm: remove kvm specific functions from global includes Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-05 16:41:22 +02:00
Claudio Fontana	57038a92bb	cpus: extract out kvm-specific code to accel/kvm register a "CpusAccel" interface for KVM as well. Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> [added const] Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-05 16:41:22 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Marc-André Lureau	1a82878a08	meson: accel Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:36 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Markus Armbruster	668f62ec62	error: Eliminate error_propagate() with Coccinelle, part 1 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) | - !fun(args, &err, args2) + !fun(args, errp, args2) | - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var | !var | var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; | return var; ) } @depends on rule1 || rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). 
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	14217038bc	qapi: Use returned bool to check for failure, manual part The previous commit used Coccinelle to convert from checking the Error object to checking the return value. Convert a few more manually. Also tweak control flow in places to conform to the conventional "if error bail out" pattern. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-20-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	62a35aaa31	qapi: Use returned bool to check for failure, Coccinelle part The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list\|input_type_enum\|lv_start_struct\|lv_type_bool\|lv_type_int64\|lv_type_str\|lv_type_uint64\|output_type_enum\|parse_type_bool\|parse_type_int64\|parse_type_null\|parse_type_number\|parse_type_size\|parse_type_str\|parse_type_uint64\|print_type_bool\|print_type_int64\|print_type_null\|print_type_number\|print_type_size\|print_type_str\|print_type_uint64\|qapi_clone_start_alternate\|qapi_clone_start_list\|qapi_clone_start_struct\|qapi_clone_type_bool\|qapi_clone_type_int64\|qapi_clone_type_null\|qapi_clone_type_number\|qapi_clone_type_str\|qapi_clone_type_uint64\|qapi_dealloc_start_list\|qapi_dealloc_start_struct\|qapi_dealloc_type_anything\|qapi_dealloc_type_bool\|qapi_dealloc_type_int64\|qapi_dealloc_type_null\|qapi_dealloc_type_number\|qapi_dealloc_type_str\|qapi_dealloc_type_uint64\|qobject_input_check_list\|qobject_input_check_struct\|qobject_input_start_alternate\|qobject_input_start_list\|qobject_input_start_struct\|qobject_input_type_any\|qobject_input_type_bool\|qobject_input_type_bool_keyval\|qobject_input_type_int64\|qobject_input_type_int64_keyval\|qobject_input_type_null\|qobject_input_type_number\|qobject_input_type_number_keyval\|qobject_input_type_size_keyval\|qobject_input_type_str\|qobject_input_type_str_keyval\|qobject_input_type_uint64\|qobject_input_type_uint64_keyval\|qobject_output_start_list\|qobject_output_start_struct\|qobject_output_type_any\|qobject_output_type_bool\|qobject_output_type_int64\|qobject_output_type_null\|qobject_output_type_number\|qobject_output_type_str\|qobject_output_type_uint64\|start_list\|visit_check_list\|visit_check_struct\|visit_start_alternate\|visit_start_list\|visit_start_struct\|visit_type_.*"; expression list args; typedef Error; Error *err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
David Hildenbrand	956b109fe3	accel/kvm: Convert to ram_block_discard_disable() Discarding memory does not work as expected. At the time this is called, we cannot have anyone active that relies on discards to work properly. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-07-02 05:54:59 -04:00
Jay Zhou	494cd11d76	kvm: support to get/set dirty log initial-all-set capability Since the new capability KVM_DIRTY_LOG_INITIALLY_SET of KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 has been introduced in the kernel, tweak the userspace side to detect and enable this capability. Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20200304025554.2159-1-jianjay.zhou@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-26 06:45:29 -04:00
Peter Xu	c82d9d43ed	KVM: Kick resamplefd for split kernel irqchip This is mainly for X86 because that's the only one that supports split irqchip for now. When the irqchip is split, we face a dilemma that KVM irqfd will be enabled, however the slow irqchip is still running in the userspace. It means that the resamplefd in the kernel irqfds won't take any effect and it will fail to ack INTx interrupts on EOIs. One example is split irqchip with VFIO INTx, which will break if we use the VFIO INTx fast path. This patch can potentially support the VFIO fast path again for INTx, so that the IRQ delivery will still use the fast path, while we don't need to trap MMIOs in QEMU for the device to emulate the EOIs (see the callers of vfio_eoi() hook). However the EOI of the INTx will still need to be done from the userspace by caching all the resamplefds in QEMU and kicking them properly for IOAPIC EOI broadcast. This is tricky because in this case the userspace ioapic irr & remote-irr will be bypassed. However such a change will greatly boost performance for assigned devices using INTx irqs (TCP_RR boosts 46% after this patch applied). When the userspace is responsible for the resamplefd kickup, don't register it on the kvm_irqfd anymore, because on newer kernels (after commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail with both split irqchip and resamplefd. This will make sure that the fast path will work for all supported kernels. https://patchwork.kernel.org/patch/10738541/#22609933 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200318145204.74483-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:10:33 -04:00
Peter Xu	ff66ba87ba	KVM: Pass EventNotifier into kvm_irqchip_assign_irqfd So that kvm_irqchip_assign_irqfd() can have access to the EventNotifiers, especially the resample event. It is needed in follow up patch to cache and kick resamplefds from QEMU. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200318145204.74483-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:10:28 -04:00
Markus Armbruster	d2623129a7	qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]	2020-05-15 07:07:58 +02:00
Markus Armbruster	7eecec7d12	qom: Drop object_property_set_description() parameter @errp object_property_set_description() and object_class_property_set_description() fail only when property @name is not found. There are 85 calls of object_property_set_description() and object_class_property_set_description(). None of them can fail: * 84 immediately follow the creation of the property. * The one in spapr_rng_instance_init() refers to a property created in spapr_rng_class_init(), from spapr_rng_properties[]. Every one of them still gets to decide what to pass for @errp. 51 calls pass &error_abort, 32 calls pass NULL, one receives the error and propagates it to &error_abort, and one propagates it to &error_fatal. I'm actually surprised none of them violates the Error API. What are we gaining by letting callers handle the "property not found" error? Use when the property is not known to exist is simpler: you don't have to guard the call with a check. We haven't found such a use in 5+ years. Until we do, let's make life a bit simpler and drop the @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-8-armbru@redhat.com> [One semantic rebase conflict resolved]	2020-05-15 07:06:49 +02:00
Dongjiu Geng	6b552b9bc8	KVM: Move hwpoison page related functions into kvm-all.c kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86 and ARM platforms, so moving them into "accel/kvm/kvm-all.c" to avoid duplicate code. For architectures that don't use the poison-list functionality the reset handler will harmlessly do nothing, so let's register the kvm_unpoison_all() function in the generic kvm_init() function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Acked-by: Xiang Zheng <zhengxiang9@huawei.com> Message-id: 20200512030609.19593-8-gengdongjiu@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-14 15:03:09 +01:00
Paolo Bonzini	9e264985ff	Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEAD	2020-02-25 13:41:48 +01:00
Philippe Mathieu-Daudé	88cd34ee9e	accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value kvm_vm_ioctl() can fail, check its return value, and log an error when it failed. This fixes Coverity CID 1412229: Unchecked return value (CHECKED_RETURN) check_return: Calling kvm_vm_ioctl without checking return value Reported-by: Coverity (CID 1412229) Fixes: `235e8982ad` ("support using KVM_MEM_READONLY flag for regions") Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20200221163336.2362-1-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-25 09:18:01 +01:00

211 commits