Commit graph

105583 commits

Author SHA1 Message Date
Richard Henderson
99982beb4d linux-user: Rewrite mmap_frag
Use 'last' variables instead of 'end' variables.
Always zero MAP_ANONYMOUS fragments, which we previously
failed to do if they were not writable; early exit in case
we allocate a new page from the kernel, known zeros.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-16-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
7bdc1acc24 linux-user: Rewrite target_mprotect
Use 'last' variables instead of 'end' variables.
When host page size > guest page size, detect when
adjacent host pages have the same protection and
merge that expanded host range into fewer syscalls.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-15-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
55baec0f4c linux-user: Widen target_mmap offset argument to off_t
We build with _FILE_OFFSET_BITS=64, so off_t = off64_t = uint64_t.
With an extra cast, this fixes emulation of mmap2, which could
overflow the computation of the full value of offset.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-14-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
0dd558121c linux-user: Split out target_to_host_prot
Split out from validate_prot_to_pageflags, as there is not
one single host_prot for the entire range.  We need to adjust
prot for every host page that overlaps multiple guest pages.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-13-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
037986053b linux-user: Implement MAP_FIXED_NOREPLACE
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-12-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
9c255cb53e bsd-user: Use page_check_range_empty for MAP_EXCL
The previous check returned -1 when any page within
[start, start+len) is unmapped, not when all are unmapped.

Cc: Warner Losh <imp@bsdimp.com>
Cc: Kyle Evans <kevans@freebsd.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Message-Id: <20230707204054.8792-11-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
c2281ddcf3 accel/tcg: Introduce page_check_range_empty
Examine the interval tree to validate that a region
has no existing mappings.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-10-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
4b840f9609 linux-user: Populate more bits in mmap_flags_tbl
Fix translation of TARGET_MAP_SHARED and TARGET_MAP_PRIVATE,
which are types not single bits.  Add TARGET_MAP_SHARED_VALIDATE,
TARGET_MAP_SYNC, TARGET_MAP_NONBLOCK, TARGET_MAP_POPULATE,
TARGET_MAP_FIXED_NOREPLACE, and TARGET_MAP_UNINITIALIZED.

Update strace to match.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-9-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
6edfca9eae linux-user: Split TARGET_PROT_* out of syscall_defs.h
Move the values into the per-target target_mman.h headers

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-8-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
492fe4e754 linux-user: Split TARGET_MAP_* out of syscall_defs.h
Move the values into the per-target target_mman.h headers

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-7-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
40965ad931 linux-user/strace: Expand struct flags to hold a mask
A zero bit value does not make sense -- it must relate to
some field in some way.

Define FLAG_BASIC with a build-time sanity check.
Adjust FLAG_GENERIC and FLAG_TARGET to use it.
Add FLAG_GENERIC_MASK and FLAG_TARGET_MASK.

Fix up the existing flag definitions for build errors.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-6-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
2b730f797e linux-user: Fix formatting of mmap.c
Fix all checkpatch.pl errors within mmap.c.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-5-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Andreas Schwab
d28b3c90cf linux-user: Make sure initial brk(0) is page-aligned
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab <schwab@suse.de>
Message-Id: <mvmpm55qnno.fsf@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
e18ed26ce7 tcg: Fix info_in_idx increment in layout_arg_by_ref
Off by one error, failing to take into account that layout_arg_1
already incremented info_in_idx for the first piece.  We only
need care for the n-1 TCG_CALL_ARG_BY_REF_N pieces here.

Cc: qemu-stable@nongnu.org
Fixes: 313bdea84d ("tcg: Add TCG_CALL_{RET,ARG}_BY_REF")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1751
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
cb62bd15e1 accel/tcg: Split out cpu_exec_longjmp_cleanup
Share the setjmp cleanup between cpu_exec_step_atomic
and cpu_exec_setjmp.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
9b61f77f40 linux-user: Fix do_shmat type errors
The guest address, raddr, should be unsigned, aka abi_ulong.
The host addresses should be cast via *intptr_t not long.
Drop the inline and fix two other whitespace issues.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230626140250.69572-1-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Pierrick Bouvier
7a8d9f3a0e linux-user/syscall: Implement execve without execveat
Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.

However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.

The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.

This patch was tested with example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[1] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230705121023.973284-1-pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
ea9812d93f include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2
Based on gcc's nios2.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
e73f27003e include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze
Based on gcc's microblaze.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
0f41be8d89 linux-user: Use abi_uint not unsigned in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
20d49567a3 linux-user: Use abi_short not short in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
77e935f4e6 linux-user: Use abi_ushort not unsigned short in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
b3c719b2d1 linux-user: Use abi_int not int in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
55a1bcff0c linux-user: Use abi_llong not long long in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
6c977729ef linux-user: Use abi_ullong not unsigned long long in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
c7828bd1c2 linux-user: Use abi_uint not unsigned int in syscall_defs.h
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
93c5c6cd99 linux-user: Use abi_llong not int64_t in syscall_defs.h
Be careful not to change linux_dirent64, which is a host structure.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
7af406a629 linux-user: Use abi_ullong not uint64_t in syscall_defs.h
Be careful not to change linux_dirent64, which is a host structure.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
5dc0c97130 linux-user: Use abi_int not int32_t in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
f6ee1627f3 linux-user: Use abi_uint not uint32_t in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
cb80ce5e65 linux-user: Remove #if 0 block in syscall_defs.h
These definitions are in sparc/signal.c.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
4f8ed2fd79 linux-user: Reformat syscall_defs.h
Untabify and re-indent.
We had a mix of 2, 3, 4, and 8 space indentation.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
4633c1e2c5 * SCSI unit attention fix
* add PCIe devices to s390x emulator
 * IDE unplug fix for Xen
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSxEWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPDrAf/SyGEcBr1U2v0HBwfqGcHOVPwx5Dc
 jk9628klLgRF9EqEoffFfJTf9LU5Su4WsjtGLvH+GBCV0thfaPrvQJxD4KWvxgUl
 SKX5zepw9GY+uiTmbyuStLo5a8ksL6z5Zvw92gKh2PEKwuicerJL7OnK8drTMXXS
 haL/UL3v3Qa3OwkxBIIq9uXdZjUiSib6PQD9/u7OoY67F6/ThmtUozgcMpqR/39Q
 0AdNibteN2XlUrysS9hreC0pAmqB6luAdo7wcUR53NV7Yp0yOa1jySJRxiNvHGrB
 gK7jpHL/UBjTTkBodfZD21q5Ih4Vpya2FWpg4ZZlrIEJQc2AyxCl3zw3Bg==
 =Ai1b
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* SCSI unit attention fix
* add PCIe devices to s390x emulator
* IDE unplug fix for Xen

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSxEWkUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPDrAf/SyGEcBr1U2v0HBwfqGcHOVPwx5Dc
# jk9628klLgRF9EqEoffFfJTf9LU5Su4WsjtGLvH+GBCV0thfaPrvQJxD4KWvxgUl
# SKX5zepw9GY+uiTmbyuStLo5a8ksL6z5Zvw92gKh2PEKwuicerJL7OnK8drTMXXS
# haL/UL3v3Qa3OwkxBIIq9uXdZjUiSib6PQD9/u7OoY67F6/ThmtUozgcMpqR/39Q
# 0AdNibteN2XlUrysS9hreC0pAmqB6luAdo7wcUR53NV7Yp0yOa1jySJRxiNvHGrB
# gK7jpHL/UBjTTkBodfZD21q5Ih4Vpya2FWpg4ZZlrIEJQc2AyxCl3zw3Bg==
# =Ai1b
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 14 Jul 2023 10:12:09 AM BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  scsi: clear unit attention only for REPORT LUNS commands
  scsi: cleanup scsi_clear_unit_attention()
  scsi: fetch unit attention when creating the request
  kconfig: Add PCIe devices to s390x machines
  hw/ide/piix: properly initialize the BMIBA register

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-14 16:39:46 +01:00
Stefano Garzarella
2eb5599e8a scsi: clear unit attention only for REPORT LUNS commands
scsi_clear_unit_attention() now only handles REPORTED LUNS DATA HAS
CHANGED.

This only happens when we handle REPORT LUNS commands, so let's rename
the function in scsi_clear_reported_luns_changed() and call it only in
scsi_target_emulate_report_luns().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-4-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-14 11:10:58 +02:00
Stefano Garzarella
ba947dab98 scsi: cleanup scsi_clear_unit_attention()
The previous commit moved the unit attention clearing when we create
the request. So now we can clean scsi_clear_unit_attention() to handle
only the case of the REPORT LUNS command: this is the only case in
which a UNIT ATTENTION is cleared without having been reported.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-3-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-14 11:10:58 +02:00
Stefano Garzarella
9472083e64 scsi: fetch unit attention when creating the request
Commit 1880ad4f4e ("virtio-scsi: Batched prepare for cmd reqs") split
calls to scsi_req_new() and scsi_req_enqueue() in the virtio-scsi device.
No ill effects were observed until commit 8cc5583abe ("virtio-scsi: Send
"REPORTED LUNS CHANGED" sense data upon disk hotplug events") added a
unit attention that was easy to trigger with device hotplug and
hot-unplug.

Because the two calls were separated, all requests in the batch were
prepared calling scsi_req_new() to report a sense.  The first one
submitted would report the right sense and reset it to NO_SENSE, while
the others reported CHECK_CONDITION with no sense data.  This caused
SCSI errors in Linux.

To solve this issue, let's fetch the unit attention as early as possible
when we prepare the request, so that only the first request in the batch
will use the unit attention SCSIReqOps and the others will not report
CHECK CONDITION.

Fixes: 1880ad4f4e ("virtio-scsi: Batched prepare for cmd reqs")
Fixes: 8cc5583abe ("virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events")
Reported-by: Thomas Huth <thuth@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-ID: <20230712134352.118655-2-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-14 11:10:58 +02:00
Cédric Le Goater
cc9ff56fc3 kconfig: Add PCIe devices to s390x machines
It is useful to extend the number of available PCIe devices to KVM guests
for passthrough scenarios and also to expose these models to a different
(big endian) architecture. Introduce a new config PCIE_DEVICES to select
models, Intel Ethernet adapters and one USB controller. These devices all
support MSI-X which is a requirement on s390x as legacy INTx are not
supported.

Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230712080146.839113-1-clg@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-14 11:10:57 +02:00
Olaf Hering
230dfd9257 hw/ide/piix: properly initialize the BMIBA register
According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA-BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.

The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.

Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 82371SB (Function 1)")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230712074721.14728-1-olaf@aepfle.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-14 11:10:57 +02:00
Richard Henderson
3dd9e54703 Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmSvAB0ACgkQnKSrs4Gr
 c8hVzAgAomXGVhqm/qnQ99SIry+kec9a1Bom4ZprvpEtiHndoq8bw/ujeUlr/XK0
 CBKdYNYY3R1rSB6yLsV2ea45elk3x/iMqygbJF3QfWxpHfx0l8vs1WB6uSQFqo/E
 ext1dvP8Czc0BP4MLaijvkW2u0j8qsLQnJcu9JDrRzgD8OqJSlhOxBSmb8VDvDvx
 am0RMRkYxSl7jn2LFEE4mMfUjy9JJSFhnzP8lMoGH/m8C62Eult2PFDItnTAG8hN
 IAyNDCDr2LKZwe6DP9JHUKCtqNYUHnGibgKH3k9NKWgUyOHSxqtDUC9vtoTPskGf
 BRo0XZM7qnSUZCoAhEjvKVWcEkFIkw==
 =aHUy
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmSvAB0ACgkQnKSrs4Gr
# c8hVzAgAomXGVhqm/qnQ99SIry+kec9a1Bom4ZprvpEtiHndoq8bw/ujeUlr/XK0
# CBKdYNYY3R1rSB6yLsV2ea45elk3x/iMqygbJF3QfWxpHfx0l8vs1WB6uSQFqo/E
# ext1dvP8Czc0BP4MLaijvkW2u0j8qsLQnJcu9JDrRzgD8OqJSlhOxBSmb8VDvDvx
# am0RMRkYxSl7jn2LFEE4mMfUjy9JJSFhnzP8lMoGH/m8C62Eult2PFDItnTAG8hN
# IAyNDCDr2LKZwe6DP9JHUKCtqNYUHnGibgKH3k9NKWgUyOHSxqtDUC9vtoTPskGf
# BRo0XZM7qnSUZCoAhEjvKVWcEkFIkw==
# =aHUy
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Jul 2023 08:33:49 PM BST
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  virtio-blk: fix host notifier issues during dataplane start/stop

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-12 20:46:10 +01:00
Stefan Hajnoczi
75dcb4d790 virtio-blk: fix host notifier issues during dataplane start/stop
The main loop thread can consume 100% CPU when using --device
virtio-blk-pci,iothread=<iothread>. ppoll() constantly returns but
reading virtqueue host notifiers fails with EAGAIN. The file descriptors
are stale and remain registered with the AioContext because of bugs in
the virtio-blk dataplane start/stop code.

The problem is that the dataplane start/stop code involves drain
operations, which call virtio_blk_drained_begin() and
virtio_blk_drained_end() at points where the host notifier is not
operational:
- In virtio_blk_data_plane_start(), blk_set_aio_context() drains after
  vblk->dataplane_started has been set to true but the host notifier has
  not been attached yet.
- In virtio_blk_data_plane_stop(), blk_drain() and blk_set_aio_context()
  drain after the host notifier has already been detached but with
  vblk->dataplane_started still set to true.

I would like to simplify ->ioeventfd_start/stop() to avoid interactions
with drain entirely, but couldn't find a way to do that. Instead, this
patch accepts the fragile nature of the code and reorders it so that
vblk->dataplane_started is false during drain operations. This way the
virtio_blk_drained_begin() and virtio_blk_drained_end() calls don't
touch the host notifier. The result is that
virtio_blk_data_plane_start() and virtio_blk_data_plane_stop() have
complete control over the host notifier and stale file descriptors are
no longer left in the AioContext.

This patch fixes the 100% CPU consumption in the main loop thread and
correctly moves host notifier processing to the IOThread.

Fixes: 1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230704151527.193586-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-12 15:20:32 -04:00
Richard Henderson
6f05a92ddc Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
 - Memory device cleanups (especially around machine initialization)
 - "x-ignore-shared" migration support for virtio-mem
 - Add an abstract virtio-md-pci device as a common parent for
   virtio-mem-pci and virtio-pmem-pci (virtio based memory devices)
 - Device unplug support for virtio-mem-pci
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmSuYAQRHGRhdmlkQHJl
 ZGhhdC5jb20ACgkQTd4Q9wD/g1od9A/9HXT8IqKGup9is7P/mpobPWXczRGZ5sEg
 /q21PzX6crr9aFa+fYRF/Dlm3G/cSMOVXFRKGz3royLjsvaEj/veEewfKF8KWbBf
 eIS9udQTOwoD2kAhcv3pm0SwSJoVizpw2z7IodGVKE6iZxTXsmDksqQuFbrvVLSh
 2wtP4lizEXco/YsiCoAnStj2QtXBcHw7Ua7W2cDzxFmL+1pM5w3rjQ1ydCNz3bSG
 l4CXXs1i8OmOZbFN78F/E9SEkzQnAuHSO0Sc1aeAJkwVzOt2lj/YMgt0jHjAY0at
 pheWZ5pEE6hnQP740YXpt4Y6IIgO22pH23dLhq9A2reyRnwjt830uObHi3qAE8kB
 KR+ZQ+Z5bI6ZNB/EFiUsC1dFsr2fF20zQlO02MctyJ+lUG6p3gpvwsGScQxt+zdF
 QlkiSecGErYwC+nZ529SQB4gSEJTCjd/STDoidVYnZazdStaOaSyft02xRNzBPW/
 OnOY+6ZxZK6R11KfwGjnsftrovQIP3Pqi9TXGzW2xVlkWJHqlicy6G3ZfceTTlj9
 Gg2Ue694Wr1r4PDV2XlYcZ1IPLjSy5Msp5V2wERRrp3OItxnvegvTevQN7USEHC+
 BPGNMu11jriSY2pE5BSFN0hfGOvuvsk3GreLJiHFUXoje6gzAynuLjCN/CHdIVyK
 5i0AwdZ+xcA=
 =ch6m
 -----END PGP SIGNATURE-----

Merge tag 'mem-2023-07-12' of https://github.com/davidhildenbrand/qemu into staging

Hi,

"Host Memory Backends" and "Memory devices" queue ("mem"):
- Memory device cleanups (especially around machine initialization)
- "x-ignore-shared" migration support for virtio-mem
- Add an abstract virtio-md-pci device as a common parent for
  virtio-mem-pci and virtio-pmem-pci (virtio based memory devices)
- Device unplug support for virtio-mem-pci

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmSuYAQRHGRhdmlkQHJl
# ZGhhdC5jb20ACgkQTd4Q9wD/g1od9A/9HXT8IqKGup9is7P/mpobPWXczRGZ5sEg
# /q21PzX6crr9aFa+fYRF/Dlm3G/cSMOVXFRKGz3royLjsvaEj/veEewfKF8KWbBf
# eIS9udQTOwoD2kAhcv3pm0SwSJoVizpw2z7IodGVKE6iZxTXsmDksqQuFbrvVLSh
# 2wtP4lizEXco/YsiCoAnStj2QtXBcHw7Ua7W2cDzxFmL+1pM5w3rjQ1ydCNz3bSG
# l4CXXs1i8OmOZbFN78F/E9SEkzQnAuHSO0Sc1aeAJkwVzOt2lj/YMgt0jHjAY0at
# pheWZ5pEE6hnQP740YXpt4Y6IIgO22pH23dLhq9A2reyRnwjt830uObHi3qAE8kB
# KR+ZQ+Z5bI6ZNB/EFiUsC1dFsr2fF20zQlO02MctyJ+lUG6p3gpvwsGScQxt+zdF
# QlkiSecGErYwC+nZ529SQB4gSEJTCjd/STDoidVYnZazdStaOaSyft02xRNzBPW/
# OnOY+6ZxZK6R11KfwGjnsftrovQIP3Pqi9TXGzW2xVlkWJHqlicy6G3ZfceTTlj9
# Gg2Ue694Wr1r4PDV2XlYcZ1IPLjSy5Msp5V2wERRrp3OItxnvegvTevQN7USEHC+
# BPGNMu11jriSY2pE5BSFN0hfGOvuvsk3GreLJiHFUXoje6gzAynuLjCN/CHdIVyK
# 5i0AwdZ+xcA=
# =ch6m
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Jul 2023 09:10:44 AM BST
# gpg:                using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
# gpg:                issuer "david@redhat.com"
# gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
# gpg:                 aka "David Hildenbrand <davidhildenbrand@gmail.com>" [undefined]
# gpg:                 aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D  FCCA 4DDE 10F7 00FF 835A

* tag 'mem-2023-07-12' of https://github.com/davidhildenbrand/qemu: (21 commits)
  virtio-mem-pci: Device unplug support
  virtio-mem: Prepare for device unplug support
  virtio-md-pci: Support unplug requests for compatible devices
  virtio-md-pci: Handle unplug of virtio based memory devices
  arm/virt: Use virtio-md-pci (un)plug functions
  pc: Factor out (un)plug handling of virtio-md-pci devices
  virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci
  virtio-mem: Support "x-ignore-shared" migration
  migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()
  virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
  softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
  memory-device: Track used region size in DeviceMemoryState
  memory-device: Refactor memory_device_pre_plug()
  hw/i386/pc: Remove PC_MACHINE_DEVMEM_REGION_SIZE
  hw/i386/acpi-build: Rely on machine->device_memory when building SRAT
  hw/i386/pc: Use machine_memory_devices_init()
  hw/loongarch/virt: Use machine_memory_devices_init()
  hw/ppc/spapr: Use machine_memory_devices_init()
  hw/arm/virt: Use machine_memory_devices_init()
  memory-device: Introduce machine_memory_devices_init()
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-12 11:07:36 +01:00
David Hildenbrand
339a8bbdfe virtio-mem-pci: Device unplug support
Let's support device unplug by forwarding the unplug_request_check()
callback to the virtio-mem device.

Further, disallow changing the requested-size once an unplug request is
pending.

Disallowing requested-size changes handles corner cases such as
(1) pausing the VM (2) requesting device unplug and (3) adjusting the
requested size. If the VM would plug memory (due to the requested size
change) before processing the unplug request, we would be in trouble.

Message-ID: <20230711153445.514112-8-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:32 +02:00
David Hildenbrand
92a8ee1b59 virtio-mem: Prepare for device unplug support
In many cases, blindly unplugging a virtio-mem device is problematic. We
can only safely remove a device once:
* The guest is not expecting to be able to read unplugged memory
  (unplugged-inaccessible == on)
* The virtio-mem device does not have memory plugged (size == 0)
* The virtio-mem device does not have outstanding requests to the VM to
  plug memory (requested-size == 0)

So let's add a callback to the virtio-mem device class to check for that.
We'll wire-up virtio-mem-pci next.

Message-ID: <20230711153445.514112-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:31 +02:00
David Hildenbrand
aac44204bc virtio-md-pci: Support unplug requests for compatible devices
Let's support unplug requests for virtio-md-pci devices that provide
a unplug_request_check() callback.

We'll wire that up for virtio-mem-pci next.

Message-ID: <20230711153445.514112-6-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:30 +02:00
David Hildenbrand
c29dd73f74 virtio-md-pci: Handle unplug of virtio based memory devices
While we fence unplug requests from the outside, the VM can still
trigger unplug of virtio based memory devices, for example, in Linux
doing on a virtio-mem-pci device:
    # echo 0 > /sys/bus/pci/slots/3/power

While doing that is not really expected to work without harming the
guest OS (e.g., removing a virtio-mem device while it still provides
memory), let's make sure that we properly handle it on the QEMU side.

We'll add support for unplugging of virtio-mem devices in some
configurations next.

Message-ID: <20230711153445.514112-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:29 +02:00
David Hildenbrand
30ec5ccd3a arm/virt: Use virtio-md-pci (un)plug functions
Let's use our new helper functions. Note that virtio-pmem-pci is not
enabled for arm and, therefore, not compiled in.

Message-ID: <20230711153445.514112-4-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:28 +02:00
David Hildenbrand
dbdf841b2e pc: Factor out (un)plug handling of virtio-md-pci devices
Let's factor out (un)plug handling, to be reused from arm/virt code.

Provide stubs for the case that CONFIG_VIRTIO_MD is not selected because
neither virtio-mem nor virtio-pmem is enabled. While this cannot
currently happen for x86, it will be possible for arm/virt.

Message-ID: <20230711153445.514112-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:27 +02:00
David Hildenbrand
18129c15bc virtio-md-pci: New parent type for virtio-mem-pci and virtio-pmem-pci
Let's add a new abstract "virtio memory device" type, and use it as
parent class of virtio-mem-pci and virtio-pmem-pci.

Message-ID: <20230711153445.514112-2-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:27:25 +02:00
David Hildenbrand
b01fd4b67a virtio-mem: Support "x-ignore-shared" migration
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.

As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.

The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
memory is plugged (which is the case before loading the device state).

Note that something like VM templating -- see commit b17fbbe55c
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.

For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
    virtio-mem device to the new VM instances using proper own memory
    backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
    template (requested-size=0) and use private anonymous memory.

Message-ID: <20230706075612.67404-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
David Hildenbrand
f161c88a03 migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()
virtio-mem wants to know whether it should not mess with the RAMBlock
content (e.g., discard RAM, preallocate memory) on incoming migration.

So let's expose that function as migrate_ram_is_ignored() in
migration/misc.h

Message-ID: <20230706075612.67404-4-david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00