Commit graph

111525 commits

Author SHA1 Message Date
Fabiano Rosas 2f6b8826a5 migration/ram: Add incoming 'mapped-ram' migration
Add the necessary code to parse the format changes for the
'mapped-ram' capability.

One of the more notable changes in behavior is that in the
'mapped-ram' case ram pages are restored in one go rather than
constantly looping through the migration stream.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-11-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas c2d5c4a7cb migration/ram: Add outgoing 'mapped-ram' migration
Implement the outgoing migration side for the 'mapped-ram' capability.

A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are ignored as they'd be zero in the destination
migration as well.

The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers.

Without mapped-ram (current):        With mapped-ram (new):

 ---------------------               --------------------------------
 | ramblock 1 header |               | ramblock 1 header            |
 ---------------------               --------------------------------
 | ramblock 2 header |               | ramblock 1 mapped-ram header |
 ---------------------               --------------------------------
 | ...               |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | ramblock n header |               --------------------------------
 ---------------------               | ramblock 1 pages             |
 | RAM_SAVE_FLAG_EOS |               | ...                          |
 ---------------------               --------------------------------
 | stream of pages   |               | ramblock 2 header            |
 | (iter 1)          |               --------------------------------
 | ...               |               | ramblock 2 mapped-ram header |
 ---------------------               --------------------------------
 | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | stream of pages   |               --------------------------------
 | (iter 2)          |               | ramblock 2 pages             |
 | ...               |               | ...                          |
 ---------------------               --------------------------------
 | ...               |               | ...                          |
 ---------------------               --------------------------------
                                     | RAM_SAVE_FLAG_EOS            |
                                     --------------------------------
                                     | ...                          |
                                     --------------------------------

where:
 - ramblock header: the generic information for a ramblock, such as
   idstr, used_len, etc.

 - ramblock mapped-ram header: the new information added by this
   feature: bitmap of pages written, bitmap size and offset of pages
   in the migration file.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-10-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas 8d9e0d4100 migration: Add mapped-ram URI compatibility check
The mapped-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas 4ed49feb44 migration/ram: Introduce 'mapped-ram' migration capability
Add a new migration capability 'mapped-ram'.

The core of the feature is to ensure that RAM pages are mapped
directly to offsets in the resulting migration file instead of being
streamed at arbitrary points.

The reasons why we'd want such behavior are:

 - The resulting file will have a bounded size, since pages which are
   dirtied multiple times will always go to a fixed location in the
   file, rather than constantly being added to a sequential
   stream. This eliminates cases where a VM with, say, 1G of RAM can
   result in a migration file that's 10s of GBs, provided that the
   workload constantly redirties memory.

 - It paves the way to implement O_DIRECT-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

 - It allows the usage of multifd so we can write RAM pages to the
   migration file in parallel.

For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.

Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas 7f5b50a401 migration/qemu-file: add utility methods for working with seekable channels
Add utility methods that will be needed when implementing 'mapped-ram'
migration capability.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas c05dfcb7f2 io: fsync before closing a file channel
Make sure the data is flushed to disk before closing file
channels. This is to ensure data is on disk and not lost in the event
of a host crash.

This is currently being implemented to affect the migration code when
migrating to a file, but all QIOChannelFile users should benefit from
the change.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Acked-by: "Daniel P. Berrangé" <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Nikolay Borisov 0478b030fa io: implement io_pwritev/preadv for QIOChannelFile
The upcoming 'mapped-ram' feature will require qemu to write data to
(and restore from) specific offsets of the migration file.

Add a minimal implementation of pwritev/preadv and expose them via the
io_pwritev and io_preadv interfaces.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Nikolay Borisov f1cfe39418 io: Add generic pwritev/preadv interface
Introduce basic pwritev/preadv support in the generic channel layer.
Specific implementation will follow for the file channel as this is
required in order to support migration streams with fixed location of
each ram page.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Nikolay Borisov 401e311ff7 io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
Add a generic QIOChannel feature SEEKABLE which would be used by the
qemu_file* apis. For the time being this will be only implemented for
file channels.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas 4aac6b1e9b migration/multifd: Cleanup multifd_recv_sync_main
Some minor cleanups and documentation for multifd_recv_sync_main.

Use thread_count as done in other parts of the code. Remove p->id from
the multifd_recv_state sync, since that is global and not tied to a
channel. Add documentation for the sync steps.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Thomas Huth 462945cd22 chardev/char-socket: Fix TLS io channels sending too much data to the backend
Commit ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
changed the behavior of the TLS io channels to schedule a second reading
attempt if there is still incoming data pending. This caused a regression
with backends like the sclpconsole that check in their read function that
the sender does not try to write more bytes to it than the device can
currently handle.

The problem can be reproduced like this:

 1) In one terminal, do this:

  mkdir qemu-pki
  cd qemu-pki
  openssl genrsa 2048 > ca-key.pem
  openssl req -new -x509 -nodes -days 365000 -key ca-key.pem -out ca-cert.pem
  # enter some dummy value for the cert
  openssl genrsa 2048 > server-key.pem
  openssl req -new -x509 -nodes -days 365000 -key server-key.pem \
    -out server-cert.pem
  # enter some other dummy values for the cert

  gnutls-serv --echo --x509cafile ca-cert.pem --x509keyfile server-key.pem \
              --x509certfile server-cert.pem -p 8338

 2) In another terminal, do this:

  wget https://download.fedoraproject.org/pub/fedora-secondary/releases/39/Cloud/s390x/images/Fedora-Cloud-Base-39-1.5.s390x.qcow2

  qemu-system-s390x -nographic -nodefaults \
    -hda Fedora-Cloud-Base-39-1.5.s390x.qcow2 \
    -object tls-creds-x509,id=tls0,endpoint=client,verify-peer=false,dir=$PWD/qemu-pki \
    -chardev socket,id=tls_chardev,host=localhost,port=8338,tls-creds=tls0 \
    -device sclpconsole,chardev=tls_chardev,id=tls_serial

QEMU then aborts after a second or two with:

  qemu-system-s390x: ../hw/char/sclpconsole.c:73: chr_read: Assertion
   `size <= SIZE_BUFFER_VT220 - scon->iov_data_len' failed.
 Aborted (core dumped)

It looks like the second read does not trigger the chr_can_read() function
to be called before the second read, which should normally always be done
before sending bytes to a character device to see how much it can handle,
so the s->max_size in tcp_chr_read() still contains the old value from the
previous read. Let's make sure that we use the up-to-date value by calling
tcp_chr_read_poll() again here.

Fixes: ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
Buglink: https://issues.redhat.com/browse/RHEL-24614
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Message-ID: <20240229104339.42574-1-thuth@redhat.com>
Reviewed-by: Antoine Damhet <antoine.damhet@blade-group.com>
Tested-by: Antoine Damhet <antoine.damhet@blade-group.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Thomas Huth f0cb6828ae tests/unit/test-util-sockets: Remove temporary file after test
test-util-sockets leaves the temporary socket files around in the
temporary files folder. Let's better remove them at the end of the
testing.

Fixes: 4d3a329af5 ("tests/util-sockets: add abstract unix socket cases")
Message-ID: <20240226082728.249753-1-thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Benjamin David Lunt 5e02a4fdeb hw/usb/bus.c: PCAP adding 0xA in Windows version
Since Windows text files use CRLFs for all \n, the Windows version of QEMU
inserts a CR in the PCAP stream when a LF is encountered when using USB PCAP
files. This is due to the fact that the PCAP file is opened as TEXT instead
of BINARY.

To show an example, when using a very common protocol to USB disks, the BBB
protocol uses a 10-byte command packet. For example, the READ_CAPACITY(10)
command will have a command block length of 10 (0xA). When this 10-byte
command (part of the 31-byte CBW) is placed into the PCAP file, the Windows
file manager inserts a 0xD before the 0xA, turning the 31-byte CBW into a
32-byte CBW.

Actual CBW:
  0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0a 25 USBC...........%
  0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00       ...............

PCAP CBW
  0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0d 0a USBC............
  0050 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 %..............

I believe simply opening the PCAP file as BINARY instead of TEXT will fix
this issue.

Resolves: https://bugs.launchpad.net/qemu/+bug/2054889
Signed-off-by: Benjamin David Lunt <benlunt@fysnet.net>
Message-ID: <000101da6823$ce1bbf80$6a533e80$@fysnet.net>
[thuth: Break long line to avoid checkpatch.pl error]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Thomas Huth 8bd3f84d1f hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"
When using "--without-default-devices", the ARM_GICV3_TCG and ARM_GIC_KVM
settings currently get disabled, though the arm virt machine is only of
very limited use in that case. This also causes the migration-test to
fail in such builds. Let's make sure that we always keep the GIC switches
enabled in the --without-default-devices builds, too.

Message-ID: <20240221110059.152665-1-thuth@redhat.com>
Tested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Daniel Henrique Barboza 3283843a8e libqos/virtio.c: fix 'avail_event' offset in qvring_init()
In qvring_init() we're writing vq->used->avail_event at "vq->used + 2 +
array_size".  The struct pointed by vq->used is, from virtio_ring.h
Linux header):

 *	// A ring of used descriptor heads with free-running index.
 *	__virtio16 used_flags;
 *	__virtio16 used_idx;
 *	struct vring_used_elem used[num];
 *	__virtio16 avail_event_idx;

So 'flags' is the word right at vq->used. 'idx' is vq->used + 2. We need
to skip 'used_idx' by adding + 2 bytes, and then sum the vector size, to
reach avail_event_idx. An example on how to properly access this field
can be found in qvirtqueue_kick():

avail_event = qvirtio_readw(d, qts, vq->used + 4 +
                            sizeof(struct vring_used_elem) * vq->size);

This error was detected when enabling the RISC-V 'virt' libqos machine.
The 'idx' test from vhost-user-blk-test.c errors out with a timeout in
qvirtio_wait_used_elem(). The timeout happens because when processing
the first element, 'avail_event' is read in qvirtqueue_kick() as non-zero
because we didn't initialize it properly (and the memory at that point
happened to be non-zero). 'idx' is 0.

All of this makes this condition fail because "idx - avail_event" will
overflow and be non-zero:

/* < 1 because we add elements to avail queue one by one */
if ((flags & VRING_USED_F_NO_NOTIFY) == 0 &&
                        (!vq->event || (uint16_t)(idx-avail_event) < 1)) {
    d->bus->virtqueue_kick(d, vq);
}

As a result the virtqueue is never kicked and we'll timeout waiting for it.

Fixes: 1053587c3f ("libqos: Added EVENT_IDX support")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240217192607.32565-3-dbarboza@ventanamicro.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Daniel Henrique Barboza 2791490de1 libqos/virtio.c: init all elems in qvring_indirect_desc_setup()
The loop isn't setting the values for the last element. Every other
element is being initialized with addr = 0, flags = VRING_DESC_F_NEXT
and next = i + 1. The last elem is never touched.

This became a problem when enabling a RISC-V 'virt' libqos machine in
the 'indirect' test of virti-blk-test.c. The 'flags' for the last
element will end up being an odd number (since we didn't touch it).
Being an odd number it will be mistaken by VRING_DESC_F_NEXT, which
happens to be 1.

Deep into hw/virt/virtio.c, in virtqueue_split_pop(), into
virtqueue_split_read_next_desc(), a check for VRING_DESC_F_NEXT will be
made to see if we're supposed to chain. The code will keep up chaining
in the last element because the uninitialized value happens to be odd.
We'll error out right after that because desc->next (which is also
uninitialized) will be >= max. A VIRTQUEUE_READ_DESC_ERROR will be
returned, with an error message like this in the stderr:

qemu-system-riscv64: Desc next is 49391

Since we never returned, we'll end up timing out at qvirtio_wait_used_elem():

ERROR:../tests/qtest/libqos/virtio.c:236:qvirtio_wait_used_elem:
    assertion failed: (g_get_monotonic_time() - start_time <= timeout_us)

The root cause is using uninitialized values from guest_alloc() in
qvring_indirect_desc_setup(). There's no guarantee that the memory pages
retrieved will be zeroed, so we can't make assumptions. In fact, commit
5b4f72f5e8 ("tests/qtest: properly initialise the vring used idx") fixed a
similar problem stating "It is probably not wise to assume guest memory
is zeroed anyway". I concur.

Initialize all elems in qvring_indirect_desc_setup().

Fixes: f294b029aa ("libqos: Added indirect descriptor support to virtio implementation")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240217192607.32565-2-dbarboza@ventanamicro.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01 08:27:33 +01:00
Bryan Zhang 2b57143231 tests/migration: Set compression level in migration tests
Adds calls to set compression level for `zstd` and `zlib` migration
tests, just to make sure that the calls work.

Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com>
Link: https://lore.kernel.org/r/20240301035901.4006936-3-bryan.zhang@bytedance.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 14:14:55 +08:00
Bryan Zhang b4014a2bf5 migration: Properly apply migration compression level parameters
Some glue code was missing, so that using `qmp_migrate_set_parameters`
to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This
commit adds the glue code to fix that.

Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com>
Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zhang@bytedance.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 14:14:55 +08:00
Steve Sistare 87a2848715 migration: massage cpr-reboot documentation
Re-wrap the cpr-reboot documentation to 70 columns, use '@' for
cpr-reboot references, capitalize COLO and VFIO, and tweak the
wording.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1709218462-3640-1-git-send-email-steven.sistare@oracle.com
[peterx: s/qemu/QEMU per Markus's suggestion]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 14:14:14 +08:00
Richard Henderson 01a721167a linux-user/loongarch64: Remove TARGET_FORCE_SHMLBA
The kernel abi was changed with

    commit d23b77953f5a4fbf94c05157b186aac2a247ae32
    Author: Huacai Chen <chenhuacai@kernel.org>
    Date:   Wed Jan 17 12:43:08 2024 +0800

        LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE

during the v6.8 cycle.

Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 14:24:30 -10:00
Richard Henderson 4ef1f559f2 linux-user/x86_64: Handle the vsyscall page in open_self_maps_{2,4}
This is the only case in which we expect to have no host memory backing
for a guest memory page, because in general linux user processes cannot
map any pages in the top half of the 64-bit address space.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2170
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 14:24:24 -10:00
Paolo Bonzini ff202817dc tcg/optimize: fix uninitialized variable
The variables uext_opc and sext_opc are used without initialization if
TCG_TARGET_extract_i{32,64}_valid returns false.  The result, depending
on the compiler, might be the generation of extract and sextract opcodes
with invalid offset and count, or just random data in the TCG opcode
stream.

Fixes: ceb9ee06b7 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}", 2024-02-03)
Cc: Richard Henderson <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240228110641.287205-1-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:36:05 -10:00
Richard Henderson b816e1b5ba linux-user: Remove pgb_dynamic alignment assertion
The assertion was never correct, because the alignment is a composite
of the image alignment and SHMLBA.  Even if the image alignment didn't
match the image address, an assertion would not be correct -- more
appropriate would be an error message about an ill formed image.  But
the image cannot be held to SHMLBA under any circumstances.

Fixes: ee94743034 ("linux-user: completely re-write init_guest_space")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2157
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reported-by: Alexey Sheplyakov <asheplyakov@yandex.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson f2ffdfab7e target/alpha: Enable TARGET_PAGE_BITS_VARY for user-only
Since alpha binaries are generally built for multiple
page sizes, it is trivial to allow the page size to vary.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-34-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 835e5fe9e2 target/ppc: Enable TARGET_PAGE_BITS_VARY for user-only
Since ppc binaries are generally built for multiple
page sizes, it is trivial to allow the page size to vary.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-33-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 78b79b2cb3 linux-user: Bound mmap_min_addr by host page size
Bizzarely, it is possible to set /proc/sys/vm/mmap_min_addr
to a value below the host page size.  Fix that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-32-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson a575230f95 target/arm: Enable TARGET_PAGE_BITS_VARY for AArch64 user-only
Since aarch64 binaries are generally built for multiple
page sizes, it is trivial to allow the page size to vary.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-31-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson ff8a8bbc2a linux-user: Allow TARGET_PAGE_BITS_VARY
If set, match the host and guest page sizes.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-30-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 33402cea1f accel/tcg: Disconnect TargetPageDataNode from page size
Dynamically size the node for the runtime target page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-29-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 8c45039f9e cpu: Remove page_size_init
Move qemu_host_page_{size,mask} and HOST_PAGE_ALIGN into bsd-user.
It should be removed from bsd-user as well, but defer that cleanup.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-28-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 01e449809b *-user: Deprecate and disable -p pagesize
This option controls the host page size.  From the mis-usage in
our own testsuite, this is easily confused with guest page size.

The only thing that occurs when changing the host page size is
that stuff breaks, because one cannot actually change the host
page size.  Therefore reject all but the no-op setting as part
of the deprecation process.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-27-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 6ada861951 tests/tcg: Extend file in linux-madvise.c
When guest page size > host page size, this test can fail
due to the SIGBUS protection hack.  Avoid this by making
sure that the file size is at least one guest page.

Visible with alpha guest on x86_64 host.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-26-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson e9206163d9 tests/tcg: Remove run-test-mmap-*
These tests are confused, because -p does not change
the guest page size, but the host page size.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-25-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson eb5027ac61 linux-user: Split out mmap_h_gt_g
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-24-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 8080b2f804 linux-user: Split out mmap_h_lt_g
Work much harder to get alignment and mapping beyond the end
of the file correct.  Both of which are excercised by our
test-mmap for alpha (8k pages) on any 4k page host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-23-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 68098de90e linux-user: Split out mmap_h_eq_g
Move the MAX_FIXED_NOREPLACE check for reserved_va earlier.
Move the computation of host_prot earlier.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-22-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 3bfa271e46 linux-user: Use do_munmap for target_mmap failure
For the cases for which the host mmap succeeds, but does
not yield the desired address, use do_munmap to restore
the reserved_va memory reservation.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson 2952b642a5 linux-user: Split out do_munmap
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson ad87d26e6b linux-user: Do early mmap placement only for reserved_va
For reserved_va, place all non-fixed maps then proceed
as for MAP_FIXED.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-21-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson 6ecc25570f linux-user: Split out mmap_end
Use a subroutine instead of a goto within target_mmap__locked.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-20-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson f0a362c476 linux-user: Fix sub-host-page mmap
We cannot skip over the_end1 to the_end, because we fail to
record the validity of the guest page with the interval tree.
Remove "the_end" and rename "the_end1" to "the_end".

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-19-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson e8cec51be0 linux-user: Move some mmap checks outside the lock
Basic validation of operands does not require the lock.
Hoist them from target_mmap__locked back into target_mmap.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-18-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson d558c395a9 linux-user: Split out target_mmap__locked
All "goto fail" may be transformed to "return -1".

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-17-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson 13c1339755 linux-user: Remove qemu_host_page_size from main
Use qemu_real_host_page_size() instead.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-16-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson 9260bd4013 softmmu/physmem: Remove HOST_PAGE_ALIGN
Align allocation sizes to the maximum of host and target page sizes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-15-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson 80c3aeef7f softmmu/physmem: Remove qemu_host_page_size
Use qemu_real_host_page_size() instead.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-14-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson b61af9b0d1 hw/tpm: Remove HOST_PAGE_ALIGN from tpm_ppi_init
This removes a hidden use of qemu_host_page_size, hoisting
two uses of qemu_real_host_page_size to a local variable.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
2024-02-29 11:35:36 -10:00
Richard Henderson 5d2203691e migration: Remove qemu_host_page_size
Replace with the maximum of the real host page size
and the target page size.  This is an exact replacement.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-12-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson b36b2b1d3d linux-user: Remove HOST_PAGE_ALIGN from mmap.c
This removes a hidden use of qemu_host_page_size, using instead
the existing host_page_size local within each function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-11-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson e56922abf0 linux-user: Remove REAL_HOST_PAGE_ALIGN from mmap.c
We already have qemu_real_host_page_size() in a local variable.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-10-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00