Commit graph

587 commits

Author SHA1 Message Date
Jason Wang 6e52fc446d virtio-pci-modern: introduce helper for getting queue nums
This patch introduces helper for getting queue num of modern device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-14-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:58 -05:00
Jason Wang 75658afbab virtio-pci-modern: introduce helper for setting/geting queue size
This patch introduces helper for setting/getting queue size for modern
device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-13-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:58 -05:00
Jason Wang dc2e648198 virtio-pci-modern: introduce helper to set/get queue_enable
This patch introduces a helper to set/get queue_enable for modern device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-12-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:58 -05:00
Jason Wang e1b0fa2e38 virtio-pci-modern: introduce vp_modern_queue_address()
This patch introduce a helper to set virtqueue address for modern address.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-11-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:58 -05:00
Jason Wang 3fbda9c1a6 virtio-pci-modern: introduce vp_modern_set_queue_vector()
This patch introduces a helper to set virtqueue MSI vector.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-10-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang ed2a73dbab virtio-pci-modern: introduce vp_modern_generation()
This patch introduces vp_modern_generation() to get device generation.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-9-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang 0b0177089c virtio-pci-modern: introduce helpers for setting and getting features
This patch introduces helpers for setting and getting features.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-8-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang e3669129fd virtio-pci-modern: introduce helpers for setting and getting status
This patch introduces helpers to allow set and get device status.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-7-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang 1a5c85f165 virtio-pci-modern: introduce helper to set config vector
This patch introduces vp_modern_config_vector() for setting config
vector.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-6-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang 3249037088 virtio-pci-modern: introduce vp_modern_remove()
This patch introduces vp_modern_remove() doing device resources
cleanup to make it can be used.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang 117a9de282 virtio-pci-modern: factor out modern device initialization logic
This patch factors out the modern device initialization logic into a
helper. Note that it still depends on the caller to enable pci device
which allows the caller to use e.g devres.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang b5d5809450 virtio-pci: split out modern device
This patch splits out the virtio-pci modern device only attributes
into another structure. While at it, a dedicated probe method for
modern only attributes is introduced. This may help for split the
logic into a dedicated module.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jason Wang 64f2087aaa virtio-pci: do not access iomem via struct virtio_pci_device directly
Instead of accessing iomem via struct virito_pci_device directly,
tweak to call the io accessors through the iomem structure. This will
ease the splitting of modern virtio device logic.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210104065503.199631-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:57 -05:00
Jiapeng Zhong 02cc6b495d virtio-mem: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
of 0/1 to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Link: https://lore.kernel.org/r/1611129031-82818-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-23 07:52:56 -05:00
Linus Torvalds 64145482d3 virtio,vdpa: features, cleanups, fixes
vdpa sim refactoring
 virtio mem  Big Block Mode support
 misc cleanus, fixes
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl/gznEPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpu/cIAJSVWVCs/5KVfeOg6NQ5WRK48g58eZoaIS6z
 jr5iyCRfoQs3tQgcX0W02X3QwVwesnpepF9FChFwexlh+Te3tWXKaDj3eWBmlJVh
 Hg8bMOOiOqY7qh47LsGbmb2pnJ3Tg8uwuTz+w/6VDc43CQa7ganwSl0owqye3ecm
 IdGbIIXZQs55FCzM8hwOWWpjsp1C2lRtjefsOc5AbtFjzGk+7767YT+C73UgwcSi
 peHbD8YFJTInQj6JCbF7uYYAWHrOFAOssWE3OwKtZJdTdJvE7bMgSZaYvUgHMvFR
 gRycqxpLAg6vcuns4qjiYafrywvYwEvTkPIXmMG6IAgNYIPAxK0=
 =SmPb
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vdpa sim refactoring

 - virtio mem: Big Block Mode support

 - misc cleanus, fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits)
  vdpa: Use simpler version of ida allocation
  vdpa: Add missing comment for virtqueue count
  uapi: virtio_ids: add missing device type IDs from OASIS spec
  uapi: virtio_ids.h: consistent indentions
  vhost scsi: fix error return code in vhost_scsi_set_endpoint()
  virtio_ring: Fix two use after free bugs
  virtio_net: Fix error code in probe()
  virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
  tools/virtio: add barrier for aarch64
  tools/virtio: add krealloc_array
  tools/virtio: include asm/bug.h
  vdpa/mlx5: Use write memory barrier after updating CQ index
  vdpa: split vdpasim to core and net modules
  vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
  vdpa_sim: make vdpasim->buffer size configurable
  vdpa_sim: use kvmalloc to allocate vdpasim->buffer
  vdpa_sim: set vringh notify callback
  vdpa_sim: add set_config callback in vdpasim_dev_attr
  vdpa_sim: add get_config callback in vdpasim_dev_attr
  vdpa_sim: make 'config' generic and usable for any device type
  ...
2020-12-24 12:06:46 -08:00
Dan Carpenter e152d8af42 virtio_ring: Fix two use after free bugs
The "vq" struct is added to the "vdev->vqs" list prematurely.  If we
encounter an error later in the function then the "vq" is freed, but
since it is still on the list that could lead to a use after free bug.

Fixes: cbeedb72b9 ("virtio_ring: allocate desc state for split ring separately")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGaG/zkI3jk8mk@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2020-12-18 16:14:31 -05:00
Dan Carpenter ae93d8ea0f virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
There is a copy and paste bug in the error handling of this code and
it uses "ring_dma_addr" three times instead of "device_event_dma_addr"
and "driver_event_dma_addr".

Fixes: 1ce9e6055f (" virtio_ring: introduce packed ring support")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGRJlEzyn+04u2@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2020-12-18 16:14:30 -05:00
David Hildenbrand 3711387a75 virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
Let's add a safe mechanism to unplug memory, avoiding long/endless loops
when trying to offline memory - similar to in SBM.

Fake-offline all memory (via alloc_contig_range()) before trying to
offline+remove it. Use this mode as default, but allow to enable the other
mode explicitly (which could give better memory hotunplug guarantees in
some environments).

The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0
on the cmdline.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-30-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:28 -05:00
David Hildenbrand 269ac9389d virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
Let's try to unplug completely offline big blocks first. Then, (if
enabled via unplug_offline) try to offline and remove whole big blocks.

No locking necessary - we can deal with concurrent onlining/offlining
just fine.

Note1: This is sub-optimal and might be dangerous in some environments: we
could end up in an infinite loop when offlining (e.g., long-term pinnings),
similar as with DIMMs. We'll introduce safe memory hotunplug via
fake-offlining next, and use this basic mode only when explicitly enabled.

Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
with bigger block sizes.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-29-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:28 -05:00
David Hildenbrand faa45ff4ce virtio-mem: allow to force Big Block Mode (BBM) and set the big block size
Let's allow to force BBM, even if subblocks would be possible. Take care
of properly calculating the first big block id, because the start
address might no longer be aligned to the big block size.

Also, allow to manually configure the size of Big Blocks.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-27-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 4ba50cd335 virtio-mem: Big Block Mode (BBM) memory hotplug
Currently, we do not support device block sizes that exceed the Linux
memory block size. For example, having a device block size of 1 GiB (e.g.,
gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
blocks.

Let's implement Big Block Mode (BBM), whereby we add/remove at least
one Linux memory block at a time. With a 1 GiB device block size, a Big
Block (BB) will cover 8 Linux memory blocks.

We'll keep registering the online_page_callback machinery, it will be used
for safe memory hotunplug in BBM next.

Note: BBM is properly prepared for variable-sized Linux memory
blocks that we might see in the future. So we won't care how many Linux
memory blocks a big block actually spans, and how the memory notifier is
called.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-26-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 01afdee29a virtio-mem: factor out adding/removing memory from Linux
Let's use wrappers for the low-level functions that dev_dbg/dev_warn
and work on addr + size, such that we can reuse them for adding/removing
in other granularity.

We only warn when adding memory failed, because that's something to pay
attention to. We won't warn when removing failed, we'll reuse that in
racy context soon (and we do have proper BUG_ON() statements in the
current cases where it must never happen).

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-25-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand d46dfb62f6 virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)
Let's rename accordingly.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-24-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 602ef89457 virito-mem: existing (un)plug functions are specific to Sub Block Mode (SBM)
Let's rename them accordingly. virtio_mem_plug_request() and
virtio_mem_unplug_request() will be handled separately.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-23-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 8a6f082bab virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
Let's move first_mb_id/next_mb_id/last_usable_mb_id accordingly.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-22-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 905c4c5146 virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-21-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 54c6a6ba75 virito-mem: subblock states are specific to Sub Block Mode (SBM)
Let's rename and move accordingly. While at it, rename sb_bitmap to
"sb_states".

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-20-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:27 -05:00
David Hildenbrand 99f0b55ea6 virtio-mem: memory block states are specific to Sub Block Mode (SBM)
let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
move applicable definitions, functions, and variables (related to
memory block states).

While at it:
- Drop the "_STATE" part from memory block states
- Rename "nb_mb_state" to "mb_count"
- "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
- Don't use lengthy "enum virtio_mem_smb_mb_state", simply use "uint8_t"

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-19-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand d561494425 virito-mem: document Sub Block Mode (SBM)
Let's add some documentation for the current mode - Sub Block Mode (SBM) -
to prepare for a new mode - Big Block Mode (BBM).

Follow-up patches will properly factor out the existing Sub Block Mode
(SBM) and implement Big Block Mode (BBM).

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-18-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 98ff9f9411 virtio-mem: generalize handling when memory is getting onlined deferred
We don't want to add too much memory when it's not getting onlined
immediately, to avoid running OOM. Generalize the handling, to avoid
making use of memory block states. Use a threshold of 1 GiB for now.

Properly adjust the offline size when adding/removing memory. As we are
not always protected by a lock when touching the offline size, use an
atomic64_t. We don't care about races (e.g., someone offlining memory
while we are adding more), only about consistent values.

(1 GiB needs a memmap of ~16MiB - which sounds reasonable even for
 setups with little boot memory and (possibly) one virtio-mem device per
 node)

We don't want to retrigger when onlining is caused immediately by our
action (e.g., adding memory which immediately gets onlined), so use a
flag to indicate if the workqueue is active and use that as an
indicator whether to trigger a retry. This will also be especially relevant
for Big Block Mode (BBM), whereby we might re-online memory in case
offlining of another memory block failed.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-17-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 1d33c2caa8 virtio-mem: don't always trigger the workqueue when offlining memory
Let's trigger from offlining code only when we're not allowed to unplug
online memory. Handle the other case (memmap possibly freeing up another
memory block) when actually removing memory. We now also properly handle
the case when removing already offline memory blocks via
virtio_mem_mb_remove(). When removing via virtio_mem_remove(), when
unloading the driver, virtio_mem_retry() is a NOP and safe to use.

While at it, move retry handling when offlining out of
virtio_mem_notify_offline(), to share it with Big Block Mode (BBM)
soon.

This is a preparation for Big Block Mode (BBM), whereby we can see some
temporary offlining of memory blocks without actually making progress.
Imagine you have a Big Block that spans to Linux memory blocks. Assume
the first Linux memory blocks has no unmovable data on it. When we would
call offline_and_remove_memory() on the big block, we would
	1. Try to offline the first block. Works, notifiers triggered.
	   virtio_mem_retry() called.
	2. Try to offline the second block. Does not work.
	3. Re-online first block.
	4. Exit to main loop, exit workqueue.
	5. Retry immediately (due to virtio_mem_retry()), go to 1.
The result are endless retries.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-16-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 420066829b virtio-mem: drop last_mb_id
No longer used, let's drop it.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-15-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 835491c554 virtio-mem: generalize virtio_mem_overlaps_range()
Avoid using memory block ids. While at it, use uint64_t for
address/size.

This is a preparation for Big Block Mode (BBM).

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-14-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 8464e3bdf2 virtio-mem: generalize virtio_mem_owned_mb()
Avoid using memory block ids. Rename it to virtio_mem_contains_range().

This is a preparation for Big Block Mode (BBM).

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-13-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 989ff82527 virtio-mem: generalize check for added memory
Let's check by traversing busy system RAM resources instead, to avoid
relying on memory block states.

Don't use walk_system_ram_range(), as that works on pages and we want to
use the bare addresses we have easily at hand.

This is a preparation for Big Block Mode (BBM), which won't have memory
block states.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-12-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand f2d799d591 virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE
ZONE_MOVABLE is supposed to give some guarantees, yet,
alloc_contig_range() isn't prepared to properly deal with some racy
cases properly (e.g., temporary page pinning when exiting processed, PCP).

Retry 5 times for now. There is certainly room for improvement in the
future.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-11-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 7a34c77dab virtio-mem: factor out handling of fake-offline pages in memory notifier
Let's factor out the core pieces and place the implementation next to
virtio_mem_fake_offline(). We'll reuse this functionality soon.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-10-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:26 -05:00
David Hildenbrand 89c486c47f virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
... which now matches virtio_mem_fake_online(). We'll reuse this
functionality soon.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-9-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 6beb3a9421 virtio-mem: print debug messages from virtio_mem_send_*_request()
Let's move the existing dev_dbg() into the functions, print if something
went wrong, and also print for virtio_mem_send_unplug_all_request().

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-8-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 41e6215c6d virtio-mem: factor out calculation of the bit number within the subblock bitmap
The calculation is already complicated enough, let's limit it to one
location.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 2a6285114b virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining
No harm done, but let's be consistent.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-6-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand d76944f80d virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
We can drop rc2, we don't actually need the value.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-5-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 20b9150225 virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
possible to simplify.

Add a comment why we have that restriction for now.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 347202dc04 virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()
We actually need one byte less (next_mb_id is exclusive, first_mb_id is
inclusive). While at it, compact the code.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-3-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
David Hildenbrand 6725f21157 virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
Let's determine the target nid only once in case we have none specified -
usually, we'll end up with node 0 either way.

Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18 16:14:25 -05:00
Vlastimil Babka 8f424750ba mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the
poison pattern was corrupted, suggesting a use-after-free.  The motivation
to introduce it in commit 8823b1dbc0 ("mm/page_poison.c: enable
PAGE_POISONING as a separate option") was to simply sanitize freed pages,
optimally together with CONFIG_PAGE_POISONING_ZERO.

These days we have an init_on_free=1 boot option, which makes this use
case of page poisoning redundant.  For sanitizing, writing zeroes is
sufficient, there is pretty much no benefit from writing the 0xAA poison
pattern to freed pages, without checking it back on alloc.  Thus, remove
this option and suggest init_on_free instead in the main config's help.

Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka 8db26a3d47 mm, page_poison: use static key more efficiently
Commit 11c9c7edae ("mm/page_poison.c: replace bool variable with static
key") changed page_poisoning_enabled() to a static key check.  However,
the function is not inlined, so each check still involves a function call
with overhead not eliminated when page poisoning is disabled.

Analogically to how debug_pagealloc is handled, this patch converts
page_poisoning_enabled() back to boolean check, and introduces
page_poisoning_enabled_static() for fast paths.  Both functions are
inlined.

The function kernel_poison_pages() is also called unconditionally and does
the static key check inside.  Remove it from there and put it to callers.
Also split it to two functions kernel_poison_pages() and
kernel_unpoison_pages() instead of the confusing bool parameter.

Also optimize the check that enables page poisoning instead of
debug_pagealloc for architectures without proper debug_pagealloc support.
Move the check to init_mem_debugging_and_hardening() to enable a single
static key instead of having two static branches in
page_poisoning_enabled_static().

Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Linus Torvalds 9313f80263 vhost,vdpa,virtio: cleanups, fixes
A very quiet cycle, no new features.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+QSnEPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpvzoIAIAJPV0OTShpvv8JXmBDngDGysuAcQah+d3u
 g2vDzRb9J3lYH7hJgkHans/4s3wYtWcJei7tgU2UkSODTSPK/l+hp4sTuVowsqPD
 Cvp6k7/ipzJscl2AAiflSn5gBUORHXU8oxEeDvUAJbVkSwWdKvKgvDGPbVxZCU0V
 kGlUctRq96e/TQCNekVthZ1Q4cgPKgx4zMFZjLSbj0yDN2JJJp+0Y+y5NJ5u9eTE
 VneaFZOJxlhjmNZZP1Bu/MOcvgPbjxZjDRRUP75sv8c7IkoGiubHbbwcDhbE5gVd
 Ve/ByiFTJe9ydKVVLm1O81AqO7uB13W46LjF5yotyk/dKX6s5eU=
 =1Gdh
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "vhost, vdpa, and virtio cleanups and fixes

  A very quiet cycle, no new features"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS: add URL for virtio-mem
  vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
  vringh: fix __vringh_iov() when riov and wiov are different
  vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
  s390: virtio: PV needs VIRTIO I/O device protection
  virtio: let arch advertise guest's memory access restrictions
  vhost_vdpa: Fix duplicate included kernel.h
  vhost: reduce stack usage in log_used
  virtio-mem: Constify mem_id_table
  virtio_input: Constify id_table
  virtio-balloon: Constify id_table
  vdpa/mlx5: Fix failure to bring link up
  vdpa/mlx5: Make use of a specific 16 bit endianness API
2020-10-23 11:00:57 -07:00
Pierre Morel 0afa15e1a5 virtio: let arch advertise guest's memory access restrictions
An architecture may restrict host access to guest memory,
e.g. IBM s390 Secure Execution or AMD SEV.

Provide a new Kconfig entry the architecture can select,
CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides
the arch_has_restricted_virtio_memory_access callback to advertise
to VIRTIO common code when the architecture restricts memory access
from the host.

The common code can then fail the probe for any device where
VIRTIO_F_ACCESS_PLATFORM is required, but not set.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/1599728030-17085-2-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-10-21 10:34:12 -04:00
Rikard Falkeborn 7ab4de6002 virtio-mem: Constify mem_id_table
mem_id_table is not modified, so make it const to allow the compiler to
put it in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Link: https://lore.kernel.org/r/20200911203509.26505-4-rikard.falkeborn@gmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
2020-10-21 10:34:10 -04:00