qemu/qapi
Ankit Agrawal b64b7ed8bb qom: new object to associate device to NUMA node
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.

Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.

Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.

When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.

Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201   1]                       Length : 20

[0CAh 0202   1]                    Reserved1 : 00
[0CBh 0203   1]           Device Handle Type : 01
[0CCh 0204   4]             Proximity Domain : 00000007
[0D0h 0208  16]                Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224   4]        Flags (decoded below) : 00000001
                                     Enabled : 1
[0E4h 0228   4]                    Reserved2 : 00000000

[0E8h 0232   1]                Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233   1]                       Length : 20

An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.

For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:

-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 17:56:55 -04:00
..
acpi.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
audio.json audio/pw: Pipewire->PipeWire case fix for user-visible text 2023-07-17 15:22:56 +04:00
authz.json qapi: Reformat doc comments to conform to current conventions 2023-05-10 10:01:01 +02:00
block-core.json qapi: Clean up "Returns" sections 2024-03-04 07:12:40 +01:00
block-export.json qapi: Move error documentation to new "Errors" sections 2024-03-04 07:12:40 +01:00
block.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
char.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
common.json pcie: Support PCIe Gen5/Gen6 link speeds 2024-03-12 17:56:55 -04:00
compat.json qapi: Belatedly update CompatPolicy documentation for unstable 2023-10-19 07:02:29 +02:00
control.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
crypto.json QAPI patches patches for 2024-02-12 2024-02-13 10:54:59 +00:00
cryptodev.json spelling: information 2023-06-09 23:38:16 +03:00
cxl.json qapi: Reformat recent doc comments to conform to current conventions 2023-07-26 14:51:36 +02:00
dump.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
error.json qapi: Reformat doc comments to conform to current conventions 2023-05-10 10:01:01 +02:00
introspect.json qapi: Drop redundant documentation of inherited members 2024-02-03 09:19:25 +01:00
job.json blockjob: introduce block-job-change QMP command 2023-10-31 18:20:25 +01:00
machine-common.json CPU topology: extend with s390 specifics 2023-10-20 07:16:53 +02:00
machine-target.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
machine.json qapi: Clean up "Returns" sections 2024-03-04 07:12:40 +01:00
meson.build CPU topology: extend with s390 specifics 2023-10-20 07:16:53 +02:00
migration.json QAPI patches patches for 2024-03-04 2024-03-05 11:20:15 +00:00
misc-target.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
misc.json qapi: Clean up "Returns" sections 2024-03-04 07:12:40 +01:00
net.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
opts-visitor.c cutils: Adjust signature of parse_uint[_full] 2023-06-02 12:27:19 -05:00
pci.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
pragma.json qapi/migration: Add missing tls-authz documentation 2024-02-12 10:04:32 +01:00
qapi-clone-visitor.c qapi: Make visitor functions taking Error ** return bool, not void 2020-07-10 15:18:08 +02:00
qapi-dealloc-visitor.c qapi: Make visitor functions taking Error ** return bool, not void 2020-07-10 15:18:08 +02:00
qapi-forward-visitor.c qapi: remove needless include 2022-03-22 14:46:18 +04:00
qapi-schema.json CPU topology: extend with s390 specifics 2023-10-20 07:16:53 +02:00
qapi-type-helpers.c qapi: New strv_from_str_list() 2024-03-04 07:12:40 +01:00
qapi-util.c qapi: Fix dangling references to docs/devel/qapi-code-gen.txt 2024-01-26 07:04:53 +01:00
qapi-visit-core.c qapi: Factor out compat_policy_input_ok() 2021-10-29 21:27:20 +02:00
qdev.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
qmp-dispatch.c monitor: use aio_co_reschedule_self() 2024-02-07 14:44:21 +01:00
qmp-event.c Replace qemu_gettimeofday() with g_get_real_time() 2022-04-06 10:50:37 +02:00
qmp-registry.c qapi: Generalize command policy checking 2021-10-29 18:24:46 +02:00
qobject-input-visitor.c include: add qemu/keyval.h 2022-04-21 17:03:51 +04:00
qobject-output-visitor.c qapi: Extend -compat to set policy for unstable interfaces 2021-10-29 21:28:01 +02:00
qom.json qom: new object to associate device to NUMA node 2024-03-12 17:56:55 -04:00
rdma.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
replay.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
rocker.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
run-state.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
sockets.json qapi: Add missing union tag documentation 2024-02-12 10:04:32 +01:00
stats.json qapi: Add missing union tag documentation 2024-02-12 10:04:32 +01:00
string-input-visitor.c qapi, qemu-options: make all parsing visitors parse boolean options the same 2020-11-04 12:00:40 -05:00
string-output-visitor.c string-output-visitor: Fix (pseudo) struct handling 2024-01-26 11:16:58 +01:00
tpm.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
trace-events qapi: Generalize struct member policy checking 2021-10-29 18:23:09 +02:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
trace.json qapi: Require descriptions and tagged sections to be indented 2024-02-26 10:43:56 +01:00
transaction.json qapi: Delete useless "Returns" sections 2024-03-04 07:12:40 +01:00
ui.json Darwin Cocoa patches: 2024-03-08 18:19:25 +00:00
virtio.json qdev: Add a granule_mode property 2024-03-09 19:17:01 +01:00
yank.json qapi/yank: Tweak @yank's error description for consistency 2024-03-04 07:12:40 +01:00