qemu/hw/acpi/pcihp.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

526 lines
16 KiB
C
Raw Normal View History

/*
* QEMU<->ACPI BIOS PCI hotplug interface
*
* QEMU supports PCI hotplug via ACPI. This module
* implements the interface between QEMU and the ACPI BIOS.
* Interface specification - see docs/specs/acpi_pci_hotplug.txt
*
* Copyright (c) 2013, Red Hat Inc, Michael S. Tsirkin (mst@redhat.com)
* Copyright (c) 2006 Fabrice Bellard
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License version 2.1 as published by the Free Software Foundation.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>
*
* Contributions after 2012-01-13 are licensed under the terms of the
* GNU GPL, version 2 or (at your option) any later version.
*/
#include "qemu/osdep.h"
#include "hw/acpi/pcihp.h"
#include "hw/pci-host/i440fx.h"
#include "hw/pci/pci.h"
#include "hw/pci/pci_bridge.h"
#include "hw/pci/pci_host.h"
#include "hw/pci/pcie_port.h"
acpi: pcihp: pcie: set power on cap on parent slot on creation a PCIDevice has power turned on at the end of pci_qdev_realize() however later on if PCIe slot isn't populated with any children it's power is turned off. It's fine if native hotplug is used as plug callback will power slot on among other things. However when ACPI hotplug is enabled it replaces native PCIe plug callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and as result slot stays powered off. It works fine as ACPI hotplug on guest side takes care of enumerating/initializing hotplugged device. But when later guest is migrated, call chain introduced by] commit d5daff7d312 (pcie: implement slot power control for pcie root ports) pcie_cap_slot_post_load() -> pcie_cap_update_power() -> pcie_set_power_device() -> pci_set_power() -> pci_update_mappings() will disable earlier initialized BARs for the hotplugged device in powered off slot due to commit 23786d13441 (pci: implement power state) which disables BARs if power is off. Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON on slot (root port/downstream port) at the time a device hotplugged into it. As result PCI_EXP_SLTCTL_PWR_ON is migrated to target and above call chain keeps device plugged into it powered on. Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Suggested-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220301151200.3507298-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-01 15:11:59 +00:00
#include "hw/pci-bridge/xio3130_downstream.h"
#include "hw/i386/acpi-build.h"
#include "hw/acpi/acpi.h"
#include "hw/pci/pci_bus.h"
#include "migration/vmstate.h"
2016-03-14 08:01:28 +00:00
#include "qapi/error.h"
#include "qom/qom-qobject.h"
#include "trace.h"
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
#define ACPI_PCIHP_SIZE 0x0018
#define PCI_UP_BASE 0x0000
#define PCI_DOWN_BASE 0x0004
#define PCI_EJ_BASE 0x0008
#define PCI_RMV_BASE 0x000c
#define PCI_SEL_BASE 0x0010
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
#define PCI_AIDX_BASE 0x0014
typedef struct AcpiPciHpFind {
int bsel;
PCIBus *bus;
} AcpiPciHpFind;
static int acpi_pcihp_get_bsel(PCIBus *bus)
{
Error *local_err = NULL;
uint64_t bsel = object_property_get_uint(OBJECT(bus), ACPI_PCIHP_PROP_BSEL,
&local_err);
if (local_err || bsel >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
if (local_err) {
error_free(local_err);
}
return -1;
} else {
return bsel;
}
}
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
typedef struct {
unsigned bsel_alloc;
bool has_bridge_hotplug;
} BSELInfo;
/* Assign BSEL property only to buses that support hotplug. */
static void *acpi_set_bsel(PCIBus *bus, void *opaque)
{
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
BSELInfo *info = opaque;
unsigned *bus_bsel;
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
DeviceState *br = bus->qbus.parent;
bool is_bridge = IS_PCI_BRIDGE(br);
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
/* hotplugged bridges can't be described in ACPI ignore them */
if (qbus_is_hotpluggable(BUS(bus))) {
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
if (!is_bridge || (!br->hotplugged && info->has_bridge_hotplug)) {
bus_bsel = g_malloc(sizeof *bus_bsel);
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
*bus_bsel = info->bsel_alloc++;
object_property_add_uint32_ptr(OBJECT(bus), ACPI_PCIHP_PROP_BSEL,
bus_bsel, OBJ_PROP_FLAG_READ);
}
}
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
return info;
}
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
static void acpi_set_pci_info(bool has_bridge_hotplug)
{
static bool bsel_is_set;
Object *host = acpi_get_i386_pci_host();
PCIBus *bus;
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
BSELInfo info = { .bsel_alloc = ACPI_PCIHP_BSEL_DEFAULT,
.has_bridge_hotplug = has_bridge_hotplug };
if (bsel_is_set) {
return;
}
bsel_is_set = true;
if (!host) {
return;
}
bus = PCI_HOST_BRIDGE(host)->bus;
if (bus) {
/* Scan all PCI buses. Set property to enable acpi based hotplug. */
pci: acpihp: assign BSEL only to coldplugged bridges ACPI PCI hotplug would broken after bridge hotplug and then migration if hotplugged bridge were specified on target at command line. Currently it's not possible since, 'hotplugged' property was made read-only for some time now. The issue would happen due to BSEL being assigned to all bridges during 1st 'reset': source seq: 1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge) 2. hotplug bridge, no bsel is assigned (so far is ok) target seq: 1. start 'pc' machine with -S -device pci-bridge,id=hp_br,hotplugged=on BSEL gets assigned to as follows hp_br: 0 pci.0: 1 as result hotplug requests with migrated AML generated on source would be misdirected to 'hp_br' instead of intended pci.0 While it's not issue at the moment, it's based on implicit assumptions * 'hotplugged' property is read-only * 1st reset happens before QEMU drops into monitor mode which lets add hotplugged on source bridges as hotplugged ones (anything added at that stage counts as hotplugged (yet another assumption)) All of it looks too fragile to me, so lets restrict BSEL only to cold-plugged bridges explicitly. Migration wise it shouldn't break anything since assignment order stays the same: * user can't specify 'hotplugged=on' on CLI * user can't specify 'hotplugged=off' at monitor stage or later on older QEMU versions where 'hotplugged' is RW, hotplug is broken after migration anyways and we cannot do anything to fix that. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230112140312.3096331-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-12 14:02:43 +00:00
pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &info);
}
}
static void acpi_pcihp_test_hotplug_bus(PCIBus *bus, void *opaque)
{
AcpiPciHpFind *find = opaque;
if (find->bsel == acpi_pcihp_get_bsel(bus)) {
find->bus = bus;
}
}
static PCIBus *acpi_pcihp_find_hotplug_bus(AcpiPciHpState *s, int bsel)
{
AcpiPciHpFind find = { .bsel = bsel, .bus = NULL };
if (bsel < 0) {
return NULL;
}
pci_for_each_bus(s->root, acpi_pcihp_test_hotplug_bus, &find);
/* Make bsel 0 eject root bus if bsel property is not set,
* for compatibility with non acpi setups.
* TODO: really needed?
*/
if (!bsel && !find.bus) {
find.bus = s->root;
}
/*
* Check if find.bus is actually hotpluggable. If bsel is set to
* NULL for example on the root bus in order to make it
* non-hotpluggable, find.bus will match the root bus when bsel
* is 0. See acpi_pcihp_test_hotplug_bus() above. Since the
* bus is not hotpluggable however, we should not select the bus.
* Instead, we should set find.bus to NULL in that case. In the check
* below, we generalize this case for all buses, not just the root bus.
* The callers of this function check for a null return value and
* handle them appropriately.
*/
if (find.bus && !qbus_is_hotpluggable(BUS(find.bus))) {
find.bus = NULL;
}
return find.bus;
}
static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
{
DeviceClass *dc = DEVICE_GET_CLASS(dev);
/*
* ACPI doesn't allow hotplug of bridge devices. Don't allow
* hot-unplug of bridge devices unless they were added by hotplug
* (and so, not described by acpi).
hw/acpi: Make the PCI hot-plug aware of SR-IOV PCI device capable of SR-IOV support is a new, still-experimental feature with only a single working example of the Nvme device. This patch in an attempt to fix a double-free problem when a SR-IOV-capable Nvme device is hot-unplugged in the following scenario: Qemu CLI: --------- -device pcie-root-port,slot=0,id=rp0 -device nvme-subsys,id=subsys0 -device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1 Guest OS: --------- sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0 sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0 echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset sleep 1 echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0 sleep 2 echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind Qemu monitor: ------------- device_del nvme0 Explanation of the problem and the proposed solution: 1) The current SR-IOV implementation assumes it’s the PhysicalFunction that creates and deletes VirtualFunctions. 2) It’s a design decision (the Nvme device at least) for the VFs to be of the same class as PF. Effectively, they share the dc->hotpluggable value. 3) When a VF is created, it’s added as a child node to PF’s PCI bus slot. 4) Monitor/device_del triggers the ACPI mechanism. The implementation is not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes. 5) VFs are unrealized directly, and it doesn’t work well with (1). SR/IOV structures are not updated, so when it’s PF’s turn to be unrealized, it works on stale pointers to already-deleted VFs. The proposed fix is to make the PCI ACPI code aware of SR/IOV. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-05-09 14:16:20 +00:00
*
* Don't allow hot-unplug of SR-IOV Virtual Functions, as they
* will be removed implicitly, when Physical Function is unplugged.
*/
return (IS_PCI_BRIDGE(dev) && !dev->qdev.hotplugged) || !dc->hotpluggable ||
hw/acpi: Make the PCI hot-plug aware of SR-IOV PCI device capable of SR-IOV support is a new, still-experimental feature with only a single working example of the Nvme device. This patch in an attempt to fix a double-free problem when a SR-IOV-capable Nvme device is hot-unplugged in the following scenario: Qemu CLI: --------- -device pcie-root-port,slot=0,id=rp0 -device nvme-subsys,id=subsys0 -device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1 Guest OS: --------- sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0 sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0 echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset sleep 1 echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0 sleep 2 echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind Qemu monitor: ------------- device_del nvme0 Explanation of the problem and the proposed solution: 1) The current SR-IOV implementation assumes it’s the PhysicalFunction that creates and deletes VirtualFunctions. 2) It’s a design decision (the Nvme device at least) for the VFs to be of the same class as PF. Effectively, they share the dc->hotpluggable value. 3) When a VF is created, it’s added as a child node to PF’s PCI bus slot. 4) Monitor/device_del triggers the ACPI mechanism. The implementation is not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes. 5) VFs are unrealized directly, and it doesn’t work well with (1). SR/IOV structures are not updated, so when it’s PF’s turn to be unrealized, it works on stale pointers to already-deleted VFs. The proposed fix is to make the PCI ACPI code aware of SR/IOV. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-05-09 14:16:20 +00:00
pci_is_vf(dev);
}
static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)
{
HotplugHandler *hotplug_ctrl;
BusChild *kid, *next;
int slot = ctz32(slots);
PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
trace_acpi_pci_eject_slot(bsel, slot);
if (!bus || slot > 31) {
return;
}
/* Mark request as complete */
s->acpi_pcihp_pci_status[bsel].down &= ~(1U << slot);
s->acpi_pcihp_pci_status[bsel].up &= ~(1U << slot);
QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
DeviceState *qdev = kid->child;
PCIDevice *dev = PCI_DEVICE(qdev);
if (PCI_SLOT(dev->devfn) == slot) {
if (!acpi_pcihp_pc_no_hotplug(s, dev)) {
/*
* partially_hotplugged is used by virtio-net failover:
* failover has asked the guest OS to unplug the device
* but we need to keep some references to the device
* to be able to plug it back in case of failure so
* we don't execute hotplug_handler_unplug().
*/
if (dev->partially_hotplugged) {
/*
* pending_deleted_event is set to true when
* virtio-net failover asks to unplug the device,
* and set to false here when the operation is done
* This is used by the migration loop to detect the
* end of the operation and really start the migration.
*/
qdev->pending_deleted_event = false;
} else {
hotplug_ctrl = qdev_get_hotplug_handler(qdev);
hotplug_handler_unplug(hotplug_ctrl, qdev, &error_abort);
object_unparent(OBJECT(qdev));
}
}
}
}
}
static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
{
BusChild *kid, *next;
PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
/* Execute any pending removes during reset */
while (s->acpi_pcihp_pci_status[bsel].down) {
acpi_pcihp_eject_slot(s, bsel, s->acpi_pcihp_pci_status[bsel].down);
}
s->acpi_pcihp_pci_status[bsel].hotplug_enable = ~0;
if (!bus) {
return;
}
QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
DeviceState *qdev = kid->child;
PCIDevice *pdev = PCI_DEVICE(qdev);
int slot = PCI_SLOT(pdev->devfn);
if (acpi_pcihp_pc_no_hotplug(s, pdev)) {
s->acpi_pcihp_pci_status[bsel].hotplug_enable &= ~(1U << slot);
}
}
}
static void acpi_pcihp_update(AcpiPciHpState *s)
{
int i;
for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
acpi_pcihp_update_hotplug_bus(s, i);
}
}
void acpi_pcihp_reset(AcpiPciHpState *s)
{
acpi_set_pci_info(s->use_acpi_hotplug_bridge);
acpi_pcihp_update(s);
}
void acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
{
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
PCIDevice *pdev = PCI_DEVICE(dev);
/* Only hotplugged devices need the hotplug capability. */
if (dev->hotplugged &&
acpi_pcihp_get_bsel(pci_get_bus(pdev)) < 0) {
error_setg(errp, "Unsupported bus. Bus doesn't have property '"
ACPI_PCIHP_PROP_BSEL "' set");
return;
}
}
void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
DeviceState *dev, Error **errp)
{
PCIDevice *pdev = PCI_DEVICE(dev);
int slot = PCI_SLOT(pdev->devfn);
acpi: pcihp: pcie: set power on cap on parent slot on creation a PCIDevice has power turned on at the end of pci_qdev_realize() however later on if PCIe slot isn't populated with any children it's power is turned off. It's fine if native hotplug is used as plug callback will power slot on among other things. However when ACPI hotplug is enabled it replaces native PCIe plug callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and as result slot stays powered off. It works fine as ACPI hotplug on guest side takes care of enumerating/initializing hotplugged device. But when later guest is migrated, call chain introduced by] commit d5daff7d312 (pcie: implement slot power control for pcie root ports) pcie_cap_slot_post_load() -> pcie_cap_update_power() -> pcie_set_power_device() -> pci_set_power() -> pci_update_mappings() will disable earlier initialized BARs for the hotplugged device in powered off slot due to commit 23786d13441 (pci: implement power state) which disables BARs if power is off. Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON on slot (root port/downstream port) at the time a device hotplugged into it. As result PCI_EXP_SLTCTL_PWR_ON is migrated to target and above call chain keeps device plugged into it powered on. Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Suggested-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220301151200.3507298-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-01 15:11:59 +00:00
PCIDevice *bridge;
PCIBus *bus;
int bsel;
/* Don't send event when device is enabled during qemu machine creation:
* it is present on boot, no hotplug event is necessary. We do send an
* event when the device is disabled later. */
if (!dev->hotplugged) {
/*
* Overwrite the default hotplug handler with the ACPI PCI one
* for cold plugged bridges only.
*/
if (s->use_acpi_hotplug_bridge &&
object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
PCIBus *sec = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
qbus_set_hotplug_handler(BUS(sec), OBJECT(hotplug_dev));
/* We don't have to overwrite any other hotplug handler yet */
assert(QLIST_EMPTY(&sec->child));
}
return;
}
acpi: pcihp: pcie: set power on cap on parent slot on creation a PCIDevice has power turned on at the end of pci_qdev_realize() however later on if PCIe slot isn't populated with any children it's power is turned off. It's fine if native hotplug is used as plug callback will power slot on among other things. However when ACPI hotplug is enabled it replaces native PCIe plug callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and as result slot stays powered off. It works fine as ACPI hotplug on guest side takes care of enumerating/initializing hotplugged device. But when later guest is migrated, call chain introduced by] commit d5daff7d312 (pcie: implement slot power control for pcie root ports) pcie_cap_slot_post_load() -> pcie_cap_update_power() -> pcie_set_power_device() -> pci_set_power() -> pci_update_mappings() will disable earlier initialized BARs for the hotplugged device in powered off slot due to commit 23786d13441 (pci: implement power state) which disables BARs if power is off. Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON on slot (root port/downstream port) at the time a device hotplugged into it. As result PCI_EXP_SLTCTL_PWR_ON is migrated to target and above call chain keeps device plugged into it powered on. Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Suggested-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220301151200.3507298-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-01 15:11:59 +00:00
bus = pci_get_bus(pdev);
bridge = pci_bridge_get_device(bus);
if (object_dynamic_cast(OBJECT(bridge), TYPE_PCIE_ROOT_PORT) ||
object_dynamic_cast(OBJECT(bridge), TYPE_XIO3130_DOWNSTREAM)) {
pcie_cap_slot_enable_power(bridge);
}
bsel = acpi_pcihp_get_bsel(bus);
g_assert(bsel >= 0);
s->acpi_pcihp_pci_status[bsel].up |= (1U << slot);
acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
}
void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
DeviceState *dev, Error **errp)
{
PCIDevice *pdev = PCI_DEVICE(dev);
trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn),
acpi_pcihp_get_bsel(pci_get_bus(pdev)));
qdev_unrealize(dev);
}
void acpi_pcihp_device_unplug_request_cb(HotplugHandler *hotplug_dev,
AcpiPciHpState *s, DeviceState *dev,
Error **errp)
{
PCIDevice *pdev = PCI_DEVICE(dev);
int slot = PCI_SLOT(pdev->devfn);
int bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
trace_acpi_pci_unplug_request(bsel, slot);
if (bsel < 0) {
error_setg(errp, "Unsupported bus. Bus doesn't have property '"
ACPI_PCIHP_PROP_BSEL "' set");
return;
}
/*
* pending_deleted_event is used by virtio-net failover to detect the
* end of the unplug operation, the flag is set to false in
* acpi_pcihp_eject_slot() when the operation is completed.
*/
pdev->qdev.pending_deleted_event = true;
acpi: pcihp: allow repeating hot-unplug requests with Q35 using ACPI PCI hotplug by default, user's request to unplug device is ignored when it's issued before guest OS has been booted. And any additional attempt to request device hot-unplug afterwards results in following error: "Device XYZ is already in the process of unplug" arguably it can be considered as a regression introduced by [2], before which it was possible to issue unplug request multiple times. Accept new uplug requests after timeout (1ms). This brings ACPI PCI hotplug on par with native PCIe unplug behavior [1] and allows user to repeat unplug requests at propper times. Set expire timeout to arbitrary 1msec so user won't be able to flood guest with SCI interrupts by calling device_del in tight loop. PS: ACPI spec doesn't mandate what OSPM can do with GPEx.status bits set before it's booted => it's impl. depended. Status bits may be retained (I tested with one Windows version) or cleared (Linux since 2.6 kernel times) during guest's ACPI subsystem initialization. Clearing status bits (though not wrong per se) hides the unplug event from guest, and it's upto user to repeat device_del later when guest is able to handle unplug requests. 1) 18416c62e3 ("pcie: expire pending delete") 2) Fixes: cce8944cc9ef ("qdev-monitor: Forbid repeated device_del") Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> CC: mst@redhat.com CC: anisinha@redhat.com CC: jusual@redhat.com CC: kraxel@redhat.com Message-Id: <20230418090449.2155757-1-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com>
2023-04-18 09:04:49 +00:00
/* if unplug was requested before OSPM is initialized,
* linux kernel will clear GPE0.sts[] bits during boot, which effectively
* hides unplug event. And than followup qmp_device_del() calls remain
* blocked by above flag permanently.
* Unblock qmp_device_del() by setting expire limit, so user can
* repeat unplug request later when OSPM has been booted.
*/
pdev->qdev.pending_deleted_expires_ms =
qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); /* 1 msec */
s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
}
bool acpi_pcihp_is_hotpluggbale_bus(AcpiPciHpState *s, BusState *bus)
{
Object *o = OBJECT(bus->parent);
if (s->use_acpi_hotplug_bridge &&
object_dynamic_cast(o, TYPE_PCI_BRIDGE)) {
if (object_dynamic_cast(o, TYPE_PCIE_SLOT) && !PCIE_SLOT(o)->hotplug) {
return false;
}
return true;
}
if (s->use_acpi_root_pci_hotplug) {
return true;
}
return false;
}
static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size)
{
AcpiPciHpState *s = opaque;
uint32_t val = 0;
int bsel = s->hotplug_select;
if (bsel < 0 || bsel >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
return 0;
}
switch (addr) {
case PCI_UP_BASE:
val = s->acpi_pcihp_pci_status[bsel].up;
if (s->use_acpi_hotplug_bridge) {
s->acpi_pcihp_pci_status[bsel].up = 0;
}
trace_acpi_pci_up_read(val);
break;
case PCI_DOWN_BASE:
val = s->acpi_pcihp_pci_status[bsel].down;
trace_acpi_pci_down_read(val);
break;
case PCI_EJ_BASE:
trace_acpi_pci_features_read(val);
break;
case PCI_RMV_BASE:
val = s->acpi_pcihp_pci_status[bsel].hotplug_enable;
trace_acpi_pci_rmv_read(val);
break;
case PCI_SEL_BASE:
val = s->hotplug_select;
trace_acpi_pci_sel_read(val);
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
break;
case PCI_AIDX_BASE:
val = s->acpi_index;
s->acpi_index = 0;
trace_acpi_pci_acpi_index_read(val);
break;
default:
break;
}
return val;
}
static void pci_write(void *opaque, hwaddr addr, uint64_t data,
unsigned int size)
{
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
int slot;
PCIBus *bus;
BusChild *kid, *next;
AcpiPciHpState *s = opaque;
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
s->acpi_index = 0;
switch (addr) {
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
case PCI_AIDX_BASE:
/*
* fetch acpi-index for specified slot so that follow up read from
* PCI_AIDX_BASE can return it to guest
*/
slot = ctz32(data);
if (s->hotplug_select >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
break;
}
bus = acpi_pcihp_find_hotplug_bus(s, s->hotplug_select);
if (!bus) {
break;
}
pci: introduce acpi-index property for PCI device In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-15 18:00:58 +00:00
QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
Object *o = OBJECT(kid->child);
PCIDevice *dev = PCI_DEVICE(o);
if (PCI_SLOT(dev->devfn) == slot) {
s->acpi_index = object_property_get_uint(o, "acpi-index", NULL);
break;
}
}
trace_acpi_pci_acpi_index_write(s->hotplug_select, slot, s->acpi_index);
break;
case PCI_EJ_BASE:
if (s->hotplug_select >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
break;
}
acpi_pcihp_eject_slot(s, s->hotplug_select, data);
trace_acpi_pci_ej_write(addr, data);
break;
case PCI_SEL_BASE:
s->hotplug_select = s->use_acpi_hotplug_bridge ? data :
ACPI_PCIHP_BSEL_DEFAULT;
trace_acpi_pci_sel_write(addr, data);
default:
break;
}
}
static const MemoryRegionOps acpi_pcihp_io_ops = {
.read = pci_read,
.write = pci_write,
.endianness = DEVICE_LITTLE_ENDIAN,
.valid = {
.min_access_size = 4,
.max_access_size = 4,
},
};
void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, PCIBus *root_bus,
MemoryRegion *io, uint16_t io_base)
{
s->io_len = ACPI_PCIHP_SIZE;
s->io_base = io_base;
s->root = root_bus;
memory_region_init_io(&s->io, owner, &acpi_pcihp_io_ops, s,
"acpi-pci-hotplug", s->io_len);
memory_region_add_subregion(io, s->io_base, &s->io);
object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_BASE_PROP, &s->io_base,
qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
2020-05-05 15:29:22 +00:00
OBJ_PROP_FLAG_READ);
object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_LEN_PROP, &s->io_len,
qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
2020-05-05 15:29:22 +00:00
OBJ_PROP_FLAG_READ);
}
const VMStateDescription vmstate_acpi_pcihp_pci_status = {
.name = "acpi_pcihp_pci_status",
.version_id = 1,
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
VMSTATE_UINT32(up, AcpiPciHpPciStatus),
VMSTATE_UINT32(down, AcpiPciHpPciStatus),
VMSTATE_END_OF_LIST()
}
};