VFIO updates for v6.8-rc1

 - Add debugfs support, initially used for reporting device migration
   state. (Longfang Liu)

 - Fixes and support for migration dirty tracking across multiple IOVA
   regions in the pds-vfio-pci driver. (Brett Creeley)

 - Improved IOMMU allocation accounting visibility. (Pasha Tatashin)

 - Virtio infrastructure and a new virtio-vfio-pci variant driver, which
   provides emulation of legacy virtio interfaces on modern virtio
   hardware for virtio-net VF devices where the PF driver exposes
   support for legacy admin queues, i.e. an emulated IO BAR on an SR-IOV
   VF to provide driver ABI compatibility to legacy devices.
   (Yishai Hadas & Feng Liu)

 - Migration fixes for the hisi-acc-vfio-pci variant driver.
   (Shameer Kolothum)

 - Kconfig dependency fix for new virtio-vfio-pci variant driver.
   (Arnd Bergmann)

Merge tag 'vfio-v6.8-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Add debugfs support, initially used for reporting device migration
   state (Longfang Liu)

 - Fixes and support for migration dirty tracking across multiple IOVA
   regions in the pds-vfio-pci driver (Brett Creeley)

 - Improved IOMMU allocation accounting visibility (Pasha Tatashin)

 - Virtio infrastructure and a new virtio-vfio-pci variant driver, which
   provides emulation of legacy virtio interfaces on modern virtio
   hardware for virtio-net VF devices where the PF driver exposes
   support for legacy admin queues, i.e. an emulated IO BAR on an SR-IOV
   VF to provide driver ABI compatibility to legacy devices (Yishai
   Hadas & Feng Liu)

 - Migration fixes for the hisi-acc-vfio-pci variant driver (Shameer
   Kolothum)

 - Kconfig dependency fix for new virtio-vfio-pci variant driver (Arnd
   Bergmann)

* tag 'vfio-v6.8-rc1' of https://github.com/awilliam/linux-vfio: (22 commits)
  vfio/virtio: fix virtio-pci dependency
  hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume
  vfio/virtio: Declare virtiovf_pci_aer_reset_done() static
  vfio/virtio: Introduce a vfio driver over virtio devices
  vfio/pci: Expose vfio_pci_core_iowrite/read##size()
  vfio/pci: Expose vfio_pci_core_setup_barmap()
  virtio-pci: Introduce APIs to execute legacy IO admin commands
  virtio-pci: Initialize the supported admin commands
  virtio-pci: Introduce admin commands
  virtio-pci: Introduce admin command sending function
  virtio-pci: Introduce admin virtqueue
  virtio: Define feature bit for administration virtqueue
  vfio/type1: account iommu allocations
  vfio/pds: Add multi-region support
  vfio/pds: Move seq/ack bitmaps into region struct
  vfio/pds: Pass region info to relevant functions
  vfio/pds: Move and rename region specific info
  vfio/pds: Only use a single SGL for both seq and ack
  vfio/pds: Fix calculations in pds_vfio_dirty_sync
  MAINTAINERS: Add vfio debugfs interface doc link
  ...
Linus Torvalds 2024-01-18 15:57:25 -08:00
commit 244aefb1c6
34 changed files with 1754 additions and 164 deletions

@@ -0,0 +1,25 @@
+What:		/sys/kernel/debug/vfio
+Date:		December 2023
+KernelVersion:	6.8
+Contact:	Longfang Liu <liulongfang@huawei.com>
+Description:	This debugfs file directory is used for debugging
+		of vfio devices, it's a common directory for all vfio devices.
+		Vfio core will create a device subdirectory under this
+		directory.
+
+What:		/sys/kernel/debug/vfio/<device>/migration
+Date:		December 2023
+KernelVersion:	6.8
+Contact:	Longfang Liu <liulongfang@huawei.com>
+Description:	This debugfs file directory is used for debugging
+		of vfio devices that support live migration.
+		The debugfs of each vfio device that supports live migration
+		could be created under this directory.
+
+What:		/sys/kernel/debug/vfio/<device>/migration/state
+Date:		December 2023
+KernelVersion:	6.8
+Contact:	Longfang Liu <liulongfang@huawei.com>
+Description:	Read the live migration status of the vfio device.
+		The contents of the state file reflects the migration state
+		relative to those defined in the vfio_device_mig_state enum
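
The ABI entries above describe a per-device `migration/state` file. As a rough user-space sketch (not part of this commit), the path could be built and the one-line state read as shown here. The helper names `vfio_state_path` and `read_state` are hypothetical, and the device name in the usage note is only an example; reading debugfs normally requires root and a mounted debugfs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the debugfs path documented above for one
 * device. Returns 0 on success, -1 if the buffer is too small.
 */
static int vfio_state_path(char *out, size_t outlen, const char *device)
{
	int n = snprintf(out, outlen,
			 "/sys/kernel/debug/vfio/%s/migration/state", device);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Hypothetical helper: read the first line of the state file (for example
 * "RUNNING" or "STOP_COPY") and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_state(const char *path, char *state, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(state, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	state[strcspn(state, "\n")] = '\0';
	return 0;
}
```

A caller would pass the name of the device directory created under `/sys/kernel/debug/vfio/` (for a PCI function this is assumed to look like `0000:01:00.1`) and treat a `-1` return as "no migration support or debugfs not available".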

@@ -23002,6 +23002,7 @@ M: Alex Williamson <alex.williamson@redhat.com>
 L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git https://github.com/awilliam/linux-vfio.git
+F:	Documentation/ABI/testing/debugfs-vfio
 F:	Documentation/ABI/testing/sysfs-devices-vfio-dev
 F:	Documentation/driver-api/vfio.rst
 F:	drivers/vfio/
@@ -23037,6 +23038,13 @@ L: kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/

+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux-foundation.org
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
 R:	Yishai Hadas <yishaih@nvidia.com>

@@ -80,6 +80,16 @@ config VFIO_VIRQFD
 	select EVENTFD
 	default n

+config VFIO_DEBUGFS
+	bool "Export VFIO internals in DebugFS"
+	depends on DEBUG_FS
+	help
+	  Allows exposure of VFIO device internals. This option enables
+	  the use of debugfs by VFIO drivers as required. The device can
+	  cause the VFIO code create a top-level debug/vfio directory
+	  during initialization, and then populate a subdirectory with
+	  entries as required.
+
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
 source "drivers/vfio/mdev/Kconfig"

@@ -7,6 +7,7 @@ vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
+vfio-$(CONFIG_VFIO_DEBUGFS) += debugfs.o

 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o

drivers/vfio/debugfs.c

@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, HiSilicon Ltd.
+ */
+
+#include <linux/device.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/vfio.h>
+#include "vfio.h"
+
+static struct dentry *vfio_debugfs_root;
+
+static int vfio_device_state_read(struct seq_file *seq, void *data)
+{
+	struct device *vf_dev = seq->private;
+	struct vfio_device *vdev = container_of(vf_dev,
+						struct vfio_device, device);
+	enum vfio_device_mig_state state;
+	int ret;
+
+	BUILD_BUG_ON(VFIO_DEVICE_STATE_NR !=
+		     VFIO_DEVICE_STATE_PRE_COPY_P2P + 1);
+
+	ret = vdev->mig_ops->migration_get_state(vdev, &state);
+	if (ret)
+		return -EINVAL;
+
+	switch (state) {
+	case VFIO_DEVICE_STATE_ERROR:
+		seq_puts(seq, "ERROR\n");
+		break;
+	case VFIO_DEVICE_STATE_STOP:
+		seq_puts(seq, "STOP\n");
+		break;
+	case VFIO_DEVICE_STATE_RUNNING:
+		seq_puts(seq, "RUNNING\n");
+		break;
+	case VFIO_DEVICE_STATE_STOP_COPY:
+		seq_puts(seq, "STOP_COPY\n");
+		break;
+	case VFIO_DEVICE_STATE_RESUMING:
+		seq_puts(seq, "RESUMING\n");
+		break;
+	case VFIO_DEVICE_STATE_RUNNING_P2P:
+		seq_puts(seq, "RUNNING_P2P\n");
+		break;
+	case VFIO_DEVICE_STATE_PRE_COPY:
+		seq_puts(seq, "PRE_COPY\n");
+		break;
+	case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+		seq_puts(seq, "PRE_COPY_P2P\n");
+		break;
+	default:
+		seq_puts(seq, "Invalid\n");
+	}
+
+	return 0;
+}
+
+void vfio_device_debugfs_init(struct vfio_device *vdev)
+{
+	struct device *dev = &vdev->device;
+
+	vdev->debug_root = debugfs_create_dir(dev_name(vdev->dev),
+					      vfio_debugfs_root);
+
+	if (vdev->mig_ops) {
+		struct dentry *vfio_dev_migration = NULL;
+
+		vfio_dev_migration = debugfs_create_dir("migration",
+							vdev->debug_root);
+		debugfs_create_devm_seqfile(dev, "state", vfio_dev_migration,
+					    vfio_device_state_read);
+	}
+}
+
+void vfio_device_debugfs_exit(struct vfio_device *vdev)
+{
+	debugfs_remove_recursive(vdev->debug_root);
+}
+
+void vfio_debugfs_create_root(void)
+{
+	vfio_debugfs_root = debugfs_create_dir("vfio", NULL);
+}
+
+void vfio_debugfs_remove_root(void)
+{
+	debugfs_remove_recursive(vfio_debugfs_root);
+	vfio_debugfs_root = NULL;
+}

@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"

 source "drivers/vfio/pci/pds/Kconfig"

+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu

@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/

 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/

@@ -694,6 +694,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 					size_t len, loff_t *pos)
 {
 	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	u8 *vf_data = (u8 *)&migf->vf_data;
 	loff_t requested_length;
 	ssize_t done = 0;
 	int ret;
@@ -715,7 +716,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 		goto out_unlock;
 	}

-	ret = copy_from_user(&migf->vf_data, buf, len);
+	ret = copy_from_user(vf_data + *pos, buf, len);
 	if (ret) {
 		done = -EFAULT;
 		goto out_unlock;
@@ -835,7 +836,9 @@ static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t

 	len = min_t(size_t, migf->total_length - *pos, len);
 	if (len) {
-		ret = copy_to_user(buf, &migf->vf_data, len);
+		u8 *vf_data = (u8 *)&migf->vf_data;
+
+		ret = copy_to_user(buf, vf_data + *pos, len);
 		if (ret) {
 			done = -EFAULT;
 			goto out_unlock;
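
The hisi_acc hunks above change the copy source and destination from the start of `vf_data` to `vf_data + *pos`. A minimal user-space sketch of why that matters when the data is transferred in several chunks is given below. `struct mig_file` and `mig_read` are hypothetical stand-ins, and `memcpy` takes the place of `copy_to_user`; the length clamp mirrors the `min_t()` line in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's migration file object. */
struct mig_file {
	unsigned char data[16];
	size_t total_length;
};

/*
 * Copy up to len bytes starting at *pos and advance *pos.
 * Without the "+ *pos" offset, every call would return the first
 * bytes of the buffer again instead of the next chunk.
 */
static size_t mig_read(const struct mig_file *f, unsigned char *buf,
		       size_t len, size_t *pos)
{
	if (*pos >= f->total_length)
		return 0;
	if (len > f->total_length - *pos)
		len = f->total_length - *pos;

	memcpy(buf, f->data + *pos, len);	/* offset by current position */
	*pos += len;
	return len;
}
```

Reading a 16-byte buffer in 10-byte chunks with this helper yields 10 bytes, then the remaining 6, then 0, which is the behaviour a sequential `read()` caller would expect.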

@@ -70,7 +70,7 @@ pds_vfio_print_guest_region_info(struct pds_vfio_pci_device *pds_vfio,
 	kfree(region_info);
 }

-static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_dirty *dirty,
+static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_region *region,
 					unsigned long bytes)
 {
 	unsigned long *host_seq_bmp, *host_ack_bmp;
@@ -85,47 +85,63 @@ static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_dirty *dirty,
 		return -ENOMEM;
 	}

-	dirty->host_seq.bmp = host_seq_bmp;
-	dirty->host_ack.bmp = host_ack_bmp;
+	region->host_seq = host_seq_bmp;
+	region->host_ack = host_ack_bmp;
+	region->bmp_bytes = bytes;

 	return 0;
 }

 static void pds_vfio_dirty_free_bitmaps(struct pds_vfio_dirty *dirty)
 {
-	vfree(dirty->host_seq.bmp);
-	vfree(dirty->host_ack.bmp);
-	dirty->host_seq.bmp = NULL;
-	dirty->host_ack.bmp = NULL;
+	if (!dirty->regions)
+		return;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		vfree(region->host_seq);
+		vfree(region->host_ack);
+		region->host_seq = NULL;
+		region->host_ack = NULL;
+		region->bmp_bytes = 0;
+	}
 }

 static void __pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio,
-				      struct pds_vfio_bmp_info *bmp_info)
+				      struct pds_vfio_region *region)
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;

-	dma_unmap_single(pdsc_dev, bmp_info->sgl_addr,
-			 bmp_info->num_sge * sizeof(struct pds_lm_sg_elem),
+	dma_unmap_single(pdsc_dev, region->sgl_addr,
+			 region->num_sge * sizeof(struct pds_lm_sg_elem),
 			 DMA_BIDIRECTIONAL);
-	kfree(bmp_info->sgl);
+	kfree(region->sgl);

-	bmp_info->num_sge = 0;
-	bmp_info->sgl = NULL;
-	bmp_info->sgl_addr = 0;
+	region->num_sge = 0;
+	region->sgl = NULL;
+	region->sgl_addr = 0;
 }

 static void pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio)
 {
-	if (pds_vfio->dirty.host_seq.sgl)
-		__pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_seq);
-	if (pds_vfio->dirty.host_ack.sgl)
-		__pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_ack);
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+
+	if (!dirty->regions)
+		return;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		if (region->sgl)
+			__pds_vfio_dirty_free_sgl(pds_vfio, region);
+	}
 }

-static int __pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
-				      struct pds_vfio_bmp_info *bmp_info,
-				      u32 page_count)
+static int pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
+				    struct pds_vfio_region *region,
+				    u32 page_count)
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;
@@ -147,32 +163,81 @@ static int __pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
 		return -EIO;
 	}

-	bmp_info->sgl = sgl;
-	bmp_info->num_sge = max_sge;
-	bmp_info->sgl_addr = sgl_addr;
+	region->sgl = sgl;
+	region->num_sge = max_sge;
+	region->sgl_addr = sgl_addr;

 	return 0;
 }

-static int pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
-				    u32 page_count)
+static void pds_vfio_dirty_free_regions(struct pds_vfio_dirty *dirty)
 {
+	vfree(dirty->regions);
+	dirty->regions = NULL;
+	dirty->num_regions = 0;
+}
+
+static int pds_vfio_dirty_alloc_regions(struct pds_vfio_pci_device *pds_vfio,
+					struct pds_lm_dirty_region_info *region_info,
+					u64 region_page_size, u8 num_regions)
+{
+	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	u32 dev_bmp_offset_byte = 0;
 	int err;

-	err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_seq,
-					 page_count);
-	if (err)
-		return err;
+	dirty->regions = vcalloc(num_regions, sizeof(struct pds_vfio_region));
+	if (!dirty->regions)
+		return -ENOMEM;
+	dirty->num_regions = num_regions;

-	err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_ack,
-					 page_count);
-	if (err) {
-		__pds_vfio_dirty_free_sgl(pds_vfio, &dirty->host_seq);
-		return err;
+	for (int i = 0; i < num_regions; i++) {
+		struct pds_lm_dirty_region_info *ri = &region_info[i];
+		struct pds_vfio_region *region = &dirty->regions[i];
+		u64 region_size, region_start;
+		u32 page_count;
+
+		/* page_count might be adjusted by the device */
+		page_count = le32_to_cpu(ri->page_count);
+		region_start = le64_to_cpu(ri->dma_base);
+		region_size = page_count * region_page_size;
+
+		err = pds_vfio_dirty_alloc_bitmaps(region,
+						   page_count / BITS_PER_BYTE);
+		if (err) {
+			dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n",
+				ERR_PTR(err));
+			goto out_free_regions;
+		}
+
+		err = pds_vfio_dirty_alloc_sgl(pds_vfio, region, page_count);
+		if (err) {
+			dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n",
+				ERR_PTR(err));
+			goto out_free_regions;
+		}
+
+		region->size = region_size;
+		region->start = region_start;
+		region->page_size = region_page_size;
+		region->dev_bmp_offset_start_byte = dev_bmp_offset_byte;
+
+		dev_bmp_offset_byte += page_count / BITS_PER_BYTE;
+		if (dev_bmp_offset_byte % BITS_PER_BYTE) {
+			dev_err(&pdev->dev, "Device bitmap offset is mis-aligned\n");
+			err = -EINVAL;
+			goto out_free_regions;
+		}
 	}

 	return 0;
+
+out_free_regions:
+	pds_vfio_dirty_free_bitmaps(dirty);
+	pds_vfio_dirty_free_sgl(pds_vfio);
+	pds_vfio_dirty_free_regions(dirty);
+
+	return err;
 }

 static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
@@ -181,16 +246,14 @@ static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;
-	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
-	u64 region_start, region_size, region_page_size;
 	struct pds_lm_dirty_region_info *region_info;
 	struct interval_tree_node *node = NULL;
+	u64 region_page_size = *page_size;
 	u8 max_regions = 0, num_regions;
 	dma_addr_t regions_dma = 0;
 	u32 num_ranges = nnodes;
-	u32 page_count;
-	u16 len;
 	int err;
+	u16 len;

 	dev_dbg(&pdev->dev, "vf%u: Start dirty page tracking\n",
 		pds_vfio->vf_id);
@@ -217,39 +280,38 @@ static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
 		return -EOPNOTSUPP;
 	}

-	/*
-	 * Only support 1 region for now. If there are any large gaps in the
-	 * VM's address regions, then this would be a waste of memory as we are
-	 * generating 2 bitmaps (ack/seq) from the min address to the max
-	 * address of the VM's address regions. In the future, if we support
-	 * more than one region in the device/driver we can split the bitmaps
-	 * on the largest address region gaps. We can do this split up to the
-	 * max_regions times returned from the dirty_status command.
-	 */
-	max_regions = 1;
 	if (num_ranges > max_regions) {
 		vfio_combine_iova_ranges(ranges, nnodes, max_regions);
 		num_ranges = max_regions;
 	}

+	region_info = kcalloc(num_ranges, sizeof(*region_info), GFP_KERNEL);
+	if (!region_info)
+		return -ENOMEM;
+	len = num_ranges * sizeof(*region_info);
+
 	node = interval_tree_iter_first(ranges, 0, ULONG_MAX);
 	if (!node)
 		return -EINVAL;
+	for (int i = 0; i < num_ranges; i++) {
+		struct pds_lm_dirty_region_info *ri = &region_info[i];
+		u64 region_size = node->last - node->start + 1;
+		u64 region_start = node->start;
+		u32 page_count;

-	region_size = node->last - node->start + 1;
-	region_start = node->start;
-	region_page_size = *page_size;
+		page_count = DIV_ROUND_UP(region_size, region_page_size);

-	len = sizeof(*region_info);
-	region_info = kzalloc(len, GFP_KERNEL);
-	if (!region_info)
-		return -ENOMEM;
+		ri->dma_base = cpu_to_le64(region_start);
+		ri->page_count = cpu_to_le32(page_count);
+		ri->page_size_log2 = ilog2(region_page_size);

-	page_count = DIV_ROUND_UP(region_size, region_page_size);
+		dev_dbg(&pdev->dev,
+			"region_info[%d]: region_start 0x%llx region_end 0x%lx region_size 0x%llx page_count %u page_size %llu\n",
+			i, region_start, node->last, region_size, page_count,
+			region_page_size);

-	region_info->dma_base = cpu_to_le64(region_start);
-	region_info->page_count = cpu_to_le32(page_count);
-	region_info->page_size_log2 = ilog2(region_page_size);
+		node = interval_tree_iter_next(node, 0, ULONG_MAX);
+	}

 	regions_dma = dma_map_single(pdsc_dev, (void *)region_info, len,
 				     DMA_BIDIRECTIONAL);
@@ -258,39 +320,20 @@ static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
 		goto out_free_region_info;
 	}

-	err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, max_regions);
+	err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, num_ranges);
 	dma_unmap_single(pdsc_dev, regions_dma, len, DMA_BIDIRECTIONAL);
 	if (err)
 		goto out_free_region_info;

-	/*
-	 * page_count might be adjusted by the device,
-	 * update it before freeing region_info DMA
-	 */
-	page_count = le32_to_cpu(region_info->page_count);
-
-	dev_dbg(&pdev->dev,
-		"region_info: regions_dma 0x%llx dma_base 0x%llx page_count %u page_size_log2 %u\n",
-		regions_dma, region_start, page_count,
-		(u8)ilog2(region_page_size));
-
-	err = pds_vfio_dirty_alloc_bitmaps(dirty, page_count / BITS_PER_BYTE);
+	err = pds_vfio_dirty_alloc_regions(pds_vfio, region_info,
+					   region_page_size, num_ranges);
 	if (err) {
-		dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n",
-			ERR_PTR(err));
-		goto out_free_region_info;
+		dev_err(&pdev->dev,
+			"Failed to allocate %d regions for tracking dirty regions: %pe\n",
+			num_regions, ERR_PTR(err));
+		goto out_dirty_disable;
 	}

-	err = pds_vfio_dirty_alloc_sgl(pds_vfio, page_count);
-	if (err) {
-		dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n",
-			ERR_PTR(err));
-		goto out_free_bitmaps;
-	}
-
-	dirty->region_start = region_start;
-	dirty->region_size = region_size;
-	dirty->region_page_size = region_page_size;
 	pds_vfio_dirty_set_enabled(pds_vfio);

 	pds_vfio_print_guest_region_info(pds_vfio, max_regions);
@@ -299,8 +342,8 @@ static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,

 	return 0;

-out_free_bitmaps:
-	pds_vfio_dirty_free_bitmaps(dirty);
+out_dirty_disable:
+	pds_vfio_dirty_disable_cmd(pds_vfio);
 out_free_region_info:
 	kfree(region_info);
 	return err;
@@ -314,6 +357,7 @@ void pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio, bool send_cmd)
 		pds_vfio_dirty_disable_cmd(pds_vfio);
 		pds_vfio_dirty_free_sgl(pds_vfio);
 		pds_vfio_dirty_free_bitmaps(&pds_vfio->dirty);
+		pds_vfio_dirty_free_regions(&pds_vfio->dirty);
 	}

 	if (send_cmd)
@@ -321,8 +365,9 @@ void pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio, bool send_cmd)
 }

 static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
-				  struct pds_vfio_bmp_info *bmp_info,
-				  u32 offset, u32 bmp_bytes, bool read_seq)
+				  struct pds_vfio_region *region,
+				  unsigned long *seq_ack_bmp, u32 offset,
+				  u32 bmp_bytes, bool read_seq)
 {
 	const char *bmp_type_str = read_seq ? "read_seq" : "write_ack";
 	u8 dma_dir = read_seq ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
@@ -339,7 +384,7 @@ static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
 	int err;
 	int i;

-	bmp = (void *)((u64)bmp_info->bmp + offset);
+	bmp = (void *)((u64)seq_ack_bmp + offset);
 	page_offset = offset_in_page(bmp);
 	bmp -= page_offset;

@@ -375,7 +420,7 @@ static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
 		goto out_free_sg_table;

 	for_each_sgtable_dma_sg(&sg_table, sg, i) {
-		struct pds_lm_sg_elem *sg_elem = &bmp_info->sgl[i];
+		struct pds_lm_sg_elem *sg_elem = &region->sgl[i];

 		sg_elem->addr = cpu_to_le64(sg_dma_address(sg));
 		sg_elem->len = cpu_to_le32(sg_dma_len(sg));
@@ -383,15 +428,16 @@ static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,

 	num_sge = sg_table.nents;
 	size = num_sge * sizeof(struct pds_lm_sg_elem);
-	dma_sync_single_for_device(pdsc_dev, bmp_info->sgl_addr, size, dma_dir);
-	err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, bmp_info->sgl_addr, num_sge,
+	offset += region->dev_bmp_offset_start_byte;
+	dma_sync_single_for_device(pdsc_dev, region->sgl_addr, size, dma_dir);
+	err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, region->sgl_addr, num_sge,
 					 offset, bmp_bytes, read_seq);
 	if (err)
 		dev_err(&pdev->dev,
 			"Dirty bitmap %s failed offset %u bmp_bytes %u num_sge %u DMA 0x%llx: %pe\n",
 			bmp_type_str, offset, bmp_bytes,
-			num_sge, bmp_info->sgl_addr, ERR_PTR(err));
-	dma_sync_single_for_cpu(pdsc_dev, bmp_info->sgl_addr, size, dma_dir);
+			num_sge, region->sgl_addr, ERR_PTR(err));
+	dma_sync_single_for_cpu(pdsc_dev, region->sgl_addr, size, dma_dir);

 	dma_unmap_sgtable(pdsc_dev, &sg_table, dma_dir, 0);
 out_free_sg_table:
@@ -403,32 +449,36 @@ static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
 }

 static int pds_vfio_dirty_write_ack(struct pds_vfio_pci_device *pds_vfio,
+				    struct pds_vfio_region *region,
 				    u32 offset, u32 len)
 {
-	return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_ack,
+	return pds_vfio_dirty_seq_ack(pds_vfio, region, region->host_ack,
 				      offset, len, WRITE_ACK);
 }

 static int pds_vfio_dirty_read_seq(struct pds_vfio_pci_device *pds_vfio,
+				   struct pds_vfio_region *region,
 				   u32 offset, u32 len)
 {
-	return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_seq,
+	return pds_vfio_dirty_seq_ack(pds_vfio, region, region->host_seq,
 				      offset, len, READ_SEQ);
 }

 static int pds_vfio_dirty_process_bitmaps(struct pds_vfio_pci_device *pds_vfio,
+					  struct pds_vfio_region *region,
 					  struct iova_bitmap *dirty_bitmap,
 					  u32 bmp_offset, u32 len_bytes)
 {
-	u64 page_size = pds_vfio->dirty.region_page_size;
-	u64 region_start = pds_vfio->dirty.region_start;
+	u64 page_size = region->page_size;
+	u64 region_start = region->start;
 	u32 bmp_offset_bit;
 	__le64 *seq, *ack;
 	int dword_count;

 	dword_count = len_bytes / sizeof(u64);
-	seq = (__le64 *)((u64)pds_vfio->dirty.host_seq.bmp + bmp_offset);
-	ack = (__le64 *)((u64)pds_vfio->dirty.host_ack.bmp + bmp_offset);
+	seq = (__le64 *)((u64)region->host_seq + bmp_offset);
+	ack = (__le64 *)((u64)region->host_ack + bmp_offset);
 	bmp_offset_bit = bmp_offset * 8;

 	for (int i = 0; i < dword_count; i++) {
@@ -451,12 +501,28 @@ static int pds_vfio_dirty_process_bitmaps(struct pds_vfio_pci_device *pds_vfio,
 	return 0;
 }

+static struct pds_vfio_region *
+pds_vfio_get_region(struct pds_vfio_pci_device *pds_vfio, unsigned long iova)
+{
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		if (iova >= region->start &&
+		    iova < (region->start + region->size))
+			return region;
+	}
+
+	return NULL;
+}
+
 static int pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio,
 			       struct iova_bitmap *dirty_bitmap,
 			       unsigned long iova, unsigned long length)
 {
 	struct device *dev = &pds_vfio->vfio_coredev.pdev->dev;
-	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	struct pds_vfio_region *region;
 	u64 bmp_offset, bmp_bytes;
 	u64 bitmap_size, pages;
 	int err;
@@ -469,26 +535,31 @@ static int pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio,
 		return -EINVAL;
 	}

-	pages = DIV_ROUND_UP(length, pds_vfio->dirty.region_page_size);
+	region = pds_vfio_get_region(pds_vfio, iova);
+	if (!region) {
+		dev_err(dev, "vf%u: Failed to find region that contains iova 0x%lx length 0x%lx\n",
+			pds_vfio->vf_id, iova, length);
+		return -EINVAL;
+	}
+
+	pages = DIV_ROUND_UP(length, region->page_size);
 	bitmap_size =
 		round_up(pages, sizeof(u64) * BITS_PER_BYTE) / BITS_PER_BYTE;

 	dev_dbg(dev,
 		"vf%u: iova 0x%lx length %lu page_size %llu pages %llu bitmap_size %llu\n",
-		pds_vfio->vf_id, iova, length, pds_vfio->dirty.region_page_size,
+		pds_vfio->vf_id, iova, length, region->page_size,
 		pages, bitmap_size);

-	if (!length || ((dirty->region_start + iova + length) >
-			(dirty->region_start + dirty->region_size))) {
+	if (!length || ((iova - region->start + length) > region->size)) {
 		dev_err(dev, "Invalid iova 0x%lx and/or length 0x%lx to sync\n",
 			iova, length);
 		return -EINVAL;
 	}

 	/* bitmap is modified in 64 bit chunks */
-	bmp_bytes = ALIGN(DIV_ROUND_UP(length / dirty->region_page_size,
-				       sizeof(u64)),
-			  sizeof(u64));
+	bmp_bytes = ALIGN(DIV_ROUND_UP(length / region->page_size,
+				       sizeof(u64)), sizeof(u64));
 	if (bmp_bytes != bitmap_size) {
 		dev_err(dev,
 			"Calculated bitmap bytes %llu not equal to bitmap size %llu\n",
@@ -496,22 +567,30 @@ static int pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio,
 		return -EINVAL;
 	}

-	bmp_offset = DIV_ROUND_UP(iova / dirty->region_page_size, sizeof(u64));
+	if (bmp_bytes > region->bmp_bytes) {
+		dev_err(dev,
+			"Calculated bitmap bytes %llu larger than region's cached bmp_bytes %llu\n",
+			bmp_bytes, region->bmp_bytes);
+		return -EINVAL;
+	}
+
+	bmp_offset = DIV_ROUND_UP((iova - region->start) /
+				  region->page_size, sizeof(u64));

 	dev_dbg(dev,
 		"Syncing dirty bitmap, iova 0x%lx length 0x%lx, bmp_offset %llu bmp_bytes %llu\n",
 		iova, length, bmp_offset, bmp_bytes);

-	err = pds_vfio_dirty_read_seq(pds_vfio, bmp_offset, bmp_bytes);
+	err = pds_vfio_dirty_read_seq(pds_vfio, region, bmp_offset, bmp_bytes);
 	if (err)
 		return err;

-	err = pds_vfio_dirty_process_bitmaps(pds_vfio, dirty_bitmap, bmp_offset,
-					     bmp_bytes);
+	err = pds_vfio_dirty_process_bitmaps(pds_vfio, region, dirty_bitmap,
+					     bmp_offset, bmp_bytes);
 	if (err)
 		return err;

-	err = pds_vfio_dirty_write_ack(pds_vfio, bmp_offset, bmp_bytes);
+	err = pds_vfio_dirty_write_ack(pds_vfio, region, bmp_offset, bmp_bytes);
 	if (err)
 		return err;

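
The pds changes above replace a single tracked range with an array of regions: a sync request first finds the region containing the IOVA, then computes a bitmap offset relative to that region's start. The following is a simplified user-space sketch of those two steps only. `struct region`, `find_region` and `bmp_offset` are hypothetical names, and the offset arithmetic simply mirrors the `DIV_ROUND_UP((iova - start) / page_size, sizeof(u64))` expression from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one tracked IOVA region. */
struct region {
	uint64_t start;		/* first IOVA covered by the region */
	uint64_t size;		/* length of the region in bytes */
	uint64_t page_size;	/* tracking granularity in bytes */
};

/* Return the region containing iova, or NULL if none does. */
static const struct region *find_region(const struct region *regions,
					size_t num_regions, uint64_t iova)
{
	for (size_t i = 0; i < num_regions; i++) {
		const struct region *r = &regions[i];

		if (iova >= r->start && iova - r->start < r->size)
			return r;
	}
	return NULL;
}

/*
 * Byte offset into the region's bitmap for iova, computed the same way
 * as in the diff: page index within the region, divided by 8 rounding up.
 */
static uint64_t bmp_offset(const struct region *r, uint64_t iova)
{
	uint64_t page = (iova - r->start) / r->page_size;

	return (page + sizeof(uint64_t) - 1) / sizeof(uint64_t);
}
```

Because the offset is relative to the region start, two regions separated by a large IOVA gap no longer need bitmap space for the gap, which is the memory saving the removed "Only support 1 region for now" comment anticipated.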

@@ -4,20 +4,22 @@
 #ifndef _DIRTY_H_
 #define _DIRTY_H_

-struct pds_vfio_bmp_info {
-	unsigned long *bmp;
-	u32 bmp_bytes;
+struct pds_vfio_region {
+	unsigned long *host_seq;
+	unsigned long *host_ack;
+	u64 bmp_bytes;
+	u64 size;
+	u64 start;
+	u64 page_size;
 	struct pds_lm_sg_elem *sgl;
 	dma_addr_t sgl_addr;
+	u32 dev_bmp_offset_start_byte;
 	u16 num_sge;
 };

 struct pds_vfio_dirty {
-	struct pds_vfio_bmp_info host_seq;
-	struct pds_vfio_bmp_info host_ack;
-	u64 region_size;
-	u64 region_start;
-	u64 region_page_size;
+	struct pds_vfio_region *regions;
+	u8 num_regions;
 	bool is_enabled;
 };


@@ -38,7 +38,7 @@
 #define vfio_iowrite8	iowrite8

 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,	\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev, \
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_core_iowrite##size);

 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif

 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev, \
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);

 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
@@ -119,13 +121,13 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 				if (copy_from_user(&val, buf, 4))
 					return -EFAULT;

-				ret = vfio_pci_iowrite32(vdev, test_mem,
-							 val, io + off);
+				ret = vfio_pci_core_iowrite32(vdev, test_mem,
+							      val, io + off);
 				if (ret)
 					return ret;
 			} else {
-				ret = vfio_pci_ioread32(vdev, test_mem,
-							&val, io + off);
+				ret = vfio_pci_core_ioread32(vdev, test_mem,
+							     &val, io + off);
 				if (ret)
 					return ret;

@@ -141,13 +143,13 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 				if (copy_from_user(&val, buf, 2))
 					return -EFAULT;

-				ret = vfio_pci_iowrite16(vdev, test_mem,
-							 val, io + off);
+				ret = vfio_pci_core_iowrite16(vdev, test_mem,
+							      val, io + off);
 				if (ret)
 					return ret;
 			} else {
-				ret = vfio_pci_ioread16(vdev, test_mem,
-							&val, io + off);
+				ret = vfio_pci_core_ioread16(vdev, test_mem,
+							     &val, io + off);
 				if (ret)
 					return ret;

@@ -163,13 +165,13 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 				if (copy_from_user(&val, buf, 1))
 					return -EFAULT;

-				ret = vfio_pci_iowrite8(vdev, test_mem,
-							val, io + off);
+				ret = vfio_pci_core_iowrite8(vdev, test_mem,
+							     val, io + off);
 				if (ret)
 					return ret;
 			} else {
-				ret = vfio_pci_ioread8(vdev, test_mem,
-						       &val, io + off);
+				ret = vfio_pci_core_ioread8(vdev, test_mem,
+							    &val, io + off);
 				if (ret)
 					return ret;

@@ -200,7 +202,7 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 	return done;
 }

-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int ret;
@@ -223,6 +225,7 @@ static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)

 	return 0;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);

 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
@@ -262,7 +265,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
@@ -363,21 +366,21 @@ static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd,
 {
 	switch (ioeventfd->count) {
 	case 1:
-		vfio_pci_iowrite8(ioeventfd->vdev, test_mem,
-				  ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite8(ioeventfd->vdev, test_mem,
+				       ioeventfd->data, ioeventfd->addr);
 		break;
 	case 2:
-		vfio_pci_iowrite16(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite16(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 	case 4:
-		vfio_pci_iowrite32(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite32(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 #ifdef iowrite64
 	case 8:
-		vfio_pci_iowrite64(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite64(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 #endif
 	}
@@ -438,7 +441,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif

-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;

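
The `VFIO_IOWRITE(size)` and `VFIO_IOREAD(size)` macros in this file use preprocessor token pasting (`##`) to stamp out one accessor per access width, and this commit renames and exports the generated functions. A small user-space sketch of the same pattern follows. `DEFINE_BUF_WRITE` and the `buf_write*` functions are hypothetical, `memcpy` stands in for the MMIO accessors, and there is no user-space counterpart to `EXPORT_SYMBOL_GPL`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Generate buf_write8/16/32 from one template. "buf_write##size" pastes
 * the width into the function name and "uint##size##_t" into the value
 * type, similar to how the kernel macro builds u##size.
 */
#define DEFINE_BUF_WRITE(size)						\
static int buf_write##size(void *dst, uint##size##_t val)		\
{									\
	memcpy(dst, &val, sizeof(val));					\
	return 0;							\
}

DEFINE_BUF_WRITE(8)
DEFINE_BUF_WRITE(16)
DEFINE_BUF_WRITE(32)
```

Each `DEFINE_BUF_WRITE(n)` line expands to a complete function definition, so callers use `buf_write16(ptr, value)` exactly as if it had been written by hand.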

@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+	tristate "VFIO support for VIRTIO NET PCI devices"
+	depends on VIRTIO_PCI && VIRTIO_PCI_ADMIN_LEGACY
+	select VFIO_PCI_CORE
+	help
+	  This provides support for exposing VIRTIO NET VF devices which support
+	  legacy IO access, using the VFIO framework that can work with a legacy
+	  virtio driver in the guest.
+	  Based on PCIe spec, VFs do not support I/O Space.
+	  As of that this driver emulates I/O BAR in software to let a VF be
+	  seen as a transitional device by its users and let it work with
+	  a legacy driver.
+
+	  If you don't know what to do here, say N.
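
The help text above says the driver emulates an I/O BAR in software. In the driver's legacy read/write helper later in this diff, an access position inside that BAR is classified as either common configuration or device-specific configuration, split at `VIRTIO_PCI_CONFIG_OFF(msix_enabled)`. The sketch below illustrates only that classification in user space. `LEGACY_CONFIG_OFF`, `struct legacy_access` and `legacy_classify` are hypothetical names, and the 20/24 byte boundary is an assumption taken from the legacy virtio PCI header layout (the header grows by 4 bytes when MSI-X is enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary of the legacy common header: 24 bytes with MSI-X, else 20. */
#define LEGACY_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20)

/* Result of classifying one access into the emulated I/O BAR. */
struct legacy_access {
	bool common;	/* true: common config, false: device config */
	uint8_t offset;	/* offset within the selected area */
};

static struct legacy_access legacy_classify(uint32_t pos, bool msix_enabled)
{
	struct legacy_access a;

	a.common = pos < (uint32_t)LEGACY_CONFIG_OFF(msix_enabled);
	/* offset within the relevant configuration area */
	a.offset = (uint8_t)(a.common ? pos
				      : pos - LEGACY_CONFIG_OFF(msix_enabled));
	return a;
}
```

Note that the same BAR position can land in different areas depending on whether MSI-X is enabled: position 20 is the first device-config byte without MSI-X but still part of the common header with it.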

@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o

@@ -0,0 +1,576 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_pci_admin.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	u8 *bar0_virtual_buf;
+	/* synchronize access to the virtual buf */
+	struct mutex bar_mutex;
+	void __iomem *notify_addr;
+	u64 notify_offset;
+	__le32 pci_base_addr_0;
+	__le16 pci_cmd;
+	u8 bar0_virtual_buf_size;
+	u8 notify_bar;
+};
+
+static int
+virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
+			     loff_t pos, char __user *buf,
+			     size_t count, bool read)
+{
+	bool msix_enabled =
+		(virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
+	bool common;
+	u8 offset;
+	int ret;
+
+	common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	/* offset within the relevant configuration area */
+	offset = common ? pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled);
+	mutex_lock(&virtvdev->bar_mutex);
+	if (read) {
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_read(pdev, offset,
+					count, bar0_buf + pos);
+		if (ret)
+			goto out;
+		if (copy_to_user(buf, bar0_buf + pos, count))
+			ret = -EFAULT;
+	} else {
+		if (copy_from_user(bar0_buf + pos, buf, count)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		if (common)
+			ret = virtio_pci_admin_legacy_common_io_write(pdev, offset,
+					count, bar0_buf + pos);
+		else
+			ret = virtio_pci_admin_legacy_device_io_write(pdev, offset,
+					count, bar0_buf + pos);
+	}
+out:
+	mutex_unlock(&virtvdev->bar_mutex);
+	return ret;
+}
+
+static int
+virtiovf_pci_bar0_rw(struct virtiovf_pci_core_device *virtvdev,
+		     loff_t pos, char __user *buf,
+		     size_t count, bool read)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	struct pci_dev *pdev = core_device->pdev;
u16 queue_notify;
int ret;
if (!(le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO))
return -EIO;
if (pos + count > virtvdev->bar0_virtual_buf_size)
return -EINVAL;
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret) {
pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
return -EIO;
}
switch (pos) {
case VIRTIO_PCI_QUEUE_NOTIFY:
if (count != sizeof(queue_notify)) {
ret = -EINVAL;
goto end;
}
if (read) {
ret = vfio_pci_core_ioread16(core_device, true, &queue_notify,
virtvdev->notify_addr);
if (ret)
goto end;
if (copy_to_user(buf, &queue_notify,
sizeof(queue_notify))) {
ret = -EFAULT;
goto end;
}
} else {
if (copy_from_user(&queue_notify, buf, count)) {
ret = -EFAULT;
goto end;
}
ret = vfio_pci_core_iowrite16(core_device, true, queue_notify,
virtvdev->notify_addr);
}
break;
default:
ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
read);
}
end:
pm_runtime_put(&pdev->dev);
return ret ? ret : count;
}
static bool range_intersect_range(loff_t range1_start, size_t count1,
loff_t range2_start, size_t count2,
loff_t *start_offset,
size_t *intersect_count,
size_t *register_offset)
{
if (range1_start <= range2_start &&
range1_start + count1 > range2_start) {
*start_offset = range2_start - range1_start;
*intersect_count = min_t(size_t, count2,
range1_start + count1 - range2_start);
*register_offset = 0;
return true;
}
if (range1_start > range2_start &&
range1_start < range2_start + count2) {
*start_offset = 0;
*intersect_count = min_t(size_t, count1,
range2_start + count2 - range1_start);
*register_offset = range1_start - range2_start;
return true;
}
return false;
}
static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
char __user *buf, size_t count,
loff_t *ppos)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
size_t register_offset;
loff_t copy_offset;
size_t copy_count;
__le32 val32;
__le16 val16;
u8 val8;
int ret;
ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
if (ret < 0)
return ret;
if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
&copy_offset, &copy_count, &register_offset)) {
val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, copy_count))
return -EFAULT;
}
if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
&copy_offset, &copy_count, &register_offset)) {
if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
copy_count))
return -EFAULT;
val16 |= cpu_to_le16(PCI_COMMAND_IO);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
copy_count))
return -EFAULT;
}
if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
&copy_offset, &copy_count, &register_offset)) {
/* A transitional device needs to have revision 0 */
val8 = 0;
if (copy_to_user(buf + copy_offset, &val8, copy_count))
return -EFAULT;
}
if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
&copy_offset, &copy_count, &register_offset)) {
u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);
val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) | PCI_BASE_ADDRESS_SPACE_IO);
if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count))
return -EFAULT;
}
if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
&copy_offset, &copy_count, &register_offset)) {
/*
* Transitional devices use the PCI subsystem device id as
* virtio device id, same as legacy driver always did.
*/
val16 = cpu_to_le16(VIRTIO_ID_NET);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
copy_count))
return -EFAULT;
}
if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16),
&copy_offset, &copy_count, &register_offset)) {
val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
copy_count))
return -EFAULT;
}
return count;
}
static ssize_t
virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
size_t count, loff_t *ppos)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
if (!count)
return 0;
if (index == VFIO_PCI_CONFIG_REGION_INDEX)
return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
if (index == VFIO_PCI_BAR0_REGION_INDEX)
return virtiovf_pci_bar0_rw(virtvdev, pos, buf, count, true);
return vfio_pci_core_read(core_vdev, buf, count, ppos);
}
static ssize_t virtiovf_pci_write_config(struct vfio_device *core_vdev,
const char __user *buf, size_t count,
loff_t *ppos)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
size_t register_offset;
loff_t copy_offset;
size_t copy_count;
if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
&copy_offset, &copy_count,
&register_offset)) {
if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
buf + copy_offset,
copy_count))
return -EFAULT;
}
if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
sizeof(virtvdev->pci_base_addr_0),
&copy_offset, &copy_count,
&register_offset)) {
if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset,
buf + copy_offset,
copy_count))
return -EFAULT;
}
return vfio_pci_core_write(core_vdev, buf, count, ppos);
}
static ssize_t
virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
if (!count)
return 0;
if (index == VFIO_PCI_CONFIG_REGION_INDEX)
return virtiovf_pci_write_config(core_vdev, buf, count, ppos);
if (index == VFIO_PCI_BAR0_REGION_INDEX)
return virtiovf_pci_bar0_rw(virtvdev, pos, (char __user *)buf, count, false);
return vfio_pci_core_write(core_vdev, buf, count, ppos);
}
static int
virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
unsigned int cmd, unsigned long arg)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
unsigned long minsz = offsetofend(struct vfio_region_info, offset);
void __user *uarg = (void __user *)arg;
struct vfio_region_info info = {};
if (copy_from_user(&info, uarg, minsz))
return -EFAULT;
if (info.argsz < minsz)
return -EINVAL;
switch (info.index) {
case VFIO_PCI_BAR0_REGION_INDEX:
info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
info.size = virtvdev->bar0_virtual_buf_size;
info.flags = VFIO_REGION_INFO_FLAG_READ |
VFIO_REGION_INFO_FLAG_WRITE;
return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
default:
return vfio_pci_core_ioctl(core_vdev, cmd, arg);
}
}
static long
virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
unsigned long arg)
{
switch (cmd) {
case VFIO_DEVICE_GET_REGION_INFO:
return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
default:
return vfio_pci_core_ioctl(core_vdev, cmd, arg);
}
}
static int
virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
{
struct vfio_pci_core_device *core_device = &virtvdev->core_device;
int ret;
/*
* Set up the BAR where the 'notify' area exists so that vfio can use it
* as well. This lets us mmap it only once and use it when needed.
*/
ret = vfio_pci_core_setup_barmap(core_device,
virtvdev->notify_bar);
if (ret)
return ret;
virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
virtvdev->notify_offset;
return 0;
}
static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
struct vfio_pci_core_device *vdev = &virtvdev->core_device;
int ret;
ret = vfio_pci_core_enable(vdev);
if (ret)
return ret;
if (virtvdev->bar0_virtual_buf) {
/*
* Upon close_device() the vfio_pci_core_disable() is called
* and will close all the previous mmaps, so it seems that the
* valid life cycle for the 'notify' addr is per open/close.
*/
ret = virtiovf_set_notify_addr(virtvdev);
if (ret) {
vfio_pci_core_disable(vdev);
return ret;
}
}
vfio_pci_core_finish_enable(vdev);
return 0;
}
static int virtiovf_get_device_config_size(unsigned short device)
{
/* Network card */
return offsetofend(struct virtio_net_config, status);
}
static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
{
u64 offset;
int ret;
u8 bar;
ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
&bar, &offset);
if (ret)
return ret;
virtvdev->notify_bar = bar;
virtvdev->notify_offset = offset;
return 0;
}
static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
struct pci_dev *pdev;
int ret;
ret = vfio_pci_core_init_dev(core_vdev);
if (ret)
return ret;
pdev = virtvdev->core_device.pdev;
ret = virtiovf_read_notify_info(virtvdev);
if (ret)
return ret;
virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
virtiovf_get_device_config_size(pdev->device);
BUILD_BUG_ON(!is_power_of_2(virtvdev->bar0_virtual_buf_size));
virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
GFP_KERNEL);
if (!virtvdev->bar0_virtual_buf)
return -ENOMEM;
mutex_init(&virtvdev->bar_mutex);
return 0;
}
static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
{
struct virtiovf_pci_core_device *virtvdev = container_of(
core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
kfree(virtvdev->bar0_virtual_buf);
vfio_pci_core_release_dev(core_vdev);
}
static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = {
.name = "virtio-vfio-pci-trans",
.init = virtiovf_pci_init_device,
.release = virtiovf_pci_core_release_dev,
.open_device = virtiovf_pci_open_device,
.close_device = vfio_pci_core_close_device,
.ioctl = virtiovf_vfio_pci_core_ioctl,
.device_feature = vfio_pci_core_ioctl_feature,
.read = virtiovf_pci_core_read,
.write = virtiovf_pci_core_write,
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
.name = "virtio-vfio-pci",
.init = vfio_pci_core_init_dev,
.release = vfio_pci_core_release_dev,
.open_device = virtiovf_pci_open_device,
.close_device = vfio_pci_core_close_device,
.ioctl = vfio_pci_core_ioctl,
.device_feature = vfio_pci_core_ioctl_feature,
.read = vfio_pci_core_read,
.write = vfio_pci_core_write,
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
static bool virtiovf_bar0_exists(struct pci_dev *pdev)
{
struct resource *res = pdev->resource;
return res->flags;
}
static int virtiovf_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
struct virtiovf_pci_core_device *virtvdev;
int ret;
if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
!virtiovf_bar0_exists(pdev))
ops = &virtiovf_vfio_pci_tran_ops;
virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
&pdev->dev, ops);
if (IS_ERR(virtvdev))
return PTR_ERR(virtvdev);
dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
ret = vfio_pci_core_register_device(&virtvdev->core_device);
if (ret)
goto out;
return 0;
out:
vfio_put_device(&virtvdev->core_device.vdev);
return ret;
}
static void virtiovf_pci_remove(struct pci_dev *pdev)
{
struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
vfio_pci_core_unregister_device(&virtvdev->core_device);
vfio_put_device(&virtvdev->core_device.vdev);
}
static const struct pci_device_id virtiovf_pci_table[] = {
/* Only virtio-net is supported/tested so far */
{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
{}
};
MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
static void virtiovf_pci_aer_reset_done(struct pci_dev *pdev)
{
struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
virtvdev->pci_cmd = 0;
}
static const struct pci_error_handlers virtiovf_err_handlers = {
.reset_done = virtiovf_pci_aer_reset_done,
.error_detected = vfio_pci_core_aer_err_detected,
};
static struct pci_driver virtiovf_pci_driver = {
.name = KBUILD_MODNAME,
.id_table = virtiovf_pci_table,
.probe = virtiovf_pci_probe,
.remove = virtiovf_pci_remove,
.err_handler = &virtiovf_err_handlers,
.driver_managed_dma = true,
};
module_pci_driver(virtiovf_pci_driver);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
MODULE_DESCRIPTION(
"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");

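The config-space handlers in the driver above rely on range_intersect_range() to work out which bytes of a user access overlap a given register. The user-space sketch below restates that computation with max/min bounds. It is not the kernel function: `range_overlap()` and its parameter names are hypothetical, and it treats zero-length ranges as non-overlapping, which is an assumption of this sketch. For non-empty ranges it should produce the same three outputs (offset into the user buffer, overlap length, offset into the register).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space restatement of the overlap computation.
 * req_* describes the user access, reg_* describes the register.
 */
bool range_overlap(long long req_start, size_t req_len,
		   long long reg_start, size_t reg_len,
		   long long *buf_off, size_t *n, size_t *reg_off)
{
	long long req_end = req_start + (long long)req_len;
	long long reg_end = reg_start + (long long)reg_len;
	long long lo, hi;

	assert(buf_off && n && reg_off);

	/* The overlap, if any, is [max(starts), min(ends)). */
	lo = req_start > reg_start ? req_start : reg_start;
	hi = req_end < reg_end ? req_end : reg_end;
	if (lo >= hi)
		return false;

	*buf_off = lo - req_start;		/* where the overlap sits in the user buffer */
	*n = (size_t)(hi - lo);			/* number of overlapping bytes */
	*reg_off = (size_t)(lo - reg_start);	/* where it sits inside the register */
	return true;
}
```

For example, a 4-byte read at config offset 0 overlaps a 2-byte register at offset 2 in the last two bytes of the user buffer: buffer offset 2, length 2, register offset 0.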

@@ -448,4 +448,18 @@ static inline void vfio_device_put_kvm(struct vfio_device *device)
}
#endif
#ifdef CONFIG_VFIO_DEBUGFS
void vfio_debugfs_create_root(void);
void vfio_debugfs_remove_root(void);
void vfio_device_debugfs_init(struct vfio_device *vdev);
void vfio_device_debugfs_exit(struct vfio_device *vdev);
#else
static inline void vfio_debugfs_create_root(void) { }
static inline void vfio_debugfs_remove_root(void) { }
static inline void vfio_device_debugfs_init(struct vfio_device *vdev) { }
static inline void vfio_device_debugfs_exit(struct vfio_device *vdev) { }
#endif /* CONFIG_VFIO_DEBUGFS */
#endif


@@ -1436,7 +1436,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
list_for_each_entry(d, &iommu->domain_list, next) {
ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
npage << PAGE_SHIFT, prot | IOMMU_CACHE,
-GFP_KERNEL);
+GFP_KERNEL_ACCOUNT);
if (ret)
goto unwind;
@@ -1750,7 +1750,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
}
ret = iommu_map(domain->domain, iova, phys, size,
-dma->prot | IOMMU_CACHE, GFP_KERNEL);
+dma->prot | IOMMU_CACHE,
+GFP_KERNEL_ACCOUNT);
if (ret) {
if (!dma->iommu_mapped) {
vfio_unpin_pages_remote(dma, iova,
@@ -1845,7 +1846,8 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *
continue;
ret = iommu_map(domain->domain, start, page_to_phys(pages), PAGE_SIZE * 2,
-IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE, GFP_KERNEL);
+IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE,
+GFP_KERNEL_ACCOUNT);
if (!ret) {
size_t unmapped = iommu_unmap(domain->domain, start, PAGE_SIZE);


@@ -311,6 +311,7 @@ static int __vfio_register_dev(struct vfio_device *device,
refcount_set(&device->refcount, 1);
vfio_device_group_register(device);
vfio_device_debugfs_init(device);
return 0;
err_out:
@@ -378,6 +379,7 @@ void vfio_unregister_group_dev(struct vfio_device *device)
}
}
vfio_device_debugfs_exit(device);
/* Balances vfio_device_set_group in register path */
vfio_device_remove_group(device);
}
@@ -1676,6 +1678,7 @@ static int __init vfio_init(void)
if (ret)
goto err_alloc_dev_chrdev;
vfio_debugfs_create_root();
pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
return 0;
@@ -1691,6 +1694,7 @@ static int __init vfio_init(void)
static void __exit vfio_cleanup(void)
{
vfio_debugfs_remove_root();
ida_destroy(&vfio.device_ida);
vfio_cdev_cleanup();
class_destroy(vfio.device_class);


@@ -60,6 +60,11 @@ config VIRTIO_PCI
If unsure, say M.
config VIRTIO_PCI_ADMIN_LEGACY
bool
depends on VIRTIO_PCI && (X86 || COMPILE_TEST)
default y
config VIRTIO_PCI_LEGACY
bool "Support for legacy virtio draft 0.9.X and older devices"
default y


@@ -7,6 +7,7 @@ obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
virtio_pci-$(CONFIG_VIRTIO_PCI_ADMIN_LEGACY) += virtio_pci_admin_legacy_io.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o


@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
if (err)
goto err;
if (dev->config->create_avq) {
err = dev->config->create_avq(dev);
if (err)
goto err;
}
err = drv->probe(dev);
if (err)
-goto err;
+goto err_probe;
/* If probe didn't do it, mark device DRIVER_OK ourselves. */
if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
virtio_config_enable(dev);
return 0;
err_probe:
if (dev->config->destroy_avq)
dev->config->destroy_avq(dev);
err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
drv->remove(dev);
if (dev->config->destroy_avq)
dev->config->destroy_avq(dev);
/* Driver should have reset device. */
WARN_ON_ONCE(dev->config->get_status(dev));
@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
int virtio_device_freeze(struct virtio_device *dev)
{
struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
int ret;
virtio_config_disable(dev);
dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
-if (drv && drv->freeze)
-return drv->freeze(dev);
+if (drv && drv->freeze) {
+ret = drv->freeze(dev);
+if (ret)
+return ret;
+}
+if (dev->config->destroy_avq)
+dev->config->destroy_avq(dev);
return 0;
}
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
if (ret)
goto err;
if (dev->config->create_avq) {
ret = dev->config->create_avq(dev);
if (ret)
goto err;
}
if (drv->restore) {
ret = drv->restore(dev);
if (ret)
-goto err;
+goto err_restore;
}
/* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
return 0;
err_restore:
if (dev->config->destroy_avq)
dev->config->destroy_avq(dev);
err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return ret;

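The probe and restore hunks above create the admin virtqueue before calling into the driver, and tear it down again if that call fails, falling through to the common failure path. The ordering can be sketched in plain C as below. `struct sketch_dev`, `sketch_probe()` and the integer "result" parameters are hypothetical stand-ins used to simulate failures; they are not virtio core APIs.

```c
#include <stdbool.h>

/* Hypothetical device state tracked by this sketch. */
struct sketch_dev {
	bool avq_live;	/* admin queue currently exists */
	bool failed;	/* FAILED status has been recorded */
};

/*
 * create_result and probe_result simulate the return values of the
 * create_avq() and drv->probe() steps (0 on success, negative on error).
 */
int sketch_probe(struct sketch_dev *d, int create_result, int probe_result)
{
	int err;

	err = create_result;		/* stands in for create_avq() */
	if (err)
		goto err;
	d->avq_live = true;

	err = probe_result;		/* stands in for drv->probe() */
	if (err)
		goto err_probe;
	return 0;

err_probe:
	d->avq_live = false;		/* stands in for destroy_avq() */
err:
	d->failed = true;		/* stands in for setting the FAILED status */
	return err;
}
```

The two labels mirror the shape of the change: a failure after the queue exists undoes the queue first and then shares the existing failure handling, while a failure to create the queue skips the teardown.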

@@ -0,0 +1,244 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
*/
#include <linux/virtio_pci_admin.h>
#include "virtio_pci_common.h"
/*
* virtio_pci_admin_has_legacy_io - Checks whether the legacy IO
* commands are supported
* @pdev: VF pci_dev
*
* Returns true if all legacy IO admin commands are supported, false otherwise.
*/
bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
{
struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
struct virtio_pci_device *vp_dev;
if (!virtio_dev)
return false;
if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ))
return false;
vp_dev = to_vp_device(virtio_dev);
if ((vp_dev->admin_vq.supported_cmds & VIRTIO_LEGACY_ADMIN_CMD_BITMAP) ==
VIRTIO_LEGACY_ADMIN_CMD_BITMAP)
return true;
return false;
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);
static int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
u8 offset, u8 size, u8 *buf)
{
struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
struct virtio_admin_cmd_legacy_wr_data *data;
struct virtio_admin_cmd cmd = {};
struct scatterlist data_sg;
int vf_id;
int ret;
if (!virtio_dev)
return -ENODEV;
vf_id = pci_iov_vf_id(pdev);
if (vf_id < 0)
return vf_id;
data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
if (!data)
return -ENOMEM;
data->offset = offset;
memcpy(data->registers, buf, size);
sg_init_one(&data_sg, data, sizeof(*data) + size);
cmd.opcode = cpu_to_le16(opcode);
cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
cmd.group_member_id = cpu_to_le64(vf_id + 1);
cmd.data_sg = &data_sg;
ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
kfree(data);
return ret;
}
/*
* virtio_pci_admin_legacy_common_io_write - Write legacy common configuration
* of a member device
* @pdev: VF pci_dev
* @offset: starting byte offset within the common configuration area to write to
* @size: size of the data to write
* @buf: buffer which holds the data
*
* Note: caller must serialize access for the given device.
* Returns 0 on success, or negative on failure.
*/
int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf)
{
return virtio_pci_admin_legacy_io_write(pdev,
VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_write);
/*
* virtio_pci_admin_legacy_device_io_write - Write legacy device configuration
* of a member device
* @pdev: VF pci_dev
* @offset: starting byte offset within the device configuration area to write to
* @size: size of the data to write
* @buf: buffer which holds the data
*
* Note: caller must serialize access for the given device.
* Returns 0 on success, or negative on failure.
*/
int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf)
{
return virtio_pci_admin_legacy_io_write(pdev,
VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_write);
static int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
u8 offset, u8 size, u8 *buf)
{
struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
struct virtio_admin_cmd_legacy_rd_data *data;
struct scatterlist data_sg, result_sg;
struct virtio_admin_cmd cmd = {};
int vf_id;
int ret;
if (!virtio_dev)
return -ENODEV;
vf_id = pci_iov_vf_id(pdev);
if (vf_id < 0)
return vf_id;
data = kzalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
data->offset = offset;
sg_init_one(&data_sg, data, sizeof(*data));
sg_init_one(&result_sg, buf, size);
cmd.opcode = cpu_to_le16(opcode);
cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
cmd.group_member_id = cpu_to_le64(vf_id + 1);
cmd.data_sg = &data_sg;
cmd.result_sg = &result_sg;
ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
kfree(data);
return ret;
}
/*
* virtio_pci_admin_legacy_device_io_read - Read legacy device configuration of
* a member device
* @pdev: VF pci_dev
* @offset: starting byte offset within the device configuration area to read from
* @size: size of the data to be read
* @buf: buffer to hold the returned data
*
* Note: caller must serialize access for the given device.
* Returns 0 on success, or negative on failure.
*/
int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf)
{
return virtio_pci_admin_legacy_io_read(pdev,
VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ,
offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_read);
/*
* virtio_pci_admin_legacy_common_io_read - Read legacy common configuration of
* a member device
* @dev: VF pci_dev
* @offset: starting byte offset within the common configuration area to read from
* @size: size of the data to be read
* @buf: buffer to hold the returned data
*
* Note: caller must serialize access for the given device.
* Returns 0 on success, or negative on failure.
*/
int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf)
{
return virtio_pci_admin_legacy_io_read(pdev,
VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ,
offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_read);
/*
* virtio_pci_admin_legacy_io_notify_info - Read the queue notification
* information for legacy interface
* @pdev: VF pci_dev
* @req_bar_flags: requested bar flags
* @bar: on output the BAR number of the owner or member device
* @bar_offset: on output the offset within bar
*
* Returns 0 on success, or negative on failure.
*/
int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
u8 req_bar_flags, u8 *bar,
u64 *bar_offset)
{
struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
struct virtio_admin_cmd_notify_info_result *result;
struct virtio_admin_cmd cmd = {};
struct scatterlist result_sg;
int vf_id;
int ret;
if (!virtio_dev)
return -ENODEV;
vf_id = pci_iov_vf_id(pdev);
if (vf_id < 0)
return vf_id;
result = kzalloc(sizeof(*result), GFP_KERNEL);
if (!result)
return -ENOMEM;
sg_init_one(&result_sg, result, sizeof(*result));
cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
cmd.group_member_id = cpu_to_le64(vf_id + 1);
cmd.result_sg = &result_sg;
ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
if (!ret) {
struct virtio_admin_cmd_notify_info_data *entry;
int i;
ret = -ENOENT;
for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
entry = &result->entries[i];
if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
break;
if (entry->flags != req_bar_flags)
continue;
*bar = entry->bar;
*bar_offset = le64_to_cpu(entry->offset);
ret = 0;
break;
}
}
kfree(result);
return ret;
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);

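virtio_pci_admin_has_legacy_io() in the file above reports support only when every legacy command bit is present in the supported-commands bitmap, not merely some of them. A minimal user-space sketch of that "all bits set" test follows. The opcode numbers, `SKETCH_BIT64()`, `SKETCH_LEGACY_CMDS` and `has_all_legacy_cmds()` are assumptions made for illustration and should be checked against the virtio UAPI header before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed opcode numbers for this sketch only. */
enum {
	SKETCH_OP_COMMON_CFG_WRITE = 2,
	SKETCH_OP_COMMON_CFG_READ  = 3,
	SKETCH_OP_DEV_CFG_WRITE    = 4,
	SKETCH_OP_DEV_CFG_READ     = 5,
	SKETCH_OP_NOTIFY_INFO      = 6,
};

#define SKETCH_BIT64(n) (1ULL << (n))

/* One bit per required legacy command. */
#define SKETCH_LEGACY_CMDS \
	(SKETCH_BIT64(SKETCH_OP_COMMON_CFG_WRITE) | \
	 SKETCH_BIT64(SKETCH_OP_COMMON_CFG_READ) | \
	 SKETCH_BIT64(SKETCH_OP_DEV_CFG_WRITE) | \
	 SKETCH_BIT64(SKETCH_OP_DEV_CFG_READ) | \
	 SKETCH_BIT64(SKETCH_OP_NOTIFY_INFO))

/* True only if every required bit is set in the supported bitmap. */
bool has_all_legacy_cmds(uint64_t supported)
{
	return (supported & SKETCH_LEGACY_CMDS) == SKETCH_LEGACY_CMDS;
}
```

Comparing the masked value against the full mask, rather than testing it for non-zero, is what makes a device that lacks even one of the five commands fall back to the non-transitional ops.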

@@ -236,6 +236,9 @@ void vp_del_vqs(struct virtio_device *vdev)
int i;
list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
if (vp_dev->is_avq(vdev, vq->index))
continue;
if (vp_dev->per_vq_vectors) {
int v = vp_dev->vqs[vq->index]->msix_vector;
@@ -642,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
.sriov_configure = virtio_pci_sriov_configure,
};
struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
{
struct virtio_pci_device *pf_vp_dev;
pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
if (IS_ERR(pf_vp_dev))
return NULL;
return &pf_vp_dev->vdev;
}
module_pci_driver(virtio_pci_driver);
MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");


@@ -29,6 +29,7 @@
#include <linux/virtio_pci_modern.h>
#include <linux/highmem.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
struct virtio_pci_vq_info {
/* the actual virtqueue */
@@ -41,6 +42,17 @@ struct virtio_pci_vq_info {
unsigned int msix_vector;
};
struct virtio_pci_admin_vq {
/* Virtqueue info associated with this admin queue. */
struct virtio_pci_vq_info info;
/* serializing admin commands execution and virtqueue deletion */
struct mutex cmd_lock;
u64 supported_cmds;
/* Name of the admin queue: avq.$vq_index. */
char name[10];
u16 vq_index;
};
/* Our device structure */
struct virtio_pci_device {
struct virtio_device vdev;
@@ -58,9 +70,13 @@ struct virtio_pci_device {
spinlock_t lock;
struct list_head virtqueues;
-/* array of all queues for house-keeping */
+/* Array of all virtqueues reported in the
+* PCI common config num_queues field
+*/
struct virtio_pci_vq_info **vqs;
struct virtio_pci_admin_vq admin_vq;
/* MSI-X support */
int msix_enabled;
int intx_enabled;
@@ -86,6 +102,7 @@ struct virtio_pci_device {
void (*del_vq)(struct virtio_pci_vq_info *info);
u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector);
bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
};
/* Constants for MSI-X */
@@ -139,4 +156,27 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
int virtio_pci_modern_probe(struct virtio_pci_device *);
void virtio_pci_modern_remove(struct virtio_pci_device *);
struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
(BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
/* Unlike modern drivers which support hardware virtio devices, legacy drivers
* assume software-based devices: e.g. they don't use proper memory barriers
* on ARM, use big endian on PPC, etc. X86 drivers are mostly ok though, more
* or less by chance. For now, only support legacy IO on X86.
*/
#ifdef CONFIG_VIRTIO_PCI_ADMIN_LEGACY
#define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
#else
#define VIRTIO_ADMIN_CMD_BITMAP 0
#endif
int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
struct virtio_admin_cmd *cmd);
#endif


@@ -19,6 +19,8 @@
#define VIRTIO_RING_NO_LEGACY
#include "virtio_pci_common.h"
#define VIRTIO_AVQ_SGS_MAX 4
static u64 vp_get_features(struct virtio_device *vdev)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -26,6 +28,187 @@ static u64 vp_get_features(struct virtio_device *vdev)
return vp_modern_get_features(&vp_dev->mdev);
}
static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return false;
return index == vp_dev->admin_vq.vq_index;
}
static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
u16 opcode,
struct scatterlist **sgs,
unsigned int out_num,
unsigned int in_num,
void *data)
{
struct virtqueue *vq;
int ret, len;
vq = admin_vq->info.vq;
if (!vq)
return -EIO;
if (opcode != VIRTIO_ADMIN_CMD_LIST_QUERY &&
opcode != VIRTIO_ADMIN_CMD_LIST_USE &&
!((1ULL << opcode) & admin_vq->supported_cmds))
return -EOPNOTSUPP;
ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, GFP_KERNEL);
if (ret < 0)
return -EIO;
if (unlikely(!virtqueue_kick(vq)))
return -EIO;
while (!virtqueue_get_buf(vq, &len) &&
!virtqueue_is_broken(vq))
cpu_relax();
if (virtqueue_is_broken(vq))
return -EIO;
return 0;
}
int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
struct virtio_admin_cmd *cmd)
{
struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
struct virtio_admin_cmd_status *va_status;
unsigned int out_num = 0, in_num = 0;
struct virtio_admin_cmd_hdr *va_hdr;
u16 status;
int ret;
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return -EOPNOTSUPP;
va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
if (!va_status)
return -ENOMEM;
va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
if (!va_hdr) {
ret = -ENOMEM;
goto err_alloc;
}
va_hdr->opcode = cmd->opcode;
va_hdr->group_type = cmd->group_type;
va_hdr->group_member_id = cmd->group_member_id;
/* Add header */
sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
sgs[out_num] = &hdr;
out_num++;
if (cmd->data_sg) {
sgs[out_num] = cmd->data_sg;
out_num++;
}
/* Add return status */
sg_init_one(&stat, va_status, sizeof(*va_status));
sgs[out_num + in_num] = &stat;
in_num++;
if (cmd->result_sg) {
sgs[out_num + in_num] = cmd->result_sg;
in_num++;
}
mutex_lock(&vp_dev->admin_vq.cmd_lock);
ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq,
le16_to_cpu(cmd->opcode),
sgs, out_num, in_num, sgs);
mutex_unlock(&vp_dev->admin_vq.cmd_lock);
if (ret) {
dev_err(&vdev->dev,
"Failed to execute command on admin vq: %d\n", ret);
goto err_cmd_exec;
}
status = le16_to_cpu(va_status->status);
if (status != VIRTIO_ADMIN_STATUS_OK) {
dev_err(&vdev->dev,
"admin command error: status(%#x) qualifier(%#x)\n",
status, le16_to_cpu(va_status->status_qualifier));
ret = -status;
}
err_cmd_exec:
kfree(va_hdr);
err_alloc:
kfree(va_status);
return ret;
}
static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
{
struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
struct virtio_admin_cmd cmd = {};
struct scatterlist result_sg;
struct scatterlist data_sg;
__le64 *data;
int ret;
data = kzalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return;
sg_init_one(&result_sg, data, sizeof(*data));
cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
cmd.result_sg = &result_sg;
ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
if (ret)
goto end;
*data &= cpu_to_le64(VIRTIO_ADMIN_CMD_BITMAP);
sg_init_one(&data_sg, data, sizeof(*data));
cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
cmd.data_sg = &data_sg;
cmd.result_sg = NULL;
ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
if (ret)
goto end;
vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
end:
kfree(data);
}
static void vp_modern_avq_activate(struct virtio_device *vdev)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return;
__virtqueue_unbreak(admin_vq->info.vq);
virtio_pci_admin_cmd_list_init(vdev);
}
static void vp_modern_avq_deactivate(struct virtio_device *vdev)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return;
__virtqueue_break(admin_vq->info.vq);
}
static void vp_transport_features(struct virtio_device *vdev, u64 features)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@ -37,6 +220,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
if (features & BIT_ULL(VIRTIO_F_RING_RESET))
__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
}
static int __vp_check_common_size_one_feature(struct virtio_device *vdev, u32 fbit,
@ -69,6 +255,9 @@ static int vp_check_common_size(struct virtio_device *vdev)
if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, queue_reset))
return -EINVAL;
if (vp_check_common_size_one_feature(vdev, VIRTIO_F_ADMIN_VQ, admin_queue_num))
return -EINVAL;
return 0;
}
@ -195,6 +384,8 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
/* We should never be setting status to 0. */
BUG_ON(status == 0);
vp_modern_set_status(&vp_dev->mdev, status);
if (status & VIRTIO_CONFIG_S_DRIVER_OK)
vp_modern_avq_activate(vdev);
}
static void vp_reset(struct virtio_device *vdev)
@ -211,6 +402,9 @@ static void vp_reset(struct virtio_device *vdev)
*/
while (vp_modern_get_status(mdev))
msleep(1);
vp_modern_avq_deactivate(vdev);
/* Flush pending VQ/configuration callbacks. */
vp_synchronize_vectors(vdev);
}
@ -345,6 +539,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
bool (*notify)(struct virtqueue *vq);
struct virtqueue *vq;
bool is_avq;
u16 num;
int err;
@ -353,11 +548,13 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
else
notify = vp_notify;
-if (index >= vp_modern_get_num_queues(mdev))
+is_avq = vp_is_avq(&vp_dev->vdev, index);
+if (index >= vp_modern_get_num_queues(mdev) && !is_avq)
return ERR_PTR(-EINVAL);
+num = is_avq ?
+VIRTIO_AVQ_SGS_MAX : vp_modern_get_queue_size(mdev, index);
/* Check if queue is either not available or already active. */
-num = vp_modern_get_queue_size(mdev, index);
if (!num || vp_modern_get_queue_enable(mdev, index))
return ERR_PTR(-ENOENT);
@ -383,6 +580,12 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
goto err;
}
if (is_avq) {
mutex_lock(&vp_dev->admin_vq.cmd_lock);
vp_dev->admin_vq.info.vq = vq;
mutex_unlock(&vp_dev->admin_vq.cmd_lock);
}
return vq;
err:
@ -418,6 +621,12 @@ static void del_vq(struct virtio_pci_vq_info *info)
struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
if (vp_is_avq(&vp_dev->vdev, vq->index)) {
mutex_lock(&vp_dev->admin_vq.cmd_lock);
vp_dev->admin_vq.info.vq = NULL;
mutex_unlock(&vp_dev->admin_vq.cmd_lock);
}
if (vp_dev->msix_enabled)
vp_modern_queue_vector(mdev, vq->index,
VIRTIO_MSI_NO_VECTOR);
@ -527,6 +736,45 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
return true;
}
static int vp_modern_create_avq(struct virtio_device *vdev)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
struct virtio_pci_admin_vq *avq;
struct virtqueue *vq;
u16 admin_q_num;
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return 0;
admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
if (!admin_q_num)
return -EINVAL;
avq = &vp_dev->admin_vq;
avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
sprintf(avq->name, "avq.%u", avq->vq_index);
vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin_vq.info, avq->vq_index, NULL,
avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
if (IS_ERR(vq)) {
dev_err(&vdev->dev, "failed to setup admin virtqueue, err=%ld\n",
PTR_ERR(vq));
return PTR_ERR(vq);
}
vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
return 0;
}
static void vp_modern_destroy_avq(struct virtio_device *vdev)
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
return;
vp_dev->del_vq(&vp_dev->admin_vq.info);
}
static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
.get = NULL,
.set = NULL,
@ -545,6 +793,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
.get_shm_region = vp_get_shm_region,
.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
.create_avq = vp_modern_create_avq,
.destroy_avq = vp_modern_destroy_avq,
};
static const struct virtio_config_ops virtio_pci_config_ops = {
@ -565,6 +815,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
.get_shm_region = vp_get_shm_region,
.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
.create_avq = vp_modern_create_avq,
.destroy_avq = vp_modern_destroy_avq,
};
/* the PCI probing function */
@ -588,9 +840,11 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
vp_dev->config_vector = vp_config_vector;
vp_dev->setup_vq = setup_vq;
vp_dev->del_vq = del_vq;
vp_dev->is_avq = vp_is_avq;
vp_dev->isr = mdev->isr;
vp_dev->vdev.id = mdev->id;
mutex_init(&vp_dev->admin_vq.cmd_lock);
return 0;
}
@ -598,5 +852,6 @@ void virtio_pci_modern_remove(struct virtio_pci_device *vp_dev)
{
struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
mutex_destroy(&vp_dev->admin_vq.cmd_lock);
vp_modern_remove(mdev);
}
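
The buffer ordering that vp_modern_admin_cmd_exec() sets up in the hunk above can be mirrored in a small userspace sketch. Device-readable entries come first (command header, then optional command data), followed by device-writable entries (status, then optional result). The types `fake_sg` and `sg_layout` and the function `build_admin_sgs` are hypothetical stand-ins for illustration, not kernel APIs; only the ordering and the maximum of four entries are taken from the diff.

```c
#include <stddef.h>

#define AVQ_SGS_MAX 4 /* same value as VIRTIO_AVQ_SGS_MAX in the diff */

/* Hypothetical stand-in for struct scatterlist: only a label. */
struct fake_sg {
	const char *label;
};

/* Hypothetical container for the ordered entries and their counts. */
struct sg_layout {
	const struct fake_sg *sgs[AVQ_SGS_MAX];
	unsigned int out_num; /* entries the device reads */
	unsigned int in_num;  /* entries the device writes */
};

/* data and result may be NULL, like cmd->data_sg and cmd->result_sg. */
static struct sg_layout build_admin_sgs(const struct fake_sg *hdr,
					const struct fake_sg *data,
					const struct fake_sg *status,
					const struct fake_sg *result)
{
	struct sg_layout l = { .out_num = 0, .in_num = 0 };

	/* Device-readable part: header first, then optional command data. */
	l.sgs[l.out_num++] = hdr;
	if (data)
		l.sgs[l.out_num++] = data;

	/* Device-writable part: status first, then optional result buffer. */
	l.sgs[l.out_num + l.in_num] = status;
	l.in_num++;
	if (result) {
		l.sgs[l.out_num + l.in_num] = result;
		l.in_num++;
	}
	return l;
}
```

With a header, no command data, a status buffer and a result buffer (the shape of the LIST_QUERY call in the diff), this yields one readable entry followed by two writable ones.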

@ -207,6 +207,10 @@ static inline void check_offsets(void)
offsetof(struct virtio_pci_modern_common_cfg, queue_notify_data));
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_RESET !=
offsetof(struct virtio_pci_modern_common_cfg, queue_reset));
BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_IDX !=
offsetof(struct virtio_pci_modern_common_cfg, admin_queue_index));
BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_NUM !=
offsetof(struct virtio_pci_modern_common_cfg, admin_queue_num));
}
/*
@ -296,7 +300,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
mdev->common = vp_modern_map_capability(mdev, common,
sizeof(struct virtio_pci_common_cfg), 4, 0,
offsetofend(struct virtio_pci_modern_common_cfg,
-queue_reset),
+admin_queue_num),
&mdev->common_len, NULL);
if (!mdev->common)
goto err_map_common;
@ -719,6 +723,24 @@ void __iomem *vp_modern_map_vq_notify(struct virtio_pci_modern_device *mdev,
}
EXPORT_SYMBOL_GPL(vp_modern_map_vq_notify);
u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
{
struct virtio_pci_modern_common_cfg __iomem *cfg;
cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
return vp_ioread16(&cfg->admin_queue_num);
}
EXPORT_SYMBOL_GPL(vp_modern_avq_num);
u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
{
struct virtio_pci_modern_common_cfg __iomem *cfg;
cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
return vp_ioread16(&cfg->admin_queue_index);
}
EXPORT_SYMBOL_GPL(vp_modern_avq_index);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Modern Virtio PCI Device");
MODULE_AUTHOR("Jason Wang <jasowang@redhat.com>");

@ -69,6 +69,13 @@ struct vfio_device {
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
#ifdef CONFIG_DEBUG_FS
/*
* debug_root is a static property of the vfio_device
* which must be set prior to registering the vfio_device.
*/
struct dentry *debug_root;
#endif
};
/**

@ -127,7 +127,27 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
#define VFIO_IOWRITE_DECLATION(size) \
int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev, \
bool test_mem, u##size val, void __iomem *io);
VFIO_IOWRITE_DECLATION(8)
VFIO_IOWRITE_DECLATION(16)
VFIO_IOWRITE_DECLATION(32)
#ifdef iowrite64
VFIO_IOWRITE_DECLATION(64)
#endif
#define VFIO_IOREAD_DECLATION(size) \
int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev, \
bool test_mem, u##size *val, void __iomem *io);
VFIO_IOREAD_DECLATION(8)
VFIO_IOREAD_DECLATION(16)
VFIO_IOREAD_DECLATION(32)
#endif /* VFIO_PCI_CORE_H */

@ -103,6 +103,14 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
int virtqueue_reset(struct virtqueue *vq,
void (*recycle)(struct virtqueue *vq, void *buf));
struct virtio_admin_cmd {
__le16 opcode;
__le16 group_type;
__le64 group_member_id;
struct scatterlist *data_sg;
struct scatterlist *result_sg;
};
/**
* struct virtio_device - representation of a device using virtio
* @index: unique position on the virtio bus

@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
* Returns 0 on success or error status
* If disable_vq_and_reset is set, then enable_vq_after_reset must also be
* set.
* @create_avq: create admin virtqueue resource.
* @destroy_avq: destroy admin virtqueue resource.
*/
struct virtio_config_ops {
void (*get)(struct virtio_device *vdev, unsigned offset,
@ -120,6 +122,8 @@ struct virtio_config_ops {
struct virtio_shm_region *region, u8 id);
int (*disable_vq_and_reset)(struct virtqueue *vq);
int (*enable_vq_after_reset)(struct virtqueue *vq);
int (*create_avq)(struct virtio_device *vdev);
void (*destroy_avq)(struct virtio_device *vdev);
};
/* If driver didn't advertise the feature, it will never appear. */

@ -0,0 +1,23 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
#define _LINUX_VIRTIO_PCI_ADMIN_H
#include <linux/types.h>
#include <linux/pci.h>
#ifdef CONFIG_VIRTIO_PCI_ADMIN_LEGACY
bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev);
int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf);
int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf);
int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf);
int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
u8 size, u8 *buf);
int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
u8 req_bar_flags, u8 *bar,
u64 *bar_offset);
#endif
#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */

@ -125,4 +125,6 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev);
void vp_modern_remove(struct virtio_pci_modern_device *mdev);
int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev);
u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev);
#endif

@ -1219,6 +1219,7 @@ enum vfio_device_mig_state {
VFIO_DEVICE_STATE_RUNNING_P2P = 5,
VFIO_DEVICE_STATE_PRE_COPY = 6,
VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
VFIO_DEVICE_STATE_NR,
};
/**

@ -52,7 +52,7 @@
* rest are per-device feature bits.
*/
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 41
+#define VIRTIO_TRANSPORT_F_END 42
#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@ -114,4 +114,10 @@
* This feature indicates that the driver can reset a queue individually.
*/
#define VIRTIO_F_RING_RESET 40
/*
* This feature indicates that the device supports administration virtqueues.
*/
#define VIRTIO_F_ADMIN_VQ 41
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
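
The bump of VIRTIO_TRANSPORT_F_END from 41 to 42 goes together with the new bit 41. A minimal userspace sketch follows, assuming the end of the transport range is exclusive (which is what the change suggests, but is not shown in this diff). The helper names `has_feature` and `is_transport_feature` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hunk above. */
#define TRANSPORT_F_START 28
#define TRANSPORT_F_END   42 /* assumed exclusive upper bound */
#define F_ADMIN_VQ        41

/* Hypothetical helper: test one bit in a 64-bit feature mask. */
static bool has_feature(uint64_t features, unsigned int fbit)
{
	return fbit < 64 && ((features >> fbit) & 1);
}

/* Hypothetical helper: is the bit inside the transport feature range? */
static bool is_transport_feature(unsigned int fbit)
{
	return fbit >= TRANSPORT_F_START && fbit < TRANSPORT_F_END;
}
```

Under that assumption, bit 41 would fall outside the range with the old end value of 41 and inside it with the new value of 42.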

@ -175,6 +175,9 @@ struct virtio_pci_modern_common_cfg {
__le16 queue_notify_data; /* read-write */
__le16 queue_reset; /* read-write */
__le16 admin_queue_index; /* read-only */
__le16 admin_queue_num; /* read-only */
};
/* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
@ -215,7 +218,72 @@ struct virtio_pci_cfg_cap {
#define VIRTIO_PCI_COMMON_Q_USEDHI 52
#define VIRTIO_PCI_COMMON_Q_NDATA 56
#define VIRTIO_PCI_COMMON_Q_RESET 58
#define VIRTIO_PCI_COMMON_ADM_Q_IDX 60
#define VIRTIO_PCI_COMMON_ADM_Q_NUM 62
#endif /* VIRTIO_PCI_NO_MODERN */
/* Admin command status. */
#define VIRTIO_ADMIN_STATUS_OK 0
/* Admin command opcode. */
#define VIRTIO_ADMIN_CMD_LIST_QUERY 0x0
#define VIRTIO_ADMIN_CMD_LIST_USE 0x1
/* Admin command group type. */
#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV 0x1
/* Transitional device admin command. */
#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE 0x2
#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ 0x3
#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE 0x4
#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ 0x5
#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO 0x6
struct __packed virtio_admin_cmd_hdr {
__le16 opcode;
/*
* 1 - SR-IOV
* 2-65535 - reserved
*/
__le16 group_type;
/* Unused, reserved for future extensions. */
__u8 reserved1[12];
__le64 group_member_id;
};
struct __packed virtio_admin_cmd_status {
__le16 status;
__le16 status_qualifier;
/* Unused, reserved for future extensions. */
__u8 reserved2[4];
};
struct __packed virtio_admin_cmd_legacy_wr_data {
__u8 offset; /* Starting offset of the register(s) to write. */
__u8 reserved[7];
__u8 registers[];
};
struct __packed virtio_admin_cmd_legacy_rd_data {
__u8 offset; /* Starting offset of the register(s) to read. */
};
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
struct __packed virtio_admin_cmd_notify_info_data {
__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
__u8 bar; /* BAR of the member or the owner device */
__u8 padding[6];
__le64 offset; /* Offset within bar. */
};
struct virtio_admin_cmd_notify_info_result {
struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
};
#endif
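
As a sanity check on the wire structures and opcodes added above, here is a userspace sketch that mirrors the packed layouts with fixed-width types and reproduces the opcode gate used before a command is queued. The struct and function names are hypothetical mirrors, not the kernel definitions. The composition of the legacy bitmap (one bit per legacy opcode, 0x2 through 0x6) is an assumption, since the definition of VIRTIO_LEGACY_ADMIN_CMD_BITMAP is not part of this section. The `opcode >= 64` guard is an addition for the sketch and is not in the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the packed structures in the diff. */
struct __attribute__((packed)) admin_cmd_hdr {
	uint16_t opcode;
	uint16_t group_type;
	uint8_t reserved1[12];
	uint64_t group_member_id;
};

struct __attribute__((packed)) admin_cmd_status {
	uint16_t status;
	uint16_t status_qualifier;
	uint8_t reserved2[4];
};

struct __attribute__((packed)) notify_info_data {
	uint8_t flags;
	uint8_t bar;
	uint8_t padding[6];
	uint64_t offset;
};

/* Sizes follow from summing the field widths of each packed struct. */
_Static_assert(sizeof(struct admin_cmd_hdr) == 24, "hdr is 24 bytes");
_Static_assert(offsetof(struct admin_cmd_hdr, group_member_id) == 16,
	       "group_member_id starts at byte 16");
_Static_assert(sizeof(struct admin_cmd_status) == 8, "status is 8 bytes");
_Static_assert(sizeof(struct notify_info_data) == 16, "entry is 16 bytes");

/* Opcode values copied from the diff. */
#define CMD_LIST_QUERY 0x0
#define CMD_LIST_USE   0x1

/* Assumed bitmap: one bit for each legacy opcode 0x2..0x6. */
#define LEGACY_CMD_BITMAP \
	((1ULL << 0x2) | (1ULL << 0x3) | (1ULL << 0x4) | \
	 (1ULL << 0x5) | (1ULL << 0x6))

/*
 * Mirrors the check in virtqueue_exec_admin_cmd(): LIST_QUERY and LIST_USE
 * are always allowed; any other opcode needs its bit in supported_cmds.
 */
static int admin_cmd_allowed(uint16_t opcode, uint64_t supported_cmds)
{
	if (opcode == CMD_LIST_QUERY || opcode == CMD_LIST_USE)
		return 1;
	if (opcode >= 64) /* sketch-only guard against an oversized shift */
		return 0;
	return (int)((supported_cmds >> opcode) & 1);
}
```

The static assertions fail at compile time if a mirror struct drifts from the field widths shown in the diff.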