Commit graph

4779 commits

Author SHA1 Message Date
Robin Murphy cb147bbe22 dma-mapping: name SG DMA flag helpers consistently
sg_is_dma_bus_address() is inconsistent with the naming pattern of its
corresponding setters and its own kerneldoc, so take the majority vote and
rename it sg_dma_is_bus_address() (and fix up the missing underscores in
the kerneldoc too).  This gives us a nice clear pattern where SG DMA flags
are SG_DMA_<NAME>, and the helpers for acting on them are
sg_dma_<action>_<name>().

Link: https://lkml.kernel.org/r/20230612153201.554742-14-catalin.marinas@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
  Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com
Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:22 -07:00
Joerg Roedel a7a334076d Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', 'core' and 'x86/amd' into next 2023-06-19 10:12:42 +02:00
Lu Baolu b4da4e112a iommu/vt-d: Remove commented-out code
These lines of code were commented out when they were first added in commit
ba39592764 ("Intel IOMMU: Intel IOMMU driver"). We do not want to restore
them because the VT-d spec has deprecated the read/write draining hit.

VT-d spec (section 11.4.2):
"
 Hardware implementation with Major Version 2 or higher (VER_REG), always
 performs required drain without software explicitly requesting a drain in
 IOTLB invalidation. This field is deprecated and hardware  will always
 report it as 1 to maintain backward compatibility with software.
"

Remove the code to make the code cleaner.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609060514.15154-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:33 +02:00
Yanfei Xu 3f13f72787 iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
Remove the WARN_ON(did == 0) as the domain id 0 is reserved and
set once the domain_ids is allocated. So iommu_init_domains will
never return 0.

Remove the WARN_ON(!table) as this pointer will be accessed in
the following code, if empty "table" really happens, the kernel
will report a NULL pointer reference warning at the first place.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-3-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Yanfei Xu a0e9911ac1 iommu/vt-d: Handle the failure case of dmar_reenable_qi()
dmar_reenable_qi() may not succeed. Check and return when it fails.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-2-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Suhui 82d9654f92 iommu/vt-d: Remove unnecessary (void*) conversions
No need cast (void*) to (struct root_entry *).

Signed-off-by: Suhui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20230425033743.75986-1-suhui@nfschina.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:31 +02:00
Su Hui 5b00369fcf iommu/amd: Fix possible memory leak of 'domain'
Move allocation code down to avoid memory leak.

Fixes: 29f54745f2 ("iommu/amd: Add missing domain type checks")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230608021933.856045-1-suhui@nfschina.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:36:45 +02:00
Vasant Hegde 78db2985c2 iommu/amd: Remove extern from function prototypes
The kernel coding style does not require 'extern' in function prototypes.
Hence remove them from header file.

No functional change intended.

Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609090631.6052-2-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:33:58 +02:00
Vasant Hegde d18f4ee219 iommu/amd: Use BIT/BIT_ULL macro to define bit fields
Make use of BIT macro when defining bitfields which makes it easy to read.

No functional change intended.

Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230609090631.6052-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:32:30 +02:00
Vasant Hegde 85751a8af5 iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro
Interrupt Table Root Pointer is 52 bit and table must be aligned to start
on a 128-byte boundary. Hence first 6 bits are ignored.

Current code uses address mask as 45 instead of 46bit. Use GENMASK_ULL
macro instead of manually generating address mask.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609090327.5923-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:30:59 +02:00
Lorenzo Stoakes 0b295316b3 mm/gup: remove unused vmas parameter from pin_user_pages_remote()
No invocation of pin_user_pages_remote() uses the vmas parameter, so
remove it.  This forms part of a larger patch set eliminating the use of
the vmas parameters altogether.

Link: https://lkml.kernel.org/r/28f000beb81e45bf538a2aaa77c90f5482b67a32.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:25 -07:00
Joerg Roedel 1ce018df87 iommu/amd: Fix compile error for unused function
Recent changes introduced a compile error:

drivers/iommu/amd/iommu.c:1285:13: error: ‘iommu_flush_irt_and_complete’ defined but not used [-Werror=unused-function]
 1285 | static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens with defconfig-x86_64 because AMD IOMMU is enabled but
CONFIG_IRQ_REMAP is disabled. Move the function under #ifdef
CONFIG_IRQ_REMAP to fix the error.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 15:18:12 +02:00
Suravee Suthikulpanit bccc37a8a2 iommu/amd: Improving Interrupt Remapping Table Invalidation
Invalidating Interrupt Remapping Table (IRT) requires, the AMD IOMMU driver
to issue INVALIDATE_INTERRUPT_TABLE and COMPLETION_WAIT commands.
Currently, the driver issues the two commands separately, which requires
calling raw_spin_lock_irqsave() twice. In addition, the COMPLETION_WAIT
could potentially be interleaved with other commands causing delay of
the COMPLETION_WAIT command.

Therefore, combine issuing of the two commands in one spin-lock, and
changing struct amd_iommu.cmd_sem_val to use atomic64 to minimize
locking.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-6-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:10 +02:00
Suravee Suthikulpanit 98aeb4ea55 iommu/amd: Do not Invalidate IRT when IRTE caching is disabled
With the Interrupt Remapping Table cache disabled, there is no need to
issue invalidate IRT and wait for its completion. Therefore, add logic
to bypass the operation.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-5-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:10 +02:00
Suravee Suthikulpanit 66419036f6 iommu/amd: Introduce Disable IRTE Caching Support
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration
for each device. In a normal operation, the AMD IOMMU caches the table
to optimize subsequent data accesses. This requires the IOMMU driver to
invalidate IRT whenever it updates the table. The invalidation process
includes issuing an INVALIDATE_INTERRUPT_TABLE command following by
a COMPLETION_WAIT command.

However, there are cases in which the IRT is updated at a high rate.
For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every
vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large
amount of vcpus and VFIO PCI pass-through devices, the invalidation
process could potentially become a performance bottleneck.

Introducing a new kernel boot option:

    amd_iommu=irtcachedis

which disables IRTE caching by setting the IRTCachedis bit in each IOMMU
Control register, and bypass the IRT invalidation process.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:09 +02:00
Suravee Suthikulpanit 74a37817bd iommu/amd: Remove the unused struct amd_ir_data.ref
Since the amd_iommu_update_ga() has been switched to use the
modify_irte_ga() helper function to update the IRTE, the parameter
is no longer needed.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Suggested-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-3-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:08 +02:00
Joao Martins a42f0c7a41 iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga()
The modify_irte_ga() uses cmpxchg_double() to update the IRTE in one shot,
which is necessary when adding IRTE cache disabling support since
the driver no longer need to flush the IRT for hardware to take effect.

Please note that there is a functional change where the IsRun and
Destination bits of IRTE are now cached in the struct amd_ir_data.entry.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-2-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:08 +02:00
Robin Murphy 6833b8f2e1 iommu/arm-smmu-v3: Set TTL invalidation hint better
When io-pgtable unmaps a whole table, rather than waste time walking it
to find the leaf entries to invalidate exactly, it simply expects
.tlb_flush_walk with nominal last-level granularity to invalidate any
leaf entries at higher intermediate levels as well. This works fine with
page-based invalidation, but with range commands we need to be careful
with the TTL hint - unconditionally setting it based on the given level
3 granule means that an invalidation for a level 1 table would strictly
not be required to affect level 2 block entries. It's easy to comply
with the expected behaviour by simply not setting the TTL hint for
non-leaf invalidations, so let's do that.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b409d9a17c52dc0db51faee91d92737bb7975f5b.1685637456.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 22:00:22 +01:00
Robin Murphy 0bfbfc526c iommu/arm-smmu-v3: Document nesting-related errata
Both MMU-600 and MMU-700 have similar errata around TLB invalidation
while both stages of translation are active, which will need some
consideration once nesting support is implemented. For now, though,
it's very easy to make our implicit lack of nesting support explicit
for those cases, so they're less likely to be missed in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/696da78d32bb4491f898f11b0bb4d850a8aa7c6a.1683731256.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 21:58:12 +01:00
Robin Murphy 1d9777b9f3 iommu/arm-smmu-v3: Add explicit feature for nesting
In certain cases we may want to refuse to allow nested translation even
when both stages are implemented, so let's add an explicit feature for
nesting support which we can control in its own right. For now this
merely serves as documentation, but it means a nice convenient check
will be ready and waiting for the future nesting code.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/136c3f4a3a84cc14a5a1978ace57dfd3ed67b688.1683731256.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 21:58:12 +01:00
Robin Murphy 309a15cb16 iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
To work around MMU-700 erratum 2812531 we need to ensure that certain
sequences of commands cannot be issued without an intervening sync. In
practice this falls out of our current command-batching machinery
anyway - each batch only contains a single type of invalidation command,
and ends with a sync. The only exception is when a batch is sufficiently
large to need issuing across multiple command queue slots, wherein the
earlier slots will not contain a sync and thus may in theory interleave
with another batch being issued in parallel to create an affected
sequence across the slot boundary.

Since MMU-700 supports range invalidate commands and thus we will prefer
to use them (which also happens to avoid conditions for other errata),
I'm not entirely sure it's even possible for a single high-level
invalidate call to generate a batch of more than 63 commands, but for
the sake of robustness and documentation, wire up an option to enforce
that a sync is always inserted for every slot issued.

The other aspect is that the relative order of DVM commands cannot be
controlled, so DVM cannot be used. Again that is already the status quo,
but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly
disable it for documentation purposes even if it's not wired up anywhere
yet.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.1683731256.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 21:58:12 +01:00
Robin Murphy f322e8af35 iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
MMU-600 versions prior to r1p0 fail to correctly generate a WFE wakeup
event when the command queue transitions fom full to non-full. We can
easily work around this by simply hiding the SEV capability such that we
fall back to polling for space in the queue - since MMU-600 implements
MSIs we wouldn't expect to need SEV for sync completion either, so this
should have little to no impact.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/08adbe3d01024d8382a478325f73b56851f76e49.1683731256.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-06-08 21:58:12 +01:00
Peter Zijlstra b1fe7f2cda x86,intel_iommu: Replace cmpxchg_double()
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.855976804@infradead.org
2023-06-05 09:36:38 +02:00
Peter Zijlstra 0a0a6800b0 x86,amd_iommu: Replace cmpxchg_double()
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Vasant Hegde <vasant.hegde@amd.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.788955257@infradead.org
2023-06-05 09:36:38 +02:00
Chen-Yu Tsai b3fc95709c iommu/mediatek: Flush IOTLB completely only if domain has been attached
If an IOMMU domain was never attached, it lacks any linkage to the
actual IOMMU hardware. Attempting to do flush_iotlb_all() on it will
result in a NULL pointer dereference. This seems to happen after the
recent IOMMU core rework in v6.4-rc1.

    Unable to handle kernel read from unreadable memory at virtual address 0000000000000018
    Call trace:
     mtk_iommu_flush_iotlb_all+0x20/0x80
     iommu_create_device_direct_mappings.part.0+0x13c/0x230
     iommu_setup_default_domain+0x29c/0x4d0
     iommu_probe_device+0x12c/0x190
     of_iommu_configure+0x140/0x208
     of_dma_configure_id+0x19c/0x3c0
     platform_dma_configure+0x38/0x88
     really_probe+0x78/0x2c0

Check if the "bank" field has been filled in before actually attempting
the IOTLB flush to avoid it. The IOTLB is also flushed when the device
comes out of runtime suspend, so it should have a clean initial state.

Fixes: 08500c43d4 ("iommu/mediatek: Adjust the structure")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230526085402.394239-1-wenst@chromium.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-01 11:50:13 +02:00
Jason Gunthorpe 139a57a991 iommu/fsl: Use driver_managed_dma to allow VFIO to work
The FSL driver is mangling the iommu_groups to not have a group for its
PCI bridge/controller (eg the thing passed to fsl_add_bridge()). Robin
says this is so FSL could work with VFIO which would be blocked by having
a probed driver on the platform_device in the same group. This is
supported by comments from FSL:

 https://lore.kernel.org/all/C5ECD7A89D1DC44195F34B25E172658D459471@039-SN2MPN1-013.039d.mgd.msft.net

 ..  PCIe devices share the same device group as the PCI controller. This
  becomes a problem while assigning the devices to the guest, as you are
  required to unbind all the PCIe devices including the controller from the
  host. PCIe controller can't be unbound from the host, so we simply delete
  the controller iommu_group.

However, today, we use driver_managed_dma to allow PCI infrastructure
devices that are 'security safe' to co-exist in groups and still allow
VFIO to work. Set this flag for the fsl_pci_driver.

Change fsl_pamu_device_group() so that it no longer removes the controller
from any groups. For check_pci_ctl_endpt_part() mode this creates an extra
group that contains only the controller.

Otherwise force the controller's single group to be the group of all the
PCI devices on the controller's hose. VFIO continues to work because of
driver_managed_dma.

Remove the iommu_group_remove_device() calls from fsl_pamu and lightly
restructure its fsl_pamu_device_group() function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3-v2-ce71068deeec+4cf6-fsl_rm_groups_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-01 11:47:48 +02:00
Jason Gunthorpe 7977a08e11 iommu/fsl: Move ENODEV to fsl_pamu_probe_device()
The expectation is for the probe op to return ENODEV if the iommu is not
able to support the device. Move the check for fsl,liodn to
fsl_pamu_probe_device() simplify fsl_pamu_device_group()

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2-v2-ce71068deeec+4cf6-fsl_rm_groups_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-01 11:47:47 +02:00
Jason Gunthorpe 5f6489723d iommu/fsl: Always allocate a group for non-pci devices
fsl_pamu_device_group() is only called if dev->iommu_group is NULL, so
iommu_group_get() always returns NULL. Remove this test and just allocate
a group. Call generic_device_group() for this like the other drivers.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1-v2-ce71068deeec+4cf6-fsl_rm_groups_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-01 11:47:47 +02:00
Vasant Hegde 11c439a194 iommu/amd/pgtbl_v2: Fix domain max address
IOMMU v2 page table supports 4 level (47 bit) or 5 level (56 bit) virtual
address space. Current code assumes it can support 64bit IOVA address
space. If IOVA allocator allocates virtual address > 47/56 bit (depending
on page table level) then it will do wrong mapping and cause invalid
translation.

Hence adjust aperture size to use max address supported by the page table.

Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Fixes: aaac38f614 ("iommu/amd: Initial support for AMD IOMMU v2 page table")
Cc: <Stable@vger.kernel.org>  # v6.0+
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230518054351.9626-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:29:25 +02:00
Jean-Philippe Brucker 7061b6af34 iommu/virtio: Return size mapped for a detached domain
When map() is called on a detached domain, the domain does not exist in
the device so we do not send a MAP request, but we do update the
internal mapping tree, to be replayed on the next attach. Since this
constitutes a successful iommu_map() call, return *mapped in this case
too.

Fixes: 7e62edd7a3 ("iommu/virtio: Add map/unmap_pages() callbacks implementation")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230515113946.1017624-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:19:03 +02:00
Jean-Philippe Brucker 809d0810e3 iommu/virtio: Detach domain on endpoint release
When an endpoint is released, for example a PCIe VF being destroyed or a
function hot-unplugged, it should be detached from its domain. Send a
DETACH request.

Fixes: edcd69ab9a ("iommu: Add virtio-iommu driver")
Reported-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41d2c@daynix.com/
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20230515113946.1017624-2-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:19:03 +02:00
Jason Gunthorpe 5957c19305 iommu: Tidy the control flow in iommu_group_store_type()
Use a normal "goto unwind" instead of trying to be clever with checking
!ret and manually managing the unlock.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/17-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:58 +02:00
Jason Gunthorpe e996c12d76 iommu: Remove __iommu_group_for_each_dev()
The last two users of it are quite trivial, just open code the one line
loop.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/16-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:58 +02:00
Jason Gunthorpe 1000dccd5d iommu: Allow IOMMU_RESV_DIRECT to work on ARM
For now several ARM drivers do not allow mappings to be created until a
domain is attached. This means they do not technically support
IOMMU_RESV_DIRECT as it requires the 1:1 maps to work continuously.

Currently if the platform requests these maps on ARM systems they are
silently ignored.

Work around this by trying again to establish the direct mappings after
the domain is attached if the pre-attach attempt failed.

In the long run the drivers will be fixed to fully setup domains when they
are created without waiting for attachment.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/15-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:57 +02:00
Jason Gunthorpe d99be00f42 iommu: Consolidate the default_domain setup to one function
Make iommu_change_dev_def_domain() general enough to setup the initial
default_domain or replace it with a new default_domain. Call the new
function iommu_setup_default_domain() and make it the only place in the
code that stores to group->default_domain.

Consolidate the three copies of the default_domain setup sequence. The flow
flow requires:

 - Determining the domain type to use
 - Checking if the current default domain is the same type
 - Allocating a domain
 - Doing iommu_create_device_direct_mappings()
 - Attaching it to devices
 - Store group->default_domain

This adjusts the domain allocation from the prior patch to be able to
detect if each of the allocation steps is already the domain we already
have, which is a more robust version of what change default domain was
already doing.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/14-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:57 +02:00
Jason Gunthorpe fcbb0a4d73 iommu: Revise iommu_group_alloc_default_domain()
Robin points out that the fallback to guessing what domains the driver
supports should only happen if the driver doesn't return a preference from
its ops->def_domain_type().

Re-organize iommu_group_alloc_default_domain() so it internally uses
iommu_def_domain_type only during the fallback and makes it clearer how
the fallback sequence works.

Make iommu_group_alloc_default_domain() return the domain so the return
based logic is cleaner and to prepare for the next patch.

Remove the iommu_alloc_default_domain() function as it is now trivially
just calling iommu_group_alloc_default_domain().

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/13-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:56 +02:00
Jason Gunthorpe 8b4eb75ee5 iommu: Consolidate the code to calculate the target default domain type
Put all the code to calculate the default domain type into one
function. Make the function able to handle the
iommu_change_dev_def_domain() by taking in the target domain type and
erroring out if the target type isn't reachable.

This makes it really clear that specifying a 0 type during
iommu_change_dev_def_domain() will have the same outcome as the normal
probe path.

Remove the obfuscating use of __iommu_group_for_each_dev() and related
struct __group_domain_type.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/12-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:56 +02:00
Jason Gunthorpe dfddd54dc7 iommu: Remove the assignment of group->domain during default domain alloc
group->domain should only be set once all the device's drivers have
had their ops->attach_dev() called. iommu_group_alloc_default_domain()
doesn't do this, so it shouldn't set the value.

The previous patches organized things so that each caller of
iommu_group_alloc_default_domain() follows up with calling
__iommu_group_set_domain_internal() that does set the group->domain.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/11-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:55 +02:00
Jason Gunthorpe 152431e4fe iommu: Do iommu_group_create_direct_mappings() before attach
The iommu_probe_device() path calls iommu_create_device_direct_mappings()
after attaching the device.

IOMMU_RESV_DIRECT maps need to be continually in place, so if a hotplugged
device has new ranges the should have been mapped into the default domain
before it is attached.

Move the iommu_create_device_direct_mappings() call up.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/10-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:55 +02:00
Jason Gunthorpe e7f85dfbbc iommu: Fix iommu_probe_device() to attach the right domain
The general invariant is that all devices in an iommu_group are attached
to group->domain. We missed some cases here where an owned group would not
get the device attached.

Rework this logic so it follows the default domain flow of the
bus_iommu_probe() - call iommu_alloc_default_domain(), then use
__iommu_group_set_domain_internal() to set up all the devices.

Finally always attach the device to the current domain if it is already
set.

This is an unlikely functional issue as iommufd uses iommu_attach_group().
It is possible to hot plug in a new group member, add a vfio driver to it
and then hot add it to an existing iommufd. In this case it is required
that the core code set the iommu_domain properly since iommufd won't call
iommu_attach_group() again.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/9-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:54 +02:00
Jason Gunthorpe 2f74198ae0 iommu: Replace iommu_group_do_dma_first_attach with __iommu_device_set_domain
Since __iommu_device_set_domain() now knows how to handle deferred attach
we can just call it directly from the only call site.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:54 +02:00
Jason Gunthorpe 0046a4337e iommu: Remove iommu_group_do_dma_first_attach() from iommu_group_add_device()
This function is only used to construct the groups, it should not be
operating the iommu driver.

External callers in VFIO and POWER do not have any iommu drivers on the
devices so group->domain will be NULL.

The only internal caller is from iommu_probe_device() which already calls
iommu_group_do_dma_first_attach(), meaning we are calling it twice in the
only case it matters.

Since iommu_probe_device() is the logical place to sort out the group's
domain, remove the call from iommu_group_add_device().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:53 +02:00
Jason Gunthorpe d257344c66 iommu: Replace __iommu_group_dma_first_attach() with set_domain
Reorganize the attach_deferred logic to set dev->iommu->attach_deferred
immediately during probe and then have __iommu_device_set_domain() check
it and not attach the default_domain.

This is to prepare for removing the group->domain set from
iommu_group_alloc_default_domain() by calling __iommu_group_set_domain()
to set the group->domain.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:53 +02:00
Jason Gunthorpe 4c8ad9da05 iommu: Use __iommu_group_set_domain() in iommu_change_dev_def_domain()
This is missing re-attach error handling if the attach fails, use the
common code.

The ugly "group->domain = prev_domain" will be cleaned in a later patch.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:52 +02:00
Jason Gunthorpe ecd60dc5d2 iommu: Use __iommu_group_set_domain() for __iommu_attach_group()
The error recovery here matches the recovery inside
__iommu_group_set_domain(), so just use it directly.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:52 +02:00
Jason Gunthorpe dcf40ed3a2 iommu: Make __iommu_group_set_domain() handle error unwind
Let's try to have a consistent and clear strategy for error handling
during domain attach failures.

There are two broad categories, the first is callers doing destruction and
trying to set the domain back to a previously good domain. These cases
cannot handle failure during destruction flows and must succeed, or at
least avoid a UAF on the current group->domain which is likely about to be
freed.

Many of the drivers are well behaved here and will not hit the WARN_ON's
or a UAF, but some are doing hypercalls/etc that can fail unpredictably
and don't meet the expectations.

The second case is attaching a domain for the first time in a failable
context, failure should restore the attachment back to group->domain using
the above unfailable operation.

Have __iommu_group_set_domain_internal() execute a common algorithm that
tries to achieve this, and in the worst case, would leave a device
"detached" or assigned to a global blocking domain. This relies on some
existing common driver behaviors where attach failure will also do detatch
and true IOMMU_DOMAIN_BLOCK implementations that are not allowed to ever
fail.

Name the first case with __iommu_group_set_domain_nofail() to make it
clear.

Pull all the error handling and WARN_ON generation into
__iommu_group_set_domain_internal().

Avoid the obfuscating use of __iommu_group_for_each_dev() and be more
careful about what should happen during failures by only touching devices
we've already touched.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:51 +02:00
Jason Gunthorpe 3006b15b36 iommu: Add for_each_group_device()
Convenience macro to iterate over every struct group_device in the group.

Replace all open coded list_for_each_entry's with this macro.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:51 +02:00
Jason Gunthorpe 4db0e5f887 iommu: Replace iommu_group_device_count() with list_count_nodes()
No reason to wrapper a standard function, just call the library directly.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:50 +02:00
Florian Fainelli 32261d1094 iommu: Suppress empty whitespaces in prints
If IOMMU_CMD_LINE_DMA_API or IOMMU_CMD_LINE_STRICT are not set in
iommu_cmd_line, we will be emitting a whitespace before the newline.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230509191049.1752259-1-f.fainelli@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:41:51 +02:00
Robin Murphy a4fdd97622 iommu: Use flush queue capability
It remains really handy to have distinct DMA domain types within core
code for the sake of default domain policy selection, but we can now
hide that detail from drivers by using the new capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1c552d99e8ba452bdac48209fa74c0bdd52fd9d9.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:45 +02:00
Robin Murphy 4a20ce0ff6 iommu: Add a capability for flush queue support
Passing a special type to domain_alloc to indirectly query whether flush
queues are a worthwhile optimisation with the given driver is a bit
clunky, and looking increasingly anachronistic. Let's put that into an
explicit capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/f0086a93dbccb92622e1ace775846d81c1c4b174.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:44 +02:00
Jon Pan-Doh 2212fc2acf iommu/amd: Fix domain flush size when syncing iotlb
When running on an AMD vIOMMU, we observed multiple invalidations (of
decreasing power of 2 aligned sizes) when unmapping a single page.

Domain flush takes gather bounds (end-start) as size param. However,
gather->end is defined as the last inclusive address (start + size - 1).
This leads to an off by 1 error.

With this patch, verified that 1 invalidation occurs when unmapping a
single page.

Fixes: a270be1b3f ("iommu/amd: Use only natural aligned flushes in a VM")
Cc: stable@vger.kernel.org # >= 5.15
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Tested-by: Sudheer Dantuluri <dantuluris@google.com>
Suggested-by: Gary Zibrat <gzibrat@google.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Acked-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20230426203256.237116-1-pandoh@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:33:43 +02:00
Jason Gunthorpe 29f54745f2 iommu/amd: Add missing domain type checks
Drivers are supposed to list the domain types they support in their
domain_alloc() ops so when we add new domain types, like BLOCKING or SVA,
they don't start breaking.

This ended up providing an empty UNMANAGED domain when the core code asked
for a BLOCKING domain, which happens to be the fallback for drivers that
don't support it, but this is completely wrong for SVA.

Check for the DMA types AMD supports and reject every other kind.

Fixes: 136467962e ("iommu: Add IOMMU SVA domain support")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-2ac37b893728+da-amd_check_types_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:26:45 +02:00
Jerry Snitselaar 8ec4e2befe iommu/amd: Fix up merge conflict resolution
Merge commit e17c6debd4 ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next")
added amd_iommu_init_devices, amd_iommu_uninit_devices,
and amd_iommu_init_notifier back to drivers/iommu/amd/amd_iommu.h.
The only references to them are here, so clean them up.

Fixes: e17c6debd4 ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next")
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230420192013.733331-1-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:23:32 +02:00
Carlos Bilbao 75a616168b iommu/amd: Update copyright notice
The most recent changes to AMD'S IOMMU, such as level 5 guest page table
support date to the year 2023. Update copyright statement accordingly.

Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
Link: https://lore.kernel.org/r/20230420173006.3100682-1-carlos.bilbao@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:20:50 +02:00
Jerry Snitselaar 354440a761 iommu/amd: Use page mode macros in fetch_pte()
Use the page mode macros instead of magic numbers in fetch_pte.

Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230420080718.523132-1-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:17:41 +02:00
Joao Martins af47b0a240 iommu/amd: Handle GALog overflows
GALog exists to propagate interrupts into all vCPUs in the system when
interrupts are marked as non running (e.g. when vCPUs aren't running). A
GALog overflow happens when there's in no space in the log to record the
GATag of the interrupt. So when the GALOverflow condition happens, the
GALog queue is processed and the GALog is restarted, as the IOMMU
manual indicates in section "2.7.4 Guest Virtual APIC Log Restart
Procedure":

| * Wait until MMIO Offset 2020h[GALogRun]=0b so that all request
|   entries are completed as circumstances allow. GALogRun must be 0b to
|   modify the guest virtual APIC log registers safely.
| * Write MMIO Offset 0018h[GALogEn]=0b.
| * As necessary, change the following values (e.g., to relocate or
| resize the guest virtual APIC event log):
|   - the Guest Virtual APIC Log Base Address Register
|      [MMIO Offset 00E0h],
|   - the Guest Virtual APIC Log Head Pointer Register
|      [MMIO Offset 2040h][GALogHead], and
|   - the Guest Virtual APIC Log Tail Pointer Register
|      [MMIO Offset 2048h][GALogTail].
| * Write MMIO Offset 2020h[GALOverflow] = 1b to clear the bit (W1C).
| * Write MMIO Offset 0018h[GALogEn] = 1b, and either set
|   MMIO Offset 0018h[GAIntEn] to enable the GA log interrupt or clear
|   the bit to disable it.

Failing to handle the GALog overflow means that none of the VFs (in any
guest) will work with IOMMU AVIC forcing the user to power cycle the
host. When handling the event it resumes the GALog without resizing
much like how it is done in the event handler overflow. The
[MMIO Offset 2020h][GALOverflow] bit might be set in status register
without the [MMIO Offset 2020h][GAInt] bit, so when deciding to poll
for GA events (to clear space in the galog), also check the overflow
bit.

[suravee: Check for GAOverflow without GAInt, toggle CONTROL_GAINT_EN]

Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230419201154.83880-3-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:16:04 +02:00
Joao Martins ed8a2f4dde iommu/amd: Don't block updates to GATag if guest mode is on
On KVM GSI routing table updates, specially those where they have vIOMMUs
with interrupt remapping enabled (to boot >255vcpus setups without relying
on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs
with a new VCPU affinity.

On AMD with AVIC enabled, the new vcpu affinity info is updated via:
	avic_pi_update_irte()
		irq_set_vcpu_affinity()
			amd_ir_set_vcpu_affinity()
				amd_iommu_{de}activate_guest_mode()

Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag
contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM
(via GALog) when interrupt cannot be delivered due to vCPU is in
blocking state.

The issue is that amd_iommu_activate_guest_mode() will essentially
only change IRTE fields on transitions from non-guest-mode to guest-mode
and otherwise returns *with no changes to IRTE* on already configured
guest-mode interrupts. To the guest this means that the VF interrupts
remain affined to the first vCPU they were first configured, and guest
will be unable to issue VF interrupts and receive messages like this
from spurious interrupts (e.g. from waking the wrong vCPU in GALog):

[  167.759472] __common_interrupt: 3.34 No irq handler for vector
[  230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid
3122): Recovered 1 EQEs on cmd_eq
[  230.681799] mlx5_core 0000:00:02.0:
wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400)
recovered after timeout
[  230.683266] __common_interrupt: 3.34 No irq handler for vector

Given the fact that amd_ir_set_vcpu_affinity() uses
amd_iommu_activate_guest_mode() underneath it essentially means that VCPU
affinity changes of IRTEs are nops. Fix it by dropping the check for
guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to
amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change
underlying DestID on the host, the VFIO IRQ handler will still be able to
poke at the right guest-vCPU.

Fixes: b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:16:04 +02:00
Zhen Lei 5d62bacc05 iommu/iova: Optimize iova_magazine_alloc()
Only the member 'size' needs to be initialized to 0. Clearing the array
pfns[], which is about 1 KiB in size, not only wastes time, but also
causes cache pollution.

Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230421072422.869-1-thunder.leizhen@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:09:51 +02:00
Chao Wang ec014683c5 iommu/rockchip: Fix unwind goto issue
Smatch complains that
drivers/iommu/rockchip-iommu.c:1306 rk_iommu_probe() warn: missing unwind goto?

The rk_iommu_probe function, after obtaining the irq value through
platform_get_irq, directly returns an error if the returned value
is negative, without releasing any resources.

Fix this by adding a new error handling label "err_pm_disable" and
use a goto statement to redirect to the error handling process. In
order to preserve the original semantics, set err to the value of irq.

Fixes: 1aa55ca9b1 ("iommu/rockchip: Move irq request past pm_runtime_enable")
Signed-off-by: Chao Wang <D202280639@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230417030421.2777-1-D202280639@hust.edu.cn
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:03:08 +02:00
Randy Dunlap e332003bb2 iommu: Make IPMMU_VMSA dependencies more strict
On riscv64, linux-next-20233030 (and for several days earlier),
there is a kconfig warning:

WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE
  Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n]
  Selected by [y]:
  - IPMMU_VMSA [=y] && IOMMU_SUPPORT [=y] && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n]

and build errors:

riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L140':
io-pgtable-arm.c:(.init.text+0x1e8): undefined reference to `alloc_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L168':
io-pgtable-arm.c:(.init.text+0xab0): undefined reference to `free_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L140':
ipmmu-vmsa.c:(.text+0xbc4): undefined reference to `free_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L0 ':
ipmmu-vmsa.c:(.text+0x145e): undefined reference to `alloc_io_pgtable_ops'

Add ARM || ARM64 || COMPILE_TEST dependencies to IPMMU_VMSA to prevent
these issues, i.e., so that ARCH_RENESAS on RISC-V is not allowed.

This makes the ARCH dependencies become:
	depends on (ARCH_RENESAS && (ARM || ARM64)) || COMPILE_TEST
but that can be a bit hard to read.

Fixes: 8292493c22 ("riscv: Kconfig.socs: Add ARCH_RENESAS kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Cc: Conor Dooley <conor@kernel.org>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230330165817.21920-1-rdunlap@infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:00:34 +02:00
Linus Torvalds d635f6cc93 drm fixes for 6.4-rc3
amdgpu:
 - update gfx11 clock counter logic
 - Fix a race when disabling gfxoff on gfx10/11 for profiling
 - Raven/Raven2/PCO clock counter fix
 - Add missing get_vbios_fb_size for GMC 11
 - Fix a spurious irq warning in the device remove case
 - Fix possible power mode mismatch between driver and PMFW
 - USB4 fix
 
 exynos:
 - fix build warning
 
 i915:
 - fix missing NULL check in HDCP code
 
 msm:
 - display:
 - msm8998: fix fetch and qos to align with downstream
 - msm8998: fix LM pairs to align with downstream
 - remove unused INTF0 interrupt mask on some chipsets
 - remove TE2 block from relevant chipsets
 - relocate non-MDP_TOP offset to different header
 - fix some indentation
 - fix register offets/masks for dither blocks
 - make ping-ping block length 0
 - remove duplicated defines
 - fix log mask for writeback block
 - unregister the hdmi codec for dp during unbind
 - fix yaml warnings
 - gpu:
 - fix submit error path leak
 - arm-smmu-qcom fix for regression that broke per-process page tables
 - fix no-iommu crash
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmRoHNwACgkQDHTzWXnE
 hr6HyBAAgNbzBLtkRbpirwH3oB5qK7geR+CiKGEHVieopD5y+DGvnCQgpuDSfxtG
 qJv4OXTwavRh3/w5OhzOMPHqfpyCcHtgFofqeGSiXhnVhlz3WCNLkbJOPgT4x1Pu
 zyNfgn/Cy6Rp36ZMT+f+3IVvBcctXYADiwJ2wIqEppdGn3K6KrZZTRcHWHe+hWW2
 znWShx9Zl8knx2JEmhXrW6sLAE+7ra2DBCPMfKSTg+RnULl7LqSdUlriSMPwpAvH
 pvruU5+xEAhrGnJp/YZ3IHeCiM9mXMCBLu9l8l/Cr0568py4vn30CwYBTr7jK9Ls
 shqBUqtefmmitLQZ0iVW1HathVMHOf8u06sq6qQ25oi6JcSYbv6BnsTQyyzj4fmV
 WJL4NKKu8PhhrlvK5yXzp5kVOPdjhmyE2myb9b5bDDPJgeLoBlNujWYrDJsEC5sP
 fysgdriFJG224Sv6LJornAORBIkSW5WIZEFO5PlaVAZNHJGAAkvI9XvEI/Gx2OPN
 Y2PavFxp0MIfjzn4AOBlFJqRq7s9Og42q5k5+xeiSs27X/jAYs0MJqXHQIoSj856
 /CE0Bh1i75VdjpEZJ6ZOiDntUwwUWIX7Uba3IXWpUW4pUYynSdbnji4Sn/8P6e/H
 GfAhQahObw8CsQPXW07N0LW7rCxe8DK6Dw9GR1gS5PmAw0bIB9I=
 =gMUV
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2023-05-20' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes pull, amdgpu and msm make up most of these, nothing too
  serious, also one i915 and one exynos.

  I didn't get a misc fixes pull this week (one of the maintainers is
  off, so have to engage the backup) so I think there are a few
  outstanding patches that will show up next week,

  amdgpu:
   - update gfx11 clock counter logic
   - Fix a race when disabling gfxoff on gfx10/11 for profiling
   - Raven/Raven2/PCO clock counter fix
   - Add missing get_vbios_fb_size for GMC 11
   - Fix a spurious irq warning in the device remove case
   - Fix possible power mode mismatch between driver and PMFW
   - USB4 fix

  exynos:
   - fix build warning

  i915:
   - fix missing NULL check in HDCP code

  msm:
   - display:
      - msm8998: fix fetch and qos to align with downstream
      - msm8998: fix LM pairs to align with downstream
      - remove unused INTF0 interrupt mask on some chipsets
      - remove TE2 block from relevant chipsets
      - relocate non-MDP_TOP offset to different header
      - fix some indentation
      - fix register offets/masks for dither blocks
      - make ping-ping block length 0
      - remove duplicated defines
      - fix log mask for writeback block
      - unregister the hdmi codec for dp during unbind
      - fix yaml warnings
   - gpu:
      - fix submit error path leak
      - arm-smmu-qcom fix for regression that broke per-process page
        tables
      - fix no-iommu crash"

* tag 'drm-fixes-2023-05-20' of git://anongit.freedesktop.org/drm/drm: (29 commits)
  drm/amd/display: enable dpia validate
  drm/amd/pm: fix possible power mode mismatch between driver and PMFW
  drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged
  drm/amdgpu/gmc11: implement get_vbios_fb_size()
  drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id
  drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well
  drm/amdgpu/gfx10: Disable gfxoff before disabling powergating.
  drm/amdgpu/gfx11: update gpu_clock_counter logic
  drm/msm: Be more shouty if per-process pgtables aren't working
  iommu/arm-smmu-qcom: Fix missing adreno_smmu's
  drm/i915/hdcp: Check if media_gt exists
  drm/exynos: fix g2d_open/close helper function definitions
  drm/msm: Fix submit error-path leaks
  drm/msm/iommu: Fix null pointer dereference in no-IOMMU case
  dt-bindings: display/msm: dsi-controller-main: Document qcom, master-dsi and qcom, sync-dual-dsi
  drm/msm/dpu: Remove duplicate register defines from INTF
  drm/msm/dpu: Set PINGPONG block length to zero for DPU >= 7.0.0
  drm/msm/dpu: Use V2 DITHER PINGPONG sub-block in SM8[34]50/SC8280XP
  drm/msm/dpu: Fix PP_BLK_DIPHER -> DITHER typo
  drm/msm/dpu: Reindent REV_7xxx interrupt masks with tabs
  ...
2023-05-19 19:11:20 -07:00
Dave Airlie 83ab69c9f7 Merge tag 'drm-msm-fixes-2023-05-17' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
msm-fixes for v6.4-rc3

Display Fixes:

+ Catalog fixes:
 - fix the programmable fetch lines and qos settings of msm8998
   to match what is present downstream
 - fix the LM pairs for msm8998 to match what is present downstream.
   The current settings are not right as LMs with incompatible
   connected blocks are paired
 - remove unused INTF0 interrupt mask from SM6115/QCM2290 as there
   is no INTF0 present on those chipsets. There is only one DSI on
   index 1
 - remove TE2 block from relevant chipsets because this is mainly
   used for ping-pong split feature which is not supported upstream
   and also for the chipsets where we are removing them in this
   change, that block is not present as the tear check has been moved
   to the intf block
 - relocate non-MDP_TOP INTF_INTR offsets from dpu_hwio.h to
   dpu_hw_interrupts.c to match where they belong
 - fix the indentation for REV_7xxx interrupt masks
 - fix the offset and version for dither blocks of SM8[34]50/SC8280XP
   chipsets as it was incorrect
 - make the ping-pong blk length 0 for appropriate chipsets as those
   chipsets only have a dither ping-pong dither block but no other
   functionality in the base ping-pong
 - remove some duplicate register defines from INTF
+ Fix the log mask for the writeback block so that it can be enabled
  correctly via debugfs
+ unregister the hdmi codec for dp during unbind otherwise it leaks
  audio codec devices
+ Yaml change to fix warnings related to 'qcom,master-dsi' and
  'qcom,sync-dual-dsi'

GPU Fixes:

+ fix submit error path leak
+ arm-smmu-qcom fix for regression that broke per-process page tables
+ fix no-iommu crash

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvHEcJfp=k6qatmb_SvAeyvy3CBpaPfwLqtNthuEzA_7w@mail.gmail.com
2023-05-19 11:22:23 +10:00
Rob Clark e36ca2fad6 iommu/arm-smmu-qcom: Fix missing adreno_smmu's
When the special handling of qcom,adreno-smmu was moved into
qcom_smmu_create(), it was overlooked that we didn't have all the
required entries in qcom_smmu_impl_of_match.  So we stopped getting
adreno_smmu_priv on sc7180, breaking per-process pgtables.

Fixes: 30b912a03d ("iommu/arm-smmu-qcom: Move the qcom,adreno-smmu check into qcom_smmu_create")
Cc: <stable@vger.kernel.org>
Suggested-by: Lepton Wu <lepton@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/537357/
Link: https://lore.kernel.org/r/20230516222039.907690-1-robdclark@gmail.com
2023-05-17 08:53:35 -07:00
Jason Gunthorpe 0f1cbf941d s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU
These don't do anything anymore, the only user of the symbol was
VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects
IOMMU_API.

When this was added VFIO was wrongly doing "depends on IOMMU_API" which
required some contortions like this to ensure IOMMU_API was turned on.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-eb322ce2e547+188f-rm_iommu_ccw_jgg@nvidia.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17 15:20:18 +02:00
Linus Torvalds 58390c8ce1 IOMMU Updates for Linux 6.4
Including:
 
 	- Convert to platform remove callback returning void
 
 	- Extend changing default domain to normal group
 
 	- Intel VT-d updates:
 	    - Remove VT-d virtual command interface and IOASID
 	    - Allow the VT-d driver to support non-PRI IOPF
 	    - Remove PASID supervisor request support
 	    - Various small and misc cleanups
 
 	- ARM SMMU updates:
 	    - Device-tree binding updates:
 	        * Allow Qualcomm GPU SMMUs to accept relevant clock properties
 	        * Document Qualcomm 8550 SoC as implementing an MMU-500
 	        * Favour new "qcom,smmu-500" binding for Adreno SMMUs
 
 	    - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
 	      implementations
 
 	    - Acknowledge SMMUv3 PRI queue overflow when consuming events
 
 	    - Document (in a comment) why ATS is disabled for bypass streams
 
 	- AMD IOMMU updates:
 	    - 5-level page-table support
 	    - NUMA awareness for memory allocations
 
 	- Unisoc driver: Support for reattaching an existing domain
 
 	- Rockchip driver: Add missing set_platform_dma_ops callback
 
 	- Mediatek driver: Adjust the dma-ranges
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB
 GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF
 k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it
 +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au
 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+
 IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD
 MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1
 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw
 ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP
 muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8
 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga
 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU=
 =OvTo
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Linus Torvalds 22b8cc3e78 Add support for new Linear Address Masking CPU feature. This is similar
to ARM's Top Byte Ignore and allows userspace to store metadata in some
 bits of pointers without masking it out before use.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
 krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
 5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
 cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
 pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
 Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
 j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
 +5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
 +YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
 A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
 m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
 ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
 =qitk
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
 "Add support for the new Linear Address Masking CPU feature.

  This is similar to ARM's Top Byte Ignore and allows userspace to store
  metadata in some bits of pointers without masking it out before use"

* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
  x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
  selftests/x86/lam: Add test cases for LAM vs thread creation
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  mm: Expose untagging mask in /proc/$PID/status
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead for systems without LAM
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  mm: Introduce untagged_addr_remote()
  x86/mm: Handle LAM on context switch
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86: Allow atomic MM_CONTEXT flags setting
  x86/mm: Rework address range check in get_user() and put_user()
2023-04-28 09:43:49 -07:00
Linus Torvalds 7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Linus Torvalds b6a7828502 modules-6.4-rc1
The summary of the changes for this pull requests is:
 
  * Song Liu's new struct module_memory replacement
  * Nick Alcock's MODULE_LICENSE() removal for non-modules
  * My cleanups and enhancements to reduce the areas where we vmalloc
    module memory for duplicates, and the respective debug code which
    proves the remaining vmalloc pressure comes from userspace.
 
 Most of the changes have been in linux-next for quite some time except
 the minor fixes I made to check if a module was already loaded
 prior to allocating the final module memory with vmalloc and the
 respective debug code it introduces to help clarify the issue. Although
 the functional change is small it is rather safe as it can only *help*
 reduce vmalloc space for duplicates and is confirmed to fix a bootup
 issue with over 400 CPUs with KASAN enabled. I don't expect stable
 kernels to pick up that fix as the cleanups would have also had to have
 been picked up. Folks on larger CPU systems with modules will want to
 just upgrade if vmalloc space has been an issue on bootup.
 
 Given the size of this request, here's some more elaborate details
 on this pull request.
 
 The functional change change in this pull request is the very first
 patch from Song Liu which replaces the struct module_layout with a new
 struct module memory. The old data structure tried to put together all
 types of supported module memory types in one data structure, the new
 one abstracts the differences in memory types in a module to allow each
 one to provide their own set of details. This paves the way in the
 future so we can deal with them in a cleaner way. If you look at changes
 they also provide a nice cleanup of how we handle these different memory
 areas in a module. This change has been in linux-next since before the
 merge window opened for v6.3 so to provide more than a full kernel cycle
 of testing. It's a good thing as quite a bit of fixes have been found
 for it.
 
 Jason Baron then made dynamic debug a first class citizen module user by
 using module notifier callbacks to allocate / remove module specific
 dynamic debug information.
 
 Nick Alcock has done quite a bit of work cross-tree to remove module
 license tags from things which cannot possibly be module at my request
 so to:
 
   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area
      is active with no clear solution in sight.
 
   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags
 
 In so far as a) is concerned, although module license tags are a no-op
 for non-modules, tools which would want create a mapping of possible
 modules can only rely on the module license tag after the commit
 8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin
 or tristate.conf").  Nick has been working on this *for years* and
 AFAICT I was the only one to suggest two alternatives to this approach
 for tooling. The complexity in one of my suggested approaches lies in
 that we'd need a possible-obj-m and a could-be-module which would check
 if the object being built is part of any kconfig build which could ever
 lead to it being part of a module, and if so define a new define
 -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've
 suggested would be to have a tristate in kconfig imply the same new
 -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names
 mapping to modules always, and I don't think that's the case today. I am
 not aware of Nick or anyone exploring either of these options. Quite
 recently Josh Poimboeuf has pointed out that live patching, kprobes and
 BPF would benefit from resolving some part of the disambiguation as
 well but for other reasons. The function granularity KASLR (fgkaslr)
 patches were mentioned but Joe Lawrence has clarified this effort has
 been dropped with no clear solution in sight [1].
 
 In the meantime removing module license tags from code which could never
 be modules is welcomed for both objectives mentioned above. Some
 developers have also welcomed these changes as it has helped clarify
 when a module was never possible and they forgot to clean this up,
 and so you'll see quite a bit of Nick's patches in other pull
 requests for this merge window. I just picked up the stragglers after
 rc3. LWN has good coverage on the motivation behind this work [2] and
 the typical cross-tree issues he ran into along the way. The only
 concrete blocker issue he ran into was that we should not remove the
 MODULE_LICENSE() tags from files which have no SPDX tags yet, even if
 they can never be modules. Nick ended up giving up on his efforts due
 to having to do this vetting and backlash he ran into from folks who
 really did *not understand* the core of the issue nor were providing
 any alternative / guidance. I've gone through his changes and dropped
 the patches which dropped the module license tags where an SPDX
 license tag was missing, it only consisted of 11 drivers.  To see
 if a pull request deals with a file which lacks SPDX tags you
 can just use:
 
   ./scripts/spdxcheck.py -f \
 	$(git diff --name-only commid-id | xargs echo)
 
 You'll see a core module file in this pull request for the above,
 but that's not related to his changes. WE just need to add the SPDX
 license tag for the kernel/module/kmod.c file in the future but
 it demonstrates the effectiveness of the script.
 
 Most of Nick's changes were spread out through different trees,
 and I just picked up the slack after rc3 for the last kernel was out.
 Those changes have been in linux-next for over two weeks.
 
 The cleanups, debug code I added and final fix I added for modules
 were motivated by David Hildenbrand's report of boot failing on
 a systems with over 400 CPUs when KASAN was enabled due to running
 out of virtual memory space. Although the functional change only
 consists of 3 lines in the patch "module: avoid allocation if module is
 already present and ready", proving that this was the best we can
 do on the modules side took quite a bit of effort and new debug code.
 
 The initial cleanups I did on the modules side of things has been
 in linux-next since around rc3 of the last kernel, the actual final
 fix for and debug code however have only been in linux-next for about a
 week or so but I think it is worth getting that code in for this merge
 window as it does help fix / prove / evaluate the issues reported
 with larger number of CPUs. Userspace is not yet fixed as it is taking
 a bit of time for folks to understand the crux of the issue and find a
 proper resolution. Worst come to worst, I have a kludge-of-concept [3]
 of how to make kernel_read*() calls for modules unique / converge them,
 but I'm currently inclined to just see if userspace can fix this
 instead.
 
 [0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/
 [1] https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com
 [2] https://lwn.net/Articles/927569/
 [3] https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRG4m0SHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinQ2oP/0xlvKwJg6Ey8fHZF0qv8VOskE80zoLF
 hMazU3xfqLA+1TQvouW1YBxt3jwS3t1Ehs+NrV+nY9Yzcm0MzRX/n3fASJVe7nRr
 oqWWQU+voYl5Pw1xsfdp6C8IXpBQorpYby3Vp0MAMoZyl2W2YrNo36NV488wM9KC
 jD4HF5Z6xpnPSZTRR7AgW9mo7FdAtxPeKJ76Bch7lH8U6omT7n36WqTw+5B1eAYU
 YTOvrjRs294oqmWE+LeebyiOOXhH/yEYx4JNQgCwPdxwnRiGJWKsk5va0hRApqF/
 WW8dIqdEnjsa84lCuxnmWgbcPK8cgmlO0rT0DyneACCldNlldCW1LJ0HOwLk9pea
 p3JFAsBL7TKue4Tos6I7/4rx1ufyBGGIigqw9/VX5g0Iif+3BhWnqKRfz+p9wiMa
 Fl7cU6u7yC68CHu1HBSisK16cYMCPeOnTSd89upHj8JU/t74O6k/ARvjrQ9qmNUt
 c5U+OY+WpNJ1nXQydhY/yIDhFdYg8SSpNuIO90r4L8/8jRQYXNG80FDd1UtvVDuy
 eq0r2yZ8C0XHSlOT9QHaua/tWV/aaKtyC/c0hDRrigfUrq8UOlGujMXbUnrmrWJI
 tLJLAc7ePWAAoZXGSHrt0U27l029GzLwRdKqJ6kkDANVnTeOdV+mmBg9zGh3/Mp6
 agiwdHUMVN7X
 =56WK
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module updates from Luis Chamberlain:
 "The summary of the changes for this pull requests is:

   - Song Liu's new struct module_memory replacement

   - Nick Alcock's MODULE_LICENSE() removal for non-modules

   - My cleanups and enhancements to reduce the areas where we vmalloc
     module memory for duplicates, and the respective debug code which
     proves the remaining vmalloc pressure comes from userspace.

  Most of the changes have been in linux-next for quite some time except
  the minor fixes I made to check if a module was already loaded prior
  to allocating the final module memory with vmalloc and the respective
  debug code it introduces to help clarify the issue. Although the
  functional change is small it is rather safe as it can only *help*
  reduce vmalloc space for duplicates and is confirmed to fix a bootup
  issue with over 400 CPUs with KASAN enabled. I don't expect stable
  kernels to pick up that fix as the cleanups would have also had to
  have been picked up. Folks on larger CPU systems with modules will
  want to just upgrade if vmalloc space has been an issue on bootup.

  Given the size of this request, here's some more elaborate details:

  The functional change change in this pull request is the very first
  patch from Song Liu which replaces the 'struct module_layout' with a
  new 'struct module_memory'. The old data structure tried to put
  together all types of supported module memory types in one data
  structure, the new one abstracts the differences in memory types in a
  module to allow each one to provide their own set of details. This
  paves the way in the future so we can deal with them in a cleaner way.
  If you look at changes they also provide a nice cleanup of how we
  handle these different memory areas in a module. This change has been
  in linux-next since before the merge window opened for v6.3 so to
  provide more than a full kernel cycle of testing. It's a good thing as
  quite a bit of fixes have been found for it.

  Jason Baron then made dynamic debug a first class citizen module user
  by using module notifier callbacks to allocate / remove module
  specific dynamic debug information.

  Nick Alcock has done quite a bit of work cross-tree to remove module
  license tags from things which cannot possibly be module at my request
  so to:

   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area is
      active with no clear solution in sight.

   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags

  In so far as a) is concerned, although module license tags are a no-op
  for non-modules, tools which would want create a mapping of possible
  modules can only rely on the module license tag after the commit
  8b41fc4454 ("kbuild: create modules.builtin without
  Makefile.modbuiltin or tristate.conf").

  Nick has been working on this *for years* and AFAICT I was the only
  one to suggest two alternatives to this approach for tooling. The
  complexity in one of my suggested approaches lies in that we'd need a
  possible-obj-m and a could-be-module which would check if the object
  being built is part of any kconfig build which could ever lead to it
  being part of a module, and if so define a new define
  -DPOSSIBLE_MODULE [0].

  A more obvious yet theoretical approach I've suggested would be to
  have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
  well but that means getting kconfig symbol names mapping to modules
  always, and I don't think that's the case today. I am not aware of
  Nick or anyone exploring either of these options. Quite recently Josh
  Poimboeuf has pointed out that live patching, kprobes and BPF would
  benefit from resolving some part of the disambiguation as well but for
  other reasons. The function granularity KASLR (fgkaslr) patches were
  mentioned but Joe Lawrence has clarified this effort has been dropped
  with no clear solution in sight [1].

  In the meantime removing module license tags from code which could
  never be modules is welcomed for both objectives mentioned above. Some
  developers have also welcomed these changes as it has helped clarify
  when a module was never possible and they forgot to clean this up, and
  so you'll see quite a bit of Nick's patches in other pull requests for
  this merge window. I just picked up the stragglers after rc3. LWN has
  good coverage on the motivation behind this work [2] and the typical
  cross-tree issues he ran into along the way. The only concrete blocker
  issue he ran into was that we should not remove the MODULE_LICENSE()
  tags from files which have no SPDX tags yet, even if they can never be
  modules. Nick ended up giving up on his efforts due to having to do
  this vetting and backlash he ran into from folks who really did *not
  understand* the core of the issue nor were providing any alternative /
  guidance. I've gone through his changes and dropped the patches which
  dropped the module license tags where an SPDX license tag was missing,
  it only consisted of 11 drivers. To see if a pull request deals with a
  file which lacks SPDX tags you can just use:

    ./scripts/spdxcheck.py -f \
	$(git diff --name-only commid-id | xargs echo)

  You'll see a core module file in this pull request for the above, but
  that's not related to his changes. WE just need to add the SPDX
  license tag for the kernel/module/kmod.c file in the future but it
  demonstrates the effectiveness of the script.

  Most of Nick's changes were spread out through different trees, and I
  just picked up the slack after rc3 for the last kernel was out. Those
  changes have been in linux-next for over two weeks.

  The cleanups, debug code I added and final fix I added for modules
  were motivated by David Hildenbrand's report of boot failing on a
  systems with over 400 CPUs when KASAN was enabled due to running out
  of virtual memory space. Although the functional change only consists
  of 3 lines in the patch "module: avoid allocation if module is already
  present and ready", proving that this was the best we can do on the
  modules side took quite a bit of effort and new debug code.

  The initial cleanups I did on the modules side of things has been in
  linux-next since around rc3 of the last kernel, the actual final fix
  for and debug code however have only been in linux-next for about a
  week or so but I think it is worth getting that code in for this merge
  window as it does help fix / prove / evaluate the issues reported with
  larger number of CPUs. Userspace is not yet fixed as it is taking a
  bit of time for folks to understand the crux of the issue and find a
  proper resolution. Worst come to worst, I have a kludge-of-concept [3]
  of how to make kernel_read*() calls for modules unique / converge
  them, but I'm currently inclined to just see if userspace can fix this
  instead"

Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]

* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
  module: add debugging auto-load duplicate module support
  module: stats: fix invalid_mod_bytes typo
  module: remove use of uninitialized variable len
  module: fix building stats for 32-bit targets
  module: stats: include uapi/linux/module.h
  module: avoid allocation if module is already present and ready
  module: add debug stats to help identify memory pressure
  module: extract patient module check into helper
  modules/kmod: replace implementation with a semaphore
  Change DEFINE_SEMAPHORE() to take a number argument
  module: fix kmemleak annotations for non init ELF sections
  module: Ignore L0 and rename is_arm_mapping_symbol()
  module: Move is_arm_mapping_symbol() to module_symbol.h
  module: Sync code of is_arm_mapping_symbol()
  scripts/gdb: use mem instead of core_layout to get the module address
  interconnect: remove module-related code
  interconnect: remove MODULE_LICENSE in non-modules
  zswap: remove MODULE_LICENSE in non-modules
  zpool: remove MODULE_LICENSE in non-modules
  x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
  ...
2023-04-27 16:36:55 -07:00
Linus Torvalds cec24b8b6b Char/Misc drivers for 6.4-rc1
Here is the "big" set of char/misc and other driver subsystems for
 6.4-rc1.
 
 It's pretty big, but due to the removal of pcmcia drivers, almost breaks
 even for number of lines added vs. removed, a nice change.
 
 Included in here are:
   - removal of unused PCMCIA drivers (finally!)
   - Interconnect driver updates and additions
   - Lots of IIO driver updates and additions
   - MHI driver updates
   - Coresight driver updates
   - NVMEM driver updates, which required some OF updates
   - W1 driver updates and a new maintainer to manage the subsystem
   - FPGA driver updates
   - New driver subsystem, CDX, for AMD systems
   - lots of other small driver updates and additions
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp5Eg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynSXgCg0kSw3vUYwpsnhAsQkoPw1QVA23sAn2edRCMa
 GEkPWjrROueCom7xbLMu
 =eR+P
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc drivers updates from Greg KH:
 "Here is the "big" set of char/misc and other driver subsystems for
  6.4-rc1.

  It's pretty big, but due to the removal of pcmcia drivers, almost
  breaks even for number of lines added vs. removed, a nice change.

  Included in here are:

   - removal of unused PCMCIA drivers (finally!)

   - Interconnect driver updates and additions

   - Lots of IIO driver updates and additions

   - MHI driver updates

   - Coresight driver updates

   - NVMEM driver updates, which required some OF updates

   - W1 driver updates and a new maintainer to manage the subsystem

   - FPGA driver updates

   - New driver subsystem, CDX, for AMD systems

   - lots of other small driver updates and additions

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (196 commits)
  mcb-lpc: Reallocate memory region to avoid memory overlapping
  mcb-pci: Reallocate memory region to avoid memory overlapping
  mcb: Return actual parsed size when reading chameleon table
  kernel/configs: Drop Android config fragments
  virt: acrn: Replace obsolete memalign() with posix_memalign()
  spmi: Add a check for remove callback when removing a SPMI driver
  spmi: fix W=1 kernel-doc warnings
  spmi: mtk-pmif: Drop of_match_ptr for ID table
  spmi: pmic-arb: Convert to platform remove callback returning void
  spmi: mtk-pmif: Convert to platform remove callback returning void
  spmi: hisi-spmi-controller: Convert to platform remove callback returning void
  w1: gpio: remove unnecessary ENOMEM messages
  w1: omap-hdq: remove unnecessary ENOMEM messages
  w1: omap-hdq: add SPDX tag
  w1: omap-hdq: allow compile testing
  w1: matrox: remove unnecessary ENOMEM messages
  w1: matrox: use inline over __inline__
  w1: matrox: switch from asm to linux header
  w1: ds2482: do not use assignment in if condition
  w1: ds2482: drop unnecessary header
  ...
2023-04-27 12:07:50 -07:00
Linus Torvalds 556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Joerg Roedel e51b419839 Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next 2023-04-14 13:45:50 +02:00
Jason Gunthorpe f7f9c054a2 iommu: Remove iommu_group_get_by_id()
This is never called.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-60bbc66d7e92+24-rm_iommu_get_by_id_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-14 13:09:07 +02:00
Jason Gunthorpe e223864f82 iommu: Make iommu_release_device() static
This is not called outside the core code, and indeed cannot be called
correctly outside the bus notifier. Make it static.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-c3da18124d2d+56-rm_iommu_release_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-14 13:07:53 +02:00
Nick Alcock 48a3cbf1c5 iommu/sun50i: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: Samuel Holland <samuel@sholland.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: iommu@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-sunxi@lists.linux.dev
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-13 13:13:52 -07:00
Tina Zhang e60d63e32d iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
The dmar_insert_dev_scope() could fail if any unexpected condition is
encountered. However, in this situation, the kernel should attempt
recovery and proceed with execution. Remove BUG_ON with WARN_ON, so that
kernel can avoid being crashed when an unexpected condition occurs.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-8-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:53 +02:00
Tina Zhang ff45ab9646 iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
When dmar_alloc_pci_notify_info() is being invoked, the invoker has
ensured the dev->is_virtfn is false. So, remove the useless BUG_ON in
dmar_alloc_pci_notify_info().

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-7-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:53 +02:00
Tina Zhang cbf2f9e8ba iommu/vt-d: Remove BUG_ON in map/unmap()
Domain map/unmap with invalid parameters shouldn't crash the kernel.
Therefore, using if() replaces the BUG_ON.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-6-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang 998d4c2db3 iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
When performing domain_context_mapping or getting dma_pte of a pfn, the
availability of the domain page table directory is ensured. Therefore,
the domain->pgd checkings are unnecessary.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-5-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang 4a627a2593 iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
VT-d iotlb cache invalidation request with unexpected type is considered
as a bug to developers, which can be fixed. So, when such kind of issue
comes out, it needs to be reported through the kernel log, instead of
halting the system. Replacing BUG_ON with warning reporting.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-4-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang 35dc5d8998 iommu/vt-d: Remove BUG_ON on checking valid pfn range
When encountering an unexpected invalid pfn range, the kernel should
attempt recovery and proceed with execution. Therefore, using WARN_ON to
replace BUG_ON to avoid halting the machine.

Besides, one redundant checking is reduced.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-3-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang b31064f881 iommu/vt-d: Make size of operands same in bitwise operations
This addresses the following issue reported by klocwork tool:

 - operands of different size in bitwise operations

Suggested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-2-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:50 +02:00
Jacob Pan 113a031bec iommu/vt-d: Remove PASID supervisor request support
There's no more usage, remove PASID supervisor support.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-3-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:50 +02:00
Jacob Pan a7050fbde3 iommu/vt-d: Use non-privileged mode for all PASIDs
Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.

Let's use non-privileged (user) permission for all PASIDs used in kernel,
it will be consistent with DMA without PASID (RID_PASID) as well.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:49 +02:00
Lu Baolu a06c2ecec1 iommu/vt-d: Remove extern from function prototypes
The kernel coding style does not require 'extern' in function prototypes
in .h files, so remove them from drivers/iommu/intel/iommu.h as they are
not needed.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230331045452.500265-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:49 +02:00
Christophe JAILLET 41d71e09a1 iommu/vt-d: Do not use GFP_ATOMIC when not needed
There is no need to use GFP_ATOMIC here. GFP_KERNEL is already used for
some other memory allocations just a few lines above.

Commit e3a981d61d ("iommu/vt-d: Convert allocations to GFP_KERNEL") has
changed the other memory allocation flags.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e2a8a1019ffc8a86b4b4ed93def3623f60581274.1675542576.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu 7b8aa998d6 iommu/vt-d: Remove unnecessary checks in iopf disabling path
iommu_unregister_device_fault_handler() and iopf_queue_remove_device()
are called after device has stopped issuing new page falut requests and
all outstanding page requests have been drained. They should never fail.
Trigger a warning if it happens unfortunately.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu fbcde5bb92 iommu/vt-d: Move PRI handling to IOPF feature path
PRI is only used for IOPF. With this move, the PCI/PRI feature could be
controlled by the device driver through iommu_dev_enable/disable_feature()
interfaces.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu 5ae4008055 iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
They should be part of the per-device iommu private data initialization.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu 3d4c7cc3d1 iommu/vt-d: Move iopf code from SVA to IOPF enabling path
Generally enabling IOMMU_DEV_FEAT_SVA requires IOMMU_DEV_FEAT_IOPF, but
some devices manage I/O Page Faults themselves instead of relying on the
IOMMU. Move IOPF related code from SVA to IOPF enabling path.

For the device drivers that relies on the IOMMU for IOPF through PCI/PRI,
IOMMU_DEV_FEAT_IOPF must be enabled before and disabled after
IOMMU_DEV_FEAT_SVA.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu a86fb77173 iommu/vt-d: Allow SVA with device-specific IOPF
Currently enabling SVA requires IOPF support from the IOMMU and device
PCI PRI. However, some devices can handle IOPF by itself without ever
sending PCI page requests nor advertising PRI capability.

Allow SVA support with IOPF handled either by IOMMU (PCI PRI) or device
driver (device-specific IOPF). As long as IOPF could be handled, SVA
should continue to work.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:46 +02:00
Yong Wu f7da2da867 iommu/mediatek: Set dma_mask for the master devices
MediaTek iommu arranges dma ranges for all the masters, this patch is to
help them set dma mask. This is to avoid each master setting their own
mask, but also to avoid a real issue, such as JPEG uses
"mediatek,mtk-jpgenc" for 2701/8183/8186/8188, then JPEG could ignore its
different dma_mask in different SoC to achieve common code.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-10-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:27 +02:00
Yong Wu 3df9bdd4ae iommu/mediatek: Add a gap for the iova regions
As the removed property in the vcodec dt-binding, the property is:
dma-ranges = <0x1 0x0 0x0 0x40000000 0x0 0xfff00000>;

The length is 0xfff0_0000 rather than 0x1_0000_0000, this means it
requires 1M as a gap. This is because the end address for some vcodec
HW is (address + size). If the size is 4G, the end address may be
0x2_0000_0000, and the width for vcodec register only is 32, then the
HW may get the ZERO address.

Currently the consumer's dma-ranges property doesn't work, IOMMU
has to consider this case. Add a bigger gap(8M) for all the regions
to avoid it.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-9-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:26 +02:00
Yong Wu f5d4233ad3 iommu/mediatek: mt8186: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8186. We separate the 16GB iova regions
by each device's larbid/portid.
Note: larb5/6/10/12/14/15/18 connect nothing in this SoC.
Refer to include/dt-bindings/memory/mt8186-memory-port.h

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-8-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:26 +02:00
Yong Wu a43e767d4e iommu/mediatek: mt8195: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8195. We separate the 16GB iova regions
by each device's larbid/portid.
Refer to include/dt-bindings/memory/mt8195-memory-port.h

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-7-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:25 +02:00
Yong Wu 6b1317f928 iommu/mediatek: mt8192: Add iova_region_larb_msk
Add iova_region_larb_msk for mt8192. We separate the 16GB iova regions
by each device's larbid/portid.
Note: larb3/6/8/10/12/15 connect nothing in this SoC.
Refer to the comment in include/dt-bindings/memory/mt8192-larb-port.h

Define a new macro MT8192_MULTI_REGION_NR_MAX to indicate
the index of mt8xxx_larb_region_msk and
"struct mtk_iommu_iova_region mt8192_multi_dom"
are the same.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-6-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:25 +02:00
Yong Wu b2a6876d21 iommu/mediatek: Get regionid from larb/port id
After commit f1ad5338a4 ("of: Fix "dma-ranges" handling for bus
controllers"), the dma-ranges is not allowed for dts leaf node.
but we still would like to separate to different masters
into different iova regions.

Thus we have to separate it by the HW larbid and portid. For example,
larb1/2 are in region2 and larb3 is in region3. The problem is that
some ports inside a larb are in region4 while some ports inside this
larb are in region5. Therefore I define a "iova_region_larb_msk" to help
record the information for each a port. Take a example for a larb:
 [1] = ~0: means all ports in this larb are in region1;
 [2] = BIT(3) | BIT(4): means port3/4 in this larb are region2;
 [3] = ~(BIT(3) | BIT(4)): means all the other ports except port3/4
                           in this larb are region3.

This method also avoids the users forget/abuse the iova regions.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-5-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:24 +02:00
Yong Wu ae6693453a iommu/mediatek: Improve comment for the current region/bank
No functional change. Just add more comment about the current region/bank
in the code.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230411093144.2690-4-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:59:24 +02:00
Kishon Vijay Abraham I ccc62b8277 iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTE
commit b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code") while refactoring guest virtual APIC
activation/de-activation code, stored information for activate/de-activate
in "struct amd_ir_data". It used 32-bit integer data type for storing the
"Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the
"ga_root_ptr" is actually a 40-bit field in IRTE (Interrupt Remapping
Table Entry).

This causes interrupts from PCIe devices to not reach the guest in the case
of PCIe passthrough with SME (Secure Memory Encryption) enabled as _SME_
bit in the "ga_root_ptr" is lost before writing it to the IRTE.

Fix it by using 64-bit data type for storing the "ga_root_ptr". While at
that also change the data type of "ga_tag" to u32 in order to match
the IOMMU spec.

Fixes: b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Cc: stable@vger.kernel.org # v5.4+
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:57:30 +02:00
Jerry Snitselaar 8f880d19e6 iommu/amd: Set page size bitmap during V2 domain allocation
With the addition of the V2 page table support, the domain page size
bitmap needs to be set prior to iommu core setting up direct mappings
for reserved regions. When reserved regions are mapped, if this is not
done, it will be looking at the V1 page size bitmap when determining
the page size to use in iommu_pgsize(). When it gets into the actual
amd mapping code, a check of see if the page size is supported can
fail, because at that point it is checking it against the V2 page size
bitmap which only supports 4K, 2M, and 1G.

Add a check to __iommu_domain_alloc() to not override the
bitmap if it was already set by the iommu ops domain_alloc() code path.

Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Fixes: 4db6c41f09 ("iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230404072742.1895252-1-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:56:19 +02:00
Steven Price 25c2325575 iommu/rockchip: Add missing set_platform_dma_ops callback
Similar to exynos, we need a set_platform_dma_ops() callback for proper
operation on ARM 32 bit after recent changes in the IOMMU framework
(detach ops removal). But also the use of a NULL domain is confusing.

Rework the code to add support for IOMMU_DOMAIN_IDENTITY and a singleton
rk_identity_domain which is assigned to domain when using an identity
mapping rather than "detaching". This makes the code easier to reason about.

Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230331095154.2671129-1-steven.price@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:50:45 +02:00
Christophe JAILLET 5e799a7cee iommu/exynos: Use the devm_clk_get_optional() helper
Use devm_clk_get_optional() instead of hand writing it.
This saves some loC and improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/99c0d5ce643737ee0952df41fd60433a0bbeb447.1679834256.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 11:50:01 +02:00
Greg Kroah-Hartman 5790d407da Merge 6.3-rc6 into char-misc-next
We need it here to apply other char/misc driver changes to.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-10 08:49:26 +02:00
Kirill A. Shutemov 23baf831a3 mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and lead to number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:46 -07:00
Kirill A. Shutemov 61883d3c32 iommu: fix MAX_ORDER usage in __iommu_dma_alloc_pages()
MAX_ORDER is not inclusive: the maximum allocation order buddy allocator
can deliver is MAX_ORDER-1.

Fix MAX_ORDER usage in __iommu_dma_alloc_pages().

Also use GENMASK() instead of hard to read "(2U << order) - 1" magic.

Link: https://lkml.kernel.org/r/20230315113133.11326-10-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:46 -07:00
Jason Gunthorpe 692d42d411 Merge branch 'iommufd/for-rc' into for-next
The following selftest patch requires both the bug fixes and the
improvements of the selftest framework.

* iommufd/for-rc:
  iommufd: Do not corrupt the pfn list when doing batch carry
  iommufd: Fix unpinning of pages when an access is present
  iommufd: Check for uptr overflow
  Linux 6.3-rc5

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 11:04:30 -03:00
Tom Rix c52159b5be iommufd/selftest: Set varaiable mock_iommu_device storage-class-specifier to static
smatch reports:

drivers/iommu/iommufd/selftest.c:295:21: warning: symbol
  'mock_iommu_device' was not declared. Should it be static?

This variable is only used in one file so it should be static.

Fixes: 65c619ae06 ("iommufd/selftest: Make selftest create a more complete mock device")
Link: https://lore.kernel.org/r/20230404002317.1912530-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 11:02:39 -03:00
Jason Gunthorpe 13a0d1ae7e iommufd: Do not corrupt the pfn list when doing batch carry
If batch->end is 0 then setting npfns[0] before computing the new value of
pfns will fail to adjust the pfn and result in various page accounting
corruptions. It should be ordered after.

This seems to result in various kinds of page meta-data corruption related
failures:

  WARNING: CPU: 1 PID: 527 at mm/gup.c:75 try_grab_folio+0x503/0x740
  Modules linked in:
  CPU: 1 PID: 527 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:try_grab_folio+0x503/0x740
  Code: e3 01 48 89 de e8 6d c1 dd ff 48 85 db 0f 84 7c fe ff ff e8 4f bf dd ff 49 8d 47 ff 48 89 45 d0 e9 73 fe ff ff e8 3d bf dd ff <0f> 0b 31 db e9 d0 fc ff ff e8 2f bf dd ff 48 8b 5d c8 31 ff 48 89
  RSP: 0018:ffffc90000f37908 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 00000000fffffc02 RCX: ffffffff81504c26
  RDX: 0000000000000000 RSI: ffff88800d030000 RDI: 0000000000000002
  RBP: ffffc90000f37948 R08: 000000000003ca24 R09: 0000000000000008
  R10: 000000000003ca00 R11: 0000000000000023 R12: ffffea000035d540
  R13: 0000000000000001 R14: 0000000000000000 R15: ffffea000035d540
  FS:  00007fecbf659740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000200011c3 CR3: 000000000ef66006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   internal_get_user_pages_fast+0xd32/0x2200
   pin_user_pages_fast+0x65/0x90
   pfn_reader_user_pin+0x376/0x390
   pfn_reader_next+0x14a/0x7b0
   pfn_reader_first+0x140/0x1b0
   iopt_area_fill_domain+0x74/0x210
   iopt_table_add_domain+0x30e/0x6e0
   iommufd_device_selftest_attach+0x7f/0x140
   iommufd_test+0x10ff/0x16f0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: f394576eb1 ("iommufd: PFN handling for iopt_pages")
Link: https://lore.kernel.org/r/3-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
Jason Gunthorpe 727c28c1ce iommufd: Fix unpinning of pages when an access is present
syzkaller found that the calculation of batch_last_index should use
'start_index' since at input to this function the batch is either empty or
it has already been adjusted to cross any accesses so it will start at the
point we are unmapping from.

Getting this wrong causes the unmap to run over the end of the pages
which corrupts pages that were never mapped. In most cases this triggers
the num pinned debugging:

  WARNING: CPU: 0 PID: 557 at drivers/iommu/iommufd/pages.c:294 __iopt_area_unfill_domain+0x152/0x560
  Modules linked in:
  CPU: 0 PID: 557 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__iopt_area_unfill_domain+0x152/0x560
  Code: d2 0f ff 44 8b 64 24 54 48 8b 44 24 48 31 ff 44 89 e6 48 89 44 24 38 e8 fc d3 0f ff 45 85 e4 0f 85 eb 01 00 00 e8 0e d2 0f ff <0f> 0b e8 07 d2 0f ff 48 8b 44 24 38 89 5c 24 58 89 18 8b 44 24 54
  RSP: 0018:ffffc9000108baf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff821e3f85
  RDX: 0000000000000000 RSI: ffff88800faf0000 RDI: 0000000000000002
  RBP: ffffc9000108bd18 R08: 000000000003ca25 R09: 0000000000000014
  R10: 000000000003ca00 R11: 0000000000000024 R12: 0000000000000004
  R13: 0000000000000801 R14: 00000000000007ff R15: 0000000000000800
  FS:  00007f3499ce1740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000243 CR3: 00000000179c2001 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   iopt_area_unfill_domain+0x32/0x40
   iopt_table_remove_domain+0x23f/0x4c0
   iommufd_device_selftest_detach+0x3a/0x90
   iommufd_selftest_destroy+0x55/0x70
   iommufd_object_destroy_user+0xce/0x130
   iommufd_destroy+0xa2/0xc0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Also add some useful WARN_ON sanity checks.

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
Jason Gunthorpe e439570133 iommufd: Check for uptr overflow
syzkaller found that setting up a map with a user VA that wraps past zero
can trigger WARN_ONs, particularly from pin_user_pages weirdly returning 0
due to invalid arguments.

Prevent creating a pages with a uptr and size that would math overflow.

  WARNING: CPU: 0 PID: 518 at drivers/iommu/iommufd/pages.c:793 pfn_reader_user_pin+0x2e6/0x390
  Modules linked in:
  CPU: 0 PID: 518 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:pfn_reader_user_pin+0x2e6/0x390
  Code: b1 11 e9 25 fe ff ff e8 28 e4 0f ff 31 ff 48 89 de e8 2e e6 0f ff 48 85 db 74 0a e8 14 e4 0f ff e9 4d ff ff ff e8 0a e4 0f ff <0f> 0b bb f2 ff ff ff e9 3c ff ff ff e8 f9 e3 0f ff ba 01 00 00 00
  RSP: 0018:ffffc90000f9fa30 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff821e2b72
  RDX: 0000000000000000 RSI: ffff888014184680 RDI: 0000000000000002
  RBP: ffffc90000f9fa78 R08: 00000000000000ff R09: 0000000079de6f4e
  R10: ffffc90000f9f790 R11: ffff888014185418 R12: ffffc90000f9fc60
  R13: 0000000000000002 R14: ffff888007879800 R15: 0000000000000000
  FS:  00007f4227555740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000043 CR3: 000000000e748005 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   pfn_reader_next+0x14a/0x7b0
   ? interval_tree_double_span_iter_update+0x11a/0x140
   pfn_reader_first+0x140/0x1b0
   iopt_pages_rw_slow+0x71/0x280
   ? __this_cpu_preempt_check+0x20/0x30
   iopt_pages_rw_access+0x2b2/0x5b0
   iommufd_access_rw+0x19f/0x2f0
   iommufd_test+0xd11/0x16f0
   ? write_comp_data+0x2f/0x90
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   ? __pfx_iommufd_fops_ioctl+0x10/0x10
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/1-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
Greg Kroah-Hartman cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Jason Gunthorpe 9fdf791612 Merge branch 'vfio_mdev_ops' into iommufd.git for-next
Yi Liu says

===================
The .bind_iommufd op of vfio emulated devices are either empty or does
nothing. This is different with the vfio physical devices, to add vfio
device cdev, need to make them act the same.

This series first makes the .bind_iommufd op of vfio emulated devices to
create iommufd_access, this introduces a new iommufd API. Then let the
driver that does not provide .bind_iommufd op to use the vfio emulated
iommufd op set. This makes all vfio device drivers have consistent iommufd
operations, which is good for adding new device uAPIs in the device cdev
===================

* branch 'vfio_mdev_ops':
  vfio: Check the presence for iommufd callbacks in __vfio_register_dev()
  vfio/mdev: Uses the vfio emulated iommufd ops set in the mdev sample drivers
  vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
  vfio-iommufd: No need to record iommufd_ctx in vfio_device
  iommufd: Create access in vfio_iommufd_emulated_bind()
  iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:57 -03:00
Yi Liu 632fda7f91 vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
vfio device cdev needs to return iommufd_access ID to userspace if
bind_iommufd succeeds.

Link: https://lore.kernel.org/r/20230327093351.44505-5-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:32 -03:00
Nicolin Chen 54b47585db iommufd: Create access in vfio_iommufd_emulated_bind()
There are needs to created iommufd_access prior to have an IOAS and set
IOAS later. Like the vfio device cdev needs to have an iommufd object
to represent the bond (iommufd_access) and IOAS replacement.

Moves the iommufd_access_create() call into vfio_iommufd_emulated_bind(),
making it symmetric with the __vfio_iommufd_access_destroy() call in the
vfio_iommufd_emulated_unbind(). This means an access is created/destroyed
by the bind()/unbind(), and the vfio_iommufd_emulated_attach_ioas() only
updates the access->ioas pointer.

Since vfio_iommufd_emulated_bind() does not provide ioas_id, drop it from
the argument list of iommufd_access_create(). Instead, add a new access
API iommufd_access_attach() to set the access->ioas pointer. Also, set
vdev->iommufd_attached accordingly, similar to the physical pathway.

Link: https://lore.kernel.org/r/20230327093351.44505-3-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:31 -03:00
Chunyan Zhang 816c698c05 iommu/sprd: Add support for reattaching an existing domain
This IOMMU driver should allow a domain to be attached more than once.

If IOMMU is reattaching to the same domain which is attached, there's
nothing to be done.

If reattching to a previously-used domain, do not alloc DMA buffer
again which stores address mapping table to avoid memory leak.

Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Link: https://lore.kernel.org/r/20230331033124.864691-3-zhang.lyra@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:13:02 +02:00
Chunyan Zhang 9afea57384 iommu/sprd: Release dma buffer to avoid memory leak
When attaching to a domain, the driver would alloc a DMA buffer which
is used to store address mapping table, and it need to be released
when the IOMMU domain is freed.

Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Link: https://lore.kernel.org/r/20230331033124.864691-2-zhang.lyra@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:12:51 +02:00
Kan Liang 16812c9655 iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online

 ------------[ cut here ]------------
 Voluntary context switch within RCU read-side critical section!
 WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
          rcu_note_context_switch+0x4f4/0x580
 RIP: 0010:rcu_note_context_switch+0x4f4/0x580
 Call Trace:
  <TASK>
  ? perf_event_update_userpage+0x104/0x150
  __schedule+0x8d/0x960
  ? perf_event_set_state.part.82+0x11/0x50
  schedule+0x44/0xb0
  schedule_timeout+0x226/0x310
  ? __perf_event_disable+0x64/0x1a0
  ? _raw_spin_unlock+0x14/0x30
  wait_for_completion+0x94/0x130
  __wait_rcu_gp+0x108/0x130
  synchronize_rcu+0x67/0x70
  ? invoke_rcu_core+0xb0/0xb0
  ? __bpf_trace_rcu_stall_warning+0x10/0x10
  perf_pmu_migrate_context+0x121/0x370
  iommu_pmu_cpu_offline+0x6a/0xa0
  ? iommu_pmu_del+0x1e0/0x1e0
  cpuhp_invoke_callback+0x129/0x510
  cpuhp_thread_fun+0x94/0x150
  smpboot_thread_fn+0x183/0x220
  ? sort_range+0x20/0x20
  kthread+0xe6/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>
 ---[ end trace 0000000000000000 ]---

The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.

Two methods were considered to fix the issue.
- Use the dmar_global_lock to replace the RCU read lock when going
  through the drhd list. But it triggers a lockdep warning.
- Use the cpuhp_setup_state_multi() to set up a dedicated state for each
  IOMMU PMU. The lock can be avoided.

The latter method is implemented in this patch. Since each IOMMU PMU has
a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
the state. The state can be dynamically allocated now. Remove the
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.

Fixes: 46284c6ceb ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:16 +02:00
Lu Baolu bfd3c6b9fa iommu/vt-d: Allow zero SAGAW if second-stage not supported
The VT-d spec states (in section 11.4.2) that hardware implementations
reporting second-stage translation support (SSTS) field as Clear also
report the SAGAW field as 0. Fix an inappropriate check in alloc_iommu().

Fixes: 792fb43ce2 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
Suggested-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230318024824.124542-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:15 +02:00
Lu Baolu c7d624520c iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.3.0-rc2 #468 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
   at: __irq_domain_alloc_irqs+0x3b/0xa0

 but task is already holding lock:
 ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
   at: intel_iommu_init+0x58e/0x880

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (dmar_global_lock){++++}-{3:3}:
        lock_acquire+0xd6/0x320
        down_read+0x42/0x180
        intel_irq_remapping_alloc+0xad/0x750
        mp_irqdomain_alloc+0xb8/0x2b0
        irq_domain_alloc_irqs_locked+0x12f/0x2d0
        __irq_domain_alloc_irqs+0x56/0xa0
        alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
        mp_map_pin_to_irq+0x1dc/0x330
        setup_IO_APIC+0x128/0x210
        apic_intr_mode_init+0x67/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x41e/0x7e0
        secondary_startup_64_no_verify+0xe0/0xeb

 -> #0 (&domain->mutex){+.+.}-{3:3}:
        check_prevs_add+0x160/0xef0
        __lock_acquire+0x147d/0x1950
        lock_acquire+0xd6/0x320
        __mutex_lock+0x9c/0xfc0
        __irq_domain_alloc_irqs+0x3b/0xa0
        dmar_alloc_hwirq+0x9e/0x120
        iommu_pmu_register+0x11d/0x200
        intel_iommu_init+0x5de/0x880
        pci_iommu_init+0x12/0x40
        do_one_initcall+0x65/0x350
        kernel_init_freeable+0x3ca/0x610
        kernel_init+0x1a/0x140
        ret_from_fork+0x29/0x50

 other info that might help us debug this:

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(dmar_global_lock);
                                lock(&domain->mutex);
                                lock(dmar_global_lock);
   lock(&domain->mutex);

                *** DEADLOCK ***

Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:06:15 +02:00
Jason Gunthorpe 99b5726b44 iommu: Remove ioasid infrastructure
This has no use anymore, delete it all.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:31 +02:00
Jacob Pan fffaed1e24 iommu/ioasid: Rename INVALID_IOASID
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename
INVALID_IOASID and consolidate since we are moving away from IOASID
infrastructure.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:27 +02:00
Jacob Pan 1a14bf0fc7 iommu/sva: Use GFP_KERNEL for pasid allocation
We’re not using spinlock-protected IOASID allocation anymore, there’s
no need for GFP_ATOMIC.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-6-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:26 +02:00
Jason Gunthorpe 4e14176ab1 iommu/sva: Stop using ioasid_set for SVA
Instead SVA drivers can use a simple global IDA to allocate PASIDs for
each mm_struct.

Future work would be to allow drivers using the SVA APIs to reserve global
PASIDs from this IDA for their internal use, eg with the DMA API PASID
support.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-5-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:25 +02:00
Jacob Pan 2bef9ba8ae iommu/sva: Remove PASID to mm lookup function
There is no user of iommu_sva_find() function, remove it so that PASID
allocator can be a simple IDA. Device drivers are expected to store
and keep track of their own PASID metadata.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-4-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:25 +02:00
Jacob Pan cd3891158a iommu/sva: Move PASID helpers to sva code
Preparing to remove IOASID infrastructure, PASID management will be
under SVA code. Decouple mm code from IOASID.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-3-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:23 +02:00
Jacob Pan 760f41d182 iommu/vt-d: Remove virtual command interface
Virtual command interface was introduced to allow using host PASIDs
inside VMs. It is unused and abandoned due to architectural change.

With this patch, we can safely remove this feature and the related helpers.

Link: https://lore.kernel.org/r/20230210230206.3160144-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:21 +02:00
Uwe Kleine-König 421b6093f5 iommu/sprd: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:59 +02:00
Uwe Kleine-König 5930df68ae iommu/omap: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-10-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:59 +02:00
Uwe Kleine-König 85e1049e50 iommu/mtk_iommu_v1: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230321084125.337021-9-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:58 +02:00
Uwe Kleine-König d8149d3929 iommu/mtk: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230321084125.337021-8-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:58 +02:00
Uwe Kleine-König 816a4afce1 iommu/msm: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-7-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:57 +02:00
Uwe Kleine-König 7471ea50ea iommu/ipmmu-vmsa: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:57 +02:00
Uwe Kleine-König 62565a77c2 iommu/arm-smmu: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-5-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:56 +02:00
Uwe Kleine-König 66c7076f76 iommu/arm-smmu-v3: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:56 +02:00
Uwe Kleine-König f80473183b iommu/apple-dart: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230321084125.337021-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:55 +02:00
Uwe Kleine-König a2972cb899 iommu/arm-smmu: Drop if with an always false condition
The remove and shutdown callback are only called after probe completed
successfully. In this case platform_set_drvdata() was called with a
non-NULL argument and so smmu is never NULL. Other functions in this
driver also don't check for smmu being non-NULL before using it.

Also note that returning an error code from a remove callback doesn't
result in the device staying bound. It's still removed and devm allocated
resources are freed (among others *smmu and the register mapping). So
after an early exit to iommu device stayed around and using it probably
oopses.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230321084125.337021-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:01:55 +02:00
Tomas Krcka 67ea0b7ce4 iommu/arm-smmu-v3: Acknowledge pri/event queue overflow if any
When an overflow occurs in the PRI queue, the SMMU toggles the overflow
flag in the PROD register. To exit the overflow condition, the PRI thread
is supposed to acknowledge it by toggling this flag in the CONS register.
Unacknowledged overflow causes the queue to stop adding anything new.

Currently, the priq thread always writes the CONS register back to the
SMMU after clearing the queue.

The writeback is not necessary if the OVFLG in the PROD register has not
been changed, no overflow has occured.

This commit checks the difference of the overflow flag between CONS and
PROD register. If it's different, toggles the OVACKFLG flag in the CONS
register and write it to the SMMU.

The situation is similar for the event queue.
The acknowledge register is also toggled after clearing the event
queue but never propagated to the hardware. This would only be done the
next time when executing evtq thread.

Unacknowledged event queue overflow doesn't affect the event
queue, because the SMMU still adds elements to that queue when the
overflow condition is active.
But it feel nicer to keep SMMU in sync when possible, so use the same
way here as well.

Signed-off-by: Tomas Krcka <krckatom@amazon.de>
Link: https://lore.kernel.org/r/20230329123420.34641-1-tomas.krcka@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-03-30 16:05:05 +01:00
Yi Liu 325de95029 iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
No need to pass the iommufd_ucmd pointer.

Link: https://lore.kernel.org/r/20230327093351.44505-2-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-29 16:52:41 -03:00
Nipun Gupta 3f47d3e44d iommu: Add iommu probe for CDX bus
Add CDX bus to iommu_buses so that IOMMU probe is called
for it.

Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20230313132636.31850-3-nipun.gupta@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29 12:26:32 +02:00
Marek Szyprowski f91bf3272a iommu/exynos: Fix set_platform_dma_ops() callback
There are some subtle differences between release_device() and
set_platform_dma_ops() callbacks, so separate those two callbacks. Device
links should be removed only in release_device(), because they were
created in probe_device() on purpose and they are needed for proper
Exynos IOMMU driver operation. While fixing this, remove the conditional
code as it is not really needed.

Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 189d496b48 ("iommu/exynos: Add missing set_platform_dma_ops callback")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Link: https://lore.kernel.org/r/20230315232514.1046589-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-28 15:33:50 +02:00
Vasant Hegde f594496403 iommu/amd: Add 5 level guest page table support
Newer AMD IOMMU supports 5 level guest page table (v2 page table). If both
processor and IOMMU supports 5 level page table then enable it. Otherwise
fall back to 4 level page table.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230310090000.1117786-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-28 15:31:31 +02:00
Yong Wu f045e9df65 iommu/mediatek: Set dma_mask for PGTABLE_PA_35_EN
When we enable PGTABLE_PA_35_EN, the PA for pgtable may be 35bits.
Thus add dma_mask for it.

Fixes: 301c3ca125 ("iommu/mediatek: Allow page table PA up to 35bit")
Signed-off-by: Chengci.Xu <chengci.xu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230316101445.12443-1-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-28 15:18:13 +02:00
Manivannan Sadhasivam 1226113473 iommu/arm-smmu-qcom: Limit the SMR groups to 128
Some platforms support more than 128 stream matching groups than what is
defined by the ARM SMMU architecture specification. But due to some unknown
reasons, those additional groups don't exhibit the same behavior as the
architecture supported ones.

For instance, the additional groups will not detect the quirky behavior of
some firmware versions intercepting writes to S2CR register, thus skipping
the quirk implemented in the driver and causing boot crash.

So let's limit the groups to 128 for now until the issue with those groups
are fixed and issue a notice to users in that case.

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20230327080029.11584-1-manivannan.sadhasivam@linaro.org
[will: Reworded the comment slightly]
Signed-off-by: Will Deacon <will@kernel.org>
2023-03-27 13:39:48 +01:00
Jean-Philippe Brucker 8c153645fa iommu/arm-smmu-v3: Explain why ATS stays disabled with bypass
The SMMU does not support enabling ATS for a bypass stream. Add a comment.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20230321100559.341981-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2023-03-27 13:22:23 +01:00
Greg Kroah-Hartman b18d0a0f92 iommu: make the pointer to struct bus_type constant
A number of iommu functions take a struct bus_type * and never modify
the data passed in, so make them all const * as that is what the driver
core is expecting to have passed into as well.

This is a step toward making all struct bus_type pointers constant in
the kernel.

Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20230313182918.1312597-34-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23 13:21:54 +01:00
Lu Baolu c33fcc13ee iommu: Use sysfs_emit() for sysfs show
Use sysfs_emit() instead of the sprintf() for sysfs entries. sysfs_emit()
knows the maximum of the temporary buffer used for outputting sysfs
content and avoids overrunning the buffer length.

Prefer 'long long' over 'long long int' as suggested by checkpatch.pl.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322123421.278852-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:47:10 +01:00
Lu Baolu 4c8444f19e iommu: Cleanup iommu_change_dev_def_domain()
As the singleton group limitation has been removed, cleanup the code
in iommu_change_dev_def_domain() accordingly.

Documentation is also updated.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:17 +01:00
Lu Baolu 49a22aae7d iommu: Replace device_lock() with group->mutex
device_lock() was used in iommu_group_store_type() to prevent the
devices in an iommu group from being attached by any device driver.
On the other hand, in order to avoid lock race between group->mutex
and device_lock(), it limited the usage scenario to the singleton
groups.

We already have the DMA ownership scheme to avoid driver attachment
and group->mutex ensures that device ops are always valid, there's
no need for device_lock() anymore. Remove device_lock() and the
singleton group limitation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:17 +01:00
Lu Baolu 33793748de iommu: Move lock from iommu_change_dev_def_domain() to its caller
The intention is to make it possible to put group ownership check and
default domain change in a same critical region protected by the group's
mutex lock. No intentional functional change.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:17 +01:00
Lu Baolu dba9ca9d41 iommu: Same critical region for device release and removal
In a non-driver context, it is crucial to ensure the consistency of a
device's iommu ops. Otherwise, it may result in a situation where a
device is released but it's iommu ops are still used.

Put the ops->release_device and __iommu_group_remove_device() in a same
group->mutext critical region, so that, as long as group->mutex is held
and the device is in its group's device list, its iommu ops are always
consistent. Add check of group ownership if the released device is the
last one.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:16 +01:00
Lu Baolu 293f2564f3 iommu: Split iommu_group_remove_device() into helpers
So that code could be re-used by iommu_release_device() in the subsequent
change. No intention for functionality change.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:16 +01:00
Lu Baolu 24dfb197c3 iommu/ipmmu-vmsa: Call arm_iommu_release_mapping() in release path
In the iommu driver's release_device operation, the driver should detach
the device from any attached domain and release the resources allocated
in the probe_device and probe_finalize paths.

Replace arm_iommu_detach_device() with arm_iommu_release_mapping() in the
release path of the ipmmu-vmsa driver. The device_release callback is
called in device_del(), this device is not coming back. Zeroing out
pointers and testing for a condition which cannot be true by construction
is simply a waste of time and code.

The bonus is that it also removes a obstacle of arm_iommu_detach_device()
re-entering the iommu core during release_device. With this removed, the
iommu core code could be simplified a lot.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/linux-iommu/7b248ba1-3967-5cd8-82e9-0268c706d320@arm.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322064956.263419-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:45:15 +01:00
Vasant Hegde 4d4a0dbab2 iommu/amd: Allocate IOMMU irqs using numa locality info
Use numa information to allocate irq resources and also to set
irq affinity. This optimizes the IOMMU interrupt handling.

Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230321092348.6127-3-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:43:40 +01:00
Vasant Hegde 0d571dcbe7 iommu/amd: Allocate page table using numa locality info
Introduce 'struct protection_domain->nid' variable. It will contain
IOMMU NUMA node ID. And allocate page table pages using IOMMU numa
locality info. This optimizes page table walk by IOMMU.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230321092348.6127-2-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:43:39 +01:00
Rob Herring 0c04316461 iommu/omap: Use of_property_read_bool() for boolean properties
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230310144709.1542980-1-robh@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:56:21 +01:00
Rob Herring a6c9e3874e iommu: Use of_property_present() for testing DT property presence
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230310144709.1542910-1-robh@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:54:42 +01:00
Geert Uytterhoeven 1b0b5f50dc iommu: Spelling s/cpmxchg64/cmpxchg64/
Fix misspellings of "cmpxchg64"

Fixes: d286a58bc8 ("iommu: Tidy up io-pgtable dependencies")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/eab156858147249d44463662eb9192202c39ab9f.1678295792.git.geert+renesas@glider.be
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:52:16 +01:00
Randy Dunlap 829a79556f iommu/fsl: fix all kernel-doc warnings in fsl_pamu.c
Fix kernel-doc warnings as reported by the kernel test robot:

fsl_pamu.c:192: warning: expecting prototype for pamu_config_paace(). Prototype was for pamu_config_ppaace() instead
fsl_pamu.c:239: warning: Function parameter or member 'omi_index' not described in 'get_ome_index'
fsl_pamu.c:239: warning: Function parameter or member 'dev' not described in 'get_ome_index'
fsl_pamu.c:332: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Setup operation mapping and stash destinations for QMAN and QMAN portal.
fsl_pamu.c:361: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Setup the operation mapping table for various devices. This is a static

Fixes: 695093e38c ("iommu/fsl: Freescale PAMU driver and iommu implementation.")
Fixes: cd70d4659f ("iommu/fsl: Various cleanups")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202302281151.B1WtZvSC-lkp@intel.com
Cc: Aditya Srivastava <yashsri421@gmail.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Cc: Timur Tabi <timur@tabi.org>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Cc: Emil Medve <Emilian.Medve@Freescale.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Timur Tabi <timur@kernel.org>
Link: https://lore.kernel.org/r/20230308034504.9985-1-rdunlap@infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:50:15 +01:00
Wolfram Sang efe37fda9d iommu/ipmmu-vmsa: remove R-Car H3 ES1.* handling
R-Car H3 ES1.* was only available to an internal development group and
needed a lot of quirks and workarounds. These become a maintenance
burden now, so our development group decided to remove upstream support
and disable booting for this SoC. Public users only have ES2 onwards.

Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20230307163041.3815-2-wsa+renesas@sang-engineering.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:28:01 +01:00
Nick Alcock 08632365b2 iommu/sun50i: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: Samuel Holland <samuel@sholland.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: iommu@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-sunxi@lists.linux.dev
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20230224150811.80316-8-nick.alcock@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:26:08 +01:00
Thomas Weißschuh aa977833de iommu: Make kobj_type structure constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230214-kobj_type-iommu-v1-1-e7392834b9d0@weissschuh.net
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 14:19:04 +01:00
Kirill A. Shutemov 23e5d9ec2b x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect
canonical addresses. An attempt to pass down tagged pointer will lead
to address translation failure.

By default do not allow to enable both LAM and use SVA in the same
process.

The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation.
By using the arch_prctl() userspace takes responsibility to never pass
tagged address to the device.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Kirill A. Shutemov 400b9b9344 iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
Kernel has few users of pasid_valid() and all but one checks if the
process has PASID allocated. The helper takes ioasid_t as the input.

Replace the helper with mm_valid_pasid() that takes mm_struct as the
argument. The only call that checks PASID that is not tied to mm_struct
is open-codded now.

This is preparatory patch. It helps avoid ifdeffery: no need to
dereference mm->pasid in generic code to check if the process has PASID.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-11-kirill.shutemov%40linux.intel.com
2023-03-16 13:08:40 -07:00
Jason Gunthorpe fd8c1a4aee iommufd/selftest: Catch overflow of uptr and length
syzkaller hits a WARN_ON when trying to have a uptr close to UINTPTR_MAX:

  WARNING: CPU: 1 PID: 393 at drivers/iommu/iommufd/selftest.c:403 iommufd_test+0xb19/0x16f0
  Modules linked in:
  CPU: 1 PID: 393 Comm: repro Not tainted 6.2.0-c9c3395d5e3d #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:iommufd_test+0xb19/0x16f0
  Code: 94 c4 31 ff 44 89 e6 e8 a5 54 17 ff 45 84 e4 0f 85 bb 0b 00 00 41 be fb ff ff ff e8 31 53 17 ff e9 a0 f7 ff ff e8 27 53 17 ff <0f> 0b 41 be 8
  RSP: 0018:ffffc90000eabdc0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8214c487
  RDX: 0000000000000000 RSI: ffff88800f5c8000 RDI: 0000000000000002
  RBP: ffffc90000eabe48 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: 00000000cd2b0000
  R13: 00000000cd2af000 R14: 0000000000000000 R15: ffffc90000eabe68
  FS:  00007f94d76d5740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000043 CR3: 0000000006880006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? write_comp_data+0x2f/0x90
   iommufd_fops_ioctl+0x1ef/0x310
   __x64_sys_ioctl+0x10e/0x160
   ? __pfx_iommufd_fops_ioctl+0x10/0x10
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Check that the user memory range doesn't overflow.

Fixes: f4b20bb34c ("iommufd: Add kernel support for testing iommufd")
Link: https://lore.kernel.org/r/0-v1-95390ed1df8d+8f-iommufd_mock_overflow_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lore.kernel.org/r/Y/hOiilV1wJvu/Hv@xpf.sh.intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-10 15:29:59 -04:00
Jason Gunthorpe 65c619ae06 iommufd/selftest: Make selftest create a more complete mock device
iommufd wants to use more infrastructure, like the iommu_group, that the
mock device does not support. Create a more complete mock device that can
go through the whole cycle of ownership, blocking domain, and has an
iommu_group.

This requires creating a real struct device on a real bus to be able to
connect it to a iommu_group. Unfortunately we cannot formally attach the
mock iommu driver as an actual driver as the iommu core does not allow
more than one driver or provide a general way for busses to link to
iommus. This can be solved with a little hack to open code the dev_iommus
struct.

With this infrastructure things work exactly the same as the normal domain
path, including the auto domains mechanism and direct attach of hwpts.  As
the created hwpt is now an autodomain it is no longer required to destroy
it and trying to do so will trigger a failure.

Link: https://lore.kernel.org/r/11-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 13:06:11 -04:00
Jason Gunthorpe 2cfdeaa07b iommufd/selftest: Rename the sefltest 'device_id' to 'stdev_id'
It is too confusing now that we have the 'dev_id' as part of the main
interface. Make it clear this is the special selftest device object. This
object is analogous to the VFIO device FD.

Link: https://lore.kernel.org/r/7-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:58 -04:00
Jason Gunthorpe 339fbf3ae1 iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
The HWPT is always linked to an IOAS and once a HWPT exists its domain
should be fully mapped. This ended up being split up into device.c during
a two phase creation that was a bit confusing.

Move the iopt_table_add_domain() into iommufd_hw_pagetable_alloc() by
having it call back to device.c to complete the domain attach in the
required order.

Calling iommufd_hw_pagetable_alloc() with immediate_attach = false will
work on most drivers, but notably the SMMU drivers will fail because they
can't decide what kind of domain to create until they are attached. This
will be fixed when the domain_alloc function can take in a struct device.

Link: https://lore.kernel.org/r/6-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe 7e7ec8a569 iommufd: Move iommufd_device to iommufd_private.h
hw_pagetable.c will need this in the next patches.

Link: https://lore.kernel.org/r/5-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe 25cde97d95 iommufd: Move ioas related HWPT destruction into iommufd_hw_pagetable_destroy()
A HWPT is permanently associated with an IOAS when it is created, remove
the strange situation where a refcount != 0 HWPT can have been
disconnected from the IOAS by putting all the IOAS related destruction in
the object destroy function.

Initializing a HWPT is two stages, we have to allocate it, attach it to a
device and then populate the domain. Once the domain is populated it is
fully linked to the IOAS.

Arrange things so that all the error unwinds flow through the
iommufd_hw_pagetable_destroy() and allow it to handle all cases.

Link: https://lore.kernel.org/r/4-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe 342b9cab8e iommufd: Consistently manage hwpt_item
This should be added immediately after every iopt_table_add_domain(), and
deleted after every iopt_table_remove_domain() under the ioas->mutex.

Tidy things to be consistent.

Link: https://lore.kernel.org/r/3-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe 7214c1c85f iommufd: Add iommufd_lock_obj() around the auto-domains hwpts
A later patch will require this locking - currently under the ioas mutex
the hwpt can not have a 0 reference and be on the list.

Link: https://lore.kernel.org/r/2-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:56 -04:00
Jason Gunthorpe 085fcc7eb7 iommufd: Assert devices_lock for iommufd_hw_pagetable_has_group()
The hwpt->devices list is locked by this, make it clearer.

Link: https://lore.kernel.org/r/1-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:56 -04:00
Linus Torvalds 11c7052998 ARM: SoC drivers for 6.3
As usual, there are lots of minor driver changes across SoC platforms
 from  NXP, Amlogic, AMD Zynq, Mediatek, Qualcomm, Apple and Samsung.
 These usually add support for additional chip variations in existing
 drivers, but also add features or bugfixes.
 
 The SCMI firmware subsystem gains a unified raw userspace interface
 through debugfs, which can be used for validation purposes.
 
 Newly added drivers include:
 
  - New power management drivers for StarFive JH7110, Allwinner D1 and
    Renesas RZ/V2M
 
  - A driver for Qualcomm battery and power supply status
 
  - A SoC device driver for identifying Nuvoton WPCM450 chips
 
  - A regulator coupler driver for Mediatek MT81xxv
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPtSN8ACgkQmmx57+YA
 GNkOSw/+JS5tElm/ZP7c3uWYp6uwvcb0jUlKW/U3aCtPiPEcYDLEqIEXwcNdaDMh
 m4rW3GYlW0IRL3FsyuYkSLx+EIIUIfs40wldYXJOqRDj0XasndiloIwltOQJGfd9
 C/UVM0FpJdxMJrcBMFgwLLQCIbAVnhHP34i6ppDRgxW/MfTeiCaaG6fnS70iv6mC
 oh2N7FoZSKDtTrFtlR5TqFiK5v/W1CgNJVuglkFB0ceFpjyBpp/8AT0FGS887xCz
 IYSTqm4Q/79vaZXI1Y2oog257cgdwsVqgPrnK5CuSFhTnAcJMCekiFelHq8Yhyuk
 Rw7j/B3KO3AOaxmR75c6SZdeZ+VHgUMRC/RKe3fay0sm3Zea2kAIPXA6Zn+r/cxb
 8M94V59qBz+f8XmpXRTK1UR3s3EbwFIuNyuDIkeorMtpSKtvqJXmZxGDwNIfXr2F
 /voo++MKjzdtdxdW/D/5Tc9DC0Pyb4HLi0EYj2QCzA03njmfLDF1w73NfzMec+GD
 R1zAd3FEbiJQx8Hin0PSPjYXpfMnkjkGAEcE9N9Ralg4ewNWAxfOFsAhHKTZNssL
 pitTAvHR/+dXtvkX7FUi2l/6fqn8nJUrg/xRazPPp3scRbpuk8m6P4MNr3/lsaHk
 HTQ/hYwDdecWLvKXjw5y9yIr3yhLmPPcloTVIIFFjsM0t8b+d9E=
 =p6Xp
 -----END PGP SIGNATURE-----

Merge tag 'soc-drivers-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "As usual, there are lots of minor driver changes across SoC platforms
  from NXP, Amlogic, AMD Zynq, Mediatek, Qualcomm, Apple and Samsung.
  These usually add support for additional chip variations in existing
  drivers, but also add features or bugfixes.

  The SCMI firmware subsystem gains a unified raw userspace interface
  through debugfs, which can be used for validation purposes.

  Newly added drivers include:

   - New power management drivers for StarFive JH7110, Allwinner D1 and
     Renesas RZ/V2M

   - A driver for Qualcomm battery and power supply status

   - A SoC device driver for identifying Nuvoton WPCM450 chips

   - A regulator coupler driver for Mediatek MT81xxv"

* tag 'soc-drivers-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
  power: supply: Introduce Qualcomm PMIC GLINK power supply
  soc: apple: rtkit: Do not copy the reg state structure to the stack
  soc: sunxi: SUN20I_PPU should depend on PM
  memory: renesas-rpc-if: Remove redundant division of dummy
  soc: qcom: socinfo: Add IDs for IPQ5332 and its variant
  dt-bindings: arm: qcom,ids: Add IDs for IPQ5332 and its variant
  dt-bindings: power: qcom,rpmpd: add RPMH_REGULATOR_LEVEL_LOW_SVS_L1
  firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/
  MAINTAINERS: Update qcom CPR maintainer entry
  dt-bindings: firmware: document Qualcomm SM8550 SCM
  dt-bindings: firmware: qcom,scm: add qcom,scm-sa8775p compatible
  soc: qcom: socinfo: Add Soc IDs for IPQ8064 and variants
  dt-bindings: arm: qcom,ids: Add Soc IDs for IPQ8064 and variants
  soc: qcom: socinfo: Add support for new field in revision 17
  soc: qcom: smd-rpm: Add IPQ9574 compatible
  soc: qcom: pmic_glink: remove redundant calculation of svid
  soc: qcom: stats: Populate all subsystem debugfs files
  dt-bindings: soc: qcom,rpmh-rsc: Update to allow for generic nodes
  soc: qcom: pmic_glink: add CONFIG_NET/CONFIG_OF dependencies
  soc: qcom: pmic_glink: Introduce altmode support
  ...
2023-02-27 10:04:49 -08:00
Linus Torvalds 143c7bc649 iommufd for 6.3
Some polishing and small fixes for iommufd:
 
 - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
 
 - Use GFP_KERNEL_ACCOUNT inside the iommu_domains
 
 - Support VFIO_NOIOMMU mode with iommufd
 
 - Various typos
 
 - A list corruption bug if HWPTs are used for attach
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
 Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
 crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
 =RABK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Some polishing and small fixes for iommufd:

   - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
     subsystem

   - Use GFP_KERNEL_ACCOUNT inside the iommu_domains

   - Support VFIO_NOIOMMU mode with iommufd

   - Various typos

   - A list corruption bug if HWPTs are used for attach"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
  iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
  vfio: Support VFIO_NOIOMMU with iommufd
  iommufd: Add three missing structures in ucmd_buffer
  selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
  iommu: Remove IOMMU_CAP_INTR_REMAP
  irq/s390: Add arch_is_isolated_msi() for s390
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
  genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
  genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
  iommufd: Convert to msi_device_has_isolated_msi()
  vfio/type1: Convert to iommu_group_has_isolated_msi()
  iommu: Add iommu_group_has_isolated_msi()
  genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24 14:34:12 -08:00
Linus Torvalds a13de74e47 IOMMU Updates for Linux v6.3:
Including:
 
 	- Consolidate iommu_map/unmap functions. There have been
 	  blocking and atomic variants so far, but that was problematic
 	  as this approach does not scale with required new variants
 	  which just differ in the GFP flags used.
 	  So Jason consolidated this back into single functions that
 	  take a GFP parameter. This has the potential to cause
 	  conflicts with other trees, as they introduce new call-sites
 	  for the changed functions. I offered them to pull in the
 	  branch containing these changes and resolve it, but I am not
 	  sure everyone did that. The conflicts this caused with
 	  upstream up to v6.2-rc8 are resolved in the final merge
 	  commit.
 
 	- Retire the detach_dev() call-back in iommu_ops
 
 	- Arm SMMU updates from Will:
 	  - Device-tree binding updates:
 	    * Cater for three power domains on SM6375
 	    * Document existing compatible strings for Qualcomm SoCs
 	    * Tighten up clocks description for platform-specific compatible strings
 	  - Enable Qualcomm workarounds for some additional platforms that need them
 
 	- Intel VT-d updates from Lu Baolu:
 	  - Add Intel IOMMU performance monitoring support
 	  - Set No Execute Enable bit in PASID table entry
 	  - Two performance optimizations
 	  - Fix PASID directory pointer coherency
 	  - Fix missed rollbacks in error path
 	  - Cleanups
 
 	- Apple t8110 DART support
 
 	- Exynos IOMMU:
 	  - Implement better fault handling
 	  - Error handling fixes
 
 	- Renesas IPMMU:
 	  - Add device tree bindings for r8a779g0
 
 	- AMD IOMMU:
 	  - Various fixes for handling on SNP-enabled systems and
 	    handling of faults with unknown request-ids
 	  - Cleanups and other small fixes
 
 	- Various other smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmP0hDwACgkQK/BELZcB
 GuM43RAA0YieShO+X0h6TFGfbK0zVoPd91giZehWBv9rHK7pP4iY8UEtBLBWGx/t
 CId4t98mmKmC212zz8QxrwAEzyTIRY+2t1yrpG2aVkoTYk8inMb07TU37wganh3O
 T0QccXN+9b2BS4k8yro5f3uX0d/C1JQVcMowwr53VMb/e73huqP1VTbz06/CIWMH
 DUhVRCzmNhSvoUOT5n7g6+ZDH+pot8WPZbtHV7FowEsmPCRc7Fj8kXyI9FEwKwrZ
 hIV5Y+6Lej8nQScgbO8MfblJym3VrBoSoM4GY2w0L0rjQw6m+Xtea5rT0W39YVWy
 YpiscLTL8TIMPP9zK1dXVygTaABK4J2iWmheHPkpKXIhK0iuH3Dke0Do5p6DNITj
 7J2YlaNEB480D5hvNBKsbbGHavgGPT8m529Sz0R7mSC7omRzqiG5Vsb46IXL+2bc
 92ojjYNfXb6OCtagIr2LMBLZRL2JCODqF1dUmyZfA8GKOHLP5kZXoMM+sZbQ2aUL
 1LOxRZVx+tlb9V4VaH1ZSs/6eM+HLDzjtHeu3PoWYf6mW4AEt4S/yl9SKAkGdBqt
 jCUErmYB1nU/eefqG1jhWRpQeJabcT3Oe30NZru1pfMoREThhjbAACw1JxWtoe1X
 ipGpV6lAP7tQUGuRk3/9O1lNqElJuNwC5lVTjS4FJ38vYQhQbao=
 =ZaZV
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Consolidate iommu_map/unmap functions.

   There have been blocking and atomic variants so far, but that was
   problematic as this approach does not scale with required new
   variants which just differ in the GFP flags used. So Jason
   consolidated this back into single functions that take a GFP
   parameter.

 - Retire the detach_dev() call-back in iommu_ops

 - Arm SMMU updates from Will:
     - Device-tree binding updates:
         - Cater for three power domains on SM6375
         - Document existing compatible strings for Qualcomm SoCs
         - Tighten up clocks description for platform-specific
           compatible strings
     - Enable Qualcomm workarounds for some additional platforms that
       need them

 - Intel VT-d updates from Lu Baolu:
     - Add Intel IOMMU performance monitoring support
     - Set No Execute Enable bit in PASID table entry
     - Two performance optimizations
     - Fix PASID directory pointer coherency
     - Fix missed rollbacks in error path
     - Cleanups

 - Apple t8110 DART support

 - Exynos IOMMU:
     - Implement better fault handling
     - Error handling fixes

 - Renesas IPMMU:
     - Add device tree bindings for r8a779g0

 - AMD IOMMU:
     - Various fixes for handling on SNP-enabled systems and
       handling of faults with unknown request-ids
     - Cleanups and other small fixes

 - Various other smaller fixes and cleanups

* tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (71 commits)
  iommu/amd: Skip attach device domain is same as new domain
  iommu: Attach device group to old domain in error path
  iommu/vt-d: Allow to use flush-queue when first level is default
  iommu/vt-d: Fix PASID directory pointer coherency
  iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  iommu/vt-d: Fix error handling in sva enable/disable paths
  iommu/amd: Improve page fault error reporting
  iommu/amd: Do not identity map v2 capable device when snp is enabled
  iommu: Fix error unwind in iommu_group_alloc()
  iommu/of: mark an unused function as __maybe_unused
  iommu: dart: DART_T8110_ERROR range should be 0 to 5
  iommu/vt-d: Enable IOMMU perfmon support
  iommu/vt-d: Add IOMMU perfmon overflow handler support
  iommu/vt-d: Support cpumask for IOMMU perfmon
  iommu/vt-d: Add IOMMU perfmon support
  iommu/vt-d: Support Enhanced Command Interface
  iommu/vt-d: Retrieve IOMMU perfmon capability information
  iommu/vt-d: Support size of the register set in DRHD
  iommu/vt-d: Set No Execute Enable bit in PASID table entry
  iommu/vt-d: Remove sva from intel_svm_dev
  ...
2023-02-24 13:40:13 -08:00
Jason Gunthorpe 939204e4df Linux 6.2
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPyoZYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGcE0H/1imH5XOfowBdPQU
 p06pCJGKQyEsGnn+kXd7UXes9N/uZFQgOzY9sFspS1ZpXfm60zDcWCeJT2l3qatK
 dtmAGxTEBeZJ8JuevtBiedWy9pJPpvMsfeZd85XzGDRxNUnGT5HgU0/98NpIjysb
 9HTPrpJO9HlmoAKkFDu+Z/kLJp+obns1yQOCH5glOREsPY+4SX76bjPjrbSic0oj
 oDSSBpM2gfdwHWnOKkXhgNuu8zr+hS3LaU1HMj6Kgy3Huz2NjGlgXrRpzutTHEmT
 cmt3Dl5hdIeUtMCt8LbQcngjTg/rX11rFdWaOp/MOuD6U7cqTCWeEDyVsPicFehH
 wdsIfgw=
 =+SoL
 -----END PGP SIGNATURE-----

Merge tag 'v6.2' into iommufd.git for-next

Resolve conflicts from the signature change in iommu_map:

 - drivers/infiniband/hw/usnic/usnic_uiom.c
   Switch iommu_map_atomic() to iommu_map(.., GFP_ATOMIC)

 - drivers/vfio/vfio_iommu_type1.c
   Following indenting change for GFP_KERNEL

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-21 11:11:03 -04:00
Joerg Roedel bedd29d793 Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-02-18 15:43:04 +01:00
Vasant Hegde f451c7a5a3 iommu/amd: Skip attach device domain is same as new domain
If device->domain is same as new domain then we can skip the
device attach process.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230215052642.6016-2-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-18 15:36:33 +01:00
Vasant Hegde 2cc73c5712 iommu: Attach device group to old domain in error path
iommu_attach_group() attaches all devices in a group to domain and then
sets group domain (group->domain). Current code (__iommu_attach_group())
does not handle error path. This creates problem as devices to domain
attachment is in inconsistent state.

Flow:
  - During boot iommu attach devices to default domain
  - Later some device driver (like amd/iommu_v2 or vfio) tries to attach
    device to new domain.
  - In iommu_attach_group() path we detach device from current domain.
    Then it tries to attach devices to new domain.
  - If it fails to attach device to new domain then device to domain link
    is broken.
  - iommu_attach_group() returns error.
  - At this stage iommu_attach_group() caller thinks, attaching device to
    new domain failed and devices are still attached to old domain.
  - But in reality device to old domain link is broken. It will result
    in all sort of failures (like IO page fault) later.

To recover from this situation, we need to attach all devices back to the
old domain. Also log warning if it fails attach device back to old domain.

Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reported-by: Matt Fagnani <matt.fagnani@bell.net>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Matt Fagnani <matt.fagnani@bell.net>
Link: https://lore.kernel.org/r/20230215052642.6016-1-vasant.hegde@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216865
Link: https://lore.kernel.org/lkml/15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info/
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-18 15:34:24 +01:00
Tina Zhang 257ec29074 iommu/vt-d: Allow to use flush-queue when first level is default
Commit 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode
is on") forced default domains to be strict mode as long as IOMMU
caching-mode is flagged. The reason for doing this is that when vIOMMU
uses VT-d caching mode to synchronize shadowing page tables, the strict
mode shows better performance.

However, this optimization is orthogonal to the first-level page table
because the Intel VT-d architecture does not define the caching mode of
the first-level page table. Refer to VT-d spec, section 6.1, "When the
CM field is reported as Set, any software updates to remapping
structures other than first-stage mapping (including updates to not-
present entries or present entries whose programming resulted in
translation faults) requires explicit invalidation of the caches."
Exclude the first-level page table from this optimization.

Generally using first-stage translation in vIOMMU implies nested
translation enabled in the physical IOMMU. In this case the first-stage
page table is wholly captured by the guest. The vIOMMU only needs to
transfer the cache invalidations on vIOMMU to the physical IOMMU.
Forcing the default domain to strict mode will cause more frequent
cache invalidations, resulting in performance degradation. In a real
performance benchmark test measured by iperf receive, the performance
result on Sapphire Rapids 100Gb NIC shows:
w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s.

Theoretically a first-stage IOMMU page table can still be shadowed
in absence of the caching mode, e.g. with host write-protecting guest
IOMMU page table to synchronize changed PTEs with the physical
IOMMU page table. In this case the shadowing overhead is decoupled
from emulating IOTLB invalidation then the overhead of the latter part
is solely decided by the frequency of IOTLB invalidations. Hence
allowing guest default dma domain to be lazy can also benefit the
overall performance by reducing the total VM-exit numbers.

Fixes: 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode is on")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230214025618.2292889-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Jacob Pan 194b3348bd iommu/vt-d: Fix PASID directory pointer coherency
On platforms that do not support IOMMU Extended capability bit 0
Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing
any translation structures. IOMMU access goes only directly to
memory. Intel IOMMU code was missing a flush for the PASID table
directory that resulted in the unrecoverable fault as shown below.

This patch adds clflush calls whenever allocating and updating
a PASID table directory to ensure cache coherency.

On the reverse direction, there's no need to clflush the PASID directory
pointer when we deactivate a context entry in that IOMMU hardware will
not see the old PASID directory pointer after we clear the context entry.
PASID directory entries are also never freed once allocated.

 DMAR: DRHD: handling fault status reg 3
 DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000
       [fault reason 0x51] SM: Present bit in Directory Entry is clear
 DMAR: Dump dmar1 table entries for IOVA 0x1026a4000
 DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001
 DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401
 DMAR: pasid dir entry: 0x0000000101b4e001
 DMAR: pasid table entry[0]: 0x0000000000000109
 DMAR: pasid table entry[1]: 0x0000000000000001
 DMAR: pasid table entry[2]: 0x0000000000000000
 DMAR: pasid table entry[3]: 0x0000000000000000
 DMAR: pasid table entry[4]: 0x0000000000000000
 DMAR: pasid table entry[5]: 0x0000000000000000
 DMAR: pasid table entry[6]: 0x0000000000000000
 DMAR: pasid table entry[7]: 0x0000000000000000
 DMAR: PTE not present at level 4

Cc: <stable@vger.kernel.org>
Fixes: 0bbeb01a4f ("iommu/vt-d: Manage scalalble mode PASID tables")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Sukumar Ghorai <sukumar.ghorai@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209212843.1788125-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Jacob Pan 16a75bbe48 iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
Intel IOMMU driver implements IOTLB flush queue with domain selective
or PASID selective invalidations. In this case there's no need to track
IOVA page range and sync IOTLBs, which may cause significant performance
hit.

This patch adds a check to avoid IOVA gather page and IOTLB sync for
the lazy path.

The performance difference on Sapphire Rapids 100Gb NIC is improved by
the following (as measured by iperf send):

w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s

Cc: <stable@vger.kernel.org>
Fixes: 2a2b8eaa5b ("iommu: Handle freelists when using deferred flushing in iommu drivers")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209175330.1783556-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Lu Baolu 60b1daa3b1 iommu/vt-d: Fix error handling in sva enable/disable paths
Roll back all previous actions in error paths of intel_iommu_enable_sva()
and intel_iommu_disable_sva().

Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230208051559.700109-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:04 +01:00
Vasant Hegde 996d120b4d iommu/amd: Improve page fault error reporting
If IOMMU domain for device group is not setup properly then we may hit
IOMMU page fault. Current page fault handler assumes that domain is
always setup and it will hit NULL pointer derefence (see below sample log).

Lets check whether domain is setup or not and log appropriate message.

Sample log:
----------
 amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 6
 BUG: kernel NULL pointer dereference, address: 0000000000000058
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted 6.2.0-rc2+ #89
 Hardware name: xxx
 RIP: 0010:report_iommu_fault+0x11/0x90
 [...]
 Call Trace:
  <TASK>
  amd_iommu_int_thread+0x60c/0x760
  ? __pfx_irq_thread_fn+0x10/0x10
  irq_thread_fn+0x1f/0x60
  irq_thread+0xea/0x1a0
  ? preempt_count_add+0x6a/0xa0
  ? __pfx_irq_thread_dtor+0x10/0x10
  ? __pfx_irq_thread+0x10/0x10
  kthread+0xe9/0x110
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

Reported-by: Matt Fagnani <matt.fagnani@bell.net>
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216865
Link: https://lore.kernel.org/lkml/15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info/
Link: https://lore.kernel.org/r/20230215052642.6016-3-vasant.hegde@amd.com
Cc: stable@vger.kernel.org
[joro: Edit commit message]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 11:17:33 +01:00
Vasant Hegde 18792e99ea iommu/amd: Do not identity map v2 capable device when snp is enabled
Flow:
  - Booted system with SNP enabled, memory encryption off and
    IOMMU DMA translation mode
  - AMD driver detects v2 capable device and amd_iommu_def_domain_type()
    returns identity mode
  - amd_iommu_domain_alloc() returns NULL an SNP is enabled
  - System will fail to register device

On SNP enabled system, passthrough mode is not supported. IOMMU default
domain is set to translation mode. We need to return zero from
amd_iommu_def_domain_type() so that it allocates translation domain.

Fixes: fb2accadaa ("iommu/amd: Introduce function to check and enable SNP")
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230207091752.7656-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 10:45:33 +01:00
Jason Gunthorpe 4daa861174 iommu: Fix error unwind in iommu_group_alloc()
If either iommu_group_grate_file() fails then the
iommu_group is leaked.

Destroy it on these error paths.

Found by kselftest/iommu/iommufd_fail_nth

Fixes: bc7d12b91b ("iommu: Implement reserved_regions iommu-group sysfs file")
Fixes: c52c72d3de ("iommu: Add sysfs attribyte for domain type")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v1-8f616bee028d+8b-iommu_group_alloc_leak_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 10:20:31 +01:00
Randy Dunlap 4762315d1c iommu/of: mark an unused function as __maybe_unused
When CONFIG_OF_ADDRESS is not set, there is a build warning/error
about an unused function.
Annotate the function to quieten the warning/error.

../drivers/iommu/of_iommu.c:176:29: warning: 'iommu_resv_region_get_type' defined but not used [-Wunused-function]
  176 | static enum iommu_resv_type iommu_resv_region_get_type(struct device *dev, struct resource *phys,
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: a5bf3cfce8 ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: iommu@lists.linux.dev
Reviewed-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230209010359.23831-1-rdunlap@infradead.org
[joro: Improve code formatting]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 10:17:31 +01:00
Jason Gunthorpe b4ff830eca iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
The hwpt is added to the hwpt_list only during its creation, it is never
added again. This hunk is some missed leftover from rework. Adding it
twice will corrupt the linked list in some cases.

It effects HWPT specific attachment, which is something the test suite
cannot cover until we can create a legitimate struct device with a
non-system iommu "driver" (ie we need the bus removed from the iommu code)

Cc: stable@vger.kernel.org
Fixes: e8d5721003 ("iommufd: Add kAPI toward external drivers for physical devices")
Link: https://lore.kernel.org/r/1-v1-4336b5cb2fe4+1d7-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-15 21:37:48 -04:00
Jason Gunthorpe b3551ead61 iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
Missed a zero initialization here. Most of the struct is filled with
a copy_from_user(), however minsz for that copy is smaller than the
actual struct by 8 bytes, thus we don't fill the padding.

Cc: stable@vger.kernel.org # 6.1+
Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://lore.kernel.org/r/0-v1-a74499ece799+1a-iommufd_get_info_leak_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+cb1e0978f6bf46b83a58@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-14 16:49:55 -04:00
Elliot Berman 3bf90eca76 firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/
Move include/linux/qcom_scm.h to include/linux/firmware/qcom/qcom_scm.h.
This removes 1 of a few remaining Qualcomm-specific headers into a more
approciate subdirectory under include/.

Suggested-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Reviewed-by: Guru Das Srinagesh <quic_gurus@quicinc.com>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230203210956.3580811-1-quic_eberman@quicinc.com
2023-02-08 19:15:16 -08:00
Jason Gunthorpe bed9e516f1 Merge branch 'vfio-no-iommu' into iommufd.git for-next
Shared branch with VFIO for the no-iommu support.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03 15:55:49 -04:00
Jason Gunthorpe c9a397cee9 vfio: Support VFIO_NOIOMMU with iommufd
Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to
VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a
no-iommu enabled device.

Move the enable_unsafe_noiommu_mode module out of container.c into
vfio_main.c so that it is always available even if VFIO_CONTAINER=n.

This passes Alex's mini-test:

https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c

Link: https://lore.kernel.org/r/0-v3-480cd64a16f7+1ad0-iommufd_noiommu_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03 15:45:23 -04:00
Eric Curtin 9e6a1825ac iommu: dart: DART_T8110_ERROR range should be 0 to 5
This was detected by smatch as one "else if" statement could never be
reached. Confirmed bit order by comparing with python implementation [1].

drivers/iommu/apple-dart.c:991 apple_dart_t8110_irq()
warn: duplicate check 'error_code == ((((1))) << (3))'
  (previous on line 989)

Link: https://github.com/AsahiLinux/m1n1/commit/96b2d584feec1e3f7bfa [1]

Fixes: d8bcc870d9 ("iommu: dart: Add t8110 DART support")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20230201124257.7801-1-ecurtin@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:10:40 +01:00
Kan Liang d8a7c0cf05 iommu/vt-d: Enable IOMMU perfmon support
Register and enable an IOMMU perfmon for each active IOMMU device.

The failure of IOMMU perfmon registration doesn't impact other
functionalities of an IOMMU device.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-8-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:09 +01:00
Kan Liang 4a0d426565 iommu/vt-d: Add IOMMU perfmon overflow handler support
While enabled to count events and an event occurrence causes the counter
value to increment and roll over to or past zero, this is termed a
counter overflow. The overflow can trigger an interrupt. The IOMMU
perfmon needs to handle the case properly.

New HW IRQs are allocated for each IOMMU device for perfmon. The IRQ IDs
are after the SVM range.

In the overflow handler, the counter is not frozen. It's very unlikely
that the same counter overflows again during the period. But it's
possible that other counters overflow at the same time. Read the
overflow register at the end of the handler and check whether there are
more.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-7-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:08 +01:00
Kan Liang 46284c6ceb iommu/vt-d: Support cpumask for IOMMU perfmon
The perf subsystem assumes that all counters are by default per-CPU. So
the user space tool reads a counter from each CPU. However, the IOMMU
counters are system-wide and can be read from any CPU. Here we use a CPU
mask to restrict counting to one CPU to handle the issue. (with CPU
hotplug notifier to choose a different CPU if the chosen one is taken
off-line).

The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for
the user space perf tool.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:08 +01:00
Kan Liang 7232ab8b89 iommu/vt-d: Add IOMMU perfmon support
Implement the IOMMU performance monitor capability, which supports the
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug.

The IOMMU perfmon support is implemented as part of the IOMMU driver and
interfaces with the Linux perf subsystem.

The IOMMU PMU has the following unique features compared with the other
PMUs.
- Support counting. Not support sampling.
- Does not support per-thread counting. The scope is system-wide.
- Support per-counter capability register. The event constraints can be
  enumerated.
- The available event and event group can also be enumerated.
- Extra Enhanced Commands are introduced to control the counters.

Add a new variable, struct iommu_pmu *pmu, to in the struct intel_iommu
to track the PMU related information.

Add iommu_pmu_register() and iommu_pmu_unregister() to register and
unregister a IOMMU PMU. The register function setup the IOMMU PMU ops
and invoke the standard perf_pmu_register() interface to register a PMU
in the perf subsystem. This patch only exposes the functions. The
following patch will enable them in the IOMMU driver.

The IOMMU PMUs can be found under /sys/bus/event_source/devices/dmar*

The available filters and event format can be found at the format folder

 $ ls /sys/bus/event_source/devices/dmar1/format/
 event  event_group  filter_ats  filter_ats_en  filter_page_table
 filter_page_table_en

The supported events can be found at the events folder

 $ ls /sys/bus/event_source/devices/dmar1/events/
 ats_blocked        fs_nonleaf_hit           int_cache_hit_posted
 iommu_mem_blocked  iotlb_hit        pasid_cache_lookup  ss_nonleaf_hit
 ctxt_cache_hit     fs_nonleaf_lookup        int_cache_lookup
 iommu_mrds         iotlb_lookup     pg_req_posted    ss_nonleaf_lookup
 ctxt_cache_lookup  int_cache_hit_nonposted  iommu_clocks
 iommu_requests     pasid_cache_hit  pw_occupancy

The command below illustrates filter usage with a simple example.

 $ perf stat -e dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/
   -a sleep 1

 Performance counter stats for 'system wide':

   368,947      dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/

 1.002592074 seconds time elapsed

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-5-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:06 +01:00
Kan Liang dc57875866 iommu/vt-d: Support Enhanced Command Interface
The Enhanced Command Register is to submit command and operand of
enhanced commands to DMA Remapping hardware. It can supports up to 256
enhanced commands.

There is a HW register to indicate the availability of all 256 enhanced
commands. Each bit stands for each command. But there isn't an existing
interface to read/write all 256 bits. Introduce the u64 ecmdcap[4] to
store the existence of each enhanced command. Read 4 times to get all of
them in map_iommu().

Add a helper to facilitate an enhanced command launch. Make sure hardware
complete the command. Also add a helper to facilitate the check of PMU
essentials. These helpers will be used later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:05 +01:00
Kan Liang a6a5006dad iommu/vt-d: Retrieve IOMMU perfmon capability information
The performance monitoring infrastructure, perfmon, is to support
collection of information about key events occurring during operation of
the remapping hardware, to aid performance tuning and debug. Each
remapping hardware unit has capability registers that indicate support
for performance monitoring features and enumerate the capabilities.

Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information
for each iommu unit. The information is stored in the iommu->pmu data
structure. Capability registers are read-only, so it's safe to prefetch
and store them in the pmu structure. This could avoid unnecessary VMEXIT
when this code is running in the virtualization environment.

Add free_iommu_pmu() to free the saved capability information when
freeing the iommu unit.

Add a kernel config option for the IOMMU perfmon feature. Unless a user
explicitly uses the perf tool to monitor the IOMMU perfmon event, there
isn't any impact for the existing IOMMU. Enable it by default.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:04 +01:00
Kan Liang 4db96bfe9d iommu/vt-d: Support size of the register set in DRHD
A new field, which indicates the size of the remapping hardware register
set for this remapping unit, is introduced in the DMA-remapping hardware
unit definition (DRHD) structure with the VT-d Spec 4.0. With this
information, SW doesn't need to 'guess' the size of the register set
anymore.

Update the struct acpi_dmar_hardware_unit to reflect the field. Store the
size of the register set in struct dmar_drhd_unit for each dmar device.

The 'size' information is ResvZ for the old BIOS and platforms. Fall back
to the old guessing method. There is nothing changed.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-2-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:03 +01:00