Commit graph

896 commits

Author SHA1 Message Date
Maciej W. Rozycki 3c0ec896a4 PCI/ASPM: Factor out waiting for link training to complete
Move code polling for the Link Training bit to clear into a function of its
own.

[bhelgaas: reorder to clean up before exposing to PCI core]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111605060.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20 10:58:53 -05:00
Maciej W. Rozycki fd6e6e38eb PCI/ASPM: Avoid unnecessary pcie_link_state use
[bhelgaas: extract from expose patch, reorder to clean up before exposing]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20 10:58:46 -05:00
Maciej W. Rozycki b168979977 PCI/ASPM: Use distinct local vars in pcie_retrain_link()
Use separate local variables to hold the respective values retrieved from
the Link Control Register and the Link Status Register.  Improves
readability and it makes it possible for the compiler to detect actual
uninitialised use should this code change in the future.

[bhelgaas: reorder to clean up before exposing to PCI core]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110252260.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-14 17:58:12 -05:00
Bjorn Helgaas 43ca31e002 Merge branch 'pci/reset'
- Wait longer for devices to become ready after resume (as we do for reset)
  to accommodate Intel Titan Ridge xHCI devices (Mika Westerberg)

- Drop pci_bridge_wait_for_secondary_bus() timeout parameter since all
  callers pass the same value (Mika Westerberg)

- Extend D3hot delay for NVIDIA HDA controllers to avoid unrecoverable
  devices after a bus reset (Alex Williamson)

* pci/reset:
  PCI/PM: Extend D3hot delay for NVIDIA HDA controllers
  PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter
  PCI/PM: Increase wait time after resume
2023-04-20 16:16:33 -05:00
Mika Westerberg e74b2b58ff PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter
All callers of pci_bridge_wait_for_secondary_bus() supply a timeout of
PCIE_RESET_READY_POLL_MS, so drop the parameter.  Move the definition of
PCIE_RESET_READY_POLL_MS into pci.c, the only user.

[bhelgaas: extracted from
https://lore.kernel.org/r/20230404052714.51315-3-mika.westerberg@linux.intel.com]
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-04-11 17:35:06 -05:00
Bjorn Helgaas 774820b362 PCI/EDR: Add edr_handle_event() comments
EDR documentation is a bit sketchy.  Add a couple comments to
edr_handle_event() about the devices involved.

Link: https://lore.kernel.org/r/20230407215259.GA3825733@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2023-04-07 17:39:53 -05:00
Kuppuswamy Sathyanarayanan c441b1e03d PCI/EDR: Clear Device Status after EDR error recovery
During EDR recovery, the OS must clear error status of the port that
triggered DPC even if firmware retains control of DPC and AER (see the
implementation note in the PCI Firmware spec r3.3, sec 4.6.12).

Prior to 068c29a248 ("PCI/ERR: Clear PCIe Device Status errors only if
OS owns AER"), the port Device Status was cleared in this path:

  edr_handle_event
    dpc_process_error(dev)                 # "dev" triggered DPC
    pcie_do_recovery(dev, dpc_reset_link)
      dpc_reset_link                       # exit DPC
      pcie_clear_device_status(dev)        # clear Device Status

After 068c29a248, pcie_do_recovery() no longer clears Device Status when
firmware controls AER, so the error bit remains set even after recovery.

Per the "Downstream Port Containment configuration control" bit in the
returned _OSC Control Field (sec 4.5.1), the OS is allowed to clear error
status until it evaluates _OST, so clear Device Status in
edr_handle_event() if the error recovery was successful.

[bhelgaas: commit log]
Fixes: 068c29a248 ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER")
Link: https://lore.kernel.org/r/20230315235449.1279209-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Tsaur Erwin <erwin.tsaur@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-04-07 16:43:18 -05:00
Linus Torvalds 90ddb3f034 pci-v6.3-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmP2dbsUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzBDg//aW2IeJYku5ENXwwnCQjBlyjBGOOZ
 456KGpFt/ky0N9Jp0ZS3nQSa5YN7q+L8XY48gu6I7s1hXly8iLZKLrJN++S//k55
 BadXu7mDUyVoY74LYvBe0nlXuwJul2qnq9IJLufRucrn1yoyqApAh39IRdCzi4U8
 mP+wad7sQA0Si4bpf80uwn6Yq8SrDoO0mtmO/dZSXJooM2t2SnDXEL/fxMwTNDA4
 XsVSP9FrbPmcTLo8mkDa8Dy7JKbL6KQJF9yDlmYzuA2spQpTf+YLLfsNnmE+850h
 WTtfCjVaYtlik7i9qTB+VcN1CsGVepYKK3H5the16Aeql2Fu+Ji5KSt74C220Yi9
 ZSDA93d/EfGc5egKyBdUUMFgqhe46srRUAoWcMrx2T4ARGuOm5EYCa9C8C7dFmO0
 j6f9MYL3j2Sw3FROEKViRVOFfbIfVW1TXIo3x0fE0ud3xkg73eKp/++X8QeTMjox
 2ArY2AWPNQpUI1oMlKxlSEd5XjFf7n/hHDtFqj9bIuJzt0/8wXQf0jCYTjhpGkRB
 pmO+lColK6lp+bg8aWRRkiwN73xGdQhKaeXLo0Iq4T6xr0Lb3XoskHZvt6NIGe/A
 ds5/uwtErq6kCf2G9YG1xfh+G1bimbjWwsHCNfSNXzTsWGDFTCb8tvqF90m+7+yl
 bllxTXA6PO312Tw=
 =/y4d
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Rework portdrv shutdown so it disables interrupts but doesn't
     disable bus mastering, which leads to hangs on Loongson LS7A

   - Add mechanism to prevent Max_Read_Request_Size (MRRS) increases,
     again to avoid hardware issues on Loongson LS7A (and likely other
     devices based on DesignWare IP)

   - Ignore devices with a firmware (DT or ACPI) node that says the
     device is disabled

  Resource management:

   - Distribute spare resources to unconfigured hotplug bridges at
     boot-time (not just when hot-adding such a bridge), which makes
     hot-adding devices to docks work better. Tried this in v6.1 but had
     to revert for regressions, so try again

   - Fix root bus issue that dropped resources that happened to end
     at 0, e.g., [bus 00]

  PCI device hotplug:

   - Remove device locking when marking device as disconnected so this
     doesn't have to wait for concurrent driver bind/unbind to complete

   - Quirk more Qualcomm bridges that don't fully implement the PCIe
     Slot Status 'Command Completed' bit

  Power management:

   - Account for _S0W of the target bridge in acpi_pci_bridge_d3() so we
     don't miss hot-add notifications for USB4 docks, Thunderbolt, etc

  Reset:

   - Observe delay after reset, e.g., resuming from system sleep,
     regardless of whether a bridge can suspend to D3cold at runtime

   - Wait for secondary bus to become ready after a bridge reset

  Virtualization:

   - Avoid FLR on some AMD FCH AHCI adapters where it doesn't work

   - Allow independent IOMMU groups for some Wangxun NICs that prevent
     peer-to-peer transactions but don't advertise an ACS Capability

  Error handling:

   - Configure End-to-End-CRC (ECRC) only if Linux owns the AER
     Capability

   - Remove redundant Device Control Error Reporting Enable in the AER
     service driver since this is already done for all devices during
     enumeration

  ASPM:

   - Add pci_enable_link_state() interface to allow drivers to enable
     ASPM link state

  Endpoint framework:

   - Move dra7xx and tegra194 linkup processing from hard IRQ to
     threaded IRQ handler

   - Add a separate lock for endpoint controller list of endpoint
     function drivers to prevent deadlock in callbacks

   - Pass events from endpoint controller to endpoint function drivers
     via callbacks instead of notifiers

  Synopsys DesignWare eDMA controller driver (acked by Vinod):

   - Fix CPU vs PCI address issues

   - Fix source vs destination address issues

   - Fix issues with interleaved transfer semantics

   - Fix channel count initialization issue (issue still exists in
     several other drivers)

   - Clean up and improve debugfs usage so it will work on platforms
     with several eDMA devices

  Baikal T-1 PCIe controller driver:

   - Set a 64-bit DMA mask

  Freescale i.MX6 PCIe controller driver:

   - Add i.MX8MM, i.MX8MQ, i.MX8MP endpoint mode DT binding and driver
     support

  Intel VMD host bridge driver:

   - Add quirk to configure PCIe ASPM and LTR. This is normally done by
     BIOS, and will be for future products

  Marvell MVEBU PCIe controller driver:

   - Mark this driver as broken in Kconfig since bugs prevent its daily
     usage

  MediaTek MT7621 PCIe controller driver:

   - Delay PHY port initialization to improve boot reliability for ZBT
     WE1326, ZBT WF3526-P, and some Netgear models

  Qualcomm PCIe controller driver:

   - Add MSM8998 DT compatible string

   - Unify MSM8996 and MSM8998 clock orderings

   - Add SM8350 DT binding and driver support

   - Add IPQ8074 Gen3 DT binding and driver support

   - Correct qcom,perst-regs in DT binding

   - Add qcom_pcie_host_deinit() so the PHY is powered off and
     regulators and clocks are disabled on late host-init errors

  Socionext UniPhier Pro5 controller driver:

   - Clean up uniphier-ep reg, clocks, resets, and their names in DT
     binding

  Synopsys DesignWare PCIe controller driver:

   - Restrict coherent DMA mask to 32 bits for MSI, but allow controller
     drivers to set 64-bit streaming DMA mask

   - Add eDMA engine support in both Root Port and Endpoint controllers

  Miscellaneous:

   - Remove MODULE_LICENSE from boolean drivers so they don't look like
     modules so modprobe can complain about them"

* tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (86 commits)
  PCI: dwc: Add Root Port and Endpoint controller eDMA engine support
  PCI: bt1: Set 64-bit DMA mask
  PCI: dwc: Restrict only coherent DMA mask for MSI address allocation
  dmaengine: dw-edma: Prepare dw_edma_probe() for builtin callers
  dmaengine: dw-edma: Depend on DW_EDMA instead of selecting it
  dmaengine: dw-edma: Add mem-mapped LL-entries support
  PCI: Remove MODULE_LICENSE so boolean drivers don't look like modules
  PCI: hv: Drop duplicate PCI_MSI dependency
  PCI/P2PDMA: Annotate RCU dereference
  PCI/sysfs: Constify struct kobj_type pci_slot_ktype
  PCI: hotplug: Allow marking devices as disconnected during bind/unbind
  PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
  PCI: qcom: Add IPQ8074 Gen3 port support
  dt-bindings: PCI: qcom: Add IPQ8074 Gen3 port
  dt-bindings: PCI: qcom: Sort compatibles alphabetically
  PCI: qcom: Fix host-init error handling
  PCI: qcom: Add SM8350 support
  dt-bindings: PCI: qcom: Add SM8350
  dt-bindings: PCI: qcom-ep: Correct qcom,perst-regs
  dt-bindings: PCI: qcom: Unify MSM8996 and MSM8998 clock order
  ...
2023-02-24 16:51:40 -08:00
Bjorn Helgaas 90fb1a3652 Merge branch 'pci/controller/vmd'
- Add pci_enable_link_state() to allow drivers to enable ASPM link state
  (Michael Bottini)

- Add quirk to enable all ASPM link states and program LTR for devices
  below VMD (David E. Box)

* pci/controller/vmd:
  PCI: vmd: Add quirk to configure PCIe ASPM and LTR
  PCI: vmd: Create feature grouping for client products
  PCI: vmd: Use PCI_VDEVICE in device list
  PCI/ASPM: Add pci_enable_link_state()
2023-02-22 13:47:31 -06:00
Bjorn Helgaas 0b7af1ddcf Merge branch 'pci/reset'
- Always observe reset delay when waking devices from D3cold, e.g., after
  system sleep, regardless of whether we're allowed to runtime-suspend to
  D3cold (Lukas Wunner)

- Unify reset and resume delays to wait for downstream devices after a
  bridge reset (Lukas Wunner)

- Wait for downstream devices after a DPC-induced bridge reset (Lukas
  Wunner)

* pci/reset:
  PCI/DPC: Await readiness of secondary bus after reset
  PCI: Unify delay handling for reset and resume
  PCI/PM: Observe reset delay irrespective of bridge_d3
2023-02-22 13:47:27 -06:00
Bjorn Helgaas a17613298f Merge branch 'pci/enumeration'
- Implement portdrv .shutdown() method that calls service driver .remove()
  methods (which disables interrupt generation as required by .shutdown()),
  but doesn't disable bus mastering (which hangs on Loongson LS7A because
  of a hardware defect) (Huacai Chen)

- Prevent MRRS increases for devices below Loongson LS7A to avoid hardware
  limitations (Huacai Chen)

- Ignore devices with a firmware (DT/ACPI) node that says the device is
  disabled (Rob Herring)

* pci/enumeration:
  PCI: Honor firmware's device disabled status
  PCI: loongson: Add more devices that need MRRS quirk
  PCI: loongson: Prevent LS7A MRRS increases
  PCI/portdrv: Prevent LS7A Bus Master clearing on shutdown
2023-02-22 13:47:24 -06:00
Bjorn Helgaas ff209ecc37 Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
This reverts commit 5e85eba6f5.

Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.

The main symptom is:

  iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
  nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible

and the machine is only partially usable after resume.  It can't run dmesg
and can't do a clean reboot.  This happens on every suspend/resume cycle.

Revert 5e85eba6f5 until we can figure out the root cause.

Fixes: 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates Control Register programming")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10 15:30:24 -06:00
Bjorn Helgaas a7152be79b Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
This reverts commit 4ff116d0d5.

Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1.  Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.

Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:

  pci_restore_state
    pci_restore_aspm_l1ss_state
      aspm_program_l1ss
        pci_write_config_dword(PCI_L1SS_CTL1, ctl1)         # L1SS restore
    pci_restore_pcie_state
      pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++])  # L1 restore

which is a problem because PCIe r6.0, sec 5.5.4, requires that:

  If setting either or both of the enable bits for ASPM L1 PM
  Substates, both ports must be configured as described in this
  section while ASPM L1 is disabled.

Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.

Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.

Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume.  As far as we know the system will use more power but will
still *work* correctly.

Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10 15:29:53 -06:00
Lukas Wunner 53b54ad074 PCI/DPC: Await readiness of secondary bus after reset
pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
Reset, but not after a DPC-induced Hot Reset.

As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
observed and devices on the secondary bus may be accessed before
they're ready.

One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
PCIe switch whose upstream port is not immediately ready after reset.
Because its config space is restored too early, it remains in
D0uninitialized, its subordinate devices remain inaccessible and DPC
recovery fails with messages such as:

  i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
  intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
  pcieport 0000:89:02.0: AER: device recovery failed

Fix it.

Link: https://lore.kernel.org/r/9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de
Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
2023-02-09 12:46:15 -06:00
Michael Bottini de82f60f9c PCI/ASPM: Add pci_enable_link_state()
Add pci_enable_link_state() to allow devices to change the default BIOS
configured states. Clears the BIOS default settings then sets the new
states and reconfigures the link under the semaphore. Also add
PCIE_LINK_STATE_ALL macro for convenience for callers that want to enable
all link states.

Link: https://lore.kernel.org/r/20230120031522.2304439-2-david.e.box@linux.intel.com
Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2023-02-02 16:01:42 +01:00
Huacai Chen 62b6dee1b4 PCI/portdrv: Prevent LS7A Bus Master clearing on shutdown
After cc27b735ad ("PCI/portdrv: Turn off PCIe services during shutdown")
we observe hangs during poweroff/reboot on systems with LS7A chipset.

This happens because the portdrv .shutdown() method (pcie_portdrv_remove())
clears PCI_COMMAND_MASTER via pci_disable_device(), which prevents bridges
from forwarding memory or I/O Requests in the upstream direction (PCIe
r6.0, sec 7.5.1.1.3).

LS7A Root Ports have a hardware defect: clearing PCI_COMMAND_MASTER *also*
prevents the bridge from forwarding CPU MMIO requests in the downstream
direction, and these MMIO accesses to devices below the bridge happen even
after .shutdown(), e.g., to print console messages.  LS7A neither forwards
the requests nor sends an unsuccessful completion to the CPU, so the CPU
waits forever, resulting in the hang.

The purpose of .shutdown() is to disable interrupts and DMA from the
device.  PCIe ports may generate interrupts (either MSI/MSI-X or INTx) for
AER, DPC, PME, hotplug, etc., but they never perform DMA except MSI/MSI-X.
Clearing PCI_COMMAND_MASTER effectively disables MSI/MSI-X, but not INTx.

The port service driver .remove() methods clear the interrupt enables in
PCI_ERR_ROOT_COMMAND, PCI_EXP_DPC_CTL, PCI_EXP_SLTCTL, and PCI_EXP_RTCTL,
etc., which disables interrupts regardless of whether they are MSI/MSI-X or
INTx.

Add a pcie_portdrv_shutdown() method that calls all the port service driver
.remove() methods to clear the interrupt enables for each service but does
not clear Bus Mastering on the port itself.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20230201043018.778499-2-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-02-01 12:05:28 -06:00
Bjorn Helgaas 6b985af556 PCI/AER: Remove redundant Device Control Error Reporting Enable
The following bits in the PCIe Device Control register enable sending of
ERR_COR, ERR_NONFATAL, or ERR_FATAL Messages (or reporting internally in
the case of Root Ports):

  Correctable Error Reporting Enable
  Non-Fatal Error Reporting Enable
  Fatal Error Reporting Enable
  Unsupported Request Reporting Enable

These enable bits are set by pci_enable_pcie_error_reporting(), and since
f26e58bf6f ("PCI/AER: Enable error reporting when AER is native"), we
do that in this path during enumeration:

  pci_init_capabilities
    pci_aer_init
      pci_enable_pcie_error_reporting

Previously, the AER service driver also traversed the hierarchy when
claiming a Root Port, enabling error reporting for downstream devices, but
this is redundant.

Remove the code that enables this error reporting in the AER .probe() path.
Also remove similar code that disables error reporting in the AER .remove()
path.

Note that these Device Control Reporting Enable bits do not control
interrupt generation.  That's done by the similarly-named bits in the AER
Root Error Command register, which are still set by aer_probe() and cleared
by aer_remove(), since the AER service driver handles those interrupts.
See PCIe r6.0, sec 6.2.6.

Link: https://lore.kernel.org/r/20230118234612.272916-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <kbusch@kernel.org>
2023-01-26 17:06:13 -06:00
Vidya Sagar bba5065963 PCI/AER: Configure ECRC only if AER is native
As the ECRC configuration bits are part of AER registers, configure ECRC
only if AER is natively owned by the kernel.

Link: https://lore.kernel.org/r/20230112072111.20063-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-01-12 12:24:37 -06:00
Linus Torvalds c7020e1b34 pci-v6.2-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmOYpTIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxuZhAAhGjE8voLZeOYwxbvfL69hGTAZ+Me
 x2hqRWVhh/IGWXTTaoSLwSjMMokcmAKN5S/wv8qdCG5sB8EN8FyTBIZDy8PuRRdl
 8UlqlBMSL+d4oSRDCnYLxFNcynLRNnmx2dfcdw9tJ4zjTLN8Y4o8PHFogR6pJ3MT
 sDC8S0myTQKXr4wAGzTZycKsiGManviYtByp6dCcKD3Oy5Q2uZ9OKO2DP2yQpn+F
 c3IJSV9oDz3KR8JVJ5Q1iz9cdMXbGwjkM3JLlHpxhedwjN4ErLumPutKcebtzO5C
 aTqabN7Nnzc4yJusAIfojFCWH7fgaYUyJ3pxcFyJ4tu4m9Last+2I5UB/kV2sYAD
 jWiCYx3sA/mRopNXOnrBGae+Lgy+sQnt8or0grySr0bK+b+ArAGis4uT4A0uASGO
 RUQdIQwz7zhHeQrwAladHWxnx4BEDNCatgfn38p4fklIYKydCY5nfZURMDvHezSR
 G6Nu08hoE9ZXlmkWTFw+5F23wPWKcCpzZj0hf7OroIouXUp8vqSFSqatH5vGkbCl
 bDswck9GdRJ2hl5SvFOeelaXkM42du45TMLU2JmIn6dYYFNrO93JgdvKSU7E2CpG
 AmDIpg1Idxo8fEPPGH1I7RVU5+ilzmmPQQY7poQW+va4/dEd/QVp1+ZZTDnMC1qk
 qi3ck22VdvPU2VU=
 =KULr
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and
     make more things static.

   - Make portdrv bind to Switch Ports that have AER. Previously, if
     these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant
     the Ports couldn't be suspended to low-power states. AER on these
     Ports doesn't use interrupts, and the AER driver doesn't need to
     claim them.

   - Assign PCI domain IDs using ida_alloc(), which makes host bridge
     add/remove work better.

  Resource management:

   - To work better with recent BIOSes that use EfiMemoryMappedIO for
     PCI host bridge apertures, remove those regions from the E820 map
     (E820 entries normally prevent us from allocating BARs). In v5.19,
     we added some quirks to disable E820 checking, but that's not very
     maintainable. EfiMemoryMappedIO means the OS needs to map the
     region for use by EFI runtime services; it shouldn't prevent OS
     from using it.

  PCIe native device hotplug:

   - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4
     PCIe tunneling depends on native PCIe hotplug.

   - Enable Command Completed Interrupt only if supported to avoid user
     confusion from lspci output that says this is enabled but not
     supported.

   - Prevent pciehp from binding to Switch Upstream Ports; this happened
     because of interaction with acpiphp and caused devices below the
     Upstream Port to disappear.

  Power management:

   - Convert AGP drivers to generic power management. We hope to remove
     legacy power management from the PCI core eventually.

  Virtualization:

   - Fix pci_device_is_present(), which previously always returned
     "false" for VFs, causing virtio hangs when unbinding the driver.

  Miscellaneous:

   - Convert drivers to gpiod API to prepare for dropping some legacy
     code.

   - Fix DOE fencepost error for the maximum data object length.

  Baikal-T1 PCIe controller driver:

   - Add driver and DT bindings.

  Broadcom STB PCIe controller driver:

   - Enable Multi-MSI.

   - Delay 100ms after PERST# deassert to allow power and clocks to
     stabilize.

   - Configure Read Completion Boundary to 64 bytes.

  Freescale i.MX6 PCIe controller driver:

   - Initialize PHY before deasserting core reset to fix a regression in
     v6.0 on boards where the PHY provides the reference.

   - Fix imx6sx and imx8mq clock names in DT schema.

  Intel VMD host bridge driver:

   - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe
     SSDs in VT-d pass-through scenarios.

   - Disable MSI remapping, which gets re-enabled by firmware during
     suspend/resume.

  MediaTek PCIe Gen3 controller driver:

   - Add MT7986 and MT8195 support.

  Qualcomm PCIe controller driver:

   - Add SC8280XP/SA8540P basic interconnect support.

  Rockchip DesignWare PCIe controller driver:

   - Base DT schema on common Synopsys schema.

  Synopsys DesignWare PCIe core:

   - Collect DT items shared between Root Port and Endpoint (PERST GPIO,
     PHY info, clocks, resets, link speed, number of lanes, number of
     iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml.

   - Add dma-ranges support for Root Ports and Endpoints.

   - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to
     reduce code duplication.

   - Add generic names for clocks and resets to encourage more
     consistent naming across drivers using DesignWare IP.

   - Stop advertising PTM Responder role for Endpoints, which aren't
     allowed to be responders.

  TI J721E PCIe driver:

   - Add j721s2 host mode ID to DT schema.

   - Add interrupt properties to DT schema.

  Toshiba Visconti PCIe controller driver:

   - Fix interrupts array max constraints in DT schema"

* tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits)
  x86/PCI: Use pr_info() when possible
  x86/PCI: Fix log message typo
  x86/PCI: Tidy E820 removal messages
  PCI: Skip allocate_resource() if too little space available
  efi/x86: Remove EfiMemoryMappedIO from E820 map
  PCI/portdrv: Allow AER service only for Root Ports & RCECs
  PCI: xilinx-nwl: Fix coding style violations
  PCI: mvebu: Switch to using gpiod API
  PCI: pciehp: Enable Command Completed Interrupt only if supported
  PCI: aardvark: Switch to using devm_gpiod_get_optional()
  dt-bindings: PCI: mediatek-gen3: add support for mt7986
  dt-bindings: PCI: mediatek-gen3: add SoC based clock config
  dt-bindings: PCI: qcom: Allow 'dma-coherent' property
  PCI: mt7621: Add sentinel to quirks table
  PCI: vmd: Fix secondary bus reset for Intel bridges
  PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
  PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
  PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
  PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
  PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
  ...
2022-12-14 09:54:10 -08:00
Bjorn Helgaas 9303050181 Merge branch 'pci/portdrv'
- Squash portdrv_core.c and portdrv_pci.c into portdrv.c to make it easier
  to find things (Bjorn Helgaas)

- Allow AER service only for Root Ports & RCECs so portdrv can successfully
  bind to other devices that have AER but lack MSI (which they don't need
  for AER), which allows power management for those devices (Bjorn Helgaas)

* pci/portdrv:
  PCI/portdrv: Allow AER service only for Root Ports & RCECs
  PCI/portdrv: Unexport pcie_port_service_register(), pcie_port_service_unregister()
  PCI/portdrv: Move private things to portdrv.c
  PCI/portdrv: Squash into portdrv.c
2022-12-10 10:36:34 -06:00
Bjorn Helgaas d8d2b65a94 PCI/portdrv: Allow AER service only for Root Ports & RCECs
Previously portdrv allowed the AER service for any device with an AER
capability (assuming Linux had control of AER) even though the AER service
driver only attaches to Root Port and RCECs.

Because get_port_device_capability() included AER for non-RP, non-RCEC
devices, we tried to initialize the AER IRQ even though these devices
don't generate AER interrupts.

Intel DG1 and DG2 discrete graphics cards contain a switch leading to a
GPU.  The switch supports AER but not MSI, so initializing an AER IRQ
failed, and portdrv failed to claim the switch port at all.  The GPU itself
could be suspended, but the switch could not be put in a low-power state
because it had no driver.

Don't allow the AER service on non-Root Port, non-Root Complex Event
Collector devices.  This means we won't enable Bus Mastering if the device
doesn't require MSI, the AER service will not appear in sysfs, and the AER
service driver will not bind to the device.

Link: https://lore.kernel.org/r/20221207084105.84947-1-mika.westerberg@linux.intel.com
Link: https://lore.kernel.org/r/20221210002922.1749403-1-helgaas@kernel.org
Based-on-patch-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-12-10 10:26:40 -06:00
Dave Jiang 361187e047 PCI/AER: Add optional logging callback for correctable error
Some new devices such as CXL devices may want to record additional error
information on a corrected error. Add a callback to allow the PCI device
driver to do additional logging such as providing additional stats for user
space RAS monitoring.

For CXL device, this is actually a need due to CXL needing to write to the
CXL RAS capability structure correctable error status register in order to
clear the unmasked correctable errors. See CXL spec rev3.0 8.2.4.16.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166984619233.2804404.3966368388544312674.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-03 13:40:56 -08:00
Rafael J. Wysocki 05f5747414 PCI/portdrv: Set PCIE_PORT_SERVICE_HP for Root and Downstream Ports only
It is reported that on some systems pciehp binds to an Upstream Port and
attempts to operate it which causes devices below the Port to disappear
from the bus.

This happens because acpiphp sets dev->is_hotplug_bridge for that Port
(after receiving a Device Check notification on it from the platform
firmware via ACPI) during the enumeration of PCI devices.

get_port_device_capability() sees that dev->is_hotplug_bridge is set and
adds PCIE_PORT_SERVICE_HP to Port services (which allows pciehp to bind to
the Port in question) without consulting the PCIe type, which should be
either Root Port or Downstream Port for the hotplug capability to be
present.

Per PCIe r6.0, sec 7.5.3.2, the Slot Implemented bit is only valid for
Downstream Ports (including Root Ports), and PCIe hotplug depends on the
Slot Capabilities / Control / Status registers.

Make get_port_device_capability() more robust by adding a PCIe type check
to it before adding PCIE_PORT_SERVICE_HP to Port services which helps to
avoid the problem.

[bhelgaas: add spec citation]
Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/4786090.31r3eYUQgx@kreacher
Reported-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2022-11-22 13:27:11 -06:00
Albert Zhou e67ad9354a PCI: pciehp: Enable by default if USB4 enabled
Thunderbolt/USB4 PCIe tunneling depends on native PCIe hotplug.  Enable
pciehp by default if USB4 is enabled.

[bhelgaas: squash, update subject, commit logs, tidy whitespace]
Link: https://lore.kernel.org/r/20221115113857.35800-2-albert.zhou.50@gmail.com
Link: https://lore.kernel.org/r/20221115113857.35800-3-albert.zhou.50@gmail.com
Signed-off-by: Albert Zhou <albert.zhou.50@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-11-15 09:25:21 -06:00
Bjorn Helgaas 461a65d7d1 PCI/portdrv: Unexport pcie_port_service_register(), pcie_port_service_unregister()
pcie_port_service_register() and pcie_port_service_unregister() are used
only by the pciehp, aer, dpc, and pme PCIe port service drivers, none of
which can be modules.  Unexport pcie_port_service_register() and
pcie_port_service_unregister().  No functional change intended.

Link: https://lore.kernel.org/r/20221019204127.44463-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2022-10-24 14:57:30 -05:00
Bjorn Helgaas 29f193feee PCI/portdrv: Move private things to portdrv.c
Previously several things used by portdrv_core.c and portdrv_pci.c were
shared by defining them in portdrv.h.  Now that portdrv_core.c and
portdrv_pci.c have been squashed, move things that can be private into
portdrv.c.  No functional change intended.

Link: https://lore.kernel.org/r/20221019204127.44463-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2022-10-24 14:57:30 -05:00
Bjorn Helgaas a1ccd3d911 PCI/portdrv: Squash into portdrv.c
Squash portdrv_core.c and portdrv_pci.c into portdrv.c to make it easier to
find things.  The whole thing is less than 1000 lines, and it's a pain to
bounce back and forth between two files.

Several portdrv_core.c functions were non-static because they were
referenced from portdrv_pci.c.  Make them static since they're now all in
portdrv.c.

No functional change intended.

Link: https://lore.kernel.org/r/20221019204127.44463-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2022-10-24 14:57:30 -05:00
Bjorn Helgaas 034f93fcb1 Merge branch 'pci/pm'
- Cache the PTM capability offset instead of searching for it every time
  (Bjorn Helgaas)

- Separate PTM configuration from PTM enable (Bjorn Helgaas)

- Add pci_suspend_ptm() and pci_resume_ptm() to disable and re-enable PTM
  on suspend/resume so some Root Ports can safely enter a lower-power PM
  state (Bjorn Helgaas)

- Disable PTM for all devices during suspend; previously we only did this
  for Root Ports and even then only in certain cases (Bjorn Helgaas)

- Simplify pci_pm_suspend_noirq() (Rajvi Jingar)

- Reduce the delay after transitions to/from D3hot by using usleep_range()
  instead of msleep(), which reduces the typical delay from 19ms to 10ms
  (Sajid Dalvi, Will McVicker)

* pci/pm:
  PCI/PM: Reduce D3hot delay with usleep_range()
  PCI/PM: Simplify pci_pm_suspend_noirq()
  PCI/PM: Always disable PTM for all devices during suspend
  PCI/PTM: Consolidate PTM interface declarations
  PCI/PTM: Reorder functions in logical order
  PCI/PTM: Preserve RsvdP bits in PTM Control register
  PCI/PTM: Move pci_ptm_info() body into its only caller
  PCI/PTM: Add pci_suspend_ptm() and pci_resume_ptm()
  PCI/PTM: Separate configuration and enable
  PCI/PTM: Add pci_upstream_ptm() helper
  PCI/PTM: Cache PTM Capability offset
2022-10-05 17:32:53 -05:00
Bjorn Helgaas f9538e27a2 Merge branch 'pci/dpc'
- Work around a BIOS defect that makes some Intel Root Ports report an RP
  PIO log size of zero (Mika Westerberg)

* pci/dpc:
  PCI/DPC: Quirk PIO log size for certain Intel Root Ports
2022-10-05 17:32:52 -05:00
Bjorn Helgaas 7afeb84d14 PCI/ASPM: Correct LTR_L1.2_THRESHOLD computation
80d7d7a904 ("PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device
characteristics") replaced a fixed value (163840ns) with one computed from
T_POWER_OFF, Common_Mode_Restore_Time, etc., but it encoded the
LTR_L1.2_THRESHOLD value incorrectly.

This is especially a problem for small thresholds, e.g., 63ns fell into the
"threshold_ns < 1024" case and was encoded as 32ns:

  LTR_L1.2_THRESHOLD_Scale = 1 (multiplier is 32ns)
  LTR_L1.2_THRESHOLD_Value = 63 >> 5 = 1
  LTR_L1.2_THRESHOLD       = multiplier * value = 32ns * 1 = 32ns

Correct the algorithm to encode all times of 1023ns (0x3ff) or smaller
exactly and larger times conservatively (the encoded threshold is never
smaller than was requested).  This reduces the chance of entering L1.2
when the device can't tolerate the exit latency.

Fixes: 80d7d7a904 ("PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics")
Link: https://lore.kernel.org/r/20221005025809.2247547-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-10-05 12:12:02 -05:00
Bjorn Helgaas cfc0028627 PCI/ASPM: Ignore L1 PM Substates if device lacks capability
187f91db82 ("PCI/ASPM: Remove struct aspm_register_info.l1ss_cap")
inadvertently removed a check for existence of the L1 PM Substates (L1SS)
Capability before reading it.

If there is no L1SS Capability, this means we mistakenly read PCI_COMMAND
and PCI_STATUS (config address 0x04) and interpret that as the PCI_L1SS_CAP
register, so we may incorrectly configure L1SS.

Make sure the L1SS Capability exists before trying to read it.

Fixes: 187f91db82 ("PCI/ASPM: Remove struct aspm_register_info.l1ss_cap")
Link: https://lore.kernel.org/r/20221005025809.2247547-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-10-05 12:11:56 -05:00
Bjorn Helgaas 9e2a03173d PCI/ASPM: Factor out L1 PM Substates configuration
Move L1 PM Substates configuration from pcie_aspm_cap_init() to a new
aspm_l1ss_init() function.  No functional change intended.

Link: https://lore.kernel.org/r/20221005025809.2247547-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-10-05 12:11:46 -05:00
Vidya Sagar 4ff116d0d5 PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
Previously the L1 PM Substates Control Registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume leading to the L1 PM Substates
configuration being lost post-resume.

Save the L1 PM Substates Control Registers so that the configuration is
retained post-resume.

[bhelgaas: drop pci_is_pcie() testing; we can rely on pci_configure_ltr()
having already done that]
Link: https://lore.kernel.org/r/20220913131822.16557-3-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-10-04 20:35:40 -05:00
Vidya Sagar 5e85eba6f5 PCI/ASPM: Refactor L1 PM Substates Control Register programming
Refactor the code to extract the common code to program Control
Registers 1 and 2 of the L1 PM Substates capability to a new function
aspm_program_l1ss() and call it for both parent and child devices.

[bhelgaas: squash in update to preserve fields we're not updating from
https://lore.kernel.org/r/36fa13c5-e0f8-022f-77f7-7908e4df98b8@nvidia.com]
Link: https://lore.kernel.org/r/20220913131822.16557-2-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-10-04 20:32:39 -05:00
Mika Westerberg 5459c0b704 PCI/DPC: Quirk PIO log size for certain Intel Root Ports
Some Root Ports on Intel Tiger Lake and Alder Lake systems support the RP
Extensions for DPC and the RP PIO Log registers but incorrectly advertise
an RP PIO Log Size of zero.  This means the kernel complains that:

  DPC: RP PIO log size 0 is invalid

and if DPC is triggered, the DPC driver will not dump the RP PIO Log
registers when it should.

This is caused by a BIOS bug and should be fixed the BIOS for future CPUs.

Add a quirk to set the correct RP PIO Log size for the affected Root Ports.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209943
Link: https://lore.kernel.org/r/20220816102042.69125-1-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-09-27 18:13:18 -05:00
Bjorn Helgaas 8b367e75ac PCI/PTM: Reorder functions in logical order
pci_enable_ptm() and pci_disable_ptm() were separated.
pci_save_ptm_state() and pci_restore_ptm_state() dangled at the top.  Move
them to logical places.  No functional change intended.

Link: https://lore.kernel.org/r/20220909202505.314195-8-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:52 -05:00
Bjorn Helgaas 2b89c22f24 PCI/PTM: Preserve RsvdP bits in PTM Control register
Even though only the low 16 bits of PTM Control are currently defined, the
register is 32 bits wide and the unused bits are RsvdP ("Reserved and
Preserved"), so software must preserve the values of those bits when
writing the register.

Update PTM Control reads and writes to use 32-bit accesses and preserve the
reserved bits on writes.

Link: https://lore.kernel.org/r/20220909202505.314195-7-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:47 -05:00
Bjorn Helgaas 91b12b2a10 PCI/PTM: Move pci_ptm_info() body into its only caller
pci_ptm_info() is simple and is only called by pci_enable_ptm().  Move the
entire body there.  No functional change intended.

Link: https://lore.kernel.org/r/20220909202505.314195-6-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:42 -05:00
Bjorn Helgaas e8bdc5ea48 PCI/PTM: Add pci_suspend_ptm() and pci_resume_ptm()
We disable PTM during suspend because that allows some Root Ports to enter
lower-power PM states, which means we also need to disable PTM for all
downstream devices.  Add pci_suspend_ptm() and pci_resume_ptm() for this
purpose.

pci_enable_ptm() and pci_disable_ptm() are for drivers to use to enable or
disable PTM.  They use dev->ptm_enabled to keep track of whether PTM should
be enabled.

pci_suspend_ptm() and pci_resume_ptm() are PCI core-internal functions to
temporarily disable PTM during suspend and (depending on dev->ptm_enabled)
re-enable PTM during resume.

Enable/disable/suspend/resume all use internal __pci_enable_ptm() and
__pci_disable_ptm() functions that only update the PTM Control register.
Outline:

  pci_enable_ptm(struct pci_dev *dev)
  {
     __pci_enable_ptm(dev);
     dev->ptm_enabled = 1;
     pci_ptm_info(dev);
  }

  pci_disable_ptm(struct pci_dev *dev)
  {
     if (dev->ptm_enabled) {
       __pci_disable_ptm(dev);
       dev->ptm_enabled = 0;
     }
  }

  pci_suspend_ptm(struct pci_dev *dev)
  {
     if (dev->ptm_enabled)
       __pci_disable_ptm(dev);
  }

  pci_resume_ptm(struct pci_dev *dev)
  {
     if (dev->ptm_enabled)
       __pci_enable_ptm(dev);
  }

Nothing currently calls pci_resume_ptm(); the suspend path saves the PTM
state before disabling PTM, so the PTM state restore in the resume path
implicitly re-enables it.  A future change will use pci_resume_ptm() to fix
some problems with this approach.

Link: https://lore.kernel.org/r/20220909202505.314195-5-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:37 -05:00
Bjorn Helgaas 118b9dfdc1 PCI/PTM: Separate configuration and enable
PTM configuration and enabling were previously mixed together:
pci_ptm_init() collected granularity info and enabled PTM for Root Ports
and Switch Upstream Ports; pci_enable_ptm() did the same for Endpoints.

Move everything related to the PTM Capability register to pci_ptm_init()
for all devices, and everything related to the PTM Control register to
pci_enable_ptm().

Link: https://lore.kernel.org/r/20220909202505.314195-4-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:32 -05:00
Bjorn Helgaas e243c173c0 PCI/PTM: Add pci_upstream_ptm() helper
PTM requires an unbroken path of PTM-supporting devices between the PTM
Root and the ultimate PTM Requester, but if a Switch supports PTM, only the
Upstream Port can have a PTM Capability; the Downstream Ports do not.

Previously we copied the PTM configuration from the Switch Upstream Port to
the Downstream Ports so dev->ptm_enabled for any device implied that all
the upstream devices support PTM.

Instead of making it look like Downstream Ports have their own PTM config,
add pci_upstream_ptm(), which returns the upstream device that has a PTM
Capability (either a Root Port or a Switch Upstream Port).

Link: https://lore.kernel.org/r/20220909202505.314195-3-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:29:25 -05:00
Bjorn Helgaas a47126ec29 PCI/PTM: Cache PTM Capability offset
Cache the PTM Capability offset instead of searching for it every time we
enable/disable PTM or save/restore PTM state.  No functional change
intended.

Link: https://lore.kernel.org/r/20220909202505.314195-2-helgaas@kernel.org
Tested-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-09-12 15:28:43 -05:00
Bjorn Helgaas 5a20930f27 Merge branch 'pci/err'
- Recognize disconnected devices so we don't bother trying to set them to
  "frozen" or "normal" state (Christoph Hellwig)

- Clear PCI Status register during enumeration in case firmware left errors
  logged (Kai-Heng Feng)

- Configure ECRC for every device, including hot-added ones (Stefan Roese)

- Keep AER error reporting enabled for switches (Stefan Roese)

- Enable error reporting for all devices that support AER (Stefan Roese)

- Iterate over error counters instead of error strings to avoid printing
  junk in AER sysfs counters (Mohamed Khalfella)

* pci/err:
  PCI/AER: Iterate over error counters instead of error strings
  PCI/AER: Enable error reporting when AER is native
  PCI/portdrv: Don't disable AER reporting in get_port_device_capability()
  PCI/AER: Configure ECRC for every device
  PCI: Clear PCI_STATUS when setting up device
  PCI/ERR: Recognize disconnected devices in report_error_detected()
2022-08-04 11:41:52 -05:00
Mohamed Khalfella 5e6ae05095 PCI/AER: Iterate over error counters instead of error strings
Previously we iterated over AER stat *names*, e.g.,
aer_correctable_error_string[32], but the actual stat *counters* may not be
that large, e.g., pdev->aer_stats->dev_cor_errs[16], which means that we
printed junk in the sysfs stats files.

Iterate over the stat counter arrays instead of the names to avoid this
junk.

Also, added a build time check to make sure all
counters have entries in strings array.

Fixes: 0678e3109a ("PCI/AER: Simplify __aer_print_error()")
Link: https://lore.kernel.org/r/20220509181441.31884-1-mkhalfella@purestorage.com
Reported-by: Meeta Saggi <msaggi@purestorage.com>
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Meeta Saggi <msaggi@purestorage.com>
Reviewed-by: Eric Badger <ebadger@purestorage.com>
Cc: stable@vger.kernel.org
2022-07-13 14:45:20 -05:00
Stefan Roese f26e58bf6f PCI/AER: Enable error reporting when AER is native
If we have native control of AER, set the following error reporting enable
bits:

  - Correctable Error Reporting Enable
  - Non-Fatal Error Reporting Enable
  - Fatal Error Reporting Enable
  - Unsupported Request Reporting Enable

Note that these bits are all in the Device Control register and are not
AER-specific.

This affects all devices with an AER capability, including hot-added
devices.

Please note that this change is quite invasive, as error reporting now will
be enabled for all available PCIe Endpoints, which was previously not the
case.

When "pci=noaer" is selected, error reporting stays disabled of course.

[bhelgaas: commit log, note error reporting is not AER-specific]
Link: https://lore.kernel.org/r/20220125071820.2247260-4-sr@denx.de
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
Cc: Naveen Naidu <naveennaidu479@gmail.com>
2022-07-13 14:44:12 -05:00
Stefan Roese 8795e182b0 PCI/portdrv: Don't disable AER reporting in get_port_device_capability()
AER reporting is currently disabled in the DevCtl registers of all non Root
Port PCIe devices on systems using pcie_ports_native || host->native_aer,
disabling AER completely in such systems. This is because 2bd50dd800
("PCI: PCIe: Disable PCIe port services during port initialization"), added
a call to pci_disable_pcie_error_reporting() *after* the AER setup was
completed for the PCIe device tree.

Here a longer analysis about the current status of AER enabling /
disabling upon bootup provided by Bjorn:

  pcie_portdrv_probe
    pcie_port_device_register
      get_port_device_capability
        pci_disable_pcie_error_reporting
          clear CERE NFERE FERE URRE               # <-- disable for RP USP DSP
      pcie_device_init
        device_register                            # new AER service device
          aer_probe
            aer_enable_rootport                    # RP only
              set_downstream_devices_error_reporting
                set_device_error_reporting         # self (RP)
                  if (RP || USP || DSP)
                    pci_enable_pcie_error_reporting
                      set CERE NFERE FERE URRE     # <-- enable for RP
                pci_walk_bus
                  set_device_error_reporting
                    if (RP || USP || DSP)
                      pci_enable_pcie_error_reporting
                        set CERE NFERE FERE URRE   # <-- enable for USP DSP

In a typical Root Port -> Endpoint hierarchy, the above:
  - Disables Error Reporting for the Root Port,
  - Enables Error Reporting for the Root Port,
  - Does NOT enable Error Reporting for the Endpoint because it is not a
    Root Port or Switch Port.

In a deeper Root Port -> Upstream Switch Port -> Downstream Switch
Port -> Endpoint hierarchy:
  - Disables Error Reporting for the Root Port,
  - Enables Error Reporting for the Root Port,
  - Enables Error Reporting for both Switch Ports,
  - Does NOT enable Error Reporting for the Endpoint because it is not a
    Root Port or Switch Port,
  - Disables Error Reporting for the Switch Ports when pcie_portdrv_probe()
    claims them.  AER does not re-enable it because these are not Root
    Ports.

Remove this call to pci_disable_pcie_error_reporting() from
get_port_device_capability(), leaving the already enabled AER configuration
intact. With this change, AER is enabled in the Root Port and the PCIe
switch upstream and downstream ports. Only the PCIe Endpoints don't have
AER enabled yet. A follow-up patch will take care of this Endpoint
enabling.

Fixes: 2bd50dd800 ("PCI: PCIe: Disable PCIe port services during port initialization")
Link: https://lore.kernel.org/r/20220125071820.2247260-3-sr@denx.de
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
Cc: Naveen Naidu <naveennaidu479@gmail.com>
2022-07-13 14:43:40 -05:00
Bjorn Helgaas ba13d4575d PCI/ASPM: Unexport pcie_aspm_support_enabled()
pcie_aspm_support_enabled() is used only by the acpi/pci_root.c driver,
which cannot be built as a module, so it does not need to be exported.
Unexport it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-12 13:18:29 -05:00
Kai-Heng Feng 08d0cc5f34 PCI/ASPM: Remove pcie_aspm_pm_state_change()
pcie_aspm_pm_state_change() was introduced at the inception of PCIe ASPM
code, but it can cause some issues. For instance, when ASPM config is
changed via sysfs, those changes won't persist across power state change
because pcie_aspm_pm_state_change() overwrites them.

Also, if the driver restores L1SS [1] after system resume, the restored
state will also be overwritten by pcie_aspm_pm_state_change().

Remove pcie_aspm_pm_state_change().  If there's any hardware that really
needs it to function, a quirk can be used instead.

[1] https://lore.kernel.org/linux-pci/20220201123536.12962-1-vidyas@nvidia.com/
Link: https://lore.kernel.org/r/20220509073639.2048236-1-kai.heng.feng@canonical.com
[bhelgaas: remove additional pcie_aspm_pm_state_change() call in
pci_set_low_power_state(), added by
10aa5377fc ("PCI/PM: Split pci_raw_set_power_state()") and moved by
7957d20145 ("PCI/PM: Relocate pci_set_low_power_state()")]
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-07-12 10:08:26 -05:00
Stefan Roese 9ffb98f144 PCI/AER: Configure ECRC for every device
Move pcie_set_ecrc_checking() to pci_aer_init() to make sure that
pcie_set_ecrc_checking() is called for each PCIe device, including
hot-added devices.

Link: https://lore.kernel.org/r/20220125071820.2247260-2-sr@denx.de
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
Cc: Naveen Naidu <naveennaidu479@gmail.com>
2022-07-11 14:51:42 -05:00
Christoph Hellwig 5e69a33c5c PCI/ERR: Recognize disconnected devices in report_error_detected()
When a device is already unplugged by pciehp by the time the AER handler is
invoked, the PCIe device will already be in the pci_channel_io_perm_failure
state.  In that case simply return PCI_ERS_RESULT_DISCONNECT instead of
trying to do a state transition that will fail.

Also untangle the state transition failure from the lack of methods to
improve the debugging output in case it happens again.

Link: https://lore.kernel.org/r/20220601074024.3481035-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2022-06-08 15:08:40 -05:00