Commit graph

1696 commits

Author SHA1 Message Date
Linus Torvalds c3bed3b20e pci-v5.5-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl3leXUUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyY3g/9FAVVdPEaadNtAhQ/zIxcjozDovKq
 0q7yOA3aTBTUoNEinm88an6p0dcC4gNKtGukXmzVH2Hhxm9kLRdtpZGYY00tpLUB
 9rI7XsgwwHa+hLwsHbIs507sKGFGy5FLr0ChTTGLDEMppnEvjA2hZooYmcB/OgrC
 LlFcwbNKGOk/Si9u2bF2nLO0JDoVHnwzpF99saew/nqc7Lfj9e9IPZFom+VjPBUh
 AOvRp2H7uBN+WQlpLeFeMDDoeXh34lX0kYqIV/cVkXVnknDGYKV2CBTg2aeX7jd0
 QiPHZh6zlW8zNQgaCZRiBAbatVEOnRMRJ++yiqB8hBYp1LMXm6kJ01YSQpXkugoY
 Vp9dtzzTARWV/XkKwD4brw9ZEmIDnO+Ed2x2VbUkPJVcXAvzSQWAx82IU0Iuqmcb
 9qr6U2Zf/Xk5aFlGPYVH8QOG+QqzIbZNRQ7NlhDlITyW4P6QPu0mw374yYP2wDGL
 sP5YSS3YGa0sQcEgDtVnd4z+WTZI4AwXLPaeaLkDhdfHp2FsERUY4TrPs33J99xw
 og4EyokVFzjYzlnBPU6WWn7LL+jj5ccXkL3MA4DR4FJOnNGHh7NXfQUH56rrgsq7
 F9/8shL5DuTbQkde1uSyUG9Iq/RigVLlV5DQavFm3dSXvZi0E16t5alC5URNTzk7
 at8Bogn53QhlmYc=
 =uUXw
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Warn if a host bridge has no NUMA info (Yunsheng Lin)

   - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis
     Efremov)

  Resource management:

   - Fix boot-time Embedded Controller GPE storm caused by incorrect
     resource assignment after ACPI Bus Check Notification (Mika
     Westerberg)

   - Protect pci_reassign_bridge_resources() against concurrent
     addition/removal (Benjamin Herrenschmidt)

   - Fix bridge dma_ranges resource list cleanup (Rob Herring)

   - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control
     the MMIO and prefetchable MMIO window sizes of hotplug bridges
     independently (Nicholas Johnson)

   - Fix MMIO/MMIO_PREF window assignment that assigned more space than
     desired (Nicholas Johnson)

   - Only enforce bus numbers from bridge EA if the bridge has EA
     devices downstream (Subbaraya Sundeep)

   - Consolidate DT "dma-ranges" parsing and convert all host drivers to
     use shared parsing (Rob Herring)

  Error reporting:

   - Restore AER capability after resume (Mayurkumar Patel)

   - Add PoisonTLPBlocked AER counter (Rajat Jain)

   - Use for_each_set_bit() to simplify AER code (Andy Shevchenko)

   - Fix AER kernel-doc (Andy Shevchenko)

   - Add "pcie_ports=dpc-native" parameter to allow native use of DPC
     even if platform didn't grant control over AER (Olof Johansson)

  Hotplug:

   - Avoid returning prematurely from sysfs requests to enable or
     disable a PCIe hotplug slot (Lukas Wunner)

   - Don't disable interrupts twice when suspending hotplug ports (Mika
     Westerberg)

   - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika
     Westerberg)

  Power management:

   - Remove unnecessary ASPM locking (Bjorn Helgaas)

   - Add support for disabling L1 PM Substates (Heiner Kallweit)

   - Allow re-enabling Clock PM after it has been disabled (Heiner
     Kallweit)

   - Add sysfs attributes for controlling ASPM link states (Heiner
     Kallweit)

   - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl"
     sysfs files (Heiner Kallweit)

   - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on
     USB 2.0 or 1.1 connect events (Kai-Heng Feng)

   - Move power state check out of pci_msi_supported() (Bjorn Helgaas)

   - Fix incorrect MSI-X masking on resume and revert related nvme quirk
     for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan)

   - Always return devices to D0 when thawing to fix hibernation with
     drivers like mlx4 that used legacy power management (previously we
     only did it for drivers with new power management ops) (Dexuan Cui)

   - Clear PCIe PME Status even for legacy power management (Bjorn
     Helgaas)

   - Fix PCI PM documentation errors (Bjorn Helgaas)

   - Use dev_printk() for more power management messages (Bjorn Helgaas)

   - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas)

   - Convert xen-platform from legacy to generic power management (Bjorn
     Helgaas)

   - Removed unused .resume_early() and .suspend_late() legacy power
     management hooks (Bjorn Helgaas)

   - Rearrange power management code for clarity (Rafael J. Wysocki)

   - Decode power states more clearly ("4" or "D4" really refers to
     "D3cold") (Bjorn Helgaas)

   - Notice when reading PM Control register returns an error (~0)
     instead of interpreting it as being in D3hot (Bjorn Helgaas)

   - Add missing link delays required by the PCIe spec (Mika Westerberg)

  Virtualization:

   - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn
     Helgaas)

   - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code
     previously didn't recognize that) (Kuppuswamy Sathyanarayanan)

   - Allow VFs to use PASID (the PF PASID capability is shared by the
     VFs, but the code previously didn't recognize that) (Kuppuswamy
     Sathyanarayanan)

   - Disconnect PF and VF ATS enablement, since ATS in PFs and
     associated VFs can be enabled independently (Kuppuswamy
     Sathyanarayanan)

   - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan)

   - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas)

   - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof
     Wilczynski)

   - Remove unused PRI and PASID stubs (Bjorn Helgaas)

   - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID
     interfaces that are only used by built-in IOMMU drivers (Bjorn
     Helgaas)

   - Hide PRI and PASID state restoration functions used only inside the
     PCI core (Bjorn Helgaas)

   - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski)

   - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut)

   - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George
     Cherian)

   - Fix the UPDCR register address in the Intel ACS quirk (Steffen
     Liebergeld)

   - Unify ACS quirk implementations (Bjorn Helgaas)

  Amlogic Meson host bridge driver:

   - Fix meson PERST# GPIO polarity problem (Remi Pommarel)

   - Add DT bindings for Amlogic Meson G12A (Neil Armstrong)

   - Fix meson clock names to match DT bindings (Neil Armstrong)

   - Add meson support for Amlogic G12A SoC with separate shared PHY
     (Neil Armstrong)

   - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe
     combo PHY (Neil Armstrong)

   - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong)

   - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT
     (Neil Armstrong)

  Broadcom iProc host bridge driver:

   - Invalidate iProc PAXB address mapping before programming it
     (Abhishek Shah)

   - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks)

  Cadence host bridge driver:

   - Refactor Cadence PCIe host controller to use as a library for both
     host and endpoint (Tom Joseph)

  Freescale Layerscape host bridge driver:

   - Add layerscape LS1028a support (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Add VMD bus 224-255 restriction decode (Jon Derrick)

   - Add VMD 8086:9A0B device ID (Jon Derrick)

   - Remove Keith from VMD maintainer list (Keith Busch)

  Marvell ARMADA 3700 / Aardvark host bridge driver:

   - Use LTSSM state to build link training flag since Aardvark doesn't
     implement the Link Training bit (Remi Pommarel)

   - Delay before training Aardvark link in case PERST# was asserted
     before the driver probe (Remi Pommarel)

   - Fix Aardvark issues with Root Control reads and writes (Remi
     Pommarel)

   - Don't rely on jiffies in Aardvark config access path since
     interrupts may be disabled (Remi Pommarel)

   - Fix Aardvark big-endian support (Grzegorz Jaszczyk)

  Marvell ARMADA 370 / XP host bridge driver:

   - Make mvebu_pci_bridge_emul_ops static (Ben Dooks)

  Microsoft Hyper-V host bridge driver:

   - Add hibernation support for Hyper-V virtual PCI devices (Dexuan
     Cui)

   - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan
     Cui)

   - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui)

  Mobiveil host bridge driver:

   - Change mobiveil csr_read()/write() function names that conflict
     with riscv arch functions (Kefeng Wang)

  NVIDIA Tegra host bridge driver:

   - Fix Tegra CLKREQ dependency programming (Vidya Sagar)

  Renesas R-Car host bridge driver:

   - Remove unnecessary header include from rcar (Andrew Murray)

   - Tighten register index checking for rcar inbound range programming
     (Marek Vasut)

   - Fix rcar inbound range alignment calculation to improve packing of
     multiple entries (Marek Vasut)

   - Update rcar MACCTLR setting to match documentation (Yoshihiro
     Shimoda)

   - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual
     (Yoshihiro Shimoda)

   - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon
     Horman)

  Rockchip host bridge driver:

   - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin
     Murphy)

  Socionext UniPhier host bridge driver:

   - Set uniphier to host (RC) mode always (Kunihiko Hayashi)

  Endpoint drivers:

   - Fix endpoint driver sign extension problem when shifting page
     number to phys_addr_t (Alan Mikhak)

  Misc:

   - Add NumaChip SPDX header (Krzysztof Wilczynski)

   - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski)

   - Remove unused includes (Krzysztof Wilczynski)

   - Removed unused sysfs attribute groups (Ben Dooks)

   - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas)

   - Add PCIe Link Control 2 register field definitions to replace magic
     numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas)

   - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and
     Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas)

   - Use pcie_capability_read_word() instead of pci_read_config_word()
     in AMDGPU and Radeon CIK/SI (Frederick Lawler)

   - Remove unused pci_irq_get_node() Greg Kroah-Hartman)

   - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig
     (Palmer Dabbelt, Michal Simek)

   - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe)

   - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn
     Helgaas)

   - Fix bridge emulation big-endian support (Grzegorz Jaszczyk)

   - Fix dwc find_next_bit() usage (Niklas Cassel)

   - Fix pcitest.c fd leak (Hewenliang)

   - Fix typos and comments (Bjorn Helgaas)

   - Fix Kconfig whitespace errors (Krzysztof Kozlowski)"

* tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits)
  PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist
  asm-generic: Make msi.h a mandatory include/asm header
  Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
  PCI/MSI: Fix incorrect MSI-X masking on resume
  PCI/MSI: Move power state check out of pci_msi_supported()
  PCI/MSI: Remove unused pci_irq_get_node()
  PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer
  PCI: hv: Change pci_protocol_version to per-hbus
  PCI: hv: Add hibernation support
  PCI: hv: Reorganize the code in preparation of hibernation
  MAINTAINERS: Remove Keith from VMD maintainer
  PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code
  PCI/ASPM: Add sysfs attributes for controlling ASPM link states
  PCI: Fix indentation
  drm/radeon: Prefer pcie_capability_read_word()
  drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions
  drm/radeon: Correct Transmit Margin masks
  drm/amdgpu: Prefer pcie_capability_read_word()
  PCI: uniphier: Set mode register to host mode
  drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions
  ...
2019-12-03 13:58:22 -08:00
Linus Torvalds 0da522107e compat_ioctl: remove most of fs/compat_ioctl.c
As part of the cleanup of some remaining y2038 issues, I came to
 fs/compat_ioctl.c, which still has a couple of commands that need support
 for time64_t.
 
 In completely unrelated work, I spent time on cleaning up parts of this
 file in the past, moving things out into drivers instead.
 
 After Al Viro reviewed an earlier version of this series and did a lot
 more of that cleanup, I decided to try to completely eliminate the rest
 of it and move it all into drivers.
 
 This series incorporates some of Al's work and many patches of my own,
 but in the end stops short of actually removing the last part, which is
 the scsi ioctl handlers. I have patches for those as well, but they need
 more testing or possibly a rewrite.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
 +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
 lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
 dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
 VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
 YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
 m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
 TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
 e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
 bN65armTm7bFFR32Avnu
 =lgCl
 -----END PGP SIGNATURE-----

Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
 "As part of the cleanup of some remaining y2038 issues, I came to
  fs/compat_ioctl.c, which still has a couple of commands that need
  support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of
  this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot
  more of that cleanup, I decided to try to completely eliminate the
  rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own,
  but in the end stops short of actually removing the last part, which
  is the scsi ioctl handlers. I have patches for those as well, but they
  need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-12-01 13:46:15 -08:00
Jian-Hong Pan 655e7aee1f Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
Since e045fa29e8 ("PCI/MSI: Fix incorrect MSI-X masking on resume") is
merged, we can revert the previous quirk now.

This reverts commit 19ea025e1d.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Fixes: 19ea025e1d ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T")
Link: https://lore.kernel.org/r/20191031093408.9322-1-jian-hong@endlessm.com
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
2019-11-26 13:13:14 -06:00
Linus Torvalds 323264eefb for-5.5/drivers-post-20191122
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3X/7gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgponxD/9mb/H9LD6/flEqoPv7n1dv7Y8Oe+AKpGLb
 a2Jh8ycpU6WtzZdlMYbQkxAqgJCupLTlih3WY3NuI1fwSsxwyMziQEnVnKgPNf7s
 PLWt+Qo5ryooyVkPi4KCHKjx2CFDUL4B1BqtSLm9n3eN72FRa9HCsCiMugjCbV+K
 aF7snzZ0ss+m7SnKIpdtXJjcdIFC2hXwCAGWAOv1vOPwhqQZFBsxjHKnEJtumTp2
 +wzLBPItLBbzHtAyopbiNlfsuHL9CF9L1QFaCTZE7N6eVWnlYpIhMCMrfEJ/e3jK
 27ct3PWAa4Qr2S/85//AMOyL9mxK96g9FjuGjxnKQZ+qWh89RPLq+oGs1zWSv2O0
 08BynWvxYHkoOR2+baPx89SHrlwyN0HCsvFBxVKrMsVHpy81sIYLFlytYaQUMx2/
 RkgkIxiAo5R5vYHPZYX8cU4c1rASjG0tYD9OA6e78MJFIBfkTl70XHHVX2Tgf5by
 fuwon/g+iVvN94QHb81ulZcA0hXRz8jM2RNIAPhqfoJXX6wNzD5MooNxTs9m88IP
 6/HaM1l6AJUtOMNA1aZbKAq+ARYuIA0/qHoSS0UVHoG8D4YkYaFpvYsAaA+TBzeO
 J8IcwSu6eVR1NvgJ9b4cwXFWSf75a/o4UeDP/1fYcQU5Gn/KQEdQVFSMvND1nnHW
 hUTP5AKFcQ==
 =oyE3
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block

Pull additional block driver updates from Jens Axboe:
 "Here's another block driver update, done to avoid conflicts with the
  zoned changes coming next.

  This contains:

   - Prepare SCSI sd for zone open/close/finish support

   - Small NVMe pull request
        - hwmon support (Akinobu)
        - add new co-maintainer (Christoph)
        - work-around for a discard issue on non-conformant drives
          (Eduard)

   - Small nbd leak fix"

* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
  nbd: prevent memory leak
  nvme: hwmon: add quirk to avoid changing temperature threshold
  nvme: hwmon: provide temperature min and max values for each sensor
  nvmet: add another maintainer
  nvme: Discard workaround for non-conformant devices
  nvme: Add hardware monitoring support
  scsi: sd_zbc: add zone open, close, and finish support
2019-11-25 11:18:03 -08:00
Linus Torvalds 2d53943090 for-5.5/drivers-20191121
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WyEAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplgbD/4jNeqT0q2IkNcUUEWkZWsBOlfi0SiclS5v
 X8JY1IxTlL0kaBWm83mw06JewucQ97Fh7xblPE8/iDHJqpgEX4vvSQY1b8hcDulZ
 YOKUnLkFU22nICeT04/8x/+f8gqD5KOlGxkgEvUKViQW15oc0oNu4St/yFM1QEN0
 qNMzpcfFXV9lYsOPl0y3pKdP+qbfcpeSmaFD9Z65gxN6rJy1WR8rtUGXy2luoiEc
 dh15IL9AGN/r8VTo8yRpD9PStiuJqpALIR8OHJSHPj+s0pQ6twk4aehcnYseAMbH
 zSDpa9AJrfqlnh8tUfKYLWi/PM7pMH0F01rAiQv47j/C0+QhbiOU/uTFTzUW5hQ1
 eK6XzJ0slxwnDsHLKf+xJmCj0Oyk0jDimNQr/2MNsuhmr29V5lfvBNflub8eOLyZ
 ie2Eulv+z6pYBSJx6kqm0X3vhXOy4wgU+X8LzvfcP9iAjgU1rfzxUWxLEj+KfJS2
 Nl+ERV9nafoPpoKpNR7zWRBUulp1qZJzo/U9JaUKiI5cWkIH1hhHmU2++xMeyJpb
 XHoDFNTGv6z/eef65eSveFD7F274TSi16K56Obk+4KWaSrIR0d6VwUA7FDmJbSI+
 Jqk1OFdaRGsQ5OcVxF1Qo4WChn0FvhcD0c+yL0N19WZ01QeYsb3hlA+MUPDtGQ04
 U79MPfu7iA==
 =i0jf
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Here are the main block driver updates for 5.5. Nothing major in here,
  mostly just fixes. This contains:

   - a set of bcache changes via Coly

   - MD changes from Song

   - loop unmap write-zeroes fix (Darrick)

   - spelling fixes (Geert)

   - zoned additions cleanups to null_blk/dm (Ajay)

   - allow null_blk online submit queue changes (Bart)

   - NVMe changes via Keith, nothing major here either"

* tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits)
  Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"
  drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
  drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
  bcache: don't export symbols
  bcache: remove the extra cflags for request.o
  bcache: at least try to shrink 1 node in bch_mca_scan()
  bcache: add idle_max_writeback_rate sysfs interface
  bcache: add code comments in bch_btree_leaf_dirty()
  bcache: fix deadlock in bcache_allocator
  bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
  bcache: deleted code comments for dead code in bch_data_insert_keys()
  bcache: add more accurate error messages in read_super()
  bcache: fix static checker warning in bcache_device_free()
  bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
  bcache: fix fifo index swapping condition in journal_pin_cmp()
  md/raid10: prevent access of uninitialized resync_pages offset
  md: avoid invalid memory access for array sb->dev_roles
  md/raid1: avoid soft lockup under high load
  null_blk: add zone open, close, and finish support
  dm: add zone open, close and finish support
  ...
2019-11-25 11:15:41 -08:00
Akinobu Mita 6c6aa2f26c nvme: hwmon: add quirk to avoid changing temperature threshold
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing
the value of the temperature threshold feature for specific devices that
show undesirable behavior.

Guenter reported:

"On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the
Composite temperature sensor results in a temperature warning, and that
warning is sticky until I reset the controller.

It doesn't seem to matter which temperature I write; writing -273000 has
the same result."

The Intel NVMe has the latest firmware version installed, so this isn't
a problem that was ever fixed.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-22 02:21:08 +09:00
Akinobu Mita 52deba0f02 nvme: hwmon: provide temperature min and max values for each sensor
According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure.  The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).

This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.

The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible.  Because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.

Now the alarm attribute for Composite Temperature indicates one of the
temperature is outside of a temperature threshold.  Because there is only
a single bit in Critical Warning field that indicates a temperature is
outside of a threshold.

Example output from the "sensors" command:

nvme-pci-0100
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -273.1°C, high = +69.8°C)
                       (crit = +79.8°C)
Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 5:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)

This also adds helper macros for kelvin from/to milli Celsius conversion,
and replaces the repeated code in hwmon.c.

Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-22 02:21:08 +09:00
Eduard Hasenleithner 530436c45e nvme: Discard workaround for non-conformant devices
Users observe IOMMU related errors when performing discard on nvme from
non-compliant nvme devices reading beyond the end of the DMA mapped
ranges to discard.

Two different variants of this behavior have been observed: SM22XX
controllers round up the read size to a multiple of 512 bytes, and Phison
E12 unconditionally reads the maximum discard size allowed by the spec
(256 segments or 4kB).

Make nvme_setup_discard unconditionally allocate the maximum DSM buffer
so the driver DMA maps a memory range that will always succeed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202665 many
Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>
[changelog, use existing define, kernel coding style]
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-13 06:37:38 +09:00
Guenter Roeck 400b6a7b13 nvme: Add hardware monitoring support
nvme devices report temperature information in the controller information
(for limits) and in the smart log. Currently, the only means to retrieve
this information is the nvme command line interface, which requires
super-user privileges.

At the same time, it would be desirable to be able to use NVMe temperature
information for thermal control.

This patch adds support to read NVMe temperatures from the kernel using the
hwmon API and adds temperature zones for NVMe drives. The thermal subsystem
can use this information to set thermal policies, and userspace can access
it using libsensors and/or the "sensors" command.

Example output from the "sensors" command:

nvme0-pci-0100
Adapter: PCI adapter
Composite:    +39.0°C  (high = +85.0°C, crit = +85.0°C)
Sensor 1:     +39.0°C
Sensor 2:     +41.0°C

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-12 01:57:35 +09:00
Linus Torvalds 5cb8418cb5 for-linus-2019-11-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3F+DIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkVmD/9FIa092Q6obga0RqA16GlbI85tgtNyRFZU
 neIO/g9R3G/uBTGbiUeXHXHDz9CqUXIRYX7pmI2u0b07iGLRz8oUsOsgyVEfCMen
 VitqwkJJAZ9j9OifyKpLYCZX9ulVDWX5hEz/vm2cNWDkjCbOpXvRuQmkXEzp7RNM
 F7K25PpGLvJHfqC90q9FXXxNDlB2i1M/rh5I7eUqhb6rHmzfJGKCd+H80t+REoB1
 iXAygPj86agQKLOUKZtJjXUjq9Ol/0FD+OKY+eP3EfVv/FJvIeWWYe78WplxJpRD
 BYb9dhLMCSo619WVVy4hNYCPjSfPKVT2cO5QJmVRpgOI1urFuTNgNoIiw06AgvkZ
 09vrlrJZ5A7eFEppuAFQC4WRYKWCQCQfq8wxt2iGUivXgHfskjJ7qJz1Mh5Nlxsr
 JGm1hSVw9UCBzjqC75K2CR+vVt4T8ovEaizPFvzVj6lQ0lRTmSchisxbXzTdevOn
 kFvBOntBdpeSq/CwZ0x5PDP4AbRsgH3ny47LCJoEpFZlxlOLuEB/dAx564dn6ZgZ
 rRz6mKU7rlO7brrkW+DbRZ0XiOn5qzN6FrcSGX1DBzc4+hMus/5PPjv8Gk+Wo1PU
 388mu6N6DObSjw1ij1AqkwyzudmbrziKM4isJY4I72I0YGSq9cuG5VXgM2GqbGlU
 XXpzsu8pSw==
 =8B/z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Two NVMe device removal crash fixes, and a compat fixup for for an
   ioctl that was introduced in this release (Anton, Charles, Max - via
   Keith)

 - Missing error path mutex unlock for drbd (Dan)

 - cgroup writeback fixup on dead memcg (Tejun)

 - blkcg online stats print fix (Tejun)

* tag 'for-linus-2019-11-08' of git://git.kernel.dk/linux-block:
  cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead
  block: drbd: remove a stray unlock in __drbd_send_protocol()
  blkcg: make blkcg_print_stat() print stats only for online blkgs
  nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
  nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
  nvme-rdma: fix a segmentation fault during module unload
2019-11-08 18:15:55 -08:00
Anton Eidelman 763303a83a nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme_mpath_clear_ctrl_paths() iterates through
the ctrl->namespaces list while holding ctrl->scan_lock.
This does not seem to be the correct way of protecting
from concurrent list modification.

Specifically, nvme_scan_work() sorts ctrl->namespaces
AFTER unlocking scan_lock.

This may result in the following (rare) crash in ctrl disconnect
during scan_work:

    BUG: kernel NULL pointer dereference, address: 0000000000000050
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
    RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
    ...
    Call Trace:
     nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
     nvme_remove_namespaces+0x35/0xe0 [nvme_core]
     nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
     nvme_sysfs_delete+0x49/0x60 [nvme_core]
     dev_attr_store+0x17/0x30
     sysfs_kf_write+0x3e/0x50
     kernfs_fop_write+0x11e/0x1a0
     __vfs_write+0x1b/0x40
     vfs_write+0xb9/0x1a0
     ksys_write+0x67/0xe0
     __x64_sys_write+0x1a/0x20
     do_syscall_64+0x5a/0x130
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f8d02bfb154

Fix:
After taking scan_lock in nvme_mpath_clear_ctrl_paths()
down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
This will not cause deadlocks because taking scan_lock never happens
while holding the namespaces_rwsem.
Moreover, scan work downs namespaces_rwsem in the same order.

Alternative: sort ctrl->namespaces in nvme_scan_work()
while still holding the scan_lock.
This would leave nvme_mpath_clear_ctrl_paths() without correct protection
against ctrl->namespaces modification by anyone other than scan_work.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-06 00:30:37 +09:00
Max Gurtovoy 9ad9e8d6ca nvme-rdma: fix a segmentation fault during module unload
In case there are controllers that are not associated with any RDMA
device (e.g. during unsuccessful reconnection) and the user will unload
the module, these controllers will not be freed and will access already
freed memory. The same logic appears in other fabric drivers as well.

Fixes: 87fd125344 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-06 00:29:23 +09:00
Prabhath Sajeepa 64fab7290d nvme: Fix parsing of ANA log page
Check validity of offset into ANA log buffer before accessing
nvme_ana_group_desc. This check ensures the size of ANA log buffer >=
offset + sizeof(nvme_ana_group_desc)

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig 716fd9c119 nvmet: stop using bio_set_op_attrs
bio_set_op_attrs has been long deprecated, replace it with a direct
assignment of the flags to bio->bi_opf.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig 9dea0c81ee nvmet: add plugging for read/write when ns is bdev
With reference to the following issue reported on the mailing list :-
http://lists.infradead.org/pipermail/linux-nvme/2019-October/027604.html
this patch adds plugging for the bdev-ns under nvmet_bdev_execute_rw().

We can see the following performance improvement in random write
workload I/Os with the setup described in the link when device_path
configured as /dev/md0.

Without this patch :-

  write: IOPS=40.8k, BW=159MiB/s (167MB/s)(4777MiB/30002msec)
  write: IOPS=41.2k, BW=161MiB/s (169MB/s)(4831MiB/30011msec)
    slat (usec): min=8,  max=10823, avg=15.64,  stdev=16.85
    slat (usec): min=8,  max=401,   avg=15.40,  stdev= 9.56
    clat (usec): min=54, max=2492,  avg=759.07, stdev=172.62
    clat (usec): min=56, max=1997,  avg=768.06, stdev=178.72

With this patch :-

  write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30011msec)
  write: IOPS=123k, BW=481MiB/s (504MB/s)(14.1GiB/30002msec)
    slat (usec): min=8,  max=9941,  avg=13.31,  stdev= 8.04
    slat (usec): min=8,  max=289,   avg=13.31,  stdev= 3.37
    clat (usec): min=43, max=17635, avg=245.46, stdev=171.23
    clat (usec): min=44, max=17751, avg=245.25, stdev=183.14

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig d84dd8cde6 nvmet: clean up command parsing a bit
Move the special cases for fabrics commands and the discovery controller
to nvmet_parse_admin_cmd in preparation for adding passthrough support.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Geert Uytterhoeven 05d3046ff7 nvme-pci: Spelling s/resdicovered/rediscovered/
Fix misspelling of "rediscovered".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Sagi Grimberg d4b3a17411 nvmet: fill discovery controller sn, fr and mn correctly
Discovery controllers need this information as well.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig be3f3114dd nvmet: Open code nvmet_req_execute()
Now that nvmet_req_execute does nothing, open code it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig e9061c3978 nvmet: Remove the data_len field from the nvmet_req struct
Instead of storing the expected length and checking it when it's
executed, just check the length inside the command themselves.

A new helper, nvmet_check_data_len() is created to help with this
check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, udpate changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig 59ef0eaa77 nvmet: Introduce nvmet_dsm_len() helper
Similar to the nvmet_rw_len helper.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:42 -07:00
Christoph Hellwig 6f86f2c9d9 nvmet: Cleanup discovery execute handlers
Push the lid and cns check into their respective handlers and, while
we're at it, rename the functions to be consistent with other
discovery handlers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Christoph Hellwig 2cb6963a16 nvmet: Introduce common execute function for get_log_page and identify
Instead of picking the sub-command handler to execute in a nested
switch statement introduce a landing functions that calls out
to the appropriate sub-command handler.

This will allow us to have a common place in the handler to check
the transfer length in a future patch.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update change log]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Logan Gunthorpe c73eebc07a nvmet-tcp: Don't set the request's data_len
It's not apprporiate for the transports to set the data_len
field of the request which is only used by the core.

In this case, just use a variable on the stack to store the
length of the sgl for comparison.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Logan Gunthorpe e0bace7177 nvmet-tcp: Don't check data_len in nvmet_tcp_map_data()
None of the other transports check data_len which is verified
in core code. The function should instead check that the sgl length
is non-zero.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Damien Le Moal e08f2ae850 nvme: Introduce nvme_lba_to_sect()
Introduce the new helper function nvme_lba_to_sect() to convert a device
logical block number to a 512B sector number. Use this new helper in
obvious places, cleaning up the code.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Damien Le Moal 314d48dd22 nvme: Cleanup and rename nvme_block_nr()
Rename nvme_block_nr() to nvme_sect_to_lba() and use SECTOR_SHIFT
instead of its hard coded value 9. Also add a comment to decribe this
helper.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Max Gurtovoy 16686f3a6c nvme: move common call to nvme_cleanup_cmd to core layer
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd
(symmetrical functions). Move the call for nvme_cleanup_cmd to the common
core layer and call it during nvme_complete_rq for the good flow. For
error flow, each transport will call nvme_cleanup_cmd independently. Also
take care of a special case of path failure, where we call
nvme_complete_rq without doing nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Max Gurtovoy 2dc3947b53 nvme: introduce "Command Aborted By host" status code
Fix the status code of canceled requests initiated by the host according
to TP4028 (Status Code 0x371):
"Command Aborted By host: The command was aborted as a result of host
action (e.g., the host disconnected the Fabric connection)."

Also in a multipath environment, unless otherwise specified, errors of
this type (path related) should be retried using a different path, if
one is available.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Israel Rukshin 59534b9d60 nvmet-rdma: add unlikely check at nvmet_rdma_map_sgl_keyed
The calls to nvmet_req_alloc_sgl and rdma_rw_ctx_init should usually
succeed, so add this simple optimization to the fast path.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:41 -07:00
Israel Rukshin e522f44602 nvmet: add unlikely check at nvmet_req_alloc_sgl
The call to sgl_alloc shouldn't fail so add this simple optimization to
the fast path.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
Israel Rukshin 4d764bb9a9 nvmet: use bio_io_error instead of duplicating it
This commit doesn't change any logic.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
Israel Rukshin 58a8df67e0 nvme: introduce nvme_is_aen_req function
This function improves code readability and reduces code duplication.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
James Smart bcde5f0fc7 nvme-fc: ensure association_id is cleared regardless of a Disconnect LS
Code today only clears the association_id if a Disconnect LS is transmit.

Remove ambiguity and unconditionally clear the association_id if the
association has been terminated.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
James Smart 7db394848e nvme-fc: clarify error messages
Change wording on a couple of messages to clarify what happened.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
James Smart 44fbf3bb1a nvme-fc: Set new cmd set indicator in nvme-fc cmnd iu
Set the new category field in the FC-NVME CMND_IU based on queue number.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
James Smart 53b2b2f599 nvme-fc and nvmet-fc: sync with FC-NVME-2 header changes
Sync sources with revised structure and field names to correspond with
FC-NVME-2 header sync-up.

Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 10:56:40 -07:00
Linus Torvalds 1204c70d9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

 2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

 4) Add an r8152 device ID, from Kazutoshi Noguchi.

 5) Missing include in ipv6's addrconf.c, from Ben Dooks.

 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

 7) Several netdevice nesting depth fixes from Taehee Yoo.

 8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

 9) Fix jumbo packet handling in RXRPC, from David Howells.

10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

11) Fix DMA synchronization in gve driver, from Yangchun Fu.

12) Several bpf offload fixes, from Jakub Kicinski.

13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfent Xiao.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  net: fix installing orphaned programs
  net: cls_bpf: fix NULL deref on offload filter removal
  selftests: bpf: Skip write only files in debugfs
  selftests: net: reuseport_dualstack: fix uninitalized parameter
  r8169: fix wrong PHY ID issue with RTL8168dp
  net: dsa: bcm_sf2: Fix IMP setup for port different than 8
  net: phylink: Fix phylink_dbg() macro
  gve: Fixes DMA synchronization.
  inet: stop leaking jiffies on the wire
  ixgbe: Remove duplicate clear_bit() call
  Documentation: networking: device drivers: Remove stray asterisks
  e1000: fix memory leaks
  i40e: Fix receive buffer starvation for AF_XDP
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  igb: Enable media autosense for the i350.
  igb/igc: Don't warn on fatal read failures when the device is removed
  tcp: increase tcp_max_syn_backlog max value
  net: increase SOMAXCONN to 4096
  netdevsim: Fix use-after-free during device dismantle
  ...
2019-11-01 17:48:11 -07:00
Anton Eidelman 86cccfbf77 nvme-multipath: remove unused groups_only mode in ana log
groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Anton Eidelman af8fd04247 nvme-multipath: fix possible io hang after ctrl reconnect
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.

Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Eric Dumazet 3f926af3f4 net: use skb_queue_empty_lockless() in busy poll contexts
Busy polling usually runs without locks.
Let's use skb_queue_empty_lockless() instead of skb_queue_empty()

Also uses READ_ONCE() in __skb_try_recv_datagram() to address
a similar potential problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28 13:33:41 -07:00
Arnd Bergmann 1832f2d8ff compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:44 +02:00
Kevin Hao a4f40484e7 nvme-pci: Set the prp2 correctly when using more than 4k page
In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.

Fixes: dff824b2aa ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-18 23:09:41 +09:00
Max Gurtovoy 28a4cac48c nvme-tcp: fix possible leakage during error flow
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
since it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-15 22:47:29 +09:00
Max Gurtovoy 5812d04c4c nvmet-loop: fix possible leakage during error flow
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-15 22:47:28 +09:00
Sebastian Andrzej Siewior ac1c4e1885 nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
The access to sk->sk_ll_usec should be hidden behind
CONFIG_NET_RX_BUSY_POLL like the definition of sk_ll_usec.

Put access to ->sk_ll_usec behind CONFIG_NET_RX_BUSY_POLL.

Fixes: 1a9460cef5 ("nvme-tcp: support simple polling")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14 23:27:01 +09:00
Keith Busch c1ac9a4b07 nvme: Wait for reset state when required
Prevent simultaneous controller disabling/enabling tasks from interfering
with each other through a function to wait until the task successfully
transitioned the controller to the RESETTING state. This ensures disabling
the controller will not be interrupted by another reset path, otherwise
a concurrent reset may leave the controller in the wrong state.

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14 23:22:00 +09:00
Keith Busch 4c75f87785 nvme: Prevent resets during paused controller state
A paused controller is doing critical internal activation work in the
background. Prevent subsequent controller resets from occurring during
this period by setting the controller state to RESETTING first. A helper
function, nvme_try_sched_reset_work(), is introduced for these paths so
they may continue with scheduling the reset_work after they've completed
their uninterruptible critical section.

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14 23:21:54 +09:00
Keith Busch 92b98e88d5 nvme: Restart request timers in resetting state
A controller in the resetting state has not yet completed its recovery
actions. The pci and fc transports were already handling this, so update
the remaining transports to not attempt additional recovery in this
state. Instead, just restart the request timer.

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14 23:21:49 +09:00
Keith Busch 5d02a5c1d6 nvme: Remove ADMIN_ONLY state
The admin only state was intended to fence off actions that don't
apply to a non-IO capable controller. The only actual user of this is
the scan_work, and pci was the only transport to ever set this state.
The consequence of having this state is placing an additional burden on
every other action that applies to both live and admin only controllers.

Remove the admin only state and place the admin only burden on the only
place that actually cares: scan_work.

This also prepares to make it easier to temporarily pause a LIVE state
so that we don't need to remember which state the controller had been in
prior to the pause.

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14 23:21:44 +09:00