Commit graph

983475 commits

Author SHA1 Message Date
Damien Le Moal ee8f353b15 block: remove skd driver
The STEC S1220 PCIe SSD cards are EOL since 2014 and not supported by
the vendor anymore. As the skd driver for this SSD is starting to cause
problems with improvements to the block layer, stop supporting it in
newer kernel versions.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-04 07:45:32 -07:00
Jens Axboe 203c018079 Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.12/drivers
Pull MD fix from Song.

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid5: cast chunk_sectors to sector_t value
2021-02-04 07:37:40 -07:00
Jens Axboe 1dced56c3a Floppy patch for 5.12
- O_NDELAY/O_NONBLOCK fix for floppy from Jiri Kosina.
   libblkid is using O_NONBLOCK when probing devices.
   This leads to pollution of kernel log with error
   messages from floppy driver. Also the driver fails
   a mount prior to being opened without O_NONBLOCK
   at least once. The patch fixes the issues.
 
 Signed-off-by: Denis Efremov <efremov@linux.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdlQDNgKUDfGSD+QDtSKVsDNQMB8FAmAbxhIACgkQtSKVsDNQ
 MB+1fRAAjl9yvWmWQdq7wCUpcKk7gL/qPVBVxU8IrI+byJp1tbNkwFlOP16/JBiT
 +K7a8NddJaDtYNTx7BXcyWAlXYdqGo065yQYu2doTwrCco3x84OhA+YJAheYII+0
 DpOKiJGrwgVXst5wlLEmyHfVeVTxQMm4iRCq/nl5U9FCrwcpDwGtNo8DZpHK1ZUp
 Bi5LCnejuYYk4FvSZZjD2WbieQhju8n5q/d4LGnt4pgBpC56JLqD5IJZZtEfApFi
 bYwNLsGUGtb3jZrkjPueHUHuF167bNrnPnyBRBWZuxAZpf7WzzPL7xR3CcX+N8Fc
 u1oUw87Adrs2XfCB/7mR3jBxffd1YoaQ0lu0L4EbGOuBOJmKOAYyYIssn6dgqxRz
 PoZ6ixMYFHqYIx/KFlK3UI7BMEbT34Swwhx2JlJA7i8Etp+dyH5dPlhkCqh/YC+6
 bLncPWQslrR+b3GhlOu2AX3SG2RaIEYhUMA5BcnGF8/j3GECDC2l3Q9drvdNbz78
 bKjsRME+9EKOXc/OJWPRF9WUROYSQnK0wQYVQosP+ufP8IbwOM3R0QwbS9xA3z5x
 vI4PoJ7Sm8xcLYsAKzh1+LLHPVstEPoshzsvD240txamoiyDaWkIQQT8J4GoOR6x
 kdK4UnY7/3SnX4nVkO9okDl6Q2DW4KdEnXhBleLa/26bn2yMMeU=
 =pv9Q
 -----END PGP SIGNATURE-----

Merge tag 'floppy-for-5.12' of https://github.com/evdenis/linux-floppy into for-5.12/drivers

Pull floppy fix from Denis:

"Floppy patch for 5.12

- O_NDELAY/O_NONBLOCK fix for floppy from Jiri Kosina.
  libblkid is using O_NONBLOCK when probing devices.
  This leads to pollution of kernel log with error
  messages from floppy driver. Also the driver fails
  a mount prior to being opened without O_NONBLOCK
  at least once. The patch fixes the issues."

Signed-off-by: Denis Efremov <efremov@linux.com>

* tag 'floppy-for-5.12' of https://github.com/evdenis/linux-floppy:
  floppy: reintroduce O_NDELAY fix
2021-02-04 07:36:49 -07:00
Jiri Kosina 8a0c014cd2 floppy: reintroduce O_NDELAY fix
This issue was originally fixed in 09954bad4 ("floppy: refactor open()
flags handling").

The fix as a side-effect, however, introduce issue for open(O_ACCMODE)
that is being used for ioctl-only open. I wrote a fix for that, but
instead of it being merged, full revert of 09954bad4 was performed,
re-introducing the O_NDELAY / O_NONBLOCK issue, and it strikes again.

This is a forward-port of the original fix to current codebase; the
original submission had the changelog below:

====
Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2101221209060.5622@cbobk.fhfr.pm
Cc: stable@vger.kernel.org
Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Reported-and-tested-by: Kurt Garloff <kurt@garloff.de>
Fixes: 09954bad4 ("floppy: refactor open() flags handling")
Fixes: f2791e7ead ("Revert "floppy: refactor open() flags handling"")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Denis Efremov <efremov@linux.com>
2021-02-04 13:00:24 +03:00
Guoqing Jiang c5eec74f25 md/raid5: cast chunk_sectors to sector_t value
Currently, raid5 calculates dev_sectors from chunk_sectors without
proper cast, which is problematic.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2021-02-03 22:48:16 -08:00
Jens Axboe 0d7389718c nvme updates for 5.12:
- failed reconnect fixes (Chao Leng)
  - various tracing improvements (Michal Krakowiak, Johannes Thumshirn)
  - switch the nvmet-fc assoc_list to use RCU protection (Leonid Ravich)
  - resync the status codes with the latest spec (Max Gurtovoy)
  - minor nvme-tcp improvements (Sagi Grimberg)
  - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya Kulkarni,
    Israel Rukshin)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmAZG84LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMxnBAAjvNHZHS2ZDfZVGSo65PgTeOIG89X8J2JlUF+reIY
 iRrXwHu0OBiUkyXhHF8pwZycLmkl6N3QMUsBKpA8Bz0NXH+wAvw0hqR/BmEjJPeR
 mmMa3lLCVY9m7EmD/PaLiSGExZhKHWO98Xx/5u8vKy7sJcqsocFFP6mfpYAbpJtx
 943VgdzMuZ/Mp/lKH+TvIqLlHcJegTOFnOc/LsnYS1keVfbA3wcqwi8X+Z6gWwss
 RtECjbmGkHhG9q5pXkezs7EvTVaKZf6UhrS019gSAwfLcWvWaj4oQo3WC0z0ztTI
 Zm2Y2jlW2vPk0P3/xiG4R2GcX0dVeJcyaVdkCeGM/oixfnQf2vB0nbIoSvd3UoKW
 Ed21Vn7KprqsGiFqOVzzzoX8KHLaPBjva3FVYtWgSmdRF4aPoVh56QtW7NaBtqTv
 qLgIqEL0VQsMovLBwuHff3szt38ISh7PKQstIqHUsAAOZCMzPzBZxjf7RkbFaBUP
 NyicRxvDt+Pnh2nrmG6sHXxVDvagOa1Z1Z5PJJMvbVJYlu5dP5Toq6uxgqzDrT3m
 /ph4BkQlxWLEQ60X1EtrNr8duUs6ymkZbVreOocG0mgpKeJsohm7QaEI7NkA+8kw
 7ZHXFHT+KycXFbUJ1F/toOkoql9ywMqUpgC7etr7vvH29hSijCguI3EOX/FMA8Sr
 l1U=
 =mwWd
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.21-2020-02-02' of git://git.infradead.org/nvme into for-5.12/drivers

Pull NVMe updates from Christoph:

"nvme updates for 5.12:

 - failed reconnect fixes (Chao Leng)
 - various tracing improvements (Michal Krakowiak, Johannes Thumshirn)
 - switch the nvmet-fc assoc_list to use RCU protection (Leonid Ravich)
 - resync the status codes with the latest spec (Max Gurtovoy)
 - minor nvme-tcp improvements (Sagi Grimberg)
 - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya Kulkarni,
   Israel Rukshin)"

* tag 'nvme-5.21-2020-02-02' of git://git.infradead.org/nvme: (22 commits)
  nvme-tcp: use cancel tagset helper for tear down
  nvme-rdma: use cancel tagset helper for tear down
  nvme-tcp: add clean action for failed reconnection
  nvme-rdma: add clean action for failed reconnection
  nvme-core: add cancel tagset helpers
  nvme-core: get rid of the extra space
  nvme: add tracing of zns commands
  nvme: parse format nvm command details when tracing
  nvme: update enumerations for status codes
  nvmet: add lba to sect conversion helpers
  nvmet: remove extra variable in identify ns
  nvmet: remove extra variable in id-desclist
  nvmet: remove extra variable in smart log nsid
  nvme: refactor ns->ctrl by request
  nvme-tcp: pass multipage bvec to request iov_iter
  nvme-tcp: get rid of unused helper function
  nvme-tcp: fix wrong setting of request iov_iter
  nvme: support command retry delay for admin command
  nvme: constify static attribute_group structs
  nvmet-fc: use RCU proctection for assoc_list
  ...
2021-02-02 07:11:47 -07:00
Chao Leng 563c81586d nvme-tcp: use cancel tagset helper for tear down
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean code for
tear down process.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:13 +01:00
Chao Leng c4189d680e nvme-rdma: use cancel tagset helper for tear down
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean code for
tear down process.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng 70a99574a7 nvme-tcp: add clean action for failed reconnection
If reconnect failed after start io queues, the queues will be unquiesced
and new requests continue to be delivered. Reconnection error handling
process directly free queues without cancel suspend requests. The
suppend request will time out, and then crash due to use the queue
after free.

Add sync queues and cancel suppend requests for reconnection error
handling.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng 958dc1d32c nvme-rdma: add clean action for failed reconnection
A crash happens when inject failed reconnection.
If reconnect failed after start io queues, the queues will be unquiesced
and new requests continue to be delivered. Reconnection error handling
process directly free queues without cancel suspend requests. The
suppend request will time out, and then crash due to use the queue
after free.

Add sync queues and cancel suppend requests for reconnection error
handling.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng 2547906982 nvme-core: add cancel tagset helpers
Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and
reconnection error handling.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chaitanya Kulkarni 8f8ea928fd nvme-core: get rid of the extra space
Remove the extra space in the nvme_free_cels() when calling
xa_for_each loop which is not a common practice
(except drivers/infiniband/core/ not sure why).

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Johannes Thumshirn 4a407d5ebc nvme: add tracing of zns commands
When support for the NVMe ZNS commands was merged, tracing of these has
been omitted.

Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as
nvme_cmd_zone_append to the nvme driver's tracing facility.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Michal Krakowiak 3a98c51a24 nvme: parse format nvm command details when tracing
Add detailed parsing of format nvm admin command to make the
trace log more consistent and human-readable.

Signed-off-by: Michal Krakowiak <michal.krakowiak@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Max Gurtovoy 3254899e0b nvme: update enumerations for status codes
All the updates are mentioned in the ratified NVMe 1.4 spec.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chaitanya Kulkarni 193fcf371f nvmet: add lba to sect conversion helpers
In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.

Use these helpers in the block device backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni 3c7b224f19 nvmet: remove extra variable in identify ns
We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_ns() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni 3631c7f4a2 nvmet: remove extra variable in id-desclist
We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_desclist() since req already has ns member that
can be reused, this also eliminates the explicit call to
nvmet_put_namespace() which is already present in the request
completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni 624e67fdf9 nvmet: remove extra variable in smart log nsid
We remove the extra local variable struct nvmet_ns in
nvmet_get_smart_log_nsid() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Minwoo Im fc97e942d9 nvme: refactor ns->ctrl by request
Just for current code in nvme_cleanup_cmd(), we don't have to get
namespace instance, but we need controller instance.

Controller instance can be retrieved by namespace instance, but it can
be directly accessed by nvme_request instance from request.

	ctrl = nvme_req(req)->ctrl;

We don't have to go around namespace instance from request instance
through gendisk.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg 0dc9edaf80 nvme-tcp: pass multipage bvec to request iov_iter
iov_iter uses the right helpers so we should be able
to pass in a multipage bvec. Right now the iov_iter is
initialized with more segments that it needs which doesn't
fail because the iov_iter is capped by byte count, but it
is better to use a full multipage bvec iter.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg 60141aa08c nvme-tcp: get rid of unused helper function
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg cb9b870fba nvme-tcp: fix wrong setting of request iov_iter
We might set the iov_iter direction wrong, which is harmless for this
use-case, but get it right. Also this makes the code slightly cleaner.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Minwoo Im f9063a5327 nvme: support command retry delay for admin command
The controller can request a delay retrying a failed command by setting
the Command Retry Delay (CRD) field in the Completion Queue Entry.

Currentlty this features is only applied to commands on the I/O queue, but
not to commands on the admin queue.  Retreive the nvme_ctrl from the
request so that no namespace is required and apply the feature to all
commands.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Rikard Falkeborn 60b152a508 nvme: constify static attribute_group structs
The only usage of these is to put their addresses in arrays of pointers
to const attribute_groups. Make them const to allow the compiler to put
them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Leonid Ravich 4e2f02bf77 nvmet-fc: use RCU proctection for assoc_list
searching assoc_list protected by rcu_read_lock if list not changed inline.
and according to the rcu list rules.

queue array embedded into nvmet_fc_tgt_assoc protected by rcu_read_lock
according to rcu dereference/assign rules.

queue and assoc object freed after grace period by call_rcu.

tgtport lock taken for changing assoc_list.

Reviewed-by: Eldad Zinger <Eldad.Zinger@dell.com>
Reviewed-by: Elad Grupi <Elad.Grupi@dell.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Leonid Ravich <Leonid.Ravich@emc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Israel Rukshin 36ca03c830 nvmet: Fix nvmet_is_port_enabled indentation
Remove extra tab.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Israel Rukshin cc34562261 nvmet: Use nvmet_is_port_enabled helper for pi_enable
Remove code duplication.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Joe Perches e8628013e5 drbd: Avoid comma separated statements
Use semicolons and braces.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-31 08:05:36 -07:00
Yang Li 9abe47cc5c rsxx: remove redundant NULL check
Fix below warnings reported by coccicheck:
./drivers/block/rsxx/dma.c:948:3-8: WARNING: NULL check
before some freeing functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:15:41 -07:00
Tian Tao 294ed6b9f0 zram: fix NULL check before some freeing functions is not needed
fixed the below warning:
/drivers/block/zram/zram_drv.c:534:2-8: WARNING: NULL check
before some freeing functions is not needed.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:12:08 -07:00
Guoqing Jiang 370276bac8 drbd: remove unused argument from drbd_request_prepare and __drbd_make_request
We can remove start_jif since it is not used by drbd_request_prepare,
then remove it from __drbd_make_request further.

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:11:34 -07:00
Bjorn Helgaas 2126979183 mtip32xx: prefer pcie_capability_read_word()
Replace pci_read_config_word() with pcie_capability_read_word().

pcie_capability_read_word() takes care of a few special cases when reading
the PCIe capability.  See 8c0d3a02c1 ("PCI: Add accessors for PCI Express
Capability").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:09:39 -07:00
Bjorn Helgaas 416c054777 mtip32xx: use PCI #defines instead of numbers
Use PCI #defines for PCIe Device Control register values instead of
hard-coding bit positions.  No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:09:37 -07:00
Pavel Tatashin 6cc8e74308 loop: scale loop device by introducing per device lock
Currently, loop device has only one global lock: loop_ctl_mutex.

This becomes hot in scenarios where many loop devices are used.

Scale it by introducing per-device lock: lo_mutex that protects
modifications of all fields in struct loop_device.

Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup,
loop_add.

The new lock ordering requirement is that loop_ctl_mutex must be taken
before lo_mutex.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:08:54 -07:00
Jan Kara 767630c63b bdev: Do not return EBUSY if bdev discard races with write
blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.

Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 10:22:18 -07:00
Christoph Hellwig 46bbf653a6 block: inherit BIO_REMAPPED when cloning bios
Cloned bios are can be used to on the same device, in which case we need
to inherit the BIO_REMAPPED flag to avoid a double partition remap.  When
the cloned bios are used on another device, bio_set_dev will clear the flag.

Fixes: 309dca309f ("block: store a block_device pointer in struct bio")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Christoph Hellwig f65b95fe0c bcache: use bio_set_dev to assign ->bi_bdev
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Christoph Hellwig a7c7f7b2b6 nvme: use bio_set_dev to assign ->bi_bdev
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Jens Axboe a5bf0a92e1 bfq: bfq_check_waker() should be static
It's only used in the same file, mark is appropriately static.

Fixes: 71217df39d ("block, bfq: make waker-queue detection more robust")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 21:15:01 -07:00
Paolo Valente 71217df39d block, bfq: make waker-queue detection more robust
In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commits addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to actual waker.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:37 -07:00
Paolo Valente 5a5436b98d block, bfq: save also injection state on queue merging
To prevent injection information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:35 -07:00
Paolo Valente e673914d52 block, bfq: save also weight-raised service on queue merging
To prevent weight-raising information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:34 -07:00
Paolo Valente d1f600fa47 block, bfq: fix switch back from soft-rt weitgh-raising
A bfq_queue may happen to be deemed as soft real-time while it is
still enjoying interactive weight-raising. If this happens because of
a false positive, then the bfq_queue is likely to loose its soft
real-time status soon. Upon losing such a status, the bfq_queue must
get back its interactive weight-raising, if its interactive period is
not over yet. But this case is not handled. This commit corrects this
error.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:32 -07:00
Paolo Valente 7f1995c27b block, bfq: re-evaluate convenience of I/O plugging on rq arrivals
Upon an I/O-dispatch attempt, BFQ may detect that it was better to
plug I/O dispatch, and to wait for a new request to arrive for the
currently in-service queue. But the arrival of a new request for an
empty bfq_queue, and thus the switch from idle to busy of the
bfq_queue, may cause the scenario to change, and make plugging no
longer needed for service guarantees, or more convenient for
throughput. In this case, keeping I/O-dispatch plugged would certainly
lower throughput.

To address this issue, this commit makes such a check, and stops
plugging I/O if it is better to stop plugging I/O.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:31 -07:00
Paolo Valente eb2fd80f9d block, bfq: replace mechanism for evaluating I/O intensity
Some BFQ mechanisms make their decisions on a bfq_queue basing also on
whether the bfq_queue is I/O bound. In this respect, the current logic
for evaluating whether a bfq_queue is I/O bound is rather rough. This
commits replaces this logic with a more effective one.

The new logic measures the percentage of time during which a bfq_queue
is active, and marks the bfq_queue as I/O bound if the latter if this
percentage is above a fixed threshold.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:29 -07:00
Christoph Hellwig 3a905c37c3 block: skip bio_check_eod for partition-remapped bios
When an already remapped bio is resubmitted (e.g. by blk_queue_split),
bio_check_eod will compare the remapped bi_sector against the size
of the partition, leading to spurious I/O failures.

Skip the EOD check in this case.

Fixes: 309dca309f ("block: store a block_device pointer in struct bio")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 11:41:34 -07:00
Pavel Begunkov c42bca92be bio: don't copy bvec for direct IO
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 08:58:24 -07:00
Pavel Begunkov 3e1a88ec96 bio: add a helper calculating nr segments to alloc
Add a helper function calculating the number of bvec segments we need to
allocate to construct a bio. It doesn't change anything functionally,
but will be used to not duplicate special cases in the future.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 08:58:24 -07:00
Pavel Begunkov 54c8195b4e iov_iter: optimise bvec iov_iter_advance()
iov_iter_advance() is heavily used, but implemented through generic
means. For bvecs there is a specifically crafted function for that, so
use bvec_iter_advance() instead, it's faster and slimmer.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 08:58:24 -07:00