Commit graph

1044546 commits

Author SHA1 Message Date
Christoph Hellwig 74e157e6a4 jfs: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:23 -06:00
Christoph Hellwig 78ed961bce hfsplus: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:23 -06:00
Christoph Hellwig beffd16e68 hfs: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 9e48243b65 fat: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 5816e91e4a cramfs: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig cda00eba02 btrfs: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 589aa7bc40 affs: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig bcd1d06350 fs: simplify init_page_buffers
No need to convert from bdev to inode and back.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211018101130.1838532-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig b86058f96c fs: use bdev_nr_bytes instead of open coding it in blkdev_max_block
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 64f0f42671 target/iblock: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig c68f3ef777 nvmet: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 0fe80347fd md: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 6dcbb52cdd dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them
Use the proper helpers to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig da7b392467 drbd: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig cda25b82c4 bcache: remove bdev_sectors
Use the equivalent block layer helper instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211018101130.1838532-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 6436bd90f7 block: add a bdev_nr_bytes helper
Add a helper to query the size of a block device in bytes.  This
will be used to remove open coded access to ->bd_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig 99457db8b4 block: move the SECTOR_SIZE related definitions to blk_types.h
Ensure these are always available for inlines in the various block layer
headers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Jens Axboe 4f5022453a nvme: wire up completion batching for the IRQ path
Trivial to do now, just need our own io_comp_batch on the stack and pass
that in to the usual command completion handling.

I pondered making this dependent on how many entries we had to process,
but even for a single entry there's no discernable difference in
performance or latency. Running a sync workload over io_uring:

t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1

yields the below performance before the patch:

IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

and the following after:

IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

which definitely isn't slower, about the same if you factor in a bit of
variance. For peak performance workloads, benchmarking shows a 2%
improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:47 -06:00
Jens Axboe b688f11e86 io_uring: utilize the io batching infrastructure for more efficient polled IO
Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
supports it, we can handle high rates of polled IO more efficiently.

This raises the single core efficiency on my system from ~6.1M IOPS to
~6.6M IOPS running a random read workload at depth 128 on two gen2
Optane drives.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:46 -06:00
Jens Axboe c234a65392 nvme: add support for batched completion of polled IO
Take advantage of struct io_comp_batch, if passed in to the nvme poll
handler. If it's set, rather than complete each request individually
inline, store them in the io_comp_batch list. We only do so for requests
that will complete successfully, anything else will be completed inline as
before.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:45 -06:00
Jens Axboe f794f3351f block: add support for blk_mq_end_request_batch()
Instead of calling blk_mq_end_request() on a single request, add a helper
that takes the new struct io_comp_batch and completes any request stored
in there.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:43 -06:00
Jens Axboe 1aec5e4a29 sbitmap: add helper to clear a batch of tags
sbitmap currently only supports clearing tags one-by-one, add a helper
that allows the caller to pass in an array of tags to clear.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:42 -06:00
Jens Axboe 5a72e899ce block: add a struct io_comp_batch argument to fops->iopoll()
struct io_comp_batch contains a list head and a completion handler, which
will allow completions to more effciently completed batches of IO.

For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:40 -06:00
Jens Axboe 013a7f9543 block: provide helpers for rq_list manipulation
Instead of open-coding the list additions, traversal, and removal,
provide a basic set of helpers.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:39 -06:00
Jens Axboe afd7de03c5 block: remove some blk_mq_hw_ctx debugfs entries
Just like the blk_mq_ctx counterparts, we've got a bunch of counters
in here that are only for debugfs and are of questionnable value. They
are:

- dispatched, index of how many requests were dispatched in one go

- poll_{considered,invoked,success}, which track poll sucess rates. We're
  confident in the iopoll implementation at this point, don't bother
  tracking these.

As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes,
dropping a whole cacheline.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:40:33 -06:00
Jens Axboe 9a14d6ce41 block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes
These were added as part of early days debugging for blk-mq, and they
are not really useful anymore. Rather than spend cycles updating them,
just get rid of them.

As a bonus, this shrinks the per-cpu software queue size from 256b
to 192b. That's a whole cacheline less.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:39:48 -06:00
Pavel Begunkov 128459062b block: cache rq_flags inside blk_mq_rq_ctx_init()
Add a local variable for rq_flags, it helps to compile out some of
rq_flags reloads.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:38:45 -06:00
Pavel Begunkov 605f784e4f block: blk_mq_rq_ctx_init cache ctx/q/hctx
We should have enough of registers in blk_mq_rq_ctx_init(), store them
in local vars, so we don't keep reloading them.

note: keeping q->elevator may look unnecessary, but it's also used
inside inlined blk_mq_tags_from_data().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:38:43 -06:00
Pavel Begunkov 4f266f2be8 block: skip elevator fields init for non-elv queue
Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is
no elevator. Also, move some other initialisers that imply barriers to
the end, so the compiler is free to rearrange and optimise other the
rest of them.

note: fold in a change from Jens leaving queue_list unconditional, as
it might lead to problems otherwise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:38:42 -06:00
Jens Axboe 2ff0682da6 block: store elevator state in request
Add an rq private RQF_ELV flag, which tells the block layer that this
request was initialized on a queue that has an IO scheduler attached.
This allows for faster checking in the fast path, rather than having to
deference rq->q later on.

Elevator switching does full quiesce of the queue before detaching an
IO scheduler, so it's safe to cache this in the request itself.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:51:52 -06:00
Jens Axboe 90b8faa0e8 block: only mark bio as tracked if it really is tracked
We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even
though we may not even have an rq_qos handler. Only mark it as TRACKED if
it really is potentially tracked.

This saves considerable time for the case where the bio isn't tracked:

     2.64%     -1.65%  [kernel.vmlinux]  [k] bio_endio

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:50:47 -06:00
Jens Axboe b608762968 block: improve layout of struct request
It's been a while since this was analyzed, move some members around to
better flow with the use case. Initial state up top, and queued state
after that. This improves my peak case by about 1.5%, from 7750K to
7900K IOPS.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:50:42 -06:00
Jens Axboe 9be3e06fb7 block: move update request helpers into blk-mq.c
For some reason we still have them in blk-core, with the rest of the
request completion being in blk-mq. That causes and out-of-line call
for each completion.

Move them into blk-mq.c instead, where they belong.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:50:28 -06:00
Jens Axboe c477b79778 block: remove useless caller argument to print_req_error()
We have exactly one caller of this, just get rid of adding the useless
function name to the output.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:50:22 -06:00
Jens Axboe d4aa57a1ca block: don't bother iter advancing a fully done bio
If we're completing nbytes and nbytes is the size of the bio, don't bother
with calling into the iterator increment helpers. Just clear the bio
size and we're done.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 08:50:16 -06:00
Pavel Begunkov ed6cddefdf block: convert the rest of block to bdev_get_queue
Convert bdev->bd_disk->queue to bdev_get_queue(), it's uses a cached
queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:37 -06:00
Pavel Begunkov eab4e02733 block: use bdev_get_queue() in blk-core.c
Convert bdev->bd_disk->queue to bdev_get_queue(), it's uses a cached
queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Pavel Begunkov 3caee4634b block: use bdev_get_queue() in bio.c
Convert bdev->bd_disk->queue to bdev_get_queue(), it's uses a cached
queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/85c36ea784d285a5075baa10049e6b59e15fb484.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Pavel Begunkov 025a38651b block: use bdev_get_queue() in bdev.c
Convert bdev->bd_disk->queue to bdev_get_queue(), it's uses a cached
queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a352936ce5d9ac719645b1e29b173d931ebcdc02.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Pavel Begunkov 17220ca5ce block: cache request queue in bdev
There are tons of places where we need to get a request_queue only
having bdev, which turns into bdev->bd_disk->queue. There are probably a
hundred of such places considering inline helpers, and enough of them
are in hot paths.

Cache queue pointer in struct block_device and make use of it in
bdev_get_queue().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Jens Axboe abd45c159d block: handle fast path of bio splitting inline
The fast path is no splitting needed. Separate the handling into a
check part we can inline, and an out-of-line handling path if we do
need to split.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Jens Axboe 09ce874425 block: use flags instead of bit fields for blkdev_dio
This generates a lot better code for me, and bumps performance from
7650K IOPS to 7750K IOPS. Looking at profiles for the run and running
perf diff, it confirms that we're now sending a lot less time there:

     6.38%     -2.80%  [kernel.vmlinux]  [k] blkdev_direct_IO

Taking it from the 2nd most cycle consumer to only the 9th most at
3.35% of the CPU time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Pavel Begunkov fac7c6d529 block: cache bdev in struct file for raw bdev IO
bdev = &BDEV_I(file->f_mapping->host)->bdev

Getting struct block_device from a file requires 2 memory dereferences
as illustrated above, that takes a toll on performance, so cache it in
yet unused file->private_data. That gives a noticeable peak performance
improvement.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig c712dccc64 nvme-multipath: enable polled I/O
Set the poll queue flag to enable polling, given that the multipath
node just dispatches the bios to a lower queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig a614dd2280 block: don't allow writing to the poll queue attribute
The poll attribute is a historic artefact from before when we had
explicit poll queues that require driver specific configuration.
Just print a warning when writing to the attribute.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig 3e08773c38 block: switch polling to be bio based
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

 - the cookie construction can made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie
   separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio allows to trivially
   support polling BIOs remapping by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can
   be removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Ming Lei 19416123ab block: define 'struct bvec_iter' as packed
'struct bvec_iter' is embedded into 'struct bio', define it as packed
so that we can get one extra 4bytes for other uses without expanding
bio.

'struct bvec_iter' is often allocated on stack, so making it packed
doesn't affect performance. Also I have run io_uring on both
nvme/null_blk, and not observe performance effect in this way.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig 1a7e76e4f1 block: use SLAB_TYPESAFE_BY_RCU for the bio slab
This flags ensures that the pages will not be reused for non-bio
allocations before the end of an RCU grace period.  With that we can
safely use a RCU lookup for bio polling as long as we are fine with
occasionally polling the wrong device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig 6ce913fe3e block: rename REQ_HIPRI to REQ_POLLED
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague,
the bio flag is specific to the polling implementation, so rename and
document it properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig d729cf9acb io_uring: don't sleep when polling for I/O
There is no point in sleeping for the expected I/O completion timeout
in the io_uring async polling model as we never poll for a specific
I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00