- Mar 08, 2022
-
Christoph Hellwig authored
I/O accounting buckets I/O into the read/write/discard categories, into which passthrough I/O does not fit at all. It also accounts to the block_device, which may not even exist for passthrough I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
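A minimal sketch of the idea, assuming the existing blk_rq_is_passthrough() helper and the RQF_IO_STAT flag; this illustrates the intent of the change, not the literal patch:

```c
/*
 * Sketch: gate per-request accounting on the request not being a
 * passthrough request, so it is never forced into the
 * read/write/discard buckets and never touches a possibly absent
 * block_device.
 */
static inline bool blk_do_io_stat(struct request *rq)
{
	return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
}
```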
-
- Feb 22, 2022
-
Christoph Hellwig authored
No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix]
Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- Feb 16, 2022
-
Ming Lei authored
The bio has already been checked before throttling, so there is no need to check it again when dispatching it from the throttle queue. Add a submit_bio_noacct_nocheck() helper for this purpose.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216044514.2903784-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
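A minimal sketch of the resulting split, assuming submit_bio_noacct() stays the checked entry point:

```c
/*
 * Sketch: the checked entry point validates the bio once; resubmission
 * paths such as blk-throttle, whose bios were already validated, call
 * the _nocheck variant directly.
 */
void submit_bio_noacct(struct bio *bio)
{
	if (unlikely(!submit_bio_checks(bio)))
		return;
	submit_bio_noacct_nocheck(bio);	/* dispatch only, no re-check */
}
```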
-
Ming Lei authored
submit_bio_checks() won't be called outside of block/blk-core.c any more since commit 9d497e29 ("block: don't protect submit_bio_checks by q_usage_counter"), so mark it as a local helper.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 02, 2022
-
Chaitanya Kulkarni authored
All callers need to set the block_device and operation, so lift that into the common code.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
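For illustration, the shape of the resulting allocation interface, where the caller supplies the target block_device and the operation up front; a sketch of the post-series bio_alloc() usage, not a quote of the patch:

```c
/*
 * Sketch: the block_device and operation are set by the allocation
 * helper itself rather than by every caller afterwards.
 */
struct bio *bio;

bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_SYNC, GFP_NOIO);
if (!bio)
	return -ENOMEM;
```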
-
Christoph Hellwig authored
No need to have this declaration in a public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
No need to have these declarations in a public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 16, 2021
-
Christoph Hellwig authored
Only bfq needs the code to track icqs, so make it conditional.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211209063131.18537-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 03, 2021
-
Ming Lei authored
In the BLK_MQ_F_BLOCKING case, a per-hctx srcu instance is used to protect the dispatch critical section. However, this srcu instance sits at the end of the hctx, often on a standalone, cold cacheline. Inside srcu_read_lock() and srcu_read_unlock(), writes always go to an indirect percpu variable allocated from the heap rather than embedded, and srcu->srcu_idx is only read in srcu_read_lock(), so it does not matter whether the srcu structure lives in the hctx or in the request queue. Switch to a per-request-queue srcu for protecting dispatch; this simplifies quiesce a lot, not to mention that quiesce is always done queue-wide anyway.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203131534.3668411-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
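A minimal sketch of the dispatch protection after the change, assuming the srcu_struct moves to the request_queue (the q->srcu field name is an assumption here):

```c
/*
 * Sketch: non-blocking hctxs still use plain RCU; blocking ones now
 * take the queue-wide SRCU instead of a per-hctx instance.
 */
if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
	rcu_read_lock();
	blk_mq_sched_dispatch_requests(hctx);
	rcu_read_unlock();
} else {
	int srcu_idx = srcu_read_lock(hctx->queue->srcu);

	blk_mq_sched_dispatch_requests(hctx);
	srcu_read_unlock(hctx->queue->srcu, srcu_idx);
}
```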
-
Jens Axboe authored
refcount_t is not as expensive as it used to be, but it's still more expensive than the io_uring method of using atomic_t and just checking for potential over/underflow. This borrows that same implementation, which in turn is based on the mm implementation from Linus.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
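A sketch of the borrowed pattern: a plain atomic_t with a cheap sanity check for zero or near-overflow values, in the io_uring style:

```c
/* Flag refcounts that are zero or suspiciously close to overflow. */
#define req_ref_zero_or_close_to_overflow(req)	\
	((unsigned int)atomic_read(&(req)->ref) + 127u <= 127u)

static inline bool req_ref_inc_not_zero(struct request *req)
{
	return atomic_inc_not_zero(&req->ref);
}

static inline bool req_ref_put_and_test(struct request *req)
{
	WARN_ON_ONCE(req_ref_zero_or_close_to_overflow(req));
	return atomic_dec_and_test(&req->ref);
}
```

The trick is that a legitimate refcount is never near UINT_MAX, so an unsigned wrap in the check catches both underflow (0 becomes a huge value) and imminent overflow, without the heavier refcount_t machinery.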
-
- Nov 29, 2021
-
Christoph Hellwig authored
Just use the disk attached to the request_queue instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Remove the ioc argument, as it always points to current->io_context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move blk_mq_sched_assign_ioc so that many interfaces from the file can be marked static. Rename the function to ioc_find_get_icq as well, and return the icq to simplify the interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed, shift it into the source files that need it instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed; shift a blk-stat.h include into the source file that needs it instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed; shift it into the source files that need it instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All callers pass q->elevator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Open code elevator_exit in its only caller, and rename __elevator_exit to elevator_exit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_get_flush_queue is only used in blk-flush.c, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_mq_submit_bio has two different plug cases: one that uses full plugging, and a limited plugging one. The limited plugging case is only used for a corner case that does not matter in real life:
- no ->commit_rqs (so not NVMe)
- no shared tags (so not SCSI)
- not rotational (so no old disk or floppy driver)
- must have multiple queues (so no eMMC)
Remove the limited merging case and all the related junk to simplify blk_mq_submit_bio and the functions called from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Unify the functionality that implements a partition rescan for a gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This function is only used by the request completion path. Factor out a blk_status_to_str to keep blk_errors private in blk-core.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
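A sketch of what such a helper looks like, assuming the existing blk_errors table of name/message pairs in blk-core.c:

```c
const char *blk_status_to_str(blk_status_t status)
{
	int idx = (__force int)status;

	/* Out-of-range status codes have no name in the table. */
	if (WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors)))
		return "<null>";
	return blk_errors[idx].name;
}
```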
-
Christoph Hellwig authored
These are only used for request based I/O, so move them where they are used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Keep all the request based code together.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 19, 2021
-
Ming Lei authored
We never used to insert flush requests into the scheduler queue. The recent commit d92ca9d8 ("blk-mq: don't handle non-flush requests in blk_insert_flush") tries to handle FUA data requests as normal requests. This has caused a warning[1] in mq-deadline's dd_exit_sched(), or an I/O hang in the case of kyber, since RQF_ELVPRIV isn't set for flush requests, so ->finish_request won't be called. Fix the issue by inserting FUA data requests with blk_mq_request_bypass_insert() when the device supports FUA, just like what we did before. [1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: d92ca9d8 ("blk-mq: don't handle non-flush requests in blk_insert_flush")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei...
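A rough sketch of the fix's shape in blk_insert_flush(), assuming the flush policy value it already computes; an illustration of the described behavior, not the literal patch:

```c
	/*
	 * Sketch: a pure data request on a FUA-capable device needs no
	 * pre/post flush sequence, so insert it directly via the bypass
	 * path. Flush-sequence requests never carry RQF_ELVPRIV, so they
	 * must not go through the I/O scheduler.
	 */
	if ((policy & REQ_FSEQ_DATA) &&
	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
		blk_mq_request_bypass_insert(rq, false, true);
		return;
	}
```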
-
- Nov 05, 2021
-
Jens Axboe authored
Retain the old logic for the fops based submit, but for our internal blk_mq_submit_bio(), move the queue entering logic into the core function itself. We need to be a bit careful if going into the scheduler, as the scheduler or queue mappings can arbitrarily change before we have entered the queue. Have the bio scheduler mapping do that separately; it's a very cheap operation compared to actually doing the merge locking and lookups.
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: update to check merge post submit_bio_checks() doing remap...]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 04, 2021
-
Jens Axboe authored
Just a prep patch for shifting the queue enter logic. This moves the expected fast path inline, and leaves __bio_queue_enter() as an out-of-line function call. We don't want to inline the latter, as it's mostly slow path code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
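A minimal sketch of the split, assuming the existing blk_try_enter_queue() helper:

```c
static inline int bio_queue_enter(struct bio *bio)
{
	struct request_queue *q = bdev_get_queue(bio->bi_bdev);

	/* Fast path: queue is live, enter without any slow-path work. */
	if (blk_try_enter_queue(q, false))
		return 0;

	/* Slow path stays out of line: freezing, pm, or dying queue. */
	return __bio_queue_enter(q, bio);
}
```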
-
- Oct 26, 2021
-
Damien Le Moal authored
The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel; a dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device's independent access ranges to the user through sysfs, to allow optimizing device accesses to increase performance.

To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDD, or table entries of a dm-linear device), the type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity.

The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure.

struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files.

The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual-actuator HDD, the user sees:

$ tree /sys/block/sdk/queue/independent_access_ranges/
/sys/block/sdk/queue/independent_access_ranges/
|-- 0
|   |-- nr_sectors
|   `-- sector
`-- 1
    |-- nr_sectors
    `-- sector

For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
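For reference, a sketch of the data structures as described above; the field layout is an approximation of what the commit introduces, not a verbatim quote:

```c
/*
 * One independent access range: a contiguous span of sectors that one
 * actuator (or one dm-linear table entry) can serve on its own.
 */
struct blk_independent_access_range {
	struct kobject	kobj;		/* sysfs: .../independent_access_ranges/<N>/ */
	sector_t	sector;		/* first sector of the range */
	sector_t	nr_sectors;	/* number of sectors in the range */
};

struct blk_independent_access_ranges {
	struct kobject				kobj;	/* sysfs: .../queue/independent_access_ranges/ */
	unsigned int				nr_ia_ranges;
	struct blk_independent_access_range	ia_range[];
};
```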
-
- Oct 19, 2021
-
Christoph Hellwig authored
Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation, given that submit_bio_checks usually already clears these flags, so we'd need a tight race to actually hit this code path. With this, the call to blk_mq_run_hw_queue for the flush requests can be removed, given that the actual flush requests are always issued via the requeue workqueue, which runs the queue unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Oct 18, 2021
-
Jens Axboe authored
For some reason we still have them in blk-core, with the rest of the request completion being in blk-mq. That causes an out-of-line call for each completion. Move them into blk-mq.c instead, where they belong.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The fast path is the case where no splitting is needed. Separate the handling into a check part we can inline, and an out-of-line handling path if we do need to split.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Unlike the RWF_HIPRI userspace ABI, which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Extract the hot paths of __blk_account_io_start() and __blk_account_io_done() into inline functions, so we don't always pay for function calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
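A minimal sketch of the pattern, assuming the existing blk_do_io_stat() predicate; the cheap checks stay inline and only the real stats update costs a function call:

```c
static inline void blk_account_io_done(struct request *req, u64 now)
{
	/*
	 * Inline fast path: bail out cheaply when stats are disabled or
	 * this is an internal flush-sequence request; only take the
	 * out-of-line call when there is real accounting work to do.
	 */
	if (blk_do_io_stat(req) && req->part &&
	    !(req->rq_flags & RQF_FLUSH_SEQ))
		__blk_account_io_done(req, now);
}
```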
-
Christoph Hellwig authored
Simplify the ioctl path and match the code structure on the compat side.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
These are only used inside of block/.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Particularly for NVMe with efficient deferred submission for many requests, there are nice benefits to be seen by bumping the default max plug count from 16 to 32. This is especially true for virtualized setups, where the submit part is more expensive, but it can be noticed even on native hardware. Reduce the multiple queue factor from 4 to 2, since we're changing the default size. While changing it, move the defines into the block layer private header. These aren't values that anyone outside of the block layer uses, or should use.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
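A sketch of the resulting limits in the private block/blk.h, with the multiple-queue factor applied in a helper; the names follow the kernel's BLK_MAX_REQUEST_COUNT convention but should be treated as an assumption:

```c
#define BLK_MAX_REQUEST_COUNT	32	/* was 16 */

static inline unsigned int blk_plug_max_rq_count(struct blk_plug *plug)
{
	/* Multiple-queue factor reduced from 4 to 2 alongside the bump. */
	if (plug->multiple_queues)
		return BLK_MAX_REQUEST_COUNT * 2;
	return BLK_MAX_REQUEST_COUNT;
}
```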
-
Jens Axboe authored
Even if no policies are defined, we spend ~2% of the total IO time checking. Move the fast path inline.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-