Forum | Documentation | Website | Blog

Skip to content
Snippets Groups Projects
  1. Nov 29, 2021
  2. Oct 21, 2021
  3. Oct 18, 2021
  4. Jun 24, 2021
    • Jan Kara's avatar
      blk: Fix lock inversion between ioc lock and bfqd lock · fd2ef39c
      Jan Kara authored
      Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
      
      bfqd -> ioc:
       put_io_context+0x33/0x90 -> ioc->lock grabbed
       blk_mq_free_request+0x51/0x140
       blk_put_request+0xe/0x10
       blk_attempt_req_merge+0x1d/0x30
       elv_attempt_insert_merge+0x56/0xa0
       blk_mq_sched_try_insert_merge+0x4b/0x60
       bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
       blk_mq_sched_insert_requests+0xd6/0x2b0
       blk_mq_flush_plug_list+0x154/0x280
       blk_finish_plug+0x40/0x60
       ext4_writepages+0x696/0x1320
       do_writepages+0x1c/0x80
       __filemap_fdatawrite_range+0xd7/0x120
       sync_file_range+0xac/0xf0
      
      ioc->bfqd:
       bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
       put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
       exit_io_context+0x48/0x50
       do_exit+0x7e9/0xdd0
       do_group_exit+0x54/0xc0
      
      To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
      free the merged request but rather leave that upto the caller similarly
      to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure
      to free all the merged requests after dropping bfqd->lock.
      
      Fixes: aee69d78
      
       ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Acked-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz
      
      
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fd2ef39c
  5. May 24, 2021
  6. Feb 22, 2021
  7. Sep 08, 2020
    • Omar Sandoval's avatar
      block: only call sched requeue_request() for scheduled requests · e8a8a185
      Omar Sandoval authored
      
      Yang Yang reported the following crash caused by requeueing a flush
      request in Kyber:
      
        [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
        ...
        [    2.517468] pc : clear_bit+0x18/0x2c
        [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
        [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
        ...
        [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
        [    2.517602] Call trace:
        [    2.517606]  clear_bit+0x18/0x2c
        [    2.517619]  kyber_finish_request+0x74/0x80
        [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
        [    2.517637]  __scsi_queue_insert+0x11c/0x148
        [    2.517640]  scsi_softirq_done+0x114/0x130
        [    2.517643]  blk_done_softirq+0x7c/0xb0
        [    2.517651]  __do_softirq+0x208/0x3bc
        [    2.517657]  run_ksoftirqd+0x34/0x60
        [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
        [    2.517667]  kthread+0x110/0x120
        [    2.517669]  ret_from_fork+0x10/0x18
      
      This happens because Kyber doesn't track flush requests, so
      kyber_finish_request() reads a garbage domain token. Only call the
      scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
      the finish_request() hook in blk_mq_free_request()). Now that we're
      handling it in blk-mq, also remove the check from BFQ.
      
      Reported-by: default avatarYang Yang <yang.yang@vivo.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e8a8a185
  8. Sep 07, 2020
  9. Jul 23, 2019
  10. Jun 20, 2019
    • Christoph Hellwig's avatar
      block: remove the bi_phys_segments field in struct bio · 14ccb66b
      Christoph Hellwig authored
      
      We only need the number of segments in the blk-mq submission path.
      Remove the field from struct bio, and return it from a variant of
      blk_queue_split instead of that it can passed as an argument to
      those functions that need the value.
      
      This also means we stop recounting segments except for cloning
      and partial segments.
      
      To keep the number of arguments in this how path down remove
      pointless struct request_queue arguments from any of the functions
      that had it and grew a nr_segs argument.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      14ccb66b
  11. Jun 07, 2019
    • Ming Lei's avatar
      block: free sched's request pool in blk_cleanup_queue · c3e22192
      Ming Lei authored
      In theory, IO scheduler belongs to request queue, and the request pool
      of sched tags belongs to the request queue too.
      
      However, the current tags allocation interfaces are re-used for both
      driver tags and sched tags, and driver tags is definitely host wide,
      and doesn't belong to any request queue, same with its request pool.
      So we need tagset instance for freeing request of sched tags.
      
      Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case
      of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched
      tags to be freed before calling blk_mq_free_tag_set().
      
      Commit 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      moves blk_exit_queue into __blk_release_queue for simplying the fast
      path in generic_make_request(), then causes oops during freeing requests
      of sched tags in __blk_release_queue().
      
      Fix the above issue by move freeing request pool of sched tags into
      blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any
      in-queue requests at that time. Freeing sched tags has to be kept in queue's
      release handler becasue there might be un-completed dispatch activity
      which might refer to sched tags.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 47cdee29
      
       ("block: move blk_exit_queue into __blk_release_queue")
      Tested-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c3e22192
  12. Dec 17, 2018
    • Damien Le Moal's avatar
      block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
      __blk_mq_free_request() call of blk_mq_sched_restart() to not run the
      queue when the already dispatched write request completes. The newly
      dispatched request stays stuck in the scheduler queue until eventually
      another request is submitted.
      
      This problem does not affect SCSI disk as the SCSI stack handles queue
      restart on request completion. However, this problem is can be triggered
      the nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      
      Fixes: 5700f691
      
       ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7211aef8
  13. Nov 19, 2018
  14. Nov 07, 2018
  15. Sep 27, 2018
  16. Aug 21, 2018
    • Jianchao Wang's avatar
      blk-mq: init hctx sched after update ctx and hctx mapping · d48ece20
      Jianchao Wang authored
      
      Currently, when update nr_hw_queues, IO scheduler's init_hctx will
      be invoked before the mapping between ctx and hctx is adapted
      correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber)
      may depend on this mapping and get wrong result and panic finally.
      A simply way to fix this is that switch the IO scheduler to 'none'
      before update the nr_hw_queues, and then switch it back after
      update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due
      to nobody use them any more.
      
      Signed-off-by: default avatarJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d48ece20
  17. Jun 01, 2018
  18. Jan 17, 2018
  19. Nov 02, 2017
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  20. Nov 01, 2017
    • Ming Lei's avatar
      blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE · 1f460b63
      Ming Lei authored
      
      SCSI restarts its queue in scsi_end_request() automatically, so we don't
      need to handle this case in blk-mq.
      
      Especailly any request won't be dequeued in this case, we needn't to
      worry about IO hang caused by restart vs. dispatch.
      
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1f460b63
    • Ming Lei's avatar
      blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297
      Ming Lei authored
      
      For SCSI devices, there is often a per-request-queue depth, which needs
      to be respected before queuing one request.
      
      Currently blk-mq always dequeues the request first, then calls
      .queue_rq() to dispatch the request to lld. One obvious issue with this
      approach is that I/O merging may not be successful, because when the
      per-request-queue depth can't be respected, .queue_rq() has to return
      BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch
      list. This means it never gets a chance to be merged with other IO.
      
      This patch introduces .get_budget and .put_budget callback in blk_mq_ops,
      then we can try to get reserved budget first before dequeuing request.
      If the budget for queueing I/O can't be satisfied, we don't need to
      dequeue request at all. Hence the request can be left in the IO
      scheduler queue, for more merging opportunities.
      
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      de148297
  21. Jun 21, 2017
  22. Jun 18, 2017
  23. May 26, 2017
  24. Apr 20, 2017
  25. Apr 14, 2017
  26. Apr 07, 2017