- Nov 29, 2021
-
-
David Howells authored
Adjust the netfslib docs in light of the foliation changes. Also un-kdoc-mark netfs_skip_folio_read() since it's internal and isn't part of the API. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/163706992597.3179783.18360472879717076435.stgit@warthog.procyon.org.uk/ Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 27, 2021
-
-
Guenter Roeck authored
NTFS_RW code allocates page size dependent arrays on the stack. This results in build failures if the page size is 64k or larger. fs/ntfs/aops.c: In function 'ntfs_write_mst_block': fs/ntfs/aops.c:1311:1: error: the frame size of 2240 bytes is larger than 2048 bytes Since commit f22969a6 ("powerpc/64s: Default to 64K pages for 64 bit book3s") this affects ppc:allmodconfig builds, but other architectures supporting page sizes of 64k or larger are also affected. Increasing the maximum frame size for affected architectures just to silence this error does not really help. The frame size would have to be set to a really large value for 256k pages. Also, a large frame size could potentially result in stack overruns in this code and elsewhere and is therefore not desirable. Make NTFS_RW dependent on page sizes smaller than 64k instead. Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: L...
-
Ye Bin authored
We got issue as follows: ================================================================================ UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14 signed integer overflow: -4966321760114568020 * 1000000000 cannot be represented in type 'long long int' CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x170/0x1dc lib/dump_stack.c:118 ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161 handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 ktime_set include/linux/ktime.h:42 [inline] timespec64_to_ktime include/linux/ktime.h:78 [inline] io_timeout fs/io_uring.c:5153 [inline] io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599 __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988 io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067 io_submit_sqe fs/io_uring.c:6137 [inline] io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331 __do_sys_io_uring_enter fs/io_uring.c:8170 [inline] __se_sys_io_uring_enter fs/io_uring.c:8129 [inline] __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129 invoke_syscall arch/arm64/kernel/syscall.c:53 [inline] el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121 el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190 el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017 ================================================================================ As ktime_set only judge 'secs' if big than KTIME_SEC_MAX, but if we pass negative value maybe lead to overflow. To address this issue, we must check if 'sec' is negative. Signed-off-by:
Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ye Bin authored
I got issue as follows: [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680 [ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108] [ 594.364987] Modules linked in: [ 594.365405] irq event stamp: 604180238 [ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50 [ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0 [ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e [ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250 [ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88 [ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 594.373604] Workqueue: events_unbound io_ring_exit_work [ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50 [ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474 [ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202 [ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106 [ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001 [ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab [ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0 [ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000 [ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000 [ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0 [ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 594.387403] Call Trace: [ 594.387738] <TASK> [ 594.388042] find_and_remove_object+0x118/0x160 [ 594.389321] delete_object_full+0xc/0x20 [ 594.389852] kfree+0x193/0x470 [ 594.390275] __io_remove_buffers.part.0+0xed/0x147 [ 594.390931] io_ring_ctx_free+0x342/0x6a2 [ 594.392159] io_ring_exit_work+0x41e/0x486 [ 594.396419] process_one_work+0x906/0x15a0 [ 594.399185] worker_thread+0x8b/0xd80 [ 594.400259] kthread+0x3bf/0x4a0 [ 594.401847] ret_from_fork+0x22/0x30 [ 594.402343] </TASK> Message from syslogd@localhost at Nov 13 09:09:54 ... kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108] [ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680 We can reproduce this issue by follow syzkaller log: r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0) sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0) syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0) io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0) The reason above issue is 'buf->list' has 2,100,000 nodes, occupied cpu lead to soft lockup. To solve this issue, we need add schedule point when do while loop in '__io_remove_buffers'. After add schedule point we do regression, get follow data. [ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00 [ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180 [ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00 [ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380 [ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180 [ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00 [ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380 ... Fixes:8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators") Signed-off-by:
Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 26, 2021
-
-
Pavel Begunkov authored
WARNING: inconsistent lock state 5.16.0-rc2-syzkaller #0 Not tainted inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. ffff888078e11418 (&ctx->timeout_lock ){?.+.}-{2:2} , at: io_timeout_fn+0x6f/0x360 fs/io_uring.c:5943 {HARDIRQ-ON-W} state was registered at: [...] spin_unlock_irq include/linux/spinlock.h:399 [inline] __io_poll_remove_one fs/io_uring.c:5669 [inline] __io_poll_remove_one fs/io_uring.c:5654 [inline] io_poll_remove_one+0x236/0x870 fs/io_uring.c:5680 io_poll_remove_all+0x1af/0x235 fs/io_uring.c:5709 io_ring_ctx_wait_and_kill+0x1cc/0x322 fs/io_uring.c:9534 io_uring_release+0x42/0x46 fs/io_uring.c:9554 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xc14/0x2b40 kernel/exit.c:832 674ee8e1 ("io_uring: correct link-list traversal locking") fixed a data race but introduced a possible deadlock and inconsistentcy in irq states. E.g. io_poll_remove_all() spin_lock_irq(timeout_lock) io_poll_remove_one() spin_lock/unlock_irq(poll_lock); spin_unlock_irq(timeout_lock) Another type of problem is freeing a request while holding ->timeout_lock, which may leads to a deadlock in io_commit_cqring() -> io_flush_timeouts() and other places. Having 3 nested locks is also too ugly. Add io_match_task_safe(), which would briefly take and release timeout_lock for race prevention inside, so the actuall request cancellation / free / etc. code doesn't have it taken. Reported-by:
<syzbot+ff49a3059d49b0ca0eec@syzkaller.appspotmail.com> Reported-by:
<syzbot+847f02ec20a6609a328b@syzkaller.appspotmail.com> Reported-by:
<syzbot+3368aadcd30425ceb53b@syzkaller.appspotmail.com> Reported-by:
<syzbot+51ce8887cdef77c9ac83@syzkaller.appspotmail.com> Reported-by:
<syzbot+3cb756a49d2f394a9ee3@syzkaller.appspotmail.com> Fixes: 674ee8e1 ("io_uring: correct link-list traversal locking") Cc: stable@kernel.org # 5.15+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/397f7ebf3f4171f1abe41f708ac1ecb5766f0b68.1637937097.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
WARNING: CPU: 1 PID: 20 at fs/io_uring.c:6269 io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269 CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 5.16.0-rc1-syzkaller #0 Workqueue: events io_fallback_req_func RIP: 0010:io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269 Call Trace: <TASK> io_req_task_link_timeout+0x6b/0x1e0 fs/io_uring.c:6886 io_fallback_req_func+0xf9/0x1ae fs/io_uring.c:1334 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> We need original task's context to do cancellations, so if it's dying and the callback is executed in a fallback mode, fail the cancellation attempt. Fixes: 89b263f6 ("io_uring: run linked timeouts from task_work") Cc: stable@kernel.org # 5.15+ Reported-by:
<syzbot+ab0cfe96c2b3cd1c1153@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c41c5f379c6941ad5a07cd48cb66ed62199cf7e.1637937097.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Qu Wenruo authored
[BUG] Fstests generic/027 is pretty easy to trigger a slow but steady memory leak if run with "-o compress=lzo" mount option. Normally one single run of generic/027 is enough to eat up at least 4G ram. [CAUSE] In commit d4088803 ("btrfs: subpage: make lzo_compress_pages() compatible") we changed how @page_in is released. But that refactoring makes @page_in only released after all pages being compressed. This leaves error path not releasing @page_in. And by "error path" things like incompressible data will also be treated as an error (-E2BIG). Thus it can cause a memory leak if even nothing wrong happened. [FIX] Add check under @out label to release @page_in when needed, so when we hit any error, the input page is properly released. Reported-by:
Josef Bacik <josef@toxicpanda.com> Fixes: d4088803 ("btrfs: subpage: make lzo_compress_pages() compatible") Reviewed-and-tested-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- Nov 25, 2021
-
-
Miklos Szeredi authored
Checking buf->flags should be done before the pipe_buf_release() is called on the pipe buffer, since releasing the buffer might modify the flags. This is exactly what page_cache_pipe_buf_release() does, and which results in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was trying to fix. Reported-by:
Justin Forbes <jmforbes@linuxtx.org> Fixes: 712a9510 ("fuse: fix page stealing") Cc: <stable@vger.kernel.org> # v2.6.35 Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Namjae Jeon authored
Fix memleak in get_file_stream_info() Fixes: 34061d6b ("ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests") Cc: stable@vger.kernel.org # v5.15 Reported-by:
Coverity Scan <scan-admin@coverity.com> Acked-by:
Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Namjae Jeon authored
If xattr is not supported like exfat or fat, ksmbd server doesn't contain default data stream in FILE_STREAM_INFORMATION response. It will cause ppt or doc file update issue if local filesystem is such as ones. This patch move goto statement to contain it. Fixes: 9f632331 ("ksmbd: add default data stream name in FILE_STREAM_INFORMATION") Cc: stable@vger.kernel.org # v5.15 Acked-by:
Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Namjae Jeon authored
While file transfer through windows client, This error flood message happen. This flood message will cause performance degradation and misunderstand server has problem. Fixes: e294f78d ("ksmbd: allow PROTECTED_DACL_SECINFO and UNPROTECTED_DACL_SECINFO addition information in smb2 set info security") Cc: stable@vger.kernel.org # v5.15 Acked-by:
Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Christophe JAILLET authored
All the error handling paths of 'smb2_sess_setup()' end to 'out_err'. All but the new error handling path added by the commit given in the Fixes tag below. Fix this error handling path and branch to 'out_err' as well. Fixes: 0d994cd4 ("ksmbd: add buffer validation in session setup") Cc: stable@vger.kernel.org # v5.15 Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- Nov 24, 2021
-
-
Andreas Gruenbacher authored
Change iomap_read_inline_data to return 0 or an error code; this simplifies the callers. Add a description. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> [djwong: document the return value of iomap_read_inline_data explicitly] Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
Christoph Hellwig authored
With the remove of xfs_dqrele_all_inodes, xfs_inew_wait and all the infrastructure used to wake the XFS_INEW bit waitqueue is unused. Reported-by:
kernel test robot <lkp@intel.com> Fixes: 777eb1fa ("xfs: remove xfs_dqrele_all_inodes") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
Yang Xu authored
When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine. Adding a getxattr operation after xattr corrupted, I can reproduce it 100%. The deadlock as below: [983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080 [ 983.923405] Call Trace: [ 983.923410] __schedule+0x2c4/0x700 [ 983.923412] schedule+0x37/0xa0 [ 983.923414] schedule_timeout+0x274/0x300 [ 983.923416] __down+0x9b/0xf0 [ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923453] down+0x3b/0x50 [ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs] [ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs] [ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs] [ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs] [ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs] [ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs] [ 983.923624] ? kmem_cache_alloc+0x12e/0x270 [ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs] [ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs] [ 983.923664] xfs_attr_set+0x273/0x320 [xfs] [ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs] [ 983.923686] __vfs_removexattr+0x4d/0x60 [ 983.923688] __vfs_removexattr_locked+0xac/0x130 [ 983.923689] vfs_removexattr+0x4e/0xf0 [ 983.923690] removexattr+0x4d/0x80 [ 983.923693] ? __check_object_size+0xa8/0x16b [ 983.923695] ? strncpy_from_user+0x47/0x1a0 [ 983.923696] ? getname_flags+0x6a/0x1e0 [ 983.923697] ? _cond_resched+0x15/0x30 [ 983.923699] ? __sb_start_write+0x1e/0x70 [ 983.923700] ? mnt_want_write+0x28/0x50 [ 983.923701] path_removexattr+0x9b/0xb0 [ 983.923702] __x64_sys_removexattr+0x17/0x20 [ 983.923704] do_syscall_64+0x5b/0x1a0 [ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 983.923707] RIP: 0033:0x7f080f10ee1b When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job. Then subsequent removexattr will hang because of it. This bug was introduced by kernel commit 07120f1a ("xfs: Add xfs_has_attr and subroutines"). It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state in this case. But xfs_attr_node_hasname will free state itself instead of caller if xfs_da3_node_lookup_int fails. Fix this bug by moving the step of free state into caller. Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup function because we should free state ourselves. Fixes: 07120f1a ("xfs: Add xfs_has_attr and subroutines") Signed-off-by:
Yang Xu <xuyang2018.jy@fujitsu.com> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
- Nov 23, 2021
-
-
Steve French authored
To 2.34 Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
It is clearer to initialize rc at the beginning of the function. Reported-by:
kernel test robot <lkp@intel.com> Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Shyam Prasad N authored
Recently, a new field got added to the smb3_fs_context struct named server_hostname. While creating extra channels, pick up this field from primary channel. Signed-off-by:
Shyam Prasad N <sprasad@microsoft.com> Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Shyam Prasad N authored
Recent fix to maintain a nosharesock state on the server struct caused a regression. It updated this field in the old tcp session, and not the new one. This caused the multichannel scenario to misbehave. Fixes: c9f1c19c (cifs: nosharesock should not share socket with future sessions) Signed-off-by:
Shyam Prasad N <sprasad@microsoft.com> Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Huang Jianan authored
We observed the following deadlock in the stress test under low memory scenario: Thread A Thread B - erofs_shrink_scan - erofs_try_to_release_workgroup - erofs_workgroup_try_to_freeze -- A - z_erofs_do_read_page - z_erofs_collection_begin - z_erofs_register_collection - erofs_insert_workgroup - xa_lock(&sbi->managed_pslots) -- B - erofs_workgroup_get - erofs_wait_on_workgroup_freezed -- A - xa_erase - xa_lock(&sbi->managed_pslots) -- B To fix this, it needs to hold xa_lock before freezing the workgroup since xarray will be touched then. So let's hold the lock before accessing each workgroup, just like what we did with the radix tree before. [ Gao Xiang: Jianhua Hao also reports this issue at https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ] Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com Fixes: 64094a04 ("erofs: convert workstn to XArray") Reviewed-by:
Chao Yu <chao@kernel.org> Reviewed-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by:
Huang Jianan <huangjianan@oppo.com> Reported-by:
Jianhua Hao <haojianhua1@xiaomi.com> Signed-off-by:
Gao Xiang <xiang@kernel.org>
-
- Nov 22, 2021
-
-
Pavel Begunkov authored
As io_remove_next_linked() is now under ->timeout_lock (see io_link_timeout_fn), we should update locking around io_for_each_link() and io_match_task() to use the new lock. Cc: stable@kernel.org # 5.15+ Fixes: 89850fce ("io_uring: run timeouts from task_work") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b54541cedf7de59cb5ae36109e58529ca16e66aa.1637631883.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 21, 2021
-
-
Andreas Gruenbacher authored
Before commit 740499c7 ("iomap: fix the iomap_readpage_actor return value for inline data"), when hitting an IOMAP_INLINE extent, iomap_readpage_actor would report having read the entire page. Since then, it only reports having read the inline data (iomap->length). This will force iomap_readpage into another iteration, and the filesystem will report an unaligned hole after the IOMAP_INLINE extent. But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to deal with unaligned extents, it will get things wrong on filesystems with a block size smaller than the page size, and we'll eventually run into the following warning in iomap_iter_advance: WARN_ON_ONCE(iter->processed > iomap_length(iter)); Fix that by changing iomap_readpage_iter to return 0 when hitting an inline extent; this will cause iomap_iter to stop immediately. To fix readahead as well, change iomap_readahead_iter to pass on iomap_readpage_iter return values less than or equal to zero. Fixes: 740499c7 ("iomap: fix the iomap_readpage_actor return value for inline data") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
Geert Uytterhoeven authored
On 32-bit: fs/pstore/blk.c: In function ‘__best_effort_init’: include/linux/kern_levels.h:5:18: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 3 has type ‘long unsigned int’ [-Wformat=] 5 | #define KERN_SOH "\001" /* ASCII Start Of Header */ | ^~~~~~ include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’ 14 | #define KERN_INFO KERN_SOH "6" /* informational */ | ^~~~~~~~ include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’ 373 | printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~ fs/pstore/blk.c:314:3: note: in expansion of macro ‘pr_info’ 314 | pr_info("attached %s (%zu) (no dedicated panic_write!)\n", | ^~~~~~~ Cc: stable@vger.kernel.org Fixes: 7bb9557b ("pstore/blk: Use the normal block device I/O path") Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org...>
-
- Nov 20, 2021
-
-
David Hildenbrand authored
To clear a user buffer we cannot simply use memset, we have to use clear_user(). With a virtio-mem device that registers a vmcore_cb and has some logically unplugged memory inside an added Linux memory block, I can easily trigger a BUG by copying the vmcore via "cp": systemd[1]: Starting Kdump Vmcore Save Service... kdump[420]: Kdump is using the default log level(3). kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[465]: saving vmcore-dmesg.txt complete kdump[467]: saving vmcore BUG: unable to handle page fault for address: 00007f2374e01000 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 Call Trace: read_vmcore+0x236/0x2c0 proc_reg_read+0x55/0xa0 vfs_read+0x95/0x190 ksys_read+0x4f/0xc0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access Prevention (SMAP)", which is used to detect wrong access from the kernel to user buffers like this: SMAP triggers a permissions violation on wrong access. In the x86-64 variant of clear_user(), SMAP is properly handled via clac()+stac(). To fix, properly use clear_user() when we're dealing with a user buffer. Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com Fixes: 997c136f ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 17, 2021
-
-
Olga Kornievskaia authored
When the client receives ERR_NOSPC on reply to CREATE_SESSION it leads to a client hanging in nfs_wait_client_init_complete(). Instead, complete and fail the client initiation with an EIO error which allows for the mount command to fail instead of hanging. Signed-off-by:
Olga Kornievskaia <kolga@netapp.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Benjamin Coddington authored
The mechanism in use to allow the client to see the results of COPY/CLONE is to drop those pages from the pagecache. This forces the client to read those pages once more from the server. However, truncate_pagecache_range() zeros out partial pages instead of dropping them. Let us instead use invalidate_inode_pages2_range() with full-page offsets to ensure the client properly sees the results of COPY/CLONE operations. Cc: <stable@vger.kernel.org> # v4.7+ Fixes: 2e72448b ("NFS: Add COPY nfs operation") Signed-off-by:
Benjamin Coddington <bcodding@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Benjamin Coddington authored
This provides some insight into the client's invalidation behavior to show both when the client uses the helper, and the results of calling the helper which can vary depending on how the helper is called. Signed-off-by:
Benjamin Coddington <bcodding@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
The failure to retrieve post-op attributes has no bearing on whether or not the clone operation itself was successful. We must therefore ignore the return value of decode_getfattr() when looking at the success or failure of nfs4_xdr_dec_clone(). Fixes: 36022770 ("nfs42: add CLONE xdr functions") Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Matthew Wilcox (Oracle) authored
Instead of setting a bit in the fs_flags to set a bit in the address_space, set the bit in the address_space directly. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <djwong@kernel.org>
-
Christian Brauner authored
When calling setattr_prepare() to determine the validity of the attributes the ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id. When the {g,u}id attribute of the file isn't altered and the caller's fs{g,u}id matches the current {g,u}id attribute the attribute change is allowed. The value in ia_{g,u}id does already account for idmapped mounts and will have taken the relevant idmapping into account. So in order to verify that the {g,u}id attribute isn't changed we simple need to compare the ia_{g,u}id value against the inode's i_{g,u}id value. This only has any meaning for idmapped mounts as idmapping helpers are idempotent without them. And for idmapped mounts this really only has a meaning when circular idmappings are used, i.e. mappings where e.g. id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such ciruclar mappings can e.g. be useful when sharing the same home directory between multiple users at the same time. As an example consider a directory with two files: /source/file1 owned by {g,u}id 1000 and /source/file2 owned by {g,u}id 1001. Assume we create an idmapped mount at /target with an idmapping that maps files owned by {g,u}id 1000 to being owned by {g,u}id 1001 and files owned by {g,u}id 1001 to being owned by {g,u}id 1000. In effect, the idmapped mount at /target switches the ownership of /source/file1 and source/file2, i.e. /target/file1 will be owned by {g,u}id 1001 and /target/file2 will be owned by {g,u}id 1000. This means that a user with fs{g,u}id 1000 must be allowed to setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Similar, a user with fs{g,u}id 1001 must be allowed to setattr /target/file1 from {g,u}id 1001 to {g,u}id 1001. Conversely, a user with fs{g,u}id 1000 must fail to setattr /target/file1 from {g,u}id 1001 to {g,u}id 1000. And a user with fs{g,u}id 1001 must fail to setattr /target/file2 from {g,u}id 1000 to {g,u}id 1000. Both cases must fail with EPERM for non-capable callers. Before this patch we could end up denying legitimate attribute changes and allowing invalid attribute changes when circular mappings are used. To even get into this situation the caller must've been privileged both to create that mapping and to create that idmapped mount. This hasn't been seen in the wild anywhere but came up when expanding the testsuite during work on a series of hardening patches. All idmapped fstests pass without any regressions and we add new tests to verify the behavior of circular mappings. Link: https://lore.kernel.org/r/20211109145713.1868404-1-brauner@kernel.org Fixes: 2f221d6f ("attr: handle idmapped mounts") Cc: Seth Forshee <seth.forshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org CC: linux-fsdevel@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Seth Forshee <sforshee@digitalocean.com> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
-
- Nov 16, 2021
-
-
Kamal Mostafa authored
Fix comment referring to function "io_uring_del_task_file()", now called "io_uring_del_tctx_node()". Fixes: eef51daa ("io_uring: rename function *task_file") Signed-off-by:
Kamal Mostafa <kamal@canonical.com> Link: https://lore.kernel.org/r/20211116175530.31608-1-kamal@canonical.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kees Cook authored
This reverts commit d07f3b08. pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b ("pstore/blk: Use the normal block device I/O path"), which landed in the same release as the commit adding BROKEN. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paulo Alcantara authored
Use new cifs_ses_mark_for_reconnect() helper to mark all session channels for reconnect instead of duplicating it in different places. Signed-off-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
Updates to the srv_count field are protected elsewhere with the cifs_tcp_ses_lock spinlock. Add one missing place (cifs_get_tcp_sesion). CC: Shyam Prasad N <sprasad@microsoft.com> Addresses-Coverity: 1494149 ("Data Race Condition") Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
It is better to print debug messages outside of the chan_lock spinlock where possible. Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Addresses-Coverity: 1493854 ("Thread deadlock") Reviewed-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Nikolay Borisov authored
The v2 balance ioctl has been introduced more than 9 years ago. Users of the old v1 ioctl should have long been migrated to it. It's time we deprecate it and eventually remove it. The only known user is in btrfs-progs that tries v1 as a fallback in case v2 is not supported. This is not necessary anymore. Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Anand Jain <anand.jain@oracle.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Colin Ian King authored
The bitfields have_csum and io_error are currently signed which is not recommended as the representation is an implementation defined behaviour. Fix this by making the bit-fields unsigned ints. Fixes: 2c363954 ("btrfs: scrub: remove the anonymous structure from scrub_page") Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
Colin Ian King <colin.i.king@gmail.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Wang Yugui authored
When a disk has write caching disabled, we skip submission of a bio with flush and sync requests before writing the superblock, since it's not needed. However when the integrity checker is enabled, this results in reports that there are metadata blocks referred by a superblock that were not properly flushed. So don't skip the bio submission only when the integrity checker is enabled for the sake of simplicity, since this is a debug tool and not meant for use in non-debug builds. fstests/btrfs/220 trigger a check-integrity warning like the following when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0. btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)! ------------[ cut here ]------------ WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs] CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1 Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019 RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs] RSP: 0018:ffffb642afb47940 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00 RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003 FS: 00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0 Call Trace: btrfsic_process_written_block+0x2f7/0x850 [btrfs] __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs] ? bio_associate_blkg_from_css+0xa4/0x2c0 btrfsic_submit_bio+0x18/0x30 [btrfs] write_dev_supers+0x81/0x2a0 [btrfs] ? find_get_pages_range_tag+0x219/0x280 ? pagevec_lookup_range_tag+0x24/0x30 ? __filemap_fdatawait_range+0x6d/0xf0 ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e ? find_first_extent_bit+0x9b/0x160 [btrfs] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e write_all_supers+0x1b3/0xa70 [btrfs] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e btrfs_commit_transaction+0x59d/0xac0 [btrfs] close_ctree+0x11d/0x339 [btrfs] generic_shutdown_super+0x71/0x110 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 exit_to_user_mode_prepare+0x1f0/0x200 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x46/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f009f711dfb RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50 RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580 R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50 R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff ---[ end trace 2c4b82abcef9eec4 ]--- S-65536(sdb2/65536/1) --> M-1064960(sdb2/1064960/1) Reviewed-by:
Filipe Manana <fdmanana@gmail.com> Signed-off-by:
Wang Yugui <wangyugui@e16-tech.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Often some test cases like btrfs/161 trigger lockdep splats that complain about possible unsafe lock scenario due to the fact that during mount, when reading the chunk tree we end up calling blkdev_get_by_path() while holding a read lock on a leaf of the chunk tree. That produces a lockdep splat like the following: [ 3653.683975] ====================================================== [ 3653.685148] WARNING: possible circular locking dependency detected [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted [ 3653.687239] ------------------------------------------------------ [ 3653.688400] mount/447465 is trying to acquire lock: [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.691054] but task is already holding lock: [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.693978] which lock already depends on the new lock. [ 3653.695510] the existing dependency chain (in reverse order) is: [ 3653.696915] -> #3 (btrfs-chunk-00){++++}-{3:3}: [ 3653.698053] down_read_nested+0x4b/0x140 [ 3653.698893] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.699988] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 3653.701205] btrfs_search_slot+0x537/0xc00 [btrfs] [ 3653.702234] btrfs_insert_empty_items+0x32/0x70 [btrfs] [ 3653.703332] btrfs_init_new_device+0x563/0x15b0 [btrfs] [ 3653.704439] btrfs_ioctl+0x2110/0x3530 [btrfs] [ 3653.705405] __x64_sys_ioctl+0x83/0xb0 [ 3653.706215] do_syscall_64+0x3b/0xc0 [ 3653.706990] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.708040] -> #2 (sb_internal#2){.+.+}-{0:0}: [ 3653.708994] lock_release+0x13d/0x4a0 [ 3653.709533] up_write+0x18/0x160 [ 3653.710017] btrfs_sync_file+0x3f3/0x5b0 [btrfs] [ 3653.710699] __loop_update_dio+0xbd/0x170 [loop] [ 3653.711360] lo_ioctl+0x3b1/0x8a0 [loop] [ 3653.711929] block_ioctl+0x48/0x50 [ 3653.712442] __x64_sys_ioctl+0x83/0xb0 [ 3653.712991] do_syscall_64+0x3b/0xc0 [ 3653.713519] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.714233] -> #1 (&lo->lo_mutex){+.+.}-{3:3}: [ 3653.715026] __mutex_lock+0x92/0x900 [ 3653.715648] lo_open+0x28/0x60 [loop] [ 3653.716275] blkdev_get_whole+0x28/0x90 [ 3653.716867] blkdev_get_by_dev.part.0+0x142/0x320 [ 3653.717537] blkdev_open+0x5e/0xa0 [ 3653.718043] do_dentry_open+0x163/0x390 [ 3653.718604] path_openat+0x3f0/0xa80 [ 3653.719128] do_filp_open+0xa9/0x150 [ 3653.719652] do_sys_openat2+0x97/0x160 [ 3653.720197] __x64_sys_openat+0x54/0x90 [ 3653.720766] do_syscall_64+0x3b/0xc0 [ 3653.721285] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.721986] -> #0 (&disk->open_mutex){+.+.}-{3:3}: [ 3653.722775] __lock_acquire+0x130e/0x2210 [ 3653.723348] lock_acquire+0xd7/0x310 [ 3653.723867] __mutex_lock+0x92/0x900 [ 3653.724394] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.725041] blkdev_get_by_path+0xb8/0xd0 [ 3653.725614] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.726332] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.726999] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.727739] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.728384] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.729130] legacy_get_tree+0x30/0x50 [ 3653.729676] vfs_get_tree+0x28/0xc0 [ 3653.730192] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.730800] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.731427] legacy_get_tree+0x30/0x50 [ 3653.731970] vfs_get_tree+0x28/0xc0 [ 3653.732486] path_mount+0x2d4/0xbe0 [ 3653.732997] __x64_sys_mount+0x103/0x140 [ 3653.733560] do_syscall_64+0x3b/0xc0 [ 3653.734080] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.734782] other info that might help us debug this: [ 3653.735784] Chain exists of: &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00 [ 3653.737123] Possible unsafe locking scenario: [ 3653.737865] CPU0 CPU1 [ 3653.738435] ---- ---- [ 3653.739007] lock(btrfs-chunk-00); [ 3653.739449] lock(sb_internal#2); [ 3653.740193] lock(btrfs-chunk-00); [ 3653.740955] lock(&disk->open_mutex); [ 3653.741431] *** DEADLOCK *** [ 3653.742176] 3 locks held by mount/447465: [ 3653.742739] #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0 [ 3653.744114] #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs] [ 3653.745563] #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.747066] stack backtrace: [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1 [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 3653.750592] Call Trace: [ 3653.750967] dump_stack_lvl+0x57/0x72 [ 3653.751526] check_noncircular+0xf3/0x110 [ 3653.752136] ? stack_trace_save+0x4b/0x70 [ 3653.752748] __lock_acquire+0x130e/0x2210 [ 3653.753356] lock_acquire+0xd7/0x310 [ 3653.753898] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.754596] ? lock_is_held_type+0xe8/0x140 [ 3653.755125] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.755729] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.756338] __mutex_lock+0x92/0x900 [ 3653.756794] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.757400] ? do_raw_spin_unlock+0x4b/0xa0 [ 3653.757930] ? _raw_spin_unlock+0x29/0x40 [ 3653.758437] ? bd_prepare_to_claim+0x129/0x150 [ 3653.758999] ? trace_module_get+0x2b/0xd0 [ 3653.759508] ? try_module_get.part.0+0x50/0x80 [ 3653.760072] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.760661] ? devcgroup_check_permission+0xc1/0x1f0 [ 3653.761288] blkdev_get_by_path+0xb8/0xd0 [ 3653.761797] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.762454] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.763055] ? clone_fs_devices+0x8f/0x170 [btrfs] [ 3653.763689] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.764370] ? kvm_sched_clock_read+0x14/0x40 [ 3653.764922] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.765493] ? super_setup_bdi_name+0x79/0xd0 [ 3653.766043] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.766780] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.767488] ? kfree+0x1f2/0x3c0 [ 3653.767979] legacy_get_tree+0x30/0x50 [ 3653.768548] vfs_get_tree+0x28/0xc0 [ 3653.769076] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.769718] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.770381] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.771086] ? kfree+0x1f2/0x3c0 [ 3653.771574] legacy_get_tree+0x30/0x50 [ 3653.772136] vfs_get_tree+0x28/0xc0 [ 3653.772673] path_mount+0x2d4/0xbe0 [ 3653.773201] __x64_sys_mount+0x103/0x140 [ 3653.773793] do_syscall_64+0x3b/0xc0 [ 3653.774333] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.775094] RIP: 0033:0x7f648bc45aaa This happens because through btrfs_read_chunk_tree(), which is called only during mount, ends up acquiring the mutex open_mutex of a block device while holding a read lock on a leaf of the chunk tree while other paths need to acquire other locks before locking extent buffers of the chunk tree. Since at mount time when we call btrfs_read_chunk_tree() we know that we don't have other tasks running in parallel and modifying the chunk tree, we can simply skip locking of chunk tree extent buffers. So do that and move the assertion that checks the fs is not yet mounted to the top block of btrfs_read_chunk_tree(), with a comment before doing it. Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Ordered work functions aren't guaranteed to be handled by the same thread which executed the normal work functions. The only way execution between normal/ordered functions is synchronized is via the WORK_DONE_BIT, unfortunately the used bitops don't guarantee any ordering whatsoever. This manifested as seemingly inexplicable crashes on ARM64, where async_chunk::inode is seen as non-null in async_cow_submit which causes submit_compressed_extents to be called and crash occurs because async_chunk::inode suddenly became NULL. The call trace was similar to: pc : submit_compressed_extents+0x38/0x3d0 lr : async_cow_submit+0x50/0xd0 sp : ffff800015d4bc20 <registers omitted for brevity> Call trace: submit_compressed_extents+0x38/0x3d0 async_cow_submit+0x50/0xd0 run_ordered_work+0xc8/0x280 btrfs_work_helper+0x98/0x250 process_one_work+0x1f0/0x4ac worker_thread+0x188/0x504 kthread+0x110/0x1...
-