- Apr 21, 2022
-
-
Christophe Leroy authored
This is a fix for commit f6795053 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ye Bin authored
we got issue as follows: [ 72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal [ 72.826847] EXT4-fs (sda): Remounting filesystem read-only fallocate: fallocate failed: Read-only file system [ 74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay [ 74.793597] ------------[ cut here ]------------ [ 74.794203] kernel BUG at fs/jbd2/transaction.c:2063! [ 74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150 [ 74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60 [ 74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202 [ 74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90 [ 74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20 [ 74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90 [ 74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00 [ 74.807301] FS: 0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000 [ 74.808338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0 [ 74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 74.811897] Call Trace: [ 74.812241] <TASK> [ 74.812566] __jbd2_journal_refile_buffer+0x12f/0x180 [ 74.813246] jbd2_journal_refile_buffer+0x4c/0xa0 [ 74.813869] jbd2_journal_commit_transaction.cold+0xa1/0x148 [ 74.817550] kjournald2+0xf8/0x3e0 [ 74.819056] kthread+0x153/0x1c0 [ 74.819963] ret_from_fork+0x22/0x30 Above issue may happen as follows: write truncate kjournald2 generic_perform_write ext4_write_begin ext4_walk_page_buffers do_journal_get_write_access ->add BJ_Reserved list ext4_journalled_write_end ext4_walk_page_buffers write_end_fn ext4_handle_dirty_metadata ***************JBD2 ABORT************** jbd2_journal_dirty_metadata -> return -EROFS, jh in reserved_list jbd2_journal_commit_transaction while (commit_transaction->t_reserved_list) jh = commit_transaction->t_reserved_list; truncate_pagecache_range do_invalidatepage ext4_journalled_invalidatepage jbd2_journal_invalidatepage journal_unmap_buffer __dispose_buffer __jbd2_journal_unfile_buffer jbd2_journal_put_journal_head ->put last ref_count __journal_remove_journal_head bh->b_private = NULL; jh->b_bh = NULL; jbd2_journal_refile_buffer(journal, jh); bh = jh2bh(jh); ->bh is NULL, later will trigger null-ptr-deref journal_free_journal_head(jh); After commit 96f1e097, we no longer hold the j_state_lock while iterating over the list of reserved handles in jbd2_journal_commit_transaction(). This potentially allows the journal_head to be freed by journal_unmap_buffer while the commit codepath is also trying to free the BJ_Reserved buffers. Keeping j_state_lock held while trying extends hold time of the lock minimally, and solves this issue. Fixes: 96f1e097 ("jbd2: avoid long hold times of j_state_lock while committing a transaction") Signed-off-by:
Ye Bin <yebin10@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Christian Brauner authored
After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD and consequently we always need to pair mnt_hold_writers() with mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for the first mount that was changed. Fix this and make sure that the first mount will be cleaned up and add some comments to make it more obvious. Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com Link: https://lore.kernel.org/r/20220420131925.2464685-1-brauner@kernel.org Fixes: e257039f ("mount_setattr(): clean the control flow and calling conventions") [1] Cc: Hillf Danton <hdanton@sina.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by:
<syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com> Reported-by:
<syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Christian Brauner (Microsoft) <brauner@kernel.org>
-
- Apr 20, 2022
-
-
Ronnie Sahlberg authored
because the copychunk_write might cover a region of the file that has not yet been sent to the server and thus fail. A simple way to reproduce this is: truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with the 'fcollapse 16k 24k' due to write-back caching. fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call and it will fail serverside since the file is still 0b in size serverside until the writes have been destaged. To avoid this we must ensure that we destage any unwritten data to the server before calling COPYCHUNK_WRITE. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373 Reported-by:
Xiaoli Feng <xifeng@redhat.com> Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Paulo Alcantara authored
TCP_Server_Info::origin_fullpath and TCP_Server_Info::leaf_fullpath are protected by refpath_lock mutex and not cifs_tcp_ses_lock spinlock. Signed-off-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Paulo Alcantara authored
Either mount(2) or automount might not have server->origin_fullpath set yet while refresh_cache_worker() is attempting to refresh DFS referrals. Add missing NULL check and locking around it. This fixes bellow crash: [ 1070.276835] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1070.277676] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 1070.278219] CPU: 1 PID: 8506 Comm: kworker/u8:1 Not tainted 5.18.0-rc3 #10 [ 1070.278701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1070.279495] Workqueue: cifs-dfscache refresh_cache_worker [cifs] [ 1070.280044] RIP: 0010:strcasecmp+0x34/0x150 [ 1070.280359] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44 [ 1070.281729] RSP: 0018:ffffc90008367958 EFLAGS: 00010246 [ 1070.282114] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000 [ 1070.282691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 1070.283273] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27 [ 1070.283857] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000 [ 1070.284436] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000 [ 1070.284990] FS: 0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000 [ 1070.285625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1070.286100] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0 [ 1070.286683] Call Trace: [ 1070.286890] <TASK> [ 1070.287070] refresh_cache_worker+0x895/0xd20 [cifs] [ 1070.287475] ? __refresh_tcon.isra.0+0xfb0/0xfb0 [cifs] [ 1070.287905] ? __lock_acquire+0xcd1/0x6960 [ 1070.288247] ? is_dynamic_key+0x1a0/0x1a0 [ 1070.288591] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 1070.289012] ? lock_downgrade+0x6f0/0x6f0 [ 1070.289318] process_one_work+0x7bd/0x12d0 [ 1070.289637] ? worker_thread+0x160/0xec0 [ 1070.289970] ? pwq_dec_nr_in_flight+0x230/0x230 [ 1070.290318] ? _raw_spin_lock_irq+0x5e/0x90 [ 1070.290619] worker_thread+0x5ac/0xec0 [ 1070.290891] ? process_one_work+0x12d0/0x12d0 [ 1070.291199] kthread+0x2a5/0x350 [ 1070.291430] ? kthread_complete_and_exit+0x20/0x20 [ 1070.291770] ret_from_fork+0x22/0x30 [ 1070.292050] </TASK> [ 1070.292223] Modules linked in: bpfilter cifs cifs_arc4 cifs_md4 [ 1070.292765] ---[ end trace 0000000000000000 ]--- [ 1070.293108] RIP: 0010:strcasecmp+0x34/0x150 [ 1070.293471] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44 [ 1070.297718] RSP: 0018:ffffc90008367958 EFLAGS: 00010246 [ 1070.298622] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000 [ 1070.299428] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 1070.300296] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27 [ 1070.301204] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000 [ 1070.301932] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000 [ 1070.302645] FS: 0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000 [ 1070.303462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1070.304131] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0 [ 1070.305004] Kernel panic - not syncing: Fatal exception [ 1070.305711] Kernel Offset: disabled [ 1070.305971] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Linus Torvalds authored
This reverts commit 5a519c8f . It turns out that making the pipe almost arbitrarily large has some rather unexpected downsides. The kernel test robot reports a kernel warning that is due to pipe->max_usage now growing to the point where the iter_file_splice_write() buffer allocation can no longer be satisfied as a slab allocation, and the int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); code sequence there will now always fail as a result. That code could be modified to use kvcalloc() too, but I feel very uncomfortable making those kinds of changes for a very niche use case that really should have other options than make these kinds of fundamental changes to pipe behavior. Maybe the CRIU process dumping should be multi-threaded, and use multiple pipes and multiple cores, rather than try to use one larger pipe to minimize splice() calls. Reported-by:
kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/ Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 19, 2022
-
-
Christian Brauner authored
Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (No such filesystem yet exist.). Since then, the meaning of an idmapped mount is a mount whose idmapping is different from the filesystems idmapping. While doing that work we missed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough. But they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since it's irrelevant here it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: bd303368 ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # 5.17 Cc: <regressions@lists.linux.dev> Signed-off-by:
Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 18, 2022
-
-
Haowen Bai authored
Use kzalloc rather than duplicating its implementation, which makes code simple and easy to understand. Signed-off-by:
Haowen Bai <baihaowen@meizu.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- Apr 17, 2022
-
-
Pavel Begunkov authored
If all completed requests in io_do_iopoll() were marked with REQ_F_CQE_SKIP, we'll not only skip CQE posting but also io_free_batch_list() leaking memory and resources. Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially return the value greater than the real one, but iopolling will deal with it and the userspace will re-iopoll if needed. In anyway, I don't think there are many use cases for REQ_F_CQE_SKIP + IOPOLL. Fixes: 83a13a41 ("io_uring: tweak iopoll CQE_SKIP event counting") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5072fc8693fbfd595f89e5d4305bfcfd5d2f0a64.1650186611.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 16, 2022
-
-
Jens Axboe authored
We just return failure in this case, but we need to release the iovec first. If we're doing IO with more than FAST_IOV segments, then the iovec is allocated and must be freed. Reported-by:
<syzbot+96b43810dfe9c3bb95ed@syzkaller.appspotmail.com> Fixes: 584b0180 ("io_uring: move read/write file prep state into actual opcode handler") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 15, 2022
-
-
Andrew Morton authored
Despite Mike's attempted fix (925346c1), regressions reports continue: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info So revert this patch. Fixes: 9630f0d6 ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Kennelly <ckennelly@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fangrui Song <maskray@google.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Commit 925346c1 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") was an attempt to fix regressions due to 9630f0d6 ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE"). But regressionss continue to be reported: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info This patch reverts the fix, so the original can also be reverted. Fixes: 925346c1 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Hongyu Jin authored
The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by:
Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by:
Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com>
-
- Apr 14, 2022
-
-
Theodore Ts'o authored
If we (re-)calculate the file system overhead amount and it's different from the on-disk s_overhead_clusters value, update the on-disk version since this can take potentially quite a while on bigalloc file systems. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Jens Axboe authored
We need to either restore creds properly if we fail on the file assignment, or just do the file assignment first instead. Let's do the latter as it's simpler, should make no difference here for file assignment. Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/ Reported-by:
<syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com> Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Theodore Ts'o authored
If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Theodore Ts'o authored
The kernel calculation was underestimating the overhead by not taking into account the reserved gdt blocks. With this change, the overhead calculated by the kernel matches the overhead calculation in mke2fs. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
NeilBrown authored
When asked to create a path ending '/', but which is not to be a directory (LOOKUP_DIRECTORY not set), filename_create() will never try to create the file. If it doesn't exist, -ENOENT is reported. However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems ->lookup() function, even though there is no intent to create. This is misleading and can cause incorrect behaviour. If you try ln -s foo /path/dir/ where 'dir' is a directory on an NFS filesystem which is not currently known in the dcache, this will fail with ENOENT. But as the name is not in the dcache, nfs_lookup gets called with LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any lookup, with the expectation that a subsequent call to create the target will be made, and the lookup can be combined with the creation. In the case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never made. Instead filename_create() sees that the dentry is not (yet) positive and returns -ENOENT - even though the directory actually exists. So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to create, and use the absence of these flags to decide if -ENOENT should be returned. Note that filename_parentat() is only interested in LOOKUP_REVAL, so we split that out and store it in 'reval_flag'. __lookup_hash() then gets reval_flag combined with whatever create flags were determined to be needed. Reviewed-by:
David Disseldorp <ddiss@suse.de> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
NeilBrown <neilb@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ronnie Sahlberg authored
On umount, cifs_sb->tlink_tree might contain entries that do not represent a valid tcon. Check the tcon for error before we dereference it. Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Cc: stable@vger.kernel.org Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Reported-by:
Xiaoli Feng <xifeng@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- Apr 13, 2022
-
-
Harshit Mogalapalli authored
Smatch printed a warning: arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error: __memcpy() 'dctx->buf' too small (16 vs u32max) It's caused because Smatch marks 'link_len' as untrusted since it comes from sscanf(). Add a check to ensure that 'link_len' is not larger than the size of the 'link_str' buffer. Fixes: c69c1b6e ("cifs: implement CIFSParseMFSymlink()") Signed-off-by:
Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Pavel Begunkov authored
We should not return an error code in req->result in io_poll_check_events(), because it may get mangled and returned as success. Just return the error code directly, the callers will fail the request or proceed accordingly. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f03514ee33324dc811fb93df84aee0f695fb044.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We pass "unlocked" into io_assign_file() in io_poll_check_events(), which can lead to double locking. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2476d4ae46554324b599ee4055447b105f20a75a.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Pass right issue_flags into into io_file_get_fixed() instead of IO_URING_F_UNLOCKED. It's probably not a problem at the moment but let's do it safer. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7d242daa9df5d776907686977cd29fbceb4a2d8d.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 12, 2022
-
-
Tadeusz Struk authored
Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the bitmap_maxbytes for the inode. This triggers a BUG in ext4_ind_remove_space(). According to the comments in this function the 'end' parameter needs to be one block after the last block to be removed. In the case when the BUG is triggered it points to the last block. Modify the ext4_punch_hole() function and add constraint that caps the length to satisfy the one before laster block requirement. LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331 LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 Fixes: a4bb6b64 ("ext4: enable "punch hole" functionality") Reported-by:
<syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com> Signed-off-by:
Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Ye Bin authored
We got issue as follows: EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue ================================================================== BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline] BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline] BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331 CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:83 [inline] dump_stack+0x144/0x187 lib/dump_stack.c:124 print_address_description+0x7d/0x630 mm/kasan/report.c:387 __kasan_report+0x132/0x190 mm/kasan/report.c:547 kasan_report+0x47/0x60 mm/kasan/report.c:564 ext4_search_dir fs/ext4/namei.c:1394 [inline] search_dirblock fs/ext4/namei.c:1199 [inline] __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 ext4_lookup_entry fs/ext4/namei.c:1622 [inline] ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690 __lookup_hash+0xc5/0x190 fs/namei.c:1451 do_rmdir+0x19e/0x310 fs/namei.c:3760 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x445e59 Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002 R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3 flags: 0x200000000000000() raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== ext4_search_dir: ... de = (struct ext4_dir_entry_2 *)search_buf; dlimit = search_buf + buf_size; while ((char *) de < dlimit) { ... if ((char *) de + de->name_len <= dlimit && ext4_match(dir, fname, de)) { ... } ... de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); if (de_len <= 0) return -1; offset += de_len; de = (struct ext4_dir_entry_2 *) ((char *) de + de_len); } Assume: de=0xffff8881317c2fff dlimit=0x0xffff8881317c3000 If read 'de->name_len' which address is 0xffff8881317c3005, obviously is out of range, then will trigger use-after-free. To solve this issue, 'dlimit' must reserve 8 bytes, as we will read 'de->name_len' to judge if '(char *) de + de->name_len' out of range. Signed-off-by:
Ye Bin <yebin10@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Ye Bin authored
We got issue as follows: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:389! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197 Workqueue: events flush_stashed_error_work RIP: 0010:start_this_handle+0x41c/0x1160 RSP: 0018:ffff888106b47c20 EFLAGS: 00010202 RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050 RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000 R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000 FS: 0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> jbd2__journal_start+0x38a/0x790 jbd2_journal_start+0x19/0x20 flush_stashed_error_work+0x110/0x2b3 process_one_work+0x688/0x1080 worker_thread+0x8b/0xc50 kthread+0x26f/0x310 ret_from_fork+0x22/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Above issue may happen as follows: umount read procfs error_work ext4_put_super flush_work(&sbi->s_error_work); ext4_mb_seq_groups_show ext4_mb_load_buddy_gfp ext4_mb_init_group ext4_mb_init_cache ext4_read_block_bitmap_nowait ext4_validate_block_bitmap ext4_error ext4_handle_error schedule_work(&EXT4_SB(sb)->s_error_work); ext4_unregister_sysfs(sb); jbd2_journal_destroy(sbi->s_journal); journal_kill_thread journal->j_flags |= JBD2_UNMOUNT; flush_stashed_error_work jbd2_journal_start start_this_handle BUG_ON(journal->j_flags & JBD2_UNMOUNT); To solve this issue, we call 'ext4_unregister_sysfs() before flushing s_error_work in ext4_put_super(). Signed-off-by:
Ye Bin <yebin10@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Ye Bin authored
We got issue as follows: [home]# fsck.ext4 -fn ram0yb e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. Clear? no Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). Fix? no As the symlink file size does not match the file content. If the writeback of the symlink data block failed, ext4_finish_bio() handles the end of IO. However this function fails to mark the buffer with BH_write_io_error and so when unmount does journal checkpoint it cannot detect the writeback error and will cleanup the journal. Thus we've lost the correct data in the journal area. To solve this issue, mark the buffer as BH_write_io_error in ext4_finish_bio(). Cc: stable@kernel.org Signed-off-by:
Ye Bin <yebin10@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Darrick J. Wong authored
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Signed-off-by:
Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Mikulas Patocka authored
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Dylan Yudaken authored
Ensure that only 0 is passed for pad here. Fixes: c73ebb68 ("io_uring: add timeout support for io_uring_enter()") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-5-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
Only allow resv field to be 0 in struct io_uring_rsrc_update user arguments. Fixes: e7a6c00d ("io_uring: add support for registering ring file descriptors") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-4-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
Verify that the user does not pass in anything but 0 for this field. Fixes: 992da01a ("io_uring: change registration/upd/rsrc tagging ABI") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-3-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
Move validation to be more consistently straight after copy_from_user. This is already done in io_register_rsrc_update and so this removes that redundant check. Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-2-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io-wq work cancellation path can't take uring_lock as how it's done on file assignment, we have to handle IO_WQ_WORK_CANCEL first, this fixes encountered hangs. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0d9b9f37841645518503f6a207e509d14a286aba.1649773463.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 11, 2022
-
-
Jens Axboe authored
There are two reasons why this isn't the best idea: - It's an odd area to grab a bit of storage space, hence it's an odd area to grab storage from. - It puts the 3rd io_kiocb cacheline into the hot path, where normal hot path just needs the first two. Use 'cflags' for joint fd/cflags storage. We only need fd until we successfully issue, and we only need cflags once a request is done and is completed. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In preparation for fixing a regression with pulling in an extra cacheline for IO that doesn't usually touch the last cacheline of the io_kiocb, move the cached location of apoll->events to space shared with some other completion data. Like cflags, this isn't used until after the request has been completed, so we can piggy back on top of comp_list. Fixes: 81459350 ("io_uring: cache req->apoll->events in req->cflags") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
-1 tells use to use the current position, but we check if the file is a stream regardless of that. Fix up io_kiocb_update_pos() to only dip into file if we need to. This is both more efficient and also drops 12 bytes of text on aarch64 and 64 bytes on x86-64. Fixes: b4aec400 ("io_uring: do not recalculate ppos unnecessarily") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 10, 2022
-
-
Jens Axboe authored
Give applications a way to tell if the kernel supports sane linked files, as in files being assigned at the right time to be able to reliably do <open file direct into slot X><read file from slot X> while using IOSQE_IO_LINK to order them. Not really a bug fix, but flag it as such so that it gets pulled in with backports of the deferred file assignment. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 08, 2022
-
-
David Howells authored
Split the smb3_add_credits tracepoint to make it more obvious when looking at the logs which line corresponds to what credit change. Also add a tracepoint for credit overflow when it's being added back. Note that it might be better to add another field to the tracepoint for the information rather than splitting it. It would also be useful to store the MID potentially, though that isn't available when the credits are first obtained. Signed-off-by:
David Howells <dhowells@redhat.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: linux-cifs@vger.kernel.org Acked-by:
Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by:
Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by:
Steve French <stfrench@microsoft.com>
-