- Apr 15, 2022
-
-
Patrick Wang authored
The kmemleak_*_phys() apis do not check the address for lowmem's min boundary, while the caller may pass an address below lowmem, which will trigger an oops: # echo scan > /sys/kernel/debug/kmemleak Unable to handle kernel paging request at virtual address ff5fffffffe00000 Oops [#1] Modules linked in: CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33 Hardware name: riscv-virtio,qemu (DT) epc : scan_block+0x74/0x15c ra : scan_block+0x72/0x15c epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30 gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200 t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90 s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000 a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001 a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005 s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90 s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0 s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000 s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f t5 : 0000000000000001 t6 : 0000000000000000 status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d scan_gray_list+0x12e/0x1a6 kmemleak_scan+0x2aa/0x57e kmemleak_write+0x32a/0x40c full_proxy_write+0x56/0x82 vfs_write+0xa6/0x2a6 ksys_write+0x6c/0xe2 sys_write+0x22/0x2a ret_from_syscall+0x0/0x2 The callers may not quite know the actual address they pass(e.g. from devicetree). So the kmemleak_*_phys() apis should guarantee the address they finally use is in lowmem range, so check the address for lowmem's min boundary. Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com Signed-off-by:
Patrick Wang <patrick.wang.shcn@gmail.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Omar Sandoval authored
Commit 3ee48b6a ("mm, x86: Saving vmcore with non-lazy freeing of vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to lazy_max_pages() + 1, ensuring that any future vunmaps() immediately purge the vmap areas instead of doing it lazily. Commit 690467c8 ("mm/vmalloc: Move draining areas out of caller context") moved the purging from the vunmap() caller to a worker thread. Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin (possibly forever). For example, consider the following scenario: 1. Thread reads from /proc/vmcore. This eventually calls __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets vmap_lazy_nr to lazy_max_pages() + 1. 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2 pages (one page plus the guard page) to the purge list and vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the drain_vmap_work is scheduled. 3. Thread returns from the kernel and is scheduled out. 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It frees the 2 pages on the purge list. vmap_lazy_nr is now lazy_max_pages() + 1. 5. This is still over the threshold, so it tries to purge areas again, but doesn't find anything. 6. Repeat 5. If the system is running with only one CPU (which is typicial for kdump) and preemption is disabled, then this will never make forward progress: there aren't any more pages to purge, so it hangs. If there is more than one CPU or preemption is enabled, then the worker thread will spin forever in the background. (Note that if there were already pages to be purged at the time that set_iounmap_nonlazy() was called, this bug is avoided.) This can be reproduced with anything that reads from /proc/vmcore multiple times. E.g., vmcore-dmesg /proc/vmcore. It turns out that improvements to vmap() over the years have obsoleted the need for this "optimization". I benchmarked `dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore. The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and 5.18-rc1 with set_iounmap_nonlazy() removed entirely: |5.17 |5.18+fix|5.18+removal 4k|40.86s| 40.09s| 26.73s 1M|24.47s| 23.98s| 21.84s The removal was the fastest (by a wide margin with 4k reads). This patch removes set_iounmap_nonlazy(). Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com Fixes: 690467c8 ("mm/vmalloc: Move draining areas out of caller context") Signed-off-by:
Omar Sandoval <osandov@fb.com> Acked-by:
Chris Down <chris@chrisdown.name> Reviewed-by:
Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Baoquan He <bhe@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Kravetz authored
It is possible for poisoned hugetlb pages to reside on the free lists. The huge page allocation routines which dequeue entries from the free lists make a point of avoiding poisoned pages. There is no such check and avoidance in the demote code path. If a hugetlb page on the is on a free list, poison will only be set in the head page rather then the page with the actual error. If such a page is demoted, then the poison flag may follow the wrong page. A page without error could have poison set, and a page with poison could not have the flag set. Check for poison before attempting to demote a hugetlb page. Also, return -EBUSY to the caller if only poisoned pages are on the free list. Link: https://lkml.kernel.org/r/20220307215707.50916-1-mike.kravetz@oracle.com Fixes: 8531fc6f ("hugetlb: add hugetlb demote page support") Signed-off-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Charan Teja Kalla authored
The below warning is reported when CONFIG_COMPACTION=n: mm/compaction.c:56:27: warning: 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' defined but not used [-Wunused-const-variable=] 56 | static const unsigned int HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by moving 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' under CONFIG_COMPACTION defconfig. Also since this is just a 'static const int' type, use #define for it. Link: https://lkml.kernel.org/r/1647608518-20924-1-git-send-email-quic_charante@quicinc.com Signed-off-by:
Charan Teja Kalla <quic_charante@quicinc.com> Reported-by:
kernel test robot <lkp@intel.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Nitin Gupta <nigupta@nvidia.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
Two processes under CLONE_VM cloning, user process can be corrupted by seeing zeroed page unexpectedly. CPU A CPU B do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage valid data swap_slot_free_notify delete zram entry swap_readpage zeroed(invalid) data pte_lock map the *zero data* to userspace pte_unlock pte_lock if (!pte_same) goto out_nomap; pte_unlock return and next refault will read zeroed data The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't increase the refcount of swap slot at copy_mm so it couldn't catch up whether it's safe or not to discard data from backing device. In the case, only the lock it could rely on to synchronize swap slot freeing is page table lock. Thus, this patch gets rid of the swap_slot_free_notify function. With this patch, CPU A will see correct data. CPU A CPU B do_swap_page do_swap_page SWP_SYNCHRONOUS_IO path SWP_SYNCHRONOUS_IO path swap_readpage original data pte_lock map the original data swap_free swap_range_free bd_disk->fops->swap_slot_free_notify swap_readpage read zeroed data pte_unlock pte_lock if (!pte_same) goto out_nomap; pte_unlock return on next refault will see mapped data by CPU B The concern of the patch would increase memory consumption since it could keep wasted memory with compressed form in zram as well as uncompressed form in address space. However, most of cases of zram uses no readahead and do_swap_page is followed by swap_free so it will free the compressed form from in zram quickly. Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device") Reported-by:
Ivan Babrou <ivan@cloudflare.com> Tested-by:
Ivan Babrou <ivan@cloudflare.com> Signed-off-by:
Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Juergen Gross authored
Since commit 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") only zones with free memory are included in a built zonelist. This is problematic when e.g. all memory of a zone has been ballooned out when zonelists are being rebuilt. The decision whether to rebuild the zonelists when onlining new memory is done based on populated_zone() returning 0 for the zone the memory will be added to. The new zone is added to the zonelists only, if it has free memory pages (managed_zone() returns a non-zero value) after the memory has been onlined. This implies, that onlining memory will always free the added pages to the allocator immediately, but this is not true in all cases: when e.g. running as a Xen guest the onlined new memory will be added only to the ballooned memory list, it will be freed only when the guest is being ballooned up afterwards. Another problem with using managed_zone() for the decision whether a zone is being added to the zonelists is, that a zone with all memory used will in fact be removed from all zonelists in case the zonelists happen to be rebuilt. Use populated_zone() when building a zonelist as it has been done before that commit. There was a report that QubesOS (based on Xen) is hitting this problem. Xen has switched to use the zone device functionality in kernel 5.9 and QubesOS wants to use memory hotplugging for guests in order to be able to start a guest with minimal memory and expand it as needed. This was the report leading to the patch. Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator") Signed-off-by:
Juergen Gross <jgross@suse.com> Reported-by:
Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by:
Wei Yang <richard.weiyang@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Marco Elver authored
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by:
Marco Elver <elver@google.com> Reviewed-by:
Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by:
kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Vincenzo Frascino authored
Kasan enables hw tags via kasan_enable_tagging() which based on the mode passed via kernel command line selects the correct hw backend. kasan_enable_tagging() is meant to be invoked indirectly via the cpu features framework of the architectures that support these backends. Currently the invocation of this function is guarded by CONFIG_KASAN_KUNIT_TEST which allows the enablement of the correct backend only when KUNIT tests are enabled in the kernel. This inconsistency was introduced in commit: ed6d7444 ("kasan: test: support async (again) and asymm modes for HW_TAGS") ... and prevents to enable MTE on arm64 when KUNIT tests for kasan hw_tags are disabled. Fix the issue making sure that the CONFIG_KASAN_KUNIT_TEST guard does not prevent the correct invocation of kasan_enable_tagging(). Link: https://lkml.kernel.org/r/20220408124323.10028-1-vincenzo.frascino@arm.com Fixes: ed6d7444 ("kasan: test: support async (again) and asymm modes for HW_TAGS") Signed-off-by:
Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by:
Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Axel Rasmussen authored
When one tries to grow an existing memfd_secret with ftruncate, one gets a panic [1]. For example, doing the following reliably induces the panic: fd = memfd_secret(); ftruncate(fd, 10); ptr = mmap(NULL, 10, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); strcpy(ptr, "123456789"); munmap(ptr, 10); ftruncate(fd, 20); The basic reason for this is, when we grow with ftruncate, we call down into simple_setattr, and then truncate_inode_pages_range, and eventually we try to zero part of the memory. The normal truncation code does this via the direct map (i.e., it calls page_address() and hands that to memset()). For memfd_secret though, we specifically don't map our pages via the direct map (i.e. we call set_direct_map_invalid_noflush() on every fault). So the address returned by page_address() isn't useful, and when we try to memset() with it we panic. This patch avoids the panic by implementing a custom setattr for memfd_secret, which detects resizes specifically (setting the size for the first time works just fine, since there are no existing pages to try to zero), and rejects them with EINVAL. One could argue growing should be supported, but I think that will require a significantly more lengthy change. So, I propose a minimal fix for the benefit of stable kernels, and then perhaps to extend memfd_secret to support growing in a separate patch. [1]: BUG: unable to handle page fault for address: ffffa0a889277028 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD afa01067 P4D afa01067 PUD 83f909067 PMD 83f8bf067 PTE 800ffffef6d88060 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 0 PID: 281 Comm: repro Not tainted 5.17.0-dbg-DEV #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:memset_erms+0x9/0x10 Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01 RSP: 0018:ffffb932c09afbf0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffda63c4249dc0 RCX: 0000000000000fd8 RDX: 0000000000000fd8 RSI: 0000000000000000 RDI: ffffa0a889277028 RBP: ffffb932c09afc00 R08: 0000000000001000 R09: ffffa0a889277028 R10: 0000000000020023 R11: 0000000000000000 R12: ffffda63c4249dc0 R13: ffffa0a890d70d98 R14: 0000000000000028 R15: 0000000000000fd8 FS: 00007f7294899580(0000) GS:ffffa0af9bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa0a889277028 CR3: 0000000107ef6006 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? zero_user_segments+0x82/0x190 truncate_inode_partial_folio+0xd4/0x2a0 truncate_inode_pages_range+0x380/0x830 truncate_setsize+0x63/0x80 simple_setattr+0x37/0x60 notify_change+0x3d8/0x4d0 do_sys_ftruncate+0x162/0x1d0 __x64_sys_ftruncate+0x1c/0x20 do_syscall_64+0x44/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xae Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p netfs 9pnet CR2: ffffa0a889277028 [lkp@intel.com: secretmem_iops can be static] Signed-off-by:
kernel test robot <lkp@intel.com> [axelrasmussen@google.com: return EINVAL] Link: https://lkml.kernel.org/r/20220324210909.1843814-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20220412193023.279320-1-axelrasmussen@google.com Signed-off-by:
Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change. Whilst nfsd_splice_action() does contain some questionable handling of repeated pages, and Chuck was able to work around there, history from Mark Hemment makes clear that there might be similar dangers elsewhere: it was not a good idea for me to pass ZERO_PAGE down to unknown actors. Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when iter_is_iovec(); in other cases, use the more natural iov_iter_zero() instead of copy_page_to_iter(). We would use iov_iter_zero() throughout, but the x86 clear_user() is not nearly so well optimized as copy to user (dd of 1T sparse tmpfs file takes 57 seconds rather than 44 seconds). And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)): which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com Fixes: 56a8c8eb ("tmpfs: do not allocate pages on read") Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Patrice CHOTARD <patrice.chotard@foss.st.com> Reported-by:
Chuck Lever III <chuck.lever@oracle.com> Tested-by:
Chuck Lever III <chuck.lever@oracle.com> Cc: Mark Hemment <markhemm@googlemail.com> Cc: Patrice CHOTARD <patrice.chotard@foss.st.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 08, 2022
-
-
Andrew Morton authored
Commit 405cc51f ("mm/list_lru: optimize memcg_reparent_list_lru_node()") has subtle races which are proving ugly to fix. Revert the original optimization. If quantitative testing indicates that we have a significant problem here then other implementations can be looked at. Fixes: 405cc51f ("mm/list_lru: optimize memcg_reparent_list_lru_node()") Acked-by:
Shakeel Butt <shakeelb@google.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Waiman Long <longman@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Miaohe Lin authored
If mpol_new is allocated but not used in restart loop, mpol_new will be freed via mpol_put before returning to the caller. But refcnt is not initialized yet, so mpol_put could not do the right things and might leak the unused mpol_new. This would happen if mempolicy was updated on the shared shmem file while the sp->lock has been dropped during the memory allocation. This issue could be triggered easily with the below code snippet if there are many processes doing the below work at the same time: shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT); shm = shmat(shmid, 0, 0); loop many times { mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0); mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask, maxnode, 0); } Link: https://lkml.kernel.org/r/20220329111416.27954-1-linmiaohe@huawei.com Fixes: 42288fe3 ("mm: mempolicy: Convert shared_policy mutex to spinlock") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.8] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Paolo Bonzini authored
If an mremap() syscall with old_size=0 ends up in move_page_tables(), it will call invalidate_range_start()/invalidate_range_end() unnecessarily, i.e. with an empty range. This causes a WARN in KVM's mmu_notifier. In the past, empty ranges have been diagnosed to be off-by-one bugs, hence the WARNing. Given the low (so far) number of unique reports, the benefits of detecting more buggy callers seem to outweigh the cost of having to fix cases such as this one, where userspace is doing something silly. In this particular case, an early return from move_page_tables() is enough to fix the issue. Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com Reported-by:
<syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Max Filippov authored
When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are initialized with 0 value, but the check is done with pte_none. 0 pte however does not necessarily mean that pte_none will return true. e.g. on xtensa it returns false, resulting in the following runtime warnings: WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_out+0x51/0x108 __schedule+0x71a/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0 CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_in+0x50/0xe0 finish_task_switch$isra$0+0x1ce/0x2f8 __schedule+0x86e/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0. Link: https://lkml.kernel.org/r/20220403235159.3498065-1-jcmvbkbc@gmail.com Fixes: 5fbda3ec ("sched: highmem: Store local kmaps in task struct") Signed-off-by:
Max Filippov <jcmvbkbc@gmail.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Zi Yan authored
Fix a VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages) crash. With folios support, it is possible to have other than HPAGE_PMD_ORDER THPs, in the form of folios, in the system. Use thp_order() to correctly determine the source page order during migration. Link: https://lkml.kernel.org/r/20220404165325.1883267-1-zi.yan@sent.com Link: https://lore.kernel.org/linux-mm/20220404132908.GA785673@u2004/ Fixes: d68eccad ("mm/filemap: Allow large folios to be added to the page cache") Reported-by:
Naoya Horiguchi <naoya.horiguchi@linux.dev> Signed-off-by:
Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 07, 2022
-
-
zhenwei pi authored
page_mapped_in_vma() sets nr_pages to 1, which is usually correct as we only want to know about the precise page and not about other pages in the folio. However, hugetlbfs does want to know about the entire hpage, and using nr_pages to get the size of the hpage is wrong. We could change page_mapped_in_vma() to special-case hugetlbfs pages, but it's better to ignore nr_pages in page_vma_mapped_walk() and get the size from the VMA instead. Fixes: 2aff7a47 ("mm: Convert page_vma_mapped_walk to work on PFNs") Signed-off-by:
zhenwei pi <pizhenwei@bytedance.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> [edit commit message, use hstate directly]
-
Matthew Wilcox (Oracle) authored
Simplify new_page() by unifying the THP and base page cases, and handle orders other than 0 and HPAGE_PMD_ORDER correctly. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Zi Yan <ziy@nvidia.com> Reviewed-by:
William Kucharski <william.kucharski@oracle.com>
-
Matthew Wilcox (Oracle) authored
This wrapper around alloc_pages_vma() calls prep_transhuge_page(), removing the obligation from the caller. This is in the same spirit as __folio_alloc(). Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Zi Yan <ziy@nvidia.com> Reviewed-by:
William Kucharski <william.kucharski@oracle.com>
-
Matthew Wilcox (Oracle) authored
Unify alloc_misplaced_dst_page() and alloc_misplaced_dst_page_thp(). Removes an assumption that compound pages are HPAGE_PMD_ORDER. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Zi Yan <ziy@nvidia.com> Reviewed-by:
William Kucharski <william.kucharski@oracle.com>
-
Matthew Wilcox (Oracle) authored
This removes an assumption that a large folio is HPAGE_PMD_ORDER as well as letting us remove the call to prep_transhuge_page() and a few hidden calls to compound_head(). Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Zi Yan <ziy@nvidia.com> Reviewed-by:
William Kucharski <william.kucharski@oracle.com>
-
Matthew Wilcox (Oracle) authored
Calling try_to_unmap() with TTU_SPLIT_HUGE_PMD and a folio that's not mapped by a PMD causes oopses on arm64 because we now call page_folio() on an invalid page. pmd_page() returns a valid page for non-leaf PMDs on some architectures, so this bug escaped testing before now. Fix this bug by delaying the call to pmd_page() until after we know the PMD is a leaf. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215804 Fixes: af28a988 ("mm/huge_memory: Convert __split_huge_pmd() to take a folio") Reported-by:
Zorro Lang <zlang@redhat.com> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by:
Zorro Lang <zlang@redhat.com>
-
- Apr 05, 2022
-
-
Sebastian Andrzej Siewior authored
The local_lock() is now using a proper static inline function which is enough for llvm to accept that the variable is used. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220328145810.86783-4-bigeasy@linutronix.de
-
- Apr 01, 2022
-
-
Jonghyeon Kim authored
In the DAMON, the minimum wait time of the schemes decides whether the kernel wakes up 'kdamon_fn()'. But since the minimum wait time is initialized to zero, there are corner cases against the original objective. For example, if we have several schemes for one target, and if the wait time of the first scheme is zero, the minimum wait time will set zero, which means 'kdamond_fn()' should wake up to apply this scheme. However, in the following scheme, wait time can be set to non-zero. Thus, the mininum wait time will be set to non-zero, which can cause sleeping this interval for 'kdamon_fn()' due to one deactivated last scheme. This commit prevents making DAMON monitoring inactive state due to other deactivated schemes. Link: https://lkml.kernel.org/r/20220330105302.32114-1-tome01@ajou.ac.kr Signed-off-by:
Jonghyeon Kim <tome01@ajou.ac.kr> Reviewed-by:
SeongJae Park <sj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kuan-Ying Lee authored
When we use HW-tag based kasan and enable vmalloc support, we hit the following bug. It is due to comparison between tagged object and non-tagged pointer. We need to reset the kasan tag when we need to compare tagged object and non-tagged pointer. kmemleak: [name:kmemleak&]Scan area larger than object 0xffffffe77076f440 CPU: 4 PID: 1 Comm: init Tainted: G S W 5.15.25-android13-0-g5cacf919c2bc #1 Hardware name: MT6983(ENG) (DT) Call trace: add_scan_area+0xc4/0x244 kmemleak_scan_area+0x40/0x9c layout_and_allocate+0x1e8/0x288 load_module+0x2c8/0xf00 __se_sys_finit_module+0x190/0x1d0 __arm64_sys_finit_module+0x20/0x30 invoke_syscall+0x60/0x170 el0_svc_common+0xc8/0x114 do_el0_svc+0x28/0xa0 el0_svc+0x60/0xf8 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 kmemleak: [name:kmemleak&]Object 0xf5ffffe77076b000 (size 32768): kmemleak: [name:kmemleak&] comm "init", pid 1, jiffies 4294894197 kmemleak: [name:kmemleak&] min_count = 0 kmemleak: [name:kmemleak&] count = 0 kmemleak: [name:kmemleak&] flags = 0x1 kmemleak: [name:kmemleak&] checksum = 0 kmemleak: [name:kmemleak&] backtrace: module_alloc+0x9c/0x120 move_module+0x34/0x19c layout_and_allocate+0x1c4/0x288 load_module+0x2c8/0xf00 __se_sys_finit_module+0x190/0x1d0 __arm64_sys_finit_module+0x20/0x30 invoke_syscall+0x60/0x170 el0_svc_common+0xc8/0x114 do_el0_svc+0x28/0xa0 el0_svc+0x60/0xf8 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 Link: https://lkml.kernel.org/r/20220318034051.30687-1-Kuan-Ying.Lee@mediatek.com Signed-off-by:
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Cc: Yee Lee <yee.lee@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Rik van Riel authored
In some cases it appears the invalidation of a hwpoisoned page fails because the page is still mapped in another process. This can cause a program to be continuously restarted and die when it page faults on the page that was not invalidated. Avoid that problem by unmapping the hwpoisoned page when we find it. Another issue is that sometimes we end up oopsing in finish_fault, if the code tries to do something with the now-NULL vmf->page. I did not hit this error when submitting the previous patch because there are several opportunities for alloc_set_pte to bail out before accessing vmf->page, and that apparently happened on those systems, and most of the time on other systems, too. However, across several million systems that error does occur a handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE which will cause do_read_fault to return before calling finish_fault. Link: https://lkml.kernel.org/r/20220325161428.5068d97e@imladris.surriel.com Fixes: e53ac737 ("mm: invalidate hwpoison page cache page in fault path") Signed-off-by:
Rik van Riel <riel@surriel.com> Reviewed-by:
Miaohe Lin <linmiaohe@huawei.com> Tested-by:
Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Muchun Song authored
If the kfence object is allocated to be used for objects vector, then this slot of the pool eventually being occupied permanently since the vector is never freed. The solutions could be (1) freeing vector when the kfence object is freed or (2) allocating all vectors statically. Since the memory consumption of object vectors is low, it is better to chose (2) to fix the issue and it is also can reduce overhead of vectors allocating in the future. Link: https://lkml.kernel.org/r/20220328132843.16624-1-songmuchun@bytedance.com Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Marco Elver <elver@google.com> Reviewed-by:
Roman Gushchin <roman.gushchin@linux.dev> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@...>
-
Sebastian Andrzej Siewior authored
The access to mlock_pvec is protected by disabling preemption via get_cpu_var() or implicit by having preemption disabled by the caller (in mlock_page_drain() case). This breaks on PREEMPT_RT since folio_lruvec_lock_irq() acquires a sleeping lock in this section. Create struct mlock_pvec which consits of the local_lock_t and the pagevec. Acquire the local_lock() before accessing the per-CPU pagevec. Replace mlock_page_drain() with a _local() version which is invoked on the local CPU and acquires the local_lock_t and a _remote() version which uses the pagevec from a remote CPU which offline. Link: https://lkml.kernel.org/r/YjizWi9IY0mpvIfb@linutronix.de Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by:
Hugh Dickins <hughd@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Mike reports that LTP memcg_stat_test usually leads to memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168 memcg_stat_test 3 TINFO: Warming up pid: 3460 memcg_stat_test 3 TINFO: Process is still here after warm up: 3460 memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected but may also lead to memcg_stat_test 4 TINFO: Test unevictable with mlock memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168 memcg_stat_test 4 TINFO: Warming up pid: 4271 memcg_stat_test 4 TINFO: Process is still here after warm up: 4271 memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected or both. A wee bit flaky. follow_page_pte() used to have an lru_add_drain() per each page mlocked, and the test came to rely on accurate stats. The pagevec to be drained is different now, but still covered by lru_add_drain(); and, never mind the test, I bel...
-
Charan Teja Kalla authored
This reverts commit 08095d63 ("mm: madvise: skip unmapped vma holes passed to process_madvise") as process_madvise() fails to return the exact processed bytes in other cases too. As an example: if process_madvise() hits mlocked pages after processing some initial bytes passed in [start, end), it just returns EINVAL although some bytes are processed. Thus making an exception only for ENOMEM is partially fixing the problem of returning the proper advised bytes. Thus revert this patch and return proper bytes advised. Link: https://lkml.kernel.org/r/e73da1304a88b6a8a11907045117cccf4c2b8374.1648046642.git.quic_charante@quicinc.com Fixes: 08095d63 ("mm: madvise: skip unmapped vma holes passed to process_madvise") Signed-off-by:
Charan Teja Kalla <quic_charante@quicinc.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
We can extract both the file pointer and the pos from the iocb. This simplifies each caller as well as allowing generic_perform_write() to see more of the iocb contents in the future. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Christian Brauner <brauner@kernel.org> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Matthew Wilcox (Oracle) authored
- Refer to folios where appropriate, not pages (Matthew Wilcox) - Eliminate references to the internal PG_readhead - Use "readahead" consistently - not "read-ahead" or "read ahead" (mostly Neil Brown) - Clarify some sections that, on reflection, weren't very clear (Neil Brown) - Minor punctuation/spelling fixes (Neil Brown) Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
-
Christoph Hellwig authored
The skip_page argument to read_pages controls if rac->_index is incremented before returning from the function. Just open code that in the callers. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
-
Christoph Hellwig authored
This is always an empty list or NULL with the removal of the ->readahead support, so remove it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
-
Matthew Wilcox (Oracle) authored
All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Matthew Wilcox (Oracle) authored
With no remaining users, remove this function and the related infrastructure. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 30, 2022
-
-
Zi Yan authored
Whenever a buddy page is found, page_is_buddy() should be called to check its validity. Add the missing check during pageblock merge check. Fixes: 1dd214b8 ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others") Link: https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/ Reported-and-tested-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Zi Yan <ziy@nvidia.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 28, 2022
-
-
Miaohe Lin authored
Since commit b1123ea6 ("mm: balloon: use general non-lru movable page feature"), these functions are called via balloon_aops callbacks. They're not called directly outside this file. So make them static and clean up the relevant code. Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220125132221.2220-1-linmiaohe@huawei.com Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com>
-
- Mar 27, 2022
-
-
Muchun Song authored
The objcg is not cleared and put for kfence object when it is freed, which could lead to memory leak for struct obj_cgroup and wrong statistics of NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B. Since the last freed object's objcg is not cleared, mem_cgroup_from_obj() could return the wrong memcg when this kfence object, which is not charged to any objcgs, is reallocated to other users. A real word issue [1] is caused by this bug. Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1] Reported-by:
<syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com> Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 25, 2022
-
-
Andreas Gruenbacher authored
When part of the user buffer passed to generic_perform_write() or iomap_file_buffered_write() cannot be faulted in for reading, the entire write currently fails. The correct behavior would be to write all the data that can be written, up to the point of failure. Commit a6294593 ("iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable") gave us the information needed, so fix the page prefaulting in generic_perform_write() and iomap_write_iter() to only bail out when no pages could be faulted in. We already factor in that pages that are faulted in may no longer be resident by the time they are accessed. Paging out pages has the same effect as not faulting in those pages in the first place, so the code can already deal with that. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
- Mar 24, 2022
-
-
Johannes Weiner authored
MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating, there are valid use cases for depopulating locked ranges as well. Users mlock memory to protect secrets. There are allocators for secure buffers that want in-use memory generally mlocked, but cleared and invalidated memory to give up the physical pages. This could be done with explicit munlock -> mlock calls on free -> alloc of course, but that adds two unnecessary syscalls, heavy mmap_sem write locks, vma splits and re-merges - only to get rid of the backing pages. Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are okay with on-demand initial population. It seems valid to selectively free some memory during the lifetime of such a process, without having to mess with its overall policy. Why add a separate flag? Isn't this a pretty niche usecase? - MADV_DONTNEED has been bailing on locked vmas forever. It's at least conceivable that someone, somewhere is relying on mlock to protect data from perhaps broader invalidation calls. Changing this behavior now could lead to quiet data corruption. - It also clarifies expectations around MADV_FREE and maybe MADV_REMOVE. It avoids the situation where one quietly behaves different than the others. MADV_FREE_LOCKED can be added later. - The combination of mlock() and madvise() in the first place is probably niche. But where it happens, I'd say that dropping pages from a locked region once they don't contain secrets or won't page anymore is much saner than relying on mlock to protect memory from speculative or errant invalidation calls. It's just that we can't change the default behavior because of the two previous points. Given that, an explicit new flag seems to make the most sense. [hannes@cmpxchg.org: fix mips build] Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-