- Apr 12, 2022
-
-
Mikulas Patocka authored
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 11, 2022
-
-
Vitaly Kuznetsov authored
The following WARN is triggered from kvm_vm_ioctl_set_clock(): WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G W O 5.16.0.stable #20 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... Call Trace: <TASK> ? kvm_write_guest+0x114/0x120 [kvm] kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm] kvm_arch_vm_ioctl+0xa26/0xc50 [kvm] ? schedule+0x4e/0xc0 ? __cond_resched+0x1a/0x50 ? futex_wait+0x166/0x250 ? __send_signal+0x1f1/0x3d0 kvm_vm_ioctl+0x747/0xda0 [kvm] ... The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty() is called without an active vCPU") but the change seems to be correct (unlike Hyper-V TSC page update mechanism). In fact, there's no real need to actually write to guest memory to invalidate TSC page, this can be done by the first vCPU which goes through kvm_guest_time_update(). Reported-by:
Maxim Levitsky <mlevitsk@redhat.com> Reported-by:
Naresh Kamboju <naresh.kamboju@linaro.org> Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220407201013.963226-1-vkuznets@redhat.com>
-
Suravee Suthikulpanit authored
Since current AVIC implementation cannot support encrypted memory, inhibit AVIC for SEV-enabled guest. Signed-off-by:
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220408133710.54275-1-suravee.suthikulpanit@amd.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 08, 2022
-
-
Heiko Stuebner authored
vcpu_fp uses the riscv_isa_extension mechanism which gets defined in hwcap.h but doesn't include that head file. While it seems to work in most cases, in certain conditions this can lead to build failures like ../arch/riscv/kvm/vcpu_fp.c: In function ‘kvm_riscv_vcpu_fp_reset’: ../arch/riscv/kvm/vcpu_fp.c:22:13: error: implicit declaration of function ‘riscv_isa_extension_available’ [-Werror=implicit-function-declaration] 22 | if (riscv_isa_extension_available(&isa, f) || | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../arch/riscv/kvm/vcpu_fp.c:22:49: error: ‘f’ undeclared (first use in this function) 22 | if (riscv_isa_extension_available(&isa, f) || Fix this by simply including the necessary header. Fixes: 0a86512d ("RISC-V: KVM: Factor-out FP virtualization into separate sources") Signed-off-by:
Heiko Stuebner <heiko@sntech.de> Signed-off-by:
Anup Patel <anup@brainfault.org>
-
Anup Patel authored
We might have RISC-V systems (such as QEMU) where VMID is not part of the TLB entry tag so these systems will have to flush all TLB entries upon any change in hgatp.VMID. Currently, we zero-out hgatp CSR in kvm_arch_vcpu_put() and we re-program hgatp CSR in kvm_arch_vcpu_load(). For above described systems, this will flush all TLB entries whenever VCPU exits to user-space hence reducing performance. This patch fixes above described performance issue by not clearing hgatp CSR in kvm_arch_vcpu_put(). Fixes: 34bde9d8 ("RISC-V: KVM: Implement VCPU world-switch") Cc: stable@vger.kernel.org Signed-off-by:
Anup Patel <apatel@ventanamicro.com> Signed-off-by:
Anup Patel <anup@brainfault.org>
-
Chanho Park authored
Add the MIDR part number info for the Arm Cortex-A78AE[1] and add it to spectre-BHB affected list[2]. [1]: https://developer.arm.com/Processors/Cortex-A78AE [2]: https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: James Morse <james.morse@arm.com> Signed-off-by:
Chanho Park <chanho61.park@samsung.com> Link: https://lore.kernel.org/r/20220407091128.8700-1-chanho61.park@samsung.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Guo Ren authored
These patch_text implementations are using stop_machine_cpuslocked infrastructure with atomic cpu_count. The original idea: When the master CPU patch_text, the others should wait for it. But current implementation is using the first CPU as master, which couldn't guarantee the remaining CPUs are waiting. This patch changes the last CPU as the master to solve the potential risk. Fixes: ae164807 ("arm64: introduce interfaces to hotpatch kernel and module code") Signed-off-by:
Guo Ren <guoren@linux.alibaba.com> Signed-off-by:
Guo Ren <guoren@kernel.org> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220407073323.743224-2-guoren@kernel.org Signed-off-by:
Will Deacon <will@kernel.org>
-
- Apr 07, 2022
-
-
Reto Buerki authored
The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2. Fixes: 6285aa50 ("x86/msi: Provide msi message shadow structs") Co-developed-by:
Adrian-Ken Rueegsegger <ken@codelabs.ch> Signed-off-by:
Adrian-Ken Rueegsegger <ken@codelabs.ch> Signed-off-by:
Reto Buerki <reet@codelabs.ch> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch
-
Nick Desaulniers authored
Bernardo reported an error that Nathan bisected down to (x86_64) defconfig+LTO_CLANG_FULL+X86_PMEM_LEGACY. LTO vmlinux.o ld.lld: error: <instantiation>:1:13: redefinition of 'found' .set found, 0 ^ <inline asm>:29:1: while in macro instantiation extable_type_reg reg=%eax, type=(17 | ((0) << 16)) ^ This appears to be another LTO specific issue similar to what was folded into commit 4b5305de ("x86/extable: Extend extable functionality"), where the `.set found, 0` in DEFINE_EXTABLE_TYPE_REG in arch/x86/include/asm/asm.h conflicts with the symbol for the static function `found` in arch/x86/kernel/pmem.c. Assembler .set directive declare symbols with global visibility, so the assembler may not rename such symbols in the event of a conflict. LTO could rename static functions if there was a conflict in C sources, but it cannot see into symbols defined in inline asm. The symbols are also retained in the symbol table, regardless of LTO. Give the symbols .L prefixes making them locally visible, so that they may be renamed for LTO to avoid conflicts, and to drop them from the symbol table regardless of LTO. Fixes: 4b5305de ("x86/extable: Extend extable functionality") Reported-by:
Bernardo Meurer Costa <beme@google.com> Debugged-by:
Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Nathan Chancellor <nathan@kernel.org> Tested-by:
Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20220329202148.2379697-1-ndesaulniers@google.com
-
Peter Zijlstra authored
Clang can inline emit_indirect_jump() and then folds constants, which results in: | vmlinux.o: warning: objtool: emit_bpf_dispatcher()+0x6a4: relocation to !ENDBR: .text.__x86.indirect_thunk+0x40 | vmlinux.o: warning: objtool: emit_bpf_dispatcher()+0x67d: relocation to !ENDBR: .text.__x86.indirect_thunk+0x40 | vmlinux.o: warning: objtool: emit_bpf_tail_call_indirect()+0x386: relocation to !ENDBR: .text.__x86.indirect_thunk+0x20 | vmlinux.o: warning: objtool: emit_bpf_tail_call_indirect()+0x35d: relocation to !ENDBR: .text.__x86.indirect_thunk+0x20 Suppress the optimization such that it must emit a code reference to the __x86_indirect_thunk_array[] base. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lkml.kernel.org/r/20220405075531.GB30877@worktop.programming.kicks-ass.net
-
- Apr 06, 2022
-
-
Kefeng Wang authored
This reverts commit 602946ec. If CONFIG_HIGHMEM is enabled, no highmem will be added with max_mapnr set to max_low_pfn, see mem_init(): for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) { ... free_highmem_page(); } Now that virt_addr_valid() has been fixed in the previous commit, we can revert the change to max_mapnr. Fixes: 602946ec ("powerpc: Set max_mapnr correctly") Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by:
Erhard F. <erhard_f@mailbox.org> [mpe: Update change log to reflect series reordering] Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220406145802.538416-2-mpe@ellerman.id.au
-
Kefeng Wang authored
mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000. Because of the way __pa() works we have: __pa(0x8000000000000000) == 0, and therefore virt_to_pfn(0x8000000000000000) == 0, and therefore virt_addr_valid(0x8000000000000000) == true Which is wrong, virt_addr_valid() should be false for vmalloc space. In fact all vmalloc addresses that alias with a valid PFN will return true from virt_addr_valid(). That can cause bugs with hardened usercopy as described below by Kefeng Wang: When running ethtool eth0 on 64-bit Book3E, a BUG occurred: usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)! kernel BUG at mm/usercopy.c:99 ... usercopy_abort+0x64/0xa0 (unreliable) __check_heap_object+0x168/0x190 __check_object_size+0x1a0/0x200 dev_ethtool+0x2494/0x2b20 dev_ioctl+0x5d0/0x770 sock_do_ioctl+0xf0/0x1d0 sock_ioctl+0x3ec/0x5a0 __se_sys_ioctl+0xf0/0x160 system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200 The code shows below, data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN)) The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true on 64-bit Book3E, which leads to the panic. As commit 4dd7554a ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in the virt_addr_valid() for 64-bit, also add upper limit check to make sure the virt is below high_memory. Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start of lowmem, high_memory is the upper low virtual address, the check is suitable for 32-bit, this will fix the issue mentioned in commit 602946ec ("powerpc: Set max_mapnr correctly") too. On 32-bit there is a similar problem with high memory, that was fixed in commit 602946ec ("powerpc: Set max_mapnr correctly"), but that commit breaks highmem and needs to be reverted. We can't easily fix __pa(), we have code that relies on its current behaviour. So for now add extra checks to virt_addr_valid(). For 64-bit Book3S the extra checks are not necessary, the combination of virt_to_pfn() and pfn_valid() should yield the correct result, but they are harmless. Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Add additional change log detail] Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au
-
Reiji Watanabe authored
KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs for a guest. At vCPU reset, vcpu_allowed_register_width() checks if the vcpu's register width is consistent with all other vCPUs'. Since the checking is done even against vCPUs that are not initialized (KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs are erroneously treated as 64bit vCPU, which causes the function to incorrectly detect a mixed-width VM. Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED bits for kvm->arch.flags. A value of the EL1_32BIT bit indicates that the guest needs to be configured with all 32bit or 64bit vCPUs, and a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the EL1_32BIT bit is valid (already set up). Values in those bits are set at the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT configuration for the vCPU. Check vcpu's register width against those new ...
-
Joey Gouly authored
The alternatives code must be `noinstr` such that it does not patch itself, as the cache invalidation is only performed after all the alternatives have been applied. Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update() and get_alt_insn() with `__always_inline` since they are both only called through patch_alternative(). Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused a boot hang: [ 0.241121] CPU: All CPU(s) started at EL2 The alternatives code was patching the atomics in __tsan_read4() from LL/SC atomics to LSE atomics. The following fragment is using LL/SC atomics in the .text section: | <__tsan_unaligned_read4+304>: ldxr x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304> This LL/SC atomic sequence was to be replaced with LSE atomics. However since the alternatives code was instrumentable, __tsan_read4() was being called after only the first instruction was replaced, which led to the following code in memory: | <__tsan_unaligned_read4+304>: ldadd x5, x6, [x2] | <__tsan_unaligned_read4+308>: add x6, x6, x5 | <__tsan_unaligned_read4+312>: stxr w7, x6, [x2] | <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304> This caused an infinite loop as the `stxr` instruction never completed successfully, so `w7` was always 0. Signed-off-by:
Joey Gouly <joey.gouly@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220405104733.11476-1-joey.gouly@arm.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Yu Zhe authored
Remove unnecessary casts. Signed-off-by:
Yu Zhe <yuzhe@nfschina.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220329102059.268983-1-yuzhe@nfschina.com
-
Oliver Upton authored
It is possible to take a stage-2 permission fault on a page larger than PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM eagerly maps at the largest possible block size. When dirty logging is enabled on a memslot, KVM does *not* eagerly split these 2M stage-2 mappings and instead clears the write bit on the pte. Since dirty logging is always performed at PAGE_SIZE granularity, KVM lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2 fault handler. This operation must be done under the write lock. Since commit f783ef1c ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging"), the stage-2 fault handler conditionally takes the read lock on permission faults with dirty logging enabled. To that end, it is possible to split a 2M block mapping while only holding the read lock. The problem is demonstrated by running kvm_page_table_test with 2M anonymous HugeTLB, which splats like so: WARNING: CPU: 5 PID: 15276 at arch/arm64/kvm/hyp/pgtable.c:153 stage2_map_walk_leaf+0x124/0x158 [...] Call trace: stage2_map_walk_leaf+0x124/0x158 stage2_map_walker+0x5c/0xf0 __kvm_pgtable_walk+0x100/0x1d4 __kvm_pgtable_walk+0x140/0x1d4 __kvm_pgtable_walk+0x140/0x1d4 kvm_pgtable_walk+0xa0/0xf8 kvm_pgtable_stage2_map+0x15c/0x198 user_mem_abort+0x56c/0x838 kvm_handle_guest_abort+0x1fc/0x2a4 handle_exit+0xa4/0x120 kvm_arch_vcpu_ioctl_run+0x200/0x448 kvm_vcpu_ioctl+0x588/0x664 __arm64_sys_ioctl+0x9c/0xd4 invoke_syscall+0x4c/0x144 el0_svc_common+0xc4/0x190 do_el0_svc+0x30/0x8c el0_svc+0x28/0xcc el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x1a4/0x1a8 Fix the issue by only acquiring the read lock if the guest faulted on a PAGE_SIZE granule w/ dirty logging enabled. Add a WARN to catch locking bugs in future changes. Fixes: f783ef1c ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging") Cc: Jing Zhang <jingzhangos@google.com> Signed-off-by:
Oliver Upton <oupton@google.com> Reviewed-by:
Reiji Watanabe <reijiw@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220401194652.950240-1-oupton@google.com
-
Oliver Upton authored
We already sanitize the guest's PSCI version when it is being written by userspace, rejecting unsupported version numbers. Additionally, the 'minor' parameter to kvm_psci_1_x_call() is a constant known at compile time for all callsites. Though it is benign, the additional check against the PSCI kvm_psci_1_x_call() is unnecessary and likely to be missed the next time KVM raises its maximum PSCI version. Drop the check altogether and rely on sanitization when the PSCI version is set by userspace. No functional change intended. Signed-off-by:
Oliver Upton <oupton@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220322183538.2757758-4-oupton@google.com
-
Oliver Upton authored
The SMCCC does not allow the SMC64 calling convention to be used from AArch32. While KVM checks to see if the calling convention is allowed in PSCI_1_0_FN_PSCI_FEATURES, it does not actually prevent calls to unadvertised PSCI v1.0+ functions. Hoist the check to see if the requested function is allowed into kvm_psci_call(), thereby preventing SMC64 calls from AArch32 for all PSCI versions. Fixes: d43583b8 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest") Acked-by:
Will Deacon <will@kernel.org> Reviewed-by:
Reiji Watanabe <reijiw@google.com> Signed-off-by:
Oliver Upton <oupton@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220322183538.2757758-3-oupton@google.com
-
Oliver Upton authored
The only valid calling SMC calling convention from an AArch32 state is SMC32. Disallow any PSCI function that sets the SMC64 function ID bit when called from AArch32 rather than comparing against known SMC64 PSCI functions. Note that without this change KVM advertises the SMC64 flavor of SYSTEM_RESET2 to AArch32 guests. Fixes: d43583b8 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest") Acked-by:
Will Deacon <will@kernel.org> Reviewed-by:
Reiji Watanabe <reijiw@google.com> Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Oliver Upton <oupton@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220322183538.2757758-2-oupton@google.com
-
- Apr 05, 2022
-
-
Pawan Gupta authored
After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU. These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu(). During S3 resume, restore these MSRs for boot CPU when restoring its processor state. Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by:
Neelima Krishnan <neelima.krishnan@intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by:
Neelima Krishnan <neelima.krishnan@intel.com> Acked-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Pawan Gupta authored
The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be. Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend. Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by:
Dave Hansen <dave.hansen@linux.intel.com> Acked-by:
Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Greg Kroah-Hartman authored
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the pseries vas sysfs code to use default_groups field which has been the preferred way since aa30f47c ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Haren Myneni <haren@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20220329142552.558339-1-gregkh@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lv Ruyi authored
All work currently pending will be done first by calling destroy_workqueue, so there is unnecessary to flush it explicitly. Reported-by:
Zeal Robot <zealci@zte.com.cn> Signed-off-by:
Lv Ruyi <lv.ruyi@zte.com.cn> Reviewed-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220401083530.2407703-1-lv.ruyi@zte.com.cn> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by:
Bruno Goncalves <bgoncalv@redhat.com> Reported-by:
Jan Stancek <jstancek@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Peter Gonda authored
Add resched to avoid warning from sev_clflush_pages() with large number of pages. Signed-off-by:
Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20220330164306.2376085-1-pgonda@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Vincent Mailhol authored
The macro __WARN_FLAGS() uses a local variable named "f". This being a common name, there is a risk of shadowing other variables. For example, GCC would yield: | In file included from ./include/linux/bug.h:5, | from ./include/linux/cpumask.h:14, | from ./arch/x86/include/asm/cpumask.h:5, | from ./arch/x86/include/asm/msr.h:11, | from ./arch/x86/include/asm/processor.h:22, | from ./arch/x86/include/asm/timex.h:5, | from ./include/linux/timex.h:65, | from ./include/linux/time32.h:13, | from ./include/linux/time.h:60, | from ./include/linux/stat.h:19, | from ./include/linux/module.h:13, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu': | ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow] | 80 | __auto_type f = BUGFLAG_WARNING|(flags); \ | | ^ | ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS' | 106 | __WARN_FLAGS(BUGFLAG_ONCE | \ | | ^~~~~~~~~~~~ | ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE' | 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L); | | ^~~~~~~~~~~~ | In file included from ./include/linux/rbtree.h:24, | from ./include/linux/mm_types.h:11, | from ./include/linux/buildid.h:5, | from ./include/linux/module.h:14, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here | 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) | | ~~~~~~~~~~~~~~~^ For reference, sparse also warns about it, c.f. [1]. This patch renames the variable from f to __flags (with two underscore prefixes as suggested in the Linux kernel coding style [2]) in order to prevent collisions. [1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/ [2] Linux kernel coding style, section 12) Macros, Enums and RTL, paragraph 5) namespace collisions when defining local variables in macros resembling functions https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl Fixes: bfb1a7c9 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm") Signed-off-by:
Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Acked-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr
-
Kan Liang authored
On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't support it. Update intel_spr_extra_regs[] to support it. Fixes: 61b985e3 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.intel.com
-
Kan Liang authored
The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/ The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters. The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event. With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/ Fixes: 2de71ee1 ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.intel.com
-
Kan Liang authored
The uncore PMU of the Raptor Lake is the same as Alder Lake. Add new PCIIDs of IMC for Raptor Lake. Signed-off-by:
Kan Liang <kan.liang@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1647366360-82824-4-git-send-email-kan.liang@linux.intel.com
-
Kan Liang authored
Raptor Lake is Intel's successor to Alder lake. PPERF and SMI_COUNT MSRs are also supported. Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1647366360-82824-3-git-send-email-kan.liang@linux.intel.com
-
Kan Liang authored
Raptor Lake is Intel's successor to Alder lake. From the perspective of Intel cstate residency counters, there is nothing changed compared with Alder lake. Share adl_cstates with Alder lake. Update the comments for Raptor Lake. Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1647366360-82824-2-git-send-email-kan.liang@linux.intel.com
-
Kan Liang authored
From PMU's perspective, Raptor Lake is the same as the Alder Lake. The only difference is the event list, which will be supported in the perf tool later. Signed-off-by:
Kan Liang <kan.liang@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1647366360-82824-1-git-send-email-kan.liang@linux.intel.com
-
Sebastian Andrzej Siewior authored
The volatile attribute in the inline assembly of arch_raw_cpu_ptr() forces the compiler to always generate the code, even if the compiler can decide upfront that its result is not needed. For instance invoking __intel_pmu_disable_all(false) (like intel_pmu_snapshot_arch_branch_stack() does) leads to loading the address of &cpu_hw_events into the register while compiler knows that it has no need for it. This ends up with code like: | movq $cpu_hw_events, %rax #, tcp_ptr__ | add %gs:this_cpu_off(%rip), %rax # this_cpu_off, tcp_ptr__ | xorl %eax, %eax # tmp93 It also creates additional code within local_lock() with !RT && !LOCKDEP which is not desired. By removing the volatile attribute the compiler can place the function freely and avoid it if it is not needed in the end. By using the function twice the compiler properly caches only the variable offset and always loads the CPU-offset. this_cpu_ptr() also remains properly placed within a preempt_disable() sections because - arch_raw_cpu_ptr() assembly has a memory input ("m" (this_cpu_off)) - prempt_{dis,en}able() fundamentally has a 'barrier()' in it Therefore this_cpu_ptr() is already properly serialized and does not rely on the 'volatile' attribute. Remove volatile from arch_raw_cpu_ptr(). [ bigeasy: Added Linus' explanation why this_cpu_ptr() is not moved out of a preempt_disable() section without the 'volatile' attribute. ] Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220328145810.86783-2-bigeasy@linutronix.de
-
Christophe Leroy authored
When a static call is updated with __static_call_return0() as target, arch_static_call_transform() set it to use an optimised set of instructions which are meant to lay in the same cacheline. But when initialising a static call with DEFINE_STATIC_CALL_RET0(), we get a branch to the real __static_call_return0() function instead of getting the optimised setup: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 4b ff ff f4 b c00d8114 <__static_call_return0> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Add ARCH_DEFINE_STATIC_CALL_RET0_TRAMP() defined by each architecture to setup the optimised configuration, and rework DEFINE_STATIC_CALL_RET0() to call it: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 48 00 00 14 b c00d8134 <__SCT__perf_snapshot_branch_stack+0x14> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/1e0a61a88f52a460f62a58ffc2a5f847d1f7d9d8.1647253456.git.christophe.leroy@csgroup.eu
-
Peter Zijlstra authored
Paolo reported that the instruction sequence that is used to replace: call __static_call_return0 namely: 66 66 48 31 c0 data16 data16 xor %rax,%rax decodes to something else on i386, namely: 66 66 48 data16 dec %ax 31 c0 xor %eax,%eax Which is a nonsensical sequence that happens to have the same outcome. *However* an important distinction is that it consists of 2 instructions which is a problem when the thing needs to be overwriten with a regular call instruction again. As such, replace the instruction with something that decodes the same on both i386 and x86_64. Fixes: 3f2a8fc4 ("static_call/x86: Add __static_call_return0()") Reported-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-ass.net
-
- Apr 04, 2022
-
-
Dave Hansen authored
0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path: https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/ It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline. But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window. There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines: https://lore.kernel.org/all/dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com/ which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems. Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful. Fixes: 6035152d ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()") Reported-by:
kernel test robot <oliver.sang@intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Acked-by:
Nadav Amit <namit@vmware.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-bot2
-
Zhiyuan Dai authored
Fix typo in comment. Signed-off-by:
Zhiyuan Dai <daizhiyuan@phytium.com.cn> Link: https://lore.kernel.org/r/1647831387-3686-1-git-send-email-daizhiyuan@phytium.com.cn Signed-off-by:
Will Deacon <will@kernel.org>
-
Julia Lawall authored
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by:
Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr [will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr] Signed-off-by:
Will Deacon <will@kernel.org>
-
Phil Auld authored
Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() to transfer the toplogy to the various cpu masks. This needs to be done before the call to notify_cpu_starting() which tells the scheduler about each cpu found, otherwise the core scheduling data structures are setup in a way that does not match the actual topology. With smt_mask not setup correctly we bail on `cpumask_weight(smt_mask) == 1` for !leaders in: notify_cpu_starting() cpuhp_invoke_callback_range() sched_cpu_starting() sched_core_cpu_starting() which leads to rq->core not being correctly set for !leader-rq's. Without this change stress-ng (which enables core scheduling in its prctl tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning and then a crash (trimmed for legibility): [ 1853.805168] ------------[ cut here ]------------ [ 1853.809784] task_rq(b)->core != rq->core [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 ... [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 ... [ 1854.231256] Call trace: [ 1854.233689] pick_next_task+0x3dc/0x81c [ 1854.237512] __schedule+0x10c/0x4cc [ 1854.240988] schedule_idle+0x34/0x54 Fixes: 9edeaea1 ("sched: Core-wide rq->lock") Signed-off-by:
Phil Auld <pauld@redhat.com> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20220331153926.25742-1-pauld@redhat.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Catalin Marinas authored
With 64K page configurations, the tags array stored on the stack of the mte_dump_tag_range() function is 2048 bytes, triggering a compiler warning when CONFIG_FRAME_WARN is enabled. Switch to a kmalloc() allocation via mte_allocate_tag_storage(). Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Fixes: 6dd8b1a0 ("arm64: mte: Dump the MTE tags in the core file") Reported-by:
kernel test robot <lkp@intel.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220401151356.1674232-1-catalin.marinas@arm.com Signed-off-by:
Will Deacon <will@kernel.org>
-