  1. Dec 19, 2021
    • KVM: x86: Retry page fault if MMU reload is pending and root has no sp · 18c841e1
      Sean Christopherson authored
      Play nice with a NULL shadow page when checking for an obsolete root in
      the page fault handler by flagging the page fault as stale if there's no
      shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
      Invalidating memslots, which is the only case where _all_ roots need to
      be reloaded, requests all vCPUs to reload their MMUs while holding
      mmu_lock for write.
      
      The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
      backed by a shadow page.  Running with TDP disabled or with nested NPT
      explodes spectacularly due to dereferencing a NULL shadow page pointer.
      
      Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
      root.  Zapping shadow pages in response to guest activity, e.g. when the
      guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
      vCPU isn't using the affected root.  I.e. KVM_REQ_MMU_RELOAD can be seen
      with a completely valid root shadow page.  This is a bit of a moot point
      as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
      be cleaned up in the future.
      
      Fixes: a955cad8 ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
      Cc: stable@vger.kernel.org
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211209060552.2956723-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      18c841e1
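The staleness rule described in commit 18c841e1 can be modelled in a few lines. This is only a sketch in Python, not the kernel's C code: `ShadowPage` and `is_page_fault_stale` are hypothetical stand-ins, and other staleness conditions the real handler may check are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShadowPage:
    """Hypothetical stand-in for the shadow page backing a root."""
    obsolete: bool = False


def is_page_fault_stale(root_sp: Optional[ShadowPage],
                        mmu_reload_pending: bool) -> bool:
    # A root backed by a shadow page is judged by that page alone; a pending
    # KVM_REQ_MMU_RELOAD is deliberately ignored, because the request can be
    # raised for reasons unrelated to the current root.
    if root_sp is not None:
        return root_sp.obsolete
    # "Special" roots (e.g. pae_root) have no shadow page, so the pending
    # reload request is the only signal available. There is nothing to
    # dereference here, which is the crash the commit avoids.
    return mmu_reload_pending
```

With this model, a special root with a pending reload is stale, while a valid shadow-page-backed root is not, even if a reload request happens to be pending.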
    • KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES · 1aa2abb3
      Vitaly Kuznetsov authored
      The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
      not depend on guest visible CPUID entries, even if just to allow
      creating/restoring guest MSRs and CPUIDs in any sequence.
      
      Fixes: 27461da3 ("KVM: x86/pmu: Support full width counting")
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211216165213.338923-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1aa2abb3
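A rough Python model of the behaviour commit 1aa2abb3 asks for: a host-initiated write is validated against what the host supports, and the guest-visible CPUID is not consulted. The function name, the parameters, and the 0/1 return convention are assumptions made for this sketch, as is the rule that guest-initiated writes are refused.

```python
def set_perf_capabilities(data: int, host_initiated: bool,
                          host_supported: int,
                          guest_cpuid_has_pdcm: bool) -> int:
    """Return 0 if the write is accepted, 1 if it is rejected."""
    # Assumption for this sketch: only the host may write the MSR.
    if not host_initiated:
        return 1
    # guest_cpuid_has_pdcm is intentionally unused: userspace may restore
    # MSRs before (or after) it sets the guest CPUID, in any order.
    if data & ~host_supported:
        return 1
    return 0
```

The result for a host-initiated write is the same whether or not the guest CPUID has been populated yet.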
  2. Dec 17, 2021
  3. Dec 16, 2021
  4. Dec 14, 2021
  5. Dec 12, 2021
  6. Dec 10, 2021
    • s390: enable switchdev support in defconfig · 5dcf0c30
      Niklas Schnelle authored
      The HiperSockets Converged Interface (HSCI) introduced with commit
      4e20e73e ("s390/qeth: Switchdev event handler") requires
      CONFIG_SWITCHDEV=y to be usable. Similarly when using Linux controlled
      SR-IOV capable PF devices with the mlx5_core driver CONFIG_SWITCHDEV=y
      as well as CONFIG_MLX5_ESWITCH=y are necessary to actually get link on
      the created VFs. So let's add these to the defconfig to make both types
      of devices usable. Note also that these options are already enabled in
      most current distribution kernels.
      
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      5dcf0c30
    • s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add() · abf0e8e4
      Alexander Egorenkov authored
      Starting with gcc 11.3, the C compiler will generate PLT-relative function
      calls even if they are local and do not require it. Later on during linking,
      the linker will replace all PLT-relative calls to local functions with
      PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
      not being linked as a regular executable or shared library would have been,
      and therefore, all PLT-relative addresses remain in the generated purgatory
      object code unresolved. This leads to the situation where the purgatory
      code is being executed during kdump with all PLT-relative addresses
      unresolved. And this results in endless loops within the purgatory code.
      
      Furthermore, the clang C compiler has always behaved as described above,
      and this commit should also fix kdump for kernels built with clang.
      
      Because the purgatory code is no regular executable or shared library,
      contains only calls to local functions and has no PLT, all R_390_PLT32DBL
      relocation entries can be resolved just like a R_390_PC32DBL one.
      
      * https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html#AEN1699
      
      
      
      Relocation entries of purgatory code generated with gcc 11.3
      ------------------------------------------------------------
      
      $ readelf -r linux/arch/s390/purgatory/purgatory.o
      
      Relocation section '.rela.text' at offset 0x370 contains 5 entries:
        Offset          Info           Type           Sym. Value    Sym. Name + Addend
      00000000005c  000c00000013 R_390_PC32DBL     0000000000000000 purgatory_sha_regions + 2
      00000000007a  000d00000014 R_390_PLT32DBL    0000000000000000 sha256_update + 2
      00000000008c  000e00000014 R_390_PLT32DBL    0000000000000000 sha256_final + 2
      000000000092  000800000013 R_390_PC32DBL     0000000000000000 .LC0 + 2
      0000000000a0  000f00000014 R_390_PLT32DBL    0000000000000000 memcmp + 2
      
      Relocation entries of purgatory code generated with gcc 11.2
      ------------------------------------------------------------
      
      $ readelf -r linux/arch/s390/purgatory/purgatory.o
      
      Relocation section '.rela.text' at offset 0x368 contains 5 entries:
        Offset          Info           Type           Sym. Value    Sym. Name + Addend
      00000000005c  000c00000013 R_390_PC32DBL     0000000000000000 purgatory_sha_regions + 2
      00000000007a  000d00000013 R_390_PC32DBL     0000000000000000 sha256_update + 2
      00000000008c  000e00000013 R_390_PC32DBL     0000000000000000 sha256_final + 2
      000000000092  000800000013 R_390_PC32DBL     0000000000000000 .LC0 + 2
      0000000000a0  000f00000013 R_390_PC32DBL     0000000000000000 memcmp + 2
      
      Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reported-by: Tao Liu <ltao@redhat.com>
      Suggested-by: Philipp Rudo <prudo@redhat.com>
      Reviewed-by: Philipp Rudo <prudo@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20211209073817.82196-1-egorenar@linux.ibm.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      abf0e8e4
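To make the relocation handling in commit abf0e8e4 concrete, here is a small Python sketch of resolving R_390_PLT32DBL exactly like R_390_PC32DBL. The type numbers 19 (0x13) and 20 (0x14) match the low 32 bits of the Info column in the readelf output in that commit message; `split_r_info` and `apply_rela` are hypothetical helper names, and the formula (S + A - P) >> 1 is the PC-relative, halfword-scaled computation as I understand it from the s390x ELF ABI.

```python
import struct
from typing import Tuple

R_390_PC32DBL = 19    # 0x13
R_390_PLT32DBL = 20   # 0x14


def split_r_info(r_info: int) -> Tuple[int, int]:
    """ELF64 r_info: symbol index in the high 32 bits, type in the low 32."""
    return r_info >> 32, r_info & 0xFFFFFFFF


def apply_rela(buf: bytearray, offset: int, r_type: int,
               sym_value: int, addend: int, place: int) -> None:
    """Patch the 32-bit field at `offset`; `place` is its run-time address."""
    if r_type in (R_390_PC32DBL, R_390_PLT32DBL):
        # With no PLT and only local calls, the PLT-relative relocation
        # degenerates to the PC-relative one: (S + A - P) >> 1, stored as a
        # big-endian signed 32-bit halfword offset.
        value = (sym_value + addend - place) >> 1
        struct.pack_into(">i", buf, offset, value)
    else:
        raise ValueError(f"unhandled relocation type {r_type}")
```

For example, `split_r_info(0x000d00000014)` yields symbol index 13 and type 20, i.e. the `sha256_update` entry from the gcc 11.3 listing.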
    • s390/ftrace: remove preempt_disable()/preempt_enable() pair · ac8fc6af
      Jerome Marchand authored
      It looks like commit ce5e4803 ("ftrace: disable preemption
      when recursion locked") missed a spot in kprobe_ftrace_handler() in
      arch/s390/kernel/ftrace.c.
      Remove the superfluous preempt_disable/enable_notrace() there too.
      
      Fixes: ce5e4803 ("ftrace: disable preemption when recursion locked")
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Link: https://lore.kernel.org/r/20211208151503.1510381-1-jmarchan@redhat.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      ac8fc6af
    • s390/kexec_file: fix error handling when applying relocations · 41967a37
      Philipp Rudo authored
      arch_kexec_apply_relocations_add currently ignores all errors returned
      by arch_kexec_do_relocs. This means that every unknown relocation is
      silently skipped causing unpredictable behavior while the relocated code
      runs. Fix this by checking for errors and failing kexec_file_load if an
      unknown relocation type is encountered.
      
      The problem was found after gcc changed its behavior and used
      R_390_PLT32DBL relocations for brasl instructions, relying on ld to
      resolve the relocations in the final link in case direct calls are
      possible. As the purgatory code is only linked partially (option -r),
      ld didn't resolve the relocations, leaving them for arch_kexec_do_relocs.
      But arch_kexec_do_relocs doesn't know how to handle R_390_PLT32DBL
      relocations, so they were silently skipped. This ultimately caused an
      endless loop in the purgatory as the brasl instructions kept branching
      to themselves.
      
      Fixes: 71406883 ("s390/kexec_file: Add kexec_file_load system call")
      Reported-by: Tao Liu <ltao@redhat.com>
      Signed-off-by: Philipp Rudo <prudo@redhat.com>
      Link: https://lore.kernel.org/r/20211208130741.5821-3-prudo@redhat.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      41967a37
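The two effects described in commit 41967a37 can be sketched in Python. All names here are hypothetical, the set of "known" types is deliberately reduced to one entry to mimic the pre-fix state, and the choice of -ENOEXEC as the error code is an assumption for illustration.

```python
import errno

R_390_PC32DBL = 19
R_390_PLT32DBL = 20

# Assumption: pretend only PC32DBL is handled, as before PLT32DBL support.
KNOWN_TYPES = {R_390_PC32DBL}


def do_reloc(r_type: int) -> int:
    """Return 0 if the type is handled, non-zero otherwise."""
    return 0 if r_type in KNOWN_TYPES else 1


def apply_relocations(r_types) -> int:
    for r_type in r_types:
        # The fix: stop and report instead of silently skipping the entry.
        if do_reloc(r_type):
            return -errno.ENOEXEC
    return 0


def brasl_target(insn_addr: int, imm32: int) -> int:
    """A relative-long branch goes to insn_addr + 2 * (signed halfword offset)."""
    return insn_addr + 2 * imm32
```

An unrelocated brasl keeps an offset field of zero, so `brasl_target(addr, 0)` is `addr` itself, which is the endless loop the commit message describes.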
    • s390/kexec_file: print some more error messages · edce10ee
      Philipp Rudo authored
      
      Be kind and give some more information on what went wrong.
      
      Signed-off-by: Philipp Rudo <prudo@redhat.com>
      Link: https://lore.kernel.org/r/20211208130741.5821-2-prudo@redhat.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      edce10ee
    • KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exit · d07898ea
      Sean Christopherson authored
      Replace a WARN with a comment to call out that userspace can modify RCX
      during an exit to userspace to handle string I/O.  KVM doesn't actually
      support changing the rep count during an exit, i.e. the scenario can be
      ignored, but the WARN needs to go as it's trivial to trigger from
      userspace.
      
      Cc: stable@vger.kernel.org
      Fixes: 3b27de27 ("KVM: x86: split the two parts of emulator_pio_in")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211025201311.1881846-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d07898ea
    • KVM: X86: Raise #GP when clearing CR0_PG in 64 bit mode · 777ab82d
      Lai Jiangshan authored
      
      In the SDM:
      If the logical processor is in 64-bit mode or if CR4.PCIDE = 1, an
      attempt to clear CR0.PG causes a general-protection exception (#GP).
      Software should transition to compatibility mode and clear CR4.PCIDE
      before attempting to disable paging.
      
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211207095230.53437-1-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      777ab82d
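The SDM rule quoted in commit 777ab82d reduces to a short predicate. The sketch below is a Python model, not KVM's code: the function name and parameters are hypothetical, while the bit positions (CR0.PG is bit 31, CR4.PCIDE is bit 17) are the architectural ones.

```python
X86_CR0_PG = 1 << 31
X86_CR4_PCIDE = 1 << 17


def clearing_pg_raises_gp(old_cr0: int, new_cr0: int,
                          in_64bit_mode: bool, cr4: int) -> bool:
    """True if this CR0 write should be answered with #GP."""
    # Only an attempt to clear CR0.PG (set before, clear after) is relevant.
    clears_pg = bool(old_cr0 & X86_CR0_PG) and not (new_cr0 & X86_CR0_PG)
    # Paging cannot be disabled from 64-bit mode or while CR4.PCIDE = 1.
    return clears_pg and (in_64bit_mode or bool(cr4 & X86_CR4_PCIDE))
```

Software that first drops to compatibility mode and clears CR4.PCIDE gets `False` here, matching the sequence the SDM recommends.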
    • KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req · 3244867a
      Sean Christopherson authored
      Do not bail early if there are no bits set in the sparse banks for a
      non-sparse, a.k.a. "all CPUs", IPI request.  Per the Hyper-V spec, it is
      legal to have a variable length of '0', e.g. VP_SET's BankContents in
      this case, if the request can be serviced without the extra info.
      
        It is possible that for a given invocation of a hypercall that does
        accept variable sized input headers that all the header input fits
        entirely within the fixed size header. In such cases the variable sized
        input header is zero-sized and the corresponding bits in the hypercall
        input should be set to zero.
      
      Bailing early results in KVM failing to send IPIs to all CPUs as expected
      by the guest.
      
      Fixes: 214ff83d ("KVM: x86: hyperv: implement PV IPI send hypercalls")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211207220926.718794-2-seanjc@google.c...
      3244867a
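The ordering issue fixed by commit 3244867a can be shown with a simplified Python model. Here the sparse VP set is represented as a dict from bank index to a 64-bit mask (a simplification of the real packed layout), and `ipi_targets` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def ipi_targets(all_cpus: bool, banks: Dict[int, int],
                vp_indices: Iterable[int]) -> List[int]:
    # Check the "all CPUs" format first: a zero-length variable header is
    # legal here, so an empty bank set must not cause an early bail-out.
    if all_cpus:
        return sorted(vp_indices)
    # Only a sparse request with no banks has nothing to do.
    if not banks:
        return []
    # Each bank covers 64 virtual processors.
    return sorted(vp for vp in vp_indices
                  if (banks.get(vp // 64, 0) >> (vp % 64)) & 1)
```

Swapping the first two checks reproduces the bug: an "all CPUs" request with no banks would return an empty target list.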
    • KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall · 1ebfaa11
      Vitaly Kuznetsov authored
      Prior to commit 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use
      tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
      KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
      and KVM_REQ_TLB_FLUSH is/was defined as:
      
       (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
      
      so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
      "This call guarantees that by the time control returns back to the
      caller, the observable effects of all flushes on the specified virtual
      processors have occurred." and without KVM_REQUEST_WAIT there's a small
      chance that the vCPU making the TLB flush will resume running before
      all IPIs get delivered to other vCPUs and a stale mapping can get read
      there.
      
      Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
      kvm_hv_flush_tlb() is the sole caller which uses it for
      kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
      KVM_REQUEST_WAIT makes a difference.
      
      Cc: stable@kernel.org
      Fixes: 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ebfaa11
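How the wait flag was lost in commit 1ebfaa11 is easier to see with the request encoding written out. In this Python sketch the bit positions, the arch base, and the request number 27 are assumptions chosen for illustration; only the composition of KVM_REQ_TLB_FLUSH as (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) comes from the commit message.

```python
# Assumed layout: low 8 bits hold the request number, flags sit above them.
KVM_REQUEST_MASK = 0xFF
KVM_REQUEST_NO_WAKEUP = 1 << 8
KVM_REQUEST_WAIT = 1 << 9
KVM_REQUEST_ARCH_BASE = 8          # assumption


def arch_req_flags(nr: int, flags: int) -> int:
    return (nr + KVM_REQUEST_ARCH_BASE) | flags


KVM_REQ_TLB_FLUSH = 0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP
# Before the fix: the guest-flush request carried no WAIT flag.
TLB_FLUSH_GUEST_OLD = arch_req_flags(27, KVM_REQUEST_NO_WAKEUP)
# After the fix: WAIT is part of the request definition itself.
TLB_FLUSH_GUEST_NEW = arch_req_flags(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)


def must_wait(req: int) -> bool:
    """Whether the sender blocks until the target vCPUs acknowledge the IPI."""
    return bool(req & KVM_REQUEST_WAIT)


def request_nr(req: int) -> int:
    return req & KVM_REQUEST_MASK
```

Both guest-flush variants name the same request number; only the new one makes the caller wait, which is the guarantee the TLFS text quoted in the commit message demands.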
  7. Dec 09, 2021
    • MIPS: Only define pci_remap_iospace() for Ralink · 09d97da6
      Tiezhu Yang authored
      After commit 9f76779f ("MIPS: implement architecture-specific
      'pci_remap_iospace()'"), there exists the following warning on the
      Loongson64 platform:
      
          loongson-pci 1a000000.pci:       IO 0x0018020000..0x001803ffff -> 0x0000020000
          loongson-pci 1a000000.pci:      MEM 0x0040000000..0x007fffffff -> 0x0040000000
          ------------[ cut here ]------------
          WARNING: CPU: 2 PID: 1 at arch/mips/pci/pci-generic.c:55 pci_remap_iospace+0x84/0x90
          resource start address is not zero
          ...
          Call Trace:
          [<ffffffff8020dc78>] show_stack+0x40/0x120
          [<ffffffff80cf4a0c>] dump_stack_lvl+0x58/0x74
          [<ffffffff8023a0b0>] __warn+0xe0/0x110
          [<ffffffff80cee02c>] warn_slowpath_fmt+0xa4/0xd0
          [<ffffffff80cecf24>] pci_remap_iospace+0x84/0x90
          [<ffffffff807f9864>] devm_pci_remap_iospace+0x5c/0xb8
          [<ffffffff808121b0>] devm_of_pci_bridge_init+0x178/0x1f8
          [<ffffffff807f4000>] devm_pci_alloc_host_bridge+0x78/0x98
          [<ffffffff80819454>] loongson_pci_probe+0x34/0x160
          [<ffffffff809203cc>] platform_probe+0x6c/0xe0
          [<ffffffff8091d5d4>] really_probe+0xbc/0x340
          [<ffffffff8091d8f0>] __driver_probe_device+0x98/0x110
          [<ffffffff8091d9b8>] driver_probe_device+0x50/0x118
          [<ffffffff8091dea0>] __driver_attach+0x80/0x118
          [<ffffffff8091b280>] bus_for_each_dev+0x80/0xc8
          [<ffffffff8091c6d8>] bus_add_driver+0x130/0x210
          [<ffffffff8091ead4>] driver_register+0x8c/0x150
          [<ffffffff80200a8c>] do_one_initcall+0x54/0x288
          [<ffffffff811a5320>] kernel_init_freeable+0x27c/0x2e4
          [<ffffffff80cfc380>] kernel_init+0x2c/0x134
          [<ffffffff80205a2c>] ret_from_kernel_thread+0x14/0x1c
          ---[ end trace e4a0efe10aa5cce6 ]---
          loongson-pci 1a000000.pci: error -19: failed to map resource [io  0x20000-0x3ffff]
      
      We can see that the resource start address is 0x0000020000, because
      the ISA Bridge used the zero address which is defined in the dts file
      arch/mips/boot/dts/loongson/ls7a-pch.dtsi:
      
          ISA Bridge: /bus@10000000/isa@18000000
          IO 0x0000000018000000..0x000000001801ffff  ->  0x0000000000000000
      
      Based on the above analysis, the architecture-specific pci_remap_iospace()
      is not suitable for Loongson64. Given the background of that commit, we
      should define pci_remap_iospace() only for Ralink on MIPS.
      
      Fixes: 9f76779f ("MIPS: implement architecture-specific 'pci_remap_iospace()'")
      Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
      Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      09d97da6
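The failure in commit 09d97da6 follows from a single check, which can be modelled as below. This Python sketch infers the behaviour from the log in that commit message (the "resource start address is not zero" warning followed by error -19); the function signature is a simplification and the actual remapping step is omitted.

```python
ENODEV = 19  # matches the "error -19" in the log


def pci_remap_iospace(res_start: int) -> int:
    """Model of the MIPS-specific variant: only a zero-based I/O resource is accepted."""
    if res_start != 0:
        # The real code also emits the warning seen in the trace.
        return -ENODEV
    return 0  # remapping itself is not modelled
```

On Loongson64 the ISA bridge already occupies I/O address zero, so the PCI I/O window starts at 0x20000 and `pci_remap_iospace(0x20000)` yields -19, as in the log.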
  8. Dec 08, 2021
  9. Dec 07, 2021
  10. Dec 05, 2021
  11. Dec 04, 2021
    • parisc: Mark cr16 CPU clocksource unstable on all SMP machines · afdb4a5b
      Helge Deller authored
      In commit c8c37359 ("parisc: Enhance detection of synchronous cr16
      clocksources") I assumed that CPUs on the same physical core are synchronous.
      While booting up the kernel on two different C8000 machines, one with a
      dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be
      wrong. The symptom was that I saw a jump in the internal clocks printed to the
      syslog and strange overall behaviour.  On machines which have 4 cores (2
      dual-cores) the problem isn't visible, because the current logic already marked
      the cr16 clocksource unstable in this case.
      
      This patch now marks the cr16 interval timers unstable if we have more than one
      CPU in the system, and it fixes this issue.
      
      Fixes: c8c37359 ("parisc: Enhance detection of synchronous cr16 clocksources")
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v5.15+
      afdb4a5b
    • parisc: Fix "make install" on newer debian releases · 0f9fee4c
      Helge Deller authored
      
      On newer debian releases the debian-provided "installkernel" script is
      installed in /usr/sbin. Fix the kernel install.sh script to look for the
      script in this directory as well.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v3.13+
      0f9fee4c
  12. Dec 03, 2021