  1. Apr 12, 2022
    • stat: fix inconsistency between struct stat and struct compat_stat · 932aba1e
      Mikulas Patocka authored
      struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
      st_dev and st_rdev; struct compat_stat (defined in
      arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
      a 16-bit padding.
      
      This patch fixes struct compat_stat to match struct stat.
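
      For illustration, a minimal sketch of the two layouts (the struct names and
      standalone types below are stand-ins for this note only; unrelated members
      are elided):

        #include <stdint.h>

        /* arch/x86/include/uapi/asm/stat.h: 32-bit device numbers */
        struct stat_sketch {
                uint32_t st_dev;
                /* ... */
                uint32_t st_rdev;
                /* ... */
        };

        /* old arch/x86/include/asm/compat.h: 16-bit device numbers plus padding */
        struct compat_stat_old_sketch {
                uint16_t st_dev;
                uint16_t __pad1;
                /* ... */
                uint16_t st_rdev;
                uint16_t __pad2;
                /* ... */
        };

        /* after this patch: compat_stat carries full 32-bit st_dev/st_rdev */
        struct compat_stat_fixed_sketch {
                uint32_t st_dev;
                /* ... */
                uint32_t st_rdev;
                /* ... */
        };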
      
      [ Historical note: the old x86 'struct stat' did have that 16-bit field
        that the compat layer had kept around, but it was changed back in 2003
        by "struct stat - support larger dev_t":
      
          https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d
      
      
      
        and back in those days, the x86_64 port was still new, and separate
        from the i386 code, and had already picked up the old version with a
        16-bit st_dev field ]
      
      Note that we can't change compat_dev_t because it is used by
      compat_loop_info.
      
      Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
      old_valid_dev to test if the value fits into them.  This fixes
      -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
      number 259.
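
      As a rough illustration of where the -EOVERFLOW came from (the macros below
      are simplified stand-ins for the kernel's kdev_t.h packing; old_valid_dev()
      roughly mirrors the real helper):

        #include <stdbool.h>
        #include <stdio.h>

        /* simplified stand-in for the kernel's dev_t packing: 12-bit major, 20-bit minor */
        #define MINORBITS      20
        #define MKDEV(ma, mi)  (((ma) << MINORBITS) | (mi))
        #define MAJOR(dev)     ((unsigned int)(dev) >> MINORBITS)
        #define MINOR(dev)     ((unsigned int)(dev) & ((1U << MINORBITS) - 1))

        /* the old 16-bit encoding only has room for an 8-bit major and minor */
        static bool old_valid_dev(unsigned int dev)
        {
                return MAJOR(dev) < 256 && MINOR(dev) < 256;
        }

        int main(void)
        {
                unsigned int nvme = MKDEV(259, 0);   /* NVMe namespaces use major 259 */

                /* prints 0: major 259 does not pass old_valid_dev(), so stat()
                   on such a filesystem returned -EOVERFLOW */
                printf("%d\n", old_valid_dev(nvme));
                return 0;
        }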
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Apr 11, 2022
  3. Apr 08, 2022
  4. Apr 07, 2022
  5. Apr 06, 2022
    • Revert "powerpc: Set max_mapnr correctly" · 1ff5c8e8
      Kefeng Wang authored
      This reverts commit 602946ec.
      
      If CONFIG_HIGHMEM is enabled, setting max_mapnr to max_low_pfn means no
      highmem pages are ever freed into the page allocator, because the loop in
      mem_init() never runs:
      
        for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) {
              ...
              free_highmem_page();
        }
      
      Now that virt_addr_valid() has been fixed in the previous commit, we can
      revert the change to max_mapnr.
      
      Fixes: 602946ec ("powerpc: Set max_mapnr correctly")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      [mpe: Update change log to reflect series reordering]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220406145802.538416-2-mpe@ellerman.id.au
    • powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit · ffa0b64e
      Kefeng Wang authored
      mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
      
      Because of the way __pa() works we have:
        __pa(0x8000000000000000) == 0, and therefore
        virt_to_pfn(0x8000000000000000) == 0, and therefore
        virt_addr_valid(0x8000000000000000) == true
      
      Which is wrong, virt_addr_valid() should be false for vmalloc space.
      In fact all vmalloc addresses that alias with a valid PFN will return
      true from virt_addr_valid(). That can cause bugs with hardened usercopy
      as described below by Kefeng Wang:
      
        When running ethtool eth0 on 64-bit Book3E, a BUG occurred:
      
          usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)!
          kernel BUG at mm/usercopy.c:99
          ...
          usercopy_abort+0x64/0xa0 (unreliable)
          __check_heap_object+0x168/0x190
          __check_object_size+0x1a0/0x200
          dev_ethtool+0x2494/0x2b20
          dev_ioctl+0x5d0/0x770
          sock_do_ioctl+0xf0/0x1d0
          sock_ioctl+0x3ec/0x5a0
          __se_sys_ioctl+0xf0/0x160
          system_call_exception+0xfc/0x1f0
          system_call_common+0xf8/0x200
      
        The code in question is shown below:
      
          data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
          copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
      
        The data is allocated by vmalloc(), but virt_addr_valid(ptr) returns true
        on 64-bit Book3E, which leads to the panic.
      
        As commit 4dd7554a ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
        and __pa addresses") does, make virt_addr_valid() check that the virtual
        address is above PAGE_OFFSET for 64-bit, and also add an upper-limit check
        to make sure the address is below high_memory.
      
        Meanwhile, on 32-bit, PAGE_OFFSET is the virtual address of the start
        of lowmem and high_memory is the upper bound of the low virtual addresses,
        so the same check is suitable for 32-bit as well; this also fixes the issue
        mentioned in commit 602946ec ("powerpc: Set max_mapnr correctly").
      
      On 32-bit there is a similar problem with high memory, which was fixed in
      commit 602946ec ("powerpc: Set max_mapnr correctly"), but that
      commit breaks highmem and needs to be reverted.
      
      We can't easily fix __pa(), as we have code that relies on its current
      behaviour. So for now, add extra checks to virt_addr_valid().
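
      A sketch of what the strengthened check looks like (simplified; the real
      macro lives in arch/powerpc/include/asm/page.h and relies on the usual
      PAGE_OFFSET, high_memory, virt_to_pfn() and pfn_valid() definitions):

        #define virt_addr_valid(vaddr) ({                                       \
                unsigned long _addr = (unsigned long)(vaddr);                   \
                /* reject anything below the linear map or above lowmem */      \
                _addr >= PAGE_OFFSET &&                                         \
                _addr < (unsigned long)high_memory &&                           \
                pfn_valid(virt_to_pfn(_addr));                                  \
        })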
      
      For 64-bit Book3S the extra checks are not necessary, the combination of
      virt_to_pfn() and pfn_valid() should yield the correct result, but they
      are harmless.
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      [mpe: Add additional change log detail]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au
    • KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs · 26bf74bd
      Reiji Watanabe authored
      KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs
      for a guest.  At vCPU reset, vcpu_allowed_register_width() checks
      if the vcpu's register width is consistent with all other vCPUs'.
      Since the check is done even against vCPUs that have not been initialized
      yet (KVM_ARM_VCPU_INIT has not been done), the uninitialized vCPUs are
      erroneously treated as 64-bit vCPUs, which causes the function to
      incorrectly detect a mixed-width VM.
      
      Introduce the KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED
      bits for kvm->arch.flags.  The EL1_32BIT bit indicates whether the guest
      needs to be configured with all 32-bit or all 64-bit vCPUs, and the
      REG_WIDTH_CONFIGURED bit indicates whether the EL1_32BIT bit is valid
      (already set up). Both bits are set at the first KVM_ARM_VCPU_INIT for the
      guest, based on the vCPU's KVM_ARM_VCPU_EL1_32BIT configuration.
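
      A hypothetical sketch of how the two flags can be used at vCPU init time
      (the helper name and error handling are illustrative; only the flag names
      come from this change):

        /* Record or validate the VM-wide register width on KVM_ARM_VCPU_INIT. */
        static int kvm_set_vm_width_sketch(struct kvm *kvm, bool wants_32bit)
        {
                if (!test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
                        /* first vCPU initialized for this VM: record its width */
                        if (wants_32bit)
                                set_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
                        set_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags);
                        return 0;
                }

                /* later vCPUs must request the same width as the recorded one */
                if (wants_32bit != !!test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags))
                        return -EINVAL;

                return 0;
        }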
      
      Check vcpu's register width against those new ...
    • arm64: alternatives: mark patch_alternative() as `noinstr` · a2c0b0fb
      Joey Gouly authored
      
      The alternatives code must be `noinstr` such that it does not patch itself,
      as the cache invalidation is only performed after all the alternatives have
      been applied.
      
      Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update()
      and get_alt_insn() with `__always_inline` since they are both only called
      through patch_alternative().
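
      The shape of the change, on sketched prototypes (the argument lists are
      abbreviated sketches, not necessarily the exact prototypes in
      arch/arm64/kernel/alternative.c):

        /* noinstr keeps instrumentation hooks out, so nothing here can call
           back into code that is still in the middle of being patched */
        static noinstr void patch_alternative(struct alt_instr *alt,
                                              __le32 *origptr, __le32 *updptr,
                                              int nr_inst);

        /* only ever called from patch_alternative(), so force-inline them into it */
        static __always_inline bool branch_insn_requires_update(struct alt_instr *alt,
                                                                unsigned long pc);
        static __always_inline u32 get_alt_insn(struct alt_instr *alt,
                                                __le32 *insnptr, __le32 *altinsnptr);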
      
      Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused
      a boot hang:
      [    0.241121] CPU: All CPU(s) started at EL2
      
      The alternatives code was patching the atomics in __tsan_read4() from LL/SC
      atomics to LSE atomics.
      
      The following fragment is using LL/SC atomics in the .text section:
        | <__tsan_unaligned_read4+304>:     ldxr    x6, [x2]
        | <__tsan_unaligned_read4+308>:     add     x6, x6, x5
        | <__tsan_unaligned_read4+312>:     stxr    w7, x6, [x2]
        | <__tsan_unaligned_read4+316>:     cbnz    w7, <__tsan_unaligned_read4+304>
      
      This LL/SC atomic sequence was to be replaced with LSE atomics. However since
      the alternatives code was instrumentable, __tsan_read4() was being called after
      only the first instruction was replaced, which led to the following code in memory:
        | <__tsan_unaligned_read4+304>:     ldadd   x5, x6, [x2]
        | <__tsan_unaligned_read4+308>:     add     x6, x6, x5
        | <__tsan_unaligned_read4+312>:     stxr    w7, x6, [x2]
        | <__tsan_unaligned_read4+316>:     cbnz    w7, <__tsan_unaligned_read4+304>
      
      This caused an infinite loop: the `stxr` could never complete successfully
      because its paired exclusive load had been replaced, so `w7` was never 0 and
      the `cbnz` always branched back.
      
      Signed-off-by: Joey Gouly <joey.gouly@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220405104733.11476-1-joey.gouly@arm.com
      
      
      Signed-off-by: Will Deacon <will@kernel.org>
    • KVM: arm64: Don't split hugepages outside of MMU write lock · f587661f
      Oliver Upton authored
      It is possible to take a stage-2 permission fault on a page larger than
      PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM
      eagerly maps at the largest possible block size. When dirty logging is
      enabled on a memslot, KVM does *not* eagerly split these 2M stage-2
      mappings and instead clears the write bit on the pte.
      
      Since dirty logging is always performed at PAGE_SIZE granularity, KVM
      lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2
      fault handler. This operation must be done under the write lock. Since
      commit f783ef1c ("KVM: arm64: Add fast path to handle permission
      relaxation during dirty logging"), the stage-2 fault handler
      conditionally takes the read lock on permission faults with dirty
      logging enabled. As a result, it is possible to split a 2M block mapping
      while only holding the read lock.
      
      The problem is demonstrated by running kvm_page_table_test with 2M
      anonymous HugeTLB, which splats like so:
      
        WARNING: CPU: 5 PID: 15276 at arch/arm64/kvm/hyp/pgtable.c:153 stage2_map_walk_leaf+0x124/0x158
      
        [...]
      
        Call trace:
        stage2_map_walk_leaf+0x124/0x158
        stage2_map_walker+0x5c/0xf0
        __kvm_pgtable_walk+0x100/0x1d4
        __kvm_pgtable_walk+0x140/0x1d4
        __kvm_pgtable_walk+0x140/0x1d4
        kvm_pgtable_walk+0xa0/0xf8
        kvm_pgtable_stage2_map+0x15c/0x198
        user_mem_abort+0x56c/0x838
        kvm_handle_guest_abort+0x1fc/0x2a4
        handle_exit+0xa4/0x120
        kvm_arch_vcpu_ioctl_run+0x200/0x448
        kvm_vcpu_ioctl+0x588/0x664
        __arm64_sys_ioctl+0x9c/0xd4
        invoke_syscall+0x4c/0x144
        el0_svc_common+0xc4/0x190
        do_el0_svc+0x30/0x8c
        el0_svc+0x28/0xcc
        el0t_64_sync_handler+0x84/0xe4
        el0t_64_sync+0x1a4/0x1a8
      
      Fix the issue by only acquiring the read lock if the guest faulted on a
      PAGE_SIZE granule with dirty logging enabled. Add a WARN to catch locking
      bugs in future changes.
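
      A sketch of the resulting locking policy in the fault handler (variable
      names are illustrative, not the exact user_mem_abort() code):

        bool use_read_lock = logging_active && fault_is_perm &&
                             fault_granule == PAGE_SIZE;

        if (use_read_lock)
                read_lock(&kvm->mmu_lock);
        else
                write_lock(&kvm->mmu_lock);

        /* ... resolve the fault: any block split now happens with the write
           lock held, and the page-table code WARNs if that is violated ... */

        if (use_read_lock)
                read_unlock(&kvm->mmu_lock);
        else
                write_unlock(&kvm->mmu_lock);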
      
      Fixes: f783ef1c ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging")
      Cc: Jing Zhang <jingzhangos@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220401194652.950240-1-oupton@google.com
    • KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler · 73b725c7
      Oliver Upton authored
      
      We already sanitize the guest's PSCI version when it is being written by
      userspace, rejecting unsupported version numbers. Additionally, the
      'minor' parameter to kvm_psci_1_x_call() is a constant known at compile
      time for all callsites.
      
      Though it is benign, the additional check against the PSCI version in
      kvm_psci_1_x_call() is unnecessary and likely to be missed the next
      time KVM raises its maximum PSCI version. Drop the check altogether and
      rely on sanitization when the PSCI version is set by userspace.
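
      A sketch of that sanitization, as done when userspace writes the PSCI
      firmware register (the surrounding handler is elided; KVM_ARM_PSCI_* are
      the existing version constants):

        switch (val) {
        case KVM_ARM_PSCI_0_1:
        case KVM_ARM_PSCI_0_2:
        case KVM_ARM_PSCI_1_0:
        case KVM_ARM_PSCI_1_1:
                /* only known versions are ever stored, so later call handling
                   can trust the recorded PSCI version */
                vcpu->kvm->arch.psci_version = val;
                return 0;
        default:
                return -EINVAL;
        }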
      
      No functional change intended.
      
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220322183538.2757758-4-oupton@google.com
    • KVM: arm64: Actually prevent SMC64 SYSTEM_RESET2 from AArch32 · 827c2ab3
      Oliver Upton authored
      The SMCCC does not allow the SMC64 calling convention to be used from
      AArch32. While KVM checks to see if the calling convention is allowed in
      PSCI_1_0_FN_PSCI_FEATURES, it does not actually prevent calls to
      unadvertised PSCI v1.0+ functions.
      
      Hoist the check for whether the requested function is allowed into
      kvm_psci_call(), thereby preventing SMC64 calls from AArch32 for all
      PSCI versions.
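
      A sketch of the hoisted check in the dispatcher (the allowed-function
      helper is illustrative; smccc_get_function()/smccc_set_retval() are the
      existing KVM helpers):

        int kvm_psci_call(struct kvm_vcpu *vcpu)
        {
                u32 fn = smccc_get_function(vcpu);

                /* one gate for every PSCI version: reject disallowed function
                   IDs before any per-version handler can act on them */
                if (!psci_fn_allowed(vcpu, fn)) {       /* illustrative helper */
                        smccc_set_retval(vcpu, PSCI_RET_NOT_SUPPORTED, 0, 0, 0);
                        return 1;
                }

                /* ... dispatch to the PSCI v0.1 / v0.2 / v1.x handlers ... */
                return 1;
        }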
      
      Fixes: d43583b8 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest")
      Acked-by: Will Deacon <will@kernel.org>
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220322183538.2757758-3-oupton@google.com
    • KVM: arm64: Generally disallow SMC64 for AArch32 guests · 2da0aebc
      Oliver Upton authored
      The only valid SMC calling convention from an AArch32 state is SMC32.
      Disallow any PSCI function that sets the SMC64 function ID bit when
      called from AArch32, rather than comparing against known SMC64 PSCI
      functions.
      
      Note that without this change KVM advertises the SMC64 flavor of
      SYSTEM_RESET2 to AArch32 guests.
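
      A sketch of the generic test (PSCI_0_2_64BIT is the SMC64 bit in the SMCCC
      function ID; the helper name is illustrative):

        /* any function ID with the 64-bit convention bit set is an SMC64 call,
           which an AArch32 caller is never allowed to make */
        static bool psci_fn_width_allowed(struct kvm_vcpu *vcpu, u32 fn)
        {
                if ((fn & PSCI_0_2_64BIT) && vcpu_mode_is_32bit(vcpu))
                        return false;

                return true;
        }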
      
      Fixes: d43583b8 ("KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest")
      Acked-by: Will Deacon <will@kernel.org>
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220322183538.2757758-2-oupton@google.com
  6. Apr 05, 2022
  7. Apr 04, 2022