  1. Mar 17, 2021
    • s390/cio: return -EFAULT if copy_to_user() fails · 72ba965b
      Eric Farman authored
      commit d9c48a94 upstream.
      
      Fixes: 120e214e ("vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls")
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm: meson_drv add shutdown function · d2100ef3
      Artem Lapkin authored
      commit fa0c16ca upstream.
      
      Problem: the system randomly hangs during reboot, in roughly 1 out of 20 reboots
      // debug kernel log
      [    4.496660] reboot: kernel restart prepare CMD:(null)
      [    4.498114] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown begin
      [    4.503949] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown domain 0:VPU...
      ...STUCK...
      
      Solution: add shutdown function to meson_drm driver
      // debug kernel log
      [    5.231896] reboot: kernel restart prepare CMD:(null)
      [    5.246135] [drm:meson_drv_shutdown]
      ...
      [    5.259271] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown begin
      [    5.274688] meson_ee_pwrc c883c000.system-controller:power-controller: shutdown domain 0:VPU...
      [    5.338331] reboot: Restarting system
      [    5.358293] psci: PSCI_0_2_FN_SYSTEM_RESET reboot_mode:0 cmd:(null)
      bl31 reboot reason: 0xd
      bl31 reboot reason: 0x0
      system cmd  1.
      ...REBOOT...
      
      Tested: on Khadas VIM1, VIM2, VIM3 and VIM3L SBCs (1000+ successful reboots),
      as well as on Odroid boards and WeTek Play2 (GXBB)
      
      Fixes: bbbe775e ("drm: Add support for Amlogic Meson Graphic Controller")
      Signed-off-by: Artem Lapkin <art@khadas.com>
      Tested-by: Christian Hewitt <christianshewitt@gmail.com>
      Acked-by: Neil Armstrong <narmstrong@baylibre.com>
      Acked-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210302042202.3728113-1-art@khadas.com
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff · 72c541cc
      Neil Roberts authored
      commit 11d5a474 upstream.
      
      When mmapping the shmem, it would previously adjust the pgoff in the
      vm_area_struct to remove the fake offset that is added to be able to
      identify the buffer. This patch removes the adjustment and makes the
      fault handler use the vm_fault address to calculate the page offset
      instead. Although using this address is apparently discouraged, several
      DRM drivers seem to be doing it anyway.
      
      The problem with removing the pgoff is that it prevents
      drm_vma_node_unmap from working because that searches the mapping tree
      by address. That doesn't work because all of the mappings are at offset
      0. drm_vma_node_unmap is being used by the shmem helpers when purging
      the buffer.
      
      This fixes a bug in Panfrost, which uses drm_gem_shmem_purge. Without
      this fix, the mapping for a purged buffer can still be accessed, which
      might mean accessing random pages from other buffers.
      
      v2: Don't check whether the unsigned page_offset is less than 0.
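
The address-based calculation described above can be sketched in user space as follows. This is only an illustration: `toy_vma`, `toy_fault`, `toy_fault_page_offset` and the 4 KiB page size are hypothetical stand-ins, not the kernel's types or the helper's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the relevant fields of vm_area_struct and vm_fault. */
struct toy_vma {
	uintptr_t vm_start;
	uintptr_t vm_end;
};

struct toy_fault {
	uintptr_t address;
};

/*
 * Index of the faulting page within the mapping, derived only from the
 * fault address, so the fake offset stored in pgoff is never consulted.
 */
uint64_t toy_fault_page_offset(const struct toy_vma *vma,
			       const struct toy_fault *vmf)
{
	assert(vmf->address >= vma->vm_start && vmf->address < vma->vm_end);
	return (uint64_t)(vmf->address - vma->vm_start) >> TOY_PAGE_SHIFT;
}
```

Because the result is unsigned, a "less than 0" check on it would be meaningless, which matches the v2 note.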
      
      Cc: stable@vger.kernel.org
      Fixes: 17acb9f3 ("drm/shmem: Add madvise state and purge helpers")
      Signed-off-by: Neil Roberts <nroberts@igalia.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210223155125.199577-3-nroberts@igalia.com
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/shmem-helper: Check for purged buffers in fault handler · 0d574fc4
      Neil Roberts authored
      commit d611b4a0 upstream.
      
      When a buffer is madvised as not needed and then purged, any attempts to
      access the buffer from user-space should cause a bus fault. This patch
      adds a check for that.
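
A minimal user-space sketch of such a check, assuming a simplified buffer with a `purged` flag. The names `toy_buffer`, `toy_fault` and the result enum are hypothetical and only mirror the idea, not the helper's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_fault_result { TOY_FAULT_OK, TOY_FAULT_SIGBUS };

/* Hypothetical, heavily simplified view of a shmem-backed buffer. */
struct toy_buffer {
	size_t num_pages;
	bool purged; /* set once a "not needed" buffer has been reclaimed */
};

/* Refuse to map anything from a purged buffer or past its end. */
enum toy_fault_result toy_fault(const struct toy_buffer *buf,
				size_t page_offset)
{
	assert(buf != NULL);
	if (buf->purged || page_offset >= buf->num_pages)
		return TOY_FAULT_SIGBUS;
	return TOY_FAULT_OK;
}
```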
      
      Cc: stable@vger.kernel.org
      Fixes: 17acb9f3 ("drm/shmem: Add madvise state and purge helpers")
      Signed-off-by: Neil Roberts <nroberts@igalia.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Steven Price <steven.price@arm.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210223155125.199577-2-nroberts@igalia.com
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/compat: Clear bounce structures · 3b08ea3a
      Daniel Vetter authored
      commit de066e11 upstream.
      
      Some of them have gaps, or fields we don't clear. Native ioctl code
      does full copies plus zero-extends on size mismatch, so nothing can
      leak. But compat is more hand-rolled, so we need to be careful.
      
      None of these matter for performance, so just memset.
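
The memset-first pattern can be illustrated with a small user-space sketch. Both structures and the function name below are hypothetical; they only model a bounce structure that has a padding gap and a field the conversion path never fills in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native-side bounce structure; padding typically follows `type`. */
struct toy_native_req {
	uint8_t type;
	uint64_t sequence;
	uint32_t flags; /* assumption: never filled in by the compat path */
};

/* Hypothetical 32-bit user-space layout. */
struct toy_compat_req {
	uint8_t type;
	uint32_t sequence;
};

void toy_compat_to_native(const struct toy_compat_req *in,
			  struct toy_native_req *out)
{
	assert(in != NULL && out != NULL);
	/* Clear everything first so gaps and unset fields hold no stale data. */
	memset(out, 0, sizeof(*out));
	out->type = in->type;
	out->sequence = in->sequence;
}
```

Without the memset, `flags` and the padding would keep whatever was previously in that memory, which is the kind of leak the commit is guarding against.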
      
      Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
      are security holes anyway.
      
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Reported-by: syzbot+620cf21140fc7e772a5d@syzkaller.appspotmail.com # vblank ioctl
      Cc: syzbot+620cf21140fc7e772a5d@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-daniel.vetter@ffwll.ch
      (cherry picked from commit e926c474)
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: reliably allocate IRQ table on reset to avoid crash · cabbd263
      Edwin Peer authored
      commit 20d7d1c5 upstream.
      
      The following trace excerpt corresponds with a NULL pointer dereference
      of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
      device resets:
      
          Unable to handle kernel NULL pointer dereference at ... 000000d
          ...
          pc : string+0x3c/0x80
          lr : vsnprintf+0x294/0x7e0
          sp : ffff00000f61ba70 pstate : 20000145
          x29: ffff00000f61ba70 x28: 000000000000000d
          x27: ffff0000009c8b5a x26: ffff00000f61bb80
          x25: ffff0000009c8b5a x24: 0000000000000012
          x23: 00000000ffffffe0 x22: ffff000008990428
          x21: ffff00000f61bb80 x20: 000000000000000d
          x19: 000000000000001f x18: 0000000000000000
          x17: 0000000000000000 x16: ffff800b6d0fb400
          x15: 0000000000000000 x14: ffff800b7fe31ae8
          x13: 00001ed16472c920 x12: ffff000008c6b1c9
          x11: ffff000008cf0580 x10: ffff00000f61bb80
          x9 : 00000000ffffffd8 x8 : 000000000000000c
          x7 : ffff800b684b8000 x6 : 0000000000000000
          x5 : 0000000000000065 x4 : 0000000000000001
          x3 : ffff0a00ffffff04 x2 : 000000000000001f
          x1 : 0000000000000000 x0 : 000000000000000d
          Call trace:
          string+0x3c/0x80
          vsnprintf+0x294/0x7e0
          snprintf+0x44/0x50
          __bnxt_open_nic+0x34c/0x928 [bnxt_en]
          bnxt_open+0xe8/0x238 [bnxt_en]
          __dev_open+0xbc/0x130
          __dev_change_flags+0x12c/0x168
          dev_change_flags+0x20/0x60
          ...
      
      Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
      would not be expected on a system supporting MSIX at all. However, if
      bnxt_init_int_mode() does not end up being called after the call to
      bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver will
      think that only INTA is supported and bp->irq_tbl will be NULL,
      causing the above crash.
      
      In the error recovery scenario, we call bnxt_clear_int_mode() in
      bnxt_fw_reset_close() early in the sequence. Ordinarily, we will
      call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
      reestablish communication with the firmware after reset.  However,
      if the sequence has to abort before we call bnxt_init_int_mode() and
      if the user later attempts to re-open the device, then it will cause
      the crash above.
      
      We fix it in 2 ways:
      
      1. Check for bp->irq_tbl in bnxt_setup_int_mode(). If it is NULL, call
      bnxt_init_int_mode().
      
      2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
      the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag.  This
      will cause more drastic recovery at the next attempt to re-open the
      device, including a call to bnxt_init_int_mode().
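
The first of the two fixes can be modelled in user space roughly as follows. All `toy_*` names are hypothetical, and the IRQ table is reduced to a plain integer array; this is a sketch of the "allocate on demand if missing" idea, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, reduced device state. */
struct toy_bp {
	int *irq_tbl;
	int nr_irqs;
};

int toy_init_int_mode(struct toy_bp *bp)
{
	assert(bp->nr_irqs > 0);
	bp->irq_tbl = calloc((size_t)bp->nr_irqs, sizeof(*bp->irq_tbl));
	return bp->irq_tbl ? 0 : -ENOMEM;
}

void toy_clear_int_mode(struct toy_bp *bp)
{
	free(bp->irq_tbl);
	bp->irq_tbl = NULL;
}

int toy_setup_int_mode(struct toy_bp *bp)
{
	/* An aborted recovery may have left the table unallocated. */
	if (!bp->irq_tbl) {
		int rc = toy_init_int_mode(bp);

		if (rc)
			return rc;
	}
	bp->irq_tbl[0] = 1; /* safe: the table is guaranteed to exist here */
	return 0;
}
```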
      
      Fixes: 3bc7d4a3 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
      Reviewed-by: Scott Branden <scott.branden@broadcom.com>
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/cio: return -EFAULT if copy_to_user() fails again · dfa176f3
      Wang Qing authored
      commit 51c44bab upstream.
      
      The copy_to_user() function returns the number of bytes remaining to be
      copied, but we want to return -EFAULT if the copy doesn't complete.
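
The difference between the two return conventions can be shown with a user-space stand-in. `toy_copy_to_user()` is a hypothetical mock that, like the real function, returns the number of bytes it could not copy; the `writable` parameter is an assumption used to simulate a partial failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mock: copies at most `writable` bytes and returns the bytes NOT copied. */
size_t toy_copy_to_user(void *dst, const void *src, size_t n, size_t writable)
{
	size_t done = n < writable ? n : writable;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, done);
	return n - done;
}

/* Buggy pattern: the leftover byte count leaks out as the return value. */
long toy_get_info_buggy(void *ubuf, size_t writable, const void *info, size_t len)
{
	return (long)toy_copy_to_user(ubuf, info, len, writable);
}

/* Fixed pattern: any incomplete copy is reported as -EFAULT. */
long toy_get_info_fixed(void *ubuf, size_t writable, const void *info, size_t len)
{
	if (toy_copy_to_user(ubuf, info, len, writable))
		return -EFAULT;
	return 0;
}
```

In the buggy variant a caller sees a positive number on failure, which is easy to mistake for success.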
      
      Fixes: e01bcdd6 ("vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl")
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Link: https://lore.kernel.org/r/1614600093-13992-1-git-send-email-wangqing@vivo.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: hns3: fix bug when calculating the TCAM table info · 05d11eb7
      Jian Shen authored
      commit b36fc875 upstream.
      
      The function hclge_fd_convert_tuple() converts tuples and tuple
      masks to TCAM x and y. However, when converting INNER_SRC_MAC it
      mistakenly uses the source MAC as the source MAC mask, which may
      cause the flow director rule to work unexpectedly. So fix it.
      
      Fixes: 11732868 ("net: hns3: Add input key and action config support for flow director")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: hns3: fix query vlan mask value error for flow director · 8bbc59bb
      Jian Shen authored
      commit c75ec148 upstream.
      
      Currently, the driver returns VLAN_VID_MASK for the vlan mask field
      when querying flow director rule information for a rule that doesn't
      use vlan. This may cause the vlan mask value to be displayed as 0xf000
      in this case, as below:
      
      estuary:/$ ethtool -u eth1
      50 RX rings available
      Total 1 rules
      
      Filter: 2
      Rule Type: TCP over IPv4
      Src IP addr: 0.0.0.0 mask: 255.255.255.255
      Dest IP addr: 0.0.0.0 mask: 255.255.255.255
      TOS: 0x0 mask: 0xff
      Src port: 0 mask: 0xffff
      Dest port: 0 mask: 0xffff
      VLAN EtherType: 0x0 mask: 0xffff
      VLAN: 0x0 mask: 0xf000
      User-defined: 0x1234 mask: 0x0
      Action: Direct to queue 3
      
      Fix it by returning 0.
      
      Fixes: 05c2314f ("net: hns3: Add support for rule query of flow director")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf traceevent: Ensure read cmdlines are null terminated. · 4d0273ab
      Ian Rogers authored
      commit 137a5258 upstream.
      
      Issue detected by address sanitizer.
      
      Fixes: cd4ceb63 ("perf util: Save pid-cmdline mapping into tracing header")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210226221431.1985458-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: forwarding: Fix race condition in mirror installation · ef663d14
      Danielle Ratson authored
      commit edcbf513 upstream.
      
      When mirroring to a gretap in hardware the device expects to be
      programmed with the egress port and all the encapsulating headers. This
      requires the driver to resolve the path the packet will take in the
      software data path and program the device accordingly.
      
      If the path cannot be resolved (in this case because of an unresolved
      neighbor), then mirror installation fails until the path is resolved.
      This results in a race that causes the test to sometimes fail.
      
      Fix this by setting the neighbor's state to permanent, so that it is
      always valid.
      
      Fixes: b5b02939 ("selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP test")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: fix watchdog timeout during suspend/resume stress test · fcce3cb6
      Joakim Zhang authored
      commit c511819d upstream.
      
      stmmac_xmit() calls stmmac_tx_timer_arm() at the end to arm the tx timer
      that does the transmission cleanup work. If stmmac enters suspend
      immediately after the tx timer is modified, its expiry callback
      stmmac_tx_clean() is never invoked. This affects BQL:
      netdev_tx_sent_queue() has been called but netdev_tx_completed_queue()
      has not, so dql_avail(&dev_queue->dql) ends up always returning a
      negative value.
      
      __dev_queue_xmit->__dev_xmit_skb->qdisc_run->__qdisc_run->qdisc_restart->dequeue_skb:
      	if ((q->flags & TCQ_F_ONETXQUEUE) &&
      		netif_xmit_frozen_or_stopped(txq)) // __QUEUE_STATE_STACK_XOFF is set
      
      The net core then stops transmitting, and eventually the net watchdog times out.
      To fix this issue, we should call netdev_tx_reset_queue() in stmmac_resume().
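
The accounting problem can be illustrated with a toy model of byte queue limits. The structure and functions below are hypothetical simplifications (a fixed limit and two counters), not the kernel's dynamic queue limit implementation.

```c
#include <assert.h>

/* Toy model: fixed limit, bytes handed to hardware, bytes completed. */
struct toy_dql {
	int limit;
	int num_queued;
	int num_completed;
};

/* Remaining budget; a negative value means the stack must stop queueing. */
int toy_dql_avail(const struct toy_dql *q)
{
	return q->limit - (q->num_queued - q->num_completed);
}

void toy_dql_sent(struct toy_dql *q, int bytes)
{
	assert(bytes >= 0);
	q->num_queued += bytes;
}

void toy_dql_completed(struct toy_dql *q, int bytes)
{
	assert(bytes >= 0);
	q->num_completed += bytes;
}

/* What a resume path would do to forget completions that never arrived. */
void toy_dql_reset(struct toy_dql *q)
{
	q->num_queued = 0;
	q->num_completed = 0;
}
```

If the cleanup callback never runs, nothing ever calls the "completed" side, so the budget stays negative until the counters are reset.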
      
      Fixes: 54139cf3 ("net: stmmac: adding multiple buffers for rx")
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: stop each tx channel independently · d31ae9ec
      Joakim Zhang authored
      commit a3e860a8 upstream.
      
      Clearing the GMAC_CONFIG_TE bit stops all tx channels, but users
      may only want to stop a specific tx channel.
      
      Fixes: 48863ce5 ("stmmac: add DMA support for GMAC 4.xx")
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ixgbe: fail to create xfrm offload of IPsec tunnel mode SA · 86ea6055
      Antony Antony authored
      commit d785e1fe upstream.
      
      Based on talks and indirect references, ixgbe IPsec offload does not
      support IPsec tunnel mode offload. It can only support IPsec transport
      mode offload. Now explicitly fail when creating a non-transport mode SA
      with offload to avoid false performance expectations.
      
      Fixes: 63a67fe2 ("ixgbe: add ipsec offload add and remove SA")
      Signed-off-by: Antony Antony <antony@phenome.org>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qrtr: fix error return code of qrtr_sendmsg() · e8b6c1d7
      Jia-Ju Bai authored
      commit 179d0ba0 upstream.
      
      When sock_alloc_send_skb() returns NULL to skb, no error return code of
      qrtr_sendmsg() is assigned.
      To fix this bug, rc is assigned with -ENOMEM in this case.
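
A user-space sketch of the missing-error-code pattern, using a hypothetical `toy_sendmsg()` in which `malloc()` stands in for the buffer allocation and `fail_alloc` is an assumed knob for simulating allocation failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical send routine; returns bytes sent or a negative errno. */
int toy_sendmsg(size_t len, int fail_alloc)
{
	int rc = 0;
	void *skb;

	assert(len > 0);
	skb = fail_alloc ? NULL : malloc(len);
	if (!skb) {
		/* Without this assignment, `out` would return 0, i.e. success. */
		rc = -ENOMEM;
		goto out;
	}
	/* ... build and queue the message ... */
	free(skb);
	rc = (int)len;
out:
	return rc;
}
```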
      
      Fixes: 194ccc88 ("net: qrtr: Support decoding incoming v2 packets")
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: davicom: Fix regulator not turned off on driver removal · d28e783c
      Paul Cercueil authored
      commit cf9e60aa upstream.
      
      We must disable the regulator that was enabled in the probe function.
      
      Fixes: 7994fe55 ("dm9000: Add regulator and reset support to dm9000")
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: davicom: Fix regulator not turned off on failed probe · 05517de4
      Paul Cercueil authored
      commit ac88c531 upstream.
      
      When the probe fails or requests to be deferred, we must disable the
      regulator that was previously enabled.
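
The unwind-on-failure pattern can be sketched in user space with a mock regulator that only counts enables. Every name here is hypothetical, and `later_step_rc` stands in for whatever setup step may fail or ask for deferral after the regulator has been enabled.

```c
#include <assert.h>
#include <errno.h>

/* Mock regulator: tracks how many enables are still outstanding. */
struct toy_regulator {
	int enable_count;
};

void toy_regulator_enable(struct toy_regulator *r)
{
	r->enable_count++;
}

void toy_regulator_disable(struct toy_regulator *r)
{
	assert(r->enable_count > 0);
	r->enable_count--;
}

int toy_probe(struct toy_regulator *power, int later_step_rc)
{
	int ret;

	toy_regulator_enable(power);

	ret = later_step_rc; /* stand-in for a later setup step */
	if (ret)
		goto out_regulator_disable;
	return 0;

out_regulator_disable:
	/* Every failure after the enable must undo it. */
	toy_regulator_disable(power);
	return ret;
}
```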
      
      Fixes: 7994fe55 ("dm9000: Add regulator and reset support to dm9000")
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: lapbether: Remove netif_start_queue / netif_stop_queue · 11a58920
      Xie He authored
      commit f7d9d485 upstream.
      
      For the devices in this driver, the default qdisc is "noqueue",
      because their "tx_queue_len" is 0.
      
      In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
      "noqueue" qdisc are specially handled. Packets are transmitted without
      being queued after a "dev->flags & IFF_UP" check. However, it's possible
      that even if this check succeeds, "ops->ndo_stop" may still have already
      been called. This is because in "__dev_close_many", "ops->ndo_stop" is
      called before clearing the "IFF_UP" flag.
      
      If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible in
      "__dev_queue_xmit", it sees the "IFF_UP" flag is present, and then it
      checks "netif_xmit_stopped" and finds that the queue is already stopped.
      In this case, it will complain that:
      "Virtual device ... asks to queue packet!"
      
      To prevent "__dev_queue_xmit" from generating this complaint, we should
      not call "netif_stop_queue" in "ops->ndo_stop".
      
      We also don't need to call "netif_start_queue" in "ops->ndo_open",
      because after a netdev is allocated and registered, the
      "__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
      to call "netif_start_queue" to clear it.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cipso,calipso: resolve a number of problems with the DOI refcounts · b4800e7a
      Paul Moore authored
      commit ad5d07f4 upstream.
      
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introduces a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one, removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
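
The conventional scheme described above can be sketched as a small user-space model. The `toy_doi` structure and its helpers are hypothetical; a `freed` flag stands in for actually releasing the definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy DOI definition: the list itself holds one reference. */
struct toy_doi {
	int refcount;
	bool on_list;
	bool freed;
};

void toy_doi_add(struct toy_doi *d)
{
	d->refcount = 1; /* the list's reference */
	d->on_list = true;
	d->freed = false;
}

void toy_doi_get(struct toy_doi *d)
{
	assert(d->refcount > 0);
	d->refcount++;
}

void toy_doi_put(struct toy_doi *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = true;
}

/* Removal unlinks immediately and drops only the list's reference. */
bool toy_doi_remove(struct toy_doi *d)
{
	if (!d->on_list)
		return false; /* already removed: do not drop another ref */
	d->on_list = false;
	toy_doi_put(d);
	return true;
}
```

With this shape, repeated removal attempts cannot drain references held by other users, which is the flaw listed as item 2.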
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netdevsim: init u64 stats for 32bit hardware · 6d599697
      Hillf Danton authored
      commit 863a42b2 upstream.
      
      Init the u64 stats in order to avoid the lockdep prints on the 32bit
      hardware like
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
       Hardware name: ARM-Versatile Express
       Backtrace:
       [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
       [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
       [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
       [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
       [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
       [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
       [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
       [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
       [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
       [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
       [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
       [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
       [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
       [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
       [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
       [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
       [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
       [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
       [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
       [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
       [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
       [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
       [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
       [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
       [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
       [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
       [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
       [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
       [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
       [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
       [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
       [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
       [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
       [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      
      Fixes: 83c9e13a ("netdevsim: add software driver for testing offloads")
      Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: usb: qmi_wwan: allow qmimux add/del with master up · 8e365b61
      Daniele Palmas authored
      commit 6c59cff3 upstream.
      
      There's no reason for preventing the creation and removal
      of qmimux network interfaces when the underlying interface
      is up.
      
      This makes qmi_wwan mux implementation more similar to the
      rmnet one, simplifying userspace management of the same
      logical interfaces.
      
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Reported-by: Aleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: avoid duplicates in classes dump · 392f34cc
      Maximilian Heyne authored
      commit bfc25605 upstream.
      
      This is a follow up of commit ea327469 ("net: sched: avoid
      duplicates in qdisc dump") which has fixed the issue only for the qdisc
      dump.
      
      The duplicate printing also occurs when dumping the classes via
        tc class show dev eth0
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nexthop: Do not flush blackhole nexthops when loopback goes down · 3e66c163
      Ido Schimmel authored
      commit 76c03bf8 upstream.
      
      As far as user space is concerned, blackhole nexthops do not have a
      nexthop device and therefore should not be affected by the
      administrative or carrier state of any netdev.
      
      However, when the loopback netdev goes down all the blackhole nexthops
      are flushed. This happens because internally the kernel associates
      blackhole nexthops with the loopback netdev.
      
      This behavior is both confusing to those not familiar with kernel
      internals and also diverges from the legacy API where blackhole IPv4
      routes are not flushed when the loopback netdev goes down:
      
       # ip route add blackhole 198.51.100.0/24
       # ip link set dev lo down
       # ip route show 198.51.100.0/24
       blackhole 198.51.100.0/24
      
      Blackhole IPv6 routes are flushed, but at least user space knows that
      they are associated with the loopback netdev:
      
       # ip -6 route show 2001:db8:1::/64
       blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium
      
      Fix this by only flushing blackhole nexthops when the loopback netdev is
      unregistered.
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Donald Sharp <sharpd@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ong Boon Leong's avatar
      net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10 · 7f101d03
      Ong Boon Leong authored
      commit 879c348c upstream.
      
      Introduce dwmac410_dma_init_channel() for EQoS v4.10 and above, which
      use different DMA_CH(n)_Interrupt_Enable bit definitions for NIE and
      AIE.
      
      Fixes: 48863ce5 ("stmmac: add DMA support for GMAC 4.xx")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Ramesh Babu B <ramesh.babu.b@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kevin(Yudong) Yang's avatar
      net/mlx4_en: update moderation when config reset · 0fbbcf79
      Kevin(Yudong) Yang authored
      commit 00ff801b upstream.
      
      This patch fixes a bug that the moderation config will not be
      applied when calling mlx4_en_reset_config. For example, when
      turning on rx timestamping, mlx4_en_reset_config() will be called,
      causing the NIC to forget previous moderation config.
      
      This fix is in line with a previous fix:
      commit 79c54b6b ("net/mlx4_en: Fix TX moderation info loss
      after set_ringparam is called")
      
      Tested: Before this patch, on a host with NIC using mlx4, run
      netserver and stream TCP to the host at full utilization.
      $ sar -I SUM 1
                       INTR    intr/s
      14:03:56          sum  48758.00
      
      After rx hwtstamp is enabled:
      $ sar -I SUM 1
      14:10:38          sum 317771.00
      We see the moderation is not working properly and the NIC issued about
      6.5x as many interrupts.
      
      After the patch, and turned on rx hwtstamp, the rate of interrupts
      is as expected:
      $ sar -I SUM 1
      14:52:11          sum  49332.00
      
      Fixes: 79c54b6b ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      CC: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Vladimir Oltean's avatar
      net: enetc: don't overwrite the RSS indirection table when initializing · 78cbd0a4
      Vladimir Oltean authored
      commit c646d10d upstream.
      
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble really is that enetc_init_port_rss_memory really needs
      enetc_alloc_si_resources to be called, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out, it has nothing to do in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Linus Torvalds's avatar
      Revert "mm, slub: consider rest of partial list if acquire_slab() fails" · 6547ec42
      Linus Torvalds authored
      commit 9b1ea29b upstream.
      
      This reverts commit 8ff60eb0.
      
      The kernel test robot reports a huge performance regression due to the
      commit, and the reason seems fairly straightforward: when there is
      contention on the page list (which is what causes acquire_slab() to
      fail), we do _not_ want to just loop and try again, because that will
      transfer the contention to the 'n->list_lock' spinlock we hold, and
      just make things even worse.
      
      This is admittedly likely a problem only on big machines - the kernel
      test robot report comes from a 96-thread dual socket Intel Xeon Gold
      6252 setup, but the regression there really is quite noticeable:
      
         -47.9% regression of stress-ng.rawpkt.ops_per_sec
      
      and the commit that was marked as being fixed (7ced3719: "slub:
      Acquire_slab() avoid loop") actually did the loop exit early very
      intentionally (the hint being that "avoid loop" part of that commit
      message), exactly to avoid this issue.
      
      The correct thing to do may be to pick some kind of reasonable middle
      ground: instead of breaking out of the loop on the very first sign of
      contention, or trying over and over and over again, the right thing may
      be to re-try _once_, and then give up on the second failure (or pick
      your favorite value for "once"..).
      
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
      Cc: Jann Horn <jannh@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paulo Alcantara's avatar
      cifs: return proper error code in statfs(2) · 55e6ede3
      Paulo Alcantara authored
      commit 14302ee3 upstream.
      
      In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
      use its return value rather than always returning 0.  Instead, use rc
      variable as it is properly set to 0 in case there is no
      server->ops->queryfs.
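
      A minimal sketch of the pattern described above, not the actual cifs
      code: fs_ops, failing_queryfs() and statfs_sketch() are hypothetical
      stand-ins. rc starts at 0 and is overwritten only when a queryfs
      callback exists, so the callback's error code reaches the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for server->ops; only the optional callback matters. */
struct fs_ops {
	int (*queryfs)(void);
};

/* Hypothetical callback that fails, used to show error propagation. */
static int failing_queryfs(void)
{
	return -EIO;
}

/* Return the callback's result when present, 0 when there is no callback. */
static int statfs_sketch(const struct fs_ops *ops)
{
	int rc = 0;

	if (ops->queryfs)
		rc = ops->queryfs();

	return rc; /* an unconditional "return 0" here would hide the error */
}
```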
      
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Christian Brauner's avatar
      mount: fix mounting of detached mounts onto targets that reside on shared mounts · a1ff418d
      Christian Brauner authored
      commit ee2e3f50 upstream.
      
      Creating a series of detached mounts, attaching them to the filesystem,
      and unmounting them can be used to trigger an integer overflow in
      ns->mounts causing the kernel to block any new mounts in count_mounts()
      and returning ENOSPC because it falsely assumes that the maximum number
      of mounts in the mount namespace has been reached, i.e. it thinks it
      can't fit the new mounts into the mount namespace anymore.
      
      Depending on the number of mounts in your system, this can be reproduced
      on any kernel that supports open_tree() and move_mount() by compiling
      and running the following program:
      
        /* SPDX-License-Identifier: LGPL-2.1+ */
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <fcntl.h>
        #include <getopt.h>
        #include <limits.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/mount.h>
        #include <sys/stat.h>
        #include <sys/syscall.h>
        #include <sys/types.h>
        #include <unistd.h>
      
        /* open_tree() */
        #ifndef OPEN_TREE_CLONE
        #define OPEN_TREE_CLONE 1
        #endif
      
        #ifndef OPEN_TREE_CLOEXEC
        #define OPEN_TREE_CLOEXEC O_CLOEXEC
        #endif
      
        #ifndef __NR_open_tree
                #if defined __alpha__
                        #define __NR_open_tree 538
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_open_tree 4428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_open_tree 6428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_open_tree 5428
                        #endif
                #elif defined __ia64__
                        #define __NR_open_tree (428 + 1024)
                #else
                        #define __NR_open_tree 428
                #endif
        #endif
      
        /* move_mount() */
        #ifndef MOVE_MOUNT_F_EMPTY_PATH
        #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
        #endif
      
        #ifndef __NR_move_mount
                #if defined __alpha__
                        #define __NR_move_mount 539
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_move_mount 4429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_move_mount 6429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_move_mount 5429
                        #endif
                #elif defined __ia64__
                        #define __NR_move_mount (429 + 1024)
                #else
                        #define __NR_move_mount 429
                #endif
        #endif
      
        static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
        {
                return syscall(__NR_open_tree, dfd, filename, flags);
        }
      
        static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
                                         const char *to_pathname, unsigned int flags)
        {
                return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
        }
      
        static bool is_shared_mountpoint(const char *path)
        {
                bool shared = false;
                FILE *f = NULL;
                char *line = NULL;
                int i;
                size_t len = 0;
      
                f = fopen("/proc/self/mountinfo", "re");
                if (!f)
                        return 0;
      
                while (getline(&line, &len, f) > 0) {
                        char *slider1, *slider2;
      
                        for (slider1 = line, i = 0; slider1 && i < 4; i++)
                                slider1 = strchr(slider1 + 1, ' ');
      
                        if (!slider1)
                                continue;
      
                        slider2 = strchr(slider1 + 1, ' ');
                        if (!slider2)
                                continue;
      
                        *slider2 = '\0';
                        if (strcmp(slider1 + 1, path) == 0) {
                                /* This is the path. Is it shared? */
                                slider1 = strchr(slider2 + 1, ' ');
                                if (slider1 && strstr(slider1, "shared:")) {
                                        shared = true;
                                        break;
                                }
                        }
                }
                fclose(f);
                free(line);
      
                return shared;
        }
      
        static void usage(void)
        {
                const char *text = "mount-new [--recursive] <base-dir>\n";
                fprintf(stderr, "%s", text);
                _exit(EXIT_SUCCESS);
        }
      
        #define exit_usage(format, ...)                              \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        usage();                                     \
                })
      
        #define exit_log(format, ...)                                \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        exit(EXIT_FAILURE);                          \
                })
      
        static const struct option longopts[] = {
                {"help",        no_argument,            0,      'a'},
                { NULL,         no_argument,            0,       0 },
        };
      
        int main(int argc, char *argv[])
        {
                int exit_code = EXIT_SUCCESS, index = 0;
                int dfd, fd_tree, new_argc, ret;
                char *base_dir;
                char *const *new_argv;
                char target[PATH_MAX];
      
                while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
                        switch (ret) {
                        case 'a':
                                /* fallthrough */
                        default:
                                usage();
                        }
                }
      
                new_argv = &argv[optind];
                new_argc = argc - optind;
                if (new_argc < 1)
                        exit_usage("Missing base directory\n");
                base_dir = new_argv[0];
      
                if (*base_dir != '/')
                        exit_log("Please specify an absolute path");
      
                /* Ensure that target is a shared mountpoint. */
                if (!is_shared_mountpoint(base_dir))
                        exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir);
      
                dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
                if (dfd < 0)
                        exit_log("%m - Failed to open base directory \"%s\"", base_dir);
      
                ret = mkdirat(dfd, "detached-move-mount", 0755);
                if (ret < 0)
                        exit_log("%m - Failed to create required temporary directories");
      
                ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir);
                if (ret < 0 || (size_t)ret >= sizeof(target))
                        exit_log("%m - Failed to assemble target path");
      
                /*
                 * Having a mount table with 10000 mounts is already quite excessive
                 * and should account even for weird test systems.
                 */
                for (size_t i = 0; i < 10000; i++) {
                        fd_tree = sys_open_tree(dfd, "detached-move-mount",
                                                OPEN_TREE_CLONE |
                                                OPEN_TREE_CLOEXEC |
                                                AT_EMPTY_PATH);
                        if (fd_tree < 0) {
                                fprintf(stderr, "%m - Failed to open %d(detached-move-mount)", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
      
                        ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH);
                        if (ret < 0) {
                                if (errno == ENOSPC)
                                        fprintf(stderr, "%m - Buggy mount counting");
                                else
                                        fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                        close(fd_tree);
      
                        ret = umount2(target, MNT_DETACH);
                        if (ret < 0) {
                                fprintf(stderr, "%m - Failed to unmount %s", target);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                }
      
                (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR);
                close(dfd);
      
                exit(exit_code);
        }
      
      and wait for the kernel to refuse any new mounts by returning ENOSPC.
      How many iterations are needed depends on the number of mounts in your
      system. Assuming you have something like 50 mounts on a standard system
      it should be almost instantaneous.
      
      The root cause of this is that detached mounts aren't handled correctly
      when source and target mount are identical and reside on a shared mount.
      This produces a broken mount tree in which the detached source itself is
      propagated, something that propagation prevents for regular bind-mounts
      and new mounts. This ultimately leads to a miscalculation of the number
      of mounts in the mount namespace.
      
      Detached mounts created via
      open_tree(fd, path, OPEN_TREE_CLONE)
      are essentially like an unattached new mount, or an unattached
      bind-mount. They can then later on be attached to the filesystem via
      move_mount() which calls into attach_recursive_mount(). Part of
      attaching it to the filesystem is making sure that mounts get correctly
      propagated in case the destination mountpoint is MS_SHARED, i.e. is a
      shared mountpoint. This is done by calling into propagate_mnt() which
      walks the list of peers calling propagate_one() on each mount in this
      list making sure it receives the propagation event.
      The propagate_one() function thereby skips both new mounts and bind
      mounts to not propagate them "into themselves". Both are identified by
      checking whether the mount is already attached to any mount namespace in
      mnt->mnt_ns. This is what the IS_MNT_NEW() helper is responsible for.
      
      However, detached mounts have an anonymous mount namespace attached to
      them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't
      realize they need to be skipped causing the mount to propagate "into
      itself" breaking the mount table and causing a disconnect between the
      number of mounts recorded as being beneath or reachable from the target
      mountpoint and the number of mounts actually recorded/counted in
      ns->mounts ultimately causing an overflow which in turn prevents any new
      mounts via the ENOSPC issue.
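
      The difference between the two checks can be sketched as follows. This
      is a simplified illustration, not the kernel code: the structures and
      the "anonymous" flag are assumptions made for the example, and how the
      real patch recognizes an anonymous mount namespace is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mnt_ns_sketch {
	bool anonymous; /* true for the namespace backing a detached mount */
};

struct mount_sketch {
	struct mnt_ns_sketch *mnt_ns; /* NULL until attached anywhere */
};

/* Old-style check: any non-NULL namespace counts as "already attached". */
static bool is_mnt_new_old(const struct mount_sketch *m)
{
	return m->mnt_ns == NULL;
}

/* Detached-aware check: an anonymous namespace still means "new". */
static bool is_mnt_new_fixed(const struct mount_sketch *m)
{
	return m->mnt_ns == NULL || m->mnt_ns->anonymous;
}
```

      With the old check a detached mount is not treated as new, so it is not
      skipped during propagation; the detached-aware check skips it.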
      
      So teach propagation to handle detached mounts by making it aware of
      them. I've been tracking this issue down for the last couple of days and
      then verifying that the fix is correct by
      unmounting everything in my current mount table leaving only /proc and
      /sys mounted and running the reproducer above overnight verifying the
      number of mounts counted in ns->mounts. With this fix the counts are
      correct and the ENOSPC issue can't be reproduced.
      
      This change will only have an effect on mounts created with the new
      mount API since detached mounts cannot be created with the old mount API
      so regressions are extremely unlikely.
      
      Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com
      Fixes: 2db154b3 ("vfs: syscall: Add move_mount(2) to move mounts around")
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Christophe Leroy's avatar
      powerpc/603: Fix protection of user pages mapped with PROT_NONE · 59a057a8
      Christophe Leroy authored
      commit c119565a upstream.
      
      On book3s/32, page protection is defined by the PP bits in the PTE
      which provide the following protection depending on the access
      keys defined in the matching segment register:
      - PP 00 means RW with key 0 and N/A with key 1.
      - PP 01 means RW with key 0 and RO with key 1.
      - PP 10 means RW with both key 0 and key 1.
      - PP 11 means RO with both key 0 and key 1.
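
      The four PP encodings listed above can be written as a small lookup.
      This is only an illustration of the table; the enum and the function
      name pp_access() are hypothetical and do not exist in the kernel.

```c
/* Access granted by a PTE, as described in the list above. */
enum pp_perm { PERM_NONE, PERM_RO, PERM_RW };

/* Map the two PP bits and the access key (0 or 1) to the resulting access. */
static enum pp_perm pp_access(unsigned int pp, unsigned int key)
{
	switch (pp & 0x3) {
	case 0x0: /* PP 00: RW with key 0, no access with key 1 */
		return key ? PERM_NONE : PERM_RW;
	case 0x1: /* PP 01: RW with key 0, RO with key 1 */
		return key ? PERM_RO : PERM_RW;
	case 0x2: /* PP 10: RW with both keys */
		return PERM_RW;
	default:  /* PP 11: RO with both keys */
		return PERM_RO;
	}
}
```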
      
      Since the implementation of kernel userspace access protection,
      PP bits have been set as follows:
      - PP00 for pages without _PAGE_USER
      - PP01 for pages with _PAGE_USER and _PAGE_RW
      - PP11 for pages with _PAGE_USER and without _PAGE_RW
      
      For kernelspace segments, kernel accesses are performed with key 0
      and user accesses are performed with key 1. As PP00 is used for
      non _PAGE_USER pages, user can't access kernel pages not flagged
      _PAGE_USER while kernel can.
      
      For userspace segments, both kernel and user accesses are performed
      with key 0, therefore pages not flagged _PAGE_USER are still
      accessible to the user.
      
      This shouldn't be an issue, because userspace is expected to be
      accessible to the user. But unlike most other architectures, powerpc
      implements PROT_NONE protection by removing _PAGE_USER flag instead of
      flagging the page as not valid. This means that pages in userspace
      that are not flagged _PAGE_USER shall remain inaccessible.
      
      To get the expected behaviour, just mimic other architectures in the
      TLB miss handler by checking _PAGE_USER permission on userspace
      accesses as if it was the _PAGE_PRESENT bit.
      
      Note that this problem only affects 603 cores. The 604+ have
      a hash table, and the hash_page() function already implements the
      verification of _PAGE_USER permission on userspace pages.
      
      Fixes: f342adca ("powerpc/32s: Prepare Kernel Userspace Access Protection")
      Cc: stable@vger.kernel.org # v5.2+
      Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Lorenzo Bianconi's avatar
      mt76: dma: do not report truncated frames to mac80211 · da9f2219
      Lorenzo Bianconi authored
      commit d0bd52c5 upstream.
      
      Commit b102f0c5 ("mt76: fix array overflow on receiving too many
      fragments for a packet") fixes a possible OOB access but it introduces a
      memory leak since the pending frame is not released to page_frag_cache
      if the frag array of skb_shared_info is full. Commit 93a1d479
      ("mt76: dma: fix a possible memory leak in mt76_add_fragment()") fixes
      the issue but does not free the truncated skb that is forwarded to
      mac80211 layer. Fix the leftover issue discarding even truncated skbs.
      
      Fixes: 93a1d479 ("mt76: dma: fix a possible memory leak in mt76_add_fragment()")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/a03166fcc8214644333c68674a781836e0f57576.1612697217.git.lorenzo@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jiri Wiesner's avatar
      ibmvnic: always store valid MAC address · 95b0a3b0
      Jiri Wiesner authored
      commit 67eb2114 upstream.
      
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Maciej Fijalkowski's avatar
      samples, bpf: Add missing munmap in xdpsock · 3e8ab75f
      Maciej Fijalkowski authored
      commit 6bc66998 upstream.
      
      We mmap the umem region, but we never munmap it.
      Add the missing call at the end of the cleanup.
      
      Fixes: 3945b37a ("samples/bpf: use hugepages in xdpsock app")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/20210303185636.18070-3-maciej.fijalkowski@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Yauheni Kaliuta's avatar
      selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier · c2c3a85a
      Yauheni Kaliuta authored
      commit 6185266c upstream.
      
      The verifier test labelled "valid read map access into a read-only array
      2" calls the bpf_csum_diff() helper and checks its return value. However,
      architecture implementations of csum_partial() (which is what the helper
      uses) differ in whether they fold the return value to 16 bit or not. For
      example, x86 version has ...
      
      	if (unlikely(odd)) {
      		result = from32to16(result);
      		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
      	}
      
      ... while generic lib/checksum.c does:
      
      	result = from32to16(result);
      	if (odd)
      		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
      
      This makes the helper return different values on different architectures,
      breaking the test on non-x86. To fix this, add an additional instruction
      to always mask the return value to 16 bits, and update the expected return
      value accordingly.
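
      A small sketch of why masking makes the two behaviours agree for one
      value. The input 0xffffffe3 is an assumption chosen for illustration,
      not a value taken from this commit, and mask16() is a hypothetical
      name. from32to16() follows the usual end-around-carry fold.

```c
#include <stdint.h>

/* Fold a 32-bit partial checksum into 16 bits, adding the carries back in. */
static uint16_t from32to16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}

/* The extra step added to the test: keep only the low 16 bits. */
static uint32_t mask16(uint32_t ret)
{
	return ret & 0xffff;
}
```

      For this input the unfolded and folded results differ as 32-bit values
      but are equal after masking. Masking is not a general replacement for
      folding, because folding adds the carries back in.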
      
      Fixes: fb2abb73 ("bpf, selftest: test {rd, wr}only flags and direct value access")
      Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210228103017.320240-1-yauheni.kaliuta@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hangbin Liu's avatar
      selftests/bpf: No need to drop the packet when there is no geneve opt · 57b9f13e
      Hangbin Liu authored
      commit 557c223b upstream.
      
      In the bpf geneve tunnel test we set the geneve option on the tx side.
      On the rx side we only call bpf_skb_get_tunnel_opt(). Since commit
      9c2e14b4 ("ip_tunnels: Set tunnel option flag when tunnel metadata is
      present") geneve_rx() will not add the TUNNEL_GENEVE_OPT flag if there
      is no geneve option, which causes bpf_skb_get_tunnel_opt() to return
      ENOENT and _geneve_get_tunnel() in test_tunnel_kern.c to drop the packet.

      As it should be valid for bpf_skb_get_tunnel_opt() to return an error
      when there is no tunnel option, there is no need to drop the packet and
      break all geneve rx traffic. Just set opt_class to 0 in this test and
      keep returning TC_ACT_OK.
      
      Fixes: 933a741e ("selftests/bpf: bpf tunnel test.")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: William Tu <u9012063@gmail.com>
      Link: https://lore.kernel.org/bpf/20210224081403.1425474-1-liuhangbin@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Vasily Averin's avatar
      netfilter: x_tables: gpf inside xt_find_revision() · 82e85c0e
      Vasily Averin authored
      commit 8e24eddd upstream.
      
      Nested target/match_revfn() calls work with the xt[NFPROTO_UNSPEC] lists
      without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload
      and cause the host to crash:
      
      general protection fault: 0000 [#1]
      Modules linked in: ... [last unloaded: xt_cluster]
      CPU: 0 PID: 542455 Comm: iptables
      RIP: 0010:[<ffffffff8ffbd518>]  [<ffffffff8ffbd518>] strcmp+0x18/0x40
      RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
      R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
      (VvS: %R15 -- &xt_match,  %RDI -- &xt_match.name,
      xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
      Call Trace:
       [<ffffffff902ccf44>] match_revfn+0x54/0xc0
       [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
       [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
       [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
       [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
       [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
       [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
       [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
       [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
       [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a
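
One way to close this kind of race is to have each list walk take the lock of the list it iterates, including the nested walk over the unspecified-family list. The sketch below is a single-threaded userspace model under that assumption, not the kernel code: `toy_mutex` only records ownership, and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy lock that only records ownership; a real mutex would also block. */
struct toy_mutex { int held; };

static void toy_lock(struct toy_mutex *m)   { assert(!m->held); m->held = 1; }
static void toy_unlock(struct toy_mutex *m) { assert(m->held);  m->held = 0; }

struct toy_match {
    const char *name;
    unsigned char revision;
    struct toy_match *next;
};

enum { FAM_UNSPEC = 0, FAM_IPV4 = 1, FAM_COUNT };

/* One lock and one singly linked list per protocol family. */
struct toy_family { struct toy_mutex mutex; struct toy_match *matches; };
struct toy_family toy_xt[FAM_COUNT];

void toy_register(int fam, struct toy_match *m)
{
    toy_lock(&toy_xt[fam].mutex);
    m->next = toy_xt[fam].matches;
    toy_xt[fam].matches = m;
    toy_unlock(&toy_xt[fam].mutex);
}

/* Returns 1 if (name, revision) exists; *bestp tracks the highest revision seen. */
int toy_match_revfn(int fam, const char *name, unsigned char revision, int *bestp)
{
    const struct toy_match *m;
    int have_rev = 0;

    assert(name != NULL && bestp != NULL);

    /* Each walk holds the lock of the list it is iterating. */
    toy_lock(&toy_xt[fam].mutex);
    for (m = toy_xt[fam].matches; m != NULL; m = m->next) {
        if (strcmp(m->name, name) == 0) {
            if (m->revision > *bestp)
                *bestp = m->revision;
            if (m->revision == revision)
                have_rev = 1;
        }
    }
    toy_unlock(&toy_xt[fam].mutex);

    /* The nested call takes the FAM_UNSPEC lock itself before walking. */
    if (fam != FAM_UNSPEC && !have_rev)
        return toy_match_revfn(FAM_UNSPEC, name, revision, bestp);
    return have_rev;
}
```

Dropping the family lock before recursing means only one list lock is held at a time, which keeps the model free of lock-ordering concerns.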
      
      Fixes: 656caff2 ("netfilter 04/09: x_tables: fix match/target revision lookup")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_nat: undo erroneous tcp edemux lookup · f66b8e73
      Florian Westphal authored
      commit 03a3ca37 upstream.
      
      Under extremely rare conditions TCP early demux will retrieve the wrong
      socket.
      
      1. local machine establishes a connection to a remote server, S, on port
         p.
      
         This gives:
         laddr:lport -> S:p
         ... both in tcp and conntrack.
      
      2. local machine establishes a connection to host H, on port p2.
         2a. TCP stack chooses same laddr:lport, so we have
             laddr:lport -> H:p2 from TCP point of view.
         2b. There is a destination NAT rewrite in place, translating
             H:p2 to S:p.  This results in following conntrack entries:
      
         I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
         II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
      
         NAT engine has rewritten laddr:lport to laddr:lport2 to map
         the reply packet to the correct origin.
      
         When the server sends SYN/ACK to laddr:lport2, the PREROUTING hook
         will undo the SNAT transformation, rewriting IP header to
         S:p -> laddr:lport
      
         This causes TCP early demux to associate the skb with the TCP socket
         of the first connection.
      
         The INPUT hook will then reverse the DNAT transformation, rewriting
         the IP header to H:p2 -> laddr:lport.
      
      Because the packet ends up with the wrong socket, the new connection
      never completes: the originator stays in SYN_SENT and the conntrack entry
      remains in SYN_RECV until timeout, and the responder retransmits SYN/ACK
      until it gives up.
      
      To resolve this, orphan the skb after the input rewrite: because the
      source IP address changed, the socket must be incorrect.
      We can't move the DNAT undo to prerouting due to backwards
      compatibility; doing so would make iptables/nftables rules no longer
      match the way they did.
      
      After the orphan, the packet will be handed to the next protocol layer
      (tcp, udp, ...), which will repeat the socket lookup just as if
      early demux was disabled.
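
The orphan-then-relookup idea can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: `toy_skb` and `toy_sock` are hypothetical stand-ins, sockets are matched on remote address and port alone, and addresses are plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* A local socket, identified here only by the remote endpoint it talks to. */
struct toy_sock { unsigned int raddr; unsigned short rport; };

/* A received packet with the socket that early demux attached, if any. */
struct toy_skb {
    unsigned int saddr;
    unsigned short sport;
    const struct toy_sock *sk;
};

/* Models the INPUT-hook rewrite of the source endpoint (DNAT undo). */
void toy_input_rewrite(struct toy_skb *skb, unsigned int new_saddr,
                       unsigned short new_sport)
{
    unsigned int old_saddr;

    assert(skb != NULL);
    old_saddr = skb->saddr;
    skb->saddr = new_saddr;
    skb->sport = new_sport;

    /* The source address changed, so the attached socket cannot be right. */
    if (skb->sk != NULL && old_saddr != new_saddr)
        skb->sk = NULL; /* "orphan" the packet */
}

/* Models the transport layer: look the socket up only if none is attached. */
const struct toy_sock *toy_deliver(struct toy_skb *skb,
                                   const struct toy_sock *socks, size_t n)
{
    size_t i;

    assert(skb != NULL);
    if (skb->sk == NULL) {
        for (i = 0; i < n; i++) {
            if (socks[i].raddr == skb->saddr && socks[i].rport == skb->sport) {
                skb->sk = &socks[i];
                break;
            }
        }
    }
    return skb->sk;
}
```

In the scenario above, a packet that early demux tied to the S:p socket loses that association once its source becomes H:p2, and the regular lookup then finds the second connection's socket.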
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: add sanity tests to TCP_QUEUE_SEQ · 3bf89943
      Eric Dumazet authored
      commit 8811f4a9 upstream.
      
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
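
The rule "only allow a sequence change while the selected queue is empty" can be sketched as a small userspace model. It is illustrative only: `toy_tcp`, its explicit queue-length fields, and the choice of error codes are assumptions, and the real kernel may decide emptiness differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_queue { TOY_NO_QUEUE = 0, TOY_RECV_QUEUE, TOY_SEND_QUEUE };

/* Hypothetical, heavily reduced view of a socket in repair mode. */
struct toy_tcp {
    enum toy_queue repair_queue; /* queue selected for repair */
    size_t snd_queue_len;        /* bytes queued for transmit */
    size_t rcv_queue_len;        /* bytes restored for receive */
    unsigned int write_seq;      /* next sequence number to send */
    unsigned int rcv_nxt;        /* next sequence number expected */
};

/* Models setting the queue sequence: refused unless that queue is empty. */
int toy_set_queue_seq(struct toy_tcp *tp, unsigned int val)
{
    assert(tp != NULL);

    switch (tp->repair_queue) {
    case TOY_SEND_QUEUE:
        if (tp->snd_queue_len != 0)
            return -EPERM;
        tp->write_seq = val;
        return 0;
    case TOY_RECV_QUEUE:
        if (tp->rcv_queue_len != 0)
            return -EPERM;
        tp->rcv_nxt = val;
        return 0;
    default:
        return -EINVAL;
    }
}
```

Replaying the reported sequence against this model (restore 20 bytes into the receive queue, then set the sequence to 128) is rejected instead of leaving the sequence numbers inconsistent with the queued data.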
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
      Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode · b7049b61
      Torin Cooper-Bennun authored
      commit 27126252 upstream.
      
      This patch prevents a potentially destructive race condition. The
      device is fully operational on the bus after entering Normal Mode, so
      zeroing the MRAM after entering this mode may lead to loss of
      information, e.g. newly received messages.
      
      This patch fixes the problem by first initializing the MRAM, then
      bringing the device into Normal Mode.
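
Why the order matters can be shown with a toy device model. Everything here is hypothetical (the `toy_*` names, the 8-byte MRAM, the standby mode used as the pre-init state); the hook simply simulates a frame arriving between the two initialization steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_mode { TOY_STANDBY = 0, TOY_NORMAL };

struct toy_dev {
    enum toy_mode mode;
    unsigned char mram[8];
    int frames_accepted;
};

/* A frame from the bus is stored only while the device is in normal mode. */
int toy_bus_rx(struct toy_dev *d, unsigned char payload)
{
    assert(d != NULL);
    if (d->mode != TOY_NORMAL)
        return 0;
    d->mram[0] = payload;
    d->frames_accepted++;
    return 1;
}

/* Simulates bus traffic that shows up in the middle of initialization. */
void toy_frame_arrives(struct toy_dev *d)
{
    (void)toy_bus_rx(d, 0xAB);
}

typedef void (*toy_hook)(struct toy_dev *);

/* Racy order: go live first, then zero the MRAM (may wipe a fresh frame). */
void toy_init_racy(struct toy_dev *d, toy_hook between)
{
    d->mode = TOY_NORMAL;
    if (between != NULL)
        between(d);
    memset(d->mram, 0, sizeof(d->mram));
}

/* Fixed order: zero the MRAM first, then go live. */
void toy_init_fixed(struct toy_dev *d, toy_hook between)
{
    memset(d->mram, 0, sizeof(d->mram));
    if (between != NULL)
        between(d);
    d->mode = TOY_NORMAL;
}
```

With the racy order the device accepts the frame and then erases it; with the fixed order nothing can be stored until the memory has already been cleared.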
      
      Fixes: 5443c226 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
      Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode · a7e187a8
      Joakim Zhang authored
      commit c6382004 upstream.
      
      Invoke flexcan_chip_freeze() to enter freeze mode, since the freeze
      mode acknowledge needs to be polled.
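
Polling for an acknowledge bit with a bounded number of attempts can be sketched as follows. This is a userspace model, not the driver: the register layout, the bit values, and the `toy_*` names are all assumptions, and the simulated hardware raises the acknowledge after a configurable number of reads.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative bit positions; the real register layout is not modelled. */
#define TOY_MCR_FRZ     0x1u
#define TOY_MCR_HALT    0x2u
#define TOY_MCR_FRZ_ACK 0x4u

struct toy_regs {
    unsigned int mcr;
    int ack_delay; /* reads needed before the hardware acknowledges */
};

/* Simulated register read: the ack appears some reads after the request. */
static unsigned int toy_read_mcr(struct toy_regs *r)
{
    unsigned int req = TOY_MCR_FRZ | TOY_MCR_HALT;

    if ((r->mcr & req) == req && r->ack_delay-- <= 0)
        r->mcr |= TOY_MCR_FRZ_ACK;
    return r->mcr;
}

/* Request freeze mode, then poll for the acknowledge with a retry limit. */
int toy_chip_freeze(struct toy_regs *r, int max_polls)
{
    assert(r != NULL);
    r->mcr |= TOY_MCR_FRZ | TOY_MCR_HALT;

    while (max_polls-- > 0) {
        if (toy_read_mcr(r) & TOY_MCR_FRZ_ACK)
            return 0;
    }
    return -ETIMEDOUT;
}
```

Setting the request bits alone does not guarantee that the controller has stopped; the caller learns that only from the acknowledge, or from the timeout when it never arrives.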
      
      Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
      Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>