  1. Jun 30, 2020
    • net: sched: export __netdev_watchdog_up() · 32e5a15f
      Valentin Longchamp authored
      [ Upstream commit 1a3db27a ]
      
      Since the quiesce/activate rework, __netdev_watchdog_up() is directly
      called in the ucc_geth driver.
      
      Unfortunately, this function is not available for modules and thus
      ucc_geth cannot be built as a module anymore. Fix it by exporting
      __netdev_watchdog_up().
      
      Since the commit introducing the regression was backported to stable
      branches, this one should ideally be as well.
      
      Fixes: 79dde73c ("net/ethernet/freescale: rework quiesce/activate for ucc_geth")
      Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: fix a block group ref counter leak after failure to remove block group · 9d3d40ec
      Sasha Levin authored
      [ Upstream commit 9fecd132 ]
      
      When removing a block group, if we fail to delete the block group's item
      from the extent tree, we jump to the 'out' label and end up decrementing
      the block group's reference count once only (by 1), resulting in a counter
      leak because the block group at that point was already removed from the
      block group cache rbtree - so we have to decrement the reference count
      twice, once for the rbtree and once for our lookup at the start of the
      function.
      
      There is a second bug where if removing the free space tree entries (the
      call to remove_block_group_free_space()) fails we end up jumping to the
      'out_put_group' label but end up decrementing the reference count only
      once, when we should have done it twice, since we have already removed
      the block group from the block group cache rbtree. This happens because
      the reference count decrement for the rbtree reference happens after
      attempting to remove the free space tree entries, which is far away from
      the place where we remove the block group from the rbtree.
      
      To make things less error prone, decrement the reference count for the
      rbtree immediately after removing the block group from it. This also
      eliminates the need for two different exit labels on error, renaming
      'out_put_label' to just 'out' and removing the old 'out'.
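
      The reference counting described above can be sketched in a few lines of
      Python. This is only a user-space model, not the btrfs code: the function
      name, the step names and the plain integer counter are assumptions made
      for illustration.

```python
def remove_block_group(refs, fail_step=None):
    """Model of the fixed flow; returns (failed step or None, remaining refs)."""
    refs += 1  # reference taken by the lookup at the start of the function
    # ... the block group is removed from the block group cache rbtree here ...
    refs -= 1  # the fix: drop the rbtree's reference right after the removal
    error = None
    # Hypothetical names for the two later steps that the commit says can fail.
    for step in ("free_space_tree", "extent_tree_item"):
        if step == fail_step:
            error = step
            break
    # Single 'out' label: the lookup reference is dropped on every path.
    refs -= 1
    return error, refs


# A cached block group starts with one reference, owned by the rbtree.
for fail_step in (None, "free_space_tree", "extent_tree_item"):
    error, refs = remove_block_group(1, fail_step)
    print(fail_step, "->", refs)
```

      In this model every path, successful or not, ends with a count of zero,
      which is the property the single exit label is meant to guarantee.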
      
      Fixes: f6033c5e ("btrfs: fix block group leak when removing fails")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "i2c: tegra: Fix suspending in active runtime PM state" · 8ae850cd
      Thierry Reding authored
      [ Upstream commit 78ad7342 ]
      
      This reverts commit 9f42de8d.
      
      It's not safe to use pm_runtime_force_{suspend,resume}(), especially
      during the noirq phase of suspend. See also the guidance provided in
      commit 1e2ef05b ("PM: Limit race conditions between runtime PM and
      system sleep (v2)").
      
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT · 052a7fdd
      Neal Cardwell authored
      [ Upstream commit b344579c ]
      
      Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where the
      Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
      ACK when the minimum RTT of a connection goes down. Inspection of the
      existing code shows that this could happen in an example like the
      following:
      
      o The first 8 RTT samples in a round trip are 150ms, resulting in a
        curr_rtt of 150ms and a delay_min of 150ms.
      
      o The 9th RTT sample is 100ms. The curr_rtt does not change after the
        first 8 samples, so curr_rtt remains 150ms. But delay_min can be
        lowered at any time, so delay_min falls to 100ms. The code executes
        the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
        of 100ms, and the curr_rtt is declared far enough above delay_min to
        force a (spurious) exit of Slow Start.
      
      The fix here is simple: allow every RTT sample in a round trip to
      lower the curr_rtt.
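
      The example above can be replayed with a small Python model. It is a
      simplified sketch, not the tcp_cubic code: the function names are
      hypothetical, and the delay threshold (delay_min / 8, clamped to
      4..16 ms) is an assumption of this model.

```python
HYSTART_MIN_SAMPLES = 8  # the commit talks about "the first 8 RTT samples"


def delay_thresh_ms(delay_min_ms):
    # Assumed threshold for this sketch: delay_min / 8, clamped to 4..16 ms.
    return min(max(delay_min_ms / 8, 4), 16)


def exits_slow_start(samples_ms, fixed):
    """Return True if the HYSTART_DELAY check fires during one round trip."""
    curr_rtt = None
    delay_min = None
    sample_cnt = 0
    for rtt in samples_ms:
        # delay_min can be lowered at any time.
        if delay_min is None or rtt < delay_min:
            delay_min = rtt
        # Fixed behaviour: every sample may lower curr_rtt.
        if fixed and (curr_rtt is None or rtt < curr_rtt):
            curr_rtt = rtt
        if sample_cnt < HYSTART_MIN_SAMPLES:
            # Old behaviour: only the first 8 samples update curr_rtt.
            if not fixed and (curr_rtt is None or rtt < curr_rtt):
                curr_rtt = rtt
            sample_cnt += 1
        elif curr_rtt > delay_min + delay_thresh_ms(delay_min):
            return True
    return False


samples = [150] * 8 + [100]  # eight 150 ms samples, then one 100 ms sample
print(exits_slow_start(samples, fixed=False))
print(exits_slow_start(samples, fixed=True))
```

      With the old update rule the 9th sample compares a stale curr_rtt of
      150 ms against the new delay_min of 100 ms; with the fix both values are
      100 ms and the check does not fire.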
      
      Fixes: ae27e98a ("[TCP] CUBIC v2.3")
      Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: fix a few style nits · 94231513
      Toke Høiland-Jørgensen authored
      [ Upstream commit 3f608f0c ]
      
      I spotted a few nits when comparing the in-tree version of sch_cake with
      the out-of-tree one: A redundant error variable declaration shadowing an
      outer declaration, and an indentation alignment issue. Fix both of these.
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: don't call diffserv parsing code when it is not needed · b1aa7e5f
      Toke Høiland-Jørgensen authored
      [ Upstream commit 8c95eca0 ]
      
      As a further optimisation of the diffserv parsing codepath, we can skip it
      entirely if CAKE is configured to neither use diffserv-based
      classification, nor to zero out the diffserv bits.
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: don't try to reallocate or unshare skb unconditionally · ea2628dd
      Ilya Ponetayev authored
      [ Upstream commit 9208d286 ]
      
      cake_handle_diffserv() tries to linearize the MAC and network header parts
      of the skb and to make it writable unconditionally. In some cases this leads
      to full skb reallocation, which reduces throughput and increases CPU load.
      Some measurements of IPv4 forward + NAPT on a MIPS router with a 580 MHz
      single-core CPU were conducted. It appears that on kernel 4.9
      skb_try_make_writable() reallocates the skb if it was allocated in the
      ethernet driver via the so-called 'build skb' method from the page cache
      (this was first noticed as a strange increase in the kmalloc-2048 slab).
      
      Obtain DSCP value via read-only skb_header_pointer() call, and leave
      linearization only for DSCP bleaching or ECN CE setting. And, as an
      additional optimisation, skip diffserv parsing entirely if it is not needed
      by the current configuration.
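
      The split between a read-only DSCP lookup and the rarer write path can be
      illustrated with a user-space Python sketch. The helper names are
      hypothetical, and the condition used for "needs a writable header" is an
      assumption of this model rather than a quote of the sch_cake code.

```python
def ipv4_dscp_ecn(header: bytes):
    """Return (dscp, ecn) from an IPv4 header without modifying it."""
    tos = header[1]  # second byte: DSCP in the upper 6 bits, ECN in the lower 2
    return tos >> 2, tos & 0x3


def needs_writable_header(dscp, wash, mark_ce):
    # Assumption: write access is only needed when a non-zero DSCP is being
    # bleached to zero, or when the ECN CE codepoint has to be set.
    return (wash and dscp != 0) or mark_ce


# Minimal 20-byte IPv4 header with ToS 0xB8; all other fields are left zero.
header = bytes([0x45, 0xB8]) + bytes(18)
dscp, ecn = ipv4_dscp_ecn(header)
print(dscp, ecn)
print(needs_writable_header(dscp, wash=False, mark_ce=False))
print(needs_writable_header(dscp, wash=True, mark_ce=False))
```

      Classification only needs the first helper, so in the common case the
      header is never touched.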
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
      [ fix a few style issues, reflow commit message ]
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip_tunnel: fix use-after-free in ip_tunnel_lookup() · 3c620826
      Taehee Yoo authored
      [ Upstream commit ba61539c ]
      
      In the datapath, ip_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback (fb) interface is
      deleted, but there is no routine that does so.
      As a result, the pointer is still used after the interface is deleted,
      which eventually results in a use-after-free.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
      	    remote 10.0.0.2
          ip netns exec A ip link set gre1 up
          ip netns exec A ip a a 10.0.100.1/24 dev gre1
          ip netns exec A ip a a 10.0.0.1/24 dev eth0
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
      	    remote 10.0.0.1
          ip netns exec B ip link set gre1 up
          ip netns exec B ip a a 10.0.100.2/24 dev gre1
          ip netns exec B ip a a 10.0.0.2/24 dev eth1
          ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
          ip netns del B
      
      Splat looks like:
      [   77.793450][    C3] ==================================================================
      [   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
      [   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
      [   77.796398][    C3]
      [   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
      [   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   77.798453][    C3] Call Trace:
      [   77.798815][    C3]  <IRQ>
      [   77.799142][    C3]  dump_stack+0x9d/0xdb
      [   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
      [   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.802145][    C3]  kasan_report+0x154/0x190
      [   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
      [   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
      [   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
      [   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
      [   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
      [   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
      [   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
      [   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
      [ ... ]
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phy: Check harder for errors in get_phy_id() · 9baf076d
      Florian Fainelli authored
      [ Upstream commit b2ffc75e ]
      
      Commit 02a6efca ("net: phy: allow scanning busses with missing
      phys") added a special condition to return -ENODEV in case -ENODEV or
      -EIO was returned from the first read of the MII_PHYSID1 register.
      
      In case the MDIO bus data line pull-up is not strong enough, the MDIO
      bus controller will not flag this as a read error. This can happen when
      a pluggable daughter card is not connected and weak internal pull-ups
      are used (since that is the only option, otherwise the pins are
      floating).
      
      The second read of MII_PHYSID2 will be correctly flagged as an error
      though, but now we will return -EIO which will be treated as a hard
      error, thus preventing MDIO bus scanning loops from continuing
      successfully.
      
      Apply the same logic to both register reads, thus allowing the scanning
      logic to proceed.
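
      A Python model of "the same logic for both register reads" might look
      like the sketch below. It is not the phylib code: read_reg is a
      hypothetical callback standing in for the MDIO bus read, the register
      values in the usage lines are arbitrary, and passing other errors through
      unchanged is an assumption of this sketch.

```python
import errno

MII_PHYSID1 = 0x02  # PHY identifier, upper 16 bits
MII_PHYSID2 = 0x03  # PHY identifier, lower 16 bits


def get_phy_id(read_reg):
    """read_reg(regnum) returns a 16-bit value or a negative errno.

    Returns (status, phy_id); phy_id is None when status is non-zero.
    """
    phy_id = 0
    for reg, shift in ((MII_PHYSID1, 16), (MII_PHYSID2, 0)):
        val = read_reg(reg)
        if val < 0:
            # Same rule for both reads: -EIO and -ENODEV mean "no PHY here",
            # which lets a bus scan carry on with the next address.
            if val in (-errno.EIO, -errno.ENODEV):
                return -errno.ENODEV, None
            # Assumption: any other error is passed through unchanged.
            return val, None
        phy_id |= val << shift
    return 0, phy_id


def weak_pullup_read(reg):
    # Hypothetical bus: the first read is not flagged, the second fails.
    return 0x0000 if reg == MII_PHYSID1 else -errno.EIO


print(get_phy_id(weak_pullup_read))
print(get_phy_id(lambda reg: 0x1234 if reg == MII_PHYSID1 else 0x5678))
```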
      
      Fixes: 02a6efca ("net: phy: allow scanning busses with missing phys")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: fix use-after-free in ip6gre_tunnel_lookup() · 568c5aaf
      Taehee Yoo authored
      [ Upstream commit dafabb65 ]
      
      In the datapath, ip6gre_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback (fb) interface is
      deleted, but there is no routine that does so.
      As a result, the pointer is still used after the interface is deleted,
      which eventually results in a use-after-free.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
      	    remote fc:0::2
          ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
          ip netns exec A ip link set ip6gre1 up
          ip netns exec A ip -6 a a fc:0::1/64 dev eth0
          ip netns exec A ip link set ip6gre0 up
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
      	    remote fc:0::1
          ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
          ip netns exec B ip link set ip6gre1 up
          ip netns exec B ip -6 a a fc:0::2/64 dev eth1
          ip netns exec B ip link set ip6gre0 up
          ip netns exec A ping fc:100::2 -s 60000 &
          ip netns del B
      
      Splat looks like:
      [   73.087285][    C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.088361][    C1] Read of size 4 at addr ffff888040559218 by task ping/1429
      [   73.089317][    C1]
      [   73.089638][    C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602
      [   73.090531][    C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   73.091725][    C1] Call Trace:
      [   73.092160][    C1]  <IRQ>
      [   73.092556][    C1]  dump_stack+0x96/0xdb
      [   73.093122][    C1]  print_address_description.constprop.6+0x2cc/0x450
      [   73.094016][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.094894][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.095767][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.096619][    C1]  kasan_report+0x154/0x190
      [   73.097209][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.097989][    C1]  ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.098750][    C1]  ? gre_del_protocol+0x60/0x60 [gre]
      [   73.099500][    C1]  gre_rcv+0x1c5/0x1450 [ip6_gre]
      [   73.100199][    C1]  ? ip6gre_header+0xf00/0xf00 [ip6_gre]
      [   73.100985][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.101830][    C1]  ? ip6_input_finish+0x5/0xf0
      [   73.102483][    C1]  ip6_protocol_deliver_rcu+0xcbb/0x1510
      [   73.103296][    C1]  ip6_input_finish+0x5b/0xf0
      [   73.103920][    C1]  ip6_input+0xcd/0x2c0
      [   73.104473][    C1]  ? ip6_input_finish+0xf0/0xf0
      [   73.105115][    C1]  ? rcu_read_lock_held+0x90/0xa0
      [   73.105783][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.106548][    C1]  ipv6_rcv+0x1f1/0x300
      [ ... ]
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tg3: driver sleeps indefinitely when EEH errors exceed eeh_max_freezes · 35db6386
      David Christensen authored
      [ Upstream commit 3a2656a2 ]
      
      The driver function tg3_io_error_detected() calls napi_disable twice,
      without an intervening napi_enable, when the number of EEH errors exceeds
      eeh_max_freezes, resulting in an indefinite sleep while holding rtnl_lock.
      
      Add a check for pcierr_recovery which skips code already executed for the
      "Frozen" state.
      
      Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: grow window for OOO packets only for SACK flows · fe3a5d8f
      Eric Dumazet authored
      [ Upstream commit 66205121 ]
      
      Back in 2013, we made a change that broke fast retransmit
      for non-SACK flows.
      
      Indeed, for these flows, a sender needs to receive three duplicate
      ACKs before starting fast retransmit. ACKs sent with a different
      receive window do not count.
      
      Even if enabling SACK is strongly recommended these days,
      there still are some cases where it has to be disabled.
      
      Not increasing the window seems better than having to
      rely on RTO.
      
      After the fix, the following packetdrill test gives:
      
      // Initialize connection
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
         +0 < . 1:1(0) ack 1 win 514
      
         +0 accept(3, ..., ...) = 4
      
         +0 < . 1:1001(1000) ack 1 win 514
      // Quick ack
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 2001:3001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 3001:4001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 4001:5001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 1001:2001(1000) ack 1 win 514
      // Hole is repaired.
         +0 > . 1:1(0) ack 5001 win 272
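
      The duplicate-ACK rule this commit relies on (an ACK only counts as a
      duplicate if it repeats both the ACK number and the advertised window)
      can be sketched in Python. The function name is hypothetical; the first
      input reuses the ACK and window values from the packetdrill test above,
      while the growing-window input is made up purely for contrast.

```python
FAST_RETRANSMIT_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit


def count_duplicate_acks(acks):
    """acks: list of (ack_number, window) pairs for pure ACKs, in arrival order.

    Returns the length of the trailing run of ACKs identical to their
    predecessor, which is what a non-SACK sender counts.
    """
    dupacks = 0
    prev = None
    for ack in acks:
        if ack == prev:
            dupacks += 1
        else:
            dupacks = 0  # different ACK number or window: not a duplicate
        prev = ack
    return dupacks


# ACKs from the test above: one quick ACK, then three ACKs with the same window.
stable_window = [(1001, 264)] * 4
# Hypothetical receiver that grows its window on every out-of-order segment.
growing_window = [(1001, 264), (1001, 266), (1001, 268), (1001, 270)]

print(count_duplicate_acks(stable_window) >= FAST_RETRANSMIT_THRESHOLD)
print(count_duplicate_acks(growing_window) >= FAST_RETRANSMIT_THRESHOLD)
```

      In the second case the sender in this model never reaches three
      duplicates, so recovery would have to wait for the RTO.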
      
      Fixes: 4e4f1fc2 ("tcp: properly increase rcv_ssthresh for ofo packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: don't ignore ECN CWR on pure ACK · cb22ce33
      Denis Kirjanov authored
      [ Upstream commit 25702840 ]
      
      There is a problem with the CWR flag set in an incoming ACK segment:
      it is ignored, which leads to the ECE flag being latched forever.
      
      The following packetdrill script shows what happens:
      
      // Stack receives incoming segments with CE set
      +0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
      +0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
      +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
      
      // Stack responds with ECN ECHO
      +0.0 >[noecn]  . 1001:1001(0) ack 12001
      +0.0 >[noecn] E. 1001:1001(0) ack 13001
      +0.0 >[noecn] E. 1001:1001(0) ack 14001
      
      // Write a packet
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
      
      // Pure ACK received
      +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
      
      // Since CWR was sent, this packet should NOT have ECE set
      
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0]  P. 2001:3001(1000) ack 14001
      // but Linux will still keep ECE latched here, with packetdrill
      // flagging a missing ECE flag, expecting
      // >[ect0] PE. 2001:3001(1000) ack 14001
      // in the script
      
      In the situation above we will continue to send ECN ECHO packets
      and trigger the peer to reduce the congestion window. To avoid that
      we can check CWR on pure ACKs received.
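
      The latching behaviour can be modelled with a tiny state machine in
      Python. This is a deliberately simplified sketch: the class, its
      attribute names and the single boolean flag are hypothetical and leave
      out the sequence check mentioned in the v3 notes.

```python
class EcnReceiver:
    """Tracks whether outgoing segments still need to carry ECE."""

    def __init__(self, honour_cwr_on_pure_ack):
        self.ece_latched = False
        self.honour_cwr_on_pure_ack = honour_cwr_on_pure_ack

    def receive(self, ce=False, cwr=False, payload_len=0):
        # CWR from the peer means it has reacted, so ECE can be released.
        # The old behaviour only looked at CWR on segments carrying data.
        if cwr and (payload_len > 0 or self.honour_cwr_on_pure_ack):
            self.ece_latched = False
        # A CE mark latches ECE until the peer answers with CWR.
        if ce:
            self.ece_latched = True

    def next_segment_has_ece(self):
        return self.ece_latched


def replay(honour_cwr_on_pure_ack):
    rx = EcnReceiver(honour_cwr_on_pure_ack)
    rx.receive(payload_len=1000)            # [ect0] data
    rx.receive(ce=True, payload_len=1000)   # [ce] data
    rx.receive(payload_len=1000)            # [ect0] data
    rx.receive(cwr=True, payload_len=0)     # pure ACK carrying CWR
    return rx.next_segment_has_ece()


print(replay(honour_cwr_on_pure_ack=False))
print(replay(honour_cwr_on_pure_ack=True))
```

      The replay follows the incoming segments of the script above: without
      the check ECE stays latched after the pure ACK, with it ECE is released.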
      
      v3:
      - Add a sequence check to avoid sending an ACK to an ACK
      
      v2:
      - Adjusted the comment
      - move CWR check before checking for unacknowledged packets
      
      Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket · dc43f7e8
      Marcelo Ricardo Leitner authored
      [ Upstream commit 471e39df ]
      
      If a socket is set ipv6only, it will still send IPv4 addresses in the
      INIT and INIT_ACK packets. This potentially misleads the peer into using
      them, which then would cause association termination.
      
      The fix is to not add IPv4 addresses to ipv6only sockets.
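
      As a rough illustration of the rule, the address list placed into
      INIT/INIT_ACK can be thought of as a filter over the bound addresses. The
      function below is a hypothetical user-space sketch using documentation
      addresses, not the SCTP code.

```python
import ipaddress


def advertised_addresses(bound, ipv6only):
    """Return the addresses that would be advertised to the peer."""
    addrs = [ipaddress.ip_address(a) for a in bound]
    if ipv6only:
        # An ipv6only socket must not offer IPv4 addresses to the peer.
        addrs = [a for a in addrs if a.version == 6]
    return [str(a) for a in addrs]


bound = ["192.0.2.1", "2001:db8::1"]
print(advertised_addresses(bound, ipv6only=False))
print(advertised_addresses(bound, ipv6only=True))
```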
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rxrpc: Fix notification call on completion of discarded calls · fea86448
      David Howells authored
      [ Upstream commit 0041cd5a ]
      
      When preallocated service calls are being discarded, they're passed to
      ->discard_new_call() to have the caller clean up any attached higher-layer
      preallocated pieces before being marked completed.  However, the act of
      marking them completed now invokes the call's notification function - which
      causes a problem because that function might assume that the previously
      freed pieces of memory are still there.
      
      Fix this by setting a dummy notification function on the socket after
      calling ->discard_new_call().
      
      Without this fix, the following KASAN message is seen when the kafs
      module is removed.
      
      ==================================================================
      BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
      Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21
      
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x18f/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
       rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
       __rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
       __rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
       rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
       rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 6820:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
       kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
       afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
       afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
       afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
       ops_init+0xaf/0x420 net/core/net_namespace.c:151
       setup_net+0x2de/0x860 net/core/net_namespace.c:341
       copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
       create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
       ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
       __do_sys_unshare kernel/fork.c:3051 [inline]
       __se_sys_unshare kernel/fork.c:3049 [inline]
       __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 21:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
       rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      The buggy address belongs to the object at ffff8880946c3800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 484 bytes inside of
       1024-byte region [ffff8880946c3800, ffff8880946c3c00)
      The buggy address belongs to the page:
      page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
      raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Reported-by: syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com
      Fixes: 5ac0d622 ("rxrpc: Fix missing notification")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rocker: fix incorrect error handling in dma_rings_init · 6956830c
      Aditya Pakki authored
      [ Upstream commit 58d0c864 ]
      
      In rocker_dma_rings_init(), the goto targets used when
      rocker_dma_cmd_ring_waits_alloc() or rocker_dma_ring_create() fails
      are incorrect. This patch fixes the order so that it is consistent
      with the cleanup in rocker_dma_rings_fini().
      
      Signed-off-by: Aditya Pakki <pakki001@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • openvswitch: take into account de-fragmentation/gso_size in execute_check_pkt_len · a908f986
      Lorenzo Bianconi authored
      [ Upstream commit 17843655 ]
      
      The ovs connection tracking module performs de-fragmentation on incoming
      fragmented traffic. Take into account whether traffic has been
      de-fragmented in the execute_check_pkt_len action; otherwise we will
      perform the wrong nested action, considering the original packet size.
      This issue typically occurs if ovs-vswitchd adds a rule in the pipeline
      that requires connection tracking (e.g. OVN stateful ACLs) before the
      execute_check_pkt_len action.
      Moreover, take into account the GSO fragment size for GSO packets in the
      execute_check_pkt_len routine.
      
      Fixes: 4d5ec89f ("net: openvswitch: Add a new action check_pkt_len")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: usb: ax88179_178a: fix packet alignment padding · 27b70214
      Jeremy Kerr authored
      [ Upstream commit e869e7a1 ]
      
      Using an AX88179 device (0b95:1790), I see two bytes of appended data on
      every RX packet. For example, this 48-byte ping, using 0xff as a
      payload byte:
      
        04:20:22.528472 IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 2447, seq 1, length 64
      	0x0000:  000a cd35 ea50 000a cd35 ea4f 0800 4500
      	0x0010:  0054 c116 4000 4001 f63e c0a8 0101 c0a8
      	0x0020:  0102 0800 b633 098f 0001 87ea cd5e 0000
      	0x0030:  0000 dcf2 0600 0000 0000 ffff ffff ffff
      	0x0040:  ffff ffff ffff ffff ffff ffff ffff ffff
      	0x0050:  ffff ffff ffff ffff ffff ffff ffff ffff
      	0x0060:  ffff 961f
      
      Those last two bytes - 96 1f - aren't part of the original packet.
      
      In the ax88179 RX path, the usbnet rx_fixup function trims a 2-byte
      'alignment pseudo header' from the start of the packet, and sets the
      length from a per-packet field populated by hardware. It looks like that
      length field *includes* the 2-byte header; the current driver assumes
      that it's excluded.
      
      This change trims the 2-byte alignment header after we've set the packet
      length, so the resulting packet length is correct. While we're moving
      the comment around, this also fixes the spelling of 'pseudo'.
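
The effect of swapping the two steps can be sketched on a plain byte buffer. This is only a model under stated assumptions: slicing stands in for skb_pull() and for setting skb->len, the function names are made up for illustration, and the buffer contents are invented (two header bytes, an all-0xff payload, and the two stray bytes from the dump above).

```python
ALIGN_HDR_LEN = 2  # size of the alignment pseudo header


def rx_fixup_old(buf: bytes, pkt_len: int) -> bytes:
    """Pre-fix order: skip the header first, then apply the hardware length."""
    data = buf[ALIGN_HDR_LEN:]   # models skb_pull(skb, 2)
    return data[:pkt_len]        # models skb->len = pkt_len


def rx_fixup_new(buf: bytes, pkt_len: int) -> bytes:
    """Post-fix order: apply the hardware length, then skip the header."""
    data = buf[:pkt_len]         # models skb->len = pkt_len
    return data[ALIGN_HDR_LEN:]  # models skb_pull(skb, 2)


payload = bytes([0xFF]) * 8
trailing = bytes([0x96, 0x1F])           # bytes that are not part of the packet
buf = bytes(ALIGN_HDR_LEN) + payload + trailing
pkt_len = ALIGN_HDR_LEN + len(payload)   # hardware length includes the header

old_result = rx_fixup_old(buf, pkt_len)
new_result = rx_fixup_new(buf, pkt_len)
```

Because the hardware length already counts the header, applying it after the pull lets two trailing bytes through, while applying it before the pull yields exactly the payload.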
      
      Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      27b70214
    • Eric Dumazet's avatar
      net: increment xmit_recursion level in dev_direct_xmit() · 67571b1a
      Eric Dumazet authored
      [ Upstream commit 0ad6f6e7 ]
      
      Back in commit f60e5990 ("ipv6: protect skb->sk accesses
      from recursive dereference inside the stack") Hannes added code
      so that IPv6 stack would not trust skb->sk for typical cases
      where packet goes through 'standard' xmit path (__dev_queue_xmit())
      
      Alas af_packet had a dev_direct_xmit() path that was not
      dealing yet with xmit_recursion level.
      
      Also change sk_mc_loop() to dump a stack once only.
      
      Without this patch, syzbot was able to trigger :
      
      [1]
      [  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
      [  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
      [  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
      [  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
      [  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
      [  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
      [  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
      [  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
      [  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
      [  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
      [  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
      [  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
      [  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
      [  153.567391] Call Trace:
      [  153.567391]  ip6_finish_output2+0x34e/0x550
      [  153.567391]  __ip6_finish_output+0xe7/0x110
      [  153.567391]  ip6_finish_output+0x2d/0xb0
      [  153.567392]  ip6_output+0x77/0x120
      [  153.567392]  ? __ip6_finish_output+0x110/0x110
      [  153.567392]  ip6_local_out+0x3d/0x50
      [  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
      [  153.567393]  ? ksize+0x19/0x30
      [  153.567393]  ipvlan_start_xmit+0x18/0x50
      [  153.567393]  dev_direct_xmit+0xf3/0x1c0
      [  153.567393]  packet_direct_xmit+0x69/0xa0
      [  153.567394]  packet_sendmsg+0xbf0/0x19b0
      [  153.567394]  ? plist_del+0x62/0xb0
      [  153.567394]  sock_sendmsg+0x65/0x70
      [  153.567394]  sock_write_iter+0x93/0xf0
      [  153.567394]  new_sync_write+0x18e/0x1a0
      [  153.567395]  __vfs_write+0x29/0x40
      [  153.567395]  vfs_write+0xb9/0x1b0
      [  153.567395]  ksys_write+0xb1/0xe0
      [  153.567395]  __x64_sys_write+0x1a/0x20
      [  153.567395]  do_syscall_64+0x43/0x70
      [  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  153.567396] RIP: 0033:0x453549
      [  153.567396] Code: Bad RIP value.
      [  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
      [  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
      [  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
      [  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
      [  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
      [  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---
      
      Fixes: f60e5990 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67571b1a
    • guodeqing's avatar
      net: Fix the arp error in some cases · 97a1d2aa
      guodeqing authored
      [ Upstream commit 5eea3a63 ]
      
      For example:
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The ARP request goes to the wrong address. This is because
      fib_table_lookup() in fib_check_nh() looks up the nexthop for the
      destination 9.9.9.9, and the scope of the FIB result is RT_SCOPE_LINK,
      whereas the correct scope is RT_SCOPE_HOST.
      Add a check of whether the table is RT_TABLE_MAIN to solve this problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: guodeqing <geffrey.guo@huawei.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      97a1d2aa
    • Yang Yingliang's avatar
      net: fix memleak in register_netdevice() · 742f2358
      Yang Yingliang authored
      [ Upstream commit 814152a8 ]
      
      I got a memleak report when doing some fuzz test:
      
      unreferenced object 0xffff888112584000 (size 13599):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
          00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
          [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
          [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
          [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff888111845cc0 (size 8):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 8 bytes):
          74 61 70 30 00 88 ff ff                          tap0....
        backtrace:
          [<000000004c159777>] kstrdup+0x35/0x70
          [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
          [<00000000494e884a>] kvasprintf_const+0xf1/0x180
          [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
          [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
          [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff88811886d800 (size 512):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
        backtrace:
          [<0000000050315800>] device_add+0x61e/0x1950
          [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If call_netdevice_notifiers() fails, rollback_registered()
      calls netdev_unregister_kobject(), which holds the kobject. The
      reference cannot be put because the netdev won't be added to the
      todo list, which leads to a memleak. Put the reference explicitly
      to avoid the memleak.
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      742f2358
    • Tariq Toukan's avatar
      net: Do not clear the sock TX queue in sk_set_socket() · 9e693934
      Tariq Toukan authored
      [ Upstream commit 41b14fb8 ]
      
      Clearing the sock TX queue in sk_set_socket() might cause unexpected
      out-of-order transmit when called from sock_orphan(), as outstanding
      packets can pick a different TX queue and bypass the ones already queued.
      
      This is undesired in general. More specifically, it breaks the in-order
      scheduling property guarantee for device-offloaded TLS sockets.
      
      Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
      explicitly only where needed.
      
      Fixes: e022f0b4 ("net: Introduce sk_tx_queue_mapping")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e693934
    • Taehee Yoo's avatar
      net: core: reduce recursion limit value · 9f217d6d
      Taehee Yoo authored
      [ Upstream commit fb7861d1 ]
      
      In the current code, ->ndo_start_xmit() can be executed recursively at
      most 10 times because stack memory is limited.
      With vxlan, however, a recursion limit of 10 still results in a
      stack overflow.
      Nested interfaces are currently limited to a depth of 8, and there is
      no critical reason for the recursion limit to be 10.
      So make the recursion limit the same as the nesting depth limit for
      interfaces.
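
As a rough illustration of what the limit does, the toy model below walks a chain of stacked devices and drops the packet once the nesting level exceeds the limit. The class and function names are hypothetical, and the exact comparison used by the kernel's recursion check is an assumption here; only the two limit values (10 before, 8 after) come from the text above.

```python
OLD_LIMIT = 10  # recursion limit before this change
NEW_LIMIT = 8   # recursion limit after this change


class Device:
    """A stacked device that hands packets to an optional lower device."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower


def build_stack(count):
    """Build `count` stacked devices and return the top one."""
    dev = None
    for i in range(count):
        dev = Device(f"vxlan{i}", lower=dev)
    return dev


def transmit(dev, limit, level=0):
    """Return True if the packet reaches the bottom device, False if dropped."""
    if level > limit:
        return False  # recursion guard trips: drop instead of recursing further
    if dev.lower is None:
        return True
    return transmit(dev.lower, limit, level + 1)


# Eleven stacked devices, as in the vxlan10..vxlan0 test commands.
top = build_stack(11)
reached_with_old_limit = transmit(top, OLD_LIMIT)
reached_with_new_limit = transmit(top, NEW_LIMIT)
```

With the old limit the model recurses through all eleven levels, which is the situation that overflows the stack in the real kernel; with the new limit the packet is dropped before that depth is reached.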
      
      Test commands:
          ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
          ip link set vxlan10 up
          ip a a 192.168.10.1/24 dev vxlan10
          ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent
      
          for i in {9..0}
          do
              let A=$i+1
              ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
              ip link set vxlan$i up
              ip a a 192.168.$i.1/24 dev vxlan$i
              ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
              bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
          done
          hping3 192.168.10.2 -2 -d 60000
      
      Splat looks like:
      [  103.814237][ T1127] =============================================================================
      [  103.871955][ T1127] BUG kmalloc-2k (Tainted: G    B            ): Padding overwritten. 0x00000000897a2e4f-0x000
      [  103.873187][ T1127] -----------------------------------------------------------------------------
      [  103.873187][ T1127]
      [  103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
      [  103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G    B             5.7.0+ #575
      [  103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  103.883006][ T1127] Call Trace:
      [  103.883324][ T1127]  dump_stack+0x96/0xdb
      [  103.883716][ T1127]  slab_err+0xad/0xd0
      [  103.884106][ T1127]  ? _raw_spin_unlock+0x1f/0x30
      [  103.884620][ T1127]  ? get_partial_node.isra.78+0x140/0x360
      [  103.885214][ T1127]  slab_pad_check.part.53+0xf7/0x160
      [  103.885769][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.886316][ T1127]  check_slab+0x97/0xb0
      [  103.886763][ T1127]  alloc_debug_processing+0x84/0x1a0
      [  103.887308][ T1127]  ___slab_alloc+0x5a5/0x630
      [  103.887765][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.888265][ T1127]  ? lock_downgrade+0x730/0x730
      [  103.888762][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.889244][ T1127]  ? __slab_alloc+0x3e/0x80
      [  103.889675][ T1127]  __slab_alloc+0x3e/0x80
      [  103.890108][ T1127]  __kmalloc_node_track_caller+0xc7/0x420
      [ ... ]
      
      Fixes: 11a766ce ("net: Increase xmit RECURSION_LIMIT to 10.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f217d6d
    • Thomas Martitz's avatar
      net: bridge: enforce alignment for ethernet address · f32325b1
      Thomas Martitz authored
      [ Upstream commit db7202dec92e6caa2706c21d6fc359af318bde2e ]
      
      The eth_addr member is passed to ether_addr functions that require
      2-byte alignment, therefore the member must be properly aligned
      to avoid unaligned accesses.
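
How a 6-byte address member can end up at an odd offset is easy to see with ctypes. The two structures below are hypothetical stand-ins, not the bridge's real layout: they assume a single one-byte field directly in front of the address, and they model the alignment requirement with an explicit padding byte.

```python
import ctypes

ETH_ALEN = 6  # length of an Ethernet address in bytes


class UnalignedGroup(ctypes.Structure):
    """Hypothetical layout: a one-byte field pushes eth_addr to an odd offset."""
    _fields_ = [
        ("flags", ctypes.c_uint8),
        ("eth_addr", ctypes.c_uint8 * ETH_ALEN),
    ]


class AlignedGroup(ctypes.Structure):
    """Same fields, with one padding byte so eth_addr starts on an even offset."""
    _fields_ = [
        ("flags", ctypes.c_uint8),
        ("_pad", ctypes.c_uint8),
        ("eth_addr", ctypes.c_uint8 * ETH_ALEN),
    ]


def is_2byte_aligned(struct_type):
    """True if eth_addr starts at an even offset within the structure."""
    return struct_type.eth_addr.offset % 2 == 0


unaligned_ok = is_2byte_aligned(UnalignedGroup)
aligned_ok = is_2byte_aligned(AlignedGroup)
```

Helpers that read the address as 16-bit words need the second layout; the first one would make them perform unaligned accesses.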
      
      The problem has existed since the initial merge of multicast to unicast:
      commit 6db6f0ea ("bridge: multicast to unicast")
      
      Fixes: 6db6f0ea ("bridge: multicast to unicast")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Felix Fietkau <nbd@nbd.name>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Martitz <t.martitz@avm.de>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f32325b1
    • Sven Auhagen's avatar
      mvpp2: ethtool rxtx stats fix · 57a976e6
      Sven Auhagen authored
      [ Upstream commit cc970925 ]
      
      The ethtool rx and tx queue statistics are reporting wrong values.
      Fix the driver so that it reads out the correct ones.
      
      Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57a976e6
    • Wang Hai's avatar
      mld: fix memory leak in ipv6_mc_destroy_dev() · fa0d7e09
      Wang Hai authored
      [ Upstream commit ea2fce88 ]
      
      Commit a84d0164 ("mld: fix memory leak in mld_del_delrec()") fixed
      the memory leak of MLD, but missed the ipv6_mc_destroy_dev() path, in
      which mca_sources are leaked after ma_put().

      Use ip6_mc_clear_src() to take care of the missing free.
      
      BUG: memory leak
      unreferenced object 0xffff8881113d3180 (size 64):
        comm "syz-executor071", pid 389, jiffies 4294887985 (age 17.943s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 ff 02 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002cbc483c>] kmalloc include/linux/slab.h:555 [inline]
          [<000000002cbc483c>] kzalloc include/linux/slab.h:669 [inline]
          [<000000002cbc483c>] ip6_mc_add1_src net/ipv6/mcast.c:2237 [inline]
          [<000000002cbc483c>] ip6_mc_add_src+0x7f5/0xbb0 net/ipv6/mcast.c:2357
          [<0000000058b8b1ff>] ip6_mc_source+0xe0c/0x1530 net/ipv6/mcast.c:449
          [<000000000bfc4fb5>] do_ipv6_setsockopt.isra.12+0x1b2c/0x3b30 net/ipv6/ipv6_sockglue.c:754
          [<00000000e4e7a722>] ipv6_setsockopt+0xda/0x150 net/ipv6/ipv6_sockglue.c:950
          [<0000000029260d9a>] rawv6_setsockopt+0x45/0x100 net/ipv6/raw.c:1081
          [<000000005c1b46f9>] __sys_setsockopt+0x131/0x210 net/socket.c:2132
          [<000000008491f7db>] __do_sys_setsockopt net/socket.c:2148 [inline]
          [<000000008491f7db>] __se_sys_setsockopt net/socket.c:2145 [inline]
          [<000000008491f7db>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2145
          [<00000000c7bc11c5>] do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
          [<000000005fb7a3f3>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Acked-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa0d7e09
    • Thomas Falcon's avatar
      ibmveth: Fix max MTU limit · 009b3e29
      Thomas Falcon authored
      [ Upstream commit 5948378b ]
      
      The max MTU limit defined for ibmveth is not accounting for
      virtual ethernet buffer overhead, which is twenty-two additional
      bytes set aside for the ethernet header and eight additional bytes
      of an opaque handle reserved for use by the hypervisor. Update the
      max MTU to reflect this overhead.
      
      Fixes: d894be57 ("ethernet: use net core MTU range checking in more drivers")
      Fixes: 110447f8 ("ethernet: fix min/max MTU typos")
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      009b3e29
    • Sabrina Dubroca's avatar
      geneve: allow changing DF behavior after creation · f060107c
      Sabrina Dubroca authored
      [ Upstream commit 56c09de3 ]
      
      Currently, trying to change the DF parameter of a geneve device does
      nothing:
      
          # ip -d link show geneve1
          14: geneve1: <snip>
              link/ether <snip>
              geneve id 1 remote 10.0.0.1 ttl auto df set dstport 6081 <snip>
          # ip link set geneve1 type geneve id 1 df unset
          # ip -d link show geneve1
          14: geneve1: <snip>
              link/ether <snip>
              geneve id 1 remote 10.0.0.1 ttl auto df set dstport 6081 <snip>
      
      We just need to update the value in geneve_changelink.
      
      Fixes: a025fb5f ("geneve: Allow configuration of DF behaviour")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f060107c
    • Claudiu Manoil's avatar
      enetc: Fix tx rings bitmap iteration range, irq handling · ce06fcb6
      Claudiu Manoil authored
      [ Upstream commit 0574e200 ]
      
      The rings bitmap of an interrupt vector encodes
      which of the device's rings were assigned to that
      interrupt vector.
      Hence the iteration range of the tx rings bitmap
      (for_each_set_bit()) should be the total number of
      Tx rings of that netdevice instead of the number of
      rings assigned to the interrupt vector.
      Since there are 2 cores, and one interrupt vector for
      each core, the number of rings assigned to an interrupt
      vector is half the number of available rings.
      The impact of this error is that the upper half of the
      tx rings could still generate interrupts during napi
      polling.
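
The effect of the iteration range can be sketched with a plain integer bitmap. The helper below only mimics the idea of for_each_set_bit() with a size bound; the ring count of 8 and the assignment of odd-numbered rings to one vector are assumptions made up for the example.

```python
def set_bits(bitmap, size):
    """Return indices of set bits below `size`, like a bounded bitmap walk."""
    return [bit for bit in range(size) if bitmap & (1 << bit)]


NUM_TX_RINGS = 8                                # assumed total Tx rings of the netdevice
NUM_VECTORS = 2                                 # one interrupt vector per core
RINGS_PER_VECTOR = NUM_TX_RINGS // NUM_VECTORS  # rings assigned to each vector

# Hypothetical bitmap for one vector: it owns rings 1, 3, 5 and 7.
tx_rings_map = 0b10101010

# Pre-fix range: bounded by the number of rings assigned to the vector.
seen_before_fix = set_bits(tx_rings_map, RINGS_PER_VECTOR)

# Post-fix range: bounded by the total number of Tx rings.
seen_after_fix = set_bits(tx_rings_map, NUM_TX_RINGS)
```

With the smaller bound, rings 5 and 7 are never visited, so nothing in the loop body is applied to them; that matches the description of the upper half of the rings still generating interrupts during napi polling.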
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce06fcb6
    • yu kuai's avatar
      block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed · b90ca325
      yu kuai authored
      commit a75ca930 upstream.
      
      commit e7bf90e5 ("block/bio-integrity: fix a memory leak bug") added
      a kfree() for 'buf' if bio_integrity_add_page() returns '0'. However,
      the object will be freed in bio_integrity_free() since 'bio->bi_opf' and
      'bio->bi_integrity' were set previously in bio_integrity_alloc().
      
      Fixes: commit e7bf90e5 ("block/bio-integrity: fix a memory leak bug")
      Signed-off-by: yu kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b90ca325
  2. Jun 24, 2020