  1. Dec 09, 2020
  2. Dec 08, 2020
    • tcp: select sane initial rcvq_space.space for big MSS · 72d05c00
      Eric Dumazet authored
      Before commit a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
      small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
      
      This is no longer the case, and Hazem Mohamed Abuelfotoh reported
      that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
      
      The root cause is that tcp_init_buffer_space() uses tp->rcv_wnd as the upper limit
      of the rcvq_space.space computation, while it can later select a smaller
      value for tp->rcv_ssthresh and tp->window_clamp.
      
      ss -temoi on receiver would show :
      
      skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
      
      This means that TCP cannot increase its window in tcp_grow_window(),
      and that DRS can never kick in.
      
      Fix this by making sure that rcvq_space.space is not bigger than the number
      of bytes that can be held in the TCP receive queue.
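      
      The effect of the extra bound can be illustrated with a small user-space
      model. This is a sketch, not the kernel patch: it assumes the old limit was
      min(rcv_wnd, TCP_INIT_CWND * advmss) and that the fix additionally caps the
      value by rcv_ssthresh. The helper names are hypothetical, advmss = 8948 is
      an assumed value for an MTU 9000 endpoint, and rcv_wnd is assumed to equal
      the rcv_space reported by ss.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>
      
      #define TCP_INIT_CWND 10u /* initial congestion window, in segments */
      
      static uint32_t min_u32(uint32_t a, uint32_t b)
      {
      	return a < b ? a : b;
      }
      
      /* Assumed pre-fix behavior: only rcv_wnd caps the initial estimate. */
      static uint32_t rcvq_space_before(uint32_t rcv_wnd, uint32_t advmss)
      {
      	return min_u32(rcv_wnd, TCP_INIT_CWND * advmss);
      }
      
      /* Assumed post-fix behavior: rcv_ssthresh caps the estimate as well. */
      static uint32_t rcvq_space_after(uint32_t rcv_ssthresh, uint32_t rcv_wnd,
      				 uint32_t advmss)
      {
      	return min_u32(rcv_ssthresh, rcvq_space_before(rcv_wnd, advmss));
      }
      
      int main(void)
      {
      	/* rcv_wnd and rcv_ssthresh come from the ss output; advmss is assumed. */
      	uint32_t rcv_wnd = 62496, rcv_ssthresh = 56596, advmss = 8948;
      
      	printf("before: %u\n", (unsigned int)rcvq_space_before(rcv_wnd, advmss));
      	printf("after:  %u\n",
      	       (unsigned int)rcvq_space_after(rcv_ssthresh, rcv_wnd, advmss));
      	return 0;
      }
      ```
      
      With these inputs the model yields 62496 before and 56596 after, so the
      initial estimate no longer exceeds what rcv_ssthresh allows the receive
      queue to hold.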
      
      People unable/unwilling to change their kernel can work around this issue by
      selecting a bigger tcp_rmem[1] value as in :
      
      echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
      
      Based on an initial report and patch from Hazem Mohamed Abuelfotoh
       https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
      
      Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
      Fixes: 041a14d2 ("tcp: start receiver buffer autotuning sooner")
      Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ll_temac: Fix potential NULL dereference in temac_probe() · cc6596fc
      Zhang Changzhong authored
      platform_get_resource() may fail, in which case a NULL dereference
      will occur.
      
      Fix it by using devm_platform_ioremap_resource() instead of calling
      platform_get_resource() and devm_ioremap().
      
      This was detected by the following Coccinelle semantic patch.
      
      @@
      expression pdev, res, n, t, e, e1, e2;
      @@
      
      res = \(platform_get_resource\|platform_get_resource_byname\)(pdev, t, n);
      + if (!res)
      +   return -EINVAL;
      ... when != res == NULL
      e = devm_ioremap(e1, res->start, e2);
      
      Fixes: 8425c41d ("net: ll_temac: Extend support to non-device-tree platforms")
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tipc: prevent possible null deref of link · 0398ba9e
      Cengiz Can authored
      
      `tipc_node_apply_property` does a null check on a `tipc_link_entry`
      pointer but also accesses the same pointer outside the null check block.
      
      This triggers a warning in the Coverity static analyzer because the
      check implies that `e->link` can be null.
      
      Move the "Update MTU for node link entry" line into the if block to make
      sure that it never runs while `e->link` is null.
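      
      The pattern can be sketched in user space as follows. This is only an
      illustration: `struct link`, `struct link_entry` and `apply_mtu()` are
      hypothetical stand-ins, not the TIPC data structures or functions.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>
      
      /* Hypothetical stand-ins for the link and the per-node link entry. */
      struct link {
      	int mtu;
      };
      
      struct link_entry {
      	struct link *link; /* may be NULL when no link is established */
      	int mtu;           /* cached MTU for the node link entry */
      };
      
      /*
       * Update the link and refresh the cached MTU only when a link exists.
       * Returns 1 if the entry was updated, 0 if there was no link.
       */
      static int apply_mtu(struct link_entry *e, int new_mtu)
      {
      	if (e->link) {
      		e->link->mtu = new_mtu;
      		/* The dereference stays inside the NULL check. */
      		e->mtu = e->link->mtu;
      		return 1;
      	}
      	return 0;
      }
      
      int main(void)
      {
      	struct link l = { .mtu = 1500 };
      	struct link_entry with_link = { .link = &l, .mtu = 1500 };
      	struct link_entry without_link = { .link = NULL, .mtu = 1500 };
      
      	printf("with link: updated=%d mtu=%d\n",
      	       apply_mtu(&with_link, 9000), with_link.mtu);
      	printf("without link: updated=%d mtu=%d\n",
      	       apply_mtu(&without_link, 9000), without_link.mtu);
      	return 0;
      }
      ```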
      
      Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-fixes' · 9756bb63
      David S. Miller authored
      
      Joakim Zhang says:
      
      ====================
      patches for stmmac
      
      A patch set for stmmac, fix some driver issues.
      
      ChangeLogs:
      V1->V2:
      	* add Fixes tag.
      	* add patch 5/5 into this patch set.
      
      V2->V3:
      	* rebase to latest net tree where fixes go.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: overwrite the dma_cap.addr64 according to HW design · f119cc98
      Fugang Duan authored
      The current IP register MAC_HW_Feature1[ADDR64] only defines
      32/40/64 bit widths, but some SoCs support other widths. For example,
      i.MX8MP supports 34 bits, which is reported as a 40-bit width in
      MAC_HW_Feature1[ADDR64]. So overwrite dma_cap.addr64 according to the
      real HW design.
      
      Fixes: 94abdad6 ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: delete the eee_ctrl_timer after napi disabled · 5f585913
      Fugang Duan authored
      The eee_ctrl_timer can be re-enabled and fired from the napi callback
      after the timer has been deleted in .stmmac_release(). The timer
      function then accesses the EEE registers after the clocks are disabled,
      which hangs the system. This issue was found during suspend/resume
      and reboot stress tests.
      
      It is safe to delete the timer after napi is disabled and LPI mode is disabled.
      
      Fixes: d765955d ("stmmac: add the Energy Efficient Ethernet support")
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: free tx skb buffer in stmmac_resume() · 4ec236c7
      Fugang Duan authored
      During suspend/resume tests, a WARN_ON() log is dumped from the
      stmmac_xmit() function. The code logic is:
      	entry = tx_q->cur_tx;
      	first_entry = entry;
      	WARN_ON(tx_q->tx_skbuff[first_entry]);
      
      In the normal case, tx_q->tx_skbuff[txq->cur_tx] should be NULL because
      the skb should be handled and freed in stmmac_tx_clean().
      
      But stmmac_resume() resets the queue parameters as shown below, so skb
      buffers may not have been freed.
      	tx_q->cur_tx = 0;
      	tx_q->dirty_tx = 0;
      
      So free the tx skb buffers in stmmac_resume() to avoid the warning and
      a memory leak.
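      
      The failure mode can be reproduced with a small user-space model of the
      ring. This is a sketch under stated assumptions, not driver code:
      `struct tx_queue` and the `model_*` helpers are hypothetical stand-ins,
      and malloc()/free() play the role of skb allocation and release.
      
      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <stdlib.h>
      
      #define TX_RING_SIZE 4
      
      /* Hypothetical stand-in for the driver's per-queue TX state. */
      struct tx_queue {
      	void *tx_skbuff[TX_RING_SIZE];
      	unsigned int cur_tx;
      	unsigned int dirty_tx;
      };
      
      /*
       * Queue one buffer at cur_tx. Returns 1 if the slot was still occupied,
       * which corresponds to the WARN_ON() case, and 0 otherwise.
       */
      static int model_xmit(struct tx_queue *q)
      {
      	unsigned int entry = q->cur_tx;
      	int stale = q->tx_skbuff[entry] != NULL;
      
      	if (stale)
      		free(q->tx_skbuff[entry]); /* keep the model itself leak-free */
      	q->tx_skbuff[entry] = malloc(1);
      	q->cur_tx = (entry + 1) % TX_RING_SIZE;
      	return stale;
      }
      
      /* Release every buffer still held by the ring. */
      static void model_free_tx_skbufs(struct tx_queue *q)
      {
      	for (unsigned int i = 0; i < TX_RING_SIZE; i++) {
      		free(q->tx_skbuff[i]);
      		q->tx_skbuff[i] = NULL;
      	}
      }
      
      /* Resume resets the ring indices; free_first selects the fixed behavior. */
      static void model_resume(struct tx_queue *q, int free_first)
      {
      	if (free_first)
      		model_free_tx_skbufs(q);
      	q->cur_tx = 0;
      	q->dirty_tx = 0;
      }
      
      int main(void)
      {
      	struct tx_queue q = {0};
      
      	model_xmit(&q);      /* one buffer still pending at suspend time */
      	model_resume(&q, 0); /* old behavior: indices reset, buffers kept */
      	printf("without free: stale=%d\n", model_xmit(&q));
      
      	model_resume(&q, 1); /* fixed behavior: buffers freed before reset */
      	printf("with free: stale=%d\n", model_xmit(&q));
      
      	model_free_tx_skbufs(&q);
      	return 0;
      }
      ```
      
      In this model the first transmit after a plain index reset finds an
      occupied slot, while freeing the buffers before the reset leaves the slot
      empty.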
      
      log:
      [   46.139824] ------------[ cut here ]------------
      [   46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0
      [   46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev
      [   46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.4.24-2.1.0+g2ad925d15481 #1
      [   46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT)
      [   46.175677] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [   46.180465] pc : stmmac_xmit+0x7a0/0x9d0
      [   46.184387] lr : dev_hard_start_xmit+0x94/0x158
      [   46.188913] sp : ffff800010003cc0
      [   46.192224] x29: ffff800010003cc0 x28: ffff000177e2a100
      [   46.197533] x27: ffff000176ef0840 x26: ffff000176ef0090
      [   46.202842] x25: 0000000000000000 x24: 0000000000000000
      [   46.208151] x23: 0000000000000003 x22: ffff8000119ddd30
      [   46.213460] x21: ffff00017636f000 x20: ffff000176ef0cc0
      [   46.218769] x19: 0000000000000003 x18: 0000000000000000
      [   46.224078] x17: 0000000000000000 x16: 0000000000000000
      [   46.229386] x15: 0000000000000079 x14: 0000000000000000
      [   46.234695] x13: 0000000000000003 x12: 0000000000000003
      [   46.240003] x11: 0000000000000010 x10: 0000000000000010
      [   46.245312] x9 : ffff00017002b140 x8 : 0000000000000000
      [   46.250621] x7 : ffff00017636f000 x6 : 0000000000000010
      [   46.255930] x5 : 0000000000000001 x4 : ffff000176ef0000
      [   46.261238] x3 : 0000000000000003 x2 : 00000000ffffffff
      [   46.266547] x1 : ffff000177e2a000 x0 : 0000000000000000
      [   46.271856] Call trace:
      [   46.274302]  stmmac_xmit+0x7a0/0x9d0
      [   46.277874]  dev_hard_start_xmit+0x94/0x158
      [   46.282056]  sch_direct_xmit+0x11c/0x338
      [   46.285976]  __qdisc_run+0x118/0x5f0
      [   46.289549]  net_tx_action+0x110/0x198
      [   46.293297]  __do_softirq+0x120/0x23c
      [   46.296958]  irq_exit+0xb8/0xd8
      [   46.300098]  __handle_domain_irq+0x64/0xb8
      [   46.304191]  gic_handle_irq+0x5c/0x148
      [   46.307936]  el1_irq+0xb8/0x180
      [   46.311076]  cpuidle_enter_state+0x84/0x360
      [   46.315256]  cpuidle_enter+0x34/0x48
      [   46.318829]  call_cpuidle+0x18/0x38
      [   46.322314]  do_idle+0x1e0/0x280
      [   46.325539]  cpu_startup_entry+0x24/0x40
      [   46.329460]  rest_init+0xd4/0xe0
      [   46.332687]  arch_call_rest_init+0xc/0x14
      [   46.336695]  start_kernel+0x420/0x44c
      [   46.340353] ---[ end trace bc1ee695123cbacd ]---
      
      Fixes: 47dd7a54 ("net: add support for STMicroelectronics Ethernet controllers.")
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: start phylink instance before stmmac_hw_setup() · 36d18b56
      Fugang Duan authored
      Start the phylink instance and resume the PHY so that it supplies the
      RX clock to the MAC before the MAC layer is initialized by calling
      .stmmac_hw_setup(). The DMA reset depends on the RX clock; without it,
      the DMA reset waits for the maximum timeout value and then times out.
      
      Fixes: 74371272 ("net: stmmac: Convert to phylink and remove phylib logic")
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: increase the timeout for dma reset · 9d14edfd
      Fugang Duan authored
      
      The current timeout value is not enough for the gmac5 DMA reset
      on the imx8mp platform, so increase the timeout range.
      
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex · 42f1c271
      Pablo Neira Ayuso authored
      Add an explicit comment in the code to describe the indirect
      serialization of the holders of the commit_mutex with the rtnl_mutex.
      Commit 90d2723c ("netfilter: nf_tables: do not hold reference on
      netdevice from preparation phase") already describes this, but a comment
      in this case is better for reference.
      
      Reported-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_dynset: fix timeouts later than 23 days · 917d80d3
      Pablo Neira Ayuso authored
      Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
      8e1102d5 ("netfilter: nf_tables: support timeouts larger than 23
      days"), otherwise ruleset listing breaks.
      
      Fixes: a8b1e36d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • bonding: fix feature flag setting at init time · 007ab534
      Jarod Wilson authored
      Don't try to adjust XFRM support flags if the bond device isn't yet
      registered. Bad things can currently happen when netdev_change_features()
      is called without having wanted_features fully filled in yet. This code
      runs both on post-module-load mode changes, as well as at module init
      time, and when run at module init time, it is before register_netdevice()
      has been called and filled in wanted_features. The empty wanted_features
      led to features also getting emptied out, which was definitely not the
      intended behavior, so prevent that from happening.
      
      Originally, I'd hoped to stop adjusting wanted_features at all in the
      bonding driver, as it's documented as being something only the network
      core should touch, but we actually do need to do this to properly update
      both the features and wanted_features fields when changing the bond type,
      or we get to a situation where ethtool sees:
      
          esp-hw-offload: off [requested on]
      
      I do think we should be using netdev_update_features instead of
      netdev_change_features here though, so we only send notifiers when the
      features actually changed.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      Reported-by: Ivan Vecera <ivecera@redhat.com>
      Suggested-by: Ivan Vecera <ivecera@redhat.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Link: https://lore.kernel.org/r/20201205172229.576587-1-jarod@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: x_tables: Switch synchronization to RCU · cc00bcaa
      Subash Abhinov Kasiviswanathan authored
      When running concurrent iptables rules replacement with data, the per CPU
      sequence count is checked after the assignment of the new information.
      The sequence count is used to synchronize with the packet path without the
      use of any explicit locking. If there are any packets in the packet path using
      the table information, the sequence count is incremented to an odd value and
      is incremented to an even value once packet processing completes.
      
      The new table value assignment is followed by a write memory barrier so every
      CPU should see the latest value. If the packet path has started with the old
      table information, the sequence counter will be odd and the iptables
      replacement will wait till the sequence count is even prior to freeing the
      old table info.
      
      However, this assumes that the new table information assignment and the memory
      barrier are actually executed prior to the counter check in the replacement
      thread. If the CPU decides to execute the assignment later, as there is no user of
      the table information prior to the sequence check, the packet path on another
      CPU may use the old table information. The replacement thread would then free
      the table information under it, leading to a use after free in the packet
      processing context:
      
      Unable to handle kernel NULL pointer dereference at virtual
      address 000000000000008e
      pc : ip6t_do_table+0x5d0/0x89c
      lr : ip6t_do_table+0x5b8/0x89c
      ip6t_do_table+0x5d0/0x89c
      ip6table_filter_hook+0x24/0x30
      nf_hook_slow+0x84/0x120
      ip6_input+0x74/0xe0
      ip6_rcv_finish+0x7c/0x128
      ipv6_rcv+0xac/0xe4
      __netif_receive_skb+0x84/0x17c
      process_backlog+0x15c/0x1b8
      napi_poll+0x88/0x284
      net_rx_action+0xbc/0x23c
      __do_softirq+0x20c/0x48c
      
      This could be fixed by forcing instruction order after the new table
      information assignment or by switching to RCU for the synchronization.
      
      Fixes: 80055dab ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
      Reported-by: Sean Tranchetti <stranche@codeaurora.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. Dec 07, 2020