  1. Nov 26, 2021
  2. Nov 25, 2021
  3. Nov 24, 2021
    • Merge branch 'net-smc-fixes-2021-11-24' · fef30d63
      Jakub Kicinski authored
      Karsten Graul says:
      
      ====================
      net/smc: fixes 2021-11-24
      
      Patch 1 from DaXing fixes a possible infinite loop in smc_listen().
      Patch 2 prevents a NULL pointer dereference while iterating
      over the lower network devices.
      ====================
      
      Link: https://lore.kernel.org/r/20211124123238.471429-1-kgraul@linux.ibm.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: Fix loop in smc_listen · 9ebb0c4b
      Guo DaXing authored
      kernel_listen() in smc_listen() fails when all the available ports are
      occupied.  At that point smc->clcsock->sk->sk_data_ready has already
      been changed to smc_clcsock_data_ready() and is not reset on failure.
      When smc_listen() is called again, it saves that value, so both
      smc->clcsock->sk->sk_data_ready and smc->clcsk_data_ready point to
      smc_clcsock_data_ready().

      smc_clcsock_data_ready() calls lsmc->clcsk_data_ready, which now
      points to itself, resulting in an infinite loop.

      This patch restores smc->clcsock->sk->sk_data_ready to its old value
      when kernel_listen() fails.
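
The failure mode described above can be reproduced in a small user-space model. This is only a sketch: `struct fake_sock`, `fake_listen()`, `wrapper_data_ready()` and the `DEPTH_LIMIT` guard are hypothetical stand-ins, not the kernel's types or the actual patch. The depth guard exists solely so the demonstration terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket state; not a kernel structure. */
struct fake_sock;
typedef void (*data_ready_fn)(struct fake_sock *sk);

struct fake_sock {
	data_ready_fn sk_data_ready;    /* callback the TCP stack would invoke */
	data_ready_fn saved_data_ready; /* plays the role of clcsk_data_ready */
	int depth;                      /* current nesting of wrapper calls */
	int max_depth;                  /* deepest nesting observed */
};

#define DEPTH_LIMIT 8 /* stops the model; the real bug would never stop */

static void tcp_default_data_ready(struct fake_sock *sk)
{
	(void)sk; /* the original callback does nothing in this model */
}

/* Wrapper that chains to the saved callback, like smc_clcsock_data_ready(). */
static void wrapper_data_ready(struct fake_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	if (sk->depth < DEPTH_LIMIT && sk->saved_data_ready)
		sk->saved_data_ready(sk);
	sk->depth--;
}

/* Save the current callback, install the wrapper, optionally undo on error. */
int fake_listen(struct fake_sock *sk, int fail, int restore_on_error)
{
	sk->saved_data_ready = sk->sk_data_ready;
	sk->sk_data_ready = wrapper_data_ready;
	if (fail) {
		if (restore_on_error)
			sk->sk_data_ready = sk->saved_data_ready;
		return -1;
	}
	return 0;
}

/* Fail once, retry successfully, deliver data, report the nesting depth. */
int max_depth_after_retry(int restore_on_error)
{
	struct fake_sock sk = { .sk_data_ready = tcp_default_data_ready };

	fake_listen(&sk, 1, restore_on_error);
	fake_listen(&sk, 0, restore_on_error);
	sk.sk_data_ready(&sk);
	return sk.max_depth;
}
```

With the restore in place the wrapper runs once per event; without it the wrapper keeps calling itself until the artificial limit is hit.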
      
      Fixes: a60a2b1e ("net/smc: reduce active tcp_listen workers")
      Signed-off-by: Guo DaXing <guodaxing@huawei.com>
      Acked-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk() · 587acad4
      Karsten Graul authored
      Coverity reports a possible NULL dereferencing problem:
      
      in smc_vlan_by_tcpsk():
      6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times).
      7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next.
      1623                ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
      CID 1468509 (#1 of 1): Dereference null return value (NULL_RETURNS)
      8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev.
      1624                if (is_vlan_dev(ndev)) {
      
      Remove the manual implementation and use netdev_walk_all_lower_dev() to
      iterate over the lower devices. While at it, remove an obsolete function
      parameter comment.
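
A callback-based walk avoids the problem because the walker owns the NULL check and the callback only ever sees valid devices. The user-space sketch below illustrates the idea. `struct fake_dev`, `walk_lower()`, `pick_vlan()` and `vlan_of()` are hypothetical names and a simplified single-chain topology, not the kernel API or the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device: one lower device at most. */
struct fake_dev {
	struct fake_dev *lower;  /* next lower device, NULL at the bottom */
	int is_vlan;             /* non-zero if this is a VLAN device */
	unsigned short vlan_id;  /* only meaningful when is_vlan is set */
};

typedef int (*walk_fn)(struct fake_dev *dev, void *priv);

/* Call fn on every lower device; stop early when fn returns non-zero. */
int walk_lower(struct fake_dev *dev, walk_fn fn, void *priv)
{
	struct fake_dev *cur;

	for (cur = dev->lower; cur; cur = cur->lower) {
		int ret = fn(cur, priv);

		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: record the first VLAN id found and stop the walk. */
static int pick_vlan(struct fake_dev *dev, void *priv)
{
	unsigned short *vlan_id = priv;

	if (!dev->is_vlan)
		return 0;
	*vlan_id = dev->vlan_id;
	return 1;
}

/* Return the VLAN id of the device stack, or 0 if none is found. */
unsigned short vlan_of(struct fake_dev *top)
{
	unsigned short vlan_id = 0;

	assert(top != NULL);
	if (top->is_vlan)
		return top->vlan_id;
	walk_lower(top, pick_vlan, &vlan_id);
	return vlan_id;
}
```

The callback never receives a NULL pointer, so the dereference that Coverity flagged has no equivalent here.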
      
      Fixes: cb9d43f6 ("net/smc: determine vlan_id of stacked net_device")
      Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'phylink-resolve-fixes' · 06e5ba71
      Jakub Kicinski authored
      Marek Behún says:
      
      ====================
      phylink resolve fixes
      
      With information from me and my nagging, Russell has produced two fixes
      for phylink, which add code that triggers another phylink_resolve() from
      phylink_resolve(), if certain conditions are met:
        interface is being changed
      or
        link is down and previous link was up
      These are needed because sometimes the PCS callbacks may provide stale
      values if link / speed / ...
      ====================
      
      Link: https://lore.kernel.org/r/20211123154403.32051-1-kabel@kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phylink: Force retrigger in case of latched link-fail indicator · dbae3388
      Russell King (Oracle) authored
      On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following
      description:
        This register bit indicates when link was lost since the last
        read. For the current link status, read this register
        back-to-back.
      
      Thus to get current link state, we need to read the register twice.
      
      But doing that in the link change interrupt handler would lead to
      potentially ignoring link down events, which we really want to avoid.
      
      Thus this needs to be solved in phylink's resolve, by retriggering
      another resolve in the event when PCS reports link down and previous
      link was up, and by re-reading PCS state if the previous link was down.
      
      Without this, the wrong value is read when phylink requests a change
      from sgmii to 2500base-x mode, and the link won't come up. This patch
      fixes the bug.
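
The latched-bit behaviour and the two-case handling described above can be modelled in a few lines. This is a sketch under assumptions: `struct fake_pcs`, `pcs_set_link()`, `pcs_read_link()` and `resolve_once()` are hypothetical names, and the model covers only the link bit, not the real register layout or phylink's actual resolve code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PCS with a latched link-fail indicator. */
struct fake_pcs {
	bool link_now;      /* actual link state at this moment */
	bool latched_fail;  /* set when the link dropped since the last read */
};

/* Hardware side: a transition from up to down sets the latch. */
void pcs_set_link(struct fake_pcs *pcs, bool up)
{
	if (pcs->link_now && !up)
		pcs->latched_fail = true;
	pcs->link_now = up;
}

/* Reading reports "down" if a drop was latched, then clears the latch. */
bool pcs_read_link(struct fake_pcs *pcs)
{
	bool up = pcs->link_now && !pcs->latched_fail;

	pcs->latched_fail = false;
	return up;
}

/*
 * One resolve pass. If the PCS reports down and the previous state was up,
 * report the drop and ask for another pass. If the previous state was
 * already down, read again, since the first read only cleared the latch.
 */
bool resolve_once(struct fake_pcs *pcs, bool prev_up, bool *retrigger)
{
	bool up = pcs_read_link(pcs);

	*retrigger = false;
	if (!up) {
		if (prev_up)
			*retrigger = true;
		else
			up = pcs_read_link(pcs);
	}
	return up;
}
```

In this model a short link flap is still reported as a link-down event, and the follow-up pass picks up the current state instead of the latched one.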
      
      Fixes: 9525ae83 ("phylink: add phylink infrastructure")
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phylink: Force link down and retrigger resolve on interface change · 80662f4f
      Russell King (Oracle) authored
      On PHY state change the phylink_resolve() function can read stale
      information from the MAC and report incorrect link speed and duplex to
      the kernel message log.
      
      Example with a Marvell 88X3310 PHY connected to a SerDes port on Marvell
      88E6393X switch:
      - PHY driver triggers state change due to PHY interface mode being
        changed from 10gbase-r to 2500base-x due to copper change in speed
        from 10Gbps to 2.5Gbps, but the PHY itself either hasn't yet changed
        its interface to the host, or the interrupt about loss of SerDes link
        hadn't arrived yet (there can be a delay of several milliseconds for
        this), so we still think that the 10gbase-r mode is up
      - phylink_resolve()
        - phylink_mac_pcs_get_state()
          - this fills in speed=10g link=up
        - interface mode is updated to 2500base-x but speed is left at 10Gbps
        - phylink_major_config()
          - interface is changed to 2500base-x
        - phylink_link_up()
          - mv88e6xxx_mac_link_up()
            - .port_set_speed_duplex()
              - speed is set to 10Gbps
          - reports "Link is Up - 10Gbps/Full" to dmesg
      
      Afterwards when the interrupt finally arrives for mv88e6xxx, another
      resolve is forced in which we get the correct speed from
      phylink_mac_pcs_get_state(), but since the interface is not being
      changed anymore, we don't call phylink_major_config() but only
      phylink_mac_config(), which does not set speed/duplex anymore.
      
      To fix this, we need to force the link down and trigger another resolve
      on PHY interface change event.
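
The effect of forcing the link down on an interface change can be illustrated with a toy model of the resolve step. Everything here is an assumption made for illustration: `struct fake_phylink`, `struct link_state` and `resolve_pass()` are hypothetical and much simpler than phylink's real state handling.

```c
#include <assert.h>
#include <stdbool.h>

enum iface { IFACE_10GBASER, IFACE_2500BASEX };

/* State as read from the PHY/MAC in one pass; speed may be stale. */
struct link_state {
	enum iface interface;
	int speed;   /* Mb/s */
	bool link;
};

/* Hypothetical record of what has been configured and reported so far. */
struct fake_phylink {
	enum iface cur_interface;
	bool link_up;
	int reported_speed; /* Mb/s from the last "Link is Up" report */
};

/*
 * One resolve pass. On an interface change the link is treated as down,
 * because the speed was read before the switch, and another pass is
 * requested. Returns true when a further pass is needed.
 */
bool resolve_pass(struct fake_phylink *pl, const struct link_state *st)
{
	bool retrigger = false;
	bool link = st->link;

	if (st->interface != pl->cur_interface) {
		pl->cur_interface = st->interface; /* major reconfiguration */
		link = false;
		retrigger = true;
	}

	if (link && !pl->link_up) {
		pl->link_up = true;
		pl->reported_speed = st->speed; /* speed applied on link up */
	} else if (!link && pl->link_up) {
		pl->link_up = false;
	}
	return retrigger;
}
```

Because the first pass brings the link down, the second pass goes through the link-up path again and applies the speed read after the interface switch.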
      
      Fixes: 9525ae83 ("phylink: add phylink infrastructure")
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lan743x: fix deadlock in lan743x_phy_link_status_change() · ddb826c2
      Heiner Kallweit authored
      Usage of phy_ethtool_get_link_ksettings() in the link status change
      handler isn't needed, and in combination with the referenced change
      it results in a deadlock. Simply remove the call and replace it with
      direct access to phydev->speed. The duplex argument of
      lan743x_phy_update_flowcontrol() isn't used and can be removed.
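
The deadlock pattern can be sketched in user space, assuming the link status change handler runs with the PHY lock already held and the referenced change makes the getter take that same lock. The names below (`struct fake_phy`, `locked_get_speed()`, `handler_old()`, `handler_new()`, `run_with_lock_held()`) are hypothetical, and a boolean flag stands in for the mutex so the example reports the problem instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PHY: a flag models a non-recursive lock. */
struct fake_phy {
	bool locked;
	int speed; /* Mb/s */
};

typedef int (*handler_fn)(struct fake_phy *phy, int *speed);

/* Getter that takes the lock itself; -1 marks where a mutex would block. */
int locked_get_speed(struct fake_phy *phy, int *speed)
{
	if (phy->locked)
		return -1; /* self-deadlock with a real mutex */
	phy->locked = true;
	*speed = phy->speed;
	phy->locked = false;
	return 0;
}

/* Old-style handler: goes through the locking getter. */
int handler_old(struct fake_phy *phy, int *speed)
{
	return locked_get_speed(phy, speed);
}

/* New-style handler: the caller holds the lock, so read the field. */
int handler_new(struct fake_phy *phy, int *speed)
{
	*speed = phy->speed;
	return 0;
}

/* Models the caller invoking the handler with the lock held. */
int run_with_lock_held(struct fake_phy *phy, handler_fn handler, int *speed)
{
	int ret;

	phy->locked = true;
	ret = handler(phy, speed);
	phy->locked = false;
	return ret;
}
```

Reading the field directly is safe in this model because the lock that protects it is already held by the caller.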
      
      Fixes: c10a485c ("phy: phy_ethtool_ksettings_get: Lock the phy for consistency")
      Reported-by: Alessandro B Maurici <abmaurici@gmail.com>
      Tested-by: Alessandro B Maurici <abmaurici@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/40e27f76-0ba3-dcef-ee32-a78b9df38b0f@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows · 4e1fddc9
      Eric Dumazet authored
      While testing the BIG TCP patch series, I was expecting that TCP_RR
      workloads with 80KB requests/answers would send one 80KB TSO packet,
      which would then be received as a single GRO packet.
      
      It turns out this was not happening, and the root cause was that
      cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC.
      
      Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC
      needed a budget of ~20 segments.
      
      Ideally these TCP_RR flows should not exit slow start.
      
      Cubic Hystart should reset itself at each round, instead of assuming
      every TCP flow is a bulk one.
      
      Note that even after this patch, Hystart can still trigger, depending
      on scheduling artifacts, but at a higher CWND/SSTHRESH threshold,
      keeping optimal TSO packet sizes.
      
      Tested:
      
      ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072
      nstat -n; netperf -H ... -t TCP_RR  -l 5  -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests"
      
      Before:
      
         8605
      Ip6InReceives                   87541              0.0
      Ip6OutRequests                  129496             0.0
      TcpExtTCPHystartTrainDetect     1                  0.0
      TcpExtTCPHystartTrainCwnd       30                 0.0
      
      After:
      
        8760
      Ip6InReceives                   88514              0.0
      Ip6OutRequests                  87975              0.0
      
      Fixes: ae27e98a ("[TCP] CUBIC v2.3")
      Co-developed-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>