  1. Aug 28, 2016
    • net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ · cc8e9ebf
      Eran Ben Elisha authored
      The driver RQ has two possible configurations: striding RQ and
      non-striding RQ.  Until this patch, the driver always reported the
      number of hardware WQEs (ring descriptors). For the non-striding RQ
      configuration, this was OK since we have one WQE per pending packet.
      For striding RQ, multiple packets can fit into one WQE. For better
      user experience we normalize the rx_pending parameter (size of wqe/mtu)
      as the average ring size in case of striding RQ.
      
      Fixes: 461017cb ('net/mlx5e: Support RX multi-packet WQE ...')
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
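The normalization described in the commit above can be sketched in plain C. This is a minimal illustration, not the driver's code: the helper names and the sizes used in the usage note are assumptions, and the only thing taken from the commit message is the idea of scaling the hardware WQE count by (size of WQE / MTU) for striding RQ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average number of packets that fit in one
 * striding-RQ WQE, i.e. (size of WQE / MTU) from the commit message. */
static uint32_t pkts_per_wqe(uint32_t wqe_size_bytes, uint32_t mtu_bytes)
{
    assert(mtu_bytes > 0);
    uint32_t n = wqe_size_bytes / mtu_bytes;
    return n ? n : 1; /* never report fewer than one packet per WQE */
}

/* Value to report as rx_pending for a given number of hardware WQEs. */
static uint32_t rx_pending_report(uint32_t num_wqes, int striding,
                                  uint32_t wqe_size_bytes, uint32_t mtu_bytes)
{
    if (!striding)
        return num_wqes; /* one WQE per pending packet */
    return num_wqes * pkts_per_wqe(wqe_size_bytes, mtu_bytes);
}
```

With an assumed 256 KiB WQE and a 1500-byte MTU, 16 hardware WQEs would be reported as 16 * 174 = 2784 pending packets, while a non-striding ring reports its WQE count unchanged.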
    • net/mlx5e: Don't wait for SQ completions on close · 6e8dd6d6
      Saeed Mahameed authored
      Instead of asking the firmware to flush the SQ (Send Queue) via
      asynchronous completions when moved to error, we handle SQ flush
      manually (mlx5e_free_tx_descs), the same as we did when the SQ flush
      timed out or on tx_timeout.

      This will reduce SQ flush time and speed up the interface down procedure.
      
      Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
      critical code locality.
      
      Fixes: 29429f33 ('net/mlx5e: Timeout if SQ doesn't flush during close')
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
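A manual flush of the kind described above amounts to walking the ring from the consumer counter to the producer counter and releasing each descriptor, without waiting for completions. The following is a userspace toy model under that assumption; the struct, the ring size, and the function names are hypothetical and do not mirror mlx5e_free_tx_descs.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE 8u /* hypothetical ring size; must be a power of two */

struct toy_sq {
    void *bufs[RING_SIZE]; /* one buffer per posted descriptor */
    unsigned int pc;       /* producer counter: descriptors posted */
    unsigned int cc;       /* consumer counter: descriptors released */
};

/* Post one descriptor; returns 0 on success, -1 if the ring is full
 * or the allocation fails. */
static int toy_sq_post(struct toy_sq *sq, size_t len)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    if (sq->pc - sq->cc == RING_SIZE)
        return -1;
    void *buf = malloc(len);
    if (!buf)
        return -1;
    sq->bufs[sq->pc & (RING_SIZE - 1)] = buf;
    sq->pc++;
    return 0;
}

/* Manual flush: release everything between cc and pc right away.
 * Returns the number of descriptors freed. */
static unsigned int toy_sq_free_descs(struct toy_sq *sq)
{
    unsigned int freed = 0;

    while (sq->cc != sq->pc) {
        unsigned int i = sq->cc & (RING_SIZE - 1);

        free(sq->bufs[i]);
        sq->bufs[i] = NULL;
        sq->cc++;
        freed++;
    }
    return freed;
}
```

Because the loop only depends on the two counters, it finishes in time proportional to the number of outstanding descriptors rather than to a firmware round trip.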
    • net/mlx5e: Don't post fragmented MPWQE when RQ is disabled · 8484f9ed
      Saeed Mahameed authored
      ICO (Internal control operations) SQ (Send Queue) is closed/disabled
      after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
      might post a fragmented MPWQE (Multi Packet Work Queue Element) into
      that RQ.
      
      As on a regular RQ post, check whether we are allowed to post to that
      RQ (the RQ is enabled). Clean up any in-progress UMR MPWQE in
      mlx5e_free_rx_descs if needed.
      
      Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
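The ordering problem in the commit above (a late ICO SQ completion posting into an RQ that is already closed) can be modelled with an enabled flag that the completion handler checks before posting. This is a simplified sketch with hypothetical names and counters; the real driver state is more involved.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rq {
    bool enabled;                 /* cleared when the RQ is being closed */
    unsigned int posted;          /* WQEs currently posted to the ring */
    unsigned int umr_in_progress; /* fragmented MPWQEs awaiting their UMR completion */
};

/* Called from a (hypothetical) ICO SQ completion handler.
 * Returns true if the MPWQE was posted to the RQ. */
static bool toy_post_fragmented_mpwqe(struct toy_rq *rq)
{
    assert(rq->umr_in_progress > 0);
    if (!rq->enabled)
        return false; /* RQ is closed: leave the cleanup to the free path */
    rq->umr_in_progress--;
    rq->posted++;
    return true;
}

/* Free path: drop posted WQEs and any MPWQE still in progress. */
static void toy_free_rx_descs(struct toy_rq *rq)
{
    rq->posted = 0;
    rq->umr_in_progress = 0;
}
```

The key point of the sketch is that the disabled case does not touch the ring at all, so the free path can release the in-progress MPWQE exactly once.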
    • net/mlx5e: Don't wait for RQ completions on close · f2fde18c
      Saeed Mahameed authored
      This will significantly reduce receive queue flush time on interface
      down.
      
      Instead of asking the firmware to flush the RQ (Receive Queue) via
      asynchronous completions when moved to error, we handle RQ flush
      manually (mlx5e_free_rx_descs), the same as we did when the RQ flush
      timed out.

      This will reduce RQ flush time and speed up the interface down procedure
      (ifconfig down) from 6 sec to 0.3 sec on a 48-core system.

      Moved mlx5e_free_rx_descs to en_main.c, where it is needed, to keep
      en_rx.c free of code that is not on the critical data path, for better
      code locality.
      
      Fixes: 6cd392a0 ('net/mlx5e: Handle RQ flush in error cases')
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Limit UMR length to the device's limitation · fe4c988b
      Saeed Mahameed authored
      ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
      is limited to U16_MAX. Before this patch we ignored that limitation and
      requested the maximum possible UMR translation length that the netdev
      might need (MAX channels * MAX pages per channel).
      In case of a system with #cores > 32 and when linear WQE allocation fails,
      falling back to using UMR WQEs will cause the RQ (Receive Queue) to get
      stuck.

      Here we limit UMR length to min(U16_MAX, max required pages) (while
      considering the required alignments) on driver load. By default U16_MAX is
      sufficient since the default RX rings value guarantees that we are in
      range. Dynamically (on set_ringparam/set_channels) we check whether the
      new required UMR length (num mtts) is still in range and, if not, fail the
      request.
      
      Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
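The two checks described in the commit above, clamping at load time and rejecting out-of-range requests at reconfiguration time, might look roughly like this. The alignment value and all function names are assumptions made for the sketch; only the min(U16_MAX, required) rule and the fail-if-out-of-range rule come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_U16_MAX   0xffffu
#define TOY_MTT_ALIGN 8u /* hypothetical alignment, in MTT entries */

/* Round n down to a multiple of align (align must be a power of two). */
static uint32_t toy_align_down(uint32_t n, uint32_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return n & ~(align - 1);
}

/* Round n up to a multiple of align (n is assumed far below UINT32_MAX). */
static uint32_t toy_align_up(uint32_t n, uint32_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Length requested on load: min(U16_MAX, required), respecting alignment. */
static uint32_t toy_umr_len(uint32_t required_mtts)
{
    uint32_t limit = toy_align_down(TOY_U16_MAX, TOY_MTT_ALIGN);
    uint32_t need = toy_align_up(required_mtts, TOY_MTT_ALIGN);

    return need < limit ? need : limit;
}

/* set_ringparam/set_channels-style check: 0 if the new configuration
 * still fits, -1 if the request must be failed. */
static int toy_check_umr_range(uint32_t channels, uint32_t mtts_per_channel)
{
    uint64_t need = (uint64_t)channels * mtts_per_channel;

    return need <= toy_align_down(TOY_U16_MAX, TOY_MTT_ALIGN) ? 0 : -1;
}
```

With the assumed alignment of 8 the usable limit is 65528 entries, so 16 channels of 2048 entries pass the check while 32 channels of 2048 entries are rejected.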
  2. Aug 27, 2016
  3. Aug 26, 2016
  4. Aug 25, 2016
  5. Aug 24, 2016
    • mlxsw: router: Enable neighbors to be created on stacked devices · 51af96b5
      Yotam Gigi authored
      Make the function mlxsw_router_neigh_construct search the rif according
      to the neighbour dev rather than the dev that was passed to the ndo, thus
      allowing neighbours to be created on stacked devices.
      
      Fixes: 6cf3c971 ("mlxsw: spectrum_router: Add private neigh table")
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add missing flood to router port · f888f587
      Ido Schimmel authored
      In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
      then we should flood the following packet types to the router:
      
      * Broadcast: If DIP is the broadcast address of the interface, then we
      need to be able to get it to CPU by trapping it following route lookup.
      
      * Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
      use this range and are trapped in the router block.
      
      Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
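The two packet classes listed in the commit above can be expressed as simple address tests. The sketch below is a software illustration only (the switch does this in hardware tables); function names are hypothetical, addresses are in host byte order, and the broadcast test assumes a directed subnet broadcast derived from the interface address and prefix length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Reserved IP multicast: 224.0.0.X, i.e. 224.0.0.0/24. */
static bool is_reserved_mcast(uint32_t dip)
{
    return (dip & 0xffffff00u) == 0xe0000000u;
}

/* Broadcast address of the interface's subnet. */
static bool is_subnet_bcast(uint32_t dip, uint32_t if_addr, unsigned int prefix_len)
{
    assert(prefix_len >= 1 && prefix_len <= 30);
    uint32_t mask = ~0u << (32 - prefix_len);

    return dip == (if_addr | ~mask);
}

/* True if a packet with this DIP should also be flooded to the router port. */
static bool should_flood_to_router(uint32_t dip, uint32_t if_addr,
                                   unsigned int prefix_len)
{
    return is_reserved_mcast(dip) || is_subnet_bcast(dip, if_addr, prefix_len);
}
```

For example, OSPF's 224.0.0.5 matches the reserved multicast test, and 192.0.2.255 matches the broadcast test for an interface configured as 192.0.2.1/24.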
    • Bluetooth: split sk_filter in l2cap_sock_recv_cb · dbb50887
      Daniel Borkmann authored
      During an audit for sk_filter(), we found that rx_busy_skb handling
      in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
      intended.
      
      The assumption from commit e328140f ("Bluetooth: Use event-driven
      approach for handling ERTM receive buffer") is that errors returned
      from sock_queue_rcv_skb() are due to receive buffer shortage. However,
      nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
      the socket, that could drop some of the incoming skbs when handled in
      sock_queue_rcv_skb().
      
      In that case sock_queue_rcv_skb() will return -EPERM, propagated
      from sk_filter(), and in L2CAP_MODE_ERTM mode the wrong assumption was
      that we failed due to the receive buffer being full. From that point onwards,
      due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
      any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
      due to the filter drop verdict over and over coming from sk_filter().
      Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
      dropped due to rx_busy_skb being occupied.
      
      Instead, just use __sock_queue_rcv_skb() where an error really tells that
      there's a receive buffer issue. Split the sk_filter() and enable it for
      non-segmented modes at queuing time since at this point in time the skb has
      already been through the ERTM state machine and it has been acked, so dropping
      is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
      l2cap_data_rcv() so the packet can be dropped before the state machine sees it.
      
      Fixes: e328140f ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
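The stall described in the commit above comes from treating every queueing error as "receive buffer full". The toy model below contrasts the two behaviours; the struct and function names are hypothetical, and only the error codes (-EPERM from the filter, -ENOMEM for buffer shortage) and the rx_busy idea are taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_sock {
    bool filter_drops; /* an attached filter drops every incoming skb */
    int rcvbuf_free;   /* free slots in the receive buffer */
    bool rx_busy;      /* an skb is parked, waiting for buffer space */
};

/* Queue without running the filter: an error here really means
 * that the receive buffer is full. */
static int toy_queue_no_filter(struct toy_sock *sk)
{
    assert(sk->rcvbuf_free >= 0);
    if (sk->rcvbuf_free == 0)
        return -ENOMEM;
    sk->rcvbuf_free--;
    return 0;
}

/* Queue with the filter run first: may fail with -EPERM or -ENOMEM. */
static int toy_queue_with_filter(struct toy_sock *sk)
{
    if (sk->filter_drops)
        return -EPERM;
    return toy_queue_no_filter(sk);
}

/* Old ERTM receive path: any error parks the skb as "buffer full". */
static void toy_ertm_recv_old(struct toy_sock *sk)
{
    if (sk->rx_busy)
        return; /* new skbs are dropped while one is parked */
    if (toy_queue_with_filter(sk) < 0)
        sk->rx_busy = true;
}

/* New ERTM receive path: filter first (before the state machine),
 * then queue without the filter. */
static void toy_ertm_recv_new(struct toy_sock *sk)
{
    if (sk->filter_drops)
        return; /* dropped early; nothing is parked */
    if (sk->rx_busy)
        return;
    if (toy_queue_no_filter(sk) < 0)
        sk->rx_busy = true;
}
```

In the old path a filter drop sets rx_busy even though the buffer has room, and nothing ever clears it; in the new path rx_busy is only set on a genuine buffer shortage.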
    • Bluetooth: Fix memory leak at end of hci requests · 9afee949
      Frederic Dalleau authored
      
      In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
      It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
      pass the skb to the caller, or __hci_req_sync which leaks.
      
      unreferenced object 0xffff880005339a00 (size 256):
        comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
        backtrace:
          [<ffffffff818d89d9>] kmemleak_alloc+0x49/0xa0
          [<ffffffff8116bba8>] kmem_cache_alloc+0x128/0x180
          [<ffffffff8167c1df>] skb_clone+0x4f/0xa0
          [<ffffffff817aa351>] hci_event_packet+0xc1/0x3290
          [<ffffffff8179a57b>] hci_rx_work+0x18b/0x360
          [<ffffffff810692ea>] process_one_work+0x14a/0x440
          [<ffffffff81069623>] worker_thread+0x43/0x4d0
          [<ffffffff8106ead4>] kthread+0xc4/0xe0
          [<ffffffff818dd38f>] ret_from_fork+0x1f/0x40
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.co.uk>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • net: diag: Fix refcnt leak in error path destroying socket · d7226c7a
      David Ahern authored
      inet_diag_find_one_icsk takes a reference to a socket that is not
      released if sock_diag_destroy returns an error. Fix by changing
      tcp_diag_destroy to manage the refcnt for all cases and remove
      the sock_put calls from tcp_abort.
      
      Fixes: c1e64e29 ("net: diag: Support destroying TCP sockets")
      Reported-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
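The fix in the commit above follows a common ownership rule: whoever takes the reference releases it on every path, and the callee never touches the count. A minimal userspace sketch of that rule, with hypothetical names and a plain integer standing in for the socket refcount:

```c
#include <assert.h>

struct toy_sock {
    int refcnt;
};

/* Lookup that takes a reference on the socket it returns. */
static struct toy_sock *toy_find(struct toy_sock *sk)
{
    sk->refcnt++;
    return sk;
}

static void toy_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    sk->refcnt--;
}

/* Destroy operation that may fail; it never touches the refcount. */
static int toy_abort(struct toy_sock *sk, int fail)
{
    (void)sk;
    return fail ? -1 : 0;
}

/* The caller owns the reference and drops it on every path. */
static int toy_diag_destroy(struct toy_sock *sk, int fail)
{
    struct toy_sock *found = toy_find(sk);
    int err = toy_abort(found, fail);

    toy_put(found); /* released whether or not the abort succeeded */
    return err;
}
```

After either a failed or a successful call the count is back at its starting value, which is the property the leak violated.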
    • tun: fix transmit timestamp support · 7b996243
      Soheil Hassas Yeganeh authored
      Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
      software transmit timestamp of a packet.
      
      sock_tx_timestamp resets and overrides the tx_flags of the skb.
      The function is intended to be called from within the protocol
      layer when creating the skb, not from a device driver. This is
      inconsistent with other drivers and will cause issues for TCP.
      
      In TCP, we intend to sample the timestamps for the last byte
      for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
      tcp_tx_timestamp only with the last skb that it generates.
      For example, if a 128KB message is split into two 64KB packets
      we want to sample the SND timestamp of the last packet. The current
      code in the tun driver, however, will result in sampling the SND
      timestamp for both packets.
      
      Also, when the last packet is split into smaller packets for
      retransmission (see tcp_fragment), the tun driver will record
      timestamps for all of the retransmitted packets and not only the
      last packet.
      
      Fixes: eda29772 (tun: Support software transmit time stamping.)
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Francis Yan <francisyyan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
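The 128KB example from the commit above can be reproduced with a toy model in which the protocol layer marks only the last packet for timestamping. The flag value, struct, and function names are hypothetical stand-ins; the sketch only illustrates why overriding the per-packet flags in the driver samples too many timestamps.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TX_SW_TSTAMP 0x1u /* hypothetical flag: sample a software SND timestamp */

struct toy_skb {
    unsigned int tx_flags; /* set by the protocol layer when the skb is built */
    bool stamped;          /* a timestamp was recorded for this skb */
};

/* Protocol-layer style helper: overwrites tx_flags from the socket setting. */
static void toy_sock_tx_timestamp(bool sock_wants_ts, struct toy_skb *skb)
{
    skb->tx_flags = sock_wants_ts ? TOY_TX_SW_TSTAMP : 0;
}

/* Driver style helper: honours whatever the protocol layer already set. */
static void toy_skb_tx_timestamp(struct toy_skb *skb)
{
    if (skb->tx_flags & TOY_TX_SW_TSTAMP)
        skb->stamped = true;
}

/* Transmit n packets; returns how many were timestamped.
 * override_flags models the old driver behaviour. */
static unsigned int toy_xmit(struct toy_skb *skbs, unsigned int n,
                             bool override_flags)
{
    unsigned int stamped = 0;

    assert(skbs || n == 0);
    for (unsigned int i = 0; i < n; i++) {
        if (override_flags)
            toy_sock_tx_timestamp(true, &skbs[i]);
        toy_skb_tx_timestamp(&skbs[i]);
        if (skbs[i].stamped)
            stamped++;
    }
    return stamped;
}
```

With two packets of which only the last is flagged, the overriding path stamps both, while the flag-honouring path stamps only the last one.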
  6. Aug 23, 2016
  7. Aug 22, 2016
    • qed: FLR of active VFs might lead to FW assert · 4870e704
      Yuval Mintz authored
      Driver never bothered marking the VF's vport with the VF's sw_fid.
      As a result, FLR flows are not going to clean those vports.
      
      If the vport was active when FLRed, re-activating it would lead
      to a FW assertion.
      
      Fixes: dacd88d6 ("qed: IOV l2 functionality")
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset · c0451fe1
      Shmulik Ladkani authored
      In b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen
      exceeds MTU, allow segmentation for local udp tunneled skbs"),
      gso skbs arriving from an ingress interface that go through UDP
      tunneling are allowed to be fragmented if the resulting encapsulated
      segments exceed the dst mtu of the egress interface.
      
      This aligned the behavior of gso skbs to non-gso skbs going through udp
      encapsulation path.
      
      However the non-gso vs gso anomaly is present also in the following
      cases of a GRE tunnel:
       - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
         (e.g. OvS vport-gre with df_default=false)
       - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
      
      In both of the above cases, the non-gso skbs get fragmented, whereas the
      gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
      as they don't go through the segment+fragment code path.
      
      Fix: Set IPSKB_FRAG_SEGS if the tunnel-specified IP_DF bit is NOT set.

      Tunnels that do set IP_DF will not go to fragmentation of segments.
      This preserves the behavior of ip_gre in (the default) pmtudisc mode.
      
      Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
      Reported-by: wenxu <wenxu@ucloud.cn>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Tested-by: wenxu <wenxu@ucloud.cn>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
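The rule stated in the commit above reduces to a small decision on two inputs: whether the GSO segment length exceeds the MTU, and whether the tunnel set IP_DF. The sketch below is a deliberate simplification with hypothetical names; it does not reproduce ip_finish_output_gso, and the "not fragmented" verdict only records that the segment+fragment path is skipped.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_verdict {
    TOY_PASS,                 /* segments already fit the MTU */
    TOY_SEGMENT_AND_FRAGMENT, /* IPSKB_FRAG_SEGS-like marking applies */
    TOY_NOT_FRAGMENTED        /* DF set: segments are not fragmented */
};

/* Decide how an oversized GSO skb from a tunnel is treated. */
static enum toy_verdict toy_gso_output(unsigned int seglen, unsigned int mtu,
                                       bool tunnel_sets_df)
{
    assert(mtu > 0);
    if (seglen <= mtu)
        return TOY_PASS;
    if (!tunnel_sets_df)
        return TOY_SEGMENT_AND_FRAGMENT;
    return TOY_NOT_FRAGMENTED;
}
```

A tunnel in nopmtudisc mode corresponds to tunnel_sets_df being false, so its oversized segments take the segment+fragment path, while the default pmtudisc mode keeps its previous behaviour.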
    • net: ipv6: Remove addresses for failures with strict DAD · 85b51b12
      Mike Manning authored
      
      If DAD fails with accept_dad set to 2, global addresses and host routes
      are incorrectly left in place. Even though disable_ipv6 is set,
      contrary to documentation, the addresses are not dynamically deleted
      from the interface. It is only on a subsequent link down/up that these
      are removed. The fix is not only to set the disable_ipv6 flag, but
      also to call addrconf_ifdown(), which is the action to carry out when
      disabling IPv6. This results in the addresses and routes being deleted
      immediately. The DAD failure for the LL addr is determined as before
      via netlink, or by the absence of the LL addr (which also previously
      would have had to be checked for in case of an intervening link down
      and up). As the call to addrconf_ifdown() requires an rtnl lock, the
      logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().
      
      Previous behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2000::10/64 scope global
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
             valid_lft forever preferred_lft forever
      root@vm1:/# ip -6 route show dev eth3
      2000::/64  proto kernel  metric 256
      fe80::/64  proto kernel  metric 256
      root@vm1:/# ip link set down eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      
      New behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • include/uapi/linux/ipx.h: fix conflicting definitions with glibc netipx/ipx.h · 53dc65d4
      Mikko Rapeli authored
      
      Fixes these compiler errors, via libc-compat.h, when glibc netipx/ipx.h is
      included before linux/ipx.h:
      
      ./linux/ipx.h:9:8: error: redefinition of ‘struct sockaddr_ipx’
      ./linux/ipx.h:26:8: error: redefinition of ‘struct ipx_route_definition’
      ./linux/ipx.h:32:8: error: redefinition of ‘struct ipx_interface_definition’
      ./linux/ipx.h:49:8: error: redefinition of ‘struct ipx_config_data’
      ./linux/ipx.h:58:8: error: redefinition of ‘struct ipx_route_def’
      
      Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
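The general shape of a libc-compat style guard can be shown in a single file. Everything below uses invented TOY_ names to stand in for the libc header, the compat header, and the uapi header; it illustrates the pattern under those assumptions rather than the actual macros added by the commit.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the libc header having been included first. */
#define TOY_LIBC_DEFINES_SOCKADDR_IPX 1
struct toy_sockaddr_ipx {
    unsigned short sipx_family;
    unsigned char sipx_node[6];
};

/* Stand-in for the compat header: decide whether the kernel
 * header should still define the struct. */
#if TOY_LIBC_DEFINES_SOCKADDR_IPX
#define TOY_UAPI_DEF_SOCKADDR_IPX 0
#else
#define TOY_UAPI_DEF_SOCKADDR_IPX 1
#endif

/* Stand-in for the uapi header: define the struct only when allowed,
 * which avoids the "redefinition of struct" error. */
#if TOY_UAPI_DEF_SOCKADDR_IPX
struct toy_sockaddr_ipx {
    unsigned short sipx_family;
    unsigned char sipx_node[6];
};
#endif

/* Uses the struct, whichever header ended up defining it. */
static size_t toy_node_len(void)
{
    struct toy_sockaddr_ipx sa;

    memset(&sa, 0, sizeof(sa));
    return sizeof(sa.sipx_node);
}
```

Without the guard, the second definition would be compiled unconditionally and the translation unit would fail in the same way as the errors quoted in the commit message.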
    • include/uapi/linux/openvswitch.h: use __u32 from linux/types.h · a1d1f65f
      Mikko Rapeli authored
      
      Kernel uapi headers are supposed to use the types from linux/types.h. Fixes this userspace compile error:
      
      linux/openvswitch.h:583:2: error: unknown type name ‘uint32_t’
      
      Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>