  1. Jul 16, 2020
    • bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() · baef8d10
      Kees Cook authored
      commit 63960260 upstream.
      
      When evaluating access control over kallsyms visibility, credentials at
      open() time need to be used, not the "current" creds (though in BPF's
      case, this has likely always been the same). Plumb access to associated
      file->f_cred down through bpf_dump_raw_ok() and its callers now that
      kallsyms_show_value() has been refactored to take struct cred.
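
      A minimal user-space sketch of the idea follows. It is not the kernel
      code: struct demo_cred, struct demo_file and both helpers are
      hypothetical, and the real check also depends on settings that are left
      out here. It only shows the decision being made from the credentials
      captured at open() time rather than from whoever is running now.

```c
#include <stdbool.h>

/* Hypothetical, simplified credential: one flag standing in for CAP_SYSLOG. */
struct demo_cred {
    bool cap_syslog;
};

/* Hypothetical open file that remembers the opener's credentials. */
struct demo_file {
    const struct demo_cred *f_cred;
};

/* Decide visibility from the credentials passed in, not from a global "current". */
bool demo_dump_raw_ok(const struct demo_cred *cred)
{
    return cred->cap_syslog;
}

/* Callers plumb the open()-time credentials down from the file. */
bool demo_show_raw(const struct demo_file *file)
{
    return demo_dump_raw_ok(file->f_cred);
}
```

      With this shape, a file opened by a privileged task keeps giving the
      same answer even if a different, unprivileged task performs the read.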
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bpf@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: conntrack: refetch conntrack after nf_conntrack_update() · 2d156633
      Pablo Neira Ayuso authored
      [ Upstream commit d005fbb8 ]
      
      __nf_conntrack_update() might refresh the conntrack object that is
      attached to the skbuff. Refetch the conntrack after the call; otherwise,
      using the stale pointer triggers a use-after-free (UAF).
      
      [  633.200434] ==================================================================
      [  633.200472] BUG: KASAN: use-after-free in nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200478] Read of size 1 at addr ffff888370804c00 by task nfqnl_test/6769
      
      [  633.200487] CPU: 1 PID: 6769 Comm: nfqnl_test Not tainted 5.8.0-rc2+ #388
      [  633.200490] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
      [  633.200491] Call Trace:
      [  633.200499]  dump_stack+0x7c/0xb0
      [  633.200526]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200532]  print_address_description.constprop.6+0x1a/0x200
      [  633.200539]  ? _raw_write_lock_irqsave+0xc0/0xc0
      [  633.200568]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200594]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200598]  kasan_report.cold.9+0x1f/0x42
      [  633.200604]  ? call_rcu+0x2c0/0x390
      [  633.200633]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200659]  nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200687]  ? nf_conntrack_find_get+0x30/0x30 [nf_conntrack]
      
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1436
      Fixes: ee04805f ("netfilter: conntrack: make conntrack userspace helpers work again")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: ipset: call ip_set_free() instead of kfree() · 4f412ae8
      Eric Dumazet authored
      [ Upstream commit c4e8fa90 ]
      
      Whenever ip_set_alloc() is used, the allocated memory can come from
      either kmalloc() or vmalloc(). It must therefore be freed with kvfree()
      or ip_set_free(), not kfree().
      
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 21935 Comm: syz-executor.3 Not tainted 5.8.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__phys_addr+0xa7/0x110 arch/x86/mm/physaddr.c:28
      Code: 1d 7a 09 4c 89 e3 31 ff 48 d3 eb 48 89 de e8 d0 58 3f 00 48 85 db 75 0d e8 26 5c 3f 00 4c 89 e0 5b 5d 41 5c c3 e8 19 5c 3f 00 <0f> 0b e8 12 5c 3f 00 48 c7 c0 10 10 a8 89 48 ba 00 00 00 00 00 fc
      RSP: 0000:ffffc900018572c0 EFLAGS: 00010046
      RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc9000fac3000
      RDX: 0000000000040000 RSI: ffffffff8133f437 RDI: 0000000000000007
      RBP: ffffc90098aff000 R08: 0000000000000000 R09: ffff8880ae636cdb
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000408018aff000
      R13: 0000000000080000 R14: 000000000000001d R15: ffffc900018573d8
      FS:  00007fc540c66700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc9dcd67200 CR3: 0000000059411000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       virt_to_head_page include/linux/mm.h:841 [inline]
       virt_to_cache mm/slab.h:474 [inline]
       kfree+0x77/0x2c0 mm/slab.c:3749
       hash_net_create+0xbb2/0xd70 net/netfilter/ipset/ip_set_hash_gen.h:1536
       ip_set_create+0x6a2/0x13c0 net/netfilter/ipset/ip_set_core.c:1128
       nfnetlink_rcv_msg+0xbe8/0xea0 net/netfilter/nfnetlink.c:230
       netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
       nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:564
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45cb19
      Code: Bad RIP value.
      RSP: 002b:00007fc540c65c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004fed80 RCX: 000000000045cb19
      RDX: 0000000000000000 RSI: 0000000020001080 RDI: 0000000000000003
      RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000095e R14: 00000000004cc295 R15: 00007fc540c666d4
      
      Fixes: f66ee041 ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
      Fixes: 03c8b234 ("netfilter: ipset: Generalize extensions support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf, sockmap: RCU dereferenced psock may be used outside RCU block · b709a08b
      John Fastabend authored
      [ Upstream commit 8025751d ]
      
      If an ingress verdict program specifies message sizes greater than
      skb->len and there is an ENOMEM error due to memory pressure we
      may call the rcv_msg handler outside the strp_data_ready() caller
      context. This is because on an ENOMEM error the strparser will
      retry from a workqueue. The caller currently protects the use of
      psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
      block.
      
      But, in the above workqueue error case, the psock is accessed outside
      the read_lock/unlock block of the caller. So instead of using
      psock directly we must do a look up against the sk again to
      ensure the psock is available.
      
      There is an ugly piece here where we must handle
      the case where we paused the strp and removed the psock. On
      psock removal we first pause the strparser and then remove
      the psock. If the strparser is paused while an skb is
      scheduled on the workqueue the skb will be dropped on the
      floor and kfree_skb() is called. If the workqueue manages
      to get called before we pause the strparser but runs the rcvmsg
      callback after the psock is removed we will hit the unlikely
      case where we run the sockmap rcvmsg handler but do not have
      a psock. For now we will follow strparser logic and drop the
      skb on the floor with kfree_skb(). This is ugly because the
      data is dropped. To date this has not caused problems in practice
      because either the application controlling the sockmap is
      coordinating with the datapath so that skbs are "flushed"
      before removal or we simply wait for the sock to be closed before
      removing it.
      
      This patch fixes the described RCU bug and dropping the skb doesn't
      make things worse. Future patches will improve this by allowing
      the normal case where skbs are not merged to skip the strparser
      altogether. In practice many (most?) use cases have no need to
      merge skbs so it's both a code complexity hit as seen above and
      a performance issue. For example, in the Cilium case we always
      set the strparser up to return skbs 1:1 without any merging and
      have avoided the above issues.
      
      Fixes: e91de6af ("bpf: Fix running sk_skb program types with ktls")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf, sockmap: RCU splat with redirect and strparser error or TLS · 2000bb54
      John Fastabend authored
      [ Upstream commit 93dd5f18 ]
      
      There are two paths that generate the RCU splat below. The first and
      most obvious is the result of the BPF verdict program issuing a
      redirect on a TLS socket (this is the splat shown below). Unlike
      the non-TLS case, the caller of the *strp_read() hooks does not
      wrap the call in rcu_read_lock/unlock. Then, if the BPF program
      issues a redirect action, we hit the RCU splat.
      
      However, in the non-TLS socket case the splat appears to be
      relatively rare, because the skmsg caller into the strp_data_ready()
      is wrapped in a rcu_read_lock/unlock. Shown here,
      
       static void sk_psock_strp_data_ready(struct sock *sk)
       {
      	struct sk_psock *psock;
      
      	rcu_read_lock();
      	psock = sk_psock(sk);
      	if (likely(psock)) {
      		if (tls_sw_has_ctx_rx(sk)) {
      			psock->parser.saved_data_ready(sk);
      		} else {
      			write_lock_bh(&sk->sk_callback_lock);
      			strp_data_ready(&psock->parser.strp);
      			write_unlock_bh(&sk->sk_callback_lock);
      		}
      	}
      	rcu_read_unlock();
       }
      
      If the above was the only way to run the verdict program we
      would be safe. But, there is a case where the strparser may throw an
      ENOMEM error while parsing the skb. This is a result of a failed
      skb_clone, or alloc_skb_for_msg while building a new merged skb when
      the msg length needed spans multiple skbs. This will in turn put the
      skb on the strp_wrk workqueue in the strparser code. The skb will
      later be dequeued and verdict programs run, but now from a
      different context without the rcu_read_lock()/unlock() critical
      section in sk_psock_strp_data_ready() shown above. In practice
      I have not seen this yet, because as far as I know most users of the
      verdict programs are also only working on single skbs. In this case no
      merge happens which could trigger the above ENOMEM errors. In addition
      the system would need to be under memory pressure. For example, we
      can't hit the above case in selftests because we missed having tests
      to merge skbs. (Added in later patch)
      
      To fix the below splat extend the rcu_read_lock/unlock block to
      include the call to sk_psock_tls_verdict_apply(). This will fix both
      the TLS redirect case and the non-TLS redirect+error case. Also remove
      psock from the sk_psock_tls_verdict_apply() function signature; it is
      not used there.
      
      [ 1095.937597] WARNING: suspicious RCU usage
      [ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G        W
      [ 1095.944363] -----------------------------
      [ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
      [ 1095.950866]
      [ 1095.950866] other info that might help us debug this:
      [ 1095.950866]
      [ 1095.957146]
      [ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
      [ 1095.961482] 1 lock held by test_sockmap/15970:
      [ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
      [ 1095.968568]
      [ 1095.968568] stack backtrace:
      [ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f1ca79 #1
      [ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [ 1095.980519] Call Trace:
      [ 1095.982191]  dump_stack+0x8f/0xd0
      [ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
      [ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
      [ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]
      
      v2: Improve commit message to identify non-TLS redirect plus error case
          condition as well as more common TLS case. In the process I decided
          doing the rcu_read_unlock followed by the lock/unlock inside branches
          was unnecessarily complex. We can just extend the current rcu block
          and get the same effect without the shuffling and branching.
          Thanks Martin!
      
      Fixes: e91de6af ("bpf: Fix running sk_skb program types with ktls")
      Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nl80211: don't return err unconditionally in nl80211_start_ap() · a062088e
      Luca Coelho authored
      [ Upstream commit bc7a39b4 ]
      
      When a memory leak was fixed, a return err was changed to goto err,
      but, accidentally, the if (err) was removed, so now we always exit at
      this point.
      
      Fix it by adding if (err) back.
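
      The control-flow mistake can be reproduced in a few lines of user-space
      C. This is only a sketch of the pattern, not the nl80211 code:
      parse_step(), start_buggy() and start_fixed() are hypothetical names,
      and -22 / -12 are just errno-style placeholder values.

```c
#include <stdlib.h>

/* Hypothetical setup step: returns 0 on success or a negative error. */
static int parse_step(int fail)
{
    return fail ? -22 : 0;
}

/* Buggy shape: the jump to the cleanup label is unconditional, so the
 * work after it never runs, even when parse_step() succeeded. */
int start_buggy(int fail, int *started)
{
    int err;
    char *buf = malloc(16);

    if (!buf)
        return -12;
    err = parse_step(fail);
    goto out;
    *started = 1; /* unreachable */
out:
    free(buf);
    return err;
}

/* Fixed shape: only take the cleanup exit early when there was an error. */
int start_fixed(int fail, int *started)
{
    int err;
    char *buf = malloc(16);

    if (!buf)
        return -12;
    err = parse_step(fail);
    if (err)
        goto out;
    *started = 1;
out:
    free(buf);
    return err;
}
```

      Both versions free the buffer on every path, which is what the earlier
      leak fix was after; only the fixed one still reaches the work that
      follows a successful step.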
      
      Fixes: 9951ebfc ("nl80211: fix potential leak in AP start")
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Link: https://lore.kernel.org/r/iwlwifi.20200626124931.871ba5b31eee.I97340172d92164ee92f3c803fe20a8a6e97714e1@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: qrtr: Fix an out of bounds read qrtr_endpoint_post() · 91f8d05b
      Dan Carpenter authored
      commit 8ff41cc2 upstream.
      
      This code assumes that the user passed in enough data for a
      qrtr_hdr_v1 or qrtr_hdr_v2 struct, but it's not necessarily true.  If
      the buffer is too small then it will read beyond the end.
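
      A hedged sketch of the kind of length check this calls for, in plain
      user-space C. The header layouts below (struct demo_hdr_v1 and
      struct demo_hdr_v2) and the version numbers 1 and 3 are made up for
      illustration and are not the real qrtr wire format; the point is only
      that the length is validated before each header is read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical headers; the first byte selects the version. */
struct demo_hdr_v1 {
    uint8_t version;
    uint8_t type;
    uint16_t size;
    uint32_t src;
};

struct demo_hdr_v2 {
    uint8_t version;
    uint8_t type;
    uint16_t size;
};

/* Returns 0 and stores the payload size, or -1 if the buffer is too short
 * for the header that its version byte announces. */
int demo_parse(const void *data, size_t len, uint16_t *size_out)
{
    const uint8_t *p = data;

    if (len < 1)
        return -1; /* cannot even read the version byte */

    switch (p[0]) {
    case 1: {
        struct demo_hdr_v1 h;

        if (len < sizeof(h))
            return -1; /* would read past the end of the buffer */
        memcpy(&h, data, sizeof(h));
        *size_out = h.size;
        return 0;
    }
    case 3: {
        struct demo_hdr_v2 h;

        if (len < sizeof(h))
            return -1;
        memcpy(&h, data, sizeof(h));
        *size_out = h.size;
        return 0;
    }
    default:
        return -1; /* unknown version */
    }
}
```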
      
      Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Reported-by: <syzbot+b8fe393f999a291a9ea6@syzkaller.appspotmail.com>
      Fixes: 194ccc88 ("net: qrtr: Support decoding incoming v2 packets")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Jul 09, 2020
    • rxrpc: Fix afs large storage transmission performance drop · 53e9b626
      David Howells authored
      [ Upstream commit 02c28dff ]
      
      Commit 2ad6691d, which moved the modification of the status annotation
      for a packet in the Tx buffer to before the retransmission, moved the state
      clearance but lost the bit that set it to UNACK.
      
      Consequently, if a retransmission occurs, the packet is accidentally
      changed to the ACK state (ie. 0) by masking it off, which means that the
      packet isn't counted towards the tally of newly-ACK'd packets if it gets
      hard-ACK'd.  This then prevents the congestion control algorithm from
      recovering properly.
      
      Fix by reinstating the change of state to UNACK.
      
      Spotted by the generic/460 xfstest.
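
      The lost state change can be illustrated with a small bit-mask sketch.
      The DEMO_* names and values are placeholders chosen for this example
      (only "ACK is 0" comes from the description above), and the two helpers
      are hypothetical, not the rxrpc functions.

```c
#include <stdint.h>

/* Illustrative annotation layout: low two bits hold the state. */
#define DEMO_ANNO_ACK     0u
#define DEMO_ANNO_UNACK   1u
#define DEMO_ANNO_RETRANS 3u
#define DEMO_ANNO_MASK    0x03u
#define DEMO_ANNO_LAST    0x04u /* unrelated flag that must survive */

/* Buggy shape: the state bits are cleared but nothing is OR-ed back in,
 * so the packet silently ends up in the ACK (0) state. */
uint8_t demo_mark_buggy(uint8_t anno)
{
    return (uint8_t)(anno & ~DEMO_ANNO_MASK);
}

/* Fixed shape: clear the state bits, then set the state to UNACK. */
uint8_t demo_mark_fixed(uint8_t anno)
{
    return (uint8_t)((anno & ~DEMO_ANNO_MASK) | DEMO_ANNO_UNACK);
}
```

      In both helpers the flag bits outside the mask are preserved; only the
      state field differs.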
      
      Fixes: 2ad6691d ("rxrpc: Fix race between incoming ACK parser and retransmitter")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rxrpc: Fix race between incoming ACK parser and retransmitter · 0ff5b1b5
      David Howells authored
      [ Upstream commit 2ad6691d ]
      
      There's a race between the retransmission code and the received ACK parser.
      The problem is that the retransmission loop has to drop the lock under
      which it is iterating through the transmission buffer in order to transmit
      a packet, but whilst the lock is dropped, the ACK parser can crank the Tx
      window round and discard the packets from the buffer.
      
      The retransmission code then updated the annotations for the wrong packet
      and a later retransmission thought it had to retransmit a packet that
      wasn't there, leading to a NULL pointer dereference.
      
      Fix this by:
      
       (1) Moving the annotation change to before we drop the lock prior to
           transmission.  This means we can't vary the annotation depending on
           the outcome of the transmission, but that's fine - we'll retransmit
           again later if it failed now.
      
       (2) Skipping the packet if the skb pointer is NULL.
      
      The following oops was seen:
      
      	BUG: kernel NULL pointer dereference, address: 000000000000002d
      	Workqueue: krxrpcd rxrpc_process_call
      	RIP: 0010:rxrpc_get_skb+0x14/0x8a
      	...
      	Call Trace:
      	 rxrpc_resend+0x331/0x41e
      	 ? get_vtime_delta+0x13/0x20
      	 rxrpc_process_call+0x3c0/0x4ac
      	 process_one_work+0x18f/0x27f
      	 worker_thread+0x1a3/0x247
      	 ? create_worker+0x17d/0x17d
      	 kthread+0xe6/0xeb
      	 ? kthread_delayed_work_timer_fn+0x83/0x83
      	 ret_from_fork+0x1f/0x30
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. Jul 07, 2020
  4. Jul 01, 2020
  5. Jun 30, 2020
    • xprtrdma: Fix handling of RDMA_ERROR replies · de1d70da
      Chuck Lever authored
      commit 7b2182ec upstream.
      
      The RPC client currently doesn't handle ERR_CHUNK replies correctly.
      rpcrdma_complete_rqst() incorrectly passes a negative number to
      xprt_complete_rqst() as the number of bytes copied. Instead, set
      task->tk_status to the error value, and return zero bytes copied.
      
      In these cases, return -EIO rather than -EREMOTEIO. The RPC client's
      finite state machine doesn't know what to do with -EREMOTEIO.
      
      Additional clean ups:
      - Don't double-count RDMA_ERROR replies
      - Remove a stale comment
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment() · 7b99577f
      Chuck Lever authored
      commit 89a3c9f5 upstream.
      
      @subbuf is an output parameter of xdr_buf_subsegment(). A survey of
      call sites shows that @subbuf is always uninitialized before
      xdr_buf_subsegment() is invoked by callers.
      
      There are some execution paths through xdr_buf_subsegment() that do
      not set all of the fields in @subbuf, leaving some pointer fields
      containing garbage addresses. Subsequent processing of that buffer
      then results in a page fault.
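
      One simple way to make an output parameter safe on every path is to
      clear it before doing anything else. The sketch below shows that
      technique in user-space C; it is not necessarily how the patch itself
      sets the fields, and struct demo_buf and demo_subsegment() are
      hypothetical stand-ins rather than the SUNRPC types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part buffer, loosely modelled on a head/tail layout. */
struct demo_buf {
    const char *head;
    size_t head_len;
    const char *tail;
    size_t tail_len;
};

/* Carve [base, base + len) out of src->head into *sub.
 * *sub is treated purely as an output: it is cleared first, so every
 * return path leaves all of its fields in a defined state. */
int demo_subsegment(const struct demo_buf *src, struct demo_buf *sub,
                    size_t base, size_t len)
{
    memset(sub, 0, sizeof(*sub));

    if (base > src->head_len || len > src->head_len - base)
        return -1; /* out of range; *sub stays zeroed */

    sub->head = src->head + base;
    sub->head_len = len;
    /* tail and tail_len were never touched here, but are already zero */
    return 0;
}
```

      A caller can then pass an uninitialized struct and still rely on every
      pointer field being either valid or NULL afterwards.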
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: fixed rollback in rpc_gssd_dummy_populate() · c27d205b
      Vasily Averin authored
      commit b7ade381 upstream.
      
      __rpc_depopulate(gssd_dentry) was lost on the error path.
      
      Cc: stable@vger.kernel.org
      Fixes: 4b9a445e ("sunrpc: create a new dummy pipe for gssd to hold open")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ipset: fix unaligned atomic access · ae6d80f6
      Russell King authored
      [ Upstream commit 71502846 ]
      
      When using ip_set with counters and comment, traffic causes the kernel
      to panic on 32-bit ARM:
      
      Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
      Unhandled fault: alignment exception (0x221) at 0xea08133c
      PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
      
      The problem occurs when we try to update the 64-bit counters - the
      faulting address above is not 64-bit aligned.  The problem occurs
      due to the way elements are allocated, for example:
      
      	set->dsize = ip_set_elem_len(set, tb, 0, 0);
      	map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
      
      If the element has a requirement for a member to be 64-bit aligned,
      and set->dsize is not a multiple of 8, but is a multiple of four,
      then every odd-numbered element will be misaligned - and hitting
      an atomic64_add() on that element will cause the kernel to panic.
      
      ip_set_elem_len() must return a size that is rounded to the maximum
      alignment of any extension field stored in the element.  This change
      ensures that is the case.
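
      The arithmetic is easy to check in isolation. The helpers below are a
      hypothetical user-space sketch (they are not ip_set_elem_len()), and
      they assume the array itself starts at an 8-byte-aligned address.

```c
#include <stddef.h>

/* Round len up to a multiple of align; align must be a power of two. */
size_t demo_round_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Byte offset of element idx in a packed array of dsize-byte elements. */
size_t demo_elem_offset(size_t dsize, size_t idx)
{
    return idx * dsize;
}
```

      With a 12-byte element, element 1 starts at offset 12, which is 4 bytes
      past an 8-byte boundary. Rounding the element size up to 16 puts every
      element back on an 8-byte boundary.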
      
      Fixes: 95ad1f4a ("netfilter: ipset: Fix extension alignment")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rxrpc: Fix handling of rwind from an ACK packet · d6fb7f45
      David Howells authored
      [ Upstream commit a2ad7c21 ]
      
      The handling of the receive window size (rwind) from a received ACK packet
      is not correct.  The rxrpc_input_ackinfo() function currently checks the
      current Tx window size against the rwind from the ACK to see if it has
      changed, but then limits the rwind size before storing it in the tx_winsize
      member and, if it increased, wakes up the transmitting process.  This means
      that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
      followed.
      
      Fix this by limiting rwind before we compare it to tx_winsize.
      
      The effect of this can be seen by enabling the rxrpc_rx_rwind_change
      tracepoint.
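
      The ordering problem can be sketched as follows. Everything here is
      hypothetical: struct demo_call, the changes counter and DEMO_MAX_WINDOW
      (standing in for RXRPC_RXTX_BUFF_SIZE - 1, with an assumed value) exist
      only for this example.

```c
/* Assumed upper bound for the stored window size. */
#define DEMO_MAX_WINDOW 63u

struct demo_call {
    unsigned int tx_winsize;
    unsigned int changes; /* how often the "window changed" path ran */
};

/* Buggy order: compare the raw rwind first, clamp afterwards. */
void demo_update_buggy(struct demo_call *c, unsigned int rwind)
{
    if (c->tx_winsize != rwind) {
        if (rwind > DEMO_MAX_WINDOW)
            rwind = DEMO_MAX_WINDOW;
        c->tx_winsize = rwind;
        c->changes++;
    }
}

/* Fixed order: clamp first, then compare against the stored value. */
void demo_update_fixed(struct demo_call *c, unsigned int rwind)
{
    if (rwind > DEMO_MAX_WINDOW)
        rwind = DEMO_MAX_WINDOW;
    if (c->tx_winsize != rwind) {
        c->tx_winsize = rwind;
        c->changes++;
    }
}
```

      Feeding the same oversized rwind twice takes the change path twice in
      the buggy version but only once in the fixed one.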
      
      Fixes: 702f2ac8 ("rxrpc: Wake up the transmitter if Rx window size increases on the peer")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfrm: Fix double ESP trailer insertion in IPsec crypto offload. · 41b2debf
      Huy Nguyen authored
      [ Upstream commit 94579ac3 ]
      
      During IPsec performance testing, we see bad ICMP checksums. The bad packet
      has a duplicated ESP trailer due to two validate_xmit_xfrm calls. The first call
      is from ip_output, but the packet cannot be sent because
      netif_xmit_frozen_or_stopped is true, so the packet is requeued via
      dev_requeue_skb. The second call is from the NET_TX softirq. However, after
      the first call, the packet already has the ESP trailer.
      
      Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
      validate_xmit_xfrm to avoid duplicate ESP trailer insertion.
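
      The marking scheme amounts to making the transform idempotent. A
      minimal sketch, assuming a hypothetical struct demo_pkt and a made-up
      DEMO_XMIT_DONE flag in place of the skb and the XFRM_XMIT bit:

```c
/* Hypothetical "already processed" flag. */
#define DEMO_XMIT_DONE 0x1u

struct demo_pkt {
    unsigned int flags;
    unsigned int trailers; /* number of trailers appended so far */
};

/* Append the trailer only the first time the packet passes through. */
void demo_validate_xmit(struct demo_pkt *p)
{
    if (p->flags & DEMO_XMIT_DONE)
        return; /* requeued packet: the trailer is already there */

    p->trailers++;
    p->flags |= DEMO_XMIT_DONE;
}
```

      A packet that is requeued and validated a second time then still
      carries exactly one trailer.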
      
      Fixes: f6e27114 ("net: Add a xfrm validate function to validate_xmit_skb")
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: export __netdev_watchdog_up() · 32e5a15f
      Valentin Longchamp authored
      [ Upstream commit 1a3db27a ]
      
      Since the quiesce/activate rework, __netdev_watchdog_up() is directly
      called in the ucc_geth driver.
      
      Unfortunately, this function is not available for modules and thus
      ucc_geth cannot be built as a module anymore. Fix it by exporting
      __netdev_watchdog_up().
      
      Since the commit introducing the regression was backported to stable
      branches, this one should ideally be as well.
      
      Fixes: 79dde73c ("net/ethernet/freescale: rework quiesce/activate for ucc_geth")
      Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT · 052a7fdd
      Neal Cardwell authored
      [ Upstream commit b344579c ]
      
      Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where the
      Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
      ACK when the minimum rtt of a connection goes down. From inspection it
      is clear from the existing code that this could happen in an example
      like the following:
      
      o The first 8 RTT samples in a round trip are 150ms, resulting in a
        curr_rtt of 150ms and a delay_min of 150ms.
      
      o The 9th RTT sample is 100ms. The curr_rtt does not change after the
        first 8 samples, so curr_rtt remains 150ms. But delay_min can be
        lowered at any time, so delay_min falls to 100ms. The code executes
        the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
        of 100ms, and the curr_rtt is declared far enough above delay_min to
        force a (spurious) exit of Slow Start.
      
      The fix here is simple: allow every RTT sample in a round trip to
      lower the curr_rtt.
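
      The example above can be replayed with a small simulation. This is a
      simplified sketch, not the tcp_cubic code: struct demo_hystart and the
      helpers are hypothetical, RTTs are in milliseconds, and the exit
      threshold (delay_min plus one eighth of delay_min) is an assumption
      made for this example.

```c
#define DEMO_MIN_SAMPLES 8u

struct demo_hystart {
    unsigned int curr_rtt;   /* minimum RTT seen in this round, 0 = unset */
    unsigned int delay_min;  /* minimum RTT seen on the connection, 0 = unset */
    unsigned int sample_cnt; /* samples taken in this round */
    int exit_slow_start;     /* set when the delay test fires */
};

/* Assumed delay test: curr_rtt noticeably above delay_min. */
static void demo_check_delay(struct demo_hystart *h)
{
    if (h->curr_rtt > h->delay_min + h->delay_min / 8)
        h->exit_slow_start = 1;
}

/* Buggy: curr_rtt is only updated during the first 8 samples. */
void demo_sample_buggy(struct demo_hystart *h, unsigned int rtt)
{
    if (h->delay_min == 0 || rtt < h->delay_min)
        h->delay_min = rtt;

    if (h->sample_cnt < DEMO_MIN_SAMPLES) {
        if (h->curr_rtt == 0 || rtt < h->curr_rtt)
            h->curr_rtt = rtt;
        h->sample_cnt++;
    } else {
        demo_check_delay(h);
    }
}

/* Fixed: every sample in the round may lower curr_rtt. */
void demo_sample_fixed(struct demo_hystart *h, unsigned int rtt)
{
    if (h->delay_min == 0 || rtt < h->delay_min)
        h->delay_min = rtt;
    if (h->curr_rtt == 0 || rtt < h->curr_rtt)
        h->curr_rtt = rtt;

    if (h->sample_cnt < DEMO_MIN_SAMPLES)
        h->sample_cnt++;
    else
        demo_check_delay(h);
}
```

      Feeding eight 150ms samples followed by one 100ms sample makes the
      buggy version compare a stale curr_rtt of 150ms against a delay_min of
      100ms and exit, while the fixed version compares 100ms against 100ms
      and stays in Slow Start.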
      
      Fixes: ae27e98a ("[TCP] CUBIC v2.3")
      Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: fix a few style nits · 94231513
      Toke Høiland-Jørgensen authored
      [ Upstream commit 3f608f0c ]
      
      I spotted a few nits when comparing the in-tree version of sch_cake with
      the out-of-tree one: A redundant error variable declaration shadowing an
      outer declaration, and an indentation alignment issue. Fix both of these.
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: don't call diffserv parsing code when it is not needed · b1aa7e5f
      Toke Høiland-Jørgensen authored
      [ Upstream commit 8c95eca0 ]
      
      As a further optimisation of the diffserv parsing codepath, we can skip it
      entirely if CAKE is configured to neither use diffserv-based
      classification, nor to zero out the diffserv bits.
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_cake: don't try to reallocate or unshare skb unconditionally · ea2628dd
      Ilya Ponetayev authored
      [ Upstream commit 9208d286 ]
      
      cake_handle_diffserv() tries to linearize mac and network header parts of
      skb and to make it writable unconditionally. In some cases it leads to full
      skb reallocation, which reduces throughput and increases CPU load. Some
      measurements of IPv4 forward + NAPT on a MIPS router with a 580 MHz single-core
      CPU were conducted. It appears that on kernel 4.9 skb_try_make_writable()
      reallocates the skb if it was allocated in the ethernet driver via the so-called
      'build skb' method from page cache (this was first noticed as a strange increase
      of the kmalloc-2048 slab).
      
      Obtain DSCP value via read-only skb_header_pointer() call, and leave
      linearization only for DSCP bleaching or ECN CE setting. And, as an
      additional optimisation, skip diffserv parsing entirely if it is not needed
      by the current configuration.
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
      [ fix a few style issues, reflow commit message ]
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip_tunnel: fix use-after-free in ip_tunnel_lookup() · 3c620826
      Taehee Yoo authored
      [ Upstream commit ba61539c ]
      
      In the datapath, ip_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback (fb) interface is deleted,
      but there is no routine that sets fb_tunnel_dev to NULL.
      So the pointer is still used after the interface is deleted, which
      eventually results in a use-after-free.
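
      As a rough illustration of the pattern (a toy model, not the kernel
      data structures), the sketch below keeps a cached fallback entry next
      to a lookup table and clears it when the fallback device is deleted.
      The TunnelNet class and its methods are hypothetical.

```python
class TunnelNet:
    """Toy per-namespace tunnel table with a cached fallback device."""

    def __init__(self):
        self.tunnels = {}          # key -> device name
        self.fb_tunnel_dev = None  # cached fallback device, may be None

    def add(self, name, key=None, fallback=False):
        if fallback:
            self.fb_tunnel_dev = name
        else:
            self.tunnels[key] = name

    def delete(self, name):
        self.tunnels = {k: v for k, v in self.tunnels.items() if v != name}
        # The missing step: drop the cached pointer together with the device.
        if self.fb_tunnel_dev == name:
            self.fb_tunnel_dev = None

    def lookup(self, key):
        dev = self.tunnels.get(key)
        if dev is not None:
            return dev
        # Fall back only to a device that still exists (None otherwise).
        return self.fb_tunnel_dev
```

      Without the reset in delete(), lookup() would keep handing out a name
      for a device that is gone, which is the analogue of the stale pointer.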
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
      	    remote 10.0.0.2
          ip netns exec A ip link set gre1 up
          ip netns exec A ip a a 10.0.100.1/24 dev gre1
          ip netns exec A ip a a 10.0.0.1/24 dev eth0
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
      	    remote 10.0.0.1
          ip netns exec B ip link set gre1 up
          ip netns exec B ip a a 10.0.100.2/24 dev gre1
          ip netns exec B ip a a 10.0.0.2/24 dev eth1
          ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
          ip netns del B
      
      Splat looks like:
      [   77.793450][    C3] ==================================================================
      [   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
      [   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
      [   77.796398][    C3]
      [   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
      [   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   77.798453][    C3] Call Trace:
      [   77.798815][    C3]  <IRQ>
      [   77.799142][    C3]  dump_stack+0x9d/0xdb
      [   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
      [   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.802145][    C3]  kasan_report+0x154/0x190
      [   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
      [   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
      [   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
      [   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
      [   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
      [   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
      [   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
      [   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
      [ ... ]
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: fix use-after-free in ip6gre_tunnel_lookup() · 568c5aaf
      Taehee Yoo authored
      [ Upstream commit dafabb65 ]
      
      In the datapath, ip6gre_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback (fb) interface is deleted,
      but there is no routine that sets fb_tunnel_dev to NULL.
      So the pointer is still used after the interface is deleted, which
      eventually results in a use-after-free.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
      	    remote fc:0::2
          ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
          ip netns exec A ip link set ip6gre1 up
          ip netns exec A ip -6 a a fc:0::1/64 dev eth0
          ip netns exec A ip link set ip6gre0 up
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
      	    remote fc:0::1
          ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
          ip netns exec B ip link set ip6gre1 up
          ip netns exec B ip -6 a a fc:0::2/64 dev eth1
          ip netns exec B ip link set ip6gre0 up
          ip netns exec A ping fc:100::2 -s 60000 &
          ip netns del B
      
      Splat looks like:
      [   73.087285][    C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.088361][    C1] Read of size 4 at addr ffff888040559218 by task ping/1429
      [   73.089317][    C1]
      [   73.089638][    C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602
      [   73.090531][    C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   73.091725][    C1] Call Trace:
      [   73.092160][    C1]  <IRQ>
      [   73.092556][    C1]  dump_stack+0x96/0xdb
      [   73.093122][    C1]  print_address_description.constprop.6+0x2cc/0x450
      [   73.094016][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.094894][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.095767][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.096619][    C1]  kasan_report+0x154/0x190
      [   73.097209][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.097989][    C1]  ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.098750][    C1]  ? gre_del_protocol+0x60/0x60 [gre]
      [   73.099500][    C1]  gre_rcv+0x1c5/0x1450 [ip6_gre]
      [   73.100199][    C1]  ? ip6gre_header+0xf00/0xf00 [ip6_gre]
      [   73.100985][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.101830][    C1]  ? ip6_input_finish+0x5/0xf0
      [   73.102483][    C1]  ip6_protocol_deliver_rcu+0xcbb/0x1510
      [   73.103296][    C1]  ip6_input_finish+0x5b/0xf0
      [   73.103920][    C1]  ip6_input+0xcd/0x2c0
      [   73.104473][    C1]  ? ip6_input_finish+0xf0/0xf0
      [   73.105115][    C1]  ? rcu_read_lock_held+0x90/0xa0
      [   73.105783][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.106548][    C1]  ipv6_rcv+0x1f1/0x300
      [ ... ]
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: grow window for OOO packets only for SACK flows · fe3a5d8f
      Eric Dumazet authored
      [ Upstream commit 66205121 ]
      
      Back in 2013, we made a change that broke fast retransmit
      for non SACK flows.
      
      Indeed, for these flows, a sender needs to receive three duplicate
      ACKs before starting fast retransmit. ACKs sent with a different
      receive window do not count.
      
      Even if enabling SACK is strongly recommended these days,
      there still are some cases where it has to be disabled.
      
      Not increasing the window seems better than having to
      rely on RTO.
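
      The rule at stake can be modelled in a few lines. This is a simplified
      sketch of how a non-SACK sender might count duplicate ACKs (same
      acknowledgment number, no payload, same advertised window); the Ack
      type and count_dupacks() are illustrative names, not kernel code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    ack: int          # cumulative acknowledgment number
    win: int          # advertised receive window
    payload: int = 0  # bytes of data carried by the segment


def count_dupacks(acks):
    """Return the length of the duplicate-ACK run at the end of `acks`."""
    count = 0
    prev = None
    for a in acks:
        is_dup = (
            prev is not None
            and a.payload == 0
            and a.ack == prev.ack
            and a.win == prev.win   # a changed window makes it a window update
        )
        count = count + 1 if is_dup else 0
        prev = a
    return count


# Receiver keeps the window stable while the hole is open: 3 dupacks.
stable = [Ack(1001, 264)] * 4
# Receiver grows the window on every out-of-order packet: none count.
growing = [Ack(1001, 264), Ack(1001, 266), Ack(1001, 268), Ack(1001, 270)]
```

      With the stable sequence the sender reaches the three-dupack threshold
      and can fast retransmit; with the growing sequence it never does and
      has to wait for the RTO.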
      
      After the fix, the following packetdrill test gives:
      
      // Initialize connection
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
         +0 < . 1:1(0) ack 1 win 514
      
         +0 accept(3, ..., ...) = 4
      
         +0 < . 1:1001(1000) ack 1 win 514
      // Quick ack
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 2001:3001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 3001:4001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 4001:5001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 1001:2001(1000) ack 1 win 514
      // Hole is repaired.
         +0 > . 1:1(0) ack 5001 win 272
      
      Fixes: 4e4f1fc2 ("tcp: properly increase rcv_ssthresh for ofo packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: don't ignore ECN CWR on pure ACK · cb22ce33
      Denis Kirjanov authored
      [ Upstream commit 25702840 ]
      
      There is a problem with the CWR flag set in an incoming ACK segment:
      it leads to a situation where the ECE flag is latched forever.

      The following packetdrill script shows what happens:
      
      // Stack receives incoming segments with CE set
      +0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
      +0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
      +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
      
      // Stack responds with ECN ECHO
      +0.0 >[noecn]  . 1001:1001(0) ack 12001
      +0.0 >[noecn] E. 1001:1001(0) ack 13001
      +0.0 >[noecn] E. 1001:1001(0) ack 14001
      
      // Write a packet
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
      
      // Pure ACK received
      +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
      
      // Since CWR was sent, this packet should NOT have ECE set
      
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0]  P. 2001:3001(1000) ack 14001
      // but Linux will still keep ECE latched here, with packetdrill
      // flagging a missing ECE flag, expecting
      // >[ect0] PE. 2001:3001(1000) ack 14001
      // in the script
      
      In the situation above we will continue to send ECN ECHO packets
      and trigger the peer to reduce the congestion window. To avoid that
      we can check CWR on pure ACKs received.
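
      A minimal state-machine sketch of the latch, under the simplifying
      assumption that the old behaviour only looked at CWR on segments
      carrying data. The EcnReceiver class is hypothetical and leaves out
      the sequence check mentioned in the v3 note.

```python
class EcnReceiver:
    """Toy model of the receiver-side ECE latch."""

    def __init__(self, cwr_on_pure_ack):
        # True models the fixed behaviour, False the behaviour described above.
        self.cwr_on_pure_ack = cwr_on_pure_ack
        self.ece_latched = False

    def receive(self, payload_len, ce=False, cwr=False):
        # CWR from the peer means it has reduced its window: stop echoing.
        if cwr and (payload_len > 0 or self.cwr_on_pure_ack):
            self.ece_latched = False
        # A CE mark (re)arms the latch.
        if ce:
            self.ece_latched = True

    def outgoing_has_ece(self):
        return self.ece_latched
```

      Feeding both variants a CE-marked data segment followed by a pure ACK
      with CWR leaves the latch set only in the variant that ignores CWR on
      pure ACKs.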
      
      v3:
      - Add a sequence check to avoid sending an ACK to an ACK
      
      v2:
      - Adjusted the comment
      - move CWR check before checking for unacknowledged packets
      
      Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket · dc43f7e8
      Marcelo Ricardo Leitner authored
      [ Upstream commit 471e39df ]
      
      If a socket is set ipv6only, it will still send IPv4 addresses in the
      INIT and INIT_ACK packets. This potentially misleads the peer into using
      them, which then would cause association termination.
      
      The fix is to not add IPv4 addresses to ipv6only sockets.
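
      In spirit, the filtering amounts to something like the sketch below.
      It is a userspace illustration using Python's ipaddress module;
      advertised_addresses() is a hypothetical helper, not an SCTP API.

```python
import ipaddress


def advertised_addresses(bound, ipv6only):
    """Return the addresses a socket would list in INIT/INIT_ACK."""
    addrs = [ipaddress.ip_address(a) for a in bound]
    if ipv6only:
        # An ipv6only socket cannot use IPv4, so do not offer it to the peer.
        addrs = [a for a in addrs if a.version == 6]
    return [str(a) for a in addrs]
```

      For a socket bound to 10.0.0.1 and fc00::1, only fc00::1 is offered
      when ipv6only is set, and both are offered otherwise.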
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rxrpc: Fix notification call on completion of discarded calls · fea86448
      David Howells authored
      [ Upstream commit 0041cd5a ]
      
      When preallocated service calls are being discarded, they're passed to
      ->discard_new_call() to have the caller clean up any attached higher-layer
      preallocated pieces before being marked completed.  However, the act of
      marking them completed now invokes the call's notification function - which
      causes a problem because that function might assume that the previously
      freed pieces of memory are still there.
      
      Fix this by setting a dummy notification function on the socket after
      calling ->discard_new_call().
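
      The ordering problem and the dummy-callback remedy can be sketched as
      follows. This is a toy model with hypothetical names (Call,
      dummy_notify, discard_prealloc), not the rxrpc implementation.

```python
class Call:
    """Toy call object that notifies its owner on completion."""

    def __init__(self, notify, user_data):
        self.notify = notify
        self.user_data = user_data
        self.completed = False

    def complete(self):
        self.completed = True
        self.notify(self)  # completion always invokes the notification hook


def dummy_notify(call):
    """No-op hook installed once the owner's state has been released."""


def discard_prealloc(call, discard_new_call):
    discard_new_call(call)      # owner releases its attached state
    call.notify = dummy_notify  # completion must not call back into it
    call.complete()
```

      If the notify hook were left untouched, complete() would call into the
      owner after the owner had already dropped its state.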
      
      This results in the following kasan message when the kafs module is
      removed.
      
      ==================================================================
      BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
      Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21
      
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x18f/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
       rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
       __rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
       __rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
       rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
       rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 6820:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
       kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
       afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
       afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
       afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
       ops_init+0xaf/0x420 net/core/net_namespace.c:151
       setup_net+0x2de/0x860 net/core/net_namespace.c:341
       copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
       create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
       ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
       __do_sys_unshare kernel/fork.c:3051 [inline]
       __se_sys_unshare kernel/fork.c:3049 [inline]
       __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 21:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
       rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      The buggy address belongs to the object at ffff8880946c3800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 484 bytes inside of
       1024-byte region [ffff8880946c3800, ffff8880946c3c00)
      The buggy address belongs to the page:
      page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
      raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Reported-by: <syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com>
      Fixes: 5ac0d622 ("rxrpc: Fix missing notification")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • openvswitch: take into account de-fragmentation/gso_size in execute_check_pkt_len · a908f986
      Lorenzo Bianconi authored
      [ Upstream commit 17843655 ]
      
      The ovs connection tracking module performs de-fragmentation on incoming
      fragmented traffic. Take into account whether traffic has been de-fragmented
      in the execute_check_pkt_len action; otherwise we will perform the wrong
      nested action, considering the original packet size. This issue typically
      occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
      tracking (e.g. OVN stateful ACLs) before the execute_check_pkt_len action.
      Moreover, take into account the GSO fragment size for GSO packets in the
      execute_check_pkt_len routine.
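
      One way to picture the length that should be compared is the toy model
      below. The Packet fields (mru, gso_size, hdr_len) and the helper names
      are assumptions made for illustration and do not mirror the kernel
      structures.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    length: int        # current length, possibly after reassembly
    mac_len: int = 14  # link-layer header length
    mru: int = 0       # largest fragment seen before reassembly, 0 if none
    gso_size: int = 0  # payload bytes per segment, 0 if not a GSO packet
    hdr_len: int = 0   # headers repeated in every GSO segment


def on_wire_len(pkt):
    """Length of the largest frame this packet corresponds to on the wire."""
    if pkt.mru:
        return pkt.mru + pkt.mac_len       # was de-fragmented
    if pkt.gso_size:
        return pkt.hdr_len + pkt.gso_size  # will be segmented later
    return pkt.length


def check_pkt_len(pkt, max_len):
    """Pick which nested action list would run."""
    return "if_less_equal" if on_wire_len(pkt) <= max_len else "if_greater"
```

      A reassembled 4014-byte packet whose largest fragment was 1400 bytes is
      then treated as fitting a 1500-byte limit, whereas comparing the
      reassembled length would select the other action list.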
      
      Fixes: 4d5ec89f ("net: openvswitch: Add a new action check_pkt_len")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: increment xmit_recursion level in dev_direct_xmit() · 67571b1a
      Eric Dumazet authored
      [ Upstream commit 0ad6f6e7 ]
      
      Back in commit f60e5990 ("ipv6: protect skb->sk accesses
      from recursive dereference inside the stack") Hannes added code
      so that the IPv6 stack would not trust skb->sk for typical cases
      where a packet goes through the 'standard' xmit path (__dev_queue_xmit()).

      Alas, af_packet had a dev_direct_xmit() path that was not yet
      dealing with the xmit_recursion level.
      
      Also change sk_mc_loop() to dump a stack once only.
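
      The role of the recursion counter can be sketched in userspace with a
      per-thread counter standing in for the per-CPU one. All names here
      (xmit_recursion, recursion_level, may_trust_socket) are illustrative.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # stands in for per-CPU state


def recursion_level():
    return getattr(_state, "level", 0)


@contextmanager
def xmit_recursion():
    """Bracket a transmit path so nested code can tell it is being re-entered."""
    _state.level = recursion_level() + 1
    try:
        yield
    finally:
        _state.level -= 1


def may_trust_socket():
    # Nested transmit (for example through a stacked device) must not
    # assume the socket attached to the packet belongs to this layer.
    return recursion_level() == 0
```

      Every transmit entry point has to use the bracket; a path that skips it
      leaves nested code believing it runs at the top level.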
      
      Without this patch, syzbot was able to trigger :
      
      [1]
      [  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
      [  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
      [  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
      [  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
      [  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
      [  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
      [  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
      [  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
      [  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
      [  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
      [  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
      [  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
      [  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
      [  153.567391] Call Trace:
      [  153.567391]  ip6_finish_output2+0x34e/0x550
      [  153.567391]  __ip6_finish_output+0xe7/0x110
      [  153.567391]  ip6_finish_output+0x2d/0xb0
      [  153.567392]  ip6_output+0x77/0x120
      [  153.567392]  ? __ip6_finish_output+0x110/0x110
      [  153.567392]  ip6_local_out+0x3d/0x50
      [  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
      [  153.567393]  ? ksize+0x19/0x30
      [  153.567393]  ipvlan_start_xmit+0x18/0x50
      [  153.567393]  dev_direct_xmit+0xf3/0x1c0
      [  153.567393]  packet_direct_xmit+0x69/0xa0
      [  153.567394]  packet_sendmsg+0xbf0/0x19b0
      [  153.567394]  ? plist_del+0x62/0xb0
      [  153.567394]  sock_sendmsg+0x65/0x70
      [  153.567394]  sock_write_iter+0x93/0xf0
      [  153.567394]  new_sync_write+0x18e/0x1a0
      [  153.567395]  __vfs_write+0x29/0x40
      [  153.567395]  vfs_write+0xb9/0x1b0
      [  153.567395]  ksys_write+0xb1/0xe0
      [  153.567395]  __x64_sys_write+0x1a/0x20
      [  153.567395]  do_syscall_64+0x43/0x70
      [  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  153.567396] RIP: 0033:0x453549
      [  153.567396] Code: Bad RIP value.
      [  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
      [  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
      [  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
      [  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
      [  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
      [  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---
      
      f60e5990 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Fix the arp error in some cases · 97a1d2aa
      guodeqing authored
      [ Upstream commit 5eea3a63 ]
      
      i.e.,
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The ARP request address is wrong. This is because fib_table_lookup() in
      fib_check_nh() looks up the nexthop of destination 9.9.9.9, and the scope
      of the fib result is RT_SCOPE_LINK, whereas the correct scope is RT_SCOPE_HOST.
      Add a check of whether the table is RT_TABLE_MAIN to solve this problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: guodeqing <geffrey.guo@huawei.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fix memleak in register_netdevice() · 742f2358
      Yang Yingliang authored
      [ Upstream commit 814152a8 ]
      
      I got a memleak report when doing some fuzz testing:
      
      unreferenced object 0xffff888112584000 (size 13599):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
          00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
          [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
          [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
          [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff888111845cc0 (size 8):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 8 bytes):
          74 61 70 30 00 88 ff ff                          tap0....
        backtrace:
          [<000000004c159777>] kstrdup+0x35/0x70
          [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
          [<00000000494e884a>] kvasprintf_const+0xf1/0x180
          [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
          [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
          [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff88811886d800 (size 512):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
        backtrace:
          [<0000000050315800>] device_add+0x61e/0x1950
          [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If call_netdevice_notifiers() fails, rollback_registered()
      calls netdev_unregister_kobject(), which holds the kobject. The
      reference cannot be put because the netdev won't be added to the todo
      list, so this leads to a memleak. We need to put the reference to
      avoid the memleak.
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Do not clear the sock TX queue in sk_set_socket() · 9e693934
      Tariq Toukan authored
      [ Upstream commit 41b14fb8 ]
      
      Clearing the sock TX queue in sk_set_socket() might cause unexpected
      out-of-order transmit when called from sock_orphan(), as outstanding
      packets can pick a different TX queue and bypass the ones already queued.
      
      This is undesired in general. More specifically, it breaks the in-order
      scheduling property guarantee for device-offloaded TLS sockets.
      
      Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
      explicitly only where needed.
      
      Fixes: e022f0b4 ("net: Introduce sk_tx_queue_mapping")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: bridge: enfore alignment for ethernet address · f32325b1
      Thomas Martitz authored
      [ Upstream commit db7202dec92e6caa2706c21d6fc359af318bde2e ]
      
      The eth_addr member is passed to ether_addr functions that require
      2-byte alignment, therefore the member must be properly aligned
      to avoid unaligned accesses.
      
      The problem has been present since the initial merge of multicast to unicast:
      commit 6db6f0ea ("bridge: multicast to unicast")
      
      Fixes: 6db6f0ea ("bridge: multicast to unicast")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Felix Fietkau <nbd@nbd.name>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Martitz <t.martitz@avm.de>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mld: fix memory leak in ipv6_mc_destroy_dev() · fa0d7e09
      Wang Hai authored
      [ Upstream commit ea2fce88 ]
      
      Commit a84d0164 ("mld: fix memory leak in mld_del_delrec()") fixed
      the memory leak in MLD but missed the ipv6_mc_destroy_dev() path, in
      which mca_sources are leaked after ma_put().
      
      Use ip6_mc_clear_src() to take care of the missing free.
      
      BUG: memory leak
      unreferenced object 0xffff8881113d3180 (size 64):
        comm "syz-executor071", pid 389, jiffies 4294887985 (age 17.943s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 ff 02 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002cbc483c>] kmalloc include/linux/slab.h:555 [inline]
          [<000000002cbc483c>] kzalloc include/linux/slab.h:669 [inline]
          [<000000002cbc483c>] ip6_mc_add1_src net/ipv6/mcast.c:2237 [inline]
          [<000000002cbc483c>] ip6_mc_add_src+0x7f5/0xbb0 net/ipv6/mcast.c:2357
          [<0000000058b8b1ff>] ip6_mc_source+0xe0c/0x1530 net/ipv6/mcast.c:449
          [<000000000bfc4fb5>] do_ipv6_setsockopt.isra.12+0x1b2c/0x3b30 net/ipv6/ipv6_sockglue.c:754
          [<00000000e4e7a722>] ipv6_setsockopt+0xda/0x150 net/ipv6/ipv6_sockglue.c:950
          [<0000000029260d9a>] rawv6_setsockopt+0x45/0x100 net/ipv6/raw.c:1081
          [<000000005c1b46f9>] __sys_setsockopt+0x131/0x210 net/socket.c:2132
          [<000000008491f7db>] __do_sys_setsockopt net/socket.c:2148 [inline]
          [<000000008491f7db>] __se_sys_setsockopt net/socket.c:2145 [inline]
          [<000000008491f7db>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2145
          [<00000000c7bc11c5>] do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
          [<000000005fb7a3f3>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Acked-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa0d7e09
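The leak pattern in this commit (a per-address source list that is not freed before the last reference is dropped) can be modelled in a few lines of userspace C. This is a toy sketch under stated assumptions, not the kernel code: `mc_addr`, `mc_clear_src()`, `mc_put()` and `mc_destroy()` are hypothetical stand-ins for the multicast address object, `ip6_mc_clear_src()`, `ma_put()` and the destroy path, and `live_allocs` exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry of the per-address source list (stand-in for mca_sources). */
struct src_entry {
    struct src_entry *next;
};

/* Hypothetical multicast address object with a reference count. */
struct mc_addr {
    int refcnt;
    struct src_entry *sources;
};

static int live_allocs; /* outstanding allocations, for observation only */

static struct mc_addr *mc_new(void)
{
    struct mc_addr *mc = calloc(1, sizeof(*mc));

    if (!mc)
        return NULL;
    mc->refcnt = 1;
    live_allocs++;
    return mc;
}

/* Prepend one source entry to the address's source list. */
static int mc_add_src(struct mc_addr *mc)
{
    struct src_entry *e = calloc(1, sizeof(*e));

    if (!e)
        return -1;
    e->next = mc->sources;
    mc->sources = e;
    live_allocs++;
    return 0;
}

/* Free every source entry; plays the role of ip6_mc_clear_src(). */
static void mc_clear_src(struct mc_addr *mc)
{
    while (mc->sources) {
        struct src_entry *e = mc->sources;

        mc->sources = e->next;
        free(e);
        live_allocs--;
    }
}

/* Drop one reference; frees only the object itself, not its sources. */
static void mc_put(struct mc_addr *mc)
{
    assert(mc->refcnt > 0);
    if (--mc->refcnt == 0) {
        free(mc);
        live_allocs--;
    }
}

/* Destroy path: clearing the sources before the final put avoids the leak. */
static void mc_destroy(struct mc_addr *mc)
{
    mc_clear_src(mc);
    mc_put(mc);
}
```

Calling `mc_put()` alone on an address that still has sources would free the object and leave the list entries unreachable, which is the kind of leak the backtrace above reports.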
  6. Jun 24, 2020
    • Murali Karicheri's avatar
      net: hsr/prp: Support VLAN tagged supervision frames · a3f4bfae
      Murali Karicheri authored
      
      In HSR/PRP networks, the user may choose to use VLAN-tagged Supervision (SV)
      frames, which the IEC 62439 standard defines as an optional feature. The user
      then needs to configure the VLAN ID, PCP and DEI values to be used in the SV
      frames. This patch updates the driver to optionally accept these values from
      the user as part of the ip link add command and to use the VLAN info to format
      the VLAN tag in the SV frame. It also passes the VLAN ID to the lower device
      so that it may be added to the VLAN filter in h/w, and updates the procfs
      interface to allow the user to enable/disable the SV frame generation logic
      for debugging purposes.
      
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      a3f4bfae
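As an illustration of how the VLAN ID, PCP and DEI values combine into the tag of an SV frame, here is a minimal sketch of the standard 802.1Q layout: a 16-bit TPID of 0x8100 followed by a 16-bit TCI holding PCP in the top 3 bits, DEI in the next bit and the VID in the low 12 bits. The function and macro names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SV_VLAN_TPID 0x8100u /* 802.1Q C-tag protocol identifier */

static_assert(sizeof(uint16_t) == 2, "the TCI is a 16-bit field");

/* Build the 16-bit TCI from its three fields.
 * Returns 0 on success, -1 if any field is out of range. */
static int sv_vlan_tci(unsigned int vid, unsigned int pcp, unsigned int dei,
                       uint16_t *tci)
{
    if (vid > 0x0fffu || pcp > 7u || dei > 1u)
        return -1;
    *tci = (uint16_t)((pcp << 13) | (dei << 12) | vid);
    return 0;
}

/* Write the 4-byte VLAN tag (TPID, then TCI) in network byte order. */
static void sv_vlan_tag_write(uint8_t out[4], uint16_t tci)
{
    out[0] = (uint8_t)(SV_VLAN_TPID >> 8);
    out[1] = (uint8_t)(SV_VLAN_TPID & 0xffu);
    out[2] = (uint8_t)(tci >> 8);
    out[3] = (uint8_t)(tci & 0xffu);
}
```

For example, VID 100 with PCP 7 and DEI 0 gives a TCI of 0xE064, so the tag bytes on the wire are 81 00 E0 64.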
    • Murali Karicheri's avatar
      net: hsr/prp: add procfs interface for snmp agent interface · 36129298
      Murali Karicheri authored
      
      Implement a simple procfs interface to support the snmpd agent
      set/get commands that implement the IEC 62439 MIBs for LRE. This is done
      similarly to the ip/udp/tcp MIBs of MIB-II for stats and such. The procfs
      files are created under the /proc/<interface-name> folder.
      
      This patch also adds some missing hsr/prp modes so that they can
      be set through the SNMP interface.
      
      The console print format is kept the same as the current debugfs
      output format expected by snmpd, so that the same text parser can be
      reused in snmpd even after this migration.
      
      Note that this is a temporary interface that is not accepted by
      the netdev community at LKML and may change in the future.
      
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      36129298
    • Murali Karicheri's avatar
      net: hsr/prp: introduce rx offload support for HSR/PRP · dc8c14de
      Murali Karicheri authored
      
      This patch introduces the HSR/PRP rx offload flag to allow offloading
      LRE rx-side processing, such as duplicate detect/drop, node_table,
      etc., to h/w. Lower-level Ethernet drivers whose h/w supports this
      protocol handling advertise the feature in the feature flags of the
      net_device. If the lower device's feature flags indicate RX offload
      support, the same function is disabled in this driver.
      
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      dc8c14de
    • Murali Karicheri's avatar
      net: hsr/prp: add vlan ctag filter support · 3647b8e7
      Murali Karicheri authored
      
      This patch adds support for VLAN ctag based filtering at slave devices.
      The slave Ethernet device may be capable of filtering Ethernet packets
      based on VLAN ID. When a VLAN interface is created over an hsr/prp
      interface, the hsr/prp interface must therefore pass the VID to the
      associated slave Ethernet devices so that they update their hardware
      filters to filter Ethernet frames based on VID. This patch adds the
      required functions to propagate the VID to the slave devices.
      
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      3647b8e7
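The VID propagation described in this commit can be sketched as a loop over the slave devices with rollback on failure. This is a userspace model under stated assumptions, not the driver code: `slave_dev`, `slave_add_vid()`, `slave_del_vid()` and `master_add_vid()` are hypothetical names, and a small fixed-size array stands in for each slave's hardware VLAN filter table. Whether the actual patch rolls back on partial failure is not stated in the commit message; the rollback here is a design choice of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIDS 8 /* hypothetical capacity of a slave's filter table */

static_assert(MAX_VIDS > 0, "filter table must hold at least one VID");

/* Stand-in for a slave Ethernet device and its hardware VLAN filter. */
struct slave_dev {
    int vids[MAX_VIDS];
    size_t n;
};

/* Add a VID to one slave's filter; fails with -1 when the table is full. */
static int slave_add_vid(struct slave_dev *s, int vid)
{
    if (s->n == MAX_VIDS)
        return -1;
    s->vids[s->n++] = vid;
    return 0;
}

/* Remove a VID from one slave's filter, if present. */
static void slave_del_vid(struct slave_dev *s, int vid)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->vids[i] == vid) {
            s->n--;
            s->vids[i] = s->vids[s->n]; /* fill the hole with the last entry */
            return;
        }
    }
}

/* Propagate a VID from the master interface to every slave. If any slave
 * rejects it, undo the additions already made so the filters stay in sync. */
static int master_add_vid(struct slave_dev **slaves, size_t count, int vid)
{
    for (size_t i = 0; i < count; i++) {
        if (slave_add_vid(slaves[i], vid) != 0) {
            while (i-- > 0)
                slave_del_vid(slaves[i], vid);
            return -1;
        }
    }
    return 0;
}
```

Rolling back keeps both slaves filtering on the same set of VIDs, which matters for redundancy protocols where each frame is expected on both ports.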