  1. Apr 07, 2022
  2. Apr 06, 2022
    • bpf: Support dual-stack sockets in bpf_tcp_check_syncookie · 2e8702cc
      Maxim Mikityanskiy authored
      bpf_tcp_gen_syncookie looks at the IP version in the IP header and
      validates the address family of the socket. It supports IPv4 packets in
      AF_INET6 dual-stack sockets.
      
      On the other hand, bpf_tcp_check_syncookie looks only at the address
      family of the socket, ignoring the real IP version in headers, and
      validates only the packet size. This implementation has some drawbacks:
      
      1. Packets are not validated properly, allowing a BPF program to trick
         bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
         socket.
      
      2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
         up receiving a SYNACK with the cookie, but the following ACK gets
         dropped.
      
      This patch fixes these issues by changing the checks in
      bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. The
      IP version from the header is now taken into account and properly
      validated against the socket's address family.
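      The validation rule described above can be modelled in userspace. This
      is a minimal sketch, not the kernel's code: check_ip_family() and its
      parameters are hypothetical, and it assumes the simplified rules that an
      IPv4 header needs at least 20 bytes and is accepted on AF_INET sockets
      and on AF_INET6 sockets that are not IPv6-only, while an IPv6 header
      needs 40 bytes and an AF_INET6 socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

#define IPV4_MIN_HDR_LEN 20u  /* IPv4 header without options */
#define IPV6_HDR_LEN     40u  /* fixed IPv6 header */

/* Hypothetical model: validate the IP version read from the packet header
 * against the socket's address family. Returns 0 or -EINVAL. */
static int check_ip_family(const unsigned char *iph, size_t iph_len,
                           int sk_family, bool ipv6only)
{
    if (sk_family != AF_INET && sk_family != AF_INET6)
        return -EINVAL;
    if (iph_len < IPV4_MIN_HDR_LEN)
        return -EINVAL;

    switch (iph[0] >> 4) {          /* version is the high nibble of byte 0 */
    case 4:
        /* IPv4 is valid on AF_INET and on dual-stack AF_INET6 sockets. */
        if (sk_family == AF_INET6 && ipv6only)
            return -EINVAL;
        return 0;
    case 6:
        if (iph_len < IPV6_HDR_LEN)
            return -EINVAL;
        /* An IPv6 packet must never be handled by an IPv4 socket. */
        if (sk_family != AF_INET6)
            return -EINVAL;
        return 0;
    default:
        return -EINVAL;
    }
}
```

      Both drawbacks map onto this model: an IPv6 header on an AF_INET socket
      is rejected (1), and an IPv4 header on a non-IPv6-only AF_INET6 socket is
      accepted (2).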
      
      Fixes: 39904084 ("bpf: add helper to check for a valid SYN cookie")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Arthur Fabre <afabre@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
    • net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n · a3ebe92a
      Florian Westphal authored
      net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'
      
      Move it to the CONFIG_IPV6_PIMSM_V2 scope where it's used.
      
      Fixes: 4b340a5a ("net: ip6mr: add support for passing full packet on wrong mif")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: fix a race in rxrpc_exit_net() · 1946014c
      Eric Dumazet authored
      Current code can lead to the following race:
      
      CPU0                                                 CPU1
      
      rxrpc_exit_net()
                                                           rxrpc_peer_keepalive_worker()
                                                             if (rxnet->live)
      
        rxnet->live = false;
        del_timer_sync(&rxnet->peer_keepalive_timer);
      
                                                                   timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
      
        cancel_work_sync(&rxnet->peer_keepalive_work);
      
      rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
      leading to use-after-free.
      
      syzbot report was:
      
      ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
      WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
      RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
      RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
      R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
      FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
       debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
       kfree+0xd6/0x310 mm/slab.c:3809
       ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
       ops_free_list net/core/net_namespace.c:174 [inline]
       cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
       </TASK>
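      The CPU0/CPU1 interleaving above can be replayed step by step in a small
      single-threaded model. The diff is not shown in this log, so the remedy
      modelled here is an assumption: deleting the timer a second time after
      cancel_work_sync(), when the worker has finished and live is false, so
      nothing can rearm it. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two objects involved in the race. */
struct netns_model {
    bool live;          /* stands in for rxnet->live */
    bool timer_armed;   /* stands in for a pending peer_keepalive_timer */
};

static void del_timer_model(struct netns_model *n) { n->timer_armed = false; }

/* Replay the interleaving from the commit message and report whether the
 * timer is still armed when exit returns. extra_del selects the assumed fix. */
static bool timer_armed_after_exit(bool extra_del)
{
    struct netns_model n = { .live = true, .timer_armed = false };

    bool saw_live = n.live;       /* CPU1: if (rxnet->live) */
    n.live = false;               /* CPU0: rxnet->live = false */
    del_timer_model(&n);          /* CPU0: del_timer_sync() */
    if (saw_live)
        n.timer_armed = true;     /* CPU1: timer_reduce() rearms the timer */
    /* CPU0: cancel_work_sync() waits here until the worker has returned. */
    if (extra_del)
        del_timer_model(&n);      /* worker is gone and live is false */
    return n.timer_armed;
}
```

      With a single del_timer_sync() the model ends with the timer armed,
      which is the use-after-free window; the second deletion closes it.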
      
      Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Marc Dionne <marc.dionne@auristor.com>
      Cc: linux-afs@lists.infradead.org
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: fix leak of nested actions · 1f30fb91
      Ilya Maximets authored
      While parsing user-provided actions, the openvswitch module may
      dynamically allocate memory and store pointers in the internal copy of
      the actions, so this memory has to be freed when the actions are
      destroyed.
      
      Currently there are only two such actions: ct() and set(). However,
      many actions can hold nested lists of actions, and
      ovs_nla_free_flow_actions() simply skips over them, leaking the memory.
      
      For example, removal of the flow with the following actions will lead
      to a leak of the memory allocated by nf_ct_tmpl_alloc():
      
        actions:clone(ct(commit),0)
      
      A non-freed set() action may also leak the 'dst' structure for the
      tunnel info, including device references.
      
      Under certain conditions, with a high rate of flow rotation, this may
      cause a significant memory leak (2MB per second in the reporter's
      case). The problem is also hard to mitigate, because the user doesn't
      have direct control over the datapath flows generated by OVS.
      
      Fix that by iterating over all the nested actions and freeing
      everything that needs to be freed recursively.
      
      A new build-time assertion should protect against this problem if new
      actions are added in the future.
      
      Unfortunately, the openvswitch module doesn't use NLA_F_NESTED, so all
      attributes have to be explicitly checked. The sample() and clone()
      actions mix extra attributes into the user-provided action list, which
      also prevents some code generalization.
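      The fix, recursing into nested action lists while freeing, can be
      sketched with a toy action representation. This is a minimal model under
      stated assumptions: enum act_type, struct action and free_actions() are
      hypothetical and do not reflect the netlink layout OVS really uses. The
      _Static_assert plays the role of the build-time assertion mentioned
      above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical action kinds: ct() owns heap state, clone() owns a nested list. */
enum act_type { ACT_OUTPUT, ACT_CT, ACT_CLONE, ACT_MAX };

/* Fails to compile if a new action kind is added without revisiting
 * free_actions(), in the spirit of the build-time assertion. */
_Static_assert(ACT_MAX == 3, "update free_actions() for new action types");

struct action {
    enum act_type type;
    void *priv;               /* ACT_CT: dynamically allocated state */
    struct action *nested;    /* ACT_CLONE: nested action array */
    size_t n_nested;
};

static size_t freed_allocs;   /* counts freed priv allocations, for the demo */

/* Free an action array and everything it owns, recursing into nested lists. */
static void free_actions(struct action *acts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (acts[i].type) {
        case ACT_CT:
            free(acts[i].priv);
            freed_allocs++;
            break;
        case ACT_CLONE:
            /* Skipping this case is the leak: ct() inside clone() is lost. */
            free_actions(acts[i].nested, acts[i].n_nested);
            break;
        case ACT_OUTPUT:
        case ACT_MAX:
            break;            /* nothing owned */
        }
    }
    free(acts);
}
```

      Building clone(ct(commit),0) as a CLONE action whose nested array holds
      a CT and an OUTPUT action, then freeing it, releases the CT state too.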
      
      Fixes: 34ae932a ("openvswitch: Make tunnel set action attach a metadata dst")
      Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
      Reported-by: Stéphane Graber <stgraber@ubuntu.com>
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Acked-by: Aaron Conole <aconole@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Apr 05, 2022
  4. Apr 01, 2022
    • net: ipv4: fix route with nexthop object delete warning · 6bf92d70
      Nikolay Aleksandrov authored
      FRR folks have hit a kernel warning [1] while deleting routes [2]. It is
      caused by trying to delete a route pointing to a nexthop id without
      specifying nhid but matching on an interface. That is, a route is found,
      but we hit a warning while matching it. The warning comes from
      fib_info_nh() in include/net/nexthop.h, because we run it on a fib_info
      with a nexthop object. The call chain is:
       inet_rtm_delroute -> fib_table_delete -> fib_nh_match
      fib_nh_match() is called with a nexthop fib_info and also with fc_oif
      set, so it calls fib_info_nh() on the fib_info and triggers the warning.
      The fix is to not do any matching in that branch if the fi has a nexthop
      object, because those are managed separately. That is, deleting without
      a nh spec should match, while deleting a nexthop route with an old-style
      nh spec should fail, e.g.:
       $ ip r show 1.2.3.4/32
       1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
      
       $ ip r del 1.2.3.4/32
       $ ip r del 1.2.3.4/32 nhid 12
       <both should work>
      
       $ ip r del 1.2.3.4/32 dev dummy0
       <should fail with ESRCH>
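      The expected outcomes of these commands can be captured as a small
      decision function. This is a simplified model, assuming a delete request
      carries either an nhid or an old-style output interface; struct route,
      struct del_req and del_matches() are hypothetical and do not mirror
      fib_nh_match() line by line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified route and delete request; 0 means "not set". */
struct route   { unsigned nhid; int oif; };   /* nhid != 0: uses a nexthop object */
struct del_req { unsigned nhid; int oif; };

/* Returns true if the delete request matches the route. */
static bool del_matches(const struct route *r, const struct del_req *q)
{
    if (q->nhid)                          /* explicit nhid: compare the ids */
        return q->nhid == r->nhid;
    if (r->nhid)                          /* nexthop route: no old-style matching */
        return q->oif == 0;               /* only a spec-less delete matches */
    return q->oif == 0 || q->oif == r->oif;   /* legacy route: compare oif */
}
```

      With a route using nhid 12 on an interface with a hypothetical ifindex
      of 3, the first two deletes match and the "dev dummy0" delete does not.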
      
      [1]
       [  523.462226] ------------[ cut here ]------------
       [  523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
       [  523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
       [  523.462274]  videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
       [  523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P           OE     5.16.18-200.fc35.x86_64 #1
       [  523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
       [  523.462303] RIP: 0010:fib_nh_match+0x210/0x460
       [  523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
       [  523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
       [  523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
       [  523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
       [  523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
       [  523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
       [  523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
       [  523.462311] FS:  00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
       [  523.462313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
       [  523.462315] Call Trace:
       [  523.462316]  <TASK>
       [  523.462320]  fib_table_delete+0x1a9/0x310
       [  523.462323]  inet_rtm_delroute+0x93/0x110
       [  523.462325]  rtnetlink_rcv_msg+0x133/0x370
       [  523.462327]  ? _copy_to_iter+0xb5/0x6f0
       [  523.462330]  ? rtnl_calcit.isra.0+0x110/0x110
       [  523.462331]  netlink_rcv_skb+0x50/0xf0
       [  523.462334]  netlink_unicast+0x211/0x330
       [  523.462336]  netlink_sendmsg+0x23f/0x480
       [  523.462338]  sock_sendmsg+0x5e/0x60
       [  523.462340]  ____sys_sendmsg+0x22c/0x270
       [  523.462341]  ? import_iovec+0x17/0x20
       [  523.462343]  ? sendmsg_copy_msghdr+0x59/0x90
       [  523.462344]  ? __mod_lruvec_page_state+0x85/0x110
       [  523.462348]  ___sys_sendmsg+0x81/0xc0
       [  523.462350]  ? netlink_seq_start+0x70/0x70
       [  523.462352]  ? __dentry_kill+0x13a/0x180
       [  523.462354]  ? __fput+0xff/0x250
       [  523.462356]  __sys_sendmsg+0x49/0x80
       [  523.462358]  do_syscall_64+0x3b/0x90
       [  523.462361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [  523.462364] RIP: 0033:0x7f24552aa337
       [  523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       [  523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       [  523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
       [  523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
       [  523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
       [  523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
       [  523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
       [  523.462373]  </TASK>
       [  523.462374] ---[ end trace ba537bc16f6bf4ed ]---
      
      [2] https://github.com/FRRouting/frr/issues/6412
      
      Fixes: 4c7e8084 ("ipv4: Plumb support for nexthop object in a fib_info")
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mctp: Use output netdev to allocate skb headroom · 4a9dda1c
      Matt Johnston authored
      Previously, the skb was allocated with MCTP_HEADER_MAXLEN of headroom,
      but that isn't sufficient when using devs that are not MCTP-specific.
      
      This also adds a check that the smctp_halen provided to sendmsg for
      extended addressing is the correct size for the netdev.
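      Both changes can be sketched as small helpers. This assumes the headroom
      is derived the way the kernel's LL_RESERVED_SPACE() macro derives it
      (link-layer header plus needed_headroom, rounded down to HH_DATA_MOD and
      padded by one HH_DATA_MOD) plus a 4-byte MCTP header; struct netdev_model
      and the helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HH_DATA_MOD   16u   /* alignment unit used by LL_RESERVED_SPACE() */
#define MCTP_HDR_LEN   4u   /* assumed size of the MCTP transport header */

/* Hypothetical subset of a net_device. */
struct netdev_model {
    unsigned hard_header_len;
    unsigned needed_headroom;
    unsigned addr_len;        /* hardware address length */
};

/* Same shape as LL_RESERVED_SPACE(dev). */
static unsigned ll_reserved_space(const struct netdev_model *dev)
{
    return ((dev->hard_header_len + dev->needed_headroom) & ~(HH_DATA_MOD - 1))
           + HH_DATA_MOD;
}

/* Headroom sized from the output device rather than a fixed constant. */
static unsigned mctp_skb_headroom(const struct netdev_model *dev)
{
    return ll_reserved_space(dev) + MCTP_HDR_LEN;
}

/* Extended addressing: the halen passed to sendmsg must match the device. */
static int check_halen(const struct netdev_model *dev, size_t halen)
{
    return halen == dev->addr_len ? 0 : -EINVAL;
}
```

      For an Ethernet-like device (14-byte header, 6-byte addresses) this
      reserves 16 + 4 = 20 bytes and accepts only a 6-byte halen.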
      
      Fixes: 833ef3b9 ("mctp: Populate socket implementation")
      Reported-by: Matthew Rinaldi <mjrinal@g.clemson.edu>
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mctp: Fix check for dev_hard_header() result · 60be976a
      Matt Johnston authored
      dev_hard_header() returns the length of the header, so
      we need to test for negative errors rather than non-zero.
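      The difference between the two checks is easy to see with a stand-in
      for dev_hard_header() that, as the commit describes, returns the header
      length on success and a negative errno on failure. fake_hard_header()
      and the two predicates are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in: header length on success, negative errno on failure. */
static int fake_hard_header(int hdr_len, bool fail)
{
    return fail ? -EINVAL : hdr_len;
}

/* Old check: treats every non-zero result, including a valid length, as an error. */
static bool send_ok_buggy(int rc) { return rc == 0; }

/* Fixed check: only negative values are errors. */
static bool send_ok_fixed(int rc) { return rc >= 0; }
```

      A successful 14-byte header is rejected by the old check but accepted by
      the fixed one, while a real error is still caught.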
      
      Fixes: 889b7da2 ("mctp: Add initial routing framework")
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: dsa: stop updating master MTU from master.c" · 066dfc42
      Vladimir Oltean authored
      This reverts commit a1ff94c2.
      
      Switch drivers that don't implement ->port_change_mtu() will cause the
      DSA master to remain with an MTU of 1500, since we've deleted the other
      code path. In turn, this causes a regression for those systems, where
      MTU-sized traffic can no longer be terminated.
      
      Revert the change, taking into account that rtnl_lock() is now taken
      at the top level by the callers of dsa_master_setup() and
      dsa_master_teardown(). Also add a comment to make it absolutely clear
      why it is still needed.
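      For switch drivers without ->port_change_mtu(), the restored master.c
      path is what enlarges the master's MTU beyond 1500. The arithmetic is
      sketched below, assuming the master MTU is ETH_DATA_LEN plus the
      tagging protocol's head- and tailroom overhead; struct tagger_model and
      dsa_master_mtu() are hypothetical names.

```c
#include <assert.h>

#define ETH_DATA_LEN 1500u   /* standard Ethernet payload MTU */

/* Hypothetical subset of a DSA tagging protocol description. */
struct tagger_model {
    unsigned needed_headroom;   /* tag bytes inserted before the payload */
    unsigned needed_tailroom;   /* tag bytes appended after it */
};

/* Assumed sizing rule: carry a full 1500-byte payload plus the switch tag,
 * so MTU-sized traffic on user ports can still be terminated. */
static unsigned dsa_master_mtu(const struct tagger_model *t)
{
    return ETH_DATA_LEN + t->needed_headroom + t->needed_tailroom;
}
```

      A tagger adding a 4-byte header would need a master MTU of 1504; left at
      1500, full-size frames from user ports no longer fit.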
      
      Fixes: a1ff94c2 ("net: dsa: stop updating master MTU from master.c")
      Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: fix coalescing for page_pool fragment recycling · 1effe8ca
      Jean-Philippe Brucker authored
      Fix a use-after-free when using page_pool with page fragments. We
      encountered this problem during normal RX in the hns3 driver:
      
      (1) Initially we have three descriptors in the RX queue. The first one
          allocates PAGE1 through page_pool, and the other two allocate one
          half of PAGE2 each. Page references look like this:
      
                      RX_BD1 _______ PAGE1
                      RX_BD2 _______ PAGE2
                      RX_BD3 _________/
      
      (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
          to the receive queue by tcp_queue_rcv().
      
      (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
          netif_receive_skb():
      
          netif_receive_skb(SKB2)
            ip_rcv(SKB2)
              SKB3 = skb_clone(SKB2)
      
          SKB2 and SKB3 share a reference to PAGE2 through
          skb_shinfo()->dataref. The other ref to PAGE2 is still held by
          RX_BD3:
      
                            SKB2 ---+- PAGE2
                            SKB3 __/   /
                      RX_BD3 _________/
      
       (3b) Now while handling TCP, coalesce SKB3 with SKB1:
      
            tcp_v4_rcv(SKB3)
              tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
              kfree_skb_partial(SKB3)
                skb_release_data(SKB3)                // drops one dataref
      
                            SKB1 _____ PAGE1
                                 \____
                            SKB2 _____ PAGE2
                                       /
                      RX_BD3 _________/
      
          In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
          PAGE2, where it should instead have increased the page_pool frag
          reference, pp_frag_count. Without coalescing, when releasing both
          SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
          when releasing SKB1 and SKB2, two references to PAGE2 will be
          dropped, resulting in underflow.
      
       (3c) Drop SKB2:
      
            af_packet_rcv(SKB2)
              consume_skb(SKB2)
                skb_release_data(SKB2)                // drops second dataref
                  page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
      
                            SKB1 _____ PAGE1
                                 \____
                                       PAGE2
                                       /
                      RX_BD3 _________/
      
      (4) Userspace calls recvmsg(), which copies SKB1 and releases it. Since
          SKB3 was coalesced with SKB1, we release the SKB3 page as well:
      
          tcp_eat_recv_skb(SKB1)
            skb_release_data(SKB1)
              page_pool_return_skb_page(PAGE1)
              page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
      
      (5) PAGE2 is freed, but the third RX descriptor was still using it!
          In our case this causes IOMMU faults, but it would silently corrupt
          memory if the IOMMU was disabled.
      
      Change the logic that checks whether pp_recycle SKBs can be coalesced.
      We still reject differing pp_recycle between 'from' and 'to' SKBs, but
      in order to avoid the situation described above, we also reject
      coalescing when both 'from' and 'to' are pp_recycled and 'from' is
      cloned.
      
      The new logic allows coalescing a cloned pp_recycle SKB into a page
      refcounted one, because in this case the release (4) will drop the right
      reference, the one taken by skb_try_coalesce().
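      The three rules above (reject differing pp_recycle, reject a cloned
      pp_recycle 'from' into a pp_recycle 'to', allow a cloned pp_recycle
      'from' into a page-refcounted 'to') collapse into one comparison: a
      cloned pp_recycle 'from' counts as page-refcounted. This is a sketch of
      the decision only; struct skb_model and pp_coalesce_allowed() are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of sk_buff state relevant to the decision. */
struct skb_model {
    bool pp_recycle;   /* frags come from page_pool */
    bool cloned;       /* data shared with a clone, as skb_cloned() reports */
};

/* Returns true if 'from' may be coalesced into 'to'. */
static bool pp_coalesce_allowed(const struct skb_model *to,
                                const struct skb_model *from)
{
    /* A cloned page_pool 'from' gets full page references when its frags
     * are taken, so it behaves like a page-refcounted SKB here. */
    bool from_uses_pp_refs = from->pp_recycle && !from->cloned;

    return to->pp_recycle == from_uses_pp_refs;
}
```

      In step (3b), 'to' (SKB1) and 'from' (SKB3) are both pp_recycle and
      SKB3 is cloned, so this check refuses the coalesce.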
      
      Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
      Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: fix slab-out-of-bounds bug in decrypt_internal · 9381fe8c
      Ziyang Xuan authored
      For AES128-CCM, tls_set_sw_offload() sets up tls_ctx->rx.iv with a
      size of 12 bytes. However, crypto_aead_ivsize() returns 16 for
      "ccm(aes)", so a memcpy() of 16 bytes from this 12-byte buffer
      triggers a slab-out-of-bounds bug, as follows:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
      Read of size 16 at addr ffff888114e84e60 by task tls/10911
      
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0x5e/0x5db
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_report+0xab/0x120
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_check_range+0xf9/0x1e0
       memcpy+0x20/0x60
       decrypt_internal+0x385/0xc40 [tls]
       ? tls_get_rec+0x2e0/0x2e0 [tls]
       ? process_rx_list+0x1a5/0x420 [tls]
       ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
       decrypt_skb_update+0x9d/0x400 [tls]
       tls_sw_recvmsg+0x3c8/0xb50 [tls]
      
      Allocated by task 10911:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       tls_set_sw_offload+0x2eb/0xa20 [tls]
       tls_setsockopt+0x68c/0x700 [tls]
       __sys_setsockopt+0xfe/0x1b0
      
      Replace crypto_aead_ivsize() with prot->iv_size + prot->salt_size as
      the memcpy() length for the IV in the TLS_1_3_VERSION case.
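      The size mismatch is plain arithmetic. This sketch assumes the
      AES128-CCM sizes from the TLS uAPI header (8-byte IV, 4-byte salt),
      which add up to the 12 bytes the commit mentions; struct prot_model and
      tls13_iv_copy_len() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CCM128_IV_SIZE    8u   /* TLS_CIPHER_AES_CCM_128_IV_SIZE */
#define CCM128_SALT_SIZE  4u   /* TLS_CIPHER_AES_CCM_128_SALT_SIZE */
#define CCM_AEAD_IVSIZE  16u   /* crypto_aead_ivsize() for "ccm(aes)", per the commit */

/* Hypothetical subset of the per-socket protocol info. */
struct prot_model { size_t iv_size; size_t salt_size; };

/* Bytes to copy from rx.iv in the TLS 1.3 path: bounded by what was
 * allocated (iv_size + salt_size), not by the AEAD's own IV size. */
static size_t tls13_iv_copy_len(const struct prot_model *prot)
{
    return prot->iv_size + prot->salt_size;
}
```

      The fixed length equals the 12-byte allocation, whereas copying
      CCM_AEAD_IVSIZE bytes would read 4 bytes past it.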
      
      Fixes: f295b3ae ("net/tls: Add support of AES128-CCM based ciphers")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Mar 31, 2022
  6. Mar 29, 2022
    • SUNRPC: handle malloc failure in ->request_prepare · eb07d5a4
      NeilBrown authored
      
      If ->request_prepare() detects an error, it sets ->rq_task->tk_status.
      This is easy for callers to ignore.
      The only caller is xprt_request_enqueue_receive(), and it does ignore
      the error, as does call_encode(), which calls it. This can result in a
      request being queued to receive a reply without an allocated receive
      buffer.
      
      So instead of setting rq_task->tk_status, return an error, and store it
      in ->tk_status only in call_encode().
      
      The call to xprt_request_enqueue_receive() is now earlier in
      call_encode(), where the error can still be handled.
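      The new error flow, where the prepare step returns an error and only the
      top-level caller records it, can be sketched as below. All types and
      function names here are hypothetical stand-ins for rpc_task, rpc_rqst
      and the SUNRPC callbacks, and the out-of-memory case is simulated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the task and request. */
struct task_model { int tk_status; };
struct req_model  { void *rcv_buf; struct task_model *task; };

/* Reports failure through the return value instead of tk_status. */
static int request_prepare(struct req_model *req, size_t len, bool simulate_oom)
{
    req->rcv_buf = simulate_oom ? NULL : malloc(len);
    return req->rcv_buf ? 0 : -ENOMEM;
}

/* Only the top-level caller stores the error in tk_status. */
static int call_encode_model(struct req_model *req, size_t len, bool simulate_oom)
{
    int err = request_prepare(req, len, simulate_oom);

    if (err) {
        req->task->tk_status = err;
        return err;   /* never queue for receive without a buffer */
    }
    /* Enqueueing the request for receive would happen here. */
    return 0;
}
```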
      
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • netfilter: bitwise: fix reduce comparisons · 31818213
      Jeremy Sowden authored
      The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should
      compare the bitwise operation in `expr` with the tracked operation
      associated with the destination register of `expr`.  However, instead of
      being called on `expr` and `track->regs[priv->dreg].selector`,
      `nft_expr_priv` is called on `expr` twice, so both reduce functions
      return true even when the operations differ.
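      The bug shape, comparing an expression's private data with itself
      instead of with the tracked one, is easy to reproduce in isolation.
      struct bitwise_priv here is a hypothetical, reduced stand-in for the
      nft_bitwise private data, and the two reduce functions only model the
      comparison step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced private data of a bitwise expression. */
struct bitwise_priv { int op; unsigned mask; unsigned xor_val; };

/* Field-by-field comparison (avoids comparing struct padding). */
static bool same_op(const struct bitwise_priv *a, const struct bitwise_priv *b)
{
    return a->op == b->op && a->mask == b->mask && a->xor_val == b->xor_val;
}

/* Bug shape: the expression is compared with itself, so this is always true. */
static bool reduce_buggy(const struct bitwise_priv *expr,
                         const struct bitwise_priv *tracked)
{
    (void)tracked;
    return same_op(expr, expr);
}

/* Fix shape: compare with the operation tracked for the destination register. */
static bool reduce_fixed(const struct bitwise_priv *expr,
                         const struct bitwise_priv *tracked)
{
    return same_op(expr, tracked);
}
```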
      
      Fixes: be5650f8 ("netfilter: nft_bitwise: track register operations")
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ax25: Fix UAF bugs in ax25 timers · 82e31755
      Duoming Zhou authored
      
      There are race conditions that may lead to UAF bugs in
      ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
      ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
      ax25_release() to deallocate ax25_dev.
      
      One of the UAF bugs caused by ax25_release() is shown below:
      
            (Thread 1)                    |      (Thread 2)
      ax25_dev_device_up() //(1)          |
      ...                                 | ax25_kill_by_device()
      ax25_bind()          //(2)          |
      ax25_connect()                      | ...
       ax25_std_establish_data_link()     |
        ax25_start_t1timer()              | ax25_dev_device_down() //(3)
         mod_timer(&ax25->t1timer,..)     |
                                          | ax25_release()
         (wait a time)                    |  ...
                                          |  ax25_dev_put(ax25_dev) //(4)FREE
         ax25_t1timer_expiry()            |
          ax25->ax25_dev->values[..] //USE|  ...
           ...                            |
      
      We increase the refcount of ax25_dev at positions (1) and (2), and
      decrease it at positions (3) and (4). The ax25_dev is freed at
      position (4) and then used in ax25_t1timer_expiry().
      
      The fail log is shown below:
      ==============================================================
      
      [  106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
      [  106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
      [  106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
      [  106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
      [  106.116942] Call Trace:
      ...
      [  106.116942]  ax25_t1timer_expiry+0x1c/0x60
      [  106.116942]  call_timer_fn+0x122/0x3d0
      [  106.116942]  __run_timers.part.0+0x3f6/0x520
      [  106.116942]  run_timer_softirq+0x4f/0xb0
      [  106.116942]  __do_softirq+0x1c2/0x651
      ...
      
      This patch adds del_timer_sync() calls to ax25_release() to ensure
      that all timers are stopped before ax25_dev is deallocated.
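      The ordering the fix enforces, stopping every timer before the
      reference that may free ax25_dev is dropped, can be checked in a small
      model. The structs and helpers are hypothetical; the model only clears a
      pending flag, whereas the real del_timer_sync() also waits for a running
      handler to finish.

```c
#include <assert.h>
#include <stdbool.h>

#define N_TIMERS 5   /* heartbeat, t1, t2, t3 and idle timers */

/* Hypothetical device refcount and per-socket pending timers. */
struct dev_model { int refcnt; bool freed; };
struct cb_model  { struct dev_model *dev; bool timer_pending[N_TIMERS]; };

static void model_dev_put(struct dev_model *d)
{
    if (--d->refcnt == 0)
        d->freed = true;              /* stands in for freeing ax25_dev */
}

/* Release path with the fix: stop every timer, then drop the reference. */
static void release_model(struct cb_model *cb)
{
    for (int i = 0; i < N_TIMERS; i++)
        cb->timer_pending[i] = false; /* models del_timer_sync() */
    model_dev_put(cb->dev);
}

/* True if a timer could still fire and dereference a freed device. */
static bool uaf_possible(const struct cb_model *cb)
{
    if (!cb->dev->freed)
        return false;
    for (int i = 0; i < N_TIMERS; i++)
        if (cb->timer_pending[i])
            return true;
    return false;
}
```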
      
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ax25: fix UAF bug in ax25_send_control() · 5352a761
      Duoming Zhou authored
      There are UAF bugs in ax25_send_control() when ax25_release() is
      called to deallocate ax25_dev. The possible race condition is shown
      below:
      
            (Thread 1)              |     (Thread 2)
      ax25_dev_device_up() //(1)    |
                                    | ax25_kill_by_device()
      ax25_bind()          //(2)    |
      ax25_connect()                | ...
       ax25->state = AX25_STATE_1   |
       ...                          | ax25_dev_device_down() //(3)
      
            (Thread 3)
      ax25_release()                |
       ax25_dev_put()  //(4) FREE   |
       case AX25_STATE_1:           |
        ax25_send_control()         |
         alloc_skb()       //USE    |
      
      The refcount of ax25_dev increases at positions (1) and (2), and
      decreases at positions (3) and (4). The ax25_dev is therefore freed
      before it is dereferenced in ax25_send_control().
      
      The following is part of the report:
      
      [  102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210
      [  102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602
      [  102.297448] Call Trace:
      [  102.303751]  ax25_send_control+0x33/0x210
      [  102.303751]  ax25_release+0x356/0x450
      [  102.305431]  __sock_release+0x6d/0x120
      [  102.305431]  sock_close+0xf/0x20
      [  102.305431]  __fput+0x11f/0x420
      [  102.305431]  task_work_run+0x86/0xd0
      [  102.307130]  get_signal+0x1075/0x1220
      [  102.308253]  arch_do_signal_or_restart+0x1df/0xc00
      [  102.308253]  exit_to_user_mode_prepare+0x150/0x1e0
      [  102.308253]  syscall_exit_to_user_mode+0x19/0x50
      [  102.308253]  do_syscall_64+0x48/0x90
      [  102.308253]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  102.308253] RIP: 0033:0x405ae7
      
      This patch defers the free operation of ax25_dev and net_device after
      all corresponding dereference sites in ax25_release() to avoid UAF.
      
      Fixes: 9fd75b66 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • openvswitch: Fixed nd target mask field in the flow dump. · f19c4445
      Martin Varghese authored
      The IPv6 nd target mask was not being populated in the flow dump.
      
      In __ovs_nla_put_key(), the ICMP code mask field was checked instead of
      the ICMP code key field to classify the flow as neighbour discovery.
      
      ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
      skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
      ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
      eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
      dst=00:00:00:00:00:00/00:00:00:00:00:00),
      eth_type(0x86dd),
      ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
      icmpv6(type=135,code=0),
      nd(target=2001::2/::,
      sll=00:00:00:00:00:00/00:00:00:00:00:00,
      tll=00:00:00:00:00:00/00:00:00:00:00:00),
      packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2
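      The classification error can be illustrated with a reduced view of the
      ICMPv6 fields. This is a simplified model, assuming classification needs
      ICMP code 0 and type 135 (solicitation) or 136 (advertisement); struct
      icmp6_fields and both predicates are hypothetical, not the OVS code.

```c
#include <assert.h>
#include <stdbool.h>

#define NDISC_NEIGHBOUR_SOLICITATION  135
#define NDISC_NEIGHBOUR_ADVERTISEMENT 136

/* Hypothetical ICMPv6 part of a flow key or of its mask. */
struct icmp6_fields { unsigned char type; unsigned char code; };

static bool is_nd_type(unsigned char type)
{
    return type == NDISC_NEIGHBOUR_SOLICITATION ||
           type == NDISC_NEIGHBOUR_ADVERTISEMENT;
}

/* Bug shape: the code check reads the mask instead of the key. */
static bool is_nd_flow_buggy(const struct icmp6_fields *key,
                             const struct icmp6_fields *mask)
{
    return mask->code == 0 && is_nd_type(key->type);
}

/* Fix shape: classification uses the key's own code field. */
static bool is_nd_flow(const struct icmp6_fields *key)
{
    return key->code == 0 && is_nd_type(key->type);
}
```

      For the dumped flow above (type=135, code=0 matched exactly, so the code
      mask is all ones), the buggy check misses the ND flow and the nd()
      masks are never emitted.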
      
      Fixes: e6445719 ("openvswitch: Restructure datapath.c and flow.c")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  7. Mar 28, 2022
  8. Mar 26, 2022
  9. Mar 25, 2022
    • llc: only change llc->dev when bind() succeeds · 2d327a79
      Eric Dumazet authored
      My latest patch, which attempted to fix the refcount leak in a minimal
      way, turned out to add a new bug.
      
      Whenever the bind operation fails before we attempt to grab
      a reference count on a device, we might release the device refcount
      of a prior successful bind() operation.
      
      syzbot was not happy about this [1].
      
      Note to stable teams:
      
      Make sure commit b37a4668 ("netdevice: add the case if dev is NULL")
      is already present in your trees.
      
      [1]
      general protection fault, probably for non-canonical address 0xdffffc0000000070: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387]
      CPU: 1 PID: 3590 Comm: syz-executor361 Tainted: G        W         5.17.0-syzkaller-04796-g169e77764adc #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:llc_ui_connect+0x400/0xcb0 net/llc/af_llc.c:500
      Code: 80 3c 02 00 0f 85 fc 07 00 00 4c 8b a5 38 05 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a9 07 00 00 49 8b b4 24 80 03 00 00 4c 89 f2 48
      RSP: 0018:ffffc900038cfcc0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880756eb600 RCX: 0000000000000000
      RDX: 0000000000000070 RSI: ffffc900038cfe3e RDI: 0000000000000380
      RBP: ffff888015ee5000 R08: 0000000000000001 R09: ffff888015ee5535
      R10: ffffed1002bdcaa6 R11: 0000000000000000 R12: 0000000000000000
      R13: ffffc900038cfe37 R14: ffffc900038cfe38 R15: ffff888015ee5012
      FS:  0000555555acd300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000280 CR3: 0000000077db6000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __sys_connect_file+0x155/0x1a0 net/socket.c:1900
       __sys_connect+0x161/0x190 net/socket.c:1917
       __do_sys_connect net/socket.c:1927 [inline]
       __se_sys_connect net/socket.c:1924 [inline]
       __x64_sys_connect+0x6f/0xb0 net/socket.c:1924
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f016acb90b9
      Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffd417947f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f016acb90b9
      RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00007f016ac7d0a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f016ac7d130
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:llc_ui_connect+0x400/0xcb0 net/llc/af_llc.c:500
      
      Fixes: 764f4eb6 ("llc: fix netdevice reference leaks in llc_ui_bind()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: 赵子轩 <beraphin@gmail.com>
      Cc: Stoyan Manolov <smanolov@suse.de>
      Link: https://lore.kernel.org/r/20220325035827.360418-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • SUNRPC: Don't return error values in sysfs read of closed files · ebbe7887
      Trond Myklebust authored
      Instead of returning an error value, which ends up being the return
      value for the read() system call, it is more elegant to simply return
      the error as a string value.
      
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • SUNRPC: Do not dereference non-socket transports in sysfs · 421ab1be
      Trond Myklebust authored
      Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or
      TCP transport. Otherwise the call to lock the mutex will scribble over
      whatever structure is actually there. This has been seen to cause hard
      system lockups when the underlying transport was RDMA.
      
      Fixes: b49ea673 ("SUNRPC: lock against ->sock changing during sysfs read")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  10. Mar 24, 2022