Forum | Documentation | Website | Blog

Skip to content
Snippets Groups Projects
  1. Dec 16, 2021
    • Juergen Gross's avatar
      xen/console: harden hvc_xen against event channel storms · fe415186
      Juergen Gross authored
      
      The Xen console driver is still vulnerable for an attack via excessive
      number of events sent by the backend. Fix that by using a lateeoi event
      channel.
      
      For the normal domU initial console this requires the introduction of
      bind_evtchn_to_irq_lateeoi() as there is no xenbus device available
      at the time the event channel is bound to the irq.
      
      As the decision whether an interrupt was spurious or not requires to
      test for bytes having been read from the backend, move sending the
      event into the if statement, as sending an event without having found
      any bytes to be read is making no sense at all.
      
      This is part of XSA-391
      
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      ---
      V2:
      - slightly adapt spurious irq detection (Jan Beulich)
      V3:
      - fix spurious irq detection (Jan Beulich)
      fe415186
  2. Dec 14, 2021
  3. Dec 10, 2021
    • SeongJae Park's avatar
      timers: implement usleep_idle_range() · e4779015
      SeongJae Park authored
      Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.
      
      This patchset fixes DAMON's fake load report issue.  The first patch
      makes yet another variant of usleep_range() for this fix, and the second
      patch fixes the issue of DAMON by making it using the newly introduced
      function.
      
      This patch (of 2):
      
      Some kernel threads such as DAMON could need to repeatedly sleep in
      micro seconds level.  Because usleep_range() sleeps in uninterruptible
      state, however, such threads would make /proc/loadavg reports fake load.
      
      To help such cases, this commit implements a variant of usleep_range()
      called usleep_idle_range().  It is same to usleep_range() but sets the
      state of the current task as TASK_IDLE while sleeping.
      
      Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
      
      
      Signed-off-by: default avatarSeongJae Park <sj@kernel.org>
      Suggested-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e4779015
    • Drew DeVault's avatar
      Increase default MLOCK_LIMIT to 8 MiB · 9dcc38e2
      Drew DeVault authored
      This limit has not been updated since 2008, when it was increased to 64
      KiB at the request of GnuPG.  Until recently, the main use-cases for this
      feature were (1) preventing sensitive memory from being swapped, as in
      GnuPG's use-case; and (2) real-time use-cases.  In the first case, little
      memory is called for, and in the second case, the user is generally in a
      position to increase it if they need more.
      
      The introduction of IOURING_REGISTER_BUFFERS adds a third use-case:
      preparing fixed buffers for high-performance I/O.  This use-case will take
      as much of this memory as it can get, but is still limited to 64 KiB by
      default, which is very little.  This increases the limit to 8 MB, which
      was chosen fairly arbitrarily as a more generous, but still conservative,
      default value.
      
      It is also possible to raise this limit in userspace.  This is easily
      done, for example, in the use-case of a network daemon: systemd, for
      instance, provides for this via LimitMEMLOCK in the service file; OpenRC
      via the rc_ulimit variables.  However, there is no established userspace
      facility for configuring this outside of daemons: end-user applications do
      not presently have access to a convenient means of raising their limits.
      
      The buck, as it were, stops with the kernel.  It's much easier to address
      it here than it is to bring it to hundreds of distributions, and it can
      only realistically be relied upon to be high-enough by end-user software
      if it is more-or-less ubiquitous.  Most distros don't change this
      particular rlimit from the kernel-supplied default value, so a change here
      will easily provide that ubiquity.
      
      Link: https://lkml.kernel.org/r/20211028080813.15966-1-sir@cmpwn.com
      
      
      Signed-off-by: default avatarDrew DeVault <sir@cmpwn.com>
      Acked-by: default avatarJens Axboe <axboe@kernel.dk>
      Acked-by: default avatarCyril Hrubis <chrubis@suse.cz>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Andrew Dona-Couch <andrew@donacou.ch>
      Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9dcc38e2
  4. Dec 09, 2021
  5. Dec 08, 2021
  6. Dec 07, 2021
    • Eric Dumazet's avatar
      netfilter: conntrack: annotate data-races around ct->timeout · 802a7dc5
      Eric Dumazet authored
      (struct nf_conn)->timeout can be read/written locklessly,
      add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.
      
      BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get
      
      write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
       __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
       init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
       resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push_pending_frames include/net/tcp.h:1897 [inline]
       tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
       tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
       tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
       tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       __sys_sendto+0x21e/0x2c0 net/socket.c:2036
       __do_sys_sendto net/socket.c:2048 [inline]
       __se_sys_sendto net/socket.c:2044 [inline]
       __x64_sys_sendto+0x74/0x90 net/socket.c:2044
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
       nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
       ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
       __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
       resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
       tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
       __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
       tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
       tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
       inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
       rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
       rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
       rds_send_worker+0x43/0x200 net/rds/threads.c:200
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00027cc2 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: krdsd rds_send_worker
      
      Note: I chose an arbitrary commit for the Fixes: tag,
      because I do not think we need to backport this fix to very old kernels.
      
      Fixes: e37542ba
      
       ("netfilter: conntrack: avoid possible false sharing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      802a7dc5
  7. Dec 03, 2021
    • Jakub Kicinski's avatar
      treewide: Add missing includes masked by cgroup -> bpf dependency · 8581fd40
      Jakub Kicinski authored
      
      cgroup.h (therefore swap.h, therefore half of the universe)
      includes bpf.h which in turn includes module.h and slab.h.
      Since we're about to get rid of that dependency we need
      to clean things up.
      
      v2: drop the cpu.h include from cacheinfo.h, it's not necessary
      and it makes riscv sensitive to ordering of include files.
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarKrzysztof Wilczyński <kw@linux.com>
      Acked-by: default avatarPeter Chen <peter.chen@kernel.org>
      Acked-by: default avatarSeongJae Park <sj@kernel.org>
      Acked-by: default avatarJani Nikula <jani.nikula@intel.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/  # v1
      Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
      Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
      8581fd40
    • Eric Dumazet's avatar
      bonding: make tx_rebalance_counter an atomic · dac8e00f
      Eric Dumazet authored
      KCSAN reported a data-race [1] around tx_rebalance_counter
      which can be accessed from different contexts, without
      the protection of a lock/mutex.
      
      [1]
      BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
      
      write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
       bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
       bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
       do_set_master net/core/rtnetlink.c:2521 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
       rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
       bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000001 -> 0x00000064
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bond1 bond_alb_monitor
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dac8e00f
    • Eric Dumazet's avatar
      tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet authored
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket,
      but instead tcp_child_process() is calling
      sk_mark_napi_id(child, skb)
      
      We have many sk_mark_napi_id() callers, so I am providing
      a new helper, forcing the setting sk_rx_queue_mapping
      and sk_napi_id.
      
      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping in the other hand is inside the
      sk_dontcopy_begin/sk_dontcopy_end so sk_clone_lock()
      leaves this field uninitialized.
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       __sk_rx_queue_set include/net/sock.h:1924 [inline]
       sk_rx_queue_update include/net/sock.h:1938 [inline]
       sk_mark_napi_id include/net/busy_poll.h:136 [inline]
       tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
       smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
       kthread+0x721/0x850 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
       sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
       inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
       tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
       tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
       tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Fixes: a37a0ee4
      
       ("net: avoid uninit-value from tcp_conn_request")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      03cfda4f
  8. Dec 02, 2021
  9. Dec 01, 2021
    • Eric Dumazet's avatar
      net: avoid uninit-value from tcp_conn_request · a37a0ee4
      Eric Dumazet authored
      A recent change triggers a KMSAN warning, because request
      sockets do not initialize @sk_rx_queue_mapping field.
      
      Add sk_rx_queue_update() helper to make our intent clear.
      
      BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline]
      BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       sk_rx_queue_set include/net/sock.h:1922 [inline]
       tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       invoke_softirq+0xa4/0x130 kernel/softirq.c:432
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x76/0x130 kernel/softirq.c:648
       common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40
       smap_restore arch/x86/include/asm/smap.h:67 [inline]
       get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline]
       __msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63
       tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173
       tomoyo_path_permission security/tomoyo/file.c:586 [inline]
       tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777
       tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311
       security_file_open+0xb1/0x1f0 security/security.c:1635
       do_dentry_open+0x4e4/0x1bf0 fs/open.c:809
       vfs_open+0xaf/0xe0 fs/open.c:957
       do_open fs/namei.c:3426 [inline]
       path_openat+0x52f1/0x5dd0 fs/namei.c:3559
       do_filp_open+0x306/0x760 fs/namei.c:3586
       do_sys_openat2+0x263/0x8f0 fs/open.c:1212
       do_sys_open fs/open.c:1228 [inline]
       __do_sys_open fs/open.c:1236 [inline]
       __se_sys_open fs/open.c:1232 [inline]
       __x64_sys_open+0x314/0x380 fs/open.c:1232
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       reqsk_alloc include/net/request_sock.h:91 [inline]
       inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712
       tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee
      
       ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a37a0ee4
    • Eric Dumazet's avatar
      net: annotate data-races on txq->xmit_lock_owner · 7a10d8c8
      Eric Dumazet authored
      syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
      without annotations.
      
      No serious issue there, let's document what is happening there.
      
      BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
      
      write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
       __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
       __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_hh_output include/net/neighbour.h:511 [inline]
       neigh_output include/net/neighbour.h:525 [inline]
       ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
       __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
       neigh_output include/net/neighbour.h:527 [inline]
       ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
       folio_test_anon include/linux/page-flags.h:581 [inline]
       PageAnon include/linux/page-flags.h:586 [inline]
       zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
       zap_pmd_range mm/memory.c:1467 [inline]
       zap_pud_range mm/memory.c:1496 [inline]
       zap_p4d_range mm/memory.c:1517 [inline]
       unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
       unmap_single_vma+0x157/0x210 mm/memory.c:1583
       unmap_vmas+0xd0/0x180 mm/memory.c:1615
       exit_mmap+0x23d/0x470 mm/mmap.c:3170
       __mmput+0x27/0x1b0 kernel/fork.c:1113
       mmput+0x3d/0x50 kernel/fork.c:1134
       exit_mm+0xdb/0x170 kernel/exit.c:507
       do_exit+0x608/0x17a0 kernel/exit.c:819
       do_group_exit+0xce/0x180 kernel/exit.c:929
       get_signal+0xfc3/0x1550 kernel/signal.c:2852
       arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0xffffffff
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7a10d8c8
    • Masami Hiramatsu's avatar
      kprobes: Limit max data_size of the kretprobe instances · 6bbfa441
      Masami Hiramatsu authored
      The 'kprobe::data_size' is unsigned, thus it can not be negative.  But if
      user sets it enough big number (e.g. (size_t)-8), the result of 'data_size
      + sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct
      kretprobe_instance) or zero. In result, the kretprobe_instance are
      allocated without enough memory, and kretprobe accesses outside of
      allocated memory.
      
      To avoid this issue, introduce a max limitation of the
      kretprobe::data_size. 4KB per instance should be OK.
      
      Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: f47cd9b5
      
       ("kprobes: kretprobe user entry-handler")
      Reported-by: default avatarzhangyue <zhangyue1@kylinos.cn>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      6bbfa441
    • Aya Levin's avatar
      net/mlx5: Fix access to a non-supported register · 502e82b9
      Aya Levin authored
      Validate MRTC register is supported before triggering a delayed work
      which accesses it.
      
      Fixes: 5a1023de
      
       ("net/mlx5: Add periodic update of host time to firmware")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarGal Pressman <gal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      502e82b9
  10. Nov 30, 2021
  11. Nov 29, 2021
    • Arnd Bergmann's avatar
      siphash: use _unaligned version by default · f7e5b9bf
      Arnd Bergmann authored
      On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
      because the ordinary load/store instructions (ldr, ldrh, ldrb) can
      tolerate any misalignment of the memory address. However, load/store
      double and load/store multiple instructions (ldrd, ldm) may still only
      be used on memory addresses that are 32-bit aligned, and so we have to
      use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
      may end up with a severe performance hit due to alignment traps that
      require fixups by the kernel. Testing shows that this currently happens
      with clang-13 but not gcc-11. In theory, any compiler version can
      produce this bug or other problems, as we are dealing with undefined
      behavior in C99 even on architectures that support this in hardware,
      see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
      
      Fortunately, the get_unaligned() accessors do the right thing: when
      building for ARMv6 or later, the compiler will emit unaligned accesses
      using the ordinary load/store instructions (but avoid the ones that
      require 32-bit alignment). When building for older ARM, those accessors
      will emit the appropriate sequence of ldrb/mov/orr instructions. And on
      architectures that can truly tolerate any kind of misalignment, the
      get_unaligned() accessors resolve to the leXX_to_cpup accessors that
      operate on aligned addresses.
      
      Since the compiler will in fact emit ldrd or ldm instructions when
      building this code for ARM v6 or later, the solution is to use the
      unaligned accessors unconditionally on architectures where this is
      known to be fast. The _aligned version of the hash function is
      however still needed to get the best performance on architectures
      that cannot do any unaligned access in hardware.
      
      This new version avoids the undefined behavior and should produce
      the fastest hash on all architectures we support.
      
      Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/
      Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/
      
      
      Reported-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Fixes: 2c956a60
      
       ("siphash: add cryptographically secure PRF")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f7e5b9bf
    • Jason A. Donenfeld's avatar
      wireguard: device: reset peer src endpoint when netns exits · 20ae1d6a
      Jason A. Donenfeld authored
      
      Each peer's endpoint contains a dst_cache entry that takes a reference
      to another netdev. When the containing namespace exits, we take down the
      socket and prevent future sockets from being created (by setting
      creating_net to NULL), which removes that potential reference on the
      netns. However, it doesn't release references to the netns that a netdev
      cached in dst_cache might be taking, so the netns still might fail to
      exit. Since the socket is gimped anyway, we can simply clear all the
      dst_caches (by way of clearing the endpoint src), which will release all
      references.
      
      However, the current dst_cache_reset function only releases those
      references lazily. But it turns out that all of our usages of
      wg_socket_clear_peer_endpoint_src are called from contexts that are not
      exactly high-speed or bottle-necked. For example, when there's
      connection difficulty, or when userspace is reconfiguring the interface.
      And in particular for this patch, when the netns is exiting. So for
      those cases, it makes more sense to call dst_release immediately. For
      that, we add a small helper function to dst_cache.
      
      This patch also adds a test to netns.sh from Hangbin Liu to ensure this
      doesn't regress.
      
      Tested-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Fixes: 900575aa
      
       ("wireguard: device: avoid circular netns references")
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      20ae1d6a
    • msizanoen1's avatar
      ipv6: fix memory leak in fib6_rule_suppress · cdef4852
      msizanoen1 authored
      The kernel leaks memory when a `fib` rule is present in IPv6 nftables
      firewall rules and a suppress_prefix rule is present in the IPv6 routing
      rules (used by certain tools such as wg-quick). In such scenarios, every
      incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
      
      After some hours of `bpftrace`-ing and source code reading, I tracked
      down the issue to ca7a03c4 ("ipv6: do not free rt if
      FIB_LOOKUP_NOREF is set on suppress rule").
      
      The problem with that change is that the generic `args->flags` always have
      `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
      `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
      decreasing the refcount when needed.
      
      How to reproduce:
       - Add the following nftables rule to a prerouting chain:
           meta nfproto ipv6 fib saddr . mark . iif oif missing drop
         This can be done with:
           sudo nft create table inet test
           sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
           sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
       - Run:
           sudo ip -6 rule add table main suppress_prefixlength 0
       - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
         with every incoming ipv6 packet.
      
      This patch exposes the protocol-specific flags to the protocol
      specific `suppress` function, and check the protocol-specific `flags`
      argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
      FIB_LOOKUP_NOREF when decreasing the refcount, like this.
      
      [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
      [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
      Fixes: ca7a03c4
      
       ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cdef4852
    • Paolo Abeni's avatar
      tcp: fix page frag corruption on page fault · dacb5d88
      Paolo Abeni authored
      Steffen reported a TCP stream corruption for HTTP requests
      served by the apache web-server using a cifs mount-point
      and memory mapping the relevant file.
      
      The root cause is quite similar to the one addressed by
      commit 20eb4f29
      
       ("net: fix sk_page_frag() recursion from
      memory reclaim"). Here the nested access to the task page frag
      is caused by a page fault on the (mmapped) user-space memory
      buffer coming from the cifs file.
      
      The page fault handler performs an smb transaction on a different
      socket, inside the same process context. Since sk->sk_allaction
      for such socket does not prevent the usage for the task_frag,
      the nested allocation modify "under the hood" the page frag
      in use by the outer sendmsg call, corrupting the stream.
      
      The overall relevant stack trace looks like the following:
      
      httpd 78268 [001] 3461630.850950:      probe:tcp_sendmsg_locked:
              ffffffff91461d91 tcp_sendmsg_locked+0x1
              ffffffff91462b57 tcp_sendmsg+0x27
              ffffffff9139814e sock_sendmsg+0x3e
              ffffffffc06dfe1d smb_send_kvec+0x28
              [...]
              ffffffffc06cfaf8 cifs_readpages+0x213
              ffffffff90e83c4b read_pages+0x6b
              ffffffff90e83f31 __do_page_cache_readahead+0x1c1
              ffffffff90e79e98 filemap_fault+0x788
              ffffffff90eb0458 __do_fault+0x38
              ffffffff90eb5280 do_fault+0x1a0
              ffffffff90eb7c84 __handle_mm_fault+0x4d4
              ffffffff90eb8093 handle_mm_fault+0xc3
              ffffffff90c74f6d __do_page_fault+0x1ed
              ffffffff90c75277 do_page_fault+0x37
              ffffffff9160111e page_fault+0x1e
              ffffffff9109e7b5 copyin+0x25
              ffffffff9109eb40 _copy_from_iter_full+0xe0
              ffffffff91462370 tcp_sendmsg_locked+0x5e0
              ffffffff91462370 tcp_sendmsg_locked+0x5e0
              ffffffff91462b57 tcp_sendmsg+0x27
              ffffffff9139815c sock_sendmsg+0x4c
              ffffffff913981f7 sock_write_iter+0x97
              ffffffff90f2cc56 do_iter_readv_writev+0x156
              ffffffff90f2dff0 do_iter_write+0x80
              ffffffff90f2e1c3 vfs_writev+0xa3
              ffffffff90f2e27c do_writev+0x5c
              ffffffff90c042bb do_syscall_64+0x5b
              ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65
      
      The cifs filesystem rightfully sets sk_allocations to GFP_NOFS,
      we can avoid the nesting using the sk page frag for allocation
      lacking the __GFP_FS flag. Do not define an additional mm-helper
      for that, as this is strictly tied to the sk page frag usage.
      
      v1 -> v2:
       - use a stricted sk_page_frag() check instead of reordering the
         code (Eric)
      
      Reported-by: default avatarSteffen Froemer <sfroemer@redhat.com>
      Fixes: 5640f768
      
       ("net: use a per task frag allocator")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dacb5d88
    • Gurchetan Singh's avatar
      drm/virtgpu api: define a dummy fence signaled event · 7e78781d
      Gurchetan Singh authored
      The current virtgpu implementation of poll(..) drops events
      when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise
      it's like a normal DRM driver).
      
      This is because paravirtualized userspaces receives responses in a
      buffer of type BLOB_MEM_GUEST, not by read(..).
      
      To be in line with other DRM drivers and avoid specialized behavior,
      it is possible to define a dummy event for virtgpu.  Paravirtualized
      userspace will now have to call read(..) on the DRM fd to receive the
      dummy event.
      
      Fixes: b1079043
      
       ("drm/virtgpu api: create context init feature")
      Reported-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGurchetan Singh <gurchetansingh@chromium.org>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20211122232210.602-2-gurchetansingh@google.com
      
      
      Signed-off-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      7e78781d
    • Finn Behrens's avatar
      nl80211: reset regdom when reloading regdb · 1eda9191
      Finn Behrens authored
      
      Reload the regdom when the regulatory db is reloaded.
      Otherwise, the user had to change the regulatoy domain
      to a different one and then reset it to the correct
      one to have a new regulatory db take effect after a
      reload.
      
      Signed-off-by: default avatarFinn Behrens <fin@nyantec.com>
      Link: https://lore.kernel.org/r/YaIIZfxHgqc/UTA7@gimli.kloenk.dev
      
      
      [edit commit message]
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      1eda9191
  12. Nov 26, 2021
    • Vladimir Oltean's avatar
      net: ptp: add a definition for the UDP port for IEEE 1588 general messages · ec15baec
      Vladimir Oltean authored
      
      As opposed to event messages (Sync, PdelayReq etc) which require
      timestamping, general messages (Announce, FollowUp etc) do not.
      In PTP they are part of different streams of data.
      
      IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
      destination port assigned by IANA is 319 for event messages, and 320 for
      general messages. Yet the kernel seems to be missing the definition for
      general messages. This patch adds it.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ec15baec
    • Vladimir Oltean's avatar
      net: mscc: ocelot: create a function that replaces an existing VCAP filter · 95706be1
      Vladimir Oltean authored
      
      VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
      tc flower offload on ocelot, among other things. The ingress port mask
      on which VCAP rules match is present as a bit field in the actual key of
      the rule. This means that it is possible for a rule to be shared among
      multiple source ports. When the rule is added one by one on each desired
      port, that the ingress port mask of the key must be edited and rewritten
      to hardware.
      
      But the API in ocelot_vcap.c does not allow for this. For one thing,
      ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
      because ocelot_vcap_filter_add() works with a preallocated and
      prepopulated filter and programs it to hardware, and
      ocelot_vcap_filter_del() does both the job of removing the specified
      filter from hardware, as well as kfreeing it. That is to say, the only
      option of editing a filter in place, which is to delete it, modify the
      structure and add it back, does not work because it results in
      use-after-free.
      
      This patch introduces ocelot_vcap_filter_replace, which trivially
      reprograms a VCAP entry to hardware, at the exact same index at which it
      existed before, without modifying any list or allocating any memory.
      
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      95706be1
  13. Nov 24, 2021
  14. Nov 23, 2021
  15. Nov 22, 2021
  16. Nov 20, 2021