  1. Jun 08, 2023
  2. Feb 15, 2023
  3. Feb 08, 2023
  4. Feb 01, 2023
    • netfilter: conntrack: unify established states for SCTP paths · 743435cd
      Sriram Yagnaraman authored
      commit a44b7651 upstream.
      
      An SCTP endpoint can start an association through a path and tear it
      down over another one. That means the initial path will not see the
      shutdown sequence, and the conntrack entry will remain in ESTABLISHED
      state for 5 days.
      
      By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
      ESTABLISHED state, there remains no difference between a primary or
      secondary path. The timeout for the merged ESTABLISHED state is set to
      210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a
      path doesn't see the shutdown sequence, it will expire in a reasonable
      amount of time.
      
      With this change in place, there is now more than one state from which
      we can transition to ESTABLISHED (COOKIE_ECHOED and HEARTBEAT_SENT), so
      handle the setting of the ASSURED bit whenever a state change has happened
      and the new state is ESTABLISHED. Remove the check for dir==REPLY since
      the transition to ESTABLISHED can happen only in the reply direction.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
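The 210-second figure in the conntrack commit above can be checked with a small worked example. This is only a sketch: the three input values are the usual Linux SCTP defaults (hb_interval 30 s, path_max_retrans 5, rto_max 60 s), which the commit message does not state, and the helper name is hypothetical.

```c
#include <assert.h>

/* Assumed Linux SCTP defaults, in seconds (not stated in the commit). */
enum {
    EXAMPLE_HB_INTERVAL_SECS = 30, /* hb_interval */
    EXAMPLE_MAX_PATH_RETRANS = 5,  /* max_path_retrans */
    EXAMPLE_RTO_MAX_SECS     = 60  /* rto_max */
};

/* Compile-time check of the arithmetic quoted in the commit message. */
static_assert(EXAMPLE_HB_INTERVAL_SECS * EXAMPLE_MAX_PATH_RETRANS +
              EXAMPLE_RTO_MAX_SECS == 210,
              "merged ESTABLISHED timeout should be 210 seconds");

/* Hypothetical helper: timeout of the merged ESTABLISHED state. */
static unsigned int example_established_timeout(unsigned int hb_interval,
                                                unsigned int max_path_retrans,
                                                unsigned int rto_max)
{
    return hb_interval * max_path_retrans + rto_max;
}
```

With these inputs a path that never sees the shutdown sequence would expire after 210 seconds instead of 5 days.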
    • units: Add SI metric prefix definitions · 8ebc2efc
      Andy Shevchenko authored
      [ Upstream commit 26471d4a ]

      Sometimes it is useful to have well-defined SI metric prefixes that
      make formulas and equations self-describing.

      List the most popular ones in units.h.

      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Stable-dep-of: c8c37bc5 ("i2c: designware: use casting of u64 in clock multiplication to avoid overflow")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
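The dependent i2c fix named in the Stable-dep-of tag is about widening a multiplication to 64 bits before dividing by a prefix constant. The sketch below illustrates that general technique only; the macro and function names are hypothetical and the code is not taken from the driver or from units.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative prefix macro with a hypothetical name. */
#define EXAMPLE_MEGA 1000000UL

static_assert(EXAMPLE_MEGA == 1000UL * 1000UL, "mega is kilo squared");

/* kHz * ns / 10^6 gives clock cycles. Here the product is computed in
 * 32 bits, so it wraps around before the division. */
static uint32_t example_cycles_wrapping(uint32_t clk_khz, uint32_t t_ns)
{
    uint32_t product = clk_khz * t_ns; /* truncated to 32 bits */
    return (uint32_t)(product / EXAMPLE_MEGA);
}

/* Same formula, but one operand is cast to 64 bits first, so the
 * intermediate product cannot overflow. */
static uint64_t example_cycles_widened(uint32_t clk_khz, uint32_t t_ns)
{
    return (uint64_t)clk_khz * t_ns / EXAMPLE_MEGA;
}
```

For a 400 MHz clock (400000 kHz) and a 20 µs interval (20000 ns) the widened version yields 8000 cycles, while the 32-bit version yields a wrong value because 8 * 10^9 does not fit in 32 bits.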
    • units: Add Watt units · 974aaf11
      Daniel Lezcano authored
      [ Upstream commit 2ee5f8f0 ]

      Since the temperature units are already defined, add the Watt macro
      definitions as well.

      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Stable-dep-of: c8c37bc5 ("i2c: designware: use casting of u64 in clock multiplication to avoid overflow")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • panic: Consolidate open-coded panic_on_warn checks · 55eba182
      Kees Cook authored
      commit 79cc1ba7 upstream.
      
      Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
      their own warnings, and each check "panic_on_warn". Consolidate this
      into a single function so that future instrumentation can be added in
      a single location.
      
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Gow <davidgow@google.com>
      Cc: tangmeng <tangmeng@uniontech.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: kasan-dev@googlegroups.com
      Cc: linux-mm@kvack.org
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
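The consolidation described in the panic commit above can be pictured with a user-space sketch. All names here are hypothetical stand-ins, and the helper only counts and logs instead of panicking, so it shows the shape of the refactoring rather than the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool example_panic_on_warn; /* stand-in for the panic_on_warn setting */
static int example_panic_count;    /* counts would-be panics */

/* Single place where the setting is checked. Future instrumentation
 * (for example a warning counter) would only need to be added here. */
static void example_check_panic_on_warn(const char *origin)
{
    if (!example_panic_on_warn)
        return;
    example_panic_count++;
    fprintf(stderr, "%s: panic_on_warn set\n", origin);
}

/* Two hypothetical checkers that no longer open-code the test. */
static void example_kasan_report(void)
{
    /* ... print the checker-specific report ... */
    example_check_panic_on_warn("KASAN");
}

static void example_ubsan_report(void)
{
    /* ... print the checker-specific report ... */
    example_check_panic_on_warn("UBSAN");
}
```

Each checker keeps its own report format; only the decision of what to do when panic_on_warn is set lives in one function.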
    • exit: Add and use make_task_dead. · d9c740c7
      Eric W. Biederman authored
      commit 0e25498f upstream.
      
      There are two big uses of do_exit.  The first is its designed use as
      the guts of the exit(2) system call.  The second use is to terminate
      a task after something catastrophic has happened, like a NULL pointer
      dereference in kernel code.

      Add a function make_task_dead that is initially exactly the same as
      do_exit to cover the cases where do_exit is called to handle
      catastrophic failure.  In time this can probably be reduced to just a
      light wrapper around do_task_dead. For now keep it exactly the same so
      that introducing this new concept causes no behavioral differences.

      Replace all of the uses of do_exit that use it for catastrophic
      task cleanup with make_task_dead to make it clear what the code
      is doing.

      As part of this, rename rewind_stack_do_exit to
      rewind_stack_and_make_dead.
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kernel/panic: move panic sysctls to its own file · e97ec099
      tangmeng authored
      commit 9df91869 upstream.
      
      kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
      dishes, which makes it very difficult to maintain.

      To help with this maintenance let's start by moving sysctls to places
      where they actually belong.  The proc sysctl maintainers do not want to
      know what sysctl knobs you wish to add for your own piece of code, we
      just care about the core logic.

      All filesystem sysctls now get reviewed by fs folks. This commit
      follows the fs commit and moves the oops_all_cpu_backtrace sysctl to
      its own file, kernel/panic.c.
      
      Signed-off-by: tangmeng <tangmeng@uniontech.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sysctl: add a new register_sysctl_init() interface · e6226917
      Xiaoming Ni authored
      commit 3ddd9a80 upstream.
      
      Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
      
      Finally had time to respin the series of the work we had started last
      year on cleaning up the kernel/sysctl.c kitchen sink.  People keep
      stuffing their sysctls in that file and this creates a maintenance
      burden.  So this effort is aimed at placing sysctls where they actually
      belong.

      I'm going to split patches up into series as there is quite a bit of
      work.

      This first set adds register_sysctl_init() for registering a
      sysctl on the init path, adds const where missing in a few places,
      generalizes common values so they are easier to share, and starts
      moving a few kernel/sysctl.c entries out to where they belong.
      
      The majority of rework on v2 in this first patch set is 0-day fixes.
      Eric Biederman's feedback is later addressed in subsequent patch sets.
      
      I'll only post the first two patch sets for now.  We can address the
      rest once the first two patch sets get completely reviewed / Acked.
      
      This patch (of 9):
      
      kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
      dishes, which makes it very difficult to maintain.
      
      To help with this maintenance let's start by moving sysctls to places
      where they actually belong.  The proc sysctl maintainers do not want to
      know what sysctl knobs you wish to add for your own piece of code, we
      just care about the core logic.
      
      Today though folks heavily rely on tables on kernel/sysctl.c so they can
      easily just extend this table with their needed sysctls.  In order to
      help users move their sysctls out we need to provide a helper which can
      be used during code initialization.
      
      We special-case the initialization use of register_sysctl() since it
      *is* safe to fail, given all that sysctls do is provide a dynamic
      interface to query or modify at runtime an existing variable.  So the
      use case of register_sysctl() on init should *not* stop boot if the sysctls
      don't end up getting registered.  It would be counterproductive to stop
      boot if a simple sysctl registration failed.
      
      Provide a helper for init then, and document the recommended init levels
      to use for callers of this routine.  We will later use this in
      subsequent patches to start slimming down kernel/sysctl.c tables and
      moving sysctl registration to the code which actually needs these
      sysctls.
      
      [mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c]

      Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org
      Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Qing Wang <wangqing@vivo.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Stephen Kitt <steve@sk2.org>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Lukas Middendorf <kernel@tuxforce.de>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Phillip Potter <phil@philpotter.co.uk>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: James E.J. Bottomley <jejb@linux.ibm.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
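The "safe to fail" behaviour that the sysctl commit above argues for can be sketched in user space. Everything below is hypothetical (the callback type, the names, the failure counter); it only illustrates an init-time helper that reports a failed registration and lets startup continue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical registration callback: returns 0 on success. */
typedef int (*example_register_fn)(const char *path);

static int example_failed_registrations;

/* Init-time helper: a failed registration is logged and counted,
 * but it never aborts initialization. */
static void example_register_init(const char *path, example_register_fn reg)
{
    assert(path != NULL && reg != NULL);
    if (reg(path) != 0) {
        example_failed_registrations++;
        fprintf(stderr, "failed to register sysctl table at %s\n", path);
    }
}

/* Two stand-in registration functions for demonstration. */
static int example_reg_ok(const char *path)
{
    (void)path;
    return 0;
}

static int example_reg_fail(const char *path)
{
    (void)path;
    return -1;
}
```

Because the helper returns void, callers on the init path cannot accidentally turn a missing tunable into a boot failure.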
    • scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace · 6bc564f3
      Wenchao Hao authored
      [ Upstream commit a3be19b9 ]
      
      It was observed that the kernel would potentially send
      ISCSI_KEVENT_UNBIND_SESSION multiple times. Introduce 'target_state' in
      iscsi_cls_session() to make sure a session sends only one unbind session
      event.
      
      This introduces a regression wrt. the issue fixed in commit 13e60d3b
      ("scsi: iscsi: Report unbind session event when the target has been
      removed"). If iscsid dies for any reason after sending an unbind session to
      kernel, once iscsid is restarted, the kernel's ISCSI_KEVENT_UNBIND_SESSION
      event is lost and userspace is then unable to logout. However, the session
      is actually in invalid state (its target_id is INVALID) so iscsid should
      not sync this session during restart.
      
      Consequently we need to check the session's target state during iscsid
      restart.  If session is in unbound state, do not sync this session and
      perform session teardown. This is OK because once a ses...
    • l2tp: Serialize access to sk_user_data with sk_callback_lock · e34a965f
      Jakub Sitnicki authored
      [ Upstream commit b68777d5 ]
      
      sk->sk_user_data has multiple users, which are not compatible with each
      other. Writers must synchronize by grabbing the sk->sk_callback_lock.
      
      l2tp currently fails to grab the lock when modifying the underlying tunnel
      socket fields. Fix it by adding appropriate locking.
      
      We err on the side of safety and grab the sk_callback_lock also inside the
      sk_destruct callback overridden by l2tp, even though there should be no
      refs allowing access to the sock at the time when sk_destruct gets called.
      
      v4:
      - serialize write to sk_user_data in l2tp sk_destruct
      
      v3:
      - switch from sock lock to sk_callback_lock
      - document write-protection for sk_user_data
      
      v2:
      - update Fixes to point to origin of the bug
      - use real names in Reported/Tested-by tags
      
      Cc: Tom Parkin <tparkin@katalix.com>
      Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
      Reported-by: Haowei Yan <g1042620637@gmail.com>
      Signed-off-b...
    • net/sched: sch_taprio: fix possible use-after-free · c60fe700
      Eric Dumazet authored
      [ Upstream commit 3a415d59 ]
      
      syzbot reported a nasty crash [1] in net_tx_action() which
      made little sense until we got a repro.
      
      This repro installs a taprio qdisc, but provides an
      invalid TCA_RATE attribute.

      qdisc_create() has to destroy the just initialized
      taprio qdisc, and taprio_destroy() is called.

      However, the hrtimer used by taprio had already fired,
      therefore advance_sched() called __netif_schedule().

      Then net_tx_action() was trying to use a destroyed qdisc.

      We cannot undo the __netif_schedule(), so we must wait
      until one CPU has serviced the qdisc before we can proceed.
      
      Many thanks to Alexander Potapenko for his help.
      
      [1]
      BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
      BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
      BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
      BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
       queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
       do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
       __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
       _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
       spin_trylock include/linux/spinlock.h:359 [inline]
       qdisc_run_begin include/net/sch_generic.h:187 [inline]
       qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
       net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
       __do_softirq+0x1cc/0x7fb kernel/softirq.c:571
       run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
       smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
       kthread+0x31b/0x430 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:732 [inline]
       slab_alloc_node mm/slub.c:3258 [inline]
       __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
       kmalloc_reserve net/core/skbuff.c:358 [inline]
       __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
       alloc_skb include/linux/skbuff.h:1257 [inline]
       nlmsg_new include/net/netlink.h:953 [inline]
       netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
       netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
       rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
       __sys_sendmsg net/socket.c:2565 [inline]
       __do_sys_sendmsg net/socket.c:2574 [inline]
       __se_sys_sendmsg net/socket.c:2572 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • clk: Provide new devm_clk helpers for prepared and enabled clocks · 935ec78d
      Uwe Kleine-König authored
      [ Upstream commit 7ef9651e ]

      When a driver keeps a clock prepared (or enabled) during the whole
      lifetime of the driver, these helpers allow the driver to be simplified.

      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: Alexandru Ardelean <aardelean@deviqon.com>
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Link: https://lore.kernel.org/r/20220520075737.758761-4-u.kleine-koenig@pengutronix.de
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Stable-dep-of: 340cb392 ("memory: atmel-sdramc: Fix missing clk_disable_unprepare in atmel_ramc_probe()")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. Jan 24, 2023
  6. Jan 19, 2023
  7. Jan 18, 2023
  8. Jan 14, 2023
    • mptcp: remove MPTCP 'ifdef' in TCP SYN cookies · 31472f94
      Mat Martineau authored
      From: Matthieu Baerts <matthieu.baerts@tessares.net>

      commit 3fff8818 upstream.
      
      To ease the maintenance, it is often recommended to avoid having #ifdef
      preprocessor conditions.
      
      Here the section related to CONFIG_MPTCP was quite short but the next
      commit needs to add more code around it. It is then cleaner to move
      MPTCP-specific code to functions located in the net/mptcp directory.
      
      Now that mptcp_subflow_request_sock_ops structure can be static, it can
      also be marked as "read only after init".
      
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Cc: stable@vger.kernel.org # 5.10
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: disable fast-commit of encrypted dir operations · d9ff5ad2
      Eric Biggers authored
      commit 0fbcb525 upstream.
      
      fast-commit of create, link, and unlink operations in encrypted
      directories is completely broken because the unencrypted filenames are
      being written to the fast-commit journal instead of the encrypted
      filenames.  These operations can't be replayed, as encryption keys
      aren't present at journal replay time.  It is also an information leak.
      
      Until we can get this working properly, make encrypted directory
      operations ineligible for fast-commit.
      
      Note that fast-commit operations on encrypted regular files continue to
      be allowed, as they seem to work.
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-2-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • efi: random: combine bootloader provided RNG seed with RNG protocol output · b57d7b1d
      Ard Biesheuvel authored
      commit 196dff27 upstream.
      
      Instead of blindly creating the EFI random seed configuration table if
      the RNG protocol is implemented and works, check whether such an EFI
      configuration table was provided by an earlier boot stage and if so,
      concatenate the existing and the new seeds, leaving it up to the core
      code to mix it in and credit it the way it sees fit.
      
      This can be used for, e.g., systemd-boot, to pass an additional seed to
      Linux in a way that can be consumed by the kernel very early. In that
      case, the following definitions should be used to pass the seed to the
      EFI stub:
      
      struct linux_efi_random_seed {
            u32     size; // of the 'seed' array in bytes
            u8      seed[];
      };
      
      The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY
      pool memory, and the address of the struct in memory should be installed
      as an EFI configuration table using the following GUID:
      
      LINUX_EFI_RANDOM_SEED_TABLE_GUID        1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b
      
      Note that doing so is safe even on kernels that were built without this
      patch applied, but the seed will simply be overwritten with a seed
      derived from the EFI RNG protocol, if available. The recommended seed
      size is 32 bytes, and seeds larger than 512 bytes are considered
      corrupted and ignored entirely.
      
      In order to preserve forward secrecy, seeds from previous bootloaders
      are memzero'd out, and in order to preserve memory, those older seeds
      are also freed from memory. Freeing from memory without first memzeroing
      is not safe to do, as it's possible that nothing else will ever
      overwrite those pages used by EFI.
      
      Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
      [ardb: incorporate Jason's followup changes to extend the maximum seed
             size on the consumer end, memzero() it and drop a needless printk]
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
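The seed handling described in the EFI commit above (concatenate, treat oversized seeds as corrupted, wipe the old seed before freeing it) can be sketched in plain C. This is a user-space illustration under stated assumptions: malloc/free stand in for EFI pool memory, the struct and function names are hypothetical, and placing the existing seed before the new one is an assumption about ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_SEED_MAX 512u /* larger seeds are treated as corrupted */

struct example_seed {
    uint32_t size;  /* size of the 'seed' array in bytes */
    uint8_t seed[];
};

/* Returns a newly allocated seed holding prev followed by fresh, or NULL
 * on failure (prev is then left untouched). On success prev is wiped and
 * freed. A corrupted prev contributes no bytes; its true length is
 * unknown, so this sketch cannot wipe it. */
static struct example_seed *example_combine(struct example_seed *prev,
                                            const uint8_t *fresh,
                                            uint32_t fresh_len)
{
    assert(fresh != NULL || fresh_len == 0);
    if (fresh_len > EXAMPLE_SEED_MAX)
        return NULL;

    uint32_t prev_len = 0;
    if (prev != NULL && prev->size <= EXAMPLE_SEED_MAX)
        prev_len = prev->size;

    struct example_seed *out = malloc(sizeof(*out) + prev_len + fresh_len);
    if (out == NULL)
        return NULL;

    out->size = prev_len + fresh_len;
    if (prev_len > 0)
        memcpy(out->seed, prev->seed, prev_len);
    if (fresh_len > 0)
        memcpy(out->seed + prev_len, fresh, fresh_len);

    if (prev != NULL) {
        /* Zero the old seed through a volatile pointer before freeing it. */
        volatile uint8_t *p = prev->seed;
        for (uint32_t i = 0; i < prev_len; i++)
            p[i] = 0;
        free(prev);
    }
    return out;
}
```

The volatile loop is one common way to keep a compiler from optimizing the wipe away; the kernel has its own helpers for that purpose.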
    • netfilter: ipset: Rework long task execution when adding/deleting entries · ee756980
      Jozsef Kadlecsik authored
      [ Upstream commit 5e29dc36 ]
      
      When adding/deleting a large number of elements in one step in ipset, it can
      take a considerable amount of time and can result in soft lockup errors. The
      patch 5f7b51bf ("netfilter: ipset: Limit the maximal range of
      consecutive elements to add/delete") tried to fix it by limiting the max
      elements to process at all. However, that was not enough: it is still possible
      that we get hung tasks. Lowering the limit is not reasonable, so the
      approach in this patch is as follows: rely on the method used at resizing
      sets and save the state when we reach a smaller internal batch limit,
      unlock/lock and proceed from the saved state. Thus we can avoid long
      continuous tasks and at the same time remove the limit on adding/deleting a
      large number of elements in one step.

      The nfnl mutex is held during the whole operation, which prevents
      other ipset commands from being issued in parallel.
      
      Fixes: 5f7b51bf ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
      Reported-by: <syzbot+9204e7399656300bf271@syzkaller.appspotmail.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • filelock: new helper: vfs_inode_has_locks · 40771042
      Jeff Layton authored
      [ Upstream commit ab1ddef9 ]
      
      Ceph has a need to know whether a particular inode has any locks set on
      it. It's currently tracking that by a num_locks field in its
      filp->private_data, but that's problematic as it tries to decrement this
      field when releasing locks and that can race with the file being torn
      down.
      
      Add a new vfs_inode_has_locks helper that just returns whether any locks
      are currently held on the inode.
      
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Stable-dep-of: 461ab10e ("ceph: switch to vfs_inode_has_locks() to fix file lock bug")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • SUNRPC: ensure the matching upcall is in-flight upon downcall · cb0d627b
      minoura makoto authored
      [ Upstream commit b18cba09 ]
      
      Commit 9130b8db ("SUNRPC: allow for upcalls for the same uid
      but different gss service") introduced the `auth` argument to
      __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
      since it (and auth->service) was not (yet) determined.
      
      When multiple upcalls with the same uid and different service are
      ongoing, it could happen that __gss_find_upcall(), which returns the
      first match found in the pipe->in_downcall list, could not find the
      correct gss_msg corresponding to the downcall we are looking for.
      Moreover, it might return a msg which is not sent to rpc.gssd yet.
      
      We could see mount.nfs processes hung in D state when multiple mount.nfs
      instances are executed in parallel.  The call trace below is from the CentOS 7.9
      kernel-3.10.0-1160.24.1.el7.x86_64, but we observed the same hang with
      elrepo kernel-ml-6.0.7-1.el7.
      
      PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
       #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
       #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
       #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
       #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
       #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
       #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
       #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
       #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
       #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
       #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
      
      The scenario is like this. Let's say there are two upcalls for
      services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
      
      When rpc.gssd reads pipe to get the upcall msg corresponding to
      service B from pipe->pipe and then writes the response, in
      gss_pipe_downcall the msg corresponding to service A will be picked
      because only uid is used to find the msg and it is before the one for
      B in pipe->in_downcall.  And the process waiting for the msg
      corresponding to service A will be woken up.
      
      Actual scheduling of that process might be after rpc.gssd processes the
      next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
      The process is then scheduled and sees gss_msg->ctx == NULL and
      gss_msg->msg.errno == 0, therefore it cannot break the loop in
      gss_create_upcall and is never woken up after that.
      
      This patch adds a simple check to ensure that a msg which is not
      sent to rpc.gssd yet is not chosen as the matching upcall upon
      receiving a downcall.
      
      Signed-off-by: minoura makoto <minoura@valinux.co.jp>
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
      Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com>
      Cc: Trond Myklebust <trondmy@hammerspace.com>
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
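The check described in the SUNRPC commit above, skipping upcalls that the daemon has not read yet when matching a downcall by uid, can be sketched with a small linked list. The struct layout, the sent_to_daemon flag, and the function name are all hypothetical; the sketch mirrors the A/B scenario from the commit message rather than the actual kernel data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct example_upcall {
    unsigned int uid;
    char service;          /* 'A' or 'B', as in the scenario above */
    bool sent_to_daemon;   /* hypothetical flag: has the daemon read it? */
    struct example_upcall *next;
};

/* Returns the first upcall for uid that was already handed to the
 * daemon, or NULL. Messages still waiting to be read are skipped. */
static struct example_upcall *example_find_upcall(struct example_upcall *head,
                                                  unsigned int uid)
{
    for (struct example_upcall *m = head; m != NULL; m = m->next) {
        if (m->uid != uid)
            continue;
        if (!m->sent_to_daemon)
            continue; /* a downcall cannot answer an unread upcall */
        return m;
    }
    return NULL;
}
```

With A queued before B and only B read by the daemon, a lookup by uid alone would return A; with the extra check it returns B.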
    • ext4: fix deadlock due to mbcache entry corruption · 1be16a0c
      Jan Kara authored
      [ Upstream commit a44e84a9 ]
      
      When manipulating xattr blocks, we can deadlock infinitely looping
      inside ext4_xattr_block_set() where we constantly keep finding xattr
      block for reuse in mbcache but we are unable to reuse it because its
      reference count is too big. This happens because cache entry for the
      xattr block is marked as reusable (e_reusable set) although its
      reference count is too big. When this inconsistency happens, this
      inconsistent state is kept indefinitely and so ext4_xattr_block_set()
      keeps retrying indefinitely.
      
      The inconsistent state is caused by non-atomic update of e_reusable bit.
      e_reusable is part of a bitfield and e_reusable update can race with
      update of e_referenced bit in the same bitfield resulting in loss of one
      of the updates. Fix the problem by using atomic bitops instead.
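
To illustrate why atomic bit operations avoid the lost update, here is a minimal user-space sketch using C11 `<stdatomic.h>`. It is only an analogue of the approach: the kernel uses its own bitops, and `struct entry`, the `ENTRY_*_B` bit numbers, and the helper functions below are hypothetical names introduced for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions, named after the two flags in the text above. */
enum { ENTRY_REFERENCED_B = 0, ENTRY_REUSABLE_B = 1 };

/* User-space stand-in for a cache entry; not the kernel's structure. */
struct entry {
	atomic_ulong flags;   /* one word, every update is an atomic RMW */
};

/* Atomically set one bit without disturbing the others. */
static void entry_set_flag(struct entry *e, int bit)
{
	atomic_fetch_or(&e->flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void entry_clear_flag(struct entry *e, int bit)
{
	atomic_fetch_and(&e->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool entry_test_flag(struct entry *e, int bit)
{
	return (atomic_load(&e->flags) >> bit) & 1UL;
}
```

With a plain C bitfield, writing one flag compiles to a read-modify-write of the whole storage unit, so two CPUs updating different flags can overwrite each other's result. Each helper above performs its update as a single atomic operation on the word, so concurrent updates to different bits are both preserved.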
      
      This bug has been around for many years, but it became *much* easier
      to hit after commit 65f8b800 ("ext4: fix race when reusing xattr
      blocks").
      
      Cc: stable@vger.kernel.org
      Fixes: 6048c64b ("mbcache: add reusable flag to cache entries")
      Fixes: 65f8b800 ("ext4: fix race when reusing xattr blocks")
      Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Reported-by: Thilo Fromm <t-lo@linux.microsoft.com>
      Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1be16a0c
    • mbcache: automatically delete entries from cache on freeing · 0da99012
      Jan Kara authored
      [ Upstream commit 307af6c8 ]
      
      Use the fact that entries with elevated refcount are not removed from
      the hash and just move removal of the entry from the hash to the entry
      freeing time. When doing this we also change the generic code to hold
      one reference to the cache entry, not two of them, which makes code
      somewhat more obvious.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: a44e84a9 ("ext4: fix deadlock due to mbcache entry corruption")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0da99012
    • mbcache: add functions to delete entry if unused · 27c08673
      Jan Kara authored
      [ Upstream commit 3dc96bba ]
      
      Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
      it is unused and also add a function to wait for entry to become unused
      - mb_cache_entry_wait_unused(). We do not share code between the two
      deleting functions as one of them will go away soon.
      
      CC: stable@vger.kernel.org
      Fixes: 82939d79 ("ext4: convert to mbcache2")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: a44e84a9 ("ext4: fix deadlock due to mbcache entry corruption")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      27c08673
    • mm/highmem: Lift memcpy_[to|from]_page to core · 60d4383c
      Ira Weiny authored
      [ Upstream commit bb90d4bc ]
      
      Working through a conversion to call kmap_local_page() instead of
      kmap() revealed many places where the pattern kmap/memcpy/kunmap
      occurred.
      
      Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al
      Viro all suggested putting this code into helper functions.  Al Viro
      further pointed out that these functions already existed in the iov_iter
      code.[1]
      
      Various locations for the lifted functions were considered.
      
      Headers like mm.h or string.h seem ok but don't really portray the
      functionality well.  pagemap.h made some sense but is for page cache
      functionality.[2]
      
      Another alternative would be to create a new header for the promoted
      memcpy functions, but it masks the fact that these are designed to copy
      to/from pages using the kernel direct mappings and complicates matters
      with a new header.
      
      Placing these functions in 'highmem.h' is suboptimal especially with the
      changes being proposed i...
      60d4383c
    • PM/devfreq: governor: Add a private governor_data for governor · cea018aa
      Kant Fan authored
      commit 5fdded84 upstream.
      
      The member void *data in the structure devfreq can be overwritten
      by governor_userspace. For example:
      1. The device driver assigned the devfreq governor to simple_ondemand
      and initialized the devfreq member void *data to point to a static
      structure devfreq_simple_ondemand_data, both by the function
      devfreq_add_device().
      2. The user changed the devfreq governor to userspace by the command
      "echo userspace > /sys/class/devfreq/.../governor".
      3. The governor userspace allocated dynamic memory for the struct
      userspace_data and assigned the member void *data of devfreq to
      this memory by the function userspace_init().
      4. The user changed the devfreq governor back to simple_ondemand
      by the command "echo simple_ondemand > /sys/class/devfreq/.../governor".
      5. The governor userspace exited and assigned the member void *data
      in the structure devfreq to NULL by the functio...
      cea018aa
    • jbd2: use the correct print format · c023597b
      Bixuan Cui authored
      commit d87a7b4c upstream.
      
      The print format error was found when using ftrace event:
          <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
          <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
      
      Use the correct print format for transaction, head and tid.
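
The negative numbers in the trace above are what an unsigned 32-bit counter looks like when printed with a signed conversion. The sketch below shows the difference in user space. `tid_demo_t` and `format_tid()` are hypothetical names for this example, and the sketch assumes the transaction ID is a 32-bit unsigned value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption for this sketch: the transaction ID is a 32-bit unsigned value. */
typedef uint32_t tid_demo_t;

/*
 * Format a transaction ID with an unsigned conversion. Printing the same
 * bits with "%d" would show a negative number once the counter passes
 * INT32_MAX, which is the symptom seen in the trace output.
 */
static int format_tid(char *buf, size_t len, tid_demo_t tid)
{
	return snprintf(buf, len, "%" PRIu32, tid);
}
```

For example, the value shown as -1866216965 in the trace has the same bit pattern as the unsigned value 2428750331, which is what `format_tid()` produces for it.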
      
      Fixes: 879c5e6b ('jbd2: convert instrumentation from markers to tracepoints')
      Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Link: https://lore.kernel.org/r/1665488024-95172-1-git-send-email-cuibixuan@linux.alibaba.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c023597b