  1. Nov 14, 2021
  2. Nov 11, 2021
    • mm: allow only SLUB on PREEMPT_RT · 252220da
      Ingo Molnar authored
      Memory allocators may disable interrupts or preemption as part of the
      allocation and freeing process.  For PREEMPT_RT it is important that
      these sections remain short and deterministic, and therefore do not
      depend on the amount of memory to allocate/free or on the inner state
      of the algorithm.
      
      Until v3.12-RT the SLAB allocator was an option, but it required several
      changes to meet all the requirements.  The SLUB design fits the
      PREEMPT_RT model better, so the SLAB patches were dropped in the 3.12-RT
      patchset.  Comparing the two allocators, SLUB outperformed SLAB in both
      throughput (time needed to allocate and free memory) and the maximal
      latency of the system measured with cyclictest during hackbench.
      
      SLOB was never evaluated since it was unlikely to perform better
      than SLAB.  During a quick test, the kernel crashed during boot with
      SLOB enabled.
      
      Disable SLAB and SLOB on PREEMPT_RT.
      
      [bigeasy@linutronix.de: commit description]
      
      Link: https://lkml.kernel.org/r/202110...
    • preempt: Restore preemption model selection configs · a8b76910
      Valentin Schneider authored
      Commit c597bfdd ("sched: Provide Kconfig support for default dynamic
      preempt mode") changed the selectable config names for the preemption
      model. This means a config file must now select
      
        CONFIG_PREEMPT_BEHAVIOUR=y
      
      rather than
      
        CONFIG_PREEMPT=y
      
      to get a preemptible kernel. This means all arch config files would need
      to be updated; right now they all end up with the default
      CONFIG_PREEMPT_NONE_BEHAVIOUR.
      
      Rather than touching a good hundred config files, restore usage of
      CONFIG_PREEMPT{_NONE, _VOLUNTARY}. Make them configure:
      o The build-time preemption model when !PREEMPT_DYNAMIC
      o The default boot-time preemption model when PREEMPT_DYNAMIC
      
      Add siblings of those configs with the _BUILD suffix to unconditionally
      designate the build-time preemption model (PREEMPT_DYNAMIC is built with
      the "highest" preemption model it supports, aka PREEMPT). Downstream
      configs should by now all be depending on / selected by CONFIG_PREEMPTION
      rather than CONFIG_...
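      The two cases described above can be restated as a small C sketch. It
      uses a hypothetical enum and a hypothetical resolve_preempt() helper;
      Kconfig does not evaluate the options this way, and the code only
      encodes the rules stated in this commit message (including that
      PREEMPT_DYNAMIC kernels are built with the PREEMPT model).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum standing in for CONFIG_PREEMPT_NONE/_VOLUNTARY/PREEMPT. */
enum preempt_model { MODEL_NONE, MODEL_VOLUNTARY, MODEL_FULL };

struct preempt_choice {
	enum preempt_model build;        /* model the kernel is compiled for */
	enum preempt_model boot_default; /* model used unless overridden at boot */
};

/*
 * selected: model picked via CONFIG_PREEMPT{_NONE,_VOLUNTARY,}
 * dynamic:  whether CONFIG_PREEMPT_DYNAMIC is enabled
 */
static struct preempt_choice resolve_preempt(enum preempt_model selected,
					     bool dynamic)
{
	struct preempt_choice c;

	assert(selected <= MODEL_FULL);
	if (dynamic) {
		/* Built with the "highest" model (PREEMPT); the selected
		 * option only sets the default boot-time model. */
		c.build = MODEL_FULL;
		c.boot_default = selected;
	} else {
		/* No boot-time switching: the selection is the build model. */
		c.build = selected;
		c.boot_default = selected;
	}
	return c;
}
```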
  3. Nov 09, 2021
  4. Nov 06, 2021
  5. Oct 18, 2021
  6. Oct 10, 2021
  7. Sep 22, 2021
  8. Sep 19, 2021
    • init: don't panic if mount_nodev_root failed · 40c8ee67
      Leon Romanovsky authored
      An attempt to mount a 9p file system as root gives the following kernel panic:
      
       9pnet_virtio: no channels available for device root
       Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc1+ #127
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack_lvl+0x45/0x59
        panic+0x1e2/0x44b
        ? __warn_printk+0xf3/0xf3
        ? free_unref_page+0x2d4/0x4a0
        ? trace_hardirqs_on+0x32/0x120
        ? free_unref_page+0x2d4/0x4a0
        mount_root+0x189/0x1e0
        prepare_namespace+0x136/0x165
        kernel_init_freeable+0x3b8/0x3cb
        ? rest_init+0x2e0/0x2e0
        kernel_init+0x19/0x130
        ret_from_fork+0x1f/0x30
       Kernel Offset: disabled
       ---[ end Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2 ]---
      
      QEMU command line:
       "qemu-system-x86_64 -append root=/dev/root rw rootfstype=9p rootflags=trans=virtio ..."
      
      This error occurs because prepare_namespace() truncates root_device_name
      from "/dev/root" to "root" before calling mount_nodev_root().
      
      As a solution, don't treat errors in mount_nodev_root() as errors that
      require a panic, and allow fallback to the mount flow that existed before
      the patch cited in the Fixes tag.
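      A minimal sketch of the control-flow change, using hypothetical
      stand-ins rather than the real init/do_mounts.c functions: the "/dev/"
      prefix is stripped as described above, and a failed
      non-block-device mount falls through to the older flow instead of
      panicking. The demo mount functions are illustrative stubs; -2 mirrors
      the err=-2 in the log above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical signature for a mount attempt: 0 on success, -errno on failure. */
typedef int (*mount_fn)(const char *name);

/* prepare_namespace()-style prefix stripping: "/dev/root" -> "root". */
static const char *strip_dev_prefix(const char *name)
{
	assert(name != NULL);
	if (strncmp(name, "/dev/", 5) == 0)
		return name + 5;
	return name;
}

/*
 * Try the non-block-device mount first; if it fails, fall back to the
 * older flow instead of treating the error as fatal.
 */
static int mount_root_sketch(const char *root_device_name,
			     mount_fn try_nodev, mount_fn fallback)
{
	const char *name = strip_dev_prefix(root_device_name);

	if (try_nodev(name) == 0)
		return 0;
	/* Previously this path led to a panic; now it falls through. */
	return fallback(name);
}

/* Illustrative stubs only. */
static int demo_nodev_fails(const char *name) { (void)name; return -2; }
static int demo_fallback_ok(const char *name) { (void)name; return 0; }
```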
      
      Fixes: f9259be6 ("init: allow mounting arbitrary non-blockdevice filesystems as root")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • init/do_mounts.c: Harden split_fs_names() against buffer overflow · b51593c4
      Vivek Goyal authored
      
      split_fs_names() currently takes a comma-separated list of filesystems
      and converts it into individual filesystem strings. It places these
      strings in the input buffer passed by the caller and returns the number
      of strings.
      
      If the caller passes an input string bigger than the buffer, we can
      write beyond the buffer. Even if the string just fits the buffer, we
      still write beyond it, because we append a '\0' byte at the end.
      
      Pass the size of the input buffer to split_fs_names() and put enough
      checks in place that such buffer overruns cannot occur.
      
      This patch does a few things.
      
      - Add a parameter "size" to split_fs_names(). This specifies the size
        of the input buffer.
      
      - Use strlcpy() (instead of strcpy()) so that we can't go beyond the
        buffer size. If the input string "names" is larger than the passed-in
        buffer, it will be truncated to fit in the buffer.
      
      - Stop appending an extra '\0' character at the end, removing one
        way of going beyond the input buffer size.
      
      - Do not use an extra loop to count the number of strings.
      
      - Previously, if one passed "rootfstype=foo,,bar", split_fs_names()
        returned only one string, "foo" ("bar" was dropped due to the
        extra ','). After this patch, split_fs_names() returns 3 strings
        ("foo", a zero-sized string, and "bar").
      
        Callers of split_fs_names() have been modified to check for
        zero-sized strings and skip to the next one.
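      The hardened splitting behaviour described in the list above can be
      sketched in userspace C. This is an illustration under assumptions,
      not the kernel code: sketch_strlcpy() is a local stand-in with
      strlcpy()-style truncation (strlcpy() is not part of standard C), and
      split_fs_names_sketch() and count_nonempty() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* strlcpy()-style copy: truncates to size - 1 bytes, always terminates. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Split a comma-separated list in place; returns the number of strings. */
static int split_fs_names_sketch(char *page, size_t size, const char *names)
{
	int count = 1;
	char *p = page;

	assert(page != NULL && size > 0);
	sketch_strlcpy(page, names, size); /* never writes past page[size - 1] */
	while (*p++) {
		if (p[-1] == ',') {
			p[-1] = '\0'; /* each comma becomes a separator */
			count++;
		}
	}
	return count; /* no extra trailing '\0' is appended */
}

/* Caller side: walk the strings, skipping zero-sized entries. */
static int count_nonempty(const char *page, int count)
{
	int n = 0;

	for (int i = 0; i < count; i++) {
		if (*page)
			n++;
		page += strlen(page) + 1;
	}
	return n;
}
```

      With "foo,,bar" the sketch yields three strings, the middle one empty,
      which count_nonempty() skips.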
      
      Reported-by: xu xin <xu.xin16@zte.com.cn>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. Sep 14, 2021
  10. Sep 08, 2021
  11. Sep 07, 2021
    • kbuild: Only default to -Werror if COMPILE_TEST · b339ec9c
      Marco Elver authored
      The cross-product of the kernel's supported toolchains, architectures,
      and configuration options is large. So large that it is generally
      accepted to be infeasible to enumerate and build+test them all
      (many compile-testers rely on randomly generated configs).
      
      Without the possibility to enumerate all possible combinations of
      toolchains, architectures, and configuration options, it is inevitable
      that compiler warnings in this space exist.
      
      With -Werror, an innumerable set of kernels are now broken that had
      been perfectly usable before (confused compilers, code with warnings
      unused, or luck).
      
      Distributors will necessarily pick a point in the toolchain X arch X
      config space, and if unlucky, will have a broken build. Granted, those
      will likely disable CONFIG_WERROR and move on.
      
      The kernel's default configuration is unlikely to be suitable for all
      users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
      
      This also holds for CI sy...
  12. Sep 05, 2021
    • Enable '-Werror' by default for all kernel builds · 3fe617cc
      Linus Torvalds authored
      
      ... but make it a config option so that broken environments can disable
      it when required.
      
      We really should always have a clean build, and will disable specific
      over-eager warnings as required, if we can't fix them.  But while I
      fairly religiously enforce that in my own tree, it doesn't get enforced
      by various build robots that don't necessarily report warnings.
      
      So this just makes '-Werror' a default compiler flag, but allows people
      to disable it for their configuration if they have some particular
      issues.
      
      Occasionally, new compiler versions end up enabling new warnings, and it
      can take a while before we have them fixed (or the warnings disabled if
      that is what it takes), so the config option allows for that situation.
      
      Hopefully this will mean that I get fewer pull requests that have new
      warnings that were not noticed by various automation we have in place.
      
      Knock wood.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. Aug 24, 2021
  14. Aug 23, 2021
  15. Aug 20, 2021
  16. Aug 12, 2021
  17. Aug 03, 2021
  18. Jul 26, 2021
    • printk: remove NMI tracking · 85e3e7fb
      John Ogness authored
      All NMI contexts are handled the same as the safe context: store the
      message and defer printing. There is no need to have special NMI
      context tracking for this. Using in_nmi() is enough.
      
      Several parts of the kernel manually call into the printk NMI
      context tracking in order to force general printk deferred
      printing:
      
          arch/arm/kernel/smp.c
          arch/powerpc/kexec/crash.c
          kernel/trace/trace.c
      
      For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
      function pair printk_deferred_enter/exit that explicitly achieves the
      same objective.
      
      For ftrace, remove the printk context manipulation completely. It was
      added in commit 03fc7f9c ("printk/nmi: Prevent deadlock when
      accessing the main log buffer in NMI"). The purpose was to enforce
      storing messages directly into the ring buffer even in NMI context.
      It really should have only modified the behavior in NMI context.
      There is no need for a special behavior any longer. All messages are
      always stored directly now. The console deferring is handled
      transparently in vprintk().
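      As a hypothetical userspace model of the resulting rule (messages are
      always stored; only console output is deferred when in NMI context or
      between printk_deferred_enter() and printk_deferred_exit()), consider
      the sketch below. The global counter and the fake_in_nmi flag are
      assumptions standing in for per-CPU state and in_nmi(); all names are
      invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; the kernel keeps the equivalent per CPU. */
static int deferred_depth;   /* nesting depth of model enter/exit pairs */
static bool fake_in_nmi;     /* stand-in for in_nmi() */

static void model_printk_deferred_enter(void)
{
	deferred_depth++;
}

static void model_printk_deferred_exit(void)
{
	assert(deferred_depth > 0); /* exit must pair with an earlier enter */
	deferred_depth--;
}

/*
 * The message itself is always stored in the ring buffer; this only
 * decides whether printing to the console is deferred.
 */
static bool model_console_is_deferred(void)
{
	return fake_in_nmi || deferred_depth > 0;
}
```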
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      [pmladek@suse.com: Remove special handling in ftrace.c completely.]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
  19. Jul 19, 2021
    • printk: Userspace format indexing support · 33701557
      Chris Down authored
      
      We have a number of systems industry-wide that have a subset of their
      functionality that works as follows:
      
      1. Receive a message from local kmsg, serial console, or netconsole;
      2. Apply a set of rules to classify the message;
      3. Do something based on this classification (like scheduling a
         remediation for the machine), rinse, and repeat.
      
      As a couple of examples just inside Facebook (although this isn't a
      Facebook-specific problem), we have this inside our netconsole
      processing (for alarm classification) and as part of our machine
      health checking. We use these messages to determine fairly important
      metrics around production health, and it's important that we get them
      right.
      
      While for some kinds of issues we have counters, tracepoints, or metrics
      with a stable interface which can reliably indicate the issue, in order
      to react to production issues quickly we need to work with the interface
      which most kernel developers naturally use when developing: printk.
      
      Most production issues come from unexpected phenomena, and as such
      usually the code in question doesn't have easily usable tracepoints or
      other counters available for the specific problem being mitigated. We
      have a number of lines of monitoring defence against problems in
      production (host metrics, process metrics, service metrics, etc), and
      where it's not feasible to reliably monitor at another level, this kind
      of pragmatic netconsole monitoring is essential.
      
      As one would expect, monitoring using printk is rather brittle for a
      number of reasons, most notably that the message might disappear
      entirely in a new version of the kernel, or that the message may change
      in some way that causes the regex or other classification methods to
      silently fail.
      
      One factor that makes this even harder is that, under normal operation,
      many of these messages are never expected to be hit. For example, there
      may be a rare hardware bug which one wants to detect if it were ever to
      happen again, but its recurrence is not likely or anticipated. This
      precludes using something like checking whether the printk in question
      was printed somewhere fleetwide recently to determine whether the
      message in question is still present or not, since we don't anticipate
      that it should be printed anywhere, but still need to monitor for its
      future presence in the long-term.
      
      This class of issue has happened on a number of occasions, causing
      unhealthy machines with hardware issues to remain in production for
      longer than ideal. As a recent example, some monitoring around
      blk_update_request fell out of date and caused semi-broken machines to
      remain in production for longer than would be desirable.
      
      Searching through the codebase to find the message is also extremely
      fragile, because many of the messages are further constructed beyond
      their callsite (e.g. btrfs_printk and other module-specific wrappers,
      each with their own functionality). Even if they aren't, guessing the
      format and formulation of the underlying message based on the aesthetics
      of the message emitted is not a recipe for success at scale, and our
      previous issues with fleetwide machine health checking demonstrate as
      much.
      
      This provides a solution to the issue of silently changed or deleted
      printks: we record pointers to all printk format strings known at
      compile time into a new .printk_index section, both in vmlinux and
      modules. At runtime, this can then be iterated by looking at
      <debugfs>/printk/index/<module>, which emits the following format, both
      readable by humans and able to be parsed by machines:
      
          $ head -1 vmlinux; shuf -n 5 vmlinux
          # <level[,flags]> filename:line function "format"
          <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
          <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
          <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
          <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
          <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
      
      This mitigates the majority of cases where we have a highly-specific
      printk which we want to match on, as we can now enumerate and check
      whether the format changed or the printk callsite disappeared entirely
      in userspace. This allows us to catch changes to printks we monitor
      earlier and decide what to do about it before it becomes problematic.
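      Because each index line follows the documented
      <level[,flags]> filename:line function "format" layout, a consumer can
      split it into fields before matching. The sketch below parses one line
      in C; struct pi_entry, the field size limits, and parse_index_line()
      are hypothetical, and the format string is kept in its escaped form.
      Reading the file itself (under <debugfs>/printk/index/<module>) is left
      out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed entry; the field sizes are arbitrary limits for this sketch. */
struct pi_entry {
	char level[16];   /* e.g. "6", or "level,flags" */
	char file[256];
	unsigned line;
	char func[128];
	char format[512]; /* escaped form, without the surrounding quotes */
};

/* Returns 0 on success, -1 if the line does not match the layout. */
static int parse_index_line(const char *s, struct pi_entry *e)
{
	int n = -1;
	const char *close;
	size_t len;

	assert(s != NULL && e != NULL);
	/* %n records the offset just past the opening quote. */
	if (sscanf(s, "<%15[^>]> %255[^:]:%u %127s \"%n",
		   e->level, e->file, &e->line, e->func, &n) != 4 || n < 0)
		return -1;
	close = strrchr(s + n, '"'); /* last quote closes the format */
	if (!close)
		return -1;
	len = (size_t)(close - (s + n));
	if (len >= sizeof(e->format))
		return -1;
	memcpy(e->format, s + n, len);
	e->format[len] = '\0';
	return 0;
}
```

      The "# <level[,flags]> ..." header line does not start with '<', so the
      sketch rejects it.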
      
      There is no additional runtime cost for printk callers or printk itself,
      and the assembly generated is exactly the same.
      
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
  20. Jul 17, 2021
  21. Jul 08, 2021
  22. Jul 01, 2021
    • init: print out unknown kernel parameters · 86d1919a
      Andrew Halaney authored
      It is easy to foobar setting a kernel parameter on the command line
      without realizing it, and there is not much output that you can use to
      assess what the kernel did with that parameter by default.
      
      Make it a little more explicit which parameters on the command line
      _looked_ like a valid parameter for the kernel, but did not match anything
      and ultimately got tossed to init.  This is very similar to the unknown
      parameter message received when loading a module.
      
      This assumes the parameters are processed in a normal fashion. Some
      parameters (dyndbg= for example) don't register their parameter with the
      rest of the kernel's parameters, and therefore always show up in this list
      (and are also given to init, like the rest of this list).
      
      Another example: BOOT_IMAGE= is highlighted as an offender, which it
      technically is, but it is passed by LILO and GRUB, so most systems will
      see that complaint.
      
      An example output where "foobared" and "unrecognized" are intentionally
      invalid parameters:
      
        Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.12-dirty debug log_buf_len=4M foobared unrecognized=foo
        Unknown command line parameters: foobared BOOT_IMAGE=/boot/vmlinuz-5.12-dirty unrecognized=foo
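      A simplified sketch of the idea: split the command line on spaces,
      compare each token's name (the part before any '=') against a list of
      registered parameters, and collect the rest. The known_params table
      and collect_unknown() are hypothetical; the sketch ignores quoting and
      reports tokens in command-line order, so its ordering need not match
      the kernel output shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical registry of parameters the kernel "knows" about. */
static const char *const known_params[] = { "debug", "log_buf_len", NULL };

/* Match only the name part of "name" or "name=value". */
static bool is_known(const char *tok, size_t len)
{
	size_t name_len = len;
	const char *eq = memchr(tok, '=', len);

	if (eq)
		name_len = (size_t)(eq - tok);
	for (size_t i = 0; known_params[i]; i++)
		if (strlen(known_params[i]) == name_len &&
		    strncmp(known_params[i], tok, name_len) == 0)
			return true;
	return false;
}

/* Write every unrecognized token of cmdline to out, space separated. */
static void collect_unknown(const char *cmdline, char *out, size_t size)
{
	const char *p = cmdline;

	assert(size > 0);
	out[0] = '\0';
	while (*p) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len && !is_known(p, len)) {
			size_t used = strlen(out);
			/* snprintf truncates safely if out is full */
			snprintf(out + used, size - used, "%s%.*s",
				 used ? " " : "", (int)len, p);
		}
		p += len;
	}
}
```

      For the example command line above, the sketch reports
      BOOT_IMAGE=/boot/vmlinuz-5.12-dirty, foobared and unrecognized=foo.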
      
      Link: https://lkml.kernel.org/r/20210511211009.42259-1-ahalaney@redhat.com
      
      
      Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Suggested-by: Borislav Petkov <bp@suse.de>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. Jun 22, 2021
  24. Jun 18, 2021
  25. Jun 10, 2021