  1. Nov 14, 2021
  2. Nov 11, 2021
    • mm: allow only SLUB on PREEMPT_RT · 252220da
      Ingo Molnar authored
      Memory allocators may disable interrupts or preemption as part of the
      allocation and freeing process.  For PREEMPT_RT it is important that
      these sections remain deterministic and short and therefore don't depend
      on the size of the memory to allocate/free or the inner state of the
      algorithm.
      
      Until v3.12-RT the SLAB allocator was an option but involved several
      changes to meet all the requirements.  The SLUB design fits better with
      the PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT
      patchset.  Comparing the two allocators, SLUB outperformed SLAB in both
      throughput (time needed to allocate and free memory) and the maximal
      latency of the system measured with cyclictest during hackbench.
      
      SLOB was never evaluated since it was unlikely that it performs better
      than SLAB.  During a quick test, the kernel crashed with SLOB enabled
      during boot.
      
      Disable SLAB and SLOB on PREEMPT_RT.
      
      [bigeasy@linutronix.de: commit description]
      
      Link: https://lkml.kernel.org/r/20211015210336.gen3tib33ig5q2md@linutronix.de
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • preempt: Restore preemption model selection configs · a8b76910
      Valentin Schneider authored
      Commit c597bfdd ("sched: Provide Kconfig support for default dynamic
      preempt mode") changed the selectable config names for the preemption
      model. This means a config file must now select
      
        CONFIG_PREEMPT_BEHAVIOUR=y
      
      rather than
      
        CONFIG_PREEMPT=y
      
      to get a preemptible kernel. This means all arch config files would need to
      be updated - right now they'll all end up with the default
      CONFIG_PREEMPT_NONE_BEHAVIOUR.
      
      Rather than touch a good hundred config files, restore usage of
      CONFIG_PREEMPT{_NONE, _VOLUNTARY}. Make them configure:
      o The build-time preemption model when !PREEMPT_DYNAMIC
      o The default boot-time preemption model when PREEMPT_DYNAMIC
      
      Add siblings of those configs with the _BUILD suffix to unconditionally
      designate the build-time preemption model (PREEMPT_DYNAMIC is built with
      the "highest" preemption model it supports, aka PREEMPT). Downstream
      configs should by now all be depending / selected by CONFIG_PREEMPTION
      rather than CONFIG_...
  3. Nov 09, 2021
  4. Nov 06, 2021
  5. Oct 18, 2021
  6. Oct 10, 2021
  7. Sep 22, 2021
  8. Sep 19, 2021
    • init: don't panic if mount_nodev_root failed · 40c8ee67
      Leon Romanovsky authored
      Attempting to mount a 9p file system as root gives the following kernel
      panic:
      
       9pnet_virtio: no channels available for device root
       Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc1+ #127
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack_lvl+0x45/0x59
        panic+0x1e2/0x44b
        ? __warn_printk+0xf3/0xf3
        ? free_unref_page+0x2d4/0x4a0
        ? trace_hardirqs_on+0x32/0x120
        ? free_unref_page+0x2d4/0x4a0
        mount_root+0x189/0x1e0
        prepare_namespace+0x136/0x165
        kernel_init_freeable+0x3b8/0x3cb
        ? rest_init+0x2e0/0x2e0
        kernel_init+0x19/0x130
        ret_from_fork+0x1f/0x30
       Kernel Offset: disabled
       ---[ end Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2 ]---
      
      QEMU command line:
       "qemu-system-x86_64 -append root=/dev/root rw rootfstype=9p rootflags=trans=virtio ..."
      
      This error is becaus...
    • init/do_mounts.c: Harden split_fs_names() against buffer overflow · b51593c4
      Vivek Goyal authored
      
      split_fs_names() currently takes a comma-separated list of filesystems
      and converts it into individual filesystem strings. It places these
      strings in the input buffer passed by the caller and returns the number
      of strings.
      
      If the caller manages to pass an input string bigger than the buffer,
      then we can write beyond the buffer. Or if the string just fits the
      buffer, we will still write beyond the buffer as we append a '\0' byte
      at the end.
      
      Pass the size of the input buffer to split_fs_names() and put enough
      checks in place so such buffer overruns cannot occur.
      
      This patch does a few things.
      
      - Add a parameter "size" to split_fs_names(). This specifies size
        of input buffer.
      
      - Use strlcpy() (instead of strcpy()) so that we can't go beyond
        buffer size. If input string "names" is larger than passed in
        buffer, input string will be truncated to fit in buffer.
      
      - Stop appending extra '\0' character at the end and avoid one
        possibility of going beyond the input buffer size.
      
      - Do not use extra loop to count number of strings.
      
      - Previously, if one passed "rootfstype=foo,,bar", split_fs_names()
        returned only 1 string, "foo" (and "bar" was cut off by the
        extra ','). After this patch, split_fs_names() returns 3 strings
        ("foo", a zero-sized string, and "bar").
      
        Callers of split_fs_names() have been modified to check for a
        zero-sized string and skip to the next one.
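
      The points above can be sketched in userspace C. This is only an
      illustration under stated assumptions, not the kernel code: the names
      split_names() and count_nonempty() are hypothetical, and a bounded
      memcpy() stands in for strlcpy(). It shows the bounded copy, the single
      pass that splits and counts at once, and a caller skipping zero-sized
      strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the hardened helper.
 * Copies at most size - 1 bytes of "names" into "buf", turns every ','
 * into '\0', and returns the number of resulting strings.
 */
static int split_names(char *buf, size_t size, const char *names)
{
	size_t n;
	int count = 1;
	char *p;

	assert(buf != NULL && names != NULL);
	if (size == 0)
		return 0;	/* no room for even an empty string */

	/* Bounded copy: truncate if needed, always NUL-terminate. */
	n = strlen(names);
	if (n >= size)
		n = size - 1;
	memcpy(buf, names, n);
	buf[n] = '\0';

	/* Single pass: split on commas and count at the same time. */
	for (p = buf; *p; p++) {
		if (*p == ',') {
			*p = '\0';
			count++;
		}
	}
	return count;
}

/* Caller side: walk "count" strings and skip the zero-sized ones. */
static int count_nonempty(const char *buf, int count)
{
	const char *p = buf;
	int i, nonempty = 0;

	for (i = 0; i < count; i++) {
		if (*p)
			nonempty++;
		p += strlen(p) + 1;	/* step over this string and its '\0' */
	}
	return nonempty;
}
```

      With a 16-byte buffer, "foo,,bar" yields three strings, one of them
      empty. With a 4-byte buffer, "foo,bar" is truncated to the single
      string "foo" and nothing is written past the end of the buffer.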
      
      Reported-by: xu xin <xu.xin16@zte.com.cn>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. Sep 14, 2021
    • memblock: introduce saner 'memblock_free_ptr()' interface · 77e02cf5
      Linus Torvalds authored
      The boot-time allocation interface for memblock is a mess, with
      'memblock_alloc()' returning a virtual pointer, but then you are
      supposed to free it with 'memblock_free()' that takes a _physical_
      address.
      
      Not only is that all kinds of strange and illogical, but it actually
      causes bugs, when people then use it like a normal allocation function,
      and it fails spectacularly on a NULL pointer:
      
         https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
      
      or just random memory corruption if the debug checks don't catch it:
      
         https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
      
      I really don't want to apply patches that treat the symptoms, when the
      fundamental cause is this horribly confusing interface.
      
      I started out looking at just automating a sane replacement sequence,
      but because of this mix of virtual and physical addresses, and because
      people have used the "__pa()" macro that can take either a regular
      kernel pointer, or just the raw "unsigned long" address, it's all quite
      messy.
      
      So this just introduces a new saner interface for freeing a virtual
      address that was allocated using 'memblock_alloc()', and that was kept
      as a regular kernel pointer.  And then it converts a couple of users
      that are obvious and easy to test, including the 'xbc_nodes' case in
      lib/bootconfig.c that caused problems.
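
      The shape of such a pointer-based wrapper can be sketched with a
      userspace mock. Everything here is an assumption made for illustration:
      mock_pa(), mock_free_phys() and mock_free_ptr() are hypothetical
      stand-ins for "__pa()", the physical-address 'memblock_free()' and the
      new 'memblock_free_ptr()', and the NULL check is inferred from the
      NULL-pointer failure mentioned above rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock bookkeeping standing in for the real allocator state. */
static int mock_free_calls;
static uintptr_t mock_last_phys;

/* Arbitrary offset used by the mock address translation. */
#define MOCK_PAGE_OFFSET ((uintptr_t)0x1000)

/* Stand-in for __pa(): mock virtual-to-physical translation. */
static uintptr_t mock_pa(const void *virt)
{
	return (uintptr_t)virt - MOCK_PAGE_OFFSET;
}

/* Stand-in for the old interface, which takes a physical address. */
static void mock_free_phys(uintptr_t phys, size_t size)
{
	/* Freeing the translation of NULL is the bug class described above. */
	assert(phys != mock_pa(NULL));
	(void)size;
	mock_free_calls++;
	mock_last_phys = phys;
}

/*
 * Pointer-based wrapper: takes the same kind of pointer the allocator
 * returned, treats NULL as a no-op, and does the translation itself.
 */
static void mock_free_ptr(void *ptr, size_t size)
{
	if (ptr)
		mock_free_phys(mock_pa(ptr), size);
}
```

      The point of the design is that callers never handle the physical
      address: they pass back the pointer they were given, and a NULL pointer
      never reaches the physical-address interface.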
      
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. Sep 08, 2021
  11. Sep 07, 2021
    • kbuild: Only default to -Werror if COMPILE_TEST · b339ec9c
      Marco Elver authored
      
      The cross-product of the kernel's supported toolchains, architectures,
      and configuration options is large. So large, that it's generally
      accepted to be infeasible to enumerate and build+test them all
      (many compile-testers rely on randomly generated configs).
      
      Without the possibility to enumerate all possible combinations of
      toolchains, architectures, and configuration options, it is inevitable
      that compiler warnings in this space exist.
      
      With -Werror, this means that an innumerable set of kernels are now
      broken, yet had been perfectly usable before (confused compilers, code
      with warnings unused, or luck).
      
      Distributors will necessarily pick a point in the toolchain X arch X
      config space, and if unlucky, will have a broken build. Granted, those
      will likely disable CONFIG_WERROR and move on.
      
      The kernel's default configuration is unlikely to be suitable for all
      users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
      
      This also holds for CI systems which are focused on runtime testing,
      where the odd warning in some subsystem will disrupt testing of the rest
      of the kernel. Many of those runtime-focused CI systems run tests or
      fuzz the kernel using runtime debugging tools. Runtime testing of
      different subsystems can proceed in parallel, and potentially uncover
      serious bugs; halting runtime testing of the entire kernel because of
      the odd warning (now error) in a subsystem or driver is simply
      inappropriate.
      
      Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
      as well.
      
      The appropriate usecase for -Werror is therefore compile-test focused
      builds (often done by developers or CI systems).
      
      Reflect this in the Kconfig option by making the default value of WERROR
      match COMPILE_TEST.
      
      Signed-off-by: Marco Elver <elver@google.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. Sep 05, 2021
    • Enable '-Werror' by default for all kernel builds · 3fe617cc
      Linus Torvalds authored
      
      ... but make it a config option so that broken environments can disable
      it when required.
      
      We really should always have a clean build, and will disable specific
      over-eager warnings as required, if we can't fix them.  But while I
      fairly religiously enforce that in my own tree, it doesn't get enforced
      by various build robots that don't necessarily report warnings.
      
      So this just makes '-Werror' a default compiler flag, but allows people
      to disable it for their configuration if they have some particular
      issues.
      
      Occasionally, new compiler versions end up enabling new warnings, and it
      can take a while before we have them fixed (or the warnings disabled if
      that is what it takes), so the config option allows for that situation.
      
      Hopefully this will mean that I get fewer pull requests that have new
      warnings that were not noticed by various automation we have in place.
      
      Knock wood.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. Aug 24, 2021
  14. Aug 23, 2021
  15. Aug 20, 2021
  16. Aug 12, 2021
  17. Aug 03, 2021
  18. Jul 26, 2021
    • printk: remove NMI tracking · 85e3e7fb
      John Ogness authored
      All NMI contexts are handled the same as the safe context: store the
      message and defer printing. There is no need to have special NMI
      context tracking for this. Using in_nmi() is enough.
      
      There are several parts of the kernel that are manually calling into
      the printk NMI context tracking in order to cause general printk
      deferred printing:
      
          arch/arm/kernel/smp.c
          arch/powerpc/kexec/crash.c
          kernel/trace/trace.c
      
      For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
      function pair printk_deferred_enter/exit that explicitly achieves the
      same objective.
      
      For ftrace, remove the printk context manipulation completely. It was
      added in commit 03fc7f9c ("printk/nmi: Prevent deadlock when
      accessing the main log buffer in NMI"). The purpose was to enforce
      storing messages directly into the ring buffer even in NMI context.
      It really should have only modified the behavior in NMI context.
      There is no need for a special behavior any longer. All messages are
      always stored directly now. The console deferring is handled
      transparently in vprintk().
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      [pmladek@suse.com: Remove special handling in ftrace.c completely.]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
  19. Jul 19, 2021
    • printk: Userspace format indexing support · 33701557
      Chris Down authored
      We have a number of systems industry-wide that have a subset of their
      functionality that works as follows:
      
      1. Receive a message from local kmsg, serial console, or netconsole;
      2. Apply a set of rules to classify the message;
      3. Do something based on this classification (like scheduling a
         remediation for the machine), rinse, and repeat.
      
      As a couple of examples of places we have this implemented just inside
      Facebook, although this isn't a Facebook-specific problem, we have this
      inside our netconsole processing (for alarm classification), and as part
      of our machine health checking. We use these messages to determine
      fairly important metrics around production health, and it's important
      that we get them right.
      
      While for some kinds of issues we have counters, tracepoints, or metrics
      with a stable interface which can reliably indicate the issue, in order
      to react to production issues quickly we need to work with the interface
      which most kernel developers naturally ...
  20. Jul 17, 2021
  21. Jul 08, 2021
  22. Jul 01, 2021
    • init: print out unknown kernel parameters · 86d1919a
      Andrew Halaney authored
      It is easy to foobar setting a kernel parameter on the command line
      without realizing it; by default there's not much output that you can
      use to assess what the kernel did with that parameter.
      
      Make it a little more explicit which parameters on the command line
      _looked_ like a valid parameter for the kernel, but did not match anything
      and ultimately got tossed to init.  This is very similar to the unknown
      parameter message received when loading a module.
      
      This assumes the parameters are processed in a normal fashion; some
      parameters (dyndbg= for example) don't register their parameter with the
      rest of the kernel's parameters, and therefore always show up in this list
      (and are also given to init - like the rest of this list).
      
      Another example is BOOT_IMAGE=, which is highlighted as an offender. It
      technically is one, but it is passed by LILO and GRUB, so most systems
      will see that complaint.
      
      An example output where "foobared" and "unrecognized" are intentionally
      invalid parameters:
      
        Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.12-dirty debug log_buf_len=4M foobared unrecognized=foo
        Unknown command line parameters: foobared BOOT_IMAGE=/boot/vmlinuz-5.12-dirty unrecognized=foo
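
      The classification behind that output can be approximated in userspace
      C. This is a rough sketch, not the kernel's parser: the known_params
      table and the functions is_known(), collect() and unknown_params() are
      hypothetical, quoting is not handled, and listing bare words before
      name=value pairs is an assumption chosen only to match the order seen
      in the example output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of parameter names this sketch treats as known. */
static const char *const known_params[] = { "debug", "log_buf_len" };

/* Return 1 if the first name_len bytes of tok match a known name. */
static int is_known(const char *tok, size_t name_len)
{
	size_t i;

	for (i = 0; i < sizeof(known_params) / sizeof(known_params[0]); i++) {
		if (strlen(known_params[i]) == name_len &&
		    strncmp(known_params[i], tok, name_len) == 0)
			return 1;
	}
	return 0;
}

/*
 * Append unknown tokens of cmdline to out, separated by spaces.
 * want_value selects name=value tokens (1) or bare words (0).
 */
static void collect(const char *cmdline, int want_value,
		    char *out, size_t out_size)
{
	const char *p = cmdline;

	while (*p) {
		size_t len = strcspn(p, " ");
		const char *eq = memchr(p, '=', len);
		size_t name_len = eq ? (size_t)(eq - p) : len;

		if (len && (eq != NULL) == want_value &&
		    !is_known(p, name_len)) {
			size_t used = strlen(out);

			/* Room for separator, token and terminating '\0'. */
			if (used + len + 2 <= out_size) {
				if (used)
					out[used++] = ' ';
				memcpy(out + used, p, len);
				out[used + len] = '\0';
			}
		}
		p += len;
		while (*p == ' ')
			p++;
	}
}

/* Build the list of unknown parameters: bare words, then name=value. */
static void unknown_params(const char *cmdline, char *out, size_t out_size)
{
	assert(out != NULL && out_size > 0);
	out[0] = '\0';
	collect(cmdline, 0, out, out_size);
	collect(cmdline, 1, out, out_size);
}
```

      Fed the example command line above, with only "debug" and
      "log_buf_len" treated as known, the sketch produces the same list as
      the "Unknown command line parameters" line.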
      
      Link: https://lkml.kernel.org/r/20210511211009.42259-1-ahalaney@redhat.com
      
      Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Suggested-by: Borislav Petkov <bp@suse.de>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. Jun 22, 2021
  24. Jun 18, 2021
  25. Jun 10, 2021