Commits · 429bdfce22369d5cb3903ef80374687874047e47 · Seth Nowlininlow / Linux

Apr 27, 2020

Merge branch 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc into rpmsg-ti-linux-5.4.y · 429bdfce

Suman Anna authored 4 years ago

Pull in the updated remoteproc feature branch that adds the base
support to the PRU remoteproc driver for supporting the enhanced
ICSSG IP in AM65x SR2.0 SoCs. The merge also includes some cleanup
to the PRU remoteproc driver to introduce device match data so as
to move away from runtime of_device_is_compatible() usage in code.

* 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc:
  remoteproc/pru: Add support for Tx PRU cores on K3 AM65x SR2.0 SoCs
  dt-bindings: remoteproc: pru: Update bindings for K3 AM65x SR2.0 SoCs
  remoteproc/pru: Cleanup of_device_is_compatible() usage

Signed-off-by: Suman Anna <s-anna@ti.com>

429bdfce

remoteproc/pru: Add support for Tx PRU cores on K3 AM65x SR2.0 SoCs · a10cd3df

Suman Anna authored 4 years ago


The AM65x SR2.0 SoCs have a revised ICSSG IP that is based off the
subsequent IP revision used on J721E SoCs. This IP instance has two
new custom auxiliary PRU cores called Transmit PRUs (Tx_PRUs) in
addition to the existing PRUs and RTUs.

The Tx_PRU cores have their own dedicated IRAM (smaller than a PRU
or RTU), Control and debug feature sets. The RTU and Tx_PRU cores
though share the same Data RAMs as the PRU cores, so the memories
have to be partitioned carefully between different applications.

Enhance the existing PRU remoteproc driver to support these new Tx
PRU cores by using specific compatibles.

Signed-off-by: Suman Anna <s-anna@ti.com>

a10cd3df

dt-bindings: remoteproc: pru: Update bindings for K3 AM65x SR2.0 SoCs · 6bed5754

Suman Anna authored 4 years ago

The AM65x SR2.0 SoCs have a revised ICSSG IP that is based off the
subsequent IP revision used on J721E SoCs, yet retaining some of the
features from AM65x SR1.0 like the PRU IRAM size etc. The ICSSG IP
on K3 AM65x SR2.0 SoCs have two new custom auxiliary PRU cores called
Transmit PRUs (Tx_PRUs) in addition to the existing PRUs and RTUs.

Update the PRU remoteproc bindings for these Tx PRU cores.

Signed-off-by: Suman Anna <s-anna@ti.com>

6bed5754

remoteproc/pru: Cleanup of_device_is_compatible() usage · 11192acc

Suman Anna authored 4 years ago


The PRU remoteproc driver uses the of_device_is_compatible() function
during probe to dynamically assign some flags and properties for each
PRU core. This usage is not recommended and makes the code a bit
cumbersome. Cleanup most of this usage by using device compatible
match data. The check for K2G to conditionally avoid the mailbox
usage is the only check left-out.
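
The match-data approach described above can be illustrated with a small
model. This is only a sketch in Python, not the driver's code: the
compatible strings and the fields of the data record are hypothetical
placeholders. It shows the per-core properties being resolved once from a
table instead of through repeated compatibility checks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CoreMatchData:
    """Per-compatible properties resolved once at probe time (hypothetical fields)."""
    core_type: str
    is_k3: bool


# Hypothetical compatible strings; the real driver's table will differ.
MATCH_TABLE = {
    "vendor,soc-a-pru": CoreMatchData(core_type="PRU", is_k3=False),
    "vendor,soc-b-pru": CoreMatchData(core_type="PRU", is_k3=True),
    "vendor,soc-b-rtu": CoreMatchData(core_type="RTU", is_k3=True),
    "vendor,soc-b-tx-pru": CoreMatchData(core_type="TX_PRU", is_k3=True),
}


def get_match_data(compatibles: Sequence[str]) -> Optional[CoreMatchData]:
    """Return the data for the first compatible string found in the table."""
    for compatible in compatibles:
        data = MATCH_TABLE.get(compatible)
        if data is not None:
            return data
    return None
```

With such a table, adding a new core type means adding one entry rather
than another conditional branch in the probe path.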

Signed-off-by: Suman Anna <s-anna@ti.com>

11192acc

Feb 23, 2020

ti_config_fragments: rpmsg: Enable OMAP remoteproc watchdog support · bc5bab0b

Suman Anna authored 5 years ago


Enable the Watchdog timer support for the OMAP remoteproc driver. This
enables the OMAP remoteproc driver to recover any remote processor
executions stuck in a loop. This also mandates that all the IPU
and DSP firmwares to be used with the OMAP remoteproc driver support
the Watchdog timers and pet the corresponding dmtimers to avoid
a recurring reboot of that processor.

Signed-off-by: Suman Anna <s-anna@ti.com>

bc5bab0b

Merge branch 'rpmsg-linux-5.4.y' of git://git.ti.com/rpmsg/rpmsg into rpmsg-ti-linux-5.4.y · 2a47401e

Suman Anna authored 5 years ago

Pull in the updated rpmsg base feature branch that adds the support
to the rpmsg-proto driver for handling error recovery with active open
sockets.

The merge also includes a bug fix in the virtio rpmsg core to fix
various mutex circular/recursive dependency issues in rpmsg-rpc and
rpmsg-proto drivers during error recovery.

* 'rpmsg-linux-5.4.y' of git://git.ti.com/rpmsg/rpmsg:
  rpmsg: fix lockdep warnings in virtio rpmsg bus driver
  net/rpmsg: unblock reader threads operating on errored sockets
  net/rpmsg: return ENOLINK upon Rx on errored sockets
  net/rpmsg: return ESHUTDOWN upon Tx on errored sockets
  net/rpmsg: add support to handle a remote processor error recovery

Signed-off-by: Suman Anna <s-anna@ti.com>

2a47401e

Merge branch 'rpmsg-linux-5.4.y' of git://git.ti.com/rpmsg/rpmsg into rpmsg-ti-linux-5.4.y · 25abd953

Suman Anna authored 5 years ago

Pull in the updated rpmsg base feature branch that includes fixes to the
rpmsg-rpc driver for various bugs and memory leak issues found during
error recovery with active open applications.

* 'rpmsg-linux-5.4.y' of git://git.ti.com/rpmsg/rpmsg:
  rpmsg: rpc: fix potential memory leak of unprocessed skbs
  rpmsg: rpc: fix ept memory leak during recovery
  rpmsg: rpc: use the local device pointer in all file operations
  rpmsg: rpc: maintain a reference device pointer per open fd
  rpmsg: rpc: fix sysfs entry creation failures during recovery

Signed-off-by: Suman Anna <s-anna@ti.com>

25abd953

Merge branch 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc into rpmsg-ti-linux-5.4.y · 0036d749

Suman Anna authored 5 years ago

Pull in the remoteproc feature branch that adds the support for error recovery
from both internal exceptions and Watchdog errors on all IPU and DSP remote
processors on OMAP4+ SoCs. The merge also adds a debugfs last trace feature
to remoteproc core. Also included is a fix for a memory leak issue in virtio
ring core, and a minor optimization in the remoteproc debugfs 'recovery' file;
and fixes for some state-machine issues related to MMU faults with OMAP
remoteproc driver.

NOTE: DSP error recovery is going through fine, but the DSP is not booting
currently. This will be fixed up in a future patch series.

* 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc:
  ARM: dts: dra7-ipu-dsp-common: Add watchdog timers to IPU and DSP nodes
  ARM: dts: omap5-uevm: Add watchdog timers for IPU and DSP
  ARM: dts: omap4-panda-common: Add watchdog timers for IPU and DSP
  remoteproc/omap: Add watchdog functionality for remote processors
  remoteproc: Fix multiple back-to-back error recoveries
  remoteproc/omap: Report device exceptions and trigger recovery
  remoteproc: implement last trace for remoteproc
  remoteproc: debugfs: Optimize the trace va lookup
  virtio_ring: Fix mem leak with vring_new_virtqueue()

Signed-off-by: Suman Anna <s-anna@ti.com>

0036d749

Feb 22, 2020

ARM: dts: dra7-ipu-dsp-common: Add watchdog timers to IPU and DSP nodes · 87728f35

Suman Anna authored 5 years ago


The watchdog timer information has been added to all the IPU and DSP
remote processor device nodes in the DRA7xx/AM57xx SoC families. The
data has been added to the two common dra7-ipu-dsp-common and
dra74-ipu-dsp-common dtsi files that can be included by all the
desired board files. The following timers are chosen as the watchdog
timers, as per the usage on the current firmware images:
        IPU2: GPTimers 4 & 9 (one for each Cortex-M4 core)
        IPU1: GPTimers 7 & 8 (one for each Cortex-M4 core)
        DSP1: GPTimer 10
        DSP2: GPTimer 13

Each of the IPUs has two Cortex-M4 processors and so uses a timer
each for providing watchdog support on that processor irrespective of
whether the IPU is running in SMP-mode or non-SMP mode. The chosen
timers also need to be unique from the ones used by other processors
(regular timers or watchdog timers) so that they can be supported
simultaneously.

The MPU-side drivers will use this data to initialize the watchdog
timer(s), and listen for any watchdog triggers. The BIOS-side code on
these processors needs to configure/refresh the corresponding timer
properly to not throw a watchdog error.

The watchdog timers are optional in general, but are mandatory to
be added to support watchdog error recovery on a particular processor.
These timers can be changed or removed as per the system integration
needs, alongside appropriate equivalent changes on the firmware side.

Signed-off-by: Angela Stegmaier <angelabaker@ti.com>
Signed-off-by: Suman Anna <s-anna@ti.com>

87728f35

ARM: dts: omap5-uevm: Add watchdog timers for IPU and DSP · 649c3e11

Suman Anna authored 5 years ago


The watchdog timers have been added for the IPU and DSP remoteproc
devices for the OMAP5 uEVM board. The following timers (same as the
timers on OMAP4 Panda boards) are used as the watchdog timers,
        DSP : GPT6
        IPU : GPT9 & GPT11 (one for each Cortex-M4 core)

The MPU-side drivers will use this data to initialize the watchdog
timers, and listen for any watchdog triggers. The BIOS-side code
needs to configure and refresh these timers properly to not throw
a watchdog error.

These timers can be changed or removed as per the system integration
needs, alongside appropriate equivalent changes on the firmware side.

Signed-off-by: Suman Anna <s-anna@ti.com>

649c3e11

ARM: dts: omap4-panda-common: Add watchdog timers for IPU and DSP · f2336b3e

Suman Anna authored 5 years ago


The watchdog timers have been added for the IPU and DSP remoteproc
devices on all the OMAP4-based Panda boards. The following timers
are used as the watchdog timers,
	DSP : GPT6
	IPU : GPT9 & GPT11 (one for each Cortex-M3 core)

The MPU-side drivers will use this data to initialize the watchdog
timers, and listen for any watchdog triggers. The BIOS-side code
needs to configure and refresh these timers properly to not throw
a watchdog error.

These timers can be changed or removed as per the system integration
needs, alongside appropriate equivalent changes on the firmware side.

Signed-off-by: Suman Anna <s-anna@ti.com>

f2336b3e

remoteproc/omap: Add watchdog functionality for remote processors · 4cf0e716

Suman Anna authored 5 years ago

Remote processors can be stuck in a loop, and may not be recoverable
if they do not have a built-in watchdog. The watchdog implementation
for OMAP remote processors uses external gptimers that can be used
to interrupt both the Linux host as well as the remote processor.

Each remote processor is responsible for refreshing the timer during
normal behavior - during OS task scheduling or entering the idle loop
properly. During a watchdog condition (executing a tight loop causing
no scheduling), the host processor gets interrupts and schedules a
recovery for the corresponding remote processor. The remote processor
may also get interrupted to be able to print a back trace.
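
The refresh-or-expire behavior described above can be modeled roughly as
follows. This is a toy Python sketch under simplifying assumptions: time
advances in abstract ticks, and the class and method names are
hypothetical rather than taken from the driver.

```python
class WatchdogTimer:
    """Toy model of a timer that the remote firmware must refresh periodically."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.recovery_scheduled = False

    def refresh(self) -> None:
        """Called by the remote side during normal scheduling or idle entry."""
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        """Advance time by one tick; on expiry the host schedules a recovery."""
        if self.recovery_scheduled:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.recovery_scheduled = True
```

A firmware that keeps calling refresh() never lets the timer expire, while
one stuck in a tight loop stops refreshing and the host side eventually
schedules a recovery.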

A menuconfig option has also been added to enable/disable the Watchdog
functionality, with the default as disabled.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>

4cf0e716

remoteproc: Fix multiple back-to-back error recoveries · b2d47d76

Suman Anna authored 8 years ago

The remoteproc core uses a dedicated work item per remote processor
to perform an error recovery of that processor. This work item is
always scheduled upon notification of an error at the moment. The
error recovery process itself is performed when the workqueue gets
scheduled and executes the work function, but an error recovery
needs to be performed only once if there are multiple notifications
while an existing error recovery is in progress. Fix this by adding
a check to make sure the remote processor error recovery work item
is not already running or scheduled.
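
The intent of the check can be sketched with a toy Python model. This is
not the remoteproc core's implementation; the class and its fields are
hypothetical. The only point is that a second error notification
arriving while a recovery is queued or running does not queue another
one.

```python
class RecoveryWork:
    """Toy model of a per-processor recovery work item."""

    def __init__(self) -> None:
        self.busy = False      # True while the work is queued or running
        self.recoveries = 0    # number of recoveries actually performed

    def report_crash(self) -> bool:
        """Queue a recovery unless one is already queued or running."""
        if self.busy:
            return False
        self.busy = True
        return True

    def run(self) -> None:
        """Execute the queued recovery, then allow new ones to be queued."""
        if not self.busy:
            return
        self.recoveries += 1
        self.busy = False
```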

This fixes an issue with error recovery upon MMU Faults on OMAP
IPU remote processors. An MMU fault on OMAP IPU sends two error
notifications - one a direct interrupt from the MMU, and the second
a mailbox-based crash notification after the remote processor has
collected some backtrace. The mailbox based interrupt mechanism,
added in commit 7f275d21 ("remoteproc/omap: report device
exceptions and trigger recovery"), is used for Attribute MMU (AMMU)
faults and other internal exceptions on the IPU. The backtrace
collection on the IPU remote processor is triggered by the same
interrupt which cannot be differentiated between an MMU fault and
an AMMU fault. The remoteproc core changes in the 4.9 kernel around the
boot sequences have now caused the second notification to trigger a
secondary error recovery, which was avoided in previous kernels due
to the event detection in the work-function itself. The newer code
sequence changes the timing w.r.t. previous kernels where the
recovery process was performed a bit later due to the asynchronous
firmware loading.

Signed-off-by: Suman Anna <s-anna@ti.com>

b2d47d76

remoteproc/omap: Report device exceptions and trigger recovery · bf4c29a7

Suman Anna authored 5 years ago


The OMAP remote processors send a special mailbox message
(RP_MBOX_CRASH) when they crash and detect an internal device
exception.

Add support to the mailbox handling function upon detection of
this special message to report this crash to the remoteproc core.
The remoteproc core can trigger a recovery using the prevailing
recovery mechanism, already in use for MMU Fault recovery.

Co-developed-by: Subramaniam Chanderashekarapuram <subramaniam.ca@ti.com>
Signed-off-by: Subramaniam Chanderashekarapuram <subramaniam.ca@ti.com>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

bf4c29a7

remoteproc: implement last trace for remoteproc · 8604575b

Suman Anna authored 5 years ago


The last trace is a way of preserving the remoteproc traces past
remoteproc recovery. This is achieved by creating a new traceY_last
debugfs entry during a crash for each trace entry, and copying the
trace buffer contents into the corresponding last trace entry. This
copied contents can then be read out using a debugfs entry. All the
trace entries are cleaned up during the resource cleanup phase when
shutting down a remoteproc. The design assumes that the same firmware
is being used for the error recovery.

Eg:
	cat <debugfs_root>/remoteproc/remoteprocX/traceY_last
should give the traces that were printed out just before the recovery
happened on remoteproc X for trace Y.
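
A small userspace helper for reading such an entry might look like the
following Python sketch. The function names are hypothetical, and the
debugfs root is passed in by the caller (it is commonly mounted at
/sys/kernel/debug, but that is an assumption about the target system).

```python
from pathlib import Path


def last_trace_path(debugfs_root: str, rproc_index: int, trace_index: int) -> Path:
    """Build the path of the traceY_last entry for remoteproc X."""
    return (
        Path(debugfs_root)
        / "remoteproc"
        / f"remoteproc{rproc_index}"
        / f"trace{trace_index}_last"
    )


def read_last_trace(debugfs_root: str, rproc_index: int, trace_index: int) -> str:
    """Return the preserved trace text; raises FileNotFoundError if no crash occurred."""
    path = last_trace_path(debugfs_root, rproc_index, trace_index)
    # Trace buffers may contain non-UTF-8 bytes, so replace undecodable ones.
    return path.read_text(errors="replace")
```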

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Subramaniam Chanderashekarapuram <subramaniam.ca@ti.com>

8604575b

remoteproc: debugfs: Optimize the trace va lookup · e0c87ffe

Suman Anna authored 5 years ago

The commit a987e6b9 ("remoteproc: fix trace buffer va initialization")
computes the trace va given the trace device address every time a debugfs
read is done. This can be optimized to perform the lookup only the first
time, and store the value in the va field of the embedded rproc_mem_entry
structure for the debug trace. This restores how the trace va was stored
prior to the above commit in the original trace resource entry handling.
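
The optimization amounts to a look-up-once, cache-afterwards pattern. A
minimal Python sketch of that idea follows; the class is a hypothetical
stand-in for the memory entry, and the translate callback represents the
device-address-to-va lookup.

```python
from typing import Callable, Optional


class TraceEntry:
    """Toy stand-in for a trace memory entry with a device address and cached va."""

    def __init__(self, device_address: int, translate: Callable[[int], int]) -> None:
        self.device_address = device_address
        self._translate = translate      # the (comparatively expensive) da -> va lookup
        self.va: Optional[int] = None    # filled in on the first read

    def get_va(self) -> int:
        """Translate on the first call only; later calls reuse the stored va."""
        if self.va is None:
            self.va = self._translate(self.device_address)
        return self.va
```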

Signed-off-by: Suman Anna <s-anna@ti.com>

e0c87ffe

virtio_ring: Fix mem leak with vring_new_virtqueue() · 30909e24

Suman Anna authored 5 years ago

The functions vring_new_virtqueue() and __vring_new_virtqueue() are used
with split rings, and any allocations within these functions are managed
outside of the .we_own_ring flag. The commit cbeedb72 ("virtio_ring:
allocate desc state for split ring separately") allocates the desc state
within the __vring_new_virtqueue() but frees it only when the .we_own_ring
flag is set. This leads to a memory leak when freeing such allocated
virtqueues with the vring_del_virtqueue() function.

Fix this by moving the desc_state free code outside the flag and only
for split rings. Issue was discovered during testing with remoteproc
and virtio_rpmsg.

Fixes: cbeedb72 ("virtio_ring: allocate desc state for split ring separately")
Signed-off-by: Suman Anna <s-anna@ti.com>

30909e24

rpmsg: fix lockdep warnings in virtio rpmsg bus driver · b0f0e6a3

Angela Stegmaier authored 5 years ago


The virtio rpmsg bus framework uses endpoints as the basis for
sending and receiving messages to/from a remote processor. Each
rpmsg bus device will have a primary endpoint if the corresponding
rpmsg bus driver supports a callback, and secondary child endpoints
associated with the same rpmsg bus device. The life-cycle of these
endpoints is tied to the corresponding rpmsg device. A virtio rpmsg
bus device can also have its own endpoint for supporting name service
announcements from a corresponding remote processor to create and
delete rpmsg devices dynamically.

Each endpoint has a callback lock associated with it to provide
protection/mutual exclusion between threads that process incoming
rpmsg messages and threads that want to delete the endpoint. The
virtio rpmsg name service endpoint callback will run while holding
its ept->cb_lock to create/delete rpmsg devices for RPMSG_NS_CREATE
and RPMSG_NS_DESTROY messages respectively. The latter message
processing will destroy the requested channel, and will ultimately
result in all the secondary rpmsg device endpoints also being
destroyed. The ept->cb_lock for the channel's endpoint is also
locked during its destruction while setting the callback to NULL.
This results in a seemingly nested locking of the ept->cb_lock even
though the locking is on different mutexes. This will result in a
false warning from the lockdep validator when it is enabled because
the lockdep deals with classes and both are the same class, although
they are different instances.

Similar circular dependency scenarios also exist with remoteproc
error recovery and existing rpmsg drivers - rpmsg_rpc and rpmsg_proto.

These issues are fixed by replacing the existing mutex_lock() calls
with the mutex_lock_nested() API variation and using different
subclasses for the NameService end-point and for the rpmsg channel
device end-points.
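
Why distinct subclasses silence the false warning can be shown with a
deliberately simplified Python model. It is not how lockdep is
implemented; it only captures the idea that the validator keys on the
lock class plus subclass rather than on the lock instance. The subclass
values used in the usage note are arbitrary.

```python
class ToyLockdep:
    """Very simplified model: flags re-acquiring an already-held (class, subclass)."""

    def __init__(self) -> None:
        self.held = []  # keys held by one task, in acquisition order

    def acquire(self, lock_class: str, subclass: int = 0) -> bool:
        """Return True if the acquisition is clean, False if a warning would fire."""
        key = (lock_class, subclass)
        clean = key not in self.held
        self.held.append(key)
        return clean

    def release(self, lock_class: str, subclass: int = 0) -> None:
        """Drop one held key for the given class and subclass."""
        self.held.remove((lock_class, subclass))
```

In this model, taking "ept->cb_lock" twice with the default subclass is
flagged even when the two locks are different instances, whereas taking
it once with subclass 1 and once with subclass 0 is not.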

Following are example warning signatures that get fixed by this patch:

1. Recursive locking dependency during RPMSG_NS_DESTROY message processing
 =============================================
 WARNING: possible recursive locking detected
 ---------------------------------------------
 kworker/0:0/1069 is trying to acquire lock:
 e37914c0 (&ept->cb_lock){+.+.}, at: __rpmsg_destroy_ept+0x40/0x6c [virtio_rpmsg_bus]

 but task is already holding lock:
 e3d14bc0 (&ept->cb_lock){+.+.}, at: rpmsg_recv_done+0x6c/0x34c [virtio_rpmsg_bus]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&ept->cb_lock);
   lock(&ept->cb_lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by kworker/0:0/1069:
  #0: e700b6a0 ((wq_completion)events){+.+.}, at: process_one_work+0x1f4/0x824
  #1: e2175f24 ((work_completion)(&mq->work)){+.+.}, at: process_one_work+0x1f4/0x824
  #2: e3d14bc0 (&ept->cb_lock){+.+.}, at: rpmsg_recv_done+0x6c/0x34c [virtio_rpmsg_bus]
  #3: e335e0bc (&dev->mutex){....}, at: device_release_driver_internal+0x18/0x1bc

2. Circular locking dependency during error recovery of rpmsg-rpc driver
 ======================================================
 WARNING: possible circular locking dependency detected
 -------------------------------------------------------
 kworker/0:4/1068 is trying to acquire lock:
 e3e2b2c0 (&ept->cb_lock){+.+.}, at: __rpmsg_destroy_ept+0x40/0x6c [virtio_rpmsg_bus]

 but task is already holding lock:
 e4282380 (&rppcdev->lock){+.+.}, at: rppc_remove+0x88/0x238 [rpmsg_rpc]

 which lock already depends on the new lock.

 other info that might help us debug this:

 Chain exists of:
  &ept->cb_lock --> rppc_devices_lock --> &rppcdev->lock

  Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&rppcdev->lock);
                                lock(rppc_devices_lock);
                                lock(&rppcdev->lock);
   lock(&ept->cb_lock);

  *** DEADLOCK ***

 7 locks held by kworker/0:4/1068:
  #0: e700b6a0 ((wq_completion)events){+.+.}, at: process_one_work+0x1f4/0x824
  #1: e2289f24 ((work_completion)(&rproc->crash_handler)){+.+.}, at: process_one_work+0x1f4/0x824
  #2: e47ebb1c (&rproc->lock){+.+.}, at: rproc_trigger_recovery+0x44/0x460
  #3: e3e064e4 (&dev->mutex){....}, at: device_release_driver_internal+0x18/0x1bc
  #4: e37bd4bc (&dev->mutex){....}, at: device_release_driver_internal+0x18/0x1bc
  #5: bf019034 (rppc_devices_lock){+.+.}, at: rppc_remove+0x3c/0x238 [rpmsg_rpc]
  #6: e4282380 (&rppcdev->lock){+.+.}, at: rppc_remove+0x88/0x238 [rpmsg_rpc]

3. Circular locking dependency during error recovery of rpmsg-proto driver
 ======================================================
 WARNING: possible circular locking dependency detected
 -------------------------------------------------------
 kworker/0:1/21 is trying to acquire lock:
 e47869c0 (&ept->cb_lock){+.+.}, at: __rpmsg_destroy_ept+0x40/0x6c [virtio_rpmsg_bus]

 but task is already holding lock:
 bf027034 (rpmsg_channels_lock){+.+.}, at: rpmsg_proto_remove+0x28/0x16c [rpmsg_proto]

 which lock already depends on the new lock.

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(rpmsg_channels_lock);
                                lock(&ept->cb_lock);
                                lock(rpmsg_channels_lock);
   lock(&ept->cb_lock);

  *** DEADLOCK ***

 6 locks held by kworker/0:1/21:
  #0: e700b6a0 ((wq_completion)events){+.+.}, at: process_one_work+0x1f4/0x824
  #1: e6057f24 ((work_completion)(&rproc->crash_handler)){+.+.}, at: process_one_work+0x1f4/0x824
  #2: e3884b1c (&rproc->lock){+.+.}, at: rproc_trigger_recovery+0x44/0x460
  #3: e20a88e4 (&dev->mutex){....}, at: device_release_driver_internal+0x18/0x1bc
  #4: e39ae0bc (&dev->mutex){....}, at: device_release_driver_internal+0x18/0x1bc
  #5: bf027034 (rpmsg_channels_lock){+.+.}, at: rpmsg_proto_remove+0x28/0x16c [rpmsg_proto]

Signed-off-by: Angela Stegmaier <angelabaker@ti.com>
[s-anna@ti.com: flip the subclass values, update crash log examples for 5.4]
Signed-off-by: Suman Anna <s-anna@ti.com>

b0f0e6a3

net/rpmsg: unblock reader threads operating on errored sockets · fdffa8cc

Suman Anna authored 6 years ago


The rpmsg_proto driver is used to provide a socket interface
to userspace under the AF_RPMSG address family, and is used
by the TI IPC MessageQ stack. The typical usage for receiving
messages includes a thread blocked on a select() call with
appropriate socket fds, followed by a recvfrom() on the fd
returned/marked ready by select().

The rpmsg_sock_poll() function implements the logic needed
by the select() call, and marks a socket ready only when there
is data to be read currently. Any reader thread waiting on the
select() call to return is currently not unblocked when a remote
processor goes through an error recovery, and can remain blocked
forever as its remote processor peer thread may never send it
another message. Enhance the rpmsg_proto driver so that a waiting
thread can be unblocked by waking it up during the process of
marking the open sockets with the error status RPMSG_ERROR. This
is achieved by using the socket's .sk_error_report() ops, and is
preferred over the .sk_state_change() ops to wake up only a single
exclusive thread.

Signed-off-by: Suman Anna <s-anna@ti.com>

fdffa8cc

net/rpmsg: return ENOLINK upon Rx on errored sockets · f798b9b8

Suman Anna authored 8 years ago

The rpmsg_proto driver is used to provide a socket interface to
userspace under the AF_RPMSG address family, and is used by the TI
IPC MessageQ stack. The rpmsg proto driver creates a rpmsg endpoint
per remote processor (a Rx socket) for each MessageQ object through
the socket's bind() call. These rpmsg endpoints are associated with
a published parent rpmsg device from that remote processor. These
endpoints are cleaned up normally either when the userspace program
/ application closes them or through the automatic cleanup of the
file descriptors when a process is terminated/closed. These endpoints
can also be cleaned up by the rpmsg_proto driver as part of the error
recovery of a remote processor, during the removal of their parent
rpmsg device, with the corresponding Rx sockets simply marked with
the error status RPMSG_ERROR.

This error status is not currently being returned to the userspace
in the socket's recvfrom() interface. Fix this by specifically
checking for this error status, and returning an error value of
ENOLINK back to userspace. The ENOLINK error code is used to allow
the userspace to differentiate this terminal error from other errors
on the Rx sockets and take appropriate action. This error code on
Rx sockets serves the same purpose as the error code ESHUTDOWN used for Tx
sockets, and is chosen specifically to have a meaningful strerror
message appropriate to Rx sockets.
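
From the userspace side, an application could tell these terminal errors
apart from other failures roughly as in the Python sketch below. The
function name and the returned labels are hypothetical, and no AF_RPMSG
socket is opened here; only the errno handling is illustrated.

```python
import errno


def classify_rpmsg_socket_error(exc: OSError) -> str:
    """Map an OSError from a socket call to a coarse recovery decision."""
    if exc.errno == errno.ENOLINK:
        # Rx socket whose remote processor went through error recovery.
        return "rx-peer-recovered"
    if exc.errno == errno.ESHUTDOWN:
        # Tx socket whose remote processor went through error recovery.
        return "tx-peer-recovered"
    return "other"
```

An application would typically close and recreate the affected sockets
on either of the first two results, and handle anything else as an
ordinary socket error.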

Signed-off-by: Suman Anna <s-anna@ti.com>

f798b9b8

net/rpmsg: return ESHUTDOWN upon Tx on errored sockets · 44be0b7c

Suman Anna authored 10 years ago


The rpmsg proto driver uses a single rpmsg channel device
published from a remote processor to transmit all socket-based
messages intended for that remote processor. This channel will
be auto-removed and recreated if the remote processor goes
through an error recovery process. Any connected sockets are
marked with an error status, and further transmissions on these
connected sockets should gracefully return an error. This error
condition is specifically checked for and a new error ESHUTDOWN
is returned back to userspace to differentiate it from
transmissions on an unconnected socket.

Signed-off-by: Suman Anna <s-anna@ti.com>

44be0b7c

net/rpmsg: add support to handle a remote processor error recovery · 3779cbac

Suman Anna authored 8 years ago


The rpmsg_proto driver is used to provide a socket interface to
userspace under the AF_RPMSG address family, and is used by the
TI IPC MessageQ stack. The rpmsg proto driver uses a single rpmsg
channel device published from a remote processor to transmit and
receive all socket-based messages to/from that remote processor.
There can be any number of Tx and Rx sockets associated with each
remote processor's rpmsg device. This rpmsg channel device will be
auto-removed and recreated if the associated remote processor goes
through an error recovery process. Any existing open sockets (both
Tx and Rx) are oblivious if the underlying rpmsg channel has been
removed, and any further operations on such sockets can create
various kernel crashes due to invalid pointer dereferences.

This patch adds the error recovery support to the rpmsg-proto driver.
This is achieved by using the private field of the published rpmsg
channel device's endpoint (ept->priv) to maintain a list of all the
connected and bound sockets, and setting a new error status
(RPMSG_ERROR) on all these open sockets when the associated parent
rpmsg device is removed. This new error status allows the driver
to check for a valid state of a socket before performing any actions
on it thereby preventing any kernel crashes. The status is also used
to allow the userspace to perform appropriate cleanup and/or recovery
steps.

The logic is asymmetric because of the slight difference between the
Rx and Tx sockets. All the Tx sockets use the one-time published
rpmsg_channel devices for transmissions and just need to be marked
with the error status, while each of the Rx sockets has its own
derivative rpmsg endpoint, which must also be removed alongside the
removal of the associated rpmsg channel device. The
sockets themselves are freed up anytime either by the userspace
closing them or through an automatic close when the process is
terminated/closed.
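
The asymmetric handling can be sketched with a toy Python model. The
class names, fields and state strings are hypothetical and only mirror
the description above: every tracked socket is marked with the error
status, and Rx sockets additionally lose their own endpoint.

```python
from dataclasses import dataclass, field
from typing import List

RPMSG_OPEN = "open"      # hypothetical state labels for this model
RPMSG_ERROR = "error"


@dataclass
class ToySocket:
    is_rx: bool
    has_own_endpoint: bool = False   # set True for Rx sockets after bind
    state: str = RPMSG_OPEN


@dataclass
class ToyChannel:
    """Stand-in for the published channel tracking its open sockets."""
    sockets: List[ToySocket] = field(default_factory=list)

    def remove(self) -> None:
        """Mark every tracked socket as errored; tear down Rx endpoints too."""
        for sock in self.sockets:
            sock.state = RPMSG_ERROR
            if sock.is_rx:
                sock.has_own_endpoint = False
        # The sockets themselves stay alive until userspace closes them.
        self.sockets.clear()


def is_usable(sock: ToySocket) -> bool:
    """The state check a driver operation would perform before touching a socket."""
    return sock.state != RPMSG_ERROR
```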

Signed-off-by: Suman Anna <s-anna@ti.com>

3779cbac

Feb 21, 2020

Merge branch 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc into rpmsg-ti-linux-5.4.y · acca4625

Suman Anna authored 5 years ago

Pull in the updated remoteproc feature branch that includes a fix in
the omap-prm reset driver for a "scheduling while atomic" bug seen
with IPU and DSP remoteprocs on OMAP4+ SoCs when CONFIG_PREEMPT is
enabled. OMAP IOMMU unit-tests are not affected though, and the
scheduler warning is also not seen with regular omap2plus_defconfig
where CONFIG_PREEMPT is disabled by default.

* 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc:
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one

Signed-off-by: Suman Anna <s-anna@ti.com>

acca4625

Merge branch 'topic/5.4/ti-sysc-prm-reset' of... · 12f10220

Suman Anna authored 5 years ago

Merge branch 'topic/5.4/ti-sysc-prm-reset' of git://git.ti.com/rpmsg/remoteproc into rproc-linux-5.4.y

Merge in the ti-sysc-prm-reset topic branch into the base remoteproc
feature branch to pull in a minor fix in the omap-prm reset driver
for a "scheduling while atomic" bug seen with IPU and DSP remoteprocs
on OMAP4+ SoCs when CONFIG_PREEMPT is enabled.

* 'topic/5.4/ti-sysc-prm-reset' of git://git.ti.com/rpmsg/remoteproc:
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one

Signed-off-by: Suman Anna <s-anna@ti.com>

12f10220

Feb 20, 2020

soc: ti: omap-prm: use atomic iopoll instead of sleeping one · 3fa5ab4c

Tero Kristo authored 5 years ago


The reset handling APIs for omap-prm can be invoked from PM runtime, which
runs in atomic context. For this to work properly, switch to the atomic
iopoll version instead of the current one, which can sleep. Otherwise,
this throws a "BUG: scheduling while atomic" warning. Issue is seen
rather easily when CONFIG_PREEMPT is enabled.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Suman Anna <s-anna@ti.com>

3fa5ab4c

Merge branch 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc into rpmsg-ti-linux-5.4.y · 2bd4e58b

Suman Anna authored 5 years ago

Pull in the updated remoteproc feature branch that adds support
for system suspend/resume and runtime auto-suspend/resume on
the IPU and DSP remote processors on OMAP4, OMAP5 and DRA7 SoCs. The
dependent OMAP IOMMU and OMAP mailbox drivers have the suspend/resume
support infrastructure already in place in upstream 5.4 kernel.

* 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc:
  remoteproc: Fix sysfs interface to stop a suspended processor
  remoteproc/omap: add support for runtime auto-suspend/resume
  remoteproc/omap: add support for system suspend/resume
  dt-bindings: remoteproc: omap: Update ti,autosuspend-delay-ms description

Signed-off-by: Suman Anna <s-anna@ti.com>

2bd4e58b

Feb 19, 2020

rpmsg: rpc: fix potential memory leak of unprocessed skbs · 30e402a7

Suman Anna authored 5 years ago


A user thread sends a request for a remote function execution
on the remote processor through a write() fop. All the responses
from the remote service are queued using allocated skbs in the
driver's rpmsg callback. The allocated skbs are processed and
freed in a read() fop. An error recovery causes a blocked user
thread to bail out immediately and any in-flight queued skbs
are left unprocessed. These in-flight skbs are never freed and
can result in a memory leak.

Fix the memory leak by checking for the presence of any of these
unprocessed skbs in the read queue, and freeing them during the
file descriptor's release() function. This also ensures no memory
is leaked for user applications with bugs and not using matching
write() and read() fops.
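
The fix follows a drain-on-release pattern, modeled below in Python. The
class is a hypothetical stand-in for the per-open-file state, and a
counter takes the place of actually freeing buffers.

```python
from collections import deque


class ToyRpcFile:
    """Toy per-open-file state with a queue of unread responses."""

    def __init__(self) -> None:
        self.read_queue = deque()
        self.freed = 0  # counts buffers that have been released

    def on_response(self, payload: bytes) -> None:
        """Callback path: queue a response for a later read()."""
        self.read_queue.append(payload)

    def read(self) -> bytes:
        """Consume one queued response and account for its buffer as freed."""
        payload = self.read_queue.popleft()
        self.freed += 1
        return payload

    def release(self) -> int:
        """Free anything still queued when the file is closed; return the count."""
        dropped = 0
        while self.read_queue:
            self.read_queue.popleft()
            dropped += 1
        self.freed += dropped
        return dropped
```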

Signed-off-by: Suman Anna <s-anna@ti.com>

30e402a7

rpmsg: rpc: fix ept memory leak during recovery · 4614841a

Suman Anna authored 10 years ago

The rpmsg-rpc driver exposes a character device for each remote
service (a rpmsg-rpc device) providing a bunch of remote execution
functions. An endpoint is created in the open() fops, and forms the
source end-point of a dedicated communication channel to allow an
application to send and receive remote function execution commands/
responses on this service. This endpoint address is a child of the
parent virtio device to which the rpmsg-rpc device belongs. The
virtio devices are deleted and recreated during a remoteproc crash
and recovery process. The associated child endpoints are not deleted
at present during recovery, and the corresponding release() cannot
delete the end-points if it happens after a recovery as the parent
rpmsg-rpc device has already been removed, thereby resulting in a
memory leak during recovery amid active usage.

Fix this by deleting all the epts associated with the parent virtio
device of the corresponding rpmsg-rpc device. This is done during the
rpmsg-rpc driver's .remove() which is invoked during the deletion of
the virtio device.

Signed-off-by: Suman Anna <s-anna@ti.com>

4614841a

rpmsg: rpc: use the local device pointer in all file operations · 83f7a0b8

Suman Anna authored 10 years ago


The remote processor recovery process includes the deletion and
recreation of an rpmsg-rpc device. The representative rppc_device
structure is retained and reused if there are any open applications
using the exposed character device. The underlying device pointer
for a rppc_device is, however, deleted and recreated, and can become
NULL at any point if an error recovery happens. So, switch to using
the local reference device pointer in all the fop functions for
the exposed character device.

Signed-off-by: Suman Anna <s-anna@ti.com>

83f7a0b8

rpmsg: rpc: maintain a reference device pointer per open fd · a429ebe6

Suman Anna authored 10 years ago

The remote processor recovery process includes the deletion and
recreation of an rpmsg-rpc device. The representative rppc_device
structure is retained and reused if there are any open applications
using the exposed character device. The underlying device pointer
for a rppc_device is, however, deleted and recreated asynchronously
to any of the operations on the exposed character device. A reference
to this device pointer is to be maintained therefore for each open
application so that it can be used during regular fops and until the
file descriptor is closed instead of referencing the rppc_device's
dev pointer, which can become NULL at any point due to a recovery
process. The actual memory of the rppc_device's dev pointer deleted
in the driver's .remove() is freed when all the open applications
have closed either gracefully or forcefully. Any new applications
after a recovery will leverage a newly created device pointer.

Signed-off-by: Suman Anna <s-anna@ti.com>

a429ebe6
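
The per-open-fd reference described in the commit above can be sketched in plain C as follows. This is a simplified model under stated assumptions, not the driver's implementation: `dev`, `service`, `open_file` and all functions are hypothetical stand-ins for the kernel's refcounted struct device, the retained rppc_device, and the per-file private data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted device object (struct device in the kernel). */
struct dev {
    int refs;
};

static int live_devs; /* device objects allocated and not yet freed */

int dev_live_count(void)
{
    return live_devs;
}

struct dev *dev_create(void)
{
    struct dev *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    d->refs = 1;
    live_devs++;
    return d;
}

struct dev *dev_get(struct dev *d)
{
    if (d)
        d->refs++;
    return d;
}

/* Drop one reference; the memory is freed only when the last one goes. */
void dev_put(struct dev *d)
{
    if (d && --d->refs == 0) {
        free(d);
        live_devs--;
    }
}

/* Long-lived service object that is retained and reused across recoveries. */
struct service {
    struct dev *dev; /* becomes NULL between remove and reprobe */
};

/* Per-open-file state carrying its own device reference. */
struct open_file {
    struct dev *dev;
};

int service_probe(struct service *s)
{
    s->dev = dev_create();
    return s->dev ? 0 : -1;
}

void service_remove(struct service *s)
{
    dev_put(s->dev);
    s->dev = NULL;
}

int file_open(struct service *s, struct open_file *f)
{
    if (!s->dev)
        return -1;
    f->dev = dev_get(s->dev); /* local reference used by all later fops */
    return 0;
}

void file_release(struct open_file *f)
{
    dev_put(f->dev);
    f->dev = NULL;
}
```

In this model a file opened before a recovery keeps using its own `f->dev` safely after `service_remove()`, while files opened after `service_probe()` pick up the newly created device.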

rpmsg: rpc: fix sysfs entry creation failures during recovery · cca6a3d2

Suman Anna authored 10 years ago

The rpmsg-rpc driver exposes a character device for each remote
service (a rpmsg-rpc device) providing a bunch of remote execution
functions. The remote service can be running on any of the available
remote processors, and the supported functions are published as
different sysfs entries on that particular device. These rpmsg-rpc
devices are deleted and recreated as part of the reboot of the remote
processor during an error recovery. The sysfs entries are also deleted
and recreated. The current logic retains the associated rppc_device
structure and the underlying device pointer if there are any
applications actively using the character device at the time of the
rpmsg-rpc device removal, and reuses it upon the reprobe of the same
rpmsg-rpc device. The creation of the sysfs entries fails with -ENOENT
due to an invalid reference to a non-existing parent object. This was
first exposed in the 3.14 kernel by the repartitioning of the core
sysfs code into the new common kernfs code.

Fix this by deleting the underlying device pointer in the driver's
.remove, and recreating it with the appropriate new rpmsg server
device as its parent in the driver's .probe function. A name
description field is also added to the representative rppc_device
structure for looking up the service on reprobe as the device name
cannot be used due to the deletion of the device pointer.

Signed-off-by: Suman Anna <s-anna@ti.com>

cca6a3d2

Feb 18, 2020

remoteproc: Fix sysfs interface to stop a suspended processor · d7367474

Keerthy authored 8 years ago

Commit 2aefbef0 ("remoteproc: Add a sysfs interface for
firmware and state") has added an interface to be able to stop
a remote processor, change the firmware and start the remote
processor using the new firmware through the sysfs files 'state'
and 'firmware'. Any firmware change requires the processor to be
in a stopped state. The logic in 'stop' checks for a valid state
(RPROC_RUNNING) before a processor can be stopped. A booted remote
processor though can also be in RPROC_SUSPENDED state if the driver
controlling the device supports runtime auto-suspend, and any
attempt to stop such a processor throws an error,
"write error: Invalid argument".

It should be possible to stop a processor that is in suspended
state using the sysfs entry, as this is a perfectly functional
scenario when either removing the module, or unbinding the device
from the driver. Fix the sysfs logic to permit the same.

Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Suman Anna <s-anna@ti.com>

d7367474
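
The state check described in the commit above can be modelled roughly as below. This is a userspace sketch under assumptions: the enum values are local stand-ins for the RPROC_RUNNING and RPROC_SUSPENDED states named in the message, `state_store_sketch()` is a hypothetical function, and the handling of "start" is an assumption added only to make the example complete.

```c
#include <errno.h>
#include <string.h>

/* Local stand-ins for the processor states named in the commit message. */
enum rproc_state_sketch {
    SKETCH_OFFLINE,
    SKETCH_SUSPENDED,
    SKETCH_RUNNING,
};

/*
 * Hypothetical handler for a write to the 'state' file.
 * Returns 0 on success or a negative errno value.
 */
int state_store_sketch(enum rproc_state_sketch *state, const char *buf)
{
    if (strcmp(buf, "stop") == 0) {
        /* After the fix: a suspended processor may be stopped too. */
        if (*state != SKETCH_RUNNING && *state != SKETCH_SUSPENDED)
            return -EINVAL;
        *state = SKETCH_OFFLINE;
        return 0;
    }

    if (strcmp(buf, "start") == 0) {
        /* Assumption: only an offline processor can be started. */
        if (*state != SKETCH_OFFLINE)
            return -EBUSY;
        *state = SKETCH_RUNNING;
        return 0;
    }

    return -EINVAL; /* surfaces as "write error: Invalid argument" */
}
```

Before the fix, the equivalent of the "stop" branch accepted only the running state, which is why stopping an auto-suspended processor failed with an invalid-argument error.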

remoteproc/omap: add support for runtime auto-suspend/resume · bbb7ce05

Suman Anna authored 5 years ago

This patch enhances the PM support in the OMAP remoteproc driver to
support the runtime auto-suspend. A remoteproc may not be required to
be running all the time, and typically will need to be active only
during certain usecases. As such, to save power, it should be turned
off during potential long periods of inactivity between usecases.
This suspend and resume of the device is a relatively heavy process
in terms of latencies, so a remoteproc should be suspended only after
a certain period of prolonged inactivity. The OMAP remoteproc driver
leverages the runtime pm framework's auto_suspend feature to accomplish
this functionality. This feature is automatically enabled when a remote
processor has successfully booted. The 'autosuspend_delay_ms' for each
device dictates the inactivity period/time to wait for before
suspending the device.

The runtime auto-suspend design relies on marking the last busy time
on every communication (virtqueue kick) to and from the remote processor.
When there has been no activity for 'autosuspend_delay_ms' time, the
runtime PM framework invokes the driver's runtime pm suspend callback
to suspend the device. The remote processor will be woken up on the
initiation of the next communication message through the runtime pm
resume callback. The current auto-suspend design also allows a remote
processor to deny an auto-suspend attempt, if it wishes to, by sending a
NACK response to the initial suspend request message sent to the remote
processor as part of the suspend process. The auto-suspend request is
also only attempted if the remote processor is idled and in standby at
the time of inactivity timer expiry. This choice is made to avoid
unnecessary messaging, and the auto-suspend is simply rescheduled to
be attempted again after a further lapse of autosuspend_delay_ms.

The runtime pm callbacks functionality in this patch reuses most of the
core logic from the suspend/resume support code, and makes use of an
additional auto_suspend flag to differentiate the logic in common code
from system suspend. The system suspend/resume sequences are also updated
to reflect the proper pm_runtime statuses, and also to really perform a
suspend/resume only if the remoteproc has not been auto-suspended at the
time of request. The remote processor is left in suspended state on a
system resume if it has been auto-suspended before, and will be woken up
only when a usecase needs to run.

The OMAP remoteproc driver currently uses a default value of 10 seconds
for all OMAP remoteprocs, and a different value can be chosen either by
choosing a positive value for the 'ti,autosuspend-delay-ms' under DT or
by updating the 'autosuspend_delay_ms' field at runtime through the
sysfs interface. A negative value is equivalent to disabling the runtime
suspend.
Eg: To use 25 seconds for IPU2 on DRA7xx,
echo 25000 > /sys/bus/platform/devices/55020000.ipu/power/autosuspend_delay_ms

The runtime suspend feature can also be similarly enabled or disabled by
writing 'auto' or 'on' to the device's 'control' power field. The default
is enabled.
Eg: To disable auto-suspend for IPU2 on DRA7xx SoC,
echo on > /sys/bus/platform/devices/55020000.ipu/power/control

Signed-off-by: Suman Anna <s-anna@ti.com>
[t-kristo@ti.com: converted to use ti-sysc instead of hwmod]
Signed-off-by: Tero Kristo <t-kristo@ti.com>

bbb7ce05
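
The auto-suspend decision flow described in the commit above can be sketched as a small state model. This is an illustrative approximation, not the driver's code: `rproc_pm`, `rproc_kick()` and `rproc_try_autosuspend()` are hypothetical names, times are plain millisecond counters, and leaving the last-busy time untouched after a NACK is an assumption.

```c
#include <stdbool.h>

enum pm_result {
    PM_DISABLED,    /* negative delay: auto-suspend is off */
    PM_NOT_DUE,     /* inactivity period has not elapsed yet */
    PM_RESCHEDULED, /* processor not in standby, try again later */
    PM_DENIED,      /* remote processor NACKed the suspend request */
    PM_SUSPENDED,   /* processor is suspended */
};

struct rproc_pm {
    long delay_ms;     /* inactivity period; negative disables the feature */
    long last_busy_ms; /* time of the last virtqueue kick */
    bool suspended;
};

/* Every communication marks the processor busy and wakes it if needed. */
void rproc_kick(struct rproc_pm *pm, long now_ms)
{
    pm->suspended = false;
    pm->last_busy_ms = now_ms;
}

/* Called when the inactivity timer fires. */
enum pm_result rproc_try_autosuspend(struct rproc_pm *pm, long now_ms,
                                     bool in_standby, bool remote_acks)
{
    if (pm->delay_ms < 0)
        return PM_DISABLED;
    if (pm->suspended)
        return PM_SUSPENDED;
    if (now_ms - pm->last_busy_ms < pm->delay_ms)
        return PM_NOT_DUE;
    if (!in_standby) {
        /* Skip the messaging and wait another full delay period. */
        pm->last_busy_ms = now_ms;
        return PM_RESCHEDULED;
    }
    if (!remote_acks)
        return PM_DENIED;
    pm->suspended = true;
    return PM_SUSPENDED;
}
```

With the default delay of 10000 ms, a processor that is still busy when the timer expires is simply checked again 10 seconds later, which mirrors the rescheduling behaviour the message describes.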

remoteproc/omap: add support for system suspend/resume · ffd6fe1e

Suman Anna authored 5 years ago

This patch adds the support for system suspend/resume to the
OMAP remoteproc driver so that the OMAP remoteproc devices can
be suspended/resumed during a system suspend/resume. The support
is added through the driver PM .suspend/.resume callbacks, and
requires appropriate support from the OS running on the remote
processors.

The IPU & DSP remote processors typically have their own private
modules like registers, internal memories, caches etc. The context
of these modules needs to be saved and restored properly for a
suspend/resume to work. These are in general not accessible from
the MPU, so the remote processors themselves have to implement
the logic for the context save & restore of these modules.

The OMAP remoteproc driver initiates a suspend by sending a mailbox
message requesting the remote processor to save its context and
enter into an idle/standby state. The remote processor should
usually stop whatever processing it is doing to switch to a context
save mode. The OMAP remoteproc driver detects the completion of
the context save by checking the module standby status for the
remoteproc device. It also stops any resources used by the remote
processors like the timers. The timers need to be running only
when the processor is active and executing, and need to be stopped
otherwise to allow the timer driver to reach low-power states. The
IOMMUs are automatically suspended by the PM core during the late
suspend stage, after the remoteproc suspend process is completed by
putting the remote processor cores into reset. Thereafter, the Linux
kernel can put the domain into lower power states where possible.

The resume sequence undoes the operations performed in the PM suspend
callback, by starting the timers and finally releasing the processors
from reset. This requires that the remote processor side OS be able to
distinguish a power-resume boot from a power-on/cold boot, restore the
context of its private modules saved during the suspend phase, and
resume executing code from where it was suspended. The IOMMUs would
have been resumed by the PM core during early resume, so they are
already enabled by the time remoteproc resume callback gets invoked.

The remote processors should save their context into System RAM (DDR),
as any internal memories are not guaranteed to retain context as it
depends on the lowest power domain that the remote processor device
is put into. The DDR contents will be managed by the Linux kernel.

Signed-off-by: Suman Anna <s-anna@ti.com>
[t-kristo@ti.com: converted to use ti-sysc instead of hwmod]
Signed-off-by: Tero Kristo <t-kristo@ti.com>

ffd6fe1e
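
The ordering of the suspend and resume steps in the commit above can be summarised with the following sketch. It is a simplified model, not the driver's code: the `rproc_hw` fields and all function names are hypothetical, the mailbox exchange is reduced to a flag, and the error codes are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of the state touched during suspend and resume. */
struct rproc_hw {
    bool fw_supports_suspend; /* remote OS implements context save */
    bool context_saved;       /* context written out to DDR */
    bool in_standby;          /* module standby status */
    bool timers_running;
    bool in_reset;
};

/* Stand-in for the mailbox message asking the remote side to save context. */
static void send_suspend_request(struct rproc_hw *hw)
{
    if (hw->fw_supports_suspend) {
        hw->context_saved = true;
        hw->in_standby = true;
    }
}

int rproc_suspend_sketch(struct rproc_hw *hw)
{
    send_suspend_request(hw);
    if (!hw->in_standby)
        return -EBUSY;          /* context save never completed */
    hw->timers_running = false; /* timers run only while executing */
    hw->in_reset = true;        /* finally, put the cores into reset */
    return 0;
}

int rproc_resume_sketch(struct rproc_hw *hw)
{
    if (!hw->in_reset)
        return -EINVAL;         /* nothing to undo */
    hw->timers_running = true;  /* undo in reverse: timers first */
    hw->in_reset = false;       /* then release the cores from reset */
    hw->in_standby = false;     /* remote OS restores its saved context */
    return 0;
}
```

Resume mirrors suspend in reverse order, and it relies on the remote firmware recognising a power-resume boot and restoring the context it saved earlier.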

dt-bindings: remoteproc: omap: Update ti,autosuspend-delay-ms description · 4111ff5f

Suman Anna authored 5 years ago


The OMAP remoteproc binding has been updated to add some more details
for the "ti,autosuspend-delay-ms" property that is used for the default
runtime pm autosuspend delay in Linux. This is an optional property
with a default value of 10000 ms, but the runtime auto-suspend can be
disabled for a specific remote processor by default by using a negative
value.

Signed-off-by: Suman Anna <s-anna@ti.com>

4111ff5f

Feb 14, 2020

Merge branch 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc into rpmsg-ti-linux-5.4.y · 6a7e3da5

Suman Anna authored 5 years ago

Pull in the remoteproc feature branch supporting the boot of all DSP and
IPU remote processors on OMAP4, OMAP5 and various DRA7xx/AM57xx SoCs. The
feature branch also pulls in automatically the dependent iommu feature
tree into the rpmsg-ti-linux-5.4.y RPMsg integration branch. OMAP mailbox
is fully upstream in vanilla 5.4 kernel for all OMAP SoCs.

The merge also includes a couple of fixes to the OMAP4, OMAP5 and DRA7
DMTimer nodes to be able to define the timer_sys_ck clock alias that is
used by the OMAP remoteproc driver to set the timer's parent clocks.

The supported functional features in OMAP remoteproc include:
 - Device Tree based support for device-specific carveouts and CMA pools
 - Boot of device-tree based IPU and DSP remoteproc devices all based on
   ti-sysc
 - Internal memory loading support on DSPs
 - BIOS Tick timer support using OMAP DMTimer clocksource code
 - Cleanup of legacy platform device based code
 - Cleanup of legacy hwmod data

Supported platforms include OMAP4 Pandaboard, OMAP5 uEVM, DRA7 EVMs,
DRA76 EVM, both DRA72 rev.B and rev.C EVMs, DRA71 EVM, all AM57xx
BeagleBoard-X15 boards and their derivative boards, AM572x IDK, AM571x
IDK and AM574x IDK boards. The IVA and DSP remote processors will be
running at OPP_NOM clock frequencies by default, and at OPP_HIGH with
the appropriate U-Boot on boards/SoCs that can support them.

* 'rproc-linux-5.4.y' of git://git.ti.com/rpmsg/remoteproc: (54 commits)
  ARM: dts: am571x-idk: Add CMA pools and enable IPUs & DSP1 rprocs
  ARM: dts: am572x-idk-common: Add CMA pools and enable IPU & DSP rprocs
  ARM: dts: beagle-x15-common: Add CMA pools and enable IPU & DSP rprocs
  ARM: dts: dra76-evm: Add CMA pools and enable IPU & DSP rprocs
  ARM: dts: dra71-evm: Add CMA pools and enable IPUs & DSP1 rprocs
  ARM: dts: dra72-evm-revc: Add CMA pools and enable IPUs & DSP1 rprocs
  ARM: dts: dra72-evm: Add CMA pools and enable IPUs & DSP1 rprocs
  ARM: dts: dra7-evm: Add CMA pools and enable IPU & DSP rprocs
  ARM: dts: dra7-ipu-dsp-common: Add timers to IPU and DSP nodes
  ARM: dts: dra7-ipu-dsp-common: Add mailboxes to IPU and DSP nodes
  ARM: dts: dra7-ipu-dsp-common: Move mailboxes into common files
  ARM: dts: dra72x: Add aliases for rproc nodes
  ARM: dts: dra74x: Add aliases for rproc nodes
  ARM: dts: dra74x: Add DSP2 processor device node
  ARM: dts: dra7: Add common IPU and DSP nodes
  ARM: dts: omap5-uevm: Add system timers to DSP and IPU
  ARM: dts: omap5-uevm: Add CMA pools and enable IPU & DSP
  ARM: dts: omap5: Add aliases for rproc nodes
  ARM: dts: omap5: Add DSP and IPU nodes
  ARM: OMAP4: hwmod_data: Remove OMAP4 IPU hwmod data
  ...

Signed-off-by: Suman Anna <s-anna@ti.com>

6a7e3da5

Merge branch 'iommu-linux-5.4.y' of git://git.ti.com/rpmsg/iommu into rproc-linux-5.4.y · b156c300

Suman Anna authored 5 years ago

Merge in the updated iommu feature branch into remoteproc tree to
pull in the necessary support to fix the DRA7 IPU1 boot issue and
a couple of DRA7 DSP idle issues with HW_AUTO setting. The DSP idle
status is achieved across all the DSPs and boards only with the
fix for errata i879.

* 'iommu-linux-5.4.y' of git://git.ti.com/rpmsg/iommu:
  ARM: OMAP2+: Add workaround for DRA7 DSP MStandby errata i879
  ARM: OMAP2+: Extend DRA7 IPU1 MMU pdata quirks to DSP MDMA MMUs
  ARM: OMAP2+: Add IOMMU pdata quirks to fix DRA7 IPU1 boot
  ARM: OMAP2+: omap-iommu.c conversion to ti-sysc

Signed-off-by: Suman Anna <s-anna@ti.com>

b156c300

ARM: dts: am571x-idk: Add CMA pools and enable IPUs & DSP1 rprocs · c4e62778

Suman Anna authored 7 years ago

The CMA reserved memory nodes have been added for both the IPUs and the
DSP1 remoteproc devices on the AM571x IDK board. These nodes are assigned
to the respective rproc device nodes, and both the IPUs and the DSP1
remote processors are enabled for this board.

The current CMA pools and sizes are defined statically for each device.
The addresses chosen are the same as the respective processors on the
DRA72 EVM board to maintain firmware compatibility between the two boards.
The CMA pools and sizes are defined using 64-bit values to support LPAE.
The starting addresses are fixed to meet current dependencies on the
remote processor firmwares, and this will go away when the remote-side
code has been improved to gather this information at runtime during its
initialization.

An associated pair of the rproc node and its CMA node can be disabled
later on if there is no use-case defined to use that remote processor.

Signed-off-by: Suman Anna <s-anna@ti.com>

c4e62778

ARM: dts: am572x-idk-common: Add CMA pools and enable IPU & DSP rprocs · e6f87f1f

Suman Anna authored 7 years ago

The CMA reserved memory nodes have been added for all the IPU and DSP
remoteproc devices in the am572x-idk-common.dtsi file that is common to
both the AM572x and AM574x IDK boards. These nodes are assigned to the
respective rproc device nodes, and all the IPU and DSP remote processors
are enabled.

The current CMA pools and sizes are defined statically for each device.
The addresses chosen are the same as the respective processors on
the AM57xx EVM board to maintain firmware compatibility between the
two boards. The CMA pools and sizes are defined using 64-bit values
to support LPAE. The starting addresses are fixed to meet current
dependencies on the remote processor firmwares, and this will go
away when the remote-side code has been improved to gather this
information at runtime during its initialization.

An associated pair of the rproc node and its CMA node can be disabled
later on if there is no use-case defined to use that remote processor.

Signed-off-by: Suman Anna <s-anna@ti.com>

e6f87f1f

ARM: dts: beagle-x15-common: Add CMA pools and enable IPU & DSP rprocs · 5e68266d

Suman Anna authored 7 years ago

The CMA reserved memory nodes have been added for all the IPU and DSP
remoteproc devices on all the AM57xx BeagleBoard-X15 boards. These nodes
are assigned to the respective rproc device nodes, and all the IPU and
DSP remote processors are enabled for all these boards.

The current CMA pools and sizes are defined statically for each device.
The addresses chosen are the same as the respective processors on the
DRA7 EVM board to maintain firmware compatibility between the two boards.
The CMA pools and sizes are defined using 64-bit values to support LPAE.
The starting addresses are fixed to meet current dependencies on the
remote processor firmwares, and this will go away when the remote-side
code has been improved to gather this information at runtime during its
initialization.

An associated pair of the rproc node and its CMA node can be disabled
later on if there is no use-case defined to use that remote processor.

Signed-off-by: Suman Anna <s-anna@ti.com>

5e68266d
