The latest Xilinx SoC products combine heterogeneous hardware into one powerful and flexible platform that includes Arm Cortex-Ax, Cortex-R5 and Xilinx MicroBlaze processors. The OpenAMP project enables a distributed software architecture across this asymmetric multiprocessing (AMP) platform.


Implementation details

This notebook example shows how the Cortex-A Application Processing Unit (APU) can launch an application on the Cortex-R Realtime Processing Unit (RPU). The APU subsystem, running Linux, is the designated master responsible for managing the life cycle of the RPU. The APU uses the remoteproc framework of OpenAMP to load, start, and stop the RPU application. RPU applications must be written in accordance with the OpenAMP application requirements; see the Libmetal and OpenAMP User Guide (UG1186).

Setup

These examples use the PetaLinux pre-built images. The pre-built kernel images, device tree blobs (e.g. system.dtb), root file system archives and other files are in the images directory of your PetaLinux project, for example pre-built/linux/images.

In addition to support in the master OS (e.g. Linux), the demos require two executables:

  1. The RPU application (firmware) is a Cortex-R binary that acts as the offloading server.
  2. The APU application is a Linux executable, i.e. a Cortex-Ax binary, that acts as the client.

| RPU binary in /lib/firmware | APU client executable |
| --- | --- |
| image_echo_test | /usr/bin/echo_test |
| image_matrix_multiply | /usr/bin/mat_mul_demo |

In [ ]:
! date; uname -a; id; pwd; ls -l /lib/firmware/i* /usr/bin/{echo_test,mat_mul_demo,proxy_app}
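
The same file check can be expressed as a small Python helper. This is only a sketch: the function name `missing_files` and the `root` parameter are hypothetical and exist so the check can be tried against a scratch directory; the paths themselves are the ones listed above.

```python
from pathlib import Path

# Paths of the two RPU firmware images and the two APU clients named above,
# written relative to the file system root.
REQUIRED = (
    "lib/firmware/image_echo_test",
    "lib/firmware/image_matrix_multiply",
    "usr/bin/echo_test",
    "usr/bin/mat_mul_demo",
)


def missing_files(root="/"):
    """Return the required paths that are not regular files under root.

    The root parameter is an assumption made for this sketch; on the
    target board the default "/" is the natural choice.
    """
    base = Path(root)
    return [name for name in REQUIRED if not (base / name).is_file()]
```

Calling `missing_files()` on the board should return an empty list when all four files are present.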

Load Kernel modules

The demos use the remoteproc Linux kernel module to load the firmware and to start and stop the RPU. We also load the RPMsg module set, including VirtIO.


In [ ]:
! modprobe virtio_ring
! modprobe zynqmp_r5_remoteproc
! modprobe virtio_rpmsg_bus
! modprobe rpmsg_char
! modprobe virtio
! lsmod | tail
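
As a complement to reading the `lsmod` output by eye, the following sketch checks the text of /proc/modules for the five modules loaded above. The function name `missing_modules` is hypothetical. Note the assumption that the drivers are built as loadable modules: anything compiled into the kernel does not appear in /proc/modules and would be reported as missing here.

```python
# Module names taken from the modprobe commands above.
WANTED = (
    "virtio",
    "virtio_ring",
    "virtio_rpmsg_bus",
    "rpmsg_char",
    "zynqmp_r5_remoteproc",
)


def missing_modules(proc_modules_text, wanted=WANTED):
    """Return the wanted module names absent from /proc/modules text.

    Each non-empty line of /proc/modules starts with the module name,
    so only the first whitespace-separated field is inspected.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in wanted if name not in loaded]
```

On the board this could be used as `missing_modules(open("/proc/modules").read())`.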

Add firmware name

The sysfs filesystem is enabled in the pre-built kernel and it makes it easy to use the remoteproc driver from the Linux shell.

  1. Use the sysfs entry /sys/class/remoteproc/remoteproc0/state to stop the RPU if it was running.
  2. Add the name of the RPU firmware via /sys/class/remoteproc/remoteproc0/firmware. Use only the basename of the firmware from the /lib/firmware directory.

In [ ]:
! grep .      /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! [[ "offline" != $(</sys/class/remoteproc/remoteproc0/state) ]] && echo stop > /sys/class/remoteproc/remoteproc0/state
! echo image_echo_test >/sys/class/remoteproc/remoteproc0/firmware
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
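
The two steps can also be wrapped in a Python function, which is convenient when the same sequence is repeated for several firmware images. This is a minimal sketch, not part of the original demo: the name `select_firmware` and the `rproc` parameter are hypothetical, and the default path assumes the RPU is exposed as remoteproc0, as in the cell above.

```python
from pathlib import Path


def select_firmware(name, rproc="/sys/class/remoteproc/remoteproc0"):
    """Stop the remote processor if needed, then record the firmware name.

    Returns the state that was read before any change was made.
    """
    if "/" in name:
        # Only the basename relative to /lib/firmware is accepted.
        raise ValueError("use only the basename of the firmware file")
    base = Path(rproc)
    state = (base / "state").read_text().strip()
    if state != "offline":
        # Same effect as: echo stop > .../state
        (base / "state").write_text("stop\n")
    # Same effect as: echo <name> > .../firmware
    (base / "firmware").write_text(name + "\n")
    return state
```

For example, `select_firmware("image_echo_test")` mirrors the shell commands in the cell above.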

Demo: Echo Test

At this point the APU is the master running Linux and the RPU is the remote, in a standby or powered-down state. To start executing the firmware recorded in /sys/class/remoteproc/remoteproc0/firmware, write the word start to /sys/class/remoteproc/remoteproc0/state. This triggers the following sequence:

  1. The Linux kernel on the master loads the RPU's firmware into memory based on configuration in the firmware.
  2. The master starts the RPU and waits for it to initialize.
  3. The master is notified when initialization is complete and the RPU is running.

In [ ]:
! grep .       /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! echo start > /sys/class/remoteproc/remoteproc0/state
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'; /bin/dmesg | tail

The dmesg output should have a line indicating that the RPU is running, e.g.: remote processor r5@0 is now up
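
Besides looking at dmesg, the state file itself can be checked after the start request. The sketch below writes start and then polls the state file until it reads running or a timeout expires. The function name `start_and_wait`, the timeout, and the polling interval are all assumptions made for illustration; on a real system the write normally returns only after the kernel has finished the boot attempt, so the loop is mainly a safeguard.

```python
import time
from pathlib import Path


def start_and_wait(rproc="/sys/class/remoteproc/remoteproc0",
                   timeout=5.0, poll=0.1):
    """Request a start and wait until the state file reads "running".

    Returns True if the running state was observed before the timeout,
    False otherwise. timeout and poll are in seconds (hypothetical defaults).
    """
    state_file = Path(rproc) / "state"
    # Same effect as: echo start > .../state
    state_file.write_text("start\n")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if state_file.read_text().strip() == "running":
            return True
        time.sleep(poll)
    return False
```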

After the previous step the RPU is running the echo server /lib/firmware/image_echo_test. The echo client is a Linux Cortex-Ax binary /usr/bin/echo_test. It sends a number of payloads from the APU master to the remote RPU and verifies that they match the replies from the echo server on the RPU. Both the client and the server use the Linux kernel RPMsg module to send and receive data.


In [ ]:
! /usr/bin/echo_test > /tmp/echo_test.out
! head /tmp/echo_test.out; echo === skipping; tail /tmp/echo_test.out
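
The check that the echo client performs, sending payloads and comparing each reply with what was sent, can be pictured with the following sketch. It is not the source of /usr/bin/echo_test: the function `run_echo_check`, the `send_and_receive` callable that stands in for the RPMsg transport, and the payload pattern are all hypothetical.

```python
def run_echo_check(send_and_receive, payload_sizes):
    """Send one payload per size and return the sizes whose reply differed.

    send_and_receive is a stand-in for the real transport: it takes the
    bytes to send and returns the bytes received from the echo server.
    """
    failures = []
    for size in payload_sizes:
        # A simple repeating byte pattern; the real client may use another.
        payload = bytes(i % 256 for i in range(size))
        reply = send_and_receive(payload)
        if reply != payload:
            failures.append(size)
    return failures
```

With a loopback transport such as `lambda data: data`, the function returns an empty list, which corresponds to a fully successful echo test.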

Demo: Matrix Multiplication

In this demo the remote on the RPU runs a simple matrix multiplication server. The algorithm is a direct implementation of the definition of matrix multiplication, which takes $\Theta \left( n^{3} \right)$ operations to multiply two $n \times n$ matrices.
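
For reference, the definition is $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, and computing each of the $n^{2}$ entries takes $n$ multiplications. The sketch below shows the resulting triple loop in Python. It illustrates the algorithm only and is not the RPU firmware source; the function name `mat_mul` is hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices, given as lists of rows, by the definition."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    if any(len(row) != cols for row in b):
        raise ValueError("rows of b must all have the same length")
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):            # each row of a
        for j in range(cols):        # each column of b
            for k in range(inner):   # dot product of row i and column j
                result[i][j] += a[i][k] * b[k][j]
    return result
```

For two $n \times n$ inputs the three nested loops each run $n$ times, which gives the cubic cost stated above.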

The RPU setup steps:

  1. Stop the RPU if it was running by writing stop to /sys/class/remoteproc/remoteproc0/state
  2. Tell remoteproc the name of the RPU firmware, i.e. image_matrix_multiply
  3. Start the RPU using remoteproc via sysfs

In [ ]:
! grep .      /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! [[ "offline" != $(</sys/class/remoteproc/remoteproc0/state) ]] && echo stop > /sys/class/remoteproc/remoteproc0/state
! echo image_matrix_multiply >/sys/class/remoteproc/remoteproc0/firmware
! echo start                 >/sys/class/remoteproc/remoteproc0/state
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'; /bin/dmesg | tail

Run matrix multiplication client

After the previous step the RPU is running the matrix multiplication server /lib/firmware/image_matrix_multiply. The client is a Linux Cortex-Ax binary /usr/bin/mat_mul_demo. It generates two matrices and sends them to the RPU. The server on the RPU calculates the product and sends it back to the APU client. The client prints the result to its stdout. Both the client and the server use the Linux kernel RPMsg module to send and receive data.


In [ ]:
! /usr/bin/mat_mul_demo