The latest Xilinx SoC products combine a set of heterogeneous hardware designs into one powerful and flexible platform that includes Arm Cortex-Ax, Cortex-R5 and Xilinx MicroBlaze processors. The OpenAMP project enables a distributed software architecture across this asymmetric multiprocessing (AMP) platform.
This notebook example shows how the Cortex-A Application Processing Unit (APU) can launch an application on the Cortex-R Realtime Processing Unit (RPU). The APU subsystem running Linux is the designated master responsible for managing the life cycle of the RPU. The APU uses the remoteproc framework of OpenAMP to load, start, and stop the RPU application. RPU applications must be written in accordance with the OpenAMP application requirements; see the Libmetal and OpenAMP User Guide (UG1186).
These examples use the PetaLinux pre-built images. The pre-built kernel images, device tree blobs (e.g. system.dtb), root file system archives and other files are in the images directory of your PetaLinux project, for example pre-built/linux/images.
Aside from the master OS (e.g.: Linux) support, the demos require two executables:
RPU binary in /lib/firmware | APU client executable
---|---
image_echo_test | /usr/bin/echo_test
image_matrix_multiply | /usr/bin/mat_mul_demo
In [ ]:
! date; uname -a; id; pwd; ls -l /lib/firmware/i* /usr/bin/{echo_test,mat_mul_demo,proxy_app}
In [ ]:
! modprobe virtio_ring
! modprobe zynqmp_r5_remoteproc
! modprobe virtio_rpmsg_bus
! modprobe rpmsg_char
! modprobe virtio
! lsmod | tail
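As a complement to eyeballing the `lsmod` output, the check can be scripted from the notebook's Python kernel. The sketch below is only an illustration: the helper name `missing_modules` and the `REQUIRED` tuple are hypothetical, and the list simply mirrors the `modprobe` calls above. It parses text in the `/proc/modules` format, where the first field of each line is the module name.

```python
# Module names taken from the modprobe calls in the previous cell.
REQUIRED = (
    "virtio",
    "virtio_ring",
    "virtio_rpmsg_bus",
    "rpmsg_char",
    "zynqmp_r5_remoteproc",
)


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required module names that do not appear in the given text."""
    # The first whitespace-separated field of each non-empty line is the name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]
```

On the target this could be called as `missing_modules(open("/proc/modules").read())`. Drivers compiled into the kernel are not listed in `/proc/modules`, so a name reported here is not necessarily unavailable.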
The sysfs filesystem is enabled in the pre-built kernel, which makes it easy to use the remoteproc driver from the Linux shell:
1. Write stop to /sys/class/remoteproc/remoteproc0/state to stop the RPU if it was running.
2. Write the firmware name to /sys/class/remoteproc/remoteproc0/firmware. Use only the basename of the firmware from the /lib/firmware directory.
In [ ]:
! grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! [[ "offline" != $(</sys/class/remoteproc/remoteproc0/state) ]] && echo stop > /sys/class/remoteproc/remoteproc0/state
! echo image_echo_test >/sys/class/remoteproc/remoteproc0/firmware
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
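The same stop-then-select sequence can also be driven from the notebook's Python kernel instead of the shell. The following is a minimal sketch, not part of the pre-built images: the helper name `select_firmware` and its `base` parameter are hypothetical, and it assumes the sysfs layout used in this notebook and enough privileges to write to those files.

```python
from pathlib import Path

# Assumed location of the first remoteproc instance, as used in this notebook.
REMOTEPROC0 = Path("/sys/class/remoteproc/remoteproc0")


def select_firmware(name, base=REMOTEPROC0):
    """Stop the remote processor if needed, then record the firmware basename."""
    base = Path(base)
    state_file = base / "state"
    firmware_file = base / "firmware"

    # remoteproc resolves the name relative to /lib/firmware, so reject paths.
    if "/" in name:
        raise ValueError("use only the basename of a file in /lib/firmware")

    # Any state other than "offline" means the RPU has to be stopped first.
    if state_file.read_text().strip() != "offline":
        state_file.write_text("stop")

    firmware_file.write_text(name)
    return state_file.read_text().strip(), firmware_file.read_text().strip()
```

A call such as `select_firmware("image_echo_test")` returns the state and firmware values read back after the writes, which is the same information the `grep` lines in the cell above print.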
At this point the APU is the master running Linux and the RPU is the remote in standby or powered down state. To start executing the firmware recorded in /sys/class/remoteproc/remoteproc0/firmware, we write the word start to /sys/class/remoteproc/remoteproc0/state. This triggers remoteproc to load the firmware and start the RPU:
In [ ]:
! grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! echo start > /sys/class/remoteproc/remoteproc0/state
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'; /bin/dmesg | tail
The dmesg output should have a line indicating that the RPU is running, e.g.:
remote processor r5@0 is now up
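Looking for that line can be automated. The sketch below is an illustration only: the function name `find_started_processors` is hypothetical, and the pattern is derived from the single example line above, so other kernel versions may word the message differently.

```python
import re

# Matches messages such as "remote processor r5@0 is now up". The processor
# name is captured loosely because it differs between boards and kernels.
UP_LINE = re.compile(r"remote processor (\S+) is now up")


def find_started_processors(dmesg_text):
    """Return the names of remote processors reported as up in dmesg output."""
    return UP_LINE.findall(dmesg_text)
```

The text to scan could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. An empty list means no such message was found in the text that was passed in.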
After the previous step the RPU is running the echo server /lib/firmware/image_echo_test. The echo client is a Linux Cortex-Ax binary /usr/bin/echo_test. It sends a number of payloads from the APU master to the remote RPU and verifies that they match the replies from the echo server on the RPU. Both the client and the server use the Linux kernel RPMsg module to send and receive data.
In [ ]:
! /usr/bin/echo_test > /tmp/echo_test.out
! head /tmp/echo_test.out; echo === skipping; tail /tmp/echo_test.out
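The send-and-compare logic that the echo client performs can be sketched in a few lines. This is not the source of /usr/bin/echo_test: the names `run_echo_test` and `loopback` are hypothetical, the payload sizes are arbitrary, and the RPMsg transport is replaced by a plain Python callable so that the example runs anywhere.

```python
def run_echo_test(transport, payloads):
    """Send each payload through `transport` and count replies that differ."""
    mismatches = 0
    for payload in payloads:
        reply = transport(payload)
        if reply != payload:
            mismatches += 1
    return mismatches


def loopback(payload):
    """Stand-in for the RPU echo server: returns the payload unchanged."""
    return payload


# Payloads of increasing size; the sizes are chosen only for illustration.
payloads = [bytes(range(size)) for size in (1, 16, 64, 255)]
print(run_echo_test(loopback, payloads))  # prints 0: a perfect echo has no mismatches
```

Keeping the transport as a parameter means the comparison loop stays the same whether the replies come from a stub like this one or from a real channel to the RPU.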
In this demo the remote on the RPU runs a simple matrix multiplication server. The algorithm is a direct implementation of the matrix multiplication definition, which has $\Theta \left( n^{3} \right)$ complexity for multiplying two $n \times n$ matrices.
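For reference, the definition computes each entry as $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, and three nested loops over $i$, $j$ and $k$ give the $\Theta \left( n^{3} \right)$ cost. The sketch below shows that textbook algorithm in Python; it is not the RPU firmware source, and the function name `mat_mul` is chosen here for illustration.

```python
def mat_mul(a, b):
    """Multiply two n x n matrices given as lists of rows."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):          # row of the result
        for j in range(n):      # column of the result
            for k in range(n):  # index of the summed products
                c[i][j] += a[i][k] * b[k][j]
    return c


print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```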
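The RPU setup steps:
1. Stop the RPU, if it is running, by writing stop to /sys/class/remoteproc/remoteproc0/state.
2. Select the new firmware by writing image_matrix_multiply to /sys/class/remoteproc/remoteproc0/firmware.
3. Start the RPU by writing start to /sys/class/remoteproc/remoteproc0/state.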
In [ ]:
! grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'
! [[ "offline" != $(</sys/class/remoteproc/remoteproc0/state) ]] && echo stop > /sys/class/remoteproc/remoteproc0/state
! echo image_matrix_multiply >/sys/class/remoteproc/remoteproc0/firmware
! echo start >/sys/class/remoteproc/remoteproc0/state
! echo === before and after ===; grep . /sys/class/remoteproc/remoteproc0/{state,firmware} |tr : '\t'; /bin/dmesg | tail
After the previous step the RPU is running the matrix multiplication server /lib/firmware/image_matrix_multiply. The client is a Linux Cortex-Ax binary /usr/bin/mat_mul_demo. It generates two matrices and sends them to the RPU. The server on the RPU calculates the result and sends it back to the APU client. The client prints the result to its stdout. Both the client and the server use the Linux kernel RPMsg module to send and receive data.
In [ ]:
! /usr/bin/mat_mul_demo