Output from controller: $\tau$ (e.g. torque at each joint)
Value you want to control: $x$ (e.g. hand position)
Sensors: $x$, $q$, $\dot{q}$ (e.g. hand position, joint angle, joint velocity)
So, we want an algorithm that will give us a $\tau$ value that will cause $x$ to move towards some desired value $x_d$. And we want this to work without having to hard-code information about the motor system itself -- it has to learn that. This means it should be able to adapt when the motor system changes (different loads, motors wearing down, damage, growth, etc.)
Note that we are not going to worry about being optimal! We just want something that'll get us to the target.
To experiment with this system, we'll go with a simple two-joint arm under gravity. The parameters for this physics model were stolen from http://lendek.net/files/papers/sangeetha-fuzzie312.pdf
In [1]:
from arm import Arm
from runner import Runner
from control2 import *
In [3]:
arm = Arm(gravity=1)
runner = Runner('No control', dt=0.0001, display_steps=200)
runner.run(arm, None)
runner.show()
Or we can try that with no joint friction
In [4]:
arm = Arm(gravity=1, b1=0, b2=0)
runner = Runner('No control', dt=0.0001, display_steps=20)
runner.run(arm, None)
runner.show()
Let's start by ridiculously simplifying the problem. Let's say that we have some magic way of getting $q_d$, the desired angles of our joints. Now we just have to apply torque in the right direction to drive $q$ towards $q_d$.
The simplest approach here is to do this control system:
$$ \tau = K_p (q_d - q) $$
This is known as a proportional (P) controller.
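As a rough sketch in code (just an illustration of the formula above, not the actual Control class from control2):

import numpy as np

def p_control(q, q_d, K_p=5.0):
    # Proportional control: torque proportional to the joint-angle error
    return K_p * (np.asarray(q_d) - np.asarray(q))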
In [13]:
arm = Arm(gravity=0)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=5)
runner.run(arm, control)
runner.show()
This seems to work okay, until we turn gravity back on
In [14]:
arm = Arm(gravity=1)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=5)
runner.run(arm, control)
runner.show()
We can try to get it to work harder against gravity by cranking up the gain $K_p$.
In [15]:
arm = Arm(gravity=1)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=500)
runner.run(arm, control)
runner.show()
With a high gain, the arm oscillates around the target. We can add a damping (derivative) term to get a PD controller: $\tau = K_p (q_d - q) + K_v {d \over {dt}} (q_d - q)$.
In [17]:
arm = Arm(gravity=1)
runner = Runner('PD control', dt=0.0001)
control = Control(K_p=500, K_v=1)
runner.run(arm, control)
runner.show()
This helps a bit with the oscillation, but does nothing for the steady-state error due to gravity.
The classic fix for a steady-state error is to add an integral term that accumulates the error over time, giving a PID controller.
In [19]:
arm = Arm(gravity=1)
runner = Runner('PID control', dt=0.0001)
control = Control(K_p=500, K_v=1, K_i=1000)
runner.run(arm, control)
runner.show()
Well, it does work now, in that it eventually reaches the target. But it does so by missing at first and then slowly drifting in as the integral term accumulates.
What if we increase $K_i$?
In [20]:
arm = Arm(gravity=1)
runner = Runner('PID control', dt=0.0001)
control = Control(K_p=500, K_v=1, K_i=10000)
runner.run(arm, control)
runner.show()
Now it gets there faster, but there's insane movement when the target changes.
Is there a more intelligent way to compensate for the effects of gravity?
What if we exactly knew what the effects of gravity were? We could adjust the torque accordingly. For this arm model, the effects of gravity are:
$$ \tau_d = \begin{bmatrix} g (m_1+m_2){l_1 \over 2} cos(q_0) && g (m_2){l_2 \over 2} cos(q_0+q_1) \end{bmatrix} $$
That works great, but it's cheating.
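For reference, a minimal sketch of that gravity computation (this assumes the arm parameters m1, m2, l1, l2 are known; the DynIdeal class used below presumably computes something equivalent):

import numpy as np

def gravity_torque(q, m1, m2, l1, l2, g=9.8):
    # Ideal gravity torques tau_d for the two-joint arm, as in the formula above
    return np.array([
        g * (m1 + m2) * l1 / 2.0 * np.cos(q[0]),
        g * m2 * l2 / 2.0 * np.cos(q[0] + q[1]),
    ])

# The controller then outputs tau = K_p*(q_d - q) - K_v*dq + gravity_torque(q, ...)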
In [21]:
arm = Arm(gravity=1)
runner = Runner('PD control (subtract gravity)', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=DynIdeal())
runner.run(arm, control)
runner.show()
Let's make this less like cheating. Let's say we know the form of the gravity effect, but we don't know the constants.
$$ \tau_d = Y_d \theta_d $$
$ Y_d = \begin{bmatrix} cos(q_0) && 0 \\\ 0 && cos(q_0+q_1)\end{bmatrix}, \theta_d = \begin{bmatrix} (m_1+m_2) g {l_1 \over 2} \\\ m_2 g {l_2 \over 2} \end{bmatrix} $
Now if we start with $\theta_d$ = 0, we need some sort of learning rule that will send us towards the correct value.
According to this paper http://web.mit.edu/nsl/www/preprints/adaptive_Jacobian.pdf, here's a control law and learning rule that's guaranteed to converge to the desired $q_d$:
$ s = {K_p} (q_d-q) + K_v {d \over {dt}} (q_d-q) $
$\tau = s + Y_d \theta_d $
$\Delta\theta_d = L_d Y_d^T s$ (where $L_d$ is a learning rate parameter)
The idea here is to adjust $\theta_d$ based on what should be happening in an ideal situation where there is nothing else affecting motion except the control signal.
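A minimal sketch of one time step of that scheme (this assumes a fixed $q_d$, so ${d \over {dt}}(q_d - q) = -\dot{q}$, and scales the update by dt; the actual Dyn2DOFStandard implementation may differ in detail):

import numpy as np

def adaptive_step(q, dq, q_d, theta_d, K_p=50.0, K_v=0.5, L_d=1.0, dt=0.0001):
    # Known regressor form for the gravity term
    Y_d = np.array([[np.cos(q[0]), 0.0],
                    [0.0,          np.cos(q[0] + q[1])]])
    s = K_p * (q_d - q) - K_v * dq                # PD part of the control law
    tau = s + Y_d.dot(theta_d)                    # add the learned compensation
    theta_d = theta_d + L_d * Y_d.T.dot(s) * dt   # learning rule for theta_d
    return tau, theta_d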
In [37]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=Dyn2DOFStandard(L_d=1))
runner.run(arm, control)
runner.show()
Here's an alternate form that I like a bit better. This also seems to work, but it's not the one in the paper:
$ Y_d = \begin{bmatrix} cos(q_0) && cos(q_0+q_1)\end{bmatrix} \\\ \theta_d = \begin{bmatrix} g(m_1+m_2) {l_1 \over 2} && 0 \\\ 0 && g m_2 {l_2 \over 2} \end{bmatrix} $
$ \Delta \theta_d = L_d Y_d \times s $
I like my form better (for reasons that may become apparent below)
In [2]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=Dyn2DOFAlternate(L_d=1))
runner.run(arm, control)
runner.show()
Sweet. That works pretty well.
As a small tweak, we can note that the learned $\theta_d$ value makes sudden jumps when the target changes. We could play around with stopping learning for that time step, or we can do something simpler: put a low-pass filter on $q_d$.
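A sketch of that filter (a first-order low-pass applied to the target each simulation step; the tau_d parameter used below is assumed to be this time constant):

def filter_target(q_d_filt, q_d_target, tau_d=0.02, dt=0.0001):
    # q_d drifts toward the new target with time constant tau_d
    # instead of jumping there instantly
    return q_d_filt + (dt / tau_d) * (q_d_target - q_d_filt)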
In [31]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, mode='PD+dynamics', tau_d=0.02)
runner.run(arm, control)
runner.show()
Very nice.
So that works great if we know the form of the other influences on $\tau$. However, what if we don't know them? After all, there's more than just gravity actually affecting the movement of the system -- inertia is also having an effect. And what if the arm is some weird shape that's a pain to characterize?
What we need is a set of functions $Y_d$ that can be linearly combined together using $\theta_d$ to approximate the unknown function $\tau_d(q)$.
This sounds kinda familiar.
Let's randomly generate a bunch of functions of the form $G[\alpha_i e_i \cdot q + J_{bias}]$ and use those for $Y_d$.
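A sketch of generating such a basis (NEF-style random encoders; the specific nonlinearity $G$ and the parameter ranges here are assumptions, not necessarily what the neurons_d option uses):

import numpy as np

def make_random_basis(n_neurons=200, dims=2, rng=np.random):
    encoders = rng.randn(n_neurons, dims)
    encoders /= np.linalg.norm(encoders, axis=1, keepdims=True)   # unit-length e_i
    alpha = rng.uniform(0.5, 2.0, n_neurons)                      # gains alpha_i
    J_bias = rng.uniform(-1.0, 1.0, n_neurons)                    # bias currents

    def Y_d(q):
        # G[alpha_i e_i . q + J_bias], with G taken to be rectified linear
        return np.maximum(alpha * encoders.dot(np.asarray(q)) + J_bias, 0.0)
    return Y_d

# tau_d is then Y_d(q).dot(theta_d), with theta_d an (n_neurons x 2) decoder
# learned via delta theta_d = L_d * outer(Y_d(q), s)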
In [32]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, L_d=0.02, mode='PD+dynamics',
tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()
Sweet.
So, in neural terms, $Y_d$ is the activities of a group of neurons representing $q$. $\theta_d$ is the decoder from those neurons that will approximate all the aspects of the movement that we aren't controlling, giving a value to add to our $\tau$ output.
Furthermore, the learning rule is exactly the sort of learning rule we've been looking at for the NEF. The change to the decoder is proportional to the activity of the pre-synaptic neuron, times $s$, a value represented by some other population (the desired simple PD controller signal).
Okay, that's kinda cool. But we're not there yet. After all, we've been giving this system a target $q_d$, but what we actually want to do is give it $x_d$, where there's some arbitrary unknown mapping between $x_d$ and $q_d$.
The standard way of talking about this relationship is the Jacobian $J$.
$$ \dot{x} = J \dot{q} $$
Of course, even for the simple arm, the Jacobian is nasty:
$$ J = \begin{bmatrix} -sin(q_0)l_1-sin(q_0+q_1)l_2 && -sin(q_0+q_1)l_2 \\\ cos(q_0)l_1+cos(q_0+q_1)l_2 && cos(q_0+q_1)l_2 \end{bmatrix} $$
But let's assume that we have that computed somehow. Now we can change our control rule slightly.
Original rule: $s = {K_p} (q_d-q) + K_v {d \over {dt}} (q_d-q)$
Simplifying (and treating $q_d$ as constant between target changes, so $\dot{q}_d = 0$): $s = {K_p} (q_d-q) - K_v \dot{q}$
We don't have $q_d - q$. But we do have $x_d - x$. Let's use the Jacobian, and the approximation
$(q_d - q) \approx \alpha J^T (x_d - x)$
Now, rolling $\alpha$ into $K_p$, our control rule is
$s = {K_p} J^T (x_d-x) - K_v \dot{q}$
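A sketch of that rule with the known Jacobian (link lengths l1, l2 assumed known here, which is presumably what the ideal_theta_k option below provides):

import numpy as np

def task_space_pd(q, dq, x, x_d, l1, l2, K_p=500.0, K_v=0.5):
    # Known Jacobian for the two-joint arm
    J = np.array([
        [-np.sin(q[0])*l1 - np.sin(q[0]+q[1])*l2, -np.sin(q[0]+q[1])*l2],
        [ np.cos(q[0])*l1 + np.cos(q[0]+q[1])*l2,  np.cos(q[0]+q[1])*l2],
    ])
    # s = K_p J^T (x_d - x) - K_v dq
    return K_p * J.T.dot(np.asarray(x_d) - np.asarray(x)) - K_v * np.asarray(dq)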
Let's try that (with no gravity, to simplify life)
In [35]:
arm = Arm(gravity=0)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0, L_d=0, mode='p2p',
tau_d=0.02, theta_k = Control.ideal_theta_k)
runner.run(arm, control)
runner.show()
Not bad. Now let's do it with gravity and the adaptive dynamics.
In [42]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0, L_d=1, mode='p2p',
tau_d=0.2, theta_k = Control.ideal_theta_k)
runner.run(arm, control)
runner.show()
Okay, so that's great if we know $J$. But what if we don't?
Well, what if we know the form of $J$ but not the constants? Here's one way to break it down:
$ \dot{x} = J(q,\theta_k)\dot{q} = Y_k(q, \dot{q})\theta_k $
$ Y_k = \begin{bmatrix} -sin(q_1)\dot{q}_1 && -sin(q_1 + q_2)(\dot{q}_1 + \dot{q}_2) && 0 && 0 \\\ 0 && 0 && cos(q_1)\dot{q}_1 && cos(q_1 + q_2)(\dot{q}_1 + \dot{q}_2) \end{bmatrix} $
$ \theta_k = \begin{bmatrix} l_1 \\\ l_2 \\\ l_1 \\\ l_2 \end{bmatrix} $
Rearranging, this gives
$ J(q,\theta_k) = \begin{bmatrix} -sin(q_1)\theta_{k1}-sin(q_1+q_2)\theta_{k2} && -sin(q_1+q_2)\theta_{k2} \\\ cos(q_1)\theta_{k3}+cos(q_1+q_2)\theta_{k4} && cos(q_1+q_2)\theta_{k4} \end{bmatrix} $
Why are we doing it this way? Because this has a nice learning rule for $\theta_k$.
$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $
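A minimal sketch of one step of that scheme, starting $\theta_k$ near zero and building the estimated Jacobian from it (the actual code behind the L_k option is assumed to be roughly equivalent):

import numpy as np

def adaptive_jacobian_step(q, dq, x, x_d, theta_k, K_p=500.0, K_v=0.5,
                           L_k=0.0001, dt=0.0001):
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    # Regressor Y_k such that x_dot = Y_k(q, dq) . theta_k
    Y_k = np.array([
        [-s1*dq[0], -s12*(dq[0]+dq[1]), 0.0,      0.0],
        [ 0.0,       0.0,               c1*dq[0], c12*(dq[0]+dq[1])],
    ])
    # Estimated Jacobian assembled from the learned theta_k
    J_hat = np.array([
        [-s1*theta_k[0] - s12*theta_k[1], -s12*theta_k[1]],
        [ c1*theta_k[2] + c12*theta_k[3],  c12*theta_k[3]],
    ])
    err = np.asarray(x_d) - np.asarray(x)
    s = K_p * J_hat.T.dot(err) - K_v * np.asarray(dq)     # control signal
    theta_k = theta_k + L_k * Y_k.T.dot(K_p * err) * dt   # learning rule
    return s, theta_k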
In [37]:
arm = Arm(gravity=0)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=0, mode='p2p',
tau_d=0.02)
runner.run(arm, control)
runner.show()
Not bad. Let's have it learn the dynamics too.
In [38]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=1, mode='p2p',
tau_d=0.02)
runner.run(arm, control)
runner.show()
And with neurons to do the dynamics
In [41]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=1, mode='p2p',
tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()
This is pretty good, but we're still restricted to a particular form for the Jacobian. It'd be very nice to generalize past that.
It's not as straightforward as the dynamics case, since we need both $J$ and $Y_k$:
$s = {K_p} J^T (x_d-x) - K_v \dot{q}$
$ \dot{x} = J(q,\theta_k)\dot{q} = Y_k(q, \dot{q})\theta_k $
$ Y_k = \begin{bmatrix} -sin(q_1)\dot{q}_1 && -sin(q_1 + q_2)(\dot{q}_1 + \dot{q}_2) && 0 && 0 \\\ 0 && 0 && cos(q_1)\dot{q}_1 && cos(q_1 + q_2)(\dot{q}_1 + \dot{q}_2) \end{bmatrix} $
$ \theta_k = \begin{bmatrix} l_1 \\\ l_2 \\\ l_1 \\\ l_2 \end{bmatrix} $
$ J(q,\theta_k) = \begin{bmatrix} -sin(q_1)\theta_{k1}-sin(q_1+q_2)\theta_{k2} && -sin(q_1+q_2)\theta_{k2} \\\ cos(q_1)\theta_{k3}+cos(q_1+q_2)\theta_{k4} && cos(q_1+q_2)\theta_{k4} \end{bmatrix} $
$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $
So we need to do some reorganizing.
To do this, pull $\dot{q}$ out of $Y_k$
$$ Y_k = Z_0 \dot{q}_1 + Z_1 \dot{q}_2 $$
$ Z_0 = \begin{bmatrix} -sin(q_1) && -sin(q_1 + q_2) && 0 && 0 \\\ 0 && 0 && cos(q_1) && cos(q_1 + q_2) \end{bmatrix} $
$ Z_1 = \begin{bmatrix} 0 && -sin(q_1 + q_2) && 0 && 0 \\\ 0 && 0 && 0 && cos(q_1 + q_2) \end{bmatrix} $
Now we have an expression for $J$
$ J = Z \theta_k = \begin{bmatrix} Z_0 \theta_k \\ Z_1 \theta_k \end{bmatrix} $
We should now be able to just generate random functions for $Z$ and use the existing learning rule. Getting this to work is ongoing.
As an alternate option, I've also been playing with breaking it down a different way:
$ J = X \theta_X $
$ X = \begin{bmatrix} sin(q_0) && sin(q_0+q_1) && cos(q_0) && cos(q_0+q_1) \end{bmatrix} $
$$ \theta_X = \begin{bmatrix} \begin{bmatrix} -l_1 && 0 \\\ -l_2 && 0 \\\ 0 && l_1 \\\ 0 && l_2 \end{bmatrix} && \begin{bmatrix} 0 && 0 \\\ -l_2 && 0 \\\ 0 && 0 \\\ 0 && l_2 \end{bmatrix} \end{bmatrix} $$
This makes it look a lot more like neurons and decoders, but now there are more variables to learn, and we need an alternate learning rule. This might work, but I'm just guessing here:
$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $
$ \Delta{\theta}_{X} = L_k K_p (X \times \dot{q}) \times (x_d - x) $
In [ ]:
import control
reload(control)
from control import Control
import runner
reload(runner)
from runner import Runner
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=0.02, mode='p2p',
tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()