Adaptive Motor Control

(what to do when you don't know what you're controlling or how to do it)

Output from controller: $\tau$ (e.g. torque at each joint)

Value you want to control: $x$ (e.g. hand position)

Sensors: $x$, $q$, $\dot{q}$ (e.g. hand position, joint angle, joint velocity)

So, we want an algorithm that will give us a $\tau$ value that will cause $x$ to move towards some desired value $x_d$. And we want this to work without having to hard-code information about the motor system itself -- it has to learn that. This means it should be able to adapt when the motor system changes (different loads, motors wearing down, damage, growth, etc.)

Note that we are not going to worry about being optimal! We just want something that'll get us to the target.

Example motor system: 2-dof arm

To experiment with this system, we'll go with a simple two-joint arm under gravity. The parameters for this physics model were stolen from http://lendek.net/files/papers/sangeetha-fuzzie312.pdf


In [1]:
from arm import Arm
from runner import Runner
from control2 import *

In [3]:
arm = Arm(gravity=1)
runner = Runner('No control', dt=0.0001, display_steps=200)
runner.run(arm, None)
runner.show()

Or we can try that with no joint friction


In [4]:
arm = Arm(gravity=1, b1=0, b2=0)
runner = Runner('No control', dt=0.0001, display_steps=20)
runner.run(arm, None)
runner.show()

Step 1: A proportional controller

Let's start by ridiculously simplifying the problem. Let's say that we have some magic way of getting $q_d$, the desired angles of our joints. Now we just have to apply torque in the right direction to drive $q$ towards $q_d$.

The simplest approach here is the following control law:

$$ \tau = K_p (q_d - q) $$

This is known as a proportional (P) controller.
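
As a sketch of what such a controller looks like in code (standalone and hypothetical, not the actual Control API from control2.py):


In [ ]:
import numpy as np

def p_control(q, q_d, K_p):
    """Proportional control: torque towards q_d, scaled by the gain K_p."""
    return K_p * (np.asarray(q_d) - np.asarray(q))

# e.g. both joints 0.5 rad short of their targets, with K_p = 5
tau = p_control(q=[0.0, 0.0], q_d=[0.5, 0.5], K_p=5.0)   # array([ 2.5,  2.5])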


In [13]:
arm = Arm(gravity=0)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=5)
runner.run(arm, control)
runner.show()

This seems to work okay, until we turn gravity back on.


In [14]:
arm = Arm(gravity=1)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=5)
runner.run(arm, control)
runner.show()

We can try to get it to work harder against gravity by cranking up the gain $K_p$.


In [15]:
arm = Arm(gravity=1)
runner = Runner('Proportional control', dt=0.0001)
control = Control(K_p=500)
runner.run(arm, control)
runner.show()

Step 2: A PD controller

How do we improve on a proportional controller and get rid of that oscillatory overshoot? The standard way is to introduce a derivative term, making it a PD controller:

$$ \tau = K_p (q_d - q) + K_v {d \over {dt}}(q_d - q) $$
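
Since $q_d$ is constant between target changes, the derivative term reduces to damping on the joint velocities (the same simplification made explicit in Step 7 below). As a hypothetical standalone sketch:


In [ ]:
import numpy as np

def pd_control(q, dq, q_d, K_p, K_v):
    """PD control: proportional term plus damping on joint velocity.

    Uses d/dt(q_d - q) = -dq, valid while q_d is constant.
    """
    return K_p * (np.asarray(q_d) - np.asarray(q)) - K_v * np.asarray(dq)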

In [17]:
arm = Arm(gravity=1)
runner = Runner('PD control', dt=0.0001)
control = Control(K_p=500, K_v=1)
runner.run(arm, control)
runner.show()

This helps a bit with the oscillation, but doesn't help at all with the steady-state error caused by gravity.

Step 3: A PID Controller

Of course, it's now natural to try a full PID controller:

$$ \tau = K_p (q_d - q) + K_v {d \over {dt}}(q_d - q) + K_i \int (q_d - q)dt $$
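
The integral term needs state carried across time steps, so a sketch of this one is a small class (names are mine, for illustration; the real implementation is the Control class in control2.py):


In [ ]:
import numpy as np

class PIDSketch(object):
    def __init__(self, K_p, K_v, K_i, n_joints=2):
        self.K_p, self.K_v, self.K_i = K_p, K_v, K_i
        self.integral = np.zeros(n_joints)   # accumulated error

    def control(self, q, dq, q_d, dt):
        error = np.asarray(q_d) - np.asarray(q)
        self.integral += error * dt          # integral term builds up over time
        return (self.K_p * error
                - self.K_v * np.asarray(dq)
                + self.K_i * self.integral)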

In [19]:
arm = Arm(gravity=1)
runner = Runner('PID control', dt=0.0001)
control = Control(K_p=500, K_v=1, K_i=1000)
runner.run(arm, control)
runner.show()

Well, it does work now, in that it eventually gets to the target. But it does this by missing at first and then slowly creeping in as the integral term builds up.

What if we increase $K_i$?


In [20]:
arm = Arm(gravity=1)
runner = Runner('PID control', dt=0.0001)
control = Control(K_p=500, K_v=1, K_i=10000)
runner.run(arm, control)
runner.show()

Now it gets there faster, but there's wild movement when the target changes, since the wound-up integral term takes time to unwind.

Is there a more intelligent way to compensate for the effects of gravity?

Step 4: Subtract gravity

What if we knew exactly what the effects of gravity were? We could adjust the torque accordingly. For this arm model, the effects of gravity are:

$$ \tau_d = \begin{bmatrix} g (m_1+m_2){l_1 \over 2} cos(q_0) && g (m_2){l_2 \over 2} cos(q_0+q_1) \end{bmatrix} $$
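
A sketch of what DynIdeal presumably computes, plugging the arm's masses and link lengths into the formula above (function name and signature are mine, for illustration):


In [ ]:
import numpy as np

def gravity_torque(q, m1, m2, l1, l2, g=9.8):
    """Torque gravity exerts at each joint (link masses centred at l/2)."""
    return np.array([g * (m1 + m2) * (l1 / 2.0) * np.cos(q[0]),
                     g * m2 * (l2 / 2.0) * np.cos(q[0] + q[1])])

# tau = pd_control(q, dq, q_d, K_p, K_v) + gravity_torque(q, m1, m2, l1, l2)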

That works great, but it's cheating


In [21]:
arm = Arm(gravity=1)
runner = Runner('PD control (subtract gravity)', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=DynIdeal())
runner.run(arm, control)
runner.show()

Step 5: Learning the constants

Let's make this less like cheating. Let's say we know the form of the gravity effect, but we don't know the constants.

$$ \tau_d = Y_d \theta_d $$

$ Y_d = \begin{bmatrix} cos(q_0) && 0 \\\ 0 && cos(q_0+q_1)\end{bmatrix}, \theta_d = \begin{bmatrix} (m_1+m_2) g {l_1 \over 2} \\\ m_2 g {l_2 \over 2} \end{bmatrix} $

Now if we start with $\theta_d$ = 0, we need some sort of learning rule that will send us towards the correct value.

According to this paper http://web.mit.edu/nsl/www/preprints/adaptive_Jacobian.pdf, here's a control law and learning rule that's guaranteed to converge to the desired $q_d$:

$ s = {K_p} (q_d-q) + K_v {d \over {dt}} (q_d-q) $

$\tau = s + Y_d \theta_d $

$\Delta\theta_d = L_d Y_d^T s$ (where $L_d$ is a learning rate parameter)

The idea here is to adjust $\theta_d$ based on what should be happening in an ideal situation where there is nothing else affecting motion except the control signal.
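
Putting the control law and learning rule together as a standalone sketch (one assumption on my part: I scale the update by dt, turning the $\Delta\theta_d$ above into a per-timestep change):


In [ ]:
import numpy as np

def adaptive_control(q, dq, q_d, theta_d, K_p=50.0, K_v=0.5, L_d=1.0, dt=0.0001):
    """One time step: compute tau and update the estimate theta_d."""
    s = K_p * (np.asarray(q_d) - np.asarray(q)) - K_v * np.asarray(dq)
    Y_d = np.array([[np.cos(q[0]), 0.0],
                    [0.0, np.cos(q[0] + q[1])]])
    tau = s + np.dot(Y_d, theta_d)
    theta_d = theta_d + L_d * np.dot(Y_d.T, s) * dt   # Delta theta_d = L_d Y_d^T s
    return tau, theta_d

Starting from theta_d = np.zeros(2), the estimate should drift towards the true gravity constants as the arm moves around.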


In [37]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=Dyn2DOFStandard(L_d=1))
runner.run(arm, control)
runner.show()

Here's an alternate form that I like a bit better. This also seems to work, but it's not the one in the paper:

$ Y_d = \begin{bmatrix} cos(q_0) && cos(q_0+q_1)\end{bmatrix} \\\ \theta_d = \begin{bmatrix} g(m_1+m_2) {l_1 \over 2} && 0 \\\ 0 && g m_2 {l_2 \over 2} \end{bmatrix} $

$ \Delta \theta_d = L_d Y_d \times s $

I like my form better (for reasons that may become apparent below)


In [2]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, dynamics=Dyn2DOFAlternate(L_d=1))
runner.run(arm, control)
runner.show()



Sweet. That works pretty well.

As a small tweak, we can note that the learned $\theta_d$ value makes sudden jumps when the target changes. We could play around with stopping learning for that time step, or we can do something simpler: put a low-pass filter on $q_d$.
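
A first-order low-pass filter is one line per time step. A sketch, assuming the tau_d argument to Control below is the filter time constant:


In [ ]:
import numpy as np

def lowpass_step(q_d_filt, q_d, dt, tau_d=0.02):
    """Move the filtered target a fraction dt/tau_d of the way towards q_d."""
    return q_d_filt + (dt / tau_d) * (np.asarray(q_d) - q_d_filt)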


In [31]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, mode='PD+dynamics', tau_d=0.02)
runner.run(arm, control)
runner.show()

Very nice.

Step 6: Generalizing dynamics

So that works great if we know the form of the other influences on $\tau$. However, what if we don't know them? After all, there's more than just gravity actually affecting the movement of the system -- inertia is also having an effect. And what if the arm is some weird shape that's a pain to characterize?

What we need is a set of functions $Y_d$ that can be linearly combined together using $\theta_d$ to approximate the unknown function $\tau_d(q)$.

This sounds kinda familiar.

Let's randomly generate a bunch of functions of the form $G[\alpha_i e_i \cdot q + J_{bias}]$ and use those for $Y_d$.
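
Here's a sketch of generating those basis functions, taking $G$ to be rectification (my choice for illustration; any similar nonlinearity, such as an LIF response curve, should work, and something like this is presumably what the neurons_d argument below sets up):


In [ ]:
import numpy as np

def make_basis(n_neurons=200, q_dim=2, seed=0):
    """Random tuning curves of the form G[alpha_i e_i . q + J_bias]."""
    rng = np.random.RandomState(seed)
    e = rng.randn(n_neurons, q_dim)
    e /= np.sqrt((e ** 2).sum(axis=1))[:, None]   # unit-length encoders
    alpha = rng.uniform(0.5, 2.0, n_neurons)      # gains
    J_bias = rng.uniform(-1.0, 1.0, n_neurons)    # bias currents

    def Y_d(q):
        # G = rectification: activity is max(0, alpha e.q + J_bias)
        return np.maximum(0.0, alpha * np.dot(e, q) + J_bias)
    return Y_d

With $Y_d$ now a vector of n activities, $\theta_d$ becomes an n-by-2 matrix of decoders, and the update is the outer product L_d * np.outer(Y_d(q), s): the same shape as the alternate form from Step 5.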


In [32]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=50, K_v=0.5, L_d=0.02, mode='PD+dynamics', 
                  tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()

Sweet.

So, in neural terms, $Y_d$ is the activities of a group of neurons representing $q$. $\theta_d$ is the decoder from those neurons that will approximate all the aspects of the movement that we aren't controlling, giving a value to add to our $\tau$ output.

Furthermore, the learning rule is exactly the sort of learning rule we've been looking at for the NEF. The change to the decoder is proportional to the activity of the pre-synaptic neuron, times $s$, a value represented by some other population (the desired simple PD controller signal).

Step 7: Kinematics

Okay, that's kinda cool. But we're not there yet. After all, we've been giving this system a target $q_d$, but what we actually want to do is give it $x_d$, where there's some arbitrary unknown mapping between $x_d$ and $q_d$.

The standard way of talking about this relationship is the Jacobian $J$.

$$ \dot{x} = J \dot{q} $$

Of course, even for the simple arm, the Jacobian is nasty:

$$ J = \begin{bmatrix} -sin(q_0)l_1-sin(q_0+q_1)l_2 && -sin(q_0+q_1)l_2 \\\ cos(q_0)l_1+cos(q_0+q_1)l_2 && cos(q_0+q_1)l_2 \end{bmatrix} $$

But, let's assume that we have that computed somehow. Now we can change our control rule slightly.

Original rule: $s = {K_p} (q_d-q) + K_v {d \over {dt}} (q_d-q)$

Simplifying (since $q_d$ is constant between target changes, ${d \over {dt}}(q_d - q) = -\dot{q}$): $s = {K_p} (q_d-q) - K_v \dot{q}$

We don't have $q_d - q$. But we do have $x_d - x$. Let's use the Jacobian, and the approximation

$(q_d - q) \approx \alpha J^T (x_d - x)$

Now, rolling $\alpha$ into $K_p$, our control rule is

$s = {K_p} J^T (x_d-x) - K_v \dot{q}$
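
As a sketch, with the analytic Jacobian from above hard-coded (function names are mine, for illustration):


In [ ]:
import numpy as np

def jacobian(q, l1, l2):
    """Analytic Jacobian of hand position with respect to joint angles."""
    s0, s01 = np.sin(q[0]), np.sin(q[0] + q[1])
    c0, c01 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-s0 * l1 - s01 * l2, -s01 * l2],
                     [ c0 * l1 + c01 * l2,  c01 * l2]])

def task_control(q, dq, x, x_d, l1, l2, K_p=500.0, K_v=0.5):
    """s = K_p J^T (x_d - x) - K_v dq: push the hand towards x_d."""
    J = jacobian(q, l1, l2)
    return K_p * np.dot(J.T, np.asarray(x_d) - np.asarray(x)) - K_v * np.asarray(dq)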

Let's try that (with no gravity, to simplify life)


In [35]:
arm = Arm(gravity=0)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0, L_d=0, mode='p2p', 
                  tau_d=0.02, theta_k = Control.ideal_theta_k)
runner.run(arm, control)
runner.show()

Not bad. Now let's do it with gravity and the adaptive dynamics.


In [42]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0, L_d=1, mode='p2p', 
                  tau_d=0.2, theta_k = Control.ideal_theta_k)
runner.run(arm, control)
runner.show()

Step 8: Learning the kinematics

Okay, so that's great if we know $J$. But what if we don't?

Well, what if we know the form of $J$ but not the constants? Here's one way to break it down:

$ \dot{x} = J(q,\theta_k)\dot{q} = Y_k(q, \dot{q})\theta_k $

$ Y_k = \begin{bmatrix} -sin(q_0)\dot{q}_0 && -sin(q_0 + q_1)(\dot{q}_0 + \dot{q}_1) && 0 && 0 \\\ 0 && 0 && cos(q_0)\dot{q}_0 && cos(q_0 + q_1)(\dot{q}_0 + \dot{q}_1) \end{bmatrix} $

$ \theta_k = \begin{bmatrix} l_1 \\\ l_2 \\\ l_1 \\\ l_2 \end{bmatrix} $

Rearranging, this gives

$ J(q,\theta_k) = \begin{bmatrix} -sin(q_0)\theta_{k1}-sin(q_0+q_1)\theta_{k2} && -sin(q_0+q_1)\theta_{k2} \\\ cos(q_0)\theta_{k3}+cos(q_0+q_1)\theta_{k4} && cos(q_0+q_1)\theta_{k4} \end{bmatrix} $

Why are we doing it this way? Because this has a nice learning rule for $\theta_k$.

$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $
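
A sketch of one learning step (again scaling the $\Delta$ by dt, which is my assumption about how the rule is applied per time step):


In [ ]:
import numpy as np

def kinematics_step(q, dq, x, x_d, theta_k, K_p=500.0, L_k=0.0001, dt=0.0001):
    """Update theta_k (the estimated link lengths) from task-space error."""
    q0, q1 = q
    dq0, dq1 = dq
    Y_k = np.array([
        [-np.sin(q0) * dq0, -np.sin(q0 + q1) * (dq0 + dq1), 0.0, 0.0],
        [0.0, 0.0, np.cos(q0) * dq0, np.cos(q0 + q1) * (dq0 + dq1)]])
    dtheta_k = L_k * np.dot(Y_k.T, K_p * (np.asarray(x_d) - np.asarray(x)))
    return theta_k + dtheta_k * dt

def J_estimate(q, theta_k):
    """Rebuild the Jacobian estimate from the learned theta_k."""
    q0, q1 = q
    return np.array([
        [-np.sin(q0) * theta_k[0] - np.sin(q0 + q1) * theta_k[1],
         -np.sin(q0 + q1) * theta_k[1]],
        [np.cos(q0) * theta_k[2] + np.cos(q0 + q1) * theta_k[3],
         np.cos(q0 + q1) * theta_k[3]]])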


In [37]:
arm = Arm(gravity=0)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=0, mode='p2p', 
                  tau_d=0.02)
runner.run(arm, control)
runner.show()

Not bad. Let's have it learn the dynamics too.


In [38]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=1, mode='p2p', 
                  tau_d=0.02)
runner.run(arm, control)
runner.show()

And with neurons to do the dynamics


In [41]:
arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=1, mode='p2p', 
                  tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()

Where do we go from here?

This is pretty good, but we're still restricted to a particular form for the Jacobian. It'd be very nice to generalize past that.

It's not as straightforward as the dynamics case, since we need both $J$ and $Y_k$:

$s = {K_p} J^T (x_d-x) - K_v \dot{q}$

$ \dot{x} = J(q,\theta_k)\dot{q} = Y_k(q, \dot{q})\theta_k $

$ Y_k = \begin{bmatrix} -sin(q_0)\dot{q}_0 && -sin(q_0 + q_1)(\dot{q}_0 + \dot{q}_1) && 0 && 0 \\\ 0 && 0 && cos(q_0)\dot{q}_0 && cos(q_0 + q_1)(\dot{q}_0 + \dot{q}_1) \end{bmatrix} $

$ \theta_k = \begin{bmatrix} l_1 \\\ l_2 \\\ l_1 \\\ l_2 \end{bmatrix} $

$ J(q,\theta_k) = \begin{bmatrix} -sin(q_0)\theta_{k1}-sin(q_0+q_1)\theta_{k2} && -sin(q_0+q_1)\theta_{k2} \\\ cos(q_0)\theta_{k3}+cos(q_0+q_1)\theta_{k4} && cos(q_0+q_1)\theta_{k4} \end{bmatrix} $

$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $

So we need to do some reorganizing.

To do this, pull $\dot{q}$ out of $Y_k$

$$ Y_k = Z_0 \dot{q}_0 + Z_1 \dot{q}_1 $$

$ Z_0 = \begin{bmatrix} -sin(q_0) && -sin(q_0 + q_1) && 0 && 0 \\\ 0 && 0 && cos(q_0) && cos(q_0 + q_1) \end{bmatrix} $

$ Z_1 = \begin{bmatrix} 0 && -sin(q_0 + q_1) && 0 && 0 \\\ 0 && 0 && 0 && cos(q_0 + q_1) \end{bmatrix} $

Now we have an expression for $J$, built column by column ($Z_i \theta_k$ is the $i$th column of $J$):

$ J = Z \theta_k = \begin{bmatrix} Z_0 \theta_k && Z_1 \theta_k \end{bmatrix} $
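
For the two-link arm, assembling $J$ from the $Z$ matrices looks like this (a sketch):


In [ ]:
import numpy as np

def J_from_Z(q, theta_k):
    """Assemble J column by column: column i of J is Z_i theta_k."""
    q0, q1 = q
    Z_0 = np.array([[-np.sin(q0), -np.sin(q0 + q1), 0.0, 0.0],
                    [0.0, 0.0, np.cos(q0), np.cos(q0 + q1)]])
    Z_1 = np.array([[0.0, -np.sin(q0 + q1), 0.0, 0.0],
                    [0.0, 0.0, 0.0, np.cos(q0 + q1)]])
    return np.column_stack([np.dot(Z_0, theta_k), np.dot(Z_1, theta_k)])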

We should now be able to just generate random functions for $Z$ and use the existing learning rule. Getting this to work is ongoing.

An alternate approach? (i.e. making things up)

As an alternate option, I've also been playing with breaking it down a different way:

$ J = X \theta_X $

$ X = \begin{bmatrix} sin(q_0) && sin(q_0+q_1) && cos(q_0) && cos(q_0+q_1) \end{bmatrix} $

$$ \theta_X = \begin{bmatrix} \begin{bmatrix} -l_1 && 0 \\\ -l_2 && 0 \\\ 0 && l_1 \\\ 0 && l_2 \end{bmatrix} && \begin{bmatrix} 0 && 0 \\\ -l_2 && 0 \\\ 0 && 0 \\\ 0 && l_2 \end{bmatrix} \end{bmatrix} $$

This makes it look a lot more like neurons and decoders, but now there are more variables to learn, and we need an alternate learning rule. This might work, but I'm just guessing here:

$ \Delta{\theta}_k = L_k Y_k^T K_p (x_d - x) $

$ \Delta{\theta}_{X} = L_k K_p (X \times \dot{q}) \times (x_d - x) $


In [ ]:
import control
reload(control)
from control import Control
import runner
reload(runner)
from runner import Runner

arm = Arm(gravity=1)
runner = Runner('PD control with adaptive dynamics', dt=0.0001)
control = Control(K_p=500, K_v=0.5, L_k=0.0001, L_d=0.02, mode='p2p', 
                  tau_d=0.02, neurons_d=200)
runner.run(arm, control)
runner.show()