Learning how to move a human arm

In this tutorial we will show how to train a basic biomechanical model of a human arm using keras-rl.

Installation

To make it work, follow the instructions at https://github.com/stanfordnmbl/osim-rl#getting-started, i.e. run

conda create -n opensim-rl -c kidzik opensim git python=2.7
source activate opensim-rl
pip install git+https://github.com/stanfordnmbl/osim-rl.git

Then run

git clone https://github.com/stanfordnmbl/osim-rl.git
conda install keras -c conda-forge
pip install git+https://github.com/matthiasplappert/keras-rl.git
cd osim-rl
conda install jupyter

Follow the instructions and, once jupyter is installed, type

jupyter notebook

This should open jupyter in your browser. Navigate to this notebook, i.e. to the file scripts/train.arm.ipynb.
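
To verify the installation (optional), you can start python inside the opensim-rl environment and check that the simulator and the arm environment import without errors:

import opensim
from osim.env.arm import ArmEnv
print("osim-rl imports correctly")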

Preparing the environment

The following two blocks load necessary libraries and create a simulator environment.


In [1]:
# Derived from keras-rl
import opensim as osim
import numpy as np

from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, concatenate
from keras.optimizers import Adam

from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

from osim.env.arm import ArmEnv


Using Theano backend.

In [2]:
# Load the arm environment (the True argument turns visualization on)
env = ArmEnv(True)
env.reset()

# Total number of steps in training (kept from the full training script;
# the short run below uses its own nb_steps)
nallsteps = 10000

# Number of actions, i.e. muscles we can excite
nb_actions = env.action_space.shape[0]
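
# The observation is a fixed-length vector describing the current state of the
# simulation, and each action is a vector of muscle excitations (the actor built
# below ends in a sigmoid, so they stay in [0, 1]). To inspect the dimensions:
#   print(env.observation_space.shape)   # (14,) for this arm model
#   print(env.action_space.shape)        # (6,) -- one excitation per muscle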

Creating the actor and the critic

The actor serves as a brain for controlling muscles. The critic is our approximation of how well the brain is performing with respect to the goal.


In [3]:
# Create networks for DDPG
# Next, we build a very simple model.
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('sigmoid'))
print(actor.summary())


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 14)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                480       
_________________________________________________________________
activation_1 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
activation_2 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 32)                1056      
_________________________________________________________________
activation_3 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 6)                 198       
_________________________________________________________________
activation_4 (Activation)    (None, 6)                 0         
=================================================================
Total params: 2,790.0
Trainable params: 2,790.0
Non-trainable params: 0.0
_________________________________________________________________
None

In [4]:
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = concatenate([action_input, flattened_observation])
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
print(critic.summary())


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
observation_input (InputLayer)   (None, 1, 14)         0                                            
____________________________________________________________________________________________________
action_input (InputLayer)        (None, 6)             0                                            
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 14)            0                                            
____________________________________________________________________________________________________
concatenate_1 (Concatenate)      (None, 20)            0                                            
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 64)            1344                                         
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 64)            0                                            
____________________________________________________________________________________________________
dense_6 (Dense)                  (None, 64)            4160                                         
____________________________________________________________________________________________________
activation_6 (Activation)        (None, 64)            0                                            
____________________________________________________________________________________________________
dense_7 (Dense)                  (None, 64)            4160                                         
____________________________________________________________________________________________________
activation_7 (Activation)        (None, 64)            0                                            
____________________________________________________________________________________________________
dense_8 (Dense)                  (None, 1)             65                                           
____________________________________________________________________________________________________
activation_8 (Activation)        (None, 1)             0                                            
====================================================================================================
Total params: 9,729.0
Trainable params: 9,729.0
Non-trainable params: 0.0
____________________________________________________________________________________________________
None

Train the actor and the critic

We will now run the keras-rl implementation of the DDPG algorithm, which trains both networks.


In [5]:
# Set up the agent for training
memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.noutput)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=.99, target_model_update=1e-3,
                  delta_clip=1.)
agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])


WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named mkl
[2017-07-22 22:32:57,326] install mkl with `conda install mkl-service`: No module named mkl

In [6]:
# Okay, now it's time to learn something! Since the environment was created with
# visualization turned on (ArmEnv(True)), you can watch the arm move during training,
# but this slows things down quite a lot. You can always safely abort the training
# prematurely using Ctrl + C.
agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)
# After training is done, we save the final weights.
#    agent.save_weights(args.model, overwrite=True)
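#    In the notebook you can pass any explicit path instead (the filename below is
#    just an example) and reload it later with agent.load_weights, as done further
#    down with the pretrained model:
#    agent.save_weights('ddpg_arm_weights.h5f', overwrite=True)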


Distance: 0.460324
True positions: (-1.149648,-0.048636)
Reached: (-1.566472,-0.005135)

Distance: 0.358700
True positions: (0.080656,-0.269618)
Reached: (-0.037848,-0.509815)

Distance: 2.262523
True positions: (0.010504,-0.686305)
Reached: (-1.567794,-0.002080)

Distance: 0.637941
True positions: (-0.651515,-0.027902)
Reached: (-0.421993,-0.436321)

Distance: 1.889324
True positions: (0.043834,-0.279527)
Reached: (-1.567862,-0.001899)

Distance: 0.467375
True positions: (0.068209,-0.205440)
Reached: (-0.298954,-0.305652)

Distance: 1.345630
True positions: (-1.105126,-0.895527)
Reached: (-1.564775,-0.009546)

Distance: 0.629321
True positions: (-0.601926,-0.867793)
Reached: (-0.531817,-0.308581)

Distance: 1.611037
True positions: (-0.589761,-0.644584)
Reached: (-1.565031,-0.008817)

Distance: 0.639025
True positions: (-0.590143,-0.529063)
Reached: (-0.167579,-0.312602)

Distance: 1.425042
True positions: (-0.717768,-0.583929)
Reached: (-1.565764,-0.006883)

Distance: 0.897903
True positions: (-0.282552,-0.845413)
Reached: (0.060526,-0.290588)

Distance: 2.104613
True positions: (-0.258825,-0.797847)
Reached: (-1.567617,-0.002026)

Distance: 0.754101
True positions: (-0.746036,-0.157657)
Reached: (-0.102625,-0.268347)

Distance: 2.080117
True positions: (-0.088709,-0.603127)
Reached: (-1.567698,-0.001999)

Distance: 0.850010
True positions: (-0.159947,-0.403816)
Reached: (0.471096,-0.184849)

Distance: 0.953120
True positions: (-0.774693,-0.161867)
Reached: (-1.567858,-0.001912)

Distance: 1.623438
True positions: (0.000107,-0.715426)
Reached: (0.140660,-2.198311)

Distance: 1.519730
True positions: (-0.156049,-0.109846)
Reached: (-1.567854,-0.001921)

Distance: 2.551398
True positions: (-0.341076,-0.137298)
Reached: (-0.127004,-4.083158)
Out[6]:
<keras.callbacks.History at 0x7f8aadceaf50>

Evaluate the results

Check how our trained 'brain' performs. Below we will also load a pretrained model, which should perform better. It was trained in exactly the same way, just with a larger number of steps (the nb_steps parameter in agent.fit).


In [7]:
# agent.load_weights(args.model)
# Evaluate our freshly trained agent for 2 episodes.
agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)


Testing for 2 episodes ...

Distance: 1.744610
True positions: (-0.117997,-0.297044)
Reached: (-1.567597,-0.002033)

Distance: 0.880632
True positions: (-0.014510,-0.320320)
Reached: (0.339653,-0.846790)

Distance: 0.656555
True positions: (-0.423018,-0.626791)
Reached: (0.056300,-0.804027)

Distance: 1.092906
True positions: (-0.550707,-0.236474)
Reached: (0.175475,-0.603198)

Distance: 1.005385
True positions: (-0.443725,-0.151066)
Reached: (0.203324,-0.509402)

Distance: 0.951899
True positions: (-0.414841,-0.014396)
Reached: (0.123739,-0.427716)

Distance: 0.839426
True positions: (-0.418568,-0.693720)
Reached: (-0.001927,-1.116504)

Distance: 1.497009
True positions: (-0.894740,-0.584725)
Reached: (0.170642,-1.016351)

Distance: 1.131753
True positions: (-0.743247,-0.725087)
Reached: (0.183012,-0.930581)

Distance: 0.312900
True positions: (-0.142488,-0.953783)
Reached: (0.149220,-0.932591)
Episode 1: reward: -976.410, steps: 1000

Distance: 1.580191
True positions: (-0.732438,-0.747050)
Reached: (-1.567608,-0.002028)

Distance: 0.556819
True positions: (0.165693,-0.517732)
Reached: (0.298295,-0.941949)

Distance: 0.852716
True positions: (-0.431979,-0.202319)
Reached: (-0.004357,-0.627413)

Distance: 0.061322
True positions: (0.165523,-0.588966)
Reached: (0.199187,-0.561308)

Distance: 0.222607
True positions: (0.181837,-0.303762)
Reached: (0.124651,-0.469183)

Distance: 0.759515
True positions: (-0.399265,-0.265959)
Reached: (0.098984,-0.527225)

Distance: 0.217864
True positions: (-0.048969,-0.635982)
Reached: (0.153119,-0.651758)

Distance: 1.146655
True positions: (-0.664945,-0.014114)
Reached: (0.026750,-0.469074)

Distance: 0.396665
True positions: (0.188923,-0.061788)
Reached: (0.170766,-0.440296)

Distance: 1.534696
True positions: (-0.972415,-0.597262)
Reached: (0.166697,-0.992847)
Episode 2: reward: -857.089, steps: 1000
Out[7]:
<keras.callbacks.History at 0x7f8aa0777410>

In [9]:
agent.load_weights("../models/example.h5f")
# Finally, evaluate the pretrained model for 5 episodes.
agent.test(env, nb_episodes=5, visualize=False, nb_max_episode_steps=1000)


Testing for 5 episodes ...

Distance: 1.982073
True positions: (-0.365306,-0.789004)
Reached: (-1.565560,-0.007185)

Distance: 0.489316
True positions: (-1.066750,-0.081859)
Reached: (-0.828360,-0.332785)

Distance: 0.340835
True positions: (-0.821318,-0.029109)
Reached: (-0.726064,-0.274690)

Distance: 0.112628
True positions: (-0.885140,-0.309075)
Reached: (-0.880001,-0.416565)

Distance: 0.225917
True positions: (-1.052513,-0.912940)
Reached: (-0.975606,-1.061949)

Distance: 0.219576
True positions: (-0.137413,-0.078975)
Reached: (-0.165256,-0.270706)

Distance: 0.201277
True positions: (-0.692787,-0.661610)
Reached: (-0.850584,-0.705090)

Distance: 0.257323
True positions: (-0.855284,-0.132450)
Reached: (-0.740171,-0.274659)

Distance: 0.099240
True positions: (-0.470129,-0.782663)
Reached: (-0.568455,-0.781748)

Distance: 0.418505
True positions: (-1.123986,-0.081918)
Reached: (-0.917664,-0.294101)
Episode 1: reward: -340.685, steps: 1000

Distance: 0.565987
True positions: (-1.166045,-0.176900)
Reached: (-1.564738,-0.009606)

Distance: 0.413008
True positions: (-1.111553,-0.424746)
Reached: (-0.708131,-0.434332)

Distance: 0.160070
True positions: (0.187228,-0.845948)
Reached: (0.335622,-0.857624)

Distance: 0.014350
True positions: (-0.234847,-0.743126)
Reached: (-0.222865,-0.740759)

Distance: 0.218572
True positions: (0.017750,-0.083022)
Reached: (0.066215,-0.253128)

Distance: 0.137725
True positions: (-0.973785,-0.580600)
Reached: (-1.055013,-0.637097)

Distance: 0.063488
True positions: (-0.050015,-0.270118)
Reached: (-0.033164,-0.316754)

Distance: 0.145679
True positions: (-0.577268,-0.582374)
Reached: (-0.706824,-0.598496)

Distance: 0.040211
True positions: (-0.849001,-0.538548)
Reached: (-0.888245,-0.537582)

Distance: 0.242493
True positions: (0.150636,-0.690794)
Reached: (0.286809,-0.584474)
Episode 2: reward: -283.193, steps: 1000

Distance: 1.110902
True positions: (-0.911919,-0.467695)
Reached: (-1.564734,-0.009608)

Distance: 0.187479
True positions: (-0.912437,-0.926943)
Reached: (-0.870454,-1.072439)

Distance: 0.692823
True positions: (-1.166263,-0.060925)
Reached: (-0.728677,-0.316163)

Distance: 0.121699
True positions: (0.069839,-0.985841)
Reached: (0.183458,-0.977761)

Distance: 0.362258
True positions: (-0.728851,-0.079597)
Reached: (-0.620598,-0.333603)

Distance: 0.305197
True positions: (-0.359902,-0.059023)
Reached: (-0.353848,-0.358166)

Distance: 0.177422
True positions: (-0.852094,-0.251452)
Reached: (-0.762637,-0.339417)

Distance: 0.168020
True positions: (0.197237,-0.513982)
Reached: (0.287570,-0.436296)

Distance: 0.113966
True positions: (-0.398864,-0.999874)
Reached: (-0.479416,-0.966461)

Distance: 0.052240
True positions: (-0.201805,-0.521741)
Reached: (-0.168578,-0.502727)
Episode 3: reward: -306.522, steps: 1000

Distance: 1.525887
True positions: (-0.395860,-0.363372)
Reached: (-1.565560,-0.007185)

Distance: 0.075686
True positions: (-0.862836,-0.487829)
Reached: (-0.796294,-0.478685)

Distance: 0.254760
True positions: (-0.630002,-0.431469)
Reached: (-0.768987,-0.547243)

Distance: 0.149572
True positions: (-0.852366,-0.310090)
Reached: (-0.896241,-0.415787)

Distance: 0.255215
True positions: (-0.690679,-0.037590)
Reached: (-0.661083,-0.263208)

Distance: 0.078589
True positions: (0.115092,-0.366882)
Reached: (0.160598,-0.333799)

Distance: 0.041458
True positions: (-0.237613,-0.866449)
Reached: (-0.246931,-0.834309)

Distance: 0.078235
True positions: (-0.091389,-0.404260)
Reached: (-0.055849,-0.446955)

Distance: 0.110965
True positions: (-0.014442,-0.979859)
Reached: (0.075633,-0.958968)

Distance: 0.159591
True positions: (0.223128,-0.570212)
Reached: (0.298659,-0.486151)
Episode 4: reward: -238.329, steps: 1000

Distance: 1.908310
True positions: (-0.300740,-0.643097)
Reached: (-1.567860,-0.001908)

Distance: 0.173744
True positions: (0.154661,-0.377354)
Reached: (0.255312,-0.450447)

Distance: 0.139088
True positions: (-0.818098,-0.352755)
Reached: (-0.855547,-0.454393)

Distance: 0.193557
True positions: (-0.676368,-0.192755)
Reached: (-0.587963,-0.297906)

Distance: 0.157934
True positions: (-0.810920,-0.369649)
Reached: (-0.867354,-0.471150)

Distance: 0.032535
True positions: (-0.838878,-0.567574)
Reached: (-0.836916,-0.537001)

Distance: 0.051300
True positions: (-0.155710,-0.606622)
Reached: (-0.115568,-0.595464)

Distance: 0.009658
True positions: (-0.148966,-0.381940)
Reached: (-0.143162,-0.385795)

Distance: 0.117823
True positions: (-0.476770,-0.817293)
Reached: (-0.591291,-0.820595)

Distance: 0.161973
True positions: (-0.971441,-0.233340)
Reached: (-0.881267,-0.305139)
Episode 5: reward: -198.009, steps: 1000
Out[9]:
<keras.callbacks.History at 0x7f8aa0777890>