While RL algorithms require a reward signal to be given to the agent at every timestep, ES algorithms only care about the final cumulative reward that an agent collects at the end of its rollout in an environment. In many problems we only know the outcome at the end of the task, such as whether the agent wins or loses, whether the robot arm picks up the object, or whether the agent has survived. These are the problems where ES may have an advantage over traditional RL.[1]
In [5]:
import gym
In [3]:
# taken from [1]
def rollout(agent, env):
    # run one full episode and return the total (episodic) reward
    obs = env.reset()
    done = False
    total_reward = 0
    while not done:
        a = agent.get_action(obs)
        obs, reward, done, info = env.step(a)  # gym's step returns a 4-tuple
        total_reward += reward
    return total_reward
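The rollout function above only assumes that the agent exposes a get_action method. As a minimal sketch of that interface (the LinearAgent class below is a hypothetical stand-in, not taken from [1]), a simple linear policy over discrete actions could look like:
In [ ]:
import numpy as np

# Hypothetical agent: a linear policy parameterized by a weight matrix W.
# rollout above only relies on the get_action(obs) interface.
class LinearAgent:
    def __init__(self, obs_dim, n_actions):
        self.W = np.random.randn(obs_dim, n_actions)

    def get_action(self, obs):
        # score each discrete action linearly and pick the best one
        return int(np.argmax(obs @ self.W))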
In [6]:
env = gym.make('CartPole-v1')  # swapped in a registered env so the code runs; 'worlddomination-v0' is not in gym's registry
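With an environment and a rollout function in hand, an ES outer loop treats the episodic return as the fitness of a parameter vector. The sketch below is an illustrative simple Gaussian-perturbation ES, in the spirit of [1], but the population size, noise scale sigma, and step size alpha are arbitrary choices, not values from the source:
In [ ]:
import numpy as np

def es_step(agent, env, pop_size=50, sigma=0.1, alpha=0.01):
    # one generation of a simple evolution strategy:
    # the fitness of each perturbed parameter vector is the total reward of one rollout
    theta = agent.W.copy()
    noise = np.random.randn(pop_size, *theta.shape)
    fitness = np.zeros(pop_size)
    for i in range(pop_size):
        agent.W = theta + sigma * noise[i]   # perturb the parameters
        fitness[i] = rollout(agent, env)     # ES only sees the final cumulative reward
    # normalize fitness and move the parameters toward the better perturbations
    advantage = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    agent.W = theta + alpha / (pop_size * sigma) * np.einsum('i,ijk->jk', advantage, noise)
    return fitness.mean()

agent = LinearAgent(env.observation_space.shape[0], env.action_space.n)
for generation in range(10):
    print(generation, es_step(agent, env))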