Project :: Evolution Strategies

Remember the idea behind Evolution Strategies? Here's a neat blog post about 'em.

Can you reproduce their success? You will have to implement evolution strategies and see how well they work.
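
As a refresher, the core ES update is: sample Gaussian perturbations of the current policy weights, evaluate each perturbed policy, and step the weights toward the perturbations that scored higher. Here's a minimal single-threaded sketch of one update; evaluate is a hypothetical stand-in for a function that plays one session with the given weights and returns the total reward:

In [ ]:
import numpy as np


def es_step(weights, evaluate, n_samples=50, sigma=0.1, alpha=0.01):
    """One ES update on a flat 1-D weight vector."""
    # sample n_samples Gaussian perturbation directions
    epsilons = np.random.randn(n_samples, len(weights))
    # evaluate each perturbed policy (evaluate is assumed, not provided here)
    rewards = np.array([evaluate(weights + sigma * eps) for eps in epsilons])
    # normalize rewards so the step size doesn't depend on the reward scale
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # move weights along the reward-weighted average of the perturbations
    return weights + alpha / (n_samples * sigma) * (epsilons.T @ rewards)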

This project is optional; it has several milestones, each worth a number of points [and swag].

Milestones:

  • [10pts] A basic prototype of evolution strategies that works in one thread on CartPole
  • [+5pts] Modify the code to make it work in parallel
  • [+5pts] Run ES in a distributed setting on at least two machines
  • [+10pts] Apply ES to Atari Pong and play at least better than random
  • [++] Additional points for all kinds of cool stuff besides milestones

Rules:

  • This is not a mandatory assignment, but it's a way to learn some cool things if you're getting bored with the default assignments.
  • Once you decide to take on this project, please tell any of the course staff members so that we can help you if you get stuck.
  • There's a default implementation of ES in this openai repo. It's okay to look there if you get stuck or want to compare your solutions, but you should thoroughly understand every chunk of code you copy-paste. We'll test that with questions.

Tips on implementation

  • It will be very convenient later if you implement a function that takes policy weights, generates a session, and returns the resulting policy changes -- you can then run a bunch of these in parallel (see the sketch after this list).

  • The simplest way to do multiprocessing is joblib.

  • With joblib, make sure random variables are independent in each job: simply add np.random.seed() at the beginning of your "job" function.
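
Putting these tips together, here's a minimal sketch of a parallel perturb-and-evaluate loop. The linear CartPole policy inside run_session is just a toy stand-in so the example runs; the structure is the point, not the policy:

In [ ]:
import gym
import numpy as np
from joblib import Parallel, delayed


def run_session(weights):
    """Play one CartPole session with a linear threshold policy (a toy stand-in)."""
    env = gym.make("CartPole-v0")
    s, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = int(weights @ s > 0)
        s, r, done, _ = env.step(action)
        total_reward += r
    return total_reward


def sample_and_evaluate(weights, sigma=0.1):
    # reseed so each job draws an independent perturbation
    np.random.seed()
    epsilon = np.random.randn(len(weights))
    return epsilon, run_session(weights + sigma * epsilon)


weights = np.zeros(4)  # CartPole observations have 4 dimensions
results = Parallel(n_jobs=4)(
    delayed(sample_and_evaluate)(weights) for _ in range(50))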

Later, once you go distributed, you may need storage that gathers gradients from all workers. In that case we recommend Redis due to its simplicity.

Here's a speed-optimized saver/loader that stores numpy arrays in Redis as byte strings.


In [ ]:
import joblib
from io import BytesIO


def dumps(data):
    """Serialize an object (e.g. a numpy array) into a byte string."""
    s = BytesIO()
    joblib.dump(data, s)
    return s.getvalue()


def loads(string):
    """Deserialize whatever was dumps'ed into the given byte string."""
    return joblib.load(BytesIO(string))
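
For example, a parameter server and a worker could exchange arrays like this (a sketch assuming a Redis server on localhost at the default port and the redis-py package; the key names are made up for illustration):

In [ ]:
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

# server side: publish the current policy weights
r.set("weights", dumps(np.zeros(100)))

# worker side: fetch the weights, push back a (perturbation, reward) result
weights = loads(r.get("weights"))
r.rpush("results", dumps((np.random.randn(100), 42.0)))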

Tips on Atari games

  • All the pre-processing and tuning is done for you in the code below:
    • Images are rescaled to 42x42 to speed up computation
    • We use the last 4 frames as observations to account for ball velocity
    • The code below requires pip install Image and pip install gym[atari]
    • You may also need some extra dependencies for gym[atari] -- google "gym install all dependencies" or use our pre-built environment.
  • The recommended agent architecture is a convolutional neural network; a dense network will also do (one possible architecture is sketched below).
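
For reference, here's one possible policy network as a sketch, assuming Keras; the course doesn't prescribe a framework, so treat this purely as an illustration:

In [ ]:
from tensorflow import keras
from tensorflow.keras import layers


def make_policy(n_actions):
    """A small CNN over 4 stacked 42x42 frames (channels-first input)."""
    return keras.Sequential([
        # move the frame axis last, since Conv2D defaults to channels-last
        layers.Permute((2, 3, 1), input_shape=(4, 42, 42)),
        layers.Conv2D(16, 3, strides=2, activation="relu"),
        layers.Conv2D(32, 3, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_actions),  # action scores; act greedily on argmax
    ])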

May the force be with you!


In [ ]:
from pong import make_pong
import numpy as np

env = make_pong()
print(env.action_space)

In [ ]:
# get the initial state
s = env.reset()
print(s.shape)

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
# plot the first observation: the buffer holds 4 frames, but right after reset only one distinct frame is visible
plt.imshow(s.swapaxes(1, 2).reshape(-1, s.shape[-1]).T)

In [ ]:
# next frame
new_s, r, done, _ = env.step(env.action_space.sample())
plt.imshow(new_s.swapaxes(1, 2).reshape(-1, s.shape[-1]).T)

In [ ]:
# after 10 frames
for _ in range(10):
    new_s, r, done, _ = env.step(env.action_space.sample())

plt.imshow(new_s.swapaxes(1, 2).reshape(-1, s.shape[-1]).T, vmin=0)

In [ ]:
<YOUR CODE: tons of it here or elsewhere>