Smart random exploration

This experiment compares two settings: with and without smart start. Without smart start, the agent begins every iteration in the same fixed state. With smart start, the agent begins in the visited state with the lowest kernel density estimate, i.e. the state lying in the least densely explored region of the state space.

The state with the lowest kernel density estimate is selected according to

$i^* = \underset{i \in D}{\arg\min}\ \frac{1}{|D|}\sum_{j \in D} e^{-\sum_{d = 1}^{N}(i_d - j_d)^2 / C}$

where $D$ is the set of visited states, $N$ is the dimensionality of the state space, and $C$ is a bandwidth constant of the Gaussian kernel.
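The sketch below is a rough NumPy illustration of this selection rule, not the notebook's actual implementation; the helper name smart_start_state and the bandwidth argument C are assumptions made for the example.

import numpy as np

def smart_start_state(visited_states, C=1.0):
    """Return the visited state with the lowest Gaussian kernel density estimate."""
    D = np.asarray(visited_states, dtype=np.float64)  # shape: (num_states, state_dim)
    # Pairwise squared Euclidean distances between all visited states
    diffs = D[:, None, :] - D[None, :, :]             # (num_states, num_states, state_dim)
    sq_dists = np.sum(diffs ** 2, axis=-1)            # (num_states, num_states)
    # Density of state i: mean over all j of exp(-||i - j||^2 / C)
    densities = np.mean(np.exp(-sq_dists / C), axis=1)
    return D[np.argmin(densities)]

# Example: the isolated state [2, 2] has the lowest density among these 2-D states
states = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0]]
print(smart_start_state(states))  # -> [2. 2.]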


In [1]:
import numpy as np
import tensorflow as tf
from drl.replaybuffer import ReplayBufferTF
from drl.rrtexploration import Trajectory
from drl.exploration import OrnSteinUhlenbeckNoise
import time

In [7]:
import plotly
import plotly.offline as py
from plotly.graph_objs import *
from plotly import tools
plotly.offline.init_notebook_mode()

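# Shared 3D scene layout: white grid and zero lines on a light grey background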
scene=Scene(
    xaxis=XAxis(
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)'
    ),
    yaxis=YAxis(
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)'
    ),
    zaxis=ZAxis(
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)'
    )
)