In the previous notebook, we learned how to use hyperparameter tuning to help DQN agents balance a pole on a cart. In this notebook, we'll explore two other types of algorithms: Policy Gradients and A2C.
Hyperparameter tuning takes some time; in this case, it can take anywhere from 10 to 30 minutes. If this hasn't been done already, run the cell below to kick off the training job now. We'll step through what the code is doing while our agents learn.
In [ ]:
%%bash
BUCKET=<your-bucket-here> # Change to your bucket name
JOB_NAME=pg_on_gcp_$(date -u +%y%m%d_%H%M%S)
REGION='us-central1' # Change to your bucket region
IMAGE_URI=gcr.io/cloud-training-prod-bucket/pg:latest
gcloud ai-platform jobs submit training $JOB_NAME \
--staging-bucket=gs://$BUCKET \
--region=$REGION \
--master-image-uri=$IMAGE_URI \
--scale-tier=BASIC_GPU \
--job-dir=gs://$BUCKET/$JOB_NAME \
--config=templates/hyperparam.yaml
Thankfully, we can use the same environment for these algorithms as we did for DQN, so this notebook will focus less on the operational work of feeding our agents data and more on the theory behind these algorithms. Let's start by loading our libraries and environment.
In [1]:
import gym
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import backend as K
CLIP_EDGE = 1e-8
def print_state(state, step, reward=None):
format_string = 'Step {0} - Cart X: {1:.3f}, Cart V: {2:.3f}, Pole A: {3:.3f}, Pole V:{4:.3f}, Reward:{5}'
print(format_string.format(step, *tuple(state), reward))
env = gym.make('CartPole-v0')
Whereas Q-learning attempts to assign each state a value, Policy Gradients tries to find good actions directly, increasing or decreasing the probability of taking an action depending on how an episode plays out.
To compare, Q-learning has a table that keeps track of the value of each combination of state and action:
| | Meal | Snack | Wait |
|---|---|---|---|
| Hangry | 1 | .5 | -1 |
| Hungry | .5 | 1 | 0 |
| Full | -1 | -.5 | 1.5 |
For Policy Gradients, we can imagine a similar table, but instead of recording values, we'll keep track of the probability of taking the column action given the row state.
| | Meal | Snack | Wait |
|---|---|---|---|
| Hangry | 70% | 20% | 10% |
| Hungry | 30% | 50% | 20% |
| Full | 5% | 15% | 80% |
With Q-learning, whenever we take one step in our environment, we can update the value of the old state based on the value of the new state, plus any rewards we picked up, using the Q equation:
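$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $\alpha$ is the learning rate and $\gamma$ is the discount rate.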
Could we do the same thing if we had a table of probabilities instead of values? No, because we don't have a way to calculate the value of each state from our table. Instead, we'll use a different Temporal Difference Learning strategy.
Q-learning is an evolution of TD(0), and for Policy Gradients, we'll use TD(1). We'll calculate TD(1) across an entire episode and use it to indicate whether to increase or decrease the probability corresponding to the action we took. Let's look at a full day of eating.
| Hour | State | Action | Reward |
|---|---|---|---|
| 9 | Hangry | Wait | -.9 |
| 10 | Hangry | Meal | 1.2 |
| 11 | Full | Wait | .5 |
| 12 | Full | Snack | -.6 |
| 13 | Full | Wait | 1 |
| 14 | Full | Wait | .6 |
| 15 | Full | Wait | .2 |
| 16 | Hungry | Wait | 0 |
| 17 | Hungry | Meal | .4 |
| 18 | Full | Wait | .5 |
We'll work backwards from the end of the day, using the same discount, or `gamma`, as we did with DQNs. The `total_rewards` variable is equivalent to the value of state prime. Using the Bellman Equation, every time we calculate the value of a state, $s_t$, we'll set that as the value of state prime for the state before it, $s_{t-1}$.
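In other words, the discounted return at step $t$ is built backwards from the return at the step after it:

$$G_t = r_t + \gamma \, G_{t+1}$$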
In [2]:
test_gamma = .5 # Please change me to be between zero and one
episode_rewards = [-.9, 1.2, .5, -.6, 1, .6, .2, 0, .4, .5]
def discount_episode(rewards, gamma):
discounted_rewards = np.zeros_like(rewards)
total_rewards = 0
for t in reversed(range(len(rewards))):
total_rewards = rewards[t] + total_rewards * gamma
discounted_rewards[t] = total_rewards
return discounted_rewards
discount_episode(episode_rewards, test_gamma)
Out[2]:
Wherever our discounted reward is positive, we'll increase the probability corresponding to the action we took. Similarly, wherever our discounted reward is negative, we'll decrease the probability.
However, with this strategy, any action with a positive reward will have its probability increased, not just the optimal action. This puts us in a feedback loop: actions that are already likely get picked more often, which can further increase their probability even when they're not the best choice. To counter this, we'll divide the size of each increase by the probability of choosing the corresponding action, which slows the growth of popular actions and gives other actions a chance.
Here is the update rule for our network's weights, $\theta$, where alpha ($\alpha$) is our learning rate and pi ($\pi$) is our policy, or the probability of taking the optimal action, $a^*$, given our current state, $s$:
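$$\theta \leftarrow \theta + \alpha \, \frac{\nabla_\theta \, \pi_\theta(a^* \mid s)}{\pi_\theta(a^* \mid s)}$$

where $\nabla_\theta$ is the gradient with respect to the network's weights.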
Doing some fancy calculus, we can combine the numerator and denominator with a log function. Since it's not clear what the optimal action actually is, we'll instead use our discounted rewards, or $G$, to increase or decrease the weights of the action the agent took. A full breakdown of the math can be found in this article by Chris Yoon.
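Putting these together, the update we apply in practice is:

$$\theta \leftarrow \theta + \alpha \, G \, \nabla_\theta \log \pi_\theta(a \mid s)$$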
Below is what it looks like in code. `y_true` is the one-hot encoding of the action that was taken, and `y_pred` is the probability of taking each action given the state the agent was in.
In [3]:
def custom_loss(y_true, y_pred):
y_pred_clipped = K.clip(y_pred, CLIP_EDGE, 1-CLIP_EDGE)
log_likelihood = y_true * K.log(y_pred_clipped)
return K.sum(-log_likelihood*g)
We won't have the discounted rewards, or `g`, when our agent is acting in the environment. No problem: we'll have one neural network with two pathways. One pathway, `predict`, outputs the probability of taking each action given an input state. It's only used for prediction, not for backpropagation. The other pathway, `policy`, takes both a state and a discounted reward, so it can be used for training.
The code in its entirety looks like this. As with Deep Q Networks, the hidden layers of a Policy Gradient network can use a CNN if the input state is pixels, but the last layer is typically a Dense layer with a Softmax activation function to convert the output into probabilities.
In [4]:
def build_networks(
state_shape, action_size, learning_rate, hidden_neurons):
"""Creates a Policy Gradient Neural Network.
Creates a two hidden-layer Policy Gradient Neural Network. The loss
function is altered to be a log-likelihood function weighted
by the discounted reward, g.
Args:
        state_shape: a tuple of ints representing the observation space.
action_size (int): the number of possible actions.
        learning_rate (float): the neural network's learning rate.
hidden_neurons (int): the number of neurons to use per hidden
layer.
"""
state_input = layers.Input(state_shape, name='frames')
g = layers.Input((1,), name='g')
hidden_1 = layers.Dense(hidden_neurons, activation='relu')(state_input)
hidden_2 = layers.Dense(hidden_neurons, activation='relu')(hidden_1)
probabilities = layers.Dense(action_size, activation='softmax')(hidden_2)
def custom_loss(y_true, y_pred):
y_pred_clipped = K.clip(y_pred, CLIP_EDGE, 1-CLIP_EDGE)
log_lik = y_true*K.log(y_pred_clipped)
return K.sum(-log_lik*g)
policy = models.Model(
inputs=[state_input, g], outputs=[probabilities])
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
policy.compile(loss=custom_loss, optimizer=optimizer)
predict = models.Model(inputs=[state_input], outputs=[probabilities])
return policy, predict
Let's get a taste of how these networks function. Run the below cell to build our test networks.
In [5]:
space_shape = env.observation_space.shape
action_size = env.action_space.n
# Feel free to play with these
test_learning_rate = .2
test_hidden_neurons = 10
test_policy, test_predict = build_networks(
space_shape, action_size, test_learning_rate, test_hidden_neurons)
We can't use the policy network until we build our learning function, but we can feed a state to the predict network to see our chances of picking each action.
In [6]:
state = env.reset()
test_predict.predict(np.expand_dims(state, axis=0))
Out[6]:
Right now, the numbers should be close to `[.5, .5]`, with a little variance due to the random initialization of the weights and the cart's starting position. In order to train, we'll need some memories to train on. The memory buffer here is simpler than DQN's, as we don't have to worry about random sampling. We'll clear the buffer every time we train, since we only hold one episode's worth of memory.
In [7]:
class Memory():
"""Sets up a memory replay buffer for Policy Gradient methods.
Args:
gamma (float): The "discount rate" used to assess TD(1) values.
"""
def __init__(self, gamma):
self.buffer = []
self.gamma = gamma
def add(self, experience):
"""Adds an experience into the memory buffer.
Args:
experience: a (state, action, reward) tuple.
"""
self.buffer.append(experience)
def sample(self):
"""Returns the list of episode experiences and clears the buffer.
Returns:
            (list): A tuple of lists with structure [
                [states], [actions], [rewards]
            ]
"""
batch = np.array(self.buffer).T.tolist()
states_mb = np.array(batch[0], dtype=np.float32)
actions_mb = np.array(batch[1], dtype=np.int8)
rewards_mb = np.array(batch[2], dtype=np.float32)
self.buffer = []
return states_mb, actions_mb, rewards_mb
Let's make a fake buffer to get a sense of the data we'll be training on. The cell below initializes our memory and runs through one episode of the game by alternating pushing the cart left and right.
Try running it to see the data we'll be using for training.
In [8]:
test_memory = Memory(test_gamma)
actions = [x % 2 for x in range(200)]
state = env.reset()
step = 0
episode_reward = 0
done = False
while not done and step < len(actions):
action = actions[step] # In the future, our agents will define this.
state_prime, reward, done, info = env.step(action)
episode_reward += reward
test_memory.add((state, action, reward))
step += 1
state = state_prime
test_memory.sample()
Out[8]:
OK, time to start putting together the agent! Let's start by giving it the ability to act. Here, we don't need to worry about exploration versus exploitation, because we already have a random chance to take each of our actions. As the agent learns, it will naturally shift from exploration to exploitation. How convenient!
In [9]:
class Partial_Agent():
"""Sets up a reinforcement learning agent to play in a game environment."""
def __init__(self, policy, predict, memory, action_size):
"""Initializes the agent with Policy Gradient networks
and memory sub-classes.
Args:
policy: The policy network created from build_networks().
predict: The predict network created from build_networks().
memory: A Memory class object.
action_size (int): The number of possible actions to take.
"""
self.policy = policy
self.predict = predict
self.action_size = action_size
self.memory = memory
def act(self, state):
"""Selects an action for the agent to take given a game state.
Args:
state (list of numbers): The state of the environment to act on.
Returns:
(int) The index of the action to take.
"""
        # Sample an action according to the probabilities from the predict network.
state_batch = np.expand_dims(state, axis=0)
probabilities = self.predict.predict(state_batch)[0]
action = np.random.choice(self.action_size, p=probabilities)
return action
Let's see the act function in action. First, let's build our agent.
In [10]:
test_agent = Partial_Agent(test_policy, test_predict, test_memory, action_size)
Next, run the below cell a few times to test the `act` method. Is it about a 50/50 chance to push right instead of left?
In [11]:
action = test_agent.act(state)
print("Push Right" if action else "Push Left")
Now for the most important part. We need to give our agent a way to learn! To start, we'll one-hot encode our actions. Since the output of our network is a probability for each action, we'll have a 1 corresponding to the action that was taken and 0's for the actions we didn't take.
That doesn't give our agent enough information on whether the action it took was actually a good idea, so we'll also use our `discount_episode` function to calculate the TD(1) value of each step within the episode.
One thing to note is that CartPole doesn't have any negative rewards, meaning that even if it does terribly, the agent will still think the run went well. To help counter this, we'll take the mean and standard deviation of our discounted rewards, or `discount_mb`, and use them to find the Standard Score for each discounted reward. With this, steps close to dropping the pole will have a negative reward.
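In equation form, each discounted reward $G_t$ becomes:

$$\hat{G}_t = \frac{G_t - \mu_G}{\sigma_G}$$

where $\mu_G$ and $\sigma_G$ are the mean and standard deviation of the episode's discounted rewards.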
In [12]:
def learn(self, print_variables=False):
"""Trains a Policy Gradient policy network based on stored experiences."""
state_mb, action_mb, reward_mb = self.memory.sample()
    # One-hot encode actions
actions = np.zeros([len(action_mb), self.action_size])
actions[np.arange(len(action_mb)), action_mb] = 1
if print_variables:
print("action_mb:", action_mb)
print("actions:", actions)
# Apply TD(1) and normalize
discount_mb = discount_episode(reward_mb, self.memory.gamma)
discount_mb = (discount_mb - np.mean(discount_mb)) / np.std(discount_mb)
if print_variables:
print("reward_mb:", reward_mb)
print("discount_mb:", discount_mb)
return self.policy.train_on_batch([state_mb, discount_mb], actions)
Partial_Agent.learn = learn
test_agent = Partial_Agent(test_policy, test_predict, test_memory, action_size)
Try adding in some print statements to the code above to get a sense of how the data is transformed before feeding it into the model, then run the below code to see it in action.
In [13]:
state = env.reset()
done = False
while not done:
action = test_agent.act(state)
state_prime, reward, done, _ = env.step(action)
test_agent.memory.add((state, action, reward)) # New line here
state = state_prime
test_agent.learn(print_variables=True)
Out[13]:
Finally, it's time to put it all together. Policy Gradient networks have fewer hyperparameters to tune than DQNs, but since our custom loss constructs a TensorFlow graph under the hood, we'll set up lazy execution by wrapping our training steps in a default graph.
By changing `test_gamma`, `test_learning_rate`, and `test_hidden_neurons`, can you help the agent reach a score of 200 within 200 episodes? It takes a little bit of thinking and a little bit of luck.
Hover the cursor over this bold text to see a solution to the challenge.
In [14]:
test_gamma = .5
test_learning_rate = .01
test_hidden_neurons = 100
with tf.Graph().as_default():
test_memory = Memory(test_gamma)
test_policy, test_predict = build_networks(
space_shape, action_size, test_learning_rate, test_hidden_neurons)
test_agent = Partial_Agent(test_policy, test_predict, test_memory, action_size)
for episode in range(200):
state = env.reset()
episode_reward = 0
done = False
while not done:
action = test_agent.act(state)
state_prime, reward, done, info = env.step(action)
episode_reward += reward
test_agent.memory.add((state, action, reward))
state = state_prime
test_agent.learn()
print("Episode", episode, "Score =", episode_reward)
Now that we have the hang of Policy Gradients, let's combine this strategy with Deep Q Agents. We'll have one architecture to rule them all!
Below is the setup for our neural networks. There are plenty of ways to go about combining the two strategies; we'll focus on one variant called A2C, or Advantage Actor Critic.
Here's the philosophy: We'll use our critic pathway to estimate the value of a state, or V(s). Given a state-action-new state transition, we can use our critic and the Bellman Equation to calculate the discounted value of the new state, or r + γ * V(s').
Like with DQNs, this discounted value is the label the critic will train on. While that is happening, we can subtract V(s) from the discounted value of the new state to get the advantage, or A(s,a). In human terms: how much value did the agent's action add? This is what the actor, or the policy gradient portion of our network, will train on.
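In equation form, the advantage is:

$$A(s, a) = r + \gamma \, V(s') - V(s)$$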
Too long, didn't read: the critic's job is to learn how to assess the value of a state. The actor's job is to assign probabilities to its available actions such that it increases its chance of moving into a higher-valued state.
Below is our new `build_networks` function. Each line has been tagged with whether it comes from Deep Q Networks (`# DQN`), Policy Gradients (`# PG`), or is something new (`# New`).
In [15]:
def build_networks(state_shape, action_size, learning_rate, critic_weight,
hidden_neurons, entropy):
"""Creates Actor Critic Neural Networks.
Creates a two hidden-layer Policy Gradient Neural Network. The loss
function is altered to be a log-likelihood function weighted
by an action's advantage.
Args:
        state_shape: a tuple of ints representing the observation space.
action_size (int): the number of possible actions.
        learning_rate (float): the neural network's learning rate.
critic_weight (float): how much to weigh the critic's training loss.
hidden_neurons (int): the number of neurons to use per hidden layer.
        entropy (float): how much to encourage exploration versus exploitation.
"""
state_input = layers.Input(state_shape, name='frames')
advantages = layers.Input((1,), name='advantages') # PG, A instead of G
# PG
actor_1 = layers.Dense(hidden_neurons, activation='relu')(state_input)
actor_2 = layers.Dense(hidden_neurons, activation='relu')(actor_1)
probabilities = layers.Dense(action_size, activation='softmax')(actor_2)
# DQN
critic_1 = layers.Dense(hidden_neurons, activation='relu')(state_input)
critic_2 = layers.Dense(hidden_neurons, activation='relu')(critic_1)
values = layers.Dense(1, activation='linear')(critic_2)
def actor_loss(y_true, y_pred): # PG
y_pred_clipped = K.clip(y_pred, CLIP_EDGE, 1-CLIP_EDGE)
log_lik = y_true*K.log(y_pred_clipped)
entropy_loss = y_pred * K.log(K.clip(y_pred, CLIP_EDGE, 1-CLIP_EDGE)) # New
return K.sum(-log_lik * advantages) - (entropy * K.sum(entropy_loss))
# Train both actor and critic at the same time.
actor = models.Model(
inputs=[state_input, advantages], outputs=[probabilities, values])
actor.compile(
loss=[actor_loss, 'mean_squared_error'], # [PG, DQN]
loss_weights=[1, critic_weight], # [PG, DQN]
optimizer=tf.keras.optimizers.Adam(lr=learning_rate))
critic = models.Model(inputs=[state_input], outputs=[values])
policy = models.Model(inputs=[state_input], outputs=[probabilities])
return actor, critic, policy
The above is one way to go about combining the two algorithms. Here, we're combining the training of both pathways into one operation. Keras allows training against multiple outputs, and each output can have its own loss function, as we have above. When minimizing the loss, Keras takes the weighted sum of all the losses, with the weights provided in `loss_weights`. The `critic_weight` is now another hyperparameter for us to tune.
We could even have completely separate networks for the actor and the critic; that type of design choice is problem dependent. Sharing nodes and training between the two is more efficient per batch, but more complicated problems could justify keeping the two separate.
The loss function we used here is also slightly different than the one for Policy Gradients. Let's take a look.
In [16]:
def actor_loss(y_true, y_pred): # PG
y_pred_clipped = K.clip(y_pred, 1e-8, 1-1e-8)
log_lik = y_true*K.log(y_pred_clipped)
entropy_loss = y_pred * K.log(K.clip(y_pred, 1e-8, 1-1e-8)) # New
return K.sum(-log_lik * advantages) - (entropy * K.sum(entropy_loss))
We've added a new tool called entropy. We're calculating the log-likelihood again, but instead of comparing the probabilities of our actions against the action that was taken, we're calculating it for the probabilities of our actions against themselves.
Certainly a mouthful, but the idea is to encourage exploration: if a probability prediction is very confident (close to 1), its entropy contribution will be close to 0. Similarly, if a probability isn't confident at all (close to 0), its entropy contribution will again be near zero. Anywhere in between, the entropy is non-zero. This encourages exploration over exploitation, as the entropy term discourages overconfident predictions.
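To get a feel for the numbers, here's a small standalone sketch (plain NumPy, separate from the agent code) comparing the entropy of a confident prediction with an uncertain one:
In [ ]:
import numpy as np

def distribution_entropy(probs, clip=1e-8):
    """Shannon entropy of a probability vector: -sum(p * log(p))."""
    probs = np.clip(probs, clip, 1 - clip)
    return -np.sum(probs * np.log(probs))

print(distribution_entropy(np.array([0.99, 0.01])))  # Confident prediction: entropy near 0 (~0.06).
print(distribution_entropy(np.array([0.50, 0.50])))  # Uncertain prediction: maximum entropy (~0.69).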
Now that the networks are out of the way, let's look at the `Memory`. We could go with Experience Replay, like with DQNs, or we could calculate TD(1), like with Policy Gradients. This time, we'll do something in between: we'll give our memory a `batch_size`. Once there are enough experiences in the buffer, we'll use all of them to train and then clear the buffer to start fresh.
In order to speed up training, instead of recording `state_prime`, we'll record the value of state prime in `state_prime_values` or `next_values`. This will give us enough information to calculate the discounted values and advantages.
In [17]:
class Memory():
"""Sets up a memory replay for actor-critic training.
Args:
gamma (float): The "discount rate" used to assess state values.
batch_size (int): The number of elements to include in the buffer.
"""
def __init__(self, gamma, batch_size):
self.buffer = []
self.gamma = gamma
self.batch_size = batch_size
def add(self, experience):
"""Adds an experience into the memory buffer.
Args:
experience: (state, action, reward, state_prime_value, done) tuple.
"""
self.buffer.append(experience)
def check_full(self):
return len(self.buffer) >= self.batch_size
def sample(self):
"""Returns formated experiences and clears the buffer.
Returns:
(list): A tuple of lists with structure [
                [states], [actions], [rewards], [dones], [state_prime_values]
]
"""
# Columns have different data types, so numpy array would be awkward.
batch = np.array(self.buffer).T.tolist()
states_mb = np.array(batch[0], dtype=np.float32)
actions_mb = np.array(batch[1], dtype=np.int8)
rewards_mb = np.array(batch[2], dtype=np.float32)
dones_mb = np.array(batch[3], dtype=np.int8)
value_mb = np.squeeze(np.array(batch[4], dtype=np.float32))
self.buffer = []
return states_mb, actions_mb, rewards_mb, dones_mb, value_mb
OK, time to build out the agent! The `act` method is exactly the same as it was for Policy Gradients. Nice! The `learn` method is where things get interesting. We'll find the discounted value of the next state, like we did for DQN, to train our critic. We'll then subtract the value of the current state from that discounted value to find the advantage, which is what the actor will train on.
In [18]:
class Agent():
"""Sets up a reinforcement learning agent to play in a game environment."""
def __init__(self, actor, critic, policy, memory, action_size):
"""Initializes the agent with DQN and memory sub-classes.
Args:
network: A neural network created from deep_q_network().
memory: A Memory class object.
epsilon_decay (float): The rate at which to decay random actions.
action_size (int): The number of possible actions to take.
"""
self.actor = actor
self.critic = critic
self.policy = policy
self.action_size = action_size
self.memory = memory
def act(self, state):
"""Selects an action for the agent to take given a game state.
Args:
state (list of numbers): The state of the environment to act on.
Returns:
(int) The index of the action to take.
"""
        # Sample an action according to the probabilities from the policy network.
state_batch = np.expand_dims(state, axis=0)
probabilities = self.policy.predict(state_batch)[0]
action = np.random.choice(self.action_size, p=probabilities)
return action
def learn(self, print_variables=False):
"""Trains the Deep Q Network based on stored experiences."""
gamma = self.memory.gamma
experiences = self.memory.sample()
state_mb, action_mb, reward_mb, dones_mb, next_value = experiences
        # One-hot encode actions
actions = np.zeros([len(action_mb), self.action_size])
actions[np.arange(len(action_mb)), action_mb] = 1
#Apply TD(0)
discount_mb = reward_mb + next_value * gamma * (1 - dones_mb)
state_values = self.critic.predict([state_mb])
advantages = discount_mb - np.squeeze(state_values)
if print_variables:
print("discount_mb", discount_mb)
print("next_value", next_value)
print("state_values", state_values)
print("advantages", advantages)
else:
self.actor.train_on_batch(
[state_mb, advantages], [actions, discount_mb])
Run the below cell to initialize an agent, and the cell after that to see the variables used for training. Since it's early, the critic hasn't learned to estimate values yet, and the advantages are mostly positive because of it.
Once the critic has learned how to properly assess states, the actor will start to see negative advantages. Try playing around with the variables to help the agent see this change sooner.
In [19]:
# Change me please.
test_gamma = .9
test_batch_size = 32
test_learning_rate = .02
test_hidden_neurons = 50
test_critic_weight = 0.5
test_entropy = 0.0001
test_memory = Memory(test_gamma, test_batch_size)
test_actor, test_critic, test_policy = build_networks(
space_shape, action_size,
test_learning_rate, test_critic_weight,
test_hidden_neurons, test_entropy)
test_agent = Agent(
test_actor, test_critic, test_policy, test_memory, action_size)
In [20]:
state = env.reset()
episode_reward = 0
done = False
while not done:
action = test_agent.act(state)
state_prime, reward, done, _ = env.step(action)
episode_reward += reward
next_value = test_agent.critic.predict([[state_prime]])
test_agent.memory.add((state, action, reward, done, next_value))
state = state_prime
test_agent.learn(print_variables=True)
Have a set of variables you're happy with? Ok, time to shine! Run the below cell to see how the agent trains.
In [21]:
with tf.Graph().as_default():
test_memory = Memory(test_gamma, test_batch_size)
test_actor, test_critic, test_policy = build_networks(
space_shape, action_size,
test_learning_rate, test_critic_weight,
test_hidden_neurons, test_entropy)
test_agent = Agent(
test_actor, test_critic, test_policy, test_memory, action_size)
for episode in range(200):
state = env.reset()
episode_reward = 0
done = False
while not done:
action = test_agent.act(state)
state_prime, reward, done, _ = env.step(action)
episode_reward += reward
next_value = test_agent.critic.predict([[state_prime]])
test_agent.memory.add((state, action, reward, done, next_value))
#if test_agent.memory.check_full():
#test_agent.learn(print_variables=True)
state = state_prime
test_agent.learn()
print("Episode", episode, "Score =", episode_reward)
Any luck? No sweat if not! It turns out that by combining the power of both algorithms, we also combined some of their setbacks. For instance, Actor-Critic can fall into local minima like Policy Gradients, and it has a large number of hyperparameters to tune like DQNs.
Time to check how our agents did in the cloud! Any lucky winners? Find it in your bucket to watch a recording of it playing.
Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.