Welcome to the fourth project of the Machine Learning Engineer Nanodegree! In this notebook, template code has already been provided for you to aid in your analysis of the Smartcab and your implemented learning algorithm. You will not need to modify the included code beyond what is requested. There will be questions that you must answer which relate to the project and the visualizations provided in the notebook. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide in agent.py.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.
In this project, you will work towards constructing an optimized Q-Learning driving agent that will navigate a Smartcab through its environment towards a goal. Since the Smartcab is expected to drive passengers from one location to another, the driving agent will be evaluated on two very important metrics: Safety and Reliability. A driving agent that gets the Smartcab to its destination while running red lights or narrowly avoiding accidents would be considered unsafe. Similarly, a driving agent that frequently fails to reach the destination in time would be considered unreliable. Maximizing the driving agent's safety and reliability would ensure that Smartcabs have a permanent place in the transportation industry.
Safety and Reliability are measured using a letter-grade system as follows:
| Grade | Safety | Reliability |
|---|---|---|
| A+ | Agent commits no traffic violations, and always chooses the correct action. | Agent reaches the destination in time for 100% of trips. |
| A | Agent commits few minor traffic violations, such as failing to move on a green light. | Agent reaches the destination on time for at least 90% of trips. |
| B | Agent commits frequent minor traffic violations, such as failing to move on a green light. | Agent reaches the destination on time for at least 80% of trips. |
| C | Agent commits at least one major traffic violation, such as driving through a red light. | Agent reaches the destination on time for at least 70% of trips. |
| D | Agent causes at least one minor accident, such as turning left on green with oncoming traffic. | Agent reaches the destination on time for at least 60% of trips. |
| F | Agent causes at least one major accident, such as driving through a red light with cross-traffic. | Agent fails to reach the destination on time for at least 60% of trips. |
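The Reliability column of the table can be read as a simple threshold lookup. The sketch below is only an illustration of that reading, not the grading code shipped with the project; the function name `reliability_grade` is hypothetical, and treating every success rate below 60% as an F is an assumption, since the wording of the F row leaves that range slightly ambiguous.

```python
def reliability_grade(success_rate):
    """Map the fraction of on-time trips (0.0 to 1.0) to a letter grade."""
    if success_rate >= 1.0:
        return "A+"
    # Thresholds taken from the Reliability column of the table above.
    for threshold, grade in ((0.90, "A"), (0.80, "B"), (0.70, "C"), (0.60, "D")):
        if success_rate >= threshold:
            return grade
    # Assumption: anything below 60% on-time trips counts as an F.
    return "F"


print(reliability_grade(0.85))
```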
To assist evaluating these important metrics, you will need to load visualization code that will be used later on in the project. Run the code cell below to import this code which is required for your analysis.
In [1]:
# Import the visualization code
import visuals as vs
# Pretty display for notebooks
%matplotlib inline
Before starting to work on implementing your driving agent, it's necessary to first understand the world (environment) which the Smartcab and driving agent work in. One of the major components to building a self-learning agent is understanding the characteristics about the agent, which includes how the agent operates. To begin, simply run the agent.py agent code exactly how it is -- no need to make any additions whatsoever. Let the resulting simulation run for some time to see the various working components. Note that in the visual simulation (if enabled), the white vehicle is the Smartcab.
In a few sentences, describe what you observe during the simulation when running the default agent.py agent code. Some things you could consider:
Hint: From the /smartcab/ top-level directory (where this notebook is located), run the command 'python smartcab/agent.py'
Answer:
The game window looks like this:
The smartcab (white in color) is at an intersection in a world consisting of 8 streets and 6 cross roads. The smartcab does not move at all while the lights change from red to green to red. It looks like the driving agent goes through multiple trials. The smartcab is positioned at a new intersection at the start of each trial. The driving agent is expected to go someplace within a certain time from this starting point. Apparently the deadline is not currently enforced.
During a trial the driving agent gets positive as well as negative rewards. The rewards are positive when the agent does not move while the light is red. The rewards are negative when the agent does not move while the light is green and there is no oncoming traffic. At this time I can speculate that the smartcab gets positive rewards while stationary because it is displaying safe behavior by not moving when the light is red. It probably gets negative rewards while stationary because it is not moving when it is expected to move in the safest possible situations, such as when the light is green and there is no oncoming traffic.
At the end of the trial the cumulative rewards probably determine if the trial was a success or a failure.
In addition to understanding the world, it is also necessary to understand the code itself that governs how the world, simulation, and so on operate. Attempting to create a driving agent would be difficult without having at least explored the "hidden" devices that make everything work. In the /smartcab/ top-level directory, there are two folders: /logs/ (which will be used later) and /smartcab/. Open the /smartcab/ folder and explore each Python file included, then answer the following question.
* In the agent.py Python file, choose three flags that can be set and explain how they change the simulation.
* In the environment.py Python file, what Environment class function is called when an agent performs an action?
* In the simulator.py Python file, what is the difference between the 'render_text()' function and the 'render()' function?
* In the planner.py Python file, will the 'next_waypoint()' function consider the North-South or East-West direction first?

Answer:
Three of the flags that can be set are learning, epsilon and alpha. The flag learning can be set to True or False. The value True forces the agent to use Q-learning, a function based on state, action and reward values. After the agent completes an action and a new state is created, the function is called and the agent receives a reward. No future rewards are considered during learning. The value of epsilon determines how much the agent can explore during learning. A value of 1 (which is the default) means take every opportunity to explore. A value of 0 would mean no exploration, just follow your training. It is expected that epsilon will not be a static scalar but will decay to a smaller value based on the number of training cycles. (One way to imagine this is that the agent is supposed to get smarter and explore less once it has gone through a large number of training cycles.) The flag alpha determines the learning rate, and its default value is 0.5, which means the agent is a faster learner than some other learner whose alpha value is, say, 0.1. (Slow learning algorithms are more confident/reliable but need more data, kind of like the tortoise: slow and steady wins the race!)
In the environment.py code file, the Environment class function act(self, agent, action) is called when an agent performs an action. This function considers an action, performs it if it is legal for the agent, and then gives the agent a reward based on traffic laws.
The first step to creating an optimized Q-Learning driving agent is getting the agent to actually take valid actions. In this case, a valid action is one of None (do nothing), 'Left' (turn left), 'Right' (turn right), or 'Forward' (go forward). For your first implementation, navigate to the 'choose_action()' agent function and make the driving agent randomly choose one of these actions. Note that you have access to several class variables that will help you write this functionality, such as 'self.learning' and 'self.valid_actions'. Once implemented, run the agent file and simulation briefly to confirm that your driving agent is taking a random action each time step.
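As a rough illustration of the random-action step described above, the following standalone sketch draws one action uniformly at random. It is not the project's 'choose_action()' method: the names `VALID_ACTIONS` and `choose_random_action` are hypothetical, and the lowercase action strings are an assumption about how the simulator spells them.

```python
import random

# The four valid actions named in the text; None means "do nothing".
# Assumption: the simulator spells the moving actions in lowercase.
VALID_ACTIONS = [None, "forward", "left", "right"]


def choose_random_action(valid_actions=VALID_ACTIONS):
    """Return one action drawn uniformly at random from the valid actions."""
    return random.choice(valid_actions)


print(choose_random_action())
```

Inside the agent class the same idea would presumably use 'self.valid_actions' in place of the module-level list.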
To obtain results from the initial simulation, you will need to adjust the following flags:
* 'enforce_deadline' - Set this to True to force the driving agent to capture whether it reaches the destination in time.
* 'update_delay' - Set this to a small value (such as 0.01) to reduce the time between steps in each trial.
* 'log_metrics' - Set this to True to log the simulation results as a .csv file in /logs/.
* 'n_test' - Set this to '10' to perform 10 testing trials.

Optionally, you may disable the visual simulation (which can make the trials go faster) by setting the 'display' flag to False. Flags that have been set here should be returned to their default setting when debugging. It is important that you understand what each flag does and how it affects the simulation!
Once you have successfully completed the initial simulation (there should have been 20 training trials and 10 testing trials), run the code cell below to visualize the results. Note that log files are overwritten when identical simulations are run, so be careful with what log file is being loaded! Run the agent.py file after setting the flags from projects/smartcab folder instead of projects/smartcab/smartcab.
In [4]:
# Load the 'sim_no-learning' log file from the initial simulation results
vs.plot_trials('sim_no-learning.csv')
Using the visualization above that was produced from your initial simulation, provide an analysis and make several observations about the driving agent. Be sure that you are making at least one observation about each panel present in the visualization. Some things you could consider:
Answer:
* History shows that the driving agent has made bad decisions about 42% to 45% of the time. None of those have caused accidents.
* The rate of reliability is zero. The low score makes sense given that the agent is not actually attempting to drive to the destination but driving randomly. I am not even sure if it is moving at all.
* The average rewards received by the agent are between -1 and -1.5. This shows the agent is getting penalized (may or may not be heavily).

As the name of the log file and the text in the graphics suggest, the agent is not actually learning at this point. It makes sense that more trials do not have any significant effect on the outcomes.
This Smartcab would not be considered safe or reliable. It has to take me to my destination reliably first. I am willing to tolerate some bad actions (more like minor violations) if they are incurred to get to the destination on time. For it to be considered reliable, the score has to be much closer to 100%. That it has a grade of B for its safety record is fortuitous for the agent, but he gets no gratuity from me ;o)
The second step to creating an optimized Q-learning driving agent is defining a set of states that the agent can occupy in the environment. Depending on the input, sensory data, and additional variables available to the driving agent, a set of states can be defined for the agent so that it can eventually learn what action it should take when occupying a state. The condition of 'if state then action' for each state is called a policy, and is ultimately what the driving agent is expected to learn. Without defining states, the driving agent would never understand which action is most optimal -- or even what environmental variables and conditions it cares about!
Inspecting the 'build_state()' agent function shows that the driving agent is given the following data from the environment:
* 'waypoint', which is the direction the Smartcab should drive leading to the destination, relative to the Smartcab's heading.
* 'inputs', which is the sensor data from the Smartcab. It includes:
  * 'light', the color of the light.
  * 'left', the intended direction of travel for a vehicle to the Smartcab's left. Returns None if no vehicle is present.
  * 'right', the intended direction of travel for a vehicle to the Smartcab's right. Returns None if no vehicle is present.
  * 'oncoming', the intended direction of travel for a vehicle across the intersection from the Smartcab. Returns None if no vehicle is present.
* 'deadline', which is the number of actions remaining for the Smartcab to reach the destination before running out of time.

Answer:
The 'inputs' such as 'light', 'left', 'right' and 'oncoming' all seem important for safety. My guess is that these are visual inputs. Is the light green or red? Is a car coming into the intersection from the left going straight, or does it have its turn signal on (for right or for left)? These are very important for safety. I am tempted to say that, given the American right-of-way protocol, the input 'right' is immaterial.
From the standpoint of efficiency, 'waypoint' and 'deadline' seem to be important. My guess is that waypoint is a direction (straight, right or left) given to the smartcab agent for it to get to its destination. Deadline is possibly a countdown to indicate how many steps are left for reaching the destination. Hence they could be important for efficiency.
I had initially included the deadline in my state definition. That resulted in several rounds of tests that looked like what is shown below:
No amount of fine tuning of the parameters would improve the safety rating. I concluded that informing the agent about the deadline induces bad actions. After removal of the deadline from the state, the safety rating improved immediately, as you will see in the visualization above Question 6.
When defining a set of states that the agent can occupy, it is necessary to consider the size of the state space. That is to say, if you expect the driving agent to learn a policy for each state, you would need to have an optimal action for every state the agent can occupy. If the number of all possible states is very large, it might be the case that the driving agent never learns what to do in some states, which can lead to uninformed decisions. For example, consider a case where the following features are used to define the state of the Smartcab:
('is_raining', 'is_foggy', 'is_red_light', 'turn_left', 'no_traffic', 'previous_turn_left', 'time_of_day').
How frequently would the agent occupy a state like (False, True, True, True, False, False, '3AM')? Without a near-infinite amount of time for training, it's doubtful the agent would ever learn the proper action!
If a state is defined using the features you've selected from Question 4, what would be the size of the state space? Given what you know about the environment and how it is simulated, do you think the driving agent could learn a policy for each possible state within a reasonable number of training trials?
Hint: Consider the combinations of features to calculate the total number of states!
Answer:
My state will consist of waypoint (which could be left, right or straight), light (green or red), left (right, left, straight or None), oncoming (right, left, straight or None) and deadline representing the number of steps remaining. I am ignoring the input right.
The environment consists of 48 intersections, but my intuition is that all the intersections are identical to each other and hence the agent does not have to be exposed to all 48 intersections approaching from all directions. All it needs to know, given the next waypoint guidance (i.e. go ahead, turn left or turn right), is under what conditions to take an action in line with that guidance by following the American "right of way" protocol as it approaches the intersection.
One can consider that the state consists of features independent of each other, thus making the size of the state space (leaving the deadline aside) equal to 3 x 2 x 4 x 4, i.e. 96.
However, from the perspective of actions these features are not independent of each other. My reasoning is that if I intend to go straight (maybe because the waypoint is ahead), the only feature that matters is light (whether it is green).
In any case we are faced with a finite number of combinations, 96 in the worst case. The agent will be exposed to some combination at every step, which may or may not be new. Every time it encounters a new combination it will make an entry in its black book. But the number of trials it has to undergo to be exposed to all unique combinations is non-deterministic (my way of saying it could be infinite).
It's like life - one is learning forever !
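The product of the feature counts named in the answer (waypoint, light, left and oncoming, with the deadline left aside) can be checked by enumerating every combination. This is a minimal sketch; the concrete value strings are assumptions chosen to mirror the description, not values read from the simulator.

```python
from itertools import product

# Assumed feature values: 3 waypoints, 2 light colors, 4 options per traffic input.
WAYPOINTS = ("forward", "left", "right")
LIGHTS = ("green", "red")
TRAFFIC = (None, "forward", "left", "right")  # used for both 'left' and 'oncoming'

# Every (waypoint, light, left, oncoming) combination is one candidate state.
states = list(product(WAYPOINTS, LIGHTS, TRAFFIC, TRAFFIC))
print(len(states))  # 3 * 2 * 4 * 4 = 96
```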
For your second implementation, navigate to the 'build_state()' agent function. With the justification you've provided in Question 4, you will now set the 'state' variable to a tuple of all the features necessary for Q-Learning. Confirm your driving agent is updating its state by running the agent file and simulation briefly and note whether the state is displaying. If the visual simulation is used, confirm that the updated state corresponds with what is seen in the simulation.
Note: Remember to reset simulation flags to their default setting when making this observation!
I had initially included the deadline in the state. The state and deadline can indeed be seen.
However, I later removed the deadline from the state. No matter what I did with the parameters, I got miserable scores on safety. I concluded that this was because of the deadline. Seeing the deadline, the agent seems to ignore safety.
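A state without the deadline and without the 'right' input could be assembled as a plain tuple, which is hashable and can therefore serve as a dictionary key. The sketch below is standalone and hypothetical: `build_state_sketch` is not the project's 'build_state()' method, and the example sensor dictionary is made up for illustration.

```python
def build_state_sketch(waypoint, inputs):
    """Return a hashable state tuple that omits 'right' and the deadline."""
    return (waypoint, inputs["light"], inputs["left"], inputs["oncoming"])


# Hypothetical sensor reading, shaped like the 'inputs' described earlier.
example_inputs = {"light": "red", "left": None, "right": "forward", "oncoming": "left"}
state = build_state_sketch("forward", example_inputs)
print(state)
```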
The third step to creating an optimized Q-Learning agent is to begin implementing the functionality of Q-Learning itself. The concept of Q-Learning is fairly straightforward: For every state the agent visits, create an entry in the Q-table for all state-action pairs available. Then, when the agent encounters a state and performs an action, update the Q-value associated with that state-action pair based on the reward received and the iterative update rule implemented. Of course, additional benefits come from Q-Learning, such that we can have the agent choose the best action for each state based on the Q-values of each state-action pair possible. For this project, you will be implementing a decaying, $\epsilon$-greedy Q-learning algorithm with no discount factor. Follow the implementation instructions under each TODO in the agent functions.
Note that the agent attribute self.Q is a dictionary: This is how the Q-table will be formed. Each state will be a key of the self.Q dictionary, and each value will then be another dictionary that holds the action and Q-value. Here is an example:
{ 'state-1': {
'action-1' : Qvalue-1,
'action-2' : Qvalue-2,
...
},
'state-2': {
'action-1' : Qvalue-1,
...
},
...
}
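The nested-dictionary layout above, together with an update that uses only the immediate reward, might look like the following sketch. The helper names `create_q` and `learn` and the zero initial value are assumptions for illustration; the update shown is the usual learning-rate blend with the future-reward term dropped, in keeping with the "no discount factor" instruction.

```python
def create_q(Q, state, valid_actions):
    """Add a row of zero-initialised Q-values for a state seen for the first time."""
    if state not in Q:
        Q[state] = {action: 0.0 for action in valid_actions}


def learn(Q, state, action, reward, alpha):
    """Blend the old Q-value with the immediate reward (no future-reward term)."""
    Q[state][action] = (1 - alpha) * Q[state][action] + alpha * reward


Q = {}
actions = [None, "forward", "left", "right"]  # assumed action names
state = ("forward", "green", None, None)      # example state tuple
create_q(Q, state, actions)
learn(Q, state, "forward", reward=2.0, alpha=0.5)
print(Q[state]["forward"])  # 0.5 * 0.0 + 0.5 * 2.0 = 1.0
```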
Furthermore, note that you are expected to use a decaying $\epsilon$ (exploration) factor. Hence, as the number of trials increases, $\epsilon$ should decrease towards 0. This is because the agent is expected to learn from its behavior and begin acting on its learned behavior. Additionally, the agent will be tested on what it has learned after $\epsilon$ has passed a certain threshold (the default threshold is 0.01). For the initial Q-Learning implementation, you will be implementing a linear decaying function for $\epsilon$.
To obtain results from the initial Q-Learning implementation, you will need to adjust the following flags and setup:
* 'enforce_deadline' - Set this to True to force the driving agent to capture whether it reaches the destination in time.
* 'update_delay' - Set this to a small value (such as 0.01) to reduce the time between steps in each trial.
* 'log_metrics' - Set this to True to log the simulation results as a .csv file and the Q-table as a .txt file in /logs/.
* 'n_test' - Set this to '10' to perform 10 testing trials.
* 'learning' - Set this to 'True' to tell the driving agent to use your Q-Learning implementation.

In addition, use the following decay function for $\epsilon$:
$$ \epsilon_{t+1} = \epsilon_{t} - 0.05, \hspace{10px}\textrm{for trial number } t$$
If you have difficulty getting your implementation to work, try setting the 'verbose' flag to True to help debug. Flags that have been set here should be returned to their default setting when debugging. It is important that you understand what each flag does and how it affects the simulation!
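To get a feel for how quickly the linear decay above ends training, the sketch below counts how many trials pass before $\epsilon$ falls below the testing threshold. It assumes a starting value of 1.0 and the 0.01 threshold quoted earlier; the function name `trials_until_testing` is hypothetical and the loop is not taken from the simulator.

```python
def trials_until_testing(epsilon=1.0, step=0.05, tolerance=0.01):
    """Count training trials until epsilon drops below the tolerance."""
    trials = 0
    while epsilon >= tolerance:
        # Linear decay, clamped at zero to absorb floating-point drift.
        epsilon = max(0.0, epsilon - step)
        trials += 1
    return trials


print(trials_until_testing())  # 20
```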
Once you have successfully completed the initial Q-Learning simulation, run the code cell below to visualize the results. Note that log files are overwritten when identical simulations are run, so be careful with what log file is being loaded!
In [7]:
# Load the 'sim_default-learning' file from the default Q-Learning simulation
vs.plot_trials('sim_default-learning.csv')
Using the visualization above that was produced from your default Q-Learning simulation, provide an analysis and make observations about the driving agent like in Question 3. Note that the simulation should have also produced the Q-table in a text file which can help you make observations about the agent's learning. Some additional things you could consider:
Answer:
The final step to creating an optimized Q-Learning agent is to perform the optimization! Now that the Q-Learning algorithm is implemented and the driving agent is successfully learning, it's necessary to tune settings and adjust learning parameters so the driving agent learns both safety and efficiency. Typically this step will require a lot of trial and error, as some settings will invariably make the learning worse. One thing to keep in mind is the act of learning itself and the time that this takes: In theory, we could allow the agent to learn for an incredibly long amount of time; however, another goal of Q-Learning is to transition from experimenting with unlearned behavior to acting on learned behavior. For example, always allowing the agent to perform a random action during training (if $\epsilon = 1$ and never decays) will certainly make it learn, but never let it act. When improving on your Q-Learning implementation, consider the implications it creates and whether it is logistically sensible to make a particular adjustment.
To obtain results from the improved Q-Learning implementation, you will need to adjust the following flags and setup:
* 'enforce_deadline' - Set this to True to force the driving agent to capture whether it reaches the destination in time.
* 'update_delay' - Set this to a small value (such as 0.01) to reduce the time between steps in each trial.
* 'log_metrics' - Set this to True to log the simulation results as a .csv file and the Q-table as a .txt file in /logs/.
* 'learning' - Set this to 'True' to tell the driving agent to use your Q-Learning implementation.
* 'optimized' - Set this to 'True' to tell the driving agent you are performing an optimized version of the Q-Learning implementation.

Additional flags that can be adjusted as part of optimizing the Q-Learning agent:
* 'n_test' - Set this to some positive number (previously 10) to perform that many testing trials.
* 'alpha' - Set this to a real number between 0 and 1 to adjust the learning rate of the Q-Learning algorithm.
* 'epsilon' - Set this to a real number between 0 and 1 to adjust the starting exploration factor of the Q-Learning algorithm.
* 'tolerance' - Set this to some small value larger than 0 (default was 0.05) to set the epsilon threshold for testing.

Furthermore, use a decaying function of your choice for $\epsilon$ (the exploration factor). Note that whichever function you use, it must decay to 'tolerance' at a reasonable rate. The Q-Learning agent will not begin testing until this occurs. Some example decaying functions (for $t$, the number of trials):
You may also use a decaying function for $\alpha$ (the learning rate) if you so choose, however this is typically less common. If you do so, be sure that it adheres to the inequality $0 \leq \alpha \leq 1$.
If you have difficulty getting your implementation to work, try setting the 'verbose' flag to True to help debug. Flags that have been set here should be returned to their default setting when debugging. It is important that you understand what each flag does and how it affects the simulation!
Once you have successfully completed the improved Q-Learning simulation, run the code cell below to visualize the results. Note that log files are overwritten when identical simulations are run, so be careful with what log file is being loaded!
In [7]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
#case 1 where epsilon = epsilon - 0.005 (epsilon decays from 1 by an amount of 0.005 per trial) and alpha is 0.5
vs.plot_trials('sim_improved-learning_case1.csv')
In [13]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
#case 2 where epsilon =alpha ^ trial_num where trial_num is number of trials where alpha is 0.99
vs.plot_trials('sim_improved-learning_case2.csv')
In [9]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
#case 3 where epsilon =1/trial_num^2 where trial_num is number of trials and alpha is set to 0.5
vs.plot_trials('sim_improved-learning_case3.csv')
In [14]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
#case 4 where epsilon = e ^(-alpha x trial_num) where trial_num is number of trials& alpha = 0.015
vs.plot_trials('sim_improved-learning_case4.csv')
In [12]:
# Load the 'sim_improved-learning' file from the improved Q-Learning simulation
#case 5 where epsilon = cosine(alpha x trial_num) where trial_num is number of trials & alpha=0.005
vs.plot_trials('sim_improved-learning_case5.csv')
Using the visualization above that was produced from your improved Q-Learning simulation, provide a final analysis and make observations about the improved driving agent like in Question 6. Questions you should answer:
Answer:
I tried a number of different decaying functions for epsilon:
For cases 1 to 5, with $t$ the number of trials, I set epsilon as follows:
* Case 1: $\epsilon_{t+1} = \epsilon_{t} - 0.005$, with learning rate $\alpha = 0.5$
* Case 2: $\epsilon = a^t$, where $a = 0.99$
* Case 3: $\epsilon = \frac{1}{t^2}$, with learning rate $\alpha = 0.5$
* Case 4: $\epsilon = e^{-at}$, where $a = 0.015$
* Case 5: $\epsilon = \cos(at)$, where $a = 0.005$

With the exception of case 3, all the decay functions resulted in 190 to 320 trials before testing began. In case 3 the number of trials was 20.
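The trial counts can be sanity-checked by evaluating each decay function until it first drops below the testing threshold. The sketch below assumes a threshold of 0.05 (the default mentioned earlier) and a starting epsilon of 1; the helper `trials_to_tolerance` and the dictionary keys are hypothetical names, and the loop ignores any minimum number of training trials the simulator may enforce.

```python
import math


def trials_to_tolerance(schedule, tolerance=0.05, max_trials=10000):
    """Return the first trial number t at which schedule(t) < tolerance."""
    for t in range(1, max_trials + 1):
        if schedule(t) < tolerance:
            return t
    return None  # never reached the tolerance within max_trials


# Closed-form versions of the five decay functions tried above.
schedules = {
    "case1": lambda t: 1.0 - 0.005 * t,
    "case2": lambda t: 0.99 ** t,
    "case3": lambda t: 1.0 / t ** 2,
    "case4": lambda t: math.exp(-0.015 * t),
    "case5": lambda t: math.cos(0.005 * t),
}

results = {name: trials_to_tolerance(fn) for name, fn in schedules.items()}
for name, count in results.items():
    print(name, count)
```

Under these assumptions, cases 1, 2, 4 and 5 cross the threshold after roughly 190 to 305 trials, in line with the range reported above, while the inverse-square function crosses it after only a handful. The 20 trials reported for case 3 would be consistent with a minimum number of training trials being enforced, though that is an inference rather than something verified here.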
Sometimes, the answer to the important question "what am I trying to get my agent to learn?" only has a theoretical answer and cannot be concretely described. Here, however, you can concretely define what it is the agent is trying to learn, and that is the U.S. right-of-way traffic laws. Since these laws are known information, you can further define, for each state the Smartcab is occupying, the optimal action for the driving agent based on these laws. In that case, we call the set of optimal state-action pairs an optimal policy. Hence, unlike some theoretical answers, it is clear whether the agent is acting "incorrectly" not only by the reward (penalty) it receives, but also by pure observation. If the agent drives through a red light, we both see it receive a negative reward but also know that it is not the correct behavior. This can be used to your advantage for verifying whether the policy your driving agent has learned is the correct one, or if it is a suboptimal policy.
Provide a few examples (using the states you've defined) of what an optimal policy for this problem would look like. Afterwards, investigate the 'sim_improved-learning.txt' text file to see the results of your improved Q-Learning algorithm. For each state that has been recorded from the simulation, is the policy (the action with the highest value) correct for the given state? Are there any states where the policy is different than what would be expected from an optimal policy? Provide an example of a state and all state-action rewards recorded, and explain why it is the correct policy.
Answer:
I have added some print statements to the choose_action(self, state) code. I run the simulation like this: python smartcab/agent.py > capture.txt. This sends everything one sees on the console to the file capture.txt.
When I analyze sim_improved-learning_case1.txt I see:
Looking at sim_improved-learning_case1.csv, we can see that during early trials the agent acquired a large number of negative rewards, which suggests a fair amount of exploration:
In later trials we see that the agent does not explore as much and accumulates positive rewards.
When I analyze the capture.txt file (it's big, 298 MB)
I see cases like this:
Training trial 6, Step 5:
Training Trial , Step 8:
My expectation is: the agent should choose a random action for its state when it is not learning. During training, when the probability is greater than 1 - epsilon, the agent should choose a random action (this will happen more frequently during the early part of training, when epsilon is larger). If the probability is less than 1 - epsilon (this will happen more frequently in the later part of training), the agent should choose the action with the best Q value. If there are multiple actions with the same Q value, the agent should choose any one of those at random.
I believe the agent code is doing that. It is following optimal policy during training.
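The expectation described above corresponds to an epsilon-greedy rule with random tie-breaking. A minimal standalone sketch follows; `choose_action_sketch` is a hypothetical name with its own signature, not the agent's actual 'choose_action()' method, and the example Q-values are made up.

```python
import random


def choose_action_sketch(Q, state, valid_actions, epsilon, learning=True):
    """Explore with probability epsilon, otherwise pick a best-valued action."""
    if not learning or random.random() < epsilon:
        return random.choice(valid_actions)
    max_q = max(Q[state].values())
    # Collect every action tied for the best Q-value, then break ties at random.
    best_actions = [a for a, q in Q[state].items() if q == max_q]
    return random.choice(best_actions)


actions = [None, "forward", "left", "right"]
state = ("forward", "green", None, None)
Q = {state: {None: 0.0, "forward": 1.5, "left": 1.5, "right": -1.0}}
print(choose_action_sketch(Q, state, actions, epsilon=0.0))
```

With epsilon set to 0 the call only ever returns one of the two tied best actions; with epsilon set to 1 it behaves like the purely random agent.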
The optimal policy can be stated as:
'gamma'
Curiously, as part of the Q-Learning algorithm, you were asked to not use the discount factor, 'gamma', in the implementation. Including future rewards in the algorithm is used to aid in propagating positive rewards backwards from a future state to the current state. Essentially, if the driving agent is given the option to make several actions to arrive at different states, including future rewards will bias the agent towards states that could provide even more rewards. An example of this would be the driving agent moving towards a goal: With all actions and rewards equal, moving towards the goal would theoretically yield better rewards if there is an additional reward for reaching the goal. However, even though in this project, the driving agent is trying to reach a destination in the allotted time, including future rewards will not benefit the agent. In fact, if the agent were given many trials to learn, it could negatively affect Q-values!
There are two characteristics about the project that invalidate the use of future rewards in the Q-Learning algorithm. One characteristic has to do with the Smartcab itself, and the other has to do with the environment. Can you figure out what they are and why future rewards won't work for this project?
Answer:
The agent gets rewards for taking actions on actionable situations here and now. The light at the next intersection cannot be anticipated. Assuming it will be green, just because past experience with a green light and an intended direction of forward gave a score of 1.82 or higher, is foolhardy at best. So again, the lights and the location and direction of other vehicles are part of the environment, which is beyond the agent's control. What the environment was a few minutes ago need not be what it is now. In real life environments are not stationary, as Michael and Charles would say ;o)
~~I am sure I have seen in some movie a passenger offering a New York driver a large gratuity for taking him to the airport in twenty minutes. The passenger arrived at the airport scared breathless as the cab barrelled through the busy streets of New York. Now this is ok for one ride / test, but in my mind a property of the smartcab is that it wants to live forever and expects to gather rewards by having an infinitely large number of rides, not a finite number of rides. In that case it stands to reason that the best policy will still be to follow the waypoint guidance when it is safe.~~
By the same token, if smartcab is an organization with multiple smartcab vehicles, it would also want to protect its reputation by keeping to the best policy stated above. If future rewards cause deviations, there could be risk taking leading to accidents, and that too between two or more smartcabs!!
This looks like a video game, so all elements of the code are within a developer's control, and it is possible to imagine, as I did, that all possible information could be generated and sent from the planner, environment and simulation classes to the agent class. However, in retrospect my imagination may be flawed, because in this project the agent class is the only code that is within my control. The agent has very limited visibility. The starting point and destination are random, all the intersections in between are unknown, the likely state of all those intersections at the time the agent would reach them is unknown, and the rewards, future or even present, are determined by the environment, not by the agent.
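For reference, the role of the discount factor can be written out explicitly. What follows is the textbook Q-learning update rather than code or notation from this project; $s'$ denotes the next state, $a'$ an action available there, $r$ the immediate reward and $\alpha$ the learning rate:

$$ Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right) $$

Setting $\gamma = 0$ removes the future term, leaving an update that depends only on the immediate reward:

$$ Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha\, r $$

This second form is what "no discount factor" amounts to: the value of the next intersection never feeds back into the current state's Q-value.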
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.