The devastating decrease in the size of Napoleon's army due to battles, geography, and climate.
Clarity and intuitiveness.
In [150]:
# Template: data, xvar and yvar are placeholders for a dataframe and two of its column names.
print(ggplot(data, aes(xvar, yvar)) + geom_point(color='coral') + geom_line(color='coral') +
      ggtitle('title') + xlab('x-label') + ylab('y-label'))
In [ ]:
import pandas as pd
from ggplot import *

def lineplot(hr_year_csv):
    # A csv file will be passed in as an argument which
    # contains two columns -- 'HR' (the number of homerun hits)
    # and 'yearID' (the year in which the homeruns were hit).
    #
    # Fill out the body of this function, lineplot, to use the
    # passed-in csv file, hr_year.csv, and create a
    # chart with points connected by lines, both colored 'red',
    # showing the number of HR by year.
    #
    # You will want to first load the csv file into a pandas dataframe
    # and use the pandas dataframe along with ggplot to create your visualization
    #
    # You can check out the data in the csv file at the link below:
    # https://www.dropbox.com/s/awgdal71hc1u06d/hr_year.csv
    #
    # You can read more about ggplot at the following link:
    # https://github.com/yhat/ggplot/
    data = pd.read_csv(hr_year_csv)
    # Pass column names to aes, i.e. aes('yearID', 'HR'),
    # rather than the Series data['yearID'] and data['HR'].
    gg = ggplot(data, aes('yearID', 'HR')) + \
        geom_point(color='red') + geom_line(color='red') + \
        ggtitle('homeruns by year') + xlab('year') + ylab('homeruns')
    return gg
In [ ]:
import pandas as pd
from ggplot import *

def lineplot_compare(hr_by_team_year_sf_la_csv):
    # Write a function, lineplot_compare, that will read a csv file
    # called hr_by_team_year_sf_la.csv and plot it using pandas and ggplot2.
    #
    # This csv file has three columns: yearID, HR, and teamID. The data in the
    # file gives the total number of home runs hit each year by the SF Giants
    # (teamID == 'SFN') and the LA Dodgers (teamID == 'LAN'). Produce a
    # visualization comparing the total home runs by year of the two teams.
    #
    # You can see the data in hr_by_team_year_sf_la_csv
    # at the link below:
    # https://www.dropbox.com/s/wn43cngo2wdle2b/hr_by_team_year_sf_la.csv
    #
    # Note that to differentiate between multiple categories on the
    # same plot in ggplot, we can pass color in with the other arguments
    # to aes, rather than in our geometry functions. For example,
    # ggplot(data, aes(xvar, yvar, color=category_var)). This should help you
    # in this exercise.
    data = pd.read_csv(hr_by_team_year_sf_la_csv)
    # Note: the course solution also added geom_point() plus a title and
    # axis labels (which seem to create a legend).
    gg = ggplot(data, aes('yearID', 'HR', color='teamID')) + geom_line()
    return gg
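The note in the exercise about passing color inside aes means that one line is drawn per teamID value. As a rough, pandas-only illustration of that grouping, the same long-format table can be pivoted so that each team becomes its own column. The HR values below are invented for the sketch and are not taken from hr_by_team_year_sf_la.csv.

```python
import pandas as pd

# Hypothetical sample in the same long format as the exercise data:
# one row per (yearID, teamID) pair. The HR values are invented.
sample = pd.DataFrame({
    'yearID': [2000, 2000, 2001, 2001],
    'teamID': ['SFN', 'LAN', 'SFN', 'LAN'],
    'HR': [10, 12, 15, 11],
})

# One column per team, one row per year: these are the two series
# that a color='teamID' mapping would draw as separate lines.
per_team = sample.pivot(index='yearID', columns='teamID', values='HR')
print(per_team)
```

The plot itself does not need this reshaping; it is only meant to show which rows end up on which line.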
In [182]:
turnstile_weather = pd.read_csv("turnstile_data_master_with_weather.csv", nrows=1000)
#%matplotlib inline
turnstile_weather.describe()
Out[182]:
In [172]:
print(turnstile_weather.head())
print(turnstile_weather.describe())
# Count rows per turnstile unit and per hour of day.
unit_counts = turnstile_weather.groupby('UNIT').size()
hour_counts = turnstile_weather.groupby('Hour').size()
hour_counts
Out[172]:
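groupby(...).size() counts rows per group, which is not the same as ridership. A minimal sketch of summing entries per hour instead, using a small invented frame with the same UNIT, Hour and ENTRIESn_hourly column names (the values are assumptions, not real turnstile data):

```python
import pandas as pd

# Invented rows that mimic the turnstile_weather columns used above.
sample = pd.DataFrame({
    'UNIT': ['R001', 'R001', 'R002', 'R002'],
    'Hour': [0, 4, 0, 4],
    'ENTRIESn_hourly': [5.0, 20.0, 7.0, 30.0],
})

# size() counts rows per hour; sum() adds up the entries per hour.
rows_per_hour = sample.groupby('Hour').size()
entries_per_hour = sample.groupby('Hour')['ENTRIESn_hourly'].sum().reset_index()
print(rows_per_hour)
print(entries_per_hour)
```

The summed frame has one row per hour, which is a convenient shape for a bar or line plot of ridership by time of day.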
In [170]:
import pandas as pd
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    You are passed in a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make a data visualization
    focused on the MTA and weather data we used in assignment #3.
    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
    something more advanced if you'd like.

    Here are some suggestions for things to investigate and illustrate:
     * Ridership by time of day or day of week
     * How ridership varies based on Subway station
     * Which stations have more exits or entries at different times of day

    If you'd like to learn more about ggplot and its capabilities, take
    a look at the documentation at:
    https://pypi.python.org/pypi/ggplot/

    You can check out:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv

    To see all the columns and data points included in the turnstile_weather
    dataframe.

    However, due to the limitation of our Amazon EC2 server, we are giving you about 1/3
    of the actual data in the turnstile_weather dataframe
    '''
    # Try with and without stat="bar".
    plot = ggplot(turnstile_weather, aes('Hour', 'ENTRIESn_hourly', fill='UNIT', color='UNIT')) + \
        geom_bar(alpha=0.8, stat="bar") + \
        ggtitle('Subway Usage') + xlab('Hour') + ylab('Number of Entries')
    return plot
plot_weather_data(turnstile_weather)
Out[170]:
In [168]:
import pandas as pd
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    You are passed in a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make a data visualization
    focused on the MTA and weather data we used in assignment #3.
    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
    something more advanced if you'd like.

    Here are some suggestions for things to investigate and illustrate:
     * Ridership by time of day or day of week
     * How ridership varies based on Subway station
     * Which stations have more exits or entries at different times of day

    If you'd like to learn more about ggplot and its capabilities, take
    a look at the documentation at:
    https://pypi.python.org/pypi/ggplot/

    You can check out:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv

    To see all the columns and data points included in the turnstile_weather
    dataframe.

    However, due to the limitation of our Amazon EC2 server, we are giving you about 1/3
    of the actual data in the turnstile_weather dataframe
    '''
    plot = ggplot(turnstile_weather, aes('Hour', 'ENTRIESn_hourly')) + \
        geom_bar(alpha=0.8, stat="bar") + \
        ggtitle('Subway Usage') + xlab('Hour') + ylab('Number of Entries')
    return plot
plot_weather_data(turnstile_weather)
Out[168]:
In [180]:
import pandas as pd
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    plot_weather_data is passed a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make another data visualization
    focused on the MTA and weather data we used in Project 3.

    Make a type of visualization different than what you did in the previous exercise.
    Try to use the data in a different way (e.g., if you made a lineplot concerning
    ridership and time of day in exercise #1, maybe look at weather and try to make a
    histogram in this exercise). Or try to use multiple encodings in your graph if
    you didn't in the previous exercise.

    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
    something more advanced if you'd like.

    Here are some suggestions for things to investigate and illustrate:
     * Ridership by time-of-day or day-of-week
     * How ridership varies by subway station
     * Which stations have more exits or entries at different times of day

    If you'd like to learn more about ggplot and its capabilities, take
    a look at the documentation at:
    https://pypi.python.org/pypi/ggplot/

    You can check out the link
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    to see all the columns and data points included in the turnstile_weather
    dataframe.

    However, due to the limitation of our Amazon EC2 server, we will give you only
    about 1/3 of the actual data in the turnstile_weather dataframe.
    '''
    plot = ggplot(turnstile_weather, aes('UNIT', 'ENTRIESn_hourly')) + \
        geom_histogram(alpha=0.8) + \
        ggtitle('Entries Per Unit') + xlab('UNIT') + ylab('Entries Per Hour')
    return plot
plot_weather_data(turnstile_weather)
Out[180]:
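Putting every UNIT on one axis can get crowded. One possible way to prepare the "How ridership varies by subway station" suggestion, sketched with invented values and the same column names, is to average entries per unit and keep only the busiest few. The cut-off of two units is arbitrary and only for illustration.

```python
import pandas as pd

# Invented rows using the UNIT and ENTRIESn_hourly column names.
sample = pd.DataFrame({
    'UNIT': ['R001', 'R002', 'R001', 'R003', 'R002', 'R003'],
    'ENTRIESn_hourly': [10.0, 100.0, 30.0, 5.0, 300.0, 15.0],
})

# Average hourly entries per unit, busiest first.
mean_by_unit = (sample.groupby('UNIT')['ENTRIESn_hourly']
                .mean()
                .sort_values(ascending=False))

# Keep only the two busiest units for plotting.
top_units = mean_by_unit.head(2).reset_index()
print(top_units)
```

A frame like top_units has one row per unit, so it could be handed to a bar plot with UNIT on the x axis.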
In [185]:
import pandas as pd
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    plot_weather_data is passed a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make another data visualization
    focused on the MTA and weather data we used in Project 3.

    Make a type of visualization different than what you did in the previous exercise.
    Try to use the data in a different way (e.g., if you made a lineplot concerning
    ridership and time of day in exercise #1, maybe look at weather and try to make a
    histogram in this exercise). Or try to use multiple encodings in your graph if
    you didn't in the previous exercise.

    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
    something more advanced if you'd like.

    Here are some suggestions for things to investigate and illustrate:
     * Ridership by time-of-day or day-of-week
     * How ridership varies by subway station
     * Which stations have more exits or entries at different times of day

    If you'd like to learn more about ggplot and its capabilities, take
    a look at the documentation at:
    https://pypi.python.org/pypi/ggplot/

    You can check out the link
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    to see all the columns and data points included in the turnstile_weather
    dataframe.

    However, due to the limitation of our Amazon EC2 server, we will give you only
    about 1/3 of the actual data in the turnstile_weather dataframe.
    '''
    plot = ggplot(turnstile_weather, aes('UNIT', 'ENTRIESn_hourly', fill='UNIT')) + \
        geom_bar(alpha=0.8, stat="bar") + \
        ggtitle('Entries Per Unit') + xlab('UNIT') + ylab('Entries Per Hour')
    return plot
plot_weather_data(turnstile_weather)
Out[185]:
In [ ]:
import pandas as pd
from ggplot import *
def plot_weather_data(turnstile_weather):
    '''
    plot_weather_data is passed a dataframe called turnstile_weather.
    Use turnstile_weather along with ggplot to make another data visualization
    focused on the MTA and weather data we used in Project 3.

    Make a type of visualization different than what you did in the previous exercise.
    Try to use the data in a different way (e.g., if you made a lineplot concerning
    ridership and time of day in exercise #1, maybe look at weather and try to make a
    histogram in this exercise). Or try to use multiple encodings in your graph if
    you didn't in the previous exercise.

    You should feel free to implement something that we discussed in class
    (e.g., scatterplots, line plots, or histograms) or attempt to implement
    something more advanced if you'd like.

    Here are some suggestions for things to investigate and illustrate:
     * Ridership by time-of-day or day-of-week
     * How ridership varies by subway station
     * Which stations have more exits or entries at different times of day

    If you'd like to learn more about ggplot and its capabilities, take
    a look at the documentation at:
    https://pypi.python.org/pypi/ggplot/

    You can check out the link
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    to see all the columns and data points included in the turnstile_weather
    dataframe.

    However, due to the limitation of our Amazon EC2 server, we will give you only
    about 1/3 of the actual data in the turnstile_weather dataframe.
    '''
    plot = ggplot(turnstile_weather, aes('Hour', 'ENTRIESn_hourly', color='UNIT')) + \
        geom_point(alpha=0.8) + \
        ggtitle('Entries Per Unit') + xlab('Hour and Unit') + ylab('Entries Per Hour')
    return plot
#plot_weather_data(turnstile_weather)