I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.
Pandas is a data analysis tool for the Python programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight.
In [1]:
import pandas as pd
A downloadable CSV schedule is available from mlb.com. Here is a direct link to the Phillies schedule.
The CSV schedule will be used to instantiate a Pandas DataFrame object.
In [2]:
schedule = pd.read_csv("phillies.csv", index_col=0, parse_dates=True)
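For anyone following along without the file, the loading step can be sketched against a tiny in-memory CSV. The rows below are made up for illustration and only mimic the shape of the real schedule. The sketch assumes `pd.read_csv` with `index_col=0` and `parse_dates=True` as the way to load it in current pandas.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real mlb.com file has many more columns.
sample_csv = io.StringIO(
    "START_DATE,SUBJECT,DESCRIPTION\n"
    "2015-04-06,Team A at Team B,Local TV: NBC 10 ----- Local Radio: WPHT\n"
    "2015-04-08,Team A at Team B,Local TV: CSN -- MLBN ----- Local Radio: WPHT\n"
)

# Use the first column as the index and try to parse it as dates.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=True)

print(sample.shape)
print(list(sample.columns))
```

Because the first column becomes the index, it no longer appears in the list of columns.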
In [3]:
schedule.info()
In [4]:
schedule.head()
Out[4]:
In [5]:
schedule.drop(["REMINDER_OFF",
               "REMINDER_ON",
               "START_TIME_ET",
               "END_DATE",
               "END_DATE_ET",
               "END_TIME",
               "END_TIME_ET",
               "REMINDER_TIME",
               "REMINDER_TIME_ET",
               "SHOWTIMEAS_FREE",
               "SHOWTIMEAS_BUSY",
               "REMINDER_DATE"], axis=1, inplace=True)
schedule.head()
Out[5]:
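Dropping columns can also be done without modifying the frame in place. The sketch below uses a small made-up frame (its column values are hypothetical) and the `columns=` and `errors="ignore"` arguments of `DataFrame.drop`. It is one possible variation, not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical frame with one column worth keeping and two to discard.
frame = pd.DataFrame({
    "SUBJECT": ["Game 1", "Game 2"],
    "END_DATE": ["2015-04-06", "2015-04-08"],
    "REMINDER_ON": [True, True],
})

# columns= names the axis explicitly; errors="ignore" skips names that
# are absent, so the same list still works if a column is missing.
unwanted = ["END_DATE", "REMINDER_ON", "NOT_IN_THIS_FILE"]
trimmed = frame.drop(columns=unwanted, errors="ignore")

print(list(trimmed.columns))
```

Here `frame` is left untouched and `trimmed` is a new DataFrame, which makes it easier to re-run a cell without errors about columns that are already gone.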
In [6]:
schedule.DESCRIPTION.head()
Out[6]:
In [7]:
description = schedule.DESCRIPTION.iloc[2]
print(description)
In [8]:
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    # Keep the text after the first colon, cut it at the "-----" divider,
    # then split the remaining station list on "--".
    return [station.strip() for station in description.split(":")[1].split("-----")[0].split("--")]

result = tv_stations_from_description(description)
print(result)
assert len(result) == 2
Next, pick a game broadcast on a single channel to test the parsing function.
In [9]:
description = schedule.DESCRIPTION.iloc[0]
print(description)
result = tv_stations_from_description(description)
print(result)
assert len(result) == 1
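The parsing steps can be seen more easily with the chained splits written out one at a time. This is a sketch under an assumption: the two description strings below are hypothetical and only imitate the layout that the splitting logic implies (a label, a colon, stations separated by `--`, then a `-----` divider). The name `split_stations` is also made up for this example.

```python
def split_stations(description):
    """Return the TV stations named in a schedule description."""
    after_label = description.split(":")[1]   # text after the first colon
    tv_part = after_label.split("-----")[0]   # text before the long divider
    return [name.strip() for name in tv_part.split("--")]


# Hypothetical descriptions in the assumed layout.
two_stations = "Local TV: CSN -- MLBN ----- Local Radio: WPHT 1210 AM"
one_station = "Local TV: NBC 10 ----- Local Radio: WPHT 1210 AM"

print(split_stations(two_stations))
print(split_stations(one_station))
```

Note that a description with no colon at all would raise an `IndexError` at the first step, so this only works if every row follows the same layout.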
Applying this function to the DataFrame yields a Series of all television stations on which the Phillies are broadcast this season.
In [10]:
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series
Out[10]:
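To show what `apply` produces here, the sketch below runs the same kind of parsing over a two-element Series. The descriptions are hypothetical stand-ins for real schedule rows, and `stations_in` is a made-up helper name.

```python
import pandas as pd


def stations_in(description):
    """Parse the station names out of one description string."""
    tv_part = description.split(":")[1].split("-----")[0]
    return [name.strip() for name in tv_part.split("--")]


# Hypothetical descriptions standing in for the DESCRIPTION column.
descriptions = pd.Series([
    "Local TV: NBC 10 ----- Local Radio: WPHT 1210 AM",
    "Local TV: CSN -- MLBN ----- Local Radio: WPHT 1210 AM",
])

# apply calls the function once per element and collects the results.
stations = descriptions.apply(stations_in)
print(stations.tolist())
```

The result is a Series with one Python list per game, which is why a second flattening step is needed to get at the individual station names.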
Creating a set of stations from that Series will yield a concise list of distinct television broadcast stations.
In [11]:
{station for stations in stations_series.values for station in stations}
Out[11]:
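Flattening a Series of lists into distinct values can be sketched two ways. The sample data below is hypothetical. The second approach, `Series.explode` followed by `value_counts`, is an alternative not used in this notebook and assumes a pandas version that includes `explode` (0.25 or later).

```python
import pandas as pd

# Hypothetical per-game station lists.
stations_per_game = pd.Series([["NBC 10"], ["CSN", "MLBN"], ["CSN"]])

# A set comprehension flattens the lists and discards duplicates.
distinct = {name for stations in stations_per_game for name in stations}
print(sorted(distinct))

# explode gives one row per station; value_counts then counts
# how many games each station carries.
games_per_station = stations_per_game.explode().value_counts()
print(games_per_station.to_dict())
```

The set answers "which stations exist", while the counts also show how the games are spread across them.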
The 162 regular season Phillies games are broadcast on 7 television channels. Unfortunately, only 2 of those 7 stations are available without a cable television subscription, so I can only watch games on NBC and FOX.
Filtering the DESCRIPTION column to national television broadcast stations yields only the games which I can watch over the air with my HD antenna.
In [12]:
schedule[(schedule.DESCRIPTION.str.contains("NBC 10")) |
         (schedule.DESCRIPTION.str.contains("FOX"))]
Out[12]:
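The filter combines two boolean masks with `|`. A minimal sketch on a made-up three-row frame (the descriptions are hypothetical) shows how each mask and the combined selection behave.

```python
import pandas as pd

# Hypothetical rows standing in for the real schedule.
games = pd.DataFrame({"DESCRIPTION": [
    "Local TV: NBC 10 ----- Local Radio: WPHT 1210 AM",
    "Local TV: CSN -- MLBN ----- Local Radio: WPHT 1210 AM",
    "Local TV: FOX ----- Local Radio: WPHT 1210 AM",
]})

# Each str.contains call yields one True/False value per row.
on_nbc = games.DESCRIPTION.str.contains("NBC 10")
on_fox = games.DESCRIPTION.str.contains("FOX")

# | keeps rows where either mask is True.
over_the_air = games[on_nbc | on_fox]
print(len(over_the_air))
```

One caveat: `str.contains` matches substrings, so any station whose name merely includes "FOX" would also pass the filter. Whether that matters depends on the station names that actually appear in the schedule.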
This means that I can watch 13 of the 162 regular season Phillies games this season, which is roughly 8%.
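The percentage is a quick calculation from the two counts quoted above:

```python
watchable_games = 13
total_games = 162

# Share of the regular season available over the air.
share = watchable_games / total_games
print(f"{share:.1%}")  # prints 8.0%
```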