I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.
Pandas is a data analysis tool for the Python programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight.
In [1]:
import pandas as pd
A downloadable CSV schedule is available from mlb.com. Here is a direct link to the Phillies schedule.
The CSV schedule will be used to instantiate a Pandas DataFrame object.
In [2]:
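# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults
schedule = pd.read_csv("phillies-2016.csv", index_col=0, parse_dates=True)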
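As a rough sketch of what that load does, the snippet below reads a tiny in-memory CSV with `pd.read_csv`. The column names and rows are made up for illustration and are not the real mlb.com export. The `index_col=0` and `parse_dates=True` arguments mirror the defaults that `DataFrame.from_csv` used.

```python
import io

import pandas as pd

# Hypothetical two-game sample; the real mlb.com file has many more columns.
sample_csv = io.StringIO(
    "START DATE,SUBJECT,LOCATION\n"
    "2016-04-04,Phillies at Reds,Cincinnati\n"
    "2016-04-06,Phillies at Reds,Cincinnati\n"
)

# The first column becomes the index and is parsed as dates.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=True)
print(sample.shape)
print(sample.index.name)
```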
In [3]:
schedule.info()
In [4]:
schedule.head()
Out[4]:
In [6]:
schedule.drop(["REMINDER OFF",
               "REMINDER ON",
               "START TIME ET",
               "END DATE",
               "END DATE ET",
               "END TIME",
               "END TIME ET",
               "REMINDER TIME",
               "REMINDER TIME ET",
               "SHOWTIMEAS FREE",
               "SHOWTIMEAS BUSY",
               "REMINDER DATE"], axis=1, inplace=True)
schedule.head()
Out[6]:
In [11]:
schedule.DESCRIPTION.head(50)
Out[11]:
In [73]:
description = schedule.DESCRIPTION.iloc[6]  # seventh row, by position
print(description)
Grab the rough station string with a regular expression.
In [123]:
import re
TV_STATION_RE = re.compile(r"""Local\s+TV:\s+    # TV token
                               (?P<stations>.*)  # Greedily capture the rest of the line as stations
                            """, re.X)
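To see what the pattern captures, here is a small sketch run against a made-up description. The sample text, including "Channel X" and "Station Y", is hypothetical, and the real calendar entries may be laid out differently. Because `.` does not match a newline by default, the greedy `.*` stops at the end of the line.

```python
import re

TV_STATION_RE = re.compile(r"""Local\s+TV:\s+    # TV token
                               (?P<stations>.*)  # rest of the line
                            """, re.X)

# Hypothetical description text, not taken from the real schedule.
sample_description = (
    "Local TV: Channel X- NBC 10 ----- Local Radio: Station Y\n"
    "Some other line"
)

match = TV_STATION_RE.search(sample_description)
print(match.group("stations"))
# prints: Channel X- NBC 10 ----- Local Radio: Station Y
```

The captured string still contains the radio listing, which is why the next step has to split on the delimiter.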
Use that to pull them out and do some text wrangling.
In [ ]:
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    tv_stations = []
    # str() guards against non-string values such as NaN
    result = TV_STATION_RE.search(str(description))
    if result:
        # Keep only the text before the media delimiter
        media_delimiter = "-----"
        tv_station_str = result.group("stations").split(media_delimiter)[0]
        # Individual stations are separated by "- "
        tv_stations = tv_station_str.split("- ")
        tv_stations = [s.strip() for s in tv_stations]
    return tv_stations
Test it out on all of the descriptions.
In [126]:
tv_stations = set()
for d in schedule.DESCRIPTION:
    tv_stations |= set(tv_stations_from_description(d))
tv_stations
Out[126]:
Applying this function to the DESCRIPTION column yields a Series holding, for each game, the list of television stations on which it is broadcast this season.
In [127]:
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series
Out[127]:
Double check the set of stations from that Series.
In [129]:
{station for stations in stations_series for station in stations}
Out[129]:
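The same check can also be done with pandas itself. The sketch below is an alternative to the set comprehension, not what the notebook ran: it uses `Series.explode` (available since pandas 0.25) on a made-up stand-in for `stations_series`, with the placeholder station "Channel X".

```python
import pandas as pd

# Hypothetical stand-in for stations_series: one list of stations per game.
sample_stations = pd.Series([["NBC 10"], ["Channel X", "NBC 10"], [], ["Channel X"]])

# explode() gives each station its own row; empty lists become NaN, so drop them.
flat = sample_stations.explode().dropna()

print(sorted(flat.unique()))          # distinct stations
print(flat.value_counts().to_dict())  # number of games per station
```

A side benefit is that `value_counts()` reports how many games each station carries, which the plain set does not.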
The 190 Phillies games are broadcast on 6 television stations. Unfortunately, only 1 of those 6 stations is available without a cable television subscription, so I can only watch the games on NBC.
Filtering the DESCRIPTION column to national television broadcast stations yields only the games which I can watch over the air with my HD antenna.
In [117]:
# na=False treats games with a missing description as non-matches
national_broadcast_schedule = schedule[schedule.DESCRIPTION.str.contains("NBC 10", na=False)]
national_broadcast_schedule
Out[117]:
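Depending on the pandas version, a missing description can leave a missing value in the boolean mask, and such a mask is awkward to index with. Here is a minimal sketch of one way to handle that, using the `na=False` argument of `str.contains`. The descriptions are invented for illustration.

```python
import pandas as pd

# Hypothetical descriptions, including one missing value.
descriptions = pd.Series(["Local TV: NBC 10", "Local TV: Channel X", None])

# na=False makes the missing description count as "does not contain".
safe_mask = descriptions.str.contains("NBC 10", na=False)

print(safe_mask.tolist())
print(descriptions[safe_mask].tolist())
```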
In [118]:
national_broadcast_schedule.describe()
Out[118]:
This means that I can watch 10 of the 190 Phillies games this season, which is roughly 5%.
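The percentage is plain division. A quick sketch of the arithmetic, using the two counts quoted above (the variable names are mine):

```python
watchable_games = 10  # games carried on NBC 10
total_games = 190     # all games in the schedule

share = watchable_games / total_games * 100
print(f"{share:.1f}%")  # prints 5.3%
```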