Streaming, TV, and Sports

By: Siddharth Srikanth and Vishal Bailoor

We sought to discover the true effect of streaming on TV by looking at changes in TV show ratings and sports programming ratings over time. TV show data captures the comparable set of programming to a Netflix or Amazon Video. Sports, meanwhile, has historically been popular and tied to a live experience. The media has trumpeted the effect of streaming on TV, including sports programming, and we wanted to see what the data has to say.

Part I: Sports Ratings

Introduction: The Death of Television (may have been greatly exaggerated)

The rise of Netflix and Amazon Video have dominated the news for a number of years, alongside concurrent trends like cord-cutting, streaming, and the decline of media and cable television. Logic would suggest the two are connected, that the rise of convenient, centralized streaming would itself harm current media, especially TV. Netflix statistics are used here as a proxy for streaming numbers, as a market-dominant firm.

We sought to answer two key questions surrounding this emerging trend.

1. What effect has the rise of Netflix and streaming video had on conventional television?

Key metrics here include [INSERT KEY METRICS HERE SID]

2. To what extent has sports viewership been affected by the rise of streaming?

Key metrics here include ratings numbers, viewership numbers, year-over-year change, and weekly and yearly numbers.

Data

Question 1:

[Fill in Data Methodology]

Question 2:

Sports are a very different phenomenon from the remainder of network TV, which the other portion of the project covers. Sports are consistently higher rated across channels, are less channel-specific, and on a business side have separate media rights contracts to production shows.

This part draws from a number of web sources. The bulk of the data comes from http://www.sportsmediawatch.com/nfl-tv-ratings-viewership-nbc-cbs-fox-espn-nfln-regular-season-playoffs/. This data is on 2014-current ratings and viewership for various football matches. Additional data has been filled in from ESPN.com (which reports viewers/ratings for certain high-ticket games) and http://tvbythenumbers.zap2it.com/tag/nfl-football-ratings/. The latter source was primarily used to fill in unreported games from sportsmediawatch.

SportsMediaWatch("SMW") provides week-by-week data. We then manually downloaded and cleaned the data from 2014-now. 2013 data can be derived from the change numbers in 2014, but we also got comparable raw 2013 data from TV-by-the-numbers. ESPN.com served as a fact-check for blue-chip games.

Operationally, we first downloaded the raw data in CSV form from sportsmediawatch. The CSV link on a public Github file is below. We then cleaned it and organized it in this notebook.



In [1]:

    
import sys                         
import pandas as pd                
import matplotlib as mpl           
import matplotlib.pyplot as plt      
import datetime as dt



In [2]:

    
url = "https://raw.githubusercontent.com/vishbail/DB-Final-Project/master/DB%20Final%20Project%20Data%20Raw.csv"

#sdf stands for sports data frame, distinguished from the non sports data frame
sdf = pd.read_csv(url)



In [3]:

    
#remove all internal header rows - found through unfortunate trial and error
sdf = sdf.drop(sdf.index[80:82])
sdf = sdf.drop(sdf.index[186:188])
sdf = sdf.drop(sdf.index[293:294])



In [4]:

    
##Part 0: Cleaning and Summarizing Data

# drop unnecessary first row
sdf = sdf.drop(sdf.index[0:1])

#drop empty end columns
sdf = sdf.drop(sdf.columns[8:], axis=1)


#Sets internal column headers into overall headers
sdf.columns = sdf.iloc[0]
sdf = sdf.drop(sdf.index[0])

###failed column change attempts (remove/ignore)
#sdf["Year"][2:3] = 2015
#sdf.iloc[9][2:3] = 2015
#for i in range(2,79):
 #   sdf.set_value(2, 9, 2015)s

#Rename Unclear columns
sdf.columns.values[3] = "Vwrs. Change"
sdf.columns.values[5] = "Rtg. Change"

#drop rows with no game ratings and convert others to floats
sdf = sdf.dropna(subset = ["Rtg."])
sdf["Rtg."] = sdf["Rtg."].astype(float)
sdf = sdf.dropna(subset = ["Vwrs."])
sdf["Vwrs."] = sdf["Vwrs."].astype(float)
sdf = sdf.dropna(subset = ["Week"])
sdf["Week"] = sdf["Week"].astype(int)

sdf = sdf.set_index(["Year","Week"])

Part 1: Sports. Sports Never Changes.

Given cord-cutting and a turn from television to streaming, consumption of sports on conventional channels will decrease.

This first figure makes a simple line plot of ratings data against time.



In [5]:

    
fig = plt.figure()
sdf["Rtg."].plot(fontsize=6)
fig.suptitle("NFL Ratings Trend", fontsize=16)
plt.xlabel("Year and Week", fontsize=12)
plt.show()

Surprisingly, this heartbeat-esque plot shows that there is little or no change over time, with the range of starting and ending ratings being within the same 8-15 range. Perhaps gross viewership, a component of ratings (which are at least partially driven by percentage of viewers watching) but a separate statistic, shows a different story?



In [6]:

    
fig1 = plt.figure()
sdf["Vwrs."].plot(fontsize=6)
fig1.suptitle("NFL Viewership Trend", fontsize=16)
plt.xlabel("Year and Week", fontsize=12)
plt.show()

Aside from an aberrative number in the fall of 2013 (negative viewers would certainly be bad for TV), viewership numbers have at least weakly increased since 2013, with a good portion of 2013 under or around 10 million viewers and almost all of 2014-2015 above it, though 2016 appears to show decline.

Netflix, though has been strong over the same period. A look at (manually summarized) subscriber numbers shows this:



In [7]:

    
##ndf stands for netflix data frame
#data sourced from https://www.statista.com/statistics/250934/quarterly-number-of-netflix-streaming-subscribers-worldwide/
ndf = pd.DataFrame(dict(subscribers = pd.Series([44.35, 57.39, 74.76, 86.74]), year = pd.Series([2013,2014,2015,2016])))
ndf = ndf.set_index(ndf["year"])
ndf = ndf.drop(ndf.columns[1], axis=1)
ndf

fig2 = plt.figure()
ndf["subscribers"].plot(fontsize=6, kind="bar")
fig.suptitle("Netflix Subscriber Data", fontsize=16)
plt.xlabel("Year", fontsize=12)
plt.show()

Netflix subscribers have almost doubled over the same time period, indicating that though Netflix has grown, it has certainly not come at the expense of sports. Sports ratings and viewership are steady, and small inter-week fluctuations aside, sports programming shows no sign of change in this age of Netflix.



In [8]:

    
sdf = sdf.reset_index()

sdf = sdf.sort_values(by="Year")
sdf = sdf.sort_values(by="Week")

sdf = sdf.set_index(["Week"]) 

fig3 = plt.figure()
sdf["Rtg."].plot(fontsize=8, kind="bar")
fig3.suptitle("Weekly NFL Ratings Data", fontsize=16)
plt.xlabel("Week By Week", fontsize=12)
plt.show()

(Apologies for the nasty x-axis labeling, I tried to fix it for over an hour and abjectly failed. Essentially the x axis tracks weeks over time.)

A small digression, we wanted to test if weekly numbers showed a pattern across years, for example if sports rating increased on holidays or when college is in session. As the bar graph above attests, there is little pattern in the weekly data, an especially odd thought given the hype around start-of-season games and end-of-season playoff-deciding games.

Though specific networks such as ESPN may or may not be declining, the data shows that cross-channel trends in sports viewership are relatively stable.

With viewership and ratings holding steady as Netflix rises, and knowing that weekly trends are not masking other patterns, we believe that sports viewership has not been affected by streaming.

Part II: TV Watching & Opinions.



In [9]:

    
import sys                             # system module 
import pandas as pd                    # data package
import matplotlib.pyplot as plt # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for pandas 

%matplotlib inline                     

# check versions (overkill, but why not?)
print('Python version: ', sys.version)
print('Pandas version: ', pd.__version__)
print('Today: ', dt.date.today())

#I started by importing all the necessary packages.









    



Python version:  3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Today:  2016-12-22

AV Club Data

So, because of my inability to get Metacritic ratings per episode for TV shows, I chose to stick with one source for my tv ratings: The A/V Club. I was able to find the information courtesy of a post on r/dataisbeautiful with about 50-60 shows worth of information, including critic ratings, community rating, number of people who voted, etc. So before I used the critic data I needed, I decided to play around with the data and just see what I could find by messing around with it.



In [10]:

    
url = "http://jespajo.neocities.org/clubData.csv"
av = pd.read_csv(url)
avc = av.set_index(['show', 'season', 'epno'])
print(av.columns.values)
avc247 = avc.head(24)









    



['show' 'season' 'epno' 'epname' 'relepno' 'href' 'criticname' 'nratings'
 'critrating' 'commrating']



In [11]:

    
av.show.view









    Out[11]:





<bound method Series.view of 0              24
1              24
2              24
3              24
4              24
5              24
6              24
7              24
8              24
9              24
10             24
11             24
12             24
13             24
14             24
15             24
16             24
17             24
18             24
19             24
20             24
21             24
22             24
23             24
24             24
25             24
26             24
27             24
28             24
29             24
           ...   
17875     Younger
17876     Younger
17877     Younger
17878     Younger
17879     Younger
17880     Younger
17881     Younger
17882     Younger
17883     Younger
17884     Younger
17885     Younger
17886     Younger
17887     Younger
17888     Younger
17889     Younger
17890     Younger
17891     Younger
17892     Younger
17893     Younger
17894     Younger
17895     Younger
17896    Z Nation
17897    Z Nation
17898    Z Nation
17899    Z Nation
17900    Z Nation
17901    Z Nation
17902    Z Nation
17903    Z Nation
17904    Z Nation
Name: show, dtype: object>



In [12]:

    
avc247
print(avc247.dtypes)









    



epname         object
relepno         int64
href           object
criticname     object
nratings      float64
critrating    float64
commrating    float64
dtype: object

The code above describes the following: how I got the data I'd be working with in terms of critical appraisal and began to modify it. The way the data was originally formatted, the index was just simple numbers. So, I changed the index to show, season, and episode number, in that order. So from there, I wanted to see if there was some consistent correlation between critical and community reviews. Was there consistent over or under-scoring?



In [13]:

    
avcomm = av.groupby(['show', 'season'])['commrating'].mean()
avcrit = av.groupby(['show', 'season'])['critrating'].mean()



In [14]:

    
avcomm.head(25).plot.barh()
avcrit.head(25).plot.barh(alpha=0.5, color='red')









    Out[14]:





<matplotlib.axes._subplots.AxesSubplot at 0x1194d40b8>



In [15]:

    
avcrit[20:40]









    Out[15]:





show                             season
@Midnight                        1          8.000000
A Gifted Man                     1          5.625000
A To Z                           1          7.615385
Adventure Time                   4          9.100000
                                 5          9.450000
                                 6          9.285714
                                 7          9.250000
Alcatraz                         1          8.230769
Alias                            1         11.000000
                                 2               NaN
Almost Human                     1          7.538462
Alphas                           1          8.181818
                                 2          8.692308
America's Best Dance Crew        7          9.600000
America's Next Great Restaurant  1          6.444444
America's Next Top Model         10         8.545455
                                 11         8.750000
                                 12         6.900000
                                 13         7.166667
                                 14         8.000000
Name: critrating, dtype: float64



In [16]:

    
avdiff = avcomm - avcrit
avdiff.sort_values(ascending=True)
avdiff.dropna().sort_values(ascending=True).head(10)









    Out[16]:





show                      season
Rock of Love              2        -6.615385
Frankenstein M.D.         1        -4.555556
Gossip Girl               4        -4.500000
Peter Pan Live!           1        -4.242647
The Wil Wheaton Project   1        -4.000000
Witches Of East End       2        -4.000000
America's Next Top Model  19       -3.438462
The Late Late Show        1        -3.384615
Steven Universe           2        -3.279881
The Sound Of Music Live!  1        -3.200000
dtype: float64



In [17]:

    
avc.T









    Out[17]:






  
    
      show
      24
      ...
      Younger
      Z Nation
    
    
      season
      7
      ...
      2
      3
    
    
      epno
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      ...
      12
      1
      2
      3
      4
      5
      6
      7
      8
      9
    
  
  
    
      epname
      Day 7: 8:00 A.M.-9:00 A.M.
      Day 7: 9:00 A.M.-10:00 A.M.
      Day 7: 10:00 A.M.-11:00 A.M.
      Day 7: 11:00 A.M.-12:00 P.M.
      Day 7: 12:00 P.M.-1:00 P.M.
      Day 7: 1:00 P.M.-2:00 P.M.
      Day 7: 2:00 P.M.-3:00 P.M.
      Day 7: 3:00 P.M.-4:00 P.M.
      Day 7: 4:00 P.M.-5:00 P.M.
      Day 7: 5:00 P.M.-6:00 P.M.
      ...
      No Weddings & a Funeral
      No Mercy
      A New Mission
      Murphy's Miracle
      Escorpion and the Red Hand
      Little Red and the Wolfz
      Doc Flew Over the Cuckoo's Nest
      Welcome to Murphytown
      Election Day
      Heart of Darkness
    
    
      relepno
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      ...
      23
      1
      2
      3
      4
      5
      6
      7
      8
      9
    
    
      href
      /tvclub/24-800-am-1000-am-22203
      /tvclub/24-800-am-1000-am-22203
      /tvclub/24-1000-am-1200-pm-22238
      /tvclub/24-1000-am-1200-pm-22238
      /tvclub/24-1200-pm-100-pm-22862
      /tvclub/24-100-pm-200-pm-23085
      /tvclub/24-200-pm-300-pm-23330
      /tvclub/24-300pm-400pm-23625
      /tvclub/24-400-pm-500-pm-23898
      /tvclub/24-500pm-600pm-24278
      ...
      /tvclub/friendship-trumps-shipping-younger-sea...
      /tvclub/z-nation-returns-guns-blazing-stand-al...
      /tvclub/theres-brand-new-mission-z-nation-team...
      /tvclub/zombies-arent-only-things-moving-slow-...
      /tvclub/z-nation-gets-its-gross-groove-back-vi...
      /tvclub/z-nation-dives-deep-survivors-ordeal-a...
      /tvclub/z-nation-visits-mental-asylum-and-ends...
      /tvclub/z-nations-table-setting-presents-two-h...
      /tvclub/z-nation-presidential-election-basical...
      /tvclub/its-no-metaphor-when-death-comes-seatt...
    
    
      criticname
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      Zack Handlen
      ...
      Alexa Planje
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
      Alex McCown-Levy
    
    
      nratings
      2
      1
      1
      1
      1
      1
      1
      1
      1
      NaN
      ...
      24
      24
      21
      15
      15
      13
      16
      16
      12
      10
    
    
      critrating
      10
      10
      8
      8
      7
      9
      9
      10
      8
      9
      ...
      9
      7
      8
      5
      9
      10
      5
      8
      9
      8
    
    
      commrating
      9.5
      10
      9
      9
      8
      10
      7
      10
      8
      NaN
      ...
      8.83333
      9.29167
      8.66667
      8.26667
      9.6
      9.15385
      7.6875
      9.4375
      8.58333
      7.9
    
  

7 rows × 17905 columns



In [18]:

    
avdiff.dropna().sort_values(ascending=True).tail(10)









    Out[18]:





show                       season
Workaholics                2         3.952381
Pickle and Peanut          1         4.333333
The Royals                 1         4.413793
Sharknado                  4         4.464286
Psych                      5         4.475000
The Jack And Triumph Show  1         5.389610
Sharknado                  3         5.409091
Gossip Girl                6         6.159794
The Big C                  4         6.382353
Ready For Love             1         8.333333
dtype: float64



In [19]:

    
avc248 = avc.T[('24', '8')].T
avc248









    



/Users/sglyon/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)






    Out[19]:






  
    
      
      epname
      relepno
      href
      criticname
      nratings
      critrating
      commrating
    
    
      epno
      
      
      
      
      
      
      
    
  
  
    
      1
      Day 8: 4:00P.M. - 5:00P.M.
      25
      /tvclub/24-400pm-500pm500pm-600pm-37169
      Zack Handlen
      22
      7
      7.18182
    
    
      2
      Day 8: 5:00P.M. - 6:00P.M.
      26
      /tvclub/24-400pm-500pm500pm-600pm-37169
      Zack Handlen
      20
      7
      7.2
    
    
      3
      Day 8: 6:00P.M. - 7:00P.M.
      27
      /tvclub/24-600pm-700pm700pm-800pm-37214
      Zack Handlen
      23
      9
      8.82609
    
    
      4
      Day 8: 7:00P.M. - 8:00P.M.
      28
      /tvclub/24-600pm-700pm700pm-800pm-37214
      Zack Handlen
      23
      9
      8.95652
    
    
      5
      Day 8: 8:00P.M. - 9:00P.M.
      29
      /tvclub/24-800pm-900pm-37513
      Zack Handlen
      23
      6
      5.91304
    
    
      6
      Day 8: 9:00P.M. - 10:00P.M.
      30
      /tvclub/24-900pm-1000pm-37772
      Zack Handlen
      19
      5
      4.73684
    
    
      7
      Day 8: 10:00P.M. - 11:00P.M.
      31
      /tvclub/24-1000pm-1100pm-38024
      Zack Handlen
      26
      7
      6.57692
    
    
      8
      Day 8: 11:00P.M. - 12:00A.M.
      32
      /tvclub/24-1100pm-1200am-38252
      Zack Handlen
      16
      9
      8.0625
    
    
      9
      Day 8: 12:00A.M. - 1:00A.M.
      33
      /tvclub/24-1200am-100am-38495
      Zack Handlen
      21
      9
      8
    
    
      10
      Day 8: 1:00A.M. - 2:00A.M.
      34
      /tvclub/24-100am-200am-38707
      Zack Handlen
      20
      9
      7.15
    
    
      11
      Day 8: 2:00A.M. - 3:00A.M.
      35
      /tvclub/24-200am-300am-38999
      Zack Handlen
      22
      7
      6.72727
    
    
      12
      Day 8: 3:00A.M. - 4:00A.M.
      36
      /tvclub/24-300am-400am-39242
      Zack Handlen
      26
      5
      6.69231
    
    
      13
      Day 8: 4:00A.M. - 5:00A.M.
      37
      /tvclub/24-400am-500am-39449
      Zack Handlen
      24
      6
      5.125
    
    
      14
      Day 8: 5:00A.M. - 6:00A.M.
      38
      /tvclub/24-500am-600am-39655
      Zack Handlen
      24
      5
      6.33333
    
    
      15
      Day 8: 6:00A.M. - 7:00A.M.
      39
      /tvclub/24-600am-800am-39852
      Zack Handlen
      43
      6
      8.25581
    
    
      16
      Day 8: 7:00A.M. - 8:00A.M.
      40
      /tvclub/24-600am-800am-39852
      Zack Handlen
      43
      6
      8.25581
    
    
      17
      Day 8: 8:00A.M. - 9:00A.M.
      41
      /tvclub/24-800am-900am-40041
      Zack Handlen
      24
      8
      8.04167
    
    
      18
      Day 8: 9:00A.M. - 10:00A.M.
      42
      /tvclub/24-900am-1000am-40283
      Zack Handlen
      19
      8
      7.52632
    
    
      19
      Day 8: 10:00A.M. - 11:00A.M.
      43
      /tvclub/24-1000am-1100am-40509
      Zack Handlen
      25
      4
      5.12
    
    
      20
      Day 8: 11:00A.M. - 12:00P.M.
      44
      /tvclub/24-1100am-1200pm-40758
      Zack Handlen
      19
      6
      7.21053
    
    
      21
      Day 8: 12:00P.M. - 1:00P.M.
      45
      /tvclub/24-1200pm-100pm-41026
      Zack Handlen
      18
      9
      7.72222
    
    
      22
      Day 8: 1:00P.M. - 2:00P.M.
      46
      /tvclub/24-100pm-200pm-41254
      Zack Handlen
      20
      10
      8.9
    
    
      23
      Day 8: 2:00P.M. - 3:00P.M.
      47
      /tvclub/24-200pm-400pm-41481
      Zack Handlen
      39
      8
      8.48718
    
    
      24
      Day 8: 3:00P.M. - 4:00P.M.
      48
      /tvclub/24-200pm-400pm-41481
      Zack Handlen
      38
      8
      8.44737



In [20]:

    
e=avc.T[('How I Met Your Mother', '3')].T









    



/Users/sglyon/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)

Now, this is just the data from the AVClub. I was having a lot of trouble installing IMDB and Metacritic packages on my computer. Once the packages install, I can query the sites and use their data to see if the AVClub is a consistent over- or under-scorer. In addition, with the TV ratings data Vishal has, I can look at the series broadcasting from 2013 onwards and compare viewership trends juxtaposed against quality.



In [21]:

    
import pandas as pd             # data package
import matplotlib.pyplot as plt # graphics 
import datetime as dt           # date tools, used to note current date  
import sys

# these are new 
import requests

print('\nPython version: ', sys.version) 
print('Pandas version: ', pd.__version__)
print('Requests version: ', requests.__version__)
print("Today's date:", dt.date.today())









    



Python version:  3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Requests version:  2.12.3
Today's date: 2016-12-22



In [22]:

    
import json
import pandas as pd



In [23]:

    
import pandas as pd
himymratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse()









    



---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-23-194c528f26eb> in <module>()
      1 import pandas as pd
----> 2 himymratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse()

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/io/excel.py in __init__(self, io, **kwds)
    247             self.book = xlrd.open_workbook(file_contents=data)
    248         elif isinstance(io, compat.string_types):
--> 249             self.book = xlrd.open_workbook(io)
    250         else:
    251             raise ValueError('Must explicitly set engine if not passing in'

/Users/sglyon/anaconda3/lib/python3.5/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    393         peek = file_contents[:peeksz]
    394     else:
--> 395         with open(filename, "rb") as f:
    396             peek = f.read(peeksz)
    397     if peek == b"PK\x03\x04": # a ZIP file

FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx'

PLEASE READ!

Hey Professors, so something odd happens with this following bit of code. The "a" variable basically generates a list of lists, but the ordering of those lists changes every time this notebook is opened for the first time. So, if you run this code and the graphs don't work, check the list ordering in 'a' and refill the values for the date, rating, title and number variables below, for the three shows that use this data. I know it's tedious and honestly bad this problem is occuring in the first place without a solution, but I couldn't find a solution in time, and would just let you know. Sorry in advance.



In [24]:

    
url = 'https://raw.githubusercontent.com/leosartaj/tvstats/master/data/jsonData/himym.json'
himym = pd.read_json(url)

himym.episodes[1]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[1]])]
himymdate1 = a[0]
himymrating1 = a[1]
himymtitle1 = a[2]
himymnumber1 = a[3]
himyms1 = pd.DataFrame(
    {'Episode Number': himymnumber1,
     'Episode Title': himymtitle1,
     'Rating': himymrating1
    })

himym.episodes[2]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[2]])]
himymdate2 = a[0]
himymrating2 = a[1]
himymtitle2 = a[2]
himymnumber2 = a[3]
himyms2 = pd.DataFrame(
    {'Episode Number': himymnumber2,
     'Episode Title': himymtitle2,
     'Rating': himymrating2
    })
  
himym.episodes[3]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[3]])]
himymdate3 = a[0]
himymrating3 = a[1]
himymtitle3 = a[2]
himymnumber3 = a[3]
himyms3 = pd.DataFrame(
    {'Episode Number': himymnumber3,
     'Episode Title': himymtitle3,
     'Rating': himymrating3
    })

himym.episodes[4]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[4]])]
himymdate4 = a[0]
himymrating4 = a[1]
himymtitle4 = a[2]
himymnumber4 = a[3]
himyms4 = pd.DataFrame(
    {'Episode Number': himymnumber4,
     'Episode Title': himymtitle4,
     'Rating': himymrating4
    })

himym.episodes[5]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[5]])]
himymdate5 = a[0]
himymrating5 = a[1]
himymtitle5 = a[2]
himymnumber5 = a[3]
himyms5 = pd.DataFrame(
    {'Episode Number': himymnumber5,
     'Episode Title': himymtitle5,
     'Rating': himymrating5
    })

himym.episodes[6]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[6]])]
himymdate6 = a[0]
himymrating6 = a[1]
himymtitle6 = a[2]
himymnumber6 = a[3]
himyms6 = pd.DataFrame(
    {'Episode Number': himymnumber6,
     'Episode Title': himymtitle6,
     'Rating': himymrating6
    })
 
himym.episodes[7]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[7]])]
himymdate7 = a[0]
himymrating7 = a[1]
himymtitle7 = a[2]
himymnumber7 = a[3]
himyms7 = pd.DataFrame(
    {'Episode Number': himymnumber7,
     'Episode Title': himymtitle7,
     'Rating': himymrating7
    })
     
himym.episodes[8]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[8]])]
himymdate8 = a[0]
himymrating8 = a[1]
himymtitle8 = a[2]
himymnumber8 = a[3]
himyms8 = pd.DataFrame(
    {'Episode Number': himymnumber8,
     'Episode Title': himymtitle8,
     'Rating': himymrating8
    })

himym.episodes[9]
a = [list(col) for col in zip(*[d.values() for d in himym.episodes[9]])]
himymdate9 = a[0]
himymrating9 = a[1]
himymtitle9 = a[2]
himymnumber9 = a[3]
himyms9 = pd.DataFrame(
    {'Episode Number': himymnumber9,
     'Episode Title': himymtitle9,
     'Rating': himymrating9
    })


a









    Out[24]:





[['7.7',
  '7.7',
  '7.5',
  '7.1',
  '7.1',
  '7.4',
  '7.1',
  '7.2',
  '8.3',
  '6.5',
  '5.5',
  '7.6',
  '8.2',
  '5.5',
  '8.4',
  '9.5',
  '8.1',
  '8.0',
  '7.6',
  '8.4',
  '8.3',
  '8.9',
  '6.9',
  '5.7'],
 ['The Locket',
  'Coming Back',
  'Last Time in New York',
  'The Broken Code',
  'The Poker Game',
  'Knight Vision',
  'No Questions Asked',
  'The Lighthouse',
  'Platonish',
  'Mom and Dad',
  'Bedtime Stories',
  'The Rehearsal Dinner',
  'Bass Player Wanted',
  'Slapsgiving 3: Slappointment in Slapmarra',
  'Unpause',
  'How Your Mother Met Me',
  'Sunrise',
  'Rally',
  'Vesuvius',
  'Daisy',
  'Gary Blauman',
  'The End of the Aisle',
  'Last Forever: Part One',
  'Last Forever: Part Two'],
 ['1',
  '2',
  '3',
  '4',
  '5',
  '6',
  '7',
  '8',
  '9',
  '10',
  '11',
  '12',
  '13',
  '14',
  '15',
  '16',
  '17',
  '18',
  '19',
  '20',
  '21',
  '22',
  '23',
  '24'],
 ['23 Sep. 2013',
  '23 Sep. 2013',
  '30 Sep. 2013',
  '7 Oct. 2013',
  '14 Oct. 2013',
  '21 Oct. 2013',
  '28 Oct. 2013',
  '4 Nov. 2013',
  '11 Nov. 2013',
  '18 Nov. 2013',
  '25 Nov. 2013',
  '2 Dec. 2013',
  '16 Dec. 2013',
  '13 Jan. 2014',
  '20 Jan. 2014',
  '27 Jan. 2014',
  '3 Feb. 2014',
  '24 Feb. 2014',
  '3 Mar. 2014',
  '10 Mar. 2014',
  '17 Mar. 2014',
  '24 Mar. 2014',
  '31 Mar. 2014',
  '31 Mar. 2014']]



In [25]:

    
himymratings2 = himymratings.set_index(['Season', 'No. in Season'])
d = himymratings2.T[(8)].T
e = avc.T[('How I Met Your Mother', '8')].T









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-42cda333eadb> in <module>()
----> 1 himymratings2 = himymratings.set_index(['Season', 'No. in Season'])
      2 d = himymratings2.T[(8)].T
      3 e = avc.T[('How I Met Your Mother', '8')].T

NameError: name 'himymratings' is not defined



In [26]:

    
fig, axe = plt.subplots()
e.plot(y='critrating', ax=axe)
himyms8.Rating.convert_objects(convert_numeric = True).plot(ax=axe)









    



/Users/sglyon/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:3: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  app.launch_new_instance()






    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-5bbacbacdcc2> in <module>()
      1 fig, axe = plt.subplots()
      2 e.plot(y='critrating', ax=axe)
----> 3 himyms8.Rating.convert_objects(convert_numeric = True).plot(ax=axe)

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   3597                            colormap=colormap, table=table, yerr=yerr,
   3598                            xerr=xerr, label=label, secondary_y=secondary_y,
-> 3599                            **kwds)
   3600     __call__.__doc__ = plot_series.__doc__
   3601 

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2671                  yerr=yerr, xerr=xerr,
   2672                  label=label, secondary_y=secondary_y,
-> 2673                  **kwds)
   2674 
   2675 

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   2467         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2468 
-> 2469     plot_obj.generate()
   2470     plot_obj.draw()
   2471     return plot_obj.result

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py in generate(self)
   1039     def generate(self):
   1040         self._args_adjust()
-> 1041         self._compute_plot_data()
   1042         self._setup_subplots()
   1043         self._make_plot()

/Users/sglyon/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py in _compute_plot_data(self)
   1148         if is_empty:
   1149             raise TypeError('Empty {0!r}: no numeric data to '
-> 1150                             'plot'.format(numeric_data.__class__.__name__))
   1151 
   1152         self.data = numeric_data

TypeError: Empty 'DataFrame': no numeric data to plot



In [112]:

    
e.critrating - himyms8.Rating.convert_objects(convert_numeric = True)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[112]:





0     NaN
1     NaN
2     NaN
3     NaN
4    -1.3
5     1.6
6     2.7
7     0.6
8     0.6
9     1.1
10   -2.5
11    1.7
12    3.2
13    1.1
14   -2.1
15    2.5
16   -2.7
17      0
18    3.2
19      2
20    1.7
21   -1.5
22    1.6
23   -0.6
24    NaN
dtype: object



In [113]:

    
himyms1 = himyms1.set_index('Episode Number')
himyms2 = himyms2.set_index('Episode Number')
himyms3 = himyms3.set_index('Episode Number')
himyms4 = himyms4.set_index('Episode Number')
himyms5 = himyms5.set_index('Episode Number')
himyms6 = himyms6.set_index('Episode Number')
himyms7 = himyms7.set_index('Episode Number')
himyms8 = himyms8.set_index('Episode Number')
himyms9 = himyms9.set_index('Episode Number')



In [114]:

    
frames = [himyms1, himyms2, himyms3, himyms4, himyms5, himyms6, himyms7, himyms8, himyms9]
result = pd.concat(frames, keys=['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5', 'Season 6', 'Season 7', 'Season 8', 'Season 9'])
result









    Out[114]:






  
    
      
      
      Episode Title
      Rating
    
    
      
      Episode Number
      
      
    
  
  
    
      Season 1
      1
      Pilot
      8.5
    
    
      2
      Purple Giraffe
      8.2
    
    
      3
      The Sweet Taste of Liberty
      8.2
    
    
      4
      Return of the Shirt
      8.1
    
    
      5
      Okay Awesome
      8.4
    
    
      6
      The Slutty Pumpkin
      8.2
    
    
      7
      Matchmaker
      7.8
    
    
      8
      The Duel
      8.2
    
    
      9
      Belly Full of Turkey
      8.1
    
    
      10
      The Pineapple Incident
      9.2
    
    
      11
      The Limo
      8.3
    
    
      12
      The Wedding
      8.2
    
    
      13
      Drumroll, Please
      8.7
    
    
      14
      Zip, Zip, Zip
      8.2
    
    
      15
      Game Night
      9.1
    
    
      16
      Cupcake
      7.9
    
    
      17
      Life Among the Gorillas
      7.8
    
    
      18
      Nothing Good Happens After 2 AM
      8.4
    
    
      19
      Mary the Paralegal
      8.8
    
    
      20
      Best Prom Ever
      7.9
    
    
      21
      Milk
      8.2
    
    
      22
      Come On
      8.8
    
    
      Season 2
      1
      Where Were We?
      8.2
    
    
      2
      The Scorpion and the Toad
      8.3
    
    
      3
      Brunch
      8.5
    
    
      4
      Ted Mosby, Architect
      8.8
    
    
      5
      World's Greatest Couple
      8.8
    
    
      6
      Aldrin Justice
      8.2
    
    
      7
      Swarley
      9.0
    
    
      8
      Atlantic City
      8.0
    
    
      ...
      ...
      ...
      ...
    
    
      Season 8
      19
      The Fortress
      7.8
    
    
      20
      The Time Travelers
      8.0
    
    
      21
      Romeward Bound
      7.3
    
    
      22
      The Bro Mitzvah
      8.5
    
    
      23
      Something Old
      7.4
    
    
      24
      Something New
      8.6
    
    
      Season 9
      1
      The Locket
      7.7
    
    
      2
      Coming Back
      7.7
    
    
      3
      Last Time in New York
      7.5
    
    
      4
      The Broken Code
      7.1
    
    
      5
      The Poker Game
      7.1
    
    
      6
      Knight Vision
      7.4
    
    
      7
      No Questions Asked
      7.1
    
    
      8
      The Lighthouse
      7.2
    
    
      9
      Platonish
      8.3
    
    
      10
      Mom and Dad
      6.5
    
    
      11
      Bedtime Stories
      5.5
    
    
      12
      The Rehearsal Dinner
      7.6
    
    
      13
      Bass Player Wanted
      8.2
    
    
      14
      Slapsgiving 3: Slappointment in Slapmarra
      5.5
    
    
      15
      Unpause
      8.4
    
    
      16
      How Your Mother Met Me
      9.5
    
    
      17
      Sunrise
      8.1
    
    
      18
      Rally
      8.0
    
    
      19
      Vesuvius
      7.6
    
    
      20
      Daisy
      8.4
    
    
      21
      Gary Blauman
      8.3
    
    
      22
      The End of the Aisle
      8.9
    
    
      23
      Last Forever: Part One
      6.9
    
    
      24
      Last Forever: Part Two
      5.7
    
  

208 rows × 2 columns



In [115]:

    
resultrating = result.Rating.convert_objects(convert_numeric = True)
resultrating['Season 8']









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[115]:





Episode Number
1     8.5
2     7.3
3     7.3
4     7.4
5     7.3
6     7.4
7     7.3
8     7.4
9     7.4
10    7.9
11    8.5
12    9.3
13    7.8
14    7.9
15    8.1
16    7.5
17    7.7
18    8.0
19    7.8
20    8.0
21    7.3
22    8.5
23    7.4
24    8.6
Name: Rating, dtype: float64

Hannibal



In [116]:

    
url = 'https://raw.githubusercontent.com/leosartaj/tvstats/master/data/jsonData/hannibal.json'
hannibal = pd.read_json(url)
hannibal.episodes[1]
a = [list(col) for col in zip(*[d.values() for d in hannibal.episodes[1]])]
hannibaldate1 = a[0]
hannibalrating1 = a[1]
hannibaltitle1 = a[2]
hannibalnumber1 = a[3]
hannibals1 = pd.DataFrame(
    {'Episode Number': hannibalnumber1,
     'Episode Title': hannibaltitle1,
     'Rating': hannibalrating1
    })

hannibal.episodes[2]
a = [list(col) for col in zip(*[d.values() for d in hannibal.episodes[2]])]
hannibaldate2 = a[0]
hannibalrating2 = a[1]
hannibaltitle2 = a[2]
hannibalnumber2 = a[3]
hannibals2 = pd.DataFrame(
    {'Episode Number': hannibalnumber2,
     'Episode Title': hannibaltitle2,
     'Rating': hannibalrating2
    })

hannibal.episodes[3]
a = [list(col) for col in zip(*[d.values() for d in hannibal.episodes[3]])]
hannibaldate3 = a[0]
hannibalrating3 = a[1]
hannibaltitle3 = a[2]
hannibalnumber3 = a[3]
hannibals3 = pd.DataFrame(
    {'Episode Number': hannibalnumber3,
     'Episode Title': hannibaltitle3,
     'Rating': hannibalrating3
    })

hannibals1 = hannibals1.set_index('Episode Number')
hannibals2 = hannibals2.set_index('Episode Number')
hannibals3 = hannibals3.set_index('Episode Number')

haframes = [hannibals1, hannibals2]
hannibalfinal = pd.concat(haframes, keys=['Season 1', 'Season 2'])



In [286]:

    
avhannibal = avc.T['Hannibal'].T









    Out[286]:






  
    
      
      Episode Title
      Rating
    
    
      Episode Number
      
      
    
  
  
    
      1
      Antipasto
      None
    
    
      2
      Primavera
      None
    
    
      3
      Secondo
      None
    
    
      4
      Episode #3.4
      None
    
    
      5
      Episode #3.5
      None
    
    
      6
      Episode #3.6
      None
    
    
      7
      Digestivo
      None
    
    
      8
      The Great Red Dragon
      None
    
    
      9
      ...And the Woman Clothed in Sun
      None
    
    
      10
      ...And the Woman Clothed with the Sun
      None
    
    
      11
      ...And the Beast from the Sea
      None
    
    
      12
      Episode #3.12
      None
    
    
      13
      The Wrath of the Lamb
      None



In [118]:

    
hannibaltvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Hannibal')
hannibaltvratings = hannibaltvratings.set_index('Number Overall')

Breaking Bad



In [119]:

    
url = 'https://raw.githubusercontent.com/leosartaj/tvstats/master/data/jsonData/breakingBad.json'
bb = pd.read_json(url)

bb.episodes[1]
a = [list(col) for col in zip(*[d.values() for d in bb.episodes[1]])]
bbdate1 = a[0]
bbrating1 = a[1]
bbtitle1 = a[2]
bbnumber1 = a[3]
bbs1 = pd.DataFrame(
    {'Episode Number': bbnumber1,
     'Episode Title': bbtitle1,
     'Rating': bbrating1
    })

bb.episodes[2]
a = [list(col) for col in zip(*[d.values() for d in bb.episodes[2]])]
bbdate2 = a[0]
bbrating2 = a[1]
bbtitle2 = a[2]
bbnumber2 = a[3]
bbs2 = pd.DataFrame(
    {'Episode Number': bbnumber2,
     'Episode Title': bbtitle2,
     'Rating': bbrating2
    })

bb.episodes[3]
a = [list(col) for col in zip(*[d.values() for d in bb.episodes[3]])]
bbdate3 = a[2]
bbrating3 = a[0]
bbtitle3 = a[3]
bbnumber3 = a[1]
bbs3 = pd.DataFrame(
    {'Episode Number': bbnumber3,
     'Episode Title': bbtitle3,
     'Rating': bbrating3
    })

bb.episodes[4]
a = [list(col) for col in zip(*[d.values() for d in bb.episodes[4]])]
bbdate4 = a[0]
bbrating4 = a[1]
bbtitle4 = a[2]
bbnumber4 = a[3]
bbs4 = pd.DataFrame(
    {'Episode Number': bbnumber4,
     'Episode Title': bbtitle4,
     'Rating': bbrating4
    })

bb.episodes[5]
a = [list(col) for col in zip(*[d.values() for d in bb.episodes[5]])]
bbdate5 = a[0]
bbrating5 = a[1]
bbtitle5 = a[2]
bbnumber5 = a[3]
bbs5 = pd.DataFrame(
    {'Episode Number': bbnumber5,
     'Episode Title': bbtitle5,
     'Rating': bbrating5}
    )

bbs1
bbs1 = bbs1.set_index('Episode Number')     
bbs2









    Out[119]:






  
    
      
      Episode Number
      Episode Title
      Rating
    
  
  
    
      0
      1
      Seven Thirty-Seven
      7.9
    
    
      1
      2
      Grilled
      8.5
    
    
      2
      3
      Bit by a Dead Bee
      7.6
    
    
      3
      4
      Down
      7.5
    
    
      4
      5
      Breakage
      7.6
    
    
      5
      6
      Peekaboo
      8.1
    
    
      6
      7
      Negro Y Azul
      7.9
    
    
      7
      8
      Better Call Saul
      8.3
    
    
      8
      9
      4 Days Out
      8.2
    
    
      9
      10
      Over
      7.8
    
    
      10
      11
      Mandala
      8.0
    
    
      11
      12
      Phoenix
      8.4
    
    
      12
      13
      ABQ
      8.4



In [120]:

    
bbs1
bbs2 = bbs2.set_index('Episode Number')
bbs3 = bbs3.set_index('Episode Number')
bbs4 = bbs4.set_index('Episode Number')
bbs5 = bbs5.set_index('Episode Number')

bbframes = [bbs1, bbs2, bbs3, bbs4, bbs5]
bbfinal = pd.concat(bbframes, keys=['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5'])

bbfinal









    Out[120]:






  
    
      
      
      Episode Title
      Rating
    
    
      
      Episode Number
      
      
    
  
  
    
      Season 1
      1
      Pilot
      8.5
    
    
      2
      Cat's in the Bag...
      8.0
    
    
      3
      ...And the Bag's in the River
      8.1
    
    
      4
      Cancer Man
      7.7
    
    
      5
      Gray Matter
      7.7
    
    
      6
      Crazy Handful of Nothin'
      8.5
    
    
      7
      A No-Rough-Stuff-Type Deal
      8.1
    
    
      Season 2
      1
      Seven Thirty-Seven
      7.9
    
    
      2
      Grilled
      8.5
    
    
      3
      Bit by a Dead Bee
      7.6
    
    
      4
      Down
      7.5
    
    
      5
      Breakage
      7.6
    
    
      6
      Peekaboo
      8.1
    
    
      7
      Negro Y Azul
      7.9
    
    
      8
      Better Call Saul
      8.3
    
    
      9
      4 Days Out
      8.2
    
    
      10
      Over
      7.8
    
    
      11
      Mandala
      8.0
    
    
      12
      Phoenix
      8.4
    
    
      13
      ABQ
      8.4
    
    
      Season 3
      7.8
      1
      21 Mar. 2010
    
    
      7.8
      2
      28 Mar. 2010
    
    
      7.6
      3
      4 Apr. 2010
    
    
      7.5
      4
      11 Apr. 2010
    
    
      7.8
      5
      18 Apr. 2010
    
    
      8.4
      6
      25 Apr. 2010
    
    
      8.8
      7
      2 May 2010
    
    
      7.9
      8
      9 May 2010
    
    
      7.6
      9
      16 May 2010
    
    
      7.3
      10
      23 May 2010
    
    
      ...
      ...
      ...
    
    
      8.9
      13
      13 Jun. 2010
    
    
      Season 4
      1
      Box Cutter
      8.4
    
    
      2
      Thirty-Eight Snub
      7.5
    
    
      3
      Open House
      7.4
    
    
      4
      Bullet Points
      7.8
    
    
      5
      Shotgun
      7.8
    
    
      6
      Cornered
      7.6
    
    
      7
      Problem Dog
      8.0
    
    
      8
      Hermanos
      8.4
    
    
      9
      Bug
      8.0
    
    
      10
      Salud
      8.8
    
    
      11
      Crawl Space
      9.0
    
    
      12
      End Times
      8.6
    
    
      13
      Face Off
      9.6
    
    
      Season 5
      1
      Live Free or Die
      8.6
    
    
      2
      Madrigal
      8.2
    
    
      3
      Hazard Pay
      8.1
    
    
      4
      Fifty-One
      8.0
    
    
      5
      Dead Freight
      9.1
    
    
      6
      Buyout
      8.2
    
    
      7
      Say My Name
      8.8
    
    
      8
      Gliding Over All
      9.0
    
    
      9
      Blood Money
      8.8
    
    
      10
      Buried
      8.4
    
    
      11
      Confessions
      9.0
    
    
      12
      Rabid Dog
      8.4
    
    
      13
      To'hajiilee
      9.4
    
    
      14
      Ozymandias
      9.9
    
    
      15
      Granite State
      9.3
    
    
      16
      Felina
      9.9
    
  

62 rows × 2 columns



In [121]:

    
bbtvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Breaking Bad')
bbtvratings









    Out[121]:






  
    
      
      Number Overall
      Season
      No. in Season
      Title
      Directed By
      Written By
      Air Date
      Ratings (in Millions)
    
  
  
    
      0
      1
      1
      1
      "Pilot"
      Vince Gilligan
      Vince Gilligan
      January 20, 2008
      1.41
    
    
      1
      2
      1
      2
      "Cat's in the Bag..."
      Adam Bernstein
      Vince Gilligan
      January 27, 2008
      1.49
    
    
      2
      3
      1
      3
      "...And the Bag's in the River"
      Adam Bernstein
      Vince Gilligan
      February 10, 2008
      1.08
    
    
      3
      4
      1
      4
      "Cancer Man"
      Jim McKay
      Vince Gilligan
      February 17, 2008
      1.09
    
    
      4
      5
      1
      5
      "Gray Matter"
      Tricia Brock
      Patty Lin
      February 24, 2008
      0.97
    
    
      5
      6
      1
      6
      "Crazy Handful of Nothin'"
      Bronwen Hughes
      George Mastras
      March 2, 2008
      1.07
    
    
      6
      7
      1
      7
      "A No-Rough-Stuff-Type Deal"
      Tim Hunter
      Peter Gould
      March 9, 2008
      1.50
    
    
      7
      8
      2
      1
      "Seven Thirty-Seven"
      Bryan Cranston
      J. Roberts
      March 8, 2009
      1.66
    
    
      8
      9
      2
      2
      "Grilled"
      Charles Haid
      George Mastras
      March 15, 2009
      1.60
    
    
      9
      10
      2
      3
      "Bit by a Dead Bee"
      Terry McDonough
      Peter Gould
      March 22, 2009
      1.13
    
    
      10
      11
      2
      4
      "Down"
      John Dahl
      Sam Catlin
      March 29, 2009
      1.29
    
    
      11
      12
      2
      5
      "Breakage"
      Johan Renck
      Moira Walley-Beckett
      April 5, 2009
      1.21
    
    
      12
      13
      2
      6
      "Peekaboo"
      Peter Medak
      J. Roberts & Vince Gilligan
      April 12, 2009
      1.41
    
    
      13
      14
      2
      7
      "Negro y Azul"
      Felix Alcala
      John Shiban
      April 19, 2009
      NaN
    
    
      14
      15
      2
      8
      "Better Call Saul"
      Terry McDonough
      Peter Gould
      April 26, 2009
      1.04
    
    
      15
      16
      2
      9
      "4 Days Out"
      Michelle MacLaren
      Sam Catlin
      May 3, 2009
      NaN
    
    
      16
      17
      2
      10
      "Over"
      Phil Abraham
      Moira Walley-Beckett
      May 10, 2009
      NaN
    
    
      17
      18
      2
      11
      "Mandala"
      Adam Bernstein
      George Mastras
      May 17, 2009
      NaN
    
    
      18
      19
      2
      12
      "Phoenix"
      Colin Bucksey
      John Shiban
      May 24, 2009
      NaN
    
    
      19
      20
      2
      13
      "ABQ"
      Adam Bernstein
      Vince Gilligan
      May 31, 2009
      1.50
    
    
      20
      21
      3
      1
      "No Más"
      Bryan Cranston
      Vince Gilligan
      March 21, 2010
      1.95
    
    
      21
      22
      3
      2
      "Caballo Sin Nombre"
      Adam Bernstein
      Peter Gould
      March 28, 2010
      1.55
    
    
      22
      23
      3
      3
      "I.F.T."
      Michelle MacLaren
      George Mastras
      April 4, 2010
      1.33
    
    
      23
      24
      3
      4
      "Green Light"
      Scott Winant
      Sam Catlin
      April 11, 2010
      1.46
    
    
      24
      25
      3
      5
      "Más"
      Johan Renck
      Moira Walley-Beckett
      April 18, 2010
      1.61
    
    
      25
      26
      3
      6
      "Sunset"
      John Shiban
      John Shiban
      April 25, 2010
      1.64
    
    
      26
      27
      3
      7
      "One Minute"
      Michelle MacLaren
      Thomas Schnauz
      May 2, 2010
      1.52
    
    
      27
      28
      3
      8
      "I See You"
      Colin Bucksey
      Gennifer Hutchison
      May 9, 2010
      1.78
    
    
      28
      29
      3
      9
      "Kafkaesque"
      Michael Slovis
      Peter Gould & George Mastras
      May 16, 2010
      1.61
    
    
      29
      30
      3
      10
      "Fly"
      Rian Johnson
      Sam Catlin & Moira Walley-Beckett
      May 23, 2010
      1.20
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      32
      33
      3
      13
      "Full Measure"
      Vince Gilligan
      Vince Gilligan
      June 13, 2010
      1.56
    
    
      33
      34
      4
      1
      "Box Cutter"
      Adam Bernstein
      Vince Gilligan
      July 17, 2011
      2.58
    
    
      34
      35
      4
      2
      "Thirty-Eight Snub"
      Michelle MacLaren
      George Mastras
      July 24, 2011
      1.97
    
    
      35
      36
      4
      3
      "Open House"
      David Slade
      Sam Catlin
      July 31, 2011
      1.71
    
    
      36
      37
      4
      4
      "Bullet Points"
      Colin Bucksey
      Moira Walley-Beckett
      August 7, 2011
      1.93
    
    
      37
      38
      4
      5
      "Shotgun"
      Michelle MacLaren
      Thomas Schnauz
      August 14, 2011
      1.75
    
    
      38
      39
      4
      6
      "Cornered"
      Michael Slovis
      Gennifer Hutchison
      August 21, 2011
      1.67
    
    
      39
      40
      4
      7
      "Problem Dog"
      Peter Gould
      Peter Gould
      August 28, 2011
      1.91
    
    
      40
      41
      4
      8
      "Hermanos"
      Johan Renck
      Sam Catlin & George Mastras
      September 4, 2011
      1.98
    
    
      41
      42
      4
      9
      "Bug"
      Terry McDonough
      Moira Walley-Beckett & Thomas Schnauz
      September 11, 2011
      1.89
    
    
      42
      43
      4
      10
      "Salud"
      Michelle MacLaren
      Peter Gould & Gennifer Hutchison
      September 18, 2011
      1.80
    
    
      43
      44
      4
      11
      "Crawl Space"
      Scott Winant
      George Mastras & Sam Catlin
      September 25, 2011
      1.55
    
    
      44
      45
      4
      12
      "End Times"
      Vince Gilligan
      Thomas Schnauz & Moira Walley-Beckett
      October 2, 2011
      1.73
    
    
      45
      46
      4
      13
      "Face Off"
      Vince Gilligan
      Vince Gilligan
      October 9, 2011
      1.90
    
    
      46
      47
      5
      1
      "Live Free or Die"
      Michael Slovis
      Vince Gilligan
      July 15, 2012
      2.93
    
    
      47
      48
      5
      2
      "Madrigal"
      Michelle MacLaren
      Vince Gilligan
      July 22, 2012
      2.29
    
    
      48
      49
      5
      3
      "Hazard Pay"
      Adam Bernstein
      Peter Gould
      July 29, 2012
      2.20
    
    
      49
      50
      5
      4
      "Fifty-One"
      Rian Johnson
      Sam Catlin
      August 5, 2012
      2.29
    
    
      50
      51
      5
      5
      "Dead Freight"
      George Mastras
      George Mastras
      August 12, 2012
      2.48
    
    
      51
      52
      5
      6
      "Buyout"
      Colin Bucksey
      Gennifer Hutchison
      August 19, 2012
      2.81
    
    
      52
      53
      5
      7
      "Say My Name"
      Thomas Schnauz
      Thomas Schnauz
      August 26, 2012
      2.98
    
    
      53
      54
      5
      8
      "Gliding Over All"
      Michelle MacLaren
      Moira Walley-Beckett
      September 2, 2012
      2.78
    
    
      54
      55
      5
      9
      "Blood Money"
      Bryan Cranston
      Peter Gould
      August 11, 2013
      5.92
    
    
      55
      56
      5
      10
      "Buried"
      Michelle MacLaren
      Thomas Schnauz
      August 18, 2013
      4.77
    
    
      56
      57
      5
      11
      "Confessions"
      Michael Slovis
      Gennifer Hutchison
      August 25, 2013
      4.85
    
    
      57
      58
      5
      12
      "Rabid Dog"
      Sam Catlin
      Sam Catlin
      September 1, 2013
      4.41
    
    
      58
      59
      5
      13
      "To'hajiilee"
      Michelle MacLaren
      George Mastras
      September 8, 2013
      5.11
    
    
      59
      60
      5
      14
      "Ozymandias"
      Rian Johnson
      Moira Walley-Beckett
      September 15, 2013
      6.37
    
    
      60
      61
      5
      15
      "Granite State"
      Peter Gould
      Peter Gould
      September 22, 2013
      6.58
    
    
      61
      62
      5
      16
      "Felina"
      Vince Gilligan
      Vince Gilligan
      September 29, 2013
      10.28
    
  

62 rows × 8 columns



In [122]:

    
avbb = avc.T['Breaking Bad'].T
bbtvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Breaking Bad')

bbtvratings = bbtvratings.set_index('Number Overall')

Empire



In [123]:

    
empireavc = avc.T['Empire'].T
empireavc









    Out[123]:






  
    
      
      
      epname
      relepno
      href
      criticname
      nratings
      critrating
      commrating
    
    
      season
      epno
      
      
      
      
      
      
      
    
  
  
    
      1
      1
      Pilot
      1
      /tvclub/empire-pilot-213578
      Joshua Alston
      148
      10
      9.12838
    
    
      2
      The Outspoken King
      2
      /tvclub/empire-outspoken-king-213847
      Joshua Alston
      95
      5
      7.95789
    
    
      3
      The Devil Quotes Scripture
      3
      /tvclub/empire-devil-quotes-scripture-214176
      Joshua Alston
      85
      10
      9.58824
    
    
      4
      False Imposition
      4
      /tvclub/empire-false-imposition-214503
      Joshua Alston
      68
      8
      8.47059
    
    
      5
      Dangerous Bonds
      5
      /tvclub/empire-dangerous-bonds-214812
      Joshua Alston
      67
      8
      8.55224
    
    
      6
      Out, Damned Spot
      6
      /tvclub/empire-out-damned-spot-215110
      Joshua Alston
      97
      6
      8.73196
    
    
      7
      Our Dancing Days
      7
      /tvclub/empire-our-dancing-days-215441
      Joshua Alston
      80
      7
      8.9375
    
    
      8
      The Lyon's Roar
      8
      /tvclub/empire-lyons-roar-215754
      Myles McNutt
      99
      7
      9.17172
    
    
      9
      Unto the Breach
      9
      /tvclub/empire-unto-breach-215984
      Joshua Alston
      97
      7
      9.36082
    
    
      10
      Sins of the Father
      10
      /tvclub/empire-sins-father-216441
      Joshua Alston
      83
      11
      9.04819
    
    
      11
      Die But Once
      11
      /tvclub/empire-die-oncewho-i-am-216740
      Joshua Alston
      52
      8
      7.84615
    
    
      12
      The Newborn King
      12
      /tvclub/empire-die-oncewho-i-am-216740
      Joshua Alston
      86
      8
      8.27907
    
    
      2
      1
      The Devils Are Here
      13
      /tvclub/empire-returns-same-it-ever-was-better...
      Joshua Alston
      64
      7
      7.8125
    
    
      2
      Without A Country
      14
      /tvclub/strong-episode-brings-empire-back-down...
      Joshua Alston
      50
      9
      7.72
    
    
      3
      Fires of Heaven
      15
      /tvclub/empire-has-its-pedal-metal-and-its-whe...
      Joshua Alston
      46
      6
      7.41304
    
    
      4
      Poor Yorick
      16
      /tvclub/empire-becoming-biggest-troll-televisi...
      Joshua Alston
      49
      4
      5.46939
    
    
      5
      Be True
      17
      /tvclub/empire-muted-family-drama-trapped-insi...
      Joshua Alston
      42
      7
      6.80952
    
    
      6
      A High Hope For A Low Heaven
      18
      /tvclub/empire-would-rather-burn-out-fade-away...
      Joshua Alston
      37
      5
      5.72973
    
    
      7
      True Love Never
      19
      /tvclub/its-official-empires-second-season-rai...
      Joshua Alston
      41
      2
      4.02439
    
    
      8
      My Bad Parts
      20
      /tvclub/rap-battle-brings-out-best-empire-228640
      Joshua Alston
      34
      9
      7.47059
    
    
      9
      Sinned Against
      21
      /tvclub/empire-fizzles-again-episode-manages-b...
      Joshua Alston
      31
      4
      4.70968
    
    
      10
      Et Tu, Brute
      22
      /tvclub/empire-fritters-away-last-its-good-wil...
      Joshua Alston
      48
      2
      4.97917
    
    
      11
      Death Will Have His Day
      23
      /tvclub/empire-back-along-its-constant-threat-...
      Joshua Alston
      27
      9
      7.14815
    
    
      12
      A Rose by Any Other Name
      24
      /tvclub/well-damnso-much-empire-renaissance-23...
      Joshua Alston
      20
      4
      5.35
    
    
      13
      The Tameness of a Wolf
      25
      /tvclub/behold-return-earthbound-empire-235291
      Joshua Alston
      16
      7
      6.9375
    
    
      14
      Time Shall Unfold
      26
      /tvclub/empire-reveals-latest-stupid-detail-ab...
      Joshua Alston
      17
      8
      6
    
    
      15
      More Than Kin
      27
      /tvclub/empire-reminds-us-why-its-shame-title-...
      Joshua Alston
      15
      2
      3
    
    
      16
      The Lyon Who Cried Wolf
      28
      /tvclub/finally-empire-right-kind-ridiculous-2...
      Joshua Alston
      8
      9
      8.625
    
    
      17
      Rise by Sin
      29
      /tvclub/empires-biggest-cliffhanger-season-rin...
      Joshua Alston
      13
      7
      6
    
    
      18
      Past Is Prologue
      30
      /tvclub/empire-ended-its-rough-second-season-m...
      Joshua Alston
      13
      5
      5.76923
    
    
      3
      1
      Light in Darkness
      31
      /tvclub/lucious-tried-court-his-familys-opinio...
      Danette Chavez
      12
      4
      4.83333
    
    
      4
      Cupid Kills
      32
      /tvclub/lyons-get-ready-go-mattresses-empire-2...
      Danette Chavez
      3
      NaN
      5
    
    
      5
      One Before Another
      33
      /tvclub/empires-lyons-cant-keep-it-together-an...
      Danette Chavez
      3
      NaN
      2.33333



In [124]:

    
empireavc = avc.T['Empire'].T
empireavc
empiretvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Empire')
empiretvratings['IMDb Rating']









    Out[124]:





0     8.1
1     8.2
2     8.2
3     8.3
4     8.6
5     8.4
6     8.8
7     8.9
8     9.0
9     8.5
10    8.4
11    8.5
12    8.2
13    8.2
14    8.2
15    8.4
16    8.1
17    8.0
18    8.2
19    8.6
20    8.5
21    9.2
22    8.5
23    8.3
24    8.3
25    8.2
26    7.9
27    8.7
28    9.2
29    8.1
Name: IMDb Rating, dtype: float64

Critical & Community Differences.

Which episodes were the most divisive?

Now that we have all the data, it's time to plot. First, let's see if there's any correlation between critical appraisal and community appraisal.



In [125]:

    
f = avc.T[('How I Met Your Mother', '3')].T
pet1 = himyms1.reset_index()
afs = f.reset_index()
himyms3diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('How I Met Your Mother', '4')].T
pet = himyms4.reset_index()
afs = f.reset_index()
pet1 = pet.drop(pet.index[[0,1,2, 3]]).reset_index()
himyms4diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('How I Met Your Mother', '5')].T
pet = himyms5.reset_index()
afs = f.reset_index()
pet1 = pet.drop(pet.index[[0,1,2,3]]).reset_index()
himyms5diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('How I Met Your Mother', '6')].T
pet = himyms6.reset_index()
afs = f.reset_index()
pet1 = pet.drop(pet.index[[0,1,2,3]]).reset_index()
himyms6diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('How I Met Your Mother', '7')].T
pet = himyms7.reset_index()
afs = f.reset_index()
pet1 = pet.drop(pet.index[[0,1,2, 23]]).reset_index()
himyms7diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

e = avc.T[('How I Met Your Mother', '8')].T
prt = himyms8.reset_index()
ads = e.reset_index()
prt1 = prt.drop(prt.index[[0,1,2]])
prt1 = prt1.reset_index()
himyms8diff = prt1.Rating.convert_objects(convert_numeric = True) - ads.critrating + 1
himyms8diff.sort_values()

f = avc.T[('How I Met Your Mother', '9')].T
pet = himyms9.reset_index()
afs = f.reset_index()
pet1 = pet.drop(pet.index[[0,1,2]]).reset_index()
himyms9diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:4: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:10: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:16: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:22: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:28: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:35: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:42: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.



In [126]:

    
himymframes = [himyms3diff, himyms4diff, himyms5diff, himyms6diff, himyms7diff, himyms8diff, himyms9diff]

himymdifffinal = pd.concat(himymframes, keys=['Season 3', 'Season 4', 'Season 5', 'Season 6', 'Season 7', 'Season 8', 'Season 9'])

himymdifffinal.dropna().sort_values()









    Out[126]:





Season 9  10   -3.5
          7    -3.5
          4    -2.8
Season 4  15   -2.3
Season 3  15   -2.1
Season 4  1      -2
Season 9  14     -2
Season 5  10     -2
Season 8  14     -2
Season 9  13   -1.9
Season 5  17   -1.8
Season 4  12   -1.7
Season 9  17   -1.7
          5    -1.7
          2    -1.6
Season 4  8    -1.6
          9    -1.6
Season 3  4    -1.6
Season 6  9    -1.6
Season 8  2    -1.6
Season 9  6    -1.5
Season 8  7    -1.5
Season 9  8    -1.4
Season 6  12   -1.4
Season 7  8    -1.3
Season 4  18   -1.3
Season 5  6    -1.2
          0    -1.2
          2    -1.2
Season 3  6    -1.2
               ... 
Season 6  14    1.2
Season 3  13    1.2
          7     1.2
Season 8  17    1.3
Season 6  19    1.4
Season 7  19    1.4
Season 5  19    1.4
Season 4  14    1.5
Season 5  14    1.6
Season 3  12    1.7
Season 5  16    1.7
Season 7  18    1.7
Season 5  11    1.8
Season 6  17    1.8
Season 9  18    1.9
Season 7  11      2
Season 5  18    2.2
Season 4  6     2.2
Season 8  0     2.4
Season 4  16    2.6
Season 7  15    2.6
Season 6  0     2.8
Season 8  10    2.9
          6     2.9
          12    3.5
Season 9  9     4.2
Season 3  1     4.2
Season 6  8     4.7
Season 3  3     5.1
          18    5.8
dtype: object



In [127]:

    
f = avc.T[('Breaking Bad', '1')].T
pet = bbs1.reset_index()
afs = f.reset_index()
pet1 = pet.reset_index()
bbs1diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('Breaking Bad', '2')].T
pet = bbs2.reset_index()
afs = f.reset_index()
pet1 = pet.reset_index()
bbs2diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('Breaking Bad', '3')].T
pet = bbs3.reset_index()
afs = f.reset_index()
bbs3diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('Breaking Bad', '4')].T
pet = bbs4.reset_index()
afs = f.reset_index()
pet1 = pet.reset_index()
bbs4diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1

f = avc.T[('Breaking Bad', '5')].T
pet = bbs5.reset_index()
afs = f.reset_index()
pet1 = pet.reset_index()
bbs5diff = pet1.Rating.convert_objects(convert_numeric = True) - afs.critrating + 1


bbframes = [bbs1diff, bbs2diff, bbs3diff, bbs4diff, bbs5diff]

bbdifffinal = pd.concat(bbframes, keys=['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5'])

bbdifffinal.dropna().sort_values()









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:5: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:11: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:16: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:22: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:28: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[127]:





Season 3  4    -2.4
          2    -2.4
Season 2  2    -2.4
          4    -2.4
Season 3  9    -2.2
Season 2  9    -2.2
Season 3  6    -2.1
          0    -2.1
          5    -1.9
Season 5  5    -1.8
Season 3  8    -1.8
Season 2  8    -1.8
          7    -1.7
          11   -1.6
          12   -1.6
Season 3  12   -1.6
Season 5  11   -1.6
Season 3  11   -1.6
Season 2  3    -1.5
Season 1  5    -1.5
Season 4  1    -1.5
          11   -1.4
Season 1  4    -1.3
Season 4  3    -1.2
Season 5  8    -1.2
Season 2  0    -1.1
          6    -1.1
Season 3  10     -1
Season 5  7      -1
Season 4  10     -1
               ... 
          6      -1
Season 5  2    -0.9
Season 2  5    -0.9
Season 1  2    -0.9
Season 5  1    -0.8
          9    -0.6
          12   -0.6
Season 4  2    -0.6
Season 1  0    -0.5
Season 4  5    -0.4
          12   -0.4
Season 5  0    -0.4
Season 4  4    -0.2
          9    -0.2
Season 5  6    -0.2
          15   -0.1
          13   -0.1
          10      0
Season 4  8       0
Season 1  6     0.1
Season 5  4     0.1
          14    0.3
Season 4  7     0.4
          0     0.4
Season 3  1     0.5
Season 2  1     0.5
Season 3  3     0.5
Season 2  10      1
Season 3  7     1.3
Season 1  3     1.7
dtype: object



In [128]:

    
g = avc.T[('Hannibal', '1')].T
pet = hannibals1.reset_index()
ags = g.reset_index()
pet1 = pet.reset_index()
hannibals1diff = pet1.Rating.convert_objects(convert_numeric = True) - ags.critrating + 1

g = avc.T[('Hannibal', '2')].T
pet = hannibals2.reset_index()
ags = g.reset_index()
pet1 = pet.reset_index()
hannibals2diff = pet1.Rating.convert_objects(convert_numeric = True) - ags.critrating + 1

hannibalframes = [hannibals1diff, hannibals2diff]
hannibaldifffinal = pd.concat(hannibalframes, keys=['Season 1', 'Season 2'])
hannibaldifffinal.dropna().sort_values()









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:5: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:11: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[128]:





Season 1  11   -1.2
          3    -1.1
Season 2  4    -0.9
          3    -0.9
Season 1  0    -0.7
Season 2  12   -0.7
Season 1  5    -0.4
Season 2  7    -0.3
Season 1  9    -0.2
Season 2  10    0.1
Season 1  4     0.1
Season 2  11    0.1
          1     0.1
          5     0.1
Season 1  1     0.3
          10    0.6
Season 2  2     0.9
          8     0.9
          0       1
Season 1  7       1
Season 2  6     1.1
Season 1  2     1.2
          8     1.4
          6     1.8
Season 2  9     2.1
dtype: object



In [129]:

    
empires1 = avc.T[('Empire', '1')].T
empires2 = avc.T[('Empire', '2')].T
empiretvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Empire')
empiretvratings

es1 = empires1.reset_index().critrating
es2 = empires2.reset_index().critrating
eframe = [es1, es2]
efinal = pd.concat(eframe)
beta = efinal.reset_index().critrating - empiretvratings['IMDb Rating']
beta.sort_values()









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\kernelbase.py:390: PerformanceWarning: indexing past lexsort depth may impact performance.
  user_expressions, allow_stdin)






    Out[129]:





21   -7.2
18   -6.2
26   -5.9
20   -4.5
15   -4.4
23   -4.3
1    -3.2
29   -3.1
17     -3
5    -2.4
28   -2.2
14   -2.2
8      -2
7    -1.9
6    -1.8
24   -1.3
12   -1.2
16   -1.1
4    -0.6
11   -0.5
10   -0.4
3    -0.3
25   -0.2
27    0.3
19    0.4
22    0.5
13    0.8
2     1.8
0     1.9
9     2.5
dtype: object

Visualizing Critical Opinion against Ratings

The main point of this data project is to see if there is any immediate correlation between Television Ratings and critical opinion or community opinion. We have our data, so the most efficient way of going about this would be plotting a dot plot and getting some regression analysis of it done via Python.

How I Met Your Mother



In [302]:

    
result = pd.concat(frames, keys=['Season 1', 'Season 2', 'Season 3', 'Season 4', 'Season 5', 'Season 6', 'Season 7', 'Season 8', 'Season 9'])

result = result.drop(['Season 1', 'Season 2',])
result = result.drop([('Season 4', '1'), ('Season 4', '2'), ('Season 4', '3'), ('Season 4', '4'), ('Season 4', '16')])
result = result.drop([('Season 5', '1'), ('Season 5', '2'), ('Season 5', '3'), ('Season 5', '4')])
result = result.drop([('Season 6', '1'), ('Season 6', '2'), ('Season 6', '3'), ('Season 6', '4')])
result = result.drop([('Season 7', '1'), ('Season 7', '2'), ('Season 7', '3')])
result = result.drop([('Season 8', '1'), ('Season 8', '2'), ('Season 8', '3')])
result = result.drop([('Season 9', '1'), ('Season 9', '2'), ('Season 9', '3'), ('Season 9', '24')])

i = avc.T['How I Met Your Mother'].T.critrating - 1
j = result.Rating
k = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'HIMYM for AVClub').set_index(['Season', 'No. in Season']).Rating

fig, axe = plt.subplots()
k.to_frame().plot(ax=axe)
j.convert_objects(convert_numeric=True).plot(ax=axe)
axe.set_title('IMDb User Score against Viewership')
axe.set_xlabel('Season & Episode Number')
axe.legend(['Viewership in Millions', 'Avg. IMDb User Rating'])









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:17: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[302]:





<matplotlib.legend.Legend at 0x1c79fd0a2b0>



In [308]:

    
fig, axe = plt.subplots()
i.to_frame().plot(ax=axe, alpha = 0.5)
j.convert_objects(convert_numeric=True).plot(ax=axe)
axe.set_ylim(0, 10)
axe.set_xlabel('Season and Episode Number')
axe.set_title('IMDb v. AV Club Scores per Episode')
axe.legend(['AV Club Score', 'IMDb Score'], loc = 4)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:3: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  app.launch_new_instance()






    Out[308]:





<matplotlib.legend.Legend at 0x1c7a59c0908>



In [132]:

    
i2 = i.reset_index().critrating.convert_objects(convert_numeric=True)
j2 = j.reset_index().Rating.convert_objects(convert_numeric = True)
k2 = k.reset_index().Rating

plt.scatter(k2, j2)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':
C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  from ipykernel import kernelapp as app






    Out[132]:





<matplotlib.collections.PathCollection at 0x1c79fc682b0>



In [336]:

    
plt.scatter(k2, i2)









    Out[336]:





<matplotlib.collections.PathCollection at 0x1c7a6fd2780>



In [337]:

    
plt.scatter(i2, j2)









    Out[337]:





<matplotlib.collections.PathCollection at 0x1c7a56bfa90>



In [133]:

    
k2.describe()









    Out[133]:





count    141.000000
mean       8.860071
std        1.181586
min        6.410000
25%        8.000000
50%        8.770000
75%        9.730000
max       13.130000
Name: Rating, dtype: float64



In [134]:

    
i = 0
delta = []
for h in j2:
    while i < 140:
        d = j2[i+1] - j2[i] 
        delta.append(d)
        i = i + 1  

sum(delta)/len(delta)









    Out[134]:





-0.010000000000000002

So, what do these graphs show us? The graph that plots IMDb score against AV Club score shows us that the volatility of the AV Club's score is much higher than that of the IMDb score. Let's see if we can prove that with some numbers. The mean of the AV Club scores is 7.9, while the average IMDb score was 8.04. A 0.1 point difference exists, but I think the more interesting value to find would be the delta of successive episodes, i.e. how the rating changed for each successive episode, as well as a breakdown of the standard deviation of the show as a whole. So, beginning with the IMDb user score, we see that, for the series as a whole, the mean is 8.04 with a standard deviation of 0.61. 68% of episodes were rated between 7.43 and 8.65, showing the show as a whole was considered good, not great. What's more, the average change in "quality" for the series as a whole was -0.01, implying the show was more consistent in its quality.

What can we ascertain from this? Well, that HIMYM was a reliable show in its genre. The critical volatility contrasted with the less volatile community scores shows that while HIMYM had bad episodes, it was still an overall "reliable" show when it came to comedy. It wasn't offensively bad but it wasn't consistently phenomenal either. It, if anything, looks like the graph for any long-running live-action comedy show: "Friends", "Cheers", etc. "How I Met Your Mother" exists as our baseline, as our example for a show that had a solid fan following backed up by a decent critical opinion.

Hannibal



In [287]:

    
himdb = hannibalfinal
havc = avc.T['Hannibal'].T
hannibaltvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Hannibal')
htvr = hannibaltvratings.set_index(['Season', 'No. in Season'])

n









    Out[287]:





Season  No. in Season
1       1                4.36
        2                4.38
        3                3.51
        4                 NaN
        5                2.40
        6                2.61
        7                2.62
        8                2.46
        9                2.69
        10               2.40
        11               2.36
        12               2.10
        13               1.98
2       1                3.27
        2                2.50
        3                2.47
        4                2.69
        5                3.49
        6                2.18
        7                2.25
        8                2.80
        9                2.45
        10               2.28
        11               1.95
        12               2.32
        13               2.35
3       1                2.57
        2                1.66
        3                1.69
        4                1.46
        5                1.23
        6                1.38
        7                0.97
        8                0.96
        9                1.02
        10               1.01
        11               1.03
        12               0.79
        13               1.24
Name: Rating, dtype: float64



In [292]:

    
c = himdb.Rating
s = havc.critrating - 1
n = htvr.Rating

c2 = c.reset_index()
c3 = c2.Rating.convert_objects(convert_numeric=True)
n2 = n.drop(3).reset_index().Rating
plt.scatter(n2, c3)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:6: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[292]:





<matplotlib.collections.PathCollection at 0x1c7a8422240>



In [282]:

    
n3 = n.reset_index().drop(6).Rating
plt.scatter(n3, s)









    Out[282]:





<matplotlib.collections.PathCollection at 0x1c7a72a2a20>



In [335]:

    
s4 = s.drop('3').reset_index().critrating
c4 = c3.drop(25)
plt.scatter(s4, c4)









    Out[335]:





<matplotlib.collections.PathCollection at 0x1c7a700be48>



In [283]:

    
c = c.convert_objects(convert_numeric = True)
fig, axr = plt.subplots()
n.plot(ax=axr)
s.plot(ax=axr)
c.plot(ax=axr, color='red')
axr.set_ylim(0, 11)
axr.set_xlabel('Season & Episode Number')
axr.legend(['Viewership in Millions', 'AV Club Rating', 'IMDb User Rating'], loc = 5)
axr.set_title('Critical Appraisal v. Viewership')









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[283]:





<matplotlib.text.Text at 0x1c7a73c0ba8>



In [284]:

    
c = c.convert_objects(convert_numeric = True)
fig, axr = plt.subplots()
c.plot(ax=axr, color='red')
s.plot(ax=axr, color='green')
axr.set_ylim(0, 10)
axr.set_xlabel('Season & Episode Number')
axr.legend(['IMDB User Rating', 'AV Club Rating'], loc = 5)
axr.set_title('Critical v. Communal Appraisal')









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[284]:





<matplotlib.text.Text at 0x1c7a73740f0>



In [164]:

    
i = 0
delta = []
for h in c:
    while i < 24:
        d = c[i+1] - c[i] 
        delta.append(d)
        i = i + 1    

c.mean()
sum(delta)/len(delta)


delta = []
i = 0
for f in s:
    while i < 24:
        d = s[i+1] - s[i] 
        delta.append(d)
        i = i + 1    
s.convert_objects(convert_numeric=True).mean()
sum(delta)/len(delta)

c.std()









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:20: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[164]:





0.3777056240798361

Average user rating for Hannibal: 8.79. Average successive change: 0.03, Standard deviation for series as a whole, 0.38.

Average crit rating for Hannibal: 8.52. Average successive change: 0.04, Std: 0.38

Avg. viewers: 2.67 mil, with standard dev. of .647.

So, with "Hannibal", it's kind of clear to see viewership dived with Season 3, as the graph below shows. Each season had worse and worse viewership, despite the high review scores of the show and the evaluations of its fanbase. If anything, it appears "Hannibal" is an example of a show that was too niche. I think that, its audience initially bought into the case-of-the-week style show of Season 1, but as the show grew more and more serialized (e.g. episodes became more and more connected to one another, making it hard for a newcomer to "get into" the show), viewership dropped. There were very likely more factors that led to the show's demise, such as the emphasis on horror and grotesque imagery or the lack of big stars, but I think that by changing each season, from a set of cases wrapped in an overarching story to just one big overarching story, it left new fans unable to join the viewerbase while testing caught-up fans with its unique pacing.



In [298]:

    
fig, axe = plt.subplots(nrows=3, ncols=1, sharey = True)
n[1].plot(ax=axe[0])
n[2].plot(ax=axe[1])
n[3].plot(ax=axe[2])









    Out[298]:





<matplotlib.axes._subplots.AxesSubplot at 0x1c7a5a60b38>

Breaking Bad



In [141]:

    
bbac = avc.T['Breaking Bad'].T.critrating - 1
bbtvr = bbtvratings['Ratings (in Millions)']
bbimdb = bbfinal.Rating

fig, axt = plt.subplots()
bbac.plot(ax=axt)
bbimdb.convert_objects(convert_numeric = True).plot(ax=axt, color='red')
axt.set_ylim(0, 10)
axt.set_xlabel('Season & Episode Number')
axt.legend(['AV Club Rating', 'Avg. IMDb User Rating'], loc = 3)
axt.set_title('Critical v. Communal Consensus')









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:7: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.






    Out[141]:





<matplotlib.text.Text at 0x1c79daeae80>



In [142]:

    
fig, axt = plt.subplots()
bbac.plot(ax=axt)
bbtvr.plot(ax=axt)
axt.set_ylim(0, 11)
axt.set_title("AV Club Rating against Viewership")
axt.set_xlabel('Season & Episode Number')









    Out[142]:





<matplotlib.text.Text at 0x1c7a179c5f8>



In [143]:

    
fig, axt = plt.subplots()
bbimdb.convert_objects(convert_numeric = True).plot(ax=axt, color='red')
bbtvr.plot(ax=axt)
axt.set_title('IMDb User Rating against Viewership')
axt.legend(['Viewership in Millions', 'Avg. IMDb User Score'], loc = 10)
axt.set_xticklabels('')
axt.set_xlabel('Episode Number Aired')
axt.set_ylim(0, 12)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  from ipykernel import kernelapp as app






    Out[143]:





(0, 12)



In [311]:

    
bbisc = bbimdb.reset_index().Rating.convert_objects(convert_numeric = True)
babcs = bbac.reset_index().critrating
plt.scatter(bbtvr, babcs)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[311]:





<matplotlib.collections.PathCollection at 0x1c7a6eb01d0>

From this we can see, the second half of Season 5 completely skews the scatter plot of the ratings against critical appraisal. So, if we omit the last half of the season, and look at it...



In [312]:

    
bbacs = babcs.drop([54, 55, 56, 57, 58, 59, 60, 61])
bbtvr4 = bbtvr.drop([55, 56, 57, 58, 59, 60, 61, 62])
bbimsc2 = bbimdb.reset_index().Rating.convert_objects(convert_numeric = True).drop([54, 55, 56, 57, 58, 59, 60, 61])


plt.scatter(bbtvr4, bbimsc2)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:3: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  app.launch_new_instance()






    Out[312]:





<matplotlib.collections.PathCollection at 0x1c7a6f0fcf8>

We see a scatter plot that appears to have some semblance of a correlation. Although any line of best fit would have a small R^2 value, it is still possible to see some correlation existed between viewership and quality.



In [317]:

    
plt.scatter(bbimdb.reset_index().Rating.convert_objects(convert_numeric = True), babcs)









    



C:\Users\Sidd\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':






    Out[317]:





<matplotlib.collections.PathCollection at 0x1c7a6f706d8>



In [146]:

    
fig, axe = plt.subplots()
bbtvr4.plot(ax=axe)
bbimsc2.plot(ax=axe, color='red')
axe.legend(['Viewership (in Millions)', 'Avg. IMDb User Score'], loc=5)
axe.set_title('Viewership against Community Reviews')









    Out[146]:





<matplotlib.text.Text at 0x1c7a29b3cf8>

What we see is in fact a more consistent viewerbase. The peaks and troughs of the average user's opinions on the show are much closer than those of other shows, so it can be hyothesized the consistency in quality also led to a consistency in viewership.



In [163]:

    
bbimsc2
i = 0
delta = []
for h in bbimsc2:
    while i < 53:
        d = bbimsc2[i+1] - bbimsc2[i] 
        delta.append(d)
        i = i + 1    
delta
bbimsc2.mean()
sum(delta)/len(delta)

bbacs.mean()
delta = []
i = 0
for h in bbacs:
    while i < 53:
        d = bbacs[i+1] - bbacs[i] 
        delta.append(d)
        i = i + 1    

bbtvr.describe()









    



C:\Users\Sidd\Anaconda3\lib\site-packages\numpy\lib\function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)






    Out[163]:





count    57.000000
mean      2.326140
std       1.718762
min       0.970000
25%            NaN
50%            NaN
75%            NaN
max      10.280000
Name: Ratings (in Millions), dtype: float64



In [ ]:

    
IMDB: mean = 8.34, std = 0.61, 
AV Club: mean 9.14, std = 0.96

From this what we see is, the largest dip in quality from successive episode is -1. This means that, even before the second half started airing, the average episode quality was 8.14 with an average "consistency", or average change in quality, of 0.01. Critically, the average rating per episode was a 9.07 with an average drop between episodes of 0.02, with the largest successive drop being 3 points. With this and the information in the cell above, it's apparent that "Breaking Bad" was an example of a show that maintained a high bar of quality, despite a small initial viewership. As the AVClub graph against ratings with S5 Pt. 2 included shows, the show had a conistent viewerbase that followed the show for seasons 1 - season 5 pt. 1. Suddenly, when S5 Pt. 2 premieres, the viewership spikes.

My hypothesis for this is as follows: between Season 5 Part 1 and Season 5 Part 2, AMC scheduled a year-long break i.e. the finale of Pt. 1 aired in September 2012 and the premiere of Part 2 aired in August 2013. Because of this unexpected hiatus, AMC enabled fans of the show to spread the word about how good "Breaking Bad" was and gave new fans the time to catchup to the show. By letting the show' massive critical acclaim and word-of-mouth buzz converge, the show gained more and more fans.

This is also why AMC did the same trick with Mad Men's final season, although in that case, it didn't workout as well for them as the premiere of Season 7 Pt. 1 and the premiere of Season 7 Pt. 2 attracted the same size audience. So, there was some x-factor about "Breaking Bad" that enabled the surge in fanbase between parts, and I believe it has something to do with "watercooler" moments. If I had Twitter data, I believe if you were to analyze the gaps between parts for "Mad Men" and "Breaking Bad", you'd find "Breaking Bad" would have a higher volume of hashtag usage than "Mad Men". I'd attribute it to the fact "Breaking Bad" is a show about definite action (e.g. guns being shot, lives being taken, etc), meaning it has moments that can hook new-comers easily (like "Did you see what Jesse did to Gale?" or "OMG that finale with the plane!"), while "Mad Men" is more about character and personal breakthrough, giving fans less moments to really talk about to convince people to watch the show.

Empire



In [166]:

    
empireavc = avc.T['Empire'].T
empireavc
empiretvratings = pd.ExcelFile("C:/Users/Sidd/Desktop/Data_Bootcamp/TV Ratings.xlsx").parse(sheetname = 'Empire')
eavcsc = empireavc.critrating.drop(['3']) - 1
etvrsc = empiretvratings['Viewers (in millions)']
eimdbsc = empiretvratings['IMDb Rating']

eavsc1 = eavcsc.reset_index().critrating

fig, axy = plt.subplots()
eavsc1.plot(ax=axy, color='green')
eimdbsc.plot(ax=axy, color='red')
axy.set_title('Average IMDb User Score against Critical Score')
axy.set_xlabel('Total Episode Number')
axy.legend(['AV Club Score', 'Avg. IMDb User Score'], loc=3)









    Out[166]:





<matplotlib.legend.Legend at 0x1c7a2b63668>



In [167]:

    
fig, axy = plt.subplots()
etvrsc.plot(ax=axy)
eimdbsc.plot(ax=axy, color='red')
axy.set_title('Average IMDb User Rating against Viewership')
axy.set_xlabel('Total Episode Number')
axy.legend(['Viewership (in Millions)', 'Avg. IMDb User Score'])









    Out[167]:





<matplotlib.legend.Legend at 0x1c7a2aa49e8>



In [168]:

    
fig, axy = plt.subplots()
axy.set_title('Critical score against Viewership')
axy.set_xlabel('Total Episode Number')
axy.set_xlim(0, 12)
eavsc1.plot(ax=axy)
eavsc1.columns = ['AV Club Score']
etvrsc.plot(ax=axy)
axy.legend(['Critical Rating', 'Viewership (in Millions)'])









    Out[168]:





<matplotlib.legend.Legend at 0x1c7a2ad3748>



In [309]:

    
plt.scatter(eavsc1, eimdbsc)









    Out[309]:





<matplotlib.collections.PathCollection at 0x1c7a715c5f8>



In [207]:

    
i = 0
delta = []
for h in etvrsc:
    while i < 25:
        d = etvrsc[i+1] - etvrsc[i] 
        delta.append(d)
        i = i + 1    
delta
sum(delta[13:30])/len(delta[13:30])









    Out[207]:





-0.59833333333333338

Critics: mean was 5.7, std was 2.40, change during season 1 was -0.08, avg. change during season 2 was -0.08.

Communit: mean was 8.4, std was 0.33, change during season 1 was 0.08, avg. change during season 2 was 0.

Ratings: Season 1's delta was 0.3, season 2's was -0.598

So, from this, we can see that critics maintained a much harsher view of "Empire" than its fanbase did. Critics viewed the show as more volatile in quality. For seasons, even though through the average change per successive episode was small, what the graph shows is that the actual change per successive episode varies, with some episodes being huge drops in quality and other huge gainers.

For the community, the graph backs up the numerical findings: namely that it was rated more consistently, hence showing it was viewed as a more quality-consistent show. Surprisingly enough though, even though the quality was constant throughout both seasons, viewership actively declined during the second season despite community members rating episodes on the same level as Season 1. Maybe this was because of the more volatile critics reviews which alternate between B grades and D grades, week in and week out.

In Conclusion



In [213]:

    
fig, axe = plt.subplots()
etvrsc.plot(ax=axe)
bbtvr.plot(ax=axe)
n.plot(ax=axe)
k.plot(ax=axe)
axe.legend(['Empire', 'Breaking Bad', 'Hannibal', 'How I Met Your Mother'])
axe.set_xticklabels('')
axe.set_xlabel('Episode Number in Series')
axe.set_ylabel('Viewers in Millions')
axe.set_title('Viewership Visualization')









    Out[213]:





<matplotlib.text.Text at 0x1c7a3e00ac8>

The reason I made the graph about is just for you, the reader, to have context on the relative ratings of each of these series. As expected, the two shows on the networks with the lowest barrier to access have the largest ratings. "Empire" and "How I Met Your Mother" were also shows that belonged to very traditional television genres: the soap opera and the sitcom. It's widely acknowledged that the more "genre" your show is, the less audience it typically attracts e.g. a broad comedy like "How I Met Your Mother" will attract more viewers than a artistic horror-themed show like "Hannibal." In addition, Season 1 of "Empire" is the only instance of a season on this chart where viewership every episode increases. Let's plot these seperately.



In [234]:

    
fig, axe = plt.subplots(nrows=3, ncols=1)
bbtvr[54:63].plot(ax=axe[0])
etvrsc[0:12].plot(ax=axe[1])
k[0:24].plot(ax=axe[2])









    Out[234]:





<matplotlib.axes._subplots.AxesSubplot at 0x1c7a426aa90>

It is taken as gospel that the highest rating a TV show will get is during finales. And that reason generally remains true here. The anomaly, as you can see, remains in "HIMYM" Season 3. The reason for that episode in particular having such high viewership is likely because of the presence of Britney Spears, during her infamous 2007-08 period when she had her public meltdown. So, with the exception of notable event episodes (i.e. special guest stars), the statement has been proven true.

With that in mind, the topmost graph, which is for "Breaking Bad" Season 5 Part 2, shows a dip throughout the season, same with "HIMYM" Season 3. And if I were to plot the rest of the seasons for the TV shows I'm examining, the presence of dips is almost guaranteed. In the past 5 years, only one show has had a season where viewership increased episode-to-episode, and that was "Empire" Season 1. If I had time, I'd have loved to examine the Twitter data to see the volume of hashtags associated with the show during the 2014-15 television season, to see if there was indeed a correlation between so-called "watercooler" moments and this show. No correlation could be found regarding critical opinion or community opinion and the show's ratings, so I'm left to hypothesize that the reason "Empire" bucked all those trends was because it was a soap opera that played predominantly to a segment of the population left ignored by traditional TV shows and that it had enough gasp-worthy moments to basically have people covert new watchers to the show by just talking about it.

So, I began this project in the hopes, I'd find my beliefs validated: that there does exist some substantive correlation between viewership and critical data. In the end, it ended up being false. There exists no correlation between viewership and critical or popular opinion, neither does there really exist a correlation between critical and popular opinion. Instead, the elimination of these as reasons for popularity gives me insight into what I could do to build on this idea. Reviews can impact a show's viewing audience by convincing their reader to give a show a chance. But, critics are impersonal to us: we don't trust their beliefs because of who they are, rather it is because we trust the publication letting this person act as their offical opinion-giver. But, because of that impersonality, we don't really value their opinion all to highly compared to, say, a friend. Thus, what is the impact if a friend or a circle of friends give a positive opinion when compared to critics? If your friends like a show, say "Westworld", and talk about how much they like, does it make you more prone to watching the show compared to if a New York Times Arts reviewer says you should see it? That'd be an interesting experiment.



In [ ]:

show	24										...	Younger	Z Nation
season	7										...	2	3
epno	1	2	3	4	5	6	7	8	9	10	...	12	1	2	3	4	5	6	7	8	9
epname	Day 7: 8:00 A.M.-9:00 A.M.	Day 7: 9:00 A.M.-10:00 A.M.	Day 7: 10:00 A.M.-11:00 A.M.	Day 7: 11:00 A.M.-12:00 P.M.	Day 7: 12:00 P.M.-1:00 P.M.	Day 7: 1:00 P.M.-2:00 P.M.	Day 7: 2:00 P.M.-3:00 P.M.	Day 7: 3:00 P.M.-4:00 P.M.	Day 7: 4:00 P.M.-5:00 P.M.	Day 7: 5:00 P.M.-6:00 P.M.	...	No Weddings & a Funeral	No Mercy	A New Mission	Murphy's Miracle	Escorpion and the Red Hand	Little Red and the Wolfz	Doc Flew Over the Cuckoo's Nest	Welcome to Murphytown	Election Day	Heart of Darkness
relepno	1	2	3	4	5	6	7	8	9	10	...	23	1	2	3	4	5	6	7	8	9
href	/tvclub/24-800-am-1000-am-22203	/tvclub/24-800-am-1000-am-22203	/tvclub/24-1000-am-1200-pm-22238	/tvclub/24-1000-am-1200-pm-22238	/tvclub/24-1200-pm-100-pm-22862	/tvclub/24-100-pm-200-pm-23085	/tvclub/24-200-pm-300-pm-23330	/tvclub/24-300pm-400pm-23625	/tvclub/24-400-pm-500-pm-23898	/tvclub/24-500pm-600pm-24278	...	/tvclub/friendship-trumps-shipping-younger-sea...	/tvclub/z-nation-returns-guns-blazing-stand-al...	/tvclub/theres-brand-new-mission-z-nation-team...	/tvclub/zombies-arent-only-things-moving-slow-...	/tvclub/z-nation-gets-its-gross-groove-back-vi...	/tvclub/z-nation-dives-deep-survivors-ordeal-a...	/tvclub/z-nation-visits-mental-asylum-and-ends...	/tvclub/z-nations-table-setting-presents-two-h...	/tvclub/z-nation-presidential-election-basical...	/tvclub/its-no-metaphor-when-death-comes-seatt...
criticname	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	Zack Handlen	...	Alexa Planje	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy	Alex McCown-Levy
nratings	2	1	1	1	1	1	1	1	1	NaN	...	24	24	21	15	15	13	16	16	12	10
critrating	10	10	8	8	7	9	9	10	8	9	...	9	7	8	5	9	10	5	8	9	8
commrating	9.5	10	9	9	8	10	7	10	8	NaN	...	8.83333	9.29167	8.66667	8.26667	9.6	9.15385	7.6875	9.4375	8.58333	7.9

	epname	relepno	href	criticname	nratings	critrating	commrating
epno
1	Day 8: 4:00P.M. - 5:00P.M.	25	/tvclub/24-400pm-500pm500pm-600pm-37169	Zack Handlen	22	7	7.18182
2	Day 8: 5:00P.M. - 6:00P.M.	26	/tvclub/24-400pm-500pm500pm-600pm-37169	Zack Handlen	20	7	7.2
3	Day 8: 6:00P.M. - 7:00P.M.	27	/tvclub/24-600pm-700pm700pm-800pm-37214	Zack Handlen	23	9	8.82609
4	Day 8: 7:00P.M. - 8:00P.M.	28	/tvclub/24-600pm-700pm700pm-800pm-37214	Zack Handlen	23	9	8.95652
5	Day 8: 8:00P.M. - 9:00P.M.	29	/tvclub/24-800pm-900pm-37513	Zack Handlen	23	6	5.91304
6	Day 8: 9:00P.M. - 10:00P.M.	30	/tvclub/24-900pm-1000pm-37772	Zack Handlen	19	5	4.73684
7	Day 8: 10:00P.M. - 11:00P.M.	31	/tvclub/24-1000pm-1100pm-38024	Zack Handlen	26	7	6.57692
8	Day 8: 11:00P.M. - 12:00A.M.	32	/tvclub/24-1100pm-1200am-38252	Zack Handlen	16	9	8.0625
9	Day 8: 12:00A.M. - 1:00A.M.	33	/tvclub/24-1200am-100am-38495	Zack Handlen	21	9	8
10	Day 8: 1:00A.M. - 2:00A.M.	34	/tvclub/24-100am-200am-38707	Zack Handlen	20	9	7.15
11	Day 8: 2:00A.M. - 3:00A.M.	35	/tvclub/24-200am-300am-38999	Zack Handlen	22	7	6.72727
12	Day 8: 3:00A.M. - 4:00A.M.	36	/tvclub/24-300am-400am-39242	Zack Handlen	26	5	6.69231
13	Day 8: 4:00A.M. - 5:00A.M.	37	/tvclub/24-400am-500am-39449	Zack Handlen	24	6	5.125
14	Day 8: 5:00A.M. - 6:00A.M.	38	/tvclub/24-500am-600am-39655	Zack Handlen	24	5	6.33333
15	Day 8: 6:00A.M. - 7:00A.M.	39	/tvclub/24-600am-800am-39852	Zack Handlen	43	6	8.25581
16	Day 8: 7:00A.M. - 8:00A.M.	40	/tvclub/24-600am-800am-39852	Zack Handlen	43	6	8.25581
17	Day 8: 8:00A.M. - 9:00A.M.	41	/tvclub/24-800am-900am-40041	Zack Handlen	24	8	8.04167
18	Day 8: 9:00A.M. - 10:00A.M.	42	/tvclub/24-900am-1000am-40283	Zack Handlen	19	8	7.52632
19	Day 8: 10:00A.M. - 11:00A.M.	43	/tvclub/24-1000am-1100am-40509	Zack Handlen	25	4	5.12
20	Day 8: 11:00A.M. - 12:00P.M.	44	/tvclub/24-1100am-1200pm-40758	Zack Handlen	19	6	7.21053
21	Day 8: 12:00P.M. - 1:00P.M.	45	/tvclub/24-1200pm-100pm-41026	Zack Handlen	18	9	7.72222
22	Day 8: 1:00P.M. - 2:00P.M.	46	/tvclub/24-100pm-200pm-41254	Zack Handlen	20	10	8.9
23	Day 8: 2:00P.M. - 3:00P.M.	47	/tvclub/24-200pm-400pm-41481	Zack Handlen	39	8	8.48718
24	Day 8: 3:00P.M. - 4:00P.M.	48	/tvclub/24-200pm-400pm-41481	Zack Handlen	38	8	8.44737

		Episode Title	Rating
	Episode Number
Season 1	1	Pilot	8.5
	2	Purple Giraffe	8.2
	3	The Sweet Taste of Liberty	8.2
	4	Return of the Shirt	8.1
	5	Okay Awesome	8.4
	6	The Slutty Pumpkin	8.2
	7	Matchmaker	7.8
	8	The Duel	8.2
	9	Belly Full of Turkey	8.1
	10	The Pineapple Incident	9.2
	11	The Limo	8.3
	12	The Wedding	8.2
	13	Drumroll, Please	8.7
	14	Zip, Zip, Zip	8.2
	15	Game Night	9.1
	16	Cupcake	7.9
	17	Life Among the Gorillas	7.8
	18	Nothing Good Happens After 2 AM	8.4
	19	Mary the Paralegal	8.8
	20	Best Prom Ever	7.9
	21	Milk	8.2
	22	Come On	8.8
Season 2	1	Where Were We?	8.2
	2	The Scorpion and the Toad	8.3
	3	Brunch	8.5
	4	Ted Mosby, Architect	8.8
	5	World's Greatest Couple	8.8
	6	Aldrin Justice	8.2
	7	Swarley	9.0
	8	Atlantic City	8.0
...	...	...	...
Season 8	19	The Fortress	7.8
	20	The Time Travelers	8.0
	21	Romeward Bound	7.3
	22	The Bro Mitzvah	8.5
	23	Something Old	7.4
	24	Something New	8.6
Season 9	1	The Locket	7.7
	2	Coming Back	7.7
	3	Last Time in New York	7.5
	4	The Broken Code	7.1
	5	The Poker Game	7.1
	6	Knight Vision	7.4
	7	No Questions Asked	7.1
	8	The Lighthouse	7.2
	9	Platonish	8.3
	10	Mom and Dad	6.5
	11	Bedtime Stories	5.5
	12	The Rehearsal Dinner	7.6
	13	Bass Player Wanted	8.2
	14	Slapsgiving 3: Slappointment in Slapmarra	5.5
	15	Unpause	8.4
	16	How Your Mother Met Me	9.5
	17	Sunrise	8.1
	18	Rally	8.0
	19	Vesuvius	7.6
	20	Daisy	8.4
	21	Gary Blauman	8.3
	22	The End of the Aisle	8.9
	23	Last Forever: Part One	6.9
	24	Last Forever: Part Two	5.7

	Episode Title	Rating
Episode Number
1	Antipasto	None
2	Primavera	None
3	Secondo	None
4	Episode #3.4	None
5	Episode #3.5	None
6	Episode #3.6	None
7	Digestivo	None
8	The Great Red Dragon	None
9	...And the Woman Clothed in Sun	None
10	...And the Woman Clothed with the Sun	None
11	...And the Beast from the Sea	None
12	Episode #3.12	None
13	The Wrath of the Lamb	None

	Episode Number	Episode Title	Rating
0	1	Seven Thirty-Seven	7.9
1	2	Grilled	8.5
2	3	Bit by a Dead Bee	7.6
3	4	Down	7.5
4	5	Breakage	7.6
5	6	Peekaboo	8.1
6	7	Negro Y Azul	7.9
7	8	Better Call Saul	8.3
8	9	4 Days Out	8.2
9	10	Over	7.8
10	11	Mandala	8.0
11	12	Phoenix	8.4
12	13	ABQ	8.4

	Number Overall	Season	No. in Season	Title	Directed By	Written By	Air Date	Ratings (in Millions)
0	1	1	1	"Pilot"	Vince Gilligan	Vince Gilligan	January 20, 2008	1.41
1	2	1	2	"Cat's in the Bag..."	Adam Bernstein	Vince Gilligan	January 27, 2008	1.49
2	3	1	3	"...And the Bag's in the River"	Adam Bernstein	Vince Gilligan	February 10, 2008	1.08
3	4	1	4	"Cancer Man"	Jim McKay	Vince Gilligan	February 17, 2008	1.09
4	5	1	5	"Gray Matter"	Tricia Brock	Patty Lin	February 24, 2008	0.97
5	6	1	6	"Crazy Handful of Nothin'"	Bronwen Hughes	George Mastras	March 2, 2008	1.07
6	7	1	7	"A No-Rough-Stuff-Type Deal"	Tim Hunter	Peter Gould	March 9, 2008	1.50
7	8	2	1	"Seven Thirty-Seven"	Bryan Cranston	J. Roberts	March 8, 2009	1.66
8	9	2	2	"Grilled"	Charles Haid	George Mastras	March 15, 2009	1.60
9	10	2	3	"Bit by a Dead Bee"	Terry McDonough	Peter Gould	March 22, 2009	1.13
10	11	2	4	"Down"	John Dahl	Sam Catlin	March 29, 2009	1.29
11	12	2	5	"Breakage"	Johan Renck	Moira Walley-Beckett	April 5, 2009	1.21
12	13	2	6	"Peekaboo"	Peter Medak	J. Roberts & Vince Gilligan	April 12, 2009	1.41
13	14	2	7	"Negro y Azul"	Felix Alcala	John Shiban	April 19, 2009	NaN
14	15	2	8	"Better Call Saul"	Terry McDonough	Peter Gould	April 26, 2009	1.04
15	16	2	9	"4 Days Out"	Michelle MacLaren	Sam Catlin	May 3, 2009	NaN
16	17	2	10	"Over"	Phil Abraham	Moira Walley-Beckett	May 10, 2009	NaN
17	18	2	11	"Mandala"	Adam Bernstein	George Mastras	May 17, 2009	NaN
18	19	2	12	"Phoenix"	Colin Bucksey	John Shiban	May 24, 2009	NaN
19	20	2	13	"ABQ"	Adam Bernstein	Vince Gilligan	May 31, 2009	1.50
20	21	3	1	"No Más"	Bryan Cranston	Vince Gilligan	March 21, 2010	1.95
21	22	3	2	"Caballo Sin Nombre"	Adam Bernstein	Peter Gould	March 28, 2010	1.55
22	23	3	3	"I.F.T."	Michelle MacLaren	George Mastras	April 4, 2010	1.33
23	24	3	4	"Green Light"	Scott Winant	Sam Catlin	April 11, 2010	1.46
24	25	3	5	"Más"	Johan Renck	Moira Walley-Beckett	April 18, 2010	1.61
25	26	3	6	"Sunset"	John Shiban	John Shiban	April 25, 2010	1.64
26	27	3	7	"One Minute"	Michelle MacLaren	Thomas Schnauz	May 2, 2010	1.52
27	28	3	8	"I See You"	Colin Bucksey	Gennifer Hutchison	May 9, 2010	1.78
28	29	3	9	"Kafkaesque"	Michael Slovis	Peter Gould & George Mastras	May 16, 2010	1.61
29	30	3	10	"Fly"	Rian Johnson	Sam Catlin & Moira Walley-Beckett	May 23, 2010	1.20
...	...	...	...	...	...	...	...	...
32	33	3	13	"Full Measure"	Vince Gilligan	Vince Gilligan	June 13, 2010	1.56
33	34	4	1	"Box Cutter"	Adam Bernstein	Vince Gilligan	July 17, 2011	2.58
34	35	4	2	"Thirty-Eight Snub"	Michelle MacLaren	George Mastras	July 24, 2011	1.97
35	36	4	3	"Open House"	David Slade	Sam Catlin	July 31, 2011	1.71
36	37	4	4	"Bullet Points"	Colin Bucksey	Moira Walley-Beckett	August 7, 2011	1.93
37	38	4	5	"Shotgun"	Michelle MacLaren	Thomas Schnauz	August 14, 2011	1.75
38	39	4	6	"Cornered"	Michael Slovis	Gennifer Hutchison	August 21, 2011	1.67
39	40	4	7	"Problem Dog"	Peter Gould	Peter Gould	August 28, 2011	1.91
40	41	4	8	"Hermanos"	Johan Renck	Sam Catlin & George Mastras	September 4, 2011	1.98
41	42	4	9	"Bug"	Terry McDonough	Moira Walley-Beckett & Thomas Schnauz	September 11, 2011	1.89
42	43	4	10	"Salud"	Michelle MacLaren	Peter Gould & Gennifer Hutchison	September 18, 2011	1.80
43	44	4	11	"Crawl Space"	Scott Winant	George Mastras & Sam Catlin	September 25, 2011	1.55
44	45	4	12	"End Times"	Vince Gilligan	Thomas Schnauz & Moira Walley-Beckett	October 2, 2011	1.73
45	46	4	13	"Face Off"	Vince Gilligan	Vince Gilligan	October 9, 2011	1.90
46	47	5	1	"Live Free or Die"	Michael Slovis	Vince Gilligan	July 15, 2012	2.93
47	48	5	2	"Madrigal"	Michelle MacLaren	Vince Gilligan	July 22, 2012	2.29
48	49	5	3	"Hazard Pay"	Adam Bernstein	Peter Gould	July 29, 2012	2.20
49	50	5	4	"Fifty-One"	Rian Johnson	Sam Catlin	August 5, 2012	2.29
50	51	5	5	"Dead Freight"	George Mastras	George Mastras	August 12, 2012	2.48
51	52	5	6	"Buyout"	Colin Bucksey	Gennifer Hutchison	August 19, 2012	2.81
52	53	5	7	"Say My Name"	Thomas Schnauz	Thomas Schnauz	August 26, 2012	2.98
53	54	5	8	"Gliding Over All"	Michelle MacLaren	Moira Walley-Beckett	September 2, 2012	2.78
54	55	5	9	"Blood Money"	Bryan Cranston	Peter Gould	August 11, 2013	5.92
55	56	5	10	"Buried"	Michelle MacLaren	Thomas Schnauz	August 18, 2013	4.77
56	57	5	11	"Confessions"	Michael Slovis	Gennifer Hutchison	August 25, 2013	4.85
57	58	5	12	"Rabid Dog"	Sam Catlin	Sam Catlin	September 1, 2013	4.41
58	59	5	13	"To'hajiilee"	Michelle MacLaren	George Mastras	September 8, 2013	5.11
59	60	5	14	"Ozymandias"	Rian Johnson	Moira Walley-Beckett	September 15, 2013	6.37
60	61	5	15	"Granite State"	Peter Gould	Peter Gould	September 22, 2013	6.58
61	62	5	16	"Felina"	Vince Gilligan	Vince Gilligan	September 29, 2013	10.28