Predicting the Outcome of Cricket Matches

Introduction

In this project, we build a model that predicts the outcome of Indian Premier League (IPL) cricket matches from match-level and ball-by-ball delivery data.

Data Mining:

Seasons: 2008 - 2015 (8 seasons)
Teams: DD, KKR, MI, RCB, KXIP, RR, CSK (7 teams)
Exclude matches with irregular results, such as No Result, Tie, or D/L Method.

Features:

Average Batsman Rating (Strike Rate)
Average Bowler Rating (Wickets per Run)
Player of the Match Awards
Previous Encounters - Win by runs, Win by Wickets
Recent form

Prediction Model

Logistic Regression using sklearn
K-Nearest Neighbors using sklearn
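
The two model families listed above could be fitted roughly as follows. This is only a minimal sketch on synthetic data: the feature matrix, the weights used to generate the labels, the split settings, and `n_neighbors=5` are all assumptions made for illustration, not the project's data or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for five "team 1 minus team 2" features (hypothetical values).
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
# Hypothetical labelling rule: team 1 tends to win when the weighted sum is positive.
signal = X.dot(np.array([1.0, 0.8, 0.5, 0.3, 0.6]))
y = (signal + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "logistic_regression": LogisticRegression(),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out rows
    print("{}: {:.2f}".format(name, scores[name]))
```

Holding out a test split keeps the reported accuracy from being measured on rows the model has already seen.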



In [1]:

    
%matplotlib inline
from __future__ import division  # true division, so integer ratios below yield floats (Python 2)
import numpy as np  # fast numerical programming library
import matplotlib.pyplot as plt  # sets up plotting under plt
import pandas as pd  # lets us handle data as dataframes
import seaborn as sns
# sets up pandas table display
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
sns.set(style="whitegrid", color_codes=True)

Data Mining



In [2]:

    
# Reading in the data
allmatches = pd.read_csv("../data/matches.csv")
alldeliveries = pd.read_csv("../data/deliveries.csv")
allmatches.head(10)









    Out[2]:

	id	season	city	date	team1	team2	toss_winner	toss_decision	result	dl_applied	winner	win_by_runs	win_by_wickets	player_of_match	venue	umpire1	umpire2	umpire3
0	1	2008	Bangalore	2008-04-18	Kolkata Knight Riders	Royal Challengers Bangalore	Royal Challengers Bangalore	field	normal	0	Kolkata Knight Riders	140	0	BB McCullum	M Chinnaswamy Stadium	Asad Rauf	RE Koertzen	NaN
1	2	2008	Chandigarh	2008-04-19	Chennai Super Kings	Kings XI Punjab	Chennai Super Kings	bat	normal	0	Chennai Super Kings	33	0	MEK Hussey	Punjab Cricket Association Stadium, Mohali	MR Benson	SL Shastri	NaN
2	3	2008	Delhi	2008-04-19	Rajasthan Royals	Delhi Daredevils	Rajasthan Royals	bat	normal	0	Delhi Daredevils	0	9	MF Maharoof	Feroz Shah Kotla	Aleem Dar	GA Pratapkumar	NaN
3	4	2008	Mumbai	2008-04-20	Mumbai Indians	Royal Challengers Bangalore	Mumbai Indians	bat	normal	0	Royal Challengers Bangalore	0	5	MV Boucher	Wankhede Stadium	SJ Davis	DJ Harper	NaN
4	5	2008	Kolkata	2008-04-20	Deccan Chargers	Kolkata Knight Riders	Deccan Chargers	bat	normal	0	Kolkata Knight Riders	0	5	DJ Hussey	Eden Gardens	BF Bowden	K Hariharan	NaN
5	6	2008	Jaipur	2008-04-21	Kings XI Punjab	Rajasthan Royals	Kings XI Punjab	bat	normal	0	Rajasthan Royals	0	6	SR Watson	Sawai Mansingh Stadium	Aleem Dar	RB Tiffin	NaN
6	7	2008	Hyderabad	2008-04-22	Deccan Chargers	Delhi Daredevils	Deccan Chargers	bat	normal	0	Delhi Daredevils	0	9	V Sehwag	Rajiv Gandhi International Stadium, Uppal	IL Howell	AM Saheba	NaN
7	8	2008	Chennai	2008-04-23	Chennai Super Kings	Mumbai Indians	Mumbai Indians	field	normal	0	Chennai Super Kings	6	0	ML Hayden	MA Chidambaram Stadium, Chepauk	DJ Harper	GA Pratapkumar	NaN
8	9	2008	Hyderabad	2008-04-24	Deccan Chargers	Rajasthan Royals	Rajasthan Royals	field	normal	0	Rajasthan Royals	0	3	YK Pathan	Rajiv Gandhi International Stadium, Uppal	Asad Rauf	MR Benson	NaN
9	10	2008	Chandigarh	2008-04-25	Kings XI Punjab	Mumbai Indians	Mumbai Indians	field	normal	0	Kings XI Punjab	66	0	KC Sangakkara	Punjab Cricket Association Stadium, Mohali	Aleem Dar	AM Saheba	NaN



In [3]:

    
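# Selecting Seasons 2008 - 2015
matches_seasons = allmatches.loc[allmatches['season'].between(2008, 2015)]
# Keep the deliveries of the selected matches instead of hard-coding a match id cutoff
deliveries_seasons = alldeliveries.loc[alldeliveries['match_id'].isin(matches_seasons['id'])]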



In [4]:

    
# Selecting teams DD, KKR, MI, RCB, KXIP, RR, CSK
selected_teams = ['Kolkata Knight Riders', 'Royal Challengers Bangalore', 'Delhi Daredevils',
                  'Chennai Super Kings', 'Rajasthan Royals', 'Mumbai Indians', 'Kings XI Punjab']
# Keep a match only if both sides are among the selected teams
matches_teams = matches_seasons.loc[matches_seasons['team1'].isin(selected_teams) &
                                    matches_seasons['team2'].isin(selected_teams)]
matches_team_matchids = matches_teams.id.unique()
deliveries_teams = deliveries_seasons.loc[deliveries_seasons['match_id'].isin(matches_team_matchids)]
print("Teams selected:\n")
for team in matches_teams.team1.unique():
    print(team)









    



Teams selected:

Kolkata Knight Riders
Chennai Super Kings
Rajasthan Royals
Mumbai Indians
Kings XI Punjab
Royal Challengers Bangalore
Delhi Daredevils



In [5]:

    
# Exclude matches with irregular results, such as 'No Result', ties, or 'D/L Applied'
matches = matches_teams.loc[(matches_teams['result'] == 'normal') & (matches_teams['dl_applied'] == 0)]
matches_matchids = matches.id.unique()
deliveries = deliveries_teams.loc[deliveries_teams['match_id'].isin(matches_matchids)]
# Verifying consistency between datasets: both must cover exactly the same match ids
# (comparing sets does not depend on the two arrays having equal length or order)
set(matches.id.unique()) == set(deliveries.match_id.unique())









    Out[5]:





True
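
The filtering steps above can be tried end to end on a toy example. The sketch below is illustrative only: the two small frames and their values are made up, and the variable names (`toy_matches`, `keep`, and so on) are hypothetical stand-ins for the notebook's own.

```python
import pandas as pd

# Toy stand-ins for matches.csv and deliveries.csv (values are made up).
toy_matches = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "season": [2015, 2015, 2016, 2015],
    "team1": ["Mumbai Indians", "Deccan Chargers", "Mumbai Indians", "Chennai Super Kings"],
    "team2": ["Chennai Super Kings", "Mumbai Indians", "Chennai Super Kings", "Mumbai Indians"],
    "result": ["normal", "normal", "normal", "tie"],
    "dl_applied": [0, 0, 0, 0],
})
toy_deliveries = pd.DataFrame({"match_id": [1, 1, 2, 3, 4, 4]})

teams = ["Mumbai Indians", "Chennai Super Kings"]
keep = (
    toy_matches["season"].between(2008, 2015)   # seasons 2008 - 2015 inclusive
    & toy_matches["team1"].isin(teams)          # both sides must be selected teams
    & toy_matches["team2"].isin(teams)
    & (toy_matches["result"] == "normal")       # drops ties and no-results
    & (toy_matches["dl_applied"] == 0)          # drops D/L matches
)
kept_matches = toy_matches.loc[keep].copy()
kept_deliveries = toy_deliveries.loc[toy_deliveries["match_id"].isin(kept_matches["id"])]

# Both frames should now describe exactly the same set of matches.
consistent = set(kept_matches["id"]) == set(kept_deliveries["match_id"])
print(kept_matches["id"].tolist(), consistent)
```

Only match 1 survives here: match 2 involves an unselected team, match 3 is from 2016, and match 4 is a tie.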

Building Features



In [6]:

    
# Batsman Strike Rate Calculation
# Team 1: Batting First; Team 2: Fielding First

def getMatchDeliveriesDF(match_id):
    return deliveries.loc[deliveries['match_id'] == match_id]

def getInningsOneBatsmen(match_deliveries):
    # First five distinct batsmen appearing in the first innings
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()[0:5]

def getInningsTwoBatsmen(match_deliveries):
    # First five distinct batsmen appearing in the second innings
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()[0:5]

def getBatsmanStrikeRate(batsman, match_id):
    # Only deliveries from earlier matches are used, never the match being predicted
    onstrikedeliveries = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['batsman'] == batsman)]
    total_runs = onstrikedeliveries['batsman_runs'].sum()
    total_balls = onstrikedeliveries.shape[0]  # every delivery recorded for this batsman
    if total_balls != 0:
        return (total_runs/total_balls) * 100
    else:
        return None  # no history for this batsman

def getTeamStrikeRate(batsmen, match_id):
    strike_rates = []
    for batsman in batsmen:
        bsr = getBatsmanStrikeRate(batsman, match_id)
        if bsr is not None:
            strike_rates.append(bsr)
    # np.mean([]) gives NaN with a RuntimeWarning, so return NaN explicitly
    return np.mean(strike_rates) if strike_rates else np.nan

def getAverageStrikeRates(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getInningsTwoBatsmen(match_deliveries)
    teamOneSR = getTeamStrikeRate(innOneBatsmen, match_id)
    teamTwoSR = getTeamStrikeRate(innTwoBatsmen, match_id)
    return teamOneSR, teamTwoSR



In [7]:

    
# Testing Functionality
getAverageStrikeRates(517)









    Out[7]:





(126.98024523159935, 128.55579510411653)
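
To make the arithmetic behind these numbers concrete, here is a small worked example of the same strike-rate definition (runs per 100 deliveries, earlier matches only), written with a `groupby` rather than a per-batsman loop. The toy deliveries and the function name `strike_rates_before` are assumptions for illustration; this is a sketch of an equivalent calculation, not the notebook's implementation.

```python
import pandas as pd

# Made-up ball-by-ball rows using the same column names as the notebook.
toy = pd.DataFrame({
    "match_id":     [1, 1, 1, 1, 2, 2, 3],
    "batsman":      ["A", "A", "B", "A", "A", "B", "A"],
    "batsman_runs": [4, 0, 1, 6, 2, 1, 6],
})

def strike_rates_before(deliveries, match_id):
    """Runs per 100 deliveries for each batsman, using only earlier matches."""
    history = deliveries.loc[deliveries["match_id"] < match_id]
    grouped = history.groupby("batsman")["batsman_runs"]
    return grouped.sum() / grouped.count() * 100.0

rates = strike_rates_before(toy, 3)

# "C" has no history, so reindex() yields NaN for it and mean() skips it,
# mirroring how the notebook skips batsmen whose strike rate is None.
top_order = ["A", "B", "C"]
team_rate = rates.reindex(top_order).mean()
print(rates.to_dict(), team_rate)
```

Batsman A scored 12 runs off 4 deliveries before match 3 (strike rate 300) and B scored 2 off 2 (strike rate 100), so the team average over the batsmen with history is 200.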



In [8]:

    
# Bowler Rating : Wickets per 100 Runs conceded (Higher the Better)
# Team 1: Batting First; Team 2: Fielding First

def getInningsOneBowlers(match_deliveries):
    # First four bowlers of the first innings; they belong to Team 2 (fielding first)
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()[0:4]

def getInningsTwoBowlers(match_deliveries):
    # First four bowlers of the second innings; they belong to Team 1
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()[0:4]

def getBowlerWPR(bowler, match_id):
    # Only deliveries from earlier matches are used
    balls = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['bowler'] == bowler)]
    total_runs = balls['total_runs'].sum()
    # Count only dismissal kinds in this list (run outs, for example, are not counted)
    total_wickets = balls.loc[balls['dismissal_kind'].isin(['caught', 'bowled', 'lbw',
                                                            'caught and bowled', 'stumped'])].shape[0]
    # Guard on runs rather than balls, so a bowler who has conceded no runs cannot cause a division by zero
    if total_runs > 0:
        return (total_wickets/total_runs) * 100
    else:
        return None

def getTeamWPR(bowlers, match_id):
    WPRs = []
    for bowler in bowlers:
        bwpr = getBowlerWPR(bowler, match_id)
        if bwpr is not None:
            WPRs.append(bwpr)
    return np.mean(WPRs) if WPRs else np.nan

def getAverageWPR(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBowlers = getInningsOneBowlers(match_deliveries)
    innTwoBowlers = getInningsTwoBowlers(match_deliveries)
    # Team 1 bowls in the second innings, Team 2 in the first
    teamOneWPR = getTeamWPR(innTwoBowlers, match_id)
    teamTwoWPR = getTeamWPR(innOneBowlers, match_id)
    return teamOneWPR, teamTwoWPR



In [9]:

    
# testing functionality
getAverageWPR(517)









    Out[9]:





(2.7641806594085776, 4.4721111768026631)
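
The bowler rating can be checked by hand on a few made-up deliveries. The sketch below assumes the same column names as the notebook; the data, the constant `BOWLER_DISMISSALS`, and the function name are hypothetical, and returning NaN when no runs have been conceded is a choice made here for illustration.

```python
import numpy as np
import pandas as pd

# Dismissal kinds counted towards the bowler, as in the notebook's list.
BOWLER_DISMISSALS = ["caught", "bowled", "lbw", "caught and bowled", "stumped"]

toy = pd.DataFrame({
    "match_id":       [1, 1, 1, 1, 2, 2],
    "bowler":         ["X", "X", "X", "Y", "X", "Y"],
    "total_runs":     [4, 0, 6, 0, 10, 0],
    "dismissal_kind": [None, "bowled", None, "run out", "caught", None],
})

def wickets_per_100_runs(deliveries, bowler, match_id):
    """Wickets per 100 runs conceded in matches before `match_id`."""
    balls = deliveries.loc[(deliveries["match_id"] < match_id)
                           & (deliveries["bowler"] == bowler)]
    runs = balls["total_runs"].sum()
    if runs == 0:
        return np.nan  # no history, or no runs conceded yet: rating undefined
    wickets = balls["dismissal_kind"].isin(BOWLER_DISMISSALS).sum()
    return wickets / float(runs) * 100.0

rating_x = wickets_per_100_runs(toy, "X", 3)
rating_y = wickets_per_100_runs(toy, "Y", 3)
print(rating_x, rating_y)
```

Bowler X has taken 2 counted wickets for 20 runs, giving a rating of 10, while bowler Y has conceded no runs (the run out is not counted either way), so the rating is left undefined rather than dividing by zero.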



In [10]:

    
# MVP Score (Total number of Player of the Match awards in a squad)

def getAllInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()

def getAllInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()

def getAllInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()

def getAllInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()

def makeSquad(batsmen, bowlers):
    # Batsmen first, then any bowlers who did not also bat
    squad = list(batsmen)
    for bowler in bowlers:
        if bowler not in squad:
            squad.append(bowler)
    return squad

def getPlayerMVPAwards(player, match_id):
    return matches.loc[(matches['player_of_match'] == player) & (matches['id'] < match_id)].shape[0]

def getTeamMVPAwards(squad, match_id):
    num_awards = 0
    for player in squad:
        num_awards += getPlayerMVPAwards(player, match_id)
    return num_awards

def compareMVPAwards(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getAllInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getAllInningsTwoBatsmen(match_deliveries)
    innOneBowlers = getAllInningsOneBowlers(match_deliveries)
    innTwoBowlers = getAllInningsTwoBowlers(match_deliveries)
    teamOneSquad = makeSquad(innOneBatsmen, innTwoBowlers)
    teamTwoSquad = makeSquad(innTwoBatsmen, innOneBowlers)
    teamOneAwards = getTeamMVPAwards(teamOneSquad, match_id)
    teamTwoAwards = getTeamMVPAwards(teamTwoSquad, match_id)
    return teamOneAwards, teamTwoAwards



In [11]:

    
compareMVPAwards(517)









    Out[11]:





(28, 52)
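
The award count for a squad can also be computed in a single pass with `isin`, which gives the same total as summing per player as long as the squad list has no duplicates. The toy table and the name `squad_awards_before` below are assumptions for illustration.

```python
import pandas as pd

# Made-up matches table with only the columns this feature needs.
toy_matches = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "player_of_match": ["A", "B", "A", "C"],
})

def squad_awards_before(matches, squad, match_id):
    """Player of the Match awards won by squad members in earlier matches."""
    earlier = matches.loc[matches["id"] < match_id, "player_of_match"]
    return int(earlier.isin(squad).sum())

awards = squad_awards_before(toy_matches, ["A", "C"], 4)
print(awards)
```

Before match 4, player A has two awards and player C has none yet (C's award comes in match 4 itself, which is excluded), so the squad total is 2.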



In [12]:

    
# Prints a comparison between two teams based on squad attributes

def generateSquadRating(match_id):
    # The first batting team listed for the match is Team 1
    gameday_teams = deliveries.loc[(deliveries['match_id'] == match_id)].batting_team.unique()
    teamOne = gameday_teams[0]
    teamTwo = gameday_teams[1]
    teamOneSR, teamTwoSR = getAverageStrikeRates(match_id)
    teamOneWPR, teamTwoWPR = getAverageWPR(match_id)
    teamOneMVPs, teamTwoMVPs = compareMVPAwards(match_id)
    print("Comparing squads for {} vs {}".format(teamOne,teamTwo))
    print("\nAverage Strike Rate for Batsmen in {} : {}".format(teamOne,teamOneSR))
    print("\nAverage Strike Rate for Batsmen in {} : {}".format(teamTwo,teamTwoSR))
    print("\nBowler Rating (W/R) for {} : {}".format(teamOne,teamOneWPR))
    print("\nBowler Rating (W/R) for {} : {}".format(teamTwo,teamTwoWPR))
    print("\nNumber of MVP Awards in {} : {}".format(teamOne,teamOneMVPs))
    print("\nNumber of MVP Awards in {} : {}".format(teamTwo,teamTwoMVPs))



In [13]:

    
#Testing Functionality
generateSquadRating(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52



In [14]:

    
## 2nd Feature : Previous Encounter
# Won by runs and won by wickets (Higher the better)

def getTeam1(match_id):
    # Returns a one-element array, so the result can be passed straight to isin()
    return matches.loc[matches["id"] == match_id].team1.unique()

def getTeam2(match_id):
    return matches.loc[matches["id"] == match_id].team2.unique()

def getPreviousEncDF(match_id):
    # Earlier matches between the same two teams, regardless of which side was team1
    team1 = getTeam1(match_id)
    team2 = getTeam2(match_id)
    sameOrder = (matches["team1"].isin(team1)) & (matches["team2"].isin(team2))
    swappedOrder = (matches["team1"].isin(team2)) & (matches["team2"].isin(team1))
    return matches.loc[(matches["id"] < match_id) & (sameOrder | swappedOrder)]

def getTeamWBR(match_id, team):
    # Total winning margin in runs over the team's earlier wins in this fixture
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    return winnerDF['win_by_runs'].sum()

def getTeamWBW(match_id, team):
    # Total winning margin in wickets over the team's earlier wins in this fixture
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    return winnerDF['win_by_wickets'].sum()

def getTeamWinPerc(match_id):
    dF = getPreviousEncDF(match_id)
    timesPlayed = dF.shape[0]
    team1 = getTeam1(match_id)[0]
    timesWon = dF.loc[dF["winner"] == team1].shape[0]
    if timesPlayed != 0:
        winPerc = (timesWon/timesPlayed) * 100
    else:
        winPerc = 0  # the two teams have not met before
    return winPerc

def getBothTeamStats(match_id):
    DF = getPreviousEncDF(match_id)
    team1 = getTeam1(match_id)[0]
    team2 = getTeam2(match_id)[0]
    timesPlayed = DF.shape[0]
    timesWon = DF.loc[DF["winner"] == team1].shape[0]
    WBRTeam1 = getTeamWBR(match_id, team1)
    WBRTeam2 = getTeamWBR(match_id, team2)
    WBWTeam1 = getTeamWBW(match_id, team1)
    WBWTeam2 = getTeamWBW(match_id, team2)

    print("Out of {} times in the past {} have won {} times({}%) from {}".format(timesPlayed, team1, timesWon, getTeamWinPerc(match_id), team2))
    print("{} won by {} total runs and {} total wickets.".format(team1, WBRTeam1, WBWTeam1))
    print("{} won by {} total runs and {} total wickets.".format(team2, WBRTeam2, WBWTeam2))



In [15]:

    
#Testing functionality 
getBothTeamStats(517)









    



Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.
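
The head-to-head percentage can be reproduced on a handful of made-up results without pandas. The sketch below is illustrative: the result tuples and the function name `head_to_head_win_pct` are hypothetical, and it keeps the notebook's fallback of 0 when the two teams have not met before.

```python
# (match id, team1, team2, winner); made-up results for illustration.
results = [
    (1, "MI", "CSK", "MI"),
    (2, "CSK", "MI", "CSK"),
    (3, "MI", "RR", "RR"),
    (4, "CSK", "MI", "MI"),
    (5, "MI", "CSK", "CSK"),
]

def head_to_head_win_pct(results, match_id):
    """Share of earlier meetings of the same two teams won by team1 of `match_id`."""
    by_id = {r[0]: r for r in results}
    _, team1, team2, _ = by_id[match_id]
    # A set comparison matches the fixture regardless of which side was team1.
    earlier = [r for r in results
               if r[0] < match_id and {r[1], r[2]} == {team1, team2}]
    if not earlier:
        return 0.0  # same fallback as the notebook when there is no history
    wins = sum(1 for r in earlier if r[3] == team1)
    return 100.0 * wins / len(earlier)

pct = head_to_head_win_pct(results, 5)
print(pct)
```

For match 5, MI and CSK have met three times before (matches 1, 2 and 4) and MI won two of them. Note that a fallback of 0 cannot be told apart from a team that has met its opponent and never won, which may be worth keeping in mind when interpreting the feature.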



In [16]:

    
# 3rd Feature: Recent Form (Win Percentage of 3 previous matches of a team in the same season)
# Higher the better

def getMatchYear(match_id):
    return matches.loc[matches["id"] == match_id].season.unique()

def getTeam1DF(match_id, year):
    # Last three matches played by team1 earlier in the same season
    team1 = getTeam1(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team1)) | (matches["team2"].isin(team1)))].tail(3)

def getTeam2DF(match_id, year):
    # Last three matches played by team2 earlier in the same season
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team2)) | (matches["team2"].isin(team2)))].tail(3)

def getWinPerc(teamDF, team):
    # Win percentage within teamDF; 0 when the team has no earlier match this season
    totalMatches = teamDF.shape[0]
    if totalMatches == 0:
        return 0
    winMatches = teamDF.loc[teamDF["winner"] == team].shape[0]
    return (winMatches / totalMatches) * 100

def getTeamWinPercentage(match_id):
    year = int(getMatchYear(match_id)[0])
    team1 = getTeam1(match_id)[0]
    team2 = getTeam2(match_id)[0]
    winPercTeam1 = getWinPerc(getTeam1DF(match_id, year), team1)
    winPercTeam2 = getWinPerc(getTeam2DF(match_id, year), team2)
    return winPercTeam1, winPercTeam2



In [17]:

    
#Testing Functionality
getTeamWinPercentage(517)









    Out[17]:





(66.66666666666666, 66.66666666666666)
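
A small worked example shows how the three-match window behaves. The fixture list below is made up, and `recent_form` with its `window` parameter is a hypothetical helper sketched for illustration rather than the notebook's own function.

```python
import pandas as pd

# Made-up fixture list for one season.
toy_matches = pd.DataFrame({
    "id":     [1, 2, 3, 4, 5, 6],
    "season": [2015] * 6,
    "team1":  ["MI", "MI", "CSK", "MI", "RR", "MI"],
    "team2":  ["CSK", "RR", "RR", "CSK", "MI", "CSK"],
    "winner": ["MI", "RR", "CSK", "MI", "MI", "CSK"],
})

def recent_form(matches, team, match_id, season, window=3):
    """Win percentage over the team's last `window` matches earlier in the season."""
    played = matches.loc[(matches["id"] < match_id)
                         & (matches["season"] == season)
                         & ((matches["team1"] == team) | (matches["team2"] == team))]
    last = played.tail(window)
    if last.empty:
        return 0.0  # no earlier match this season
    return 100.0 * (last["winner"] == team).mean()

mi_form = recent_form(toy_matches, "MI", 6, 2015)
csk_form = recent_form(toy_matches, "CSK", 6, 2015)
print(mi_form, csk_form)
```

Before match 6, MI's last three matches are 2, 4 and 5 (one loss, two wins), and CSK's are 1, 3 and 4 (one win, two losses), so the difference feature for this toy match would favour MI.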



In [18]:

    
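# Prints the squad comparison and the head-to-head summary for a match
def getAllFeatures(match_id):
    generateSquadRating(match_id)
    print("\n")
    getBothTeamStats(match_id)
    print("\n")
    # Recent form is computed here, but its return value is not printed
    getTeamWinPercentage(match_id)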



In [19]:

    
#Testing Functionality
getAllFeatures(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52


Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.

Adding New Columns for Features in Matches DataFrame



In [20]:

    
#Create Column for Team 1 Winning Status (1 = Won, 0 = Lost)

# matches is a filtered slice of matches_teams; take an explicit copy before adding
# columns so that pandas does not raise a SettingWithCopyWarning
matches = matches.copy()
matches['team1Winning'] = np.where(matches['team1'] == matches['winner'], 1, 0)



In [21]:

    
# New Column for Difference of Average Strike rates (First Team SR - Second Team SR) 
# [Negative value means Second team is better]
# The value is NaN when a side has no batsman with deliveries in earlier matches

firstTeamSR = []
secondTeamSR = []
for i in matches['id'].unique():
    firstSR, secondSR = getAverageStrikeRates(i)
    firstTeamSR.append(firstSR)
    secondTeamSR.append(secondSR)
# Plain arrays are assigned by position, which sidesteps the gaps in the matches index
matches["Avg_SR_Difference"] = np.array(firstTeamSR) - np.array(secondTeamSR)



In [22]:

    
# New Column for Difference of Wickets Per Run (First Team WPR - Second Team WPR) 
# [Negative value means Second team is better]

firstTeamWPR = []
secondTeamWPR = []
for i in matches['id'].unique():
    firstWPR, secondWPR = getAverageWPR(i)
    firstTeamWPR.append(firstWPR)
    secondTeamWPR.append(secondWPR)
# Plain arrays are assigned by position, which sidesteps the gaps in the matches index
matches["Avg_WPR_Difference"] = np.array(firstTeamWPR) - np.array(secondTeamWPR)



In [23]:

    
# New column for difference of MVP Awards 
# (Negative value means Second team is better)

firstTeamMVP = []
secondTeamMVP = []
for i in matches['id'].unique():
    firstMVP, secondMVP = compareMVPAwards(i)
    firstTeamMVP.append(firstMVP)
    secondTeamMVP.append(secondMVP)
# Plain arrays are assigned by position, which sidesteps the gaps in the matches index
matches["Total_MVP_Difference"] = np.array(firstTeamMVP) - np.array(secondTeamMVP)



In [24]:

    
# New column for Win Percentage of Team 1 in previous encounters

firstTeamWP = []
for i in matches['id'].unique():
    firstTeamWP.append(getTeamWinPerc(i))
# A plain list is assigned by position, which sidesteps the gaps in the matches index
matches["Prev_Enc_Team1_WinPerc"] = firstTeamWP



In [25]:

    
# New column for Recent form (Win Percentage in the last 3 matches of the current season) of 1st Team compared to 2nd Team
# (Negative means 2nd team has higher win percentage)

firstTeamRF = []
secondTeamRF = []
for i in matches['id'].unique():
    firstRF, secondRF = getTeamWinPercentage(i)
    firstTeamRF.append(firstRF)
    secondTeamRF.append(secondRF)
# Plain arrays are assigned by position, which sidesteps the gaps in the matches index
matches["Total_RF_Difference"] = np.array(firstTeamRF) - np.array(secondTeamRF)



In [26]:

    
#Testing 
matches.tail()









    Out[26]:

	id	season	city	date	team1	team2	toss_winner	toss_decision	result	dl_applied	winner	win_by_runs	win_by_wickets	player_of_match	venue	umpire1	umpire2	umpire3	team1Winning	Avg_SR_Difference	Avg_WPR_Difference	Total_MVP_Difference	Prev_Enc_Team1_WinPerc	Total_RF_Difference
510	511	2015	Mumbai	2015-05-16	Rajasthan Royals	Kolkata Knight Riders	Rajasthan Royals	bat	normal	0	Rajasthan Royals	9	0	SR Watson	Brabourne Stadium	RM Deshpande	RK Illingworth	NaN	1	-3.303823	-0.271935	-16	50.000000	0.000000
513	514	2015	Mumbai	2015-05-19	Mumbai Indians	Chennai Super Kings	Mumbai Indians	bat	normal	0	Mumbai Indians	25	0	KA Pollard	Wankhede Stadium	HDPK Dharmasena	RK Illingworth	NaN	1	6.315981	-0.617777	-24	50.000000	0.000000
514	515	2015	Pune	2015-05-20	Royal Challengers Bangalore	Rajasthan Royals	Royal Challengers Bangalore	bat	normal	0	Royal Challengers Bangalore	71	0	AB de Villiers	Maharashtra Cricket Association Stadium	AK Chaudhary	C Shamshuddin	NaN	1	-2.200375	0.969143	5	50.000000	0.000000
515	516	2015	Ranchi	2015-05-22	Royal Challengers Bangalore	Chennai Super Kings	Chennai Super Kings	field	normal	0	Chennai Super Kings	0	3	A Nehra	JSCA International Stadium Complex	AK Chaudhary	CB Gaffaney	NaN	0	-0.521025	1.039181	-23	38.888889	33.333333
516	517	2015	Kolkata	2015-05-24	Mumbai Indians	Chennai Super Kings	Chennai Super Kings	field	normal	0	Mumbai Indians	41	0	RG Sharma	Eden Gardens	HDPK Dharmasena	RK Illingworth	NaN	1	-1.575550	-1.707931	-24	52.380952	0.000000
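
The feature columns in the table above all follow the same pattern: call a helper for every match id, collect a pair of values, and store the difference. A compact way to express that pattern is sketched below. The toy frame and the helper `toy_feature_pair` are hypothetical stand-ins (the helper simply returns made-up numbers), so this shows the assembly step only, not the real feature values.

```python
import pandas as pd

# Toy matches frame with a gap in the index, as happens after filtering rows.
toy_matches = pd.DataFrame(
    {"id": [1, 2, 3], "team1": ["MI", "CSK", "MI"], "winner": ["MI", "MI", "CSK"]},
    index=[0, 2, 5],
)

def toy_feature_pair(match_id):
    """Hypothetical stand-in for helpers such as getAverageStrikeRates."""
    return 100.0 + match_id, 100.0 - match_id

# Collect (team 1, team 2) values once, then align them to the matches index.
pairs = [toy_feature_pair(i) for i in toy_matches["id"]]
features = pd.DataFrame(pairs, columns=["first", "second"], index=toy_matches.index)

toy_matches = toy_matches.copy()  # explicit copy before adding columns
toy_matches["team1Winning"] = (toy_matches["team1"] == toy_matches["winner"]).astype(int)
toy_matches["Avg_SR_Difference"] = features["first"] - features["second"]
print(toy_matches)
```

Passing `index=toy_matches.index` matters because pandas aligns on index labels when assigning a Series; without it, rows whose labels are missing from the new Series would silently become NaN.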

Visualizations for Features vs. Response



In [27]:

    
# Graph for Average Strike Rate Difference
matches.boxplot(column = 'Avg_SR_Difference', by='team1Winning', showfliers= False)









    Out[27]:





<matplotlib.axes._subplots.AxesSubplot at 0x110c920d0>



In [28]:

    
# Graph for Average WPR(Wickets per Run) Difference
matches.boxplot(column = 'Avg_WPR_Difference', by='team1Winning', showfliers= False)









    Out[28]:





<matplotlib.axes._subplots.AxesSubplot at 0x115e9a850>



In [29]:

    
# Graph for MVP Difference
matches.boxplot(column = 'Total_MVP_Difference', by='team1Winning', showfliers= False)









    Out[29]:





<matplotlib.axes._subplots.AxesSubplot at 0x115ec7410>



In [30]:

    
#Graph for Previous encounters Win Percentage of Team #1
matches.boxplot(column = 'Prev_Enc_Team1_WinPerc', by='team1Winning', showfliers= False)









    Out[30]:





<matplotlib.axes._subplots.AxesSubplot at 0x115fc8810>



In [31]:

    
# Graph for Recent form(Win Percentage in the same season)
matches.boxplot(column = 'Total_RF_Difference', by='team1Winning', showfliers= False)









    Out[31]:





<matplotlib.axes._subplots.AxesSubplot at 0x11607fd10>
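
Each boxplot compares the distribution of one feature between matches team 1 lost (0) and won (1). The same comparison can be read off numerically from per-group quartiles. The sketch below uses made-up values for `Avg_SR_Difference`; only the column names are taken from the notebook.

```python
import pandas as pd

# Made-up feature values for three wins and three losses of team 1.
toy = pd.DataFrame({
    "team1Winning":      [1, 1, 1, 0, 0, 0],
    "Avg_SR_Difference": [4.0, 2.0, 6.0, -3.0, -1.0, 1.0],
})

# describe() reports the quartiles that a boxplot draws as its box and median line.
summary = toy.groupby("team1Winning")["Avg_SR_Difference"].describe()
quartiles = summary[["25%", "50%", "75%"]]
print(quartiles)
```

A feature is more likely to be useful for prediction when the two groups' medians and boxes are clearly separated; heavily overlapping boxes suggest a weak signal.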
