Predicting the Outcome of Cricket Matches

Introduction

In this project, we build a model that predicts the outcome of Indian Premier League (IPL) cricket matches from match-level and ball-by-ball (delivery) data.

Data Mining:

  • Season : 2008 - 2015 (8 Seasons)
  • Teams : DD, KKR, MI, RCB, KXIP, RR, CSK (7 Teams)
  • Exclude matches with irregular outcomes, such as No Result, Tie, or D/L Method.

Features:

  • Average Batsman Rating (Strike Rate)
  • Average Bowler Rating (Wickets per Run)
  • Player of the Match Awards
  • Previous Encounters - Win by runs, Win by Wickets
  • Recent form

Prediction Model

  • Logistic Regression using sklearn
  • K-Nearest Neighbors using sklearn

In [1]:
from __future__ import division # true division for int / int; __future__ imports conventionally come first
%matplotlib inline
import numpy as np # imports a fast numerical programming library
import matplotlib.pyplot as plt #sets up plotting under plt
import pandas as pd #lets us handle data as dataframes
import seaborn as sns
#sets up pandas table display
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
sns.set(style="whitegrid", color_codes=True)

Data Mining


In [2]:
# Reading in the data
allmatches = pd.read_csv("../data/matches.csv")
alldeliveries = pd.read_csv("../data/deliveries.csv")
allmatches.head(10)


Out[2]:
id season city date team1 team2 toss_winner toss_decision result dl_applied winner win_by_runs win_by_wickets player_of_match venue umpire1 umpire2 umpire3
0 1 2008 Bangalore 2008-04-18 Kolkata Knight Riders Royal Challengers Bangalore Royal Challengers Bangalore field normal 0 Kolkata Knight Riders 140 0 BB McCullum M Chinnaswamy Stadium Asad Rauf RE Koertzen NaN
1 2 2008 Chandigarh 2008-04-19 Chennai Super Kings Kings XI Punjab Chennai Super Kings bat normal 0 Chennai Super Kings 33 0 MEK Hussey Punjab Cricket Association Stadium, Mohali MR Benson SL Shastri NaN
2 3 2008 Delhi 2008-04-19 Rajasthan Royals Delhi Daredevils Rajasthan Royals bat normal 0 Delhi Daredevils 0 9 MF Maharoof Feroz Shah Kotla Aleem Dar GA Pratapkumar NaN
3 4 2008 Mumbai 2008-04-20 Mumbai Indians Royal Challengers Bangalore Mumbai Indians bat normal 0 Royal Challengers Bangalore 0 5 MV Boucher Wankhede Stadium SJ Davis DJ Harper NaN
4 5 2008 Kolkata 2008-04-20 Deccan Chargers Kolkata Knight Riders Deccan Chargers bat normal 0 Kolkata Knight Riders 0 5 DJ Hussey Eden Gardens BF Bowden K Hariharan NaN
5 6 2008 Jaipur 2008-04-21 Kings XI Punjab Rajasthan Royals Kings XI Punjab bat normal 0 Rajasthan Royals 0 6 SR Watson Sawai Mansingh Stadium Aleem Dar RB Tiffin NaN
6 7 2008 Hyderabad 2008-04-22 Deccan Chargers Delhi Daredevils Deccan Chargers bat normal 0 Delhi Daredevils 0 9 V Sehwag Rajiv Gandhi International Stadium, Uppal IL Howell AM Saheba NaN
7 8 2008 Chennai 2008-04-23 Chennai Super Kings Mumbai Indians Mumbai Indians field normal 0 Chennai Super Kings 6 0 ML Hayden MA Chidambaram Stadium, Chepauk DJ Harper GA Pratapkumar NaN
8 9 2008 Hyderabad 2008-04-24 Deccan Chargers Rajasthan Royals Rajasthan Royals field normal 0 Rajasthan Royals 0 3 YK Pathan Rajiv Gandhi International Stadium, Uppal Asad Rauf MR Benson NaN
9 10 2008 Chandigarh 2008-04-25 Kings XI Punjab Mumbai Indians Mumbai Indians field normal 0 Kings XI Punjab 66 0 KC Sangakkara Punjab Cricket Association Stadium, Mohali Aleem Dar AM Saheba NaN

In [3]:
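# Selecting Seasons 2008 - 2015
matches_seasons = allmatches.loc[allmatches['season'].between(2008, 2015)]
# Keep only deliveries of the selected matches (avoids a hard-coded match id cut-off)
deliveries_seasons = alldeliveries.loc[alldeliveries['match_id'].isin(matches_seasons['id'])]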

In [4]:
# Selecting teams DD, KKR, MI, RCB, KXIP, RR, CSK
selected_teams = ['Kolkata Knight Riders', 'Royal Challengers Bangalore', 'Delhi Daredevils',
                  'Chennai Super Kings', 'Rajasthan Royals', 'Mumbai Indians', 'Kings XI Punjab']
# Keep a match only if both sides are among the selected teams
matches_teams = matches_seasons.loc[matches_seasons['team1'].isin(selected_teams) &
                                    matches_seasons['team2'].isin(selected_teams)]
matches_team_matchids = matches_teams.id.unique()
deliveries_teams = deliveries_seasons.loc[deliveries_seasons['match_id'].isin(matches_team_matchids)]
print("Teams selected:\n")
for team in matches_teams.team1.unique():
    print(team)


Teams selected:

Kolkata Knight Riders
Chennai Super Kings
Rajasthan Royals
Mumbai Indians
Kings XI Punjab
Royal Challengers Bangalore
Delhi Daredevils

In [5]:
# Exclude matches with irregular outcomes such as 'No Result' or 'D/L Applied'
# .copy() makes `matches` an independent DataFrame, so adding feature columns
# later does not trigger SettingWithCopyWarning
matches = matches_teams.loc[(matches_teams['result'] == 'normal') & (matches_teams['dl_applied'] == 0)].copy()
matches_matchids = matches.id.unique()
deliveries = deliveries_teams.loc[deliveries_teams['match_id'].isin(matches_matchids)]
# Verifying consistency between datasets (array_equal also copes with differing lengths)
np.array_equal(matches.id.unique(), deliveries.match_id.unique())


Out[5]:
True
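
The three filtering steps above (seasons, teams, regular results) can be tried on a small table first. The sketch below does not use the notebook's data: the rows are made up, only the column names follow matches.csv as shown in Out[2], and `toy`, `keep_teams` and `toy_matches` are illustrative names. It ends with an explicit `.copy()`, which is one way to avoid the SettingWithCopyWarning that pandas can emit when columns are added to a filtered slice.

```python
import pandas as pd

# Made-up rows that mimic the matches.csv columns used for filtering.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "season": [2008, 2008, 2015, 2015, 2016],
    "team1": ["Mumbai Indians", "Deccan Chargers", "Mumbai Indians",
              "Chennai Super Kings", "Mumbai Indians"],
    "team2": ["Chennai Super Kings", "Mumbai Indians", "Chennai Super Kings",
              "Mumbai Indians", "Chennai Super Kings"],
    "result": ["normal", "normal", "tie", "normal", "normal"],
    "dl_applied": [0, 0, 0, 1, 0],
})

keep_teams = ["Mumbai Indians", "Chennai Super Kings"]  # illustrative subset

in_seasons = toy["season"].between(2008, 2015)  # inclusive on both ends
both_teams = toy["team1"].isin(keep_teams) & toy["team2"].isin(keep_teams)
regular = (toy["result"] == "normal") & (toy["dl_applied"] == 0)

# .copy() returns an independent frame, so later column assignment is unambiguous.
toy_matches = toy.loc[in_seasons & both_teams & regular].copy()
toy_matches["team1Winning"] = 0  # placeholder column, only to show that assignment works

print(toy_matches["id"].tolist())
```

Only the first row survives all three masks; each of the other rows fails exactly one of them.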

Building Features


In [6]:
# Batsman Strike Rate Calculation
# Team 1: Batting First; Team 2: Fielding First

def getMatchDeliveriesDF(match_id):
    return deliveries.loc[deliveries['match_id'] == match_id]

def getInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()[0:5]

def getInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()[0:5]

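def getBatsmanStrikeRate(batsman, match_id):
    # Use only deliveries from earlier matches so the current match does not leak into its own feature
    onstrikedeliveries = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['batsman'] == batsman)]
    total_runs = onstrikedeliveries['batsman_runs'].sum()
    total_balls = onstrikedeliveries.shape[0]
    if total_balls != 0:
        return (total_runs/total_balls) * 100
    else:
        return None

def getTeamStrikeRate(batsmen, match_id):
    strike_rates = []
    for batsman in batsmen:
        bsr = getBatsmanStrikeRate(batsman, match_id)
        if bsr is not None:
            strike_rates.append(bsr)
    # No batsman has any history (e.g. the very first match): return NaN explicitly
    # instead of letting np.mean([]) emit a RuntimeWarning
    if not strike_rates:
        return np.nan
    return np.mean(strike_rates)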

def getAverageStrikeRates(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getInningsTwoBatsmen(match_deliveries)
    teamOneSR = getTeamStrikeRate(innOneBatsmen, match_id)
    teamTwoSR = getTeamStrikeRate(innTwoBatsmen, match_id)
    return teamOneSR, teamTwoSR

In [7]:
# Testing Functionality
getAverageStrikeRates(517)


Out[7]:
(126.98024523159935, 128.55579510411653)
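
The rating above is the usual strike rate, 100 × runs / balls, computed only from deliveries in earlier matches (`match_id` smaller than the current one), so the match being predicted does not contribute to its own feature. A minimal sketch on hand-made deliveries follows; the tuples and the name `strike_rate_before` are illustrative and not part of the notebook.

```python
# Each tuple is (match_id, batsman, batsman_runs); the values are made up.
toy_deliveries = [
    (1, "A", 4), (1, "A", 0), (1, "B", 1),
    (2, "A", 6), (2, "A", 1), (2, "B", 0),
    (3, "A", 4),  # belongs to the match being predicted, so it must be ignored
]

def strike_rate_before(batsman, match_id, deliveries):
    """Runs per 100 balls faced in matches with a smaller id, or None without history."""
    runs = [r for (m, b, r) in deliveries if m < match_id and b == batsman]
    if not runs:
        return None
    return 100.0 * sum(runs) / len(runs)

print(strike_rate_before("A", 3, toy_deliveries))  # 11 runs off 4 balls
print(strike_rate_before("B", 3, toy_deliveries))  # 1 run off 2 balls
print(strike_rate_before("C", 3, toy_deliveries))  # no history
```

Like the notebook's function, this sketch counts every recorded delivery as a ball faced.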

In [8]:
# Bowler Rating : Wickets per 100 Runs conceded (Higher the Better)
# Team 1: Batting First; Team 2: Fielding First

def getInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()[0:4]

def getInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()[0:4]

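def getBowlerWPR(bowler, match_id):
    balls = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['bowler'] == bowler)]
    total_runs = balls['total_runs'].sum()
    # Count only dismissals credited to the bowler
    total_wickets = balls.loc[balls['dismissal_kind'].isin(['caught', 'bowled', 'lbw',
                                                            'caught and bowled', 'stumped'])].shape[0]
    # Guard on runs rather than balls: a bowler with deliveries but no runs conceded
    # would otherwise cause a division by zero
    if total_runs > 0:
        return (total_wickets/total_runs) * 100
    else:
        return None

def getTeamWPR(bowlers, match_id):
    WPRs = []
    for bowler in bowlers:
        bwpr = getBowlerWPR(bowler, match_id)
        if bwpr is not None:
            WPRs.append(bwpr)
    # No bowler has any history: return NaN explicitly instead of averaging an empty list
    if not WPRs:
        return np.nan
    return np.mean(WPRs)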

def getAverageWPR(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBowlers = getInningsOneBowlers(match_deliveries)
    innTwoBowlers = getInningsTwoBowlers(match_deliveries)
    # Team 1 bats first, so its bowlers are the ones bowling in innings two (and vice versa)
    teamOneWPR = getTeamWPR(innTwoBowlers, match_id)
    teamTwoWPR = getTeamWPR(innOneBowlers, match_id)
    return teamOneWPR, teamTwoWPR

In [9]:
# testing functionality
getAverageWPR(517)


Out[9]:
(2.7641806594085776, 4.4721111768026631)
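
The bowler rating divides the wickets credited to a bowler by the runs conceded and scales the ratio by 100. The sketch below shows that arithmetic on made-up figures, including the case where no runs have been conceded yet and the ratio is undefined. `wickets_per_100_runs` is a hypothetical helper name, not a function from the notebook.

```python
# Dismissal kinds that the notebook's rating credits to the bowler.
BOWLER_DISMISSALS = {"caught", "bowled", "lbw", "caught and bowled", "stumped"}

def wickets_per_100_runs(balls):
    """balls: list of (total_runs, dismissal_kind or None) from one bowler's history."""
    runs = sum(r for r, _ in balls)
    wickets = sum(1 for _, kind in balls if kind in BOWLER_DISMISSALS)
    if runs == 0:
        return None  # no history, or no runs conceded: the ratio is undefined
    return 100.0 * wickets / runs

# Made-up history: 10 runs conceded, 2 credited wickets, 1 run out (not credited).
history = [(1, None), (4, None), (0, "bowled"), (0, "run out"), (5, None), (0, "lbw")]
print(wickets_per_100_runs(history))
print(wickets_per_100_runs([(0, "caught")]))
```

Returning `None` for the undefined case lets the caller skip that bowler when averaging, in the same way the notebook skips players without history.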

In [10]:
# MVP Score (Total number of Player of the Match awards in a squad)

def getAllInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()

def getAllInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()

def getAllInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()

def getAllInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()

def makeSquad(batsmen, bowlers):
    p = []
    p = np.append(p, batsmen)
    for i in bowlers:
        if i not in batsmen:
            p = np.append(p, i)
    return p

def getPlayerMVPAwards(player, match_id):
    return matches.loc[(matches['player_of_match'] == player) & (matches['id'] < match_id)].shape[0]

def getTeamMVPAwards(squad, match_id):
    num_awards = 0
    for player in squad:
        num_awards += getPlayerMVPAwards(player, match_id)
    return num_awards

def compareMVPAwards(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getAllInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getAllInningsTwoBatsmen(match_deliveries)
    innOneBowlers = getAllInningsOneBowlers(match_deliveries)
    innTwoBowlers = getAllInningsTwoBowlers(match_deliveries)
    teamOneSquad = makeSquad(innOneBatsmen, innTwoBowlers)
    teamTwoSquad = makeSquad(innTwoBatsmen, innOneBowlers)
    teamOneAwards = getTeamMVPAwards(teamOneSquad, match_id)
    teamTwoAwards = getTeamMVPAwards(teamTwoSquad, match_id)
    return teamOneAwards, teamTwoAwards

In [11]:
compareMVPAwards(517)


Out[11]:
(28, 52)
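
The squad score is the sum of earlier Player of the Match awards over everyone who batted or bowled for a side. One way to sketch the same count without repeated DataFrame lookups is a `collections.Counter` built from the earlier matches. The award list and squads below are invented, and `squad_awards_before` is an illustrative name.

```python
from collections import Counter

# (match_id, player_of_match) pairs; made-up data.
awards = [(1, "P1"), (2, "P2"), (3, "P1"), (4, "P3"), (5, "P1")]

def squad_awards_before(squad, match_id, awards):
    """Total awards won by squad members in matches with id < match_id."""
    tally = Counter(player for (m, player) in awards if m < match_id)
    # set() guards against a player who is listed as both batsman and bowler
    return sum(tally[p] for p in set(squad))

team_one = ["P1", "P4", "P1"]  # P1 appears twice on purpose
team_two = ["P2", "P3"]
print(squad_awards_before(team_one, 5, awards))  # P1 has two earlier awards, P4 none
print(squad_awards_before(team_two, 4, awards))  # P3's award in match 4 is not yet counted
```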

In [12]:
# Prints a comparison between two teams based on squad attributes

def generateSquadRating(match_id):
    gameday_teams = deliveries.loc[(deliveries['match_id'] == match_id)].batting_team.unique()
    teamOne = gameday_teams[0]
    teamTwo = gameday_teams[1]
    teamOneSR, teamTwoSR = getAverageStrikeRates(match_id)
    teamOneWPR, teamTwoWPR = getAverageWPR(match_id)
    teamOneMVPs, teamTwoMVPs = compareMVPAwards(match_id)
    print("Comparing squads for {} vs {}".format(teamOne,teamTwo))
    print("\nAverage Strike Rate for Batsmen in {} : {}".format(teamOne,teamOneSR))
    print("\nAverage Strike Rate for Batsmen in {} : {}".format(teamTwo,teamTwoSR))
    print("\nBowler Rating (W/R) for {} : {}".format(teamOne,teamOneWPR))
    print("\nBowler Rating (W/R) for {} : {}".format(teamTwo,teamTwoWPR))
    print("\nNumber of MVP Awards in {} : {}".format(teamOne,teamOneMVPs))
    print("\nNumber of MVP Awards in {} : {}".format(teamTwo,teamTwoMVPs))

In [13]:
#Testing Functionality
generateSquadRating(517)


Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52

In [14]:
## 2nd Feature : Previous Encounter
# Won by runs and won by wickets (Higher the better)

def getTeam1(match_id):
    return matches.loc[matches["id"] == match_id].team1.unique()

def getTeam2(match_id):
    return matches.loc[matches["id"] == match_id].team2.unique()

def getPreviousEncDF(match_id):
    team1 = getTeam1(match_id)
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (((matches["team1"].isin(team1)) & (matches["team2"].isin(team2))) | ((matches["team1"].isin(team2)) & (matches["team2"].isin(team1))))]

def getTeamWBR(match_id, team):
    WBR = 0
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    WBR = winnerDF['win_by_runs'].sum()    
    return WBR


def getTeamWBW(match_id, team):
    WBW = 0 
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    WBW = winnerDF['win_by_wickets'].sum()
    return WBW 
    
def getTeamWinPerc(match_id):
    dF = getPreviousEncDF(match_id)
    timesPlayed = dF.shape[0]
    team1 = getTeam1(match_id)[0].strip("[]")
    timesWon = dF.loc[dF["winner"] == team1].shape[0]
    if timesPlayed != 0:
        winPerc = (timesWon/timesPlayed) * 100
    else:
        winPerc = 0
    return winPerc

def getBothTeamStats(match_id):
    DF = getPreviousEncDF(match_id)
    team1 = getTeam1(match_id)[0].strip("[]")
    team2 = getTeam2(match_id)[0].strip("[]")
    timesPlayed = DF.shape[0]
    timesWon = DF.loc[DF["winner"] == team1].shape[0]
    WBRTeam1 = getTeamWBR(match_id, team1)
    WBRTeam2 = getTeamWBR(match_id, team2)
    WBWTeam1 = getTeamWBW(match_id, team1)
    WBWTeam2 = getTeamWBW(match_id, team2)

    print("Out of {} times in the past {} have won {} times({}%) from {}".format(timesPlayed, team1, timesWon, getTeamWinPerc(match_id), team2))
    print("{} won by {} total runs and {} total wickets.".format(team1, WBRTeam1, WBWTeam1))
    print("{} won by {} total runs and {} total wickets.".format(team2, WBRTeam2, WBWTeam2))

In [15]:
#Testing functionality 
getBothTeamStats(517)


Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.
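
Previous encounters are matched regardless of which side is listed as team1 or team2. A compact way to express that symmetry is to compare unordered pairs. The sketch below does so on invented results; `head_to_head` is an illustrative name, and the 0% fallback for teams that have never met mirrors the notebook's choice.

```python
# (match_id, team1, team2, winner); made-up results.
results = [
    (1, "MI", "CSK", "MI"),
    (2, "CSK", "MI", "CSK"),
    (3, "MI", "KKR", "KKR"),
    (4, "CSK", "MI", "MI"),
    (5, "MI", "CSK", "CSK"),  # the match being predicted
]

def head_to_head(team_a, team_b, match_id, results):
    """Return (games played, win percentage of team_a) over earlier meetings."""
    pair = frozenset([team_a, team_b])
    earlier = [w for (m, t1, t2, w) in results
               if m < match_id and frozenset([t1, t2]) == pair]
    if not earlier:
        return 0, 0.0  # no history counts as 0%
    return len(earlier), 100.0 * earlier.count(team_a) / len(earlier)

played, pct = head_to_head("MI", "CSK", 5, results)
print(played, round(pct, 2))
```

Matches 1, 2 and 4 are the earlier meetings here; match 3 involves a different opponent and match 5 is the one being predicted.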

In [16]:
# 3rd Feature: Recent Form (Win Percentage of 3 previous matches of a team in the same season)
# Higher the better

def getMatchYear(match_id):
    return matches.loc[matches["id"] == match_id].season.unique()

def getTeam1DF(match_id, year):
    team1 = getTeam1(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team1)) | (matches["team2"].isin(team1)))].tail(3)

def getTeam2DF(match_id, year):
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team2)) | (matches["team2"].isin(team2)))].tail(3)

def getTeamWinPercentage(match_id):
    year = int(getMatchYear(match_id)[0])
    team1 = getTeam1(match_id)[0]
    team2 = getTeam2(match_id)[0]
    team1DF = getTeam1DF(match_id, year)
    team2DF = getTeam2DF(match_id, year)

    def winPerc(teamDF, team):
        # A team with no earlier match this season gets 0
        totalMatches = teamDF.shape[0]
        if totalMatches == 0:
            return 0
        winMatches = teamDF.loc[teamDF["winner"] == team].shape[0]
        return (winMatches / totalMatches) * 100

    return winPerc(team1DF, team1), winPerc(team2DF, team2)

In [17]:
#Testing Functionality
getTeamWinPercentage(517)


Out[17]:
(66.66666666666666, 66.66666666666666)
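
Recent form looks only at a team's last three earlier matches within the same season. The sketch below reproduces that window on invented rows. `recent_form` is an illustrative name, and a team with no earlier match in the season gets 0, as in the notebook.

```python
# (match_id, season, team1, team2, winner); made-up rows in chronological order.
rows = [
    (1, 2014, "MI", "CSK", "MI"),
    (2, 2015, "MI", "KKR", "KKR"),
    (3, 2015, "CSK", "MI", "MI"),
    (4, 2015, "MI", "RR", "RR"),
    (5, 2015, "KKR", "MI", "MI"),
    (6, 2015, "MI", "CSK", "CSK"),  # the match being predicted
]

def recent_form(team, match_id, season, rows, window=3):
    """Win percentage over the team's last `window` earlier matches of the season."""
    played = [w for (m, s, t1, t2, w) in rows
              if m < match_id and s == season and team in (t1, t2)]
    last = played[-window:]
    if not last:
        return 0.0
    return 100.0 * last.count(team) / len(last)

print(recent_form("MI", 6, 2015, rows))   # won 2 of its last 3 (matches 3, 4, 5)
print(recent_form("CSK", 6, 2015, rows))  # lost its only earlier match this season
print(recent_form("MI", 2, 2015, rows))   # season opener: the 2014 result is ignored
```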

In [18]:
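#Function to implement all features
def getAllFeatures(match_id):
    generateSquadRating(match_id)
    print("\n")
    getBothTeamStats(match_id)
    print("\n")
    # The recent-form percentages are computed here, but the return value is
    # discarded, so they do not appear in the printed output
    getTeamWinPercentage(match_id)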

In [19]:
#Testing Functionality
getAllFeatures(517)


Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52


Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.


Adding New Columns for Features in Matches DataFrame


In [20]:
#Create Column for Team 1 Winning Status (1 = Won, 0 = Lost)

matches['team1Winning'] = np.where(matches['team1'] == matches['winner'], 1, 0)



In [21]:
# New Column for Difference of Average Strike rates (First Team SR - Second Team SR) 
# [Negative value means Second team is better]

firstTeamSR = []
secondTeamSR = []
for i in matches['id'].unique():
    P, Q = getAverageStrikeRates(i)
    firstTeamSR.append(P)
    secondTeamSR.append(Q)
firstSRSeries = pd.Series(firstTeamSR)
secondSRSeries = pd.Series(secondTeamSR)
matches["Avg_SR_Difference"] = firstSRSeries.values - secondSRSeries.values



In [22]:
# New Column for Difference of Wickets Per Run (First Team WPR - Second Team WPR) 
# [Negative value means Second team is better]

firstTeamWPR = []
secondTeamWPR = []
for i in matches['id'].unique():
    R, S = getAverageWPR(i)
    firstTeamWPR.append(R)
    secondTeamWPR.append(S)
firstWPRSeries = pd.Series(firstTeamWPR)
secondWPRSeries = pd.Series(secondTeamWPR)
matches["Avg_WPR_Difference"] = firstWPRSeries.values - secondWPRSeries.values



In [23]:
# New column for difference of MVP Awards 
# (Negative value means Second team is better)

firstTeamMVP = []
secondTeamMVP = []
for i in matches['id'].unique():
    T, U = compareMVPAwards(i)
    firstTeamMVP.append(T)
    secondTeamMVP.append(U)
firstMVPSeries = pd.Series(firstTeamMVP)
secondMVPSeries = pd.Series(secondTeamMVP)
matches["Total_MVP_Difference"] = firstMVPSeries.values - secondMVPSeries.values



In [24]:
# New column for Win Percentage of Team 1 in previous encounters

firstTeamWP = []
for i in matches['id'].unique():
    WP = getTeamWinPerc(i)
    firstTeamWP.append(WP)
firstWPSeries = pd.Series(firstTeamWP)
matches["Prev_Enc_Team1_WinPerc"] = firstWPSeries.values



In [25]:
# New column for Recent form (Win Percentage over the last 3 matches in the current season) of 1st Team compared to 2nd Team
# (Negative means 2nd team has higher win percentage)

firstTeamRF = []
secondTeamRF = []
for i in matches['id'].unique():
    K, L = getTeamWinPercentage(i)
    firstTeamRF.append(K)
    secondTeamRF.append(L)
firstRFSeries = pd.Series(firstTeamRF)
secondRFSeries = pd.Series(secondTeamRF)
matches["Total_RF_Difference"] = firstRFSeries.values - secondRFSeries.values



In [26]:
#Testing 
matches.tail()


Out[26]:
id season city date team1 team2 toss_winner toss_decision result dl_applied winner win_by_runs win_by_wickets player_of_match venue umpire1 umpire2 umpire3 team1Winning Avg_SR_Difference Avg_WPR_Difference Total_MVP_Difference Prev_Enc_Team1_WinPerc Total_RF_Difference
510 511 2015 Mumbai 2015-05-16 Rajasthan Royals Kolkata Knight Riders Rajasthan Royals bat normal 0 Rajasthan Royals 9 0 SR Watson Brabourne Stadium RM Deshpande RK Illingworth NaN 1 -3.303823 -0.271935 -16 50.000000 0.000000
513 514 2015 Mumbai 2015-05-19 Mumbai Indians Chennai Super Kings Mumbai Indians bat normal 0 Mumbai Indians 25 0 KA Pollard Wankhede Stadium HDPK Dharmasena RK Illingworth NaN 1 6.315981 -0.617777 -24 50.000000 0.000000
514 515 2015 Pune 2015-05-20 Royal Challengers Bangalore Rajasthan Royals Royal Challengers Bangalore bat normal 0 Royal Challengers Bangalore 71 0 AB de Villiers Maharashtra Cricket Association Stadium AK Chaudhary C Shamshuddin NaN 1 -2.200375 0.969143 5 50.000000 0.000000
515 516 2015 Ranchi 2015-05-22 Royal Challengers Bangalore Chennai Super Kings Chennai Super Kings field normal 0 Chennai Super Kings 0 3 A Nehra JSCA International Stadium Complex AK Chaudhary CB Gaffaney NaN 0 -0.521025 1.039181 -23 38.888889 33.333333
516 517 2015 Kolkata 2015-05-24 Mumbai Indians Chennai Super Kings Chennai Super Kings field normal 0 Mumbai Indians 41 0 RG Sharma Eden Gardens HDPK Dharmasena RK Illingworth NaN 1 -1.575550 -1.707931 -24 52.380952 0.000000
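
Cells 21 to 25 each loop over the match ids and attach one column through `.values`, which relies on the list order matching the row order. An alternative sketch is to collect all per-match values in one pass and join them on the index explicitly. The feature function below is a stand-in that returns dummy numbers; in the notebook it would call the functions defined earlier. `toy_matches` and `toy_features` are illustrative names.

```python
import pandas as pd

# Non-contiguous index, similar to a DataFrame left over after filtering rows.
toy_matches = pd.DataFrame({"id": [10, 11, 12]}, index=[3, 7, 9])

def toy_features(match_id):
    """Stand-in for the notebook's feature functions; returns dummy values."""
    return {"Avg_SR_Difference": match_id * 0.5,
            "Total_MVP_Difference": match_id - 11}

# One dict of features per match, aligned to the original index explicitly.
feature_rows = [toy_features(i) for i in toy_matches["id"]]
features = pd.DataFrame(feature_rows, index=toy_matches.index)
toy_matches = toy_matches.join(features)

print(toy_matches)
```

Passing `index=toy_matches.index` keeps each feature row attached to its own match even though the index has gaps.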

Visualizations for Features vs. Response


In [27]:
# Graph for Average Strike Rate Difference
matches.boxplot(column = 'Avg_SR_Difference', by='team1Winning', showfliers= False)


Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x110c920d0>

In [28]:
# Graph for Average WPR(Wickets per Run) Difference
matches.boxplot(column = 'Avg_WPR_Difference', by='team1Winning', showfliers= False)


Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x115e9a850>

In [29]:
# Graph for MVP Difference
matches.boxplot(column = 'Total_MVP_Difference', by='team1Winning', showfliers= False)


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x115ec7410>

In [30]:
#Graph for Previous encounters Win Percentage of Team #1
matches.boxplot(column = 'Prev_Enc_Team1_WinPerc', by='team1Winning', showfliers= False)


Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x115fc8810>

In [31]:
# Graph for Recent form(Win Percentage in the same season)
matches.boxplot(column = 'Total_RF_Difference', by='team1Winning', showfliers= False)


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x11607fd10>
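
Each boxplot above compares the distribution of one feature between matches that team 1 lost (0) and won (1). The same comparison can be read numerically with a grouped summary. The sketch below uses invented values and borrows only the column names from the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "team1Winning": [0, 0, 0, 1, 1, 1],
    "Avg_SR_Difference": [-4.0, -1.0, 2.0, -2.0, 3.0, 5.0],  # made-up values
})

# describe() reports count, mean, std, min, quartiles and max per outcome;
# the 25%, 50% and 75% columns are the values a boxplot draws as its box.
summary = toy.groupby("team1Winning")["Avg_SR_Difference"].describe()
print(summary[["25%", "50%", "75%"]])
```

A feature whose medians differ clearly between the two groups is a more promising predictor than one whose boxes overlap almost completely.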