Predicting the Outcome of Cricket Matches

Introduction

In this project, we build a model that predicts the outcome of Indian Premier League (IPL) cricket matches from match-level data and ball-by-ball delivery data.

Data Mining:

Seasons: 2008-2015 (8 seasons)
Teams: DD, KKR, MI, RCB, KXIP, RR, CSK (7 teams)
Exclude matches with irregular outcomes, such as No Result, Tie, or D/L Method.

Features:

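Average Batsman Rating (strike rate)
Average Bowler Rating (wickets per 100 runs conceded)
Player of the Match awards held by the squad
Previous Encounters between the two teams (head-to-head win percentage)
Recent Form (win percentage over the last 3 games played in the current season)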

Prediction Models:

K-Nearest Neighbors using sklearn
Gradient Boosting using xgboost
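
To make the K-Nearest Neighbors idea concrete, here is a minimal, dependency-free sketch of how a new match could be classified from feature differences like the ones built later in this notebook. It is a stand-in for illustration only, not sklearn's implementation; the function `knn_predict` and all feature values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical rows: (strike-rate difference, MVP difference, recent-form difference)
train_X = [
    (10.0, 2, 33.3),
    (8.0, 1, 66.7),
    (-12.0, -3, -33.3),
    (-5.0, -1, -66.7),
    (15.0, 4, 0.0),
]
# Hypothetical labels: 1 = Team 1 won, 0 = Team 1 lost
train_y = [1, 1, 0, 0, 1]

prediction = knn_predict(train_X, train_y, (9.0, 2, 30.0), k=3)
print(prediction)
```

Because the vote is distance-based, features on larger scales (here the recent-form difference) dominate unless the columns are rescaled first.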



In [1]:

    
from __future__ import division  # true division under Python 2; kept first, as a script would require
%matplotlib inline
import numpy as np  # fast numerical arrays
import matplotlib.pyplot as plt  # plotting interface, available as plt
import pandas as pd  # tabular data as DataFrames
# Configure how pandas displays tables
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

Data Mining



In [2]:

    
# Reading in the data
allmatches = pd.read_csv("../data/matches.csv")
alldeliveries = pd.read_csv("../data/deliveries.csv")
allmatches.head(10)









    Out[2]:

   id  season  city        date        team1                  team2
0  1   2008    Bangalore   2008-04-18  Kolkata Knight Riders  Royal Challengers Bangalore
1  2   2008    Chandigarh  2008-04-19  Chennai Super Kings    Kings XI Punjab
2  3   2008    Delhi       2008-04-19  Rajasthan Royals       Delhi Daredevils
3  4   2008    Mumbai      2008-04-20  Mumbai Indians         Royal Challengers Bangalore
4  5   2008    Kolkata     2008-04-20  Deccan Chargers        Kolkata Knight Riders
5  6   2008    Jaipur      2008-04-21  Kings XI Punjab        Rajasthan Royals
6  7   2008    Hyderabad   2008-04-22  Deccan Chargers        Delhi Daredevils
7  8   2008    Chennai     2008-04-23  Chennai Super Kings    Mumbai Indians
8  9   2008    Hyderabad   2008-04-24  Deccan Chargers        Rajasthan Royals
9  10  2008    Chandigarh  2008-04-25  Kings XI Punjab        Mumbai Indians

   toss_winner                  toss_decision  result  dl_applied  winner                       win_by_runs  win_by_wickets
0  Royal Challengers Bangalore  field          normal  0           Kolkata Knight Riders        140          0
1  Chennai Super Kings          bat            normal  0           Chennai Super Kings          33           0
2  Rajasthan Royals             bat            normal  0           Delhi Daredevils             0            9
3  Mumbai Indians               bat            normal  0           Royal Challengers Bangalore  0            5
4  Deccan Chargers              bat            normal  0           Kolkata Knight Riders        0            5
5  Kings XI Punjab              bat            normal  0           Rajasthan Royals             0            6
6  Deccan Chargers              bat            normal  0           Delhi Daredevils             0            9
7  Mumbai Indians               field          normal  0           Chennai Super Kings          6            0
8  Rajasthan Royals             field          normal  0           Rajasthan Royals             0            3
9  Mumbai Indians               field          normal  0           Kings XI Punjab              66           0

   player_of_match  venue                                       umpire1    umpire2         umpire3
0  BB McCullum      M Chinnaswamy Stadium                       Asad Rauf  RE Koertzen     NaN
1  MEK Hussey       Punjab Cricket Association Stadium, Mohali  MR Benson  SL Shastri      NaN
2  MF Maharoof      Feroz Shah Kotla                            Aleem Dar  GA Pratapkumar  NaN
3  MV Boucher       Wankhede Stadium                            SJ Davis   DJ Harper       NaN
4  DJ Hussey        Eden Gardens                                BF Bowden  K Hariharan     NaN
5  SR Watson        Sawai Mansingh Stadium                      Aleem Dar  RB Tiffin       NaN
6  V Sehwag         Rajiv Gandhi International Stadium, Uppal   IL Howell  AM Saheba       NaN
7  ML Hayden        MA Chidambaram Stadium, Chepauk             DJ Harper  GA Pratapkumar  NaN
8  YK Pathan        Rajiv Gandhi International Stadium, Uppal   Asad Rauf  MR Benson       NaN
9  KC Sangakkara    Punjab Cricket Association Stadium, Mohali  Aleem Dar  AM Saheba       NaN



In [3]:

    
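# Select seasons 2008-2015 (everything except 2016)
matches_seasons = allmatches.loc[allmatches['season'] != 2016]
# Keep the deliveries of the selected matches (instead of relying on the id cutoff 518)
deliveries_seasons = alldeliveries.loc[alldeliveries['match_id'].isin(matches_seasons['id'])]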



In [4]:

    
# Select matches in which both sides are among DD, KKR, MI, RCB, KXIP, RR, CSK
selected_teams = ['Kolkata Knight Riders', 'Royal Challengers Bangalore', 'Delhi Daredevils',
                  'Chennai Super Kings', 'Rajasthan Royals', 'Mumbai Indians', 'Kings XI Punjab']
matches_teams = matches_seasons.loc[matches_seasons['team1'].isin(selected_teams) &
                                    matches_seasons['team2'].isin(selected_teams)]
matches_team_matchids = matches_teams.id.unique()
deliveries_teams = deliveries_seasons.loc[deliveries_seasons['match_id'].isin(matches_team_matchids)]
print("Teams selected:\n")
for team in matches_teams.team1.unique():
    print(team)









    



Teams selected:

Kolkata Knight Riders
Chennai Super Kings
Rajasthan Royals
Mumbai Indians
Kings XI Punjab
Royal Challengers Bangalore
Delhi Daredevils



In [5]:

    
# Exclude matches with irregular outcomes (any result other than 'normal', or D/L applied)
matches = matches_teams.loc[(matches_teams['result'] == 'normal') & (matches_teams['dl_applied'] == 0)]
matches_matchids = matches.id.unique()
deliveries = deliveries_teams.loc[deliveries_teams['match_id'].isin(matches_matchids)]
# Verify that both datasets cover the same match ids, in the same order
# (np.array_equal returns False, rather than failing, if the lengths differ)
np.array_equal(matches.id.unique(), deliveries.match_id.unique())









    Out[5]:





True
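
The filtering and the consistency check above can be sketched on plain Python data as follows. This is an illustration on made-up records, assuming only the column names seen in the notebook (`team1`, `team2`, `result`, `dl_applied`, `match_id`); the helper `is_clean` and the team set are hypothetical.

```python
# Hypothetical subset of teams to keep
TEAMS = {"MI", "CSK", "KKR"}

matches = [
    {"id": 1, "team1": "MI", "team2": "CSK", "result": "normal", "dl_applied": 0},
    {"id": 2, "team1": "MI", "team2": "DC", "result": "normal", "dl_applied": 0},
    {"id": 3, "team1": "KKR", "team2": "CSK", "result": "tie", "dl_applied": 0},
    {"id": 4, "team1": "KKR", "team2": "MI", "result": "normal", "dl_applied": 1},
    {"id": 5, "team1": "CSK", "team2": "KKR", "result": "normal", "dl_applied": 0},
]
# Two made-up deliveries per match
deliveries = [{"match_id": i, "ball": b} for i in (1, 2, 3, 4, 5) for b in (1, 2)]

def is_clean(match):
    """Both teams selected, a normal result, and no D/L adjustment."""
    return (
        match["team1"] in TEAMS
        and match["team2"] in TEAMS
        and match["result"] == "normal"
        and match["dl_applied"] == 0
    )

kept_matches = [m for m in matches if is_clean(m)]
kept_ids = {m["id"] for m in kept_matches}
kept_deliveries = [d for d in deliveries if d["match_id"] in kept_ids]

# Order-insensitive consistency check between the two filtered datasets
consistent = kept_ids == {d["match_id"] for d in kept_deliveries}
print(sorted(kept_ids), consistent)
```

Comparing sets checks that the same matches appear in both datasets without depending on row order, which an element-wise array comparison does.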

Building Features



In [6]:

    
# Batsman Strike Rate Calculation (first 5 batsmen to bat in each innings)
# Team 1: Batting First; Team 2: Fielding First
# Earlier deliveries are selected with match_id < current id, which assumes
# that match ids increase chronologically.

def getMatchDeliveriesDF(match_id):
    return deliveries.loc[deliveries['match_id'] == match_id]

def getInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()[0:5]

def getInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()[0:5]

def getBatsmanStrikeRate(batsman, match_id):
    # Strike rate before this match: runs per 100 deliveries faced.
    # Every recorded delivery to the batsman counts as a ball faced (extras are not excluded).
    onstrikedeliveries = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['batsman'] == batsman)]
    total_runs = onstrikedeliveries['batsman_runs'].sum()
    total_balls = onstrikedeliveries.shape[0]
    if total_balls != 0:
        return (total_runs/total_balls) * 100
    else:
        return None

def getTeamStrikeRate(batsmen, match_id):
    strike_rates = []
    for batsman in batsmen:
        bsr = getBatsmanStrikeRate(batsman, match_id)
        if bsr is not None:
            strike_rates.append(bsr)
    # np.mean of an empty list is NaN (with a RuntimeWarning); this happens
    # when none of the batsmen has an earlier delivery on record
    return np.mean(strike_rates)

def getAverageStrikeRates(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getInningsTwoBatsmen(match_deliveries)
    teamOneSR = getTeamStrikeRate(innOneBatsmen, match_id)
    teamTwoSR = getTeamStrikeRate(innTwoBatsmen, match_id)
    return teamOneSR, teamTwoSR



In [7]:

    
# testing functionality
getAverageStrikeRates(517)









    Out[7]:





(126.98024523159935, 128.55579510411653)
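
As a rough illustration of the strike-rate feature, the sketch below reproduces the same arithmetic on plain Python lists. The names `strike_rate`, `team_strike_rate`, and the `history` dictionary are hypothetical and not part of the notebook, and the data is made up.

```python
def strike_rate(runs_per_ball):
    """Runs per 100 balls faced; None when the batsman has no earlier deliveries."""
    if not runs_per_ball:
        return None
    return 100.0 * sum(runs_per_ball) / len(runs_per_ball)

def team_strike_rate(history, batsmen):
    """Mean strike rate over the batsmen that have a history; NaN if none do."""
    rates = [strike_rate(history.get(name, [])) for name in batsmen]
    rates = [rate for rate in rates if rate is not None]
    return sum(rates) / len(rates) if rates else float("nan")

# Hypothetical prior deliveries: batsman -> runs scored off each ball faced
history = {"A": [1, 4, 0, 6, 1], "B": [0, 0, 2, 1]}

rate_a = strike_rate(history["A"])                      # 12 runs off 5 balls
team_rate = team_strike_rate(history, ["A", "B", "C"])  # "C" has no history and is skipped
print(rate_a, team_rate)
```

Returning NaN explicitly for a team with no history gives the same missing value as averaging an empty list, without the runtime warning.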



In [8]:

    
# Bowler Rating: wickets per 100 runs conceded (higher is better)
# Team 1: Batting First; Team 2: Fielding First

def getInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()[0:4]

def getInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()[0:4]

def getBowlerWPR(bowler, match_id):
    balls = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['bowler'] == bowler)]
    total_runs = balls['total_runs'].sum()
    # Count only the dismissal kinds listed here (run outs, for example, are not counted)
    total_wickets = balls.loc[balls['dismissal_kind'].isin(['caught', 'bowled', 'lbw',
                                                           'caught and bowled', 'stumped'])].shape[0]
    # Guard on runs conceded: a bowler with earlier deliveries but zero runs
    # would otherwise cause a division by zero
    if total_runs > 0:
        return (total_wickets/total_runs) * 100
    else:
        return None

def getTeamWPR(bowlers, match_id):
    WPRs = []
    for bowler in bowlers:
        bwpr = getBowlerWPR(bowler, match_id)
        if bwpr is not None:
            WPRs.append(bwpr)
    return np.mean(WPRs)

def getAverageWPR(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBowlers = getInningsOneBowlers(match_deliveries)
    innTwoBowlers = getInningsTwoBowlers(match_deliveries)
    # Team 1 bowls in innings 2 and Team 2 bowls in innings 1
    teamOneWPR = getTeamWPR(innTwoBowlers, match_id)
    teamTwoWPR = getTeamWPR(innOneBowlers, match_id)
    return teamOneWPR, teamTwoWPR



In [9]:

    
# testing functionality
getAverageWPR(517)









    Out[9]:





(2.7641806594085776, 4.4721111768026631)
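
The bowler rating can be illustrated the same way. The sketch below assumes each earlier delivery is reduced to a pair of runs conceded and dismissal kind; the function name `wickets_per_100_runs` and the sample deliveries are hypothetical, while the set of counted dismissals mirrors the list used in the cell above.

```python
# Dismissal kinds counted for the bowler, as in the notebook's list
BOWLER_DISMISSALS = {"caught", "bowled", "lbw", "caught and bowled", "stumped"}

def wickets_per_100_runs(balls):
    """balls: list of (total_runs, dismissal_kind or None) for one bowler.

    Returns None when no runs have been conceded, since the ratio is undefined.
    """
    runs = sum(run for run, _ in balls)
    wickets = sum(1 for _, kind in balls if kind in BOWLER_DISMISSALS)
    if runs == 0:
        return None
    return 100.0 * wickets / runs

# Hypothetical earlier deliveries for one bowler
balls = [
    (1, None),
    (4, None),
    (0, "bowled"),
    (0, "run out"),   # not counted for the bowler
    (2, None),
    (1, "caught"),
]
rating = wickets_per_100_runs(balls)  # 2 wickets for 8 runs
print(rating)
```

Checking the run total rather than the number of deliveries covers both a bowler with no history and one who has not yet conceded a run.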



In [10]:

    
# MVP Score (Total number of Player of the Match awards in a squad)
# Team 1: Batting First; Team 2: Fielding First

def getAllInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()

def getAllInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()

def getAllInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()

def getAllInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()

def makeSquad(batsmen, bowlers):
    # Batsmen first, then any bowlers who did not also bat
    squad = list(batsmen)
    squad.extend(b for b in bowlers if b not in batsmen)
    return squad

def getPlayerMVPAwards(player, match_id):
    return matches.loc[(matches['player_of_match'] == player) & (matches['id'] < match_id)].shape[0]

def getTeamMVPAwards(squad, match_id):
    num_awards = 0
    for player in squad:
        num_awards += getPlayerMVPAwards(player, match_id)
    return num_awards

def compareMVPAwards(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getAllInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getAllInningsTwoBatsmen(match_deliveries)
    innOneBowlers = getAllInningsOneBowlers(match_deliveries)
    innTwoBowlers = getAllInningsTwoBowlers(match_deliveries)
    teamOneSquad = makeSquad(innOneBatsmen, innTwoBowlers)
    teamTwoSquad = makeSquad(innTwoBatsmen, innOneBowlers)
    teamOneAwards = getTeamMVPAwards(teamOneSquad, match_id)
    teamTwoAwards = getTeamMVPAwards(teamTwoSquad, match_id)
    return teamOneAwards, teamTwoAwards



In [11]:

    
compareMVPAwards(517)









    Out[11]:





(28, 52)
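
A compact way to see how the squad and its award count are assembled is the sketch below. It uses made-up player names and award counts; `make_squad`, `team_mvp_awards`, and `awards_before_match` are hypothetical names for this illustration.

```python
def make_squad(batsmen, bowlers):
    """Batsmen followed by bowlers, with duplicates removed and order preserved."""
    return list(dict.fromkeys(list(batsmen) + list(bowlers)))

def team_mvp_awards(squad, awards_before_match):
    """Total Player of the Match awards won earlier by the squad members."""
    return sum(awards_before_match.get(player, 0) for player in squad)

# Hypothetical players: "C" both bats and bowls, so it must be counted once
batsmen = ["A", "B", "C"]
bowlers = ["C", "D"]
# Hypothetical award tallies from earlier matches; "Z" is not in this squad
awards_before_match = {"A": 2, "C": 1, "D": 3, "Z": 5}

squad = make_squad(batsmen, bowlers)
total_awards = team_mvp_awards(squad, awards_before_match)
print(squad, total_awards)
```

Note that a squad built this way contains only players who batted or bowled in the match, so it can be smaller than the full eleven.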



In [12]:

    
# Prints a comparison between two teams based on squad attributes
def generateSquadRating(match_id):
    gameday_teams = deliveries.loc[(deliveries['match_id'] == match_id)].batting_team.unique()
    teamOne = gameday_teams[0]
    teamTwo = gameday_teams[1]
    teamOneSR, teamTwoSR = getAverageStrikeRates(match_id)
    teamOneWPR, teamTwoWPR = getAverageWPR(match_id)
    teamOneMVPs, teamTwoMVPs = compareMVPAwards(match_id)
    # Single-argument print(...) calls behave the same under Python 2 and Python 3
    print("Comparing squads for " + teamOne + " vs " + teamTwo)
    print("\nAverage Strike Rate for Batsmen in " + str(teamOne) + " : " + str(teamOneSR))
    print("\nAverage Strike Rate for Batsmen in " + str(teamTwo) + " : " + str(teamTwoSR))
    print("\nBowler Rating for " + str(teamOne) + " : " + str(teamOneWPR))
    print("\nBowler Rating for " + str(teamTwo) + " : " + str(teamTwoWPR))
    print("\nNumber of MVP Awards in " + str(teamOne) + " : " + str(teamOneMVPs))
    print("\nNumber of MVP Awards in " + str(teamTwo) + " : " + str(teamTwoMVPs))



In [13]:

    
generateSquadRating(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating for Mumbai Indians : 2.76418065941

Bowler Rating for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52



In [14]:

    
# Previous Encounters (all earlier matches between the two teams)
# Win % for Team 1 against Team 2

def getTeam1(match_id):
    return matches.loc[matches["id"] == match_id].team1.unique()

def getTeam2(match_id):
    return matches.loc[matches["id"] == match_id].team2.unique()

def getPreviousEncDF(match_id):
    team1 = getTeam1(match_id)
    team2 = getTeam2(match_id)
    # Earlier matches between the same two teams, in either order
    same_order = matches["team1"].isin(team1) & matches["team2"].isin(team2)
    swapped = matches["team1"].isin(team2) & matches["team2"].isin(team1)
    return matches.loc[(matches["id"] < match_id) & (same_order | swapped)]

def getTeamWBR(match_id, team):
    # Total winning margin, in runs, over the team's wins in previous encounters
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    return winnerDF['win_by_runs'].sum()

def getTeamWBW(match_id, team):
    # Total winning margin, in wickets, over the team's wins in previous encounters
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    return winnerDF['win_by_wickets'].sum()

def getTeamWinPerc(match_id):
    dF = getPreviousEncDF(match_id)
    timesPlayed = dF.shape[0]
    team1 = getTeam1(match_id)[0]
    timesWon = dF.loc[dF["winner"] == team1].shape[0]
    if timesPlayed != 0:
        winPerc = (timesWon/timesPlayed) * 100
    else:
        # No previous encounter: the win percentage defaults to 0
        winPerc = 0
    return winPerc

def getBothTeamStats(match_id):
    DF = getPreviousEncDF(match_id)
    team1 = getTeam1(match_id)[0]
    team2 = getTeam2(match_id)[0]
    timesPlayed = DF.shape[0]
    timesWon = DF.loc[DF["winner"] == team1].shape[0]
    WBRTeam1 = getTeamWBR(match_id, team1)
    WBRTeam2 = getTeamWBR(match_id, team2)
    WBWTeam1 = getTeamWBW(match_id, team1)
    WBWTeam2 = getTeamWBW(match_id, team2)

    print("Out of {} times in the past {} have won {} times({}%) from {}".format(timesPlayed, team1, timesWon, getTeamWinPerc(match_id), team2))
    print("{} won by {} total runs and {} total wickets.".format(team1, WBRTeam1, WBWTeam1))
    print("{} won by {} total runs and {} total wickets.".format(team2, WBRTeam2, WBWTeam2))



In [15]:

    
#Testing functionality 
getBothTeamStats(517)









    



Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.
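
The head-to-head percentage can be checked by hand with a small sketch. The function `head_to_head_win_pct` and the tuple layout are hypothetical; the example recreates the 11 wins in 21 meetings reported above, with the order of the individual results made up.

```python
def head_to_head_win_pct(past_matches, team1, team2):
    """past_matches: iterable of (team_a, team_b, winner) for earlier fixtures.

    Returns team1's win percentage against team2, or 0.0 if they never met.
    """
    meetings = [m for m in past_matches if {m[0], m[1]} == {team1, team2}]
    if not meetings:
        return 0.0
    wins = sum(1 for m in meetings if m[2] == team1)
    return 100.0 * wins / len(meetings)

MI, CSK, KKR = "Mumbai Indians", "Chennai Super Kings", "Kolkata Knight Riders"

# 21 meetings between MI and CSK (11 won by MI), plus one unrelated fixture
past_matches = (
    [(MI, CSK, MI)] * 11
    + [(CSK, MI, CSK)] * 10
    + [(MI, KKR, KKR)]
)

win_pct = head_to_head_win_pct(past_matches, MI, CSK)
print(round(win_pct, 2))
```

As in the notebook, a pair with no earlier meeting gets 0, which a model cannot tell apart from a team that lost every previous encounter.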



In [16]:

    
# Recent Form (win percentage over a team's 3 previous matches in the same season)
# Higher is better

def getMatchYear(match_id):
    return matches.loc[matches["id"] == match_id].season.unique()

def getTeam1DF(match_id, year):
    team1 = getTeam1(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team1)) | (matches["team2"].isin(team1)))].tail(3)

def getTeam2DF(match_id, year):
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team2)) | (matches["team2"].isin(team2)))].tail(3)

def getTeamWinPercentage(match_id):
    year = int(getMatchYear(match_id)[0])
    team1 = getTeam1(match_id)[0]
    team2 = getTeam2(match_id)[0]
    team1DF = getTeam1DF(match_id, year)
    team2DF = getTeam2DF(match_id, year)
    team1TotalMatches = team1DF.shape[0]
    team1WinMatches = team1DF.loc[team1DF["winner"] == team1].shape[0]
    team2TotalMatches = team2DF.shape[0]
    team2WinMatches = team2DF.loc[team2DF["winner"] == team2].shape[0]
    # A team with no earlier match in the season gets a form of 0
    winPercTeam1 = (team1WinMatches / team1TotalMatches) * 100 if team1TotalMatches != 0 else 0
    winPercTeam2 = (team2WinMatches / team2TotalMatches) * 100 if team2TotalMatches != 0 else 0
    return winPercTeam1, winPercTeam2



In [17]:

    
getTeamWinPercentage(517)









    Out[17]:





(66.66666666666666, 66.66666666666666)
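
The recent-form calculation amounts to a win percentage over a team's last three results of the season. The sketch below shows that on a made-up chronological list of results; `recent_form`, the tuple layout, and the window parameter `n` are hypothetical names for this illustration.

```python
def recent_form(season_results, team, n=3):
    """season_results: chronological (team_a, team_b, winner) tuples from
    earlier in the same season. Returns the win percentage over the team's
    last n matches, or 0.0 if it has not played yet."""
    played = [m for m in season_results if team in (m[0], m[1])][-n:]
    if not played:
        return 0.0
    wins = sum(1 for m in played if m[2] == team)
    return 100.0 * wins / len(played)

# Hypothetical results earlier in one season, oldest first
season_results = [
    ("A", "B", "A"),
    ("A", "C", "C"),
    ("B", "C", "B"),
    ("A", "B", "A"),
    ("A", "C", "A"),
]

form_a = recent_form(season_results, "A")   # last three: loss, win, win
form_b = recent_form(season_results, "B")   # three matches played: loss, win, loss
form_difference = form_a - form_b
print(round(form_a, 2), round(form_b, 2), round(form_difference, 2))
```

Early in a season a team may have fewer than three earlier matches, in which case the percentage is taken over whatever is available.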



In [18]:

    
# Print all features for a match
def getAllFeatures(match_id):
    generateSquadRating(match_id)
    print("\n")
    getBothTeamStats(match_id)
    print("\n")
    # Recent form is computed here, but its return value is neither printed nor returned
    getTeamWinPercentage(match_id)



In [19]:

    
#Testing Functionality
getAllFeatures(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating for Mumbai Indians : 2.76418065941

Bowler Rating for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52


Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.

Adding New Columns for Features in Matches DataFrame



In [20]:

    
# New column: difference of average strike rates (first team SR - second team SR)
# [A negative value means the second team is better]

firstTeamSR = []
secondTeamSR = []
for i in matches['id'].unique():
    P, Q = getAverageStrikeRates(i)
    firstTeamSR.append(P)
    secondTeamSR.append(Q)
# Element-wise difference, assigned in match order
matches["Avg_SR_Difference"] = np.array(firstTeamSR) - np.array(secondTeamSR)









    



/home/soham/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/soham/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/soham/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
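
The SettingWithCopyWarning above is raised because `matches` was created by filtering another DataFrame and is then assigned a new column. One common way to avoid it, shown here as a sketch on a made-up frame rather than the notebook's data, is to take an explicit `.copy()` after filtering. The frame `all_matches`, the name `clean`, and the per-match strike rates are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the matches table
all_matches = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "result": ["normal", "tie", "normal", "normal"],
    "team1": ["A", "B", "C", "A"],
    "winner": ["A", "B", "A", "B"],
})

# .copy() makes the filtered frame independent of all_matches, so adding
# columns to it is unambiguous
clean = all_matches.loc[all_matches["result"] == "normal"].copy()

# Hypothetical per-team strike rates keyed by match id
first_sr = {1: 120.0, 3: 131.5, 4: 118.0}
second_sr = {1: 110.0, 3: 140.0, 4: 118.0}

# Map each id to its values, then subtract column-wise
clean["Avg_SR_Difference"] = clean["id"].map(first_sr) - clean["id"].map(second_sr)
print(clean[["id", "Avg_SR_Difference"]])
```

Mapping by match id also ties each value to its row explicitly, instead of relying on the feature lists being in the same order as the rows.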



In [21]:

    
# New column: difference of bowler ratings (first team WPR - second team WPR)
# [A negative value means the second team is better]

firstTeamWPR = []
secondTeamWPR = []
for i in matches['id'].unique():
    R, S = getAverageWPR(i)
    firstTeamWPR.append(R)
    secondTeamWPR.append(S)
# Element-wise difference, assigned in match order
matches["Avg_WPR_Difference"] = np.array(firstTeamWPR) - np.array(secondTeamWPR)









    






In [22]:

    
# New column: difference of MVP awards (first team - second team)
# [A negative value means the second team is better]

firstTeamMVP = []
secondTeamMVP = []
for i in matches['id'].unique():
    T, U = compareMVPAwards(i)
    firstTeamMVP.append(T)
    secondTeamMVP.append(U)
# Element-wise difference, assigned in match order
matches["Total_MVP_Difference"] = np.array(firstTeamMVP) - np.array(secondTeamMVP)









    






In [23]:

    
# New column for Win Percentage of Team 1 in previous encounters

firstTeamWP = []
for i in matches['id'].unique():
    WP = getTeamWinPerc(i)
    firstTeamWP.append(WP)
firstWPSeries = pd.Series(firstTeamWP)
matches["Prev_Enc_Team1_WinPerc"] = firstWPSeries.values









    






In [24]:

    
# New column: recent form (win % over the last 3 matches of the season), first team - second team
# [A negative value means the second team has the higher win percentage]

firstTeamRF = []
secondTeamRF = []
for i in matches['id'].unique():
    K, L = getTeamWinPercentage(i)
    firstTeamRF.append(K)
    secondTeamRF.append(L)
# Element-wise difference, assigned in match order
matches["Total_RF_Difference"] = np.array(firstTeamRF) - np.array(secondTeamRF)









    






In [25]:

    
#Create Column for Team 1 Winning Status (1 = Won, 0 = Lost)

matches['team1Winning'] = np.where(matches['team1'] == matches['winner'], 1, 0)









    






In [26]:

    
#Testing 
matches









    Out[26]:







  
    
      
      id
      season
      city
      date
      team1
      team2
      toss_winner
      toss_decision
      result
      dl_applied
      winner
      win_by_runs
      win_by_wickets
      player_of_match
      venue
      umpire1
      umpire2
      umpire3
      Avg_SR_Difference
      Avg_WPR_Difference
      Total_MVP_Difference
      Prev_Enc_Team1_WinPerc
      Total_RF_Difference
      team1Winning
    
  
  
    
      0
      1
      2008
      Bangalore
      2008-04-18
      Kolkata Knight Riders
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Kolkata Knight Riders
      140
      0
      BB McCullum
      M Chinnaswamy Stadium
      Asad Rauf
      RE Koertzen
      NaN
      NaN
      NaN
      0
      0.000000
      0.000000
      1
    
    
      1
      2
      2008
      Chandigarh
      2008-04-19
      Chennai Super Kings
      Kings XI Punjab
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      33
      0
      MEK Hussey
      Punjab Cricket Association Stadium, Mohali
      MR Benson
      SL Shastri
      NaN
      NaN
      NaN
      0
      0.000000
      0.000000
      1
    
    
      2
      3
      2008
      Delhi
      2008-04-19
      Rajasthan Royals
      Delhi Daredevils
      Rajasthan Royals
      bat
      normal
      0
      Delhi Daredevils
      0
      9
      MF Maharoof
      Feroz Shah Kotla
      Aleem Dar
      GA Pratapkumar
      NaN
      NaN
      NaN
      0
      0.000000
      0.000000
      0
    
    
      3
      4
      2008
      Mumbai
      2008-04-20
      Mumbai Indians
      Royal Challengers Bangalore
      Mumbai Indians
      bat
      normal
      0
      Royal Challengers Bangalore
      0
      5
      MV Boucher
      Wankhede Stadium
      SJ Davis
      DJ Harper
      NaN
      NaN
      NaN
      0
      0.000000
      0.000000
      0
    
    
      5
      6
      2008
      Jaipur
      2008-04-21
      Kings XI Punjab
      Rajasthan Royals
      Kings XI Punjab
      bat
      normal
      0
      Rajasthan Royals
      0
      6
      SR Watson
      Sawai Mansingh Stadium
      Aleem Dar
      RB Tiffin
      NaN
      55.665975
      1.414786
      0
      0.000000
      0.000000
      0
    
    
      7
      8
      2008
      Chennai
      2008-04-23
      Chennai Super Kings
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Chennai Super Kings
      6
      0
      ML Hayden
      MA Chidambaram Stadium, Chepauk
      DJ Harper
      GA Pratapkumar
      NaN
      6.135734
      -1.591368
      1
      0.000000
      100.000000
      1
    
    
      9
      10
      2008
      Chandigarh
      2008-04-25
      Kings XI Punjab
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Kings XI Punjab
      66
      0
      KC Sangakkara
      Punjab Cricket Association Stadium, Mohali
      Aleem Dar
      AM Saheba
      NaN
      4.666844
      0.111379
      0
      0.000000
      0.000000
      1
    
    
      10
      11
      2008
      Bangalore
      2008-04-26
      Royal Challengers Bangalore
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Rajasthan Royals
      0
      7
      SR Watson
      M Chinnaswamy Stadium
      MR Benson
      IL Howell
      NaN
      25.388743
      -0.021123
      0
      0.000000
      0.000000
      0
    
    
      11
      12
      2008
      Chennai
      2008-04-26
      Kolkata Knight Riders
      Chennai Super Kings
      Kolkata Knight Riders
      bat
      normal
      0
      Chennai Super Kings
      0
      9
      JDP Oram
      MA Chidambaram Stadium, Chepauk
      BF Bowden
      AV Jayaprakash
      NaN
      -28.438618
      11.723738
      0
      0.000000
      0.000000
      0
    
    
      13
      14
      2008
      Chandigarh
      2008-04-27
      Delhi Daredevils
      Kings XI Punjab
      Delhi Daredevils
      bat
      normal
      0
      Kings XI Punjab
      0
      4
      SM Katich
      Punjab Cricket Association Stadium, Mohali
      RE Koertzen
      I Shivram
      NaN
      41.221731
      6.066625
      0
      0.000000
      66.666667
      0
    
    
      14
      15
      2008
      Bangalore
      2008-04-28
      Chennai Super Kings
      Royal Challengers Bangalore
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      13
      0
      MS Dhoni
      M Chinnaswamy Stadium
      BR Doctrove
      RB Tiffin
      NaN
      37.233069
      0.581470
      2
      0.000000
      66.666667
      1
    
    
      15
      16
      2008
      Kolkata
      2008-04-29
      Kolkata Knight Riders
      Mumbai Indians
      Kolkata Knight Riders
      bat
      normal
      0
      Mumbai Indians
      0
      7
      ST Jayasuriya
      Eden Gardens
      BF Bowden
      AV Jayaprakash
      NaN
      -13.582248
      1.010938
      1
      0.000000
      50.000000
      0
    
    
      16
      17
      2008
      Delhi
      2008-04-30
      Delhi Daredevils
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Delhi Daredevils
      10
      0
      GD McGrath
      Feroz Shah Kotla
      Aleem Dar
      I Shivram
      NaN
      15.293648
      2.058102
      -1
      0.000000
      16.666667
      1
    
    
      18
      19
      2008
      Jaipur
      2008-05-01
      Rajasthan Royals
      Kolkata Knight Riders
      Rajasthan Royals
      bat
      normal
      0
      Rajasthan Royals
      45
      0
      SA Asnodkar
      Sawai Mansingh Stadium
      RE Koertzen
      GA Pratapkumar
      NaN
      40.069300
      -2.720529
      2
      0.000000
      33.333333
      1
    
    
      19
      20
      2008
      Chennai
      2008-05-02
      Chennai Super Kings
      Delhi Daredevils
      Chennai Super Kings
      bat
      normal
      0
      Delhi Daredevils
      0
      8
      V Sehwag
      MA Chidambaram Stadium, Chepauk
      BF Bowden
      K Hariharan
      NaN
      -6.529304
      0.155223
      0
      0.000000
      33.333333
      0
    
    
      21
      22
      2008
      Chandigarh
      2008-05-03
      Kings XI Punjab
      Kolkata Knight Riders
      Kings XI Punjab
      bat
      normal
      0
      Kings XI Punjab
      9
      0
      IK Pathan
      Punjab Cricket Association Stadium, Mohali
      DJ Harper
      I Shivram
      NaN
      60.276090
      1.503388
      1
      0.000000
      66.666667
      1
    
    
      22
      23
      2008
      Mumbai
      2008-05-04
      Mumbai Indians
      Delhi Daredevils
      Delhi Daredevils
      field
      normal
      0
      Mumbai Indians
      29
      0
      SM Pollock
      Dr DY Patil Sports Academy
      IL Howell
      RE Koertzen
      NaN
      -36.759577
      -0.140660
      -1
      0.000000
      -33.333333
      1
    
    
      23
      24
      2008
      Jaipur
      2008-05-04
      Chennai Super Kings
      Rajasthan Royals
      Chennai Super Kings
      bat
      normal
      0
      Rajasthan Royals
      0
      8
      Sohail Tanvir
      Sawai Mansingh Stadium
      Asad Rauf
      AV Jayaprakash
      NaN
      -3.740886
      -1.545548
      -2
      0.000000
      -33.333333
      0
    
    
      24
      25
      2008
      Bangalore
      2008-05-05
      Royal Challengers Bangalore
      Kings XI Punjab
      Kings XI Punjab
      field
      normal
      0
      Kings XI Punjab
      0
      6
      S Sreesanth
      M Chinnaswamy Stadium
      SJ Davis
      BR Doctrove
      NaN
      -29.919482
      -1.732469
      -1
      0.000000
      -100.000000
      0
    
    
      26
      27
      2008
      Mumbai
      2008-05-07
      Rajasthan Royals
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Mumbai Indians
      0
      7
      A Nehra
      Dr DY Patil Sports Academy
      DJ Harper
      RE Koertzen
      NaN
      -1.150869
      1.681456
      2
      0.000000
      33.333333
      0
    
    
      27
      28
      2008
      Delhi
      2008-05-08
      Delhi Daredevils
      Chennai Super Kings
      Chennai Super Kings
      field
      normal
      0
      Chennai Super Kings
      0
      4
      MS Dhoni
      Feroz Shah Kotla
      Aleem Dar
      RB Tiffin
      NaN
      4.157345
      0.525677
      1
      100.000000
      33.333333
      0
    
    
      28
      29
      2008
      Kolkata
      2008-05-08
      Kolkata Knight Riders
      Royal Challengers Bangalore
      Kolkata Knight Riders
      bat
      normal
      0
      Kolkata Knight Riders
      5
      0
      SC Ganguly
      Eden Gardens
      Asad Rauf
      IL Howell
      NaN
      -11.720957
      2.154708
      -1
      100.000000
      0.000000
      1
    
    
      30
      31
      2008
      Bangalore
      2008-05-28
      Royal Challengers Bangalore
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Mumbai Indians
      0
      9
      CRD Fernando
      M Chinnaswamy Stadium
      BF Bowden
      AV Jayaprakash
      NaN
      -12.795080
      -3.123743
      -2
      100.000000
      -100.000000
      0
    
    
      31
      32
      2008
      Chennai
      2008-05-10
      Chennai Super Kings
      Kings XI Punjab
      Kings XI Punjab
      field
      normal
      0
      Chennai Super Kings
      18
      0
      L Balaji
      MA Chidambaram Stadium, Chepauk
      AV Jayaprakash
      BG Jerling
      NaN
      15.090854
      -2.420466
      0
      100.000000
      -66.666667
      1
    
    
      33
      34
      2008
      Jaipur
      2008-05-11
      Delhi Daredevils
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Rajasthan Royals
      0
      3
      SR Watson
      Sawai Mansingh Stadium
      SJ Davis
      RE Koertzen
      NaN
      16.965741
      0.337565
      1
      100.000000
      -33.333333
      0
    
    
      34
      35
      2008
      Chandigarh
      2008-05-12
      Royal Challengers Bangalore
      Kings XI Punjab
      Royal Challengers Bangalore
      bat
      normal
      0
      Kings XI Punjab
      0
      9
      SE Marsh
      Punjab Cricket Association Stadium, Mohali
      BR Doctrove
      I Shivram
      NaN
      -39.254979
      -1.533140
      -1
      0.000000
      -66.666667
      0
    
    
      35
      36
      2008
      Kolkata
      2008-05-13
      Kolkata Knight Riders
      Delhi Daredevils
      Kolkata Knight Riders
      bat
      normal
      0
      Kolkata Knight Riders
      23
      0
      Shoaib Akhtar
      Eden Gardens
      Asad Rauf
      IL Howell
      NaN
      -30.450448
      -0.062459
      -2
      0.000000
      33.333333
      1
    
    
      36
      37
      2008
      Mumbai
      2008-05-14
      Chennai Super Kings
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Mumbai Indians
      0
      9
      ST Jayasuriya
      Wankhede Stadium
      BR Doctrove
      AM Saheba
      NaN
      9.397768
      1.087380
      0
      100.000000
      -33.333333
      0
    
    
      37
      38
      2008
      Chandigarh
      2008-05-28
      Kings XI Punjab
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Kings XI Punjab
      41
      0
      SE Marsh
      Punjab Cricket Association Stadium, Mohali
      SJ Davis
      K Hariharan
      NaN
      5.965139
      0.737928
      0
      0.000000
      0.000000
      1
    
    
      39
      40
      2008
      Mumbai
      2008-05-16
      Kolkata Knight Riders
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Mumbai Indians
      0
      8
      SM Pollock
      Wankhede Stadium
      BR Doctrove
      DJ Harper
      NaN
      -20.297425
      10.328566
      -2
      0.000000
      -33.333333
      0
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      474
      475
      2015
      Bangalore
      2015-04-19
      Mumbai Indians
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Mumbai Indians
      18
      0
      Harbhajan Singh
      M Chinnaswamy Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      -6.548623
      0.610484
      -4
      53.333333
      -100.000000
      1
    
    
      475
      476
      2015
      Delhi
      2015-04-20
      Delhi Daredevils
      Kolkata Knight Riders
      Kolkata Knight Riders
      field
      normal
      0
      Kolkata Knight Riders
      0
      6
      UT Yadav
      Feroz Shah Kotla
      SD Fry
      CB Gaffaney
      NaN
      -11.068650
      -0.136905
      -22
      46.153846
      -33.333333
      0
    
    
      478
      479
      2015
      Bangalore
      2015-04-22
      Chennai Super Kings
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Chennai Super Kings
      27
      0
      SK Raina
      M Chinnaswamy Stadium
      JD Cloete
      C Shamshuddin
      NaN
      7.048118
      -1.246590
      22
      56.250000
      -16.666667
      1
    
    
      479
      480
      2015
      Delhi
      2015-04-23
      Delhi Daredevils
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Delhi Daredevils
      37
      0
      SS Iyer
      Feroz Shah Kotla
      SD Fry
      CK Nandan
      NaN
      -0.596885
      0.020515
      -11
      50.000000
      0.000000
      1
    
    
      480
      481
      2015
      Ahmedabad
      2015-04-24
      Rajasthan Royals
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Royal Challengers Bangalore
      0
      9
      MA Starc
      Sardar Patel Stadium, Motera
      M Erasmus
      S Ravi
      NaN
      17.723755
      0.354974
      3
      53.846154
      66.666667
      0
    
    
      482
      483
      2015
      Chennai
      2015-04-25
      Chennai Super Kings
      Kings XI Punjab
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      97
      0
      BB McCullum
      MA Chidambaram Stadium, Chepauk
      JD Cloete
      C Shamshuddin
      NaN
      -5.061380
      -1.212153
      7
      53.846154
      33.333333
      1
    
    
      483
      484
      2015
      Delhi
      2015-04-26
      Delhi Daredevils
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Royal Challengers Bangalore
      0
      10
      VR Aaron
      Feroz Shah Kotla
      M Erasmus
      S Ravi
      NaN
      -5.004302
      0.906597
      -4
      41.666667
      33.333333
      0
    
    
      485
      486
      2015
      Kolkata
      2015-05-07
      Kolkata Knight Riders
      Delhi Daredevils
      Kolkata Knight Riders
      bat
      normal
      0
      Kolkata Knight Riders
      13
      0
      PP Chawla
      Eden Gardens
      AK Chaudhary
      M Erasmus
      NaN
      2.012543
      0.366985
      21
      57.142857
      66.666667
      1
    
    
      487
      488
      2015
      Chennai
      2015-04-28
      Chennai Super Kings
      Kolkata Knight Riders
      Kolkata Knight Riders
      field
      normal
      0
      Chennai Super Kings
      2
      0
      DJ Bravo
      MA Chidambaram Stadium, Chepauk
      RM Deshpande
      VA Kulkarni
      NaN
      2.910151
      0.425559
      7
      61.538462
      -33.333333
      1
    
    
      488
      489
      2015
      Delhi
      2015-05-01
      Kings XI Punjab
      Delhi Daredevils
      Delhi Daredevils
      field
      normal
      0
      Delhi Daredevils
      0
      9
      NM Coulter-Nile
      Feroz Shah Kotla
      RK Illingworth
      S Ravi
      NaN
      14.275080
      0.459693
      15
      61.538462
      -33.333333
      0
    
    
      489
      490
      2015
      Mumbai
      2015-05-01
      Mumbai Indians
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Mumbai Indians
      8
      0
      AT Rayudu
      Wankhede Stadium
      HDPK Dharmasena
      CK Nandan
      NaN
      -13.536521
      0.097342
      5
      60.000000
      -33.333333
      1
    
    
      490
      491
      2015
      Bangalore
      2015-05-02
      Kolkata Knight Riders
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Royal Challengers Bangalore
      0
      7
      Mandeep Singh
      M Chinnaswamy Stadium
      JD Cloete
      PG Pathak
      NaN
      4.510188
      -0.570758
      7
      57.142857
      0.000000
      0
    
    
      492
      493
      2015
      Chandigarh
      2015-05-03
      Mumbai Indians
      Kings XI Punjab
      Mumbai Indians
      bat
      normal
      0
      Mumbai Indians
      23
      0
      LMP Simmons
      Punjab Cricket Association Stadium, Mohali
      RK Illingworth
      VA Kulkarni
      NaN
      -13.615487
      -0.737024
      -1
      46.666667
      66.666667
      1
    
    
      493
      494
      2015
      Mumbai
      2015-05-03
      Rajasthan Royals
      Delhi Daredevils
      Delhi Daredevils
      field
      normal
      0
      Rajasthan Royals
      14
      0
      AM Rahane
      Brabourne Stadium
      HDPK Dharmasena
      CB Gaffaney
      NaN
      10.775508
      -0.057722
      7
      60.000000
      0.000000
      1
    
    
      494
      495
      2015
      Chennai
      2015-05-04
      Chennai Super Kings
      Royal Challengers Bangalore
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      24
      0
      SK Raina
      MA Chidambaram Stadium, Chepauk
      C Shamshuddin
      K Srinath
      NaN
      4.569589
      -0.231439
      25
      58.823529
      0.000000
      1
    
    
      496
      497
      2015
      Mumbai
      2015-05-05
      Delhi Daredevils
      Mumbai Indians
      Delhi Daredevils
      bat
      normal
      0
      Mumbai Indians
      0
      5
      Harbhajan Singh
      Wankhede Stadium
      HDPK Dharmasena
      CB Gaffaney
      NaN
      -18.726971
      -0.755737
      -14
      53.333333
      -33.333333
      0
    
    
      497
      498
      2015
      Bangalore
      2015-05-06
      Royal Challengers Bangalore
      Kings XI Punjab
      Kings XI Punjab
      field
      normal
      0
      Royal Challengers Bangalore
      138
      0
      CH Gayle
      M Chinnaswamy Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      -5.574733
      0.886535
      5
      35.714286
      66.666667
      1
    
    
      499
      500
      2015
      Chennai
      2015-05-08
      Chennai Super Kings
      Mumbai Indians
      Chennai Super Kings
      bat
      normal
      0
      Mumbai Indians
      0
      6
      HH Pandya
      MA Chidambaram Stadium, Chepauk
      CB Gaffaney
      CK Nandan
      NaN
      5.014283
      1.972348
      17
      52.631579
      0.000000
      0
    
    
      500
      501
      2015
      Kolkata
      2015-05-09
      Kings XI Punjab
      Kolkata Knight Riders
      Kings XI Punjab
      bat
      normal
      0
      Kolkata Knight Riders
      0
      1
      AD Russell
      Eden Gardens
      AK Chaudhary
      HDPK Dharmasena
      NaN
      6.415078
      0.254813
      -18
      40.000000
      -33.333333
      0
    
    
      502
      503
      2015
      Mumbai
      2015-05-10
      Royal Challengers Bangalore
      Mumbai Indians
      Royal Challengers Bangalore
      bat
      normal
      0
      Royal Challengers Bangalore
      39
      0
      AB de Villiers
      Wankhede Stadium
      JD Cloete
      C Shamshuddin
      NaN
      -1.343308
      2.062729
      -3
      43.750000
      -33.333333
      1
    
    
      503
      504
      2015
      Chennai
      2015-05-10
      Chennai Super Kings
      Rajasthan Royals
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      12
      0
      RA Jadeja
      MA Chidambaram Stadium, Chepauk
      M Erasmus
      CK Nandan
      NaN
      -5.738591
      -0.046456
      16
      62.500000
      33.333333
      1
    
    
      505
      506
      2015
      Raipur
      2015-05-12
      Chennai Super Kings
      Delhi Daredevils
      Chennai Super Kings
      bat
      normal
      0
      Delhi Daredevils
      0
      6
      Z Khan
      Shaheed Veer Narayan Singh International Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      6.941454
      1.678318
      28
      73.333333
      33.333333
      0
    
    
      506
      507
      2015
      Chandigarh
      2015-05-13
      Kings XI Punjab
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Kings XI Punjab
      22
      0
      AR Patel
      Punjab Cricket Association Stadium, Mohali
      JD Cloete
      C Shamshuddin
      NaN
      5.622383
      -1.324729
      -16
      60.000000
      -66.666667
      1
    
    
      507
      508
      2015
      Mumbai
      2015-05-14
      Mumbai Indians
      Kolkata Knight Riders
      Kolkata Knight Riders
      field
      normal
      0
      Mumbai Indians
      5
      0
      HH Pandya
      Wankhede Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      -0.677689
      -0.313345
      -11
      66.666667
      33.333333
      1
    
    
      509
      510
      2015
      Chandigarh
      2015-05-16
      Kings XI Punjab
      Chennai Super Kings
      Kings XI Punjab
      bat
      normal
      0
      Chennai Super Kings
      0
      7
      P Negi
      Punjab Cricket Association Stadium, Mohali
      CK Nandan
      C Shamshuddin
      NaN
      -0.716536
      1.824407
      -33
      42.857143
      0.000000
      0
    
    
      510
      511
      2015
      Mumbai
      2015-05-16
      Rajasthan Royals
      Kolkata Knight Riders
      Rajasthan Royals
      bat
      normal
      0
      Rajasthan Royals
      9
      0
      SR Watson
      Brabourne Stadium
      RM Deshpande
      RK Illingworth
      NaN
      -3.303823
      -0.271935
      -16
      50.000000
      0.000000
      1
    
    
      513
      514
      2015
      Mumbai
      2015-05-19
      Mumbai Indians
      Chennai Super Kings
      Mumbai Indians
      bat
      normal
      0
      Mumbai Indians
      25
      0
      KA Pollard
      Wankhede Stadium
      HDPK Dharmasena
      RK Illingworth
      NaN
      6.315981
      -0.617777
      -24
      50.000000
      0.000000
      1
    
    
      514
      515
      2015
      Pune
      2015-05-20
      Royal Challengers Bangalore
      Rajasthan Royals
      Royal Challengers Bangalore
      bat
      normal
      0
      Royal Challengers Bangalore
      71
      0
      AB de Villiers
      Maharashtra Cricket Association Stadium
      AK Chaudhary
      C Shamshuddin
      NaN
      -2.200375
      0.969143
      5
      50.000000
      0.000000
      1
    
    
      515
      516
      2015
      Ranchi
      2015-05-22
      Royal Challengers Bangalore
      Chennai Super Kings
      Chennai Super Kings
      field
      normal
      0
      Chennai Super Kings
      0
      3
      A Nehra
      JSCA International Stadium Complex
      AK Chaudhary
      CB Gaffaney
      NaN
      -0.521025
      1.039181
      -23
      38.888889
      33.333333
      0
    
    
      516
      517
      2015
      Kolkata
      2015-05-24
      Mumbai Indians
      Chennai Super Kings
      Chennai Super Kings
      field
      normal
      0
      Mumbai Indians
      41
      0
      RG Sharma
      Eden Gardens
      HDPK Dharmasena
      RK Illingworth
      NaN
      -1.575550
      -1.707931
      -24
      52.380952
      0.000000
      1
    
  

331 rows × 24 columns

Visualizations for Features vs. Response



In [27]:

    
matches.boxplot(column = 'Avg_SR_Difference', by='team1Winning', showfliers= False)









    Out[27]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1d1eeb1f90>



In [28]:

    
matches.boxplot(column = 'Avg_WPR_Difference', by='team1Winning', showfliers= False)









    Out[28]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1d1eeb1710>



In [29]:

    
matches.boxplot(column = 'Total_MVP_Difference', by='team1Winning', showfliers= False)









    Out[29]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1d1ecaa090>



In [30]:

    
matches.boxplot(column = 'Prev_Enc_Team1_WinPerc', by='team1Winning', showfliers= False)









    Out[30]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1d1ebee810>



In [31]:

    
matches.boxplot(column = 'Total_RF_Difference', by='team1Winning', showfliers= False)









    Out[31]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1d1ec3cd10>
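The five box plots above can also be drawn side by side in one figure, which makes it easier to compare how well each feature separates the two outcomes. The following is a minimal sketch rather than the notebook's own code: `demo` is a small synthetic frame (an assumption standing in for `matches`) that only reuses the notebook's column names.

```python
import matplotlib
matplotlib.use("Agg")  # only needed when run as a script without a display; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

FEATURES = ["Avg_SR_Difference", "Avg_WPR_Difference", "Total_MVP_Difference",
            "Prev_Enc_Team1_WinPerc", "Total_RF_Difference"]

# Synthetic stand-in for the notebook's `matches` frame (assumption, not real IPL data).
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.normal(size=(40, len(FEATURES))), columns=FEATURES)
demo["team1Winning"] = rng.randint(0, 2, size=40)

fig, axes = plt.subplots(1, len(FEATURES), figsize=(20, 4))
for ax, feature in zip(axes, FEATURES):
    # One panel per feature, grouped by the response, with outliers hidden as in the cells above.
    demo.boxplot(column=feature, by="team1Winning", showfliers=False, ax=ax)
    ax.set_title(feature)
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." super-title
```

Replacing `demo` with `matches` should give the same five panels as the separate cells.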

Predictions



In [32]:

    
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
from patsy import dmatrices



In [33]:

    
y, X = dmatrices('team1Winning ~ 0 + Avg_SR_Difference + Avg_WPR_Difference + Total_MVP_Difference + Prev_Enc_Team1_WinPerc + \
                  Total_RF_Difference', matches, return_type="dataframe")
y_arr = np.ravel(y)
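In the formula above, `team1Winning` is the response and the five named columns are the predictors; the leading `0 +` tells patsy not to add an intercept column. patsy also drops rows with missing values by default, so the design matrix can have fewer rows than `matches`. A roughly equivalent selection can be sketched with pandas alone. `build_xy` and the tiny `demo` frame are hypothetical and only illustrate the mechanics.

```python
import numpy as np
import pandas as pd

FEATURES = ["Avg_SR_Difference", "Avg_WPR_Difference", "Total_MVP_Difference",
            "Prev_Enc_Team1_WinPerc", "Total_RF_Difference"]
RESPONSE = "team1Winning"

def build_xy(frame):
    """Return (X, y): feature columns only, no intercept, incomplete rows removed."""
    complete = frame.dropna(subset=FEATURES + [RESPONSE])
    X = complete[FEATURES].astype(float)
    y = complete[RESPONSE].astype(float).values
    return X, y

# Made-up rows (assumption); the third row has a missing feature and is dropped.
demo = pd.DataFrame({
    "Avg_SR_Difference": [1.5, -2.0, np.nan, 4.0],
    "Avg_WPR_Difference": [0.1, -0.3, 0.2, 0.5],
    "Total_MVP_Difference": [2, -1, 0, 3],
    "Prev_Enc_Team1_WinPerc": [50.0, 40.0, 60.0, 55.0],
    "Total_RF_Difference": [33.3, -33.3, 0.0, 66.7],
    "team1Winning": [1, 0, 1, 1],
})
X_demo, y_demo = build_xy(demo)
print("X shape: {}, y shape: {}".format(X_demo.shape, y_demo.shape))
```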

Training and Testing on Entire Data



In [34]:

    
# instantiate a logistic regression model, and fit with X and y
model = LogisticRegression()
model = model.fit(X, y_arr)
# check the accuracy on the training set
print "Accuracy is", model.score(X, y_arr)*100, "%"









    



Accuracy is 57.4923547401 %
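An accuracy measured on the same rows the model was fitted to is easier to judge next to a trivial baseline, such as always predicting the more common outcome. A minimal sketch in plain Python follows; `majority_baseline_accuracy` is a hypothetical helper and the label list is made up for illustration.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / float(len(labels))

demo_labels = [1, 1, 0, 1, 0, 1, 0, 1]  # made-up outcomes, 1 = team1 won
baseline = majority_baseline_accuracy(demo_labels)
print("Baseline accuracy is {:.1f} %".format(baseline * 100))
```

Calling it as `majority_baseline_accuracy(list(y_arr))` would give the figure to compare the model's accuracy against.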

Splitting train and test using train_test_split



In [35]:

    
# evaluate the model by splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_arr, random_state = 0)
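By default `train_test_split` shuffles the rows and holds out 25% of them as the test set; `random_state` fixes the shuffle so the split is reproducible. The sketch below shows the resulting sizes on synthetic arrays (assumptions, not the notebook's data) and the optional `stratify` argument, which keeps the win/loss ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 5))       # 100 made-up matches, 5 features
y_demo = np.array([0] * 40 + [1] * 60)   # 40 losses and 60 wins for team1

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

print("train rows: {}, test rows: {}".format(len(X_tr), len(X_te)))
print("share of wins in test set: {:.2f}".format(y_te.mean()))
```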



In [36]:

    
# Logistic Regression on train_test_split
model2 = LogisticRegression()
model2.fit(X_train, y_train)
# predict class labels for the test set
predicted = model2.predict(X_test)
# generate evaluation metrics
print "Accuracy is ", metrics.accuracy_score(y_test, predicted)*100, "%"









    



Accuracy is  58.5365853659 %



In [37]:

    
# KNN Classification on train_test_split
k_range = list(range(1, 61))
k_score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    k_score.append(metrics.accuracy_score(y_test, y_pred))
plt.plot(k_range, k_score)









    Out[37]:





[<matplotlib.lines.Line2D at 0x7f1d0448e6d0>]



In [38]:

    
# Best value of k from the sweep above (chosen on the test split)
knn = KNeighborsClassifier(n_neighbors = 50)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print "Accuracy is ", metrics.accuracy_score(y_test, y_pred)*100, "%"









    



Accuracy is  64.6341463415 %
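The value of `k` used above was picked by comparing accuracy on the test split, so the reported figure is likely to be optimistic. One common alternative, sketched here and not taken from the notebook, is to choose `k` by cross-validation on the training portion only and then score the held-out rows once. The data comes from `make_classification` and the grid of odd `k` values is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary problem with five features (assumption, not the IPL data).
X_demo, y_demo = make_classification(n_samples=300, n_features=5, n_informative=3,
                                     n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Search over k using 5-fold cross-validation on the training rows only.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": list(range(1, 61, 2))},
                      cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(X_te, y_te)  # the test rows are used once, after k is fixed
print("Best k is {}, test accuracy is {:.1f} %".format(best_k, test_accuracy * 100))
```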

Splitting Training Set (2008-2013) and Test Set (2014-2015) based on Seasons



In [39]:

    
X_timetrain = X.loc[X.index < 398]
Y_timetrain = y.loc[y.index < 398]
Y_timetrain_arr = np.ravel(Y_timetrain)
X_timetest = X.loc[X.index >= 398]
Y_timetest = y.loc[y.index >= 398]
Y_timetest_arr = np.ravel(Y_timetest)
X_timetest









    Out[39]:







  
    
      
     Avg_SR_Difference  Avg_WPR_Difference  Total_MVP_Difference  Prev_Enc_Team1_WinPerc  Total_RF_Difference
398          -9.646646            0.466526                   6.0               16.666667             0.000000
399           4.963605            0.097800                  12.0               50.000000             0.000000
400           7.079810            0.432566                  11.0               70.000000             0.000000
402          21.485599            1.176414                  17.0               53.846154          -100.000000
403          -4.503334            1.663169                  15.0               54.545455           100.000000
404          -7.297630           -0.332117                  -1.0               72.727273          -100.000000
405          12.183316            2.316918                   5.0               66.666667           -50.000000
407          -5.341707            2.620287                  12.0               61.538462            50.000000
408           5.093091            0.588349                  20.0               54.545455           -50.000000
410         -13.668459           -2.328697                   0.0               60.000000           -66.666667
411          15.451031            0.903107                  -2.0               54.545455            66.666667
412          16.852669           -1.198669                 -19.0               50.000000            33.333333
413          -6.674135            2.225351                   8.0               50.000000           -33.333333
415         -19.344665           -2.002792                   4.0               41.666667           -66.666667
418          12.797760            0.233431                  -7.0               70.000000            66.666667
419           7.351864            4.156536                  -1.0               58.333333           100.000000
420           5.659837           -2.136710                   4.0               50.000000            33.333333
422           7.195161           -0.935038                  -8.0               45.454545            33.333333
423         -12.464962           -2.197910                  -7.0               30.769231           -66.666667
424          -3.504686            0.247855                  -1.0               50.000000            33.333333
425           5.508879           -1.167825                  -9.0               50.000000            33.333333
426           8.303271            1.050310                 -14.0               36.363636           -33.333333
428          35.077496            1.280748                  -1.0               61.538462            66.666667
430          -7.544654            0.752493                  -8.0               56.250000             0.000000
431          21.871322            0.886981                 -11.0               54.545455            33.333333
432           0.003466           -1.131751                   3.0               50.000000          -100.000000
434          11.859779            1.592142                 -11.0               35.714286            33.333333
435           2.600434            1.780807                   5.0               54.545455             0.000000
437          -8.157091           -0.853403                  -7.0               76.923077             0.000000
438          13.456476            2.440403                  -4.0               53.846154            66.666667
...                ...                 ...                   ...                     ...                  ...
474          -6.548623            0.610484                  -4.0               53.333333          -100.000000
475         -11.068650           -0.136905                 -22.0               46.153846           -33.333333
478           7.048118           -1.246590                  22.0               56.250000           -16.666667
479          -0.596885            0.020515                 -11.0               50.000000             0.000000
480          17.723755            0.354974                   3.0               53.846154            66.666667
482          -5.061380           -1.212153                   7.0               53.846154            33.333333
483          -5.004302            0.906597                  -4.0               41.666667            33.333333
485           2.012543            0.366985                  21.0               57.142857            66.666667
487           2.910151            0.425559                   7.0               61.538462           -33.333333
488          14.275080            0.459693                  15.0               61.538462           -33.333333
489         -13.536521            0.097342                   5.0               60.000000           -33.333333
490           4.510188           -0.570758                   7.0               57.142857             0.000000
492         -13.615487           -0.737024                  -1.0               46.666667            66.666667
493          10.775508           -0.057722                   7.0               60.000000             0.000000
494           4.569589           -0.231439                  25.0               58.823529             0.000000
496         -18.726971           -0.755737                 -14.0               53.333333           -33.333333
497          -5.574733            0.886535                   5.0               35.714286            66.666667
499           5.014283            1.972348                  17.0               52.631579             0.000000
500           6.415078            0.254813                 -18.0               40.000000           -33.333333
502          -1.343308            2.062729                  -3.0               43.750000           -33.333333
503          -5.738591           -0.046456                  16.0               62.500000            33.333333
505           6.941454            1.678318                  28.0               73.333333            33.333333
506           5.622383           -1.324729                 -16.0               60.000000           -66.666667
507          -0.677689           -0.313345                 -11.0               66.666667            33.333333
509          -0.716536            1.824407                 -33.0               42.857143             0.000000
510          -3.303823           -0.271935                 -16.0               50.000000             0.000000
513           6.315981           -0.617777                 -24.0               50.000000             0.000000
514          -2.200375            0.969143                   5.0               50.000000             0.000000
515          -0.521025            1.039181                 -23.0               38.888889            33.333333
516          -1.575550           -1.707931                 -24.0               52.380952             0.000000
    
  

87 rows × 5 columns
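The split above depends on row index 398 marking the boundary between seasons. The same idea can be keyed on a `season` column instead, which avoids the hard-coded index. The sketch below is illustrative: `split_by_season` is a hypothetical helper and `demo` is a made-up frame. With the notebook's objects the season would have to be looked up from `matches`, since `X` does not carry it.

```python
import pandas as pd

def split_by_season(frame, last_train_season):
    """Rows up to and including last_train_season go to train, later seasons to test."""
    train = frame[frame["season"] <= last_train_season]
    test = frame[frame["season"] > last_train_season]
    return train, test

# Made-up rows (assumption) covering four seasons.
demo = pd.DataFrame({
    "season": [2012, 2013, 2013, 2014, 2015],
    "Avg_SR_Difference": [3.1, -4.2, 0.5, 7.7, -1.9],
    "team1Winning": [1, 0, 1, 1, 0],
})
train_demo, test_demo = split_by_season(demo, last_train_season=2013)
print("train rows: {}, test rows: {}".format(len(train_demo), len(test_demo)))
```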



In [40]:

    
# Logistic Regression on time-based split sets
model3 = LogisticRegression()
model3.fit(X_timetrain, Y_timetrain_arr)
timepredicted = model3.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, timepredicted)*100, "%"









    



Accuracy is  52.8735632184 %



In [53]:

    
# KNN Classification on time-based split sets
k_range = list(range(1, 32))
k_score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(X_timetrain, Y_timetrain_arr)
    y_pred = knn.predict(X_timetest)
    k_score.append(metrics.accuracy_score(Y_timetest_arr, y_pred))
plt.plot(k_range, k_score)









    Out[53]:





[<matplotlib.lines.Line2D at 0x7f1d033a7410>]



In [54]:

    
# Best value of k from the sweep above on the time-based split
knn1 = KNeighborsClassifier(n_neighbors = 31)
knn1.fit(X_timetrain, Y_timetrain_arr)
y_pred = knn1.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, y_pred)*100, "%"









    



Accuracy is  64.367816092 %
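With only 87 matches in the time-based test set, every accuracy figure carries noticeable sampling uncertainty. As a rough guide, a normal-approximation interval for a proportion can be computed as below. `accuracy_interval` is a hypothetical helper, and the multiplier 1.96 assumes an approximate 95% level.

```python
import math

def accuracy_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation interval for an accuracy measured on n_samples rows."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

# 56 correct predictions out of 87 is the 64.37 % reported above.
low, high = accuracy_interval(56 / 87.0, 87)
print("Approximate 95 %% interval: %.1f %% to %.1f %%" % (low * 100, high * 100))
```

For 56 correct out of 87 this gives roughly 54% to 74%, so differences of a few percentage points between models on this test set are within the noise.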

Support Vector Machines



In [43]:

    
clf = svm.SVC(gamma=0.001, C=10)
clf.fit(X_timetrain, Y_timetrain_arr)
clf_pred = clf.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, clf_pred)*100, "%"









    



Accuracy is  45.9770114943 %
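An RBF-kernel SVM is sensitive to feature scale, and the five features here span quite different ranges (a few units for `Avg_WPR_Difference`, up to ±100 for `Total_RF_Difference`). Standardizing the features before fitting is a common remedy. The sketch below only shows the mechanics on synthetic data and makes no claim about its effect on the IPL results.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (assumption); one column is stretched to mimic mixed feature scales.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, n_informative=3,
                                     n_redundant=0, random_state=0)
X_demo[:, 0] *= 100.0

# The scaler is fitted on the training rows only and reused when predicting.
scaled_svm = make_pipeline(StandardScaler(), SVC(gamma=0.001, C=10))
scaled_svm.fit(X_demo[:150], y_demo[:150])
demo_accuracy = scaled_svm.score(X_demo[150:], y_demo[150:])
print("Accuracy is {:.1f} %".format(demo_accuracy * 100))
```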

Random Forests



In [44]:

    
rfc = RandomForestClassifier(n_jobs = -1, random_state = 1)
rfc.fit(X_timetrain, Y_timetrain_arr)
rfc_pred = rfc.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, rfc_pred)*100, "%"









    



Accuracy is  54.0229885057 %



In [45]:

    
fi = zip(X.columns, rfc.feature_importances_)
print "Feature Importance according to Random Forests Model\n"
for i in fi:
    print i[0], ":", i[1]









    



Feature Importance according to Random Forests Model

Avg_SR_Difference : 0.330684992918
Avg_WPR_Difference : 0.21317276792
Total_MVP_Difference : 0.191778034092
Prev_Enc_Team1_WinPerc : 0.141146504197
Total_RF_Difference : 0.123217700874
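Random forest importances are normalized to sum to one, so each value reads as a share of the total. The loop above prints them in column order, which here happens to coincide with descending importance. A small sketch that ranks them explicitly, using the values copied from the output above:

```python
# Importances copied from the output above.
importances = {
    "Avg_SR_Difference": 0.330684992918,
    "Avg_WPR_Difference": 0.21317276792,
    "Total_MVP_Difference": 0.191778034092,
    "Prev_Enc_Team1_WinPerc": 0.141146504197,
    "Total_RF_Difference": 0.123217700874,
}

# Sort from most to least important and print each value as a percentage share.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print("{:<24} {:5.1f} %".format(name, value * 100))
```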

Naive Bayes Classifier



In [46]:

    
gclf = GaussianNB()
gclf.fit(X_timetrain, Y_timetrain_arr)
gclf_pred = gclf.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, gclf_pred) *100, "%"









    



Accuracy is  55.1724137931 %

Cross Validation



In [47]:

    
from sklearn.model_selection import cross_val_score



In [48]:

    
logreg = LogisticRegression()
scores = cross_val_score(logreg, X, y_arr, cv=10, scoring='accuracy')
scores









    Out[48]:





array([ 0.48484848,  0.57575758,  0.60606061,  0.48484848,  0.51515152,
        0.66666667,  0.48484848,  0.45454545,  0.5       ,  0.41935484])
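The ten fold scores are easier to compare with the other results once they are summarized by their mean and spread. A short sketch using the values printed above:

```python
import statistics

# Fold accuracies copied from the output above.
fold_scores = [0.48484848, 0.57575758, 0.60606061, 0.48484848, 0.51515152,
               0.66666667, 0.48484848, 0.45454545, 0.5, 0.41935484]

mean_score = statistics.mean(fold_scores)
spread = statistics.stdev(fold_scores)  # sample standard deviation across folds
print("Mean CV accuracy is {:.1f} % (standard deviation {:.1f} points)".format(
    mean_score * 100, spread * 100))
```

In the notebook itself, `scores.mean()` and `scores.std()` give similar figures directly from the array; note that NumPy's `std` defaults to the population formula, so the spread differs slightly.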



In [49]:

    
k_range = list(range(1, 61))
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y_arr, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())
plt.plot(k_range, k_scores)









    Out[49]:





[<matplotlib.lines.Line2D at 0x7f1d03ab6350>]
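Rather than reading the best `k` off the plot, it can be taken from the list of mean scores directly. A minimal sketch follows; `best_k` is a hypothetical helper, the demo lists are made up, and breaking ties in favour of the smallest `k` is a design choice of this sketch.

```python
def best_k(k_values, mean_scores):
    """Return (k, score) with the highest mean score; ties go to the smallest k."""
    if not k_values or len(k_values) != len(mean_scores):
        raise ValueError("k_values and mean_scores must be non-empty and of equal length")
    best_index = max(range(len(k_values)),
                     key=lambda i: (mean_scores[i], -k_values[i]))
    return k_values[best_index], mean_scores[best_index]

demo_k = [1, 3, 5, 7]
demo_scores = [0.50, 0.56, 0.56, 0.52]  # made-up mean accuracies
chosen_k, chosen_score = best_k(demo_k, demo_scores)
print("Best k is {} with mean accuracy {:.2f}".format(chosen_k, chosen_score))
```

With the notebook's variables the call would be `best_k(k_range, k_scores)`.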

Gradient Boosting



In [50]:

    
from xgboost import XGBClassifier



In [51]:

    
xgbtest = XGBClassifier(
    learning_rate=1,
    n_estimators=2,
    max_depth=6,
    min_child_weight=8,
    gamma=0.1,
    subsample=0.9,
    colsample_bytree=0.8,
    objective='binary:logistic',
    scale_pos_weight=1,
    seed=27)
xgbtest.fit(X_timetrain, Y_timetrain_arr)
xgbtest_pred = xgbtest.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, xgbtest_pred) *100, "%"









    



Accuracy is  62.0689655172 %
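With `n_estimators=2` and `learning_rate=1`, the booster above consists of just two trees applied at full step size. To see how test accuracy changes as more trees are added, the sketch below uses scikit-learn's `GradientBoostingClassifier`. This is a different implementation from xgboost, swapped in here only because its `staged_predict` method yields predictions after every boosting stage. The data and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption, not the IPL features).
X_demo, y_demo = make_classification(n_samples=300, n_features=5, n_informative=3,
                                     n_redundant=0, random_state=27)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=27)

gbc = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                 max_depth=3, random_state=27)
gbc.fit(X_tr, y_tr)

# One accuracy value per boosting stage (after 1 tree, 2 trees, ..., 50 trees).
staged_accuracy = [accuracy_score(y_te, pred) for pred in gbc.staged_predict(X_te)]
print("Accuracy after 2 trees: {:.1f} %, after 50 trees: {:.1f} %".format(
    staged_accuracy[1] * 100, staged_accuracy[-1] * 100))
```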

Get Prediction for Web App



In [55]:

    
def getPrediction(match_id):
    '''Return the predicted winner for the given match.
    
    Args: match_id (int): Match ID of the game; the match must belong to the
        time-based test set (X_timetest).
    
    Returns: dict: 'name' is the predicted winner and 'prob' is the predicted
        probability of that win, in percent.
    '''
    results = {}
    match_row = matches.loc[matches['id'] == match_id]
    team1name = match_row.team1.unique()[0]
    team2name = match_row.team2.unique()[0]
    toPredict = X_timetest.loc[X_timetest.index == match_id-1].values
    prediction_prob = knn1.predict_proba(toPredict)
    prediction = knn1.predict(toPredict)
    if prediction[0] > 0:
        results['name'] = str(team1name)
        results['prob'] = float(prediction_prob[0][1])*100
    else:
        results['name'] = str(team2name)
        results['prob'] = float(prediction_prob[0][0])*100
    return results

getPrediction(517)









    Out[55]:





{'name': 'Mumbai Indians', 'prob': 51.61290322580645}
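`getPrediction` assumes the requested match is part of the time-based test set; for any other `match_id` the feature lookup returns no rows and the call to `predict_proba` fails. A defensive variant could check for that first. The sketch below is illustrative only: `get_prediction_safe`, the toy frames and the toy model are all assumptions, and the frames and model are passed in as arguments instead of being read from notebook globals.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

def get_prediction_safe(match_id, matches, features, model):
    """Return {'name', 'prob'} for match_id, or None if the match cannot be scored."""
    match_row = matches.loc[matches["id"] == match_id]
    if match_row.empty or (match_id - 1) not in features.index:
        return None
    to_predict = features.loc[[match_id - 1]]  # feature rows are indexed by id - 1
    probabilities = model.predict_proba(to_predict)[0]
    winner_index = int(np.argmax(probabilities))
    # model.classes_ maps probability columns back to labels (1 means team1 won).
    column = "team1" if model.classes_[winner_index] > 0 else "team2"
    return {"name": str(match_row[column].iloc[0]),
            "prob": float(probabilities[winner_index]) * 100}

# Toy data (assumption): six matches, features available for the first five only.
demo_matches = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                             "team1": ["A"] * 6, "team2": ["B"] * 6})
demo_features = pd.DataFrame({"Avg_SR_Difference": [5.0, 4.0, 6.0, -5.0, -4.0]},
                             index=[0, 1, 2, 3, 4])
demo_model = KNeighborsClassifier(n_neighbors=3).fit(demo_features, [1, 1, 1, 0, 0])

print(get_prediction_safe(1, demo_matches, demo_features, demo_model))
print(get_prediction_safe(6, demo_matches, demo_features, demo_model))
```

The probability is the share of nearest neighbours voting for the predicted class. With `n_neighbors = 31` and uniform weights, the 51.61% reported above corresponds to 16 of the 31 neighbours being wins for team 1.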

509	-0.716536	1.824407	-33.0	42.857143	0.000000
510	-3.303823	-0.271935	-16.0	50.000000	0.000000
513	6.315981	-0.617777	-24.0	50.000000	0.000000
514	-2.200375	0.969143	5.0	50.000000	0.000000
515	-0.521025	1.039181	-23.0	38.888889	33.333333
516	-1.575550	-1.707931	-24.0	52.380952	0.000000