The Effect of Nationality on Tennis Play

Matt DeMichiel and Kevin Lee

The goal of this project is to investigate how nationality is related to different aspects of a tennis player's game and how certain match statistics affect the outcome of the match. For example, it is generally held that because few clay courts are available for practice in the US, American players are not very good on this surface. However, the abundance of hard courts makes them serious contenders at tournaments played on surfaces that more closely resemble asphalt, such as the US Open or the Australian Open. On the flip side, Spanish players are known as masters of the clay court. This is due to a combination of the abundance of clay courts in Spain and other cultural factors that make the general Spanish style of play very advantageous on clay. In fact, one Spaniard, Rafael Nadal, has had so much success that he has universally been given the moniker of “The King of Clay.”

This is one of the trends we would like to investigate. For the most part, these claims remain cultural hearsay within the tennis world. We're interested in analyzing match data and seeing if these general patterns actually have some quantitative basis. Using the data from thousands of matches, we would like to examine these assumptions, as well as look for new patterns that can be found. Do certain nationalities tend to spend more time on court? Are certain players more likely to hit aces than others? Who is more at risk for committing unforced errors? How important is saving break points on your serve to winning a match? Have these trends changed over time?

Preliminaries



In [1]:

    
import sys                             # system module 
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for pandas 

%matplotlib inline                     

# check versions
print('Python version: ', sys.version)
print('Pandas version: ', pd.__version__)
print('Today: ', dt.date.today())









    



Python version:  3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Today:  2016-12-22

Collecting the Data

To look for and analyze these trends, we will be using work published by Jeff Sackmann, an author and entrepreneur in the sports statistics field. Jeff maintains a repository on GitHub (tennis_atp, which is where the files loaded below come from), in which he has recorded the results of tens of thousands of matches spanning over thirty years. These match results include data points such as court surface, amount of time on court, winner, loser, number of double faults, and many more. For the purposes of our research, we will only be examining the results of matches between 2013 and 2015. In addition, we will only be analyzing the results of players from seven countries: USA, Japan, Spain, Argentina, Australia, Serbia, and France. These countries give us a cross section of players from several continents and include many of the most prominent players in the tennis world.



In [2]:

    
url2013 = 'https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2013.csv' # 2013 match data
url2014 = 'https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2014.csv' # 2014 match data
url2015 = 'https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_2015.csv' # 2015 match data
# convert each csv file into data frames
df2013 = pd.read_csv(url2013)
df2014 = pd.read_csv(url2014)
df2015 = pd.read_csv(url2015)



In [3]:

    
# a look at the plentiful amount of data provided by Jeff Sackmann (each year has the same data)
df2013.columns









    Out[3]:





Index(['tourney_id', 'tourney_name', 'surface', 'draw_size', 'tourney_level',
       'tourney_date', 'match_num', 'winner_id', 'winner_seed', 'winner_entry',
       'winner_name', 'winner_hand', 'winner_ht', 'winner_ioc', 'winner_age',
       'winner_rank', 'winner_rank_points', 'loser_id', 'loser_seed',
       'loser_entry', 'loser_name', 'loser_hand', 'loser_ht', 'loser_ioc',
       'loser_age', 'loser_rank', 'loser_rank_points', 'score', 'best_of',
       'round', 'minutes', 'w_ace', 'w_df', 'w_svpt', 'w_1stIn', 'w_1stWon',
       'w_2ndWon', 'w_SvGms', 'w_bpSaved', 'w_bpFaced', 'l_ace', 'l_df',
       'l_svpt', 'l_1stIn', 'l_1stWon', 'l_2ndWon', 'l_SvGms', 'l_bpSaved',
       'l_bpFaced'],
      dtype='object')

Since we are not examining every variable, the final dataframe we work with will not include every column. The following code keeps only the variables we are interested in. In addition, each dataframe includes players from countries other than the seven we are studying. Therefore, we create a variable countries, a list of strings holding the country codes we are examining.
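As a small aside, columns can be picked either by position or by name. The sketch below uses a made-up one-row table (the values and the reduced column set are assumptions for illustration only) to show that the two approaches select the same data; selecting by name is more robust if the column order of the source file ever changes.

```python
import pandas as pd

# Made-up one-row table with a few of the column names from the dataset.
toy = pd.DataFrame({
    'tourney_id':   ['example-id'],
    'tourney_name': ['Kitzbuhel'],
    'surface':      ['Clay'],
    'winner_ioc':   ['ARG'],
})

# Select the 2nd and 3rd columns by position ...
by_position = toy.iloc[:, [1, 2]]

# ... or the same columns by name.
by_name = toy[['tourney_name', 'surface']]

print(by_position.equals(by_name))
```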



In [4]:

    
numlist=[1,2,5,10,11,12,13,14,15,20,21,22,23,24,25,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48] 
## a list of the column positions corresponding to the variables we want to extract from the dataframe
atp2013 = df2013.iloc[:, numlist]
atp2014 = df2014.iloc[:, numlist]
atp2015 = df2015.iloc[:, numlist]
## .iloc selects columns by position; this extracts the metrics we're interested in and assigns them to a new dataframe for each year
countries=['USA','FRA','ESP','SRB','JPN','AUS','ARG'] ## we'll use this later. It's a list of countries we're interested in analyzing



In [5]:

    
# create our final master dataframe which combines the dataframes for each year
atp = pd.concat([atp2013, atp2014, atp2015])  # stack the three years; DataFrame.append is no longer available in recent pandas
atp.head()









    Out[5]:






  
    
      
	tourney_name	surface	tourney_date	winner_name	winner_hand	winner_ht	winner_ioc	winner_age	winner_rank	loser_name	...	w_bpFaced	l_ace	l_df	l_svpt	l_1stIn	l_1stWon	l_2ndWon	l_SvGms	l_bpSaved	l_bpFaced
0	Kitzbuhel	Clay	20130728	Leonardo Mayer	R	188.0	ARG	26.206708	78.0	Kenny De Schepper	...	0.0	3.0	4.0	78.0	45.0	30.0	16.0	10.0	10.0	12.0
1	Kitzbuhel	Clay	20130728	Horacio Zeballos	L	188.0	ARG	28.254620	54.0	Martin Fischer	...	16.0	4.0	2.0	81.0	44.0	30.0	21.0	14.0	5.0	9.0
2	Kitzbuhel	Clay	20130728	Marcel Granollers	R	190.0	ESP	27.296372	53.0	Mate Pavic	...	6.0	2.0	3.0	73.0	39.0	26.0	19.0	10.0	6.0	8.0
3	Kitzbuhel	Clay	20130728	Guillermo Garcia Lopez	R	188.0	ESP	30.151951	72.0	Thomaz Bellucci	...	15.0	9.0	3.0	116.0	62.0	38.0	27.0	16.0	6.0	12.0
4	Kitzbuhel	Clay	20130728	Robin Haase	R	190.0	NED	26.313484	68.0	Guillaume Rufin	...	6.0	2.0	7.0	88.0	48.0	32.0	15.0	11.0	7.0	11.0
    
  

5 rows × 37 columns

Finally, to keep our research as controlled as possible, we removed Grand Slam matches, which are played best 3 out of 5 sets. Our final dataframe consists only of the results of matches that were played best 2 out of 3.



In [6]:

    
atp= atp[atp['best_of'].isin([3])]

Now that we have our "Master DataFrame", we decided to look at nationality's effects on play based on a few different metrics:

1) Length of Time of a Match
2) Win Percentage per Surface
3) Unforced Errors Committed Over Time
4) Break Points Saved as a percentage of Break Points Faced

The challenge in this dataframe is that for each match, there is a winner and a loser. This means that a lot of the data manipulation involves tracking the players both in the wins and the losses column.
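One way to picture this bookkeeping is to reshape the match table so that each row is one player's view of one match. The sketch below is only an illustration on made-up data (the three matches and their minutes are assumptions, not rows from the dataset), and it is not the approach taken in the cells that follow, which handle winners and losers separately.

```python
import pandas as pd

# Made-up stand-in for the match table.
matches = pd.DataFrame({
    'winner_ioc': ['ESP', 'USA', 'ESP'],
    'loser_ioc':  ['USA', 'FRA', 'ARG'],
    'minutes':    [120.0, 80.0, 100.0],
})

# One frame per role, with the role-specific column renamed to a shared name.
won = (matches[['winner_ioc', 'minutes']]
       .rename(columns={'winner_ioc': 'Country'})
       .assign(won=True))
lost = (matches[['loser_ioc', 'minutes']]
        .rename(columns={'loser_ioc': 'Country'})
        .assign(won=False))

# Stack the two frames: every match now contributes two rows.
long_form = pd.concat([won, lost], ignore_index=True)

# With one row per player-match, per-country statistics are a single groupby.
avg_minutes = long_form.groupby('Country')['minutes'].mean()
print(avg_minutes)
```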

Length of Time

Because many European players play on clay courts throughout their junior careers, their games are more suited to this surface. Since clay is a slow surface, one would anticipate that a European player would play longer matches. The reverse goes for Americans. Hard courts, which are notoriously fast, are the predominant surface in the US. This means American players are used to short points, so one would expect their matches to be shorter overall.



In [7]:

    
# get average length of time for matches that a country won
g = atp[['winner_ioc','minutes']].groupby('winner_ioc')
gm=g.mean()
minW = gm.loc[countries] # pull average match length for the countries we want to examine

# because there is a winner_ioc and a loser_ioc, we have to rename the index to Country in order to combine the dataframes
minW.index.name='Country'
minW



In [8]:

    
# now we do the same for the matches that the countries lost
l = atp[['loser_ioc','minutes']].groupby('loser_ioc')
gl=l.mean()
minL= gl.loc[countries]
minL.index.name='Country'
minL



In [9]:

    
# Combine the average won and lost time to a total average
minL['minutesW']=minW['minutes']
minL['Average']= (minL['minutes']+minL['minutesW'])/2
minL

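We are only interested in the Average column, so that is the one we used to create our graph. Note that it is the simple mean of the average length of matches won and the average length of matches lost. It therefore approximates the average time over all matches, and equals it exactly only when a country has the same number of wins and losses.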
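To make the difference between the two kinds of average concrete, here is a minimal sketch on made-up numbers (a hypothetical country with three wins and one loss). It is an illustration only and does not change the figures reported in this notebook.

```python
import pandas as pd

# Made-up data: three matches won and one match lost.
toy = pd.DataFrame({
    'won':     [True, True, True, False],
    'minutes': [100.0, 90.0, 110.0, 60.0],
})

mean_won = toy.loc[toy['won'], 'minutes'].mean()    # average over the three wins
mean_lost = toy.loc[~toy['won'], 'minutes'].mean()  # average over the single loss

# Simple mean of the two means, as in the Average column.
unweighted = (mean_won + mean_lost) / 2

# Mean over every match, which weights wins and losses by how often they occur.
overall = toy['minutes'].mean()

print(unweighted, overall)
```

With unequal numbers of wins and losses the two values differ (80 versus 90 minutes here), which is why the Average column is best read as an approximation.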



In [10]:

    
minL['Average'].plot(kind='bar',
                    ylim= (80,105))









    Out[10]:





<matplotlib.axes._subplots.AxesSubplot at 0x119bb84a8>

This graph supports the popular consensus about players from certain countries. Spanish players are known as the clay court masters who play incredibly long points. As this graph shows, Spanish players play longer matches on average than players from every other country except Argentina, a country also known for its clay court play. American players, as expected, play among the shortest matches on average; only Japan, a country where hard courts are the predominant surface, has a shorter average.

Win Percentage per Surface

As certain players grow up on certain surfaces, it is to be expected that this would be reflected in their win percentages. Our next analysis examines just that. To manipulate the data so that we can graph the win percentages by surfaces for each country, we had to count how many times the player's country showed up in the wins column and loss column, and then compute their win percentage.
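The counting logic described above can be sketched compactly on a toy table. The four matches below are made up for illustration, and the groupby/unstack route shown here is an assumed alternative, not the exact sequence of steps used in the following cells.

```python
import pandas as pd

# Made-up matches: who won, who lost, and on which surface.
matches = pd.DataFrame({
    'winner_ioc': ['ESP', 'ESP', 'USA', 'ESP'],
    'loser_ioc':  ['USA', 'USA', 'ESP', 'FRA'],
    'surface':    ['Clay', 'Clay', 'Hard', 'Hard'],
})

# Every country that appears in either role.
all_countries = sorted(set(matches['winner_ioc']) | set(matches['loser_ioc']))

# Count appearances per country and surface in each role;
# countries that never appear in a role get a count of 0.
wins = (matches.groupby(['winner_ioc', 'surface']).size()
        .unstack(fill_value=0)
        .reindex(all_countries, fill_value=0))
losses = (matches.groupby(['loser_ioc', 'surface']).size()
          .unstack(fill_value=0)
          .reindex(all_countries, fill_value=0))

# Win percentage = wins / (wins + losses); NaN where a country never played on a surface.
win_pct = wins / (wins + losses)
print(win_pct)
```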



In [11]:

    
w = atp[['winner_ioc','surface']].groupby(['surface']) # group the results by surface



In [12]:

    
# count of how many times each country shows up in the wins column and turn into a dataframe
t= w.winner_ioc.value_counts()
tt=pd.DataFrame(t)



In [13]:

    
tt.head()



In [14]:

    
# move surface out of the index to look at a country's wins per surface
tt=tt.unstack(level='surface')
tt.head()



In [15]:

    
tt=tt.loc[countries] # pull out countries we are examining
tt.columns=['Clay Wins', 'Grass Wins', 'Hard Wins']
tt.index.rename('Country', inplace=True) # changing winner_ioc to country allows us to add losses to the dataframe in a coherent way
tt=tt.reset_index() # in order to merge the data on country, it must not be in the index
tt



In [16]:

    
# compute losses by surface per country
l= atp[['loser_ioc','surface']].groupby(['surface'])



In [17]:

    
x= l.loser_ioc.value_counts()
xx=pd.DataFrame(x)
xx=xx.unstack(level='surface')
xx=xx.loc[countries]
xx.index.rename('Country', inplace=True)
xx.columns=['Clay Losses', 'Grass Losses', 'Hard Losses']
xx=xx.reset_index()
xx









    Out[17]:






  
    
      
	Country	Clay Losses	Grass Losses	Hard Losses
0	USA	60.0	63.0	346.0
1	FRA	175.0	64.0	424.0
2	ESP	381.0	38.0	367.0
3	SRB	53.0	7.0	78.0
4	JPN	14.0	10.0	87.0
5	AUS	45.0	32.0	190.0
6	ARG	192.0	14.0	109.0



In [18]:

    
wl = pd.merge(tt,xx,
             how= 'right',
             on= 'Country') #merge the two dataframes
wl=wl.set_index('Country')
wl









    Out[18]:






  
    
      
	Clay Wins	Grass Wins	Hard Wins	Clay Losses	Grass Losses	Hard Losses
Country
USA	52.0	54.0	321.0	60.0	63.0	346.0
FRA	154.0	79.0	451.0	175.0	64.0	424.0
ESP	533.0	43.0	412.0	381.0	38.0	367.0
SRB	68.0	7.0	160.0	53.0	7.0	78.0
JPN	28.0	11.0	101.0	14.0	10.0	87.0
AUS	20.0	40.0	160.0	45.0	32.0	190.0
ARG	209.0	7.0	109.0	192.0	14.0	109.0

Now that we have the wins and losses by surface for each country, we can make a new dataframe that has the win percentage by surface by country. A for loop makes this simple.



In [19]:

    
surfaces=['Clay','Grass','Hard']
for surface in surfaces:
    wl[surface+' Win%']= wl[surface+' Wins']/(wl[surface+' Wins']+ wl[surface+' Losses'])
wl=wl[['Clay Win%','Grass Win%','Hard Win%']]
wl



In [20]:

    
fig, ax= plt.subplots()
wl.plot(ax=ax,
        kind='bar',
        color=['red','green','blue'],
        title= 'Win Percentage on Each Surface By Country')
ax.legend(loc=0)
ax.set_ylabel('Win Percentage')
ax.set_ylim(0.3,0.7)
ax.set_xticklabels( countries, rotation=45 )









    Out[20]:





[<matplotlib.text.Text at 0xaed78b1908>,
 <matplotlib.text.Text at 0xaed789c7b8>,
 <matplotlib.text.Text at 0xaed7934400>,
 <matplotlib.text.Text at 0xaed78a6f98>,
 <matplotlib.text.Text at 0xaed79345c0>,
 <matplotlib.text.Text at 0xaed7934ef0>,
 <matplotlib.text.Text at 0xaed7936940>]

Again, this graph supports certain popular opinions. Americans have a higher win percentage on hard courts than grass and clay. Spanish players are incredibly successful on clay courts. Australians, also known for their hard court play, have an incredibly poor record on clay, but do well on grass, a fast surface. Serbians seem to have a very high win percentage overall. However, that number is most likely skewed, as Serbian Novak Djokovic, the number 1 player in the world, is known to dominate his competition, pushing SRB's win percentage higher than normal.

Unforced Errors Over Time

One could assume that a consequence of being used to shorter points is that a player makes more unforced errors as a match goes on. However, unforced errors are not one of the variables that Jeff Sackmann provides in his dataset. We decided to use double faults, an unforced error on the serve, as a proxy. The hypothesis for this metric is that the longer a match lasts, the more errors a hard court player (American, Australian, Japanese) makes, while the effect is weaker for clay court players. Therefore, if we plot the length of the match on the x axis and the number of double faults on the y axis, we would expect the trend line to be steeper for hard court players than for clay court players.



In [20]:

    
df= atp[['winner_ioc','w_df','loser_ioc','l_df','minutes']] #dataframe of only the variables we are looking at
df.head()



In [21]:

    
# for matches that a player lost: his double faults alongside the length of the match
dfl = df[['loser_ioc','l_df','minutes']]
dfl=dfl.set_index('loser_ioc')
dfl=dfl.loc[countries]
dfl.dropna(inplace=True)
# rename columns to make it simpler to combine dataframes
dfl.index.names= ['ioc']
dfl=dfl.rename(columns={'l_df':'df'})
dfl.head()



In [22]:

    
# the same for matches in which the player won
dfw= df[['winner_ioc','w_df','minutes']]
dfw=dfw.set_index('winner_ioc')
dfw=dfw.loc[countries]
dfw.dropna(inplace=True)
# rename columns and index to make it easier to combine data
dfw.index.names= ['ioc']
dfw=dfw.rename(columns={'w_df':'df'})
dfw.head()



In [23]:

    
#combine dataframes
dfs = pd.concat([dfw, dfl])  # stack winner rows on top of loser rows
dfs.head()



In [24]:

    
# combine data and organize it by country by length of match
dfs=dfs.reset_index()
dfs=dfs.set_index(['ioc','minutes'])
dfs=dfs.sort_index()
dfs.head()



In [25]:

    
fig, ax= plt.subplots(7, figsize=(10,20), sharex=True)
for axis, country in zip(ax, countries):
    # after .xs(country) only 'minutes' is left in the index, so reset_index() turns it back into a column
    dfs.xs(country).reset_index().plot.scatter('minutes','df', ax=axis, title=country)

Examining these scatter plots reveals no obvious relationship between minutes played and number of double faults for any of the seven countries, and no visible difference in trend between hard court and clay court countries. This is likely because best of 3 set matches are relatively short, so players are unlikely to tire enough for it to show. It is also possible that the number of double faults is not an accurate proxy for unforced errors.
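A visual check could be complemented by a number for each country: the slope of a least-squares line and the correlation coefficient. The sketch below shows the idea on made-up rows shaped like the combined double-fault data (country code, minutes, double faults). The values are assumptions for illustration, and this calculation was not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows: one row per player-match.
toy = pd.DataFrame({
    'ioc':     ['USA', 'USA', 'USA', 'ESP', 'ESP', 'ESP'],
    'minutes': [60.0, 90.0, 120.0, 60.0, 90.0, 120.0],
    'df':      [1.0, 2.0, 3.0, 2.0, 3.0, 2.0],
})

slopes = {}
correlations = {}
for country, grp in toy.groupby('ioc'):
    # Degree-1 polynomial fit: double faults as a linear function of minutes.
    slope, intercept = np.polyfit(grp['minutes'], grp['df'], 1)
    slopes[country] = slope
    # Pearson correlation between match length and double faults.
    correlations[country] = grp['minutes'].corr(grp['df'])

print(slopes)
print(correlations)
```

Under the hypothesis above, hard court countries would show a clearly larger slope than clay court countries; similar slopes near zero would match what the scatter plots suggest.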

Break Points Saved

Now we look at our last metric: break points saved. A break point in tennis is a point on which the player returning serve can win the game by winning that point. In tennis, certain points throughout a match matter much more than others. It's not at all uncommon to see a relatively even score in terms of total points won, yet one player wins the match pretty handily. Especially in the men's game, a couple of break points, and therefore a player's ability to save them, is usually the difference between winning and losing. What we see at the top of the game, in players like Andy Murray, Novak Djokovic, and Rafael Nadal, to name a few, is that they face fewer break points than other players and, most importantly, save most of the break points they do face on their service games.

The intuition behind investigating the differences among nationalities is that certain countries have players where service games are much more important to their success. For example, US players tend to play with big serves, so holding serve and saving break points should be very important to their success. On the other hand, Spanish players tend to rely less on big serves and more on their baseline game and ability to break their opponents' serve. If the BP Ratio is more important to US players as opposed to Spanish players, then we would expect a larger difference in BP Ratio between winners and losers for the US versus for Spain. It is, however, entirely possible that this metric is so important to success that no matter the nationality, winners have a certain BP Ratio and losers have a certain BP Ratio. It's also possible that there are simply no differences among nationalities.

In this final portion of the project, we're interested in investigating how important this Break Point Ratio is (we'll call it BP Ratio and it's just equal to total break points saved/total break points faced) to match success and how it varies across both winners and losers and players from different countries. Let's take a look.
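Because the BP Ratio is defined on totals, it is worth seeing how that differs from averaging a ratio computed match by match. The sketch below uses three made-up matches (an assumption for illustration) and shows that dividing the mean of break points saved by the mean of break points faced gives the same value as dividing the totals, while the mean of per-match ratios can be quite different.

```python
import pandas as pd

# Made-up matches: break points faced and saved by one player.
toy = pd.DataFrame({
    'bpFaced': [10.0, 2.0, 0.0],
    'bpSaved': [5.0, 2.0, 0.0],
})

# Definition used here: total saved / total faced.
ratio_of_totals = toy['bpSaved'].sum() / toy['bpFaced'].sum()

# Dividing the two column means gives the same number (the match count cancels).
ratio_of_means = toy['bpSaved'].mean() / toy['bpFaced'].mean()

# Averaging per-match ratios is a different statistic: a match with no break
# points faced yields NaN (0/0) and is skipped by mean().
mean_of_ratios = (toy['bpSaved'] / toy['bpFaced']).mean()

print(ratio_of_totals, ratio_of_means, mean_of_ratios)
```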

First, let's start off by extracting the columns/variables we're interested in from the atp dataframe. Then, we will organize the data by country and compute some simple mean statistics for the resulting dataframe. Lastly, we'll add a new variable for BP Ratio to the dataframe.

As always, the data from our source is organized in terms of winners and losers by match. We'll first extract and organize the data for winners and then do the same for losers. Once we have both, we can combine the dataframes into one so we have all the data together organized by country. And lastly, we can run some simple plot methods on the dataframe to support our interpretations of the data with some visual representations.



In [26]:

    
varlist = ['winner_ioc','w_bpFaced','w_bpSaved',] ## list of variables we want



In [27]:

    
countries=['USA','FRA','ESP','SRB','JPN','AUS','ARG']

W_BPdf = atp[varlist] ## this dataframe is break point statistics for winners
W_BPdf.columns = ('Country','w_bpFaced','w_bpSaved')
W_BPdf.head()



In [28]:

    
comparison = W_BPdf['Country'].isin(countries)
selection = W_BPdf[comparison] ## now we select only the entries that correspond to countries we're interested in 
selection.head()

This data is now organized by country; however, it's still organized with respect to individual matches. What we have now is for each match what country the winner is from and how many break points the winning player faced and saved. What we want is a dataframe organized by how many break points, on average, winning players from each country both faced and saved. To accomplish this, we'll use the .groupby method.



In [29]:

    
df = selection.groupby('Country')



In [30]:

    
dm = df.mean() ## computes the mean of each variable over all of a country's winning matches
dm



In [31]:

    
dm['w_bpratio'] = dm['w_bpSaved']/dm['w_bpFaced'] ## now we add our new variable, bp ratio, as a new column to dm 
dm

We now have a dataframe that is exactly what we want: the averages for break points faced, break points saved, and the break point ratio for each country. Let's do the exact same with the data on losing players:



In [32]:

    
var2list = ['loser_ioc','l_bpFaced','l_bpSaved']
L_BPdf = atp[var2list]
L_BPdf.columns = ('Country','l_bpFaced','l_bpSaved')
L_BPdf.head()



In [33]:

    
comparison2 = L_BPdf['Country'].isin(countries)
selection2 = L_BPdf[comparison2]
selection2.head()



In [34]:

    
df2 = selection2.groupby('Country')



In [35]:

    
dm2 = df2.mean()
dm2



In [36]:

    
dm2['l_bpratio'] = dm2['l_bpSaved']/dm2['l_bpFaced']
dm2



In [37]:

    
result = pd.concat([dm, dm2], axis=1) 
## This is a slightly different method than the merge method we learned in class, but essentially it combines the two dataframes side by side, aligned on their shared Country index.



In [38]:

    
result

We now have a dataframe that's a combination of the winner and loser data. To make things easier for plotting purposes, let's take only the bp ratio numbers and extract them to a new dataframe.



In [39]:

    
varlist3 = ['l_bpratio','w_bpratio'] 
result2 = result[varlist3]
result2



In [40]:

    
fig, ax= plt.subplots()
result2.plot(ax=ax,
        kind='bar',
        color=['red','blue'],
        title= 'Break Point Ratios by Country')
ax.legend(loc=2)
ax.set_ylabel('Percentage of Break Points Saved')
ax.set_xlabel('Country')
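ax.set_xticklabels(result2.index, rotation=45)  # result2 is indexed alphabetically by groupby, so label the bars from its own index rather than from the countries list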









    Out[40]:





[<matplotlib.text.Text at 0x11abfa978>,
 <matplotlib.text.Text at 0x11abfce80>,
 <matplotlib.text.Text at 0x118010b38>,
 <matplotlib.text.Text at 0x118014588>,
 <matplotlib.text.Text at 0x118014f98>,
 <matplotlib.text.Text at 0x1180189e8>,
 <matplotlib.text.Text at 0x11801c438>]

Our results indicate that the BP Ratio does in fact matter quite a bit to a player's success. Across the board, winners save a larger share of the break points they face than losers do: for most countries, the gap between winners and losers is roughly 12 to 17 percentage points. There is some variation among nationalities (in the table of ratios, the gap is smallest for Serbian players, at about 6 percentage points), but for the most part the differences between winning and losing players are large and consistent across nationalities.

From this, we can conclude that, at least for the years 2013 through 2015 and the countries represented in this particular analysis, the "BP Ratio" as we have defined it in this project is a pretty significant factor in determining a player's success in a match. We cannot conclude with any reasonable degree of certainty that there are patterns in the BP Ratio among different countries, other than that for the most part the ratios are relatively consistent.
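The size of the winner/loser gap can be read directly off the ratios reported for result2 in this notebook. The sketch below simply re-enters those seven pairs of values and subtracts them; the variable name gap is introduced here for illustration.

```python
import pandas as pd

# BP ratios for losers and winners, copied from the result2 table in this notebook.
result2 = pd.DataFrame(
    {
        'l_bpratio': [0.548594, 0.558945, 0.546695, 0.571429, 0.509021, 0.585675, 0.571556],
        'w_bpratio': [0.672302, 0.711089, 0.693509, 0.708659, 0.674884, 0.647337, 0.708388],
    },
    index=pd.Index(['ARG', 'AUS', 'ESP', 'FRA', 'JPN', 'SRB', 'USA'], name='Country'),
)

# Winner ratio minus loser ratio, sorted from smallest to largest gap.
gap = (result2['w_bpratio'] - result2['l_bpratio']).sort_values()
print(gap)
```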

Conclusion

The results of our analysis confirmed certain theories the tennis community has about players from certain countries. After analyzing the average length of a 2 out of 3 set match, we see that Spanish and Argentine players, who grow up playing on clay courts, have the longest matches on average. Conversely, players from Japan and the USA, who grow up on hard courts, play the shortest matches, with Australian players in the middle of the group. Our analysis of win percentage similarly confirms certain aspects of this theory. Spanish players have a higher win percentage on clay than on hard and grass. Americans have a higher win percentage on hard court than on clay or grass. Australian players have very little success on clay courts. Our comparison of double faults to length of match did not return any useful information about a player's endurance based on where he grew up playing. Lastly, having a higher ratio of break points saved to break points faced is an important factor for winning matches, an outcome consistent across the data for winners and losers and showing little variation among players from different countries.



In [ ]:

	minutes	minutesW	Average
Country
USA	91.939474	92.684848	92.312161
FRA	93.676208	91.388489	92.532348
ESP	99.770167	97.551601	98.660884
SRB	96.153153	94.032787	95.092970
JPN	85.294872	96.401869	90.848370
AUS	94.548837	93.511628	94.030233
ARG	100.454225	99.451505	99.952865

	Clay Win%	Grass Win%	Hard Win%
Country
USA	0.464286	0.461538	0.481259
FRA	0.468085	0.552448	0.515429
ESP	0.583151	0.530864	0.528883
SRB	0.561983	0.500000	0.672269
JPN	0.666667	0.523810	0.537234
AUS	0.307692	0.555556	0.457143
ARG	0.521197	0.333333	0.500000

	w_bpFaced	w_bpSaved
Country
ARG	4.851735	3.261830
AUS	4.615023	3.281690
ESP	4.709845	3.266321
FRA	4.356509	3.087278
JPN	4.702899	3.173913
SRB	3.722467	2.409692
USA	3.624703	2.567696

	w_bpFaced	w_bpSaved	w_bpratio
Country
ARG	4.851735	3.261830	0.672302
AUS	4.615023	3.281690	0.711089
ESP	4.709845	3.266321	0.693509
FRA	4.356509	3.087278	0.708659
JPN	4.702899	3.173913	0.674884
SRB	3.722467	2.409692	0.647337
USA	3.624703	2.567696	0.708388

	l_bpFaced	l_bpSaved
Country
ARG	8.032258	4.406452
AUS	7.298113	4.079245
ESP	7.893590	4.315385
FRA	7.618902	4.353659
JPN	7.185185	3.657407
SRB	8.110294	4.750000
USA	7.212446	4.122318

	winner_ioc
surface	Clay	Grass	Hard
winner_ioc
ALG	2.0	NaN	1.0
ARG	209.0	7.0	109.0
AUS	20.0	40.0	160.0
AUT	65.0	7.0	49.0
BAR	1.0	NaN	1.0

	Country	Clay Wins	Grass Wins	Hard Wins
0	USA	52.0	54.0	321.0
1	FRA	154.0	79.0	451.0
2	ESP	533.0	43.0	412.0
3	SRB	68.0	7.0	160.0
4	JPN	28.0	11.0	101.0
5	AUS	20.0	40.0	160.0
6	ARG	209.0	7.0	109.0

	winner_ioc	w_df	loser_ioc	l_df	minutes
0	ARG	1.0	FRA	4.0	84.0
1	ARG	1.0	AUT	2.0	125.0
2	ESP	3.0	CRO	3.0	115.0
3	ESP	12.0	BRA	3.0	171.0
4	NED	0.0	FRA	7.0	110.0

	l_bpratio	w_bpratio
Country
ARG	0.548594	0.672302
AUS	0.558945	0.711089
ESP	0.546695	0.693509
FRA	0.571429	0.708659
JPN	0.509021	0.674884
SRB	0.585675	0.647337
USA	0.571556	0.708388