FiveThirtyEight's "The New Science of Hitting" is a nice primer on how Statcast data offers a new glimpse into the game's inner workings. This notebook shows how to re-create their analysis and work with Statcast data.
In [1]:
#imports
from pybaseball import statcast
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
In [2]:
# collect Statcast data on all pitches from the months of May and June
data = statcast('2017-05-01', '2017-06-30')
print(data.shape)
In [3]:
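# keep rows with launch angle, exit velocity, and hit probability; copy so columns can be added safely later
data2 = data.dropna(subset=['launch_angle', 'launch_speed', 'estimated_ba_using_speedangle']).copy()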
In [4]:
data2.shape
Out[4]:
In [5]:
fig, ax = plt.subplots(figsize=(8, 8))
sns.despine(fig, left=True, bottom=True)
sns.scatterplot(x="launch_speed", y="launch_angle",
                hue="estimated_ba_using_speedangle",
                palette='viridis',
                data=data2, ax=ax)
ax.set_title("Hit probability by Launch Angle and Exit Velocity");
As you can see, the "sweet spot" where these two metrics are just right for producing a hit is not a simple blob on the graph as one might expect! In fact, there seem to be two distinct patterns happening here. Let's take a look at this same chart for home runs only and see if those are responsible for the differing patterns.
In [6]:
# flag batted balls that resulted in a home run
data2['hr'] = data2.events == 'home_run'
In [7]:
fig, ax = plt.subplots(figsize=(8, 8))
sns.despine(fig, left=True, bottom=True)
sns.scatterplot(x="launch_speed", y="launch_angle",
                hue="hr",
                palette='binary',
                data=data2, ax=ax)
ax.set_title("Home Runs by Launch Angle and Exit Velocity");
So there you have it. Of the two patterns observed when plotting hit probability against exit velocity and launch angle, the round cluster around 115 mph exit velocity and a 30-degree launch angle consists mostly of home runs, while the other pattern consists of hits that stay in the park.
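To put rough numbers on that split, one option is to compare median exit velocity and launch angle for home runs against all other batted balls. The sketch below assumes a frame with the `launch_speed`, `launch_angle`, and boolean `hr` columns built above. The helper name `summarize_by_hr` and the toy rows are hypothetical and only there to make the example runnable on its own.

```python
import pandas as pd

def summarize_by_hr(df: pd.DataFrame) -> pd.DataFrame:
    """Count and median exit velocity / launch angle, split by the boolean `hr` column."""
    return df.groupby("hr").agg(
        n=("launch_speed", "size"),
        median_speed=("launch_speed", "median"),
        median_angle=("launch_angle", "median"),
    )

# Toy rows invented purely to exercise the function (not real Statcast values).
toy = pd.DataFrame({
    "launch_speed": [105.0, 108.0, 85.0, 95.0],
    "launch_angle": [28.0, 30.0, 5.0, 12.0],
    "hr": [True, True, False, False],
})
toy_summary = summarize_by_hr(toy)
print(toy_summary)
```

In the notebook itself, the equivalent call would be `summarize_by_hr(data2)`.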
Next we will look at how a few metrics, most importantly expected wOBA (weighted on-base average), vary with hit speed. First, let's break hit speed into six evenly spaced bins and see how a few variables look when broken down by exit velocity.
In [8]:
# average every numeric column within six equal-width exit-velocity bins
data2.groupby(pd.cut(data2.launch_speed, 6)).mean(numeric_only=True)
Out[8]:
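For readers unfamiliar with the integer-bin form of `pd.cut` used above, here is a small sketch of what it does. The exit velocities are toy values invented for illustration, not Statcast data.

```python
import pandas as pd

# Toy exit velocities in mph, invented for illustration.
speeds = pd.Series([60.0, 75.0, 90.0, 105.0, 120.0])

# An integer `bins` argument splits the observed range into that many
# equal-width intervals; retbins=True also returns the computed bin edges.
# pandas nudges the lowest edge slightly below the minimum so that the
# minimum value still falls inside the first (left-open) interval.
binned, edges = pd.cut(speeds, 3, retbins=True)

# Count how many values landed in each interval, in bin order.
counts = binned.value_counts(sort=False)
print(edges)
print(counts)
```

Because the bins are equal in width rather than equal in count, sparsely populated bins at the extremes of exit velocity can produce noisy averages.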
There are definitely some patterns there. To take a closer look at the metric of interest, let's use a few more bins and graph the expected weighted on-base average of a batted ball against its exit velocity. This should give a rough answer as to whether it's always better to hit the ball harder.
In [9]:
groups = data2.groupby(pd.cut(data2.launch_speed, 30))
ax = groups['estimated_woba_using_speedangle'].mean().plot()
ax.set_xlabel('Launch Speed', fontsize=14)
ax.set_ylabel('Expected wOBA Value', fontsize=14);
So, while it's usually better to hit the ball harder, there is a slight downward "dip" in the graph. This most likely represents the fly ball zone, where the ball has been hit hard enough to clear the infield and gain some air, but not hard enough to clear the fences.
In general, however, this confirms the trend we would expect. Harder-hit balls tend to give batters more bases.
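One way to probe the fly-ball explanation for the dip would be to split each exit-velocity bin by launch-angle band and compare expected wOBA across bands. The sketch below is a rough starting point, not a definitive test. The helper name `woba_by_speed_and_angle`, the band edges (10, 25, and 50 degrees), and the band labels are assumptions chosen for illustration, and the toy rows are invented.

```python
import pandas as pd

def woba_by_speed_and_angle(df, speed_bins=10,
                            angle_edges=(-90, 10, 25, 50, 90),
                            angle_labels=("ground ball", "line drive", "fly ball", "pop-up")):
    """Mean expected wOBA per (exit-velocity bin, launch-angle band).

    The angle edges and labels are illustrative assumptions, not part of the data.
    """
    speed_bin = pd.cut(df["launch_speed"], speed_bins)
    angle_band = pd.cut(df["launch_angle"], list(angle_edges),
                        labels=list(angle_labels), include_lowest=True)
    return (
        df.groupby([speed_bin, angle_band], observed=True)["estimated_woba_using_speedangle"]
        .mean()
        .unstack()  # rows: exit-velocity bins, columns: launch-angle bands
    )

# Toy rows invented to exercise the function (not real Statcast values).
toy = pd.DataFrame({
    "launch_speed": [70.0, 70.0, 100.0, 100.0],
    "launch_angle": [5.0, 30.0, 5.0, 30.0],
    "estimated_woba_using_speedangle": [0.2, 0.1, 0.4, 1.2],
})
toy_table = woba_by_speed_and_angle(toy, speed_bins=2)
print(toy_table)
```

On the real data, `woba_by_speed_and_angle(data2, speed_bins=30)` would show whether the dip is concentrated in the fly-ball column at moderate exit velocities.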