In [1]:
import pandas as pd
import numpy as np
import chainladder as cl
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
cl.__version__
Out[1]:
In [2]:
cl.Chainladder()
Out[2]:
Let's start by creating a Triangle with Development patterns and a TailCurve. Recall that we can bundle these two estimators into a single Pipeline if we wish.
In [3]:
genins = cl.load_dataset('genins')
genins_dev = cl.Pipeline(
[('dev', cl.Development()),
('tail', cl.TailCurve())]).fit_transform(genins)
We can now use the basic Chainladder estimator to estimate ultimate_ values of our Triangle.
In [4]:
genins_model = cl.Chainladder().fit(genins_dev)
genins_model.ultimate_
Out[4]:
We can also view the ibnr_. Technically, the term IBNR is reserved for incurred but not reported claims, but the chainladder models use it to generally describe the amounts in the lower half of the triangle.
In [5]:
genins_model.ibnr_
Out[5]:
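As a quick sanity check, we can tie ibnr_ back to its components (a sketch, assuming ibnr_ is simply ultimate_ less the latest observed diagonal):
# ibnr_ should equal ultimate_ minus the latest diagonal of the source Triangle
genins_model.ibnr_ == genins_model.ultimate_ - genins.latest_diagonal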
It is often useful to see the completed Triangle and this can be accomplished by inspecting the full_triangle_. As with most other estimator properties, the full_triangle_ is itself a Triangle and can be manipulated as such.
In [6]:
genins_model.full_triangle_.dev_to_val().cum_to_incr()
Out[6]:
Notice the calendar year of our ultimates. While ultimates will generally be realized before this date, the chainladder package picks the highest allowable date available for its ultimate_ valuation.
In [7]:
genins_model.full_triangle_.valuation_date
Out[7]:
Another useful property is the full_expectation_. Like the full_triangle_, it "squares" the Triangle, but replaces the known data with the expectation implied by the model.
In [8]:
genins_model.full_expectation_
Out[8]:
With some clever arithmetic, we can use these objects to give us other useful information. For example, we can retrospectively review the actual Triangle against its modeled expectation.
In [9]:
(genins_model.full_triangle_ - genins_model.full_expectation_).dropna()
Out[9]:
Getting comfortable with manipulating Triangles will greatly improve your ability to extract value out of the chainladder package. Here is another way of getting the same answer.
In [10]:
(genins - genins_model.full_expectation_).dropna()
Out[10]:
To see the IBNR expected to run off in the upcoming year...
In [11]:
cal_yr_ibnr = genins_model.full_triangle_.dev_to_val().cum_to_incr()
cal_yr_ibnr[cal_yr_ibnr.valuation=='2011']
Out[11]:
The BornhuetterFerguson estimator is another deterministic method having many of the same attributes as the Chainladder estimator. It comes with one assumption, the apriori. This is a scalar multiplier that is applied to an exposure vector to determine an apriori ultimate estimate for our model.
Since the CAS Loss Reserve Database has premium, we will use it as an example. Let's grab the paid loss and net earned premium for the commercial auto line of business.
In [12]:
comauto = cl.load_dataset('clrd').groupby('LOB').sum().loc['comauto'][['CumPaidLoss', 'EarnedPremNet']]
comauto
Out[12]:
Let's set an apriori Loss Ratio estimate of 75%.
In [13]:
bf_model = cl.BornhuetterFerguson(apriori=0.75)
The BornhuetterFerguson method, along with all other expected loss methods like CapeCod and Benktander, needs to take in an exposure vector. The exposure vector has to be a Triangle itself. The Triangle class supports single exposure vectors, and you can learn more about them in our previous Triangle tutorial.
The signature of the fit function for all estimators is:
estimator.fit(X, y=None, sample_weight=None)
This signature comes from sklearn. While we don't use y, the sample_weight argument is used to pass the exposure vector to our BornhuetterFerguson model.
In [14]:
bf_model.fit(X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
Out[14]:
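Under the hood, the BornhuetterFerguson ultimate is the latest loss plus the unreported portion of the apriori ultimate. Here is a minimal sketch verifying that arithmetic against the fitted model. It assumes the textbook formula ultimate = latest + apriori x premium x (1 - 1/CDF); names like bf_check and apriori_ultimate are ours, not part of the library.
# age-to-ultimate CDFs expressed as an origin vector (ultimate / latest)
cdf = cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ / comauto['CumPaidLoss'].latest_diagonal
# apriori ultimate: scalar apriori applied to the premium vector
apriori_ultimate = 0.75 * comauto['EarnedPremNet'].latest_diagonal
# BF ultimate = latest losses + unreported share of the apriori ultimate
bf_check = (comauto['CumPaidLoss'].latest_diagonal +
            apriori_ultimate - apriori_ultimate / cdf)
print('Difference: ', (bf_check - bf_model.ultimate_).sum())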
Having an apriori that takes on only a constant for all origins can be limiting. This shouldn't stop the practitioner from exploiting the fact that the apriori can be embedded directly in the exposure vector itself, allowing full customization of the apriori.
In [15]:
b1 = cl.BornhuetterFerguson(apriori=0.75).fit(
X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
b2 = cl.BornhuetterFerguson(apriori=1.0).fit(
X=comauto['CumPaidLoss'],
sample_weight=0.75*comauto['EarnedPremNet'].latest_diagonal)
b1.ultimate_ == b2.ultimate_
Out[15]:
Let's compare the results of our BornhuetterFerguson model to our Chainladder model.
In [16]:
cl_model = cl.Chainladder().fit(comauto['CumPaidLoss'])
bf_model.ultimate_.to_frame().merge(
cl_model.ultimate_.to_frame(),
left_index=True, right_index=True,
suffixes=('BF', 'CL')).plot(title='BF vs CL');
The Benktander method is similar to the BornhuetterFerguson method, but allows the specification of one additional assumption, n_iters. The Benktander method generalizes both the BornhuetterFerguson and the Chainladder estimator through this assumption.
With n_iters=1, we get the BornhuetterFerguson estimator; when n_iters is sufficiently large, we converge to the Chainladder estimator.
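We can verify both endpoints directly. This is a quick sketch reusing the data and models fit above; the variable names bk_bf and bk_cl are ours.
# n_iters=1 should reproduce the BornhuetterFerguson ultimate exactly
bk_bf = cl.Benktander(apriori=0.75, n_iters=1).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
print(bk_bf.ultimate_ == bf_model.ultimate_)
# a large n_iters should land very close to the Chainladder ultimate
bk_cl = cl.Benktander(apriori=0.75, n_iters=100).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
print((bk_cl.ultimate_ - cl_model.ultimate_).sum())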
In [17]:
bk_model = cl.Benktander(apriori=0.75, n_iters=3)
Fitting the Benktander method looks identical to the BornhuetterFerguson method.
In [18]:
bk_model.fit(X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
Out[18]:
Once fit, it has the usual properties we would expect - ultimate_, ibnr_, full_triangle_, and full_expectation_. Let's reuse the same Chainladder comparison with the Benktander model. Notice that even at n_iters=3, it already approximates the Chainladder far better than the BornhuetterFerguson method does.
In [19]:
bk_model.ultimate_.to_frame().merge(
cl_model.ultimate_.to_frame(),
left_index=True, right_index=True,
suffixes=('BK', 'CL')).plot(title='BK vs CL');
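For a quick numeric companion to the plot, we can compare total ultimates across the three methods (a small sketch reusing the fitted models):
print('Chainladder:         ', cl_model.ultimate_.sum())
print('BornhuetterFerguson: ', bf_model.ultimate_.sum())
print('Benktander (3 iters):', bk_model.ultimate_.sum())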
CapeCod is similar to the BornhuetterFerguson method, except that its apriori is computed from the Triangle itself. Instead of specifying an apriori, you specify a decay and a trend.
The decay is a rate that gives weight to earlier accident periods, while the trend is a trend rate along the origin axis to reflect possible systematic inflationary impacts on the apriori.
In [20]:
cc_model = cl.CapeCod(decay=1, trend=0).fit(
X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
When you fit a CapeCod, you can see the apriori it computes given the decay and trend assumptions. Since it is an estimated parameter, the CapeCod attribute is called the apriori_ with a trailing underscore.
In [21]:
cc_model.apriori_
Out[21]:
At a decay=1, each origin period gets the same apriori_. Let's verify that this apriori_ makes sense. It should be the latest diagonal of losses over the 'used up premium', where 'used up premium' is the premium vector divided by the CDF.
In [22]:
latest_diagonal = comauto['CumPaidLoss'].latest_diagonal
cdf_as_origin_vector = (
cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ /
comauto['CumPaidLoss'].latest_diagonal)
used_up_premium = comauto['EarnedPremNet'].latest_diagonal / cdf_as_origin_vector
apriori_check = latest_diagonal.sum() / used_up_premium.sum()
apriori_check
Out[22]:
At a decay=0, the apriori_ for each origin period stands on its own.
In [23]:
cc_model = cl.CapeCod(decay=0, trend=0).fit(X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_
Out[23]:
Doing the same on our manually calculated apriori_ yields the same result.
In [24]:
latest_diagonal / used_up_premium
Out[24]:
We can examine whether our apriori_ exhibits any trend over time.
In [25]:
cc_model.apriori_.plot(title='Apriori by Accident Year');
There is not much trend, but let's judgmentally flatten it by modifying our trend assumption.
In [26]:
trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_.to_frame().merge(
trended_cc_model.apriori_.to_frame(),
left_index=True, right_index=True,
suffixes=('_original', '_trended')).plot(title='Original vs Trended Apriori');
Adding trend to the CapeCod method is intended to adjust apriori_s to a common level. Once at a common level, the apriori_ can be estimated from multiple origin periods using the decay factor.
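Here is a rough numpy sketch of that idea (illustrative only; the exact weighting inside CapeCod may differ): each origin's apriori_ becomes a blend of all origins' trended losses over used-up premium, with weights decaying exponentially by origin distance.
import numpy as np
decay, n = 0.75, 10
# hypothetical weight matrix: influence of origin j on origin i's apriori
w = decay ** np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
# with trended losses L_ and used-up premium P_ (placeholder vectors here),
# blended apriori_i = (w @ L_)[i] / (w @ P_)[i]
L_, P_ = np.random.rand(n), np.ones(n)
blended_apriori = (w @ L_) / (w @ P_)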
In [27]:
trended_cc_model = cl.CapeCod(decay=0.75, trend=0.01).fit(
X=comauto['CumPaidLoss'],
sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_.to_frame().merge(
trended_cc_model.apriori_.to_frame(),
left_index=True, right_index=True,
suffixes=('_untrended', '_trended')).plot(title='Original vs Trended and Blended Apriori');
Once estimated, it is necessary to detrend our apriori_s back to their untrended levels, and these are contained in detrended_apriori_. It is the detrended_apriori_ that gets used in the calculation of ultimate_ losses.
In [28]:
cc_model.apriori_.to_frame().merge(
trended_cc_model.detrended_apriori_.to_frame(),
left_index=True, right_index=True,
suffixes=('_original', '_smoothed')).plot(title='Original vs Blended Apriori');
The detrended_apriori_ is a much smoother estimate of the initial expected ultimate_. With the detrended_apriori_ in hand, the CapeCod method behaves exactly like the BornhuetterFerguson model.
In [29]:
bf = cl.BornhuetterFerguson().fit(
X=comauto['CumPaidLoss'],
sample_weight=trended_cc_model.detrended_apriori_ * comauto['EarnedPremNet'].latest_diagonal)
print('BF vs CC Difference: ', bf.ultimate_.sum() - trended_cc_model.ultimate_.sum())
Let's now compare the CapeCod ultimate_ to the basic Chainladder method.
In [30]:
trended_cc_model.ultimate_.to_frame().merge(
cl_model.ultimate_.to_frame(),
left_index=True, right_index=True,
suffixes=('CC', 'CL')).plot(title='CC vs CL');
All of these estimators have ultimate_, ibnr_, full_expectation_ and full_triangle_ attributes that are themselves Triangles. These can be manipulated in a variety of ways to gain additional insights from your model. The expected loss methods additionally take an exposure vector, passed as a Triangle through the sample_weight argument of the fit method. The CapeCod method has the additional attributes apriori_ and detrended_apriori_ to accommodate the selection of its trend and decay assumptions. Finally, these estimators work very well with the transformers discussed in previous tutorials. Let's go to town demonstrating the compositional nature of these estimators.
In [31]:
wkcomp = cl.load_dataset('clrd').groupby('LOB').sum().loc['wkcomp'][['CumPaidLoss', 'EarnedPremNet']]
patterns = cl.Pipeline([
('dev', cl.Development(average=['simple']*5+['volume']*4,
n_periods=7, drop_valuation='1995')),
('tail', cl.TailCurve(curve='inverse_power', extrap_periods=80)),])
cc = cl.CapeCod(decay=0.8, trend=0.02).fit(
X=patterns.fit_transform(wkcomp['CumPaidLoss']),
sample_weight=wkcomp['EarnedPremNet'].latest_diagonal)
cc.ultimate_.plot(kind='bar', color='hotpink', alpha=0.7, legend=False,
title='Workers Compensation Industry Model\n' + \
'Cape Cod with InversePower Tail').set(
xlabel='Accident Year',
ylabel='Ultimates');