In [1]:
import pandas as pd
import numpy as np
import chainladder as cl
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
cl.__version__
Out[1]:
In [2]:
cl.Chainladder()
Out[2]:
Let's start by creating a Triangle with Development patterns and a TailCurve. Recall that we can bundle these two estimators into a single Pipeline if we wish.
In [3]:
genins = cl.load_dataset('genins')
genins_dev = cl.Pipeline(
    [('dev', cl.Development()),
     ('tail', cl.TailCurve())]).fit_transform(genins)
We can now use the basic Chainladder estimator to estimate the ultimate_ values of our Triangle.
In [4]:
genins_model = cl.Chainladder().fit(genins_dev)
genins_model.ultimate_
Out[4]:
We can also view the ibnr_. Technically, the term IBNR is reserved for "incurred but not reported", but the chainladder models use it more generally to describe the amounts in the unfilled lower half of the triangle.
In [5]:
genins_model.ibnr_
Out[5]:
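As a quick sanity check (a sketch, not part of the original notebook), the ibnr_ should simply be the ultimate_ less the latest diagonal of the source triangle:

# ibnr_ is the unrealized portion of ultimate_; this difference should be ~0
(genins_model.ultimate_ - genins.latest_diagonal - genins_model.ibnr_).sum()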
It is often useful to see the completed Triangle, and this can be accomplished by inspecting the full_triangle_. As with most other estimator properties, the full_triangle_ is itself a Triangle and can be manipulated as such.
In [6]:
genins_model.full_triangle_.dev_to_val().cum_to_incr()
Out[6]:
Notice the calendar year of our ultimates. While ultimates will generally be realized before this date, the chainladder package picks the highest allowable date available for its ultimate_ valuation.
In [7]:
genins_model.full_triangle_.valuation_date
Out[7]:
Another useful property is the full_expectation_. Like the full_triangle_, it "squares" the Triangle, but replaces the known data with the expected values implied by the model.
In [8]:
genins_model.full_expectation_
Out[8]:
With some clever arithmetic, we can use these objects to give us other useful information. For example, we can retrospectively review the actual Triangle against its modeled expectation.
In [9]:
(genins_model.full_triangle_ - genins_model.full_expectation_).dropna()
Out[9]:
Getting comfortable with manipulating Triangles will greatly improve your ability to extract value out of the chainladder package. Here is another way of getting the same answer.
In [10]:
(genins - genins_model.full_expectation_).dropna()
Out[10]:
To see the IBNR expected to run off in the upcoming year...
In [11]:
cal_yr_ibnr = genins_model.full_triangle_.dev_to_val().cum_to_incr()
cal_yr_ibnr[cal_yr_ibnr.valuation=='2011']
Out[11]:
The BornhuetterFerguson estimator is another deterministic method having many of the same attributes as the Chainladder estimator. It comes with one assumption, the apriori. This is a scalar multiplier applied to an exposure vector to determine an apriori ultimate estimate for our model.
Since the CAS Loss Reserve Database has premium, we will use it as an example. Let's grab the paid loss and net earned premium for the commercial auto line of business.
In [12]:
comauto = cl.load_dataset('clrd').groupby('LOB').sum().loc['comauto'][['CumPaidLoss', 'EarnedPremNet']]
comauto
Out[12]:
Let's set an apriori loss ratio estimate of 75%.
In [13]:
bf_model = cl.BornhuetterFerguson(apriori=0.75)
The BornhuetterFerguson method, along with all other expected loss methods like CapeCod and Benktander, needs to take in an exposure vector. The exposure vector has to be a Triangle itself. The Triangle class supports single exposure vectors, and you can learn more about them in our previous Triangle tutorial.
The signature of the fit function for all estimators is:
estimator.fit(X, y=None, sample_weight=None)
This signature comes from sklearn, and while we don't use y, the sample_weight is used to include an exposure vector for our BornhuetterFerguson model.
In [14]:
bf_model.fit(X=comauto['CumPaidLoss'],
             sample_weight=comauto['EarnedPremNet'].latest_diagonal)
Out[14]:
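We can replicate the fitted ultimate by hand with the textbook Bornhuetter-Ferguson formula, ultimate = latest + (1 - 1/CDF) * apriori * exposure. This is a sketch for intuition, not part of the original notebook; it should agree with bf_model.ultimate_ up to rounding:

# Textbook BF sketch: latest losses plus the unreported share of expected losses
latest = comauto['CumPaidLoss'].latest_diagonal
cdf = cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ / latest  # CDF as an origin vector
expected = 0.75 * comauto['EarnedPremNet'].latest_diagonal  # apriori * exposure
bf_by_hand = latest + expected * (cdf - 1) / cdf  # (cdf - 1) / cdf == 1 - 1/CDF
(bf_by_hand - bf_model.ultimate_).sum()  # should be approximately zero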
Having an apriori that takes on only a constant for all origins can be limiting. This shouldn't stop the practitioner from exploiting the fact that the apriori can be embedded directly in the exposure vector itself, allowing full customization of the apriori.
In [15]:
b1 = cl.BornhuetterFerguson(apriori=0.75).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
b2 = cl.BornhuetterFerguson(apriori=1.0).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=0.75 * comauto['EarnedPremNet'].latest_diagonal)
b1.ultimate_ == b2.ultimate_
Out[15]:
Let's compare the results of our BornhuetterFerguson model to our Chainladder model.
In [16]:
cl_model = cl.Chainladder().fit(comauto['CumPaidLoss'])
bf_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('BF', 'CL')).plot(title='BF vs CL');
The Benktander method is similar to the BornhuetterFerguson method, but allows the specification of one additional assumption, n_iters. The Benktander method generalizes both the BornhuetterFerguson and the Chainladder estimators through this assumption. When n_iters=1, we get the BornhuetterFerguson estimator; when n_iters is sufficiently large, we converge to the Chainladder estimator.
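We can check the Chainladder limit directly; here n_iters=100 stands in for "sufficiently large" (a quick sketch, not part of the original notebook):

# A large n_iters drives the Benktander ultimate to the Chainladder ultimate
bk_limit = cl.Benktander(apriori=0.75, n_iters=100).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
(bk_limit.ultimate_ - cl_model.ultimate_).sum()  # should be approximately zero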
In [17]:
bk_model = cl.Benktander(apriori=0.75, n_iters=3)
Fitting the Benktander method looks identical to fitting the BornhuetterFerguson method.
In [18]:
bk_model.fit(X=comauto['CumPaidLoss'],
             sample_weight=comauto['EarnedPremNet'].latest_diagonal)
Out[18]:
Once fit, it has the usual properties we would expect: ultimate_, ibnr_, full_triangle_, and full_expectation_. Let's re-use our same Chainladder comparison with the Benktander model. Notice that even at n_iters=3 it already approximates the Chainladder far better than the BornhuetterFerguson method does.
In [19]:
bk_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('BK', 'CL')).plot(title='BK vs CL');
CapeCod is similar to the BornhuetterFerguson method, except its apriori is computed from the Triangle itself. Instead of specifying an apriori, you specify a decay and a trend. The decay is a rate that determines the weight given to earlier accident periods, and the trend is a trend rate along the origin axis to reflect possible systematic inflationary impacts on the apriori.
In [20]:
cc_model = cl.CapeCod(decay=1, trend=0).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
When you fit a CapeCod, you can see the apriori it computes given the decay and trend assumptions. Since it is an estimated parameter, the CapeCod attribute is called apriori_ with a trailing underscore.
In [21]:
cc_model.apriori_
Out[21]:
At a decay=1, each origin period gets the same apriori_. Let's verify that this apriori_ makes sense. It should be the latest diagonal of losses divided by the 'used up premium', where the 'used up premium' is the premium vector divided by the CDF.
In [22]:
latest_diagonal = comauto['CumPaidLoss'].latest_diagonal
cdf_as_origin_vector = (
    cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ /
    comauto['CumPaidLoss'].latest_diagonal)
used_up_premium = comauto['EarnedPremNet'].latest_diagonal / cdf_as_origin_vector
apriori_check = latest_diagonal.sum() / used_up_premium.sum()
apriori_check
Out[22]:
At a decay=0, the apriori_ for each origin period stands on its own.
In [23]:
cc_model = cl.CapeCod(decay=0, trend=0).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_
Out[23]:
Doing the same with our manually calculated apriori_ yields the same result.
In [24]:
latest_diagonal / used_up_premium
Out[24]:
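Both special cases fall out of a single blending scheme. Here is a rough numpy sketch; the decay ** |i - j| weighting is an assumption for illustration, but it reproduces the decay=1 and decay=0 results above:

import numpy as np

def capecod_apriori_sketch(latest, used_up_premium, decay):
    # Hypothetical illustration: when estimating origin i, weight origin j's
    # losses and used-up premium by decay ** |i - j|, then take the ratio.
    n = len(latest)
    weights = decay ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return (weights @ latest) / (weights @ used_up_premium)

With decay=1 every weight is 1, giving the single pooled ratio; with decay=0 the weight matrix collapses to the identity, giving each origin's own ratio.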
We can examine our apriori_ to see whether it exhibits any trending over time.
In [25]:
cc_model.apriori_.plot(title='Apriori by Accident Year');
There is not much trend, but let's judgmentally flatten it by modifying our trend assumption.
In [26]:
trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_.to_frame().merge(
    trended_cc_model.apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_original', '_trended')).plot(title='Original vs Trended Apriori');
Adding trend to the CapeCod method is intended to adjust the apriori_s to a common level. Once at a common level, the apriori_ can be estimated from multiple origin periods using the decay factor.
In [27]:
trended_cc_model = cl.CapeCod(decay=0.75, trend=0.01).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_.to_frame().merge(
    trended_cc_model.apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_untrended', '_trended')).plot(title='Original vs Trended and Blended Apriori');
Once estimated, it is necessary to detrend our apriori_s back to their untrended levels, and these are contained in the detrended_apriori_. It is the detrended_apriori_ that gets used in the calculation of ultimate_ losses.
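To see the trend factors that were actually applied, we can inspect the ratio of the two attributes (a quick check, not part of the original notebook):

# Each origin's implied trend factor: trended apriori over its detrended counterpart
trended_cc_model.apriori_ / trended_cc_model.detrended_apriori_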
In [28]:
cc_model.apriori_.to_frame().merge(
    trended_cc_model.detrended_apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_original', '_smoothed')).plot(title='Original vs Blended Apriori');
The detrended_apriori_ is a much smoother estimate of the initial expected ultimate_. With the detrended_apriori_ in hand, the CapeCod method behaves exactly like the BornhuetterFerguson model.
In [29]:
bf = cl.BornhuetterFerguson().fit(
    X=comauto['CumPaidLoss'],
    sample_weight=trended_cc_model.detrended_apriori_ * comauto['EarnedPremNet'].latest_diagonal)
print('BF vs CC Difference: ', bf.ultimate_.sum() - trended_cc_model.ultimate_.sum())
Let's now compare the CapeCod ultimate_ to the basic Chainladder method.
In [30]:
trended_cc_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('CC', 'CL')).plot(title='CC vs CL');
In summary:
- The expected loss methods have ultimate_, ibnr_, full_expectation_ and full_triangle_ attributes that are themselves Triangles. These can be manipulated in a variety of ways to gain additional insights from your model.
- An exposure vector is passed as a Triangle through the sample_weight argument of the fit method.
- The CapeCod method has the additional attributes apriori_ and detrended_apriori_ to accommodate the selection of its trend and decay assumptions.
Finally, these estimators work very well with the transformers discussed in previous tutorials. Let's go to town demonstrating the compositional nature of these estimators.
In [31]:
wkcomp = cl.load_dataset('clrd').groupby('LOB').sum().loc['wkcomp'][['CumPaidLoss', 'EarnedPremNet']]
patterns = cl.Pipeline([
    ('dev', cl.Development(average=['simple'] * 5 + ['volume'] * 4,
                           n_periods=7, drop_valuation='1995')),
    ('tail', cl.TailCurve(curve='inverse_power', extrap_periods=80))])
cc = cl.CapeCod(decay=0.8, trend=0.02).fit(
    X=patterns.fit_transform(wkcomp['CumPaidLoss']),
    sample_weight=wkcomp['EarnedPremNet'].latest_diagonal)
cc.ultimate_.plot(kind='bar', color='hotpink', alpha=0.7, legend=False,
                  title='Workers Compensation Industry Model\n'
                        'Cape Cod with InversePower Tail').set(
    xlabel='Accident Year',
    ylabel='Ultimates');