Applying deterministic methods

Getting started

All exercises rely on chainladder v0.5.2 and later.


In [1]:
import pandas as pd
import numpy as np
import chainladder as cl
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
cl.__version__


Out[1]:
'0.5.2'

Basic Chainladder

The basic chainladder method is entirely specified by its development pattern selections. For this reason, the Chainladder estimator takes no additional assumptions, i.e. no additional arguments.


In [2]:
cl.Chainladder()


Out[2]:
Chainladder()

Let's start by creating a Triangle with Development patterns and a TailCurve. Recall that we can bundle these two estimators into a single Pipeline if we wish.


In [3]:
genins = cl.load_dataset('genins')

genins_dev = cl.Pipeline(
    [('dev', cl.Development()),
     ('tail', cl.TailCurve())]).fit_transform(genins)

We can now use the basic Chainladder estimator to estimate ultimate_ values of our Triangle.


In [4]:
genins_model = cl.Chainladder().fit(genins_dev)
genins_model.ultimate_


Out[4]:
Origin values
2001 4,016,553
2002 5,594,009
2003 5,537,497
2004 5,454,190
2005 5,001,513
2006 5,261,947
2007 5,827,759
2008 6,984,945
2009 5,808,708
2010 5,116,430

We can also view the ibnr_. Technically, the term IBNR is reserved for incurred but not reported claims, but the chainladder models use it more generally to describe the amounts in the unfilled lower half of the triangle.


In [5]:
genins_model.ibnr_


Out[5]:
Origin values
2001 115,090
2002 254,924
2003 628,182
2004 865,922
2005 1,128,202
2006 1,570,235
2007 2,344,629
2008 4,120,447
2009 4,445,414
2010 4,772,416
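
Under the hood, ibnr_ is simply the ultimate_ less the latest diagonal of the Triangle. A minimal check, reusing the genins objects fit above:

ibnr_check = genins_model.ultimate_ - genins.latest_diagonal
ibnr_check  # should reproduce genins_model.ibnr_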

It is often useful to see the completed Triangle and this can be accomplished by inspecting the full_triangle_. As with most other estimator properties, the full_triangle_ is itself a Triangle and can be manipulated as such.


In [6]:
genins_model.full_triangle_.dev_to_val().cum_to_incr()


Out[6]:
Origin 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 2262
2001 357,848 766,940 610,542 482,940 527,326 574,398 146,342 139,950 227,229 67,948 ... 68,482
2002 352,118 884,021 933,894 1,183,289 445,745 320,996 527,804 266,172 425,046 ... 64,913 95,377
2003 290,507 1,001,799 926,219 1,016,654 750,816 146,923 495,992 280,405 ... 93,678 64,257 94,413
2004 310,608 1,108,250 776,189 1,562,400 272,482 352,053 206,286 ... 370,179 92,268 63,291 92,993
2005 443,160 693,190 991,983 769,488 504,851 470,639 ... 226,674 339,456 84,611 58,038 85,275
2006 396,132 937,085 847,498 805,037 705,960 ... 351,548 238,477 357,132 89,016 61,060 89,715
2007 440,832 847,631 1,131,398 1,063,269 ... 424,501 389,349 264,121 395,534 98,588 67,626 99,362
2008 359,480 1,061,648 1,443,370 ... 725,788 508,792 466,660 316,566 474,073 118,164 81,054 119,092
2009 376,686 986,608 ... 1,089,616 603,569 423,113 388,076 263,257 394,241 98,266 67,405 99,038
2010 344,014 ... 897,410 959,756 531,636 372,687 341,826 231,882 347,255 86,555 59,371 87,234

Notice the calendar year of our ultimates. While ultimates will generally be realized before this date, the chainladder package picks the highest allowable date available for its ultimate_ valuation.


In [7]:
genins_model.full_triangle_.valuation_date


Out[7]:
Timestamp('2262-01-01 00:00:00')

Another useful property is the full_expectation_. Like the full_triangle_, it "squares" the Triangle, but replaces the known data with the expected values implied by the model.


In [8]:
genins_model.full_expectation_


Out[8]:
Origin 12 24 36 48 60 72 84 96 108 120 132 9999
2001 270,061 942,678 1,647,172 2,400,610 2,817,960 3,110,531 3,378,874 3,560,909 3,833,515 3,901,463 3,948,071 4,016,553
2002 376,125 1,312,904 2,294,081 3,343,423 3,924,682 4,332,157 4,705,889 4,959,416 5,339,085 5,433,719 5,498,632 5,594,009
2003 372,325 1,299,641 2,270,905 3,309,647 3,885,035 4,288,393 4,658,349 4,909,315 5,285,148 5,378,826 5,443,084 5,537,497
2004 366,724 1,280,089 2,236,741 3,259,856 3,826,587 4,223,877 4,588,268 4,835,458 5,205,637 5,297,906 5,361,197 5,454,190
2005 336,287 1,173,846 2,051,100 2,989,300 3,508,995 3,873,311 4,207,459 4,434,133 4,773,589 4,858,200 4,916,237 5,001,513
2006 353,798 1,234,970 2,157,903 3,144,956 3,691,712 4,074,999 4,426,546 4,665,023 5,022,155 5,111,171 5,172,231 5,261,947
2007 391,842 1,367,765 2,389,941 3,483,130 4,088,678 4,513,179 4,902,528 5,166,649 5,562,182 5,660,771 5,728,396 5,827,759
2008 469,648 1,639,355 2,864,498 4,174,756 4,900,545 5,409,337 5,875,997 6,192,562 6,666,635 6,784,799 6,865,853 6,984,945
2009 390,561 1,363,294 2,382,128 3,471,744 4,075,313 4,498,426 4,886,502 5,149,760 5,544,000 5,642,266 5,709,671 5,808,708
2010 344,014 1,200,818 2,098,228 3,057,984 3,589,620 3,962,307 4,304,132 4,536,015 4,883,270 4,969,825 5,029,196 5,116,430

With some clever arithmetic, we can use these objects to give us other useful information. For example, we can retrospectively review the actual Triangle against its modeled expectation.


In [9]:
(genins_model.full_triangle_ - genins_model.full_expectation_).dropna()


Out[9]:
Origin 12 24 36 48 60 72 84 96 108
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439
2004 -56,116 138,769 -41,694 497,591 203,342 158,105
2005 106,873 -37,496 77,233 -91,479 -106,323
2006 42,334 98,247 22,812 -159,204
2007 48,990 -79,302 29,920
2008 -110,168 -218,227
2009 -13,875
2010 -0

Getting comfortable with manipulating Triangles will greatly improve your ability to extract value out of the chainladder package. Here is another way of getting the same answer.


In [10]:
(genins - genins_model.full_expectation_).dropna()


Out[10]:
Origin 12 24 36 48 60 72 84 96 108
2001 87,787 182,110 88,158 -182,340 -72,364 209,463 87,462 45,377 0
2002 -24,007 -76,765 -124,048 9,899 -125,615 -212,094 -58,022 -45,377
2003 -81,818 -7,335 -52,380 -74,468 100,960 -155,475 -29,439
2004 -56,116 138,769 -41,694 497,591 203,342 158,105
2005 106,873 -37,496 77,233 -91,479 -106,323
2006 42,334 98,247 22,812 -159,204
2007 48,990 -79,302 29,920
2008 -110,168 -218,227
2009 -13,875
2010 -0

To see the IBNR expected to run off in the upcoming calendar year...


In [11]:
cal_yr_ibnr = genins_model.full_triangle_.dev_to_val().cum_to_incr()
cal_yr_ibnr[cal_yr_ibnr.valuation=='2011']


Out[11]:
Origin 2011
2001 46,608
2002 94,634
2003 375,833
2004 247,190
2005 334,148
2006 383,287
2007 605,548
2008 1,310,258
2009 1,018,834
2010 856,804

The BornhuetterFerguson method

The BornhuetterFerguson estimator is another deterministic method sharing many of the same attributes as the Chainladder estimator. It comes with one assumption, the apriori. This is a scalar multiplier applied to an exposure vector to determine an apriori ultimate estimate for our model.

Since the CAS Loss Reserve Database has premium, we will use it as an example. Let's grab the paid loss and net earned premium for the commercial auto line of business.


In [12]:
comauto = cl.load_dataset('clrd').groupby('LOB').sum().loc['comauto'][['CumPaidLoss', 'EarnedPremNet']]
comauto


Out[12]:
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (1, 2, 10, 10)
Index: [LOB]
Columns: [CumPaidLoss, EarnedPremNet]

Let's set an apriori loss ratio estimate of 75%.


In [13]:
bf_model = cl.BornhuetterFerguson(apriori=0.75)

The BornhuetterFerguson method, along with the other expected loss methods like CapeCod and Benktander, needs to take in an exposure vector. The exposure vector must itself be a Triangle. The Triangle class supports single exposure vectors, and you can learn more about them in our previous Triangle tutorial.

The signature of the fit function for all estimators is:

estimator.fit(X, y=None, sample_weight=None)

This signature comes from sklearn. While we don't use y, the sample_weight argument is how we pass an exposure vector to our BornhuetterFerguson model.


In [14]:
bf_model.fit(X=comauto['CumPaidLoss'],
             sample_weight=comauto['EarnedPremNet'].latest_diagonal)


Out[14]:
BornhuetterFerguson(apriori=0.75)
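
Under the hood, the BornhuetterFerguson ultimate is the latest loss diagonal plus the apriori times the expected unreported portion of the exposure, i.e. ultimate = latest + apriori * premium * (1 - 1/CDF). Here is a minimal sketch of that arithmetic, assuming the default development patterns with no tail and the comauto objects above:

cdf = (cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ /
       comauto['CumPaidLoss'].latest_diagonal)
premium = comauto['EarnedPremNet'].latest_diagonal
# apriori losses expected to emerge beyond the latest diagonal
unreported = 0.75 * (premium - premium / cdf)
bf_check = comauto['CumPaidLoss'].latest_diagonal + unreported
bf_check  # should reproduce bf_model.ultimate_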

Having an apriori that takes on only a single constant for all origin periods can be limiting. This shouldn't stop the practitioner from exploiting the fact that the apriori can be embedded directly in the exposure vector itself, allowing full customization of the apriori.


In [15]:
b1 = cl.BornhuetterFerguson(apriori=0.75).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)

b2 = cl.BornhuetterFerguson(apriori=1.0).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=0.75*comauto['EarnedPremNet'].latest_diagonal)

b1.ultimate_ == b2.ultimate_


Out[15]:
True

Let's compare the results of our BornhuetterFerguson model to our Chainladder model.


In [16]:
cl_model = cl.Chainladder().fit(comauto['CumPaidLoss'])
bf_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('BF', 'CL')).plot(title='BF vs CL');


The Benktander method

The Benktander method is similar to the BornhuetterFerguson method, but allows the specification of one additional assumption, n_iters. Through this assumption, the Benktander method generalizes both the BornhuetterFerguson and the Chainladder estimators:

  • when n_iters=1, we get the BornhuetterFerguson estimator
  • when n_iters is sufficiently large, we converge to the Chainladder estimator, as the sketch below demonstrates
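
A quick numerical sketch of this convergence, reusing the comauto data from the previous section (this loop is illustrative and not part of the original workflow):

cl_ult = cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_.sum()
for n in [1, 2, 3, 10]:
    bk = cl.Benktander(apriori=0.75, n_iters=n).fit(
        X=comauto['CumPaidLoss'],
        sample_weight=comauto['EarnedPremNet'].latest_diagonal)
    # the difference shrinks toward zero as n_iters grows
    print(n, cl_ult - bk.ultimate_.sum())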

In [17]:
bk_model = cl.Benktander(apriori=0.75, n_iters=3)

Fitting the Benktander method looks identical to fitting the BornhuetterFerguson method.


In [18]:
bk_model.fit(X=comauto['CumPaidLoss'],
             sample_weight=comauto['EarnedPremNet'].latest_diagonal)


Out[18]:
Benktander(apriori=0.75, n_iters=3)

Once fit, it has the usual properties we would expect - ultimate_, ibnr_, full_triangle_, and full_expectation_. Let's reuse our Chainladder comparison, this time with the Benktander model. Notice that even at n_iters=3, it already approximates the Chainladder far better than the BornhuetterFerguson method does.


In [19]:
bk_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('BK', 'CL')).plot(title='Benktander vs CL');


The CapeCod method

CapeCod is similar to the BornhuetterFerguson method, except its apriori is computed from the Triangle itself. Instead of specifying an apriori, you specify a decay and a trend.

  • decay is a rate between 0 and 1 that controls how much weight is given to other accident periods when estimating each origin's apriori
  • trend is a trend rate along the origin axis to reflect possible systematic inflationary impacts on the apriori

In [20]:
cc_model = cl.CapeCod(decay=1, trend=0).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)

When you fit a CapeCod, you can see the apriori it computes given the decay and trend assumptions. Since it is an estimated parameter, the CapeCod attribute is called the apriori_ with a trailing underscore.


In [21]:
cc_model.apriori_


Out[21]:
Origin CumPaidLoss
1988 0.6857
1989 0.6857
1990 0.6857
1991 0.6857
1992 0.6857
1993 0.6857
1994 0.6857
1995 0.6857
1996 0.6857
1997 0.6857

At decay=1, each origin period gets the same apriori_. Let's verify that this apriori_ makes sense. It should be the sum of the latest paid losses divided by the sum of the 'used up premium', where 'used up premium' is the premium vector divided by the CDF.


In [22]:
latest_diagonal = comauto['CumPaidLoss'].latest_diagonal

cdf_as_origin_vector = (
    cl.Chainladder().fit(comauto['CumPaidLoss']).ultimate_ / 
    comauto['CumPaidLoss'].latest_diagonal)

used_up_premium = comauto['EarnedPremNet'].latest_diagonal / cdf_as_origin_vector

apriori_check = latest_diagonal.sum() / used_up_premium.sum()
apriori_check


Out[22]:
0.6856862224535671

At decay=0, the apriori_ for each origin period stands on its own.


In [23]:
cc_model = cl.CapeCod(decay=0, trend=0).fit(X=comauto['CumPaidLoss'],
             sample_weight=comauto['EarnedPremNet'].latest_diagonal)
cc_model.apriori_


Out[23]:
Origin CumPaidLoss
1988 0.6853
1989 0.7041
1990 0.6903
1991 0.6478
1992 0.6518
1993 0.6815
1994 0.6925
1995 0.7004
1996 0.7039
1997 0.7619

Computing the same ratio origin by origin with our manually calculated quantities yields the same result.


In [24]:
latest_diagonal / used_up_premium


Out[24]:
Origin CumPaidLoss
1988 0.6853
1989 0.7041
1990 0.6903
1991 0.6478
1992 0.6518
1993 0.6815
1994 0.6925
1995 0.7004
1996 0.7039
1997 0.7619

We can examine our apriori_ for whether it exhibits any trending over time.


In [25]:
cc_model.apriori_.plot(title='Apriori by Accident Year');


There is not much trend, but let's judgmentally flatten it by modifying our trend assumption.


In [26]:
trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)

cc_model.apriori_.to_frame().merge(
    trended_cc_model.apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_original', '_trended')).plot(title='Original vs Trended Apriori');


Adding trend to the CapeCod method is intended to adjust apriori_s to a common level. Once at a common level, the apriori_ can be estimated from multiple origin periods using the decay factor.
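
Conceptually, each origin's apriori_ becomes a weighted average of every origin's trended loss ratio, with weights that decay geometrically as origin periods get farther apart. The sketch below illustrates the weighting idea only; it ignores trend, reuses latest_diagonal and used_up_premium from the verification earlier, and is not chainladder's exact internal code:

decay = 0.75
lat = latest_diagonal.to_frame().values.flatten()
prem = used_up_premium.to_frame().values.flatten()
n = len(lat)
# weight of origin j in origin i's apriori decays with |i - j|
w = decay ** np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
manual_apriori = (w @ lat) / (w @ prem)
manual_apriori  # decay=1 recovers the pooled apriori; decay=0 the per-origin ones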


In [27]:
trended_cc_model = cl.CapeCod(decay=0.75, trend=0.01).fit(
    X=comauto['CumPaidLoss'],
    sample_weight=comauto['EarnedPremNet'].latest_diagonal)

cc_model.apriori_.to_frame().merge(
    trended_cc_model.apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_untrended', '_trended')).plot(title='Original vs Trended and Blended Apriori');


Once estimated, it is necessary to detrend our apriori_s back to their untrended levels, and these detrended values are contained in detrended_apriori_. It is the detrended_apriori_ that gets used in the calculation of ultimate_ losses.


In [28]:
cc_model.apriori_.to_frame().merge(
    trended_cc_model.detrended_apriori_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('_original', '_smoothed')).plot(title='Original vs Blended Apriori');
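
The two attributes differ only by the trend adjustment; dividing one by the other makes the implied trend factors visible. A quick, hedged check on the fitted model above:

# ratios should follow (1 + trend) ** k for each origin's offset k
trended_cc_model.apriori_ / trended_cc_model.detrended_apriori_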


The detrended_apriori_ is a much smoother estimate of the initial expected ultimate_. With the detrended_apriori_ in hand, the CapeCod method behaves exactly like the BornhuetterFerguson model.


In [29]:
bf = cl.BornhuetterFerguson().fit(
    X=comauto['CumPaidLoss'],
    sample_weight=trended_cc_model.detrended_apriori_* comauto['EarnedPremNet'].latest_diagonal)
print('BF vs CC Difference: ', bf.ultimate_.sum() - trended_cc_model.ultimate_.sum())


BF vs CC Difference:  0.0

Let's now compare the CapeCod ultimate_ to the basic Chainladder method.


In [30]:
trended_cc_model.ultimate_.to_frame().merge(
    cl_model.ultimate_.to_frame(),
    left_index=True, right_index=True,
    suffixes=('CC', 'CL')).plot(title='CC vs CL');


Recap

  • All the deterministic estimators have ultimate_, ibnr_, full_expectation_ and full_triangle_ attributes that are themselves Triangles. These can be manipulated in a variety of ways to gain additional insights from your model.
  • The expected loss methods take in an exposure vector, which is itself a Triangle, through the sample_weight argument of the fit method.
  • The CapeCod method has the additional attributes apriori_ and detrended_apriori_ to accommodate the selection of its trend and decay assumptions.

Finally, these estimators work very well with the transformers discussed in previous tutorials. Let's go to town demonstrating the compositional nature of these estimators.


In [31]:
wkcomp = cl.load_dataset('clrd').groupby('LOB').sum().loc['wkcomp'][['CumPaidLoss', 'EarnedPremNet']]

patterns = cl.Pipeline([
    ('dev', cl.Development(average=['simple']*5+['volume']*4,
                           n_periods=7, drop_valuation='1995')),
    ('tail', cl.TailCurve(curve='inverse_power', extrap_periods=80)),])

cc = cl.CapeCod(decay=0.8, trend=0.02).fit(
        X=patterns.fit_transform(wkcomp['CumPaidLoss']),
        sample_weight=wkcomp['EarnedPremNet'].latest_diagonal)

cc.ultimate_.plot(kind='bar', color='hotpink', alpha=0.7, legend=False,
                  title='Workers Compensation Industry Model\n' + \
                        'Cape Cod with InversePower Tail').set(
                  xlabel='Accident Year',
                  ylabel='Ultimates');