Development tutorial

Getting started

All exercises require chainladder v0.5.2 or later.


In [1]:
import chainladder as cl
cl.__version__


Out[1]:
'0.6.2'

Should you develop with Chain-Ladder?

The Chain Ladder method is based on the strong assumptions of independence across origin years and across valuation years. Mack developed tests to verify if these assumptions hold, and these tests have been implemented in chainladder.

You should verify that your data satisfies these tests at the required confidence level. If it does not, consider whether development would be better handled in another way, for example with an autoregressive model. Below is an example of how to test independence across valuation and origin years:


In [ ]:
raa = cl.load_dataset('raa')
print('Correlation across valuation years? ', raa.valuation_correlation(p_critical=.1, total=True).z_critical.values)
print('Correlation across origin years? ', raa.development_correlation(p_critical=.5).t_critical.values)

The above tests show that the raa triangle is independent in both cases, suggesting that Chain Ladder is indeed an appropriate method for developing it. We recommend reading Mack (1993) and Mack (1997) [refs] to ensure a proper understanding of the methodology and of the choice of p_critical.
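The development correlation test is built on rank correlations between adjacent columns of link ratios. As a rough sketch of the underlying idea (toy data, not chainladder's actual implementation), a Spearman rank correlation between two hypothetical columns of link ratios can be computed as:

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation between two equal-length samples (no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # ranks of b
    n = len(a)
    d = ra - rb
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Toy link ratios for two adjacent development periods
f1 = np.array([1.8, 2.1, 1.6, 1.9])
f2 = np.array([1.2, 1.1, 1.3, 1.15])
print(spearman(f1, f2))
```

A rank correlation near zero is consistent with independent development factors; Mack aggregates such statistics across all adjacent column pairs and compares the result against a critical value derived from p_critical.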

Mack (1997) differs from Mack (1993) in how it tests for correlation across valuation years. The former looks at the aggregate of all years, while the latter suggests checking independence for each valuation year separately and, if dependence does appear in one year, reducing the weight given to that year in the development process [how?]. To test each valuation year individually, one can run:


In [ ]:
# Setting total=False provides a year-by-year test
raa.valuation_correlation(p_critical=.1, total=False).z_critical

Please note that the tests are run across all four dimensions of the triangle; indeed, the output of each test is itself a triangle.

Estimator Basics

All development methods follow the sklearn estimator API. These estimators have a few properties that are worth getting used to.

You instantiate the estimator with your choice of assumptions. If you don't specify any, defaults are chosen for you.


In [2]:
cl.Development()


Out[2]:
Development(average='volume', drop=None, drop_high=None, drop_low=None,
      drop_valuation=None, n_periods=-1, sigma_interpolation='log-linear')

At this point, we've chosen an estimator and assumptions (even if just the defaults), but we have not yet shown our estimator a Triangle. The estimator is merely a set of instructions on how to fit development patterns; no patterns exist yet.

All estimators have a fit method that accepts a triangle. Let's fit a Triangle with a Development estimator, and assign the estimator to a variable so we can reference its attributes.


In [3]:
genins = cl.load_dataset('genins')
dev = cl.Development().fit(genins)

Now that we have fit a Development estimator, it has many additional properties that didn't exist before fitting. For example, we can view the ldf_.


In [4]:
dev.ldf_


Out[4]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

We can also view the cdf_.


In [5]:
dev.cdf_


Out[5]:
Origin 12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177

Notice these extra attributes have a trailing underscore. This is an sklearn API convention used to quickly distinguish attributes that are assumptions (i.e. that exist pre-fit) from those estimated from the data (which exist only post-fit).


In [6]:
print('Assumption parameter (no underscore):', dev.average)
print('Estimated parameter (underscore):\n',   dev.ldf_)


Assumption parameter (no underscore): volume
Estimated parameter (underscore):
           12-24     24-36     36-48     48-60     60-72     72-84     84-96    96-108   108-120
(All)  3.490607  1.747333  1.457413  1.173852  1.103824  1.086269  1.053874  1.076555  1.017725

Development Averaging

Now that we have a grounding in triangle manipulation and the basics of estimators, we can start getting more creative with customizing our Development factors.

The basic Development estimator uses a weighted regression through the origin for estimating parameters. Mack showed that using weighted regressions allows for:

  1. volume weighted average development patterns
  2. simple average development factors
  3. OLS regression estimate of development factor where the regression equation is Y = mX + 0

While he posited this framework in support of the MackChainladder stochastic method, it is an elegant formulation even for deterministic development pattern selection.
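To see the connection, here is a minimal numpy sketch (toy data, not chainladder's implementation) showing that each of the three averages is the slope of a weighted least-squares regression through the origin, with weights 1/X for volume, 1/X² for simple, and 1 for OLS:

```python
import numpy as np

# Toy cumulative losses at ages 12 and 24 for three origin years
x = np.array([100.0, 150.0, 200.0])   # age 12
y = np.array([180.0, 240.0, 310.0])   # age 24

def wls_through_origin(x, y, w):
    """Weighted least-squares slope for Y = b*X (no intercept)."""
    return np.sum(w * x * y) / np.sum(w * x ** 2)

volume = wls_through_origin(x, y, w=1 / x)        # equals sum(y) / sum(x)
simple = wls_through_origin(x, y, w=1 / x ** 2)   # equals mean(y / x)
ols    = wls_through_origin(x, y, w=np.ones_like(x))

assert np.isclose(volume, y.sum() / x.sum())
assert np.isclose(simple, (y / x).mean())
```

The only thing that changes between the three averages is the weight vector, which is what makes a single regression-based estimator able to support all of them.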


In [7]:
vol = cl.Development(average='volume').fit(genins).ldf_
vol


Out[7]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

In [8]:
sim = cl.Development(average='simple').fit(genins).ldf_
sim


Out[8]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5661 1.7456 1.4520 1.1810 1.1112 1.0848 1.0527 1.0748 1.0177

In most cases, estimator attributes are Triangles themselves and can be manipulated just like raw triangles.


In [9]:
print('LDF Type: ', type(vol))
print('Difference between volume and simple average:')
vol-sim


LDF Type:  <class 'chainladder.core.triangle.Triangle'>
Difference between volume and simple average:
Out[9]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) -0.0755 0.0018 0.0055 -0.0071 -0.0074 0.0015 0.0011 0.0018 0.0000

Choosing how you average your LDFs can be done independently for each age-to-age period. For example, we can use volume averaging on the first pattern, simple the second, regression the third, and then repeat the cycle as follows:


In [10]:
cl.Development(average=['volume', 'simple', 'regression']*3).fit(genins).ldf_


Out[10]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4619 1.1739 1.1112 1.0873 1.0539 1.0748 1.0177

As another example, we can use volume-weighting for the first and last three patterns, with simple averaging in between.


In [11]:
cl.Development(average=['volume']+['simple']*5+['volume']*3).fit(genins).ldf_


Out[11]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4906 1.7456 1.4520 1.1810 1.1112 1.0848 1.0539 1.0766 1.0177

Averaging Period

Development comes with an n_periods parameter that lets you restrict the fit to the latest n valuation periods when estimating development patterns. n_periods=-1 indicates that all available periods should be used.


In [12]:
cl.Development(n_periods=3).fit(genins).ldf_


Out[12]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.4604 1.8465 1.3920 1.1539 1.0849 1.0974 1.0539 1.0766 1.0177

The units of n_periods follow the origin_grain of your triangle.


In [13]:
dev = cl.Development(n_periods=5).fit(genins)
print('Using ' + str(dev.n_periods) + str(genins.origin_grain) + ' Avg')
dev.ldf_


Using 5Y Avg
Out[13]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.2448 1.7867 1.4682 1.1651 1.1038 1.0863 1.0539 1.0766 1.0177

Much like average, n_periods can also be set for each age-to-age period individually.


In [14]:
cl.Development(n_periods=[8,2,6,5,-1,2,-1,-1,5]).fit(genins).ldf_


Out[14]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5325 1.9502 1.4808 1.1651 1.1038 1.0825 1.0539 1.0766 1.0177

Note that if you request more periods than are available for any particular age-to-age period, all available periods will be used instead.


In [15]:
cl.Development(n_periods=[1,2,3,4,5,6,7,8,9]).fit(genins).ldf_ == \
cl.Development(n_periods=[1,2,3,4,5,4,3,2,1]).fit(genins).ldf_


Out[15]:
True

Even with n_periods, there are situations where you might want to be more surgical in your picks. For example, you could have a valuation period with bad data and wish to omit the entire diagonal from your averaging.


In [16]:
cl.Development(drop_valuation='2004').fit(genins).ldf_


Out[16]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.3797 1.7517 1.4426 1.1651 1.1038 1.0863 1.0539 1.0766 1.0177

Maybe you want to do Olympic averaging (i.e. excluding the high and low factor from each period).


In [17]:
cl.Development(drop_high=True, drop_low=True).fit(genins).ldf_


c:\users\jboga\onedrive\documents\github\chainladder-python\chainladder\development\base.py:155: UserWarning: drop_high and drop_low cannot be computed when less than three LDFs are present. Ignoring exclusions in some cases.
  warnings.warn('drop_high and drop_low cannot be computed '
Out[17]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.5201 1.7277 1.4351 1.1930 1.1018 1.0825 1.0573 1.0766 1.0177

Or maybe there is just a single outlier link-ratio that you don't think is indicative of future development. For this, you can specify the intersection of the origin period and the development age of the link-ratio's denominator to drop.


In [18]:
cl.Development(drop=('2004', 12)).fit(genins).ldf_


Out[18]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.3797 1.7473 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

If there is more than one troublesome outlier, you can also pass a list to the drop argument.


In [19]:
cl.Development(drop=[('2004', 12), ('2003', 24)]).fit(genins).ldf_


Out[19]:
Origin 12-24 24-36 36-48 48-60 60-72 72-84 84-96 96-108 108-120
(All) 3.3797 1.7517 1.4574 1.1739 1.1038 1.0863 1.0539 1.0766 1.0177

Transformers

In sklearn parlance, there are two types of estimators: Transformers (which Development is) and Predictors. The Development object is a means of creating development patterns, but is not itself a reserving model. Transformers come with the transform and fit_transform methods. These return a Triangle object, augmented with additional information for use in a subsequent IBNR model.


In [20]:
transformed_triangle = cl.Development().fit_transform(genins)
transformed_triangle


Out[20]:
Origin 12 24 36 48 60 72 84 96 108 120
2001 357,848 1,124,788 1,735,330 2,218,270 2,745,596 3,319,994 3,466,336 3,606,286 3,833,515 3,901,463
2002 352,118 1,236,139 2,170,033 3,353,322 3,799,067 4,120,063 4,647,867 4,914,039 5,339,085
2003 290,507 1,292,306 2,218,525 3,235,179 3,985,995 4,132,918 4,628,910 4,909,315
2004 310,608 1,418,858 2,195,047 3,757,447 4,029,929 4,381,982 4,588,268
2005 443,160 1,136,350 2,128,333 2,897,821 3,402,672 3,873,311
2006 396,132 1,333,217 2,180,715 2,985,752 3,691,712
2007 440,832 1,288,463 2,419,861 3,483,130
2008 359,480 1,421,128 2,864,498
2009 376,686 1,363,294
2010 344,014

Our transformed triangle behaves like our original genins triangle.


In [21]:
print(type(transformed_triangle))
transformed_triangle.latest_diagonal


<class 'chainladder.core.triangle.Triangle'>
Out[21]:
Origin values
2001 3,901,463
2002 5,339,085
2003 4,909,315
2004 4,588,268
2005 3,873,311
2006 3,691,712
2007 3,483,130
2008 2,864,498
2009 1,363,294
2010 344,014

However, it has other attributes that make it IBNR model-ready.


In [22]:
transformed_triangle.cdf_


Out[22]:
Origin 12-Ult 24-Ult 36-Ult 48-Ult 60-Ult 72-Ult 84-Ult 96-Ult 108-Ult
(All) 14.4466 4.1387 2.3686 1.6252 1.3845 1.2543 1.1547 1.0956 1.0177

fit_transform() is equivalent to calling fit and transform in succession on the same triangle. Again, this should feel familiar to the sklearn practitioner.


In [23]:
cl.Development().fit_transform(genins) == cl.Development().fit(genins).transform(genins)


Out[23]:
True

The reason you might want to use fit and transform separately is when you want to apply development patterns to a different triangle. For example, we can:

  1. Extract the commercial auto triangles from the clrd dataset
  2. Summarize to an industry level and fit a Development object
  3. We can then transform the individual company triangles with the industry development patterns

In [24]:
clrd = cl.load_dataset('clrd')
comauto = clrd[clrd['LOB']=='comauto']['CumPaidLoss']

comauto_industry = comauto.sum()
industry_dev = cl.Development().fit(comauto_industry)

industry_dev.transform(comauto)


Out[24]:
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (157, 1, 10, 10)
Index: [GRNAME, LOB]
Columns: [CumPaidLoss]

Working with multidimensional triangles

Several (though not all) of the estimators in chainladder can be fit to several triangles simultaneously. While this can be a convenient shorthand, it will apply the same assumptions to every triangle.


In [25]:
clrd = cl.load_dataset('clrd').groupby('LOB').sum()['CumPaidLoss']
print('Fitting to ' + str(len(clrd.index)) + ' industries simultaneously.')
cl.Development().fit_transform(clrd).cdf_


Fitting to 6 industries simultaneously.
Out[25]:
Triangle Summary
Valuation: 1997-12
Grain: OYDY
Shape: (6, 1, 10, 9)
Index: [LOB]
Columns: [CumPaidLoss]

For greater control, you can slice individual triangles out and fit separate patterns to each.


In [26]:
print(cl.Development(average='simple').fit(clrd.loc['wkcomp']))
print(cl.Development(n_periods=4).fit(clrd.loc['ppauto']))
print(cl.Development(average='regression', n_periods=6).fit(clrd.loc['comauto']))


Development(average='simple', drop=None, drop_high=None, drop_low=None,
      drop_valuation=None, n_periods=-1, sigma_interpolation='log-linear')
Development(average='volume', drop=None, drop_high=None, drop_low=None,
      drop_valuation=None, n_periods=4, sigma_interpolation='log-linear')
Development(average='regression', drop=None, drop_high=None, drop_low=None,
      drop_valuation=None, n_periods=6, sigma_interpolation='log-linear')