In [1]:
import chainladder as cl
cl.__version__
Out[1]:
The Chain Ladder method is based on the strong assumptions of independence across origin years and across valuation years. Mack developed tests to verify whether these assumptions hold, and these tests have been implemented in chainladder.
You should verify that your data satisfies these tests at the required confidence level. If it does not, consider whether development would be better done in other ways, for example using an AR model instead. Below is an example of how to test independence across valuation and origin years.
In [ ]:
raa = cl.load_dataset('raa')
print('Correlation across valuation years? ', raa.valuation_correlation(p_critical=.1, total=True).z_critical.values)
print('Correlation across origin years? ', raa.development_correlation(p_critical=.5).t_critical.values)
The above tests show that the raa triangle is independent in both cases, suggesting that Chain Ladder is indeed an appropriate method for developing it.
It is suggested to read Mack (1993) and Mack (1997) [refs] to ensure a proper understanding of the methodology and of the choice of p_critical.
Mack (1997) differs from Mack (1993) in how it tests for valuation year correlation. The former looks at the aggregate of all years, while the latter suggests checking independence for each valuation year and, if dependence does appear in one year, reducing the weight of that year in the development process [how?]. To test each valuation year individually, one can run:
In [ ]:
# Setting total=False provides a year-by-year test
raa.valuation_correlation(p_critical=.1, total=False).z_critical
Please note that the tests are run on all four dimensions of the triangle, and indeed the output of the test is itself a triangle.
All development methods follow the sklearn estimator API. These estimators have a few properties that are worth getting used to.
You instantiate the estimator with your choice of assumptions. If you don't specify any, defaults are chosen for you.
In [2]:
cl.Development()
Out[2]:
At this point, we've chosen an estimator and assumptions (even if default), but we have not shown our estimator a Triangle. The estimator is merely instructions on how to fit development patterns; no patterns exist as of yet.
All estimators have a fit method to which you can pass a triangle. Let's fit a Triangle with a Development estimator, and assign the estimator to a variable so we can reference its attributes.
In [3]:
genins = cl.load_dataset('genins')
dev = cl.Development().fit(genins)
Now that we have fit a Development estimator, it has many additional properties that didn't exist before fitting. For example, we can view the ldf_:
In [4]:
dev.ldf_
Out[4]:
We can view the cdf_
In [5]:
dev.cdf_
Out[5]:
Notice these extra attributes have a trailing underscore. This is a sklearn API convention used to quickly distinguish between attributes that are assumptions (i.e. that exist pre-fit) and those that are estimated from the data (and only exist post-fit).
In [6]:
print('Assumption parameter (no underscore):', dev.average)
print('Estimated parameter (underscore):\n', dev.ldf_)
Now that we have a grounding in triangle manipulation and the basics of estimators, we can start getting more creative with customizing our Development factors.
The basic Development estimator uses a weighted regression through the origin for estimating parameters. Mack showed that using weighted regressions allows for:
- volume weighted average development patterns
- simple average development factors
- regression estimates of the development factor, where the regression equation is Y = mX + 0
While he posited this framework to suggest the MackChainladder stochastic method, it is an elegant form even for deterministic development pattern selection.
In [7]:
vol = cl.Development(average='volume').fit(genins).ldf_
vol
Out[7]:
In [8]:
sim = cl.Development(average='simple').fit(genins).ldf_
sim
Out[8]:
In most cases, estimator attributes are Triangles themselves and can be manipulated just like raw triangles.
In [9]:
print('LDF Type: ', type(vol))
print('Difference between volume and simple average:')
vol-sim
Out[9]:
Choosing how you average your LDFs can be done independently for each age-to-age period. For example, we can use volume averaging on the first pattern, simple on the second, regression on the third, and then repeat the cycle as follows:
In [10]:
cl.Development(average=['volume', 'simple', 'regression']*3).fit(genins).ldf_
Out[10]:
Another example uses volume-weighting for the first pattern and the last three, with simple averaging in between.
In [11]:
cl.Development(average=['volume']+['simple']*5+['volume']*3).fit(genins).ldf_
Out[11]:
Development comes with an n_periods parameter that allows you to select the latest n valuation periods for fitting your development patterns. n_periods=-1 indicates the use of all available periods.
In [12]:
cl.Development(n_periods=3).fit(genins).ldf_
Out[12]:
The units of n_periods follow the origin_grain of your triangle.
In [13]:
dev = cl.Development(n_periods=5).fit(genins)
print('Using ' + str(dev.n_periods) + str(genins.origin_grain) + ' Avg')
dev.ldf_
Out[13]:
Much like average, n_periods can also be set for each age-to-age period individually.
In [14]:
cl.Development(n_periods=[8,2,6,5,-1,2,-1,-1,5]).fit(genins).ldf_
Out[14]:
Note that if you request more n_periods than are available for any particular age-to-age period, all available periods will be used instead.
In [15]:
cl.Development(n_periods=[1,2,3,4,5,6,7,8,9]).fit(genins).ldf_ == \
cl.Development(n_periods=[1,2,3,4,5,4,3,2,1]).fit(genins).ldf_
Out[15]:
You can also exclude an entire valuation period from the fit with the drop_valuation argument.
In [16]:
cl.Development(drop_valuation='2004').fit(genins).ldf_
Out[16]:
Maybe you want to do Olympic averaging (i.e., excluding the high and low factors from each period).
In [17]:
cl.Development(drop_high=True, drop_low=True).fit(genins).ldf_
Out[17]:
Or maybe there is just a single outlier link-ratio that you don't think is indicative of future development. For these, you can specify the intersection of the origin period and development age of the link-ratio's denominator to drop.
In [18]:
cl.Development(drop=('2004', 12)).fit(genins).ldf_
Out[18]:
If there is more than one troublesome outlier, you can also pass a list to the drop argument.
In [19]:
cl.Development(drop=[('2004', 12), ('2003', 24)]).fit(genins).ldf_
Out[19]:
In sklearn parlance, there are two types of estimators: Transformers (which Development is) and Predictors. The Development object is a means of creating development patterns, but it is not itself a reserving model. Transformers come with the transform and fit_transform methods. These return a Triangle object, but augment it with additional information for use in a subsequent IBNR model.
In [20]:
transformed_triangle = cl.Development().fit_transform(genins)
transformed_triangle
Out[20]:
Our transformed triangle behaves like our original genins triangle.
In [21]:
print(type(transformed_triangle))
transformed_triangle.latest_diagonal
Out[21]:
However, it has other attributes that make it IBNR model-ready.
In [22]:
transformed_triangle.cdf_
Out[22]:
fit_transform() is equivalent to calling fit and transform in succession on the same triangle. Again, this should feel very familiar to the sklearn practitioner.
In [23]:
cl.Development().fit_transform(genins) == cl.Development().fit(genins).transform(genins)
Out[23]:
The reason you might want to use fit and transform separately is when you want to apply development patterns to a different triangle. For example, we can:
- use the clrd dataset
- fit a Development object
- transform the individual company triangles with the industry development patterns
In [24]:
clrd = cl.load_dataset('clrd')
comauto = clrd[clrd['LOB']=='comauto']['CumPaidLoss']
comauto_industry = comauto.sum()
industry_dev = cl.Development().fit(comauto_industry)
industry_dev.transform(comauto)
Out[24]:
In [25]:
clrd = cl.load_dataset('clrd').groupby('LOB').sum()['CumPaidLoss']
print('Fitting to ' + str(len(clrd.index)) + ' industries simultaneously.')
cl.Development().fit_transform(clrd).cdf_
Out[25]:
For greater control, you can slice individual triangles out and fit separate patterns to each.
In [26]:
print(cl.Development(average='simple').fit(clrd.loc['wkcomp']))
print(cl.Development(n_periods=4).fit(clrd.loc['ppauto']))
print(cl.Development(average='regression', n_periods=6).fit(clrd.loc['comauto']))