In [1]:
import pandas as pd
import numpy as np
import chainladder as cl
%matplotlib inline
cl.__version__
Out[1]:
A loss triangle is a table of loss experience showing total losses for each origin period at various, regular valuation dates (development), reflecting the change in amounts as claims mature. Each origin period in the table has one more entry than the next youngest period, giving the data its characteristic triangle shape. The same layout applies to paid losses, incurred losses, claim counts, or any other measure that matures over time from a set origin date. Loss triangles can be used to determine loss development for a given risk.
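Before looking at chainladder's own representation, the shape described above can be sketched with plain pandas: pivoting long-format loss records (toy, made-up numbers) by origin and development produces the triangle, with NaNs where newer origins have not yet matured.

```python
import pandas as pd

# Toy long-format loss data: one row per (origin, development) observation.
records = pd.DataFrame({
    'origin': [2020, 2020, 2020, 2021, 2021, 2022],
    'development': [12, 24, 36, 12, 24, 12],
    'paid': [100, 150, 175, 110, 160, 120],
})

# Pivoting makes the triangle shape emerge: newer origins have fewer valuations.
triangle = records.pivot(index='origin', columns='development', values='paid')
print(triangle)
```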
In its simplest form, it can look like this:
In [2]:
raa = cl.load_dataset('raa')
raa
Out[2]:
A triangle has more properties than just what is displayed. For example, we can see the underlying link_ratio
s, which represent the multiplicative change in amounts from one development period to the next.
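The arithmetic behind a link ratio can be sketched with plain numpy on a toy cumulative triangle (made-up values, not chainladder's implementation): each development column is divided by the one before it.

```python
import numpy as np

# A small cumulative triangle: rows are origin periods, columns are development ages.
tri = np.array([
    [100., 150., 175.],
    [110., 160., np.nan],
    [120., np.nan, np.nan],
])

# Link ratios: each development column divided by the prior column.
link_ratios = tri[:, 1:] / tri[:, :-1]
print(link_ratios)  # first origin: 150/100 = 1.5, 175/150 ≈ 1.1667
```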
In [3]:
raa.link_ratio
Out[3]:
We can also view and manipulate the latest_diagonal
of the triangle
In [4]:
raa.latest_diagonal
Out[4]:
The latest diagonal corresponds to the triangle's valuation_date
In [5]:
raa.valuation_date
Out[5]:
We should also be able to tell whether our triangle:
is_cumulative
- whether the data accumulates across the development periods or is incremental
is_ultimate
- whether ultimate values are contained in the triangle
is_val_tri
- whether the development period is stated as a valuation date as opposed to an age
is_full
- whether the bottom half of the triangle has been completed
In [6]:
print('Is triangle cumulative?', raa.is_cumulative)
print('Does triangle contain ultimate projections?', raa.is_ultimate)
print('Is this a valuation triangle?', raa.is_val_tri)
print('Has the triangle been "squared"?', raa.is_full)
We can also inspect the triangle to understand its origin_grain
and development_grain
. chainladder
supports monthly, quarterly and yearly grains.
In [7]:
print('Origin grain: ', raa.origin_grain)
print('Development grain: ', raa.development_grain)
The triangle described so far is a two-dimensional structure that spans multiple cells of data. This is a useful structure for exploring individual triangles, but becomes more problematic when working with sets of triangles. Pandas does not have a triangle dtype
, but if it did, working with sets of triangles would be much more convenient. To facilitate working with more than one triangle at a time, the chainladder.Triangle
acts like a pandas dataframe (with an index and columns) where each cell (row x col) is an individual triangle. This structure manifests itself as a four-dimensional space.
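The four-dimensional layout can be illustrated with a bare numpy array (a hypothetical shape, not the `clrd` data): axis 0 is the index, axis 1 the columns, and axes 2 and 3 hold each individual origin-by-development triangle, with NaNs below the diagonal.

```python
import numpy as np

# Hypothetical shape: 4 reserve groups x 2 measures x 10 origins x 10 dev ages.
values = np.full((4, 2, 10, 10), np.nan)

# Fill the upper-left "triangle" of every 2D slice: origin row i has 10 - i entries.
for i in range(10):
    values[..., i, :10 - i] = 1.0

print(values.shape)       # (4, 2, 10, 10)
print(np.nansum(values))  # 55 filled cells per slice x 8 slices = 440.0
```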
In [8]:
triangle = cl.load_dataset('clrd')
triangle
Out[8]:
Since 4D structures do not fit nicely on 2D screens, we see a summary view that describes the structure rather than the underlying data itself. However, this structure behaves very much like a pandas dataframe. For example, we can explore the index and columns much like we would in pandas.
The index
generally represents reserve groupings - in this example, lines of business and companies. Other uses of the index include triangles by state, by program, or by any other segmentation.
In [9]:
triangle.index.head()
Out[9]:
The columns
generally represent different values. They could be paid amounts, incurred amounts, reported counts, loss ratios, closure rates, excess losses, premium, etc.
In [10]:
triangle.columns
Out[10]:
As a 4D structure, this sample triangle represents a collection of 775x6 or 4,650 triangles that are themselves 10 accident years x 10 development lags. This can be seen directly in the __repr__
as well as by calling the shape
property of the triangle.
In [11]:
triangle.shape
Out[11]:
Under the hood, the data structure is a numpy.ndarray
with the equivalent shape. Like pandas, you can directly access the underlying numpy structure with the values
property. By exposing the underlying ndarray
you are free to manipulate the underlying data directly with numpy should that be an easier route to solving a problem. Keep in mind though, the chainladder.Triangle
has several methods and properties beyond the raw numpy representation and these are kept in sync by using the chainladder.Triangle
directly.
In [12]:
print(type(triangle.values))
print(triangle.values.shape)
print(np.nansum(triangle.values))
In [13]:
triangle[['CumPaidLoss', 'IncurLoss', 'BulkLoss']]
Out[13]:
We can also boolean-index the rows of the Triangle.
In [14]:
triangle[triangle['LOB']=='wkcomp']
Out[14]:
We can even use the typical loc
, iloc
functionality similar to pandas to access subsets of data. These features can be chained together as much as you want.
In [15]:
triangle.loc['Allstate Ins Co Grp'].iloc[-1]['CumPaidLoss']
Out[15]:
In [16]:
triangle['CaseIncurLoss'] = triangle['IncurLoss'] - triangle['BulkLoss']
triangle['PaidToInc'] = triangle['CumPaidLoss'] / triangle['CaseIncurLoss']
triangle[['CaseIncurLoss', 'PaidToInc']]
Out[16]:
Another common manipulation is aggregating the values across all rows of a dataframe/triangle.
In [17]:
triangle['CumPaidLoss'].sum()
Out[17]:
Aggregating rows is nice, but it is often useful to aggregate across groups of rows using groupby
. For example, we may want to group the triangles by Line of Business and get a sum across all companies for each industry.
In [18]:
triangle.groupby('LOB').sum()
Out[18]:
The aggregate functions, e.g. sum
, mean
, std
, min
, max
, etc. don't have to just apply to the index
axis. You can apply them to any of the four axes of the triangle object, using either the axis name or number.
In [19]:
triangle.sum(axis=1).sum(axis='index')
Out[19]:
Pandas has special 'accessor' methods for str
and dt
. These allow for the manipulation of data within each cell of data:
df['Last_First'].str.split(',') # splits lastname from first name by a comma-delimiter
df['Accident Date'].dt.year # pulls the year out of each date in a dataframe column
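As a quick runnable check, those two pandas accessors behave like this on a toy frame (column names here are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Last_First': ['Doe,John', 'Roe,Jane'],
    'Accident Date': pd.to_datetime(['2020-03-15', '2021-07-01']),
})

# .str operates element-wise on strings; .dt on datetimes.
print(df['Last_First'].str.split(',').tolist())  # [['Doe', 'John'], ['Roe', 'Jane']]
print(df['Accident Date'].dt.year.tolist())      # [2020, 2021]
```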
chainladder
also has special 'accessor' methods designed to allow you to manipulate the origin
, development
and valuation
vectors of a triangle.
We may want to extract only the latest accident period for every triangle.
In [20]:
triangle[triangle.origin==triangle.origin.max()]
Out[20]:
We may want to extract particular diagonals from our triangles using its valuation
vector
In [21]:
triangle[(triangle.valuation>='1994')&(triangle.valuation<='1995')].sum()['CumPaidLoss']
Out[21]:
We may even want to slice particular development periods to explore aspects of our data by development age.
In [22]:
triangle[triangle.development<=24].sum()['CumPaidLoss'].link_ratio.plot();
When the shape of a Triangle
object can be expressed as a 2D structure (i.e. two of its four axes have a length of 1), you can use the to_frame
method to convert your data into a pandas.DataFrame
. The plot
method above is nothing more than a conversion to pandas and using pandas plot.
In [23]:
triangle.groupby('LOB').sum().latest_diagonal['CumPaidLoss'].to_frame().astype(int)
Out[23]:
In [24]:
# Create a 'NetPaidLossRatio' column in triangle from the existing columns
triangle['NetPaidLossRatio'] = triangle['CumPaidLoss']/triangle['EarnedPremNet']
In [25]:
# What is the highest net paid loss ratio for any observation for origin 1997 Age 12
triangle[triangle.origin=='1997'][triangle.development==12]['NetPaidLossRatio'].max()
Out[25]:
In [26]:
# Subset the overall triangle to just include 'Alaska Nat Ins Co'
triangle[triangle['GRNAME']=='Alaska Nat Ins Co']
Out[26]:
In [27]:
# Use boolean indexing to create a triangle subset that includes all triangles for companies with names starting with 'B'
triangle[triangle['GRNAME'].str[0]=='B']
Out[27]:
In [28]:
# Which companies are in the top 5 net premium share for 1990?
triangle[triangle.origin=='1990']['EarnedPremNet'].latest_diagonal.groupby('GRNAME').sum().to_frame().sort_values().iloc[-5:]
Out[28]:
The chainladder.Triangle
class is designed to ingest pandas.DataFrame
objects. However, you do not need to worry about shaping the dataframe into triangle format yourself. This happens at the time you ingest the data.
Let's look at the initialization signature.
In [29]:
cl.Triangle?
We will be using the prism sample data to construct our triangles.
In [30]:
data = pd.read_csv('https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv')
data.head()
Out[30]:
We must specify the origin, development and columns to create a triangle object. By limiting our columns to one measure and not specifying an index, we can create a single triangle.
In [31]:
x = cl.Triangle(data=data,
origin='AccYrMo', development='ValYrMo',
columns='Paid')
x
Out[31]:
If we want to include more columns or indices we can certainly do so. Note that as we do so, we move into the 4D arena changing the display of the overall object.
In [32]:
x = cl.Triangle(data=data,
origin='AccYrMo', development='ValYrMo',
columns=['Paid', 'Incurred'])
x
Out[32]:
Though nothing stops us from using the slicing options above to represent 2D triangles
In [33]:
x['Paid']
Out[33]:
pandas has wonderful datetime inference functionality that the Triangle
heavily uses to infer origin and development granularity. Even still, there are rare occasions where date format inference can fail and it is better to explicitly tell the triangle the date format.
In [34]:
x = cl.Triangle(data=data,
origin='AccYrMo', development='ValYrMo',
columns=['Paid', 'Incurred'],
origin_format='%Y-%m', development_format='%Y-%m') # Explicit > Implicit
x
Out[34]:
Up until now, we've been playing with symmetric triangles, i.e. triangles whose origin and development periods share the same grain. However, nothing precludes us from having a different grain. Often in practice, the development
axis is more granular than the origin
axis. All the functionality available to symmetric triangles works equally well for asymmetric triangles.
In [35]:
data['AccYr'] = data['AccYrMo'].str[:4]
x = cl.Triangle(data=data,
origin='AccYr', development='ValYrMo',
columns=['Paid', 'Incurred'],
origin_format='%Y', development_format='%Y-%m')
x
Out[35]:
While exposure triangles make sense for auditable lines like workers compensation, there are many lines of business where exposure expressed as a 1D vector sufficiently and completely describes the data structure. chainladder
arithmetic requires that operations happen between a triangle and either an int
, float
, or another Triangle
. To create a 1D exposure vector, simply omit the development
argument at initialization.
In [36]:
data['Premium'] = data['Incurred'] * 3 # Contrived
x = cl.Triangle(data=data,
origin='AccYrMo',
columns='Premium',
origin_format='%Y-%m')
print(type(x))
x
Out[36]:
We have not created triangles with an index yet, but this is easily done by passing the index argument.
In [37]:
x = cl.Triangle(
data=data,
origin='AccYrMo',
development='ValYrMo',
columns=['Paid', 'Incurred'],
index='Line', # Add index
origin_format='%Y-%m', development_format='%Y-%m')
x
Out[37]:
Just as we are not limited to one single column, we are not limited to a single index either. Multiple indices can be passed as a list.
In [38]:
x = cl.Triangle(
data=data,
origin='AccYrMo',
development='ValYrMo',
columns=['Paid', 'Incurred'],
index=['Line','Type'], # multiple indices
origin_format='%Y', development_format='%Y-%m')
x
Out[38]:
Up until now, we've kept pretty close to the pandas API for triangle manipulation. However, there are data transformations commonly applied to triangles that don't have a nice pandas analogy.
For example, the practitioner often wants to convert a triangle from an incremental view into a cumulative view and vice versa. This is accomplished with the incr_to_cum
and cum_to_incr
methods.
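The idea behind these two methods can be sketched with numpy on a single origin period (a toy analogue, not chainladder's implementation): accumulation is a cumulative sum along the development axis, and the inverse is a first difference.

```python
import numpy as np

# Incremental paid amounts by development age for one origin period.
incremental = np.array([100., 50., 25., 10.])

cumulative = np.cumsum(incremental)             # incr_to_cum analogue
back_to_incr = np.diff(cumulative, prepend=0.)  # cum_to_incr analogue

print(cumulative)    # [100. 150. 175. 185.]
print(back_to_incr)  # [100.  50.  25.  10.]
```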
Since our data is stored incrementally, we want to accumulate the data after we've ingested it into a Triangle.
In [39]:
x = cl.Triangle(data=data,
origin='AccYrMo', development='ValYrMo',
columns='Paid')
x.incr_to_cum()
Out[39]:
By default (and in concert with the pandas philosophy), the methods associated with the Triangle
class strive for immutability. This means that the triangle we just accumulated was thrown away and our original object is incremental. Many of the chainladder.Triangle
methods have an inplace
argument or alternatively you can just use variable reassignment to store the transformation.
In [40]:
# This works
x.incr_to_cum(inplace=True)
# So does this
x = x.incr_to_cum()
When dealing with triangles that have an origin
axis, development
axis or both at a monthly or quarterly grain, the triangle can be summarized to a higher grain using the grain
method.
The grain to which you want your triangle converted is specified as 'OxDy', where x and y can take on values of ['Y', 'Q', 'M'].
For example:
In [41]:
x = x.grain('OYDY')
x
Out[41]:
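The aggregation behind a grain change can be sketched with a pandas groupby (a toy monthly series, not the chainladder implementation): moving from a monthly to a yearly origin grain sums the months within each year.

```python
import pandas as pd

# Toy monthly amounts, summarized to a yearly grain ('OY' analogue).
s = pd.Series(
    [10., 20., 30., 40.],
    index=pd.to_datetime(['2020-01', '2020-07', '2021-01', '2021-07']),
)
yearly = s.groupby(s.index.year).sum()
print(yearly.to_dict())  # {2020: 30.0, 2021: 70.0}
```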
Depending on the type of analysis being done, it may be more convenient to look at a triangle with its development
axis expressed as a valuation rather than an age. To do this, the Triangle
has two methods for toggling between a development triangle and a valuation triangle. The methods are dev_to_val
and its inverse val_to_dev
In [42]:
x.dev_to_val()
Out[42]:
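The relationship the conversion relies on can be sketched directly: a cell's valuation date is its origin period's start plus its development age (illustration only, not chainladder's internal code).

```python
import pandas as pd

# Valuation date = origin start + development age (in months), ending the prior day.
origin_start = pd.Timestamp('2020-01-01')
ages_in_months = [12, 24, 36]
valuations = [origin_start + pd.DateOffset(months=age) - pd.DateOffset(days=1)
              for age in ages_in_months]
print([v.strftime('%Y-%m-%d') for v in valuations])
# ['2020-12-31', '2021-12-31', '2022-12-31']
```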
When working with real-world data, triangles can have holes. A common issue is that some subsets of a triangle represent newer programs or lines while others are more mature. In these cases, it doesn't make sense to include empty accident periods or development ages for the new/retired line. For example, the 'Home' line has its latest accidents through '2016-03' whereas the 'Auto' program exhibits losses through '2017-12'. In these situations it is often useful to drop the non-applicable cells with the dropna()
method.
In [43]:
x = cl.Triangle(
data=data,
origin='AccYrMo',
development='ValYrMo',
columns=['Paid'],
index=['Line']).incr_to_cum().grain('OYDY')
x.loc['Home']
Out[43]:
Note that the dropna()
method will retain empty periods if they are surrounded by non-empty periods with valid data.
In [44]:
x.loc['Home'].dropna()
Out[44]:
In [45]:
print('Commutative? ', x.sum().latest_diagonal == x.latest_diagonal.sum())
print('Commutative? ', x.loc['Auto'].link_ratio == x.link_ratio.loc['Auto'])
print('Commutative? ', x.grain('OYDY').sum() == x.sum().grain('OYDY'))
In [46]:
x.sum().to_clipboard() # Automatically converts to a pandas dataframe and puts it in the clipboard for pasting into Excel
Alternatively, if you want to store the triangle elsewhere but be able to reconstitute a triangle out of it later, then you can use:
Triangle.to_json
and its inverse cl.read_json
for json format
Triangle.to_pickle
and its inverse cl.read_pickle
for pickle format
These have the added benefit of working on multi-dimensional triangles that don't fit into a DataFrame.
In [47]:
y = x.to_json() # store as JSON string
x == cl.read_json(y) # reconstitute and compare to original object
Out[47]:
In [48]:
x.to_pickle('triangle.pkl') # store as pickle bytecode
x == cl.read_pickle('triangle.pkl') # reconstitute and compare to original object
Out[48]:
In [49]:
x = cl.Triangle(
data=data,
origin='AccYrMo',
development='ValYrMo',
columns=['Paid', 'Incurred'],
index=['Line', 'Type'], # multiple indices
origin_format='%Y', development_format='%Y-%m').incr_to_cum()
x
Out[49]:
In [50]:
# What is the case incurred activity for calendar period 2015Q2 by Line?
y = x.groupby('Line').sum().cum_to_incr()['Incurred'].dev_to_val()
y[y.valuation=='2015Q2'].sum('origin').to_frame().astype(int)
Out[50]:
In [51]:
# What proportion of our Paid come from each 'Type' for Accident year 2015?
count_by_type = x[x.origin=='2015'].latest_diagonal['Paid'].groupby('Type').sum().to_frame()
(count_by_type/count_by_type.sum())
Out[51]: