The package is in pip, so you can conveniently just "pip copulalib". It also comes in Anaconda.
You may need to install a package called "statistics".
In order to make it work, a small fix is needed in copulalib.py, or you will get the following problem when attempting to build your copula:
The fix is simple: replace "if X.size is not Y.size" with something like:
In [ ]:
#The first assert makes sure that you are getting a 1D in X and Y
#replace this code around line 58 in site-packages/copulalib/copulalib.py
try:
if X.shape[0] != Y.shape[0]:
raise ValueError('The size of both arrays should be same.')
except:
raise TypeError('X and Y should have a callable shape method. '
'Try converting them to numpy arrays')
Note: at that point I was lazily pasting the structure for some plotting ;).
The first samples (x) are generated from a beta distribution and (y) from a lognormal. Beta distributions have finite bounds for support, while the right-side support for lognormals is in the infinity. An interesting property ofcopulas: BOTH marginals are transformed to the unit range. Isn't that cool?
To try the package, we fitted copulas of three archimedean families (Frank, Clayton, Gumbel) to samples x and y. We then extracted some samples from the fitted copulas, and plotted the sampled outputs together with the original samples, in order to look at how they compare with each other.
feel free to try it out on you own, and do comment if you feel like!
In [6]:
import numpy as np
import matplotlib.pyplot as plt
from copulalib.copulalib import Copula as copula
from scipy.stats import beta, lognorm, gaussian_kde
plt.style.use('ggplot')
def ppf(var,q):
#the equivalent of ppf but built directly from data
sortedvar=np.sort(var)
rcdf=np.linspace(0,1, num=len(var))
returnable=np.array([sortedvar[np.where(rcdf<=qx)[0][-1]] for qx in q])
#print returnable.shape
return returnable
def generateAndPlot(title,x,y,howmany):
#generate and plot
fig, axs = plt.subplots(3,2,figsize=(12., 18.))
fig.suptitle(title)
for index,family in enumerate(['frank','clayton','gumbel']):
copula_f = copula(x,y,family=family)
try:
#get pseudo observations
u,v = copula_f.generate_uv(howmany)
except Exception as exx:
print "Could not extract pseudo observations for {}\
because {}".format(family, exx.message)
continue
#plot pseudo observations
axs[index][0].scatter(u,v,marker='o',alpha=0.7)
axs[index][0].set_title('pseudo observations {}'.format(family),
fontsize=12)
#plot inverse ppf
#ux = beta.ppf(u,a,b,loc=loc)
#vy= lognorm.ppf(v,sc,loc=loc)
ux=ppf(x,u)
vy=ppf(y,v)
axs[index][1].scatter(x,y,marker='*',color='r', alpha=0.5) #original samples
axs[index][1].scatter(ux,vy,marker='s', alpha=0.3) #simulated realisations
axs[index][1].set_title('simulated (blue) vs original (red) {}'.format(family),
fontsize=12)
plt.show()
#total samples vs pseudo observations
pseudoobs=100
sz=300
loc=0.0 #needed for most distributions
#lognormal param
sc=0.5
y=lognorm.rvs(sc,loc=loc, size=sz)
We will draw samples from a beta distribution for (x), and samples from a lognormal for (y). The samples are pseudo-independent (we know there is no real independence if you use a computer to draw the samples, but well, reasonably independent).
In [7]:
#unrelated data: a beta (x) and a lognormal (y)
a= 0.45#2. #alpha
b=0.25#5. #beta
x=beta.rvs(a,b,loc=loc,size=sz)
#plot unrelated x and y
t = np.linspace(0, lognorm.ppf(0.99, sc), sz)
plt.plot(t, beta.pdf(t,a,b), lw=5, alpha=0.6, label='x:beta')
plt.plot(t, lognorm.pdf(t, sc), lw=5, alpha=0.6, label='y:lognormal')
plt.legend()
#plot copulas built from unrelated x and y
title='Copulas from unrelated data x: beta, alpha {} beta {}, y: lognormal, mu {}, sigma {}'.format(a,b,loc,sc)
generateAndPlot(title,x,y,pseudoobs)
The independent variable will be a lognormal (y) and variable (x) depends on (y) with the following relation: The initial value is 1 (independently). Then, for each point i, if $y_i > y_{i-1}$, then $x_i=x_{i-1}+c$, in which c is chosen uniformly from a list of fractions of 1. Otherwise, $x_i=x_{i-1}-c$.
In [8]:
#related data: a lognormal (y)
#plot related data
title='Copulas from dependent data xx depending on y, y: lognormal, mu {}, sigma {}'.format(loc,sc)
xx=[]
jumps=[0.001,0.01,0.1]
for indx,yi in enumerate(y):
if indx>1:
if yi>y[indx-1]:
xx.append(xx[-1]+np.random.choice(jumps))
else:
xx.append(xx[-1]-np.random.choice(jumps))
else:
xx.append(1)
xx=np.array(xx)
t = np.linspace(0, lognorm.ppf(0.99, sc), sz)
#plt.plot(t, gkxx.pdf(t), lw=5, alpha=0.6, label='x:f(y)')
plt.hist(xx,normed=True,label='x:f(y)',alpha=0.6,bins=100)
plt.plot(t, lognorm.pdf(t, sc), lw=5, alpha=0.6, label='y:lognormal')
plt.legend()
generateAndPlot(title,xx,y,pseudoobs)
In [ ]:
plt.plot(c, lognorm.pdf(t, sc), lw=5, alpha=0.6, label='y:lognormal')
In [ ]: