In [1]:
# Load Biospytial modules and etc.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import pandas as pd
import numpy as np

Sketches for automating spatial models

This notebook is for designing the tool box and methods for fitting spatial data. I´m using the library Geopystats (before spystats)

Requirements

  • Given a dataset in Geopandas dataframe, create the Variogram object. And read from the file the variogram data

In [2]:
plotdata_path = "/RawDataCSV/idiv_share/plotsClimateData_11092017.csv"
empirical_data_path = "/apps/external_plugins/spystats/HEC_runs/results/logbiomas_logsppn_res.csv"

This will run all the process for fitting and GLS


In [3]:
from external_plugins.spystats import tools
%run ../HEC_runs/fit_fia_logbiomass_logspp_GLS.py "/RawDataCSV/idiv_share/plotsClimateData_11092017.csv" "/apps/external_plugins/spystats/HEC_runs/results/logbiomas_logsppn_res.csv"  -85 -80 30 35


/opt/conda/envs/biospytial/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2481: DtypeWarning: Columns (24) have mixed types. Specify dtype option on import or set low_memory=False.
  self.compile if kw['shell_futures'] else None)
INFO:__main__:Reprojecting to Alberts equal area
INFO:__main__:Fitting OLS linear model: logBiomass ~ logSppN 
INFO:__main__:Removing possible duplicates. 
 This avoids problems of Non Positive semidefinite
INFO:__main__:Reading the empirical Variogram file
INFO:__main__:Instantiating a Variogram object with the values calculated before
INFO:__main__:Dropping possible Nans
INFO:__main__:Instantiating Whittle Model...
INFO:__main__:fitting Whittle Model with the empirical variogram
INFO:__main__:Whittle Model fitted
INFO:__main__:Subselecting Region
INFO:__main__:Calculating Distance Matrix
INFO:__main__:Calculating Correlation based on theoretical model
INFO:__main__:Fitting linear model using GLS
INFO:__main__:Writing to file
INFO:__main__:Finished! Results in: tests1.csv

In [4]:
from external_plugins.spystats import tools
hx = np.linspace(0,800000,100)

{'dataframe':new_data,'variogram':gvg,'modelGLS':model1}


In [5]:
new_data = results['dataframe']
gvg = results['variogram']
model1 = results['modelGLS']

The object new_data has been reprojected to Alberts and a linear model have been fitted with residuals stored as residuals


In [6]:
new_data.residuals[:10]


Out[6]:
0    0.390705
1   -0.499146
2   -0.793889
3   -0.185936
4    0.411008
5    0.294243
6   -0.399424
7    0.834632
8    0.339474
9   -0.152521
Name: residuals, dtype: float64

The empirical variogram

The empirical variogram has been calculated already using the HEC. A variogram object has been created which takes the values from the previously calculated in HEC


In [7]:
gvg.plot(refresh=False,legend=False,percentage_trunked=20)
plt.title("Semivariogram of residuals $log(Biomass) ~ log(SppR)$")
## HERE we can cast a model (Whittle) and fit it inside the global variogram
whittle_model = tools.WhittleVariogram(sill=0.345,range_a=100000,nugget=0.33,alpha=1.0)
#tt = gvg.fitVariogramModel(whittle_model)
#plt.plot(hx,gvg.model.f(hx),'--',lw=4,c='black')
#print(whittle_model)



In [8]:
## This section is an example for calculating GLS. Using a small section because of computing intensity

In [9]:
minx = -85
maxx = -80
miny = 30
maxy = 35

section = tools._subselectDataFrameByCoordinates(new_data,'LON','LAT',minx,maxx,miny,maxy)
secvg = tools.Variogram(section,'logBiomass',model=whittle_model)
MMdist = secvg.distance_coordinates.flatten()
CovMat = secvg.model.corr_f(MMdist).reshape(len(section),len(section))
plt.imshow(CovMat)


Out[9]:
<matplotlib.image.AxesImage at 0x7f4247271cd0>
import statsmodels.regression.linear_model as lm import statsmodels.api as sm model1 = lm.GLS.from_formula(formula='logBiomass ~ logSppN',data=section,sigma=CovMat) results = model1.fit() resum = results.summary()

k = resum.as_text

Bonus!

Fitting the model to the empirical variogram

It´s included as a method in the VariogramModel class.


In [10]:
matm = tools.MaternVariogram(sill=0.34,range_a=100000,nugget=0.33,kappa=0.5)
expmm = tools.ExponentialVariogram(sill=0.34,range_a=100000,nugget=0.33)
gausms = tools.GaussianVariogram(sill=0.34,range_a=100000,nugget=0.33)
sphmm = tools.SphericalVariogram(sill=0.34,range_a=100000,nugget=0.33)
wm = tools.WhittleVariogram(sill=0.34,range_a=100000,nugget=0.33,alpha=1)
map(lambda l : l.fit(gvg), [matm,expmm,gausms,sphmm,wm])

print(matm)
print(expmm)
print(gausms)
print(sphmm)
print(wm)


< Matern Variogram : sill 0.340268825065, range 48244.4387977, nugget 0.33, kappa 0.0096205185245 >
< Exponential Variogram : sill 0.340280561125, range 38623.6456301, nugget 0.3296179524 >
< Gaussian Variogram : sill 0.340223070001, range 44965.9649192, nugget 0.330727881938 >
< Spherical Variogram : sill 266987951015.0, range 3.86443231011e+19, nugget 0.3378129155 >
< Whittle Variogram : sill 0.340274705841, range 41060.357796, nugget 0.32981717589, alpha1.12078082963 >
/apps/external_plugins/spystats/tools.py:510: RuntimeWarning: divide by zero encountered in power
  rho_h = a * np.power(b,kappa) * K_v_b
/opt/conda/envs/biospytial/lib/python2.7/site-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
/apps/external_plugins/spystats/tools.py:540: RuntimeWarning: invalid value encountered in double_scalars
  g_h = ((sill - nugget)*(1 - np.exp(-(h**alpha / range_a**alpha)))) + nugget*Ih

In [22]:
gvg.plot(refresh=False,legend=False,percentage_trunked=20)
plt.title("Semivariogram of residuals $log(Biomass) ~ log(SppR)$")
## HERE we can cast a model (Whittle) and fit it inside the global variogram
#whittle_model = tools.WhittleVariogram(sill=0.345,range_a=100000,nugget=0.33,alpha=1.0)
#tt = gvg.fitVariogramModel(whittle_model)
#plt.plot(hx,matm.f(hx),'--',lw=4)
plt.plot(hx,expmm.f(hx),'--',lw=4)
plt.plot(hx,gausms.f(hx),'--',lw=4)
plt.plot(hx,sphmm.f(hx),'--',lw=4)
plt.plot(hx,wm.f(hx),'--',lw=4)

#print(whittle_model)


Out[22]:
[<matplotlib.lines.Line2D at 0x7f4246b52d50>]

In [23]:
plt.plot(hx,matm.corr_f(hx),'--',lw=4)
plt.plot(hx,expmm.corr_f(hx),'--',lw=4)
plt.plot(hx,gausms.corr_f(hx),'--',lw=4)
#plt.plot(hx,sphmm.corr_f(hx),'--',lw=4)
plt.plot(hx,wm.corr_f(hx),'--',lw=4)



TypeErrorTraceback (most recent call last)
<ipython-input-23-53b39d2b528d> in <module>()
      2 plt.plot(hx,expmm.corr_f(hx),'--',lw=4)
      3 plt.plot(hx,gausms.corr_f(hx),'--',lw=4)
----> 4 plt.plot(hx,sphmm.corr_f(hx),'--',lw=4)
      5 plt.plot(hx,wm.corr_f(hx),'--',lw=4)

/apps/external_plugins/spystats/tools.py in corr_f(self, h)
    578         corr_cont = lambda hx :  1 - (self.sill * self.f(hx) / (self.sill + self.nugget))
    579 
--> 580         return np.array([1.0 if hx == 0 else corr_cont(hx) for hx in h])
    581 
    582 

/apps/external_plugins/spystats/tools.py in <lambda>(hx)
    576         ## correlation function for distances bigger than zero
    577         ## See: Diggle & Ribeiro (2006) section 3.5
--> 578         corr_cont = lambda hx :  1 - (self.sill * self.f(hx) / (self.sill + self.nugget))
    579 
    580         return np.array([1.0 if hx == 0 else corr_cont(hx) for hx in h])

/apps/external_plugins/spystats/tools.py in <lambda>(x)
    566     @property
    567     def f(self):
--> 568         return lambda x : self.model(x,self.sill,self.range_a,self.nugget)
    569 
    570 

/apps/external_plugins/spystats/tools.py in sphericalVariogram(h, sill, range_a, nugget)
    480     else:
    481         Ih = 1.0 if h >= 0 else 0.0
--> 482         I0r = [1.0 if hi <= range_a else 0.0 for hi in h]
    483         Irinf = [1.0 if hi > range_a else 0.0 for hi in h]
    484     g_h = (sill - nugget)*((3*h / float(2*range_a))*I0r + Irinf) - (h**3 / float(2*range_a)) + (nugget*Ih)

TypeError: 'numpy.float64' object is not iterable

Outlook

I think the Matern is the right way to go but I need to recheck the model to see if it´s correctly written.


In [13]:
tik = wm.corr_f(MMdist)

In [20]:



Out[20]:
(array([  7.52679200e+06,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          2.74400000e+03]),
 array([ 0.82720746,  0.84448672,  0.86176597,  0.87904523,  0.89632448,
         0.91360373,  0.93088299,  0.94816224,  0.96544149,  0.98272075,  1.        ]),
 <a list of 10 Patch objects>)

In [ ]:
wm.corr_f(np.array([0,0,0]))

In [ ]:
f = lambda x : x**2

In [ ]:
f(hx)

In [ ]:
wm.f(0)

In [ ]: