In [1]:
# Load Biospytial modules and set up the Django environment.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Sketches for automating spatial models

This notebook sketches the toolbox and methods for fitting spatial models to data. I'm using the Geopystats library (formerly spystats).

Requirements

  • Given a dataset in a GeoPandas DataFrame, create the Variogram object and read the empirical variogram data from file (a minimal sketch follows below).
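
Below is a minimal sketch of that workflow, not the exact code used later. It assumes a GeoPandas DataFrame gdf with a logBiomass column and a hypothetical CSV file name; the tools.Variogram constructor mirrors the call used further down in this notebook.

In [ ]:
# Minimal sketch (assumed inputs): build a Variogram object from a GeoDataFrame
# and load a previously computed empirical variogram from disk.
from external_plugins.spystats import tools
import pandas as pd

# `gdf` stands for a GeoPandas DataFrame analogous to `new_data` used below.
vg = tools.Variogram(gdf, 'logBiomass')

# The empirical variogram computed on the HEC cluster is assumed to be stored
# as a CSV file; the file name here is hypothetical.
empirical_vg = pd.read_csv("empirical_variogram.csv")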

In [2]:
from external_plugins.spystats import tools
%run ../HEC_runs/fit_fia_logbiomass_logspp_GLS.py


/opt/conda/envs/biospytial/lib/python2.7/site-packages/IPython/core/pylabtools.py:168: DtypeWarning: Columns (24) have mixed types. Specify dtype option on import or set low_memory=False.
  safe_execfile(fname,*where,**kw)
INFO:__main__:Removing possible duplicates
INFO:__main__:Reading the empirical Variogram file

In [3]:
from external_plugins.spystats import tools
# Lag distances (in map units of the Albers projection) at which fitted models will be evaluated
hx = np.linspace(0, 800000, 100)

The object new_data has been reprojected to Albers coordinates and a linear model has been fitted, with the residuals stored in the column residuals.
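
For reference, a minimal sketch of that preprocessing step (assumed, not the script's exact code) would look like this, using the same formula that appears in the GLS section below:

In [ ]:
# Sketch only: ordinary least squares fit whose residuals are stored back
# in the DataFrame, mirroring what fit_fia_logbiomass_logspp_GLS.py produces.
import statsmodels.formula.api as smf

ols_fit = smf.ols('logBiomass ~ logSppN', data=new_data).fit()
new_data['residuals'] = ols_fit.resid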


In [4]:
new_data.residuals[:10]


Out[4]:
0    0.390705
1   -0.499146
2   -0.793889
3   -0.185936
4    0.411008
5    0.294243
6   -0.399424
7    0.834632
8    0.339474
9   -0.152521
Name: residuals, dtype: float64

The empirical variogram

The empirical variogram has already been computed on the HEC cluster. A variogram object (gvg) has been created that takes its values from that precomputed file.
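
For reference, the empirical semivariogram plotted below is assumed to be the standard Matheron estimator,

$$\hat{\gamma}(h) = \frac{1}{2\,|N(h)|}\sum_{(i,j)\in N(h)} \big(z_i - z_j\big)^2,$$

where $N(h)$ is the set of point pairs whose separation falls in the lag bin around $h$ and the $z_i$ are the residuals.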


In [5]:
gvg.plot(refresh=False,legend=False,percentage_trunked=20)
plt.title("Semivariogram of residuals $log(Biomass) ~ log(SppR)$")
## Here we instantiate a Whittle model and fit it to the global empirical variogram
whittle_model = tools.WhittleVariogram(sill=0.345,range_a=100000,nugget=0.33,alpha=1.0)
tt = gvg.fitVariogramModel(whittle_model)
plt.plot(hx,gvg.model.f(hx),'--',lw=4,c='black')
print(whittle_model)


< Whittle Variogram : sill 0.340274671636, range 41061.0511898, nugget 0.32981732354, alpha1.1210178637 >
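
For reference, the Whittle model implemented in spystats/tools.py (see the line echoed in the warning further below) has the form

$$\gamma(h) = (\mathrm{sill} - \mathrm{nugget})\left(1 - e^{-(h/\mathrm{range})^{\alpha}}\right) + \mathrm{nugget}\cdot\mathbb{1}_{h>0},$$

where $\mathbb{1}_{h>0}$ corresponds to the indicator `Ih` in the code, so the parameters printed above slot directly into this expression.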

In [7]:
## This section gives an example of computing a GLS fit. We use a small spatial subset because of the computational cost.

In [11]:
# Bounding box (degrees of longitude/latitude) for a small subset of the data
minx = -85
maxx = -80
miny = 30
maxy = 35

# Subset the data, attach the fitted Whittle model to a new Variogram object,
# and evaluate its correlation function over the pairwise distance matrix
section = tools._subselectDataFrameByCoordinates(new_data,'LON','LAT',minx,maxx,miny,maxy)
secvg = tools.Variogram(section,'logBiomass',model=whittle_model)
MMdist = secvg.distance_coordinates.flatten()
CovMat = secvg.model.corr_f(MMdist).reshape(len(section),len(section))
plt.imshow(CovMat)


Out[11]:
<matplotlib.image.AxesImage at 0x7f178b1d0390>
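
To make explicit what statsmodels does with the sigma argument in the next cell, here is an illustrative manual computation (not part of the original analysis) of the GLS estimator $\hat{\beta} = (X^{T}\Sigma^{-1}X)^{-1}X^{T}\Sigma^{-1}y$, using CovMat as $\Sigma$:

In [ ]:
# Illustration only: compute the GLS coefficient estimates by hand.
import numpy as np
import patsy

y, X = patsy.dmatrices('logBiomass ~ logSppN', data=section, return_type='dataframe')
Sigma_inv = np.linalg.inv(CovMat)
XtSi = X.values.T.dot(Sigma_inv)
beta_hat = np.linalg.solve(XtSi.dot(X.values), XtSi.dot(y.values))
print(beta_hat)   # should match the coefficients in the GLS summary below, up to numerical error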

In [34]:
import statsmodels.regression.linear_model as lm
import statsmodels.api as sm
model1 = sm.GLS.from_formula(formula='logBiomass ~ logSppN',data=section,sigma=CovMat)
results = model1.fit()
results.summary()


Out[34]:
GLS Regression Results
Dep. Variable: logBiomass R-squared: 0.171
Model: GLS Adj. R-squared: 0.170
Method: Least Squares F-statistic: 564.1
Date: Fri, 08 Dec 2017 Prob (F-statistic): 1.50e-113
Time: 04:23:01 Log-Likelihood: -3301.1
No. Observations: 2744 AIC: 6606.
Df Residuals: 2742 BIC: 6618.
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 8.3170 14.454 0.575 0.565 -20.024 36.658
logSppN 0.4659 0.020 23.743 0.000 0.427 0.504
Omnibus: 169.100 Durbin-Watson: 2.090
Prob(Omnibus): 0.000 Jarque-Bera (JB): 692.162
Skew: -0.125 Prob(JB): 5.00e-151
Kurtosis: 5.448 Cond. No. 737.

In [37]:
## Baseline without spatial structure: identity matrix as sigma (equivalent to OLS);
## compare the intercept's standard error here with the GLS fit above
Id = np.identity(len(section))
model2 = sm.GLS.from_formula(formula='logBiomass ~ logSppN',data=section,sigma=Id)
results = model2.fit()
results.summary()


Out[37]:
GLS Regression Results
Dep. Variable: logBiomass R-squared: 0.196
Model: GLS Adj. R-squared: 0.195
Method: Least Squares F-statistic: 666.8
Date: Fri, 08 Dec 2017 Prob (F-statistic): 8.58e-132
Time: 04:25:31 Log-Likelihood: -2676.5
No. Observations: 2744 AIC: 5357.
Df Residuals: 2742 BIC: 5369.
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 8.3423 0.030 280.887 0.000 8.284 8.401
logSppN 0.4567 0.018 25.822 0.000 0.422 0.491
Omnibus: 176.112 Durbin-Watson: 1.989
Prob(Omnibus): 0.000 Jarque-Bera (JB): 235.457
Skew: -0.575 Prob(JB): 7.43e-52
Kurtosis: 3.860 Cond. No. 5.32

Bonus!

Fitting the model to the empirical variogram

It's included as the fit method of each VariogramModel class.


In [16]:
matm = tools.MaternVariogram(sill=0.34,range_a=100000,nugget=0.33,kappa=0.5)
expmm = tools.ExponentialVariogram(sill=0.34,range_a=100000,nugget=0.33)
gausms = tools.GaussianVariogram(sill=0.34,range_a=100000,nugget=0.33)
sphmm = tools.SphericalVariogram(sill=0.34,range_a=100000,nugget=0.33)
wm = tools.WhittleVariogram(sill=0.34,range_a=100000,nugget=0.33,alpha=1)
# Fit each theoretical model to the empirical variogram (map is eager in Python 2)
map(lambda l : l.fit(gvg), [matm,expmm,gausms,sphmm,wm])

print(matm)
print(expmm)
print(gausms)
print(sphmm)
print(wm)


< Matern Variogram : sill 0.00955074594083, range 23746.0270787, nugget 0.330725245077, kappa 1.65463982714 >
< Exponential Variogram : sill 0.340280561125, range 38623.6456301, nugget 0.3296179524 >
< Gaussian Variogram : sill 0.340223070001, range 44965.9649192, nugget 0.330727881938 >
< Spherical Variogram : sill 266987951015.0, range 3.86443231011e+19, nugget 0.3378129155 >
< Whittle Variogram : sill 0.340274705841, range 41060.357796, nugget 0.32981717589, alpha1.12078082963 >
/apps/external_plugins/spystats/tools.py:534: RuntimeWarning: invalid value encountered in double_scalars
  g_h = ((sill - nugget)*(1 - np.exp(-(h**alpha / range_a**alpha)))) + nugget*Ih
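
A natural follow-up (hypothetical, not executed here) is to overlay the fitted models on the empirical variogram for visual comparison, assuming each fitted model exposes the same f(h) evaluation used above for gvg.model:

In [ ]:
# Hypothetical comparison plot: overlay every fitted theoretical model on the
# empirical variogram, reusing the f(h) method demonstrated for the Whittle fit.
gvg.plot(refresh=False, legend=False, percentage_trunked=20)
for fitted_model, name in zip([matm, expmm, gausms, sphmm, wm],
                              ['Matern', 'Exponential', 'Gaussian', 'Spherical', 'Whittle']):
    plt.plot(hx, fitted_model.f(hx), lw=2, label=name)
plt.legend()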

In [ ]: