In [1]:
# Load Biospytial modules and etc.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import pandas as pd
import numpy as np

Sketches for automating spatial models

This notebook is for designing the tool box and methods for fitting spatial data. I´m using the library Geopystats (before spystats)

Requirements

  • Given a dataset in Geopandas dataframe, create the Variogram object. And read from the file the variogram data

In [2]:
plotdata_path = "/RawDataCSV/idiv_share/plotsClimateData_11092017.csv"
empirical_data_path = "/apps/external_plugins/spystats/HEC_runs/results/logbiomas_logsppn_res.csv"

This will run all the process for fitting and GLS


In [4]:
from external_plugins.spystats import tools
%run ../HEC_runs/fit_fia_logbiomass_logspp_GLS.py "/RawDataCSV/idiv_share/plotsClimateData_11092017.csv" "/apps/external_plugins/spystats/HEC_runs/results/logbiomas_logsppn_res.csv"  -85 -80 30 35


INFO:__main__:Reprojecting to Alberts equal area
INFO:__main__:Fitting OLS linear model: logBiomass ~ logSppN 
INFO:__main__:Removing possible duplicates. 
 This avoids problems of Non Positive semidefinite
INFO:__main__:Reading the empirical Variogram file
INFO:__main__:Instantiating a Variogram object with the values calculated before
INFO:__main__:Dropping possible Nans
INFO:__main__:Instantiating Matern Model...
INFO:__main__:fitting Matern Model with the empirical variogram
INFO:__main__:Matern Model fitted
INFO:__main__:Subselecting Region
INFO:__main__:Calculating Distance Matrix
INFO:__main__:Calculating Correlation based on theoretical model
INFO:__main__:Fitting linear model using GLS
INFO:__main__:Writing to file
INFO:__main__:Finished! Results in: tests1.csv

In [5]:
from external_plugins.spystats import tools
hx = np.linspace(0,800000,100)

{'dataframe':new_data,'variogram':gvg,'modelGLS':model1}


In [6]:
new_data = results['dataframe']
gvg = results['variogram']
model1 = results['modelGLS']

The object new_data has been reprojected to Alberts and a linear model have been fitted with residuals stored as residuals


In [7]:
new_data.residuals[:10]


Out[7]:
0    0.390705
1   -0.499146
2   -0.793889
3   -0.185936
4    0.411008
5    0.294243
6   -0.399424
7    0.834632
8    0.339474
9   -0.152521
Name: residuals, dtype: float64

The empirical variogram

The empirical variogram has been calculated already using the HEC. A variogram object has been created which takes the values from the previously calculated in HEC


In [8]:
gvg.plot(refresh=False,legend=False,percentage_trunked=20)
plt.title("Semivariogram of residuals $log(Biomass) ~ log(SppR)$")
## HERE we can cast a model (Whittle) and fit it inside the global variogram
whittle_model = tools.WhittleVariogram(sill=0.345,range_a=100000,nugget=0.33,alpha=1.0)
#tt = gvg.fitVariogramModel(whittle_model)
#plt.plot(hx,gvg.model.f(hx),'--',lw=4,c='black')
#print(whittle_model)



In [9]:
## This section is an example for calculating GLS. Using a small section because of computing intensity

In [10]:
whole_var = tools.Variogram(new_data,'logBiomass',model=whittle_model)

In [12]:
MMdist = whole_var.distance_coordinates.flatten()

In [13]:
distances = pd.DataFrame(MMdist)

In [38]:
xx = distances.loc[ (distances[0] < 1000) & (distances[0] > 0)]

In [51]:
xx.sort([0])


/opt/conda/envs/biospytial/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  if __name__ == '__main__':
Out[51]:
0
523346475 139.015907
526846560 139.015907
797215262 142.239474
797104733 142.239474
493280839 149.708451
492507136 149.708451
29955002 176.202690
30581333 176.202690
1328262228 180.350359
1327451682 180.350359
213848372 198.005178
213553628 198.005178
275711131 227.391724
275563759 227.391724
321325232 242.074939
320846273 242.074939
3242363 281.543049
3352892 281.543049
1334746928 295.816491
1333199522 295.816491
792683328 296.359099
792609642 296.359099
977645232 299.029358
977718918 299.029358
406252937 302.095548
405037118 302.095548
1342152783 309.773451
1340973807 309.773451
437129163 315.460803
440187132 315.460803
... ...
816190460 936.972560
816927320 936.972560
1035749811 954.588087
1036339299 954.588087
519993421 955.554269
517635469 955.554269
240892607 958.680985
240782078 958.680985
553706820 960.419546
559601700 960.419546
1316840310 970.792049
1317208740 970.792049
1318314109 970.844026
1318645696 970.844026
632075965 973.886009
631707535 973.886009
782182503 978.063826
782108817 978.063826
1305676286 983.125297
1306449989 983.125297
593278571 983.533862
607315754 983.533862
258688748 985.709640
258799277 985.709640
1292670003 990.340059
1293517392 990.340059
489670049 990.947860
489633206 990.947860
266205128 994.352895
266315657 994.352895

236 rows × 1 columns


In [9]:
minx = -85
maxx = -80
miny = 30
maxy = 35

section = tools._subselectDataFrameByCoordinates(new_data,'LON','LAT',minx,maxx,miny,maxy)
secvg = tools.Variogram(section,'logBiomass',model=whittle_model)
MMdist = secvg.distance_coordinates.flatten()
CovMat = secvg.model.corr_f(MMdist).reshape(len(section),len(section))
plt.imshow(CovMat)


Out[9]:
<matplotlib.image.AxesImage at 0x7f4247271cd0>

In [ ]:

import statsmodels.regression.linear_model as lm import statsmodels.api as sm model1 = lm.GLS.from_formula(formula='logBiomass ~ logSppN',data=section,sigma=CovMat) results = model1.fit() resum = results.summary()

k = resum.as_text

Bonus!

Fitting the model to the empirical variogram

It´s included as a method in the VariogramModel class.


In [52]:
matm = tools.MaternVariogram(sill=0.34,range_a=100000,nugget=0.33,kappa=0.5)
expmm = tools.ExponentialVariogram(sill=0.34,range_a=100000,nugget=0.33)
gausms = tools.GaussianVariogram(sill=0.34,range_a=100000,nugget=0.33)
sphmm = tools.SphericalVariogram(sill=0.34,range_a=100000,nugget=0.33)
wm = tools.WhittleVariogram(sill=0.34,range_a=100000,nugget=0.33,alpha=1)
map(lambda l : l.fit(gvg), [matm,expmm,gausms,sphmm,wm])

print(matm)
print(expmm)
print(gausms)
print(sphmm)
print(wm)


/apps/external_plugins/spystats/tools.py:510: RuntimeWarning: divide by zero encountered in power
  rho_h = a * np.power(b,kappa) * K_v_b
/apps/external_plugins/spystats/tools.py:540: RuntimeWarning: invalid value encountered in double_scalars
  g_h = ((sill - nugget)*(1 - np.exp(-(h**alpha / range_a**alpha)))) + nugget*Ih
< Matern Variogram : sill 0.340268825065, range 48244.4387977, nugget 0.33, kappa 0.0096205185245 >
< Exponential Variogram : sill 0.340280561125, range 38623.6456301, nugget 0.3296179524 >
< Gaussian Variogram : sill 0.340223070001, range 44965.9649192, nugget 0.330727881938 >
< Spherical Variogram : sill 266987951015.0, range 3.86443231011e+19, nugget 0.3378129155 >
< Whittle Variogram : sill 0.340274705841, range 41060.357796, nugget 0.32981717589, alpha1.12078082963 >

In [53]:
gvg.plot(refresh=False,legend=False,percentage_trunked=20)
plt.title("Semivariogram of residuals $log(Biomass) ~ log(SppR)$")
## HERE we can cast a model (Whittle) and fit it inside the global variogram
#whittle_model = tools.WhittleVariogram(sill=0.345,range_a=100000,nugget=0.33,alpha=1.0)
#tt = gvg.fitVariogramModel(whittle_model)
#plt.plot(hx,matm.f(hx),'--',lw=4)
plt.plot(hx,expmm.f(hx),'--',lw=4)
plt.plot(hx,gausms.f(hx),'--',lw=4)
plt.plot(hx,sphmm.f(hx),'--',lw=4)
plt.plot(hx,wm.f(hx),'--',lw=4)

#print(whittle_model)


Out[53]:
[<matplotlib.lines.Line2D at 0x7fc5f78044d0>]

Outlook

I think the Matern is the right way to go but I need to recheck the model to see if it´s correctly written.

  • Note: there was a small mistake. I´ve corrected it and now it seems to work. I´ll use this model instead of the wittle. It seems better to me.
  • Note 2: Matern makes differentiability close to the origin at distance 0. it is a better model

In [54]:
newhx = np.linspace(0,2000,100)
plt.plot(newhx,matm.corr_f(newhx),'--',lw=4)
#plt.plot(hx,expmm.corr_f(hx),'--',lw=4)
#plt.plot(hx,gausms.corr_f(hx),'--',lw=4)
#plt.plot(hx,sphmm.corr_f(hx),'--',lw=4)
plt.plot(newhx,wm.corr_f(newhx),'--',lw=4)
plt.title("Correlation for Matern and Wittle models")


Out[54]:
<matplotlib.text.Text at 0x7fc5f77fea90>

In [63]:
wm.corr_f(np.linspace(0,100000,100))


Out[63]:
array([ 1.        ,  0.83238879,  0.83226288,  0.83214003,  0.83202017,
        0.83190322,  0.83178911,  0.83167778,  0.83156915,  0.83146316,
        0.83135974,  0.83125884,  0.83116039,  0.83106433,  0.83097061,
        0.83087916,  0.83078994,  0.83070289,  0.83061795,  0.83053507,
        0.83045421,  0.83037531,  0.83029833,  0.83022322,  0.83014994,
        0.83007844,  0.83000867,  0.8299406 ,  0.82987419,  0.82980938,
        0.82974616,  0.82968447,  0.82962427,  0.82956554,  0.82950824,
        0.82945233,  0.82939778,  0.82934456,  0.82929262,  0.82924195,
        0.82919252,  0.82914428,  0.82909721,  0.82905129,  0.82900649,
        0.82896277,  0.82892012,  0.8288785 ,  0.82883789,  0.82879827,
        0.82875961,  0.8287219 ,  0.8286851 ,  0.82864919,  0.82861415,
        0.82857997,  0.82854662,  0.82851408,  0.82848233,  0.82845135,
        0.82842112,  0.82839163,  0.82836285,  0.82833478,  0.82830738,
        0.82828065,  0.82825457,  0.82822913,  0.8282043 ,  0.82818008,
        0.82815644,  0.82813338,  0.82811088,  0.82808893,  0.82806751,
        0.82804661,  0.82802622,  0.82800632,  0.82798691,  0.82796797,
        0.82794949,  0.82793146,  0.82791387,  0.8278967 ,  0.82787995,
        0.82786361,  0.82784767,  0.82783211,  0.82781693,  0.82780212,
        0.82778767,  0.82777357,  0.82775981,  0.82774639,  0.82773329,
        0.82772052,  0.82770805,  0.82769589,  0.82768402,  0.82767244])

In [57]:
xx


Out[57]:
0
3242363 281.543049
3352892 281.543049
24723001 928.039716
24944059 928.039716
29954994 859.412110
29955002 176.202690
30286581 859.412110
30286598 856.496991
30581333 176.202690
30581342 856.496991
84522452 483.806758
85332998 483.806758
97455045 860.050659
98191905 860.050659
99555209 468.535696
99702577 696.583368
99960478 696.583368
100255226 468.535696
115693302 450.830642
115766988 450.830642
213553628 198.005178
213848372 198.005178
217274969 686.877736
217422341 686.877736
240782078 958.680985
240892607 958.680985
244908722 800.074238
245166623 800.074238
258688748 985.709640
258799277 985.709640
... ...
1297680895 547.922539
1297695679 459.281609
1299965326 898.119620
1301291674 898.119620
1303907715 935.334030
1304202517 558.354477
1304276145 935.334030
1305307830 752.761712
1305676286 983.125297
1305860475 752.761712
1306118353 558.354477
1306449989 983.125297
1309029173 577.670139
1309508132 577.670139
1316840310 970.792049
1317208740 970.792049
1317761464 572.895320
1318314109 970.844026
1318645696 970.844026
1319198341 572.895320
1323288181 589.575093
1323509239 589.575093
1327451682 180.350359
1328262228 180.350359
1333199522 295.816491
1334746928 295.816491
1340973807 309.773451
1342152783 309.773451
1352727346 836.670662
1353316834 836.670662

236 rows × 1 columns


In [ ]: