In classical geoestatistics it is frequent to use a sample-based variogram to fit a parametrised theoretical variogram. This method has proved to present biasses for the estimators of range, sill and shape. There are two reasons for this.
Having said so, it is still a method used for estimating parameters and when the size of the data is big, being a good alternative to the likelihood methodology, a more robust optimization but computationally more complex.
So first thing first, import the neessary modules. Depending on your machine this may change. For the moment I work with this.
In [1]:
# Load Biospytial modules and etc.
%matplotlib inline
import sys
sys.path.append('/apps')
sys.path.append('/apps/external_plugins/spystats/')
import django
django.setup()
## Use the ggplot style
plt.style.use('ggplot')
In [6]:
import pandas as pd
from statsmodels.regression import linear_model
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
In [4]:
import tools
import HEC_runs.fit_fia_logbiomass_logspp_GLS as auxiliary
In [7]:
## File locations, change accordingly
empirical_data_path = "/RawDataCSV/idiv_share/FIA_Plots_Biomass_11092017.csv"
variogram_path = "/apps/external_plugins/spystats/HEC_runs/results/variogram/data_envelope.csv"
In [11]:
minx = -130
maxx = -60
miny = 24
maxy = 50
In [13]:
data = auxiliary.initAnalysis(empirical_data_path=empirical_data_path,plotdata_path=plotdata_path, maxx=maxx,minx=minx,maxy=maxy,miny=miny)
In [81]:
test_data = data.loc[0:49,:]
In [82]:
model = tools.WhittleVariogram(sill=0.340246718396,range_a=41188.0234423,nugget=0.329937603763,alpha=1.12143687914)
In [83]:
import numpy as np
X = np.linspace(0,1000000,1000)
plt.plot(X,model.corr_f(X))
Out[83]:
In [84]:
vg = tools.Variogram(test_data,'residuals',model=model)
MM = vg.calculateCovarianceMatrix()
In [85]:
lmod = linear_model.GLS.from_formula(formula='logBiomass ~ logSppN',data=test_data,sigma=MM)
In [86]:
results = lmod.fit()
In [ ]:
In [87]:
results.summary()
Out[87]:
In [88]:
MM
Out[88]:
In [89]:
DD = vg.distance_coordinates
In [90]:
correl_mat = vg.model.corr_f(DD)
In [91]:
correl_mat.tofile('/outputs/correl_matf50.csv',sep=",",format="%s")
In [92]:
MM.tofile('/outputs/covar_matf50.csv',sep=",",format="%s")
In [93]:
ls /outputs
In [94]:
MM.tofile?
In [95]:
lmodco = linear_model.GLS.from_formula(formula='logBiomass ~ logSppN',data=test_data,sigma=correl_mat)
In [96]:
res=lmodco.fit()
In [97]:
res.summary()
Out[97]:
In [80]:
len(test_data)
Out[80]:
In [ ]: