Here I process the entire region in chunks.


In [1]:
# Load Biospytial modules and etc.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## Use the ggplot style
plt.style.use('ggplot')

In [2]:
from external_plugins.spystats import tools
%run ../testvariogram.py


/opt/conda/envs/biospytial/lib/python2.7/site-packages/IPython/core/pylabtools.py:168: DtypeWarning: Columns (24) have mixed types. Specify dtype option on import or set low_memory=False.
  safe_execfile(fname,*where,**kw)

In [3]:
section.shape


Out[3]:
(1841, 46)

Algorithm for processing Chunks

  1. Make a partition given the extent
  2. Produce a tuple (minx, maxx, miny, maxy) for each element of the partition
  3. Calculate the semivariogram for each chunk and save it in a dataframe
  4. Plot everything
  5. Do the same with a Matérn kernel
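The partition in steps 1–2 can be sketched as follows. This is a minimal Python sketch, assuming (as in the cells below) that `np.linspace(..., retstep=True)` supplies both the grid nodes and the step size, and that each cell's bounding box is the node plus one step in each direction; `partition_extent` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np

def partition_extent(minx, maxx, miny, maxy, n):
    """Split an extent into an n x n grid of (minx, maxx, miny, maxy) cells."""
    # step 1: n grid nodes per axis, keeping the step size
    xp, dx = np.linspace(minx, maxx, n, retstep=True)
    yp, dy = np.linspace(miny, maxy, n, retstep=True)
    # step 2: one bounding-box tuple per grid node
    return [(x, x + dx, y, y + dy) for x in xp for y in yp]

cells = partition_extent(0.0, 90.0, 0.0, 45.0, n=10)
```

With `n = 30` as below, this yields 900 cells, matching `len(tuples)` in the notebook.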

In [4]:
minx,maxx,miny,maxy = getExtent(new_data)

In [5]:
maxy


Out[5]:
1556957.5046647713

In [6]:
## Let's build the partition
N = 30
xp,dx = np.linspace(minx,maxx,N,retstep=True)
yp,dy = np.linspace(miny,maxy,N,retstep=True)

In [7]:
xx,yy = np.meshgrid(xp,yp)

In [8]:
coordinates_list = [ (xx[i][j],yy[i][j]) for i in range(N) for j in range(N)]

In [9]:
from functools import partial
tuples = map(lambda (x,y) : partial(getExtentFromPoint,x,y,step_sizex=dx,step_sizey=dy)(),coordinates_list)
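The `lambda (x,y)` form above relies on Python 2 tuple-parameter unpacking, which was removed in Python 3 (PEP 3113). A plain list comprehension is the portable equivalent; here `get_extent_from_point` is a stand-in with assumed behaviour (lower-left corner plus one step per axis), since the real `getExtentFromPoint` lives in the notebook's helper script.

```python
def get_extent_from_point(x, y, step_sizex, step_sizey):
    # stand-in for the notebook's getExtentFromPoint (behaviour assumed):
    # the cell whose lower-left corner is (x, y)
    return (x, x + step_sizex, y, y + step_sizey)

coordinates_list = [(0.0, 0.0), (1.0, 2.0)]
tuples = [get_extent_from_point(x, y, step_sizex=1.0, step_sizey=2.0)
          for (x, y) in coordinates_list]
```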

In [10]:
len(tuples)


Out[10]:
900

In [11]:
chunks = map(lambda (mx,Mx,my,My) : subselectDataFrameByCoordinates(new_data,'newLon','newLat',mx,Mx,my,My),tuples)
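The subselection step keeps only the rows of `new_data` whose coordinates fall inside a cell's bounding box. A self-contained sketch of that semantics (assumed for `subselectDataFrameByCoordinates`, whose source is not shown here) using a boolean mask:

```python
import pandas as pd

def subselect_by_coordinates(df, lon_col, lat_col, mx, Mx, my, My):
    # assumed semantics of subselectDataFrameByCoordinates:
    # keep rows whose point falls inside the bounding box
    mask = df[lon_col].between(mx, Mx) & df[lat_col].between(my, My)
    return df[mask]

df = pd.DataFrame({'newLon': [0.5, 1.5, 2.5], 'newLat': [0.5, 1.5, 2.5]})
inside = subselect_by_coordinates(df, 'newLon', 'newLat', 0.0, 2.0, 0.0, 2.0)
```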

In [12]:
len(chunks)


Out[12]:
900

In [13]:
## Here we can filter based on a threshold
threshold = 10
chunks_non_empty = filter(lambda df : df.shape[0] > threshold ,chunks)
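One portability note: in Python 3, `filter` returns a lazy iterator rather than a list, so `len(chunks_non_empty)` would fail there. A list comprehension is the eager, reusable equivalent (toy chunk sizes below are illustrative only):

```python
import pandas as pd

chunks = [pd.DataFrame({'x': range(n)}) for n in (5, 15, 30)]
threshold = 10
# list comprehension instead of filter(): eager and reusable in Python 3
chunks_non_empty = [df for df in chunks if df.shape[0] > threshold]
```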

In [14]:
len(chunks_non_empty)


Out[14]:
372

In [15]:
lengths = pd.Series(map(lambda ch : ch.shape[0],chunks_non_empty))

In [16]:
lengths.plot.hist()


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc265eff3d0>

In [17]:
variograms = map(lambda chunk : tools.Variogram(chunk,'residuals1',using_distance_threshold=200000),chunks_non_empty)

In [18]:
vars = map(lambda v : v.calculateEmpirical(),variograms)
## calculateEnvelope returns a dataframe with lags, variogram, envlow and envhigh
vars = map(lambda v : v.calculateEnvelope(num_iterations=50),variograms)

Take the average of the empirical variograms, together with their envelopes.

We will group the results by the lags field.
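The cells below actually average by concatenating the per-chunk columns side by side and taking row means; since every chunk shares the same lag bins, a `groupby` on `lags` gives the same averages. A minimal sketch with toy tables (the real ones come from `calculateEnvelope`):

```python
import pandas as pd

# toy per-chunk variogram tables with a shared set of lags
v1 = pd.DataFrame({'lags': [0, 1], 'variogram': [0.2, 0.4]})
v2 = pd.DataFrame({'lags': [0, 1], 'variogram': [0.4, 0.6]})

# stack every chunk and average per lag
stacked = pd.concat([v1, v2])
mean_variogram = stacked.groupby('lags')['variogram'].mean()
```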


In [19]:
envslow = pd.concat(map(lambda df : df[['envlow']],vars),axis=1)
envhigh = pd.concat(map(lambda df : df[['envhigh']],vars),axis=1)
variogram = pd.concat(map(lambda df : df[['variogram']],vars),axis=1)

In [20]:
lags = vars[0][['lags']]

In [21]:
meanlow = list(envslow.apply(lambda row : np.mean(row),axis=1))
meanhigh = list(envhigh.apply(np.mean,axis=1))
meanvariogram = list(variogram.apply(np.mean,axis=1))
results = pd.DataFrame({'meanvariogram':meanvariogram,'meanlow':meanlow,'meanhigh':meanhigh})

In [22]:
result_envelope = pd.concat([lags,results],axis=1)

In [23]:
meanvg = tools.Variogram(section,'residuals1')

In [24]:
meanvg.plot()



In [25]:
meanvg.envelope.columns


Out[25]:
Index([u'envhigh', u'envlow', u'lags', u'variogram'], dtype='object')

In [26]:
result_envelope.columns


Out[26]:
Index([u'lags', u'meanhigh', u'meanlow', u'meanvariogram'], dtype='object')

In [27]:
result_envelope.columns = ['lags','envhigh','envlow','variogram']

In [28]:
meanvg.envelope = result_envelope

In [31]:
meanvg.plot(refresh=False)



In [ ]: