Here I process the entire region in chunks.


In [1]:
# Load Biospytial modules and dependencies.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## Use the ggplot style
plt.style.use('ggplot')

In [2]:
from external_plugins.spystats import tools
%run ../testvariogram.py


/opt/conda/envs/biospytial/lib/python2.7/site-packages/IPython/core/pylabtools.py:168: DtypeWarning: Columns (24) have mixed types. Specify dtype option on import or set low_memory=False.
  safe_execfile(fname,*where,**kw)

In [3]:
section.shape


Out[3]:
(1841, 46)

Algorithm for processing chunks

  1. Make a partition given the extent.
  2. Produce a tuple (minx, maxx, miny, maxy) for each element of the partition.
  3. Calculate the semivariogram for each chunk (the estimator is given below) and save it in a dataframe.
  4. Plot everything.
  5. Do the same with a Matérn kernel.
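
For reference, step 3 uses the empirical semivariogram. Assuming spystats implements the standard Matheron estimator, this is

\hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \left( z_i - z_j \right)^2

where N(h) is the set of point pairs separated by a distance falling in the lag bin around h, and z_i is the residual value at point i.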

In [4]:
minx,maxx,miny,maxy = getExtent(new_data)
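
getExtent comes from testvariogram.py, loaded with %run above. A minimal sketch of its presumed behavior, assuming new_data stores projected coordinates in the 'newLon' and 'newLat' columns used later:

def getExtent(df, lon_col='newLon', lat_col='newLat'):
    ## Bounding box of the points as (minx, maxx, miny, maxy).
    ## Sketch only; the real helper is defined in testvariogram.py.
    return (df[lon_col].min(), df[lon_col].max(),
            df[lat_col].min(), df[lat_col].max())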

In [5]:
maxy


Out[5]:
1556957.5046647713

In [6]:
## Let's build the partition
N = 30
xp,dx = np.linspace(minx,maxx,N,retstep=True)
yp,dy = np.linspace(miny,maxy,N,retstep=True)

In [7]:
xx,yy = np.meshgrid(xp,yp)

In [8]:
coordinates_list = [ (xx[i][j],yy[i][j]) for i in range(N) for j in range(N)]

In [9]:
tuples = map(lambda (x,y) : getExtentFromPoint(x,y,step_sizex=dx,step_sizey=dy), coordinates_list)
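
getExtentFromPoint is also defined in testvariogram.py. Given that len(tuples) below equals 900 (one tuple per grid point), a plausible sketch is that each point is treated as the lower-left corner of a dx-by-dy cell:

def getExtentFromPoint(x, y, step_sizex, step_sizey):
    ## (minx, maxx, miny, maxy) of the cell anchored at (x, y).
    ## Sketch only; see testvariogram.py for the actual definition.
    return (x, x + step_sizex, y, y + step_sizey)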

In [10]:
len(tuples)


Out[10]:
900

In [11]:
chunks = map(lambda (mx,Mx,my,My) : subselectDataFrameByCoordinates(new_data,'newLon','newLat',mx,Mx,my,My),tuples)
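
subselectDataFrameByCoordinates also lives in testvariogram.py; presumably it is a plain bounding-box mask over the two coordinate columns. A sketch:

def subselectDataFrameByCoordinates(df, lon_col, lat_col, minx, maxx, miny, maxy):
    ## Rows of df whose coordinates fall inside the bounding box.
    ## Sketch only; the actual helper is defined in testvariogram.py.
    mask = ((df[lon_col] >= minx) & (df[lon_col] <= maxx) &
            (df[lat_col] >= miny) & (df[lat_col] <= maxy))
    return df[mask]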

In [12]:
len(chunks)


Out[12]:
900

In [13]:
## Here we can filter based on a threshold
threshold = 10
chunks_non_empty = filter(lambda df : df.shape[0] > threshold ,chunks)

In [14]:
len(chunks_non_empty)


Out[14]:
372

In [15]:
lengths = pd.Series(map(lambda ch : ch.shape[0],chunks_non_empty))

In [16]:
lengths.plot.hist()


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1ca1aaf6d0>

In [59]:
variograms = map(lambda chunk : tools.Variogram(chunk,'residuals1',using_distance_threshold=200000),chunks_non_empty[:30])

In [60]:
## Compute the empirical variogram for each chunk, then its Monte Carlo envelope;
## only the envelope dataframes (lags, variogram, envlow, envhigh) are kept.
vars = map(lambda v : v.calculateEmpirical(),variograms)
vars = map(lambda v : v.calculateEnvelope(num_iterations=50),variograms)
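
calculateEnvelope(num_iterations=50) returns, per chunk, a dataframe with lags, variogram, envlow and envhigh columns (see Out[113] further down). Such envelopes are typically built by Monte Carlo permutation; a conceptual sketch of that idea, not the spystats internals (empirical_fn is a hypothetical callable returning one semivariance value per lag):

import numpy as np
import pandas as pd

def permutationEnvelope(df, col, empirical_fn, num_iterations=50):
    ## Permute the variable over the fixed locations, recompute the
    ## empirical variogram each time, and take per-lag quantiles.
    sims = []
    for _ in range(num_iterations):
        shuffled = df.copy()
        shuffled[col] = np.random.permutation(shuffled[col].values)
        sims.append(empirical_fn(shuffled))
    sims = pd.DataFrame(sims)  # one row per iteration, one column per lag
    return pd.DataFrame({'envlow': sims.quantile(0.025),
                         'envhigh': sims.quantile(0.975)})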

Take the average of the empirical variograms, together with their envelopes.

All chunks share the same lag bins, so we can align the variograms column-wise and average across chunks row by row; an equivalent group-by on the lags field is sketched below.
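
Since every element of vars carries the same four columns (lags, envlow, envhigh, variogram), stacking them long-wise and grouping on lags gives the same per-lag averages in one step; a sketch:

stacked = pd.concat(vars, axis=0)
mean_envelope = (stacked.groupby('lags')[['envlow','envhigh','variogram']]
                 .mean()
                 .reset_index())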


In [118]:
envslow = pd.concat(map(lambda df : df[['envlow']],vars),axis=1)
envhigh = pd.concat(map(lambda df : df[['envhigh']],vars),axis=1)
variogram = pd.concat(map(lambda df : df[['variogram']],vars),axis=1)

In [119]:
lags = vars[0][['lags']]

In [120]:
meanlow = list(envslow.apply(np.mean,axis=1))
meanhigh = list(envhigh.apply(np.mean,axis=1))
meanvariogram = list(variogram.apply(np.mean,axis=1))
results = pd.DataFrame({'meanvariogram':meanvariogram,'meanlow':meanlow,'meanhigh':meanhigh})

In [121]:
result_envelope = pd.concat([lags,results],axis=1)

In [110]:
## Build a variogram object on the whole section so we can reuse its plot
## method with the averaged envelope computed above.
meanvg = tools.Variogram(section,'residuals1')

In [111]:
meanvg.plot()



In [113]:
meanvg.envelope.columns


Out[113]:
Index([u'envhigh', u'envlow', u'lags', u'variogram'], dtype='object')

In [122]:
result_envelope.columns


Out[122]:
Index([u'lags', u'meanhigh', u'meanlow', u'meanvariogram'], dtype='object')

In [123]:
## Rename the averaged columns to match the envelope schema shown in Out[113].
result_envelope.columns = ['lags','envhigh','envlow','variogram']

In [124]:
meanvg.envelope = result_envelope

In [129]:
meanvg.plot(refresh=False)



In [127]:
meanvg.envelope


Out[127]:
lags envhigh envlow variogram
0 0.000000 0.493957 0.232742 0.319624
1 4489.795918 0.482763 0.261160 0.305599
2 8979.591837 0.481754 0.259424 0.337892
3 13469.387755 0.437602 0.286370 0.352717
4 17959.183673 0.451231 0.285533 0.356507
5 22448.979592 0.441574 0.283620 0.359633
6 26938.775510 0.456625 0.272039 0.335856
7 31428.571429 0.445692 0.288721 0.376069
8 35918.367347 0.430606 0.293097 0.373970
9 40408.163265 0.445407 0.286467 0.391480
10 44897.959184 0.456072 0.281728 0.372273
11 49387.755102 0.451602 0.288407 0.391042
12 53877.551020 0.461087 0.279264 0.373798
13 58367.346939 0.469905 0.270644 0.341670
14 62857.142857 0.461197 0.279282 0.398030
15 67346.938776 0.480583 0.271864 0.417322
16 71836.734694 0.468033 0.270205 0.427869
17 76326.530612 0.460169 0.273277 0.405162
18 80816.326531 0.475759 0.258388 0.424669
19 85306.122449 0.505846 0.247591 0.349006
20 89795.918367 0.485888 0.250038 0.442487
21 94285.714286 0.507817 0.246693 0.425742
22 98775.510204 0.486828 0.254242 0.410957
23 103265.306122 0.508068 0.236977 0.375554
24 107755.102041 0.506166 0.246554 0.427081
25 112244.897959 0.493029 0.251642 0.380368
26 116734.693878 0.476342 0.255027 0.371280
27 121224.489796 0.498147 0.236925 0.424712
28 125714.285714 0.476313 0.249644 0.355673
29 130204.081633 0.479220 0.257324 0.419672
30 134693.877551 0.470906 0.249311 0.381958
31 139183.673469 0.495696 0.235298 0.353450
32 143673.469388 0.505793 0.224136 0.399837
33 148163.265306 0.453241 0.258426 0.323854
34 152653.061224 0.477970 0.227335 0.373931
35 157142.857143 0.520589 0.197887 0.301297
36 161632.653061 0.613553 0.126106 0.310328
37 166122.448980 0.695026 0.117874 0.212953
38 170612.244898 0.605350 0.225374 0.287225
39 175102.040816 0.585072 0.162777 0.822125
40 179591.836735 0.817694 0.056600 0.169444
41 184081.632653 NaN NaN NaN
42 188571.428571 NaN NaN NaN
43 193061.224490 NaN NaN NaN
44 197551.020408 NaN NaN NaN
45 202040.816327 NaN NaN NaN
46 206530.612245 NaN NaN NaN
47 211020.408163 NaN NaN NaN
48 215510.204082 NaN NaN NaN

In [ ]: