In [13]:
# Load Biospytial modules and etc.
%matplotlib inline
import sys
sys.path.append('/apps')
import django
django.setup()
import pandas as pd
import matplotlib.pyplot as plt
## Use the ggplot style
plt.style.use('ggplot')

In [14]:
# My mac
#data = pd.read_csv("/RawDataCSV/plotsClimateData_11092017.csv")
# My Linux desktop
data = pd.read_csv("/RawDataCSV/idiv_share/plotsClimateData_11092017.csv")

In [15]:
data.SppN.mean()


Out[15]:
5.1824128104220382

In [5]:
import geopandas as gpd

In [6]:
from django.contrib.gis import geos
from shapely.geometry import Point

In [7]:
data['geometry'] = data.apply(lambda z: Point(z.LON, z.LAT), axis=1)
new_data = gpd.GeoDataFrame(data)

Let´s reproject to Alberts or something with distance


In [8]:
new_data.crs = {'init':'epsg:4326'}

Uncomment to reproject

proj string taken from: http://spatialreference.org/


In [9]:
#new_data =  new_data.to_crs("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs ")

Model Fitting Using a GLM

The general model will have the form: $$ SppRichness(x,y) = Z(x,y) + \epsilon $$ Where: $Z(x,y)$ is the Spatial Autocorrelation process and $\epsilon \sim N(0,\sigma^2)$


In [10]:
##### OLD #######
len(data.lon)
#X = data[['AET','StandAge','lon','lat']]
#X = data[['SppN','lon','lat']]
X = data[['lon','lat']]
#Y = data['plotBiomass']
Y = data[['SppN']]
## First step in spatial autocorrelation
#Y = pd.DataFrame(np.zeros(len(Y)))
## Let´s take a small sample only for the spatial autocorrelation
#import numpy as np
#sample_size = 2000
#randindx = np.random.randint(0,X.shape[0],sample_size)
#nX = X.loc[randindx]
#nY = Y.loc[randindx]
nX = X
nY = Y

In [139]:
## Small function for systematically selecting the k-th element of the data.
#### Sughgestion use for now a small k i.e. 10
systematic_selection = lambda k : filter(lambda i : not(i % 10) ,range(len(data)))

In [140]:
idx = systematic_selection(7)
nX = X.loc[idx]
nY = Y.loc[idx]
new_data = data.loc[idx]

In [141]:
# Import GPFlow
import GPflow as gf
k = gf.kernels.Matern12(2, lengthscales=1, active_dims = [0,1] )

In [142]:
model = gf.gpr.GPR(nX.as_matrix(),nY.as_matrix().reshape(len(nY),1).astype(float),k)

In [12]:
new_data.hist('SppN')


Out[12]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f3dfe511a10>]], dtype=object)

Fitted parameters (From HEC)


In [143]:
model.kern.lengthscales = 25.4846122373
model.kern.variance = 10.9742076021
model.likelihood.variance = 4.33463026664

In [144]:
%time mm = k.compute_K_symm(X.as_matrix())


CPU times: user 47 s, sys: 3.8 s, total: 50.8 s
Wall time: 14.4 s

In [145]:
import numpy as np
Nn = 500
dsc = data
predicted_x = np.linspace(min(dsc.lon),max(dsc.lon),Nn)
predicted_y = np.linspace(min(dsc.lat),max(dsc.lat),Nn)
Xx, Yy = np.meshgrid(predicted_x,predicted_y)
## Fake richness
fake_sp_rich = np.ones(len(Xx.ravel()))
predicted_coordinates = np.vstack([ Xx.ravel(), Yy.ravel()]).transpose()
#predicted_coordinates = np.vstack([section.SppN, section.newLon,section.newLat]).transpose()

In [146]:
len(predicted_coordinates)


Out[146]:
250000

In [20]:
#We will calculate everything with the new model and parameters
#model = gf.gpr.GPR(X.as_matrix(),Y.as_matrix().reshape(len(Y),1).astype(float),k)

In [147]:
%time means,variances = model.predict_y(predicted_coordinates)


CPU times: user 5min 58s, sys: 6.96 s, total: 6min 5s
Wall time: 5min 22s
#USING k-partition = 10 import cartopy plt.figure(figsize=(17,11)) proj = cartopy.crs.PlateCarree() ax = plt.subplot(111, projection=proj) ax = plt.axes(projection=proj) #algo = new_data.plot(column='SppN',ax=ax,cmap=colormap,edgecolors='') #ax.set_extent([-93, -70, 30, 50]) ax.set_extent([-125, -60, 20, 50]) #ax.set_extent([-95, -70, 25, 45]) #ax.add_feature(cartopy.feature.LAND) ax.add_feature(cartopy.feature.OCEAN) ax.add_feature(cartopy.feature.COASTLINE) ax.add_feature(cartopy.feature.BORDERS, linestyle=':') ax.add_feature(cartopy.feature.LAKES, alpha=0.9) ax.stock_img() #ax.add_geometries(new_data.geometry,crs=cartopy.crs.PlateCarree()) #ax.add_feature(cartopy.feature.RIVERS) mm = ax.pcolormesh(Xx,Yy,means.reshape(Nn,Nn),transform=proj ) #cs = plt.contour(Xx,Yy,np.sqrt(variances).reshape(Nn,Nn),linewidths=2,cmap=plt.cm.Greys_r,linestyles='dotted') cs = plt.contour(Xx,Yy,means.reshape(Nn,Nn),linewidths=2,colors='k',linestyles='dotted',levels=[4.0,5.0,6.0,7.0,8.0]) plt.clabel(cs, fontsize=16,inline=True,fmt='%1.1f') #ax.scatter(new_data.lon,new_data.lat,edgecolors='',cmap=plt.cm.Accent,alpha=0.6) plt.colorbar(mm) plt.title("Predicted Species Richness") #(x.LON > -90) & (x.LON < -80) & (x.LAT > 40) & (x.LAT < 50)

In [74]:



Out[74]:
<matplotlib.text.Text at 0x7f71b27f8c50>

Predictions with +2std.Dev


In [173]:
#Using k-partition = 7
-


Out[173]:
<matplotlib.text.Text at 0x7f71b13d6310>

Predicted means


In [174]:
#Using k-partition = 7
import cartopy
plt.figure(figsize=(17,11))

proj = cartopy.crs.PlateCarree()
ax = plt.subplot(111, projection=proj)


ax = plt.axes(projection=proj)
#algo = new_data.plot(column='SppN',ax=ax,cmap=colormap,edgecolors='')


#ax.set_extent([-93, -70, 30, 50])
ax.set_extent([-125, -60, 20, 50])
#ax.set_extent([-95, -70, 25, 45])

#ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.add_feature(cartopy.feature.COASTLINE)
ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
ax.add_feature(cartopy.feature.LAKES, alpha=0.9)
ax.stock_img()
#ax.add_geometries(new_data.geometry,crs=cartopy.crs.PlateCarree())
#ax.add_feature(cartopy.feature.RIVERS)
mm = ax.pcolormesh(Xx,Yy,means.reshape(Nn,Nn),transform=proj )
#cs = plt.contour(Xx,Yy,np.sqrt(variances).reshape(Nn,Nn),linewidths=2,cmap=plt.cm.Greys_r,linestyles='dotted')
cs = plt.contour(Xx,Yy,means.reshape(Nn,Nn),linewidths=2,colors='k',linestyles='dotted',levels=range(1,20))
plt.clabel(cs, fontsize=16,inline=True,fmt='%1.1f')
#ax.scatter(new_data.lon,new_data.lat,edgecolors='',color='white',alpha=0.6)
plt.colorbar(mm)
plt.title("Predicted Species Richness")


Out[174]:
<matplotlib.text.Text at 0x7f71afcc6f50>

In [150]:
#Using k-partition = 7
import cartopy
plt.figure(figsize=(17,11))

proj = cartopy.crs.PlateCarree()
ax = plt.subplot(111, projection=proj)


ax = plt.axes(projection=proj)
#algo = new_data.plot(column='SppN',ax=ax,cmap=colormap,edgecolors='')


#ax.set_extent([-93, -70, 30, 50])
ax.set_extent([-125, -60, 20, 50])
#ax.set_extent([-95, -70, 25, 45])

#ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.add_feature(cartopy.feature.COASTLINE)
ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
ax.add_feature(cartopy.feature.LAKES, alpha=0.9)
ax.stock_img()
#ax.add_geometries(new_data.geometry,crs=cartopy.crs.PlateCarree())
#ax.add_feature(cartopy.feature.RIVERS)
mm = ax.pcolormesh(Xx,Yy,means.reshape(Nn,Nn) - (2* np.sqrt(variances).reshape(Nn,Nn)),transform=proj )
#cs = plt.contour(Xx,Yy,np.sqrt(variances).reshape(Nn,Nn),linewidths=2,cmap=plt.cm.Greys_r,linestyles='dotted')
cs = plt.contour(Xx,Yy,means.reshape(Nn,Nn) - (2 * np.sqrt(variances).reshape(Nn,Nn)),linewidths=2,colors='k',linestyles='dotted',levels=[4.0,5.0,6.0,7.0,8.0])
plt.clabel(cs, fontsize=16,inline=True,fmt='%1.1f')
#ax.scatter(new_data.lon,new_data.lat,edgecolors='',color='white',alpha=0.6)
plt.colorbar(mm)
plt.title("Predicted Species Richness - 2stdev")


Out[150]:
<matplotlib.text.Text at 0x7f71b008c650>

Model Analysis


In [151]:
model.get_parameter_dict()


Out[151]:
{'name.kern.lengthscales': array([ 25.48461224]),
 'name.kern.variance': array([ 10.9742076]),
 'name.likelihood.variance': array([ 4.33463027])}

Let's calculate the residuals


In [152]:
X_ = data[['LON','LAT']]
%time Y_hat = model.predict_y(X_)


CPU times: user 53.5 s, sys: 1.36 s, total: 54.8 s
Wall time: 48.3 s

In [153]:
pred_y = pd.DataFrame(Y_hat[0])
var_y = pd.DataFrame(Y_hat[1])

In [154]:
new_data['pred_y'] = pred_y
new_data['var_y'] = var_y

In [155]:
new_data= new_data.assign(error=lambda y : (y.SppN - y.pred_y)**2 )

In [156]:
new_data.error.hist(bins=50)


Out[156]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71b04d0e10>

In [157]:
print(new_data.error.mean())
print(new_data.error.std())


4.22043980672
7.01996573172

Experiment

In this section we will bring a Raster Data from the US, using Biospytial Raster API.

  1. First select a polygon, then get A Raster FRom there, say Mean Temperature.

In [2]:
import raster_api.tools as rt
from raster_api.models import MeanTemperature,ETOPO1,Precipitation,SolarRadiation
from sketches.models import Country

In [3]:
## Select US
us_border = Country.objects.filter(name__contains='United States')[1]

In [167]:
from django.db import close_old_connections

In [168]:
close_old_connections()

In [4]:
#Get Raster API
us_meantemp = rt.RasterData(Precipitation,us_border.geom)
us_meantemp.getRaster()



OperationalErrorTraceback (most recent call last)
<ipython-input-4-68b631962010> in <module>()
      1 #Get Raster API
      2 us_meantemp = rt.RasterData(Precipitation,us_border.geom)
----> 3 us_meantemp.getRaster()

/apps/raster_api/tools.pyc in getRaster(self, **bandnumber)
    131         # First filter by border
    132         #self.model = self.model.filter(rast__intersect_with=self.geometry)
--> 133         agg_dic = self.model.aggregate(raster=Union('rast',geometry=self.geometry,**bandnumber))
    134         raster = aggregateDictToRaster(aggregate_dic=agg_dic)
    135         self.rasterdata = raster

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/models/query.pyc in aggregate(self, *args, **kwargs)
    354             if not query.annotations[alias].contains_aggregate:
    355                 raise TypeError("%s is not an aggregate expression" % alias)
--> 356         return query.get_aggregation(self.db, kwargs.keys())
    357 
    358     def count(self):

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/models/sql/query.pyc in get_aggregation(self, using, added_aggregate_names)
    455         outer_query.select_related = False
    456         compiler = outer_query.get_compiler(using)
--> 457         result = compiler.execute_sql(SINGLE)
    458         if result is None:
    459             result = [None for q in outer_query.annotation_select.items()]

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/models/sql/compiler.pyc in execute_sql(self, result_type)
    833         cursor = self.connection.cursor()
    834         try:
--> 835             cursor.execute(sql, params)
    836         except Exception:
    837             cursor.close()

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/backends/utils.pyc in execute(self, sql, params)
     77         start = time()
     78         try:
---> 79             return super(CursorDebugWrapper, self).execute(sql, params)
     80         finally:
     81             stop = time()

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/backends/utils.pyc in execute(self, sql, params)
     62                 return self.cursor.execute(sql)
     63             else:
---> 64                 return self.cursor.execute(sql, params)
     65 
     66     def executemany(self, sql, param_list):

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/utils.pyc in __exit__(self, exc_type, exc_value, traceback)
     92                 if dj_exc_type not in (DataError, IntegrityError):
     93                     self.wrapper.errors_occurred = True
---> 94                 six.reraise(dj_exc_type, dj_exc_value, traceback)
     95 
     96     def __call__(self, func):

/opt/conda/envs/biospytial/lib/python2.7/site-packages/django/db/backends/utils.pyc in execute(self, sql, params)
     62                 return self.cursor.execute(sql)
     63             else:
---> 64                 return self.cursor.execute(sql, params)
     65 
     66     def executemany(self, sql, param_list):

OperationalError: SSL SYSCALL error: EOF detected

In [124]:
us_meantemp.display_field()



In [110]:
%time coords = us_meantemp.getCoordinates()


CPU times: user 5min 10s, sys: 8.22 s, total: 5min 19s
Wall time: 5min 19s

In [111]:



Out[111]:
67795328

In [ ]:
plot