Authors: Matthew Perry (mperry@ecotrust.org) and David Diaz (ddiaz@ecotrust.org)
Several research studies have shown that purely statistical approaches to mapping species distributions in response to projected climate changes are commonly observed to over-predict climatic vulnerability and species’ range losses in comparison to process-based models (e.g., Morin & Thuiller 2009; Keenan et al. 2011; Cheaib et al. 2012).
Most bioclimatic envelope models are trained using observations (plots) of the presence and absence of a given species. The resulting model effectively considers all absences as being due to explantory climatic factors. However, there are many non-climatic factors which might explain the absence of a species in a forest inventory plot such as historical or recent management activity, competition from other species, disturbances, soils, sampling bias or sampling error.
For example, although it is well-known that Douglas-fir is capable of growing in the Willamette Valley of western Oregon, there is a high occurrence of plots with Douglas-fir absent in the Willamette Valley. The absence of Douglas-fir is likely due to human land-use, competition with other species, and similar factors that are not related to the climatic suitability. In other words, they are a false negatives. When these false negatives are used to train predictive models, if/when future climatic conditions in other areas begin to resemble those of the Willamette Valley (or other places with 'false negatives'), the statistical models will under-predict suitability for the species.
Here we present a general methodology for attenuating the influence of these false negatives. In essence, we adjust our interpretation of observed absences by using auxillary information (from process-based models or expert judgement) to classify observed absences as either:
False negatives are then removed from the training data and are not considered when we develop our model. This yeilds a more robust model which more closely matches the fundamental climatic niche of the species and is therefore more appropriate for predicting the impact of climate change on future species distributions.
In this exploratory notebook, we have closely followed the Random Forest-based approach used by Crookston et al. (2010) that currently drives the calculation of climatic suitability scores that are factored into Climate-FVS. We will attempt to replicate the results using publically available Forest Inventory Analysis (FIA) plot information. It is important to note that the spatial precision for public FIA plot data has been degraded to protect the privacy of land owners. In some cases, these plots might be shifted by as much as 1 mile. As Crookston et. al. had access to the original (non-degraded) spatial data, the results are not directly comparable.
We will demostrate the workflow using python along with the following packages to perform the analysis:
We can map the FIA plots with Douglas-fir absence (GRIDCODE==0
)
In [1]:
import geopandas as gpd
fia = gpd.GeoDataFrame.from_file('data/DF_PNW.shp')
fia.head(10)
Out[1]:
In [2]:
print("Coordinate reference system is: {}".format(fia.crs))
print("{} observations with {} columns".format(*fia.shape))
We can map the FIA plots with Douglas-fir absence (GRIDCODE==0
)
In [3]:
%matplotlib inline
fia[fia.GRIDCODE == 0].plot()
Out[3]:
And map the FIA plots with Douglas-fir present (GRIDCODE==1
)
In [4]:
fia[fia.GRIDCODE == 1].plot()
Out[4]:
Next, we will load up the Douglas Fir (Pseudotsuga menziesii) map from Elbert Little (citation?) which we can use as "expert judgement" to define the current suitable range.
In [5]:
df_little = gpd.GeoDataFrame.from_file("data/pseumenz.shp")
df_little.head()
Out[5]:
In [6]:
df_little.plot()
Out[6]:
Finally, we will load the results of a process-model approach (3PG) employed by Coops et al. (2009; 2011). This is a raster dataset which can be accessed using rasterio
In [7]:
import rasterio
import numpy as np
with rasterio.open("data/df_mask_hist.img") as ds:
meta_coops = ds.meta
df_coops = np.ma.masked_equal(ds.read_band(1), meta_coops['nodata'])
np.histogram(df_coops.compressed())
Out[7]:
In [8]:
from pylab import plt
def plotit(x, title, cmap="Blues"):
plt.imshow(x, cmap=cmap, interpolation='nearest')
plt.colorbar()
plt.title(title)
plt.show()
plotit(df_coops, "Current Douglas-fir Suitability, Coops et al. ")
Using the Little maps as "expert judgement" we will model the current climatic envelope of Douglas-fir. To do this, we can use a spatial join
to determine which plots are inside or outside of Little's mapped Douglas-fir range.
First, let's apply a buffer in order to eliminate any edge effects along the coast and then select only those polygons where df is predicted to be viable (CODE == 1
) and.
Note that the buffer
operation on a GeoDataFrame
currently returns a GeoSeries
. This is a known bug. In order to work around it, we need to copy the old dataframe and set it's geometry explicity.
In [9]:
df_little_buff = df_little.copy()
df_little_buff['geometry'] = df_little.buffer(0.01)
df_little_buff = df_little_buff[df_little_buff.CODE == 1]
df_little_buff.plot()
Out[9]:
Now we can perform the spatial join (sjoin
) between our plots and the Little maps. This will give us a GeoDataFrame
with all fia plot point geometries, the GRIDCODE
column indicating presence/abscence from the fia data, and the CODE
column incidating if the point is within the range as defined by Little.
In [10]:
from geopandas.tools import sjoin
fia_little = sjoin(fia, df_little_buff, how="left")
fia_little.head()
Out[10]:
Our false negatives will be those points where the Douglas-fir is asbsent but the range is suitable according to Little. Specifically, we will keep only the points where GRIDCODE == 1
or CODE != 1
(either Douglas-fir is observerved or it's not in the suitable range)
GRIDCODE | |||
1 | 0 | ||
CODE | 1 | keep | *remove* |
NA | keep | keep |
In [11]:
orig_shape = fia_little.shape
# Masking: remove false negatives (rmfn)
fia_little_rmfn = fia_little[(fia_little.GRIDCODE == 1) | (fia_little.CODE != 1)]
new_shape = fia_little_rmfn.shape
print("Removed false negatives from FIA plots\nWent from {} points to {}.".format(orig_shape[0], new_shape[0]))
In this section we will use the fia plots and climate data provided by USGS TODO to predict a map of current climatic suitability for Douglas-fir. We will use a similar approach to Crookston et. al. 2010; using a Random Forest supervised classification to predict presence/absence based on explanatory climatic variables for which we have raster data for our study area.
First we will perform the analysis using the fia
dataframe (the original point dataset before we filtered out false negatives) and then compare the results with the fia_little_rmfn
(after removing false negatives).
In [12]:
import os
explanatory_fields = [
'd100',
'dd0',
'dd5',
'fday',
'ffp',
'gsdd5',
'gsp',
'map',
'mat_tenths',
'mmax_tenths',
'mmindd0',
'mmin_tenths',
'mtcm_tenths',
'mtwm_tenths',
'sday']
explanatory_rasters = [os.path.join('data', 'raster', "current_" + r + ".img") for r in explanatory_fields]
We will pyimpute
to prep the data and generate the raster maps of suitability.
In [13]:
from pyimpute import load_training_vector
train_xs, train_y = load_training_vector(fia, explanatory_rasters, response_field='GRIDCODE')
train_xs.shape, train_y.shape
Out[13]:
And we will use scikit-learn
to implement the Random Forest model itself.
In [14]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=10, n_jobs=1)
clf.fit(train_xs, train_y)
Out[14]:
Let's evaluate the accuracy of the classification
In [15]:
from pyimpute import evaluate_clf
evaluate_clf(clf, train_xs, train_y)
Finally, let's use the classifier to impute the current range of Douglas-fir (i.e. generate raster maps). Though the Random Forest model attempts to classify each pixel as suitable/unsuitable, it also provides a certainty estimate which is interpreted as a relative suitability score with 0 being completely unsuitable and 1 being completely suitable.
In [16]:
from pyimpute import impute, load_targets
target_xs, raster_info = load_targets(explanatory_rasters)
impute(target_xs, clf, raster_info,
outdir="_df_output_current",
class_prob=True, certainty=True)
The output consists of a responses.img
raster which is the predicted class (0 or 1) and probability_1.img
which is our current climatic suitability index.
In [17]:
with rasterio.open("_df_output_current/probability_1.img") as ds:
current_df = ds.read_band(1)
plotit(current_df, "Douglas-fir Suitability (unmasked)", cmap="Greens")
Let's zoom in to the Willamette Valley and check the predicted suitability. We see that there is very low values as many of the FIA plots did not contain Douglas-fir due to management, agriculture, etc. By incorporating these false negatives we have effectively trained the model to consider the current climate of the Willamette valley as unsuitable for Douglas-fir when it is most likely that non-climatic factors are the likely cause of absence.
In [18]:
plotit(current_df[550:850, 600:800], "Douglas-fir Suitability, Willamette (unmasked)", cmap="Greens")
Now let's repeat the analysis but this time use the FIA plots that have been filtered to exclude false negatives based on expert judgement in the form of Elbert Little's maps. (fia_little_rmfn
)
In [19]:
train_xs_rmfn, train_y_rmfn = load_training_vector(fia_little_rmfn, explanatory_rasters, response_field='GRIDCODE')
train_xs_rmfn.shape, train_y_rmfn.shape
Out[19]:
In [20]:
clf_rmfn = RandomForestClassifier(n_estimators=10, n_jobs=3)
clf_rmfn.fit(train_xs_rmfn, train_y_rmfn)
evaluate_clf(clf, train_xs_rmfn, train_y_rmfn)
In [21]:
target_xs_rmfn, raster_info_rmfn = load_targets(explanatory_rasters)
impute(target_xs, clf_rmfn, raster_info_rmfn,
outdir="_df_output_current_rmfn",
class_prob=True, certainty=True)
In [22]:
with rasterio.open("_df_output_current_rmfn/probability_1.img") as ds:
current_df_rmfn = ds.read_band(1)
plotit(current_df_rmfn, "Douglas-fir Suitability (masked)", cmap="Greens")
plotit(current_df_rmfn[550:850, 600:800], "Douglas-fir Suitability, Willamette (masked)", cmap="Greens")
This masked model with false negatives removed is not as adept at showing variation in the current spatial distribution. But it is more appropriate for showing where the climate could support Douglas-fir.
The preceding plots demonstrate a subtle difference in interpretation and purpose. The first set (with all absences considered) is trying to map the realized niche, and is answering the question
Where does this species exist?
For mapping current species distributions, this approach may be valid. However, it is problematic to assume that this non-masked relationship between species and climate will hold true in different locations when projected onto future climate scenarios. In other words, projecting the non-masked model onto future climate scenarios assumes that the non-climatic factors (which partly explain the distribution) will migrate in step with the climate.
It is clear that we need to modify the approach to model the fundamental climatic niche of the species. By masking false negatives, we are answering the question
Where could this species exist considering only climatic factors?
When we project this climatic niche onto future climate scenarios, the resulting map is not one of species distribution (i.e. it does not define where Douglas-fir will exist) but of climatic suitability (i.e. where the climatic niche of douglas fir will exist).
If the soils, human managment, lack of seed sources or any other non-climatic factors prevent Douglas-fir from growing, any change in climatic suitability will hardly matter. Conversely if the non-climatic factors are just right for Douglas-fir but the area is shifting out of the climatic niche due to climate change, it may be vulnerable to increased mortality.
In this section we'll demonstrate the application of the models to map future climatic suitability and compare the results of non-masked and masked models.
First, load the target raster data for the Ensemble of GCMs under the 6.0 representitive concentration pathway (moderate emmissions) for the year 2090:
In [23]:
# Swap out for future climate rasters
future_explanatory_rasters = [os.path.join('data', 'raster', "Ensemble_rcp60_y2090_{}.img".format(r))
for r in explanatory_fields]
target_xs, raster_info = load_targets(future_explanatory_rasters)
impute(target_xs, clf, raster_info, outdir="_df_output_Ensemble_rcp60_y2090",
linechunk=400, class_prob=True, certainty=True)
In [24]:
with rasterio.open("_df_output_Ensemble_rcp60_y2090/probability_1.img") as ds:
df_Ensemble_rcp60_y2090 = ds.read_band(1)
plotit(df_Ensemble_rcp60_y2090, "2090 Douglas-fir Suitability (unmasked)", cmap="Greens")
We can take the difference between current and future suitability to get a visual sense of the predicted change
In [25]:
loss = current_df - df_Ensemble_rcp60_y2090
loss[loss < 0] = 0
plotit(loss, "Loss of Douglas-fir suitability (unmasked)", cmap="Blues")
According to this interpretation, in the southern edge of it's current range, Douglas-fir would appear to become entirely unsuitable. In the western coastal ranges, suitability would decrease significantly.
Contrast this future projection of Douglas-fir suitability with the masked version.
In [26]:
impute(target_xs, clf_rmfn, raster_info, outdir="_df_output_Ensemble_rcp60_y2090_rmfn",
linechunk=400, class_prob=True, certainty=True)
In [27]:
with rasterio.open("_df_output_Ensemble_rcp60_y2090_rmfn/probability_1.img") as ds:
df_Ensemble_rcp60_y2090_rmfn = ds.read_band(1)
plotit(df_Ensemble_rcp60_y2090_rmfn, "2090 Douglas-fir Suitability (masked)", cmap="Greens")
In [28]:
loss_rmfn = current_df_rmfn - df_Ensemble_rcp60_y2090_rmfn
loss_rmfn[loss_rmfn < 0] = 0
plotit(loss_rmfn, "Loss of Douglas-fir suitability (masked)", cmap="Blues")
If we compare the non-masked to masked results for 2090, we can see where they over and under predict relative to one another. Of note is the fact that the masked plots predict sustaintially higher future suitability in Douglas-fir's current/historical range.
In [29]:
diff = df_Ensemble_rcp60_y2090_rmfn - df_Ensemble_rcp60_y2090
# Smoothing
from scipy import ndimage
diff_smooth = ndimage.filters.gaussian_filter(diff, 1.5, mode='nearest')
plotit(diff_smooth, "Diff in Douglas-fir Suitability (masked - unmasked)", cmap="jet")
Cheaib, A., Badeau, V., Boe, J., Chuine, I., Delire, C., Dufrêne, E., François, C., Gritti, E.S., Legay, M., Pagé, C., Thuiller, W., Viovy, N. & Leadley, P. (2012)
Climate change impacts on tree ranges: model intercomparison facilitates understanding and quantification of uncertainty: Understanding and quantification of uncertainties of climate change impacts on tree range. Ecology Letters, 15, 533–544.
Coops, N.C., Waring, R.H., Beier, C., Roy-Jauvin, R. & Wang, T. (2011) Modeling the occurrence of 15 coniferous tree species throughout the Pacific Northwest of North America using a hybrid approach of a generic process -based growth model and decision tree analysis: Modeling the occurrence of fifteen coniferous tree species. Applied Vegetation Science, 14, 402–414.
Coops, N.C., Waring, R.H. & Schroeder, T.A. (2009) Combining a generic process-based productivity model and a statistical classification method to predict the presence and absence of tree species in the Pacific Northwest, U.S.A. Ecological Modelling, 220, 1787–1796.
Crookston, N.L., Rehfeldt, G.E., Dixon, G.E. & Weiskittel, A.R. (2010) Addressing climate change in the forest vegetation simulator to assess impacts on landscape forest dynamics. Forest Ecology and Management, 260, 1198–1211.
Keenan, T., Maria Serra, J., Lloret, F., Ninyerola, M. & Sabate, S. (2011) Predicting the future of forests in the Mediterranean under climate change, with niche- and process-based models: CO2 matters! Global Change Biology, 17, 565–579.
Morin, X. & Thuiller, W. (2009) Comparing niche-and process-based models to reduce prediction uncertainty in species range shifts under climate change. Ecology, 90, 1301–1313.