Hypsometric analysis as the study of land elevations about a given datum can be traced back to the works of German Geographer Albrecht Penck$^1$, although its modern implementation is usually related to a seminal paper by A.N.Strahler$^2$. The area-altitude distribution can be shown as hypsographic or hypsometric curves. The hypsographic curve uses absolute units of measure, where elevation is plotted on the ordinate and the area above a given elevation on the abscissa. The hypsometric curve uses adimensional axes to show the relation of area an elevation of a point about the total area and maximum elevation of a region$^{3,4,5}$ (Supplementary Figure 1A).
One important point is that both representations are cumulative curves, not simple histograms of elevation distribution. The Empirical Cumulative Distribution Function (ECDF) of a DEM can be used to calculate the accumulated area (or relative area) per elevation and used to construct a hypsometric curve (as in the R package hydroTSM
$^6$). To plot the hypsographic curve, on the other hand, the real area of pixels is needed, and the ECDF cannot be used.
The area of the pixels of a DEM in a 'Latitude-Longitude' projection will decrease towards the pole as the length of an arc of longitude tends to zero at $90^\circ$, and this variation must be considered for a proper hypsometric analysis. This could be achieved by calculating the size of the pixels (as shown in the code below) or by using tools such as the R package raster
$^7$, or the GRASS-GIS$^8$ module r.stats
.
Elsen & Tingley used a data set of "182 expert-delineated mountain ranges" available from Natural Earth. The authors misinterpreted the metadata and stated that the data set is "roughly accurate to 50m". That is not the case. Natural Earth distributes geographical data at three scales: "1:10m" (1:10,000,000), "1:50m" (1:50,000,000) and "1:250m" (1:250,000,000). Despite the use of a lower case "m" to indicate the 1:1,000,000 scale, the documentation is clear:
"Primarily derived from Patterson’s Physical Map of the World. Polygons defined by international team of volunteers. The boundaries of physical regions should be taken with a grain of salt. They are roughly accurate to 50m scale, although the number of features included is to the 10m scale. Use these polygons to for map algebra operations at your own risk!"
The README file for this dataset is available at http://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-physical-labels/ and Tom Patterson's Map can be accessed at http://www.shadedrelief.com/world/index.html.
The maps in Figure 1(B-F) show Natural Earth polygons (in black) and polygons delimiting the same mountain ranges at larger scales (in red) for five of the mountain ranges analysed by Elsen & Tingley:
The differences between the boundaries are considered to be large enough to influence the results obtained by Elsen & Tingley, as it can be seen in the graphics of Supplementary Figure 1(B-F).
In this Supplementary Information I intent to show how the "1:50m" boundaries used to delineate the mountain ranges will influence on the hypsometric analysis. Additionally, Python code is presented for the calculation of hypsographic and hypsometric curves, and for the "hypsographic histograms" used by Elsen & Tingley. The code is presented as an IPython (Jupyter) Notebook, available at GitHub (https://github.com/CarlosGrohmann/hypsometric), where the data
directory contains all necessary GeoTIFFs and shapefiles.
The plots shown in the code are low-resolution examples of the results obtained. The reader is referred to the Supplementary Figure 1 for the final plots of each mountain range analysed here.
The data used in this supplementary information was acquired from the following sources:
Natural Earth - Boundaries of mountain ranges derived from Patterson’s Physical Map of the World and used by Elsen & Tingley. Scale 1:50,000,000. Available at http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_geography_regions_polys.zip (Last access: 2015-06-17)
Alps - Boundary of the Alps from the "Eco-pedological Map for the Alpine Territory" project (ECALP). No scale indicated. Available at http://eusoils.jrc.ec.europa.eu/projects/alpsis/Ecalp_data.html (Last access: 2015-06-17)
Blue Ridge (USA) - Boundary of Blue Ridge range at 1:7,000,000. From: Fenneman, N.M., and Johnson, D.W., 1946, Physiographic Divisions of the United States, U.S. Geological Survey (USGS), Washington, D.C.; Available at http://water.usgs.gov/GIS/metadata/usgswrd/XML/physio.xml (Last access: 2015-06-17)
Cachimbo, Ibiapaba and Espinhaco Ranges (Brazil) - From: IBGE (Brazilian Institute of Geography and Statistics), 2006. Map of Landscape Units of Brazil at 1:5,000,000 (Instituto Brasileiro de Geografia e Estatística, 2006. Mapa de unidades de relevo do Brasil 1:5.000.000). Available at ftp://geoftp.ibge.gov.br/mapas_tematicos/mapas_murais/shapes/relevo/ (Last access: 2015-06-17)
1 - Penck, A., 1894, Morphologie der Erdoberfläche, Stuttgart, J. Engelhorn, 2 vols.
2 - Strahler, A.N., 1952. Hypsometric (area-altitude) analysis of erosional topography. Bulletin of the Geological Society of America, 63, 1117-1142.
3 - Péguy, C.P., 1942. Principes de morphométrie alpine. Revue de Géographie Alpine, 30, 453-486.
4 - Langbein, W.B., 1947. Topographic characteristics of drainage basin. U.S. Geological Survey, Water Supply Paper 968-C, 125-157.
5 - Luo, W., 1998. Hypsometric analysis with a Geographic Information System. Computers & Geosciences, 24, 815-821.
6 - Zambrano-Bigiarini, M., 2014. hydroTSM: Time series management, analysis and interpolation for hydrological modelling. R package. Available at: http://cran.r-project.org/web/packages/hydroTSM/index.html. (Last access: 2015-06-17)
7 - Hijmans, R. J., 2015. raster: Geographic Data Analysis and Modeling. R package. Available at: http://cran.r-project.org/web/packages/raster/index.html. (Last access: 2015-06-17
8 - Neteler, M., Bowman, M.H., Landa, M., Metz, M., 2012. GRASS GIS: A multi-purpose open source GIS. Environmental Modelling & Software, 31, 124-130.
In [1]:
import sys, os
import numpy as np
import math as math
import numpy.ma as ma
from matplotlib import cm
from matplotlib.colors import LightSource
from scipy import ndimage
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
%matplotlib inline
# import osgeo libs after basemap, so it
# won't cause conflicts (Assertion failed..)
# with mannualy-installed GEOS
import gdal, ogr
import shapefile as shpf
Define functions
The haversine formula is used to calculate the distance between two points on a spherical approximation of the Earth. Adapted from http://rosettacode.org/wiki/Haversine_formula#Python. Latitude and Longitude must be in decimal degrees. The value used here is the Mean Radius for Earth as defined by the International Union of Geodesy and Geophysics (IUGG).
In [2]:
# auxiliar functions
def roundBase(x, base=5):
return int(base * round(float(x)/base))
def roundUp(x, base=50):
return int(base * np.ceil(float(x)/base))
def roundDown(x, base=50):
return int(base * np.floor(float(x)/base))
def haversine(lon1, lat1, lon2, lat2, r=6371.009):
R = r # Earth radius in kilometers
dLat = math.radians(lat2 - lat1)
dLon = math.radians(lon2 - lon1)
lat1 = math.radians(lat1)
lat2 = math.radians(lat2)
a = math.sin(dLat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dLon/2)**2
c = 2 * math.asin(math.sqrt(a))
return R * c
Define variables for shapefiles and GeoTIFF
In [5]:
# files
dataDir = './data/'
mountain = 'cachimbo' # 'alps', 'blueRidge', 'espinhaco', 'cachimbo', 'ibiapaba'
mtn = mountain + '.shp'
mtn_NE = mountain + '_NEarth.shp'
tiff = mountain + '.tif'
# label for 5M/7M boundaries
source = 'IBGE'# brazilian maps
# source = 'ECALP' # Alps
# source = 'Fenneman & Johnson 1946' # USA Physiographic Provinces
Import GeoTIFF
In [6]:
rast = gdal.Open(tiff)
rast_band = rast.GetRasterBand(1)
rast_array = rast.ReadAsArray()
rast_stats = rast_band.GetStatistics( True, True )
rast_min = rast_stats[0]
rast_max = rast_stats[1]
Get GeoTransformation parameters, calculate image extents
In [7]:
w_lon, xdim, rot1, n_lat, rot2, ydim = rast.GetGeoTransform()
e_lon = w_lon + xdim * rast.RasterXSize
s_lat = n_lat + ydim * rast.RasterYSize
Load shapefiles (for plotting only)
In [8]:
bound_5M = shpf.Reader(mtn)
bound_5M_lonlat = np.array(bound_5M.shape().points)
bound_NE = shpf.Reader(mtn_NE)
bound_NE_lonlat = np.array(bound_NE.shape().points)
Create basemap with shaded relief image and mountain range boundaries
In [9]:
m = Basemap(projection='merc', llcrnrlat=s_lat, urcrnrlat=n_lat, llcrnrlon=w_lon, \
urcrnrlon=e_lon, resolution='c')
ls = LightSource(azdeg=135,altdeg=25)
rgb = ls.shade(rast_array,plt.cm.Greys)
m_shade = m.imshow(rgb, origin='upper')
m_color = m.imshow(rast_array, origin='upper',cmap=plt.cm.terrain, alpha=0.8, vmin=-150)
bounds = range(0, roundUp(rast_max), 50)
cbar = m.colorbar(size='3%', boundaries=bounds)
cbar.ax.tick_params(labelsize=8)
m.drawmapscale(lon=e_lon-0.8, lat=s_lat+0.5, lon0=e_lon, lat0=s_lat, length=100)
xticks = np.arange(roundBase(w_lon), roundBase(e_lon), 2)
yticks = np.arange(roundBase(s_lat), roundBase(n_lat), 2)
m.drawparallels(yticks, linewidth=0.2, labels=[1,0,0,0], fontsize=9) # draw parallels
m.drawmeridians(xticks, linewidth=0.2, labels=[0,0,1,0], fontsize=9) # draw meridians
m.plot(bound_NE_lonlat[:,0], bound_NE_lonlat[:,1], c='k', label='Natural Earth', latlon=True)
m.plot(bound_5M_lonlat[:,0], bound_5M_lonlat[:,1], c='r', label=source, latlon=True)
lg = plt.legend(loc='upper right', fontsize=9)
lg.get_frame().set_alpha(.8) # A little transparency
# plt.show()
# plt.savefig(mtn + '.pdf', dpi=600, bbox_inches='tight')
# plt.clf()
Mask original raster with shapefiles
Uses external gdalwarp
utility. Pixels outside the boundary polygon will be assigned a -9999
value.
In [10]:
# 5M limits
out_mtn = dataDir + mountain + '_clip_5M.tif'
os.system('gdalwarp -overwrite -dstnodata -9999 -cutline %s %s %s' %(mtn, tiff, out_mtn))
# Natural Earth
out_NE = dataDir + mountain + '_clip_NE.tif'
os.system('gdalwarp -overwrite -dstnodata -9999 -cutline %s %s %s' %(mtn_NE, tiff, out_NE))
Out[10]:
Load clipped rasters
The -9999
value is set to NaN
(Not a Number), in a masked Numpy array.
In [11]:
# 5M
rast_clip = gdal.Open(out_mtn)
clip_bd = rast_clip.GetRasterBand(1)
clip_array = rast_clip.ReadAsArray()
clip_mask = ma.masked_where(clip_array == -9999, clip_array)
# NatEarth
rast_clip_NE = gdal.Open(out_NE)
clip_NE_bd = rast_clip_NE.GetRasterBand(1)
clip_NE_array = rast_clip_NE.ReadAsArray()
clip_NE_mask = ma.masked_where(clip_NE_array == -9999, clip_NE_array)
Set yres
to a positive value
Used to calculate the area of each pixel.
In [12]:
if ydim < 0:
yres = ydim * -1.0
Calculate pixel size (in km) along the N-S direction
This value (dy
) does not change with Latitutde
In [13]:
dy = haversine(0, 0, 0, ydim, r=6371.009)
In [14]:
# array with indices
rows, cols = np.indices(rast_array.shape)
nrows = rast_array.shape[0]
ncols = rast_array.shape[1]
# new array for area values
area_array = np.empty(rast_array.shape)
# nested loop to create array with area values
for row in range(nrows):
for col in range(ncols):
y = row
lat = n_lat - ((y - 0.5) * yres)
dx = haversine(0, lat, xdim, lat, r=6371.009)
area_array[row,col] = dx * dy
In [15]:
# elevation 5M
stats_clip = clip_bd.GetStatistics( True, True )
clip_min = stats_clip[0]
clip_max = stats_clip[1]
# heigh of point/contour above base of basin
clip_array_comp = ma.compressed(clip_mask)
h_clip = clip_array_comp - clip_min
# total height of basin
H_clip = clip_max - clip_min
# normalize elev for hypsometric curve
elevNorm_clip = h_clip / H_clip
# elevation NatEarth
stats_clip_NE = clip_NE_bd.GetStatistics( True, True )
clip_NE_min = stats_clip_NE[0]
clip_NE_max = stats_clip_NE[1]
clip_array_NE_comp = ma.compressed(clip_NE_mask)
h_clip_NE = clip_array_NE_comp - clip_min
H_clip_NE = clip_NE_max - clip_NE_min
elevNorm_clip_NE = h_clip_NE / H_clip_NE
In [16]:
# cell area 5M
area_clip = ma.masked_where(clip_array == -9999, area_array)
# total area of basin/area
area_clip_sum = np.sum(area_clip)
# cumulative area for hypsographyc curve
area_clip_csum = np.cumsum(ma.compressed(area_clip))
# normalized area for hypsometric curve
area_norm_clip = area_clip / area_clip_sum
area_norm_csum = np.cumsum(ma.compressed(area_norm_clip))
# cell area NatEarth
area_clip_NE = ma.masked_where(clip_NE_array == -9999, area_array)
area_clip_sum_NE = np.sum(area_clip_NE)
area_clip_csum_NE = np.cumsum(ma.compressed(area_clip_NE))
area_norm_clip_NE = area_clip_NE / area_clip_sum_NE
area_norm_csum_NE = np.cumsum(ma.compressed(area_norm_clip_NE))
In [19]:
# 5M
plt.plot(area_clip_csum[::-1], np.sort(ma.compressed(clip_mask)), c='r', label=source)
# NatEarth
plt.plot(area_clip_csum_NE[::-1], np.sort(ma.compressed(clip_NE_mask)), c='k', \
label='Natural Earth')
# decorations
plt.ylabel('Elevation')
plt.xlabel('Area km^2')
plt.title('Hypsographic curve for ' + mountain)
# plt.ylim(0.0, 5000.0)
lg = plt.legend(loc='upper right', fontsize=9)
# fighist = mountain + '_hypsographic.pdf'
# plt.savefig(fighist)
# plt.clf()
In [27]:
# 5M
plt.plot(area_norm_csum[::-1], np.sort(ma.compressed(elevNorm_clip)), c='r', label=source)
# NatEarth
plt.plot(area_norm_csum_NE[::-1], np.sort(ma.compressed(elevNorm_clip_NE)), c='k', \
label='Natural Earth')
# decorations
plt.xlim(0.0,1.0)
plt.ylim(0.0,1.0)
plt.ylabel('Elevation: h/H')
plt.xlabel('Area: a/A')
plt.title('Hypsometric curve for ' + mountain)
lg = plt.legend(loc='upper right', fontsize=9)
# fighist = mountain + '_hypsometric.pdf'
# plt.savefig(fighist)
# plt.clf()
In [21]:
# define bins for all histograms
binsize = 50
# 5M
bins_clip = range(0, roundUp(clip_max), binsize)
bincenters = [i + binsize/2 for i in bins_clip]
# Nat Earth
bins_clip_NE = range(0, roundUp(clip_NE_max), binsize)
bincenters_NE = [i + binsize/2 for i in bins_clip_NE]
In [28]:
# 5M
vals, edges = np.histogram(clip_array_comp, bins=bins_clip)
plt.plot(bincenters[:-1], vals, c='r', label='IBGE')
# NatEarth
vals_NE, edges_NE = np.histogram(clip_array_NE_comp, bins=bins_clip_NE)
plt.plot(bincenters_NE[:-1], vals_NE, c='k', label='Natural Earth')
# decorations
plt.ylabel('Elevation frequency (counts)')
plt.xlabel('Elevation (m)')
plt.title('Frequency histograms for ' + mountain)
lg = plt.legend(loc='upper right', fontsize=9)
# plt.show()
# fighist = mountain + '_histogram_frequency.pdf'
# plt.savefig(fighist)
# plt.clf()
In [29]:
# i) approximating area by mean cell size
mean_area_clip = np.mean(area_clip)
mean_area_clip_NE = np.mean(area_clip_NE)
# 5M
vals, edges = np.histogram(clip_array_comp, bins=bins_clip)
plt.plot(bincenters[:-1], vals * mean_area_clip, c='r', label='IBGE')
# NatEarth
vals_NE, edges_NE = np.histogram(clip_array_NE_comp, bins=bins_clip_NE)
plt.plot(bincenters_NE[:-1], vals_NE * mean_area_clip_NE, c='k', label='Natural Earth')
# decorations
plt.ylabel('Area km2 (approx)')
plt.xlabel('Elevation (m)')
plt.title('Area (approx) histograms for ' + mountain)
lg = plt.legend(loc='upper right', fontsize=9)
# plt.show()
# fighist = mountain + '_histogram_area_approx.pdf'
# plt.savefig(fighist)
# plt.clf()
To calculate the area of pixels per elevation, we use the ndimage
function from SciPy. It sums the values in one array (area) based on occurence a second array (elevation). A third array is used as an index (from 0 to max+1).
In [24]:
# ii) calculating area per elevation
# 5M data
clip_range = np.arange(0, int(clip_max)+1)
sum_area_clip = ndimage.sum(area_array, clip_array, clip_range)
# sum the values of areas in each bin
bins_sum = []
for i in bincenters:
low = i - (binsize / 2)
up = i + (binsize / 2)
b_sum = np.sum(sum_area_clip[low:up])
bins_sum.append(b_sum)
# Natural Earth
clip_range_NE = np.arange(0, int(clip_NE_max)+1)
sum_area_clip = ndimage.sum(area_array, clip_NE_array, clip_range_NE)
bins_sum_NE = []
for i in bincenters_NE:
low = i - (binsize / 2)
up = i + (binsize / 2)
b_sum = np.sum(sum_area_clip[low:up])
bins_sum_NE.append(b_sum)
In [30]:
# 5M
plt.plot(bincenters, bins_sum, c='r', label='IBGE')
# Natural Earth
plt.plot(bincenters_NE, bins_sum_NE, c='k', label='Natural Earth')
# decorations
plt.ylabel('Area km2 (calc)')
plt.xlabel('Elevation (m)')
plt.title('Area (calc) histograms for ' + mountain)
lg = plt.legend(loc='upper right', fontsize=9)
# plt.show()
# fighist = mountain + '_histogram_area_calc.pdf'
# plt.savefig(fighist)
# plt.clf()
We can compare both methods and see that approximating the area of pixels by the mean cell size gives results very close to those obtained by calculating the area of each pixel.
In [42]:
# 5M area - calculated
plt.plot(bincenters, bins_sum, c='r', label='calculated')
#5M area - approximated
plt.plot(bincenters[:-1], vals * mean_area_clip, 'o', c='k', ms=4, label='approximated')
# plt.plot(bins_sum[:-1],vals * mean_area_clip, 'ko-')
# decorations
plt.ylabel('Area km2')
plt.xlabel('Elevation (m)')
plt.title('Area histograms for ' + mountain)
lg = plt.legend(loc='upper right', fontsize=9)