This notebook builds on the ideas that Bryce Nordgren posted to
http://nbviewer.ipython.org/gist/bnordgren/47eee80aae57deb303a0
In short, we found that Bryce's changes were still not sufficient to produce the gdalinfo metadata we expected to see, nor to correctly geolocate the geotiff file he produced when it is opened in ArcGIS.
Like Bryce, I am working with this file:
ftp://sidads.colorado.edu/pub/DATASETS/nsidc0531_MEASURES_nhsnow_weekly100/2003/nhtsw100e2_20030107_20030113.nc
N.B. Bryce observes that this file is "supposed to be in EASE-Grid 2.0 format." To begin, I would like to clarify that EASE-Grid 2.0 is not a file format; it is a projection definition. Gridded data in EASE-Grid 2.0 can be stored in any file format, including flat binary, HDF, netCDF and geoTIFF. There are three EASE-Grid 2.0 projections, all in the equal-area family: the Northern and Southern Hemisphere Lambert azimuthal equal-area projections, and the global cylindrical equal-area projection (with true latitude at +/- 30 degrees).
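For reference, the three EASE-Grid 2.0 projections can be summarized as a small table of EPSG codes and proj.4 strings. This is my own summary, not taken from the file's metadata; the codes and parameters should be verified against epsg-registry.org before relying on them:

```python
# The three EASE-Grid 2.0 projections: EPSG codes and proj.4 strings.
# (Listed for reference; verify against epsg-registry.org.)
ease2_projections = {
    "EASE-Grid 2.0 North": {
        "epsg": 6931,
        "proj4": "+proj=laea +lat_0=90 +lon_0=0 +x_0=0 +y_0=0 "
                 "+datum=WGS84 +units=m +no_defs",
    },
    "EASE-Grid 2.0 South": {
        "epsg": 6932,
        "proj4": "+proj=laea +lat_0=-90 +lon_0=0 +x_0=0 +y_0=0 "
                 "+datum=WGS84 +units=m +no_defs",
    },
    "EASE-Grid 2.0 Global": {
        "epsg": 6933,
        "proj4": "+proj=cea +lat_ts=30 +lon_0=0 +x_0=0 +y_0=0 "
                 "+datum=WGS84 +units=m +no_defs",
    },
}

for name in sorted(ease2_projections):
    print("%s -> EPSG:%d" % (name, ease2_projections[name]["epsg"]))
```

The file used in this notebook is on the Northern Hemisphere grid, i.e. the Lambert azimuthal case centered on the North Pole.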
Our tests with gdal utilities, ENVI (5.2) and ArcMap (v10) indicate that the following alterations to the original .nc file will enable correct geolocation in these tools. We believe that the files need:
A very interesting source for future reference, on differences between what I saw in epsg wkt and esri/gdal/ogc wkt:
First issue: the original file produces this from gdalinfo (on merged_snow_cover_extent):

Warning 1: dimension #2 (cols) is not a Longitude/X dimension.
Warning 1: dimension #1 (rows) is not a Latitude/Y dimension.

and the output Coordinate System is undefined, there is no origin/pixel size, and the corner points are wrong:

Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  180.0)
Upper Right (  180.0,    0.0)
Lower Right (  180.0,  180.0)
Center      (   90.0,   90.0)
1) On my first attempt at adding rows and cols coordinate variables, I made them int16, which is too small for the range of values in meters. This eliminated the warnings from gdalinfo, the output Coordinate System is now defined (correctly, I think), and there are now rows and cols dimension variables, but there is still no origin/pixel size and the Corner Coordinates are unchanged:

Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  180.0)
Upper Right (  180.0,    0.0)
Lower Right (  180.0,  180.0)
Center      (   90.0,   90.0)

Further, the dimension variables contain junk (I think because python is casting the floats or doubles into int16s):

test = d.variables['rows'][::]
print test
[ 28432 -2640 31824 752 -30320 4144 -26928 7536 -23536 10928 -20144 14320...
2) Next I tried a sufficiently large type, float32, for the dimension variables. This eliminated the warnings from gdalinfo; the output Coordinate System is defined (correctly, I think) along with the correct origin/pixel size, there are now rows and cols dimension variables, and the Corner Coordinates are correct:

Corner Coordinates:
Upper Left  (-9000000.000, 9000000.000)
Lower Left  (-9000000.000,-9000000.000)
Upper Right ( 9000000.000, 9000000.000)
Lower Right ( 9000000.000,-9000000.000)
Center      (   0.0000000,   0.0000000)
Further, the dimension variables contain what I set (which goes from bottom-up):

test = d.variables['rows'][::]
print test
[-8950000. -8850000. -8750000. -8650000. -8550000. -8450000. -8350000.
However,

gdal_translate -of GTiff -b 1 NETCDF:nhtsw100e2_20030107_20030113.coordvar.nc:merged_snow_cover_extent out.tif

complains with this:

Input file size is 180, 180
0ERROR 1: nBlockYSize = 180, only 1 supported when reading bottom-up dataset
ERROR 1: NETCDF:nhtsw100e2_20030107_20030113.coordvar.nc:merged_snow_cover_extent, band 1: IReadBlock failed at X offset 0, Y offset 0
ERROR 1: GetBlockRef failed at X block offset 0, Y block offset 0
3) So I tried defining the rows variable with the largest values first (the UL corner). Now the gdalinfo output matches what I got in 2), and there is no error from gdal_translate. However, running gdalinfo on the .nc and .tif files returns different things:
vagrant@vsnowyski:~/ipynb$ diff gdalinfo.coordvar_yUL.nc.txt gdalinfo.coordvar_yUL.tif.txt
1,2c1,2
< Driver: netCDF/Network Common Data Format
---
> Driver: GTiff/GeoTIFF
> Files: snowcover.coordvar_yUL.tif
10d9
< TOWGS84[0,0,0,0,0,0,0],
12,15c11,12
< PRIMEM["Greenwich",0,
<     AUTHORITY["EPSG","8901"]],
< UNIT["degree",0.0174532925199433,
<     AUTHORITY["EPSG","9108"]],
---
> PRIMEM["Greenwich",0],
> UNIT["degree",0.0174532925199433],
21c18,20
< PARAMETER["false_northing",0]]
---
> PARAMETER["false_northing",0],
> UNIT["metre",1,
>     AUTHORITY["EPSG","9001"]]]
24a24
> AREA_OR_POINT=Area
85,94c85,86
< Geolocation:
<   LINE_OFFSET=0
<   LINE_STEP=1
<   PIXEL_OFFSET=0
<   PIXEL_STEP=1
<   SRS=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9108"]],AUTHORITY["EPSG","4326"]]
<   X_BAND=1
<   X_DATASET=NETCDF:"nhtsw100e2_20030107_20030113.coordvar_yUL.nc":longitude
<   Y_BAND=1
<   Y_DATASET=NETCDF:"nhtsw100e2_20030107_20030113.coordvar_yUL.nc":latitude
---
> Image Structure Metadata:
>   INTERLEAVE=BAND
101c93
< Band 1 Block=180x180 Type=Byte, ColorInterp=Undefined
---
> Band 1 Block=180x45 Type=Byte, ColorInterp=Gray
The only thing that bothers me here is this line in the .tif:
Band 1 Block=180x45 Type=Byte, ColorInterp=Gray
I would expect it to be 180x180.
However, ArcGIS reads it fine.
In [1]:
import mpl_toolkits.basemap.pyproj as pyproj
import netCDF4 as nc
import os
import numpy as np
from mpl_toolkits.basemap import Basemap
I open the file for writing and examine the coordinate system information. I can see the information about the Northern Hemisphere EASE-Grid 2.0 projection, which looks fine.
In [78]:
d.close()
In [79]:
os.chdir("/home/vagrant/ipynb/")
d = nc.Dataset('nhtsw100e2_20030107_20030113.nc', 'r+')
In [80]:
print d
In [76]:
test = d.variables['rows'][::]
print test
In [81]:
coords = d.variables['coord_system']
print coords
I believe Bryce was correct in stating that the problem was that row and column coordinate variables were missing. Bryce outlined an empirical method for determining the values needed, but I know a couple of things about this projection: the origin of the projection is the North Pole, which is located at the intersection of the 4 center cells, and the file-level metadata tells me that the resolution is 100 km. It is customary in projection mathematics to assume a right-handed Cartesian coordinate system positioned at the origin, with X increasing to the right and Y increasing up.
In order to set the row and column coordinate values to the centers of each cell, I calculate the values for each axis by multiplying each cell index (offset from the grid center) by the map scale (100000. m). The additional offset of 50000 m positions each value at the center of its cell.
In [82]:
map_scale_x_meters = 100000.
ncols = len(d.dimensions['cols'])
x_meters = ( np.arange(ncols) - (ncols/2) ) * map_scale_x_meters + (map_scale_x_meters/2)
print x_meters
print type(x_meters[0])
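As a sanity check (my addition, not part of the original workflow): for a 180-column, 100 km grid centered on the pole, the first and last cell centers should land at -8950000 and 8950000 m, and the outer grid edges at +/- 9000000 m, matching the corner coordinates gdalinfo reported above:

```python
import numpy as np

map_scale_x_meters = 100000.0
ncols = 180  # from the file's 'cols' dimension

# Cell-center coordinates: index offset from the grid center, scaled,
# plus half a cell to land on the cell center rather than its edge.
x_meters = (np.arange(ncols) - (ncols // 2)) * map_scale_x_meters \
    + map_scale_x_meters / 2.0

assert x_meters[0] == -8950000.0   # center of leftmost cell
assert x_meters[-1] == 8950000.0   # center of rightmost cell
assert x_meters[0] - map_scale_x_meters / 2.0 == -9000000.0   # left edge
assert x_meters[-1] + map_scale_x_meters / 2.0 == 9000000.0   # right edge
```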
I could calculate y_meters similarly, but since this projection is azimuthal and symmetric, I can just set the y_meters values to be the same as the x_meters values. It turns out, however, that the order of the values is important.
At first I did the same thing Bryce did, which was to order the x values from left to right and the y values from top to bottom. This would mean:
y_meters = x_meters[::-1]
However, as Bryce points out, this ordering requires the gdal utilities to assume a default orientation for the gridded data that is flipped top to bottom, which means gdal_translate must be called with "--config GDAL_NETCDF_BOTTOMUP NO".
I experimented with this and found that setting y_meters with the smallest values first means that I can call gdal utilities without this config switch.
In [83]:
x_meters = x_meters.astype(np.float32)
print x_meters
print type(x_meters[0])
In [84]:
# Use this order (smallest y coordinates first)
# if the data arrays are left alone in the file
# doing this means that gdal_translate doesn't have to be called with
# "--config GDAL_NETCDF_BOTTOMUP NO" to override the gdal drivers
#y_meters = x_meters
# However, I think the correct way to set up these files for
# gdal drivers to interpret them correctly would be to reverse the
# actual data arrays by row, and then use this for the
# y coordinates:
y_meters = x_meters[::-1]
# It's not clear to me whether this is a CF issue or a gdal convention.
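A minimal sketch of what "reversing the actual data arrays by row" would mean, using a small dummy array in place of the real snow cover grid (the array here is illustrative, not data from the file):

```python
import numpy as np

# Dummy 4x4 grid standing in for a bottom-up data array
# (row 0 = southernmost row, as stored in the original file).
data_bottomup = np.arange(16).reshape(4, 4)

# Flip the rows so that row 0 becomes the northernmost (UL) row,
# consistent with y coordinates ordered largest-first.
data_topdown = data_bottomup[::-1, :]

assert (data_topdown[0] == data_bottomup[-1]).all()
# The flip is its own inverse, so flipping back recovers the original.
assert (data_topdown[::-1, :] == data_bottomup).all()
```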
Now I do the same thing Bryce did to define the row and column coordinate variables, with the expected standard_name attributes.
In [85]:
# make the rows coordinate variable
# note the variable name is the same as the dimension name
# the standard_name set to 'projection_y_coordinate'
# is needed for gdal to understand the geolocation
rows_var = d.createVariable('rows',np.float32, ('rows',))
# give it the expected attributes
rows_var.standard_name = 'projection_y_coordinate'
rows_var.axis = 'Y'
rows_var.units = 'meters'
# write the values to the variable
rows_var[:] = y_meters
In [86]:
# make the cols coordinate variable similarly
cols_var = d.createVariable('cols',np.float32, ('cols',))
# give it the expected attributes
cols_var.standard_name = 'projection_x_coordinate'
cols_var.axis = 'X'
cols_var.units = 'meters'
# write the values to the variable
cols_var[:] = x_meters
In [88]:
print d
print d.variables['rows'][::]
However, I found that the coordinate variables alone still didn't produce gdalinfo metadata with the correct corner values, so:
In [ ]:
# Add spatial reference and geotransform to coordinate system
# I lifted the following wkt from a geotiff that I created using python gdal tools, where
# I specified the projection with the proj.4 string.
#coords.spatial_ref = "PROJCS[\"unnamed\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9108\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",90],PARAMETER[\"longitude_of_center\",0],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"Meter\",1]]"
#coords.spatial_ref = \
# "PROJCS[\"WGS 84 / NSIDC EASE-Grid 2.0 North\"," \
# + " GEOGCS[\"WGS 84\"," \
# + " DATUM[\"WGS_1984\"," \
# + " SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]]," \
# + " TOWGS84[0,0,0,0,0,0,0]," \
# + " AUTHORITY[\"EPSG\",\"6326\"]]," \
# + " PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]]," \
# + " UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9108\"]]," \
# + " AUTHORITY[\"EPSG\",\"4326\"]]," \
# + " PROJECTION[\"Lambert_Azimuthal_Equal_Area\"]," \
# + " PARAMETER[\"latitude_of_center\",90]," \
# + " PARAMETER[\"longitude_of_center\",0]," \
# + " PARAMETER[\"false_easting\",0]," \
# + " PARAMETER[\"false_northing\",0]," \
# + " UNIT[\"Meter\",1]]"
# This is the wkt for epsg 6931, taken directly from the epsg-registry.org site.
#coords.spatial_ref = \
# "PROJCS[\"WGS 84 / NSIDC EASE-Grid 2.0 North\"," \
# + " GEOGCS[\"WGS 84\"," \
# + " DATUM[\"WGS_1984\"," \
# + " SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]]," \
# + " TOWGS84[0,0,0,0,0,0,0]," \
# + " AUTHORITY[\"EPSG\",\"6326\"]]," \
# + " PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]]," \
# + " UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9108\"]]]," \
# + " PROJECTION[\"Lambert Azimuthal Equal Area\"," \
# + " PARAMETER[\"Latitude of natural origin\",90,ANGLEUNIT[\"degree\",0.01745329252]]," \
# + " PARAMETER[\"Longitude of natural origin\",0,ANGLEUNIT[\"degree\",0.01745329252]]," \
# + " PARAMETER[\"False easting\",0,LENGTHUNIT[\"metre\",1.0]]," \
# + " PARAMETER[\"False northing\",0,LENGTHUNIT[\"metre\",1.0]]]," \
# + " UNIT[\"Meter\",1]," \
# + " AUTHORITY[\"EPSG\",\"6931\"]]"
#coords.GeoTransform = "-9000000 100000 0 9000000 0 -100000 "
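The commented GeoTransform string above need not be typed by hand; it can be derived from the cell-center coordinates defined earlier. This sketch (my addition) recomputes it, moving half a cell outward from the first cell centers to reach the outer corner of the upper-left cell:

```python
import numpy as np

map_scale = 100000.0
n = 180

# Cell centers: x increasing left-to-right, y decreasing top-down.
x_meters = (np.arange(n) - n // 2) * map_scale + map_scale / 2.0
y_meters = x_meters[::-1]

# GDAL GeoTransform: (UL x, pixel width, 0, UL y, 0, pixel height),
# where UL is the outer corner of the upper-left cell.
dx = x_meters[1] - x_meters[0]   # +100000: pixel width
dy = y_meters[1] - y_meters[0]   # -100000: pixel height (negative, top-down)
geotransform = (x_meters[0] - dx / 2.0, dx, 0.0,
                y_meters[0] - dy / 2.0, 0.0, dy)

print(geotransform)
# -> (-9000000.0, 100000.0, 0.0, 9000000.0, 0.0, -100000.0)
```

This reproduces the "-9000000 100000 0 9000000 0 -100000" string in the commented assignment above.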
In [89]:
#close the file to make sure everything is written
d.close()
print "Done"
In [ ]: