EASE Grid 2.0 via Python, pyproj, and gdal_translate

Prelude

This page is necessary because a snow data product I was trying to use is not georeferencing in standard off the shelf tools:

[fedora@lugosi ~]$ gdalinfo NETCDF:nhtsw100e2_20030107_20030113.nc:merged_snow_cover_extent
Warning 1: dimension #2 (cols) is not a Longitude/X dimension.
Warning 1: dimension #1 (rows) is not a Latitude/Y dimension.
Driver: netCDF/Network Common Data Format
Files: nhtsw100e2_20030107_20030113.nc
Size is 180, 180
Coordinate System is `'

Note that the coordinate system is empty and there are a couple of warnings. Turns out, the same thing fixes both problems. Long story short, the file is missing coordinate variables for the rows and cols dimensions. Now it seems obvious that this is what the warning messages were trying to tell me, but at the outset I was too dense to pick up on it.

To experience the entire thread of discovery, and/or learn how to mess around with netCDF using Python, read on. If you just want to see how to fix it, skip ahead to the section "So why isn't it working?"

Exploring the NetCDF file

This section starts with the basics. Open the file, sniff around to see what's in there, and print out some key bits of information. The very first thing to do is to tell Python what libraries we're going to need.


In [4]:
import mpl_toolkits.basemap.pyproj as pyproj
#import pyproj
import netCDF4 as nc
import os
import numpy as np
from mpl_toolkits.basemap import Basemap

Before opening this notebook, I grabbed an example data file from the National Snow and Ice Data Center. (ftp://sidads.colorado.edu/pub/DATASETS/nsidc0531_MEASURES_nhsnow_weekly100/2003/nhtsw100e2_20030107_20030113.nc) This file is supposed to be in what's known as EASE Grid 2.0 format. Unfortunately, no known GIS tools (commercial or open source) read the file. Our objective here is to come up with a method of converting the file to something that the tools can read.

So the very first step is to change to the directory containing the file using os.chdir. It works much as you should expect. Give it the full path to where you want to be.


In [5]:
os.chdir("/home/vagrant/ipynb/")

Now it's time to open the file. When we imported the NetCDF library above, we told python to associate it with the abbreviation "nc". To open the file, we need to use a function called "Dataset" in the netcdf library. All we need to do is tell it what filename we're interested in. This looks like the following:


In [36]:
d = nc.Dataset('nhtsw100e2_20030107_20030113.nc')

Now we have a variable "d" that we can use to access the contents of the netCDF file. The NetCDF library is pretty smart. It doesn't try to load the whole file into memory at this point, as many NetCDF files can be pretty large. You can use this "d" variable to look at the descriptive metadata, get a sense for the size of the grids, or eventually, ask for the actual data values. For the moment, let's just do the blindingly simple thing and "print" the variable to the terminal.


In [38]:
#print d
print d.variables['rows'][::]


[-8950004.        -8850004.        -8750004.        -8650004.        -8550004.
 -8450004.        -8350003.5       -8250003.5       -8150003.5       -8050003.5
 -7950003.5       -7850003.5       -7750003.5       -7650003.5       -7550003.5
 -7450003.5       -7350003.        -7250003.        -7150003.        -7050003.
 -6950003.        -6850003.        -6750003.        -6650003.        -6550003.
 -6450003.        -6350003.        -6250003.        -6150003.        -6050002.5
 -5950002.5       -5850002.5       -5750002.5       -5650002.5       -5550002.5
 -5450002.5       -5350002.5       -5250002.5       -5150002.5       -5050002.5
 -4950002.5       -4850002.        -4750002.        -4650002.        -4550002.
 -4450002.        -4350002.        -4250002.        -4150002.        -4050002.
 -3950001.75      -3850001.75      -3750001.75      -3650001.75      -3550001.75
 -3450001.75      -3350001.5       -3250001.5       -3150001.5       -3050001.5
 -2950001.5       -2850001.5       -2750001.25      -2650001.25      -2550001.25
 -2450001.25      -2350001.25      -2250001.25      -2150001.        -2050001.125
 -1950001.        -1850001.        -1750001.        -1650000.875
 -1550000.875     -1450000.875     -1350000.75      -1250000.75      -1150000.75
 -1050000.625      -950000.625      -850000.5625     -750000.5625
  -650000.5        -550000.4375     -450000.40625    -350000.375
  -250000.328125   -150000.28125     -50000.2421875    49999.796875
   149999.84375     249999.875       349999.90625     449999.96875
   550000.          650000.0625      750000.0625      850000.125
   950000.1875     1050000.25       1150000.25       1250000.25
  1350000.375      1450000.375      1550000.375      1650000.5        1750000.5
  1850000.5        1950000.625      2050000.625      2150000.75       2250000.75
  2350000.75       2450000.75       2550000.75       2650000.75       2750001.
  2850001.         2950001.         3050001.         3150001.         3250001.
  3350001.25       3450001.25       3550001.25       3650001.25       3750001.25
  3850001.25       3950001.5        4050001.5        4150001.5        4250001.5
  4350001.5        4450001.5        4550001.5        4650001.5        4750001.5
  4850002.         4950002.         5050002.         5150002.         5250002.
  5350002.         5450002.         5550002.         5650002.         5750002.
  5850002.         5950002.         6050002.5        6150002.5        6250002.5
  6350002.5        6450002.5        6550002.5        6650002.5        6750002.5
  6850002.5        6950002.5        7050002.5        7150002.5        7250003.
  7350003.         7450003.         7550003.         7650003.         7750003.
  7850003.         7950003.         8050003.         8150003.         8250003.
  8350003.         8450003.         8550003.         8650003.         8750003.
  8850003.         8950003.       ]

This simple command gives us a lot of information. This particular file has a lot of metadata, which is all printed out. Some files will have more and some will have less. For the moment, just notice that the "Conventions" attribute specifies "CF-1.6". That means that the rest of the attributes should be interpreted using the CF v1.6 specification found on http://cfconventions.org/. The real meat and potatoes of a NetCDF file are the "dimensions" and "variables". All useful NetCDF files will have dimensions and variables.

Dimensions have a name and a size. We can see from the above that this file has three dimensions (time, rows, cols). The time dimension has a size of 1, while rows and cols both have a size of 180. Dimensions in and of themselves are not very interesting. They exist mainly to specify the shape of gridded data stored in Variables.

Variables have a name, a data type, and zero or more dimensions. If a variable has zero dimensions (like "coord_system", above), it is a scalar and can only store one value. Otherwise, the dimensions specify the shape of the grid. The time variable has one dimension, also named time. Latitude and longitude have two dimensions (rows and cols), and everything else except "coord_system" has three dimensions (time, rows, cols).

Starting simple, let's look at the "time" variable.

Coordinate Variables


In [8]:
print d.variables['time']


<type 'netCDF4.Variable'>
int32 time(time)
    calendar: gregorian
    axis: T
    units: days since 1966-10-03
    long_name: time
    standard_name: time
unlimited dimensions: time
current shape = (1,)
filling on, default _FillValue of -2147483647 used

Printing an individual variable gives quite a bit more information about the variable than just printing out an overview of the file. This information tends to be directly related to the variable itself. We can see the variable is a 32 bit integer. Since it has a dimension of time, and the time dimension has a size of 1, we expect that the current_shape would be a single dimension of one value. The rest of the attributes are human friendly: you as a human can pretty much figure it out just by looking.

Lets check if the contents of the time variable meet our expectations. From the filename, this file contains data for the week of Jan 7-13, 2003. Without being too careful about leap years and such, let's check to see if the time variable contains a number that works out to be about the number of days that would put us in Jan 2003 if we were to start counting on Oct 3, 1966.


In [ ]:
print d.variables['time'][0]

Here for the first time, we've asked for actual data from the file. To get data, you index the variable of interest. In the above, we asked for the first value in the time array (index 0).

Now we just have to see if 13251 days puts us in 2003.


In [ ]:
full_years = 365.25 * (2002-1966)
oct_to_jan = np.sum([(31-3),30,31]) # rest of oct, nov, dec
print full_years, full_years + oct_to_jan
print d.variables['time'][0] - (full_years + oct_to_jan)

Our faith in the ability of computers to count is affirmed. All hail our cyber overlords. The day specified is Jan 13, 2003 plus or minus 1 day for alignment of leap years with our time window.

The attributes "units", "standard_name", and "axis" are part of the CF 1.6 convention that the file respects. And since this variable is in the file adhering to CF 1.6, the variable adheres to it too. These attributes have special meanings, and you have to read the convention specification to find out what that meaning is.

One more term to remember is "coordinate variable". A one dimensional variable that has the same name as it's dimension is a coordinate variable. Hence, the time variable is the only coordinate variable in this file. Coordinate variables are how you translate from unitless grid indices to real values along the axis. In the above, index 0 for the time dimension stands for Jan 13, 2003. This relationship holds for any variable in this file which uses the time dimension.

Auxiliary Coordinate Variables

Sometimes, the grid cells are not evenly spaced. The latitude and longitude variables for this EASE Grid 2.0 are examples of this. The data are spaced at even 100km intervals in the geospatial projection, which does not translate to nicely gridded lat/lon data. Auxiliary coordinate variables are 2D variables which can be used to lookup the latitude and longitude of any row/col pair.


In [ ]:
print d.variables['latitude']

This variable stores the latitude of each grid cell. The fact that "units" is "degrees_north" and/or the value "latitude" for the standard_name attribute may be used to determine that this is an auxiliary coordinate variable, and it contains the latitude values. The name of the variable is not special. It could be called 'i_do_not_contain_latitudes', but if one or more of the special attributes contained the magic latitude-indicating values, it would still be interpreted as latitudes.

Let's see what's inside.


In [ ]:
print d.variables['latitude'][0,0]

Because this is a 2D variable, we had to provide a 2D index. The -- means that the latitude value at index [0,0] is nodata. The EASE Grid is a polar projection which looks more or less like a circle. The corners don't have data, so we really should have expected this. While we're here, let's load the whole latitude array into memory because it's small.


In [9]:
lats = d.variables['latitude'][:]
print lats.shape


(180, 180)

This gives us a new variable called "lats". It has a copy of the values that were in the file. All 180x180 of them. Changing the values in the "lats" variable does not change the values in the file itself. We used a technique called "slicing" (e.g., a colon instead of an index) to load in the values we wanted. Normally, you say [min_index:max_index] if you want to specify a subset. The default value of min_index is the first array value, and the default value of max_index is the last array value. In this case, we wanted everything, so we left out both the min and the max, allowing both to take on their default values.

There's actually a little bit more than this going on since we're talking about a 2D array, but that is out of scope for this notebook. Slicing is a powerful python technique which you need to spend some brain cells on. Go forth and learn.

Masked Arrays

In this case, the python variable "lats" is what's known as a masked array. The netcdf library automatically masked out the nodata values for us, which is why we just saw a '--' above when we printed out the latitude value of the corner. Masked arrays are a combination of a boolean (True/False) mask, and the values. If the mask flag for a cell is true, the cell is considered to contain "nodata". So let's revisit that masked out corner.


In [10]:
print "The mask is: ", lats.mask[0,0]
print "The data value is: ", lats.data[0,0]


The mask is:  True
The data value is:  -999.0

If you scroll back up, the fill value in the netCDF file was -999. So you can see that python is making things a little nicer for you in that you don't have to always check to see if the grid cell contains the fill value/nodata value.

Let's set the corner value to something and then mask it back out.


In [11]:
lats[0,0] = 42.
print lats[0,0]
print "The mask is: ", lats.mask[0,0]
print "The data value is: ", lats.data[0,0]


42.0
The mask is:  False
The data value is:  42.0

In [12]:
# ... and set it back
lats[0,0] = np.ma.masked
print lats[0,0]
print "The mask is: ", lats.mask[0,0]
print "The data value is: ", lats.data[0,0]


--
The mask is:  True
The data value is:  42.0

Whoops! It masked the value out for us all right, but it didn't change the value back to -999. Keep this in mind. Depending on your situation, it might be important. You can explicitly set the value by using the "data" attribute.


In [13]:
lats.data[0,0] = -999.0
print lats[0,0]
print "The mask is: ", lats.mask[0,0]
print "The data value is: ", lats.data[0,0]


--
The mask is:  True
The data value is:  -999.0

Variables...finally

So far we have explored a whole lot of nothing useful. We know where each grid cell is, but (in this case) not whether they are covered in snow. We want to know whether they are covered in snow, so we use the non-coordinate-variable, non-auxiliary-coordinate-variable called "merged_snow_cover_extent". We will create a python variable as a shorthand to the data in the file, but we're not going to give a slicing expression. Without the slicing expression, python won't copy the data into memory from the file.


In [14]:
snow = d.variables['merged_snow_cover_extent']
print snow


<type 'netCDF4.Variable'>
int8 merged_snow_cover_extent(time, rows, cols)
    flag_meanings: cdr_microwave_report_snow cdr_only_reports_snow microwave_only_reports_snow snow_free_land permanent_ice ocean
    flag_values: [10 11 12 20 30 40]
    _FillValue: -99
    comment: 10: Snow cover reported by weekly_cdr, passive_microwave, 11: Snow cover reported by weekly_cdr only,  12: Snow cover reported by passive_microwave only, 20: Snow free land, 30: Permanent ice covered land, 40: Ocean
    valid_range: [10 40]
    coordinates: longitude latitude time
    long_name: Merged Snow Cover Extent
    grid_mapping: coord_system
unlimited dimensions: time
current shape = (1, 180, 180)
filling on

See how the above behaved as if we typed "print d.variables['merged_snow_cover_extent']"? Now we can be lazy and just type "print snow" instead. You can still specify an index and get the data, but this time you are not working with a copy that resides in memory. Remember we did not explicitly ask python to copy all the data out of the file into a variable. Ergo, if you try and assign different values to cells in the "snow" variable, it will try to change the values in the file. In this case, we opened the file read-only (which is the default), so changing the value would fail.


In [15]:
print snow[0,0,0]


--

??! A 3D variable? Tricky. You can handle this by now, and you should be able to tell what order the coordinates go in by looking above.

So what happens if we try to change the values in the snow variable?


In [ ]:
#snow[0,0,0] = 42

This translates roughly as "Thou shalt not try to write to a file you opened as read only." Fear not the crappy error message. Just understand that when we created the "snow" variable in python, we essentially told python to treat the snow variable as a handle to the data on disk. Trying to set values in the snow variable attempts to write the new values to the file...which it can't do if the file has been opened for reading only.

Now, one final thing: the coordinates attribute lists the (auxiliary) coordinate variables which contain the location information in each cell. The CF 1.6 standard requires that the lat/lon of each cell be provided if the lat/lons are not evenly spaced out.

Projection information

If you look above where we printed out the snow variable, you should see a "grid_mapping" attribute. This is another one of those attributes specified by CF-1.6. It contains the name of a dummy variable which describes the projection in which the data are presented. It's a dummy variable because it contains no data values, it serves only as a container for a bunch of descriptive attributes....all of which are described by CF-1.6. The snow variable declares that its projection information is contained in the "coord_system" dummy variable, so let's take a look.


In [16]:
coords = d.variables['coord_system']
print coords


<type 'netCDF4.Variable'>
|S1 coord_system()
    comment: EASE-Grid-2.0 projection definition documention: http://nsidc.org/data/ease/ease_grid2.html, NSIDC mapx grid parameter definition file: EASE2_N100km.gpd, grid_id: EASE2_N100km
    grid_mapping_name: lambert_azimuthal_equal_area
    longitude_of_projection_origin: 0.0
    latitude_of_projection_origin: 90.0
    false_easting: 0.0
    false_northing: 0.0
    scale_factor_at_projection_origin: 25
    semimajor_axis: 6.37814e+06
    semiminor_axis: 6.35675e+06
    inverse_flattening: 298.257
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of  used

The only parameter that a variable pointed to by a "grid_mapping" attribute must have is "grid_mapping_name". These names select the projection and are defined in http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#appendix-grid-mappings. The other attributes define the parameters of the projection, which of course may vary from projection to projection. The parameters of the sphereoid may appear regardless of the projection selected.

This information gives enough information to calculate projected coordinates given lat/lon, or vice-versa. You will want to use pre-written library code for that.

Summary

Here is what we've discovered so far:

  • Grid cells are regularly spaced in projected coordinates.
  • Latitude/Longitude do not comprise a regular grid.
  • There are no coordinate variables (1d) to describe rows and cols.
  • Auxiliary coordinate variables (2d) are identified in the "coordinates" attribute, and tie each grid cell to a set of lat/lon coordinates.
  • The names of the auxiliary coordinate variables are not special: latitude and longitude are identified by matching the "units" to accepted values.
  • Variables containing geolocated data refer to their spatial reference system by placing the name of a variable in the "grid_mapping" attribute. This variable has a series of attributes which describe the spatial reference system.

So why isn't it working?

The big thing that is missing is a simple translation between rows/cols and locations in the projected coordinates. Presumably, once we provide this, other programs will start to grok the NetCDF file, or we can export to a different format like geotiff. Specifically, the most likely reason that the projection is not being read is that the file technically violates the CF 1.6 specification. When the projection is Lambert Azimuthal Equal Area, the spec says "The x (abscissa) and y (ordinate) rectangular coordinates are identified by the standard_name attribute values projection_x_coordinate and projection_y_coordinate respectively."

So, software designed to the specification is expecting coordinate variables for rows and cols, and will be looking for these variables to be tagged as "projection_x_coordinate" and "projection_y_coordinate". As noted above, these variables are entirely missing. It is possible that if they are provided, the software may start working.

Calculating projected coordinates

We are going to assume that the relationship between the grid cell indices and the projected coordinates is linear, that the axes are orthogonal, and the (i,j) grid cell axes are not rotated with respect to the axes of the projected coordinate system. We will only generalize to this more complex scenario if our assumptions turn out to be false.

We need to find two points which have lat/lon data, these two points should be as far apart as possible, and the points should not be in the same column or row.


In [17]:
mysnow = d.variables['merged_snow_cover_extent'][:]

In [18]:
mysnow[0,0,79:101]


Out[18]:
masked_array(data = [-- 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 --],
             mask = [ True False False False False False False False False False False False
 False False False False False False False False False  True],
       fill_value = -99)

In [19]:
unmasked_ij = np.where(np.logical_not(lats.mask))
indices = zip(*unmasked_ij)
first = indices[0]
last = indices[-1]
print "First: ", first
print "Last: ", last


First:  (0, 80)
Last:  (179, 99)

In [20]:
lons = d.variables['longitude']

In [21]:
print "First: ", lats[first], lons[first]
print "Last: ", lats[last], lons[last]


First:  0.123689 -173.941
Last:  0.123689 6.05899

If we were trying to get a simple linear fit from rows/cols to lat/lon, this would be a problem. However, we know this is not possible so let's take these lat/lons and calculate the projected coordinates. We will be using Proj.4. (http://trac.osgeo.org/proj/) The first step is to get the parameter values for the projection from the coord_system variable.

We have not yet done this. We've printed the values and looked at them, but we are now moving into the realm of having the computer fetch specific values and use them automatically. Getting the value of an attribute in the netCDF file is accomplished exactly as if it were an attribute of a python variable:


In [22]:
#remember that coords is the python variable referring to the "coord_system" netcdf variable.
lat_0 = coords.latitude_of_projection_origin 
lon_0 = coords.longitude_of_projection_origin
false_e = coords.false_easting
false_n = coords.false_northing
s_major = coords.semimajor_axis 
s_minor = coords.semiminor_axis

Now we make a projection object. It's just a handy way to carry around a description of which projection equations to use, and what the parameters are.


In [23]:
the_proj = pyproj.Proj(proj="laea", lat_0=lat_0, lon_0=lon_0, x_0=false_e, y_0=false_n, a=s_major, b=s_minor)
geo      = pyproj.Proj(proj="latlong", a=s_major, b=s_minor)

Now we can compute the projected coordinates for the lat/lon coordinates of the two points.


In [24]:
x1,y1 = pyproj.transform(geo,the_proj, lons[first], lats[first])
x2,y2 = pyproj.transform(geo,the_proj, lons[last], lats[last])

In [25]:
print x1,y1
print x2,y2


-950000.614496 8950001.68985
950000.167584 -8950001.73729

Construct the linear fits

So, it's now a matter of making a linear fit for x and a linear fit for y.


In [26]:
# it's (rows,cols), so index[1]==column 
m_x = (x2-x1)/(last[1]-first[1])
b_x = x1 - (m_x*first[1])
print x2, last[1]*m_x+b_x


950000.167584 950000.167584

Woo hoo the fit for x seems to work. Now on to y.


In [27]:
# it's (rows,cols) so index[0]==rows
m_y = (y2-y1)/(last[0]-first[0])
b_y = y1 -(m_y*first[0])
print y2, last[0]*m_y+b_y


-8950001.73729 -8950001.73729

Just to doublecheck, we know that the data set is on a 100km grid, so m_x and m_y should both be 100000.


In [28]:
print m_x, m_y


100000.041162 -100000.019146

Close enough for me.

Create the coordinate variables...

Python, specifically the "numpy" library, makes it easy to write equations which manipulate entire arrays of data. Usually, the equation is applied to each element of the input array and produces a corresponding output array. Numpy is well worth learning.

In the following, the function "np.arange" generates an array containing the numbers 0, 1, 2, ... {n-1}, where n is either the number of rows or cols. This entire array is transformed with the linear fit we produced above, yielding the projected x/y coordinates for each row and column.


In [29]:
cv_rows = m_x * np.arange(len(d.dimensions['rows'])) + b_x
cv_cols = m_y * np.arange(len(d.dimensions['cols'])) + b_y

Now we need to store these new coordinate variables in the file and add the expected attributes. To do so, we need to close the netcdf file and open it again for writing. Make sure we open for read/write! Opening just for write will try to delete the existing file so you have a nice clean slate.


In [30]:
print cv_rows
print cv_cols


[-8950003.90746319 -8850003.8663011  -8750003.82513901 -8650003.78397691
 -8550003.74281482 -8450003.70165273 -8350003.66049064 -8250003.61932854
 -8150003.57816645 -8050003.53700436 -7950003.49584226 -7850003.45468017
 -7750003.41351808 -7650003.37235599 -7550003.3311939  -7450003.2900318
 -7350003.24886971 -7250003.20770762 -7150003.16654552 -7050003.12538343
 -6950003.08422134 -6850003.04305925 -6750003.00189715 -6650002.96073506
 -6550002.91957297 -6450002.87841088 -6350002.83724878 -6250002.79608669
 -6150002.7549246  -6050002.71376251 -5950002.67260041 -5850002.63143832
 -5750002.59027623 -5650002.54911414 -5550002.50795204 -5450002.46678995
 -5350002.42562786 -5250002.38446577 -5150002.34330367 -5050002.30214158
 -4950002.26097949 -4850002.21981739 -4750002.1786553  -4650002.13749321
 -4550002.09633112 -4450002.05516902 -4350002.01400693 -4250001.97284484
 -4150001.93168275 -4050001.89052065 -3950001.84935856 -3850001.80819647
 -3750001.76703438 -3650001.72587228 -3550001.68471019 -3450001.6435481
 -3350001.60238601 -3250001.56122391 -3150001.52006182 -3050001.47889973
 -2950001.43773764 -2850001.39657554 -2750001.35541345 -2650001.31425136
 -2550001.27308927 -2450001.23192717 -2350001.19076508 -2250001.14960299
 -2150001.10844089 -2050001.0672788  -1950001.02611671 -1850000.98495462
 -1750000.94379252 -1650000.90263043 -1550000.86146834 -1450000.82030625
 -1350000.77914415 -1250000.73798206 -1150000.69681997 -1050000.65565788
  -950000.61449578  -850000.57333369  -750000.5321716   -650000.49100951
  -550000.44984741  -450000.40868532  -350000.36752323  -250000.32636114
  -150000.28519904   -50000.24403695    49999.79712514   149999.83828723
   249999.87944933   349999.92061142   449999.96177351   550000.00293561
   650000.0440977    750000.08525979   850000.12642188   950000.16758398
  1050000.20874607  1150000.24990816  1250000.29107025  1350000.33223234
  1450000.37339444  1550000.41455653  1650000.45571862  1750000.49688072
  1850000.53804281  1950000.5792049   2050000.62036699  2150000.66152909
  2250000.70269118  2350000.74385327  2450000.78501536  2550000.82617746
  2650000.86733955  2750000.90850164  2850000.94966373  2950000.99082583
  3050001.03198792  3150001.07315001  3250001.1143121   3350001.1554742
  3450001.19663629  3550001.23779838  3650001.27896048  3750001.32012257
  3850001.36128466  3950001.40244675  4050001.44360884  4150001.48477094
  4250001.52593303  4350001.56709512  4450001.60825722  4550001.64941931
  4650001.6905814   4750001.73174349  4850001.77290559  4950001.81406768
  5050001.85522977  5150001.89639186  5250001.93755396  5350001.97871605
  5450002.01987814  5550002.06104023  5650002.10220233  5750002.14336442
  5850002.18452651  5950002.2256886   6050002.2668507   6150002.30801279
  6250002.34917488  6350002.39033698  6450002.43149907  6550002.47266116
  6650002.51382325  6750002.55498534  6850002.59614744  6950002.63730953
  7050002.67847162  7150002.71963372  7250002.76079581  7350002.8019579
  7450002.84311999  7550002.88428209  7650002.92544418  7750002.96660627
  7850003.00776836  7950003.04893046  8050003.09009255  8150003.13125464
  8250003.17241673  8350003.21357883  8450003.25474092  8550003.29590301
  8650003.3370651   8750003.3782272   8850003.41938929  8950003.46055138]
[ 8950001.68985344  8850001.67070738  8750001.65156132  8650001.63241526
  8550001.6132692   8450001.59412315  8350001.57497709  8250001.55583103
  8150001.53668497  8050001.51753891  7950001.49839285  7850001.4792468
  7750001.46010074  7650001.44095468  7550001.42180862  7450001.40266256
  7350001.3835165   7250001.36437044  7150001.34522439  7050001.32607833
  6950001.30693227  6850001.28778621  6750001.26864015  6650001.24949409
  6550001.23034804  6450001.21120198  6350001.19205592  6250001.17290986
  6150001.1537638   6050001.13461774  5950001.11547169  5850001.09632563
  5750001.07717957  5650001.05803351  5550001.03888745  5450001.01974139
  5350001.00059533  5250000.98144928  5150000.96230322  5050000.94315716
  4950000.9240111   4850000.90486504  4750000.88571898  4650000.86657293
  4550000.84742687  4450000.82828081  4350000.80913475  4250000.78998869
  4150000.77084263  4050000.75169658  3950000.73255052  3850000.71340446
  3750000.6942584   3650000.67511234  3550000.65596628  3450000.63682023
  3350000.61767417  3250000.59852811  3150000.57938205  3050000.56023599
  2950000.54108993  2850000.52194387  2750000.50279782  2650000.48365176
  2550000.4645057   2450000.44535964  2350000.42621358  2250000.40706752
  2150000.38792147  2050000.36877541  1950000.34962935  1850000.33048329
  1750000.31133723  1650000.29219117  1550000.27304512  1450000.25389906
  1350000.234753    1250000.21560694  1150000.19646088  1050000.17731482
   950000.15816876   850000.13902271   750000.11987665   650000.10073059
   550000.08158453   450000.06243847   350000.04329241   250000.02414636
   150000.0050003     49999.98585424   -50000.03329182  -150000.05243788
  -250000.07158394  -350000.09072999  -450000.10987605  -550000.12902211
  -650000.14816817  -750000.16731423  -850000.18646029  -950000.20560635
 -1050000.2247524  -1150000.24389846 -1250000.26304452 -1350000.28219058
 -1450000.30133664 -1550000.3204827  -1650000.33962875 -1750000.35877481
 -1850000.37792087 -1950000.39706693 -2050000.41621299 -2150000.43535905
 -2250000.4545051  -2350000.47365116 -2450000.49279722 -2550000.51194328
 -2650000.53108934 -2750000.5502354  -2850000.56938145 -2950000.58852751
 -3050000.60767357 -3150000.62681963 -3250000.64596569 -3350000.66511175
 -3450000.68425781 -3550000.70340386 -3650000.72254992 -3750000.74169598
 -3850000.76084204 -3950000.7799881  -4050000.79913416 -4150000.81828021
 -4250000.83742627 -4350000.85657233 -4450000.87571839 -4550000.89486445
 -4650000.91401051 -4750000.93315656 -4850000.95230262 -4950000.97144868
 -5050000.99059474 -5150001.0097408  -5250001.02888686 -5350001.04803292
 -5450001.06717897 -5550001.08632503 -5650001.10547109 -5750001.12461715
 -5850001.14376321 -5950001.16290927 -6050001.18205532 -6150001.20120138
 -6250001.22034744 -6350001.2394935  -6450001.25863956 -6550001.27778562
 -6650001.29693167 -6750001.31607773 -6850001.33522379 -6950001.35436985
 -7050001.37351591 -7150001.39266197 -7250001.41180802 -7350001.43095408
 -7450001.45010014 -7550001.4692462  -7650001.48839226 -7750001.50753832
 -7850001.52668437 -7950001.54583043 -8050001.56497649 -8150001.58412255
 -8250001.60326861 -8350001.62241467 -8450001.64156073 -8550001.66070678
 -8650001.67985284 -8750001.6989989  -8850001.71814496 -8950001.73729102]

In [31]:
# first, close the current file
d.close()
# now open again for writing
d = nc.Dataset('nhtsw100e2_20030107_20030113.nc', 'r+')

Creating variables in netcdf is pretty simple. You just need to specify the variable name, the data type, and a list of dimensions as strings. Coordinate variables are super simple because the name of the variable is the same as the name of the only dimension they have.


In [32]:
coords = d.variables['coord_system']
print coords


<type 'netCDF4.Variable'>
|S1 coord_system()
    comment: EASE-Grid-2.0 projection definition documention: http://nsidc.org/data/ease/ease_grid2.html, NSIDC mapx grid parameter definition file: EASE2_N100km.gpd, grid_id: EASE2_N100km
    grid_mapping_name: lambert_azimuthal_equal_area
    longitude_of_projection_origin: 0.0
    latitude_of_projection_origin: 90.0
    false_easting: 0.0
    false_northing: 0.0
    scale_factor_at_projection_origin: 25
    semimajor_axis: 6.37814e+06
    semiminor_axis: 6.35675e+06
    inverse_flattening: 298.257
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of  used


In [ ]:
coords.test = "test"

In [3]:
d.close()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-ef507f445a30> in <module>()
----> 1 d.close()

NameError: name 'd' is not defined

In [33]:
# make the rows coordinate variable
# note the variable name is the same as the dimension name
rows_var = d.createVariable('rows',np.float32, ('rows',))

# give it the expected attributes
rows_var.standard_name = 'projection_y_coordinate'
rows_var.axis          = 'Y'
rows_var.units         = 'meters'

# write the values to the variable
rows_var[:] = cv_rows

In [34]:
# make the cols coordinate variable
# note the variable name is the same as the dimension name
cols_var = d.createVariable('cols',np.float32, ('cols',))

# give it the expected attributes
cols_var.standard_name = 'projection_x_coordinate'
cols_var.axis          = 'X'
cols_var.units         = 'meters'

# write the values to the variable
cols_var[:] = cv_cols

In [35]:
#close the file to make sure everything is written
d.close()

Poof. GDAL works now:

[fedora@lugosi ~]$ gdalinfo NETCDF:nhtsw100e2_20030107_20030113.nc:merged_snow_cover_extent
Driver: netCDF/Network Common Data Format
Files: nhtsw100e2_20030107_20030113.nc
Size is 180, 180
Coordinate System is:
PROJCS["LAEA (WGS84) ",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9108"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",90],
    PARAMETER["longitude_of_center",0],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0]]

Geotiffs via gdal_translate

Because gdal is now detecting the spatial reference system, one can use it to convert the variables to geotiff. But we are not quite done. Remember the slope of the line fit between "cols" and the projected y coordinate? It was negative, right?


In [ ]:
print m_y

Yup. Negative. That means row zero translates to the biggest projected Y value and row 179 translates to the smallest projected Y value. So row 0 starts at the top and works its way down. The netcdf driver for gdal tries to detect whether the rows are top-down or bottom up, but sometimes it doesn't. Our data set is one of the lucky winners.

Configuration options from http://www.gdal.org/frmt_netcdf.html:

GDAL_NETCDF_BOTTOMUP=[YES/NO] : Set the y-axis order for import, overriding the order detected by the driver. This option is usually not needed unless a specific dataset is causing problems (which should be reported in GDAL trac).

In this case, gdal 1.11.1 autodetects a bottom-up dataset when it should be top down, so we must override. The working conversion command for gdal_translate is as follows:

gdal_translate -of GTiff -b 1 --config GDAL_NETCDF_BOTTOMUP NO \
    NETCDF:nhtsw100e2_20030107_20030113:merged_snow_cover_extent snowcover.tif

Looking at the hard won data

Here we view the data in the file. First, we don't do anything special, just dump it to the screen exactly as it exists in the file.


In [ ]:
d = nc.Dataset('nhtsw100e2_20030107_20030113.nc')
snow = d.variables['merged_snow_cover_extent'][:]
lats = d.variables['latitude'][:]
lons = d.variables['longitude'][:]
imshow(snow[0,...])

In [ ]:
print "hello"

That's great, but we don't really have any indication that the georeferencing worked. The axes are strictly (i,j) without any physical meaning. In fact, we could have done this right off the bat, without fixing anything. What we really want to do is draw a quick map with crude coastlines and some lat/lon lines.


In [ ]:
def init_basemap() : 

    m = Basemap(width=100000*180, height=100000*180,
            #llcrnrx=cv_cols[0], llcrnry=cv_rows[-1], urcrnrx=cv_cols[-1], urcrnry=cv_rows[0],
            resolution='l', projection='laea',
            lon_0=lon_0, lat_0=lat_0)
    m.drawcoastlines()
    m.drawcountries()
    m.drawmeridians(np.arange(-180.,180.,20.),labels=[False,False,False,True])
    m.drawparallels(np.arange(10.,80.,20.), labels=[True,False,False,False])
    return m

The basic strategy is to use the map template above by calling "init_basemap", then draw the data on it. The data are drawn using "filled contours" between the data points. This may or may not be what we want.


In [ ]:
figsize(12,10)
m = init_basemap()
c = m.contourf(lons[:], lats[:],
               snow[0,...], 20, latlon=True)
cb = m.colorbar(c)
title("Snow cover, 13 Jan 2003")

As shown, the data values line up with the coastlines, so it appears that the georeferencing worked just fine.