Overview

I want to create a sample dataset of 100 grid cells (10 km x 10 km) near Fairbanks. I will use actual data where I can easily find it, and otherwise I will generate random data. This sample dataset will demonstrate the correct "shape" (variables and their dimensions) for the new input files to dvm-dos-tem.

Summary of Requirements

  • Data should be in a rectangular (grid) layout.
  • NetCDF.
  • Attempting to conform to CF & COARDS standards.
  • Geospatial information must be within the file. Each file should have variables for Lat and Lon, each defined in terms of the dimensions (y, x), where (y, x) are the rectangular grid coordinates.
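As a sketch of the target "shape", a hypothetical header for one of these files might look like the following CDL (as printed by ncdump -h). Names and sizes here are illustrative, not final:

```
netcdf sample-dataset {
dimensions:
    time = UNLIMITED ;
    Y = 10 ;
    X = 10 ;
variables:
    float lat(Y, X) ;
    float lon(Y, X) ;
    float tair(time, Y, X) ;
    float precip(time, Y, X) ;
}
```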

Open questions:

  • Is computing GIRR the only place that latitude is needed?
  • Maybe it is possible to precompute GIRR?
  • Is longitude even used?

Process

Driving climate file (nirr, vapo, tair, prec)

The first step is to pick some y,x or lat/lon bounding box coordinates to use with gdal_translate for subsetting the AIEM domain data files from SNAP. The files I have from SNAP are in the Alaska Albers projection, with a 1 km pixel size.

With Google Maps, I take a guess at where the Bonanza Creek LTER is from the satellite view (64.667557, -148.489804), and then eyeball a location ~5 km north and west of there (64.708950, -148.633313) so that my new subwindow is more or less centered on the LTER site.

Working with a NetCDF file as opposed to the TIFF, I can use ncview to quickly check coordinates and data values to make sure the commands are doing what I expect.


In [ ]:
# Specifying the "creation option" means that special 
# variables will be written to the new netcdf file mapping
# row/column coordinates to lat/lon
!gdal_translate -of netCDF -co "WRITE_LONLAT=YES" \
../../snap-data/tas_mean_C_iem_cccma_cgcm3_1_sresa1b_2001_2100/tas_mean_C_iem_cccma_cgcm3_1_sresa1b_01_2001.tif \
temporary_with_lonlat.nc

# see what we got
#!ncdump -h temporary_with_lonlat.nc

While gdal_translate would let me specify my subwindow in projection coordinates, I want a nice, square, 10x10 km dataset, so I use the row/column coordinates that I find using ncview: (64.708950, -148.633313), which is roughly (j=1134, i=991) as far as ncview is concerned. gdal_translate wants the "source window" defined in terms of (xoff, yoff, xsize, ysize).

If I were not concerned with the exact pixel size of the sample dataset, I could instead specify the "source window" to gdal_translate using projection coordinates, assuming the source file has the right geospatial information with it.
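For illustration, here is a sketch of how projection coordinates relate to the pixel/line values that -srcwin wants, using a GDAL-style geotransform. The geotransform numbers below are made up for the example; real values come from running gdalinfo on the source file.

```python
def projcoords_to_srcwin(gt, x, y, xsize, ysize):
    """Convert projection coords (x, y) of a window's upper-left corner
    to the (xoff, yoff, xsize, ysize) values gdal_translate wants.

    gt is a GDAL-style geotransform:
    (originX, pixelWidth, 0, originY, 0, pixelHeight)
    where pixelHeight is negative for a north-up raster.
    """
    origin_x, pw, _, origin_y, _, ph = gt
    xoff = int(round((x - origin_x) / pw))
    yoff = int(round((y - origin_y) / ph))
    return (xoff, yoff, xsize, ysize)

# Hypothetical Alaska Albers geotransform: 1 km pixels, north-up.
gt = (-2173223.0, 1000.0, 0.0, 2548412.0, 0.0, -1000.0)
print(projcoords_to_srcwin(gt, -1258223.0, 2256412.0, 10, 10))  # (915, 292, 10, 10)
```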

Anyway, use gdal_translate again to grab the sub-window, specifying the coordinates with pixel and line:


In [ ]:
# NOTE: Actually I seem to be having problems with the y axis
# ordering again. So while I pick pixel (i,j): (915, 1548), when I 
# use those numbers for xoff and yoff, the resulting image is not
# in the right place. If I 'reverse' the y (j) coordinate, 
# so 1850 - (1548 + 10) = 292, then the image comes out correct,
# with (0,0) being the lower left corner, and the pixel we selected
# as Toolik, or close to it.
#
# NOTE: this means I need to fix the Bonanza Creek coords!!!

# -srcwin 915 292 10 10  #  Toolik ~(68.538, -149.6275)
# -srcwin 991 1134 10 10  #  Bonanza Creek area
!gdal_translate -of netCDF -co "WRITE_LONLAT=YES" --config GDAL_NETCDF_BOTTOMUP YES \
-srcwin 915 292 10 10 temporary_with_lonlat.nc temp_subset_with_lonlat.nc
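The row-flipping arithmetic from the note above can be captured in a tiny helper. This is a sketch, assuming the full AIEM grid is 1850 rows tall:

```python
def bottomup_j_to_yoff(nrows, j, ysize):
    """Convert a bottom-up row index (ncview-style) for a window of
    `ysize` rows into the top-down yoff that gdal_translate expects."""
    return nrows - (j + ysize)

print(bottomup_j_to_yoff(1850, 1548, 10))  # -> 292, matching -srcwin above
```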

In [ ]:
# Check out what we got
#!ncdump -h temp_subset_with_lonlat.nc
!gdalinfo temp_subset_with_lonlat.nc

In [ ]:
import netCDF4
import numpy as np

NOTE: I am going to hardcode the (0,0) pixel to have the exact same values as the input data for Toolik, for testing...


In [ ]:
def make_fire_dataset(fname, sizey=10, sizex=10):
    ncfile = netCDF4.Dataset(fname, mode='w', format='NETCDF4')

    Y = ncfile.createDimension('Y', sizey)
    X = ncfile.createDimension('X', sizex)

    fri = ncfile.createVariable('fri', np.int32, ('Y','X',))
    fri[:] = np.random.uniform(low=1, high=7, size=(sizey, sizex))
    fri[0,0] = 1000
    
    fire_year_vector = ncfile.createVLType(np.int32, 'fire_year_vector')
    fire_years = ncfile.createVariable('fire_years', fire_year_vector, ('Y','X'))

    fire_sizes = ncfile.createVariable('fire_sizes', fire_year_vector, ('Y','X'))

    yr_data = np.empty(sizey * sizex, object)
    sz_data = np.empty(sizey * sizex, object)
    for n in range(sizey * sizex):
        yr_data[n] = np.array(sorted(np.random.randint(1900, 2006, np.random.randint(0, 10))), dtype=np.int32)
        sz_data[n] = np.random.randint(0,100,len(yr_data[n]))
        #numpy.arange(random.randint(1,10),dtype='int32')+1
    
    yr_data = np.reshape(yr_data,(sizey,sizex))
    sz_data = np.reshape(sz_data,(sizey,sizex))

    print yr_data[0,0], "-->", sz_data[0,0]
    print yr_data[0,1], "-->", sz_data[0,1]
    print yr_data[sizey-1,sizex-1], "-->", sz_data[sizey-1,sizex-1]
    
    fire_years[:] = yr_data
    fire_sizes[:] = sz_data

    ncfile.close()

# Create the new fire file...
make_fire_dataset("new-fire-dataset.nc", sizey=10, sizex=10)
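The VLEN variables above hold a different-length vector in each grid cell (a "ragged" array). The construction can be sketched with plain numpy object arrays, independent of the NetCDF file:

```python
import numpy as np

np.random.seed(0)
sizey, sizex = 10, 10
yr_data = np.empty(sizey * sizex, dtype=object)
sz_data = np.empty(sizey * sizex, dtype=object)
for n in range(sizey * sizex):
    nfires = np.random.randint(0, 10)               # 0..9 fires in this cell
    yr_data[n] = np.sort(np.random.randint(1900, 2006, nfires))
    sz_data[n] = np.random.randint(0, 100, nfires)  # one size per fire year

yr_data = yr_data.reshape(sizey, sizex)
sz_data = sz_data.reshape(sizey, sizex)

# Every cell's fire-year vector pairs 1:1 with its fire-size vector.
assert all(len(yr_data[j, i]) == len(sz_data[j, i])
           for j in range(sizey) for i in range(sizex))
```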

In [ ]:
!ncdump -h new-fire-dataset.nc

In [ ]:
def make_veg_classification(fname, sizey=10, sizex=10):
    ncfile = netCDF4.Dataset(fname, mode='w', format='NETCDF4')

    Y = ncfile.createDimension('Y', sizey)
    X = ncfile.createDimension('X', sizex)

    veg_class = ncfile.createVariable('veg_class', np.int32, ('Y', 'X',))
    veg_class[:] = np.random.uniform(low=1, high=7, size=(sizey, sizex))
    veg_class[0,0] = 4
    
    ncfile.close()

# Create a new veg_classification file
make_veg_classification("new-veg-dataset.nc", sizey=10, sizex=10)

In [ ]:
def make_drainage_classification(fname, sizey=10, sizex=10):
    ncfile = netCDF4.Dataset(fname, mode='w', format='NETCDF4')

    Y = ncfile.createDimension('Y', sizey)
    X = ncfile.createDimension('X', sizex)

    drainage_class = ncfile.createVariable('drainage_class', np.int32, ('Y', 'X',))
    drainage_class[:] = np.random.uniform(low=1, high=7, size=(sizey, sizex))
    drainage_class[0,0] = 0
    ncfile.close()

# Create a new drainage_classification file
make_drainage_classification("new-drainage-dataset.nc", sizey=10, sizex=10)

In [ ]:
def make_run_mask(filename, sizey=10, sizex=10):
    ncfile = netCDF4.Dataset(filename, mode='w', format='NETCDF4')
    Y = ncfile.createDimension('Y', sizey)
    X = ncfile.createDimension('X', sizex)

    run = ncfile.createVariable('run', np.int32, ('Y', 'X',))
    run[:] = np.zeros((sizey, sizex))
    run[0,0] = 1
    
    ncfile.close()

# Create a new run mask file (all zeros, with a single pixel turned on)
make_run_mask('run-mask.nc', sizey=10, sizex=10)
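A quick sketch of how such a mask gets used: cells where run == 1 are the ones the model should process. This mirrors the mask created above, independent of the NetCDF file:

```python
import numpy as np

# Same layout as make_run_mask(): all zeros, one enabled pixel.
run = np.zeros((10, 10), dtype=np.int32)
run[0, 0] = 1

# List of (y, x) coordinates of enabled cells.
enabled = [(int(y), int(x)) for y, x in zip(*np.nonzero(run))]
print(enabled)  # [(0, 0)]
```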

In [ ]:
def copy_co2_to_new_style(filename):
    '''Creates a co2 file for dvmdostem from the old sample data'''
    old_ncfile = netCDF4.Dataset("../DATA/test_single_site/dataregion/co2.nc", mode='r')
    new_ncfile = netCDF4.Dataset(filename, mode='w', format='NETCDF4')

    # Dimensions
    yearD = new_ncfile.createDimension('year', None) # append along time axis
    
    # Coordinate Variable
    yearV = new_ncfile.createVariable('year', np.int32, ('year',))
    
    # Data Variables
    co2 = new_ncfile.createVariable('co2', np.float32, ('year',))
    
    yearV[:] = old_ncfile.variables['YEAR'][:]
    co2[:] = old_ncfile.variables['CO2'][:]
    
    old_ncfile.close()
    new_ncfile.close()
    
# Copy over the co2 file
copy_co2_to_new_style('new-co2-dataset.nc')

In [ ]:
def create_empty_climate_nc_file(filename, sizey=10, sizex=10):
    '''Creates an empty climate file for dvmdostem; y,x grid, time unlimited.'''
    
    ncfile = netCDF4.Dataset(filename, mode="w", format='NETCDF4')
    
    # Dimensions for the file.
    time_dim = ncfile.createDimension('time', None) # append along time axis
    Y = ncfile.createDimension('Y', sizey)
    X = ncfile.createDimension('X', sizex)

    # Coordinate Variables
    Y = ncfile.createVariable('Y', np.int32, ('Y',))
    X = ncfile.createVariable('X', np.int32, ('X',))
    Y[:] = np.arange(0, sizey)
    X[:] = np.arange(0, sizex)
    
    # 'Spatial Reference' variables
    lat = ncfile.createVariable('lat', np.float32, ('Y', 'X',))
    lon = ncfile.createVariable('lon', np.float32, ('Y', 'X',))
    
    # Create data variables
    #co2 = ncfile.createVariable('co2', np.float32, ('time')) # actually year
    temp_air = ncfile.createVariable('tair', np.float32, ('time', 'Y', 'X',))
    precip = ncfile.createVariable('precip', np.float32, ('time', 'Y', 'X',))
    nirr = ncfile.createVariable('nirr', np.float32, ('time', 'Y', 'X',))
    vapor_press = ncfile.createVariable('vapor_press', np.float32, ('time', 'Y', 'X',))
    
    ncfile.close()
    
# Make a new file to copy data into
create_empty_climate_nc_file('new-climate-dataset.nc', sizey=10, sizex=10)

In [ ]:
# Open the 'temporary' dataset
temp_subset_with_lonlat = netCDF4.Dataset('temp_subset_with_lonlat.nc', mode='r')

# Open the new file for appending
new_climatedataset = netCDF4.Dataset('new-climate-dataset.nc', mode='a')

# Grab the lat and lon from the temporary file
lat = new_climatedataset.variables['lat']
lon = new_climatedataset.variables['lon']
lat[:] = temp_subset_with_lonlat.variables['lat'][:]
lon[:] = temp_subset_with_lonlat.variables['lon'][:]

#tair = new_climatedataset.variables['tair']
#tair[0,:,:] = temp_subset_with_lonlat.variables['Band1'][:]

new_climatedataset.close()
temp_subset_with_lonlat.close()

In [ ]:
#!ncdump -h temp_subset_with_lonlat.nc
#!ncdump temp_subset_with_lonlat.nc
#!ncdump new-climate-dataset.nc

# Check the values at 0,0 to see that everything looks correct:
!ncks -v lat,lon -d Y,0,0,1 -d X,0,0,1 new-climate-dataset.nc

Now we have a basic dataset with the lat/lon coordinates. Time to populate it with a bit of data.


In [ ]:
with netCDF4.Dataset('new-climate-dataset.nc', mode='a') as new_climatedataset:

    YEARS = 10

    TIMESTEPS=YEARS*12

    # # Write random, junk data to the climate file
    # new_climatedataset = netCDF4.Dataset('new-climate-dataset.nc', mode='a')

    sx = new_climatedataset.variables['X'].size
    sy = new_climatedataset.variables['Y'].size

    junkA = np.random.uniform(low=0.0, high=10, size=(TIMESTEPS*sy*sx)).reshape(TIMESTEPS, sy, sx)
    junkB = np.random.uniform(low=0.0, high=1300, size=(TIMESTEPS*sy*sx)).reshape(TIMESTEPS, sy, sx)
    junkC = np.random.uniform(low=0.0, high=20, size=(TIMESTEPS*sy*sx)).reshape(TIMESTEPS, sy, sx)

    new_climatedataset.variables['precip'][:] = junkA
    new_climatedataset.variables['nirr'][:] = junkB
    new_climatedataset.variables['vapor_press'][:] = junkC

In [ ]:
#!ncks -v nirr new-climate-dataset.nc
!ncdump -h new-climate-dataset.nc
print YEARS

In [ ]:
# Open the new file for appending
with netCDF4.Dataset('new-climate-dataset.nc', mode='a') as new_climatedataset:

    for yridx, year in enumerate(range(2010, 2010+YEARS)):
        for midx, month in enumerate(range(1, 13)): # Note 1-based month!
            print year, month
            # TRANSLATE TO NETCDF
            # The curly braces are needed to run the shell command from w/in
            # ipython and have the variable expansion with year and month
            # work out alright
            print "Converting tif --> netcdf..."
            !gdal_translate -of netCDF \
            {"../../snap-data/tas_mean_C_iem_cccma_cgcm3_1_sresa1b_2001_2100/tas_mean_C_iem_cccma_cgcm3_1_sresa1b_%02d_%04d.tif" % (month, year)} \
            temporary_tair.nc
            !gdal_translate -of netCDF \
            {"../../snap-data/rsds_mean_MJ-m2-d1_iem_cccma_cgcm3_1_sresa1b_2001_2100/rsds_mean_MJ-m2-d1_iem_cccma_cgcm3_1_sresa1b_%02d_%04d.tif" % (month, year)} \
            temporary_rsds.nc
            !gdal_translate -of netCDF \
            {"../../snap-data/pr_total_mm_iem_cccma_cgcm3_1_sresa1b_2001_2100/pr_total_mm_iem_cccma_cgcm3_1_sresa1b_%02d_%04d.tif" % (month, year)} \
            temporary_pr.nc
            !gdal_translate -of netCDF \
            {"../../snap-data/vap_mean_hPa_iem_cccma_cgcm3_1_sresa1b_2001_2100/vap_mean_hPa_iem_cccma_cgcm3_1_sresa1b_%02d_%04d.tif" % (month, year)} \
            temporary_vapo.nc

            print "Subsetting...."
            !gdal_translate -of netCDF \
            -srcwin 915 292 10 10 temporary_tair.nc temporary_tair2.nc
            !gdal_translate -of netCDF \
            -srcwin 915 292 10 10 temporary_rsds.nc temporary_rsds2.nc
            !gdal_translate -of netCDF \
            -srcwin 915 292 10 10 temporary_pr.nc temporary_pr2.nc
            !gdal_translate -of netCDF \
            -srcwin 915 292 10 10 temporary_vapo.nc temporary_vapo2.nc

            print "Writing subset's data to new files..."
            with netCDF4.Dataset('temporary_tair2.nc', mode='r') as t2:
                # Grab the lat and lon from the temporary file
                tair = new_climatedataset.variables['tair']
                tair[yridx*12+midx] = t2.variables['Band1'][:]

            with netCDF4.Dataset('temporary_rsds2.nc', mode='r') as t2:
                # Grab the lat and lon from the temporary file
                nirr = new_climatedataset.variables['nirr']
                nirr[yridx*12+midx] = t2.variables['Band1'][:]
                
            with netCDF4.Dataset('temporary_pr2.nc', mode='r') as t2:
                # Grab the lat and lon from the temporary file
                prec = new_climatedataset.variables['precip']
                prec[yridx*12+midx] = t2.variables['Band1'][:]

            with netCDF4.Dataset('temporary_vapo2.nc', mode='r') as t2:
                # Grab the lat and lon from the temporary file
                vapo = new_climatedataset.variables['vapor_press']
                vapo[yridx*12+midx] = t2.variables['Band1'][:]

print "Done appending."
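The record-index arithmetic used in the loop above (yridx*12 + midx) is worth making explicit; a small helper, assuming the file starts at January 2010:

```python
START_YEAR = 2010  # first year written to the climate file

def month_record_index(year, month):
    """0-based record along the time axis for a 1-based calendar month."""
    return (year - START_YEAR) * 12 + (month - 1)

print(month_record_index(2010, 1))   # first record -> 0
print(month_record_index(2011, 3))   # one full year + 2 months -> 14
```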

In [ ]:
!ncdump -h new-climate-dataset.nc

In [ ]:
import matplotlib.pyplot as plt

Y = 0
X = 0
with netCDF4.Dataset('new-climate-dataset.nc', mode='r') as new_climatedataset:
    plt.plot(new_climatedataset.variables['tair'][:,Y,X])
    plt.plot(new_climatedataset.variables['precip'][:,Y,X])
    plt.plot(new_climatedataset.variables['nirr'][:,Y,X])
    plt.plot(new_climatedataset.variables['vapor_press'][:,Y,X], label='vapor_press')
    

with netCDF4.Dataset('../DATA/Toolik_Inputs/datacht/cccma_a1b.nc', mode='r') as old_climatedataset:
    CLMID = 0
    plt.plot(old_climatedataset.variables['TAIR'][CLMID,0:10,:].flatten(), label='TAIR')  
    plt.plot(old_climatedataset.variables['PREC'][CLMID,0:10,:].flatten(), label='PREC')  
    plt.plot(old_climatedataset.variables['NIRR'][CLMID,0:10,:].flatten()/10, label='NIRR') 
    plt.plot(old_climatedataset.variables['VAPO'][CLMID,0:10,:].flatten(), label='VAPO') 
    
    plt.legend()

In [ ]:
with netCDF4.Dataset('../DATA/Toolik_Inputs/datacht/cccma_a1b.nc', mode='r') as old_climatedataset:
    
    CLMID = 0
    plt.plot(old_climatedataset.variables['TAIR'][CLMID,0:10,:].flatten(), label='TAIR')  
    plt.plot(old_climatedataset.variables['PREC'][CLMID,0:10,:].flatten(), label='PREC')  
    plt.plot(old_climatedataset.variables['NIRR'][CLMID,0:10,:].flatten()/8, label='NIRR')  
    #plt.legend()
#     plt.plot(old_climatedataset.variables['precip'][:,Y,X])
#     plt.plot(old_climatedataset.variables['nirr'][:,Y,X])

In [ ]:
#!ncdump -h ../DATA/Toolik_Inputs/datacht/cccma_a1b.nc
!ncks -v CLMID ../DATA/Toolik_Inputs/datacht/cccma_a1b.nc

In [ ]:
!ncdump ../DATA/Toolik_Inputs/datacht/cohortid.nc

In [ ]:
!ncdump -h ../DATA/Toolik_Inputs/datacht/climate.nc

In [ ]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

def randrange(n, vmin, vmax):
    return (vmax-vmin)*np.random.rand(n) + vmin

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
n = 100
for c, m, zl, zh in [('r', 'o', -50, -25), ('b', '^', -30, -5)]:
    xs = randrange(n, 23, 32)
    ys = randrange(n, 0, 100)
    zs = randrange(n, zl, zh)
    ax.scatter(xs, ys, zs, c=c, marker=m)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

plt.show()
