The Australian Geoscience Datacube provides an integrated, gridded analysis environment for decades of analysis-ready Earth observation data from multiple satellite and other acquisition systems.
For instructions on using the Datacube on NCI, see: http://agdc-v2.readthedocs.io/en/develop/nci_usage.html
For instructions on setting up your own instance, see: http://agdc-v2.readthedocs.io/en/develop/install.html
This notebook touches briefly on some of the implemented features of the Datacube module; it is intended to demonstrate functionality rather than serve as a tutorial.
In [1]:
%matplotlib inline
import datacube
If you have set up your config correctly, or are using the module on NCI, you should be able to create a Datacube object that connects to the configured datacube system.
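For a self-managed install, the connection details live in a small INI file (by default `~/.datacube.conf`). A minimal sketch, with placeholder values — see the install docs linked above for the authoritative field list:

```ini
[datacube]
db_database: datacube
db_hostname: localhost
db_username: your_username
db_password: your_password
```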
In [2]:
dc = datacube.Datacube(app='dc-example')
dc
Out[2]:
In [3]:
dc.list_products()
Out[3]:
In [4]:
dc.list_measurements()
Out[4]:
In [5]:
nbar = dc.load(product='ls5_nbar_albers', x=(149.25, 149.35), y=(-35.25, -35.35))
The returned data is an xarray.Dataset object, which is a labelled n-dimensional array wrapping a numpy array.
We can investigate the data to see the variables (measurement bands) and dimensions that were returned:
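As a minimal, self-contained illustration of that structure — using synthetic values and made-up dimension sizes, not Datacube output — a Dataset is a dictionary-like container of labelled DataArrays:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a loaded measurement: 2 times, 3 rows, 4 columns.
times = np.array(['1991-03-01', '1991-04-02'], dtype='datetime64[ns]')
green = xr.DataArray(np.zeros((2, 3, 4), dtype='int16'),
                     dims=('time', 'y', 'x'),
                     coords={'time': times})
ds = xr.Dataset({'green': green})
print(ds.data_vars)   # measurement bands live here
print(ds.green.dims)  # ('time', 'y', 'x')
```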
In [6]:
nbar
Out[6]:
We can look at the data by name directly, or through the data_vars dictionary:
In [7]:
nbar.data_vars
Out[7]:
In [8]:
nbar.green
Out[8]:
In [9]:
autumn = nbar.green.loc['1991-3':'1991-5']
autumn.shape
Out[9]:
In [10]:
autumn.plot(col='time', col_wrap=3)
Out[10]:
Where no data is available, such as on the boundaries of a scene, the array is filled with a special no-data value.
We can filter it out, although xarray will convert the data from int to float so that it can use NaN to indicate no data.
Once the no-data values are no longer represented as -9999, the data fits a much more useful colour ramp:
In [11]:
autumn_valid = autumn.where(autumn != autumn.attrs['nodata'])
autumn_valid.plot(col='time', col_wrap=3)
Out[11]:
In [12]:
pq = dc.load(product='ls5_pq_albers', x=(149.25, 149.35), y=(-35.25, -35.35))
pq_autumn = pq.pixelquality.loc['1991-3':'1991-5']
pq_autumn.plot(col='time', col_wrap=3)
Out[12]:
The PQ layer stores a bitmask of several values. We can list the information available:
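The idea behind the bitmask is that each quality test occupies one bit of the integer, and a pixel passes when all wanted bits are set. The bit positions below are made up for illustration only; the real layout comes from `get_flags_def`:

```python
# Hypothetical bit assignments, for illustration only.
CONTIGUOUS     = 0b001  # all bands have data
NO_CLOUD_ACCA  = 0b010  # ACCA test says no cloud
NO_CLOUD_FMASK = 0b100  # Fmask test says no cloud

wanted = CONTIGUOUS | NO_CLOUD_ACCA | NO_CLOUD_FMASK

clear_pixel = 0b111   # every test passed
cloudy_pixel = 0b011  # Fmask flagged cloud

print((clear_pixel & wanted) == wanted)   # True
print((cloudy_pixel & wanted) == wanted)  # False
```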
In [13]:
from datacube.storage import masking
import pandas
pandas.DataFrame.from_dict(masking.get_flags_def(pq), orient='index')
Out[13]:
In [14]:
good_data = masking.make_mask(pq, cloud_acca='no_cloud', cloud_fmask='no_cloud', contiguous=True)
autumn_good_data = good_data.pixelquality.loc['1991-3':'1991-5']
autumn_good_data.plot(col='time', col_wrap=3)
Out[14]:
In [15]:
autumn_cloud_free = autumn_valid.where(autumn_good_data)
autumn_cloud_free.plot(col='time', col_wrap=3)
Out[15]:
In [16]:
nbar_by_solar_day = dc.load(product='ls5_nbar_albers', x=(149.25, 149.35), y=(-35.25, -35.35), group_by='solar_day')
len(nbar_by_solar_day.time)
Out[16]:
Grouping by solar day merges overlapping scenes from the same overpass, so we have fewer times than we did previously.
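A rough sketch of why the count drops, with made-up timestamps: two acquisitions seconds apart on the same pass collapse to a single day (the real grouping also accounts for longitude when defining the solar day):

```python
import pandas as pd

# Two scenes seconds apart on one pass, plus one a fortnight later.
times = pd.to_datetime(['1991-03-05 23:41:10',
                        '1991-03-05 23:41:34',
                        '1991-03-21 23:41:12'])
solar_days = times.normalize().unique()
print(len(times), len(solar_days))  # 3 2
```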
In [17]:
autumn2 = nbar_by_solar_day.green.loc['1991-3':'1991-5']
autumn2.shape
Out[17]:
In [18]:
autumn2.plot(col='time', col_wrap=3)
Out[18]:
In [19]:
two_bands = dc.load(product='ls5_nbar_albers', x=(149.07, 149.17), y=(-35.25, -35.35),
time=('1991', '1992'), measurements=['red', 'nir'], group_by='solar_day')
red = two_bands.red.where(two_bands.red != two_bands.red.attrs['nodata'])
nir = two_bands.nir.where(two_bands.nir != two_bands.nir.attrs['nodata'])
pq = dc.load(product='ls5_pq_albers', x=(149.07, 149.17), y=(-35.25, -35.35),
time=('1991', '1992'), group_by='solar_day')
cloud_free = masking.make_mask(pq, cloud_acca='no_cloud', cloud_fmask='no_cloud', contiguous=True).pixelquality
ndvi = ((nir - red) / (nir + red)).where(cloud_free)
In [20]:
ndvi.shape
Out[20]:
In [21]:
ndvi.plot(col='time', col_wrap=5)
Out[21]:
In [22]:
mostly_cloud_free = cloud_free.sum(dim=('x','y')) > (0.75 * cloud_free.size / cloud_free.time.size)
mostly_good_ndvi = ndvi.where(mostly_cloud_free).dropna('time', how='all')
mostly_good_ndvi.plot(col='time', col_wrap=5)
Out[22]:
In [23]:
mostly_good_ndvi.median(dim='time').plot()
Out[23]:
In [24]:
mostly_good_ndvi.std(dim='time').plot()
Out[24]:
In [25]:
mostly_good_ndvi.sel(y=-3955361, x=1549813, method='nearest').plot()
Out[25]:
In [26]:
mostly_good_ndvi.isel(x=[200], y=[200]).plot()
Out[26]:
In [27]:
mostly_good_ndvi.isel(y=250).plot()
Out[27]:
With a line shapefile providing pairs of coordinates (and using sel_points instead of isel_points), the transect could be interpolated into something less blocky for the next plot.
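A minimal sketch of pointwise (transect) indexing on synthetic data. Note that newer xarray releases replaced `isel_points`/`sel_points` with DataArray indexers passed to `isel`/`sel`, which is what this sketch uses:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(25).reshape(5, 5), dims=('y', 'x'))

# Sharing the 'points' dimension makes the selection pointwise
# (one value per coordinate pair) rather than an outer product.
xs = xr.DataArray([0, 2, 4], dims='points')
ys = xr.DataArray([0, 2, 4], dims='points')
transect = da.isel(x=xs, y=ys)
print(transect.values)  # [ 0 12 24]
```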
In [28]:
mostly_good_ndvi.isel_points(x=[0, 100, 200, 300, 300, 400],
y=[200, 200, 200, 250, 300, 400]).plot(x='points', y='time')
Out[28]:
In [29]:
rgb = dc.load(product='ls5_nbar_albers',
x=(149.07, 149.17), y=(-35.25, -35.35),
time=('1991-3-1', '1991-6-30'),
measurements=['red', 'green', 'blue'],
group_by='solar_day', stack='color').transpose('time', 'y', 'x', 'color')
list(zip(rgb.dims, rgb.shape))
Out[29]:
In [30]:
fake_saturation = 3000
clipped_visible = rgb.where(rgb<fake_saturation).fillna(fake_saturation)
max_val = clipped_visible.max(['y', 'x'])
scaled = (clipped_visible / max_val)
In [31]:
from matplotlib import pyplot as plt
plt.imshow(scaled.isel(time=3))
Out[31]:
In [32]:
grid = dc.load(product='dsm1sv10', x=(149.07, 149.17), y=(-35.25, -35.35))
grid.elevation.shape
Out[32]:
In [33]:
grid.elevation[0].plot()
Out[33]:
In [34]:
albers_grid = dc.load(product='dsm1sv10', x=(149.07, 149.17), y=(-35.25, -35.35),
output_crs='EPSG:3577', resolution=(-25,25))
albers_grid.elevation.shape
Out[34]:
In [35]:
albers_grid.elevation[0].plot()
Out[35]: