Loading data from the datacube

This notebook will briefly discuss how to load data from the datacube.

Importing the datacube

To start, we'll import the datacube module and create a Datacube instance, passing load-data-example as the application name.


In [1]:
import datacube
dc = datacube.Datacube(app='load-data-example')

Loading data

Data is loaded from the datacube with the load function.

The function takes several arguments, including:

  • product: the specific product to load
  • x: the spatial extent in the x dimension
  • y: the spatial extent in the y dimension
  • time: the temporal extent

We'll load the Landsat 5 TM Nadir Bidirectional Reflectance Distribution Function Adjusted Reflectance (NBAR) product for the spatial region covering:

  • 149.25 -> 149.5 degrees longitude
  • -36.25 -> -36.5 degrees latitude

and a temporal extent covering:

  • 2008-01-01 -> 2009-01-01

In [2]:
data = dc.load(product='ls5_nbar_albers', x=(149.25, 149.5), y=(-36.25, -36.5),
               time=('2008-01-01', '2009-01-01'))

In [3]:
data


Out[3]:
<xarray.Dataset>
Dimensions:  (time: 7, x: 1041, y: 1221)
Coordinates:
  * time     (time) datetime64[ns] 2008-01-15T23:41:48.500000 ...
  * y        (y) float64 -4.066e+06 -4.066e+06 -4.066e+06 -4.066e+06 ...
  * x        (x) float64 1.543e+06 1.543e+06 1.543e+06 1.543e+06 1.543e+06 ...
Data variables:
    blue     (time, y, x) int16 3326 3326 3326 3326 3326 3326 3326 3326 3326 ...
    green    (time, y, x) int16 6061 5882 5942 5853 5942 5972 6150 7088 7493 ...
    red      (time, y, x) int16 5943 5792 5767 5717 5868 5792 6194 6370 6370 ...
    nir      (time, y, x) int16 6678 6434 6504 6364 6504 6434 6749 7932 8866 ...
    swir1    (time, y, x) int16 3155 2939 3179 3035 3323 3418 3825 4926 6002 ...
    swir2    (time, y, x) int16 1784 1618 1884 1718 1884 1984 2350 3182 4113 ...
Attributes:
    crs: EPSG:3577
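
The same query arguments are reused several times in this notebook, so it can be convenient to keep them in a dictionary and unpack it into the call. The following is a minimal sketch only: make_query is a hypothetical helper, not part of the datacube API, and the range checks are an assumption about what is worth validating before querying.

```python
def make_query(product, lon, lat, time, **extra):
    """Build keyword arguments for dc.load (hypothetical helper, not datacube API)."""
    # Basic sanity checks on geographic (EPSG:4326) ranges.
    if not all(-180.0 <= v <= 180.0 for v in lon):
        raise ValueError('longitude must be within [-180, 180]')
    if not all(-90.0 <= v <= 90.0 for v in lat):
        raise ValueError('latitude must be within [-90, 90]')
    query = {'product': product, 'x': tuple(lon), 'y': tuple(lat),
             'time': tuple(time)}
    # Any further keyword arguments (e.g. measurements) are passed through.
    query.update(extra)
    return query


query = make_query('ls5_nbar_albers', (149.25, 149.5), (-36.25, -36.5),
                   ('2008-01-01', '2009-01-01'))
# With a configured datacube this would be used as: data = dc.load(**query)
```

Keeping the query in one place means later cells only need to add or override individual keys, such as measurements.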

Load data via a product's native co-ordinate system

By default, the x and y arguments are interpreted in the geographic co-ordinate system identified by EPSG code 4326 (WGS84 longitude and latitude), the same system used by Google Earth.

The user can also query in the native co-ordinate system in which the product is stored, by giving x and y in that system and supplying the crs argument.
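
To get a feel for how a longitude/latitude pair maps into the native system, the forward Albers equal-area conic equations can be written out directly. This is a rough sketch, not how datacube performs the conversion. It assumes the usual EPSG:3577 (GDA94 / Australian Albers) parameters: GRS80 ellipsoid, standard parallels at -18 and -36 degrees, central meridian 132 degrees, latitude of origin 0 and no false easting or northing. The function name lonlat_to_albers is made up for this illustration.

```python
import math

# Assumed EPSG:3577 parameters (not taken from this notebook).
A = 6378137.0                      # GRS80 semi-major axis in metres
F = 1 / 298.257222101              # GRS80 flattening
E2 = 2 * F - F * F                 # eccentricity squared
E = math.sqrt(E2)
LAT_1, LAT_2 = math.radians(-18.0), math.radians(-36.0)  # standard parallels
LAT_0, LON_0 = math.radians(0.0), math.radians(132.0)    # projection origin


def _m(lat):
    return math.cos(lat) / math.sqrt(1 - E2 * math.sin(lat) ** 2)


def _q(lat):
    s = math.sin(lat)
    return (1 - E2) * (s / (1 - E2 * s * s)
                       - math.log((1 - E * s) / (1 + E * s)) / (2 * E))


def lonlat_to_albers(lon_deg, lat_deg):
    """Project degrees of longitude/latitude to Albers x, y in metres."""
    n = (_m(LAT_1) ** 2 - _m(LAT_2) ** 2) / (_q(LAT_2) - _q(LAT_1))
    c = _m(LAT_1) ** 2 + n * _q(LAT_1)
    rho = A * math.sqrt(c - n * _q(math.radians(lat_deg))) / n
    rho_0 = A * math.sqrt(c - n * _q(LAT_0)) / n
    theta = n * (math.radians(lon_deg) - LON_0)
    return rho * math.sin(theta), rho_0 - rho * math.cos(theta)


# North-west corner of the longitude/latitude query used earlier.
x, y = lonlat_to_albers(149.25, -36.25)
print(round(x), round(y))
```

The printed values can be compared with the x and y bounds passed to the native query. In practice a projection library would normally be used for this conversion.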


In [4]:
data = dc.load(product='ls5_nbar_albers', x=(1543137.5, 1569137.5), y=(-4065537.5, -4096037.5),
               time=('2008-01-01', '2009-01-01'), crs='EPSG:3577')

In [5]:
data


Out[5]:
<xarray.Dataset>
Dimensions:  (time: 7, x: 1041, y: 1221)
Coordinates:
  * time     (time) datetime64[ns] 2008-01-15T23:41:48.500000 ...
  * y        (y) float64 -4.066e+06 -4.066e+06 -4.066e+06 -4.066e+06 ...
  * x        (x) float64 1.543e+06 1.543e+06 1.543e+06 1.543e+06 1.543e+06 ...
Data variables:
    blue     (time, y, x) int16 3326 3326 3326 3326 3326 3326 3326 3326 3326 ...
    green    (time, y, x) int16 6061 5882 5942 5853 5942 5972 6150 7088 7493 ...
    red      (time, y, x) int16 5943 5792 5767 5717 5868 5792 6194 6370 6370 ...
    nir      (time, y, x) int16 6678 6434 6504 6364 6504 6434 6749 7932 8866 ...
    swir1    (time, y, x) int16 3155 2939 3179 3035 3323 3418 3825 4926 6002 ...
    swir2    (time, y, x) int16 1784 1618 1884 1718 1884 1984 2350 3182 4113 ...
Attributes:
    crs: EPSG:3577
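
The native query returns the same dimensions as the geographic one. A small arithmetic check shows why the bounds in In [4] are consistent with 1041 by 1221 pixels. It rests on two assumptions not stated in the output above: that the product has 25 m pixels, and that the bounds are the first and last pixel-centre co-ordinates along each axis. pixel_count is a hypothetical helper.

```python
RESOLUTION = 25.0  # metres per pixel; assumed for ls5_nbar_albers


def pixel_count(first, last, resolution=RESOLUTION):
    """Number of pixel centres from first to last inclusive (hypothetical helper)."""
    return int(round(abs(last - first) / resolution)) + 1


nx = pixel_count(1543137.5, 1569137.5)     # x bounds from In [4]
ny = pixel_count(-4065537.5, -4096037.5)   # y bounds from In [4]
print(nx, ny)
```

Under these assumptions the counts agree with the x and y sizes in the Dimensions line of Out[5].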

Load specific measurements of a given product

Some products contain several measurements. The Landsat 5 TM product ls5_nbar_albers, for example, contains the following spectral measurements:

  • blue
  • green
  • red
  • nir
  • swir1
  • swir2

In the next example we'll load only the red and nir measurements.


In [6]:
data = dc.load(product='ls5_nbar_albers', x=(149.25, 149.5), y=(-36.25, -36.5),
               time=('2008-01-01', '2009-01-01'), measurements=['red', 'nir'])

In [7]:
data


Out[7]:
<xarray.Dataset>
Dimensions:  (time: 7, x: 1041, y: 1221)
Coordinates:
  * time     (time) datetime64[ns] 2008-01-15T23:41:48.500000 ...
  * y        (y) float64 -4.066e+06 -4.066e+06 -4.066e+06 -4.066e+06 ...
  * x        (x) float64 1.543e+06 1.543e+06 1.543e+06 1.543e+06 1.543e+06 ...
Data variables:
    red      (time, y, x) int16 5943 5792 5767 5717 5868 5792 6194 6370 6370 ...
    nir      (time, y, x) int16 6678 6434 6504 6364 6504 6434 6749 7932 8866 ...
Attributes:
    crs: EPSG:3577
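
One common reason to load just red and nir is to compute a band ratio such as NDVI, (nir - red) / (nir + red). NDVI is not computed anywhere in this notebook, so the following is only an illustrative sketch in plain Python, using the first three pixel values printed in Out[7]. It ignores nodata masking, which real data would need. On the loaded dataset the same expression could be applied to data.nir and data.red.

```python
def ndvi(red, nir):
    """Normalised difference of two band values; NaN when both are zero."""
    total = nir + red
    if total == 0:
        return float('nan')
    return (nir - red) / total


# First three pixel values of each band as printed in Out[7].
red = [5943, 5792, 5767]
nir = [6678, 6434, 6504]
values = [ndvi(r, n) for r, n in zip(red, nir)]
print(values)
```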

Additional help can be found by calling help(dc.load).


In [8]:
help(dc.load)


Help on method load in module datacube.api.core:

load(product=None, measurements=None, output_crs=None, resolution=None, resampling=None, stack=False, dask_chunks=None, like=None, fuse_func=None, align=None, datasets=None, **query) method of datacube.api.core.Datacube instance
    Load data as an ``xarray`` object.  Each measurement will be a data variable in the :class:`xarray.Dataset`.
    
    See the `xarray documentation <http://xarray.pydata.org/en/stable/data-structures.html>`_ for usage of the
    :class:`xarray.Dataset` and :class:`xarray.DataArray` objects.
    
    **Product and Measurements**
        A product can be specified using the product name, or by search fields that uniquely describe a single
        product.
        ::
    
            product='ls5_ndvi_albers'
    
        See :meth:`list_products` for the list of products with their names and properties.
    
        A product can also be selected by searched using fields, but must only match one product.
        ::
    
            platform='LANDSAT_5',
            product_type='ndvi'
    
        The ``measurements`` argument is a list of measurement names, as listed in :meth:`list_measurements`.
        If not provided, all measurements for the product will be returned.
        ::
    
            measurements=['red', 'nir', swir2']
    
    **Dimensions**
        Spatial dimensions can specified using the ``longitude``/``latitude`` and ``x``/``y`` fields.
    
        The CRS of this query is assumed to be WGS84/EPSG:4326 unless the ``crs`` field is supplied,
        even if the stored data is in another projection or the `output_crs` is specified.
        The dimensions ``longitude``/``latitude`` and ``x``/``y`` can be used interchangeably.
        ::
    
            latitude=(-34.5, -35.2), longitude=(148.3, 148.7)
    
        or ::
    
            x=(1516200, 1541300), y=(-3867375, -3867350), crs='EPSG:3577'
    
        The ``time`` dimension can be specified using a tuple of datetime objects or strings with
        `YYYY-MM-DD hh:mm:ss` format. E.g::
    
            time=('2001-04', '2001-07')
    
        For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level,
        using solar day to keep scenes together.
        ::
    
            group_by='solar_day'
    
        For data that has different values for the scene overlap the requires more complex rules for combining data,
        such as GA's Pixel Quality dataset, a function can be provided to the merging into a single time slice.
        ::
    
            def pq_fuser(dest, src):
                valid_bit = 8
                valid_val = (1 << valid_bit)
    
                no_data_dest_mask = ~(dest & valid_val).astype(bool)
                np.copyto(dest, src, where=no_data_dest_mask)
    
                both_data_mask = (valid_val & dest & src).astype(bool)
                np.copyto(dest, src & dest, where=both_data_mask)
    
    **Output**
        If the `stack` argument is supplied, the returned data is stacked in a single ``DataArray``.
        A new dimension is created with the name supplied.
        This requires all of the data to be of the same datatype.
    
        To reproject or resample the data, supply the ``output_crs``, ``resolution``, ``resampling`` and ``align``
        fields.
    
        To reproject data to 25m resolution for EPSG:3577::
    
            dc.load(product='ls5_nbar_albers', x=(148.15, 148.2), y=(-35.15, -35.2), time=('1990', '1991'),
                    output_crs='EPSG:3577`, resolution=(-25, 25), resampling='cubic')
    
    :param str product: the product to be included.
    
    :param measurements:
        measurements name or list of names to be included, as listed in :meth:`list_measurements`.
            If a list is specified, the measurements will be returned in the order requested.
            By default all available measurements are included.
    
    :type measurements: list(str), optional
    
    :param query:
        Search parameters for products and dimension ranges as described above.
    
    :param str output_crs:
        The CRS of the returned data.  If no CRS is supplied, the CRS of the stored data is used.
    
    :param (float,float) resolution:
        A tuple of the spatial resolution of the returned data.
        This includes the direction (as indicated by a positive or negative number).
    
        Typically when using most CRSs, the first number would be negative.
    
    :param str resampling:
        The resampling method to use if re-projection is required.
    
        Valid values are: ``'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average'``
    
        Defaults to ``'nearest'``.
    
    :param (float,float) align:
        Load data such that point 'align' lies on the pixel boundary.
        Units are in the co-ordinate space of the output CRS.
    
        Default is (0,0)
    
    :param stack: The name of the new dimension used to stack the measurements.
        If provided, the data is returned as a :class:`xarray.DataArray` rather than a :class:`xarray.Dataset`.
    
        If only one measurement is returned, the dimension name is not used and the dimension is dropped.
    
    :type stack: str or bool
    
    :param dict dask_chunks:
        If the data should be lazily loaded using :class:`dask.array.Array`,
        specify the chunking size in each output dimension.
    
        See the documentation on using `xarray with dask <http://xarray.pydata.org/en/stable/dask.html>`_
        for more information.
    
    :param xarray.Dataset like:
        Uses the output of a previous ``load()`` to form the basis of a request for another product.
        E.g.::
    
            pq = dc.load(product='ls5_pq_albers', like=nbar_dataset)
    
    :param str group_by:
        When specified, perform basic combining/reducing of the data.
    
    :param fuse_func:
        Function used to fuse/combine/reduce data with the ``group_by`` parameter. By default,
        data is simply copied over the top of each other, in a relatively undefined manner. This function can
        perform a specific combining step, eg. for combining GA PQ data.
    
    :param datasets:
        Optional. If this is a non-empty list of :class:`datacube.model.Dataset` objects, these will be loaded
        instead of performing a database lookup.
    
    :return: Requested data in a :class:`xarray.Dataset`.
        As a :class:`xarray.DataArray` if the ``stack`` variable is supplied.
    
    :rtype: :class:`xarray.Dataset` or :class:`xarray.DataArray`

