An introduction to Rasterio

The smallest interesting problems [1] that Rasterio addresses are reading raster data from files as NumPy arrays and writing such arrays back to files. In between, you can use the whole ecosystem of scientific Python software to analyze and process the data. Rasterio also provides a few operations of its own, which are described in the next notebooks in this series.

This notebook demonstrates the basics of reading and writing raster data with Rasterio.

Overview of a dataset

A raster dataset consists of one or more dense (as opposed to sparse) 2-D arrays of scalar values. An RGB TIFF image file is a good example of a raster dataset. It has 3 bands (or channels; we'll call them bands here), and each band has a number of rows (its height), a number of columns (its width), and a uniform data type (unsigned 8-bit integers, 64-bit floats, etc.). Geospatially referenced datasets also possess a mapping from image to world coordinates (a transform) in a specific coordinate reference system (crs). This metadata about a dataset is readily accessible using Rasterio.

The Scientific Python community often imports numpy as np. Do this and also import rasterio.


In [9]:
import numpy as np

import rasterio

Many of Rasterio's tests use a small 3-band GeoTIFF file named "RGB.byte.tif". Open it with the rasterio.open() function.


In [10]:
src = rasterio.open('../tests/data/RGB.byte.tif')

This function returns a dataset object. It has many of the same properties as a Python file object.


In [11]:
src.name


Out[11]:
'../tests/data/RGB.byte.tif'

In [12]:
src.mode


Out[12]:
'r'

In [13]:
src.closed


Out[13]:
False
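
For comparison, the sketch below reads the same three attributes from an ordinary Python file object. It uses only the standard library; the throwaway text file is a stand-in created for illustration and has nothing to do with the raster data.

```python
import os
import tempfile

# Create a small throwaway text file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("hello\n")

f = open(path)  # the default mode is 'r', as with rasterio.open()
name, mode, closed_while_open = f.name, f.mode, f.closed
f.close()
closed_after = f.closed

print(name == path, mode, closed_while_open, closed_after)

os.remove(path)  # clean up the stand-in file
```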

Raster datasets have additional structure. You can get a description of it all at once from the dataset's meta property, or one item at a time from individual properties.


In [14]:
src.meta


Out[14]:
{'affine': Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0),
 'count': 3,
 'crs': {'init': u'epsg:32618'},
 'driver': u'GTiff',
 'dtype': 'uint8',
 'height': 718,
 'nodata': 0.0,
 'transform': (101985.0,
  300.0379266750948,
  0.0,
  2826915.0,
  0.0,
  -300.041782729805),
 'width': 791}
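
The 'affine' entry above is the mapping from image to world coordinates. As a minimal sketch, assuming the usual affine convention in which the six coefficients (a, b, c, d, e, f) give x = a*col + b*row + c and y = d*col + e*row + f, the mapping can be worked out by hand. The helper name pixel_to_world is hypothetical and is not part of Rasterio.

```python
# Coefficients copied from the 'affine' entry of src.meta shown above.
A, B, C = 300.0379266750948, 0.0, 101985.0
D, E, F = 0.0, -300.041782729805, 2826915.0


def pixel_to_world(col, row):
    """Map a (col, row) pixel position to (x, y) world coordinates.

    Hypothetical helper for illustration only.
    """
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y


# Pixel (0, 0) is the upper-left corner of the image.
upper_left = pixel_to_world(0, 0)
# One step past the last column and row gives the lower-right corner.
lower_right = pixel_to_world(791, 718)

print(upper_left)
print(lower_right)
```

The negative e coefficient reflects that row numbers increase downward while the y coordinate increases upward.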

In [15]:
src.crs


Out[15]:
{'init': u'epsg:32618'}

To close an opened dataset, use its close() method.


In [16]:
src.close()
src.closed


Out[16]:
True

You can't read from or write to a closed dataset, but you can continue to access its properties.


In [23]:
src.driver


Out[23]:
u'GTiff'

Dataset layout

Three properties of a Rasterio dataset tell you a lot about it in NumPy terms. The shape of a dataset is a (height, width) tuple and is exactly the shape of the NumPy arrays that would be read from it. The testing dataset has 718 rows and 791 columns.


In [26]:
src.shape


Out[26]:
(718, 791)

The count of bands in the dataset is 3.


In [27]:
src.count


Out[27]:
3

All three of its bands contain 8-bit unsigned integers.


In [28]:
src.dtypes


Out[28]:
['uint8', 'uint8', 'uint8']
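
To make the data type concrete, the following sketch asks NumPy for the value range of uint8 and estimates how much memory the three bands would occupy as an array. The count, height, and width are copied from the outputs above; nothing is read from the file.

```python
import numpy as np

# Range of values an 8-bit unsigned integer can hold.
info = np.iinfo(np.uint8)
value_range = (info.min, info.max)

# Sizes copied from the test dataset: 3 bands of 718 x 791 pixels.
count, height, width = 3, 718, 791

# Each uint8 value occupies one byte.
nbytes = count * height * width * np.dtype(np.uint8).itemsize

print(value_range, nbytes)
```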

NumPy concepts are the model here. To create a 3-D NumPy array into which the test file's bands would fit without any resampling, you would use the following Python code.


In [25]:
dest = np.empty((src.count,) + src.shape, dtype='uint8')
dest


Out[25]:
array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
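
Note that np.empty() only reserves memory and does not initialize it, so the zeros printed above are not guaranteed. A minimal variant, assuming you want defined contents, uses np.zeros() instead. The count and shape values are copied from the outputs above so that the sketch runs without the dataset.

```python
import numpy as np

# Values copied from the test dataset's count and shape shown above.
count, shape = 3, (718, 791)

# Tuple concatenation puts the band axis first: (bands, rows, columns).
dest_shape = (count,) + shape

# np.zeros guarantees the initial contents; np.empty does not.
dest = np.zeros(dest_shape, dtype="uint8")

# Indexing the first axis gives a 2-D view of one band, without copying.
band = dest[0]

print(dest.shape, band.shape, dest.dtype)
```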

References

[1]: Mike Bostock's words from his FOSS4G keynote, 2014-09-10