virtual python

Virtual python environments give you the ability to install your own python packages in a given directory.

This allows you to test many python packages while also allowing you to have just the packages you need for a particular computation.

All of the packages available on the system will still be available when you run in your virtual environment (unless you create the virtual environment with --no-site-packages).

Creating a virtual environment

cd $HOME
mkdir virtpy
virtualenv virtpy

Activating

source ./virtpy/bin/activate

Now my prompt changes, to tell me I'm in a virtual python environment:

(virtpy)[jduckles@bison ~]$


Running pip in your virtual environment

When your virtual environment is activated it will make sure when you run python or pip they run from your virtualenvironment.

pip

A tool for installing and managing python packages.

There will come a time in your python programming where you find a tool that is available outside of the Python Standard library which could be helpful to you.

Installing things with pip

pip install ipython

This will install the newest stable release of ipython into your virtual environment.

Listing Packages

To show what you have installed you can run:

pip list

Removing packages

pip uninstall ipython

upgrading packages

pip install --upgrade ipython

Some interesting packages

  • csvkit - command line utilities for manipulating csv files
  • shapely - for manipulating vector geometries
  • spectral - Spectral methods (unfortunately doesn't use GDAL)
  • pandas - R-like data frames for Python and more!

  • PyPi - The python package index


In [333]:
from osgeo import gdal 
import numpy as np
import matplotlib.pyplot as plt

Building VRTs

For each of the first four Landsat bands, build a VRT which contains that band across all tiles in the current directory.

for i in {1..4}; do 
    gdalbuildvrt -srcnodata 0 B$i.vrt *B$i0.TIF
done

We get four vrt files, each is a "mosaic" of the available tiles for that band.

Now lets build a single VRT which has each of the by-band mosaics as layers.

gdalbuildvrt -separate stack.vrt *.vrt

In [251]:
vrt_stack = gdal.Open('/Users/jduckles/tmp/ETM/stack.vrt')
b1 = vrt_stack.GetRasterBand(1).ReadAsArray() / 255.0 # normalize for plotting
plt.imshow(b1)


Out[251]:
<matplotlib.image.AxesImage at 0x140e82350>

In [252]:
b2 = vrt_stack.GetRasterBand(2).ReadAsArray() / 255.0 # normalize for plotting
plt.imshow(b2)


Out[252]:
<matplotlib.image.AxesImage at 0x146874d10>

Stacking arrays

A list of arrays can be thought of as a "stack" of rasters. For multspectral/hyperspectral data these stacks tend to be stacks of bands. For time-series data these stacks can be temporal stacks of the data arranged in time.


In [226]:
bands = ['/Users/jduckles/tmp/ETM/L71028035_03520060608_B10.TIF', 
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B20.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B30.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B40.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B50.TIF']

rasters = np.array([ gdal.Open(band).ReadAsArray() for band in bands ])

List comprehensions

List comprehension is a concise syntax for creating lists in place.


In [234]:
squares1 = []
for x in range(10):
    squares1.append(x**2)
print squares1


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [233]:
# is the same as:
squares2 = [x**2 for x in range(10)]
print squares2


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [332]:
# inline if's are possible too.
[x**2 for x in range(60) if x % 4 == 0]


Out[332]:
[0, 16, 64, 144, 256, 400, 576, 784, 1024, 1296, 1600, 1936, 2304, 2704, 3136]

Activity

Try to create a list comprehension expression which takes the following list:

pets = ['cat','dog','frog','rabbit']

And returns a list which has the pets names in upper case with the length of each string

stats = [['CAT',3],['DOG',3],['FROG',4],['RABBIT',6]]

In [244]:
pets = ['cat','dog','frog','rabbit']
stats = [ [pet.upper(),len(pet)] for pet in pets]
print stats


[['CAT', 3], ['DOG', 3], ['FROG', 4], ['RABBIT', 6]]

Back to opening a list of bands


In [265]:
bands = ['/Users/jduckles/tmp/ETM/L71028035_03520060608_B10.TIF', 
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B20.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B30.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B40.TIF',
         '/Users/jduckles/tmp/ETM/L71028035_03520060608_B50.TIF']

rasters = np.array([ gdal.Open(band).ReadAsArray() for band in bands ])  # <-- See the list comprehension?

We now have a 3-d array with dimesions (spectral band, x, y)


In [254]:
rasters.shape


Out[254]:
(5, 7201, 8161)

Slicing

We can slice the array in various ways.


In [256]:
print rasters[:, 1500, 1500] # all bands for one x,y


[119 127 207 103 225]

In [259]:
print rasters[0:3,1500,1500] # first three bands for one location


[119 127 207]

In [266]:
print rasters [:, 1500:1505, 1500:1505] # all bands for a sampling box


[[[119 118 119 120 121]
  [120 116 117 119 120]
  [117 118 117 117 120]
  [113 111 109 106 108]
  [104 104  97  94  95]]

 [[127 130 132 133 138]
  [127 131 127 133 135]
  [122 128 129 128 128]
  [117 117 111 106 110]
  [100 103  93  88  91]]

 [[207 221 139 145 152]
  [199 231 133 139 147]
  [186 222 201 218 216]
  [180 187 168 157 127]
  [140 136 118  97 106]]

 [[103 104 103 103 105]
  [103 103 101 101 102]
  [101 102 101 100 101]
  [102 103 101  99  98]
  [103 101  98  96  97]]

 [[225 200 199 197 197]
  [215 199 194 197 194]
  [206 214 210 191 191]
  [201 204 197 180 178]
  [175 176 164 152 157]]]

numpy.apply_along_axis

This function allows you to apply a function you define across an axis of a multidimensional array. In this case we're going to apply across the stack dimension.


In [290]:
help(np.apply_along_axis)


Help on function apply_along_axis in module numpy.lib.shape_base:

apply_along_axis(func1d, axis, arr, *args)
    Apply a function to 1-D slices along the given axis.
    
    Execute `func1d(a, *args)` where `func1d` operates on 1-D arrays and `a`
    is a 1-D slice of `arr` along `axis`.
    
    Parameters
    ----------
    func1d : function
        This function should accept 1-D arrays. It is applied to 1-D
        slices of `arr` along the specified axis.
    axis : integer
        Axis along which `arr` is sliced.
    arr : ndarray
        Input array.
    args : any
        Additional arguments to `func1d`.
    
    Returns
    -------
    apply_along_axis : ndarray
        The output array. The shape of `outarr` is identical to the shape of
        `arr`, except along the `axis` dimension, where the length of `outarr`
        is equal to the size of the return value of `func1d`.  If `func1d`
        returns a scalar `outarr` will have one fewer dimensions than `arr`.
    
    See Also
    --------
    apply_over_axes : Apply a function repeatedly over multiple axes.
    
    Examples
    --------
    >>> def my_func(a):
    ...     """Average first and last element of a 1-D array"""
    ...     return (a[0] + a[-1]) * 0.5
    >>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
    >>> np.apply_along_axis(my_func, 0, b)
    array([ 4.,  5.,  6.])
    >>> np.apply_along_axis(my_func, 1, b)
    array([ 2.,  5.,  8.])
    
    For a function that doesn't return a scalar, the number of dimensions in
    `outarr` is the same as `arr`.
    
    >>> def new_func(a):
    ...     """Divide elements of a by 2."""
    ...     return a * 0.5
    >>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
    >>> np.apply_along_axis(new_func, 0, b)
    array([[ 0.5,  1. ,  1.5],
           [ 2. ,  2.5,  3. ],
           [ 3.5,  4. ,  4.5]])


In [284]:
def total(a):
    return(np.sum(a))

np.apply_along_axis(total, 0, rasters[:,1500:1550,1500:1550])


Out[284]:
array([[781, 773, 692, ..., 568, 586, 602],
       [764, 780, 672, ..., 580, 570, 615],
       [732, 784, 758, ..., 560, 569, 594],
       ..., 
       [609, 571, 569, ..., 535, 519, 511],
       [632, 601, 582, ..., 539, 527, 503],
       [625, 610, 593, ..., 535, 519, 505]], dtype=uint64)

In [314]:
print rasters[:,1500:1505,1500:1505]

def plusone(a):
    return(a + 1)

np.apply_along_axis(plusone, 0, rasters[:,1500:1505,1500:1505])


[[[119 118 119 120 121]
  [120 116 117 119 120]
  [117 118 117 117 120]
  [113 111 109 106 108]
  [104 104  97  94  95]]

 [[127 130 132 133 138]
  [127 131 127 133 135]
  [122 128 129 128 128]
  [117 117 111 106 110]
  [100 103  93  88  91]]

 [[207 221 139 145 152]
  [199 231 133 139 147]
  [186 222 201 218 216]
  [180 187 168 157 127]
  [140 136 118  97 106]]

 [[103 104 103 103 105]
  [103 103 101 101 102]
  [101 102 101 100 101]
  [102 103 101  99  98]
  [103 101  98  96  97]]

 [[225 200 199 197 197]
  [215 199 194 197 194]
  [206 214 210 191 191]
  [201 204 197 180 178]
  [175 176 164 152 157]]]
Out[314]:
array([[[120, 119, 120, 121, 122],
        [121, 117, 118, 120, 121],
        [118, 119, 118, 118, 121],
        [114, 112, 110, 107, 109],
        [105, 105,  98,  95,  96]],

       [[128, 131, 133, 134, 139],
        [128, 132, 128, 134, 136],
        [123, 129, 130, 129, 129],
        [118, 118, 112, 107, 111],
        [101, 104,  94,  89,  92]],

       [[208, 222, 140, 146, 153],
        [200, 232, 134, 140, 148],
        [187, 223, 202, 219, 217],
        [181, 188, 169, 158, 128],
        [141, 137, 119,  98, 107]],

       [[104, 105, 104, 104, 106],
        [104, 104, 102, 102, 103],
        [102, 103, 102, 101, 102],
        [103, 104, 102, 100,  99],
        [104, 102,  99,  97,  98]],

       [[226, 201, 200, 198, 198],
        [216, 200, 195, 198, 195],
        [207, 215, 211, 192, 192],
        [202, 205, 198, 181, 179],
        [176, 177, 165, 153, 158]]], dtype=uint8)

In [283]:
def ndvi(a):
    return( float(a[1] - a[0])/float(a[1]+a[0]))
    
ndvi = np.apply_along_axis(ndvi, 0, rasters[2:4,1500:1510,1500:1510]/255.0)

print ndvi


[[-0.33548387 -0.36       -0.14876033 -0.16935484 -0.18287938 -0.20599251
  -0.21481481 -0.18918919 -0.15503876 -0.14716981]
 [-0.31788079 -0.38323353 -0.13675214 -0.15833333 -0.18072289 -0.1875
  -0.19521912 -0.17269076 -0.14285714 -0.13178295]
 [-0.29616725 -0.37037037 -0.33112583 -0.37106918 -0.36277603 -0.35483871
  -0.17446809 -0.15       -0.11740891 -0.11553785]
 [-0.27659574 -0.28965517 -0.24907063 -0.2265625  -0.12888889 -0.1277533
  -0.12       -0.13080169 -0.12096774 -0.10569106]
 [-0.15226337 -0.14767932 -0.09259259 -0.00518135 -0.04433498 -0.07981221
  -0.11504425 -0.11666667 -0.10931174 -0.10833333]
 [-0.12280702 -0.08256881 -0.03        0.01621622 -0.00518135 -0.06666667
  -0.12663755 -0.10743802 -0.09836066 -0.09170306]
 [-0.12554113 -0.11504425 -0.07981221 -0.02538071  0.01587302  0.03703704
  -0.10043668 -0.09465021 -0.09465021 -0.0877193 ]
 [-0.06976744 -0.03414634 -0.03960396  0.0052356   0.04347826 -0.03960396
  -0.24363636 -0.31888545 -0.09623431 -0.08928571]
 [-0.07692308 -0.06161137 -0.04950495  0.01604278  0.03225806 -0.07981221
  -0.29010239 -0.07949791 -0.08154506 -0.0678733 ]
 [-0.13692946 -0.03921569 -0.00529101  0.01075269 -0.02040816 -0.15966387
  -0.31210191 -0.32298137 -0.09251101 -0.07207207]]

Activity

Try making your own user-defined function to operate a raster stack (or any stacked numpy array).


In [312]:
mydata = np.random.randn(5,1000,1000)

def mean(a):
    return(np.sum(a) / len(a))

%time np.apply_along_axis(mean,0,mydata)


CPU times: user 27.5 s, sys: 40.3 ms, total: 27.6 s
Wall time: 27.6 s
Out[312]:
array([[  4.96055493e-01,  -1.22065439e+00,  -8.48411960e-01, ...,
          3.57343482e-01,   8.84460261e-01,   5.10007953e-01],
       [ -2.28894023e-01,   8.84902998e-03,   1.02013517e+00, ...,
         -5.55009064e-01,  -7.27247167e-01,  -1.37625278e-02],
       [ -5.53089832e-01,  -3.52400856e-04,   4.40031630e-02, ...,
         -7.39083046e-01,   6.16216133e-01,   2.33083133e-01],
       ..., 
       [  4.45048632e-01,   3.65587590e-01,   1.84576130e-02, ...,
         -1.56104352e-01,  -1.33489308e-01,   7.69939570e-02],
       [  4.78729403e-01,   2.51896712e-02,  -7.37171646e-01, ...,
          2.22681083e-02,  -1.82146414e-01,  -5.32261388e-01],
       [ -3.12020567e-01,  -9.36700963e-01,   4.06244259e-01, ...,
          9.93119734e-03,   3.66320055e-01,   4.12191015e-01]])

Pandas Tutorial

Video of Pandas by its creator Wes McKinney

Next time

Our last python lecture

  • multiprocessing example
  • Your questions about python
  • How to manage large projects
  • Anything else you'd like to cover!

After that we'll be switching gears to the open source GIS GRASS. We'll be using a different server and I'll send you all credentials for connecting.