Introduction to NumPy Arrays

What is an array and the NumPy package

  • Creating arrays
  • Array indexing
  • Array inquiry
  • Array manipulation
  • Array operations

What is (in) UV-CDAT?

  • Masked variables, axes
  • Brief tour of vcdat
  • scanner - "cdscan"
  • Spicy touch of GUI

Why NumPy?

A NumPy array is like a list, except that:

  • All elements are of the same type, so operations with arrays are much faster.
  • Multi‐dimensional arrays are more clearly supported.
  • Array operations are supported.
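
A minimal sketch of the difference in behaviour between a list and an array. The names `values` and `arr` are illustrative only, and print is called with parentheses so the snippet also runs on Python 3.

```python
import numpy

values = [1, 2, 3]
arr = numpy.array(values)

# The * operator repeats a list, but multiplies each element of an array.
list_result = values * 2
array_result = arr * 2

print(list_result)
print(array_result)
```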

Creating numpy arrays


In [ ]:
# import statement  
import numpy

In [ ]:
# create sample 1-dimensional array     
a = numpy.array([2, 3, -5, 21, -2, 1, 10, 100, 200])
a.shape

Let's create a 2-dimensional array


In [ ]:
# create sample 2-dimensional array
a = numpy.array([[2, 3, -5],[21, -2, 1]])
a.shape

Let's print it...


In [ ]:
print a

What is the data type of the variable 'a' above?


In [ ]:
type(a) # returns type of object 'a'

In [ ]:
a.dtype # returns type of array 'a' elements

How to create a float / double type array


In [ ]:
# To force a certain numerical type for the array, set the "dtype" keyword to a type code
a = numpy.array([[2, 3, -5],[21, -2, 1]], dtype='d')
a.dtype

Let's print it again...


In [ ]:
print a

Some common typecodes:

  • 'd' : Double precision floating point
  • 'f' : Single precision floating point
  • 'i' : Integer (C int, normally 32-bit)
  • 'l' : Long integer (C long)
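
To see what each typecode resolves to on your machine, a short sketch like the following may help. The size of 'l' depends on the platform, so only the two floating-point sizes are kept in variables here; the variable names are illustrative.

```python
import numpy

# Print the dtype name and element size that each typecode maps to.
for code in ['d', 'f', 'i', 'l']:
    dt = numpy.dtype(code)
    print("%s -> %s (%d bytes)" % (code, dt.name, dt.itemsize))

double_size = numpy.dtype('d').itemsize
single_size = numpy.dtype('f').itemsize
```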

To create an array of a given shape filled with zeros, use the zeros function (with dtype being optional)


In [ ]:
a = numpy.zeros((3,2), dtype='d')
print a

To create an array equivalent to what range produces, use the arange function (again, dtype is optional)


In [ ]:
a = numpy.arange(10)
print a

Note : the numpy function is called arange (it returns an array); the built-in range function is the one to use for normal lists
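
Like range, arange also accepts start, stop and step arguments, and unlike range the step may be fractional. A brief sketch with illustrative variable names:

```python
import numpy

evens = numpy.arange(0, 10, 2)          # start, stop (exclusive), step
halves = numpy.arange(0, 2, 0.5)        # non-integer steps are allowed
as_float = numpy.arange(5, dtype='d')   # force double precision elements

print(evens)
print(halves)
print(as_float)
```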

Array indexing

Like lists, element addresses start with zero, so the first element of 1‐D array a is a[0], the second is a[1], etc.


In [ ]:
a[0]

Like lists, you can reference elements starting from the end, e.g., element a[-1] is the last element in a 1‐D array.


In [ ]:
a[-1]

Slicing an array

  • Element addresses in a range are separated by a colon.
  • The lower limit is inclusive, and the upper limit is exclusive.

In [ ]:
a = numpy.array([2, 3.2, 5.5, -6.4, -2.2, 2.4])
a[1:4] # what will this return?

  • For multi‐dimensional arrays, indexing between different dimensions is separated by commas.
  • The fastest varying dimension is the last index. Thus, a 2‐D array is indexed [row, col].
  • To specify all elements in a dimension, use a colon by itself for the dimension.

In [ ]:
a = numpy.array([[2, 3.2, 5.5, -6.4, -2.2, 2.4],
                 [1, 22, 4, 0.1, 5.3, -9],
                 [3, 1, 2.1, 21, 1.1, -2]])

How can we access the number 4 from the above numpy array variable a?


In [ ]:
a[1][2] #normal index access using multi square brackets []

Alternatively ...


In [ ]:
a[1,2]

How can we access a full row / column?


In [ ]:
# get the 2nd row of the numpy array a
a[1,:] # what about a[1] ?

Tell me answers for the following two statements...


In [ ]:
a[1:3,0]

In [ ]:
a[1:3,0:2]

In [ ]:
# Will it throw an error? why?
a[1:4,0:2]
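
One way to check the last question is to compare a slice that runs past the end with a single integer index that does the same. The sketch below rebuilds the same array so it can be run on its own; `clipped` and `index_failed` are illustrative names.

```python
import numpy

a = numpy.array([[2, 3.2, 5.5, -6.4, -2.2, 2.4],
                 [1, 22, 4, 0.1, 5.3, -9],
                 [3, 1, 2.1, 21, 1.1, -2]])

# A slice that runs past the last row is clipped to the rows that exist.
clipped = a[1:4, 0:2]
print(clipped.shape)

# A single integer index past the end raises an IndexError instead.
try:
    a[3, 0]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)
```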

Array inquiry

Some information about an array comes from functions applied to it, and some from attributes attached to it.


In [ ]:
print a

In [ ]:
# Shape of the array:  
numpy.shape(a) # same as a.shape

In [ ]:
# Rank (number of dimensions) of the array:
numpy.ndim(a) # same as a.ndim; numpy.rank(a) is deprecated in newer NumPy versions

In [ ]:
# Number of elements in the array
numpy.size(a) # same as a.size

Why shouldn't we use the len(a) function?


In [ ]:
len(a)
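
A small sketch of the answer, using a made-up array of zeros with the same shape as above: len only reports the length of the first dimension.

```python
import numpy

a = numpy.zeros((3, 6))

first_axis_length = len(a)      # length of the first dimension only
total_elements = a.size         # product of all dimension lengths

print(first_axis_length)
print(total_elements)
```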

Array manipulation

Reshape


In [ ]:
print a
a.shape

In [ ]:
# Reshape the array:  
numpy.reshape(a, (9, 2))

In [ ]:
b = a.reshape((6, 3)) # reshape array a and store into variable b
print "b shape", b.shape
print "a shape", a.shape # still array a shape is not changed
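
One point worth checking with reshape: the shape of the original is unchanged, but the reshaped array usually shares its data with the original instead of copying it. A small sketch, using the made-up names `base`, `view` and `copied`:

```python
import numpy

base = numpy.arange(6)
view = base.reshape((2, 3))   # new shape, same underlying data

view[0, 0] = 99               # writing through the reshaped array...
print(base)                   # ...also changes the original

copied = base.reshape((2, 3)).copy()   # copy() gives independent data
copied[0, 0] = -1
print(base)                   # unaffected by the change to the copy
```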

Transpose


In [ ]:
numpy.transpose(a) # same as a.T

In [ ]:
b = a.T
b.shape

Convert a multi-dimensional array into a 1-dimensional array


In [ ]:
numpy.ravel(a)

Repeat every element in an array


In [ ]:
numpy.repeat(a,3)

Convert array a to another type


In [ ]:
b = a.astype('i')  # here the argument is the typecode for b.
b
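
When converting floats to integers, astype drops the fractional part; it does not round. A short sketch with illustrative values, where numpy.around is used first if rounding is wanted:

```python
import numpy

floats = numpy.array([2.7, -6.4, 5.6, -0.9])

truncated = floats.astype('i')               # fractional part is dropped
rounded = numpy.around(floats).astype('i')   # round first, then convert

print(truncated)
print(rounded)
```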

Array operations : Multiplication - Method 1 (loops)

Multiply two arrays together, element by element


In [ ]:
a = numpy.array([[2, 3.2, 5.5, -6.4],
                  [3, 1, 2.1, 21]])

b = numpy.array([[4, 1.2, -4, 9.1],
                [6, 21, 1.5, -27]])

shape_a = numpy.shape(a)
product = numpy.zeros(shape_a, dtype='f')

for i in xrange(shape_a[0]):
    for j in xrange(shape_a[1]):
        product[i,j] = a[i,j] * b[i,j]
    # end of for j in xrange(shape_a[1]):
# end of for i in xrange(shape_a[0]):

# print the result
print product

  • Note the use of xrange: in Python 2, range builds the whole list in memory, whereas xrange generates the values one at a time.
  • Loops are relatively slow.
  • What if the two arrays do not have the same shape?
  • How about matrix multiplication?
  • Is there an alternative, more robust way to do array operations?

Array operations : Multiplication - Method 2 (array syntax)

Multiply two arrays together, element by element


In [ ]:
import numpy
a = numpy.array([[2, 3.2, 5.5, -6.4],
                 [3, 1, 2.1, 21]])

b = numpy.array([[4, 1.2, -4, 9.1],
                 [6, 21, 1.5, -27]])

# We didn't create the result 'product' matrix yet !
product = a * b  # just normal multiplication
product

How about element-wise addition, subtraction, division?


In [ ]:
addition = a + b            # element-wise addition  
subtraction = a - b         # element-wise subtraction 
division = a / b            # element-wise division

In [ ]:
division

  • Arithmetic operators are automatically defined to act element‐wise when operands are NumPy arrays.
  • The output array is created automatically.
  • Operand shapes are automatically checked for compatibility.
  • You do not need to know the rank of the arrays ahead of time.
  • Faster than loops.
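
The shape check can be seen in a small sketch like the one below. The arrays are made up for illustration: a one-row array is stretched across the rows of a (2, 4) array, while a (3, 4) array is rejected.

```python
import numpy

a = numpy.ones((2, 4))
row = numpy.array([1, 2, 3, 4])   # shape (4,): applied to each row of a
bad = numpy.ones((3, 4))          # shape (3, 4): not compatible with (2, 4)

compatible_sum = a + row
print(compatible_sum)

# Incompatible shapes raise a ValueError rather than giving a wrong answer.
try:
    a + bad
    shapes_ok = True
except ValueError:
    shapes_ok = False
print(shapes_ok)
```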

How about matrix multiplication (the dot product)?


In [ ]:
numpy.dot(a, b) # raises ValueError: both arrays are (2, 4), so the inner dimensions do not match

In [ ]:
numpy.dot(a, b.T) # equivalent to a.dot(b.T)
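
As a sketch of what the dot product computes here: a (2, 4) array times a (4, 2) array gives a (2, 2) result, and each entry is the sum of products of a row of a with a row of b. The names `result` and `check` are illustrative.

```python
import numpy

a = numpy.array([[2, 3.2, 5.5, -6.4],
                 [3, 1, 2.1, 21]])
b = numpy.array([[4, 1.2, -4, 9.1],
                 [6, 21, 1.5, -27]])

# (2, 4) dot (4, 2) -> (2, 2); the inner dimensions must match.
result = numpy.dot(a, b.T)
print(result.shape)

# Entry [0, 0] is the sum of products of row 0 of a with row 0 of b.
check = numpy.sum(a[0] * b[0])
print(check)
```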

Array Operations : Condition - Method 1 (loops)

  • Often, you will want to do calculations on an array that involve conditionals.
  • You could implement this in a loop.

Say you have a 2‐D array a and you want to return an array answer that holds the square of the value where the element in a is greater than 5 and less than 10, and zero everywhere else.


In [ ]:
a = numpy.array([[2, 3.2, 5.5, -6.4, 5],
                 [30, 7, 2.1, 4.9, 6]])

answer = numpy.zeros(numpy.shape(a), dtype='f')

for i in xrange(numpy.shape(a)[0]):
    for j in xrange(numpy.shape(a)[1]):
        if (a[i,j] > 5) and (a[i,j] < 10):
            answer[i,j] = a[i,j] * a[i,j]
        else:
            pass
        # end of if (a[i,j] > 5) and (a[i,j] < 10):
    # end of for j in xrange(numpy.shape(a)[1]):
# end of for i in xrange(numpy.shape(a)[0]):

answer

  • The pass statement is used when you have a branch where you don't want to do anything.
  • Again, loops are slow, and the if statement makes it even slower.
  • NumPy has functions that offer a better way to do it!

Array Operations : Condition - Method 2 (array syntax)

Comparison operators (implemented either as operators or functions) act element‐wise, and return a boolean array.


In [ ]:
a = numpy.arange(10) 
print a
a > 5  # What answer do you expect?

In [ ]:
numpy.greater(a, 5) # same as 'a > 5'

Boolean operators are implemented as functions that also act element‐wise (e.g., logical_and, logical_or).

The where function tests any condition and applies operations for true and false cases, as specified, on an element‐wise basis.


In [ ]:
# share your view on this
a = numpy.arange(11)
condition = numpy.logical_and(a>5, a<10)
answer = numpy.where(condition, a*2, 0)

What is condition? answer?


In [ ]:
condition

In [ ]:
answer

  • This code follows the same pattern as the loop example on the previous slide (here doubling the selected elements of a 1‐D array rather than squaring them), and is both cleaner and faster.
  • You can also accomplish what the where function does by taking advantage of how arithmetic operations on boolean arrays treat True as 1 and False as 0.
  • By using multiplication and addition, the boolean values become selectors.

In [ ]:
condition = numpy.logical_and(a>5, a<10)
answer = ((a*2)*condition) + (0*numpy.logical_not(condition))

  • This method is also faster than loops.
  • Try comparing the relative speeds of these different ways of applying tests to an array.

In [ ]:
# Use the time module to measure this; try it yourself
import time
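
One possible way to do the comparison is sketched below. It assumes a 1‐D test array of 100000 elements (an arbitrary choice) and uses time.time() for rough wall-clock timing; the actual numbers will vary from machine to machine.

```python
import time
import numpy

a = numpy.arange(100000)

# Method 1: explicit Python loop
start = time.time()
loop_answer = numpy.zeros(a.shape, dtype=a.dtype)
for i in range(a.shape[0]):
    if (a[i] > 5) and (a[i] < 10):
        loop_answer[i] = a[i] * 2
loop_seconds = time.time() - start

# Method 2: numpy.where
start = time.time()
condition = numpy.logical_and(a > 5, a < 10)
where_answer = numpy.where(condition, a * 2, 0)
where_seconds = time.time() - start

# Method 3: boolean arithmetic (True acts as 1, False as 0)
start = time.time()
mask_answer = (a * 2) * numpy.logical_and(a > 5, a < 10)
mask_seconds = time.time() - start

print("loop  : %.6f s" % loop_seconds)
print("where : %.6f s" % where_seconds)
print("mask  : %.6f s" % mask_seconds)
```

All three methods should produce the same array, so it is worth confirming that with numpy.array_equal before trusting the timings.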

Array Operations : Additional functions

  • Basic mathematical functions: sin, exp, interp, etc.
  • Basic statistical and signal-processing functions: correlate, histogram, hamming, fft, etc.
  • NumPy has a lot of stuff!
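
A brief sketch of three of the functions named above, using small made-up inputs:

```python
import numpy

# sin acts element-wise on an array of angles in radians
angles = numpy.array([0.0, numpy.pi / 2, numpy.pi])
sines = numpy.sin(angles)

# Linear interpolation at x = 0.5 between the points (0, 0) and (1, 10)
midpoint = numpy.interp(0.5, [0.0, 1.0], [0.0, 10.0])

# Count how many values fall into each of two equal-width bins
counts, edges = numpy.histogram([1, 2, 2, 3, 9], bins=2)

print(sines)
print(midpoint)
print(counts)
print(edges)
```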

NumPy Documentation

Use help(numpy), as well as help(numpy.x), where x is the name of a function, to get more information.


In [ ]:
help(numpy)

In [ ]:
help(numpy.histogram)

In [ ]:
print numpy.fft.__doc__

Exercise 1 : Reading a multi‐column text file

  • For the file two‐col_rad_sine.txt in files, write code to read the two columns of data into two arrays, one for angle in radians (column 1) and the other for the sine of the angle (column 2).

  • The two columns are separated by tabs. The file's newline character is just '\n' (though this isn't something you'll need to know to do the exercise).

Exercise 1 : Reading a multi‐column text file - Simple Solution


In [ ]:
import numpy
# data path
DATAPATH = './sample_cdat_data/'
# file path relative to the current directory
fileobj = open(DATAPATH + 'two-col_rad_sine.txt', 'r')
data_str = fileobj.readlines()
fileobj.close()

# create the arrays with the needed length, filled with zeros
radians = numpy.zeros(len(data_str), 'f')
sines = numpy.zeros(len(data_str), 'f')

# loop through no of lines
for i in xrange(len(data_str)):
    # split the line string w.r.t tab character       
    split_istr = data_str[i].split('\t')    # split line and returns as list
    radians[i] = float(split_istr[0])       # convert string no into float data type
    sines[i] = float(split_istr[1])

Exercise 1 : Reading a multi‐column text file - NumPy Powered Solution


In [ ]:
import numpy, os
# data path
DATAPATH = './sample_cdat_data/'
# build the file path
filepath = os.path.join(DATAPATH, 'two-col_rad_sine.txt')
# read two columns from datafile and load it into two variables
radians, sines = numpy.loadtxt(filepath, delimiter='\t', unpack=True) # to unpack all columns

  • How simple it is!
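
If the data file is not at hand, the same loadtxt call can be tried on an in-memory stand-in. The three sample rows below are made up for illustration (angle, sine of angle) and are not taken from the actual file.

```python
import io
import numpy

# Stand-in for the data file: two tab-separated columns (angle, sine).
sample = u"0.0\t0.0\n0.5\t0.479425538604\n1.0\t0.841470984808\n"

# unpack=True returns one array per column instead of one 2-D array.
radians, sines = numpy.loadtxt(io.StringIO(sample), delimiter='\t', unpack=True)

print(radians)
print(sines)
```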

Exercise 2 (Homework): Reading formatted data

  • In the directory './sample_cdat_data/' you will see the following files
    • REGIONRF.TXT (a data file with rainfall data for India)
    • test_FortranFormat.py
  • Look at the python program and try to understand what it does and do the exercise given in it.

Acknowledgments

  • Dean Williams, Charles Doutriaux (PCMDI, LLNL).
  • Dr. Johnny Lin (Physics Department, North Park University, Chicago, Illinois).
  • Dr. Krishna AchutaRao (Centre for Atmospheric Sciences, Indian Institute of Technology Delhi, India).

License

  • IPython Notebook Created by
      Arulalan.T 
      Date : 14.02.2014
      Project Associate,
      Centre for Atmospheric Sciences,
      Indian Institute of Technology Delhi, India.
      Blog : http://tuxcoder.wordpress.com 
      Repo : https://github.com/arulalant/UV-CDAT-IPython-Notebooks
      
  • This work (IPython Notebook & Html Slide) is licensed under a Creative Commons Attribution‐NonCommercial‐ShareAlike 3.0
  • Includes all python scripts in this notebook