Array Computing

Terminology

List

  • A sequence of values that can vary in length.
  • The values can be different data types.
  • The values can be modified (mutable).

Tuple

  • A sequence of values with a fixed length.
  • The values can be different data types.
  • The values cannot be modified (immutable).

Array

  • A sequence of values with a fixed length.
  • The values must all be the same data type.
  • The values can be modified (mutable).

Vector: A one-dimensional (1D) array.

Matrix: A two-dimensional (2D) array.

Arrays are like lists but less flexible and more efficient for lengthy calculations (all elements have one data type and are stored together in one block of memory).

But first:

VECTORS -- very simple arrays

Vectors can have an arbitrary number of components, existing in an n-dimensional space.

(x1, x2, x3, ... xn)

Or

(x0, x1, x2, ... x(n-1)) for Python...

In Python, vectors are represented by lists or tuples:

Lists:


In [1]:
x = 2
y = 3

myList = [x, y]
myList


Out[1]:
[2, 3]

Tuples:


In [2]:
myTuple = (-4, 7)
myTuple


Out[2]:
(-4, 7)

Mathematical Operations on Vectors

Review of vector operations: textbook sections 5.1.2 & 5.1.3

In computing:

Applying a mathematical function to a vector 
means applying it to each element in the vector.
(you may hear me use the phrase "element-wise,"
which means "performing some operation one element
at a time")

However, this is not true of lists and tuples.

Q. What do these yield?


In [3]:
numList  = [0.0, 1.0, 2.0]
numTuple = (0.0, 1.0, 2.0)

In [4]:
2 * numList


Out[4]:
[0.0, 1.0, 2.0, 0.0, 1.0, 2.0]

In [5]:
2 * numTuple


Out[5]:
(0.0, 1.0, 2.0, 0.0, 1.0, 2.0)

In [6]:
2.0 * numList


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-a9f9a4d47557> in <module>()
----> 1 2.0 * numList

TypeError: can't multiply sequence by non-int of type 'float'
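
With plain lists, one way to get the element-wise result is a list comprehension. The following is a minimal sketch; the names repeatedList and scaledList are made up for illustration.

```python
numList = [0.0, 1.0, 2.0]

# Multiplying a list by an int repeats the list; it does not scale the elements.
repeatedList = 2 * numList

# A list comprehension applies the multiplication to each element in turn.
scaledList = [2.0 * value for value in numList]

print("Repeated:", repeatedList)
print("Scaled:  ", scaledList)
```

This works, but it is an explicit loop written by hand for every operation.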

Vectors in Python programming

Our current solution:

  • Use lists for collecting function data.
  • Convert the lists to NumPy arrays for doing math with them.

As an example, a falling object in Earth's gravity:


In [7]:
def distance(t, a = 9.8):
    '''Calculate the distance given a time and acceleration.
       
       Input:  time in seconds <int> or <float>,
               acceleration in m/s^2 <int> or <float>
       Output: distance in m <float>
    '''
    return 0.5 * a * t**2

numPoints = 6                       # number of points
delta     = 1.0 / (numPoints - 1)   # time interval between points

# Q. What do the two lines below do?  

timeList = [index * delta for index in range(numPoints)]
distList = [distance(t) for t in timeList]

In [8]:
print("Time List:    ", timeList)
print("Distance List:", distList)


Time List:     [0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0]
Distance List: [0.0, 0.19600000000000006, 0.7840000000000003, 1.7640000000000007, 3.136000000000001, 4.9]

Repeat on your own: stitching results together:


In [9]:
timeDistList = []
for index in range(numPoints):
    timeDistList.append([timeList[index], distList[index]])

for element in timeDistList:
    print(element)


[0.0, 0.0]
[0.2, 0.19600000000000006]
[0.4, 0.7840000000000003]
[0.6000000000000001, 1.7640000000000007]
[0.8, 3.136000000000001]
[1.0, 4.9]

Or, using zip (which we have used before):


In [12]:
timeDistList2 = [[time, dist] for time, dist in zip(timeList, distList)]

for element in timeDistList2:
    print(element)


[0.0, 0.0]
[0.2, 0.19600000000000006]
[0.4, 0.7840000000000003]
[0.6000000000000001, 1.7640000000000007]
[0.8, 3.136000000000001]
[1.0, 4.9]

What zip does:

In [16]:
daveList = range(5)
for element in zip(timeList, distList):
    print(element)
list(zip(timeList, distList, daveList))


(0.0, 0.0)
(0.2, 0.19600000000000006)
(0.4, 0.7840000000000003)
(0.6000000000000001, 1.7640000000000007)
(0.8, 3.136000000000001)
(1.0, 4.9)
Out[16]:
[(0.0, 0.0, 0),
 (0.2, 0.19600000000000006, 1),
 (0.4, 0.7840000000000003, 2),
 (0.6000000000000001, 1.7640000000000007, 3),
 (0.8, 3.136000000000001, 4)]
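
Note that the last result has only five tuples even though timeList and distList have six elements each: zip stops when its shortest input (here daveList) runs out. A small sketch of that behavior, using a hypothetical shortList, along with itertools.zip_longest from the standard library for the case where padding is preferred:

```python
from itertools import zip_longest

timeList = [0.0, 0.2, 0.4]
shortList = range(2)          # hypothetical sequence with fewer elements

# zip stops as soon as the shortest input is exhausted.
pairs = list(zip(timeList, shortList))
print(pairs)

# zip_longest keeps going and pads the shorter input (with None by default).
padded = list(zip_longest(timeList, shortList))
print(padded)
```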

When to use lists and arrays?

In general, we'll use lists instead of arrays when elements have to be added (e.g., we don't know the number of elements ahead of time and must use methods like append and extend) or when their types are heterogeneous.

Otherwise we'll use arrays for numerical calculations.
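
A rough sketch of that division of labor, using a made-up bouncing-ball example (the starting height, the factor of one half, and the 1 cm cutoff are all assumptions for illustration): the list grows while we don't know how many values there will be, and the array takes over once the math starts.

```python
import numpy as np

height = 1.0           # starting height in m (assumed value)
heightList = []

# We don't know ahead of time how many bounces there will be, so append to a list.
while height > 0.01:
    heightList.append(height)
    height *= 0.5      # assume each bounce reaches half the previous height

# Once collection is done, convert to an array for the numerical work.
heightArray = np.array(heightList)
heightInCm = heightArray * 100.0

print(len(heightList), "bounces:", heightInCm)
```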

Basics of numpy arrays

Characteristics of numpy arrays:

  1. Elements are all the same type

  2. Number of elements known when array is created

  3. Numerical Python (numpy) must be imported to manipulate arrays.

  4. numpy operates on all the elements of an array at once, which replaces explicit Python loops and makes programs much faster.

  5. Arrays with one index are sometimes called vectors (or 1D arrays). Arrays with two indices are sometimes called matrices (or 2D arrays).
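
A few of these characteristics can be checked directly. This is only a sketch; the array names are made up for illustration.

```python
import numpy as np

# Characteristic 1: mixing ints and a float still gives a single element type.
mixedArray = np.array([1, 2.5, 3])
print(mixedArray.dtype)

# Characteristic 5: one index for a vector, two indices for a matrix.
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector.ndim, vector.shape)
print(matrix.ndim, matrix.shape)
print(matrix[1, 2])    # row index 1, column index 2
```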

Some numpy functionality and standard usage:

In [17]:
import numpy as np

To convert a list to an array, use the np.array function:


In [9]:
myList  = [1, 2, 3]
myArray = np.array(myList)

print(type(myArray))
myArray


<class 'numpy.ndarray'>
Out[9]:
array([1, 2, 3])

Note the type!

To create an array of length n filled with zeros (to be filled later):


In [10]:
np.zeros?

In [20]:
myArray = np.zeros(10)
myArray


Out[20]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

To create arrays with elements of a type other than the default float, use a second argument:


In [11]:
myArray = np.zeros(5, dtype=int)
myArray


Out[11]:
array([0, 0, 0, 0, 0])
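
The element type matters when the array is filled in later. As a small sketch (intArray and floatArray are illustrative names), assigning a float into an integer array drops the fractional part rather than changing the array's type:

```python
import numpy as np

intArray = np.zeros(3, dtype=int)
floatArray = np.zeros(3)

# The same value is stored differently depending on the array's element type.
intArray[0] = 3.7      # the fractional part is dropped to fit the int type
floatArray[0] = 3.7    # stored as given

print(intArray)
print(floatArray)
```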

We often want array elements equally spaced by some interval (delta).

numpy.linspace(start, end, number of elements) does this:

NOTE #### UNLIKE range, THE "end" VALUE HERE IS INCLUDED (THE LAST ELEMENT IS end, NOT end - 1) #### NOTE


In [13]:
zArray = np.linspace(0, 5, 6)
zArray


Out[13]:
array([ 0.,  1.,  2.,  3.,  4.,  5.])

Q. What will that do?
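
To make the endpoint behavior concrete, here is a short sketch comparing np.linspace with range. Because both end points are included, the spacing is (end - start) / (number of elements - 1); the optional retstep argument asks linspace to report that spacing.

```python
import numpy as np

start, end, numPoints = 0.0, 1.0, 6

# linspace includes both start and end.
timeArray = np.linspace(start, end, numPoints)

# With both end points included there are (numPoints - 1) intervals.
delta = (end - start) / (numPoints - 1)

# retstep=True also returns the spacing that linspace used.
sameArray, step = np.linspace(start, end, numPoints, retstep=True)

# range, by contrast, stops before its end value.
indices = list(range(0, 5))

print(timeArray[-1], delta, step)
print(indices[-1])
```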

Array elements are accessed with square brackets, the same as lists:


In [14]:
zArray[3]


Out[14]:
3.0

Slicing can also be done on arrays:

Q. What does this give us?


In [15]:
yArray = zArray[1:4]
yArray


Out[15]:
array([ 1.,  2.,  3.])

For reference below:


In [16]:
zArray


Out[16]:
array([ 0.,  1.,  2.,  3.,  4.,  5.])

Let's edit one of the values in the z array


In [17]:
zArray[3] = 10.0
zArray


Out[17]:
array([  0.,   1.,   2.,  10.,   4.,   5.])

Now let's look at the y array again


In [18]:
yArray


Out[18]:
array([  1.,   2.,  10.])

The variable yArray is a reference (or view in Numpy lingo) to three elements (a slice) from zArray: element indices 1, 2, and 3.

Here is a blog post which discusses this issue nicely:

http://nedbatchelder.com/text/names.html

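The reason for this is memory efficiency: why copy data if it is not necessary?

Lists behave differently. Slicing a list makes a copy, so later changes to the original list do not show up in the slice: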


In [21]:
lList = [6, 7, 8, 9, 10, 11]
mList = lList[1:3]

print(mList)
lList[1] = 10
mList


[7, 8]
Out[21]:
[7, 8]

Do not forget this -- check your array values frequently if you are unsure!
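
If an independent copy of an array slice is what you want, the array's copy method provides one, and np.shares_memory can tell you whether two arrays overlap in memory. A minimal sketch (viewArray and copyArray are illustrative names):

```python
import numpy as np

zArray = np.linspace(0, 5, 6)

viewArray = zArray[1:4]          # a view: shares data with zArray
copyArray = zArray[1:4].copy()   # an independent copy of the same elements

zArray[3] = 10.0                 # edit the original

print(viewArray)                 # follows the change
print(copyArray)                 # keeps the old value
print(np.shares_memory(zArray, viewArray), np.shares_memory(zArray, copyArray))
```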

Computing coordinates and function values

Here's the distance function we did previously:


In [30]:
def distance(t, a = 9.8):
    '''Calculate the distance given a time and acceleration.
       
       Input:  time in seconds <int> or <float>,
               acceleration in m/s^2 <int> or <float>
       Output: distance in m <float>
    '''
    return 0.5 * a * t**2

numPoints = 6                       # number of points
delta     = 1.0 / (numPoints - 1)   # time interval between points

timeList = [index * delta for index in range(numPoints)]   # Create the time list
distList = [distance(t) for t in timeList]                 # Create the distance list

We could convert timeList and distList from lists to arrays:


In [31]:
timeArray = np.array(timeList)
distArray = np.array(distList)

print(type(timeArray), timeArray)
print(type(distArray), distArray)


<class 'numpy.ndarray'> [ 0.   0.2  0.4  0.6  0.8  1. ]
<class 'numpy.ndarray'> [ 0.     0.196  0.784  1.764  3.136  4.9  ]

We can do this directly by creating arrays (without converting from a list) with np.linspace to create timeArray and np.zeros to create distArray.

(This is merely a demonstration, not superior to the above code for this simple example.)


In [22]:
def distance(t, a = 9.8):
    '''Calculate the distance given a time and acceleration.
       
       Input:  time in seconds <int> or <float>,
               acceleration in m/s^2 <int> or <float>
       Output: distance in m <float>
    '''
    return 0.5 * a * t**2

numPoints = 6                       # number of points

timeArray = np.linspace(0, 1, numPoints)   # Create the time array
distArray = np.zeros(numPoints)            # Create the distance array populated with 0's

print("Time Array:          ", type(timeArray), timeArray)
print("Dist Array Zeros:    ", type(distArray), distArray)

for index in range(numPoints):
    distArray[index] = distance(timeArray[index])   # Populate the distance array with calculated values

print("Dist Array Populated:", type(distArray), distArray)


Time Array:           <class 'numpy.ndarray'> [ 0.   0.2  0.4  0.6  0.8  1. ]
Dist Array Zeros:     <class 'numpy.ndarray'> [ 0.  0.  0.  0.  0.  0.]
Dist Array Populated: <class 'numpy.ndarray'> [ 0.     0.196  0.784  1.764  3.136  4.9  ]

Vectorization -- one of the great powers of arrays

The examples above work, but they don't use the computational power of arrays, which comes from operating on all the elements at once!

Explicit Python loops are slow. Operating on the whole array at once is much faster (and simpler!).

"Vectorization" is replacing a loop with vector or array expressions.


In [23]:
def distance(t, a = 9.8):
    '''Calculate the distance given a time and acceleration.
       
       Input:  time(s) in seconds <int> or <float> or <np.array>,
               acceleration in m/s^2 <int> or <float>
       Output: distance in m <float>
    '''
    return 0.5 * a * t**2

numPoints = 6                              # number of points

timeArray = np.linspace(0, 1, numPoints)   # Create the time array
distArray = distance(timeArray)            # Create and populate the distance array using vectorization

print("Time Array:", type(timeArray), timeArray)
print("Dist Array:", type(distArray), distArray)


Time Array: <class 'numpy.ndarray'> [ 0.   0.2  0.4  0.6  0.8  1. ]
Dist Array: <class 'numpy.ndarray'> [ 0.     0.196  0.784  1.764  3.136  4.9  ]
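
To see the difference for yourself, here is a rough timing sketch using the standard timeit module. The helper names withLoop and vectorized and the array size are assumptions for illustration, and the actual timings will depend on your machine, so none are quoted here.

```python
import timeit
import numpy as np

def distance(t, a=9.8):
    '''Distance in m fallen after time t in s with acceleration a in m/s^2.'''
    return 0.5 * a * t**2

timeArray = np.linspace(0, 1, 100000)

def withLoop():
    # Fill the result one element at a time.
    result = np.zeros(len(timeArray))
    for index in range(len(timeArray)):
        result[index] = distance(timeArray[index])
    return result

def vectorized():
    # Let numpy apply the arithmetic to the whole array.
    return distance(timeArray)

sameResult = np.allclose(withLoop(), vectorized())
print("Same result:", sameResult)
print("Loop:       ", timeit.timeit(withLoop, number=3), "s")
print("Vectorized: ", timeit.timeit(vectorized, number=3), "s")
```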

What just happened?

Let's look at what the function "distance" is doing to the values in timeArray


In [24]:
numPoints = 6   # Number of points
a = 9.8         # Acceleration in m/s^2

timeArray = np.linspace(0, 1, numPoints)  # The values are created like before
print("Original ", timeArray)

timeArray = timeArray**2                  # Once in the function, they are first squared
print("Squared  ", timeArray)
print(distArray)

timeArray = timeArray * 0.5               # Next they are multiplied by 0.5
print("Times 0.5", timeArray)

timeArray = timeArray * a                 # Finally, they are multiplied by a and the entire modified
print("Times a  ", timeArray)                # array is returned


Original  [ 0.   0.2  0.4  0.6  0.8  1. ]
Squared   [ 0.    0.04  0.16  0.36  0.64  1.  ]
[ 0.     0.196  0.784  1.764  3.136  4.9  ]
Times 0.5 [ 0.    0.02  0.08  0.18  0.32  0.5 ]
Times a   [ 0.     0.196  0.784  1.764  3.136  4.9  ]

Caution: numpy has its own math functions and constants, such as sin, cos, exp, and pi, and some of these behave slightly differently from their counterparts in Python's math module.

Also, the functions in the math module do not operate element-wise on numpy arrays, i.e., they are NOT vectorized.

Conclusion: Use numpy's built-in math whenever dealing with arrays, but be aware that if you repeatedly (in a loop) calculate only one value at a time, the math module will be faster (because numpy has some overhead costs for doing element-wise math automatically).

So, do this for single calculations:


In [25]:
import math
math.sin(0.5)


Out[25]:
0.479425538604203

but do this for arrays:


In [26]:
np.sin([0.1, 0.2, 0.3, 0.4, 0.5])


Out[26]:
array([ 0.09983342,  0.19866933,  0.29552021,  0.38941834,  0.47942554])
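
The sketch below shows what happens if the two are mixed up. The variable names are made up for illustration; the point is that math.sin refuses a multi-element array, while np.sin handles both arrays and single values.

```python
import math
import numpy as np

angles = np.array([0.1, 0.2, 0.3])

# math.sin expects a single number, so a multi-element array raises a TypeError.
try:
    math.sin(angles)
    mathAcceptsArray = True
except TypeError as error:
    mathAcceptsArray = False
    print("math.sin failed:", error)

# np.sin works element-wise on the array, and also on a single value.
print(np.sin(angles))
print(np.sin(0.5), math.sin(0.5))

# The constant pi is the same number in both modules.
print(np.pi == math.pi)
```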
