Week 3: Jupyter Notebooks and NumPy

(2017-07-18 23:10)

3.1. Jupyter Notebooks

3.1.1. Why Jupyter Notebooks?

Documented Data Science: Jupyter notebooks let us document the whole process by combining notes, code, and graphics.
Reproducible Science: others can view and understand every step, which also improves collaboration.
Presentation of Results: the notebook itself can be shared easily.
Jupyter supports Julia, Python, and R, along with many other languages.

3.1.2. Jupyter: Getting Started

3.1.3. Setting up your system

3.1.4. Live Code: Getting Started



In [1]:

    
print('Hello World!')









    



Hello World!



In [2]:

    
365 * 24 * 60 * 60 # how many seconds are in a year









    Out[2]:





31536000



In [3]:

    
_ / 1e6 # underscore refers to the output of the previous cell









    Out[3]:





31.536



In [4]:

    
x = 4 + 3
print(x)

3.1.5. Documenting Analysis with Markdown Text

3.1.6. Live Code: Documenting Analysis with Markdown Text

This is a markdown cell

This is heading 2.

This is heading 3

Hi!

One fish
Two fish
Red fish
Blue fish

bold text

italic text

http://jupyter.org

This is a LaTeX equation:

$\int_0^\infty x^{-\alpha}$

3.1.7. Jupyter: Additional Tips

3.1.8. Live Code: Additional Tips



In [5]:

    
%matplotlib inline
from matplotlib.pyplot import plot

plot([0, 1, 0, 1])









    Out[5]:





[<matplotlib.lines.Line2D at 0x7f766d480ef0>]

This is a plot using matplotlib of a vector.

We can save the notebook and close the browser tab, but the kernel running the notebook will continue to run. To stop the kernel as well, we can use 'File -> Close and Halt'. We can also download the notebook in a number of formats (HTML, Markdown, PDF, etc.).

Live: Using Unix in Jupyter

To run a Unix shell command from a notebook cell, we simply put an exclamation mark before the command.



In [6]:

    
# list the files in the current directory:

!ls









    



count_vs_words	      Week 2.ipynb		Week-3-Numpy.zip
shakespeare.txt       Week-2-UNIXDataFiles	word_cloud
temp.txt	      Week-2-UNIXDataFiles.zip	word_cloud.zip
Week-1-Intro-new.zip  Week 3.ipynb
Week 1.ipynb	      Week-3-Numpy



In [7]:

    
filename = './shakespeare.txt'
!echo $filename
print(filename)









    



./shakespeare.txt
./shakespeare.txt



In [8]:

    
# head:

!head -n 3 $filename



In [9]:

    
# tail:

!tail -n 10 $filename



In [10]:

    
# wc:

!wc $filename









    



 124505  901447 5583442 ./shakespeare.txt



In [11]:

    
!wc -l $filename









    



124505 ./shakespeare.txt



In [12]:

    
# cat:

!cat $filename | wc -l



In [13]:

    
# grep:

!grep -i 'parchment' $filename



In [14]:

    
# output each match of the pattern on its own line (-o), then count the lines

!cat $filename | grep -o 'liberty' | wc -l



In [15]:

    
# sed:

# replace all instances of 'parchment' to 'manuscript'

!sed -e 's/parchment/manuscript/g' $filename > temp.txt

!grep -i 'manuscript' temp.txt



In [16]:

    
# sort:

!head -n 5 $filename



In [17]:

    
!head -n 5 $filename | sort



In [18]:

    
# columns separated by ' ' (-t), sort on column 2 onward (-k2), case insensitive (-f)

!head -n 5 $filename | sort -f -t' ' -k2



In [19]:

    
!sort $filename | wc -l



In [20]:

    
# uniq -u prints only the lines that occur exactly once in the sorted input

!sort $filename | uniq -u | wc -l



In [21]:

    
# Count the most frequent words in the text in Unix:

!sed -e 's/ /\'$'\n/g' < $filename | sort | uniq -c | sort -nr | head -15









    



 502289 $
  22678 the$
  19163 I$
  17868 and$
  15324 to$
  15216 of$

  12152 a$
  10614 my$
   9347 in$
   8709 you$
   7662 is$
   7332 that$
   7065 And$
   6737 not$
sort: write failed: 'standard output': Broken pipe
sort: write error
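
The word count above can also be approximated in pure Python. The following is only a minimal sketch: `sample_text` is a hypothetical string standing in for shakespeare.txt, and `str.split()` splits on any whitespace rather than on single spaces, so the counts would not match the shell output exactly.

```python
from collections import Counter

# Hypothetical sample text standing in for shakespeare.txt
sample_text = "the cat and the dog and the bird"

# Split on whitespace and count each word (case-sensitive, like the shell pipeline)
word_counts = Counter(sample_text.split())

# Most frequent words first, similar in spirit to `sort -nr | head`
top_words = word_counts.most_common(2)
print(top_words)
```

For the real file, the same idea would apply to the text read with `open(filename).read()`.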



In [22]:

    
!sed -e 's/ /\'$'\n/g' < $filename | sort | uniq -c | sort -nr | head -15 > count_vs_words









    



sort: write failed: 'standard output': Broken pipe
sort: write error



In [23]:

    
!cat count_vs_words









    



 502289 $
  22678 the$
  19163 I$
  17868 and$
  15324 to$
  15216 of$

  12152 a$
  10614 my$
   9347 in$
   8709 you$
   7662 is$
   7332 that$
   7065 And$
   6737 not$



In [24]:

    
# plot by importing words counts into Python

%matplotlib inline

import matplotlib.pyplot as plt
import csv

xTicks = []
y = []

with open('count_vs_words', 'r') as csvfile:
    plots = csv.reader(csvfile, delimiter = ' ')
    for row in plots:
        y.append(int(row[-2]))
        xTicks.append(str(row[-1]))
        
# remove the count of spaces (first line)
y = y[1:]
xTicks = xTicks[1:]

# plot:

x = range(len(y))
plt.figure(figsize = (10, 10))
plt.xticks(x, xTicks, rotation = 90)
plt.plot(x, y, '*')









    Out[24]:





[<matplotlib.lines.Line2D at 0x7f766d3e5c88>]

3.2. NumPy

3.2.1. Why NumPy

Key-features:
Multi-dimensional arrays, built-in array operations, broadcasting (simplified interactions), integration (C, C++).

Why NumPy:

Speed: ndarrays are fixed in size (unlike lists, which can grow and shrink), and all elements have the same type (lists can hold ints, floats, strings, etc.).
Functionality: sum/multiply matrices, subset matrices, integration with Pandas and other packages.

Many packages are built on top of NumPy!
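
As a small illustration of the "same type" point, the sketch below (with made-up variable names) shows that NumPy picks one dtype for the whole array, upcasting when necessary, whereas a list keeps each element's own type.

```python
import numpy as np

mixed_list = [1, 2.5, 'three']        # a list can hold different types
int_array = np.array([1, 2, 3])       # all integers -> an integer dtype
float_array = np.array([1, 2.5, 3])   # one float upcasts the whole array

print([type(item).__name__ for item in mixed_list])
print(int_array.dtype.kind)     # 'i' means integer (the exact width depends on the platform)
print(float_array.dtype)        # float64
print(float_array.itemsize)     # bytes used by each element
```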

3.2.2. NumPy: ndarray basics

Rank 1 ndarray:



In [25]:

    
import numpy as np



In [26]:

    
an_array = np.array([3, 33, 333]) # create a rank 1 array
print(type(an_array))             # the type of a ndarray









    



<class 'numpy.ndarray'>



In [27]:

    
# test the shape of the array:

an_array.shape









    Out[27]:





(3,)



In [28]:

    
# because it's rank 1, we need only one index to access each element:

print(an_array[0], an_array[1], an_array[2])



In [29]:

    
# arrays are mutable:

an_array[0] = 88
an_array









    Out[29]:





array([ 88,  33, 333])



In [31]:

    
# you can't assign a string to an array:

an_array[0] = 'Spam'









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-800cf8c5005a> in <module>()
      1 # you can't assign a string to an array:
      2 
----> 3 an_array[0] = 'Spam'

ValueError: invalid literal for int() with base 10: 'Spam'
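
A related point worth knowing: a value that can be converted to the array's dtype is converted silently. The sketch below (variable names are illustrative) assigns a float into an integer array; the fractional part is dropped rather than raising an error.

```python
import numpy as np

int_array = np.array([3, 33, 333])   # integer dtype is inferred
int_array[0] = 7.9                   # converted to the array's integer dtype
print(int_array)                     # the first element is stored as 7
print(int_array.dtype.kind)          # still an integer array ('i')
```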

Rank 2 ndarray



In [32]:

    
another = np.array([[11, 12, 13], [21, 22, 23]]) # rank 2 ndarray
print(another)
print('The shape is:', another.shape)
print('Indexing the elements: ', another[0, 0], another[0, 1], another[1, 0])









    



[[11 12 13]
 [21 22 23]]
The shape is: (2, 3)
Indexing the elements:  11 12 21

There are many ways to create an ndarray:



In [33]:

    
# a 2x2 ndarray filled with zeros

ex1 = np.zeros((2, 2))
ex1









    Out[33]:





array([[ 0.,  0.],
       [ 0.,  0.]])



In [34]:

    
# a 2x2 ndarray filled with 9.0

ex2 = np.full((2, 2), 9.0)
ex2









    Out[34]:





array([[ 9.,  9.],
       [ 9.,  9.]])



In [35]:

    
# a 2x2 ndarray filled with a diagonal of 1s and other 0s:

ex3 = np.eye(2, 2)
ex3









    Out[35]:





array([[ 1.,  0.],
       [ 0.,  1.]])



In [36]:

    
# a ndarray of 1s:

ex4 = np.ones((1, 2))
ex4









    Out[36]:





array([[ 1.,  1.]])



In [37]:

    
# the above array is rank 2:

print(ex4.shape)

#which means we need two indexes to access an element:
print()
print(ex4[0, 1])









    



(1, 2)

1.0



In [38]:

    
# ndarray of random float between 0 and 1:

ex5 = np.random.random((2, 2))
ex5









    Out[38]:





array([[ 0.5902834 ,  0.70214394],
       [ 0.48677465,  0.43616793]])

3.2.3. NumPy: ndarray indexing

Similar to slice indexing with lists and strings, we can use slicing to pull out sub-regions of ndarrays.



In [39]:

    
import numpy as np



In [40]:

    
# rank 2 ndarray of shape (3, 4):

an_array = np.array([[11, 12, 13, 14],
                    [21, 22, 23, 24],
                    [31, 32, 33, 34]])
an_array









    Out[40]:





array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34]])



In [41]:

    
# slice the first 2 rows and columns 1 and 2:
# the start index is inclusive, the stop index is exclusive

a_slice = an_array[:2, 1:3]
a_slice









    Out[41]:





array([[12, 13],
       [22, 23]])

The 'a_slice' has its own indices (different from those of 'an_array'), but it is a view of the same data. When you modify 'a_slice', you actually modify the underlying ndarray:



In [42]:

    
print('Before:', an_array[0, 1])
a_slice[0, 0] = 1000
print('After:', an_array[0, 1])









    



Before: 12
After: 1000

You can make a copy of the slice instead, so that the two ndarrays hold independent data:



In [43]:

    
an_array2 = np.array([[11, 12, 13, 14],
                    [21, 22, 23, 24],
                    [31, 32, 33, 34]])

a_slice2 = np.array(an_array2[:2, 1:3]) # np.array(...) copies the data instead of returning a view

print('Before:', an_array2[0, 1])
a_slice2[0, 0] = 1000
print('After:', an_array2[0, 1])









    



Before: 12
After: 12
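
The difference between a view and a copy can also be checked directly. This is a minimal sketch with illustrative names (`grid`, `view`, `copy`), using `np.shares_memory` to test whether two arrays overlap in memory.

```python
import numpy as np

grid = np.array([[11, 12, 13, 14],
                 [21, 22, 23, 24],
                 [31, 32, 33, 34]])

view = grid[:2, 1:3]          # basic slicing returns a view
copy = grid[:2, 1:3].copy()   # .copy() allocates independent data

print(np.shares_memory(grid, view))   # True
print(np.shares_memory(grid, copy))   # False

view[0, 0] = 1000   # writes through to grid[0, 1]
copy[0, 1] = -1     # leaves grid untouched
print(grid[0, 1], grid[0, 2])
```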

Use both integer indexing and slice indexing



In [44]:

    
an_array = np.array([[11, 12, 13, 14],
                    [21, 22, 23, 24],
                    [31, 32, 33, 34]])

row_rank1 = an_array[1, :]        # integer index for the row, slice (:) for all columns
print(row_rank1, row_rank1.shape) # notice only a single []









    



[21 22 23 24] (4,)



In [45]:

    
# slicing alone generates an array of the same rank as an_array:

row_rank2 = an_array[1:2, :]      # rank 2 view
print(row_rank2, row_rank2.shape) # notice the [[]]









    



[[21 22 23 24]] (1, 4)

Array indexing for changing elements:



In [46]:

    
# create a new array:

an_array = np.array([[11, 12, 13, 14],
                    [21, 22, 23, 24],
                    [31, 32, 33, 34],
                    [41, 42, 43, 44]])

print('Original array:')
an_array









    



Original array:






    Out[46]:





array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])



In [47]:

    
# create an array of indices:

col_indices = np.array([0, 1, 2, 0])
print('\nCol indices picked:', col_indices)

row_indices = np.arange(4)
print('\nRow indices picked:', row_indices)









    



Col indices picked: [0 1 2 0]

Row indices picked: [0 1 2 3]



In [48]:

    
# examine the pairings of row_indices and col_indices:

for row, col in zip(row_indices, col_indices):
    print(row, ', ', col)



In [49]:

    
# select one element from each row:

print('Values in the an_array at those indices:\n', an_array[row_indices, col_indices])









    



Values in the an_array at those indices:
 [11 22 33 41]



In [50]:

    
# change one element from each row using the indices selected:

an_array[row_indices, col_indices] += 1000

print('Changed array:\n', an_array)









    



Changed array:
 [[1011   12   13   14]
 [  21 1022   23   24]
 [  31   32 1033   34]
 [1041   42   43   44]]
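
Unlike slicing, indexing with integer arrays returns a copy when it is used to read values; only assignment through the index modifies the original. A small sketch with illustrative names:

```python
import numpy as np

table = np.array([[11, 12, 13],
                  [21, 22, 23],
                  [31, 32, 33]])
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 0])

picked = table[rows, cols]   # integer-array indexing returns a copy
picked[0] = -1               # does not touch table
print(table[0, 0])           # still 11

table[rows, cols] = 0        # assigning through the index does modify table
print(table)
```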

3.2.4. NumPy: ndarray boolean indexing



In [51]:

    
# create a 3x2 ndarray:

an_array = np.array([[11, 12], [21, 22], [31, 32]])
an_array









    Out[51]:





array([[11, 12],
       [21, 22],
       [31, 32]])



In [52]:

    
# create a filter which will be a boolean:

filter1 = (an_array > 15)
filter1









    Out[52]:





array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)

Notice the filter has the same shape as the original ndarray, but filled with True and False according to the result of the boolean logic.



In [53]:

    
# we can now select just those elements which meet that criterion
an_array[filter1]









    Out[53]:





array([21, 22, 31, 32])



In [54]:

    
# for short, we could use the condition directly without creating a new object:

an_array[an_array > 15]









    Out[54]:





array([21, 22, 31, 32])



In [55]:

    
# greater than 20 and less than 30:

an_array[(an_array > 20) & (an_array < 30)]









    Out[55]:





array([21, 22])



In [56]:

    
# only even values:

an_array[an_array % 2 == 0]









    Out[56]:





array([12, 22, 32])

We can actually change the values based on some filter:



In [57]:

    
an_array[an_array % 2 == 0] += 100
an_array









    Out[57]:





array([[ 11, 112],
       [ 21, 122],
       [ 31, 132]])
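
If the original array should stay unchanged, one alternative is `np.where`, which builds a new array from a condition. This is a sketch with illustrative names, not part of the lecture code:

```python
import numpy as np

values = np.array([[11, 12], [21, 22], [31, 32]])
is_even = values % 2 == 0

# np.where(condition, a, b) takes from a where the condition is True, else from b
bumped = np.where(is_even, values + 100, values)

print(bumped)
print(values)                      # the original array is unchanged
print(np.count_nonzero(is_even))   # number of True entries in the mask
```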

3.2.5. NumPy: ndarray Datatypes and Operations



In [58]:

    
import numpy as np



In [59]:

    
ex1 = np.array([11, 22]) # dtype is inferred as int64
ex1.dtype









    Out[59]:





dtype('int64')



In [60]:

    
ex2 = np.array([11.0, 12.0]) # dtype is inferred as float64
ex2.dtype









    Out[60]:





dtype('float64')



In [61]:

    
ex3 = np.array([11, 21], dtype = np.int64) # "force" the dtype as int64
ex3.dtype









    Out[61]:





dtype('int64')



In [62]:

    
ex5 = np.array([11, 21], dtype = np.float64) # "force" the dtype as float64
print(ex5.dtype)
print()
print(ex5)









    



float64

[ 11.  21.]
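
An existing array can also be converted to another dtype with `astype`, which returns a new array. A brief sketch (the arrays here are made up for illustration):

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.0])
as_ints = floats.astype(np.int64)             # truncates toward zero, returns a new array
small = np.array([1, 2, 3], dtype=np.int8)    # a narrower integer type

print(as_ints)
print(floats.dtype, as_ints.dtype)
print(small.itemsize, as_ints.itemsize)       # bytes per element
```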

Arithmetic Array Operations



In [63]:

    
x = np.array([[111, 112], [121, 122]], dtype = np.int64)
y = np.array([[211.1, 212.1], [221.1, 222.1]], dtype = np.float64)

print(x, '\n\n', y)









    



[[111 112]
 [121 122]] 

 [[ 211.1  212.1]
 [ 221.1  222.1]]



In [64]:

    
# add

print(x + y, '\n\n', np.add(x, y))









    



[[ 322.1  324.1]
 [ 342.1  344.1]] 

 [[ 322.1  324.1]
 [ 342.1  344.1]]



In [65]:

    
# subtract

print(x - y, '\n\n', np.subtract(x, y))









    



[[-100.1 -100.1]
 [-100.1 -100.1]] 

 [[-100.1 -100.1]
 [-100.1 -100.1]]



In [66]:

    
# multiply

print(x * y, '\n\n', np.multiply(x, y))









    



[[ 23432.1  23755.2]
 [ 26753.1  27096.2]] 

 [[ 23432.1  23755.2]
 [ 26753.1  27096.2]]



In [67]:

    
# divide

print(x / y, '\n\n', np.divide(x, y))









    



[[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]] 

 [[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]]



In [68]:

    
# square root

np.sqrt(x)









    Out[68]:





array([[ 10.53565375,  10.58300524],
       [ 11.        ,  11.04536102]])



In [69]:

    
# exponent (e ** x)

np.exp(x)









    Out[69]:





array([[  1.60948707e+48,   4.37503945e+48],
       [  3.54513118e+52,   9.63666567e+52]])
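
Note that all of the operations above work element by element; in particular, `*` is not matrix multiplication. The sketch below, using two small made-up matrices, contrasts `*` with the matrix product operator `@`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # multiplies matching entries
matrix_product = a @ b    # row-by-column matrix multiplication

print(elementwise)
print(matrix_product)
```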

3.2.6. NumPy: Statistical, Sorting and Set Operations



In [70]:

    
# set up a random 2x5 matrix

arr = 10 * np.random.randn(2, 5)
print(arr)









    



[[  3.8469235    3.7911226    3.99024488   3.07115218   1.85514902]
 [-23.34216714   4.43488019 -12.93344486   1.2985727    8.10960011]]



In [71]:

    
# mean for all elements

arr.mean()









    Out[71]:





-0.58779668199001045



In [72]:

    
# mean by row

arr.mean(axis = 1) # axis = 1 gives one mean per row (axis = 0 gives one mean per column)









    Out[72]:





array([ 3.31091844, -4.4865118 ])



In [73]:

    
# mean by column

arr.mean(axis = 0)









    Out[73]:





array([-9.74762182,  4.11300139, -4.47159999,  2.18486244,  4.98237456])



In [74]:

    
# sum all elements

arr.sum()









    Out[74]:





-5.8779668199001041



In [75]:

    
# median by column

np.median(arr, axis = 0)









    Out[75]:





array([-9.74762182,  4.11300139, -4.47159999,  2.18486244,  4.98237456])
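
Because the random values above differ on every run, a fixed example may make the `axis` argument easier to follow. A minimal sketch with a made-up 2x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = m.mean(axis=0)   # collapses the rows: one value per column
row_means = m.mean(axis=1)   # collapses the columns: one value per row

print(col_means, col_means.shape)
print(row_means, row_means.shape)
```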

Sorting



In [76]:

    
# create a 10-element array of random values

unsorted = np.random.randn(10)
unsorted









    Out[76]:





array([-0.5497472 ,  0.36310535, -0.68827437, -0.23370217, -2.46601364,
       -1.20890416,  0.06640181, -0.0667975 ,  2.10128554, -0.12253205])



In [77]:

    
# create a copy of the unsorted array and sort it:

sorted1 = np.array(unsorted)
sorted1.sort()

print(sorted1, '\n\n', unsorted)









    



[-2.46601364 -1.20890416 -0.68827437 -0.5497472  -0.23370217 -0.12253205
 -0.0667975   0.06640181  0.36310535  2.10128554] 

 [-0.5497472   0.36310535 -0.68827437 -0.23370217 -2.46601364 -1.20890416
  0.06640181 -0.0667975   2.10128554 -0.12253205]
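
Two related tools are `np.sort`, which returns a sorted copy without modifying its input, and `np.argsort`, which returns the indices that would sort the array. A sketch with made-up values:

```python
import numpy as np

scores = np.array([0.3, -1.2, 2.5, 0.0])

ascending = np.sort(scores)    # sorted copy; scores itself is unchanged
order = np.argsort(scores)     # indices that would sort scores
descending = ascending[::-1]   # reverse the sorted copy

print(ascending)
print(order)
print(descending)
```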

Finding Unique Elements



In [78]:

    
array = np.array([1, 2, 1, 4, 2, 1, 4, 2])
np.unique(array)









    Out[78]:





array([1, 2, 4])
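
`np.unique` can also report how often each value occurs, via its `return_counts` argument. A short sketch using the same values as above:

```python
import numpy as np

data = np.array([1, 2, 1, 4, 2, 1, 4, 2])

# return_counts=True also returns the number of occurrences of each unique value
unique_values, counts = np.unique(data, return_counts=True)

for value, count in zip(unique_values, counts):
    print(value, 'appears', count, 'times')
```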

Set Operations with np.array data type



In [79]:

    
s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])
print(s1, s2)









    



['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']



In [80]:

    
np.intersect1d(s1, s2)









    Out[80]:





array(['bulb', 'chair'], 
      dtype='<U5')



In [81]:

    
np.union1d(s1, s2)









    Out[81]:





array(['bulb', 'chair', 'desk', 'lamp'], 
      dtype='<U5')



In [82]:

    
# elements in s1 that are not in s2

np.setdiff1d(s1, s2)









    Out[82]:





array(['desk'], 
      dtype='<U5')



In [83]:

    
# for each element of s1, test whether it is also in s2
# (np.isin is the equivalent function in newer NumPy versions)

np.in1d(s1, s2)









    Out[83]:





array([False,  True,  True], dtype=bool)

3.2.7. NumPy: Broadcasting

"When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when:

they are equal or
one of them is 1"
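
The rule quoted above can be written out as a small function. This is only an illustrative sketch (`broadcast_shape` is a hypothetical helper, not a NumPy function); it walks both shapes from the trailing dimension and treats a missing dimension as 1.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the shape two arrays would broadcast to, or raise ValueError."""
    result = []
    # Compare dimensions from the trailing end; a missing dimension counts as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))     # rows are repeated
print(broadcast_shape((4, 1), (1, 3)))   # both arrays are stretched

# Cross-check against what NumPy itself does
print((np.zeros((4, 3)) + np.zeros((3,))).shape)

try:
    broadcast_shape((4, 3), (4,))
except ValueError as error:
    print('not compatible:', error)
```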



In [84]:

    
start = np.zeros((4, 3))
start









    Out[84]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])



In [85]:

    
# create a rank 1 ndarray with 3 values

add_rows = np.array([1, 0, 2])
add_rows









    Out[85]:





array([1, 0, 2])



In [86]:

    
# row broadcasting

start + add_rows









    Out[86]:





array([[ 1.,  0.,  2.],
       [ 1.,  0.,  2.],
       [ 1.,  0.,  2.],
       [ 1.,  0.,  2.]])



In [87]:

    
# column broadcasting

add_cols = np.array([[0, 1, 2, 3]]).T
print(add_cols, '\n\n', add_cols + start)









    



[[0]
 [1]
 [2]
 [3]] 

 [[ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]]



In [88]:

    
# broadcasting in both dimensions:

add_scalar = np.array([1])
print(start + add_scalar)









    



[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

3.2.8. NumPy: speed test ndarray vs list



In [89]:

    
from numpy import arange
from timeit import Timer

size = 1000000
timeits = 1000



In [90]:

    
#  create the ndarray with values 0, 1, 2, ..., size-1

nd_array = arange(size)
print(type(nd_array))









    



<class 'numpy.ndarray'>



In [91]:

    
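# NOTE: %time on a line by itself times an empty statement, so the timing
# printed below does not measure nd_array.sum(); to time the call itself,
# write `%time nd_array.sum()` on a single line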

%time

nd_array.sum()









    



CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 7.15 µs






    Out[91]:





499999500000



In [92]:

    
# create the list with values 0, 1, 2, ..., size-1

a_list = list(range(size))
print(type(a_list))









    



<class 'list'>



In [93]:

    
# %time on its own line times an empty statement, not sum(a_list)
%time

sum(a_list)









    



CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 7.39 µs






    Out[93]:





499999500000
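
One way to compare the two sums more reliably is to pass each operation to `timeit.Timer` and run it several times. The following is a sketch under stated assumptions: `repeats` is an arbitrary value chosen here, and the measured times depend entirely on the machine, so no expected figures are given.

```python
import numpy as np
from timeit import Timer

size = 1000000
repeats = 10   # assumed number of repetitions; adjust as needed

nd_array = np.arange(size, dtype=np.int64)   # values 0, 1, ..., size-1
a_list = list(range(size))

# Timer accepts a callable; timeit() returns the total seconds for `number` calls
numpy_seconds = Timer(nd_array.sum).timeit(number=repeats)
list_seconds = Timer(lambda: sum(a_list)).timeit(number=repeats)

print('ndarray: %.3f ms per sum' % (numpy_seconds / repeats * 1e3))
print('list:    %.3f ms per sum' % (list_seconds / repeats * 1e3))
```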

3.3. Satellite Image Application in NumPy

3.3.1. Satellite Image Example

WIFIRE is an integrated system for wildfire analysis that is designed to cope with changing urban dynamics and climate. The website http://www.landfire.gov/ provides data on the Earth's surface. Each image is made up of many pixels; a pixel is a small square of a single, uniform color. We can represent a pixel's color by three 8-bit numbers (from 0 to 255) giving the Red, Green, and Blue intensities. For example, Black is (0, 0, 0), White is (255, 255, 255), Red is (255, 0, 0), Green is (0, 255, 0), and Blue is (0, 0, 255).
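
To make the pixel representation concrete, here is a minimal sketch of a hypothetical 2x2 image stored as an ndarray of shape (rows, columns, 3), with one unsigned 8-bit value per color layer.

```python
import numpy as np

# A hypothetical 2x2 image: rows, columns, color layers (Red, Green, Blue)
tiny_image = np.zeros((2, 2, 3), dtype=np.uint8)   # starts out all black

tiny_image[0, 0] = (255, 0, 0)       # red
tiny_image[0, 1] = (0, 255, 0)       # green
tiny_image[1, 0] = (0, 0, 255)       # blue
tiny_image[1, 1] = (255, 255, 255)   # white

red_layer = tiny_image[:, :, 0]      # the Red intensity of every pixel
print(tiny_image.shape)
print(red_layer)
```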

3.3.2. (Optional) WIFIRE Overview

3.3.3. Live Code: Satellite Example Part A



In [94]:

    
%matplotlib inline
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt



In [95]:

    
# note: scipy.misc.imread was removed in SciPy 1.2; imageio.imread is the suggested replacement
photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
type(photo_data)









    Out[95]:





numpy.ndarray



In [96]:

    
plt.figure(figsize = (15, 15))
plt.imshow(photo_data)









    Out[96]:





<matplotlib.image.AxesImage at 0x7f765ef406d8>



In [97]:

    
photo_data.shape









    Out[97]:





(3725, 4797, 3)

The first two numbers are the height (number of rows) and the width (number of columns) in pixels, and the third number (i.e. 3) is for the three layers: Red, Green and Blue. In this image the Red layer indicates Altitude, the Blue layer indicates Aspect, and the Green layer indicates Slope. The higher the intensity of a color, the higher the altitude, aspect, or slope.



In [98]:

    
photo_data.size









    Out[98]:





53606475



In [99]:

    
photo_data.min(), photo_data.max()









    Out[99]:





(0, 255)



In [100]:

    
photo_data.mean()









    Out[100]:





75.829935450894695



In [101]:

    
# pixel at row index 150 and column index 250 (indices start at 0)

photo_data[150, 250]









    Out[101]:





array([ 17,  35, 255], dtype=uint8)



In [102]:

    
# only the green value of that pixel

photo_data[150, 250, 1]









    Out[102]:





35



In [103]:

    
# set a pixel to all zeros and show image (doesn't change much)

photo_data[150, 250] = 0
plt.figure(figsize = (10, 10))
plt.imshow(photo_data)









    Out[103]:





<matplotlib.image.AxesImage at 0x7f765ef27fd0>



In [104]:

    
# set the green layer for rows 200 to 800 to full intensity

photo_data[200:800, :, 1] = 255
plt.figure(figsize = (10, 10))
plt.imshow(photo_data)









    Out[104]:





<matplotlib.image.AxesImage at 0x7f765ee3aa58>



In [105]:

    
# set all layers for rows 200 to 800 to full intensity (white)

photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')

photo_data[200:800, :] = 255
plt.figure(figsize = (10, 10))
plt.imshow(photo_data)









    Out[105]:





<matplotlib.image.AxesImage at 0x7f765edbfc50>



In [106]:

    
# pick all pixels with low values

photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
print('Original shape:', photo_data.shape)
low_value_filter = photo_data < 200
print('Low value filter shape:', low_value_filter.shape)









    



Original shape: (3725, 4797, 3)
Low value filter shape: (3725, 4797, 3)



In [107]:

    
# filtering out low values

plt.figure(figsize = (10, 10))
plt.imshow(photo_data)
photo_data[low_value_filter] = 0
plt.figure(figsize = (10, 10))
plt.imshow(photo_data)









    Out[107]:





<matplotlib.image.AxesImage at 0x7f765ecc87b8>



In [108]:

    
# more row and column operations: pair row i with column i (the diagonal)

rows_range = np.arange(len(photo_data))
cols_range = rows_range
print(type(rows_range))









    



<class 'numpy.ndarray'>



In [109]:

    
photo_data[rows_range, cols_range] = 255

plt.figure(figsize = (15, 15))
plt.imshow(photo_data)









    Out[109]:





<matplotlib.image.AxesImage at 0x7f765ebfbb70>

3.3.4. Live Code: Satellite Example Part B



In [110]:

    
%matplotlib inline
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt



In [111]:

    
photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')

total_rows, total_cols, total_layers = photo_data.shape

# np.ogrid is a compact way to build open index grids in a single line:
# X is a column of row indices and Y is a row of column indices, so they broadcast together
X, Y = np.ogrid[: total_rows, : total_cols]
print('X shape is', X.shape, 'Y shape is', Y.shape)









    



X shape is (3725, 1) Y shape is (1, 4797)



In [112]:

    
# calculate the center point
center_row, center_col = total_rows / 2, total_cols / 2

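# squared distance of every pixel from the center
dist_from_center = (X - center_row) ** 2 + (Y - center_col) ** 2

# squared radius (the radius is half the image height)
radius = (total_rows / 2) ** 2

# mask: True for the pixels outside the circle
circular_mask = (dist_from_center > radius)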



In [113]:

    
photo_data[circular_mask] = 0
plt.figure(figsize = (10, 10))
plt.imshow(photo_data)









    Out[113]:





<matplotlib.image.AxesImage at 0x7f765eb90b38>
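
The same masking idea is easier to inspect on a tiny array. The sketch below uses a hypothetical 5x7 array filled with 9s in place of the photo, with an assumed radius of 2 around an integer center.

```python
import numpy as np

rows, cols = 5, 7
row_idx, col_idx = np.ogrid[:rows, :cols]       # shapes (5, 1) and (1, 7)
center_row, center_col = rows // 2, cols // 2   # integer center (2, 3)
radius = 2                                      # assumed radius for this example

# Broadcasting the (5, 1) and (1, 7) grids gives a full (5, 7) result
squared_distance = (row_idx - center_row) ** 2 + (col_idx - center_col) ** 2
outside_circle = squared_distance > radius ** 2

image = np.full((rows, cols), 9)
image[outside_circle] = 0   # blank out everything outside the circle
print(image)
```

Comparing squared distances avoids taking a square root for every pixel.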

Further masking: let's restrict the circular mask to the upper half of the image



In [114]:

    
X, Y  = np.ogrid[: total_rows, : total_cols]
half_upper = X < center_row # generates a mask for all rows above center
# True only for pixels in the upper half that are also selected by circular_mask
half_upper_mask = np.logical_and(half_upper, circular_mask)



In [115]:

    
photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
photo_data[half_upper_mask] = 255
plt.imshow(photo_data)









    Out[115]:





<matplotlib.image.AxesImage at 0x7f765eb20b70>



In [116]:

    
import random

photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
photo_data[half_upper_mask] = random.randint(200, 255) # one random value, applied to every masked pixel
plt.imshow(photo_data)









    Out[116]:





<matplotlib.image.AxesImage at 0x7f765c0adc88>

Let's try to highlight all the high altitude areas by detecting the high intensity Red pixels and muting down other areas



In [117]:

    
photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
red_mask = photo_data[:, :, 0] < 150
photo_data[red_mask] = 0
plt.imshow(photo_data);



In [118]:

    
# the same as before but for the Green pixels

photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
green_mask = photo_data[:, :, 1] < 150
photo_data[green_mask] = 0
plt.imshow(photo_data);



In [119]:

    
# the same as before but for the Blue pixels

photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')
blue_mask = photo_data[:, :, 2] < 150
photo_data[blue_mask] = 0
plt.imshow(photo_data);

Composite mask for all three layers



In [120]:

    
photo_data = misc.imread('/home/jayme/Courses/Python4DS/Week-3-Numpy/wifire/sd-3layers.jpg')

red_mask = photo_data[:, :, 0] < 150
green_mask = photo_data[:, :, 1] > 100
blue_mask = photo_data[:, :, 2] < 100

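# np.logical_and takes only two inputs (a third positional argument is the output array),
# so use reduce to AND all three masks together
final_mask = np.logical_and.reduce([red_mask, green_mask, blue_mask])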
photo_data[final_mask] = 0

plt.imshow(photo_data);
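
When combining three or more boolean masks, note that `np.logical_and` accepts two input arrays. Two common options are chaining the `&` operator or using `np.logical_and.reduce`; the sketch below shows both on small made-up masks.

```python
import numpy as np

red_mask = np.array([True, True, False, True])
green_mask = np.array([True, False, True, True])
blue_mask = np.array([True, True, True, False])

# Option 1: chain the element-wise AND operator
combined = red_mask & green_mask & blue_mask

# Option 2: reduce a sequence of masks with logical_and
combined_reduce = np.logical_and.reduce([red_mask, green_mask, blue_mask])

print(combined)
print(combined_reduce)
```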

3.4. Week 3: Assessment

(2017-07-25 15:13)