Why Python?

  • Easy to learn
  • High-level data structures
  • Elegant syntax
  • Lots of useful packages for machine learning and data science

Now that Python 3 is installed, here are some packages we will use:

  • numpy
  • scipy
  • scikit-learn
  • matplotlib
  • ...

To install a package, use either conda or pip:

conda install <PACKAGE_NAME>
pip install <PACKAGE_NAME>

Let's run our slides first!

jupyter notebook

Want more fancy stuff? Just install RISE!

conda install -c damianavila82 rise

Play with your toys!

Here is an option to play with if you can't set up jupyter on your own computer: https://tmpnb.org.


In [1]:
print ('Hello Python!')


Hello Python!

Python Basics

  • Data Types
  • Containers
  • Functions
  • Classes

Basic data types

Numbers

Integers and floats work as you would expect from other languages:


In [2]:
x = 3
print (x, type(x))


3 <class 'int'>

In [3]:
print (x + 3)   # Addition;
print (x - x)   # Subtraction;
print (x * 2)   # Multiplication;
print (x ** 3)  # Exponentiation;


6
0
6
27

In [4]:
print (x)
x += 1
print (x)
x = x + 1
print (x)  # Prints "5"
x *= 2
print (x)  # Prints "10"


3
4
5
10

In [5]:
y = 2.5
print (type(y)) # Prints "<class 'float'>"
print (y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"


<class 'float'>
2.5 3.5 5.0 6.25

Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.

Python 3 integers have arbitrary precision, so there is no separate long integer type as in Python 2; Python also has a built-in complex type. You can find all of the details in the documentation.
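A quick illustration of both points (a minimal sketch; the values are arbitrary): integers simply grow as needed, and complex literals use a trailing j for the imaginary part.

```python
# Python 3 ints never overflow; they grow to whatever size is needed.
big = 2 ** 100
print(big)             # 1267650600228229401496703205376
print(type(big))       # <class 'int'>

# Complex numbers: the imaginary part is written with a trailing j.
z = 3 + 4j
print(z.real, z.imag)  # 3.0 4.0
print(abs(z))          # 5.0, the modulus sqrt(3**2 + 4**2)
```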


In [6]:
print (17 / 3)   # true division; always returns a float
print (17 // 3)  # floor division; returns an int for int operands
print (17 % 3)   # Modulo operation


5.666666666666667
5
2
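Floor division rounds toward negative infinity, which matters for negative operands. The sketch below (arbitrary example values) shows that // and % always satisfy (a // b) * b + a % b == a, and that divmod returns both results at once:

```python
print(17 // 3, 17 % 3)    # 5 2
print(-17 // 3, -17 % 3)  # -6 1  (rounds down to -6, not toward zero)
print(divmod(-17, 3))     # (-6, 1)

# The identity linking // and % holds for negative numbers too.
assert (-17 // 3) * 3 + (-17 % 3) == -17
```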

Booleans

Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (&&, ||, etc.):


In [7]:
t, f = True, False # Note the capitalization!
print (type(t)) # Prints "<class 'bool'>"


<class 'bool'>

Now let's look at the operations:


In [8]:
print (t and f) # Logical AND;
print (t or f)  # Logical OR;
print (not t)   # Logical NOT;
print (t != f)  # Logical XOR;


False
True
False
True

Strings


In [9]:
hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
print (hello, len(hello))


hello 5

In [10]:
hw = hello + ' ' + world  # String concatenation
print (hw)  # prints "hello world"


hello world

In [11]:
# sprintf style string formatting 
hw12 = '%s %s %d' % (hello, world, 12)
# Recommended formatting style for Py3.0+ (https://pyformat.info)
new_py3_hw12 = '{:>15} {:1.1f} {}'.format('hello' + ' ' +  'world', 1, 2) 
print (hw12)
print (new_py3_hw12)


hello world 12
    hello world 1.0 2
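Since Python 3.6, f-strings accept the same format specifications as str.format, written inline next to the values. A minimal sketch reproducing new_py3_hw12 from the cell above:

```python
hello, world = 'hello', 'world'

# Same specs as '{:>15} {:1.1f} {}'.format(...): right-align in 15 chars,
# one decimal place, then default formatting.
hw_f = f'{hello + " " + world:>15} {1:1.1f} {2}'
print(hw_f)  # '    hello world 1.0 2'
```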

In [2]:
s = "hello"
print (s.capitalize())  # Capitalize a string; prints "Hello"
print (s.upper())       # Convert a string to uppercase; prints "HELLO"
print (s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print (s.center(7))     # Center a string, padding with spaces; prints " hello "
print (s.replace('ll', '(ell)'))  # Replace all instances of one substring with another;
                               # prints "he(ell)o"
print ('  world '.strip())  # Strip leading and trailing whitespace; prints "world"


Hello
HELLO
  hello
 hello 
he(ell)o
world

In [13]:
"You can type ' inside"


Out[13]:
"You can type ' inside"

In [3]:
'You can type \' inside'


Out[3]:
"You can type ' inside"

You can find a list of all string methods in the documentation.

Containers

Python includes several built-in container types: lists, dictionaries, sets, and tuples.

Lists

A list is the Python equivalent of an array, but is resizable and can contain elements of different types:


In [5]:
x = [1, 2, 3, 'a', 'b', 'c'] + ['hello'] # concatenate two lists with the + operator
print (x, x[2])  # access by index
print (x[-1])   # negative indices count from the end


[1, 2, 3, 'a', 'b', 'c', 'hello'] 3
hello

In [16]:
x.append('element')
print (x)
print (x.pop(), x)


[1, 2, 3, 'a', 'b', 'c', 'hello', 'element']
element [1, 2, 3, 'a', 'b', 'c', 'hello']

Slicing

In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:


In [7]:
x = [1, 2, 3, 4, 5]
print (x[2:])
print (x[:3])
print (x[2:5])
x[0:3] = ['a', 'b', 'c']  # modify elements in list
print (x)


[3, 4, 5]
[1, 2, 3]
[3, 4, 5]
['a', 'b', 'c', 4, 5]

In [8]:
y = x[:]   # copy list
y[2] = 100 # x won't change
print ('y:', y)
print ('x:', x)


y: ['a', 'b', 100, 4, 5]
x: ['a', 'b', 'c', 4, 5]
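Note that x[:] makes a shallow copy: the outer list is new, but any nested objects are shared between the two lists. The sketch below (with an arbitrary nested list) shows the difference, and uses copy.deepcopy for the case where nested objects must be copied too:

```python
import copy

x = [[1, 2], [3, 4]]
y = x[:]          # new outer list...
y[0] = 'new'      # ...so replacing an element of y leaves x alone
y[1].append(5)    # but the inner lists are shared, so this also changes x[1]
print(x)          # [[1, 2], [3, 4, 5]]
print(y)          # ['new', [3, 4, 5]]

z = copy.deepcopy(x)  # copies the nested lists as well
z[1].append(6)
print(x)          # [[1, 2], [3, 4, 5]]  (unchanged)
print(z)          # [[1, 2], [3, 4, 5, 6]]
```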

As usual, you can find all the gory details about lists in the documentation.

Loops

You can loop over the elements of a list like this:


In [19]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print (animal)


cat
dog
monkey

If you want access to the index of each element within the body of a loop, use the built-in enumerate function:


In [9]:
animals = ['cat', 'dog', 'monkey']
print (enumerate(animals))
for idx, animal in enumerate(animals):
    print ('#%d: %s' % (idx + 1, animal))


<enumerate object at 0x000001BAE532FB40>
#1: cat
#2: dog
#3: monkey
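enumerate also takes an optional start argument, which removes the need for idx + 1 in the loop above. A minimal sketch:

```python
animals = ['cat', 'dog', 'monkey']

# start=1 makes the counter begin at 1 instead of 0
for idx, animal in enumerate(animals, start=1):
    print('#%d: %s' % (idx, animal))

print(list(enumerate(animals, start=1)))  # [(1, 'cat'), (2, 'dog'), (3, 'monkey')]
```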

List comprehensions:

When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:


In [21]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print (squares)


[0, 1, 4, 9, 16]

You can make this code simpler using a list comprehension:


In [22]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print (squares)


[0, 1, 4, 9, 16]

List comprehensions can also contain conditions:


In [23]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
even_squares_alt = [i ** 2 for i in filter(lambda k: k % 2 == 0, nums)]  # same result using filter
print (even_squares, even_squares_alt)


[0, 4, 16] [0, 4, 16]

In [24]:
nums = [0, 1, 2, 3, 4]
even_squares_or_one = [x ** 2 if x % 2 == 0 else 1 for x in nums]
print (even_squares_or_one)


[0, 1, 4, 1, 16]

Dictionaries

A dictionary stores (key, value) pairs, similar to std::unordered_map in C++ (both are hash tables). You can use it like this:


In [25]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print (d['cat'])       # Get an entry from a dictionary; prints "cute"
print ('cute' in d)     # "in" checks keys, not values; 'cute' is a value, so this prints "False"


cute
False

In [26]:
d['fish'] = 'wet'    # Set an entry in a dictionary
print (d['fish'])      # Prints "wet"


wet

In [27]:
print (d['monkey'])  # KeyError: 'monkey' not a key of d


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-27-39608aeda0ef> in <module>()
----> 1 print (d['monkey'])  # KeyError: 'monkey' not a key of d

KeyError: 'monkey'

In [28]:
print (d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print (d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"


N/A
wet
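The default value of get is handy for counting, since keys that have not been seen yet start from 0. A small sketch with a made-up word list:

```python
words = ['cat', 'dog', 'cat', 'fish', 'cat']

counts = {}
for word in words:
    # get returns 0 the first time a word appears, then its running count
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'cat': 3, 'dog': 1, 'fish': 1}
```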

In [29]:
del d['fish']        # Remove an element from a dictionary
print (d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"


N/A

You can find all you need to know about dictionaries in the documentation.

It is easy to iterate over the keys in a dictionary:


In [30]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print ('A %s has %d legs' % (animal, legs))


A person has 2 legs
A cat has 4 legs
A spider has 8 legs

If you want access to keys and their corresponding values, use the items method:


In [31]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print ('A %s has %d legs' % (animal, legs))


A person has 2 legs
A cat has 4 legs
A spider has 8 legs

Dictionary comprehensions: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:


In [32]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print (even_num_to_square)


{0: 0, 2: 4, 4: 16}

In [33]:
# Make a dictionary from two lists using zip
l1 = ['EECS445', 'EECS545'] 
l2 = ['Undergraduate ML', 'Graduate ML']
d = dict(zip(l1, l2))
print (d)
# Unpack the dictionary's keys and values into two lists
k, v = list(d.keys()), list(d.values())
print (d.items())
print (k, v)


{'EECS445': 'Undergraduate ML', 'EECS545': 'Graduate ML'}
dict_items([('EECS445', 'Undergraduate ML'), ('EECS545', 'Graduate ML')])
['EECS445', 'EECS545'] ['Undergraduate ML', 'Graduate ML']

Sets

A set is an unordered collection of distinct elements. As a simple example, consider the following:


In [34]:
animals = {'cat', 'dog'}
print ('cat' in animals)   # Check if an element is in a set; prints "True"
print ('fish' in animals)  # prints "False"


True
False

In [35]:
animals.add('fish')      # Add an element to a set
print ('fish' in animals)
print (len(animals))       # Number of elements in a set;


True
3

In [36]:
animals.add('cat')       # Adding an element that is already in the set does nothing
print (len(animals))       
animals.remove('cat')    # Remove an element from a set
print (len(animals))


3
2
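Sets also support the usual set-algebra operators. A small sketch with made-up sets; sorted() is only there so the printed order is predictable:

```python
a = {'cat', 'dog', 'fish'}
b = {'dog', 'bird'}

print(sorted(a | b))  # union: ['bird', 'cat', 'dog', 'fish']
print(sorted(a & b))  # intersection: ['dog']
print(sorted(a - b))  # difference (in a but not in b): ['cat', 'fish']
print(sorted(a ^ b))  # symmetric difference (in exactly one): ['bird', 'cat', 'fish']
```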

Loops: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:


In [37]:
animals = {'dog', 'fish', 'cat'}
for idx, animal in enumerate(animals):
    print ('#%d: %s' % (idx + 1, animal))
# The order is arbitrary and may differ from run to run


#1: cat
#2: dog
#3: fish

Set comprehensions: Like lists and dictionaries, we can easily construct sets using set comprehensions:


In [38]:
from math import sqrt
print ({int(sqrt(x)) for x in range(30)})


{0, 1, 2, 3, 4, 5}

Tuples

A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:


In [39]:
d = {(x, x + 1): x for x in range(0, 10, 2)}  # Create a dictionary with tuple keys; range also accepts a step argument
t = (0, 1)       # Create a tuple
print (type(t))
print (d[t])
print (d[(2, 3)])


<class 'tuple'>
0
2

In [40]:
t[0] = 1


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-0a69537257d5> in <module>()
----> 1 t[0] = 1

TypeError: 'tuple' object does not support item assignment
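To see why lists cannot be dictionary keys while tuples can, the sketch below tries both; the list raises TypeError because it is mutable and therefore unhashable:

```python
d = {(1, 2): 'tuple key'}   # a tuple of hashable items is hashable

try:
    d[[1, 2]] = 'list key'  # lists are mutable, so they cannot be hashed
    list_key_ok = True
except TypeError as err:
    list_key_ok = False
    print(err)              # explains that 'list' is an unhashable type

print(d[(1, 2)], list_key_ok)
```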

Functions

Python functions are defined using the def keyword. For example:


In [41]:
def get_GPA(x):
    if x >= 90:
        return "A"
    elif x >= 75:
        return "B"
    elif x >=60:
        return "C"
    else:
        return "F"

for x in [59, 70, 91]:
    print (get_GPA(x))


F
C
A

We will often define functions to take optional keyword arguments, like this:


In [42]:
def fib(n = 10):
    a = 0
    b = 1
    while b < n:
        print(b, end=',')
        a, b = b, a + b

fib()


1,1,2,3,5,8,

Classes

The syntax for defining classes in Python is straightforward:


In [43]:
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print ('HELLO, %s!' % self.name.upper())
        else:
            print ('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"


Hello, Fred
HELLO, FRED!

Modules

  • import modules
  • numpy
  • matplotlib
  • scikit-learn

In [44]:
from modules import fibo
from modules.fibo import fib2
print (fib2(10))
print (fibo.fib2(10))


[1, 1, 2, 3, 5, 8]
[1, 1, 2, 3, 5, 8]
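The notebook imports modules/fibo.py but does not show it. A plausible version of that file, written here only to match the output above (its contents are an assumption), might look like this. It would live at modules/fibo.py next to the notebook; an empty modules/__init__.py is optional in Python 3.3+.

```python
# Hypothetical contents of modules/fibo.py


def fib(n=10):
    """Print the Fibonacci numbers below n, separated by commas."""
    a, b = 0, 1
    while b < n:
        print(b, end=',')
        a, b = b, a + b


def fib2(n=10):
    """Return a list of the Fibonacci numbers below n."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result


print(fib2(10))  # [1, 1, 2, 3, 5, 8]
```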

NumPy

  • NumPy arrays, dtype, and shape
  • Reshape and Update In-Place
  • Combine Arrays
  • Array Math
  • Inner Product
  • Matrices

To use Numpy, we first need to import the numpy package:


In [11]:
import numpy as np

In [46]:
a = np.array([1, 2, 3])
print(a)
print(a.shape)
print(a.dtype)  # the default integer dtype is platform-dependent (e.g. int32 or int64)


[1 2 3]
(3,)
int32

In [47]:
b = np.array([[0, 2, 4], [1, 3, 5]], dtype = np.float64)
print(b)
print(b.shape)
print(b.dtype)


[[ 0.  2.  4.]
 [ 1.  3.  5.]]
(2, 3)
float64

Numpy also provides many functions to create arrays:


In [48]:
np.zeros(5)  # Create an array of all zeros


Out[48]:
array([ 0.,  0.,  0.,  0.,  0.])

In [49]:
np.ones(shape=(3, 4), dtype = np.int32)  # Create an array of all ones


Out[49]:
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [50]:
np.full((2,2), 7, dtype = np.int32)  # Create a constant array


Out[50]:
array([[7, 7],
       [7, 7]])

In [51]:
np.eye(2)  # Create a 2x2 identity matrix


Out[51]:
array([[ 1.,  0.],
       [ 0.,  1.]])

In [52]:
np.random.random((2,2))  # Create an array filled with random values


Out[52]:
array([[ 0.41850537,  0.25369572],
       [ 0.60860764,  0.9802275 ]])

Array indexing

Numpy offers several ways to index into arrays. Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:


In [53]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print (b)


[[2 3]
 [6 7]]

In [54]:
print (a[0, 1])  
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print (a[0, 1])


2
77

In [55]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a  
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 copy of the second row (integer-array indexing copies)
print (a)
print (row_r1, row_r1.shape) 
print (row_r2, row_r2.shape)
print (row_r3, row_r3.shape)


[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)
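Whether an indexing result is a view or a copy can be checked with np.shares_memory. In this sketch, basic slicing (a[1:2, :]) returns a view, while integer-array indexing (a[[1], :]) returns a copy, even though both print the same row:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_slice = a[1:2, :]  # basic slicing: a view into a's data
row_fancy = a[[1], :]  # integer-array (fancy) indexing: a new array

print(np.shares_memory(a, row_slice))  # True
print(np.shares_memory(a, row_fancy))  # False

row_fancy[0, 0] = -1   # writing to the copy leaves a untouched
print(a[1, 0])         # 5
```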

Reshape and Update In-Place


In [56]:
e = np.arange(12)
print(e)


[ 0  1  2  3  4  5  6  7  8  9 10 11]

In [57]:
# f is a view of contents of e
f = e.reshape(3, 4)
print(f)


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

In [58]:
# Set values of e from index 7 onwards to 0
e[7:] = 0
print (e)
# f is also updated
print (f)


[0 1 2 3 4 5 6 0 0 0 0 0]
[[0 1 2 3]
 [4 5 6 0]
 [0 0 0 0]]

In [59]:
# We can get transpose of array by T attribute
print (f.T)


[[0 4 0]
 [1 5 0]
 [2 6 0]
 [3 0 0]]

Combine Arrays


In [14]:
a = np.array([1, 2, 3])
print(np.concatenate([a, a, a]))


[1 2 3 1 2 3 1 2 3]

In [15]:
b = np.array([[1, 2, 3], [4, 5, 6]])
d = b / 2.0
# vstack treats the 1-D array a as a single row, so it stacks with the 2-D arrays
print (np.vstack([a, b, d]))


[[ 1.   2.   3. ]
 [ 1.   2.   3. ]
 [ 4.   5.   6. ]
 [ 0.5  1.   1.5]
 [ 2.   2.5  3. ]]

In [16]:
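# In machine learning, hstack is useful to enrich a data set
# by concatenating new features (columns) to the existing ones
print (np.hstack([b, d]))               # same as np.concatenate([b, d], axis = 1)
print (np.concatenate([b, d], axis = 0))  # axis = 0 stacks rows instead, like vstack


[[ 1.   2.   3.   0.5  1.   1.5]
 [ 4.   5.   6.   2.   2.5  3. ]]
[[ 1.   2.   3. ]
 [ 4.   5.   6. ]
 [ 0.5  1.   1.5]
 [ 2.   2.5  3. ]]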

Array math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:


In [63]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

In [64]:
# Elementwise sum; both print the same array
print (x + y)
print (np.add(x, y))


[[  6.   8.]
 [ 10.  12.]]
[[  6.   8.]
 [ 10.  12.]]

In [65]:
# Elementwise difference; both print the same array
print (x - y)
print (np.subtract(x, y))


[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]

In [66]:
# Elementwise product; both print the same array
print (x * y)
print (np.multiply(x, y))


[[  5.  12.]
 [ 21.  32.]]
[[  5.  12.]
 [ 21.  32.]]

In [67]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print (x / y)
print (np.divide(x, y))


[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]

In [68]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print (np.sqrt(x))


[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]

Broadcasting

The operations above also work on arrays with different shapes, as long as the shapes are compatible; NumPy then "broadcasts" the smaller array across the larger one.
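The broadcasting rule can be sketched in plain Python: align the shapes from the right, pad the shorter one with leading 1s, and require each pair of dimensions to be equal or to contain a 1. The helper broadcast_shape below is a hypothetical illustration, checked against np.broadcast_shapes (available in NumPy 1.20 and later):

```python
import numpy as np


def broadcast_shape(s1, s2):
    """Hypothetical helper: the shape two arrays broadcast to."""
    # Pad the shorter shape with leading 1s so both have the same length.
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for d1, d2 in zip(s1, s2):
        # Each pair must match, or one of them must be 1 (it gets stretched).
        if d1 == d2 or d2 == 1:
            out.append(d1)
        elif d1 == 1:
            out.append(d2)
        else:
            raise ValueError(f'incompatible dimensions {d1} and {d2}')
    return tuple(out)


print(broadcast_shape((3,), (2, 3)))          # (2, 3)
print(broadcast_shape((1, 1, 3), (2, 1, 3)))  # (2, 1, 3)
print(np.broadcast_shapes((3,), (2, 3)))      # (2, 3)
```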


In [69]:
# Multiply single number
print (x * 0.5)


[[ 0.5  1. ]
 [ 1.5  2. ]]

In [70]:
a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])

In [71]:
c = a + b
print(a.reshape(1, 3).shape, b.shape, c.shape)
print(c)


(1, 3) (2, 3) (2, 3)
[[2 4 6]
 [5 7 9]]

In [72]:
a.reshape((1, 1, 3)) + c.reshape((2, 1, 3))


Out[72]:
array([[[ 3,  6,  9]],

       [[ 6,  9, 12]]])

We can also get statistical results directly using sum, mean and std methods.


In [73]:
print (d)
print (d.sum())
print (d.sum(axis = 0))
print (d.mean())
print (d.mean(axis = 1))
print (d.std())
print (d.std(axis = 0))


[[ 0.5  1.   1.5]
 [ 2.   2.5  3. ]]
10.5
[ 2.5  3.5  4.5]
1.75
[ 1.   2.5]
0.85391256383
[ 0.75  0.75  0.75]
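By default std computes the population standard deviation (ddof=0, dividing by N). A minimal sketch checking this by hand on the same values of d, and showing ddof=1 for the sample version (dividing by N - 1):

```python
import numpy as np

d = np.array([[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]])
sq_dev = (d - d.mean()) ** 2  # squared deviations from the overall mean

# Population std: sqrt of the mean squared deviation (NumPy's default)
manual_std = np.sqrt(sq_dev.sum() / d.size)
print(np.isclose(manual_std, d.std()))         # True

# Sample std: divide by N - 1 instead of N
sample_std = np.sqrt(sq_dev.sum() / (d.size - 1))
print(np.isclose(sample_std, d.std(ddof=1)))   # True
```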

Inner Product

$$ (a_1, a_2, a_3, ..., a_n) \cdot (b_1, b_2, b_3, ..., b_n)^T = \sum_{i = 1}^{n}{a_ib_i} $$

We use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:
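As a sanity check of the formula, the sketch below computes the sum of elementwise products by hand and compares it with the @ matrix-multiplication operator (Python 3.5+), which gives the same results as dot for the 1-D and 2-D arrays used in this section:

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])

# The sum from the formula, written out: 9*11 + 10*12
manual = sum(a_i * b_i for a_i, b_i in zip(v, w))
print(manual)   # 219
print(v @ w)    # 219, same as v.dot(w)

x = np.array([[1, 2], [3, 4]])
print(x @ v)    # [29 67], same as x.dot(v)
```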


In [74]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])
# Inner product of vectors; both produce 219
print (v.dot(w))
print (np.dot(v, w))


219
219

In [75]:
# Matrix / vector product; both produce the rank 1 array [29 67]
print (x.dot(v))
print (np.dot(x, v))


[29 67]
[29 67]

In [76]:
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print (x.dot(y))
print (np.dot(x, y))


[[19 22]
 [43 50]]
[[19 22]
 [43 50]]

Matrix

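Instead of arrays, we can also use np.matrix, where * means matrix multiplication. Note that the NumPy documentation no longer recommends np.matrix, even for linear algebra; regular arrays with dot (or the @ operator) are the preferred alternative.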


In [77]:
x = np.matrix('1, 2, 3; 4, 5, 6')
y = np.matrix(np.ones((3, 4)))
print(x.shape)
print(y.shape)
print(x * y)
print(y.T * x.T)


(2, 3)
(3, 4)
[[  6.   6.   6.   6.]
 [ 15.  15.  15.  15.]]
[[  6.  15.]
 [  6.  15.]
 [  6.  15.]
 [  6.  15.]]
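The same products can be written with ordinary arrays and the @ operator, which is what the NumPy documentation recommends instead of np.matrix. A minimal sketch with the same data:

```python
import numpy as np

# Same data as the np.matrix example, as plain 2-D arrays
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.ones((3, 4))

print(x @ y)      # shape (2, 4): a row of 6s and a row of 15s
print(y.T @ x.T)  # shape (4, 2): the transpose of x @ y
```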

You can find more in the documentation.

Matplotlib

  • Plotting Lines
  • Plotting Multiple Lines
  • Scatter Plots
  • Legend, Titles, etc.
  • Subplots
  • Histogram

In [18]:
import matplotlib.pyplot as plt

To display plots inline inside the Jupyter notebook:


In [19]:
%matplotlib inline

In [21]:
plt.plot([1,2,3,4], 'o-')
plt.ylabel('some numbers')
plt.show()



In [81]:
x = np.linspace(0,1,100);
y1 = x ** 2;
y2 = np.sin(x);

plt.plot(x, y1, 'r-', label="parabola");
plt.plot(x, y2, 'g-', label="sine");
plt.legend();
plt.xlabel("x axis");
plt.show()



In [82]:
# Create sample data, add some noise
x = np.random.uniform(1, 100, 1000)
y = np.log(x) + np.random.normal(0, .3, 1000)

plt.scatter(x, y)
plt.show()


Subplots

You can plot different things in the same figure using the subplot function. Here is an example:


In [83]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# First plot
plt.subplot(2, 1, 1)
plt.plot(x, y_sin)
plt.title('Sine')

# Second plot
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

# Show the figure.
plt.show()
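The same figure can also be built with the object-oriented interface, where plt.subplots returns the figure together with an array of axes to draw on. A sketch, assuming a recent Matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 3 * np.pi, 0.1)

fig, axes = plt.subplots(2, 1)  # 2 rows, 1 column; axes holds two Axes objects
axes[0].plot(x, np.sin(x))
axes[0].set_title('Sine')
axes[1].plot(x, np.cos(x))
axes[1].set_title('Cosine')
fig.tight_layout()              # add spacing so titles and tick labels do not overlap
plt.show()
```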



In [84]:
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
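With density=True the bar heights form a probability density, so the bar areas sum to 1. That is also why the y axis tops out near 0.03: a normal density with sigma = 15 peaks at 1 / (15 * sqrt(2 * pi)) ≈ 0.027. A sketch verifying the area with np.histogram, which takes the same density argument (the seeded generator is an addition for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(10000)

# Counts are rescaled so that sum(height * bin_width) == 1
n, bins = np.histogram(x, bins=50, density=True)
area = np.sum(n * np.diff(bins))
print(np.isclose(area, 1.0))  # True
```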


Scikit-learn

Scikit-learn is a widely used machine learning package that implements many algorithms; see its documentation for detailed usage.

Here is an example using the KMeans clustering algorithm:


In [85]:
from sklearn.cluster import KMeans

In [86]:
mu1 = [5, 5]
mu2 = [0, 0]
cov1 = [[1, 0], [0, 1]]
cov2 = [[2, 1], [1, 3]]
x1 = np.random.multivariate_normal(mu1, cov1, 1000)
x2 = np.random.multivariate_normal(mu2, cov2, 1000)

print (x1.shape)
print (x2.shape)

plt.plot(x1[:, 0], x1[:, 1], 'r.')
plt.plot(x2[:, 0], x2[:, 1], 'b.')
plt.show()


(1000, 2)
(1000, 2)

In [87]:
x = np.vstack([x1, x2])
print (x.shape)
plt.plot(x[:, 0], x[:, 1], 'b.')
plt.show()


(2000, 2)

In [88]:
y_pred = KMeans(n_clusters=2).fit_predict(x)
x_pred1 = x[y_pred == 0, :]
x_pred2 = x[y_pred == 1, :]
print (x_pred1.shape)
print (x_pred2.shape)
plt.plot(x_pred1[:, 0], x_pred1[:, 1], 'b.')
plt.plot(x_pred2[:, 0], x_pred2[:, 1], 'r.')
plt.show()


(968, 2)
(1032, 2)
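For intuition about what KMeans does, here is a minimal NumPy sketch of Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. It is a simplification: scikit-learn's KMeans uses k-means++ initialization by default plus other refinements, whereas this sketch starts from randomly chosen data points. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def assign_labels(x, centroids):
    """Return the index of the nearest centroid for every row of x."""
    # dists[i, j] = Euclidean distance from point i to centroid j
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(x, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm with random initialization (a sketch)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(x, centroids)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute labels so they match the final centroids.
    return assign_labels(x, centroids), centroids


# Two well-separated Gaussian blobs, similar in spirit to x1 and x2 above
rng = np.random.default_rng(1)
x = np.vstack([rng.normal([5, 5], 1.0, size=(100, 2)),
               rng.normal([0, 0], 1.0, size=(100, 2))])
labels, centroids = kmeans(x, k=2)
print(centroids)             # one centroid near (0, 0), one near (5, 5)
print(np.bincount(labels))   # cluster sizes
```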