In [30]:
from IPython.display import Image
Image(url='http://www.pythonbootcamp.info/_/rsrc/1280771545584/home/Screen%20shot%202010-08-02%20at%2010.51.14%20AM.png', embed=True)


Out[30]:
<IPython.core.display.Image object>

Introduction

This course is meant to provide a very gentle introduction to the Python scientific ecosystem. We will try to cover as much as possible within two hours. It is essentially a warm-up before the upcoming machine learning bootcamps, for those who would like to adopt Python as their daily data hacking toolbox.

What is in this course

  • an introduction to Python syntax,
  • an overview of the scientific libraries,
  • using the IPython Notebook,
  • plotting within the notebook,
  • and a basic introduction to the machine learning library scikit-learn.

What this course is not about

  • an introduction to object-oriented programming
  • detailed expertise in scientific Python libraries
  • a machine learning practical course (come to Thursday's bootcamp for that)

References

Table of Contents

IPython Notebook

Python syntax

A comment starts with a # and extends to the end of the line.


In [31]:
# Here is a comment

Variables

Variables are declared without mentioning their type.

Numbers


In [32]:
i = 5  # i is an integer
x = 2.0  # x is a float

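Caveats

This notebook was run with Python 2, where dividing two integers with / truncates the result. In Python 3, / always returns a float and // is the operator to use for floor division.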


In [33]:
5. / 2


Out[33]:
2.5

In [34]:
5 / 2


Out[34]:
2

In [35]:
5. // 2  # floor division operator


Out[35]:
2.0
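
The cells above were run under Python 2. As a minimal sketch, assuming a Python 3 interpreter instead, the same operators behave as follows (the variable names are illustrative only):

```python
# Assumes Python 3, where / is always true division.
true_quotient = 5 / 2        # true division, even between two integers
floor_quotient = 5 // 2      # floor division between integers stays an integer
float_floor = 5.0 // 2       # floor division involving a float returns a float
negative_floor = -5 // 2     # floors toward negative infinity, not toward zero
quotient, remainder = divmod(5, 2)  # quotient and remainder in one call

print(true_quotient, floor_quotient, float_floor, negative_floor)
print(quotient, remainder)
```

Note the negative case: floor division rounds down, so -5 // 2 is -3 rather than -2.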

Strings

Strings can be declared in two different ways (actually four...)


In [36]:
s1 = 'This is a string.'

s2 = '''This is a multi-line string.
        Awesome if I want to replace Microsoft Word by Python...
        or simply to document my code directly within ;)
     '''

s3 = "I'm using double quotes to delimit the string, otherwise, I'd have to \"escape\" it with a \\ ."

In [37]:
print(s1)
print(s2)
print(s3)


This is a string.
This is a multi-line string.
        Awesome if I want to replace Microsoft Word by Python...
        or simply to document my code directly within ;)
     
I'm using double quotes to delimit the string, otherwise, I'd have to "escape" it with a \ .
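
The four ways mentioned above are presumably the four delimiters Python accepts: single quotes, double quotes, and the tripled form of each. A small sketch, with illustrative variable names:

```python
single = 'It works'
double = "It works"
triple_single = '''It works'''
triple_double = """It works"""

# The delimiter is not part of the value: all four strings are equal.
print(single == double == triple_single == triple_double)

# Only the triple-quoted forms may span several lines.
multi_line = """first line
second line"""
print(multi_line.splitlines())
```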

You can find out the type of a variable by calling the type() function. It is one of the functions that come out of the box with your Python interpreter; we call them the built-in functions. Many others are provided, as we will see.


In [38]:
print(type(i))
print(type(x))
print(type(s1))


<type 'int'>
<type 'float'>
<type 'str'>
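
A few other built-in functions, as a quick sketch. The selection is arbitrary and the variables simply repeat the ones defined earlier so that the block runs on its own:

```python
i = 5
x = 2.0
s1 = 'This is a string.'

print(len(s1))                   # number of characters in the string
print(isinstance(i, int))        # True when the object is of the given type
print(abs(-x))                   # absolute value
print(max(i, x))                 # largest of the arguments
print(float(i), int(x), str(i))  # conversions between types
```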

Tip

Several variables can be assigned at once, which makes swapping pretty elegant.


In [39]:
a, b = 1, 's'
print(a, b)
a, b = b, a
print(a, b)


(1, 's')
('s', 1)

In [40]:
c = d = 'multiple assignment'

In [41]:
b += 10  # no b++ nor ++b in python

Indentation

Where C-like languages use curly brackets to delimit blocks of code (conditional statements, loops, etc.), Python uses indentation.


In [42]:
i = 0
while i < 5:
    if i == 2:
        print('Two!')
    else:
        print(i)
    i += 1


0
1
Two!
3
4
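
The same output can be produced with a for loop over range(), which is the more common idiom when the number of iterations is known in advance. This is only a sketch; the lines list is introduced here to make the result easy to inspect.

```python
lines = []
for i in range(5):          # i takes the values 0, 1, 2, 3, 4
    if i == 2:
        lines.append('Two!')
    else:
        lines.append(str(i))
    # this comment is indented, so it still belongs to the loop body

# back at the left margin: the loop is over
print('\n'.join(lines))
```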

Functions


In [43]:
def my_func(argument):
    """ This comment is called a docstring.
    Placed right after the function signature,
    it serves as a documentation for the function."""

    # the indentation is crucial
    print(argument)

In [44]:
my_func('hey you!')


hey you!

In [45]:
my_func(argument='out there in the cold')


out there in the cold

In [46]:
def my_func_2(argument='default argument'):
    print(argument)

In [47]:
my_func_2()


default argument

In [48]:
def my_func_3(a, b=10, c=20):
    print(sum((a, b, c)))

In [49]:
my_func_3(1)       # 1 + 10 + 20
my_func_3(1, 2)    # 1 + 2 + 20
my_func_3(1, 2, 3) # 1 + 2 + 3


31
23
6

Data structures

  • lists
  • dictionaries

In [50]:
my_list = [1, 2, 3]

In [51]:
def my_func_4(*args):
    # args is a tuple holding all the positional arguments
    print(sum(args))

In [52]:
my_func_4(*my_list)  # unpack the list into positional arguments


6

In [53]:
my_func_3(*my_list)


6

In [54]:
my_dict = {'a': 1, 'b': 2, 'c': 3}

In [55]:
my_func_3(**my_dict)


6

In [56]:
def my_func_5(**kwargs):
    values = kwargs.values()
    print(sum(values))

In [57]:
my_func_5(**my_dict)


6

A common general signature for functions is:


In [58]:
def func(pos_arg_1, pos_arg_2, named_arg_1='default arg 1', named_arg_2='default arg 2', *args, **kwargs):
    pass
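
To see how Python distributes the arguments of a call over such a signature, here is a minimal sketch. The function describe_call is a hypothetical helper written for this illustration, with a shortened signature:

```python
def describe_call(pos_arg, named_arg='default', *args, **kwargs):
    """Return the arguments as Python bound them (hypothetical helper)."""
    return pos_arg, named_arg, args, kwargs

# Only the required argument: the default is used, nothing is left over.
print(describe_call(1))

# Extra positional arguments end up in args, extra keywords in kwargs.
print(describe_call(1, 2, 3, 4, flag=True))

# A named argument can be passed by keyword, alongside unknown keywords.
print(describe_call(1, named_arg='explicit', other=0))
```

Note that extra positional arguments fill named_arg first: it only keeps its default when it is omitted or passed by keyword.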

Most commonly used operations on data structures


In [59]:
my_list.append('four')
my_list  # note that the list is a heterogeneous container


Out[59]:
[1, 2, 3, 'four']

In [60]:
my_list[0]


Out[60]:
1

In [69]:
my_list[:2]


Out[69]:
[1, 2]

In [70]:
my_list[1:3]


Out[70]:
[2, 3]

In [61]:
my_list.extend([5, 6, 7])
my_list


Out[61]:
[1, 2, 3, 'four', 5, 6, 7]

In [79]:
my_list[-1]  # negative indices count from the end of the list


Out[79]:
7

Ex.

Display the last two elements of the list without explicitly mentioning the total length of the list.

In [71]:
my_list[1:5:2]


Out[71]:
[2, 'four']
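
The slice syntax is start:stop:step, where stop is excluded and any of the three parts may be omitted. A short sketch on a separate, illustrative list:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

print(letters[1:5:2])   # from index 1 up to (not including) 5, every second item
print(letters[::3])     # whole list, every third item
print(letters[4:])      # from index 4 to the end
print(letters[::-1])    # a negative step walks the list backwards
```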

One very handy and expressive way to create lists is to use list comprehensions


In [81]:
my_list_comprehension = [i**2 for i in range(10)]
my_list_comprehension


Out[81]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
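
A list comprehension is a compact form of a loop that appends to a list, and it can also filter with a trailing if. A small sketch with illustrative names:

```python
# The explicit loop that the comprehension replaces
squares_loop = []
for i in range(10):
    squares_loop.append(i**2)

squares = [i**2 for i in range(10)]

# An optional condition keeps only some of the items
even_squares = [i**2 for i in range(10) if i % 2 == 0]

print(squares == squares_loop)
print(even_squares)
```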

In [62]:
my_dict.keys()


Out[62]:
['a', 'c', 'b']

In [63]:
my_dict.values()


Out[63]:
[1, 3, 2]

In [64]:
my_dict.items()


Out[64]:
[('a', 1), ('c', 3), ('b', 2)]

In [65]:
my_dict.update({'c': 3})
my_dict


Out[65]:
{'a': 1, 'b': 2, 'c': 3}

In [77]:
my_dict['c']


Out[77]:
3

In [78]:
my_dict['f']  # indexing with a missing key raises a KeyError


KeyError: 'f'

In [78]:
my_dict.get('f')  # wrap it in a print to see the return value


None

The get() method accepts a second argument; check it out by pressing Shift + Tab when the cursor is over it. What you see is actually the docstring of the method (hence the importance of writing these little pieces of documentation).
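
As a quick sketch of that second argument: it is the value returned when the key is missing, in place of None. The dictionary is redefined here so that the block runs on its own.

```python
my_dict = {'a': 1, 'b': 2, 'c': 3}

print(my_dict.get('a'))      # existing key: its value
print(my_dict.get('f'))      # missing key: None
print(my_dict.get('f', 0))   # missing key with a fallback value

# get() never modifies the dictionary
print('f' in my_dict)
```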


In [82]:
'f' in my_dict


Out[82]:
False

Not covered here

  • Dict comprehension
  • Sets
  • Tuples
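
For reference only, here is a very brief taste of these three features. It is a sketch with made-up names, not a substitute for a proper introduction:

```python
# Dict comprehension: like a list comprehension, with key: value pairs
squares_by_number = {n: n**2 for n in range(4)}

# Set: an unordered collection without duplicates
unique_letters = set('banana')

# Tuple: an immutable sequence, which can be unpacked into variables
point = (1.5, -2.0)
x_coord, y_coord = point

tuple_is_immutable = False
try:
    point[0] = 0.0           # item assignment is not allowed on tuples
except TypeError:
    tuple_is_immutable = True

print(squares_by_number, unique_letters, tuple_is_immutable)
```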

Classes and objects


In [66]:
class MyClass(object):
    """ Class docstring """
    my_class_attribute = "I'm shared among all the instances."

    def __init__(self):
        """ Constructor """
        self.my_instance_attribute = None

In [67]:
my_object = MyClass()  # constructor with no arguments
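
The difference between class attributes and instance attributes can be sketched as follows. Counter is a hypothetical class written for this illustration:

```python
class Counter(object):
    """Hypothetical example class."""
    step = 1                      # class attribute, shared by all instances

    def __init__(self, start=0):
        self.value = start        # instance attribute, one per object

    def increment(self):
        self.value += self.step   # the lookup falls back to the class attribute
        return self.value

first = Counter()
second = Counter(start=10)
print(first.increment(), second.increment())

Counter.step = 5                  # changes the step seen by every instance
print(first.increment(), second.increment())
```

Each object keeps its own value, while a change to Counter.step is visible from both of them.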

The Python scientific environment

NumPy


In [94]:
import numpy as np

Meet the ndarray class


In [186]:
a = np.arange(20)
a, type(a)


Out[186]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19]), numpy.ndarray)

In [187]:
a.reshape(5, 4)


Out[187]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [188]:
a = a.reshape(5, 2, 2)
a


Out[188]:
array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15]],

       [[16, 17],
        [18, 19]]])

In [189]:
a.dtype


Out[189]:
dtype('int64')

In [190]:
b = np.array([0.1, 0.2, 0.3])

In [191]:
b.dtype


Out[191]:
dtype('float64')

In [192]:
b.astype(complex)  # the np.complex alias was removed in NumPy 1.24; use the built-in complex


Out[192]:
array([ 0.1+0.j,  0.2+0.j,  0.3+0.j])

In [193]:
a.shape, b.shape


Out[193]:
((5, 2, 2), (3,))

In [194]:
print(a.ndim)
print(b.ndim)


3
1

Array creation


In [195]:
np.zeros( (4, 6) )


Out[195]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In [196]:
np.zeros_like(a)


Out[196]:
array([[[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]]])

In [197]:
np.ones(10)


Out[197]:
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

Basic operations


In [198]:
c = np.arange(10).reshape(2, -1)
c


Out[198]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [199]:
c + 10


Out[199]:
array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [200]:
c**2


Out[200]:
array([[ 0,  1,  4,  9, 16],
       [25, 36, 49, 64, 81]])

In [201]:
np.sin(c)


Out[201]:
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ],
       [-0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849]])

In [202]:
d = np.arange(10, 15)
d


Out[202]:
array([10, 11, 12, 13, 14])

In [203]:
print('{} + {} = \n\n{}'.format(c, d, c+d))


[[0 1 2 3 4]
 [5 6 7 8 9]] + [10 11 12 13 14] = 

[[10 12 14 16 18]
 [15 17 19 21 23]]
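
The sum c + d works although the shapes (2, 5) and (5,) differ: NumPy broadcasts the smaller array across the larger one. A minimal sketch, in which row_offsets is a name introduced here for illustration:

```python
import numpy as np

c = np.arange(10).reshape(2, -1)        # shape (2, 5)
d = np.arange(10, 15)                   # shape (5,)

# d is treated as a single row and added to every row of c
total = c + d
print(total.shape)

# a (2, 1) column is stretched across the 5 columns instead
row_offsets = np.array([[100], [200]])
shifted = c + row_offsets
print(shifted)

# shapes that cannot be aligned raise a ValueError
try:
    c + np.arange(3)
except ValueError as error:
    print('cannot broadcast:', error)
```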

In [204]:
c > 6


Out[204]:
array([[False, False, False, False, False],
       [False, False,  True,  True,  True]], dtype=bool)

In [205]:
c[c > 6]


Out[205]:
array([7, 8, 9])

In [211]:
c = c.T
c


Out[211]:
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

In [213]:
c[ [1, 3, 4] ]


Out[213]:
array([[1, 6],
       [3, 8],
       [4, 9]])

In [207]:
A = np.array( [[10  , 20],
               [30  , 40]] )
B = np.array( [[1   , 0.5],
               [1./3, 0.25]] )

In [208]:
A * B  # elementwise product


Out[208]:
array([[ 10.,  10.],
       [ 10.,  10.]])

In [209]:
np.dot(A, B)  # matrix product


Out[209]:
array([[ 16.66666667,  10.        ],
       [ 43.33333333,  25.        ]])
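
On more recent setups (assuming Python 3.5 or later and a NumPy version that supports it), the @ operator can be used for the matrix product. A short sketch comparing it with the two operations above:

```python
import numpy as np

A = np.array([[10, 20],
              [30, 40]])
B = np.array([[1, 0.5],
              [1. / 3, 0.25]])

elementwise = A * B        # multiplies entries at the same position
matrix_product = A @ B     # matrix product of the two 2-D arrays

# For 2-D arrays, @ gives the same result as np.dot
print(np.allclose(matrix_product, np.dot(A, B)))
print(np.allclose(elementwise, 10.0))
```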

Copies and views, for lists and arrays


In [229]:
list_1 = list(range(10))  # list() is needed in Python 3, where range() is lazy
list_1


Out[229]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [221]:
list_2 = list_1

In [222]:
id(list_1), id(list_2)


Out[222]:
(4387029072, 4387029072)

In [223]:
list_1 is list_2


Out[223]:
True

In [224]:
list_1 = list_2[:]

In [225]:
id(list_1), id(list_2)


Out[225]:
(4387375512, 4387029072)

In [233]:
# equivalently...
list_1 is list_2


Out[233]:
False

In [227]:
array_1 = np.arange(10)

In [236]:
array_2 = array_1

In [237]:
array_2 is array_1


Out[237]:
True

In [231]:
array_2 = array_1[:]  # unlike for lists, slicing an array returns a view, not a copy

In [232]:
array_2 is array_1


Out[232]:
False

In [235]:
array_2[-1] = 100  # writing through the view also modifies array_1
array_1


Out[235]:
array([  0,   1,   2,   3,   4,   5,   6,   7,   8, 100])

In [238]:
array_2 = array_1.view()

In [239]:
array_2 is array_1


Out[239]:
False

In [240]:
array_2.base is array_1


Out[240]:
True

If you want a deep copy, use the .copy() method.
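
The behaviours shown in this section can be put side by side in one small sketch. The variable names are illustrative, and np.shares_memory is assumed to be available (it is in reasonably recent NumPy versions):

```python
import numpy as np

# Lists: slicing makes a new, independent list
numbers = list(range(5))
sliced_list = numbers[:]
sliced_list[0] = -1          # numbers is left untouched

# Arrays: slicing makes a view on the same data
original = np.arange(5)
view = original[:]
copy = original.copy()       # an independent array with its own data

view[0] = -1                 # visible through original
copy[1] = -2                 # not visible through original

print(numbers)
print(original)
print(np.shares_memory(original, view), np.shares_memory(original, copy))
```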

For a more comprehensive introduction to NumPy, you can read the official tutorial.

Plotting

Scikit-learn


In [241]:
from sklearn import datasets

In [243]:
from sklearn.ensemble import AdaBoostClassifier

In [242]:
digits = datasets.load_digits()

In [246]:
X, y = digits.data, digits.target

In [247]:
clf = AdaBoostClassifier(n_estimators=100)

In [248]:
clf.fit(X, y)


Out[248]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=1.0, n_estimators=100, random_state=None)

In [250]:
clf.score(X, y)  # mean accuracy, computed here on the training data itself


Out[250]:
0.42682248191430161
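
The score above is computed on the same samples the classifier was trained on, so it says little about how the model generalizes. A common remedy, sketched here under the assumption of a scikit-learn version that provides sklearn.model_selection, is to hold out part of the data for evaluation. The split ratio, the random_state values and the variable names are illustrative choices, not part of the original notebook.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Keep 25% of the samples aside; they are never seen during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

train_score = clf.score(X_train, y_train)   # mean accuracy on seen data
test_score = clf.score(X_test, y_test)      # mean accuracy on unseen data
print(train_score, test_score)
```

The exact numbers depend on the scikit-learn version and on the split, so none are quoted here.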
