Python NRT

(Not a Real Tutorial)


A brief tour of Python 2.7 and the Pandas library

Python language

  • Good language to start programming with
  • Simple, powerful, mature
  • Easy to read, intuitive
>>> print "how "+"are you?"
how are you?

Running Python

  • From Python console
$ python
Python 2.7.12 (default, Jun 29 2016, 14:05:02)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
  • Run a Python script
$ python myprogram.py
  • Use an interactive web console like Jupyter
$ jupyter notebook
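
As a rough illustration of the second option, a script such as myprogram.py could look like the sketch below. The file contents, the main function and the message are all hypothetical; the slides only give the file name. The code is written so that it runs on Python 2.7 and Python 3 alike.

```python
# myprogram.py (hypothetical contents, for illustration only)

def main():
    # Build and print a message, then return it so it can be checked.
    message = "Hello from a script"
    print(message)
    return message

# This guard is true when the file is run with "python myprogram.py"
# and false when the file is imported as a module.
if __name__ == "__main__":
    main()
```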

Syntax

  • No termination character

In [1]:
name = "Pepe"
  • Blocks specified by indentation (not braces nor brackets)
  • First statement of a block ends in colon (:)

In [2]:
def myfunction(x):
    pass
    if x > 10:
        pass
        pass
        return "bigger"
    else:
        pass
        pass
        return "smaller"
    
print "This number is: " + myfunction(5)


This number is: smaller
  • Comments use the hash symbol (#); triple-quoted strings (""") are often used as multi-line comments

In [3]:
"""This is a comment that spans
more than one line"""
# This is a one line comment
print "This line is executed"


This line is executed
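
Strictly speaking, only # starts a comment. A triple-quoted string is an ordinary string expression, and when it is the first statement of a function it is kept as that function's docstring, which is what help() displays. A minimal sketch, where double is a made-up function used only for illustration:

```python
def double(x):
    """Return x multiplied by two."""
    # This hash comment is ignored by the interpreter.
    return x * 2

# The triple-quoted string survives as the docstring; the hash comment does not.
doc = double.__doc__
print(doc)
```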

Modules


In [4]:
import pandas as pd
from time import clock
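
The import forms differ only in which name ends up in your namespace. A small sketch using the standard-library math module (chosen here for illustration; it does not appear in the slides):

```python
import math             # bind the module; members are reached as math.<name>
import math as m        # the same module under a shorter alias
from math import sqrt   # bind one name directly

root_a = math.sqrt(16)
root_b = m.sqrt(16)
root_c = sqrt(16)

# Both names refer to the very same module object.
same_module = math is m
print(root_a)
```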

Lists and selections


In [5]:
months = ["Jan", "Feb", 3, 4, "May", "Jun"]
print months[0]


Jan

In [6]:
print months[1:3]  # slice operator :


['Feb', 3]

In [7]:
print months[-2:]


['May', 'Jun']
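
A few more forms of the slice operator, sketched on the same list. The variable names are illustrative only. A slice always returns a new list and leaves the original untouched.

```python
months = ["Jan", "Feb", 3, 4, "May", "Jun"]

first_three = months[:3]      # start omitted: from the beginning
every_other = months[::2]     # third field is the step
last_item = months[-1]        # negative index counts from the end
reversed_copy = months[::-1]  # step of -1 walks backwards

print(first_three)
print(every_other)
print(last_item)
print(reversed_copy)
```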

Tuples

Similar to lists: a sequence of elements, but immutable (it cannot be changed once created).


In [8]:
tup = ('physics', 'chemistry', 1997, 2000)
print tup[0]


physics

In [9]:
print tup[1:3]


('chemistry', 1997)
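
To see what immutable means in practice, the sketch below tries to assign to a tuple element and catches the resulting TypeError. Converting to a list first is one way to get a modifiable copy. The names lst and changed are illustrative.

```python
tup = ('physics', 'chemistry', 1997, 2000)

# A list built from the tuple can be modified freely.
lst = list(tup)
lst[0] = 'biology'

# Assigning to a tuple element raises TypeError.
try:
    tup[0] = 'biology'
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple: " + str(err))
```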

Functions & Methods


In [10]:
"""Functions are pieces of code that you can
call/execute; they are defined with the def keyword."""

def hola_mundo():
    print "Hola Mundo!"

In [11]:
""" Methods are attributes of an object that
you can call on the object with a "." """

s = "How are you" 
print s.split(" ")


['How', 'are', 'you']
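
Functions usually take arguments and return a value rather than printing directly. A minimal sketch, where greet and its greeting parameter are hypothetical names, followed by a method call on the returned string:

```python
def greet(name, greeting="Hola"):
    """Build a greeting; `greeting` falls back to a default value."""
    return greeting + " " + name + "!"

message = greet("Mundo")                   # uses the default greeting
custom = greet("World", greeting="Hello")  # overrides it by keyword

# upper() is a method: it is called on the string object with a "."
shout = message.upper()
print(shout)
```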

Control flow

  • Loops (while, for)

In [12]:
for numbers in range(1,5):
    print numbers


1
2
3
4
  • Conditionals (if, elif, else)

In [13]:
united_kingdom = ["England", "Scotland", "Wales", "N Ireland"]
one = "France"

if one in united_kingdom:
    print "UK"
elif one == "France":
    print "Not UK. Bonjour!"
else:
    print "Not UK"


Not UK. Bonjour!
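
The loop list above names while, but only for is demonstrated. A short sketch of a while loop, plus break to leave a loop early. The variable names are illustrative.

```python
# Repeat while the condition holds.
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1
print(countdown)

# "while True" runs until a break statement is reached.
total = 0
step = 1
while True:
    total += step
    if total > 10:
        break
    step += 1
print("Stopped at step " + str(step) + " with total " + str(total))
```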

Help!

"house".len()?

len("house")?


In [14]:
help(len)


Help on built-in function len in module __builtin__:

len(...)
    len(object) -> integer
    
    Return the number of items of a sequence or collection.


In [15]:
len("house")


Out[15]:
5
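
So the answer is the built-in function len, not a method called len on the string. As a sketch of what happens underneath, len(x) delegates to the object's __len__ special method, the same one listed in the help(list) output:

```python
word = "house"

has_len_method = hasattr(word, "len")  # strings have no .len attribute
length = len(word)                     # the built-in function
same = word.__len__()                  # what len() calls under the hood

print(has_len_method)
print(length)
```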

In [16]:
help(list)


Help on class list in module __builtin__:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |  
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __delslice__(...)
 |      x.__delslice__(i, j) <==> del x[i:j]
 |      
 |      Use of negative indices is not supported.
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __getslice__(...)
 |      x.__getslice__(i, j) <==> x[i:j]
 |      
 |      Use of negative indices is not supported.
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __iadd__(...)
 |      x.__iadd__(y) <==> x+=y
 |  
 |  __imul__(...)
 |      x.__imul__(y) <==> x*=y
 |  
 |  __init__(...)
 |      x.__init__(...) initializes x; see help(type(x)) for signature
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  
 |  __mul__(...)
 |      x.__mul__(n) <==> x*n
 |  
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __reversed__(...)
 |      L.__reversed__() -- return a reverse iterator over the list
 |  
 |  __rmul__(...)
 |      x.__rmul__(n) <==> n*x
 |  
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
 |  
 |  __setslice__(...)
 |      x.__setslice__(i, j, y) <==> x[i:j]=y
 |      
 |      Use  of negative indices is not supported.
 |  
 |  __sizeof__(...)
 |      L.__sizeof__() -- size of L in memory, in bytes
 |  
 |  append(...)
 |      L.append(object) -- append object to end
 |  
 |  count(...)
 |      L.count(value) -> integer -- return number of occurrences of value
 |  
 |  extend(...)
 |      L.extend(iterable) -- extend list by appending elements from the iterable
 |  
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |  
 |  insert(...)
 |      L.insert(index, object) -- insert object before index
 |  
 |  pop(...)
 |      L.pop([index]) -> item -- remove and return item at index (default last).
 |      Raises IndexError if list is empty or index is out of range.
 |  
 |  remove(...)
 |      L.remove(value) -- remove first occurrence of value.
 |      Raises ValueError if the value is not present.
 |  
 |  reverse(...)
 |      L.reverse() -- reverse *IN PLACE*
 |  
 |  sort(...)
 |      L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
 |      cmp(x, y) -> -1, 0, 1
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

PANDAS LIBRARY

An open-source library providing high-performance data structures and data analysis tools for the Python programming language.

Import


In [17]:
import pandas as pd

Structures

Pandas Series


In [18]:
ss = pd.Series([1,2,3], 
              index = ['a','b','c'])
ss


Out[18]:
a    1
b    2
c    3
dtype: int64

Selection


In [19]:
ss = pd.Series([1,2,3], 
              index = ['a','b','c'])
print ss[0]       # as a list
print ss.iloc[0]  # by position, integer
print ss.loc['a'] # by label of the index
print ss.ix['a']  # label (priority)
print ss.ix[0]    # position if no label


1
1
1
1
1

In [20]:
"""Be careful with the slice operator 
using positions or labels"""

print ss.iloc[0:2] # positions 0,1
print ss.loc['a':'c'] # labels 'a','b','c'


a    1
b    2
dtype: int64
a    1
b    2
c    3
dtype: int64

Built-in methods


In [21]:
pd.Series([1, 2, 3]).mean()


Out[21]:
2.0

In [22]:
pd.Series([1, 2, 3]).sum()


Out[22]:
6

In [23]:
pd.Series([1, 2, 3]).std()


Out[23]:
1.0
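
The 1.0 may look surprising: Series.std() computes the sample standard deviation (it divides by n - 1, the ddof=1 default). A plain-Python sketch of both variants for the same values, using only the math module:

```python
import math

values = [1, 2, 3]
mean = sum(values) / float(len(values))

# Sum of squared deviations from the mean.
squared = sum((v - mean) ** 2 for v in values)

sample_std = math.sqrt(squared / (len(values) - 1))  # divides by n - 1
population_std = math.sqrt(squared / len(values))    # divides by n

print(sample_std)
print(population_std)
```

With pandas, the population value should correspond to calling std(ddof=0).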

Pandas DataFrame


In [24]:
df = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['row1', 'row2', 'row3'],
    columns=['col1', 'col2', 'col3'])
df


Out[24]:
      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9

Selection

Select columns


In [25]:
df['col1']  # one col => Series


Out[25]:
row1    1
row2    4
row3    7
Name: col1, dtype: int64

In [26]:
df[['col1']] # list of cols => DataFrame


Out[26]:
      col1
row1     1
row2     4
row3     7

Select rows


In [27]:
df.loc['row1'] # by row using label


Out[27]:
col1    1
col2    2
col3    3
Name: row1, dtype: int64

In [28]:
df.iloc[0]  # by row using position


Out[28]:
col1    1
col2    2
col3    3
Name: row1, dtype: int64

In [29]:
df.ix['row1']   # by row, using label
print df.ix[0]  # by row, using position


col1    1
col2    2
col3    3
Name: row1, dtype: int64

Combined selection


In [30]:
print df.loc['row1',['col1', 'col3']] # labels
print df.loc[['row1','row3'],'col1' : 'col3']


col1    1
col3    3
Name: row1, dtype: int64
      col1  col2  col3
row1     1     2     3
row3     7     8     9

In [31]:
df.iloc[0:2,[0,2]]  # row position 0,1


Out[31]:
      col1  col3
row1     1     3
row2     4     6

In [32]:
print df.ix[0,['col2','col3']] # position & label
print df.ix['row1':'row3', :]


col2    2
col3    3
Name: row1, dtype: int64
      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9

Should I always use .ix?

.ix selector gotcha!


In [33]:
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

print df2.ix[1] # priority is label
# df2.ix[0]  raises KeyError: with an integer index, .ix only looks up labels


col1    1
col2    2
col3    3
Name: 1, dtype: int64

In [34]:
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

print df2.ix[1:3] # LABELS!! (1,2,3)


   col1  col2  col3
1     1     2     3
2     4     5     6
3     7     8     9
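
One way to sidestep the ambiguity is to say explicitly whether you mean labels (.loc) or positions (.iloc). This matters on newer pandas as well, since .ix was deprecated in pandas 0.20 and removed in 1.0. A sketch on the same integer-labelled frame, assuming pandas is installed; the result variable names are illustrative:

```python
import pandas as pd

df2 = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

by_label = df2.loc[1]           # row whose label is 1 (the first row)
by_position = df2.iloc[1]       # row at position 1 (the second row)

label_slice = df2.loc[1:3]      # labels 1, 2, 3: end label included
position_slice = df2.iloc[1:3]  # positions 1, 2: end position excluded

print(label_slice)
print(position_slice)
```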

In [35]:
# same data, but NOT the same labels: df2 is labelled 1..3, df3 defaults to 0..2
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=[1, 2, 3])

df3 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df3


Out[35]:
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

DataFrame Selection Summary


In [36]:
df['col1']     # by columns
df.loc['row1'] # by row, using label
df.iloc[0]     # by row, using position
df.ix['row2']  # by row, using label
df.ix[1]       # by row, using position


Out[36]:
col1    4
col2    5
col3    6
Name: row2, dtype: int64

Built-in methods


In [37]:
df.mean() # operates by columns (axis=0)


Out[37]:
col1    4.0
col2    5.0
col3    6.0
dtype: float64

Pandas Axis

axis     axis (name)       along          for each
axis=1   axis="columns"    the columns    row
axis=0   axis="index"      the rows       column

In [38]:
df2 = pd.DataFrame(
    data =[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])
df2


Out[38]:
   A  B
0  1  2
1  4  5
2  7  8

In [39]:
df2.mean(axis=1) # mean for each row


Out[39]:
0    1.5
1    4.5
2    7.5
dtype: float64

In [40]:
df2 = pd.DataFrame(
    data =[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])
df2


Out[40]:
   A  B
0  1  2
1  4  5
2  7  8

In [41]:
df2.drop("A", axis=1) # drop column "A" (axis=1 => columns)


Out[41]:
   B
0  2
1  5
2  8
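
To make the axis convention concrete, the sketch below applies sum along both axes of the same frame and drops a row instead of a column. It assumes pandas is installed; the result variable names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    data=[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])

col_totals = df2.sum(axis=0)  # along the rows: one total per column
row_totals = df2.sum(axis=1)  # along the columns: one total per row

# axis=0 makes drop look for the label in the index, so a row is removed.
without_first_row = df2.drop(0, axis=0)

print(col_totals)
print(row_totals)
print(without_first_row)
```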

Let's do it!!