Python NRT

(Not a Real Tutorial)

A brief tour of Python 2.7 and the pandas library

Python language

Good language to start programming with
Simple, powerful, mature
Easy to read, intuitive

>>> print "how "+"are you?"
how are you?

Running Python

From Python console

$ python
Python 2.7.12 (default, Jun 29 2016, 14:05:02)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Run a python script

$ python myprogram.py

Use an interactive web console like Jupyter

$ jupyter notebook

Syntax

No statement termination character (no semicolon needed at the end of a line)



In [1]:

    
name = "Pepe"

Blocks specified by indentation (not braces nor brackets)
First statement of a block ends in colon (:)



In [2]:

    
def myfunction(x):
    pass
    if x > 10:
        pass
        pass
        return "bigger"
    else:
        pass
        pass
        return "smaller"
    
print "This number is: " + myfunction(5)

This number is: smaller

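Comments start with the number/hash symbol (#); a bare triple-quoted string (""") is often used as a multi-line comment, although it is really a string literal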



In [3]:

"""This is a comment that spans
more than one line"""
# This is a one line comment
print "This line is executed"

This line is executed
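A minimal sketch of how the two styles differ: a # comment is discarded, while a triple-quoted string placed first in a function body is kept as that function's docstring. The greet function is a made-up example, and print() is written with parentheses so the snippet should run under both Python 2.7 and Python 3.

```python
def greet(name):
    """Return a greeting (this string becomes the docstring)."""
    # A real comment: the interpreter ignores it completely.
    return "Hello, " + name

message = greet("Pepe")
docstring = greet.__doc__  # the triple-quoted string is still reachable
print(message)
print(docstring)
```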

Modules



In [4]:

    
import pandas as pd
from time import clock
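The usual import forms, sketched here with the standard-library math module (picked only because it needs no installation; it is not used elsewhere in this tour):

```python
import math                    # whole module, accessed as math.<name>
import math as m               # same module under a shorter alias
from math import sqrt, pi      # only selected names

root = sqrt(16)
same_module = m is math        # an alias points at the very same module
print(root)
print(math.floor(pi))
```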

Lists and selections



In [5]:

    
months = ["Jan", "Feb", 3, 4, "May", "Jun"]
print months[0]

Jan



In [6]:

    
print months[1:3]  # slice operator :

['Feb', 3]



In [7]:

    
print months[-2:]

['May', 'Jun']
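A few more slicing forms, as a small sketch. The list below is an all-string variant of months used only for illustration, and print() is written with parentheses so the snippet runs under Python 2.7 and 3 alike.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

first_three = months[:3]    # start defaults to 0; the stop index is excluded
every_other = months[::2]   # a third value sets the step
last = months[-1]           # negative indices count from the end

months.append("Jul")        # lists are mutable: this changes months in place
print(first_three)
print(every_other)
print(last)
print(len(months))
```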

Tuples

Similar to lists: an ordered sequence of elements, but immutable (it cannot be modified once created).



In [8]:

    
tup = ('physics', 'chemistry', 1997, 2000)
print tup[0]

physics



In [9]:

    
print tup[1:3]

('chemistry', 1997)
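What "immutable" means in practice, as a minimal sketch: assigning to an item raises TypeError, while concatenation builds a new tuple and leaves the original untouched. The names modified and longer are illustrative.

```python
tup = ('physics', 'chemistry', 1997, 2000)

try:
    tup[0] = 'biology'      # item assignment is not allowed on a tuple
    modified = True
except TypeError:
    modified = False

longer = tup + ('maths',)   # concatenation builds a new tuple instead
print(modified)
print(longer)
```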

Functions & Methods



In [10]:

    
"""functions are pieces of code that you can 
call/execute, they are defined with the def keyword"""

def hola_mundo():
    print "Hola Mundo!"



In [11]:

"""methods are attributes of an object that
you can call on the object with a dot (.)"""

s = "How are you"
print s.split(" ")

['How', 'are', 'you']
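Functions usually take parameters and return a value rather than printing. A small sketch that combines a function definition with string methods; shout and its times parameter are hypothetical names chosen for this example.

```python
def shout(text, times=2):
    """Return text in upper case, repeated `times` times."""
    # upper() and join() are string methods, called with the dot syntax
    return " ".join([text.upper()] * times)

once = shout("hola", times=1)
twice = shout("hola")        # times falls back to its default value, 2
print(once)
print(twice)
```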

Control flow

Loops (while, for)



In [12]:

    
for numbers in range(1,5):
    print numbers
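The for loop above prints 1 to 4, because range excludes its stop value. A short sketch of both loop kinds, collecting values in lists so the result is easy to inspect (collected and countdown are illustrative names):

```python
collected = []
for number in range(1, 5):   # 1, 2, 3, 4: the stop value 5 is excluded
    collected.append(number)

countdown = []
n = 3
while n > 0:                 # repeats until the condition becomes false
    countdown.append(n)
    n -= 1

print(collected)
print(countdown)
```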

Conditionals (if, elif, else)



In [13]:

    
united_kingdom = ["England", "Scotland", "Wales", "N Ireland"]
one = "France"

if one in united_kingdom:
    print "UK"
elif one == "France":
    print "Not UK. Bonjour!"
else:
    print "Not UK"

Not UK. Bonjour!

Help!

"house".len()?

len("house")?



In [14]:

    
help(len)

Help on built-in function len in module __builtin__:

len(...)
    len(object) -> integer
    
    Return the number of items of a sequence or collection.



In [15]:

    
len("house")

Out[15]:

5
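To settle the question: strings have no len method. The length comes from the built-in len function, which in turn calls the object's __len__ method. A quick check, as a sketch:

```python
word = "house"

has_len_method = hasattr(word, "len")  # str defines no .len attribute
length = len(word)                     # the built-in function is the way
via_dunder = word.__len__()            # what len() calls under the hood

print(has_len_method)
print(length)
```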



In [16]:

    
help(list)

Help on class list in module __builtin__:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |  
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __delslice__(...)
 |      x.__delslice__(i, j) <==> del x[i:j]
 |      
 |      Use of negative indices is not supported.
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __getslice__(...)
 |      x.__getslice__(i, j) <==> x[i:j]
 |      
 |      Use of negative indices is not supported.
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __iadd__(...)
 |      x.__iadd__(y) <==> x+=y
 |  
 |  __imul__(...)
 |      x.__imul__(y) <==> x*=y
 |  
 |  __init__(...)
 |      x.__init__(...) initializes x; see help(type(x)) for signature
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  
 |  __mul__(...)
 |      x.__mul__(n) <==> x*n
 |  
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __reversed__(...)
 |      L.__reversed__() -- return a reverse iterator over the list
 |  
 |  __rmul__(...)
 |      x.__rmul__(n) <==> n*x
 |  
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
 |  
 |  __setslice__(...)
 |      x.__setslice__(i, j, y) <==> x[i:j]=y
 |      
 |      Use  of negative indices is not supported.
 |  
 |  __sizeof__(...)
 |      L.__sizeof__() -- size of L in memory, in bytes
 |  
 |  append(...)
 |      L.append(object) -- append object to end
 |  
 |  count(...)
 |      L.count(value) -> integer -- return number of occurrences of value
 |  
 |  extend(...)
 |      L.extend(iterable) -- extend list by appending elements from the iterable
 |  
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |  
 |  insert(...)
 |      L.insert(index, object) -- insert object before index
 |  
 |  pop(...)
 |      L.pop([index]) -> item -- remove and return item at index (default last).
 |      Raises IndexError if list is empty or index is out of range.
 |  
 |  remove(...)
 |      L.remove(value) -- remove first occurrence of value.
 |      Raises ValueError if the value is not present.
 |  
 |  reverse(...)
 |      L.reverse() -- reverse *IN PLACE*
 |  
 |  sort(...)
 |      L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
 |      cmp(x, y) -> -1, 0, 1
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

PANDAS LIBRARY

An open-source library providing high-performance data structures and data analysis tools for the Python programming language.

Import



In [17]:

    
import pandas as pd

Structures

Pandas Series



In [18]:

    
ss = pd.Series([1,2,3], 
              index = ['a','b','c'])
ss

Out[18]:

a    1
b    2
c    3
dtype: int64

Selection



In [19]:

    
ss = pd.Series([1,2,3], 
              index = ['a','b','c'])
print ss[0]       # as a list
print ss.iloc[0]  # by position, integer
print ss.loc['a'] # by label of the index
print ss.ix['a']  # label (priority)
print ss.ix[0]    # position if no label



In [20]:

    
"""Be careful with the slice operator 
using positions or labels"""

print ss.iloc[0:2] # positions 0,1
print ss.loc['a':'c'] # labels 'a','b','c'

a    1
b    2
dtype: int64
a    1
b    2
c    3
dtype: int64
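The difference shown above is that a position slice excludes its stop while a label slice includes it. A minimal sketch that selects the same two elements both ways, assuming pandas is installed; print() is written with parentheses so it runs under Python 2.7 and 3.

```python
import pandas as pd

ss = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

by_position = ss.iloc[0:2]   # positions 0 and 1; stop position excluded
by_label = ss.loc['a':'b']   # labels 'a' and 'b'; stop label included

print(list(by_position.index))
print(list(by_label.index))
print(by_position.equals(by_label))
```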

Built-in methods



In [21]:

    
pd.Series([1, 2, 3]).mean()

Out[21]:

2.0



In [22]:

    
pd.Series([1, 2, 3]).sum()

Out[22]:

6



In [23]:

    
pd.Series([1, 2, 3]).std()

Out[23]:

1.0
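std() returns 1.0 here because pandas computes the sample standard deviation (ddof=1) by default. A short sketch comparing it with the population version, assuming pandas is installed:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

sample_std = s.std()              # default ddof=1: divides by n - 1
population_std = s.std(ddof=0)    # ddof=0: divides by n

print(sample_std)
print(population_std)
```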

Pandas Dataframe



In [24]:

    
df = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['row1', 'row2', 'row3'],
    columns=['col1', 'col2', 'col3'])
df

Selection

Select columns



In [25]:

    
df['col1']  # one col => Series

Out[25]:

row1    1
row2    4
row3    7
Name: col1, dtype: int64



In [26]:

    
df[['col1']] # list of cols => DataFrame

Select rows



In [27]:

    
df.loc['row1'] # by row using label

Out[27]:

col1    1
col2    2
col3    3
Name: row1, dtype: int64



In [28]:

    
df.iloc[0]  # by row using position

Out[28]:

col1    1
col2    2
col3    3
Name: row1, dtype: int64



In [29]:

    
df.ix['row1']   # by row, using label
print df.ix[0]  # by row, using position

col1    1
col2    2
col3    3
Name: row1, dtype: int64

Combined selection



In [30]:

    
print df.loc['row1',['col1', 'col3']] # labels
print df.loc[['row1','row3'],'col1' : 'col3']

col1    1
col3    3
Name: row1, dtype: int64
      col1  col2  col3
row1     1     2     3
row3     7     8     9



In [31]:

    
df.iloc[0:2,[0,2]]  # row position 0,1



In [32]:

    
print df.ix[0,['col2','col3']] # position & label
print df.ix['row1':'row3', :]

col2    2
col3    3
Name: row1, dtype: int64
      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9

Should I always use .ix?

The .ix selector gotcha! (Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0; prefer .loc and .iloc.)



In [33]:

    
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

print df2.ix[1] # priority is label
# df2.ix[0]  ERROR!! (no label 0, and .ix does not fall back to position on an integer index)

col1    1
col2    2
col3    3
Name: 1, dtype: int64



In [34]:

    
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

print df2.ix[1:3] # LABELS!! (1,2,3)

   col1  col2  col3
1     1     2     3
2     4     5     6
3     7     8     9



In [35]:

    
# same data, but NOT the same labels: df2 is labelled 1-3, df3 gets the default 0-2
df2 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=[1, 2, 3])

df3 = pd.DataFrame(
    data =[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df3
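One way to sidestep the ambiguity, sketched under the assumption that pandas is installed: state the intent explicitly with .loc (labels only) or .iloc (positions only). The variable names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[1, 2, 3],
    columns=['col1', 'col2', 'col3'])

by_label = df2.loc[1]       # the row whose label is 1 (the first row)
by_position = df2.iloc[1]   # the row at position 1 (the second row)

try:
    df2.loc[0]              # there is no label 0
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(list(by_label))
print(list(by_position))
print(label_zero_exists)
```

With an integer index the two selectors give different rows for the same number, which is exactly the case where being explicit pays off.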

DataFrame Selection Summary



In [36]:

    
df['col1']     # by columns
df.loc['row1'] # by row, using label
df.iloc[0]     # by row, using position
df.ix['row2']  # by row, using label
df.ix[1]       # by row, using position

Out[36]:

col1    4
col2    5
col3    6
Name: row2, dtype: int64

Built-in methods



In [37]:

    
df.mean() # operates by columns (axis=0)

Out[37]:

col1    4.0
col2    5.0
col3    6.0
dtype: float64

Pandas Axis

axis (number)	axis (name)	computed along	one result for each
axis=0	axis="index"	along the rows	for each column
axis=1	axis="columns"	along the columns	for each row



In [38]:

    
df2 = pd.DataFrame(
    data =[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])
df2



In [39]:

    
df2.mean(axis=1) # mean for each row

Out[39]:

0    1.5
1    4.5
2    7.5
dtype: float64
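The numeric and named forms of axis are interchangeable. A minimal sketch computing both directions on the same small DataFrame, assuming pandas is installed (per_column and per_row are illustrative names):

```python
import pandas as pd

df2 = pd.DataFrame(
    data=[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])

per_column = df2.mean(axis="index")     # same as axis=0: one mean per column
per_row = df2.mean(axis="columns")      # same as axis=1: one mean per row

print(list(per_column))
print(list(per_row))
```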



In [40]:

    
df2 = pd.DataFrame(
    data =[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])
df2



In [41]:

    
df2.drop("A", axis=1) # drop column "A" (axis=1 refers to column labels)
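For drop, axis says where the label should be looked up: axis=1 among the column labels, axis=0 among the row labels. A short sketch, assuming pandas is installed, which also shows that drop returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

df2 = pd.DataFrame(
    data=[[1, 2], [4, 5], [7, 8]],
    columns=["A", "B"])

without_a = df2.drop("A", axis=1)      # column label "A" removed
without_row0 = df2.drop(0, axis=0)     # row label 0 removed

print(list(without_a.columns))
print(list(without_row0.index))
print(list(df2.columns))               # df2 itself is unchanged
```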