Purpose of This Notebook

how to use apply on a pandas Series and DataFrame
show a bit about how lambda functions work



In [1]:

    
# numpy and pandas related imports 

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

Setup: create Series and DataFrames

Let's make two Series and a DataFrame to use for our example



In [2]:

    
# for example, using lower and uppercase English letters

import string
string.lowercase, string.uppercase









    Out[2]:





('abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')



In [3]:

    
# we can make a list composed of the individual lowercase letters 

list(string.lowercase)









    Out[3]:





['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']



In [4]:

    
# create a pandas Series out of the list of lowercase letters

lower = Series(list(string.lowercase), name='lower')
print type(lower)
lower.head()









    



<class 'pandas.core.series.Series'>






    Out[4]:





0    a
1    b
2    c
3    d
4    e
Name: lower, dtype: object



In [5]:

    
# create a pandas Series out of the list of lowercase letters

upper = Series(list(string.uppercase), name='upper')



In [6]:

    
# concatenate the two Series as columns, using axis=1 
# axis = 0 would result in two rows in the DataFrame

df = pd.concat((lower, upper), axis=1)
df.head()









    Out[6]:






  
    
      
      lower
      upper
    
  
  
    
      0
       a
       A
    
    
      1
       b
       B
    
    
      2
       c
       C
    
    
      3
       d
       D
    
    
      4
       e
       E
    
  

5 rows × 2 columns

Using apply

Series.apply

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html:

Series.apply(func, convert_dtype=True, args=(), **kwds)

Invoke function on values of Series.



In [7]:

    
# Let's start by using Series.apply
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html

# first of all, it's useful to find a way to use apply to return the exact same Series

def identity(s):
    return s

lower.apply(identity)









    Out[7]:





0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: lower, dtype: object



In [8]:

    
# show that identity yields the same Series -- first on element by element basis

lower.apply(identity) == lower









    Out[8]:





0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
11    True
12    True
13    True
14    True
15    True
16    True
17    True
18    True
19    True
20    True
21    True
22    True
23    True
24    True
25    True
Name: lower, dtype: bool



In [9]:

    
# Check that match happens for every element in the Series using numpy.all
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html

np.all(lower.apply(identity) == lower)









    Out[9]:





True

Let's use `lambda`

Sometimes it's convenient to write functions using lambda, especially short functions for doing a simple transformation of the parameters. Only some functions can be rewritten with lambda.



In [10]:

    
def add_preface(s):
    return 'letter ' + s

lower.apply(add_preface)









    Out[10]:





0     letter a
1     letter b
2     letter c
3     letter d
4     letter e
5     letter f
6     letter g
7     letter h
8     letter i
9     letter j
10    letter k
11    letter l
12    letter m
13    letter n
14    letter o
15    letter p
16    letter q
17    letter r
18    letter s
19    letter t
20    letter u
21    letter v
22    letter w
23    letter x
24    letter y
25    letter z
Name: lower, dtype: object



In [11]:

    
# rewrite with lambda

lower.apply(lambda s: 'letter ' + s)









    Out[11]:





0     letter a
1     letter b
2     letter c
3     letter d
4     letter e
5     letter f
6     letter g
7     letter h
8     letter i
9     letter j
10    letter k
11    letter l
12    letter m
13    letter n
14    letter o
15    letter p
16    letter q
17    letter r
18    letter s
19    letter t
20    letter u
21    letter v
22    letter w
23    letter x
24    letter y
25    letter z
Name: lower, dtype: object

Another illustration of apply

Another illustration of using apply -- using ord and chr



In [12]:

    
# ord: Given a string of length one, return an integer representing the Unicode code 
# point of the character when the argument is a unicode object, or the value of the 
# byte when the argument is an 8-bit string. 
# http://docs.python.org/2.7/library/functions.html#ord

ord('a')









    Out[12]:





97



In [13]:

    
# chr: Return a string of one character whose ASCII code is the integer i.
# http://docs.python.org/2.7/library/functions.html#chr

chr(97)









    Out[13]:





'a'



In [14]:

    
# show that for the case of 'a', chr(ord()) returns what we start with:'a'

chr(ord('a')) == 'a'









    Out[14]:





True



In [15]:

    
# we can test whether chr reverses ord for all the lower case letters
# note how we chain two apply together

np.all(lower.apply(ord).apply(chr) == lower)









    Out[15]:





True

Note that we read off a specific series from the DataFrame



In [16]:

    
type(df.upper)









    Out[16]:





pandas.core.series.Series



In [17]:

    
# transform
df.upper.apply(lambda s: s.lower())









    Out[17]:





0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: upper, dtype: object

DataFrame.apply

apply can also be applied to a DataFrame

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.



In [18]:

    
# let's show that whether we use apply on columns (axis=0) or rows (axis=1), we get the same 
# result

def identity(s):
    return s

np.all(df.apply(identity, axis=0) == df.apply(identity, axis=1))









    Out[18]:





True



In [19]:

    
# for each column, first lower and then upper, return the index

def index(s):
    return s.index

df.apply(index, axis=0)









    Out[19]:






  
    
      
      lower
      upper
    
  
  
    
      0 
        0
        0
    
    
      1 
        1
        1
    
    
      2 
        2
        2
    
    
      3 
        3
        3
    
    
      4 
        4
        4
    
    
      5 
        5
        5
    
    
      6 
        6
        6
    
    
      7 
        7
        7
    
    
      8 
        8
        8
    
    
      9 
        9
        9
    
    
      10
       10
       10
    
    
      11
       11
       11
    
    
      12
       12
       12
    
    
      13
       13
       13
    
    
      14
       14
       14
    
    
      15
       15
       15
    
    
      16
       16
       16
    
    
      17
       17
       17
    
    
      18
       18
       18
    
    
      19
       19
       19
    
    
      20
       20
       20
    
    
      21
       21
       21
    
    
      22
       22
       22
    
    
      23
       23
       23
    
    
      24
       24
       24
    
    
      25
       25
       25
    
  

26 rows × 2 columns



In [20]:

    
# for each row (axis=1), first lower and then upper, return the index 
# (which are the column names)

def index(s):
    return s.index

df.apply(index, axis=1)









    Out[20]:






  
    
      
      lower
      upper
    
  
  
    
      0 
       lower
       upper
    
    
      1 
       lower
       upper
    
    
      2 
       lower
       upper
    
    
      3 
       lower
       upper
    
    
      4 
       lower
       upper
    
    
      5 
       lower
       upper
    
    
      6 
       lower
       upper
    
    
      7 
       lower
       upper
    
    
      8 
       lower
       upper
    
    
      9 
       lower
       upper
    
    
      10
       lower
       upper
    
    
      11
       lower
       upper
    
    
      12
       lower
       upper
    
    
      13
       lower
       upper
    
    
      14
       lower
       upper
    
    
      15
       lower
       upper
    
    
      16
       lower
       upper
    
    
      17
       lower
       upper
    
    
      18
       lower
       upper
    
    
      19
       lower
       upper
    
    
      20
       lower
       upper
    
    
      21
       lower
       upper
    
    
      22
       lower
       upper
    
    
      23
       lower
       upper
    
    
      24
       lower
       upper
    
    
      25
       lower
       upper
    
  

26 rows × 2 columns



In [21]:

    
# it might be easier to see the difference between axis=0 vs axis=1
# by using join

# Consider what you get with

"".join(df.lower)









    Out[21]:





'abcdefghijklmnopqrstuvwxyz'



In [22]:

    
# Now compare (axis=0)

df.apply(lambda s: "".join(s), axis=0)









    Out[22]:





lower    abcdefghijklmnopqrstuvwxyz
upper    ABCDEFGHIJKLMNOPQRSTUVWXYZ
dtype: object



In [23]:

    
# join with axis=1

df.apply(lambda s: "".join(s), axis=1)









    Out[23]:





0     aA
1     bB
2     cC
3     dD
4     eE
5     fF
6     gG
7     hH
8     iI
9     jJ
10    kK
11    lL
12    mM
13    nN
14    oO
15    pP
16    qQ
17    rR
18    sS
19    tT
20    uU
21    vV
22    wW
23    xX
24    yY
25    zZ
dtype: object



In [24]:

    
# note that you can access use the index in your function passed to apply

df.apply(lambda s: s['upper'] + s['lower'], axis=1)









    Out[24]:





0     Aa
1     Bb
2     Cc
3     Dd
4     Ee
5     Ff
6     Gg
7     Hh
8     Ii
9     Jj
10    Kk
11    Ll
12    Mm
13    Nn
14    Oo
15    Pp
16    Qq
17    Rr
18    Ss
19    Tt
20    Uu
21    Vv
22    Ww
23    Xx
24    Yy
25    Zz
dtype: object



In [24]:



In [ ]:

	lower	upper
0	0	0
1	1	1
2	2	2
3	3	3
4	4	4
5	5	5
6	6	6
7	7	7
8	8	8
9	9	9
10	10	10
11	11	11
12	12	12
13	13	13
14	14	14
15	15	15
16	16	16
17	17	17
18	18	18
19	19	19
20	20	20
21	21	21
22	22	22
23	23	23
24	24	24
25	25	25

	lower	upper
0	lower	upper
1	lower	upper
2	lower	upper
3	lower	upper
4	lower	upper
5	lower	upper
6	lower	upper
7	lower	upper
8	lower	upper
9	lower	upper
10	lower	upper
11	lower	upper
12	lower	upper
13	lower	upper
14	lower	upper
15	lower	upper
16	lower	upper
17	lower	upper
18	lower	upper
19	lower	upper
20	lower	upper
21	lower	upper
22	lower	upper
23	lower	upper
24	lower	upper
25	lower	upper

	lower	upper
0	0	0
1	1	1
2	2	2
3	3	3
4	4	4
5	5	5
6	6	6
7	7	7
8	8	8
9	9	9
10	10	10
11	11	11
12	12	12
13	13	13
14	14	14
15	15	15
16	16	16
17	17	17
18	18	18
19	19	19
20	20	20
21	21	21
22	22	22
23	23	23
24	24	24
25	25	25