Purpose of This Notebook

  • show how to use apply on a pandas Series and DataFrame
  • show a bit about how lambda functions work

In [1]:
# numpy and pandas related imports 

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

Setup: create Series and DataFrames

Let's make two Series and a DataFrame to use in our examples.


In [2]:
# for example, using lower and uppercase English letters

import string
string.ascii_lowercase, string.ascii_uppercase


Out[2]:
('abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')

In [3]:
# we can make a list composed of the individual lowercase letters 

list(string.ascii_lowercase)


Out[3]:
['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [4]:
# create a pandas Series out of the list of lowercase letters

lower = Series(list(string.ascii_lowercase), name='lower')
print(type(lower))
lower.head()


<class 'pandas.core.series.Series'>
Out[4]:
0    a
1    b
2    c
3    d
4    e
Name: lower, dtype: object

In [5]:
# create a pandas Series out of the list of uppercase letters

upper = Series(list(string.ascii_uppercase), name='upper')

In [6]:
# concatenate the two Series as columns, using axis=1
# axis=0 would instead stack them end to end into a single, longer Series

df = pd.concat((lower, upper), axis=1)
df.head()


Out[6]:
  lower upper
0     a     A
1     b     B
2     c     C
3     d     D
4     e     E

5 rows × 2 columns
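To make the effect of the axis argument to pd.concat concrete, here is a minimal sketch that rebuilds the two Series so it can run on its own. The names stacked and side_by_side are introduced only for this illustration, and string.ascii_lowercase / string.ascii_uppercase are used because they exist in both Python 2 and Python 3.

```python
import string
import pandas as pd

lower = pd.Series(list(string.ascii_lowercase), name='lower')
upper = pd.Series(list(string.ascii_uppercase), name='upper')

# axis=0 stacks the Series end to end: the result is one longer Series
stacked = pd.concat((lower, upper), axis=0)

# axis=1 aligns the Series on their index: each one becomes a column
side_by_side = pd.concat((lower, upper), axis=1)

print(type(stacked).__name__, stacked.shape)
print(type(side_by_side).__name__, side_by_side.shape)
```

Only the axis=1 form gives the two-column layout used in the rest of this notebook.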

Using apply

Series.apply

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html:

Series.apply(func, convert_dtype=True, args=(), **kwds)

Invoke function on values of Series.
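The args and **kwds entries in the signature above are how extra arguments reach the function. The sketch below is one way to use them; the helper wrap and its prefix and suffix parameters are hypothetical names made up for this example.

```python
import pandas as pd

letters = pd.Series(list('abc'), name='lower')

def wrap(s, prefix, suffix=''):
    # s is a single element of the Series, not the Series itself
    return prefix + s + suffix

# extra positional arguments go in the args tuple;
# extra keyword arguments are passed through to the function
wrapped = letters.apply(wrap, args=('<',), suffix='>')

print(wrapped.tolist())
```

Note the trailing comma in args=('<',): args is expected to be a tuple, even when there is only one extra argument.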

In [7]:
# Let's start by using Series.apply
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html

# first of all, it's useful to find a way to use apply to return the exact same Series

def identity(s):
    return s

lower.apply(identity)


Out[7]:
0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: lower, dtype: object

In [8]:
# show that identity yields the same Series -- first on element by element basis

lower.apply(identity) == lower


Out[8]:
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
11    True
12    True
13    True
14    True
15    True
16    True
17    True
18    True
19    True
20    True
21    True
22    True
23    True
24    True
25    True
Name: lower, dtype: bool

In [9]:
# Check that every element of the Series matches, using numpy.all
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html

np.all(lower.apply(identity) == lower)


Out[9]:
True
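The same check can be written with the all method of the boolean Series instead of numpy.all. The sketch below also includes a deliberately altered copy (the name changed is only for illustration) to confirm that the check fails when a single element differs.

```python
import string
import pandas as pd

lower = pd.Series(list(string.ascii_lowercase), name='lower')

def identity(s):
    return s

# reduce the element-wise comparison to one boolean with Series.all
all_match = bool((lower.apply(identity) == lower).all())

# replace only the letter 'a'; one mismatch is enough to make all() False
changed = lower.apply(lambda s: 'A' if s == 'a' else s)
any_mismatch = not bool((changed == lower).all())

print(all_match, any_mismatch)
```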

Let's use lambda

Sometimes it's convenient to write functions using lambda, especially short functions that do a simple transformation of their parameters. A lambda body is limited to a single expression, so only functions that can be written as one expression can be rewritten with lambda.


In [10]:
def add_preface(s):
    return 'letter ' + s

lower.apply(add_preface)


Out[10]:
0     letter a
1     letter b
2     letter c
3     letter d
4     letter e
5     letter f
6     letter g
7     letter h
8     letter i
9     letter j
10    letter k
11    letter l
12    letter m
13    letter n
14    letter o
15    letter p
16    letter q
17    letter r
18    letter s
19    letter t
20    letter u
21    letter v
22    letter w
23    letter x
24    letter y
25    letter z
Name: lower, dtype: object

In [11]:
# rewrite with lambda

lower.apply(lambda s: 'letter ' + s)


Out[11]:
0     letter a
1     letter b
2     letter c
3     letter d
4     letter e
5     letter f
6     letter g
7     letter h
8     letter i
9     letter j
10    letter k
11    letter l
12    letter m
13    letter n
14    letter o
15    letter p
16    letter q
17    letter r
18    letter s
19    letter t
20    letter u
21    letter v
22    letter w
23    letter x
24    letter y
25    letter z
Name: lower, dtype: object
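To illustrate where lambda stops being an option, here is a small sketch. The Series raw and the helper to_int_or_default are hypothetical names made up for this example. The first function is a single expression and has a direct lambda equivalent; the second needs try/except, which is a statement and so cannot appear inside a lambda.

```python
import pandas as pd

raw = pd.Series(['1', '2', 'x'], name='raw')

# a single-expression function has a direct lambda equivalent
def add_preface(s):
    return 'letter ' + s

same = bool((raw.apply(add_preface) == raw.apply(lambda s: 'letter ' + s)).all())

# try/except is a statement, so this function must be written with def
def to_int_or_default(s, default=-1):
    try:
        return int(s)
    except ValueError:
        return default

converted = raw.apply(to_int_or_default)

print(same)
print(converted.tolist())
```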

Another illustration of apply

This time we use the built-in functions ord and chr.


In [12]:
# ord: Given a string of length one, return an integer representing the Unicode code 
# point of the character when the argument is a unicode object, or the value of the 
# byte when the argument is an 8-bit string. 
# http://docs.python.org/2.7/library/functions.html#ord

ord('a')


Out[12]:
97

In [13]:
# chr: Return a string of one character whose ASCII code is the integer i.
# http://docs.python.org/2.7/library/functions.html#chr

chr(97)


Out[13]:
'a'

In [14]:
# show that for the case of 'a', chr(ord()) returns what we started with: 'a'

chr(ord('a')) == 'a'


Out[14]:
True

In [15]:
# we can test whether chr reverses ord for all the lowercase letters
# note how we chain two apply calls together

np.all(lower.apply(ord).apply(chr) == lower)


Out[15]:
True
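Chaining also allows a step in between ord and chr. As a sketch, and relying on the fact that in ASCII each uppercase letter sits 32 code points below its lowercase counterpart, the following turns the lowercase letters into uppercase ones, first with three chained apply calls and then with a single apply. The names chained and single are only for illustration.

```python
import string
import numpy as np
import pandas as pd

lower = pd.Series(list(string.ascii_lowercase), name='lower')
upper = pd.Series(list(string.ascii_uppercase), name='upper')

# chained: letters -> code points -> shifted code points -> letters
chained = lower.apply(ord).apply(lambda n: n - 32).apply(chr)

# the same transformation composed inside one lambda
single = lower.apply(lambda s: chr(ord(s) - 32))

print(np.all(chained == upper), np.all(single == upper))
```

The single-apply form walks over the Series once, while the chained form can be easier to read step by step.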

Note that a single column can be read off the DataFrame as a Series, here using attribute access (df.upper is the same column as df['upper'])


In [16]:
type(df.upper)


Out[16]:
pandas.core.series.Series

In [17]:
# transform each uppercase letter to lowercase
df.upper.apply(lambda s: s.lower())


Out[17]:
0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: upper, dtype: object
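For simple string transformations like this one, pandas also offers vectorized string methods through the str accessor. The sketch below, which rebuilds a one-column DataFrame so it runs on its own, suggests that the two approaches give the same values here; via_apply and via_str are illustrative names.

```python
import string
import pandas as pd

df = pd.DataFrame({'upper': list(string.ascii_uppercase)})

# element by element, with a Python function
via_apply = df['upper'].apply(lambda s: s.lower())

# the same transformation with the vectorized str accessor
via_str = df['upper'].str.lower()

print(via_apply.tolist() == via_str.tolist())
```

apply remains the more general tool, since it accepts any Python function rather than only the operations the str accessor provides.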

DataFrame.apply

apply can also be used on a DataFrame

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
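The remark about the return type can be seen in a small sketch, assuming a reasonably recent pandas. A function that reduces each column to one value (here the built-in len) produces a Series, while a function that returns a Series of the same length produces a DataFrame. The names lengths and swapped are made up for this example.

```python
import string
import pandas as pd

df = pd.DataFrame({
    'lower': list(string.ascii_lowercase),
    'upper': list(string.ascii_uppercase),
})

# len aggregates each column to a single number -> result is a Series
lengths = df.apply(len, axis=0)

# swapcase keeps one value per element -> result is a DataFrame
swapped = df.apply(lambda s: s.str.swapcase(), axis=0)

print(type(lengths).__name__, lengths.tolist())
print(type(swapped).__name__, swapped.shape)
```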

In [18]:
# let's show that, with the identity function, applying along columns (axis=0) 
# or along rows (axis=1) gives back the same DataFrame

def identity(s):
    return s

np.all(df.apply(identity, axis=0) == df.apply(identity, axis=1))


Out[18]:
True

In [19]:
# for each column, first lower and then upper, return the index

def index(s):
    return s.index

df.apply(index, axis=0)


Out[19]:
    lower  upper
0       0      0
1       1      1
2       2      2
3       3      3
4       4      4
5       5      5
6       6      6
7       7      7
8       8      8
9       9      9
10     10     10
11     11     11
12     12     12
13     13     13
14     14     14
15     15     15
16     16     16
17     17     17
18     18     18
19     19     19
20     20     20
21     21     21
22     22     22
23     23     23
24     24     24
25     25     25

26 rows × 2 columns


In [20]:
# for each row (axis=1), first lower and then upper, return the index 
# (which are the column names)

def index(s):
    return s.index

df.apply(index, axis=1)


Out[20]:
    lower  upper
0   lower  upper
1   lower  upper
2   lower  upper
3   lower  upper
4   lower  upper
5   lower  upper
6   lower  upper
7   lower  upper
8   lower  upper
9   lower  upper
10  lower  upper
11  lower  upper
12  lower  upper
13  lower  upper
14  lower  upper
15  lower  upper
16  lower  upper
17  lower  upper
18  lower  upper
19  lower  upper
20  lower  upper
21  lower  upper
22  lower  upper
23  lower  upper
24  lower  upper
25  lower  upper

26 rows × 2 columns
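Another way to see what the function receives on each axis is to look at the name of the Series that apply passes in. In this sketch (col_names and row_names are illustrative names), with axis=0 each Series is a column and is named after that column, while with axis=1 each Series is a row and is named after its row label.

```python
import string
import pandas as pd

df = pd.DataFrame({
    'lower': list(string.ascii_lowercase),
    'upper': list(string.ascii_uppercase),
})

# axis=0: the function sees one column at a time
col_names = df.apply(lambda s: s.name, axis=0)

# axis=1: the function sees one row at a time
row_names = df.apply(lambda s: s.name, axis=1)

print(col_names.tolist())
print(row_names.tolist()[:5])
```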


In [21]:
# it might be easier to see the difference between axis=0 vs axis=1
# by using join

# Consider what you get with

"".join(df.lower)


Out[21]:
'abcdefghijklmnopqrstuvwxyz'

In [22]:
# Now compare (axis=0)

df.apply(lambda s: "".join(s), axis=0)


Out[22]:
lower    abcdefghijklmnopqrstuvwxyz
upper    ABCDEFGHIJKLMNOPQRSTUVWXYZ
dtype: object

In [23]:
# join with axis=1

df.apply(lambda s: "".join(s), axis=1)


Out[23]:
0     aA
1     bB
2     cC
3     dD
4     eE
5     fF
6     gG
7     hH
8     iI
9     jJ
10    kK
11    lL
12    mM
13    nN
14    oO
15    pP
16    qQ
17    rR
18    sS
19    tT
20    uU
21    vV
22    wW
23    xX
24    yY
25    zZ
dtype: object

In [24]:
# note that you can use the index labels (here, the column names) inside the function passed to apply

df.apply(lambda s: s['upper'] + s['lower'], axis=1)


Out[24]:
0     Aa
1     Bb
2     Cc
3     Dd
4     Ee
5     Ff
6     Gg
7     Hh
8     Ii
9     Jj
10    Kk
11    Ll
12    Mm
13    Nn
14    Oo
15    Pp
16    Qq
17    Rr
18    Ss
19    Tt
20    Uu
21    Vv
22    Ww
23    Xx
24    Yy
25    Zz
dtype: object
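When the row-wise function only combines columns, as above, the same result can usually be obtained by operating on whole columns at once. The following sketch compares the two forms on a rebuilt copy of the DataFrame; by_row and by_column are names introduced only for this example.

```python
import string
import pandas as pd

df = pd.DataFrame({
    'lower': list(string.ascii_lowercase),
    'upper': list(string.ascii_uppercase),
})

# row by row: the lambda receives each row as a Series indexed by column name
by_row = df.apply(lambda s: s['upper'] + s['lower'], axis=1)

# column-wise: add the two columns element by element, no apply needed
by_column = df['upper'] + df['lower']

print(by_row.tolist() == by_column.tolist())
```

The column-wise form avoids calling a Python function once per row, which tends to matter on larger DataFrames; apply with axis=1 is still useful when the per-row logic does not map onto column operations.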
