Purpose of This Notebook

  • show how to use apply on a pandas Series and DataFrame
  • show a bit about how lambda functions work

In [1]:
# numpy and pandas related imports 

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

Setup: create Series and DataFrames

Let's make two Series and a DataFrame to use in our examples.


In [2]:
# for example, using lower and uppercase English letters

import string
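# ascii_lowercase / ascii_uppercase exist in both Python 2 and Python 3
# (string.lowercase and string.uppercase are Python 2 only)
string.ascii_lowercase, string.ascii_uppercase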


Out[2]:
('abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')

In [3]:
# we can make a list composed of the individual lowercase letters 

list(string.ascii_lowercase)


Out[3]:
['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [9]:
# create a pandas Series out of the list of lowercase letters
# naming the Series gives the DataFrame column a label we can refer to later

lower = Series(list(string.ascii_lowercase), name='lower')
print(type(lower))
lower.head()


<class 'pandas.core.series.Series'>
Out[9]:
0    a
1    b
2    c
3    d
4    e
Name: lower, dtype: object

In [13]:
# create a pandas Series out of the list of uppercase letters

upper = Series(list(string.ascii_uppercase), name='upper')

In [14]:
# concatenate the two Series as columns, using axis=1 
# axis=0 would instead stack them end to end into a single Series of 52 values

df = pd.concat((lower, upper), axis=1)
df.head()


Out[14]:
  lower upper
0     a     A
1     b     B
2     c     C
3     d     D
4     e     E

5 rows × 2 columns
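To make the effect of the axis argument to pd.concat concrete, here is a minimal sketch. It uses two short three-letter Series as stand-ins for the full alphabets (an assumption made only to keep the example small) and compares the type and shape of each result.

```python
import pandas as pd

# Short, named stand-ins for the full letter Series used in this notebook
lower = pd.Series(list("abc"), name="lower")
upper = pd.Series(list("ABC"), name="upper")

# axis=1: each Series becomes a column of a DataFrame
side_by_side = pd.concat((lower, upper), axis=1)

# axis=0: the Series are stacked end to end into one longer Series
stacked = pd.concat((lower, upper), axis=0)

print(type(side_by_side).__name__, side_by_side.shape)  # DataFrame (3, 2)
print(type(stacked).__name__, stacked.shape)            # Series (6,)
```

Note that the stacked Series keeps the original index labels, so 0, 1, 2 each appear twice.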

Using apply

Series.apply

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html:

Series.apply(func, convert_dtype=True, args=(), **kwds)

Invoke function on values of Series.
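The args and **kwds parts of the signature above are easy to overlook. The sketch below shows one way they can be used; the helper wrap and its parameters left and right are hypothetical names invented for this illustration.

```python
import pandas as pd

def wrap(s, left, right="]"):
    # s is a single value taken from the Series, not the Series itself
    return left + s + right

letters = pd.Series(list("abc"))

# extra positional arguments go in args; extra keyword arguments
# are passed through to the function unchanged
wrapped = letters.apply(wrap, args=("[",), right=">")

print(wrapped.tolist())  # ['[a>', '[b>', '[c>']
```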

In [15]:
# Let's start by using Series.apply
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html

# first of all, it's useful to find a way to use apply to return the exact same Series

def identity(s):
    return s

lower.apply(identity)


Out[15]:
0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
dtype: object

In [16]:
# show that identity yields the same Series -- first on an element-by-element basis

lower.apply(identity) == lower


Out[16]:
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
11    True
12    True
13    True
14    True
15    True
16    True
17    True
18    True
19    True
20    True
21    True
22    True
23    True
24    True
25    True
dtype: bool

In [17]:
# Check that the match holds for every element in the Series using numpy.all
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html

np.all(lower.apply(identity) == lower)


Out[17]:
True

Let's use lambda

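Sometimes it's convenient to write functions using lambda, especially short functions that do a simple transformation of their parameters. Only some functions can be rewritten with lambda: the body of a lambda must be a single expression, so a function that needs statements (assignments, loops, try/except) has to stay a regular def.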


In [ ]:
def add_preface(s):
    return 'letter ' + s

lower.apply(add_preface)

In [ ]:
# rewrite with lambda

lower.apply(lambda s: 'letter ' + s)
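As a quick check of the def and lambda forms used above, and of what a lambda cannot express, here is a small sketch in plain Python. The helper to_int_or_none is a hypothetical example, chosen because try/except is a statement and so cannot appear inside a lambda body.

```python
def add_preface(s):
    return 'letter ' + s

# the same function written as a lambda and bound to a name
add_preface_lambda = lambda s: 'letter ' + s

# both callables give the same result for the same input
print(add_preface('a') == add_preface_lambda('a'))  # True

# This function needs a try/except statement, so it has no lambda form
def to_int_or_none(s):
    try:
        return int(s)
    except ValueError:
        return None

print(to_int_or_none('42'))  # 42
print(to_int_or_none('a'))   # None
```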

Another illustration of apply

Another illustration of using apply -- using ord and chr


In [ ]:
# ord: Given a string of length one, return an integer representing the Unicode code 
# point of the character when the argument is a unicode object, or the value of the 
# byte when the argument is an 8-bit string. 
# http://docs.python.org/2.7/library/functions.html#ord

ord('a')

In [ ]:
# chr: Return a string of one character whose ASCII code is the integer i.
# http://docs.python.org/2.7/library/functions.html#chr

chr(97)

In [ ]:
# show that for the case of 'a', chr(ord()) returns what we start with:'a'

chr(ord('a')) == 'a'

In [ ]:
# we can test whether chr reverses ord for all the lower case letters
# note how we chain two apply together

np.all(lower.apply(ord).apply(chr) == lower)

Note that we can read off a specific Series from the DataFrame by its column name


In [ ]:
type(df.upper)

In [ ]:
# transform the uppercase column back to lowercase, one value at a time
df.upper.apply(lambda s: s.lower())

DataFrame.apply

apply can also be used on a DataFrame

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
Applies function along input axis of DataFrame.

Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.
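The description above makes two points: the Series handed to the function is indexed differently depending on axis, and the return type depends on whether the function aggregates. The following sketch illustrates both on a tiny DataFrame. The row labels x, y, z and the helper first_label are assumptions made for illustration, so that row labels and column names are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"lower": list("abc"), "upper": list("ABC")},
    index=["x", "y", "z"],
)

def first_label(s):
    # s is a Series; its index reveals which axis it was taken along
    return s.index[0]

# axis=0: each column is passed in, indexed by the row labels
by_column = df.apply(first_label, axis=0)

# axis=1: each row is passed in, indexed by the column names
by_row = df.apply(first_label, axis=1)

print(by_column.tolist())  # ['x', 'x']
print(by_row.tolist())     # ['lower', 'lower', 'lower']

# A function that returns a Series of the same length does not aggregate,
# so the result is a DataFrame rather than a Series
same_shape = df.apply(lambda s: s.str.lower(), axis=0)
print(type(same_shape).__name__, same_shape.shape)  # DataFrame (3, 2)
```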

In [ ]:
# let's show that whether we use apply on columns (axis=0) or rows (axis=1), we get the same 
# result

def identity(s):
    return s

np.all(df.apply(identity, axis=0) == df.apply(identity, axis=1))

In [ ]:
# for each column, first lower and then upper, return the index

def index(s):
    return s.index

df.apply(index, axis=0)

In [ ]:
# for each row (axis=1), first lower and then upper, return the index 
# (which are the column names)

def index(s):
    return s.index

df.apply(index, axis=1)

In [ ]:
# it might be easier to see the difference between axis=0 vs axis=1
# by using join

# Consider what you get with

"".join(df.lower)

In [ ]:
# Now compare (axis=0)

df.apply(lambda s: "".join(s), axis=0)

In [ ]:
# join with axis=1

df.apply(lambda s: "".join(s), axis=1)

In [ ]:
# note that you can use the index labels inside the function passed to apply
# (with axis=1, each row is a Series indexed by the column names)

df.apply(lambda s: s['upper'] + s['lower'], axis=1)