Python for Data Analysis Lightning Tutorials

Pandas Cookbook Series

Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.

Created by Alfred Essa, Apr 27th, 2014

Note: The IPython Notebook and data files can be found on my GitHub site: http://github/alfredessa

4.1 Introduction

In this tutorial we learn to apply Python slicing operations to a Pandas DataFrame object.

4.11 Load Libraries


In [1]:
# load pandas and numpy
import pandas as pd
import numpy as np

In [2]:
# ipython magic for inline plots (%matplotlib inline replaces the older %pylab inline)
%matplotlib inline

4.12 Create a Sample DataFrame Object


In [3]:
# we want to create a DataFrame that has dates as rows and cities as columns;
# the cells will be temperatures


# first let's define a date range, the first two months of 2014

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')

In [4]:
# then create a tuple defining the dimensions of our table (59 days x 5 cities)
dim = (59, 5)
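The 59 in dim is simply the number of days in days (31 in January plus 28 in February 2014). A small sketch, not part of the original notebook, that derives the shape from the data instead of hard-coding it; the cities list is an assumption that mirrors the column names used below:

```python
import pandas as pd

# same date range as above: one entry per day from 1 Jan to 28 Feb 2014
days = pd.date_range('2014-01-01', '2014-02-28', freq='D')

# assumed city list, matching the DataFrame columns used in this tutorial
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']

# 31 January days + 28 February days = 59 rows; one column per city
dim = (len(days), len(cities))
print(dim)  # (59, 5)
```

Deriving dim this way keeps the table shape correct if the date range or the city list changes.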

In [5]:
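# define the data frame: random integer "temperatures" from -20 to 40 inclusive
# (randint's upper bound is exclusive, hence 41; random_integers is deprecated)
df = pd.DataFrame(np.random.randint(-20, 41, size=dim), index=days,
                  columns=['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata'])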

In [7]:
# check the last five rows
df.tail()


Out[7]:
Madrid Boston Tokyo Shanghai Kolkata
2014-02-24 4 8 26 13 -19
2014-02-25 23 -13 15 23 39
2014-02-26 -13 20 20 -4 -6
2014-02-27 -2 -7 -20 -8 8
2014-02-28 7 -18 4 -20 40

5 rows × 5 columns
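Because the temperatures are random, re-running the cell gives different numbers from the tables in this tutorial. If reproducible values are wanted, one option (an assumption, not the author's setup) is NumPy's seeded Generator. The seed 0 is arbitrary, and the values it produces will not match the outputs shown here:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']

# seeded generator: the same "temperatures" on every run (seed 0 is arbitrary)
rng = np.random.default_rng(0)

# integers() excludes the upper bound by default, so 41 gives values in [-20, 40]
temps = rng.integers(-20, 41, size=(len(days), len(cities)))
df = pd.DataFrame(temps, index=days, columns=cities)
print(df.tail())
```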

4.13 Apply Various Slices Using .ix (Superseded by .loc and .iloc)


In [8]:
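# range of rows and range of columns
# (.ix was removed in pandas 1.0; this is the equivalent of df.ix[3:6, 'Madrid':'Tokyo'])
df.loc[df.index[3:6], 'Madrid':'Tokyo']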


Out[8]:
Madrid Boston Tokyo
2014-01-04 37 19 16
2014-01-05 -14 -9 16
2014-01-06 -6 26 19

3 rows × 3 columns
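The slice above follows two different rules: the row range 3:6 is positional, so position 6 is excluded and three rows come back, while the column range 'Madrid':'Tokyo' is label-based and includes Tokyo. A sketch contrasting the .iloc (positional) and .loc (label) indexers of current pandas on a frame built like df; the values are random, so only the shapes and the equality of the two selections matter:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']
df = pd.DataFrame(np.random.default_rng(0).integers(-20, 41, size=(59, 5)),
                  index=days, columns=cities)

# positional: rows 3, 4, 5 and columns 0, 1, 2 (stop positions are excluded)
by_position = df.iloc[3:6, 0:3]

# label-based: both endpoints are included, for dates and for column names
by_label = df.loc['2014-01-04':'2014-01-06', 'Madrid':'Tokyo']

print(by_position.shape, by_label.shape)  # (3, 3) (3, 3)
print(by_position.equals(by_label))       # True
```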


In [9]:
# specific rows and specific columns
# (equivalent of the removed df.ix[[3,20,49], ['Boston', 'Shanghai']])
df.loc[df.index[[3, 20, 49]], ['Boston', 'Shanghai']]


Out[9]:
Boston Shanghai
2014-01-04 19 21
2014-01-21 11 31
2014-02-19 9 -17

3 rows × 2 columns
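Since the index holds dates, the same rows can also be picked by label rather than by position. A sketch, assuming a frame built like df, showing that positions 3, 20 and 49 correspond to 4 January, 21 January and 19 February 2014:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']
df = pd.DataFrame(np.random.default_rng(0).integers(-20, 41, size=(59, 5)),
                  index=days, columns=cities)

# rows picked by date label with .loc
dates = pd.to_datetime(['2014-01-04', '2014-01-21', '2014-02-19'])
by_label = df.loc[dates, ['Boston', 'Shanghai']]

# the same cells picked purely by position (Boston is column 1, Shanghai column 3)
by_position = df.iloc[[3, 20, 49], [1, 3]]

# same values either way
print(by_label.to_numpy().tolist() == by_position.to_numpy().tolist())  # True
```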


In [10]:
# specific rows and all columns
# (equivalent of the removed df.ix[[3,9,11,13], :])
df.iloc[[3, 9, 11, 13], :]


Out[10]:
Madrid Boston Tokyo Shanghai Kolkata
2014-01-04 37 19 16 21 -12
2014-01-10 -13 36 24 -15 40
2014-01-12 -3 36 21 16 -6
2014-01-14 -20 25 13 25 36

4 rows × 5 columns
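When every column is wanted, the trailing ", :" can be dropped. A short sketch with .iloc (the positional indexer in current pandas) on a frame built like df, which also lists the dates these positions map to:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']
df = pd.DataFrame(np.random.default_rng(0).integers(-20, 41, size=(59, 5)),
                  index=days, columns=cities)

rows = [3, 9, 11, 13]

# with only a row selector, all columns are returned, so ", :" is optional
with_colon = df.iloc[rows, :]
without_colon = df.iloc[rows]
print(with_colon.equals(without_colon))  # True

# the dates at those positions
print(df.index[rows].strftime('%Y-%m-%d').tolist())
# ['2014-01-04', '2014-01-10', '2014-01-12', '2014-01-14']
```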


In [11]:
# all rows and range of columns
# (equivalent of the removed df.ix[:, 'Madrid':'Tokyo'])
df.loc[:, 'Madrid':'Tokyo']


Out[11]:
Madrid Boston Tokyo
2014-01-01 1 39 -11
2014-01-02 -13 26 15
2014-01-03 -7 -4 5
2014-01-04 37 19 16
2014-01-05 -14 -9 16
2014-01-06 -6 26 19
2014-01-07 30 -13 31
2014-01-08 31 -1 7
2014-01-09 24 -16 28
2014-01-10 -13 36 24
2014-01-11 0 -1 32
2014-01-12 -3 36 21
2014-01-13 17 26 27
2014-01-14 -20 25 13
2014-01-15 -17 0 15
2014-01-16 -1 23 18
2014-01-17 22 24 8
2014-01-18 15 31 20
2014-01-19 3 -19 23
2014-01-20 4 -2 40
2014-01-21 -15 11 38
2014-01-22 2 -3 -1
2014-01-23 1 28 5
2014-01-24 23 20 21
2014-01-25 28 5 -5
2014-01-26 31 23 -6
2014-01-27 -13 -7 16
2014-01-28 26 10 16
2014-01-29 35 23 16
2014-01-30 25 39 40
2014-01-31 8 22 36
2014-02-01 37 30 35
2014-02-02 -13 23 2
2014-02-03 -18 -7 7
2014-02-04 0 1 4
2014-02-05 15 -12 -13
2014-02-06 -2 33 -11
2014-02-07 -15 30 16
2014-02-08 19 -3 27
2014-02-09 21 37 22
2014-02-10 -17 -1 2
2014-02-11 37 11 22
2014-02-12 -3 -16 0
2014-02-13 -18 23 14
2014-02-14 1 29 22
2014-02-15 -1 1 -1
2014-02-16 28 -1 39
2014-02-17 24 7 -1
2014-02-18 13 28 -7
2014-02-19 1 9 8
2014-02-20 8 24 -18
2014-02-21 -5 36 -17
2014-02-22 35 37 -1
2014-02-23 -14 5 39
2014-02-24 4 8 26
2014-02-25 23 -13 15
2014-02-26 -13 20 20
2014-02-27 -2 -7 -20
2014-02-28 7 -18 4

59 rows × 3 columns
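Printing all 59 rows is rarely necessary. A sketch, on a frame built like df, that checks the shape of the same column slice and previews only its first rows with head():

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']
df = pd.DataFrame(np.random.default_rng(0).integers(-20, 41, size=(59, 5)),
                  index=days, columns=cities)

# label slice of columns; ':' keeps every row
subset = df.loc[:, 'Madrid':'Tokyo']
print(subset.shape)  # (59, 3)

# preview the first five rows instead of printing all 59
print(subset.head())
```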


In [12]:
# range of rows and specific columns
# (equivalent of the removed df.ix[3:11, ['Boston', 'Shanghai']])
df.loc[df.index[3:11], ['Boston', 'Shanghai']]


Out[12]:
Boston Shanghai
2014-01-04 19 21
2014-01-05 -9 20
2014-01-06 26 16
2014-01-07 -13 35
2014-01-08 -1 22
2014-01-09 -16 16
2014-01-10 36 -15
2014-01-11 -1 21

8 rows × 2 columns
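Positions 3 to 10 are the dates 4 to 11 January 2014, so the same eight rows can be selected with a date-string slice in .loc, which includes both endpoints. A sketch on a frame built like df:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2014-01-01', '2014-02-28', freq='D')
cities = ['Madrid', 'Boston', 'Tokyo', 'Shanghai', 'Kolkata']
df = pd.DataFrame(np.random.default_rng(0).integers(-20, 41, size=(59, 5)),
                  index=days, columns=cities)

# positional rows 3..10 (stop position 11 excluded), then two columns by name
by_position = df.iloc[3:11][['Boston', 'Shanghai']]

# the same rows by date label; both endpoint dates are included
by_date = df.loc['2014-01-04':'2014-01-11', ['Boston', 'Shanghai']]

print(len(by_date))                 # 8
print(by_date.equals(by_position))  # True
```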

4.14 Practice Using a World Bank Dataset


In [13]:
m = pd.read_csv('data/mortality.csv')

In [17]:
# use the country names as row labels
m = m.set_index('Country Name')

In [18]:
m.head()


Out[18]:
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
Country Name
Aruba NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
Andorra NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
Afghanistan NaN 352.5 346.9 341.8 336.8 331.6 326.2 321.2 316.2 311.4 306.7 301.3 296 290.9 285.2 279.8 274.2 268.3 262.4 256.5 ...
Angola NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
Albania NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...

5 rows × 54 columns


In [19]:
# transpose so that years become rows and countries become columns
t = m.T

In [20]:
t.head()


Out[20]:
Country Name Aruba Andorra Afghanistan Angola Albania United Arab Emirates Argentina Armenia American Samoa Antigua and Barbuda Australia Austria Azerbaijan Burundi Belgium Benin Burkina Faso Bangladesh Bulgaria Bahrain
1960 NaN NaN NaN NaN NaN 206.7 NaN NaN NaN NaN 24.8 42.8 NaN 245.4 33.9 321.9 344.9 262.4 NaN 198.6 ...
1961 NaN NaN 352.5 NaN NaN 195.6 NaN NaN NaN NaN 24.3 40.2 NaN 245.0 32.4 316.8 340.1 255.3 NaN 181.2 ...
1962 NaN NaN 346.9 NaN NaN 184.7 NaN NaN NaN NaN 23.7 37.9 NaN 244.7 31.2 310.2 335.8 248.4 NaN 165.1 ...
1963 NaN NaN 341.8 NaN NaN 173.9 NaN NaN NaN NaN 23.2 35.9 NaN 244.2 30.0 302.8 332.1 241.9 46.5 150.4 ...
1964 NaN NaN 336.8 NaN NaN 163.1 NaN NaN NaN NaN 22.7 34.2 NaN 243.7 28.9 295.3 328.8 235.9 45.7 136.8 ...

5 rows × 224 columns
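The load, set_index and transpose steps can also be tried on a tiny in-memory stand-in for data/mortality.csv. In this sketch the CSV layout (a 'Country Name' column followed by one column per year) is an assumption based on the head() output above, the numbers are copied from the tables shown, and index_col is used to set the index while reading rather than calling set_index afterwards:

```python
import io

import pandas as pd

# tiny stand-in for data/mortality.csv: three countries, three years,
# values copied from the tables above (an empty field is read as NaN)
csv_text = """Country Name,1960,1961,1962
Afghanistan,,352.5,346.9
Australia,24.8,24.3,23.7
Austria,42.8,40.2,37.9
"""

# index_col sets the row labels while reading, replacing a separate set_index call
m = pd.read_csv(io.StringIO(csv_text), index_col='Country Name')

# transpose: the year labels become the row index, countries become columns
t = m.T
print(t)
```

Note that the year labels come from the CSV header, so they are strings ('1960'), not integers.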


In [21]:
# select the four countries to compare
comparison = t[['Bangladesh', 'India', 'Rwanda', 'Uganda']]

In [22]:
comparison.tail()


Out[22]:
Country Name Bangladesh India Rwanda Uganda
2009 50.9 63.8 69.4 83.0
2010 47.2 61.2 63.8 78.3
2011 43.8 58.6 58.9 74.0
2012 40.9 56.3 55.0 68.9
2013 NaN NaN NaN NaN

5 rows × 4 columns
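The 2013 row is entirely NaN, so it contributes nothing to the plot. A sketch that rebuilds the comparison.tail() values shown above, drops rows where every country is missing, and then slices years by label (the year labels are strings, as read_csv produces them from the header):

```python
import numpy as np
import pandas as pd

# values copied from the comparison.tail() output above
comparison = pd.DataFrame(
    {'Bangladesh': [50.9, 47.2, 43.8, 40.9, np.nan],
     'India': [63.8, 61.2, 58.6, 56.3, np.nan],
     'Rwanda': [69.4, 63.8, 58.9, 55.0, np.nan],
     'Uganda': [83.0, 78.3, 74.0, 68.9, np.nan]},
    index=['2009', '2010', '2011', '2012', '2013'])
comparison.columns.name = 'Country Name'

# drop years in which every country is missing (here only 2013)
recent = comparison.dropna(how='all')

# label slicing on the sorted string index includes both endpoints
print(recent.loc['2010':'2012'])
```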


In [23]:
comparison.plot()


Out[23]:
[Figure: line plot of the comparison columns (Bangladesh, India, Rwanda, Uganda), one line per country, with years on the x-axis]
