Python for Data Analysis Lightning Tutorials

Pandas Cookbook Series

Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.

Created by Alfred Essa, Dec 15th, 2013

Note: The IPython Notebook and data files can be found on my GitHub site: http://github/alfredessa

Chapter 1: Data Structures

1.2 Problem. How can I create a DataFrame object in Pandas?

1.21 What is a DataFrame?

The DataFrame data structure in Pandas is a two-dimensional labeled array.

  • Data in the array can be of any type (integers, strings, floating point numbers, Python objects, etc.).
  • Data within each column is homogeneous (each column has a single type), but different columns can hold different types.
  • By default, Pandas creates a numerical index for the rows, running from 0 to n-1 for n rows.
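The three properties above can be checked on a small table. This is a minimal sketch: the column names and values are hypothetical and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical toy data: each column holds one type, but types differ across columns.
df = pd.DataFrame({
    'city': ['Tokyo', 'Paris', 'Mumbai'],
    'temp_c': [15.5, -2.0, 20.0],
    'days_observed': [7, 7, 7],
})

print(df.dtypes)        # one dtype per column
print(list(df.index))   # default integer row labels, starting at 0
```

Because no index was supplied, the rows are labeled 0, 1, 2.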

In the example that follows, we set the Date column as the index, so that its values label the rows.

1.22 Preliminaries - import pandas and the datetime library; create data for populating our first DataFrame object


In [ ]:
import pandas as pd
import datetime

In [ ]:
# create a list containing dates from 12-01 to 12-07
dt = datetime.datetime(2013,12,1)
end = datetime.datetime(2013,12,8)
step = datetime.timedelta(days=1)
dates = []

In [ ]:
# populate the list
while dt < end:
    dates.append(dt.strftime('%m-%d'))
    dt += step

In [ ]:
dates
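As an alternative to the explicit loop, the same list of date strings can probably be built more compactly with `pd.date_range`. This is a sketch, not the method used in the tutorial; the name `dates_alt` is introduced here only to avoid overwriting `dates`.

```python
import pandas as pd

# Seven consecutive days, 2013-12-01 through 2013-12-07 (both endpoints included).
date_index = pd.date_range(start='2013-12-01', end='2013-12-07', freq='D')

# Format each timestamp as 'MM-DD' to match the list built by the loop.
dates_alt = date_index.strftime('%m-%d').tolist()
print(dates_alt)
```

Note that `date_range` includes the end date, whereas the loop above stops before `end`.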

In [ ]:
d = {'Date': dates,
     'Tokyo': [15, 19, 15, 11, 9, 8, 13],
     'Paris': [-2, 0, 2, 5, 7, -5, -3],
     'Mumbai': [20, 18, 23, 19, 25, 27, 23]}

In [ ]:
d

1.23 Example 1: Create DataFrame Object from a Python Dictionary of equal length lists


In [ ]:
temps = pd.DataFrame(d)

In [ ]:
ntemp = temps['Mumbai']

In [ ]:
ntemp

In [ ]:
temps = temps.set_index('Date')

In [ ]:
temps
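Once the Date column is the index, rows can be looked up by their date label. The sketch below rebuilds the first three days of the table so that it runs on its own; the name `sample` is used here only for illustration.

```python
import pandas as pd

# Rebuild a three-day slice of the temperature table so the block is self-contained.
sample = pd.DataFrame({
    'Date': ['12-01', '12-02', '12-03'],
    'Tokyo': [15, 19, 15],
    'Paris': [-2, 0, 2],
    'Mumbai': [20, 18, 23],
}).set_index('Date')

# Look up a full row by its label, then a single cell by row and column label.
row = sample.loc['12-02']
cell = sample.loc['12-03', 'Mumbai']
print(row)
print(cell)
```

`set_index` returns a new DataFrame rather than modifying the original, which is why the tutorial assigns the result back to `temps`.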

1.24 Example 2: Create DataFrame Object by reading a .csv file (Titanic passengers)


In [ ]:
titanic = pd.read_csv('data/titanic.csv')

In [ ]:
titanic.Survived.value_counts()
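If the data file is not at hand, the same pattern can be tried on an in-memory CSV. This is a minimal sketch: the rows below are invented placeholders, not records from the Titanic file, and only the `Survived` column name is taken from the tutorial.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; the rows are invented for illustration only.
csv_text = """Name,Survived
Passenger A,1
Passenger B,0
Passenger C,0
"""

# read_csv accepts a file path or a file-like object, so StringIO works here.
sample = pd.read_csv(io.StringIO(csv_text))

# Count how many times each distinct value appears in the column.
counts = sample['Survived'].value_counts()
print(counts)
```

`value_counts` returns a Series whose index holds the distinct values and whose values hold their frequencies, sorted from most to least frequent.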

1.25 Example 3: Create DataFrame Object by reading a .csv file (Olympic Medalists)


In [ ]:
medals = pd.read_csv('data/olympicmedals.csv')

In [ ]:
medals.tail()

In [ ]:
medals.Sport.value_counts()
