Part 1: Data Structures in Pandas



In [ ]:

    
"""
----------------------------------------------------------------------
Filename : 01_basic_data_structs.py
Date     : 12th Dec, 2013
Author   : Jaidev Deshpande
Purpose  : To get started with basic data structures in Pandas
Libraries: Pandas 0.12 and its dependencies
----------------------------------------------------------------------
"""

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. http://pandas.pydata.org

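The main data structures in pandas 0.12 are:

Series
DataFrame
Panel (deprecated in later releases and removed in pandas 0.25)
TimeSeries (an alias for Series in older releases, later removed)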

Series and DataFrame



In [ ]:

    
# imports
import pandas as pd
from math import pi



In [ ]:

    
s = pd.Series(range(10))
print(s)



In [ ]:

    
print(s[5])

A pandas Series, like a list, doesn't have to be homogeneous: its elements can be of different types.



In [ ]:

    
s = pd.Series(['foo', None, 3+4j])
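
As a small illustration of what heterogeneity costs, the sketch below compares the dtype of a mixed Series with that of an all-integer one. The variable names mixed and ints are only illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# A Series holding a string, None and a complex number cannot use a
# numeric dtype, so pandas stores the values as generic Python objects.
mixed = pd.Series(['foo', None, 3 + 4j])
print(mixed.dtype)

# A Series built only from integers keeps a compact integer dtype.
ints = pd.Series(range(10))
print(ints.dtype)
```

Vectorised arithmetic is generally only fast on the numeric dtypes, so mixed Series are best kept for labels and small collections.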

The index of a Series can be arbitrary as well.



In [ ]:

    
inds = ['bar',1, (1, 2)]
s.index = inds
print(s['bar'], s[1], s[(1, 2)])
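
Because index labels can be arbitrary, lookup by label and lookup by position can give different answers. The following is a minimal sketch of the distinction using the .loc (label) and .iloc (position) accessors; the Series by_name and by_number are made up for the example.

```python
import pandas as pd

# String labels: .loc looks up by label, .iloc by integer position.
by_name = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
label_hit = by_name.loc['b']      # value stored under the label 'b'
position_hit = by_name.iloc[0]    # first value, whatever its label

# Integer labels that do not match the positions make the difference visible.
by_number = pd.Series([10, 20, 30], index=[2, 1, 0])
label_zero = by_number.loc[0]     # value stored under the label 0
position_zero = by_number.iloc[0] # value at position 0

print(label_hit, position_hit, label_zero, position_zero)
```

Using .loc and .iloc explicitly avoids having to remember how plain square brackets resolve an integer key.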

Multiple Series objects can be combined into a pandas DataFrame, with each Series becoming one column. The pandas DataFrame is similar to the data.frame object in R.



In [ ]:

    
s1 = pd.Series(range(10))
s2 = pd.Series(range(10,20))
df = pd.DataFrame({'A':s1,'B':s2})
df.head()
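
The two Series above share the same default index, so the columns line up row by row. As a hedged sketch of what happens when the indexes differ, the example below builds a DataFrame from two illustrative Series (left and right, not part of the original notebook) whose labels only partly overlap.

```python
import pandas as pd

# Two Series whose index labels only partly overlap.
left = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
right = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# The DataFrame index is the union of both indexes; values are matched
# by label, and labels missing from one Series are filled with NaN.
frame = pd.DataFrame({'A': left, 'B': right})
print(frame)
```

Rows are matched on labels rather than on position, which is why 'x' has no value in column B and 'w' has none in column A.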

Think of a pandas DataFrame as a dict of Series keyed by column name. Most operations that are valid on a Python dictionary, such as item assignment, lookup, and deletion by key, also work on a pandas DataFrame.



In [ ]:

    
df['C'] = [str(c) for c in range(20, 30)]
print(df.head())



In [ ]:

    
print(df['C'])



In [ ]:

    
del df['A']
print(df.head(10))
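
Beyond assignment, lookup and del, a few more dict-style operations carry over. The sketch below uses a small stand-in DataFrame (demo, not the df from the cells above) and also shows one place where the analogy breaks: len() counts rows, not columns.

```python
import pandas as pd

demo = pd.DataFrame({'A': range(3), 'B': range(3, 6)})

# Membership and keys() both refer to column labels, as with dict keys.
has_a = 'A' in demo
labels = list(demo.keys())

# Unlike a dict, len() reports the number of rows, not the number of columns.
n_rows = len(demo)

# pop() removes a column and returns it as a Series.
b = demo.pop('B')

# get() returns a default instead of raising KeyError for a missing column.
missing = demo.get('Z', default=None)

# items() yields (column label, Series) pairs.
for name, column in demo.items():
    print(name, column.tolist())
```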



In [ ]:

    
df.update({'B': range(50,60)})
print(df.head())
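
One difference from dict.update is worth spelling out: DataFrame.update only overwrites columns that already exist and works in place, returning None. The following is a minimal sketch on a stand-in DataFrame (demo, result and had_d are illustrative names).

```python
import pandas as pd

demo = pd.DataFrame({'B': range(10, 15), 'C': list('abcde')})

# An existing column is overwritten in place; nothing is returned.
result = demo.update({'B': range(50, 55)})

# A column that does not exist yet is silently ignored by update() ...
demo.update({'D': range(5)})
had_d = 'D' in demo

# ... so a new column has to be added by assignment instead.
demo['D'] = range(5)
print(demo)
```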

Index Objects

Index objects available in Pandas:

Index : The most general Pandas index, often created by default
Int64Index : Specialized index for integer values
MultiIndex : Hierarchical index
DatetimeIndex: Nanosecond timestamps that can be used as indexes
PeriodIndex : Specialized index for time spans (periods)



In [ ]:

    
df.index
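
To make the list above more concrete, here is a hedged sketch that builds several of these index types directly. It assumes a recent pandas release, where the default index is a RangeIndex rather than the Int64Index of pandas 0.12; all variable names are illustrative.

```python
import pandas as pd

# The default index of a freshly built DataFrame.
demo = pd.DataFrame({'B': range(3)})
print(type(demo.index).__name__)

# A general-purpose index holding arbitrary labels.
labels = pd.Index(['x', 'y', 'z'])

# A hierarchical index built from (outer, inner) label pairs.
multi = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])

# Three consecutive daily timestamps.
dates = pd.date_range('2013-12-12', periods=3, freq='D')

# Three consecutive monthly periods.
periods = pd.period_range('2013-12', periods=3, freq='M')

for idx in (labels, multi, dates, periods):
    print(type(idx).__name__, len(idx))
```

Any of these can be passed as the index argument when constructing a Series or DataFrame.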

Exercise: Creating Series, DataFrames and indexing them

Create a NumPy array of random values with shape (10, 10).
Convert it into a DataFrame.
The column names of this DataFrame should be of type str.
Add one more column to the DataFrame using the dict-style column assignment demonstrated above (DataFrame.update only overwrites existing columns, so it cannot add one).