In [ ]:
"""
----------------------------------------------------------------------
Filename : 01_basic_data_structs.py
Date : 12th Dec, 2013
Author : Jaidev Deshpande
Purpose : To get started with basic data structures in Pandas
Libraries: Pandas 0.12 and its dependencies
----------------------------------------------------------------------
"""
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. http://pandas.pydata.org
Pandas provides several useful objects. This notebook starts with the two most basic ones, Series and DataFrame.
In [ ]:
# imports
import pandas as pd
from math import pi
In [ ]:
s = pd.Series(range(10))
print(s)
In [ ]:
print(s[5])
A pandas Series, like a list, doesn't have to be homogeneous.
In [ ]:
s = pd.Series(['foo', None, 3+4j])
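One way to see what "not homogeneous" means in practice is to inspect the dtype. The sketch below assumes a reasonably recent pandas; the names `mixed` and `uniform` are illustrative and not part of the original notebook.

```python
import pandas as pd

# A Series holding a string, None and a complex number
mixed = pd.Series(['foo', None, 3 + 4j])

# pandas stores such mixed values as generic Python objects
print(mixed.dtype)
print([type(v).__name__ for v in mixed])

# A Series built from integers gets a numeric dtype instead
uniform = pd.Series(range(10))
print(uniform.dtype)
```

Mixed Series are flexible, but numeric operations on them are typically slower than on a Series with a single numeric dtype.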
The index of a Series can be arbitrary as well.
In [ ]:
inds = ['bar',1, (1, 2)]
s.index = inds
print(s['bar'], s[1], s[(1, 2)])
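When labels are integers, plain square brackets can be ambiguous between "label" and "position". A minimal sketch of the explicit accessors `.loc` (by label) and `.iloc` (by position) follows; the Series `t` and its labels are made up for illustration.

```python
import pandas as pd

# Integer labels deliberately out of order (illustrative data)
t = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = t.loc[0]      # the value whose label is 0
by_position = t.iloc[0]  # the first value in the Series

print(by_label, by_position)
```

Using `.loc` and `.iloc` makes the intent explicit and avoids depending on how a particular pandas version resolves `t[0]`.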
Multiple Series objects can be combined to make a pandas DataFrame. The pandas DataFrame is similar to the data.frame object in R.
In [ ]:
s1 = pd.Series(range(10))
s2 = pd.Series(range(10,20))
df = pd.DataFrame({'A':s1,'B':s2})
df.head()
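The two Series above share the same default index, so they line up row by row. As a hedged sketch of what happens otherwise, the example below builds a DataFrame from two Series with only partly overlapping labels; the names `left`, `right` and `frame` are illustrative.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# The rows are aligned on the union of both indexes;
# labels missing from one Series are filled with NaN
frame = pd.DataFrame({'left': left, 'right': right})
print(frame)
```

Alignment happens by label, not by position, which is why 'a' has no value in the 'right' column and 'd' has none in 'left'.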
Think of a pandas DataFrame as a dict of Series. Most operations that are valid on a Python dictionary, such as item lookup, item assignment and del, also work on a pandas DataFrame.
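A few more dictionary-style operations, sketched on a small throwaway frame (the name `demo` and its columns are assumptions for illustration, not taken from the notebook):

```python
import pandas as pd

demo = pd.DataFrame({'A': range(3), 'B': range(3, 6)})

has_a = 'A' in demo           # membership tests the column labels
labels = list(demo.keys())    # keys() returns the column labels

# items() yields (column label, Series) pairs, like dict.items()
for name, column in demo.items():
    print(name, column.tolist())

# get() returns a default instead of raising for a missing column
fallback = demo.get('Z', 'no such column')

# pop() removes a column and returns it as a Series
popped = demo.pop('A')
print(list(demo.columns))
```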
In [ ]:
df['C'] = [str(c) for c in range(20, 30)]
print(df.head())
In [ ]:
print(df['C'])
In [ ]:
del df['A']
print(df.head(10))
In [ ]:
df.update({'B': range(50,60)})
print(df.head())
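Unlike `dict.update`, `DataFrame.update` aligns the new values on the index and skips missing values in the source. The sketch below uses small made-up frames (`base` and `patch`) to show this.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'B': [1.0, 2.0, 3.0]})

# patch covers only rows 0 and 1, and row 1 is missing (NaN)
patch = pd.DataFrame({'B': [10.0, np.nan]}, index=[0, 1])

# update() modifies base in place and returns None
result = base.update(patch)

print(result)
print(base['B'].tolist())
```

Only row 0 changes: row 1 is skipped because the patch value is NaN, and row 2 is untouched because the patch has no such label.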
Index objects available in Pandas:
- Index: The most general Pandas index, often created by default
- Int64Index: Specialized index for integer values
- MultiIndex: Hierarchical index
- DatetimeIndex: Nanosecond timestamps that can be used as indexes
- PeriodIndex: Specialized indices for timespans
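As a rough sketch, several of these index types can be constructed directly. The variable names and sample values below are illustrative, and the concrete class of the default index depends on the pandas version, so the code only prints its type rather than assuming it.

```python
import pandas as pd

# Default index created when none is given
default_index = pd.Series(['a', 'b', 'c']).index
print(type(default_index).__name__)

# Hierarchical index built from tuples
multi = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)], names=['letter', 'number'])

# Three consecutive daily timestamps
dates = pd.date_range('2013-12-12', periods=3, freq='D')

# Three consecutive monthly periods
periods = pd.period_range('2013-01', periods=3, freq='M')

for idx in (multi, dates, periods):
    print(type(idx).__name__, len(idx))
```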
In [ ]:
df.index
update method demonstrated above.