Chapter 5: Getting Started with pandas


In [2]:
from pandas import Series, DataFrame
import pandas as pd

Series


In [4]:
obj = Series([4,7,-5,3])
obj


Out[4]:
0    4
1    7
2   -5
3    3
dtype: int64

In [5]:
obj.values


Out[5]:
array([ 4,  7, -5,  3])

In [6]:
obj.index


Out[6]:
Int64Index([0, 1, 2, 3], dtype='int64')

In [9]:
obj2 = Series([4,6,8,9], index = ['a','d','t','y'])

In [10]:
obj2


Out[10]:
a    4
d    6
t    8
y    9
dtype: int64

In [11]:
obj2.index


Out[11]:
Index([u'a', u'd', u't', u'y'], dtype='object')

In [12]:
obj2['y']


Out[12]:
9

In [13]:
obj2['y']=3

In [14]:
obj2['y']


Out[14]:
3

In [18]:
obj2[obj2 >= 6]  # 'y' is excluded because its value was set to 3 above


Out[18]:
d    6
t    8
dtype: int64

In [23]:
obj2*2


Out[23]:
a     8
d    12
t    16
y     6
dtype: int64
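As a minimal sketch of what the two cells above rely on: boolean filtering and scalar arithmetic both keep each value attached to its label. The variable names `s`, `mask`, `filtered`, and `doubled` are illustrative only and do not appear in the original notebook.

```python
import pandas as pd

# Same data as obj2 after the assignment obj2['y'] = 3
s = pd.Series([4, 6, 8, 3], index=['a', 'd', 't', 'y'])

mask = s >= 6        # boolean Series sharing the index of s
filtered = s[mask]   # keeps only the labels where mask is True
doubled = s * 2      # element-wise multiplication; labels are unchanged

print(mask)
print(filtered)
print(doubled)
```

The comparison produces a boolean Series first, and indexing with that Series is what selects the rows.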

If your data is in a Python dict, you can create a Series from it by passing the dict; the dict keys become the index:


In [19]:
sdata = {'Ohio':30000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

In [20]:
obj3 = Series(sdata)  # the dict keys become the index

In [21]:
obj3


Out[21]:
Ohio      30000
Oregon    16000
Texas     71000
Utah       5000
dtype: int64

In [22]:
obj3.index


Out[22]:
Index([u'Ohio', u'Oregon', u'Texas', u'Utah'], dtype='object')
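Because the dict keys become the index, a Series can be treated somewhat like a fixed-length dict. The sketch below assumes the same `sdata` as above; the name `s` is illustrative. Note that `in` tests the index labels, not the values.

```python
import pandas as pd

sdata = {'Ohio': 30000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s = pd.Series(sdata)

print('Ohio' in s)    # membership is checked against the index labels
print(30000 in s)     # values are not searched by `in`
print(set(s.index) == set(sdata))  # every dict key became a label
```

The order of the labels can differ between pandas versions, which is why the sketch compares sets rather than lists.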

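When you pass an explicit index along with a dict, values are matched by key, and any label not found in the dict gets NaN. The pandas isnull function detects such missing values: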


In [24]:
states = ['California', 'Ohio', 'Oregon', 'Texas']

In [25]:
obj4 = Series(sdata, index = states)

In [26]:
obj4


Out[26]:
California      NaN
Ohio          30000
Oregon        16000
Texas         71000
dtype: float64

In [27]:
pd.isnull(obj4)


Out[27]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

Series also provides this as an instance method:


In [28]:
obj4.isnull()


Out[28]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
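A short sketch of how these masks might be used in practice, assuming the same `sdata` and `states` as above. `notnull` is the complement of `isnull`; the names `s`, `missing`, `present`, `n_missing`, and `known` are illustrative.

```python
import pandas as pd

sdata = {'Ohio': 30000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
s = pd.Series(sdata, index=states)

missing = s.isnull()             # True where the value is NaN
present = s.notnull()            # the complement of isnull
n_missing = int(missing.sum())   # True counts as 1 when summed
known = s[present]               # keep only the labels that have a value

print(n_missing)
print(known)
```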

A critical Series feature for many applications is that it automatically aligns differently-indexed data in arithmetic operations; a label present in only one operand produces NaN:


In [29]:
obj3 + obj4


Out[29]:
California       NaN
Ohio           60000
Oregon         32000
Texas         142000
Utah             NaN
dtype: float64
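If NaN is not the result you want for labels found in only one operand, one option is the `add` method with `fill_value`, which substitutes a default for the missing side before adding. The sketch below uses two small Series, `a` and `b`, that are illustrative and not taken from the notebook.

```python
import pandas as pd

a = pd.Series({'Ohio': 30000, 'Utah': 5000})
b = pd.Series({'Ohio': 30000, 'Texas': 71000})

total = a + b                     # union of labels; unmatched labels give NaN
filled = a.add(b, fill_value=0)   # treat a label missing on one side as 0

print(total)
print(filled)
```

Whether 0 is a sensible default depends on the data; here it amounts to assuming that an absent state contributes nothing to the sum.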
