Pandas basics


In [1]:
import pandas as pd

Pandas series

A pandas Series is similar to a NumPy array, but it supports a lot of extra functionality, such as Series.describe().

Basic access is similar to a NumPy array: it supports access by index ( s[5] ) or slicing ( s[5:10] ).
It also supports vectorized operations and looping, like a NumPy array.
The underlying operations are implemented in C, so they run very fast.
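As a minimal sketch of the array-like behavior described above, the following uses the same five values as the describe() example. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([2, 3, 4, 5, 6])

# Positional slicing, as with a NumPy array
first_three = s[:3]

# Vectorized arithmetic: applied element-wise without an explicit loop
doubled = s * 2

# Looping over the values also works, but is usually slower than vectorized code
total = 0
for value in s:
    total += value

print(first_three.tolist())  # [2, 3, 4]
print(doubled.tolist())      # [4, 6, 8, 10, 12]
print(total)                 # 20
```

The vectorized form is usually the better choice; the explicit loop is shown only to confirm that iteration is supported.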

Benefits of Pandas series


In [8]:
s=pd.Series([2,3,4,5,6])
print(s.describe())


count    5.000000
mean     4.000000
std      1.581139
min      2.000000
25%      3.000000
50%      4.000000
75%      5.000000
max      6.000000
dtype: float64

Pandas Index

A Series with an index is a hybrid of a list and a Python dictionary: it maps each key (index label) to a value.


In [11]:
sal=pd.Series([40,12,43,56],
             index=['Ram',
                  'Syam',
                  "Rahul",
                  "Ganesh"])
print(sal)


Ram       40
Syam      12
Rahul     43
Ganesh    56
dtype: int64

In [20]:
print(sal[0])


40

Lookup by index label


In [21]:
print(sal.loc["Syam"])


12

Using sal[position] is not preferred; use sal.iloc[position] instead. The index has its own meaning in a Series, so iloc avoids confusion between positions and index labels.


In [19]:
print(sal.iloc[3])


56
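The confusion is easiest to see when the index labels are themselves integers. The sketch below uses hypothetical data (the salary values with a reversed integer index, not taken from the cells above) to show loc and iloc returning different elements for the same number.

```python
import pandas as pd

# Hypothetical data: the index labels are integers that do not match positions
scores = pd.Series([40, 12, 43, 56], index=[3, 2, 1, 0])

by_label = scores.loc[0]      # label 0 refers to the last element
by_position = scores.iloc[0]  # position 0 refers to the first element

print(by_label)     # 56
print(by_position)  # 40
```

Writing loc or iloc explicitly makes it clear to the reader which of the two lookups is intended.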

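idxmax() returns the index label of the maximum element. (In older pandas versions argmax() returned the label as well; in current versions argmax() returns the integer position.)


In [24]:
print(sal.idxmax())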


Ganesh

In [25]:
print(sal.loc["Ganesh"])
print(sal.max())


56
56
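The label and the position of the maximum can both be retrieved. This is a small sketch assuming pandas 1.0 or later, where idxmax() gives the label and argmax() gives the integer position; older versions behaved differently.

```python
import pandas as pd

sal = pd.Series([40, 12, 43, 56],
                index=["Ram", "Syam", "Rahul", "Ganesh"])

label = sal.idxmax()          # index label of the largest value
position = int(sal.argmax())  # integer position (assumes pandas 1.0 or later)

print(label)     # Ganesh
print(position)  # 3

# Both routes lead to the same element
print(sal.loc[label] == sal.iloc[position] == sal.max())  # True
```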

Adding series with different indexes


In [27]:
a=pd.Series([1,2,3,4],
            index=["a","b","c","d"])
b=pd.Series([9,8,7,6],
           index=["c","d","e","f"])
print(a)


a    1
b    2
c    3
d    4
dtype: int64

In [28]:
print(b)


c    9
d    8
e    7
f    6
dtype: int64

In [29]:
print(a+b)


a   NaN
b   NaN
c    12
d    12
e   NaN
f   NaN
dtype: float64

The labels c and d are present in both series, so their values are added correctly; the remaining labels are assigned the value NaN (Not a Number).

We can change this behavior so that, where the labels do not match, the original data is kept instead of NaN, or we can drop all the NaN values.


In [35]:
res = (a+b)
print(res.dropna())


c    12
d    12
dtype: float64

Treat missing values as 0


In [37]:
res=a.add(b,fill_value=0)
print(res)


a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64
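Note that fill_value=0 is not the same as replacing NaN with 0 after the addition. The sketch below contrasts the two using the same a and b as above; the names kept and zeroed are illustrative.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
b = pd.Series([9, 8, 7, 6], index=["c", "d", "e", "f"])

# fill_value=0 substitutes 0 for the missing side before adding,
# so unmatched labels keep their original value
kept = a.add(b, fill_value=0)

# fillna(0) replaces NaN after adding, so unmatched labels become 0
zeroed = (a + b).fillna(0)

print(kept.tolist())    # [1.0, 2.0, 12.0, 12.0, 7.0, 6.0]
print(zeroed.tolist())  # [0.0, 0.0, 12.0, 12.0, 0.0, 0.0]
```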

s.apply(function_name) applies an operation to each element.

Example:

Adding 5 to each element. We can do this simply with series+5 because the operation is vectorized, but let's do it using the new technique, s.apply(function).


In [39]:
print(res)


a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64

In [40]:
print(res+5)


a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64

In [41]:
def add_5(x):
    return x+5

In [44]:
print(res.apply(add_5))


a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64
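apply() is more useful when the operation has no simple vectorized form. As a sketch, the hypothetical helper label_size below (not part of the original notebook) maps each value to a string, something plain arithmetic on the Series cannot express directly.

```python
import pandas as pd

res = pd.Series([1.0, 2.0, 12.0, 12.0, 7.0, 6.0],
                index=["a", "b", "c", "d", "e", "f"])

def label_size(x):
    # Hypothetical rule: values of 7 or more count as "big"
    return "big" if x >= 7 else "small"

sizes = res.apply(label_size)
print(sizes.tolist())  # ['small', 'small', 'big', 'big', 'big', 'small']

# For plain arithmetic, apply and the vectorized form give the same result
same = res.apply(lambda x: x + 5).equals(res + 5)
print(same)  # True
```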

Plotting

Calling plot() on a Series automatically plots the index against the data.


In [47]:
%pylab inline
res.plot()


Populating the interactive namespace from numpy and matplotlib
Out[47]:
<matplotlib.axes.AxesSubplot at 0x5746350>
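Outside a notebook, the same plot can be produced with matplotlib directly. This is a sketch under assumptions: the "Agg" backend and the in-memory buffer are choices made here so the code runs without a display, and the axis labels are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import pandas as pd

res = pd.Series([1.0, 2.0, 12.0, 12.0, 7.0, 6.0],
                index=["a", "b", "c", "d", "e", "f"])

ax = res.plot()          # line plot: index along x, values along y
ax.set_xlabel("index")
ax.set_ylabel("value")

# Save to an in-memory buffer; pass a file name instead to write to disk
buffer = io.BytesIO()
ax.get_figure().savefig(buffer, format="png")
```

Series.plot() returns the matplotlib Axes object, so labels and titles can be set on it before saving.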