Pandas basics



In [1]:

    
import pandas as pd

Pandas series

A pandas Series is similar to a NumPy array, but it supports a lot of extra functionality, such as Series.describe().

Basic access is similar to a NumPy array: it supports access by index ( s[5] ) and slicing ( s[5:10] ).
It also supports vectorized operations and looping, like a NumPy array.
Much of the underlying work is implemented in C, so it is fast.
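
As a minimal sketch of the access patterns described above, the following uses a made-up Series with the default integer index (the values are illustrative only). Note that with the default index, s[5] looks up the label 5, which happens to coincide with position 5.

```python
import numpy as np
import pandas as pd

# Illustrative data; the default index is 0, 1, 2, ...
s = pd.Series([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

fifth = s[5]        # element with index label 5
window = s[5:10]    # slice covering positions 5 to 9
doubled = s * 2     # vectorized: applied to every element at once

# Looping works too, but is usually slower than the vectorized form
total = 0
for value in s:
    total += value

print(fifth)
print(window)
print(doubled)
print(total)
```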

Benefits of pandas Series



In [8]:

    
s=pd.Series([2,3,4,5,6])
print(s.describe())









    



count    5.000000
mean     4.000000
std      1.581139
min      2.000000
25%      3.000000
50%      4.000000
75%      5.000000
max      6.000000
dtype: float64
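
The result of describe() is itself a Series, so individual statistics can be read out by label. A small sketch, reusing the same five values; the variable name stats is only for illustration.

```python
import pandas as pd

s = pd.Series([2, 3, 4, 5, 6])

stats = s.describe()      # a Series indexed by statistic name
mean_value = stats["mean"]
std_value = stats["std"]  # sample standard deviation (ddof=1)

print(mean_value)
print(std_value)
```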

Pandas Index

A Series with an index is a hybrid of a list and a Python dictionary: the index maps each key (label) to a value.



In [11]:

    
sal=pd.Series([40,12,43,56],
             index=['Ram',
                  'Syam',
                  "Rahul",
                  "Ganesh"])
print(sal)









    



Ram       40
Syam      12
Rahul     43
Ganesh    56
dtype: int64
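
To illustrate the dictionary-like side of a Series, here is a sketch that builds the same data from a dict and tests membership. It assumes Python 3.7+ so that the dict keeps its insertion order.

```python
import pandas as pd

# Same salary data, built from a dict: keys become index labels
sal = pd.Series({"Ram": 40, "Syam": 12, "Rahul": 43, "Ganesh": 56})

has_ram = "Ram" in sal   # membership checks the index labels
has_40 = 40 in sal       # values are not checked, so this is False

labels = list(sal.index)   # list-like view of the keys
values = list(sal.values)  # list-like view of the data
as_dict = sal.to_dict()    # back to a plain dictionary

print(has_ram, has_40)
print(labels)
print(values)
```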



In [20]:

    
print(sal[0])

Look up by index label



In [21]:

    
print(sal.loc["Syam"])

Using sal[position] is not preferred; use sal.iloc[position] instead. Because a Series has its own index, a bare integer in sal[...] can be read either as a position or as a label, and .iloc makes the positional intent explicit.



In [19]:

    
print(sal.iloc[3])
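
The confusion is easiest to see when the index labels are themselves integers. The Series below is a made-up example whose labels are deliberately out of order.

```python
import pandas as pd

# Integer labels that do not match the positions
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose label is 0
by_position = s.iloc[0]  # the first element, whatever its label
plain = s[0]             # with an integer index this is a label lookup

print(by_label, by_position, plain)
```

Here s[0] and s.iloc[0] give different answers, which is why spelling out .loc or .iloc is the safer habit.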

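idxmax() returns the index label of the element with the maximum value. (In older pandas versions argmax() did this; in current versions argmax() returns the integer position instead.)



In [24]:

    
print(sal.idxmax())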









    



Ganesh



In [25]:

    
print(sal.loc["Ganesh"])
print(sal.max())
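
A short sketch contrasting the label and the position of the maximum. It assumes a recent pandas version (1.0 or later), where Series.argmax() returns an integer position and Series.idxmax() returns the label.

```python
import pandas as pd

sal = pd.Series([40, 12, 43, 56],
                index=["Ram", "Syam", "Rahul", "Ganesh"])

label = sal.idxmax()     # index label of the largest value
position = sal.argmax()  # integer position (assumes pandas >= 1.0)

# Both routes lead to the same element
print(label, sal.loc[label])
print(position, sal.iloc[position])
```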

Adding Series with different indexes



In [27]:

    
a=pd.Series([1,2,3,4],
            index=["a","b","c","d"])
b=pd.Series([9,8,7,6],
           index=["c","d","e","f"])
print(a)









    



a    1
b    2
c    3
d    4
dtype: int64



In [28]:

    
print(b)









    



c    9
d    8
e    7
f    6
dtype: int64



In [29]:

    
print(a + b)









    



a   NaN
b   NaN
c    12
d    12
e   NaN
f   NaN
dtype: float64

The labels c and d are present in both Series, so their values are added correctly; all other labels are assigned the value NaN (Not a Number).

We can change this behaviour so that, on a mismatch, the original value is kept instead of NaN, or we can drop all the NaN entries.



In [35]:

    
res = a + b
print(res.dropna())









    



c    12
d    12
dtype: float64

Treat missing values as 0



In [37]:

    
res = a.add(b, fill_value=0)
print(res)









    



a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64
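
The same fill_value idea applies to the other arithmetic methods. The sketch below uses sub() and mul() on the same two Series; choosing 1 as the fill value for multiplication is an assumption made here so that unmatched values pass through unchanged.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
b = pd.Series([9, 8, 7, 6], index=["c", "d", "e", "f"])

plain = a + b                        # NaN wherever a label is missing
n_missing = int(plain.isna().sum())  # how many labels did not match

diff = a.sub(b, fill_value=0)        # missing side treated as 0
prod = a.mul(b, fill_value=1)        # missing side treated as 1

print(n_missing)
print(diff)
print(prod)
```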

s.apply(function_name) is used to apply an operation to each element.

Example:

Adding 5 to each element. We can do this simply with series + 5, because a Series is a vector, but let's do it using the new technique, s.apply(function).



In [39]:

    
print(res)









    



a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64



In [40]:

    
print(res + 5)









    



a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64



In [41]:

    
def add_5(x):
    return x+5



In [44]:

    
print(res.apply(add_5))









    



a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64
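
apply() becomes more useful when the operation has no vectorized equivalent. In this sketch, label_size and its threshold parameter are hypothetical names chosen for illustration; the Series repeats the values of res shown above.

```python
import pandas as pd

res = pd.Series([1.0, 2.0, 12.0, 12.0, 7.0, 6.0],
                index=["a", "b", "c", "d", "e", "f"])

def label_size(x, threshold):
    """Hypothetical helper: classify a single value."""
    return "big" if x >= threshold else "small"

# Extra positional arguments are passed to the function through args
labels = res.apply(label_size, args=(7,))

# For simple arithmetic, apply and the vectorized form agree
by_apply = res.apply(lambda x: x + 5)
by_vector = res + 5
same = by_apply.equals(by_vector)

print(labels)
print(same)
```

When a vectorized form exists, as with res + 5, it is usually the faster choice; apply() calls the Python function once per element.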

Plotting

plot() automatically plots the data against the index.



In [47]:

    
%pylab inline
res.plot()









    



Populating the interactive namespace from numpy and matplotlib






    Out[47]:





<matplotlib.axes.AxesSubplot at 0x5746350>
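
Outside a notebook, the same plot can be produced in a plain script. This is only a sketch: it assumes matplotlib is installed, selects the non-interactive Agg backend, and uses a bar chart because the index labels are strings. The title and axis labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
b = pd.Series([9, 8, 7, 6], index=["c", "d", "e", "f"])
res = a.add(b, fill_value=0)

# plot() returns a matplotlib Axes that can be customized further
ax = res.plot(kind="bar", title="a + b with fill_value=0")
ax.set_xlabel("index label")
ax.set_ylabel("value")
```

The figure can then be written to disk with ax.figure.savefig(), passing whatever file name suits the project.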