Basics of NumPy & Pandas

NumPy

NumPy's core data structure is the array, whereas pandas builds on it with the Series.



In [2]:

    
import numpy as np

Arrays are similar to Python lists, but all elements of an array must have the same data type, and array operations are generally faster than the equivalent list operations.
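As a small illustration of the same-data-type rule, the sketch below builds an array from mixed integers and floats. The values and the name `mixed` are made up for this example; the point is only that NumPy converts every element to one common type.

```python
import numpy as np

# Hypothetical example values: two ints and one float
mixed = np.array([1, 2, 3.5])

# NumPy picks a single dtype that can hold every element
print(mixed.dtype)   # float64
print(mixed)
```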



In [13]:

    
num = np.array([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num









    Out[13]:





array([  3,   4,   2,   5,   7,  23,  56,  23,   7,  23,  89,  43, 676,  43])

Let's look at some of the functionality.



In [17]:

    
print "Mean :",num.mean()
print "sum :",num.sum()
print "max :",num.max()
print "std :",num.std()









    



Mean : 71.7142857143
sum : 1004
max : 676
std : 169.340919269



In [18]:

    
# Slicing: the first five elements
num[:5]









    Out[18]:





array([3, 4, 2, 5, 7])
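Slicing accepts the same start:stop:step forms as Python lists. The following is a minimal sketch on the same array; the names `middle`, `last_three` and `every_other` are chosen here for illustration only.

```python
import numpy as np

num = np.array([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

middle = num[2:6]       # elements at positions 2, 3, 4 and 5
last_three = num[-3:]   # a negative start counts from the end
every_other = num[::2]  # a step of 2 skips alternate elements

print(middle)
print(last_three)
print(every_other)
```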



In [19]:

    
# Find the index of an element, for example the maximum
print "index of max :",num.argmax()









    



index of max : 12
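argmax only covers the maximum. To find the positions of an arbitrary value, one option is np.where on a boolean condition. This is a sketch rather than the only way to do it; the value 23 is picked simply because it appears several times in the array.

```python
import numpy as np

num = np.array([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

# argmax gives the position of the largest value; use it to read the value back
idx = num.argmax()
print("{} {}".format(idx, num[idx]))

# np.where on a condition returns a tuple of index arrays;
# element 0 holds the positions where the value equals 23
positions = np.where(num == 23)[0]
print(positions)
```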



In [21]:

    
print "data Type of array :",num.dtype









    



data Type of array : int32

Vector Operations



In [22]:

    
a=np.array([5,6,15])
b=np.array([5,4,-5])



In [26]:

    
# Addition
print "{} + {} = {}".format(a,b,a+b)









    



[ 5  6 15] + [ 5  4 -5] = [10 10 10]



In [27]:

    
# Multiplication
print "{} * {} = {}".format(a,b,a*b)









    



[ 5  6 15] * [ 5  4 -5] = [ 25  24 -75]



In [28]:

    
# Division (integer division here, because both arrays hold integers under Python 2)
print "{} / {} = {}".format(a,b,a/b)









    



[ 5  6 15] / [ 5  4 -5] = [ 1  1 -3]
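The quotients above are whole numbers because the cell runs under Python 2 (note the print statements), where / between integer arrays divides without keeping the fraction. To make the intent explicit regardless of Python version, NumPy provides separate functions for the two behaviours. A minimal sketch:

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5])

# Floor division: each quotient is rounded down to an integer
floor_result = np.floor_divide(a, b)
print(floor_result)

# True division: keeps the fractional part, so the result is a float array
true_result = np.true_divide(a, b)
print(true_result)
```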



In [34]:

    
# A size mismatch raises an error
b=np.array([5,4,-5,5])
print "{} + {} = {}".format(a,b,a+b)



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-ca4423c15efb> in <module>()
      1 # A size mismatch raises an error
      2 b=np.array([5,4,-5,5])
----> 3 print "{} + {} = {}".format(a,b,a+b)

ValueError: operands could not be broadcast together with shapes (3,) (4,)
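If a mismatch like this is possible in real code, it can be handled rather than left to crash. The sketch below shows two options under simple assumptions: comparing shapes up front (stricter than NumPy's broadcasting rules, which also allow some unequal shapes) and catching the ValueError. The name `total` is illustrative.

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5, 5])

# Option 1: compare shapes before operating
if a.shape == b.shape:
    total = a + b
else:
    total = None
    print("shape mismatch: {} vs {}".format(a.shape, b.shape))

# Option 2: attempt the operation and catch the ValueError
try:
    total = a + b
except ValueError as err:
    print("could not add: {}".format(err))
```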

Vector [+-*/] Scalar



In [30]:

    
print "{} + {} = {}".format(a,3,a+3)









    



[ 5  6 15] + 3 = [ 8  9 18]



In [31]:

    
print "{} * {} = {}".format(a,3,a*3)









    



[ 5  6 15] * 3 = [15 18 45]



In [32]:

    
print "{} / {} = {}".format(a,3,a/3)









    



[ 5  6 15] / 3 = [1 2 5]

Vector & Boolean Vector



In [36]:

    
num=np.array([5,6,15,65,32,656,23,435,2,45,21])
bl=np.array([False,True,True,False,True,False,True,False,True,True,False])



In [37]:

    
num[6]









    Out[37]:





23

What will num[bl] return?

It returns an array of the values of num at the positions where the corresponding element of bl is True.



In [40]:

    
num[bl]









    Out[40]:





array([ 6, 15, 32, 23,  2, 45])

Find all elements of num greater than 100.



In [41]:

    
num[num>100]









    Out[41]:





array([656, 435])

And all elements less than 50?



In [42]:

    
num[num<50]









    Out[42]:





array([ 5,  6, 15, 32, 23,  2, 45, 21])
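Conditions like these can also be combined. The following sketch uses the element-wise operators & (and) and | (or) on the same array; the thresholds and the names `between` and `extremes` are chosen only for illustration.

```python
import numpy as np

num = np.array([5, 6, 15, 65, 32, 656, 23, 435, 2, 45, 21])

# Parentheses are required: & and | bind more tightly than < and >
between = num[(num > 10) & (num < 50)]
print(between)

# Elements that are either very small or very large
extremes = num[(num < 5) | (num > 100)]
print(extremes)
```

Python's plain `and` / `or` do not work here, because they try to reduce a whole array to a single truth value.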

In-place operations in NumPy (difference between += and +)



In [45]:

    
a=np.array([5,6,15])
b=a
a += 2
print b
print "This happens because a and b both point to the same array, and += is an in-place operation, so b sees the change"



[ 7  8 17]
This happens because a and b both point to the same array, and += is an in-place operation, so b sees the change



In [47]:

    
a=np.array([5,6,15])
b=a
a = a + 2
print b

Here b still prints [ 5  6 15]. This happens because a and b initially point to the same array, but the + operation creates a new array and a is then rebound to it, so b remains unaffected.



In [49]:

    
a=np.array([5,6,15])
# A slice is a view onto the same data, not a copy
b=a[:3]
b[0]=1000
print a,"Reason is similar as +="









    



[1000    6   15] Reason is similar as +=
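When this sharing is not wanted, an explicit copy breaks the link. The sketch below contrasts a slice with a copy and uses np.shares_memory to confirm which one is tied to the original; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([5, 6, 15])

view = a[:3]                # slice: shares memory with a
independent = a[:3].copy()  # explicit copy: owns its own data

independent[0] = 1000       # does not touch a
print(a)

view[0] = -1                # writes through to a
print(a)

print(np.shares_memory(a, view))
print(np.shares_memory(a, independent))
```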

Pandas Series

The basics are the same as for a NumPy array, but a pandas Series adds a lot of extra functionality, such as an index and built-in summary methods.



In [51]:

    
import pandas as pd



In [53]:

    
num = pd.Series([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num









    Out[53]:





0       3
1       4
2       2
3       5
4       7
5      23
6      56
7      23
8       7
9      23
10     89
11     43
12    676
13     43
dtype: int64
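The left-hand column in the output is the index of the Series. As a rough sketch of how the array techniques carry over, the example below applies arithmetic and boolean indexing to the same Series; note that the filtered result keeps the original index labels. The names `doubled` and `big` are illustrative.

```python
import pandas as pd

num = pd.Series([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

# Vectorized arithmetic works element by element, as with arrays
doubled = num * 2

# Boolean indexing keeps the index labels of the matching elements
big = num[num > 50]
print(big)
```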

See all the basic summary statistics at once using the describe() function.



In [54]:

    
num.describe()









    Out[54]:





count     14.000000
mean      71.714286
std      175.733377
min        2.000000
25%        5.500000
50%       23.000000
75%       43.000000
max      676.000000
dtype: float64
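Note that the std reported here (175.733377) differs from the 169.340919269 that NumPy printed earlier for the same numbers. The two libraries use different defaults: NumPy's std divides by n (ddof=0), while pandas divides by n - 1 (ddof=1). A minimal sketch that reproduces both values, with `arr` and `ser` as illustrative names:

```python
import numpy as np
import pandas as pd

values = [3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43]
arr = np.array(values)
ser = pd.Series(values)

# NumPy default: population standard deviation (ddof=0)
print(arr.std())
# pandas default: sample standard deviation (ddof=1)
print(ser.std())

# Passing ddof explicitly makes each library reproduce the other's number
print(arr.std(ddof=1))
print(ser.std(ddof=0))
```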