Basics of NumPy & Pandas

NumPy

NumPy works with arrays, whereas pandas works with Series, which are built on top of NumPy arrays.


In [2]:
import numpy as np

Arrays are similar to Python lists, but all elements of an array must be of the same data type, and arrays are faster than lists for numerical operations.
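
As a small illustration of the "same data type" rule, the sketch below (written for Python 3, with variable names chosen only for this example) shows how NumPy picks one common dtype when the input mixes integers and floats.

```python
import numpy as np

# A Python list can hold values of different types side by side.
py_list = [1, 2, 3.5]

# An array cannot: the integers are upcast so every element is a float.
mixed = np.array(py_list)
print(mixed.dtype)        # float64

# With only integers in the input, the array gets an integer dtype.
ints = np.array([1, 2, 3])
print(ints.dtype.kind)    # i  (some signed integer type)
```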


In [13]:
num = np.array([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num


Out[13]:
array([  3,   4,   2,   5,   7,  23,  56,  23,   7,  23,  89,  43, 676,  43])

Let's look at some of the functionality


In [17]:
print "Mean :",num.mean()
print "sum :",num.sum()
print "max :",num.max()
print "std :",num.std()


Mean : 71.7142857143
sum : 1004
max : 676
std : 169.340919269

In [18]:
# Slicing: take the first five elements
num[:5]


Out[18]:
array([3, 4, 2, 5, 7])

In [19]:
# Find the index of an element, for example the maximum
print "index of max :",num.argmax()


index of max : 12
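
The index returned by argmax() can be fed straight back into the array. The following sketch (Python 3; the `ties` array is made up for this example) also shows that argmax() reports only the first position when the maximum occurs more than once.

```python
import numpy as np

num = np.array([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

idx = num.argmax()
print(idx, num[idx])      # 12 676

# Hypothetical array with a repeated maximum: only the first index is returned.
ties = np.array([1, 9, 9])
print(ties.argmax())      # 1
```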

In [21]:
print "data Type of array :",num.dtype


data Type of array : int32
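
The int32 reported here is not guaranteed everywhere: the default integer dtype depends on the platform and the NumPy version. If a specific dtype matters, it can be requested explicitly, as in this minimal Python 3 sketch.

```python
import numpy as np

# Ask for a 64-bit integer array regardless of the platform default.
num = np.array([3, 4, 2, 5, 7], dtype=np.int64)
print(num.dtype)          # int64

# astype() returns a new array with the requested dtype; num is unchanged.
as_float = num.astype(np.float64)
print(as_float.dtype)     # float64
```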

Vector Operations


In [22]:
a=np.array([5,6,15])
b=np.array([5,4,-5])

In [26]:
# Addition
print "{} + {} = {}".format(a,b,a+b)


[ 5  6 15] + [ 5  4 -5] = [10 10 10]

In [27]:
print "{} * {} = {}".format(a,b,a*b)


[ 5  6 15] * [ 5  4 -5] = [ 25  24 -75]

In [28]:
print "{} / {} = {}".format(a,b,a/b)


[ 5  6 15] / [ 5  4 -5] = [ 1  1 -3]
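
The integer results above come from Python 2, where / on integer arrays performs floor division. Under Python 3 the same expression performs true division, and // gives the floored result. A minimal sketch, assuming Python 3:

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5])

# True division: the result is a float array.
true_div = a / b
print(true_div.tolist())    # [1.0, 1.5, -3.0]

# Floor division: reproduces the integer output shown above.
floor_div = a // b
print(floor_div.tolist())   # [1, 1, -3]
```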

In [34]:
# If the sizes do not match, an error occurs
b=np.array([5,4,-5,5])
print "{} + {} = {}".format(a,b,a+b)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-ca4423c15efb> in <module>()
      1 # If the sizes do not match, an error occurs
      2 b=np.array([5,4,-5,5])
----> 3 print "{} + {} = {}".format(a,b,a+b)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
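
One way to deal with this error is to compare the shapes before combining the arrays, or to catch the ValueError. The sketch below (Python 3; the fallback of setting `total` to None is just a choice made for this example) shows both.

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5, 5])

# The shapes differ, which is why element-wise addition fails.
print(a.shape, b.shape)     # (3,) (4,)

try:
    total = a + b
except ValueError as err:
    total = None            # illustrative fallback only
    print("cannot add:", err)
```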

Vector [+-*/] Scalar


In [30]:
print "{} + {} = {}".format(a,3,a+3)


[ 5  6 15] + 3 = [ 8  9 18]

In [31]:
print "{} * {} = {}".format(a,3,a*3)


[ 5  6 15] * 3 = [15 18 45]

In [32]:
print "{} / {} = {}".format(a,3,a/3)


[ 5  6 15] / 3 = [1 2 5]
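
A scalar operand behaves as if it were stretched to an array of the same shape as the vector. The following Python 3 sketch checks that equivalence; the helper array `threes` exists only for this illustration.

```python
import numpy as np

a = np.array([5, 6, 15])

# An explicit array of threes with the same shape as a.
threes = np.full(a.shape, 3)

# Adding the scalar gives the same result as adding the full array.
same = np.array_equal(a + 3, a + threes)
print(same)                 # True
print((a * 3).tolist())     # [15, 18, 45]
```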

Vector & Boolean Vector


In [36]:
num=np.array([5,6,15,65,32,656,23,435,2,45,21])
bl=np.array([False,True,True,False,True,False,True,False,True,True,False])

In [37]:
num[6]


Out[37]:
23

What will num[bl] return?

It returns an array of the values in num at the positions where the corresponding element of bl is True.


In [40]:
num[bl]


Out[40]:
array([ 6, 15, 32, 23,  2, 45])
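
To make the selection rule concrete, the Python 3 sketch below converts the boolean vector into the integer positions of its True entries and shows that indexing with those positions gives the same result.

```python
import numpy as np

num = np.array([5, 6, 15, 65, 32, 656, 23, 435, 2, 45, 21])
bl = np.array([False, True, True, False, True, False,
               True, False, True, True, False])

# Positions at which bl is True.
positions = np.flatnonzero(bl)
print(positions.tolist())   # [1, 2, 4, 6, 8, 9]

# Indexing with those positions selects the same values as num[bl].
selected = num[positions]
print(selected.tolist())    # [6, 15, 32, 23, 2, 45]
```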

Find all elements greater than 100 in num


In [41]:
num[num>100]


Out[41]:
array([656, 435])

All elements less than 50?


In [42]:
num[num<50]


Out[42]:
array([ 5,  6, 15, 32, 23,  2, 45, 21])
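
Comparisons like these produce boolean arrays, so they can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than < and >. A minimal Python 3 sketch:

```python
import numpy as np

num = np.array([5, 6, 15, 65, 32, 656, 23, 435, 2, 45, 21])

# Elements strictly between 20 and 50.
between = num[(num > 20) & (num < 50)]
print(between.tolist())     # [32, 23, 45, 21]

# Negating a mask: elements that are NOT less than 50.
not_small = num[~(num < 50)]
print(not_small.tolist())   # [65, 656, 435]
```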

In-place operations in NumPy (difference between += and +)


In [45]:
a=np.array([5,6,15])
b=a
a += 2
print b
print "This happens because a and b both point to the same array, and += is an in-place operation that modifies that array"


[ 7  8 17]
This happens because a and b both point to the same array, and += is an in-place operation that modifies that array

In [47]:
a=np.array([5,6,15])
b=a
a = a + 2
print b


[ 5  6 15]
This happens because a and b initially point to the same array, but the + operation creates a new array and a is then rebound to it, so b remains unaffected.
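
The difference between the two cases can be checked directly with the `is` operator, which tests whether two names refer to the same object. A sketch, assuming Python 3:

```python
import numpy as np

a = np.array([5, 6, 15])
b = a

# In-place: the existing array is modified, so a and b stay the same object.
a += 2
same_after_inplace = a is b
print(same_after_inplace, b.tolist())   # True [7, 8, 17]

# Rebinding: a + 2 builds a new array and a now refers to it; b does not.
a = a + 2
same_after_rebind = a is b
print(same_after_rebind, a.tolist(), b.tolist())   # False [9, 10, 19] [7, 8, 17]
```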

In [49]:
a=np.array([5,6,15])
b=a[:3]
b[0]=1000
print a,"The reason is similar to +=: the slice b is a view of the same data as a"


[1000    6   15] The reason is similar to +=: the slice b is a view of the same data as a
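
When the original array should stay untouched, an explicit copy() breaks the link. The Python 3 sketch below contrasts a slice (a view) with a copy; the names `view` and `independent` are chosen only for this example.

```python
import numpy as np

a = np.array([5, 6, 15])

view = a[:3]                 # basic slicing returns a view of a's data
independent = a[:3].copy()   # copy() allocates separate data

# Writing to the copy leaves a unchanged.
independent[0] = 1000
print(a.tolist())            # [5, 6, 15]

# Writing to the view changes a as well.
view[0] = -1
print(a.tolist())            # [-1, 6, 15]

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```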

Pandas Series

The basics are the same as for a NumPy array, but a pandas Series also offers a lot of additional functionality.


In [51]:
import pandas as pd

In [53]:
num = pd.Series([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num


Out[53]:
0       3
1       4
2       2
3       5
4       7
5      23
6      56
7      23
8       7
9      23
10     89
11     43
12    676
13     43
dtype: int64
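
To illustrate that the array basics carry over, the following Python 3 sketch applies the same operations used earlier (aggregation, locating the maximum, boolean selection) to the Series. Note that the selected values keep their original index labels.

```python
import pandas as pd

num = pd.Series([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

print(num.sum(), num.max())   # 1004 676

# idxmax() returns the index label of the maximum.
print(num.idxmax())           # 12

# Boolean selection works as with arrays.
big = num[num > 50]
print(big.tolist())           # [56, 89, 676]
print(big.index.tolist())     # [6, 10, 12]
```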

See all the basic statistics at once using the describe() function


In [54]:
num.describe()


Out[54]:
count     14.000000
mean      71.714286
std      175.733377
min        2.000000
25%        5.500000
50%       23.000000
75%       43.000000
max      676.000000
dtype: float64
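
The std reported here (175.733377) differs from the NumPy result earlier (169.340919269) because the defaults differ: NumPy divides by n (ddof=0), while pandas divides by n - 1 (ddof=1). A minimal Python 3 sketch showing that the two agree once ddof is set explicitly:

```python
import numpy as np
import pandas as pd

values = [3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43]
arr = np.array(values)
ser = pd.Series(values)

population_std = arr.std()   # NumPy default: ddof=0
sample_std = ser.std()       # pandas default: ddof=1

# Passing ddof explicitly makes each library reproduce the other's value.
print(abs(arr.std(ddof=1) - sample_std) < 1e-9)       # True
print(abs(ser.std(ddof=0) - population_std) < 1e-9)   # True
```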