NumPy

NumPy is a library for numerical computing and linear algebra in Python.

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the rank. For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1, because it has one axis. That axis has a length of 3. An array with two axes, such as a matrix, has rank 2 (it is 2-dimensional).
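The terms above map directly onto array attributes. The following is a minimal sketch; the 2x3 `table` values are made up for illustration.

```python
import numpy as np

# A point in 3D space: one axis of length 3.
point = np.array([1, 2, 1])
print(point.ndim, point.shape)    # 1 (3,)

# A hypothetical table with two axes, of lengths 2 and 3.
table = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 2.0]])
print(table.ndim, table.shape)    # 2 (2, 3)
print(table.dtype)                # float64, shared by every element
```

Here `ndim` is the number of axes (the rank) and `shape` gives the length of each axis.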

NumPy is also very fast, because its array operations run in compiled C code rather than in the Python interpreter.

To install NumPy:

sudo pip3 install numpy

NumPy array


In [1]:
import numpy as np 

a = [1,2,3]

a


Out[1]:
[1, 2, 3]

In [2]:
b = np.array(a)
b


Out[2]:
array([1, 2, 3])

In [3]:
np.arange(1, 10)


Out[3]:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
np.arange(1, 10, 2)


Out[4]:
array([1, 3, 5, 7, 9])

zeros, ones and eye

zeros

Return a new array of given shape and type, filled with zeros.


In [5]:
np.zeros(2, dtype=float)


Out[5]:
array([ 0.,  0.])

In [5]:
np.zeros((2,3))


Out[5]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
ones

Return a new array of given shape and type, filled with ones.


In [7]:
np.ones(3)


Out[7]:
array([ 1.,  1.,  1.])
eye

Return a 2-D array with ones on the diagonal and zeros elsewhere.


In [8]:
np.eye(3)


Out[8]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
linspace

Returns num evenly spaced samples, calculated over the interval [start, stop].


In [9]:
np.linspace(1, 11, 3)


Out[9]:
array([  1.,   6.,  11.])
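As a rough comparison with `arange`, the sketch below shows that `linspace` takes the number of samples while `arange` takes the step size. The interval and counts are arbitrary values chosen for illustration.

```python
import numpy as np

# linspace: you choose how many samples; stop is included by default.
with_stop = np.linspace(0, 1, 5)                     # 0, 0.25, 0.5, 0.75, 1
# endpoint=False leaves stop out of the samples.
without_stop = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8
# arange: you choose the step; stop is always excluded.
stepped = np.arange(0, 1, 0.25)                      # 0, 0.25, 0.5, 0.75

print(with_stop)
print(without_stop)
print(stepped)
```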

Random numbers and matrices

rand

Random values in a given shape.


In [10]:
np.random.rand(2)


Out[10]:
array([ 0.11632384,  0.36169799])

In [11]:
np.random.rand(2,3,4)


Out[11]:
array([[[ 0.57504205,  0.77035938,  0.48356283,  0.43705212],
        [ 0.43280501,  0.28349802,  0.71859913,  0.70023941],
        [ 0.23944041,  0.38088474,  0.45425597,  0.1247087 ]],

       [[ 0.47605776,  0.02899426,  0.07707044,  0.72656565],
        [ 0.73339753,  0.87517107,  0.73552799,  0.2346255 ],
        [ 0.68990338,  0.57983998,  0.37863682,  0.03533712]]])
randn

Return a sample (or samples) from the "standard normal" distribution.

  • np.random.standard_normal is similar, but takes a tuple as its argument.

In [12]:
np.random.randn(2,3)


Out[12]:
array([[-1.6991798 , -0.61355368,  0.49392586],
       [ 0.89563615,  1.42702856,  0.97350729]])
random

Return random floats in the half-open interval [0.0, 1.0).


In [13]:
np.random.random()


Out[13]:
0.6852553611099047
randint

Return n random integers (by default one integer) from low (inclusive) to high (exclusive).


In [14]:
np.random.randint(1,50,10)


Out[14]:
array([12, 31, 31, 12,  2, 46, 24, 11, 47,  3])

In [15]:
np.random.randint(1,40)


Out[15]:
24
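The outputs above change on every run. If you need reproducible numbers, one option is NumPy's `Generator` interface, sketched below. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # the seed value is arbitrary

uniform = rng.random((2, 3))            # floats in [0.0, 1.0), like rand(2, 3)
normal = rng.standard_normal((2, 3))    # standard normal, like randn(2, 3)
ints = rng.integers(1, 50, size=10)     # integers in [1, 50), like randint(1, 50, 10)

# Re-creating the generator with the same seed reproduces the same numbers.
again = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(uniform, again))   # True
```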

Shape and Reshape

shape returns the shape of the data, and reshape returns an array containing the same data with a new shape.

In [16]:
zero = np.zeros([3,4])
print(zero , '   ' ,'shape of a :' , zero.shape)
zero = zero.reshape([2,6])
print()
print(zero)


[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]     shape of a : (3, 4)

[[ 0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.]]
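Two details of `reshape` are worth a short sketch: one dimension can be given as -1 so that NumPy infers it, and the total number of elements has to stay the same. The array sizes here are arbitrary.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
b = a.reshape(3, -1)
print(b.shape)            # (3, 4)

# The total number of elements must stay the same: 12 values do not fit 5 x 3.
try:
    a.reshape(5, 3)
except ValueError as err:
    print('cannot reshape:', err)
```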

Basic Operations

Element-wise product and matrix product


In [17]:
number = np.array([[1,2,],
                   [3,4]])
number2 = np.array([[1,3],[2,1]])

print('element wise product :\n',number * number2 )
print('matrix product :\n',number.dot(number2))     ## also can use : np.dot(number, number2)


element wise product :
 [[1 6]
 [6 4]]
matrix product :
 [[ 5  5]
 [11 13]]
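The matrix product can also be written with the `@` operator. A short sketch with the same two matrices as above:

```python
import numpy as np

number = np.array([[1, 2], [3, 4]])
number2 = np.array([[1, 3], [2, 1]])

elementwise = number * number2    # multiplies matching entries
matrix = number @ number2         # matrix product, same as number.dot(number2)

print(elementwise)
print(matrix)
print(np.array_equal(matrix, np.dot(number, number2)))   # True
```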

min, max, argmin, argmax and mean


In [18]:
numbers = np.random.randint(1,100, 10)
print(numbers)
print('max is :', numbers.max())
print('index of max :', numbers.argmax())
print('min is :', numbers.min())
print('index of min :', numbers.argmin())
print('mean :', numbers.mean())


[25 46 17 37 90 17 36 99 68 56]
max is : 99
index of max : 7
min is : 17
index of min : 2
mean : 49.1
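On arrays with more than one axis, these methods accept an `axis` argument. The sketch below uses a small made-up 2x3 array to show the difference.

```python
import numpy as np

grid = np.array([[25, 46, 17],
                 [37, 90, 17]])

print(grid.max())          # 90, over the whole array
print(grid.max(axis=0))    # one maximum per column: 37, 90, 17
print(grid.max(axis=1))    # one maximum per row: 46, 90
print(grid.mean(axis=1))   # one mean per row

# Without axis, argmax is an index into the flattened array.
flat_index = grid.argmax()                            # 4
row, col = np.unravel_index(flat_index, grid.shape)   # row 1, column 1
print(flat_index, row, col)
```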

Universal functions

NumPy also provides functions for element-wise mathematical operations, such as exp, log, sqrt and abs.

For more functions, see the NumPy documentation on universal functions (ufuncs).


In [19]:
number = np.arange(1,10).reshape(3,3)
print(number)
print()
print('exp:\n', np.exp(number))
print()
print('sqrt:\n',np.sqrt(number))


[[1 2 3]
 [4 5 6]
 [7 8 9]]

exp:
 [[  2.71828183e+00   7.38905610e+00   2.00855369e+01]
 [  5.45981500e+01   1.48413159e+02   4.03428793e+02]
 [  1.09663316e+03   2.98095799e+03   8.10308393e+03]]

sqrt:
 [[ 1.          1.41421356  1.73205081]
 [ 2.          2.23606798  2.44948974]
 [ 2.64575131  2.82842712  3.        ]]
dtype

In [20]:
numbers.dtype


Out[20]:
dtype('int64')
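The dtype can be chosen when an array is created, or changed afterwards with `astype`. A minimal sketch (the default integer type depends on the platform, so it is not printed here):

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3], dtype=np.float64)   # request a type at creation
converted = ints.astype(np.float64)              # astype returns a converted copy

print(floats.dtype, converted.dtype)             # float64 float64
# Mixing an integer array with a float gives a float result.
print((ints + 0.5).dtype)                        # float64
```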

No copy, shallow copy and deep copy

  • No copy

    Simple assignments make no copy of array objects or of their data.

In [21]:
number = np.arange(0,20)
number2 = number 
print (number is number2 , id(number), id(number2))
print(number)
number2.shape = (4,5)
print(number)


True 139671397699872 139671397699872
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
  • Shallow copy

    Different array objects can share the same data. The view method creates a new array object that looks at the same data.


In [22]:
number = np.arange(0,20)
number2 = number.view()
print (number is number2 , id(number), id(number2))


False 139671397702032 139671397702432

In [23]:
number2.shape = (5,4)
print('number2 shape:', number2.shape,'\nnumber shape:', number.shape)


number2 shape: (5, 4) 
number shape: (20,)

In [24]:
print('before:', number)
number2[0][0] = 2222
print()
print('after:', number)


before: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

after: [2222    1    2    3    4    5    6    7    8    9   10   11   12   13   14
   15   16   17   18   19]
  • Deep copy

    The copy method makes a complete copy of the array and its data.


In [25]:
number = np.arange(0,20)
number2 = number.copy()
print (number is number2 , id(number), id(number2))


False 139671397701872 139671397732560

In [26]:
print('before:', number)
number2[0] = 10
print()
print('after:', number)
print()
print('number2:',number2)


before: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

after: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

number2: [10  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
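The three cases can be told apart programmatically. The sketch below uses `np.shares_memory` and the `base` attribute; the variable names are chosen for illustration.

```python
import numpy as np

number = np.arange(0, 20)
alias = number           # no copy: the same object
view = number.view()     # shallow copy: new object, same data
copy = number.copy()     # deep copy: new object, new data

print(alias is number)                     # True
print(np.shares_memory(number, view))      # True
print(np.shares_memory(number, copy))      # False
print(view.base is number)                 # True

# Basic slicing also returns a view, so writing to the slice changes the original.
first_five = number[:5]
first_five[0] = 99
print(number[0], copy[0])                  # 99 0
```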

Broadcasting

Broadcasting is one of the most important concepts for understanding NumPy.

It is very useful for performing mathematical operations between arrays of different shapes.


In [27]:
number = np.arange(1,11)
num = 2 
print(' number =', number)
print('\n number .* num =',number * num)


 number = [ 1  2  3  4  5  6  7  8  9 10]

 number .* num = [ 2  4  6  8 10 12 14 16 18 20]

In [28]:
number = np.arange(1,10).reshape(3,3)
number2 = np.arange(1,4).reshape(1,3)
number * number2


Out[28]:
array([[ 1,  4,  9],
       [ 4, 10, 18],
       [ 7, 16, 27]])

In [29]:
number = np.array([1,2,3])
print('number =', number)
print('\nnumber =', number + 100)


number = [1 2 3]

number = [101 102 103]

In [30]:
number = np.arange(1,10).reshape(3,3)
number2 = np.arange(1,4)
print('number: \n', number)
add = number + number2 
print()
print('number2: \n ', number2)
print()
print('add: \n', add)


number: 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

number2: 
  [1 2 3]

add: 
 [[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]
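The examples above follow one rule: shapes are compared from the last axis backwards, and two lengths are compatible when they are equal or one of them is 1. A small sketch, with arrays made up for illustration:

```python
import numpy as np

row = np.arange(1, 4)                   # shape (3,)
column = np.arange(1, 4).reshape(3, 1)  # shape (3, 1)

# (3, 1) combined with (3,) gives (3, 3): each length-1 axis is stretched.
table = column * row
print(table.shape)      # (3, 3)
print(table)            # a small multiplication table

# Lengths 3 and 4 differ and neither is 1, so these cannot be broadcast.
try:
    np.ones((3, 3)) + np.ones(4)
except ValueError as err:
    print('not broadcastable:', err)
```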

If you still doubt why we use Python and NumPy, look at this comparison. 😉


In [31]:
from time import time
a = np.random.rand(8000000, 1)
c = 0
tic = time()
for i in range(len(a)):
    c +=(a[i][0] * a[i][0])
          
print ('output1:', c)
tak = time()

print('multiply 2 matrix with loop: ', tak - tic)

tic = time()
print('output2:', np.dot(a.T, a))
tak = time()


print('multiply 2 matrix with numpy func: ', tak - tic)


output1: 2665834.57759
multiply 2 matrix with loop:  4.754087448120117
output2: [[ 2665834.57759168]]
multiply 2 matrix with numpy func:  0.004221677780151367

I have tried to cover the essentials of NumPy so that you can start coding and enjoy it, but there are many functions that are not covered in this book. If you need more information, see the NumPy documentation.

Pandas

pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

To install pandas:

sudo pip3 install pandas

In [6]:
import pandas as pd

Series


In [33]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

In [34]:
pd.Series(data=my_list)


Out[34]:
0    10
1    20
2    30
dtype: int64

In [35]:
pd.Series(data=my_list,index=labels)


Out[35]:
a    10
b    20
c    30
dtype: int64

In [36]:
pd.Series(d)


Out[36]:
a    10
b    20
c    30
dtype: int64
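The index is what separates a Series from a plain array: values can be read by label, and arithmetic matches values by label rather than by position. A minimal sketch with two made-up Series:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

print(s1['b'])        # 20, access by label

total = s1 + s2       # values are matched by label, not by position
print(total)          # 'a' and 'd' exist in only one Series, so they become NaN
```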

DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary pandas data structure.
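To make "heterogeneous" and "container for Series" concrete, here is a small sketch. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: each column is a Series and may have its own type.
people = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy'],
                       'age': [31, 25, 47],
                       'height_m': [1.70, 1.82, 1.65]})

print(people.dtypes)            # one dtype per column
print(type(people['age']))      # a single column is a pandas Series
print(people.shape)             # (3, 3)
```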


In [7]:
dataframe = pd.DataFrame(np.random.randn(5,4),columns=['A','B','V','D'])

In [8]:
dataframe.head()


Out[8]:
A B V D
0 -1.194072 -0.028099 -0.948889 -1.531588
1 -0.030137 -0.732911 0.604193 1.568273
2 0.263154 -1.560707 1.182750 -0.521581
3 -1.527798 -0.577411 -1.407768 -1.287491
4 -0.733531 0.098774 -1.348715 0.350558

Selection


In [9]:
dataframe['A']


Out[9]:
0   -1.194072
1   -0.030137
2    0.263154
3   -1.527798
4   -0.733531
Name: A, dtype: float64

In [10]:
dataframe[['A', 'D']]


Out[10]:
A D
0 -1.194072 -1.531588
1 -0.030137 1.568273
2 0.263154 -0.521581
3 -1.527798 -1.287491
4 -0.733531 0.350558

Creating a new column


In [11]:
dataframe['E'] = dataframe['A'] + dataframe['B']

In [12]:
dataframe


Out[12]:
A B V D E
0 -1.194072 -0.028099 -0.948889 -1.531588 -1.222172
1 -0.030137 -0.732911 0.604193 1.568273 -0.763048
2 0.263154 -1.560707 1.182750 -0.521581 -1.297553
3 -1.527798 -0.577411 -1.407768 -1.287491 -2.105209
4 -0.733531 0.098774 -1.348715 0.350558 -0.634756

Removing a column


In [14]:
dataframe.drop('E', axis=1)


Out[14]:
A B V D
0 -1.194072 -0.028099 -0.948889 -1.531588
1 -0.030137 -0.732911 0.604193 1.568273
2 0.263154 -1.560707 1.182750 -0.521581
3 -1.527798 -0.577411 -1.407768 -1.287491
4 -0.733531 0.098774 -1.348715 0.350558

In [44]:
dataframe


Out[44]:
A B C D E
0 -0.131864 0.478105 0.759782 -1.163273 0.346242
1 -1.201529 1.419080 -0.180453 0.682591 0.217551
2 -0.538697 0.521623 0.565700 -0.169198 -0.017073
3 0.573575 0.393859 2.964976 1.436765 0.967434
4 -1.053742 1.134712 -0.165858 -0.389600 0.080970

In [45]:
dataframe.drop('E', axis=1, inplace=True)
dataframe


Out[45]:
A B C D
0 -0.131864 0.478105 0.759782 -1.163273
1 -1.201529 1.419080 -0.180453 0.682591
2 -0.538697 0.521623 0.565700 -0.169198
3 0.573575 0.393859 2.964976 1.436765
4 -1.053742 1.134712 -0.165858 -0.389600
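The two `drop` calls above behave differently: without `inplace=True`, `drop` returns a new DataFrame and leaves the original untouched. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'E': [5, 6]})

trimmed = df.drop('E', axis=1)       # returns a new DataFrame
print(list(df.columns))              # ['A', 'B', 'E'], df is unchanged
print(list(trimmed.columns))         # ['A', 'B']

df = df.drop(columns='E')            # reassigning keeps the result
print(list(df.columns))              # ['A', 'B']
```

Reassigning the result is an alternative to `inplace=True` and has the same effect on `df`.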

Selecting rows


In [46]:
dataframe.loc[0]


Out[46]:
A   -0.131864
B    0.478105
C    0.759782
D   -1.163273
Name: 0, dtype: float64

In [47]:
dataframe.iloc[0]


Out[47]:
A   -0.131864
B    0.478105
C    0.759782
D   -1.163273
Name: 0, dtype: float64

In [48]:
dataframe.loc[0 , 'A']


Out[48]:
-0.13186355473715136

In [49]:
dataframe.loc[[0,2],['A', 'C']]


Out[49]:
A C
0 -0.131864 0.759782
2 -0.538697 0.565700
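In the cells above, `loc[0]` and `iloc[0]` return the same row only because the index labels happen to be 0, 1, 2, and so on. The sketch below uses a made-up frame with other labels to show that `loc` selects by label and `iloc` by position.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=[10, 20, 30])

print(df.loc[10, 'A'])     # 1.0, the row labelled 10
print(df.iloc[0, 0])       # 1.0, the first row and first column by position

# Label slices include the end label; position slices exclude the end.
by_label = df.loc[10:20]
by_position = df.iloc[0:1]
print(len(by_label), len(by_position))   # 2 1
```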

Conditional Selection


In [50]:
dataframe > 0.3


Out[50]:
A B C D
0 False True True False
1 False True False True
2 False True True False
3 True True True True
4 False True False False

In [51]:
dataframe[dataframe > 0.3 ]


Out[51]:
A B C D
0 NaN 0.478105 0.759782 NaN
1 NaN 1.419080 NaN 0.682591
2 NaN 0.521623 0.565700 NaN
3 0.573575 0.393859 2.964976 1.436765
4 NaN 1.134712 NaN NaN

In [52]:
dataframe[dataframe['A']>0.3]


Out[52]:
A B C D
3 0.573575 0.393859 2.964976 1.436765

In [53]:
dataframe[dataframe['A']>0.3]['B']


Out[53]:
3    0.393859
Name: B, dtype: float64

In [54]:
dataframe[(dataframe['A']>0.5) & (dataframe['C'] > 0)]


Out[54]:
A B C D
3 0.573575 0.393859 2.964976 1.436765
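Conditions can also be combined with `|` (or) and `~` (not), and a row mask can be passed to `.loc` together with a column label. This is a sketch on a small made-up frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [-0.1, 0.6, 0.9],
                   'B': [0.5, 0.4, -1.0],
                   'C': [0.8, 3.0, -0.2]})

# Row mask and column label in a single .loc call, instead of df[mask]['B'].
selected = df.loc[df['A'] > 0.3, 'B']
print(selected)

# & is "and", | is "or", ~ is "not"; each condition needs its own parentheses.
both = df[(df['A'] > 0.5) & (df['C'] > 0)]
either = df[(df['A'] > 0.5) | (df['C'] > 0)]
neither = df[~((df['A'] > 0.5) | (df['C'] > 0))]
print(len(both), len(either), len(neither))   # 1 3 0
```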

Multi-Index and Index Hierarchy


In [12]:
layer1 = ['g1','g1','g1','g2','g2','g2']
layer2 = [1,2,3,1,2,3]
hier_index = list(zip(layer1,layer2))
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [13]:
hier_index


Out[13]:
MultiIndex(levels=[['g1', 'g2'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])

In [14]:
dataframe2 = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])

In [15]:
dataframe2


Out[15]:
             A         B
g1 1  1.327113 -0.870679
   2  0.258946  1.492455
   3  2.041487 -0.101779
g2 1 -0.465014 -2.738942
   2  0.666121 -1.009980
   3 -0.459053  0.128703

In [58]:
dataframe2.loc['g1']


Out[58]:
A B
1 -0.125270 -0.492899
2 1.272080 0.829754
3 0.532546 0.938678

In [59]:
dataframe2.loc['g1'].loc[1]


Out[59]:
A   -0.125270
B   -0.492899
Name: 1, dtype: float64
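Instead of chaining two `loc` calls, a row of a multi-indexed frame can be selected with a tuple of labels, and `xs` takes a cross-section on an inner level. This is a sketch; the level names `group` and `num` and the data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('g1', 1), ('g1', 2), ('g2', 1), ('g2', 2)],
    names=['group', 'num'])          # hypothetical level names
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['A', 'B'])

print(df.loc[('g1', 2)])             # one row, selected with a tuple of labels
print(df.loc[('g1', 2), 'B'])        # 3

# Rows where the inner level equals 1, taken from every group.
ones = df.xs(1, level='num')
print(ones)
```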

Input and output


In [60]:
titanic = pd.read_csv('Datasets/titanic.csv')

In [61]:
titanic.head()


Out[61]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

In [62]:
titanic.drop('Name', axis=1 , inplace = True)

In [63]:
titanic.head()


Out[63]:
PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [64]:
titanic.to_csv('Datasets/titanic_drop_names.csv')

CSV is one of the most important formats, but pandas is also compatible with many other formats, such as HTML tables, SQL and JSON.
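A small round trip shows how reading and writing fit together. The sketch below keeps everything in memory with `io.StringIO`, so it does not depend on any file on disk; the frame is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'fare': [7.25, 71.2833]})

# index=False keeps the row index out of the CSV, so reading it back
# does not add an extra "Unnamed: 0" column.
csv_text = df.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))            # True

# The same frame as JSON, one object per row.
json_text = df.to_json(orient='records')
print(json_text)
```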

Missing data (NaN)


In [65]:
titanic.head()


Out[65]:
PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 male 35.0 0 0 373450 8.0500 NaN S

In [66]:
titanic.dropna()


Out[66]:
PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 2 1 1 female 38.0 1 0 PC 17599 71.2833 C85 C
3 4 1 1 female 35.0 1 0 113803 53.1000 C123 S
6 7 0 1 male 54.0 0 0 17463 51.8625 E46 S
10 11 1 3 female 4.0 1 1 PP 9549 16.7000 G6 S
11 12 1 1 female 58.0 0 0 113783 26.5500 C103 S
21 22 1 2 male 34.0 0 0 248698 13.0000 D56 S
23 24 1 1 male 28.0 0 0 113788 35.5000 A6 S
27 28 0 1 male 19.0 3 2 19950 263.0000 C23 C25 C27 S
52 53 1 1 female 49.0 1 0 PC 17572 76.7292 D33 C
54 55 0 1 male 65.0 0 1 113509 61.9792 B30 C
62 63 0 1 male 45.0 1 0 36973 83.4750 C83 S
66 67 1 2 female 29.0 0 0 C.A. 29395 10.5000 F33 S
75 76 0 3 male 25.0 0 0 348123 7.6500 F G73 S
88 89 1 1 female 23.0 3 2 19950 263.0000 C23 C25 C27 S
92 93 0 1 male 46.0 1 0 W.E.P. 5734 61.1750 E31 S
96 97 0 1 male 71.0 0 0 PC 17754 34.6542 A5 C
97 98 1 1 male 23.0 0 1 PC 17759 63.3583 D10 D12 C
102 103 0 1 male 21.0 0 1 35281 77.2875 D26 S
110 111 0 1 male 47.0 0 0 110465 52.0000 C110 S
118 119 0 1 male 24.0 0 1 PC 17558 247.5208 B58 B60 C
123 124 1 2 female 32.5 0 0 27267 13.0000 E101 S
124 125 0 1 male 54.0 0 1 35281 77.2875 D26 S
136 137 1 1 female 19.0 0 2 11752 26.2833 D47 S
137 138 0 1 male 37.0 1 0 113803 53.1000 C123 S
139 140 0 1 male 24.0 0 0 PC 17593 79.2000 B86 C
148 149 0 2 male 36.5 0 2 230080 26.0000 F2 S
151 152 1 1 female 22.0 1 0 113776 66.6000 C2 S
170 171 0 1 male 61.0 0 0 111240 33.5000 B19 S
174 175 0 1 male 56.0 0 0 17764 30.6958 A7 C
177 178 0 1 female 50.0 0 0 PC 17595 28.7125 C49 C
... ... ... ... ... ... ... ... ... ... ... ...
737 738 1 1 male 35.0 0 0 PC 17755 512.3292 B101 C
741 742 0 1 male 36.0 1 0 19877 78.8500 C46 S
742 743 1 1 female 21.0 2 2 PC 17608 262.3750 B57 B59 B63 B66 C
745 746 0 1 male 70.0 1 1 WE/P 5735 71.0000 B22 S
748 749 0 1 male 19.0 1 0 113773 53.1000 D30 S
751 752 1 3 male 6.0 0 1 392096 12.4750 E121 S
759 760 1 1 female 33.0 0 0 110152 86.5000 B77 S
763 764 1 1 female 36.0 1 2 113760 120.0000 B96 B98 S
765 766 1 1 female 51.0 1 0 13502 77.9583 D11 S
772 773 0 2 female 57.0 0 0 S.O./P.P. 3 10.5000 E77 S
779 780 1 1 female 43.0 0 1 24160 211.3375 B3 S
781 782 1 1 female 17.0 1 0 17474 57.0000 B20 S
782 783 0 1 male 29.0 0 0 113501 30.0000 D6 S
789 790 0 1 male 46.0 0 0 PC 17593 79.2000 B82 B84 C
796 797 1 1 female 49.0 0 0 17465 25.9292 D17 S
802 803 1 1 male 11.0 1 2 113760 120.0000 B96 B98 S
806 807 0 1 male 39.0 0 0 112050 0.0000 A36 S
809 810 1 1 female 33.0 1 0 113806 53.1000 E8 S
820 821 1 1 female 52.0 1 1 12749 93.5000 B69 S
823 824 1 3 female 27.0 0 1 392096 12.4750 E121 S
835 836 1 1 female 39.0 1 1 PC 17756 83.1583 E49 C
853 854 1 1 female 16.0 0 1 PC 17592 39.4000 D28 S
857 858 1 1 male 51.0 0 0 113055 26.5500 E17 S
862 863 1 1 female 48.0 0 0 17466 25.9292 D17 S
867 868 0 1 male 31.0 0 0 PC 17590 50.4958 A24 S
871 872 1 1 female 47.0 1 1 11751 52.5542 D35 S
872 873 0 1 male 33.0 0 0 695 5.0000 B51 B53 B55 S
879 880 1 1 female 56.0 0 1 11767 83.1583 C50 C
887 888 1 1 female 19.0 0 0 112053 30.0000 B42 S
889 890 1 1 male 26.0 0 0 111369 30.0000 C148 C

183 rows × 11 columns


In [67]:
titanic.dropna(axis=1)


Out[67]:
PassengerId Survived Pclass Sex SibSp Parch Ticket Fare
0 1 0 3 male 1 0 A/5 21171 7.2500
1 2 1 1 female 1 0 PC 17599 71.2833
2 3 1 3 female 0 0 STON/O2. 3101282 7.9250
3 4 1 1 female 1 0 113803 53.1000
4 5 0 3 male 0 0 373450 8.0500
5 6 0 3 male 0 0 330877 8.4583
6 7 0 1 male 0 0 17463 51.8625
7 8 0 3 male 3 1 349909 21.0750
8 9 1 3 female 0 2 347742 11.1333
9 10 1 2 female 1 0 237736 30.0708
10 11 1 3 female 1 1 PP 9549 16.7000
11 12 1 1 female 0 0 113783 26.5500
12 13 0 3 male 0 0 A/5. 2151 8.0500
13 14 0 3 male 1 5 347082 31.2750
14 15 0 3 female 0 0 350406 7.8542
15 16 1 2 female 0 0 248706 16.0000
16 17 0 3 male 4 1 382652 29.1250
17 18 1 2 male 0 0 244373 13.0000
18 19 0 3 female 1 0 345763 18.0000
19 20 1 3 female 0 0 2649 7.2250
20 21 0 2 male 0 0 239865 26.0000
21 22 1 2 male 0 0 248698 13.0000
22 23 1 3 female 0 0 330923 8.0292
23 24 1 1 male 0 0 113788 35.5000
24 25 0 3 female 3 1 349909 21.0750
25 26 1 3 female 1 5 347077 31.3875
26 27 0 3 male 0 0 2631 7.2250
27 28 0 1 male 3 2 19950 263.0000
28 29 1 3 female 0 0 330959 7.8792
29 30 0 3 male 0 0 349216 7.8958
... ... ... ... ... ... ... ... ...
861 862 0 2 male 1 0 28134 11.5000
862 863 1 1 female 0 0 17466 25.9292
863 864 0 3 female 8 2 CA. 2343 69.5500
864 865 0 2 male 0 0 233866 13.0000
865 866 1 2 female 0 0 236852 13.0000
866 867 1 2 female 1 0 SC/PARIS 2149 13.8583
867 868 0 1 male 0 0 PC 17590 50.4958
868 869 0 3 male 0 0 345777 9.5000
869 870 1 3 male 1 1 347742 11.1333
870 871 0 3 male 0 0 349248 7.8958
871 872 1 1 female 1 1 11751 52.5542
872 873 0 1 male 0 0 695 5.0000
873 874 0 3 male 0 0 345765 9.0000
874 875 1 2 female 1 0 P/PP 3381 24.0000
875 876 1 3 female 0 0 2667 7.2250
876 877 0 3 male 0 0 7534 9.8458
877 878 0 3 male 0 0 349212 7.8958
878 879 0 3 male 0 0 349217 7.8958
879 880 1 1 female 0 1 11767 83.1583
880 881 1 2 female 0 1 230433 26.0000
881 882 0 3 male 0 0 349257 7.8958
882 883 0 3 female 0 0 7552 10.5167
883 884 0 2 male 0 0 C.A./SOTON 34068 10.5000
884 885 0 3 male 0 0 SOTON/OQ 392076 7.0500
885 886 0 3 female 0 5 382652 29.1250
886 887 0 2 male 0 0 211536 13.0000
887 888 1 1 female 0 0 112053 30.0000
888 889 0 3 female 1 2 W./C. 6607 23.4500
889 890 1 1 male 0 0 111369 30.0000
890 891 0 3 male 0 0 370376 7.7500

891 rows × 8 columns


In [68]:
titanic.fillna('Fill NaN').head()


Out[68]:
PassengerId Survived Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 male 22 1 0 A/5 21171 7.2500 Fill NaN S
1 2 1 1 female 38 1 0 PC 17599 71.2833 C85 C
2 3 1 3 female 26 0 0 STON/O2. 3101282 7.9250 Fill NaN S
3 4 1 1 female 35 1 0 113803 53.1000 C123 S
4 5 0 3 male 35 0 0 373450 8.0500 Fill NaN S
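Filling every column with the same string is rarely what you want in practice. As a sketch of some alternatives, the made-up frame below borrows two column names from the Titanic data and fills each column with a value that suits its type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [22.0, np.nan, 26.0],
                   'Cabin': [np.nan, 'C85', np.nan]})

missing = df.isna().sum()            # number of missing values per column
print(missing)

# A dict gives each column its own fill value.
filled = df.fillna({'Age': df['Age'].mean(), 'Cabin': 'Unknown'})
print(filled)

# Drop a row only when a chosen column is missing.
with_age = df.dropna(subset=['Age'])
print(with_age)
```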

Concatenating, merging and joining


In [16]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                        index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                        'B': ['B4', 'B5', 'B6', 'B7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D': ['D4', 'D5', 'D6', 'D7']},
                         index=[4, 5, 6, 7]) 

df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                        'B': ['B8', 'B9', 'B10', 'B11'],
                        'C': ['C8', 'C9', 'C10', 'C11'],
                        'D': ['D8', 'D9', 'D10', 'D11']},
                        index=[8, 9, 10, 11])

In [17]:
df1


Out[17]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3

In [18]:
df2


Out[18]:
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7

In [19]:
df3


Out[19]:
A B C D
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11

Concatenation


In [20]:
frames = [df1, df2, df3 ]

In [21]:
pd.concat(frames)
#pd.concat(frames, ignore_index=True)


Out[21]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11

In [22]:
pd.concat(frames, axis=1)


Out[22]:
A B C D A B C D A B C D
0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN
1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN
2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN
3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN
5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN
6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN
7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8
9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9
10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10
11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11

In [23]:
pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result


Out[23]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
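When the frames being concatenated share index labels, the result contains duplicate labels. The sketch below, on two small made-up frames, shows two ways to deal with that: `ignore_index` and `keys`.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3']}, index=[0, 1])

plain = pd.concat([df1, df2])
renumbered = pd.concat([df1, df2], ignore_index=True)
print(plain.index.tolist())          # [0, 1, 0, 1]
print(renumbered.index.tolist())     # [0, 1, 2, 3]

# keys adds an outer index level that records where each row came from.
labelled = pd.concat([df1, df2], keys=['first', 'second'])
print(labelled.loc['second'])
```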

Merging


In [77]:
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
   
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})

In [78]:
left


Out[78]:
A B key
0 A0 B0 K0
1 A1 B1 K1
2 A2 B2 K2
3 A3 B3 K3

In [79]:
right


Out[79]:
C D key
0 C0 D0 K0
1 C1 D1 K1
2 C2 D2 K2
3 C3 D3 K3

In [80]:
pd.merge(left, right, on= 'key')


Out[80]:
A B key C D
0 A0 B0 K0 C0 D0
1 A1 B1 K1 C1 D1
2 A2 B2 K2 C2 D2
3 A3 B3 K3 C3 D3

In [81]:
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                        'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})
    
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                               'key2': ['K0', 'K0', 'K0', 'K0'],
                                  'C': ['C0', 'C1', 'C2', 'C3'],
                                  'D': ['D0', 'D1', 'D2', 'D3']})

In [82]:
pd.merge(left, right, on=['key1', 'key2'])


Out[82]:
A B key1 key2 C D
0 A0 B0 K0 K0 C0 D0
1 A2 B2 K1 K0 C1 D1
2 A2 B2 K1 K0 C2 D2

In [83]:
pd.merge(left, right, how='outer', on=['key1', 'key2'])


Out[83]:
A B key1 key2 C D
0 A0 B0 K0 K0 C0 D0
1 A1 B1 K0 K1 NaN NaN
2 A2 B2 K1 K0 C1 D1
3 A2 B2 K1 K0 C2 D2
4 A3 B3 K2 K1 NaN NaN
5 NaN NaN K2 K0 C3 D3

In [84]:
pd.merge(left, right, how='left', on=['key1', 'key2'])


Out[84]:
A B key1 key2 C D
0 A0 B0 K0 K0 C0 D0
1 A1 B1 K0 K1 NaN NaN
2 A2 B2 K1 K0 C1 D1
3 A2 B2 K1 K0 C2 D2
4 A3 B3 K2 K1 NaN NaN

In [85]:
pd.merge(left, right, how='right', on=['key1', 'key2'])


Out[85]:
A B key1 key2 C D
0 A0 B0 K0 K0 C0 D0
1 A2 B2 K1 K0 C1 D1
2 A2 B2 K1 K0 C2 D2
3 NaN NaN K2 K0 C3 D3
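The `how` argument decides which keys survive the merge. The sketch below runs all four options on two small made-up frames and uses `indicator=True` to show where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K2', 'K3'], 'C': ['C0', 'C2', 'C3']})

# Collect the keys kept by each kind of merge.
keys_by_how = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, how=how, on='key')
    keys_by_how[how] = merged['key'].tolist()
    print(how, keys_by_how[how])

# indicator=True adds a _merge column: left_only, right_only or both.
outer = pd.merge(left, right, how='outer', on='key', indicator=True)
print(outer)
```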

Joining


In [86]:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                      index=['K0', 'K1', 'K2']) 

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                    'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])

In [87]:
left


Out[87]:
A B
K0 A0 B0
K1 A1 B1
K2 A2 B2

In [88]:
right


Out[88]:
C D
K0 C0 D0
K2 C2 D2
K3 C3 D3

In [89]:
left.join(right)


Out[89]:
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
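`join` matches rows on the index and keeps the left frame's index by default. It also accepts a `how` argument, sketched here on two small frames shaped like the ones above.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3']}, index=['K0', 'K2', 'K3'])

default = left.join(right)               # how='left': keeps K0, K1, K2
inner = left.join(right, how='inner')    # only the shared labels K0 and K2
outer = left.join(right, how='outer')    # every label: K0, K1, K2, K3

print(default)
print(inner)
print(outer)
```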