pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Its data manipulation capabilities are built on top of the NumPy library, so NumPy is a dependency of pandas.
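As a small illustration of that relationship, the sketch below checks that the values of a Series can be exposed as a NumPy array and passed to NumPy functions. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A small Series of integers (illustrative data)
s = pd.Series([1, 2, 3])

# to_numpy() exposes the values as a NumPy array
values = s.to_numpy()
print(type(values))

# NumPy functions can be applied to a Series directly
total = np.sum(s)
print(total)
```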
In this notebook we'll try out various pandas methods and learn more about pandas along the way.
Please follow this link. All the necessary steps are mentioned here.
In [43]:
import numpy as np
import pandas as pd
In [45]:
seriesLabel = ['label1', 'label2', 'label3']
exampleList = [5, 10, 20]
In [46]:
pd.Series(exampleList)
Out[46]:
In [47]:
pd.Series(exampleList, seriesLabel)
Out[47]:
In [48]:
exampleNumpyArray = np.array([6, 12, 18])
In [49]:
pd.Series(exampleNumpyArray)
Out[49]:
In [50]:
pd.Series(exampleNumpyArray, seriesLabel)
Out[50]:
In [51]:
exampleDictionary = { 'label4': 7, 'label5': 14, 'label6': 21 }
In [52]:
# No index argument is needed: the dictionary keys become the labels
pd.Series(exampleDictionary)
Out[52]:
In [53]:
# If the labels passed do not match the dictionary keys, the values become NaN
pd.Series(exampleDictionary, seriesLabel)
Out[53]:
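The behaviour of the last cell can be checked with a small, self-contained sketch. When an index is passed along with a dictionary, pandas looks up each label among the dictionary keys; labels that are not found get NaN. The names `mismatched` and `partial` are made up for this example.

```python
import pandas as pd

example_dict = {'label4': 7, 'label5': 14, 'label6': 21}

# None of these labels are keys of the dictionary, so every value is NaN
mismatched = pd.Series(example_dict, index=['label1', 'label2', 'label3'])

# A label that matches a key keeps its value; the other one is NaN
partial = pd.Series(example_dict, index=['label4', 'label1'])

print(mismatched)
print(partial)
```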
In [54]:
def sampleFunc1():
    pass
def sampleFunc2():
    pass
def sampleFunc3():
    pass
pd.Series(data=[sampleFunc1, sampleFunc2, sampleFunc3])
Out[54]:
In [55]:
pd.Series(['a', 2, 'hey'])
Out[55]:
The second parameter, index, supplies the labels for the series. It can be passed by position or by keyword.
In [56]:
pd.Series(data=[sampleFunc1, sampleFunc2, sampleFunc3], index=['a', 'b', 'c'])
Out[56]:
In [57]:
pd.Series(['a', 2, 'hey'], ['label', 2, 'key'])
Out[57]:
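Once a series has labels, its values can be looked up either by label or by integer position. A minimal sketch, reusing the label names from the earlier cells:

```python
import pandas as pd

s = pd.Series([5, 10, 20], index=['label1', 'label2', 'label3'])

# Lookup by label
by_label = s['label2']

# Lookup by integer position (0-based)
by_position = s.iloc[1]

print(by_label, by_position)
```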
In [71]:
pd.DataFrame(data = np.random.randint(1,51, (4,3)), index = ['row1', 'row2', 'row3', 'row4'], columns = ['col1', 'col2', 'col3'])
Out[71]:
In [73]:
dataFrame = pd.DataFrame(data = np.random.randint(1,51, (4,3)), index = ['row1', 'row2', 'row3', 'row4'], columns = ['col1', 'col2', 'col3'])
dataFrame
Out[73]:
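Because the data above is random, the values change on every run. If reproducible output is wanted, one option is a seeded NumPy generator. This is a sketch, not what the cells above do: the seed value 42 and the names `make_frame`, `df` and `df_again` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    # A seeded generator returns the same integers on every run
    rng = np.random.default_rng(seed)
    data = rng.integers(1, 51, size=(4, 3))  # integers from 1 to 50 inclusive
    return pd.DataFrame(
        data,
        index=['row1', 'row2', 'row3', 'row4'],
        columns=['col1', 'col2', 'col3'],
    )

df = make_frame(42)
df_again = make_frame(42)
print(df)
```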
Selecting a single column
In [74]:
dataFrame['col1']
Out[74]:
Selecting multiple columns
In [75]:
dataFrame[['col1', 'col2']]
Out[75]:
Creation of new columns using arithmetic operators
In [76]:
dataFrame['newCol1'] = dataFrame['col3'] - dataFrame['col2']
dataFrame
Out[76]:
In [77]:
dataFrame['newCol2'] = dataFrame['col1'] * dataFrame['col3']
dataFrame
Out[77]:
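The arithmetic above is applied element by element, row by row. The sketch below repeats it on fixed data so the results can be checked by hand, and also shows `assign`, which returns a new DataFrame instead of modifying the original. The sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [10, 20]})

# Element-wise subtraction, stored as a new column on df itself
df['newCol1'] = df['col3'] - df['col2']

# assign returns a new DataFrame with the extra column; df is left as it was
with_product = df.assign(newCol2=df['col1'] * df['col3'])

print(df)
print(with_product)
```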
Removal of columns
In [78]:
# axis -> 0 means that we are targeting the rows
# axis -> 1 means that we are targeting the columns
dataFrame.drop('newCol1', axis=1)
Out[78]:
In [79]:
# the column is still there: drop returned a new DataFrame and left this one unchanged
dataFrame
Out[79]:
In [80]:
# By default pandas protects us from accidentally dropping columns
# In order to delete the column from dataFrame itself, pass inplace=True
dataFrame.drop('newCol1', axis=1, inplace=True)
dataFrame
Out[80]:
In [81]:
dataFrame.drop('newCol2', axis=1, inplace=True)
dataFrame
Out[81]:
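The difference between the two behaviours can be seen on a small frame. The sketch below also uses the `columns=` keyword, which is an equivalent way of writing `axis=1`. The data and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'newCol1': [5, 6]})

# Without inplace=True, drop returns a new DataFrame and df is untouched
dropped = df.drop('newCol1', axis=1)

# columns= is an equivalent spelling of axis=1
dropped_kw = df.drop(columns=['newCol1'])

print(df.columns.tolist())
print(dropped.columns.tolist())
```

Assigning the result back (`df = df.drop(...)`) is a common alternative to `inplace=True`.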
Selecting a single row
In [82]:
dataFrame.loc['row1']
Out[82]:
Selecting multiple rows
In [83]:
dataFrame.loc[['row1', 'row2']]
Out[83]:
Selecting rows by integer position (0-based)
In [84]:
dataFrame.iloc[1]
Out[84]:
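To make the difference between `loc` (labels) and `iloc` (integer positions) concrete, here is a short sketch on fixed data. Note that label slices include the end label, while positional slices exclude the end position. The sample frame is made up.

```python
import pandas as pd

df = pd.DataFrame({'col1': [10, 20, 30]}, index=['row1', 'row2', 'row3'])

# The same row, selected by label and by position
by_label = df.loc['row2']
by_position = df.iloc[1]

# Label slices are inclusive of the end label
label_slice = df.loc['row1':'row2']

# Positional slices exclude the end position, as in plain Python
position_slice = df.iloc[0:1]

print(label_slice)
print(position_slice)
```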
Removal of rows
In [85]:
dataFrame.drop('row1', axis=0)
Out[85]:
In [86]:
# again, drop returned a new DataFrame; the row is still present here
dataFrame
Out[86]:
In [87]:
# we should use 'inplace' to drop the row
# dataFrame.drop('row1', axis=0, inplace=True)
# dataFrame
Selecting both columns and rows
In [88]:
dataFrame.loc['row1', 'col2']
Out[88]:
In [89]:
dataFrame.loc[['row1', 'row2', 'row3'],['col2', 'col3']]
Out[89]:
In [90]:
dataFrame.iloc[0,1]
Out[90]:
In [91]:
dataFrame.iloc[[0,1]]
Out[91]:
In [92]:
dataFrame
Out[92]:
In [93]:
dataFrame > 10
Out[93]:
Instead of True and False values, we can also get the actual values where the condition is satisfied. Values that do not satisfy the condition are shown as NaN.
In [94]:
dataFrame[dataFrame > 10]
Out[94]:
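A small sketch of this masking on fixed data, so the NaN positions are predictable. The frame keeps its shape; only the values that fail the condition are replaced. The sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({'col1': [5, 15], 'col2': [25, 8]}, index=['row1', 'row2'])

# Boolean DataFrame with the same shape as df
mask = df > 10

# Values where the mask is False become NaN
masked = df[mask]

print(mask)
print(masked)
```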
We can also filter rows using a condition on a single column
In [102]:
dataFrame[dataFrame['col2'] > 10]
Out[102]:
We can also output only the columns that we want from the filtered rows
In [103]:
dataFrame[dataFrame['col2'] > 10]['col3']
Out[103]:
In [105]:
dataFrame[dataFrame['col2'] > 10][['col1', 'col3']]
Out[105]:
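The two-step selection above (filter the rows, then pick the columns) can also be written as a single `loc` call that takes the row condition and the column list together. A sketch on made-up data, showing that both forms give the same result:

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [5, 15, 25], 'col3': [7, 8, 9]},
    index=['row1', 'row2', 'row3'],
)

# Two steps: filter rows, then select columns
chained = df[df['col2'] > 10][['col1', 'col3']]

# One step: row condition and column list in a single loc call
single = df.loc[df['col2'] > 10, ['col1', 'col3']]

print(single)
```

The single `loc` form is generally the safer choice when the selection is later used for assignment.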
To apply conditions on multiple columns, wrap each condition in parentheses and combine them with & (and):
In [109]:
dataFrame[(dataFrame['col1'] > 10) & (dataFrame['col3'] > 10)]
Out[109]:
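Conditions can be combined in the same way with `|` (or) and negated with `~` (not). The parentheses around each comparison are required because these operators bind more tightly than `>`. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [5, 20, 30, 2], 'col3': [40, 3, 50, 4]},
    index=['row1', 'row2', 'row3', 'row4'],
)

# Rows where both conditions hold
both = df[(df['col1'] > 10) & (df['col3'] > 10)]

# Rows where at least one condition holds
either = df[(df['col1'] > 10) | (df['col3'] > 10)]

# Rows where neither condition holds
neither = df[~((df['col1'] > 10) | (df['col3'] > 10))]

print(both)
print(either)
print(neither)
```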
Note: This notebook is not complete; more content will be added soon.