Operations

Many useful pandas operations don't fall into any single category. This lecture collects them in one place:



In [1]:

    
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
df.head()

Info on Unique Values



In [2]:

    
df['col2'].unique()









    Out[2]:





array([444, 555, 666], dtype=int64)



In [3]:

    
df['col2'].nunique()









    Out[3]:





3



In [4]:

    
df['col2'].value_counts()









    Out[4]:





444    2
555    1
666    1
Name: col2, dtype: int64
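The three calls above are related: `unique()` lists the distinct values, `nunique()` counts them, and `value_counts()` counts how often each one appears. The sketch below rebuilds the same DataFrame and also shows the `normalize=True` option, which is not used in the lecture and is included here only as an illustration of getting proportions instead of raw counts.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Raw frequency of each distinct value in col2
counts = df['col2'].value_counts()

# Same tally expressed as a fraction of all rows (illustrative extra)
shares = df['col2'].value_counts(normalize=True)

print(counts)
print(shares)
```

The number of entries in `counts` matches `df['col2'].nunique()`, and the counts add up to the number of rows.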

Selecting Data



In [5]:

    
# Select rows from the DataFrame using criteria on multiple columns
newdf = df[(df['col1'] > 2) & (df['col2'] == 444)]



In [6]:

    
newdf
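To see why this selection works, it can help to build the boolean mask as a separate step. The following is a minimal sketch on the same data; the variable name `mask` is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Each comparison yields a boolean Series aligned with the rows of df.
# The parentheses are required because & binds tighter than > and ==.
mask = (df['col1'] > 2) & (df['col2'] == 444)
print(mask)

# Indexing with the mask keeps only the rows where it is True
newdf = df[mask]
print(newdf)
```

Only the last row satisfies both conditions, so `newdf` contains a single row and keeps its original index label.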

Applying Functions



In [7]:

    
def times2(x):
    return x * 2



In [8]:

    
df['col1'].apply(times2)









    Out[8]:





0    2
1    4
2    6
3    8
Name: col1, dtype: int64
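For a one-off function like `times2`, a lambda can be passed to `apply` instead of a named function. The sketch below also compares the result with plain vectorized arithmetic, which is usually the simpler choice when the operation is just arithmetic; this comparison is an added illustration, not part of the original lecture.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Same result as df['col1'].apply(times2), using an inline lambda
doubled = df['col1'].apply(lambda x: x * 2)

# Vectorized arithmetic on the whole column at once
vectorized = df['col1'] * 2

print(doubled)
print(doubled.equals(vectorized))
```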



In [9]:

    
df['col3'].apply(len)









    Out[9]:





0    3
1    3
2    3
3    3
Name: col3, dtype: int64



In [10]:

    
df['col1'].sum()









    Out[10]:





10

Permanently Removing a Column



In [11]:

    
del df['col1']



In [12]:

    
df
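`del` modifies the DataFrame in place. If the original should be kept, one alternative is `drop`, which returns a new DataFrame by default. A minimal sketch, rebuilding the original three-column frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# drop returns a copy without col1; df itself is left untouched
dropped = df.drop('col1', axis=1)

print(dropped.columns)
print(df.columns)
```

Here `df` still has all three columns, while `dropped` has only `col2` and `col3`.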

Get column and index names:



In [13]:

    
df.columns









    Out[13]:





Index(['col2', 'col3'], dtype='object')



In [14]:

    
df.index









    Out[14]:





RangeIndex(start=0, stop=4, step=1)

Sorting and Ordering a DataFrame:



In [15]:

    
df



In [16]:

    
df.sort_values(by='col2')  # inplace=False by default, so df itself is not modified
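As a small illustration of the sort above, the sketch below sorts the same two-column frame in both directions using the `ascending` parameter and confirms that the original row order is untouched. The variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Ascending sort (the default); each row keeps its original index label
asc = df.sort_values(by='col2')

# Descending sort
desc = df.sort_values(by='col2', ascending=False)

print(asc)
print(desc)
print(df)  # original order is unchanged
```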

Find or Check for Null Values



In [17]:

    
df.isnull()









    Out[17]:







  
    
      
    col2   col3
0  False  False
1  False  False
2  False  False
3  False  False



In [18]:

    
# Drop rows that contain NaN values (this df has none, so no rows are removed)
df.dropna()

Filling in NaN values with something else:



In [19]:

    
import numpy as np



In [20]:

    
df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
df.head()



In [21]:

    
df.fillna('FILL')
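With missing values actually present, the behavior of `dropna` and `fillna` is easier to see. The sketch below uses the same DataFrame and shows dropping rows, dropping columns, and filling one column with its mean. Filling with the mean is an illustrative choice, not something the lecture prescribes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# Keep only rows with no missing values
rows_kept = df.dropna()

# Keep only columns with no missing values
cols_kept = df.dropna(axis=1)

# Fill the gap in col1 with that column's mean (NaN is skipped by mean())
filled = df['col1'].fillna(df['col1'].mean())

print(rows_kept)
print(cols_kept)
print(filled)
```

None of these calls change `df`; each returns a new object.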



Pivot Tables



In [22]:

    
data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}

df = pd.DataFrame(data)



In [23]:

    
df



In [24]:

    
df.pivot_table(values = 'D',
               index = ['A', 'B'],
               columns = ['C'])
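The pivot table above places each (A, B) pair on a row and each value of C in a column; combinations that never occur in the data come out as NaN. The sketch below reproduces it and then, as an added illustration, passes `aggfunc` and `fill_value` to sum the values and replace the missing cells with 0.

```python
import pandas as pd

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# Default aggregation is the mean of D for each (A, B) / C cell
pivoted = df.pivot_table(values='D', index=['A', 'B'], columns='C')

# Illustrative variant: sum instead of mean, with 0 in place of NaN
summed = df.pivot_table(values='D', index=['A', 'B'], columns='C',
                        aggfunc='sum', fill_value=0)

print(pivoted)
print(summed)
```

Cells are addressed with a tuple for the row, for example `pivoted.loc[('foo', 'one'), 'x']`.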

Great Job!