Title: Selecting Pandas DataFrame Rows Based On Conditions
Slug: pandas_selecting_rows_on_conditions
Summary: Selecting Pandas DataFrame Rows Based On Conditions
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Preliminaries


In [38]:
# Import modules
import pandas as pd
import numpy as np

In [39]:
# Create a dataframe
raw_data = {'first_name': ['Jason', 'Molly', np.nan, np.nan, np.nan],
            'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
            'age': [42, 52, 36, 24, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'nationality', 'age'])
df


Out[39]:
first_name nationality age
0 Jason USA 42
1 Molly USA 52
2 NaN France 36
3 NaN UK 24
4 NaN UK 70

Method 1: Using Boolean Variables


In [40]:
# Create a boolean Series that is True where nationality is USA
american = df['nationality'] == "USA"

# Create a boolean Series that is True where age is greater than 50
elderly = df['age'] > 50

# Select all cases where nationality is USA and age is greater than 50
df[american & elderly]


Out[40]:
first_name nationality age
1 Molly USA 52
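
Because the two variables above are ordinary boolean Series, they can also be combined with `|` (or) and negated with `~` (not). The following is a minimal sketch that rebuilds the same dataframe so it runs on its own; the names `american_or_elderly` and `not_american` are illustrative and not part of the original example.

```python
import pandas as pd
import numpy as np

# Recreate the example dataframe
raw_data = {'first_name': ['Jason', 'Molly', np.nan, np.nan, np.nan],
            'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
            'age': [42, 52, 36, 24, 70]}
df = pd.DataFrame(raw_data, columns=['first_name', 'nationality', 'age'])

# Boolean Series, as in Method 1
american = df['nationality'] == "USA"
elderly = df['age'] > 50

# Select all cases where nationality is USA OR age is greater than 50
american_or_elderly = df[american | elderly]

# Select all cases where nationality is NOT USA
not_american = df[~american]
```

Naming each condition first keeps the final selection readable when several conditions are involved.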

Method 2: Using Variable Attributes


In [41]:
# Select all cases where the first name is not missing and nationality is USA
# (the parentheses are required because & binds more tightly than ==)
df[df['first_name'].notnull() & (df['nationality'] == "USA")]


Out[41]:
first_name nationality age
0 Jason USA 42
1 Molly USA 52
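
The same kind of condition can also be passed to `.loc`, which additionally lets you restrict the columns returned in the same step. This is a sketch of one possible variation rather than part of the original example; the names `not_missing` and `result` are illustrative.

```python
import pandas as pd
import numpy as np

# Recreate the example dataframe
raw_data = {'first_name': ['Jason', 'Molly', np.nan, np.nan, np.nan],
            'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
            'age': [42, 52, 36, 24, 70]}
df = pd.DataFrame(raw_data, columns=['first_name', 'nationality', 'age'])

# Boolean Series that is True where first name is not missing
not_missing = df['first_name'].notnull()

# Select matching rows, keeping only the first_name and age columns
result = df.loc[not_missing & (df['nationality'] == "USA"), ['first_name', 'age']]
```

Here `result` holds the rows for Jason and Molly with only the two requested columns.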