In [ ]:
import pandas as pd
import numpy as np
In [ ]:
# Example dataframe
df = pd.DataFrame({'Age': [20, 18, 25, 55, 125, 30],
                   'Height': [165, 189, 359, 149, 175, 163]})
df
Masks are useful for selecting the parts of our dataframe that meet specific conditions. For instance, people younger than 30:
In [ ]:
my_mask = df['Age'] < 30
my_mask
... or people with an exact age:
In [ ]:
my_mask = df['Age'] == 55
my_mask
Or, if we want people with age 0 or above and below 115:
In [ ]:
my_mask = (df['Age'] >= 0) & (df['Age'] < 115)
my_mask
This is our mask! When you compare a dataframe column against a value, you get back a boolean Series that is True for the rows that satisfy your conditions and False for the rest. Let us see our last mask in practice, where one of the rows is filtered out:
In [ ]:
df[my_mask]
In [ ]:
df.loc[my_mask]
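To make the relationship between a mask and the rows it selects more concrete, here is a minimal sketch that rebuilds the example data and inspects both. The names `example`, `age_mask`, `kept_brackets` and `kept_loc` are ours, chosen so that running this cell does not overwrite `df` or `my_mask`.

```python
import pandas as pd

# Same data as the example dataframe, under a different (hypothetical) name
example = pd.DataFrame({'Age': [20, 18, 25, 55, 125, 30],
                        'Height': [165, 189, 359, 149, 175, 163]})

# The mask is a boolean Series with one True/False entry per row
age_mask = (example['Age'] >= 0) & (example['Age'] < 115)
print(age_mask.dtype)        # bool
print(age_mask.tolist())     # [True, True, True, True, False, True]

# Both selection styles keep exactly the rows where the mask is True
kept_brackets = example[age_mask]
kept_loc = example.loc[age_mask]
print(kept_brackets.equals(kept_loc))   # True
print(kept_brackets.index.tolist())     # [0, 1, 2, 3, 5]
```

Note the parentheses around each comparison: `&` binds more tightly than `>=` and `<`, so leaving them out changes how the expression is evaluated.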
Well, our initial dataframe df is still...
In [ ]:
df
... the same, since we haven't changed it yet! So far we have only looked at filtered selections of the dataframe. Let us drop row 4, the one with Age=125:
In [ ]:
df = df[my_mask]
df
But we still have a person who looks too tall to be true. Let's do something about it and trim her to 155!
In [ ]:
mask = df['Height'] == 359
df[mask]['Height'] = 155
df
We got a warning! Maybe we shouldn't have trimmed that person down!!
Actually, it's not that... The problem is that we are (or might be) trying to assign a value (155) to a temporary copy of the dataframe instead of the dataframe itself, so the change can be silently lost. This can become a hidden problem if we disregard the warning. Explaining it in full would take more time than we have, but I recommend taking a look at the warning's link. Always pay attention to warnings. If you don't know what they mean, Google them.
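A small sketch can show why the warning matters. It assumes a fresh dataframe called `people` (a hypothetical name, holding the same rows we have left after dropping Age=125) and silences the warning only so that the output stays short. The exact warning raised depends on the pandas version, but in either case the original dataframe should come out unchanged.

```python
import warnings
import pandas as pd

# Hypothetical stand-in for our filtered dataframe
people = pd.DataFrame({'Age': [20, 18, 25, 55, 30],
                       'Height': [165, 189, 359, 149, 163]})
tall = people['Height'] == 359

with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # hide the warning for this demo only
    # people[tall] builds a new, temporary dataframe;
    # the assignment lands on that temporary object
    people[tall]['Height'] = 155

# The original dataframe still holds the old value
print(people.loc[2, 'Height'])   # 359
```

The assignment ran without an error, yet nothing changed in `people`, which is exactly the kind of hidden problem the warning is trying to flag.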
The solution is to use .loc[], which is primarily label based (e.g., using 'Age', 'Height') but also accepts a boolean array (which is what we want). I would also recommend taking a look at this post.
In [ ]:
df.loc[df['Height'] == 359, 'Height'] = 155
df
And here we have our dataframe without extreme heights and with our ages within a specified range. By the way, if you want to invert your mask in a pythonic way, you just need the ~ operator:
In [ ]:
~my_mask
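As a rough illustration of what the inverted mask is good for, the sketch below uses it to look at the rows that the age filter rejects. The names `example`, `age_mask` and `rejected` are ours and rebuild the original six-row data, so the cell runs on its own.

```python
import pandas as pd

# Original six-row example data, under a hypothetical name
example = pd.DataFrame({'Age': [20, 18, 25, 55, 125, 30],
                        'Height': [165, 189, 359, 149, 175, 163]})

age_mask = (example['Age'] >= 0) & (example['Age'] < 115)

# ~ flips every True to False and vice versa, element by element
rejected = example[~age_mask]
print(rejected)

# Number of rows that fail the age check
print((~age_mask).sum())   # 1
```

Python's `not` keyword does not work here: `not age_mask` raises a ValueError because the truth value of a whole Series is ambiguous, so `~` is the operator to use.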