In [1]:
import pandas as pd
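# NumPy 2.0 removed the NaN alias; importing nan under the old name keeps the cells below working
from numpy import nan as NaN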
Create a sample DataFrame with some missing values.
In [4]:
df = pd.DataFrame({
    'colA': ['aaa', NaN, NaN, NaN, 'bbb', 'ccc'],
    'colB': ['xxx', 'yyy', NaN, 'zzz', NaN, 'www'],
    #'colC': [NaN, 3, NaN, 1, 0, 9]
})
In [5]:
df
Out[5]:
Task: replace missing values in column colA with those in colB (if they exist).
First we define a boolean filtering expression ("condition") cond that marks the rows whose values we want to fill in. In this case the simpler condition cond = df.colA.isnull() would suffice, because it doesn't matter whether the value in colB is also missing (we would just replace NaN with NaN), but for the sake of illustration let's use this slightly more complicated expression.
In [13]:
cond = df.colA.isnull() & ~df.colB.isnull()
cond
Out[13]:
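As a quick check of the claim that the simpler condition gives the same result, here is a minimal sketch on a fresh copy of the sample data. The helper fill_from_b and the names sample, simple and strict are made up for this illustration and are not part of the notebook above.

```python
import pandas as pd
from numpy import nan

def fill_from_b(frame, mask):
    """Hypothetical helper: return a copy with colA taken from colB where mask is True."""
    out = frame.copy()
    out.loc[mask, 'colA'] = out.loc[mask, 'colB']
    return out

sample = pd.DataFrame({
    'colA': ['aaa', nan, nan, nan, 'bbb', 'ccc'],
    'colB': ['xxx', 'yyy', nan, 'zzz', nan, 'www'],
})

simple = sample.colA.isnull()                            # colA missing
strict = sample.colA.isnull() & ~sample.colB.isnull()    # colA missing and colB present

# DataFrame.equals treats NaNs in the same position as equal
same = fill_from_b(sample, simple).equals(fill_from_b(sample, strict))
print(same)
```

The two masks differ only in the row where both columns are missing, and there the simpler mask just writes NaN over NaN.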
We can use this to extract the matching rows if we wish.
In [14]:
df[cond]
Out[14]:
Now we can do the assignment. Note that we use the .loc indexer, which selects the rows and the column in a single step. A chained expression such as df[cond]['colA'] = df[cond]['colB'] would instead trigger a warning about "trying to set values on a copy of a slice from a DataFrame", and the assignment would land on a temporary copy rather than on df itself.
(See http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy for details.)
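To make the difference concrete, the following sketch runs both forms on a separate DataFrame and counts the missing values in colA after each. The names demo and mask are chosen for this illustration only, and the warning is silenced just so the demo runs quietly.

```python
import warnings

import pandas as pd
from numpy import nan

demo = pd.DataFrame({
    'colA': ['aaa', nan, nan, nan, 'bbb', 'ccc'],
    'colB': ['xxx', 'yyy', nan, 'zzz', nan, 'www'],
})
mask = demo.colA.isnull() & ~demo.colB.isnull()

# Chained indexing: demo[mask] builds a new object, so the assignment
# lands on that temporary and demo itself stays unchanged.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the pandas warning for this demo only
    demo[mask]['colA'] = demo[mask]['colB']
missing_after_chained = int(demo.colA.isnull().sum())

# Single .loc call: rows and column are selected in one step on demo itself.
demo.loc[mask, 'colA'] = demo.loc[mask, 'colB']
missing_after_loc = int(demo.colA.isnull().sum())

print(missing_after_chained, missing_after_loc)
```

After the chained form colA still has all three missing values; after the .loc form only the row where colB is also missing remains empty.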
In [18]:
df.loc[cond, 'colA'] = df.loc[cond, 'colB']
The resulting DataFrame does indeed have the values yyy and zzz filled into column colA.
In [19]:
df
Out[19]:
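As a cross-check, the same result can also be obtained with Series.fillna, which accepts another Series and aligns it on the index. This is a different technique from the mask-based assignment used above, shown here only as a sketch; the names check, expected, cond_check and filled are illustrative.

```python
import pandas as pd
from numpy import nan

check = pd.DataFrame({
    'colA': ['aaa', nan, nan, nan, 'bbb', 'ccc'],
    'colB': ['xxx', 'yyy', nan, 'zzz', nan, 'www'],
})

# Reference result using the mask-based .loc assignment
expected = check.copy()
cond_check = expected.colA.isnull() & ~expected.colB.isnull()
expected.loc[cond_check, 'colA'] = expected.loc[cond_check, 'colB']

# Alternative: fill the gaps in colA directly from colB
filled = check.colA.fillna(check.colB)

# Series.equals treats NaNs in the same position as equal
print(filled.equals(expected.colA))
```

The mask-based version is more general, since the condition can encode any rule, while fillna covers only the "take the other column where this one is missing" case.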