In [ ]:
import pandas as pd
In [ ]:
df = pd.read_csv('../data/airbnb_input.csv', index_col='room_id')
The df DataFrame contains data about Airbnb rooms in Lisbon.
Go ahead and preview it!
In [ ]:
# code to preview df
Here, we are selecting the first row and getting it as a pandas Series:
In [ ]:
df.iloc[0]
But you can also get a single row as a DataFrame:
In [ ]:
df.iloc[[0]]
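To make the difference between the two calls concrete, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are hypothetical stand-ins, not rows from the Airbnb file. Passing an integer to iloc returns a Series, while passing a list returns a DataFrame, even when the list has a single element.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by room_id like df.
toy = pd.DataFrame(
    {
        "room_type": ["Private room", "Entire home/apt", "Shared room"],
        "price": [30.0, 80.0, 15.0],
    },
    index=pd.Index([101, 102, 103], name="room_id"),
)

as_series = toy.iloc[0]    # a single integer returns a Series
as_frame = toy.iloc[[0]]   # a list of integers returns a DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

The Series keeps the row's index label as its name, and the column names become its index.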
Now try to get rows 0, 1 and 3, using iloc:
In [ ]:
# code to select rows 0, 1 and 3 from df
Slices also work with iloc. Try to select all the rows from position 2 to position 7, inclusive (keep in mind that the end of an iloc slice is excluded):
In [ ]:
# code to select rows [2, 7] from df
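If the slice boundaries are confusing, the following sketch on a hypothetical ten-row frame (made-up prices, not the Airbnb data) shows how they behave. An iloc slice follows Python's convention and excludes the end position, whereas a loc slice over labels includes both ends.

```python
import pandas as pd

# Hypothetical toy data: ten rows with labels 100..109.
toy = pd.DataFrame(
    {"price": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]},
    index=pd.Index(range(100, 110), name="room_id"),
)

by_position = toy.iloc[2:7]   # positions 2, 3, 4, 5, 6 (end excluded)
by_label = toy.loc[102:107]   # labels 102 through 107 (end included)

print(len(by_position))
print(len(by_label))
```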
Here, we're selecting the row that corresponds to room 17031 (the row whose index label is 17031):
In [ ]:
df.loc[17031]
If you try to select a row using an index label that doesn't exist, you'll get a KeyError:
In [ ]:
df.loc[[1]]
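One way to handle missing labels, sketched here on a hypothetical toy frame, is to catch the KeyError or to check membership in the index before calling loc. The names `toy`, `caught` and `missing` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by room_id like df.
toy = pd.DataFrame(
    {"price": [30.0, 80.0, 15.0]},
    index=pd.Index([101, 102, 103], name="room_id"),
)

# Option 1: catch the error raised for a label that is not in the index.
caught = False
try:
    toy.loc[[1]]
except KeyError:
    caught = True

# Option 2: check which requested labels are missing before selecting.
requested = [1, 101]
missing = [label for label in requested if label not in toy.index]
present = toy.loc[[label for label in requested if label in toy.index]]

print(caught, missing)
```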
Try to select the rows that correspond to rooms 17031 and 25659:
In [ ]:
# code to select rooms 17031 and 25659
Now check what happens when you pass a boolean array to loc!
In [ ]:
# code to call loc with a boolean array
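As a rough illustration of what to expect, this sketch uses a hypothetical three-row frame. When loc receives a boolean list or Series with one entry per row, it keeps the rows marked True and drops the rest.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by room_id like df.
toy = pd.DataFrame(
    {"price": [30.0, 80.0, 15.0]},
    index=pd.Index([101, 102, 103], name="room_id"),
)

# A plain boolean list: one value per row, in row order.
from_list = toy.loc[[True, False, True]]

# A boolean Series built from a comparison, aligned on the index.
from_series = toy.loc[toy["price"] > 20]

print(list(from_list.index))
print(list(from_series.index))
```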
You can select columns using dot notation, like this:
In [ ]:
df.room_type
Or using brackets, like this:
In [ ]:
df['room_type']
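Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute. The sketch below uses a hypothetical frame with a column named `size` to show the collision; the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: "size" clashes with the DataFrame.size attribute,
# and "room type" contains a space, so neither is reachable as a column
# through dot notation.
toy = pd.DataFrame(
    {"size": [1, 2, 3], "room type": ["Private", "Entire", "Shared"]}
)

attribute = toy.size        # number of cells in the frame, not the column
column = toy["size"]        # brackets always return the column
spaced = toy["room type"]   # brackets also handle names with spaces

print(attribute)
print(type(column).__name__)
```

Brackets are therefore the safer default when column names are not under your control.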
Now, try to select column room_type from df, but get the result as a DataFrame instead of a Series:
In [ ]:
# code to get column room_type from df as a DataFrame
And now try to select columns room_type and neighborhood:
In [ ]:
# code to select columns room_type and neighborhood
If you're feeling confident, here's a little challenge: select column room_type using loc!
In [ ]:
# code to select column room_type using loc
The mask method can be used to "hide" the rows that satisfy a certain condition by replacing their values with NaN:
In [ ]:
df.mask(df.overall_satisfaction == 5.0)
And with the where method, you can hide the rows that don't satisfy a certain condition:
In [ ]:
df.where(df.overall_satisfaction == 5.0)
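The sketch below contrasts the two methods on a hypothetical three-row frame with made-up values. Both return a frame with the same shape as the original and fill the hidden rows with NaN, unlike boolean indexing, which drops the rows that do not match.

```python
import pandas as pd

# Hypothetical toy data (made-up values), indexed by room_id like df.
toy = pd.DataFrame(
    {"overall_satisfaction": [5.0, 4.5, 5.0], "price": [30.0, 80.0, 15.0]},
    index=pd.Index([101, 102, 103], name="room_id"),
)

cond = toy["overall_satisfaction"] == 5.0

masked = toy.mask(cond)    # rows where cond is True become NaN
kept = toy.where(cond)     # rows where cond is False become NaN
filtered = toy[cond]       # rows where cond is False are dropped

print(masked.shape, kept.shape, filtered.shape)
```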
Here we're selecting only the rows of rooms in the Alvalade neighborhood:
In [ ]:
df[df.neighborhood == 'Alvalade']
Check the DataFrame's data types by using its dtypes attribute:
In [ ]:
df.dtypes
To select columns based on their data type, use the select_dtypes method:
In [ ]:
df.select_dtypes(include=['float64'])
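Besides a list of exact dtype names, select_dtypes also accepts the generic string "number" and an exclude parameter. Here is a minimal sketch on a hypothetical frame; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with one text, one integer and one float column.
toy = pd.DataFrame(
    {
        "room_type": ["Private room", "Entire home/apt", "Shared room"],
        "reviews": [3, 10, 0],
        "price": [30.0, 80.0, 15.0],
    }
)

floats = toy.select_dtypes(include=["float64"])   # only float64 columns
numeric = toy.select_dtypes(include="number")     # integers and floats
non_numeric = toy.select_dtypes(exclude="number") # everything else

print(list(floats.columns))
print(list(numeric.columns))
print(list(non_numeric.columns))
```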
In [ ]:
df.to_csv('../data/airbnb_output.csv')
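By default, to_csv writes the index as the first column, which is why the file can later be read back with index_col='room_id'. The round trip can be sketched with an in-memory buffer instead of a file; the `toy` frame below is hypothetical and the buffer is only there to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical toy data (made-up values), indexed by room_id like df.
toy = pd.DataFrame(
    {"room_type": ["Private room", "Shared room"], "price": [30.0, 15.0]},
    index=pd.Index([101, 103], name="room_id"),
)

buffer = io.StringIO()
toy.to_csv(buffer)   # the index is written as the "room_id" column
buffer.seek(0)       # rewind so the buffer can be read from the start

restored = pd.read_csv(buffer, index_col="room_id")

print(restored.index.name)
```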