DAT210x - Programming with Python for DS

Module2 - Lab5

Import and alias Pandas:


In [ ]:
# .. your code here ..

As per usual, load up the specified dataset, setting appropriate header labels.
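
As a rough sketch of what setting header labels can look like, the snippet below loads a tiny in-memory CSV that has no header row. The data, the column names, and the use of io.StringIO in place of a real file path are all made up for illustration; substitute the file and labels this lab specifies.

```python
import io
import pandas as pd

# Hypothetical stand-in for the lab's data file: three rows, no header line.
raw_text = "39,Bachelors,Male,40\n50,HS-grad,Female,13\n38,Masters,Male,45\n"

# Hypothetical column labels; replace them with the ones the lab specifies.
column_names = ["age", "education", "sex", "hours_per_week"]

# header=None tells pandas that the first line is data, not labels;
# names= supplies the labels to use instead.
df = pd.read_csv(io.StringIO(raw_text), header=None, names=column_names)

print(df.head())
```

With a real file you would pass its path as the first argument instead of the io.StringIO object.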


In [ ]:
# .. your code here ..

Excellent.

Now, use basic pandas commands to look through the dataset. Get a feel for it before proceeding!

Does the data type of each column match the values you see when you open the file in a text editor or spreadsheet program? If you see object where you expect to see int32 or float64, that is a good indicator that the column contains a string, a missing value, or an otherwise erroneous value.
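
To illustrate how a single stray value changes a column's type, here is a minimal sketch using an invented two-column dataset in which one "age" entry is the placeholder "?". The column names and values are assumptions made for the example, not the lab's data.

```python
import io
import pandas as pd

# Hypothetical data: the second row uses "?" where a number is expected.
raw_text = "age,hours\n39,40\n?,13\n38,45\n"
df = pd.read_csv(io.StringIO(raw_text))

# A quick look at the first rows and at the inferred type of each column.
print(df.head())
print(df.dtypes)

# Because of the "?", pandas cannot parse 'age' as a number and falls back
# to a non-numeric type (shown as object in most pandas versions).
# 'hours' contains only integers, so it is inferred as an integer column.
age_is_numeric = pd.api.types.is_numeric_dtype(df["age"])
hours_is_numeric = pd.api.types.is_numeric_dtype(df["hours"])
print(age_is_numeric, hours_is_numeric)
```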


In [ ]:
# .. your code here ..

Try using your_data_frame['your_column'].unique() or, equivalently, your_data_frame.your_column.unique() to see the unique values of each column and identify the rogue values.

If you find any values that should be encoded as NaN, you can convert them either by passing the na_values parameter when loading the dataframe or by using one of the other methods discussed in the reading.
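
The sketch below shows one way this could play out on the same kind of invented data, where "?" is the rogue value. The dataset is hypothetical. pd.to_numeric with errors="coerce" is shown as one commonly used alternative, and it may or may not be one of the methods the reading covers.

```python
import io
import pandas as pd

# Hypothetical data with a "?" placeholder in the 'age' column.
raw_text = "age,hours\n39,40\n?,13\n38,45\n"

# Step 1: list the unique values to spot anything that is not a number.
df = pd.read_csv(io.StringIO(raw_text))
print(df["age"].unique())

# Option A: tell read_csv up front which strings mean "missing".
clean = pd.read_csv(io.StringIO(raw_text), na_values=["?"])
print(clean.dtypes)

# Option B: convert after loading; anything unparseable becomes NaN.
coerced = pd.to_numeric(df["age"], errors="coerce")
print(coerced)
```

Either way, the column ends up as a floating-point type, because NaN is itself a floating-point value.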


In [ ]:
# .. your code here ..

Look through your data and identify any potential categorical features. Ensure you properly encode any ordinal and nominal types using the methods discussed in the chapter.

Be careful! Some features can be represented as either categorical or continuous (numerical). If you ever get confused, ask yourself which generally makes more sense: representing the feature with a continuous numeric type, or as a series of categories?
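
As a hedged illustration of the two kinds of encoding, the snippet below treats an invented "education" column as ordinal and an invented "sex" column as nominal. The column names, the values, and the chosen ordering are assumptions made for the example, and the chapter may present these techniques slightly differently.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data with one ordinal and one nominal feature.
df = pd.DataFrame({
    "education": ["HS-grad", "Masters", "Bachelors", "HS-grad"],
    "sex": ["Male", "Female", "Female", "Male"],
})

# Ordinal: the categories have a natural order, so spell that order out
# and replace each value with its position in the ordering.
education_order = ["HS-grad", "Bachelors", "Masters"]
education_type = CategoricalDtype(categories=education_order, ordered=True)
df["education"] = df["education"].astype(education_type).cat.codes

# Nominal: there is no meaningful order, so use one indicator column
# per category instead of a single integer code.
df = pd.get_dummies(df, columns=["sex"])

print(df)
```

Integer codes imply that one category is "greater" than another, which is why they suit ordinal features but can mislead a model when applied to nominal ones.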


In [ ]:
# .. your code here ..

Lastly, print out your dataframe!


In [ ]:
# .. your code here ..