Pandas Munging Exercises



In [ ]:

    
# 1. Import pandas as pd and read_csv() simple.csv into a dataframe 'df' (optionally, auto-convert dates).
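
A minimal sketch of the read step, assuming a tiny in-memory CSV in place of simple.csv (whose real contents are not shown here). The column names Date and Count come from the later exercises; the values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for simple.csv; the second row has a missing Count.
csv_text = "Date,Count\n2016-01-04,5\n2016-01-05,\n"

# parse_dates asks read_csv to convert the named column to datetimes on load.
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df_demo.dtypes)
```

With a real file, the path string would take the place of the StringIO object.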



In [ ]:

    
# 2. Use df.dropna() to drop any sample that contains any Na/NaN values.



In [ ]:

    
# 3. Use df.dropna() with the subset keyword argument to drop those rows without a Count.
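
The difference between exercises 2 and 3 can be sketched on a small hypothetical frame (the values below are invented, not taken from simple.csv): a plain dropna() removes a row if any column is missing, while subset limits the check to the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row lacks a Count, another lacks a Date.
sample = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-02", None, "2016-01-04"],
    "Count": [5.0, np.nan, 7.0, 12.0],
})

dropped_any = sample.dropna()                    # drops both incomplete rows
dropped_count = sample.dropna(subset=["Count"])  # drops only the row without a Count

print(len(dropped_any), len(dropped_count))  # → 2 3
```

Neither call modifies the original frame; each returns a new one.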



In [ ]:

    
# 4. Use df.fillna() to fill the NaN values in Count with 0.



In [ ]:

    
# 5. Use pd.to_numeric() to convert "Weird Count" to numbers. If that raises an error, retry with the keyword errors='coerce'.
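
A rough illustration of the two behaviors, using a hypothetical Series rather than the real "Weird Count" column: the strict call raises on an unparseable value, and errors='coerce' turns that value into NaN instead.

```python
import pandas as pd

# Hypothetical values; "three" cannot be parsed as a number.
weird = pd.Series(["1", "2", "three", "4"])

try:
    pd.to_numeric(weird)
except ValueError as exc:
    print("conversion failed:", exc)

# errors='coerce' replaces anything unparseable with NaN.
coerced = pd.to_numeric(weird, errors="coerce")
print(coerced.isna().tolist())  # → [False, False, True, False]
```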



In [ ]:

    
# 6. Use .str.replace('\t', '') on the "Weird Date" column to delete any tabs.



In [ ]:

    
# 7. Use .str.partition(',')[2] to chop the leading weekday and comma from "Weird Date", and store the result in a new column.



In [ ]:

    
# 8. Use .str.strip() to remove any whitespace in this new column.
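
Exercises 6 through 8 chain together naturally. The sketch below assumes date strings shaped like "Weekday, Month D, YYYY" with stray tabs and spaces; the actual contents of the Weird Date column may differ.

```python
import pandas as pd

# Hypothetical messy dates with tabs and trailing whitespace.
raw = pd.Series(["Monday,\t January 4, 2016 ", "Tuesday, January 5,\t 2016"])

no_tabs = raw.str.replace("\t", "", regex=False)  # literal tab, not a regex
after_comma = no_tabs.str.partition(",")[2]        # everything after the first comma
cleaned = after_comma.str.strip()                  # trim leading/trailing whitespace

print(cleaned.tolist())  # → ['January 4, 2016', 'January 5, 2016']
```

str.partition splits only at the first comma, which is why the comma inside the date itself survives.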



In [ ]:

    
# 9. Use pd.to_datetime() to convert the weirdly formatted dates in Less Weird Dates to pandas datetimes.



In [ ]:

    
# 10. Convert the nice pandas dates to month-long periods using df[].dt.to_period().
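
Exercises 9 and 10 might look like the following on hypothetical input. The explicit format string is an assumption about how the cleaned dates are written, and the %B month names assume an English locale.

```python
import pandas as pd

# Hypothetical cleaned date strings.
less_weird = pd.Series(["January 4, 2016", "February 15, 2016"])

# An explicit format avoids guessing; %B is the full month name.
parsed = pd.to_datetime(less_weird, format="%B %d, %Y")

# "M" requests monthly periods, so the day of the month is discarded.
months = parsed.dt.to_period("M")
print(months.astype(str).tolist())  # → ['2016-01', '2016-02']
```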



In [ ]:

    
# 11. Convert Count to an int using the column's "astype()" method.
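
One thing to watch for here: NaN cannot be stored in a plain integer column, so astype(int) fails while missing values remain. A small sketch with invented values, filling with 0 first as in exercise 4:

```python
import numpy as np
import pandas as pd

# Hypothetical Count values with one missing entry.
counts = pd.Series([5.0, np.nan, 12.0])

try:
    counts.astype(int)
except ValueError as exc:
    print("cannot convert:", exc)

# Filling the gap first makes the integer conversion possible.
as_int = counts.fillna(0).astype(int)
print(as_int.tolist())  # → [5, 0, 12]
```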



In [ ]:

    
# 12. Import numpy as np. Run pd.isnull(np.nan). Run None == np.nan. Run np.nan == np.nan. What does that tell you? (NumPy 2.0 removed the older np.NaN spelling.)
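
The three checks can be run side by side. This sketch uses the lowercase np.nan spelling, which works across NumPy versions.

```python
import numpy as np
import pandas as pd

is_missing = pd.isnull(np.nan)      # pandas' missing-value test
none_equals_nan = None == np.nan    # equality against None
nan_equals_nan = np.nan == np.nan   # NaN compared with itself

print(is_missing, none_equals_nan, nan_equals_nan)  # → True False False
```

Because NaN never compares equal to anything, including itself, missing values should be detected with pd.isnull() (or .isna()) rather than with ==.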



In [ ]:

    
# 13. Do a database style inner join of the df and a copy (df.copy()) of the dataframe on Date using pd.merge()



In [ ]:

    
# 14. Combine the df with a copy of the df using concat, effectively stacking the df on top of itself.
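
Exercises 13 and 14 contrast joining side by side with stacking end to end. A sketch on a hypothetical two-row frame; ignore_index=True is an optional choice made here so the stacked result gets a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical frame with unique dates.
left = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02"]),
    "Count": [5, 7],
})
right = left.copy()

# Inner join on Date: overlapping column names get _x / _y suffixes.
joined = pd.merge(left, right, on="Date", how="inner")

# concat stacks the rows of one frame under the other.
stacked = pd.concat([left, right], ignore_index=True)

print(joined.shape, stacked.shape)  # → (2, 3) (4, 2)
```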



In [ ]:

    
# 15. Convert Count to string type using .astype(). If it fails, try the errors='ignore' argument (it replaced the older raise_on_error=False).



In [ ]:

    
# 16. Bin the values in Count in groups of 10. 0-9, 10-19, 20-29, etc. using pd.cut.
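
One possible way to get 0-9, 10-19, 20-29 bins is to pass explicit edges with right=False, so each bin includes its left edge and excludes its right edge. The values and label strings below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical counts, including values that sit on bin boundaries.
counts = pd.Series([0, 9, 10, 19, 25])

edges = list(range(0, 40, 10))        # [0, 10, 20, 30]
labels = ["0-9", "10-19", "20-29"]    # one label per interval

# right=False makes the intervals [0, 10), [10, 20), [20, 30).
binned = pd.cut(counts, bins=edges, right=False, labels=labels)
print(binned.astype(str).tolist())  # → ['0-9', '0-9', '10-19', '10-19', '20-29']
```

The last edge must be greater than the largest value, otherwise that value falls outside every bin and becomes NaN.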



In [ ]:

    
# 17. Use pd.read_csv to read the CFPB CSV into dataframe 'df2'.



In [ ]:

    
# 18. Filter df2 down to the columns ['Product', 'Sub-product', 'Complaint ID', 'Date received'].



In [ ]:

    
# 19. Call set_index() with ['Product', 'Sub-product'] and assign the result to df3.
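
The column selection and two-level index from exercises 18 and 19 can be sketched on a made-up frame. Only the four column names come from the exercise; the rows and the extra State column are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the CFPB data.
complaints = pd.DataFrame({
    "Product": ["Mortgage", "Credit card"],
    "Sub-product": ["FHA mortgage", "General-purpose credit card"],
    "Complaint ID": [101, 102],
    "Date received": ["2016-01-04", "2016-01-05"],
    "State": ["CA", "NY"],
})

# A list of names inside the brackets selects several columns at once.
subset = complaints[["Product", "Sub-product", "Complaint ID", "Date received"]]

# Passing a list to set_index builds a MultiIndex from those columns.
indexed = subset.set_index(["Product", "Sub-product"])
print(indexed.index.names)
```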

Answers