In [ ]:
# 1. Import pandas as pd and read_csv() simple.csv into a dataframe 'df' (optionally, auto-convert dates).
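A minimal sketch of step 1. Because simple.csv is not reproduced here, the example reads a small hypothetical CSV from an in-memory string; the column names `Date` and `Count` are assumptions. With the real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of simple.csv.
csv_text = "Date,Count\n2016-01-01,1\n2016-01-02,\n"

# parse_dates asks read_csv to convert the named column to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)
```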
In [ ]:
# 2. Use df.dropna() to drop any sample that contains any Na/NaN values.
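One possible shape for step 2, using a hypothetical three-row frame in place of simple.csv. Note that `dropna()` returns a new DataFrame rather than modifying `df` in place.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 lacks a Count, row 2 lacks a Date.
df = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-02", None],
    "Count": [1.0, np.nan, 3.0],
})

# Drop every row that has a missing value in any column.
complete = df.dropna()
print(complete)
```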
In [ ]:
# 3. Use df.dropna() with the subset keyword argument to drop those rows without a Count.
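A sketch of step 3 on the same kind of hypothetical data. The `subset` argument limits the missing-value check to the listed columns, so a row with a missing Date but a valid Count is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 lacks a Count, row 2 lacks a Date.
df = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-02", None],
    "Count": [1.0, np.nan, 3.0],
})

# Only rows whose Count is missing are dropped.
has_count = df.dropna(subset=["Count"])
print(has_count)
```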
In [ ]:
# 4. Use df.fillna() to fill the NaN values in Count with 0.
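For step 4, one approach is to fill only the Count column and assign the result back. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Count column with one missing value.
df = pd.DataFrame({"Count": [1.0, np.nan, 3.0]})

# fillna returns a new Series, so assign it back to the column.
df["Count"] = df["Count"].fillna(0)
print(df)
```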
In [ ]:
# 5. Use pd.to_numeric() to convert "Weird Count" to numbers. If that raises an error, retry with the keyword argument errors='coerce'.
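A sketch of step 5 with made-up values, since the real contents of "Weird Count" are not shown. The first call is expected to fail on the unparseable entry; `errors='coerce'` turns such entries into NaN instead.

```python
import pandas as pd

# Hypothetical "Weird Count" values; "three" cannot be parsed as a number.
weird = pd.Series(["1", "2", "three"], name="Weird Count")

try:
    pd.to_numeric(weird)
except ValueError as exc:
    print("conversion failed:", exc)

# Unparseable entries become NaN rather than raising.
coerced = pd.to_numeric(weird, errors="coerce")
print(coerced)
```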
In [ ]:
# 6. Use .str.replace('\t', '') on the column Weird Date to delete any tabs.
In [ ]:
# 7. Use .str.partition(',')[2] to strip the leading weekday and comma from Weird Date, and store the result in a new column.
In [ ]:
# 8. Use .str.strip() to remove any whitespace in this new column.
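Steps 6 through 8 chain naturally. The sketch below uses two invented date strings; the actual format of Weird Date in simple.csv is an assumption.

```python
import pandas as pd

# Hypothetical "Weird Date" values with stray tabs and spaces.
weird_date = pd.Series([
    "Friday,\t January 1, 2016",
    "Saturday, \tJanuary 2, 2016 ",
])

# Step 6: delete tabs (regex=False treats the pattern as a literal string).
no_tabs = weird_date.str.replace("\t", "", regex=False)

# Step 7: partition splits at the first comma into (before, ",", after);
# column 2 holds everything after the weekday and comma.
after_weekday = no_tabs.str.partition(",")[2]

# Step 8: trim leading and trailing whitespace.
less_weird = after_weekday.str.strip()
print(less_weird.tolist())
```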
In [ ]:
# 9. Use pd.to_datetime() to convert the weirdly formatted dates in Less Weird Dates to pandas datetimes.
In [ ]:
# 10. Convert the nice pandas dates to month long period types using df[].dt.to_period().
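A sketch covering steps 9 and 10. The strings and the explicit `format` are assumptions about how the cleaned dates look; if the format is omitted, pandas will try to infer it.

```python
import pandas as pd

# Hypothetical cleaned date strings.
less_weird = pd.Series(["January 1, 2016", "February 15, 2016"])

# Step 9: parse into pandas datetimes.
dates = pd.to_datetime(less_weird, format="%B %d, %Y")

# Step 10: "M" requests monthly periods.
months = dates.dt.to_period("M")
print(months)
```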
In [ ]:
# 11. Convert Count to an int using the column's "astype()" method.
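For step 11, keep in mind that a float column containing NaN cannot be cast to a plain int, so the sketch below fills missing values first. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Count column stored as floats because of the NaN.
count = pd.Series([1.0, np.nan, 3.0], name="Count")

# Fill the gap first; astype(int) raises an error if NaN is still present.
as_int = count.fillna(0).astype(int)
print(as_int.dtype)
```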
In [ ]:
# 12. Import numpy as np. Run pd.isnull(np.nan). Run None == np.nan. Run np.nan == np.nan. What does that tell you? (The np.NaN alias was removed in NumPy 2.0; np.nan works in all versions.)
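The three checks from step 12, written with the lowercase `np.nan` spelling. NaN never compares equal to anything, including itself, which is why `pd.isnull()` is the reliable test.

```python
import numpy as np
import pandas as pd

is_null = pd.isnull(np.nan)
none_equals_nan = None == np.nan
nan_equals_nan = np.nan == np.nan

print(is_null)          # True
print(none_equals_nan)  # False
print(nan_equals_nan)   # False
```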
In [ ]:
# 13. Do a database style inner join of the df and a copy (df.copy()) of the dataframe on Date using pd.merge()
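A sketch of step 13 on a hypothetical two-row frame. Because both sides share the non-key column Count, pandas disambiguates it with the default `_x` and `_y` suffixes.

```python
import pandas as pd

# Hypothetical data with unique dates.
df = pd.DataFrame({"Date": ["2016-01-01", "2016-01-02"], "Count": [1, 2]})

# Inner join of the frame with a copy of itself on the Date column.
joined = pd.merge(df, df.copy(), on="Date", how="inner")
print(joined)
```

If Date contains duplicate values, each duplicate matches every row with the same date on the other side, so the result can have more rows than the input.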
In [ ]:
# 14. Combine the df with a copy of the df using concat, effectively stacking the df on top of itself.
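For step 14, one option is `pd.concat` with a list of frames. The `ignore_index=True` argument is a choice made here, not something the exercise requires; without it the original row labels repeat.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"Date": ["2016-01-01", "2016-01-02"], "Count": [1, 2]})

# Stack the frame on top of a copy of itself and renumber the rows.
stacked = pd.concat([df, df.copy()], ignore_index=True)
print(stacked)
```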
In [ ]:
# 15. Convert Count to string type using .astype(). If that fails, pass errors='ignore' (very old pandas versions used raise_on_error=False, which has since been removed).
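A short sketch of step 15 on hypothetical integer counts. Converting numbers to strings normally succeeds without any error-handling argument.

```python
import pandas as pd

# Hypothetical integer Count column.
count = pd.Series([1, 2, 30], name="Count")

# Each value becomes its string representation.
as_text = count.astype(str)
print(as_text.tolist())
```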
In [ ]:
# 16. Bin the values in Count into groups of 10 (0-9, 10-19, 20-29, etc.) using pd.cut().
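One way to approach step 16, assuming integer counts below 30. Passing `right=False` makes each bin include its left edge and exclude its right edge, which matches the 0-9, 10-19 grouping; the label strings are an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical Count values.
count = pd.Series([0, 9, 10, 19, 25], name="Count")

# Bin edges 0, 10, 20, 30 give the intervals [0, 10), [10, 20), [20, 30).
edges = np.arange(0, 40, 10)
labels = ["0-9", "10-19", "20-29"]

binned = pd.cut(count, bins=edges, right=False, labels=labels)
print(binned.tolist())
```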
In [ ]:
# 17. Use pd.read_csv() to read the CFPB CSV into dataframe 'df2'.
In [ ]:
# 18. Filter df2 down to the columns ['Product', 'Sub-product', 'Complaint ID', 'Date received'].
In [ ]:
# 19. Call set_index() with ['Product', 'Sub-product'] and assign the result to df3.
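A sketch of steps 18 and 19. The CFPB file is not included here, so a tiny hypothetical frame stands in for `df2`; the rows and the extra `Issue` column are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CFPB complaints data.
df2 = pd.DataFrame({
    "Product": ["Mortgage", "Credit card"],
    "Sub-product": ["FHA mortgage", None],
    "Issue": ["Example issue A", "Example issue B"],
    "Complaint ID": [1, 2],
    "Date received": ["2016-01-01", "2016-01-02"],
})

# Step 18: keep only the four columns of interest.
df2 = df2[["Product", "Sub-product", "Complaint ID", "Date received"]]

# Step 19: build a two-level index from Product and Sub-product.
df3 = df2.set_index(["Product", "Sub-product"])
print(df3)
```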