See also: Working with Text Data
See also: Pandas String Methods
See also: Time Series / Date Functionality
See also: Computational Tools
In [ ]:
# 1. Import pandas as pd, then use pd.read_csv to load CFPB.csv into a dataframe named 'df'.
In [ ]:
# 2. Use the .str namespace of the 'Issue' column to call the .str.upper() method.
In [ ]:
# 3. Use the .str.split() method to split each string in df['Issue'] into a list of substrings.
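Steps 2 and 3 can be tried out on a small made-up Series before running them on the real data. This is only a sketch: the values below are hypothetical stand-ins for df['Issue'], not rows from CFPB.csv.

```python
import pandas as pd

# Hypothetical stand-in for df['Issue']; the real values come from CFPB.csv.
issues = pd.Series(
    ["Incorrect information on report", "Loan modification", None],
    name="Issue",
)

# .str methods are applied element by element and skip missing values.
upper = issues.str.upper()

# With no arguments, .str.split() splits each string on whitespace
# and returns a list of substrings per row.
words = issues.str.split()

print(upper)
print(words)
```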
In [ ]:
# 4. Use dir() on df['Issue'].str to get all the available string methods.
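One possible way to make the dir() output easier to read is to drop the private names. A sketch on a made-up Series (the data is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for df['Issue'].
issues = pd.Series(["Loan modification", "Billing disputes"], name="Issue")

# dir() lists every attribute on the .str accessor; filtering out names
# that start with an underscore leaves the public string methods.
methods = [name for name in dir(issues.str) if not name.startswith("_")]

print(methods)
```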
In [ ]:
# 5. Use the .str.replace() method to replace the letters 'or' with '!' in 'Issue', and then chain .str.capitalize().
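A sketch of step 5 on hypothetical data. Passing regex=False is an assumption made here so that 'or' is treated as a literal substring regardless of pandas version.

```python
import pandas as pd

# Hypothetical stand-in for df['Issue'].
issues = pd.Series(["Credit reporting errors", "FORECLOSURE or Default"], name="Issue")

# Replace the literal substring 'or' (case-sensitive), then capitalize:
# first character upper-cased, everything else lower-cased.
replaced = issues.str.replace("or", "!", regex=False).str.capitalize()

print(replaced)
```

Because the match is case-sensitive, the upper-case 'OR' inside 'FORECLOSURE' is left alone.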
In [ ]:
# 6. Use .str.extract() with regex r'(\b\S\S\S\b)' to get the first 3-letter word from Complaint.
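A sketch of step 6 on hypothetical complaint text. With one capture group, .str.extract() returns a DataFrame with a single column labelled 0, and rows without a match come back as missing.

```python
import pandas as pd

# Hypothetical stand-in for the Complaint column.
complaints = pd.Series(["I paid the bill on time", "No reply"], name="Complaint")

# \b marks a word boundary and \S is any non-whitespace character,
# so the pattern captures the first run of exactly three such characters.
first3 = complaints.str.extract(r"(\b\S\S\S\b)")

print(first3)
```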
In [ ]:
# 7. Use .str.contains() with the regex 'lawyer' to build a boolean mask, and use it to select all rows that mention a lawyer (boolean indexing).
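A sketch of step 7 on a made-up dataframe. The column name 'Complaint' is an assumption borrowed from step 6, and na=False is added so that missing text counts as "no match" instead of breaking the mask.

```python
import pandas as pd

# Hypothetical data; the real rows come from CFPB.csv.
df = pd.DataFrame(
    {"Complaint": ["My lawyer called the bank", "Late fee charged", None]}
)

# .str.contains() returns one boolean per row.
mask = df["Complaint"].str.contains("lawyer", na=False)

# Boolean indexing keeps only the rows where the mask is True.
lawyer_rows = df[mask]

print(lawyer_rows)
```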
In [ ]:
# 8. Index the .str namespace of 'Issue' directly with [] to get the first three letters of each string.
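A sketch of step 8 on hypothetical data: the .str accessor supports Python-style slicing, applied to every string.

```python
import pandas as pd

# Hypothetical stand-in for df['Issue'].
issues = pd.Series(["Loan modification", "Billing disputes"], name="Issue")

# Slicing the accessor slices each element, like s[:3] on a plain string.
first_three = issues.str[:3]

print(first_three)
```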
In [ ]:
# 9. Create a range of dates from 1/1/2000 to 12/31/2020 using pd.date_range and assign it to 'dindex'
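A sketch of step 9. When no freq is given, pd.date_range defaults to one entry per calendar day, with both endpoints included.

```python
import pandas as pd

# Daily dates from the start date through the end date, inclusive.
dindex = pd.date_range("1/1/2000", "12/31/2020")

print(dindex[0], dindex[-1], len(dindex))
```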
In [ ]:
# 10. Create a range of times from 9am on 1/1/2000 to 9pm on 1/3/2000 using pd.date_range.
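A sketch of step 10. The exercise does not name a frequency, so the hourly spacing below is an assumption; it is passed as a Timedelta to avoid depending on a particular frequency alias.

```python
import pandas as pd

# Assumed hourly spacing; the exercise leaves the frequency open.
times = pd.date_range(
    "2000-01-01 09:00",
    "2000-01-03 21:00",
    freq=pd.Timedelta(hours=1),
)

print(times[0], times[-1], len(times))
```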
In [ ]:
# 11. Use pd.read_csv to load simple.csv with arguments: infer_datetime_format=True, parse_dates=['Date']. Assign to 'df'. (infer_datetime_format is deprecated since pandas 2.0 and can be dropped there.)
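A sketch of step 11 that reads from an in-memory string instead of simple.csv. The two-column layout is hypothetical; only the column names 'Date' and 'Count' are taken from the exercises. infer_datetime_format is left out here because it is deprecated in pandas 2.0 and later.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for simple.csv.
csv_text = "Date,Count\n2000-01-01,3\n2000-01-02,5\n"

# parse_dates converts the named column to datetime64 while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(df.dtypes)
```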
In [ ]:
# 12. Use the dataframe's set_index() with inplace=True to index on 'Date'. This modifies 'df' directly and returns None, so do not reassign the result to 'df'.
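A sketch of step 12 on hypothetical data, showing why the return value of an inplace call should not be assigned back to the dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe read from simple.csv.
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2000-01-01", "2000-01-02"]), "Count": [3, 5]}
)

# With inplace=True the dataframe is changed directly and None is returned.
result = df.set_index("Date", inplace=True)

print(result)         # None
print(df.index.name)  # Date
```

Writing `df = df.set_index('Date', inplace=True)` would therefore replace the dataframe with None; `df = df.set_index('Date')` is the non-inplace alternative.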
In [ ]:
# 13. Now use the dataframe's resample() method followed by mean() to get a biweekly average.
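A sketch of step 13 on hypothetical daily data. Reading "biweekly" as "every two weeks" and expressing it with the '2W' rule is an assumption; resample() needs a datetime index, which the made-up frame below already has.

```python
import pandas as pd

# Hypothetical daily counts indexed by date.
dates = pd.date_range("2000-01-01", periods=28, freq="D")
df = pd.DataFrame({"Count": range(28)}, index=dates)

# Group the rows into two-week bins and average each bin.
biweekly = df.resample("2W").mean()

print(biweekly)
```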
In [ ]:
# 14. Use the dataframe's rolling() method to get a 3-day rolling mean. Assign to 'roll_df'.
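A sketch of step 14 on hypothetical daily data. It uses the time-based window '3D', which is one possible reading of "3 day"; with one row per day, rolling(3) is an alternative that leaves the first two rows as NaN.

```python
import pandas as pd

# Hypothetical daily counts indexed by date.
dates = pd.date_range("2000-01-01", periods=5, freq="D")
df = pd.DataFrame({"Count": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

# A '3D' window covers the current row and the rows from the previous
# two days; offset windows default to min_periods=1, so early rows
# average over whatever data is available.
roll_df = df.rolling("3D").mean()

print(roll_df)
```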
In [ ]:
# 15. Import matplotlib, set %matplotlib inline, and use the plot method of roll_df[['Count']].