Homework 5

Copy this notebook and rename it YOURNAME-HW5-streams, replacing YOURNAME with your name.

Upload your completed Jupyter notebook to the e-learning site as your homework submission. You may also put this notebook on your GitHub.

5.1 Register for a stream of Twitter data

5.2 Create a Bloom filter from two days' worth of tweets (after removing stop words and URLs).
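A minimal sketch of the cleaning step, assuming each tweet arrives as a plain string. The `STOP_WORDS` set, the two regular expressions, and `clean_tweet` are hypothetical illustrations rather than part of the assignment; a fuller stop-word list (for example NLTK's English list) would likely work better in practice.

```python
import re

# Hypothetical, deliberately tiny stop-word list ("rt" marks retweets).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on", "for", "rt"}

# Matches http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# A token is a run of letters, digits, '#', '@' or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def clean_tweet(text):
    """Lower-case a tweet, strip URLs, tokenize, and drop stop words."""
    text = URL_RE.sub(" ", text.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


print(clean_tweet("RT The stream is live! https://t.co/abc123 #bigdata"))
# -> ['stream', 'live', '#bigdata']
```

Keeping hashtags and @-mentions as tokens is a design choice here; they are often the most informative words in a tweet.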

5.3 For another day's worth of Twitter data, find the tweets that match the earlier tweets stored in the Bloom filter. That is, collect two days of data in one file or directory and use it to train the Bloom filter; capture a different day's data in a separate file (or process it in real time); then run the new Twitter data through the filter and capture the match output.
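One possible way to approach 5.2 and 5.3, sketched under the assumption that each cleaned token (rather than each whole tweet) is a Bloom filter element, and that a tweet's match count is the number of its tokens the filter reports as present. The `BloomFilter` class, the 1% false-positive target, and the toy token lists are illustrative assumptions. Because a Bloom filter can give false positives but never false negatives, the counts are upper bounds on the true overlap.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash positions per item."""

    def __init__(self, expected_items, fp_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.m = max(8, int(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k indexes from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical, already-cleaned tokens standing in for two days of captured tweets.
training_days = [
    [["stream", "live", "#bigdata"], ["spark", "kafka"]],  # day 1
    [["bloom", "filter", "demo"]],                          # day 2
]
# Hypothetical cleaned tokens for the new, held-out day.
new_day = [["kafka", "stream", "tutorial"], ["weather", "today"]]

# Train: insert every distinct token from the two training days.
tokens = {tok for day in training_days for tweet in day for tok in tweet}
bf = BloomFilter(expected_items=len(tokens), fp_rate=0.01)
for tok in tokens:
    bf.add(tok)

# Match: per-tweet count of tokens the filter reports as previously seen.
match_counts = [sum(tok in bf for tok in tweet) for tweet in new_day]
print(match_counts)
```

Sizing the filter from the expected number of distinct tokens keeps the false-positive rate near the chosen target; with real Twitter volumes the token count would be much larger than in this toy example.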

5.4 Plot a histogram of the matches for each tweet in 5.3.
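A short sketch of the plot with matplotlib, assuming the per-tweet match counts from 5.3 are already in a list. The values in `match_counts` below are made-up placeholders, not real results; in the homework they would come from running the new day's tweets through the trained filter.

```python
import matplotlib.pyplot as plt

# Placeholder per-tweet match counts (made up for illustration only).
match_counts = [0, 2, 1, 0, 3, 2, 2, 1, 0, 4]

# One bin per integer count, with edges at half-integers so bars centre on 0, 1, 2, ...
bins = [x - 0.5 for x in range(max(match_counts) + 2)]
counts, edges, patches = plt.hist(match_counts, bins=bins, edgecolor="black")

plt.xlabel("Tokens matched in the Bloom filter (per tweet)")
plt.ylabel("Number of tweets")
plt.title("Bloom filter matches per tweet")
plt.show()
```

Integer-centred bins avoid the default 10 bins splitting or merging small integer counts in misleading ways.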

For a grade of 4-5, submit the following in a separate notebook named YourNAME-Homework5-Supplement:

  1. Use a different machine learning training algorithm
  2. Make a continuous feed: take two days of data and match the incoming stream against it. Do this for 5 days, sliding the window of filter data forward each day.
  3. Find new trends in the Twitter feed (daily or hourly).
  4. Or some other streaming exploration of your choosing
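For item 2, one possible design (an assumption, not something the assignment prescribes) is to keep one small Bloom filter per day in a `collections.deque` with `maxlen=2`, so that appending a new day's filter automatically drops the oldest one. `DayFilter`, `WindowedMatcher`, the filter size, and the toy daily token lists are all hypothetical.

```python
import hashlib
from collections import deque


class DayFilter:
    """Tiny fixed-size Bloom filter for one day's tokens (m and k chosen arbitrarily)."""

    def __init__(self, m=8192, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit position, for simplicity

    def _positions(self, item):
        d = hashlib.sha256(item.encode("utf-8")).digest()
        # Use k separate 4-byte slices of the digest as hash values.
        return [int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


class WindowedMatcher:
    """Keep the last `window` days of filters and match tweets against all of them."""

    def __init__(self, window=2):
        self.filters = deque(maxlen=window)  # the oldest day drops out automatically

    def add_day(self, tweets):
        f = DayFilter()
        for tweet in tweets:
            for tok in tweet:
                f.add(tok)
        self.filters.append(f)

    def match_count(self, tweet):
        # A token matches if any filter in the current window reports it as seen.
        return sum(any(tok in f for f in self.filters) for tok in tweet)


# Hypothetical cleaned tokens for five consecutive days (toy data).
days = [
    [["kafka", "spark"]],
    [["bloom", "filter"]],
    [["kafka", "bloom", "new"]],
    [["spark"]],
    [["filter", "spark"]],
]

matcher = WindowedMatcher(window=2)
for day_number, tweets in enumerate(days, start=1):
    # Score a day only once the window already holds two full days of filters.
    if len(matcher.filters) == matcher.filters.maxlen:
        print(day_number, [matcher.match_count(t) for t in tweets])
    # Then add today's tweets, sliding the window forward by one day.
    matcher.add_day(tweets)
```

Keeping separate per-day filters, rather than one combined filter, is what makes the window slide cheaply: a standard Bloom filter cannot delete items, but an entire day's filter can simply be discarded.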
