Homework:

Hierarchical Attention Networks and interpretability

Clone and understand my implementation of the Hierarchical Attention Network (priceless)

I will also provide a serialized pre-trained model
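As a reminder of what the attention weights in the tasks below refer to, here are the two attention layers as written in the reference article (Yang et al., 2016). The symbols follow the paper; the variable names in the provided implementation may differ, so treat this as orientation rather than a description of the code.

```latex
% Word-level attention over the encoder states h_{it} of sentence i
\begin{aligned}
u_{it} &= \tanh(W_w h_{it} + b_w), \\
\alpha_{it} &= \frac{\exp(u_{it}^\top u_w)}{\sum_{t'} \exp(u_{it'}^\top u_w)}, \\
s_i &= \sum_t \alpha_{it} h_{it}.
\end{aligned}

% Sentence-level attention over the sentence encoder states h_i
\begin{aligned}
u_i &= \tanh(W_s h_i + b_s), \\
\alpha_i &= \frac{\exp(u_i^\top u_s)}{\sum_{i'} \exp(u_{i'}^\top u_s)}, \\
v &= \sum_i \alpha_i h_i.
\end{aligned}
```

Here \alpha_{it} is the word-level weight and \alpha_i the sentence-level weight. Word weights sum to 1 within each sentence, and sentence weights sum to 1 within each document.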

Visualize attention for IMDB reviews at both the sentence and word level (10 points)
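One possible way to do the visualization is to render each review as HTML and shade words and sentences by their weights. The sketch below is only an illustration: the function name `attention_to_html` and the input layout (tokens and weights already extracted as nested Python lists) are assumptions, not part of the provided implementation. It expects every sentence to contain at least one token.

```python
import html


def attention_to_html(sentences, word_weights, sentence_weights):
    """Render one document as HTML, shading by attention weight.

    sentences:        list of sentences, each a list of tokens
    word_weights:     list of lists of floats, aligned with `sentences`
    sentence_weights: list of floats, one per sentence
    (This input layout is an assumption made for the example.)
    """
    # Scale by the maximum so the strongest item is fully opaque.
    max_sent = max(sentence_weights) or 1.0
    rows = []
    for tokens, w_weights, s_weight in zip(sentences, word_weights, sentence_weights):
        max_word = max(w_weights) or 1.0
        spans = []
        for token, weight in zip(tokens, w_weights):
            alpha = weight / max_word
            spans.append(
                f'<span style="background: rgba(255,0,0,{alpha:.2f})">'
                f"{html.escape(token)}</span>"
            )
        sent_alpha = s_weight / max_sent
        # A blue marker at the start of the row shows the sentence-level weight.
        marker = (
            f'<span style="background: rgba(0,0,255,{sent_alpha:.2f})">'
            "&nbsp;&nbsp;</span>"
        )
        rows.append("<div>" + marker + " " + " ".join(spans) + "</div>")
    return "\n".join(rows)


# Toy input with made-up weights, only to show the call.
page = attention_to_html(
    [["a", "good", "movie"], ["the", "end"]],
    [[0.1, 0.8, 0.1], [0.5, 0.5]],
    [0.9, 0.1],
)
```

The resulting string can be written to an `.html` file or displayed inline in a notebook. Word weights are rescaled within each sentence, so colour intensity is comparable only among words of the same sentence.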

Plot the attention weight distributions of the words "good" and "bad" for both positive and negative reviews and compare them (6 points)

Plot the attention weight distribution of `"good"` with respect to the Yelp review rating.
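Before plotting, the weights of each target word have to be collected and grouped by review label. A minimal sketch of that step follows, assuming each review is available as a `(tokens, weights, label)` triple with flat, aligned lists. The helper names `collect_word_weights` and `summarize` are hypothetical, and the numbers are toy values rather than model output.

```python
from collections import defaultdict
from statistics import mean, median


def collect_word_weights(documents, targets=("good", "bad")):
    """Group attention weights of the target words by (word, label).

    documents: iterable of (tokens, weights, label) triples, where
    `tokens` and `weights` are aligned sequences for one review.
    """
    grouped = defaultdict(list)
    for tokens, weights, label in documents:
        for token, weight in zip(tokens, weights):
            token = token.lower()
            if token in targets:
                grouped[(token, label)].append(weight)
    return dict(grouped)


def summarize(grouped):
    """Basic statistics per (word, label) group."""
    return {
        key: {"n": len(values), "mean": mean(values), "median": median(values)}
        for key, values in grouped.items()
    }


# Toy data with made-up weights, only to show the expected shapes.
toy_reviews = [
    (["good", "film"], [0.7, 0.3], "pos"),
    (["not", "good"], [0.6, 0.4], "neg"),
    (["Bad", "plot"], [0.9, 0.1], "neg"),
]
grouped = collect_word_weights(toy_reviews)
stats = summarize(grouped)
```

Each list in `grouped` can then be passed to a histogram routine such as matplotlib's `plt.hist`. For the Yelp variant, the same function applies if the star rating is used as the label. Since word weights are normalized within a sentence, sentence length influences the raw values, which is worth keeping in mind when comparing groups.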

Compare the accuracy of HAN with a plain RNN-based model and a single-word-attention model (4 points)

The pre-trained model for comparison can be found in the classwork
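The comparison itself reduces to scoring every model on the same held-out labels. A small sketch, assuming the predictions of each model have already been computed and stored as plain lists; the model names and predictions below are placeholders, not real results.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def compare_models(model_predictions, labels):
    """Return (name, accuracy) pairs sorted from best to worst.

    model_predictions: dict mapping a model name to its predicted labels.
    """
    scores = {
        name: accuracy(preds, labels)
        for name, preds in model_predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder predictions on four toy examples.
labels = [1, 0, 1, 1]
ranking = compare_models(
    {
        "han": [1, 0, 1, 1],
        "plain_rnn": [1, 0, 0, 1],
        "word_attention": [1, 1, 0, 1],
    },
    labels,
)
```

For a fair comparison, all three models should be evaluated on the same test split with the same preprocessing.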

Reference article

Hierarchical Attention Networks for Document Classification (Yang et al., 2016)

http://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf