A lot of sentiment analysis is done using simple word lists, rather than fancy parsing. There are downsides to this, of course, because context and negation matter a lot. "It's not fun" is the opposite of "It's fun," but a word-spotting approach probably won't pick up the difference. This is one reason to be wary of sentiment analyses when you see them.
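To see the problem concretely, here is a minimal sketch of word-spotting (not the code we'll use below), with a tiny made-up word list. Both sentences get the same score, because "not" is just another ignored word.
In [ ]:
# A tiny stand-in word list -- not the real lexicons used below.
POSITIVE = {"fun", "good", "great"}
NEGATIVE = {"bad", "boring", "awful"}

def net_sentiment(text):
    # Lowercase and strip punctuation, then count hits against each list.
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos - neg

# Both score +1: word-spotting never sees the "not".
print(net_sentiment("It's fun"))
print(net_sentiment("It's not fun"))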
Here's an example toy you can play with, by the way, built by David Mimno using word lists from Matt Jockers' Syuzhet R package (http://www.matthewjockers.net/2015/02/02/syuzhet/): http://mimno.infosci.cornell.edu/sentiment/
However, if we use a broad enough "window" and inspect the results, it can still be useful to do it this way. Let's try some simple line charts.
Open up get_sentiment_chunks.py in your editor.
Make sure the paths at the top are correct:
NEGWORDS = "../data/sentiment_wordlists/negative-words.txt"
POSWORDS = "../data/sentiment_wordlists/positive-words.txt"
Run the script at the command line, using a data file as your input (don't run it here in the notebook; it won't work right):
> python get_sentiment_chunks.py ../data/SOTU/Abraham_Lincoln_December_3,_1861.txt 100
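For reference, here is a rough sketch of what a chunked word-list scorer like this does. It is not the actual get_sentiment_chunks.py, the JSON field names at the end are made up for illustration (the real script writes whatever structure net_sentiment.html expects), and I'm assuming the final argument (100 above) is the chunk size in words.
# Sketch only -- not the workshop's get_sentiment_chunks.py.
import json
import sys

NEGWORDS = "../data/sentiment_wordlists/negative-words.txt"
POSWORDS = "../data/sentiment_wordlists/positive-words.txt"

def load_words(path):
    # One word per line; skip blanks and ";" comment lines.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {line.strip() for line in f if line.strip() and not line.startswith(";")}

positive, negative = load_words(POSWORDS), load_words(NEGWORDS)

text_path, chunk_size = sys.argv[1], int(sys.argv[2])
with open(text_path, encoding="utf-8", errors="ignore") as f:
    words = [w.strip(".,;:!?'\"()").lower() for w in f.read().split()]

chunks = []
for start in range(0, len(words), chunk_size):
    chunk = words[start:start + chunk_size]
    pos = [w for w in chunk if w in positive]
    neg = [w for w in chunk if w in negative]
    # Hypothetical output shape: one record per chunk with its net score.
    chunks.append({"score": len(pos) - len(neg), "pos": pos, "neg": neg})

with open("sentiment.json", "w") as out:
    json.dump(chunks, out)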
The output file is called sentiment.json. Move it to where the HTML can find it:
In [1]:
!mv sentiment.json ../outputdata/
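If you want to double-check that the file is valid JSON before pointing the page at it, a quick look like this works (the exact structure of the entries depends on the script):
In [ ]:
import json
with open("../outputdata/sentiment.json") as f:
    data = json.load(f)   # fails loudly if the file isn't valid JSON
print(type(data))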
If you want to change the output file's name from sentiment.json, you can, but remember to update the filename in net_sentiment.html as well.
Edit the title in your HTML too, if you want:
<div class="titles">
<h2 class="title">Net Sentiment During SOTU</h2>
</div>
When you save the file and open it via your localhost server, you should see an interactive line chart.
Roll over the dots to see the words at each point in the speech. Points above the middle line indicate net "positive" words; points below indicate net "negative" words.
Check out how overall "negative" Moby Dick is!
In [ ]: