At "The Future of Science Communication in a Post-Factual World," a recent 21st Century Scientist workshop held at the University of Illinois at Urbana-Champaign, Liana Aghajanian made a compelling observation about how ordinary users perceive and express sentiment on social media.
According to Aghajanian, the digital world is keeping a bit of a secret: users of social media may not be as positive as they first appear. Some feeds may look dire indeed, given the particular experiences and orientations of friends and followers. But overall, Aghajanian says, people tend to share positive news stories on social media. They do so not because they are positive people, but because they want to be perceived as positive people. In other words, people tend not to share negative news because they fear being perceived as negative people.
Aghajanian herself is a bit at odds with the world. She works primarily in restorative narrative, a method of long-form writing that aims to capture the complexity of life. For her, gone are the chirpy stories of triumph through tragedy, of just-hang-in-there-and-it-will-all-work-out-in-the-end optimism so characteristic of animated films and bumper stickers. What replaces them is a sober exploration of what it means to be harmed and yet still alive; of how people pick up the pieces (or don't, or can't, or drop some of them along the way); of what life looks like after a time of ashes has passed.
Instead of these kinds of stories, she said, many people almost compulsively share positive news stories, in the hope of being perceived not as complicated people living in a complicated world, but as unabashedly positive people. News about technology, which tends to be written somewhere on the spectrum from “gee, golly, isn’t it great to be alive with all these cool toys” to “the world is terrible, but the new [thing] will save us,” may thus be overrepresented among shared stories, not because those who share it necessarily believe in any particular app, but because they perceive technology positively and want to borrow some of that positivity for themselves.
Intrigued by her statement, I set out to discover how people feel about technology when given the chance to converse with famous figures responsible for much of modern computing technology.
Technology is very broadly defined, of course—anything from the wheel to the codex to Oculus Rift counts. To make an impossible task more manageable, and to center the human in technical discourse, I turn instead to the people who have become the faces of technological innovation: Bill Gates, Steve Wozniak, and Elon Musk.
My sentiment analysis of Reddit's Ask Me Anything (AMA) interviews with Gates, Wozniak, and Musk suggests that people (here, AMA participants) are, in fact, quite positive in their interactions with the people responsible for much of the computing technology that we use every day. Users feel most positively about Musk, while they very much dislike Steve Jobs, so much so that Wozniak, Jobs' one-time partner, must answer, quite literally, for many of Jobs' perceived flaws.
Perhaps most intriguing, like a tech version of A Christmas Carol, these three figures come to function as the ghosts of technology past, present, and future (Wozniak, Gates, and Musk, respectively). In so doing, they invite a few users to denounce crimes against humanity and, somewhat less grandly, to embrace snark, while many others are inspired to imagine a world, and even a universe, transformed by the power of hardware and software.
I analyzed sentiment in the Reddit AMAs with Wozniak, Gates, and Musk. All three AMAs are relatively recent (approximately a year old), are archived at Reddit and thereby accessible, and have produced a lot of data--over 15,000 parent-level comments between them.
Pang and Lee (2008) note that "sentiment" is a tricky term. For this project, however, I followed the understanding of sentiment that they cite from Merriam-Webster's Online Dictionary: a "settled opinion reflective of one's feelings" (Pang and Lee, p.5).
To conduct my sentiment analysis, I used the PRAW Python library to collect the authors, scores, and bodies of all parent- or top-level comments in each AMA. I chose to work with top-level comments because Reddit encourages its users to police these comments ardently, upvoting appropriate comments and downvoting inappropriate ones. More information on Reddit's voting guidelines for AMAs can be found in the r/IAmA wiki (https://www.Reddit.com/r/IAmA/wiki/index).
How do Reddit users feel about technology? Are their comments more positive, negative, or neutral toward these tech innovators? What words frequently comprise their questions? With what do users associate famous tech innovators? Do Reddit users use the Reddit voting system to elevate positive comments and bury negative ones, and if so, what counts as positive and negative comments?
Word frequency; sentiment mining; visualization.
To perform the sentiment analysis, I chose VADER, a rule-based model optimized explicitly for social media data (Hutto and Gilbert, 2014, p.4).
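VADER's `SentimentIntensityAnalyzer.polarity_scores()` returns a dictionary with `neg`, `neu`, `pos`, and a normalized `compound` score between -1 and 1, and the notebook below buckets each comment by the sign of `compound`. As a sketch of that bucketing, the hypothetical helper `label_compound` below (my own name, not part of VADER) also accepts an optional neutral band: the VADER README suggests treating compound >= 0.05 as positive and <= -0.05 as negative, whereas a band of 0 reproduces the strict sign test used in this analysis. The sample scores are made up for illustration and are not actual VADER output.

```python
def label_compound(compound: float, band: float = 0.0) -> str:
    """Bucket a VADER compound score as 'negative', 'neutral', or 'positive'.

    band=0.0 mirrors the sign test used in this notebook (< 0, == 0, > 0);
    band=0.05 follows the cut-offs suggested in the VADER README.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("VADER compound scores lie in [-1, 1]")
    if compound > 0 and compound >= band:
        return "positive"
    if compound < 0 and compound <= -band:
        return "negative"
    return "neutral"


# Illustrative (made-up) compound scores, not real VADER results.
scores = [0.6249, 0.0, -0.34, 0.02]
print([label_compound(s) for s in scores])
# ['positive', 'neutral', 'negative', 'positive']
print([label_compound(s, band=0.05) for s in scores])
# ['positive', 'neutral', 'negative', 'neutral']
```

The choice of band matters mostly for short comments whose compound score sits just off zero; with the sign test, a barely positive comment counts as fully positive.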
Sentiment analysis of Reddit is a fairly popular topic. Reddit is an attractive social media site to mine for sentiment, as it offers users both the ability to comment and the ability to vote on others’ comments, and it does so on thousands of topics in spaces known as “subReddits”. There are subReddits devoted to topics as varied as animals, politics, and interviews (or Ask Me Anythings), and users are encouraged to create subReddits devoted to their particular interests if none already exist. The voting system, whose accumulated points are known as “Karma”, offers a particularly rich way of documenting and analyzing sentiment associated with individual users and topics, both within and between subReddits.
Methods, however, differ. Some, like this Reddit post about sentiment on the subReddit r/apple (https://www.Reddit.com/r/Python/comments/59thp6/sentiment_analysis_of_new_mpb_on_Reddit_with/), use the PRAW library and TextBlob, while this blog post about learning data science through analyzing Reddit headlines uses JSON and NLTK (http://www.learndatasci.com/sentiment-analysis-Reddit-headlines-pythons-nltk/), and this academic paper uses Karma to map neural networks (https://cs224d.stanford.edu/reports/TingJason.pdf). Though seemingly less prevalent than sentiment analysis of Twitter, sentiment analysis of Reddit offers a robust look at how millions of users interested in almost innumerable topics feel about the topics, people, and products with which they engage.
These resources are just a few of those I consulted when gathering data for my project. This blog post in particular introduced me to the PRAW library, giving me enough of an overview to send me off to the PRAW docs fairly confident that this was a tool I should use: https://unsupervisedlearning.wordpress.com/2012/09/26/who-is-rwashingtondc-part-1-daily-activity-usage/. The author’s overall goal evolved from understanding their own Reddit habits to characterizing Redditors via the hours and frequency of their posting. From this post, as well as the guidance offered by numerous StackOverflow posts on analyzing sentiment with both NLTK and PRAW, I was able to assemble my data.
I initially wanted to capture just the first question posed by each participant—the opening question that started everyone off on separate conversations within the same AMA. I quickly learned that, between my grappling with the DOM and the message-board-like quality of Reddit, it was not very easy to parse out the dividing lines between author, evaluation metadata, parent comment, and children comments.
As a result, a nice, clean list of just opening comments took a few days to produce. After using screen capture, parsing with BeautifulSoup to separate the metadata from the body of each comment, and cleaning the comment data (removing “load more comments”, tokenizing, case folding, and removing stop words), I ended up with the data I wanted.
Reddit conversations function much like conversations in the real world. As people become involved in chatting, they alter previous questions, make jokes, or wander; in other words, they produce a lot of noise. Why they were chatting to begin with—to pose and answer a question—can get a bit muddled. As such, while I think this first attempt produced an interesting dataset, the data are limited to just a few “best” parent comments and many child comments, which risks turning a large-scale open exchange into a few snatches of conversation between friends.
Wanting more, and more diverse, data, I turned to PRAW. I learned about PRAW from a few tutorials, the PRAW docs, and several very generous and knowledgeable StackOverflow-ers. After installing PRAW through pip, registering my script with the API, and establishing a Reddit instance (all of which took approximately 10 minutes), I can confirm that PRAW does indeed make navigating Reddit comments easy and (relatively) quick. The longest I have waited to retrieve comment bodies from over 3,000 comments is just three minutes.
This method gave me all of the top-level comments, stripped of metadata, without any need to remove the phrase “load more comments”. (I could easily limit how many additional comments were loaded, since each load was a separate request to the API, and I could further filter by a “threshold” on the number of child comments, the idea being that I could request only parent comments that had at least one child comment in response.)
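Here is a minimal sketch of this kind of collection, assuming PRAW 7's API, in which `CommentForest.replace_more(limit=..., threshold=...)` swaps "load more comments" stubs for real comments (one API request per stub; `limit=0` drops the stubs without any requests). This is not the notebook's original collection code: the function names `top_level_rows` and `write_rows` and the `Author` column are hypothetical, while `Score` and `Text` match the column names the notebook reads later.

```python
import csv


def top_level_rows(submission, more_limit=None, threshold=0):
    """Return one dict per top-level comment of a PRAW Submission.

    more_limit and threshold are passed straight to PRAW's
    CommentForest.replace_more(): each replaced "load more comments"
    stub costs one API request, limit=0 removes stubs without requests,
    and threshold is the minimum number of children a stub must have
    to be replaced. Note that replace_more works on stubs at every depth.
    """
    submission.comments.replace_more(limit=more_limit, threshold=threshold)
    rows = []
    # Iterating the CommentForest directly yields only top-level comments;
    # submission.comments.list() would flatten the replies as well.
    for comment in submission.comments:
        rows.append({
            # comment.author is None when the account has been deleted
            "Author": str(comment.author) if comment.author else "[deleted]",
            "Score": comment.score,  # net score: upvotes minus downvotes
            "Text": comment.body,
        })
    return rows


def write_rows(rows, path):
    """Write the rows to a CSV file with hypothetical columns Author, Score, Text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Author", "Score", "Text"])
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage; requires `pip install praw`, script credentials
# registered with Reddit, and network access. The id is a placeholder.
# import praw
# reddit = praw.Reddit(client_id="...", client_secret="...",
#                      user_agent="ama-sentiment script")
# rows = top_level_rows(reddit.submission(id="<ama-submission-id>"))
# write_rows(rows, "./data/wozniak_parents.csv")
```

Leaving `more_limit=None` fetches every stub, which is thorough but slow on threads with thousands of comments; a small integer trades completeness for fewer requests.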
The best aspect of this approach is the increase in data. For Steve Wozniak’s AMA alone, my dataset went from 477 comments to 3,478. And these comments span the full range of the comment offerings, from the most thoughtful or interesting or popular to the most flippant or crude or downvoted. However, this approach also scrambles Reddit’s voting system by treating all comments equally. Those who attempted to adhere to Reddit’s policies concerning AMA comments, those who submitted comments or voted on others in the hopes of getting real answers, and those who voted but did not themselves comment are not represented in this dataset.
The previous two approaches helped me clarify my research question, and thus what kind of data I needed to compile. I want to capture parent comments that represent important questions for AMA participants, whether they are questions the community would like to see answered or questions the community would like to bury, which thus have a significantly reduced chance of being answered.
To capture the data, I have used the PRAW Python library to connect with the Reddit API and collect the parent or top-level comments for each AMA. I have chosen to work only with the top-level comments because these comments, through the voting system, function as Reddit's built-in sentiment analyzer of sorts.
Users can contribute their own top-level comments, and they can also vote other comments up the list or bury them further down in it. Voting has serious ramifications for engagement during an AMA. The rules for AMAs explicitly encourage participants to vote for comments with which they agree and to downvote comments that are rude, offensive, or do not pose a question. Upvoted comments also carry a dual burden: they become surrogates for voters’ own unarticulated comments, and they are more likely to catch the attention of the interviewee, who is explicitly told to respond to upvoted questions rather than hunting for questions they may feel more comfortable answering (https://www.Reddit.com/r/IAmA/wiki/index).
As such, top-level comments are more likely to be seen and responded to, whether with votes from users or with answers from the interviewee.
Each AMA provides over 3,000 top-level comments, for a grand total of approximately 15,000 top-level comments between them. What are they talking about? What words dominate each thread? Are any top words shared across AMAs?
To take a large view of each AMA, I cleaned each dataset (removed punctuation, casefolded, and removed stopwords) and looked for word frequencies.
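The three cells below apply the same cleaning steps to each AMA in turn. As a sketch of how those steps could live in one reusable function, `top_words` (a hypothetical name) follows the same order as the notebook: strip straight quote characters, keep tokens that begin with a letter, casefold, drop stopwords. It swaps NLTK's `FreqDist` for the standard library's `collections.Counter` and takes already-tokenized input, so it runs without NLTK data; the toy tokens and stopword set are made up for illustration.

```python
from collections import Counter


def top_words(tokens, stopwords, n=10):
    """Clean tokens the way the notebook does, then return the n most common.

    Steps: remove ' and " characters, keep tokens whose first character is
    a letter, lowercase them, and drop anything in `stopwords`.
    """
    stop = {w.lower() for w in stopwords}
    cleaned = []
    for token in tokens:
        token = token.replace("'", "").replace('"', "")
        if token and token[0].isalpha():
            token = token.lower()
            if token not in stop:
                cleaned.append(token)
    # Counter.most_common orders ties by first appearance
    return Counter(cleaned).most_common(n)


tokens = ["Hi", "Steve", "!", "What", "was", "it", "like", "working",
          "with", "Steve", "Jobs", "at", "Apple", "?"]
stop = {"hi", "what", "was", "it", "like", "with", "at"}
print(top_words(tokens, stop, n=3))
# [('steve', 2), ('working', 1), ('jobs', 1)]
```

With NLTK installed, the same function can be fed `nltk.word_tokenize(text)` and NLTK's English stopword list.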
In [108]:
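import nltk
# word_tokenize and the stopword list need NLTK data packages
# (the Punkt tokenizer models and the "stopwords" corpus) downloaded via nltk.download()
# a set makes the "not in stopwords" checks below fast
stopwords = set(nltk.corpus.stopwords.words("english"))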
In [109]:
with open('./data/wozniak_text.txt') as f:
    wozniak_string = f.read()
wozniak_tokens = nltk.word_tokenize(wozniak_string)
replace_punct = [word.replace("'", '').replace('"','') for word in wozniak_tokens]
alpha = [word for word in replace_punct if word and word[0].isalpha()]
alpha_lower = [word.lower() for word in alpha]
alpha_lower_stop = [word for word in alpha_lower if word not in stopwords]
alpha_lower_stop_fd = nltk.FreqDist(alpha_lower_stop)
alpha_lower_stop_fd.tabulate(10)
In [110]:
%matplotlib inline
alpha_lower_stop_fd.plot(20)
The two most frequent words in Wozniak's AMA are proper nouns: 'apple' and 'steve'. The latter is not terrifically surprising, given that Wozniak's first name is, of course, 'Steve', and that many comments begin with some version of 'Hi, Steve!'. Looking at the sixth most frequent word, 'jobs', one might think, as I did, that users were interested either in working for Apple or in the economic benefits that technology in general, and Apple in particular, might bring (for instance, by employing young, college-educated people much like many of the users themselves). But looking at the word 'jobs' in context suggests something else entirely.
In [111]:
get_Text = nltk.Text(wozniak_tokens)
get_Text.concordance("jobs",width=49,lines=20)
Wozniak's AMA is, in some ways, a conversation with two Steves: one living, one gone; one perceived in a positive light, the other in a rather negative one, but both very much present in this thread. This suggests that sentiment in Wozniak's AMA is sentiment shared, at least in part, with Steve Jobs.
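NLTK's `Text.concordance` is a keyword-in-context (KWIC) view: every occurrence of a word, matched case-insensitively, printed with some surrounding text. A minimal pure-Python sketch of the idea, using a window of tokens rather than NLTK's fixed character width; `kwic` is a hypothetical helper, and the sample sentence is invented for illustration, not taken from the AMA.

```python
def kwic(tokens, keyword, window=4, lines=20):
    """Keyword-in-context lines for a case-insensitive match of `keyword`.

    Each line shows up to `window` tokens on either side of a match, with
    the match in brackets. This is a token-window approximation, not the
    character-width layout that nltk.Text.concordance prints.
    """
    target = keyword.lower()
    results = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            results.append(f"{left} [{token}] {right}".strip())
            if len(results) == lines:  # stop after `lines` matches, like NLTK
                break
    return results


tokens = "Did Steve Jobs ever thank you ? What would Jobs think of Apple today ?".split()
for line in kwic(tokens, "jobs", window=2):
    print(line)
# Did Steve [Jobs] ever thank
# What would [Jobs] think of
```

Reading matches in context like this is what distinguishes 'jobs' as employment from 'Jobs' as a person, which raw frequency counts cannot do.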
In [113]:
with open('./data/gates_text.txt') as f:
    gates_string = f.read()
gates_tokens = nltk.word_tokenize(gates_string)
replace_punct = [word.replace("'", '').replace('"','') for word in gates_tokens]
alpha = [word for word in replace_punct if word and word[0].isalpha()]
alpha_lower = [word.lower() for word in alpha]
alpha_lower_stop = [word for word in alpha_lower if word not in stopwords]
alpha_lower_stop_fd = nltk.FreqDist(alpha_lower_stop)
alpha_lower_stop_fd.tabulate(10)
In [114]:
alpha_lower_stop_fd.plot(20)
It would seem that Gates, while still associated with Microsoft, has been able to achieve some distance from his company. His own last name is the most frequent word in his AMA (as opposed to 'apple' for Wozniak), and it would appear that his brand as a global-minded philanthropist is successfully present here. The word 'world', for instance, is the seventh most frequent word. Looking at the word 'world' in context, we can indeed see that users associate him with a concern for the large scale, the hugely impactful, and the geopolitical at its biggest level.
In [249]:
get_Text = nltk.Text(gates_tokens)
get_Text.concordance("world",width=49,lines=20)
In [115]:
with open('./data/musk_text.txt') as f:
    musk_string = f.read()
musk_tokens = nltk.word_tokenize(musk_string)
replace_punct = [word.replace("'", '').replace('"','') for word in musk_tokens]
alpha = [word for word in replace_punct if word and word[0].isalpha()]
alpha_lower = [word.lower() for word in alpha]
alpha_lower_stop = [word for word in alpha_lower if word not in stopwords]
alpha_lower_stop_fd = nltk.FreqDist(alpha_lower_stop)
alpha_lower_stop_fd.tabulate(10)
In [116]:
alpha_lower_stop_fd.plot(20)
Like Gates, Musk has also been able to establish an identity apart from his company, though rather like Wozniak, Musk is never far from his signature technology, here, Tesla. However, just from looking at the twenty most frequent words, we can see that Musk's AMA is not so much about the past as Wozniak's, nor so much about our world as Gates', but is instead about the not here and the not now.
'Space' is the seventh most frequent word, while 'future' is fifteenth and 'mars' is eighteenth. Looking at the word 'future' in context, we can see that Musk is approached as a kind of visionary of a future that is just on the cusp of being, brought about by amazing technologies of which we have only begun to dream (but which Musk may have already patented). The link established here between Musk and the future, however cursory at the moment, suggests that his AMA will be the most positive in nature.
In [117]:
get_Text = nltk.Text(musk_tokens)
get_Text.concordance("future",width=49,lines=20)
As mentioned previously, Reddit encourages group sentiment policing with the upvote/downvote system. In these three AMAs, users have avidly participated in this system, particularly when it comes to upvoting comments. Approximately 95% of top-level comments in both Wozniak's and Gates' AMAs have scores greater than 0. Approximately 85% of top-level comments in Musk's AMA have scores greater than 0.
By comparison, far fewer comments ended up with negative net scores: just 1% of top-level comments in Wozniak's and Gates' AMAs, and just 3% of top-level comments in Musk's AMA.
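These percentages come from bucketing each comment's net score (upvotes minus downvotes) by its sign and rounding each share to a whole percent, as the bar-chart cell below does. A minimal, library-free sketch of that computation; `breakdown` is a hypothetical helper, it accepts any iterable of numbers (so a pandas column such as `wozniak_comments['Score']` should also work), and the sample values are invented. Because each share is rounded independently, and Python's `round()` sends exact halves to the nearest even integer, the three percentages need not add up to exactly 100.

```python
def breakdown(values):
    """Percent of values below, equal to, and above zero, each rounded."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    total = len(values)
    negative = sum(1 for v in values if v < 0)
    zero = sum(1 for v in values if v == 0)
    positive = sum(1 for v in values if v > 0)
    return tuple(round(count / total * 100) for count in (negative, zero, positive))


# Invented net scores: 1 negative, 2 zero, 5 positive out of 8.
print(breakdown([5, 12, 0, -2, 40, 1, 3, 0]))  # (12, 25, 62)
```

Here 12.5% rounds down to 12 and 62.5% to 62, so the shares sum to 99; small gaps of this kind in the reported figures can come from rounding alone.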
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
In [63]:
wozniak_comments = pd.read_csv("./data/wozniak_parents.csv").dropna()
gates_comments = pd.read_csv("./data/gates_parents.csv").dropna()
musk_comments = pd.read_csv("./data/musk_parents.csv").dropna()
In [120]:
#code adapted from http://stackoverflow.com/questions/14270391/python-matplotlib-multiple-bars
# percentage of each AMA's comments with net score < 0, == 0, and > 0
n_groups = 3
scores_wozniak = (round((len(wozniak_comments[wozniak_comments['Score'] < 0]) / len(wozniak_comments)) * 100),
                  round((len(wozniak_comments[wozniak_comments['Score'] == 0]) / len(wozniak_comments)) * 100),
                  round((len(wozniak_comments[wozniak_comments['Score'] > 0]) / len(wozniak_comments)) * 100))
scores_gates = (round((len(gates_comments[gates_comments['Score'] < 0]) / len(gates_comments)) * 100),
                round((len(gates_comments[gates_comments['Score'] == 0]) / len(gates_comments)) * 100),
                round((len(gates_comments[gates_comments['Score'] > 0]) / len(gates_comments)) * 100))
scores_musk = (round((len(musk_comments[musk_comments['Score'] < 0]) / len(musk_comments)) * 100),
               round((len(musk_comments[musk_comments['Score'] == 0]) / len(musk_comments)) * 100),
               round((len(musk_comments[musk_comments['Score'] > 0]) / len(musk_comments)) * 100))
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.20
opacity = 0.8
rects1 = plt.bar(index, scores_wozniak, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Wozniak')
rects2 = plt.bar(index + bar_width, scores_gates, bar_width,
                 alpha=opacity,
                 color='g',
                 label='Gates')
rects3 = plt.bar(index + bar_width * 2, scores_musk, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Musk')
# the x-axis groups are score categories; the legend identifies the innovators
plt.xlabel('Score category')
plt.ylabel('Percentage of total comments')
plt.title('Comment breakdown by score')
# center each tick label under the middle bar of its group
plt.xticks(index + bar_width, ('Negative', 'Neutral', 'Positive'))
plt.legend(loc=4, prop={'size':8})
plt.tight_layout()
plt.show()
My findings suggest that positivity is the dominant sentiment in these online comments with technology innovators. This is particularly true for Musk, whose AMA is the most positive by far, with almost 70% of comments on his AMA thread marked as positive, 24% neutral, and only 9% negative. Gates had the second most positive AMA, with 58% of comments marked as positive, 28% neutral, and 13% negative. Wozniak had the least positive AMA, with 55% of comments marked as positive, 32% neutral, and 12% negative.
In [66]:
wozniak_comments['Sentiment'] = wozniak_comments['Text'].apply(lambda comment: analyzer.polarity_scores(comment)['compound'])
In [67]:
gates_comments['Sentiment'] = gates_comments['Text'].apply(lambda comment: analyzer.polarity_scores(comment)['compound'])
In [68]:
musk_comments['Sentiment'] = musk_comments['Text'].apply(lambda comment: analyzer.polarity_scores(comment)['compound'])
In [69]:
#code adapted from http://stackoverflow.com/questions/14270391/python-matplotlib-multiple-bars
# percentage of each AMA's comments with VADER compound < 0, == 0, and > 0
n_groups = 3
scores_wozniak = (round((len(wozniak_comments[wozniak_comments['Sentiment'] < 0]) / len(wozniak_comments)) * 100),
                  round((len(wozniak_comments[wozniak_comments['Sentiment'] == 0]) / len(wozniak_comments)) * 100),
                  round((len(wozniak_comments[wozniak_comments['Sentiment'] > 0]) / len(wozniak_comments)) * 100))
scores_gates = (round((len(gates_comments[gates_comments['Sentiment'] < 0]) / len(gates_comments)) * 100),
                round((len(gates_comments[gates_comments['Sentiment'] == 0]) / len(gates_comments)) * 100),
                round((len(gates_comments[gates_comments['Sentiment'] > 0]) / len(gates_comments)) * 100))
scores_musk = (round((len(musk_comments[musk_comments['Sentiment'] < 0]) / len(musk_comments)) * 100),
               round((len(musk_comments[musk_comments['Sentiment'] == 0]) / len(musk_comments)) * 100),
               round((len(musk_comments[musk_comments['Sentiment'] > 0]) / len(musk_comments)) * 100))
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.20
opacity = 0.8
# same colors as the score chart: Wozniak blue, Gates green, Musk red
rects1 = plt.bar(index, scores_wozniak, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Wozniak')
rects2 = plt.bar(index + bar_width, scores_gates, bar_width,
                 alpha=opacity,
                 color='g',
                 label='Gates')
rects3 = plt.bar(index + bar_width * 2, scores_musk, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Musk')
plt.xlabel('Sentiment category (VADER compound score)')
plt.ylabel('Percentage of total comments')
plt.title('Comment breakdown by sentiment')
plt.xticks(index + bar_width, ('Negative', 'Neutral', 'Positive'))
plt.legend(loc=4, prop={'size':8})
plt.tight_layout()
plt.show()
In all three AMAs, positive comments made up the dominant share of comments. In two of the three AMAs (Gates and Musk), comments with more positive sentiment appear to be associated with higher scores.
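The plots that follow show mean sentiment at each score. To summarize an association like this in a single number, one option (a swapped-in check, not part of this analysis) is Spearman's rank correlation: the Pearson correlation computed on ranks, with tied values sharing their average rank. A minimal pure-Python sketch with hypothetical helper names; pandas offers the same statistic via `Series.corr(other, method="spearman")`. A correlation on its own would not say whether the association is meaningful or what drives it.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    x, y = list(x), list(y)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(round(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 2))  # 0.8
# Hypothetical usage on the notebook's data:
# spearman(gates_comments['Score'], gates_comments['Sentiment'])
```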
In [71]:
%matplotlib inline
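# mean VADER compound score at each net score; selecting the column avoids
# averaging the text columns, which recent pandas versions reject
gates_comments.groupby('Score')['Sentiment'].mean().plot()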
In [72]:
musk_comments.groupby('Score')['Sentiment'].mean().plot()
In Wozniak's AMA, however, neutral sentiment is rewarded with the most upvotes.
In [73]:
wozniak_comments.groupby('Score')['Sentiment'].mean().plot()
Across all three AMAs, the comments with the most strongly negative or most strongly positive sentiment receive very low scores.
Negative and positive comments within AMAs differ in length and focus.
Comments at both extremes of the score ranking tend to be relatively short, generally consisting of one sentence culminating in a question mark.
Among these, the highest-scoring comments, particularly in Gates' and Musk's AMAs, tend to be longer than the lowest-scoring ones, with more characters spent on providing context to the question.
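The length comparison above comes from reading the ten lowest- and highest-ranked comments printed below. A minimal sketch of putting a number on it: the mean character length of the k lowest- and k highest-ranked comments, ranked by any field. `length_by_rank` is a hypothetical helper working on plain dicts with `Text` and `Score`/`Sentiment` keys, mirroring the notebook's columns; in pandas, `df.sort_values('Score')['Text'].str.len()` gives the same per-comment lengths. The sample data is invented.

```python
from statistics import mean


def length_by_rank(comments, key, k=10):
    """Mean character length of the k lowest- and k highest-ranked comments.

    `comments` is a list of dicts with a "Text" field; `key` names the field
    to rank by, e.g. "Score" or "Sentiment". Returns (bottom_mean, top_mean).
    """
    if len(comments) < k:
        raise ValueError("need at least k comments")
    ranked = sorted(comments, key=lambda c: c[key])
    bottom = [len(c["Text"]) for c in ranked[:k]]
    top = [len(c["Text"]) for c in ranked[-k:]]
    return mean(bottom), mean(top)


# Invented comments: higher scores paired with longer texts.
sample = [{"Score": s, "Text": "x" * n} for s, n in [(1, 5), (2, 10), (3, 20), (4, 40)]]
print(length_by_rank(sample, "Score", k=2))  # (7.5, 30)
```

The same call with `key="Sentiment"` compares the comments at the two ends of the VADER ranking, which is the comparison made further on for sentiment.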
In [239]:
for number, comment in enumerate(wozniak_comments.sort_values('Score').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [240]:
for number, comment in enumerate(gates_comments.sort_values('Score').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [241]:
for number, comment in enumerate(musk_comments.sort_values('Score').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [242]:
for number, comment in enumerate(wozniak_comments.sort_values('Score', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [243]:
for number, comment in enumerate(gates_comments.sort_values('Score', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [244]:
for number, comment in enumerate(musk_comments.sort_values('Score', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
Negative and positive comments by sentiment, however, both tend to be longer than even the positive comments by score, consisting of at least a few sentences that recall a fact or an experience, recount an anecdote or piece of information, or explicitly communicate an opinion, followed by another sentence culminating in a question mark.
In [245]:
for number, comment in enumerate(wozniak_comments.sort_values('Sentiment').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [246]:
for number, comment in enumerate(gates_comments.sort_values('Sentiment').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [247]:
for number, comment in enumerate(musk_comments.sort_values('Sentiment').iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
In [165]:
for number, comment in enumerate(wozniak_comments.sort_values('Sentiment', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
    print('\n')
In [167]:
for number, comment in enumerate(gates_comments.sort_values('Sentiment', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
    print('\n')
In [170]:
for number, comment in enumerate(musk_comments.sort_values('Sentiment', ascending=False).iloc[:10]['Text'], 1):
    print("{}. {}".format(number, comment.replace('\n', ' ')))
    print('\n')
There is some disagreement between Reddit's voting system and VADER when it comes to determining the most negative and most positive comments within each AMA.
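One way to put a number on this disagreement (not something computed in this analysis) is to compare the top-k comments under each ranking and measure their overlap with the Jaccard similarity: 1.0 when both rankings pick the same comments, 0.0 when they share none. A minimal sketch; `top_overlap` is a hypothetical helper on plain dicts with `Score` and `Sentiment` keys, mirroring the notebook's columns, and the sample data is invented. Passing `reverse=False` compares the bottom-k instead.

```python
def top_overlap(comments, k=10, reverse=True):
    """Compare the k highest-scoring and k highest-sentiment comments.

    Returns (shared, jaccard): the sorted indices chosen by both rankings
    and the Jaccard similarity of the two index sets. With reverse=False
    the k lowest-ranked comments are compared instead.
    """
    if not comments:
        raise ValueError("need at least one comment")
    idx = range(len(comments))
    by_score = sorted(idx, key=lambda i: comments[i]["Score"], reverse=reverse)[:k]
    by_sentiment = sorted(idx, key=lambda i: comments[i]["Sentiment"], reverse=reverse)[:k]
    a, b = set(by_score), set(by_sentiment)
    return sorted(a & b), len(a & b) / len(a | b)


sample = [
    {"Score": 100, "Sentiment": 0.1},
    {"Score": 50, "Sentiment": 0.9},
    {"Score": 10, "Sentiment": 0.8},
    {"Score": 5, "Sentiment": -0.2},
]
shared, jaccard = top_overlap(sample, k=2)
print(shared, round(jaccard, 3))  # [1] 0.333
```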
With Reddit's system, negative comments tend to be characterized by overly personal language. This includes insults regarding the interviewee's personal appearance, pleas for items or experiences that would personally benefit the asker, or statements that represent the personal beliefs of the asker but are phrased as a question. Examples include:
While there is some overlap, VADER appears to be more sensitive to negative sentiment that operates on a scale larger than the personal. Comments interested in the relationship between the interviewee and his company, the law, or the environment are assigned the most negative sentiment scores, rather than comments that explicitly involve the interviewee as a person. Examples include:
Regarding the most positive comments, the same seems to apply. Comments that are shorter and less personal/more neutral in phrasing tend to have a higher score. Examples include:
Comments that are longer and more personal in nature tend to have a higher positive sentiment, though a lower score. Examples include:
The Musk AMA differs slightly from the above model in that comments that both score highly and have high positive sentiment contain personal language, relating to Musk, to the user, or (more commonly) to both. This suggests that users, both those who post the top-level comments and those who upvote them, want to connect with Musk on an emotional level, individually and collectively, using the sentiment-rich language of memory, flattery, and comparison.
Examples of personal language in high score Musk comments include:
Examples of personal language in high positive sentiment Musk comments include:
From this sentiment analysis of top-level comments submitted by Reddit users, we can see that users feel positively toward technology as it is represented here by the figures of Steve Wozniak, Bill Gates, and Elon Musk.
Users feel particularly positively toward Musk; almost 70% of comments on his AMA are positive, 24% neutral, and only 9% negative. Users feel positively towards Gates as well; 58% of comments on his AMA are positive, 28% neutral, and 13% negative. Users feel the least positively toward Wozniak; 55% of comments on his AMA are positive, 32% neutral, and 12% negative.
I contend that a primary reason why users feel so positively toward Musk is that they associate him with the future, particularly with the future of space exploration. Words like "space", "future", and "mars" are among the most frequently used words in questions posed to him, for instance. The most frequent word in his AMA is his own first name, "Elon", which suggests that his personal brand eclipses his technologies, even as the two rely on one another to create positivity toward both.
In another way, Gates has also become associated with the future, but here it is the potential for the world to become a more humane place, and thus a future we can work toward right now. Words like "world" and "people" occur frequently in questions posed to him. His own last name, "Gates", is also the most frequent word in his AMA, which again suggests that Gates' personal brand has eclipsed that of Microsoft, even as the two remain connected.
For Wozniak, however, the past is a heavy chain. Words like "apple" and "jobs" (i.e. Steve Jobs) occur frequently in questions posed to him, suggesting that users associate him with the past of an organization and of a deceased person, and it is of these things that Wozniak is asked to speak again and again. Unfortunately for him, users take issue with one or both of these things, which may very well be why Wozniak's AMA is the least positive. The most frequent word used in questions posed to him is not, after all, his name; instead, it's "apple".
And the secret to getting a highly-scored comment? Keep it shortish and neutral-positive (unless you're talking to Elon Musk, in which case longish and fawning are fine).
To fully understand sentiment in Reddit AMAs, the next step could involve performing a sentiment analysis on the top Reddit AMAs. One could then situate these results within this larger view of how sentiment works in Reddit. It would also be promising to track individual users across these three tech-oriented AMAs, to determine at the granular level of the individual commenter if the same people have participated in more than one AMA, and if so, if their sentiments change or remain the same across all three.
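The user-tracking idea above starts with a simple set operation: collect each AMA's comment authors and intersect them. A minimal sketch, assuming an author field like the hypothetical `Author` column from a PRAW collection step (the CSVs used in this notebook are only shown with `Score` and `Text`); `shared_authors` and `shared_by_all` are hypothetical helpers, and placeholder names such as "[deleted]" are skipped because they do not identify a user. The sample data is invented.

```python
from itertools import combinations


def _author_sets(threads):
    """Map each AMA name to the set of identifiable authors who commented."""
    return {name: {a for a in authors if a and a != "[deleted]"}
            for name, authors in threads.items()}


def shared_authors(threads):
    """Map each pair of AMA names to the authors who commented in both."""
    sets = _author_sets(threads)
    return {(x, y): sets[x] & sets[y] for x, y in combinations(sorted(sets), 2)}


def shared_by_all(threads):
    """Authors who commented in every AMA in `threads`."""
    sets = list(_author_sets(threads).values())
    return set.intersection(*sets) if sets else set()


threads = {
    "wozniak": ["a", "b", "[deleted]"],
    "gates": ["b", "c", "[deleted]"],
    "musk": ["b", "a"],
}
print(shared_authors(threads)[("musk", "wozniak")] == {"a", "b"})  # True
print(shared_by_all(threads))  # {'b'}
```

Once repeat participants are identified, their comments' VADER scores could be compared across the AMAs to see whether their sentiment changes or stays the same.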
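Hutto, C.J., and Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International AAAI Conference on Weblogs and Social Media (ICWSM-14). http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
Pang, B., and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2). http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf
r/Python post on Reddit sentiment analysis with PRAW and TextBlob. https://www.Reddit.com/r/Python/comments/59thp6/sentiment_analysis_of_new_mpb_on_Reddit_with/
LearnDataSci tutorial on sentiment analysis of Reddit headlines with Python's NLTK. http://www.learndatasci.com/sentiment-analysis-Reddit-headlines-pythons-nltk/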