In [1]:
library("twitteR")
library("DBI")
library("RSQLite")
library("wordcloud")
library("SnowballC")
library("tm")
library("dplyr")
Sys.setlocale(category = "LC_ALL", locale = "C")
Loading required package: RColorBrewer
Loading required package: NLP
Attaching package: ‘dplyr’
The following objects are masked from ‘package:twitteR’:
id, location
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
'LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C'
Function to remove URLs
In [252]:
removeURL = function(x) {
  # Drop URLs first, then any <...> tag; [^>]* stops at the first closing bracket
  cleaned = gsub("<[^>]*>", "", gsub("http[^[:space:]]*", "", x))
  return(cleaned)
}
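The choice of tag pattern matters when a string contains more than one `<...>` tag. The sketch below compares a greedy `.*` with a bounded `[^>]*` on a made-up string. The helper names `strip_greedy` and `strip_bounded` and the sample text are illustrative assumptions, not part of the notebook.

```r
# Hypothetical helpers: same URL removal, different tag patterns
strip_greedy  <- function(x) gsub("[<].*[>]", "", gsub("http[^[:space:]]*", "", x))
strip_bounded <- function(x) gsub("<[^>]*>", "", gsub("http[^[:space:]]*", "", x))

# Made-up input with two tags and one URL
sample_text <- "see <b>this</b> now http://t.co/abc end"

# Greedy: removes everything from the first "<" to the last ">"
greedy_result <- strip_greedy(sample_text)

# Bounded: removes each tag separately and keeps the text between them
bounded_result <- strip_bounded(sample_text)

print(greedy_result)
print(bounded_result)
```

With the greedy pattern the word between the tags is lost; the bounded pattern keeps it.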
Authorize with the Twitter API keys
In [6]:
# Credentials are read from environment variables instead of being hard-coded (the variable names are illustrative)
setup_twitter_oauth(Sys.getenv("TWITTER_CONSUMER_KEY"), Sys.getenv("TWITTER_CONSUMER_SECRET"), Sys.getenv("TWITTER_ACCESS_TOKEN"), Sys.getenv("TWITTER_ACCESS_SECRET"))
[1] "Using direct authentication"
Set the number of tweets to collect
In [29]:
LIMIT = 1000
Set the search string
In [30]:
searchTopicString = "#MachineLearning"
Search Twitter for the string above
In [31]:
topicTweets = searchTwitter(searchTopicString,LIMIT)
In [32]:
topicTweets = strip_retweets(topicTweets)
Show some data
In [33]:
head(topicTweets)
[[1]]
[1] "missdkingsbury: Check out @Google #Cloud Platform #BigData and #MachineLearning Fundamentals #course for only $60! https://t.co/Q6tmNsbsEw via @coursera"
[[2]]
[1] "joshinav: #AI app knows when couples are fighting, may someday predict (and prevent) conflict <U+201D> https://t.co/oquzRv2fTi #MachineLearning"
[[3]]
[1] "xtools_at: The mind in the machine: Demis Hassabis on artificial intelligence #machinelearning #bigdata #ai https://t.co/KbXDUwc61L"
[[4]]
[1] "sparkl3rz: Is #machinelearning the End of #Marketing As We Know It?\n#AI #bigdata #ML #digital... https://t.co/HIWNi7UTGJ by<U+2026> https://t.co/kMq8SqGaSn"
[[5]]
[1] "Chemo101: The latest The https://t.co/y6uFR8X6an Daily! https://t.co/eZcQP4aQUL Thanks to @bon__fifi #bigdata #machinelearning"
[[6]]
[1] "Dr_Who: il #MachineLearning per confermare i trend di fondo #digitalklive #BigData"
Convert the tweets to a data frame
In [34]:
topicTweetsDf = twListToDF(topicTweets)
Show the converted tweets as a data frame
In [35]:
head(topicTweetsDf)
text favorited favoriteCount replyToSN created truncated replyToSID id replyToUID statusSource screenName retweetCount isRetweet retweeted longitude latitude
Check out @Google #Cloud Platform #BigData and #MachineLearning Fundamentals #course for only $60! https://t.co/Q6tmNsbsEw via @coursera FALSE 0 NA 2017-04-22 13:34:56 FALSE NA 855776939822338048 NA <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a> missdkingsbury 0 FALSE FALSE NA NA
#AI app knows when couples are fighting, may someday predict (and prevent) conflict <U+201D> https://t.co/oquzRv2fTi #MachineLearning FALSE 0 NA 2017-04-22 13:34:00 FALSE NA 855776705365118981 NA <a href="http://bufferapp.com" rel="nofollow">Buffer</a> joshinav 0 FALSE FALSE NA NA
The mind in the machine: Demis Hassabis on artificial intelligence #machinelearning #bigdata #ai https://t.co/KbXDUwc61L FALSE 0 NA 2017-04-22 13:33:20 FALSE NA 855776536485670912 NA <a href="https://ifttt.com" rel="nofollow">IFTTT</a> xtools_at 0 FALSE FALSE NA NA
Is #machinelearning the End of #Marketing As We Know It?
#AI #bigdata #ML #digital... https://t.co/HIWNi7UTGJ by<U+2026> https://t.co/kMq8SqGaSn FALSE 1 NA 2017-04-22 13:32:41 TRUE NA 855776373360742403 NA <a href="http://linkis.com" rel="nofollow">Linkis: turn sharing into growth</a> sparkl3rz 0 FALSE FALSE NA NA
The latest The https://t.co/y6uFR8X6an Daily! https://t.co/eZcQP4aQUL Thanks to @bon__fifi #bigdata #machinelearning FALSE 1 NA 2017-04-22 13:32:22 FALSE NA 855776291378929664 NA <a href="http://paper.li" rel="nofollow">Paper.li</a> Chemo101 0 FALSE FALSE NA NA
il #MachineLearning per confermare i trend di fondo #digitalklive #BigData FALSE 1 NA 2017-04-22 13:32:09 FALSE NA 855776238698418177 NA <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a> Dr_Who 1 FALSE FALSE NA NA
Save the tweets
In [36]:
saveRDS(topicTweetsDf,file = "topicTweetsDf.Rda")
Load the tweets
In [253]:
topicTweetsDf = readRDS(file = "topicTweetsDf.Rda")
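A minimal sketch of the same save and load round trip, using a made-up data frame and a temporary file so nothing in the working directory is touched. The names `toy_df`, `rds_path`, and `restored_df` are illustrative. By convention, `.rds` is the extension usually paired with `saveRDS()`, while `.Rda`/`.RData` is usually paired with `save()`/`load()`; either works here because `readRDS()` does not look at the extension.

```r
# Toy data frame standing in for the tweet data (values are made up)
toy_df <- data.frame(text = c("first #tweet", "second #tweet"),
                     retweetCount = c(0L, 3L),
                     stringsAsFactors = FALSE)

# A temporary file keeps the example self-contained
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_df, file = rds_path)

# readRDS returns the stored object; the caller chooses the name it is bound to
restored_df <- readRDS(rds_path)

# Compare the restored object with the original
print(all.equal(toy_df, restored_df))
```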
Select only the text field of the tweets
In [254]:
topicTweetsTextDf = data.frame(topicTweetsDf$text, stringsAsFactors=FALSE)
Rename the column
In [255]:
colnames(topicTweetsTextDf) <- c("text")
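Selecting the text and naming the column can also be done in one step. The sketch below shows two possible ways on a made-up data frame; `toy_tweets_df`, `text_df_a`, and `text_df_b` are illustrative names, not objects from the notebook.

```r
# Toy stand-in for the full tweet data frame (values are made up)
toy_tweets_df <- data.frame(text = c("a #tweet", "another #tweet"),
                            favoriteCount = c(0L, 1L),
                            stringsAsFactors = FALSE)

# Option 1: name the column while building the data frame
text_df_a <- data.frame(text = toy_tweets_df$text, stringsAsFactors = FALSE)

# Option 2: subset by column name; drop = FALSE keeps the result a data frame
text_df_b <- toy_tweets_df[, "text", drop = FALSE]

print(names(text_df_a))
print(names(text_df_b))
```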
Show the data
In [256]:
head(topicTweetsTextDf)
text
Check out @Google #Cloud Platform #BigData and #MachineLearning Fundamentals #course for only $60! https://t.co/Q6tmNsbsEw via @coursera
#AI app knows when couples are fighting, may someday predict (and prevent) conflict <U+201D> https://t.co/oquzRv2fTi #MachineLearning
The mind in the machine: Demis Hassabis on artificial intelligence #machinelearning #bigdata #ai https://t.co/KbXDUwc61L
Is #machinelearning the End of #Marketing As We Know It?
#AI #bigdata #ML #digital... https://t.co/HIWNi7UTGJ by<U+2026> https://t.co/kMq8SqGaSn
The latest The https://t.co/y6uFR8X6an Daily! https://t.co/eZcQP4aQUL Thanks to @bon__fifi #bigdata #machinelearning
il #MachineLearning per confermare i trend di fondo #digitalklive #BigData
In [257]:
removeURL(topicTweetsTextDf$text[4])
'Is #machinelearning the End of #Marketing As We Know It?
#AI #bigdata #ML #digital... by… '
Remove all the URLs
In [258]:
# gsub is vectorized, so the whole column can be cleaned in one call;
# this also avoids the 1:len loop misbehaving when the data frame is empty
topicTweetsTextDf$text = removeURL(topicTweetsTextDf$text)
head(topicTweetsTextDf)
text
Check out @Google #Cloud Platform #BigData and #MachineLearning Fundamentals #course for only $60! via @coursera
#AI app knows when couples are fighting, may someday predict (and prevent) conflict <U+201D> #MachineLearning
The mind in the machine: Demis Hassabis on artificial intelligence #machinelearning #bigdata #ai
Is #machinelearning the End of #Marketing As We Know It?
#AI #bigdata #ML #digital... by<U+2026>
The latest The Daily! Thanks to @bon__fifi #bigdata #machinelearning
il #MachineLearning per confermare i trend di fondo #digitalklive #BigData
Show the data cleaned in the step above
In [259]:
head(topicTweetsTextDf)
text
Check out @Google #Cloud Platform #BigData and #MachineLearning Fundamentals #course for only $60! via @coursera
#AI app knows when couples are fighting, may someday predict (and prevent) conflict <U+201D> #MachineLearning
The mind in the machine: Demis Hassabis on artificial intelligence #machinelearning #bigdata #ai
Is #machinelearning the End of #Marketing As We Know It?
#AI #bigdata #ML #digital... by<U+2026>
The latest The Daily! Thanks to @bon__fifi #bigdata #machinelearning
il #MachineLearning per confermare i trend di fondo #digitalklive #BigData
Write the data to a file for Hadoop MapReduce
In [260]:
write.table(topicTweetsTextDf, "/home/vipin/hadoopLocal/tweetText.txt", row.names = FALSE)
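With its default settings, `write.table()` writes a header line and wraps strings in double quotes. Whether that matters depends on the Hadoop job, which is not shown here. The sketch below compares the default output with a plain one-tweet-per-line variant on made-up data; all names and paths in it are illustrative.

```r
# Toy stand-in for the cleaned tweet text (values are made up)
toy_text_df <- data.frame(text = c("a #tweet", "another #tweet"),
                          stringsAsFactors = FALSE)

# Default settings: header line plus quoted strings
default_path <- tempfile(fileext = ".txt")
write.table(toy_text_df, default_path, row.names = FALSE)
default_lines <- readLines(default_path)

# Plain variant: no header and no quotes, just the text
plain_path <- tempfile(fileext = ".txt")
write.table(toy_text_df, plain_path, row.names = FALSE,
            col.names = FALSE, quote = FALSE)
plain_lines <- readLines(plain_path)

print(default_lines)
print(plain_lines)
```

A tweet that contains an embedded newline still spans more than one line in either file.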
Read the file generated by the Hadoop word count program
In [264]:
mydata = read.csv("/home/vipin/hadoopLocal/hashtags.txt",sep="",header = FALSE)
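The layout of `hashtags.txt` is not shown. The sketch below assumes one `hashtag<TAB>count` pair per line, which is the usual shape of Hadoop word count output, and reads a made-up sample of it. `read.csv()` already disables comment handling by default, but `read.table()` treats `#` as a comment character, so the sketch sets `comment.char = ""` explicitly to keep hashtags intact. The column names are illustrative.

```r
# Made-up sample in the assumed "hashtag<TAB>count" layout
sample_output <- c("#ai\t12", "#bigdata\t30", "#machinelearning\t57")

hashtag_counts <- read.table(text = sample_output,
                             sep = "\t",
                             header = FALSE,
                             col.names = c("hashtag", "count"),
                             comment.char = "",  # keep "#" as data, not a comment
                             quote = "",         # do not treat quotes specially
                             stringsAsFactors = FALSE)

# Most frequent hashtags first
hashtag_counts <- hashtag_counts[order(-hashtag_counts$count), ]
print(hashtag_counts)
```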
Convert the data into a matrix
In [265]:
tdm = as.matrix(mydata)
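A matrix holds a single type, so converting a data frame that mixes words and counts yields a character matrix. The sketch below illustrates this on made-up data, assuming the file has a word column and a count column (the notebook does not confirm this); the names are illustrative.

```r
# Toy stand-in for the word count data (values are made up)
toy_counts <- data.frame(word = c("#ai", "#ml"),
                         count = c(3L, 10L),
                         stringsAsFactors = FALSE)

toy_matrix <- as.matrix(toy_counts)

# The counts are now strings, not numbers
print(is.character(toy_matrix))
print(dim(toy_matrix))

# The numeric values can be recovered by converting the column back
recovered <- as.integer(toy_matrix[, "count"])
print(recovered)
```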
Make the word cloud from this matrix
In [266]:
wordcloud(tdm, max.words = 250, min.freq =1, random.order = FALSE,colors = brewer.pal(4, "Dark2"))
Warning message in wordcloud(tdm, max.words = 250, min.freq = 1, random.order = FALSE, :
"kirkdborneu2026 could not be fit on page. It will not be plotted."
Warning message in wordcloud(tdm, max.words = 250, min.freq = 1, random.order = FALSE, :
"nerderymisfit could not be fit on page. It will not be plotted."
Warning message in wordcloud(tdm, max.words = 250, min.freq = 1, random.order = FALSE, :
"pymntsu2026 could not be fit on page. It will not be plotted."
Warning message in wordcloud(tdm, max.words = 250, min.freq = 1, random.order = FALSE, :
"sagittarscopes could not be fit on page. It will not be plotted."
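`wordcloud()` also accepts the words and their frequencies as separate arguments, which lets precomputed counts drive the word sizes directly. The sketch below is one possible variant under the assumption that the counts are available as a word column and a count column; the data is made up and `toy_counts` is an illustrative name. A smaller `scale` may reduce "could not be fit on page" warnings, though that depends on the plotting device size.

```r
# Toy word counts (values are made up)
toy_counts <- data.frame(word = c("machinelearning", "bigdata", "ai", "datascience"),
                         count = c(57L, 30L, 12L, 5L),
                         stringsAsFactors = FALSE)

# Only plot when the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = toy_counts$word,
                       freq = toy_counts$count,
                       scale = c(3, 0.4),      # largest and smallest word size
                       min.freq = 1,
                       max.words = 250,
                       random.order = FALSE,   # most frequent words in the center
                       colors = RColorBrewer::brewer.pal(4, "Dark2"))
}
```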
Content source: vk3105/Data-Intensive-Programming-CSE-587