DIC LAB 1 Problem 1 : Learning Jupyter, R and twitteR

Define the search string for a user below in searchUserString


In [1]:
searchUserString = "@realDonaldTrump"

Define the search string for topic below in searchTopicString


In [2]:
searchTopicString = "#MachineLearning"

Define the limit of number of tweets to be searched


In [3]:
LIMIT = 200

Load the libraries needed for the operations below and set the locale


In [4]:
library("twitteR")
library("DBI")
library("RSQLite")
Sys.setlocale(category = "LC_ALL", locale = "C")


'LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C'

Set up the Twitter app credentials for authentication


In [5]:
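# Replace the four placeholders with the values from your Twitter app settings.
setup_twitter_oauth('YOUR_CONSUMER_KEY', 'YOUR_CONSUMER_SECRET', 'YOUR_ACCESS_TOKEN', 'YOUR_ACCESS_SECRET')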


[1] "Using direct authentication"
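Hard-coding credentials in a notebook makes them easy to leak. A minimal sketch of one alternative, assuming the four values are stored in environment variables; the variable names and the helper `missing_credentials` are hypothetical and not defined by twitteR:

```r
# Hypothetical environment variable names; use whatever names suit your setup.
cred_names <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")

# Return the names of credentials that are unset (NA) or empty.
missing_credentials <- function(creds) {
  names(creds)[is.na(creds) | !nzchar(creds)]
}

creds <- Sys.getenv(cred_names, unset = NA)
missing <- missing_credentials(creds)

if (length(missing) > 0) {
  message("Missing credentials: ", paste(missing, collapse = ", "))
} else if (requireNamespace("twitteR", quietly = TRUE)) {
  # Authenticate only when every value is present.
  twitteR::setup_twitter_oauth(creds[[1]], creds[[2]], creds[[3]], creds[[4]])
}
```

The check runs before any call to Twitter, so a missing value is reported by name instead of surfacing as an authentication error.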

Searches and collects a given number of tweets from twitter on a given topic


In [6]:
topicTweets = searchTwitter(searchTopicString, n = LIMIT)

Prints the top few tweets


In [7]:
head(topicTweets)


[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "wicas: RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026>"

[[3]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[4]]
[1] "researchercis: RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI"

[[5]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[6]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"

Removes retweets (native retweets as well as manual "RT" and "MT" style ones) and prints the top few remaining tweets


In [8]:
head(strip_retweets(topicTweets, strip_manual=TRUE, strip_mt=TRUE))


[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[3]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[4]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"

[[5]]
[1] "ianatsynonym: .@IBMWatson technology used to create #MachineLearning solution for @IBMzSystems https://t.co/2G2asgxRHy #AI<U+2026> https://t.co/T6pJHGoIHF"

[[6]]
[1] "StackDevJobs: Machine Learning Research Engineer at @autodesk (San Francisco, CA) https://t.co/ANJzPbK0lU #machinelearning"
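To see roughly what the manual-retweet filter does, the sketch below flags tweet text that contains a standalone RT or MT marker. The pattern and the helper name `is_manual_retweet` are assumptions made for illustration; twitteR's own matching rules may differ, and the last sample string is invented.

```r
# Flag text containing a standalone "RT" or "MT" token (case-sensitive).
is_manual_retweet <- function(text) {
  grepl("(^|\\s)(RT|MT)\\b", text, perl = TRUE)
}

sample_text <- c(
  "RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science?",
  "Putting data in the hands of doctors",
  "RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's",
  "MT @someone: shortened quote"
)

flags <- is_manual_retweet(sample_text)

# Keep only the tweets that were not flagged.
kept <- sample_text[!flags]
print(kept)
```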

Fetches the given User's information from twitter


In [9]:
userInfo = getUser(searchUserString)

Prints the searched User's description


In [10]:
userInfo$getDescription()


'45th President of the United States of America'

Prints the number of followers the searched User has


In [11]:
userInfo$getFollowersCount()


25090249

Prints the names and ids of a given number of the searched User's friends


In [12]:
userInfo$getFriends(n = 5)


$`471672239`
[1] "KellyannePolls"

$`20733972`
[1] "Reince"

$`322293052`
[1] "RealRomaDowney"

$`720293443260456960`
[1] "Trump"

$`2325495378`
[1] "TrumpGolf"

Prints a given number of the searched User's favorite tweets


In [13]:
userInfo$getFavorites(n = 5)


[[1]]
[1] "IvankaTrump: 2016 has been one of the most eventful and exciting years of my life. I wish you peace, joy, love and laughter. Hap<U+2026> https://t.co/A1I3tvTySZ"

[[2]]
[1] "DonaldJTrumpJr: FINAL PUSH! Eric and I doing dozens of radio interviews. We can win this thing! GET OUT AND VOTE! #MAGA #ElectionDay https://t.co/dYcxRCBQUd"

[[3]]
[1] "DanScavino: INDIANA #TrumpTrain<ed><U+00A0><U+00BD><ed><U+00BA><U+0082><ed><U+00A0><U+00BD><ed><U+00B2><U+00A8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8>\ncc: @mike_pence @marc_lotter https://t.co/fxvQ43k2im"

[[4]]
[1] "mike_pence: Congrats to my running mate @realDonaldTrump on a big debate win! Proud to stand with you as we #MAGA."

[[5]]
[1] "TeamTrump: It's hard to fight terrorism when you're making cash payments to the world's LARGEST state sponsor of TERROR. Under<U+2026> https://t.co/GPSkdoiiRC"

Converts the tweets to a data frame


In [14]:
topicDf = twListToDF(topicTweets)

Prints a few top tweets in data frame format


In [15]:
head(topicDf)


text | favorited | favoriteCount | replyToSN | created | truncated | replyToSID | id | replyToUID | statusSource | screenName | retweetCount | isRetweet | retweeted | longitude | latitude
It<U+2019>s time to take AI Seriously ! https://t.co/3LcMaM5giZ #AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V | FALSE | 0 | NA | 2017-02-18 00:12:58 | FALSE | NA | 832744683105312768 | NA | <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> | PetiotEric | 0 | FALSE | FALSE | NA | NA
RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026> | FALSE | 0 | NA | 2017-02-18 00:12:58 | FALSE | NA | 832744681964392449 | NA | <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> | wicas | 3 | TRUE | FALSE | NA | NA
Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning | FALSE | 0 | NA | 2017-02-18 00:12:29 | FALSE | NA | 832744561042612224 | NA | <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a> | AMULETAnalytics | 0 | FALSE | FALSE | NA | NA
RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI | FALSE | 0 | NA | 2017-02-18 00:12:19 | FALSE | NA | 832744518956879872 | NA | <a href="http://www.botize.com" rel="nofollow">Botize</a> | researchercis | 1 | FALSE | FALSE | NA | NA
#DataScience, #MachineLearning, #DeepLearning #AI platform? | FALSE | 0 | NA | 2017-02-18 00:10:47 | FALSE | NA | 832744130857951234 | NA | <a href="http://datasciencepakistan.com" rel="nofollow">Data Pakistan</a> | ImDataScientist | 0 | FALSE | FALSE | NA | NA
BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning | FALSE | 0 | NA | 2017-02-18 00:10:19 | FALSE | NA | 832744013472075776 | NA | <a href="https://ifttt.com" rel="nofollow">IFTTT</a> | HotAirNetwork | 0 | FALSE | FALSE | NA | NA
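Once the tweets are in a data frame, ordinary data frame operations apply. A small sketch using a hand-built data frame with three of the same column names; the values are copied from the first four rows of the output above, and the choice of filter is only an illustration:

```r
# Hand-built stand-in for a few columns of the twListToDF() result.
sample_df <- data.frame(
  screenName   = c("PetiotEric", "wicas", "AMULETAnalytics", "researchercis"),
  retweetCount = c(0, 3, 0, 1),
  isRetweet    = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep original tweets only, then order by retweet count, highest first.
originals <- sample_df[!sample_df$isRetweet, ]
originals <- originals[order(-originals$retweetCount), ]
print(originals[, c("screenName", "retweetCount")])
```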

1st line creates a temporary SQLite database file

2nd line registers that file as the database backend that twitteR uses to store and load tweets

3rd line stores the tweets in a table named "tweets", the default table name used by twitteR


In [16]:
sql_lite_file = tempfile()
register_sqlite_backend(sql_lite_file)
store_tweets_db(topicTweets)


TRUE
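Because the backend is an ordinary SQLite file, stored tweets can in principle also be queried with SQL through DBI. A sketch, assuming DBI and RSQLite are installed; it uses an in-memory database and a hand-built "tweets" table so that it runs without Twitter access, and it assumes the stored table carries the same column names as the data frame shown earlier:

```r
# Runs only when DBI and RSQLite are available.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Stand-in for the "tweets" table; values taken from the output shown earlier.
  sample_tweets <- data.frame(
    screenName   = c("PetiotEric", "wicas", "researchercis"),
    retweetCount = c(0, 3, 1),
    stringsAsFactors = FALSE
  )
  DBI::dbWriteTable(con, "tweets", sample_tweets)

  # Tweets with at least one retweet, most retweeted first.
  popular <- DBI::dbGetQuery(
    con,
    "SELECT screenName, retweetCount FROM tweets
     WHERE retweetCount > 0 ORDER BY retweetCount DESC"
  )
  print(popular)

  DBI::dbDisconnect(con)
}
```

To try this against the real data, the `":memory:"` argument would be replaced by the path held in `sql_lite_file`, and the `dbWriteTable` step dropped.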

Loads the stored tweets from the table, here "tweets", the default table name used by twitteR


In [17]:
from_db_tweets = load_tweets_db()

Prints top few tweets which are retrieved from the db


In [18]:
head(from_db_tweets)


[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "wicas: RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026>"

[[3]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[4]]
[1] "researchercis: RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI"

[[5]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[6]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"

Fetches tweets from the given User's timeline. By default only 20 are fetched


In [19]:
userTweets = userTimeline(searchUserString)

Prints 5 tweets of the timeline


In [20]:
userTweets[1:5]


[[1]]
[1] "realDonaldTrump: Looking forward to the Florida rally tomorrow. Big crowd expected!"

[[2]]
[1] "realDonaldTrump: \"One of the most effective press conferences I've ever seen!\" says Rush Limbaugh. Many agree.Yet FAKE MEDIA  calls it differently! Dishonest"

[[3]]
[1] "realDonaldTrump: The FAKE NEWS media (failing @nytimes, @NBCNews, @ABC, @CBS, @CNN) is not my enemy, it is the enemy of the American People!"

[[4]]
[1] "realDonaldTrump: Join me at 11:00am:\nWatch here: https://t.co/veqKmsGAwf https://t.co/UzndIjIqjM"

[[5]]
[1] "realDonaldTrump: General Keith Kellogg, who I have known for a long time, is very much in play for NSA - as are three others."

Searches given number of tweets from given User's timeline


In [21]:
userTweetsLarge = userTimeline(searchUserString, n = 100)

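Prints the number of tweets collected in the previous step. The count can be lower than the requested n, most likely because native retweets are dropped by default (includeRts = FALSE)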


In [22]:
length(userTweetsLarge)


99

The availableTrendLocations function returns a data.frame with one location per row; the woeid column gives that location's WOEID (Where On Earth ID)


In [23]:
availTrends = availableTrendLocations()

Prints the first few trend locations in a data frame format


In [24]:
head(availTrends)


name | country | woeid
Worldwide | | 1
Winnipeg | Canada | 2972
Ottawa | Canada | 3369
Quebec | Canada | 3444
Montreal | Canada | 3534
Toronto | Canada | 4118

The closestTrendLocations function takes a latitude and a longitude and returns a data.frame in the same style as availableTrendLocations.


In [25]:
closeTrends = closestTrendLocations(34.05223,-118.2437)

Prints the trend locations closest to the given coordinates


In [26]:
head(closeTrends)


name | country | woeid
Los Angeles | United States | 2442047

The getTrends function is used to pull current trend information from a given location, which is specified using a WOEID


In [27]:
trends = getTrends(2442047)

Prints top few trends info in a data frame format


In [28]:
head(trends)


name | url | query | woeid
#LARain | http://twitter.com/search?q=%23LARain | %23LARain | 2442047
#Logan | http://twitter.com/search?q=%23Logan | %23Logan | 2442047
Matt Reeves | http://twitter.com/search?q=%22Matt+Reeves%22 | %22Matt+Reeves%22 | 2442047
#Lucifer | http://twitter.com/search?q=%23Lucifer | %23Lucifer | 2442047
#FridayFeeling | http://twitter.com/search?q=%23FridayFeeling | %23FridayFeeling | 2442047
#StormWatch | http://twitter.com/search?q=%23StormWatch | %23StormWatch | 2442047
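The query column is URL-encoded (`%23` for `#`, `%22` for a double quote, `+` for a space). A short sketch of turning it back into readable text with `utils::URLdecode`; the helper name `decode_query` is made up for this example:

```r
# Decode a URL-encoded query string: "+" stands for a space, "%XX" for a byte.
decode_query <- function(query) {
  vapply(query, function(q) {
    # Replace "+" first, because URLdecode() leaves it untouched.
    utils::URLdecode(gsub("+", " ", q, fixed = TRUE))
  }, character(1), USE.NAMES = FALSE)
}

queries <- c("%23LARain", "%22Matt+Reeves%22", "%23FridayFeeling")
print(decode_query(queries))
```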

Collects given number of tweets on a search topic


In [29]:
r_tweets = searchTwitter(searchTopicString, n = 300)

Extracts the source (the client application used to post) of each tweet fetched in the previous step


In [30]:
sources = sapply(r_tweets,function(x)x$getStatusSource())

Removes the closing anchor tag "</a>" from each source string, if present


In [31]:
sources = gsub("</a>","",sources)

Splits each element of the character vector sources at every ">" character, so the opening anchor tag and the client name end up in separate pieces


In [32]:
sources = strsplit(sources,">")

Keeps the client name for each source: the second piece when the split produced more than one piece, otherwise the first (and only) piece


In [33]:
sources = sapply(sources,function(x)ifelse(length(x)>1,x[2],x[1]))
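The three steps above can be traced on a few sample values without calling Twitter. In this sketch the helper name `extract_client` is hypothetical, the first two strings are statusSource values from the data frame output earlier, and the plain `"web"` string is an invented example of a source without an anchor tag:

```r
# Extract the client name from a statusSource value such as
# '<a href="..." rel="nofollow">Twitter for iPhone</a>'.
extract_client <- function(sources) {
  no_close <- gsub("</a>", "", sources, fixed = TRUE)
  pieces <- strsplit(no_close, ">", fixed = TRUE)
  # With an opening tag the name is the second piece;
  # otherwise the string was already plain text.
  vapply(pieces, function(x) if (length(x) > 1) x[2] else x[1], character(1))
}

sample_sources <- c(
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="https://ifttt.com" rel="nofollow">IFTTT</a>',
  "web"
)
print(extract_client(sample_sources))
```

Using `vapply` with `character(1)` instead of `sapply` guarantees a character vector even if the input is empty.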

Counts how many tweets came from each source and stores the counts in a table


In [34]:
source_table=table(sources)

Shows a pie chart of the sources that account for more than 10 tweets each


In [35]:
pie(source_table[source_table>10])
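The same count-and-filter idea can be tried on a small vector. In this sketch the client names and the threshold `min_count` are illustrative stand-ins, and the shares are printed rather than plotted:

```r
# Illustrative client names, standing in for the parsed sources vector.
clients <- c(rep("Twitter for iPhone", 3), rep("IFTTT", 2), "Botize")

counts <- table(clients)

# Keep clients that appear at least min_count times, most frequent first.
min_count <- 2
frequent <- sort(counts[counts >= min_count], decreasing = TRUE)

# Percentage share of each remaining client, the quantity a pie chart displays.
shares <- round(100 * prop.table(frequent), 1)
print(shares)
```

Sorting before plotting keeps the largest slices together, which tends to make a pie or bar chart easier to read.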

