DIC LAB 1 Problem 1 : Learning Jupyter, R and twitteR

Define the search string for a user below in searchUserString



In [1]:

    
searchUserString = "@realDonaldTrump"

Define the search string for topic below in searchTopicString



In [2]:

    
searchTopicString = "#MachineLearning"

Define the maximum number of tweets to fetch per search in LIMIT



In [3]:

    
LIMIT = 200

Load the libraries needed for the operations below and set the locale



In [4]:

    
library("twitteR")
library("DBI")
library("RSQLite")
Sys.setlocale(category = "LC_ALL", locale = "C")









    




'LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C'

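Set up Twitter app authentication. setup_twitter_oauth takes the consumer key and consumer secret, plus an optional access token and access secret; replace the placeholders with your own credentials



In [5]:

    
setup_twitter_oauth("YOUR_CONSUMER_KEY", "YOUR_CONSUMER_SECRET", "YOUR_ACCESS_TOKEN", "YOUR_ACCESS_SECRET")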









    



[1] "Using direct authentication"
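
To keep credentials out of the notebook, they could be read from environment variables instead of being typed into the cell. The following is a minimal sketch in base R; the variable names (TWITTER_CONSUMER_KEY and so on) and the helper read_twitter_credentials are hypothetical, not part of twitteR.

```r
# Hypothetical environment variable names; set them in ~/.Renviron or the shell.
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Hypothetical helper: returns the four credentials as a named list,
# or stops with a message listing the ones that are not set.
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  missing <- names(values)[values == ""]
  if (length(missing) > 0) {
    stop("Missing Twitter credentials: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Usage (requires twitteR and valid credentials):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

Failing early with the list of missing names makes a misconfigured environment easier to spot than an authentication error from the API.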

Searches Twitter for the topic and collects up to LIMIT matching tweets



In [6]:

    
topicTweets = searchTwitter(searchTopicString, n = LIMIT)

Prints the top few tweets



In [7]:

    
head(topicTweets)









    





[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "wicas: RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026>"

[[3]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[4]]
[1] "researchercis: RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI"

[[5]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[6]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"

Removes retweets (native retweets plus manual "RT" and "MT" style tweets) and prints the first few remaining tweets



In [8]:

    
head(strip_retweets(topicTweets, strip_manual=TRUE, strip_mt=TRUE))









    





[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[3]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[4]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"

[[5]]
[1] "ianatsynonym: .@IBMWatson technology used to create #MachineLearning solution for @IBMzSystems https://t.co/2G2asgxRHy #AI<U+2026> https://t.co/T6pJHGoIHF"

[[6]]
[1] "StackDevJobs: Machine Learning Research Engineer at @autodesk (San Francisco, CA) https://t.co/ANJzPbK0lU #machinelearning"
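
The filtering idea can be approximated offline in base R. This is only a sketch of the concept, not the implementation used by strip_retweets: the regular expressions are assumptions, and the sample rows are abridged from the output above.

```r
# Sample rows abridged from the search output above.
tweets <- data.frame(
  screenName = c("PetiotEric", "wicas", "AMULETAnalytics", "researchercis"),
  text = c(
    "It's time to take AI Seriously !",
    "RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science?",
    "Putting data in the hands of doctors -- #MachineLearning",
    "RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's."
  ),
  isRetweet = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Assumed patterns: a standalone "RT" or "MT" token marks a manual retweet.
is_manual_rt <- grepl("(^|\\s)RT\\s", tweets$text)
is_mt        <- grepl("(^|\\s)MT\\s", tweets$text)

# Drop native retweets (isRetweet flag) and manual ones (text patterns).
kept <- tweets[!(tweets$isRetweet | is_manual_rt | is_mt), ]
print(kept$screenName)
```

Note that the tweet from researchercis has isRetweet set to FALSE, so only the text pattern catches it; this matches its disappearance from the filtered output above.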

Fetches the given User's information from twitter



In [9]:

    
userInfo = getUser(searchUserString)

Prints the searched User's description



In [10]:

    
userInfo$getDescription()









    




'45th President of the United States of America'

Prints the number of followers the searched User has



In [11]:

    
userInfo$getFollowersCount()

25090249

Prints the ids and names of a given number of the searched User's friends



In [12]:

    
userInfo$getFriends(n = 5)









    





$`471672239`
[1] "KellyannePolls"

$`20733972`
[1] "Reince"

$`322293052`
[1] "RealRomaDowney"

$`720293443260456960`
[1] "Trump"

$`2325495378`
[1] "TrumpGolf"

Prints a given number of the searched User's favorite tweets



In [13]:

    
userInfo$getFavorites(n = 5)









    





[[1]]
[1] "IvankaTrump: 2016 has been one of the most eventful and exciting years of my life. I wish you peace, joy, love and laughter. Hap<U+2026> https://t.co/A1I3tvTySZ"

[[2]]
[1] "DonaldJTrumpJr: FINAL PUSH! Eric and I doing dozens of radio interviews. We can win this thing! GET OUT AND VOTE! #MAGA #ElectionDay https://t.co/dYcxRCBQUd"

[[3]]
[1] "DanScavino: INDIANA #TrumpTrain<ed><U+00A0><U+00BD><ed><U+00BA><U+0082><ed><U+00A0><U+00BD><ed><U+00B2><U+00A8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8>\ncc: @mike_pence @marc_lotter https://t.co/fxvQ43k2im"

[[4]]
[1] "mike_pence: Congrats to my running mate @realDonaldTrump on a big debate win! Proud to stand with you as we #MAGA."

[[5]]
[1] "TeamTrump: It's hard to fight terrorism when you're making cash payments to the world's LARGEST state sponsor of TERROR. Under<U+2026> https://t.co/GPSkdoiiRC"

Converts the tweets to a data frame



In [14]:

    
topicDf = twListToDF(topicTweets)

Prints the first few rows of the tweets data frame



In [15]:

    
head(topicDf)









    





text	favorited	favoriteCount	replyToSN	created	truncated	replyToSID	id	replyToUID	statusSource	screenName	retweetCount	isRetweet	retweeted	longitude	latitude
It<U+2019>s time to take AI Seriously ! https://t.co/3LcMaM5giZ #AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V	FALSE	0	NA	2017-02-18 00:12:58	FALSE	NA	832744683105312768	NA	<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>	PetiotEric	0	FALSE	FALSE	NA	NA
RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026>	FALSE	0	NA	2017-02-18 00:12:58	FALSE	NA	832744681964392449	NA	<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>	wicas	3	TRUE	FALSE	NA	NA
Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning	FALSE	0	NA	2017-02-18 00:12:29	FALSE	NA	832744561042612224	NA	<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>	AMULETAnalytics	0	FALSE	FALSE	NA	NA
RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI	FALSE	0	NA	2017-02-18 00:12:19	FALSE	NA	832744518956879872	NA	<a href="http://www.botize.com" rel="nofollow">Botize</a>	researchercis	1	FALSE	FALSE	NA	NA
#DataScience, #MachineLearning, #DeepLearning #AI platform?	FALSE	0	NA	2017-02-18 00:10:47	FALSE	NA	832744130857951234	NA	<a href="http://datasciencepakistan.com" rel="nofollow">Data Pakistan</a>	ImDataScientist	0	FALSE	FALSE	NA	NA
BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning	FALSE	0	NA	2017-02-18 00:10:19	FALSE	NA	832744013472075776	NA	<a href="https://ifttt.com" rel="nofollow">IFTTT</a>	HotAirNetwork	0	FALSE	FALSE	NA	NA
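
Once the tweets are in a data frame, ordinary data frame operations apply. As a small illustration, the sketch below rebuilds a few of the columns shown above by hand (so it runs without Twitter access) and lists the tweets that were retweeted at least once, most retweeted first. The variable names topic_sample and shared are hypothetical.

```r
# A hand-built subset of the columns and rows shown above.
topic_sample <- data.frame(
  screenName   = c("PetiotEric", "wicas", "AMULETAnalytics", "researchercis"),
  retweetCount = c(0, 3, 0, 1),
  isRetweet    = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep rows with at least one retweet, then sort by retweetCount, descending.
shared <- topic_sample[topic_sample$retweetCount > 0, c("screenName", "retweetCount")]
shared <- shared[order(-shared$retweetCount), ]
print(shared)
```

The same two lines would work on topicDf itself, since it carries the screenName and retweetCount columns.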

1st line creates a temporary SQLite db file

2nd line registers that file as the database backend used to store and load tweets

3rd line stores the tweets in a table named "tweets", the default table name used by twitteR



In [16]:

    
sql_lite_file = tempfile()
register_sqlite_backend(sql_lite_file)
store_tweets_db(topicTweets)









    




TRUE

Loads the stored tweets from the table, here "tweets", the default table name used by twitteR



In [17]:

    
from_db_tweets = load_tweets_db()

Prints the first few tweets retrieved from the db



In [18]:

    
head(from_db_tweets)









    





[[1]]
[1] "PetiotEric: It<U+2019>s time to take AI Seriously !\n\nhttps://t.co/3LcMaM5giZ\n#AI #ArtificialIntelligence #ML #MachineLearning https://t.co/6SpcYoC16V"

[[2]]
[1] "wicas: RT @RelearnML: What Statistics Topics are Needed for Excelling at Data Science? https://t.co/qjoXR3nu9b #machinelearning #ai https://t.co/A<U+2026>"

[[3]]
[1] "AMULETAnalytics: Putting data in the hands of doctors -- https://t.co/tfetTszhjE #MachineLearning"

[[4]]
[1] "researchercis: RT gp_pulipaka: A.I. allows to diagnose #Alzheimer's or #Parkinson's. #BigData #DeepLearning #MachineLearning #DataScience #AI"

[[5]]
[1] "ImDataScientist: #DataScience, #MachineLearning, #DeepLearning #AI platform?"

[[6]]
[1] "HotAirNetwork: BrainChip Holdings Ltd. Provides Updated Company Overview https://t.co/yU3LIgapud #MachineLearning"
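
The store and load round trip above can also be sketched with DBI and RSQLite directly, which shows roughly what kind of work a database backend does. This is an illustration under assumptions, not twitteR's internal code: the data frame is hand-built from two tweets above, and ids are kept as strings so the 18-digit values are not rounded.

```r
library(DBI)
library(RSQLite)

# Two rows taken from the output above; ids stored as character strings.
tweets_df <- data.frame(
  id         = c("832744683105312768", "832744561042612224"),
  screenName = c("PetiotEric", "AMULETAnalytics"),
  text       = c("It's time to take AI Seriously !",
                 "Putting data in the hands of doctors"),
  stringsAsFactors = FALSE
)

# Create a temporary SQLite file and connect to it.
db_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)

# Write the data frame to a table named "tweets", then read it back.
dbWriteTable(con, "tweets", tweets_df)
restored <- dbReadTable(con, "tweets")

# Clean up the connection and the temporary file.
dbDisconnect(con)
unlink(db_file)

print(restored$screenName)
```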

Fetches tweets from the given User's timeline; by default only 20 are fetched



In [19]:

    
userTweets = userTimeline(searchUserString)

Prints the first 5 tweets of the timeline



In [20]:

    
userTweets[1:5]









    





[[1]]
[1] "realDonaldTrump: Looking forward to the Florida rally tomorrow. Big crowd expected!"

[[2]]
[1] "realDonaldTrump: \"One of the most effective press conferences I've ever seen!\" says Rush Limbaugh. Many agree.Yet FAKE MEDIA  calls it differently! Dishonest"

[[3]]
[1] "realDonaldTrump: The FAKE NEWS media (failing @nytimes, @NBCNews, @ABC, @CBS, @CNN) is not my enemy, it is the enemy of the American People!"

[[4]]
[1] "realDonaldTrump: Join me at 11:00am:\nWatch here: https://t.co/veqKmsGAwf https://t.co/UzndIjIqjM"

[[5]]
[1] "realDonaldTrump: General Keith Kellogg, who I have known for a long time, is very much in play for NSA - as are three others."

Fetches up to a given number of tweets (here 100) from the given User's timeline



In [21]:

    
userTweetsLarge = userTimeline(searchUserString, n = 100)

Prints the number of tweets collected in the previous step



In [22]:

    
length(userTweetsLarge)

The availableTrendLocations function returns a data.frame with one location per row; the woeid column holds that location's WOEID (Where On Earth ID)



In [23]:

    
availTrends = availableTrendLocations()

Prints the first few trend locations in data frame format



In [24]:

    
head(availTrends)









    





name country woeid

	Worldwide          1        
	Winnipeg Canada   2972     
	Ottawa   Canada   3369     
	Quebec   Canada   3444     
	Montreal Canada   3534     
	Toronto  Canada   4118
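
Because the result is a plain data frame, a WOEID can be looked up by name or the rows filtered by country. The sketch below rebuilds the six rows shown above by hand so that it runs without Twitter access; avail_sample, toronto_woeid and canada are hypothetical names.

```r
# Hand-built copy of the six rows printed above.
avail_sample <- data.frame(
  name    = c("Worldwide", "Winnipeg", "Ottawa", "Quebec", "Montreal", "Toronto"),
  country = c("", "Canada", "Canada", "Canada", "Canada", "Canada"),
  woeid   = c(1, 2972, 3369, 3444, 3534, 4118),
  stringsAsFactors = FALSE
)

# Look up the WOEID for one location by name.
toronto_woeid <- avail_sample$woeid[avail_sample$name == "Toronto"]

# Keep only the locations in one country.
canada <- subset(avail_sample, country == "Canada")

print(toronto_woeid)
print(canada$name)
```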

The closestTrendLocations function takes a latitude and a longitude and returns a data.frame in the same format as availableTrendLocations.



In [25]:

    
closeTrends = closestTrendLocations(34.05223,-118.2437)

Prints the trend locations closest to the given coordinates



In [26]:

    
head(closeTrends)









    





name country woeid

	Los Angeles  United States 2442047

The getTrends function is used to pull current trend information from a given location, which is specified using a WOEID



In [27]:

    
trends = getTrends(2442047)

Prints top few trends info in a data frame format



In [28]:

    
head(trends)









    





name	url	query	woeid
#LARain	http://twitter.com/search?q=%23LARain	%23LARain	2442047
#Logan	http://twitter.com/search?q=%23Logan	%23Logan	2442047
Matt Reeves	http://twitter.com/search?q=%22Matt+Reeves%22	%22Matt+Reeves%22	2442047
#Lucifer	http://twitter.com/search?q=%23Lucifer	%23Lucifer	2442047
#FridayFeeling	http://twitter.com/search?q=%23FridayFeeling	%23FridayFeeling	2442047
#StormWatch	http://twitter.com/search?q=%23StormWatch	%23StormWatch	2442047

Collects up to a given number of tweets (here 300) on the search topic



In [29]:

    
r_tweets = searchTwitter(searchTopicString, n = 300)

Extracts the source (the client application, stored as an HTML anchor string) of every tweet fetched in the previous step



In [30]:

    
sources = sapply(r_tweets,function(x)x$getStatusSource())

Removes the closing anchor tag "</a>" from each source string, if present



In [31]:

    
sources = gsub("</a>","",sources)

Splits each element of the character vector sources at ">", separating the opening anchor tag from the client name



In [32]:

    
sources = strsplit(sources,">")

Keeps only the client name: if the split produced more than one piece, takes the second piece (the anchor text), otherwise keeps the first



In [33]:

    
sources = sapply(sources,function(x)ifelse(length(x)>1,x[2],x[1]))

Counts how many tweets came from each source and stores the counts in a table



In [34]:

    
source_table=table(sources)

Shows a pie chart of the sources that account for more than 10 tweets



In [35]:

    
pie(source_table[source_table>10])
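
The source-cleaning steps above can be traced on a handful of statusSource strings without calling Twitter. The sketch below uses anchor strings copied from the data frame output earlier, plus one plain string ("web") added as an assumption to show the case where no anchor tag is present. It uses fixed = TRUE and vapply as small, behavior-preserving variations on the cells above.

```r
# Anchor strings copied from the statusSource column above; "web" is an
# assumed example of a source with no anchor tag.
sample_sources <- c(
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  'web'
)

# Step 1: drop the closing anchor tag.
cleaned <- gsub("</a>", "", sample_sources, fixed = TRUE)

# Step 2: split at ">" into the opening tag and the client name.
pieces <- strsplit(cleaned, ">", fixed = TRUE)

# Step 3: take the client name when there are two pieces, else the string itself.
client_names <- vapply(
  pieces,
  function(x) if (length(x) > 1) x[2] else x[1],
  character(1)
)

# Step 4: count tweets per client.
sample_table <- table(client_names)
print(sample_table)

# Keep only the clients seen more than once, as the pie chart cell does with 10.
frequent <- sample_table[sample_table > 1]
print(names(frequent))
```

The threshold of 1 here stands in for the threshold of 10 used with the 300 fetched tweets.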
