DIC LAB 1 Problem 2 : Study response to an event

Define the limit of numbers to be searched



In [73]:

    
LIMIT = 20000

Declare all the search, strings which will be searched in a vector, following are the hastags for SuperBowl



In [74]:

    
superBowlStrings = c("#superbowl","#Patriots","#Falcons","#SB51")

Declare your custom search strings, which will be searched, in a vector. Leave it empty vector if no custom string.

Eg : Empty Vector will look like customUserSearchStrings = c("")

NOTE : Make sure you empty leave customUserSearchStrings vector empty if you want to see the results from superBowlStrings



In [109]:

    
customUserSearchStrings = c("") #c("#NASA")

Create a boolean variable to check if the custom search string is present or not. Custom search will be given preference over SuperBowl search. Number of search limit will change too i.e. LIMIT = 1000



In [76]:

    
isCustomSearch = FALSE
if(customUserSearchStrings[1]!=""){
    isCustomSearch = TRUE
    LIMIT = 1000
    userSearchStrings = customUserSearchStrings
}else{
    isCustomSearch = FALSE
    LIMIT = 20000
    userSearchStrings = superBowlStrings
}

Define all the libraries which needs to be set up for operations below



In [77]:

    
library("twitteR")
library("DBI")
library("RSQLite")
library("gtools")
library("bitops")
library("ggplot2")
library("RCurl")
library("RJSONIO")
library("ggmap")
library("sp")
library("mapdata")
library("maptools")
library("scales")
library("maps")
Sys.setlocale(category = "LC_ALL", locale = "C")









    




'LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C'

Setup the twitter app key for authentication



In [78]:

    
setup_twitter_oauth('YOUR KEY')









    



[1] "Using direct authentication"

Following function fetches the state and country from the address_components field in json response .Following is the response

"address_components" : [ { "long_name" : "Buffalo", "short_name" : "Buffalo", "types" : [ "locality", "political" ] }, { "long_name" : "Erie County", "short_name" : "Erie County", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "New York", "short_name" : "NY", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] } ],

It will generate the state as New York and country as United States.



In [79]:

    
parseAddressComponents <- function(addr_comp){
    result = tryCatch(
        {
            addrSize = length(addr_comp)
            stateName = ""
            countryName = ""
            for(i in 1:addrSize){
                if(addr_comp[[i]]$types[[1]] == "administrative_area_level_1"){
                    stateName = addr_comp[[i]]$long_name
                }
                if(addr_comp[[i]]$types[[1]] == "country"){
                    countryName = addr_comp[[i]]$long_name
                }
            }
            if(stateName == ""){
                stateName = NA
            }
            if(countryName == ""){
                countryName = NA
            }
            return (c(stateName,countryName))
        },
        error=function(cond) {
            return (c(NA,NA))
        },
        warning=function(cond) {
        },
        finally={
        }
    )
    return (result);
}

This function generates the url which will be used to query the Google server. You have to pass the address for this function

Eg : https://maps.google.com/maps/api/geocode/json?address=Buffalo&key=YOUR_KEY_GOOGLE&sensor=false



In [80]:

    
getGeoCodeUrl <- function(address, return.call = "json", sensor = "false") {
    root <- "https://maps.google.com/maps/api/geocode/"
    url <- paste(root, return.call, "?address=", address,"&key=YOUR_KEY_GOOGLE","&sensor=", sensor, sep = "")
    return(URLencode(url))
}

This function generates the url which will be used to query the Google server. You have to pass the latitude and longitude for this function

Eg. https://maps.google.com/maps/api/geocode/json?address=42.88644679999999,-78.8783689&key=AIzaSyCskJdqaLuNJBJ58IHDqfsjJAm9jkZbcKM&sensor=false



In [81]:

    
getLatLngURL <- function(lat, lng, return.call = "json", sensor = "false") {
    root <- "https://maps.google.com/maps/api/geocode/"
    latlng = paste(lat, lng, sep = ",")
    url <- paste(root, return.call, "?address=", latlng,"&key=AIzaSyCskJdqaLuNJBJ58IHDqfsjJAm9jkZbcKM","&sensor=", sensor, sep = "")
    return(URLencode(url))
}

The below code decomposes the json response from the Google Map API code

The output response is as follows for location Buffalo

{ "results" : [ { "address_components" : [ { "long_name" : "Buffalo", "short_name" : "Buffalo", "types" : [ "locality", "political" ] }, { "long_name" : "Erie County", "short_name" : "Erie County", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "New York", "short_name" : "NY", "types" : [ "administrative_area_level_1", "political" ] }, { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] } ], "formatted_address" : "Buffalo, NY, USA", "geometry" : { "bounds" : { "northeast" : { "lat" : 42.9664549, "lng" : -78.795157 }, "southwest" : { "lat" : 42.826023, "lng" : -78.9337276 } }, "location" : { "lat" : 42.88644679999999, "lng" : -78.8783689 }, "location_type" : "APPROXIMATE", "viewport" : { "northeast" : { "lat" : 42.9664549, "lng" : -78.795157 }, "southwest" : { "lat" : 42.826023, "lng" : -78.9142665 } } }, "place_id" : "ChIJoeXfUmES04kRcYEfGKUEI5g", "types" : [ "locality", "political" ] } ], "status" : "OK" }

From the above json repsonse the data is extracted in a data frame shown in following codes

We remove all those locations which are not in United State or which are in United State but no information about the state.



In [82]:

    
getGeoCode <- function(address,userScreenName,verbose=FALSE) {
    if(verbose) cat(address,"\n")
    url <- getGeoCodeUrl(address)
    doc <- getURL(url)
    docData <- fromJSON(doc,simplify = FALSE)
    
    if(docData$status=="OK") {
        
        lat <- docData$results[[1]]$geometry$location$lat
        lng <- docData$results[[1]]$geometry$location$lng        
        
        address_component <- parseAddressComponents(docData$results[[1]]$address_components)
        state = address_component[1]
        country = address_component[2]
        
        if(!invalid(country) & country == "United States"){
            
            if(invalid(state)){
                latlongUrl = getLatLngURL(lat,lng)
                latlngDoc <- getURL(latlongUrl)
                latlngDocData <- fromJSON(latlngDoc,simplify = FALSE)
    
                if(latlngDocData$status=="OK") {
                    address_component_latlng <- parseAddressComponents(latlngDocData$results[[1]]$address_components)
                    latlngState = address_component_latlng[1]
                    latlngCountry = address_component_latlng[2]
                    return(c(userScreenName, lat, lng, latlngState, latlngCountry))
                }
            }else{
                return(c(userScreenName, lat, lng, state, country))
            }
        } 
        Sys.sleep(0.5)
    }
}

1st line creates a sqlite db file

2nd line store and load tweets database which is backend registered

3rd line store the tweets in a table named "tweets", which is automatically provided by twitterR



In [83]:

    
if(!isCustomSearch){
    mydbName = toString("DIClab1.db")
}else{
    mydbName = toString("custom.db")
}
conn = dbConnect(SQLite(), dbname = mydbName)
register_sqlite_backend(mydbName)

Searches the tweets from the user input search string vector and equally divides them in equal chunks per input string in vector. So that the total sum of all the tweets is LIMIT, i.e 20000 here.



In [84]:

    
size = length(userSearchStrings)
for(searchStr in userSearchStrings){
    tweets <- searchTwitter(searchStr,LIMIT/size)
    store_tweets_db(tweets)
}

Makes a connection with the database



In [85]:

    
con = dbConnect(SQLite(), dbname = mydbName)

Query the database for all the screen names of the users.



In [86]:

    
tweetsUserNames = dbGetQuery(con,"SELECT screenName from tweets")

Generate a vector by taking transpose of the above result.



In [87]:

    
userNamesVec = c(t(tweetsUserNames))

Prints a few user's screen names



In [88]:

    
head(userNamesVec)









    





	'ConstitutionWY'
	'WI_Constitution'
	'WAConstitution'
	'ConstitutionUT'
	'ConstParty_SC'
	'KBistami'

Queries the twitter with all the screen names to get the user information



In [89]:

    
lookupUserNames <- lookupUsers(userNamesVec)

Converts the data collected in the previous step to a data frame



In [90]:

    
userNamesDf = twListToDF(lookupUserNames)

Save this data frame so that the previous heavy work needs not to be repeated everytime



In [91]:

    
if(!isCustomSearch){
    saveRDS(userNamesDf,file = "userNamesDf.Rda")
}else{
    saveRDS(userNamesDf,file = "customUserNamesDF.Rda")
}

Load the userNameDf from the saved file



In [92]:

    
if(!isCustomSearch){
    userNamesDf = readRDS(file = "userNamesDf.Rda")
}else{
    userNamesDf = readRDS(file = "customUserNamesDF.Rda")
}

Print a few items of the data frame



In [93]:

    
head(userNamesDf)









    





description statusesCount followersCount favoritesCount friendsCount url name created protected verified screenName location lang id listedCount followRequestSent profileImageUrl

	ConstitutionWY #NASA have been key in the #Illuminati's plan to hide our #FlatEarth & destroy humanity for #Satan             12467                                                                                                                                                      164                                                                                                                                                        0                                                                                                                                                          19                                                                                                                                                         NA                                                                                                                 NASA are ILLUMINATI                                                                                                                                        2015-10-01 04:11:59                                                                                                                                        FALSE                                                                                                                                                      FALSE                                                                                                                                                      ConstitutionWY                                                                                                                                                                                                                                                                en                                                                                                                                                         3744055813                                                                                                          79                                                                                                                                                        FALSE                                                                                                                                                      http://pbs.twimg.com/profile_images/658816339235225600/HEB9pDvR_normal.png                                                                                 
	WI_Constitution #Lockheed is the main contractor helping #NASA hide our #FlatEarth & thereby hide #GOD. #Lockheed serve #SATAN.  8244                                                                                                               83                                                                                                                0                                                                                                                  13                                                                                                                 NA                                                                         Lockheed serve SATAN                                                                                               2015-10-01 06:34:14                                                                                                FALSE                                                                                                              FALSE                                                                                                              WI_Constitution                                                                                                                                                                               en                                                                                                                 3745020312                                                                 103                                                                                                                FALSE                                                                                                              http://pbs.twimg.com/profile_images/659579485663027200/TW05NYu3_normal.png                                         
	WAConstitution See the light
See the way
#CultofSiduri                                                                       4839                                                                                                           64                                                                                                            0                                                                                                               1                                                                                                             http://t.co/NgXssa1W1f                                                                                         Goth Siduri                                                                                                    2015-09-30 20:47:28                                                                                            FALSE                                                                                                          FALSE                                                                                                          WAConstitution                                                                                                 Seattle, WA                                                                                                    en                                                                                                             3740593934                                                                                                      25                                                                                                            FALSE                                                                                                          http://pbs.twimg.com/profile_images/652339690595614720/iflUNzPJ_normal.jpg                                     
	ConstitutionUT The United States #Constitution now hangs like a thread and can only be saved by the #WhiteHorse               24369                                                                                                          218                                                                                                            1                                                                                                              11                                                                                                             http://t.co/NgXssa1W1f                                                                                         WhiteHorse SIDURI                                                                                              2015-09-30 19:01:02                                                                                            FALSE                                                                                                          FALSE                                                                                                          ConstitutionUT                                                                                                 Salt Lake City, UT                                                                                             en                                                                                                             3739697178                                                                                                     360                                                                                                            FALSE                                                                                                          http://pbs.twimg.com/profile_images/652185485545205760/xToaeRah_normal.jpg                                     
	ConstParty_SC I seek vengeance for the murder of Priestess Hypatia of the Cult of #Gnosis/#Sophia/#Siduri                     7424                                                                                                           90                                                                                                            0                                                                                                               5                                                                                                             http://t.co/NgXssa1W1f                                                                                         Hypatia Siduri                                                                                                 2015-09-30 02:59:31                                                                                            FALSE                                                                                                          FALSE                                                                                                          ConstParty_SC                                                                                                  Too far away                                                                                                   en                                                                                                             3732391574                                                                                                      39                                                                                                            FALSE                                                                                                          http://pbs.twimg.com/profile_images/659135861745950720/FaS0KUFj_normal.jpg                                     
	KBistami                                                                                                                    8                                                                                                            0                                                                                                            2                                                                                                               9                                                                                                             NA                                                                                                             Khalil BISTAMI                                                                                                 2017-02-18 00:49:20                                                                                            FALSE                                                                                                          FALSE                                                                                                          KBistami                                                                                                                                                                                                                      en                                                                                                             832753832564101121                                                                                               0                                                                                                            FALSE                                                                                                          http://pbs.twimg.com/profile_images/832757758231339009/KTm8kJeP_normal.jpg

This is the heart of the whole code. It generates a data frame which will contain the screen name, latitude, longitude, state and country of the all the users. We keep storing the data frames as they arrive so that in case of any problem we can resume from the failure point.

NOTE: Here in this experiment we generated some 5000+ locations.



In [94]:

    
userLocationDataDf = data.frame(screenName = as.character(),Lat = double(),Long = double(),State = as.character(), Country = as.character())

sizeOfDf = length(userNamesDf$screenName)

for(i in 1:sizeOfDf){
    tempUserLocation = userNamesDf$location[i]
    tempUserScreenName = userNamesDf$screenName[i]
    if(tempUserLocation != ""){
        tempVec = getGeoCode(tempUserLocation,tempUserScreenName)
        if(!invalid(tempVec)){
            tempDf = data.frame(screenName = as.character(tempVec[1]),Lat = as.numeric(tempVec[2]),Long = as.numeric(tempVec[3]),State = as.character(tempVec[4]),Country = as.character(tempVec[5]))
            userLocationDataDf = rbind(userLocationDataDf,tempDf)
            if(!isCustomSearch){
                saveRDS(userLocationDataDf,file = "userLocationDataDf.Rda")
            }else{
                saveRDS(userLocationDataDf,file = "customUserLocationDataDf.Rda")
            }
        }
    }
}

Save the above generated data frame since the above process is time consuming.



In [95]:

    
if(!isCustomSearch){
    saveRDS(userLocationDataDf,file = "userLocationDataDf.Rda")
}else{
    saveRDS(userLocationDataDf,file = "customUserLocationDataDf.Rda")
}

Load the data frame for further queries



In [96]:

    
if(!isCustomSearch){
    userLocationDataDf = readRDS(file = "userLocationDataDf.Rda")
}else{
    userLocationDataDf = readRDS(file = "customUserLocationDataDf.Rda")
}

Print and check if the data frame generated is in right format or not



In [97]:

    
head(userLocationDataDf)









    





screenName Lat Long State Country

	WAConstitution 47.60621      -122.33207    Washington    United States 
	ConstitutionUT 40.76078      -111.89105    Utah          United States 
	ConstParty_SC 42.72605       -88.54242    Wisconsin     United States 
	LovinTrump    37.09024       -95.71289    Kansas        United States 
	MEConstitution 37.09024       -95.71289    Kansas        United States 
	Step_Holt     42.99359       -89.56843    Wisconsin     United States

Prints the number of items in the data frame



In [98]:

    
print(length(userLocationDataDf$screenName))

This code basically aggregates the data as per the different state and club them together. This code also uses mean function to generate the latitude and logitude of the state the mean value of them will still be in the state.



In [99]:

    
aggDataDf = aggregate(x = list(Latitude = userLocationDataDf$Lat, Longitude = userLocationDataDf$Long), by = list(State=userLocationDataDf$State), FUN = mean)

This part of code generates the frequency table from the userLocationDataDf data frame per state.



In [100]:

    
tweetLocationFreqDf = aggregate(x = list(Tweet_Count = userLocationDataDf$State), list(userLocationDataDf$State), length)

Merges the two data frames generated above



In [101]:

    
aggUserLocationDataDf = data.frame(aggDataDf,tweetLocationFreqDf)

Save this data frame before moving further



In [102]:

    
if(!isCustomSearch){
    saveRDS(aggUserLocationDataDf,file = "aggUserLocationDataDf.Rda")
}else{
    saveRDS(aggUserLocationDataDf,file = "aggCustomUserLocationDataDf.Rda")
}

Reload the data in a data frame again for futher processing



In [103]:

    
if(!isCustomSearch){
    aggUserLocationDataDf = readRDS(file = "aggUserLocationDataDf.Rda")
}else{
    aggUserLocationDataDf = readRDS(file = "aggCustomUserLocationDataDf.Rda")
}

Print and check if the data frame is in right format or not.



In [104]:

    
head(aggUserLocationDataDf)









    





State Latitude Longitude Group.1 Tweet_Count

	Washington    47.65450      -121.80143    Washington     3            
	Utah          40.76078      -111.89105    Utah           1            
	Wisconsin     43.42920       -88.97276    Wisconsin      6            
	Kansas        37.09024       -95.71289    Kansas        13            
	South Carolina 33.45218       -80.65315    South Carolina  5            
	Georgia       33.11962       -83.78326    Georgia        5

get_map give the map of the specified location.



In [105]:

    
USAMap = get_map(location = 'United States', zoom = 4)









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=United+States&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=United%20States&sensor=false

This code generates the points as per the data provided. We need to pass the map, generated in the previous step and all the points which needs to be plotted



In [106]:

    
mapPoints <- ggmap(USAMap) +
    geom_point(aes(x = aggUserLocationDataDf$Longitude, y = aggUserLocationDataDf$Latitude, size = Tweet_Count), data = aggUserLocationDataDf, color = "blue")

Draws the map with the marked points.



In [107]:

    
mapPoints









    



Warning message:
"Removed 2 rows containing missing values (geom_point)."

References



In [ ]:

	description	statusesCount	followersCount	favoritesCount	friendsCount	url	name	created	protected	verified	screenName	location	lang	id	listedCount	followRequestSent	profileImageUrl
ConstitutionWY	#NASA have been key in the #Illuminati's plan to hide our #FlatEarth & destroy humanity for #Satan	12467	164	0	19	NA	NASA are ILLUMINATI	2015-10-01 04:11:59	FALSE	FALSE	ConstitutionWY		en	3744055813	79	FALSE	http://pbs.twimg.com/profile_images/658816339235225600/HEB9pDvR_normal.png
WI_Constitution	#Lockheed is the main contractor helping #NASA hide our #FlatEarth & thereby hide #GOD. #Lockheed serve #SATAN.	8244	83	0	13	NA	Lockheed serve SATAN	2015-10-01 06:34:14	FALSE	FALSE	WI_Constitution		en	3745020312	103	FALSE	http://pbs.twimg.com/profile_images/659579485663027200/TW05NYu3_normal.png
WAConstitution	See the light See the way #CultofSiduri	4839	64	0	1	http://t.co/NgXssa1W1f	Goth Siduri	2015-09-30 20:47:28	FALSE	FALSE	WAConstitution	Seattle, WA	en	3740593934	25	FALSE	http://pbs.twimg.com/profile_images/652339690595614720/iflUNzPJ_normal.jpg
ConstitutionUT	The United States #Constitution now hangs like a thread and can only be saved by the #WhiteHorse	24369	218	1	11	http://t.co/NgXssa1W1f	WhiteHorse SIDURI	2015-09-30 19:01:02	FALSE	FALSE	ConstitutionUT	Salt Lake City, UT	en	3739697178	360	FALSE	http://pbs.twimg.com/profile_images/652185485545205760/xToaeRah_normal.jpg
ConstParty_SC	I seek vengeance for the murder of Priestess Hypatia of the Cult of #Gnosis/#Sophia/#Siduri	7424	90	0	5	http://t.co/NgXssa1W1f	Hypatia Siduri	2015-09-30 02:59:31	FALSE	FALSE	ConstParty_SC	Too far away	en	3732391574	39	FALSE	http://pbs.twimg.com/profile_images/659135861745950720/FaS0KUFj_normal.jpg
KBistami		8	0	2	9	NA	Khalil BISTAMI	2017-02-18 00:49:20	FALSE	FALSE	KBistami		en	832753832564101121	0	FALSE	http://pbs.twimg.com/profile_images/832757758231339009/KTm8kJeP_normal.jpg

screenName	Lat	Long	State	Country
WAConstitution	47.60621	-122.33207	Washington	United States
ConstitutionUT	40.76078	-111.89105	Utah	United States
ConstParty_SC	42.72605	-88.54242	Wisconsin	United States
LovinTrump	37.09024	-95.71289	Kansas	United States
MEConstitution	37.09024	-95.71289	Kansas	United States
Step_Holt	42.99359	-89.56843	Wisconsin	United States

State	Latitude	Longitude	Group.1	Tweet_Count
Washington	47.65450	-121.80143	Washington	3
Utah	40.76078	-111.89105	Utah	1
Wisconsin	43.42920	-88.97276	Wisconsin	6
Kansas	37.09024	-95.71289	Kansas	13
South Carolina	33.45218	-80.65315	South Carolina	5
Georgia	33.11962	-83.78326	Georgia	5