Twitter data mining using Python assignment 14

Team Rython: Dainius Masiliunas and Tim Weerman
Date: 21st of January, 2016
Apache License 2.0

Imports

Make sure you have pysqlite2, tweepy and spatialite installed!


In [1]:
from __future__ import division
import tweepy
import datetime
import json
import os
from pysqlite2 import dbapi2 as sqlite3

Twitter authentication (fill this!)


In [2]:
APP_KEY = ""
APP_SECRET = ""
OAUTH_TOKEN = ""
OAUTH_TOKEN_SECRET = ""

Using Tweepy instead of Twython (because it's more readily available via apt-get or zypper).


In [3]:
auth = tweepy.OAuthHandler(APP_KEY, APP_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
api = tweepy.API(auth)

Database file to write to (fill this!)

The database file has to exist and already have a table defined. An empty, ready-to-use database file is included as "spatial-backup.sqlite", so you can use that.


In [4]:
databasefile = "spatial-backup.sqlite"
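
If you would rather build the table yourself, the following is a minimal sketch of what it could look like. Only the column names come from the insert statement used later in this notebook; the column types and the helper name create_tweets_table are assumptions. The spatial branch uses SpatiaLite's InitSpatialMetaData and AddGeometryColumn and needs mod_spatialite loaded on the connection first; the non-spatial branch only adds a placeholder column so the sketch runs without SpatiaLite.

```python
import sqlite3


def create_tweets_table(conn, spatial=False):
    """Create a 'tweets' table with the columns the notebook inserts into.

    Column types are assumptions; only the names come from the notebook.
    """
    if spatial:
        # Needs mod_spatialite to be loaded on this connection already.
        conn.execute("SELECT InitSpatialMetaData(1)")
    conn.execute(
        "CREATE TABLE tweets ("
        "username TEXT, followers_count INTEGER, tweettext TEXT, "
        "full_place_name TEXT, place_type TEXT, coordinates TEXT)"
    )
    if spatial:
        # Registers a POINT geometry column in WGS 84 (SRID 4326).
        conn.execute(
            "SELECT AddGeometryColumn('tweets', 'geometry', 4326, 'POINT', 'XY')"
        )
    else:
        # Placeholder so the column list matches without SpatiaLite.
        conn.execute("ALTER TABLE tweets ADD COLUMN geometry BLOB")
    conn.commit()


# Demonstration on a throwaway in-memory database, without SpatiaLite.
demo_conn = sqlite3.connect(":memory:")
create_tweets_table(demo_conn)
demo_columns = [row[1] for row in demo_conn.execute("PRAGMA table_info(tweets)")]
```

For a real database, call it with spatial=True after the extension has been loaded.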

SQLite opening

Load pysqlite2 (imported above as sqlite3) and give the path to mod_spatialite to get SpatiaLite support! The path is distribution-specific, and you might need to install libspatialite first!


In [5]:
conn = sqlite3.connect(databasefile)
conn.enable_load_extension(True)
# SQL string literals take single quotes; the path is distribution-specific.
conn.execute("SELECT load_extension('/usr/lib64/mod_spatialite.so.7')")
curs = conn.cursor()
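
Because the library path differs between distributions, one option is to try several candidate names in turn. This is only a sketch: the helper name load_spatialite and the candidate list are assumptions, and it is written against the standard library sqlite3 module (built with extension loading enabled). With pysqlite2 you would catch its own dbapi2.OperationalError instead.

```python
import sqlite3


def load_spatialite(conn, candidates):
    """Try each candidate library name in turn and return the one that loaded."""
    conn.enable_load_extension(True)
    for name in candidates:
        try:
            conn.load_extension(name)
            return name
        except sqlite3.OperationalError:
            # This name could not be loaded; try the next one.
            continue
    raise RuntimeError("mod_spatialite not found; tried: " + ", ".join(candidates))


# Hypothetical candidates; only the last path comes from this notebook.
SPATIALITE_CANDIDATES = [
    "mod_spatialite",
    "mod_spatialite.so",
    "/usr/lib64/mod_spatialite.so.7",
]
```

Usage would then be load_spatialite(conn, SPATIALITE_CANDIDATES) right after opening the connection.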

Coordinates to WKT

Converts Twitter coordinates (a GeoJSON point: two numbers, longitude first, then latitude) into a Well-Known Text (WKT) string.


In [6]:
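def coordinates_to_wkt(coords):
    # Tweets without exact coordinates carry None here.
    if coords is None:
        return ""
    # WKT points are written as "POINT(x y)", i.e. longitude, then latitude.
    return "POINT("+str(coords["coordinates"][0])+" "+str(coords["coordinates"][1])+")"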
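
To see the conversion at work without calling the API, here is a small standalone sketch. The function point_to_wkt restates the same logic so that the example runs on its own, and the sample coordinates are made up.

```python
def point_to_wkt(coords):
    """Same conversion as coordinates_to_wkt, restated so this runs standalone."""
    if coords is None:
        return ""
    lon, lat = coords["coordinates"][:2]
    return "POINT({0} {1})".format(lon, lat)


# Made-up sample in the shape Twitter uses: a GeoJSON point, longitude first.
sample_coordinates = {"type": "Point", "coordinates": [5.6653, 51.9851]}
sample_wkt = point_to_wkt(sample_coordinates)
missing_wkt = point_to_wkt(None)
```

Note that the longitude comes first, which is the order WKT expects for x and y.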

Bounding box to WKT

Calculates the centroid of a bounding box as the midpoint of two opposite corners and returns that point as Well-Known Text. Only polygonal bounding boxes are supported (but are there any other kinds?)


In [7]:
def bbox_to_wkt(bbox):
    # Some places come without a bounding box at all.
    if bbox is None or bbox.coordinates is None:
        return ""
    if bbox.type == "Polygon":
        # Corners 0 and 2 of the outer ring are diagonally opposite,
        # so their midpoint is the centre of the box.
        centroid = [0, 0]
        centroid[0] = (bbox.coordinates[0][2][0] + bbox.coordinates[0][0][0]) / 2
        centroid[1] = (bbox.coordinates[0][2][1] + bbox.coordinates[0][0][1]) / 2
        return "POINT("+str(centroid[0])+" "+str(centroid[1])+")"
    print("Unknown place type!")
    return ""
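
The centroid calculation can be checked by hand with a made-up box. The sketch below is a variant, not the function above: it takes the minimum and maximum of all corners, so it does not rely on the corner order. The namedtuple is a stand-in for the bounding box object that Tweepy returns, and all values are invented.

```python
from collections import namedtuple

# Stand-in for Tweepy's bounding box object (assumption: it exposes
# .type and .coordinates, as used in bbox_to_wkt).
BoundingBox = namedtuple("BoundingBox", ["type", "coordinates"])


def bbox_centre(bbox):
    """Return the (x, y) centre of the outer ring, whatever the corner order."""
    ring = bbox.coordinates[0]
    xs = [corner[0] for corner in ring]
    ys = [corner[1] for corner in ring]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)


# Made-up box spanning longitude 4..6 and latitude 50..52.
sample_box = BoundingBox(
    type="Polygon",
    coordinates=[[[4.0, 50.0], [6.0, 50.0], [6.0, 52.0], [4.0, 52.0]]],
)
sample_centre = bbox_centre(sample_box)
```

For this box the centre is (5.0, 51.0), the same point that the two-corner midpoint gives.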

Process query: main function of the script

Parses the search results and writes every geolocated tweet to the SpatiaLite database. Pass the result of api.search() to it.


In [8]:
def process_query(search_results):
    for tweet in search_results:
        full_place_name = ""
        place_type = ""
        location = ""
        username = tweet.user.screen_name
        followers_count = tweet.user.followers_count
        # tweet.text is already unicode, so no encode/decode round trip is needed.
        tweettext = tweet.text
        if tweet.place is not None:
            full_place_name = tweet.place.full_name
            place_type = tweet.place.place_type
        coordinates = tweet.coordinates
        # Only store tweets that carry exact coordinates or a tagged place.
        if coordinates is not None or tweet.place is not None:
            print('Found a geolocated tweet! By:')
            print(username)
            print('===========================')
            if coordinates is not None:
                # Prefer the exact point over the centre of the place.
                location = coordinates_to_wkt(coordinates)
            else:
                location = bbox_to_wkt(tweet.place.bounding_box)
            # The WKT is stored twice: as text and as a geometry (SRID 4326).
            curs.execute("insert into tweets (username, followers_count, tweettext, full_place_name, place_type, coordinates, geometry) values (?, ?, ?, ?, ?, ?, ST_GeomFromText( ? , 4326));",
                (username, followers_count, tweettext, full_place_name, place_type, location, location))
            conn.commit()
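
The function above commits after every tweet. If that turns out to be slow, rows could be collected and written in one transaction. The following is a sketch only: it uses the standard library sqlite3 module with an in-memory table that has no geometry column (so no SpatiaLite is needed), and the rows are made up. It also shows why the "?" placeholders matter: quotes in the tweet text need no escaping.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
# Simplified stand-in table: the WKT is kept as plain text only.
demo_conn.execute(
    "CREATE TABLE tweets (username TEXT, followers_count INTEGER, "
    "tweettext TEXT, coordinates TEXT)"
)

# Made-up rows in the order of the placeholders below.
rows = [
    ("alice_example", 10, "first made-up tweet", "POINT(5.0 51.0)"),
    ("bob_example", 20, "it's a 'quoted' tweet", "POINT(31.2 30.0)"),
]

# The connection as a context manager commits once, or rolls back on error.
with demo_conn:
    demo_conn.executemany(
        "INSERT INTO tweets (username, followers_count, tweettext, coordinates) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

stored_count = demo_conn.execute("SELECT count(*) FROM tweets").fetchone()[0]
stored_text = demo_conn.execute(
    "SELECT tweettext FROM tweets WHERE username = ?", ("bob_example",)
).fetchone()[0]
```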

Example queries

Write queries in succession (or loops if you like). Their results (if they are geolocated) will be added into the SpatiaLite database.


In [9]:
process_query(api.search(q="Beer", count=100))
process_query(api.search(q="Jorn", count=100))
process_query(api.search(q="cairo", count=100))
process_query(api.search(q="washington", count=100))


Found a geolocated tweet! By:
scltnmz
===========================
Found a geolocated tweet! By:
KyleSokol
===========================
Found a geolocated tweet! By:
sylvainbauza
===========================
Found a geolocated tweet! By:
nahlaw
===========================
Found a geolocated tweet! By:
SpotHopperApp
===========================
Found a geolocated tweet! By:
moyamcallister
===========================
Found a geolocated tweet! By:
azfRFuFnBthjvNb
===========================
Found a geolocated tweet! By:
azfRFuFnBthjvNb
===========================
Found a geolocated tweet! By:
AcostaMzk
===========================
Found a geolocated tweet! By:
azfRFuFnBthjvNb
===========================
Found a geolocated tweet! By:
bh_Cairo
===========================
Found a geolocated tweet! By:
MohammadKabli
===========================
Found a geolocated tweet! By:
haquelpontes
===========================
Found a geolocated tweet! By:
mohameduwk_97
===========================
Found a geolocated tweet! By:
bh_Cairo
===========================
Found a geolocated tweet! By:
bh_Cairo
===========================
Found a geolocated tweet! By:
Spiky216
===========================
Found a geolocated tweet! By:
brunomanzali
===========================
Found a geolocated tweet! By:
tmj_wak_jobs
===========================
Found a geolocated tweet! By:
Franki_is_witty
===========================
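
The successive queries above could also be written as a loop. A minimal sketch, assuming a hypothetical helper run_queries that receives the search function and the handler as arguments, so that it can be tried without Twitter credentials:

```python
def run_queries(terms, search, handle, count=100):
    """Run one search per term, pass each result set to handle, count the results."""
    totals = {}
    for term in terms:
        results = search(q=term, count=count)
        handle(results)
        totals[term] = len(results)
    return totals


# In this notebook the call would be (not executed here):
# run_queries(["Beer", "Jorn", "cairo", "washington"], api.search, process_query)
```

The returned dictionary holds the number of tweets returned per term, geolocated or not.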

Close the database


In [10]:
conn.close()

Visualise data

Opens QGIS with the database passed as an argument. It should show you all the points. Add a layer of OpenStreetMap or such for a nice visualisation of the points.


In [11]:
os.system("qgis "+databasefile)


Out[11]:
0
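
If the database path contains spaces, the string passed to os.system breaks apart. A sketch of an alternative using subprocess, where the helper names qgis_command and open_in_qgis are hypothetical and qgis is assumed to be on the PATH, as in the cell above:

```python
import subprocess


def qgis_command(path):
    """Build the argument list; the path stays one argument even with spaces."""
    return ["qgis", path]


def open_in_qgis(path):
    """Launch QGIS on the given file and return its exit code."""
    return subprocess.call(qgis_command(path))


# In this notebook the call would be (not executed here):
# open_in_qgis(databasefile)
```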

