Twitter Collector

Setup

Before you start, you need a few things:

  1. Twitter OAuth credentials
  2. A plain-text file containing either Twitter handles or user ids (one per line)
  3. A settings.cfg file with the proper settings
  4. A virtual environment with all the necessary Python modules - we provide environment.yml for conda users and requirements.txt for pip users (NOTE: we've tested these only on OS X); see the commands below
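
If you use conda, the provided environment.yml can build the environment for you; pip users can install the modules listed in requirements.txt into a virtualenv. The commands below are the standard ones for those files - the environment name is whatever environment.yml defines, so substitute it where indicated.

    conda env create -f environment.yml   # builds the environment described in environment.yml
    conda activate <env-name>             # <env-name> is defined inside environment.yml

    # or, for pip users (inside an activated virtualenv):
    pip install -r requirements.txt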

Getting Twitter OAuth Credentials

Twitter's developer documentation provides an overview of the credentials and the process for obtaining them.

List of users to fetch

The scripts can work with either handles or user ids, but not a mix of the two. You need to create a plain-text file with one handle or user id per line; we've provided list_example.txt to show you how.
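
For example, a handle-based list file is just one screen name per line, with nothing else. The handles below are placeholders (casmlab is the account used in the sample run later in this document):

    casmlab
    some_other_handle
    a_third_handle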

Script Settings

  1. Copy settings-example.cfg to settings.cfg
  2. Replace the values in settings.cfg with settings appropriate for your environment and with your own Twitter credentials

The settings (besides the Twitter credentials) are listed below; a sample settings.cfg follows the list.

  • logfile: the name of the file where you want to log the script's output (mostly progress updates)
  • listfile: the name of the file containing your list of handles or ids to fetch
  • outfolder: path to the folder where you want the script to store the JSON(s) returned by Twitter
  • user_id: set to true if you provided a list of ids, otherwise leave it as false
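
A filled-in settings.cfg might look like the sketch below. The option names come from the list above, but the section headers and the credential key names here are assumptions - copy the real ones from settings-example.cfg and only change the values.

    [twitter]
    consumer_key = YOUR_CONSUMER_KEY                ; placeholders - use your own credentials
    consumer_secret = YOUR_CONSUMER_SECRET
    access_token = YOUR_ACCESS_TOKEN
    access_token_secret = YOUR_ACCESS_TOKEN_SECRET

    [settings]
    logfile = collect.log                           ; progress updates are written here
    listfile = list_example.txt                     ; one handle or user id per line
    outfolder = files/                              ; JSON files from Twitter are stored here
    user_id = false                                 ; true only if listfile contains user ids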

Collect and Cache

To download the user timelines as JSON files (one per user), run the script below:


In [1]:
run scripts/twitter_collect.py


Getting 1 user timeline(s). Storing files in files/.
Getting 310 tweets for casmlab. (Or ~3200, whichever is lower.)
Done collecting.
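
The REST API only returns roughly the most recent 3,200 tweets of a timeline, which is why the log says "whichever is lower." If you want to see the basic pattern the collector follows, here is a minimal sketch using Tweepy (which may not be the library the script itself uses); the hard-coded credential strings and output path are placeholders - the real script takes its credentials and paths from settings.cfg.

    import json
    import tweepy

    # Placeholder credentials; the real script loads these from settings.cfg.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    def collect_timeline(handle, outfolder="files/"):
        """Fetch up to ~3200 of a user's most recent tweets and cache them as JSON."""
        tweets = []
        # Cursor pages through the timeline; Twitter stops at roughly 3200 tweets.
        for status in tweepy.Cursor(api.user_timeline, screen_name=handle,
                                    count=200, tweet_mode="extended").items():
            tweets.append(status._json)
        with open(outfolder + handle + ".json", "w") as f:
            json.dump(tweets, f)
        return len(tweets)

    collect_timeline("casmlab")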

Parse

Once you have all the JSON files, you can parse the data into some other format that's useful for you. For instance, Ilan Manor likes to work in Excel, so we created a parser for him that puts the data he cares about into a CSV Excel can read.


In [2]:
run scripts/twitter_parse.py


Parsing 17 files.
Done parsing. See your data in files/test_output.txt.
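
If the bundled parser's CSV doesn't fit your needs, the same idea is easy to adapt. The sketch below flattens a folder of cached timeline files into one CSV; the column choices, the output filename, and the assumption that each JSON file holds one list of tweet objects are all illustrative, so adjust them to match the files the collector actually wrote for you.

    import csv
    import glob
    import json

    def parse_to_csv(infolder="files/", outfile="files/parsed_tweets.csv"):
        """Flatten cached timeline JSON files into a single CSV."""
        with open(outfile, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["screen_name", "id", "created_at", "text",
                             "retweet_count", "favorite_count"])
            for path in sorted(glob.glob(infolder + "*.json")):
                with open(path) as f:
                    tweets = json.load(f)  # assumed: one list of tweet dicts per user
                for t in tweets:
                    writer.writerow([
                        t["user"]["screen_name"],
                        t["id_str"],
                        t["created_at"],
                        t.get("full_text", t.get("text", "")),
                        t["retweet_count"],
                        t["favorite_count"],
                    ])

    parse_to_csv()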