Tweepy Example - Twitter API for Python

This example notebook shows the code we used to download Twitter feeds for the in-class assignment. You can try to follow along, but this notebook may not work on some systems.

Before starting, we need to make sure the Tweepy module is installed. Use the pip command to install it (other common installers include conda and easy_install; which one you use depends on the module). More information about Tweepy can be found on its website:

http://www.tweepy.org/

You only need to run the installer once on your system.


In [ ]:
!pip install tweepy

Assuming the installation worked, you can now import the tweepy module.


In [ ]:
import tweepy
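
If the import fails, the module is probably not installed in the environment this notebook is running in. Here is a minimal sketch for checking that, using only the standard library. The helper name `is_installed` is made up for this illustration and is not part of Tweepy.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the current interpreter can find `module_name`."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None

print("tweepy installed:", is_installed("tweepy"))
```

If this prints False right after a successful pip install, pip may have installed into a different Python than the one the notebook uses.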

The next step is to get a consumer_key, consumer_secret, access_token, and access_token_secret from your Twitter account. This is not straightforward, but fortunately you should only need to do it once. Here are the basic steps:

  1. Log in to your Twitter account
  2. Go to the following website and click on the add button: https://apps.twitter.com/
  3. Fill out the following fields:
    • Name (this needs to be unique) - DirkColbry-Python
    • Description - Ipython notebook tweepy test
    • URL - Not sure, I'm not planning on posting so I just used my personal website - http://www.mus.edu/~colbrydi
    • Check "Yes I agree" checkbox
  4. Assuming you are successful in creating the app, click on the "Permissions" tab and set it to "read-only"
  5. Note the consumer_key and consumer_secret strings for later.
  6. Click the "create access token" button.
  7. Note the access_token and access_token_secret strings.
  8. Insert all four strings into the variables below.

In [ ]:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
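
If you plan to share this notebook, you may not want to paste the four strings into it. One possible alternative, sketched below, is to read them from environment variables. The variable names and the `load_credentials` helper are assumptions made for this example; nothing in Tweepy requires them.

```python
import os

# Hypothetical environment variable names; use whatever names you set yourself.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(env=os.environ):
    """Return the four credential strings in order, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_NAMES)
```

With the variables set, `consumer_key, consumer_secret, access_token, access_token_secret = load_credentials()` would replace the four empty strings in the cell above.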

Now we can check that it is working by pulling the timeline from your personal Twitter feed.


In [ ]:
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

What we really want are the feeds from the presidential candidates. You need to find the Twitter handle for each person: put their name in the search box and go to their Twitter timeline. Their handle is in the URL. For example, here is Barack Obama's timeline:

https://twitter.com/barackobama

His Twitter screen_name is just barackobama.
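
If you have several timeline URLs, you can pull the screen_name out of each one in code instead of by eye. This is a small sketch that assumes the URL has the simple form shown above, with the handle as the first part of the path. The function name `screen_name_from_url` is made up for this example.

```python
from urllib.parse import urlparse

def screen_name_from_url(url):
    """Return the first path component of a Twitter timeline URL."""
    path = urlparse(url).path            # for the URL above this is '/barackobama'
    return path.strip('/').split('/')[0]

print(screen_name_from_url('https://twitter.com/barackobama'))
```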

Now let's download his 10 latest tweets using the user_timeline function.


In [ ]:
#Test the code
public_tweets = api.user_timeline(screen_name = 'barackobama', count = 10, include_rts = True)

# write one tweet per line; utf-8 so emoji and accented characters can be saved
with open('barackobama_tweets.txt', 'w', encoding='utf-8') as f:
    for tweet in public_tweets:
        f.write(tweet.text + '\n')

In [ ]:
'''
This chunk of code downloads a bunch of tweets.  The tweepy API will only 
download 200 tweets at a time, and you can only access the last ~3200 tweets.

This is a modified version of https://gist.github.com/yanofsky/5436496
'''

# whose tweets do I want to download?
twitter_names = [ 'BarackObama', 'realDonaldTrump','HillaryClinton','timkaine','mike_pence']

# how many do I want?  Max = 3200 or so.
number_to_download = 2000

# loop over all of the twitter handles
for name in twitter_names:

    print(name)
    
    # initialize a list to hold all the tweepy tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = name, count = 200, include_rts = True)

    # save most recent tweets in our new list
    alltweets.extend(new_tweets)
    
    # save the ID of the oldest tweet
    oldest = alltweets[-1].id - 1
    
    # keep grabbing tweets until we've reached the number we want.
    while len(alltweets) < number_to_download:
        
        # all subsequent requests use the max_id param to prevent duplication
        new_tweets = api.user_timeline(screen_name = name, count=200, max_id=oldest, include_rts=True)

        # stop if there is nothing older left; otherwise this loop never ends
        if not new_tweets:
            break

        # extend the list again
        alltweets.extend(new_tweets)

        # save the ID of the oldest tweet again
        oldest = alltweets[-1].id - 1
        
        # give the user some sense of what's going on
        print("...%s tweets downloaded so far" % (len(alltweets)) )
    
    # write 'em out! (utf-8 so emoji and accented characters can be saved)
    with open(name+'_tweets.txt', 'w', encoding='utf-8') as f:
        for tweet in alltweets:
            f.write(tweet.text+'\n')
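
To see how the max_id paging works without using up any Twitter requests, here is a small offline sketch. Everything in it is a stand-in: `FakeTweet`, `TIMELINE`, and `fake_user_timeline` are invented for this example and only imitate the behavior described above (newest tweets first, at most `count` per request, only IDs at or below `max_id`). Unlike the cell above, this sketch also trims the result to exactly the number requested.

```python
from collections import namedtuple

FakeTweet = namedtuple('FakeTweet', ['id', 'text'])

# Pretend timeline: 450 tweets with IDs 450 (newest) down to 1 (oldest).
TIMELINE = [FakeTweet(i, 'tweet %d' % i) for i in range(450, 0, -1)]

def fake_user_timeline(count=200, max_id=None):
    """Stand-in for a timeline request: newest first, IDs <= max_id, at most `count`."""
    eligible = [t for t in TIMELINE if max_id is None or t.id <= max_id]
    return eligible[:count]

def download(number_to_download):
    """Page backwards through the fake timeline using max_id."""
    alltweets = []
    max_id = None                          # no limit on the first request
    while len(alltweets) < number_to_download:
        new_tweets = fake_user_timeline(count=200, max_id=max_id)
        if not new_tweets:                 # nothing older left, so stop
            break
        alltweets.extend(new_tweets)
        max_id = alltweets[-1].id - 1      # next request starts just below the oldest ID
    return alltweets[:number_to_download]

tweets = download(2000)
print(len(tweets))   # prints 450: the fake timeline runs out before 2000
```

Subtracting 1 from the oldest ID is what prevents the same tweet from showing up at the end of one batch and the start of the next.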

For class, I uploaded these files to a GitHub account: https://github.com/bwoshea/CMSE201_datasets