Collecting tweets using the Twitter API

In this section we are going to see how to connect to the Twitter API to collect tweets and save them.

"In computer programming, an Application Programming Interface (API) is a set of subroutine definitions, protocols, and tools for building application software." (Wikipedia)

The Twitter API is the tool we use to collect tweets from Twitter.

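Twitter offers two different APIs: the REST API, used to query existing data and to post tweets, and the Streaming API, used to receive new tweets in real time as they are published.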

To use the Twitter API from Python, we will use the tweepy library, which facilitates access to the API.

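To install it, run the following command in your terminal or execute the cell below. The examples in this section use the Tweepy 3.x interface (tweepy.StreamListener was removed in Tweepy 4), so the command pins the version accordingly:

pip install "tweepy<4"

In [ ]:
# this will install tweepy 3.x on your machine
!pip install "tweepy<4"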

Create a Twitter app and find your consumer key and secret

  1. Go to https://apps.twitter.com/
  2. Click Create New App
  3. Fill in the details
  4. Click on manage keys and access tokens
  5. Click create my access token
  6. Copy and paste your Consumer Key (API Key), Consumer Secret (API Secret), Access Token and Access Token Secret below:

In [ ]:
consumer_key = 'xxx'
consumer_secret = 'xxx'
access_token = 'xxx'
access_token_secret = 'xxx'
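
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the four values were exported as environment variables beforehand. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_credentials are hypothetical choices for this illustration, not names that Twitter or Tweepy require:

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

With the variables set, creds = load_credentials() followed by consumer_key = creds['consumer_key'] (and likewise for the other three values) could replace the 'xxx' placeholders.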

Authenticate with the Twitter API


In [ ]:
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# create the api object that we will use to interact with Twitter
api = tweepy.API(auth)

In [ ]:
# example: post a tweet (status update) from your account
# note that this really publishes the tweet
tweet = api.update_status('Hello Twitter')

In [ ]:
# see all the information contained in a tweet:
print(tweet)
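
The printed object is long. A small sketch of pulling out a few commonly used fields, assuming the tweet is available as a plain dictionary (with Tweepy 3.x the raw dictionary is typically reachable as tweet._json). The sample values below are made up for illustration, and the helper summarize_tweet is hypothetical:

```python
# Made-up sample shaped like a tweet object; real tweets carry many more fields.
sample_tweet = {
    "created_at": "Mon Jan 01 12:00:00 +0000 2018",
    "id_str": "1234567890",
    "text": "Hello Twitter",
    "lang": "en",
    "user": {"screen_name": "example_user", "followers_count": 42},
}


def summarize_tweet(tweet):
    """Keep only a handful of fields from a tweet dictionary."""
    user = tweet.get("user", {})
    return {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "author": user.get("screen_name"),
        "text": tweet.get("text"),
    }


print(summarize_tweet(sample_tweet))
```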

Collecting tweets from the Streaming API

Source: http://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html

Step 1: Creating a StreamListener

This simple stream listener prints the text of each status. The on_data method of Tweepy’s StreamListener conveniently passes data from statuses to the on_status method, so we only need to create a class MyStreamListener that inherits from StreamListener and overrides on_status:


In [ ]:
# override tweepy.StreamListener to print the tweet text when new data arrives
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)
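
To make the hand-off from on_data to on_status concrete, here is a simplified, Tweepy-free sketch of the idea. It is not Tweepy's source code: the classes SketchListener and PrintingListener, the SimpleNamespace stand-in for a status object, and the rule used to recognise a tweet are all assumptions made for illustration.

```python
import json
from types import SimpleNamespace


class SketchListener:
    """Toy listener: decodes raw JSON and forwards tweets to on_status."""

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        # Treat any message with a "text" field as a tweet (an assumption of
        # this sketch); other messages, such as limit notices, are ignored.
        if "text" in data:
            return self.on_status(SimpleNamespace(**data))
        return True

    def on_status(self, status):
        return True


class PrintingListener(SketchListener):
    """Overrides on_status, in the same way as MyStreamListener."""

    def __init__(self):
        self.seen = []

    def on_status(self, status):
        self.seen.append(status.text)
        print(status.text)
        return True


listener = PrintingListener()
listener.on_data('{"text": "hello from the stream"}')
listener.on_data('{"limit": {"track": 5}}')  # not a tweet, so nothing is printed
```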

Step 2: Creating a Stream

Using the api object we created and our StreamListener, we can create a Stream object:


In [ ]:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

Step 3: Starting a Stream

A number of Twitter streams are available through Tweepy. Most cases will use filter, the user_stream, or the sitestream. For more information on the capabilities and limitations of the different streams, see the Twitter Streaming API documentation.

In this example we will use filter to stream all tweets containing the words new york. The track parameter is a list of search terms to stream.


In [ ]:
myStream.filter(track=['new york'])

In [ ]:
myStream.disconnect()

In [ ]:
# several search terms (a tweet matches if it contains any of them), English tweets only
myStream.filter(track=['realdonaldtrump', 'trump'], languages=['en'])

In [ ]:
myStream.disconnect()
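
How the track terms are matched is easy to get wrong. According to Twitter's streaming documentation, the phrases in the list are combined with OR, while the words inside a single phrase are combined with AND, ignoring case and word order. A rough local approximation, assuming plain whitespace tokenization (the real matching also looks at hashtags, mentions and URLs, and handles punctuation); the helper matches_track is hypothetical:

```python
def matches_track(text, track):
    """Approximate the track rule: a phrase matches if all its words appear."""
    words = set(text.lower().split())
    for phrase in track:
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False


print(matches_track("I love New York in the fall", ["new york"]))       # True
print(matches_track("York is older than the new town", ["new york"]))   # True: word order is ignored
print(matches_track("A new day", ["new york"]))                         # False: "york" is missing
print(matches_track("Learning python today", ["new york", "python"]))   # True: phrases are OR-ed
```

So track=['new york'] returns tweets containing both words anywhere in the text, not only the exact phrase.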

In [ ]:
# streaming tweets from a given location
# we need to provide a list of longitude,latitude pairs specifying a set of bounding boxes,
# with the south-west corner of each box first, followed by its north-east corner
# for example, for New York:
myStream.filter(locations=[-74,40,-73,41])

In [ ]:
myStream.disconnect()
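
The order of the four numbers matters: each box is given as its south-west corner followed by its north-east corner, with longitude before latitude. A small sketch that builds and checks such a box; bounding_box and in_box are hypothetical helpers, and the New York coordinates are the rough ones from the cell above:

```python
def bounding_box(west, south, east, north):
    """Return [west, south, east, north], the order expected by locations."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    # Simplification: boxes crossing the 180 degree meridian are not handled.
    if west >= east or south >= north:
        raise ValueError("south-west corner must come before north-east corner")
    return [west, south, east, north]


def in_box(longitude, latitude, box):
    """Check whether a point falls inside a [west, south, east, north] box."""
    west, south, east, north = box
    return west <= longitude <= east and south <= latitude <= north


new_york = bounding_box(-74, 40, -73, 41)
print(new_york)                          # [-74, 40, -73, 41]
print(in_box(-73.98, 40.75, new_york))   # True (midtown Manhattan)
print(in_box(-0.12, 51.5, new_york))     # False (London)
```

Several boxes can be streamed at once by concatenating their lists into a single flat list before passing it as locations.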

Saving the stream to a file

Let's define a new StreamListener that will save the collected data to a file:


In [ ]:
# override tweepy.StreamListener to make it save data to a file
class StreamSaver(tweepy.StreamListener):
    def __init__(self, filename, max_num_tweets=2000, api=None):
        self.filename = filename
        self.num_tweets = 0
        self.max_num_tweets = max_num_tweets

        tweepy.StreamListener.__init__(self, api=api)

    def on_data(self, data):
        # append the raw JSON string to the file
        with open(self.filename, 'a') as tf:
            tf.write(data)

        self.num_tweets += 1
        print(self.num_tweets)

        # returning False stops the stream once enough messages were collected
        if self.num_tweets >= self.max_num_tweets:
            return False
        return True

    def on_error(self, status):
        # print the HTTP status code returned by Twitter
        print(status)
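
Printing the error code is enough for a first experiment, but reconnecting repeatedly after an HTTP 420 (rate limiting) makes the enforced wait longer, so the usual advice is to back off exponentially between attempts. A minimal sketch of such a schedule; the helper backoff_delays is hypothetical, and the 60 second starting delay, the doubling factor and the cap are assumptions chosen for illustration rather than values taken from this tutorial:

```python
def backoff_delays(attempts, first_delay=60, factor=2, max_delay=960):
    """Return the wait (in seconds) before each reconnection attempt."""
    delays = []
    delay = first_delay
    for _ in range(attempts):
        delays.append(min(delay, max_delay))  # never wait longer than the cap
        delay *= factor
    return delays


print(backoff_delays(5))  # [60, 120, 240, 480, 960]
```

In a listener, on_error could return False when status equals 420 to disconnect the stream, after which the script would sleep for the next delay before reconnecting.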

In [ ]:
# create the new StreamListener and stream object that will save collected tweets to a file
saveStream = StreamSaver(filename='testTweets.txt')
mySaveStream = tweepy.Stream(auth = api.auth, listener=saveStream)

In [ ]:
mySaveStream.filter(track=['realdonaldtrump', 'trump'], languages=['en'])

In [ ]:
mySaveStream.disconnect()
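
Once the stream has been saved, the file can be read back for analysis. A minimal sketch, assuming each JSON record ended up on its own line of the file; the helper read_tweets is hypothetical, and treating only records with a "text" field as tweets is an assumption of this sketch:

```python
import json


def read_tweets(filename):
    """Yield one dictionary per tweet stored in a line-delimited JSON file."""
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a record that was cut off mid-write
            # keep only messages that look like tweets
            if isinstance(record, dict) and "text" in record:
                yield record
```

For example, texts = [t["text"] for t in read_tweets("testTweets.txt")] would collect the text of every saved tweet into a list.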