The Twitter API

This tutorial presents an overview of how to use the Python programming language to interact with the Twitter API, both for acquiring data and for posting it. We're using the Twitter API because it's useful in its own right but also presents an interesting "case study" of how to work with APIs offered by commercial entities and social media sites.

About the API

The Twitter API allows you to perform programmatically many of the same actions available in the Twitter app and on the Twitter website, such as searching for tweets, following users, reading your timeline, and posting tweets and direct messages, though some parts of the Twitter user experience, like polls, are (as of this writing) unavailable through the API. You can use the API to do things like collect data from tweets and write automated agents that post to Twitter.

In particular, we're going to be talking about Twitter's REST API. ("REST" stands for Representational State Transfer, a popular style of API design.) For the kind of work we'll be doing, the streaming API is also worth a look, but it's left as an exercise for the reader.

Authorization

All requests to the REST API—making a post, running a search, following or unfollowing an account—must be made on behalf of a user. Your code must be authorized to submit requests to the API on that user's behalf. A user can authorize code to act on their behalf through the medium of something called an application. You can see which applications you've authorized by logging into Twitter and following this link.

When making requests to the Twitter API, you don't use your username and password. Instead, you use four unique identifiers: the Consumer (Application) Key, the Consumer (Application) Secret, an Access Token, and a Token Secret. The Consumer Key/Secret identify the application, and the Access Token/Token Secret identify a particular account as having access through a particular application. You don't choose these values; they're strings of random characters generated automatically by Twitter. Together, these four strings act as a sort of "password" for the Twitter API.

In order to obtain these four magical strings, we need to...

  • Create a Twitter "application," which is associated with a Twitter account; the "API Key" and "API Secret" are created with this application;
  • Create an access token and token secret;
  • Copy all four strings to use when creating Python programs that access Twitter.

This site has a good overview of the steps you need to perform in order to create a Twitter application. I'll demonstrate the process in class. You'll need to have already signed up for a Twitter account!

When you're done creating your application and generating its access token, assign the four strings to the variables below:


In [5]:
api_key = ""
api_secret = ""
access_token = ""
token_secret = ""

Working with Twython

Twitter's API operates over HTTP and returns JSON objects, so technically you could use the requests library (or any other HTTP client) to make requests to and receive responses from the API. However, the Twitter API uses an authentication process called OAuth, which requires generating cryptographic signatures for requests in order to ensure their security. This process is fiddly and not worth implementing from scratch, so most programmers who use the Twitter API do so through a third-party library. These libraries wrap up and abstract away the particulars of OAuth authentication so that programmers don't have to worry about them. As a happy side effect, the libraries also provide abstractions for the API calls themselves, which makes the API slightly easier to use: you can call methods with parameters instead of constructing URLs in your code "by hand".
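
Just to illustrate what these libraries handle for us, here's roughly what a "by hand" request might look like using requests together with requests-oauthlib (the signing library that Twython itself depends on). This is only a sketch for comparison, assuming the four credential variables above have been filled in; you don't need to run it.


In [ ]:
import requests
from requests_oauthlib import OAuth1

# OAuth1 signs each request with our four credentials
# (illustration only; the rest of this tutorial uses Twython instead)
auth = OAuth1(api_key, api_secret, access_token, token_secret)
resp = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                    params={"q": "data journalism", "count": 2},
                    auth=auth)
resp.json()['statuses'][0]['text']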

There are a number of different libraries for accessing the Twitter API. We're going to use one called Twython. You can install Twython with pip:


In [2]:
!pip3 install twython


Collecting twython
  Downloading twython-3.4.0.tar.gz
Requirement already satisfied (use --upgrade to upgrade): requests>=2.1.0 in /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages (from twython)
Collecting requests-oauthlib>=0.4.0 (from twython)
  Downloading requests_oauthlib-0.6.1-py2.py3-none-any.whl
Collecting oauthlib>=0.6.2 (from requests-oauthlib>=0.4.0->twython)
  Downloading oauthlib-1.1.2.tar.gz (111kB)
    100% |████████████████████████████████| 114kB 2.0MB/s 
Installing collected packages: oauthlib, requests-oauthlib, twython
  Running setup.py install for oauthlib
  Running setup.py install for twython
Successfully installed oauthlib-1.1.2 requests-oauthlib-0.6.1 twython-3.4.0
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Searching Twitter

Here's our first example Twython snippet, which uses the search resource to find tweets that match a particular search term.


In [80]:
import twython

# create a Twython object by passing the necessary secret passwords
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)

response = twitter.search(q="data journalism", result_type="recent", count=20)
[r['text'] for r in response['statuses']]


Out[80]:
['#BigData Triggers #Predictive #Journalism https://t.co/Jyjno5Rcvy #journalists',
 'Data Journalism: A Simple Guide for Marketers https://t.co/w9uHCXtAZ9 https://t.co/DMImBSLTfE',
 'The latest Data Journalism! https://t.co/moNhOKnSPs Thanks to @TheInsightBee @vikesverkko #journalism #datajournalism',
 "Biggest leak in the history of data journalism just went live, and it's about Cards Against Humanity.",
 'Newspapers And Data: Good For Outreach, Bad For Journalism? https://t.co/SkhxpPt4q1 #DigiAsia #DigitalPublishing https://t.co/leYrvl1XUS',
 'RT @dinocitraro: This Digital Humanities + Data Journalism Symposium looks fantastic: https://t.co/uKVElmNdhW More info here: https://t.co/…',
 'Data Journalism Awards celebrate evidence-based questioning in our society https://t.co/jToiOpgKxq @euroscientist https://t.co/YIIEf3vKd5',
 'RT @josephwillits: Red dots of bloodshed & death. Harrowing map of every car bomb in Baghdad since 2003 https://t.co/t8TcgWqMI5 https://t.c…',
 'RT @a_m_papagiotis: How one Mexican data team uncovered the story of 4,000 missing women https://t.co/vwALimBe7q via @wordpressdotcom',
 'Data Generation Gap: Younger IT Workers Believe The Hype - InformationWeek https://t.co/E7AaFEj123 https://t.co/DEVrvZwjtZ',
 'Journalism in the Age of Data: A Video Report on Data Visualization by Geoff McGhee... https://t.co/nEiOoaSXs3',
 'Data Journalism: A Simple Guide for Marketers https://t.co/5Ju2NpNrCd',
 'Data Journalism: A Simple Guide for Marketers https://t.co/dUDT2qEWWV',
 'RT @GrowthMob: Data Journalism: A Simple Guide for Marketers https://t.co/jOtB1WCDgX via @hubspot @DholakiyaPratik #Startup #Marketing #Gro…',
 'RT @GrowthMob: Data Journalism: A Simple Guide for Marketers https://t.co/jOtB1WCDgX via @hubspot @DholakiyaPratik #Startup #Marketing #Gro…',
 'RT @GrowthMob: Data Journalism: A Simple Guide for Marketers https://t.co/jOtB1WCDgX via @hubspot @DholakiyaPratik #Startup #Marketing #Gro…',
 'Data Journalism: A Simple Guide for Marketers https://t.co/jOtB1WCDgX via @hubspot @DholakiyaPratik #Startup #Marketing #GrowthHacking',
 'RT @HamdiCheb: @NassiraELM https://t.co/RRBfbS6ZXE',
 'RT @a_m_papagiotis: How one Mexican data team uncovered the story of 4,000 missing women https://t.co/vwALimBe7q via @wordpressdotcom',
 'How one Mexican data team uncovered the story of 4,000 missing women https://t.co/vwALimBe7q via @wordpressdotcom']

The .search() method performs a Twitter search, just as though you'd gone to the Twitter search page and typed in your query. The method returns a JSON object, which Twython converts to a dictionary for us. This dictionary contains a number of items; importantly, the value for the key statuses is a list of tweets that match the search term. Let's look at the underlying response in detail. I'm going to run the search again, this time asking for only two results instead of twenty:


In [11]:
response = twitter.search(q="data journalism", result_type="recent", count=2)
response


Out[11]:
{'search_metadata': {'completed_in': 0.05,
  'count': 2,
  'max_id': 750060071153958912,
  'max_id_str': '750060071153958912',
  'next_results': '?max_id=750057453123960831&q=data%20journalism&count=2&include_entities=1&result_type=recent',
  'query': 'data+journalism',
  'refresh_url': '?since_id=750060071153958912&q=data%20journalism&result_type=recent&include_entities=1',
  'since_id': 0,
  'since_id_str': '0'},
 'statuses': [{'contributors': None,
   'coordinates': None,
   'created_at': 'Mon Jul 04 20:13:51 +0000 2016',
   'entities': {'hashtags': [],
    'media': [{'display_url': 'pic.twitter.com/V44EbUvvVe',
      'expanded_url': 'http://twitter.com/OrenKessler/status/749729760293511168/photo/1',
      'id': 749729753117044736,
      'id_str': '749729753117044736',
      'indices': [111, 134],
      'media_url': 'http://pbs.twimg.com/media/CmeTPPwVMAA8Tgn.jpg',
      'media_url_https': 'https://pbs.twimg.com/media/CmeTPPwVMAA8Tgn.jpg',
      'sizes': {'large': {'h': 1334, 'resize': 'fit', 'w': 750},
       'medium': {'h': 1200, 'resize': 'fit', 'w': 675},
       'small': {'h': 680, 'resize': 'fit', 'w': 382},
       'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
      'source_status_id': 749729760293511168,
      'source_status_id_str': '749729760293511168',
      'source_user_id': 18101696,
      'source_user_id_str': '18101696',
      'type': 'photo',
      'url': 'https://t.co/V44EbUvvVe'}],
    'symbols': [],
    'urls': [{'display_url': 'theguardian.com/news/datablog/…',
      'expanded_url': 'https://www.theguardian.com/news/datablog/2010/oct/23/wikileaks-iraq-data-journalism',
      'indices': [87, 110],
      'url': 'https://t.co/lr3wD1YVGe'}],
    'user_mentions': [{'id': 18101696,
      'id_str': '18101696',
      'indices': [3, 15],
      'name': 'Oren Kessler',
      'screen_name': 'OrenKessler'}]},
   'favorite_count': 0,
   'favorited': False,
   'geo': None,
   'id': 750060071153958912,
   'id_str': '750060071153958912',
   'in_reply_to_screen_name': None,
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'is_quote_status': False,
   'lang': 'en',
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'place': None,
   'possibly_sensitive': False,
   'retweet_count': 33,
   'retweeted': False,
   'retweeted_status': {'contributors': None,
    'coordinates': None,
    'created_at': 'Sun Jul 03 22:21:18 +0000 2016',
    'entities': {'hashtags': [],
     'media': [{'display_url': 'pic.twitter.com/V44EbUvvVe',
       'expanded_url': 'http://twitter.com/OrenKessler/status/749729760293511168/photo/1',
       'id': 749729753117044736,
       'id_str': '749729753117044736',
       'indices': [94, 117],
       'media_url': 'http://pbs.twimg.com/media/CmeTPPwVMAA8Tgn.jpg',
       'media_url_https': 'https://pbs.twimg.com/media/CmeTPPwVMAA8Tgn.jpg',
       'sizes': {'large': {'h': 1334, 'resize': 'fit', 'w': 750},
        'medium': {'h': 1200, 'resize': 'fit', 'w': 675},
        'small': {'h': 680, 'resize': 'fit', 'w': 382},
        'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
       'type': 'photo',
       'url': 'https://t.co/V44EbUvvVe'}],
     'symbols': [],
     'urls': [{'display_url': 'theguardian.com/news/datablog/…',
       'expanded_url': 'https://www.theguardian.com/news/datablog/2010/oct/23/wikileaks-iraq-data-journalism',
       'indices': [70, 93],
       'url': 'https://t.co/lr3wD1YVGe'}],
     'user_mentions': []},
    'favorite_count': 14,
    'favorited': False,
    'geo': None,
    'id': 749729760293511168,
    'id_str': '749729760293511168',
    'in_reply_to_screen_name': None,
    'in_reply_to_status_id': None,
    'in_reply_to_status_id_str': None,
    'in_reply_to_user_id': None,
    'in_reply_to_user_id_str': None,
    'is_quote_status': False,
    'lang': 'en',
    'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
    'place': {'attributes': {},
     'bounding_box': {'coordinates': [[[-77.119401, 38.801826],
        [-76.909396, 38.801826],
        [-76.909396, 38.9953797],
        [-77.119401, 38.9953797]]],
      'type': 'Polygon'},
     'contained_within': [],
     'country': 'United States',
     'country_code': 'US',
     'full_name': 'Washington, DC',
     'id': '01fbe706f872cb32',
     'name': 'Washington',
     'place_type': 'city',
     'url': 'https://api.twitter.com/1.1/geo/id/01fbe706f872cb32.json'},
    'possibly_sensitive': False,
    'retweet_count': 33,
    'retweeted': False,
    'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    'text': 'These maps showing every attack in Baghdad since 2003 are just insane https://t.co/lr3wD1YVGe https://t.co/V44EbUvvVe',
    'truncated': False,
    'user': {'contributors_enabled': False,
     'created_at': 'Sat Dec 13 17:41:53 +0000 2008',
     'default_profile': False,
     'default_profile_image': False,
     'description': 'Deputy Director for Research and Research Fellow at Foundation for Defense of Democracies. Lots of Egypt. RTs may be the opposite of endorsements.',
     'entities': {'description': {'urls': []},
      'url': {'urls': [{'display_url': 'defenddemocracy.org/about-fdd/team…',
         'expanded_url': 'http://www.defenddemocracy.org/about-fdd/team-overview/oren-kessler/',
         'indices': [0, 23],
         'url': 'https://t.co/pncCL0oUws'}]}},
     'favourites_count': 4867,
     'follow_request_sent': False,
     'followers_count': 11927,
     'following': False,
     'friends_count': 5112,
     'geo_enabled': True,
     'has_extended_profile': False,
     'id': 18101696,
     'id_str': '18101696',
     'is_translation_enabled': False,
     'is_translator': False,
     'lang': 'en',
     'listed_count': 486,
     'location': 'Washington DC',
     'name': 'Oren Kessler',
     'notifications': False,
     'profile_background_color': 'C0DEED',
     'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
     'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
     'profile_background_tile': False,
     'profile_banner_url': 'https://pbs.twimg.com/profile_banners/18101696/1459049086',
     'profile_image_url': 'http://pbs.twimg.com/profile_images/713926571204218880/UY_SCAqx_normal.jpg',
     'profile_image_url_https': 'https://pbs.twimg.com/profile_images/713926571204218880/UY_SCAqx_normal.jpg',
     'profile_link_color': '1B95E0',
     'profile_sidebar_border_color': 'C0DEED',
     'profile_sidebar_fill_color': 'DDEEF6',
     'profile_text_color': '333333',
     'profile_use_background_image': True,
     'protected': False,
     'screen_name': 'OrenKessler',
     'statuses_count': 37486,
     'time_zone': 'Eastern Time (US & Canada)',
     'url': 'https://t.co/pncCL0oUws',
     'utc_offset': -14400,
     'verified': False}},
   'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
   'text': 'RT @OrenKessler: These maps showing every attack in Baghdad since 2003 are just insane https://t.co/lr3wD1YVGe https://t.co/V44EbUvvVe',
   'truncated': False,
   'user': {'contributors_enabled': False,
    'created_at': 'Thu Oct 28 16:29:17 +0000 2010',
    'default_profile': True,
    'default_profile_image': False,
    'description': 'Business communications pro, frmr #potus speechwriter & defender of free enterprise. Dad w/daughters. Happiest when I run more than I drive.',
    'entities': {'description': {'urls': []}},
    'favourites_count': 500,
    'follow_request_sent': False,
    'followers_count': 506,
    'following': False,
    'friends_count': 525,
    'geo_enabled': False,
    'has_extended_profile': False,
    'id': 209153772,
    'id_str': '209153772',
    'is_translation_enabled': False,
    'is_translator': False,
    'lang': 'en',
    'listed_count': 15,
    'location': 'DC/MD',
    'name': 'Noam Neusner',
    'notifications': False,
    'profile_background_color': 'C0DEED',
    'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
    'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
    'profile_background_tile': False,
    'profile_banner_url': 'https://pbs.twimg.com/profile_banners/209153772/1353293142',
    'profile_image_url': 'http://pbs.twimg.com/profile_images/2987636674/0dfa99cd800590566ae8c1ab3ca34d2e_normal.jpeg',
    'profile_image_url_https': 'https://pbs.twimg.com/profile_images/2987636674/0dfa99cd800590566ae8c1ab3ca34d2e_normal.jpeg',
    'profile_link_color': '0084B4',
    'profile_sidebar_border_color': 'C0DEED',
    'profile_sidebar_fill_color': 'DDEEF6',
    'profile_text_color': '333333',
    'profile_use_background_image': True,
    'protected': False,
    'screen_name': 'NoamNeusner',
    'statuses_count': 3208,
    'time_zone': 'Eastern Time (US & Canada)',
    'url': None,
    'utc_offset': -14400,
    'verified': False}},
  {'contributors': None,
   'coordinates': None,
   'created_at': 'Mon Jul 04 20:03:26 +0000 2016',
   'entities': {'hashtags': [{'indices': [98, 106], 'text': 'twitter'},
     {'indices': [107, 116], 'text': 'opendata'}],
    'symbols': [],
    'urls': [{'display_url': 'paper.li/postoditacco/1…',
      'expanded_url': 'http://paper.li/postoditacco/1381694616?edition_id=5d93afe0-4222-11e6-bfc9-0cc47a0d1609',
      'indices': [34, 57],
      'url': 'https://t.co/dkM1nYl3aQ'}],
    'user_mentions': [{'id': 239362738,
      'id_str': '239362738',
      'indices': [68, 79],
      'name': 'numeroteca',
      'screen_name': 'numeroteca'},
     {'id': 2234881,
      'id_str': '2234881',
      'indices': [80, 87],
      'name': 'Mindy McAdams',
      'screen_name': 'macloo'},
     {'id': 16911276,
      'id_str': '16911276',
      'indices': [88, 97],
      'name': 'Timetric',
      'screen_name': 'timetric'}]},
   'favorite_count': 0,
   'favorited': False,
   'geo': None,
   'id': 750057453123960832,
   'id_str': '750057453123960832',
   'in_reply_to_screen_name': None,
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'is_quote_status': False,
   'lang': 'en',
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'place': None,
   'possibly_sensitive': False,
   'retweet_count': 0,
   'retweeted': False,
   'source': '<a href="http://paper.li" rel="nofollow">Paper.li</a>',
   'text': 'The latest Data Journalism Daily! https://t.co/dkM1nYl3aQ Thanks to @numeroteca @macloo @timetric #twitter #opendata',
   'truncated': False,
   'user': {'contributors_enabled': False,
    'created_at': 'Wed Nov 07 00:44:46 +0000 2007',
    'default_profile': False,
    'default_profile_image': False,
    'description': 'Webaholic. Fact-checker. Storyteller. Data lover. #socialmedia & #communication inside (tm).\nAlso working on BI and ADV for an international publisher',
    'entities': {'description': {'urls': []},
     'url': {'urls': [{'display_url': 'myweb20.it',
        'expanded_url': 'http://www.myweb20.it',
        'indices': [0, 22],
        'url': 'http://t.co/8mvOl9RvlK'}]}},
    'favourites_count': 2648,
    'follow_request_sent': False,
    'followers_count': 2515,
    'following': False,
    'friends_count': 1942,
    'geo_enabled': True,
    'has_extended_profile': False,
    'id': 10016342,
    'id_str': '10016342',
    'is_translation_enabled': False,
    'is_translator': False,
    'lang': 'it',
    'listed_count': 304,
    'location': 'Damascus, Syria',
    'name': 'Roberto \(^o^)/',
    'notifications': False,
    'profile_background_color': 'FFFFFF',
    'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/771220795/d8f23f4028338de980af394548ba6953.jpeg',
    'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/771220795/d8f23f4028338de980af394548ba6953.jpeg',
    'profile_background_tile': True,
    'profile_banner_url': 'https://pbs.twimg.com/profile_banners/10016342/1405752496',
    'profile_image_url': 'http://pbs.twimg.com/profile_images/378800000703172715/8d3d40317dc63a86093683cfd650aa9e_normal.jpeg',
    'profile_image_url_https': 'https://pbs.twimg.com/profile_images/378800000703172715/8d3d40317dc63a86093683cfd650aa9e_normal.jpeg',
    'profile_link_color': '1146E8',
    'profile_sidebar_border_color': 'FFFFFF',
    'profile_sidebar_fill_color': 'FFFFFF',
    'profile_text_color': '342135',
    'profile_use_background_image': True,
    'protected': False,
    'screen_name': 'postoditacco',
    'statuses_count': 28727,
    'time_zone': 'Rome',
    'url': 'http://t.co/8mvOl9RvlK',
    'utc_offset': 7200,
    'verified': False}}]}

As you can see, there's a lot of stuff in here, even for just two tweets. In the top-level dictionary, there's a key search_metadata whose value is a dictionary with, well, metadata about the search: how many results it returned, what the query was, and what URL to use to get the next page of results. The value for the statuses key is a list of dictionaries, each of which contains information about one matching tweet. Tweets are limited to 140 characters, but they carry much more than 140 characters of metadata. Twitter has a good guide to what each of the fields means here, but these are the most interesting key/value pairs from our perspective (a short example of pulling them out of a tweet follows the list):

  • id_str: the unique ID of the tweet (as a string)
  • in_reply_to_status_id_str: the ID of the tweet that this tweet is a reply to, if applicable
  • retweet_count: number of times this tweet has been retweeted
  • retweeted_status: the tweet that this tweet is a retweet of, if applicable
  • favorite_count: the number of times that this tweet has been favorited
  • text: the actual text of the tweet
  • user: a dictionary with information on the user who wrote the tweet, including the screen_name key which has the Twitter screen name of the user
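
Here's a quick sketch that pulls a few of these fields out of each tweet in the search response above (assuming the response variable from the two-result search is still around):


In [ ]:
for tweet in response['statuses']:
    print(tweet['user']['screen_name'], "tweeted:", tweet['text'])
    print("  retweets:", tweet['retweet_count'], "| favorites:", tweet['favorite_count'])
    if tweet['in_reply_to_status_id_str'] is not None:
        print("  in reply to tweet", tweet['in_reply_to_status_id_str'])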

NOTE: You can do much more with the query than just search for raw strings. The "Query operators" section on this page shows the different bits of syntax you can use to make your query more expressive.
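
For instance, here's a quick sketch that combines exact-phrase matching with the -filter:retweets operator (which we'll use again later) to leave retweets out of the results:


In [ ]:
results = twitter.search(q='"data journalism" -filter:retweets',
                         result_type="recent",
                         count=5)
[r['text'] for r in results['statuses']]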

You can form the URL of a particular tweet by combining the tweet's ID and the user's screen name using the following function. (This is helpful if you want to view the tweet in a web browser.)


In [19]:
def tweet_url(tweet):
    return 'https://twitter.com/' + \
        tweet['user']['screen_name'] + \
        "/statuses/" + \
        tweet['id_str']

In [20]:
tweet_url(response['statuses'][1])


Out[20]:
'https://twitter.com/postoditacco/statuses/750057453123960832'

Twython, the REST API, and method parameters

In general, Twython has one method for every "endpoint" in the Twitter REST API. Usually the Twython method has a name that resembles or is identical to the corresponding URL in the REST API. The Twython API documentation lists the available methods and which parts of the Twitter API they map to.
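
For example, the .search() method we just used corresponds to the search/tweets endpoint, and .get_user_timeline() (which we'll use below) corresponds to statuses/user_timeline. Another one, if I'm reading the Twython documentation correctly, is .show_user(), which maps to the users/show endpoint; here's a small sketch:


In [ ]:
# look up profile information for a single account (the users/show endpoint)
user = twitter.show_user(screen_name='columbiajourn')
user['name'], user['followers_count']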

To become more familiar with this, let's dig a bit deeper into search. Twython's .search() method takes a number of different parameters, which match up with the query string parameters of the REST API's search/tweets endpoint as documented here. Every parameter that can be specified on the query string in a REST API call can also be passed as a named parameter to Twython's .search() method. The preceding code already demonstrates this:


In [22]:
response = twitter.search(q="data journalism", result_type="recent", count=2)

This call to .search() includes the parameters q (which specifies the search query), result_type (which can be set to popular, recent, or mixed, depending on how you want results to be selected), and count (which specifies how many tweets you want returned in the response, with an upper limit of 100). Looking at the documentation, there's another interesting parameter we could play with: geocode, which restricts the search to tweets posted within a given radius of a particular latitude/longitude. Let's use it to find the screen names of people tweeting about data journalism within a few miles of Columbia University:


In [36]:
response = twitter.search(q="data journalism",
                          result_type="recent",
                          count=100,
                          geocode="40.807511,-73.963265,4mi")
for resp in response['statuses']:
    print(resp['user']['screen_name'])


Sash_Marguerite
RealRakhmetov
pminozzo
hscott61
DickensOlewe
cunyjschool

Getting a user's timeline

The Twitter API provides an endpoint for fetching the tweets of a particular user. The endpoint in the API for this is statuses/user_timeline and the function in Twython is .get_user_timeline(). This function looks a lot like .search() in the shape of its response. Let's try it out, fetching the last few tweets of the Twitter account of the Columbia Journalism School.


In [44]:
response = twitter.get_user_timeline(screen_name='columbiajourn',
                                     count=20,
                                     include_rts=False,
                                     exclude_replies=True)
for item in response:
    print(item['text'])


Final hour of giving for the Annual Fund: https://t.co/txFzXZ7rHI
You're on deadline for the Annual Fund, 14 hours (EST) remain: https://t.co/txFzXZ7rHI https://t.co/PVbCl5ve9E
It's not too late to contribute to the future of journalism through the Annual Fund. https://t.co/txFzXZ7rHI https://t.co/YsHB34WQpz
#cjs59 Paul Zimmerman, aka "Dr. Z," writes about his Jschool experience in this memoir excerpt: https://t.co/d1PT4m0f36 @SInow
For #WVAfloods coverage, see @NBCNews #cjs12 @MorganRadford https://t.co/nKEtr8KsXn
Congrats to @MagnumPhotos Nominee &amp; #cjs10 @dianamarkosian. Her work explores her Armenian heritage &amp; memory https://t.co/Rug8kUh0TC
2/2 #canalampliado included grad contributors 
Matteo Lonardi, @aglorios, @SarahLJorgensen &amp; @simko_bednarski  
https://t.co/uVXWsQ3z28
1/2 Adj Prof Walt Bogdanich, #cjs15 @jacwil28 &amp; @agMendezPty co-authored @nytimes article on #canalampliado
 https://t.co/uVXWsQ3z28

The screen_name parameter specifies whose timeline we want; the count parameter specifies how many tweets we want from that account. The include_rts and exclude_replies parameters control whether or not particular kinds of tweets are included in the response: setting include_rts to False means we don't see any retweets in our results, while the exclude_replies parameter set to True means we don't see any tweets that are replies to other users. (According to the API documentation, "Using exclude_replies with the count parameter will mean you will receive up-to count tweets — this is because the count parameter retrieves that many tweets before filtering out retweets and replies," which is why asking for 20 tweets doesn't necessarily return 20 tweets in this case.)
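
One workaround, sketched below, is to ask for more tweets than you actually need and trim the list yourself afterwards. (If I'm reading the documentation correctly, count maxes out at 200 per request for this endpoint.)


In [ ]:
timeline = twitter.get_user_timeline(screen_name='columbiajourn',
                                     count=200,  # ask for many more than we need; 200 is the documented cap
                                     include_rts=False,
                                     exclude_replies=True)
recent_twenty = timeline[:20]  # keep just the most recent twenty after filtering
len(recent_twenty)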

Note that the .get_user_timeline() function doesn't return a dictionary with a key whose value is a list of tweets, the way the .search() function does. Instead, it simply returns a list of tweets:


In [46]:
response = twitter.get_user_timeline(screen_name='columbiajourn', count=1)
response


Out[46]:
[{'contributors': None,
  'coordinates': None,
  'created_at': 'Sat Jul 02 12:46:34 +0000 2016',
  'entities': {'hashtags': [{'indices': [43, 50], 'text': 'serial'}],
   'symbols': [],
   'urls': [{'display_url': 'twitter.com/vanityfair/sta…',
     'expanded_url': 'https://twitter.com/vanityfair/status/749196140084690944',
     'indices': [139, 140],
     'url': 'https://t.co/tf8e3J3mVS'}],
   'user_mentions': [{'id': 295652602,
     'id_str': '295652602',
     'indices': [3, 14],
     'name': 'Mirta Ojito',
     'screen_name': 'MirtaOjito'},
    {'id': 16369150,
     'id_str': '16369150',
     'indices': [85, 99],
     'name': 'Columbia Journalism',
     'screen_name': 'columbiajourn'},
    {'id': 15704275,
     'id_str': '15704275',
     'indices': [110, 122],
     'name': 'danachivvis',
     'screen_name': 'danachivvis'}]},
  'favorite_count': 0,
  'favorited': False,
  'geo': None,
  'id': 749222732689092608,
  'id_str': '749222732689092608',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'is_quote_status': True,
  'lang': 'en',
  'place': None,
  'possibly_sensitive': False,
  'quoted_status_id': 749196140084690944,
  'quoted_status_id_str': '749196140084690944',
  'retweet_count': 2,
  'retweeted': False,
  'retweeted_status': {'contributors': None,
   'coordinates': None,
   'created_at': 'Sat Jul 02 12:24:01 +0000 2016',
   'entities': {'hashtags': [{'indices': [27, 34], 'text': 'serial'}],
    'symbols': [],
    'urls': [{'display_url': 'twitter.com/vanityfair/sta…',
      'expanded_url': 'https://twitter.com/vanityfair/status/749196140084690944',
      'indices': [117, 140],
      'url': 'https://t.co/tf8e3J3mVS'}],
    'user_mentions': [{'id': 16369150,
      'id_str': '16369150',
      'indices': [69, 83],
      'name': 'Columbia Journalism',
      'screen_name': 'columbiajourn'},
     {'id': 15704275,
      'id_str': '15704275',
      'indices': [94, 106],
      'name': 'danachivvis',
      'screen_name': 'danachivvis'}]},
   'favorite_count': 7,
   'favorited': False,
   'geo': None,
   'id': 749217061264560128,
   'id_str': '749217061264560128',
   'in_reply_to_screen_name': None,
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'is_quote_status': True,
   'lang': 'en',
   'place': None,
   'possibly_sensitive': False,
   'quoted_status': {'contributors': None,
    'coordinates': None,
    'created_at': 'Sat Jul 02 11:00:53 +0000 2016',
    'entities': {'hashtags': [{'indices': [0, 7], 'text': 'Serial'}],
     'media': [{'display_url': 'pic.twitter.com/UdNROUbTWa',
       'expanded_url': 'http://twitter.com/VanityFair/status/749196140084690944/photo/1',
       'id': 749196137656180737,
       'id_str': '749196137656180737',
       'indices': [81, 104],
       'media_url': 'http://pbs.twimg.com/media/CmWt6vQXYAE5EsB.jpg',
       'media_url_https': 'https://pbs.twimg.com/media/CmWt6vQXYAE5EsB.jpg',
       'sizes': {'large': {'h': 960, 'resize': 'fit', 'w': 1440},
        'medium': {'h': 800, 'resize': 'fit', 'w': 1200},
        'small': {'h': 453, 'resize': 'fit', 'w': 680},
        'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
       'type': 'photo',
       'url': 'https://t.co/UdNROUbTWa'}],
     'symbols': [],
     'urls': [{'display_url': 'vntyfr.com/43mFxcm',
       'expanded_url': 'http://vntyfr.com/43mFxcm',
       'indices': [57, 80],
       'url': 'https://t.co/cZiTmFiyXV'}],
     'user_mentions': []},
    'extended_entities': {'media': [{'display_url': 'pic.twitter.com/UdNROUbTWa',
       'expanded_url': 'http://twitter.com/VanityFair/status/749196140084690944/photo/1',
       'id': 749196137656180737,
       'id_str': '749196137656180737',
       'indices': [81, 104],
       'media_url': 'http://pbs.twimg.com/media/CmWt6vQXYAE5EsB.jpg',
       'media_url_https': 'https://pbs.twimg.com/media/CmWt6vQXYAE5EsB.jpg',
       'sizes': {'large': {'h': 960, 'resize': 'fit', 'w': 1440},
        'medium': {'h': 800, 'resize': 'fit', 'w': 1200},
        'small': {'h': 453, 'resize': 'fit', 'w': 680},
        'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
       'type': 'photo',
       'url': 'https://t.co/UdNROUbTWa'}]},
    'favorite_count': 43,
    'favorited': False,
    'geo': None,
    'id': 749196140084690944,
    'id_str': '749196140084690944',
    'in_reply_to_screen_name': None,
    'in_reply_to_status_id': None,
    'in_reply_to_status_id_str': None,
    'in_reply_to_user_id': None,
    'in_reply_to_user_id_str': None,
    'is_quote_status': False,
    'lang': 'en',
    'place': None,
    'possibly_sensitive': False,
    'retweet_count': 24,
    'retweeted': False,
    'source': '<a href="http://www.socialflow.com" rel="nofollow">SocialFlow</a>',
    'text': '#Serial star Adnan Syed is getting a new trial after all https://t.co/cZiTmFiyXV https://t.co/UdNROUbTWa',
    'truncated': False,
    'user': {'contributors_enabled': False,
     'created_at': 'Mon Jun 30 14:17:35 +0000 2008',
     'default_profile': False,
     'default_profile_image': False,
     'description': 'In-depth reporting, gripping narratives, and world-class photography, plus heaping doses of Oscar-blogging, royal-watching, and assorted guilty pleasures.',
     'entities': {'description': {'urls': []},
      'url': {'urls': [{'display_url': 'vanityfair.com',
         'expanded_url': 'http://www.vanityfair.com',
         'indices': [0, 22],
         'url': 'http://t.co/THwtnNR9Wy'}]}},
     'favourites_count': 3355,
     'follow_request_sent': False,
     'followers_count': 4030067,
     'following': False,
     'friends_count': 1066,
     'geo_enabled': False,
     'has_extended_profile': False,
     'id': 15279429,
     'id_str': '15279429',
     'is_translation_enabled': False,
     'is_translator': False,
     'lang': 'en',
     'listed_count': 18366,
     'location': 'New York, NY',
     'name': 'VANITY FAIR',
     'notifications': False,
     'profile_background_color': '000000',
     'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/611295825776357376/bqC8Op6g.jpg',
     'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/611295825776357376/bqC8Op6g.jpg',
     'profile_background_tile': False,
     'profile_banner_url': 'https://pbs.twimg.com/profile_banners/15279429/1434576586',
     'profile_image_url': 'http://pbs.twimg.com/profile_images/694922349238308864/wVeLVf86_normal.jpg',
     'profile_image_url_https': 'https://pbs.twimg.com/profile_images/694922349238308864/wVeLVf86_normal.jpg',
     'profile_link_color': '990000',
     'profile_sidebar_border_color': '000000',
     'profile_sidebar_fill_color': 'DDFFCC',
     'profile_text_color': '333333',
     'profile_use_background_image': True,
     'protected': False,
     'screen_name': 'VanityFair',
     'statuses_count': 61255,
     'time_zone': 'Eastern Time (US & Canada)',
     'url': 'http://t.co/THwtnNR9Wy',
     'utc_offset': -14400,
     'verified': True}},
   'quoted_status_id': 749196140084690944,
   'quoted_status_id_str': '749196140084690944',
   'retweet_count': 2,
   'retweeted': False,
   'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
   'text': "If you haven't listened to #serial, do. Produced by one of my former @columbiajourn students, @danachivvis.So proud. https://t.co/tf8e3J3mVS",
   'truncated': False,
   'user': {'contributors_enabled': False,
    'created_at': 'Mon May 09 12:43:21 +0000 2011',
    'default_profile': True,
    'default_profile_image': False,
    'description': 'Passionate about the news. Author, Hunting Season: Immigration and Murder in an All-American Town. Director, News Standards @Telemundo.',
    'entities': {'description': {'urls': []}},
    'favourites_count': 412,
    'follow_request_sent': False,
    'followers_count': 1249,
    'following': False,
    'friends_count': 503,
    'geo_enabled': False,
    'has_extended_profile': False,
    'id': 295652602,
    'id_str': '295652602',
    'is_translation_enabled': False,
    'is_translator': False,
    'lang': 'en',
    'listed_count': 65,
    'location': 'New York, Miami',
    'name': 'Mirta Ojito',
    'notifications': False,
    'profile_background_color': 'C0DEED',
    'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
    'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
    'profile_background_tile': False,
    'profile_banner_url': 'https://pbs.twimg.com/profile_banners/295652602/1456203979',
    'profile_image_url': 'http://pbs.twimg.com/profile_images/1345286862/Mirta_Ojito_normal.jpg',
    'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1345286862/Mirta_Ojito_normal.jpg',
    'profile_link_color': '0084B4',
    'profile_sidebar_border_color': 'C0DEED',
    'profile_sidebar_fill_color': 'DDEEF6',
    'profile_text_color': '333333',
    'profile_use_background_image': True,
    'protected': False,
    'screen_name': 'MirtaOjito',
    'statuses_count': 1529,
    'time_zone': 'Eastern Time (US & Canada)',
    'url': None,
    'utc_offset': -14400,
    'verified': False}},
  'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  'text': "RT @MirtaOjito: If you haven't listened to #serial, do. Produced by one of my former @columbiajourn students, @danachivvis.So proud. https:…",
  'truncated': False,
  'user': {'contributors_enabled': False,
   'created_at': 'Fri Sep 19 19:49:35 +0000 2008',
   'default_profile': False,
   'default_profile_image': False,
   'description': 'We train students to become high-impact journalists. https://t.co/JSRYKFLGsg',
   'entities': {'description': {'urls': [{'display_url': 'bitly.com/cjs_adm',
       'expanded_url': 'http://bitly.com/cjs_adm',
       'indices': [53, 76],
       'url': 'https://t.co/JSRYKFLGsg'}]},
    'url': {'urls': [{'display_url': 'journalism.columbia.edu',
       'expanded_url': 'http://www.journalism.columbia.edu',
       'indices': [0, 23],
       'url': 'https://t.co/G5PSZbcV3Q'}]}},
   'favourites_count': 968,
   'follow_request_sent': False,
   'followers_count': 41146,
   'following': False,
   'friends_count': 1112,
   'geo_enabled': False,
   'has_extended_profile': False,
   'id': 16369150,
   'id_str': '16369150',
   'is_translation_enabled': False,
   'is_translator': False,
   'lang': 'en',
   'listed_count': 1981,
   'location': 'New York, NY',
   'name': 'Columbia Journalism',
   'notifications': False,
   'profile_background_color': '002664',
   'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/570414669/xokdu8996gc5omt2fa0l.jpeg',
   'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/570414669/xokdu8996gc5omt2fa0l.jpeg',
   'profile_background_tile': True,
   'profile_banner_url': 'https://pbs.twimg.com/profile_banners/16369150/1466177743',
   'profile_image_url': 'http://pbs.twimg.com/profile_images/378800000120489911/fd97e4435944df1284661f5f044c5574_normal.jpeg',
   'profile_image_url_https': 'https://pbs.twimg.com/profile_images/378800000120489911/fd97e4435944df1284661f5f044c5574_normal.jpeg',
   'profile_link_color': '0039A6',
   'profile_sidebar_border_color': '051DF5',
   'profile_sidebar_fill_color': 'D2D3D9',
   'profile_text_color': '080808',
   'profile_use_background_image': True,
   'protected': False,
   'screen_name': 'columbiajourn',
   'statuses_count': 8790,
   'time_zone': 'Eastern Time (US & Canada)',
   'url': 'https://t.co/G5PSZbcV3Q',
   'utc_offset': -14400,
   'verified': False}}]
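
To underline the difference, we index into the list directly; there's no statuses key in between:


In [ ]:
# the response is a plain list of tweet dictionaries
response[0]['text']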

Using cursors to fetch more results

The .search() and .get_user_timeline() functions by default return only the most recent results, up to the number specified with count (and sometimes even fewer than that). In order to find older tweets, you need to page through the results. If you were doing this "by hand," you would use the max_id or since_id parameters to request tweets older than the oldest tweet in the current results, repeating that process until you'd exhausted the results (or found as many tweets as you need). This is delicate work, and thankfully Twython includes pre-built functionality to make it easier: the .cursor() function.
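
To make the "by hand" approach concrete before handing the work over to the cursor, here's a rough sketch of paging with max_id; the variable names are just illustrative:


In [ ]:
# fetch up to three pages of results, asking each time only for tweets
# older than the oldest tweet we've already seen
collected = []
max_id = None
for page_num in range(3):
    params = {'q': '"data journalism"', 'result_type': 'recent', 'count': 100}
    if max_id is not None:
        params['max_id'] = max_id
    statuses = twitter.search(**params)['statuses']
    if len(statuses) == 0:
        break
    collected.extend(statuses)
    max_id = min(int(t['id_str']) for t in statuses) - 1
len(collected)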

The .cursor() function takes the function you want to page through as the first parameter, and after that the keyword parameters that you would normally pass to that function. Given this information, it can repeatedly call the given function on your behalf, going back as far as it can. The object returned from the .cursor() function can be used as the iterable object in a for loop, allowing you to iterate over all of the results. Here's an example using .search():


In [65]:
cursor = twitter.cursor(twitter.search, q='"data journalism" -filter:retweets', count=100)
all_text = list()
for tweet in cursor:
    all_text.append(tweet['text'])
    if len(all_text) >= 500: # stop once we've collected 500 tweets
        break

This snippet finds up to 500 tweets containing the phrase "data journalism" (excluding retweets) and stores the text of those tweets in a list. We can then use this text for data analysis, like a simple word count:


In [73]:
from collections import Counter
import re
c = Counter()
for text in all_text:
    c.update([t.lower() for t in text.split()])
# the twenty-five most common words that are longer than three characters
# and don't contain "data" or "journalism"
[k for k, v in c.most_common() \
 if len(k) > 3 and not(re.search(r"data|journalism", k))][:25]


Out[73]:
['guide',
 'simple',
 'marketers',
 'every',
 'maps',
 'death',
 "it's",
 'wikileaks',
 'just',
 'iraq:',
 'deaths',
 'about',
 'went',
 'this',
 'history',
 'live,',
 'awards',
 'biggest',
 'leak',
 'gana',
 'papers',
 '@rezahakbari',
 'https://t.co/ipjp4v3g67',
 '2016”',
 'from']

Rate limits

TK!

Entities

TK!

Posting tweets

You can also use the Twitter API to post tweets on behalf of a user. In this tutorial, we're going to use this ability of the API to make a simple bot.

Simple example

When you first create a Twitter application, the credentials you have by default (i.e., the ones you get when you click "Create my access token") are for your own user. This means that you can post tweets to your own account using these credentials, and to your own account only. That's usually not what you want for a bot, but let's give it a shot anyway, just to see how to update a status (i.e., post a tweet) with Twython. Here we go:


In [82]:
twitter.update_status(status="This is a test tweet for a tutorial I'm going through, please ignore")


Out[82]:
{'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Jul 05 03:27:25 +0000 2016',
 'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 750169184496037888,
 'id_str': '750169184496037888',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'en',
 'place': None,
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://www.example.com" rel="nofollow">aparrish test</a>',
 'text': "This is a test tweet for a tutorial I'm going through, please ignore",
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Fri May 23 08:27:48 +0000 2008',
  'default_profile': True,
  'default_profile_image': False,
  'description': "For all the times you want to test aparrish, there's aparrishtest.",
  'entities': {'description': {'urls': []}},
  'favourites_count': 0,
  'follow_request_sent': False,
  'followers_count': 13,
  'following': False,
  'friends_count': 5,
  'geo_enabled': False,
  'has_extended_profile': False,
  'id': 14879231,
  'id_str': '14879231',
  'is_translation_enabled': False,
  'is_translator': False,
  'lang': 'en',
  'listed_count': 0,
  'location': 'Brooklyntest',
  'name': 'aparrishtest',
  'notifications': False,
  'profile_background_color': 'C0DEED',
  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
  'profile_background_tile': False,
  'profile_image_url': 'http://pbs.twimg.com/profile_images/420488681/tagnic8yaaWp_normal',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/420488681/tagnic8yaaWp_normal',
  'profile_link_color': '0084B4',
  'profile_sidebar_border_color': 'C0DEED',
  'profile_sidebar_fill_color': 'DDEEF6',
  'profile_text_color': '333333',
  'profile_use_background_image': True,
  'protected': False,
  'screen_name': 'aparrishtest',
  'statuses_count': 300,
  'time_zone': 'Central Time (US & Canada)',
  'url': None,
  'utc_offset': -18000,
  'verified': False}}

Check your account, and you'll see that your status has been updated! (You can safely delete this tweet if you'd like.) As you can see, the .update_status() function takes a single named parameter, status, which should have a string as its value. Twitter will update your status with the given string. The function returns a dictionary with information about the tweet that was just created.
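
Because the returned dictionary has the same shape as the tweet dictionaries we got back from .search(), we can reuse the tweet_url() function defined earlier to get a link to the tweet we just posted. A quick sketch:


In [ ]:
posted = twitter.update_status(status="Another test tweet, please ignore")
print(tweet_url(posted))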

Authorizing another user

Of course, you generally don't want to update your own status. You want to write a program that updates someone else's status, even if that someone else is a bot user of your own creation.

Before you proceed, create a new Twitter account. You'll need to log out of your current account and then open up the Twitter website, or (preferably) use your browser's "private" or "incognito" functionality. Every Twitter account requires a unique e-mail address, and you'll need to have access to the e-mail address to "verify" your account, so make sure you have an e-mail address you can use (and check) other than the one you used for your primary Twitter account. (We'll go over this process in class.)

Once you've created a new Twitter account, you'll need to have that user authorize the Twitter application we created earlier to tweet on its behalf. Doing this is a two-step process. Run the cell below (making sure that the api_key and api_secret variables have been set to the consumer key and consumer secret of your application, respectively), and then open the URL it prints out while you are logged into your bot's account.


In [ ]:
twitter = twython.Twython(api_key, api_secret)
auth = twitter.get_authentication_tokens()
print("Log into Twitter as the user you want to authorize and visit this URL:")
print("\t" + auth['auth_url'])

On the page that appears, confirm that you want to authorize the application. A PIN will appear. Paste this PIN into the cell below, as the value assigned to the variable pin.


In [ ]:
pin = ""

twitter = twython.Twython(api_key, api_secret, auth['oauth_token'], auth['oauth_token_secret'])
tokens = twitter.get_authorized_tokens(pin)

new_access_token = tokens['oauth_token']
new_token_secret = tokens['oauth_token_secret']
print("your access token:", new_access_token)
print("your token secret:", new_token_secret)

Great! Now you have an access token and token secret for your bot's account. Run the cell below to create a new Twython object authorized with these credentials.


In [85]:
twitter = twython.Twython(api_key, api_secret, new_access_token, new_token_secret)

And run the following cell to post a test tweet:


In [86]:
twitter.update_status(status="hello, world!")


Out[86]:
{'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Jul 05 03:41:20 +0000 2016',
 'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 750172685100023808,
 'id_str': '750172685100023808',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'en',
 'place': None,
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://www.example.com" rel="nofollow">aparrish test</a>',
 'text': 'hello, world!',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Sat Jun 16 19:54:12 +0000 2007',
  'default_profile': False,
  'default_profile_image': False,
  'description': 'I write computer programs that write poetry (uh, generously construed). Recently resident @FordhamEnglish; adjunct @NYU_ITP. My bots: https://t.co/ZhhCXX5tdQ',
  'entities': {'description': {'urls': [{'display_url': 'twitter.com/aparrish/lists…',
      'expanded_url': 'https://twitter.com/aparrish/lists/my-bots',
      'indices': [134, 157],
      'url': 'https://t.co/ZhhCXX5tdQ'}]},
   'url': {'urls': [{'display_url': 'decontextualize.com',
      'expanded_url': 'http://www.decontextualize.com/',
      'indices': [0, 23],
      'url': 'https://t.co/NvexPAokK8'}]}},
  'favourites_count': 23715,
  'follow_request_sent': False,
  'followers_count': 3435,
  'following': False,
  'friends_count': 1013,
  'geo_enabled': False,
  'has_extended_profile': False,
  'id': 6857962,
  'id_str': '6857962',
  'is_translation_enabled': False,
  'is_translator': False,
  'lang': 'en',
  'listed_count': 204,
  'location': 'Brooklyn, NY',
  'name': 'Allison Parrish',
  'notifications': False,
  'profile_background_color': '9AE4E8',
  'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/4483699/phelpslovecraft.png',
  'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/4483699/phelpslovecraft.png',
  'profile_background_tile': True,
  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/6857962/1430626711',
  'profile_image_url': 'http://pbs.twimg.com/profile_images/730146910892331008/V4IsS80w_normal.jpg',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/730146910892331008/V4IsS80w_normal.jpg',
  'profile_link_color': '0000FF',
  'profile_sidebar_border_color': '87BC44',
  'profile_sidebar_fill_color': 'E0FF92',
  'profile_text_color': '000000',
  'profile_use_background_image': False,
  'protected': False,
  'screen_name': 'aparrish',
  'statuses_count': 11771,
  'time_zone': 'Eastern Time (US & Canada)',
  'url': 'https://t.co/NvexPAokK8',
  'utc_offset': -14400,
  'verified': False}}

Simple bot example

We're now going to make a simple bot that posts tweets with information about a randomly selected lake from the MONDIAL database. First, we'll make a big list of dictionaries with all of the information from the table:


In [88]:
import pg8000

lakes = list()

conn = pg8000.connect(database="mondial")
cursor = conn.cursor()
cursor.execute("SELECT name, area, depth, elevation, type, river FROM lake")
for row in cursor.fetchall():
    lakes.append({'name': row[0],
                 'area': row[1],
                 'depth': row[2],
                 'elevation': row[3],
                 'type': row[4],
                 'river': row[5]})
len(lakes)


Out[88]:
143

The following dictionary maps each column to a sentence frame:


In [107]:
sentences = {
    'area': 'The area of {} is {} square kilometers.',
    'depth': 'The depth of {} is {} meters.',
    'elevation': 'The elevation of {} is {} meters.',
    'type': 'The type of {} is "{}."',
    'river': '{} empties into a river named {}.'
}

The following cell selects a random lake from the list, and a random sentence frame from the sentences dictionary, and attempts to fill in the frame with relevant information from the lake.


In [112]:
import random
def random_lake_sentence(lakes, sentences):
    rlake = random.choice(lakes)
    # get the keys in the dictionary whose value is not None; we'll only try to
    # make sentences for these
    possible_keys = [k for k, v in rlake.items() if v is not None and k != 'name']
    rframe = random.choice(possible_keys)
    return sentences[rframe].format(rlake['name'], rlake[rframe])
for i in range(10):
    print(random_lake_sentence(lakes, sentences))


The area of Lago Maggiore is 216 square kilometers.
The type of Lake Eyre is "salt."
Lake Burley Griffin empties into a river named Murrumbidgee River.
The depth of Lake Eyre is 4 meters.
The elevation of Lake Tana is 1830 meters.
The area of Kuybyshev Reservoir is 6450 square kilometers.
Lago di Como empties into a river named Adda.
The depth of Lake Bosumtwi is 81 meters.
The area of Lake Winnipesaukee is 186 square kilometers.
The area of Caspian Sea is 386400 square kilometers.

We can now call the .update_status() function with the result of the random text generation function:


In [110]:
twitter.update_status(status=random_lake_sentence(lakes, sentences))


Out[110]:
{'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Jul 05 04:02:25 +0000 2016',
 'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 750177992240947200,
 'id_str': '750177992240947200',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'es',
 'place': None,
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://www.example.com" rel="nofollow">aparrish test</a>',
 'text': 'Lago de Chapala empties into a river named Rio Lerma.',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Sat Jun 16 19:54:12 +0000 2007',
  'default_profile': False,
  'default_profile_image': False,
  'description': 'I write computer programs that write poetry (uh, generously construed). Recently resident @FordhamEnglish; adjunct @NYU_ITP. My bots: https://t.co/ZhhCXX5tdQ',
  'entities': {'description': {'urls': [{'display_url': 'twitter.com/aparrish/lists…',
      'expanded_url': 'https://twitter.com/aparrish/lists/my-bots',
      'indices': [134, 157],
      'url': 'https://t.co/ZhhCXX5tdQ'}]},
   'url': {'urls': [{'display_url': 'decontextualize.com',
      'expanded_url': 'http://www.decontextualize.com/',
      'indices': [0, 23],
      'url': 'https://t.co/NvexPAokK8'}]}},
  'favourites_count': 23715,
  'follow_request_sent': False,
  'followers_count': 3435,
  'following': False,
  'friends_count': 1013,
  'geo_enabled': False,
  'has_extended_profile': False,
  'id': 6857962,
  'id_str': '6857962',
  'is_translation_enabled': False,
  'is_translator': False,
  'lang': 'en',
  'listed_count': 203,
  'location': 'Brooklyn, NY',
  'name': 'Allison Parrish',
  'notifications': False,
  'profile_background_color': '9AE4E8',
  'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/4483699/phelpslovecraft.png',
  'profile_background_image_url_https': 'https://pbs.twimg.com/profile_background_images/4483699/phelpslovecraft.png',
  'profile_background_tile': True,
  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/6857962/1430626711',
  'profile_image_url': 'http://pbs.twimg.com/profile_images/730146910892331008/V4IsS80w_normal.jpg',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/730146910892331008/V4IsS80w_normal.jpg',
  'profile_link_color': '0000FF',
  'profile_sidebar_border_color': '87BC44',
  'profile_sidebar_fill_color': 'E0FF92',
  'profile_text_color': '000000',
  'profile_use_background_image': False,
  'protected': False,
  'screen_name': 'aparrish',
  'statuses_count': 11773,
  'time_zone': 'Eastern Time (US & Canada)',
  'url': 'https://t.co/NvexPAokK8',
  'utc_offset': -14400,
  'verified': False}}

To make this into a "bot," we'd need to go an extra step: move all of this code into a standalone Python script, and set up a cron job to run it every so often (maybe every few hours).
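
Here's a rough sketch of what that standalone script might look like, pieced together from the cells above; the filename and the cron schedule are just placeholders to adapt to your own setup.

# lake_bot.py (hypothetical filename) -- post one random lake fact, then exit.
# A crontab entry along the lines of "0 */3 * * * python3 /path/to/lake_bot.py"
# (placeholder path and schedule) would run it every three hours.
import random

import pg8000
import twython

api_key = ""       # fill in the four credentials for your bot's account
api_secret = ""
access_token = ""
token_secret = ""

sentences = {
    'area': 'The area of {} is {} square kilometers.',
    'depth': 'The depth of {} is {} meters.',
    'elevation': 'The elevation of {} is {} meters.',
    'type': 'The type of {} is "{}."',
    'river': '{} empties into a river named {}.'
}

# load the lake table from the MONDIAL database into a list of dictionaries
conn = pg8000.connect(database="mondial")
cursor = conn.cursor()
cursor.execute("SELECT name, area, depth, elevation, type, river FROM lake")
columns = ['name', 'area', 'depth', 'elevation', 'type', 'river']
lakes = [dict(zip(columns, row)) for row in cursor.fetchall()]

# pick a random lake and a random (non-empty) fact about it
lake = random.choice(lakes)
possible_keys = [k for k, v in lake.items() if v is not None and k != 'name']
frame = random.choice(possible_keys)
status = sentences[frame].format(lake['name'], lake[frame])

# post it
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)
twitter.update_status(status=status)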

Further reading

TK
