In [1]:
from __future__ import print_function, absolute_import, division

Re-Introduction to Software Engineering:

The initial chain of software development

Version 0.1


By AA Miller 2017 Apr 21

During the first session of the DSFP we spent a significant amount of time learning about version control and git/github. Because we have continued to use git as the version control system for the LSSTC DSFP, we will not review that material here.

Instead, we are going to review the basic elements of software engineering as introduced to us by Jake VanderPlas. The four steps are listed below (note that I've omitted step 0, which is to use git for version control throughout this process):

  1. Begin the project in a Jupyter notebook (this is best for exploration, before you know the final library structure)
  2. Create a Python directory for the library (most important is an __init__.py file so you can import your library)
  3. Build unit tests (ensure that the library is portable and not broken)
  4. Develop a platform for continuous integration, e.g. Travis-CI

Before we begin with the actual exercise, a quick aside.

A quick note on modular programming. [Previously we urged you to build talks in a modular fashion; the idea here is similar but not exactly the same.]

The idea: each individual "idea" should be contained within a single module. This does not mean every call to NumPy should be its own module, but the code should be organized as a series of small code snippets.

The basic appeal -- modular programming improves:

  • Readability - for the benefit of you(!) and others
  • Reproducibility - our work is built on a foundation of reproducibility [documentation is also important]
  • Profiling - it is easier to find and speed up the slow steps in modular programs
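
As a minimal sketch of the points above, the example below splits a small task into single-purpose functions and then times each one separately. The function names (parse_magnitudes, mean, summarize) and the sample data are hypothetical and chosen only for illustration.

```python
import timeit


def parse_magnitudes(line):
    """Convert a comma-separated string into a list of floats."""
    return [float(value) for value in line.split(",")]


def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def summarize(line):
    """Combine the two steps above into a single call."""
    return mean(parse_magnitudes(line))


line = "18.2,17.9,18.4,18.1"
print(summarize(line))

# Because each step is its own function, each can be timed on its own.
values = parse_magnitudes(line)
t_parse = timeit.timeit(lambda: parse_magnitudes(line), number=1000)
t_mean = timeit.timeit(lambda: mean(values), number=1000)
print("parse: {:.6f} s, mean: {:.6f} s".format(t_parse, t_mean))
```

If one step turns out to dominate the run time, only that function needs to be rewritten, and the others (and their tests) stay untouched.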


Problem 1) Develop the Code in Jupyter Notebook

For this problem we are going to focus on the steps associated with development, and skip (most of) the nitty gritty of writing the code itself. We will start by creating a Jupyter notebook with the basics of our software. The task is relatively simple: we will develop a script to retrieve the last $N$ tweets from any specified twitter user. The basics of such a script are as follows:

import tweepy

consumer_key = "bhzpKBdspYr2xSDb0RxpI586q"
consumer_secret = "FfddeX3qatIeXoA51LJbgHs4qNsYAoNoWIqnlMISr3E7P4x03L"
access_key = "855466876364877825-pUkJcfH48x3rEnlFKvSLJaWZ0jzg6Nc"
access_secret = "9JHPalnxb6PVineBeCFFU5L98PD7EMOUBuwemM8vj8hA9"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

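# twitter_acct (str) and Ntweets (int) must be defined before this call
tweets = api.user_timeline(screen_name=twitter_acct, count=Ntweets)

# this return statement only works once the code is wrapped in a function
return [tweet.text for tweet in tweets]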

Note 1 - you likely need to pip install tweepy. You may also need to restart your kernel after that installation.

Note 2 - I have created a dummy twitter account to provide keys and secret codes to use the twitter API. I will change those keys after Monday. It goes without saying that secret keys should not be uploaded to github.
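
One common way to keep secrets out of a repository is to read them from environment variables at run time. The sketch below assumes four hypothetical variable names (TWITTER_CONSUMER_KEY and so on); these names and the load_twitter_credentials helper are illustrative and are not part of tweepy.

```python
import os


def load_twitter_credentials(environ=None):
    """Read the four Twitter credentials from environment variables.

    The variable names are assumptions made for this example.
    """
    environ = os.environ if environ is None else environ
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_key": "TWITTER_ACCESS_KEY",
        "access_secret": "TWITTER_ACCESS_SECRET",
    }
    # Fail early with a clear message if any variable is not set.
    missing = [var for var in names.values() if var not in environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in names.items()}


# Demonstration with a stand-in mapping instead of the real environment.
fake_environ = {
    "TWITTER_CONSUMER_KEY": "ck",
    "TWITTER_CONSUMER_SECRET": "cs",
    "TWITTER_ACCESS_KEY": "ak",
    "TWITTER_ACCESS_SECRET": "as",
}
credentials = load_twitter_credentials(fake_environ)
print(sorted(credentials))
```

The returned dictionary could then be passed to tweepy.OAuthHandler and set_access_token in place of the hard-coded strings.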

Problem 1a

Create a notebook with a function get_recent_tweets that returns the last $N$ tweets from any specified twitter user.

Test the function by retrieving tweets. If you don't have a favorite twitter user, you can check my account, MillerAdamA (likely boring), or Lucianne's account, shaka_lulu (probably more interesting).

Hint - only a small modification is needed to the example code given above.


In [4]:
import tweepy

def get_recent_tweets(twitter_acct, Ntweets, print_results=False):
    """Get the last N tweets from a twitter user

    Parameters
    ----------
    twitter_acct : str
        Twitter handle for the user

    Ntweets : int
        Number of tweets to be returned

    print_results : bool (default = False)
        Print the tweets at the command line.

    Returns
    -------
    list of str
        The text of the Ntweets most recent tweets from twitter_acct
    """

    # keys for the dummy account; never commit real secrets to github
    consumer_key = "bhzpKBdspYr2xSDb0RxpI586q"
    consumer_secret = "FfddeX3qatIeXoA51LJbgHs4qNsYAoNoWIqnlMISr3E7P4x03L"
    access_key = "855466876364877825-pUkJcfH48x3rEnlFKvSLJaWZ0jzg6Nc"
    access_secret = "9JHPalnxb6PVineBeCFFU5L98PD7EMOUBuwemM8vj8hA9"

    # authenticate and create the API object
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    tweets = api.user_timeline(screen_name=twitter_acct, count=Ntweets)

    # build the list of tweet text once and reuse it
    tweet_text = [tweet.text for tweet in tweets]

    if print_results:
        print("The last {:d} tweets from {:s} are:".format(Ntweets, twitter_acct))
        print(tweet_text)

    return tweet_text

# print_results=True produces the printed lines shown below
get_recent_tweets("shaka_lulu", 2, print_results=True)


The last 2 tweets from shaka_lulu are:
["Thanks @arayyay for bringing that story to my attention. Y'all should follow her too.", 'But also? My friend, @LeagueOfExtra owes you (& the other artists whose images he "found" lololol) SOME FAT CHECKS AND APOLOGIES']
Out[4]:
["Thanks @arayyay for bringing that story to my attention. Y'all should follow her too.",
 'But also? My friend, @LeagueOfExtra owes you (& the other artists whose images he "found" lololol) SOME FAT CHECKS AND APOLOGIES']

Problem 2) Create Python Library

Create a directory retrieve_tweets/ which will serve as your new Python library to retrieve tweets from twitter.

Create an __init__.py file in retrieve_tweets/. This file can be empty.

Create the file get_recent_tweets.py in retrieve_tweets/, and include the get_recent_tweets function that you previously developed in a Jupyter notebook as the contents of this file.

Problem 2a

Check your work by importing the retrieve_tweets library and running the get_recent_tweets function from that library.

Use github to store the results of your work.
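
The import check in Problem 2a can be sketched as follows. To keep the example self-contained, it builds a throwaway copy of the directory layout in a temporary directory and uses a placeholder get_recent_tweets that returns fixed strings instead of calling twitter; in your own work you would import from the real retrieve_tweets/ directory instead.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout in a temporary directory.
workdir = Path(tempfile.mkdtemp())
package_dir = workdir / "retrieve_tweets"
package_dir.mkdir()

# An empty __init__.py marks the directory as a package.
(package_dir / "__init__.py").write_text("")

# Stand-in for the real module: no network access, just placeholder text.
(package_dir / "get_recent_tweets.py").write_text(
    "def get_recent_tweets(twitter_acct, Ntweets):\n"
    "    return ['placeholder tweet'] * Ntweets\n"
)

# Make the parent directory importable, then import from the package.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
from retrieve_tweets.get_recent_tweets import get_recent_tweets

print(get_recent_tweets("shaka_lulu", 2))
```

Note that the import names the package, then the module, then the function: retrieve_tweets.get_recent_tweets is the file, and get_recent_tweets is the function inside it.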

Problem 3) Create a Unit Test

Create a directory tests/ in retrieve_tweets/. In tests/, create an __init__.py file, and a test_get_recent_tweets.py file.

Problem 3a

Write a unit test for get_recent_tweets.py in test_get_recent_tweets.py.

Run your unit test to make sure your software is working.

Hint - Recall that nosetests is a great package for executing your unit tests.
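
A unit test that contacts twitter depends on the network and on valid keys, so one option is to test the logic against a stand-in API object. The sketch below assumes a hypothetical refactoring in which the API object is passed in as an argument (tweet_texts), and FakeAPI is an invented stub that mimics only the user_timeline call used in this notebook; neither is part of tweepy.

```python
from types import SimpleNamespace


def tweet_texts(api, twitter_acct, Ntweets):
    """Return the text of the Ntweets most recent tweets using an existing api object."""
    tweets = api.user_timeline(screen_name=twitter_acct, count=Ntweets)
    return [tweet.text for tweet in tweets]


class FakeAPI:
    """Stand-in for tweepy.API that returns canned tweets without network access."""

    def user_timeline(self, screen_name, count):
        canned = ["first tweet", "second tweet", "third tweet"]
        return [SimpleNamespace(text=text) for text in canned[:count]]


def test_tweet_texts_returns_requested_number():
    result = tweet_texts(FakeAPI(), "some_user", 2)
    assert result == ["first tweet", "second tweet"]


def test_tweet_texts_returns_strings():
    result = tweet_texts(FakeAPI(), "some_user", 3)
    assert all(isinstance(text, str) for text in result)
```

Test runners that collect functions by name will pick up any function whose name starts with test_ when the file itself is named test_*.py, which is why both the file and the functions follow that pattern.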

Challenge Problem) Continuous Integration

Complete continuous integration of your library with Travis-CI.

