This lecture takes some of the ideas we developed last week and organizes them into a workflow. We'll do this by example, analyzing the Pronto data. Topics will include:
These topics provide a rough outline for an analysis workflow.
In [1]:
# Packages
from urllib import request
import os
import pandas as pd
In [2]:
# Constants used in analysis
TRIP_DATA = "https://data.seattle.gov/api/views/tw7j-dfaw/rows.csv?accessType=DOWNLOAD"
TRIP_FILE = "pronto_trips.csv"
WEATHER_DATA = "http://uwseds.github.io/data/pronto_weather.csv"
WEATHER_FILE = "pronto_weather.csv"
In [ ]:
# Get the URL data
#request.urlretrieve(TRIP_DATA, TRIP_FILE)
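The call above is commented out, so nothing is actually downloaded. As a minimal sketch of what `request.urlretrieve` does, the cell below copies a small local file through a `file://` URL. The local URL, the file names and the file contents are assumptions made up so the example runs offline; with a real `http(s)` URL such as `TRIP_DATA` the call is the same. `urlretrieve(url, filename)` writes the resource to `filename` and returns a `(filename, headers)` tuple.

```python
import os
import pathlib
import tempfile
from urllib import request

# Create a small local file so the example runs offline; its file:// URL
# stands in for a real http(s) URL such as TRIP_DATA (an assumption).
workdir = tempfile.mkdtemp()
source = pathlib.Path(workdir, "source.csv")
source.write_text("col_a,col_b\n1,2\n")

# urlretrieve copies the resource at the URL into the named local file and
# returns the local file name together with metadata about the resource.
target = os.path.join(workdir, "copy.csv")
local_name, headers = request.urlretrieve(source.as_uri(), target)

print(local_name == target)  # True
print(pathlib.Path(local_name).read_text())
```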
In [3]:
!ls -lh
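`!ls -lh` is an IPython shell escape, so it only works inside a notebook on a system that has an `ls` command. A rough pure-Python equivalent is sketched below; the helper names `human_size` and `list_files` are hypothetical, and the size formatting only approximates what `ls -h` prints.

```python
import os


def human_size(num_bytes):
    """Approximate the human-readable sizes printed by `ls -lh`."""
    if num_bytes < 1024:
        return f"{num_bytes}B"
    size = float(num_bytes)
    for unit in ["K", "M", "G", "T"]:
        size /= 1024
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"


def list_files(directory="."):
    """Print each regular file in directory with its approximate size."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            print(f"{human_size(os.path.getsize(path)):>8}  {name}")


list_files()
```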
Two challenges
In [7]:
# Example function
def xyz(input):  # The function's name is "xyz". It has one argument, "input".
    return int(input) + 1  # The function returns one value: int(input) + 1
print(xyz("3"))
#a = xyz(3)
#print(xyz(a))
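The same function with a descriptive name and a docstring, as a sketch: `add_one` is a hypothetical rename of `xyz`. Because `int()` accepts both `3` and `"3"`, the function works on either, and the result of one call can be passed straight into the next, which is what the commented-out lines do.

```python
def add_one(value):
    """Return int(value) + 1; value may be an int or a numeric string."""
    return int(value) + 1


print(add_one("3"))           # 4
print(add_one(3))             # 4
print(add_one(add_one("3")))  # 5: the output of one call feeds the next
```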
In [8]:
def addTwo(input1, input2):
    return input1 + input2
#
addTwo(1, 2)
Out[8]:
3
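Python does not declare argument types, so `addTwo` works with any pair of values that support `+`. A quick check with a few built-in types (the sample values are arbitrary):

```python
def addTwo(input1, input2):
    return input1 + input2


print(addTwo(1, 2))          # 3
print(addTwo(1.5, 2))        # 3.5
print(addTwo("pro", "nto"))  # pronto
print(addTwo([1], [2, 3]))   # [1, 2, 3]

# Mixing types that can't be added together raises a TypeError.
try:
    addTwo(1, "2")
except TypeError as err:
    print("TypeError:", err)
```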
Colin will provide more details about functions, such as variable scope and multiple return values.
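As a brief preview of those two topics (the names `min_max`, `bump` and `counter` are made up for illustration): a function can return several values as a tuple, which the caller unpacks, and assigning to a name inside a function creates a local variable rather than changing a global one.

```python
def min_max(values):
    """Return the smallest and largest values as a tuple."""
    return min(values), max(values)


low, high = min_max([3, 1, 4])  # tuple unpacking into two names
print(low, high)  # 1 4

counter = 0


def bump():
    counter = 10  # a new local variable; the global counter is unchanged
    return counter


print(bump(), counter)  # 10 0
```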
In [10]:
# Function to download from a URL
def download(url, filename):
    print("Downloading", filename)
    #request.urlretrieve(url, filename)
In [11]:
download(TRIP_DATA, TRIP_FILE)
In [13]:
# Enhance the function to skip the download when the file is already present
import os.path
def download(url, filename):
    if os.path.isfile(filename):
        print("Already present %s." % filename)
    else:
        print("Downloading %s" % filename)
        #request.urlretrieve(url, filename)
download(TRIP_DATA, "none.csv")
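One possible refinement, sketched under the assumption that callers want to know whether anything was fetched: return `True` when a download happens and `False` when the file is skipped. This version also uses `pathlib` and f-strings instead of `os.path` and `%` formatting, which is a stylistic choice rather than something the lecture requires. The check at the bottom uses a local `file://` URL (an assumption so it runs offline) in place of `TRIP_DATA`.

```python
import pathlib
import tempfile
from urllib import request


def download(url, filename):
    """Download url to filename unless the file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if pathlib.Path(filename).is_file():
        print(f"Already present {filename}.")
        return False
    print(f"Downloading {filename}")
    request.urlretrieve(url, filename)
    return True


# Offline check: a file:// URL stands in for TRIP_DATA.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("a,b\n1,2\n")
target = workdir / "copy.csv"

print(download(source.as_uri(), str(target)))  # True: first call downloads
print(download(source.as_uri(), str(target)))  # False: second call skips
```

Returning a flag keeps the function quiet about policy: the caller decides whether a skipped download matters.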
In [14]:
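# Assumes a module download.py in the working directory that defines download_file().
# Importing the module rebinds the name download, replacing the function defined above.
import download
download.download_file(TRIP_DATA, "none.csv")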
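The contents of `download.py` are not shown in this lecture. A plausible version, assuming it simply holds the function from the previous cell under the name `download_file`, might look like the sketch below. The `if __name__ == "__main__":` block is also an assumption: it lets the same file run as a script with a URL and a file name, while `import download` skips it.

```python
"""download.py: a hypothetical module holding the download helper."""
import os.path
import sys
from urllib import request


def download_file(url, filename):
    """Download url to filename, skipping the download if filename exists."""
    if os.path.isfile(filename):
        print("Already present %s." % filename)
    else:
        print("Downloading %s" % filename)
        request.urlretrieve(url, filename)


# Runs only when executed directly with two arguments, never on import.
if __name__ == "__main__" and len(sys.argv) == 3:
    download_file(sys.argv[1], sys.argv[2])
```

From a shell, this sketch would be invoked as `python download.py <url> <filename>`.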