We review the material we've covered to date: Python fundamentals, data input with Pandas, and graphics with Matplotlib. Questions marked Bonus are more difficult and are there to give the experts something to do.
This Jupyter notebook was created by Dave Backus, Chase Coleman, and Spencer Lyon for the NYU Stern course Data Bootcamp.
This version was modified by (add your name in bold here). Also add your initials to the notebook's name at the top.
In [ ]:
# import packages
import pandas as pd # data management
import matplotlib.pyplot as plt # graphics
# Jupyter command, puts plots in notebook
%matplotlib inline
# check Python version
import datetime as dt
import sys
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')
Question 1.
Answers. Enter your answers below:
Question 2. Describe the type and content of these expressions (by content we mean, if the expression defines a variable, what is the name and value of the variable?):
x = 2
y = 3.0
z = "3.0"
x/y
letters = 'abcd'
letters[-1]
xyz = [x, y, z]
xyz[1]
abcd = list(letters)
abcd[-2]
case = {'a': 'A', 'b': 'B', 'c': 'C'}
case['c']
2 >= 1
x == 2
Answers. Enter your answers below:
In [ ]:
# code cell for experimenting
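One way to experiment with Question 2 is to evaluate each expression and print its value and type side by side. The sketch below assumes the assignments listed in the question; the dictionary name `expressions` is ours and purely illustrative.

```python
# Assignments copied from Question 2
x = 2
y = 3.0
z = "3.0"
letters = 'abcd'
xyz = [x, y, z]
abcd = list(letters)
case = {'a': 'A', 'b': 'B', 'c': 'C'}

# Hypothetical helper: map a label to the value of each expression
expressions = {
    'x/y': x/y,
    'letters[-1]': letters[-1],
    'xyz[1]': xyz[1],
    'abcd[-2]': abcd[-2],
    "case['c']": case['c'],
    '2 >= 1': 2 >= 1,
    'x == 2': x == 2,
}

# Print each expression with its value and type
for label, value in expressions.items():
    print(label, '->', repr(value), type(value))
```

The same pattern works for any expression you are unsure about: wrap it in `type()` and compare the result with your written answer.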
Question 3. These get progressively more difficult:
dollars = '$1,234.5'
?dollars
. dollars
. dollars
and convert the result to a float.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
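A minimal sketch of one possible approach, assuming the goal is to strip the dollar sign and comma from `dollars` before converting. It uses the string method `replace`; the intermediate names `no_sign`, `no_comma`, and `value` are ours, and other methods would work as well.

```python
# Question 3 (sketch): clean a currency string step by step
dollars = '$1,234.5'
print(type(dollars))                 # the starting object is a string

no_sign = dollars.replace('$', '')   # drop the dollar sign
no_comma = no_sign.replace(',', '')  # drop the comma
value = float(no_comma)              # convert the cleaned string to a float

print(no_sign, no_comma, value)
```

Chaining the two calls, `dollars.replace('$', '').replace(',', '')`, gives the same cleaned string in one line.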
In [ ]:
Question 4.
For this problem we set letters = 'abcd' as in problem 2.
'a' to the upper case letter 'A'.
letters and prints their upper case versions.
letters. On each iteration, print a string consisting of the upper and lower case versions together; e.g., 'Aa'.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
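The steps above can be sketched with the string method `upper` and a `for` loop. This is one possible approach, not the only one; the name `pairs` is ours.

```python
# Question 4 (sketch): upper case conversion and loops over a string
letters = 'abcd'

# convert a single lower case letter
print('a'.upper())

# loop over the characters and print their upper case versions
for letter in letters:
    print(letter.upper())

# build strings that combine upper and lower case, such as 'Aa'
pairs = [letter.upper() + letter for letter in letters]
for pair in pairs:
    print(pair)
```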
In [ ]:
Question 5.
For this problem xyz is the same as defined in problem 2 (parts 1, 2, 3, and 7).
xyz and prints them.
xyz and their type.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
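A short sketch of looping over `xyz`, assuming the definitions from problem 2. The list `item_types` is our own addition, included only to make the result easy to inspect.

```python
# Question 5 (sketch): loop over a list and report each element's type
x = 2
y = 3.0
z = "3.0"
xyz = [x, y, z]

# print the elements
for item in xyz:
    print(item)

# print the elements together with their types
for item in xyz:
    print(item, type(item))

# collect the type names in a list for a quick check
item_types = [type(item).__name__ for item in xyz]
print(item_types)
```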
In [ ]:
Question 6. Write code in the cell below that reads the csv file we posted at
http://pages.stern.nyu.edu/~dbackus/Data/debt.csv
Assign the contents of the file to the object debt.
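Pandas reads csv data with `pd.read_csv`, which accepts a URL, a file path, or a file-like object. So that the sketch runs without a network connection, it reads from an in-memory string; `sample_csv` is a hypothetical stand-in built from the first two years of the backup data given later in this notebook, and its column layout is an assumption about the posted file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the posted file (layout is an assumption)
sample_csv = """Year,ARG,DEU,GRC
2002,137.5,59.2,98.1
2004,106.0,64.6,94.9
"""

# read_csv works the same way on a file-like object as on a URL
sample = pd.read_csv(io.StringIO(sample_csv))
print(sample)

# With network access, the analogous call for this question would be:
# url = 'http://pages.stern.nyu.edu/~dbackus/Data/debt.csv'
# debt = pd.read_csv(url)
```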
In [ ]:
The rest of the questions in this notebook will refer to the object debt you create below.
In [ ]:
# if that failed, you can generate the same data with
data = {'ARG': [137.5, 106.0, 61.8, 47.0, 39.1, 37.3, 48.6],
'DEU': [59.2, 64.6, 66.3, 64.9, 80.3, 79.0, 73.1],
'GRC': [98.1, 94.9, 102.9, 108.8, 145.7, 156.5, 177.2],
'Year': [2002, 2004, 2006, 2008, 2010, 2012, 2014]}
debt = pd.DataFrame(data)
Question 7. Let's describe the object debt:
debt?
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
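As a rough sketch of what "describing" a DataFrame can look like, the cell below reports the type, dimensions, column names, and column dtypes of `debt`. Which of these the question has in mind is our assumption; the data are the backup values given earlier in this notebook.

```python
import pandas as pd

# backup data from the notebook
data = {'ARG': [137.5, 106.0, 61.8, 47.0, 39.1, 37.3, 48.6],
        'DEU': [59.2, 64.6, 66.3, 64.9, 80.3, 79.0, 73.1],
        'GRC': [98.1, 94.9, 102.9, 108.8, 145.7, 156.5, 177.2],
        'Year': [2002, 2004, 2006, 2008, 2010, 2012, 2014]}
debt = pd.DataFrame(data)

print(type(debt))           # kind of object
print(debt.shape)           # (rows, columns)
print(list(debt.columns))   # variable names
print(debt.dtypes)          # type of each column
```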
In [ ]:
Question 8. Do the following with debt:
Year as the index.
The next three get progressively more difficult:
Some simple plots:
Year using a plot method.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
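A minimal sketch of two of the steps mentioned above: setting `Year` as the index with `set_index` and plotting against it with the DataFrame `plot` method. The name `debt_by_year` is ours, and the data are the backup values from earlier in the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'ARG': [137.5, 106.0, 61.8, 47.0, 39.1, 37.3, 48.6],
        'DEU': [59.2, 64.6, 66.3, 64.9, 80.3, 79.0, 73.1],
        'GRC': [98.1, 94.9, 102.9, 108.8, 145.7, 156.5, 177.2],
        'Year': [2002, 2004, 2006, 2008, 2010, 2012, 2014]}
debt = pd.DataFrame(data)

# use Year as the index (returns a new DataFrame)
debt_by_year = debt.set_index('Year')

# line plot of every remaining column against the index
ax = debt_by_year.plot()
ax.set_title('Government debt')   # illustrative title, our choice
```

Note that `set_index` leaves `debt` itself unchanged unless you assign the result back to it.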
In [ ]:
Question 9.
plt.subplots().
debt data and the axis object we just created.
['red', 'green', 'blue'].
plotting a separate line applied to the same axis object.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
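The pieces above fit together roughly as follows: create figure and axis objects with `plt.subplots()`, then apply plot methods to the axis. The sketch shows two variants under our own assumptions, one that passes the axis and the color list to the DataFrame `plot` method and one that loops over the countries; the names `fig2`, `ax2`, `countries`, and `colors` are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'ARG': [137.5, 106.0, 61.8, 47.0, 39.1, 37.3, 48.6],
        'DEU': [59.2, 64.6, 66.3, 64.9, 80.3, 79.0, 73.1],
        'GRC': [98.1, 94.9, 102.9, 108.8, 145.7, 156.5, 177.2],
        'Year': [2002, 2004, 2006, 2008, 2010, 2012, 2014]}
debt = pd.DataFrame(data)
countries = ['ARG', 'DEU', 'GRC']
colors = ['red', 'green', 'blue']

# Variant 1: one call, using the axis object and a list of colors
fig, ax = plt.subplots()
debt.set_index('Year').plot(ax=ax, color=colors)

# Variant 2: a loop that plots one line per country on the same axis
fig2, ax2 = plt.subplots()
for country, color in zip(countries, colors):
    ax2.plot(debt['Year'], debt[country], color=color, label=country)
ax2.legend()
```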
In [ ]:
Question 10. In the figure of the previous question:
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
In [ ]:
Question 11. We ran across this one in the OECD healthcare data. The country names had numbers appended, which served as footnotes in the original spreadsheet but looked dumb when we used them as index labels. The question is how to eliminate them. A short version of the country names is
names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
Do each of these in a separate code cell:
us = names[-1] and call the rsplit() method on us. What do you get?
rsplit to split us into two pieces, the country name and the number 1. How would you extract just the country name?
names.
names.
In each case, create a code cell that delivers the answer. Please write the question number in a comment in each cell.
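A sketch of how `rsplit` behaves on these names. With no arguments it splits on every run of whitespace; with `maxsplit=1` it splits only once, starting from the right. The names `pieces`, `country`, and `clean_names` are ours, and applying the same split to the whole list is one possible approach rather than the intended answer.

```python
names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
us = names[-1]

# no arguments: split on all whitespace
print(us.rsplit())

# split once from the right: country name and footnote number
pieces = us.rsplit(maxsplit=1)
country = pieces[0]
print(pieces, country)

# apply the same idea to every name with a list comprehension
clean_names = [name.rsplit(maxsplit=1)[0] for name in names]
print(clean_names)
```

Splitting from the right matters for multi-word names: it keeps 'United States' together, which a plain split on the first space would not.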
In [ ]: