Assignment 1 - Intro to Data Science

Student Name
October 10, 2017

This assignment covers introductory analysis exercises and the basic Python operations you need to start building data science notebooks.

The objectives of this assignment are:

  • Review of data science project lifecycle
  • Review of data science project roles
  • Simple analysis exercises
  • Experience in the Jupyter notebook development environment
  • Practice using Python for data analysis
  • Crash course in Markdown document formatting

Question 1 - Project Lifecycle

Write a short description of each of the following project activities:

1: Problem Statement

2: Data Profile & Discovery

3: Hypothesis Statement

4: Data Preparation

5: Model Development

6: Model Evaluation

Question 2 - Project Roles

Briefly describe each of the following roles in a Data Science project:

1: Project Manager

2: Domain Expert

3: Data Engineer

4: Data Scientist

5: Graphic Designer

Question 3 - Comparison between Business Intelligence & Data Science

a. List some common objectives and activities in a Business Intelligence Project:

  • Response 1
  • Response 2
  • Response 3
  • Response 4

b. List some common objectives and activities in a Data Science Project:

  • Response 1
  • Response 2
  • Response 3
  • Response 4

Question 4: Simple Data Analysis - World Population Demographics

Using the World Bank - Population Data Site, complete the following analysis tasks: (You may use any tools for this exercise)

a. Identify the top three countries with the largest population growth in the past five years.

  • Country 1
  • Country 2
  • Country 3

b. Identify the top three countries with the lowest (or negative) population growth in the past five years.

  • Country 1
  • Country 2
  • Country 3

Question 5 - Python list objects

For each item below, enter the code to complete the operation:


In [14]:
# a  Take the list [2, 3, 4] and multiply each element by 3 to get [6, 9, 12]
a = [2, 3, 4]

In [10]:
# b  Return count of 'white' in list
colors = ['red', 'white', 'blue', 'white', 'purple', 'brown', 'white']

In [11]:
# c  Add value 'green' to color list below
colors = ['red', 'white', 'blue', 'white', 'purple', 'brown', 'white']
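
The three list operations in this question can be sketched as follows. This is one possible approach, not the only accepted answer; the names `tripled` and `white_count` are illustrative and do not come from the assignment.

```python
# Sketch only: `tripled` and `white_count` are illustrative names.

# a  Multiply each element by 3. Note that `a * 3` would repeat the
#    list three times, so a list comprehension is used instead.
a = [2, 3, 4]
tripled = [x * 3 for x in a]
print(tripled)

# b  Count how many times 'white' appears in the list.
colors = ['red', 'white', 'blue', 'white', 'purple', 'brown', 'white']
white_count = colors.count('white')
print(white_count)

# c  Append 'green' to the end of the list (modifies the list in place).
colors.append('green')
print(colors)
```

The comprehension is worth noticing because Python lists do not support element-wise arithmetic; the `*` operator on a list means repetition.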

Question 6 - Python dictionaries

For each item below, enter the code to complete the operation:


In [ ]:
# a  Add an additional entry 'c': 30 to the following dictionary
tens = {'a': 10, 'b': 20}

In [ ]:
# b  Print the value of 'b'
tens = {'a': 10, 'b': 20, 'c': 30}

In [ ]:
# c Merge the following two dictionaries into a single new dictionary
t1 = {'a': 100, 'b': 200}
t2 = {'c': 300, 'd': 400}
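
A minimal sketch of the dictionary operations in this question, assuming string keys such as 'a' and 'b'. The name `merged` is illustrative.

```python
# Sketch only: assumes string keys; `merged` is an illustrative name.

# a  Add a new key/value pair by assigning to the key.
tens = {'a': 10, 'b': 20}
tens['c'] = 30

# b  Look up a value by its key.
print(tens['b'])

# c  Merge two dictionaries into a new one, leaving the originals intact.
#    If a key appeared in both, the value from t2 would win.
t1 = {'a': 100, 'b': 200}
t2 = {'c': 300, 'd': 400}
merged = {**t1, **t2}
print(merged)
```

Unpacking with `**` builds a new dictionary, whereas `t1.update(t2)` would modify `t1` in place.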

Question 7 - Python loops

For each item below, enter the code to complete the operation:


In [19]:
# a  Using the range() function, write a short for loop to print numbers from 1 to 10:

In [20]:
# b  Write a short while loop to create a list of numbers from 1 to 10
a = []
# Loop
print(len(a))


0

In [21]:
# c  Write a short loop from 1 to 100. Print 'fizz' if the number is evenly divisible by 3, print 'buzz'
#    if the number is evenly divisible by 5, and print 'fizzbuzz' if number is divisible by both 3 and 5.
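
The loops in this question could be sketched as below. The exercise does not say what to print for numbers divisible by neither 3 nor 5; this sketch assumes the number itself is printed. The helper `fizzbuzz_label` and the list name `numbers` are illustrative.

```python
# Sketch only: `numbers` and `fizzbuzz_label` are illustrative names.

# a  range(1, 11) yields 1 through 10 because the stop value is excluded.
for i in range(1, 11):
    print(i)

# b  Build the list 1..10 with a while loop and an explicit counter.
numbers = []
n = 1
while n <= 10:
    numbers.append(n)
    n += 1
print(len(numbers))

# c  Check the combined case first; otherwise multiples of 15 would be
#    caught by the 'fizz' branch and never reach 'fizzbuzz'.
def fizzbuzz_label(n):
    if n % 3 == 0 and n % 5 == 0:
        return 'fizzbuzz'
    if n % 3 == 0:
        return 'fizz'
    if n % 5 == 0:
        return 'buzz'
    return str(n)  # assumption: other numbers are printed unchanged

for i in range(1, 101):
    print(fizzbuzz_label(i))
```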

Question 8 - Python date formatting

For each item below, enter the code to complete the operation:


In [25]:
# a Print the current date in the format: "Month, Day Year"
from datetime import date

In [32]:
# b Calculate the number of days between the following two dates:
from datetime import date
d1 = date(2016, 10, 3)
d2 = date(2017, 10, 3)

In [33]:
# c Print current date and time in the format: 'YYYY-MM-DD HH:MM:SS'
import datetime
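
A possible sketch of the date operations in this question, using only the standard `datetime` module. The format string for part a follows the "Month, Day Year" wording literally; the variable names are illustrative.

```python
# Sketch only: variable names are illustrative.
from datetime import date, datetime

# a  %B is the full month name, %d the zero-padded day, %Y the 4-digit year.
today = date.today()
formatted_date = today.strftime('%B, %d %Y')
print(formatted_date)

# b  Subtracting two dates gives a timedelta; .days is the whole-day count.
d1 = date(2016, 10, 3)
d2 = date(2017, 10, 3)
days_between = (d2 - d1).days
print(days_between)

# c  Current local date and time as 'YYYY-MM-DD HH:MM:SS'.
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print(timestamp)
```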

Question 9 - Data files

For each item below, enter the code to complete the operation:


In [42]:
# a complete the following to load the data file and print the number of rows
import csv
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)

In [44]:
# b complete the following to save the file to a local copy: (open('dataset.csv', 'w')) 
import urllib.request

url = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

In [45]:
# c complete the following to read a local file: (open('dataset.csv', 'r')) 
import csv
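
The file-handling steps in this question can be sketched without a network connection. The block below uses a small placeholder string in place of the downloaded text (in the notebook, `r.text` from `requests.get(url)` would play that role) and writes to a temporary directory rather than the working directory. The placeholder rows, the helper `count_rows`, and the choice to exclude the header from the row count are all assumptions made for illustration.

```python
# Sketch only: `sample_text` is placeholder data, not the real dataset.
import csv
import io
import os
import tempfile

def count_rows(csv_text):
    """Count data rows in CSV text, assuming the first line is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)

# a  In the notebook this would be count_rows(r.text).
sample_text = "col_a,col_b\n1,2\n3,4\n"
print(count_rows(sample_text))

# b  Save the text to a local file. newline='' stops Python from
#    translating line endings, as the csv module documentation recommends.
path = os.path.join(tempfile.mkdtemp(), 'dataset.csv')
with open(path, 'w', newline='') as f:
    f.write(sample_text)

# c  Read the local file back as a list of rows.
with open(path, 'r', newline='') as f:
    rows = list(csv.reader(f))
print(rows[0])
```

For part b, `urllib.request.urlretrieve(url, 'dataset.csv')` is an alternative that downloads and saves in one call.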

Question 10 - (Optional) - Probability Puzzle

  1. Single Die Game (You may use any tools for this exercise)

    • You are playing a game with one (1) standard six-sided die
    • You may roll three (3) times
    • You will receive a reward of the amount you roll: (1-6)
    • You cannot keep prior rolls
  2. What is the expected average payout you will receive playing this game?
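
One way to explore this puzzle is to compute the expected payout exactly by working backwards from the last roll. The sketch below rests on an interpretation the puzzle does not spell out: the player may stop after any roll and is paid the value of that roll, and rolling again forfeits the previous result. Under that reading, the best strategy is to keep a roll only when it is worth at least as much as the expected value of continuing. The function name `expected_payout` is illustrative.

```python
# Sketch only: assumes the player may stop after any roll and is paid
# the value of the roll they stop on; rerolling discards the prior roll.
from fractions import Fraction

def expected_payout(rolls_left, sides=6):
    """Expected payout under optimal stopping with `rolls_left` rolls."""
    faces = range(1, sides + 1)
    # With one roll left there is no choice: take the average face value.
    value = Fraction(sum(faces), sides)
    # Each earlier roll is kept only if it beats the value of continuing.
    for _ in range(rolls_left - 1):
        value = sum(max(Fraction(face), value) for face in faces) / sides
    return value

for k in (1, 2, 3):
    print(k, expected_payout(k), float(expected_payout(k)))
```

Using `Fraction` keeps the arithmetic exact, so the result can be checked by hand against the stopping rule at each stage.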

All Done