Now You Code 1: Data Analysis of Movie Goers

In this assignment you will perform a data analysis of people who go to the movies.

A movie theatre chain asked moviegoers to fill out a quick survey in exchange for a half-price ticket. The survey asked for basic demographics: age, gender, occupation, and zip code. The survey results are contained in the data file 'NYC1-moviegoers.csv'.

In this assignment you will write Python pandas code, spread across several cells, to answer some basic questions about the responses in the dataset.


In [1]:
# this turns off warning messages
import warnings
warnings.filterwarnings('ignore')

Part 1: Load the dataset

Write code to import pandas and load the dataset (in CSV format) into the variable moviegoers, then print a random sample of 5 people from the dataset.
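As a minimal sketch of the loading-and-sampling pattern (not the assignment solution), the cell below reads a small made-up CSV from an in-memory string instead of the real file. The rows, the variable names toy and picked, and the column name zip_code are assumptions for illustration; the actual file may use different column names.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk; these rows are made up.
csv_text = """age,gender,occupation,zip_code
24,M,technician,85711
53,F,other,94043
23,M,writer,32067
33,F,artist,15213
"""

# pd.read_csv accepts a file path or a file-like object.
# Reading zip codes as strings keeps any leading zeros intact.
toy = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})

# sample(n=...) returns n rows chosen at random, so the output
# will usually differ from run to run.
picked = toy.sample(n=2)
print(picked)
```

With the real file, the same call would take the file name as its first argument instead of the StringIO object.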


In [ ]:

Part 2: Gender distribution

How many males and females filled out our survey?

Write a single line of Python Pandas code to count the genders in the data set. (There should be M = 670, F = 273)

HINT: Select the gender column then use a built-in series method to count the values in the series.
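One built-in Series method that fits this hint is value_counts(). The sketch below applies it to an unrelated toy Series (the name colors and its values are made up) just to show the shape of the result.

```python
import pandas as pd

# Toy data, unrelated to the survey.
colors = pd.Series(["red", "blue", "red", "green", "red"])

# value_counts() returns a Series indexed by each distinct value,
# holding the number of times that value occurs.
counts = colors.value_counts()
print(counts)
```

Selecting a single column from a DataFrame, for example df['some_column'], also gives a Series, so the same method applies there.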


In [ ]:

Part 3: People without jobs

Who are the survey respondents without jobs?

Write Python Pandas code to create a variable no_occupation which filters the moviegoers dataset to only those survey respondents with an occupation of 'none'. (There should be 9 people.)
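Filtering in pandas is usually done with a boolean mask. The sketch below shows the pattern on a made-up DataFrame, filtering for a different occupation value; the names toy, is_student, and students are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [19, 34, 22, 45],
    "occupation": ["student", "engineer", "student", "none"],
})

# Comparing a column to a value yields a boolean Series (the mask).
is_student = toy["occupation"] == "student"

# Indexing the DataFrame with the mask keeps only the True rows.
students = toy[is_student]
print(students)
print(len(students))
```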


In [ ]:

Part 4: Gender distribution of people without jobs

What is the gender distribution of the 9 respondents without jobs?

Write Python Pandas code to display this.

HINT: Use the variable no_occupation from the previous step.


In [ ]:

Part 5: Young Artists

Write Python Pandas code to display the count of respondents with an occupation of artist who are 21 and under. (There should be 5)

HINT: You can either assign each Pandas filter to a new DataFrame variable or try to chain the filters together. Also, display the rows before you try to count them.
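Both approaches from the hint can be sketched on toy data. The DataFrame below is made up, and the names toy, writers, step_by_step, and combined are hypothetical. Note that when two conditions are combined with &, each condition needs its own parentheses.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [17, 30, 20, 25, 21],
    "occupation": ["writer", "writer", "writer", "doctor", "doctor"],
})

# Approach 1: one filter per step, each stored in its own variable.
writers = toy[toy["occupation"] == "writer"]
step_by_step = writers[writers["age"] <= 21]

# Approach 2: both conditions in a single filter, joined with &.
combined = toy[(toy["occupation"] == "writer") & (toy["age"] <= 21)]

# Display the rows first, then count them.
print(combined)
print(len(combined))
```

Both approaches select the same rows here; the step-by-step version is often easier to debug because each intermediate result can be displayed.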


In [ ]:

Part 6: Distribution by age group

The movie theatre that conducted this survey prices its tickets by age group:

  • Youth (age 18 and under) $7.50

  • Adult (age 19 to 55) $12.50

  • Senior (age 56 and up) $8.50

Write Python code to count the number of moviegoers in each of these age groups.

Your counts should be as follows:

Adult     837
Youth      54
Senior     52

HINT: You must perform feature engineering. Create a new column 'age_group' and use the 'age' column to assign one or more values to the age group. After you create the column and set the values get a count of values for the 'age_group' column.
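One way to approach this kind of feature engineering is to fill the new column with a default label and then overwrite it for the rows that match each condition, using .loc. The sketch below uses made-up ages and the hypothetical variable names toy and group_counts; it is one possible approach under those assumptions, and pd.cut would be another option.

```python
import pandas as pd

# Made-up ages chosen to include the boundary values 18, 19, 55, and 56.
toy = pd.DataFrame({"age": [12, 18, 19, 40, 55, 56, 70]})

# Start every row with a default label.
toy["age_group"] = "Adult"

# .loc[row_mask, column] assigns only to the rows where the mask is True.
toy.loc[toy["age"] <= 18, "age_group"] = "Youth"
toy.loc[toy["age"] >= 56, "age_group"] = "Senior"

# Count how many rows fall into each group.
group_counts = toy["age_group"].value_counts()
print(group_counts)
```

Including the boundary ages in a small test frame like this makes it easy to check that each cutoff lands in the intended group.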


In [ ]:

Step 7: Questions

  1. Pandas programs are different from typical Python programs. Explain the process by which you arrived at your final solution.

Answer:

  2. What was the most difficult aspect of this assignment?

Answer:

Step 8: Reflection

Reflect upon your experience completing this assignment. This should be a personal narrative, in your own voice, that cites specifics relevant to the activity to help the grader understand how you arrived at the code you submitted. Things to consider touching upon: Elaborate on the process itself. Did your original problem analysis work as designed? How many iterations did you go through before you arrived at the solution? Where did you struggle along the way and how did you overcome it? What did you learn from completing the assignment? What do you need to work on to get better? What was most valuable and least valuable about this exercise? Do you have any suggestions for improvements?

To make a good reflection, you should journal your thoughts, questions and comments while you complete the exercise.

Keep your response to between 100 and 250 words.

--== Write Your Reflection Below Here ==--