Data Bootcamp Entry Poll

Experiments with the results of the Data Bootcamp entry poll.

This IPython notebook was created by Dave Backus for the NYU Stern course Data Bootcamp.

Import packages


In [5]:
import pandas as pd             # data package
import matplotlib.pyplot as plt # graphics 
import sys                      # system module, used to get Python version 
import datetime as dt           # date tools, used to note current date  

print('\nPython version: ', sys.version) 
print('Pandas version: ', pd.__version__)
print("Today's date:", dt.date.today())


Python version:  3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Pandas version:  0.18.0
Today's date: 2016-04-05

Data input and description


In [6]:
url1 = 'http://pages.stern.nyu.edu/~dbackus/Data/'
url2 = 'Data-Bootcamp-entry-poll_s16.csv'
url = url1 + url2
file = url2

ep = pd.read_csv(url, header=0)
print('Dimensions:', ep.shape)


Dimensions: (104, 11)
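
The cell above reads the csv file from the web each time it runs and defines a `file` name that is not used afterwards. A minimal sketch of one way to use it, assuming we want to keep a local copy so the notebook still runs offline: read the local file if it exists, otherwise read the url and save a copy. The function name `read_cached_csv` is hypothetical, not part of pandas or the original notebook.

```python
import os
import pandas as pd

def read_cached_csv(url, path):
    """Read `path` if it exists; otherwise read `url` and save a copy to `path`.

    Hypothetical helper for illustration only.
    """
    if os.path.exists(path):
        return pd.read_csv(path, header=0)
    df = pd.read_csv(url, header=0)   # falls back to the web version
    df.to_csv(path, index=False)      # keep a local copy for next time
    return df

# possible usage with the names defined in the cell above:
# ep = read_cached_csv(url, file)
```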

In [10]:
# rename variables and print dtypes 
variables = ['time', 'program', 'career', 'programming', 'stats', 
             'media', 'other', 'major', 'data', 'why', 'topics']
variables = [var.title() for var in variables]             
ep.columns = variables     
ep.dtypes


Out[10]:
Time            object
Program         object
Career          object
Programming     object
Stats           object
Media           object
Other          float64
Major           object
Data            object
Why             object
Topics          object
dtype: object
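
The listing shows that Time has dtype object, which means the timestamps are stored as strings. A minimal sketch of converting them to dates, assuming timestamps in month/day/year hour:minute form such as '2/10/2016 11:35'. The three values below are stand-ins for the Time column, used so that the example runs on its own.

```python
import pandas as pd

# stand-in for ep['Time']: strings in month/day/year hour:minute form
times = pd.Series(['2/10/2016 11:35', '1/28/2016 14:13', '2/24/2016 15:36'])

# parse the strings into timestamps using an explicit format
parsed = pd.to_datetime(times, format='%m/%d/%Y %H:%M')

print(parsed.dtype)             # a datetime64 dtype rather than object
print(parsed.dt.month.tolist()) # month of each response
```

In the notebook the same call would be applied to ep['Time'], after which the `.dt` accessor gives the date, month, or hour of each response.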

In [9]:
# summarize results
for var in list(ep):
    print('\n', var, '\n', ep[var].value_counts().head(5), sep='')


Time
2/10/2016 11:35    2
1/28/2016 14:13    2
2/24/2016 15:36    1
2/10/2016 15:28    1
1/26/2016 14:32    1
Name: Time, dtype: int64

Program
MBA                       46
Undergraduate business    36
Other undergraduate       13
Old Fellow                 2
auditing                   1
Name: Program, dtype: int64

Career
Finance                             34
Technology (Google, Amazon, etc)    26
Consulting                          22
Marketing                            5
Wine Drinking                        2
Name: Career, dtype: int64

Programming
None                                                              52
I have taken one programming course                               32
I have taken many courses and forgotten most of what I learned    12
I have taken many courses and/or have extensive experience         7
Name: Programming, dtype: int64

Stats
I have taken one probability or statistics course                 67
I have taken many courses and forgotten most of what I learned    20
I have taken many courses and/or have extensive experience        14
None                                                               2
Name: Stats, dtype: int64

Media
None                          46
Twitter                       27
Facebook                       6
Twitter, Blog (RSS) reader     5
Blog (RSS) reader              4
Name: Media, dtype: int64

Other
Series([], Name: Other, dtype: int64)

Major
Finance                     36
Analytics or other quant    14
Marketing                    9
Management                   8
Economics                    5
Name: Major, dtype: int64

Data
I don't know.                                1
Not a clue, but I hope to find out.          1
Income inequality, educational attainment    1
Sports, finance, customer engagement         1
People Analytics                             1
Name: Data, dtype: int64

Why
To help with my career                  67
I heard it was fun                      20
I lost my mind for a minute              7
like data analysis and programming       1
Great marketing by Chase and Spencer     1
Name: Why, dtype: int64

Topics
Web scraping                                       22
Multivariate regression                            11
None, I'd prefer to focus on fundamentals.         10
Natural language processing                        10
Web scraping, Maps, Natural language processing     6
Name: Topics, dtype: int64
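
In the Media and Topics summaries, respondents could choose more than one option, so value_counts treats each combination (for example 'Twitter, Blog (RSS) reader') as its own category. A minimal sketch of counting the individual options instead, assuming options are separated by a comma and a space. The five responses below are stand-ins for the Media column, taken from the categories printed above.

```python
import pandas as pd

# stand-in for ep['Media']: one string per respondent, options joined by ', '
media = pd.Series(['Twitter',
                   'Twitter, Blog (RSS) reader',
                   'Blog (RSS) reader',
                   'Facebook',
                   'None'])

# one 0/1 indicator column per option, then add up each column
indicators = media.str.get_dummies(sep=', ')
counts = indicators.sum().sort_values(ascending=False)
print(counts)
```

This approach needs care with Topics, where one option ("None, I'd prefer to focus on fundamentals.") contains a comma itself and would be split in two.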

In [23]:
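# flag responses that contain 'one'; multiplying by 1 converts True/False to 1/0
# note: the substring 'one' also matches the response 'None'
ep['Stats'].str.contains('one', na=False).head(10)*1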


Out[23]:
0    1
1    1
2    1
3    1
4    0
5    1
6    0
7    1
8    0
9    0
Name: Stats, dtype: int32
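
The expression above combines three steps: str.contains tests each response for a substring, na=False treats missing responses as False, and multiplying by 1 turns the booleans into zeros and ones. A minimal sketch of the same idea on a few stand-in responses, assuming the goal is to flag people who have taken exactly one course. It also shows why the search string matters: 'one' is a substring of 'None', while 'taken one' is not.

```python
import pandas as pd

# stand-in for ep['Stats'], including a missing response
stats = pd.Series(['I have taken one probability or statistics course',
                   'I have taken many courses and forgotten most of what I learned',
                   'None',
                   None])

loose = stats.str.contains('one', na=False)         # also matches 'None'
strict = stats.str.contains('taken one', na=False)  # matches the one-course answer only

loose_flags = loose.astype(int)    # same effect as multiplying by 1
strict_flags = strict.astype(int)
share = strict.mean()              # fraction of responses flagged

print(loose_flags.tolist())
print(strict_flags.tolist())
print(share)
```

Taking the mean of a boolean series is a convenient way to get the share of respondents in a category.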
