Intro to Data Science: Final Project 1

Analyzing the NYC Subway Dataset

Section 3. Visualization

Import Data


import operator

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats as st
import statsmodels.api as sm
import scipy.optimize as op

from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.linear_model import Ridge
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

import matplotlib.pyplot as plt
%matplotlib inline

filename = '/Users/excalibur/py/nanodegree/intro_ds/final_project/improved-dataset/turnstile_weather_v2.csv'

# load the turnstile and weather data into a DataFrame
data = pd.read_csv(filename)

Functions for Basic Statistics and Learning

Extract Relevant Data

Class for Creating Data Samples

Formulas Implemented (i.e., not included in modules/packages)

Class for Creating Learners

3.1 One visualization should contain two histograms: one of ENTRIESn_hourly for rainy days and one of ENTRIESn_hourly for non-rainy days.
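One way to build the two histograms is sketched below. It is a minimal illustration, not necessarily the plot used in this project. It assumes the DataFrame has a 0/1 column named `rain` (an assumption about the CSV schema), and the function name `plot_rain_histograms`, the bin count, and the `max_entries` cutoff are all hypothetical choices made for readability.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_rain_histograms(df, bins=50, max_entries=6000):
    """Overlay histograms of ENTRIESn_hourly for rainy and non-rainy rows.

    Assumes `df` has a 0/1 `rain` column (an assumption about the schema).
    `max_entries` is an arbitrary display cutoff; larger values are not drawn.
    """
    rainy = df.loc[df['rain'] == 1, 'ENTRIESn_hourly']
    dry = df.loc[df['rain'] == 0, 'ENTRIESn_hourly']

    # Shared bin edges so the two distributions are directly comparable.
    edges = np.linspace(0, max_entries, bins + 1)

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.hist(dry, bins=edges, alpha=0.6,
            label='No rain (n={})'.format(len(dry)))
    ax.hist(rainy, bins=edges, alpha=0.6,
            label='Rain (n={})'.format(len(rainy)))
    ax.set_xlabel('ENTRIESn_hourly')
    ax.set_ylabel('Frequency')
    ax.set_title('Hourly entries: rain vs. no rain')
    ax.legend()
    return ax
```

Calling `plot_rain_histograms(data)` on the DataFrame loaded above would draw both histograms on one set of axes. The bars show raw counts, so the group with more rows will look taller overall; passing `density=True` to each `ax.hist` call is one option if the shapes rather than the counts are of interest.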

3.2 One visualization can be more freeform.
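As one possible freeform view, the sketch below plots mean ridership by hour of day. It is only an illustration of the idea, not the visualization chosen in this project. It assumes an integer `hour` column exists in the DataFrame (an assumption about the CSV schema), and the helper names `mean_entries_by_hour` and `plot_mean_entries_by_hour` are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt


def mean_entries_by_hour(df):
    """Return the mean ENTRIESn_hourly for each value of `hour`.

    Assumes `df` has an `hour` column (an assumption about the schema).
    """
    return df.groupby('hour')['ENTRIESn_hourly'].mean()


def plot_mean_entries_by_hour(df):
    """Plot mean ENTRIESn_hourly against hour of day as a marked line."""
    hourly = mean_entries_by_hour(df)

    fig, ax = plt.subplots(figsize=(10, 6))
    # Markers show which hours actually appear in the data.
    ax.plot(hourly.index, hourly.values, marker='o')
    ax.set_xlabel('Hour of day')
    ax.set_ylabel('Mean ENTRIESn_hourly')
    ax.set_title('Mean hourly entries by hour of day')
    return ax
```

Calling `plot_mean_entries_by_hour(data)` would produce the chart. A line with markers is used instead of bars so the plot does not depend on how the hours are spaced in the data.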