In [ ]:
"""This area sets up the Jupyter environment.
Please do not modify anything in this cell.
"""
import os
import sys
import time
# Add project to PYTHONPATH for future use
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# Import miscellaneous modules
from IPython.core.display import display, HTML
# Set CSS styling
with open('../admin/custom.css', 'r') as f:
style = """<style>\n{}\n</style>""".format(f.read())
display(HTML(style))
Public bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bicycle at one location and return it at another. Currently, there are about 500 bike-sharing systems around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.
Apart from the interesting real-world applications of bike-sharing systems, the data they generate makes them attractive for research as well. As opposed to other transport services, such as buses or the subway, the duration of travel and the departure and arrival positions are explicitly recorded. This feature turns a bike-sharing system into a virtual sensor network that can be used for sensing mobility in a city. Hence, it is expected that significant events in a city could be detected by monitoring these data.
The bike-sharing rental process is highly correlated with environmental and seasonal settings. For instance, weather conditions, precipitation, day of the week, season, and hour of the day can all affect rental behaviour. The core dataset is a two-year historical log (2011 and 2012) from the Capital Bikeshare system (Washington D.C., USA), which is publicly available at http://capitalbikeshare.com/system-data. The data was aggregated both hourly and daily and then combined with weather and seasonal information; the weather information was extracted from http://www.freemeteo.com.
We have already standardised some of the features, i.e. zero mean and unit variance.
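For reference, standardising a feature $x$ with mean $\mu$ and standard deviation $\sigma$ amounts to:
$$ \begin{equation*} x' = \frac{x - \mu}{\sigma} \end{equation*} $$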
Predict the daily bicycle rental count based on the environmental and seasonal settings.
day.csv - Bike-sharing counts aggregated on a daily basis (731 days)
Features:
- instance: record index
- dteday : date
- season : season (1: spring, 2: summer, 3: fall, 4: winter)
- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- holiday : whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit : weather situation
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are divided by 41 (max)
- atemp : Normalized feeling temperature in Celsius. The values are divided by 50 (max)
- hum : Normalized humidity. The values are divided by 100 (max)
- windspeed : Normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
This dataset was created and preprocessed in:
[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.
In [ ]:
# Plots will be shown inside the notebook
%matplotlib notebook
import matplotlib.pyplot as plt
# High-level package for creating and training artificial neural networks
import keras
# NumPy is a package for manipulating N-dimensional array objects
import numpy as np
# Pandas is a data analysis package
import pandas as pd
import admin.tools as tools
import problem_unittests as tests
Load features for training:
In [ ]:
train_features = tools.load_csv_with_dates('resources/bike_training_features.csv', 'dteday')
Load targets for training:
In [ ]:
train_targets = tools.load_csv_with_dates('resources/bike_training_targets.csv', 'dteday')
Load features for testing:
In [ ]:
test_features = tools.load_csv_with_dates('resources/bike_test_features.csv', 'dteday')
Load targets for testing:
In [ ]:
test_targets = tools.load_csv_with_dates('resources/bike_test_targets.csv', 'dteday')
test_dates = test_targets.index.strftime('%b %d')
print('\n', test_targets.head(n=5))
Unpack the Pandas DataFrames to NumPy arrays:
In [ ]:
# Unpack features
X_train = train_features.values
X_test = test_features.values
# Unpack targets
y_train = train_targets['cnt'].values
y_test = test_targets['cnt'].values
# Record number of inputs and outputs
nb_features = X_train.shape[1]
nb_outputs = 1
Now, using Keras, we will build a multivariate regression model. Remember, these kinds of models can be represented as artificial neural networks, which is why we can implement them using Keras.
The model, an artificial neural network, will consist of a $d$-dimensional input that is fully- or densely-connected to a single output neuron.
The model will be made using the Keras functional API, which allows us to create complex models with an arbitrary number of input and output neurons. Below is some example code for how to set up a simple model using this API with 32 inputs and 4 outputs:
from keras.models import Model
from keras.layers import Input, Dense
a = Input(shape=(32,))
b = Dense(4)(a)  # the Dense layer is called on the input tensor
model = Model(inputs=a, outputs=b)
Notice how this is the same setup we used in the previous notebook on linear regression. Make sure to revisit that notebook if you have trouble understanding the basic usage of this API.
In [ ]:
# Import what we need
from keras.layers import (Input, Dense)
from keras.models import Model
def simple_model(nb_inputs, nb_outputs):
    """Return a Keras Model with nb_inputs inputs densely connected to nb_outputs outputs.
    """
    # TODO: build and return the model using Input, Dense, and Model
    model = None
    return model
### Do *not* modify the following line ###
# Test and see that the model has been created correctly
tests.test_simple_model(simple_model)
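If you get stuck, below is a minimal sketch of one possible completion, using the same functional API pattern as the example above; the name `simple_model_sketch` is illustrative, and your own solution may differ.
def simple_model_sketch(nb_inputs, nb_outputs):
    """A possible completion (sketch): one dense layer mapping inputs to outputs."""
    # Define an input tensor with nb_inputs features
    inputs = Input(shape=(nb_inputs,))
    # Densely connect the input to nb_outputs output neurons
    outputs = Dense(nb_outputs)(inputs)
    # Wrap the input and output tensors in a Model
    return Model(inputs=inputs, outputs=outputs)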
As opposed to standard model parameters, such as the weights in a linear model, hyperparameters are user-specified parameters not learned by the training process, i.e. they are specified a priori. In the following section we will look at how we can define and evaluate a few different hyperparameters relevant to our previously defined model. The hyperparameters we will take a look at are: the learning rate, the number of epochs, and the batch size.
One of the ultimate goals of machine learning is for our models to generalise well. That is, we would like the performance of our model on the data we have trained on, i.e. the in-sample error, to be representative of the performance of our model on the data we are attempting to model, i.e. the out-of-sample error. Unfortunately, for most problems we are unable to test our model on all possible data that we have not trained on. This might be due to difficulties gathering new data or simply because the amount of possible data is very large.
For this reason, we have to settle for a different solution when we want to evaluate our trained models. The go-to solution is to gather a second set of data, in addition to the training set, called a test set. For the test set to be useful it is important that it is representative of the data we have not trained on. In other words, the error we get on the test set should be close to the out-of-sample error.
Selecting appropriate hyperparameters can be seen as a sort of meta-optimisation task on top of the learning task. Now, we could train a model several times, alter some hyperparameters each time, and record the final performance on the test set; however, this will likely yield errors that are overly optimistic. This is because looking at the test set when making learning choices, i.e. selecting hyperparameters, introduces bias and causes the estimated out-of-sample error to diverge from the true out-of-sample error. Remember, this is the reason why we have a test set in the first place.
The solution to this problem is to create a third set: the validation set. This is typically a partition of the training set, although there exist several cross-validation methodologies for how to create and use validation sets efficiently. By having this third set we can: (i) use the training set to train the trainable model parameters, (ii) use the validation set to select hyperparameters, and (iii) use the test set to estimate the out-of-sample error. This split ensures that the test set remains unbiased. A concrete holdout split is sketched below.
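As an illustration, here is a minimal sketch of a holdout split into training, validation, and test partitions, assuming NumPy arrays `X` and `y` of equal length (the fractions and function name are arbitrary examples, and `np` is the NumPy alias imported above). Note that Keras can carve out the validation partition for us via the `validation_split` argument to `fit()`, which is what we use later in this notebook.
def three_way_split(X, y, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle and partition data into train/validation/test sets (sketch)."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    nb_test = int(len(X) * test_frac)
    nb_val = int(len(X) * val_frac)
    test_idx = idx[:nb_test]
    val_idx = idx[nb_test:nb_test + nb_val]
    train_idx = idx[nb_test + nb_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))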
As we saw in the previous notebook, the learning rate is an important hyperparameter that decides how large a step we take in the negative gradient direction during gradient descent-based optimisation.
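Concretely, for trainable weights $\mathbf{w}$, error $E$, and learning rate $\eta$, a single gradient descent update takes the form:
$$ \begin{equation*} \mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E(\mathbf{w}) \end{equation*} $$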
In order to select a good learning rate it is paramount that we track the error / loss / cost during training after each application of the gradient descent update rule. Below is a cartoon diagram illustrating the loss over the course of training. The shape of the error curve as training progresses can give a good indication as to what constitutes a good learning rate.
Validation error refers to the error taken over a validation set on the current model.
In artificial neural network terminology, one epoch typically means that every example in the training set has been seen once by the learning algorithm. It is generally preferable to track the number of epochs as opposed to the number of iterations, i.e. applications of an update rule, because the latter depends on the batch size.
In the literature, iteration is sometimes used synonymously with epoch.
As we saw in the previous notebook, we typically sum over multiple examples for a single application of an update rule. The number of examples we include is the batch size.
The batch size allows us to control how much memory we need during training because we only need to sample the examples for a single batch at a time. This is important when the entire dataset cannot fit in memory. The important thing to keep in mind when it comes to batch size is that the smaller the batch size, the less accurate the estimate of the gradient over the training set will be. In other words, the moves made by the update rule in the space of all trainable parameters become noisier the smaller the batch size is. The arithmetic relating epochs, iterations, and batch size is sketched below.
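To make the relationship concrete, here is a small sketch; the number of training examples is a made-up figure for illustration only:
import math

nb_examples = 600  # hypothetical training set size (illustration only)
batch_size = 10
# One iteration = one application of the update rule on a single batch;
# one epoch = one full pass over the training set.
iterations_per_epoch = math.ceil(nb_examples / batch_size)
print(iterations_per_epoch)  # 60 update steps per epoch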
Make sure you understand most of the code below before you continue.
In [ ]:
"""Do not modify the following code. It is to be used as a refence for future tasks.
"""
# Create a simple model
model = simple_model(nb_features, nb_outputs)
#
# Define hyperparameters
#
lr = 0.2
nb_epochs = 10
batch_size = 10
# Fraction of the training data held as a validation set
validation_split = 0.1
# Define optimiser
optimizer = keras.optimizers.SGD(lr=lr)
# Compile model, use mean squared error
model.compile(loss='mean_squared_error', optimizer=optimizer)
# Print model
model.summary()
# Train and record history
logs = model.fit(X_train, y_train,
batch_size=batch_size,
epochs=nb_epochs,
validation_split=validation_split,
verbose=2)
# Plot the error
fig, ax = plt.subplots(1,1)
pd.DataFrame(logs.history).plot(ax=ax)
ax.grid(linestyle='dotted')
ax.legend()
plt.show()
# Estimation on unseen data can be done using the `predict()` function, e.g.:
_y = model.predict(X_test)
In this task you will get the opportunity to play with the hyperparameters we discussed in the previous section.
In [ ]:
# Create a simple model
model = None
#
# Define hyperparameters
#
lr = 0.2
nb_epochs = 10
batch_size = 10
# Fraction of the training data held as a validation set
validation_split = 0.1
# Define optimiser
# Compile model, use mean squared error
### Do *not* modify the following lines ###
# Print model
model.summary()
# Train our network and do live plots of loss
tools.assess_multivariate_model(model, X_train, y_train, X_test, y_test,
test_dates, nb_epochs, batch_size,
validation_split
)
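If you are unsure how to fill in the missing lines, the reference cell above shows the pattern; a minimal sketch, using the same hyperparameter names, might look like the following (choosing good values is the actual exercise):
# Create the model (sketch)
model = simple_model(nb_features, nb_outputs)
# Define optimiser (sketch)
optimizer = keras.optimizers.SGD(lr=lr)
# Compile model, use mean squared error (sketch)
model.compile(loss='mean_squared_error', optimizer=optimizer)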
Regularisation is any modification made to a learning algorithm intended to reduce the generalisation error, i.e. the expected value of the error on an unseen example, but not the training error. Typically, this is interpreted as adjusting the complexity of the model by adding a regularisation term, or regulariser, to the error function that we minimise:
$$ \begin{equation*} \min_{h}\sum_{i=1}^{N}E(h(\mathbf{x}_i), y_i) + \lambda R(h) \end{equation*} $$
where $h$ is a hypothesis, $E$ is an error function, $R$ is the regulariser, and $\lambda$ is a parameter for controlling the strength of the aforementioned regulariser. There are other ways to control the model complexity as well, such as noise injection, data augmentation, and early stopping, but in this notebook we will focus on the type above.
In case you want to review regularisation, you can refer to the following material:
$L^2$ regularisation, otherwise known as weight decay, ridge regression, or Tikhonov regularisation, is a popular form of regularisation that penalises the squared norm of the model parameters. This is done by letting $R(h) = \frac{1}{2}\lVert\mathbf{w}\rVert_{2}^{2}$, which drives the weights towards the origin. Any point could be selected as the target, but the origin is a good choice if we do not know the correct value. Multiplying by a factor of $\frac{1}{2}$ simplifies the gradient of $R(h)$.
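In Keras, an $L^2$ penalty can be attached to a layer's weights through the `kernel_regularizer` argument; below is a minimal sketch, where the input dimension and regularisation factor are arbitrary example values:
from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(10,))  # 10 input features (example value)
# Penalise the squared L2 norm of this layer's weights
outputs = Dense(1, kernel_regularizer=regularizers.l2(0.01))(inputs)
model_l2 = Model(inputs=inputs, outputs=outputs)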
In [ ]:
# Import what we need
from keras import regularizers
def simple_model_l2(nb_inputs, nb_outputs, reg_factor):
    """Return an L2-regularised Keras Model.
    """
    # TODO: build and return the model, attaching an L2 regulariser to the Dense layer
    model = None
    return model
### Do *not* modify the following line ###
# Test and see that the model has been created correctly
tests.test_simple_model_regularized(simple_model_l2)
Now, with this model, let's try to optimise the regularisation factor $\lambda$. This adjusts the strength of the regulariser.
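Regularisation factors are usually explored on a logarithmic scale; the range below is only an example, not a recommendation:
# Candidate lambda values spaced evenly on a log scale (example range)
candidate_factors = np.logspace(-5, -1, num=5)  # 1e-05, 1e-04, ..., 1e-01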
In [ ]:
# Regularization factor (lambda)
reg_factor = 0.005
# Create a simple model
model = None
#
# Define hyperparameters
#
lr = 0.0005
nb_epochs = 100
batch_size = 128
# Fraction of the training data held as a validation set
validation_split = 0.1
# Define optimiser
# Compile model, use mean squared error
### Do *not* modify the following lines ###
# Print model
model.summary()
# Train our network and do live plots of loss
tools.assess_multivariate_model(model, X_train, y_train, X_test, y_test,
test_dates, nb_epochs, batch_size,
validation_split
)