Title: Defining Activation Functions Slug: activation-functions Summary: An Overview of Implementing Activation Functions in Your Own Neural Network Date: 2018-01-01 09:11 Category: Neural Networks Tags: Basics Authors: Thomas Pinder
Activation functions are an integral part of a neural network, mapping the weighted input to a range of outputs. It is through the use of an activation function that a neural network can model non-linear mappings, and consequently the choice of activation function is important. In this brief summary, the sigmoid, tanh, ReLU and softmax activation functions will be presented along with an implementation of each.
With all activation functions, not only is the function itself needed, but also its derivative, which is required for backpropagation. Some people prefer to define these as separate functions; however, I prefer to have both wrapped up in one function for conciseness. For all of the following activation functions, the NumPy library should be loaded, along with Matplotlib for plotting.
In [3]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
The sigmoid function was once the default choice of activation function when building a network, and to some extent it still is. It maps values into a range between 0 and 1, which means it lacks the beneficial quality of being zero-centered, a property that aids gradient descent during backpropagation.
In [4]:
def activation_sigmoid(x, derivative):
    sigmoid_value = 1 / (1 + np.exp(-x))
    if not derivative:
        return sigmoid_value
    else:
        return sigmoid_value * (1 - sigmoid_value)
When plotted over the range -5 to 5, the sigmoid gives the following shape.
In [5]:
x_values = np.arange(-5, 6, 0.1)
y_sigmoid = activation_sigmoid(x_values, derivative=False)
plt.plot(x_values, y_sigmoid)
Out[5]: [plot of the sigmoid activation over the input range]
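Since it is the derivative branch that backpropagation actually consumes, it can be worth checking it numerically. The snippet below is a minimal sketch using a central finite difference; the step size is an arbitrary illustrative choice.
# Sketch: compare the analytic sigmoid derivative with a central finite difference.
h = 1e-5
numeric = (activation_sigmoid(x_values + h, derivative=False) -
           activation_sigmoid(x_values - h, derivative=False)) / (2 * h)
analytic = activation_sigmoid(x_values, derivative=True)
print(np.allclose(numeric, analytic))  # expected to print True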
tanh is very similar in shape to the sigmoid; the defining difference is that tanh ranges from -1 to 1, making it zero-centered and consequently a very popular choice. Conveniently, tanh is pre-defined in NumPy, however it is still worthwhile wrapping it up in a function in order to define the derivative of tanh.
In [7]:
def activation_tanh(x, derivative):
    tanh_value = np.tanh(x)
    if not derivative:
        return tanh_value
    else:
        return 1 - tanh_value**2

y_tanh = activation_tanh(x_values, derivative=False)
plt.plot(x_values, y_tanh)
Out[7]: [plot of the tanh activation over the input range]
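The zero-centered property can be seen directly: over a symmetric set of inputs, sigmoid outputs average to around 0.5 while tanh outputs average to around 0. The symmetric_x array below is purely illustrative.
# Mean activation over a symmetric input range: sigmoid outputs are all
# positive, whereas tanh outputs are centered around zero.
symmetric_x = np.linspace(-5, 5, 101)
print(np.mean(activation_sigmoid(symmetric_x, derivative=False)))  # ~0.5
print(np.mean(activation_tanh(symmetric_x, derivative=False)))     # ~0.0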
The Rectified Linear Unit (ReLU) is another commonly used activation function, with a range from 0 to infinity. A major advantage of the ReLU function is that, unlike the sigmoid and tanh, its gradient does not vanish as the limits are approached. An additional benefit of the ReLU is its efficiency, as shown by Krizhevsky et al., who found that a network using the ReLU function reached the same training error around six times faster than an equivalent network using tanh.
In [9]:
def relu_activation(x, derivative):
    if not derivative:
        return x * (x > 0)
    else:
        # Return a new array rather than modifying x in place, so that
        # x_values is left untouched for later cells.
        return (x > 0).astype(float)
y_relu = relu_activation(x_values, derivative=False)
plt.plot(x_values, y_relu)
Out[9]: [plot of the ReLU activation over the input range]
It is probably worth noting that the leaky ReLU is a closely related function; the only difference is that negative values are not completely set to 0, but are instead multiplied by 0.01.
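A minimal sketch of the leaky variant, following the same function-plus-derivative pattern used above, could look as follows; the 0.01 slope (alpha) is the conventional default rather than a fixed requirement.
def leaky_relu_activation(x, derivative, alpha=0.01):
    # Negative inputs are scaled by alpha rather than being zeroed out.
    if not derivative:
        return np.where(x > 0, x, alpha * x)
    else:
        return np.where(x > 0, 1.0, alpha)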
The final function to be discussed is the softmax, a function typically used in the final layer of a network. The softmax function reduces the value of each neurone in the final layer to a value in the range 0 to 1, such that all values in the final layer sum to 1. The benefit of this is that, in a multi-class problem, the softmax function will assign a probability to each class, allowing deeper insight into the performance of the network to be obtained through metrics such as top-n error. Note that the softmax will sometimes be written with the omission of the subtraction of np.max(x); subtracting the maximum stabilises the function, as the exponent in the softmax can otherwise produce a value larger than the largest number a float can hold, resulting in an overflow.
In [11]:
def softmax_activation(x):
    # Subtracting the maximum keeps the exponent from overflowing.
    exponent = np.exp(x - np.max(x))
    softmax_value = exponent / np.sum(exponent, axis=0)
    return softmax_value
y_softmax = softmax_activation(x_values)
plt.plot(x_values, y_softmax)
print("The sum of all softmax probabilities can be confirmed as " + str(np.sum(y_softmax)))
Out[11]: [plot of the softmax output; the printed sum confirms the probabilities sum to 1]
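The effect of subtracting np.max(x) can be seen on inputs that are large enough to overflow the exponential; the values below are arbitrary and only serve to illustrate the point.
# Without the max subtraction, np.exp overflows and the result is nan;
# the stabilised version returns sensible probabilities.
large_x = np.array([1000.0, 1001.0, 1002.0])
naive = np.exp(large_x) / np.sum(np.exp(large_x))  # overflow warning, nan output
print(naive)
print(softmax_activation(large_x))                 # roughly [0.09, 0.24, 0.67]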
This brief discussion of the main activation functions used in neural networks should provide you with a good understanding of how each function works and the relationships between them. If in doubt, it is generally advisable to build your network using the ReLU function in the hidden layers and the softmax function in your final layer; however, it is often worth trialling different functions to be sure.
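To give a rough idea of how that advice fits together, a single forward pass with a ReLU hidden layer and a softmax output might look like the sketch below; the layer sizes and random weights are placeholders rather than a recommendation.
# Sketch of a forward pass: ReLU in the hidden layer, softmax on the output.
np.random.seed(0)
inputs = np.random.randn(4)       # 4 input features (arbitrary)
w_hidden = np.random.randn(4, 8)  # weights for a hidden layer of 8 units
w_output = np.random.randn(8, 3)  # weights mapping to 3 output classes
hidden = relu_activation(inputs @ w_hidden, derivative=False)
output = softmax_activation(hidden @ w_output)
print(output, np.sum(output))     # class probabilities summing to 1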