A regression algorithm tries to find the function that best maps inputs to outputs without overcomplicating the model.
How well is the algorithm working, and how do we find the best function? We want a function that is not biased towards the training data it learned from, and we don't want its predictions to vary wildly just because the real data differs slightly from the training set. In other words, we want it to generalize to unseen data.
Variance - indicates how sensitive a prediction is to the particular training set used
Bias - indicates the strength of the assumptions made about the training data
Cost function - used to evaluate each candidate solution
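In the code below, the cost for a single training point is the squared error between the model's prediction and the true value: $\text{cost}(w) = (y - f(x))^2$, which for the linear model $f(x) = wx$ becomes $(y - wx)^2$.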
In [1]:
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
learning_rate = 0.01 # Hyperparameters
training_epochs = 100
x_train = np.linspace(-1, 1, 101) # Dataset
y_train = 2 * x_train + np.random.randn(*x_train.shape) * 0.33
X = tf.placeholder(tf.float32) # tf placeholder nodes for input/output
Y = tf.placeholder(tf.float32)
w = tf.Variable(0.0, name="weights") # Weights variable
def model(X, w): # defines model as y = wX
    return tf.multiply(X, w)

y_model = model(X, w)
cost = tf.square(Y - y_model) # Cost function: squared error
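The GradientDescentOptimizer in the next cell minimizes this cost by repeatedly nudging $w$ against the gradient of the cost, scaled by the learning rate $\eta$: $w \leftarrow w - \eta \, \frac{\partial}{\partial w}(y - wx)^2 = w + 2\eta\,(y - wx)\,x$.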
In [2]:
# Defines the operation to be called on each iteration of the learning algorithm
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
sess = tf.Session() # Setup the tf Session and init variables
init = tf.global_variables_initializer()
sess.run(init)
for epoch in range(training_epochs): # Loop thru dataset multiple times
    for (x, y) in zip(x_train, y_train): # Loop thru each point in the dataset
        sess.run(train_op, feed_dict={X: x, Y: y})
w_val = sess.run(w) # Get final parameter value
sess.close() # Close the session
In [3]:
plt.scatter(x_train, y_train) # Plot the original data
y_learned = x_train*w_val
plt.plot(x_train, y_learned, 'r') # Plot the best-fit line
plt.show()
When a simple linear function won't fit the data, a polynomial function offers more flexibility. An $n$th-degree polynomial $f(x) = w_nx^n + \dots + w_1x + w_0$ reduces to a linear function when $n=1$.
In [4]:
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
learning_rate = 0.01 # Hyperparameters
training_epochs = 40
trX = np.linspace(-1, 1, 101) # Dataset based on 5th deg polynomial
num_coeffs = 6
trY_coeffs = [1, 2, 3, 4, 5, 6]
trY = 0
for i in range(num_coeffs):
    trY += trY_coeffs[i] * np.power(trX, i)
trY += np.random.randn(*trX.shape) * 1.5 # Add noise
plt.scatter(trX, trY)
plt.show()
In [6]:
X = tf.placeholder(tf.float32) # tf placeholder nodes for input/output
Y = tf.placeholder(tf.float32)
def model(X, w): # defines model as a 5th deg polynomial
    terms = []
    for i in range(num_coeffs):
        term = tf.multiply(w[i], tf.pow(X, i))
        terms.append(term)
    return tf.add_n(terms)

w = tf.Variable([0.] * num_coeffs, name="parameters") # Parameter vector initialized to zeros
y_model = model(X, w)
cost = tf.pow(Y - y_model, 2) # Cost function: squared error
In [9]:
# Defines the operation to be called on each iteration of the learning algorithm
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
sess = tf.Session() # Setup the tf Session and init variables
init = tf.global_variables_initializer()
sess.run(init)
for epoch in range(training_epochs): # Loop thru dataset multiple times
    for (x, y) in zip(trX, trY): # Loop thru each point in the dataset
        sess.run(train_op, feed_dict={X: x, Y: y})
w_val = sess.run(w) # Get final parameter value
print("5th deg polynomial coeffs:\n", w_val)
sess.close() # Close the session
In [8]:
plt.scatter(trX, trY) # Plot the original data
trY2 = 0
for i in range(num_coeffs): # Plot the result
    trY2 += w_val[i] * np.power(trX, i)
plt.plot(trX, trY2, 'r')
plt.show()
Polynomial regression isn't always the best choice; the goal should be to find the simplest function that represents the data well, and a 10th-degree polynomial is rarely that function. Regularization is a way of penalizing overly large parameter values so that the parameters that matter dominate the model.
When the regularization parameter $\lambda$ is zero, no regularization takes place. Larger values of $\lambda$ penalize parameters with larger norms more heavily, making an overly complicated model less flexible and helping to prevent overfitting.
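None of the cells above actually apply regularization, but it only takes a small change to the cost. The cell below is a minimal sketch of how an L2 (ridge) penalty could be added to the polynomial cost, reusing Y, y_model, w, and learning_rate from the earlier cells; reg_lambda is a hypothetical name and 0.01 a placeholder value for $\lambda$, not something used elsewhere in this notebook.
In [ ]:
reg_lambda = 0.01 # Placeholder regularization strength (lambda)
# Squared-error cost plus an L2 penalty on the parameter vector w
cost = tf.pow(Y - y_model, 2) + reg_lambda * tf.reduce_sum(tf.square(w))
# Same gradient-descent training op as before, now minimizing the regularized cost
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)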