Performing Linear Regression in TensorFlow

I gathered this data for current real estate listing prices in North Bergen from Zillow. Let's see if we can use it to develop a model for housing costs based on home size.


In [1]:
%matplotlib inline
#Typical imports
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import pandas as pd

# plots on fleek
matplotlib.style.use('ggplot')

In [2]:
# Read the housing data from the csv file into a pandas dataframe
# the names keyword allows us to name the columns,
# while the dtype sets the data type.

df = pd.read_csv('data/nb home sales.csv', names=['Square Feet', 'Price'],
                 dtype=np.float32)

In [3]:
# Display the dataframe 
df


Out[3]:
Square Feet Price
0 670.0 144900.0
1 2760.0 508000.0
2 2860.0 600000.0
3 503.0 139000.0
4 1575.0 435000.0
5 935.0 260000.0
6 680.0 229000.0
7 1593.0 559000.0
8 2552.0 475000.0
9 1008.0 275000.0
10 1060.0 200000.0
11 1976.0 459000.0

In [4]:
# Visualize the data as a scatter plot 
# with sq. ft. as the independent variable.
df.plot(x='Square Feet', y='Price', kind='scatter')


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x110f37160>

It seems a linear model could be appropriate in this case. How can we build it with TensorFlow?


In [5]:
# First we declare our placeholders
x = tf.placeholder(tf.float32, [None, 1]) 
y_ = tf.placeholder(tf.float32, [None, 1]) 

# Then our variables
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))

# And now we can make our linear model: y = Wx + b
y = tf.matmul(x, W) + b

# Finally we choose our cost function (SSE in this case)
cost = tf.reduce_sum(tf.square(y_-y))

And here's where all the magic will happen:


In [6]:
# Call tf's gradient descent function with a learning rate and instructions to minimize the cost
learn_rate = .0000000001
train = tf.train.GradientDescentOptimizer(learn_rate).minimize(cost)

# Prepare our data to be read into the training session. The data needs to match the 
# shape we specified earlier -- in this case (n, 1) where n is the number of data points.
xdata = np.asarray([[i] for i in df['Square Feet']])
y_data = np.asarray([[i] for i in df['Price']])

# Create a tensorflow session, initialize the variables, and run gradient descent
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        # This is the actual training step - feed_dict specifies the data to be read into
        # the placeholders x and y_ respectively.
        sess.run(train, feed_dict={x:xdata, y_:y_data})
    
    # Convert our variables from tensors to scalars so we can use them outside tf
    price_sqft = np.asscalar(sess.run(W))
    cost_0 = np.asscalar(sess.run(b))

print("Model: y = %sx + %s" % (round(price_sqft,2), round(cost_0,2)))


Model: y = 222.19x + 0.61

In [7]:
# Create the empty plot
fig, axes = plt.subplots()

# Draw the scatter plot on the axes we just created
df.plot(x='Square Feet', y='Price', kind='scatter', ax=axes)

# Create a range of x values to plug into our model
sqft = np.arange(500, 3000, 1)

# Plot the model
plt.plot(sqft, price_sqft*sqft + cost_0)
plt.show()