In this notebook, we modify the tensor-fied intro-to-TensorFlow notebook to use placeholder tensors and feed in data from a data set of millions of points. It is an adaptation of Jared Ostmeyer's Naked Tensor code.
In [1]:
import numpy as np
np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
tf.set_random_seed(42)
In [2]:
xs = np.linspace(0., 8., 8000000) # eight million points spaced evenly over the interval zero to eight
ys = 0.3*xs-0.8+np.random.normal(scale=0.25, size=len(xs)) # eight million labels given xs, m=0.3, b=-0.8, plus normally-distributed noise
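As a quick sanity check (a sketch added here, not part of the original notebook; m_check and b_check are names introduced for illustration), an ordinary least-squares fit via np.polyfit should recover values near the true m = 0.3 and b = -0.8 before any TensorFlow training:
m_check, b_check = np.polyfit(xs, ys, deg=1) # degree-1 polynomial fit, i.e., ordinary least squares
print(m_check, b_check) # expect values near 0.3 and -0.8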
In [3]:
fig, ax = plt.subplots()
data_subset = pd.DataFrame(list(zip(xs, ys)), columns=['x', 'y']).sample(n=1000) # scatter-plot a random thousand-point sample; all eight million points would swamp the plot
_ = ax.scatter(data_subset.x, data_subset.y)
In [4]:
m = tf.Variable(-0.5) # initial slope guess, deliberately far from the true 0.3
b = tf.Variable(1.0) # initial intercept guess, deliberately far from the true -0.8
In [5]:
batch_size = 8 # sample mini-batches of size eight for each step of gradient descent; small batches give noisy but cheap gradient estimates
In [6]:
xs_placeholder = tf.placeholder(tf.float32, [batch_size]) # graph inputs whose values are supplied at run time
ys_placeholder = tf.placeholder(tf.float32, [batch_size])
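To see how placeholders work (a minimal sketch added here, not part of the original notebook; doubling_op and demo_session are throwaway names), any run of the graph that touches a placeholder must supply a value for it through feed_dict:
doubling_op = 2.0 * xs_placeholder # an op with no value until the placeholder is fed
with tf.Session() as demo_session:
    print(demo_session.run(doubling_op, feed_dict={xs_placeholder: np.arange(8, dtype=np.float32)}))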
In [7]:
ys_model = m*xs_placeholder+b # linear model y = mx + b
total_error = tf.reduce_sum((ys_placeholder-ys_model)**2) # sum of squared errors over the mini-batch
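Note that total_error sums the squared errors over the mini-batch. An alternative formulation (a sketch, not used below) is the mean squared error, which keeps the loss scale, and therefore the effective learning rate, independent of batch_size:
mean_error = tf.reduce_mean((ys_placeholder-ys_model)**2) # averages rather than sums over the batch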
In [8]:
optimizer_operation = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(total_error) # for the demo, also try 0.01 and 0.0001
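For intuition, minimize() wraps two steps: computing gradients and applying the update theta <- theta - learning_rate * dE/dtheta. A hand-rolled equivalent (a sketch added here, not used in the training loop below; grad_m, grad_b, and manual_step are names introduced for illustration) would be:
grad_m, grad_b = tf.gradients(total_error, [m, b]) # symbolic partial derivatives of the error
manual_step = tf.group(m.assign_sub(0.001*grad_m), # m <- m - lr * dE/dm
                       b.assign_sub(0.001*grad_b)) # b <- b - lr * dE/db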
In [9]:
initializer_operation = tf.global_variables_initializer() # op that assigns every variable its initial value
In [10]:
with tf.Session() as session:
    session.run(initializer_operation)
    n_batches = 1000 # try 10 first, then 1000
    for iteration in range(n_batches):
        random_indices = np.random.randint(len(xs), size=batch_size) # sample the mini-batch by random selection
        feed = { # feeds are dictionaries mapping placeholders to arrays
            xs_placeholder: xs[random_indices],
            ys_placeholder: ys[random_indices]
        }
        session.run(optimizer_operation, feed_dict=feed) # minimize cost with the mini-batch
    slope, intercept = session.run([m, b]) # read the learned variables out of the graph
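As a quick visual check (a sketch added here, not part of the original notebook; x_fit is a name introduced for illustration), we can overlay the fitted line on the thousand-point sample from earlier:
fig, ax = plt.subplots()
ax.scatter(data_subset.x, data_subset.y)
x_fit = np.linspace(0., 8., 100)
_ = ax.plot(x_fit, slope*x_fit + intercept, 'r-') # line from the learned slope and intercept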
In [11]:
slope
Out[11]:
In [12]:
intercept
Out[12]: