TF intro

This guide assumes you can read basic Python code, or use your Google skills to catch up as needed. We begin by understanding how TensorFlow works. The key point to remember is that all TensorFlow computation happens in a graph; all you get to do in Python is manipulate and run that graph. This creates a programming paradigm that looks a lot like Python but is actually quite different. We begin with a simple logistic regression example.


In [3]:
import tensorflow as tf
import numpy as np

n_obs = 1000
n_features = 5

# a placeholder: it has no value until we feed one in at run time
x_ph = tf.placeholder(shape=(n_obs, n_features), name="x_ph", dtype=tf.float32)
# a Variable: stateful, initialized from a numpy draw, and optimizable
beta_init = np.random.normal(size=(n_features, 1))
beta_hat = tf.Variable(beta_init, dtype=tf.float32, name="beta_hat")
# a vanilla Tensor: computed from the rest of the graph
y_hat = tf.nn.sigmoid(tf.matmul(x_ph, beta_hat), name="yhat")

We can visualize this graph and then explain what the different parts are:

This code is adapted from the DeepDream notebook. More visualization and exploration can be done with TensorBoard, which is the tool this embeds.
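If you want the full TensorBoard experience rather than the inline view, here is a minimal sketch (the log directory is an arbitrary path of our choosing):

writer = tf.summary.FileWriter("/tmp/tf_intro_logs", tf.get_default_graph())
writer.close()
# then, in a shell: tensorboard --logdir /tmp/tf_intro_logs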


In [4]:
# TensorFlow Graph visualizer code
# https://stackoverflow.com/questions/41388673/visualizing-a-tensorflow-graph-in-jupyter-doesnt-work
import numpy as np
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = bytes("<stripped %d bytes>"%size, 'utf-8')
    return strip_def

def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script src="//cdnjs.cloudflare.com/ajax/libs/polymer/0.3.3/platform.js"></script>
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:800px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:800px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

In [5]:
show_graph(tf.get_default_graph().as_graph_def())


What we do in TensorFlow is construct graphs like this one and then evaluate nodes. Each graph node is associated with some computation. When we evaluate a node like y_hat, TensorFlow figures out which nodes it depends on, evaluates all of those, and then evaluates y_hat. This graph contains three kinds of nodes: a placeholder, a Variable, and a vanilla Tensor that is neither. Vanilla Tensors have no state: they are computable from the rest of the graph. Variables have state (they are the only things we can optimize, save, and load). Placeholders are not computable and have no state: we must feed values into them. We can also feed values into other tensors, but TF will only explicitly complain if we fail to feed a value into a placeholder.

Our Python objects like beta_hat are references to the TF graph nodes, not the nodes themselves (i.e., copying the Python object does not duplicate the graph node).

To evaluate a graph we need to associate it with a session, and then call either a tensor's eval method or the session's run method. The difference is that run can evaluate multiple tensors together, which is useful when they share dependencies: the shared work is done only once.
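To make this concrete, here is a minimal sketch against the graph above (the session, the x_batch array, and the reduce_sum op are ours, created just for this demo):

demo_sess = tf.Session()
demo_sess.run(tf.global_variables_initializer())
x_batch = np.random.normal(size=(n_obs, n_features)).astype(np.float32)

# eval: one tensor at a time
yh = y_hat.eval(session=demo_sess, feed_dict={x_ph: x_batch})

# run: several tensors in one pass, sharing the forward computation
yh2, beta_now = demo_sess.run([y_hat, beta_hat], feed_dict={x_ph: x_batch})

# feeding a non-placeholder tensor overrides its upstream computation
total = tf.reduce_sum(y_hat)
print(demo_sess.run(total, feed_dict={y_hat: np.ones((n_obs, 1), np.float32)}))  # 1000.0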

Now we define the loss, generate synthetic data, and optimize:


In [6]:
from scipy.special import expit as logistic

true_beta = np.random.normal(size=(n_features, 1))
x = np.random.normal(size=(n_obs, n_features))
y = np.random.binomial(n=1, p=logistic(x @ true_beta))

y_ph = tf.placeholder(shape=(n_obs, 1), name="y_ph", dtype=tf.float32)

# negative log-likelihood (cross-entropy); the 1e-10 guards against log(0)
logistic_loss = -tf.reduce_sum(y_ph * tf.log(1e-10 + y_hat) + (1-y_ph) * tf.log(1e-10 + 1 - y_hat))

# if we needed the gradients for some reason, e.g. to pass to an external optimizier or to plot
# grads = tf.gradients(logistic_loss, beta_hat)

# create the optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0005)

# this is the long way of doing it
# grads_and_vars = optimizer.compute_gradients(logistic_loss, beta_hat)
# optionally, modify gradients here (e.g. threshold)
# minimize_op = optimizer.apply_gradients(grads_and_vars)

# the short way
minimize_op = optimizer.minimize(logistic_loss) # by default, minimize w.r.t. all variables

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(2500):
    if i % 100 == 0:
        l = logistic_loss.eval(session=sess, feed_dict={x_ph:x, y_ph:y})
        print("Iter %i, loss %f" % (i, l))
    sess.run(minimize_op, feed_dict={x_ph:x, y_ph:y})

# round predictions to 0/1 and compare; .eval returns a numpy array we can average
accuracy = np.average(tf.equal(tf.round(y_hat), y_ph).eval(session=sess, feed_dict={x_ph:x, y_ph:y}))
print(accuracy)


Iter 0, loss 1257.209106
Iter 100, loss 425.954163
Iter 200, loss 425.796875
Iter 300, loss 425.795349
Iter 400, loss 425.795319
Iter 500, loss 425.795227
Iter 600, loss 425.795227
Iter 700, loss 425.795227
Iter 800, loss 425.795227
Iter 900, loss 425.795227
Iter 1000, loss 425.795227
Iter 1100, loss 425.795227
Iter 1200, loss 425.795227
Iter 1300, loss 425.795227
Iter 1400, loss 425.795227
Iter 1500, loss 425.795227
Iter 1600, loss 425.795227
Iter 1700, loss 425.795227
Iter 1800, loss 425.795227
Iter 1900, loss 425.795227
Iter 2000, loss 425.795227
Iter 2100, loss 425.795227
Iter 2200, loss 425.795227
Iter 2300, loss 425.795227
Iter 2400, loss 425.795227
0.814

In [7]:
# another optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
minimize_op = optimizer.minimize(logistic_loss) # by default, minimize w.r.t. all variables

sess.run(tf.global_variables_initializer())
for i in range(1000):
    if i % 100 == 0:
        l = logistic_loss.eval(session=sess, feed_dict={x_ph:x, y_ph:y})
        print("Iter %i, loss %f" % (i, l))
    sess.run(minimize_op, feed_dict={x_ph:x, y_ph:y})

accuracy = np.average(tf.equal(tf.round(y_hat), y_ph).eval(session=sess, feed_dict={x_ph:x, y_ph:y}))
print(accuracy)


Iter 0, loss 1257.209106
Iter 100, loss 621.341553
Iter 200, loss 453.863708
Iter 300, loss 430.841217
Iter 400, loss 426.745361
Iter 500, loss 425.957336
Iter 600, loss 425.818573
Iter 700, loss 425.798065
Iter 800, loss 425.795532
Iter 900, loss 425.795319
0.814

One more tutorial point: how to print things. Since a tensor only has a value when the graph is executed, inspecting values is trickier than usual. The Print op returns a tensor identical to its input, but prints as a side effect, which means we need to inject the op into the graph. Unfortunately, the printing happens on the C++ side, so the logging messages appear in the Jupyter server log (or in the shell), not in the notebook.


In [8]:
logistic_loss_with_print = tf.Print(input_=logistic_loss, data=[x, logistic_loss])

_ = logistic_loss_with_print.eval(session=sess, feed_dict={x_ph:x, y_ph:y})

MLP on MNIST

Now we build some neural-network building blocks that we will reuse for the VAEs.


In [9]:
from tensorflow.examples.tutorials.mnist import input_data

global_dtype = tf.float32
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
input_size = mnist.train.images.shape[1]


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Define our neural network building blocks.


In [10]:
def _dense_mlp_layer(x, input_size, out_size, nonlinearity=tf.nn.softmax, name_prefix=""):
    """One dense layer: nonlinearity(x W + b). Returns the output and its variables."""
    w_init = tf.truncated_normal(shape=[input_size, out_size], stddev=0.001)
    b_init = tf.ones(shape=[out_size]) * 0.1
    W = tf.Variable(w_init, name="%s_W" % name_prefix)
    b = tf.Variable(b_init, name="%s_b" % name_prefix)
    out = nonlinearity(tf.matmul(x, W) + b)
    return out, [W, b]

def _mlp(x, n_layers, units_per_layer, input_size, out_size, nonlinearity=tf.tanh):
    train_vars = []

    x, v = _dense_mlp_layer(x, input_size, units_per_layer, nonlinearity, name_prefix="into_hidden")
    train_vars.extend(v)
    # exploit the fact that repeatedly calling the same TF function creates multiple ops. 
    # no need to hang onto the intermediate layer handles (though we can get them back if we need them)
    for l in range(n_layers-1):
        x, v = _dense_mlp_layer(x, units_per_layer, units_per_layer, nonlinearity, name_prefix="hidden")
        train_vars.extend(v)

    x, v = _dense_mlp_layer(x, units_per_layer, out_size, nonlinearity, name_prefix="readout")
    train_vars.extend(v)
    return x, train_vars

Now we construct the graph. The graph and scope boilerplate makes our lives easier as far as visualization and debugging are concerned: we can visualize and run only this graph, rather than the graph for the logistic regression above.
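As an aside, tf.name_scope simply prefixes the names of ops created inside it, which is what groups them into collapsible boxes in the visualization. A throwaway sketch (the scope and constant here are hypothetical, not part of our model):

with tf.Graph().as_default():
    with tf.name_scope("demo_scope"):
        a = tf.constant(1.0, name="a")
print(a.name)  # -> "demo_scope/a:0"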


In [11]:
mlp_graph = tf.Graph()

with mlp_graph.as_default():
    with tf.name_scope("Feedforward_Net"):
        x = tf.placeholder(shape=[None, input_size], dtype=global_dtype, name='x')
        y = tf.placeholder(shape=[None, 10], dtype=global_dtype, name='y')
        y_hat, mlp_test_vars = _mlp(x, n_layers=2, units_per_layer=30, input_size=784, out_size=10)

    with tf.name_scope("Opt_and_loss"):
        # note: _mlp applies its nonlinearity at the readout layer too, so y_hat is
        # tanh-squashed rather than raw logits; it still trains fine here
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_hat))
        # learning rates are much smaller for optimizers like Adam and RMSProp
        train_step_mlptest = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)

    with tf.name_scope("Support_stuff"):
        init = tf.global_variables_initializer()
        correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_hat,1))
        mlp_acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

We visualize our graph:


In [12]:
show_graph(mlp_graph.as_graph_def())


Next we create a session, initialize our variables, and train the network:


In [13]:
sess = tf.Session(graph=mlp_graph)
sess.run(init)
train_steps = 2500

acc = np.zeros(train_steps)

# note: the train step and accuracy ops were created once, above; creating
# ops inside this loop would silently grow the graph on every iteration
for i in range(train_steps):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    acc[i] = mlp_acc.eval(session = sess, feed_dict = {x: batch_xs, y: batch_ys})
    sess.run(train_step_mlptest, feed_dict={x: batch_xs, y: batch_ys})

test_acc = mlp_acc.eval(session=sess, feed_dict={x: mnist.test.images, y: mnist.test.labels})
print("Test accuracy: %f" % test_acc)

import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(acc)


Test accuracy: 0.935200
Out[13]:
[<matplotlib.lines.Line2D at 0x130c7b208>]

VAEs!

$$ \DeclareMathOperator{\Tr}{Tr} \newcommand{\trp}{{^\top}} % transpose \newcommand{\trace}{\text{Trace}} % trace \newcommand{\inv}{^{-1}} \newcommand{\mb}{\mathbf{b}} \newcommand{\M}{\mathbf{M}} \newcommand{\G}{\mathbf{G}} \newcommand{\A}{\mathbf{A}} \newcommand{\R}{\mathbf{R}} \renewcommand{\S}{\mathbf{S}} \newcommand{\B}{\mathbf{B}} \newcommand{\Q}{\mathbf{Q}} \newcommand{\mH}{\mathbf{H}} \newcommand{\U}{\mathbf{U}} \newcommand{\mL}{\mathbf{L}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\etr}{\mathrm{etr}} \renewcommand{\H}{\mathbf{H}} \newcommand{\vecop}{\mathrm{vec}} \newcommand{\I}{\mathbf{I}} \newcommand{\X}{\mathbf{X}_{ij}} \newcommand{\Y}{\mathbf{Y}_{jk}} \newcommand{\Z}{\mathbf{Z}_{ik}} $$

In our generative model, we would like to estimate some complicated density $\log P_{\theta}(X)$. The notation is slightly odd but seems standard in these papers: it says that $X$ is a random variable while $\theta$ are parameters. To do this, we will write it as follows:

$$ \log P_{\theta}(X) = \log P_{\theta}(X,Z) - \log P_{\theta}(Z\mid X) $$

This is just the definition of conditional probability, rearranged. Next, we add and subtract $\log Q_{\phi}(Z\mid X)$, which nets to zero:

$$ \log P_{\theta}(X) = \log P_{\theta}(X,Z) - \log P_{\theta}(Z\mid X) - \log Q_{\phi}(Z\mid X) + \log Q_{\phi}(Z\mid X) $$

We take the expectation of both sides. Note that this expectation has to be w.r.t. the conditional distribution $Q(Z\mid X)$ for the rest of this to work properly:

$$ \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X) = \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X,Z) - \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(Z\mid X) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) + \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) $$

Since the LHS is independent of $Z$, the expectation leaves it unchanged. We rearrange the terms and recognize that we have ended up with the evidence lower bound (ELBO) plus a KL divergence:

$$ \begin{align} \log P_{\theta}(X) =& \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X,Z) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) + \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) - \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(Z\mid X) \\ =& \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X,Z) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) + \int_Z Q_{\phi}(Z\mid X)\log Q_{\phi}(Z\mid X) - \int_Z Q_{\phi}(Z\mid X)\log P_{\theta}(Z\mid X) \\ =& \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X,Z) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) + \int_Z Q_{\phi}(Z\mid X)\log \frac{Q_{\phi}(Z\mid X)}{P_{\theta}(Z\mid X)} \\ =& \underbrace{\mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X,Z) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X)}_{\text{ELBO}} + \mathcal{D}_{KL}(Q_{\phi}(Z\mid X)\,\|\, P_{\theta}(Z\mid X)) \end{align} $$

The ELBO is a lower bound because the KL divergence is greater than or equal to 0. So if we want to maximize the LHS, we can choose either to minimize the KL divergence or to maximize the ELBO. We would rather do the latter, because the KL divergence contains the posterior $P_{\theta}(Z\mid X)$, and if we knew how to compute that, we wouldn't be going through this hassle. The nice thing is, since this holds for any $Q$, we can define the distributions to be as nice as we like. So we will say that the likelihood $P_{\theta}(X\mid Z)$, the prior $P_{\theta}(Z)$, and the approximate posterior $Q_{\phi}(Z\mid X)$ are all Gaussian, and we pick the easiest possible prior, an identity-covariance Gaussian:

$$ \begin{align} P_{\theta}(X\mid Z) :=& \mathcal{N}(a(Z,\theta),\, b(Z,\theta)b(Z,\theta)\trp) \\ Q_{\phi}(Z\mid X) :=& \mathcal{N}(f(X,\phi),\, g(X,\phi)g(X,\phi)\trp) \\ P_{\theta}(Z) :=& \mathcal{N}(0,\mathbf{I}) \end{align} $$

The distributions $P_{\theta}(X\mid Z)$ and $Q_{\phi}(Z\mid X)$ are parameterized by mean and covariance-square-root functions $a, b, f, g$, which we leave unspecified for now. In practice, VAEs use the equivalent of a mean-field assumption, so the covariance functions just return per-dimension SDs/variances, but I'd like to write the reparameterization trick in its general form. We can additionally rewrite the expression to expose a second KL divergence:

$$ \begin{align} \log P_{\theta}(X) =& \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) + \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(Z) - \mathbb{E}_{Q(Z\mid X)}\log Q_{\phi}(Z\mid X) + \mathcal{D}_{KL}(Q_{\phi}(Z\mid X)\,\|\, P_{\theta}(Z\mid X)) \\ =& \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) - \mathcal{D}_{KL}(Q_{\phi}(Z\mid X)\,\|\,P_{\theta}(Z)) + \mathcal{D}_{KL}(Q_{\phi}(Z\mid X)\,\|\, P_{\theta}(Z\mid X)) \end{align} $$

Conveniently, the KL divergence between two Gaussians (here, the approximate posterior and the prior) is analytic. What remains is the expectation $\mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z)$, which we approximate by its empirical mean over samples:

$$ \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) = \int Q_{\phi}(Z\mid X)\log P_{\theta}(X\mid Z)\,dZ \approx \frac{1}{N} \sum_{i=1}^{N} \log P_{\theta}(X\mid Z_i), \qquad Z_i\sim\mathcal{N}(f(X,\phi),\,g(X,\phi)g(X,\phi)\trp) $$
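As an aside, with the diagonal (mean-field) parameterization we use in the code below, taking $Q_{\phi}(Z\mid X) = \mathcal{N}(\mu, \diag(\sigma^2))$ and $P_{\theta}(Z) = \mathcal{N}(0, \I)$, the analytic KL term works out to

$$ \mathcal{D}_{KL}\big(Q_{\phi}(Z\mid X)\,\|\,P_{\theta}(Z)\big) = \frac{1}{2}\sum_{j}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right), $$

whose negation is exactly the kld term in the VAE training code below.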

The naive (score-function) gradient estimator for this expectation has very high variance according to the VAE paper, though it is used in Paisley, Blei, and Jordan 2012 (ICML):

$$ \nabla_{\phi}\mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) = \mathbb{E}_{Q(Z\mid X)}\left[\log P_{\theta}(X\mid Z)\,\nabla_{\phi}\log Q_{\phi}(Z\mid X)\right] \approx \frac{1}{N} \sum_{i=1}^{N} \log P_{\theta}(X\mid Z_i)\,\nabla_{\phi}\log Q_{\phi}(Z_i\mid X) $$

What the VAE paper does instead is apply the reparameterization trick:

\begin{align} \epsilon_i&\sim\mathcal{N}(0, \I)\\ Z_i &= f(X,\phi) + g(X,\phi)\epsilon_i\\ \mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) &\approx \frac{1}{N} \sum_{i=1}^{N} \log P_{\theta}\big(X\mid f(X,\phi) + g(X,\phi)\epsilon_i\big)\\ \nabla_{\phi}\mathbb{E}_{Q(Z\mid X)}\log P_{\theta}(X\mid Z) &\approx \frac{1}{N} \sum_{i=1}^{N} \nabla_{\phi}\log P_{\theta}\big(X\mid f(X,\phi) + g(X,\phi)\epsilon_i\big) \end{align}
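Before the TF version, a tiny numpy sketch of the diagonal-covariance case (the mu and sigma arrays are made-up stand-ins for $f(X,\phi)$ and the diagonal of $g(X,\phi)$):

mu = np.array([0.0, 1.0])        # stand-in for f(X, phi)
sigma = np.array([1.0, 0.5])     # stand-in for the diagonal of g(X, phi)
eps = np.random.normal(size=2)   # the randomness lives outside the parameters
z = mu + sigma * eps             # a draw from N(mu, diag(sigma**2)) that is
                                 # differentiable with respect to mu and sigma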

Now let's see if we can implement this in TensorFlow. We already sanity-checked the basic MLP above, so we go straight to the VAE. First, set the hyperparameters (MNIST was downloaded earlier):


In [14]:
from tensorflow.examples.tutorials.mnist import input_data

encoder_depth = 2
decoder_depth = 2
encoder_units = 500
decoder_units = 500
latent_size = 10
global_dtype = tf.float32
minibatch_size = 100
input_size = mnist.train.images.shape[1]
train_steps = mnist.train.num_examples // minibatch_size
encoder_nonlinearity = tf.nn.sigmoid
decoder_nonlinearity = tf.nn.sigmoid
n_epochs = 10

Construct the graph:


In [15]:
vae_graph = tf.Graph()

with vae_graph.as_default():

    with tf.name_scope("Encoder_Q"):

        x = tf.placeholder(shape=[None, input_size], dtype=global_dtype, name='x')
        q_network, q_mu_vars = _mlp(x, n_layers=encoder_depth, units_per_layer=encoder_units, input_size=input_size, out_size=encoder_units, nonlinearity=encoder_nonlinearity)
        w_mu = tf.Variable(tf.truncated_normal(shape=[encoder_units, latent_size], stddev=0.1), name="w_mu")
        w_logsig = tf.Variable(tf.truncated_normal(shape=[encoder_units, latent_size], stddev=0.1), name="w_logsig")
        b_mu = tf.Variable(tf.truncated_normal(shape=[latent_size], stddev=0.1), name="b_mu")
        b_logsig = tf.Variable(tf.truncated_normal(shape=[latent_size], stddev=0.1), name="b_logsig")
        q_mu = tf.matmul(q_network, w_mu) + b_mu
        # q_logsigma is the log-variance of the diagonal gaussian posterior
        q_logsigma = tf.matmul(q_network, w_logsig) + b_logsig
        epsilon = tf.random_normal([minibatch_size, latent_size])
        # reparameterization trick: z = mu + sigma * epsilon
        z = q_mu + tf.sqrt(tf.exp(q_logsigma)) * epsilon


    with tf.name_scope("Decoder_P"):

        p_mu, p_mu_vars = _mlp(z, n_layers=decoder_depth, units_per_layer=decoder_units, input_size=latent_size,  out_size=input_size, nonlinearity=decoder_nonlinearity)

    with tf.name_scope("Opt_and_loss"):
        # analytic negative KL( Q(Z|X) || N(0,I) ) for a diagonal gaussian
        kld = 0.5 * tf.reduce_sum(1 + q_logsigma - tf.square(q_mu) - tf.exp(q_logsigma), 1)
        # Bernoulli log-likelihood of the pixels; the 1e-10 guards against log(0)
        ll = tf.reduce_sum(x * tf.log(1e-10 + p_mu) + (1-x) * tf.log(1e-10 + 1 - p_mu), 1)
        elbo = tf.reduce_mean(ll + kld)
        minimize_op = tf.train.AdamOptimizer(0.001).minimize(-elbo)

    with tf.name_scope("Support_stuff"):
        init = tf.global_variables_initializer()

In [16]:
show_graph(vae_graph.as_graph_def())


Now we run and visualize:


In [17]:
sess = tf.Session(graph=vae_graph)
sess.run(init)

elbo_log = np.zeros(n_epochs * train_steps)

for i in range(n_epochs):
    for j in range(train_steps):
        batch_xs, batch_ys = mnist.train.next_batch(minibatch_size)
        sess.run(minimize_op, feed_dict={x: batch_xs})
        elbo_log[i*train_steps + j] = elbo.eval(session=sess, feed_dict={x: batch_xs})
        if j % 10 == 0:
            print("Epoch %i, step %i: average elbo=%f" % (i, j, elbo_log[i*train_steps + j]))


Epoch 0, step 0: average elbo=-498.922028
Epoch 0, step 10: average elbo=-214.074295
Epoch 0, step 20: average elbo=-214.997269
Epoch 0, step 30: average elbo=-212.165344
Epoch 0, step 40: average elbo=-210.181015
Epoch 0, step 50: average elbo=-213.656189
Epoch 0, step 60: average elbo=-207.370346
Epoch 0, step 70: average elbo=-203.687546
Epoch 0, step 80: average elbo=-210.040588
Epoch 0, step 90: average elbo=-207.408844
...
Epoch 1, step 0: average elbo=-202.771149
...
Epoch 2, step 0: average elbo=-196.629211
...
Epoch 3, step 0: average elbo=-199.638092
...
Epoch 4, step 0: average elbo=-169.987076
...
Epoch 5, step 0: average elbo=-161.663269
...
Epoch 6, step 0: average elbo=-155.732666
...
Epoch 7, step 0: average elbo=-151.509415
...
Epoch 8, step 0: average elbo=-152.375198
...
Epoch 9, step 0: average elbo=-140.357361
...
Epoch 9, step 500: average elbo=-145.443604
Epoch 9, step 510: average elbo=-140.903854
Epoch 9, step 520: average elbo=-138.184250
Epoch 9, step 530: average elbo=-141.018204
Epoch 9, step 540: average elbo=-134.956665

In [18]:
plt.plot(elbo_log)


Out[18]:
[<matplotlib.lines.Line2D at 0x12b9ade48>]

In [19]:
x_sample = mnist.test.next_batch(minibatch_size)[0]
x_reconstruct = p_mu.eval(session=sess, feed_dict={x:x_sample})[0]
plt.subplot(1, 2, 1)
plt.imshow(x_sample[0].reshape(28, 28), vmin=0, vmax=1, cmap="gray")
plt.title("Test input")
plt.colorbar()
plt.subplot(1, 2, 2)
plt.imshow(x_reconstruct.reshape(28, 28), vmin=0, vmax=1, cmap="gray")
plt.title("Reconstruction")
plt.colorbar()
plt.tight_layout()
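
Finally, since any tensor can be fed (as we saw in the intro), we can also generate digits by feeding draws from the prior directly into z, bypassing the encoder entirely. A minimal sketch, reusing the session above (z_prior is our own made-up batch of prior samples):

z_prior = np.random.normal(size=(minibatch_size, latent_size))
x_generated = p_mu.eval(session=sess, feed_dict={z: z_prior})
plt.figure()
plt.imshow(x_generated[0].reshape(28, 28), vmin=0, vmax=1, cmap="gray")
plt.title("Decoded sample from the prior")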