MNIST Dataset Introduction

Most examples use the MNIST dataset of handwritten digits. It has 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image, so each sample is represented as a 28x28 matrix with pixel values from 0 to 1.
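To make the sample format concrete, here is a minimal sketch of how a raw grayscale image could be brought into that representation. It uses a synthetic random image as a stand-in for a real digit, and the variable names (raw_image, image, flat) are illustrative assumptions, not part of any MNIST loader.

```python
import numpy as np

# Synthetic stand-in for one raw digit image: 28x28 grayscale, uint8 in [0, 255].
# (Assumption for illustration; this is not real MNIST data.)
rng = np.random.default_rng(0)
raw_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale pixel intensities to floats in the range [0, 1].
image = raw_image.astype(np.float32) / 255.0

# Flatten the matrix to a vector of 28 * 28 = 784 values,
# a layout that simple fully connected models typically expect.
flat = image.reshape(-1)

print(image.shape, flat.shape)
```

Reshaping with flat.reshape(28, 28) recovers the original matrix layout, which is handy for plotting a sample.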

Overview

Usage

In our examples, we use the TensorFlow input_data.py script to load the dataset. It is useful for managing the data and handles:

  • Dataset downloading

  • Loading the entire dataset into NumPy arrays:


In [ ]:
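# Import MNIST (this module ships with TensorFlow 1.x)
from tensorflow.examples.tutorials.mnist import input_data

# Download the data to /tmp/data/ if needed and load it with one-hot labels
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Load data: each image is flattened to 784 values, each label is a one-hot vector of length 10.
# By default, 5,000 of the 60,000 training examples are held out as mnist.validation.
X_train = mnist.train.images
Y_train = mnist.train.labels
X_test = mnist.test.images
Y_test = mnist.test.labels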
  • A next_batch function that iterates over the whole dataset and returns only the requested number of samples at a time, so that each training step processes a small batch rather than the entire dataset, which saves memory.

In [ ]:
# Get the next batch of 64 images and their labels
batch_X, batch_Y = mnist.train.next_batch(64)
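
To illustrate the idea behind batch iteration, here is a minimal NumPy sketch. It is not the TensorFlow implementation: BatchIterator and one_hot are hypothetical helpers written for this example, the data is synthetic with MNIST-like shapes, and leftover samples at the end of an epoch are simply skipped to keep the code short.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert integer class labels to one-hot rows (hypothetical helper)."""
    return np.eye(num_classes, dtype=np.float32)[labels]

class BatchIterator:
    """Hypothetical helper, not the TensorFlow class: serves shuffled mini-batches."""

    def __init__(self, images, labels, seed=0):
        self.images = images
        self.labels = labels
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(len(images))
        self.position = 0

    def next_batch(self, batch_size):
        # Once fewer than batch_size samples remain, reshuffle and start a new epoch.
        # Simplification: the leftover samples of the old epoch are skipped.
        if self.position + batch_size > len(self.images):
            self.order = self.rng.permutation(len(self.images))
            self.position = 0
        idx = self.order[self.position:self.position + batch_size]
        self.position += batch_size
        return self.images[idx], self.labels[idx]

# Synthetic data with MNIST-like shapes: 1,000 flattened images, 10 classes.
rng = np.random.default_rng(1)
images = rng.random((1000, 784), dtype=np.float32)
labels = one_hot(rng.integers(0, 10, size=1000))

data = BatchIterator(images, labels)
batch_X, batch_Y = data.next_batch(64)
print(batch_X.shape, batch_Y.shape)  # (64, 784) (64, 10)
```

Shuffling the index order instead of the arrays themselves keeps images and labels aligned and avoids copying the full dataset on every epoch.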