Moving to the Cloud

Set up TensorFlow locally

Follow the instructions to install TensorFlow locally on your computer:

https://www.tensorflow.org/install/

You can also skip this section and proceed with "Create VM" if you get stuck with the installation.
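For many setups, installing into a fresh virtual environment with pip is sufficient (a sketch only - see the linked page for platform-specific instructions and GPU support):

$ python3 -m venv tf-env && source tf-env/bin/activate
$ pip install --upgrade pip tensorflow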

Test your installation by copying the following cell into a file and executing it on your computer:


In [0]:
import os
import tensorflow as tf
print('version={}, CUDA={}, GPU={}, TPU={}'.format(
    tf.__version__, tf.test.is_built_with_cuda(),
    # GPU attached? Note that you can "Runtime/Change runtime type..." in Colab.
    len(tf.config.list_physical_devices('GPU')) > 0,
    # TPU accessible? (only works on Colab)
    'COLAB_TPU_ADDR' in os.environ))

Create VM

Prerequisite: Before using any Cloud services, you will need to set up a billing account (https://console.cloud.google.com/billing) and register a credit card. Once your credit card is validated (by charging and immediately reimbursing a small amount), you will get 300 USD credit for one year.

Participants of the workshop will get a 50 USD voucher and don't need to set up any billing.

  1. Create a new virtual machine in Google Cloud Console. By far the easiest way to get this set up correctly is by copying the following command into the Cloud Shell:
$ gcloud compute instances create cpu-1 --zone=us-east1-b --image-project=deeplearning-platform-release --image-family=tf2-2-1-cpu

  2. Once the VM named "cpu-1" has started (green icon), click on the "SSH" button in the "Connect" column - if you have copy'n'paste problems with the web terminal, try with another browser. You can also install your SSH key (make sure to specify the cloud username when copy'n'pasting) and then use your favorite terminal emulator to connect to the VM.

  3. Check the installation:

$ python3 -c'import tensorflow as tf; print(tf.__version__)'

  4. Tip: If you shut down the VM when you don't need it, your credits will last a lot longer (see the commands below) :-)
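
Stopping and restarting the instance can be done from the Cloud Console or with the following commands in the Cloud Shell (assuming the "cpu-1" VM and zone created above):

$ gcloud compute instances stop cpu-1 --zone=us-east1-b
$ gcloud compute instances start cpu-1 --zone=us-east1-b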

Train as Python program

Below is a minimal program that trains a linear model using data from Cloud Storage. Copy it into a file and run it on your computer or on the cloud instance (don't forget to start the program with python3 on the VM if you used the commands above).


In [0]:
import json
import tensorflow as tf

data_path = 'gs://amld-datasets/zoo_img'
batch_size = 100
labels = [label.strip() for label in 
          tf.io.gfile.GFile('{}/labels.txt'.format(data_path))]
counts = json.load(tf.io.gfile.GFile('{}/counts.json'.format(data_path)))
train_steps = counts['train'] // batch_size
eval_steps = counts['eval'] // batch_size

feature_spec = {
    'label': tf.io.FixedLenFeature(shape=[1], dtype=tf.int64),
    'img_64': tf.io.FixedLenFeature(shape=[64, 64], dtype=tf.int64),
}

def parse_example(serialized_example):
  features = tf.io.parse_single_example(serialized_example, feature_spec)
  label = tf.one_hot(tf.squeeze(features['label']), len(labels))
  img_64 = tf.cast(features['img_64'], tf.float32) / 255.
  return img_64, label

ds_train = tf.data.TFRecordDataset(
    tf.io.gfile.glob('{}/train-*'.format(data_path))
    ).map(parse_example).batch(batch_size)
ds_eval  = tf.data.TFRecordDataset(
    tf.io.gfile.glob('{}/eval-*'.format(data_path))
    ).map(parse_example).batch(batch_size)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64,)),
    tf.keras.layers.Dense(len(labels), activation='softmax')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

history = model.fit(ds_train, steps_per_epoch=train_steps, epochs=1)
print('eval: ', model.evaluate(ds_eval, steps=eval_steps))

model.save('linear.h5')
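
To run it, save the cell above to a file (the name is arbitrary, e.g. train_linear.py) and start it from a terminal:

$ python3 train_linear.py

The last line writes the trained model to linear.h5; it can be loaded again with tf.keras.models.load_model('linear.h5').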

Use your own data

  1. Create a new storage bucket

    https://console.cloud.google.com/storage/browser

  2. Upload your data with the commands below

  3. Optionally set access permissions (e.g. give allUsers "Storage Object Viewer" access)
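
For step 3, one way to make the objects publicly readable from the command line (a sketch - replace the bucket name with your own):

$ gsutil iam ch allUsers:objectViewer gs://your-bucket-name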

By the way: All workshop Colabs also work with paths that are inside cloud storage buckets!


In [0]:
data_src_directory = '/content/gdrive/My Drive/amld_data/zoo_img'
# YOUR ACTION REQUIRED:
# Change the bucket name to the bucket that you created and have write 
# access to.
data_dst_bucket = 'gs://amld-datasets'

In [0]:
# Authenticate for Drive & Cloud.
from google.colab import drive
drive.mount('/content/gdrive')
from google.colab import auth
auth.authenticate_user()

In [0]:
!gsutil cp -R "$data_src_directory" "$data_dst_bucket"
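
To verify the upload, you can list the bucket contents (an optional quick check):

In [0]:
!gsutil ls "$data_dst_bucket"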

In [0]:
# YOUR ACTION REQUIRED:
# Change the data_path in the training script above to your own data and re-run.
# Note that you may have to update access rights accordingly.
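# For example (hypothetical bucket - use the one you copied the data to):
# data_path = 'gs://your-bucket-name/zoo_img'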

----- Optional part -----

Find better parameters

Extend the above program. Some suggestions:

  • Specify parameters from the command line using the argparse module (see the sketch after this list).
  • Make the model more complex (as in 2_keras.ipynb) and specify these choices as parameters as well.
  • Store the evaluation results and training parameters to disk (e.g. as JSON).
  • Write a script that explores different parameters.
  • Write a script that shows a summary of results.
  • Log results to TensorBoard (like in 3_eager.ipynb).
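
A minimal sketch of the first three suggestions, meant to be merged into the training program above (it reuses labels, ds_train, ds_eval, train_steps and eval_steps from there; the flag names and the optional hidden layer are just one possible layout):

import argparse
import json
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01)
parser.add_argument('--hidden', type=int, default=0,
                    help='Units in an optional hidden layer (0 = linear model).')
parser.add_argument('--results_path', default='results.json')
args = parser.parse_args()

# Build the model from command-line parameters (same output layer as above).
layers = [tf.keras.layers.Flatten(input_shape=(64, 64))]
if args.hidden:
  layers.append(tf.keras.layers.Dense(args.hidden, activation='relu'))
layers.append(tf.keras.layers.Dense(len(labels), activation='softmax'))
model = tf.keras.Sequential(layers)
model.compile(
    optimizer=tf.keras.optimizers.Adam(args.learning_rate),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

model.fit(ds_train, steps_per_epoch=train_steps, epochs=1)
loss, accuracy = model.evaluate(ds_eval, steps=eval_steps)

# Store parameters and results as JSON so a separate script can summarize runs.
with tf.io.gfile.GFile(args.results_path, 'w') as f:
  json.dump({'args': vars(args), 'loss': float(loss), 'accuracy': float(accuracy)}, f)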

Pro Tip: Want to use a GPU but running into CUDA & TensorFlow library problems? You might find the following magic Cloud Shell command helpful (taken from the Cloud docs):

$ gcloud compute instances create gpu-1 --zone=us-east1-b --image-project=deeplearning-platform-release --image-family=tf2-latest-gpu --maintenance-policy=TERMINATE --accelerator="type=nvidia-tesla-p100,count=1" --metadata="install-nvidia-driver=True"

This command will create a new VM instance called "gpu-1" and start it.

Note that you first need to apply for GPU quota (make sure to increase "GPUs (all regions)" from 0 to 1) - the quota is usually granted within a couple of minutes.

https://console.cloud.google.com/iam-admin/quotas?metric=GPUs%20(all%20regions),NVIDIA%20P100%20GPUs

Check that everything is set up correctly:

$ python3 -c'import tensorflow as tf; print("version={} CUDA={} GPU={}".format(tf.__version__, tf.test.is_built_with_cuda(), tf.config.list_physical_devices("GPU")))'

! Before you leave !

  1. If you were at the Workshop, please give us your feedback!
  2. You probably want to stop your VMs now, especially if you were running expensive hardware for fast training.