Kubeflow pipelines

Learning Objectives

  • See how to preprocess, train, tune, and deploy a model using Kubeflow Pipelines

Introduction

Kubeflow is an open source Kubernetes-native platform for developing, orchestrating, and deploying scalable and portable ML workloads. It allows you to manage end-to-end orchestration of ML pipelines.

Kubeflow Pipelines is a component of Kubeflow that helps you compose, deploy, and manage end-to-end (optionally hybrid) machine learning workflows. In essence, pipelines enable you to move your data into an accessible format and location, perform data cleaning and feature engineering, analyze your trained models, version your models, scalably serve your trained models while avoiding training/serving skew, and more.
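Conceptually, a pipeline is a DAG of steps, where each step consumes the outputs of earlier ones. The plain-Python sketch below uses no Kubeflow APIs (all function names and data are illustrative) but shows the shape of the babyweight workflow this notebook builds: preprocess, train, tune, deploy.

```python
# Plain-Python sketch of the pipeline's logical stages. In Kubeflow each
# stage would run in its own container; here they are ordinary functions
# so the data flow between steps is easy to see.

def preprocess(raw_data):
    # Clean records (illustrative: drop rows with missing weights).
    return [r for r in raw_data if r.get("weight_pounds") is not None]

def train(examples):
    # Stand-in "model": predict the mean weight of the training examples.
    mean = sum(r["weight_pounds"] for r in examples) / len(examples)
    return {"predicted_weight": mean}

def tune(model, candidates):
    # Pick the candidate closest to the trained estimate (illustrative).
    return min(candidates, key=lambda c: abs(c - model["predicted_weight"]))

def deploy(best):
    # In Kubeflow this step would push the model to a serving endpoint.
    return f"deployed model predicting {best:.2f} lbs"

raw = [{"weight_pounds": 7.5}, {"weight_pounds": None}, {"weight_pounds": 6.5}]
examples = preprocess(raw)
model = train(examples)
best = tune(model, candidates=[6.0, 7.0, 8.0])
print(deploy(best))  # -> deployed model predicting 7.00 lbs
```

The value of Kubeflow Pipelines is that each stage runs as a versioned, containerized component on the cluster, so the same DAG scales from a laptop sketch like this to production workloads.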

To read more, have a look at the Getting Started with Kubeflow Pipelines blog post.

This notebook goes through the steps of using Kubeflow Pipelines with the Python 3 interpreter. We'll create a cluster and deploy a Kubeflow pipeline to preprocess, train, tune, and deploy the babyweight model.

Create a Kubeflow cluster

To begin, we'll create a Kubeflow cluster called lakpipeline.


In [1]:
%%bash
gcloud config set compute/zone us-central1-b
gcloud container clusters create lakpipeline \
  --zone us-central1-b \
  --scopes cloud-platform \
  --enable-cloud-logging \
  --enable-cloud-monitoring \
  --machine-type n1-standard-2 \
  --num-nodes 4
kubectl create clusterrolebinding ml-pipeline-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)


Updated property [compute/zone].
Go to the Google Kubernetes Engine section of the GCP console and make sure that the cluster is started and ready. This will take about 3 minutes.

Deploy the Kubeflow pipeline to the cluster


In [ ]:
%%bash
PIPELINE_VERSION=0.1.3
kubectl create -f https://storage.googleapis.com/ml-pipeline/release/$PIPELINE_VERSION/bootstrapper.yaml

The above command can take up to 20 minutes. Re-run the following cell until the SUCCESS column shows 1.


In [ ]:
%%bash
kubectl get job
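If you'd rather poll programmatically than eyeball the table, you can parse the JSON form of the job status (`kubectl get job -o json`). The helper below is a sketch: it assumes the standard Kubernetes Job status schema, where `.status.succeeded` counts successfully completed pods, and the sample document is hand-written for illustration.

```python
import json

def succeeded_jobs(kubectl_json):
    """Count completed Jobs in the output of `kubectl get job -o json`.

    `kubectl_json` is the raw JSON string; a Job's `.status.succeeded`
    field holds the number of successfully finished pods (it is absent
    until at least one pod has completed).
    """
    items = json.loads(kubectl_json).get("items", [])
    return sum(1 for job in items
               if job.get("status", {}).get("succeeded", 0) >= 1)

# Example with a minimal, hand-written status document:
sample = json.dumps({
    "items": [
        {"metadata": {"name": "deploy-ml-pipeline"}, "status": {"succeeded": 1}},
        {"metadata": {"name": "still-running"}, "status": {"active": 1}},
    ]
})
print(succeeded_jobs(sample))  # -> 1
```

Once the bootstrap job reports as succeeded, the pipeline system is ready to use.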

3. Set up port forwarding to access Jupyter running on the cluster


# Do this on your laptop, not in Jupyter!!!
export NAMESPACE=kubeflow
kubectl port-forward -n ${NAMESPACE} $(kubectl get pods -n ${NAMESPACE} --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}') 8085:80

Now, navigate to localhost:8085 in your browser; the port forward above routes it to the cluster's ambassador service.

4. Install the Kubeflow Pipelines SDK locally


In [ ]:
%%bash
PIPELINE_VERSION=0.1.3
pip3 install python-dateutil https://storage.googleapis.com/ml-pipeline/release/$PIPELINE_VERSION/kfp.tar.gz --upgrade

5. Compile the pipeline DSL


In [18]:
%%bash
OUTDIR=pipelines/dsl
rm -rf $OUTDIR
mkdir -p $OUTDIR
python3 pipelines/mlp_babyweight.py $OUTDIR/mlp_babyweight.tar.gz
ls -l $OUTDIR


total 4
-rw-r--r-- 1 jovyan users 1089 Dec  6 23:39 mlp_babyweight.tar.gz
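The compiler writes the pipeline as a gzipped tar archive. To sanity-check what was produced, you can list the archive's contents with the standard library. This is just a sketch: the path matches the compile step above, but the member names inside depend on the Kubeflow Pipelines version.

```python
import tarfile

def archive_members(path):
    """Return the member names inside a compiled pipeline .tar.gz."""
    with tarfile.open(path, mode="r:gz") as tar:
        return tar.getnames()

# Example (uncomment after running the compile step above):
# print(archive_members("pipelines/dsl/mlp_babyweight.tar.gz"))
```

Typically the archive holds a single YAML file describing the workflow that the pipeline system will execute.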

6. Upload and execute the pipeline

Download the tar file above and upload it to the Kubeflow Pipelines UI. (Follow https://github.com/kubeflow/pipelines/issues/495 to see whether you can upload directly from the cluster.)

Copyright 2017 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.