Learning Objectives
Kubeflow is an open source Kubernetes-native platform for developing, orchestrating, and deploying scalable and portable ML workloads. It allows you to manage end-to-end orchestration of ML pipelines.
Kubeflow Pipelines are a new component of Kubeflow that can help you compose, deploy, and manage end-to-end (optionally hybrid) machine learning workflows. In essence, pipelines enable you to port your data to an accessible format and location, perform data cleaning and feature engineering, analyze your trained models, version your models, scalably serve your trained models while avoiding training or serving skew, and more.
Have a look at this blog post to read more about Getting Started with Kubeflow Pipelines.
This notebook walks through using Kubeflow Pipelines from the Python 3 interpreter. We'll create a cluster and deploy a Kubeflow pipeline to preprocess, train, tune, and deploy the babyweight model.
In [1]:
%%bash
gcloud config set compute/zone us-central1-b
gcloud container clusters create lakpipeline \
--zone us-central1-b \
--scopes cloud-platform \
--enable-cloud-logging \
--enable-cloud-monitoring \
--machine-type n1-standard-2 \
--num-nodes 4
kubectl create clusterrolebinding ml-pipeline-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
Go to the Google Kubernetes Engine section of the GCP console and make sure that the cluster is started and ready. This will take about 3 minutes.
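If you'd rather not switch to the console, you can also check readiness from the command line. This is an illustrative sketch: the `cluster_ready` helper is hypothetical, and the commented `gcloud` call assumes the cluster name and zone used above.

```shell
# Hedged helper: a GKE cluster is ready when its status string is RUNNING.
cluster_ready() {
  [ "$1" = "RUNNING" ]
}

# Usage (assumes gcloud is configured and the cluster from the cell above):
# cluster_ready "$(gcloud container clusters describe lakpipeline \
#     --zone us-central1-b --format='value(status)')" && echo "cluster is ready"
```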
In [ ]:
%%bash
PIPELINE_VERSION=0.1.3
kubectl create -f https://storage.googleapis.com/ml-pipeline/release/$PIPELINE_VERSION/bootstrapper.yaml
The above command can take up to 20 minutes. Run the following cell until the job reports one successful completion.
In [ ]:
%%bash
kubectl get job
# Do this on your laptop, not in Jupyter!!!
# export NAMESPACE=kubeflow
# kubectl port-forward -n ${NAMESPACE} $(kubectl get pods -n ${NAMESPACE} --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}') 8085:80
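Rather than rerunning the cell by hand, you can poll until the job succeeds. The `wait_for` helper below is a hedged sketch, and the job name in the commented example is an assumption; check `kubectl get job` output for the real name.

```shell
# Hedged helper: poll a command until it succeeds or we give up.
# Usage: wait_for <max_tries> <command...>
wait_for() {
  local max=$1; shift
  local i=0
  until "$@"; do
    i=$((i+1))
    [ "$i" -ge "$max" ] && return 1
    sleep 20   # the bootstrapper can take up to 20 minutes
  done
}

# Example (job name is an assumption; confirm it with `kubectl get job`):
# wait_for 60 sh -c \
#   '[ "$(kubectl get job deploy-ml-pipeline -o jsonpath="{.status.succeeded}")" = "1" ]'
```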
Now, install the Kubeflow Pipelines SDK:
In [ ]:
%%bash
PIPELINE_VERSION=0.1.3
pip3 install python-dateutil https://storage.googleapis.com/ml-pipeline/release/$PIPELINE_VERSION/kfp.tar.gz --upgrade
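To confirm that the SDK installed correctly, you can try importing the `kfp` package. The `check_import` helper is an illustrative sketch, not part of the SDK:

```shell
# Hedged check: report whether a Python package is importable.
check_import() {
  python3 -c "import $1" 2>/dev/null && echo "ok: $1" || echo "missing: $1"
}

check_import kfp
```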
In [18]:
%%bash
OUTDIR=pipelines/dsl
rm -rf $OUTDIR
mkdir -p $OUTDIR
python3 pipelines/mlp_babyweight.py $OUTDIR/mlp_babyweight.tar.gz
ls -l $OUTDIR
Download the above tar file and upload it to the Kubeflow Pipelines UI. (Follow https://github.com/kubeflow/pipelines/issues/495 to see whether you can upload directly from the cluster.)
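Before uploading, it can help to sanity-check that the archive was compiled. A hedged sketch, assuming the compiler packs the pipeline definition inside the tar.gz (the `list_archive` helper is illustrative):

```shell
# Hedged helper: list the contents of a compiled pipeline archive.
list_archive() {
  tar -tzf "$1"
}

# Usage, with the output path from the compile cell above:
# list_archive pipelines/dsl/mlp_babyweight.tar.gz
```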
Copyright 2017 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License