Define Kubeflow Pipelines to train a model

Create entry point using Fairing

Kubeflow Fairing is a Python package that makes training and deploying machine learning models on Kubeflow easier.

Here, we use the preprocessor in Kubeflow Fairing to convert a notebook into a Python script and create an entry point for that script. After preprocessing the notebook, we can run the script from the command line, for example:

$ python repo_mlp.py train
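
To make the command above less magical, here is a minimal sketch of how a class-based entry point can map `train --owner=... --repo=...` onto a method call. It uses only the standard library and is a stand-in for illustration, not the code Fairing generates; the `RepoMLP` body, the `dispatch` helper, and the sample owner and repo values are all hypothetical.

```python
class RepoMLP:
    """Hypothetical stand-in for the class defined in repo_mlp.ipynb."""

    def train(self, owner="", repo=""):
        return f"training model for {owner}/{repo}"


def dispatch(cls, argv):
    """Call cls().<argv[0]>(**flags), reading flags from --key=value arguments."""
    method_name, *flags = argv
    kwargs = {}
    for flag in flags:
        # "--owner=kubeflow" becomes key "owner" and value "kubeflow"
        key, _, value = flag.lstrip("-").partition("=")
        kwargs[key] = value
    return getattr(cls(), method_name)(**kwargs)


# Equivalent in spirit to: python repo_mlp.py train --owner=kubeflow --repo=examples
result = dispatch(RepoMLP, ["train", "--owner=kubeflow", "--repo=examples"])
print(result)
```

The first argument selects the method and the remaining flags become keyword arguments, which is the same shape as the commands used later in the pipeline steps.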

In [1]:
import sys
sys.path.append("../../py")

In [2]:
import os
from fairing.preprocessors.converted_notebook import ConvertNotebookPreprocessorWithFire

In [3]:
preprocessor = ConvertNotebookPreprocessorWithFire('IssuesLoader', notebook_file='issues_loader.ipynb')

if not preprocessor.input_files:
    preprocessor.input_files = set()
input_files = ['../../py/code_intelligence/embeddings.py',
               '../../py/code_intelligence/inference.py',
               '../../py/label_microservice/repo_config.py']
preprocessor.input_files =  set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()


Converting issues_loader.ipynb to issues_loader.py
Creating entry point for the class name IssuesLoader
Out[3]:
[PosixPath('issues_loader.py'),
 '../../py/code_intelligence/embeddings.py',
 '../../py/label_microservice/repo_config.py',
 '../../py/code_intelligence/inference.py']
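
The cell above stores the extra input files as a set of normalized paths. As a small sketch of why that is useful, the snippet below shows that normalization collapses different spellings of the same path, so the set keeps only one copy. It uses `posixpath.normpath`, which is what `os.path.normpath` resolves to on Linux and macOS; the `normalize_inputs` helper and the oddly spelled paths are made up for illustration.

```python
import posixpath


def normalize_inputs(paths):
    """Return the set of normalized POSIX paths, collapsing './', '//' and 'x/../'."""
    return {posixpath.normpath(p) for p in paths}


inputs = normalize_inputs([
    "../../py/code_intelligence/embeddings.py",
    "../../py/./code_intelligence//embeddings.py",  # same file, spelled differently
    "../../py/label_microservice/../label_microservice/repo_config.py",
])
print(sorted(inputs))
```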

In [4]:
preprocessor = ConvertNotebookPreprocessorWithFire('RepoMLP', notebook_file='repo_mlp.ipynb')

if not preprocessor.input_files:
    preprocessor.input_files = set()
input_files = ['../../py/label_microservice/mlp.py',
               '../../py/label_microservice/repo_config.py']
preprocessor.input_files =  set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()


Converting repo_mlp.ipynb to repo_mlp.py
Creating entry point for the class name RepoMLP
Out[4]:
[PosixPath('repo_mlp.py'),
 '../../py/label_microservice/mlp.py',
 '../../py/label_microservice/repo_config.py']

Use Fairing to build Docker image


In [5]:
import os
import sys
import fairing
from fairing.builders import append
from fairing.builders import cluster
from fairing.deployers import job
from fairing.preprocessors.converted_notebook import ConvertNotebookPreprocessorWithFire

In [6]:
# Set up Google Container Registry (GCR) for storing output containers.
# You can use any Docker container registry instead of GCR.
GCP_PROJECT = fairing.cloud.gcp.guess_project_name()
print(GCP_PROJECT)
DOCKER_REGISTRY = 'gcr.io/{}/training'.format(GCP_PROJECT)
print(DOCKER_REGISTRY)
# BASE_IMAGE is a plain Python image matching this notebook's interpreter version.
PY_VERSION = ".".join([str(x) for x in sys.version_info[0:3]])
BASE_IMAGE = 'python:{}'.format(PY_VERSION)
# You can use the Dockerfile in this repo to build the base image used by the builders below.
base_image = 'gcr.io/issue-label-bot-dev/ml-gpu-lite-py3.6'


issue-label-bot-dev
gcr.io/issue-label-bot-dev/training
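
The two string templates in the cell above can be checked without a GCP project. The sketch below wraps them in two hypothetical helpers, `training_registry` and `python_base_image`, and feeds them fixed inputs; the version tuple `(3, 6, 8)` is only an example value.

```python
import sys


def training_registry(project):
    """Build the registry path used for training images in a given GCP project."""
    return "gcr.io/{}/training".format(project)


def python_base_image(version_info=sys.version_info):
    """Build a 'python:<major>.<minor>.<micro>' image name from a version tuple."""
    return "python:{}".format(".".join(str(x) for x in version_info[0:3]))


print(training_registry("issue-label-bot-dev"))
print(python_base_image((3, 6, 8)))
```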

Build Docker image

We use builders in Kubeflow Fairing to build Docker images. ClusterBuilder builds a Docker image in a Kubernetes cluster, and AppendBuilder appends a new layer tarball to an existing image. We also pass the preprocessor as a parameter so that the processed inputs are sent to the Docker build.


In [7]:
preprocessor = ConvertNotebookPreprocessorWithFire('RepoMLP', notebook_file='repo_mlp.ipynb')

if not preprocessor.input_files:
    preprocessor.input_files = set()
input_files = ['../../py/label_microservice/mlp.py',
               '../../py/label_microservice/repo_config.py',
               '../../py/code_intelligence/embeddings.py',
               '../../py/code_intelligence/inference.py',
               'issues_loader.py']
preprocessor.input_files =  set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()


Converting repo_mlp.ipynb to repo_mlp.py
Creating entry point for the class name RepoMLP
Out[7]:
[PosixPath('repo_mlp.py'),
 'issues_loader.py',
 '../../py/label_microservice/repo_config.py',
 '../../py/code_intelligence/inference.py',
 '../../py/code_intelligence/embeddings.py',
 '../../py/label_microservice/mlp.py']

In [8]:
cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,
                                                 base_image=base_image,
                                                 namespace='chunhsiang',
                                                 preprocessor=preprocessor,
                                                 pod_spec_mutators=[fairing.cloud.gcp.add_gcp_credentials_if_exists],
                                                 context_source=cluster.gcs_context.GCSContextSource())
cluster_builder.build()


Building image using cluster builder.
Creating docker context: /tmp/fairing_context_h7gadtk6
Converting repo_mlp.ipynb to repo_mlp.py
Creating entry point for the class name RepoMLP
Waiting for fairing-builder-7hzzg to start...
Waiting for fairing-builder-7hzzg to start...
Waiting for fairing-builder-7hzzg to start...
Waiting for fairing-builder-7hzzg to start...
Waiting for fairing-builder-7hzzg to start...
Pod started running True
INFO[0006] Downloading base image gcr.io/issue-label-bot-dev/ml-gpu-lite-py3.6
INFO[0006] Downloading base image gcr.io/issue-label-bot-dev/ml-gpu-lite-py3.6
WARN[0006] Error while retrieving image from cache: getting image from path: open /cache/sha256:75db155d8f0781010154a4611eec4861962ce5b3759fe50e2fcbd9394cb90526: no such file or directory
INFO[0007] Checking for cached layer gcr.io/issue-label-bot-dev/training/fairing-job/cache:dbfc5832e37808816d505d53c631c032397ec9c19a1c363b082c6c4f3fa262ea...
INFO[0007] No cached layer found for cmd RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi
INFO[0007] Unpacking rootfs as cmd RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi requires it.
INFO[0173] Taking snapshot of full filesystem...
INFO[0184] Skipping paths under /dev, as it is a whitelisted directory
INFO[0184] Skipping paths under /etc/secrets, as it is a whitelisted directory
INFO[0184] Skipping paths under /kaniko, as it is a whitelisted directory
INFO[0184] Skipping paths under /proc, as it is a whitelisted directory
INFO[0184] Skipping paths under /sys, as it is a whitelisted directory
INFO[0185] Skipping paths under /var/run, as it is a whitelisted directory
INFO[0226] WORKDIR /app/
INFO[0226] cmd: workdir
INFO[0226] Changed working directory to /app/
INFO[0226] Creating directory /app/
INFO[0226] Taking snapshot of files...
INFO[0226] ENV FAIRING_RUNTIME 1
INFO[0226] Taking snapshot of files...
INFO[0226] RUN if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi
INFO[0226] cmd: /bin/bash
INFO[0226] args: [-c if [ -e requirements.txt ];then pip install --no-cache -r requirements.txt; fi]
INFO[0226] Taking snapshot of full filesystem...
INFO[0257] Skipping paths under /dev, as it is a whitelisted directory
INFO[0257] Skipping paths under /etc/secrets, as it is a whitelisted directory
INFO[0257] Skipping paths under /kaniko, as it is a whitelisted directory
INFO[0257] Skipping paths under /proc, as it is a whitelisted directory
INFO[0257] Skipping paths under /sys, as it is a whitelisted directory
INFO[0258] Skipping paths under /var/run, as it is a whitelisted directory
INFO[0279] No files were changed, appending empty layer to config. No layer added to image.
INFO[0279] Using files from context: [/kaniko/buildcontext/app]
INFO[0279] COPY /app/ /app/
INFO[0279] Pushing layer gcr.io/issue-label-bot-dev/training/fairing-job/cache:dbfc5832e37808816d505d53c631c032397ec9c19a1c363b082c6c4f3fa262ea to cache now
INFO[0279] Taking snapshot of files...
2019/09/12 22:54:07 existing blob: sha256:d68afcc1cf651f4109698bde41c1173e6c1968c3a15b3cd13430d6bcde8ed8b4
2019/09/12 22:54:07 existing blob: sha256:89732bc7504122601f40269fc9ddfb70982e633ea9caf641ae45736f2846b004
2019/09/12 22:54:07 gcr.io/issue-label-bot-dev/training/fairing-job/cache:dbfc5832e37808816d505d53c631c032397ec9c19a1c363b082c6c4f3fa262ea: digest: sha256:ff33c06878a7f553ac692b92ededd58e3be8bf3434a386c410726bc17fe3bfbd size: 423
2019/09/12 22:54:08 existing blob: sha256:e4732fdd9b394aca10fbc52eb5dbd8c5a3b04e7c99044a7fd44cca2b28774f4b
2019/09/12 22:54:08 existing blob: sha256:b855cf81cf214886646b61e0892493983f57abb8ca55ddb58ce4b74a5f80b87c
2019/09/12 22:54:08 existing blob: sha256:f401bdaa92adf9a46956b65a2d86021969297b5a5ed17b9820a54c8e6d44cf85
2019/09/12 22:54:08 existing blob: sha256:16d7d0c5a038bb07d1b7cb710111258065d67eaa8154bea62f28d1320f77a173
2019/09/12 22:54:08 existing blob: sha256:be8e0cd8df7cf8f297197d642acdbd141989f4235ca9decb11a86cdbe59d139b
2019/09/12 22:54:08 existing blob: sha256:d62ff11941adfa22f5dc2be2e4f92d3f48d3a064c27c275d4a68cfb12ff05476
2019/09/12 22:54:08 existing blob: sha256:6a42806f2bf5819dcbdb874004e0032af07eda4560c12b1229588d630e401c5e
2019/09/12 22:54:08 existing blob: sha256:e152ee00b3dfec432f831c3add9c47dce9cd1974772890d67f5d3d0a5c8b89ae
2019/09/12 22:54:08 existing blob: sha256:bfa420bf42bba88c31241000eff205efcda9c6ae42b758ad40773424e82f66a6
2019/09/12 22:54:08 existing blob: sha256:05731e63f21105725a5c062a725b33a54ad8c697f9c810870c6aa3e3cd9fb6a2
2019/09/12 22:54:08 existing blob: sha256:e059dd98ac7cff88cacd4e01f2f1d56af872618aac98b0aff8afdd81b9fd2c76
2019/09/12 22:54:08 existing blob: sha256:0809e577f6d6c969247b35a338c0809b324e29ee01964de6dd3c3fe4808b2a15
2019/09/12 22:54:08 existing blob: sha256:3246eae925c37c1c7542f246be281690d3075d7f3d9ab89c052cd393a7569def
2019/09/12 22:54:08 existing blob: sha256:6abc03819f3e00a67ed5adc1132cfec041d5f7ec3c29d5416ba0433877547b6f
2019/09/12 22:54:08 existing blob: sha256:51fecbc74e17222a7a64e9d690fe416cc6120154ba4c4ec90348beacdf454243
2019/09/12 22:54:08 existing blob: sha256:421e23cecfe86d864c15dc6cf8c6f41b9b8f356497d45fa786a7d3906b0fd38b
2019/09/12 22:54:08 existing blob: sha256:5b6ac7f35d3dce1cd93b9740c7342fe317b179c5ebb8d9e92069d5c671a756ae
2019/09/12 22:54:08 existing blob: sha256:a5abf09960671b05108c96974fb37583588b286621ccd3b1229ee948d23af04e
2019/09/12 22:54:08 existing blob: sha256:2e3cac877bb2877095aa4ffad6519197728eb04865674481d1be8986b3c74ff9
2019/09/12 22:54:08 existing blob: sha256:f5f8ce55306cbab02d089edd0c8cd4cf402c27ceb58f98d42ac57d96b7972e57
2019/09/12 22:54:08 existing blob: sha256:cbeb255d6ab1895a17ac078089e678fa0189fd8204c85ad79ab69a766dbb2e8a
2019/09/12 22:54:08 existing blob: sha256:0bd67c50d6beeb55108476f72bea3b4b29a9f48832d6e045ec66b7ac4bf712a0
2019/09/12 22:54:08 existing blob: sha256:6669e38ab1bac2750ee43e9fd769478543e5ee6866cfbbe10c5f55584681792d
2019/09/12 22:54:08 existing blob: sha256:2588190c9278ac1ef3f82ce60f995f2da985e3b4da01fed13276258da4d38dae
2019/09/12 22:54:08 existing blob: sha256:d718e4299c08d3ee1aca64bdc55d04f86142879fa278eae6c2adaa6aee273598
2019/09/12 22:54:08 existing blob: sha256:d5c73556cc1e31575beb68d582444652a0e1d704e0a19bbab8565e777493ece0
2019/09/12 22:54:08 existing blob: sha256:27179306e54c7fd5dd61b4546f049da31eb11733621779b8b6fe8b991bd49e4a
2019/09/12 22:54:08 existing blob: sha256:88462de30d91bf4e8392342ddf21c4793b4ee2b24ec8f31e7b3323615635c9d2
2019/09/12 22:54:08 existing blob: sha256:5d617e1015becaddf401b99d4c2d593dfd8b1f1bb3f5a3893c6c99f187455967
2019/09/12 22:54:09 pushed blob sha256:75dd22cc05551e91321f15e2cc8c05937113d1bba56ab505eaad4bcebe96def9
2019/09/12 22:54:09 pushed blob sha256:ed519f604a5f5d46268ec1e181b23e40b5a15e155eae6e793c1639849bfc83c3
2019/09/12 22:54:09 pushed blob sha256:7a3160d9bbde1e8445545b3a10eb83efd96de72b8444bceec9a6680aff5d431e
2019/09/12 22:54:09 pushed blob sha256:5a019115a1bd8a16b779de15dc5375c3161bc7a6f113fa634ff85344b144ccfa
2019/09/12 22:54:10 gcr.io/issue-label-bot-dev/training/fairing-job:423CDBC6: digest: sha256:be826ed9819009cf881cc35d4f9d4549ae1605a8cf6087b3faa264151444cda4 size: 5466

In [9]:
builder = append.append.AppendBuilder(registry=DOCKER_REGISTRY,
                                      base_image=cluster_builder.image_tag,
                                      preprocessor=preprocessor)
builder.build()


Building image using Append builder...
Creating docker context: /tmp/fairing_context_v2rep_e0
Converting repo_mlp.ipynb to repo_mlp.py
Creating entry point for the class name RepoMLP
Loading Docker credentials for repository 'gcr.io/issue-label-bot-dev/training/fairing-job:423CDBC6'
Invoking 'docker-credential-gcloud' to obtain Docker credentials.
Successfully obtained Docker credentials.
Image successfully built in 1.6972043240093626s.
Pushing image gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3...
Loading Docker credentials for repository 'gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3'
Invoking 'docker-credential-gcloud' to obtain Docker credentials.
Successfully obtained Docker credentials.
Uploading gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3
Layer sha256:0bd67c50d6beeb55108476f72bea3b4b29a9f48832d6e045ec66b7ac4bf712a0 exists, skipping
Layer sha256:a5abf09960671b05108c96974fb37583588b286621ccd3b1229ee948d23af04e exists, skipping
Layer sha256:88462de30d91bf4e8392342ddf21c4793b4ee2b24ec8f31e7b3323615635c9d2 exists, skipping
Layer sha256:e059dd98ac7cff88cacd4e01f2f1d56af872618aac98b0aff8afdd81b9fd2c76 exists, skipping
Layer sha256:5d617e1015becaddf401b99d4c2d593dfd8b1f1bb3f5a3893c6c99f187455967 exists, skipping
Layer sha256:d62ff11941adfa22f5dc2be2e4f92d3f48d3a064c27c275d4a68cfb12ff05476 exists, skipping
Layer sha256:6669e38ab1bac2750ee43e9fd769478543e5ee6866cfbbe10c5f55584681792d exists, skipping
Layer sha256:b855cf81cf214886646b61e0892493983f57abb8ca55ddb58ce4b74a5f80b87c exists, skipping
Layer sha256:0809e577f6d6c969247b35a338c0809b324e29ee01964de6dd3c3fe4808b2a15 exists, skipping
Layer sha256:cbeb255d6ab1895a17ac078089e678fa0189fd8204c85ad79ab69a766dbb2e8a exists, skipping
Layer sha256:f5f8ce55306cbab02d089edd0c8cd4cf402c27ceb58f98d42ac57d96b7972e57 exists, skipping
Layer sha256:e152ee00b3dfec432f831c3add9c47dce9cd1974772890d67f5d3d0a5c8b89ae exists, skipping
Layer sha256:6abc03819f3e00a67ed5adc1132cfec041d5f7ec3c29d5416ba0433877547b6f exists, skipping
Layer sha256:7a3160d9bbde1e8445545b3a10eb83efd96de72b8444bceec9a6680aff5d431e exists, skipping
Layer sha256:d5c73556cc1e31575beb68d582444652a0e1d704e0a19bbab8565e777493ece0 exists, skipping
Layer sha256:2588190c9278ac1ef3f82ce60f995f2da985e3b4da01fed13276258da4d38dae exists, skipping
Layer sha256:d718e4299c08d3ee1aca64bdc55d04f86142879fa278eae6c2adaa6aee273598 exists, skipping
Layer sha256:421e23cecfe86d864c15dc6cf8c6f41b9b8f356497d45fa786a7d3906b0fd38b exists, skipping
Layer sha256:5b6ac7f35d3dce1cd93b9740c7342fe317b179c5ebb8d9e92069d5c671a756ae exists, skipping
Layer sha256:16d7d0c5a038bb07d1b7cb710111258065d67eaa8154bea62f28d1320f77a173 exists, skipping
Layer sha256:27179306e54c7fd5dd61b4546f049da31eb11733621779b8b6fe8b991bd49e4a exists, skipping
Layer sha256:6a42806f2bf5819dcbdb874004e0032af07eda4560c12b1229588d630e401c5e exists, skipping
Layer sha256:75dd22cc05551e91321f15e2cc8c05937113d1bba56ab505eaad4bcebe96def9 exists, skipping
Layer sha256:51fecbc74e17222a7a64e9d690fe416cc6120154ba4c4ec90348beacdf454243 exists, skipping
Layer sha256:f401bdaa92adf9a46956b65a2d86021969297b5a5ed17b9820a54c8e6d44cf85 exists, skipping
Layer sha256:be8e0cd8df7cf8f297197d642acdbd141989f4235ca9decb11a86cdbe59d139b exists, skipping
Layer sha256:2e3cac877bb2877095aa4ffad6519197728eb04865674481d1be8986b3c74ff9 exists, skipping
Layer sha256:e4732fdd9b394aca10fbc52eb5dbd8c5a3b04e7c99044a7fd44cca2b28774f4b exists, skipping
Layer sha256:05731e63f21105725a5c062a725b33a54ad8c697f9c810870c6aa3e3cd9fb6a2 exists, skipping
Layer sha256:bfa420bf42bba88c31241000eff205efcda9c6ae42b758ad40773424e82f66a6 exists, skipping
Layer sha256:ed519f604a5f5d46268ec1e181b23e40b5a15e155eae6e793c1639849bfc83c3 exists, skipping
Layer sha256:3246eae925c37c1c7542f246be281690d3075d7f3d9ab89c052cd393a7569def exists, skipping
Layer sha256:ef96ccdab5fe1ffd9d4c3ffef6c5655fcf9cf15ccd9017959bd7d0f6f67521bd pushed.
Layer sha256:69f575316ab0d5b3e21f56f42826540bf275b1fe9d788ce8852af51ba8f8b038 pushed.
Finished upload of: gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3
Pushed image gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3 in 4.877649196016137s.
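
The image names in the logs above end in short tags such as 423CDBC6 and 5350A3D3. These look like checksums of the build context, which would explain why a rebuild with changed files produces a new tag. The sketch below illustrates that idea with `zlib.crc32`; it is an assumption for illustration, not Fairing's confirmed implementation, and the `content_tag` helper and file contents are hypothetical.

```python
import zlib


def content_tag(files):
    """Return an 8-character uppercase hex tag from the CRC32 of file names and contents."""
    crc = 0
    for name in sorted(files):
        # Feed the name and the bytes of each file into one running checksum
        crc = zlib.crc32(name.encode("utf-8"), crc)
        crc = zlib.crc32(files[name], crc)
    return format(crc & 0xFFFFFFFF, "08X")


tag_v1 = content_tag({"repo_mlp.py": b"print('v1')"})
tag_v1_again = content_tag({"repo_mlp.py": b"print('v1')"})
tag_v2 = content_tag({"repo_mlp.py": b"print('v2')"})
print(tag_v1, tag_v1_again, tag_v2)
```

Identical inputs give the same tag, while an edited file gives a different one, so any reference to a built image has to be refreshed after each rebuild.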

Build pipeline

Kubeflow Pipelines builds reusable end-to-end machine learning workflows.

Define the pipeline as a Python function. The "@kfp.dsl.pipeline" decorator is required and takes name and description properties.

We define two steps for our training pipeline: scraping issues and training the model. Both are executed in the image we built with Kubeflow Fairing. We also request a GPU and add GCP credentials for each step.


In [10]:
import kfp
import kfp.components as comp
import kfp.gcp as gcp
import kfp.dsl as dsl
import kfp.compiler as compiler

In [11]:
# Update this tag if you build a new image
target_image = 'gcr.io/issue-label-bot-dev/training/fairing-job:5350A3D3'

In [12]:
@dsl.pipeline(
   name='Training pipeline',
   description='A pipeline that loads embeddings and trains a model for a github repo.'
)
def train_pipeline(owner, repo):
    scrape_op = dsl.ContainerOp(
            name='scrape issues',
            image=target_image,
            command=['python', 'issues_loader.py', 'save_issue_embeddings', f'--owner={owner}', f'--repo={repo}'],
            ).set_gpu_limit(1).apply(
                gcp.use_gcp_secret('user-gcp-sa'),
            )
    scrape_op.container.working_dir = '/app'

    train_op = dsl.ContainerOp(
            name='train',
            image=target_image,
            command=['python', 'repo_mlp.py', 'train', f'--owner={owner}', f'--repo={repo}'],
            ).set_gpu_limit(1).apply(
                gcp.use_gcp_secret('user-gcp-sa'),
            )
    train_op.container.working_dir = '/app'
    train_op.after(scrape_op)
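
The call `train_op.after(scrape_op)` declares that training must wait for scraping to finish. As a rough sketch of what that ordering means, the snippet below models the two steps as a dependency graph and asks the standard library for a valid execution order. It assumes Python 3.9 or later for `graphlib` and does not use the Kubeflow Pipelines SDK; the `dependencies` dictionary is our own illustration.

```python
from graphlib import TopologicalSorter

# Each key maps a step to the set of steps that must finish before it starts.
dependencies = {
    "scrape issues": set(),
    "train": {"scrape issues"},
}

# static_order() yields every step after all of its prerequisites.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Without the `after` call the two steps would have no declared dependency, since neither consumes an output of the other.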

Compile the pipeline

We compile our pipeline to an intermediate representation, which is a YAML file compressed in a zip file.


In [13]:
pipeline_func = train_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'
compiler.Compiler().compile(pipeline_func, pipeline_filename)
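
To inspect a compiled package, we can open the zip file and read the YAML inside it. The sketch below builds a tiny stand-in archive in memory so that it runs without the SDK, then reads it back; the member name `pipeline.yaml`, its one-line content, and the `read_package` helper are assumptions for illustration.

```python
import io
import zipfile


def read_package(zip_bytes):
    """Return a {member name: text} mapping for every file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}


# Build a stand-in for the compiled package (assumed layout: one YAML file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("pipeline.yaml", "kind: Workflow\n")

package = read_package(buffer.getvalue())
print(sorted(package))
```

For a real package, the bytes would come from reading the file named by `pipeline_filename` in binary mode.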

Submit the pipeline for execution

We upload the compiled pipeline (the zip file) and run it. We can then see the pipeline and its experiments in the Kubeflow UI.


In [14]:
EXPERIMENT_NAME = 'TrainModel'

client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)


Experiment link here

In [15]:
# Specify pipeline argument values
arguments = {'owner': '', 'repo': ''}

# Submit a pipeline run
run_name = pipeline_func.__name__ + ' run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)


Run link here
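
The arguments in the cell above are left blank and need to be filled in before a real run. A small check like the following sketch can catch empty values before anything is submitted; `missing_arguments` is a hypothetical helper, and the example owner and repo are placeholders.

```python
def missing_arguments(arguments, required=("owner", "repo")):
    """Return the names of required pipeline arguments that are absent or blank."""
    return [name for name in required if not str(arguments.get(name, "")).strip()]


print(missing_arguments({"owner": "", "repo": ""}))
print(missing_arguments({"owner": "kubeflow", "repo": "examples"}))
```

An empty list means every required argument has a value, so the run can be submitted.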
