In this lab, you use the TFX CLI utility to build and deploy a TFX pipeline that uses Kubeflow Pipelines for orchestration, AI Platform for model training, and a managed AI Platform Pipelines instance (Kubeflow Pipelines) running on a Kubernetes cluster for compute. You then create and monitor pipeline runs using both the TFX CLI and the KFP UI.
In [ ]:
import yaml
# Set `PATH` to include the directory containing TFX CLI and skaffold.
PATH=%env PATH
%env PATH=/home/jupyter/.local/bin:{PATH}
In [ ]:
!python -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"
!python -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"
Note: this lab was built and tested with the following package versions:
TFX version: 0.21.4
KFP version: 0.5.1
If running the above commands shows different package versions, or you receive an import error, upgrade to the correct versions by running the cell below:
In [ ]:
%pip install --upgrade --user tfx==0.21.4
%pip install --upgrade --user kfp==0.5.1
Note: you may need to restart the kernel to pick up the correct package versions.
In [ ]:
%cd pipeline
In [ ]:
!ls -la
The config.py module configures the default values for the environment-specific settings and for the pipeline runtime parameters. The default values can be overridden at compile time by providing updated values in a set of environment variables.
The pipeline.py module contains the TFX DSL defining the workflow implemented by the pipeline.
The preprocessing.py module implements the data preprocessing logic for the Transform component.
The model.py module implements the training logic for the Trainer component.
The runner.py module configures and executes KubeflowDagRunner. At compile time, the KubeflowDagRunner.run() method converts the TFX DSL into the pipeline package in the Argo format.
The features.py module contains feature definitions common across preprocessing.py and model.py.
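The override mechanism described for config.py can be sketched as follows. This is a minimal illustration rather than the lab's actual module: the load_settings helper and the DEFAULTS values are assumptions chosen to mirror the constants used later in this notebook.

```python
import os

# Assumed defaults for illustration; the lab's config.py may use other values.
DEFAULTS = {
    "PIPELINE_NAME": "tfx_covertype_continuous_training",
    "MODEL_NAME": "tfx_covertype_classifier",
    "GCP_REGION": "us-central1",
    "RUNTIME_VERSION": "2.1",
    "PYTHON_VERSION": "3.7",
}


def load_settings(environ=None):
    """Return the defaults, overridden by any matching environment variables."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


# Pass a plain dict instead of os.environ to see the override in isolation.
settings = load_settings({"GCP_REGION": "europe-west1"})
print(settings["GCP_REGION"])      # value taken from the environment mapping
print(settings["PYTHON_VERSION"])  # value taken from DEFAULTS
```

Because the lookup happens when the module is loaded, the environment variables must be set before the pipeline is compiled.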
Navigate to the AI Platform Pipelines page in the Google Cloud Console.
1. Create or select an existing Kubernetes cluster (GKE) and deploy AI Platform Pipelines. Make sure to select "Allow access to the following Cloud APIs https://www.googleapis.com/auth/cloud-platform" to allow programmatic access to your pipeline by the Kubeflow SDK for the rest of the lab. Also, provide an App instance name such as "tfx" or "mlops". Note that you may have already deployed an AI Platform Pipelines instance during the setup for the lab series. If so, you can use that instance in the next step.
Validate the deployment of your AI Platform Pipelines instance in the console before proceeding.
2. Configure your environment settings.
Update the constants below with the settings reflecting your lab environment.
GCP_REGION - the compute region for AI Platform Training and Prediction.
ARTIFACT_STORE_URI - the GCS bucket created during installation of AI Platform Pipelines. The bucket name will contain the kubeflowpipelines- prefix.
In [ ]:
# Use the following command to identify the GCS bucket for metadata and pipeline storage.
!gsutil ls
ENDPOINT - set the ENDPOINT constant to the endpoint of your AI Platform Pipelines instance. The endpoint can be found on the AI Platform Pipelines page in the Google Cloud Console. Open the SETTINGS for your instance and use the value of the host variable shown in the "Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK" section of the SETTINGS window. The format is '....[region].pipelines.googleusercontent.com'.
In [ ]:
GCP_REGION = 'us-central1'
ENDPOINT = '490ab949a23d5f6d-dot-us-central2.pipelines.googleusercontent.com'
ARTIFACT_STORE_URI = 'gs://hostedkfp-default-36un4wco1q'
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
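Before using the ENDPOINT value, a quick sanity check can catch copy-paste mistakes such as a pasted URL or stray whitespace. The helper below is a hypothetical sketch, not part of TFX or KFP. It assumes the bare host form used in the sample value above and checks only the host suffix given in the format description.

```python
HOST_SUFFIX = ".pipelines.googleusercontent.com"


def looks_like_kfp_endpoint(endpoint):
    """Return True if the value is a bare host name with the expected suffix."""
    # Reject surrounding whitespace and anything containing a scheme or path.
    if endpoint != endpoint.strip() or "/" in endpoint:
        return False
    # Require a non-empty instance prefix in front of the suffix.
    return endpoint.endswith(HOST_SUFFIX) and len(endpoint) > len(HOST_SUFFIX)


print(looks_like_kfp_endpoint(
    "490ab949a23d5f6d-dot-us-central2.pipelines.googleusercontent.com"))  # True
```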
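You can build and upload the pipeline to the AI Platform Pipelines instance in one step, using the tfx pipeline create command. The tfx pipeline create command goes through the following steps: it builds the custom container image, compiles the pipeline DSL into a pipeline package, and uploads the package to the AI Platform Pipelines instance.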
As you debug the pipeline DSL, you may prefer to first use the tfx pipeline compile command, which only executes the compilation step. After the DSL compiles successfully, you can use tfx pipeline create to go through all steps.
The pipeline can run using the security context of either the GKE default node pool's service account or the service account defined in the user-gcp-sa secret of the Kubernetes namespace hosting Kubeflow Pipelines. If you want to use the user-gcp-sa service account, change the value of USE_KFP_SA to True.
Note that the default AI Platform Pipelines configuration does not define the user-gcp-sa secret.
In [ ]:
PIPELINE_NAME = 'tfx_covertype_continuous_training'
MODEL_NAME = 'tfx_covertype_classifier'
USE_KFP_SA=False
DATA_ROOT_URI = 'gs://workshop-datasets/covertype/small'
CUSTOM_TFX_IMAGE = 'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)
RUNTIME_VERSION = '2.1'
PYTHON_VERSION = '3.7'
In [ ]:
%env PROJECT_ID={PROJECT_ID}
%env KUBEFLOW_TFX_IMAGE={CUSTOM_TFX_IMAGE}
%env ARTIFACT_STORE_URI={ARTIFACT_STORE_URI}
%env DATA_ROOT_URI={DATA_ROOT_URI}
%env GCP_REGION={GCP_REGION}
%env MODEL_NAME={MODEL_NAME}
%env PIPELINE_NAME={PIPELINE_NAME}
%env RUNTIME_VERSION={RUNTIME_VERSION}
%env PYTHON_VERSION={PYTHON_VERSION}
%env USE_KFP_SA={USE_KFP_SA}
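Note that %env stores every value as a string, so USE_KFP_SA reaches the environment as the text 'False', which Python treats as truthy if tested directly. A minimal sketch of reading such a flag safely follows. The helper name env_flag is hypothetical, and how the lab's config.py actually parses the value is not shown here.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Only a few explicit spellings count as "on"; everything else is "off".
    return raw.strip().lower() in ("1", "true", "yes")


print(bool("False"))                                            # True
print(env_flag("USE_KFP_SA", environ={"USE_KFP_SA": "False"}))  # False
```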
In [ ]:
!tfx pipeline compile --engine kubeflow --pipeline_path runner.py
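To confirm what the compile step produced, you can list the contents of the resulting pipeline package. The sketch below assumes the package is a gzipped tar archive named after the pipeline in the current directory. Both the file name and the archive format are assumptions, so check the compile output for the actual path. The list_package_members helper is hypothetical.

```python
import os
import tarfile


def list_package_members(package_path):
    """Return the file names stored in a gzipped tar pipeline package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()


# Assumed file name; adjust it to whatever the compile step reports.
package = "tfx_covertype_continuous_training.tar.gz"
if os.path.exists(package):
    print(list_package_members(package))
```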
After the pipeline code compiles without errors, you can use the tfx pipeline create command to perform the full build and deploy the pipeline. The custom TFX image, e.g. gcr.io/[PROJECT_ID]/tfx_covertype_continuous_training, is built and the pipeline is deployed to run on AI Platform Pipelines with the TFX CLI.
In [ ]:
!tfx pipeline create \
--pipeline_path=runner.py \
--endpoint={ENDPOINT} \
--build_target_image={CUSTOM_TFX_IMAGE}
If you need to redeploy the pipeline, you can first delete the previous version using tfx pipeline delete, or you can update the pipeline in place using tfx pipeline update.
To delete the pipeline:
tfx pipeline delete --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}
To update the pipeline:
tfx pipeline update --pipeline_path runner.py --endpoint {ENDPOINT}
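If you want to issue these commands from a plain Python script rather than a notebook cell, the argument list can be assembled programmatically. The sketch below is one possible approach. The tfx_pipeline_command helper is hypothetical and uses only the flags that appear in this lab.

```python
import shlex


def tfx_pipeline_command(action, **flags):
    """Build the argument list for a `tfx pipeline <action>` invocation."""
    args = ["tfx", "pipeline", action]
    # Flags are emitted in the --name=value form used by the create cell above.
    for name, value in flags.items():
        args.append("--{}={}".format(name, value))
    return args


cmd = tfx_pipeline_command(
    "update", pipeline_path="runner.py", endpoint="[YOUR ENDPOINT]")
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the placeholder endpoint is replaced with a real value.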
1. Trigger a pipeline run using the TFX CLI.
In [ ]:
!tfx run create --pipeline_name={PIPELINE_NAME} --endpoint={ENDPOINT}
2. Trigger a pipeline run from the KFP UI.
On the AI Platform Pipelines page, click OPEN PIPELINES DASHBOARD. A new tab will open. Select the Pipelines tab on the left, where you will see the tfx_covertype_continuous_training pipeline you deployed previously. Click the pipeline name to open a window with a graphical display of your TFX pipeline. Next, click the Create a run button. Verify that the Pipeline name and Pipeline version are pre-populated, and optionally provide a Run name and an Experiment to logically group the run metadata under, before clicking Start.
Note: each full pipeline run takes about 45 minutes to 1 hour. As the pipeline executes, take the time to review the pipeline metadata artifacts created in the GCS storage bucket for each component, including data splits, your TensorFlow SavedModel, and model evaluation results.
To list all active runs of the pipeline:
In [ ]:
!tfx run list --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}
To retrieve the status of a given run:
In [ ]:
RUN_ID='[YOUR RUN ID]'
!tfx run status --pipeline_name {PIPELINE_NAME} --run_id {RUN_ID} --endpoint {ENDPOINT}
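Since a full run can take close to an hour, you may prefer to poll for completion rather than re-running the status cell by hand. The sketch below is a generic polling loop under stated assumptions: wait_for_run is a hypothetical helper, get_status is any callable you supply that returns the current status string, and the set of terminal states is assumed and should be verified against your KFP version.

```python
import time

# Terminal states assumed for illustration; verify them for your KFP version.
TERMINAL_STATES = {"succeeded", "failed", "skipped", "error"}


def wait_for_run(get_status, poll_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll get_status() until a terminal state is seen or the timeout expires."""
    waited = 0
    while True:
        status = get_status()
        if status and status.lower() in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                "run did not finish within {} seconds".format(timeout_seconds))
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with canned statuses and no real waiting.
fake_statuses = iter([None, "Running", "Succeeded"])
print(wait_for_run(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

With the KFP SDK, get_status could be something like a lambda that calls kfp.Client(host=ENDPOINT).get_run(RUN_ID).run.status. Treat that call chain as an assumption to check against the SDK version you have installed.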
In this lab, you learned how to manually build and deploy a TFX pipeline to AI Platform Pipelines and trigger pipeline runs from a notebook. In the next lab, you will construct a Cloud Build CI/CD workflow that automatically builds and deploys this same TFX pipeline.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.