In this lab, you will walk through authoring a Cloud Build CI/CD workflow that automatically builds and deploys the same TFX pipeline from lab-02.ipynb. You will also integrate your workflow with GitHub by setting up a trigger that starts the workflow when a new tag is applied to the GitHub repo hosting the pipeline's code.
In [ ]:
import yaml
# Set `PATH` to include the directory containing TFX CLI.
PATH=%env PATH
%env PATH=/home/jupyter/.local/bin:{PATH}
In [ ]:
!python -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"
Note: this lab was built and tested with the following package versions:
TFX version: 0.21.4
In [ ]:
%pip install --upgrade --user tfx==0.21.4
Note: you may need to restart the kernel to pick up the correct package versions.
Review the cloudbuild.yaml file to understand how the CI/CD workflow is implemented and how environment-specific settings are abstracted using Cloud Build variables.
The Cloud Build CI/CD workflow automates the steps you walked through manually during lab-02:
The Cloud Build workflow configuration uses both standard and custom Cloud Build builders. The custom builder encapsulates the TFX CLI.
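To see which settings a Cloud Build configuration abstracts through variables, it can help to list the substitution variables it references. The following is a minimal sketch, not the lab's actual cloudbuild.yaml: `SAMPLE_CONFIG` and the helper `find_substitutions` are hypothetical and exist only for illustration. It relies on the Cloud Build convention that user-defined substitutions start with an underscore, while built-in ones such as `$PROJECT_ID` and `$TAG_NAME` do not.

```python
import re

# Hypothetical config fragment for illustration only; the lab's actual
# cloudbuild.yaml may differ in steps, names, and arguments.
SAMPLE_CONFIG = """
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/$_TFX_IMAGE_NAME:$TAG_NAME', '.']
- name: 'gcr.io/$PROJECT_ID/tfx-cli'
  args: ['pipeline', 'create', '--engine', 'kubeflow', '--pipeline_path', '$_PIPELINE_DSL', '--endpoint', '$_ENDPOINT']
"""


def find_substitutions(config_text):
    """Return (user_defined, built_in) variable names referenced in the text."""
    # Match both $NAME and ${NAME} forms.
    names = set(re.findall(r"\$\{?([A-Z_][A-Z0-9_]*)\}?", config_text))
    user_defined = sorted(n for n in names if n.startswith("_"))
    built_in = sorted(n for n in names if not n.startswith("_"))
    return user_defined, built_in


user_defined, built_in = find_substitutions(SAMPLE_CONFIG)
print("User-defined:", user_defined)
print("Built-in:", built_in)
```

To inspect the real file instead, you could pass `open('cloudbuild.yaml').read()` to the same helper.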
Navigate to the AI Platform Pipelines page in the Google Cloud Console.
1. Create or select an existing Kubernetes cluster (GKE) and deploy AI Platform Pipelines. Make sure to select "Allow access to the following Cloud APIs https://www.googleapis.com/auth/cloud-platform" so that the Kubeflow SDK has programmatic access to your pipeline for the rest of the lab. Also, provide an App instance name such as "tfx" or "mlops". Note that you may have already deployed an AI Platform Pipelines instance during the Setup for the lab series. If so, you can proceed with that instance in the next step.
Validate the deployment of your AI Platform Pipelines instance in the console before proceeding.
2. Configuring environment settings
Update the below constants with the settings reflecting your lab environment.
- GCP_REGION - the compute region for AI Platform Training and Prediction
- ARTIFACT_STORE_URI - the GCS bucket created during installation of AI Platform Pipelines. The bucket name starts with the kubeflowpipelines- prefix.
In [ ]:
# Use the following command to identify the GCS bucket for metadata and pipeline storage.
!gsutil ls
ENDPOINT - set the ENDPOINT constant to the endpoint of your AI Platform Pipelines instance. The endpoint can be found on the AI Platform Pipelines page in the Google Cloud Console. Open the SETTINGS for your instance and use the value of the host variable in the Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK section of the SETTINGS window. The format is '....[region].pipelines.googleusercontent.com'.
In [ ]:
#TODO: Set your environment settings here for GCP_REGION, ENDPOINT, and ARTIFACT_STORE_URI.
GCP_REGION = ''
ENDPOINT = ''
ARTIFACT_STORE_URI = ''
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
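Because empty or mistyped settings only surface later as build failures, a quick sanity check can save a round trip. The sketch below is one possible check, not part of the lab: `check_settings` is a hypothetical helper, the `gs://` prefix test is an assumption about how ARTIFACT_STORE_URI is written, and the example values are made up.

```python
def check_settings(gcp_region, endpoint, artifact_store_uri):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not gcp_region:
        problems.append("GCP_REGION is empty")
    # The endpoint format given above ends with this host suffix.
    if not endpoint.endswith(".pipelines.googleusercontent.com"):
        problems.append(
            "ENDPOINT should end with .pipelines.googleusercontent.com")
    # Assumption: the artifact store is given as a gs:// URI.
    if not artifact_store_uri.startswith("gs://"):
        problems.append("ARTIFACT_STORE_URI should start with gs://")
    return problems


# Hypothetical values for illustration only.
print(check_settings(
    "us-central1",
    "abc123-dot-us-central1.pipelines.googleusercontent.com",
    "gs://kubeflowpipelines-default",
))
```

In the notebook, you would call `check_settings(GCP_REGION, ENDPOINT, ARTIFACT_STORE_URI)` after filling in the constants.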
1. Review the Dockerfile describing the TFX CLI builder.
In [ ]:
!cat ./tfx-cli/Dockerfile
In [ ]:
!cat ./tfx-cli/requirements.txt
2. Build the image and push it to your project's Container Registry.
Hint: Review the Cloud Build gcloud command line reference for builds submit. Your image should follow the format gcr.io/[PROJECT_ID]/[IMAGE_NAME]:latest. Note the source code for the tfx-cli is in the directory ./tfx-cli.
In [ ]:
# TODO: Your gcloud command here to build tfx-cli and submit to Container Registry.
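If you get stuck, the hint above can be turned into a command along the following lines. This is a sketch under assumptions rather than the reference solution: the image name `tfx-cli` and the helper `build_submit_command` are hypothetical choices, and the command relies on the `--tag` flag of `gcloud builds submit`, which builds the Dockerfile in the source directory and pushes the result under the given image name.

```python
import shlex


def build_submit_command(project_id, image_name="tfx-cli", tag="latest",
                         source_dir="./tfx-cli"):
    """Assemble a `gcloud builds submit` command for the TFX CLI builder."""
    # Image URI in the gcr.io/[PROJECT_ID]/[IMAGE_NAME]:latest format.
    image_uri = "gcr.io/{}/{}:{}".format(project_id, image_name, tag)
    args = ["gcloud", "builds", "submit", "--tag", image_uri, source_dir]
    # Quote each argument so the command is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# "my-project" is a placeholder; use PROJECT_ID in the notebook.
command = build_submit_command("my-project")
print(command)
```

In a notebook cell, the resulting string could then be run with `!{command}`.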
In [ ]:
PIPELINE_NAME='tfx_covertype_continuous_training'
TAG_NAME='test'
TFX_IMAGE_NAME='lab-03-tfx-image'
DATA_ROOT_URI='gs://workshop-datasets/covertype/small'
MODEL_NAME='tfx_covertype_classifier'
PIPELINE_FOLDER='pipeline'
PIPELINE_DSL='runner.py'
RUNTIME_VERSION='2.1'
PYTHON_VERSION='3.7'
USE_KFP_SA='False'
SUBSTITUTIONS="""
_ENDPOINT={},\
_GCP_REGION={},\
_ARTIFACT_STORE_URI={},\
_TFX_IMAGE_NAME={},\
_DATA_ROOT_URI={},\
_MODEL_NAME={},\
TAG_NAME={},\
_PIPELINE_FOLDER={},\
_PIPELINE_DSL={},\
_PIPELINE_NAME={},\
_RUNTIME_VERSION={},\
_USE_KFP_SA={},\
_PYTHON_VERSION={}
""".format(ENDPOINT,
GCP_REGION,
ARTIFACT_STORE_URI,
TFX_IMAGE_NAME,
DATA_ROOT_URI,
MODEL_NAME,
TAG_NAME,
PIPELINE_FOLDER,
PIPELINE_DSL,
PIPELINE_NAME,
RUNTIME_VERSION,
USE_KFP_SA,
PYTHON_VERSION
).strip()
In [ ]:
!gcloud builds submit . --config cloudbuild.yaml --substitutions {SUBSTITUTIONS}
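Positional `format` arguments are easy to misalign with their placeholders when the list is this long. As an alternative, the substitutions string can be built from a mapping so that each name sits next to its value. The sketch below assumes the comma-separated `KEY=VALUE` form that `--substitutions` accepts; `format_substitutions` is a hypothetical helper, and only a few of the variables are shown.

```python
def format_substitutions(values):
    """Join a mapping of substitution names to values as KEY=VALUE,KEY=VALUE."""
    for key, value in values.items():
        # A comma inside a value would be read as a separator between pairs.
        if "," in str(value):
            raise ValueError("Value for {} contains a comma".format(key))
    return ",".join("{}={}".format(k, v) for k, v in values.items())


# A subset of the lab's variables, for illustration.
example = format_substitutions({
    "_PYTHON_VERSION": "3.7",
    "_USE_KFP_SA": "False",
    "TAG_NAME": "test",
})
print(example)
```

The resulting string can be passed to `gcloud builds submit` in the same way as `SUBSTITUTIONS`.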
In this exercise, you integrate your CI/CD workflow with GitHub, using the Cloud Build GitHub App. You will set up a trigger that starts the CI/CD workflow when a new tag is applied to the GitHub repo managing the pipeline source code. You will use a fork of this repo as your source GitHub repository.
1. Follow the GitHub documentation to fork this repo.
2. Create a Cloud Build trigger.
Connect the fork you created in the previous step to your Google Cloud project and create a trigger following the steps in the Creating GitHub app trigger article. Use the following values on the Edit trigger form:
| Field | Value |
|---|---|
| Name | [YOUR TRIGGER NAME] |
| Description | [YOUR TRIGGER DESCRIPTION] |
| Event | Tag |
| Source | [YOUR FORK] |
| Tag (regex) | .* |
| Build Configuration | Cloud Build configuration file (yaml or json) |
| Cloud Build configuration file location | workshops/tfx-caip-tf21/lab-03-tfx-cicd/cloudbuild.yaml |
Use the following values for the substitution variables:
| Variable | Value |
|---|---|
| _ENDPOINT | [Your inverting proxy host] |
| _TFX_IMAGE_NAME | lab-03-tfx-image |
| _PIPELINE_NAME | tfx_covertype_continuous_training |
| _PIPELINE_DSL | runner.py |
| _DATA_ROOT_URI | gs://workshop-datasets/covertype/small |
| _PIPELINE_FOLDER | workshops/tfx-caip-tf21/lab-03-tfx-cicd/pipeline |
| _PYTHON_VERSION | 3.7 |
| _RUNTIME_VERSION | 2.1 |
| _USE_KFP_SA | False |
3. Trigger the build.
To start an automated build, create a new release of the repo in GitHub. Alternatively, you can start the build by applying a tag using git.
git tag [TAG NAME]
git push origin --tags
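The trigger's Tag (regex) value of `.*` accepts any tag name, so every pushed tag starts a build. If you later want to restrict builds to release-style tags, you can try candidate patterns locally first. The sketch below is only an approximation: Cloud Build evaluates the pattern with its own regex engine, the whole-name matching used here is an assumption, and `tags_matching` and the sample tags are hypothetical.

```python
import re


def tags_matching(pattern, tags):
    """Return the tags whose full name matches the pattern."""
    compiled = re.compile(pattern)
    return [t for t in tags if compiled.fullmatch(t)]


# Made-up tag names for illustration.
tags = ["v1.0.0", "test", "v2.1.0-rc1"]

all_tags = tags_matching(r".*", tags)
release_tags = tags_matching(r"v\d+\.\d+\.\d+", tags)
print(all_tags)
print(release_tags)
```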
4. Verify the triggered build in the Cloud Build dashboard.
After the build finishes on the Cloud Build dashboard, return to AI Platform Pipelines in the console. Click OPEN PIPELINES DASHBOARD and view the newly deployed pipeline. Submitting the build manually from this notebook creates a pipeline named tfx_covertype_continuous_training-[TAG NAME], while triggering it from GitHub creates a pipeline named tfx_covertype_continuous_training_github-[TAG NAME].
In this lab, you walked through authoring a Cloud Build CI/CD workflow that automatically builds and deploys a TFX pipeline. You also integrated your TFX workflow with GitHub by setting up a Cloud Build trigger. In the next lab, you will walk through inspection of TFX metadata and pipeline artifacts created during TFX pipeline runs.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.