Kubeflow Fairing E2E MNIST Case: Building, Training and Serving

This example guides you through:

  1. Taking an example TensorFlow model and modifying it to support distributed training.
  2. Using Kubeflow Fairing to build a Docker image and launch a TFJob to train the model.
  3. Using Kubeflow Fairing to create an InferenceService (KFServing) to deploy the trained model.
  4. Cleaning up the TFJob and InferenceService using the kubeflow-tfjob and kfserving SDK clients.

Requirements

  • TF-Operator and KFServing are installed in the Kubernetes cluster.

Prepare Training Code

We modified the existing distributed MNIST examples so that they are better suited to distributed training and model serving; the originals need some changes before they run well as a TFJob. The updated training code is in mnist.py.
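As background for what "running well as a TFJob" involves: TF-Operator passes the cluster layout to each replica through the TF_CONFIG environment variable. The sketch below shows one way a training script might read it. It is an illustration only, not the code in mnist.py; the helper names (parse_tf_config, is_chief) and the host names in the sample are hypothetical.

```python
import json
import os


def parse_tf_config(env=None):
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    Falls back to a single local worker when TF_CONFIG is unset, so the
    same script can also run outside a TFJob.
    """
    env = os.environ if env is None else env
    tf_config = json.loads(env.get("TF_CONFIG") or "{}")
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    return cluster, task.get("type", "worker"), int(task.get("index", 0))


def is_chief(cluster, task_type, task_index):
    """Pick the replica that should export the model.

    If the cluster declares a 'chief' or 'master' replica, use it;
    otherwise treat worker 0 as the chief.
    """
    if "chief" in cluster or "master" in cluster:
        return task_type in ("chief", "master")
    return task_type == "worker" and task_index == 0


# Hypothetical TF_CONFIG value of the kind a TFJob replica might receive.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "ps": ["mnist-ps-0:2222"],
            "worker": ["mnist-worker-0:2222", "mnist-worker-1:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}
cluster, task_type, task_index = parse_tf_config(example_env)
print(task_type, task_index, is_chief(cluster, task_type, task_index))
```

Letting only one replica export the model avoids several pods writing to the same directory on the shared volume.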

Install Required Libraries


In [1]:
!pip install git+git://github.com/kubeflow/fairing.git@dc61c4c88f233edaf22b13bbfb184ded0ed877a4


Collecting git+git://github.com/kubeflow/fairing.git@dc61c4c88f233edaf22b13bbfb184ded0ed877a4
  Cloning git://github.com/kubeflow/fairing.git (to revision dc61c4c88f233edaf22b13bbfb184ded0ed877a4) to /tmp/pip-req-build-yqjo1vet
  Running command git clone -q git://github.com/kubeflow/fairing.git /tmp/pip-req-build-yqjo1vet
Requirement already satisfied (use --upgrade to upgrade): kubeflow-fairing==0.7.1 from git+git://github.com/kubeflow/fairing.git@dc61c4c88f233edaf22b13bbfb184ded0ed877a4 in /opt/python/python36/lib/python3.6/site-packages
Requirement already satisfied: python-dateutil<=2.8.0,>=2.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (2.8.0)
Requirement already satisfied: numpy>=1.17.3 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.18.1)
Requirement already satisfied: kfserving>=0.2.1.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (0.2.2.1)
Requirement already satisfied: docker>=3.4.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (4.2.0)
Requirement already satisfied: notebook>=5.6.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (6.0.3)
Requirement already satisfied: kubernetes>=10.0.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (10.0.1)
Requirement already satisfied: future>=0.17.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (0.18.2)
Requirement already satisfied: six>=1.11.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.14.0)
Requirement already satisfied: google-cloud-storage>=1.13.2 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.26.0)
Requirement already satisfied: google-cloud-logging>=1.13.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.14.0)
Requirement already satisfied: requests>=2.21.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (2.22.0)
Requirement already satisfied: setuptools>=34.0.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (45.2.0)
Requirement already satisfied: google-auth>=1.6.2 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.11.1)
Requirement already satisfied: httplib2>=0.12.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (0.17.0)
Requirement already satisfied: oauth2client>=4.0.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (4.1.3)
Requirement already satisfied: tornado>=6.0.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (6.0.3)
Requirement already satisfied: google-api-python-client>=1.7.8 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.7.11)
Requirement already satisfied: cloudpickle>=0.8 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.3.0)
Requirement already satisfied: urllib3==1.24.2 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.24.2)
Requirement already satisfied: boto3>=1.9.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.11.17)
Requirement already satisfied: azure>=4.0.0 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (4.0.0)
Requirement already satisfied: retrying>=1.3.3 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (1.3.3)
Requirement already satisfied: kubeflow-tfjob>=0.1.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (0.1.3)
Requirement already satisfied: kubeflow-pytorchjob>=0.1.1 in /opt/python/python36/lib/python3.6/site-packages (from kubeflow-fairing==0.7.1) (0.1.3)
Requirement already satisfied: argparse>=1.4.0 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.4.0)
Requirement already satisfied: table-logger>=0.3.5 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (0.3.6)
Requirement already satisfied: certifi>=14.05.14 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (2019.11.28)
Requirement already satisfied: azure-storage-blob<=2.1.0,>=1.3.0 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.5.0)
Requirement already satisfied: adal>=1.2.2 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.2.2)
Requirement already satisfied: minio>=4.0.9 in /opt/python/python36/lib/python3.6/site-packages (from kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (5.0.7)
Requirement already satisfied: websocket-client>=0.32.0 in /opt/python/python36/lib/python3.6/site-packages (from docker>=3.4.1->kubeflow-fairing==0.7.1) (0.57.0)
Requirement already satisfied: jupyter-core>=4.6.1 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (4.6.2)
Requirement already satisfied: ipython-genutils in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: terminado>=0.8.1 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.8.3)
Requirement already satisfied: traitlets>=4.2.1 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (4.3.3)
Requirement already satisfied: Send2Trash in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (1.5.0)
Requirement already satisfied: jinja2 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (2.11.1)
Requirement already satisfied: prometheus-client in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.7.1)
Requirement already satisfied: nbconvert in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (5.6.1)
Requirement already satisfied: pyzmq>=17 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (18.1.1)
Requirement already satisfied: ipykernel in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (5.1.4)
Requirement already satisfied: jupyter-client>=5.3.4 in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (5.3.4)
Requirement already satisfied: nbformat in /opt/python/python36/lib/python3.6/site-packages (from notebook>=5.6.0->kubeflow-fairing==0.7.1) (5.0.4)
Requirement already satisfied: requests-oauthlib in /opt/python/python36/lib/python3.6/site-packages (from kubernetes>=10.0.1->kubeflow-fairing==0.7.1) (1.3.0)
Requirement already satisfied: pyyaml>=3.12 in /opt/python/python36/lib/python3.6/site-packages (from kubernetes>=10.0.1->kubeflow-fairing==0.7.1) (5.3)
Requirement already satisfied: google-resumable-media<0.6dev,>=0.5.0 in /opt/python/python36/lib/python3.6/site-packages (from google-cloud-storage>=1.13.2->kubeflow-fairing==0.7.1) (0.5.0)
Requirement already satisfied: google-cloud-core<2.0dev,>=1.2.0 in /opt/python/python36/lib/python3.6/site-packages (from google-cloud-storage>=1.13.2->kubeflow-fairing==0.7.1) (1.3.0)
Requirement already satisfied: google-api-core[grpc]<2.0.0dev,>=1.14.0 in /opt/python/python36/lib/python3.6/site-packages (from google-cloud-logging>=1.13.0->kubeflow-fairing==0.7.1) (1.16.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/python/python36/lib/python3.6/site-packages (from requests>=2.21.0->kubeflow-fairing==0.7.1) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /opt/python/python36/lib/python3.6/site-packages (from requests>=2.21.0->kubeflow-fairing==0.7.1) (2.8)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/python/python36/lib/python3.6/site-packages (from google-auth>=1.6.2->kubeflow-fairing==0.7.1) (0.2.8)
Requirement already satisfied: rsa<4.1,>=3.1.4 in /opt/python/python36/lib/python3.6/site-packages (from google-auth>=1.6.2->kubeflow-fairing==0.7.1) (4.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from google-auth>=1.6.2->kubeflow-fairing==0.7.1) (4.0.0)
Requirement already satisfied: pyasn1>=0.1.7 in /opt/python/python36/lib/python3.6/site-packages (from oauth2client>=4.0.0->kubeflow-fairing==0.7.1) (0.4.8)
Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /opt/python/python36/lib/python3.6/site-packages (from google-api-python-client>=1.7.8->kubeflow-fairing==0.7.1) (3.0.1)
Requirement already satisfied: google-auth-httplib2>=0.0.3 in /opt/python/python36/lib/python3.6/site-packages (from google-api-python-client>=1.7.8->kubeflow-fairing==0.7.1) (0.0.3)
Requirement already satisfied: botocore<1.15.0,>=1.14.17 in /opt/python/python36/lib/python3.6/site-packages (from boto3>=1.9.0->kubeflow-fairing==0.7.1) (1.14.17)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/python/python36/lib/python3.6/site-packages (from boto3>=1.9.0->kubeflow-fairing==0.7.1) (0.9.4)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /opt/python/python36/lib/python3.6/site-packages (from boto3>=1.9.0->kubeflow-fairing==0.7.1) (0.3.3)
Requirement already satisfied: azure-graphrbac~=0.40.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.40.0)
Requirement already satisfied: azure-servicemanagement-legacy~=0.20.6 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.20.6)
Requirement already satisfied: azure-storage-queue~=1.3 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (1.4.0)
Requirement already satisfied: azure-eventgrid~=1.1 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (1.3.0)
Requirement already satisfied: azure-servicebus~=0.21.1 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.21.1)
Requirement already satisfied: azure-cosmosdb-table~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.6)
Requirement already satisfied: azure-batch~=4.1 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (4.1.3)
Requirement already satisfied: azure-datalake-store~=0.0.18 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.0.48)
Requirement already satisfied: azure-servicefabric~=6.3.0.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (6.3.0.0)
Requirement already satisfied: azure-mgmt~=4.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (4.0.0)
Requirement already satisfied: azure-applicationinsights~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-loganalytics~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-storage-file~=1.3 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (1.4.0)
Requirement already satisfied: azure-keyvault~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure>=4.0.0->kubeflow-fairing==0.7.1) (1.1.0)
Requirement already satisfied: azure-storage-common~=1.4 in /opt/python/python36/lib/python3.6/site-packages (from azure-storage-blob<=2.1.0,>=1.3.0->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.4.2)
Requirement already satisfied: azure-common>=1.1.5 in /opt/python/python36/lib/python3.6/site-packages (from azure-storage-blob<=2.1.0,>=1.3.0->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.1.24)
Requirement already satisfied: cryptography>=1.1.0 in /opt/python/python36/lib/python3.6/site-packages (from adal>=1.2.2->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (2.8)
Requirement already satisfied: PyJWT>=1.0.0 in /opt/python/python36/lib/python3.6/site-packages (from adal>=1.2.2->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (1.7.1)
Requirement already satisfied: configparser in /opt/python/python36/lib/python3.6/site-packages (from minio>=4.0.9->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (4.0.2)
Requirement already satisfied: pytz in /opt/python/python36/lib/python3.6/site-packages (from minio>=4.0.9->kfserving>=0.2.1.1->kubeflow-fairing==0.7.1) (2019.3)
Requirement already satisfied: ptyprocess; os_name != "nt" in /opt/python/python36/lib/python3.6/site-packages (from terminado>=0.8.1->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.6.0)
Requirement already satisfied: decorator in /opt/python/python36/lib/python3.6/site-packages (from traitlets>=4.2.1->notebook>=5.6.0->kubeflow-fairing==0.7.1) (4.4.1)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/python/python36/lib/python3.6/site-packages (from jinja2->notebook>=5.6.0->kubeflow-fairing==0.7.1) (1.1.1)
Requirement already satisfied: testpath in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.4.4)
Requirement already satisfied: bleach in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (3.1.0)
Requirement already satisfied: entrypoints>=0.2.2 in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.3)
Requirement already satisfied: mistune<2,>=0.8.1 in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.8.4)
Requirement already satisfied: defusedxml in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.6.0)
Requirement already satisfied: pygments in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (2.5.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /opt/python/python36/lib/python3.6/site-packages (from nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (1.4.2)
Requirement already satisfied: ipython>=5.0.0 in /opt/python/python36/lib/python3.6/site-packages (from ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (7.12.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /opt/python/python36/lib/python3.6/site-packages (from nbformat->notebook>=5.6.0->kubeflow-fairing==0.7.1) (3.2.0)
Requirement already satisfied: oauthlib>=3.0.0 in /opt/python/python36/lib/python3.6/site-packages (from requests-oauthlib->kubernetes>=10.0.1->kubeflow-fairing==0.7.1) (3.1.0)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /opt/python/python36/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.14.0->google-cloud-logging>=1.13.0->kubeflow-fairing==0.7.1) (1.51.0)
Requirement already satisfied: protobuf>=3.4.0 in /opt/python/python36/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.14.0->google-cloud-logging>=1.13.0->kubeflow-fairing==0.7.1) (3.11.3)
Requirement already satisfied: grpcio<2.0dev,>=1.8.2; extra == "grpc" in /opt/python/python36/lib/python3.6/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.14.0->google-cloud-logging>=1.13.0->kubeflow-fairing==0.7.1) (1.27.2)
Requirement already satisfied: docutils<0.16,>=0.10 in /opt/python/python36/lib/python3.6/site-packages (from botocore<1.15.0,>=1.14.17->boto3>=1.9.0->kubeflow-fairing==0.7.1) (0.15.2)
Requirement already satisfied: msrestazure<2.0.0,>=0.4.20 in /opt/python/python36/lib/python3.6/site-packages (from azure-graphrbac~=0.40.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.6.2)
Requirement already satisfied: azure-nspkg>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-graphrbac~=0.40.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.0.2)
Requirement already satisfied: msrest>=0.5.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-eventgrid~=1.1->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.6.11)
Requirement already satisfied: azure-cosmosdb-nspkg>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-cosmosdb-table~=1.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.2)
Requirement already satisfied: cffi in /opt/python/python36/lib/python3.6/site-packages (from azure-datalake-store~=0.0.18->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.14.0)
Requirement already satisfied: azure-mgmt-servicebus~=0.5.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.5.3)
Requirement already satisfied: azure-mgmt-compute~=4.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (4.6.2)
Requirement already satisfied: azure-mgmt-trafficmanager~=0.50.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.50.0)
Requirement already satisfied: azure-mgmt-recoveryservices~=0.3.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.3.0)
Requirement already satisfied: azure-mgmt-managementgroups~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-servicefabric~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-monitor~=0.5.2 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.5.2)
Requirement already satisfied: azure-mgmt-subscription~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-sql~=0.9.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.9.1)
Requirement already satisfied: azure-mgmt-search~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.1.0)
Requirement already satisfied: azure-mgmt-media~=1.0.0rc2 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.0)
Requirement already satisfied: azure-mgmt-cosmosdb~=0.4.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.4.1)
Requirement already satisfied: azure-mgmt-datafactory~=0.6.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.6.0)
Requirement already satisfied: azure-mgmt-marketplaceordering~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-containerregistry~=2.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.8.0)
Requirement already satisfied: azure-mgmt-consumption~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.0)
Requirement already satisfied: azure-mgmt-iothubprovisioningservices~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-signalr~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.1)
Requirement already satisfied: azure-mgmt-rdbms~=1.2 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.9.0)
Requirement already satisfied: azure-mgmt-dns~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.1.0)
Requirement already satisfied: azure-mgmt-datamigration~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.0)
Requirement already satisfied: azure-mgmt-msi~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-advisor~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.1)
Requirement already satisfied: azure-mgmt-iothub~=0.5.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.5.0)
Requirement already satisfied: azure-mgmt-relay~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-datalake-store~=0.5.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.5.0)
Requirement already satisfied: azure-mgmt-managementpartner~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.1)
Requirement already satisfied: azure-mgmt-storage~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.0)
Requirement already satisfied: azure-mgmt-maps~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-billing~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-commerce~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.1)
Requirement already satisfied: azure-mgmt-applicationinsights~=0.1.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.1)
Requirement already satisfied: azure-mgmt-cognitiveservices~=3.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.0.0)
Requirement already satisfied: azure-mgmt-eventhub~=2.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.6.0)
Requirement already satisfied: azure-mgmt-authorization~=0.50.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.50.0)
Requirement already satisfied: azure-mgmt-eventgrid~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.0.0)
Requirement already satisfied: azure-mgmt-network~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.7.0)
Requirement already satisfied: azure-mgmt-powerbiembedded~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.0)
Requirement already satisfied: azure-mgmt-containerinstance~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.5.0)
Requirement already satisfied: azure-mgmt-logic~=3.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.0.0)
Requirement already satisfied: azure-mgmt-hanaonazure~=0.1.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.1)
Requirement already satisfied: azure-mgmt-reservations~=0.2.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.1)
Requirement already satisfied: azure-mgmt-devspaces~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-cdn~=3.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.1.0)
Requirement already satisfied: azure-mgmt-batch~=5.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (5.0.1)
Requirement already satisfied: azure-mgmt-iotcentral~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-scheduler~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.0)
Requirement already satisfied: azure-mgmt-loganalytics~=0.2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.2.0)
Requirement already satisfied: azure-mgmt-notificationhubs~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.1.0)
Requirement already satisfied: azure-mgmt-redis~=5.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (5.0.0)
Requirement already satisfied: azure-mgmt-resource~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.2.0)
Requirement already satisfied: azure-mgmt-machinelearningcompute~=0.4.1 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.4.1)
Requirement already satisfied: azure-mgmt-devtestlabs~=2.2 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.2.0)
Requirement already satisfied: azure-mgmt-keyvault~=1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (1.1.0)
Requirement already satisfied: azure-mgmt-policyinsights~=0.1.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: azure-mgmt-web~=0.35.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.35.0)
Requirement already satisfied: azure-mgmt-batchai~=2.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.0.0)
Requirement already satisfied: azure-mgmt-recoveryservicesbackup~=0.3.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.3.0)
Requirement already satisfied: azure-mgmt-datalake-analytics~=0.6.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.6.0)
Requirement already satisfied: azure-mgmt-containerservice~=4.2 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (4.4.0)
Requirement already satisfied: webencodings in /opt/python/python36/lib/python3.6/site-packages (from bleach->nbconvert->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.5.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (3.0.3)
Requirement already satisfied: backcall in /opt/python/python36/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.1.0)
Requirement already satisfied: jedi>=0.10 in /opt/python/python36/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.16.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /opt/python/python36/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (4.8.0)
Requirement already satisfied: pickleshare in /opt/python/python36/lib/python3.6/site-packages (from ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.7.5)
Requirement already satisfied: attrs>=17.4.0 in /opt/python/python36/lib/python3.6/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->kubeflow-fairing==0.7.1) (19.3.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /opt/python/python36/lib/python3.6/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.15.7)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /opt/python/python36/lib/python3.6/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->kubeflow-fairing==0.7.1) (1.5.0)
Requirement already satisfied: isodate>=0.6.0 in /opt/python/python36/lib/python3.6/site-packages (from msrest>=0.5.0->azure-eventgrid~=1.1->azure>=4.0.0->kubeflow-fairing==0.7.1) (0.6.0)
Requirement already satisfied: pycparser in /opt/python/python36/lib/python3.6/site-packages (from cffi->azure-datalake-store~=0.0.18->azure>=4.0.0->kubeflow-fairing==0.7.1) (2.19)
Requirement already satisfied: azure-mgmt-nspkg>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt-trafficmanager~=0.50.0->azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.0.2)
Requirement already satisfied: azure-mgmt-datalake-nspkg>=2.0.0 in /opt/python/python36/lib/python3.6/site-packages (from azure-mgmt-datalake-store~=0.5.0->azure-mgmt~=4.0->azure>=4.0.0->kubeflow-fairing==0.7.1) (3.0.1)
Requirement already satisfied: wcwidth in /opt/python/python36/lib/python3.6/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.1.8)
Requirement already satisfied: parso>=0.5.2 in /opt/python/python36/lib/python3.6/site-packages (from jedi>=0.10->ipython>=5.0.0->ipykernel->notebook>=5.6.0->kubeflow-fairing==0.7.1) (0.6.1)
Requirement already satisfied: zipp>=0.5 in /opt/python/python36/lib/python3.6/site-packages (from importlib-metadata; python_version < "3.8"->jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->kubeflow-fairing==0.7.1) (2.2.0)
Building wheels for collected packages: kubeflow-fairing
  Building wheel for kubeflow-fairing (setup.py) ... done
  Created wheel for kubeflow-fairing: filename=kubeflow_fairing-0.7.1-py3-none-any.whl size=154861 sha256=4b92aa1c6d22a629ae5c766c294641ee2ccd1763d963d006860e8dce554182a7
  Stored in directory: /root/.cache/pip/wheels/10/9f/7c/dda9d45fc21712d6ee8be6592da856aba0afe96abc0bcf6099
Successfully built kubeflow-fairing

In [2]:
import yaml
from importlib import reload
# Force a reload of kubeflow; since kubeflow is a multi namespace module
# it looks like doing this in notebook_setup may not be sufficient
import kubeflow
reload(kubeflow)


Out[2]:
<module 'kubeflow' from '/opt/python/python36/lib/python3.6/site-packages/kubeflow/__init__.py'>

Configure The Docker Registry For Kubeflow Fairing

  • To build Docker images from your notebook, you need a Docker registry where the images will be stored.

Note: Update the values in the cell below to match your environment.


In [3]:
# Set the Docker registry where the image will be stored.
# Ensure you have permission to push images to this registry.
DOCKER_REGISTRY = 'index.docker.io/jinchi'

# Set the namespace. Note that the PVC created below must be in this namespace.
my_namespace = 'hejinchi'
# You can also get the default target namespace using the API below.
#from kubeflow.fairing import utils as fairing_utils
#my_namespace = fairing_utils.get_default_target_namespace()
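
Because these values are hand-edited, a quick sanity check before building can catch typos early. The sketch below is only an illustration: `is_valid_namespace` and `check_settings` are hypothetical helper names, not part of Kubeflow Fairing. It relies on the Kubernetes rule that namespace names must be RFC 1123 DNS labels.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics or '-', starting and ending
# with an alphanumeric character, at most 63 characters in total.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_namespace(name):
    """Return True if `name` is a syntactically valid namespace name."""
    return _DNS_LABEL.fullmatch(name) is not None


def check_settings(registry, namespace):
    """Collect readable problems; an empty list means the values look sane."""
    problems = []
    # Assumption for this sketch: a usable registry string is non-empty
    # and contains no whitespace.
    if not registry or any(ch.isspace() for ch in registry):
        problems.append("registry must be non-empty and contain no whitespace")
    if not is_valid_namespace(namespace):
        problems.append(f"namespace {namespace!r} is not a valid DNS label")
    return problems


print(check_settings("index.docker.io/jinchi", "hejinchi"))  # prints []
```

This only checks syntax. It does not verify that the registry is reachable or that the namespace exists in the cluster.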

Create PV/PVC to Store The Exported Model

Create a Persistent Volume (PV) and a Persistent Volume Claim (PVC). The PVC is used by the training and serving pods, in local mode, in the steps below.

Note: Update the values in the cell below to match your environment.


In [4]:
# For distributed training, the PVC must be accessible from all nodes in the cluster.
# This example creates an NFS PV to satisfy that requirement.
nfs_server = '172.16.189.69'
nfs_path = '/opt/kubeflow/data/mnist'
pv_name = 'mnist-e2e-pv'
pvc_name = 'mnist-e2e-pvc'

Skip the PV/PVC creation step below if you are using an existing PV and PVC.


In [5]:
from kubernetes import client as k8s_client
from kubernetes import config as k8s_config
from kubeflow.fairing.utils import is_running_in_k8s

pv_yaml = f'''
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {pv_name}
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: {nfs_path}
    server: {nfs_server}
'''
pvc_yaml = f'''
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {pvc_name}
  namespace: {my_namespace}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
'''

if is_running_in_k8s():
    k8s_config.load_incluster_config()
else:
    k8s_config.load_kube_config()

k8s_core_api = k8s_client.CoreV1Api()
k8s_core_api.create_persistent_volume(yaml.safe_load(pv_yaml))
k8s_core_api.create_namespaced_persistent_volume_claim(my_namespace, yaml.safe_load(pvc_yaml))


Out[5]:
{'api_version': 'v1',
 'kind': 'PersistentVolumeClaim',
 'metadata': {'annotations': None,
              'cluster_name': None,
              'creation_timestamp': datetime.datetime(2020, 2, 24, 4, 53, 57, tzinfo=tzutc()),
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': ['kubernetes.io/pvc-protection'],
              'generate_name': None,
              'generation': None,
              'initializers': None,
              'labels': None,
              'managed_fields': None,
              'name': 'mnist-e2e-pvc',
              'namespace': 'hejinchi',
              'owner_references': None,
              'resource_version': '5761749',
              'self_link': '/api/v1/namespaces/hejinchi/persistentvolumeclaims/mnist-e2e-pvc',
              'uid': '3c81bc0d-2879-4afa-850b-86a56f17c2b9'},
 'spec': {'access_modes': ['ReadWriteMany'],
          'data_source': None,
          'resources': {'limits': None, 'requests': {'storage': '10Gi'}},
          'selector': None,
          'storage_class_name': '',
          'volume_mode': 'Filesystem',
          'volume_name': None},
 'status': {'access_modes': None,
            'capacity': None,
            'conditions': None,
            'phase': 'Pending'}}
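
The cell above renders YAML strings and converts them to dicts with `yaml.safe_load` before passing them to the Kubernetes client. As an alternative sketch, the same manifests can be built directly as Python dicts, which avoids YAML indentation mistakes. The function names `build_nfs_pv` and `build_pvc` are hypothetical. The fields mirror the YAML used above.

```python
def build_nfs_pv(name, server, path, storage="10Gi"):
    """Return a PersistentVolume manifest backed by an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": storage},
            # ReadWriteMany lets pods on different nodes mount the volume.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"path": path, "server": server},
        },
    }


def build_pvc(name, namespace, storage="10Gi"):
    """Return a PersistentVolumeClaim manifest matching the PV above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # An empty storage class disables dynamic provisioning.
            "storageClassName": "",
            "resources": {"requests": {"storage": storage}},
        },
    }


pv = build_nfs_pv("mnist-e2e-pv", "172.16.189.69", "/opt/kubeflow/data/mnist")
pvc = build_pvc("mnist-e2e-pvc", "hejinchi")
print(pv["spec"]["nfs"], pvc["metadata"]["namespace"])
```

The resulting dicts should be usable in place of the `yaml.safe_load(...)` results passed to `create_persistent_volume` and `create_namespaced_persistent_volume_claim` above.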

Use Kubeflow Fairing to Build the Docker Image and Launch a TFJob for Training

  • Use Kubeflow Fairing to build a Docker image that includes all your dependencies.
  • Launch a TFJob in the on-premises cluster to train the model.

First, set some custom training parameters for the TFJob.


In [6]:
num_ps = 1  # number of parameter servers (PS) in the TFJob
num_workers = 2  # number of workers in the TFJob
model_dir = "/mnt"
export_path = "/mnt/export" 
train_steps = "200"
batch_size = "100"
learning_rate = "0.01"

Use Kubeflow Fairing to build a Docker image and push it to the Docker registry, then launch a TFJob in the on-premises cluster to train the model in a distributed fashion.


In [7]:
import uuid
from kubeflow import fairing
from kubeflow.fairing.kubernetes.utils import mounting_pvc

# Append a short random suffix so repeated runs do not collide on the job name.
tfjob_name = f'mnist-training-{uuid.uuid4().hex[:4]}'

# Files to add to the Docker build context (source path -> path in the context).
output_map = {
    "Dockerfile": "Dockerfile",
    "mnist.py": "mnist.py"
}

# Command run by every TFJob replica; the values come from the parameters cell above.
command = ["python",
           "/opt/mnist.py",
           "--tf-model-dir=" + model_dir,
           "--tf-export-dir=" + export_path,
           "--tf-train-steps=" + train_steps,
           "--tf-batch-size=" + batch_size,
           "--tf-learning-rate=" + learning_rate]

fairing.config.set_preprocessor('python', command=command, path_prefix="/app", output_map=output_map)
# Build the image from the local Dockerfile and push it to DOCKER_REGISTRY.
fairing.config.set_builder(name='docker', registry=DOCKER_REGISTRY, base_image="",
                           image_name="mnist", dockerfile_path="Dockerfile")
# Launch a TFJob and mount the PVC at model_dir so all replicas share checkpoints and exports.
fairing.config.set_deployer(name='tfjob', namespace=my_namespace, stream_log=False,
                            worker_count=num_workers, ps_count=num_ps, job_name=tfjob_name,
                            pod_spec_mutators=[mounting_pvc(pvc_name=pvc_name, pvc_mount_path=model_dir)])
fairing.config.run()


[I 200223 20:53:57 config:125] Using preprocessor: <kubeflow.fairing.preprocessors.base.BasePreProcessor object at 0x7f42a6ef96a0>
[I 200223 20:53:57 config:127] Using builder: <kubeflow.fairing.builders.docker.docker.DockerBuilder object at 0x7f42dc4f6eb8>
[I 200223 20:53:57 config:129] Using deployer: <kubeflow.fairing.deployers.tfjob.tfjob.TfJob object at 0x7f42a6ef95f8>
[I 200223 20:53:57 docker:32] Building image using docker
[W 200223 20:53:57 docker:41] Docker command: ['python', '/opt/mnist.py', '--tf-model-dir=/mnt', '--tf-export-dir=/mnt/export', '--tf-train-steps=200', '--tf-batch-size=100', '--tf-learning-rate=0.01']
[I 200223 20:53:57 base:107] Creating docker context: /tmp/fairing_context_iu67hth0
[W 200223 20:53:57 docker:56] Building docker image index.docker.io/jinchi/mnist:54A2DC37...
[I 200223 20:53:57 docker:103] Build output: Step 1/5 : FROM tensorflow/tensorflow:1.15.2-py3
[I 200223 20:53:57 docker:103] Build output: 
[I 200223 20:53:57 docker:103] Build output: ---> b2b972268a17
[I 200223 20:53:57 docker:103] Build output: Step 2/5 : ADD mnist.py /opt/mnist.py
[I 200223 20:53:57 docker:103] Build output: 
[I 200223 20:53:58 docker:103] Build output: ---> Using cache
[I 200223 20:53:58 docker:103] Build output: ---> 57336dff2275
[I 200223 20:53:58 docker:103] Build output: Step 3/5 : RUN chmod +x /opt/mnist.py
[I 200223 20:53:58 docker:103] Build output: 
[I 200223 20:53:58 docker:103] Build output: ---> Using cache
[I 200223 20:53:58 docker:103] Build output: ---> b4ea1c1ee7a5
[I 200223 20:53:58 docker:103] Build output: Step 4/5 : ENTRYPOINT ["/usr/bin/python"]
[I 200223 20:53:58 docker:103] Build output: 
[I 200223 20:53:58 docker:103] Build output: ---> Using cache
[I 200223 20:53:58 docker:103] Build output: ---> e9382e66f5cf
[I 200223 20:53:58 docker:103] Build output: Step 5/5 : CMD ["/opt/mnist.py"]
[I 200223 20:53:58 docker:103] Build output: 
[I 200223 20:53:58 docker:103] Build output: ---> Using cache
[I 200223 20:53:58 docker:103] Build output: ---> d3b32c13f461
[I 200223 20:53:58 docker:103] Push finished: {'ID': 'sha256:d3b32c13f461a3ff7266b9c89c9261e1defdad3234309e42869ddc11be1831d6'}
[I 200223 20:53:58 docker:103] Build output: Successfully built d3b32c13f461
[I 200223 20:53:58 docker:103] Build output: Successfully tagged jinchi/mnist:54A2DC37
[W 200223 20:53:58 docker:70] Publishing image index.docker.io/jinchi/mnist:54A2DC37...
[I 200223 20:53:58 docker:103] Push output: The push refers to repository [docker.io/jinchi/mnist] None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Preparing None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:58 docker:103] Push output: Waiting None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:53:59 docker:103] Push output: Layer already exists None
[I 200223 20:54:00 docker:103] Push output: Layer already exists None
[I 200223 20:54:00 docker:103] Push output: Layer already exists None
[I 200223 20:54:01 docker:103] Push output: 54A2DC37: digest: sha256:b4910d86d53a45dca89fd395ae4bcfbb3c7f440b4c108ef5ed1b64ca5bcce70a size: 2828 None
[I 200223 20:54:01 docker:103] Push finished: {'Tag': '54A2DC37', 'Digest': 'sha256:b4910d86d53a45dca89fd395ae4bcfbb3c7f440b4c108ef5ed1b64ca5bcce70a', 'Size': 2828}
[W 200223 20:54:01 job:90] The tfjob mnist-training-445b launched.
Out[7]:
(<kubeflow.fairing.preprocessors.base.BasePreProcessor at 0x7f42a6ef96a0>,
 <kubeflow.fairing.builders.docker.docker.DockerBuilder at 0x7f42dc4f6eb8>,
 <kubeflow.fairing.deployers.tfjob.tfjob.TfJob at 0x7f42a6ef95f8>)
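
The command list passed to the preprocessor above is assembled by string concatenation, which is why the training parameters are defined as strings. A minimal sketch of an alternative is to generate the flags from a dict, so numeric values can stay numeric. `build_command` is a hypothetical helper. The flag names are the ones used in this notebook.

```python
def build_command(script, params):
    """Turn a {flag: value} mapping into a `python <script> --flag=value` list."""
    cmd = ["python", script]
    for flag, value in params.items():
        # Formatting with an f-string converts ints and floats to text.
        cmd.append(f"--{flag}={value}")
    return cmd


params = {
    "tf-model-dir": "/mnt",
    "tf-export-dir": "/mnt/export",
    "tf-train-steps": 200,
    "tf-batch-size": 100,
    "tf-learning-rate": 0.01,
}
command = build_command("/opt/mnist.py", params)
print(command)
```

The flags are emitted in dict insertion order, so the result matches the command shown in the build log above.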

Get The Created TFJobs


In [8]:
from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()

tfjob_client.get(tfjob_name, namespace=my_namespace)


Out[8]:
{'apiVersion': 'kubeflow.org/v1',
 'kind': 'TFJob',
 'metadata': {'creationTimestamp': '2020-02-24T04:54:01Z',
  'generateName': 'fairing-tfjob-',
  'generation': 1,
  'labels': {'fairing-deployer': 'tfjob',
   'fairing-id': 'acd47550-56c1-11ea-b7e1-00163e01bd45'},
  'name': 'mnist-training-445b',
  'namespace': 'hejinchi',
  'resourceVersion': '5761787',
  'selfLink': '/apis/kubeflow.org/v1/namespaces/hejinchi/tfjobs/mnist-training-445b',
  'uid': '6546a875-348b-41bb-8510-04abc7dd1a58'},
 'spec': {'tfReplicaSpecs': {'Chief': {'replicas': 1,
    'template': {'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'},
      'labels': {'fairing-deployer': 'tfjob',
       'fairing-id': 'acd47550-56c1-11ea-b7e1-00163e01bd45'},
      'name': 'fairing-deployer'},
     'spec': {'containers': [{'command': ['python',
         '/opt/mnist.py',
         '--tf-model-dir=/mnt',
         '--tf-export-dir=/mnt/export',
         '--tf-train-steps=200',
         '--tf-batch-size=100',
         '--tf-learning-rate=0.01'],
        'env': [{'name': 'FAIRING_RUNTIME', 'value': '1'}],
        'image': 'index.docker.io/jinchi/mnist:54A2DC37',
        'name': 'tensorflow',
        'securityContext': {'runAsUser': 0},
        'volumeMounts': [{'mountPath': '/mnt',
          'name': 'fairing-volume-mnist-e2e-pvc'}],
        'workingDir': '/app'}],
      'restartPolicy': 'Never',
      'volumes': [{'name': 'fairing-volume-mnist-e2e-pvc',
        'persistentVolumeClaim': {'claimName': 'mnist-e2e-pvc'}}]}}},
   'PS': {'replicas': 1,
    'template': {'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'},
      'labels': {'fairing-deployer': 'tfjob',
       'fairing-id': 'acd47550-56c1-11ea-b7e1-00163e01bd45'},
      'name': 'fairing-deployer'},
     'spec': {'containers': [{'command': ['python',
         '/opt/mnist.py',
         '--tf-model-dir=/mnt',
         '--tf-export-dir=/mnt/export',
         '--tf-train-steps=200',
         '--tf-batch-size=100',
         '--tf-learning-rate=0.01'],
        'env': [{'name': 'FAIRING_RUNTIME', 'value': '1'}],
        'image': 'index.docker.io/jinchi/mnist:54A2DC37',
        'name': 'tensorflow',
        'securityContext': {'runAsUser': 0},
        'volumeMounts': [{'mountPath': '/mnt',
          'name': 'fairing-volume-mnist-e2e-pvc'}],
        'workingDir': '/app'}],
      'restartPolicy': 'Never',
      'volumes': [{'name': 'fairing-volume-mnist-e2e-pvc',
        'persistentVolumeClaim': {'claimName': 'mnist-e2e-pvc'}}]}}},
   'Worker': {'replicas': 2,
    'template': {'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'},
      'labels': {'fairing-deployer': 'tfjob',
       'fairing-id': 'acd47550-56c1-11ea-b7e1-00163e01bd45'},
      'name': 'fairing-deployer'},
     'spec': {'containers': [{'command': ['python',
         '/opt/mnist.py',
         '--tf-model-dir=/mnt',
         '--tf-export-dir=/mnt/export',
         '--tf-train-steps=200',
         '--tf-batch-size=100',
         '--tf-learning-rate=0.01'],
        'env': [{'name': 'FAIRING_RUNTIME', 'value': '1'}],
        'image': 'index.docker.io/jinchi/mnist:54A2DC37',
        'name': 'tensorflow',
        'securityContext': {'runAsUser': 0},
        'volumeMounts': [{'mountPath': '/mnt',
          'name': 'fairing-volume-mnist-e2e-pvc'}],
        'workingDir': '/app'}],
      'restartPolicy': 'Never',
      'volumes': [{'name': 'fairing-volume-mnist-e2e-pvc',
        'persistentVolumeClaim': {'claimName': 'mnist-e2e-pvc'}}]}}}}}}
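
The returned object is a plain nested dict, so it can be summarized with ordinary Python. The sketch below uses a hypothetical helper, `summarize_replicas`, to extract the replica count and image for each role. The sample dict is a trimmed copy of the output above, so the example runs without a cluster.

```python
def summarize_replicas(tfjob):
    """Map each replica type to its replica count and container image."""
    specs = tfjob.get("spec", {}).get("tfReplicaSpecs", {})
    summary = {}
    for role, spec in specs.items():
        containers = spec["template"]["spec"]["containers"]
        summary[role] = {
            "replicas": spec.get("replicas", 1),
            "image": containers[0]["image"],
        }
    return summary


# Trimmed sample with the same shape as the TFJob dict shown above.
image = "index.docker.io/jinchi/mnist:54A2DC37"
template = {"spec": {"containers": [{"name": "tensorflow", "image": image}]}}
sample_tfjob = {
    "kind": "TFJob",
    "spec": {
        "tfReplicaSpecs": {
            "Chief": {"replicas": 1, "template": template},
            "PS": {"replicas": 1, "template": template},
            "Worker": {"replicas": 2, "template": template},
        }
    },
}

for role, info in summarize_replicas(sample_tfjob).items():
    print(role, info["replicas"], info["image"])
```

In the output above, the job contains one Chief replica in addition to the requested PS and Worker replicas.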

Wait for the Training Job to Finish


In [9]:
tfjob_client.wait_for_job(tfjob_name, namespace=my_namespace, watch=True)


NAME                           STATE                TIME                          
mnist-training-445b            Created              2020-02-24T04:54:01Z          
mnist-training-445b            Created              2020-02-24T04:54:01Z          
mnist-training-445b            Created              2020-02-24T04:54:01Z          
mnist-training-445b            Created              2020-02-24T04:54:01Z          
mnist-training-445b            Running              2020-02-24T04:54:08Z          
mnist-training-445b            Succeeded            2020-02-24T04:54:16Z          

Check if the TFJob succeeded.


In [10]:
tfjob_client.is_job_succeeded(tfjob_name, namespace=my_namespace)


Out[10]:
True
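
Where a custom timeout or polling interval is wanted around a status check such as `is_job_succeeded`, a generic polling helper is one option. The sketch below is not part of the kubeflow-tfjob SDK. `wait_until` is a hypothetical helper that repeatedly calls any zero-argument predicate until it returns True or the timeout expires.

```python
import time


def wait_until(predicate, timeout_seconds=600.0, polling_interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns True or `timeout_seconds` elapse.

    Returns True if the predicate succeeded, False on timeout.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(polling_interval)


# Stand-in predicate that succeeds on the third call (no cluster needed).
attempts = iter([False, False, True])
print(wait_until(lambda: next(attempts), polling_interval=0.0))  # prints True

# With a live cluster, the predicate could wrap the call used above, e.g.:
# wait_until(lambda: tfjob_client.is_job_succeeded(tfjob_name, namespace=my_namespace))
```

Note that this sketch keeps polling until the timeout if the job fails, because it only checks for success. A more complete version would also stop on a failed state.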

Get the Training Logs


In [11]:
tfjob_client.get_logs(tfjob_name, namespace=my_namespace)


[I 200223 20:54:16 tf_job_client:386] The logs of Pod mnist-training-445b-chief-0:
     WARNING:tensorflow:From /opt/mnist.py:237: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
    
    WARNING:tensorflow:From /opt/mnist.py:155: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
    
    W0224 04:54:10.486181 139704674924352 module_wrapper.py:139] From /opt/mnist.py:155: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
    
    WARNING:tensorflow:From /opt/mnist.py:155: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
    
    W0224 04:54:10.486561 139704674924352 module_wrapper.py:139] From /opt/mnist.py:155: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
    
    WARNING:tensorflow:From /opt/mnist.py:160: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
    
    W0224 04:54:10.488451 139704674924352 module_wrapper.py:139] From /opt/mnist.py:160: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
    
    INFO:tensorflow:TF_CONFIG {"cluster":{"chief":["mnist-training-445b-chief-0.hejinchi.svc:2222"],"ps":["mnist-training-445b-ps-0.hejinchi.svc:2222"],"worker":["mnist-training-445b-worker-0.hejinchi.svc:2222","mnist-training-445b-worker-1.hejinchi.svc:2222"]},"task":{"type":"chief","index":0},"environment":"cloud"}
    I0224 04:54:10.488698 139704674924352 mnist.py:160] TF_CONFIG {"cluster":{"chief":["mnist-training-445b-chief-0.hejinchi.svc:2222"],"ps":["mnist-training-445b-ps-0.hejinchi.svc:2222"],"worker":["mnist-training-445b-worker-0.hejinchi.svc:2222","mnist-training-445b-worker-1.hejinchi.svc:2222"]},"task":{"type":"chief","index":0},"environment":"cloud"}
    INFO:tensorflow:cluster={'chief': ['mnist-training-445b-chief-0.hejinchi.svc:2222'], 'ps': ['mnist-training-445b-ps-0.hejinchi.svc:2222'], 'worker': ['mnist-training-445b-worker-0.hejinchi.svc:2222', 'mnist-training-445b-worker-1.hejinchi.svc:2222']} job_name=chief task_index=0
    I0224 04:54:10.489460 139704674924352 mnist.py:166] cluster={'chief': ['mnist-training-445b-chief-0.hejinchi.svc:2222'], 'ps': ['mnist-training-445b-ps-0.hejinchi.svc:2222'], 'worker': ['mnist-training-445b-worker-0.hejinchi.svc:2222', 'mnist-training-445b-worker-1.hejinchi.svc:2222']} job_name=chief task_index=0
    INFO:tensorflow:Will export model
    I0224 04:54:10.489564 139704674924352 mnist.py:171] Will export model
    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    W0224 04:54:10.489712 139704674924352 lazy_loader.py:50] 
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    WARNING:tensorflow:From /opt/mnist.py:176: load_mnist (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    W0224 04:54:11.105488 139704674924352 deprecation.py:323] From /opt/mnist.py:176: load_mnist (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:300: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    W0224 04:54:11.105893 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:300: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please write your own downloading logic.
    W0224 04:54:11.106053 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please write your own downloading logic.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use urllib or similar directly.
    W0224 04:54:11.106473 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use urllib or similar directly.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    W0224 04:54:11.732874 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    W0224 04:54:12.413595 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use tf.data to implement this functionality.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    W0224 04:54:13.169425 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
    WARNING:tensorflow:From /opt/mnist.py:177: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
    
    W0224 04:54:13.979085 139704674924352 module_wrapper.py:139] From /opt/mnist.py:177: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.
    
    WARNING:tensorflow:From /opt/mnist.py:177: The name tf.estimator.inputs.numpy_input_fn is deprecated. Please use tf.compat.v1.estimator.inputs.numpy_input_fn instead.
    
    W0224 04:54:13.979414 139704674924352 module_wrapper.py:139] From /opt/mnist.py:177: The name tf.estimator.inputs.numpy_input_fn is deprecated. Please use tf.compat.v1.estimator.inputs.numpy_input_fn instead.
    
    INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'chief': ['mnist-training-445b-chief-0.hejinchi.svc:2222'], 'ps': ['mnist-training-445b-ps-0.hejinchi.svc:2222'], 'worker': ['mnist-training-445b-worker-0.hejinchi.svc:2222', 'mnist-training-445b-worker-1.hejinchi.svc:2222']}, 'task': {'type': 'chief', 'index': 0}, 'environment': 'cloud'}
    I0224 04:54:13.979754 139704674924352 run_config.py:535] TF_CONFIG environment variable: {'cluster': {'chief': ['mnist-training-445b-chief-0.hejinchi.svc:2222'], 'ps': ['mnist-training-445b-ps-0.hejinchi.svc:2222'], 'worker': ['mnist-training-445b-worker-0.hejinchi.svc:2222', 'mnist-training-445b-worker-1.hejinchi.svc:2222']}, 'task': {'type': 'chief', 'index': 0}, 'environment': 'cloud'}
    INFO:tensorflow:Using config: {'_model_dir': '/mnt', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': device_filters: "/job:ps"
    device_filters: "/job:chief"
    allow_soft_placement: true
    graph_options {
      rewrite_options {
        meta_optimizer_iterations: ONE
      }
    }
    , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0f56efdf28>, '_task_type': 'chief', '_task_id': 0, '_evaluation_master': '', '_master': 'grpc://mnist-training-445b-chief-0.hejinchi.svc:2222', '_num_ps_replicas': 1, '_num_worker_replicas': 3, '_global_id_in_cluster': 0, '_is_chief': True}
    I0224 04:54:13.981208 139704674924352 estimator.py:212] Using config: {'_model_dir': '/mnt', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': device_filters: "/job:ps"
    device_filters: "/job:chief"
    allow_soft_placement: true
    graph_options {
      rewrite_options {
        meta_optimizer_iterations: ONE
      }
    }
    , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0f56efdf28>, '_task_type': 'chief', '_task_id': 0, '_evaluation_master': '', '_master': 'grpc://mnist-training-445b-chief-0.hejinchi.svc:2222', '_num_ps_replicas': 1, '_num_worker_replicas': 3, '_global_id_in_cluster': 0, '_is_chief': True}
    INFO:tensorflow:Not using Distribute Coordinator.
    I0224 04:54:13.982498 139704674924352 estimator_training.py:186] Not using Distribute Coordinator.
    INFO:tensorflow:Start Tensorflow server.
    I0224 04:54:13.983283 139704674924352 training.py:744] Start Tensorflow server.
    2020-02-24 04:54:13.983879: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    2020-02-24 04:54:13.992014: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
    2020-02-24 04:54:13.993487: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44f5040 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-02-24 04:54:13.993530: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-02-24 04:54:13.998717: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:258] Initialize GrpcChannelCache for job chief -> {0 -> localhost:2222}
    2020-02-24 04:54:13.998797: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:258] Initialize GrpcChannelCache for job ps -> {0 -> mnist-training-445b-ps-0.hejinchi.svc:2222}
    2020-02-24 04:54:13.998810: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:258] Initialize GrpcChannelCache for job worker -> {0 -> mnist-training-445b-worker-0.hejinchi.svc:2222, 1 -> mnist-training-445b-worker-1.hejinchi.svc:2222}
    2020-02-24 04:54:14.001617: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:2222
    INFO:tensorflow:Skipping training since max_steps has already saved.
    I0224 04:54:14.053299 139704674924352 estimator.py:363] Skipping training since max_steps has already saved.
    WARNING:tensorflow:From /opt/mnist.py:232: Estimator.export_savedmodel (from tensorflow_estimator.python.estimator.estimator) is deprecated and will be removed in a future version.
    Instructions for updating:
    This function has been renamed, use `export_saved_model` instead.
    W0224 04:54:14.054757 139704674924352 deprecation.py:323] From /opt/mnist.py:232: Estimator.export_savedmodel (from tensorflow_estimator.python.estimator.estimator) is deprecated and will be removed in a future version.
    Instructions for updating:
    This function has been renamed, use `export_saved_model` instead.
    WARNING:tensorflow:From /opt/mnist.py:145: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    
    W0224 04:54:14.133359 139704674924352 module_wrapper.py:139] From /opt/mnist.py:145: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
    
    INFO:tensorflow:Calling model_fn.
    I0224 04:54:14.136102 139704674924352 estimator.py:1148] Calling model_fn.
    WARNING:tensorflow:From /opt/mnist.py:76: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
    
    W0224 04:54:14.141005 139704674924352 module_wrapper.py:139] From /opt/mnist.py:76: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
    
    WARNING:tensorflow:From /opt/mnist.py:82: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.keras.layers.Conv2D` instead.
    W0224 04:54:14.142981 139704674924352 deprecation.py:323] From /opt/mnist.py:82: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.keras.layers.Conv2D` instead.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use `layer.__call__` method instead.
    W0224 04:54:14.145952 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use `layer.__call__` method instead.
    WARNING:tensorflow:From /opt/mnist.py:84: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.MaxPooling2D instead.
    W0224 04:54:14.187732 139704674924352 deprecation.py:323] From /opt/mnist.py:84: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.MaxPooling2D instead.
    WARNING:tensorflow:From /opt/mnist.py:100: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.Dense instead.
    W0224 04:54:14.225172 139704674924352 deprecation.py:323] From /opt/mnist.py:100: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.Dense instead.
    WARNING:tensorflow:From /opt/mnist.py:104: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.dropout instead.
    W0224 04:54:14.250329 139704674924352 deprecation.py:323] From /opt/mnist.py:104: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.dropout instead.
    INFO:tensorflow:Done calling model_fn.
    I0224 04:54:14.281573 139704674924352 estimator.py:1150] Done calling model_fn.
    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
    Instructions for updating:
    This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
    W0224 04:54:14.282021 139704674924352 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
    Instructions for updating:
    This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
    INFO:tensorflow:Signatures INCLUDED in export for Classify: None
    I0224 04:54:14.282793 139704674924352 export_utils.py:170] Signatures INCLUDED in export for Classify: None
    INFO:tensorflow:Signatures INCLUDED in export for Regress: None
    I0224 04:54:14.283399 139704674924352 export_utils.py:170] Signatures INCLUDED in export for Regress: None
    INFO:tensorflow:Signatures INCLUDED in export for Predict: ['classes', 'serving_default']
    I0224 04:54:14.283491 139704674924352 export_utils.py:170] Signatures INCLUDED in export for Predict: ['classes', 'serving_default']
    INFO:tensorflow:Signatures INCLUDED in export for Train: None
    I0224 04:54:14.283800 139704674924352 export_utils.py:170] Signatures INCLUDED in export for Train: None
    INFO:tensorflow:Signatures INCLUDED in export for Eval: None
    I0224 04:54:14.283922 139704674924352 export_utils.py:170] Signatures INCLUDED in export for Eval: None
    INFO:tensorflow:Restoring parameters from /mnt/model.ckpt-204
    I0224 04:54:14.327866 139704674924352 saver.py:1284] Restoring parameters from /mnt/model.ckpt-204
    INFO:tensorflow:Assets added to graph.
    I0224 04:54:14.389047 139704674924352 builder_impl.py:665] Assets added to graph.
    INFO:tensorflow:No assets to write.
    I0224 04:54:14.389322 139704674924352 builder_impl.py:460] No assets to write.
    INFO:tensorflow:SavedModel written to: /mnt/export/temp-b'1582520054'/saved_model.pb
    I0224 04:54:14.683466 139704674924352 builder_impl.py:425] SavedModel written to: /mnt/export/temp-b'1582520054'/saved_model.pb
    Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
    Extracting /tmp/data/train-images-idx3-ubyte.gz
    Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
    Extracting /tmp/data/train-labels-idx1-ubyte.gz
    Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
    Extracting /tmp/data/t10k-images-idx3-ubyte.gz
    Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
    Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
    Train and evaluate
    Training done
    Export saved model
    Done exporting the model
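
The log above shows the SavedModel being written to a temporary directory named after a timestamp (`temp-b'1582520054'`) under `/mnt/export`. Estimator exports normally end up in a numeric version directory under the export base, and TensorFlow Serving loads numeric version directories from the path it is given. The sketch below is an optional sanity check for picking the newest version directory. The helper name `latest_export_version` is hypothetical, and the demo uses a temporary directory rather than the real `/mnt/export` volume.

```python
import os
import tempfile


def latest_export_version(export_dir):
    """Return the highest numeric sub-directory name in export_dir, or None."""
    versions = [
        name
        for name in os.listdir(export_dir)
        if name.isdigit() and os.path.isdir(os.path.join(export_dir, name))
    ]
    return max(versions, key=int) if versions else None


# Demo on a throwaway directory; names mimic timestamped export versions.
with tempfile.TemporaryDirectory() as export_dir:
    for name in ("1582520054", "1582510000", "temp-1582520054"):
        os.makedirs(os.path.join(export_dir, name))
    latest = latest_export_version(export_dir)

print(latest)
```

Non-numeric entries such as leftover `temp-` directories are ignored, so only completed exports are considered.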
    

Deploy Service using KFServing


In [12]:
from kubeflow.fairing.deployers.kfserving.kfserving import KFServing
isvc_name = f'mnist-service-{uuid.uuid4().hex[:4]}'
isvc = KFServing('tensorflow', namespace=my_namespace, isvc_name=isvc_name,
                 default_storage_uri='pvc://' + pvc_name + '/export')
isvc.deploy(isvc.generate_isvc())


NAME                 READY      DEFAULT_TRAFFIC CANARY_TRAFFIC  URL                                               
mnist-service-5041   Unknown                                                                                      
mnist-service-5041   False                                                                                        
mnist-service-5041   False                                                                                        
mnist-service-5041   False                                                                                        
mnist-service-5041   True       100                             http://mnist-service-5041.hejinchi.example.com/...
[I 200223 20:54:36 kfserving:116] Deployed the InferenceService mnist-service-5041 successfully.
Out[12]:
'mnist-service-5041'

List the InferenceServices in the Namespace


In [13]:
from kfserving import KFServingClient
kfserving_client = KFServingClient()
kfserving_client.get(namespace=my_namespace)


Out[13]:
{'apiVersion': 'serving.kubeflow.org/v1alpha2',
 'items': [{'apiVersion': 'serving.kubeflow.org/v1alpha2',
   'kind': 'InferenceService',
   'metadata': {'creationTimestamp': '2020-02-24T04:54:16Z',
    'generateName': 'fairing-kfserving-',
    'generation': 5,
    'name': 'mnist-service-5041',
    'namespace': 'hejinchi',
    'resourceVersion': '5762157',
    'selfLink': '/apis/serving.kubeflow.org/v1alpha2/namespaces/hejinchi/inferenceservices/mnist-service-5041',
    'uid': '7da546be-8831-4743-8631-4d0aad20844d'},
   'spec': {'default': {'predictor': {'tensorflow': {'resources': {'limits': {'cpu': '1',
         'memory': '2Gi'},
        'requests': {'cpu': '1', 'memory': '2Gi'}},
       'runtimeVersion': '1.14.0',
       'storageUri': 'pvc://mnist-e2e-pvc/export'}}}},
   'status': {'canary': {},
    'conditions': [{'lastTransitionTime': '2020-02-24T04:54:36Z',
      'status': 'True',
      'type': 'DefaultPredictorReady'},
     {'lastTransitionTime': '2020-02-24T04:54:36Z',
      'status': 'True',
      'type': 'Ready'},
     {'lastTransitionTime': '2020-02-24T04:54:36Z',
      'status': 'True',
      'type': 'RoutesReady'}],
    'default': {'predictor': {'host': 'mnist-service-5041-predictor-default.hejinchi.example.com',
      'name': 'mnist-service-5041-predictor-default-vcbgf'}},
    'traffic': 100,
    'url': 'http://mnist-service-5041.hejinchi.example.com/v1/models/mnist-service-5041'}}],
 'kind': 'InferenceServiceList',
 'metadata': {'continue': '',
  'resourceVersion': '5762157',
  'selfLink': '/apis/serving.kubeflow.org/v1alpha2/namespaces/hejinchi/inferenceservices'}}
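
The response above is a list; each entry in `items` carries its own `status.conditions`. As a minimal sketch of how the readiness of one entry could be checked from that dictionary, the hypothetical helper `condition_status` below looks up a condition by type. The `sample` dictionary is a trimmed copy of the output shown above, not a live API response.

```python
def condition_status(isvc, condition_type="Ready"):
    """Return the status string of the named condition, or None if absent."""
    for cond in isvc.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("status")
    return None


# Trimmed copy of one item from the listing above.
sample = {
    "metadata": {"name": "mnist-service-5041"},
    "status": {
        "conditions": [
            {"status": "True", "type": "DefaultPredictorReady"},
            {"status": "True", "type": "Ready"},
            {"status": "True", "type": "RoutesReady"},
        ],
        "traffic": 100,
    },
}

ready = condition_status(sample)
print(sample["metadata"]["name"], "Ready =", ready)
```

Note that the status values are the strings `'True'` and `'False'` (or `'Unknown'`), not Python booleans, so compare against the string.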

Get the InferenceService and Service Endpoint


In [14]:
mnist_isvc = kfserving_client.get(isvc_name, namespace=my_namespace)
print("MNIST Service Endpoint: " + mnist_isvc['status'].get('url', ''))


MNIST Service Endpoint: http://mnist-service-5041.hejinchi.example.com/v1/models/mnist-service-5041
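
With the endpoint in hand, a prediction request can be built against the TensorFlow Serving REST API, which accepts a POST to `<model URL>:predict` with an `instances` list. The sketch below only constructs the request and does not send it. The 28x28 all-zero input is an assumption about the shape expected by the exported `serving_default` signature; check the serving input function in mnist.py before relying on it. The helper name `build_predict_request` is hypothetical.

```python
import json
import urllib.request


def build_predict_request(endpoint_url, instances, signature_name="serving_default"):
    """Build (but do not send) a TF Serving REST predict request."""
    body = json.dumps(
        {"signature_name": signature_name, "instances": instances}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint_url + ":predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Endpoint copied from the output above.
endpoint = "http://mnist-service-5041.hejinchi.example.com/v1/models/mnist-service-5041"

# ASSUMPTION: one blank 28x28 image; the real input shape depends on mnist.py.
blank_image = [[0.0] * 28 for _ in range(28)]

request = build_predict_request(endpoint, [blank_image])
print(request.get_method(), request.full_url)

# To actually send it (requires network access to the cluster ingress):
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.loads(resp.read()))
```

Depending on how the cluster ingress is exposed, the hostname in the URL may not resolve from outside the cluster; in that case the request usually has to go to the ingress gateway address with the service hostname supplied as the `Host` header.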

Clean Up

Delete the TFJob


In [15]:
tfjob_client.delete(tfjob_name, namespace=my_namespace)


Out[15]:
{'kind': 'Status',
 'apiVersion': 'v1',
 'metadata': {},
 'status': 'Success',
 'details': {'name': 'mnist-training-445b',
  'group': 'kubeflow.org',
  'kind': 'tfjobs',
  'uid': '6546a875-348b-41bb-8510-04abc7dd1a58'}}

Delete the InferenceService


In [16]:
kfserving_client.delete(isvc_name, namespace=my_namespace)


Out[16]:
{'kind': 'Status',
 'apiVersion': 'v1',
 'metadata': {},
 'status': 'Success',
 'details': {'name': 'mnist-service-5041',
  'group': 'serving.kubeflow.org',
  'kind': 'inferenceservices',
  'uid': '7da546be-8831-4743-8631-4d0aad20844d'}}

Delete the PersistentVolumeClaim and PersistentVolume


In [17]:
k8s_core_api.delete_namespaced_persistent_volume_claim(pvc_name, my_namespace)
k8s_core_api.delete_persistent_volume(pv_name)


Out[17]:
{'api_version': 'v1',
 'code': None,
 'details': None,
 'kind': 'PersistentVolume',
 'message': None,
 'metadata': {'_continue': None,
              'resource_version': '5762169',
              'self_link': '/api/v1/persistentvolumes/mnist-e2e-pv'},
 'reason': None,
 'status': "{'phase': 'Bound'}"}
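
The output above still reports the phase as `Bound`, which suggests the response describes the object at the moment the delete was accepted rather than confirming that it is gone. If later steps depend on the volume being removed, one option is to poll until a read fails with "not found". The sketch below is a generic helper under that assumption. `wait_until_gone`, `fake_read`, and the `LookupError` stand-in are hypothetical and exist only to make the example runnable without a cluster.

```python
import time


def wait_until_gone(read_fn, is_not_found, timeout_s=60.0, interval_s=2.0):
    """Poll read_fn until it raises a 'not found' error or the timeout expires.

    Returns True if the resource disappeared, False on timeout.
    Any other exception is re-raised.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            read_fn()
        except Exception as exc:
            if is_not_found(exc):
                return True
            raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Fake reader: the resource is visible for two reads, then disappears.
calls = {"n": 0}


def fake_read():
    calls["n"] += 1
    if calls["n"] > 2:
        raise LookupError("not found")
    return {"phase": "Bound"}


gone = wait_until_gone(
    fake_read,
    is_not_found=lambda exc: isinstance(exc, LookupError),
    timeout_s=5.0,
    interval_s=0.01,
)
print("deleted:", gone)
```

With the Kubernetes Python client, `read_fn` could be `lambda: k8s_core_api.read_persistent_volume(pv_name)` and `is_not_found` could check `getattr(exc, "status", None) == 404`.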
