This notebook uses BentoML and Kubeflow to build, train, and serve an API powered by a Prophet time series model.
The notebook focuses on building and serving the model using BentoML, Kubeflow Fairing, and KFServing.
It uses the Covid19 Kaggle locality infection data as sample data, with my home town of New York as the locality.
In [1]:
%%capture
!pip install pandas scikit-learn auto-sklearn kubeflow-fairing grpcio kubeflow-metadata bentoml plotly fbprophet
In [2]:
import uuid
from importlib import reload
import grpc
from kubeflow import fairing
from kubeflow.fairing import constants
import os
import pandas as pd
import logging
logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)
In [3]:
# The docker registry to store images in
DOCKER_REGISTRY = "iancoffey"
# The k8s namespace to run the experiment in
k8s_namespace = "default"
# Use local bentoml storage
!bentoml config set yatai_service.url=""
This notebook will use Minio as the development context storage for Kubeflow Fairing, but any of the supported blob storage backends (S3, GCS, etc.) will work.
To install Minio, apply the provided Minio manifest into the chosen namespace:
kubectl apply -f ./manifests/minio -n $k8s_namespace
We will use the Kubernetes SDK to determine the endpoint's IP address and then build our MinioContextSource.
In [4]:
from kubernetes import utils as k8s_utils
from kubernetes import client as k8s_client
from kubernetes import config as k8s_config
from kubeflow.fairing.utils import is_running_in_k8s
from kubeflow.fairing.cloud.k8s import MinioUploader
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource
if is_running_in_k8s():
    k8s_config.load_incluster_config()
else:
    k8s_config.load_kube_config()
api_client = k8s_client.CoreV1Api()
minio_service_endpoint = api_client.read_namespaced_service(name='minio-service', namespace=k8s_namespace).spec.cluster_ip
In [5]:
minio_endpoint = "http://"+minio_service_endpoint+":9000"
minio_username = "minio"
minio_key = "minio123"
minio_region = "us-east-1"
minio_uploader = MinioUploader(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)
minio_context_source = MinioContextSource(endpoint_url=minio_endpoint, minio_secret=minio_username, minio_secret_key=minio_key, region_name=minio_region)
minio_endpoint
Out[5]:
In [6]:
# fairing:include-cell
data_path="covid_19_data.csv"
dframe = pd.read_csv(data_path, sep=',')
In [7]:
cols_of_interest = ['Confirmed', 'Province/State', 'ObservationDate']
dframe['ObservationDate'] = pd.to_datetime(dframe['ObservationDate'])
dframe.sort_index(inplace=True)
trimmed_dframe=dframe[cols_of_interest]
trimmed_dframe=trimmed_dframe.dropna()
# Note the copy() here - else we would be working on a reference
state_data = trimmed_dframe.loc[trimmed_dframe['Province/State'] == 'New York'].copy()
state_data = state_data.drop('Province/State', axis=1).sort_index()
state_data.rename(columns={'Confirmed': 'y', 'ObservationDate': 'ds'}, inplace=True)
state_data.head()
Out[7]:
In [8]:
color_pal = ["#F8766D", "#D39200", "#93AA00",
"#00BA38", "#00C19F", "#00B9E3",
"#619CFF", "#DB72FB"]
_ = state_data.plot(x ='ds', y='y', kind='scatter', figsize=(15,5), title="Raw Covid19 Dataset")
In [9]:
split_date = "2020-05-15"
train_data = state_data[state_data['ds'] <= split_date].copy()
test_data = state_data[state_data['ds'] > split_date].copy()
len(state_data)
Out[9]:
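As a quick aside (not essential to the flow), we can sanity-check where the split landed:

# train_data should end at the split date; test_data should start just after it.
print(f"train: {len(train_data)} rows through {train_data['ds'].max().date()}")
print(f"test: {len(test_data)} rows from {test_data['ds'].min().date()}")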
Now that we have gathered and transformed the original Covid19 time series data, we can create a new Prophet model and fit it to our training data.
Then we can use make_future_dataframe to create a dataframe that holds the future timestamps for our predictions. We pass periods=10 so that 10 days beyond the training data are predicted.
In [10]:
import pandas as pd
from fbprophet import Prophet
m = Prophet()
m.fit(train_data)
future = m.make_future_dataframe(periods=10)
future.tail()
Out[10]:
In [11]:
import numpy as np
forecast = m.predict(future)
print(forecast['yhat'].tail())
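Since we held out a test window, we can also score the forecast quantitatively. This is a hedged aside, not part of the original flow; it assumes daily observations and that the 10 forecast days overlap the start of the test set:

# Join predictions with the held-out actuals on their shared dates, then score.
eval_df = forecast[['ds', 'yhat']].merge(test_data, on='ds', how='inner')
mae = np.mean(np.abs(eval_df['y'] - eval_df['yhat']))
rmse = np.sqrt(np.mean((eval_df['y'] - eval_df['yhat']) ** 2))
print(f"MAE: {mae:.1f}  RMSE: {rmse:.1f}")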
In [12]:
fig1 = m.plot(forecast, figsize = (15, 10))
In [14]:
for c in forecast.columns.sort_values():
    print(c)
In [15]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid', {'axes.facecolor': '.9'})
sns.set_palette(palette='deep')
sns_c = sns.color_palette(palette='deep')
threshold_date = pd.to_datetime(split_date)
fig, ax = plt.subplots(figsize = (15, 10))
sns.lineplot(x='ds', y='y', label='y_train', data=train_data, ax=ax)
sns.lineplot(x='ds', y='y', label='y_test', data=test_data, ax=ax)
sns.lineplot(x='ds', y='trend', data=forecast, ax=ax)
ax.axvline(threshold_date, color=sns_c[3], linestyle='--', label='train test split')
ax.legend(loc='upper left')
ax.set(title='Confirmed Cases', ylabel='');
In [16]:
%%writefile prophet_serve.py
import bentoml
from bentoml.handlers import DataframeHandler
from bentoml.artifact import PickleArtifact
import fbprophet
@bentoml.artifacts([PickleArtifact('model')])
@bentoml.env(pip_dependencies=['fbprophet'])
class ProphetServe(bentoml.BentoService):
    @bentoml.api(DataframeHandler)
    def predict(self, df):
        # Expects a dataframe with a 'ds' datetime column; returns Prophet's forecast frame.
        return self.artifacts.model.predict(df)
In [17]:
import prophet_serve
import importlib
importlib.reload(prophet_serve)
Out[17]:
In [18]:
from prophet_serve import ProphetServe
bento_service = ProphetServe()
bento_service.pack('model', m)
saved_path = bento_service.save()
In [19]:
!bentoml get ProphetServe
Let's pick a saved service to launch. We will use the most recent one here, but you will need to edit the values below to match yours. This is purely for convenience so that we can poke at the BentoML CLI quickly.
Inside the auto-generated bento service directory we can see that many files now exist, including a Dockerfile and requirements.txt.
In [20]:
!bentoml get ProphetServe:20200626124946_94E973
In [21]:
!bentoml run ProphetServe:20200626124946_94E973 predict --input '{"ds":["2021-07-14"]}'
In [32]:
!ls /home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973
We can build our model's container with Kubeflow Fairing, a component which aims to make data scientists' lives easier by streamlining the process of building, training, and deploying machine learning (ML) models.
First we need to do some work to get the BentoML artifacts into our build context. We do this via the preprocessing work below: we take the auto-generated BentoML Docker environment and stitch it together with Kubeflow Fairing.
In [35]:
# Let's build a Docker image with the cluster builder, using the BentoML output
from kubeflow.fairing.preprocessors.base import BasePreProcessor
output_map = {
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/Dockerfile": "Dockerfile",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/environment.yml": "environment.yml",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/requirements.txt": "requirements.txt",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/setup.py": "setup.py",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/bentoml-init.sh": "bentoml-init.sh",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/bentoml.yml": "bentoml.yml",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/ProphetServe/": "ProphetServe/",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/ProphetServe/prophet_serve.py": "ProphetServe/prophet_serve.py",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/ProphetServe/artifacts/model.pkl": "ProphetServe/artifacts/model.pkl",
    "/home/jovyan/bentoml/repository/ProphetServe/20200626124946_94E973/docker-entrypoint.sh": "docker-entrypoint.sh",
}
preprocessor = BasePreProcessor(output_map=output_map)
preprocessor.preprocess()
Out[35]:
In this workflow, we want to develop our models locally until we arrive at a model we want to deploy. From there, we use cloud resources to build Docker images to deploy to our Kubernetes cluster.
These Docker images will be built in our private cloud, perhaps using data and artifacts which are sensitive in nature.
In [40]:
from kubeflow.fairing.builders import cluster
from kubeflow.fairing import constants
constants.constants.KANIKO_IMAGE = "gcr.io/kaniko-project/executor:v0.22.0"
cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,
preprocessor=preprocessor,
dockerfile_path="Dockerfile",
context_source=minio_context_source)
print(cluster_builder.build())
There are several good ways to deploy the model we have fit, trained, and crafted into an image. For this example, we will deploy the model service with KFServing by creating an InferenceService. KFServing uses Knative Serving to deploy the service and set up its routes; under the covers, Istio is at work routing our traffic to this new service.
For local development, set the cluster-local tags and set the domain to reflect svc.cluster.local, as described in these docs.
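One way to make that domain change (a hedged sketch, not taken from those docs; it assumes a standard Knative install with its config-domain ConfigMap in the knative-serving namespace) is to patch the ConfigMap with the Kubernetes client we already configured:

# Point Knative's default domain at svc.cluster.local (assumption: standard Knative install).
core_v1 = k8s_client.CoreV1Api()
core_v1.patch_namespaced_config_map(
    name="config-domain",          # Knative's domain configuration
    namespace="knative-serving",
    body={"data": {"svc.cluster.local": ""}},
)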
Let's use the Custom Object K8S API to deploy the pickled Prophet model we have defined with BentoML and built with Fairing.
InferenceService
Below, we define and launch an InferenceService, which in turn reconciles into a Knative Service.
In [60]:
from kfserving import V1alpha2EndpointSpec, V1alpha2PredictorSpec, V1alpha2InferenceServiceSpec, V1alpha2InferenceService, V1alpha2CustomSpec
from kfserving import KFServingClient
from kfserving import constants
containerSpec = k8s_client.V1Container(
name="prophet-model-api-container",
image=cluster_builder.image_tag,
ports=[k8s_client.V1ContainerPort(container_port=5000)])
default_custom_model_spec = V1alpha2EndpointSpec(predictor=V1alpha2PredictorSpec(custom=V1alpha2CustomSpec(container=containerSpec)))
metadata = k8s_client.V1ObjectMeta(
name="prophet-model-api", namespace="default",
)
isvc = V1alpha2InferenceService(api_version=constants.KFSERVING_GROUP + '/' + constants.KFSERVING_VERSION,
kind=constants.KFSERVING_KIND,
metadata=metadata,
spec=V1alpha2InferenceServiceSpec(default=default_custom_model_spec))
KFServing = KFServingClient()
KFServing.create(isvc)
Out[60]:
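Creation is asynchronous, so before calling the endpoint we can wait for the InferenceService to report ready. A small hedged addition, using the SDK's watch support:

# Block until the InferenceService is READY (or the timeout elapses).
KFServing.get("prophet-model-api", namespace="default", watch=True, timeout_seconds=120)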
In [61]:
!curl -i --header "Content-Type: application/json" -X POST http://prophet-model-api-predictor-default-7z66r-private.default.svc.cluster.local/predict --data '{"ds":["2020-07-14"]}'