This notebook demonstrates how to extract the machine learning Models + Analysis from the Redis Labs Cloud (https://redislabs.com/redis-cloud) cache endpoint named "CACHE" and save them locally as a compressed string artifact file (pickle + zlib compression). Once the file is saved, it is uploaded to the configured S3 Bucket for archiving and sharing.
Extract the IRIS XGB regressor models from the Redis Labs Cloud CACHE endpoint. After extraction, compile a manifest defining a cache mapping for all the Models + their respective Analysis. Once cached, the Models can be extracted, shared, and deployed on other Sci-pype instances by using something like this notebook or the command-line versions.
This notebook was built from the extractor command line examples:
https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/extractors
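For context, the artifact format itself is simple: the cached Models + Analysis are pickled, compressed with zlib, written to a local file, and then uploaded to S3. The following is a minimal, hypothetical sketch of that flow using boto3 directly, with made-up object and path names; the actual extraction in this notebook goes through the Sci-pype API instead.
import os
import pickle
import zlib

import boto3  # assumption: boto3 is installed; Sci-pype's own S3 client may differ

# hypothetical stand-in for the cached Models + Analysis pulled out of Redis
cached_payload = {"Models": ["trained XGB regressor would go here"], "Analysis": {}}

# serialize and compress into the "pickle + zlib" string artifact
artifact_bytes = zlib.compress(pickle.dumps(cached_payload))

artifact_path = "/opt/work/data/dst/example_artifact.cache.pickle.zlib"
with open(artifact_path, "wb") as f:
    f.write(artifact_bytes)

# upload the artifact file to the configured S3 Bucket using the same AWS env vars
s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))
s3.upload_file(artifact_path, "unique-bucket-name-for-datasets", "example_artifact.cache.pickle.zlib")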
In [1]:
# Setup the Sci-pype environment
import sys, os
# Only Redis Labs is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "RedisLabs"
# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *
Extract the Models from the Cache with this request and upload the object files to the configured S3 Bucket.
Please make sure the environment variables are set correctly and the S3 Bucket exists; a quick sanity check is sketched after the list of env files below:
ENV_AWS_KEY=<AWS API Key>
ENV_AWS_SECRET=<AWS API Secret>
For Docker containers, make sure to set these keys in the correct Jupyter env file and restart the container:
<repo base dir>/justredis/redis-labs.env
<repo base dir>/local/jupyter.env
<repo base dir>/test/jupyter.env
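Before running the extraction, a quick sanity check can confirm the AWS keys are present and the bucket is reachable. This is a hedged sketch using boto3 directly (not part of the Sci-pype API); the bucket name matches the placeholder used later in this notebook.
import os
import boto3  # assumption: boto3 is available inside the Jupyter container

missing = [name for name in ("ENV_AWS_KEY", "ENV_AWS_SECRET") if not os.getenv(name)]
if missing:
    print("Missing AWS environment variables: " + ", ".join(missing))
else:
    s3 = boto3.client("s3",
                      aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                      aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))
    try:
        # head_bucket raises an exception if the bucket is missing or not accessible
        s3.head_bucket(Bucket="unique-bucket-name-for-datasets")
        print("S3 Bucket is reachable")
    except Exception as e:
        print("S3 Bucket check failed: " + str(e))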
In [2]:
ds_name = "iris_regressor"
In [3]:
data_dir = str(os.getenv("ENV_DATA_DST_DIR", "/opt/work/data/dst"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0o777)
In [4]:
s3_bucket = "unique-bucket-name-for-datasets" # name this something unique under your AWS Account (this might be opened to the public in the future...stay tuned)
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)
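The S3Loc value packs the bucket and key into a single "Bucket:Key" string. If you later need to split one of these locations back apart (for example, when downloading the artifact by hand), a small hypothetical helper is enough:
def split_s3_loc(s3_loc_str):
    # "bucket-name:key-name" -> ("bucket-name", "key-name")
    bucket_name, key_name = s3_loc_str.split(":", 1)
    return bucket_name, key_name

print(split_s3_loc(s3_loc))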
In [5]:
cache_req = {
    "RAName"      : "CACHE",       # Redis endpoint name holding the models
    "DSName"      : str(ds_name),  # Dataset name for pulling out of the cache
    "S3Loc"       : str(s3_loc),   # S3 location to store the model file
    "DeleteAfter" : False,         # Optional delete after upload
    "SaveDir"     : data_dir,      # Optional dir to save the model file - default is ENV_DATA_DST_DIR
    "TrackingID"  : ""             # Future support for using the tracking id
}
upload_results = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed to Upload the Model and Analysis Caches as a file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
    sys.exit(1)
# end of if extract + upload worked
lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)
Now that the XGB models are archived as an artifact on S3, you can run the follow-up notebooks to check out how the Sci-pype workflow continues using this data science artifact.
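On the importing Sci-pype instance the process runs in reverse: download the artifact from S3, decompress it with zlib, and unpickle the Models + Analysis. Below is a hedged sketch of that direction using boto3 directly rather than the Sci-pype importer API; the S3 key assumes core.to_upper upper-cases the dataset name as the earlier cell suggests.
import os
import pickle
import zlib

import boto3  # assumption: boto3 is available on the importing instance

s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))

local_path = "/opt/work/data/dst/dataset_IRIS_REGRESSOR.cache.pickle.zlib"
s3.download_file("unique-bucket-name-for-datasets",
                 "dataset_IRIS_REGRESSOR.cache.pickle.zlib",
                 local_path)

with open(local_path, "rb") as f:
    cached_payload = pickle.loads(zlib.decompress(f.read()))

# the unpickled object holds the Models + Analysis that were cached in Redis
print(type(cached_payload))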