This notebook demonstrates how to extract the machine learning Models + Analysis from the redis cache named "CACHE" and save them to disk as a compressed artifact file (Pickle serialization with zlib compression). Once the file is saved, it is uploaded to the configured S3 Bucket for archiving and sharing.
It extracts the IRIS Regressor and IRIS Classifier datasets from the redis CACHE and compiles a manifest defining the cache mapping for all the Models + their respective Analysis. Once cached, the Models can be extracted, shared, and deployed on other Sci-pype instances using something like this notebook or the command-line versions.
I built this notebook from the extractor examples:
https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/extractors
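For context on the artifact format mentioned above, the file is simply a pickled Python object compressed with zlib. Here is a minimal, illustrative round trip (the placeholder dictionary and exact framing are assumptions, not Sci-pype's internal layout):

import pickle, zlib

# serialize a placeholder "model + analysis" object and compress it
artifact = {"model": "trained-model-placeholder", "analysis": {"accuracy": 0.97}}
compressed = zlib.compress(pickle.dumps(artifact))  # bytes that would land in the .cache.pickle.zlib file

# restoring the artifact reverses the two steps
restored = pickle.loads(zlib.decompress(compressed))
assert restored["analysis"]["accuracy"] == 0.97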
In [1]:
# Setup the Sci-pype environment
import sys, os
# Only redis is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "JustRedis"
# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *
Extract the Models from the Cache with this request and upload them as object files to the configured S3 Bucket.
Please make sure the environment variables are set correctly and that the S3 Bucket exists (a quick sanity check follows the file list below):
ENV_AWS_KEY=<AWS API Key>
ENV_AWS_SECRET=<AWS API Secret>
For docker containers, make sure these keys are set in the correct Jupyter env file and restart the container:
<repo base dir>/justredis/jupyter.env
<repo base dir>/local/jupyter.env
<repo base dir>/test/jupyter.env
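Before running the upload cells, an optional check like this (not part of the original workflow) confirms the kernel can actually see the AWS credentials:

import os

for required_key in ["ENV_AWS_KEY", "ENV_AWS_SECRET"]:
    if not os.getenv(required_key):
        print("Missing " + required_key + " - set it in the jupyter.env file and restart the container")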
In [2]:
ds_name = "iris_classifier"
In [3]:
data_dir = str(os.getenv("ENV_DATA_DST_DIR", "/opt/work/data/dst"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0o777)
In [4]:
s3_bucket = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)
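With ds_name set to "iris_classifier", and assuming core.to_upper simply upper-cases the name, s3_key resolves to dataset_IRIS_CLASSIFIER.cache.pickle.zlib and s3_loc to unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib (the bucket name and key joined by a colon).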
In [5]:
cache_req = {
    "RAName"      : "CACHE",        # Redis instance name holding the models
    "DSName"      : str(ds_name),   # Dataset name for pulling out of the cache
    "S3Loc"       : str(s3_loc),    # S3 location to store the model file
    "DeleteAfter" : False,          # Optional delete after upload
    "SaveDir"     : data_dir,       # Optional dir to save the model file - default is ENV_DATA_DST_DIR
    "TrackingID"  : ""              # Future support for using the tracking id
}
upload_results = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed to Upload the Model and Analysis Caches as a file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
# end of if extract + upload worked
lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)
In [6]:
ds_name = "iris_regressor"
In [7]:
cache_req = {
    "RAName"      : "CACHE",        # Redis instance name holding the models
    "DSName"      : str(ds_name),   # Dataset name for pulling out of the cache
    "S3Loc"       : str(s3_loc),    # S3 location to store the model file
    "DeleteAfter" : False,          # Optional delete after upload
    "SaveDir"     : data_dir,       # Optional dir to save the model file - default is ENV_DATA_DST_DIR
    "TrackingID"  : ""              # Future support for using the tracking id
}
upload_results = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed to Upload the Model and Analysis Caches as a file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
    sys.exit(1)
# end of if extract + upload worked
lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)