This notebook demonstrates how to import a machine learning model file from S3 and store the Models + Analysis in the redis cache named "CACHE".
Download the IRIS Model archive from the configured S3 Bucket + Key, then decompress and extract the pickled historical analysis and previously built models. This uses examples from the IRIS sample dataset and requires a valid S3 Bucket storing the models; you should also be comfortable paying the costs for downloading the objects from S3 (https://aws.amazon.com/s3/pricing/).
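If you want to inspect one of these archives outside of Sci-pype, the .cache.pickle.zlib extension suggests a zlib-compressed pickle. Here is a minimal inspection sketch, assuming that format and an already-downloaded copy (the path below is hypothetical):

import pickle
import zlib

# Hypothetical local copy of a downloaded archive (adjust the path for your environment):
archive_path = "/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib"

with open(archive_path, "rb") as f:
    compressed_bytes = f.read()

# Assumption: the archive is a zlib-compressed pickle holding the Models + Analysis
cached_object = pickle.loads(zlib.decompress(compressed_bytes))
print(type(cached_object))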
Once uploaded to the S3 Bucket, you should be able to verify the files have similar disk sizes.
After importing, the Models and Analysis are available to any other Sci-pype instance with connectivity to the same redis cache.
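To confirm the import is visible from another Sci-pype instance, you can scan the shared redis cache directly. This is a sketch, assuming the "CACHE" redis application is reachable at localhost:6379 db 0 (adjust to your deployment) and that the cached key names contain the dataset name:

import redis

# Assumed connection details for the shared "CACHE" redis application:
r = redis.StrictRedis(host="localhost", port=6379, db=0)

# List any keys that mention the iris datasets (the key naming is an assumption):
print([k for k in r.keys("*") if b"iris" in k.lower()])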
I built this notebook from the importer examples:
https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/importers
In [1]:
# Setup the Sci-pype environment
import sys, os
# Only redis is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "JustRedis"
# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *
Import the Models from S3 and store the extracted Models + Analysis in the Cache.
Please make sure the environment variables are set correctly and the S3 Bucket exists (a quick check is sketched after the env file list below):
ENV_AWS_KEY=<AWS API Key>
ENV_AWS_SECRET=<AWS API Secret>
For docker containers make sure to set these keys in the correct Jupyter env file and restart the container:
<repo base dir>/justredis/jupyter.env
<repo base dir>/local/jupyter.env
<repo base dir>/test/jupyter.env
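A quick sanity check can confirm the keys are visible to the notebook before kicking off the download. This sketch only reads the environment; it does not validate the credentials against AWS:

import os

# Fail fast if the AWS credentials were not loaded from the jupyter env file:
for required_var in ["ENV_AWS_KEY", "ENV_AWS_SECRET"]:
    if not os.getenv(required_var):
        print("Missing required environment variable: " + required_var)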
In [2]:
ds_name = "iris_classifier"
In [3]:
data_dir = str(os.getenv("ENV_DATA_SRC_DIR", "/opt/work/data/src"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0o777)
In [4]:
s3_bucket = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)
In [5]:
ml_file = data_dir + "/" + str(s3_key)
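Assuming core.to_upper simply uppercases the dataset name, the composed values will look like unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib for s3_loc and /opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib for ml_file.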
In [6]:
lg("-------------------------------------------------", 6)
lg("Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(" + str(s3_loc) + ") File(" + str(ml_file) + ")", 6)
lg("", 6)
if os.path.exists(ml_file) == False:
s3_loc = str(s3_bucket) + ":" + str(s3_key)
lg("Downloading ModelFile S3Loc(" + str(s3_loc) + ")", 6)
download_results = core.s3_download_and_store_file(s3_loc, ml_file, core.get_rds(), core.get_dbs(), debug)
if download_results["Status"] != "SUCCESS":
lg("ERROR: Stopping processing for errror: " + str(download_results["Error"]), 0)
else:
lg("", 6)
lg("Done Downloading ModelFile S3Loc(" + str(s3_loc) + ") File(" + str(download_results["Record"]["File"]) + ")", 5)
ml_file = download_results["Record"]["File"]
else:
lg("", 6)
lg("Continuing with the existing file.", 5)
lg("", 6)
# end of downloading from s3 if it's not locally available
In [7]:
ra_name = "CACHE"
lg("Importing(" + str(ml_file) + ") Models and Analysis into Redis(" + str(ra_name) + ")", 6)
cache_req = {
"RAName" : ra_name,
"DSName" : str(ds_name),
"TrackingID": "",
"ModelFile" : ml_file,
"S3Loc" : s3_loc
}
upload_results = core.ml_load_model_file_into_cache(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
lg("", 6)
lg("Done Loading Model File for DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 5)
lg("", 6)
lg("Importing and Caching Completed", 5)
lg("", 6)
else:
lg("", 6)
lg("ERROR: Failed Loading Model File(" + str(cache_req["ModelFile"]) + ") into Cache for DSName(" + str(ds_name) + ")", 6)
lg(upload_results["Error"], 6)
lg("", 6)
# end of if success
In [8]:
ds_name = "iris_regressor"
In [9]:
s3_bucket = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)
ra_name = "CACHE"
ml_file = data_dir + "/" + str(s3_key)
lg("-------------------------------------------------", 6)
lg("Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(" + str(s3_loc) + ") File(" + str(ml_file) + ")", 6)
lg("", 6)
if os.path.exists(ml_file) == False:
s3_loc = str(s3_bucket) + ":" + str(s3_key)
lg("Downloading ModelFile S3Loc(" + str(s3_loc) + ")", 6)
download_results = core.s3_download_and_store_file(s3_loc, ml_file, core.get_rds(), core.get_dbs(), debug)
if download_results["Status"] != "SUCCESS":
lg("ERROR: Stopping processing for errror: " + str(download_results["Error"]), 0)
else:
lg("", 6)
lg("Done Downloading ModelFile S3Loc(" + str(s3_loc) + ") File(" + str(download_results["Record"]["File"]) + ")", 5)
ml_file = download_results["Record"]["File"]
else:
lg("", 6)
lg("Continuing with the existing file.", 5)
lg("", 6)
# end of downloading from s3 if it's not locally available
lg("Importing(" + str(ml_file) + ") Models and Analysis into Redis(" + str(ra_name) + ")", 6)
cache_req = {
"RAName" : ra_name,
"DSName" : str(ds_name),
"TrackingID" : "",
"ModelFile" : ml_file,
"S3Loc" : s3_loc
}
upload_results = core.ml_load_model_file_into_cache(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
lg("", 6)
lg("Done Loading Model File for DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 5)
lg("", 6)
lg("Importing and Caching Completed", 5)
lg("", 6)
else:
lg("", 6)
lg("ERROR: Failed Loading Model File(" + str(cache_req["ModelFile"]) + ") into Cache for DSName(" + str(ds_name) + ")", 6)
lg(upload_results["Error"], 6)
lg("", 6)
# end of if success
I built this notebook from the importer examples:
https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/importers