Importing the IRIS XGB Models from S3 and Deploying them to Redis Labs Cloud

This notebook demonstrates how to import a machine learning model file from S3 and deploy the Models + Analysis to the Redis Labs Cloud (https://redislabs.com/redis-cloud) endpoint named "CACHE".

Overview

Download the IRIS Model archive from the configured S3 Bucket + Key, then decompress and extract the pickled historical analysis and previously built models. This example uses the IRIS sample dataset and requires a valid S3 Bucket storing the models; note that you are responsible for the download costs of retrieving the objects from S3 (https://aws.amazon.com/s3/pricing/).

Once uploaded to the S3 Bucket, you should be able to verify the files have a similar size on disk.

After importing, the Models and Analysis are available to any other Sci-pype instance with connectivity to the same Redis Labs Cloud cache instance.
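The archive format described above can be sketched with a small round trip. This is a minimal illustration assuming the artifact is a zlib-compressed pickle, using a stand-in dict instead of the real IRIS models:

```python
import pickle
import zlib

# stand-in for the real Models + Analysis payload
artifact = {"Models": ["model-bytes"], "Analysis": {"Accuracy": 0.97}}

# pickle the object, then zlib-compress the serialized bytes
compressed = zlib.compress(pickle.dumps(artifact))

# reverse the two steps to recover the original object
restored = pickle.loads(zlib.decompress(compressed))
print(restored["Analysis"]["Accuracy"])
```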

Command-line Versions

This notebook was built from the importer command line examples:

https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/importers/rl_import_iris_regressor.py

https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/importers

1) Setup the Environment for Deploying to Redis Labs Cloud


In [1]:
# Setup the Sci-pype environment
import sys, os

# Only Redis Labs is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "RedisLabs"

# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *

2) Setup the Request

Import the Models from S3 and store the extracted Models + Analysis in the Redis Labs Cloud cache endpoint named "CACHE" listening on port 16005.

Please make sure the environment variables are set correctly and the S3 Bucket exists:

   ENV_AWS_KEY=<AWS API Key>
   ENV_AWS_SECRET=<AWS API Secret>
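A hypothetical pre-flight check (not part of Sci-pype) can fail fast if these credentials are missing before the S3 download is attempted:

```python
import os

# the two AWS credential variables the notebook expects
required = ["ENV_AWS_KEY", "ENV_AWS_SECRET"]

# collect any that are unset or empty
missing = [name for name in required if not os.getenv(name)]
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```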

For Docker containers, make sure to set these keys in the correct Jupyter env file and restart the container:

   <repo base dir>/justredis/redis-labs.env
   <repo base dir>/local/jupyter.env
   <repo base dir>/test/jupyter.env
  • What's the dataset name?

In [2]:
ds_name     = "iris_regressor"
  • Where is the downloaded file getting stored?

In [3]:
data_dir    = str(os.getenv("ENV_DATA_SRC_DIR", "/opt/work/data/src"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0777)
  • What's the S3 Location (Unique Bucket Name + Key)?

In [4]:
s3_bucket   = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key      = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc      = str(s3_bucket) + ":" + str(s3_key)
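The "S3Loc" string packs the bucket and key together as "bucket:key". A hypothetical helper (not part of the Sci-pype API) shows how a downloader can split it back apart:

```python
# split on the first ":" so keys containing ":" stay intact
def parse_s3_loc(s3_loc):
    bucket, key = s3_loc.split(":", 1)
    return bucket, key

bucket, key = parse_s3_loc(
    "unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib")
print(bucket)
print(key)
```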
  • Where will the downloaded file be stored?

In [5]:
ml_file     = data_dir + "/" + str(s3_key)

3) Check if the Model File Artifact needs to be Downloaded


In [6]:
lg("-------------------------------------------------", 6)
lg("Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(" + str(s3_loc) + ") File(" + str(ml_file) + ")", 6)
lg("", 6)

if not os.path.exists(ml_file):

    s3_loc              = str(s3_bucket) + ":" + str(s3_key)
    lg("Downloading ModelFile S3Loc(" + str(s3_loc) + ")", 6)
    download_results    = core.s3_download_and_store_file(s3_loc, ml_file, core.get_rds(), core.get_dbs(), debug)

    if download_results["Status"] != "SUCCESS":
        lg("ERROR: Stopping processing for error: " + str(download_results["Error"]), 0)
    else:
        lg("", 6)
        lg("Done Downloading ModelFile S3Loc(" + str(s3_loc) + ") File(" + str(download_results["Record"]["File"]) + ")", 5)
        ml_file         = download_results["Record"]["File"]
else:
    lg("", 6)
    lg("Continuing with the existing file.", 5)
    lg("", 6)
# end of downloading from s3 if it's not locally available


-------------------------------------------------
Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Done Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)
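The internals of core.s3_download_and_store_file are not shown in this notebook; a comparable download step using boto3 (an assumption, not the actual Sci-pype implementation) could look like:

```python
import os

def download_model_file(s3_loc, dest_file):
    # split the "bucket:key" S3Loc string
    bucket, key = s3_loc.split(":", 1)
    # reuse a previously downloaded copy, as the cell above does
    if os.path.exists(dest_file):
        return dest_file
    # assumes boto3 is installed and AWS credentials are configured
    import boto3
    boto3.client("s3").download_file(bucket, key, dest_file)
    return dest_file
```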

4) Start the Importer to Download + Cache the Models out of S3 or Locally if the file already exists


In [7]:
ra_name        = "CACHE"

lg("Importing(" + str(ml_file) + ") Models and Analysis into Redis(" + str(ra_name) + ")", 6)

cache_req      = {
                    "RAName"    : ra_name,
                    "DSName"    : str(ds_name),
                    "TrackingID": "",
                    "ModelFile" : ml_file,
                    "S3Loc"     : s3_loc
               }

upload_results = core.ml_load_model_file_into_cache(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("", 6)
    lg("Done Loading Model File for DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 5)
    lg("", 6)
    lg("Importing and Caching Completed", 5)
    lg("", 6)
else:
    lg("", 6)
    lg("ERROR: Failed Loading Model File(" + str(cache_req["ModelFile"]) + ") into Cache for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
# end of if success


Importing(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib) Models and Analysis into Redis(CACHE)
Loading DSName(IRIS_REGRESSOR) ModelFile(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)
/opt/conda/envs/python2/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Compressing Models(5)
Compressing Model(0/5) Type(XGBRegressor)
Done Compressing(0/5) Type(XGBRegressor) Size(347522) Decompressed(1081031)
Caching Model(0) ID(_MD_IRIS_REGRESSOR_992f1e_0) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Done Caching Model(0) ID(_MD_IRIS_REGRESSOR_992f1e_0) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(1/5) Type(XGBRegressor)
Done Compressing(1/5) Type(XGBRegressor) Size(374016) Decompressed(1136927)
Caching Model(1) ID(_MD_IRIS_REGRESSOR_992f1e_1) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Done Caching Model(1) ID(_MD_IRIS_REGRESSOR_992f1e_1) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(2/5) Type(XGBRegressor)
Done Compressing(2/5) Type(XGBRegressor) Size(195314) Decompressed(650161)
Caching Model(2) ID(_MD_IRIS_REGRESSOR_992f1e_2) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Done Caching Model(2) ID(_MD_IRIS_REGRESSOR_992f1e_2) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(3/5) Type(XGBRegressor)
Done Compressing(3/5) Type(XGBRegressor) Size(410953) Decompressed(1240355)
Caching Model(3) ID(_MD_IRIS_REGRESSOR_992f1e_3) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Done Caching Model(3) ID(_MD_IRIS_REGRESSOR_992f1e_3) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(4/5) Type(XGBRegressor)
Done Compressing(4/5) Type(XGBRegressor) Size(59797) Decompressed(301562)
Caching Model(4) ID(_MD_IRIS_REGRESSOR_992f1e_4) RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Done Caching Model(4) ID(_MD_IRIS_REGRESSOR_992f1e_4) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Caching AccuracyResults RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Done Caching AccuracyResults
Caching PredictionsDF RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Done Caching PredictionsDF
Decompressing Analysis Dataset
Finding ManifestKey(Accuracy) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Decompressing Key(Accuracy)
Done Decompressing Key(Accuracy)
Finding ManifestKey(PredictionsDF) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Decompressing Key(PredictionsDF)
Done Decompressing Key(PredictionsDF)
Done Decompressing Analysis Dataset
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Found Model(_MD_IRIS_REGRESSOR_992f1e_0) Type(XGBRegressor) Target(SepalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Found Model(_MD_IRIS_REGRESSOR_992f1e_1) Type(XGBRegressor) Target(PetalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Found Model(_MD_IRIS_REGRESSOR_992f1e_2) Type(XGBRegressor) Target(PetalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Found Model(_MD_IRIS_REGRESSOR_992f1e_3) Type(XGBRegressor) Target(SepalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Found Model(_MD_IRIS_REGRESSOR_992f1e_4) Type(XGBRegressor) Target(ResultTargetValue) FeatureNames(4)
Sorting Predictions
Done Decompressing Models(5)

Done Loading Model File for DSName(iris_regressor) S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Importing and Caching Completed
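The log output above suggests each model is pickled, zlib-compressed, and cached under a per-target key (e.g. _MD_IRIS_REGRESSOR_SepalLength). A sketch of that pattern; the helper names here are assumptions, not the Sci-pype API:

```python
import pickle
import zlib

def build_model_key(ds_name, target):
    # mirrors the RLoc key layout seen in the log output above
    return "_MD_" + ds_name.upper() + "_" + target

def cache_model(client, ds_name, target, model):
    # client can be a redis.StrictRedis instance; anything with .set() works
    payload = zlib.compress(pickle.dumps(model))
    client.set(build_model_key(ds_name, target), payload)
    return payload

print(build_model_key("iris_regressor", "SepalLength"))
```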

Automation with Lambda - Coming Soon

Native AWS Lambda import support will be added in the future; the packaging and functionality still need to be worked out. For now, you can extend the command-line importers linked above.

Next Steps

Now that the XGB models from the S3 artifact are deployed to the Machine Learning data store on the Redis Labs Cloud endpoint, you can run the following notebooks to see how the Sci-pype workflow continues with this data science artifact:

  1. Build, Train and Cache the XGB Models for the IRIS Dataset on Redis Labs Cloud
  2. Extract from the Machine Learning data store and archive the artifact on S3
  3. This Notebook - Import the artifacts from S3 and deploy them to the Machine Learning data store
  4. Make new Predictions using the cached XGB Models
