This notebook demonstrates how to extract the machine learning Models + Analysis from the Redis Labs Cloud (https://redislabs.com/redis-cloud) cache endpoint named "CACHE" and save them locally as a compressed string artifact file (pickle + zlib compression). Once the file is saved, it is uploaded to the configured S3 Bucket for archiving and sharing.
Extract the IRIS XGB regressor models from the Redis Labs Cloud CACHE endpoint. After extraction, compile a manifest defining a cache mapping for all the Models + their respective Analysis. Once cached, the Models can be extracted, shared, and deployed on other Sci-pype instances by using something like this notebook or the command-line versions.
This notebook was built from the extractor command line examples:
https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/extractors
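For context, the artifact format itself is simple: the cached Models + Analysis are pickled, compressed with zlib, written to a local file, and then uploaded to S3. The following is a minimal, hypothetical sketch of that flow using boto3 directly, with made-up object and path names; the actual extraction in this notebook goes through the Sci-pype API instead.
import os
import pickle
import zlib

import boto3  # assumption: boto3 is installed; Sci-pype's own S3 client may differ

# hypothetical stand-in for the cached Models + Analysis pulled out of Redis
cached_payload = {"Models": ["trained XGB regressor would go here"], "Analysis": {}}

# serialize and compress into the "pickle + zlib" string artifact
artifact_bytes = zlib.compress(pickle.dumps(cached_payload))

artifact_path = "/opt/work/data/dst/example_artifact.cache.pickle.zlib"
with open(artifact_path, "wb") as f:
    f.write(artifact_bytes)

# upload the artifact file to the configured S3 Bucket using the same AWS env vars
s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))
s3.upload_file(artifact_path, "unique-bucket-name-for-datasets", "example_artifact.cache.pickle.zlib")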
In [1]:
# Setup the Sci-pype environment
import sys, os
# Only Redis Labs is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "RedisLabs"
# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *
Extract the Models from the Cache with this request and upload the object files to the configured S3 Bucket.
Please make sure the environment variables are set correctly and the S3 Bucket exists; a quick sanity check is sketched after the list of env files below:
ENV_AWS_KEY=<AWS API Key>
ENV_AWS_SECRET=<AWS API Secret>
For Docker containers, make sure to set these keys in the correct Jupyter env file and restart the container:
<repo base dir>/justredis/redis-labs.env
<repo base dir>/local/jupyter.env
<repo base dir>/test/jupyter.env
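Before running the extraction, a quick sanity check can confirm the AWS keys are present and the bucket is reachable. This is a hedged sketch using boto3 directly (not part of the Sci-pype API); the bucket name matches the placeholder used later in this notebook.
import os
import boto3  # assumption: boto3 is available inside the Jupyter container

missing = [name for name in ("ENV_AWS_KEY", "ENV_AWS_SECRET") if not os.getenv(name)]
if missing:
    print("Missing AWS environment variables: " + ", ".join(missing))
else:
    s3 = boto3.client("s3",
                      aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                      aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))
    try:
        # head_bucket raises an exception if the bucket is missing or not accessible
        s3.head_bucket(Bucket="unique-bucket-name-for-datasets")
        print("S3 Bucket is reachable")
    except Exception as e:
        print("S3 Bucket check failed: " + str(e))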
In [2]:
ds_name = "iris_regressor"
In [3]:
data_dir = str(os.getenv("ENV_DATA_DST_DIR", "/opt/work/data/dst"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0o777)
In [4]:
s3_bucket = "unique-bucket-name-for-datasets" # name this something unique under your AWS Account (this might be opened to the public in the future...stay tuned)
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)
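The S3Loc value packs the bucket and key into a single "Bucket:Key" string. If you later need to split one of these locations back apart (for example, when downloading the artifact by hand), a small hypothetical helper is enough:
def split_s3_loc(s3_loc_str):
    # "bucket-name:key-name" -> ("bucket-name", "key-name")
    bucket_name, key_name = s3_loc_str.split(":", 1)
    return bucket_name, key_name

print(split_s3_loc(s3_loc))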
In [5]:
cache_req = {
    "RAName"      : "CACHE",       # Redis endpoint name holding the models
    "DSName"      : str(ds_name),  # Dataset name for pulling out of the cache
    "S3Loc"       : str(s3_loc),   # S3 location to store the model file
    "DeleteAfter" : False,         # Optional delete after upload
    "SaveDir"     : data_dir,      # Optional dir to save the model file - default is ENV_DATA_DST_DIR
    "TrackingID"  : ""             # Future support for using the tracking id
}
upload_results = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed to Upload the Model and Analysis Caches as a file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
    sys.exit(1)
# end of if extract + upload worked
lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)
Now that the XGB models are archived as an artifact on S3, you can run the follow-up notebooks to check out how the Sci-pype workflow continues using this data science artifact.
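On the importing Sci-pype instance the process runs in reverse: download the artifact from S3, decompress it with zlib, and unpickle the Models + Analysis. Below is a hedged sketch of that direction using boto3 directly rather than the Sci-pype importer API; the S3 key assumes core.to_upper upper-cases the dataset name as the earlier cell suggests.
import os
import pickle
import zlib

import boto3  # assumption: boto3 is available on the importing instance

s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))

local_path = "/opt/work/data/dst/dataset_IRIS_REGRESSOR.cache.pickle.zlib"
s3.download_file("unique-bucket-name-for-datasets",
                 "dataset_IRIS_REGRESSOR.cache.pickle.zlib",
                 local_path)

with open(local_path, "rb") as f:
    cached_payload = pickle.loads(zlib.decompress(f.read()))

# the unpickled object holds the Models + Analysis that were cached in Redis
print(type(cached_payload))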