Extracting the IRIS Models from Cache

This notebook demonstrates how to extract the machine learning Models + Analysis from the redis cache named "CACHE" and save them to disk as a compressed string artifact file (Pickle + zlib compression). Once the file is saved, it is uploaded to the configured S3 Bucket for archiving and sharing.
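
The artifact format itself is simple: the extracted Models + Analysis are pickled and then zlib-compressed into a single blob. A minimal sketch of the save side with a hypothetical payload (not the Sci-pype internals):

import pickle, zlib

# Hypothetical stand-in for the extracted Models + Analysis
payload = {"Models": [], "Analysis": {"Accuracy": {}, "PredictionsDF": None}}

# Pickle, then zlib-compress into one blob - the *.cache.pickle.zlib format
artifact = zlib.compress(pickle.dumps(payload))

with open("/tmp/dataset_EXAMPLE.cache.pickle.zlib", "wb") as f:
    f.write(artifact)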

Overview

Extract the IRIS Regressor and Classifier datasets from the redis CACHE. After extraction, compile a manifest defining a cache mapping for all the Models + their respective Analysis. Once cached, the Models can be extracted and then shared + deployed on other Sci-pype instances using something like this notebook or the command-line versions.
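
The exact manifest layout is internal to Sci-pype, but the Redis keys printed in the output below suggest a cache mapping roughly like this (a hypothetical sketch inferred from the logged RLoc values, not the real data structure):

# Hypothetical sketch of the cache mapping, inferred from the RLoc values
# in the output below; the real manifest structure lives inside Sci-pype
cache_mapping = {
    "Analysis" : {
        "Accuracy"          : "CACHE:_MD_IRIS_CLASSIFIER_Accuracy",
        "PredictionsDF"     : "CACHE:_MD_IRIS_CLASSIFIER_PredictionsDF"
    },
    "Models"   : {
        "SepalLength"       : "CACHE:_MD_IRIS_CLASSIFIER_SepalLength",
        "SepalWidth"        : "CACHE:_MD_IRIS_CLASSIFIER_SepalWidth",
        "PetalLength"       : "CACHE:_MD_IRIS_CLASSIFIER_PetalLength",
        "PetalWidth"        : "CACHE:_MD_IRIS_CLASSIFIER_PetalWidth",
        "ResultTargetValue" : "CACHE:_MD_IRIS_CLASSIFIER_ResultTargetValue"
    }
}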

Command-line Versions

I built this notebook from the extractor examples:

https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/extractors

1) Extract the IRIS Classifier Models + Analysis from the Cache


In [1]:
# Setup the Sci-pype environment
import sys, os

# Only redis is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "JustRedis"

# Load the Sci-pype PyCore as an object named "core" along with the environment variables
from src.common.load_ipython_env import *

2) Setup the Request

Extract the Models from the Cache with this request and upload them as object files to the configured S3 Bucket.

Please make sure the environment variables are set correctly and the S3 Bucket exists:

   ENV_AWS_KEY=<AWS API Key>
   ENV_AWS_SECRET=<AWS API Secret>

For docker containers make sure to set these keys in the correct Jupyter env file and restart the container:

   <repo base dir>/justredis/jupyter.env
   <repo base dir>/local/jupyter.env
   <repo base dir>/test/jupyter.env
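
If you want to sanity-check the credentials and the bucket before running the request, an optional pre-flight check could look like this (boto3 is used here purely for the check and may need to be installed in the container; it is not part of the Sci-pype request):

import os
import boto3

# Fail fast if the AWS credentials were not exported into the environment
assert os.getenv("ENV_AWS_KEY") and os.getenv("ENV_AWS_SECRET"), "Set ENV_AWS_KEY / ENV_AWS_SECRET"

s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))

# head_bucket raises an error if the bucket is missing or not accessible
s3.head_bucket(Bucket="unique-bucket-name-for-datasets")
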
  • What's the dataset name?

In [2]:
ds_name             = "iris_classifier"
  • Where is the extracted file getting stored?

In [3]:
data_dir            = str(os.getenv("ENV_DATA_DST_DIR", "/opt/work/data/dst"))
if not os.path.exists(data_dir):
    os.mkdir(data_dir, 0777)
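
Note the 0777 octal literal is Python 2 syntax; the notebook runs in the python2 conda environment (see the sklearn warning path in the output below). A Python 3 equivalent would be:

import os

data_dir = str(os.getenv("ENV_DATA_DST_DIR", "/opt/work/data/dst"))
# makedirs with exist_ok replaces the explicit existence check,
# and 0o777 is the Python 3 octal spelling
os.makedirs(data_dir, mode=0o777, exist_ok=True)
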
  • What's the S3 Location (Unique Bucket Name + Key)?

In [4]:
s3_bucket           = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key              = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc              = str(s3_bucket) + ":" + str(s3_key)
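
Assuming core.to_upper simply upper-cases the name (a guess based on the output below), the assembled location matches the upload log:

# Hypothetical check of the assembled S3 location
print(s3_loc)
# unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib
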
  • Build the full request and run it

In [5]:
cache_req           = {
                        "RAName"        : "CACHE",      # Redis instance name holding the models
                        "DSName"        : str(ds_name), # Dataset name for pulling out of the cache
                        "S3Loc"         : str(s3_loc),  # S3 location to store the model file
                        "DeleteAfter"   : False,        # Optional delete after upload
                        "SaveDir"       : data_dir,     # Optional dir to save the model file - default is ENV_DATA_DST_DIR
                        "TrackingID"    : ""            # Future support for using the tracking id
                    }

upload_results      = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed Upload Model and Analysis Caches as file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
# end of if extract + upload worked

lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)


Decompressing Analysis Dataset
Finding ManifestKey(Accuracy) Records in RLoc(CACHE:_MD_IRIS_CLASSIFIER_Accuracy)
Decompressing Key(Accuracy)
Done Decompressing Key(Accuracy)
Finding ManifestKey(PredictionsDF) Records in RLoc(CACHE:_MD_IRIS_CLASSIFIER_PredictionsDF)
Decompressing Key(PredictionsDF)
Done Decompressing Key(PredictionsDF)
Done Decompressing Analysis Dataset
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_SepalLength)
/opt/conda/envs/python2/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_0) Type(XGBClassifier) Target(SepalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_PetalLength)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_1) Type(XGBClassifier) Target(PetalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_PetalWidth)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_2) Type(XGBClassifier) Target(PetalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_SepalWidth)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_3) Type(XGBClassifier) Target(SepalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_ResultTargetValue)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_4) Type(XGBClassifier) Target(ResultTargetValue) FeatureNames(4)
Sorting Predictions
Done Decompressing Models(5)
Found DSName(IRIS_CLASSIFIER) Analysis Created on Date(2017-01-09 08:02:07) Creating File(/opt/work/data/dst/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Validating Serialization
Checking Model Counts
Decompression Validation Passed - Models(5) == (5)
Done Creating File(/opt/work/data/dst/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Uploading to DSName(IRIS_CLASSIFIER) Analysis to S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Uploaded DSName(IRIS_CLASSIFIER) S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Done Uploading to S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Done Uploading Model and Analysis DSName(iris_classifier) S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)

Extract and Upload Completed
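
On another Sci-pype instance the importer notebooks and scripts handle restoring this artifact, but the raw restore is just the reverse of the save: download, decompress, unpickle. A rough sketch using boto3 purely for illustration (Sci-pype ships its own S3 and import helpers):

import os, pickle, zlib
import boto3

s3 = boto3.client("s3",
                  aws_access_key_id=os.getenv("ENV_AWS_KEY"),
                  aws_secret_access_key=os.getenv("ENV_AWS_SECRET"))

# Pull the compressed artifact back down from the bucket
obj = s3.get_object(Bucket="unique-bucket-name-for-datasets",
                    Key="dataset_IRIS_CLASSIFIER.cache.pickle.zlib")

# Reverse of the save: decompress, then unpickle the Models + Analysis
dataset = pickle.loads(zlib.decompress(obj["Body"].read()))

Only unpickle artifacts you built or trust; pickle executes code during load.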

3) Setup the Extract and Upload for the IRIS Regressor Models and Analysis


In [6]:
ds_name             = "iris_regressor"
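
Note that s3_key and s3_loc still hold the classifier values from step 2, so this run re-uses the dataset_IRIS_CLASSIFIER.cache.pickle.zlib key and overwrites that artifact (visible in the output below). To store the regressor under its own key, rebuild the location first:

# Rebuild the S3 key + location for the regressor so it does not
# overwrite the classifier artifact uploaded in step 2
s3_key = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc = str(s3_bucket) + ":" + str(s3_key)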

4) Build and Run the Extract + Upload Request


In [7]:
cache_req           = {
                        "RAName"        : "CACHE",      # Redis instance name holding the models
                        "DSName"        : str(ds_name), # Dataset name for pulling out of the cache
                        "S3Loc"         : str(s3_loc),  # S3 location to store the model file
                        "DeleteAfter"   : False,        # Optional delete after upload
                        "SaveDir"       : data_dir,     # Optional dir to save the model file - default is ENV_DATA_DST_DIR
                        "TrackingID"    : ""            # Future support for using the tracking id
                    }

upload_results      = core.ml_upload_cached_dataset_to_s3(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("Done Uploading Model and Analysis DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 6)
else:
    lg("", 6)
    lg("ERROR: Failed Upload Model and Analysis Caches as file for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
    sys.exit(1)
# end of if extract + upload worked

lg("", 6)
lg("Extract and Upload Completed", 5)
lg("", 6)


Decompressing Analysis Dataset
Finding ManifestKey(Accuracy) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Decompressing Key(Accuracy)
Done Decompressing Key(Accuracy)
Finding ManifestKey(PredictionsDF) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Decompressing Key(PredictionsDF)
Done Decompressing Key(PredictionsDF)
Done Decompressing Analysis Dataset
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Found Model(_MD_IRIS_REGRESSOR_e50b12_0) Type(XGBRegressor) Target(SepalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Found Model(_MD_IRIS_REGRESSOR_e50b12_1) Type(XGBRegressor) Target(PetalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Found Model(_MD_IRIS_REGRESSOR_e50b12_2) Type(XGBRegressor) Target(PetalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Found Model(_MD_IRIS_REGRESSOR_e50b12_3) Type(XGBRegressor) Target(SepalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Found Model(_MD_IRIS_REGRESSOR_e50b12_4) Type(XGBRegressor) Target(ResultTargetValue) FeatureNames(4)
Sorting Predictions
Done Decompressing Models(5)
Found DSName(IRIS_REGRESSOR) Analysis Created on Date(2017-01-09 08:01:27) Creating File(/opt/work/data/dst/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Validating Serialization
Checking Model Counts
Decompression Validation Passed - Models(5) == (5)
Done Creating File(/opt/work/data/dst/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Uploading to DSName(IRIS_REGRESSOR) Analysis to S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Uploaded DSName(IRIS_REGRESSOR) S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Done Uploading to S3(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
Done Uploading Model and Analysis DSName(iris_regressor) S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)

Extract and Upload Completed

Automation with Lambda - Coming Soon

Native lambda uploading support will be added in the future. Packaging and functionality still need to be figured out. For now, you can extend the command-line versions of the extractors below.

Command-line Versions

I built this notebook from the extractor examples:

https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/extractors