Importing the IRIS Models from S3 and Caching Them

This notebook demonstrates how to import a machine learning model file from S3 and store the Models + Analysis in the Redis cache named "CACHE".

Overview

Download the IRIS Model archive from the configured S3 Bucket + Key, then decompress and extract the pickled historical analysis and previously built models. The examples use the IRIS sample dataset and require a valid S3 Bucket storing the models; make sure you are comfortable paying the S3 download costs for these objects (https://aws.amazon.com/s3/pricing/).

Once uploaded to the S3 Bucket, you should be able to confirm the files have a similar disk size to the ones shown in the import logs below.

After importing, the Models and Analysis are available to any other Sci-pype instance with connectivity to the same Redis cache.
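If you want to inspect one of these archives locally before importing it, the .cache.pickle.zlib extension suggests a zlib-compressed pickle. Here is a minimal sketch under that assumption (the path is the download location used later in this notebook; verify the format against your own build):

import pickle
import zlib

# Assumption: the archive is a zlib-compressed pickle, matching the
# .cache.pickle.zlib extension
archive_path = "/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib"

with open(archive_path, "rb") as f:
    manifest = pickle.loads(zlib.decompress(f.read()))

# Print what was stored without importing anything into Redis
print(type(manifest))
if hasattr(manifest, "keys"):
    print(sorted(manifest.keys()))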


1) Set Up the Environment for Importing the IRIS Classifier Model File


In [1]:
# Setup the Sci-pype environment
import sys, os

# Only redis is needed for this notebook:
os.environ["ENV_DEPLOYMENT_TYPE"] = "JustRedis"

# Load the Sci-pype PyCore as a named-object called "core" and environment variables
from src.common.load_ipython_env import *

2) Set Up the Request

Import the Models from S3 and store the extracted Models + Analysis in the Cache.

Please make sure the environment variables are set correctly and the S3 Bucket exists:

   ENV_AWS_KEY=<AWS API Key>
   ENV_AWS_SECRET=<AWS API Secret>

For Docker containers, make sure to set these keys in the correct Jupyter env file and restart the container:

   <repo base dir>/justredis/jupyter.env
   <repo base dir>/local/jupyter.env
   <repo base dir>/test/jupyter.env
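Before starting the notebook, you can sanity-check the credentials and the bucket from a Python shell. A minimal sketch, assuming boto is installed and using the placeholder bucket name from this notebook:

import os
import boto

# Read the same credentials the notebook expects
aws_key    = os.getenv("ENV_AWS_KEY")
aws_secret = os.getenv("ENV_AWS_SECRET")

# lookup() returns None if the bucket does not exist or the
# credentials cannot access it
conn   = boto.connect_s3(aws_key, aws_secret)
bucket = conn.lookup("unique-bucket-name-for-datasets")
if bucket is None:
    print("ERROR: bucket is missing or the credentials cannot access it")
else:
    print("Bucket is reachable: " + bucket.name)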
  • What's the dataset name?

In [2]:
ds_name     = "iris_classifier"
  • Which local directory stores the downloaded files?

In [3]:
data_dir    = str(os.getenv("ENV_DATA_SRC_DIR", "/opt/work/data/src"))
if not os.path.exists(data_dir):
    # Python 2 octal literal - creates the directory with wide-open permissions
    os.mkdir(data_dir, 0777)
  • What's the S3 Location (Unique Bucket Name + Key)?

In [4]:
s3_bucket   = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key      = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc      = str(s3_bucket) + ":" + str(s3_key)
  • What's the full local path for the downloaded Model File?

In [5]:
ml_file     = data_dir + "/" + str(s3_key)
  • Check if the Model File needs to be Downloaded

In [6]:
lg("-------------------------------------------------", 6)
lg("Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(" + str(s3_loc) + ") File(" + str(ml_file) + ")", 6)
lg("", 6)

if not os.path.exists(ml_file):

    lg("Downloading ModelFile S3Loc(" + str(s3_loc) + ")", 6)
    download_results    = core.s3_download_and_store_file(s3_loc, ml_file, core.get_rds(), core.get_dbs(), debug)

    if download_results["Status"] != "SUCCESS":
        lg("ERROR: Stopping processing for errror: " + str(download_results["Error"]), 0)
    else:
        lg("", 6)
        lg("Done Downloading ModelFile S3Loc(" + str(s3_loc) + ") File(" + str(download_results["Record"]["File"]) + ")", 5)
        ml_file         = download_results["Record"]["File"]
else:
    lg("", 6)
    lg("Continuing with the existing file.", 5)
    lg("", 6)
# end of downloading from s3 if it's not locally available


-------------------------------------------------
Importing Models and Analysis from S3 into the Redis Cache(CACHE) - S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)

Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)

Done Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
  • Start the Importer to Download + Cache the Models from S3 (or from the local file if it already exists)

In [7]:
ra_name        = "CACHE"

lg("Importing(" + str(ml_file) + ") Models and Analysis into Redis(" + str(ra_name) + ")", 6)

cache_req      = {
                    "RAName"    : ra_name,
                    "DSName"    : str(ds_name),
                    "TrackingID": "",
                    "ModelFile" : ml_file,
                    "S3Loc"     : s3_loc
               }

upload_results = core.ml_load_model_file_into_cache(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("", 6)
    lg("Done Loading Model File for DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 5)
    lg("", 6)
    lg("Importing and Caching Completed", 5)
    lg("", 6)
else:
    lg("", 6)
    lg("ERROR: Failed Loading Model File(" + str(cache_req["ModelFile"]) + ") into Cache for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
# end of if success


Importing(/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib) Models and Analysis into Redis(CACHE)
Loading DSName(IRIS_CLASSIFIER) ModelFile(/opt/work/data/src/dataset_IRIS_CLASSIFIER.cache.pickle.zlib)
/opt/conda/envs/python2/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Compressing Models(5)
Compressing Model(0/5) Type(XGBRegressor)
Done Compressing(0/5) Type(XGBRegressor) Size(347522) Decompressed(1081031)
Caching Model(0) ID(_MD_IRIS_REGRESSOR_e50b12_0) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Done Caching Model(0) ID(_MD_IRIS_REGRESSOR_e50b12_0) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(1/5) Type(XGBRegressor)
Done Compressing(1/5) Type(XGBRegressor) Size(374016) Decompressed(1136927)
Caching Model(1) ID(_MD_IRIS_REGRESSOR_e50b12_1) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Done Caching Model(1) ID(_MD_IRIS_REGRESSOR_e50b12_1) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(2/5) Type(XGBRegressor)
Done Compressing(2/5) Type(XGBRegressor) Size(195314) Decompressed(650161)
Caching Model(2) ID(_MD_IRIS_REGRESSOR_e50b12_2) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Done Caching Model(2) ID(_MD_IRIS_REGRESSOR_e50b12_2) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(3/5) Type(XGBRegressor)
Done Compressing(3/5) Type(XGBRegressor) Size(410953) Decompressed(1240355)
Caching Model(3) ID(_MD_IRIS_REGRESSOR_e50b12_3) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Done Caching Model(3) ID(_MD_IRIS_REGRESSOR_e50b12_3) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(4/5) Type(XGBRegressor)
Done Compressing(4/5) Type(XGBRegressor) Size(59797) Decompressed(301562)
Caching Model(4) ID(_MD_IRIS_REGRESSOR_e50b12_4) RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Done Caching Model(4) ID(_MD_IRIS_REGRESSOR_e50b12_4) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Caching AccuracyResults RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Done Caching AccuracyResults
Caching PredictionsDF RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Done Caching PredictionsDF
Decompressing Analysis Dataset
Finding ManifestKey(Accuracy) Records in RLoc(CACHE:_MD_IRIS_CLASSIFIER_Accuracy)
Decompressing Key(Accuracy)
Done Decompressing Key(Accuracy)
Finding ManifestKey(PredictionsDF) Records in RLoc(CACHE:_MD_IRIS_CLASSIFIER_PredictionsDF)
Decompressing Key(PredictionsDF)
Done Decompressing Key(PredictionsDF)
Done Decompressing Analysis Dataset
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_SepalLength)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_0) Type(XGBClassifier) Target(SepalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_PetalLength)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_1) Type(XGBClassifier) Target(PetalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_PetalWidth)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_2) Type(XGBClassifier) Target(PetalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_SepalWidth)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_3) Type(XGBClassifier) Target(SepalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_CLASSIFIER_ResultTargetValue)
Found Model(_MD_IRIS_CLASSIFIER_bf9214_4) Type(XGBClassifier) Target(ResultTargetValue) FeatureNames(4)
Sorting Predictions
Done Decompressing Models(5)

Done Loading Model File for DSName(iris_classifier) S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_CLASSIFIER.cache.pickle.zlib)

Importing and Caching Completed
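As a quick spot check, you can list the keys the importer just cached using redis-py. This is a sketch assuming the CACHE application is reachable on localhost:6379; adjust the host and port to match your Redis configuration:

import redis

# Assumed endpoint for the "CACHE" Redis application - use the
# values from your own configuration
rc = redis.Redis(host="localhost", port=6379, db=0)

# List the cached model and analysis keys for the classifier dataset
for key in sorted(rc.keys("_MD_IRIS_CLASSIFIER_*")):
    print(key)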

3) Set Up the Request to Import and Cache the IRIS Regressor Models and Analysis


In [8]:
ds_name        = "iris_regressor"

4) Import the IRIS Regressor Models from S3 and Store Them in Redis


In [9]:
s3_bucket      = "unique-bucket-name-for-datasets" # name this something under your AWS Account (This might be open to the public in the future...stay tuned)
s3_key         = "dataset_" + core.to_upper(ds_name) + ".cache.pickle.zlib"
s3_loc         = str(s3_bucket) + ":" + str(s3_key)
ra_name        = "CACHE"
ml_file        = data_dir + "/" + str(s3_key)

lg("-------------------------------------------------", 6)
lg("Importing Models and Analysis from S3 into Caching Models from CACHE - S3Loc(" + str(s3_loc) + ") File(" + str(ml_file) + ")", 6)
lg("", 6)

if not os.path.exists(ml_file):

    lg("Downloading ModelFile S3Loc(" + str(s3_loc) + ")", 6)
    download_results    = core.s3_download_and_store_file(s3_loc, ml_file, core.get_rds(), core.get_dbs(), debug)

    if download_results["Status"] != "SUCCESS":
        lg("ERROR: Stopping processing for errror: " + str(download_results["Error"]), 0)
    else:
        lg("", 6)
        lg("Done Downloading ModelFile S3Loc(" + str(s3_loc) + ") File(" + str(download_results["Record"]["File"]) + ")", 5)
        ml_file         = download_results["Record"]["File"]
else:
    lg("", 6)
    lg("Continuing with the existing file.", 5)
    lg("", 6)
# end of downloading from s3 if it's not locally available

lg("Importing(" + str(ml_file) + ") Models and Analysis into Redis(" + str(ra_name) + ")", 6)

cache_req      = {
                    "RAName"       : ra_name,
                    "DSName"       : str(ds_name),
                    "TrackingID"   : "",
                    "ModelFile"    : ml_file,
                    "S3Loc"        : s3_loc
               }

upload_results = core.ml_load_model_file_into_cache(cache_req, core.get_rds(), core.get_dbs(), debug)
if upload_results["Status"] == "SUCCESS":
    lg("", 6)
    lg("Done Loading Model File for DSName(" + str(ds_name) + ") S3Loc(" + str(cache_req["S3Loc"]) + ")", 5)

    lg("", 6)
    lg("Importing and Caching Completed", 5)
    lg("", 6)
else:
    lg("", 6)
    lg("ERROR: Failed Loading Model File(" + str(cache_req["ModelFile"]) + ") into Cache for DSName(" + str(ds_name) + ")", 6)
    lg(upload_results["Error"], 6)
    lg("", 6)
# end of if success


-------------------------------------------------
Importing Models and Analysis from S3 into the Redis Cache(CACHE) - S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Done Downloading ModelFile S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib) File(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)
Importing(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib) Models and Analysis into Redis(CACHE)
Loading DSName(IRIS_REGRESSOR) ModelFile(/opt/work/data/src/dataset_IRIS_REGRESSOR.cache.pickle.zlib)
Compressing Models(5)
Compressing Model(0/5) Type(XGBRegressor)
Done Compressing(0/5) Type(XGBRegressor) Size(347522) Decompressed(1081031)
Caching Model(0) ID(_MD_IRIS_REGRESSOR_8c4c5b_0) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Done Caching Model(0) ID(_MD_IRIS_REGRESSOR_8c4c5b_0) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(1/5) Type(XGBRegressor)
Done Compressing(1/5) Type(XGBRegressor) Size(374016) Decompressed(1136927)
Caching Model(1) ID(_MD_IRIS_REGRESSOR_8c4c5b_1) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Done Caching Model(1) ID(_MD_IRIS_REGRESSOR_8c4c5b_1) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(2/5) Type(XGBRegressor)
Done Compressing(2/5) Type(XGBRegressor) Size(195314) Decompressed(650161)
Caching Model(2) ID(_MD_IRIS_REGRESSOR_8c4c5b_2) RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Done Caching Model(2) ID(_MD_IRIS_REGRESSOR_8c4c5b_2) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(3/5) Type(XGBRegressor)
Done Compressing(3/5) Type(XGBRegressor) Size(410953) Decompressed(1240355)
Caching Model(3) ID(_MD_IRIS_REGRESSOR_8c4c5b_3) RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Done Caching Model(3) ID(_MD_IRIS_REGRESSOR_8c4c5b_3) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Compressing Model(4/5) Type(XGBRegressor)
Done Compressing(4/5) Type(XGBRegressor) Size(59797) Decompressed(301562)
Caching Model(4) ID(_MD_IRIS_REGRESSOR_8c4c5b_4) RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Done Caching Model(4) ID(_MD_IRIS_REGRESSOR_8c4c5b_4) RLoc(CACHE:_MODELS_IRIS_REGRESSOR_LATEST)
Caching AccuracyResults RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Done Caching AccuracyResults
Caching PredictionsDF RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Done Caching PredictionsDF
Decompressing Analysis Dataset
Finding ManifestKey(Accuracy) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_Accuracy)
Decompressing Key(Accuracy)
Done Decompressing Key(Accuracy)
Finding ManifestKey(PredictionsDF) Records in RLoc(CACHE:_MD_IRIS_REGRESSOR_PredictionsDF)
Decompressing Key(PredictionsDF)
Done Decompressing Key(PredictionsDF)
Done Decompressing Analysis Dataset
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalLength)
Found Model(_MD_IRIS_REGRESSOR_8c4c5b_0) Type(XGBRegressor) Target(SepalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalLength)
Found Model(_MD_IRIS_REGRESSOR_8c4c5b_1) Type(XGBRegressor) Target(PetalLength) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_PetalWidth)
Found Model(_MD_IRIS_REGRESSOR_8c4c5b_2) Type(XGBRegressor) Target(PetalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_SepalWidth)
Found Model(_MD_IRIS_REGRESSOR_8c4c5b_3) Type(XGBRegressor) Target(SepalWidth) FeatureNames(4)
Getting Single Model
Getting Model RLoc(CACHE:_MD_IRIS_REGRESSOR_ResultTargetValue)
Found Model(_MD_IRIS_REGRESSOR_8c4c5b_4) Type(XGBRegressor) Target(ResultTargetValue) FeatureNames(4)
Sorting Predictions
Done Decompressing Models(5)

Done Loading Model File for DSName(iris_regressor) S3Loc(unique-bucket-name-for-datasets:dataset_IRIS_REGRESSOR.cache.pickle.zlib)

Importing and Caching Completed
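The log lines above show each model being compressed before caching. If you want to pull one back out by hand instead of going through the Sci-pype helpers, here is a sketch under the assumption that each RLoc key holds a zlib-compressed pickle (verify this against your Sci-pype version before relying on it):

import pickle
import zlib

import redis

# Assumed endpoint for the "CACHE" Redis application
rc = redis.Redis(host="localhost", port=6379, db=0)

raw = rc.get("_MD_IRIS_REGRESSOR_SepalLength")
if raw is None:
    print("Model key not found - did the import succeed?")
else:
    # Assumption: zlib-compressed pickle, matching the
    # Compressing/Decompressing log lines above
    record = pickle.loads(zlib.decompress(raw))
    print(type(record))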

Automation with Lambda - Coming Soon

Native Lambda uploading support will be added in the future. Packaging and functionality still need to be figured out. For now, you can extend the command-line versions of the importers below.
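If you want to experiment before native support lands, one approach is a thin handler that shells out to a command-line importer. A hypothetical sketch (the script path and name are assumptions - check the importer directory linked below for the real entry points):

import subprocess

def lambda_handler(event, context):
    # Hypothetical importer script path - substitute one of the
    # real scripts from bins/ml/importers
    cmd       = ["python", "/opt/work/bins/ml/importers/import_iris_classifier.py"]
    exit_code = subprocess.call(cmd)
    return {
        "Status"   : "SUCCESS" if exit_code == 0 else "ERROR",
        "ExitCode" : exit_code
    }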

Command-line Versions

I built this notebook from the importer examples:

https://github.com/jay-johnson/sci-pype/tree/master/bins/ml/importers