Learning AWS Batch

What it does: AWS Batch lets you run computing jobs in the cloud. Amazon EC2 instances complete the jobs, and you can then retrieve the results.

Workflow:

  1. Retrieve input data from an S3 bucket.
  2. Submit a job that analyzes the given input data.
  3. Once the job finishes, the job's container stores the output results.
  4. The user then retrieves the results from the cloud (typically from another S3 bucket).
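The steps above can be sketched as a small helper. This is my own sketch, not code from the AWS docs: the bucket name, key, queue, and job definition names are hypothetical placeholders, and `s3`/`batch` are boto3 clients (e.g. `boto3.client("s3")` and `boto3.client("batch")`).

```python
INPUT_BUCKET = "my-input-bucket"  # hypothetical bucket name

def run_analysis(s3, batch, input_key):
    # 1. Retrieve input data from an S3 bucket.
    s3.download_file(INPUT_BUCKET, input_key, "/tmp/input.dat")
    # 2. Submit a job that analyzes the input data.
    job = batch.submit_job(
        jobName="analyze-input",      # hypothetical names; see the
        jobQueue="test_queue",        # queue/definition created below
        jobDefinition="echoMsg",
    )
    # 3./4. The job's container is expected to write its results to an
    # output bucket, which the user later downloads from.
    return job["jobId"]
```

Passing the clients in as arguments (rather than creating them inside the function) makes the helper easy to exercise without touching AWS.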

Setup:

http://docs.aws.amazon.com/batch/latest/userguide/Batch_GetStarted.html
https://boto3.readthedocs.io/en/latest/guide/quickstart.html

  1. Prerequisites: an AWS account, an IAM user, and boto3/awscli installed and configured.
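A quick way to confirm the credentials are configured is the STS `get_caller_identity` call. The helper below is my own (it takes the client as an argument so it is easy to test); in practice you would pass it `boto3.client("sts")`.

```python
def whoami(sts):
    # Returns the 12-digit account ID for the configured credentials;
    # raises if no valid credentials are found.
    return sts.get_caller_identity()["Account"]

# In practice:
#   import boto3
#   print(whoami(boto3.client("sts")))
```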

Simple Example:

In this simple test, I will start a job using AWS Batch. The job will not depend on any input data or other parameters.


In [3]:
import boto3
# low level client that can access AWS batch methods
client = boto3.client("batch")

In [2]:
# create a compute environment
# Compute environments contain the EC2 instances that run batch jobs. You map a compute environment
# to a job queue; the Batch scheduler then places jobs from the queue onto instances in the environment.

response = client.create_compute_environment(
    computeEnvironmentName='Test_Env',
    type='MANAGED',   # MANAGED means AWS provisions the compute resources you specify
    state='ENABLED',  # enable the compute environment
    computeResources={
        'type': 'EC2',  # 'EC2' or 'SPOT'
        'minvCpus': 1,  # minimum number of vCPUs the environment should maintain
        'maxvCpus': 5,  # maximum number of vCPUs the environment can scale to
        'desiredvCpus': 3,
        'instanceTypes': ['c4.large'],   # instance types allowed to run
        'subnets': ['subnet-220c0e0a'],  # VPC subnet(s) to launch instances into
        'securityGroupIds': ['sg-cf5093b2'],  # security group(s) for the instances
        'instanceRole': 'ecsInstanceRole'
    },
    serviceRole='string'  # replace with your AWS Batch service role (name or ARN)
)
print(response)
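A new compute environment takes a little while to provision. A small polling helper (my own sketch, built on the real `describe_compute_environments` call) can wait until the environment's status settles:

```python
import time

def wait_for_environment(batch, name, delay=5):
    # Poll describe_compute_environments until the environment leaves
    # its transitional state; status ends up VALID or INVALID.
    while True:
        resp = batch.describe_compute_environments(computeEnvironments=[name])
        status = resp["computeEnvironments"][0]["status"]
        if status in ("VALID", "INVALID"):
            return status
        time.sleep(delay)
```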

In [4]:
# create a job queue (AWS stores your jobs here until an EC2 instance is available to run them)
response = client.create_job_queue(
    jobQueueName='test_queue',
    state='ENABLED',
    priority=1,
    computeEnvironmentOrder=[
        {
            'order': 1,
            'computeEnvironment': 'Test_Env'
        },
    ]
)

print(response)


{u'jobQueueArn': u'arn:aws:batch:us-east-1:389826612951:job-queue/test_queue', u'jobQueueName': u'test_queue', 'ResponseMetadata': {'HTTPStatusCode': 200, ...}}
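Before submitting jobs, it is worth confirming the queue is actually usable. This helper is my own, built on the real `describe_job_queues` call:

```python
def queue_ready(batch, name):
    # A queue accepts jobs when its state is ENABLED and its status is VALID.
    q = batch.describe_job_queues(jobQueues=[name])["jobQueues"][0]
    return q["state"] == "ENABLED" and q["status"] == "VALID"
```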

In [15]:
# job definition: specifies how jobs are run (required before submitting jobs)
# attributes you can attach: the Docker image to use, a command (can be overridden at submit time),
# environment variables, data volumes, etc.
response = client.register_job_definition(
    type='container',
    containerProperties={
        'command': [
            'echo',
            'Hello World',
        ],
        'image': 'busybox', # docker image with basic UNIX utils
        'memory': 128,
        'vcpus': 1,
    },
    jobDefinitionName='echoMsg',
)

print(response)


{u'jobDefinitionArn': u'arn:aws:batch:us-east-1:389826612951:job-definition/echoMsg:2', u'jobDefinitionName': u'echoMsg', u'revision': 2, 'ResponseMetadata': {'HTTPStatusCode': 200, ...}}

In [6]:
#submit a job
response = client.submit_job(
    jobDefinition='echoMsg',
    jobName='test',
    jobQueue='test_queue',
    containerOverrides={
        'command': ['echo', 'NEURODATA']
    }
)

print(response)


{u'jobName': u'test', u'jobId': u'15bcf7c4-871f-406f-af4e-8f6aa43d1d68', 'ResponseMetadata': {'HTTPStatusCode': 200, ...}}
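`submit_job` returns immediately; the job itself moves through states (SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING) before finishing. A small polling helper (my own sketch, built on the real `describe_jobs` call) can block until it reaches a terminal state:

```python
import time

def wait_for_job(batch, job_id, delay=10):
    # Poll describe_jobs until the job reaches a terminal state.
    while True:
        job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
        if job["status"] in ("SUCCEEDED", "FAILED"):
            return job["status"]
        time.sleep(delay)
```

For example, `wait_for_job(client, response['jobId'])` would block until the echo job above finishes.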

In [13]:
from IPython.display import Image
Image('../../Desktop/proof.png')


Out[13]:
[image: proof.png (screenshot verifying the job ran)]