Instructions for Amazon Setup

Getting an Amazon account and your credits:

For the class, Amazon will be providing each of you with $100 of free AWS credits.

First, you must register for an AWS account, during which you will be required to enter your own personal credit card information. Once you have registered, we will provide you with a $100 credit code. Please understand that once the credit is used up, your credit card will be charged for any additional AWS usage, so keep track of your usage.

The following steps will guide you through the registration process:

  1. Sign up for AWS using either your personal Amazon account or by creating a new AWS account.
  2. After signing up for AWS, sign up for EC2, which will include registration for Elastic MapReduce and other similar services. Some of these other services may carry a cost if you decide to use them for your own personal use.
  3. Wait for an email from us with your AWS credit code.
  4. Log in to your AWS Account page. Click Payment Method. At the bottom of the page, click Redeem/View AWS Credits. Then, enter your code and click Redeem.
  5. As mentioned in class, you may want to set up a billing alert using this link.

You can manage your account via the AWS Console.

Get set up to run mrjob on EMR

You can find out more about mrjob here.

To set yourself up to use mrjob on Amazon, after getting your Amazon credits and setting up an AWS account, read the following QuickStart. If you follow the instructions in there, you should have set up your access key, optionally set up ssh tunnel access, and written your access keys to the ~/.mrjob.conf file. You could also set

MRJOB_CONF=/home/you/yourpath/fileName.txt

with the appropriate syntax in bash/csh/zsh/command.exe.
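To check which configuration file would be picked up, a minimal sketch along these lines may help. It assumes the lookup order is the MRJOB_CONF environment variable first and ~/.mrjob.conf second; the helper name find_mrjob_conf is hypothetical and only illustrates the idea, it is not a call into mrjob itself.

```python
import os


def find_mrjob_conf(environ=None, home=None):
    """Return the first existing config path, or None (hypothetical helper).

    Assumed lookup order: $MRJOB_CONF, then ~/.mrjob.conf.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    if environ.get("MRJOB_CONF"):
        candidates.append(environ["MRJOB_CONF"])
    candidates.append(os.path.join(home, ".mrjob.conf"))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_mrjob_conf() or "no mrjob configuration file found")
```

If this prints no path, either create ~/.mrjob.conf as described in the QuickStart or point MRJOB_CONF at your file.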

Use region us-east-1 when prompted to choose a region. This sometimes shows up as Virginia. It's OK to use another one, but beware that if you use different regions at different times, you might forget to make sure your services are shut down in each of them: you will then incur a cost.

Note: Just a reminder, with these keys ANYONE can send a job to Amazon in your name (and you will be charged). You therefore do not want to distribute these keys. If at any time your keys are compromised, you can log into your account, click on "Security Credentials", create a new pair, and deactivate the current pair.

If you decide to use AWS for your final project, a configuration file is preferable to avoid the repetition of reconfiguration. However, you can also use the command line to configure MRJob.

Type the following two commands in your terminal:

  • export AWS_ACCESS_KEY_ID=xxxxxx
  • export AWS_SECRET_ACCESS_KEY=yyyyyy

where xxxxxx and yyyyyy are your Access Key ID and Secret Access Key, respectively (or use the Windows or csh equivalents).
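Before launching a job, it can be worth confirming that both variables are actually visible to Python. The following is a minimal sketch using only the standard library; the function name missing_credentials is hypothetical, and the script deliberately reports only the variable names, never the key values.

```python
import os

# The two environment variables set by the export commands above.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("Both AWS credential variables are set.")
```

Run it in the same terminal session in which you typed the export commands, since exported variables are only visible to processes started from that shell.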

By default, a single “small standard on-demand” instance will be used for computation. However, these settings can be modified via any of the previously mentioned configuration methods using the “ec2 instance type” and “num ec2 instances” flags. See here for more details on these flags as well as others.

Testing:

At this point it is a good idea to try running the scripts in the mrjob quickstart. Note that EMR is billed by the hour, so run as many tests as you can (or as much of your code as you can) in batches of 1 hour, so you can have more credits left over for your own future use.
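To see why batching matters, here is a rough sketch of hourly billing arithmetic. It assumes each run is rounded up to a whole hour per instance; the function names and the hourly rate are hypothetical placeholders, so check the current AWS pricing pages for real numbers.

```python
import math


def billed_instance_hours(runtime_minutes, num_instances=1):
    """Instance-hours charged, assuming each run rounds up to a full hour."""
    if runtime_minutes <= 0:
        return 0
    return math.ceil(runtime_minutes / 60) * num_instances


def estimated_cost(runtime_minutes, num_instances, hourly_rate_usd):
    """Estimated charge in USD; hourly_rate_usd is a placeholder you supply."""
    return billed_instance_hours(runtime_minutes, num_instances) * hourly_rate_usd


if __name__ == "__main__":
    # Six separate 10-minute runs on one instance, each billed on its own.
    separate = sum(billed_instance_hours(10) for _ in range(6))
    # The same 60 minutes of work done within a single billed hour.
    batched = billed_instance_hours(60)
    print("separate runs:", separate, "instance-hours")
    print("one batch:", batched, "instance-hours")
```

Under this assumption, six short runs started separately cost six instance-hours, while the same work packed into one hour costs one.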

Important: Please always make sure that your code is bug-free before actually submitting it to Amazon. Try to run the job locally first and see if it produces the desired result. Then, if this works, you are ready to proceed to the cloud. The homework problems are small and your free credit should provide you with a lot of room for running and testing on Amazon. However, it is your responsibility to make sure the jobs terminate properly and do not cause excessive costs. You can always monitor your currently running jobs using this overview at region US-EAST-1 of your MapReduce job flows.