For the class, Amazon will be providing each of you with $100 of free AWS credits.
First, you must register for an AWS account, during which you will be required to enter your own credit card information. Once you have registered, we will provide you with a $100 credit code. Once that credit is used up, your credit card will be charged for any additional AWS usage, so keep track of your usage.
The following steps will guide you through the registration process:
You can manage your account via the AWS Console.
You can find out more about MRJob here.
To set yourself up to use mrjob on Amazon, after getting your Amazon credits and setting up an AWS account, read the following QuickStart. If you follow the instructions there, you should have set up your access key, optionally set up ssh tunnel access, and written your access keys to the ~/.mrjob.conf file. To keep the configuration in a different file, you could also set
MRJOB_CONF=/home/you/yourpath/fileName.txt
with the appropriate syntax for your shell (bash, csh, zsh, or cmd.exe).
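As a rough sketch of what such a configuration file might contain, the Python snippet below builds a minimal configuration for the EMR runner and renders it as JSON, which mrjob accepts as an alternative to YAML. The option names (aws_access_key_id, aws_secret_access_key, aws_region) are assumed from older mrjob releases, and newer releases rename some of them, so check the documentation for the version you have installed. The helper name build_mrjob_conf and the placeholder key values are hypothetical.

```python
import json


def build_mrjob_conf(access_key_id, secret_access_key, region="us-east-1"):
    """Return the text of a minimal mrjob configuration for the EMR runner."""
    conf = {
        "runners": {
            "emr": {
                # Option names assumed from older mrjob releases;
                # verify them against your installed version.
                "aws_access_key_id": access_key_id,
                "aws_secret_access_key": secret_access_key,
                "aws_region": region,
            }
        }
    }
    return json.dumps(conf, indent=2)


if __name__ == "__main__":
    # Placeholders only: never put real keys in a file you share.
    print(build_mrjob_conf("xxxxxx", "yyyyyy"))
```

The printed text could be saved to ~/.mrjob.conf, or to whichever file MRJOB_CONF points at.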
Use Region us-east-1 when prompted to choose a region. This might sometimes show up as Virginia. It's OK to use another one, but beware that if you use different regions at different times, you might forget to make sure your services are shut down in each of them: you will then incur a cost.
Note: Just a reminder, with these keys ANYONE can send a job to Amazon under your guise (and you will be charged). It should be fairly obvious that you therefore do not want to distribute these keys. If at any time your keys are compromised, you can log into your account, click on "Security Credentials", create a new pair, and deactivate the current pair.
If you decide to use AWS for your final project, a configuration file is preferable, since you will not have to reconfigure for every session. However, you can also use the command line to configure MRJob.
Type the following two commands in your terminal:
where xxxxxx and yyyyyy are your Access Key ID and Secret Access Key, respectively (or use the Windows or csh equivalents).
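Assuming the two commands set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the names AWS tools conventionally read; treat them as an assumption here), a quick Python check like the sketch below can confirm that both are visible to programs started from your terminal. The helper name missing_credentials is hypothetical.

```python
import os

# Assumed variable names; AWS tools conventionally read these two.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        # Deliberately avoid printing the key values themselves.
        print("Both credential variables are set.")
```

Run it from the same terminal session in which you typed the commands, since variables set in one shell are not visible in another.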
By default, a single “small standard on-demand” instance will be used for computation. However, these settings can be modified via any of the previously mentioned configuration methods using the “ec2 instance type” and “num ec2 instances” flags. See here for more details on these flags as well as others.
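To illustrate how those two settings might be passed on the command line, here is a small Python sketch that assembles the arguments for an EMR run. The flag spellings --ec2-instance-type and --num-ec2-instances are assumed from older mrjob releases (newer ones use different names), m1.small is assumed to be the "small standard" type, and the script and input names are hypothetical.

```python
def emr_command(script, input_path, instance_type="m1.small", num_instances=1):
    """Build the argument list for running an mrjob script on EMR."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return [
        "python", script,
        "-r", "emr",  # select the EMR runner
        # Flag names assumed from older mrjob releases.
        "--ec2-instance-type", instance_type,
        "--num-ec2-instances", str(num_instances),
        input_path,
    ]


if __name__ == "__main__":
    # my_job.py and input.txt are placeholder names.
    print(" ".join(emr_command("my_job.py", "input.txt", num_instances=2)))
```

Keeping the instance count explicit in one place makes it harder to launch a larger, more expensive cluster by accident.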
At this point it is a good idea to try running the scripts at the mrjob quickstart. Note that EMR is billed by the hour, so run as many tests as you can (or as much of your code as you can) in batches of 1 hour, so you can have more credits left over for your own future use.
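The effect of hourly billing can be made concrete with a little arithmetic. The sketch below assumes that each run launches its own cluster and that its duration is rounded up to whole hours per instance; the hourly rate of $0.10 is a made-up figure for illustration, not an actual AWS price.

```python
import math


def billed_cost(run_minutes, num_instances=1, hourly_rate=0.10):
    """Estimate the cost of one run when usage is rounded up to whole hours."""
    if run_minutes <= 0:
        return 0.0
    billed_hours = math.ceil(run_minutes / 60)
    return billed_hours * num_instances * hourly_rate


if __name__ == "__main__":
    # Six separate 5-minute runs, each billed as a full hour...
    separate = sum(billed_cost(5) for _ in range(6))
    # ...versus the same 30 minutes of work done in a single run.
    batched = billed_cost(30)
    print(f"separate runs: ${separate:.2f}, one batch: ${batched:.2f}")
```

Under these assumptions a 5-minute run costs as much as a 60-minute one, which is why batching tests into a single hour stretches the credit further.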
Important: Always make sure that your code is bug-free before submitting it to Amazon. Try to run the job locally first and see if it produces the desired result. If it does, you are ready to proceed to the cloud. The homework problems are small, and your free credit should give you plenty of room for running and testing on Amazon. However, it is your responsibility to make sure the jobs terminate properly and do not cause excessive costs. You can always monitor your currently running jobs using this overview at region US-EAST-1 of your MapReduce job flows.
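One inexpensive way to check the logic before any cluster is involved is to exercise the map and reduce steps as plain Python functions on a tiny sample. The sketch below does this for a word count; it is a stand-in written for illustration, not mrjob's own local runner, and the names mapper, reducer, and run_locally are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Sum the counts collected for one word."""
    yield word, sum(counts)


def run_locally(lines):
    """Simulate map, group-by-key, and reduce in memory on a small input."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    sample = ["the quick brown fox", "The lazy dog"]
    print(run_locally(sample))
```

Once the functions behave as expected on a small sample, the same logic can be moved into an mrjob job and run on your own machine (without the EMR runner) before it is sent to Amazon.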