Week 1

Aims:

  • To get all the bits and pieces set up
  • Enter a kaggle competition

Steps

0. Install Anaconda

(Contains jupyter, where we type notes, code and display things)

1. Install Git

(to get project files for the course)

2. Set up Amazon Web Services - AWS

(We will use their big computers)

3. If using windows, set up Cygwin

(allows windows computer to interact with a unix computer such as AWS)

4. Install AWS command line tools

(so you can command AWS from your computer)

5. Set up and configure AWS

(tell amazon what we want)

6. Log in to your AWS instance

(first-time login)

7. Daily AWS use

(bare essentials to get into your instance)

8. Get the course files onto the EC2 instance

(Our files won't be stored on our personal computers)

9. Get the data!

(Set up Kaggle)

10. Check out the code for our basic deep network

(Poke around and run the code)

11. Rapid fire

(Get practical, submit a squid or even an old scab, perfect it later)

0. Install Anaconda

Explanation:

  • Data science super-program
  • Contains jupyter which is a notebook where you can make notes, code and display things
    • Open notebook using terminal
      jupyter notebook
  • Can also use it to install many other programs in the future
    • To install a particular program
      conda install 'program name'
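    • For example, to install a particular package (pandas here is just an arbitrary example name)
      conda install pandas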

Steps:

  1. https://www.continuum.io/downloads
  2. Install python 2.7 using the graphical installer
  3. Open a notebook to start working
    jupyter notebook

Different versions of python

If you have two versions of python installed (e.g. from two anaconda installs), you can reorder the folders that the computer searches for python in, to pick and choose which one runs by default. To check which python you're currently getting:

python --version

To view the order that folders are searched for

  1. Go to your 'environment'
    env
  2. look for the part that says "PATH=/blah/blah:/gah:/wah/ma:/trash"
  3. This means it looks for python first in /blah/blah then /gah then /wah/ma then /trash
  4. To change the order you can go have a look what's in the bash profile
    vim ~/.bash_profile
  5. vim is a text editor (an improved version of vi, the original terminal text editor). It uses fancy commands to be efficient
  6. The bash profile is a thing that adds bits to your path when a new terminal is opened
  7. Play around with the order of the things in the bash profile using vim commands
  8. e.g. yy copies (yanks) a line, p pastes it below the cursor, dd deletes (cuts) a line, :wq writes and quits
  9. Then once you're back out of there, execute the file to apply the changes
    source ~/.bash_profile
  10. Then check what happened to your path in environment
    env
  11. If you mucked it up, copy the bit of the path that you want to keep e.g. only the folders /blah/blah and /gah, not /wah/ma or /trash
  12. then type
    export PATH=/blah/blah:/gah
  13. check your path again
    env
  14. then have a look at the bash profile
    vim ~/.bash_profile
  15. If you like what it says e.g. the order of the exports in there, write and quit
    :wq
  16. then execute the file
    source ~/.bash_profile
  17. check path
    env
  18. if it's all good then exit terminal window and reopen to make sure it's permanent
    env
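
A concrete sketch of what the PATH line in ~/.bash_profile might look like (the folder paths here are made up - yours will differ); the folder listed first gets searched first:

    # hypothetical example: anaconda's folder comes before the system folders,
    # so its python wins
    export PATH=/Users/you/anaconda2/bin:/usr/local/bin:/usr/bin:$PATH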

TIP: if you ever want to interrupt some process that the terminal is running -> ctrl+c


1. Install Git and make a folder for cheat-sheets

Explanation:

  • Git (as in get) gets things
  • It's the way most people download projects and collaborate on things.
    • It allows people to work on the same project at the same time and basically keeps track of what changes have been made and then makes sure there are no clashes.
  • We will use it to 'clone' the project files from the web to our computer
  • E.g.
    git clone www.projectlocation.com
  • Because we're going to be using different programs and we just want the basic bits from each, it'll be useful to have a folder with cheat-sheets for each of the different programs

How to get git: Summary: https://help.github.com/articles/set-up-git/

  1. Download here: https://git-scm.com/downloads
  2. If your computer (Mac) asks you to install the Xcode developer tools, say OK
    • If git is already installed, you can fetch its latest source (building it yourself is a separate step) with
      git clone https://github.com/git/git

Make a cheat-sheet dump!

  • Every time you install a new tool, google 'tool name cheat sheet pdf' and lump it into a folder with all your other cheat sheets
  • We'll use WWW Get (wget) to grab things using their URL
  1. Install wget and tree. Windows skip to step 2 (we will make sure they are installed with cygwin)
    1. We'll use homebrew (another package manager - like a non-science version of anaconda) to install wget.
      /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    2. use homebrew to install wget
      brew install wget
    3. use homebrew to also install tree
      brew install tree
    4. If you want tree on cygwin, you can re-run the cygwin installer and make sure 'tree' is included along with 'wget'
  2. Make a folder where you want to keep computer and data science project notes
    • My setup:
      1. Went to my main directory
        cd
      2. Made a folder/directory called 'proj'. When naming things, spaces make it slow and difficult in terminal because spaces are used to separate commands, so use an underscore (_) instead of a space
        mkdir proj
      3. Went into 'proj' and made a folder for deep learning, 'dl'
        cd proj
        mkdir dl
      4. Went into 'dl'
        cd dl
      5. Cheat-sheet folder
        mkdir cheat_sheets
      6. Go there
        cd cheat_sheets
  3. Go to a cheat sheet's website in your browser and copy the url (you can also right click a link and copy the link address)
  4. Use wget to download the file into your cheat-sheets folder
    wget url
    • bash sheet
      wget -O bash_sheet.pdf http://www.lsv.ens-cachan.fr/~fthire/teaching/2016-2017/programmation-1/cheatsheet/shell.pdf
    • shorter bash sheet
      wget -O bash_quick_sheet.pdf http://sites.tufts.edu/cbi/files/2013/01/linux_cheat_sheet.pdf
    • conda sheet
      wget -O conda_cheat.pdf https://conda.io/docs/_downloads/conda-cheatsheet.pdf
    • git sheet
      wget -O git_sheet.pdf https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf
    • jupyter sheet
      wget -O jupyter_sheet.pdf https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/pdf/
    • python sheet
      wget -O python_sheet.pdf https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf
    • tmux sheet
      wget -O tmux_sheet.pdf http://alvinalexander.com/downloads/linux/tmux-cheat-sheet.pdf
    • aws special commands (alias commands) they made for the course
      wget -O aws_alias_sheet.html http://wiki.fast.ai/index.php/Aws-alias
    • kaggle sheet
      wget -O kaggle_sheet.html https://github.com/floydwch/kaggle-cli
    • tree (viewing file structure)
      wget -O tree_sheet.html http://mama.indstate.edu/users/ice/tree/tree.1.html
  5. To open a file from the command line
    open blah_blah.pdf

A tip: be careful to always know what folder you are in (pwd) when removing files (rm 'file') because you might accidentally delete something important


2. Set up Amazon Web Services (AWS)

Explanation:

  • Deep learning involves performing multiple simple operations
  • Running calculations on your computer's CPU (central processing unit) is possible, but would take aaaages for all but the simplest of problems
  • Deep learning has been revolutionised by sending computations to GPUs (Graphics Processing Units) which can do many simple computations simultaneously
  • Sending these computations to the GPU currently requires a GPU from NVIDIA (the deep learning libraries use NVIDIA's CUDA platform)
  • Using amazon's NVIDIA GPUs is cheaper and easier than buying your own and setting it up

Video: https://www.youtube.com/watch?v=8rjRfW4JM2I

Notes: http://wiki.fast.ai/index.php/AWS_install

There will be two types of AWS instance:

  • T2: shitty and free (for tinkering, testing and prototyping) (setup_t2.sh)
  • P2: beefy and $0.90 per hour (for training the models) (setup_p2.sh)

Install the free version (T2) of AWS

  1. Set up AWS account
  2. Get permission to use the P2 version of AWS
    • aws.amazon.com/contact-us/ec2-request
    • limit type EC2 instances
    • region: 'US west (oregon)'
    • primary instance type select 'p2.xlarge'
    • limit 'instance limit'
    • new limit value '1'
    • use case description 'fast.ai MOOC'
    • 'submit'
  3. Download folder from the course containing AWS setup files
    1. Open terminal (mac) or command line (windows)
    2. Use change directory (cd) to navigate to where you want the folder to go
      • list files and folders in current folder
        ls
      • go to a folder
        cd 'folder name'
      • go back up a folder level
        cd ..
      • find which folder you are currently in
        pwd
    3. Get the folder with all the AWS files they made for the course
      git clone https://github.com/fastai/courses
      • This copies their folder to the folder you are currently in

3. If using windows, install Cygwin

Explanation:

  • Unix is a family of operating systems, including linux, which is what the AWS computers use
  • To speak to a unix computer, you type into a window called bash
  • Macs have a bash window already (the Terminal), but Windows doesn't
  • Cygwin is a bash window for Windows; we'll use it to talk to our AWS computer

Download and install:

  1. Get the installer from https://www.cygwin.com/ and run it
  2. In the package-selection step, include 'wget' (and 'tree') as mentioned in the cheat-sheet section above

4. Install AWS command line tools

Explanation:

  • AWS instances are computers that are set up to receive specific instructions about how to run
  • We will control them from our computer, e.g. start, stop
  • We need to install the tools on our computer to do that

Steps:

  1. Make sure pip is up to date
    • pip ('Pip Installs Packages') is a tool we use to get and install other tools/packages from the internet
      pip install --upgrade pip
  2. Install the AWS command line interface
    pip install awscli
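
A quick sanity check that the install worked - ask the CLI to report its version:

    aws --version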

5. Set up and configure AWS

Explanation:

  • Amazon remembers what computer setup you have and keeps track of things by giving you a key
  • Need to make sure we use our keys when we access AWS
  • Steps 1-3 below are only for first-time setup. For day-to-day use, skip ahead to section 7 (Daily AWS use)
  • Names:
    • AWS: amazon web service (amazon allows access to their computers)
    • EC2: elastic compute cloud (amazon platform where you can get different computer setups)
    • AMI: amazon machine image (a specific setup that contains settings and folders. The course creators set up our AMI with anaconda and some course notes already in it. So handy. It is a virtual machine - it runs on a big server but you treat it like a desktop computer. Elastic because you can change which hardware you run it on - GPUs, CPUs, different sizes and capabilities)
    • Instance: A setup of an AMI on a particular type of hardware (our course AMI on a P2 is an instance, same for when it's on a T2)

I think Oregon is the best region for Australia at the moment. It's ages away, but I don't think the course AMI works on the Amazon computers located in Sydney, which would be much quicker for us to use.

Steps:

  1. Log in at https://aws.amazon.com/
  2. Create a 'user'
    1. 'Services' tab
    2. 'Security, Identity and Compliance' heading
    3. 'IAM' link
    4. 'Users' tab on left
    5. 'Add user' blue button
    6. Enter your name
    7. Tick 'programmatic access' and 'AWS Management Console Access'
    8. Make up password
    9. Uncheck 'require password reset'
    10. 'Next'
    11. Click 'attach existing policies directly'
    12. Choose 'AdministratorAccess'
    13. 'Next: review'
    14. 'Create user'
    15. Save the access key ID and secret access key to a document for later
  3. Configure AWS
    1. Configure (example prompts are shown after these steps)
      aws configure
      • Enter access key ID (copy+paste)
      • Enter secret access key (copy+paste)
      • Enter region:
        us-west-2
      • Default output format
        text
    2. change to the directory where we cloned the course folder, enter the 'courses' folder
    3. change to 'setup' folder
    4. execute the setup file using bash (if you have the p2 approved, use that, otherwise use t2 for now)
      • If P2 approved
        bash setup_p2.sh
      • Otherwise
        bash setup_t2.sh
    5. Wait for the thing to finish
    6. Copy and paste the details it spits out to a word document and save for later
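
For reference, the prompts in step 3.1 look roughly like this (the two key values here are placeholders - paste in your own):

    $ aws configure
    AWS Access Key ID [None]: AKIAXXXXXXXXEXAMPLE
    AWS Secret Access Key [None]: xXxXxXxXxXxXxXxXxXxXEXAMPLEKEY
    Default region name [None]: us-west-2
    Default output format [None]: text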

6. Log in to your AWS instance for the first time

Explanation:

  • Now you've run the scripts that set the instance up, you can SSH in (log in remotely from any computer)
  • Secure Shell (SSH) works by you specifying the IP address of the computer you want to log into remotely from your bash shell (terminal on mac, cygwin on windows)
  • Anyone can attempt this, so until we change our passwords from the default (e.g. the notebook's, see below), be a bit careful

Steps:

  1. Connect to AWS
    1. Copy the connect line (starting from 'ssh...' onwards) and run
      • e.g. -> ssh...etc
      • Will be of the format:
        ssh -i /Users/yourdirectory/.ssh/aws-key-fast-ai.pem ubuntu@ec2-a-bunch-of-numbers.us-west-2.compute.amazonaws.com
    2. Type 'yes' to approve the authenticity
    3. First time only: there is a file they accidentally left in there called .bash_history which you need to delete (only the first time you log in to the server), otherwise you can't save any bash preferences (e.g. the PATH variable and other settings)
      • to list regular files
        ls
      • to list all files (including hidden ones like the bash preferences)
        ls -a
      • to remove or alter these secret files you have to specify that you are a super user, who knows what's up. The command is 'super user do' (sudo)
        sudo rm .bash_history
  2. If you are using the P2 instance, check out the details with
    nvidia-smi
  3. Start a jupyter notebook
    1. Get the AWS server to open up a port for the notebook
      jupyter notebook
    2. Wait for it to start and see what port the notebook is running at. It will print an address like http://(blahblah):PORT
    3. Go to your browser and type in the address for your instance followed by a colon and then the port number
      1. You can copy the address from the login you used (the bit after 'ubuntu@'). E.g. ec2-a-bunch-of-numbers.us-west-2.compute.amazonaws.com:PORT
      2. Or you can type the IP address from the bunch of numbers. E.g. a.bunch.of.numbers:PORT
      3. Or you can get the IP address from aws.amazon.com under 'network and security', 'elastic IPs'
    4. Enter the password: dl_course
  4. Open a workbook by clicking 'new' (right top) and selecting 'python (conda root)'
    1. Test you can add numbers. Hit: shift+enter to execute a cell
      1+2
    2. Test you can import theano, the deep learning library:
      import theano
    3. Test you can import keras, the simpler library that sits on top of theano and instructs it:
      import keras
    4. Get cracking on some code!
  5. When you've finished ya thangs, shut down your instance
    • Might be able to leave your T2 open (not sure), but the P2 will be charging you
    • Starting the P2 costs 90c even if it's multiple times within the one hour
    • Go to aws.amazon.com and navigate to 'EC2' then 'running instances'
    • Right click and hit 'stop', rather than 'terminate'
      • 'stop'
        • No longer charged money (P2)
        • Your files will be stored on the virtual hard drive
      • 'terminate'
        • No longer charged money (P2)
        • Files deleted from virtual hard drive
        • Don't use this unless you want to get rid of an instance completely
        • If you terminate, you'll need to create a new instance like before by re-running the setup script (bash setup_p2.sh or bash setup_t2.sh)
  6. See if you've been billed for anything
    1. Go to aws.amazon.com
    2. Click on your name (top right)
    3. 'my billing dashboard'
    4. Check how much storage you've used
      • I think we get 30gb per month, not exactly sure if that's downloads or just total size of your files so be mindful of downloading lots of things to your AWS instance

7. Daily AWS use

Description:

  • How to get into your EC2 ASAP

Steps:

  1. Go to the folder with the course files in it, then into the setup folder (courses/setup)
  2. Start the alias, which simplifies the AWS commands for us
    source aws-alias.sh
  3. See the list of alias commands and what they are doing behind the scenes
    alias
    aws-get-t2
    aws-get-p2
    aws-start
    aws-ip
    aws-nb
    aws-ssh
    aws-stop
  4. We've already created the instance during setup, so to log in each day, do these:
    1. get t2
      aws-get-t2
    2. start
      aws-start
      • If you have trouble at this step, go to aws.amazon.com and navigate to EC2, right click and start the instance you want. Then return to your bash window
    3. get ip
      aws-ip
    4. start secure shell
      aws-ssh
  5. Start a notebook by copying the IP address you just printed and adding :8888 to the end in your browser (n.u.m.b.e.r.s:8888)
  6. The terminal/cygwin window will turn into a notebook logger; if you want another bash shell you'll have to open another window (either a new terminal window or a new tmux pane)
    1. In the new window, navigate to your course directory and into the courses/setup folder
    2. Activate the command aliases
      source aws-alias.sh
    3. Get that window to get the t2 details
      aws-get-t2
    4. Get the IP address
      aws-ip
    5. SSH in
      aws-ssh
  7. Now you can have a notebook running to write the deep learning program, but also be able to access the instance via the bash shell to manage files on the instance
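
Putting the daily routine together - a minimal sketch, assuming you cloned the course repo into your home folder so the aliases live at ~/courses/setup:

    cd ~/courses/setup    # where aws-alias.sh lives (path assumes a home-directory clone)
    source aws-alias.sh   # load the course's shortcut commands
    aws-get-t2            # grab the t2 instance details (use aws-get-p2 for the p2)
    aws-start             # boot the instance
    aws-ip                # print its IP (add :8888 for the notebook)
    aws-ssh               # log in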

8. Get the course files onto the instance

Explanation:

  • We've set up our virtual computer (EC2), now we want our project stuff inside it

Steps:

  1. Get git on your EC2 with the linux Advanced Packaging Tool, as a SuperUser (Do)
    sudo apt-get install git
  2. Navigate on your instance to where you want the course folder to go. I'm using the home directory (/home/ubuntu). Ubuntu is the name of the linux distribution that the Amazon computers use
    cd
  3. Use git to clone the course files from the course github site (on aws)
    git clone https://github.com/fastai/courses.git
  4. install tree to inspect your folders
    sudo apt-get install tree
  5. Visualise what you got by using tree to see 'd'irectories
    tree -d

9. Get the data!

Description:

  • We'll be downloading the images onto the EC2 where they will stay
  • During the project we'll be using data from kaggle competitions
  • We download kaggle data with the kaggle command line interface
  • Our data is coming from this competition: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition but the course has taken that data and put it into nice folders for us already!

Steps

  1. Set up Kaggle
    1. Upgrade pip on your EC2
      pip install --upgrade pip
    2. Install kaggle command line interface
      pip install kaggle-cli
    3. In your browser go to kaggle.com and set up an account manually, don't link it to facebook, the cli doesn't like that. Remember your USERNAME and PASSWORD
    4. Go to the competition website https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition
    5. Go to 'more' (top right) then 'rules' and scroll down to accept the rules
    6. Configure your cli with your password, so your EC2 can talk to the kaggle website to download data and submit entries
      kg config -g -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition
    7. Install unzip
      sudo apt install unzip
  2. Download the data. The course creators have downloaded the data from kaggle and put it in nice folders for us already
    1. Navigate to /courses/deeplearning1/nbs so that our data goes in the same folder as the rest of the project
    2. make a data folder
      mkdir data
    3. go there
      cd data
    4. get the zipped data from the course platform website
      wget http://www.platform.ai/files/dogscats.zip
    5. unzip it
      unzip dogscats.zip
    6. Remove the zip file
      rm dogscats.zip
    7. Inspect the folder structure
      tree -d
    8. Inspect the distribution of files by looking at the sizes
      tree -d --du


ALTERNATIVELY: if you want to download the data straight from kaggle and organise the files yourself

  1. Download the data
    1. Navigate to /courses/deeplearning1/nbs so that our data goes in the same folder as the rest of the project
    2. Make a data folder
      mkdir data
    3. Go into the data folder
      cd data
    4. Make a dogscats folder
      mkdir dogscats
    5. Go into that folder
      cd dogscats
    6. Download the data (now that we've told kaggle the name of our competition). Takes a while
      kg download
    7. Unzip test images
      unzip test.zip
    8. Unzip train images
      unzip train.zip
    9. Remove test.zip
      rm test.zip
    10. Remove train.zip
      rm train.zip
  2. Make sure our folder structure is accurate so that the training works (we will need training, validation and test folders)
    1. Make sure you're in /courses/deeplearning1/nbs/data/dogscats
      pwd
    2. Make sure the 'train' and 'test' folders are still there from when we downloaded them. They should be about 286,720 bytes and 757,760 bytes
      ls -l
      • Train (757,760 bytes): This will be used to fit the parameters of the model
      • Test (286,720 bytes): This will be fresh data the model hasn't seen. It will be used to see how good it is. The Kaggle website has a second set of secret test data which it will use to see how good we are
      • Note that our training folder is much larger than the test folder to maximise the information we have to learn from before being tested
    3. Make an empty validation folder
      mkdir valid
      • This will be used to fine tune the parameters of the model. Will need to put images in it (one tenth the amount of the train folder)
    4. We also want a duplicate of all of these folders, containing tiny quantities of data, so we can rapidly test
      1. make a sample directory
        mkdir sample
      2. enter sample directory
        cd sample
        mkdir train
        mkdir valid
        mkdir test
    5. navigate back to deeplearning1/nbs (three levels up from sample)
      cd ../../..
    6. See your folder structure
      tree -d
    7. In every end-folder make a 'cats' folder and a 'dogs' folder (navigate with cd 'name' and cd .., make with mkdir - or see the one-line shortcut after this list)
      • data
        • dogscats
          • sample
            • train
              • cats
              • dogs
            • valid
              • cats
              • dogs
            • test
          • test
          • train
            • cats
            • dogs
          • valid
            • cats
            • dogs
    8. See how many bytes each folder has
      tree --du
    9. Distribute the files as you need them (one way to do the validation split is sketched below)
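
The one-line shortcut for step 7: bash brace expansion can build the whole cats/dogs skeleton in one go. A sketch, assuming you are sitting in the dogscats folder:

    # makes cats and dogs subfolders in train, valid and their sample twins
    # (the test folders hold unlabelled images, so they get no subfolders)
    mkdir -p {train,valid,sample/train,sample/valid}/{cats,dogs}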
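
For step 9, one possible way to do the validation split: move a random slice of the training images into valid, then sort images into their label folders using the kaggle filenames (cat.123.jpg, dog.456.jpg). The figure of 2,500 is an arbitrary stand-in for 'about one tenth' - adjust to taste. Run from the dogscats folder:

    cd train
    # move ~2,500 randomly chosen images into the validation folder
    for f in $(ls *.jpg | shuf -n 2500); do mv "$f" ../valid/; done
    # sort the remaining training images into their label folders
    mv cat.*.jpg cats/
    mv dog.*.jpg dogs/
    cd ../valid
    mv cat.*.jpg cats/
    mv dog.*.jpg dogs/
    cd ..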

10. Check out the code for our basic deep network

  1. Find the file deeplearning1/nbs/lesson1.ipynb and make a copy
    cp lesson1.ipynb lesson1_copy.ipynb
  2. Go over to your notebook (or run notebook with -> jupyter notebook) and open deeplearning1/nbs/lesson1_copy.ipynb
  3. Inspect the code, then at each cell hit shift+enter to run, working from top to bottom. When a cell is busy it will change its number (top left) to an asterisk, so wait for that to finish
    1. Line 1: Matplotlib raises a warning that it's taking its time doing some font things. All good
    2. Line 2: Where we are looking for our files (small sample (quick), or big full set of data (slow))
      • At present we haven't yet copied any images from our regular data into our sample folders
      • Once we do, we can comment out (#) the real data path and uncomment the sample data path
    3. Line 3: Imports modules that we have already installed as part of anaconda (numpy, matplotlib)
    4. Line 4: Utils is a module that the course people wrote which simplifies some of the things we want to achieve. To inspect what utils can do, go to the document called utils.py and have a peek
    5. Line 6: VGG16!
      • We import the module that contains the specifics of the winning deep neural network from an old imagenet competition.
      • It's been turned into a module that we can import so we don't have to look at any of the nuts and bolts in week 1.
      • If you want to look at what the vgg16 module contains, go inspect the vgg16.py file in the same directory
        • Do it first in terminal with the concatenate+print command
          cat vgg16.py
        • Then via jupyter navigation

I'll put more here when I work out what it is

Need to do, but haven't looked into:

  • Add the data folder to .gitignore
  • How to upload a model to git for use on a different computer

11. Rapid fire

Description: No messing around. I have to go to bed and cats and dogs are running away!

  1. Get into P2
  2. Git clone the repo
    git clone https://github.com/fastai/courses.git
  3. upgrade pip
    pip install --upgrade pip
  4. install kaggle
    pip install kaggle-cli
    • Configure kaggle
      kg config -g -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition
  5. In a separate bash window, ssh into the p2 instance and open the notebook courses/deeplearning1/nbs/dogs_cats_redux.ipynb
  6. make a 'data' folder, then inside that a 'redux' folder
    • Download data
      kg download
    • Get unzip
      sudo apt install unzip
    • Unzip the test and train zips, delete zip files
    • Get tree
      sudo apt install tree
    • See what you've got
      tree -d --du
  7. Run the boxes up to 'Action Plan'
  8. See what you did
    tree -d --du
  9. Run up to "Rearrange image files into their respective directories"
  10. See what you did
    tree -d --du
    • You moved some files into the sample/train folder and some into the validation folder
  11. Run up to "Finetuning and training"
  12. See what you did
  13. Run up to "Generate Predictions"
  14. Wait ages. Like 650s x 3 = 32 mins
  15. Run the next line to make predictions - takes like 3 mins or so
  16. Run the rest!
    kg submit submission1.csv -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition -m any_message

Phew!

