Tools that are useful

And a little about them

See the cheat sheets for common commands for these tools


Toolbox:

  • Amazon Web Services
  • Anaconda
  • Bcolz
  • Bash
  • Brew
  • Cygwin
  • Github
  • Jupyter
  • Kaggle
  • Keras
  • Modules
  • Pip
  • Python
  • P7zip
  • SSH
  • Tensorflow
  • Theano
  • Tmux
  • Tree
  • Unzip
  • Wget

Amazon Web Services

Install AWS command line interface:

pip install awscli
  • Amazon lets you rent time on their computers in various configurations
  • You log in remotely with secure shell (SSH) and instruct their computers what to do, then leave them to do the hard work while you're free to use your own computer as you please
  • Deep learning involves performing a huge number of simple operations (mostly multiplying and adding numbers)
  • Running these calculations on your computer's CPU (central processing unit) is possible, but would take ages for all but the simplest of problems
  • Deep learning has been revolutionised by sending the computations to GPUs (graphics processing units), which can do many simple computations simultaneously
  • The main deep learning libraries send these computations to the GPU using NVIDIA's CUDA platform, so in practice you need a GPU from NVIDIA
  • Until you get right into things, renting Amazon's NVIDIA GPUs is cheaper and easier than buying and setting up your own

Anaconda

  • Data science super-platform: a Python distribution that bundles Python with many data science packages
  • Contains Jupyter, a notebook where you can make notes, write code and display the results
    • Open a notebook from the terminal -> jupyter notebook
  • Contains scientific tools that work well for statistics and math
  • Comes with the conda package manager, which you can use to install many other programs in the future
    • To install a particular program -> conda install 'program name'

Bash

  • Unix is a family of operating systems, including Linux, which is what the AWS computers use
  • To speak to a Unix computer, you type commands into a shell, most commonly bash, running in a terminal window
  • On a Mac you type into Terminal; on Windows you need Cygwin to get a terminal that understands bash
  • Bash = Bourne Again SHell
  • It is a type of shell, made by the GNU Project
  • A shell is a type of user interface that mostly uses text

TIP: Shell commands (including bash) can be confusing. Type the command into this website to get a breakdown of all the elements of the command

Bcolz

  • We store our data as matrices / tensors using numpy arrays
    • matrix: a 2-dimensional grid of numbers (rows and columns)
    • tensor: a grid of numbers with more dimensions (e.g. a stack of images)
    • numpy: a Python library for fast calculations on arrays of numbers
    • array: a way of storing numbers in grid formation
  • bcolz stores these arrays in compressed form, in memory or on disk, so large datasets take up less space
  • https://github.com/Blosc/bcolz
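A quick illustration of the terms above, assuming numpy is installed (it comes with Anaconda): a matrix is a 2-dimensional array, and a tensor here is an array with more dimensions. The shapes are arbitrary examples.

```python
import numpy as np

# A matrix: a 2-D grid of numbers (2 rows, 3 columns)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# A tensor: a grid with more than two dimensions, e.g. a batch of
# 4 greyscale images, each 28 x 28 pixels (sizes chosen arbitrarily)
tensor = np.zeros((4, 28, 28))

print(matrix.ndim, matrix.shape)  # 2 (2, 3)
print(tensor.ndim, tensor.shape)  # 3 (4, 28, 28)

# Arrays let you do maths on the whole grid at once, element by element
doubled = matrix * 2
print(doubled)
```

To compress and save such an array with bcolz, one common pattern is `c = bcolz.carray(arr, rootdir='some_folder', mode='w')` followed by `c.flush()`, then `bcolz.open('some_folder')[:]` to load it back as a numpy array ('some_folder' is a placeholder).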

Brew

Install:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

"Homebrew" (brew) is a package manager for macOS; install things with the format: brew install 'thing'

Cygwin

Install: https://cygwin.com/setup-x86_64.exe

  • Make sure 'wget' is selected for installation in the package list (not left as 'Skip')

  • Macs already have bash (in Terminal), but Windows doesn't
  • Cygwin provides bash and other Unix tools on Windows; we use it to run the commands that set up and talk to the AWS computers

Github

Download git: https://git-scm.com/downloads

On Linux/Ubuntu (e.g. the AWS computers), install with:

sudo apt-get install git
  • Git is a version control system: it keeps track of every change made to a project's files
  • It's the way most people download projects and collaborate on them
    • Several people can work on the same project at the same time; git records who changed what and flags any clashes (conflicts) between their changes
  • GitHub is a website that hosts projects in 'repositories' (GitHub recommends keeping each repository under about 1 GB)
  • We use git to 'clone' (download a copy of) project files from GitHub to our computer
  • You upload ('push') your changes back to the repository; if they clash with someone else's, git flags the conflict so it can be fixed
  • E.g. -> git clone https://github.com/USERNAME/PROJECT_NAME
  • Because we're going to be using lots of different programs and just want the basic bits from each, it's useful to keep a folder of cheat sheets for them
  • Git tracks all of our project files except the data folder, which is listed in a file called .gitignore

First timers get-going guide

How to contribute to this repository
  1. Make an account at https://github.com/
  2. Go to https://github.com/breadley/learning_machine and click the 'fork' button (top right)
    • This creates a parallel version of the entire project, where you will make changes and edits and then merge it back with the master version
    • 'Fork' makes a copy of learning_machine in your github repository
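  3. Go to the terminal and clone your version of learning_machine to your computer
    git clone git@github.com:YOUR_USERNAME/learning_machine.git
    • This SSH address needs an SSH key added to your GitHub account; without one, use https://github.com/YOUR_USERNAME/learning_machine.git
  4. Go into that folder on your computer
    cd learning_machine
    • (cd .. takes you back up one folder)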
  5. Add a connection to the original repository
    git remote add learning_machine https://github.com/breadley/learning_machine.git
    • You can check to see if the connection was made with
      git remote -v
  6. Make changes to the files as you like
  7. Add the changes to git (make sure you are in the 'learning_machine' folder)
    git add "filename"
    • If you want to add all changed files
      git add .
  8. Commit the changes
    git commit -m 'insert tiny message about your changes'
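  9. Push them to your repository (your fork, which git calls 'origin')
    git push origin master
    • Plain git push does the same thing here, because the master branch you cloned already tracks origin/master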
  10. Go to your fork on GitHub (github.com/YOUR_USERNAME/learning_machine)
  11. Click 'New pull request' and write a message about what you've been up to
  12. I can then merge the changes

Day-to-day git commit

Go to the 'learning_machine' folder, then:

git add .
git commit -m 'message about commit'
git push origin master
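A minimal Python sketch of the same routine, assuming git is installed and your fork is the 'origin' remote. `daily_commit` is a hypothetical helper name, not part of git or any library. It stages with `git add .` rather than `git add *`, because the shell's `*` skips hidden files and also names the ignored data folder, which git refuses to add without `-f`.

```python
import subprocess

def daily_commit(repo_dir, message, dry_run=False):
    """Stage, commit and push every change in repo_dir.

    With dry_run=True the git commands are only returned, not run.
    """
    commands = [
        ["git", "add", "."],                  # stage new, changed and deleted files
        ["git", "commit", "-m", message],     # record them with a short message
        ["git", "push", "origin", "master"],  # upload to your fork on GitHub
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first command that fails
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands
```

E.g. `daily_commit("learning_machine", "update cheat sheets")`. Note that git commit exits with an error when there is nothing new to commit, so in that case the function raises subprocess.CalledProcessError.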

Jupyter

  • Jupyter is a notebook where you can make notes, write code and display the results (it comes with Anaconda)
    • Open a notebook from the terminal -> jupyter notebook

Keras

Install version 1.2.2 (Keras 2.0 is out, but the coursework still uses a version below 2.0):

conda install -c conda-forge keras=1.2.2
  • High-level deep learning library that is simple to write and can run on top of either TensorFlow or Theano (its 'backend')
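Which backend Keras 1.x uses is set in a small config file, keras.json, in the .keras folder of your home directory. A minimal sketch that writes one, assuming the Keras 1.x field names (backend, image_dim_ordering, floatx, epsilon). `write_keras_config` is a hypothetical helper, and pairing 'th' ordering with Theano and 'tf' with TensorFlow is a choice made here, not a requirement.

```python
import json
import os

def write_keras_config(backend, config_dir=None):
    """Write a Keras 1.x keras.json that selects the given backend.

    backend: "theano" or "tensorflow".
    config_dir: defaults to ~/.keras, where Keras looks for keras.json.
    """
    if backend not in ("theano", "tensorflow"):
        raise ValueError("backend must be 'theano' or 'tensorflow'")
    if config_dir is None:
        config_dir = os.path.join(os.path.expanduser("~"), ".keras")
    os.makedirs(config_dir, exist_ok=True)
    config = {
        "backend": backend,
        # Keras 1.x image layout: "th" = channels first, "tf" = channels last
        "image_dim_ordering": "th" if backend == "theano" else "tf",
        "floatx": "float32",
        "epsilon": 1e-07,
    }
    path = os.path.join(config_dir, "keras.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path
```

E.g. run `write_keras_config("theano")` once before importing Keras. For a single session, setting the KERAS_BACKEND environment variable (e.g. `os.environ["KERAS_BACKEND"] = "theano"` before `import keras`) should override the file.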

Kaggle

pip install kaggle-cli
  • During the project we'll be using data from Kaggle competitions
  • Set up which competition you are working on (the competition name is the last part of its Kaggle URL)
    kg config -g -u USERNAME -p PASSWORD -c competition_name_from_kaggle_url_ending
  • Download the competition data with the Kaggle command line interface
    kg download
  • Submit your predictions
    kg submit submission1.csv -u USERNAME -p PASSWORD -c competition_name_from_kaggle_url_ending -m any_message
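Kaggle submissions are usually a CSV file. A minimal sketch for creating submission1.csv with Python's csv module, assuming a competition that wants an 'id' column and a 'label' column (placeholders: check the competition's sample submission for the real header). `write_submission` is a hypothetical helper, and the ids and predictions below are made up for illustration.

```python
import csv

def write_submission(path, ids, predictions, header=("id", "label")):
    """Write a header row, then one row per (id, prediction) pair."""
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per id")
    # newline="" stops the csv module writing blank lines on Windows
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(zip(ids, predictions))

# Made-up example: three test items and their predicted values
write_submission("submission1.csv", [1, 2, 3], [0.9, 0.1, 0.5])
```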

Modules

In Python you can import a module (code kept in another file) to simplify and de-clutter your code

import my_module
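  • Python looks for a file called my_module.py in the folders listed in sys.path, in order:
    • First the folder containing the script you are running (or the current working directory in an interactive session or Jupyter notebook)
    • Then any folders listed in the PYTHONPATH environment variable
    • Then the standard library and installed-package (site-packages) folders
  • It does not search the folders above the current one, so place my_module.py in the same folder as your program, or add its folder to sys.path or PYTHONPATH.

Modules are files with a .py extension (you leave the extension off in the import statement)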
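To see where Python will look, print sys.path. A small self-contained sketch using only the standard library: it writes a throwaway module my_module.py (hypothetical, with a made-up greet function) into a temporary folder, adds that folder to sys.path, and imports it.

```python
import importlib
import os
import sys
import tempfile

# Python searches these folders, in order, when you write `import something`
for entry in sys.path:
    print(entry)

# Create a throwaway module in a temporary folder that is NOT on sys.path yet
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "my_module.py"), "w") as f:
    f.write("def greet(name):\n    return 'Hello, ' + name\n")

# Adding the folder to the front of sys.path makes the module importable
sys.path.insert(0, folder)
importlib.invalidate_caches()  # let the import system notice the new file
my_module = importlib.import_module("my_module")

print(my_module.greet("world"))  # Hello, world
```

In a normal script you would simply write `import my_module` after the sys.path line; setting the PYTHONPATH environment variable to the folder has the same effect without editing the code.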

P7zip

Install:

brew install p7zip

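A file archiver for the .7z format, which some Kaggle competitions use to compress their data

To extract ('e') a file called file_name.7z:

7z e file_name.7z

(Use 7z x file_name.7z instead to keep the folder structure stored inside the archive; e puts every file in the current folder.)

To add ('a') / compress files file_1, file_2 and file_3 into a new file called new_name.7z: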

7z a new_name.7z file_1 file_2 file_3

Pip

Python's package installer. It's already installed with Anaconda; to upgrade it:

pip install --upgrade pip

Python

Programming language

Secure Shell

  • Also known as SSH
  • Allows you to log in remotely to another computer (such as an AWS machine) over an encrypted connection

Tensorflow

Install:

pip install tensorflow
  • Deep learning framework from Google; one of the two backends Keras 1.x can run on

Theano

Install:

conda install theano
  • Deep learning framework; the other backend Keras 1.x can run on

Tmux

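  • Terminal multiplexer: lets you open multiple bash windows within one larger window, and keeps them running after you disconnect (handy when working on AWS over SSH)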

Tree

Install:

brew install tree
  • Displays the folder structure of a directory (e.g. your data folder) as a tree
  • Directories only: tree -d
  • Directories with their total sizes: tree -d --du
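If tree isn't installed (for example on a fresh AWS machine), a rough Python stand-in for tree -d can be written with os.walk. This is a simplified sketch, not a reimplementation of tree's options; `list_dirs` is a hypothetical helper, and unlike tree it also lists hidden folders.

```python
import os

def list_dirs(root):
    """Return root and its sub-folders as indented lines, like a bare-bones `tree -d`."""
    lines = []
    for current, dirs, _files in os.walk(root):
        # Sorting in place makes os.walk visit sub-folders alphabetically
        dirs.sort()
        rel = os.path.relpath(current, root)
        if rel == ".":
            lines.append(root)
        else:
            # One level of indentation per folder below root
            depth = rel.count(os.sep) + 1
            lines.append("    " * depth + os.path.basename(current))
    return lines
```

E.g. `print("\n".join(list_dirs("data")))` prints the folders inside a data directory, each nested one level deeper than its parent.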

Unzip

Install:

sudo apt install unzip
  • For unzipping data files
  • unzip filename.zip
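Python's standard-library zipfile module can do the same job, which is handy inside a Jupyter notebook. A minimal sketch; `unzip` is a hypothetical helper name, and filename.zip and the data destination folder in the usage line are placeholders.

```python
import zipfile

def unzip(zip_path, dest_dir="."):
    """Extract every file in zip_path into dest_dir and return their names."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # extractall creates dest_dir (and any sub-folders) if needed
        zf.extractall(dest_dir)
    return names
```

E.g. `unzip("filename.zip", "data")`.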

Wget

Install:

brew install wget
  • Downloads files from the web using their URL ('WWW' + 'get' = wget): wget URL
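For comparison, a minimal Python equivalent of wget URL using the standard library's urllib. Like wget's default, it saves the file under the last part of the URL path (or index.html if there is none). `download` is a hypothetical helper name, and the URL in the usage line is a placeholder.

```python
import os
import shutil
import urllib.parse
import urllib.request

def download(url, dest_dir="."):
    """Download url into dest_dir, naming the file after the end of the URL path."""
    filename = os.path.basename(urllib.parse.urlparse(url).path) or "index.html"
    path = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # Copy in chunks rather than reading the whole file into memory
        shutil.copyfileobj(response, out)
    return path
```

E.g. `download("https://example.com/data.zip", "data")` saves data/data.zip.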
