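As a rule, I always import division from __future__ when using Python 2.x. With it, dividing two integers gives the true (floating-point) result without having to add a '.' or use numpy (e.g. 3/2 produces 1.5 instead of 1). In Python 3 this is already the default, so there the import does nothing but is harmless.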


In [2]:
from __future__ import division
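To see what the import changes, here is a minimal sketch. It assumes Python 3, where `/` already behaves as it does under the import; `//` is the explicit floor-division operator in both versions, so you can still get the old integer result when you want it.

```python
# Assumes Python 3 (on Python 2 you would need `from __future__ import division` at the top of the file).
a, b = 3, 2

true_div = a / b     # true division: 1.5
floor_div = a // b   # floor division: 1, what `/` gave for two ints in plain Python 2
neg_floor = -a // b  # floor division rounds toward negative infinity: -2, not -1

print(true_div, floor_div, neg_floor)  # 1.5 1 -2
```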

Another useful import is numpy. You will use it so often that you should just import it in every file.


In [4]:
import numpy as np

# Set a random seed, as the Professor said. Make sure you use the correct number (or no number).
np.random.seed(0)
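Seeding matters because it makes the "random" numbers repeatable. A minimal sketch of that property, using the same np.random.seed call as above (the seed value 0 is only for illustration):

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)   # three uniform draws in [0, 1)

np.random.seed(0)           # re-seeding with the same number...
second = np.random.rand(3)  # ...reproduces exactly the same draws

print(np.array_equal(first, second))  # True
```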

Loading Data

For our first example we'll use the famous iris data set. It is available in the HW-1 zip file.


In [5]:
path_to_file = 'HW1-code/data/iris.txt'   # This is where my file sits; update it to your path.
iris = np.genfromtxt(path_to_file, delimiter=None)  # Load the txt file (delimiter=None splits on whitespace)
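Since the HW file itself isn't reproduced here, the sketch below feeds np.genfromtxt a few made-up rows through io.StringIO instead of a path. It assumes the same layout the rest of this notebook relies on (four numeric feature columns followed by a class label) and shows that whitespace-separated numbers load as a 2D float array.

```python
import io

import numpy as np

# Three made-up rows in the assumed layout: four numeric features, then a class label.
text = io.StringIO(
    "5.0 3.4 1.5 0.2 0\n"
    "6.1 2.8 4.7 1.2 1\n"
    "6.7 3.0 5.2 2.3 2\n"
)

data = np.genfromtxt(text, delimiter=None)  # None -> any run of whitespace separates values
print(data.shape)  # (3, 5)
print(data.dtype)  # float64
```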

This loaded the iris data set into a numpy 2D array (basically a table). Some of the things you can do with a table are checking its size, getting a row/column, slicing it, etc.

Size


In [6]:
iris.shape


Out[6]:
(148, 5)
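The shape is a plain tuple, so you can unpack it into the number of rows and columns. A short sketch using a stand-in array of zeros with the same shape as the output above (not the real data), together with a few related attributes:

```python
import numpy as np

table = np.zeros((148, 5))    # stand-in with the same shape as the iris array above
n_rows, n_cols = table.shape  # unpack the shape tuple

print(n_rows, n_cols)  # 148 5
print(table.ndim)      # 2 (number of dimensions)
print(table.size)      # 740 (total number of elements, 148 * 5)
print(len(table))      # 148 (len() gives the size of the first axis)
```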

Get (or set) item/row/column


In [ ]:
iris[0, 3]  # Get the element in the first row and fourth column.
iris[0, :]  # Or iris[0] -- get the 0th (first) row
iris[:, 0]  # Get the 0th (first) column
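The cell above only gets values; setting uses the same indexing on the left-hand side of an assignment. A minimal sketch on a small stand-in array (not the iris data). Note the last two lines: basic indexing returns a view, so changing the row you got back also changes the original array; use .copy() if you need an independent copy.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)  # small stand-in for the iris table

a[0, 3] = -1.0       # set a single element
a[1, :] = 0.0        # set a whole row (the scalar is broadcast)
a[:, 0] = [7, 8, 9]  # set a whole column from a list

row = a[2]           # basic indexing returns a view, not a copy...
row[1] = 100.0       # ...so this also changes a[2, 1]
print(a)
```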

Slicing


In [ ]:
iris[0:3, :]  # Or iris[:3, :] -- Get the first 3 rows
iris[:, 0:2]  # Or iris[:, :2] -- Get the first two columns.
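A few more slicing details, shown on a small stand-in array: the stop index is excluded, a third number gives a step, and rows and columns can be sliced at the same time. The last line shows a subtle difference between iris[0, :] and iris[0:1, :].

```python
import numpy as np

a = np.arange(20).reshape(5, 4)  # stand-in 5x4 table

print(a[0:3, :].shape)  # (3, 4): rows 0, 1, 2 (the stop index 3 is excluded)
print(a[:, :2].shape)   # (5, 2): the first two columns
print(a[::2, :])        # every second row (rows 0, 2 and 4), using a step of 2
print(a[1:4, 1:3])      # rows 1-3 and columns 1-2 in one go
print(a[0, :].shape, a[0:1, :].shape)  # (4,) (1, 4): an integer index drops the axis, a slice keeps it
```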

Using negative indices.


In [ ]:
# Negative indices count from the end; they work both for getting an item/row/column and for slicing
iris[-1, :]   # Get the last row
iris[-2:, :]  # Get the last two rows
iris[:, :-1]  # All columns except the last one (here, the first four)
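Negative indices are shorthand for counting from the end, so -1 is the same as n - 1 for an axis of length n. A common use of iris[:, :-1] is splitting the table into features and labels; the sketch below assumes, as the plotting code later in this notebook does, that the last column holds the class. The names X and y are only a convention, not something the HW requires.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)  # stand-in table; pretend the last column holds the class

n = a.shape[0]
print(np.array_equal(a[-1, :], a[n - 1, :]))  # True: index -1 is shorthand for n - 1

X = a[:, :-1]  # every column except the last one (the features)
y = a[:, -1]   # only the last column (the class labels)
print(X.shape, y.shape)  # (5, 3) (5,)
```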

Plotting the Data

There are 100001 ways to visualize data, but in this course we'll use the simple matplotlib.pyplot package. The nice thing about it is that it can be used in a very simple way, but you can also do crazy things with it.

First things first: if you are using Jupyter Notebooks, you need to enable inline plotting. Just run the next code cell.


In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
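%matplotlib inline is an IPython/Jupyter magic, so it is a syntax error in a plain .py script. If you run your code as a script instead, one option (a sketch, assuming you want to save the figure rather than open a window) is to select matplotlib's non-interactive Agg backend and write the figure out with savefig. Here it is saved to an in-memory buffer so nothing touches the disk:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files/buffers, needs no display
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 4, 1])

buf = io.BytesIO()
plt.savefig(buf, format="png")  # write the figure as PNG bytes instead of showing a window
plt.close()                     # free the figure once it has been saved

png_bytes = buf.getvalue()
print(png_bytes[:4])  # b'\x89PNG'
```

In a real script you would usually pass a filename such as "plot.png" to savefig instead of a buffer.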

The simplest type of plot is a line plot. Let's plot the first two columns of the iris data, using the first one as the x-axis and the second as the y-axis.


In [10]:
plt.plot(iris[:, 0], iris[:, 1])
plt.show() # You don't need this in a notebook, but it's good practice since it clears the canvas for the next plot.


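This plot doesn't make much sense, because a line plot is not the right way to show this data: plt.plot connects each point to the next one in the order the rows appear in the file, and that order has no meaning here. A scatter plot, which draws each point on its own, is more appropriate.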


In [13]:
plt.scatter(iris[:, 0], iris[:, 1])
plt.show() # You don't need this in a notebook, but it's good practice since it clears the canvas for the next plot.
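The difference between the two plots comes down to plt.plot joining each point to the next one in row order. If you prefer plt.plot, passing the format string 'o' draws circle markers with no connecting line, which gives a scatter-like result. The sketch uses made-up random points and the Agg backend (an assumption so it also runs outside a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs outside a notebook too
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
x = np.random.rand(20)  # made-up stand-ins for two feature columns
y = np.random.rand(20)

fig, (ax_line, ax_points) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y)                         # joins each point to the next one in row order
markers_only, = ax_points.plot(x, y, 'o')  # format string 'o': circle markers, no connecting line
print(markers_only.get_marker())           # o
plt.close(fig)
```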


Now let's make the scatter plot with the same two columns and use the last column (the class) to color the points, which you will have to do in your HW assignment.

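Notice that I also passed more arguments to scatter: s sets the marker size, c sets the colors (here one class value per point, mapped to a color), and alpha sets the transparency. Each pyplot function has plenty of other options; you can find them in the documentation and in the plotting examples online.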


In [18]:
plt.scatter(iris[:, 0], iris[:, 1], s=130, c=iris[:, -1], alpha=0.75)
plt.show()
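With c set to an array of numbers, scatter maps each value through a colormap (viridis by default) to choose the color. Adding a colorbar shows which color goes with which class value. A minimal sketch on made-up points and labels, with the Agg backend assumed so it runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop these two lines in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
x = np.random.rand(30)             # made-up feature values
y = np.random.rand(30)
labels = np.repeat([0, 1, 2], 10)  # made-up class labels, 10 points per class

# s = marker size (points^2), c = one value per point mapped through the colormap,
# alpha = opacity (0 is fully transparent, 1 is fully opaque)
sc = plt.scatter(x, y, s=130, c=labels, alpha=0.75, cmap="viridis")
plt.colorbar(sc, label="class")    # color scale showing which color matches which label value
plt.close()
```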


Or, in the more recommended way that gives you more control, plot each class separately (either in a loop or, if you know in advance what you want to plot, manually):


In [19]:
colors = ['blue', 'green', 'red']

for i, c in enumerate(np.unique(iris[:, -1])):
    mask = np.where(iris[:, -1] == c)[0]   # Indices of the points that belong to class c
    plt.scatter(iris[mask, 0], iris[mask, 1], s=120, c=colors[i], alpha=0.75, label='class %d' % c)

plt.legend()
plt.show()
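In the loop above, np.where(...)[0] turns the comparison into row indices, but indexing with the boolean comparison itself selects the same rows, so that step is optional. The sketch below uses a boolean mask on made-up data, labels each class by its actual value rather than the loop counter, and reads the legend back to confirm the labels (Agg backend assumed).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop these two lines in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.rand(30, 2)             # made-up 2D points
labels = np.repeat([0.0, 1.0, 2.0], 10)  # made-up float labels, as genfromtxt would load them

colors = ['blue', 'green', 'red']
fig, ax = plt.subplots()
for color, cls in zip(colors, np.unique(labels)):
    mask = labels == cls                 # boolean mask: same rows as np.where(labels == cls)[0]
    ax.scatter(data[mask, 0], data[mask, 1], s=120, c=color, alpha=0.75,
               label='class %d' % cls)   # label from the class value, not the loop counter

legend = ax.legend()
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)  # ['class 0', 'class 1', 'class 2']
plt.close(fig)
```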


There are many different ways to plot things, and some look better than others. At the end of the day, you should plot things in the way that best supports your results or helps you find what you're looking for in the data.