Crash Course Review Exercises

Import numpy, pandas, matplotlib, and sklearn. Also set visualizations to be shown inline in the notebook.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Set NumPy's random seed to 101.


In [2]:
np.random.seed(101)
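Seeding makes the random draws repeatable. The sketch below illustrates this with two independent generators built from the same seed, so it does not disturb the global state set above. The names `rng_a`, `rng_b`, `first`, and `second` are illustrative and not part of the exercise.

```python
import numpy as np

# Two separate generators created from the same seed start in the same state.
rng_a = np.random.RandomState(101)
rng_b = np.random.RandomState(101)

first = rng_a.randint(1, 101, size=5)
second = rng_b.randint(1, 101, size=5)

# Same seed and same call sequence give identical draws.
print(np.array_equal(first, second))
```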

Create a NumPy matrix of 100 rows by 5 columns consisting of random integers from 1 to 100. (Keep in mind that the upper limit may be exclusive.)


In [3]:
random_integers = np.random.randint(low = 1, 
                                    high = 101, 
                                    size = (100, 5))
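To see why `high` is 101 rather than 100, it can help to check the range of a large sample: `np.random.randint` excludes the upper bound. This is a quick sanity check, not part of the exercise; the seed and sample size are arbitrary choices.

```python
import numpy as np

# Local generator with an arbitrary seed, used only for this check.
rng = np.random.RandomState(0)
sample = rng.randint(low=1, high=101, size=100_000)

# `high` is exclusive, so no value can reach 101.
print(sample.min(), sample.max())
```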

Create a 2-D visualization using plt.imshow of the numpy matrix with a colorbar. Add a title to your plot. Bonus: Figure out how to change the aspect of the imshow() plot.


In [4]:
fig = plt.figure(figsize = (12, 12))
plt.imshow(random_integers, aspect = 0.05)
plt.colorbar()
plt.title("2D visualisation")


Out[4]:
<matplotlib.text.Text at 0x1abc7eb10f0>
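For the bonus, a numeric `aspect` stretches each cell by that height-to-width ratio, while `aspect='auto'` lets the image fill the axes. Below is a minimal sketch of the `'auto'` variant using matplotlib's object-oriented interface; the data is regenerated locally, and the figure size and title are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Regenerate a 100 x 5 integer matrix locally so the sketch is self-contained.
data = np.random.RandomState(101).randint(1, 101, size=(100, 5))

fig, ax = plt.subplots(figsize=(6, 8))
# aspect='auto' stretches the 100 x 5 grid to fill the axes.
im = ax.imshow(data, aspect='auto')
fig.colorbar(im, ax=ax)
ax.set_title("2D visualisation (aspect='auto')")
```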

Now use pd.DataFrame() to read in this NumPy array as a DataFrame. Simply pass the NumPy array into that function to get back a DataFrame. Pandas will automatically label the columns 0-4.


In [5]:
df = pd.DataFrame(random_integers)
df.head()


Out[5]:
    0   1   2   3   4
0  96  12  82  71  64
1  88  76  10  78  41
2   5  64  41  61  93
3  65   6  13  94  41
4  50  84   9  30  60

Now create a scatter plot using pandas of column 0 vs. column 1.


In [6]:
df.plot(x = 0, 
        y = 1, 
        kind = 'scatter', figsize = (12, 8))


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x1abc82b70f0>

Now scale the data to have a minimum of 0 and a maximum value of 1 using scikit-learn.


In [7]:
from sklearn.preprocessing import MinMaxScaler

In [8]:
minmax = MinMaxScaler()

In [9]:
scaled_random_int = minmax.fit_transform(df)
type(scaled_random_int)


Out[9]:
numpy.ndarray

In [10]:
scaled_df = pd.DataFrame(scaled_random_int)
scaled_df.head()


Out[10]:
          0         1         2         3         4
0  0.958763  0.104167  0.821053  0.721649  0.632653
1  0.876289  0.770833  0.063158  0.793814  0.397959
2  0.020619  0.645833  0.389474  0.618557  0.928571
3  0.639175  0.041667  0.094737  0.958763  0.397959
4  0.484536  0.854167  0.052632  0.298969  0.591837
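With its default feature range of 0 to 1, MinMaxScaler rescales each column as (x - min) / (max - min), using that column's own minimum and maximum. The sketch below reproduces the arithmetic by hand with NumPy on a small made-up array (the values are illustrative and not taken from the exercise data).

```python
import numpy as np

# Small made-up array: 3 rows, 2 columns.
data = np.array([[1.0, 10.0],
                 [3.0, 20.0],
                 [5.0, 40.0]])

# Per-column minimum and maximum.
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Min-max scaling: every column ends up spanning exactly [0, 1].
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```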

Using your previously created DataFrame, use df.columns = [...] to rename the pandas columns to ['f1','f2','f3','f4','label']. Then perform a train/test split with scikit-learn.


In [11]:
from sklearn.model_selection import train_test_split

In [12]:
df.columns = ['f1','f2','f3','f4','label']
df.head()


Out[12]:
   f1  f2  f3  f4  label
0  96  12  82  71     64
1  88  76  10  78     41
2   5  64  41  61     93
3  65   6  13  94     41
4  50  84   9  30     60

In [13]:
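# Features: every column except 'label' (selected with a boolean mask)
X = df.iloc[:, df.columns != 'label']
# Target: the 'label' column
Y = df['label']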

In [14]:
X.shape


Out[14]:
(100, 4)

In [15]:
Y.shape


Out[15]:
(100,)

In [16]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, 
                                                    test_size = 0.3, 
                                                    random_state = 101)

In [17]:
X_train.shape


Out[17]:
(70, 4)

In [18]:
X_test.shape


Out[18]:
(30, 4)

In [19]:
Y_train.shape


Out[19]:
(70,)

In [20]:
Y_test.shape


Out[20]:
(30,)
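The shapes above follow directly from test_size = 0.3 applied to 100 rows: 30 rows go to the test set and the remaining 70 to the training set. The sketch below mimics that bookkeeping with a shuffled index in plain NumPy. It is a simplified illustration, not scikit-learn's implementation, and the names `n_rows`, `shuffled`, `train_idx`, and `test_idx` are hypothetical.

```python
import numpy as np

n_rows = 100
test_size = 0.3

# Shuffle the row indices with a local, seeded generator.
rng = np.random.RandomState(101)
shuffled = rng.permutation(n_rows)

# Number of test rows; rounding is a simplification for this sketch.
n_test = int(round(test_size * n_rows))

# The first n_test shuffled indices form the test set, the rest the training set.
test_idx = shuffled[:n_test]
train_idx = shuffled[n_test:]

print(len(train_idx), len(test_idx))
```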

Great Job!