Crash Course Review Exercises

Import numpy, pandas, matplotlib, and sklearn. Also set visualizations to be shown inline in the notebook.



In [1]:

    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Set NumPy's random seed to 101.



In [2]:

    
np.random.seed(101)
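As a quick check of what seeding does, the sketch below reseeds the generator with the same value and confirms that the same numbers come out both times. The variable names (first, second, same) are illustrative only and are not part of the exercise.

```python
import numpy as np

# Seed, draw five integers, then reseed with the same value and draw again.
np.random.seed(101)
first = np.random.randint(1, 101, size=5)

np.random.seed(101)
second = np.random.randint(1, 101, size=5)

# Identical seeds give identical draws, which makes results repeatable.
same = np.array_equal(first, second)
print(same)
```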

Create a NumPy matrix of 100 rows by 5 columns consisting of random integers from 1 to 100. (Keep in mind that the upper limit of np.random.randint is exclusive, so pass high = 101 to include 100.)



In [3]:

    
random_integers = np.random.randint(low = 1, 
                                    high = 101, 
                                    size = (100, 5))
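To confirm how the bounds behave, here is a small sketch that draws the same kind of matrix and inspects its extremes. The names sample, lowest, highest, and in_range are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(101)

# low is inclusive and high is exclusive, so high=101 allows values up to 100.
sample = np.random.randint(low=1, high=101, size=(100, 5))

lowest, highest = sample.min(), sample.max()
in_range = bool(lowest >= 1 and highest <= 100)
print(sample.shape, in_range)
```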

Create a 2-D visualization of the NumPy matrix using plt.imshow, with a colorbar. Add a title to your plot. Bonus: Figure out how to change the aspect of the imshow() plot.



In [4]:

    
fig = plt.figure(figsize = (12, 12))
plt.imshow(random_integers, aspect = 0.05)
plt.colorbar()
plt.title("2D visualisation")









    Out[4]:





<matplotlib.text.Text at 0x1abc7eb10f0>
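For the bonus, one possible alternative to a numeric aspect ratio is aspect="auto", which stretches the image to fill the axes. The sketch below uses matplotlib's object-oriented interface rather than the plt calls above; the Agg backend line is only there so the snippet also runs as a script without a display, and the names data, image, and aspect_setting are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: skip this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(101)
data = np.random.randint(1, 101, size=(100, 5))

fig, ax = plt.subplots(figsize=(6, 6))
# aspect="auto" lets the 100 x 5 matrix fill the axes instead of
# drawing square pixels, which would give a very tall, thin image.
image = ax.imshow(data, aspect="auto")
fig.colorbar(image, ax=ax)
ax.set_title("2D visualisation")

aspect_setting = ax.get_aspect()
print(aspect_setting)
```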

Now use pd.DataFrame() to read in this NumPy array as a DataFrame. Simply pass the NumPy array into that function to get back a DataFrame. Pandas will automatically label the columns 0 to 4.



In [5]:

    
df = pd.DataFrame(random_integers)
df.head()
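The automatic labelling can be seen on a tiny array. This is a minimal sketch with made-up data; array, frame, and column_labels are names chosen for the illustration.

```python
import numpy as np
import pandas as pd

# A 2 x 5 array with no labels attached.
array = np.arange(10).reshape(2, 5)
frame = pd.DataFrame(array)

# With no column names supplied, pandas numbers the columns from 0.
column_labels = list(frame.columns)
print(column_labels)  # [0, 1, 2, 3, 4]
```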

Now use pandas to create a scatter plot of column 0 vs. column 1.



In [6]:

    
df.plot(x = 0, 
        y = 1, 
        kind = 'scatter', figsize = (12, 8))









    Out[6]:





<matplotlib.axes._subplots.AxesSubplot at 0x1abc82b70f0>

Now scale the data to have a minimum of 0 and a maximum value of 1 using scikit-learn.



In [7]:

    
from sklearn.preprocessing import MinMaxScaler



In [8]:

    
minmax = MinMaxScaler()



In [9]:

    
scaled_random_int = minmax.fit_transform(df)
type(scaled_random_int)









    Out[9]:





numpy.ndarray



In [10]:

    
scaled_df = pd.DataFrame(scaled_random_int)
scaled_df.head()
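With its default settings, MinMaxScaler rescales each column independently as (x - min) / (max - min). The sketch below rebuilds a comparable DataFrame and checks the scaler's output against that formula computed by hand in pandas. The names frame, scaled, manual, and matches are assumptions for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

np.random.seed(101)
frame = pd.DataFrame(np.random.randint(1, 101, size=(100, 5)))

# Scale with scikit-learn (returns a NumPy array, not a DataFrame).
scaled = MinMaxScaler().fit_transform(frame)

# Same transformation by hand, column by column.
manual = (frame - frame.min()) / (frame.max() - frame.min())

matches = np.allclose(scaled, manual.to_numpy())
print(matches)
```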

Using your previously created DataFrame, use df.columns = [...] to rename the pandas columns to ['f1','f2','f3','f4','label']. Then perform a train/test split with scikit-learn.



In [11]:

    
from sklearn.model_selection import train_test_split



In [12]:

    
df.columns = ['f1','f2','f3','f4','label']
df.head()



In [13]:

    
X = df.iloc[:, df.columns != 'label']
Y = df['label']



In [14]:

    
X.shape









    Out[14]:





(100, 4)



In [15]:

    
Y.shape









    Out[15]:





(100,)



In [16]:

    
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, 
                                                    test_size = 0.3, 
                                                    random_state = 101)
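A point worth checking after a split is that features and labels were shuffled together. The sketch below is one way to verify this on a comparable DataFrame. It selects the features with drop(columns='label'), which is an alternative to the boolean iloc selection and not the notebook's own approach; the names frame, features, target, and aligned are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(101)
frame = pd.DataFrame(np.random.randint(1, 101, size=(100, 5)),
                     columns=['f1', 'f2', 'f3', 'f4', 'label'])

# Alternative feature selection: drop the label column by name.
features = frame.drop(columns='label')
target = frame['label']

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.3, random_state=101)

# Rows and labels keep the same index order, so they still match up.
aligned = features_train.index.equals(target_train.index)
print(aligned)
```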



In [17]:

    
X_train.shape









    Out[17]:





(70, 4)



In [18]:

    
X_test.shape









    Out[18]:





(30, 4)



In [19]:

    
Y_train.shape









    Out[19]:





(70,)



In [20]:

    
Y_test.shape









    Out[20]:





(30,)

Output of df.head() in In [5]:

    0   1   2   3   4
0  96  12  82  71  64
1  88  76  10  78  41
2   5  64  41  61  93
3  65   6  13  94  41
4  50  84   9  30  60

Output of scaled_df.head() in In [10]:

          0         1         2         3         4
0  0.958763  0.104167  0.821053  0.721649  0.632653
1  0.876289  0.770833  0.063158  0.793814  0.397959
2  0.020619  0.645833  0.389474  0.618557  0.928571
3  0.639175  0.041667  0.094737  0.958763  0.397959
4  0.484536  0.854167  0.052632  0.298969  0.591837

Output of df.head() in In [12]:

   f1  f2  f3  f4  label
0  96  12  82  71     64
1  88  76  10  78     41
2   5  64  41  61     93
3  65   6  13  94     41
4  50  84   9  30     60

Great Job!