CSC 492/592: Special Topics (Deep Learning) University of Rhode Island

Homework Assignment #1 (Due on Oct 26th at 11:59p)

This assignment is worth a total of 100 points. You will hand in a report for your solution to the Music Genre Recognition competition posted on Kaggle (https://www.kaggle.com/c/uri-dl-hw-1). Your report must be submitted via Gradescope in PDF form and must follow the structure below.

  1. (30 pt) Describe your solution to the Kaggle competition. Here you will include a detailed description of your work and provide a justification for the decisions you made during your implementation.

The list below shows a few examples of items to be discussed/presented in your description:

• Have you performed any feature transformation or data preprocessing? Yes, normalization is the only preprocessing step applied: each feature is standardized to zero mean and unit variance using statistics computed on the training set.

• What is the structure of your classifier/neural network? A multilayer perceptron (feed-forward network) with one input layer, one hidden layer, and one output layer; the implementation allows the network to be made deeper by stacking additional hidden layers.

• What activation functions are being used? Tanh: it is zero-centered, monotonic, differentiable, and saturates at -1 and +1.

• What is your loss function? Have you tried others? Mean cross-entropy over the minibatch, with a softmax layer producing the class probabilities (a small gradient-check sketch follows this list).

• Are you implementing any sort of regularization? Provide details. No regularization is implemented.

• What types of gradient descent are being explored? Batch? Stochastic? Minibatch stochastic gradient descent, in its vanilla form.

• Are you implementing any improved optimization, e.g. learning-rate schedules or momentum? No.

• How are you performing model selection? Is cross-validation being used? No formal model selection is performed; a held-out validation set (10% of the training data) is used to monitor training and check that the model is neither underfitting nor overfitting.

• What are the hyperparameters you are tuning in your model selection?
n_iter = 1000 # number of training iterations (one random minibatch per iteration)
alpha = 1e-2 # learning rate
mb_size = 128 # minibatch size
num_hidden_units = 64 # number of units in each hidden layer
num_layers = 1 # number of hidden layers (network depth)
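To back up the loss-function item above, the short check below verifies numerically that the gradient of the mean softmax cross-entropy with respect to the logits is the softmax output minus the one-hot target, divided by the minibatch size, which is exactly what dcross_entropy computes later in the notebook. This is an illustrative sketch only; the helper names softmax_np and xent are not part of the submitted code.

import numpy as np

def softmax_np(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def xent(logits, target):
    # mean cross-entropy for integer class targets
    p = softmax_np(logits)
    return -np.log(p[np.arange(len(target)), target]).mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 13))   # 4 samples, 13 genres
target = rng.integers(0, 13, size=4)

grad = softmax_np(logits)           # analytic gradient: (softmax - one_hot) / m
grad[np.arange(4), target] -= 1.0
grad /= 4

eps = 1e-6                          # finite-difference check on one logit entry
probe = logits.copy()
probe[0, 5] += eps
fd = (xent(probe, target) - xent(logits, target)) / eps
print(grad[0, 5], fd)               # the two numbers should agree closely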

  2. (30 pt) Report your public/private scores from Kaggle, and indicate all hyperparameters and characteristics of the model used for your final submission. In order to earn points your scores need to be higher than the two provided baselines. The score is taken from the Kaggle leaderboard, and the hyperparameters of the final model are all listed below.

The Kaggle score is 0.21313, ranked 6th, obtained over 25 submission entries. The hyperparameters of the final model are:
n_iter = 1000 # number of training iterations (one random minibatch per iteration)
alpha = 1e-2 # learning rate
mb_size = 128 # minibatch size
num_hidden_units = 64 # number of units in each hidden layer
num_layers = 1 # number of hidden layers (network depth)

  3. (40 pt) Include all the source code for your submission (syntax highlighting is highly appreciated in your source code but avoid screenshots and pictures). Points will be deducted for source code that does not include proper comments/style. Proper training/testing procedures are necessary for obtaining full points.

The source code is provided in this notebook with proper comments, organized into separate blocks.


In [1]:
import pandas as pd # to read CSV files (Comma Separated Values)

train_x = pd.read_csv(filepath_or_buffer='../data/kaggle-music-genre/train.x.csv')
train_x.head()


Out[1]:
Id att1 att2 att3 att4 att5 att6 att7 att8 att9 ... att18 att19 att20 att21 att22 att23 att24 att25 att26 msd_track_id
0 1 41.08 6.579 4.307 3.421 3.192 2.076 2.179 2.052 1.794 ... 1.3470 -0.2463 -1.5470 0.17920 -1.1530 -0.7370 0.40750 -0.67190 -0.05147 TRPLTEM128F92E1389
1 2 60.80 5.973 4.344 3.261 2.835 2.725 2.446 1.884 1.962 ... -0.3316 0.3519 -1.4760 0.52700 -2.1960 1.5990 -1.39000 0.22560 -0.72080 TRJWMBQ128F424155E
2 3 51.47 4.971 4.316 2.916 3.112 2.290 2.053 1.934 1.878 ... -0.2803 -0.1603 -0.1355 1.03500 0.2370 1.4890 0.02959 -0.13670 0.10820 TRRZWMO12903CCFCC2
3 4 41.28 6.610 4.411 2.602 2.822 2.126 1.984 1.973 1.945 ... -1.6930 1.0040 -0.3953 0.26710 -1.0450 0.4974 0.03724 1.04500 -0.20000 TRBZRUT12903CE6C04
4 5 54.17 8.945 4.685 4.208 3.154 3.527 2.733 2.202 2.686 ... 2.4690 -0.5449 -0.5622 -0.08968 -0.9823 -0.2445 -1.65800 -0.04825 -0.70950 TRLUJQF128F42AF5BF

5 rows × 28 columns


In [2]:
train_y = pd.read_csv(filepath_or_buffer='../data/kaggle-music-genre/train.y.csv')
train_y.head()


Out[2]:
Id class_label
0 1 International
1 2 Vocal
2 3 Latin
3 4 Blues
4 5 Vocal

In [3]:
test_x = pd.read_csv(filepath_or_buffer='../data/kaggle-music-genre/test.x.csv')
test_x.head()


Out[3]:
Id att1 att2 att3 att4 att5 att6 att7 att8 att9 ... att17 att18 att19 att20 att21 att22 att23 att24 att25 att26
0 1 38.22 8.076 6.935 4.696 3.856 3.465 2.922 2.568 2.070 ... 3.988 0.4957 0.1836 -2.2210 0.6453 -0.2923 1.2000 -0.09179 0.4674 0.2158
1 2 36.42 6.131 5.364 4.292 3.968 2.937 2.872 2.142 2.050 ... 7.098 1.2290 0.5971 -1.0670 0.9569 -1.8240 2.3130 -0.80890 0.5612 -0.6225
2 3 70.01 5.496 4.698 3.699 3.258 2.293 2.680 2.226 2.034 ... 4.449 0.4773 1.6370 -1.0690 2.4160 -0.6299 1.4190 -0.81960 0.9151 -0.5948
3 4 40.64 7.281 6.702 4.043 3.729 3.043 2.644 2.366 1.940 ... 2.785 1.9000 -1.1370 1.2750 1.7920 -2.1250 1.6090 -0.83230 -0.1998 -0.1218
4 5 38.85 7.118 5.703 4.825 4.088 3.823 3.254 2.551 2.193 ... 4.536 2.1470 1.0200 -0.2656 2.8050 0.2762 0.2504 1.04900 0.3447 -0.7689

5 rows × 27 columns


In [4]:
test_y_sample = pd.read_csv(filepath_or_buffer='../data/kaggle-music-genre/submission-random.csv')
test_y_sample.head()


Out[4]:
Id Blues Country Electronic Folk International Jazz Latin New_Age Pop_Rock Rap Reggae RnB Vocal
0 1 0.0964 0.0884 0.0121 0.1004 0.0137 0.1214 0.0883 0.0765 0.0332 0.0445 0.1193 0.1019 0.1038
1 2 0.0121 0.0804 0.0376 0.0289 0.1310 0.0684 0.1044 0.0118 0.1562 0.0585 0.1633 0.1400 0.0073
2 3 0.1291 0.0985 0.0691 0.0356 0.0788 0.0529 0.1185 0.1057 0.1041 0.0075 0.0481 0.1283 0.0238
3 4 0.0453 0.1234 0.0931 0.0126 0.1224 0.0627 0.0269 0.0764 0.0812 0.1337 0.0357 0.0937 0.0930
4 5 0.0600 0.0915 0.0667 0.0947 0.0509 0.0335 0.1251 0.0202 0.1012 0.0365 0.1310 0.0898 0.0991

In [5]:
test_y_sample[:0]


Out[5]:
Id Blues Country Electronic Folk International Jazz Latin New_Age Pop_Rock Rap Reggae RnB Vocal

In [6]:
import numpy as np

train_X = np.array(train_x)
train_Y = np.array(train_y[:]['class_label'])
test_X = np.array(test_x)

# Getting rid of the first and the last column: Id and msd_track_id
X_train_val = np.array(train_X[:, 1:-1], dtype=float)
X_test = np.array(test_X[:, 1:], dtype=float)

train_Y.shape


Out[6]:
(13000,)

In [7]:
from collections import Counter

# Count the freq of the keys in the training labels
counted_labels = Counter(train_Y)
labels_keys = counted_labels.keys()
labels_keys


Out[7]:
dict_keys(['Country', 'Rap', 'Latin', 'International', 'RnB', 'Electronic', 'Pop_Rock', 'Blues', 'Folk', 'Reggae', 'Vocal', 'New_Age', 'Jazz'])

In [8]:
labels_keys_sorted = sorted(labels_keys)
labels_keys_sorted


Out[8]:
['Blues',
 'Country',
 'Electronic',
 'Folk',
 'International',
 'Jazz',
 'Latin',
 'New_Age',
 'Pop_Rock',
 'Rap',
 'Reggae',
 'RnB',
 'Vocal']

In [9]:
# Dictionary comprehension mapping each genre label to an integer index (a vocabulary)
key_to_val = {key: val for val, key in enumerate(labels_keys_sorted)}
key_to_val['Country']
key_to_val


Out[9]:
{'Blues': 0,
 'Country': 1,
 'Electronic': 2,
 'Folk': 3,
 'International': 4,
 'Jazz': 5,
 'Latin': 6,
 'New_Age': 7,
 'Pop_Rock': 8,
 'Rap': 9,
 'Reggae': 10,
 'RnB': 11,
 'Vocal': 12}

In [10]:
# Inverse mapping: integer index back to genre label
val_to_key = {val: key for val, key in enumerate(labels_keys_sorted)}
val_to_key[1]
val_to_key


Out[10]:
{0: 'Blues',
 1: 'Country',
 2: 'Electronic',
 3: 'Folk',
 4: 'International',
 5: 'Jazz',
 6: 'Latin',
 7: 'New_Age',
 8: 'Pop_Rock',
 9: 'Rap',
 10: 'Reggae',
 11: 'RnB',
 12: 'Vocal'}

In [11]:
# Encode the string class labels as integer indices using key_to_val
Y_train_vec = []
for each in train_y['class_label']:
    Y_train_vec.append(key_to_val[each])

Y_train_val = np.array(Y_train_vec)
Y_train_val.shape


Out[11]:
(13000,)
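As a side note, the label-encoding loop above could be written more compactly with pandas; the one-liner below is an equivalent alternative sketch (using the key_to_val dictionary defined earlier), not the code used for the submission.

# equivalent vectorized label encoding (same result as the loop above)
Y_train_val = train_y['class_label'].map(key_to_val).to_numpy()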

In [12]:
# # Pre-processing: normalizing
# def normalize(X):
#     # max scale for images 255= 2**8= 8 bit grayscale for each channel
#     return (X - X.mean(axis=0)) #/ X.std(axis=0)
# X_train, X_val, X_test = normalize(X=X_train), normalize(X=X_val), normalize(X=X_test)

# Preprocessing: normalizing the data based on the training set
mean = X_train_val.mean(axis=0)
std = X_train_val.std(axis=0)

X_train_val, X_test = (X_train_val - mean)/ std, (X_test - mean)/ std
X_train_val.shape, X_test.shape, X_train_val.dtype, X_test.dtype


Out[12]:
((13000, 26), (10400, 26), dtype('float64'), dtype('float64'))
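As a quick sanity check (not in the original cells), the standardized training features should now have roughly zero mean and unit variance per column, since the statistics were computed on the training set itself:

# both checks should print True
print(np.allclose(X_train_val.mean(axis=0), 0.0))
print(np.allclose(X_train_val.std(axis=0), 1.0))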

In [13]:
# Create the validation set: hold out the last 10% (1/10) of the labeled training data
valid_size = X_train_val.shape[0]//10
X_val = X_train_val[-valid_size:]
Y_val = Y_train_val[-valid_size:]
X_train = X_train_val[: -valid_size]
Y_train = Y_train_val[: -valid_size]
X_train.shape, X_val.shape, X_test.shape, Y_val.shape, Y_train.shape


Out[13]:
((11700, 26), (1300, 26), (10400, 26), (1300,), (11700,))
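One caveat about the split above: it simply takes the last 10% of rows as the validation set, which is only safe if the rows are not ordered by genre. A shuffled hold-out split, sketched below with an arbitrary seed, would be a more robust alternative; this is not the split used for the reported scores.

# hypothetical shuffled hold-out split (alternative to the slice-based split above)
rng = np.random.default_rng(42)
perm = rng.permutation(X_train_val.shape[0])
val_idx, train_idx = perm[:valid_size], perm[valid_size:]
X_train_s, Y_train_s = X_train_val[train_idx], Y_train_val[train_idx]
X_val_s, Y_val_s = X_train_val[val_idx], Y_train_val[val_idx]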

In [14]:
def softmax(X):
    eX = np.exp((X.T - np.max(X, axis=1)).T)
    return (eX.T / eX.sum(axis=1)).T

def tanh_forward(X):
    out = np.tanh(X)
    cache = out
    return out, cache

def tanh_backward(dout, cache):
    # dX = dout * (1 - (np.tanh(X)**2)) # dTanh = 1-tanh**2
    dX = (1 - cache**2) * dout
    return dX

def cross_entropy(y_pred, y_train):
    m = y_pred.shape[0]

    prob = softmax(y_pred)
    log_like = -np.log(prob[range(m), y_train]) # negative log-likelihood of the true class
    data_loss = np.sum(log_like) / m

    return data_loss

def dcross_entropy(y_pred, y_train): # gradient of the mean cross-entropy w.r.t. the logits: (softmax - one-hot) / m
    m = y_pred.shape[0]

    grad_y = softmax(y_pred)
    grad_y[range(m), y_train] -= 1.
    grad_y /= m

    return grad_y
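A brief illustration of why softmax above subtracts the row-wise maximum before exponentiating: with large logits the naive formula overflows to NaN, while the shifted version stays finite. The logit values below are made up for the example.

# numerical-stability check for the softmax implementation above
z = np.array([[1000.0, 1001.0, 1002.0]])                   # large logits
naive = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # overflows -> [[nan nan nan]]
stable = softmax(z)                                        # max-shifted version defined above
print(naive)
print(stable)                                              # approx. [[0.090 0.245 0.665]]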

In [15]:
from sklearn.utils import shuffle as skshuffle

class FFNN:

    def __init__(self, D, C, H, L):
        self.L = L # number of layers or depth
        self.losses = {'train':[], 'train_acc':[], 
                       'valid':[], 'valid_acc':[]}
        
        # Randomly initialized model parameters (weights/biases) and matching gradient buffers
        self.model = []
        self.grads = []
        low, high = -1, 1
        
        # Input layer: weights/ biases
        m = dict(W=np.random.uniform(size=(D, H), low=low, high=high) / np.sqrt(D / 2.), 
                 b=np.zeros((1, H)))
        self.model.append(m)
        # Input layer: gradients
        self.grads.append({key: np.zeros_like(val) for key, val in self.model[0].items()})

        # Hidden layers: weights/ biases
        m_L = []
        for _ in range(L):
            m = dict(W=np.random.uniform(size=(H, H), low=low, high=high) / np.sqrt(H / 2.), 
                     b=np.zeros((1, H)))
            m_L.append(m)
        self.model.append(m_L)
        # Hidden layer: gradients
        grad_L = []
        for _ in range(L):
            grad_L.append({key: np.zeros_like(val) for key, val in self.model[1][0].items()})
        self.grads.append(grad_L)
        
        # Output layer: weights/ biases
        m = dict(W=np.random.uniform(size=(H, C), low=low, high=high) / np.sqrt(H / 2.), 
                 b=np.zeros((1, C)))
        self.model.append(m)
        # Output layer: gradients
        self.grads.append({key: np.zeros_like(val) for key, val in self.model[2].items()})
        
    def fc_forward(self, X, W, b):
        out = (X @ W) + b
        cache = (W, X)
        return out, cache

    def fc_backward(self, dout, cache):
        W, X = cache

        dW = X.T @ dout
        db = np.sum(dout, axis=0).reshape(1, -1) # db_1xn
        dX = dout @ W.T # Backprop

        return dX, dW, db

    def train_forward(self, X, train):
        caches, ys = [], []
        
        # Input layer
        y, fc_cache = self.fc_forward(X=X, W=self.model[0]['W'], b=self.model[0]['b']) # X_1xD, y_1xc
        y, nl_cache = tanh_forward(X=y)
        X = y.copy() # pass to the next layer
        if train:
            caches.append((fc_cache, nl_cache))
        
        # Hidden layers
        fc_caches, nl_caches = [], []
        for layer in range(self.L):
            y, fc_cache = self.fc_forward(X=X, W=self.model[1][layer]['W'], b=self.model[1][layer]['b'])
            y, nl_cache = tanh_forward(X=y)
            X = y.copy() # pass to next layer
            if train:
                fc_caches.append(fc_cache)
                nl_caches.append(nl_cache)
        if train:
            caches.append((fc_caches, nl_caches)) # caches[1]            
        
        # Output layer
        y, fc_cache = self.fc_forward(X=X, W=self.model[2]['W'], b=self.model[2]['b'])
        # Softmax is included in loss function
        if train:
            caches.append(fc_cache)

        return y, caches # for backpropating the error

    def loss_function(self, y, y_train):
        
        loss = cross_entropy(y, y_train) # softmax is included
        dy = dcross_entropy(y, y_train) # dsoftmax is included
        
        return loss, dy
        
    def train_backward(self, dy, caches):
        grads = self.grads # gradient buffers; every entry is overwritten below
        
        # Output layer
        fc_cache = caches[2]
        # dSoftmax is included in loss function
        dX, dW, db = self.fc_backward(dout=dy, cache=fc_cache)
        dy = dX.copy()
        grads[2]['W'] = dW
        grads[2]['b'] = db

        # Hidden layer
        fc_caches, nl_caches = caches[1]
        for layer in reversed(range(self.L)):
            dy = tanh_backward(cache=nl_caches[layer], dout=dy) # diffable function
            dX, dW, db = self.fc_backward(dout=dy, cache=fc_caches[layer])
            dy = dX.copy()
            grads[1][layer]['W'] = dW
            grads[1][layer]['b'] = db
        
        # Input layer
        fc_cache, nl_cache = caches[0]
        dy = tanh_backward(cache=nl_cache, dout=dy) # diffable function
        dX, dW, db = self.fc_backward(dout=dy, cache=fc_cache)
        grads[0]['W'] = dW
        grads[0]['b'] = db

        return grads
    
    def test(self, X):
        y_logit, _ = self.train_forward(X, train=False)
        
        # if self.mode == 'classification':
        y_prob = softmax(y_logit) # for accuracy == acc
        y_pred = np.argmax(y_prob, axis=1) # for loss ==err
        
        return y_pred, y_logit
        
    def get_minibatch(self, X, y, minibatch_size, shuffle):
        minibatches = []

        if shuffle:
            X, y = skshuffle(X, y)

        for i in range(0, X.shape[0], minibatch_size):
            X_mini = X[i:i + minibatch_size]
            y_mini = y[i:i + minibatch_size]
            minibatches.append((X_mini, y_mini))

        return minibatches

    def sgd(self, train_set, val_set, alpha, mb_size, n_iter, print_after):
        X_train, y_train = train_set
        X_val, y_val = val_set

        # Epochs
        for iter in range(1, n_iter + 1):

            # Minibatches
            minibatches = self.get_minibatch(X_train, y_train, mb_size, shuffle=True)
            idx = np.random.randint(0, len(minibatches))
            X_mini, y_mini = minibatches[idx]
            
            # Train the model
            y, caches = self.train_forward(X_mini, train=True)
            _, dy = self.loss_function(y, y_mini)
            grads = self.train_backward(dy, caches) 
            
            # Update the model for input layer
            for key in grads[0].keys():
                self.model[0][key] -= alpha * grads[0][key]

            # Update the model for the hidden layers
            for layer in range(self.L):
                for key in grads[1][layer].keys():
                    self.model[1][layer][key] -= alpha * grads[1][layer][key]

            # Update the model for output layer
            for key in grads[2].keys():
                self.model[2][key] -= alpha * grads[2][key]
                
            # Trained model info
            y_pred, y_logit = self.test(X_mini)
            loss, _ = self.loss_function(y_logit, y_mini) # softmax is included in entropy loss function
            self.losses['train'].append(loss)
            acc = np.mean(y_pred == y_mini) # minibatch training accuracy
            self.losses['train_acc'].append(acc)

            # Validated model info
            y_pred, y_logit = self.test(X_val)
            valid_loss, _ = self.loss_function(y_logit, y_val) # softmax is included in entropy loss function
            self.losses['valid'].append(valid_loss)
            valid_acc = np.mean(y_pred == y_val) # validation accuracy
            self.losses['valid_acc'].append(valid_acc)
            
            # Print the model info: loss & accuracy or err & acc
            if iter % print_after == 0:
                print('Iter: {}, train loss: {:.4f}, train acc: {:.4f}, valid loss: {:.4f}, valid acc: {:.4f}'.format(
                    iter, loss, acc, valid_loss, valid_acc))
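Note that sgd() above draws a single random minibatch per iteration, so n_iter counts minibatch updates rather than full passes over the training set. A full-epoch variant, where every minibatch is visited once per epoch, could be sketched as below; this is a hypothetical helper, not the code used for the submitted model.

# hypothetical full-epoch variant: every minibatch is visited once per call
def sgd_full_epoch(nn, X, y, alpha, mb_size):
    for X_mini, y_mini in nn.get_minibatch(X, y, mb_size, shuffle=True):
        out, caches = nn.train_forward(X_mini, train=True)
        _, dy = nn.loss_function(out, y_mini)
        grads = nn.train_backward(dy, caches)
        # same parameter updates as in sgd()
        for key in grads[0]:
            nn.model[0][key] -= alpha * grads[0][key]
        for layer in range(nn.L):
            for key in grads[1][layer]:
                nn.model[1][layer][key] -= alpha * grads[1][layer][key]
        for key in grads[2]:
            nn.model[2][key] -= alpha * grads[2][key]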

In [16]:
Y_train.shape, X_train.shape, X_val.shape, Y_val.shape


Out[16]:
((11700,), (11700, 26), (1300, 26), (1300,))

In [19]:
# Hyper-parameters
n_iter = 1000 # number of training iterations (one random minibatch per iteration)
alpha = 1e-2 # learning rate
mb_size = 128 # minibatch size
print_after = 10 # print train/validation loss every 10 iterations
num_hidden_units = 64 # number of units in each hidden layer
num_input_units = X_train.shape[1] # input dimensionality (number of features)
num_output_units = Y_train.max() + 1 # number of classes in this classification problem
num_layers = 1 # number of hidden layers (network depth)

# Build the network and train it with minibatch SGD
nn = FFNN(C=num_output_units, D=num_input_units, H=num_hidden_units, L=num_layers)

nn.sgd(train_set=(X_train, Y_train), val_set=(X_val, Y_val), mb_size=mb_size, alpha=alpha, 
           n_iter=n_iter, print_after=print_after)


Iter: 10, train loss: 2.6001, train acc: 0.0625, valid loss: 2.5968, valid acc: 0.0831
Iter: 20, train loss: 2.5793, train acc: 0.0703, valid loss: 2.5835, valid acc: 0.0931
Iter: 30, train loss: 2.5155, train acc: 0.1328, valid loss: 2.5694, valid acc: 0.1015
Iter: 40, train loss: 2.5484, train acc: 0.1016, valid loss: 2.5554, valid acc: 0.1085
Iter: 50, train loss: 2.5542, train acc: 0.1016, valid loss: 2.5431, valid acc: 0.1131
Iter: 60, train loss: 2.5101, train acc: 0.1016, valid loss: 2.5311, valid acc: 0.1223
Iter: 70, train loss: 2.4730, train acc: 0.1484, valid loss: 2.5200, valid acc: 0.1308
Iter: 80, train loss: 2.5076, train acc: 0.1406, valid loss: 2.5095, valid acc: 0.1423
Iter: 90, train loss: 2.5010, train acc: 0.1250, valid loss: 2.4985, valid acc: 0.1500
Iter: 100, train loss: 2.4977, train acc: 0.1016, valid loss: 2.4887, valid acc: 0.1554
Iter: 110, train loss: 2.4862, train acc: 0.1250, valid loss: 2.4783, valid acc: 0.1569
Iter: 120, train loss: 2.4520, train acc: 0.2266, valid loss: 2.4690, valid acc: 0.1638
Iter: 130, train loss: 2.4684, train acc: 0.1641, valid loss: 2.4600, valid acc: 0.1685
Iter: 140, train loss: 2.4244, train acc: 0.2188, valid loss: 2.4510, valid acc: 0.1715
Iter: 150, train loss: 2.3886, train acc: 0.2656, valid loss: 2.4418, valid acc: 0.1846
Iter: 160, train loss: 2.4564, train acc: 0.2344, valid loss: 2.4332, valid acc: 0.1946
Iter: 170, train loss: 2.4210, train acc: 0.2031, valid loss: 2.4249, valid acc: 0.1969
Iter: 180, train loss: 2.4064, train acc: 0.2031, valid loss: 2.4173, valid acc: 0.2008
Iter: 190, train loss: 2.4254, train acc: 0.1719, valid loss: 2.4104, valid acc: 0.2008
Iter: 200, train loss: 2.3520, train acc: 0.2422, valid loss: 2.4028, valid acc: 0.2038
Iter: 210, train loss: 2.3798, train acc: 0.2344, valid loss: 2.3962, valid acc: 0.2100
Iter: 220, train loss: 2.3387, train acc: 0.2656, valid loss: 2.3892, valid acc: 0.2154
Iter: 230, train loss: 2.3683, train acc: 0.2109, valid loss: 2.3829, valid acc: 0.2154
Iter: 240, train loss: 2.3713, train acc: 0.2266, valid loss: 2.3762, valid acc: 0.2192
Iter: 250, train loss: 2.3269, train acc: 0.2109, valid loss: 2.3701, valid acc: 0.2208
Iter: 260, train loss: 2.3170, train acc: 0.2656, valid loss: 2.3632, valid acc: 0.2215
Iter: 270, train loss: 2.4071, train acc: 0.2031, valid loss: 2.3571, valid acc: 0.2223
Iter: 280, train loss: 2.4196, train acc: 0.1719, valid loss: 2.3516, valid acc: 0.2246
Iter: 290, train loss: 2.2954, train acc: 0.2656, valid loss: 2.3459, valid acc: 0.2269
Iter: 300, train loss: 2.3299, train acc: 0.2500, valid loss: 2.3397, valid acc: 0.2269
Iter: 310, train loss: 2.3525, train acc: 0.2109, valid loss: 2.3341, valid acc: 0.2277
Iter: 320, train loss: 2.3780, train acc: 0.1719, valid loss: 2.3294, valid acc: 0.2262
Iter: 330, train loss: 2.3609, train acc: 0.2891, valid loss: 2.3246, valid acc: 0.2315
Iter: 340, train loss: 2.2800, train acc: 0.2656, valid loss: 2.3195, valid acc: 0.2362
Iter: 350, train loss: 2.2325, train acc: 0.2344, valid loss: 2.3148, valid acc: 0.2338
Iter: 360, train loss: 2.2904, train acc: 0.2500, valid loss: 2.3102, valid acc: 0.2346
Iter: 370, train loss: 2.3098, train acc: 0.2500, valid loss: 2.3062, valid acc: 0.2346
Iter: 380, train loss: 2.2575, train acc: 0.2500, valid loss: 2.3016, valid acc: 0.2385
Iter: 390, train loss: 2.2798, train acc: 0.2734, valid loss: 2.2972, valid acc: 0.2431
Iter: 400, train loss: 2.1696, train acc: 0.2891, valid loss: 2.2929, valid acc: 0.2446
Iter: 410, train loss: 2.3403, train acc: 0.2500, valid loss: 2.2886, valid acc: 0.2454
Iter: 420, train loss: 2.2617, train acc: 0.2500, valid loss: 2.2839, valid acc: 0.2462
Iter: 430, train loss: 2.3332, train acc: 0.2344, valid loss: 2.2800, valid acc: 0.2446
Iter: 440, train loss: 2.3081, train acc: 0.1953, valid loss: 2.2756, valid acc: 0.2462
Iter: 450, train loss: 2.3269, train acc: 0.2031, valid loss: 2.2718, valid acc: 0.2492
Iter: 460, train loss: 2.2642, train acc: 0.2188, valid loss: 2.2683, valid acc: 0.2500
Iter: 470, train loss: 2.2421, train acc: 0.2656, valid loss: 2.2646, valid acc: 0.2492
Iter: 480, train loss: 2.2419, train acc: 0.2344, valid loss: 2.2613, valid acc: 0.2492
Iter: 490, train loss: 2.1280, train acc: 0.4038, valid loss: 2.2575, valid acc: 0.2515
Iter: 500, train loss: 2.1745, train acc: 0.3203, valid loss: 2.2540, valid acc: 0.2531
Iter: 510, train loss: 2.1680, train acc: 0.3047, valid loss: 2.2503, valid acc: 0.2562
Iter: 520, train loss: 2.2102, train acc: 0.2734, valid loss: 2.2470, valid acc: 0.2585
Iter: 530, train loss: 2.1808, train acc: 0.2656, valid loss: 2.2437, valid acc: 0.2585
Iter: 540, train loss: 2.2894, train acc: 0.2344, valid loss: 2.2406, valid acc: 0.2585
Iter: 550, train loss: 2.1491, train acc: 0.3359, valid loss: 2.2373, valid acc: 0.2608
Iter: 560, train loss: 2.1267, train acc: 0.3125, valid loss: 2.2338, valid acc: 0.2608
Iter: 570, train loss: 2.1458, train acc: 0.2812, valid loss: 2.2315, valid acc: 0.2592
Iter: 580, train loss: 2.2561, train acc: 0.2734, valid loss: 2.2289, valid acc: 0.2631
Iter: 590, train loss: 2.1578, train acc: 0.3047, valid loss: 2.2262, valid acc: 0.2654
Iter: 600, train loss: 2.1807, train acc: 0.3047, valid loss: 2.2235, valid acc: 0.2646
Iter: 610, train loss: 2.2329, train acc: 0.2422, valid loss: 2.2209, valid acc: 0.2638
Iter: 620, train loss: 2.2463, train acc: 0.2031, valid loss: 2.2181, valid acc: 0.2662
Iter: 630, train loss: 2.1876, train acc: 0.2812, valid loss: 2.2158, valid acc: 0.2677
Iter: 640, train loss: 2.1464, train acc: 0.3047, valid loss: 2.2134, valid acc: 0.2677
Iter: 650, train loss: 2.1648, train acc: 0.2891, valid loss: 2.2108, valid acc: 0.2685
Iter: 660, train loss: 2.1306, train acc: 0.3203, valid loss: 2.2086, valid acc: 0.2692
Iter: 670, train loss: 2.1863, train acc: 0.2500, valid loss: 2.2062, valid acc: 0.2700
Iter: 680, train loss: 2.2393, train acc: 0.2656, valid loss: 2.2036, valid acc: 0.2738
Iter: 690, train loss: 2.1149, train acc: 0.2969, valid loss: 2.2017, valid acc: 0.2762
Iter: 700, train loss: 2.2001, train acc: 0.2422, valid loss: 2.1998, valid acc: 0.2754
Iter: 710, train loss: 2.1983, train acc: 0.2812, valid loss: 2.1979, valid acc: 0.2731
Iter: 720, train loss: 2.1263, train acc: 0.3438, valid loss: 2.1957, valid acc: 0.2769
Iter: 730, train loss: 2.3528, train acc: 0.1719, valid loss: 2.1938, valid acc: 0.2769
Iter: 740, train loss: 2.1859, train acc: 0.2969, valid loss: 2.1923, valid acc: 0.2777
Iter: 750, train loss: 2.1191, train acc: 0.2969, valid loss: 2.1909, valid acc: 0.2792
Iter: 760, train loss: 2.0976, train acc: 0.2969, valid loss: 2.1889, valid acc: 0.2792
Iter: 770, train loss: 2.2289, train acc: 0.2812, valid loss: 2.1873, valid acc: 0.2785
Iter: 780, train loss: 2.2007, train acc: 0.2500, valid loss: 2.1854, valid acc: 0.2785
Iter: 790, train loss: 2.1545, train acc: 0.2969, valid loss: 2.1832, valid acc: 0.2808
Iter: 800, train loss: 2.2544, train acc: 0.2500, valid loss: 2.1812, valid acc: 0.2831
Iter: 810, train loss: 2.0917, train acc: 0.3984, valid loss: 2.1798, valid acc: 0.2831
Iter: 820, train loss: 2.0184, train acc: 0.3281, valid loss: 2.1780, valid acc: 0.2831
Iter: 830, train loss: 2.1898, train acc: 0.2969, valid loss: 2.1763, valid acc: 0.2846
Iter: 840, train loss: 2.0478, train acc: 0.3672, valid loss: 2.1744, valid acc: 0.2838
Iter: 850, train loss: 2.1585, train acc: 0.2891, valid loss: 2.1726, valid acc: 0.2831
Iter: 860, train loss: 2.1067, train acc: 0.3594, valid loss: 2.1711, valid acc: 0.2838
Iter: 870, train loss: 2.0990, train acc: 0.2969, valid loss: 2.1693, valid acc: 0.2846
Iter: 880, train loss: 2.1779, train acc: 0.2656, valid loss: 2.1676, valid acc: 0.2854
Iter: 890, train loss: 1.9959, train acc: 0.3672, valid loss: 2.1660, valid acc: 0.2838
Iter: 900, train loss: 2.0345, train acc: 0.3594, valid loss: 2.1646, valid acc: 0.2869
Iter: 910, train loss: 2.1366, train acc: 0.3047, valid loss: 2.1633, valid acc: 0.2862
Iter: 920, train loss: 2.1382, train acc: 0.2656, valid loss: 2.1618, valid acc: 0.2885
Iter: 930, train loss: 2.1217, train acc: 0.2891, valid loss: 2.1604, valid acc: 0.2908
Iter: 940, train loss: 2.0093, train acc: 0.3125, valid loss: 2.1587, valid acc: 0.2908
Iter: 950, train loss: 2.1775, train acc: 0.2969, valid loss: 2.1576, valid acc: 0.2938
Iter: 960, train loss: 2.1124, train acc: 0.3281, valid loss: 2.1561, valid acc: 0.2923
Iter: 970, train loss: 2.0881, train acc: 0.3281, valid loss: 2.1549, valid acc: 0.2915
Iter: 980, train loss: 2.0959, train acc: 0.2969, valid loss: 2.1534, valid acc: 0.2931
Iter: 990, train loss: 2.1102, train acc: 0.2578, valid loss: 2.1518, valid acc: 0.2938
Iter: 1000, train loss: 2.0664, train acc: 0.3281, valid loss: 2.1504, valid acc: 0.2923

In [20]:
# Display the learning curves: training and validation loss
# %matplotlib inline
# %config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt

plt.plot(nn.losses['train'], label='Train loss')
plt.plot(nn.losses['valid'], label='Valid loss')
plt.legend()
plt.show()



In [21]:
loss_train = np.array(nn.losses['train'], dtype=float)
loss_valid = np.array(nn.losses['valid'], dtype=float)
loss_train.shape, loss_valid.shape


Out[21]:
((1000,), (1000,))

In [22]:
loss_train_norm = (loss_train - loss_train.mean(axis=0))/ loss_train.std(axis=0)
loss_valid_norm = (loss_valid - loss_valid.mean(axis=0))/ loss_valid.std(axis=0)

In [23]:
plt.plot(loss_train_norm, label='Normalized train loss')
plt.plot(loss_valid_norm, label='Normalized valid loss')
plt.legend()
plt.show()



In [24]:
plt.plot(nn.losses['train_acc'], label='Train accuracy')
plt.plot(nn.losses['valid_acc'], label='Valid accuracy')
plt.legend()
plt.show()



In [25]:
heading = labels_keys_sorted.copy()
heading.insert(0, 'Id')
heading


Out[25]:
['Id',
 'Blues',
 'Country',
 'Electronic',
 'Folk',
 'International',
 'Jazz',
 'Latin',
 'New_Age',
 'Pop_Rock',
 'Rap',
 'Reggae',
 'RnB',
 'Vocal']

In [26]:
y_pred, y_logits = nn.test(X_test)
y_prob = softmax(y_logits)
y_prob.shape, X_test.shape, y_logits.shape, test_y_sample.shape, test_y_sample[:1]


Out[26]:
((10400, 13),
 (10400, 26),
 (10400, 13),
 (10400, 14),
    Id   Blues  Country  Electronic    Folk  International    Jazz   Latin  \
 0   1  0.0964   0.0884      0.0121  0.1004         0.0137  0.1214  0.0883   
 
    New_Age  Pop_Rock     Rap  Reggae     RnB   Vocal  
 0   0.0765    0.0332  0.0445  0.1193  0.1019  0.1038  )

In [27]:
pred_list = []
for Id, pred in enumerate(y_prob):
#     print(Id+1, *pred)
    pred_list.append([Id+1, *pred])

In [28]:
pred_file = open(file='prediction.csv', mode='w')
pred_file.write('\n') # leading blank line (pandas skips blank lines when reading the file back)

for idx in range(len(heading)):
    if idx < len(heading) - 1:
        pred_file.write(heading[idx] + ',')
    else:
        pred_file.write(heading[idx] + '\n')        

# Write one row of class probabilities per test example
for i in range(len(pred_list)): # rows
    for j in range(len(pred_list[i])): # cols
        if j < (len(pred_list[i]) - 1):
            pred_file.write(str(pred_list[i][j]))
            pred_file.write(',')
        else: # last item before starting a new line
            pred_file.write(str(pred_list[i][j]) + '\n')        

pred_file.close()
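The manual CSV writing above works, but the same submission format could be produced more compactly with pandas; the sketch below writes to a separate file name and is an equivalent alternative rather than the code that generated the submitted predictions.

# alternative: build the submission with pandas instead of manual string writing
submission = pd.DataFrame(y_prob, columns=labels_keys_sorted)
submission.insert(0, 'Id', np.arange(1, len(y_prob) + 1))
submission.to_csv('prediction_pandas.csv', index=False)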

In [29]:
pd.read_csv(filepath_or_buffer='prediction.csv').head()


Out[29]:
Id Blues Country Electronic Folk International Jazz Latin New_Age Pop_Rock Rap Reggae RnB Vocal
0 1 0.025174 0.016746 0.065243 0.017988 0.026095 0.026814 0.037724 0.011046 0.008171 0.294567 0.359248 0.079265 0.031919
1 2 0.043550 0.033226 0.047976 0.027139 0.042617 0.011235 0.062821 0.009417 0.018814 0.261108 0.280848 0.091242 0.070008
2 3 0.032431 0.025760 0.094421 0.019534 0.037828 0.008196 0.044102 0.005857 0.058257 0.255460 0.319420 0.078205 0.020529
3 4 0.034612 0.048024 0.066285 0.037946 0.040719 0.017690 0.051901 0.013471 0.017397 0.188359 0.298043 0.104883 0.080668
4 5 0.030729 0.014709 0.058063 0.007349 0.014495 0.004782 0.027369 0.002000 0.018998 0.360809 0.406771 0.041362 0.012564

In [30]:
pd.read_csv(filepath_or_buffer='prediction.csv').shape, test_y_sample.shape


Out[30]:
((10400, 14), (10400, 14))

In [31]:
test_y_sample.head()


Out[31]:
Id Blues Country Electronic Folk International Jazz Latin New_Age Pop_Rock Rap Reggae RnB Vocal
0 1 0.0964 0.0884 0.0121 0.1004 0.0137 0.1214 0.0883 0.0765 0.0332 0.0445 0.1193 0.1019 0.1038
1 2 0.0121 0.0804 0.0376 0.0289 0.1310 0.0684 0.1044 0.0118 0.1562 0.0585 0.1633 0.1400 0.0073
2 3 0.1291 0.0985 0.0691 0.0356 0.0788 0.0529 0.1185 0.1057 0.1041 0.0075 0.0481 0.1283 0.0238
3 4 0.0453 0.1234 0.0931 0.0126 0.1224 0.0627 0.0269 0.0764 0.0812 0.1337 0.0357 0.0937 0.0930
4 5 0.0600 0.0915 0.0667 0.0947 0.0509 0.0335 0.1251 0.0202 0.1012 0.0365 0.1310 0.0898 0.0991
