Deep learning on video titles

In this notebook we train a neural network on the titles of the videos and the subscriber counts of their channels.


In [53]:
import requests
import json
import pandas as pd
from math import *
import numpy as np
import tensorflow as tf
import time
import collections
import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from IPython.display import display
from random import randint

Database selection

We can choose which database to use for our learning. To test our neural network we created 3 databases: one on the theme "animals", another on "cars", and one with random videos. We want to see whether we get different results depending on the dataset. More databases can easily be created with the notebook "create_videos_database".


In [54]:
folder = os.path.join('sql_database_animaux')
#folder = os.path.join('sql_database_voitures')
#folder = os.path.join('sql_database_random')

In [55]:
videos_database = pd.read_sql('videos', 'sqlite:///' + os.path.join(folder, 'videos.sqlite'), index_col='index')
videos_database = videos_database.drop_duplicates('id')
videos_database = videos_database.reset_index(drop=True)

display(videos_database)

print("Length of the video database :",len(videos_database))


id channelId title thumbnailUrl viewCount likeCount dislikeCount commentCount subsCount
0 p3mjQp8hLp8 UCa4R_3Ii-u7RxvROF9GUJnw bêtisier des animaux https://i.ytimg.com/vi/p3mjQp8hLp8/default.jpg 11889 40 2 1 239
1 fsM9ecpIBFA UC8_aLXmRelD95h5CLJOIryw Quand un chien policier attaque!!! https://i.ytimg.com/vi/fsM9ecpIBFA/default.jpg 8479 22 5 0 18
2 7fgCfC3bM0U UCpko_-a4wgz2u_DgDgd9fqA The Try Guys Try Drag For The First Time https://i.ytimg.com/vi/7fgCfC3bM0U/default.jpg 23449551 266289 6713 20994 11639882
3 GT22KpOF98E UCOyVzq9Qv5vuiJACON4FJxA نسخة عن FILM documentaire animaux sauvages 2016 https://i.ytimg.com/vi/GT22KpOF98E/default.jpg 48336 40 24 1 70
4 ZpF-jjy72-M UCakQLdwrxuo0KhJ49Sq2csA Alice aux Pays des Merveilles - Extrait - Le c... https://i.ytimg.com/vi/ZpF-jjy72-M/default.jpg 81586 223 4 5 940704
5 iRff2nHA-tc UCvWx3bt6RmIE2aW-EkN0hOA JUL - C'EST LA DANSE DES CANARDS [CLIP OFFICIEL] https://i.ytimg.com/vi/iRff2nHA-tc/default.jpg 487 16 7 7 48
6 8SHtIMrM2FI UCckz6n8QccTd6K_xdwKqa0A Streetart: M. Chat sort les griffes face à la ... https://i.ytimg.com/vi/8SHtIMrM2FI/default.jpg 15510 93 4 1 109749
7 cZh39iYOJAs UC5pVNCws5dA6Ef818KcsYyg FS17 / Map Fichtelberg / ep14 / Les animaux/ v... https://i.ytimg.com/vi/cZh39iYOJAs/default.jpg 17770 499 10 72 128848
8 etswSufeUek UCLMKLU-ZuDQIsbjMvR3bbog Présentation du mod "ANIMAL BIKES V2"! - 26 Mo... https://i.ytimg.com/vi/etswSufeUek/default.jpg 523449 5225 141 402 1232513
9 fwS0PthZKSk UC-iBrOB6NtcCYaZ_yLNnEdg CHIEN DE SUIVI -THE HUNTER https://i.ytimg.com/vi/fwS0PthZKSk/default.jpg 2269 55 4 22 4200
10 SKmks3GptB8 UCFrDDP81MX_QfOHrRZOgD4g La résistance d'un cheval à l'abattoir #StopAb... https://i.ytimg.com/vi/SKmks3GptB8/default.jpg 25683 83 41 90 16384
11 HgFCgC9e184 UC6qHSO1KaXdHVgQiXO6LISw How fast is your hamster? https://i.ytimg.com/vi/HgFCgC9e184/default.jpg 25095 430 13 122 29147
12 kpdSQ1v-4h8 UCy4C4IQVxxd1JMyp6f2-y_Q A Hampsterdance Tribute Video https://i.ytimg.com/vi/kpdSQ1v-4h8/default.jpg 1418786 5752 184 1558 3973
13 VAPZhisVTAI UCjOKBdaBEnS_yfFxBqhc7lw Brigade Équestre La Courneuve (93) Emission Le... https://i.ytimg.com/vi/VAPZhisVTAI/default.jpg 64692 213 8 0 43
14 cR98RZrNuZY UCukLvmEiidTNhA3O45nN3Cg Langue d'un chat en slow motion - SlowMo pour ... https://i.ytimg.com/vi/cR98RZrNuZY/default.jpg 107538 1860 24 131 31152
15 pKcnyHOXOTQ UCmBLuIUUp2aG96oE9u78X9Q dresser son chien à ne pas bouger https://i.ytimg.com/vi/pKcnyHOXOTQ/default.jpg 121765 660 29 59 1306
16 nBtDsQ4fhXY UCMJ5Qf3sOvQpcYiai1Noa3Q Major Lazer - Cold Water (feat. Justin Bieber ... https://i.ytimg.com/vi/nBtDsQ4fhXY/default.jpg 118152794 899872 28873 24915 6278329
17 2qLwkT1F4Lc UCR4s1DE9J4DHzZYXMltSMAg How To Make Vegetarian Lasagna https://i.ytimg.com/vi/2qLwkT1F4Lc/default.jpg 1261936 73031 4135 8516 8505141
18 MftOONlDQac UC8-Th83bH_thdKZDJCrn88g Jimmy Fallon Went to Bayside High with "Saved ... https://i.ytimg.com/vi/MftOONlDQac/default.jpg 35374533 177133 4749 15472 12829046
19 -5sXP1ST_Mw UCy0cGyspwjGvBlauVhX3S7w Différentes méthodes pour monter à cheval https://i.ytimg.com/vi/-5sXP1ST_Mw/default.jpg 206005 884 78 352 1243
20 KDrpPqsXfVU UCC85qHVFjfIaF23dfVd4wog Hamster VTEC Miss-Shift https://i.ytimg.com/vi/KDrpPqsXfVU/default.jpg 1826271 8812 178 534 133
21 -UM8h0kp_KU UCxEWAwSWGR6hIRL61pRtiUg Une vie de chat (A Cat in Paris) | Bande annon... https://i.ytimg.com/vi/-UM8h0kp_KU/default.jpg 302544 465 27 26 6893
22 52XdYZ-wRsc UCUSybgAxXRfRpr7ukQF_klg La rumeur - l'oiseau fait sa cage https://i.ytimg.com/vi/52XdYZ-wRsc/default.jpg 27009 123 2 12 442
23 BRUILig_dAQ UCp0hYYBW6IMayGgR-WeoCvQ Ellen and Tom Hanks Have a Pixar-Off! https://i.ytimg.com/vi/BRUILig_dAQ/default.jpg 3201743 38365 342 1535 18393053
24 4EUobFfx-tU UCy_pH0l1ENt0keocE09jYDw ONE MAN DOES 30 ANIMAL SOUNDS https://i.ytimg.com/vi/4EUobFfx-tU/default.jpg 5173810 84213 4349 7206 89470
25 DcU13oVAFR0 UCpxw7aG-avB2MOHbVfp7B-Q TAG - 10 Choses qui ont changé depuis que j'ai... https://i.ytimg.com/vi/DcU13oVAFR0/default.jpg 29318 555 40 121 7552
26 R_4V_534Wao UCcOGvjC9rlMciMIFoVL0Pkw Un chien panique en voyant son maitre sous l'eau https://i.ytimg.com/vi/R_4V_534Wao/default.jpg 251988 1234 39 75 1135
27 Fh9dRaFW0O4 UCc9Vhky48Pv1jE_7HpRzBYQ DOUCHER SON CHEVAL 🚿 https://i.ytimg.com/vi/Fh9dRaFW0O4/default.jpg 37223 1805 7 550 21587
28 j1JmFzDtI7s UCQn_PPRxgmP5BV3k7Nlplxg L’expo BODY WORLDS : Animaux à corps ouvert https://i.ytimg.com/vi/j1JmFzDtI7s/default.jpg 2663 0 0 1 721
29 uxV6-l8idZg UCYyLDV7Kv8EEpYZSSjoDaCA Hamster Bonding Tips! https://i.ytimg.com/vi/uxV6-l8idZg/default.jpg 33722 0 0 111 15193
... ... ... ... ... ... ... ... ... ...
3792 SDYFWRRlgNw UCqRUUBc0ybdnxsGuxCDqe2w Mes achats au salon du cheval 2014 https://i.ytimg.com/vi/SDYFWRRlgNw/default.jpg 6246 51 4 27 1023
3793 AWFWrFNo6Tk UCiUHWeGU13MBHuGLdtoIEkg Désensibilisation chien agressif envers ses co... https://i.ytimg.com/vi/AWFWrFNo6Tk/default.jpg 58713 123 19 45 158
3794 K4O7Flt9PNs UCuk-5yiL8PQwovBbKMPC3CA Comment bien natter son cheval : suivez les co... https://i.ytimg.com/vi/K4O7Flt9PNs/default.jpg 135866 810 41 123 13563
3795 cGTlqydmq2c UCb7TuwnX9-7B9ny9ncaMpDA L'histoire du petit oiseau. https://i.ytimg.com/vi/cGTlqydmq2c/default.jpg 30535 116 0 9 6
3796 z5EnEaiKghQ UCDjuU0aZbNKS4uezlCBsYBA [TAG Présente nous ton cheval :) ] https://i.ytimg.com/vi/z5EnEaiKghQ/default.jpg 73501 948 33 136 52978
3797 Xt9cNXSZMGU UCSPZewkLzY70H81rvOsiOUA Hamster : parcours d'agility https://i.ytimg.com/vi/Xt9cNXSZMGU/default.jpg 156909 1417 31 191 298
3798 bz-J2XIC7b8 UCpkY_QlQWfmT7D9H-w8qVBg 7 PRATIQUES SEXUELLES ÉTRANGES chez les ANIMAUX https://i.ytimg.com/vi/bz-J2XIC7b8/default.jpg 221077 5010 281 771 431978
3799 eHCyaJS4Cbs UCMu5gPmKp5av0QCAajKTMhw Bruce Lee vs Clint Eastwood. Epic Rap Battles... https://i.ytimg.com/vi/eHCyaJS4Cbs/default.jpg 61289415 375689 7741 242793 14130253
3800 OWMctRDfGAo UCfm4y4rHF5HGrSr-qbvOwOg When A Brown Girl Dates A White Boy (ft. Adam ... https://i.ytimg.com/vi/OWMctRDfGAo/default.jpg 5392382 218661 4483 13439 10773165
3801 M6hqsTgmXaw UCOurQ4tHJIElCz2kIbkUblg ROBERT FUSIL et les chiens fous | "Libre comme... https://i.ytimg.com/vi/M6hqsTgmXaw/default.jpg 9958 71 3 8 103
3802 IDY-hGNY7_k UCvgfXK4nTYKudb0rFR6noLA UFC 207: The Thrill and the Agony - Preview https://i.ytimg.com/vi/IDY-hGNY7_k/default.jpg 450127 2751 73 900 3619397
3803 YgVehdAQV2k UCF4mncU0xAOzyqG0hQLglFQ The Chat with Priscilla - Mercy Multiplied (Pa... https://i.ytimg.com/vi/YgVehdAQV2k/default.jpg 5101 85 1 6 5171
3804 C0B8uY6KBo8 UC4Tis1aWiIaIKwFeyQazJZA Animaux DRÔLE accouplement , bétail , l'accou... https://i.ytimg.com/vi/C0B8uY6KBo8/default.jpg 32556 16 11 3 0
3805 FCBNWkPH0MU UC-iBrOB6NtcCYaZ_yLNnEdg LES NOUVEAUX CANARDS - THE HUNTER https://i.ytimg.com/vi/FCBNWkPH0MU/default.jpg 10721 100 3 15 4200
3806 jYCD5Nyr1vU UCFmrfBN7qjyP5j8nbw6sO1w DIY Lolly Stick Hamster House https://i.ytimg.com/vi/jYCD5Nyr1vU/default.jpg 56077 1100 24 282 122035
3807 0FGMypAJ4vA UCAdyNOE80FsFPYlFliyXfwQ Sexe : comment fait le cheval ? - ZAPPING SAUVAGE https://i.ytimg.com/vi/0FGMypAJ4vA/default.jpg 1701385 771 317 67 30215
3808 9draaLgNCco UCvEPXHn0-ZGRQrPMfIawWqQ Party Animal (Remix) - Charly Black Ft. Daddy ... https://i.ytimg.com/vi/9draaLgNCco/default.jpg 135075 979 29 24 140271
3809 0JAObeMl1bo UCkmISlL1CNB36sSiWBze54Q When Prey Fights Back | Most Amazing Animal At... https://i.ytimg.com/vi/0JAObeMl1bo/default.jpg 93646 106 28 13 63824
3810 _dmGNe1aipw UCLtaWed3u1FTeurk1CXPL4g 5 Strange Animal Mating Rituals (Part 2) https://i.ytimg.com/vi/_dmGNe1aipw/default.jpg 155392 107 34 13 43905
3811 1VuMdLm0ccU UCuAlq9bXR6SLadGysNbcPLg 2 Hamsters 1 Wheel https://i.ytimg.com/vi/1VuMdLm0ccU/default.jpg 7447279 59923 1291 2850 1036
3812 uo2F1eRpMoI UCvnm_PY4H-kFy4gyonx3Fvg Baywatch | International Trailer - "Ready" | P... https://i.ytimg.com/vi/uo2F1eRpMoI/default.jpg 1324844 3875 535 440 245637
3813 ZIsNLaFVHks UCI7NEK93UIaqMztD0UoyS1Q Marechal ferrant - Ferrage d'un cheval par Mar... https://i.ytimg.com/vi/ZIsNLaFVHks/default.jpg 33783 21 6 3 13
3814 KQdZgj0enhQ UC_majyzBb8HZV0wCmU4Z4Ag L'art du dressage du cheval - Documentaire com... https://i.ytimg.com/vi/KQdZgj0enhQ/default.jpg 448913 948 140 54 234945
3815 i-FV8TVN598 UCEjZNkOGaq6PDXjd6jW6BSw [ANIMAUX] Le capucin, intelligent et sauvage https://i.ytimg.com/vi/i-FV8TVN598/default.jpg 13017 47 8 0 82162
3816 DDhnW4Vx8Eo UCjo1xlKU4-m_zP4lvC_XqlA Compilation d'animaux drôles qui danse ! https://i.ytimg.com/vi/DDhnW4Vx8Eo/default.jpg 11469 59 3 5 14
3817 8xrzEGbJHz4 UCIfCFwJH6rr-KUv67MNt_vA CUTE HAMSTER NAMES 🐹❤ https://i.ytimg.com/vi/8xrzEGbJHz4/default.jpg 8226 0 0 0 1033
3818 6g5fFO4whLw UCkpSnbo3Y8AMdflzwifkeaw Maltraitance envers les animaux https://i.ytimg.com/vi/6g5fFO4whLw/default.jpg 33950 228 49 268 210
3819 w6py4c3CXe0 UC6KkAnkG6Xdy04lihAROjqQ Farm Animals Learn Animal Names and Sounds Edu... https://i.ytimg.com/vi/w6py4c3CXe0/default.jpg 281251 224 68 224 54017
3820 xZLIbMoDPEg UCjTCFFq605uuq4YN4VmhkBA Custom PC build materials @ Cooler Master - CE... https://i.ytimg.com/vi/xZLIbMoDPEg/default.jpg 21374 763 69 120 1043317
3821 kuz5uBCCU_g UCEeMNYGmcs3XYzmWYbXjpUg Call of Duty: 360 No Scope in Real Life: VR Video https://i.ytimg.com/vi/kuz5uBCCU_g/default.jpg 91361 491 130 103 18969

3822 rows × 9 columns

Length of the video database : 3822

Train_data creation

To build our train_data set we use the Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF) methods. They are usually used in sentiment analysis, but they should also give interesting results in our case, because we expect some particular words in a video title to attract more viewers.

We also append the normalized subscriber count to each vector.
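
As a small self-contained illustration of what TF-IDF produces (not part of the pipeline; the titles below are made up), note how a word that appears in every title receives the lowest weight:

from sklearn.feature_extraction.text import TfidfVectorizer

toy_titles = ["funny cat video", "funny dog video", "cat compilation video"]
toy_vect = TfidfVectorizer()
toy_tfidf = toy_vect.fit_transform(toy_titles)
print(toy_vect.get_feature_names())  # ['cat', 'compilation', 'dog', 'funny', 'video']
print(toy_tfidf.toarray().round(2))  # 'video' appears in every title, so it gets the lowest weight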


In [60]:
#maximum number of words to extract; it is also the maximum size of our vectors
#we played a little with this value and 2000 seems to give good results
nwords = 2000

#the stopwords are words such as "the" or "is" that appear everywhere and do not carry any information
#we don't want those words in our vocabulary
#we load them from the file "stopwords.txt", found on the internet
stopwords = [line.rstrip('\n') for line in open('stopwords.txt')]

#print('stopwords:',stopwords)

def compute_bag_of_words(text, nwords):
    vectorizer = CountVectorizer(max_features=nwords)
    vectors = vectorizer.fit_transform(text)
    vocabulary = vectorizer.get_feature_names()
    return vectors, vocabulary

#we gather all the titles in a list; the vocabulary and the TF-IDF matrix are both built from it
titles_list = videos_database['title'].tolist()

#create a vocabulary from the titles
title_bow, titles_vocab = compute_bag_of_words(titles_list, nwords)

#we apply the TF-IDF method to the titles
vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word', stop_words=stopwords, vocabulary=titles_vocab)
vect.fit(titles_list)

#create a sparse TF-IDF matrix 
titles_tfidf = vect.transform(titles_list)

del titles_list

train_data = titles_tfidf.todense()

print(train_data.shape)

def print_most_frequent(bow, vocab, n=100):
    #sort the words by their total count and print the n most frequent ones
    idx = np.argsort(bow.sum(axis=0))
    for i in range(n):
        j = idx[0, -1-i]
        print(vocab[j],': ',bow.sum(axis=0)[0,j])
 
print('most used words:')

print_most_frequent(title_bow,titles_vocab)

#print(len(titles_vocab))

#print(train_data.shape)


(3822, 2000)
most used words:
de :  706
un :  522
animaux :  473
chat :  468
les :  449
chien :  443
animal :  440
le :  429
hamster :  417
cheval :  412
oiseau :  401
canards :  393
la :  373
et :  327
des :  325
the :  240
du :  233
en :  209
pour :  171
son :  164
comment :  143
of :  131
to :  129
mon :  118
aux :  117
qui :  116
plus :  112
vs :  108
10 :  106
compilation :  104
2016 :  103
chasse :  101
in :  96
and :  96
avec :  96
video :  93
dans :  88
une :  85
funny :  84
au :  84
for :  83
est :  82
sur :  82
with :  81
top :  80
on :  77
hd :  71
how :  71
animals :  68
petit :  67
cage :  62
2015 :  61
monde :  59
par :  57
official :  55
faire :  52
danse :  52
hamsters :  49
enfant :  48
kids :  46
new :  46
parole :  45
2014 :  45
apprendre :  44
tuto :  44
epic :  44
fait :  43
sauvages :  42
enfants :  42
diy :  41
history :  41
best :  40
documentaire :  40
chiens :  40
battles :  40
rap :  39
se :  38
my :  38
season :  37
vidéo :  37
you :  37
pas :  36
votre :  36
chats :  35
wild :  35
your :  35
ce :  35
try :  35
life :  35
drôles :  33
que :  33
drôle :  32
comme :  31
live :  31
français :  31
bébé :  30
je :  30
homme :  29
tiny :  29

In [61]:
#add the subscriber counts to train_data

subsCount = np.asarray(videos_database['subsCount'].tolist(), dtype=np.float64)

maxSubs = subsCount.max()
print(int(maxSubs))

#divide all the subscriber counts by the maximal number of subscribers
#so that the values are in the same range as the values produced by the tf-idf algorithm
subsCount = subsCount / maxSubs

#append the subscriber counts as one extra column of train_data
subsCount = np.reshape(subsCount, [len(subsCount), 1])
train_data = np.append(train_data, subsCount, 1)

del subsCount

print(train_data.shape)


52434012
(3822, 2001)

The labels

Each of our labels corresponds to a range of views. We have 8 labels:

  • 0 to 99 views
  • 100 to 999 views
  • 1'000 to 9'999 views
  • 10'000 to 99'999 views
  • 100'000 to 999'999 views
  • 1'000'000 to 9'999'999 views
  • 10'000'000 to 99'999'999 views
  • more than 99'999'999 views

In [62]:
nbr_labels = 8
nbr_video = len(videos_database['title'])

train_labels = np.zeros([train_data.shape[0],nbr_labels])

for i in range(nbr_video):
    views = int(videos_database['viewCount'][i])

    if views < 100:
        train_labels[i] = [1,0,0,0,0,0,0,0]
    elif views < 1000:
        train_labels[i] = [0,1,0,0,0,0,0,0]
    elif views < 10000:
        train_labels[i] = [0,0,1,0,0,0,0,0]
    elif views < 100000:
        train_labels[i] = [0,0,0,1,0,0,0,0]
    elif views < 1000000:
        train_labels[i] = [0,0,0,0,1,0,0,0]
    elif views < 10000000:
        train_labels[i] = [0,0,0,0,0,1,0,0]
    elif views < 100000000:
        train_labels[i] = [0,0,0,0,0,0,1,0]
    else:
        train_labels[i] = [0,0,0,0,0,0,0,1]
        
print('train_labels shape :', train_labels.shape)


train_labels shape : (3822, 8)
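
The same binning can also be written in a vectorized way; a minimal sketch of an equivalent version, assuming the variables defined above:

#equivalent vectorized binning (sketch): np.digitize maps each view count to its bin index
views = videos_database['viewCount'].astype(int).values
bins = [100, 1000, 10000, 100000, 1000000, 10000000, 100000000]
train_labels = np.eye(nbr_labels)[np.digitize(views, bins)]  #one-hot encode the bin indices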

Test set extraction

We randomly extract 100 items from our data set to construct our test set; these items are removed from the training data.


In [63]:
testset = 100

test_data = np.zeros([testset,train_data.shape[1]])
test_labels = np.zeros([testset,nbr_labels])

for i in range(len(test_data)):
    x = randint(0, len(train_data)-1)  #pick a random index among the remaining training items
    test_data[i] = train_data[x]
    test_labels[i] = train_labels[x]
    train_data = np.delete(train_data, x, axis=0)
    train_labels = np.delete(train_labels, x, axis=0)

print('train data shape  ', train_data.shape)
print('train labels shape', train_labels.shape)
print('test data shape   ', test_data.shape)
print('test labels shape ', test_labels.shape)


train data shape   (3722, 2001)
train labels shape (3722, 8)
test data shape    (100, 2001)
test labels shape  (100, 8)
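
As a side note, the same split can be done without the Python loop; a minimal sketch of an alternative, starting again from the full train_data:

#alternative split (sketch): draw all the test indices without replacement in one call
idx = np.random.choice(train_data.shape[0], size=testset, replace=False)
test_data, test_labels = train_data[idx], train_labels[idx]
train_data = np.delete(train_data, idx, axis=0)
train_labels = np.delete(train_labels, idx, axis=0)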

Neural Network Classifier

We tried different networks, with 2, 3 or even 4 layers, fully connected or not, and with different activations. In the end the classic two-layer fully connected network with ReLU activation works as well as or better than the others.

$$ y=\textrm{softmax}(\textrm{ReLU}(xW_1+b_1)W_2+b_2) $$

In [51]:
# Define computational graph (CG)
batch_size = testset     # batch size
d = train_data.shape[1]  # data dimensionality
nc = nbr_labels          # number of classes

# CG inputs
xin = tf.placeholder(tf.float32, [batch_size, d])
y_label = tf.placeholder(tf.float32, [batch_size, nc])

# 1st fully connected layer
nfc1 = 300
Wfc1 = tf.Variable(tf.truncated_normal([d, nfc1], stddev=tf.sqrt(5./tf.to_float(d+nfc1))))
bfc1 = tf.Variable(tf.zeros([nfc1]))
y = tf.matmul(xin, Wfc1) + bfc1

# ReLU activation
y = tf.nn.relu(y)

# dropout (the second argument is the keep probability: only 25% of the activations are kept)
y = tf.nn.dropout(y, 0.25)

# 2nd fully connected layer
nfc2 = nc
#nfc2 = 100
Wfc2 = tf.Variable(tf.truncated_normal([nfc1, nfc2], stddev=tf.sqrt(5./tf.to_float(nfc1+nc))))
bfc2 = tf.Variable(tf.zeros([nfc2]))
y = tf.matmul(y, Wfc2) + bfc2

#y = tf.nn.relu(y)

# 3rd layer (kept from our experiments with deeper networks)
#nfc3 = nc
#Wfc3 = tf.Variable(tf.truncated_normal([nfc2, nfc3], stddev=tf.sqrt(5./tf.to_float(nfc1+nc))))
#bfc3 = tf.Variable(tf.zeros([nfc3]))
#y = tf.matmul(y, Wfc3) + bfc3

# Softmax
y = tf.nn.softmax(y)

# Loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_label * tf.log(y), 1))

# L2 Regularization
reg_loss = tf.nn.l2_loss(Wfc1)
reg_loss += tf.nn.l2_loss(bfc1)
reg_loss += tf.nn.l2_loss(Wfc2)
reg_loss += tf.nn.l2_loss(bfc2)
reg_par = 4*1e-3
total_loss = cross_entropy + reg_par*reg_loss

# Optimization scheme
train_step = tf.train.AdamOptimizer(0.001).minimize(total_loss)

# Accuracy
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_label,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
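
Taking the log of a softmax by hand can become numerically unstable when a predicted probability gets close to zero. TensorFlow has a fused operation for this case; a minimal sketch of the alternative loss (the softmax line above would then be skipped, so that y holds the raw logits):

# more stable alternative (sketch): fused softmax + cross-entropy on the logits
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_label))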

In [52]:
# Run Computational Graph
n = train_data.shape[0]
indices = collections.deque()
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(10001):
    
    # Batch extraction
    if len(indices) < batch_size:
        indices.extend(np.random.permutation(n)) 
    idx = [indices.popleft() for i in range(batch_size)]
    batch_x, batch_y = train_data[idx,:], train_labels[idx]
    
    # Run CG for variable training
    _,acc_train,total_loss_o = sess.run([train_step,accuracy,total_loss], feed_dict={xin: batch_x, y_label: batch_y})
    
    # Run CG for test set
    if not i%100:
        print('\nIteration i=',i,', train accuracy=',acc_train,', loss=',total_loss_o)
        acc_test = sess.run(accuracy, feed_dict={xin: test_data, y_label: test_labels})
        print('test accuracy=',acc_test)


Iteration i= 0 , train accuracy= 0.15 , loss= 4.17036
test accuracy= 0.09

Iteration i= 100 , train accuracy= 0.37 , loss= 1.75347
test accuracy= 0.31

Iteration i= 200 , train accuracy= 0.42 , loss= 1.67264
test accuracy= 0.37

Iteration i= 300 , train accuracy= 0.52 , loss= 1.46247
test accuracy= 0.42

Iteration i= 400 , train accuracy= 0.57 , loss= 1.44062
test accuracy= 0.35

Iteration i= 500 , train accuracy= 0.58 , loss= 1.44749
test accuracy= 0.37

Iteration i= 600 , train accuracy= 0.5 , loss= 1.57928
test accuracy= 0.36

Iteration i= 700 , train accuracy= 0.56 , loss= 1.47601
test accuracy= 0.4

Iteration i= 800 , train accuracy= 0.52 , loss= 1.56849
test accuracy= 0.47

Iteration i= 900 , train accuracy= 0.55 , loss= 1.52185
test accuracy= 0.4

Iteration i= 1000 , train accuracy= 0.55 , loss= 1.52442
test accuracy= 0.43

Iteration i= 1100 , train accuracy= 0.58 , loss= 1.47568
test accuracy= 0.47

Iteration i= 1200 , train accuracy= 0.58 , loss= 1.44343
test accuracy= 0.47

Iteration i= 1300 , train accuracy= 0.53 , loss= 1.59163
test accuracy= 0.39

Iteration i= 1400 , train accuracy= 0.56 , loss= 1.50138
test accuracy= 0.47

Iteration i= 1500 , train accuracy= 0.6 , loss= 1.46945
test accuracy= 0.43

Iteration i= 1600 , train accuracy= 0.62 , loss= 1.41275
test accuracy= 0.45

Iteration i= 1700 , train accuracy= 0.59 , loss= 1.47348
test accuracy= 0.46

Iteration i= 1800 , train accuracy= 0.52 , loss= 1.56009
test accuracy= 0.49

Iteration i= 1900 , train accuracy= 0.56 , loss= 1.53852
test accuracy= 0.46

Iteration i= 2000 , train accuracy= 0.5 , loss= 1.61454
test accuracy= 0.45

Iteration i= 2100 , train accuracy= 0.59 , loss= 1.40033
test accuracy= 0.46

Iteration i= 2200 , train accuracy= 0.68 , loss= 1.32618
test accuracy= 0.45

Iteration i= 2300 , train accuracy= 0.49 , loss= 1.5699
test accuracy= 0.45

Iteration i= 2400 , train accuracy= 0.62 , loss= 1.51052
test accuracy= 0.43

Iteration i= 2500 , train accuracy= 0.57 , loss= 1.60779
test accuracy= 0.44

Iteration i= 2600 , train accuracy= 0.58 , loss= 1.46009
test accuracy= 0.45

Iteration i= 2700 , train accuracy= 0.54 , loss= 1.55583
test accuracy= 0.41

Iteration i= 2800 , train accuracy= 0.63 , loss= 1.47174
test accuracy= 0.42

Iteration i= 2900 , train accuracy= 0.55 , loss= 1.44453
test accuracy= 0.37

Iteration i= 3000 , train accuracy= 0.65 , loss= 1.39545
test accuracy= 0.46

Iteration i= 3100 , train accuracy= 0.66 , loss= 1.41468
test accuracy= 0.45

Iteration i= 3200 , train accuracy= 0.63 , loss= 1.46265
test accuracy= 0.45

Iteration i= 3300 , train accuracy= 0.61 , loss= 1.4307
test accuracy= 0.45

Iteration i= 3400 , train accuracy= 0.6 , loss= 1.47217
test accuracy= 0.43

Iteration i= 3500 , train accuracy= 0.55 , loss= 1.50845
test accuracy= 0.45

Iteration i= 3600 , train accuracy= 0.63 , loss= 1.43007
test accuracy= 0.38

Iteration i= 3700 , train accuracy= 0.63 , loss= 1.43998
test accuracy= 0.43

Iteration i= 3800 , train accuracy= 0.61 , loss= 1.45348
test accuracy= 0.44

Iteration i= 3900 , train accuracy= 0.51 , loss= 1.6591
test accuracy= 0.43

Iteration i= 4000 , train accuracy= 0.59 , loss= 1.50675
test accuracy= 0.44

Iteration i= 4100 , train accuracy= 0.6 , loss= 1.48884
test accuracy= 0.42

Iteration i= 4200 , train accuracy= 0.61 , loss= 1.56112
test accuracy= 0.41

Iteration i= 4300 , train accuracy= 0.58 , loss= 1.53728
test accuracy= 0.43

Iteration i= 4400 , train accuracy= 0.67 , loss= 1.40599
test accuracy= 0.42

Iteration i= 4500 , train accuracy= 0.61 , loss= 1.56555
test accuracy= 0.46

Iteration i= 4600 , train accuracy= 0.6 , loss= 1.47585
test accuracy= 0.43

Iteration i= 4700 , train accuracy= 0.63 , loss= 1.46315
test accuracy= 0.4

Iteration i= 4800 , train accuracy= 0.45 , loss= 1.58564
test accuracy= 0.45

Iteration i= 4900 , train accuracy= 0.51 , loss= 1.61028
test accuracy= 0.45

Iteration i= 5000 , train accuracy= 0.56 , loss= 1.47904
test accuracy= 0.41

Iteration i= 5100 , train accuracy= 0.63 , loss= 1.42922
test accuracy= 0.46

Iteration i= 5200 , train accuracy= 0.6 , loss= 1.47226
test accuracy= 0.42

Iteration i= 5300 , train accuracy= 0.63 , loss= 1.43026
test accuracy= 0.43

Iteration i= 5400 , train accuracy= 0.64 , loss= 1.50048
test accuracy= 0.4

Iteration i= 5500 , train accuracy= 0.6 , loss= 1.52189
test accuracy= 0.37

Iteration i= 5600 , train accuracy= 0.6 , loss= 1.46665
test accuracy= 0.46

Iteration i= 5700 , train accuracy= 0.64 , loss= 1.39556
test accuracy= 0.45

Iteration i= 5800 , train accuracy= 0.6 , loss= 1.52636
test accuracy= 0.46

Iteration i= 5900 , train accuracy= 0.67 , loss= 1.50692
test accuracy= 0.45

Iteration i= 6000 , train accuracy= 0.51 , loss= 1.50573
test accuracy= 0.39

Iteration i= 6100 , train accuracy= 0.62 , loss= 1.45463
test accuracy= 0.36

Iteration i= 6200 , train accuracy= 0.65 , loss= 1.42356
test accuracy= 0.41

Iteration i= 6300 , train accuracy= 0.56 , loss= 1.48915
test accuracy= 0.39

Iteration i= 6400 , train accuracy= 0.62 , loss= 1.46937
test accuracy= 0.41

Iteration i= 6500 , train accuracy= 0.64 , loss= 1.48078
test accuracy= 0.41

Iteration i= 6600 , train accuracy= 0.57 , loss= 1.46041
test accuracy= 0.43

Iteration i= 6700 , train accuracy= 0.62 , loss= 1.37864
test accuracy= 0.45

Iteration i= 6800 , train accuracy= 0.64 , loss= 1.45754
test accuracy= 0.43

Iteration i= 6900 , train accuracy= 0.62 , loss= 1.46096
test accuracy= 0.38

Iteration i= 7000 , train accuracy= 0.64 , loss= 1.44749
test accuracy= 0.41

Iteration i= 7100 , train accuracy= 0.59 , loss= 1.48538
test accuracy= 0.37

Iteration i= 7200 , train accuracy= 0.59 , loss= 1.54102
test accuracy= 0.48

Iteration i= 7300 , train accuracy= 0.65 , loss= 1.43455
test accuracy= 0.42

Iteration i= 7400 , train accuracy= 0.58 , loss= 1.59779
test accuracy= 0.37

Iteration i= 7500 , train accuracy= 0.62 , loss= 1.47773
test accuracy= 0.42

Iteration i= 7600 , train accuracy= 0.65 , loss= 1.41622
test accuracy= 0.41

Iteration i= 7700 , train accuracy= 0.6 , loss= 1.51204
test accuracy= 0.45

Iteration i= 7800 , train accuracy= 0.68 , loss= 1.43334
test accuracy= 0.44

Iteration i= 7900 , train accuracy= 0.64 , loss= 1.41955
test accuracy= 0.43

Iteration i= 8000 , train accuracy= 0.6 , loss= 1.51081
test accuracy= 0.43

Iteration i= 8100 , train accuracy= 0.58 , loss= 1.47814
test accuracy= 0.41

Iteration i= 8200 , train accuracy= 0.55 , loss= 1.51721
test accuracy= 0.46

Iteration i= 8300 , train accuracy= 0.56 , loss= 1.55523
test accuracy= 0.45

Iteration i= 8400 , train accuracy= 0.65 , loss= 1.45241
test accuracy= 0.43

Iteration i= 8500 , train accuracy= 0.65 , loss= 1.44941
test accuracy= 0.46

Iteration i= 8600 , train accuracy= 0.6 , loss= 1.37601
test accuracy= 0.42

Iteration i= 8700 , train accuracy= 0.55 , loss= 1.48717
test accuracy= 0.45

Iteration i= 8800 , train accuracy= 0.59 , loss= 1.4971
test accuracy= 0.38

Iteration i= 8900 , train accuracy= 0.59 , loss= 1.55226
test accuracy= 0.43

Iteration i= 9000 , train accuracy= 0.6 , loss= 1.5388
test accuracy= 0.41

Iteration i= 9100 , train accuracy= 0.62 , loss= 1.50196
test accuracy= 0.44

Iteration i= 9200 , train accuracy= 0.58 , loss= 1.43555
test accuracy= 0.42

Iteration i= 9300 , train accuracy= 0.63 , loss= 1.4531
test accuracy= 0.46

Iteration i= 9400 , train accuracy= 0.63 , loss= 1.41254
test accuracy= 0.41

Iteration i= 9500 , train accuracy= 0.57 , loss= 1.51985
test accuracy= 0.41

Iteration i= 9600 , train accuracy= 0.64 , loss= 1.42211
test accuracy= 0.43

Iteration i= 9700 , train accuracy= 0.6 , loss= 1.49074
test accuracy= 0.47

Iteration i= 9800 , train accuracy= 0.63 , loss= 1.47285
test accuracy= 0.44

Iteration i= 9900 , train accuracy= 0.67 , loss= 1.33962
test accuracy= 0.41

Iteration i= 10000 , train accuracy= 0.67 , loss= 1.44293
test accuracy= 0.41

Results

random dataset:

  • train accuracy: ~0.7
  • test accuracy: ~0.32

"cars" dataset:

  • train accuracy: ~0.8
  • test accuracy: ~0.4

"animals" dataset:

  • train accuracy: ~0.67
  • test accuracy: ~0.41

We can see that we get better results when we use videos from a given theme. Unfortunately we could not use a really big dataset because of the limited memory of the virtual machine. These are good results considering that this neural network does not take the thumbnail of the video into account!

Without the L2 regularization and the dropout we can overfit until the train accuracy reaches 0.95, but the test accuracy then drops.