In [3]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
Read input from JSON records.
In [2]:
lines = []
for part in ("00000", "00001"):
    with open("../output/2017-01-03_13.57.34/part-%s" % part) as f:
        lines += f.readlines()
print(lines[0])
Create a pandas DataFrame
In [4]:
import pandas as pd
df = pd.read_json('[%s]' % ','.join(lines))
print(df.info())
df.head()
Out[4]:
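As an aside, since each part file contains one JSON record per line, pandas can also parse the files directly as JSON Lines instead of joining the strings by hand. This is a minimal sketch using the same part files; df_alt is just an illustrative name:
In [ ]:
# Read each part file as JSON Lines and stack the results into one DataFrame.
parts = ["../output/2017-01-03_13.57.34/part-%s" % part for part in ("00000", "00001")]
df_alt = pd.concat([pd.read_json(path, lines=True) for path in parts], ignore_index=True)
df_alt.info()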
Create new features
In [5]:
df["has_qmark"] = df.Body.apply(lambda s: "?" in s)
df["num_qmarks"] = df.Body.apply(lambda s: s.count("?"))
df["body_length"] = df.Body.apply(lambda s: len(s))
In [6]:
df
Out[6]:
Filter for PostTypeId == 1 or PostTypeId == 2
In [7]:
df = df.loc[df.PostTypeId.isin([1, 2]), :]
df = df.reset_index(drop=True)
df.head()
Out[7]:
In [8]:
n_questions = np.sum(df.PostTypeId == 1)
n_answers = np.sum(df.PostTypeId == 2)
print("No. questions {0} / No. answers {1}".format(n_questions, n_answers))
Are any relationships apparent between PostTypeId and the obvious features in the raw data?
In [9]:
df.plot.scatter(x="num_qmarks",y="PostTypeId")
df.plot.scatter(x="body_length",y="PostTypeId")
Out[9]:
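The same comparison can be made numerically with a grouped summary of the derived features per post type (a quick sketch; 1 = question, 2 = answer):
In [ ]:
# Average of each derived feature per post type; the mean of has_qmark is the fraction of posts containing "?".
df.groupby("PostTypeId")[["num_qmarks", "body_length", "has_qmark"]].mean()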
Can PostTypeId be predicted from the post body? Here, we try the linear RidgeClassifier and the nonlinear RandomForestClassifier and compare the accuracy on the training set to the accuracy on the test set.
From the results, we see that the RandomForestClassifier is more accurate than the linear model, but it is overfitting the data. The overfitting is likely to improve with more training examples, so let's choose the RF classifier.
In [10]:
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit
X = df.loc[:, ['num_qmarks', 'body_length']]
y = df.loc[:, 'PostTypeId']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
classifiers = [("Ridge", RidgeClassifier()), ("RandomForest", RandomForestClassifier())]
for name, classifier in classifiers:
    classifier.fit(X_train, y_train)
    print(name + " " + "-" * (60 - len(name)))
    # .score() on a classifier returns mean accuracy, not R^2.
    print("acc_train: {0}, acc_test: {1}".format(classifier.score(X_train, y_train),
                                                 classifier.score(X_test, y_test)))
    print()
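The claim that more training examples should shrink the train/test gap can be checked with a learning curve. This is a minimal sketch, assuming the X (num_qmarks, body_length) and y defined in the cell above:
In [ ]:
from sklearn.model_selection import learning_curve

# Score the random forest on progressively larger training subsets.
train_sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(random_state=42), X, y,
    cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

plt.plot(train_sizes, train_scores.mean(axis=1), label="train accuracy")
plt.plot(train_sizes, test_scores.mean(axis=1), label="cross-validated accuracy")
plt.xlabel("number of training examples")
plt.ylabel("accuracy")
plt.legend()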
In [11]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
class FSTransformer(BaseEstimator, TransformerMixin):
    """
    Selects the named feature columns from a DataFrame and returns them as an array.
    """
    def __init__(self, features):
        self.features = features

    def fit(self, X, y=None):
        return self

    def transform(self, df):
        return df[self.features].values


class CountVecTransformer(BaseEstimator, TransformerMixin):
    """
    Wraps CountVectorizer to build a dense word-count matrix from the Body column.
    """
    def __init__(self):
        self.vectorizer = CountVectorizer(binary=False)

    def fit(self, df, y=None):
        self.vectorizer.fit(df.Body)
        return self

    def transform(self, df):
        return self.vectorizer.transform(df.Body).toarray()
In [12]:
df.head()
Out[12]:
In [13]:
fst = FSTransformer(["has_qmark"])
fst.transform(df)
Out[13]:
In [14]:
CountVecTransformer().fit_transform(df)
Out[14]:
Let's use the word frequency vectors in combination with the obvious features we created previously to try to predict PostTypeId.
In [15]:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.metrics import f1_score
model_pipe = Pipeline([
    ("features", FeatureUnion([
        ("derived", FSTransformer(["has_qmark", "num_qmarks", "body_length"])),
        ("count_vec", CountVecTransformer())
    ])),
    ("clf", RandomForestClassifier())
])
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=42)
X = df
y = df.PostTypeId
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model_pipe.fit(X_train, y_train)
    acc_train = model_pipe.score(X_train, y_train)
    acc_test = model_pipe.score(X_test, y_test)
    y_pred = model_pipe.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    print("acc_train: {0} acc_test: {1} f1: {2}".format(acc_train, acc_test, f1))
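For context on how many features that model is using, the width of the FeatureUnion output can be inspected directly; beyond the three derived columns, every column corresponds to a word in the count vectorizer's vocabulary (a sketch, assuming the model_pipe and X_train left over from the cell above):
In [ ]:
# Rows are posts; columns are the 3 derived features plus one column per vocabulary word.
combined = model_pipe.named_steps["features"].transform(X_train)
print(combined.shape)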
Since we're overfitting, we can try reducing the total number of features. One approach is to perform the $\chi^2$ statistical test of independence on each feature with respect to the label (PostTypeId) and remove the features that are most independent of the label. Here, we put SelectKBest into the model pipeline and keep only the 10 features most dependent on the label.
In [16]:
from sklearn.feature_selection import SelectKBest, chi2
model_pipe = Pipeline([
    ("features", FeatureUnion([
        ("derived", FSTransformer(["has_qmark", "num_qmarks", "body_length"])),
        ("count_vec", CountVecTransformer())
    ])),
    ("best_features", SelectKBest(chi2, k=10)),
    ("clf", RandomForestClassifier())
])
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=42)
X = df
y = df.PostTypeId
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model_pipe.fit(X_train, y_train)
    acc_train = model_pipe.score(X_train, y_train)
    acc_test = model_pipe.score(X_test, y_test)
    y_pred = model_pipe.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    print("acc_train: {0} acc_test: {1} f1: {2}".format(acc_train, acc_test, f1))
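It is also informative to see which features SelectKBest kept. The selector's boolean mask can be mapped back onto the derived-feature names followed by the vectorizer's vocabulary, which is the column order FeatureUnion produces (a sketch, assuming the fitted model_pipe from the cell above):
In [ ]:
# Combined feature names in the order the FeatureUnion emits them.
derived_names = ["has_qmark", "num_qmarks", "body_length"]
count_vec = model_pipe.named_steps["features"].transformer_list[1][1].vectorizer
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)
feature_names = np.array(derived_names + vocab)

# Boolean mask of the k columns SelectKBest retained.
mask = model_pipe.named_steps["best_features"].get_support()
print(feature_names[mask])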
So the generalization of the model improved, but what is the right number of features to keep? For this, we can use cross-validation. The class GridSearchCV allows us to vary a hyperparameter of the model and compute the cross-validated score for each candidate value (or combination of values).
In [18]:
from sklearn.model_selection import GridSearchCV
modelCV = GridSearchCV(model_pipe, {"best_features__k": [3 ** i for i in range(1, 7)]})
modelCV.fit(X, y)
# cv_results_ holds one entry per candidate k with its mean cross-validated score.
cv_accuracy = pd.DataFrame({
    "best_features__k": list(modelCV.cv_results_["param_best_features__k"]),
    "mean_validation_score": modelCV.cv_results_["mean_test_score"],
})
cv_accuracy.plot(x="best_features__k", y="mean_validation_score")
cv_accuracy
Out[18]:
We can refine our range of hyperparameter values to home in on the best number of features.
In [19]:
modelCV = GridSearchCV(model_pipe, {"best_features__k": list(range(80, 120, 10))})
modelCV.fit(X, y)
cv_accuracy = pd.DataFrame({
    "best_features__k": list(modelCV.cv_results_["param_best_features__k"]),
    "mean_validation_score": modelCV.cv_results_["mean_test_score"],
})
cv_accuracy.plot(x="best_features__k", y="mean_validation_score")
cv_accuracy
Out[19]:
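By default, GridSearchCV refits the pipeline on all of the data with the best-scoring parameters, so the chosen k and its mean cross-validated score can be read off directly (a quick sketch):
In [ ]:
# Best value of k found by the grid search and its mean cross-validated accuracy.
print(modelCV.best_params_)
print(modelCV.best_score_)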
Finally, we can inspect the confusion matrix to see what kinds of errors the model makes. scikit-learn's convention is that rows are actual classes and columns are predicted classes, so with the question class (PostTypeId = 1) treated as positive, the layout is:
| | Predicted question (1) | Predicted answer (2) |
|---|---|---|
| Actual question (1) | True Positive | False Negative |
| Actual answer (2) | False Positive | True Negative |
In [24]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, modelCV.predict(X_test))
# Normalize each column: entries show, for each predicted class, the fraction coming from each actual class.
cm = cm / cm.sum(axis=0)
print(cm)
In [23]:
cm = confusion_matrix(y_test, modelCV.predict(X_test))
cm
Out[23]:
In [25]:
plt.imshow(cm, interpolation="nearest", cmap="Blues")
plt.colorbar()
Out[25]:
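As a final check, classification_report summarizes the same predictions as per-class precision, recall, and F1 (a sketch, assuming y_test, X_test, and modelCV from the cells above):
In [ ]:
from sklearn.metrics import classification_report

# Precision, recall, and F1 for questions (1) and answers (2).
print(classification_report(y_test, modelCV.predict(X_test)))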