In [1]:
%matplotlib inline
import os
import csv
from itertools import product
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data_path = '../../data'
tmp_path = '../../tmp'
In [2]:
feature_path = os.path.join(data_path, 'evaluation/semcor/tsvetkov_semcor.csv')
subset = pd.read_csv(feature_path, index_col=0)
subset.columns = [c.replace('semcor.', '') for c in subset.columns]
subset.set_index('words', inplace=True)
subset = subset.T
In [3]:
size = 300
fname = 'embeddings/glove.6B.{}d.txt'.format(size)
embedding_path = os.path.join(data_path, fname)
embeddings = pd.read_csv(embedding_path, sep=' ', header=None, index_col=0, quoting=csv.QUOTE_NONE).T
In [4]:
def qvec(features, embeddings):
    """
    Returns correlations between columns of `features` and `embeddings`.
    The aligned feature is the one with the highest correlation.
    The qvec score is the sum of correlations of aligned features.
    """
    common_words = embeddings.columns.intersection(features.columns)
    S = features[common_words]
    X = embeddings[common_words]
    correlations = pd.DataFrame({i: X.corrwith(S.iloc[i], axis=1) for i in range(len(S))})
    correlations.columns = S.index
    return correlations
In [46]:
correlations = qvec(subset, embeddings)
V = len(embeddings.columns.intersection(subset.columns))
In [47]:
correlations.head()
Out[47]:
In the dataframe below, the index is the dimension of the learnt embedding, 'feature' is the name of the linguistic feature aligned with that dimension, and 'max_corr' is the correlation between the dimension and the feature. The sum of the 'max_corr' column is the qvec score.
39 dimensions pick out 'noun.person', 37 'noun.artifact', 19 'noun.body', and 15 'verb.change'.
In [48]:
alignments = pd.DataFrame(correlations.idxmax(axis=1))
alignments.columns = ['feature']
alignments['max_corr'] = correlations.max(axis=1)
alignments.sort_values(by='max_corr', ascending=False).head(10)
Out[48]:
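As a sanity check, the qvec score and the per-feature alignment counts quoted above can be computed directly from `alignments`:
In [ ]:
# qvec score: sum of each dimension's highest correlation
print(alignments['max_corr'].sum())
# number of dimensions aligned with each feature
print(alignments['feature'].value_counts().head())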
QVEC looks at 41 correlation coefficients per dimension (one for each linguistic feature) and finds the maximum. Here, I show the scatterplot for the highest correlation.
A consistent observation is that the distributions of the linguistic features are strongly peaked at 0. That is, almost all words have 0 for most features; sometimes there is some mass at 1. This suggests to me that the linguistic features being used are not appropriate.
In [8]:
common_words = embeddings.columns.intersection(subset.columns)
S = subset[common_words]
X = embeddings[common_words]
def plot(i, j, X=X, S=S):
    """Plot ith dimension of embeddings against feature j."""
    x = X.loc[i]
    s = S.loc[j]
    sns.jointplot(x, s);
In [9]:
plot(300,'noun.person')
In sum: each dimension looks pretty normal, but the formal tests I'm using suggest otherwise. Most are centered at 0 with std around 0.4.
From the marginal distribution plots above, it looks like each dimension is normally distributed. I don't know whether that's done deliberately during training or whether it just turns out that way.
In [10]:
sns.distplot(X.loc[89]);
In [11]:
fig, axs = plt.subplots(1,2)
vector = X.loc[1]
sm.qqplot(vector, ax=axs[0]);
stats.probplot(vector, plot=axs[1]);
In [12]:
def do_kstest(i):
    """Return the p value of a KS test of the ith dimension against a fitted normal."""
    vector = X.loc[i]
    ybar = vector.mean()
    s = vector.std()
    result = stats.kstest(vector, cdf='norm', args=(ybar, s))
    return result.pvalue
p_values = [do_kstest(i) for i in X.index]
sns.distplot(p_values);
In [13]:
def do_shapirotest(i):
    """Return the p value of a Shapiro-Wilk test of normality for the ith dimension."""
    vector = X.loc[i]
    result = stats.shapiro(vector)
    return result[1]
p_values = [do_shapirotest(i) for i in X.index]
sns.distplot(p_values);
In [14]:
def do_lillieforstest(i):
    """Return the p value of a Lilliefors test of normality for the ith dimension."""
    vector = X.loc[i]
    result = sm.stats.lilliefors(vector)
    return result[1]
In [15]:
p_values = [do_lillieforstest(i) for i in X.index]
sns.distplot(p_values);
In [16]:
fig, axs = plt.subplots(1,2)
sns.distplot(X.mean(axis=1), ax=axs[0]);
sns.distplot(X.std(axis=1), ax=axs[1]);
In [17]:
sns.distplot(X['bird']);
In [18]:
fig, axs = plt.subplots(1,2)
sns.distplot(X.mean(), ax=axs[0]);
sns.distplot(X.std(), ax=axs[1]);
In [19]:
S.mean(axis=1).sort_values(ascending=False).head()
Out[19]:
In [20]:
fig, axs = plt.subplots(ncols=4, figsize=(10, 4), sharey=True)
sns.distplot(S.loc['noun.artifact'], ax=axs[0], kde=False);
sns.distplot(S.loc['noun.person'], ax=axs[1], kde=False);
sns.distplot(S.loc['noun.act'], ax=axs[2], kde=False);
sns.distplot(S.loc['noun.communication'], ax=axs[3], kde=False);
In [21]:
proportions = S.astype(bool).sum(axis=1) / len(S.columns)
print(proportions.sort_values(ascending=False).head())
proportions.describe()
Out[21]:
Knowing that the dimensions of the learnt embeddings are normally distributed and that the features are strongly bimodal, what is the best way to measure their correlation? It's clear that Pearson's $r$ and Spearman's $\rho$ are not appropriate because of the high number of ties.
I see two broad approaches: treat the 0's as missing values, or binarize the features and compare each dimension's distribution for words with and without a feature. Both are tried below.
In sum: Neither is very insightful. I need to use different (less sparse) features.
In sum: Removing 0's seems to help, but is not principled. It picks out one extremely rare feature. The fact that removing 0's helps tells me the presence of such rare features is a problem.
I changed the 0's to missing values and then used the usual QVEC code from above. I checked the source of `corrwith`, and it looks like it ignores missing values, which is what I want.
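A quick toy check of that reading of `corrwith` (a sketch; the small frame and its values are made up for illustration):
In [ ]:
# Toy check that corrwith drops missing values pairwise
toy = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=['dim0'])
feat = pd.Series([0.0, np.nan, 1.0, 1.0])
with_nan = toy.corrwith(feat, axis=1)['dim0']
dropped = toy.loc['dim0', [0, 2, 3]].corr(feat.dropna())
print(with_nan, dropped)  # the two agree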
The weird thing is that one feature, 'noun.motive', is the most highly correlated feature for 66 of the 300 dimensions; most of the best-aligned features are 'noun.motive'. Previously, it didn't appear at all, and it has only 11 nonzero entries.
In [22]:
S_no_zeroes = S[(S != 0)]  # masking turns the 0's into NaN
S_no_zeroes.head()
Out[22]:
In [23]:
tmp = qvec(S_no_zeroes, embeddings)
tmp.head()
Out[23]:
In [24]:
alignments = pd.DataFrame(tmp.idxmax(axis=1))
alignments.columns = ['feature']
alignments['max_corr'] = tmp.max(axis=1)
alignments.sort_values(by='max_corr', ascending=False).head()
Out[24]:
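The 'noun.motive' claims above can be checked directly from the objects already defined:
In [ ]:
# how often each feature is now the best-aligned one
print(alignments['feature'].value_counts().head())
# 'noun.motive' is extremely sparse: count its nonzero entries
print(S.loc['noun.motive'].astype(bool).sum())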
The following dimensions and features were aligned previously:
In [25]:
plot(122, 'noun.person', S=S_no_zeroes);
In [26]:
plot(255, 'noun.person', S=S_no_zeroes);
In sum: I binarize S, so the linguistic features are now presence/absence. For each dimension-feature pair, you can look at the distribution of the dimension for words with and without that feature. Dimension-feature pairs that are aligned by the original method show separation. But quantifying the separation across all dimension-feature pairs is problematic with either a t test or a KS test: you get seemingly significant results for unaligned pairs. I have not done any multiple test corrections. This approach does not seem as promising as changing the features to less sparse ones.
One suggestion here is to compare the means directly. Below, I plot the distribution of a dimension split by words that have a given feature and those that don't. For dimension-feature pairs that were identified above as aligned, this plot shows good separation; for non-aligned pairs, there is none. However, when I perform a two-tailed t test for a difference of means between the two groups, I get "significant" results even when there is no visible difference, so I cannot blindly trust the t test here. To show this, I perform all $41 \times 300$ t tests and plot the p values. The plot suggests most pairs are significantly different, in line with my earlier eyeball checks.
In [27]:
def do_ttest(i, feature, X=X, S=S):
    """Do two sample t test for difference of means between the ith dimension
    of words with feature and those without."""
    dim = X.loc[i]
    have = S.loc[feature].astype(bool)
    result = stats.ttest_ind(dim[have], dim[~have])
    return result[1]
In [28]:
def do_2kstest(i, feature, X=X, S=S):
    """Returns p value from 2 sided KS test that the ith dimension from words with
    feature and those from words without feature come from the same distribution."""
    dim = X.loc[i]
    have = S.loc[feature].astype(bool)
    result = stats.ks_2samp(dim[have], dim[~have])
    return result[1]
In [29]:
def plot_by_presence(i, feature, X=X, S=S):
    """Plot distribution of the ith dimension of X for those that have
    feature and those that don't."""
    dim = X.loc[i]
    have = S.loc[feature].astype(bool)
    has_label = feature
    has_not_label = 'no {}'.format(feature)
    sns.distplot(dim[have], label=has_label);
    sns.distplot(dim[~have], label=has_not_label);
    t_test = do_ttest(i, feature, X, S)
    ks_test = do_2kstest(i, feature, X, S)
    plt.legend();
    plt.title('t: {}\nks: {}'.format(t_test, ks_test))
This dimension-feature pair was aligned in the original method using correlation between raw values. I see good separation between the distributions, which is consistent with a relationship between the variables. The t test result strongly suggests the population means are different (but I can see that from the plot).
In [30]:
plot_by_presence(255, 'noun.person')
This dimension-feature pair is weakly negatively correlated using the original method ($r=-0.06$). Consistent with that, the distributions overlap a lot. However, a t test gives a small p value, suggesting the population means are different.
In [31]:
plot_by_presence(1, 'noun.body')
This is the distribution of p values from a t test for all dimension-feature pairs. I think it shows the inappropriateness of a t test more than anything else.
In [32]:
tmp = [do_ttest(i, f) for (i, f) in product(X.index, S.index)]
sns.distplot(tmp);
In [33]:
tmp = [do_2kstest(i, f) for (i, f) in product(X.index, S.index)]
sns.distplot(tmp);
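I have not applied any multiple-test correction above; as a rough check, these p values could be adjusted, e.g. with Benjamini-Hochberg (a sketch using statsmodels' `multipletests`):
In [ ]:
from statsmodels.stats.multitest import multipletests
# Benjamini-Hochberg FDR correction over all dimension-feature pairs
reject, p_adj, _, _ = multipletests(tmp, alpha=0.05, method='fdr_bh')
print(reject.mean())  # fraction of pairs still "significant" after correction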
In [34]:
# fraction of zero entries in the feature matrix
(subset.size - np.count_nonzero(subset.values)) / subset.size
Out[34]:
In [36]:
sns.distplot(correlations.values.flatten());
The following summary shows that the $41 \times 300$ correlations are centered at 0 with std 0.05. The largest is 0.32 and the smallest is -0.30. These don't seem like very high numbers; the 75th percentile is 0.03. Looking back at the histogram above, it's obvious that these dimensions are not highly correlated with these linguistic features.
Another important point is that the distribution of correlations is symmetric. So using the max, rather than the max absolute value, seems arbitrary: it doesn't allow for reversed dimensions.
In [37]:
pd.Series(correlations.values.flatten()).describe()
Out[37]:
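Given that symmetry, here is a quick look at what aligning on the absolute correlation would give instead (a sketch):
In [ ]:
# align each dimension with the feature of largest absolute correlation
abs_corr = correlations.abs()
abs_alignments = pd.DataFrame({'feature': abs_corr.idxmax(axis=1),
                               'max_abs_corr': abs_corr.max(axis=1)})
# compare the score under each convention
print(correlations.max(axis=1).sum(), abs_alignments['max_abs_corr'].sum())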
In [38]:
sns.distplot(correlations.max(axis=1));
In the heatmap below, nothing really sticks out. It does not look like these features are captured well by these dimensions.
In [14]:
sns.heatmap(correlations);
In [53]:
subset.index.difference(alignments['feature'])
Out[53]:
NB: This stinks with the whole embedding matrix, but looks promising with smaller vocab.
In the paper, they give the top K words for a dimension. The code prints, for each dimension, the dimension number, the aligned linguistic feature, the correlation between the two, and the top k words associated with the dimension. I understand the last bit to mean "the k words with the highest values in the dimension".
Importantly, it matters whether you look for the top K words in the whole embedding matrix or only in the reduced-vocab matrix X (the one you have linguistic features for); see the sketch below. You get much better results when you use X. Ideally, the method would generalize to the larger vocab. Clearly, X will have words with much higher frequency, which may give more sensible (stable?) results.
How can I assess whether these top K words are "correct" or not?
In [112]:
def highest_value(i, k=20, X=X):
    """Return the top `k` words with highest values for ith dimension in X."""
    dim = X.loc[i]
    return dim.nlargest(n=k).index
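To see the vocabulary effect described above, the same helper can be run against the full embedding matrix (a sketch; dimension 255 is one of the aligned dimensions from earlier):
In [ ]:
i = 255
print(highest_value(i, X=X))           # reduced vocab (words with features)
print(highest_value(i, X=embeddings))  # whole GloVe vocab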
In [113]:
k = 10
largest = pd.DataFrame([highest_value(i, k) for i in alignments.index], index=alignments.index)
top_k = pd.merge(alignments, largest, left_index=True, right_index=True)
top_k.sort_values(by='max_corr', ascending=False).head()
Out[113]:
In [120]:
def get_dims(feature, df=top_k):
    """Return the dimensions aligned with `feature` in `df`."""
    return df[df['feature']==feature].sort_values(by='max_corr', ascending=False)
get_dims('noun.time').head()
Out[120]: