State-of-the-art for word-level intrinsic eval

Bruni, Tran & Baroni (multimodal distributional semantics):

  • WS353: 0.7
  • MEN: 0.77 (TODO: find more recent papers)

RG: $\rho = 0.89$, per http://aclweb.org/aclwiki/index.php?title=RG-65_Test_Collection_(State_of_the_art)

Test: intrinsic eval on noise-corrupted vectors

Add noise to the vectors as usual, then evaluate intrinsically.
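For concreteness, a minimal sketch of the corruption step. The function name and the uniform noise model are my assumptions here; the actual implementation is whatever intrinsic_eval_words.py does.

import numpy as np

def add_noise(matrix, noise_level, seed=0):
    # Assumed noise model: elementwise uniform noise in [-noise_level, noise_level].
    rng = np.random.RandomState(seed)
    return matrix + rng.uniform(-noise_level, noise_level, size=matrix.shape)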

Coverage

Vectors trained on less data will have lower coverage of types, so they may not be able to provide an answer for all word pairs in the test sets. I handle this in two ways:

  • relaxed: OOV items are ignored. This may give an unfair advantage to vectors trained on less data, because it forgives their poor coverage.
  • strict: OOV items are kept and scored as if the model predicted a similarity of 0 (see the sketch after this list).
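In code, the two modes amount to something like this (the function and its signature are mine, not the thesisgenerator API; model_sim is assumed to return None when either word is OOV):

from scipy.stats import spearmanr

def evaluate(pairs, gold, model_sim, kind='strict'):
    preds, golds = [], []
    for (w1, w2), g in zip(pairs, gold):
        sim = model_sim(w1, w2)
        if sim is None:  # at least one word is OOV
            if kind == 'relaxed':
                continue  # relaxed: skip the pair entirely
            sim = 0.0  # strict: score the pair as if the model predicted 0
        preds.append(sim)
        golds.append(g)
    return spearmanr(preds, golds)  # (rho, p-value)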

PoS tags

Words in 3 of the 4 datasets are provided without PoS tags (only MEN provides them). In my work cat/J, cat/N and cat/V have different vectors, so before evaluation I need to map cat to one of these three versions. I use the first PoS-tagged version found in the vocabulary, trying tags in the order J, N, V (see the sketch below). Alternatively, I could have ignored test-set words that map to multiple PoS tags.
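The back-off amounts to something like this (attach_pos is a hypothetical helper, not code from my pipeline):

def attach_pos(word, vocab, tag_order='JNV'):
    # Map a bare word to the first PoS-tagged version present in the vectors.
    for tag in tag_order:
        candidate = '%s/%s' % (word, tag)
        if candidate in vocab:
            return candidate
    return None  # OOV under every tag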

Questions

  • Do people use the strict version?
  • What do I do with multiple possible PoS tags per word in the WS353 data?

Running the experiment

Run thesisgenerator/scripts/intrinsic_eval_words.py first and make sure the results end up in the right place.


In [1]:
%cd ~/NetBeansProjects/ExpLosion/
from notebooks.common_imports import *

# monkey-patch seaborn to use our own bootstrap implementation
sns.timeseries.algo.bootstrap = my_bootstrap
sns.categorical.bootstrap = my_bootstrap


/Volumes/LocalDataHD/m/mm/mmb28/NetBeansProjects/ExpLosion

In [2]:
# per-fold intrinsic-eval results on noise-corrupted vectors
noise_df = pd.read_csv('../thesisgenerator/intrinsic_noise_word_level.csv', index_col=0)
noise_df['Dataset'] = noise_df.test.map(str.upper)  # nicer facet titles
noise_df = noise_df.drop('test', axis=1)
noise_df.head()


Out[2]:
           vect  noise     kind      corr          pval  folds Dataset
0  w2v-giga-100      0   strict  0.463833  3.120347e-20      0   WS353
1  w2v-giga-100      0  relaxed  0.553462  1.340141e-28      0   WS353
2  w2v-giga-100      0   strict  0.466006  1.975560e-20      1   WS353
3  w2v-giga-100      0  relaxed  0.533901  2.252053e-26      1   WS353
4  w2v-giga-100      0   strict  0.382868  9.066415e-14      2   WS353

In [3]:
with sns.axes_style('whitegrid'):
    g = sns.factorplot(x='noise', hue='vect', col='Dataset', y='corr', col_wrap=2,
                       data=noise_df[noise_df.kind == 'strict'], kind='point',
                       order=sorted(noise_df.noise.unique()), aspect=1.5,
                       col_order=['MEN', 'WS353', 'RG', 'MC']);
g.set_ylabels('Spearman $\\rho$')
sns.despine(left=True, bottom=True)
for ax in g.axes.flat:
    sparsify_axis_labels(ax)
    ax.axhline(0, c='k')




In [4]:
with sns.color_palette("cubehelix", 4):
    g = sns.FacetGrid(noise_df[noise_df.kind == 'strict'], col='Dataset', col_wrap=2,
                      col_order=['MEN', 'WS353', 'RG', 'MC'])
    g.map_dataframe(tsplot_for_facetgrid, time='noise', value='corr', condition='vect',
                    unit='folds', ci=68).add_legend()
for ax in g.axes.flat:
    sparsify_axis_labels(ax)

g.set_ylabels('Spearman $\\rho$')
g.set_xlabels('Noise')
plt.savefig('plot-intrinsic-noise.pdf', format='pdf', dpi=300, bbox_inches='tight', pad_inches=0.1)



In [5]:
with sns.axes_style('whitegrid'):
    g = sns.factorplot(x='noise', col='vect', hue='Dataset', y='pval',
                       data=noise_df[noise_df.kind == 'relaxed'], kind='point',
                       order=sorted(noise_df.noise.unique()));
sns.despine(left=True)
for ax in g.axes.flat:
    sparsify_axis_labels(ax)
    ax.set_ylim(-0.05, 1)




In [6]:
g = sns.FacetGrid(noise_df[noise_df.kind == 'relaxed'], col='vect')
g.map_dataframe(tsplot_for_facetgrid, time='noise', value='pval',
                condition='Dataset', unit='folds', ci=0.1).add_legend()
for ax in g.axes.flat:
    sparsify_axis_labels(ax)
    ax.set_title('{1} {2}%'.format(*ax.title._text.split('-')))



In [7]:
# version of the above without percentile error bands: average over folds first
noise_df2 = noise_df[noise_df.kind == 'relaxed'].groupby(['Dataset', 'vect', 'noise']).mean().reset_index()
with sns.color_palette("cubehelix", 4):
    g = sns.FacetGrid(noise_df2, col='vect')
    g.map_dataframe(tsplot_for_facetgrid, time='noise', value='pval',
                    condition='Dataset', unit='folds', ci=0.1).add_legend()

for ax in g.axes.flat:
    sparsify_axis_labels(ax)
    _, corpus, percent = ax.title._text.split('-')
    ax.set_title('{} {}%'.format(corpus.title(), percent), fontsize=18)

g.set_ylabels('P-value')
g.set_xlabels('Noise')
plt.savefig('plot-intrinsic-pvals.pdf', format='pdf', dpi=300, bbox_inches='tight', pad_inches=0.1)


Observations

  • Measured vector quality degrades smoothly with noise for WS353/MEN, but oscillates for MC/RG.
  • The p-value of the correlation explodes early for the smaller test sets, i.e. the probability of such a strong correlation arising by chance becomes large. The test has low power (see the quick check below).
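A quick check of the power claim, using the standard t approximation to the significance of Spearman's $\rho$ (30 and 3000 pairs stand in for MC- and MEN-sized test sets):

import numpy as np
from scipy.stats import t

def spearman_pval(rho, n):
    # Two-sided p-value for Spearman's rho under the t approximation.
    tstat = rho * np.sqrt((n - 2) / (1 - rho ** 2))
    return 2 * t.sf(abs(tstat), n - 2)

print(spearman_pval(0.4, 30))    # MC-sized test set: p ~ 0.03, unconvincing
print(spearman_pval(0.4, 3000))  # MEN-sized test set: vanishingly small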

Test: Learning curve

Evaluate the vectors intrinsically as more unlabelled training data is added.
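A sketch of the protocol (train_vectors and intrinsic_eval are hypothetical placeholders, and the percentage grid is illustrative; the real driver is thesisgenerator/scripts/intrinsic_eval_words.py):

def learning_curve(train_vectors, intrinsic_eval,
                   percents=(1, 10, 15, 50, 100),
                   tests=('men', 'ws353', 'rg', 'mc')):
    # Train on increasing fractions of the unlabelled corpus,
    # then run every intrinsic test in both evaluation modes.
    results = []
    for percent in percents:
        vectors = train_vectors(percent)
        for test in tests:
            for kind in ('relaxed', 'strict'):
                corr, pval = intrinsic_eval(vectors, test, kind)
                results.append((test, percent, kind, corr, pval))
    return results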


In [8]:
curve_df = pd.read_csv('../thesisgenerator/intrinsic_learning_curve_word_level.csv')
with sns.axes_style('whitegrid'):
    sns.factorplot(data=curve_df, col='test', x='percent', y='corr',
                   hue='kind', col_wrap=2, col_order=['men', 'ws353', 'rg', 'mc'])



In [9]:
g = sns.FacetGrid(curve_df, col='test', col_wrap=2,
                  col_order=['men', 'ws353', 'rg', 'mc'])
g.map_dataframe(tsplot_for_facetgrid, time='percent', value='corr', condition='kind', 
                unit='folds', ci=68).add_legend();
plt.savefig('plot-intrinsic-learning-curve.pdf', format='pdf', dpi=300, bbox_inches='tight', pad_inches=0.1)


Observation

None of the intrinsic tests can tell wiki-15 from wiki-100, regardless of test set size.

I thought this might be because I was using the relaxed score, but the difference between relaxed and strict is generally small. Such a difference only arises when a model's coverage of the test words is poor, i.e. when unlabelled data is very limited. That is not a real issue here (see below).

Coverage

Almost perfect after 10% of Wikipedia.


In [10]:
# number of missing (OOV) test items as training data grows
g = sns.factorplot(y='missing', x='percent', hue='test', data=curve_df, aspect=2);


Repeated runs of w2v and multivector boosting


In [11]:
rep_df = pd.read_csv('../thesisgenerator/intrinsic_w2v_repeats_word_level.csv')


---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-11-b52cd322beb5> in <module>()
----> 1 rep_df = pd.read_csv('../thesisgenerator/intrinsic_w2v_repeats_word_level.csv')

...

OSError: File b'../thesisgenerator/intrinsic_w2v_repeats_word_level.csv' does not exist

The results file has not been generated yet, so the cells below have not been run.

In [ ]:
rep_df.head()

In [ ]:
sns.factorplot(data=rep_df, x='rep_id', y='corr', col='test', kind='bar')

In [ ]:
import os
from discoutils.thesaurus_loader import Vectors as V
from thesisgenerator.plugins.multivectors import MultiVectors

prefix = 'lustre/scratch/inf/mmb28/FeatureExtractionToolkit/word2vec_vectors/'
pattern = os.path.join(prefix, 'word2vec-wiki-15perc.unigr.strings.rep%d')
# three independent word2vec runs on the same 15% wikipedia sample
rep_vectors = [V.from_tsv(pattern % i) for i in [0, 1, 2]]
# precomputed average of the three runs
avg_vectors = [V.from_tsv(os.path.join(prefix, 'word2vec-wiki-15perc.unigr.strings.avg3'))]
# combine the three runs into a single multi-vector model
mv = [MultiVectors(tuple(rep_vectors))]

In [ ]:
mv[0].get_nearest_neighbours('love/N')
