Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

First build our data.



In [3]:

    
model = Model(time=Time.discrete, source=Source.all, past=Past.all, durl=Durl.all, max_distance=1)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data









    



Got 53419 substitutions for model Model(time=Time.discrete, source=Source.all, past=Past.all, durl=Durl.all, max_distance=1)






    



100% (53419 of 53419) |####################| Elapsed Time: 0:11:40 Time: 0:11:40

Compute cluster averages (so that repeated substitutions from the same cluster do not make confidence intervals look narrower than they are) and crop data so that we have acceptable CIs.



In [4]:

    
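# Average per substitution first, then per (cluster, feature).
# Columns are selected with a list; 'feature' is already a grouping key,
# so it is kept in the result without being listed here.
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    [['source', 'source_rel', 'destination', 'destination_rel',
      'h0', 'h0_rel', 'h0n', 'h0n_rel']].mean()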
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan
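
As an illustration of the two-stage averaging above, here is a minimal sketch on made-up data (the column values are hypothetical, only the column names follow the notebook). A substitution recorded twice counts once after the first stage, and each cluster contributes a single row after the second.

```python
import pandas as pd

# Toy data: cluster 1 holds one substitution recorded twice (same
# destination, occurrence and position) plus a second substitution.
toy = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 2],
    'destination_id': [10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0],
    'position':       [3, 3, 5, 1],
    'feature':        ['aoa'] * 4,
    'source':         [4.0, 6.0, 9.0, 7.0],
    'destination':    [5.0, 5.0, 8.0, 6.0],
})

# Stage 1: one row per (destination, occurrence, position, feature).
per_substitution = toy.groupby(
    ['destination_id', 'occurrence', 'position', 'feature'],
    as_index=False).mean()

# Stage 2: one row per (cluster, feature).
per_cluster = per_substitution.groupby(
    ['cluster_id', 'feature'], as_index=False)[['source', 'destination']].mean()
per_cluster['variation'] = per_cluster['destination'] - per_cluster['source']
print(per_cluster)
```

A plain mean over the three rows of cluster 1 would give a source value of about 6.33, whereas the two-stage mean gives 7.0, because the duplicated substitution is no longer counted twice.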

Prepare feature ordering.



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot about features

For a feature $\phi$, plot:

- $\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
- $\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
- $\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
- $y = x$, i.e. what happens if there is no substitution

We also plot these values relative to the sentence average, i.e.:

- $\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
- $\nu_{\phi, r}^0$ (which is the average feature value minus the sentence average), i.e. what happens under $\mathcal{H}_0$
- $\nu_{\phi, r}^{00}$ (which is the average feature value for synonyms of the source word minus the sentence average), i.e. what happens under $\mathcal{H}_{00}$
- $y = x$, i.e. what happens if there is no substitution

Those values are plotted with fixed-width bins, then quantile bins, with absolute feature values, then with relative-to-sentence features.
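
To make the definition of $\nu_{\phi}$ concrete, here is a minimal sketch of the binned estimate on made-up numbers. The data and the single global average standing in for $\nu_{\phi}^0$ are assumptions for illustration; the notebook's own null values come from `Substitution.feature_average`.

```python
import pandas as pd

# Toy substitutions: source feature values and the destination values
# that replaced them (made-up numbers, for illustration only).
source = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
destination = pd.Series([3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0])

# Fixed-width, left-closed bins over the source feature.
bin_ids, edges = pd.cut(source, 4, right=False, labels=False, retbins=True)
middles = (edges[:-1] + edges[1:]) / 2

# nu_phi: mean destination feature within each source bin.
nu_phi = destination.groupby(bin_ids).mean()

# Stand-in for nu_phi^0: one global average value (hypothetical here).
nu_phi_0 = destination.mean()

print(pd.DataFrame({'bin_middle': middles, 'nu_phi': nu_phi.values}))
print('nu_phi_0 =', nu_phi_0)
```

In this toy case $\nu_{\phi}$ lies above $y = x$ in the lowest bin and below it in the highest one, which is the kind of pattern the plots are meant to reveal.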



In [6]:

    
def print_significance(name, bins, h0, h0n, values):
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = indices.sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            for j, alpha in enumerate([.05, .01, .001]):
                # Half-width of the two-sided CI at level 1 - alpha.
                # Note: s / sqrt(n - 1) is slightly wider than the usual
                # standard error s / sqrt(n) when s is computed with ddof=1.
                cis[i, j] = (stats.t.ppf(1 - alpha/2, n - 1)
                             * s / np.sqrt(n - 1))

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()
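
The star notation printed by `print_significance` can be sketched for a single bin as follows. The helper `stars` is hypothetical and not part of the notebook; it reuses the same half-width formula and counts how many of the three confidence levels exclude the null value.

```python
import numpy as np
from scipy import stats

def stars(values, null_mean, alphas=(.05, .01, .001)):
    """Return '*', '**', '***' or 'ns.' for one bin (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    s = values.std(ddof=1)
    level = 0
    for alpha in alphas:
        # Same half-width formula as in print_significance.
        half_width = stats.t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n - 1)
        if abs(mean - null_mean) > half_width:
            level += 1
    return '*' * level if level else 'ns.'

sample = [1, 2, 3, 4, 5]
for null in (3.0, 6.0, 7.5, 10.0):
    print(null, stars(sample, null))
```

Because the intervals widen as alpha decreases, a null value excluded at a stricter level is also excluded at the looser ones, so counting exclusions matches taking the strictest excluded level.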



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Rescale limits if we're touching H0 or H00.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    elif h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)
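
The binning loop in `plot_variation` retries with fewer bins whenever the cut fails. A minimal sketch of that fallback, isolated in a hypothetical helper `cut_with_fallback` and run on made-up data with many ties, shows why quantile binning can end up with fewer than `BIN_COUNT` bins:

```python
import pandas as pd

def cut_with_fallback(x, max_bins, quantiles=False):
    """Bin x into at most max_bins bins, reducing the count on failure."""
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(max_bins, 0, -1):
        try:
            bin_ids, edges = cut(x, bin_count, labels=False,
                                 retbins=True, **cut_kws)
            return bin_ids, edges, bin_count
        except ValueError:
            # pd.qcut raises when two quantile edges coincide.
            continue
    raise ValueError('could not bin data')

# A discrete feature with many tied values (toy data).
x = pd.Series([1, 1, 2, 2, 2, 2, 3, 3, 4, 4])
_, _, n_quantile_bins = cut_with_fallback(x, 4, quantiles=True)
_, _, n_fixed_bins = cut_with_fallback(x, 4)
print(n_quantile_bins, n_fixed_bins)
```

With four quantile bins, the 25% and 50% quantiles of this series coincide, so `pd.qcut` fails and the loop settles on three bins. This is plausibly the reason a discrete feature such as `syllables_count` can show only three bins in the quantile outputs, although that is an inference rather than something the notebook states.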



In [8]:

    
def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws={}):
    g = sb.FacetGrid(data=data[data[feature_field]
                               .map(lambda f: f in features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, size=3)
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    scale = abs(h0s.mean())
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)
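
The normalisation applied by `plot_bias` can be sketched on made-up per-bin averages. Note that the horizontal axis is the bin rank spread over $[0, 1]$, not the feature value itself, and that the bias is measured against $\nu_{\phi}^{00}$ while the scale comes from the $\nu_{\phi}^0$ values.

```python
import numpy as np

# Toy per-bin averages (made-up numbers, for illustration only).
values = np.array([2.0, 3.0, 4.0, 5.0])  # nu_phi per bin
h0s = np.array([4.0, 4.0, 4.0, 4.0])     # nu_phi^0 per bin
h0ns = np.array([3.0, 3.5, 4.0, 4.5])    # nu_phi^00 per bin

# Scale by the absolute mean of the H_0 values, as plot_bias does.
scale = abs(h0s.mean())
x_normalised = np.linspace(0, 1, len(values))
bias = (values - h0ns) / scale

print(x_normalised)
print(bias)
```

Dividing by a per-feature scale is what allows features with very different units to share one set of axes in the overlay plots.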



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws={}):
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Bins of distribution of appeared global feature values

For each feature $\phi$, we plot the variation upon substitution as explained above.



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *** | *** | *** | *   |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *   | *** | ns. |
H_00 | *** | *   | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | **  | **  |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | **  | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | ns. | *** |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | *** | ns. | *** | **  |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | ns. | ns. | *** |

Then plot $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature (i.e. the measured bias) to see how they compare.



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | ns. | *** |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *** | *** | *** | *   |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | **  | **  |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | *** | ns. | *** | **  |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantiles of distribution of appeared global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |
------------------------
H_0  | *** | *** | *** |
H_00 | *** | *   | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | *** |
H_00 | **  | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | **  | ns. | *** |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *** | *** |
H_00 | *** | *   | ns. | *** |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | *** |
H_00 | **  | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | **  | ns. | *** |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Bins of distribution of appeared sentence-relative values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | ns. | *** | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *   |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | ns. | **  |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | *** | *** | **  | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | *   | *** | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | ns. | **  |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *   |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | *** | *** | **  | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantiles of distribution of appeared sentence-relative values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | ns. | *** | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *   | ns. | *** |
H_00 | *** | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *   | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *   | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | **  | *** | *** |
H_00 | *** | *** | ns. | **  |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | **  | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | *** | *** | ns. | *   |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *   | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *   | ns. | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | **  | *** | *** |
H_00 | *** | *** | ns. | **  |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We'd like to see how absolute and relative feature values interact. In particular, we want to know which effect dominates: cognitive bias, attraction to the sentence average, or attraction to the global feature average.

To do this, we plot the general direction (arrows) and strength (color) of the displacement of destination words for a given pair of absolute and relative source feature values. That is, for a given absolute feature value and relative feature value, if this word were to be substituted, where would it go in this (absolute, relative) space?

The interesting feature of these plots is the attraction front, the line towards which all arrows point and where they join. We're interested in:

- its slope
- its shape (e.g. several slope regimes?)
- its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)

First, here's our plotting function. (Note that we set the arrow size to something that turns out to be huge here, but gives normal sizes in the saved figures. There must be some dpi scaling problem with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Compute binning.
    bin_count = 4
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # Compute bin values.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            u_values[y, x] = (
                dest[(x_bins == x) & (y_bins == y)] -
                source[(x_bins == x) & (y_bins == y)]
            ).mean()
            v_values[y, x] = (
                dest_rel[(x_bins == x) & (y_bins == y)] -
                source_rel[(x_bins == x) & (y_bins == y)]
            ).mean()
            strength[y, x] = np.sqrt(
                (dest[(x_bins == x) & (y_bins == y)] - 
                 source[(x_bins == x) & (y_bins == y)]) ** 2 +
                (dest_rel[(x_bins == x) & (y_bins == y)] - 
                 source_rel[(x_bins == x) & (y_bins == y)]) ** 2
            ).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])
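
As a cross-check on the binning logic in `plot_stream`, the per-bin mean displacement can also be computed with a pivot table instead of a double loop. The following is only a sketch on synthetic data: the `toy` frame and its dynamics (a constant shift on the absolute axis, a halving of the distance to the sentence average on the relative axis) are assumptions made up for illustration, not results from the database.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 2000
toy = pd.DataFrame({
    'source': rng.uniform(0, 1, n),
    'source_rel': rng.uniform(-1, 1, n),
})
# Hypothetical dynamics: each substitution moves +0.1 on the absolute axis
# and halves the distance to the sentence average on the relative axis.
toy['destination'] = toy['source'] + 0.1
toy['destination_rel'] = toy['source_rel'] / 2

# Same fixed-width binning as in plot_stream.
bin_count = 4
toy['x_bin'] = pd.cut(toy['source'], bin_count, right=False, labels=False)
toy['y_bin'] = pd.cut(toy['source_rel'], bin_count,
                      right=False, labels=False)
toy['u'] = toy['destination'] - toy['source']
toy['v'] = toy['destination_rel'] - toy['source_rel']

# Mean displacement per cell, laid out as [y, x] like u_values and v_values.
u_values = toy.pivot_table(index='y_bin', columns='x_bin',
                           values='u', aggfunc='mean').values
v_values = toy.pivot_table(index='y_bin', columns='x_bin',
                           values='v', aggfunc='mean').values
print(u_values.round(3))
print(v_values.round(3))
```

With these toy dynamics the horizontal component is constant, while the vertical component changes sign across $y = 0$, which is the kind of attraction front the real plots are meant to reveal.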

Here are the plots for all features.



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features we expose in the paper.



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute PCA on feature variations (note: on variations, not on features directly), and show the evolution of the first three components upon substitution.

CAVEAT: the PCA is computed only on variations for which all features are defined. This greatly reduces the number of words included, and also the number of substitutions (see the caveat section below for the actual values; the reduction is drastic). It also affects $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are computed using only the words for which all features are defined. This, again, hugely reduces the number of words taken into account, and so changes the values under the null hypotheses.
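
To make the caveat concrete, here is a minimal sketch of the pivot, `dropna`, and PCA steps on synthetic data. The toy table, its three feature names, and the rule that 'aoa' is missing for every fourth cluster are all assumptions for illustration; only `DataFrame.pivot`, `DataFrame.dropna` and `PCA(n_components='mle')` mirror what the notebook actually does.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
features = ['aoa', 'frequency', 'letters_count']
n_clusters = 200

# Long-format toy table: one row per (cluster, feature), like `variations`.
rows = []
for cluster_id in range(n_clusters):
    latent = rng.normal()
    for j, feature in enumerate(features):
        value = latent * (j + 1) + 0.1 * rng.normal()
        # Hypothetical missingness: 'aoa' is undefined for every 4th cluster.
        if feature == 'aoa' and cluster_id % 4 == 0:
            value = np.nan
        rows.append({'cluster_id': cluster_id, 'feature': feature,
                     'variation': value})
toy = pd.DataFrame(rows)

# One row per cluster, one column per feature; drop incomplete rows.
wide = toy.pivot(index='cluster_id', columns='feature', values='variation')
complete = wide.dropna()
print('{} clusters, {} with all features defined'
      .format(len(wide), len(complete)))

# Fit the PCA on the complete rows only.
toy_pca = PCA(n_components='mle')
toy_pca.fit(complete)
print(toy_pca.n_components_, toy_pca.explained_variance_ratio_)
```

A single feature with missing values is enough to remove a quarter of the rows here; with many features the losses compound.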

4.1 On all the features

Compute the actual PCA.



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the results.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 11 components.

Those explain the following variance:
[ 0.53118001  0.16984383  0.08486023  0.07138244  0.03431877  0.03114635
  0.02026665  0.01953249  0.01499711  0.00921627  0.00740786]

We're plotting variation for the first 3 components:






    Out[30]:

                   aoa  betweenness  clustering    degree  frequency  letters_count  \
Component-0  -0.472754     0.292834   -0.079535  0.243040   0.233793      -0.425883
Component-1  -0.256312     0.428259   -0.135165  0.293881   0.287968       0.422299
Component-2   0.814881     0.374155   -0.133468  0.124764   0.334159      -0.141843

             orthographic_density  pagerank  phonemes_count  phonological_density  syllables_count  synonyms_count
Component-0              0.211612  0.284075       -0.400360              0.278756        -0.160736       -0.003837
Component-1             -0.170129  0.294531        0.439821             -0.220452         0.164369       -0.015803
Component-2             -0.007585  0.088831       -0.100192              0.101068        -0.025615       -0.044239

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.
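
`Substitution.components` and `Substitution.component_average` live in the `brainscopypaste` package and are not shown in this notebook, so the exact computation is not visible here. The sketch below only illustrates one plausible reading, under the assumption that a component value is the dot product of a word's feature vector with a row of `pca.components_` (without centering). The helper `component_value` and all numbers are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Toy feature variations (rows: substitutions, columns: 3 made-up features).
toy_variations = rng.normal(size=(100, 3)) * [3.0, 1.0, 0.3]
toy_pca = PCA(n_components=2).fit(toy_variations)


def component_value(feature_vector, pca, component):
    """Project a feature vector on one principal axis (assumed definition)."""
    return float(np.dot(pca.components_[component], feature_vector))


source_word = np.array([2.0, 0.5, 1.0])        # hypothetical feature values
destination_word = np.array([1.0, 0.7, 1.0])   # hypothetical feature values

for component in range(toy_pca.n_components_):
    c_source = component_value(source_word, toy_pca, component)
    c_destination = component_value(destination_word, toy_pca, component)
    print(component, c_source, c_destination, c_destination - c_source)
```

Because the projection is linear, the variation of a component equals the projection of the feature variation, which is why a PCA fitted on variations can still be applied to source and destination words separately.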



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (53419 of 53419) |####################| Elapsed Time: 0:08:03 Time: 0:08:03

Compute cluster averages (so as not to overestimate confidence intervals).



In [32]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()
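
The two-stage averaging above can be illustrated on a tiny made-up table (all values are hypothetical). Substitutions that share a destination slot are averaged first, then the result is averaged per cluster, so a destination reached by several sources does not weigh more than one reached by a single source.

```python
import pandas as pd

toy = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 2],
    'destination_id': [10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0],
    'position':       [3, 3, 3, 1],
    'component':      [0, 0, 0, 0],
    'source':         [1.0, 3.0, 6.0, 5.0],
    'destination':    [2.0, 2.0, 4.0, 5.0],
})

# Stage 1: average substitutions sharing the same destination slot.
per_destination = toy.groupby(
    ['destination_id', 'occurrence', 'position', 'component'],
    as_index=False).mean()

# Stage 2: average over clusters, one row per (cluster, component).
per_cluster = per_destination.groupby(
    ['cluster_id', 'component'],
    as_index=False)[['source', 'destination']].mean()
print(per_cluster)

# For comparison: a naive one-stage mean over cluster 1.
naive = toy[toy.cluster_id == 1]['source'].mean()
print(naive)
```

For cluster 1 the two-stage source average is 4.0, whereas the naive mean over its three rows is about 3.33.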

Plot the actual variations of components (see the caveat section below).



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | *   | *   | *   | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | *** | *   |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA.



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the results.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 2 components.

Those explain the following variance:
[ 0.68164612  0.17921223]







    Out[35]:

                   aoa  frequency  letters_count
Component-0  -0.742885   0.385537      -0.547250
Component-1   0.310008  -0.526418      -0.791694

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (53419 of 53419) |####################| Elapsed Time: 0:05:45 Time: 0:05:45

Compute cluster averages (so as not to overestimate confidence intervals).



In [37]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()

Plot the actual variations of components.



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | **  | *** | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

4.3 CAVEAT: reduction of the numbers of words and substitutions

As explained above, this PCA analysis can only use words for which all the features are defined (in this case, the features listed in relevant_features). So note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Compute the number of words that have all relevant_features defined.
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 1825 (cluster-unique) substitutions, but the PCA is in fact computed on 1491 of them (those where all features are defined).

The way $\mathcal{H}_0$ and $\mathcal{H}_{00}$ are computed makes them also affected by this.
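
The word count above boils down to a set intersection. Here is a minimal sketch, assuming each feature can be represented as a plain word-to-value dictionary; the `toy_features` dictionaries are made up and are not the real feature data.

```python
# Hypothetical word -> value dictionaries standing in for the real features.
toy_features = {
    'frequency': {'the': 9.1, 'cat': 5.2, 'sat': 4.8},
    'aoa': {'cat': 3.0, 'sat': 4.1, 'mat': 4.5},
    'letters_count': {'the': 3, 'cat': 3, 'sat': 3, 'mat': 3},
}

# Words known to at least one feature.
all_words = set().union(*toy_features.values())
# Words for which every feature is defined.
complete_words = set.intersection(
    *(set(values) for values in toy_features.values()))

print('{} words overall, {} with every feature defined'
      .format(len(all_words), len(complete_words)))
```

Each feature on its own covers three or four words here, yet only two words survive the intersection.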

5 Interactions between features (by Anova)

Some useful variables first.



In [40]:

    
cuts = [('fixed bins', pd.cut)]#, ('quantiles', pd.qcut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'

Now for each feature, assess whether its source value has an effect on each feature's destination value, using a one-way ANOVA of the destination values across bins of the source feature. We look at this for all pairs of features and all combinations of global/sentence-relative values. (Only fixed-width binning is run here; the quantile variant is commented out in `cuts`.) So it's a lot of answers.

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means not significant.
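
As a sanity check on what the test measures, here is a small sketch on synthetic data: one destination variable that depends strongly on the source, and one that is independent of it. The data and the helper `anova_p` are assumptions for illustration; `pd.cut` and `stats.f_oneway` are used the same way as in the cells of this section.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.RandomState(0)
n = 400
source = pd.Series(rng.uniform(0, 1, n))
# Depends strongly on source.
dependent = 2 * source + rng.normal(scale=0.1, size=n)
# Unrelated to source.
independent = pd.Series(rng.normal(size=n))

bin_count = 4
source_bins = pd.cut(source, bin_count, labels=False)


def anova_p(destination):
    """One-way ANOVA of destination values across bins of the source."""
    groups = [destination[source_bins == i].dropna()
              for i in range(bin_count)]
    _, p = stats.f_oneway(*groups)
    return p


print('dependent:   p = {:.3g}'.format(anova_p(dependent)))
print('independent: p = {:.3g}'.format(anova_p(independent)))
```

Note that the test only says whether the bin means differ; it says nothing about the direction or the size of the effect.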



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  *** global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
   ** global -> global
   ** global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
   ** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
   ** global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Now for each feature, look at its interaction with the other features' variation (i.e. destination - source). Same drill, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
   ** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
   ** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
    * global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

OK, so this could go on for a long time, and I'm not going to look at higher-order interactions through this lens (i.e. the interaction of pairs of features with another feature's destination values).

6 Regression



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
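
These two imports are used for a linear regression with optional pairwise interaction terms. Here is a minimal sketch of what `PolynomialFeatures(degree=2, interaction_only=True)` generates and how its `powers_` attribute can be turned into readable term names. The predictor names and the noise-free toy target are assumptions for illustration.

```python
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
names = ['global_aoa', 'global_frequency']   # hypothetical predictor names
X = rng.normal(size=(50, 2))
# Noise-free toy target: 1 + 2 * aoa + 3 * frequency + 4 * aoa * frequency.
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + 4 * X[:, 0] * X[:, 1]

# Bias column, linear terms, and pairwise products (no squares).
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)
term_names = [' * '.join(names[j] for j, p in enumerate(powers) if p > 0)
              or 'intercept'
              for powers in poly.powers_]

# The bias column already plays the role of the intercept.
linreg = linear_model.LinearRegression(fit_intercept=False)
linreg.fit(X_poly, y)
for name, coef in zip(term_names, linreg.coef_):
    print('{:>35}: {:.3f}'.format(name, coef))
```

Since the expanded matrix already contains a constant column, fitting with `fit_intercept=False` avoids estimating the intercept twice.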



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError
    if not isinstance(dest_rel, bool):
        raise ValueError
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    destination = data[data.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
        # Prepend the fitted intercept to the coefficients.
        coeffs = pd.concat([
            pd.Series(index=['intercept'], data=[linreg.intercept_]),
            coeffs])
    with pd.option_context('display.max_rows', 999):
        print(coeffs)

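The labelling of interaction terms in `regress` relies on `PolynomialFeatures.powers_`, where each row gives the exponent of every input column in one output column. The following is a minimal sketch of that mechanism on toy data; the names `a`, `b`, `c` and the values in `X` are hypothetical and are not taken from the notebook's data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical feature names and values, for illustration only.
names = ['a', 'b', 'c']
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Degree-2, interaction-only expansion: bias, single terms, pairwise products.
poly = PolynomialFeatures(degree=2, interaction_only=True)
expanded = poly.fit_transform(X)

# Each row of powers_ holds the exponent of every input in one output column;
# the all-zero row is the constant (bias) column, labelled 'intercept' here.
labels = [' * '.join(name for name, power in zip(names, powers) if power > 0)
          or 'intercept'
          for powers in poly.powers_]

print(labels)
print(expanded[0])
```

The expansion already contains a constant column, which is presumably why `regress` sets `fit_intercept=False` when `interactions` is on: the coefficient of that column then plays the role of the intercept. With the 6 paper features this gives 1 + 6 + 15 = 22 terms, and with both global and relative sources 1 + 12 + 66 = 79 terms.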


In [45]:

    
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()

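The printed headers name only the destination relativeness, so blocks with different source settings can carry identical titles. A small sketch of the order in which the loop above visits the settings may help when reading the output; the `describe` helper is hypothetical and only mirrors the labels used in `regress`.

```python
from itertools import product

# Same labels as the first element of each tuple in `rels`.
rel_names = {False: 'global', True: 'rel'}

def describe(source_rel, dest_rel):
    """Hypothetical helper: name one (source_rel, dest_rel) setting."""
    sources = 'global + rel' if source_rel == 'both' else rel_names[source_rel]
    return '{} sources -> {} destination'.format(sources, rel_names[dest_rel])

# product varies its last argument fastest, so dest_rel alternates first.
order = [describe(s, d)
         for s, d in product([False, True, 'both'], [False, True])]

for i, label in enumerate(order):
    print(i, label)
```

For each target, every setting in this list produces two consecutive blocks of output: first the fit without interactions, then the fit with interactions.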








    



----------------------------------------------------------------------
Regressing global frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.09915365924497643

intercept                      5.270138
global_aoa                     0.045483
global_clustering              0.034711
global_frequency               0.392186
global_letters_count          -0.034149
global_orthographic_density   -0.024387
global_synonyms_count         -0.074285
dtype: float64

Regressing global frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.12095215357874045

intercept                                              14.694324
global_aoa                                             -0.568471
global_clustering                                       1.891165
global_frequency                                        0.180273
global_letters_count                                   -0.461772
global_orthographic_density                             0.446609
global_synonyms_count                                  -0.457463
global_aoa * global_clustering                         -0.100096
global_aoa * global_frequency                           0.001081
global_aoa * global_letters_count                       0.014542
global_aoa * global_orthographic_density               -0.085795
global_aoa * global_synonyms_count                      0.091624
global_clustering * global_frequency                   -0.080524
global_clustering * global_letters_count               -0.061995
global_clustering * global_orthographic_density        -0.046905
global_clustering * global_synonyms_count               0.040500
global_frequency * global_letters_count                -0.016404
global_frequency * global_orthographic_density         -0.100756
global_frequency * global_synonyms_count               -0.027383
global_letters_count * global_orthographic_density      0.123846
global_letters_count * global_synonyms_count           -0.000447
global_orthographic_density * global_synonyms_count     0.204307
dtype: float64

Regressing rel frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.048586282584234815

intercept                     -7.209904
global_aoa                     0.102859
global_clustering              0.011502
global_frequency               0.332951
global_letters_count           0.051823
global_orthographic_density    0.035035
global_synonyms_count          0.071893
dtype: float64

Regressing rel frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.07004667973146228

intercept                                              2.808859
global_aoa                                            -0.651698
global_clustering                                      1.181897
global_frequency                                       0.028950
global_letters_count                                  -0.741899
global_orthographic_density                           -0.005324
global_synonyms_count                                 -1.390455
global_aoa * global_clustering                        -0.074059
global_aoa * global_frequency                          0.006023
global_aoa * global_letters_count                      0.043621
global_aoa * global_orthographic_density              -0.042572
global_aoa * global_synonyms_count                     0.150171
global_clustering * global_frequency                  -0.046735
global_clustering * global_letters_count              -0.042568
global_clustering * global_orthographic_density       -0.016037
global_clustering * global_synonyms_count              0.178766
global_frequency * global_letters_count                0.010990
global_frequency * global_orthographic_density        -0.063363
global_frequency * global_synonyms_count               0.077918
global_letters_count * global_orthographic_density     0.121861
global_letters_count * global_synonyms_count           0.054825
global_orthographic_density * global_synonyms_count    0.379793
dtype: float64

Regressing global frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.06561228216743253

intercept                   9.498654
rel_aoa                     0.046224
rel_clustering             -0.051503
rel_frequency               0.257524
rel_letters_count          -0.050509
rel_orthographic_density   -0.033506
rel_synonyms_count         -0.191411
dtype: float64

Regressing global frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.07801975996168686

intercept                                        9.348740
rel_aoa                                          0.092568
rel_clustering                                   0.011621
rel_frequency                                    0.229342
rel_letters_count                                0.071858
rel_orthographic_density                        -0.142320
rel_synonyms_count                               0.022490
rel_aoa * rel_clustering                        -0.053121
rel_aoa * rel_frequency                          0.011995
rel_aoa * rel_letters_count                     -0.003593
rel_aoa * rel_orthographic_density              -0.014687
rel_aoa * rel_synonyms_count                     0.074789
rel_clustering * rel_frequency                  -0.008145
rel_clustering * rel_letters_count              -0.049698
rel_clustering * rel_orthographic_density       -0.077364
rel_clustering * rel_synonyms_count              0.110636
rel_frequency * rel_letters_count                0.007784
rel_frequency * rel_orthographic_density        -0.016053
rel_frequency * rel_synonyms_count               0.047378
rel_letters_count * rel_orthographic_density     0.057638
rel_letters_count * rel_synonyms_count          -0.079036
rel_orthographic_density * rel_synonyms_count   -0.026505
dtype: float64

Regressing rel frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.28272706408924175

intercept                  -1.446408
rel_aoa                     0.066423
rel_clustering              0.128131
rel_frequency               0.651298
rel_letters_count          -0.119104
rel_orthographic_density   -0.192689
rel_synonyms_count         -0.038826
dtype: float64

Regressing rel frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.30183954573818006

intercept                                       -1.488340
rel_aoa                                          0.054857
rel_clustering                                   0.234205
rel_frequency                                    0.689005
rel_letters_count                               -0.044542
rel_orthographic_density                        -0.328598
rel_synonyms_count                               0.131671
rel_aoa * rel_clustering                        -0.083759
rel_aoa * rel_frequency                         -0.019153
rel_aoa * rel_letters_count                      0.019926
rel_aoa * rel_orthographic_density               0.057339
rel_aoa * rel_synonyms_count                     0.197963
rel_clustering * rel_frequency                  -0.015810
rel_clustering * rel_letters_count              -0.111460
rel_clustering * rel_orthographic_density       -0.208749
rel_clustering * rel_synonyms_count              0.001545
rel_frequency * rel_letters_count               -0.011142
rel_frequency * rel_orthographic_density        -0.024900
rel_frequency * rel_synonyms_count               0.023104
rel_letters_count * rel_orthographic_density     0.053060
rel_letters_count * rel_synonyms_count          -0.052241
rel_orthographic_density * rel_synonyms_count    0.171173
dtype: float64

Regressing global frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.10475942731006904

intercept                      4.550934
global_aoa                     0.005743
global_clustering              0.056650
global_frequency               0.394863
global_letters_count           0.122825
global_orthographic_density    0.159245
global_synonyms_count          0.133111
rel_aoa                        0.052188
rel_clustering                -0.037527
rel_frequency                 -0.006108
rel_letters_count             -0.173948
rel_orthographic_density      -0.204465
rel_synonyms_count            -0.261256
dtype: float64

Regressing global frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.1703632873631069

intercept                                                -36.905088
global_aoa                                                 0.909706
global_clustering                                         -6.913103
global_frequency                                           1.449273
global_letters_count                                       0.514666
global_orthographic_density                                8.716022
global_synonyms_count                                     12.269183
rel_aoa                                                   -0.573135
rel_clustering                                             8.936461
rel_frequency                                             -1.388017
rel_letters_count                                         -1.445242
rel_orthographic_density                                  -6.876256
rel_synonyms_count                                        -8.881459
global_aoa * global_clustering                             0.217089
global_aoa * global_frequency                              0.070771
global_aoa * global_letters_count                          0.024422
global_aoa * global_orthographic_density                  -0.149820
global_aoa * global_synonyms_count                        -0.245770
global_aoa * rel_aoa                                      -0.002599
global_aoa * rel_clustering                               -0.295445
global_aoa * rel_frequency                                 0.001099
global_aoa * rel_letters_count                            -0.011999
global_aoa * rel_orthographic_density                      0.044603
global_aoa * rel_synonyms_count                            0.196156
global_clustering * global_frequency                       0.195372
global_clustering * global_letters_count                   0.167772
global_clustering * global_orthographic_density            1.234745
global_clustering * global_synonyms_count                  0.708241
global_clustering * rel_aoa                               -0.278541
global_clustering * rel_clustering                         0.108281
global_clustering * rel_frequency                         -0.249732
global_clustering * rel_letters_count                     -0.264470
global_clustering * rel_orthographic_density              -1.097049
global_clustering * rel_synonyms_count                    -0.539182
global_frequency * global_letters_count                    0.016360
global_frequency * global_orthographic_density            -0.076570
global_frequency * global_synonyms_count                  -0.446528
global_frequency * rel_aoa                                -0.125964
global_frequency * rel_clustering                         -0.260240
global_frequency * rel_frequency                           0.004615
global_frequency * rel_letters_count                      -0.027044
global_frequency * rel_orthographic_density               -0.046284
global_frequency * rel_synonyms_count                      0.276904
global_letters_count * global_orthographic_density         0.122852
global_letters_count * global_synonyms_count              -0.026111
global_letters_count * rel_aoa                             0.024570
global_letters_count * rel_clustering                     -0.201048
global_letters_count * rel_frequency                      -0.066340
global_letters_count * rel_letters_count                   0.016440
global_letters_count * rel_orthographic_density            0.004183
global_letters_count * rel_synonyms_count                 -0.120879
global_orthographic_density * global_synonyms_count       -0.607100
global_orthographic_density * rel_aoa                      0.037932
global_orthographic_density * rel_clustering              -1.208292
global_orthographic_density * rel_frequency               -0.030787
global_orthographic_density * rel_letters_count           -0.014335
global_orthographic_density * rel_orthographic_density     0.006580
global_orthographic_density * rel_synonyms_count           0.798136
global_synonyms_count * rel_aoa                            0.268944
global_synonyms_count * rel_clustering                    -0.913457
global_synonyms_count * rel_frequency                      0.391871
global_synonyms_count * rel_letters_count                  0.374951
global_synonyms_count * rel_orthographic_density           1.177683
global_synonyms_count * rel_synonyms_count                -0.057612
rel_aoa * rel_clustering                                   0.222541
rel_aoa * rel_frequency                                    0.044591
rel_aoa * rel_letters_count                               -0.016446
rel_aoa * rel_orthographic_density                         0.030972
rel_aoa * rel_synonyms_count                              -0.118518
rel_clustering * rel_frequency                             0.276004
rel_clustering * rel_letters_count                         0.209690
rel_clustering * rel_orthographic_density                  0.897050
rel_clustering * rel_synonyms_count                        0.920472
rel_frequency * rel_letters_count                          0.069616
rel_frequency * rel_orthographic_density                   0.101343
rel_frequency * rel_synonyms_count                        -0.194866
rel_letters_count * rel_orthographic_density               0.039564
rel_letters_count * rel_synonyms_count                    -0.253465
rel_orthographic_density * rel_synonyms_count             -1.280971
dtype: float64

Regressing rel frequency with 1161 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.3692535743730183

intercept                      4.182974
global_aoa                     0.000628
global_clustering              0.092043
global_frequency              -0.547665
global_letters_count           0.152136
global_orthographic_density    0.137463
global_synonyms_count          0.088632
rel_aoa                        0.043782
rel_clustering                -0.054827
rel_frequency                  0.973339
rel_letters_count             -0.206179
rel_orthographic_density      -0.185905
rel_synonyms_count            -0.203482
dtype: float64

Regressing rel frequency with 1161 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.41443170077609315

intercept                                                -36.640635
global_aoa                                                 0.948240
global_clustering                                         -6.706670
global_frequency                                           0.401734
global_letters_count                                       0.585243
global_orthographic_density                                8.895021
global_synonyms_count                                     10.489028
rel_aoa                                                   -0.786465
rel_clustering                                             8.778522
rel_frequency                                             -0.444491
rel_letters_count                                         -1.501943
rel_orthographic_density                                  -7.310481
rel_synonyms_count                                        -7.588912
global_aoa * global_clustering                             0.207522
global_aoa * global_frequency                              0.059901
global_aoa * global_letters_count                          0.017903
global_aoa * global_orthographic_density                  -0.134958
global_aoa * global_synonyms_count                        -0.216006
global_aoa * rel_aoa                                       0.001324
global_aoa * rel_clustering                               -0.291675
global_aoa * rel_frequency                                 0.008649
global_aoa * rel_letters_count                             0.003101
global_aoa * rel_orthographic_density                      0.051405
global_aoa * rel_synonyms_count                            0.171094
global_clustering * global_frequency                       0.182358
global_clustering * global_letters_count                   0.196847
global_clustering * global_orthographic_density            1.202160
global_clustering * global_synonyms_count                  0.692351
global_clustering * rel_aoa                               -0.273882
global_clustering * rel_clustering                         0.094065
global_clustering * rel_frequency                         -0.228699
global_clustering * rel_letters_count                     -0.262641
global_clustering * rel_orthographic_density              -1.026613
global_clustering * rel_synonyms_count                    -0.504126
global_frequency * global_letters_count                    0.048638
global_frequency * global_orthographic_density            -0.094111
global_frequency * global_synonyms_count                  -0.319265
global_frequency * rel_aoa                                -0.107339
global_frequency * rel_clustering                         -0.247487
global_frequency * rel_frequency                           0.018145
global_frequency * rel_letters_count                      -0.048478
global_frequency * rel_orthographic_density               -0.000989
global_frequency * rel_synonyms_count                      0.176441
global_letters_count * global_orthographic_density         0.068672
global_letters_count * global_synonyms_count              -0.033099
global_letters_count * rel_aoa                             0.028155
global_letters_count * rel_clustering                     -0.214469
global_letters_count * rel_frequency                      -0.070371
global_letters_count * rel_letters_count                   0.015813
global_letters_count * rel_orthographic_density            0.074285
global_letters_count * rel_synonyms_count                 -0.061471
global_orthographic_density * global_synonyms_count       -0.614220
global_orthographic_density * rel_aoa                      0.030485
global_orthographic_density * rel_clustering              -1.223252
global_orthographic_density * rel_frequency                0.000050
global_orthographic_density * rel_letters_count            0.051468
global_orthographic_density * rel_orthographic_density     0.036870
global_orthographic_density * rel_synonyms_count           0.832371
global_synonyms_count * rel_aoa                            0.237572
global_synonyms_count * rel_clustering                    -0.979396
global_synonyms_count * rel_frequency                      0.270343
global_synonyms_count * rel_letters_count                  0.349914
global_synonyms_count * rel_orthographic_density           1.059887
global_synonyms_count * rel_synonyms_count                -0.049345
rel_aoa * rel_clustering                                   0.228984
rel_aoa * rel_frequency                                    0.023362
rel_aoa * rel_letters_count                               -0.029582
rel_aoa * rel_orthographic_density                         0.027805
rel_aoa * rel_synonyms_count                              -0.086680
rel_clustering * rel_frequency                             0.258582
rel_clustering * rel_letters_count                         0.198317
rel_clustering * rel_orthographic_density                  0.870863
rel_clustering * rel_synonyms_count                        0.937487
rel_frequency * rel_letters_count                          0.065794
rel_frequency * rel_orthographic_density                   0.046880
rel_frequency * rel_synonyms_count                        -0.101270
rel_letters_count * rel_orthographic_density              -0.038616
rel_letters_count * rel_synonyms_count                    -0.276959
rel_orthographic_density * rel_synonyms_count             -1.188544
dtype: float64

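The fits with interactions use many more regressors than the additive ones, so part of their higher R^2 is mechanical. As a rough check that is not part of the original analysis, the sketch below applies the standard adjusted R^2 correction to the last two frequency fits above (relative destination, global and relative sources), assuming n = 1161 measures and 12 versus 78 non-intercept terms.

```python
def adjusted_r2(r2, n_samples, n_terms):
    """Standard adjusted R^2 for a linear model with an intercept.

    n_terms counts regressors excluding the intercept.
    """
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_terms - 1)

# Values copied from the printed output above.
n = 1161
r2_additive = 0.3692535743730183     # 12 source features, no interactions
r2_interactions = 0.41443170077609315  # 12 features + 66 pairwise products

adj_additive = adjusted_r2(r2_additive, n, 12)
adj_interactions = adjusted_r2(r2_interactions, n, 78)

print('additive:     {:.3f}'.format(adj_additive))
print('interactions: {:.3f}'.format(adj_interactions))
```

Under these assumptions the gap narrows to roughly 0.01 (about 0.37 against about 0.36), which suggests the interaction terms add little explanatory power beyond their parameter count.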
----------------------------------------------------------------------
Regressing global aoa with 1086 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.11042444978395527

intercept                      5.917103
global_aoa                     0.301035
global_clustering              0.117429
global_frequency              -0.041689
global_letters_count           0.066940
global_orthographic_density   -0.034636
global_synonyms_count         -0.046331
dtype: float64

Regressing global aoa with 1086 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.12474162706847791

intercept                                              5.099451
global_aoa                                             0.110553
global_clustering                                     -1.150214
global_frequency                                      -0.116595
global_letters_count                                  -0.086276
global_orthographic_density                           -1.681673
global_synonyms_count                                 -0.474458
global_aoa * global_clustering                         0.046734
global_aoa * global_frequency                         -0.000928
global_aoa * global_letters_count                      0.059523
global_aoa * global_orthographic_density               0.092020
global_aoa * global_synonyms_count                    -0.055852
global_clustering * global_frequency                   0.049257
global_clustering * global_letters_count               0.096779
global_clustering * global_orthographic_density       -0.015343
global_clustering * global_synonyms_count             -0.192919
global_frequency * global_letters_count                0.032605
global_frequency * global_orthographic_density         0.108969
global_frequency * global_synonyms_count              -0.001609
global_letters_count * global_orthographic_density    -0.021241
global_letters_count * global_synonyms_count          -0.033340
global_orthographic_density * global_synonyms_count   -0.095915
dtype: float64

Regressing rel aoa with 1086 measures, no interactions
           ^^^^^^^
R^2 = 0.02734019430685353

intercept                      1.540924
global_aoa                     0.113359
global_clustering              0.096268
global_frequency              -0.074244
global_letters_count           0.009886
global_orthographic_density    0.001196
global_synonyms_count         -0.023566
dtype: float64

Regressing rel aoa with 1086 measures, with interactions
           ^^^^^^^
R^2 = 0.04166972298223859

intercept                                             -2.951563
global_aoa                                             0.753858
global_clustering                                     -0.627347
global_frequency                                       0.170579
global_letters_count                                   0.011940
global_orthographic_density                           -1.087063
global_synonyms_count                                 -0.146830
global_aoa * global_clustering                         0.080687
global_aoa * global_frequency                         -0.021381
global_aoa * global_letters_count                     -0.003717
global_aoa * global_orthographic_density               0.033930
global_aoa * global_synonyms_count                    -0.048543
global_clustering * global_frequency                   0.041968
global_clustering * global_letters_count               0.011875
global_clustering * global_orthographic_density       -0.111982
global_clustering * global_synonyms_count             -0.332679
global_frequency * global_letters_count                0.017295
global_frequency * global_orthographic_density         0.048688
global_frequency * global_synonyms_count              -0.120083
global_letters_count * global_orthographic_density    -0.046651
global_letters_count * global_synonyms_count          -0.055914
global_orthographic_density * global_synonyms_count   -0.057033
dtype: float64

Regressing global aoa with 1086 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.043169063489390536

intercept                   6.717236
rel_aoa                     0.108811
rel_clustering              0.282065
rel_frequency               0.050940
rel_letters_count           0.036065
rel_orthographic_density   -0.318602
rel_synonyms_count          0.010470
dtype: float64

Regressing global aoa with 1086 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.06530021889360116

intercept                                        6.762380
rel_aoa                                         -0.104207
rel_clustering                                   0.157129
rel_frequency                                    0.078369
rel_letters_count                               -0.017182
rel_orthographic_density                        -0.403861
rel_synonyms_count                               0.001822
rel_aoa * rel_clustering                         0.053608
rel_aoa * rel_frequency                         -0.052520
rel_aoa * rel_letters_count                      0.032566
rel_aoa * rel_orthographic_density               0.030054
rel_aoa * rel_synonyms_count                    -0.091201
rel_clustering * rel_frequency                   0.045158
rel_clustering * rel_letters_count               0.112965
rel_clustering * rel_orthographic_density        0.032578
rel_clustering * rel_synonyms_count             -0.297173
rel_frequency * rel_letters_count               -0.009821
rel_frequency * rel_orthographic_density         0.007973
rel_frequency * rel_synonyms_count              -0.068908
rel_letters_count * rel_orthographic_density     0.039429
rel_letters_count * rel_synonyms_count           0.087044
rel_orthographic_density * rel_synonyms_count    0.257176
dtype: float64

Regressing rel aoa with 1086 measures, no interactions
           ^^^^^^^
R^2 = 0.17639193412244614

intercept                   0.719240
rel_aoa                     0.468731
rel_clustering             -0.027894
rel_frequency              -0.077250
rel_letters_count          -0.008894
rel_orthographic_density    0.086268
rel_synonyms_count          0.021489
dtype: float64

Regressing rel aoa with 1086 measures, with interactions
           ^^^^^^^
R^2 = 0.18709722810868834

intercept                                        0.894703
rel_aoa                                          0.476143
rel_clustering                                  -0.182487
rel_frequency                                   -0.026174
rel_letters_count                               -0.084925
rel_orthographic_density                         0.203873
rel_synonyms_count                               0.029552
rel_aoa * rel_clustering                         0.050398
rel_aoa * rel_frequency                          0.012720
rel_aoa * rel_letters_count                      0.003188
rel_aoa * rel_orthographic_density              -0.000646
rel_aoa * rel_synonyms_count                    -0.086332
rel_clustering * rel_frequency                  -0.004009
rel_clustering * rel_letters_count               0.066264
rel_clustering * rel_orthographic_density        0.072684
rel_clustering * rel_synonyms_count             -0.010434
rel_frequency * rel_letters_count               -0.007423
rel_frequency * rel_orthographic_density         0.058645
rel_frequency * rel_synonyms_count              -0.022428
rel_letters_count * rel_orthographic_density    -0.007990
rel_letters_count * rel_synonyms_count           0.102728
rel_orthographic_density * rel_synonyms_count    0.251364
dtype: float64

Regressing global aoa with 1086 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.13068865199926538

intercept                      4.526975
global_aoa                     0.444253
global_clustering              0.128537
global_frequency              -0.023633
global_letters_count           0.180919
global_orthographic_density    0.029905
global_synonyms_count         -0.323798
rel_aoa                       -0.223550
rel_clustering                -0.008152
rel_frequency                 -0.027559
rel_letters_count             -0.134118
rel_orthographic_density      -0.037309
rel_synonyms_count             0.317906
dtype: float64

Regressing global aoa with 1086 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.2047928284558056

intercept                                                 80.004426
global_aoa                                                -0.502588
global_clustering                                          9.535099
global_frequency                                          -1.819693
global_letters_count                                      -6.097286
global_orthographic_density                              -17.097317
global_synonyms_count                                     -6.566081
rel_aoa                                                    0.445974
rel_clustering                                            -9.769941
rel_frequency                                              2.883649
rel_letters_count                                          4.981123
rel_orthographic_density                                  12.642100
rel_synonyms_count                                         1.148984
global_aoa * global_clustering                            -0.097830
global_aoa * global_frequency                             -0.069599
global_aoa * global_letters_count                          0.126928
global_aoa * global_orthographic_density                   0.249417
global_aoa * global_synonyms_count                        -0.167028
global_aoa * rel_aoa                                       0.007543
global_aoa * rel_clustering                                0.056536
global_aoa * rel_frequency                                 0.019532
global_aoa * rel_letters_count                            -0.081008
global_aoa * rel_orthographic_density                     -0.166455
global_aoa * rel_synonyms_count                            0.332705
global_clustering * global_frequency                      -0.104908
global_clustering * global_letters_count                  -0.405232
global_clustering * global_orthographic_density           -2.458471
global_clustering * global_synonyms_count                 -0.601074
global_clustering * rel_aoa                                0.212612
global_clustering * rel_clustering                         0.063380
global_clustering * rel_frequency                          0.341507
global_clustering * rel_letters_count                      0.344333
global_clustering * rel_orthographic_density               2.120664
global_clustering * rel_synonyms_count                    -0.120949
global_frequency * global_letters_count                    0.294597
global_frequency * global_orthographic_density             0.132859
global_frequency * global_synonyms_count                   0.050053
global_frequency * rel_aoa                                 0.118297
global_frequency * rel_clustering                          0.102283
global_frequency * rel_frequency                          -0.034788
global_frequency * rel_letters_count                      -0.248400
global_frequency * rel_orthographic_density                0.028449
global_frequency * rel_synonyms_count                      0.274480
global_letters_count * global_orthographic_density        -0.128301
global_letters_count * global_synonyms_count               0.387268
global_letters_count * rel_aoa                            -0.064114
global_letters_count * rel_clustering                      0.622738
global_letters_count * rel_frequency                      -0.110288
global_letters_count * rel_letters_count                   0.025486
global_letters_count * rel_orthographic_density            0.117782
global_letters_count * rel_synonyms_count                 -0.569861
global_orthographic_density * global_synonyms_count        0.356481
global_orthographic_density * rel_aoa                     -0.180351
global_orthographic_density * rel_clustering               2.415342
global_orthographic_density * rel_frequency                0.026531
global_orthographic_density * rel_letters_count            0.105148
global_orthographic_density * rel_orthographic_density     0.263664
global_orthographic_density * rel_synonyms_count          -1.662629
global_synonyms_count * rel_aoa                           -0.163748
global_synonyms_count * rel_clustering                     0.853015
global_synonyms_count * rel_frequency                     -0.488988
global_synonyms_count * rel_letters_count                 -0.890907
global_synonyms_count * rel_orthographic_density          -1.180091
global_synonyms_count * rel_synonyms_count                -0.023774
rel_aoa * rel_clustering                                  -0.085515
rel_aoa * rel_frequency                                   -0.076680
rel_aoa * rel_letters_count                                0.038750
rel_aoa * rel_orthographic_density                         0.143061
rel_aoa * rel_synonyms_count                              -0.070301
rel_clustering * rel_frequency                            -0.331874
rel_clustering * rel_letters_count                        -0.393456
rel_clustering * rel_orthographic_density                 -1.837171
rel_clustering * rel_synonyms_count                       -0.374081
rel_frequency * rel_letters_count                          0.069129
rel_frequency * rel_orthographic_density                  -0.065803
rel_frequency * rel_synonyms_count                         0.158881
rel_letters_count * rel_orthographic_density               0.035935
rel_letters_count * rel_synonyms_count                     1.011138
rel_orthographic_density * rel_synonyms_count              2.515135
dtype: float64

Regressing rel aoa with 1086 measures, no interactions
           ^^^^^^^
R^2 = 0.2245166577889857

intercept                      2.942644
global_aoa                    -0.378622
global_clustering              0.063177
global_frequency              -0.009426
global_letters_count           0.117375
global_orthographic_density   -0.020673
global_synonyms_count         -0.115429
rel_aoa                        0.746909
rel_clustering                 0.017197
rel_frequency                 -0.043170
rel_letters_count             -0.087305
rel_orthographic_density      -0.056238
rel_synonyms_count             0.098431
dtype: float64

Regressing rel aoa with 1086 measures, with interactions
           ^^^^^^^
R^2 = 0.2767373036358838

intercept                                                 56.355743
global_aoa                                                -1.010120
global_clustering                                          6.668357
global_frequency                                          -1.274900
global_letters_count                                      -4.007846
global_orthographic_density                              -13.320716
global_synonyms_count                                     -5.019890
rel_aoa                                                    1.762654
rel_clustering                                            -5.987591
rel_frequency                                              2.229802
rel_letters_count                                          3.967092
rel_orthographic_density                                  11.174220
rel_synonyms_count                                         0.997056
global_aoa * global_clustering                            -0.102541
global_aoa * global_frequency                             -0.069998
global_aoa * global_letters_count                          0.058670
global_aoa * global_orthographic_density                   0.223225
global_aoa * global_synonyms_count                        -0.073456
global_aoa * rel_aoa                                      -0.009345
global_aoa * rel_clustering                                0.013576
global_aoa * rel_frequency                                 0.019631
global_aoa * rel_letters_count                            -0.044220
global_aoa * rel_orthographic_density                     -0.202904
global_aoa * rel_synonyms_count                            0.224801
global_clustering * global_frequency                      -0.068231
global_clustering * global_letters_count                  -0.248368
global_clustering * global_orthographic_density           -1.733776
global_clustering * global_synonyms_count                 -0.650047
global_clustering * rel_aoa                                0.224254
global_clustering * rel_clustering                         0.011399
global_clustering * rel_frequency                          0.239000
global_clustering * rel_letters_count                      0.257179
global_clustering * rel_orthographic_density               1.504348
global_clustering * rel_synonyms_count                     0.147917
global_frequency * global_letters_count                    0.220747
global_frequency * global_orthographic_density             0.176172
global_frequency * global_synonyms_count                  -0.070249
global_frequency * rel_aoa                                 0.097894
global_frequency * rel_clustering                         -0.005341
global_frequency * rel_frequency                          -0.020450
global_frequency * rel_letters_count                      -0.205531
global_frequency * rel_orthographic_density               -0.120481
global_frequency * rel_synonyms_count                      0.321142
global_letters_count * global_orthographic_density        -0.095246
global_letters_count * global_synonyms_count               0.250253
global_letters_count * rel_aoa                            -0.051021
global_letters_count * rel_clustering                      0.440114
global_letters_count * rel_frequency                      -0.098227
global_letters_count * rel_letters_count                   0.012442
global_letters_count * rel_orthographic_density            0.072611
global_letters_count * rel_synonyms_count                 -0.414545
global_orthographic_density * global_synonyms_count        0.289502
global_orthographic_density * rel_aoa                     -0.172761
global_orthographic_density * rel_clustering               1.685504
global_orthographic_density * rel_frequency               -0.112267
global_orthographic_density * rel_letters_count            0.023477
global_orthographic_density * rel_orthographic_density     0.152885
global_orthographic_density * rel_synonyms_count          -1.236964
global_synonyms_count * rel_aoa                           -0.170261
global_synonyms_count * rel_clustering                     0.861191
global_synonyms_count * rel_frequency                     -0.240078
global_synonyms_count * rel_letters_count                 -0.576242
global_synonyms_count * rel_orthographic_density          -0.917183
global_synonyms_count * rel_synonyms_count                -0.022376
rel_aoa * rel_clustering                                  -0.057415
rel_aoa * rel_frequency                                   -0.051890
rel_aoa * rel_letters_count                                0.040309
rel_aoa * rel_orthographic_density                         0.158610
rel_aoa * rel_synonyms_count                              -0.019836
rel_clustering * rel_frequency                            -0.142301
rel_clustering * rel_letters_count                        -0.308743
rel_clustering * rel_orthographic_density                 -1.340928
rel_clustering * rel_synonyms_count                       -0.557254
rel_frequency * rel_letters_count                          0.075356
rel_frequency * rel_orthographic_density                   0.113721
rel_frequency * rel_synonyms_count                        -0.030809
rel_letters_count * rel_orthographic_density               0.067712
rel_letters_count * rel_synonyms_count                     0.689365
rel_orthographic_density * rel_synonyms_count              1.942932
dtype: float64
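
The "with interactions" fits above list one coefficient per feature plus one per unordered pair of features (12 main terms and 66 pairwise products when both global and rel features are used). The code that produced these fits is not shown in this output, so the following is only a minimal sketch of how such a model could be built, assuming ordinary least squares on a design matrix extended with pairwise products. The helpers `design_matrix` and `fit_ols` and the synthetic data are hypothetical and used purely for illustration.

```python
from itertools import combinations

import numpy as np


def design_matrix(X, names, interactions=False):
    """Return (matrix, labels): intercept, main terms, optional pairwise products."""
    columns = [np.ones(len(X))]
    labels = ['intercept']
    for j, name in enumerate(names):
        columns.append(X[:, j])
        labels.append(name)
    if interactions:
        # One product column per unordered pair of distinct features.
        for i, j in combinations(range(len(names)), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    return np.column_stack(columns), labels


def fit_ols(X, y, names, interactions=False):
    """Least-squares fit; returns ({label: coefficient}, R^2)."""
    A, labels = design_matrix(X, names, interactions)
    coefs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coefs
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return dict(zip(labels, coefs)), r2


# Synthetic data (an assumption for illustration, not the notebook's data).
rng = np.random.RandomState(0)
names = ['global_aoa', 'global_clustering', 'global_frequency']
X = rng.normal(size=(200, 3))
y = (1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.1, size=200))

coefs_plain, r2_plain = fit_ols(X, y, names)
coefs_inter, r2_inter = fit_ols(X, y, names, interactions=True)
print(r2_plain, r2_inter)
```

Because the model without interactions is nested in the model with interactions, the in-sample R^2 of the latter can only be equal or higher, as in the aoa fits above (for example 0.2245 without interactions against 0.2767 with them). That increase alone does not show that the interaction terms are meaningful.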

----------------------------------------------------------------------
Regressing global clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.09273241181178615

intercept                     -3.572419
global_aoa                    -0.013346
global_clustering              0.280722
global_frequency              -0.042476
global_letters_count          -0.007819
global_orthographic_density   -0.023686
global_synonyms_count         -0.053570
dtype: float64

Regressing global clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.16416738657559538

intercept                                             -5.892083
global_aoa                                             0.285347
global_clustering                                     -0.265641
global_frequency                                      -0.492829
global_letters_count                                   0.422474
global_orthographic_density                            0.696611
global_synonyms_count                                  0.126070
global_aoa * global_clustering                         0.072282
global_aoa * global_frequency                          0.013287
global_aoa * global_letters_count                      0.001177
global_aoa * global_orthographic_density              -0.001105
global_aoa * global_synonyms_count                    -0.003658
global_clustering * global_frequency                  -0.054655
global_clustering * global_letters_count               0.071411
global_clustering * global_orthographic_density        0.108784
global_clustering * global_synonyms_count              0.012029
global_frequency * global_letters_count                0.001362
global_frequency * global_orthographic_density         0.010532
global_frequency * global_synonyms_count               0.032782
global_letters_count * global_orthographic_density    -0.022372
global_letters_count * global_synonyms_count          -0.034847
global_orthographic_density * global_synonyms_count   -0.115626
dtype: float64

Regressing rel clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.06759459467158024

intercept                      2.144134
global_aoa                    -0.008230
global_clustering              0.253508
global_frequency              -0.020647
global_letters_count          -0.003196
global_orthographic_density    0.000667
global_synonyms_count         -0.070109
dtype: float64

Regressing rel clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.1296743809722617

intercept                                              0.425859
global_aoa                                             0.230375
global_clustering                                     -0.347209
global_frequency                                      -0.516192
global_letters_count                                   0.282996
global_orthographic_density                            0.742227
global_synonyms_count                                  0.364057
global_aoa * global_clustering                         0.066420
global_aoa * global_frequency                          0.015697
global_aoa * global_letters_count                      0.002866
global_aoa * global_orthographic_density              -0.001291
global_aoa * global_synonyms_count                    -0.018130
global_clustering * global_frequency                  -0.047426
global_clustering * global_letters_count               0.068093
global_clustering * global_orthographic_density        0.125393
global_clustering * global_synonyms_count              0.066626
global_frequency * global_letters_count                0.012442
global_frequency * global_orthographic_density         0.018557
global_frequency * global_synonyms_count               0.022551
global_letters_count * global_orthographic_density    -0.022832
global_letters_count * global_synonyms_count           0.001257
global_orthographic_density * global_synonyms_count   -0.086897
dtype: float64

Regressing global clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.05902768440447648

intercept                  -5.909058
rel_aoa                    -0.001316
rel_clustering              0.244918
rel_frequency              -0.014025
rel_letters_count          -0.001904
rel_orthographic_density   -0.006565
rel_synonyms_count         -0.075734
dtype: float64

Regressing global clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.08990619738361405

intercept                                       -5.880673
rel_aoa                                         -0.017865
rel_clustering                                   0.070834
rel_frequency                                   -0.020923
rel_letters_count                               -0.040461
rel_orthographic_density                        -0.016208
rel_synonyms_count                              -0.170552
rel_aoa * rel_clustering                         0.052485
rel_aoa * rel_frequency                          0.001376
rel_aoa * rel_letters_count                     -0.005484
rel_aoa * rel_orthographic_density              -0.009128
rel_aoa * rel_synonyms_count                     0.006739
rel_clustering * rel_frequency                  -0.010628
rel_clustering * rel_letters_count               0.047374
rel_clustering * rel_orthographic_density        0.015919
rel_clustering * rel_synonyms_count              0.051853
rel_frequency * rel_letters_count               -0.002698
rel_frequency * rel_orthographic_density        -0.011804
rel_frequency * rel_synonyms_count              -0.033276
rel_letters_count * rel_orthographic_density    -0.013181
rel_letters_count * rel_synonyms_count          -0.005683
rel_orthographic_density * rel_synonyms_count    0.009254
dtype: float64

Regressing rel clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.21731944458987607

intercept                   0.221013
rel_aoa                    -0.020294
rel_clustering              0.492326
rel_frequency               0.001752
rel_letters_count           0.012591
rel_orthographic_density    0.015749
rel_synonyms_count         -0.026227
dtype: float64

Regressing rel clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.24638431132408667

intercept                                        0.236605
rel_aoa                                         -0.022916
rel_clustering                                   0.344555
rel_frequency                                   -0.011912
rel_letters_count                               -0.024333
rel_orthographic_density                        -0.015333
rel_synonyms_count                              -0.146960
rel_aoa * rel_clustering                         0.045692
rel_aoa * rel_frequency                          0.005108
rel_aoa * rel_letters_count                     -0.008202
rel_aoa * rel_orthographic_density              -0.014303
rel_aoa * rel_synonyms_count                     0.013690
rel_clustering * rel_frequency                  -0.012786
rel_clustering * rel_letters_count               0.047402
rel_clustering * rel_orthographic_density        0.038497
rel_clustering * rel_synonyms_count             -0.004329
rel_frequency * rel_letters_count               -0.003122
rel_frequency * rel_orthographic_density        -0.019838
rel_frequency * rel_synonyms_count              -0.043158
rel_letters_count * rel_orthographic_density    -0.013249
rel_letters_count * rel_synonyms_count           0.016356
rel_orthographic_density * rel_synonyms_count    0.054079
dtype: float64

Regressing global clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.10173575643038746

intercept                     -2.479430
global_aoa                    -0.019000
global_clustering              0.275005
global_frequency              -0.099950
global_letters_count          -0.072864
global_orthographic_density   -0.103528
global_synonyms_count          0.022411
rel_aoa                        0.003175
rel_clustering                 0.012891
rel_frequency                  0.063658
rel_letters_count              0.066495
rel_orthographic_density       0.079207
rel_synonyms_count            -0.096857
dtype: float64

Regressing global clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.2301864340813674

intercept                                                 16.168919
global_aoa                                                -0.137249
global_clustering                                          3.705904
global_frequency                                          -1.907403
global_letters_count                                       0.279415
global_orthographic_density                                1.397808
global_synonyms_count                                     -2.642020
rel_aoa                                                   -0.054513
rel_clustering                                            -3.964612
rel_frequency                                              1.187820
rel_letters_count                                         -0.139937
rel_orthographic_density                                  -1.870585
rel_synonyms_count                                         0.920532
global_aoa * global_clustering                             0.009452
global_aoa * global_frequency                              0.033171
global_aoa * global_letters_count                         -0.013165
global_aoa * global_orthographic_density                  -0.062247
global_aoa * global_synonyms_count                        -0.011721
global_aoa * rel_aoa                                       0.006961
global_aoa * rel_clustering                                0.098548
global_aoa * rel_frequency                                -0.005871
global_aoa * rel_letters_count                             0.021110
global_aoa * rel_orthographic_density                      0.066952
global_aoa * rel_synonyms_count                            0.070497
global_clustering * global_frequency                      -0.301868
global_clustering * global_letters_count                  -0.028058
global_clustering * global_orthographic_density            0.024230
global_clustering * global_synonyms_count                 -0.164589
global_clustering * rel_aoa                               -0.002327
global_clustering * rel_clustering                        -0.082859
global_clustering * rel_frequency                          0.218705
global_clustering * rel_letters_count                      0.074934
global_clustering * rel_orthographic_density               0.030292
global_clustering * rel_synonyms_count                     0.263479
global_frequency * global_letters_count                   -0.036774
global_frequency * global_orthographic_density            -0.057268
global_frequency * global_synonyms_count                   0.207753
global_frequency * rel_aoa                                -0.023985
global_frequency * rel_clustering                          0.196741
global_frequency * rel_frequency                           0.006302
global_frequency * rel_letters_count                       0.052909
global_frequency * rel_orthographic_density                0.112293
global_frequency * rel_synonyms_count                     -0.042143
global_letters_count * global_orthographic_density        -0.014114
global_letters_count * global_synonyms_count               0.029626
global_letters_count * rel_aoa                             0.007946
global_letters_count * rel_clustering                      0.091260
global_letters_count * rel_frequency                       0.030757
global_letters_count * rel_letters_count                   0.006391
global_letters_count * rel_orthographic_density            0.053403
global_letters_count * rel_synonyms_count                  0.002666
global_orthographic_density * global_synonyms_count       -0.344140
global_orthographic_density * rel_aoa                      0.113068
global_orthographic_density * rel_clustering               0.113085
global_orthographic_density * rel_frequency                0.057884
global_orthographic_density * rel_letters_count           -0.035499
global_orthographic_density * rel_orthographic_density     0.090436
global_orthographic_density * rel_synonyms_count           0.294156
global_synonyms_count * rel_aoa                            0.004345
global_synonyms_count * rel_clustering                    -0.102484
global_synonyms_count * rel_frequency                     -0.178151
global_synonyms_count * rel_letters_count                 -0.162460
global_synonyms_count * rel_orthographic_density          -0.076274
global_synonyms_count * rel_synonyms_count                 0.041540
rel_aoa * rel_clustering                                  -0.034260
rel_aoa * rel_frequency                                    0.011877
rel_aoa * rel_letters_count                               -0.022763
rel_aoa * rel_orthographic_density                        -0.119590
rel_aoa * rel_synonyms_count                              -0.062321
rel_clustering * rel_frequency                            -0.168747
rel_clustering * rel_letters_count                        -0.061062
rel_clustering * rel_orthographic_density                 -0.044036
rel_clustering * rel_synonyms_count                        0.002552
rel_frequency * rel_letters_count                         -0.049659
rel_frequency * rel_orthographic_density                  -0.102375
rel_frequency * rel_synonyms_count                         0.036062
rel_letters_count * rel_orthographic_density               0.023330
rel_letters_count * rel_synonyms_count                     0.123530
rel_orthographic_density * rel_synonyms_count              0.109680
dtype: float64

Regressing rel clustering with 963 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.2846576682124845

intercept                     -1.398934
global_aoa                    -0.011406
global_clustering             -0.479491
global_frequency              -0.078326
global_letters_count          -0.062189
global_orthographic_density   -0.064413
global_synonyms_count         -0.028234
rel_aoa                       -0.007237
rel_clustering                 0.881652
rel_frequency                  0.052786
rel_letters_count              0.065124
rel_orthographic_density       0.053326
rel_synonyms_count            -0.032315
dtype: float64

Regressing rel clustering with 963 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.3782221872825321

intercept                                                 13.089291
global_aoa                                                -0.121379
global_clustering                                          2.127325
global_frequency                                          -1.457275
global_letters_count                                       0.110758
global_orthographic_density                                1.193421
global_synonyms_count                                     -1.848777
rel_aoa                                                   -0.016340
rel_clustering                                            -2.596261
rel_frequency                                              0.889736
rel_letters_count                                         -0.204275
rel_orthographic_density                                  -1.682865
rel_synonyms_count                                        -0.146934
global_aoa * global_clustering                            -0.002742
global_aoa * global_frequency                              0.024256
global_aoa * global_letters_count                         -0.011337
global_aoa * global_orthographic_density                  -0.062832
global_aoa * global_synonyms_count                        -0.039869
global_aoa * rel_aoa                                       0.007668
global_aoa * rel_clustering                                0.086338
global_aoa * rel_frequency                                -0.003973
global_aoa * rel_letters_count                             0.023217
global_aoa * rel_orthographic_density                      0.065079
global_aoa * rel_synonyms_count                            0.094727
global_clustering * global_frequency                      -0.220106
global_clustering * global_letters_count                  -0.028938
global_clustering * global_orthographic_density            0.051742
global_clustering * global_synonyms_count                 -0.231673
global_clustering * rel_aoa                               -0.000569
global_clustering * rel_clustering                        -0.116594
global_clustering * rel_frequency                          0.164309
global_clustering * rel_letters_count                      0.055032
global_clustering * rel_orthographic_density              -0.001883
global_clustering * rel_synonyms_count                     0.286299
global_frequency * global_letters_count                   -0.021596
global_frequency * global_orthographic_density            -0.021066
global_frequency * global_synonyms_count                   0.116951
global_frequency * rel_aoa                                -0.022317
global_frequency * rel_clustering                          0.142206
global_frequency * rel_frequency                           0.005801
global_frequency * rel_letters_count                       0.044267
global_frequency * rel_orthographic_density                0.079214
global_frequency * rel_synonyms_count                      0.039652
global_letters_count * global_orthographic_density        -0.010408
global_letters_count * global_synonyms_count               0.007225
global_letters_count * rel_aoa                             0.005431
global_letters_count * rel_clustering                      0.104113
global_letters_count * rel_frequency                       0.025100
global_letters_count * rel_letters_count                   0.004366
global_letters_count * rel_orthographic_density            0.043117
global_letters_count * rel_synonyms_count                  0.044035
global_orthographic_density * global_synonyms_count       -0.364419
global_orthographic_density * rel_aoa                      0.097767
global_orthographic_density * rel_clustering               0.055634
global_orthographic_density * rel_frequency                0.037553
global_orthographic_density * rel_letters_count           -0.024893
global_orthographic_density * rel_orthographic_density     0.085185
global_orthographic_density * rel_synonyms_count           0.326084
global_synonyms_count * rel_aoa                            0.010002
global_synonyms_count * rel_clustering                     0.017482
global_synonyms_count * rel_frequency                     -0.116725
global_synonyms_count * rel_letters_count                 -0.098267
global_synonyms_count * rel_orthographic_density           0.009377
global_synonyms_count * rel_synonyms_count                 0.045138
rel_aoa * rel_clustering                                  -0.026935
rel_aoa * rel_frequency                                    0.015069
rel_aoa * rel_letters_count                               -0.021462
rel_aoa * rel_orthographic_density                        -0.098876
rel_aoa * rel_synonyms_count                              -0.057953
rel_clustering * rel_frequency                            -0.132928
rel_clustering * rel_letters_count                        -0.058571
rel_clustering * rel_orthographic_density                  0.007769
rel_clustering * rel_synonyms_count                       -0.099991
rel_frequency * rel_letters_count                         -0.043805
rel_frequency * rel_orthographic_density                  -0.077949
rel_frequency * rel_synonyms_count                        -0.030048
rel_letters_count * rel_orthographic_density               0.017035
rel_letters_count * rel_synonyms_count                     0.047406
rel_orthographic_density * rel_synonyms_count              0.019833
dtype: float64

----------------------------------------------------------------------
Regressing global letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13464868482973624

intercept                      3.868800
global_aoa                     0.005179
global_clustering             -0.026663
global_frequency               0.049935
global_letters_count           0.367401
global_orthographic_density   -0.130362
global_synonyms_count         -0.329013
dtype: float64

Regressing global letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.14924866700013062

intercept                                              6.065154
global_aoa                                            -0.114311
global_clustering                                     -0.622126
global_frequency                                      -0.026116
global_letters_count                                  -0.190613
global_orthographic_density                           -2.250155
global_synonyms_count                                 -1.366575
global_aoa * global_clustering                         0.074559
global_aoa * global_frequency                          0.041656
global_aoa * global_letters_count                      0.018667
global_aoa * global_orthographic_density               0.064984
global_aoa * global_synonyms_count                    -0.007252
global_clustering * global_frequency                   0.067059
global_clustering * global_letters_count              -0.086281
global_clustering * global_orthographic_density       -0.087309
global_clustering * global_synonyms_count              0.165260
global_frequency * global_letters_count               -0.009503
global_frequency * global_orthographic_density         0.142950
global_frequency * global_synonyms_count               0.131270
global_letters_count * global_orthographic_density    -0.044474
global_letters_count * global_synonyms_count           0.103967
global_orthographic_density * global_synonyms_count    0.209858
dtype: float64

Regressing rel letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.07372803029266195

intercept                      1.127398
global_aoa                    -0.050179
global_clustering             -0.061754
global_frequency               0.015716
global_letters_count           0.275244
global_orthographic_density   -0.107589
global_synonyms_count         -0.426139
dtype: float64

Regressing rel letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0881426982780541

intercept                                             -1.311378
global_aoa                                             0.231527
global_clustering                                     -1.044509
global_frequency                                       0.099757
global_letters_count                                   0.139002
global_orthographic_density                           -1.981233
global_synonyms_count                                 -1.267812
global_aoa * global_clustering                         0.092201
global_aoa * global_frequency                          0.027114
global_aoa * global_letters_count                     -0.005829
global_aoa * global_orthographic_density               0.051459
global_aoa * global_synonyms_count                    -0.029370
global_clustering * global_frequency                   0.076026
global_clustering * global_letters_count              -0.046579
global_clustering * global_orthographic_density       -0.090320
global_clustering * global_synonyms_count              0.064510
global_frequency * global_letters_count               -0.007581
global_frequency * global_orthographic_density         0.139075
global_frequency * global_synonyms_count               0.075204
global_letters_count * global_orthographic_density    -0.068420
global_letters_count * global_synonyms_count           0.083807
global_orthographic_density * global_synonyms_count    0.194010
dtype: float64

Regressing global letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10014936803851404

intercept                   5.739292
rel_aoa                    -0.083904
rel_clustering              0.156329
rel_frequency               0.085235
rel_letters_count           0.282338
rel_orthographic_density   -0.305044
rel_synonyms_count         -0.236031
dtype: float64

Regressing global letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10731561801158851

intercept                                        5.790251
rel_aoa                                         -0.132966
rel_clustering                                   0.042902
rel_frequency                                    0.151388
rel_letters_count                                0.311857
rel_orthographic_density                        -0.365097
rel_synonyms_count                              -0.093778
rel_aoa * rel_clustering                         0.044099
rel_aoa * rel_frequency                         -0.010114
rel_aoa * rel_letters_count                      0.001081
rel_aoa * rel_orthographic_density              -0.008207
rel_aoa * rel_synonyms_count                    -0.031679
rel_clustering * rel_frequency                  -0.052588
rel_clustering * rel_letters_count              -0.038842
rel_clustering * rel_orthographic_density       -0.013929
rel_clustering * rel_synonyms_count             -0.036197
rel_frequency * rel_letters_count               -0.017695
rel_frequency * rel_orthographic_density         0.004064
rel_frequency * rel_synonyms_count               0.068598
rel_letters_count * rel_orthographic_density     0.046646
rel_letters_count * rel_synonyms_count           0.130541
rel_orthographic_density * rel_synonyms_count    0.207306
dtype: float64

Regressing rel letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.16428016873386375

intercept                   1.390173
rel_aoa                    -0.089815
rel_clustering              0.027498
rel_frequency              -0.116572
rel_letters_count           0.458840
rel_orthographic_density   -0.014511
rel_synonyms_count         -0.290205
dtype: float64

Regressing rel letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.1757625424107071

intercept                                        1.389649
rel_aoa                                         -0.071891
rel_clustering                                  -0.123015
rel_frequency                                   -0.080001
rel_letters_count                                0.565899
rel_orthographic_density                         0.006395
rel_synonyms_count                              -0.106951
rel_aoa * rel_clustering                         0.076486
rel_aoa * rel_frequency                          0.015692
rel_aoa * rel_letters_count                     -0.038527
rel_aoa * rel_orthographic_density              -0.085360
rel_aoa * rel_synonyms_count                    -0.028960
rel_clustering * rel_frequency                  -0.040528
rel_clustering * rel_letters_count               0.004624
rel_clustering * rel_orthographic_density        0.053957
rel_clustering * rel_synonyms_count              0.008827
rel_frequency * rel_letters_count                0.001564
rel_frequency * rel_orthographic_density         0.040156
rel_frequency * rel_synonyms_count               0.075492
rel_letters_count * rel_orthographic_density     0.059378
rel_letters_count * rel_synonyms_count           0.131896
rel_orthographic_density * rel_synonyms_count    0.231840
dtype: float64

Regressing global letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.14992376786500572

intercept                     -0.371404
global_aoa                     0.112022
global_clustering             -0.263241
global_frequency               0.172855
global_letters_count           0.534892
global_orthographic_density   -0.084574
global_synonyms_count         -0.454033
rel_aoa                       -0.164857
rel_clustering                 0.274726
rel_frequency                 -0.151584
rel_letters_count             -0.181387
rel_orthographic_density      -0.017714
rel_synonyms_count             0.162674
dtype: float64

Regressing global letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.2078382818896235

intercept                                                 14.313714
global_aoa                                                -0.165250
global_clustering                                         -0.040403
global_frequency                                          -0.552253
global_letters_count                                      -0.649663
global_orthographic_density                               -5.842305
global_synonyms_count                                     -0.968744
rel_aoa                                                   -2.048968
rel_clustering                                            -3.326721
rel_frequency                                              1.147516
rel_letters_count                                          1.171699
rel_orthographic_density                                   1.288898
rel_synonyms_count                                        -1.512449
global_aoa * global_clustering                             0.045658
global_aoa * global_frequency                             -0.019652
global_aoa * global_letters_count                          0.034651
global_aoa * global_orthographic_density                   0.281004
global_aoa * global_synonyms_count                        -0.049377
global_aoa * rel_aoa                                       0.008312
global_aoa * rel_clustering                               -0.011926
global_aoa * rel_frequency                                -0.022045
global_aoa * rel_letters_count                            -0.041641
global_aoa * rel_orthographic_density                     -0.197503
global_aoa * rel_synonyms_count                            0.032352
global_clustering * global_frequency                       0.058797
global_clustering * global_letters_count                   0.048955
global_clustering * global_orthographic_density           -0.445630
global_clustering * global_synonyms_count                 -0.176482
global_clustering * rel_aoa                                0.120032
global_clustering * rel_clustering                         0.017183
global_clustering * rel_frequency                          0.060913
global_clustering * rel_letters_count                     -0.139894
global_clustering * rel_orthographic_density               0.289566
global_clustering * rel_synonyms_count                     0.164933
global_frequency * global_letters_count                    0.163321
global_frequency * global_orthographic_density             0.248677
global_frequency * global_synonyms_count                   0.068594
global_frequency * rel_aoa                                 0.214621
global_frequency * rel_clustering                          0.204987
global_frequency * rel_frequency                          -0.042583
global_frequency * rel_letters_count                      -0.208615
global_frequency * rel_orthographic_density                0.014146
global_frequency * rel_synonyms_count                      0.063440
global_letters_count * global_orthographic_density        -0.233863
global_letters_count * global_synonyms_count              -0.049565
global_letters_count * rel_aoa                             0.130573
global_letters_count * rel_clustering                      0.075054
global_letters_count * rel_frequency                      -0.026554
global_letters_count * rel_letters_count                   0.030346
global_letters_count * rel_orthographic_density            0.205283
global_letters_count * rel_synonyms_count                  0.259859
global_orthographic_density * global_synonyms_count       -0.178378
global_orthographic_density * rel_aoa                     -0.129861
global_orthographic_density * rel_clustering               0.295112
global_orthographic_density * rel_frequency               -0.125126
global_orthographic_density * rel_letters_count            0.115547
global_orthographic_density * rel_orthographic_density     0.198230
global_orthographic_density * rel_synonyms_count           0.319651
global_synonyms_count * rel_aoa                            0.008380
global_synonyms_count * rel_clustering                     0.688583
global_synonyms_count * rel_frequency                     -0.076500
global_synonyms_count * rel_letters_count                 -0.369970
global_synonyms_count * rel_orthographic_density          -0.113242
global_synonyms_count * rel_synonyms_count                -0.142667
rel_aoa * rel_clustering                                  -0.031549
rel_aoa * rel_frequency                                   -0.140231
rel_aoa * rel_letters_count                               -0.146251
rel_aoa * rel_orthographic_density                         0.070841
rel_aoa * rel_synonyms_count                               0.048122
rel_clustering * rel_frequency                            -0.292377
rel_clustering * rel_letters_count                        -0.039784
rel_clustering * rel_orthographic_density                 -0.089748
rel_clustering * rel_synonyms_count                       -0.525958
rel_frequency * rel_letters_count                          0.039752
rel_frequency * rel_orthographic_density                  -0.019725
rel_frequency * rel_synonyms_count                         0.045557
rel_letters_count * rel_orthographic_density               0.004185
rel_letters_count * rel_synonyms_count                     0.232338
rel_orthographic_density * rel_synonyms_count              0.271661
dtype: float64

Regressing rel letters_count with 1161 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.2070882279762809

intercept                     -0.800120
global_aoa                     0.084084
global_clustering             -0.275563
global_frequency               0.174291
global_letters_count          -0.387582
global_orthographic_density   -0.050206
global_synonyms_count         -0.459561
rel_aoa                       -0.130381
rel_clustering                 0.284193
rel_frequency                 -0.170865
rel_letters_count              0.762139
rel_orthographic_density      -0.078341
rel_synonyms_count             0.178487
dtype: float64

Regressing rel letters_count with 1161 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.26367368809199

intercept                                                 4.799281
global_aoa                                               -0.533710
global_clustering                                        -1.719301
global_frequency                                         -0.296642
global_letters_count                                     -0.607885
global_orthographic_density                              -4.787324
global_synonyms_count                                     0.546874
rel_aoa                                                  -1.530335
rel_clustering                                           -2.253588
rel_frequency                                             0.750303
rel_letters_count                                         1.268324
rel_orthographic_density                                 -0.033278
rel_synonyms_count                                       -3.387595
global_aoa * global_clustering                            0.011725
global_aoa * global_frequency                            -0.005177
global_aoa * global_letters_count                         0.047162
global_aoa * global_orthographic_density                  0.240214
global_aoa * global_synonyms_count                       -0.088268
global_aoa * rel_aoa                                      0.005042
global_aoa * rel_clustering                               0.033673
global_aoa * rel_frequency                               -0.037273
global_aoa * rel_letters_count                           -0.045343
global_aoa * rel_orthographic_density                    -0.151784
global_aoa * rel_synonyms_count                           0.039792
global_clustering * global_frequency                      0.115043
global_clustering * global_letters_count                  0.196795
global_clustering * global_orthographic_density          -0.205475
global_clustering * global_synonyms_count                -0.102401
global_clustering * rel_aoa                               0.144829
global_clustering * rel_clustering                        0.020229
global_clustering * rel_frequency                        -0.011821
global_clustering * rel_letters_count                    -0.271301
global_clustering * rel_orthographic_density              0.025831
global_clustering * rel_synonyms_count                    0.036934
global_frequency * global_letters_count                   0.139043
global_frequency * global_orthographic_density            0.294480
global_frequency * global_synonyms_count                  0.010924
global_frequency * rel_aoa                                0.185929
global_frequency * rel_clustering                         0.158015
global_frequency * rel_frequency                         -0.044771
global_frequency * rel_letters_count                     -0.177651
global_frequency * rel_orthographic_density              -0.029638
global_frequency * rel_synonyms_count                     0.139812
global_letters_count * global_orthographic_density       -0.167125
global_letters_count * global_synonyms_count             -0.053368
global_letters_count * rel_aoa                            0.102925
global_letters_count * rel_clustering                    -0.020900
global_letters_count * rel_frequency                     -0.006169
global_letters_count * rel_letters_count                  0.022342
global_letters_count * rel_orthographic_density           0.138776
global_letters_count * rel_synonyms_count                 0.270902
global_orthographic_density * global_synonyms_count      -0.245086
global_orthographic_density * rel_aoa                    -0.060372
global_orthographic_density * rel_clustering              0.145894
global_orthographic_density * rel_frequency              -0.149326
global_orthographic_density * rel_letters_count           0.022024
global_orthographic_density * rel_orthographic_density    0.194100
global_orthographic_density * rel_synonyms_count          0.377701
global_synonyms_count * rel_aoa                           0.072126
global_synonyms_count * rel_clustering                    0.639488
global_synonyms_count * rel_frequency                    -0.020558
global_synonyms_count * rel_letters_count                -0.353183
global_synonyms_count * rel_orthographic_density          0.013567
global_synonyms_count * rel_synonyms_count               -0.136413
rel_aoa * rel_clustering                                 -0.070950
rel_aoa * rel_frequency                                  -0.108717
rel_aoa * rel_letters_count                              -0.129748
rel_aoa * rel_orthographic_density                       -0.000367
rel_aoa * rel_synonyms_count                              0.025239
rel_clustering * rel_frequency                           -0.237793
rel_clustering * rel_letters_count                        0.049379
rel_clustering * rel_orthographic_density                 0.086401
rel_clustering * rel_synonyms_count                      -0.445175
rel_frequency * rel_letters_count                         0.015131
rel_frequency * rel_orthographic_density                 -0.006348
rel_frequency * rel_synonyms_count                       -0.029033
rel_letters_count * rel_orthographic_density              0.104979
rel_letters_count * rel_synonyms_count                    0.206408
rel_orthographic_density * rel_synonyms_count             0.158806
dtype: float64

----------------------------------------------------------------------
Regressing global synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05542125347208415

intercept                      0.660080
global_aoa                    -0.018401
global_clustering              0.026324
global_frequency              -0.013268
global_letters_count           0.002850
global_orthographic_density    0.011771
global_synonyms_count          0.233325
dtype: float64

Regressing global synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06756013806064609

intercept                                             -0.480548
global_aoa                                             0.092825
global_clustering                                     -0.045982
global_frequency                                       0.017273
global_letters_count                                   0.152103
global_orthographic_density                           -0.006127
global_synonyms_count                                  0.170237
global_aoa * global_clustering                         0.009591
global_aoa * global_frequency                         -0.001960
global_aoa * global_letters_count                     -0.007555
global_aoa * global_orthographic_density               0.000892
global_aoa * global_synonyms_count                     0.026366
global_clustering * global_frequency                  -0.003594
global_clustering * global_letters_count               0.007866
global_clustering * global_orthographic_density       -0.005040
global_clustering * global_synonyms_count              0.036854
global_frequency * global_letters_count               -0.005218
global_frequency * global_orthographic_density         0.001604
global_frequency * global_synonyms_count              -0.014046
global_letters_count * global_orthographic_density    -0.010861
global_letters_count * global_synonyms_count           0.021165
global_orthographic_density * global_synonyms_count    0.076497
dtype: float64

Regressing rel synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.04292373163385299

intercept                      0.300033
global_aoa                    -0.014045
global_clustering              0.014565
global_frequency              -0.013922
global_letters_count          -0.000383
global_orthographic_density    0.004478
global_synonyms_count          0.195675
dtype: float64

Regressing rel synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.05830930054499783

intercept                                             -0.487699
global_aoa                                             0.082748
global_clustering                                     -0.078335
global_frequency                                      -0.024454
global_letters_count                                   0.088399
global_orthographic_density                           -0.109381
global_synonyms_count                                  0.374384
global_aoa * global_clustering                         0.015591
global_aoa * global_frequency                          0.003627
global_aoa * global_letters_count                     -0.007522
global_aoa * global_orthographic_density               0.001512
global_aoa * global_synonyms_count                     0.024633
global_clustering * global_frequency                  -0.001119
global_clustering * global_letters_count              -0.001110
global_clustering * global_orthographic_density       -0.007074
global_clustering * global_synonyms_count              0.062642
global_frequency * global_letters_count               -0.004172
global_frequency * global_orthographic_density         0.011188
global_frequency * global_synonyms_count              -0.013554
global_letters_count * global_orthographic_density    -0.009436
global_letters_count * global_synonyms_count           0.015053
global_orthographic_density * global_synonyms_count    0.043466
dtype: float64

Regressing global synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0450338590838143

intercept                   0.376685
rel_aoa                    -0.006084
rel_clustering             -0.018993
rel_frequency              -0.007666
rel_letters_count           0.003667
rel_orthographic_density    0.028700
rel_synonyms_count          0.229158
dtype: float64

Regressing global synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.058651374636921716

intercept                                        0.398210
rel_aoa                                         -0.043789
rel_clustering                                  -0.065210
rel_frequency                                   -0.007619
rel_letters_count                               -0.019623
rel_orthographic_density                         0.056007
rel_synonyms_count                               0.193619
rel_aoa * rel_clustering                        -0.000794
rel_aoa * rel_frequency                         -0.008524
rel_aoa * rel_letters_count                      0.005212
rel_aoa * rel_orthographic_density               0.000327
rel_aoa * rel_synonyms_count                     0.028454
rel_clustering * rel_frequency                   0.000991
rel_clustering * rel_letters_count               0.017715
rel_clustering * rel_orthographic_density       -0.009365
rel_clustering * rel_synonyms_count              0.045423
rel_frequency * rel_letters_count                0.002285
rel_frequency * rel_orthographic_density         0.003008
rel_frequency * rel_synonyms_count               0.007754
rel_letters_count * rel_orthographic_density    -0.013092
rel_letters_count * rel_synonyms_count           0.005069
rel_orthographic_density * rel_synonyms_count   -0.000131
dtype: float64

Regressing rel synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.13605647651694952

intercept                   0.050965
rel_aoa                    -0.013952
rel_clustering              0.033689
rel_frequency              -0.001297
rel_letters_count          -0.000775
rel_orthographic_density    0.004383
rel_synonyms_count          0.380532
dtype: float64

Regressing rel synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.14588260648997886

intercept                                        0.074059
rel_aoa                                         -0.036028
rel_clustering                                  -0.011538
rel_frequency                                    0.001459
rel_letters_count                               -0.025041
rel_orthographic_density                         0.022778
rel_synonyms_count                               0.417527
rel_aoa * rel_clustering                         0.013092
rel_aoa * rel_frequency                         -0.002356
rel_aoa * rel_letters_count                      0.002295
rel_aoa * rel_orthographic_density              -0.003182
rel_aoa * rel_synonyms_count                     0.018250
rel_clustering * rel_frequency                   0.001055
rel_clustering * rel_letters_count               0.017056
rel_clustering * rel_orthographic_density        0.003997
rel_clustering * rel_synonyms_count              0.040601
rel_frequency * rel_letters_count               -0.001398
rel_frequency * rel_orthographic_density         0.003459
rel_frequency * rel_synonyms_count               0.020588
rel_letters_count * rel_orthographic_density    -0.008027
rel_letters_count * rel_synonyms_count          -0.000317
rel_orthographic_density * rel_synonyms_count    0.004197
dtype: float64

Regressing global synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.060004960311874966

intercept                      1.496633
global_aoa                    -0.022641
global_clustering              0.105418
global_frequency              -0.024661
global_letters_count          -0.015963
global_orthographic_density   -0.024800
global_synonyms_count          0.159637
rel_aoa                        0.006501
rel_clustering                -0.092076
rel_frequency                  0.015264
rel_letters_count              0.019953
rel_orthographic_density       0.039824
rel_synonyms_count             0.083144
dtype: float64

Regressing global synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13759204336558484

intercept                                                 13.633856
global_aoa                                                -0.580295
global_clustering                                          2.525081
global_frequency                                          -0.640199
global_letters_count                                       0.134856
global_orthographic_density                               -0.351327
global_synonyms_count                                      2.034797
rel_aoa                                                   -0.015989
rel_clustering                                            -1.600294
rel_frequency                                              0.561500
rel_letters_count                                         -0.024652
rel_orthographic_density                                  -0.179057
rel_synonyms_count                                        -2.554508
global_aoa * global_clustering                            -0.023758
global_aoa * global_frequency                              0.036035
global_aoa * global_letters_count                          0.006009
global_aoa * global_orthographic_density                  -0.007067
global_aoa * global_synonyms_count                        -0.003721
global_aoa * rel_aoa                                      -0.005926
global_aoa * rel_clustering                                0.067443
global_aoa * rel_frequency                                -0.042495
global_aoa * rel_letters_count                            -0.018204
global_aoa * rel_orthographic_density                      0.019906
global_aoa * rel_synonyms_count                            0.034966
global_clustering * global_frequency                      -0.135202
global_clustering * global_letters_count                  -0.055252
global_clustering * global_orthographic_density           -0.220129
global_clustering * global_synonyms_count                  0.092063
global_clustering * rel_aoa                               -0.033137
global_clustering * rel_clustering                         0.005673
global_clustering * rel_frequency                          0.114824
global_clustering * rel_letters_count                      0.024422
global_clustering * rel_orthographic_density               0.159420
global_clustering * rel_synonyms_count                    -0.093486
global_frequency * global_letters_count                   -0.036680
global_frequency * global_orthographic_density            -0.076659
global_frequency * global_synonyms_count                  -0.097629
global_frequency * rel_aoa                                -0.002896
global_frequency * rel_clustering                          0.086153
global_frequency * rel_frequency                           0.009358
global_frequency * rel_letters_count                       0.003782
global_frequency * rel_orthographic_density                0.079789
global_frequency * rel_synonyms_count                      0.076052
global_letters_count * global_orthographic_density        -0.044805
global_letters_count * global_synonyms_count              -0.050275
global_letters_count * rel_aoa                            -0.010968
global_letters_count * rel_clustering                     -0.019070
global_letters_count * rel_frequency                       0.024790
global_letters_count * rel_letters_count                   0.004285
global_letters_count * rel_orthographic_density            0.062442
global_letters_count * rel_synonyms_count                  0.128481
global_orthographic_density * global_synonyms_count        0.070900
global_orthographic_density * rel_aoa                     -0.054470
global_orthographic_density * rel_clustering               0.092704
global_orthographic_density * rel_frequency                0.060789
global_orthographic_density * rel_letters_count            0.090991
global_orthographic_density * rel_orthographic_density    -0.021631
global_orthographic_density * rel_synonyms_count           0.128155
global_synonyms_count * rel_aoa                            0.037493
global_synonyms_count * rel_clustering                     0.089975
global_synonyms_count * rel_frequency                      0.113995
global_synonyms_count * rel_letters_count                  0.043473
global_synonyms_count * rel_orthographic_density           0.011146
global_synonyms_count * rel_synonyms_count                 0.123060
rel_aoa * rel_clustering                                  -0.000536
rel_aoa * rel_frequency                                    0.004654
rel_aoa * rel_letters_count                                0.021462
rel_aoa * rel_orthographic_density                         0.024773
rel_aoa * rel_synonyms_count                              -0.035317
rel_clustering * rel_frequency                            -0.052123
rel_clustering * rel_letters_count                         0.047135
rel_clustering * rel_orthographic_density                 -0.039414
rel_clustering * rel_synonyms_count                       -0.063446
rel_frequency * rel_letters_count                          0.008382
rel_frequency * rel_orthographic_density                  -0.071027
rel_frequency * rel_synonyms_count                        -0.091068
rel_letters_count * rel_orthographic_density              -0.128158
rel_letters_count * rel_synonyms_count                    -0.081818
rel_orthographic_density * rel_synonyms_count             -0.151296
dtype: float64

Regressing rel synonyms_count with 1128 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.20684473627754008

intercept                      1.115813
global_aoa                    -0.019790
global_clustering              0.074619
global_frequency              -0.022611
global_letters_count          -0.009070
global_orthographic_density   -0.004879
global_synonyms_count         -0.608100
rel_aoa                        0.006560
rel_clustering                -0.058055
rel_frequency                  0.015338
rel_letters_count              0.010614
rel_orthographic_density       0.009702
rel_synonyms_count             0.943327
dtype: float64

Regressing rel synonyms_count with 1128 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.27176412173528275

intercept                                                 14.300660
global_aoa                                                -0.450089
global_clustering                                          2.480667
global_frequency                                          -0.810184
global_letters_count                                      -0.081872
global_orthographic_density                               -0.335968
global_synonyms_count                                      0.812865
rel_aoa                                                   -0.000941
rel_clustering                                            -1.676175
rel_frequency                                              0.585801
rel_letters_count                                          0.132397
rel_orthographic_density                                   0.108550
rel_synonyms_count                                        -1.006256
global_aoa * global_clustering                            -0.016564
global_aoa * global_frequency                              0.029481
global_aoa * global_letters_count                          0.005367
global_aoa * global_orthographic_density                  -0.012347
global_aoa * global_synonyms_count                        -0.005047
global_aoa * rel_aoa                                      -0.004271
global_aoa * rel_clustering                                0.049904
global_aoa * rel_frequency                                -0.034280
global_aoa * rel_letters_count                            -0.013705
global_aoa * rel_orthographic_density                      0.019277
global_aoa * rel_synonyms_count                            0.042939
global_clustering * global_frequency                      -0.145051
global_clustering * global_letters_count                  -0.066917
global_clustering * global_orthographic_density           -0.184363
global_clustering * global_synonyms_count                  0.111782
global_clustering * rel_aoa                               -0.027692
global_clustering * rel_clustering                        -0.001183
global_clustering * rel_frequency                          0.103675
global_clustering * rel_letters_count                      0.041363
global_clustering * rel_orthographic_density               0.159349
global_clustering * rel_synonyms_count                    -0.056885
global_frequency * global_letters_count                   -0.022731
global_frequency * global_orthographic_density            -0.052214
global_frequency * global_synonyms_count                  -0.046656
global_frequency * rel_aoa                                -0.001726
global_frequency * rel_clustering                          0.102648
global_frequency * rel_frequency                           0.006429
global_frequency * rel_letters_count                      -0.001405
global_frequency * rel_orthographic_density                0.056778
global_frequency * rel_synonyms_count                      0.042989
global_letters_count * global_orthographic_density        -0.045114
global_letters_count * global_synonyms_count              -0.029991
global_letters_count * rel_aoa                            -0.009804
global_letters_count * rel_clustering                      0.003433
global_letters_count * rel_frequency                       0.016047
global_letters_count * rel_letters_count                   0.002370
global_letters_count * rel_orthographic_density            0.054369
global_letters_count * rel_synonyms_count                  0.097320
global_orthographic_density * global_synonyms_count        0.054121
global_orthographic_density * rel_aoa                     -0.050363
global_orthographic_density * rel_clustering               0.065843
global_orthographic_density * rel_frequency                0.039660
global_orthographic_density * rel_letters_count            0.090898
global_orthographic_density * rel_orthographic_density    -0.020043
global_orthographic_density * rel_synonyms_count           0.125044
global_synonyms_count * rel_aoa                            0.035235
global_synonyms_count * rel_clustering                     0.027522
global_synonyms_count * rel_frequency                      0.079560
global_synonyms_count * rel_letters_count                 -0.014830
global_synonyms_count * rel_orthographic_density          -0.073061
global_synonyms_count * rel_synonyms_count                 0.124864
rel_aoa * rel_clustering                                   0.006372
rel_aoa * rel_frequency                                    0.003930
rel_aoa * rel_letters_count                                0.016431
rel_aoa * rel_orthographic_density                         0.031875
rel_aoa * rel_synonyms_count                              -0.052335
rel_clustering * rel_frequency                            -0.051633
rel_clustering * rel_letters_count                         0.020058
rel_clustering * rel_orthographic_density                 -0.051088
rel_clustering * rel_synonyms_count                       -0.055238
rel_frequency * rel_letters_count                          0.007300
rel_frequency * rel_orthographic_density                  -0.044630
rel_frequency * rel_synonyms_count                        -0.067039
rel_letters_count * rel_orthographic_density              -0.114663
rel_letters_count * rel_synonyms_count                    -0.030334
rel_orthographic_density * rel_synonyms_count             -0.070738
dtype: float64
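
The "with interactions" listings above contain every pairwise product of the regressors and no squared terms: 6 regressors give 15 products, 12 regressors give 66. A minimal sketch of how such a fit could be reproduced is shown below. It assumes an ordinary least-squares fit through scikit-learn's `PolynomialFeatures` and `LinearRegression`; the helper name `regress` and the synthetic data are hypothetical, and the notebook's own regression code may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures


def regress(X, y, names, interactions=False):
    """Fit ordinary least squares, optionally adding pairwise product terms.

    Hypothetical helper for illustration; not the notebook's own function.
    Returns (R^2, {term name: coefficient}).
    """
    names = list(names)
    if interactions:
        # degree=2 with interaction_only=True keeps the original columns and
        # adds x_i * x_j for i < j, without squares or a bias column.
        poly = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False)
        X = poly.fit_transform(X)
        # powers_ has one row per output column; its non-zero entries mark
        # which input columns are multiplied together.
        names = [' * '.join(names[i] for i in np.flatnonzero(row))
                 for row in poly.powers_]
    model = LinearRegression().fit(X, y)
    coefs = dict(zip(['intercept'] + names,
                     [model.intercept_] + list(model.coef_)))
    return model.score(X, y), coefs


# Synthetic data (assumption): y depends on a and on the product a * b.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] * X[:, 1]

r2_plain, _ = regress(X, y, ['a', 'b', 'c'])
r2_inter, coefs = regress(X, y, ['a', 'b', 'c'], interactions=True)
print("R^2 = {}".format(r2_plain))
print("R^2 = {}".format(r2_inter))
```

Because the plain model is nested in the interaction model, the in-sample R^2 can only rise when the product terms are added. The gain seen in the listings is therefore not by itself evidence that the interactions matter.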

----------------------------------------------------------------------
Regressing global orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.14868339111486462

intercept                      1.332520
global_aoa                    -0.021985
global_clustering             -0.035420
global_frequency              -0.011890
global_letters_count          -0.063577
global_orthographic_density    0.252349
global_synonyms_count          0.081738
dtype: float64

Regressing global orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1675205781489445

intercept                                              2.023534
global_aoa                                            -0.148280
global_clustering                                      0.085188
global_frequency                                       0.052246
global_letters_count                                  -0.137362
global_orthographic_density                            0.478333
global_synonyms_count                                 -0.674780
global_aoa * global_clustering                        -0.008663
global_aoa * global_frequency                         -0.002716
global_aoa * global_letters_count                      0.007909
global_aoa * global_orthographic_density               0.039752
global_aoa * global_synonyms_count                     0.019148
global_clustering * global_frequency                  -0.004668
global_clustering * global_letters_count               0.000473
global_clustering * global_orthographic_density        0.014130
global_clustering * global_synonyms_count             -0.066170
global_frequency * global_letters_count                0.001260
global_frequency * global_orthographic_density        -0.049362
global_frequency * global_synonyms_count              -0.008243
global_letters_count * global_orthographic_density     0.005440
global_letters_count * global_synonyms_count           0.023302
global_orthographic_density * global_synonyms_count    0.138904
dtype: float64

Regressing rel orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1151413580890156

intercept                     -0.963501
global_aoa                    -0.009021
global_clustering             -0.020816
global_frequency              -0.005009
global_letters_count          -0.056852
global_orthographic_density    0.214256
global_synonyms_count          0.088467
dtype: float64

Regressing rel orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13250656821908657

intercept                                              0.447675
global_aoa                                            -0.164562
global_clustering                                     -0.009022
global_frequency                                       0.033930
global_letters_count                                  -0.311322
global_orthographic_density                            0.020252
global_synonyms_count                                 -0.757039
global_aoa * global_clustering                         0.005761
global_aoa * global_frequency                          0.003632
global_aoa * global_letters_count                      0.014672
global_aoa * global_orthographic_density               0.051367
global_aoa * global_synonyms_count                     0.019952
global_clustering * global_frequency                   0.008078
global_clustering * global_letters_count              -0.016707
global_clustering * global_orthographic_density        0.008384
global_clustering * global_synonyms_count             -0.078469
global_frequency * global_letters_count                0.003723
global_frequency * global_orthographic_density        -0.020136
global_frequency * global_synonyms_count              -0.001888
global_letters_count * global_orthographic_density     0.014201
global_letters_count * global_synonyms_count           0.023990
global_orthographic_density * global_synonyms_count    0.103278
dtype: float64

Regressing global orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.11455535532234884

intercept                   1.619597
rel_aoa                     0.005958
rel_clustering             -0.059091
rel_frequency              -0.018453
rel_letters_count          -0.045363
rel_orthographic_density    0.299187
rel_synonyms_count          0.061967
dtype: float64

Regressing global orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12933624173856262

intercept                                        1.598998
rel_aoa                                          0.072589
rel_clustering                                   0.008022
rel_frequency                                   -0.028959
rel_letters_count                               -0.014721
rel_orthographic_density                         0.322177
rel_synonyms_count                               0.232211
rel_aoa * rel_clustering                         0.023418
rel_aoa * rel_frequency                          0.013210
rel_aoa * rel_letters_count                     -0.001989
rel_aoa * rel_orthographic_density               0.037096
rel_aoa * rel_synonyms_count                     0.047112
rel_clustering * rel_frequency                   0.001338
rel_clustering * rel_letters_count               0.007860
rel_clustering * rel_orthographic_density        0.108676
rel_clustering * rel_synonyms_count             -0.075978
rel_frequency * rel_letters_count                0.010650
rel_frequency * rel_orthographic_density         0.025843
rel_frequency * rel_synonyms_count               0.018819
rel_letters_count * rel_orthographic_density    -0.003472
rel_letters_count * rel_synonyms_count          -0.017023
rel_orthographic_density * rel_synonyms_count    0.081635
dtype: float64

Regressing rel orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1838071448813724

intercept                  -0.483055
rel_aoa                     0.015907
rel_clustering             -0.040982
rel_frequency               0.025201
rel_letters_count          -0.034467
rel_orthographic_density    0.377409
rel_synonyms_count          0.055698
dtype: float64

Regressing rel orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1971933614053899

intercept                                       -0.454972
rel_aoa                                          0.071134
rel_clustering                                  -0.000308
rel_frequency                                    0.040115
rel_letters_count                               -0.007678
rel_orthographic_density                         0.418242
rel_synonyms_count                               0.169622
rel_aoa * rel_clustering                         0.020231
rel_aoa * rel_frequency                          0.006588
rel_aoa * rel_letters_count                      0.005088
rel_aoa * rel_orthographic_density               0.057619
rel_aoa * rel_synonyms_count                     0.046967
rel_clustering * rel_frequency                  -0.000213
rel_clustering * rel_letters_count               0.010314
rel_clustering * rel_orthographic_density        0.084746
rel_clustering * rel_synonyms_count             -0.097764
rel_frequency * rel_letters_count                0.007840
rel_frequency * rel_orthographic_density         0.040123
rel_frequency * rel_synonyms_count               0.006246
rel_letters_count * rel_orthographic_density     0.004646
rel_letters_count * rel_synonyms_count          -0.018200
rel_orthographic_density * rel_synonyms_count    0.057995
dtype: float64

Regressing global orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15723250880902273

intercept                      2.684331
global_aoa                    -0.054613
global_clustering             -0.011068
global_frequency              -0.071304
global_letters_count          -0.142007
global_orthographic_density    0.239427
global_synonyms_count          0.128168
rel_aoa                        0.048670
rel_clustering                -0.024386
rel_frequency                  0.069983
rel_letters_count              0.083662
rel_orthographic_density      -0.000632
rel_synonyms_count            -0.060496
dtype: float64

Regressing global orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.20510157323104905

intercept                                                -4.669612
global_aoa                                               -0.431192
global_clustering                                        -0.444114
global_frequency                                          0.299690
global_letters_count                                      0.376881
global_orthographic_density                               3.373074
global_synonyms_count                                     0.870053
rel_aoa                                                   0.261669
rel_clustering                                            1.314428
rel_frequency                                            -0.343254
rel_letters_count                                        -0.159244
rel_orthographic_density                                 -2.362134
rel_synonyms_count                                        0.437713
global_aoa * global_clustering                           -0.080929
global_aoa * global_frequency                            -0.002219
global_aoa * global_letters_count                         0.001853
global_aoa * global_orthographic_density                 -0.049715
global_aoa * global_synonyms_count                        0.119123
global_aoa * rel_aoa                                      0.009946
global_aoa * rel_clustering                               0.070650
global_aoa * rel_frequency                                0.010515
global_aoa * rel_letters_count                            0.015571
global_aoa * rel_orthographic_density                     0.091577
global_aoa * rel_synonyms_count                          -0.157885
global_clustering * global_frequency                      0.003209
global_clustering * global_letters_count                  0.052508
global_clustering * global_orthographic_density           0.233177
global_clustering * global_synonyms_count                 0.034017
global_clustering * rel_aoa                               0.003974
global_clustering * rel_clustering                       -0.043348
global_clustering * rel_frequency                        -0.042948
global_clustering * rel_letters_count                     0.017584
global_clustering * rel_orthographic_density             -0.205908
global_clustering * rel_synonyms_count                   -0.026901
global_frequency * global_letters_count                  -0.007804
global_frequency * global_orthographic_density           -0.125577
global_frequency * global_synonyms_count                 -0.127159
global_frequency * rel_aoa                               -0.017654
global_frequency * rel_clustering                        -0.038519
global_frequency * rel_frequency                          0.007265
global_frequency * rel_letters_count                      0.016171
global_frequency * rel_orthographic_density               0.028568
global_frequency * rel_synonyms_count                     0.041003
global_letters_count * global_orthographic_density       -0.036318
global_letters_count * global_synonyms_count             -0.120134
global_letters_count * rel_aoa                           -0.020039
global_letters_count * rel_clustering                    -0.125299
global_letters_count * rel_frequency                     -0.027689
global_letters_count * rel_letters_count                 -0.001169
global_letters_count * rel_orthographic_density           0.061625
global_letters_count * rel_synonyms_count                 0.134532
global_orthographic_density * global_synonyms_count       0.394853
global_orthographic_density * rel_aoa                     0.038356
global_orthographic_density * rel_clustering             -0.385071
global_orthographic_density * rel_frequency               0.038905
global_orthographic_density * rel_letters_count           0.024705
global_orthographic_density * rel_orthographic_density   -0.005161
global_orthographic_density * rel_synonyms_count         -0.356889
global_synonyms_count * rel_aoa                          -0.081477
global_synonyms_count * rel_clustering                   -0.140801
global_synonyms_count * rel_frequency                     0.162031
global_synonyms_count * rel_letters_count                 0.169381
global_synonyms_count * rel_orthographic_density         -0.151051
global_synonyms_count * rel_synonyms_count               -0.037862
rel_aoa * rel_clustering                                 -0.002619
rel_aoa * rel_frequency                                   0.018597
rel_aoa * rel_letters_count                               0.005498
rel_aoa * rel_orthographic_density                       -0.026500
rel_aoa * rel_synonyms_count                              0.143201
rel_clustering * rel_frequency                            0.065778
rel_clustering * rel_letters_count                        0.044480
rel_clustering * rel_orthographic_density                 0.369801
rel_clustering * rel_synonyms_count                       0.027951
rel_frequency * rel_letters_count                         0.021321
rel_frequency * rel_orthographic_density                  0.027332
rel_frequency * rel_synonyms_count                       -0.083879
rel_letters_count * rel_orthographic_density             -0.036356
rel_letters_count * rel_synonyms_count                   -0.184974
rel_orthographic_density * rel_synonyms_count             0.190239
dtype: float64

Regressing rel orthographic_density with 992 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.22007119532538477

intercept                      2.027454
global_aoa                    -0.048767
global_clustering             -0.009198
global_frequency              -0.066509
global_letters_count          -0.107804
global_orthographic_density   -0.524602
global_synonyms_count          0.129932
rel_aoa                        0.042632
rel_clustering                -0.014765
rel_frequency                  0.072626
rel_letters_count              0.047559
rel_orthographic_density       0.822147
rel_synonyms_count            -0.073166
dtype: float64

Regressing rel orthographic_density with 992 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.26254127879419886

intercept                                                -4.316537
global_aoa                                               -0.151743
global_clustering                                        -0.379758
global_frequency                                          0.251776
global_letters_count                                      0.150206
global_orthographic_density                               1.840495
global_synonyms_count                                     0.836509
rel_aoa                                                   0.129684
rel_clustering                                            1.031134
rel_frequency                                            -0.435260
rel_letters_count                                        -0.060506
rel_orthographic_density                                 -0.789131
rel_synonyms_count                                        0.594929
global_aoa * global_clustering                           -0.050462
global_aoa * global_frequency                            -0.001993
global_aoa * global_letters_count                        -0.003199
global_aoa * global_orthographic_density                 -0.075941
global_aoa * global_synonyms_count                        0.083193
global_aoa * rel_aoa                                      0.009592
global_aoa * rel_clustering                               0.047684
global_aoa * rel_frequency                                0.014730
global_aoa * rel_letters_count                            0.022254
global_aoa * rel_orthographic_density                     0.112406
global_aoa * rel_synonyms_count                          -0.114341
global_clustering * global_frequency                      0.005192
global_clustering * global_letters_count                  0.028743
global_clustering * global_orthographic_density           0.171768
global_clustering * global_synonyms_count                -0.013213
global_clustering * rel_aoa                              -0.012947
global_clustering * rel_clustering                       -0.032887
global_clustering * rel_frequency                        -0.041035
global_clustering * rel_letters_count                     0.033965
global_clustering * rel_orthographic_density             -0.126911
global_clustering * rel_synonyms_count                    0.033461
global_frequency * global_letters_count                  -0.005778
global_frequency * global_orthographic_density           -0.100964
global_frequency * global_synonyms_count                 -0.102195
global_frequency * rel_aoa                               -0.018430
global_frequency * rel_clustering                        -0.030854
global_frequency * rel_frequency                          0.007329
global_frequency * rel_letters_count                      0.012849
global_frequency * rel_orthographic_density               0.015637
global_frequency * rel_synonyms_count                     0.005742
global_letters_count * global_orthographic_density        0.041079
global_letters_count * global_synonyms_count             -0.130330
global_letters_count * rel_aoa                           -0.015458
global_letters_count * rel_clustering                    -0.092565
global_letters_count * rel_frequency                     -0.016211
global_letters_count * rel_letters_count                 -0.001124
global_letters_count * rel_orthographic_density           0.003288
global_letters_count * rel_synonyms_count                 0.140804
global_orthographic_density * global_synonyms_count       0.282750
global_orthographic_density * rel_aoa                     0.037226
global_orthographic_density * rel_clustering             -0.280633
global_orthographic_density * rel_frequency               0.056810
global_orthographic_density * rel_letters_count          -0.015357
global_orthographic_density * rel_orthographic_density    0.014259
global_orthographic_density * rel_synonyms_count         -0.238598
global_synonyms_count * rel_aoa                          -0.058184
global_synonyms_count * rel_clustering                   -0.050784
global_synonyms_count * rel_frequency                     0.154542
global_synonyms_count * rel_letters_count                 0.142027
global_synonyms_count * rel_orthographic_density         -0.121368
global_synonyms_count * rel_synonyms_count               -0.050796
rel_aoa * rel_clustering                                  0.007111
rel_aoa * rel_frequency                                   0.013311
rel_aoa * rel_letters_count                               0.001872
rel_aoa * rel_orthographic_density                       -0.020764
rel_aoa * rel_synonyms_count                              0.110015
rel_clustering * rel_frequency                            0.059014
rel_clustering * rel_letters_count                        0.014400
rel_clustering * rel_orthographic_density                 0.242242
rel_clustering * rel_synonyms_count                      -0.069309
rel_frequency * rel_letters_count                         0.009314
rel_frequency * rel_orthographic_density                  0.004137
rel_frequency * rel_synonyms_count                       -0.075116
rel_letters_count * rel_orthographic_density             -0.022390
rel_letters_count * rel_synonyms_count                   -0.159875
rel_orthographic_density * rel_synonyms_count             0.137536
dtype: float64

	aoa	betweenness	clustering	degree	frequency	letters_count	orthographic_density	pagerank	phonemes_count	phonological_density	syllables_count	synonyms_count
Component-0	-0.472754	0.292834	-0.079535	0.243040	0.233793	-0.425883	0.211612	0.284075	-0.400360	0.278756	-0.160736	-0.003837
Component-1	-0.256312	0.428259	-0.135165	0.293881	0.287968	0.422299	-0.170129	0.294531	0.439821	-0.220452	0.164369	-0.015803
Component-2	0.814881	0.374155	-0.133468	0.124764	0.334159	-0.141843	-0.007585	0.088831	-0.100192	0.101068	-0.025615	-0.044239
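
Each row of a table like this one holds the loadings of one principal component on the original features. The sketch below shows how such a table could be built from a fitted `sklearn.decomposition.PCA`. The helper name `component_table` and the random input data are hypothetical, and the notebook may prepare its features differently before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def component_table(features, n_components):
    """Return PCA loadings as a DataFrame: one row per component.

    Hypothetical helper for illustration; `features` is a DataFrame with one
    column per feature and no missing values.
    """
    pca = PCA(n_components=n_components).fit(features.values)
    index = ['Component-{}'.format(i) for i in range(n_components)]
    return pd.DataFrame(pca.components_, columns=features.columns,
                        index=index)


# Random stand-in data (assumption), using three of the feature names.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(100, 3)),
                        columns=['aoa', 'frequency', 'letters_count'])
table = component_table(features, n_components=2)
print(table)
```

Each row of `pca.components_` is a unit vector, so the squared loadings in a row sum to 1. The sign of a whole row is arbitrary, which means only the relative signs within a component carry meaning.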

	aoa	frequency	letters_count
Component-0	-0.742885	0.385537	-0.547250
Component-1	0.310008	-0.526418	-0.791694