Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

First build our data.



In [3]:

    
model = Model(time=Time.continuous, source=Source.majority, past=Past.all, durl=Durl.exclude_past, max_distance=2)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data

Got 2289 substitutions for model Model(time=Time.continuous, source=Source.majority, past=Past.all, durl=Durl.exclude_past, max_distance=2)

100% (2289 of 2289) |######################| Elapsed Time: 0:01:03 Time: 0:01:03

Compute cluster averages (so as not to overestimate confidence intervals) and crop data so that we have acceptable CIs.



In [4]:

    
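# Select the averaged columns with a list: recent pandas versions no longer
# accept a bare tuple of column names after a groupby. The grouping keys
# ('cluster_id', 'feature') are kept as columns thanks to as_index=False.
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    [['source', 'source_rel', 'destination', 'destination_rel',
      'h0', 'h0_rel', 'h0n', 'h0n_rel']].mean()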
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan
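
To illustrate why averaging per cluster matters, here is a minimal standard-library sketch, separate from the pandas pipeline above. The `cluster_means` helper and the toy rows are hypothetical. Substitutions from the same cluster are not independent, so pooling them inflates the sample size and gives one large cluster more weight than several small ones.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(rows):
    """Average `value` within each cluster, giving one number per cluster."""
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row['cluster_id']].append(row['value'])
    return {cluster: mean(values) for cluster, values in by_cluster.items()}


# Toy data (hypothetical): cluster 'a' contributes three substitutions,
# cluster 'b' only one.
rows = [
    {'cluster_id': 'a', 'value': 1.0},
    {'cluster_id': 'a', 'value': 1.0},
    {'cluster_id': 'a', 'value': 1.0},
    {'cluster_id': 'b', 'value': 5.0},
]

per_cluster = cluster_means(rows)
# Pooled mean: every substitution counts once (4 observations).
pooled_mean = mean(row['value'] for row in rows)
# Clustered mean: every cluster counts once (2 observations).
clustered_mean = mean(per_cluster.values())
```

With one observation per cluster, the `n` that enters the confidence intervals is the number of clusters rather than the number of substitutions.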

Prepare feature ordering.



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot about features

For a feature $\phi$, plot:

$\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
$\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

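We also plot these values relative to the sentence, i.e. after subtracting a sentence-level reference value (the code above passes `sentence_relative='median'`, so that reference is the sentence median rather than the mean):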

$\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
$\nu_{\phi, r}^0$ (which is the average feature value minus the sentence average), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi, r}^{00}$ (which is the average feature value for synonyms of the source word minus the sentence average), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

These values are plotted first with absolute (global) feature values, then with sentence-relative features; in each case with fixed-width bins first, then with quantile bins.
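
As a rough illustration of the definition of $\nu_{\phi}$, the following standard-library sketch computes the mean destination feature per fixed-width bin of the source feature. The `nu_phi` helper and the toy pairs are hypothetical, and the bin edges are simpler than those produced by `pd.cut`, which slightly extends the range.

```python
from statistics import mean


def nu_phi(pairs, bin_count):
    """Mean destination feature per fixed-width bin of the source feature.

    `pairs` is a list of (source, destination) feature values.
    Returns (bin_middles, bin_means); an empty bin gives None.
    """
    sources = [source for source, _ in pairs]
    low, high = min(sources), max(sources)
    width = (high - low) / bin_count
    if width == 0:
        raise ValueError('all source values are identical')
    binned = [[] for _ in range(bin_count)]
    for source, destination in pairs:
        # Left-closed bins; the maximum value falls into the last bin.
        index = min(int((source - low) / width), bin_count - 1)
        binned[index].append(destination)
    middles = [low + (i + 0.5) * width for i in range(bin_count)]
    means = [mean(b) if b else None for b in binned]
    return middles, means


# Toy (source, destination) pairs, purely illustrative.
pairs = [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.5), (4.0, 3.0)]
middles, means = nu_phi(pairs, 2)
```

Plotting `means` against `middles` gives the $\nu_{\phi}$ curve; the $y = x$ line corresponds to `means == middles`.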



In [6]:

    
def print_significance(name, bins, h0, h0n, values):
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = indices.sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            for j, alpha in enumerate([.05, .01, .001]):
                # Half-width of the two-sided (1 - alpha) Student-t interval.
                # Note: dividing the ddof=1 standard deviation by sqrt(n - 1)
                # (rather than sqrt(n)) makes the interval slightly wider.
                cis[i, j] = (stats.t.ppf(1 - alpha/2, n - 1)
                             * s / np.sqrt(n - 1))

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()
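
The star notation printed by `print_significance` can be summarised with a small sketch. The `significance_stars` helper is hypothetical: it takes a bin value, the matching null value, and the confidence-interval half-widths for $\alpha = .05, .01, .001$, and returns the cell shown in the tables.

```python
def significance_stars(value, null, half_widths):
    """Return '*', '**' or '***' (padded to width 3), or 'ns.'.

    `half_widths` are CI half-widths for alpha = .05, .01, .001,
    in that order.
    """
    # The value differs from the null at a given level when it lies
    # strictly outside the interval centred on the null.
    differs = [abs(value - null) > half_width for half_width in half_widths]
    if not any(differs):
        return 'ns.'
    # The strictest level at which the difference holds sets the star count.
    n_stars = max(i for i, d in enumerate(differs) if d) + 1
    return ('*' * n_stars).ljust(3)
```

For example, a value that lies outside the 95% and 99% intervals but inside the 99.9% interval is reported as `**`.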



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Rescale limits if we're touching H0 or H00.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    elif h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)
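
The `try`/`except ValueError` loop above lowers the number of bins until binning succeeds; with quantile bins this happens when many source values are tied and the bin edges are not unique. The sketch below reproduces that fallback with the standard library only. The `choose_bin_count` helper is hypothetical, and its quantile edges are an approximation of what `pd.qcut` computes.

```python
from statistics import quantiles


def choose_bin_count(values, max_bins):
    """Largest bin count <= max_bins whose quantile edges are all distinct.

    Returns (bin_count, edges).
    """
    for bin_count in range(max_bins, 0, -1):
        # Interior cut points; an empty list when bin_count == 1.
        cuts = quantiles(values, n=bin_count, method='inclusive')
        edges = ([float(min(values))]
                 + [float(cut) for cut in cuts]
                 + [float(max(values))])
        if len(set(edges)) == len(edges):
            return bin_count, edges
    raise ValueError('all values are identical; no valid binning')


# Heavily tied toy data: 4 and 3 quantile bins have duplicate edges,
# so the function falls back to 2 bins.
count, edges = choose_bin_count([1, 1, 1, 1, 2, 3, 4, 5], 4)
```

This is consistent with features that take few distinct values ending up with fewer bins than `BIN_COUNT` in the quantile-binned tables.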



In [8]:

    
def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws=None):
    # Avoid a mutable default argument.
    plot_kws = {} if plot_kws is None else plot_kws
    # Note: `height` replaces the `size` argument of seaborn < 0.9.
    g = sb.FacetGrid(data=data[data[feature_field]
                               .map(lambda f: f in features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, height=3)
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    scale = abs(h0s.mean())
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)
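
The normalisation applied in `plot_bias` can be written out without NumPy as a small sketch. The `normalised_bias` helper is hypothetical: it places the bins at evenly spaced positions in $[0, 1]$ and divides the difference $\nu_{\phi} - \nu_{\phi}^{00}$ by the absolute mean of $\nu_{\phi}^0$, so that features with different units can share one set of axes.

```python
def normalised_bias(values, h0ns, h0s):
    """Bias per bin, rescaled so that different features are comparable.

    values: mean destination feature per bin (nu_phi)
    h0ns:   mean H_00 value per bin (nu_phi^00)
    h0s:    mean H_0 value per bin (nu_phi^0), used only for the scale
    """
    scale = abs(sum(h0s) / len(h0s))
    count = len(values)
    # Evenly spaced positions in [0, 1], like np.linspace(0, 1, count).
    xs = [i / (count - 1) for i in range(count)] if count > 1 else [0.0]
    ys = [(value - h0n) / scale for value, h0n in zip(values, h0ns)]
    return xs, ys


# Toy bin values, purely illustrative.
xs, ys = normalised_bias([2.0, 4.0, 6.0], [3.0, 3.0, 3.0], [2.0, 2.0, 2.0])
```

Note that the x positions carry only the bin order, not the bin middles, which is why the overlay axes are labelled as normalised to $[0, 1]$.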



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws=None):
    # Avoid a mutable default argument.
    plot_kws = {} if plot_kws is None else plot_kws
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Bins of distribution of appeared global feature values

For each feature $\phi$, we plot the variation upon substitution as explained above.



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')


-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *   |
H_00 | *** | *** | *** | **  |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | *   |
H_00 | **  | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | **  | ns. |
H_00 | *** | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | ns. | **  |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | ns. | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | **  | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *   | *** | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | ns. | ns. | *** |

Then plot $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature (i.e. the measured bias) to see how the features compare.



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')


---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | **  | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *   |
H_00 | *** | *** | *** | **  |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | ns. | **  |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *   | *** | ns. |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantiles of distribution of appeared global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | **  | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |
------------------
H_0  | *** | ns. |
H_00 | *** | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | *   |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | ns. | *** |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | ns. | **  |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | *   |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | ns. | *** |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Bins of distribution of appeared sentence-relative values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | ns. | ns. | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *   | ns. |
H_00 | ns. | **  | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | **  |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *   | ns. |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | *** | *   |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | **  | *** | *** | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | **  | ns. | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | **  | *** | *** |
H_00 | ns. | *   | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | **  |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | **  | *** | *** | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *   | ns. |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | **  | ns. | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantiles of distribution of appeared sentence-relative values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | ns. | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | **  |
H_00 | **  | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *   | *   |
H_00 | *   | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | **  | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *   | ns. | *** |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *   | ns. | **  |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | **  | *** | *** |
H_00 | ns. | **  | ns. | ns. |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *   | ns. | *** |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *   | *   |
H_00 | *   | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *   | ns. | **  |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We'd like to see how absolute and sentence-relative feature values interact. In particular, we want to know which effect dominates: cognitive bias, attraction to the sentence average, or attraction to the global feature average.

To do this, we plot the general direction (arrows) and strength (color) of the displacement from source word to destination word, for each (absolute, relative) pair of source feature values. In other words: given a word's absolute and relative feature values, if that word were substituted, where would its replacement land in this (absolute, relative) space?

The interesting thing in these plots is the attraction front, the line toward which all arrows point and where they join. We're interested in:

its slope
its shape (e.g. several slope regimes?)
its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)

First, here's our plotting function. (Note that the arrow size is set to a value that looks huge here but gives normal sizes in the saved figures; there is presumably some dpi scaling problem with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Compute binning.
    bin_count = 4
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # Compute bin values: the mean displacement (u, v) of substitutions
    # starting in each (absolute, relative) bin, and its mean norm.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            in_bin = (x_bins == x) & (y_bins == y)
            du = dest[in_bin] - source[in_bin]
            dv = dest_rel[in_bin] - source_rel[in_bin]
            u_values[y, x] = du.mean()
            v_values[y, x] = dv.mean()
            strength[y, x] = np.sqrt(du ** 2 + dv ** 2).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])
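
To get a feel for what the arrows encode, here is a minimal sketch of the same binning on synthetic data, without any plotting. The toy data is an assumption made up for the illustration (every destination is pulled halfway towards 0 on both axes), and the variable names simply mirror those of plot_stream.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 2000

# Hypothetical toy data: destinations are pulled halfway towards 0
# on both the absolute and the sentence-relative axis.
source = pd.Series(rng.uniform(-1, 1, n))
source_rel = pd.Series(rng.uniform(-1, 1, n))
dest = 0.5 * source
dest_rel = 0.5 * source_rel

bin_count = 4
x_bins = pd.cut(source, bin_count, right=False, labels=False)
y_bins = pd.cut(source_rel, bin_count, right=False, labels=False)

# Mean displacement per (absolute, relative) bin.
u = np.zeros((bin_count, bin_count))
v = np.zeros((bin_count, bin_count))
for x in range(bin_count):
    for y in range(bin_count):
        in_bin = (x_bins == x) & (y_bins == y)
        u[y, x] = (dest[in_bin] - source[in_bin]).mean()
        v[y, x] = (dest_rel[in_bin] - source_rel[in_bin]).mean()

print(np.sign(u))
print(np.sign(v))
```

In this toy case the horizontal component is positive in the leftmost column of bins and negative in the rightmost one (and likewise for the vertical component), so the arrows would converge on the origin. With real data the point of interest is where that convergence line actually lies.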

Here are the plots for all features.



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features we present in the paper.



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute a PCA on feature variations (note: on the variations, not on the feature values themselves), and show how the first three components evolve upon substitution.

CAVEAT: the PCA is computed only on variations for which all features are defined. This greatly reduces the number of words included, and also the number of substitutions (the actual numbers are given below; the reduction is drastic). It also affects $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are computed using only words for which all features are defined. This again drastically reduces the number of words taken into account, and therefore changes the values under the null hypotheses.
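
The row loss described in this caveat comes from pivoting to one column per feature and then dropping incomplete rows. A minimal sketch on a made-up long-format table (the values and the two-feature setup are assumptions for the illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format table: one row per (cluster, feature).
toy = pd.DataFrame({
    'cluster_id': [1, 1, 2, 2, 3, 3],
    'feature': ['aoa', 'frequency'] * 3,
    'variation': [0.2, -0.1, np.nan, 0.4, 0.3, 0.0],
})

# One row per cluster, one column per feature.
wide = toy.pivot(index='cluster_id', columns='feature', values='variation')
# Keep only clusters where every feature is defined.
complete = wide.dropna()

print(len(wide), len(complete))
```

A single undefined feature is enough to discard a whole row, which is why the reduction grows quickly with the number of features included.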

4.1 On all the features

Compute the actual PCA.



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the results.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 10 components.

Those explain the following variance:
[ 0.54284167  0.16812967  0.07916533  0.07206606  0.03452181  0.02949849
  0.01832897  0.01781715  0.01706002  0.0085365 ]

We're plotting variation for the first 3 components:






    Out[30]:

                   aoa  betweenness  clustering     degree  frequency  letters_count  \
Component-0  -0.446262     0.276224   -0.087837   0.238706   0.225108      -0.445901
Component-1   0.328045    -0.409550    0.152141  -0.295563  -0.258442      -0.425448
Component-2   0.735232     0.252298   -0.150948   0.095754   0.581154      -0.095967

             orthographic_density   pagerank  phonemes_count  phonological_density  syllables_count  synonyms_count
Component-0              0.224235   0.279202       -0.420940              0.282661        -0.158210        0.000616
Component-1              0.157897  -0.306974       -0.418579              0.213053        -0.163004        0.004164
Component-2              0.002559   0.032680       -0.034140              0.089080         0.003244       -0.081795

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (2289 of 2289) |######################| Elapsed Time: 0:01:06 Time: 0:01:06
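
The implementation of substitution.components is not shown in this notebook, so the following is only a sketch of what a component value amounts to under the usual PCA convention: centre the feature vector on the fitted mean and project it on a principal axis. This is what scikit-learn's PCA.transform does for a row when whitening is off. The data and the helper component_value are hypothetical, and the PCA is done by hand with an SVD to keep the example self-contained.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical matrix of feature variations:
# rows are substitutions, columns are features.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

mean = X.mean(axis=0)
# PCA via SVD of the centred data; rows of `components` are the
# principal axes, sorted by decreasing explained variance.
_, singular_values, components = np.linalg.svd(X - mean,
                                               full_matrices=False)
explained_ratio = singular_values ** 2 / np.sum(singular_values ** 2)


def component_value(features, i):
    """Project one feature vector on principal axis i (hypothetical helper)."""
    return np.dot(components[i], features - mean)


source_features, destination_features = X[0], X[1]
print(explained_ratio)
print(component_value(source_features, 0),
      component_value(destination_features, 0))
```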

Compute cluster averages (so as not to overestimate confidence intervals).



In [32]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()
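
Why average per cluster? Substitutions from the same cluster are not independent measures, so counting each of them would inflate the sample size and shrink the confidence intervals. A minimal sketch of the effect on made-up data (a single-stage average, simpler than the two-stage one above):

```python
import pandas as pd

# Hypothetical toy data: cluster 1 contributes three identical
# measures, cluster 2 only one.
toy = pd.DataFrame({
    'cluster_id': [1, 1, 1, 2],
    'source': [1.0, 1.0, 1.0, 5.0],
    'destination': [2.0, 2.0, 2.0, 4.0],
})

per_cluster = toy.groupby('cluster_id', as_index=False)[
    ['source', 'destination']].mean()

# Number of measures before and after averaging.
print(len(toy), len(per_cluster))
# Mean destination value before and after averaging.
print(toy['destination'].mean(), per_cluster['destination'].mean())
```

After averaging, each cluster weighs the same whatever its number of substitutions, and the number of measures entering the confidence interval is the number of clusters.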

Plot the actual variations of components (see the caveat section below).



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | *   | **  | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | *** | *   | ns. |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA.



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the results.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 2 components.

Those explain the following variance:
[ 0.68061225  0.18276911]







    Out[35]:

                   aoa  frequency  letters_count
Component-0  -0.735242   0.386221      -0.557004
Component-1   0.422901  -0.380811      -0.822276

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (2289 of 2289) |######################| Elapsed Time: 0:00:17 Time: 0:00:17

Compute cluster averages (so as not to overestimate confidence intervals).



In [37]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()

Plot the actual variations of components.



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

4.3 CAVEAT: reduction of the numbers of words and substitutions

As explained above, this PCA analysis can only use words for which all the features are defined (in this case, the features listed in relevant_features). So note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Compute the number of words that have all relevant_features defined.
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 1839 (cluster-unique) substitutions, but the PCA is in fact computed on 1463 of them (those where all features are defined).

$\mathcal{H}_0$ and $\mathcal{H}_{00}$ are affected in the same way, since they are computed over the words for which all features are defined.
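
The word count above boils down to a set intersection: a word is usable only if every feature defines it. A minimal sketch with made-up feature tables (the words and values are hypothetical and stand in for the real feature dictionaries):

```python
# Hypothetical feature tables mapping word -> value (made-up numbers).
features = {
    'frequency': {'cat': 9.1, 'dog': 8.7, 'axolotl': 2.3},
    'aoa': {'cat': 3.2, 'dog': 3.0, 'zebra': 5.1},
    'letters_count': {'cat': 3, 'dog': 3, 'axolotl': 7, 'zebra': 5},
}

# Words known to at least one feature.
all_words = set().union(*features.values())
# Words for which every feature is defined.
complete_words = set.intersection(*(set(table)
                                    for table in features.values()))

print(len(all_words), len(complete_words))
print(sorted(complete_words))
```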

5 Interactions between features (by Anova)

Some useful variables first.



In [40]:

    
cuts = [('fixed bins', pd.cut)]#, ('quantiles', pd.qcut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'
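
The cuts list names two binning schemes: fixed-width bins (pd.cut) and quantile bins (pd.qcut, commented out here). A small sketch on made-up values shows how differently they behave when the data is skewed:

```python
import pandas as pd

# Hypothetical skewed values: one large outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Fixed-width bins split the value range into equal intervals.
fixed = pd.cut(values, 4, labels=False)
# Quantile bins split the values into groups of equal size.
quantile = pd.qcut(values, 4, labels=False)

print(fixed.tolist())
print(quantile.tolist())
```

With fixed-width bins the outlier ends up alone in the last bin and the middle bins are empty, whereas quantile bins hold two values each. Empty or tiny bins matter for the tests below, since each bin is one group in the Anova.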

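Now for each feature, assess whether it interacts with the other features' destination values: we bin the source values of the first feature and run a one-way Anova on the destination values of the second feature across those bins. We look at this for all pairs of features, with all pairs of global/sentence-relative values, and for each type of binning listed in cuts (fixed width only for now, since quantiles are commented out above). So it's a lot of answers.

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means not significant.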
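
For reference, the statistic behind each of these tests is the one-way Anova F ratio: variance between bin means over variance within bins. The notebook gets it, together with the p-value, from stats.f_oneway; the hand-rolled version below is only a sketch to make the computation explicit, with made-up values and a hypothetical helper name.

```python
def f_statistic(groups):
    """One-way Anova F statistic for a list of groups of numbers."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variability of the values around their own group mean.
    ss_within = sum((v - mean) ** 2
                    for group, mean in zip(groups, means)
                    for v in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical destination values, grouped by source bin.
bins = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f_statistic(bins))
```

A large F means the destination values differ across source bins by more than their spread inside each bin would suggest.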



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
    * global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Now for each feature, look at its interaction with the other features' variation (i.e. destination - source). Same drill, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
   ** global -> sentence-relative
   ** sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

OK, this could go on for a long time, so I'm not going to pursue interactions through this lens (that is, the interaction of pairs of features with another feature's destination values).

6 Regression



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError("source_rel must be True, False or 'both'")
    if not isinstance(dest_rel, bool):
        raise ValueError('dest_rel must be a bool')
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    destination = data[data.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
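        # Prepend the fitted intercept; pd.concat replaces Series.append,
        # which was removed in pandas 2.0.
        coeffs = pd.concat([
            pd.Series(index=['intercept'], data=[linreg.intercept_]),
            coeffs])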
    with pd.option_context('display.max_rows', 999):
        print(coeffs)
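
To make the `interactions=True` branch of `regress` easier to follow, here is a minimal sketch on made-up data. The random matrix `X`, the target `y` and the choice of three feature names are assumptions for illustration only, not the notebook's `variations` table. It shows how each row of `poly.powers_` becomes a coefficient label, and why the regression is fitted with `fit_intercept=False`: the constant column produced by `PolynomialFeatures` already plays the role of the intercept.

```python
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy inputs: three made-up source features, 50 rows.
rng = np.random.RandomState(0)
names = ['global_aoa', 'global_clustering', 'global_frequency']
X = rng.normal(size=(50, len(names)))
# Noise-free target with a known intercept, main effect and interaction.
y = 2.0 + 0.5 * X[:, 2] + 0.3 * X[:, 0] * X[:, 1]

poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)

# Each row of powers_ holds the exponent of every input feature in one
# output column; the all-zero row is the constant (bias) column.
labels = [' * '.join(names[j] for j, p in enumerate(powers) if p > 0)
          or 'intercept'
          for powers in poly.powers_]

# The bias column stands in for the intercept, so none is fitted here.
linreg = linear_model.LinearRegression(fit_intercept=False)
linreg.fit(X_poly, y)
coeffs = dict(zip(labels, linreg.coef_))
for label in labels:
    print('{:40s} {: .6f}'.format(label, coeffs[label]))
```

Because the toy target is noise-free, the fit should recover the three coefficients used to build `y` and leave the others near zero. On the real data nothing of the sort is guaranteed.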



In [45]:

    
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()
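
A note on reading the scores printed by this loop: `linreg.score` returns the in-sample $R^2$, which cannot decrease when regressors are added, so the fits with interactions will always look at least as good as those without. One possible correction, not used in this notebook, is the adjusted $R^2$, $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The sketch below applies it to two of the printed fits of global frequency (both source variants, 1135 measures). The helper name `adjusted_r2` is hypothetical, and the predictor counts (12 main effects, 78 terms with pairwise interactions) are read off the coefficient listings.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 for a linear fit that includes an intercept.

    n_predictors counts the regressors, excluding the intercept.
    """
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# (R^2, number of predictors) copied from the output of the loop.
fits = {
    'both, no interactions': (0.05421483019109563, 12),
    'both, with interactions': (0.11838983196630148, 78),
}
for label, (r2, n_predictors) in fits.items():
    print('{}: R^2 = {:.4f}, adjusted R^2 = {:.4f}'
          .format(label, r2, adjusted_r2(r2, 1135, n_predictors)))
```

With these numbers the adjusted values come out at about 0.044 and 0.053, so most of the apparent gain from the 66 extra interaction terms disappears once model size is taken into account.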





----------------------------------------------------------------------
Regressing global frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.05092761799769541

intercept                      5.569300
global_aoa                     0.033309
global_clustering             -0.055170
global_frequency               0.298179
global_letters_count          -0.014794
global_orthographic_density   -0.039255
global_synonyms_count          0.064807
dtype: float64

Regressing global frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.06634161286796303

intercept                                              10.353287
global_aoa                                              0.152916
global_clustering                                       0.751890
global_frequency                                        0.517993
global_letters_count                                   -1.124183
global_orthographic_density                            -0.823061
global_synonyms_count                                  -0.266760
global_aoa * global_clustering                          0.045852
global_aoa * global_frequency                           0.005199
global_aoa * global_letters_count                       0.013894
global_aoa * global_orthographic_density                0.010409
global_aoa * global_synonyms_count                     -0.001957
global_clustering * global_frequency                    0.017861
global_clustering * global_letters_count               -0.201287
global_clustering * global_orthographic_density        -0.120011
global_clustering * global_synonyms_count               0.252538
global_frequency * global_letters_count                -0.023881
global_frequency * global_orthographic_density         -0.033436
global_frequency * global_synonyms_count                0.094977
global_letters_count * global_orthographic_density      0.052816
global_letters_count * global_synonyms_count            0.099426
global_orthographic_density * global_synonyms_count     0.268392
dtype: float64

Regressing rel frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.02358013799934633

intercept                     -6.496182
global_aoa                     0.043219
global_clustering             -0.055706
global_frequency               0.231190
global_letters_count           0.071017
global_orthographic_density    0.032809
global_synonyms_count          0.134862
dtype: float64

Regressing rel frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.038635307799503305

intercept                                             -5.746180
global_aoa                                             0.437291
global_clustering                                      0.140638
global_frequency                                       0.440111
global_letters_count                                  -0.669411
global_orthographic_density                           -0.098966
global_synonyms_count                                 -1.000643
global_aoa * global_clustering                         0.087382
global_aoa * global_frequency                          0.005268
global_aoa * global_letters_count                      0.012983
global_aoa * global_orthographic_density              -0.019255
global_aoa * global_synonyms_count                     0.056028
global_clustering * global_frequency                   0.011087
global_clustering * global_letters_count              -0.155696
global_clustering * global_orthographic_density       -0.020692
global_clustering * global_synonyms_count              0.253802
global_frequency * global_letters_count               -0.035382
global_frequency * global_orthographic_density        -0.030179
global_frequency * global_synonyms_count               0.164355
global_letters_count * global_orthographic_density     0.069236
global_letters_count * global_synonyms_count           0.067517
global_orthographic_density * global_synonyms_count    0.264410
dtype: float64

Regressing global frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.03579532055169454

intercept                   9.368949
rel_aoa                     0.053696
rel_clustering             -0.126941
rel_frequency               0.202538
rel_letters_count          -0.023051
rel_orthographic_density   -0.028567
rel_synonyms_count          0.024898
dtype: float64

Regressing global frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.048309892646133834

intercept                                        9.233787
rel_aoa                                          0.047717
rel_clustering                                  -0.033528
rel_frequency                                    0.164721
rel_letters_count                                0.035625
rel_orthographic_density                        -0.155838
rel_synonyms_count                               0.483165
rel_aoa * rel_clustering                         0.003968
rel_aoa * rel_frequency                          0.018470
rel_aoa * rel_letters_count                      0.026761
rel_aoa * rel_orthographic_density               0.009699
rel_aoa * rel_synonyms_count                     0.053802
rel_clustering * rel_frequency                  -0.023099
rel_clustering * rel_letters_count              -0.101171
rel_clustering * rel_orthographic_density       -0.067687
rel_clustering * rel_synonyms_count              0.227550
rel_frequency * rel_letters_count               -0.002269
rel_frequency * rel_orthographic_density        -0.038927
rel_frequency * rel_synonyms_count               0.151808
rel_letters_count * rel_orthographic_density     0.033901
rel_letters_count * rel_synonyms_count           0.050980
rel_orthographic_density * rel_synonyms_count    0.258644
dtype: float64

Regressing rel frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.19745004254322276

intercept                  -1.740440
rel_aoa                     0.032460
rel_clustering              0.097993
rel_frequency               0.571919
rel_letters_count          -0.099101
rel_orthographic_density   -0.187356
rel_synonyms_count          0.064502
dtype: float64

Regressing rel frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.215912806392616

intercept                                       -1.876178
rel_aoa                                         -0.063971
rel_clustering                                   0.090319
rel_frequency                                    0.568069
rel_letters_count                               -0.014488
rel_orthographic_density                        -0.387956
rel_synonyms_count                               0.366594
rel_aoa * rel_clustering                        -0.040314
rel_aoa * rel_frequency                         -0.031362
rel_aoa * rel_letters_count                      0.047460
rel_aoa * rel_orthographic_density               0.079595
rel_aoa * rel_synonyms_count                     0.164874
rel_clustering * rel_frequency                  -0.057583
rel_clustering * rel_letters_count              -0.153346
rel_clustering * rel_orthographic_density       -0.242436
rel_clustering * rel_synonyms_count              0.128418
rel_frequency * rel_letters_count               -0.003235
rel_frequency * rel_orthographic_density        -0.065826
rel_frequency * rel_synonyms_count               0.088518
rel_letters_count * rel_orthographic_density     0.041699
rel_letters_count * rel_synonyms_count           0.037283
rel_orthographic_density * rel_synonyms_count    0.330356
dtype: float64

Regressing global frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.05421483019109563

intercept                      5.463455
global_aoa                    -0.014166
global_clustering              0.021535
global_frequency               0.313665
global_letters_count           0.120934
global_orthographic_density    0.036477
global_synonyms_count          0.079401
rel_aoa                        0.063946
rel_clustering                -0.091763
rel_frequency                 -0.016406
rel_letters_count             -0.145017
rel_orthographic_density      -0.081190
rel_synonyms_count            -0.035444
dtype: float64

Regressing global frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.11838983196630148

intercept                                                -1.394843
global_aoa                                               -0.850741
global_clustering                                        -6.226261
global_frequency                                         -0.057819
global_letters_count                                     -1.225519
global_orthographic_density                              -2.133125
global_synonyms_count                                    -7.077541
rel_aoa                                                   1.203963
rel_clustering                                            9.370079
rel_frequency                                             1.503467
rel_letters_count                                         0.021496
rel_orthographic_density                                  0.866640
rel_synonyms_count                                        7.417909
global_aoa * global_clustering                            0.308945
global_aoa * global_frequency                             0.168912
global_aoa * global_letters_count                         0.114362
global_aoa * global_orthographic_density                  0.185348
global_aoa * global_synonyms_count                        0.004469
global_aoa * rel_aoa                                     -0.011589
global_aoa * rel_clustering                              -0.240108
global_aoa * rel_frequency                               -0.100531
global_aoa * rel_letters_count                           -0.126723
global_aoa * rel_orthographic_density                    -0.265267
global_aoa * rel_synonyms_count                          -0.150105
global_clustering * global_frequency                      0.252875
global_clustering * global_letters_count                  0.020302
global_clustering * global_orthographic_density           0.865924
global_clustering * global_synonyms_count                 0.515355
global_clustering * rel_aoa                              -0.290631
global_clustering * rel_clustering                        0.139575
global_clustering * rel_frequency                        -0.130453
global_clustering * rel_letters_count                    -0.205607
global_clustering * rel_orthographic_density             -0.691248
global_clustering * rel_synonyms_count                   -0.078988
global_frequency * global_letters_count                  -0.001852
global_frequency * global_orthographic_density            0.421047
global_frequency * global_synonyms_count                  0.308851
global_frequency * rel_aoa                               -0.250381
global_frequency * rel_clustering                        -0.274317
global_frequency * rel_frequency                         -0.017336
global_frequency * rel_letters_count                      0.005837
global_frequency * rel_orthographic_density              -0.270892
global_frequency * rel_synonyms_count                    -0.164571
global_letters_count * global_orthographic_density        0.237106
global_letters_count * global_synonyms_count              0.824480
global_letters_count * rel_aoa                           -0.056237
global_letters_count * rel_clustering                    -0.416172
global_letters_count * rel_frequency                     -0.115741
global_letters_count * rel_letters_count                  0.016335
global_letters_count * rel_orthographic_density          -0.021541
global_letters_count * rel_synonyms_count                -0.532758
global_orthographic_density * global_synonyms_count       1.344823
global_orthographic_density * rel_aoa                     0.028313
global_orthographic_density * rel_clustering             -1.213593
global_orthographic_density * rel_frequency              -0.495033
global_orthographic_density * rel_letters_count          -0.278625
global_orthographic_density * rel_orthographic_density   -0.087437
global_orthographic_density * rel_synonyms_count         -1.014218
global_synonyms_count * rel_aoa                          -0.121315
global_synonyms_count * rel_clustering                   -0.737046
global_synonyms_count * rel_frequency                    -0.447559
global_synonyms_count * rel_letters_count                -0.542833
global_synonyms_count * rel_orthographic_density         -1.143486
global_synonyms_count * rel_synonyms_count                0.105317
rel_aoa * rel_clustering                                  0.190419
rel_aoa * rel_frequency                                   0.134981
rel_aoa * rel_letters_count                               0.108324
rel_aoa * rel_orthographic_density                        0.064370
rel_aoa * rel_synonyms_count                              0.268514
rel_clustering * rel_frequency                            0.152995
rel_clustering * rel_letters_count                        0.383494
rel_clustering * rel_orthographic_density                 0.854293
rel_clustering * rel_synonyms_count                       0.621061
rel_frequency * rel_letters_count                         0.094176
rel_frequency * rel_orthographic_density                  0.319340
rel_frequency * rel_synonyms_count                        0.484276
rel_letters_count * rel_orthographic_density              0.128509
rel_letters_count * rel_synonyms_count                    0.405401
rel_orthographic_density * rel_synonyms_count             1.151156
dtype: float64

Regressing rel frequency with 1135 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.2864042269747268

intercept                      4.509180
global_aoa                    -0.007695
global_clustering              0.064613
global_frequency              -0.614545
global_letters_count           0.165002
global_orthographic_density    0.148159
global_synonyms_count          0.100046
rel_aoa                        0.032520
rel_clustering                -0.097426
rel_frequency                  0.939527
rel_letters_count             -0.176559
rel_orthographic_density      -0.170026
rel_synonyms_count            -0.066924
dtype: float64

Regressing rel frequency with 1135 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.3319005296969024

intercept                                                -10.950951
global_aoa                                                -0.344355
global_clustering                                         -6.457858
global_frequency                                          -0.356258
global_letters_count                                      -1.095109
global_orthographic_density                               -0.094467
global_synonyms_count                                     -6.686727
rel_aoa                                                    0.783575
rel_clustering                                             9.963923
rel_frequency                                              1.854277
rel_letters_count                                          0.333635
rel_orthographic_density                                  -0.068623
rel_synonyms_count                                         6.939867
global_aoa * global_clustering                             0.301551
global_aoa * global_frequency                              0.132237
global_aoa * global_letters_count                          0.108345
global_aoa * global_orthographic_density                   0.151347
global_aoa * global_synonyms_count                        -0.004599
global_aoa * rel_aoa                                      -0.008668
global_aoa * rel_clustering                               -0.235722
global_aoa * rel_frequency                                -0.063207
global_aoa * rel_letters_count                            -0.128078
global_aoa * rel_orthographic_density                     -0.244187
global_aoa * rel_synonyms_count                           -0.129967
global_clustering * global_frequency                       0.271794
global_clustering * global_letters_count                   0.020464
global_clustering * global_orthographic_density            0.920740
global_clustering * global_synonyms_count                  0.451413
global_clustering * rel_aoa                               -0.279776
global_clustering * rel_clustering                         0.137776
global_clustering * rel_frequency                         -0.113943
global_clustering * rel_letters_count                     -0.123082
global_clustering * rel_orthographic_density              -0.670610
global_clustering * rel_synonyms_count                     0.005048
global_frequency * global_letters_count                   -0.001745
global_frequency * global_orthographic_density             0.313058
global_frequency * global_synonyms_count                   0.245688
global_frequency * rel_aoa                                -0.207758
global_frequency * rel_clustering                         -0.325168
global_frequency * rel_frequency                           0.008187
global_frequency * rel_letters_count                       0.020465
global_frequency * rel_orthographic_density               -0.201445
global_frequency * rel_synonyms_count                     -0.067021
global_letters_count * global_orthographic_density         0.221245
global_letters_count * global_synonyms_count               0.898089
global_letters_count * rel_aoa                            -0.066793
global_letters_count * rel_clustering                     -0.416156
global_letters_count * rel_frequency                      -0.100253
global_letters_count * rel_letters_count                   0.018559
global_letters_count * rel_orthographic_density           -0.011449
global_letters_count * rel_synonyms_count                 -0.609188
global_orthographic_density * global_synonyms_count        1.206302
global_orthographic_density * rel_aoa                      0.024845
global_orthographic_density * rel_clustering              -1.233238
global_orthographic_density * rel_frequency               -0.396003
global_orthographic_density * rel_letters_count           -0.273339
global_orthographic_density * rel_orthographic_density    -0.082794
global_orthographic_density * rel_synonyms_count          -0.978354
global_synonyms_count * rel_aoa                           -0.118687
global_synonyms_count * rel_clustering                    -0.662228
global_synonyms_count * rel_frequency                     -0.433968
global_synonyms_count * rel_letters_count                 -0.658649
global_synonyms_count * rel_orthographic_density          -1.008285
global_synonyms_count * rel_synonyms_count                 0.120807
rel_aoa * rel_clustering                                   0.197965
rel_aoa * rel_frequency                                    0.100338
rel_aoa * rel_letters_count                                0.110934
rel_aoa * rel_orthographic_density                         0.051798
rel_aoa * rel_synonyms_count                               0.285825
rel_clustering * rel_frequency                             0.182021
rel_clustering * rel_letters_count                         0.318734
rel_clustering * rel_orthographic_density                  0.838557
rel_clustering * rel_synonyms_count                        0.540648
rel_frequency * rel_letters_count                          0.071204
rel_frequency * rel_orthographic_density                   0.267983
rel_frequency * rel_synonyms_count                         0.448084
rel_letters_count * rel_orthographic_density               0.127493
rel_letters_count * rel_synonyms_count                     0.516350
rel_orthographic_density * rel_synonyms_count              1.119963
dtype: float64

----------------------------------------------------------------------
Regressing global aoa with 1035 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.06653881265831163

intercept                      6.048668
global_aoa                     0.213992
global_clustering             -0.015924
global_frequency              -0.084469
global_letters_count           0.078773
global_orthographic_density   -0.036025
global_synonyms_count         -0.240920
dtype: float64

Regressing global aoa with 1035 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.09110626024737722

intercept                                             -12.017134
global_aoa                                              0.157347
global_clustering                                      -3.236914
global_frequency                                        0.836886
global_letters_count                                    1.696674
global_orthographic_density                             0.426490
global_synonyms_count                                  -1.989118
global_aoa * global_clustering                          0.038479
global_aoa * global_frequency                           0.010158
global_aoa * global_letters_count                       0.029710
global_aoa * global_orthographic_density               -0.004027
global_aoa * global_synonyms_count                      0.033282
global_clustering * global_frequency                    0.137332
global_clustering * global_letters_count                0.282417
global_clustering * global_orthographic_density         0.108265
global_clustering * global_synonyms_count              -0.412000
global_frequency * global_letters_count                -0.028316
global_frequency * global_orthographic_density         -0.014070
global_frequency * global_synonyms_count               -0.079108
global_letters_count * global_orthographic_density      0.046859
global_letters_count * global_synonyms_count           -0.002671
global_orthographic_density * global_synonyms_count    -0.088752
dtype: float64

Regressing rel aoa with 1035 measures, no interactions
           ^^^^^^^
R^2 = 0.01356725309748219

intercept                      0.683986
global_aoa                     0.097176
global_clustering             -0.019949
global_frequency              -0.052094
global_letters_count           0.020539
global_orthographic_density    0.057217
global_synonyms_count         -0.159458
dtype: float64

Regressing rel aoa with 1035 measures, with interactions
           ^^^^^^^
R^2 = 0.0327599621790412

intercept                                             -13.916806
global_aoa                                              0.394315
global_clustering                                      -1.992189
global_frequency                                        0.964792
global_letters_count                                    1.146792
global_orthographic_density                             0.062683
global_synonyms_count                                  -0.306558
global_aoa * global_clustering                          0.023182
global_aoa * global_frequency                          -0.012611
global_aoa * global_letters_count                      -0.013574
global_aoa * global_orthographic_density                0.016777
global_aoa * global_synonyms_count                      0.014264
global_clustering * global_frequency                    0.127870
global_clustering * global_letters_count                0.150659
global_clustering * global_orthographic_density        -0.106812
global_clustering * global_synonyms_count              -0.334284
global_frequency * global_letters_count                -0.011462
global_frequency * global_orthographic_density         -0.053841
global_frequency * global_synonyms_count               -0.141092
global_letters_count * global_orthographic_density     -0.048143
global_letters_count * global_synonyms_count           -0.070719
global_orthographic_density * global_synonyms_count    -0.129703
dtype: float64

Regressing global aoa with 1035 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.027616414762624

intercept                   6.601148
rel_aoa                     0.008087
rel_clustering              0.175299
rel_frequency              -0.007414
rel_letters_count           0.036275
rel_orthographic_density   -0.285130
rel_synonyms_count         -0.395410
dtype: float64

Regressing global aoa with 1035 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.049905971815738526

intercept                                        6.555986
rel_aoa                                         -0.153623
rel_clustering                                   0.072446
rel_frequency                                   -0.025082
rel_letters_count                                0.014885
rel_orthographic_density                        -0.567006
rel_synonyms_count                              -0.764458
rel_aoa * rel_clustering                        -0.003751
rel_aoa * rel_frequency                         -0.045174
rel_aoa * rel_letters_count                      0.040881
rel_aoa * rel_orthographic_density               0.031635
rel_aoa * rel_synonyms_count                    -0.024989
rel_clustering * rel_frequency                   0.099587
rel_clustering * rel_letters_count               0.198913
rel_clustering * rel_orthographic_density        0.023404
rel_clustering * rel_synonyms_count             -0.500204
rel_frequency * rel_letters_count               -0.010919
rel_frequency * rel_orthographic_density        -0.030586
rel_frequency * rel_synonyms_count              -0.169158
rel_letters_count * rel_orthographic_density     0.096898
rel_letters_count * rel_synonyms_count           0.006393
rel_orthographic_density * rel_synonyms_count   -0.061010
dtype: float64

Regressing rel aoa with 1035 measures, no interactions
           ^^^^^^^
R^2 = 0.13440333036339358

intercept                   0.836519
rel_aoa                     0.435563
rel_clustering             -0.154003
rel_frequency              -0.080444
rel_letters_count          -0.007436
rel_orthographic_density    0.147691
rel_synonyms_count         -0.277548
dtype: float64

Regressing rel aoa with 1035 measures, with interactions
           ^^^^^^^
R^2 = 0.15043220372207944

intercept                                        0.976750
rel_aoa                                          0.495556
rel_clustering                                  -0.332656
rel_frequency                                   -0.086627
rel_letters_count                               -0.012474
rel_orthographic_density                         0.368125
rel_synonyms_count                              -0.558297
rel_aoa * rel_clustering                        -0.023561
rel_aoa * rel_frequency                          0.024107
rel_aoa * rel_letters_count                      0.009661
rel_aoa * rel_orthographic_density               0.005100
rel_aoa * rel_synonyms_count                    -0.051715
rel_clustering * rel_frequency                   0.057781
rel_clustering * rel_letters_count               0.218390
rel_clustering * rel_orthographic_density        0.147487
rel_clustering * rel_synonyms_count             -0.283836
rel_frequency * rel_letters_count                0.035741
rel_frequency * rel_orthographic_density         0.126425
rel_frequency * rel_synonyms_count              -0.113869
rel_letters_count * rel_orthographic_density     0.013765
rel_letters_count * rel_synonyms_count           0.002149
rel_orthographic_density * rel_synonyms_count   -0.104007
dtype: float64

Regressing global aoa with 1035 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.08876757905698318

intercept                      3.695410
global_aoa                     0.384674
global_clustering             -0.060807
global_frequency              -0.030582
global_letters_count           0.207207
global_orthographic_density   -0.066468
global_synonyms_count          0.227629
rel_aoa                       -0.258401
rel_clustering                 0.049934
rel_frequency                 -0.058519
rel_letters_count             -0.140208
rel_orthographic_density       0.085465
rel_synonyms_count            -0.563158
dtype: float64

Regressing global aoa with 1035 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.15421698441966092

intercept                                                 11.579573
global_aoa                                                 2.513297
global_clustering                                          4.004677
global_frequency                                           2.590315
global_letters_count                                      -2.632383
global_orthographic_density                               -8.273981
global_synonyms_count                                     -3.792461
rel_aoa                                                   -0.933503
rel_clustering                                            -7.538344
rel_frequency                                             -0.828131
rel_letters_count                                          3.899300
rel_orthographic_density                                   9.723687
rel_synonyms_count                                        -4.102503
global_aoa * global_clustering                            -0.022716
global_aoa * global_frequency                             -0.217225
global_aoa * global_letters_count                         -0.054469
global_aoa * global_orthographic_density                   0.200994
global_aoa * global_synonyms_count                        -0.377533
global_aoa * rel_aoa                                       0.021879
global_aoa * rel_clustering                                0.035835
global_aoa * rel_frequency                                 0.168024
global_aoa * rel_letters_count                             0.068875
global_aoa * rel_orthographic_density                     -0.169487
global_aoa * rel_synonyms_count                            0.471346
global_clustering * global_frequency                       0.187561
global_clustering * global_letters_count                  -0.217896
global_clustering * global_orthographic_density           -2.195567
global_clustering * global_synonyms_count                 -0.973299
global_clustering * rel_aoa                                0.125344
global_clustering * rel_clustering                         0.019035
global_clustering * rel_frequency                          0.068334
global_clustering * rel_letters_count                      0.251619
global_clustering * rel_orthographic_density               1.762406
global_clustering * rel_synonyms_count                     0.503510
global_frequency * global_letters_count                    0.195506
global_frequency * global_orthographic_density            -0.422079
global_frequency * global_synonyms_count                  -0.133011
global_frequency * rel_aoa                                 0.223504
global_frequency * rel_clustering                          0.070638
global_frequency * rel_frequency                           0.035907
global_frequency * rel_letters_count                      -0.245560
global_frequency * rel_orthographic_density                0.182156
global_frequency * rel_synonyms_count                      0.380981
global_letters_count * global_orthographic_density        -0.390286
global_letters_count * global_synonyms_count               0.494356
global_letters_count * rel_aoa                            -0.143468
global_letters_count * rel_clustering                      0.539843
global_letters_count * rel_frequency                      -0.143267
global_letters_count * rel_letters_count                  -0.021388
global_letters_count * rel_orthographic_density            0.004823
global_letters_count * rel_synonyms_count                 -0.198943
global_orthographic_density * global_synonyms_count       -0.333112
global_orthographic_density * rel_aoa                     -0.283958
global_orthographic_density * rel_clustering               1.846730
global_orthographic_density * rel_frequency                0.261287
global_orthographic_density * rel_letters_count            0.168633
global_orthographic_density * rel_orthographic_density    -0.025205
global_orthographic_density * rel_synonyms_count           0.330001
global_synonyms_count * rel_aoa                            0.421367
global_synonyms_count * rel_clustering                     1.179616
global_synonyms_count * rel_frequency                     -0.085262
global_synonyms_count * rel_letters_count                 -0.635352
global_synonyms_count * rel_orthographic_density           0.001175
global_synonyms_count * rel_synonyms_count                -0.082411
rel_aoa * rel_clustering                                  -0.061259
rel_aoa * rel_frequency                                   -0.120708
rel_aoa * rel_letters_count                                0.117277
rel_aoa * rel_orthographic_density                         0.218515
rel_aoa * rel_synonyms_count                              -0.474866
rel_clustering * rel_frequency                            -0.147429
rel_clustering * rel_letters_count                        -0.268968
rel_clustering * rel_orthographic_density                 -1.201326
rel_clustering * rel_synonyms_count                       -1.185707
rel_frequency * rel_letters_count                          0.142774
rel_frequency * rel_orthographic_density                  -0.048571
rel_frequency * rel_synonyms_count                        -0.261705
rel_letters_count * rel_orthographic_density               0.195913
rel_letters_count * rel_synonyms_count                     0.321895
rel_orthographic_density * rel_synonyms_count              0.014216
dtype: float64

Regressing rel aoa with 1035 measures, no interactions
           ^^^^^^^
R^2 = 0.1768493010794906

intercept                      1.962674
global_aoa                    -0.360013
global_clustering             -0.005143
global_frequency               0.038636
global_letters_count           0.121284
global_orthographic_density   -0.110049
global_synonyms_count          0.315384
rel_aoa                        0.680847
rel_clustering                -0.007266
rel_frequency                 -0.100610
rel_letters_count             -0.101521
rel_orthographic_density       0.074267
rel_synonyms_count            -0.620394
dtype: float64

Regressing rel aoa with 1035 measures, with interactions
           ^^^^^^^
R^2 = 0.233117501250661

intercept                                                -0.271380
global_aoa                                                0.416959
global_clustering                                         0.930300
global_frequency                                          2.258472
global_letters_count                                     -0.655168
global_orthographic_density                              -7.062027
global_synonyms_count                                    -0.881407
rel_aoa                                                   0.850907
rel_clustering                                           -3.389955
rel_frequency                                            -0.881807
rel_letters_count                                         2.592401
rel_orthographic_density                                  8.839634
rel_synonyms_count                                       -4.444725
global_aoa * global_clustering                           -0.064624
global_aoa * global_frequency                            -0.139841
global_aoa * global_letters_count                        -0.037756
global_aoa * global_orthographic_density                  0.241320
global_aoa * global_synonyms_count                       -0.211523
global_aoa * rel_aoa                                     -0.006467
global_aoa * rel_clustering                               0.070732
global_aoa * rel_frequency                                0.088724
global_aoa * rel_letters_count                           -0.003783
global_aoa * rel_orthographic_density                    -0.320821
global_aoa * rel_synonyms_count                           0.285344
global_clustering * global_frequency                      0.227597
global_clustering * global_letters_count                  0.046298
global_clustering * global_orthographic_density          -1.452461
global_clustering * global_synonyms_count                -0.819839
global_clustering * rel_aoa                               0.137377
global_clustering * rel_clustering                        0.076927
global_clustering * rel_frequency                        -0.066919
global_clustering * rel_letters_count                    -0.068664
global_clustering * rel_orthographic_density              0.968220
global_clustering * rel_synonyms_count                    0.478884
global_frequency * global_letters_count                   0.138135
global_frequency * global_orthographic_density           -0.205620
global_frequency * global_synonyms_count                 -0.209597
global_frequency * rel_aoa                                0.151578
global_frequency * rel_clustering                        -0.012462
global_frequency * rel_frequency                          0.030146
global_frequency * rel_letters_count                     -0.238080
global_frequency * rel_orthographic_density              -0.073776
global_frequency * rel_synonyms_count                     0.330441
global_letters_count * global_orthographic_density       -0.242651
global_letters_count * global_synonyms_count              0.128546
global_letters_count * rel_aoa                           -0.093153
global_letters_count * rel_clustering                     0.235124
global_letters_count * rel_frequency                     -0.117348
global_letters_count * rel_letters_count                 -0.022556
global_letters_count * rel_orthographic_density           0.014639
global_letters_count * rel_synonyms_count                 0.038533
global_orthographic_density * global_synonyms_count      -0.479214
global_orthographic_density * rel_aoa                    -0.256987
global_orthographic_density * rel_clustering              1.084968
global_orthographic_density * rel_frequency               0.018402
global_orthographic_density * rel_letters_count          -0.040598
global_orthographic_density * rel_orthographic_density   -0.118652
global_orthographic_density * rel_synonyms_count          0.547423
global_synonyms_count * rel_aoa                           0.275060
global_synonyms_count * rel_clustering                    1.030960
global_synonyms_count * rel_frequency                     0.063150
global_synonyms_count * rel_letters_count                -0.280288
global_synonyms_count * rel_orthographic_density          0.174980
global_synonyms_count * rel_synonyms_count               -0.089194
rel_aoa * rel_clustering                                 -0.084783
rel_aoa * rel_frequency                                  -0.068337
rel_aoa * rel_letters_count                               0.113181
rel_aoa * rel_orthographic_density                        0.287373
rel_aoa * rel_synonyms_count                             -0.348748
rel_clustering * rel_frequency                            0.030914
rel_clustering * rel_letters_count                        0.039461
rel_clustering * rel_orthographic_density                -0.477351
rel_clustering * rel_synonyms_count                      -1.142823
rel_frequency * rel_letters_count                         0.182981
rel_frequency * rel_orthographic_density                  0.227016
rel_frequency * rel_synonyms_count                       -0.338865
rel_letters_count * rel_orthographic_density              0.181604
rel_letters_count * rel_synonyms_count                    0.092569
rel_orthographic_density * rel_synonyms_count            -0.300925
dtype: float64

----------------------------------------------------------------------
Regressing global clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.013926892953416536

intercept                     -5.427622
global_aoa                     0.008855
global_clustering              0.057753
global_frequency              -0.027437
global_letters_count           0.018110
global_orthographic_density    0.039340
global_synonyms_count          0.040765
dtype: float64

Regressing global clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.04398336286022397

intercept                                             -5.415831
global_aoa                                             0.134246
global_clustering                                      0.197632
global_frequency                                      -0.326637
global_letters_count                                   0.358330
global_orthographic_density                            0.643763
global_synonyms_count                                 -0.152919
global_aoa * global_clustering                        -0.014253
global_aoa * global_frequency                         -0.008302
global_aoa * global_letters_count                     -0.015416
global_aoa * global_orthographic_density              -0.025893
global_aoa * global_synonyms_count                    -0.015027
global_clustering * global_frequency                  -0.041681
global_clustering * global_letters_count               0.046477
global_clustering * global_orthographic_density        0.082258
global_clustering * global_synonyms_count             -0.161716
global_frequency * global_letters_count                0.009754
global_frequency * global_orthographic_density         0.028475
global_frequency * global_synonyms_count               0.005891
global_letters_count * global_orthographic_density    -0.030280
global_letters_count * global_synonyms_count          -0.075877
global_orthographic_density * global_synonyms_count   -0.205136
dtype: float64

Regressing rel clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.013235412282621373

intercept                      0.436581
global_aoa                     0.015942
global_clustering              0.040190
global_frequency              -0.019464
global_letters_count           0.029917
global_orthographic_density    0.064828
global_synonyms_count          0.019336
dtype: float64

Regressing rel clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.03659406908167406

intercept                                              2.078141
global_aoa                                             0.069918
global_clustering                                      0.389369
global_frequency                                      -0.315081
global_letters_count                                   0.070258
global_orthographic_density                            0.667679
global_synonyms_count                                  0.249146
global_aoa * global_clustering                        -0.017253
global_aoa * global_frequency                         -0.006589
global_aoa * global_letters_count                     -0.007675
global_aoa * global_orthographic_density              -0.028345
global_aoa * global_synonyms_count                    -0.039599
global_clustering * global_frequency                  -0.040971
global_clustering * global_letters_count               0.011075
global_clustering * global_orthographic_density        0.075261
global_clustering * global_synonyms_count             -0.103904
global_frequency * global_letters_count                0.011451
global_frequency * global_orthographic_density         0.017064
global_frequency * global_synonyms_count              -0.006174
global_letters_count * global_orthographic_density    -0.013374
global_letters_count * global_synonyms_count          -0.045604
global_orthographic_density * global_synonyms_count   -0.200341
dtype: float64

Regressing global clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0021053733319106316

intercept                  -5.798973
rel_aoa                     0.006062
rel_clustering              0.037808
rel_frequency               0.003388
rel_letters_count           0.006774
rel_orthographic_density    0.027974
rel_synonyms_count          0.008065
dtype: float64

Regressing global clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.015073500365170411

intercept                                       -5.800011e+00
rel_aoa                                         -1.731629e-02
rel_clustering                                   4.120036e-02
rel_frequency                                   -3.704124e-03
rel_letters_count                               -8.451542e-07
rel_orthographic_density                         8.055076e-02
rel_synonyms_count                              -1.183780e-01
rel_aoa * rel_clustering                         3.442701e-03
rel_aoa * rel_frequency                         -8.702757e-03
rel_aoa * rel_letters_count                     -5.412612e-03
rel_aoa * rel_orthographic_density              -9.715018e-03
rel_aoa * rel_synonyms_count                    -1.087327e-02
rel_clustering * rel_frequency                   1.916351e-03
rel_clustering * rel_letters_count               1.309215e-02
rel_clustering * rel_orthographic_density        2.601094e-02
rel_clustering * rel_synonyms_count             -1.289165e-01
rel_frequency * rel_letters_count                7.188487e-03
rel_frequency * rel_orthographic_density         8.119131e-03
rel_frequency * rel_synonyms_count              -3.914745e-02
rel_letters_count * rel_orthographic_density    -1.984854e-02
rel_letters_count * rel_synonyms_count          -4.215622e-02
rel_orthographic_density * rel_synonyms_count   -1.682726e-01
dtype: float64

Regressing rel clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.07187947359614688

intercept                   0.336594
rel_aoa                    -0.009339
rel_clustering              0.266910
rel_frequency               0.015736
rel_letters_count           0.026452
rel_orthographic_density    0.047818
rel_synonyms_count          0.040985
dtype: float64

Regressing rel clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.08754693278226655

intercept                                        0.315992
rel_aoa                                         -0.042323
rel_clustering                                   0.276550
rel_frequency                                    0.007574
rel_letters_count                                0.033036
rel_orthographic_density                         0.078383
rel_synonyms_count                              -0.102748
rel_aoa * rel_clustering                         0.009635
rel_aoa * rel_frequency                         -0.008241
rel_aoa * rel_letters_count                     -0.008625
rel_aoa * rel_orthographic_density              -0.026333
rel_aoa * rel_synonyms_count                    -0.016491
rel_clustering * rel_frequency                  -0.012409
rel_clustering * rel_letters_count              -0.004330
rel_clustering * rel_orthographic_density        0.038844
rel_clustering * rel_synonyms_count             -0.151525
rel_frequency * rel_letters_count                0.005970
rel_frequency * rel_orthographic_density        -0.001227
rel_frequency * rel_synonyms_count              -0.036599
rel_letters_count * rel_orthographic_density    -0.014277
rel_letters_count * rel_synonyms_count          -0.011817
rel_orthographic_density * rel_synonyms_count   -0.133319
dtype: float64

Regressing global clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.03163919467911003

intercept                     -4.190917
global_aoa                     0.012167
global_clustering              0.159044
global_frequency              -0.059749
global_letters_count           0.021216
global_orthographic_density   -0.062305
global_synonyms_count          0.106601
rel_aoa                       -0.006871
rel_clustering                -0.115813
rel_frequency                  0.037493
rel_letters_count             -0.000868
rel_orthographic_density       0.122879
rel_synonyms_count            -0.085600
dtype: float64

Regressing global clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.14620451048874972

intercept                                                -3.209075
global_aoa                                                0.564133
global_clustering                                         2.110767
global_frequency                                         -0.497675
global_letters_count                                      2.301972
global_orthographic_density                               0.633603
global_synonyms_count                                    -3.788732
rel_aoa                                                  -0.217602
rel_clustering                                           -3.384001
rel_frequency                                            -0.602444
rel_letters_count                                        -1.787515
rel_orthographic_density                                  0.177906
rel_synonyms_count                                        1.186166
global_aoa * global_clustering                           -0.115085
global_aoa * global_frequency                            -0.043742
global_aoa * global_letters_count                        -0.112656
global_aoa * global_orthographic_density                 -0.145182
global_aoa * global_synonyms_count                        0.058787
global_aoa * rel_aoa                                      0.016595
global_aoa * rel_clustering                               0.147696
global_aoa * rel_frequency                                0.058735
global_aoa * rel_letters_count                            0.100822
global_aoa * rel_orthographic_density                     0.100021
global_aoa * rel_synonyms_count                          -0.055341
global_clustering * global_frequency                     -0.144349
global_clustering * global_letters_count                  0.178695
global_clustering * global_orthographic_density          -0.174391
global_clustering * global_synonyms_count                -0.359869
global_clustering * rel_aoa                              -0.005778
global_clustering * rel_clustering                       -0.089563
global_clustering * rel_frequency                         0.040190
global_clustering * rel_letters_count                    -0.110724
global_clustering * rel_orthographic_density              0.189524
global_clustering * rel_synonyms_count                    0.230528
global_frequency * global_letters_count                  -0.029500
global_frequency * global_orthographic_density           -0.031882
global_frequency * global_synonyms_count                  0.183759
global_frequency * rel_aoa                               -0.011982
global_frequency * rel_clustering                         0.127550
global_frequency * rel_frequency                          0.024175
global_frequency * rel_letters_count                      0.040383
global_frequency * rel_orthographic_density               0.019952
global_frequency * rel_synonyms_count                    -0.030987
global_letters_count * global_orthographic_density       -0.058235
global_letters_count * global_synonyms_count             -0.047052
global_letters_count * rel_aoa                            0.004429
global_letters_count * rel_clustering                    -0.048988
global_letters_count * rel_frequency                      0.041179
global_letters_count * rel_letters_count                  0.005720
global_letters_count * rel_orthographic_density           0.045332
global_letters_count * rel_synonyms_count                 0.154255
global_orthographic_density * global_synonyms_count      -0.186904
global_orthographic_density * rel_aoa                     0.105745
global_orthographic_density * rel_clustering              0.319966
global_orthographic_density * rel_frequency               0.097734
global_orthographic_density * rel_letters_count           0.023015
global_orthographic_density * rel_orthographic_density   -0.015184
global_orthographic_density * rel_synonyms_count          0.000679
global_synonyms_count * rel_aoa                          -0.031950
global_synonyms_count * rel_clustering                    0.152224
global_synonyms_count * rel_frequency                    -0.186460
global_synonyms_count * rel_letters_count                -0.189774
global_synonyms_count * rel_orthographic_density         -0.088656
global_synonyms_count * rel_synonyms_count               -0.092150
rel_aoa * rel_clustering                                 -0.016570
rel_aoa * rel_frequency                                   0.010850
rel_aoa * rel_letters_count                              -0.021069
rel_aoa * rel_orthographic_density                       -0.101945
rel_aoa * rel_synonyms_count                              0.009145
rel_clustering * rel_frequency                           -0.062810
rel_clustering * rel_letters_count                        0.022567
rel_clustering * rel_orthographic_density                -0.226539
rel_clustering * rel_synonyms_count                      -0.160096
rel_frequency * rel_letters_count                        -0.034726
rel_frequency * rel_orthographic_density                 -0.058608
rel_frequency * rel_synonyms_count                        0.036537
rel_letters_count * rel_orthographic_density             -0.035288
rel_letters_count * rel_synonyms_count                    0.027633
rel_orthographic_density * rel_synonyms_count             0.071078
dtype: float64

Regressing rel clustering with 915 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.16242769231155751

intercept                     -2.744261
global_aoa                     0.022202
global_clustering             -0.581732
global_frequency              -0.053471
global_letters_count           0.002912
global_orthographic_density   -0.030016
global_synonyms_count          0.023949
rel_aoa                       -0.019702
rel_clustering                 0.753646
rel_frequency                  0.039303
rel_letters_count              0.017910
rel_orthographic_density       0.079509
rel_synonyms_count            -0.001441
dtype: float64

Regressing rel clustering with 915 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.24919254038601546

intercept                                                -3.837078
global_aoa                                                0.629840
global_clustering                                         0.376015
global_frequency                                         -0.127867
global_letters_count                                      1.382205
global_orthographic_density                              -0.055725
global_synonyms_count                                    -3.528995
rel_aoa                                                  -0.087837
rel_clustering                                           -1.694457
rel_frequency                                            -0.549840
rel_letters_count                                        -1.273947
rel_orthographic_density                                  0.681801
rel_synonyms_count                                        1.751633
global_aoa * global_clustering                           -0.056045
global_aoa * global_frequency                            -0.029364
global_aoa * global_letters_count                        -0.088729
global_aoa * global_orthographic_density                 -0.115768
global_aoa * global_synonyms_count                        0.010836
global_aoa * rel_aoa                                      0.013091
global_aoa * rel_clustering                               0.086177
global_aoa * rel_frequency                                0.048975
global_aoa * rel_letters_count                            0.085604
global_aoa * rel_orthographic_density                     0.082795
global_aoa * rel_synonyms_count                          -0.023256
global_clustering * global_frequency                     -0.052075
global_clustering * global_letters_count                  0.078720
global_clustering * global_orthographic_density          -0.125626
global_clustering * global_synonyms_count                -0.290344
global_clustering * rel_aoa                              -0.022695
global_clustering * rel_clustering                       -0.106247
global_clustering * rel_frequency                         0.007721
global_clustering * rel_letters_count                    -0.065342
global_clustering * rel_orthographic_density              0.107677
global_clustering * rel_synonyms_count                    0.204428
global_frequency * global_letters_count                  -0.031552
global_frequency * global_orthographic_density            0.021962
global_frequency * global_synonyms_count                  0.144061
global_frequency * rel_aoa                               -0.027287
global_frequency * rel_clustering                         0.062755
global_frequency * rel_frequency                          0.024077
global_frequency * rel_letters_count                      0.041361
global_frequency * rel_orthographic_density              -0.049138
global_frequency * rel_synonyms_count                    -0.033280
global_letters_count * global_orthographic_density       -0.018924
global_letters_count * global_synonyms_count              0.066222
global_letters_count * rel_aoa                           -0.005880
global_letters_count * rel_clustering                     0.024425
global_letters_count * rel_frequency                      0.025067
global_letters_count * rel_letters_count                  0.004753
global_letters_count * rel_orthographic_density           0.001663
global_letters_count * rel_synonyms_count                 0.027104
global_orthographic_density * global_synonyms_count      -0.068842
global_orthographic_density * rel_aoa                     0.087578
global_orthographic_density * rel_clustering              0.226944
global_orthographic_density * rel_frequency               0.039290
global_orthographic_density * rel_letters_count          -0.006445
global_orthographic_density * rel_orthographic_density    0.006124
global_orthographic_density * rel_synonyms_count         -0.096904
global_synonyms_count * rel_aoa                          -0.014760
global_synonyms_count * rel_clustering                    0.185258
global_synonyms_count * rel_frequency                    -0.154959
global_synonyms_count * rel_letters_count                -0.222082
global_synonyms_count * rel_orthographic_density         -0.139852
global_synonyms_count * rel_synonyms_count               -0.080139
rel_aoa * rel_clustering                                  0.005778
rel_aoa * rel_frequency                                   0.017061
rel_aoa * rel_letters_count                              -0.009376
rel_aoa * rel_orthographic_density                       -0.088570
rel_aoa * rel_synonyms_count                              0.008476
rel_clustering * rel_frequency                           -0.037406
rel_clustering * rel_letters_count                       -0.019578
rel_clustering * rel_orthographic_density                -0.143148
rel_clustering * rel_synonyms_count                      -0.221217
rel_frequency * rel_letters_count                        -0.021867
rel_frequency * rel_orthographic_density                  0.002623
rel_frequency * rel_synonyms_count                        0.039666
rel_letters_count * rel_orthographic_density              0.017284
rel_letters_count * rel_synonyms_count                    0.080632
rel_orthographic_density * rel_synonyms_count             0.101136
dtype: float64
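
The tables above list one coefficient per regressor, plus an intercept. A run labelled "with interactions" adds one term per unordered pair of regressors: 6 features give 6 + 15 = 21 terms, and 12 features give 12 + 66 = 78 terms. The following is a minimal sketch of how such a fit and its R^2 could be computed. It is not the notebook's own code: the helper names `design_matrix` and `regress` are hypothetical, the data is synthetic, and plain least squares via NumPy is an assumption about the estimator.

```python
from itertools import combinations

import numpy as np


def design_matrix(X, names, interactions=False):
    """Return (matrix, column labels), optionally adding pairwise products."""
    columns = [X[:, i] for i in range(X.shape[1])]
    labels = list(names)
    if interactions:
        # One extra column per unordered pair of features, labelled "a * b".
        for i, j in combinations(range(X.shape[1]), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    return np.column_stack(columns), labels


def regress(X, y, names, interactions=False):
    """Ordinary least squares with an intercept; returns (R^2, coefficients)."""
    A, labels = design_matrix(X, names, interactions)
    A = np.column_stack([np.ones(len(y)), A])  # intercept column
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coefs
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return r2, dict(zip(['intercept'] + labels, coefs))


# Synthetic stand-in data (NOT the notebook's substitutions).
rng = np.random.RandomState(0)
names = ['global_aoa', 'global_frequency', 'rel_aoa', 'rel_frequency']
X = rng.normal(size=(200, len(names)))
y = 0.5 * X[:, 0] - 0.3 * X[:, 0] * X[:, 2] + rng.normal(scale=0.1, size=200)

r2_plain, coefs_plain = regress(X, y, names)
r2_inter, coefs_inter = regress(X, y, names, interactions=True)
print('R^2 = {}'.format(r2_plain))
print('R^2 = {}'.format(r2_inter))
```

Because the model without interactions is nested in the model with interactions, the second R^2 can never be lower than the first on the same data. A gain in R^2 from adding interactions is therefore not, on its own, evidence that the interaction terms matter.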

----------------------------------------------------------------------
Regressing global letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06291360172985039

intercept                      4.456631
global_aoa                     0.052116
global_clustering             -0.050031
global_frequency               0.051367
global_letters_count           0.190153
global_orthographic_density   -0.133795
global_synonyms_count         -0.358129
dtype: float64

Regressing global letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07484083849990475

intercept                                             -14.293958
global_aoa                                              0.728991
global_clustering                                      -3.054321
global_frequency                                        0.762517
global_letters_count                                    1.318163
global_orthographic_density                             0.875885
global_synonyms_count                                   2.808896
global_aoa * global_clustering                          0.098718
global_aoa * global_frequency                           0.004371
global_aoa * global_letters_count                      -0.018846
global_aoa * global_orthographic_density                0.008905
global_aoa * global_synonyms_count                     -0.122518
global_clustering * global_frequency                    0.122669
global_clustering * global_letters_count                0.140677
global_clustering * global_orthographic_density         0.219075
global_clustering * global_synonyms_count               0.212707
global_frequency * global_letters_count                -0.016266
global_frequency * global_orthographic_density          0.035925
global_frequency * global_synonyms_count               -0.023248
global_letters_count * global_orthographic_density     -0.000085
global_letters_count * global_synonyms_count           -0.079557
global_orthographic_density * global_synonyms_count    -0.342725
dtype: float64

Regressing rel letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.03140334223760377

intercept                      1.411580
global_aoa                     0.012989
global_clustering             -0.040687
global_frequency               0.047772
global_letters_count           0.127766
global_orthographic_density   -0.097959
global_synonyms_count         -0.397576
dtype: float64

Regressing rel letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.04132224198522071

intercept                                             -11.759560
global_aoa                                              0.567835
global_clustering                                      -2.005954
global_frequency                                        0.621152
global_letters_count                                    0.942026
global_orthographic_density                             0.033741
global_synonyms_count                                   2.503860
global_aoa * global_clustering                          0.064472
global_aoa * global_frequency                           0.008188
global_aoa * global_letters_count                      -0.036604
global_aoa * global_orthographic_density                0.002800
global_aoa * global_synonyms_count                     -0.140526
global_clustering * global_frequency                    0.104913
global_clustering * global_letters_count                0.074565
global_clustering * global_orthographic_density         0.043113
global_clustering * global_synonyms_count               0.143464
global_frequency * global_letters_count                -0.009075
global_frequency * global_orthographic_density          0.034379
global_frequency * global_synonyms_count               -0.063451
global_letters_count * global_orthographic_density     -0.024332
global_letters_count * global_synonyms_count           -0.041830
global_orthographic_density * global_synonyms_count    -0.246130
dtype: float64

Regressing global letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0517356969261189

intercept                   5.940810
rel_aoa                    -0.054881
rel_clustering              0.114196
rel_frequency               0.055220
rel_letters_count           0.152228
rel_orthographic_density   -0.269837
rel_synonyms_count         -0.416934
dtype: float64

Regressing global letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.066102712156896

intercept                                        5.861061
rel_aoa                                         -0.110882
rel_clustering                                   0.099281
rel_frequency                                    0.089922
rel_letters_count                                0.239315
rel_orthographic_density                        -0.419682
rel_synonyms_count                              -0.765463
rel_aoa * rel_clustering                         0.091384
rel_aoa * rel_frequency                         -0.038050
rel_aoa * rel_letters_count                     -0.043706
rel_aoa * rel_orthographic_density              -0.030657
rel_aoa * rel_synonyms_count                    -0.173503
rel_clustering * rel_frequency                  -0.005788
rel_clustering * rel_letters_count               0.036474
rel_clustering * rel_orthographic_density        0.130330
rel_clustering * rel_synonyms_count             -0.074756
rel_frequency * rel_letters_count               -0.008725
rel_frequency * rel_orthographic_density        -0.022682
rel_frequency * rel_synonyms_count              -0.092902
rel_letters_count * rel_orthographic_density     0.053832
rel_letters_count * rel_synonyms_count          -0.014281
rel_orthographic_density * rel_synonyms_count   -0.259310
dtype: float64

Regressing rel letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.09509034545828898

intercept                   1.701325
rel_aoa                    -0.028360
rel_clustering             -0.048318
rel_frequency              -0.128300
rel_letters_count           0.337519
rel_orthographic_density    0.058402
rel_synonyms_count         -0.392321
dtype: float64

Regressing rel letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.11690583590303338

intercept                                        1.660806
rel_aoa                                          0.023611
rel_clustering                                  -0.089199
rel_frequency                                   -0.094081
rel_letters_count                                0.499186
rel_orthographic_density                         0.080240
rel_synonyms_count                              -0.570456
rel_aoa * rel_clustering                         0.104754
rel_aoa * rel_frequency                         -0.002763
rel_aoa * rel_letters_count                     -0.095181
rel_aoa * rel_orthographic_density              -0.123774
rel_aoa * rel_synonyms_count                    -0.206193
rel_clustering * rel_frequency                  -0.004389
rel_clustering * rel_letters_count               0.072806
rel_clustering * rel_orthographic_density        0.208877
rel_clustering * rel_synonyms_count             -0.060411
rel_frequency * rel_letters_count                0.011080
rel_frequency * rel_orthographic_density         0.042364
rel_frequency * rel_synonyms_count              -0.041623
rel_letters_count * rel_orthographic_density     0.060855
rel_letters_count * rel_synonyms_count           0.032715
rel_orthographic_density * rel_synonyms_count   -0.167081
dtype: float64

Regressing global letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0735032508198209

intercept                      0.699430
global_aoa                     0.157349
global_clustering             -0.321805
global_frequency               0.135188
global_letters_count           0.266172
global_orthographic_density   -0.104131
global_synonyms_count         -0.027827
rel_aoa                       -0.162345
rel_clustering                 0.311498
rel_frequency                 -0.103149
rel_letters_count             -0.083527
rel_orthographic_density      -0.017127
rel_synonyms_count            -0.382604
dtype: float64

Regressing global letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1316839987859666

intercept                                                 14.242188
global_aoa                                                 1.067321
global_clustering                                          6.103556
global_frequency                                           1.986688
global_letters_count                                      -2.435356
global_orthographic_density                               -1.136480
global_synonyms_count                                      5.549084
rel_aoa                                                   -0.682557
rel_clustering                                           -11.267498
rel_frequency                                              0.281457
rel_letters_count                                          4.079198
rel_orthographic_density                                   0.048979
rel_synonyms_count                                        -4.156664
global_aoa * global_clustering                             0.029950
global_aoa * global_frequency                             -0.067943
global_aoa * global_letters_count                         -0.007024
global_aoa * global_orthographic_density                   0.056183
global_aoa * global_synonyms_count                        -0.247364
global_aoa * rel_aoa                                       0.023705
global_aoa * rel_clustering                               -0.047460
global_aoa * rel_frequency                                 0.013109
global_aoa * rel_letters_count                            -0.028578
global_aoa * rel_orthographic_density                      0.002680
global_aoa * rel_synonyms_count                            0.218344
global_clustering * global_frequency                      -0.014539
global_clustering * global_letters_count                  -0.617482
global_clustering * global_orthographic_density           -1.483768
global_clustering * global_synonyms_count                 -0.683949
global_clustering * rel_aoa                                0.191196
global_clustering * rel_clustering                        -0.029655
global_clustering * rel_frequency                          0.315310
global_clustering * rel_letters_count                      0.636501
global_clustering * rel_orthographic_density               1.303458
global_clustering * rel_synonyms_count                     0.990639
global_frequency * global_letters_count                   -0.037010
global_frequency * global_orthographic_density            -0.562058
global_frequency * global_synonyms_count                  -0.428257
global_frequency * rel_aoa                                 0.188514
global_frequency * rel_clustering                          0.293122
global_frequency * rel_frequency                           0.004302
global_frequency * rel_letters_count                      -0.027009
global_frequency * rel_orthographic_density                0.576820
global_frequency * rel_synonyms_count                      0.502425
global_letters_count * global_orthographic_density        -0.324783
global_letters_count * global_synonyms_count              -0.132707
global_letters_count * rel_aoa                            -0.033112
global_letters_count * rel_clustering                      1.025980
global_letters_count * rel_frequency                       0.085185
global_letters_count * rel_letters_count                   0.014372
global_letters_count * rel_orthographic_density            0.181205
global_letters_count * rel_synonyms_count                  0.098994
global_orthographic_density * global_synonyms_count       -1.348352
global_orthographic_density * rel_aoa                     -0.157505
global_orthographic_density * rel_clustering               1.547172
global_orthographic_density * rel_frequency                0.470443
global_orthographic_density * rel_letters_count            0.130441
global_orthographic_density * rel_orthographic_density    -0.066869
global_orthographic_density * rel_synonyms_count           1.069737
global_synonyms_count * rel_aoa                            0.124464
global_synonyms_count * rel_clustering                     1.179066
global_synonyms_count * rel_frequency                      0.256856
global_synonyms_count * rel_letters_count                 -0.099824
global_synonyms_count * rel_orthographic_density           1.184584
global_synonyms_count * rel_synonyms_count                -0.101370
rel_aoa * rel_clustering                                  -0.019016
rel_aoa * rel_frequency                                   -0.110190
rel_aoa * rel_letters_count                               -0.045060
rel_aoa * rel_orthographic_density                         0.040468
rel_aoa * rel_synonyms_count                              -0.224574
rel_clustering * rel_frequency                            -0.495595
rel_clustering * rel_letters_count                        -0.877787
rel_clustering * rel_orthographic_density                 -1.075537
rel_clustering * rel_synonyms_count                       -1.410291
rel_frequency * rel_letters_count                         -0.032594
rel_frequency * rel_orthographic_density                  -0.449151
rel_frequency * rel_synonyms_count                        -0.383012
rel_letters_count * rel_orthographic_density              -0.013862
rel_letters_count * rel_synonyms_count                     0.053506
rel_orthographic_density * rel_synonyms_count             -1.182395
dtype: float64
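
The two global letters_count fits just above use the same 1135 measures but very different numbers of regressors: 12 without interactions and 12 + 66 = 78 with them. One way to compare them on a more even footing is the adjusted R^2, which penalises each extra regressor. The notebook does not report this quantity; the sketch below is an added diagnostic, and the helper name `adjusted_r2` is hypothetical. The R^2 values are copied from the output above.

```python
def adjusted_r2(r2, n_measures, n_regressors):
    """Adjusted R^2 for a least-squares fit with an intercept."""
    return 1 - (1 - r2) * (n_measures - 1) / (n_measures - n_regressors - 1)


n = 1135  # number of measures reported in the output above

# global + rel regressors, no interactions: 12 regressors.
plain = adjusted_r2(0.0735032508198209, n, 12)
# global + rel regressors, with all pairwise interactions: 12 + 66 regressors.
inter = adjusted_r2(0.1316839987859666, n, 12 + 66)

print('adjusted R^2, no interactions:   {:.4f}'.format(plain))
print('adjusted R^2, with interactions: {:.4f}'.format(inter))
```

Under this correction both values fall between 0.06 and 0.07, so most of the apparent gain from the 66 interaction terms is of the size one would expect from adding regressors alone.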

Regressing rel letters_count with 1135 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.14710068779642604

intercept                      0.035391
global_aoa                     0.124825
global_clustering             -0.257180
global_frequency               0.186691
global_letters_count          -0.567937
global_orthographic_density   -0.134607
global_synonyms_count         -0.043771
rel_aoa                       -0.110065
rel_clustering                 0.250776
rel_frequency                 -0.160566
rel_letters_count              0.768358
rel_orthographic_density       0.005713
rel_synonyms_count            -0.324677
dtype: float64

Regressing rel letters_count with 1135 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.19681533614201221

intercept                                                 12.140777
global_aoa                                                 0.636171
global_clustering                                          4.456707
global_frequency                                           1.435915
global_letters_count                                      -2.631710
global_orthographic_density                               -2.139042
global_synonyms_count                                      7.096952
rel_aoa                                                   -0.237994
rel_clustering                                            -9.283586
rel_frequency                                              0.637049
rel_letters_count                                          4.148247
rel_orthographic_density                                  -0.029425
rel_synonyms_count                                        -4.593900
global_aoa * global_clustering                             0.063610
global_aoa * global_frequency                             -0.020108
global_aoa * global_letters_count                          0.013602
global_aoa * global_orthographic_density                   0.033334
global_aoa * global_synonyms_count                        -0.222055
global_aoa * rel_aoa                                       0.013971
global_aoa * rel_clustering                               -0.074885
global_aoa * rel_frequency                                -0.027273
global_aoa * rel_letters_count                            -0.034123
global_aoa * rel_orthographic_density                      0.034768
global_aoa * rel_synonyms_count                            0.198838
global_clustering * global_frequency                       0.013621
global_clustering * global_letters_count                  -0.448798
global_clustering * global_orthographic_density           -1.295717
global_clustering * global_synonyms_count                 -0.485757
global_clustering * rel_aoa                                0.142418
global_clustering * rel_clustering                        -0.081414
global_clustering * rel_frequency                          0.265914
global_clustering * rel_letters_count                      0.436663
global_clustering * rel_orthographic_density               1.042335
global_clustering * rel_synonyms_count                     0.643039
global_frequency * global_letters_count                   -0.011314
global_frequency * global_orthographic_density            -0.376567
global_frequency * global_synonyms_count                  -0.439452
global_frequency * rel_aoa                                 0.132161
global_frequency * rel_clustering                          0.234600
global_frequency * rel_frequency                          -0.005123
global_frequency * rel_letters_count                      -0.056972
global_frequency * rel_orthographic_density                0.434065
global_frequency * rel_synonyms_count                      0.373413
global_letters_count * global_orthographic_density        -0.287421
global_letters_count * global_synonyms_count              -0.242051
global_letters_count * rel_aoa                            -0.048271
global_letters_count * rel_clustering                      0.837043
global_letters_count * rel_frequency                       0.060891
global_letters_count * rel_letters_count                  -0.004575
global_letters_count * rel_orthographic_density            0.169404
global_letters_count * rel_synonyms_count                  0.122205
global_orthographic_density * global_synonyms_count       -1.282866
global_orthographic_density * rel_aoa                     -0.114818
global_orthographic_density * rel_clustering               1.227564
global_orthographic_density * rel_frequency                0.322163
global_orthographic_density * rel_letters_count            0.126586
global_orthographic_density * rel_orthographic_density    -0.056550
global_orthographic_density * rel_synonyms_count           0.992987
global_synonyms_count * rel_aoa                            0.106517
global_synonyms_count * rel_clustering                     0.912569
global_synonyms_count * rel_frequency                      0.282296
global_synonyms_count * rel_letters_count                  0.013224
global_synonyms_count * rel_orthographic_density           1.111107
global_synonyms_count * rel_synonyms_count                -0.138250
rel_aoa * rel_clustering                                   0.011025
rel_aoa * rel_frequency                                   -0.066207
rel_aoa * rel_letters_count                               -0.032053
rel_aoa * rel_orthographic_density                         0.014788
rel_aoa * rel_synonyms_count                              -0.217988
rel_clustering * rel_frequency                            -0.461077
rel_clustering * rel_letters_count                        -0.681363
rel_clustering * rel_orthographic_density                 -0.743796
rel_clustering * rel_synonyms_count                       -1.059086
rel_frequency * rel_letters_count                         -0.002822
rel_frequency * rel_orthographic_density                  -0.344900
rel_frequency * rel_synonyms_count                        -0.295201
rel_letters_count * rel_orthographic_density              -0.051510
rel_letters_count * rel_synonyms_count                     0.027046
rel_orthographic_density * rel_synonyms_count             -1.044847
dtype: float64

----------------------------------------------------------------------
Regressing global synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.009710579691767696

intercept                      0.660172
global_aoa                    -0.006420
global_clustering              0.033366
global_frequency              -0.003646
global_letters_count          -0.009227
global_orthographic_density   -0.008353
global_synonyms_count          0.088318
dtype: float64

Regressing global synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.025383781584193413

intercept                                              2.550964
global_aoa                                             0.083269
global_clustering                                      0.550885
global_frequency                                      -0.081275
global_letters_count                                  -0.060378
global_orthographic_density                           -0.257843
global_synonyms_count                                 -0.219339
global_aoa * global_clustering                        -0.012595
global_aoa * global_frequency                         -0.008481
global_aoa * global_letters_count                     -0.012078
global_aoa * global_orthographic_density              -0.015599
global_aoa * global_synonyms_count                     0.023200
global_clustering * global_frequency                  -0.024236
global_clustering * global_letters_count              -0.020806
global_clustering * global_orthographic_density       -0.065900
global_clustering * global_synonyms_count              0.025713
global_frequency * global_letters_count                0.001398
global_frequency * global_orthographic_density        -0.008214
global_frequency * global_synonyms_count               0.008398
global_letters_count * global_orthographic_density     0.001741
global_letters_count * global_synonyms_count           0.012673
global_orthographic_density * global_synonyms_count    0.118642
dtype: float64

Regressing rel synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.006454873490346902

intercept                      0.352216
global_aoa                    -0.005743
global_clustering              0.027180
global_frequency              -0.008945
global_letters_count          -0.006642
global_orthographic_density   -0.010591
global_synonyms_count          0.061028
dtype: float64

Regressing rel synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.021165434814992445

intercept                                              2.732785
global_aoa                                             0.098709
global_clustering                                      0.590905
global_frequency                                      -0.168737
global_letters_count                                  -0.081396
global_orthographic_density                           -0.201402
global_synonyms_count                                 -0.246229
global_aoa * global_clustering                        -0.006283
global_aoa * global_frequency                         -0.004186
global_aoa * global_letters_count                     -0.014129
global_aoa * global_orthographic_density              -0.017032
global_aoa * global_synonyms_count                     0.013159
global_clustering * global_frequency                  -0.031216
global_clustering * global_letters_count              -0.026900
global_clustering * global_orthographic_density       -0.054188
global_clustering * global_synonyms_count              0.013588
global_frequency * global_letters_count                0.002127
global_frequency * global_orthographic_density        -0.001315
global_frequency * global_synonyms_count               0.006692
global_letters_count * global_orthographic_density    -0.004440
global_letters_count * global_synonyms_count           0.020125
global_orthographic_density * global_synonyms_count    0.093630
dtype: float64

Regressing global synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.00688698929330922

intercept                   0.371851
rel_aoa                    -0.004706
rel_clustering             -0.005353
rel_frequency               0.001551
rel_letters_count           0.003025
rel_orthographic_density    0.023131
rel_synonyms_count          0.083520
dtype: float64

Regressing global synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.020760430043277767

intercept                                        0.392315
rel_aoa                                         -0.024388
rel_clustering                                  -0.084479
rel_frequency                                    0.014669
rel_letters_count                                0.009902
rel_orthographic_density                         0.038145
rel_synonyms_count                               0.101499
rel_aoa * rel_clustering                        -0.019867
rel_aoa * rel_frequency                         -0.012715
rel_aoa * rel_letters_count                     -0.008540
rel_aoa * rel_orthographic_density              -0.011794
rel_aoa * rel_synonyms_count                     0.013197
rel_clustering * rel_frequency                  -0.010998
rel_clustering * rel_letters_count              -0.002640
rel_clustering * rel_orthographic_density       -0.073921
rel_clustering * rel_synonyms_count              0.026623
rel_frequency * rel_letters_count               -0.000152
rel_frequency * rel_orthographic_density        -0.003813
rel_frequency * rel_synonyms_count              -0.004956
rel_letters_count * rel_orthographic_density    -0.000543
rel_letters_count * rel_synonyms_count           0.007924
rel_orthographic_density * rel_synonyms_count    0.074757
dtype: float64

Regressing rel synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.05305257593515189

intercept                   0.039229
rel_aoa                    -0.016603
rel_clustering              0.040051
rel_frequency               0.005715
rel_letters_count           0.002555
rel_orthographic_density   -0.000950
rel_synonyms_count          0.228901
dtype: float64

Regressing rel synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.06879047903321789

intercept                                        0.066218
rel_aoa                                         -0.028133
rel_clustering                                  -0.055101
rel_frequency                                    0.021958
rel_letters_count                                0.008348
rel_orthographic_density                         0.030000
rel_synonyms_count                               0.311039
rel_aoa * rel_clustering                        -0.000557
rel_aoa * rel_frequency                         -0.009396
rel_aoa * rel_letters_count                     -0.010047
rel_aoa * rel_orthographic_density              -0.009628
rel_aoa * rel_synonyms_count                    -0.007990
rel_clustering * rel_frequency                  -0.022331
rel_clustering * rel_letters_count              -0.004683
rel_clustering * rel_orthographic_density       -0.052538
rel_clustering * rel_synonyms_count              0.007439
rel_frequency * rel_letters_count               -0.000165
rel_frequency * rel_orthographic_density        -0.000761
rel_frequency * rel_synonyms_count               0.011458
rel_letters_count * rel_orthographic_density    -0.004722
rel_letters_count * rel_synonyms_count           0.017081
rel_orthographic_density * rel_synonyms_count    0.089731
dtype: float64

Regressing global synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.020676502671740926

intercept                      1.875480
global_aoa                    -0.000451
global_clustering              0.124590
global_frequency              -0.020375
global_letters_count          -0.055800
global_orthographic_density   -0.132494
global_synonyms_count          0.066629
rel_aoa                       -0.007486
rel_clustering                -0.105746
rel_frequency                  0.020714
rel_letters_count              0.051026
rel_orthographic_density       0.140675
rel_synonyms_count             0.023477
dtype: float64

Regressing global synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07523328537541141

intercept                                                 6.396394
global_aoa                                                0.202293
global_clustering                                         1.343485
global_frequency                                         -0.429871
global_letters_count                                      0.073519
global_orthographic_density                               0.171681
global_synonyms_count                                     0.369757
rel_aoa                                                  -0.427222
rel_clustering                                           -1.047253
rel_frequency                                            -0.069079
rel_letters_count                                        -0.258513
rel_orthographic_density                                  0.103432
rel_synonyms_count                                       -1.817931
global_aoa * global_clustering                            0.014519
global_aoa * global_frequency                             0.018345
global_aoa * global_letters_count                        -0.020743
global_aoa * global_orthographic_density                 -0.120928
global_aoa * global_synonyms_count                        0.030222
global_aoa * rel_aoa                                     -0.002426
global_aoa * rel_clustering                              -0.019131
global_aoa * rel_frequency                               -0.030394
global_aoa * rel_letters_count                            0.010964
global_aoa * rel_orthographic_density                     0.111872
global_aoa * rel_synonyms_count                           0.021856
global_clustering * global_frequency                     -0.084406
global_clustering * global_letters_count                 -0.047349
global_clustering * global_orthographic_density          -0.077761
global_clustering * global_synonyms_count                 0.140869
global_clustering * rel_aoa                              -0.032933
global_clustering * rel_clustering                        0.017868
global_clustering * rel_frequency                         0.011574
global_clustering * rel_letters_count                    -0.015542
global_clustering * rel_orthographic_density              0.105575
global_clustering * rel_synonyms_count                   -0.132480
global_frequency * global_letters_count                  -0.030417
global_frequency * global_orthographic_density           -0.032990
global_frequency * global_synonyms_count                  0.051130
global_frequency * rel_aoa                               -0.004871
global_frequency * rel_clustering                         0.061978
global_frequency * rel_frequency                          0.001047
global_frequency * rel_letters_count                      0.015590
global_frequency * rel_orthographic_density               0.036129
global_frequency * rel_synonyms_count                     0.009614
global_letters_count * global_orthographic_density        0.066754
global_letters_count * global_synonyms_count              0.032027
global_letters_count * rel_aoa                            0.022705
global_letters_count * rel_clustering                     0.043632
global_letters_count * rel_frequency                      0.056216
global_letters_count * rel_letters_count                 -0.005734
global_letters_count * rel_orthographic_density          -0.062279
global_letters_count * rel_synonyms_count                 0.049392
global_orthographic_density * global_synonyms_count      -0.007661
global_orthographic_density * rel_aoa                     0.089372
global_orthographic_density * rel_clustering              0.090247
global_orthographic_density * rel_frequency               0.030707
global_orthographic_density * rel_letters_count          -0.021972
global_orthographic_density * rel_orthographic_density   -0.032885
global_orthographic_density * rel_synonyms_count          0.152174
global_synonyms_count * rel_aoa                          -0.024356
global_synonyms_count * rel_clustering                   -0.021577
global_synonyms_count * rel_frequency                     0.034037
global_synonyms_count * rel_letters_count                -0.075916
global_synonyms_count * rel_orthographic_density          0.032550
global_synonyms_count * rel_synonyms_count                0.073489
rel_aoa * rel_clustering                                  0.019741
rel_aoa * rel_frequency                                   0.004446
rel_aoa * rel_letters_count                              -0.021745
rel_aoa * rel_orthographic_density                       -0.100056
rel_aoa * rel_synonyms_count                             -0.021386
rel_clustering * rel_frequency                           -0.004503
rel_clustering * rel_letters_count                        0.012145
rel_clustering * rel_orthographic_density                -0.172326
rel_clustering * rel_synonyms_count                       0.052853
rel_frequency * rel_letters_count                        -0.034528
rel_frequency * rel_orthographic_density                 -0.041495
rel_frequency * rel_synonyms_count                       -0.087200
rel_letters_count * rel_orthographic_density             -0.007312
rel_letters_count * rel_synonyms_count                    0.028487
rel_orthographic_density * rel_synonyms_count            -0.039081
dtype: float64

Regressing rel synonyms_count with 1096 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.12988695530914085

intercept                      1.305677
global_aoa                     0.003381
global_clustering              0.076463
global_frequency              -0.031230
global_letters_count          -0.038188
global_orthographic_density   -0.046916
global_synonyms_count         -0.593880
rel_aoa                       -0.013369
rel_clustering                -0.061095
rel_frequency                  0.028955
rel_letters_count              0.033709
rel_orthographic_density       0.051752
rel_synonyms_count             0.794122
dtype: float64

Regressing rel synonyms_count with 1096 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.18356355602772723

intercept                                                 3.490825
global_aoa                                                0.101985
global_clustering                                         0.886106
global_frequency                                         -0.357059
global_letters_count                                      0.219466
global_orthographic_density                               0.936720
global_synonyms_count                                     0.012578
rel_aoa                                                  -0.090790
rel_clustering                                           -0.641498
rel_frequency                                            -0.027317
rel_letters_count                                        -0.494223
rel_orthographic_density                                 -0.777737
rel_synonyms_count                                       -1.657917
global_aoa * global_clustering                            0.005774
global_aoa * global_frequency                             0.019754
global_aoa * global_letters_count                        -0.017252
global_aoa * global_orthographic_density                 -0.110593
global_aoa * global_synonyms_count                        0.042243
global_aoa * rel_aoa                                     -0.003010
global_aoa * rel_clustering                              -0.004974
global_aoa * rel_frequency                               -0.024705
global_aoa * rel_letters_count                            0.011600
global_aoa * rel_orthographic_density                     0.101082
global_aoa * rel_synonyms_count                           0.015460
global_clustering * global_frequency                     -0.066882
global_clustering * global_letters_count                 -0.022119
global_clustering * global_orthographic_density           0.001978
global_clustering * global_synonyms_count                 0.086965
global_clustering * rel_aoa                              -0.014054
global_clustering * rel_clustering                        0.015396
global_clustering * rel_frequency                         0.002619
global_clustering * rel_letters_count                    -0.031634
global_clustering * rel_orthographic_density              0.025736
global_clustering * rel_synonyms_count                   -0.074109
global_frequency * global_letters_count                  -0.025183
global_frequency * global_orthographic_density           -0.038279
global_frequency * global_synonyms_count                  0.047335
global_frequency * rel_aoa                               -0.011458
global_frequency * rel_clustering                         0.056471
global_frequency * rel_frequency                          0.002679
global_frequency * rel_letters_count                      0.021683
global_frequency * rel_orthographic_density               0.053914
global_frequency * rel_synonyms_count                     0.036953
global_letters_count * global_orthographic_density        0.033478
global_letters_count * global_synonyms_count             -0.024558
global_letters_count * rel_aoa                            0.010682
global_letters_count * rel_clustering                     0.009732
global_letters_count * rel_frequency                      0.037827
global_letters_count * rel_letters_count                 -0.003667
global_letters_count * rel_orthographic_density          -0.030067
global_letters_count * rel_synonyms_count                 0.122602
global_orthographic_density * global_synonyms_count      -0.232179
global_orthographic_density * rel_aoa                     0.050591
global_orthographic_density * rel_clustering             -0.040531
global_orthographic_density * rel_frequency               0.012231
global_orthographic_density * rel_letters_count           0.000273
global_orthographic_density * rel_orthographic_density   -0.041626
global_orthographic_density * rel_synonyms_count          0.368429
global_synonyms_count * rel_aoa                          -0.060521
global_synonyms_count * rel_clustering                    0.042879
global_synonyms_count * rel_frequency                     0.013568
global_synonyms_count * rel_letters_count                -0.045644
global_synonyms_count * rel_orthographic_density          0.187468
global_synonyms_count * rel_synonyms_count                0.074515
rel_aoa * rel_clustering                                  0.002701
rel_aoa * rel_frequency                                   0.006405
rel_aoa * rel_letters_count                              -0.013887
rel_aoa * rel_orthographic_density                       -0.061341
rel_aoa * rel_synonyms_count                              0.000802
rel_clustering * rel_frequency                           -0.010502
rel_clustering * rel_letters_count                        0.032038
rel_clustering * rel_orthographic_density                -0.042051
rel_clustering * rel_synonyms_count                      -0.028457
rel_frequency * rel_letters_count                        -0.028058
rel_frequency * rel_orthographic_density                 -0.030409
rel_frequency * rel_synonyms_count                       -0.081114
rel_letters_count * rel_orthographic_density             -0.028024
rel_letters_count * rel_synonyms_count                   -0.016504
rel_orthographic_density * rel_synonyms_count            -0.190775
dtype: float64
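
For reference, here is a minimal sketch of how a regression of this shape could be set up: an intercept, the raw features, and optionally one product term for every pair of distinct features (matching the `a * b` labels above). The code that produced this output is not shown in this section, so this is an assumption about its structure, not a copy of it. The helper names `design_matrix` and `fit_ols` are hypothetical, the fit uses plain NumPy least squares, and the data is synthetic.

```python
from itertools import combinations

import numpy as np


def design_matrix(X, names, interactions=False):
    """Return (matrix, labels): intercept, raw features, and optionally
    the product of every pair of distinct features."""
    X = np.asarray(X, dtype=float)
    columns = [np.ones(len(X))] + [X[:, i] for i in range(X.shape[1])]
    labels = ['intercept'] + list(names)
    if interactions:
        for i, j in combinations(range(X.shape[1]), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    return np.column_stack(columns), labels


def fit_ols(X, y, names, interactions=False):
    """Ordinary least squares; returns (in-sample R^2, {label: coefficient})."""
    A, labels = design_matrix(X, names, interactions)
    y = np.asarray(y, dtype=float)
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coefs
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return r2, dict(zip(labels, coefs))


# Synthetic data (not from the notebook) with one genuine interaction effect.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2]
     + rng.normal(scale=0.1, size=200))
names = ['a', 'b', 'c']  # hypothetical feature names

r2_plain, coefs_plain = fit_ols(X, y, names)
r2_inter, coefs_inter = fit_ols(X, y, names, interactions=True)
print('R^2 = {} (no interactions)'.format(r2_plain))
print('R^2 = {} (with interactions)'.format(r2_inter))
```

With 12 features this layout gives 1 + 12 + 66 = 79 terms, which is the number of rows in the larger coefficient listings. Since the model without interactions is nested in the one with interactions, the in-sample R^2 of the latter can never be lower.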

----------------------------------------------------------------------
Regressing global orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05587156649970437

intercept                      2.210546
global_aoa                    -0.050901
global_clustering              0.055091
global_frequency              -0.019447
global_letters_count          -0.035323
global_orthographic_density    0.117976
global_synonyms_count          0.052144
dtype: float64

Regressing global orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06742521661849987

intercept                                              8.157536
global_aoa                                            -0.352689
global_clustering                                      0.801636
global_frequency                                      -0.248298
global_letters_count                                  -0.549615
global_orthographic_density                           -0.219409
global_synonyms_count                                  0.164724
global_aoa * global_clustering                        -0.023387
global_aoa * global_frequency                          0.003332
global_aoa * global_letters_count                      0.015189
global_aoa * global_orthographic_density               0.031642
global_aoa * global_synonyms_count                     0.016128
global_clustering * global_frequency                  -0.025967
global_clustering * global_letters_count              -0.057311
global_clustering * global_orthographic_density       -0.038137
global_clustering * global_synonyms_count              0.120312
global_frequency * global_letters_count                0.009701
global_frequency * global_orthographic_density        -0.001038
global_frequency * global_synonyms_count               0.023762
global_letters_count * global_orthographic_density    -0.021316
global_letters_count * global_synonyms_count           0.017387
global_orthographic_density * global_synonyms_count    0.122201
dtype: float64

Regressing rel orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.043011852957886676

intercept                     -0.062364
global_aoa                    -0.036698
global_clustering              0.057854
global_frequency              -0.016037
global_letters_count          -0.045770
global_orthographic_density    0.076685
global_synonyms_count          0.051837
dtype: float64

Regressing rel orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05313621517155098

intercept                                              6.064244
global_aoa                                            -0.162395
global_clustering                                      1.008548
global_frequency                                      -0.190975
global_letters_count                                  -0.648406
global_orthographic_density                           -0.398415
global_synonyms_count                                 -0.282678
global_aoa * global_clustering                        -0.004369
global_aoa * global_frequency                         -0.002409
global_aoa * global_letters_count                      0.013351
global_aoa * global_orthographic_density               0.030749
global_aoa * global_synonyms_count                     0.017029
global_clustering * global_frequency                  -0.034835
global_clustering * global_letters_count              -0.090615
global_clustering * global_orthographic_density       -0.056997
global_clustering * global_synonyms_count              0.035467
global_frequency * global_letters_count               -0.001735
global_frequency * global_orthographic_density        -0.002765
global_frequency * global_synonyms_count               0.014128
global_letters_count * global_orthographic_density    -0.009366
global_letters_count * global_synonyms_count           0.030185
global_orthographic_density * global_synonyms_count    0.087075
dtype: float64

Regressing global orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.035217158353819

intercept                   1.500615
rel_aoa                     0.018355
rel_clustering             -0.021130
rel_frequency              -0.025446
rel_letters_count          -0.023182
rel_orthographic_density    0.174847
rel_synonyms_count          0.110603
dtype: float64

Regressing global orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05167633673783811

intercept                                        1.519483
rel_aoa                                          0.087337
rel_clustering                                  -0.042923
rel_frequency                                   -0.039625
rel_letters_count                               -0.013789
rel_orthographic_density                         0.293988
rel_synonyms_count                               0.356269
rel_aoa * rel_clustering                         0.029791
rel_aoa * rel_frequency                          0.030615
rel_aoa * rel_letters_count                      0.009093
rel_aoa * rel_orthographic_density               0.025487
rel_aoa * rel_synonyms_count                     0.061400
rel_clustering * rel_frequency                  -0.011934
rel_clustering * rel_letters_count              -0.007352
rel_clustering * rel_orthographic_density        0.019589
rel_clustering * rel_synonyms_count              0.137228
rel_frequency * rel_letters_count                0.015670
rel_frequency * rel_orthographic_density         0.039971
rel_frequency * rel_synonyms_count               0.073592
rel_letters_count * rel_orthographic_density    -0.021428
rel_letters_count * rel_synonyms_count          -0.011463
rel_orthographic_density * rel_synonyms_count    0.097006
dtype: float64

Regressing rel orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06481528259007474

intercept                  -0.648527
rel_aoa                     0.016908
rel_clustering              0.011158
rel_frequency               0.016526
rel_letters_count          -0.025605
rel_orthographic_density    0.222566
rel_synonyms_count          0.082167
dtype: float64

Regressing rel orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07569178670848253

intercept                                       -0.600305
rel_aoa                                          0.060254
rel_clustering                                  -0.009356
rel_frequency                                    0.033704
rel_letters_count                               -0.020026
rel_orthographic_density                         0.327652
rel_synonyms_count                               0.234154
rel_aoa * rel_clustering                         0.025766
rel_aoa * rel_frequency                          0.014188
rel_aoa * rel_letters_count                      0.010556
rel_aoa * rel_orthographic_density               0.040983
rel_aoa * rel_synonyms_count                     0.063014
rel_clustering * rel_frequency                  -0.021683
rel_clustering * rel_letters_count              -0.034688
rel_clustering * rel_orthographic_density       -0.015961
rel_clustering * rel_synonyms_count              0.090380
rel_frequency * rel_letters_count                0.006309
rel_frequency * rel_orthographic_density         0.037648
rel_frequency * rel_synonyms_count               0.049818
rel_letters_count * rel_orthographic_density    -0.014689
rel_letters_count * rel_synonyms_count          -0.023299
rel_orthographic_density * rel_synonyms_count    0.032929
dtype: float64

Regressing global orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07725144699457365

intercept                      3.675987
global_aoa                    -0.110406
global_clustering              0.149524
global_frequency              -0.058749
global_letters_count          -0.097367
global_orthographic_density    0.203406
global_synonyms_count         -0.134108
rel_aoa                        0.095571
rel_clustering                -0.107619
rel_frequency                  0.046324
rel_letters_count              0.063074
rel_orthographic_density      -0.113287
rel_synonyms_count             0.222970
dtype: float64

Regressing global orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13098968554671608

intercept                                                 5.827708
global_aoa                                               -0.965956
global_clustering                                         0.631623
global_frequency                                         -0.565504
global_letters_count                                      0.416032
global_orthographic_density                               2.812111
global_synonyms_count                                     2.409368
rel_aoa                                                   0.249666
rel_clustering                                            1.452906
rel_frequency                                             0.492836
rel_letters_count                                        -0.547276
rel_orthographic_density                                 -3.542108
rel_synonyms_count                                        0.773518
global_aoa * global_clustering                           -0.089446
global_aoa * global_frequency                             0.004308
global_aoa * global_letters_count                         0.035638
global_aoa * global_orthographic_density                  0.037780
global_aoa * global_synonyms_count                        0.066078
global_aoa * rel_aoa                                     -0.005402
global_aoa * rel_clustering                               0.063494
global_aoa * rel_frequency                               -0.008387
global_aoa * rel_letters_count                            0.009315
global_aoa * rel_orthographic_density                     0.036402
global_aoa * rel_synonyms_count                          -0.118041
global_clustering * global_frequency                     -0.098868
global_clustering * global_letters_count                  0.011552
global_clustering * global_orthographic_density           0.429018
global_clustering * global_synonyms_count                 0.234287
global_clustering * rel_aoa                              -0.027177
global_clustering * rel_clustering                        0.014042
global_clustering * rel_frequency                         0.018006
global_clustering * rel_letters_count                     0.070779
global_clustering * rel_orthographic_density             -0.252754
global_clustering * rel_synonyms_count                   -0.001320
global_frequency * global_letters_count                  -0.027943
global_frequency * global_orthographic_density            0.003360
global_frequency * global_synonyms_count                 -0.118404
global_frequency * rel_aoa                               -0.001620
global_frequency * rel_clustering                         0.039506
global_frequency * rel_frequency                          0.003210
global_frequency * rel_letters_count                      0.075974
global_frequency * rel_orthographic_density               0.075971
global_frequency * rel_synonyms_count                     0.000370
global_letters_count * global_orthographic_density       -0.080984
global_letters_count * global_synonyms_count             -0.324451
global_letters_count * rel_aoa                           -0.017721
global_letters_count * rel_clustering                    -0.229515
global_letters_count * rel_frequency                     -0.038993
global_letters_count * rel_letters_count                 -0.008756
global_letters_count * rel_orthographic_density           0.207214
global_letters_count * rel_synonyms_count                 0.295161
global_orthographic_density * global_synonyms_count       0.506317
global_orthographic_density * rel_aoa                    -0.064509
global_orthographic_density * rel_clustering             -0.530022
global_orthographic_density * rel_frequency              -0.101259
global_orthographic_density * rel_letters_count           0.022723
global_orthographic_density * rel_orthographic_density    0.093918
global_orthographic_density * rel_synonyms_count         -0.384001
global_synonyms_count * rel_aoa                          -0.071934
global_synonyms_count * rel_clustering                   -0.287331
global_synonyms_count * rel_frequency                     0.138351
global_synonyms_count * rel_letters_count                 0.357069
global_synonyms_count * rel_orthographic_density         -0.502371
global_synonyms_count * rel_synonyms_count               -0.048684
rel_aoa * rel_clustering                                  0.065120
rel_aoa * rel_frequency                                   0.026064
rel_aoa * rel_letters_count                               0.010975
rel_aoa * rel_orthographic_density                        0.053874
rel_aoa * rel_synonyms_count                              0.156253
rel_clustering * rel_frequency                            0.036682
rel_clustering * rel_letters_count                        0.072211
rel_clustering * rel_orthographic_density                 0.263363
rel_clustering * rel_synonyms_count                       0.218822
rel_frequency * rel_letters_count                        -0.003351
rel_frequency * rel_orthographic_density                  0.027635
rel_frequency * rel_synonyms_count                        0.012413
rel_letters_count * rel_orthographic_density             -0.133272
rel_letters_count * rel_synonyms_count                   -0.343209
rel_orthographic_density * rel_synonyms_count             0.444972
dtype: float64

Regressing rel orthographic_density with 913 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.09960699762523006

intercept                      2.486065
global_aoa                    -0.088365
global_clustering              0.121385
global_frequency              -0.042670
global_letters_count          -0.066533
global_orthographic_density   -0.510724
global_synonyms_count         -0.072586
rel_aoa                        0.072911
rel_clustering                -0.071694
rel_frequency                  0.043592
rel_letters_count              0.035524
rel_orthographic_density       0.668170
rel_synonyms_count             0.157883
dtype: float64

Regressing rel orthographic_density with 913 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15125169697148866

intercept                                                 6.492047
global_aoa                                               -0.462360
global_clustering                                         1.103992
global_frequency                                         -0.736879
global_letters_count                                     -0.053744
global_orthographic_density                               2.536639
global_synonyms_count                                     1.066743
rel_aoa                                                  -0.077452
rel_clustering                                            0.766823
rel_frequency                                             0.442770
rel_letters_count                                        -0.190153
rel_orthographic_density                                 -2.965736
rel_synonyms_count                                        1.600587
global_aoa * global_clustering                           -0.030228
global_aoa * global_frequency                             0.006105
global_aoa * global_letters_count                         0.037069
global_aoa * global_orthographic_density                 -0.015011
global_aoa * global_synonyms_count                        0.057041
global_aoa * rel_aoa                                     -0.001296
global_aoa * rel_clustering                               0.012220
global_aoa * rel_frequency                               -0.004394
global_aoa * rel_letters_count                            0.007181
global_aoa * rel_orthographic_density                     0.096169
global_aoa * rel_synonyms_count                          -0.098198
global_clustering * global_frequency                     -0.141498
global_clustering * global_letters_count                 -0.026288
global_clustering * global_orthographic_density           0.393282
global_clustering * global_synonyms_count                 0.071065
global_clustering * rel_aoa                              -0.054271
global_clustering * rel_clustering                        0.003829
global_clustering * rel_frequency                         0.045205
global_clustering * rel_letters_count                     0.092410
global_clustering * rel_orthographic_density             -0.142844
global_clustering * rel_synonyms_count                    0.179747
global_frequency * global_letters_count                  -0.016559
global_frequency * global_orthographic_density           -0.047206
global_frequency * global_synonyms_count                 -0.087037
global_frequency * rel_aoa                                0.001011
global_frequency * rel_clustering                         0.060101
global_frequency * rel_frequency                          0.004226
global_frequency * rel_letters_count                      0.064410
global_frequency * rel_orthographic_density               0.136526
global_frequency * rel_synonyms_count                     0.002234
global_letters_count * global_orthographic_density       -0.007700
global_letters_count * global_synonyms_count             -0.243768
global_letters_count * rel_aoa                           -0.011639
global_letters_count * rel_clustering                    -0.162940
global_letters_count * rel_frequency                     -0.028547
global_letters_count * rel_letters_count                 -0.007788
global_letters_count * rel_orthographic_density           0.160294
global_letters_count * rel_synonyms_count                 0.247152
global_orthographic_density * global_synonyms_count       0.416810
global_orthographic_density * rel_aoa                    -0.045493
global_orthographic_density * rel_clustering             -0.383584
global_orthographic_density * rel_frequency              -0.031660
global_orthographic_density * rel_letters_count          -0.040307
global_orthographic_density * rel_orthographic_density    0.130352
global_orthographic_density * rel_synonyms_count         -0.279632
global_synonyms_count * rel_aoa                          -0.086590
global_synonyms_count * rel_clustering                   -0.112324
global_synonyms_count * rel_frequency                     0.119992
global_synonyms_count * rel_letters_count                 0.269394
global_synonyms_count * rel_orthographic_density         -0.443549
global_synonyms_count * rel_synonyms_count               -0.043895
rel_aoa * rel_clustering                                  0.074018
rel_aoa * rel_frequency                                   0.013316
rel_aoa * rel_letters_count                              -0.000344
rel_aoa * rel_orthographic_density                        0.024523
rel_aoa * rel_synonyms_count                              0.150160
rel_clustering * rel_frequency                            0.017430
rel_clustering * rel_letters_count                        0.020141
rel_clustering * rel_orthographic_density                 0.049859
rel_clustering * rel_synonyms_count                      -0.000557
rel_frequency * rel_letters_count                        -0.015431
rel_frequency * rel_orthographic_density                 -0.037484
rel_frequency * rel_synonyms_count                       -0.003520
rel_letters_count * rel_orthographic_density             -0.092696
rel_letters_count * rel_synonyms_count                   -0.277996
rel_orthographic_density * rel_synonyms_count             0.368819
dtype: float64
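
The R^2 values printed above always rise when interactions are added, which is expected for nested models and does not by itself show that the interactions matter. One common way to account for the extra terms is the adjusted R^2. The sketch below applies the standard formula to two values copied from the output above. It assumes the "913 measures" are the number of rows used in the fit and that the predictor counts are 12 main effects and 66 pairwise interactions. The helper name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 for a model with an intercept plus n_predictors terms."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Values copied from the "rel orthographic_density" regressions on
# global + rel features above; n = 913 is assumed to be the sample size.
n = 913
plain = adjusted_r2(0.09960699762523006, n, 12)
inter = adjusted_r2(0.15125169697148866, n, 12 + 66)
print('adjusted R^2 = {} (no interactions)'.format(plain))
print('adjusted R^2 = {} (with interactions)'.format(inter))
```

Under these assumptions the adjusted value for the interaction model comes out lower than for the plain one (about 0.07 versus about 0.09), so the apparent gain in raw R^2 should be read with caution.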

	aoa	betweenness	clustering	degree	frequency	letters_count	orthographic_density	pagerank	phonemes_count	phonological_density	syllables_count	synonyms_count
Component-0	-0.446262	0.276224	-0.087837	0.238706	0.225108	-0.445901	0.224235	0.279202	-0.420940	0.282661	-0.158210	0.000616
Component-1	0.328045	-0.409550	0.152141	-0.295563	-0.258442	-0.425448	0.157897	-0.306974	-0.418579	0.213053	-0.163004	0.004164
Component-2	0.735232	0.252298	-0.150948	0.095754	0.581154	-0.095967	0.002559	0.032680	-0.034140	0.089080	0.003244	-0.081795
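
Each row of the table above reads as one principal axis expressed in feature space, and the rows appear to have unit length. The notebook imports `PCA` from scikit-learn, whose fitted `components_` attribute has this rows-by-features layout. The sketch below swaps in a plain NumPy SVD to show where such loadings come from. The helper `pca_loadings`, the feature names, and the data are all hypothetical, and signs of a component may be flipped relative to what a library returns.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Return (axes, variances): rows of `axes` are unit-length principal
    axes, columns correspond to the input features."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes,
    # ordered by decreasing explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / (len(X) - 1)
    return vt[:n_components], explained[:n_components]


# Synthetic data (not the notebook's): f0 and f1 are strongly
# anti-correlated, f2 is independent noise.
rng = np.random.RandomState(0)
latent = rng.normal(size=(300, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(300, 1)),
               -latent + 0.1 * rng.normal(size=(300, 1)),
               rng.normal(size=(300, 1))])
features = ['f0', 'f1', 'f2']  # hypothetical feature names

components, variances = pca_loadings(X, n_components=2)
for k, row in enumerate(components):
    print('Component-{}'.format(k), dict(zip(features, row.round(3))))
```

Loadings of opposite sign within a row (as for `aoa` and `frequency` in Component-0 above) indicate features that move in opposite directions along that component.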

	aoa	frequency	letters_count
Component-0	-0.735242	0.386221	-0.557004
Component-1	0.422901	-0.380811	-0.822276