Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

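First, build our data: for each substitution in the model and each feature, record the source and destination feature values (absolute and sentence-relative), together with the corresponding $\mathcal{H}_0$ and $\mathcal{H}_{00}$ averages.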



In [3]:

    
model = Model(time=Time.discrete, source=Source.all, past=Past.last_bin, durl=Durl.all, max_distance=2)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data

Got 39875 substitutions for model Model(time=Time.discrete, source=Source.all, past=Past.last_bin, durl=Durl.all, max_distance=2)

100% (39875 of 39875) |####################| Elapsed Time: 0:09:13 Time: 0:09:13

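Average the data in two stages: first over substitutions that share the same destination quote, occurrence, word position and feature, then over each cluster, so that heavily repeated substitutions do not artificially narrow the confidence intervals. Then crop the AoA data so that the CIs stay acceptable.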



In [4]:

    
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    [['source', 'source_rel', 'destination', 'destination_rel',
      'h0', 'h0_rel', 'h0n', 'h0n_rel']].mean()
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan
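To make the two-stage averaging concrete, here is a minimal sketch on a hypothetical toy table (column names mirror original_variations, values are made up). A substituted slot recorded three times is first collapsed to one row, so it weighs as much as any other slot when the cluster average is taken.

```python
import pandas as pd

# Hypothetical toy data: cluster 1 has one slot (destination 10, position 3)
# recorded three times and another slot recorded once; cluster 2 has one slot.
toy = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 1, 2],
    'destination_id': [10, 10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0, 0],
    'position':       [3, 3, 3, 5, 2],
    'feature':        ['aoa'] * 5,
    'source':         [4.0, 6.0, 5.0, 8.0, 6.0],
    'destination':    [5.0, 7.0, 6.0, 7.0, 9.0],
})

# Stage 1: one row per (destination, occurrence, position, feature) slot.
per_slot = (toy
            .groupby(['destination_id', 'occurrence', 'position', 'feature'],
                     as_index=False)
            .mean())

# Stage 2: one row per (cluster, feature).
per_cluster = (per_slot
               .groupby(['cluster_id', 'feature'], as_index=False)
               [['source', 'destination']]
               .mean())
per_cluster['variation'] = per_cluster['destination'] - per_cluster['source']

# Without stage 1, the repeated slot dominates cluster 1's average.
naive_cluster1_source = toy.loc[toy.cluster_id == 1, 'source'].mean()

print(per_cluster)
print(naive_cluster1_source)  # 5.75, versus 6.5 with two-stage averaging
```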

Order features by their display name (the docstring of each transformed feature).



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot for each feature

For a feature $\phi$, plot:

$\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
$\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution
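As a minimal sketch of how these curves are estimated from binned data (toy values, column names matching the variations table): $\nu_{\phi}$ is the mean destination value in each source bin, and $\nu_{\phi}^0$, $\nu_{\phi}^{00}$ are the means of the corresponding null columns in the same bins.

```python
import pandas as pd

# Hypothetical toy substitutions w -> w' for a single feature phi.
subs = pd.DataFrame({
    'source':      [1.0, 1.5, 2.5, 3.0, 3.5, 4.0],  # phi(w)
    'destination': [2.0, 2.0, 2.5, 3.5, 3.0, 3.0],  # phi(w')
    'h0':          [3.0] * 6,                       # average phi over all words
    'h0n':         [2.0, 2.0, 3.0, 3.0, 3.5, 3.5],  # average phi over synonyms of w
})

# Two fixed-width bins over phi(w), closed on the left as in the notebook.
bins = pd.cut(subs['source'], 2, right=False, labels=False)

# nu_phi, nu_phi^0 and nu_phi^00 for each source bin.
curves = subs.groupby(bins)[['destination', 'h0', 'h0n']].mean()
curves.columns = ['nu', 'nu_0', 'nu_00']
print(curves)
```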

We also plot these values relative to the sentence, i.e. after subtracting the sentence's median feature value (the code passes sentence_relative='median'):

$\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
$\nu_{\phi, r}^0$ (the average feature value minus the sentence median), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi, r}^{00}$ (the average feature value for synonyms of the source word minus the sentence median), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution
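A minimal sketch of a sentence-relative value, assuming that sentence_relative='median' subtracts the median feature value of the sentence's words and that words without a value (NaN) are ignored. The helper name sentence_relative is hypothetical and is not the brainscopypaste API.

```python
import numpy as np

def sentence_relative(word_value, sentence_values):
    """Return phi(word) minus the median phi of its sentence.

    Assumption: words with no feature value (NaN) are skipped.
    """
    return word_value - np.nanmedian(sentence_values)

# Hypothetical substitution: the word with phi = 9.0 is replaced by one with phi = 4.0.
source_sentence = [3.0, 5.0, np.nan, 9.0]
destination_sentence = [3.0, 5.0, np.nan, 4.0]

source_rel = sentence_relative(9.0, source_sentence)            # 9 - 5 = 4
destination_rel = sentence_relative(4.0, destination_sentence)  # 4 - 4 = 0
print(source_rel, destination_rel)
```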

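We first plot global (absolute) feature values, then sentence-relative values; in each case we use fixed-width bins first, then quantile bins. If BIN_COUNT bins cannot be formed (e.g. when tied values make quantile edges collide), the plotting functions fall back to fewer bins.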
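Both binning modes, and the fallback to fewer bins, can be sketched as follows. bin_with_fallback is a hypothetical helper that mirrors the loop in plot_variation and plot_bias: pd.qcut raises a ValueError when tied values make quantile edges collide, so the number of bins is lowered until binning succeeds.

```python
import pandas as pd

def bin_with_fallback(x, max_bins, quantiles=False):
    """Bin x into at most max_bins bins; return (codes, edges, n_bins)."""
    for n_bins in range(max_bins, 0, -1):
        try:
            if quantiles:
                # Equal-count bins; raises ValueError on duplicate edges.
                codes, edges = pd.qcut(x, n_bins, labels=False, retbins=True)
            else:
                # Equal-width bins, closed on the left.
                codes, edges = pd.cut(x, n_bins, right=False, labels=False,
                                      retbins=True)
            return codes, edges, n_bins
        except ValueError:
            continue
    raise ValueError('could not bin the data')

# Heavily tied values, like a syllable count.
x = pd.Series([1] * 6 + [2] * 3 + [3] * 3)

_, _, n_fixed = bin_with_fallback(x, 4)
q_codes, q_edges, n_quantile = bin_with_fallback(x, 4, quantiles=True)
print(n_fixed, n_quantile, list(q_edges))
```

With these tied values, fixed-width binning keeps 4 bins while quantile binning falls back to 2, which is why a panel can show fewer than BIN_COUNT bins (as syllables_count does in one of the quantile-bin tables).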



In [6]:

    
def print_significance(name, bins, h0, h0n, values):
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = indices.sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            # Half-widths of the 95%, 99% and 99.9% t-based CIs of the
            # bin mean, using the standard error s / sqrt(n).
            for j, alpha in enumerate([.05, .01, .001]):
                cis[i, j] = stats.t.ppf(1 - alpha/2, n - 1) * s / np.sqrt(n)

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()
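The star rule in print_significance can be sketched on its own. significance_stars is a hypothetical helper: it compares a bin's mean with a single null value using two-sided t-based confidence intervals at the 95%, 99% and 99.9% levels, with the usual standard error of the mean $s/\sqrt{n}$. (The notebook compares against the bin mean of the null column rather than a scalar.)

```python
import numpy as np
from scipy import stats

def significance_stars(values, null_value, alphas=(.05, .01, .001)):
    """Return '*', '**', '***' or 'ns.' for mean(values) vs. null_value."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    level = 0
    for k, alpha in enumerate(alphas, start=1):
        # Half-width of the two-sided (1 - alpha) confidence interval.
        half_width = stats.t.ppf(1 - alpha / 2, n - 1) * sem
        if abs(mean - null_value) > half_width:
            level = k
    return '*' * level if level else 'ns.'

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
for null in [3.0, 0.0, -1.0, -10.0]:
    print(null, significance_stars(sample, null))
```

Because each stricter interval is wider, keeping the last level at which the null falls outside gives the same result as taking the maximum significant index, as the notebook does.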



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Widen the limits if H0 or H00 fall outside the range of bin middles.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    if h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)



In [8]:

    
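def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws=None):
    plot_kws = plot_kws or {}
    # 'height' was called 'size' before seaborn 0.9. dropna=True drops rows
    # with NaN (e.g. the cropped AoA values) before each facet is plotted.
    g = sb.FacetGrid(data=data[data[feature_field].isin(features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, height=3, dropna=True)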
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n))
    
    # Plot.
    scale = abs(h0s.mean())
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws=None):
    plot_kws = plot_kws or {}
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Fixed-width bins of global feature values

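For each feature $\phi$, we plot the variation upon substitution as explained above. The table printed for each feature tests, bin by bin, whether $\nu_{\phi}$ differs from $\nu_{\phi}^0$ (row H_0) and from $\nu_{\phi}^{00}$ (row H_00): *, ** and *** mean that the null value lies outside the 95%, 99% and 99.9% t-based confidence interval of $\nu_{\phi}$ in that bin, and ns. means it lies within the 95% interval.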



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | **  |
H_00 | *** | *** | *** | ns. |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | ns. | **  | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | *** | *** |
H_00 | *** | ns. | **  | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *   | *** | *** |
H_00 | ns. | *** | ns. | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | ns. | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | *** | ns. | **  | **  |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *** | *** |
H_00 | *** | ns. | ns. | *** |

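Then plot the measured bias $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature to see how features compare. To put all features on a common scale, each curve is divided by the absolute mean of $\nu_{\phi}^0$ over its bins, and the source bins are spread evenly over $[0, 1]$.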



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});
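The normalisation behind this overlay can be sketched as follows (hypothetical helper normalised_bias, toy numbers). As in plot_bias, the per-bin bias $\nu_{\phi} - \nu_{\phi}^{00}$ is divided by the absolute mean of $\nu_{\phi}^0$ across bins, and bin indices are mapped evenly onto $[0, 1]$ so that features with different ranges share one x axis.

```python
import numpy as np

def normalised_bias(values, h0ns, h0s):
    """Return (x, bias) as plotted by plot_bias, without the CI band."""
    values, h0ns, h0s = (np.asarray(a, dtype=float) for a in (values, h0ns, h0s))
    scale = abs(h0s.mean())                    # |mean of nu^0 over bins|
    x = np.linspace(0, 1, len(values))         # bin index mapped onto [0, 1]
    bias = (values - h0ns) / scale             # (nu - nu^00) / scale
    return x, bias

# Hypothetical per-bin values for one feature with BIN_COUNT = 4.
x, bias = normalised_bias(values=[2.0, 3.0, 4.0, 4.5],
                          h0ns=[3.0, 3.0, 3.5, 4.0],
                          h0s=[4.0, 4.0, 4.0, 4.0])
print(x, bias)
```

Note that the x axis then encodes bin rank rather than feature value, so the overlay compares the shape of the bias across features, not its location on each feature's own scale.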



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | ns. | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | **  |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | *** | ns. | **  | **  |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantile bins of global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *** |
H_00 | *** | ns. | ns. | **  |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |
------------------------
H_0  | *** | *** | *** |
H_00 | *** | *   | **  |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *   | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *** | **  | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | **  | **  |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | *** | *** |
H_00 | *** | *** | ns. | *** |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | ns. | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *** | *** |
H_00 | *** | ns. | ns. | *** |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | **  | **  |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *   | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | *** | *** |
H_00 | *** | *** | ns. | *** |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Fixed-width bins of sentence-relative feature values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | **  |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | ns. | *   | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *** | ns. | **  |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | **  | ns. | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | ns. | *   |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *   | *** | **  |
H_00 | *** | *** | *   | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *   | *** | *** | **  |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | **  | *** | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | ns. | *   |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | **  |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *   | *** | **  |
H_00 | *** | *** | *   | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantile bins of sentence-relative feature values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *   | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | ns. | *** | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | ns. | *** |
H_00 | *** | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | *   | ns. | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | ns. | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *** | *** |
H_00 | *** | **  | ns. | **  |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | ns. | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | ns. | *** | *** |
H_00 | *** | **  | ns. | *   |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | ns. | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | ns. | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | *** | *** |
H_00 | *** | **  | ns. | **  |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We would like to see how absolute and relative feature values interact. In particular, we want to know which effect wins out: cognitive bias, attraction to the sentence average, or attraction to the global feature average.

To do this, we bin source words by their (absolute, sentence-relative) feature values and, for each bin, plot the average direction (arrows) and strength (color) of the move to the destination word. In other words: given a word's absolute and relative feature values, where does it go in this (absolute, relative) space if it is substituted?

The interesting part of these plots is the attraction front, where the arrows converge. We are interested in:

its slope
its shape (e.g. several slope regimes?)
its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)
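The quantities behind each arrow can be sketched without plotting. displacement_grid is a hypothetical helper mirroring plot_stream: it bins sources on both axes with fixed-width bins and, per 2D bin, returns the mean displacement $(u, v)$ and the strength, i.e. the mean length of the individual displacements (not the length of the mean vector). Arrays are indexed [y, x], the layout plt.streamplot expects; as an added assumption, empty bins are left as NaN.

```python
import numpy as np
import pandas as pd

def displacement_grid(source, source_rel, dest, dest_rel, n_bins=4):
    """Mean (u, v) displacement and mean displacement length per 2D bin."""
    source, source_rel, dest, dest_rel = (
        np.asarray(a, dtype=float) for a in (source, source_rel, dest, dest_rel))
    x_codes = pd.cut(source, n_bins, right=False, labels=False)
    y_codes = pd.cut(source_rel, n_bins, right=False, labels=False)
    du = dest - source            # move along the absolute axis
    dv = dest_rel - source_rel    # move along the sentence-relative axis
    u = np.full((n_bins, n_bins), np.nan)
    v = np.full((n_bins, n_bins), np.nan)
    strength = np.full((n_bins, n_bins), np.nan)
    for xi in range(n_bins):
        for yi in range(n_bins):
            mask = (x_codes == xi) & (y_codes == yi)
            if mask.any():
                u[yi, xi] = du[mask].mean()
                v[yi, xi] = dv[mask].mean()
                strength[yi, xi] = np.hypot(du[mask], dv[mask]).mean()
    return u, v, strength

# Hypothetical substitutions, one per 2D bin with n_bins = 2.
u, v, strength = displacement_grid(source=[0.0, 1.0, 2.0, 3.0],
                                   source_rel=[0.0, 2.0, 0.0, 2.0],
                                   dest=[3.0, 1.0, 2.0, 0.0],
                                   dest_rel=[4.0, 2.0, 0.0, 2.0],
                                   n_bins=2)
print(u, v, strength, sep='\n')
```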

First, here is our plotting function. (Note that the arrow size is set to a value that looks huge in the notebook but gives normal sizes in the saved figures; this is probably a DPI scaling issue with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Bin absolute (x) and sentence-relative (y) source values into
    # equal-width bins; bin centres serve as the streamplot grid.
    bin_count = BIN_COUNT
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # For each grid cell, average the displacement (destination - source)
    # along both axes, and the mean length of that displacement. Arrays are
    # indexed [y, x] as streamplot expects; empty cells give NaN.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            cell = (x_bins == x) & (y_bins == y)
            du = dest[cell] - source[cell]
            dv = dest_rel[cell] - source_rel[cell]
            u_values[y, x] = du.mean()
            v_values[y, x] = dv.mean()
            strength[y, x] = np.sqrt(du ** 2 + dv ** 2).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])

Here are the stream plots for all features.



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features discussed in the paper.



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute PCA on feature variations (note: on variations, not on features directly), and show the evolution of the first three components upon substitution.

CAVEAT: the PCA is computed only on variations for which all features are defined. This greatly reduces the number of words included, and also the number of substitutions (see section 4.3 below for the actual numbers; the reduction is drastic). It also affects the computation of $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are computed using only words for which all features are defined. This, again, hugely reduces the number of words taken into account, and changes the values under the null hypotheses.
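To make the caveat concrete, here is a minimal sketch on made-up numbers (the toy table and the feature names f1 to f3 are hypothetical). It pivots long-format variations to one row per cluster, drops every row with a missing feature (as `dropna()` does in the cells below), and computes the principal axes with an SVD of the centred matrix, which is what scikit-learn's `PCA` does internally, up to the sign of each component.

```python
import numpy as np
import pandas as pd

# Toy long-format table: one variation per (cluster, feature); NaN where a
# feature is undefined for the words involved.
long = pd.DataFrame({
    'cluster_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'feature': ['f1', 'f2', 'f3'] * 4,
    'variation': [0.5, 1.0, -0.2,
                  np.nan, 0.3, 0.1,    # cluster 2 lacks f1
                  -1.0, -0.4, 0.6,
                  0.2, 0.8, 0.0],
})
wide = long.pivot(index='cluster_id', columns='feature', values='variation')
complete = wide.dropna()               # listwise deletion drops cluster 2

# PCA via SVD of the centred data matrix.
X = complete.values
Xc = X - X.mean(axis=0)
_, s, vt = np.linalg.svd(Xc, full_matrices=False)
components = vt                        # rows are unit-norm principal axes
# Share of total variance along each axis (same definition as sklearn's
# explained_variance_ratio_).
explained_variance_ratio = s ** 2 / np.sum(s ** 2)
```

With only three complete rows, at most two components can carry any variance: a small-sample effect of the same kind that listwise deletion causes at scale.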

4.1 On all the features

Compute the actual PCA.



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
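pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
# Keep only clusters where every feature is defined, with columns in the
# same order as pcafeatures.
pcavariations = pcavariations[list(pcafeatures)].dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the estimated number of components and their explained variance.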
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 10 components.

Those explain the following variance:
[ 0.55502294  0.17267444  0.07161956  0.06521569  0.03225433  0.02935426
  0.01982183  0.01778344  0.01528559  0.00879764]

We're plotting variation for the first 3 components:






    Out[30]:

                   aoa  betweenness  clustering     degree  frequency  letters_count
Component-0  -0.495983     0.215799   -0.076269   0.216337   0.225148      -0.459887
Component-1   0.374635    -0.405534    0.140260  -0.299004  -0.261974      -0.411493
Component-2   0.656900     0.628446   -0.103045   0.216350  -0.184906      -0.126580

             orthographic_density   pagerank  phonemes_count  phonological_density  syllables_count  synonyms_count
Component-0              0.192592   0.249676       -0.435075              0.270379        -0.175185        0.010494
Component-1              0.139908  -0.292163       -0.429618              0.186461        -0.154245        0.009605
Component-2              0.031159   0.222963       -0.052857              0.091591        -0.056113       -0.029739

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.
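Because a principal component is a linear combination of features, the change in a component upon substitution is just the projection of the change in features. The sketch below checks this numerically under one assumption not confirmed by the source: that a word's component value is the dot product of its (centred) feature vector with the component's loadings, as with sklearn's `PCA.transform`. Whatever fixed centring vector is used, it cancels in the difference destination minus source. All vectors here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
loadings = rng.normal(size=3)
loadings /= np.linalg.norm(loadings)   # a unit-norm principal axis
mean = rng.normal(size=3)              # any fixed centring vector
source_features = rng.normal(size=3)   # hypothetical source word features
dest_features = rng.normal(size=3)     # hypothetical destination features


def component_value(features):
    # Assumed projection: centre, then dot with the loadings.
    return (features - mean) @ loadings


variation_of_component = (component_value(dest_features)
                          - component_value(source_features))
component_of_variation = (dest_features - source_features) @ loadings
```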



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (39875 of 39875) |####################| Elapsed Time: 0:06:20 Time: 0:06:20

Compute cluster averages (so as not to overestimate confidence intervals).
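The two chained groupby calls below first average rows sharing the same destination quote, occurrence and position (per component), then average those per cluster, so each cluster contributes a single point per component. Here is a toy illustration of that two-stage mean with made-up numbers; note how it differs from a naive per-cluster mean over all rows.

```python
import pandas as pd

rows = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 2],
    'destination_id': [10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0],
    'position':       [3, 3, 5, 1],
    'component':      [0, 0, 0, 0],
    'source':         [1.0, 3.0, 6.0, 2.0],
})
# Stage 1: collapse rows sharing destination/occurrence/position/component.
stage1 = rows.groupby(['destination_id', 'occurrence', 'position',
                       'component'], as_index=False).mean()
# Stage 2: one value per cluster and component.
per_cluster = stage1.groupby(['cluster_id', 'component'],
                             as_index=False)[['source']].mean()
# Cluster 1 gets mean(mean(1, 3), 6) = 4.0, not the naive mean(1, 3, 6).
naive = rows.groupby('cluster_id')['source'].mean()
```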



In [32]:

    
# First average rows sharing the same destination, occurrence and position
# (per component), then average per cluster.
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    [['source', 'destination', 'h0', 'h0n']].mean()

Plot the actual variations of the components (see the caveat in section 4.3 below).



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | **  | *** | *** |
H_00 | ns. | *   | **  | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | ns. | *** |
H_00 | *   | *** | **  | **  |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA.



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
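pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
# Keep only clusters where every relevant feature is defined, with columns
# in the same order as pcafeatures.
pcavariations = pcavariations[list(pcafeatures)].dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the estimated number of components and their explained variance.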
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 2 components.

Those explain the following variance:
[ 0.68298848  0.19563148]







    Out[35]:

                   aoa  frequency  letters_count
Component-0  -0.747822   0.383865      -0.541673
Component-1   0.379124  -0.422859      -0.823077

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (39875 of 39875) |####################| Elapsed Time: 0:04:28 Time: 0:04:28

Compute cluster averages (so as not to overestimate confidence intervals).



In [37]:

    
# First average rows sharing the same destination, occurrence and position
# (per component), then average per cluster.
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    [['source', 'destination', 'h0', 'h0n']].mean()

Plot the actual variations of the components.



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | **  | *** | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

4.3 CAVEAT: reduction in the number of words and substitutions

As explained above, the PCA can only use words for which all the features are defined (here, the features listed in relevant_features). Note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Compute the number of words that have all relevant_features defined:
# start from the union of the words covered by each feature.
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 1462 (cluster-unique) substitutions, but the PCA is in fact computed on 1137 of them (those where all features are defined).

Because of the way $\mathcal{H}_0$ and $\mathcal{H}_{00}$ are computed, they are also affected by this reduction.
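The count of complete words above is, assuming each transformed feature returns NaN exactly for the words outside its own word set, the size of the intersection of the per-feature word sets, while wordsdf starts from their union. A stdlib sketch with hypothetical word sets shows how quickly the intersection shrinks even when each set is fairly large:

```python
# Hypothetical word sets: the words for which each feature is defined.
defined = {
    'frequency': {'cat', 'dog', 'run', 'blue'},
    'aoa': {'cat', 'dog', 'sky'},
    'letters_count': {'cat', 'dog', 'run', 'blue', 'sky', 'quickly'},
}
union = set().union(*defined.values())
# Words with every feature defined, i.e. the rows that survive dropna().
complete = set.intersection(*defined.values())
```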

5 Interactions between features (by ANOVA)

Some useful variables first.



In [40]:

    
# Only fixed-width bins are used; quantile binning is disabled:
# ('quantiles', pd.qcut)
cuts = [('fixed bins', pd.cut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'

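Now, for each feature, assess whether it interacts with the other features' destination values, i.e. whether the destination value of each other feature differs across bins of its source value (one-way ANOVA across the source bins). We look at this for all pairs of features and all combinations of global/sentence-relative values. (Quantile binning is disabled in cuts above, so only fixed-width bins are used.) This makes for a lot of results.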

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means non-significant.
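For reference, the F statistic that stats.f_oneway computes across source bins can be written out by hand. This minimal NumPy sketch (the helper name `one_way_f` is hypothetical) returns the statistic and its degrees of freedom; turning it into a p-value additionally needs the F distribution's survival function, e.g. `scipy.stats.f.sf(F, df_between, df_within)`.

```python
import numpy as np


def one_way_f(*groups):
    """One-way ANOVA: between-group over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Two bins of destination values: between SS = 13.5, within SS = 4.
F, df_b, df_w = one_way_f([1, 2, 3], [4, 5, 6])
# F == 13.5 with (1, 4) degrees of freedom.
```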



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
    * global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
    * global -> global
  ns. global -> sentence-relative
   ** sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Next, for each feature, look at its interaction with the other features' variation (i.e. destination minus source). Same procedure, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
   ** global -> global
   ** global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
   ** global -> global
    * global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
   ** global -> global
   ** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

This could go on for a long time, so I won't explore interactions from this angle any further (that is, interactions of pairs of features with another feature's destination values).

6 Regression
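With interactions=True, the regress helper below expands the source features with scikit-learn's PolynomialFeatures(degree=2, interaction_only=True), whose columns are a constant 1, each feature, then each pairwise product, and fits without a separate intercept since the constant column plays that role. Here is a NumPy sketch of that expansion and the resulting least-squares fit on made-up, noise-free data (the helper name `interaction_features` is hypothetical):

```python
from itertools import combinations

import numpy as np


def interaction_features(X, names):
    """Columns: constant, each feature, then each pairwise product.

    Same column order as PolynomialFeatures(degree=2, interaction_only=True).
    """
    X = np.asarray(X, dtype=float)
    columns = [np.ones(len(X))] + [X[:, j] for j in range(X.shape[1])]
    labels = ['intercept'] + list(names)
    for a, b in combinations(range(X.shape[1]), 2):
        columns.append(X[:, a] * X[:, b])
        labels.append(names[a] + ' * ' + names[b])
    return np.column_stack(columns), labels


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
# Target built from known coefficients: 0.5 + 2*a - b + 3*a*b (no noise).
y = 0.5 + 2 * X[:, 0] - X[:, 1] + 3 * X[:, 0] * X[:, 1]
design, labels = interaction_features(X, ['a', 'b'])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Since the data are noise-free, the fit recovers the coefficients (0.5, 2, -1, 3) exactly, with labels following the same ' * ' naming scheme as regress.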



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError("source_rel must be True, False or 'both'")
    if not isinstance(dest_rel, bool):
        raise ValueError('dest_rel must be a bool')
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    destination = variations[variations.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
        coeffs = pd.Series(index=['intercept'], data=[linreg.intercept_])\
            .append(coeffs)
    with pd.option_context('display.max_rows', 999):
        print(coeffs)

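The reshaping step in regress turns the long table (one row per cluster and feature) into one row per cluster. Here is a small sketch on a hypothetical four-row table whose column names are assumed to mirror the ones regress reads (cluster_id, feature, source, source_rel). It shows that pd.pivot_table, which averages any duplicate rows by default, yields (value, feature) column pairs, which is why regress selects and orders its regressors with a list of tuples.

```python
import pandas as pd

# Hypothetical long-format table: one row per (cluster, feature), in the
# spirit of `variations` (column names assumed from regress).
data = pd.DataFrame({
    'cluster_id': [1, 1, 2, 2],
    'feature': ['aoa', 'frequency', 'aoa', 'frequency'],
    'source': [5.0, 9.0, 6.0, 10.0],
    'source_rel': [0.5, -1.0, 0.2, 0.3],
})

# One row per cluster; columns form a (value, feature) MultiIndex.
wide = pd.pivot_table(data, values=['source', 'source_rel'],
                      index=['cluster_id'], columns=['feature'])

# A list of tuples picks the regressors and fixes their order.
wide = wide[[('source', 'aoa'), ('source', 'frequency'),
             ('source_rel', 'aoa')]]
print(wide)
```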

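With interactions=True, regress sets fit_intercept=False because PolynomialFeatures already prepends a constant column, whose coefficient then acts as the intercept. A minimal sketch on synthetic data (the true intercept of 3 and slopes of 1.5 and -2 are arbitrary choices) checks that both ways of modelling the intercept give the same fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3 + 1.5*x0 - 2*x1 + small noise (arbitrary values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=50)

# Variant A: let scikit-learn fit the intercept itself.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# Variant B: prepend a constant column, as PolynomialFeatures does by
# default (include_bias=True), and switch the built-in intercept off.
X_bias = np.hstack([np.ones((len(X), 1)), X])
with_bias = LinearRegression(fit_intercept=False).fit(X_bias, y)

# The bias coefficient equals the fitted intercept, and R^2 is identical.
print(with_intercept.intercept_, with_bias.coef_[0])
print(with_intercept.score(X, y), with_bias.score(X_bias, y))
```

So the R^2 values printed with and without interactions are computed the same way. Keep in mind, though, that the interaction model nests the plain one, and adding regressors can never lower the in-sample R^2, so part of the gain from interactions is expected by construction.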

In [45]:

    
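# For each target: 3 source settings x 2 destination settings, each fitted
# without and then with interactions (12 regressions per target). Blocks are
# printed in product() order, so (source, destination) runs (global, global),
# (global, rel), (rel, global), (rel, rel), (both, global), (both, rel).
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()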



----------------------------------------------------------------------
Regressing global frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.12576524594451877

intercept                      5.336622
global_aoa                     0.077836
global_clustering              0.111482
global_frequency               0.452088
global_letters_count          -0.059705
global_orthographic_density   -0.069771
global_synonyms_count         -0.023569
dtype: float64

Regressing global frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.14795837944006807

intercept                                              13.235179
global_aoa                                             -0.258813
global_clustering                                       1.894700
global_frequency                                        0.344360
global_letters_count                                   -0.568262
global_orthographic_density                             0.722636
global_synonyms_count                                   0.460473
global_aoa * global_clustering                         -0.050758
global_aoa * global_frequency                          -0.007996
global_aoa * global_letters_count                       0.019167
global_aoa * global_orthographic_density               -0.015684
global_aoa * global_synonyms_count                     -0.000954
global_clustering * global_frequency                   -0.089088
global_clustering * global_letters_count               -0.097208
global_clustering * global_orthographic_density        -0.012895
global_clustering * global_synonyms_count               0.065528
global_frequency * global_letters_count                -0.028160
global_frequency * global_orthographic_density         -0.137585
global_frequency * global_synonyms_count               -0.015621
global_letters_count * global_orthographic_density      0.085603
global_letters_count * global_synonyms_count           -0.018818
global_orthographic_density * global_synonyms_count     0.136466
dtype: float64

Regressing rel frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.0756700272591091

intercept                     -6.286985
global_aoa                     0.098821
global_clustering              0.070318
global_frequency               0.396259
global_letters_count          -0.014076
global_orthographic_density   -0.138041
global_synonyms_count          0.076317
dtype: float64

Regressing rel frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.09900829988300208

intercept                                              1.375745
global_aoa                                            -0.402505
global_clustering                                      1.473002
global_frequency                                       0.129173
global_letters_count                                  -0.402142
global_orthographic_density                            0.731685
global_synonyms_count                                  0.268602
global_aoa * global_clustering                        -0.052908
global_aoa * global_frequency                         -0.005630
global_aoa * global_letters_count                      0.036505
global_aoa * global_orthographic_density               0.003103
global_aoa * global_synonyms_count                     0.023604
global_clustering * global_frequency                  -0.101857
global_clustering * global_letters_count              -0.043663
global_clustering * global_orthographic_density        0.100395
global_clustering * global_synonyms_count              0.154382
global_frequency * global_letters_count               -0.025913
global_frequency * global_orthographic_density        -0.112050
global_frequency * global_synonyms_count               0.042748
global_letters_count * global_orthographic_density     0.127661
global_letters_count * global_synonyms_count           0.000482
global_orthographic_density * global_synonyms_count    0.136953
dtype: float64

Regressing global frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.06498523236865306

intercept                   9.584236
rel_aoa                     0.094161
rel_clustering              0.005271
rel_frequency               0.273210
rel_letters_count          -0.050457
rel_orthographic_density   -0.043860
rel_synonyms_count         -0.099062
dtype: float64

Regressing global frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.08509449613831299

intercept                                        9.439776
rel_aoa                                          0.149462
rel_clustering                                   0.037526
rel_frequency                                    0.254555
rel_letters_count                                0.017214
rel_orthographic_density                        -0.280768
rel_synonyms_count                               0.285348
rel_aoa * rel_clustering                        -0.008348
rel_aoa * rel_frequency                          0.025527
rel_aoa * rel_letters_count                     -0.007559
rel_aoa * rel_orthographic_density              -0.031399
rel_aoa * rel_synonyms_count                     0.032129
rel_clustering * rel_frequency                  -0.065420
rel_clustering * rel_letters_count              -0.090190
rel_clustering * rel_orthographic_density       -0.005198
rel_clustering * rel_synonyms_count              0.141166
rel_frequency * rel_letters_count               -0.022027
rel_frequency * rel_orthographic_density        -0.085048
rel_frequency * rel_synonyms_count               0.055206
rel_letters_count * rel_orthographic_density     0.050605
rel_letters_count * rel_synonyms_count          -0.105574
rel_orthographic_density * rel_synonyms_count    0.027853
dtype: float64

Regressing rel frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.2806043450275211

intercept                  -1.214164
rel_aoa                     0.066505
rel_clustering              0.160144
rel_frequency               0.636118
rel_letters_count          -0.093645
rel_orthographic_density   -0.225000
rel_synonyms_count          0.037975
dtype: float64

Regressing rel frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.30117437504759625

intercept                                       -1.322056
rel_aoa                                          0.077498
rel_clustering                                   0.154940
rel_frequency                                    0.660798
rel_letters_count                               -0.000200
rel_orthographic_density                        -0.514867
rel_synonyms_count                               0.263599
rel_aoa * rel_clustering                        -0.053928
rel_aoa * rel_frequency                         -0.015226
rel_aoa * rel_letters_count                      0.004937
rel_aoa * rel_orthographic_density               0.039077
rel_aoa * rel_synonyms_count                     0.102731
rel_clustering * rel_frequency                  -0.071167
rel_clustering * rel_letters_count              -0.085905
rel_clustering * rel_orthographic_density       -0.058344
rel_clustering * rel_synonyms_count              0.155158
rel_frequency * rel_letters_count               -0.019298
rel_frequency * rel_orthographic_density        -0.075165
rel_frequency * rel_synonyms_count               0.033337
rel_letters_count * rel_orthographic_density     0.072416
rel_letters_count * rel_synonyms_count          -0.052974
rel_orthographic_density * rel_synonyms_count    0.168503
dtype: float64

Regressing global frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.13097609065419813

intercept                      4.399050
global_aoa                     0.030404
global_clustering              0.145869
global_frequency               0.502216
global_letters_count           0.047949
global_orthographic_density    0.098808
global_synonyms_count          0.086487
rel_aoa                        0.068122
rel_clustering                -0.055569
rel_frequency                 -0.060308
rel_letters_count             -0.117023
rel_orthographic_density      -0.199140
rel_synonyms_count            -0.130551
dtype: float64

Regressing global frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.2095029432911517

intercept                                                 -3.087868
global_aoa                                                -0.668884
global_clustering                                         -3.384040
global_frequency                                           0.417330
global_letters_count                                      -0.660940
global_orthographic_density                                3.103288
global_synonyms_count                                     -7.139633
rel_aoa                                                    1.718209
rel_clustering                                             4.794462
rel_frequency                                             -0.757193
rel_letters_count                                          0.491709
rel_orthographic_density                                   0.839961
rel_synonyms_count                                        12.844561
global_aoa * global_clustering                             0.124519
global_aoa * global_frequency                              0.077693
global_aoa * global_letters_count                          0.128725
global_aoa * global_orthographic_density                   0.072644
global_aoa * global_synonyms_count                        -0.221334
global_aoa * rel_aoa                                      -0.025371
global_aoa * rel_clustering                               -0.082316
global_aoa * rel_frequency                                -0.002610
global_aoa * rel_letters_count                            -0.093802
global_aoa * rel_orthographic_density                     -0.133958
global_aoa * rel_synonyms_count                            0.131760
global_clustering * global_frequency                       0.084661
global_clustering * global_letters_count                   0.112202
global_clustering * global_orthographic_density            0.699740
global_clustering * global_synonyms_count                 -0.405557
global_clustering * rel_aoa                               -0.231189
global_clustering * rel_clustering                         0.193351
global_clustering * rel_frequency                         -0.172359
global_clustering * rel_letters_count                     -0.036245
global_clustering * rel_orthographic_density              -0.301856
global_clustering * rel_synonyms_count                     0.900545
global_frequency * global_letters_count                   -0.006222
global_frequency * global_orthographic_density            -0.024980
global_frequency * global_synonyms_count                   0.086861
global_frequency * rel_aoa                                -0.216494
global_frequency * rel_clustering                         -0.000568
global_frequency * rel_frequency                          -0.009381
global_frequency * rel_letters_count                       0.011608
global_frequency * rel_orthographic_density               -0.176456
global_frequency * rel_synonyms_count                     -0.242627
global_letters_count * global_orthographic_density         0.163252
global_letters_count * global_synonyms_count               0.675418
global_letters_count * rel_aoa                            -0.117347
global_letters_count * rel_clustering                     -0.279789
global_letters_count * rel_frequency                      -0.046203
global_letters_count * rel_letters_count                   0.027144
global_letters_count * rel_orthographic_density           -0.037651
global_letters_count * rel_synonyms_count                 -0.868244
global_orthographic_density * global_synonyms_count        0.843099
global_orthographic_density * rel_aoa                     -0.061067
global_orthographic_density * rel_clustering              -0.969358
global_orthographic_density * rel_frequency                0.044734
global_orthographic_density * rel_letters_count           -0.101080
global_orthographic_density * rel_orthographic_density     0.000420
global_orthographic_density * rel_synonyms_count          -0.568194
global_synonyms_count * rel_aoa                            0.041317
global_synonyms_count * rel_clustering                     0.047574
global_synonyms_count * rel_frequency                     -0.264367
global_synonyms_count * rel_letters_count                 -0.406396
global_synonyms_count * rel_orthographic_density          -0.723883
global_synonyms_count * rel_synonyms_count                -0.111281
rel_aoa * rel_clustering                                   0.099273
rel_aoa * rel_frequency                                    0.112481
rel_aoa * rel_letters_count                                0.102072
rel_aoa * rel_orthographic_density                         0.052756
rel_aoa * rel_synonyms_count                               0.049788
rel_clustering * rel_frequency                             0.024737
rel_clustering * rel_letters_count                         0.102322
rel_clustering * rel_orthographic_density                  0.552204
rel_clustering * rel_synonyms_count                       -0.256775
rel_frequency * rel_letters_count                          0.025038
rel_frequency * rel_orthographic_density                   0.022364
rel_frequency * rel_synonyms_count                         0.491264
rel_letters_count * rel_orthographic_density               0.142051
rel_letters_count * rel_synonyms_count                     0.562542
rel_orthographic_density * rel_synonyms_count              0.496880
dtype: float64

Regressing rel frequency with 916 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.33232400692486597

intercept                      3.183158
global_aoa                     0.035614
global_clustering              0.229764
global_frequency              -0.373253
global_letters_count           0.127470
global_orthographic_density    0.124782
global_synonyms_count          0.025935
rel_aoa                        0.038573
rel_clustering                -0.113308
rel_frequency                  0.855188
rel_letters_count             -0.177379
rel_orthographic_density      -0.209452
rel_synonyms_count            -0.020604
dtype: float64

Regressing rel frequency with 916 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.39272123208073817

intercept                                                -14.041511
global_aoa                                                -0.003513
global_clustering                                         -4.035038
global_frequency                                           0.112775
global_letters_count                                      -0.306664
global_orthographic_density                                4.695004
global_synonyms_count                                     -5.238208
rel_aoa                                                    1.308256
rel_clustering                                             5.236490
rel_frequency                                             -0.111246
rel_letters_count                                          0.202766
rel_orthographic_density                                  -0.235253
rel_synonyms_count                                        12.360315
global_aoa * global_clustering                             0.129607
global_aoa * global_frequency                              0.036861
global_aoa * global_letters_count                          0.099298
global_aoa * global_orthographic_density                   0.045915
global_aoa * global_synonyms_count                        -0.187547
global_aoa * rel_aoa                                      -0.021995
global_aoa * rel_clustering                               -0.091382
global_aoa * rel_frequency                                 0.030972
global_aoa * rel_letters_count                            -0.064535
global_aoa * rel_orthographic_density                     -0.093742
global_aoa * rel_synonyms_count                            0.068852
global_clustering * global_frequency                       0.131599
global_clustering * global_letters_count                   0.148997
global_clustering * global_orthographic_density            0.766809
global_clustering * global_synonyms_count                 -0.341839
global_clustering * rel_aoa                               -0.220540
global_clustering * rel_clustering                         0.185247
global_clustering * rel_frequency                         -0.167497
global_clustering * rel_letters_count                     -0.077968
global_clustering * rel_orthographic_density              -0.339179
global_clustering * rel_synonyms_count                     0.847897
global_frequency * global_letters_count                    0.022527
global_frequency * global_orthographic_density            -0.085941
global_frequency * global_synonyms_count                  -0.005681
global_frequency * rel_aoa                                -0.185173
global_frequency * rel_clustering                         -0.065242
global_frequency * rel_frequency                          -0.003741
global_frequency * rel_letters_count                      -0.014164
global_frequency * rel_orthographic_density               -0.151464
global_frequency * rel_synonyms_count                     -0.237379
global_letters_count * global_orthographic_density         0.111232
global_letters_count * global_synonyms_count               0.581956
global_letters_count * rel_aoa                            -0.107727
global_letters_count * rel_clustering                     -0.280598
global_letters_count * rel_frequency                      -0.061063
global_letters_count * rel_letters_count                   0.021071
global_letters_count * rel_orthographic_density            0.008551
global_letters_count * rel_synonyms_count                 -0.781513
global_orthographic_density * global_synonyms_count        0.717809
global_orthographic_density * rel_aoa                     -0.070161
global_orthographic_density * rel_clustering              -0.914216
global_orthographic_density * rel_frequency                0.095265
global_orthographic_density * rel_letters_count           -0.071337
global_orthographic_density * rel_orthographic_density     0.011477
global_orthographic_density * rel_synonyms_count          -0.525352
global_synonyms_count * rel_aoa                            0.078501
global_synonyms_count * rel_clustering                    -0.056107
global_synonyms_count * rel_frequency                     -0.172627
global_synonyms_count * rel_letters_count                 -0.327648
global_synonyms_count * rel_orthographic_density          -0.512477
global_synonyms_count * rel_synonyms_count                -0.114789
rel_aoa * rel_clustering                                   0.095673
rel_aoa * rel_frequency                                    0.076872
rel_aoa * rel_letters_count                                0.087176
rel_aoa * rel_orthographic_density                         0.034972
rel_aoa * rel_synonyms_count                               0.040018
rel_clustering * rel_frequency                             0.025317
rel_clustering * rel_letters_count                         0.110672
rel_clustering * rel_orthographic_density                  0.483893
rel_clustering * rel_synonyms_count                       -0.138553
rel_frequency * rel_letters_count                          0.034939
rel_frequency * rel_orthographic_density                  -0.002695
rel_frequency * rel_synonyms_count                         0.462837
rel_letters_count * rel_orthographic_density               0.115175
rel_letters_count * rel_synonyms_count                     0.481014
rel_orthographic_density * rel_synonyms_count              0.331515
dtype: float64

----------------------------------------------------------------------
Regressing global aoa with 835 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.15903318281317536

intercept                      4.127336
global_aoa                     0.407612
global_clustering             -0.092151
global_frequency              -0.042421
global_letters_count           0.030305
global_orthographic_density   -0.059005
global_synonyms_count          0.060732
dtype: float64

Regressing global aoa with 835 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.1861539469460648

intercept                                              5.472163
global_aoa                                             0.859646
global_clustering                                      0.018140
global_frequency                                       0.092001
global_letters_count                                  -0.214492
global_orthographic_density                           -3.255002
global_synonyms_count                                 -2.098122
global_aoa * global_clustering                        -0.003198
global_aoa * global_frequency                         -0.057112
global_aoa * global_letters_count                     -0.003144
global_aoa * global_orthographic_density               0.035221
global_aoa * global_synonyms_count                     0.061166
global_clustering * global_frequency                   0.030577
global_clustering * global_letters_count               0.045312
global_clustering * global_orthographic_density       -0.368071
global_clustering * global_synonyms_count             -0.433414
global_frequency * global_letters_count                0.060813
global_frequency * global_orthographic_density         0.093200
global_frequency * global_synonyms_count              -0.122836
global_letters_count * global_orthographic_density    -0.034863
global_letters_count * global_synonyms_count           0.002195
global_orthographic_density * global_synonyms_count    0.231669
dtype: float64

Regressing rel aoa with 835 measures, no interactions
           ^^^^^^^
R^2 = 0.057036892439359166

intercept                      0.220727
global_aoa                     0.184334
global_clustering              0.034933
global_frequency              -0.106786
global_letters_count           0.059719
global_orthographic_density    0.112738
global_synonyms_count          0.075137
dtype: float64

Regressing rel aoa with 835 measures, with interactions
           ^^^^^^^
R^2 = 0.10700170620257798

intercept                                             -4.143186
global_aoa                                             1.564108
global_clustering                                     -0.387165
global_frequency                                       0.241713
global_letters_count                                  -0.346645
global_orthographic_density                           -2.477832
global_synonyms_count                                 -1.013051
global_aoa * global_clustering                         0.071487
global_aoa * global_frequency                         -0.080998
global_aoa * global_letters_count                     -0.043556
global_aoa * global_orthographic_density              -0.002515
global_aoa * global_synonyms_count                     0.041697
global_clustering * global_frequency                   0.062869
global_clustering * global_letters_count              -0.003987
global_clustering * global_orthographic_density       -0.354997
global_clustering * global_synonyms_count             -0.383958
global_frequency * global_letters_count                0.086516
global_frequency * global_orthographic_density         0.088755
global_frequency * global_synonyms_count              -0.203186
global_letters_count * global_orthographic_density    -0.078186
global_letters_count * global_synonyms_count          -0.000920
global_orthographic_density * global_synonyms_count    0.297082
dtype: float64

Regressing global aoa with 835 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.06157896818902031

intercept                   6.871364
rel_aoa                     0.181450
rel_clustering              0.108791
rel_frequency               0.102351
rel_letters_count           0.014355
rel_orthographic_density   -0.403726
rel_synonyms_count          0.079354
dtype: float64

Regressing global aoa with 835 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.11502893575367956

intercept                                        6.755965
rel_aoa                                         -0.133476
rel_clustering                                   0.082825
rel_frequency                                    0.109415
rel_letters_count                                0.086941
rel_orthographic_density                        -0.376684
rel_synonyms_count                               0.007078
rel_aoa * rel_clustering                         0.071257
rel_aoa * rel_frequency                         -0.135673
rel_aoa * rel_letters_count                     -0.021196
rel_aoa * rel_orthographic_density               0.030008
rel_aoa * rel_synonyms_count                     0.012983
rel_clustering * rel_frequency                   0.119585
rel_clustering * rel_letters_count               0.080954
rel_clustering * rel_orthographic_density       -0.199274
rel_clustering * rel_synonyms_count             -0.233882
rel_frequency * rel_letters_count                0.026158
rel_frequency * rel_orthographic_density         0.019216
rel_frequency * rel_synonyms_count              -0.169008
rel_letters_count * rel_orthographic_density     0.011195
rel_letters_count * rel_synonyms_count           0.013412
rel_orthographic_density * rel_synonyms_count    0.433498
dtype: float64

Regressing rel aoa with 835 measures, no interactions
           ^^^^^^^
R^2 = 0.23562421715658555

intercept                   0.470394
rel_aoa                     0.520385
rel_clustering             -0.075242
rel_frequency              -0.073137
rel_letters_count           0.023810
rel_orthographic_density    0.130469
rel_synonyms_count          0.070377
dtype: float64

Regressing rel aoa with 835 measures, with interactions
           ^^^^^^^
R^2 = 0.2712306471092165

intercept                                        0.561445
rel_aoa                                          0.392006
rel_clustering                                  -0.205029
rel_frequency                                   -0.011524
rel_letters_count                                0.073218
rel_orthographic_density                         0.435114
rel_synonyms_count                               0.133380
rel_aoa * rel_clustering                         0.029922
rel_aoa * rel_frequency                         -0.072623
rel_aoa * rel_letters_count                     -0.047440
rel_aoa * rel_orthographic_density              -0.031596
rel_aoa * rel_synonyms_count                     0.005317
rel_clustering * rel_frequency                   0.046565
rel_clustering * rel_letters_count               0.032736
rel_clustering * rel_orthographic_density       -0.219235
rel_clustering * rel_synonyms_count             -0.321181
rel_frequency * rel_letters_count                0.024086
rel_frequency * rel_orthographic_density         0.082316
rel_frequency * rel_synonyms_count              -0.112286
rel_letters_count * rel_orthographic_density    -0.029555
rel_letters_count * rel_synonyms_count          -0.006413
rel_orthographic_density * rel_synonyms_count    0.273760
dtype: float64

Regressing global aoa with 835 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.18123756967367954

intercept                      4.854298
global_aoa                     0.532426
global_clustering             -0.051996
global_frequency              -0.168675
global_letters_count           0.046710
global_orthographic_density    0.056310
global_synonyms_count         -0.076762
rel_aoa                       -0.197254
rel_clustering                -0.014551
rel_frequency                  0.134728
rel_letters_count             -0.015074
rel_orthographic_density      -0.080329
rel_synonyms_count             0.154774
dtype: float64

Regressing global aoa with 835 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.2679064555037203

intercept                                                 32.396546
global_aoa                                                 0.987165
global_clustering                                          4.361887
global_frequency                                          -0.241214
global_letters_count                                      -1.669955
global_orthographic_density                              -12.678729
global_synonyms_count                                     -1.155701
rel_aoa                                                   -1.049197
rel_clustering                                            -4.387101
rel_frequency                                              0.264832
rel_letters_count                                          0.749946
rel_orthographic_density                                   5.814582
rel_synonyms_count                                        -9.040391
global_aoa * global_clustering                            -0.052226
global_aoa * global_frequency                             -0.055632
global_aoa * global_letters_count                         -0.059010
global_aoa * global_orthographic_density                   0.007698
global_aoa * global_synonyms_count                         0.042408
global_aoa * rel_aoa                                       0.041875
global_aoa * rel_clustering                               -0.011758
global_aoa * rel_frequency                                 0.006129
global_aoa * rel_letters_count                             0.066277
global_aoa * rel_orthographic_density                      0.077805
global_aoa * rel_synonyms_count                            0.200321
global_clustering * global_frequency                       0.058995
global_clustering * global_letters_count                  -0.163822
global_clustering * global_orthographic_density           -1.890476
global_clustering * global_synonyms_count                 -0.715205
global_clustering * rel_aoa                                0.057034
global_clustering * rel_clustering                        -0.047198
global_clustering * rel_frequency                         -0.002603
global_clustering * rel_letters_count                      0.040991
global_clustering * rel_orthographic_density               1.118124
global_clustering * rel_synonyms_count                     0.048471
global_frequency * global_letters_count                    0.142737
global_frequency * global_orthographic_density             0.160935
global_frequency * global_synonyms_count                  -0.185771
global_frequency * rel_aoa                                 0.115185
global_frequency * rel_clustering                         -0.101706
global_frequency * rel_frequency                           0.001563
global_frequency * rel_letters_count                      -0.105618
global_frequency * rel_orthographic_density               -0.018433
global_frequency * rel_synonyms_count                      0.542994
global_letters_count * global_orthographic_density        -0.036879
global_letters_count * global_synonyms_count              -0.295428
global_letters_count * rel_aoa                            -0.026395
global_letters_count * rel_clustering                      0.264970
global_letters_count * rel_frequency                      -0.011083
global_letters_count * rel_letters_count                   0.006004
global_letters_count * rel_orthographic_density            0.071081
global_letters_count * rel_synonyms_count                  0.650398
global_orthographic_density * global_synonyms_count       -0.002908
global_orthographic_density * rel_aoa                     -0.110676
global_orthographic_density * rel_clustering               1.947113
global_orthographic_density * rel_frequency               -0.065594
global_orthographic_density * rel_letters_count            0.010057
global_orthographic_density * rel_orthographic_density     0.220517
global_orthographic_density * rel_synonyms_count          -0.330427
global_synonyms_count * rel_aoa                           -0.066936
global_synonyms_count * rel_clustering                     0.148957
global_synonyms_count * rel_frequency                      0.066456
global_synonyms_count * rel_letters_count                 -0.022601
global_synonyms_count * rel_orthographic_density          -0.340387
global_synonyms_count * rel_synonyms_count                 0.170603
rel_aoa * rel_clustering                                   0.076170
rel_aoa * rel_frequency                                   -0.128718
rel_aoa * rel_letters_count                               -0.052344
rel_aoa * rel_orthographic_density                         0.073998
rel_aoa * rel_synonyms_count                              -0.072501
rel_clustering * rel_frequency                             0.119350
rel_clustering * rel_letters_count                        -0.058250
rel_clustering * rel_orthographic_density                 -1.415736
rel_clustering * rel_synonyms_count                        0.061440
rel_frequency * rel_letters_count                          0.014468
rel_frequency * rel_orthographic_density                   0.062414
rel_frequency * rel_synonyms_count                        -0.571070
rel_letters_count * rel_orthographic_density               0.000495
rel_letters_count * rel_synonyms_count                    -0.355542
rel_orthographic_density * rel_synonyms_count              1.167486
dtype: float64

Regressing rel aoa with 835 measures, no interactions
           ^^^^^^^
R^2 = 0.2689656084310281

intercept                      2.447499
global_aoa                    -0.309245
global_clustering             -0.109512
global_frequency              -0.113585
global_letters_count           0.068965
global_orthographic_density    0.053528
global_synonyms_count          0.150192
rel_aoa                        0.743333
rel_clustering                 0.076304
rel_frequency                  0.027734
rel_letters_count             -0.027195
rel_orthographic_density      -0.067328
rel_synonyms_count            -0.097789
dtype: float64

Regressing rel aoa with 835 measures, with interactions
           ^^^^^^^
R^2 = 0.34603624897142027

intercept                                                 26.963584
global_aoa                                                -0.190405
global_clustering                                          2.279694
global_frequency                                          -0.965182
global_letters_count                                      -1.363897
global_orthographic_density                              -11.109973
global_synonyms_count                                     -0.206379
rel_aoa                                                    0.712037
rel_clustering                                            -2.342070
rel_frequency                                              0.611354
rel_letters_count                                          1.563400
rel_orthographic_density                                   7.837716
rel_synonyms_count                                        -6.899304
global_aoa * global_clustering                            -0.029011
global_aoa * global_frequency                             -0.069677
global_aoa * global_letters_count                          0.003432
global_aoa * global_orthographic_density                   0.147288
global_aoa * global_synonyms_count                         0.150739
global_aoa * rel_aoa                                       0.016266
global_aoa * rel_clustering                                0.016629
global_aoa * rel_frequency                                -0.003689
global_aoa * rel_letters_count                            -0.015029
global_aoa * rel_orthographic_density                     -0.102025
global_aoa * rel_synonyms_count                            0.000496
global_clustering * global_frequency                      -0.005423
global_clustering * global_letters_count                   0.042344
global_clustering * global_orthographic_density           -1.185111
global_clustering * global_synonyms_count                 -0.398651
global_clustering * rel_aoa                                0.103551
global_clustering * rel_clustering                        -0.078268
global_clustering * rel_frequency                          0.028707
global_clustering * rel_letters_count                     -0.011806
global_clustering * rel_orthographic_density               0.777877
global_clustering * rel_synonyms_count                    -0.132727
global_frequency * global_letters_count                    0.177260
global_frequency * global_orthographic_density             0.298481
global_frequency * global_synonyms_count                  -0.209430
global_frequency * rel_aoa                                 0.117415
global_frequency * rel_clustering                         -0.074735
global_frequency * rel_frequency                           0.022573
global_frequency * rel_letters_count                      -0.135072
global_frequency * rel_orthographic_density               -0.252584
global_frequency * rel_synonyms_count                      0.453472
global_letters_count * global_orthographic_density         0.019931
global_letters_count * global_synonyms_count              -0.244367
global_letters_count * rel_aoa                            -0.065438
global_letters_count * rel_clustering                      0.082567
global_letters_count * rel_frequency                      -0.050462
global_letters_count * rel_letters_count                   0.006004
global_letters_count * rel_orthographic_density            0.016273
global_letters_count * rel_synonyms_count                  0.483966
global_orthographic_density * global_synonyms_count        0.236781
global_orthographic_density * rel_aoa                     -0.181251
global_orthographic_density * rel_clustering               1.182846
global_orthographic_density * rel_frequency               -0.170862
global_orthographic_density * rel_letters_count           -0.086451
global_orthographic_density * rel_orthographic_density     0.177152
global_orthographic_density * rel_synonyms_count          -0.529893
global_synonyms_count * rel_aoa                           -0.155281
global_synonyms_count * rel_clustering                     0.078751
global_synonyms_count * rel_frequency                      0.104587
global_synonyms_count * rel_letters_count                 -0.039700
global_synonyms_count * rel_orthographic_density          -0.556038
global_synonyms_count * rel_synonyms_count                 0.164983
rel_aoa * rel_clustering                                   0.000636
rel_aoa * rel_frequency                                   -0.102302
rel_aoa * rel_letters_count                                0.005215
rel_aoa * rel_orthographic_density                         0.142882
rel_aoa * rel_synonyms_count                               0.057565
rel_clustering * rel_frequency                             0.116145
rel_clustering * rel_letters_count                        -0.018687
rel_clustering * rel_orthographic_density                 -0.938915
rel_clustering * rel_synonyms_count                       -0.076713
rel_frequency * rel_letters_count                          0.063422
rel_frequency * rel_orthographic_density                   0.225897
rel_frequency * rel_synonyms_count                        -0.464762
rel_letters_count * rel_orthographic_density               0.085048
rel_letters_count * rel_synonyms_count                    -0.233467
rel_orthographic_density * rel_synonyms_count              1.179678
dtype: float64

----------------------------------------------------------------------
Regressing global clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.12422401434810038

intercept                     -3.049286
global_aoa                    -0.026747
global_clustering              0.340500
global_frequency              -0.049548
global_letters_count          -0.006536
global_orthographic_density   -0.042377
global_synonyms_count         -0.036368
dtype: float64

Regressing global clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.17174510605199256

intercept                                             -4.776451
global_aoa                                             0.273745
global_clustering                                     -0.095461
global_frequency                                      -0.336956
global_letters_count                                   0.226757
global_orthographic_density                            0.212506
global_synonyms_count                                 -0.800829
global_aoa * global_clustering                         0.052576
global_aoa * global_frequency                         -0.005055
global_aoa * global_letters_count                      0.005069
global_aoa * global_orthographic_density               0.013188
global_aoa * global_synonyms_count                     0.018428
global_clustering * global_frequency                  -0.038726
global_clustering * global_letters_count               0.061270
global_clustering * global_orthographic_density        0.050596
global_clustering * global_synonyms_count             -0.036522
global_frequency * global_letters_count                0.010783
global_frequency * global_orthographic_density         0.007682
global_frequency * global_synonyms_count               0.044464
global_letters_count * global_orthographic_density    -0.021706
global_letters_count * global_synonyms_count           0.009986
global_orthographic_density * global_synonyms_count   -0.041790
dtype: float64

Regressing rel clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.10227192444436073

intercept                      2.828850
global_aoa                    -0.019664
global_clustering              0.295811
global_frequency              -0.046322
global_letters_count          -0.017125
global_orthographic_density   -0.060461
global_synonyms_count         -0.049126
dtype: float64

Regressing rel clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.13959228652996492

intercept                                              1.490945
global_aoa                                             0.136323
global_clustering                                     -0.166766
global_frequency                                      -0.322227
global_letters_count                                   0.172213
global_orthographic_density                            0.311956
global_synonyms_count                                 -0.587055
global_aoa * global_clustering                         0.045037
global_aoa * global_frequency                          0.004614
global_aoa * global_letters_count                      0.008604
global_aoa * global_orthographic_density               0.004970
global_aoa * global_synonyms_count                     0.018894
global_clustering * global_frequency                  -0.031029
global_clustering * global_letters_count               0.058002
global_clustering * global_orthographic_density        0.060998
global_clustering * global_synonyms_count              0.003152
global_frequency * global_letters_count                0.008466
global_frequency * global_orthographic_density        -0.000528
global_frequency * global_synonyms_count               0.026563
global_letters_count * global_orthographic_density    -0.009747
global_letters_count * global_synonyms_count           0.031799
global_orthographic_density * global_synonyms_count   -0.017656
dtype: float64

Regressing global clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0735133723429423

intercept                  -5.888720
rel_aoa                    -0.001390
rel_clustering              0.277839
rel_frequency              -0.008613
rel_letters_count           0.006916
rel_orthographic_density    0.005036
rel_synonyms_count         -0.005032
dtype: float64

Regressing global clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.10923979991583053

intercept                                       -5.839601
rel_aoa                                         -0.005062
rel_clustering                                   0.146873
rel_frequency                                    0.006529
rel_letters_count                               -0.006239
rel_orthographic_density                         0.023969
rel_synonyms_count                              -0.110268
rel_aoa * rel_clustering                         0.054410
rel_aoa * rel_frequency                         -0.013932
rel_aoa * rel_letters_count                     -0.005918
rel_aoa * rel_orthographic_density               0.039785
rel_aoa * rel_synonyms_count                     0.009959
rel_clustering * rel_frequency                  -0.002905
rel_clustering * rel_letters_count               0.041916
rel_clustering * rel_orthographic_density       -0.001775
rel_clustering * rel_synonyms_count             -0.037660
rel_frequency * rel_letters_count                0.003018
rel_frequency * rel_orthographic_density         0.012877
rel_frequency * rel_synonyms_count              -0.023243
rel_letters_count * rel_orthographic_density    -0.008986
rel_letters_count * rel_synonyms_count          -0.000769
rel_orthographic_density * rel_synonyms_count   -0.016390
dtype: float64

Regressing rel clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.21384640111189268

intercept                   0.173704
rel_aoa                    -0.017731
rel_clustering              0.478522
rel_frequency              -0.008170
rel_letters_count           0.004241
rel_orthographic_density   -0.008785
rel_synonyms_count         -0.006927
dtype: float64

Regressing rel clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.24399893834389297

intercept                                        0.222430
rel_aoa                                         -0.016837
rel_clustering                                   0.363046
rel_frequency                                    0.003198
rel_letters_count                               -0.028018
rel_orthographic_density                        -0.030960
rel_synonyms_count                              -0.168009
rel_aoa * rel_clustering                         0.047028
rel_aoa * rel_frequency                         -0.008884
rel_aoa * rel_letters_count                     -0.009580
rel_aoa * rel_orthographic_density               0.018516
rel_aoa * rel_synonyms_count                    -0.007962
rel_clustering * rel_frequency                  -0.002020
rel_clustering * rel_letters_count               0.059224
rel_clustering * rel_orthographic_density        0.046446
rel_clustering * rel_synonyms_count             -0.051084
rel_frequency * rel_letters_count               -0.003928
rel_frequency * rel_orthographic_density        -0.004222
rel_frequency * rel_synonyms_count              -0.038112
rel_letters_count * rel_orthographic_density    -0.007160
rel_letters_count * rel_synonyms_count           0.006562
rel_orthographic_density * rel_synonyms_count   -0.038765
dtype: float64

Regressing global clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.14035734940902078

intercept                     -1.150288
global_aoa                    -0.036184
global_clustering              0.408787
global_frequency              -0.115757
global_letters_count          -0.083517
global_orthographic_density   -0.175657
global_synonyms_count         -0.121098
rel_aoa                        0.014019
rel_clustering                -0.070117
rel_frequency                  0.078877
rel_letters_count              0.080155
rel_orthographic_density       0.144413
rel_synonyms_count             0.088391
dtype: float64

Regressing global clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.2615572231990728

intercept                                                 12.639389
global_aoa                                                 0.027492
global_clustering                                          3.235353
global_frequency                                          -1.338911
global_letters_count                                       0.059520
global_orthographic_density                               -0.076390
global_synonyms_count                                     -1.794813
rel_aoa                                                   -0.123321
rel_clustering                                            -3.450430
rel_frequency                                              0.512256
rel_letters_count                                         -0.002194
rel_orthographic_density                                  -0.558460
rel_synonyms_count                                        -0.149117
global_aoa * global_clustering                            -0.022772
global_aoa * global_frequency                              0.007821
global_aoa * global_letters_count                         -0.025311
global_aoa * global_orthographic_density                  -0.094256
global_aoa * global_synonyms_count                        -0.017508
global_aoa * rel_aoa                                       0.023909
global_aoa * rel_clustering                                0.061448
global_aoa * rel_frequency                                 0.002669
global_aoa * rel_letters_count                             0.035002
global_aoa * rel_orthographic_density                      0.092496
global_aoa * rel_synonyms_count                            0.112450
global_clustering * global_frequency                      -0.220917
global_clustering * global_letters_count                  -0.056882
global_clustering * global_orthographic_density           -0.101301
global_clustering * global_synonyms_count                 -0.257597
global_clustering * rel_aoa                                0.034669
global_clustering * rel_clustering                        -0.166024
global_clustering * rel_frequency                          0.103819
global_clustering * rel_letters_count                      0.133053
global_clustering * rel_orthographic_density               0.103913
global_clustering * rel_synonyms_count                     0.430311
global_frequency * global_letters_count                   -0.031731
global_frequency * global_orthographic_density            -0.009896
global_frequency * global_synonyms_count                   0.136965
global_frequency * rel_aoa                                -0.005472
global_frequency * rel_clustering                          0.120819
global_frequency * rel_frequency                           0.010006
global_frequency * rel_letters_count                       0.056630
global_frequency * rel_orthographic_density                0.060831
global_frequency * rel_synonyms_count                      0.056790
global_letters_count * global_orthographic_density         0.026210
global_letters_count * global_synonyms_count              -0.032166
global_letters_count * rel_aoa                             0.016111
global_letters_count * rel_clustering                      0.128969
global_letters_count * rel_frequency                       0.034178
global_letters_count * rel_letters_count                   0.008905
global_letters_count * rel_orthographic_density           -0.019066
global_letters_count * rel_synonyms_count                  0.080398
global_orthographic_density * global_synonyms_count       -0.431223
global_orthographic_density * rel_aoa                      0.100476
global_orthographic_density * rel_clustering               0.157372
global_orthographic_density * rel_frequency               -0.028440
global_orthographic_density * rel_letters_count           -0.021024
global_orthographic_density * rel_orthographic_density     0.069184
global_orthographic_density * rel_synonyms_count           0.421449
global_synonyms_count * rel_aoa                            0.067174
global_synonyms_count * rel_clustering                    -0.095532
global_synonyms_count * rel_frequency                      0.001514
global_synonyms_count * rel_letters_count                  0.007440
global_synonyms_count * rel_orthographic_density           0.227670
global_synonyms_count * rel_synonyms_count                 0.047822
rel_aoa * rel_clustering                                  -0.001201
rel_aoa * rel_frequency                                   -0.004535
rel_aoa * rel_letters_count                               -0.036756
rel_aoa * rel_orthographic_density                        -0.037675
rel_aoa * rel_synonyms_count                              -0.129023
rel_clustering * rel_frequency                            -0.049870
rel_clustering * rel_letters_count                        -0.138120
rel_clustering * rel_orthographic_density                 -0.132447
rel_clustering * rel_synonyms_count                       -0.103234
rel_frequency * rel_letters_count                         -0.047409
rel_frequency * rel_orthographic_density                   0.010938
rel_frequency * rel_synonyms_count                        -0.163041
rel_letters_count * rel_orthographic_density               0.035743
rel_letters_count * rel_synonyms_count                    -0.052750
rel_orthographic_density * rel_synonyms_count             -0.191227
dtype: float64

Regressing rel clustering with 756 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.27445039938381466

intercept                     -0.500179
global_aoa                    -0.031972
global_clustering             -0.449983
global_frequency              -0.107264
global_letters_count          -0.082650
global_orthographic_density   -0.149336
global_synonyms_count         -0.110231
rel_aoa                        0.011205
rel_clustering                 0.853921
rel_frequency                  0.073327
rel_letters_count              0.079554
rel_orthographic_density       0.110209
rel_synonyms_count             0.083979
dtype: float64

Regressing rel clustering with 756 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.3721031299826959

intercept                                                 9.462646
global_aoa                                               -0.052187
global_clustering                                         1.407294
global_frequency                                         -1.016011
global_letters_count                                     -0.055960
global_orthographic_density                               0.070276
global_synonyms_count                                    -2.049042
rel_aoa                                                  -0.038673
rel_clustering                                           -1.835718
rel_frequency                                             0.366341
rel_letters_count                                         0.091576
rel_orthographic_density                                 -0.350106
rel_synonyms_count                                       -0.103641
global_aoa * global_clustering                           -0.019188
global_aoa * global_frequency                             0.010142
global_aoa * global_letters_count                        -0.012776
global_aoa * global_orthographic_density                 -0.084326
global_aoa * global_synonyms_count                       -0.006979
global_aoa * rel_aoa                                      0.022986
global_aoa * rel_clustering                               0.047164
global_aoa * rel_frequency                               -0.002139
global_aoa * rel_letters_count                            0.025213
global_aoa * rel_orthographic_density                     0.086782
global_aoa * rel_synonyms_count                           0.080768
global_clustering * global_frequency                     -0.152075
global_clustering * global_letters_count                 -0.025938
global_clustering * global_orthographic_density          -0.031654
global_clustering * global_synonyms_count                -0.251161
global_clustering * rel_aoa                               0.013856
global_clustering * rel_clustering                       -0.183179
global_clustering * rel_frequency                         0.064096
global_clustering * rel_letters_count                     0.102198
global_clustering * rel_orthographic_density              0.082418
global_clustering * rel_synonyms_count                    0.381261
global_frequency * global_letters_count                  -0.015150
global_frequency * global_orthographic_density           -0.000367
global_frequency * global_synonyms_count                  0.076824
global_frequency * rel_aoa                               -0.015842
global_frequency * rel_clustering                         0.071228
global_frequency * rel_frequency                          0.006672
global_frequency * rel_letters_count                      0.039459
global_frequency * rel_orthographic_density               0.042505
global_frequency * rel_synonyms_count                     0.105312
global_letters_count * global_orthographic_density        0.034533
global_letters_count * global_synonyms_count              0.039289
global_letters_count * rel_aoa                            0.000901
global_letters_count * rel_clustering                     0.102283
global_letters_count * rel_frequency                      0.023751
global_letters_count * rel_letters_count                  0.007542
global_letters_count * rel_orthographic_density          -0.031148
global_letters_count * rel_synonyms_count                 0.037289
global_orthographic_density * global_synonyms_count      -0.166835
global_orthographic_density * rel_aoa                     0.083105
global_orthographic_density * rel_clustering              0.095078
global_orthographic_density * rel_frequency              -0.023595
global_orthographic_density * rel_letters_count          -0.014203
global_orthographic_density * rel_orthographic_density    0.071396
global_orthographic_density * rel_synonyms_count          0.183975
global_synonyms_count * rel_aoa                           0.069922
global_synonyms_count * rel_clustering                   -0.076057
global_synonyms_count * rel_frequency                     0.005642
global_synonyms_count * rel_letters_count                -0.069545
global_synonyms_count * rel_orthographic_density          0.019472
global_synonyms_count * rel_synonyms_count                0.054747
rel_aoa * rel_clustering                                  0.012007
rel_aoa * rel_frequency                                   0.005506
rel_aoa * rel_letters_count                              -0.026313
rel_aoa * rel_orthographic_density                       -0.041375
rel_aoa * rel_synonyms_count                             -0.120929
rel_clustering * rel_frequency                           -0.030981
rel_clustering * rel_letters_count                       -0.106487
rel_clustering * rel_orthographic_density                -0.105084
rel_clustering * rel_synonyms_count                      -0.089135
rel_frequency * rel_letters_count                        -0.036932
rel_frequency * rel_orthographic_density                  0.010566
rel_frequency * rel_synonyms_count                       -0.166898
rel_letters_count * rel_orthographic_density              0.037609
rel_letters_count * rel_synonyms_count                    0.002746
rel_orthographic_density * rel_synonyms_count            -0.029139
dtype: float64

----------------------------------------------------------------------
Regressing global letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.11893462623793914

intercept                      2.661659
global_aoa                     0.001554
global_clustering             -0.214465
global_frequency               0.053183
global_letters_count           0.353370
global_orthographic_density   -0.126639
global_synonyms_count         -0.197031
dtype: float64

Regressing global letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13342892868600909

intercept                                             -0.891176
global_aoa                                             0.277141
global_clustering                                     -1.694793
global_frequency                                       0.188428
global_letters_count                                  -0.144452
global_orthographic_density                           -1.502258
global_synonyms_count                                  0.250704
global_aoa * global_clustering                         0.103892
global_aoa * global_frequency                          0.040994
global_aoa * global_letters_count                     -0.006857
global_aoa * global_orthographic_density              -0.020346
global_aoa * global_synonyms_count                     0.037143
global_clustering * global_frequency                   0.128538
global_clustering * global_letters_count              -0.063403
global_clustering * global_orthographic_density       -0.058937
global_clustering * global_synonyms_count              0.067136
global_frequency * global_letters_count                0.022565
global_frequency * global_orthographic_density         0.142429
global_frequency * global_synonyms_count              -0.008593
global_letters_count * global_orthographic_density    -0.017732
global_letters_count * global_synonyms_count          -0.005987
global_orthographic_density * global_synonyms_count   -0.154757
dtype: float64

Regressing rel letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0673340433536288

intercept                     -0.288824
global_aoa                    -0.044285
global_clustering             -0.191994
global_frequency               0.013653
global_letters_count           0.296935
global_orthographic_density   -0.035957
global_synonyms_count         -0.251433
dtype: float64

Regressing rel letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.09032936645711653

intercept                                             -5.709652
global_aoa                                             0.465934
global_clustering                                     -2.166707
global_frequency                                       0.258288
global_letters_count                                  -0.307890
global_orthographic_density                           -2.093874
global_synonyms_count                                 -0.206443
global_aoa * global_clustering                         0.134742
global_aoa * global_frequency                          0.038094
global_aoa * global_letters_count                     -0.015045
global_aoa * global_orthographic_density              -0.011898
global_aoa * global_synonyms_count                     0.045599
global_clustering * global_frequency                   0.172309
global_clustering * global_letters_count              -0.058689
global_clustering * global_orthographic_density       -0.137738
global_clustering * global_synonyms_count             -0.043300
global_frequency * global_letters_count                0.046301
global_frequency * global_orthographic_density         0.173216
global_frequency * global_synonyms_count              -0.055876
global_letters_count * global_orthographic_density    -0.043377
global_letters_count * global_synonyms_count           0.006889
global_orthographic_density * global_synonyms_count   -0.123978
dtype: float64

Regressing global letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1011495666269473

intercept                   5.705693
rel_aoa                    -0.127347
rel_clustering              0.009635
rel_frequency               0.095691
rel_letters_count           0.316939
rel_orthographic_density   -0.273487
rel_synonyms_count         -0.157588
dtype: float64

Regressing global letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.11031136175947187

intercept                                        5.558510
rel_aoa                                         -0.265555
rel_clustering                                   0.050937
rel_frequency                                    0.069948
rel_letters_count                                0.463094
rel_orthographic_density                        -0.216875
rel_synonyms_count                              -0.360640
rel_aoa * rel_clustering                         0.054232
rel_aoa * rel_frequency                         -0.026535
rel_aoa * rel_letters_count                      0.003060
rel_aoa * rel_orthographic_density              -0.040259
rel_aoa * rel_synonyms_count                    -0.015922
rel_clustering * rel_frequency                   0.021383
rel_clustering * rel_letters_count              -0.067408
rel_clustering * rel_orthographic_density       -0.133005
rel_clustering * rel_synonyms_count              0.148612
rel_frequency * rel_letters_count                0.029484
rel_frequency * rel_orthographic_density         0.038161
rel_frequency * rel_synonyms_count               0.004359
rel_letters_count * rel_orthographic_density     0.046552
rel_letters_count * rel_synonyms_count           0.073125
rel_orthographic_density * rel_synonyms_count    0.010250
dtype: float64

Regressing rel letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.1589804354709511

intercept                   1.225975
rel_aoa                    -0.093598
rel_clustering             -0.113382
rel_frequency              -0.098330
rel_letters_count           0.458841
rel_orthographic_density    0.005284
rel_synonyms_count         -0.223634
dtype: float64

Regressing rel letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.17272773306775258

intercept                                        1.056261
rel_aoa                                         -0.184811
rel_clustering                                  -0.044131
rel_frequency                                   -0.153697
rel_letters_count                                0.643782
rel_orthographic_density                         0.128080
rel_synonyms_count                              -0.415045
rel_aoa * rel_clustering                         0.057398
rel_aoa * rel_frequency                         -0.001198
rel_aoa * rel_letters_count                     -0.022968
rel_aoa * rel_orthographic_density              -0.110218
rel_aoa * rel_synonyms_count                     0.016333
rel_clustering * rel_frequency                   0.059628
rel_clustering * rel_letters_count              -0.013642
rel_clustering * rel_orthographic_density       -0.075562
rel_clustering * rel_synonyms_count              0.104857
rel_frequency * rel_letters_count                0.040805
rel_frequency * rel_orthographic_density         0.066329
rel_frequency * rel_synonyms_count               0.003620
rel_letters_count * rel_orthographic_density     0.057023
rel_letters_count * rel_synonyms_count           0.040246
rel_orthographic_density * rel_synonyms_count   -0.064648
dtype: float64

Regressing global letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1372478958183253

intercept                     -0.746463
global_aoa                     0.151132
global_clustering             -0.520666
global_frequency               0.089935
global_letters_count           0.339886
global_orthographic_density   -0.036326
global_synonyms_count         -0.134686
rel_aoa                       -0.226147
rel_clustering                 0.365245
rel_frequency                 -0.051155
rel_letters_count              0.022084
rel_orthographic_density      -0.068576
rel_synonyms_count            -0.036146
dtype: float64

Regressing global letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.21128243578774164

intercept                                                 10.809658
global_aoa                                                 1.689153
global_clustering                                          1.269882
global_frequency                                          -0.220974
global_letters_count                                      -2.260321
global_orthographic_density                               -3.958919
global_synonyms_count                                      4.912576
rel_aoa                                                   -3.254560
rel_clustering                                            -6.020520
rel_frequency                                              1.830121
rel_letters_count                                          1.926089
rel_orthographic_density                                  -1.739529
rel_synonyms_count                                        -9.647303
global_aoa * global_clustering                             0.223617
global_aoa * global_frequency                             -0.026439
global_aoa * global_letters_count                         -0.110539
global_aoa * global_orthographic_density                   0.232445
global_aoa * global_synonyms_count                         0.211391
global_aoa * rel_aoa                                       0.032116
global_aoa * rel_clustering                               -0.133135
global_aoa * rel_frequency                                 0.029994
global_aoa * rel_letters_count                             0.073037
global_aoa * rel_orthographic_density                     -0.236711
global_aoa * rel_synonyms_count                           -0.148389
global_clustering * global_frequency                       0.017042
global_clustering * global_letters_count                  -0.668448
global_clustering * global_orthographic_density           -0.110780
global_clustering * global_synonyms_count                 -0.674150
global_clustering * rel_aoa                               -0.057018
global_clustering * rel_clustering                         0.006309
global_clustering * rel_frequency                          0.177842
global_clustering * rel_letters_count                      0.474004
global_clustering * rel_orthographic_density              -0.460015
global_clustering * rel_synonyms_count                     0.081409
global_frequency * global_letters_count                   -0.002606
global_frequency * global_orthographic_density             0.316419
global_frequency * global_synonyms_count                  -0.355597
global_frequency * rel_aoa                                 0.220921
global_frequency * rel_clustering                          0.278785
global_frequency * rel_frequency                          -0.014964
global_frequency * rel_letters_count                       0.000581
global_frequency * rel_orthographic_density               -0.097768
global_frequency * rel_synonyms_count                      0.374561
global_letters_count * global_orthographic_density        -0.277797
global_letters_count * global_synonyms_count              -0.777345
global_letters_count * rel_aoa                             0.137642
global_letters_count * rel_clustering                      0.743620
global_letters_count * rel_frequency                      -0.026108
global_letters_count * rel_letters_count                   0.022060
global_letters_count * rel_orthographic_density            0.184372
global_letters_count * rel_synonyms_count                  0.968588
global_orthographic_density * global_synonyms_count       -1.373193
global_orthographic_density * rel_aoa                     -0.242470
global_orthographic_density * rel_clustering               0.452319
global_orthographic_density * rel_frequency               -0.360858
global_orthographic_density * rel_letters_count            0.233430
global_orthographic_density * rel_orthographic_density     0.228383
global_orthographic_density * rel_synonyms_count           1.118944
global_synonyms_count * rel_aoa                            0.013698
global_synonyms_count * rel_clustering                     0.640339
global_synonyms_count * rel_frequency                      0.242484
global_synonyms_count * rel_letters_count                  0.428453
global_synonyms_count * rel_orthographic_density           1.307913
global_synonyms_count * rel_synonyms_count                 0.094237
rel_aoa * rel_clustering                                   0.112102
rel_aoa * rel_frequency                                   -0.172969
rel_aoa * rel_letters_count                               -0.137927
rel_aoa * rel_orthographic_density                         0.243642
rel_aoa * rel_synonyms_count                               0.043439
rel_clustering * rel_frequency                            -0.311508
rel_clustering * rel_letters_count                        -0.623239
rel_clustering * rel_orthographic_density                  0.084231
rel_clustering * rel_synonyms_count                       -0.011683
rel_frequency * rel_letters_count                          0.040954
rel_frequency * rel_orthographic_density                   0.316636
rel_frequency * rel_synonyms_count                        -0.263163
rel_letters_count * rel_orthographic_density              -0.015493
rel_letters_count * rel_synonyms_count                    -0.556858
rel_orthographic_density * rel_synonyms_count             -0.995388
dtype: float64

Regressing rel letters_count with 916 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.20948835999690085

intercept                     -1.878268
global_aoa                     0.105065
global_clustering             -0.516309
global_frequency               0.134058
global_letters_count          -0.466556
global_orthographic_density   -0.020983
global_synonyms_count         -0.061023
rel_aoa                       -0.161245
rel_clustering                 0.374093
rel_frequency                 -0.108840
rel_letters_count              0.848522
rel_orthographic_density      -0.086004
rel_synonyms_count            -0.123299
dtype: float64

Regressing rel letters_count with 916 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.27293682772124284

intercept                                                 13.757976
global_aoa                                                 0.512858
global_clustering                                         -0.381773
global_frequency                                          -1.143156
global_letters_count                                      -2.826731
global_orthographic_density                               -5.418355
global_synonyms_count                                      4.916537
rel_aoa                                                   -2.024200
rel_clustering                                            -4.447428
rel_frequency                                              2.116679
rel_letters_count                                          2.187917
rel_orthographic_density                                  -1.861344
rel_synonyms_count                                       -10.294668
global_aoa * global_clustering                             0.168892
global_aoa * global_frequency                              0.028176
global_aoa * global_letters_count                         -0.051313
global_aoa * global_orthographic_density                   0.205414
global_aoa * global_synonyms_count                         0.196261
global_aoa * rel_aoa                                       0.031075
global_aoa * rel_clustering                               -0.064435
global_aoa * rel_frequency                                -0.014930
global_aoa * rel_letters_count                             0.032125
global_aoa * rel_orthographic_density                     -0.192061
global_aoa * rel_synonyms_count                           -0.164352
global_clustering * global_frequency                       0.041339
global_clustering * global_letters_count                  -0.414236
global_clustering * global_orthographic_density            0.094586
global_clustering * global_synonyms_count                 -0.303946
global_clustering * rel_aoa                               -0.021639
global_clustering * rel_clustering                        -0.063454
global_clustering * rel_frequency                          0.123220
global_clustering * rel_letters_count                      0.228752
global_clustering * rel_orthographic_density              -0.733099
global_clustering * rel_synonyms_count                    -0.502433
global_frequency * global_letters_count                    0.067343
global_frequency * global_orthographic_density             0.542887
global_frequency * global_synonyms_count                  -0.195632
global_frequency * rel_aoa                                 0.158097
global_frequency * rel_clustering                          0.207219
global_frequency * rel_frequency                          -0.006688
global_frequency * rel_letters_count                      -0.036448
global_frequency * rel_orthographic_density               -0.239517
global_frequency * rel_synonyms_count                      0.165665
global_letters_count * global_orthographic_density        -0.174394
global_letters_count * global_synonyms_count              -0.655224
global_letters_count * rel_aoa                             0.082064
global_letters_count * rel_clustering                      0.511015
global_letters_count * rel_frequency                      -0.065433
global_letters_count * rel_letters_count                   0.005274
global_letters_count * rel_orthographic_density            0.132525
global_letters_count * rel_synonyms_count                  0.851604
global_orthographic_density * global_synonyms_count       -1.315453
global_orthographic_density * rel_aoa                     -0.227017
global_orthographic_density * rel_clustering               0.204955
global_orthographic_density * rel_frequency               -0.509542
global_orthographic_density * rel_letters_count            0.157286
global_orthographic_density * rel_orthographic_density     0.206581
global_orthographic_density * rel_synonyms_count           1.089993
global_synonyms_count * rel_aoa                           -0.009656
global_synonyms_count * rel_clustering                     0.442619
global_synonyms_count * rel_frequency                      0.174405
global_synonyms_count * rel_letters_count                  0.348831
global_synonyms_count * rel_orthographic_density           1.186960
global_synonyms_count * rel_synonyms_count                 0.047531
rel_aoa * rel_clustering                                   0.023692
rel_aoa * rel_frequency                                   -0.116910
rel_aoa * rel_letters_count                               -0.096824
rel_aoa * rel_orthographic_density                         0.223091
rel_aoa * rel_synonyms_count                               0.118891
rel_clustering * rel_frequency                            -0.229302
rel_clustering * rel_letters_count                        -0.376741
rel_clustering * rel_orthographic_density                  0.352585
rel_clustering * rel_synonyms_count                        0.353467
rel_frequency * rel_letters_count                          0.066092
rel_frequency * rel_orthographic_density                   0.400049
rel_frequency * rel_synonyms_count                        -0.165391
rel_letters_count * rel_orthographic_density              -0.005338
rel_letters_count * rel_synonyms_count                    -0.481752
rel_orthographic_density * rel_synonyms_count             -0.844831
dtype: float64
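The "with interactions" tables above list an intercept, one coefficient per predictor, and one product term per unordered pair of distinct predictors, written `a * b`. The sketch below shows one way to fit such a model with ordinary least squares and compute its R^2. It is an illustration only, not necessarily how this notebook computes these numbers. `regress_with_interactions` is a hypothetical helper, and it assumes the response is stored in its own column, separate from the predictor columns.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def regress_with_interactions(df, target, predictors, interactions=True):
    """OLS of df[target] on predictors, optionally adding pairwise products.

    Hypothetical helper for illustration. Returns (R^2, coefficient Series),
    with interaction terms named 'a * b' like the tables above.
    Assumes `target` is a column distinct from every name in `predictors`.
    """
    # Keep only rows where the response and every predictor are present.
    data = df[[target] + list(predictors)].dropna()
    columns = {'intercept': np.ones(len(data))}
    for name in predictors:
        columns[name] = data[name].to_numpy(dtype=float)
    if interactions:
        # One product term per unordered pair of distinct predictors
        # (no squared terms of a single predictor).
        for a, b in combinations(predictors, 2):
            columns['{} * {}'.format(a, b)] = columns[a] * columns[b]
    X = np.column_stack(list(columns.values()))
    y = data[target].to_numpy(dtype=float)
    # Least-squares coefficients for the design matrix X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    centered = y - y.mean()
    # R^2 = 1 - SS_residual / SS_total.
    r2 = 1 - (residuals @ residuals) / (centered @ centered)
    return r2, pd.Series(beta, index=list(columns.keys()))
```

With 6 predictors this design has 1 + 6 + 15 = 22 coefficients, and with 12 predictors (global and relative features together) it has 1 + 12 + 66 = 79. Those are the row counts of the corresponding "with interactions" tables above.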

----------------------------------------------------------------------
Regressing global synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10042448684255367

intercept                      1.118460
global_aoa                    -0.016433
global_clustering              0.056667
global_frequency              -0.022565
global_letters_count          -0.025240
global_orthographic_density   -0.005193
global_synonyms_count          0.275913
dtype: float64

Regressing global synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1186144401509085

intercept                                              1.929291
global_aoa                                             0.024795
global_clustering                                      0.448772
global_frequency                                       0.033081
global_letters_count                                  -0.019344
global_orthographic_density                           -0.166078
global_synonyms_count                                  0.346970
global_aoa * global_clustering                        -0.011119
global_aoa * global_frequency                         -0.009472
global_aoa * global_letters_count                     -0.002668
global_aoa * global_orthographic_density              -0.009324
global_aoa * global_synonyms_count                     0.024440
global_clustering * global_frequency                  -0.018218
global_clustering * global_letters_count              -0.013368
global_clustering * global_orthographic_density       -0.058241
global_clustering * global_synonyms_count              0.046840
global_frequency * global_letters_count               -0.009040
global_frequency * global_orthographic_density        -0.021231
global_frequency * global_synonyms_count              -0.022857
global_letters_count * global_orthographic_density     0.008463
global_letters_count * global_synonyms_count           0.022102
global_orthographic_density * global_synonyms_count    0.064305
dtype: float64

Regressing rel synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.06681236474208474

intercept                      0.736096
global_aoa                    -0.017944
global_clustering              0.047938
global_frequency              -0.018370
global_letters_count          -0.020680
global_orthographic_density   -0.014086
global_synonyms_count          0.214816
dtype: float64

Regressing rel synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.08752307672995452

intercept                                              1.917442
global_aoa                                             0.000935
global_clustering                                      0.498738
global_frequency                                      -0.044105
global_letters_count                                   0.030331
global_orthographic_density                           -0.093323
global_synonyms_count                                  0.342386
global_aoa * global_clustering                        -0.015574
global_aoa * global_frequency                         -0.005705
global_aoa * global_letters_count                     -0.006766
global_aoa * global_orthographic_density              -0.017220
global_aoa * global_synonyms_count                     0.025160
global_clustering * global_frequency                  -0.025980
global_clustering * global_letters_count              -0.010624
global_clustering * global_orthographic_density       -0.044249
global_clustering * global_synonyms_count              0.051851
global_frequency * global_letters_count               -0.008671
global_frequency * global_orthographic_density        -0.013952
global_frequency * global_synonyms_count              -0.022212
global_letters_count * global_orthographic_density     0.008670
global_letters_count * global_synonyms_count           0.019200
global_orthographic_density * global_synonyms_count    0.050167
dtype: float64

Regressing global synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07902665058600067

intercept                   0.404166
rel_aoa                     0.008059
rel_clustering              0.000674
rel_frequency              -0.022531
rel_letters_count          -0.025694
rel_orthographic_density    0.016530
rel_synonyms_count          0.266772
dtype: float64

Regressing global synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10103624455165283

intercept                                        0.447450
rel_aoa                                         -0.012011
rel_clustering                                  -0.103201
rel_frequency                                   -0.013050
rel_letters_count                               -0.058753
rel_orthographic_density                         0.010903
rel_synonyms_count                               0.151245
rel_aoa * rel_clustering                        -0.015391
rel_aoa * rel_frequency                         -0.005985
rel_aoa * rel_letters_count                      0.010539
rel_aoa * rel_orthographic_density               0.011075
rel_aoa * rel_synonyms_count                     0.013575
rel_clustering * rel_frequency                  -0.017758
rel_clustering * rel_letters_count               0.016715
rel_clustering * rel_orthographic_density       -0.030570
rel_clustering * rel_synonyms_count              0.026832
rel_frequency * rel_letters_count               -0.001009
rel_frequency * rel_orthographic_density        -0.004483
rel_frequency * rel_synonyms_count              -0.000481
rel_letters_count * rel_orthographic_density    -0.006879
rel_letters_count * rel_synonyms_count           0.048317
rel_orthographic_density * rel_synonyms_count    0.040137
dtype: float64

Regressing rel synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.1625214399964614

intercept                   0.071250
rel_aoa                    -0.006205
rel_clustering              0.032927
rel_frequency              -0.017693
rel_letters_count          -0.022112
rel_orthographic_density   -0.005548
rel_synonyms_count          0.390280
dtype: float64

Regressing rel synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.17967873769719145

intercept                                        0.123230
rel_aoa                                         -0.015624
rel_clustering                                  -0.086076
rel_frequency                                    0.001590
rel_letters_count                               -0.048248
rel_orthographic_density                        -0.006207
rel_synonyms_count                               0.370801
rel_aoa * rel_clustering                        -0.008608
rel_aoa * rel_frequency                         -0.001011
rel_aoa * rel_letters_count                      0.004209
rel_aoa * rel_orthographic_density              -0.004440
rel_aoa * rel_synonyms_count                     0.001691
rel_clustering * rel_frequency                  -0.024540
rel_clustering * rel_letters_count               0.017098
rel_clustering * rel_orthographic_density       -0.022379
rel_clustering * rel_synonyms_count              0.027173
rel_frequency * rel_letters_count               -0.004810
rel_frequency * rel_orthographic_density        -0.000536
rel_frequency * rel_synonyms_count               0.016793
rel_letters_count * rel_orthographic_density     0.001720
rel_letters_count * rel_synonyms_count           0.043644
rel_orthographic_density * rel_synonyms_count    0.056845
dtype: float64

Regressing global synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10751830919300921

intercept                      1.680244
global_aoa                    -0.034395
global_clustering              0.138368
global_frequency              -0.017163
global_letters_count          -0.012993
global_orthographic_density   -0.024322
global_synonyms_count          0.226318
rel_aoa                        0.027348
rel_clustering                -0.096243
rel_frequency                 -0.004799
rel_letters_count             -0.014224
rel_orthographic_density       0.017611
rel_synonyms_count             0.051621
dtype: float64

Regressing global synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.202452069964017

intercept                                                 6.277285
global_aoa                                               -0.082010
global_clustering                                         1.653434
global_frequency                                         -0.234125
global_letters_count                                      0.120092
global_orthographic_density                               0.403236
global_synonyms_count                                     2.657418
rel_aoa                                                  -0.197459
rel_clustering                                           -1.131238
rel_frequency                                             0.046798
rel_letters_count                                        -0.181572
rel_orthographic_density                                 -0.535255
rel_synonyms_count                                       -3.356617
global_aoa * global_clustering                           -0.038067
global_aoa * global_frequency                            -0.000852
global_aoa * global_letters_count                        -0.018022
global_aoa * global_orthographic_density                 -0.057579
global_aoa * global_synonyms_count                        0.032144
global_aoa * rel_aoa                                     -0.007541
global_aoa * rel_clustering                               0.065367
global_aoa * rel_frequency                               -0.009839
global_aoa * rel_letters_count                            0.004723
global_aoa * rel_orthographic_density                     0.041545
global_aoa * rel_synonyms_count                           0.031571
global_clustering * global_frequency                     -0.087092
global_clustering * global_letters_count                 -0.074100
global_clustering * global_orthographic_density          -0.070803
global_clustering * global_synonyms_count                 0.454920
global_clustering * rel_aoa                              -0.035254
global_clustering * rel_clustering                       -0.013801
global_clustering * rel_frequency                         0.035834
global_clustering * rel_letters_count                     0.053389
global_clustering * rel_orthographic_density              0.070725
global_clustering * rel_synonyms_count                   -0.470992
global_frequency * global_letters_count                  -0.041721
global_frequency * global_orthographic_density           -0.052219
global_frequency * global_synonyms_count                  0.060649
global_frequency * rel_aoa                               -0.000688
global_frequency * rel_clustering                         0.097306
global_frequency * rel_frequency                          0.008787
global_frequency * rel_letters_count                      0.027382
global_frequency * rel_orthographic_density               0.052787
global_frequency * rel_synonyms_count                    -0.050471
global_letters_count * global_orthographic_density        0.018507
global_letters_count * global_synonyms_count             -0.066097
global_letters_count * rel_aoa                            0.010115
global_letters_count * rel_clustering                    -0.023653
global_letters_count * rel_frequency                      0.024143
global_letters_count * rel_letters_count                  0.006604
global_letters_count * rel_orthographic_density           0.029962
global_letters_count * rel_synonyms_count                 0.114218
global_orthographic_density * global_synonyms_count      -0.049624
global_orthographic_density * rel_aoa                     0.005934
global_orthographic_density * rel_clustering             -0.203340
global_orthographic_density * rel_frequency               0.015382
global_orthographic_density * rel_letters_count           0.045540
global_orthographic_density * rel_orthographic_density    0.016774
global_orthographic_density * rel_synonyms_count          0.071961
global_synonyms_count * rel_aoa                          -0.042913
global_synonyms_count * rel_clustering                   -0.163944
global_synonyms_count * rel_frequency                    -0.003596
global_synonyms_count * rel_letters_count                -0.044079
global_synonyms_count * rel_orthographic_density         -0.114448
global_synonyms_count * rel_synonyms_count                0.146894
rel_aoa * rel_clustering                                 -0.006945
rel_aoa * rel_frequency                                   0.000749
rel_aoa * rel_letters_count                               0.007593
rel_aoa * rel_orthographic_density                       -0.006621
rel_aoa * rel_synonyms_count                              0.007299
rel_clustering * rel_frequency                           -0.056308
rel_clustering * rel_letters_count                        0.038258
rel_clustering * rel_orthographic_density                 0.143307
rel_clustering * rel_synonyms_count                       0.178131
rel_frequency * rel_letters_count                        -0.011160
rel_frequency * rel_orthographic_density                 -0.037004
rel_frequency * rel_synonyms_count                       -0.017922
rel_letters_count * rel_orthographic_density             -0.073525
rel_letters_count * rel_synonyms_count                    0.052824
rel_orthographic_density * rel_synonyms_count             0.186942
dtype: float64

Regressing rel synonyms_count with 889 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.22994780335014553

intercept                      1.097911
global_aoa                    -0.028101
global_clustering              0.090084
global_frequency              -0.001688
global_letters_count          -0.009560
global_orthographic_density   -0.036291
global_synonyms_count         -0.550803
rel_aoa                        0.018992
rel_clustering                -0.058381
rel_frequency                 -0.013813
rel_letters_count             -0.014109
rel_orthographic_density       0.016536
rel_synonyms_count             0.903072
dtype: float64

Regressing rel synonyms_count with 889 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.3161390243349952

intercept                                                 4.488409
global_aoa                                               -0.146732
global_clustering                                         1.323777
global_frequency                                         -0.183990
global_letters_count                                      0.189616
global_orthographic_density                               0.252426
global_synonyms_count                                     3.257709
rel_aoa                                                   0.011382
rel_clustering                                           -1.001935
rel_frequency                                            -0.137191
rel_letters_count                                        -0.225775
rel_orthographic_density                                 -0.316094
rel_synonyms_count                                       -3.828458
global_aoa * global_clustering                           -0.038671
global_aoa * global_frequency                             0.003681
global_aoa * global_letters_count                        -0.013937
global_aoa * global_orthographic_density                 -0.048629
global_aoa * global_synonyms_count                        0.007208
global_aoa * rel_aoa                                     -0.004828
global_aoa * rel_clustering                               0.062589
global_aoa * rel_frequency                               -0.009948
global_aoa * rel_letters_count                            0.001387
global_aoa * rel_orthographic_density                     0.034947
global_aoa * rel_synonyms_count                           0.056679
global_clustering * global_frequency                     -0.072008
global_clustering * global_letters_count                 -0.059664
global_clustering * global_orthographic_density          -0.069568
global_clustering * global_synonyms_count                 0.485435
global_clustering * rel_aoa                              -0.023180
global_clustering * rel_clustering                       -0.013917
global_clustering * rel_frequency                         0.009685
global_clustering * rel_letters_count                     0.050122
global_clustering * rel_orthographic_density              0.097545
global_clustering * rel_synonyms_count                   -0.483712
global_frequency * global_letters_count                  -0.040311
global_frequency * global_orthographic_density           -0.037870
global_frequency * global_synonyms_count                 -0.002472
global_frequency * rel_aoa                               -0.010034
global_frequency * rel_clustering                         0.087109
global_frequency * rel_frequency                          0.007049
global_frequency * rel_letters_count                      0.032827
global_frequency * rel_orthographic_density               0.049168
global_frequency * rel_synonyms_count                     0.027356
global_letters_count * global_orthographic_density        0.018908
global_letters_count * global_synonyms_count             -0.100441
global_letters_count * rel_aoa                            0.001184
global_letters_count * rel_clustering                    -0.021407
global_letters_count * rel_frequency                      0.028350
global_letters_count * rel_letters_count                  0.005638
global_letters_count * rel_orthographic_density           0.027073
global_letters_count * rel_synonyms_count                 0.131275
global_orthographic_density * global_synonyms_count      -0.135898
global_orthographic_density * rel_aoa                    -0.001870
global_orthographic_density * rel_clustering             -0.167045
global_orthographic_density * rel_frequency               0.018379
global_orthographic_density * rel_letters_count           0.028839
global_orthographic_density * rel_orthographic_density    0.011034
global_orthographic_density * rel_synonyms_count          0.159991
global_synonyms_count * rel_aoa                          -0.035002
global_synonyms_count * rel_clustering                   -0.191632
global_synonyms_count * rel_frequency                     0.074431
global_synonyms_count * rel_letters_count                 0.031253
global_synonyms_count * rel_orthographic_density          0.010029
global_synonyms_count * rel_synonyms_count                0.156376
rel_aoa * rel_clustering                                 -0.012629
rel_aoa * rel_frequency                                   0.009445
rel_aoa * rel_letters_count                               0.012384
rel_aoa * rel_orthographic_density                        0.000301
rel_aoa * rel_synonyms_count                             -0.005897
rel_clustering * rel_frequency                           -0.036824
rel_clustering * rel_letters_count                        0.017000
rel_clustering * rel_orthographic_density                 0.070728
rel_clustering * rel_synonyms_count                       0.180894
rel_frequency * rel_letters_count                        -0.021863
rel_frequency * rel_orthographic_density                 -0.038126
rel_frequency * rel_synonyms_count                       -0.096612
rel_letters_count * rel_orthographic_density             -0.058824
rel_letters_count * rel_synonyms_count                   -0.012880
rel_orthographic_density * rel_synonyms_count             0.045070
dtype: float64

----------------------------------------------------------------------
Regressing global orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15294432393977342

intercept                      1.389638
global_aoa                    -0.022396
global_clustering              0.059286
global_frequency              -0.016475
global_letters_count          -0.000518
global_orthographic_density    0.370822
global_synonyms_count          0.060897
dtype: float64

Regressing global orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1794535006477466

intercept                                              2.984795
global_aoa                                            -0.285495
global_clustering                                      0.263010
global_frequency                                       0.139289
global_letters_count                                  -0.231212
global_orthographic_density                            0.396127
global_synonyms_count                                 -0.008330
global_aoa * global_clustering                        -0.029841
global_aoa * global_frequency                         -0.012940
global_aoa * global_letters_count                      0.021264
global_aoa * global_orthographic_density               0.063268
global_aoa * global_synonyms_count                    -0.008465
global_clustering * global_frequency                   0.004008
global_clustering * global_letters_count              -0.009054
global_clustering * global_orthographic_density       -0.000837
global_clustering * global_synonyms_count              0.097686
global_frequency * global_letters_count                0.001364
global_frequency * global_orthographic_density        -0.055815
global_frequency * global_synonyms_count               0.058592
global_letters_count * global_orthographic_density     0.005233
global_letters_count * global_synonyms_count           0.010724
global_orthographic_density * global_synonyms_count    0.076847
dtype: float64

Regressing rel orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10685045241508107

intercept                     -0.960264
global_aoa                    -0.007991
global_clustering              0.048104
global_frequency              -0.012897
global_letters_count           0.002444
global_orthographic_density    0.315998
global_synonyms_count          0.075558
dtype: float64

Regressing rel orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12955775863584473

intercept                                              1.815165
global_aoa                                            -0.328840
global_clustering                                      0.431548
global_frequency                                       0.076455
global_letters_count                                  -0.256779
global_orthographic_density                            0.272421
global_synonyms_count                                 -0.402417
global_aoa * global_clustering                        -0.033006
global_aoa * global_frequency                         -0.007550
global_aoa * global_letters_count                      0.021022
global_aoa * global_orthographic_density               0.056228
global_aoa * global_synonyms_count                     0.004293
global_clustering * global_frequency                  -0.003510
global_clustering * global_letters_count              -0.022335
global_clustering * global_orthographic_density       -0.000346
global_clustering * global_synonyms_count              0.050738
global_frequency * global_letters_count               -0.003907
global_frequency * global_orthographic_density        -0.042223
global_frequency * global_synonyms_count               0.066150
global_letters_count * global_orthographic_density     0.004968
global_letters_count * global_synonyms_count           0.011620
global_orthographic_density * global_synonyms_count    0.058473
dtype: float64

Regressing global orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1270025900227072

intercept                   1.499778
rel_aoa                     0.003814
rel_clustering             -0.012240
rel_frequency              -0.036377
rel_letters_count           0.001519
rel_orthographic_density    0.387401
rel_synonyms_count          0.074083
dtype: float64

Regressing global orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1497927624285017

intercept                                        1.464158
rel_aoa                                          0.076406
rel_clustering                                   0.121422
rel_frequency                                   -0.064022
rel_letters_count                               -0.021754
rel_orthographic_density                         0.325563
rel_synonyms_count                               0.320381
rel_aoa * rel_clustering                        -0.000414
rel_aoa * rel_frequency                          0.019801
rel_aoa * rel_letters_count                      0.003247
rel_aoa * rel_orthographic_density               0.034049
rel_aoa * rel_synonyms_count                     0.032332
rel_clustering * rel_frequency                   0.021428
rel_clustering * rel_letters_count              -0.014296
rel_clustering * rel_orthographic_density        0.061043
rel_clustering * rel_synonyms_count              0.025791
rel_frequency * rel_letters_count               -0.006310
rel_frequency * rel_orthographic_density        -0.018863
rel_frequency * rel_synonyms_count               0.035506
rel_letters_count * rel_orthographic_density    -0.010936
rel_letters_count * rel_synonyms_count          -0.005951
rel_orthographic_density * rel_synonyms_count    0.156248
dtype: float64

Regressing rel orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.19383144080833659

intercept                  -0.558994
rel_aoa                     0.013672
rel_clustering              0.009845
rel_frequency               0.009977
rel_letters_count           0.016025
rel_orthographic_density    0.477126
rel_synonyms_count          0.063489
dtype: float64

Regressing rel orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.21226142953754412

intercept                                       -0.532145
rel_aoa                                          0.095787
rel_clustering                                   0.124187
rel_frequency                                    0.021845
rel_letters_count                               -0.003071
rel_orthographic_density                         0.440496
rel_synonyms_count                               0.255595
rel_aoa * rel_clustering                        -0.005346
rel_aoa * rel_frequency                          0.017663
rel_aoa * rel_letters_count                      0.002690
rel_aoa * rel_orthographic_density               0.042791
rel_aoa * rel_synonyms_count                     0.027037
rel_clustering * rel_frequency                   0.015650
rel_clustering * rel_letters_count              -0.007917
rel_clustering * rel_orthographic_density        0.067460
rel_clustering * rel_synonyms_count             -0.007929
rel_frequency * rel_letters_count               -0.011497
rel_frequency * rel_orthographic_density         0.003648
rel_frequency * rel_synonyms_count               0.024152
rel_letters_count * rel_orthographic_density     0.002768
rel_letters_count * rel_synonyms_count          -0.018015
rel_orthographic_density * rel_synonyms_count    0.093321
dtype: float64

Regressing global orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.16037982522758643

intercept                      2.645949
global_aoa                    -0.039782
global_clustering              0.180760
global_frequency              -0.015800
global_letters_count          -0.054819
global_orthographic_density    0.293061
global_synonyms_count         -0.025198
rel_aoa                        0.032058
rel_clustering                -0.141010
rel_frequency                  0.004052
rel_letters_count              0.057945
rel_orthographic_density       0.077940
rel_synonyms_count             0.097261
dtype: float64

Regressing global orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.24417879890640037

intercept                                                 2.345250
global_aoa                                               -1.261278
global_clustering                                         0.444813
global_frequency                                          0.334965
global_letters_count                                      0.420818
global_orthographic_density                               1.790315
global_synonyms_count                                     0.372837
rel_aoa                                                   0.907114
rel_clustering                                            1.016405
rel_frequency                                            -0.477074
rel_letters_count                                        -0.489788
rel_orthographic_density                                 -1.239832
rel_synonyms_count                                        3.269005
global_aoa * global_clustering                           -0.158403
global_aoa * global_frequency                            -0.019773
global_aoa * global_letters_count                         0.063291
global_aoa * global_orthographic_density                  0.092054
global_aoa * global_synonyms_count                        0.004573
global_aoa * rel_aoa                                     -0.002568
global_aoa * rel_clustering                               0.136206
global_aoa * rel_frequency                                0.008338
global_aoa * rel_letters_count                           -0.020419
global_aoa * rel_orthographic_density                    -0.018693
global_aoa * rel_synonyms_count                          -0.030251
global_clustering * global_frequency                     -0.010407
global_clustering * global_letters_count                  0.153911
global_clustering * global_orthographic_density          -0.019336
global_clustering * global_synonyms_count                 0.288101
global_clustering * rel_aoa                               0.042976
global_clustering * rel_clustering                       -0.095283
global_clustering * rel_frequency                        -0.077794
global_clustering * rel_letters_count                    -0.016557
global_clustering * rel_orthographic_density              0.250429
global_clustering * rel_synonyms_count                    0.150824
global_frequency * global_letters_count                   0.018833
global_frequency * global_orthographic_density           -0.182504
global_frequency * global_synonyms_count                  0.030625
global_frequency * rel_aoa                               -0.034943
global_frequency * rel_clustering                        -0.024214
global_frequency * rel_frequency                          0.017358
global_frequency * rel_letters_count                      0.050095
global_frequency * rel_orthographic_density               0.203789
global_frequency * rel_synonyms_count                    -0.052597
global_letters_count * global_orthographic_density       -0.081449
global_letters_count * global_synonyms_count              0.112989
global_letters_count * rel_aoa                           -0.041670
global_letters_count * rel_clustering                    -0.246043
global_letters_count * rel_frequency                     -0.082298
global_letters_count * rel_letters_count                 -0.005769
global_letters_count * rel_orthographic_density           0.191874
global_letters_count * rel_synonyms_count                -0.195995
global_orthographic_density * global_synonyms_count       0.393318
global_orthographic_density * rel_aoa                     0.006090
global_orthographic_density * rel_clustering             -0.392886
global_orthographic_density * rel_frequency               0.034878
global_orthographic_density * rel_letters_count           0.035156
global_orthographic_density * rel_orthographic_density    0.021020
global_orthographic_density * rel_synonyms_count         -0.376423
global_synonyms_count * rel_aoa                           0.012350
global_synonyms_count * rel_clustering                   -0.511478
global_synonyms_count * rel_frequency                     0.148836
global_synonyms_count * rel_letters_count                -0.097857
global_synonyms_count * rel_orthographic_density         -0.653673
global_synonyms_count * rel_synonyms_count               -0.037132
rel_aoa * rel_clustering                                 -0.046227
rel_aoa * rel_frequency                                   0.042795
rel_aoa * rel_letters_count                               0.023636
rel_aoa * rel_orthographic_density                       -0.008122
rel_aoa * rel_synonyms_count                             -0.004292
rel_clustering * rel_frequency                            0.121340
rel_clustering * rel_letters_count                        0.112671
rel_clustering * rel_orthographic_density                 0.169859
rel_clustering * rel_synonyms_count                       0.132120
rel_frequency * rel_letters_count                         0.021281
rel_frequency * rel_orthographic_density                 -0.107609
rel_frequency * rel_synonyms_count                       -0.088811
rel_letters_count * rel_orthographic_density             -0.134948
rel_letters_count * rel_synonyms_count                    0.180621
rel_orthographic_density * rel_synonyms_count             0.739206
dtype: float64

Regressing rel orthographic_density with 791 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.2313984902402778

intercept                      1.945900
global_aoa                    -0.022475
global_clustering              0.191132
global_frequency              -0.009526
global_letters_count          -0.035257
global_orthographic_density   -0.451137
global_synonyms_count         -0.018577
rel_aoa                        0.021174
rel_clustering                -0.148172
rel_frequency                  0.003756
rel_letters_count              0.034523
rel_orthographic_density       0.875194
rel_synonyms_count             0.079765
dtype: float64

Regressing rel orthographic_density with 791 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.30541027066376647

intercept                                                 3.395187
global_aoa                                               -0.880207
global_clustering                                         0.630962
global_frequency                                          0.159685
global_letters_count                                      0.244618
global_orthographic_density                               0.031450
global_synonyms_count                                     1.038311
rel_aoa                                                   0.665010
rel_clustering                                            0.509829
rel_frequency                                            -0.414779
rel_letters_count                                        -0.086999
rel_orthographic_density                                  1.085306
rel_synonyms_count                                        2.545158
global_aoa * global_clustering                           -0.136863
global_aoa * global_frequency                            -0.024146
global_aoa * global_letters_count                         0.053849
global_aoa * global_orthographic_density                  0.031720
global_aoa * global_synonyms_count                       -0.019239
global_aoa * rel_aoa                                     -0.004044
global_aoa * rel_clustering                               0.119319
global_aoa * rel_frequency                                0.013883
global_aoa * rel_letters_count                           -0.016501
global_aoa * rel_orthographic_density                     0.031886
global_aoa * rel_synonyms_count                          -0.001861
global_clustering * global_frequency                     -0.024120
global_clustering * global_letters_count                  0.154085
global_clustering * global_orthographic_density          -0.090676
global_clustering * global_synonyms_count                 0.258754
global_clustering * rel_aoa                               0.035119
global_clustering * rel_clustering                       -0.085555
global_clustering * rel_frequency                        -0.066733
global_clustering * rel_letters_count                     0.006864
global_clustering * rel_orthographic_density              0.381509
global_clustering * rel_synonyms_count                    0.168075
global_frequency * global_letters_count                   0.024176
global_frequency * global_orthographic_density           -0.135203
global_frequency * global_synonyms_count                  0.027050
global_frequency * rel_aoa                               -0.028233
global_frequency * rel_clustering                        -0.008498
global_frequency * rel_frequency                          0.015380
global_frequency * rel_letters_count                      0.036611
global_frequency * rel_orthographic_density               0.144282
global_frequency * rel_synonyms_count                    -0.063741
global_letters_count * global_orthographic_density        0.033829
global_letters_count * global_synonyms_count              0.055924
global_letters_count * rel_aoa                           -0.033300
global_letters_count * rel_clustering                    -0.238540
global_letters_count * rel_frequency                     -0.074711
global_letters_count * rel_letters_count                 -0.004227
global_letters_count * rel_orthographic_density           0.091155
global_letters_count * rel_synonyms_count                -0.129442
global_orthographic_density * global_synonyms_count       0.210162
global_orthographic_density * rel_aoa                     0.038740
global_orthographic_density * rel_clustering             -0.196261
global_orthographic_density * rel_frequency               0.018424
global_orthographic_density * rel_letters_count          -0.069823
global_orthographic_density * rel_orthographic_density    0.046083
global_orthographic_density * rel_synonyms_count         -0.164077
global_synonyms_count * rel_aoa                           0.029471
global_synonyms_count * rel_clustering                   -0.394876
global_synonyms_count * rel_frequency                     0.148757
global_synonyms_count * rel_letters_count                -0.100100
global_synonyms_count * rel_orthographic_density         -0.530759
global_synonyms_count * rel_synonyms_count               -0.055579
rel_aoa * rel_clustering                                 -0.040840
rel_aoa * rel_frequency                                   0.033788
rel_aoa * rel_letters_count                               0.015984
rel_aoa * rel_orthographic_density                       -0.043267
rel_aoa * rel_synonyms_count                             -0.025803
rel_clustering * rel_frequency                            0.114756
rel_clustering * rel_letters_count                        0.083880
rel_clustering * rel_orthographic_density                -0.085056
rel_clustering * rel_synonyms_count                       0.016540
rel_frequency * rel_letters_count                         0.013101
rel_frequency * rel_orthographic_density                 -0.083984
rel_frequency * rel_synonyms_count                       -0.088971
rel_letters_count * rel_orthographic_density             -0.047035
rel_letters_count * rel_synonyms_count                    0.152491
rel_orthographic_density * rel_synonyms_count             0.545814
dtype: float64
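Each block above comes in two fits of the same target (a feature's `global` or `rel` value, presumably the sentence-relative one computed with `sentence_relative='median'`). The first fit uses main effects only. The second adds one product term per pair of distinct predictors, which is why 6 predictors yield 15 interaction terms and 12 yield 66.

The code that produced this output is not shown here. The sketch below is a minimal NumPy version that assumes plain ordinary least squares with an intercept. `fit_ols` is a hypothetical helper, not the notebook's own function.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y, names, interactions=False):
    """Least-squares fit of y ~ intercept + X (+ pairwise products).

    X: (n_samples, n_features) array, names: one label per column of X.
    Returns (dict mapping term label to coefficient, R^2 of the fit).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [X[:, i] for i in range(X.shape[1])]
    labels = ['intercept'] + list(names)
    if interactions:
        # One product per unordered pair of distinct predictors, labelled
        # like 'global_aoa * rel_aoa' (no squared terms).
        for i, j in combinations(range(X.shape[1]), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return dict(zip(labels, coef)), 1 - ss_res / ss_tot


# Usage on synthetic data with a known interaction effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
coef_main, r2_main = fit_ols(X, y, ['a', 'b'])
coef_int, r2_int = fit_ols(X, y, ['a', 'b'], interactions=True)
for label, value in coef_int.items():
    print('{:<10}{: .6f}'.format(label, value))
print('R^2 without / with interactions:', r2_main, r2_int)
```

Adding interaction terms can only raise the in-sample R^2, as in the jumps above (for example from 0.23 to 0.32 for `rel synonyms_count`). The gain should therefore be weighed against the larger number of coefficients fitted on the same 889 or 791 measures.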

	aoa	betweenness	clustering	degree	frequency	letters_count	orthographic_density	pagerank	phonemes_count	phonological_density	syllables_count	synonyms_count
Component-0	-0.495983	0.215799	-0.076269	0.216337	0.225148	-0.459887	0.192592	0.249676	-0.435075	0.270379	-0.175185	0.010494
Component-1	0.374635	-0.405534	0.140260	-0.299004	-0.261974	-0.411493	0.139908	-0.292163	-0.429618	0.186461	-0.154245	0.009605
Component-2	0.656900	0.628446	-0.103045	0.216350	-0.184906	-0.126580	0.031159	0.222963	-0.052857	0.091591	-0.056113	-0.029739
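Each row of these tables is a principal-component loading vector: a unit-length set of weights over the listed features. By the usual convention, `Component-0` is the direction of greatest variance. The notebook imports scikit-learn's `PCA` and sets `N_COMPONENTS = 3`.

The sketch below recovers the same kind of loadings, up to sign, from the SVD of the centered feature matrix. It only centers the features. Whether the notebook also standardizes them before fitting is not shown here, so that part is an assumption. `pca_loadings` is a hypothetical helper.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Return (components, explained_variance_ratio) for data matrix X.

    Rows of `components` are unit-norm loading vectors (components x
    features), ordered by decreasing variance. Signs are arbitrary: a
    component and its negation describe the same axis.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    return vt[:n_components], ratio[:n_components]


# Usage: three features with very different spreads.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
components, ratio = pca_loadings(X, n_components=2)
print(np.round(components, 3))
print('explained variance ratio:', np.round(ratio, 3))
```

Because loadings share one scale (each row has norm 1), features with large absolute weights in a row, such as `aoa`, `letters_count` and `phonemes_count` in `Component-0`, dominate that component.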

	aoa	frequency	letters_count
Component-0	-0.747822	0.383865	-0.541673
Component-1	0.379124	-0.422859	-0.823077