Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

First build our data.



In [3]:

    
model = Model(time=Time.discrete, source=Source.majority, past=Past.all, durl=Durl.exclude_past, max_distance=2)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data









    



Got 13005 substitutions for model Model(time=Time.discrete, source=Source.majority, past=Past.all, durl=Durl.exclude_past, max_distance=2)






    



100% (13005 of 13005) |####################| Elapsed Time: 0:03:13 Time: 0:03:13

Compute cluster averages (so as not to overestimate confidence intervals) and crop data so that we have acceptable CIs.



In [4]:

    
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    ['source', 'source_rel', 'destination', 'destination_rel', 'feature',
     'h0', 'h0_rel', 'h0n', 'h0n_rel'].mean()
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan

Prepare feature ordering.



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot about features

For a feature $\phi$, plot:

$\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
$\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

We also plot these values relative to the sentence average, i.e.:

$\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
$\nu_{\phi, r}^0$ (which is the average feature value minus the sentence average), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi, r}^{00}$ (which is the average feature value for synonyms of the source word minus the sentence average), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

Those values are plotted with fixed-width bins, then quantile bins, with absolute feature values, then with relative-to-sentence features.



In [6]:

    
def print_significance(name, bins, h0, h0n, values):
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = (indices).sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            for j, alpha in enumerate([.05, .01, .001]):
                cis[i, j] = (stats.t.ppf(1 - alpha/2, n - 1)
                             * values[indices].std(ddof=1)
                             / np.sqrt(n - 1))

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Rescale limits if we're touching H0 or H00.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    elif h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)



In [8]:

    
def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws={}):
    g = sb.FacetGrid(data=data[data[feature_field]
                               .map(lambda f: f in features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, size=3)
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    scale = abs(h0s.mean())
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws={}):
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Bins of distribution of appeared global feature values

For each feature $\phi$, we plot the variation upon substitution as explained above



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *   |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *   | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | ns. | ns. |
H_00 | *   | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *   |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | ns. | ns. |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | **  | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | **  | *** | *** |
H_00 | ns. | ns. | ns. | **  |

Then plot $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature (i.e. the measured bias) to see how they compare



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *   |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *   |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | ns. | ns. |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | **  | ns. |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantiles of distribution of appeared global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | ns. | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |
------------------
H_0  | *** | ns. |
H_00 | *   | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | **  | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | **  | *** | **  |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | ns. | *** |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | *   | *   |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | **  | *** | **  |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | ns. | *** |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Bins of distribution of appeared sentence-relative values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *   | ns. | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | ns. | ns. |
H_00 | ns. | ns. | ns. | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | ns. | ns. |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | *   | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | *** | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | ns. | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | **  | *** | *** |
H_00 | ns. | ns. | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | *** | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | **  | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | ns. | ns. |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | ns. | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantiles of distribution of appeared sentence-relative values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | ns. | ns. | ns. | **  |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | ns. | ns. | ns. | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *   | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | ns. | *** | **  |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | ns. | ns. | *** |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | ns. | ns. | *   |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | **  | *** | *** |
H_00 | ns. | ns. | *   | ns. |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | ns. | ns. | *** |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *   | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | ns. | ns. | *   |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We'd like to see what happens between absolute and relative feature values, i.e. how do their effects interact. Especially, we want to know who wins between cognitive bias, attraction to sentence average, or attraction to global feature average.

To do this we plot the general direction (arrows) and strength (color) of where destination words are given a particular absolute/relative source feature couple. I.e., for a given absolute feature value and relative feature value, if this word were to be substituted, where would it go in this (absolute, relative) space?

The interesting thing in these plots is the attraction front, where all arrows point to and join. We're interested in:

its slope
its shape (e.g. several slope regimes?)
its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)

First, here's our plotting function. (Note we set the arrow size to something that turns out to be huge here, but gives normal sizes in the figures saves. There must be some dpi scaling problem with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Compute binning.
    bin_count = 4
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # Compute bin values.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            u_values[y, x] = (
                dest[(x_bins == x) & (y_bins == y)] -
                source[(x_bins == x) & (y_bins == y)]
            ).mean()
            v_values[y, x] = (
                dest_rel[(x_bins == x) & (y_bins == y)] -
                source_rel[(x_bins == x) & (y_bins == y)]
            ).mean()
            strength[y, x] = np.sqrt(
                (dest[(x_bins == x) & (y_bins == y)] - 
                 source[(x_bins == x) & (y_bins == y)]) ** 2 +
                (dest_rel[(x_bins == x) & (y_bins == y)] - 
                 source_rel[(x_bins == x) & (y_bins == y)]) ** 2
            ).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])

Here are the plots for all features



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features we expose in the paper



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute PCA on feature variations (note: on variations, not on features directly), and show the evolution of the first three components upon substitution.

CAVEAT: the PCA is computed on variations where all features are defined. This greatly reduces the number of words included (and also the number of substitutions -- see below for real values, but you should know it's drastic). This also has an effect on the computation of $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are computed using words for which all features are defined. This, again, hugely reduces the number of words taken into account, changing the values under the null hypotheses.

4.1 On all the features

Compute the actual PCA



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show 
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 10 components.

Those explain the following variance:
[ 0.54592671  0.16271588  0.07991678  0.07409271  0.03373343  0.03044308
  0.01830013  0.01713546  0.01652317  0.0089701 ]

We're plotting variation for the first 3 components:






    Out[30]:






  
    
      
      aoa
      betweenness
      clustering
      degree
      frequency
      letters_count
      orthographic_density
      pagerank
      phonemes_count
      phonological_density
      syllables_count
      synonyms_count
    
  
  
    
      Component-0
      -0.427764
      0.283759
      -0.089472
      0.245792
      0.229850
      -0.449654
      0.221950
      0.282619
      -0.426400
      0.276820
      -0.159614
      -0.001335
    
    
      Component-1
      0.345614
      -0.376443
      0.140480
      -0.297578
      -0.268808
      -0.425018
      0.152353
      -0.315924
      -0.428158
      0.203662
      -0.172741
      -0.000870
    
    
      Component-2
      0.741466
      0.238557
      -0.143220
      0.096703
      0.586037
      -0.082116
      0.002898
      0.050598
      -0.018035
      0.075508
      0.014736
      -0.066779

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (13005 of 13005) |####################| Elapsed Time: 0:02:35 Time: 0:02:35

Compute cluster averages (so as not to overestimate confidence intervals).



In [32]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()

Plot the actual variations of components (see the caveat section below)



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | **  | *** | *** |
H_00 | ns. | *   | *   | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *   |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | **  | *   | ns. |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show 
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 2 components.

Those explain the following variance:
[ 0.67708467  0.18316591]







    Out[35]:






  
    
      
      aoa
      frequency
      letters_count
    
  
  
    
      Component-0
      -0.727905
      0.367678
      -0.578764
    
    
      Component-1
      0.438702
      -0.398972
      -0.805209

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (13005 of 13005) |####################| Elapsed Time: 0:01:23 Time: 0:01:23

Compute cluster averages (so as not to overestimate confidence intervals).



In [37]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()

Plot the actual variations of components



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

4.3 CAVEAT: reduction of the numbers of words and substitutions

As explained above, this PCA analysis can only use words for which all the features are defined (in this case, the features listed in relevant_features). So note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Compute the number of words that have all PAPER_FEATURES defined.
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 1403 (cluster-unique) substitutions, but the PCA is in fact computed on 1140 of them (those where all features are defined).

The way $\mathcal{H}_0$ and $\mathcal{H}_{00}$ are computed makes them also affected by this.

5 Interactions between features (by Anova)

Some useful variables first.



In [40]:

    
cuts = [('fixed bins', pd.cut)]#, ('quantiles', pd.qcut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'

Now for each feature, assess if it has an interaction with the other features' destination value. We look at this for all pairs of features, with all pairs of global/sentence-relative value and types of binning (fixed width/quantiles). So it's a lot of answers.

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means non-significative.



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
   ** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Now for each feature, look at its interaction with the other features' variation (i.e. destination - source). Same drill, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Ok, so this can go on for a long time, and I'm not going to look at interactions with this lens (meaning at interaction of couples of features with another feature's destination values).

6 Regression



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError
    if not isinstance(dest_rel, bool):
        raise ValueError
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    destination = variations[variations.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
        coeffs = pd.Series(index=['intercept'], data=[linreg.intercept_])\
            .append(coeffs)
    with pd.option_context('display.max_rows', 999):
        print(coeffs)



In [45]:

    
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()









    



----------------------------------------------------------------------
Regressing global frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.03883431038989871

intercept                      6.227704
global_aoa                    -0.003804
global_clustering             -0.038345
global_frequency               0.252878
global_letters_count          -0.001389
global_orthographic_density   -0.027351
global_synonyms_count          0.073722
dtype: float64

Regressing global frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.057806765532471216

intercept                                              17.550110
global_aoa                                             -0.367816
global_clustering                                       1.008030
global_frequency                                       -0.127717
global_letters_count                                   -1.559120
global_orthographic_density                            -1.330171
global_synonyms_count                                   0.715316
global_aoa * global_clustering                          0.046186
global_aoa * global_frequency                           0.034441
global_aoa * global_letters_count                       0.044526
global_aoa * global_orthographic_density                0.048599
global_aoa * global_synonyms_count                     -0.017816
global_clustering * global_frequency                   -0.017868
global_clustering * global_letters_count               -0.205191
global_clustering * global_orthographic_density        -0.047678
global_clustering * global_synonyms_count               0.309552
global_frequency * global_letters_count                -0.002074
global_frequency * global_orthographic_density          0.035287
global_frequency * global_synonyms_count                0.072847
global_letters_count * global_orthographic_density      0.067309
global_letters_count * global_synonyms_count            0.057775
global_orthographic_density * global_synonyms_count     0.210041
dtype: float64

Regressing rel frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.020984170616111575

intercept                     -5.902189
global_aoa                    -0.008201
global_clustering             -0.038582
global_frequency               0.209319
global_letters_count           0.055548
global_orthographic_density    0.003120
global_synonyms_count          0.153445
dtype: float64

Regressing rel frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.03694615555497027

intercept                                              3.800049
global_aoa                                            -0.120132
global_clustering                                      0.349071
global_frequency                                      -0.473627
global_letters_count                                  -1.403295
global_orthographic_density                           -1.175463
global_synonyms_count                                 -0.344067
global_aoa * global_clustering                         0.101136
global_aoa * global_frequency                          0.047066
global_aoa * global_letters_count                      0.038247
global_aoa * global_orthographic_density               0.037577
global_aoa * global_synonyms_count                     0.036786
global_clustering * global_frequency                  -0.018190
global_clustering * global_letters_count              -0.175851
global_clustering * global_orthographic_density        0.060571
global_clustering * global_synonyms_count              0.208762
global_frequency * global_letters_count                0.013500
global_frequency * global_orthographic_density         0.107179
global_frequency * global_synonyms_count               0.143914
global_letters_count * global_orthographic_density     0.055974
global_letters_count * global_synonyms_count           0.001988
global_orthographic_density * global_synonyms_count    0.134340
dtype: float64

Regressing global frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.030478475606443678

intercept                   9.348503
rel_aoa                     0.018116
rel_clustering             -0.087900
rel_frequency               0.183868
rel_letters_count          -0.020464
rel_orthographic_density   -0.032435
rel_synonyms_count          0.051222
dtype: float64

Regressing global frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.044807032632144805

intercept                                        9.259479
rel_aoa                                          0.001921
rel_clustering                                   0.056817
rel_frequency                                    0.174944
rel_letters_count                                0.055623
rel_orthographic_density                        -0.089660
rel_synonyms_count                               0.366414
rel_aoa * rel_clustering                         0.006759
rel_aoa * rel_frequency                          0.028779
rel_aoa * rel_letters_count                      0.039126
rel_aoa * rel_orthographic_density              -0.004192
rel_aoa * rel_synonyms_count                     0.068206
rel_clustering * rel_frequency                  -0.024476
rel_clustering * rel_letters_count              -0.096484
rel_clustering * rel_orthographic_density        0.007595
rel_clustering * rel_synonyms_count              0.275556
rel_frequency * rel_letters_count                0.001864
rel_frequency * rel_orthographic_density         0.011990
rel_frequency * rel_synonyms_count               0.089319
rel_letters_count * rel_orthographic_density     0.055959
rel_letters_count * rel_synonyms_count           0.038896
rel_orthographic_density * rel_synonyms_count    0.297203
dtype: float64

Regressing rel frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.1854653901912695

intercept                  -1.888848
rel_aoa                     0.009229
rel_clustering              0.098295
rel_frequency               0.539168
rel_letters_count          -0.115209
rel_orthographic_density   -0.205407
rel_synonyms_count          0.100126
dtype: float64

Regressing rel frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.20240978047301872

intercept                                       -1.950431
rel_aoa                                         -0.068632
rel_clustering                                   0.184790
rel_frequency                                    0.567039
rel_letters_count                               -0.024150
rel_orthographic_density                        -0.329138
rel_synonyms_count                               0.368101
rel_aoa * rel_clustering                        -0.031893
rel_aoa * rel_frequency                         -0.005458
rel_aoa * rel_letters_count                      0.062225
rel_aoa * rel_orthographic_density               0.064317
rel_aoa * rel_synonyms_count                     0.163009
rel_clustering * rel_frequency                  -0.058216
rel_clustering * rel_letters_count              -0.152819
rel_clustering * rel_orthographic_density       -0.130186
rel_clustering * rel_synonyms_count              0.057439
rel_frequency * rel_letters_count               -0.000790
rel_frequency * rel_orthographic_density        -0.000060
rel_frequency * rel_synonyms_count               0.058909
rel_letters_count * rel_orthographic_density     0.068271
rel_letters_count * rel_synonyms_count           0.038186
rel_orthographic_density * rel_synonyms_count    0.341498
dtype: float64

Regressing global frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.0428771043920293

intercept                      5.861051
global_aoa                    -0.041226
global_clustering              0.014068
global_frequency               0.246753
global_letters_count           0.152023
global_orthographic_density    0.163269
global_synonyms_count          0.040220
rel_aoa                        0.048385
rel_clustering                -0.061585
rel_frequency                  0.005601
rel_letters_count             -0.162556
rel_orthographic_density      -0.203757
rel_synonyms_count             0.022484
dtype: float64

Regressing global frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.10786546057701296

intercept                                                 -8.283918
global_aoa                                                -0.651653
global_clustering                                         -6.534466
global_frequency                                           0.153426
global_letters_count                                      -2.110388
global_orthographic_density                                2.302207
global_synonyms_count                                     -1.201243
rel_aoa                                                    0.320996
rel_clustering                                            12.196393
rel_frequency                                              1.240890
rel_letters_count                                          0.418017
rel_orthographic_density                                  -3.888966
rel_synonyms_count                                         5.634133
global_aoa * global_clustering                             0.357758
global_aoa * global_frequency                              0.169991
global_aoa * global_letters_count                          0.194960
global_aoa * global_orthographic_density                   0.106055
global_aoa * global_synonyms_count                        -0.147413
global_aoa * rel_aoa                                      -0.009383
global_aoa * rel_clustering                               -0.399476
global_aoa * rel_frequency                                -0.141146
global_aoa * rel_letters_count                            -0.174440
global_aoa * rel_orthographic_density                     -0.039304
global_aoa * rel_synonyms_count                           -0.086710
global_clustering * global_frequency                       0.210765
global_clustering * global_letters_count                  -0.014722
global_clustering * global_orthographic_density            1.115464
global_clustering * global_synonyms_count                  1.042714
global_clustering * rel_aoa                               -0.268292
global_clustering * rel_clustering                         0.157039
global_clustering * rel_frequency                         -0.108029
global_clustering * rel_letters_count                     -0.207416
global_clustering * rel_orthographic_density              -0.835252
global_clustering * rel_synonyms_count                    -0.432768
global_frequency * global_letters_count                    0.017977
global_frequency * global_orthographic_density             0.179737
global_frequency * global_synonyms_count                   0.194735
global_frequency * rel_aoa                                -0.156215
global_frequency * rel_clustering                         -0.374214
global_frequency * rel_frequency                          -0.023776
global_frequency * rel_letters_count                       0.000612
global_frequency * rel_orthographic_density                0.034070
global_frequency * rel_synonyms_count                     -0.161062
global_letters_count * global_orthographic_density         0.315824
global_letters_count * global_synonyms_count               0.766195
global_letters_count * rel_aoa                            -0.076141
global_letters_count * rel_clustering                     -0.342621
global_letters_count * rel_frequency                      -0.024648
global_letters_count * rel_letters_count                   0.013673
global_letters_count * rel_orthographic_density           -0.168319
global_letters_count * rel_synonyms_count                 -0.692632
global_orthographic_density * global_synonyms_count        1.332956
global_orthographic_density * rel_aoa                      0.015043
global_orthographic_density * rel_clustering              -1.536462
global_orthographic_density * rel_frequency               -0.355812
global_orthographic_density * rel_letters_count           -0.355602
global_orthographic_density * rel_orthographic_density    -0.133100
global_orthographic_density * rel_synonyms_count          -1.238159
global_synonyms_count * rel_aoa                            0.080540
global_synonyms_count * rel_clustering                    -1.433101
global_synonyms_count * rel_frequency                     -0.301467
global_synonyms_count * rel_letters_count                 -0.608119
global_synonyms_count * rel_orthographic_density          -1.297383
global_synonyms_count * rel_synonyms_count                 0.080390
rel_aoa * rel_clustering                                   0.307628
rel_aoa * rel_frequency                                    0.116948
rel_aoa * rel_letters_count                                0.103515
rel_aoa * rel_orthographic_density                        -0.055714
rel_aoa * rel_synonyms_count                               0.152732
rel_clustering * rel_frequency                             0.239522
rel_clustering * rel_letters_count                         0.337707
rel_clustering * rel_orthographic_density                  1.151154
rel_clustering * rel_synonyms_count                        1.153531
rel_frequency * rel_letters_count                          0.020012
rel_frequency * rel_orthographic_density                   0.230253
rel_frequency * rel_synonyms_count                         0.356388
rel_letters_count * rel_orthographic_density               0.255984
rel_letters_count * rel_synonyms_count                     0.606940
rel_orthographic_density * rel_synonyms_count              1.463237
dtype: float64

Regressing rel frequency with 893 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.27436559412593

intercept                      4.769943
global_aoa                    -0.036050
global_clustering              0.084806
global_frequency              -0.653443
global_letters_count           0.204173
global_orthographic_density    0.263861
global_synonyms_count          0.006191
rel_aoa                        0.023954
rel_clustering                -0.085728
rel_frequency                  0.938897
rel_letters_count             -0.203354
rel_orthographic_density      -0.279502
rel_synonyms_count             0.054442
dtype: float64

Regressing rel frequency with 893 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.32413930366156374

intercept                                                -16.182734
global_aoa                                                -0.494135
global_clustering                                         -6.895821
global_frequency                                          -0.260953
global_letters_count                                      -1.742548
global_orthographic_density                                3.620956
global_synonyms_count                                     -2.396803
rel_aoa                                                    0.442412
rel_clustering                                            12.346582
rel_frequency                                              1.659248
rel_letters_count                                          0.483089
rel_orthographic_density                                  -4.370762
rel_synonyms_count                                         7.513351
global_aoa * global_clustering                             0.313155
global_aoa * global_frequency                              0.145207
global_aoa * global_letters_count                          0.194739
global_aoa * global_orthographic_density                   0.043127
global_aoa * global_synonyms_count                        -0.109717
global_aoa * rel_aoa                                       0.000100
global_aoa * rel_clustering                               -0.336956
global_aoa * rel_frequency                                -0.112719
global_aoa * rel_letters_count                            -0.186146
global_aoa * rel_orthographic_density                     -0.000482
global_aoa * rel_synonyms_count                           -0.106666
global_clustering * global_frequency                       0.246571
global_clustering * global_letters_count                   0.049899
global_clustering * global_orthographic_density            1.154684
global_clustering * global_synonyms_count                  0.847992
global_clustering * rel_aoa                               -0.215600
global_clustering * rel_clustering                         0.147578
global_clustering * rel_frequency                         -0.105655
global_clustering * rel_letters_count                     -0.190579
global_clustering * rel_orthographic_density              -0.799190
global_clustering * rel_synonyms_count                    -0.201862
global_frequency * global_letters_count                    0.026416
global_frequency * global_orthographic_density             0.130441
global_frequency * global_synonyms_count                   0.229738
global_frequency * rel_aoa                                -0.140343
global_frequency * rel_clustering                         -0.407194
global_frequency * rel_frequency                           0.006665
global_frequency * rel_letters_count                       0.010086
global_frequency * rel_orthographic_density                0.061388
global_frequency * rel_synonyms_count                     -0.191735
global_letters_count * global_orthographic_density         0.342472
global_letters_count * global_synonyms_count               0.793410
global_letters_count * rel_aoa                            -0.098736
global_letters_count * rel_clustering                     -0.386998
global_letters_count * rel_frequency                      -0.012527
global_letters_count * rel_letters_count                   0.017488
global_letters_count * rel_orthographic_density           -0.166291
global_letters_count * rel_synonyms_count                 -0.771699
global_orthographic_density * global_synonyms_count        0.987627
global_orthographic_density * rel_aoa                      0.048375
global_orthographic_density * rel_clustering              -1.522881
global_orthographic_density * rel_frequency               -0.316631
global_orthographic_density * rel_letters_count           -0.391633
global_orthographic_density * rel_orthographic_density    -0.106326
global_orthographic_density * rel_synonyms_count          -1.023658
global_synonyms_count * rel_aoa                            0.049530
global_synonyms_count * rel_clustering                    -1.329979
global_synonyms_count * rel_frequency                     -0.357453
global_synonyms_count * rel_letters_count                 -0.685334
global_synonyms_count * rel_orthographic_density          -0.991481
global_synonyms_count * rel_synonyms_count                 0.076974
rel_aoa * rel_clustering                                   0.264512
rel_aoa * rel_frequency                                    0.111329
rel_aoa * rel_letters_count                                0.121264
rel_aoa * rel_orthographic_density                        -0.080944
rel_aoa * rel_synonyms_count                               0.199249
rel_clustering * rel_frequency                             0.269024
rel_clustering * rel_letters_count                         0.297758
rel_clustering * rel_orthographic_density                  1.078902
rel_clustering * rel_synonyms_count                        0.987623
rel_frequency * rel_letters_count                         -0.001984
rel_frequency * rel_orthographic_density                   0.231845
rel_frequency * rel_synonyms_count                         0.438596
rel_letters_count * rel_orthographic_density               0.275296
rel_letters_count * rel_synonyms_count                     0.720942
rel_orthographic_density * rel_synonyms_count              1.288746
dtype: float64

----------------------------------------------------------------------
Regressing global aoa with 825 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.05167968342844542

intercept                      6.774031
global_aoa                     0.192096
global_clustering             -0.020666
global_frequency              -0.106313
global_letters_count           0.022344
global_orthographic_density   -0.113999
global_synonyms_count         -0.141283
dtype: float64

Regressing global aoa with 825 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.07886337389418241

intercept                                             -10.910688
global_aoa                                             -0.371006
global_clustering                                      -2.830794
global_frequency                                        1.173993
global_letters_count                                    2.013364
global_orthographic_density                            -0.210553
global_synonyms_count                                  -2.809577
global_aoa * global_clustering                         -0.044455
global_aoa * global_frequency                           0.015943
global_aoa * global_letters_count                       0.020570
global_aoa * global_orthographic_density                0.000862
global_aoa * global_synonyms_count                      0.089052
global_clustering * global_frequency                    0.159768
global_clustering * global_letters_count                0.304420
global_clustering * global_orthographic_density        -0.036183
global_clustering * global_synonyms_count              -0.473565
global_frequency * global_letters_count                -0.051416
global_frequency * global_orthographic_density         -0.059585
global_frequency * global_synonyms_count               -0.125966
global_letters_count * global_orthographic_density      0.056024
global_letters_count * global_synonyms_count            0.078550
global_orthographic_density * global_synonyms_count     0.041241
dtype: float64

Regressing rel aoa with 825 measures, no interactions
           ^^^^^^^
R^2 = 0.01065309007711901

intercept                      1.529576
global_aoa                     0.078289
global_clustering             -0.015233
global_frequency              -0.065955
global_letters_count          -0.034699
global_orthographic_density   -0.062693
global_synonyms_count         -0.093779
dtype: float64

Regressing rel aoa with 825 measures, with interactions
           ^^^^^^^
R^2 = 0.03531480900482509

intercept                                             -9.711974
global_aoa                                            -0.076672
global_clustering                                     -0.944022
global_frequency                                       1.177022
global_letters_count                                   1.246230
global_orthographic_density                           -0.542277
global_synonyms_count                                 -0.993977
global_aoa * global_clustering                        -0.049473
global_aoa * global_frequency                         -0.008949
global_aoa * global_letters_count                     -0.017862
global_aoa * global_orthographic_density               0.027955
global_aoa * global_synonyms_count                     0.017908
global_clustering * global_frequency                   0.109115
global_clustering * global_letters_count               0.125004
global_clustering * global_orthographic_density       -0.268034
global_clustering * global_synonyms_count             -0.354148
global_frequency * global_letters_count               -0.046902
global_frequency * global_orthographic_density        -0.128840
global_frequency * global_synonyms_count              -0.165451
global_letters_count * global_orthographic_density    -0.025379
global_letters_count * global_synonyms_count           0.037965
global_orthographic_density * global_synonyms_count    0.020970
dtype: float64

Regressing global aoa with 825 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.017611575312532657

intercept                   6.556888
rel_aoa                    -0.012326
rel_clustering              0.069465
rel_frequency              -0.048028
rel_letters_count           0.010711
rel_orthographic_density   -0.293941
rel_synonyms_count         -0.231003
dtype: float64

Regressing global aoa with 825 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.04901687531706511

intercept                                        6.614042
rel_aoa                                         -0.130086
rel_clustering                                  -0.107764
rel_frequency                                   -0.066691
rel_letters_count                               -0.119746
rel_orthographic_density                        -0.701687
rel_synonyms_count                              -0.726171
rel_aoa * rel_clustering                        -0.122602
rel_aoa * rel_frequency                         -0.049732
rel_aoa * rel_letters_count                      0.036529
rel_aoa * rel_orthographic_density               0.049934
rel_aoa * rel_synonyms_count                    -0.008429
rel_clustering * rel_frequency                   0.124963
rel_clustering * rel_letters_count               0.265186
rel_clustering * rel_orthographic_density       -0.055414
rel_clustering * rel_synonyms_count             -0.647406
rel_frequency * rel_letters_count               -0.020892
rel_frequency * rel_orthographic_density        -0.080671
rel_frequency * rel_synonyms_count              -0.194558
rel_letters_count * rel_orthographic_density     0.064015
rel_letters_count * rel_synonyms_count           0.008421
rel_orthographic_density * rel_synonyms_count   -0.115456
dtype: float64

Regressing rel aoa with 825 measures, no interactions
           ^^^^^^^
R^2 = 0.10973761347072786

intercept                   0.946381
rel_aoa                     0.407765
rel_clustering             -0.190375
rel_frequency              -0.073958
rel_letters_count          -0.053147
rel_orthographic_density    0.052091
rel_synonyms_count         -0.212326
dtype: float64

Regressing rel aoa with 825 measures, with interactions
           ^^^^^^^
R^2 = 0.1340116403683329

intercept                                        1.055286
rel_aoa                                          0.514427
rel_clustering                                  -0.394704
rel_frequency                                   -0.121819
rel_letters_count                               -0.078933
rel_orthographic_density                         0.170844
rel_synonyms_count                              -0.624637
rel_aoa * rel_clustering                        -0.127989
rel_aoa * rel_frequency                          0.016741
rel_aoa * rel_letters_count                      0.027865
rel_aoa * rel_orthographic_density               0.076164
rel_aoa * rel_synonyms_count                    -0.023667
rel_clustering * rel_frequency                   0.096880
rel_clustering * rel_letters_count               0.234518
rel_clustering * rel_orthographic_density       -0.026362
rel_clustering * rel_synonyms_count             -0.274772
rel_frequency * rel_letters_count                0.041398
rel_frequency * rel_orthographic_density         0.085051
rel_frequency * rel_synonyms_count              -0.138232
rel_letters_count * rel_orthographic_density     0.002187
rel_letters_count * rel_synonyms_count           0.043377
rel_orthographic_density * rel_synonyms_count   -0.018524
dtype: float64

Regressing global aoa with 825 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.07563930510864758

intercept                      6.270946
global_aoa                     0.388899
global_clustering              0.137202
global_frequency              -0.074067
global_letters_count           0.083945
global_orthographic_density   -0.200737
global_synonyms_count          0.145446
rel_aoa                       -0.298233
rel_clustering                -0.196951
rel_frequency                 -0.033353
rel_letters_count             -0.061115
rel_orthographic_density       0.149718
rel_synonyms_count            -0.344279
dtype: float64

Regressing global aoa with 825 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.14969029172274195

intercept                                                 37.573608
global_aoa                                                 1.947689
global_clustering                                          8.837261
global_frequency                                           0.802847
global_letters_count                                      -1.819434
global_orthographic_density                               -5.661687
global_synonyms_count                                    -17.478146
rel_aoa                                                   -0.005842
rel_clustering                                           -12.733944
rel_frequency                                              1.056407
rel_letters_count                                          3.206897
rel_orthographic_density                                   5.913852
rel_synonyms_count                                         6.379006
global_aoa * global_clustering                            -0.102631
global_aoa * global_frequency                             -0.189510
global_aoa * global_letters_count                         -0.157872
global_aoa * global_orthographic_density                   0.265554
global_aoa * global_synonyms_count                        -0.170892
global_aoa * rel_aoa                                       0.017833
global_aoa * rel_clustering                                0.179103
global_aoa * rel_frequency                                 0.193873
global_aoa * rel_letters_count                             0.166109
global_aoa * rel_orthographic_density                     -0.292479
global_aoa * rel_synonyms_count                            0.366182
global_clustering * global_frequency                      -0.113142
global_clustering * global_letters_count                  -0.305502
global_clustering * global_orthographic_density           -2.176750
global_clustering * global_synonyms_count                 -1.820726
global_clustering * rel_aoa                                0.177669
global_clustering * rel_clustering                        -0.077146
global_clustering * rel_frequency                          0.346480
global_clustering * rel_letters_count                      0.384258
global_clustering * rel_orthographic_density               1.815960
global_clustering * rel_synonyms_count                     1.321681
global_frequency * global_letters_count                    0.166849
global_frequency * global_orthographic_density            -0.544822
global_frequency * global_synonyms_count                   0.251729
global_frequency * rel_aoa                                 0.165546
global_frequency * rel_clustering                          0.319056
global_frequency * rel_frequency                           0.017324
global_frequency * rel_letters_count                      -0.195363
global_frequency * rel_orthographic_density                0.404756
global_frequency * rel_synonyms_count                      0.047468
global_letters_count * global_orthographic_density        -0.736107
global_letters_count * global_synonyms_count               0.904172
global_letters_count * rel_aoa                            -0.102403
global_letters_count * rel_clustering                      0.555552
global_letters_count * rel_frequency                      -0.246152
global_letters_count * rel_letters_count                  -0.033485
global_letters_count * rel_orthographic_density            0.464703
global_letters_count * rel_synonyms_count                 -0.258385
global_orthographic_density * global_synonyms_count        0.035663
global_orthographic_density * rel_aoa                     -0.332021
global_orthographic_density * rel_clustering               1.832276
global_orthographic_density * rel_frequency                0.417392
global_orthographic_density * rel_letters_count            0.424970
global_orthographic_density * rel_orthographic_density    -0.023635
global_orthographic_density * rel_synonyms_count           0.177087
global_synonyms_count * rel_aoa                            0.153804
global_synonyms_count * rel_clustering                     2.252145
global_synonyms_count * rel_frequency                     -0.427222
global_synonyms_count * rel_letters_count                 -0.748517
global_synonyms_count * rel_orthographic_density           0.047996
global_synonyms_count * rel_synonyms_count                -0.003562
rel_aoa * rel_clustering                                  -0.335452
rel_aoa * rel_frequency                                   -0.138170
rel_aoa * rel_letters_count                                0.105803
rel_aoa * rel_orthographic_density                         0.375883
rel_aoa * rel_synonyms_count                              -0.283114
rel_clustering * rel_frequency                            -0.393745
rel_clustering * rel_letters_count                        -0.227279
rel_clustering * rel_orthographic_density                 -1.324962
rel_clustering * rel_synonyms_count                       -2.231570
rel_frequency * rel_letters_count                          0.202053
rel_frequency * rel_orthographic_density                  -0.361959
rel_frequency * rel_synonyms_count                         0.005588
rel_letters_count * rel_orthographic_density              -0.191143
rel_letters_count * rel_synonyms_count                     0.252907
rel_orthographic_density * rel_synonyms_count             -0.011812
dtype: float64

Regressing rel aoa with 825 measures, no interactions
           ^^^^^^^
R^2 = 0.15066065145370222

intercept                      3.911236
global_aoa                    -0.352379
global_clustering              0.164890
global_frequency               0.018675
global_letters_count           0.051665
global_orthographic_density   -0.267105
global_synonyms_count          0.259643
rel_aoa                        0.654624
rel_clustering                -0.204865
rel_frequency                 -0.087535
rel_letters_count             -0.078585
rel_orthographic_density       0.165232
rel_synonyms_count            -0.462299
dtype: float64

Regressing rel aoa with 825 measures, with interactions
           ^^^^^^^
R^2 = 0.22537192842698428

intercept                                                 16.049504
global_aoa                                                -0.499592
global_clustering                                          4.327838
global_frequency                                           1.546063
global_letters_count                                       0.768541
global_orthographic_density                               -5.650797
global_synonyms_count                                    -15.820375
rel_aoa                                                    2.066632
rel_clustering                                            -5.829447
rel_frequency                                              0.054067
rel_letters_count                                          1.225197
rel_orthographic_density                                   6.981506
rel_synonyms_count                                         7.212878
global_aoa * global_clustering                            -0.115561
global_aoa * global_frequency                             -0.063932
global_aoa * global_letters_count                         -0.129966
global_aoa * global_orthographic_density                   0.328959
global_aoa * global_synonyms_count                        -0.226091
global_aoa * rel_aoa                                      -0.017905
global_aoa * rel_clustering                                0.178835
global_aoa * rel_frequency                                 0.083232
global_aoa * rel_letters_count                             0.068811
global_aoa * rel_orthographic_density                     -0.483773
global_aoa * rel_synonyms_count                            0.386405
global_clustering * global_frequency                       0.058728
global_clustering * global_letters_count                  -0.083820
global_clustering * global_orthographic_density           -1.494476
global_clustering * global_synonyms_count                 -1.677795
global_clustering * rel_aoa                                0.193876
global_clustering * rel_clustering                         0.028306
global_clustering * rel_frequency                          0.073603
global_clustering * rel_letters_count                      0.073975
global_clustering * rel_orthographic_density               1.153067
global_clustering * rel_synonyms_count                     1.325146
global_frequency * global_letters_count                   -0.006383
global_frequency * global_orthographic_density            -0.340827
global_frequency * global_synonyms_count                   0.161759
global_frequency * rel_aoa                                 0.059163
global_frequency * rel_clustering                          0.004561
global_frequency * rel_frequency                           0.013010
global_frequency * rel_letters_count                      -0.081944
global_frequency * rel_orthographic_density                0.143936
global_frequency * rel_synonyms_count                      0.044722
global_letters_count * global_orthographic_density        -0.469928
global_letters_count * global_synonyms_count               0.952631
global_letters_count * rel_aoa                            -0.037051
global_letters_count * rel_clustering                      0.321498
global_letters_count * rel_frequency                      -0.137615
global_letters_count * rel_letters_count                  -0.030889
global_letters_count * rel_orthographic_density            0.361831
global_letters_count * rel_synonyms_count                 -0.457782
global_orthographic_density * global_synonyms_count        0.369600
global_orthographic_density * rel_aoa                     -0.287747
global_orthographic_density * rel_clustering               1.174778
global_orthographic_density * rel_frequency                0.136617
global_orthographic_density * rel_letters_count            0.122660
global_orthographic_density * rel_orthographic_density    -0.160333
global_orthographic_density * rel_synonyms_count          -0.135858
global_synonyms_count * rel_aoa                            0.221721
global_synonyms_count * rel_clustering                     2.014748
global_synonyms_count * rel_frequency                     -0.310279
global_synonyms_count * rel_letters_count                 -0.789793
global_synonyms_count * rel_orthographic_density          -0.129918
global_synonyms_count * rel_synonyms_count                 0.041096
rel_aoa * rel_clustering                                  -0.366513
rel_aoa * rel_frequency                                   -0.066716
rel_aoa * rel_letters_count                                0.096716
rel_aoa * rel_orthographic_density                         0.433011
rel_aoa * rel_synonyms_count                              -0.354558
rel_clustering * rel_frequency                            -0.010314
rel_clustering * rel_letters_count                         0.045146
rel_clustering * rel_orthographic_density                 -0.745228
rel_clustering * rel_synonyms_count                       -2.118415
rel_frequency * rel_letters_count                          0.180351
rel_frequency * rel_orthographic_density                  -0.015191
rel_frequency * rel_synonyms_count                        -0.097261
rel_letters_count * rel_orthographic_density              -0.109673
rel_letters_count * rel_synonyms_count                     0.420193
rel_orthographic_density * rel_synonyms_count              0.057979
dtype: float64

----------------------------------------------------------------------
Regressing global clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.012486902385596022

intercept                     -5.401308
global_aoa                     0.011570
global_clustering              0.066260
global_frequency              -0.020001
global_letters_count           0.011954
global_orthographic_density    0.024889
global_synonyms_count          0.043253
dtype: float64

Regressing global clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.04582066799449569

intercept                                             -5.826107
global_aoa                                            -0.005346
global_clustering                                      0.129814
global_frequency                                      -0.241613
global_letters_count                                   0.461692
global_orthographic_density                            0.536182
global_synonyms_count                                 -0.026537
global_aoa * global_clustering                        -0.024592
global_aoa * global_frequency                         -0.001375
global_aoa * global_letters_count                     -0.016797
global_aoa * global_orthographic_density              -0.011666
global_aoa * global_synonyms_count                     0.007060
global_clustering * global_frequency                  -0.028669
global_clustering * global_letters_count               0.059731
global_clustering * global_orthographic_density        0.040316
global_clustering * global_synonyms_count             -0.182546
global_frequency * global_letters_count                0.009088
global_frequency * global_orthographic_density         0.007197
global_frequency * global_synonyms_count              -0.006484
global_letters_count * global_orthographic_density    -0.036641
global_letters_count * global_synonyms_count          -0.114493
global_orthographic_density * global_synonyms_count   -0.230724
dtype: float64

Regressing rel clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.011512443785469116

intercept                      0.447936
global_aoa                     0.018796
global_clustering              0.047487
global_frequency              -0.010147
global_letters_count           0.023744
global_orthographic_density    0.048742
global_synonyms_count          0.007187
dtype: float64

Regressing rel clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.03414038972817324

intercept                                              0.577862
global_aoa                                            -0.029117
global_clustering                                      0.163141
global_frequency                                      -0.111349
global_letters_count                                   0.154647
global_orthographic_density                            0.588918
global_synonyms_count                                  0.251560
global_aoa * global_clustering                        -0.024326
global_aoa * global_frequency                         -0.001674
global_aoa * global_letters_count                     -0.009530
global_aoa * global_orthographic_density              -0.015460
global_aoa * global_synonyms_count                    -0.013253
global_clustering * global_frequency                  -0.011178
global_clustering * global_letters_count               0.016800
global_clustering * global_orthographic_density        0.046526
global_clustering * global_synonyms_count             -0.126242
global_frequency * global_letters_count                0.007997
global_frequency * global_orthographic_density         0.000128
global_frequency * global_synonyms_count              -0.014084
global_letters_count * global_orthographic_density    -0.017824
global_letters_count * global_synonyms_count          -0.080280
global_orthographic_density * global_synonyms_count   -0.224323
dtype: float64

Regressing global clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0011641681114800129

intercept                  -5.781522
rel_aoa                     0.007020
rel_clustering              0.029615
rel_frequency               0.009454
rel_letters_count           0.002007
rel_orthographic_density    0.011292
rel_synonyms_count          0.006853
dtype: float64

Regressing global clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.03376713654776564

intercept                                       -5.838143
rel_aoa                                         -0.005692
rel_clustering                                   0.073236
rel_frequency                                   -0.041759
rel_letters_count                                0.001892
rel_orthographic_density                         0.045865
rel_synonyms_count                              -0.111403
rel_aoa * rel_clustering                        -0.011716
rel_aoa * rel_frequency                         -0.009639
rel_aoa * rel_letters_count                     -0.005501
rel_aoa * rel_orthographic_density               0.002121
rel_aoa * rel_synonyms_count                     0.011693
rel_clustering * rel_frequency                   0.037837
rel_clustering * rel_letters_count               0.046614
rel_clustering * rel_orthographic_density        0.030275
rel_clustering * rel_synonyms_count             -0.258159
rel_frequency * rel_letters_count                0.018387
rel_frequency * rel_orthographic_density        -0.000610
rel_frequency * rel_synonyms_count              -0.054595
rel_letters_count * rel_orthographic_density    -0.028617
rel_letters_count * rel_synonyms_count          -0.067341
rel_orthographic_density * rel_synonyms_count   -0.215344
dtype: float64

Regressing rel clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.07177698742437533

intercept                   0.355003
rel_aoa                    -0.010391
rel_clustering              0.277202
rel_frequency               0.022742
rel_letters_count           0.024891
rel_orthographic_density    0.031174
rel_synonyms_count          0.030790
dtype: float64

Regressing rel clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.0954791253818783

intercept                                        0.293279
rel_aoa                                         -0.032785
rel_clustering                                   0.365749
rel_frequency                                   -0.015089
rel_letters_count                                0.031930
rel_orthographic_density                         0.034881
rel_synonyms_count                              -0.116565
rel_aoa * rel_clustering                         0.002884
rel_aoa * rel_frequency                         -0.008406
rel_aoa * rel_letters_count                     -0.008504
rel_aoa * rel_orthographic_density              -0.014989
rel_aoa * rel_synonyms_count                     0.017795
rel_clustering * rel_frequency                   0.031607
rel_clustering * rel_letters_count               0.005499
rel_clustering * rel_orthographic_density        0.008667
rel_clustering * rel_synonyms_count             -0.203551
rel_frequency * rel_letters_count                0.009611
rel_frequency * rel_orthographic_density        -0.010872
rel_frequency * rel_synonyms_count              -0.044092
rel_letters_count * rel_orthographic_density    -0.017259
rel_letters_count * rel_synonyms_count          -0.040265
rel_orthographic_density * rel_synonyms_count   -0.181720
dtype: float64

Regressing global clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.03427485864421742

intercept                     -3.864290
global_aoa                     0.018279
global_clustering              0.191525
global_frequency              -0.068745
global_letters_count           0.008971
global_orthographic_density   -0.050351
global_synonyms_count          0.111187
rel_aoa                       -0.011321
rel_clustering                -0.145920
rel_frequency                  0.052848
rel_letters_count              0.004490
rel_orthographic_density       0.093968
rel_synonyms_count            -0.089950
dtype: float64

Regressing global clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.16467629453640253

intercept                                                -3.105931
global_aoa                                                0.736708
global_clustering                                         2.589866
global_frequency                                         -0.333054
global_letters_count                                      2.538438
global_orthographic_density                               0.057794
global_synonyms_count                                    -3.590216
rel_aoa                                                  -0.397110
rel_clustering                                           -3.357636
rel_frequency                                            -0.520579
rel_letters_count                                        -1.866890
rel_orthographic_density                                  0.414645
rel_synonyms_count                                        0.136938
global_aoa * global_clustering                           -0.111243
global_aoa * global_frequency                            -0.062564
global_aoa * global_letters_count                        -0.132580
global_aoa * global_orthographic_density                 -0.057815
global_aoa * global_synonyms_count                        0.035745
global_aoa * rel_aoa                                      0.014497
global_aoa * rel_clustering                               0.149379
global_aoa * rel_frequency                                0.086309
global_aoa * rel_letters_count                            0.113701
global_aoa * rel_orthographic_density                     0.010697
global_aoa * rel_synonyms_count                           0.016958
global_clustering * global_frequency                     -0.157200
global_clustering * global_letters_count                  0.176879
global_clustering * global_orthographic_density          -0.346237
global_clustering * global_synonyms_count                -0.258869
global_clustering * rel_aoa                              -0.001425
global_clustering * rel_clustering                       -0.121763
global_clustering * rel_frequency                         0.062983
global_clustering * rel_letters_count                    -0.114431
global_clustering * rel_orthographic_density              0.298831
global_clustering * rel_synonyms_count                    0.045016
global_frequency * global_letters_count                  -0.025490
global_frequency * global_orthographic_density           -0.096130
global_frequency * global_synonyms_count                  0.164105
global_frequency * rel_aoa                                0.009765
global_frequency * rel_clustering                         0.116324
global_frequency * rel_frequency                          0.022099
global_frequency * rel_letters_count                      0.032929
global_frequency * rel_orthographic_density               0.070635
global_frequency * rel_synonyms_count                    -0.051724
global_letters_count * global_orthographic_density       -0.146731
global_letters_count * global_synonyms_count             -0.055315
global_letters_count * rel_aoa                            0.024042
global_letters_count * rel_clustering                    -0.079055
global_letters_count * rel_frequency                      0.011398
global_letters_count * rel_letters_count                  0.000352
global_letters_count * rel_orthographic_density           0.150056
global_letters_count * rel_synonyms_count                 0.140258
global_orthographic_density * global_synonyms_count       0.147547
global_orthographic_density * rel_aoa                     0.041263
global_orthographic_density * rel_clustering              0.344321
global_orthographic_density * rel_frequency               0.104254
global_orthographic_density * rel_letters_count           0.064444
global_orthographic_density * rel_orthographic_density    0.000468
global_orthographic_density * rel_synonyms_count         -0.076643
global_synonyms_count * rel_aoa                           0.012450
global_synonyms_count * rel_clustering                    0.222698
global_synonyms_count * rel_frequency                    -0.172996
global_synonyms_count * rel_letters_count                -0.175078
global_synonyms_count * rel_orthographic_density         -0.415712
global_synonyms_count * rel_synonyms_count               -0.001740
rel_aoa * rel_clustering                                 -0.033944
rel_aoa * rel_frequency                                  -0.020370
rel_aoa * rel_letters_count                              -0.031238
rel_aoa * rel_orthographic_density                       -0.016196
rel_aoa * rel_synonyms_count                             -0.049377
rel_clustering * rel_frequency                           -0.038586
rel_clustering * rel_letters_count                        0.085725
rel_clustering * rel_orthographic_density                -0.202571
rel_clustering * rel_synonyms_count                      -0.303012
rel_frequency * rel_letters_count                        -0.006524
rel_frequency * rel_orthographic_density                 -0.078402
rel_frequency * rel_synonyms_count                        0.032401
rel_letters_count * rel_orthographic_density             -0.106215
rel_letters_count * rel_synonyms_count                    0.020234
rel_orthographic_density * rel_synonyms_count             0.146338
dtype: float64

Regressing rel clustering with 735 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.1555439804158163

intercept                     -2.387070
global_aoa                     0.030749
global_clustering             -0.548523
global_frequency              -0.064664
global_letters_count          -0.016383
global_orthographic_density   -0.018881
global_synonyms_count          0.033217
rel_aoa                       -0.027695
rel_clustering                 0.733692
rel_frequency                  0.058374
rel_letters_count              0.035002
rel_orthographic_density       0.053658
rel_synonyms_count            -0.023286
dtype: float64

Regressing rel clustering with 735 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.2595696263051983

intercept                                                -4.312689
global_aoa                                                0.686107
global_clustering                                         0.814902
global_frequency                                          0.123408
global_letters_count                                      1.657162
global_orthographic_density                              -0.363842
global_synonyms_count                                    -3.791267
rel_aoa                                                  -0.168962
rel_clustering                                           -1.252144
rel_frequency                                            -0.586645
rel_letters_count                                        -1.484132
rel_orthographic_density                                  0.654646
rel_synonyms_count                                        0.937803
global_aoa * global_clustering                           -0.062778
global_aoa * global_frequency                            -0.041870
global_aoa * global_letters_count                        -0.101653
global_aoa * global_orthographic_density                 -0.053845
global_aoa * global_synonyms_count                       -0.022877
global_aoa * rel_aoa                                      0.012715
global_aoa * rel_clustering                               0.095665
global_aoa * rel_frequency                                0.068767
global_aoa * rel_letters_count                            0.094976
global_aoa * rel_orthographic_density                     0.021931
global_aoa * rel_synonyms_count                           0.054879
global_clustering * global_frequency                     -0.052276
global_clustering * global_letters_count                  0.086137
global_clustering * global_orthographic_density          -0.306757
global_clustering * global_synonyms_count                -0.256926
global_clustering * rel_aoa                              -0.008010
global_clustering * rel_clustering                       -0.147391
global_clustering * rel_frequency                         0.020617
global_clustering * rel_letters_count                    -0.085874
global_clustering * rel_orthographic_density              0.241954
global_clustering * rel_synonyms_count                    0.077088
global_frequency * global_letters_count                  -0.030561
global_frequency * global_orthographic_density           -0.057023
global_frequency * global_synonyms_count                  0.132666
global_frequency * rel_aoa                               -0.011051
global_frequency * rel_clustering                         0.010905
global_frequency * rel_frequency                          0.020332
global_frequency * rel_letters_count                      0.039663
global_frequency * rel_orthographic_density               0.022680
global_frequency * rel_synonyms_count                    -0.046653
global_letters_count * global_orthographic_density       -0.093038
global_letters_count * global_synonyms_count              0.082791
global_letters_count * rel_aoa                            0.007876
global_letters_count * rel_clustering                    -0.025853
global_letters_count * rel_frequency                      0.008448
global_letters_count * rel_letters_count                 -0.000470
global_letters_count * rel_orthographic_density           0.094209
global_letters_count * rel_synonyms_count                 0.012383
global_orthographic_density * global_synonyms_count       0.222307
global_orthographic_density * rel_aoa                     0.042371
global_orthographic_density * rel_clustering              0.265334
global_orthographic_density * rel_frequency               0.079906
global_orthographic_density * rel_letters_count           0.022371
global_orthographic_density * rel_orthographic_density    0.018938
global_orthographic_density * rel_synonyms_count         -0.175213
global_synonyms_count * rel_aoa                           0.037731
global_synonyms_count * rel_clustering                    0.271356
global_synonyms_count * rel_frequency                    -0.160448
global_synonyms_count * rel_letters_count                -0.225556
global_synonyms_count * rel_orthographic_density         -0.396320
global_synonyms_count * rel_synonyms_count               -0.018380
rel_aoa * rel_clustering                                 -0.013998
rel_aoa * rel_frequency                                  -0.006487
rel_aoa * rel_letters_count                              -0.016350
rel_aoa * rel_orthographic_density                       -0.024903
rel_aoa * rel_synonyms_count                             -0.059708
rel_clustering * rel_frequency                            0.014428
rel_clustering * rel_letters_count                        0.050564
rel_clustering * rel_orthographic_density                -0.153716
rel_clustering * rel_synonyms_count                      -0.351864
rel_frequency * rel_letters_count                        -0.008871
rel_frequency * rel_orthographic_density                 -0.051240
rel_frequency * rel_synonyms_count                        0.043964
rel_letters_count * rel_orthographic_density             -0.041864
rel_letters_count * rel_synonyms_count                    0.076405
rel_orthographic_density * rel_synonyms_count             0.151840
dtype: float64

----------------------------------------------------------------------
Regressing global letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05006538660000015

intercept                      4.230274
global_aoa                     0.103990
global_clustering             -0.202062
global_frequency               0.013291
global_letters_count           0.086375
global_orthographic_density   -0.180446
global_synonyms_count         -0.288908
dtype: float64

Regressing global letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06847163492376507

intercept                                             -18.360628
global_aoa                                              1.216387
global_clustering                                      -2.819744
global_frequency                                        1.443516
global_letters_count                                    1.395553
global_orthographic_density                             0.618582
global_synonyms_count                                   1.710753
global_aoa * global_clustering                          0.092587
global_aoa * global_frequency                          -0.027595
global_aoa * global_letters_count                      -0.047628
global_aoa * global_orthographic_density               -0.003675
global_aoa * global_synonyms_count                     -0.126263
global_clustering * global_frequency                    0.150687
global_clustering * global_letters_count                0.085819
global_clustering * global_orthographic_density         0.075079
global_clustering * global_synonyms_count              -0.045692
global_frequency * global_letters_count                -0.051914
global_frequency * global_orthographic_density         -0.035896
global_frequency * global_synonyms_count               -0.122220
global_letters_count * global_orthographic_density      0.015797
global_letters_count * global_synonyms_count           -0.000074
global_orthographic_density * global_synonyms_count    -0.271714
dtype: float64

Regressing rel letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.027247445938195725

intercept                      0.985179
global_aoa                     0.054818
global_clustering             -0.196404
global_frequency               0.007950
global_letters_count           0.075304
global_orthographic_density   -0.109278
global_synonyms_count         -0.347372
dtype: float64

Regressing rel letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0482178933733598

intercept                                             -16.658596
global_aoa                                              0.950016
global_clustering                                      -1.784778
global_frequency                                        1.400607
global_letters_count                                    1.165452
global_orthographic_density                            -0.010732
global_synonyms_count                                   2.163901
global_aoa * global_clustering                          0.038753
global_aoa * global_frequency                          -0.030581
global_aoa * global_letters_count                      -0.059912
global_aoa * global_orthographic_density               -0.001958
global_aoa * global_synonyms_count                     -0.162311
global_clustering * global_frequency                    0.131376
global_clustering * global_letters_count                0.038023
global_clustering * global_orthographic_density        -0.089073
global_clustering * global_synonyms_count               0.004577
global_frequency * global_letters_count                -0.049292
global_frequency * global_orthographic_density         -0.060812
global_frequency * global_synonyms_count               -0.190997
global_letters_count * global_orthographic_density     -0.003148
global_letters_count * global_synonyms_count            0.066313
global_orthographic_density * global_synonyms_count    -0.082652
dtype: float64

Regressing global letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.03275812074901707

intercept                   5.968445
rel_aoa                    -0.028771
rel_clustering             -0.023004
rel_frequency               0.012187
rel_letters_count           0.078793
rel_orthographic_density   -0.293820
rel_synonyms_count         -0.312290
dtype: float64

Regressing global letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05169649612070393

intercept                                        5.973059
rel_aoa                                         -0.057400
rel_clustering                                  -0.088692
rel_frequency                                    0.058482
rel_letters_count                                0.072149
rel_orthographic_density                        -0.507312
rel_synonyms_count                              -0.788989
rel_aoa * rel_clustering                         0.094545
rel_aoa * rel_frequency                         -0.039825
rel_aoa * rel_letters_count                     -0.058594
rel_aoa * rel_orthographic_density              -0.026910
rel_aoa * rel_synonyms_count                    -0.088084
rel_clustering * rel_frequency                   0.006125
rel_clustering * rel_letters_count              -0.000062
rel_clustering * rel_orthographic_density       -0.057353
rel_clustering * rel_synonyms_count             -0.335536
rel_frequency * rel_letters_count               -0.031632
rel_frequency * rel_orthographic_density        -0.076670
rel_frequency * rel_synonyms_count              -0.120319
rel_letters_count * rel_orthographic_density     0.017775
rel_letters_count * rel_synonyms_count          -0.009003
rel_orthographic_density * rel_synonyms_count   -0.315860
dtype: float64

Regressing rel letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.08283128514594296

intercept                   1.794354
rel_aoa                    -0.020692
rel_clustering             -0.168295
rel_frequency              -0.154126
rel_letters_count           0.289553
rel_orthographic_density    0.040993
rel_synonyms_count         -0.317094
dtype: float64

Regressing rel letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.1070075898647207

intercept                                        1.831942
rel_aoa                                          0.064048
rel_clustering                                  -0.250945
rel_frequency                                   -0.098616
rel_letters_count                                0.384951
rel_orthographic_density                         0.014432
rel_synonyms_count                              -0.751937
rel_aoa * rel_clustering                         0.097500
rel_aoa * rel_frequency                         -0.014453
rel_aoa * rel_letters_count                     -0.105009
rel_aoa * rel_orthographic_density              -0.091916
rel_aoa * rel_synonyms_count                    -0.110711
rel_clustering * rel_frequency                   0.006660
rel_clustering * rel_letters_count               0.019567
rel_clustering * rel_orthographic_density       -0.003106
rel_clustering * rel_synonyms_count             -0.224686
rel_frequency * rel_letters_count               -0.009189
rel_frequency * rel_orthographic_density        -0.005854
rel_frequency * rel_synonyms_count              -0.102597
rel_letters_count * rel_orthographic_density     0.029984
rel_letters_count * rel_synonyms_count           0.063134
rel_orthographic_density * rel_synonyms_count   -0.136627
dtype: float64

Regressing global letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.061736929365021176

intercept                      0.980595
global_aoa                     0.228721
global_clustering             -0.389945
global_frequency               0.105811
global_letters_count           0.143115
global_orthographic_density   -0.180743
global_synonyms_count         -0.076255
rel_aoa                       -0.192492
rel_clustering                 0.211960
rel_frequency                 -0.106180
rel_letters_count             -0.059958
rel_orthographic_density       0.016398
rel_synonyms_count            -0.238866
dtype: float64

Regressing global letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12452629415511596

intercept                                                 28.683569
global_aoa                                                 0.381699
global_clustering                                          6.323376
global_frequency                                           0.335846
global_letters_count                                      -2.565354
global_orthographic_density                               -1.073827
global_synonyms_count                                     -4.854058
rel_aoa                                                    0.775105
rel_clustering                                           -12.554860
rel_frequency                                              2.059967
rel_letters_count                                          3.766512
rel_orthographic_density                                  -1.030766
rel_synonyms_count                                         2.922038
global_aoa * global_clustering                            -0.060559
global_aoa * global_frequency                             -0.080734
global_aoa * global_letters_count                         -0.012022
global_aoa * global_orthographic_density                   0.239351
global_aoa * global_synonyms_count                        -0.315511
global_aoa * rel_aoa                                       0.020768
global_aoa * rel_clustering                                0.062238
global_aoa * rel_frequency                                 0.037251
global_aoa * rel_letters_count                            -0.040116
global_aoa * rel_orthographic_density                     -0.212659
global_aoa * rel_synonyms_count                            0.216081
global_clustering * global_frequency                      -0.149567
global_clustering * global_letters_count                  -0.528243
global_clustering * global_orthographic_density           -0.792675
global_clustering * global_synonyms_count                 -1.189387
global_clustering * rel_aoa                                0.244070
global_clustering * rel_clustering                        -0.066438
global_clustering * rel_frequency                          0.352670
global_clustering * rel_letters_count                      0.524051
global_clustering * rel_orthographic_density               0.635264
global_clustering * rel_synonyms_count                     1.396231
global_frequency * global_letters_count                   -0.012276
global_frequency * global_orthographic_density            -0.293616
global_frequency * global_synonyms_count                  -0.052685
global_frequency * rel_aoa                                 0.079067
global_frequency * rel_clustering                          0.527934
global_frequency * rel_frequency                          -0.008613
global_frequency * rel_letters_count                      -0.027052
global_frequency * rel_orthographic_density                0.332590
global_frequency * rel_synonyms_count                      0.201279
global_letters_count * global_orthographic_density        -0.317836
global_letters_count * global_synonyms_count               0.655269
global_letters_count * rel_aoa                            -0.011852
global_letters_count * rel_clustering                      0.765681
global_letters_count * rel_frequency                      -0.109379
global_letters_count * rel_letters_count                   0.002349
global_letters_count * rel_orthographic_density            0.373116
global_letters_count * rel_synonyms_count                 -0.309981
global_orthographic_density * global_synonyms_count       -1.491274
global_orthographic_density * rel_aoa                     -0.140980
global_orthographic_density * rel_clustering               0.856419
global_orthographic_density * rel_frequency                0.199517
global_orthographic_density * rel_letters_count            0.120976
global_orthographic_density * rel_orthographic_density    -0.003214
global_orthographic_density * rel_synonyms_count           1.501933
global_synonyms_count * rel_aoa                           -0.101886
global_synonyms_count * rel_clustering                     1.521040
global_synonyms_count * rel_frequency                     -0.361919
global_synonyms_count * rel_letters_count                 -0.743703
global_synonyms_count * rel_orthographic_density           1.535812
global_synonyms_count * rel_synonyms_count                -0.087741
rel_aoa * rel_clustering                                  -0.129049
rel_aoa * rel_frequency                                   -0.048296
rel_aoa * rel_letters_count                               -0.033579
rel_aoa * rel_orthographic_density                         0.101628
rel_aoa * rel_synonyms_count                               0.090534
rel_clustering * rel_frequency                            -0.643060
rel_clustering * rel_letters_count                        -0.712415
rel_clustering * rel_orthographic_density                 -0.696331
rel_clustering * rel_synonyms_count                       -1.876085
rel_frequency * rel_letters_count                          0.083595
rel_frequency * rel_orthographic_density                  -0.300019
rel_frequency * rel_synonyms_count                         0.140356
rel_letters_count * rel_orthographic_density              -0.194071
rel_letters_count * rel_synonyms_count                     0.429577
rel_orthographic_density * rel_synonyms_count             -1.790598
dtype: float64

Regressing rel letters_count with 893 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.14061742915605568

intercept                      0.417981
global_aoa                     0.186172
global_clustering             -0.328207
global_frequency               0.145187
global_letters_count          -0.673764
global_orthographic_density   -0.212663
global_synonyms_count         -0.068725
rel_aoa                       -0.137510
rel_clustering                 0.161172
rel_frequency                 -0.149313
rel_letters_count              0.785903
rel_orthographic_density       0.038594
rel_synonyms_count            -0.239895
dtype: float64

Regressing rel letters_count with 893 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.19654280514614375

intercept                                                 20.822616
global_aoa                                                -0.162057
global_clustering                                          4.034526
global_frequency                                           0.224651
global_letters_count                                      -2.345494
global_orthographic_density                               -1.724239
global_synonyms_count                                     -1.180253
rel_aoa                                                    1.012255
rel_clustering                                            -9.525231
rel_frequency                                              2.009327
rel_letters_count                                          3.532173
rel_orthographic_density                                  -1.447720
rel_synonyms_count                                         0.444332
global_aoa * global_clustering                            -0.036981
global_aoa * global_frequency                             -0.031375
global_aoa * global_letters_count                          0.005144
global_aoa * global_orthographic_density                   0.228007
global_aoa * global_synonyms_count                        -0.303554
global_aoa * rel_aoa                                       0.003799
global_aoa * rel_clustering                                0.026518
global_aoa * rel_frequency                                -0.012340
global_aoa * rel_letters_count                            -0.037177
global_aoa * rel_orthographic_density                     -0.177686
global_aoa * rel_synonyms_count                            0.199587
global_clustering * global_frequency                      -0.059679
global_clustering * global_letters_count                  -0.308504
global_clustering * global_orthographic_density           -0.676644
global_clustering * global_synonyms_count                 -0.985823
global_clustering * rel_aoa                                0.199117
global_clustering * rel_clustering                        -0.125417
global_clustering * rel_frequency                          0.265249
global_clustering * rel_letters_count                      0.294239
global_clustering * rel_orthographic_density               0.470366
global_clustering * rel_synonyms_count                     1.141032
global_frequency * global_letters_count                    0.028343
global_frequency * global_orthographic_density            -0.144853
global_frequency * global_synonyms_count                  -0.240136
global_frequency * rel_aoa                                 0.042924
global_frequency * rel_clustering                          0.377803
global_frequency * rel_frequency                          -0.020707
global_frequency * rel_letters_count                      -0.064643
global_frequency * rel_orthographic_density                0.235883
global_frequency * rel_synonyms_count                      0.284707
global_letters_count * global_orthographic_density        -0.359610
global_letters_count * global_synonyms_count               0.501116
global_letters_count * rel_aoa                            -0.012679
global_letters_count * rel_clustering                      0.556147
global_letters_count * rel_frequency                      -0.119540
global_letters_count * rel_letters_count                  -0.018882
global_letters_count * rel_orthographic_density            0.420467
global_letters_count * rel_synonyms_count                 -0.239873
global_orthographic_density * global_synonyms_count       -1.338976
global_orthographic_density * rel_aoa                     -0.108815
global_orthographic_density * rel_clustering               0.569473
global_orthographic_density * rel_frequency                0.124052
global_orthographic_density * rel_letters_count            0.165670
global_orthographic_density * rel_orthographic_density     0.001818
global_orthographic_density * rel_synonyms_count           1.352606
global_synonyms_count * rel_aoa                           -0.101728
global_synonyms_count * rel_clustering                     1.228486
global_synonyms_count * rel_frequency                     -0.218174
global_synonyms_count * rel_letters_count                 -0.568151
global_synonyms_count * rel_orthographic_density           1.472410
global_synonyms_count * rel_synonyms_count                -0.111684
rel_aoa * rel_clustering                                  -0.085376
rel_aoa * rel_frequency                                   -0.022662
rel_aoa * rel_letters_count                               -0.038212
rel_aoa * rel_orthographic_density                         0.069702
rel_aoa * rel_synonyms_count                               0.077256
rel_clustering * rel_frequency                            -0.541628
rel_clustering * rel_letters_count                        -0.487136
rel_clustering * rel_orthographic_density                 -0.380283
rel_clustering * rel_synonyms_count                       -1.513950
rel_frequency * rel_letters_count                          0.099802
rel_frequency * rel_orthographic_density                  -0.274263
rel_frequency * rel_synonyms_count                         0.069417
rel_letters_count * rel_orthographic_density              -0.268742
rel_letters_count * rel_synonyms_count                     0.350883
rel_orthographic_density * rel_synonyms_count             -1.677363
dtype: float64

----------------------------------------------------------------------
Regressing global synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.004766204780108585

intercept                      0.386563
global_aoa                    -0.006641
global_clustering              0.017096
global_frequency               0.002003
global_letters_count           0.006697
global_orthographic_density    0.014818
global_synonyms_count          0.066354
dtype: float64

Regressing global synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.019367169444870713

intercept                                              0.418075
global_aoa                                             0.168662
global_clustering                                      0.385232
global_frequency                                      -0.002723
global_letters_count                                   0.166659
global_orthographic_density                            0.150996
global_synonyms_count                                 -0.531359
global_aoa * global_clustering                        -0.002509
global_aoa * global_frequency                         -0.004567
global_aoa * global_letters_count                     -0.019158
global_aoa * global_orthographic_density              -0.032110
global_aoa * global_synonyms_count                     0.028130
global_clustering * global_frequency                  -0.021744
global_clustering * global_letters_count              -0.011531
global_clustering * global_orthographic_density       -0.049472
global_clustering * global_synonyms_count             -0.011934
global_frequency * global_letters_count               -0.010224
global_frequency * global_orthographic_density        -0.024500
global_frequency * global_synonyms_count               0.017259
global_letters_count * global_orthographic_density    -0.002521
global_letters_count * global_synonyms_count           0.013286
global_orthographic_density * global_synonyms_count    0.089625
dtype: float64

Regressing rel synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.003085949533883192

intercept                      0.091412
global_aoa                    -0.000465
global_clustering              0.021814
global_frequency              -0.000442
global_letters_count           0.006666
global_orthographic_density    0.014198
global_synonyms_count          0.046104
dtype: float64

Regressing rel synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.017957620389255857

intercept                                             -0.208640
global_aoa                                             0.167552
global_clustering                                      0.254110
global_frequency                                      -0.022580
global_letters_count                                   0.170519
global_orthographic_density                            0.215352
global_synonyms_count                                 -0.650602
global_aoa * global_clustering                         0.009265
global_aoa * global_frequency                          0.002696
global_aoa * global_letters_count                     -0.017665
global_aoa * global_orthographic_density              -0.030584
global_aoa * global_synonyms_count                     0.022127
global_clustering * global_frequency                  -0.016873
global_clustering * global_letters_count              -0.012728
global_clustering * global_orthographic_density       -0.033954
global_clustering * global_synonyms_count             -0.019649
global_frequency * global_letters_count               -0.012254
global_frequency * global_orthographic_density        -0.018774
global_frequency * global_synonyms_count               0.023579
global_letters_count * global_orthographic_density    -0.008374
global_letters_count * global_synonyms_count           0.024151
global_orthographic_density * global_synonyms_count    0.068795
dtype: float64

Regressing global synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0040788756298461015

intercept                   0.363970
rel_aoa                    -0.009489
rel_clustering             -0.001568
rel_frequency               0.005765
rel_letters_count           0.013823
rel_orthographic_density    0.034491
rel_synonyms_count          0.040079
dtype: float64

Regressing global synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.016809581198069523

intercept                                        0.385265
rel_aoa                                         -0.027021
rel_clustering                                  -0.062047
rel_frequency                                    0.011496
rel_letters_count                                0.011196
rel_orthographic_density                         0.054844
rel_synonyms_count                               0.042373
rel_aoa * rel_clustering                        -0.003992
rel_aoa * rel_frequency                         -0.010538
rel_aoa * rel_letters_count                     -0.015149
rel_aoa * rel_orthographic_density              -0.025751
rel_aoa * rel_synonyms_count                     0.013644
rel_clustering * rel_frequency                  -0.000049
rel_clustering * rel_letters_count               0.012014
rel_clustering * rel_orthographic_density       -0.043021
rel_clustering * rel_synonyms_count             -0.059727
rel_frequency * rel_letters_count               -0.000567
rel_frequency * rel_orthographic_density        -0.006954
rel_frequency * rel_synonyms_count              -0.004689
rel_letters_count * rel_orthographic_density    -0.007163
rel_letters_count * rel_synonyms_count           0.013643
rel_orthographic_density * rel_synonyms_count    0.037110
dtype: float64

Regressing rel synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.04122341122458695

intercept                   0.034106
rel_aoa                    -0.017724
rel_clustering              0.056427
rel_frequency               0.010353
rel_letters_count           0.010753
rel_orthographic_density    0.009139
rel_synonyms_count          0.189455
dtype: float64

Regressing rel synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.05946307777590121

intercept                                        0.062314
rel_aoa                                         -0.024119
rel_clustering                                  -0.007952
rel_frequency                                    0.016956
rel_letters_count                               -0.004002
rel_orthographic_density                         0.025378
rel_synonyms_count                               0.268027
rel_aoa * rel_clustering                         0.014232
rel_aoa * rel_frequency                         -0.006406
rel_aoa * rel_letters_count                     -0.015656
rel_aoa * rel_orthographic_density              -0.020861
rel_aoa * rel_synonyms_count                    -0.001034
rel_clustering * rel_frequency                  -0.000719
rel_clustering * rel_letters_count               0.010824
rel_clustering * rel_orthographic_density       -0.036130
rel_clustering * rel_synonyms_count             -0.053242
rel_frequency * rel_letters_count               -0.004184
rel_frequency * rel_orthographic_density        -0.011903
rel_frequency * rel_synonyms_count               0.016804
rel_letters_count * rel_orthographic_density    -0.011512
rel_letters_count * rel_synonyms_count           0.020402
rel_orthographic_density * rel_synonyms_count    0.056905
dtype: float64

Regressing global synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.011980133381401692

intercept                      1.045260
global_aoa                     0.004960
global_clustering              0.042195
global_frequency              -0.013936
global_letters_count          -0.033153
global_orthographic_density   -0.088996
global_synonyms_count          0.145427
rel_aoa                       -0.015441
rel_clustering                -0.028952
rel_frequency                  0.018281
rel_letters_count              0.042868
rel_orthographic_density       0.116247
rel_synonyms_count            -0.096096
dtype: float64

Regressing global synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0816456088481361

intercept                                                 5.119818
global_aoa                                                0.227672
global_clustering                                         1.329771
global_frequency                                         -0.178186
global_letters_count                                     -0.016902
global_orthographic_density                              -0.523130
global_synonyms_count                                     1.590743
rel_aoa                                                  -0.786586
rel_clustering                                           -1.131891
rel_frequency                                            -0.285749
rel_letters_count                                         0.011675
rel_orthographic_density                                  0.630992
rel_synonyms_count                                       -3.481850
global_aoa * global_clustering                            0.001645
global_aoa * global_frequency                             0.018829
global_aoa * global_letters_count                        -0.014048
global_aoa * global_orthographic_density                 -0.180643
global_aoa * global_synonyms_count                        0.014422
global_aoa * rel_aoa                                     -0.002608
global_aoa * rel_clustering                              -0.020953
global_aoa * rel_frequency                               -0.039250
global_aoa * rel_letters_count                           -0.000802
global_aoa * rel_orthographic_density                     0.172417
global_aoa * rel_synonyms_count                           0.046625
global_clustering * global_frequency                     -0.059572
global_clustering * global_letters_count                 -0.021730
global_clustering * global_orthographic_density          -0.257722
global_clustering * global_synonyms_count                 0.119498
global_clustering * rel_aoa                              -0.038375
global_clustering * rel_clustering                       -0.001615
global_clustering * rel_frequency                        -0.037566
global_clustering * rel_letters_count                    -0.052383
global_clustering * rel_orthographic_density              0.255157
global_clustering * rel_synonyms_count                   -0.120616
global_frequency * global_letters_count                  -0.023928
global_frequency * global_orthographic_density           -0.062183
global_frequency * global_synonyms_count                 -0.051175
global_frequency * rel_aoa                                0.021627
global_frequency * rel_clustering                         0.019102
global_frequency * rel_frequency                         -0.001384
global_frequency * rel_letters_count                     -0.013697
global_frequency * rel_orthographic_density               0.055491
global_frequency * rel_synonyms_count                     0.153133
global_letters_count * global_orthographic_density        0.131939
global_letters_count * global_synonyms_count              0.074246
global_letters_count * rel_aoa                            0.024597
global_letters_count * rel_clustering                     0.056245
global_letters_count * rel_frequency                      0.052408
global_letters_count * rel_letters_count                 -0.003224
global_letters_count * rel_orthographic_density          -0.112304
global_letters_count * rel_synonyms_count                -0.022027
global_orthographic_density * global_synonyms_count      -0.116784
global_orthographic_density * rel_aoa                     0.100562
global_orthographic_density * rel_clustering              0.329809
global_orthographic_density * rel_frequency               0.015001
global_orthographic_density * rel_letters_count          -0.085972
global_orthographic_density * rel_orthographic_density   -0.035274
global_orthographic_density * rel_synonyms_count          0.237292
global_synonyms_count * rel_aoa                          -0.018172
global_synonyms_count * rel_clustering                   -0.049566
global_synonyms_count * rel_frequency                     0.072546
global_synonyms_count * rel_letters_count                -0.069781
global_synonyms_count * rel_orthographic_density          0.235861
global_synonyms_count * rel_synonyms_count                0.047461
rel_aoa * rel_clustering                                  0.051671
rel_aoa * rel_frequency                                  -0.006846
rel_aoa * rel_letters_count                              -0.033232
rel_aoa * rel_orthographic_density                       -0.136019
rel_aoa * rel_synonyms_count                             -0.029154
rel_clustering * rel_frequency                            0.065736
rel_clustering * rel_letters_count                        0.041139
rel_clustering * rel_orthographic_density                -0.349895
rel_clustering * rel_synonyms_count                       0.008834
rel_frequency * rel_letters_count                        -0.018546
rel_frequency * rel_orthographic_density                 -0.029539
rel_frequency * rel_synonyms_count                       -0.162876
rel_letters_count * rel_orthographic_density              0.040583
rel_letters_count * rel_synonyms_count                    0.027402
rel_orthographic_density * rel_synonyms_count            -0.280149
dtype: float64

Regressing rel synonyms_count with 864 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.09785079903352378

intercept                      0.567953
global_aoa                     0.007656
global_clustering              0.002633
global_frequency              -0.029942
global_letters_count          -0.019261
global_orthographic_density    0.008490
global_synonyms_count         -0.492157
rel_aoa                       -0.017941
rel_clustering                 0.015236
rel_frequency                  0.030076
rel_letters_count              0.025686
rel_orthographic_density       0.009312
rel_synonyms_count             0.656031
dtype: float64

Regressing rel synonyms_count with 864 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.1653968596303923

intercept                                                 1.205229
global_aoa                                                0.110570
global_clustering                                         0.743183
global_frequency                                         -0.096702
global_letters_count                                      0.252450
global_orthographic_density                               0.585271
global_synonyms_count                                     1.110034
rel_aoa                                                  -0.406161
rel_clustering                                           -0.718043
rel_frequency                                            -0.300775
rel_letters_count                                        -0.477463
rel_orthographic_density                                 -0.859341
rel_synonyms_count                                       -3.040896
global_aoa * global_clustering                           -0.008372
global_aoa * global_frequency                             0.016044
global_aoa * global_letters_count                        -0.012191
global_aoa * global_orthographic_density                 -0.144601
global_aoa * global_synonyms_count                        0.041373
global_aoa * rel_aoa                                     -0.004125
global_aoa * rel_clustering                               0.003829
global_aoa * rel_frequency                               -0.025077
global_aoa * rel_letters_count                            0.006079
global_aoa * rel_orthographic_density                     0.136734
global_aoa * rel_synonyms_count                           0.027170
global_clustering * global_frequency                     -0.041614
global_clustering * global_letters_count                  0.019277
global_clustering * global_orthographic_density          -0.139884
global_clustering * global_synonyms_count                 0.033148
global_clustering * rel_aoa                              -0.015407
global_clustering * rel_clustering                        0.001029
global_clustering * rel_frequency                        -0.041260
global_clustering * rel_letters_count                    -0.087738
global_clustering * rel_orthographic_density              0.124738
global_clustering * rel_synonyms_count                   -0.019238
global_frequency * global_letters_count                  -0.011993
global_frequency * global_orthographic_density           -0.072531
global_frequency * global_synonyms_count                 -0.060451
global_frequency * rel_aoa                                0.014382
global_frequency * rel_clustering                         0.025458
global_frequency * rel_frequency                          0.001804
global_frequency * rel_letters_count                     -0.006522
global_frequency * rel_orthographic_density               0.090867
global_frequency * rel_synonyms_count                     0.174924
global_letters_count * global_orthographic_density        0.056020
global_letters_count * global_synonyms_count             -0.006484
global_letters_count * rel_aoa                            0.017836
global_letters_count * rel_clustering                     0.004614
global_letters_count * rel_frequency                      0.029132
global_letters_count * rel_letters_count                 -0.000968
global_letters_count * rel_orthographic_density          -0.036083
global_letters_count * rel_synonyms_count                 0.072694
global_orthographic_density * global_synonyms_count      -0.310153
global_orthographic_density * rel_aoa                     0.050365
global_orthographic_density * rel_clustering              0.154290
global_orthographic_density * rel_frequency               0.016476
global_orthographic_density * rel_letters_count          -0.028037
global_orthographic_density * rel_orthographic_density   -0.036988
global_orthographic_density * rel_synonyms_count          0.442000
global_synonyms_count * rel_aoa                          -0.066410
global_synonyms_count * rel_clustering                    0.045500
global_synonyms_count * rel_frequency                     0.077252
global_synonyms_count * rel_letters_count                -0.016040
global_synonyms_count * rel_orthographic_density          0.369217
global_synonyms_count * rel_synonyms_count                0.054925
rel_aoa * rel_clustering                                  0.021369
rel_aoa * rel_frequency                                  -0.010967
rel_aoa * rel_letters_count                              -0.029298
rel_aoa * rel_orthographic_density                       -0.077817
rel_aoa * rel_synonyms_count                              0.000555
rel_clustering * rel_frequency                            0.050531
rel_clustering * rel_letters_count                        0.077961
rel_clustering * rel_orthographic_density                -0.168237
rel_clustering * rel_synonyms_count                      -0.104882
rel_frequency * rel_letters_count                        -0.014134
rel_frequency * rel_orthographic_density                 -0.048835
rel_frequency * rel_synonyms_count                       -0.171799
rel_letters_count * rel_orthographic_density             -0.013585
rel_letters_count * rel_synonyms_count                   -0.036315
rel_orthographic_density * rel_synonyms_count            -0.423385
dtype: float64

----------------------------------------------------------------------
Regressing global orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.05456011970550101

intercept                      2.441128
global_aoa                    -0.057671
global_clustering              0.111100
global_frequency              -0.012181
global_letters_count          -0.024275
global_orthographic_density    0.120140
global_synonyms_count          0.073275
dtype: float64

Regressing global orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.08086569102341157

intercept                                              9.712993
global_aoa                                            -0.567442
global_clustering                                      0.826598
global_frequency                                      -0.407836
global_letters_count                                  -0.581165
global_orthographic_density                            0.029255
global_synonyms_count                                  0.455086
global_aoa * global_clustering                        -0.014086
global_aoa * global_frequency                          0.022028
global_aoa * global_letters_count                      0.026386
global_aoa * global_orthographic_density               0.041180
global_aoa * global_synonyms_count                     0.045250
global_clustering * global_frequency                  -0.036667
global_clustering * global_letters_count              -0.045103
global_clustering * global_orthographic_density       -0.017670
global_clustering * global_synonyms_count              0.119030
global_frequency * global_letters_count                0.014994
global_frequency * global_orthographic_density        -0.017212
global_frequency * global_synonyms_count              -0.017099
global_letters_count * global_orthographic_density    -0.030506
global_letters_count * global_synonyms_count          -0.008278
global_orthographic_density * global_synonyms_count    0.195242
dtype: float64

Regressing rel orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.04495722648225453

intercept                      0.119044
global_aoa                    -0.036231
global_clustering              0.125062
global_frequency               0.000423
global_letters_count          -0.040818
global_orthographic_density    0.082298
global_synonyms_count          0.060558
dtype: float64

Regressing rel orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06274354969742135

intercept                                              8.168075
global_aoa                                            -0.313018
global_clustering                                      1.102738
global_frequency                                      -0.420956
global_letters_count                                  -0.751803
global_orthographic_density                           -0.248635
global_synonyms_count                                  0.002997
global_aoa * global_clustering                         0.012720
global_aoa * global_frequency                          0.017885
global_aoa * global_letters_count                      0.022473
global_aoa * global_orthographic_density               0.032205
global_aoa * global_synonyms_count                     0.039198
global_clustering * global_frequency                  -0.051160
global_clustering * global_letters_count              -0.087943
global_clustering * global_orthographic_density       -0.038061
global_clustering * global_synonyms_count              0.030825
global_frequency * global_letters_count                0.005888
global_frequency * global_orthographic_density        -0.006061
global_frequency * global_synonyms_count              -0.014629
global_letters_count * global_orthographic_density    -0.012023
global_letters_count * global_synonyms_count          -0.004095
global_orthographic_density * global_synonyms_count    0.125968
dtype: float64

Regressing global orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.03399779520173141

intercept                   1.511102
rel_aoa                     0.024933
rel_clustering              0.053446
rel_frequency              -0.013116
rel_letters_count          -0.018086
rel_orthographic_density    0.189752
rel_synonyms_count          0.087376
dtype: float64

Regressing global orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.061822224855744956

intercept                                        1.527737
rel_aoa                                          0.107282
rel_clustering                                   0.025891
rel_frequency                                   -0.036660
rel_letters_count                               -0.017127
rel_orthographic_density                         0.333465
rel_synonyms_count                               0.333083
rel_aoa * rel_clustering                         0.023075
rel_aoa * rel_frequency                          0.035400
rel_aoa * rel_letters_count                      0.023162
rel_aoa * rel_orthographic_density               0.048387
rel_aoa * rel_synonyms_count                     0.105528
rel_clustering * rel_frequency                  -0.017514
rel_clustering * rel_letters_count              -0.014396
rel_clustering * rel_orthographic_density        0.026031
rel_clustering * rel_synonyms_count              0.200889
rel_frequency * rel_letters_count                0.017930
rel_frequency * rel_orthographic_density         0.044232
rel_frequency * rel_synonyms_count               0.056346
rel_letters_count * rel_orthographic_density    -0.028293
rel_letters_count * rel_synonyms_count          -0.019646
rel_orthographic_density * rel_synonyms_count    0.171546
dtype: float64

Regressing rel orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06668970371075966

intercept                  -0.650808
rel_aoa                     0.022465
rel_clustering              0.090777
rel_frequency               0.028087
rel_letters_count          -0.026486
rel_orthographic_density    0.228379
rel_synonyms_count          0.061206
dtype: float64

Regressing rel orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.0863104015690086

intercept                                       -0.619932
rel_aoa                                          0.074022
rel_clustering                                   0.070550
rel_frequency                                    0.032053
rel_letters_count                               -0.025186
rel_orthographic_density                         0.333993
rel_synonyms_count                               0.227011
rel_aoa * rel_clustering                         0.031087
rel_aoa * rel_frequency                          0.022414
rel_aoa * rel_letters_count                      0.020007
rel_aoa * rel_orthographic_density               0.046611
rel_aoa * rel_synonyms_count                     0.092842
rel_clustering * rel_frequency                  -0.029371
rel_clustering * rel_letters_count              -0.044411
rel_clustering * rel_orthographic_density       -0.000787
rel_clustering * rel_synonyms_count              0.149968
rel_frequency * rel_letters_count                0.007526
rel_frequency * rel_orthographic_density         0.035037
rel_frequency * rel_synonyms_count               0.039549
rel_letters_count * rel_orthographic_density    -0.017008
rel_letters_count * rel_synonyms_count          -0.032812
rel_orthographic_density * rel_synonyms_count    0.086538
dtype: float64

Regressing global orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07544518305791881

intercept                      3.429927
global_aoa                    -0.133127
global_clustering              0.106398
global_frequency              -0.045799
global_letters_count          -0.072855
global_orthographic_density    0.112725
global_synonyms_count          0.011397
rel_aoa                        0.121653
rel_clustering                 0.016492
rel_frequency                  0.040177
rel_letters_count              0.046598
rel_orthographic_density      -0.010197
rel_synonyms_count             0.066409
dtype: float64

Regressing global orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.16662289499463878

intercept                                                 -6.355020
global_aoa                                                -0.337845
global_clustering                                          0.286188
global_frequency                                           0.352575
global_letters_count                                       1.390088
global_orthographic_density                                3.754944
global_synonyms_count                                     10.608294
rel_aoa                                                   -0.809874
rel_clustering                                             3.041590
rel_frequency                                             -0.178071
rel_letters_count                                         -1.032240
rel_orthographic_density                                  -2.931623
rel_synonyms_count                                        -6.642774
global_aoa * global_clustering                            -0.062760
global_aoa * global_frequency                             -0.011693
global_aoa * global_letters_count                          0.001602
global_aoa * global_orthographic_density                  -0.017634
global_aoa * global_synonyms_count                         0.044883
global_aoa * rel_aoa                                      -0.016970
global_aoa * rel_clustering                                0.055497
global_aoa * rel_frequency                                 0.014443
global_aoa * rel_letters_count                             0.048535
global_aoa * rel_orthographic_density                      0.101423
global_aoa * rel_synonyms_count                           -0.098690
global_clustering * global_frequency                      -0.066590
global_clustering * global_letters_count                   0.022097
global_clustering * global_orthographic_density            0.286666
global_clustering * global_synonyms_count                  0.483919
global_clustering * rel_aoa                               -0.014244
global_clustering * rel_clustering                         0.127577
global_clustering * rel_frequency                          0.055242
global_clustering * rel_letters_count                      0.114351
global_clustering * rel_orthographic_density              -0.106928
global_clustering * rel_synonyms_count                    -0.400478
global_frequency * global_letters_count                   -0.080098
global_frequency * global_orthographic_density            -0.141487
global_frequency * global_synonyms_count                  -0.533882
global_frequency * rel_aoa                                 0.079782
global_frequency * rel_clustering                         -0.043446
global_frequency * rel_frequency                           0.011948
global_frequency * rel_letters_count                       0.103761
global_frequency * rel_orthographic_density                0.110447
global_frequency * rel_synonyms_count                      0.333875
global_letters_count * global_orthographic_density        -0.086776
global_letters_count * global_synonyms_count              -0.721530
global_letters_count * rel_aoa                             0.055638
global_letters_count * rel_clustering                     -0.249280
global_letters_count * rel_frequency                       0.043667
global_letters_count * rel_letters_count                   0.000558
global_letters_count * rel_orthographic_density            0.148229
global_letters_count * rel_synonyms_count                  0.559841
global_orthographic_density * global_synonyms_count        0.583668
global_orthographic_density * rel_aoa                     -0.084035
global_orthographic_density * rel_clustering              -0.364619
global_orthographic_density * rel_frequency                0.011267
global_orthographic_density * rel_letters_count            0.040843
global_orthographic_density * rel_orthographic_density     0.030060
global_orthographic_density * rel_synonyms_count          -0.549136
global_synonyms_count * rel_aoa                            0.051244
global_synonyms_count * rel_clustering                    -0.346002
global_synonyms_count * rel_frequency                      0.566213
global_synonyms_count * rel_letters_count                  0.731934
global_synonyms_count * rel_orthographic_density          -0.443142
global_synonyms_count * rel_synonyms_count                -0.056595
rel_aoa * rel_clustering                                   0.044420
rel_aoa * rel_frequency                                   -0.046683
rel_aoa * rel_letters_count                               -0.066986
rel_aoa * rel_orthographic_density                         0.050536
rel_aoa * rel_synonyms_count                               0.101451
rel_clustering * rel_frequency                             0.084995
rel_clustering * rel_letters_count                         0.045474
rel_clustering * rel_orthographic_density                  0.124433
rel_clustering * rel_synonyms_count                        0.504044
rel_frequency * rel_letters_count                         -0.050516
rel_frequency * rel_orthographic_density                   0.010697
rel_frequency * rel_synonyms_count                        -0.352259
rel_letters_count * rel_orthographic_density              -0.112889
rel_letters_count * rel_synonyms_count                    -0.641170
rel_orthographic_density * rel_synonyms_count              0.466164
dtype: float64

Regressing rel orthographic_density with 730 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10591069171712686

intercept                      2.083941
global_aoa                    -0.102412
global_clustering              0.087744
global_frequency              -0.027489
global_letters_count          -0.036854
global_orthographic_density   -0.536591
global_synonyms_count          0.035770
rel_aoa                        0.089679
rel_clustering                 0.038423
rel_frequency                  0.037967
rel_letters_count              0.011997
rel_orthographic_density       0.711642
rel_synonyms_count             0.037419
dtype: float64

Regressing rel orthographic_density with 730 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.18925705987722719

intercept                                                -7.254336
global_aoa                                                0.167291
global_clustering                                         0.330656
global_frequency                                          0.283644
global_letters_count                                      0.882710
global_orthographic_density                               3.280584
global_synonyms_count                                     8.477285
rel_aoa                                                  -1.006793
rel_clustering                                            2.637591
rel_frequency                                            -0.332705
rel_letters_count                                        -0.703647
rel_orthographic_density                                 -2.315439
rel_synonyms_count                                       -5.029606
global_aoa * global_clustering                           -0.008057
global_aoa * global_frequency                            -0.011255
global_aoa * global_letters_count                         0.005145
global_aoa * global_orthographic_density                 -0.084479
global_aoa * global_synonyms_count                        0.059327
global_aoa * rel_aoa                                     -0.013077
global_aoa * rel_clustering                               0.006247
global_aoa * rel_frequency                                0.020631
global_aoa * rel_letters_count                            0.046966
global_aoa * rel_orthographic_density                     0.168585
global_aoa * rel_synonyms_count                          -0.093742
global_clustering * global_frequency                     -0.081953
global_clustering * global_letters_count                  0.000381
global_clustering * global_orthographic_density           0.281023
global_clustering * global_synonyms_count                 0.264559
global_clustering * rel_aoa                              -0.048572
global_clustering * rel_clustering                        0.116374
global_clustering * rel_frequency                         0.051404
global_clustering * rel_letters_count                     0.126653
global_clustering * rel_orthographic_density             -0.029052
global_clustering * rel_synonyms_count                   -0.153427
global_frequency * global_letters_count                  -0.066269
global_frequency * global_orthographic_density           -0.165973
global_frequency * global_synonyms_count                 -0.471422
global_frequency * rel_aoa                                0.072015
global_frequency * rel_clustering                        -0.043916
global_frequency * rel_frequency                          0.012559
global_frequency * rel_letters_count                      0.094496
global_frequency * rel_orthographic_density               0.159059
global_frequency * rel_synonyms_count                     0.299574
global_letters_count * global_orthographic_density        0.029007
global_letters_count * global_synonyms_count             -0.662447
global_letters_count * rel_aoa                            0.048076
global_letters_count * rel_clustering                    -0.197697
global_letters_count * rel_frequency                      0.045128
global_letters_count * rel_letters_count                  0.002937
global_letters_count * rel_orthographic_density           0.064489
global_letters_count * rel_synonyms_count                 0.538010
global_orthographic_density * global_synonyms_count       0.487399
global_orthographic_density * rel_aoa                    -0.066112
global_orthographic_density * rel_clustering             -0.226127
global_orthographic_density * rel_frequency               0.056185
global_orthographic_density * rel_letters_count          -0.044380
global_orthographic_density * rel_orthographic_density    0.066389
global_orthographic_density * rel_synonyms_count         -0.401271
global_synonyms_count * rel_aoa                           0.019232
global_synonyms_count * rel_clustering                   -0.090345
global_synonyms_count * rel_frequency                     0.519428
global_synonyms_count * rel_letters_count                 0.632420
global_synonyms_count * rel_orthographic_density         -0.441448
global_synonyms_count * rel_synonyms_count               -0.046641
rel_aoa * rel_clustering                                  0.061262
rel_aoa * rel_frequency                                  -0.053148
rel_aoa * rel_letters_count                              -0.069507
rel_aoa * rel_orthographic_density                        0.017359
rel_aoa * rel_synonyms_count                              0.101437
rel_clustering * rel_frequency                            0.084494
rel_clustering * rel_letters_count                       -0.004156
rel_clustering * rel_orthographic_density                -0.084102
rel_clustering * rel_synonyms_count                       0.180010
rel_frequency * rel_letters_count                        -0.059082
rel_frequency * rel_orthographic_density                 -0.046488
rel_frequency * rel_synonyms_count                       -0.332928
rel_letters_count * rel_orthographic_density             -0.044574
rel_letters_count * rel_synonyms_count                   -0.569330
rel_orthographic_density * rel_synonyms_count             0.416810
dtype: float64

	aoa	betweenness	clustering	degree	frequency	letters_count	orthographic_density	pagerank	phonemes_count	phonological_density	syllables_count	synonyms_count
Component-0	-0.427764	0.283759	-0.089472	0.245792	0.229850	-0.449654	0.221950	0.282619	-0.426400	0.276820	-0.159614	-0.001335
Component-1	0.345614	-0.376443	0.140480	-0.297578	-0.268808	-0.425018	0.152353	-0.315924	-0.428158	0.203662	-0.172741	-0.000870
Component-2	0.741466	0.238557	-0.143220	0.096703	0.586037	-0.082116	0.002898	0.050598	-0.018035	0.075508	0.014736	-0.066779

	aoa	frequency	letters_count
Component-0	-0.727905	0.367678	-0.578764
Component-1	0.438702	-0.398972	-0.805209