Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

First build our data.



In [3]:

    
model = Model(time=Time.continuous, source=Source.majority, past=Past.last_bin, durl=Durl.all, max_distance=1)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data









    



Got 7358 substitutions for model Model(time=Time.continuous, source=Source.majority, past=Past.last_bin, durl=Durl.all, max_distance=1)

100% (7358 of 7358) |######################| Elapsed Time: 0:02:03 Time: 0:02:03

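Compute cluster averages (so that repeated substitutions within a cluster are not counted as independent observations, which would make the confidence intervals too narrow) and crop the data so that the remaining CIs are acceptable.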



In [4]:

    
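# Select the columns to average with a list: recent pandas versions reject
# tuple-style selection on a groupby. 'feature' is a grouping key, so
# as_index=False already keeps it as a column.
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    [['source', 'source_rel', 'destination', 'destination_rel',
      'h0', 'h0_rel', 'h0n', 'h0n_rel']].mean()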
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan
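
To illustrate why the data is averaged per cluster before any statistics are computed, here is a minimal sketch on made-up numbers (the `toy` frame and its values are hypothetical, not taken from the database). A cluster with many substitutions would otherwise dominate the mean and inflate the sample size used for the CIs.

```python
import pandas as pd

# Hypothetical toy data: cluster 1 contributes three substitutions,
# cluster 2 only one.
toy = pd.DataFrame({
    'cluster_id': [1, 1, 1, 2],
    'feature': ['aoa', 'aoa', 'aoa', 'aoa'],
    'source': [4.0, 4.0, 4.0, 8.0],
    'destination': [3.0, 3.0, 3.0, 9.0],
})

# Naive mean: every substitution counts, so cluster 1 weighs three times
# as much as cluster 2.
naive_mean = toy['destination'].mean()

# Cluster-level mean: one row per (cluster, feature), so each cluster
# counts exactly once.
per_cluster = (toy
               .groupby(['cluster_id', 'feature'], as_index=False)
               [['source', 'destination']]
               .mean())
cluster_mean = per_cluster['destination'].mean()

print(naive_mean, cluster_mean, len(per_cluster))
```

After this step the number of observations entering a confidence interval is the number of clusters, not the number of substitutions.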

Prepare feature ordering.



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot about features

For a feature $\phi$, plot:

$\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
$\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

We also plot these values relative to the sentence average (the code below computes it as the sentence median, via `sentence_relative='median'`), i.e.:

$\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
$\nu_{\phi, r}^0$ (which is the average feature value minus the sentence average), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi, r}^{00}$ (which is the average feature value for synonyms of the source word minus the sentence average), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

These values are plotted first with absolute (global) feature values, then with sentence-relative feature values; in each case we use fixed-width bins first, then quantile bins.
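
As a rough illustration of the definition of $\nu_{\phi}$, the sketch below bins made-up source feature values and averages the destination feature within each bin. The series `source` and `destination` and the constant baseline `nu_0` are hypothetical stand-ins; the real null values come from `substitution.feature_average`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy substitutions: feature of the disappearing (source)
# word and of the appearing (destination) word.
source = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
destination = pd.Series([3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0])

# Fixed-width binning of the source feature (left-closed bins).
bin_ids, edges = pd.cut(source, 4, labels=False, retbins=True, right=False)
middles = (edges[:-1] + edges[1:]) / 2

# nu_phi: mean destination feature among the substitutions whose source
# feature falls in each bin.
nu_phi = destination.groupby(bin_ids).mean().to_numpy()

# Constant baseline used here only for illustration (toy global mean).
nu_0 = np.full(len(nu_phi), destination.mean())

print(nu_phi)            # one value per bin
print(nu_phi - middles)  # positive: above y = x, negative: below y = x
```

In this toy example the curve lies above $y = x$ for low source values and below it for high ones, which is the kind of pattern the plots are meant to reveal.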



In [6]:

    
def print_significance(name, bins, h0, h0n, values):
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = indices.sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            # Half-widths of the 95%, 99% and 99.9% confidence intervals.
            for j, alpha in enumerate([.05, .01, .001]):
                cis[i, j] = (stats.t.ppf(1 - alpha/2, n - 1)
                             * s / np.sqrt(n - 1))

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()
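
The star logic above can be sketched in isolation as follows. This is a simplified, hypothetical helper (`stars` is not part of the notebook): it counts how many of the nested Student-t intervals exclude the null mean. Note one assumed difference: the sketch uses the textbook standard error `s / sqrt(n)`, whereas the function above divides by `sqrt(n - 1)`, which gives slightly wider intervals.

```python
import numpy as np
from scipy import stats


def stars(values, null_mean, alphas=(.05, .01, .001)):
    """Return '*', '**', '***' or 'ns.' for mean(values) vs. null_mean."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Textbook standard error of the mean (sample std over sqrt(n)).
    sem = values.std(ddof=1) / np.sqrt(n)
    count = 0
    for alpha in alphas:
        # Half-width of the two-sided (1 - alpha) confidence interval.
        half_width = stats.t.ppf(1 - alpha / 2, n - 1) * sem
        if abs(values.mean() - null_mean) > half_width:
            count += 1
    return '*' * count if count else 'ns.'


sample = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]
print(stars(sample, 10.0))  # null equals the sample mean
print(stars(sample, 5.0))   # null far away from the sample mean
```

Because the three intervals are nested, counting the intervals that exclude the null is equivalent to taking the highest significance level reached.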



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Rescale limits if we're touching H0 or H00.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    elif h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)
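
The binning fallback used in `plot_variation` can be shown on its own. The sketch below is a hypothetical helper (`bin_feature` does not exist in the notebook) that mirrors the loop: it tries the requested number of bins and lowers it whenever the cut function raises `ValueError`, which `pd.qcut` does when heavily tied data yields duplicate quantile edges. This is presumably why some count-like features can end up with fewer than `BIN_COUNT` quantile bins.

```python
import pandas as pd


def bin_feature(x, max_bins=4, quantiles=False):
    """Bin x into at most max_bins bins, reducing the count on failure."""
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(max_bins, 0, -1):
        try:
            ids, edges = cut(x, bin_count, labels=False,
                             retbins=True, **cut_kws)
            return ids, edges, bin_count
        except ValueError:
            # Typically: duplicate bin edges because of tied values.
            continue
    raise ValueError('could not bin the data')


# Made-up discrete feature with only three distinct values.
x = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3])
_, _, fixed_count = bin_feature(x)
q_ids, _, quantile_count = bin_feature(x, quantiles=True)
print(fixed_count, quantile_count)
```

Unlike the loop in the notebook, this sketch raises explicitly if no bin count works, instead of leaving the bin variables undefined.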



In [8]:

    
def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws={}):
    g = sb.FacetGrid(data=data[data[feature_field]
                               .map(lambda f: f in features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, size=3)
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    scale = abs(h0s.mean())
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)
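
For reference, the normalisation applied in `plot_bias` amounts to the following arithmetic, shown here on made-up per-bin values (the three arrays are hypothetical): the bias is the gap between $\nu_{\phi}$ and $\nu_{\phi}^{00}$, divided by the absolute mean of $\nu_{\phi}^0$, and the bins are spread evenly over $[0, 1]$ so that features with different units can share one plot.

```python
import numpy as np

# Hypothetical per-bin values for one feature.
values = np.array([5.0, 6.0, 7.0, 8.0])    # nu_phi
h0s = np.array([10.0, 10.0, 10.0, 10.0])   # nu_phi^0
h0ns = np.array([6.0, 6.5, 7.0, 7.5])      # nu_phi^00

# Scale by the magnitude of the H_0 average, as plot_bias does.
scale = abs(h0s.mean())
bias = (values - h0ns) / scale

# Bins are placed at evenly spaced positions in [0, 1].
positions = np.linspace(0, 1, len(values))

print(positions)
print(bias)
```

A negative value means that the appearing word has, on average, a lower feature value than the synonym-based null predicts for that bin.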



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws={}):
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Bins of distribution of appeared global feature values

For each feature $\phi$, we plot the variation upon substitution as explained above.



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | *** | *** | ns. |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | *** | ns. | **  | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | *** | **  |
H_00 | *** | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *   | *** | *** |
H_00 | ns. | *** | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *   | *** | ns. | *   |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *** | ns. | **  | *   |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | ns. |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | **  | *** | *** |
H_00 | *** | ns. | ns. | **  |

Then plot $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature (i.e. the measured bias) to see how the features compare.



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *   | *** | ns. | *   |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | **  |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *** | ns. | **  | *   |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantiles of distribution of appeared global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | *   | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |
------------------------
H_0  | *** | *** | *** |
H_00 | *** | *   | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *   |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | *   | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | **  | *** | ns. | *** |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | **  | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | ns. | *** | *** |
H_00 | *** | ns. | ns. | **  |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *   |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | **  | *** | ns. | *** |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Bins of distribution of appeared sentence-relative values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *** |
H_00 | **  | ns. | ns. | *** |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | **  | *** |
H_00 | ns. | *** | ns. | **  |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | ns. |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | *** | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | ns. |
H_00 | ns. | *** | ns. | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *** | *   | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | **  | *** | *   | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | **  | *** | *** | **  |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *   |
H_00 | **  | **  | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | ns. |
H_00 | ns. | *** | ns. | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | ns. |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | **  | *** | *   | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantiles of distribution of appeared sentence-relative values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *** |
H_00 | **  | ns. | ns. | *   |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *** |
H_00 | *** | ns. | ns. | *   |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | **  |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | ns. | *** |
H_00 | *** | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *** | **  | *** |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | ns. | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *   | *** | *** | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | *** | **  | ns. | *   |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *   | *** | *** |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | *** | *   | ns. | *   |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})









    



---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | **  |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | ns. | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | ns. | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | ns. | *** | *** |
H_00 | *** | **  | ns. | *   |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We'd like to see how absolute and sentence-relative feature values interact. In particular, we want to know which effect dominates: cognitive bias, attraction to the sentence average, or attraction to the global feature average.

To do this, we plot the general direction (arrows) and strength (color) of the movement from source to destination words, for each pair of (absolute, sentence-relative) source feature values. In other words: given a word's absolute and relative feature values, if this word were substituted, where would its replacement lie in this (absolute, relative) space?

The interesting part of these plots is the attraction front, i.e. the line towards which all arrows point and where they join. We're interested in:

its slope
its shape (e.g. several slope regimes?)
its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)

First, here's our plotting function. (Note that we set the arrow size to a value that looks huge here, but gives normal sizes in the saved figures. There is probably some dpi scaling problem with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Compute binning.
    bin_count = 4
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # Compute bin values.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            # Substitutions whose source falls in bin (x, y).
            in_bin = (x_bins == x) & (y_bins == y)
            du = dest[in_bin] - source[in_bin]
            dv = dest_rel[in_bin] - source_rel[in_bin]
            u_values[y, x] = du.mean()
            v_values[y, x] = dv.mean()
            strength[y, x] = np.sqrt(du ** 2 + dv ** 2).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])
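
The per-bin averages computed by the nested loops in plot_stream can also be written as a single groupby. The following is only a sketch on made-up data: the toy frame and the mean_displacements helper are illustrative assumptions, not part of the notebook, and the helper is not a drop-in replacement for plot_stream.

```python
import numpy as np
import pandas as pd


def mean_displacements(df, bin_count=4):
    """Mean (u, v) displacement per (source, source_rel) bin.

    Returns two (bin_count, bin_count) arrays indexed [y, x], with NaN
    wherever a bin holds no substitution.
    """
    x_bins = pd.cut(df['source'], bin_count, right=False, labels=False)
    y_bins = pd.cut(df['source_rel'], bin_count, right=False, labels=False)
    moves = pd.DataFrame({
        'x': x_bins, 'y': y_bins,
        'u': df['destination'] - df['source'],
        'v': df['destination_rel'] - df['source_rel'],
    })
    means = moves.groupby(['y', 'x'])[['u', 'v']].mean()
    # Make sure empty bins are present (as NaN) before reshaping.
    full = pd.MultiIndex.from_product(
        [range(bin_count), range(bin_count)], names=['y', 'x'])
    means = means.reindex(full)
    u = means['u'].values.reshape(bin_count, bin_count)
    v = means['v'].values.reshape(bin_count, bin_count)
    return u, v


# Toy data: every word moves halfway towards 0.5 (absolute value)
# and halfway towards 0 (sentence-relative value).
rng = np.random.RandomState(0)
toy = pd.DataFrame({'source': rng.uniform(0, 1, 500),
                    'source_rel': rng.uniform(-1, 1, 500)})
toy['destination'] = toy['source'] + 0.5 * (0.5 - toy['source'])
toy['destination_rel'] = 0.5 * toy['source_rel']
u, v = mean_displacements(toy)
```

In this toy case the arrows in the leftmost column of bins point right and those in the rightmost column point left, so the attraction front would be the vertical line at 0.5.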

Here are the plots for all features



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features we expose in the paper



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute PCA on feature variations (note: on variations, not on features directly), and show the evolution of the first three components upon substitution.

CAVEAT: the PCA is computed on variations where all features are defined. This greatly reduces the number of words included, and also the number of substitutions (the actual numbers are given in the caveat section below; the reduction is drastic). It also affects $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are then computed using only the words for which all features are defined. This, again, hugely reduces the number of words taken into account, changing the values under the null hypotheses.
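
To make the caveat concrete, here is a small sketch of how a single undefined feature removes a whole cluster from the PCA input. The table is made up; only the pivot-then-dropna pattern is taken from the cells in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format table: one row per (cluster, feature) pair.
toy = pd.DataFrame({
    'cluster_id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'feature': ['aoa', 'frequency', 'letters_count'] * 3,
    'variation': [0.4, -1.2, 1.0,
                  np.nan, 0.3, -2.0,   # aoa is undefined for cluster 2
                  -0.1, 0.8, 0.0],
})

# One row per cluster, one column per feature.
wide = toy.pivot(index='cluster_id', columns='feature', values='variation')
# Keep only the clusters where every feature is defined.
complete = wide.dropna()

print('{} clusters mined, {} usable for the PCA'
      .format(len(wide), len(complete)))
```

The more features are included, the more likely it is that at least one of them is undefined for a given substitution, which is why the reduction is stronger when using all features than when using a small subset.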

4.1 On all the features

Compute the actual PCA



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the number of components, their explained variance and loadings.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 11 components.

Those explain the following variance:
[ 0.52965151  0.1847743   0.0775079   0.07113102  0.03371227  0.02999846
  0.01926641  0.01805821  0.01509079  0.0096798   0.00655659]

We're plotting variation for the first 3 components:






    Out[30]:






  
    
      
                   aoa  betweenness  clustering     degree  \
Component-0   0.528445    -0.271420    0.083731  -0.222847
Component-1   0.342208    -0.365379    0.109383  -0.289321
Component-2   0.348832     0.625277   -0.043001   0.171100

             frequency  letters_count  orthographic_density   pagerank  \
Component-0  -0.242064       0.419672             -0.218663  -0.262450
Component-1  -0.229756      -0.424334              0.194591  -0.292117
Component-2  -0.631099      -0.106504             -0.000281   0.197628

             phonemes_count  phonological_density  syllables_count  synonyms_count
Component-0        0.375893             -0.276713         0.145347       -0.001198
Component-1       -0.444309              0.266082        -0.169381        0.027833
Component-2       -0.033353              0.033945        -0.050172       -0.052467

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.
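
The internals of substitution.components are not shown in this notebook. As a rough illustration of what a component value is, the sketch below assumes it is the usual PCA projection, i.e. the centred feature vector of a word dotted with a component. It uses a plain SVD on made-up data rather than the fitted pca object, and the component_value helper is hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
# Made-up variations for 200 substitutions over 3 features.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])

# PCA by SVD of the centred data: the rows of Vt are the components.
mean = X.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
explained_variance_ratio = (singular_values ** 2
                            / np.sum(singular_values ** 2))


def component_value(features, k):
    """Project one feature vector onto component k."""
    return np.dot(np.asarray(features) - mean, Vt[k])


# Feature values of a hypothetical word, projected on each component.
word_features = np.array([2.0, -0.5, 0.1])
values = [component_value(word_features, k) for k in range(3)]
```

Note that in this section the components are fitted on variations but then applied to the feature values of source and destination words, so the projected values need not be centred around zero.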



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (7358 of 7358) |######################| Elapsed Time: 0:01:42 Time: 0:01:42

Compute cluster averages, so that clusters contributing many substitutions do not artificially narrow the confidence intervals.
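
As a sketch of why the averaging is done in two stages, consider the made-up table below. Unlike the cell that follows, this toy version includes cluster_id in the first grouping, purely to keep the example small.

```python
import pandas as pd

# Made-up measurements: cluster 1 contributes three rows, cluster 2 one.
toy = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 2],
    'destination_id': [10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0],
    'position':       [3, 3, 3, 5],
    'source':         [1.0, 3.0, 5.0, 8.0],
})

# Stage 1: collapse rows that describe the same destination word slot.
per_slot = toy.groupby(
    ['cluster_id', 'destination_id', 'occurrence', 'position'],
    as_index=False)[['source']].mean()
# Stage 2: one value per cluster, each slot weighing the same.
per_cluster = per_slot.groupby(
    'cluster_id', as_index=False)[['source']].mean()
```

Here cluster 1 ends up with the average of its two slots rather than the average of its three raw rows, and each cluster contributes a single point to the confidence interval computation.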



In [32]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    [['source', 'destination', 'h0', 'h0n']].mean()

Plot the actual variations of components (see the caveat section below)



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *** | **  | ns. | **  |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | ns. |
H_00 | ns. | *** | *** | ns. |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the number of components, their explained variance and loadings.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 2 components.

Those explain the following variance:
[ 0.66337507  0.2085776 ]







    Out[35]:






  
    
      
                   aoa  frequency  letters_count
Component-0   0.755290  -0.385516       0.530014
Component-1  -0.335417   0.467393       0.817948

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (7358 of 7358) |######################| Elapsed Time: 0:00:47 Time: 0:00:47

Compute cluster averages, so that clusters contributing many substitutions do not artificially narrow the confidence intervals.



In [37]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    [['source', 'destination', 'h0', 'h0n']].mean()

Plot the actual variations of components



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | ns. | ns. |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | *** | *** | *** |

4.3 CAVEAT: reduction of the numbers of words and substitutions

As explained above, this PCA analysis can only use words for which all the features are defined (in this case, the features listed in relevant_features). So note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Compute the number of words that have all relevant_features defined.
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 881 (cluster-unique) substitutions, but the PCA is in fact computed on 676 of them (those where all features are defined).

The way $\mathcal{H}_0$ and $\mathcal{H}_{00}$ are computed makes them also affected by this.

5 Interactions between features (by ANOVA)

Some useful variables first.



In [40]:

    
cuts = [('fixed bins', pd.cut)]#, ('quantiles', pd.qcut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'

Now for each feature, assess whether it interacts with the other features' destination values. We look at this for all pairs of features and all combinations of global/sentence-relative values. (The code also supports quantile binning, but only fixed-width bins are enabled in cuts above.) So it's a lot of answers.

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means not significant.
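
Before the full loop, here is a minimal sketch of a single such test on made-up data: bin the source values, then run a one-way ANOVA of the destination values across bins. The anova_p helper and the toy series are illustrative assumptions; the binning and the call to stats.f_oneway follow the pattern used in the cell below.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.RandomState(0)
source = pd.Series(np.linspace(0, 1, 200))
# Made-up destinations: one depends on the source, the other does not.
dependent = source + rng.normal(scale=0.05, size=200)
independent = pd.Series(rng.normal(size=200))


def anova_p(source, destination, bin_count=4):
    """p-value of a one-way ANOVA of destination across source bins."""
    bins = pd.cut(source, bin_count, labels=False)
    groups = [destination[bins == i].dropna() for i in range(bin_count)]
    _, p = stats.f_oneway(*groups)
    return p


p_dependent = anova_p(source, dependent)
p_independent = anova_p(source, independent)
```

The p-value obtained this way is what star_level turns into the stars printed below. A significant result only says that the mean destination value differs between at least two source bins; it says nothing about the direction or shape of the dependence.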



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
   ** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
  *** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  *** global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
   ** global -> global
  ns. global -> sentence-relative
   ** sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Now for each feature, look at its interaction with the other features' variation (i.e. destination - source). Same drill, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  *** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
   ** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
    * global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
  ns. global -> global
    * global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
   ** global -> sentence-relative
   ** sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

OK, so this can go on for a long time, and I'm not going to look at interactions through this lens (i.e. at the interaction of couples of features with another feature's destination values).

6 Regression
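
The regress helper defined in this section fits a linear model on source feature values, optionally with pairwise interaction terms built by PolynomialFeatures(degree=2, interaction_only=True). As a sketch of what those terms look like, the example below uses made-up data with a known answer; the feature names are only borrowed for readability.

```python
import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
names = ['aoa', 'frequency', 'letters_count']
X = rng.normal(size=(50, 3))
# Made-up target that depends only on the aoa * frequency interaction.
y = 2.0 + 3.0 * X[:, 0] * X[:, 1]

# Columns: constant, each feature, then each pairwise product.
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)
labels = [' * '.join(names[j] for j, p in enumerate(powers) if p > 0)
          or 'intercept'
          for powers in poly.powers_]

# The constant column already plays the role of the intercept.
linreg = linear_model.LinearRegression(fit_intercept=False)
linreg.fit(X_poly, y)
coeffs = dict(zip(labels, linreg.coef_))
```

Because the constant column is part of the transformed matrix, the intercept is fitted as an ordinary coefficient, which is why the model is created with fit_intercept=False when interactions are requested.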



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError("source_rel must be True, False or 'both'")
    if not isinstance(dest_rel, bool):
        raise ValueError('dest_rel must be a bool')
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    # Use the `data` argument rather than the global `variations` frame.
    destination = data[data.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
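    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
        # Prepend the fitted intercept. pd.concat replaces Series.append,
        # which was removed in pandas 2.0.
        coeffs = pd.concat([
            pd.Series(index=['intercept'], data=[linreg.intercept_]),
            coeffs])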
    with pd.option_context('display.max_rows', 999):
        print(coeffs)
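
The labelling of interaction terms in `regress` relies on the `powers_` attribute of `PolynomialFeatures`. As a minimal sketch on toy data (the three feature names and the values in `X` are made up for illustration, not taken from the dataset), this is how each row of `powers_` maps to a coefficient label, and why the intercept is not fitted separately when interactions are requested:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical feature names and toy values, for illustration only.
feature_names = ['global_aoa', 'global_frequency', 'global_letters_count']
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# degree=2 with interaction_only=True keeps the bias column, the original
# features and the pairwise products, but no squared terms.
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)

# Each row of powers_ gives the exponent of every input feature in one
# output column. An all-zero row is the bias column, labelled 'intercept'.
labels = [' * '.join(name
                     for name, power in zip(feature_names, powers)
                     if power > 0) or 'intercept'
          for powers in poly.powers_]

for label, column in zip(labels, X_poly.T):
    print('{:45s} {}'.format(label, column))
```

Because the transformed matrix already contains a constant column, the regression is fitted with `fit_intercept=False` in that case, and the intercept shows up as the first coefficient.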



In [45]:

    
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()
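
A note on reading the scores this loop prints: `linreg.score` is evaluated on the same measures the model was fitted on, so R^2 cannot decrease when interaction terms are added. With `source_rel='both'` and interactions there are 12 source columns plus 66 pairwise products, which is a lot for a few hundred measures. As a rough sketch that is not part of the original analysis, an adjusted R^2 can put the models on a more comparable footing; `adjusted_r2` is a hypothetical helper introduced here.

```python
def adjusted_r2(r2, n_measures, n_predictors):
    """Adjusted R^2: penalize R^2 for the number of predictors used.

    n_predictors counts the regressors excluding the intercept.
    """
    if n_measures - n_predictors - 1 <= 0:
        raise ValueError("need more measures than predictors + 1")
    return 1 - (1 - r2) * (n_measures - 1) / (n_measures - n_predictors - 1)


# Example call: 12 source columns without interactions versus
# 12 + 66 = 78 terms with interactions, both on 462 measures.
# The R^2 values are placeholders to be replaced by the printed scores.
plain = adjusted_r2(0.19, n_measures=462, n_predictors=12)
interacting = adjusted_r2(0.37, n_measures=462, n_predictors=78)
print(plain, interacting)
```

The gap between the two models narrows once the number of terms is taken into account, so the raw scores with interactions should be read with some caution.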









    



----------------------------------------------------------------------
Regressing global frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.1693552573435606

intercept                      3.840847
global_aoa                     0.127387
global_clustering              0.143919
global_frequency               0.538811
global_letters_count           0.001459
global_orthographic_density    0.101768
global_synonyms_count         -0.074403
dtype: float64

Regressing global frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.19556401294522374

intercept                                              0.828309
global_aoa                                            -0.104438
global_clustering                                      0.829330
global_frequency                                       1.103426
global_letters_count                                   0.567994
global_orthographic_density                            3.043509
global_synonyms_count                                 -0.314164
global_aoa * global_clustering                        -0.027622
global_aoa * global_frequency                          0.006338
global_aoa * global_letters_count                      0.000740
global_aoa * global_orthographic_density              -0.043448
global_aoa * global_synonyms_count                     0.101799
global_clustering * global_frequency                  -0.040875
global_clustering * global_letters_count              -0.041147
global_clustering * global_orthographic_density        0.166080
global_clustering * global_synonyms_count             -0.050853
global_frequency * global_letters_count               -0.091650
global_frequency * global_orthographic_density        -0.216836
global_frequency * global_synonyms_count              -0.021438
global_letters_count * global_orthographic_density     0.051281
global_letters_count * global_synonyms_count          -0.085038
global_orthographic_density * global_synonyms_count   -0.024118
dtype: float64

Regressing rel frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.1106601044826192

intercept                     -7.399689
global_aoa                     0.137283
global_clustering              0.142575
global_frequency               0.481995
global_letters_count           0.045064
global_orthographic_density    0.001758
global_synonyms_count          0.019079
dtype: float64

Regressing rel frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.12209704742869298

intercept                                             -13.783467
global_aoa                                              0.045699
global_clustering                                      -0.495636
global_frequency                                        1.339816
global_letters_count                                    0.281682
global_orthographic_density                             1.281838
global_synonyms_count                                  -0.941574
global_aoa * global_clustering                          0.006558
global_aoa * global_frequency                          -0.013823
global_aoa * global_letters_count                       0.026755
global_aoa * global_orthographic_density                0.026946
global_aoa * global_synonyms_count                      0.086746
global_clustering * global_frequency                    0.053400
global_clustering * global_letters_count                0.002820
global_clustering * global_orthographic_density         0.123900
global_clustering * global_synonyms_count              -0.026840
global_frequency * global_letters_count                -0.051344
global_frequency * global_orthographic_density         -0.116391
global_frequency * global_synonyms_count               -0.002850
global_letters_count * global_orthographic_density      0.038858
global_letters_count * global_synonyms_count            0.014450
global_orthographic_density * global_synonyms_count     0.117712
dtype: float64

Regressing global frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.08450110686095458

intercept                   9.645808
rel_aoa                     0.106847
rel_clustering              0.068451
rel_frequency               0.335418
rel_letters_count           0.029006
rel_orthographic_density    0.086533
rel_synonyms_count         -0.183123
dtype: float64

Regressing global frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.11514160513618865

intercept                                        9.594861
rel_aoa                                          0.194525
rel_clustering                                   0.078812
rel_frequency                                    0.347032
rel_letters_count                                0.046801
rel_orthographic_density                        -0.271237
rel_synonyms_count                               0.075868
rel_aoa * rel_clustering                        -0.023289
rel_aoa * rel_frequency                          0.033231
rel_aoa * rel_letters_count                     -0.025977
rel_aoa * rel_orthographic_density              -0.070158
rel_aoa * rel_synonyms_count                     0.068682
rel_clustering * rel_frequency                  -0.069936
rel_clustering * rel_letters_count              -0.002323
rel_clustering * rel_orthographic_density        0.189423
rel_clustering * rel_synonyms_count              0.079629
rel_frequency * rel_letters_count               -0.035340
rel_frequency * rel_orthographic_density        -0.102956
rel_frequency * rel_synonyms_count               0.050936
rel_letters_count * rel_orthographic_density     0.062407
rel_letters_count * rel_synonyms_count          -0.188302
rel_orthographic_density * rel_synonyms_count   -0.279502
dtype: float64

Regressing rel frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.33198233070635197

intercept                  -1.057987
rel_aoa                     0.092569
rel_clustering              0.260584
rel_frequency               0.702593
rel_letters_count          -0.052861
rel_orthographic_density   -0.104936
rel_synonyms_count         -0.092812
dtype: float64

Regressing rel frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.36056996046299983

intercept                                       -1.072601
rel_aoa                                          0.105109
rel_clustering                                   0.102887
rel_frequency                                    0.750872
rel_letters_count                               -0.127973
rel_orthographic_density                        -0.562293
rel_synonyms_count                               0.162963
rel_aoa * rel_clustering                        -0.066159
rel_aoa * rel_frequency                         -0.039050
rel_aoa * rel_letters_count                     -0.013556
rel_aoa * rel_orthographic_density               0.053355
rel_aoa * rel_synonyms_count                     0.135461
rel_clustering * rel_frequency                  -0.074828
rel_clustering * rel_letters_count               0.035536
rel_clustering * rel_orthographic_density        0.074865
rel_clustering * rel_synonyms_count              0.246726
rel_frequency * rel_letters_count               -0.047167
rel_frequency * rel_orthographic_density        -0.156512
rel_frequency * rel_synonyms_count               0.088315
rel_letters_count * rel_orthographic_density     0.015157
rel_letters_count * rel_synonyms_count          -0.115774
rel_orthographic_density * rel_synonyms_count   -0.037848
dtype: float64

Regressing global frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.17516703946265644

intercept                      2.194815
global_aoa                     0.106486
global_clustering              0.033600
global_frequency               0.585284
global_letters_count           0.035616
global_orthographic_density    0.236006
global_synonyms_count          0.211248
rel_aoa                        0.031921
rel_clustering                 0.118571
rel_frequency                 -0.054112
rel_letters_count             -0.037184
rel_orthographic_density      -0.168814
rel_synonyms_count            -0.321680
dtype: float64

Regressing global frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.2967054172196023

intercept                                                -28.126967
global_aoa                                                 0.108213
global_clustering                                         -4.819305
global_frequency                                           2.029352
global_letters_count                                       0.800416
global_orthographic_density                                6.930450
global_synonyms_count                                      4.499600
rel_aoa                                                    1.118393
rel_clustering                                             3.521761
rel_frequency                                             -1.799887
rel_letters_count                                         -0.218401
rel_orthographic_density                                  -1.725065
rel_synonyms_count                                        -2.881345
global_aoa * global_clustering                             0.186743
global_aoa * global_frequency                              0.119274
global_aoa * global_letters_count                          0.046290
global_aoa * global_orthographic_density                  -0.049481
global_aoa * global_synonyms_count                        -0.395497
global_aoa * rel_aoa                                      -0.024834
global_aoa * rel_clustering                               -0.154271
global_aoa * rel_frequency                                 0.029009
global_aoa * rel_letters_count                             0.007614
global_aoa * rel_orthographic_density                     -0.004901
global_aoa * rel_synonyms_count                            0.443183
global_clustering * global_frequency                       0.177399
global_clustering * global_letters_count                  -0.122135
global_clustering * global_orthographic_density            0.966642
global_clustering * global_synonyms_count                 -0.072794
global_clustering * rel_aoa                               -0.324654
global_clustering * rel_clustering                         0.227457
global_clustering * rel_frequency                         -0.254486
global_clustering * rel_letters_count                      0.279827
global_clustering * rel_orthographic_density              -0.576527
global_clustering * rel_synonyms_count                     0.233356
global_frequency * global_letters_count                   -0.242328
global_frequency * global_orthographic_density            -0.113567
global_frequency * global_synonyms_count                  -0.546737
global_frequency * rel_aoa                                -0.295777
global_frequency * rel_clustering                          0.044180
global_frequency * rel_frequency                          -0.018174
global_frequency * rel_letters_count                       0.234071
global_frequency * rel_orthographic_density               -0.147725
global_frequency * rel_synonyms_count                      0.555494
global_letters_count * global_orthographic_density         0.141762
global_letters_count * global_synonyms_count               0.640149
global_letters_count * rel_aoa                            -0.040773
global_letters_count * rel_clustering                      0.001326
global_letters_count * rel_frequency                       0.026200
global_letters_count * rel_letters_count                  -0.011263
global_letters_count * rel_orthographic_density           -0.114660
global_letters_count * rel_synonyms_count                 -0.927256
global_orthographic_density * global_synonyms_count        0.043582
global_orthographic_density * rel_aoa                      0.126854
global_orthographic_density * rel_clustering              -0.694487
global_orthographic_density * rel_frequency                0.138824
global_orthographic_density * rel_letters_count           -0.148062
global_orthographic_density * rel_orthographic_density    -0.118980
global_orthographic_density * rel_synonyms_count           0.045675
global_synonyms_count * rel_aoa                            0.300731
global_synonyms_count * rel_clustering                    -0.250414
global_synonyms_count * rel_frequency                      0.195736
global_synonyms_count * rel_letters_count                 -0.224814
global_synonyms_count * rel_orthographic_density           0.245203
global_synonyms_count * rel_synonyms_count                 0.067156
rel_aoa * rel_clustering                                   0.147821
rel_aoa * rel_frequency                                    0.104477
rel_aoa * rel_letters_count                               -0.005390
rel_aoa * rel_orthographic_density                        -0.113520
rel_aoa * rel_synonyms_count                              -0.263252
rel_clustering * rel_frequency                             0.001703
rel_clustering * rel_letters_count                        -0.192500
rel_clustering * rel_orthographic_density                  0.330792
rel_clustering * rel_synonyms_count                        0.493686
rel_frequency * rel_letters_count                         -0.083413
rel_frequency * rel_orthographic_density                  -0.059641
rel_frequency * rel_synonyms_count                        -0.070741
rel_letters_count * rel_orthographic_density               0.135739
rel_letters_count * rel_synonyms_count                     0.425405
rel_orthographic_density * rel_synonyms_count             -0.451413
dtype: float64

Regressing rel frequency with 503 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.3778907256120312

intercept                      2.045981
global_aoa                     0.104254
global_clustering              0.166029
global_frequency              -0.340932
global_letters_count           0.085144
global_orthographic_density    0.256582
global_synonyms_count          0.155418
rel_aoa                        0.017393
rel_clustering                 0.032290
rel_frequency                  0.909296
rel_letters_count             -0.085320
rel_orthographic_density      -0.167599
rel_synonyms_count            -0.241426
dtype: float64

Regressing rel frequency with 503 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.4639269952238518

intercept                                                -33.850317
global_aoa                                                 0.203795
global_clustering                                         -5.377929
global_frequency                                           1.419909
global_letters_count                                       1.412397
global_orthographic_density                                6.919174
global_synonyms_count                                      4.321909
rel_aoa                                                    0.865845
rel_clustering                                             5.247571
rel_frequency                                             -1.133459
rel_letters_count                                         -1.238270
rel_orthographic_density                                  -2.464171
rel_synonyms_count                                        -2.989525
global_aoa * global_clustering                             0.149365
global_aoa * global_frequency                              0.098470
global_aoa * global_letters_count                          0.027969
global_aoa * global_orthographic_density                  -0.071208
global_aoa * global_synonyms_count                        -0.359657
global_aoa * rel_aoa                                      -0.025225
global_aoa * rel_clustering                               -0.147376
global_aoa * rel_frequency                                 0.040975
global_aoa * rel_letters_count                             0.037186
global_aoa * rel_orthographic_density                      0.029407
global_aoa * rel_synonyms_count                            0.404747
global_clustering * global_frequency                       0.233302
global_clustering * global_letters_count                   0.007755
global_clustering * global_orthographic_density            0.906951
global_clustering * global_synonyms_count                 -0.116633
global_clustering * rel_aoa                               -0.299559
global_clustering * rel_clustering                         0.200455
global_clustering * rel_frequency                         -0.269075
global_clustering * rel_letters_count                      0.151004
global_clustering * rel_orthographic_density              -0.538406
global_clustering * rel_synonyms_count                     0.201465
global_frequency * global_letters_count                   -0.192982
global_frequency * global_orthographic_density            -0.114617
global_frequency * global_synonyms_count                  -0.551386
global_frequency * rel_aoa                                -0.267014
global_frequency * rel_clustering                         -0.079682
global_frequency * rel_frequency                          -0.004619
global_frequency * rel_letters_count                       0.213402
global_frequency * rel_orthographic_density               -0.109577
global_frequency * rel_synonyms_count                      0.529746
global_letters_count * global_orthographic_density         0.117702
global_letters_count * global_synonyms_count               0.570606
global_letters_count * rel_aoa                            -0.028088
global_letters_count * rel_clustering                     -0.168521
global_letters_count * rel_frequency                       0.012927
global_letters_count * rel_letters_count                  -0.008771
global_letters_count * rel_orthographic_density           -0.056752
global_letters_count * rel_synonyms_count                 -0.828239
global_orthographic_density * global_synonyms_count        0.074730
global_orthographic_density * rel_aoa                      0.129193
global_orthographic_density * rel_clustering              -0.698142
global_orthographic_density * rel_frequency                0.170530
global_orthographic_density * rel_letters_count           -0.080347
global_orthographic_density * rel_orthographic_density    -0.094325
global_orthographic_density * rel_synonyms_count           0.030848
global_synonyms_count * rel_aoa                            0.290735
global_synonyms_count * rel_clustering                    -0.238451
global_synonyms_count * rel_frequency                      0.179864
global_synonyms_count * rel_letters_count                 -0.203577
global_synonyms_count * rel_orthographic_density           0.242639
global_synonyms_count * rel_synonyms_count                 0.072266
rel_aoa * rel_clustering                                   0.148836
rel_aoa * rel_frequency                                    0.073092
rel_aoa * rel_letters_count                               -0.033583
rel_aoa * rel_orthographic_density                        -0.139719
rel_aoa * rel_synonyms_count                              -0.260170
rel_clustering * rel_frequency                             0.079019
rel_clustering * rel_letters_count                        -0.029333
rel_clustering * rel_orthographic_density                  0.337743
rel_clustering * rel_synonyms_count                        0.509876
rel_frequency * rel_letters_count                         -0.085503
rel_frequency * rel_orthographic_density                  -0.111930
rel_frequency * rel_synonyms_count                        -0.044734
rel_letters_count * rel_orthographic_density               0.061457
rel_letters_count * rel_synonyms_count                     0.386432
rel_orthographic_density * rel_synonyms_count             -0.458680
dtype: float64

----------------------------------------------------------------------
Regressing global aoa with 462 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.14784799901996304

intercept                      4.713029
global_aoa                     0.297249
global_clustering             -0.151249
global_frequency              -0.094692
global_letters_count           0.086017
global_orthographic_density   -0.178521
global_synonyms_count         -0.120407
dtype: float64

Regressing global aoa with 462 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.2017683392147247

intercept                                             -3.110006
global_aoa                                             1.141404
global_clustering                                     -2.444321
global_frequency                                      -0.177858
global_letters_count                                   0.180022
global_orthographic_density                           -2.329623
global_synonyms_count                                 -1.629732
global_aoa * global_clustering                         0.054183
global_aoa * global_frequency                         -0.087393
global_aoa * global_letters_count                      0.032741
global_aoa * global_orthographic_density               0.056359
global_aoa * global_synonyms_count                    -0.106714
global_clustering * global_frequency                   0.052317
global_clustering * global_letters_count               0.268154
global_clustering * global_orthographic_density       -0.132024
global_clustering * global_synonyms_count             -0.163639
global_frequency * global_letters_count                0.133636
global_frequency * global_orthographic_density         0.105212
global_frequency * global_synonyms_count               0.057754
global_letters_count * global_orthographic_density    -0.022897
global_letters_count * global_synonyms_count           0.086429
global_orthographic_density * global_synonyms_count    0.148535
dtype: float64

Regressing rel aoa with 462 measures, no interactions
           ^^^^^^^
R^2 = 0.05880053073259861

intercept                      0.330176
global_aoa                     0.137538
global_clustering             -0.113925
global_frequency              -0.182012
global_letters_count           0.059270
global_orthographic_density    0.053172
global_synonyms_count         -0.104642
dtype: float64

Regressing rel aoa with 462 measures, with interactions
           ^^^^^^^
R^2 = 0.1093164349304947

intercept                                              2.424067
global_aoa                                             1.377960
global_clustering                                     -0.003734
global_frequency                                      -0.656781
global_letters_count                                  -0.703091
global_orthographic_density                           -2.417698
global_synonyms_count                                 -1.660621
global_aoa * global_clustering                         0.080520
global_aoa * global_frequency                         -0.081938
global_aoa * global_letters_count                     -0.015752
global_aoa * global_orthographic_density               0.034838
global_aoa * global_synonyms_count                    -0.042279
global_clustering * global_frequency                  -0.036668
global_clustering * global_letters_count               0.020422
global_clustering * global_orthographic_density       -0.321648
global_clustering * global_synonyms_count             -0.264223
global_frequency * global_letters_count                0.121615
global_frequency * global_orthographic_density         0.083584
global_frequency * global_synonyms_count              -0.010831
global_letters_count * global_orthographic_density    -0.090470
global_letters_count * global_synonyms_count           0.037690
global_orthographic_density * global_synonyms_count    0.105141
dtype: float64

Regressing global aoa with 462 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.05948703314076875

intercept                   6.658963
rel_aoa                     0.116084
rel_clustering              0.129909
rel_frequency               0.061890
rel_letters_count          -0.058977
rel_orthographic_density   -0.541202
rel_synonyms_count         -0.230622
dtype: float64

Regressing global aoa with 462 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.13529266245692406

intercept                                        6.414916
rel_aoa                                         -0.241714
rel_clustering                                  -0.112431
rel_frequency                                    0.012741
rel_letters_count                               -0.061912
rel_orthographic_density                        -0.631087
rel_synonyms_count                               0.151620
rel_aoa * rel_clustering                         0.088834
rel_aoa * rel_frequency                         -0.128012
rel_aoa * rel_letters_count                      0.051089
rel_aoa * rel_orthographic_density               0.127967
rel_aoa * rel_synonyms_count                    -0.046068
rel_clustering * rel_frequency                   0.078144
rel_clustering * rel_letters_count               0.244488
rel_clustering * rel_orthographic_density        0.116504
rel_clustering * rel_synonyms_count             -0.101655
rel_frequency * rel_letters_count                0.036343
rel_frequency * rel_orthographic_density         0.014968
rel_frequency * rel_synonyms_count               0.029876
rel_letters_count * rel_orthographic_density     0.001607
rel_letters_count * rel_synonyms_count           0.172015
rel_orthographic_density * rel_synonyms_count    0.728087
dtype: float64

Regressing rel aoa with 462 measures, no interactions
           ^^^^^^^
R^2 = 0.22001841180704673

intercept                   0.363160
rel_aoa                     0.477864
rel_clustering             -0.037636
rel_frequency              -0.072521
rel_letters_count          -0.016124
rel_orthographic_density    0.052489
rel_synonyms_count         -0.174989
dtype: float64

Regressing rel aoa with 462 measures, with interactions
           ^^^^^^^
R^2 = 0.25025133945749967

intercept                                        0.421336
rel_aoa                                          0.385185
rel_clustering                                  -0.122304
rel_frequency                                   -0.034759
rel_letters_count                               -0.026928
rel_orthographic_density                         0.384971
rel_synonyms_count                               0.059128
rel_aoa * rel_clustering                         0.066477
rel_aoa * rel_frequency                         -0.034464
rel_aoa * rel_letters_count                     -0.013560
rel_aoa * rel_orthographic_density              -0.005431
rel_aoa * rel_synonyms_count                    -0.125440
rel_clustering * rel_frequency                   0.018507
rel_clustering * rel_letters_count               0.045120
rel_clustering * rel_orthographic_density       -0.014379
rel_clustering * rel_synonyms_count             -0.254260
rel_frequency * rel_letters_count                0.016036
rel_frequency * rel_orthographic_density         0.099208
rel_frequency * rel_synonyms_count               0.010015
rel_letters_count * rel_orthographic_density    -0.044541
rel_letters_count * rel_synonyms_count           0.099389
rel_orthographic_density * rel_synonyms_count    0.257897
dtype: float64

Regressing global aoa with 462 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.19362123560523356

intercept                      0.434991
global_aoa                     0.410955
global_clustering             -0.478063
global_frequency              -0.130386
global_letters_count           0.464201
global_orthographic_density    0.017931
global_synonyms_count          0.330983
rel_aoa                       -0.168105
rel_clustering                 0.345253
rel_frequency                  0.004358
rel_letters_count             -0.428045
rel_orthographic_density      -0.131909
rel_synonyms_count            -0.522108
dtype: float64

Regressing global aoa with 462 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.3703330663900626

intercept                                                 100.232177
global_aoa                                                 -0.161828
global_clustering                                          12.215915
global_frequency                                           -4.332160
global_letters_count                                       -5.770915
global_orthographic_density                               -19.611259
global_synonyms_count                                     -12.097041
rel_aoa                                                     0.058630
rel_clustering                                            -10.359042
rel_frequency                                               5.337029
rel_letters_count                                           4.531699
rel_orthographic_density                                   12.281217
rel_synonyms_count                                         -5.217451
global_aoa * global_clustering                             -0.053176
global_aoa * global_frequency                              -0.101497
global_aoa * global_letters_count                           0.101469
global_aoa * global_orthographic_density                    0.258624
global_aoa * global_synonyms_count                          0.141997
global_aoa * rel_aoa                                       -0.001695
global_aoa * rel_clustering                                -0.019812
global_aoa * rel_frequency                                 -0.103979
global_aoa * rel_letters_count                             -0.173311
global_aoa * rel_orthographic_density                      -0.326579
global_aoa * rel_synonyms_count                             0.013828
global_clustering * global_frequency                       -0.465695
global_clustering * global_letters_count                   -0.246990
global_clustering * global_orthographic_density            -2.614471
global_clustering * global_synonyms_count                  -1.566964
global_clustering * rel_aoa                                 0.272478
global_clustering * rel_clustering                         -0.098087
global_clustering * rel_frequency                           0.691899
global_clustering * rel_letters_count                       0.006113
global_clustering * rel_orthographic_density                1.462193
global_clustering * rel_synonyms_count                      0.384847
global_frequency * global_letters_count                     0.383800
global_frequency * global_orthographic_density              0.141486
global_frequency * global_synonyms_count                    0.258391
global_frequency * rel_aoa                                  0.163030
global_frequency * rel_clustering                           0.220424
global_frequency * rel_frequency                           -0.022101
global_frequency * rel_letters_count                       -0.358680
global_frequency * rel_orthographic_density                -0.130004
global_frequency * rel_synonyms_count                       0.440990
global_letters_count * global_orthographic_density          0.163640
global_letters_count * global_synonyms_count               -0.314329
global_letters_count * rel_aoa                             -0.014387
global_letters_count * rel_clustering                       0.626184
global_letters_count * rel_frequency                        0.001511
global_letters_count * rel_letters_count                    0.068697
global_letters_count * rel_orthographic_density            -0.096781
global_letters_count * rel_synonyms_count                   0.948899
global_orthographic_density * global_synonyms_count         0.059427
global_orthographic_density * rel_aoa                      -0.058991
global_orthographic_density * rel_clustering                2.042211
global_orthographic_density * rel_frequency                -0.069834
global_orthographic_density * rel_letters_count            -0.232034
global_orthographic_density * rel_orthographic_density      0.227763
global_orthographic_density * rel_synonyms_count           -0.713399
global_synonyms_count * rel_aoa                            -0.395097
global_synonyms_count * rel_clustering                      1.821113
global_synonyms_count * rel_frequency                      -0.581083
global_synonyms_count * rel_letters_count                  -0.248646
global_synonyms_count * rel_orthographic_density           -0.319759
global_synonyms_count * rel_synonyms_count                  0.163352
rel_aoa * rel_clustering                                    0.019538
rel_aoa * rel_frequency                                    -0.055319
rel_aoa * rel_letters_count                                 0.068371
rel_aoa * rel_orthographic_density                          0.123374
rel_aoa * rel_synonyms_count                                0.133492
rel_clustering * rel_frequency                             -0.423781
rel_clustering * rel_letters_count                         -0.093199
rel_clustering * rel_orthographic_density                  -0.676931
rel_clustering * rel_synonyms_count                        -1.171781
rel_frequency * rel_letters_count                           0.084040
rel_frequency * rel_orthographic_density                    0.181743
rel_frequency * rel_synonyms_count                         -0.077556
rel_letters_count * rel_orthographic_density                0.300115
rel_letters_count * rel_synonyms_count                     -0.254797
rel_orthographic_density * rel_synonyms_count               1.269740
dtype: float64

Regressing rel aoa with 462 measures, no interactions
           ^^^^^^^
R^2 = 0.27260817842448093

intercept                      1.100010
global_aoa                    -0.396632
global_clustering             -0.312528
global_frequency              -0.118866
global_letters_count           0.258524
global_orthographic_density   -0.108360
global_synonyms_count          0.292017
rel_aoa                        0.756344
rel_clustering                 0.294350
rel_frequency                 -0.002163
rel_letters_count             -0.234449
rel_orthographic_density      -0.019596
rel_synonyms_count            -0.497012
dtype: float64

Regressing rel aoa with 462 measures, with interactions
           ^^^^^^^
R^2 = 0.4351352198718219

intercept                                                 107.257258
global_aoa                                                 -2.189975
global_clustering                                          13.919198
global_frequency                                           -5.161930
global_letters_count                                       -5.214899
global_orthographic_density                               -15.901749
global_synonyms_count                                     -11.681989
rel_aoa                                                     0.866531
rel_clustering                                            -12.618937
rel_frequency                                               4.949186
rel_letters_count                                           4.376722
rel_orthographic_density                                    9.072232
rel_synonyms_count                                         -1.553406
global_aoa * global_clustering                             -0.238712
global_aoa * global_frequency                              -0.097980
global_aoa * global_letters_count                           0.092006
global_aoa * global_orthographic_density                    0.274381
global_aoa * global_synonyms_count                          0.397057
global_aoa * rel_aoa                                       -0.013813
global_aoa * rel_clustering                                 0.208676
global_aoa * rel_frequency                                 -0.100700
global_aoa * rel_letters_count                             -0.167598
global_aoa * rel_orthographic_density                      -0.341827
global_aoa * rel_synonyms_count                            -0.284153
global_clustering * global_frequency                       -0.624562
global_clustering * global_letters_count                   -0.224817
global_clustering * global_orthographic_density            -2.240956
global_clustering * global_synonyms_count                  -1.523694
global_clustering * rel_aoa                                 0.179689
global_clustering * rel_clustering                         -0.204744
global_clustering * rel_frequency                           0.651924
global_clustering * rel_letters_count                       0.074127
global_clustering * rel_orthographic_density                1.062585
global_clustering * rel_synonyms_count                      0.625355
global_frequency * global_letters_count                     0.371400
global_frequency * global_orthographic_density              0.079971
global_frequency * global_synonyms_count                    0.190262
global_frequency * rel_aoa                                  0.157473
global_frequency * rel_clustering                           0.328755
global_frequency * rel_frequency                           -0.002290
global_frequency * rel_letters_count                       -0.332509
global_frequency * rel_orthographic_density                -0.146786
global_frequency * rel_synonyms_count                       0.352356
global_letters_count * global_orthographic_density         -0.051193
global_letters_count * global_synonyms_count               -0.500676
global_letters_count * rel_aoa                             -0.045045
global_letters_count * rel_clustering                       0.499326
global_letters_count * rel_frequency                       -0.046699
global_letters_count * rel_letters_count                    0.073723
global_letters_count * rel_orthographic_density             0.128642
global_letters_count * rel_synonyms_count                   0.941446
global_orthographic_density * global_synonyms_count         0.032707
global_orthographic_density * rel_aoa                      -0.120837
global_orthographic_density * rel_clustering                2.057537
global_orthographic_density * rel_frequency                -0.003805
global_orthographic_density * rel_letters_count            -0.046479
global_orthographic_density * rel_orthographic_density      0.292491
global_orthographic_density * rel_synonyms_count           -0.519786
global_synonyms_count * rel_aoa                            -0.402875
global_synonyms_count * rel_clustering                      1.838459
global_synonyms_count * rel_frequency                      -0.348639
global_synonyms_count * rel_letters_count                   0.081067
global_synonyms_count * rel_orthographic_density           -0.132329
global_synonyms_count * rel_synonyms_count                  0.178252
rel_aoa * rel_clustering                                    0.010183
rel_aoa * rel_frequency                                    -0.039447
rel_aoa * rel_letters_count                                 0.078544
rel_aoa * rel_orthographic_density                          0.142348
rel_aoa * rel_synonyms_count                                0.175844
rel_clustering * rel_frequency                             -0.348009
rel_clustering * rel_letters_count                         -0.104226
rel_clustering * rel_orthographic_density                  -0.769058
rel_clustering * rel_synonyms_count                        -1.491669
rel_frequency * rel_letters_count                           0.095183
rel_frequency * rel_orthographic_density                    0.134725
rel_frequency * rel_synonyms_count                         -0.183881
rel_letters_count * rel_orthographic_density                0.139188
rel_letters_count * rel_synonyms_count                     -0.467564
rel_orthographic_density * rel_synonyms_count               0.662367
dtype: float64
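
The listings above pair each "no interactions" fit with a "with interactions" fit, where the second adds one product term for every pair of distinct features (15 extra terms for 6 features, 66 for 12). The code that produced them is not shown in this part of the output, so what follows is only a minimal sketch of that kind of fit, assuming ordinary least squares. The `regress` helper, the use of `np.linalg.lstsq`, and the synthetic data are all illustrative assumptions, not the notebook's own implementation.

```python
from itertools import combinations

import numpy as np


def regress(X, y, names, interactions=False):
    """Ordinary least squares of y on the columns of X (hypothetical helper).

    With interactions=True, the product of every pair of distinct columns
    is added as an extra regressor, labelled "a * b".
    Returns (r2, coefficients), where coefficients maps labels to values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [X[:, i] for i in range(X.shape[1])]
    labels = list(names)
    if interactions:
        for i, j in combinations(range(X.shape[1]), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    # A leading column of ones carries the intercept.
    design = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return r2, dict(zip(['intercept'] + labels, beta))


# Synthetic data, for illustration only (not the substitution data).
rng = np.random.default_rng(0)
names = ['global_aoa', 'global_frequency', 'global_letters_count']
X = rng.normal(size=(200, 3))
y = (1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.1, size=200))

r2_plain, coef_plain = regress(X, y, names)
r2_inter, coef_inter = regress(X, y, names, interactions=True)
print('R^2 = {}'.format(r2_plain))
print('R^2 = {}'.format(r2_inter))
```

Because the model without interactions is nested inside the model with them, the in-sample R^2 of the second fit can never be lower than that of the first on the same data, which is the pattern seen in every pair above. With 78 regressors and a few hundred measures, part of that increase may simply reflect the extra degrees of freedom, so the gap is probably best read as an upper bound on what the interaction terms explain.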

----------------------------------------------------------------------
Regressing global clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.13921698421566164

intercept                     -3.293251
global_aoa                    -0.045066
global_clustering              0.328217
global_frequency              -0.051314
global_letters_count           0.033465
global_orthographic_density   -0.005343
global_synonyms_count         -0.095923
dtype: float64

Regressing global clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.20248198233829784

intercept                                             -0.429890
global_aoa                                             0.361220
global_clustering                                      0.700112
global_frequency                                      -0.734019
global_letters_count                                  -0.031478
global_orthographic_density                            0.175735
global_synonyms_count                                 -1.578926
global_aoa * global_clustering                         0.052523
global_aoa * global_frequency                         -0.007870
global_aoa * global_letters_count                     -0.001512
global_aoa * global_orthographic_density              -0.011637
global_aoa * global_synonyms_count                     0.001866
global_clustering * global_frequency                  -0.097913
global_clustering * global_letters_count               0.016198
global_clustering * global_orthographic_density        0.062676
global_clustering * global_synonyms_count             -0.155140
global_frequency * global_letters_count                0.019678
global_frequency * global_orthographic_density         0.042092
global_frequency * global_synonyms_count               0.018206
global_letters_count * global_orthographic_density    -0.025353
global_letters_count * global_synonyms_count           0.058406
global_orthographic_density * global_synonyms_count    0.045646
dtype: float64

Regressing rel clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.09723767762863389

intercept                      2.441173
global_aoa                    -0.042883
global_clustering              0.269478
global_frequency              -0.036025
global_letters_count           0.020651
global_orthographic_density   -0.001631
global_synonyms_count         -0.124089
dtype: float64

Regressing rel clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.15544196820884826

intercept                                              7.264651
global_aoa                                             0.255187
global_clustering                                      0.986380
global_frequency                                      -0.781511
global_letters_count                                  -0.212589
global_orthographic_density                            0.386283
global_synonyms_count                                 -1.227788
global_aoa * global_clustering                         0.040827
global_aoa * global_frequency                         -0.003200
global_aoa * global_letters_count                      0.001132
global_aoa * global_orthographic_density              -0.018694
global_aoa * global_synonyms_count                    -0.026827
global_clustering * global_frequency                  -0.110947
global_clustering * global_letters_count              -0.011146
global_clustering * global_orthographic_density        0.068217
global_clustering * global_synonyms_count             -0.114506
global_frequency * global_letters_count                0.017072
global_frequency * global_orthographic_density         0.024462
global_frequency * global_synonyms_count               0.001931
global_letters_count * global_orthographic_density    -0.017590
global_letters_count * global_synonyms_count           0.090174
global_orthographic_density * global_synonyms_count    0.053284
dtype: float64

Regressing global clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.0719060212119641

intercept                  -5.931226
rel_aoa                    -0.015098
rel_clustering              0.262345
rel_frequency              -0.009358
rel_letters_count           0.013118
rel_orthographic_density    0.000139
rel_synonyms_count         -0.092510
dtype: float64

Regressing global clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.12945924643167284

intercept                                       -5.848151
rel_aoa                                         -0.037081
rel_clustering                                   0.088056
rel_frequency                                    0.027019
rel_letters_count                               -0.021272
rel_orthographic_density                         0.039447
rel_synonyms_count                              -0.103529
rel_aoa * rel_clustering                         0.076356
rel_aoa * rel_frequency                         -0.009491
rel_aoa * rel_letters_count                     -0.012625
rel_aoa * rel_orthographic_density               0.006444
rel_aoa * rel_synonyms_count                     0.014252
rel_clustering * rel_frequency                  -0.033094
rel_clustering * rel_letters_count               0.031616
rel_clustering * rel_orthographic_density        0.050720
rel_clustering * rel_synonyms_count             -0.039878
rel_frequency * rel_letters_count               -0.005864
rel_frequency * rel_orthographic_density         0.004001
rel_frequency * rel_synonyms_count               0.006731
rel_letters_count * rel_orthographic_density    -0.025754
rel_letters_count * rel_synonyms_count           0.057443
rel_orthographic_density * rel_synonyms_count    0.115098
dtype: float64

Regressing rel clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.2044643913731714

intercept                   0.190609
rel_aoa                    -0.021966
rel_clustering              0.456714
rel_frequency               0.002282
rel_letters_count           0.012970
rel_orthographic_density    0.020976
rel_synonyms_count         -0.068099
dtype: float64

Regressing rel clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.24660967787495025

intercept                                        0.241105
rel_aoa                                         -0.029137
rel_clustering                                   0.273433
rel_frequency                                    0.027578
rel_letters_count                               -0.006031
rel_orthographic_density                         0.009165
rel_synonyms_count                              -0.080089
rel_aoa * rel_clustering                         0.041923
rel_aoa * rel_frequency                         -0.005532
rel_aoa * rel_letters_count                     -0.020489
rel_aoa * rel_orthographic_density              -0.022570
rel_aoa * rel_synonyms_count                    -0.007383
rel_clustering * rel_frequency                  -0.049093
rel_clustering * rel_letters_count               0.028743
rel_clustering * rel_orthographic_density        0.050869
rel_clustering * rel_synonyms_count             -0.081278
rel_frequency * rel_letters_count               -0.007150
rel_frequency * rel_orthographic_density        -0.018771
rel_frequency * rel_synonyms_count               0.005519
rel_letters_count * rel_orthographic_density    -0.015187
rel_letters_count * rel_synonyms_count           0.075109
rel_orthographic_density * rel_synonyms_count    0.122627
dtype: float64

Regressing global clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.15767357478676658

intercept                     -1.827993
global_aoa                    -0.061252
global_clustering              0.426854
global_frequency              -0.099781
global_letters_count           0.036025
global_orthographic_density   -0.079522
global_synonyms_count         -0.095232
rel_aoa                        0.019014
rel_clustering                -0.100955
rel_frequency                  0.058768
rel_letters_count             -0.006415
rel_orthographic_density       0.088745
rel_synonyms_count            -0.020173
dtype: float64

Regressing global clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.3293374673563825

intercept                                                 22.001854
global_aoa                                                -0.113978
global_clustering                                          5.513673
global_frequency                                          -1.873126
global_letters_count                                       0.322800
global_orthographic_density                               -0.350901
global_synonyms_count                                     -0.671422
rel_aoa                                                   -0.122788
rel_clustering                                            -4.405948
rel_frequency                                              0.354490
rel_letters_count                                         -1.242981
rel_orthographic_density                                  -1.551325
rel_synonyms_count                                        -2.599507
global_aoa * global_clustering                            -0.087980
global_aoa * global_frequency                              0.011182
global_aoa * global_letters_count                         -0.050668
global_aoa * global_orthographic_density                  -0.196512
global_aoa * global_synonyms_count                         0.033327
global_aoa * rel_aoa                                       0.015588
global_aoa * rel_clustering                                0.100887
global_aoa * rel_frequency                                 0.000624
global_aoa * rel_letters_count                             0.053404
global_aoa * rel_orthographic_density                      0.161782
global_aoa * rel_synonyms_count                            0.014141
global_clustering * global_frequency                      -0.321052
global_clustering * global_letters_count                  -0.045155
global_clustering * global_orthographic_density           -0.392140
global_clustering * global_synonyms_count                 -0.719681
global_clustering * rel_aoa                               -0.015586
global_clustering * rel_clustering                        -0.062698
global_clustering * rel_frequency                          0.111133
global_clustering * rel_letters_count                     -0.085958
global_clustering * rel_orthographic_density               0.094779
global_clustering * rel_synonyms_count                     0.626479
global_frequency * global_letters_count                   -0.012977
global_frequency * global_orthographic_density            -0.053287
global_frequency * global_synonyms_count                   0.026466
global_frequency * rel_aoa                                -0.052396
global_frequency * rel_clustering                          0.180515
global_frequency * rel_frequency                           0.015478
global_frequency * rel_letters_count                       0.046920
global_frequency * rel_orthographic_density                0.115324
global_frequency * rel_synonyms_count                      0.136384
global_letters_count * global_orthographic_density         0.036925
global_letters_count * global_synonyms_count              -0.454608
global_letters_count * rel_aoa                             0.027874
global_letters_count * rel_clustering                      0.089141
global_letters_count * rel_frequency                       0.037182
global_letters_count * rel_letters_count                  -0.006984
global_letters_count * rel_orthographic_density           -0.066306
global_letters_count * rel_synonyms_count                  0.618811
global_orthographic_density * global_synonyms_count       -1.074081
global_orthographic_density * rel_aoa                      0.205708
global_orthographic_density * rel_clustering               0.395105
global_orthographic_density * rel_frequency                0.042598
global_orthographic_density * rel_letters_count           -0.101167
global_orthographic_density * rel_orthographic_density    -0.045353
global_orthographic_density * rel_synonyms_count           0.978301
global_synonyms_count * rel_aoa                            0.028988
global_synonyms_count * rel_clustering                     0.414855
global_synonyms_count * rel_frequency                      0.004802
global_synonyms_count * rel_letters_count                  0.250033
global_synonyms_count * rel_orthographic_density           0.685353
global_synonyms_count * rel_synonyms_count                -0.034766
rel_aoa * rel_clustering                                   0.077855
rel_aoa * rel_frequency                                    0.049804
rel_aoa * rel_letters_count                               -0.039159
rel_aoa * rel_orthographic_density                        -0.170047
rel_aoa * rel_synonyms_count                              -0.071239
rel_clustering * rel_frequency                            -0.081328
rel_clustering * rel_letters_count                         0.089687
rel_clustering * rel_orthographic_density                  0.032052
rel_clustering * rel_synonyms_count                       -0.512967
rel_frequency * rel_letters_count                         -0.051691
rel_frequency * rel_orthographic_density                  -0.058787
rel_frequency * rel_synonyms_count                        -0.157851
rel_letters_count * rel_orthographic_density               0.077371
rel_letters_count * rel_synonyms_count                    -0.322342
rel_orthographic_density * rel_synonyms_count             -0.438733
dtype: float64

Regressing rel clustering with 422 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.26790836369512516

intercept                     -0.985286
global_aoa                    -0.054553
global_clustering             -0.421911
global_frequency              -0.088709
global_letters_count           0.012987
global_orthographic_density   -0.071518
global_synonyms_count         -0.137344
rel_aoa                        0.019019
rel_clustering                 0.803637
rel_frequency                  0.053620
rel_letters_count              0.004690
rel_orthographic_density       0.057445
rel_synonyms_count             0.042856
dtype: float64

Regressing rel clustering with 422 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.4142918901415621

intercept                                                 20.673448
global_aoa                                                -0.184795
global_clustering                                          3.773981
global_frequency                                          -1.723619
global_letters_count                                      -0.308644
global_orthographic_density                               -0.172144
global_synonyms_count                                      0.459103
rel_aoa                                                   -0.010908
rel_clustering                                            -3.187592
rel_frequency                                              0.403243
rel_letters_count                                         -0.863365
rel_orthographic_density                                  -2.412514
rel_synonyms_count                                        -2.967608
global_aoa * global_clustering                            -0.066987
global_aoa * global_frequency                              0.012598
global_aoa * global_letters_count                         -0.030346
global_aoa * global_orthographic_density                  -0.147400
global_aoa * global_synonyms_count                         0.012710
global_aoa * rel_aoa                                       0.012639
global_aoa * rel_clustering                                0.066996
global_aoa * rel_frequency                                -0.006160
global_aoa * rel_letters_count                             0.040161
global_aoa * rel_orthographic_density                      0.130674
global_aoa * rel_synonyms_count                            0.004157
global_clustering * global_frequency                      -0.267054
global_clustering * global_letters_count                  -0.088421
global_clustering * global_orthographic_density           -0.243804
global_clustering * global_synonyms_count                 -0.534776
global_clustering * rel_aoa                               -0.035058
global_clustering * rel_clustering                        -0.096593
global_clustering * rel_frequency                          0.092809
global_clustering * rel_letters_count                     -0.063569
global_clustering * rel_orthographic_density              -0.096475
global_clustering * rel_synonyms_count                     0.477812
global_frequency * global_letters_count                    0.010395
global_frequency * global_orthographic_density            -0.008584
global_frequency * global_synonyms_count                  -0.032860
global_frequency * rel_aoa                                -0.052229
global_frequency * rel_clustering                          0.158994
global_frequency * rel_frequency                           0.014559
global_frequency * rel_letters_count                       0.026349
global_frequency * rel_orthographic_density                0.085258
global_frequency * rel_synonyms_count                      0.181654
global_letters_count * global_orthographic_density         0.012102
global_letters_count * global_synonyms_count              -0.401109
global_letters_count * rel_aoa                             0.003771
global_letters_count * rel_clustering                      0.125970
global_letters_count * rel_frequency                       0.020043
global_letters_count * rel_letters_count                  -0.003682
global_letters_count * rel_orthographic_density           -0.022978
global_letters_count * rel_synonyms_count                  0.553315
global_orthographic_density * global_synonyms_count       -0.912292
global_orthographic_density * rel_aoa                      0.158821
global_orthographic_density * rel_clustering               0.290047
global_orthographic_density * rel_frequency                0.017605
global_orthographic_density * rel_letters_count           -0.079089
global_orthographic_density * rel_orthographic_density    -0.042354
global_orthographic_density * rel_synonyms_count           0.741071
global_synonyms_count * rel_aoa                           -0.002593
global_synonyms_count * rel_clustering                     0.371022
global_synonyms_count * rel_frequency                      0.011828
global_synonyms_count * rel_letters_count                  0.237676
global_synonyms_count * rel_orthographic_density           0.542463
global_synonyms_count * rel_synonyms_count                -0.020653
rel_aoa * rel_clustering                                   0.073288
rel_aoa * rel_frequency                                    0.046998
rel_aoa * rel_letters_count                               -0.025274
rel_aoa * rel_orthographic_density                        -0.146369
rel_aoa * rel_synonyms_count                              -0.022944
rel_clustering * rel_frequency                            -0.081484
rel_clustering * rel_letters_count                         0.073803
rel_clustering * rel_orthographic_density                  0.118116
rel_clustering * rel_synonyms_count                       -0.465233
rel_frequency * rel_letters_count                         -0.038183
rel_frequency * rel_orthographic_density                  -0.064697
rel_frequency * rel_synonyms_count                        -0.154946
rel_letters_count * rel_orthographic_density               0.043674
rel_letters_count * rel_synonyms_count                    -0.309467
rel_orthographic_density * rel_synonyms_count             -0.269784
dtype: float64

----------------------------------------------------------------------
Regressing global letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15120533024787108

intercept                      4.881611
global_aoa                    -0.081317
global_clustering             -0.151578
global_frequency              -0.063556
global_letters_count           0.340451
global_orthographic_density   -0.241732
global_synonyms_count         -0.382040
dtype: float64

Regressing global letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.18090983402858563

intercept                                             -10.719951
global_aoa                                              0.590692
global_clustering                                      -3.638692
global_frequency                                        0.582165
global_letters_count                                    0.557222
global_orthographic_density                            -0.570473
global_synonyms_count                                  -0.774476
global_aoa * global_clustering                          0.143772
global_aoa * global_frequency                           0.046540
global_aoa * global_letters_count                      -0.037492
global_aoa * global_orthographic_density               -0.034672
global_aoa * global_synonyms_count                      0.067197
global_clustering * global_frequency                    0.237870
global_clustering * global_letters_count                0.016002
global_clustering * global_orthographic_density         0.091597
global_clustering * global_synonyms_count               0.352482
global_frequency * global_letters_count                 0.021618
global_frequency * global_orthographic_density          0.147311
global_frequency * global_synonyms_count                0.155739
global_letters_count * global_orthographic_density     -0.060577
global_letters_count * global_synonyms_count            0.037047
global_orthographic_density * global_synonyms_count     0.356008
dtype: float64

Regressing rel letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.09297509807073101

intercept                      2.045984
global_aoa                    -0.112679
global_clustering             -0.154057
global_frequency              -0.127151
global_letters_count           0.258577
global_orthographic_density   -0.157778
global_synonyms_count         -0.411154
dtype: float64

Regressing rel letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.1309985379812999

intercept                                             -17.392641
global_aoa                                              1.068514
global_clustering                                      -3.858788
global_frequency                                        0.844248
global_letters_count                                    0.451211
global_orthographic_density                            -0.026117
global_synonyms_count                                  -1.503234
global_aoa * global_clustering                          0.221268
global_aoa * global_frequency                           0.055894
global_aoa * global_letters_count                      -0.060741
global_aoa * global_orthographic_density               -0.058045
global_aoa * global_synonyms_count                      0.095865
global_clustering * global_frequency                    0.270663
global_clustering * global_letters_count               -0.064386
global_clustering * global_orthographic_density         0.058657
global_clustering * global_synonyms_count               0.148813
global_frequency * global_letters_count                -0.003481
global_frequency * global_orthographic_density          0.111834
global_frequency * global_synonyms_count                0.132940
global_letters_count * global_orthographic_density     -0.087183
global_letters_count * global_synonyms_count           -0.012953
global_orthographic_density * global_synonyms_count     0.234990
dtype: float64

Regressing global letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.10308386010677605

intercept                   5.606987
rel_aoa                    -0.136071
rel_clustering              0.048960
rel_frequency               0.029410
rel_letters_count           0.180469
rel_orthographic_density   -0.446706
rel_synonyms_count         -0.346318
dtype: float64

Regressing global letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12757157157760968

intercept                                        5.587367
rel_aoa                                         -0.193741
rel_clustering                                   0.199020
rel_frequency                                    0.070337
rel_letters_count                                0.131890
rel_orthographic_density                        -0.590763
rel_synonyms_count                               0.017216
rel_aoa * rel_clustering                         0.056424
rel_aoa * rel_frequency                         -0.026463
rel_aoa * rel_letters_count                     -0.010599
rel_aoa * rel_orthographic_density               0.016229
rel_aoa * rel_synonyms_count                     0.094555
rel_clustering * rel_frequency                   0.033421
rel_clustering * rel_letters_count               0.002575
rel_clustering * rel_orthographic_density        0.163209
rel_clustering * rel_synonyms_count              0.456994
rel_frequency * rel_letters_count               -0.026810
rel_frequency * rel_orthographic_density        -0.028694
rel_frequency * rel_synonyms_count               0.068349
rel_letters_count * rel_orthographic_density    -0.005424
rel_letters_count * rel_synonyms_count           0.117072
rel_orthographic_density * rel_synonyms_count    0.692996
dtype: float64

Regressing rel letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.18509179962933398

intercept                   1.008942
rel_aoa                    -0.123192
rel_clustering             -0.064372
rel_frequency              -0.178754
rel_letters_count           0.386458
rel_orthographic_density   -0.114810
rel_synonyms_count         -0.330510
dtype: float64

Regressing rel letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.21275987830004262

intercept                                        0.839293
rel_aoa                                         -0.164378
rel_clustering                                   0.287574
rel_frequency                                   -0.220080
rel_letters_count                                0.516735
rel_orthographic_density                        -0.090620
rel_synonyms_count                               0.059736
rel_aoa * rel_clustering                         0.111350
rel_aoa * rel_frequency                          0.007251
rel_aoa * rel_letters_count                     -0.045544
rel_aoa * rel_orthographic_density              -0.092249
rel_aoa * rel_synonyms_count                     0.097940
rel_clustering * rel_frequency                   0.108845
rel_clustering * rel_letters_count              -0.022306
rel_clustering * rel_orthographic_density        0.144190
rel_clustering * rel_synonyms_count              0.280148
rel_frequency * rel_letters_count                0.010277
rel_frequency * rel_orthographic_density         0.036464
rel_frequency * rel_synonyms_count               0.075280
rel_letters_count * rel_orthographic_density     0.036002
rel_letters_count * rel_synonyms_count           0.104448
rel_orthographic_density * rel_synonyms_count    0.546337
dtype: float64

Regressing global letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.17723003749697985

intercept                     -0.558613
global_aoa                     0.001419
global_clustering             -0.527913
global_frequency               0.004724
global_letters_count           0.657788
global_orthographic_density   -0.010310
global_synonyms_count         -0.281237
rel_aoa                       -0.129423
rel_clustering                 0.397643
rel_frequency                 -0.115869
rel_letters_count             -0.350718
rel_orthographic_density      -0.216416
rel_synonyms_count            -0.076155
dtype: float64

Regressing global letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.32097886073887616

intercept                                                -41.051379
global_aoa                                                 3.670034
global_clustering                                         -8.153656
global_frequency                                           1.717683
global_letters_count                                      -0.102200
global_orthographic_density                               -2.498642
global_synonyms_count                                      2.197868
rel_aoa                                                   -5.614202
rel_clustering                                             1.969013
rel_frequency                                             -0.290141
rel_letters_count                                          0.829228
rel_orthographic_density                                  -2.093375
rel_synonyms_count                                        -6.483967
global_aoa * global_clustering                             0.499061
global_aoa * global_frequency                             -0.033808
global_aoa * global_letters_count                         -0.122137
global_aoa * global_orthographic_density                   0.099172
global_aoa * global_synonyms_count                        -0.008394
global_aoa * rel_aoa                                       0.027666
global_aoa * rel_clustering                               -0.436324
global_aoa * rel_frequency                                -0.037031
global_aoa * rel_letters_count                             0.049642
global_aoa * rel_orthographic_density                     -0.070690
global_aoa * rel_synonyms_count                            0.038303
global_clustering * global_frequency                       0.499890
global_clustering * global_letters_count                   0.133517
global_clustering * global_orthographic_density           -0.271542
global_clustering * global_synonyms_count                 -0.410503
global_clustering * rel_aoa                               -0.017183
global_clustering * rel_clustering                        -0.078389
global_clustering * rel_frequency                         -0.061360
global_clustering * rel_letters_count                     -0.294570
global_clustering * rel_orthographic_density               0.015347
global_clustering * rel_synonyms_count                     0.176008
global_frequency * global_letters_count                    0.348725
global_frequency * global_orthographic_density             0.215889
global_frequency * global_synonyms_count                  -0.165680
global_frequency * rel_aoa                                 0.327009
global_frequency * rel_clustering                         -0.143469
global_frequency * rel_frequency                          -0.011845
global_frequency * rel_letters_count                      -0.436902
global_frequency * rel_orthographic_density                0.066936
global_frequency * rel_synonyms_count                      0.354400
global_letters_count * global_orthographic_density        -0.466455
global_letters_count * global_synonyms_count              -0.515218
global_letters_count * rel_aoa                             0.326833
global_letters_count * rel_clustering                      0.276871
global_letters_count * rel_frequency                      -0.004817
global_letters_count * rel_letters_count                   0.036971
global_letters_count * rel_orthographic_density            0.285204
global_letters_count * rel_synonyms_count                  0.712034
global_orthographic_density * global_synonyms_count       -0.066111
global_orthographic_density * rel_aoa                      0.129908
global_orthographic_density * rel_clustering               0.321847
global_orthographic_density * rel_frequency               -0.084033
global_orthographic_density * rel_letters_count            0.441803
global_orthographic_density * rel_orthographic_density     0.247394
global_orthographic_density * rel_synonyms_count           0.082845
global_synonyms_count * rel_aoa                            0.236487
global_synonyms_count * rel_clustering                     0.660084
global_synonyms_count * rel_frequency                      0.283899
global_synonyms_count * rel_letters_count                  0.060673
global_synonyms_count * rel_orthographic_density          -0.112295
global_synonyms_count * rel_synonyms_count                -0.110797
rel_aoa * rel_clustering                                   0.241441
rel_aoa * rel_frequency                                   -0.209287
rel_aoa * rel_letters_count                               -0.303570
rel_aoa * rel_orthographic_density                        -0.102352
rel_aoa * rel_synonyms_count                              -0.063167
rel_clustering * rel_frequency                            -0.015792
rel_clustering * rel_letters_count                        -0.018279
rel_clustering * rel_orthographic_density                  0.248819
rel_clustering * rel_synonyms_count                        0.064720
rel_frequency * rel_letters_count                          0.088243
rel_frequency * rel_orthographic_density                  -0.049208
rel_frequency * rel_synonyms_count                        -0.325764
rel_letters_count * rel_orthographic_density              -0.177406
rel_letters_count * rel_synonyms_count                    -0.255177
rel_orthographic_density * rel_synonyms_count              0.578395
dtype: float64

Regressing rel letters_count with 503 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.21413174838359927

intercept                     -0.639725
global_aoa                    -0.017002
global_clustering             -0.530617
global_frequency              -0.007641
global_letters_count          -0.285074
global_orthographic_density   -0.031342
global_synonyms_count         -0.291328
rel_aoa                       -0.107309
rel_clustering                 0.403683
rel_frequency                 -0.128216
rel_letters_count              0.611873
rel_orthographic_density      -0.225938
rel_synonyms_count            -0.072869
dtype: float64

Regressing rel letters_count with 503 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.3467653915238531

intercept                                                -49.603211
global_aoa                                                 3.128403
global_clustering                                        -10.109904
global_frequency                                           1.733482
global_letters_count                                       0.288680
global_orthographic_density                               -0.949617
global_synonyms_count                                      0.025980
rel_aoa                                                   -4.608711
rel_clustering                                             2.061770
rel_frequency                                             -0.489196
rel_letters_count                                          0.980319
rel_orthographic_density                                  -3.014978
rel_synonyms_count                                        -5.312076
global_aoa * global_clustering                             0.456793
global_aoa * global_frequency                              0.002925
global_aoa * global_letters_count                         -0.109415
global_aoa * global_orthographic_density                   0.021247
global_aoa * global_synonyms_count                        -0.037608
global_aoa * rel_aoa                                       0.032334
global_aoa * rel_clustering                               -0.338739
global_aoa * rel_frequency                                -0.055457
global_aoa * rel_letters_count                             0.030303
global_aoa * rel_orthographic_density                     -0.007485
global_aoa * rel_synonyms_count                            0.033196
global_clustering * global_frequency                       0.553715
global_clustering * global_letters_count                   0.321619
global_clustering * global_orthographic_density            0.131534
global_clustering * global_synonyms_count                 -0.571519
global_clustering * rel_aoa                                0.026005
global_clustering * rel_clustering                        -0.131374
global_clustering * rel_frequency                         -0.137898
global_clustering * rel_letters_count                     -0.453958
global_clustering * rel_orthographic_density              -0.343216
global_clustering * rel_synonyms_count                     0.250805
global_frequency * global_letters_count                    0.294164
global_frequency * global_orthographic_density             0.314370
global_frequency * global_synonyms_count                   0.003080
global_frequency * rel_aoa                                 0.255454
global_frequency * rel_clustering                         -0.169814
global_frequency * rel_frequency                          -0.019189
global_frequency * rel_letters_count                      -0.397989
global_frequency * rel_orthographic_density               -0.059297
global_frequency * rel_synonyms_count                      0.230980
global_letters_count * global_orthographic_density        -0.356515
global_letters_count * global_synonyms_count              -0.463492
global_letters_count * rel_aoa                             0.283674
global_letters_count * rel_clustering                      0.140796
global_letters_count * rel_frequency                       0.010136
global_letters_count * rel_letters_count                   0.024667
global_letters_count * rel_orthographic_density            0.175559
global_letters_count * rel_synonyms_count                  0.705651
global_orthographic_density * global_synonyms_count       -0.401703
global_orthographic_density * rel_aoa                      0.231509
global_orthographic_density * rel_clustering               0.175670
global_orthographic_density * rel_frequency               -0.129936
global_orthographic_density * rel_letters_count            0.324546
global_orthographic_density * rel_orthographic_density     0.249328
global_orthographic_density * rel_synonyms_count           0.407318
global_synonyms_count * rel_aoa                            0.322149
global_synonyms_count * rel_clustering                     0.885414
global_synonyms_count * rel_frequency                      0.139525
global_synonyms_count * rel_letters_count                 -0.031793
global_synonyms_count * rel_orthographic_density           0.187887
global_synonyms_count * rel_synonyms_count                -0.103157
rel_aoa * rel_clustering                                   0.119029
rel_aoa * rel_frequency                                   -0.154026
rel_aoa * rel_letters_count                               -0.255551
rel_aoa * rel_orthographic_density                        -0.178700
rel_aoa * rel_synonyms_count                              -0.105162
rel_clustering * rel_frequency                            -0.007354
rel_clustering * rel_letters_count                         0.093910
rel_clustering * rel_orthographic_density                  0.333038
rel_clustering * rel_synonyms_count                       -0.046933
rel_frequency * rel_letters_count                          0.084853
rel_frequency * rel_orthographic_density                   0.008147
rel_frequency * rel_synonyms_count                        -0.219711
rel_letters_count * rel_orthographic_density              -0.060653
rel_letters_count * rel_synonyms_count                    -0.209614
rel_orthographic_density * rel_synonyms_count              0.301514
dtype: float64
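
With both global and sentence-relative predictors, the interaction models above use 12 main effects plus 66 pairwise products on 503 measures, so part of the rise in R^2 is expected from the added terms alone. One common way to compare such fits is the adjusted R^2, which penalises the number of predictors. The sketch below is not part of the original analysis; it applies the standard formula to two R^2 values printed above for global letters_count, and the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_predictors counts the regressors excluding the intercept.
    """
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError('need more samples than predictors + 1')
    return 1 - (1 - r2) * (n_samples - 1) / dof


n = 503  # number of measures reported for letters_count

# R^2 values copied from the global letters_count listings above
# (global and rel predictors together).
plain = adjusted_r2(0.17723003749697985, n, 12)
inter = adjusted_r2(0.32097886073887616, n, 12 + 66)

print('adjusted R^2, no interactions:   {:.4f}'.format(plain))
print('adjusted R^2, with interactions: {:.4f}'.format(inter))
```

The adjusted values are lower than the raw ones for both models, and the gap between the two models is smaller than the raw R^2 figures suggest.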

----------------------------------------------------------------------
Regressing global synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.11077070507413257

intercept                      0.606119
global_aoa                    -0.011749
global_clustering              0.050091
global_frequency              -0.007184
global_letters_count           0.005011
global_orthographic_density    0.047292
global_synonyms_count          0.310430
dtype: float64

Regressing global synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.14100926045240292

intercept                                              2.506985
global_aoa                                            -0.134439
global_clustering                                      0.326600
global_frequency                                      -0.146616
global_letters_count                                  -0.020179
global_orthographic_density                           -0.204084
global_synonyms_count                                  0.838374
global_aoa * global_clustering                        -0.015227
global_aoa * global_frequency                         -0.008869
global_aoa * global_letters_count                      0.009598
global_aoa * global_orthographic_density               0.044725
global_aoa * global_synonyms_count                     0.023393
global_clustering * global_frequency                  -0.032420
global_clustering * global_letters_count               0.013648
global_clustering * global_orthographic_density       -0.017000
global_clustering * global_synonyms_count              0.152939
global_frequency * global_letters_count                0.005519
global_frequency * global_orthographic_density        -0.007159
global_frequency * global_synonyms_count               0.020178
global_letters_count * global_orthographic_density    -0.017714
global_letters_count * global_synonyms_count          -0.007523
global_orthographic_density * global_synonyms_count    0.032913
dtype: float64

Regressing rel synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.07683945445333906

intercept                      0.188787
global_aoa                    -0.007962
global_clustering              0.044224
global_frequency               0.000787
global_letters_count           0.008229
global_orthographic_density    0.047511
global_synonyms_count          0.251699
dtype: float64

Regressing rel synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.10696190849405496

intercept                                              2.680360
global_aoa                                            -0.116232
global_clustering                                      0.377515
global_frequency                                      -0.241978
global_letters_count                                  -0.043896
global_orthographic_density                           -0.210266
global_synonyms_count                                  0.950652
global_aoa * global_clustering                        -0.014417
global_aoa * global_frequency                         -0.007009
global_aoa * global_letters_count                      0.007527
global_aoa * global_orthographic_density               0.039978
global_aoa * global_synonyms_count                     0.006007
global_clustering * global_frequency                  -0.041644
global_clustering * global_letters_count               0.013034
global_clustering * global_orthographic_density       -0.004008
global_clustering * global_synonyms_count              0.155319
global_frequency * global_letters_count                0.009450
global_frequency * global_orthographic_density         0.004112
global_frequency * global_synonyms_count               0.013320
global_letters_count * global_orthographic_density    -0.012427
global_letters_count * global_synonyms_count          -0.001235
global_orthographic_density * global_synonyms_count    0.022210
dtype: float64

Regressing global synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.09578105853667529

intercept                   0.386678
rel_aoa                     0.013103
rel_clustering             -0.008782
rel_frequency              -0.011405
rel_letters_count          -0.001645
rel_orthographic_density    0.062828
rel_synonyms_count          0.303791
dtype: float64

Regressing global synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1334645059873809

intercept                                        0.422857
rel_aoa                                          0.009515
rel_clustering                                  -0.061609
rel_frequency                                   -0.012221
rel_letters_count                               -0.059690
rel_orthographic_density                         0.081071
rel_synonyms_count                               0.306412
rel_aoa * rel_clustering                        -0.009323
rel_aoa * rel_frequency                         -0.001391
rel_aoa * rel_letters_count                      0.024403
rel_aoa * rel_orthographic_density               0.045525
rel_aoa * rel_synonyms_count                     0.032589
rel_clustering * rel_frequency                  -0.005590
rel_clustering * rel_letters_count               0.006988
rel_clustering * rel_orthographic_density       -0.026235
rel_clustering * rel_synonyms_count              0.163610
rel_frequency * rel_letters_count                0.002096
rel_frequency * rel_orthographic_density         0.000285
rel_frequency * rel_synonyms_count               0.023265
rel_letters_count * rel_orthographic_density    -0.027675
rel_letters_count * rel_synonyms_count          -0.021436
rel_orthographic_density * rel_synonyms_count   -0.019608
dtype: float64

Regressing rel synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.18528381361710644

intercept                   0.093312
rel_aoa                     0.004986
rel_clustering              0.036617
rel_frequency               0.003078
rel_letters_count          -0.003635
rel_orthographic_density    0.037883
rel_synonyms_count          0.428144
dtype: float64

Regressing rel synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.22117641035973956

intercept                                        0.127650
rel_aoa                                          0.002736
rel_clustering                                  -0.033222
rel_frequency                                    0.009871
rel_letters_count                               -0.062978
rel_orthographic_density                         0.033931
rel_synonyms_count                               0.527303
rel_aoa * rel_clustering                        -0.005010
rel_aoa * rel_frequency                         -0.001508
rel_aoa * rel_letters_count                      0.019936
rel_aoa * rel_orthographic_density               0.036296
rel_aoa * rel_synonyms_count                     0.012577
rel_clustering * rel_frequency                  -0.014242
rel_clustering * rel_letters_count               0.009960
rel_clustering * rel_orthographic_density       -0.010799
rel_clustering * rel_synonyms_count              0.159892
rel_frequency * rel_letters_count               -0.003145
rel_frequency * rel_orthographic_density        -0.005595
rel_frequency * rel_synonyms_count               0.040739
rel_letters_count * rel_orthographic_density    -0.021260
rel_letters_count * rel_synonyms_count          -0.012101
rel_orthographic_density * rel_synonyms_count    0.025991
dtype: float64

Regressing global synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12556160992722953

intercept                      1.322142
global_aoa                    -0.041041
global_clustering              0.163819
global_frequency               0.011600
global_letters_count           0.034955
global_orthographic_density   -0.011156
global_synonyms_count          0.217047
rel_aoa                        0.040357
rel_clustering                -0.123693
rel_frequency                 -0.014973
rel_letters_count             -0.032189
rel_orthographic_density       0.062022
rel_synonyms_count             0.097695
dtype: float64

Regressing global synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.2640174699196396

intercept                                                 4.795464
global_aoa                                                0.293472
global_clustering                                         2.141331
global_frequency                                         -0.299780
global_letters_count                                      0.404526
global_orthographic_density                               0.521546
global_synonyms_count                                     8.355276
rel_aoa                                                  -0.402395
rel_clustering                                           -0.993738
rel_frequency                                            -0.183250
rel_letters_count                                        -0.412333
rel_orthographic_density                                  0.515827
rel_synonyms_count                                       -7.733796
global_aoa * global_clustering                           -0.041100
global_aoa * global_frequency                            -0.026600
global_aoa * global_letters_count                        -0.050438
global_aoa * global_orthographic_density                 -0.020873
global_aoa * global_synonyms_count                        0.037577
global_aoa * rel_aoa                                     -0.005172
global_aoa * rel_clustering                               0.059855
global_aoa * rel_frequency                                0.033562
global_aoa * rel_letters_count                            0.047735
global_aoa * rel_orthographic_density                     0.050639
global_aoa * rel_synonyms_count                          -0.015225
global_clustering * global_frequency                     -0.127916
global_clustering * global_letters_count                 -0.052228
global_clustering * global_orthographic_density          -0.210399
global_clustering * global_synonyms_count                 0.781357
global_clustering * rel_aoa                              -0.073818
global_clustering * rel_clustering                        0.015348
global_clustering * rel_frequency                         0.062163
global_clustering * rel_letters_count                     0.076534
global_clustering * rel_orthographic_density              0.323018
global_clustering * rel_synonyms_count                   -0.645687
global_frequency * global_letters_count                  -0.005269
global_frequency * global_orthographic_density           -0.107784
global_frequency * global_synonyms_count                 -0.138580
global_frequency * rel_aoa                                0.001675
global_frequency * rel_clustering                         0.069924
global_frequency * rel_frequency                          0.002769
global_frequency * rel_letters_count                      0.014925
global_frequency * rel_orthographic_density               0.094693
global_frequency * rel_synonyms_count                     0.169995
global_letters_count * global_orthographic_density       -0.108227
global_letters_count * global_synonyms_count             -0.257532
global_letters_count * rel_aoa                            0.015415
global_letters_count * rel_clustering                     0.010316
global_letters_count * rel_frequency                      0.004501
global_letters_count * rel_letters_count                 -0.005702
global_letters_count * rel_orthographic_density           0.055535
global_letters_count * rel_synonyms_count                 0.323367
global_orthographic_density * global_synonyms_count      -0.195578
global_orthographic_density * rel_aoa                    -0.055824
global_orthographic_density * rel_clustering              0.043700
global_orthographic_density * rel_frequency               0.091412
global_orthographic_density * rel_letters_count           0.157701
global_orthographic_density * rel_orthographic_density    0.009772
global_orthographic_density * rel_synonyms_count          0.102060
global_synonyms_count * rel_aoa                           0.013242
global_synonyms_count * rel_clustering                   -0.478584
global_synonyms_count * rel_frequency                     0.341511
global_synonyms_count * rel_letters_count                 0.213616
global_synonyms_count * rel_orthographic_density          0.164850
global_synonyms_count * rel_synonyms_count                0.137665
rel_aoa * rel_clustering                                  0.049404
rel_aoa * rel_frequency                                  -0.014837
rel_aoa * rel_letters_count                               0.012708
rel_aoa * rel_orthographic_density                        0.069782
rel_aoa * rel_synonyms_count                             -0.020978
rel_clustering * rel_frequency                           -0.004623
rel_clustering * rel_letters_count                       -0.060177
rel_clustering * rel_orthographic_density                -0.182595
rel_clustering * rel_synonyms_count                       0.368272
rel_frequency * rel_letters_count                        -0.004071
rel_frequency * rel_orthographic_density                 -0.060229
rel_frequency * rel_synonyms_count                       -0.343262
rel_letters_count * rel_orthographic_density             -0.141353
rel_letters_count * rel_synonyms_count                   -0.254318
rel_orthographic_density * rel_synonyms_count            -0.034402
dtype: float64

Regressing rel synonyms_count with 490 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.26390203096518805

intercept                      1.056204
global_aoa                    -0.036432
global_clustering              0.140351
global_frequency               0.008930
global_letters_count           0.039949
global_orthographic_density    0.011986
global_synonyms_count         -0.625378
rel_aoa                        0.032392
rel_clustering                -0.103807
rel_frequency                 -0.011038
rel_letters_count             -0.029708
rel_orthographic_density       0.036838
rel_synonyms_count             1.012928
dtype: float64

Regressing rel synonyms_count with 490 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.3831972220962952

intercept                                                 3.951724
global_aoa                                                0.360948
global_clustering                                         1.915922
global_frequency                                         -0.277357
global_letters_count                                      0.393577
global_orthographic_density                               0.638706
global_synonyms_count                                     6.328000
rel_aoa                                                  -0.314564
rel_clustering                                           -1.419182
rel_frequency                                            -0.205004
rel_letters_count                                        -0.473938
rel_orthographic_density                                  0.532358
rel_synonyms_count                                       -5.907896
global_aoa * global_clustering                           -0.030815
global_aoa * global_frequency                            -0.026153
global_aoa * global_letters_count                        -0.051053
global_aoa * global_orthographic_density                 -0.033834
global_aoa * global_synonyms_count                        0.038204
global_aoa * rel_aoa                                     -0.003617
global_aoa * rel_clustering                               0.054397
global_aoa * rel_frequency                                0.028346
global_aoa * rel_letters_count                            0.046918
global_aoa * rel_orthographic_density                     0.051436
global_aoa * rel_synonyms_count                          -0.009821
global_clustering * global_frequency                     -0.115788
global_clustering * global_letters_count                 -0.053059
global_clustering * global_orthographic_density          -0.173381
global_clustering * global_synonyms_count                 0.631356
global_clustering * rel_aoa                              -0.054782
global_clustering * rel_clustering                       -0.001755
global_clustering * rel_frequency                         0.043740
global_clustering * rel_letters_count                     0.073813
global_clustering * rel_orthographic_density              0.319149
global_clustering * rel_synonyms_count                   -0.515355
global_frequency * global_letters_count                  -0.007098
global_frequency * global_orthographic_density           -0.096460
global_frequency * global_synonyms_count                 -0.136667
global_frequency * rel_aoa                               -0.002700
global_frequency * rel_clustering                         0.080387
global_frequency * rel_frequency                         -0.000678
global_frequency * rel_letters_count                      0.023508
global_frequency * rel_orthographic_density               0.088572
global_frequency * rel_synonyms_count                     0.182901
global_letters_count * global_orthographic_density       -0.090370
global_letters_count * global_synonyms_count             -0.232218
global_letters_count * rel_aoa                            0.018700
global_letters_count * rel_clustering                     0.045206
global_letters_count * rel_frequency                      0.007926
global_letters_count * rel_letters_count                 -0.004891
global_letters_count * rel_orthographic_density           0.046352
global_letters_count * rel_synonyms_count                 0.298083
global_orthographic_density * global_synonyms_count      -0.146444
global_orthographic_density * rel_aoa                    -0.029123
global_orthographic_density * rel_clustering              0.077313
global_orthographic_density * rel_frequency               0.084033
global_orthographic_density * rel_letters_count           0.139882
global_orthographic_density * rel_orthographic_density    0.020714
global_orthographic_density * rel_synonyms_count          0.045622
global_synonyms_count * rel_aoa                           0.008098
global_synonyms_count * rel_clustering                   -0.405993
global_synonyms_count * rel_frequency                     0.335116
global_synonyms_count * rel_letters_count                 0.212870
global_synonyms_count * rel_orthographic_density          0.135645
global_synonyms_count * rel_synonyms_count                0.132780
rel_aoa * rel_clustering                                  0.033196
rel_aoa * rel_frequency                                  -0.009789
rel_aoa * rel_letters_count                               0.004744
rel_aoa * rel_orthographic_density                        0.042635
rel_aoa * rel_synonyms_count                             -0.034695
rel_clustering * rel_frequency                           -0.010032
rel_clustering * rel_letters_count                       -0.085272
rel_clustering * rel_orthographic_density                -0.229168
rel_clustering * rel_synonyms_count                       0.298629
rel_frequency * rel_letters_count                        -0.017562
rel_frequency * rel_orthographic_density                 -0.062015
rel_frequency * rel_synonyms_count                       -0.350038
rel_letters_count * rel_orthographic_density             -0.112848
rel_letters_count * rel_synonyms_count                   -0.256584
rel_orthographic_density * rel_synonyms_count             0.002072
dtype: float64
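
The coefficient listings above have one entry per feature and, in the "with interactions" case, one extra entry per pair of features (`a * b`), with no squared terms. A minimal sketch of how such a fit could be reproduced, assuming scikit-learn's `PolynomialFeatures(interaction_only=True)` followed by ordinary least squares. The notebook's own regression code is not shown in this section, so `regress`, the synthetic data, and the three column names below are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures


def regress(X, y, interactions=False):
    """Fit OLS of y on the columns of X; return (R^2, coefficient Series).

    Hypothetical helper, not the notebook's own function.
    """
    names = list(X.columns)
    values = X.values
    if interactions:
        # degree=2 with interaction_only=True adds every pairwise product
        # a * b (a != b) but no squared terms.
        poly = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False)
        values = poly.fit_transform(values)
        # powers_ has one row per output column, one column per input feature
        names = [' * '.join(n for n, p in zip(X.columns, row) if p)
                 for row in poly.powers_]
    linreg = LinearRegression().fit(values, y)
    coefs = pd.Series([linreg.intercept_] + list(linreg.coef_),
                      index=['intercept'] + names)
    return linreg.score(values, y), coefs


# Synthetic data with one true interaction, for illustration only.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.normal(size=(200, 3)),
                 columns=['global_aoa', 'global_frequency', 'rel_aoa'])
y = (X['global_aoa'] - 2 * X['global_aoa'] * X['rel_aoa']
     + rng.normal(scale=.1, size=200))

r2_plain, coefs_plain = regress(X, y)
r2_inter, coefs_inter = regress(X, y, interactions=True)
print("R^2 = {}".format(r2_inter))
print(coefs_inter)
```

The interaction columns come out in the same pairwise order as in the listings above (first feature paired with every later one, then the second, and so on).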

----------------------------------------------------------------------
Regressing global orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15739544102247105

intercept                      1.269993
global_aoa                    -0.006626
global_clustering              0.028336
global_frequency               0.008613
global_letters_count          -0.049324
global_orthographic_density    0.293736
global_synonyms_count          0.115125
dtype: float64

Regressing global orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.19168756385961228

intercept                                              2.722874
global_aoa                                            -0.198795
global_clustering                                      0.836964
global_frequency                                       0.311407
global_letters_count                                  -0.065135
global_orthographic_density                            0.657599
global_synonyms_count                                  0.668984
global_aoa * global_clustering                        -0.018833
global_aoa * global_frequency                         -0.005651
global_aoa * global_letters_count                      0.009060
global_aoa * global_orthographic_density               0.063700
global_aoa * global_synonyms_count                    -0.005620
global_clustering * global_frequency                  -0.025357
global_clustering * global_letters_count              -0.063309
global_clustering * global_orthographic_density       -0.015977
global_clustering * global_synonyms_count             -0.076556
global_frequency * global_letters_count               -0.043858
global_frequency * global_orthographic_density        -0.086945
global_frequency * global_synonyms_count              -0.040836
global_letters_count * global_orthographic_density    -0.011923
global_letters_count * global_synonyms_count          -0.070594
global_orthographic_density * global_synonyms_count   -0.148553
dtype: float64

Regressing rel orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12943942670919617

intercept                     -1.146799
global_aoa                     0.004258
global_clustering             -0.009181
global_frequency               0.005956
global_letters_count          -0.047821
global_orthographic_density    0.252685
global_synonyms_count          0.121499
dtype: float64

Regressing rel orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.16590662716826166

intercept                                              3.769474
global_aoa                                            -0.352741
global_clustering                                      0.889244
global_frequency                                       0.052569
global_letters_count                                  -0.428532
global_orthographic_density                            0.041074
global_synonyms_count                                  0.467246
global_aoa * global_clustering                        -0.012901
global_aoa * global_frequency                         -0.005911
global_aoa * global_letters_count                      0.036446
global_aoa * global_orthographic_density               0.094554
global_aoa * global_synonyms_count                    -0.009752
global_clustering * global_frequency                  -0.045991
global_clustering * global_letters_count              -0.061030
global_clustering * global_orthographic_density        0.008934
global_clustering * global_synonyms_count             -0.075359
global_frequency * global_letters_count               -0.028170
global_frequency * global_orthographic_density        -0.046646
global_frequency * global_synonyms_count              -0.064463
global_letters_count * global_orthographic_density     0.007223
global_letters_count * global_synonyms_count          -0.013603
global_orthographic_density * global_synonyms_count   -0.065061
dtype: float64

Regressing global orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12458528934390745

intercept                   1.542772
rel_aoa                    -0.020945
rel_clustering              0.001678
rel_frequency              -0.007523
rel_letters_count           0.011719
rel_orthographic_density    0.349349
rel_synonyms_count          0.133203
dtype: float64

Regressing global orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.14136654604993815

intercept                                        1.545561
rel_aoa                                          0.028705
rel_clustering                                   0.169947
rel_frequency                                   -0.002100
rel_letters_count                                0.010225
rel_orthographic_density                         0.341887
rel_synonyms_count                               0.177641
rel_aoa * rel_clustering                         0.034009
rel_aoa * rel_frequency                          0.013556
rel_aoa * rel_letters_count                     -0.006819
rel_aoa * rel_orthographic_density               0.018890
rel_aoa * rel_synonyms_count                    -0.006216
rel_clustering * rel_frequency                   0.019174
rel_clustering * rel_letters_count              -0.046414
rel_clustering * rel_orthographic_density        0.040956
rel_clustering * rel_synonyms_count             -0.231146
rel_frequency * rel_letters_count               -0.010566
rel_frequency * rel_orthographic_density         0.000114
rel_frequency * rel_synonyms_count              -0.027298
rel_letters_count * rel_orthographic_density    -0.004841
rel_letters_count * rel_synonyms_count          -0.054536
rel_orthographic_density * rel_synonyms_count   -0.116379
dtype: float64

Regressing rel orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.21201275481404547

intercept                  -0.479509
rel_aoa                    -0.008585
rel_clustering             -0.001989
rel_frequency               0.033499
rel_letters_count           0.013534
rel_orthographic_density    0.441850
rel_synonyms_count          0.094700
dtype: float64

Regressing rel orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.22576184870074134

intercept                                       -0.384901
rel_aoa                                          0.040815
rel_clustering                                   0.027870
rel_frequency                                    0.078062
rel_letters_count                               -0.029601
rel_orthographic_density                         0.409673
rel_synonyms_count                               0.036553
rel_aoa * rel_clustering                         0.021399
rel_aoa * rel_frequency                          0.009468
rel_aoa * rel_letters_count                      0.008139
rel_aoa * rel_orthographic_density               0.051764
rel_aoa * rel_synonyms_count                     0.002216
rel_clustering * rel_frequency                  -0.012987
rel_clustering * rel_letters_count              -0.016236
rel_clustering * rel_orthographic_density        0.045605
rel_clustering * rel_synonyms_count             -0.171916
rel_frequency * rel_letters_count               -0.018039
rel_frequency * rel_orthographic_density         0.006175
rel_frequency * rel_synonyms_count              -0.042447
rel_letters_count * rel_orthographic_density     0.000742
rel_letters_count * rel_synonyms_count          -0.037683
rel_orthographic_density * rel_synonyms_count   -0.086192
dtype: float64

Regressing global orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.18184311548903162

intercept                      3.492853
global_aoa                     0.024513
global_clustering              0.103354
global_frequency              -0.066700
global_letters_count          -0.265914
global_orthographic_density    0.204344
global_synonyms_count          0.033380
rel_aoa                       -0.038854
rel_clustering                -0.091693
rel_frequency                  0.088387
rel_letters_count              0.236108
rel_orthographic_density       0.095699
rel_synonyms_count             0.089028
dtype: float64

Regressing global orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.30552293949204035

intercept                                                -10.847105
global_aoa                                                -0.293244
global_clustering                                         -1.384395
global_frequency                                           1.916790
global_letters_count                                       0.177665
global_orthographic_density                               -0.810121
global_synonyms_count                                     -1.351879
rel_aoa                                                   -1.015110
rel_clustering                                             2.198185
rel_frequency                                             -2.486174
rel_letters_count                                         -0.941047
rel_orthographic_density                                   1.032276
rel_synonyms_count                                         7.352303
global_aoa * global_clustering                            -0.021452
global_aoa * global_frequency                             -0.014034
global_aoa * global_letters_count                          0.014775
global_aoa * global_orthographic_density                   0.154830
global_aoa * global_synonyms_count                        -0.078088
global_aoa * rel_aoa                                       0.021325
global_aoa * rel_clustering                               -0.011148
global_aoa * rel_frequency                                 0.016366
global_aoa * rel_letters_count                             0.002299
global_aoa * rel_orthographic_density                     -0.111170
global_aoa * rel_synonyms_count                            0.065518
global_clustering * global_frequency                       0.208087
global_clustering * global_letters_count                  -0.099573
global_clustering * global_orthographic_density           -0.193609
global_clustering * global_synonyms_count                 -0.242661
global_clustering * rel_aoa                               -0.248381
global_clustering * rel_clustering                         0.026326
global_clustering * rel_frequency                         -0.451130
global_clustering * rel_letters_count                     -0.001672
global_clustering * rel_orthographic_density               0.216757
global_clustering * rel_synonyms_count                     0.679087
global_frequency * global_letters_count                   -0.102563
global_frequency * global_orthographic_density            -0.080078
global_frequency * global_synonyms_count                  -0.141177
global_frequency * rel_aoa                                -0.014077
global_frequency * rel_clustering                         -0.208337
global_frequency * rel_frequency                           0.010419
global_frequency * rel_letters_count                       0.127531
global_frequency * rel_orthographic_density                0.041388
global_frequency * rel_synonyms_count                      0.020080
global_letters_count * global_orthographic_density        -0.098699
global_letters_count * global_synonyms_count               0.185388
global_letters_count * rel_aoa                            -0.067344
global_letters_count * rel_clustering                      0.057962
global_letters_count * rel_frequency                      -0.030793
global_letters_count * rel_letters_count                  -0.004334
global_letters_count * rel_orthographic_density            0.180046
global_letters_count * rel_synonyms_count                 -0.486310
global_orthographic_density * global_synonyms_count        0.486256
global_orthographic_density * rel_aoa                     -0.109937
global_orthographic_density * rel_clustering               0.116293
global_orthographic_density * rel_frequency               -0.107920
global_orthographic_density * rel_letters_count           -0.071238
global_orthographic_density * rel_orthographic_density    -0.064491
global_orthographic_density * rel_synonyms_count          -0.746798
global_synonyms_count * rel_aoa                            0.183384
global_synonyms_count * rel_clustering                    -0.113283
global_synonyms_count * rel_frequency                      0.126364
global_synonyms_count * rel_letters_count                 -0.110375
global_synonyms_count * rel_orthographic_density          -0.695310
global_synonyms_count * rel_synonyms_count                -0.113362
rel_aoa * rel_clustering                                   0.222213
rel_aoa * rel_frequency                                    0.038121
rel_aoa * rel_letters_count                                0.044092
rel_aoa * rel_orthographic_density                         0.108274
rel_aoa * rel_synonyms_count                              -0.173657
rel_clustering * rel_frequency                             0.379724
rel_clustering * rel_letters_count                        -0.001718
rel_clustering * rel_orthographic_density                 -0.095111
rel_clustering * rel_synonyms_count                       -0.429574
rel_frequency * rel_letters_count                         -0.029793
rel_frequency * rel_orthographic_density                   0.070683
rel_frequency * rel_synonyms_count                        -0.053080
rel_letters_count * rel_orthographic_density              -0.037826
rel_letters_count * rel_synonyms_count                     0.335740
rel_orthographic_density * rel_synonyms_count              0.894954
dtype: float64

Regressing rel orthographic_density with 430 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.25648184618726366

intercept                      2.334768
global_aoa                     0.024684
global_clustering              0.079498
global_frequency              -0.040434
global_letters_count          -0.198010
global_orthographic_density   -0.560183
global_synonyms_count          0.065281
rel_aoa                       -0.034054
rel_clustering                -0.058050
rel_frequency                  0.073886
rel_letters_count              0.159617
rel_orthographic_density       0.911624
rel_synonyms_count             0.037589
dtype: float64

Regressing rel orthographic_density with 430 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.3710063540470248

intercept                                                -9.222587
global_aoa                                               -0.186246
global_clustering                                        -0.563500
global_frequency                                          1.813530
global_letters_count                                      0.040748
global_orthographic_density                              -1.283389
global_synonyms_count                                    -0.750261
rel_aoa                                                  -1.138317
rel_clustering                                            1.500315
rel_frequency                                            -2.616804
rel_letters_count                                        -1.123321
rel_orthographic_density                                  0.991654
rel_synonyms_count                                        6.062642
global_aoa * global_clustering                           -0.034908
global_aoa * global_frequency                            -0.022187
global_aoa * global_letters_count                         0.009715
global_aoa * global_orthographic_density                  0.111926
global_aoa * global_synonyms_count                       -0.094809
global_aoa * rel_aoa                                      0.016810
global_aoa * rel_clustering                              -0.009841
global_aoa * rel_frequency                                0.019929
global_aoa * rel_letters_count                            0.018485
global_aoa * rel_orthographic_density                    -0.067734
global_aoa * rel_synonyms_count                           0.097599
global_clustering * global_frequency                      0.160708
global_clustering * global_letters_count                 -0.099653
global_clustering * global_orthographic_density          -0.331513
global_clustering * global_synonyms_count                -0.209954
global_clustering * rel_aoa                              -0.228066
global_clustering * rel_clustering                        0.036967
global_clustering * rel_frequency                        -0.406112
global_clustering * rel_letters_count                     0.016904
global_clustering * rel_orthographic_density              0.323744
global_clustering * rel_synonyms_count                    0.556312
global_frequency * global_letters_count                  -0.078488
global_frequency * global_orthographic_density           -0.168252
global_frequency * global_synonyms_count                 -0.185243
global_frequency * rel_aoa                               -0.009947
global_frequency * rel_clustering                        -0.174345
global_frequency * rel_frequency                          0.012451
global_frequency * rel_letters_count                      0.120425
global_frequency * rel_orthographic_density               0.150070
global_frequency * rel_synonyms_count                     0.046243
global_letters_count * global_orthographic_density       -0.085724
global_letters_count * global_synonyms_count              0.233909
global_letters_count * rel_aoa                           -0.037404
global_letters_count * rel_clustering                     0.069123
global_letters_count * rel_frequency                     -0.014861
global_letters_count * rel_letters_count                  0.002738
global_letters_count * rel_orthographic_density           0.194139
global_letters_count * rel_synonyms_count                -0.502199
global_orthographic_density * global_synonyms_count       0.510221
global_orthographic_density * rel_aoa                    -0.064867
global_orthographic_density * rel_clustering              0.301481
global_orthographic_density * rel_frequency               0.011921
global_orthographic_density * rel_letters_count          -0.038350
global_orthographic_density * rel_orthographic_density   -0.052386
global_orthographic_density * rel_synonyms_count         -0.747516
global_synonyms_count * rel_aoa                           0.189279
global_synonyms_count * rel_clustering                    0.022039
global_synonyms_count * rel_frequency                     0.191898
global_synonyms_count * rel_letters_count                -0.146129
global_synonyms_count * rel_orthographic_density         -0.654165
global_synonyms_count * rel_synonyms_count               -0.149424
rel_aoa * rel_clustering                                  0.217954
rel_aoa * rel_frequency                                   0.032378
rel_aoa * rel_letters_count                               0.014084
rel_aoa * rel_orthographic_density                        0.061265
rel_aoa * rel_synonyms_count                             -0.190449
rel_clustering * rel_frequency                            0.363933
rel_clustering * rel_letters_count                       -0.034995
rel_clustering * rel_orthographic_density                -0.269042
rel_clustering * rel_synonyms_count                      -0.504053
rel_frequency * rel_letters_count                        -0.056508
rel_frequency * rel_orthographic_density                 -0.061033
rel_frequency * rel_synonyms_count                       -0.124925
rel_letters_count * rel_orthographic_density             -0.091569
rel_letters_count * rel_synonyms_count                    0.350176
rel_orthographic_density * rel_synonyms_count             0.830197
dtype: float64
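
The "with interactions" fits use far more predictors than the plain fits (12 main effects plus 66 pairwise products in the last listing, versus 12), and R^2 can only go up when predictors are added, so the raw R^2 values of the two variants are not directly comparable. One common correction is the adjusted R^2. The sketch below applies the standard formula to the two values printed just above for rel orthographic_density, assuming that "430 measures" is the number of observations; `adjusted_r2` is a helper introduced here, not part of the notebook.

```python
from math import comb


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_obs = 430                          # assumption: "430 measures" = observations
n_main = 12                          # 6 global + 6 sentence-relative features
n_inter = n_main + comb(n_main, 2)   # main effects + pairwise products

adj_plain = adjusted_r2(0.25648184618726366, n_obs, n_main)
adj_inter = adjusted_r2(0.3710063540470248, n_obs, n_inter)
print(adj_plain, adj_inter)
```

Under these assumptions the adjustment removes most of the gap between the two fits, so the higher raw R^2 of the interaction model is best read with its parameter count in mind.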

                   aoa  betweenness  clustering     degree  frequency  letters_count  orthographic_density   pagerank  phonemes_count  phonological_density  syllables_count  synonyms_count
Component-0   0.528445    -0.271420    0.083731  -0.222847  -0.242064       0.419672             -0.218663  -0.262450        0.375893             -0.276713         0.145347       -0.001198
Component-1   0.342208    -0.365379    0.109383  -0.289321  -0.229756      -0.424334              0.194591  -0.292117       -0.444309              0.266082        -0.169381        0.027833
Component-2   0.348832     0.625277   -0.043001   0.171100  -0.631099      -0.106504             -0.000281   0.197628       -0.033353              0.033945        -0.050172       -0.052467

                   aoa  frequency  letters_count
Component-0   0.755290  -0.385516       0.530014
Component-1  -0.335417   0.467393       0.817948
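
The two tables above list PCA loadings, one row per component and one column per feature. A minimal sketch of how such a table can be built with the `PCA` class imported at the top of the notebook. The data here is synthetic, and the way the notebook assembles its own feature matrix is not shown in this section, so `variations` is only a placeholder.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

features = ['aoa', 'frequency', 'letters_count']

# Placeholder data: 300 random rows, one column per feature.
rng = np.random.RandomState(0)
variations = pd.DataFrame(rng.normal(size=(300, len(features))),
                          columns=features)

pca = PCA(n_components=2).fit(variations)

# One row per component, one column per feature, labelled as in the tables.
components = pd.DataFrame(
    pca.components_,
    columns=features,
    index=['Component-{}'.format(i) for i in range(pca.n_components_)])
print(components)
```

Each row of `pca.components_` has unit length, which gives a quick sanity check on a printed table: for the three-feature table above, 0.755290² + 0.385516² + 0.530014² ≈ 1.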