Feature variation by substitution ($\nu_{\phi}$)

1 Setup

Flags and settings.



In [1]:

    
SAVE_FIGURES = False
PAPER_FEATURES = ['frequency', 'aoa', 'clustering', 'letters_count',
                  'synonyms_count', 'orthographic_density']
N_COMPONENTS = 3
BIN_COUNT = 4

Imports and database setup.



In [2]:

    
from itertools import product

import pandas as pd
import seaborn as sb
from scipy import stats
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from progressbar import ProgressBar

%cd -q ..
from brainscopypaste.conf import settings
%cd -q notebooks
from brainscopypaste.mine import Model, Time, Source, Past, Durl
from brainscopypaste.db import Substitution
from brainscopypaste.utils import init_db, session_scope
engine = init_db()

2 Variation of features upon substitution

First build our data.



In [3]:

    
model = Model(time=Time.discrete, source=Source.majority,
              past=Past.last_bin, durl=Durl.exclude_past, max_distance=1)
data = []

with session_scope() as session:
    substitutions = session.query(Substitution.id)\
        .filter(Substitution.model == model)
    print("Got {} substitutions for model {}"
          .format(substitutions.count(), model))
    substitution_ids = [id for (id,) in substitutions]

for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for feature in Substitution.__features__:
            source, destination = substitution.features(feature)
            source_rel, destination_rel = \
                substitution.features(feature, sentence_relative='median')
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'feature': feature,
                'source': source,
                'source_rel': source_rel,
                'destination': destination,
                'destination_rel': destination_rel,
                'h0': substitution.feature_average(feature),
                'h0_rel': substitution.feature_average(
                        feature, sentence_relative='median'),
                'h0n': substitution.feature_average(
                        feature, source_synonyms=True),
                'h0n_rel': substitution.feature_average(
                        feature, source_synonyms=True,
                        sentence_relative='median')})

original_variations = pd.DataFrame(data)
del data

Got 3036 substitutions for model Model(time=Time.discrete, source=Source.majority, past=Past.last_bin, durl=Durl.exclude_past, max_distance=1)

100% (3036 of 3036) |######################| Elapsed Time: 0:01:09 Time: 0:01:09

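Compute cluster averages, so that several substitutions from the same cluster do not count as independent observations (which would make the confidence intervals look narrower than they should), and crop the data so that we have acceptable CIs.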



In [4]:

    
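# Select the averaged columns with a list: recent pandas versions reject
# the tuple-style selection. 'feature' is already a grouping key, so it
# is kept as a column by as_index=False and need not be selected.
variations = original_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'feature'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'feature'], as_index=False)\
    [['source', 'source_rel', 'destination', 'destination_rel',
      'h0', 'h0_rel', 'h0n', 'h0n_rel']].mean()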
variations['variation'] = variations['destination'] - variations['source']

# HARDCODED: drop values where source AoA is above 15.
# This crops the graphs to acceptable CIs.
variations.loc[(variations.feature == 'aoa') & (variations.source > 15),
               ['source', 'source_rel', 'destination', 'destination_rel',
                'h0', 'h0_rel', 'h0n', 'h0n_rel']] = np.nan
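
To illustrate why the averaging per cluster matters, here is a minimal sketch on made-up data (the toy frame and its values are assumptions, not taken from the database). A cluster that contributes many substitutions would otherwise dominate the mean and inflate the number of observations behind each confidence interval.

```python
import pandas as pd

# Toy data (hypothetical): cluster 1 contributes four substitutions,
# cluster 2 only one.
toy = pd.DataFrame({
    'cluster_id': [1, 1, 1, 1, 2],
    'feature': ['aoa'] * 5,
    'source': [4.0, 4.0, 4.0, 4.0, 10.0],
    'destination': [3.0, 3.0, 3.0, 3.0, 9.0],
})

# Naive mean: every substitution counts, so cluster 1 dominates.
naive_mean = toy['destination'].mean()

# Cluster-level mean: one row per (cluster, feature) pair.
per_cluster = toy.groupby(['cluster_id', 'feature'],
                          as_index=False)[['source', 'destination']].mean()
cluster_mean = per_cluster['destination'].mean()

print(naive_mean, cluster_mean, len(toy), len(per_cluster))
```

After the averaging, the number of observations used for the CIs is the number of clusters (2 here) rather than the number of substitutions (5).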

Prepare feature ordering.



In [5]:

    
ordered_features = sorted(
    Substitution.__features__,
    key=lambda f: Substitution._transformed_feature(f).__doc__
)

What we plot about features

For a feature $\phi$, plot:

$\nu_{\phi}$, the average feature of an appearing word upon substitution, as a function of the feature of the disappearing word: $$\nu_{\phi}(f) = \left< \phi(w') \right>_{\{w \rightarrow w' | \phi(w) = f \}}$$
$\nu_{\phi}^0$ (which is the average feature value), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi}^{00}$ (which is the average feature value for synonyms of the source word), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution

We also plot these values relative to the sentence average (computed as the sentence median in the code, via sentence_relative='median'), i.e.:

$\nu_{\phi, r}$, the average sentence-relative feature of an appearing word upon substitution as a function of the sentence-relative feature of the disappearing word, i.e. $\phi($destination$) - \phi($destination sentence$)$ as a function of $\phi($source$) - \phi($source sentence$)$
$\nu_{\phi, r}^0$ (which is the average feature value minus the sentence average), i.e. what happens under $\mathcal{H}_0$
$\nu_{\phi, r}^{00}$ (which is the average feature value for synonyms of the source word minus the sentence average), i.e. what happens under $\mathcal{H}_{00}$
$y = x$, i.e. what happens if there is no substitution
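
Written in the same notation as $\nu_{\phi}$, the sentence-relative quantity can be sketched as follows. The symbols $s$ and $s'$ (the source and destination sentences) and $\tilde{\phi}(s)$ (the reference value of the feature over a sentence, the median in the code) are notation introduced here for illustration; they do not appear in the original text.

```latex
\nu_{\phi, r}(f) = \left< \phi(w') - \tilde{\phi}(s') \right>_{\{w \rightarrow w' \,|\, \phi(w) - \tilde{\phi}(s) = f \}}
```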

These values are plotted first with absolute feature values, then with sentence-relative values; in each case we use fixed-width bins first, then quantile bins.
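
As a rough illustration of the two binning schemes, the sketch below bins a small, skewed set of made-up source values with pd.cut (fixed width) and pd.qcut (quantiles), then computes the per-bin mean of the destination values, which is what $\nu_{\phi}$ estimates. All values and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical skewed source-feature values: many small, one large.
source = pd.Series([1, 1, 2, 2, 3, 3, 4, 20], dtype=float)
destination = pd.Series([2, 2, 2, 3, 3, 3, 4, 10], dtype=float)

# Fixed-width bins: equal ranges, possibly very unequal counts.
fixed_bins = pd.cut(source, 4, labels=False, right=False)
# Quantile bins: roughly equal counts, unequal ranges.
quantile_bins = pd.qcut(source, 4, labels=False)

fixed_counts = fixed_bins.value_counts().sort_index()
quantile_counts = quantile_bins.value_counts().sort_index()

# nu_phi per bin: mean destination feature among substitutions whose
# source feature falls in that bin.
nu_fixed = destination.groupby(fixed_bins).mean()
nu_quantile = destination.groupby(quantile_bins).mean()

print(fixed_counts.to_dict(), quantile_counts.to_dict())
```

With fixed-width bins, seven of the eight points land in the first bin and two bins stay empty; quantile bins keep two points per bin, at the cost of bins with very different widths.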



In [6]:

    
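def print_significance(name, bins, h0, h0n, values):
    """Print, per bin, whether `values` differ from H_0 and H_00.

    Stars mark the strictest level (.05, .01, .001) at which the null
    mean falls outside the confidence interval; 'ns.' means none.
    """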
    bin_count = bins.max() + 1
    print()
    print('-' * len(name))
    print(name)
    print('-' * len(name))
    header = ('Bin  |   '
              + ' |   '.join(map(str, range(1, bin_count + 1)))
              + ' |')
    print(header)
    print('-' * len(header))
    
    for null_name, nulls in [('H_0 ', h0), ('H_00', h0n)]:
        bin_values = np.zeros(bin_count)
        bin_nulls = np.zeros(bin_count)
        cis = np.zeros((bin_count, 3))

        for i in range(bin_count):
            indices = bins == i
            n = indices.sum()
            s = values[indices].std(ddof=1)

            bin_values[i] = values[indices].mean()
            bin_nulls[i] = nulls[indices].mean()
            # Half-width of the CI at each significance level. Note the
            # division by sqrt(n - 1) rather than the usual sqrt(n).
            for j, alpha in enumerate([.05, .01, .001]):
                cis[i, j] = (stats.t.ppf(1 - alpha/2, n - 1)
                             * s / np.sqrt(n - 1))

        print(null_name + ' |', end='')
        differences = ((bin_values[:,np.newaxis]
                        < bin_nulls[:,np.newaxis] - cis)
                       | (bin_values[:,np.newaxis]
                          > bin_nulls[:,np.newaxis] + cis))
        for i in range(bin_count):
            if differences[i].any():
                n_stars = np.where(differences[i])[0].max()
                bin_stars = '*' * (1 + n_stars) + ' ' * (2 - n_stars)
            else:
                bin_stars = 'ns.'
            print(' ' + bin_stars + ' |', end='')
        print()
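
The star rule can be sketched for a single bin as follows. This is a simplified, hypothetical helper (bin_stars is not part of the notebook), using the same half-width formula as the cell above and made-up sample values.

```python
import numpy as np
from scipy import stats


def bin_stars(values, null_mean, alphas=(.05, .01, .001)):
    """Return '*', '**', '***' or 'ns.' for one bin (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    # Same half-width as in the cell above (note the sqrt(n - 1)).
    half_widths = [stats.t.ppf(1 - alpha / 2, n - 1)
                   * values.std(ddof=1) / np.sqrt(n - 1)
                   for alpha in alphas]
    # Keep the strictest level at which the null falls outside the CI.
    n_stars = 0
    for k, half_width in enumerate(half_widths, start=1):
        if abs(mean - null_mean) > half_width:
            n_stars = k
    return '*' * n_stars if n_stars else 'ns.'


sample = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]
print(bin_stars(sample, null_mean=5.0))
print(bin_stars(sample, null_mean=6.0))
```

A null value close to the bin mean gives 'ns.', while one far outside even the 99.9% interval gives three stars.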



In [7]:

    
def plot_variation(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    relative = kwargs.get('relative', False)
    quantiles = kwargs.get('quantiles', False)
    feature_field = kwargs.get('feature_field', 'feature')
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
    # Plot.
    nuphi = r'\nu_{\phi' + (',r' if relative else '') + '}'
    plt.plot(middles, values, '-', lw=2, color=color,
             label='${}$'.format(nuphi))
    plt.fill_between(middles, values - cis, values + cis,
                     color=sb.desaturate(color, 0.2), alpha=0.2)
    plt.plot(middles, h0s, '--', color=sb.desaturate(color, 0.2),
             label='${}^0$'.format(nuphi))
    plt.plot(middles, h0ns, linestyle='-.',
             color=sb.desaturate(color, 0.2),
             label='${}^{{00}}$'.format(nuphi))
    plt.plot(middles, middles, linestyle='dotted',
             color=sb.desaturate(color, 0.2),
             label='$y = x$')
    lmin, lmax = middles[0], middles[-1]
    h0min, h0max = min(h0s.min(), h0ns.min()), max(h0s.max(), h0ns.max())
    # Rescale limits if we're touching H0 or H00.
    if h0min < lmin:
        lmin = h0min - (lmax - h0min) / 10
    elif h0max > lmax:
        lmax = h0max + (h0max - lmin) / 10
    plt.xlim(lmin, lmax)
    plt.ylim(lmin, lmax)

    # Test for statistical significance
    print_significance(str(data.iloc[0][feature_field]),
                       x_bins, h0, h0n, y)
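
The loop over decreasing bin counts is a fallback for features with many tied values, which is presumably why some features end up with fewer than four bins in the quantile tables. A minimal sketch of the idea, with a hypothetical helper name and made-up data:

```python
import pandas as pd


def bin_with_fallback(x, max_bins=4):
    """Try quantile binning with fewer and fewer bins until it works."""
    for bin_count in range(max_bins, 0, -1):
        try:
            x_bins, edges = pd.qcut(x, bin_count, labels=False,
                                    retbins=True)
            return bin_count, x_bins, edges
        except ValueError:
            # Raised when tied values produce duplicate bin edges.
            continue
    raise ValueError('could not bin the data')


# Hypothetical discrete feature with many ties.
x = pd.Series([1] * 6 + [2] * 6 + [3] * 2)
bin_count, x_bins, edges = bin_with_fallback(x)
print(bin_count, list(edges))
```

Here four and three quantile bins both fail because several quantiles coincide, so the helper settles on two bins.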



In [8]:

    
def plot_grid(data, features, filename,
              plot_function, xlabel, ylabel,
              feature_field='feature', plot_kws=None):
    # Avoid a mutable default argument.
    plot_kws = plot_kws or {}
    # Note: `size` was renamed `height` in seaborn 0.9.
    g = sb.FacetGrid(data=data[data[feature_field].isin(features)],
                     sharex=False, sharey=False,
                     col=feature_field, hue=feature_field,
                     col_order=features, hue_order=features,
                     col_wrap=3, aspect=1.5, size=3)
    g.map_dataframe(plot_function, **plot_kws)
    g.set_titles('{col_name}')
    g.set_xlabels(xlabel)
    g.set_ylabels(ylabel)
    for ax in g.axes.ravel():
        legend = ax.legend(frameon=True, loc='best')
        if not legend:
            # Skip if nothing was plotted on these axes.
            continue
        frame = legend.get_frame()
        frame.set_facecolor('#f2f2f2')
        frame.set_edgecolor('#000000')
        ax.set_title(Substitution._transformed_feature(ax.get_title())
                     .__doc__)
    if SAVE_FIGURES:
        g.fig.savefig(settings.FIGURE.format(filename),
                      bbox_inches='tight', dpi=300)



In [9]:

    
def plot_bias(ax, data, color, ci=True, relative=False, quantiles=False):
    feature = data.iloc[0].feature
    rel = '_rel' if relative else ''
    x = data['source' + rel]
    y = data['destination' + rel]
    h0 = data['h0' + rel]
    h0n = data['h0n' + rel]
    
    # Compute binning.
    cut, cut_kws = ((pd.qcut, {}) if quantiles
                    else (pd.cut, {'right': False}))
    for bin_count in range(BIN_COUNT, 0, -1):
        try:
            x_bins, bins = cut(x, bin_count, labels=False,
                               retbins=True, **cut_kws)
            break
        except ValueError:
            pass
    middles = (bins[:-1] + bins[1:]) / 2
    
    # Compute bin values.
    h0s = np.zeros(bin_count)
    h0ns = np.zeros(bin_count)
    values = np.zeros(bin_count)
    cis = np.zeros(bin_count)
    for i in range(bin_count):
        indices = x_bins == i
        n = indices.sum()
        h0s[i] = h0[indices].mean()
        h0ns[i] = h0n[indices].mean()
        values[i] = y[indices].mean()
        cis[i] = (stats.t.ppf(.975, n - 1) * y[indices].std(ddof=1)
                  / np.sqrt(n - 1))
    
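    # Plot the bias w.r.t. H_00, normalised by the mean H_0 value,
    # against bin positions spread evenly over [0, 1].
    scale = abs(h0s.mean())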
    ax.plot(np.linspace(0, 1, bin_count),
            (values - h0ns) / scale, '-', lw=2, color=color,
            label=Substitution._transformed_feature(feature).__doc__)
    if ci:
        ax.fill_between(np.linspace(0, 1, bin_count),
                        (values - h0ns - cis) / scale,
                        (values - h0ns + cis) / scale,
                        color=sb.desaturate(color, 0.2), alpha=0.2)
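
The normalisation used in plot_bias can be checked on a few made-up numbers. The per-bin values below are hypothetical; only the formula follows the function above.

```python
import numpy as np

# Hypothetical per-bin values for one feature.
values = np.array([6.0, 7.0, 8.5, 9.0])   # nu_phi per bin
h0s = np.array([8.0, 8.0, 8.0, 8.0])      # nu_phi^0 per bin
h0ns = np.array([6.5, 7.5, 8.5, 9.5])     # nu_phi^00 per bin

# Bias relative to H_00, in units of the mean H_0 value.
scale = abs(h0s.mean())
bias = (values - h0ns) / scale

# Source bins are spread evenly over [0, 1], whatever the feature's units.
positions = np.linspace(0, 1, len(values))

print(bias, positions)
```

Dividing by the feature average and mapping bins onto $[0, 1]$ is what lets features with different units share one set of axes.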



In [10]:

    
def plot_overlay(data, features, filename, palette_name,
                 plot_function, title, xlabel, ylabel, plot_kws=None):
    # Avoid a mutable default argument.
    plot_kws = plot_kws or {}
    palette = sb.color_palette(palette_name, len(features))
    fig, ax = plt.subplots(figsize=(12, 6))
    for j, feature in enumerate(features):
        plot_function(ax, data[data.feature == feature].dropna(),
                      color=palette[j], **plot_kws)
    ax.legend(loc='lower right')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    if SAVE_FIGURES:
        fig.savefig(settings.FIGURE.format(filename),
                    bbox_inches='tight', dpi=300)
    return ax

2.1 Global feature values

2.1.1 Bins of distribution of appeared global feature values

For each feature $\phi$, we plot the variation upon substitution as explained above.



In [11]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | ns. |
H_00 | *** | *** | *** | ns. |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | ns. | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *   |
H_00 | ns. | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *   | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | **  | **  | *   |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | *** | *   |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | ns. |
H_00 | ns. | *** | ns. | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | *   | **  | *** | ns. |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | ns. |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *   | *** | **  |
H_00 | *   | ns. | ns. | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | ns. | *** | **  | ns. |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | ns. | ns. |

Then plot $\nu_{\phi} - \nu_{\phi}^{00}$ for each feature (i.e. the measured bias) to see how they compare.



In [12]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False});



In [13]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$')

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | *** | *** | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *   |
H_00 | *** | *** | *   | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | ns. |
H_00 | ns. | *** | ns. | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | ns. |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | **  | **  | *   |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *   | *** | **  |
H_00 | *   | ns. | ns. | ns. |



In [14]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)')\
    .set_ylim(-2, .7);

2.1.2 Quantiles of distribution of appeared global feature values



In [15]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | *   |
H_00 | *   | *   | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |
------------------
H_0  | *** | *   |
H_00 | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | **  | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | ns. | ns. | *** |
H_00 | ns. | *** | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | *** | **  | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | ns. | *** | *   | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | **  | *** | **  |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |
------------------------
H_0  | ns. | **  | *** |
H_00 | *   | ns. | *   |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | *** | *   |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | ns. | ns. |



In [16]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_global',
             'husl', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'ci': False, 'quantiles': True});



In [17]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_global', plot_variation,
          r'$\phi($disappearing word$)$', r'$\phi($appearing word$)$',
          plot_kws={'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | **  | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | **  |
H_00 | ns. | *** | *   | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | ns. | ns. | *** |
H_00 | ns. | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |
------------------------
H_0  | ns. | **  | *** |
H_00 | *   | ns. | *   |



In [18]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_global',
             'deep', plot_bias, 'Measured bias for all features',
             r'$\phi($source word$)$ (normalised to $[0, 1]$)',
             r'$\nu_{\phi} - \nu_{\phi}^{00}$'
                 '\n(normalised to feature average)',
             plot_kws={'quantiles': True})\
    .set_ylim(-1.2, .6);

2.2 Sentence-relative feature values

2.2.1 Bins of distribution of appeared sentence-relative values



In [19]:

    
plot_grid(variations, ordered_features,
          'all-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | *   | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | **  | **  |
H_00 | ns. | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *   | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | **  | ns. |
H_00 | ns. | **  | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *   | *** | *   |
H_00 | ns. | ns. | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | ns. |
H_00 | ns. | *   | **  | ns. |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | **  | ns. | *** | **  |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *   |
H_00 | **  | ns. | ns. | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | **  |
H_00 | ns. | *   | *** | ns. |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | ns. | ns. |



In [20]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-fixedbins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True});



In [21]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-fixedbins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | **  | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *   | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | ns. |
H_00 | ns. | *   | **  | ns. |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | ns. |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | **  | ns. |
H_00 | ns. | **  | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *   |
H_00 | **  | ns. | ns. | ns. |



In [22]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-fixedbins_sentencerel',
             'deep', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True})\
    .set_ylim(-2, .7);

2.2.2 Quantiles of distribution of appeared sentence-relative values



In [23]:

    
plot_grid(variations, ordered_features,
          'all-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
phonemes_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | ns. | ns. |
H_00 | ns. | ns. | ns. | ns. |

---------------
syllables_count
---------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | **  | ns. | ns. |
H_00 | ns. | ns. | ns. | ns. |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | ns. | **  |
H_00 | ns. | ns. | *** | *** |

-----------
betweenness
-----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | ns. |
H_00 | ns. | *   | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *** |
H_00 | ns. | **  | ns. | *   |

------
degree
------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | ns. | ns. | **  | *** |

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | **  |
H_00 | *   | ns. | ns. | ns. |

--------
pagerank
--------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | **  | *** | *** | *** |
H_00 | ns. | ns. | *** | **  |

--------------------
phonological_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | **  | *** |
H_00 | *   | ns. | ns. | ns. |



In [24]:

    
plot_overlay(variations, ordered_features,
             'all-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'ci': False, 'relative': True, 'quantiles': True});



In [25]:

    
plot_grid(variations, PAPER_FEATURES,
          'paper-variations-quantilebins_sentencerel', plot_variation,
          r'$\phi($disappearing word$) - \phi($sentence$)$',
          r'$\phi($appearing word$) - \phi($sentence$)$',
          plot_kws={'relative': True, 'quantiles': True})

---------
frequency
---------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | *** |

---
aoa
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | *** |
H_00 | *** | *** | *** | ns. |

----------
clustering
----------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | **  | *** |
H_00 | ns. | **  | ns. | *   |

-------------
letters_count
-------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *** | ns. |
H_00 | *** | *** | *** | *** |

--------------
synonyms_count
--------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | ns. | ns. | **  |
H_00 | ns. | ns. | *** | *** |

--------------------
orthographic_density
--------------------
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | **  |
H_00 | *   | ns. | ns. | ns. |



In [26]:

    
plot_overlay(variations, PAPER_FEATURES,
             'paper-variations_bias-quantilebins_sentencerel',
             'husl', plot_bias,
             'Measured bias for all sentence-relative features',
             r'$\phi($source word$) - \phi($sentence$)$'
                 r' (normalised to $[0, 1]$)',
             r'$\nu_{\phi,r} - \nu_{\phi,r}^{00}$'
                 '\n(normalised to sentence-relative feature average)',
             plot_kws={'relative': True, 'quantiles': True});

3 Streamplots

We'd like to see how absolute and sentence-relative feature values interact. In particular, we want to know which effect dominates: cognitive bias, attraction to the sentence average, or attraction to the global feature average.

To do this, we plot the average direction (arrows) and strength (color) of the move from source word to destination word, for each (absolute, relative) pair of source feature values. In other words: if a word with a given absolute and sentence-relative feature value were substituted, where would it go in this (absolute, relative) space?

The interesting part of these plots is the attraction front, where the arrows converge. We're interested in:

its slope
its shape (e.g. several slope regimes?)
its position w.r.t. $\nu_{\phi}^0$ and $y = 0$ (which is $\left< \phi(sentence) \right>$)

First, here's our plotting function. (Note that the arrow size we set looks huge here but gives normal sizes in the saved figures; there is probably some dpi scaling problem with the arrows.)



In [27]:

    
def plot_stream(**kwargs):
    data = kwargs.pop('data')
    color = kwargs.get('color', 'blue')
    source = data['source']
    source_rel = data['source_rel']
    dest = data['destination']
    dest_rel = data['destination_rel']
    h0 = data['h0']
    
    # Compute binning.
    bin_count = 4
    x_bins, x_margins = pd.cut(source, bin_count,
                               right=False, labels=False, retbins=True)
    x_middles = (x_margins[:-1] + x_margins[1:]) / 2
    y_bins, y_margins = pd.cut(source_rel, bin_count,
                               right=False, labels=False, retbins=True)
    y_middles = (y_margins[:-1] + y_margins[1:]) / 2
    
    # Compute bin values.
    h0s = np.ones(bin_count) * h0.iloc[0]
    u_values = np.zeros((bin_count, bin_count))
    v_values = np.zeros((bin_count, bin_count))
    strength = np.zeros((bin_count, bin_count))
    for x in range(bin_count):
        for y in range(bin_count):
            u_values[y, x] = (
                dest[(x_bins == x) & (y_bins == y)] -
                source[(x_bins == x) & (y_bins == y)]
            ).mean()
            v_values[y, x] = (
                dest_rel[(x_bins == x) & (y_bins == y)] -
                source_rel[(x_bins == x) & (y_bins == y)]
            ).mean()
            strength[y, x] = np.sqrt(
                (dest[(x_bins == x) & (y_bins == y)] - 
                 source[(x_bins == x) & (y_bins == y)]) ** 2 +
                (dest_rel[(x_bins == x) & (y_bins == y)] - 
                 source_rel[(x_bins == x) & (y_bins == y)]) ** 2
            ).mean()
    
    # Plot.
    plt.streamplot(x_middles, y_middles, u_values, v_values,
                   arrowsize=4, color=strength, cmap=plt.cm.viridis)
    plt.plot(x_middles, np.zeros(bin_count), linestyle='-',
             color=sb.desaturate(color, 0.2), 
             label=r'$\left< \phi(sentence) \right>$')
    plt.plot(h0s, y_middles, linestyle='--',
             color=sb.desaturate(color, 0.2), label=r'$\nu_{\phi}^0$')
    plt.xlim(x_middles[0], x_middles[-1])
    plt.ylim(y_middles[0], y_middles[-1])
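
To see what the arrows encode, here is a stripped-down sketch of the same binning run on synthetic data. Everything in it is made up for illustration: `binned_flow` is a hypothetical helper (not part of the notebook), and the toy data pulls every point halfway towards (5, 0), so the flow should point towards that attractor.

```python
import numpy as np
import pandas as pd


def binned_flow(source, source_rel, dest, dest_rel, bin_count=4):
    """Mean displacement (u, v) in each (absolute, relative) source bin."""
    x_bins, x_edges = pd.cut(source, bin_count, right=False,
                             labels=False, retbins=True)
    y_bins, y_edges = pd.cut(source_rel, bin_count, right=False,
                             labels=False, retbins=True)
    u = np.full((bin_count, bin_count), np.nan)
    v = np.full((bin_count, bin_count), np.nan)
    for x in range(bin_count):
        for y in range(bin_count):
            mask = (x_bins == x) & (y_bins == y)
            if mask.any():
                # Rows index the relative value, columns the absolute one,
                # which is the layout plt.streamplot expects.
                u[y, x] = (dest[mask] - source[mask]).mean()
                v[y, x] = (dest_rel[mask] - source_rel[mask]).mean()
    x_mid = (x_edges[:-1] + x_edges[1:]) / 2
    y_mid = (y_edges[:-1] + y_edges[1:]) / 2
    return x_mid, y_mid, u, v


# Synthetic data (an assumption, not the real substitutions).
rng = np.random.default_rng(0)
source = pd.Series(rng.uniform(0, 10, 2000))
source_rel = pd.Series(rng.uniform(-3, 3, 2000))
dest = source + 0.5 * (5 - source)
dest_rel = source_rel + 0.5 * (0 - source_rel)

x_mid, y_mid, u, v = binned_flow(source, source_rel, dest, dest_rel)
```

Passing `x_mid, y_mid, u, v` to `plt.streamplot` would then draw arrows converging on the attractor; in the real plots the place where they converge is the attraction front.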

Here are the plots for all features.



In [28]:

    
g = sb.FacetGrid(data=variations,
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=ordered_features, hue_order=ordered_features)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

And here are the plots for the features we expose in the paper.



In [29]:

    
g = sb.FacetGrid(data=variations[variations['feature']
                                 .map(lambda f: f in PAPER_FEATURES)],
                 col='feature', col_wrap=3,
                 sharex=False, sharey=False, hue='feature',
                 aspect=1, size=4.5,
                 col_order=PAPER_FEATURES, hue_order=PAPER_FEATURES)
g.map_dataframe(plot_stream)
g.set_titles('{col_name}')
g.set_xlabels(r'$\phi($word$)$')
g.set_ylabels(r'$\phi($word$) - \phi($sentence$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
    ax.set_title(Substitution._transformed_feature(ax.get_title()).__doc__)
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-feature_streams'),
                  bbox_inches='tight', dpi=300)









    



/home/sl/.virtualenvs/brainscopypaste/lib/python3.5/site-packages/numpy/ma/core.py:4144: UserWarning: Warning: converting a masked element to nan.
  warnings.warn("Warning: converting a masked element to nan.")

4 PCA'd feature variations

Compute PCA on feature variations (note: on variations, not on features directly), and show the evolution of the first three components upon substitution.

CAVEAT: the PCA is computed only on variations where all features are defined. This greatly reduces the number of words included, and also the number of substitutions (see the caveat section below for actual numbers, but the reduction is drastic). It also affects $\mathcal{H}_0$ and $\mathcal{H}_{00}$, which are computed using only words for which all features are defined. This, again, greatly reduces the number of words taken into account, and so changes the values under the null hypotheses.
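
A toy illustration of why this happens: pivoting the long table to one row per cluster and dropping incomplete rows discards any substitution that lacks even a single feature. The numbers below are invented; only the `pivot` then `dropna` pattern mirrors the notebook.

```python
import numpy as np
import pandas as pd

# Invented long-format variations: cluster 2 has no 'aoa' value.
long = pd.DataFrame({
    'cluster_id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'feature': ['aoa', 'frequency', 'letters_count'] * 3,
    'variation': [0.5, -1.0, 2.0, np.nan, 0.3, -1.0, 0.1, 0.2, 0.0],
})

# One row per cluster, one column per feature.
wide = long.pivot(index='cluster_id', columns='feature',
                  values='variation')

# Only rows with every feature defined can be fed to the PCA.
complete = wide.dropna()
```

Here one missing value out of nine removes a third of the rows; the more features are included, the more rows are lost.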

4.1 On all the features

Compute the actual PCA.



In [30]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(Substitution.__features__))
pcavariations = variations.pivot(index='cluster_id',
                                 columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the number of components and the variance they explain.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

print("We're plotting variation for the first {} components:"
      .format(N_COMPONENTS))
pd.DataFrame(pca.components_[:N_COMPONENTS],
             columns=pcafeatures,
             index=['Component-{}'.format(i) for i in range(N_COMPONENTS)])









    



MLE estimates there are 10 components.

Those explain the following variance:
[ 0.55032841  0.15333033  0.08368943  0.07604974  0.03688808  0.02731091
  0.0207735   0.01834456  0.01215353  0.01056822]

We're plotting variation for the first 3 components:






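    Out[30]:

            |       aoa | betweenness | clustering |    degree | frequency | letters_count |
--------------------------------------------------------------------------------------------
Component-0 | -0.468881 |    0.324011 |  -0.100663 |  0.251706 |  0.275475 |     -0.423402 |
Component-1 |  0.272132 |   -0.323434 |   0.098092 | -0.267714 | -0.302221 |     -0.428990 |
Component-2 | -0.672428 |   -0.106674 |  -0.005963 |  0.001734 | -0.718328 |      0.082599 |

            | orthographic_density |  pagerank | phonemes_count | phonological_density | syllables_count | synonyms_count |
---------------------------------------------------------------------------------------------------------------------------
Component-0 |             0.213411 |  0.275275 |      -0.366154 |             0.263802 |       -0.146755 |      -0.002217 |
Component-1 |             0.121751 | -0.316236 |      -0.530175 |             0.214495 |       -0.147851 |       0.029822 |
Component-2 |             0.028255 | -0.050776 |       0.083237 |            -0.031154 |        0.014735 |       0.045954 |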

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.
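
As a rough picture of what a "component value" is, the sketch below projects a word's feature vector onto a principal axis. This is an assumption about what `substitution.components` does, since its implementation is not shown here. It also uses a plain NumPy SVD instead of scikit-learn's `PCA`, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic variations: 200 substitutions x 3 features, where the first
# two features are strongly anti-correlated and the third is noise.
base = rng.normal(size=200)
variations = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    -base + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

# PCA through an SVD of the centred data; rows of `components` are the
# principal axes, sorted by decreasing explained variance.
mean = variations.mean(axis=0)
_, singular_values, components = np.linalg.svd(variations - mean,
                                               full_matrices=False)
explained_ratio = singular_values ** 2 / np.sum(singular_values ** 2)


def component_value(word_features, component):
    """Project a feature vector on one principal axis (assumed definition)."""
    centred = np.asarray(word_features, dtype=float) - mean
    return float(np.dot(centred, components[component]))


# Hypothetical source and destination feature vectors.
source_word = np.array([1.0, -1.0, 0.0])
destination_word = np.array([0.2, -0.1, 0.05])
variation = (component_value(destination_word, 0)
             - component_value(source_word, 0))
```

Under this reading, the centring term cancels in the difference, so the variation of a component is just the projection of the feature variation `destination_word - source_word` onto that axis.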



In [31]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(N_COMPONENTS):
            source, destination = substitution\
                .components(component, pca, pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (3036 of 3036) |######################| Elapsed Time: 0:01:10 Time: 0:01:10

Compute cluster averages (so as not to overestimate confidence intervals).



In [32]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()
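
The two-stage averaging can be illustrated on a toy table. The values below are invented and `cluster_id` is added to the first grouping key for readability, which is a small departure from the cell above. Rows are first averaged per substitution (one destination word occurrence at one position), then per cluster, so that a substitution with several candidate sources does not weigh more than the others.

```python
import pandas as pd

# Invented rows: cluster 1 holds two substitutions, the first of which
# has two candidate source values; cluster 2 holds one substitution.
raw = pd.DataFrame({
    'cluster_id':     [1, 1, 1, 2],
    'destination_id': [10, 10, 11, 20],
    'occurrence':     [0, 0, 0, 0],
    'position':       [3, 3, 3, 1],
    'source':         [1.0, 3.0, 4.0, 10.0],
    'destination':    [2.0, 4.0, 6.0, 8.0],
})

# Stage 1: one row per substitution.
per_substitution = raw.groupby(
    ['cluster_id', 'destination_id', 'occurrence', 'position'],
    as_index=False)[['source', 'destination']].mean()

# Stage 2: one row per cluster.
per_cluster = per_substitution.groupby(
    'cluster_id', as_index=False)[['source', 'destination']].mean()

# For comparison: a naive per-cluster mean over the raw rows.
naive = raw.groupby('cluster_id', as_index=False)[['source']].mean()
```

In this toy case the two-stage source mean for cluster 1 is 3.0, whereas the naive mean over raw rows is about 2.67, because the duplicated substitution counts twice.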

Plot the actual variations of components (see the caveat section below).



In [33]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='upper left')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('all-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | ns. | *** | *** |
H_00 | ns. | ns. | **  | ns. |

---
1.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *   | *** | *** | *** |
H_00 | ns. | *** | *** | **  |

---
2.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | *** | *** | *   | ns. |
H_00 | ns. | *   | ns. | ns. |

4.2 On a subset of relevant features



In [34]:

    
relevant_features = ['frequency', 'aoa', 'letters_count']

Compute the actual PCA.



In [35]:

    
# Compute the PCA.
pcafeatures = tuple(sorted(relevant_features))
pcavariations = variations[variations['feature']
                           .map(lambda f: f in pcafeatures)]\
    .pivot(index='cluster_id', columns='feature', values='variation')
pcavariations = pcavariations.dropna()
pca = PCA(n_components='mle')
pca.fit(pcavariations)

# Show the number of components and the variance they explain.
print('MLE estimates there are {} components.\n'.format(pca.n_components_))
print('Those explain the following variance:')
print(pca.explained_variance_ratio_)
print()

pd.DataFrame(pca.components_,
             columns=pcafeatures,
             index=['Component-{}'.format(i)
                    for i in range(pca.n_components_)])









    



MLE estimates there are 1 components.

Those explain the following variance:
[ 0.623601]







    Out[35]:

            |       aoa | frequency | letters_count |
-----------------------------------------------------
Component-0 | -0.738181 |  0.359234 |     -0.570999 |

Compute the source and destination component values, along with $\mathcal{H}_0$ and $\mathcal{H}_{00}$, for each component.



In [36]:

    
data = []
for substitution_id in ProgressBar(term_width=80)(substitution_ids):
    with session_scope() as session:
        substitution = session.query(Substitution).get(substitution_id)
        
        for component in range(pca.n_components_):
            source, destination = substitution.components(component, pca,
                                                          pcafeatures)
            data.append({
                'cluster_id': substitution.source.cluster.sid,
                'destination_id': substitution.destination.sid,
                'occurrence': substitution.occurrence,
                'position': substitution.position,
                'source_id': substitution.source.sid,
                'component': component,
                'source': source,
                'destination': destination,
                'h0': substitution.component_average(component, pca,
                                                     pcafeatures),
                'h0n': substitution.component_average(component, pca,
                                                      pcafeatures,
                                                      source_synonyms=True)
            })

original_component_variations = pd.DataFrame(data)
del data









    



100% (3036 of 3036) |######################| Elapsed Time: 0:00:19 Time: 0:00:19

Compute cluster averages (so as not to overestimate confidence intervals).



In [37]:

    
component_variations = original_component_variations\
    .groupby(['destination_id', 'occurrence', 'position', 'component'],
             as_index=False).mean()\
    .groupby(['cluster_id', 'component'], as_index=False)\
    ['source', 'destination', 'component', 'h0', 'h0n'].mean()

Plot the actual variations of components.



In [38]:

    
g = sb.FacetGrid(data=component_variations, col='component', col_wrap=3,
                 sharex=False, sharey=False, hue='component',
                 aspect=1.5, size=3)
g.map_dataframe(plot_variation, feature_field='component')
g.set_xlabels(r'$c($disappearing word$)$')
g.set_ylabels(r'$c($appearing word$)$')
for ax in g.axes.ravel():
    legend = ax.legend(frameon=True, loc='best')
    if not legend:
        # Skip if nothing was plotted on these axes.
        continue
    frame = legend.get_frame()
    frame.set_facecolor('#f2f2f2')
    frame.set_edgecolor('#000000')
if SAVE_FIGURES:
    g.fig.savefig(settings.FIGURE.format('paper-pca_variations-absolute'),
                  bbox_inches='tight', dpi=300)









    



---
0.0
---
Bin  |   1 |   2 |   3 |   4 |
------------------------------
H_0  | ns. | *** | *** | *** |
H_00 | ns. | ns. | *** | *** |

4.3 CAVEAT: reduction of the numbers of words and substitutions

As explained above, this PCA analysis can only use words for which all the features are defined (in this case, the features listed in relevant_features). So note the following:



In [39]:

    
for feature in relevant_features:
    print("Feature '{}' is based on {} words."
          .format(feature, len(Substitution
                               ._transformed_feature(feature)())))

# Collect every word used by at least one of relevant_features; the words
# that have all of these features defined are counted below with dropna().
words = set()
for tfeature in [Substitution._transformed_feature(feature)
                 for feature in relevant_features]:
    words.update(tfeature())

data = dict((feature, []) for feature in relevant_features)
words_list = []
for word in words:
    words_list.append(word)
    for feature in relevant_features:
        data[feature].append(Substitution
                             ._transformed_feature(feature)(word))
wordsdf = pd.DataFrame(data)
wordsdf['words'] = words_list
del words_list, data

print()
print("Among all the set of words used by these features, "
      "only {} are used."
      .format(len(wordsdf.dropna())))

print()
print("Similarly, we mined {} (cluster-unique) substitutions, "
      "but the PCA is in fact"
      " computed on {} of them (those where all features are defined)."
      .format(len(set(variations['cluster_id'])), len(pcavariations)))









    



Feature 'frequency' is based on 33450 words.
Feature 'aoa' is based on 30102 words.
Feature 'letters_count' is based on 42786 words.

Among all the set of words used by these features, only 14450 are used.

Similarly, we mined 422 (cluster-unique) substitutions, but the PCA is in fact computed on 322 of them (those where all features are defined).

The way $\mathcal{H}_0$ and $\mathcal{H}_{00}$ are computed makes them also affected by this.

5 Interactions between features (by Anova)

Some useful variables first.



In [40]:

    
cuts = [('fixed bins', pd.cut)]#, ('quantiles', pd.qcut)]
rels = [('global', ''), ('sentence-relative', '_rel')]

def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'

Now, for each feature, assess whether its source value has an effect on the other features' destination values. We look at this for all pairs of features, with all combinations of global/sentence-relative values and types of binning (fixed width/quantiles; only fixed-width bins are enabled in cuts above, as the quantile option is commented out). So it's a lot of answers.

Three stars means $p < .001$, two $p < .01$, one $p < .05$, and ns. means non-significant.
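
The test behind each line of output can be sketched on synthetic data as follows. The data and the `binned_anova` helper are hypothetical; the pattern (bin the source values, then run a one-way ANOVA on the destination values across bins with `scipy.stats.f_oneway`) follows the cells below.

```python
import numpy as np
import pandas as pd
from scipy import stats


def star_level(p):
    if p < .001:
        return '***'
    elif p < .01:
        return ' **'
    elif p < .05:
        return '  *'
    else:
        return 'ns.'


def binned_anova(source, destination, bin_count=4):
    """p-value of a one-way ANOVA of destination across source bins."""
    bins = pd.cut(source, bin_count, labels=False)
    groups = [destination[bins == i].dropna() for i in range(bin_count)]
    _, p = stats.f_oneway(*groups)
    return p


# Synthetic data: one destination depends on the source, one does not.
rng = np.random.default_rng(0)
source = pd.Series(rng.uniform(0, 1, 400))
dependent = source + rng.normal(scale=0.2, size=400)
independent = pd.Series(rng.normal(size=400))

p_dependent = binned_anova(source, dependent)
p_independent = binned_anova(source, independent)
print(star_level(p_dependent), star_level(p_independent))
```

Note that with 6 x 6 feature pairs and four global/relative combinations, 144 tests are run per table, so a handful of single-star results are expected by chance alone.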



In [41]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
   ** global -> sentence-relative
    * sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
   ** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  *** global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
   ** global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
    * global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
   ** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
    * global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
   ** sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
   ** global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

Now for each feature, look at its interaction with the other features' variation (i.e. destination - source). Same drill, same combinations.



In [42]:

    
for feature1 in PAPER_FEATURES:
    print('-' * len(feature1))
    print(feature1)
    print('-' * len(feature1))

    for feature2 in PAPER_FEATURES:
        print()
        print('-> {}'.format(feature2))
        for (cut_label, cut), (rel1_label, rel1) in product(cuts, rels):
            for (rel2_label, rel2) in rels:
                source = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel1)[feature1]
                destination = variations.pivot(
                    index='cluster_id', columns='feature',
                    values='destination' + rel2)[feature2]\
                    - variations.pivot(
                    index='cluster_id', columns='feature',
                    values='source' + rel2)[feature2]

                # Compute binning.
                for bin_count in range(BIN_COUNT, 0, -1):
                    try:
                        source_bins = cut(source, bin_count, labels=False)
                        break
                    except ValueError:
                        pass

                _, p = stats.f_oneway(*[destination[source_bins == i]
                                        .dropna()
                                        for i in range(bin_count)])
                print('  {} {} -> {}'
                      .format(star_level(p), rel1_label, rel2_label))
    print()









    



---------
frequency
---------

-> frequency
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> aoa
   ** global -> global
    * global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
   ** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

---
aoa
---

-> frequency
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
   ** global -> global
   ** global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
  ns. sentence-relative -> sentence-relative

----------
clustering
----------

-> frequency
   ** global -> global
  *** global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> aoa
    * global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> letters_count
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
    * global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-------------
letters_count
-------------

-> frequency
   ** global -> global
    * global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
    * sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
    * sentence-relative -> global
    * sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

--------------
synonyms_count
--------------

-> frequency
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> aoa
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> synonyms_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> orthographic_density
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

--------------------
orthographic_density
--------------------

-> frequency
  *** global -> global
  *** global -> sentence-relative
   ** sentence-relative -> global
   ** sentence-relative -> sentence-relative

-> aoa
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> clustering
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> letters_count
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

-> synonyms_count
  ns. global -> global
  ns. global -> sentence-relative
  ns. sentence-relative -> global
  ns. sentence-relative -> sentence-relative

-> orthographic_density
  *** global -> global
  *** global -> sentence-relative
  *** sentence-relative -> global
  *** sentence-relative -> sentence-relative

OK, so this can go on for a long time, and I'm not going to look at higher-order interactions through this lens (that is, the interaction of pairs of features with another feature's destination values).

6 Regression



In [43]:

    
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures



In [44]:

    
rels = {False: ('global', ''),
        True: ('rel', '_rel')}

def regress(data, features, target,
            source_rel=False, dest_rel=False, interactions=False):
    if source_rel not in [True, False, 'both']:
        raise ValueError("source_rel must be True, False or 'both'")
    if not isinstance(dest_rel, bool):
        raise ValueError("dest_rel must be a bool")
    # Process source/destination relativeness arguments.
    if isinstance(source_rel, bool):
        source_rel = [source_rel]
    else:
        source_rel = [False, True]
    dest_rel_name, dest_rel = rels[dest_rel]
    
    features = tuple(sorted(features))
    feature_tuples = [('source' + rels[rel][1], feature)
                      for rel in source_rel
                      for feature in features]
    feature_names = [rels[rel][0] + '_' + feature
                     for rel in source_rel
                     for feature in features]
    
    # Get source and destination values.
    source = pd.pivot_table(
        data,
        values=['source' + rels[rel][1] for rel in source_rel],
        index=['cluster_id'],
        columns=['feature']
    )[feature_tuples].dropna()
    # Use the `data` argument rather than the global `variations` frame.
    destination = data[data.feature == target]\
        .pivot(index='cluster_id', columns='feature',
               values='destination' + dest_rel)\
        .loc[source.index][target].dropna()
    source = source.loc[destination.index].values
    destination = destination.values

    # If asked to, get polynomial features.
    if interactions:
        poly = PolynomialFeatures(degree=2, interaction_only=True)
        source = poly.fit_transform(source)
        regress_features = [' * '.join([feature_names[j]
                                        for j, p in enumerate(powers)
                                        if p > 0]) or 'intercept'
                            for powers in poly.powers_]
    else:
        regress_features = feature_names

    # Regress.
    linreg = linear_model.LinearRegression(fit_intercept=not interactions)
    linreg.fit(source, destination)

    # And print the score and coefficients.
    print('Regressing {} with {} measures, {} interactions'
          .format(dest_rel_name + ' ' + target, len(source),
                  'with' if interactions else 'no'))
    print('           ' + '^' * len(dest_rel_name + ' ' + target))
    print('R^2 = {}'
          .format(linreg.score(source, destination)))
    print()
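    coeffs = pd.Series(index=regress_features, data=linreg.coef_)
    if not interactions:
        # Series.append was removed in pandas 2.0; pd.concat works on
        # both older and newer versions.
        coeffs = pd.concat([
            pd.Series(index=['intercept'], data=[linreg.intercept_]),
            coeffs])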
    with pd.option_context('display.max_rows', 999):
        print(coeffs)
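
As a sanity check on the naming logic in regress, here is a minimal sketch on toy data. The random matrix X and the three-name feature list are made up for illustration and are not taken from the substitution data. It shows how each row of poly.powers_ maps to one term name, and that the all-zero row is the constant column, which is presumably why regress fits with fit_intercept=False when interactions are on.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data (assumption): 5 observations of 3 hypothetical source features.
feature_names = ['global_aoa', 'global_clustering', 'global_frequency']
rng = np.random.RandomState(0)
X = rng.normal(size=(5, 3))

# Degree-2 interaction terms only: no squared terms, but a bias column.
poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)

# Each row of powers_ holds the exponent of every input feature in one
# output column; the all-zero row is the constant (bias) column.
term_names = [' * '.join(feature_names[j]
                         for j, p in enumerate(powers) if p > 0)
              or 'intercept'
              for powers in poly.powers_]

for name, column in zip(term_names, X_poly.T):
    print('{:<40} first value = {:.3f}'.format(name, column[0]))
```

With 3 input features this gives 1 constant, 3 linear and 3 pairwise columns; with the 6 or 12 features used in the notebook the same rule gives 1 + 6 + 15 and 1 + 12 + 66 columns respectively.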



In [45]:

    
for target in PAPER_FEATURES:
    print('-' * 70)
    for source_rel, dest_rel in product([False, True, 'both'],
                                        [False, True]):
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel)
        print()
        regress(variations, PAPER_FEATURES, target, source_rel=source_rel,
                dest_rel=dest_rel, interactions=True)
        print()
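
The header printed by regress names the destination relativeness and the target, but not the source relativeness, so the same header appears several times per target in the output below. The following sketch lists the order in which the loop above calls regress for each target, which may help match each block of output to its arguments. The labels dict is a hypothetical helper introduced here, not part of the notebook.

```python
from itertools import product

# Hypothetical display labels for the source_rel / dest_rel arguments.
labels = {False: 'global', True: 'rel', 'both': 'global + rel'}

# Same argument grid as the loop above: for every (source_rel, dest_rel)
# pair, one call without interactions followed by one call with them.
order = [(labels[source_rel], labels[dest_rel], interactions)
         for source_rel, dest_rel in product([False, True, 'both'],
                                             [False, True])
         for interactions in (False, True)]

for source, dest, interactions in order:
    print('source: {:<12}  destination: {:<6}  interactions: {}'
          .format(source, dest, 'with' if interactions else 'no'))
```

Each target therefore produces twelve output blocks: four with global source features, four with sentence-relative source features, and four with both sets together.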

----------------------------------------------------------------------
Regressing global frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.038213789495747275

intercept                      9.577115
global_aoa                    -0.077307
global_clustering              0.187871
global_frequency               0.185746
global_letters_count          -0.044959
global_orthographic_density   -0.360617
global_synonyms_count         -0.117370
dtype: float64

Regressing global frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.13991021775189127

intercept                                              11.716615
global_aoa                                             -1.811731
global_clustering                                      -0.150890
global_frequency                                        0.385409
global_letters_count                                   -0.150196
global_orthographic_density                             2.953439
global_synonyms_count                                   3.326686
global_aoa * global_clustering                         -0.067353
global_aoa * global_frequency                           0.070076
global_aoa * global_letters_count                       0.087586
global_aoa * global_orthographic_density                0.105876
global_aoa * global_synonyms_count                      0.056556
global_clustering * global_frequency                    0.017878
global_clustering * global_letters_count                0.011048
global_clustering * global_orthographic_density         0.446076
global_clustering * global_synonyms_count               0.232409
global_frequency * global_letters_count                -0.050170
global_frequency * global_orthographic_density         -0.156499
global_frequency * global_synonyms_count               -0.122545
global_letters_count * global_orthographic_density      0.020229
global_letters_count * global_synonyms_count           -0.151433
global_orthographic_density * global_synonyms_count    -0.322884
dtype: float64

Regressing rel frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.047135618464312135

intercept                     -1.567747
global_aoa                    -0.079660
global_clustering              0.284910
global_frequency               0.210346
global_letters_count          -0.046491
global_orthographic_density   -0.511513
global_synonyms_count          0.030410
dtype: float64

Regressing rel frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.10926525568570611

intercept                                              10.082005
global_aoa                                             -1.522370
global_clustering                                       2.106895
global_frequency                                       -0.371892
global_letters_count                                   -0.176961
global_orthographic_density                             2.189264
global_synonyms_count                                   0.396026
global_aoa * global_clustering                         -0.039589
global_aoa * global_frequency                           0.047941
global_aoa * global_letters_count                       0.090598
global_aoa * global_orthographic_density                0.118778
global_aoa * global_synonyms_count                      0.125074
global_clustering * global_frequency                   -0.185757
global_clustering * global_letters_count               -0.051722
global_clustering * global_orthographic_density         0.436379
global_clustering * global_synonyms_count              -0.118163
global_frequency * global_letters_count                -0.091950
global_frequency * global_orthographic_density         -0.138912
global_frequency * global_synonyms_count               -0.111365
global_letters_count * global_orthographic_density      0.062822
global_letters_count * global_synonyms_count           -0.118945
global_orthographic_density * global_synonyms_count    -0.107615
dtype: float64

Regressing global frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.03626779050783702

intercept                   9.099227
rel_aoa                    -0.014769
rel_clustering              0.092637
rel_frequency               0.151174
rel_letters_count          -0.047820
rel_orthographic_density   -0.292930
rel_synonyms_count         -0.217825
dtype: float64

Regressing global frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.09347706494944896

intercept                                        9.223459
rel_aoa                                          0.251327
rel_clustering                                   0.047579
rel_frequency                                    0.184406
rel_letters_count                               -0.123564
rel_orthographic_density                        -0.524559
rel_synonyms_count                              -0.114466
rel_aoa * rel_clustering                        -0.117142
rel_aoa * rel_frequency                          0.044753
rel_aoa * rel_letters_count                     -0.004389
rel_aoa * rel_orthographic_density               0.042488
rel_aoa * rel_synonyms_count                     0.024793
rel_clustering * rel_frequency                  -0.096677
rel_clustering * rel_letters_count               0.100907
rel_clustering * rel_orthographic_density        0.395070
rel_clustering * rel_synonyms_count              0.295009
rel_frequency * rel_letters_count               -0.027568
rel_frequency * rel_orthographic_density        -0.042171
rel_frequency * rel_synonyms_count               0.055386
rel_letters_count * rel_orthographic_density     0.027861
rel_letters_count * rel_synonyms_count          -0.237457
rel_orthographic_density * rel_synonyms_count   -0.514468
dtype: float64

Regressing rel frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.24234693466770554

intercept                  -1.643758
rel_aoa                    -0.038536
rel_clustering              0.342004
rel_frequency               0.545893
rel_letters_count          -0.181238
rel_orthographic_density   -0.570674
rel_synonyms_count         -0.113196
dtype: float64

Regressing rel frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.2963352102909719

intercept                                       -1.560198
rel_aoa                                          0.004633
rel_clustering                                  -0.060319
rel_frequency                                    0.624613
rel_letters_count                               -0.218661
rel_orthographic_density                        -0.832450
rel_synonyms_count                              -0.045249
rel_aoa * rel_clustering                        -0.154114
rel_aoa * rel_frequency                         -0.051005
rel_aoa * rel_letters_count                      0.055112
rel_aoa * rel_orthographic_density               0.243870
rel_aoa * rel_synonyms_count                     0.150835
rel_clustering * rel_frequency                  -0.237046
rel_clustering * rel_letters_count               0.109733
rel_clustering * rel_orthographic_density        0.374604
rel_clustering * rel_synonyms_count              0.268032
rel_frequency * rel_letters_count               -0.004873
rel_frequency * rel_orthographic_density        -0.028258
rel_frequency * rel_synonyms_count               0.023468
rel_letters_count * rel_orthographic_density     0.017949
rel_letters_count * rel_synonyms_count          -0.202173
rel_orthographic_density * rel_synonyms_count   -0.205424
dtype: float64

Regressing global frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.04677123618377166

intercept                      8.934124
global_aoa                    -0.115860
global_clustering              0.078780
global_frequency               0.123778
global_letters_count           0.049927
global_orthographic_density   -0.136726
global_synonyms_count          0.228221
rel_aoa                        0.060307
rel_clustering                 0.115252
rel_frequency                  0.075125
rel_letters_count             -0.112062
rel_orthographic_density      -0.254015
rel_synonyms_count            -0.405102
dtype: float64

Regressing global frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^
R^2 = 0.3436282639175905

intercept                                                -100.702659
global_aoa                                                  0.947260
global_clustering                                         -24.642056
global_frequency                                            3.671442
global_letters_count                                       -2.144076
global_orthographic_density                                 9.953214
global_synonyms_count                                      33.982206
rel_aoa                                                    -3.899108
rel_clustering                                             17.185552
rel_frequency                                              -9.045751
rel_letters_count                                          -0.100550
rel_orthographic_density                                   -4.042218
rel_synonyms_count                                        -12.136458
global_aoa * global_clustering                              0.901479
global_aoa * global_frequency                               0.224906
global_aoa * global_letters_count                           0.262012
global_aoa * global_orthographic_density                    0.563250
global_aoa * global_synonyms_count                         -0.587936
global_aoa * rel_aoa                                       -0.026025
global_aoa * rel_clustering                                -0.759124
global_aoa * rel_frequency                                 -0.018691
global_aoa * rel_letters_count                             -0.125322
global_aoa * rel_orthographic_density                      -0.468074
global_aoa * rel_synonyms_count                             0.537016
global_clustering * global_frequency                        0.990476
global_clustering * global_letters_count                    0.059937
global_clustering * global_orthographic_density             2.741408
global_clustering * global_synonyms_count                   3.092174
global_clustering * rel_aoa                                -1.117108
global_clustering * rel_clustering                         -0.114766
global_clustering * rel_frequency                          -1.382390
global_clustering * rel_letters_count                      -0.137551
global_clustering * rel_orthographic_density               -2.276524
global_clustering * rel_synonyms_count                     -2.555243
global_frequency * global_letters_count                     0.114535
global_frequency * global_orthographic_density              0.349922
global_frequency * global_synonyms_count                   -1.425752
global_frequency * rel_aoa                                 -0.123755
global_frequency * rel_clustering                          -0.655684
global_frequency * rel_frequency                            0.021040
global_frequency * rel_letters_count                       -0.119379
global_frequency * rel_orthographic_density                -0.550172
global_frequency * rel_synonyms_count                       0.337264
global_letters_count * global_orthographic_density         -0.442409
global_letters_count * global_synonyms_count                0.276846
global_letters_count * rel_aoa                             -0.170065
global_letters_count * rel_clustering                      -0.019811
global_letters_count * rel_frequency                        0.111668
global_letters_count * rel_letters_count                    0.043590
global_letters_count * rel_orthographic_density             0.025496
global_letters_count * rel_synonyms_count                  -1.570474
global_orthographic_density * global_synonyms_count         1.661431
global_orthographic_density * rel_aoa                      -0.401070
global_orthographic_density * rel_clustering               -1.783743
global_orthographic_density * rel_frequency                 0.020822
global_orthographic_density * rel_letters_count             0.429050
global_orthographic_density * rel_orthographic_density     -0.640283
global_orthographic_density * rel_synonyms_count           -1.860739
global_synonyms_count * rel_aoa                             0.601967
global_synonyms_count * rel_clustering                     -2.324828
global_synonyms_count * rel_frequency                       1.472847
global_synonyms_count * rel_letters_count                   0.555138
global_synonyms_count * rel_orthographic_density           -0.602500
global_synonyms_count * rel_synonyms_count                 -0.293776
rel_aoa * rel_clustering                                    0.871294
rel_aoa * rel_frequency                                     0.013421
rel_aoa * rel_letters_count                                 0.067656
rel_aoa * rel_orthographic_density                          0.247166
rel_aoa * rel_synonyms_count                               -0.450392
rel_clustering * rel_frequency                              0.965702
rel_clustering * rel_letters_count                          0.164307
rel_clustering * rel_orthographic_density                   1.721551
rel_clustering * rel_synonyms_count                         1.684745
rel_frequency * rel_letters_count                          -0.097979
rel_frequency * rel_orthographic_density                    0.154813
rel_frequency * rel_synonyms_count                         -0.493837
rel_letters_count * rel_orthographic_density               -0.168589
rel_letters_count * rel_synonyms_count                      0.408515
rel_orthographic_density * rel_synonyms_count               0.246430
dtype: float64

Regressing rel frequency with 233 measures, no interactions
           ^^^^^^^^^^^^^
R^2 = 0.36075878216056356

intercept                      9.209714
global_aoa                    -0.114263
global_clustering              0.241489
global_frequency              -0.790811
global_letters_count           0.042553
global_orthographic_density   -0.156046
global_synonyms_count          0.070962
rel_aoa                        0.037936
rel_clustering                 0.023063
rel_frequency                  1.037978
rel_letters_count             -0.107182
rel_orthographic_density      -0.189431
rel_synonyms_count            -0.218510
dtype: float64

Regressing rel frequency with 233 measures, with interactions
           ^^^^^^^^^^^^^
R^2 = 0.5548099904628678

intercept                                                -95.138258
global_aoa                                                 1.472853
global_clustering                                        -21.905605
global_frequency                                           2.296258
global_letters_count                                      -1.167573
global_orthographic_density                               11.513460
global_synonyms_count                                     32.422612
rel_aoa                                                   -4.329951
rel_clustering                                            15.488644
rel_frequency                                             -7.629155
rel_letters_count                                         -1.231726
rel_orthographic_density                                  -5.468321
rel_synonyms_count                                       -13.675234
global_aoa * global_clustering                             0.839617
global_aoa * global_frequency                              0.194528
global_aoa * global_letters_count                          0.191714
global_aoa * global_orthographic_density                   0.440320
global_aoa * global_synonyms_count                        -0.545613
global_aoa * rel_aoa                                      -0.023488
global_aoa * rel_clustering                               -0.670830
global_aoa * rel_frequency                                 0.022289
global_aoa * rel_letters_count                            -0.028952
global_aoa * rel_orthographic_density                     -0.291078
global_aoa * rel_synonyms_count                            0.521995
global_clustering * global_frequency                       0.851717
global_clustering * global_letters_count                   0.192169
global_clustering * global_orthographic_density            2.382367
global_clustering * global_synonyms_count                  2.539984
global_clustering * rel_aoa                               -1.134299
global_clustering * rel_clustering                        -0.170639
global_clustering * rel_frequency                         -1.230594
global_clustering * rel_letters_count                     -0.206286
global_clustering * rel_orthographic_density              -1.846832
global_clustering * rel_synonyms_count                    -2.334466
global_frequency * global_letters_count                    0.184589
global_frequency * global_orthographic_density             0.153759
global_frequency * global_synonyms_count                  -1.313723
global_frequency * rel_aoa                                -0.094795
global_frequency * rel_clustering                         -0.586442
global_frequency * rel_frequency                           0.048766
global_frequency * rel_letters_count                      -0.175722
global_frequency * rel_orthographic_density               -0.367220
global_frequency * rel_synonyms_count                      0.290720
global_letters_count * global_orthographic_density        -0.571284
global_letters_count * global_synonyms_count              -0.281458
global_letters_count * rel_aoa                            -0.172452
global_letters_count * rel_clustering                     -0.226923
global_letters_count * rel_frequency                       0.035757
global_letters_count * rel_letters_count                   0.066855
global_letters_count * rel_orthographic_density            0.176686
global_letters_count * rel_synonyms_count                 -1.004968
global_orthographic_density * global_synonyms_count        1.076348
global_orthographic_density * rel_aoa                     -0.410899
global_orthographic_density * rel_clustering              -1.558789
global_orthographic_density * rel_frequency                0.176113
global_orthographic_density * rel_letters_count            0.675017
global_orthographic_density * rel_orthographic_density    -0.463322
global_orthographic_density * rel_synonyms_count          -1.291890
global_synonyms_count * rel_aoa                            0.531472
global_synonyms_count * rel_clustering                    -1.821583
global_synonyms_count * rel_frequency                      1.338483
global_synonyms_count * rel_letters_count                  0.918340
global_synonyms_count * rel_orthographic_density          -0.394001
global_synonyms_count * rel_synonyms_count                -0.253752
rel_aoa * rel_clustering                                   0.852337
rel_aoa * rel_frequency                                   -0.036780
rel_aoa * rel_letters_count                                0.022875
rel_aoa * rel_orthographic_density                         0.212762
rel_aoa * rel_synonyms_count                              -0.422699
rel_clustering * rel_frequency                             0.876238
rel_clustering * rel_letters_count                         0.278176
rel_clustering * rel_orthographic_density                  1.401113
rel_clustering * rel_synonyms_count                        1.501329
rel_frequency * rel_letters_count                         -0.031205
rel_frequency * rel_orthographic_density                   0.042632
rel_frequency * rel_synonyms_count                        -0.441269
rel_letters_count * rel_orthographic_density              -0.326118
rel_letters_count * rel_synonyms_count                     0.013757
rel_orthographic_density * rel_synonyms_count             -0.070137
dtype: float64

----------------------------------------------------------------------
Regressing global aoa with 215 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.09195694494825501

intercept                      5.514783
global_aoa                     0.325997
global_clustering              0.126006
global_frequency              -0.065139
global_letters_count           0.070149
global_orthographic_density    0.135023
global_synonyms_count          0.302076
dtype: float64

Regressing global aoa with 215 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.2399658095033953

intercept                                              3.891470
global_aoa                                             2.888039
global_clustering                                      1.112920
global_frequency                                       0.410599
global_letters_count                                  -0.375055
global_orthographic_density                           -6.606796
global_synonyms_count                                 -7.819622
global_aoa * global_clustering                         0.331885
global_aoa * global_frequency                         -0.024288
global_aoa * global_letters_count                     -0.099423
global_aoa * global_orthographic_density               0.055987
global_aoa * global_synonyms_count                     0.176247
global_clustering * global_frequency                  -0.002403
global_clustering * global_letters_count              -0.308807
global_clustering * global_orthographic_density       -0.950878
global_clustering * global_synonyms_count             -0.678090
global_frequency * global_letters_count               -0.062149
global_frequency * global_orthographic_density         0.065291
global_frequency * global_synonyms_count               0.081617
global_letters_count * global_orthographic_density    -0.049059
global_letters_count * global_synonyms_count           0.122225
global_orthographic_density * global_synonyms_count    1.221635
dtype: float64

Regressing rel aoa with 215 measures, no interactions
           ^^^^^^^
R^2 = 0.059510699355043606

intercept                      0.798262
global_aoa                     0.098020
global_clustering              0.024551
global_frequency              -0.247884
global_letters_count           0.194195
global_orthographic_density    0.349400
global_synonyms_count          0.228543
dtype: float64

Regressing rel aoa with 215 measures, with interactions
           ^^^^^^^
R^2 = 0.22494215404682857

intercept                                              22.068362
global_aoa                                              2.091234
global_clustering                                       3.336210
global_frequency                                       -0.847802
global_letters_count                                   -2.807090
global_orthographic_density                            -8.967741
global_synonyms_count                                  -4.054359
global_aoa * global_clustering                          0.207923
global_aoa * global_frequency                          -0.107310
global_aoa * global_letters_count                      -0.030821
global_aoa * global_orthographic_density                0.239378
global_aoa * global_synonyms_count                      0.045950
global_clustering * global_frequency                   -0.093381
global_clustering * global_letters_count               -0.422032
global_clustering * global_orthographic_density        -0.951871
global_clustering * global_synonyms_count              -0.528906
global_frequency * global_letters_count                 0.091244
global_frequency * global_orthographic_density          0.244306
global_frequency * global_synonyms_count               -0.105866
global_letters_count * global_orthographic_density     -0.104662
global_letters_count * global_synonyms_count            0.104700
global_orthographic_density * global_synonyms_count     0.881347
dtype: float64

Regressing global aoa with 215 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.0437671147034403

intercept                   6.906449
rel_aoa                     0.169161
rel_clustering              0.434464
rel_frequency               0.093665
rel_letters_count          -0.062993
rel_orthographic_density   -0.307295
rel_synonyms_count          0.275472
dtype: float64

Regressing global aoa with 215 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.2152803776976637

intercept                                        6.991109
rel_aoa                                          0.051758
rel_clustering                                   0.640141
rel_frequency                                    0.175207
rel_letters_count                               -0.165152
rel_orthographic_density                         0.154760
rel_synonyms_count                               1.458176
rel_aoa * rel_clustering                         0.290433
rel_aoa * rel_frequency                         -0.112372
rel_aoa * rel_letters_count                      0.033881
rel_aoa * rel_orthographic_density               0.478712
rel_aoa * rel_synonyms_count                     0.249934
rel_clustering * rel_frequency                   0.211770
rel_clustering * rel_letters_count              -0.149706
rel_clustering * rel_orthographic_density       -0.478508
rel_clustering * rel_synonyms_count             -0.714823
rel_frequency * rel_letters_count               -0.036810
rel_frequency * rel_orthographic_density         0.099841
rel_frequency * rel_synonyms_count              -0.052887
rel_letters_count * rel_orthographic_density    -0.142655
rel_letters_count * rel_synonyms_count          -0.085599
rel_orthographic_density * rel_synonyms_count    1.114452
dtype: float64

Regressing rel aoa with 215 measures, no interactions
           ^^^^^^^
R^2 = 0.22386513498492133

intercept                   0.372557
rel_aoa                     0.542963
rel_clustering              0.104060
rel_frequency              -0.084591
rel_letters_count           0.072345
rel_orthographic_density    0.329323
rel_synonyms_count          0.085256
dtype: float64

Regressing rel aoa with 215 measures, with interactions
           ^^^^^^^
R^2 = 0.3418891918886606

intercept                                        0.707327
rel_aoa                                          0.725525
rel_clustering                                   0.300033
rel_frequency                                    0.059170
rel_letters_count                                0.092593
rel_orthographic_density                         1.251717
rel_synonyms_count                               0.712301
rel_aoa * rel_clustering                         0.143113
rel_aoa * rel_frequency                         -0.039628
rel_aoa * rel_letters_count                     -0.038979
rel_aoa * rel_orthographic_density               0.316554
rel_aoa * rel_synonyms_count                    -0.007745
rel_clustering * rel_frequency                   0.093478
rel_clustering * rel_letters_count              -0.278107
rel_clustering * rel_orthographic_density       -0.581539
rel_clustering * rel_synonyms_count             -0.554957
rel_frequency * rel_letters_count                0.006219
rel_frequency * rel_orthographic_density         0.246101
rel_frequency * rel_synonyms_count              -0.099279
rel_letters_count * rel_orthographic_density    -0.160288
rel_letters_count * rel_synonyms_count          -0.008601
rel_orthographic_density * rel_synonyms_count    0.621535
dtype: float64

Regressing global aoa with 215 measures, no interactions
           ^^^^^^^^^^
R^2 = 0.1154730697015548

intercept                      2.752350
global_aoa                     0.368341
global_clustering             -0.355973
global_frequency              -0.225268
global_letters_count           0.262934
global_orthographic_density    0.382507
global_synonyms_count          0.656712
rel_aoa                       -0.080744
rel_clustering                 0.517544
rel_frequency                  0.146008
rel_letters_count             -0.199769
rel_orthographic_density      -0.232286
rel_synonyms_count            -0.416296
dtype: float64

Regressing global aoa with 215 measures, with interactions
           ^^^^^^^^^^
R^2 = 0.5090129924821236

intercept                                                 170.522259
global_aoa                                                 -2.697972
global_clustering                                          34.722854
global_frequency                                           -3.598628
global_letters_count                                       -1.027431
global_orthographic_density                               -23.339120
global_synonyms_count                                     -33.935382
rel_aoa                                                     4.516616
rel_clustering                                            -11.753131
rel_frequency                                              11.026145
rel_letters_count                                           4.975181
rel_orthographic_density                                   18.811288
rel_synonyms_count                                         12.804160
global_aoa * global_clustering                             -0.640584
global_aoa * global_frequency                               0.006650
global_aoa * global_letters_count                          -0.199635
global_aoa * global_orthographic_density                   -0.218130
global_aoa * global_synonyms_count                          1.074980
global_aoa * rel_aoa                                       -0.016975
global_aoa * rel_clustering                                 0.527134
global_aoa * rel_frequency                                 -0.047482
global_aoa * rel_letters_count                             -0.031155
global_aoa * rel_orthographic_density                      -0.213498
global_aoa * rel_synonyms_count                            -1.085241
global_clustering * global_frequency                       -1.247455
global_clustering * global_letters_count                   -0.912052
global_clustering * global_orthographic_density            -5.558357
global_clustering * global_synonyms_count                  -2.552799
global_clustering * rel_aoa                                 0.912559
global_clustering * rel_clustering                          0.535276
global_clustering * rel_frequency                           2.040187
global_clustering * rel_letters_count                       0.782799
global_clustering * rel_orthographic_density                3.746622
global_clustering * rel_synonyms_count                      2.009425
global_frequency * global_letters_count                    -0.422034
global_frequency * global_orthographic_density             -1.223424
global_frequency * global_synonyms_count                    0.647551
global_frequency * rel_aoa                                 -0.010224
global_frequency * rel_clustering                           0.391414
global_frequency * rel_frequency                           -0.056751
global_frequency * rel_letters_count                        0.154143
global_frequency * rel_orthographic_density                 0.597120
global_frequency * rel_synonyms_count                       0.241359
global_letters_count * global_orthographic_density          1.099578
global_letters_count * global_synonyms_count                0.184337
global_letters_count * rel_aoa                              0.072558
global_letters_count * rel_clustering                       0.034297
global_letters_count * rel_frequency                        0.161701
global_letters_count * rel_letters_count                    0.107225
global_letters_count * rel_orthographic_density            -0.331429
global_letters_count * rel_synonyms_count                   1.667910
global_orthographic_density * global_synonyms_count         2.810791
global_orthographic_density * rel_aoa                       0.511259
global_orthographic_density * rel_clustering                3.109597
global_orthographic_density * rel_frequency                 0.830200
global_orthographic_density * rel_letters_count            -1.052525
global_orthographic_density * rel_orthographic_density      0.606237
global_orthographic_density * rel_synonyms_count           -2.354413
global_synonyms_count * rel_aoa                            -0.690232
global_synonyms_count * rel_clustering                      2.320820
global_synonyms_count * rel_frequency                      -0.864780
global_synonyms_count * rel_letters_count                  -1.683893
global_synonyms_count * rel_orthographic_density           -3.439689
global_synonyms_count * rel_synonyms_count                  0.449484
rel_aoa * rel_clustering                                   -0.519670
rel_aoa * rel_frequency                                    -0.066583
rel_aoa * rel_letters_count                                 0.045432
rel_aoa * rel_orthographic_density                          0.126540
rel_aoa * rel_synonyms_count                                0.725118
rel_clustering * rel_frequency                             -0.903110
rel_clustering * rel_letters_count                         -0.269963
rel_clustering * rel_orthographic_density                  -2.109930
rel_clustering * rel_synonyms_count                        -2.047286
rel_frequency * rel_letters_count                           0.051149
rel_frequency * rel_orthographic_density                   -0.181678
rel_frequency * rel_synonyms_count                          0.098523
rel_letters_count * rel_orthographic_density                0.689624
rel_letters_count * rel_synonyms_count                      0.159877
rel_orthographic_density * rel_synonyms_count               4.035883
dtype: float64

Regressing rel aoa with 215 measures, no interactions
           ^^^^^^^
R^2 = 0.29018433476345973

intercept                      2.807258
global_aoa                    -0.465168
global_clustering             -0.298053
global_frequency              -0.249464
global_letters_count           0.170835
global_orthographic_density    0.169267
global_synonyms_count          0.659981
rel_aoa                        0.869133
rel_clustering                 0.505800
rel_frequency                  0.131627
rel_letters_count             -0.045734
rel_orthographic_density      -0.002428
rel_synonyms_count            -0.472575
dtype: float64

Regressing rel aoa with 215 measures, with interactions
           ^^^^^^^
R^2 = 0.6077801061726691

intercept                                                 141.897507
global_aoa                                                 -3.843411
global_clustering                                          26.192868
global_frequency                                           -3.671101
global_letters_count                                       -0.607447
global_orthographic_density                               -22.156532
global_synonyms_count                                     -37.278981
rel_aoa                                                     4.385761
rel_clustering                                             -3.775346
rel_frequency                                               8.337601
rel_letters_count                                           4.814136
rel_orthographic_density                                   19.811854
rel_synonyms_count                                         20.255589
global_aoa * global_clustering                             -0.550463
global_aoa * global_frequency                               0.031259
global_aoa * global_letters_count                          -0.158271
global_aoa * global_orthographic_density                   -0.113144
global_aoa * global_synonyms_count                          1.365899
global_aoa * rel_aoa                                       -0.046858
global_aoa * rel_clustering                                 0.468836
global_aoa * rel_frequency                                 -0.109034
global_aoa * rel_letters_count                             -0.086420
global_aoa * rel_orthographic_density                      -0.349822
global_aoa * rel_synonyms_count                            -1.390349
global_clustering * global_frequency                       -0.965809
global_clustering * global_letters_count                   -0.591250
global_clustering * global_orthographic_density            -4.084309
global_clustering * global_synonyms_count                  -3.913693
global_clustering * rel_aoa                                 0.461569
global_clustering * rel_clustering                          0.591085
global_clustering * rel_frequency                           1.335417
global_clustering * rel_letters_count                       0.620898
global_clustering * rel_orthographic_density                2.787445
global_clustering * rel_synonyms_count                      2.800492
global_frequency * global_letters_count                    -0.306572
global_frequency * global_orthographic_density             -0.579611
global_frequency * global_synonyms_count                    0.280219
global_frequency * rel_aoa                                 -0.005498
global_frequency * rel_clustering                           0.119167
global_frequency * rel_frequency                           -0.003126
global_frequency * rel_letters_count                        0.082134
global_frequency * rel_orthographic_density                 0.108589
global_frequency * rel_synonyms_count                       0.123667
global_letters_count * global_orthographic_density          0.995860
global_letters_count * global_synonyms_count               -0.055688
global_letters_count * rel_aoa                             -0.053719
global_letters_count * rel_clustering                      -0.305360
global_letters_count * rel_frequency                        0.067345
global_letters_count * rel_letters_count                    0.116894
global_letters_count * rel_orthographic_density            -0.405151
global_letters_count * rel_synonyms_count                   1.327557
global_orthographic_density * global_synonyms_count         1.904152
global_orthographic_density * rel_aoa                       0.111173
global_orthographic_density * rel_clustering                1.982611
global_orthographic_density * rel_frequency                 0.171364
global_orthographic_density * rel_letters_count            -0.904107
global_orthographic_density * rel_orthographic_density      0.299786
global_orthographic_density * rel_synonyms_count           -1.489391
global_synonyms_count * rel_aoa                            -0.617229
global_synonyms_count * rel_clustering                      2.565864
global_synonyms_count * rel_frequency                      -0.579511
global_synonyms_count * rel_letters_count                  -1.133535
global_synonyms_count * rel_orthographic_density           -2.231229
global_synonyms_count * rel_synonyms_count                  0.233436
rel_aoa * rel_clustering                                   -0.193740
rel_aoa * rel_frequency                                    -0.001720
rel_aoa * rel_letters_count                                 0.119024
rel_aoa * rel_orthographic_density                          0.324669
rel_aoa * rel_synonyms_count                                0.610592
rel_clustering * rel_frequency                             -0.336636
rel_clustering * rel_letters_count                          0.063893
rel_clustering * rel_orthographic_density                  -1.088784
rel_clustering * rel_synonyms_count                        -1.939061
rel_frequency * rel_letters_count                           0.088074
rel_frequency * rel_orthographic_density                    0.310935
rel_frequency * rel_synonyms_count                          0.097970
rel_letters_count * rel_orthographic_density                0.722367
rel_letters_count * rel_synonyms_count                      0.041723
rel_orthographic_density * rel_synonyms_count               2.511515
dtype: float64

----------------------------------------------------------------------
Regressing global clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.11948646739044377

intercept                     -3.327766
global_aoa                     0.004258
global_clustering              0.163689
global_frequency              -0.126863
global_letters_count          -0.052613
global_orthographic_density    0.017045
global_synonyms_count         -0.088625
dtype: float64

Regressing global clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.26562252749303983

intercept                                             -2.904848
global_aoa                                             0.769873
global_clustering                                      0.712451
global_frequency                                      -0.372737
global_letters_count                                  -0.063387
global_orthographic_density                            0.299173
global_synonyms_count                                 -2.079098
global_aoa * global_clustering                         0.055659
global_aoa * global_frequency                         -0.062514
global_aoa * global_letters_count                      0.010855
global_aoa * global_orthographic_density               0.022659
global_aoa * global_synonyms_count                     0.027824
global_clustering * global_frequency                  -0.110368
global_clustering * global_letters_count               0.039302
global_clustering * global_orthographic_density       -0.010618
global_clustering * global_synonyms_count             -0.355569
global_frequency * global_letters_count                0.018097
global_frequency * global_orthographic_density        -0.035662
global_frequency * global_synonyms_count              -0.040175
global_letters_count * global_orthographic_density    -0.050229
global_letters_count * global_synonyms_count           0.021703
global_orthographic_density * global_synonyms_count   -0.076635
dtype: float64

Regressing rel clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.1203726680892675

intercept                      2.227458
global_aoa                     0.021148
global_clustering              0.082446
global_frequency              -0.114610
global_letters_count          -0.069435
global_orthographic_density    0.040052
global_synonyms_count         -0.157160
dtype: float64

Regressing rel clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.2229321620994208

intercept                                              5.745799
global_aoa                                             0.449108
global_clustering                                      1.398138
global_frequency                                      -0.230429
global_letters_count                                  -0.172240
global_orthographic_density                            0.290476
global_synonyms_count                                 -2.626346
global_aoa * global_clustering                         0.002914
global_aoa * global_frequency                         -0.049000
global_aoa * global_letters_count                     -0.002010
global_aoa * global_orthographic_density               0.015217
global_aoa * global_synonyms_count                     0.023319
global_clustering * global_frequency                  -0.102491
global_clustering * global_letters_count              -0.030251
global_clustering * global_orthographic_density       -0.046357
global_clustering * global_synonyms_count             -0.435470
global_frequency * global_letters_count               -0.006902
global_frequency * global_orthographic_density        -0.059353
global_frequency * global_synonyms_count              -0.034005
global_letters_count * global_orthographic_density    -0.021510
global_letters_count * global_synonyms_count           0.021955
global_orthographic_density * global_synonyms_count   -0.093876
dtype: float64

Regressing global clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.05829752058959481

intercept                  -5.790940
rel_aoa                     0.025796
rel_clustering              0.142374
rel_frequency              -0.047332
rel_letters_count          -0.068389
rel_orthographic_density   -0.012623
rel_synonyms_count         -0.103258
dtype: float64

Regressing global clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.2069205875447667

intercept                                       -5.760668
rel_aoa                                         -0.026237
rel_clustering                                   0.039557
rel_frequency                                   -0.028737
rel_letters_count                               -0.054557
rel_orthographic_density                         0.125565
rel_synonyms_count                              -0.135304
rel_aoa * rel_clustering                         0.022962
rel_aoa * rel_frequency                         -0.045829
rel_aoa * rel_letters_count                     -0.003650
rel_aoa * rel_orthographic_density               0.090976
rel_aoa * rel_synonyms_count                     0.060459
rel_clustering * rel_frequency                  -0.035882
rel_clustering * rel_letters_count              -0.084341
rel_clustering * rel_orthographic_density       -0.190759
rel_clustering * rel_synonyms_count             -0.467522
rel_frequency * rel_letters_count                0.003051
rel_frequency * rel_orthographic_density        -0.000959
rel_frequency * rel_synonyms_count              -0.053023
rel_letters_count * rel_orthographic_density    -0.045716
rel_letters_count * rel_synonyms_count          -0.009943
rel_orthographic_density * rel_synonyms_count   -0.009705
dtype: float64

Regressing rel clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.1441429094221961

intercept                   0.333296
rel_aoa                     0.013433
rel_clustering              0.359954
rel_frequency              -0.035528
rel_letters_count          -0.054357
rel_orthographic_density    0.021915
rel_synonyms_count         -0.080836
dtype: float64

Regressing rel clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.25191402754693404

intercept                                        0.421046
rel_aoa                                          0.045230
rel_clustering                                   0.217918
rel_frequency                                    0.008814
rel_letters_count                               -0.035689
rel_orthographic_density                         0.118612
rel_synonyms_count                              -0.107233
rel_aoa * rel_clustering                        -0.016256
rel_aoa * rel_frequency                         -0.022943
rel_aoa * rel_letters_count                     -0.024521
rel_aoa * rel_orthographic_density               0.033274
rel_aoa * rel_synonyms_count                     0.064652
rel_clustering * rel_frequency                  -0.065097
rel_clustering * rel_letters_count              -0.057944
rel_clustering * rel_orthographic_density       -0.089425
rel_clustering * rel_synonyms_count             -0.401444
rel_frequency * rel_letters_count               -0.002225
rel_frequency * rel_orthographic_density        -0.002900
rel_frequency * rel_synonyms_count              -0.036021
rel_letters_count * rel_orthographic_density    -0.029631
rel_letters_count * rel_synonyms_count           0.030575
rel_orthographic_density * rel_synonyms_count    0.070341
dtype: float64

Regressing global clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.16668535703053633

intercept                     -0.154171
global_aoa                    -0.023693
global_clustering              0.321320
global_frequency              -0.284776
global_letters_count          -0.126580
global_orthographic_density    0.087279
global_synonyms_count         -0.034014
rel_aoa                        0.044866
rel_clustering                -0.176488
rel_frequency                  0.171930
rel_letters_count              0.066709
rel_orthographic_density      -0.064241
rel_synonyms_count            -0.110451
dtype: float64

Regressing global clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.5134184710314191

intercept                                                 8.569284
global_aoa                                                0.288345
global_clustering                                         2.691075
global_frequency                                         -3.049812
global_letters_count                                      2.422365
global_orthographic_density                               8.169910
global_synonyms_count                                    -2.010792
rel_aoa                                                   0.283338
rel_clustering                                           -5.614520
rel_frequency                                             3.034223
rel_letters_count                                        -0.150873
rel_orthographic_density                                 -2.050063
rel_synonyms_count                                       -1.074238
global_aoa * global_clustering                           -0.128353
global_aoa * global_frequency                            -0.051852
global_aoa * global_letters_count                        -0.021124
global_aoa * global_orthographic_density                 -0.367287
global_aoa * global_synonyms_count                       -0.255940
global_aoa * rel_aoa                                     -0.024759
global_aoa * rel_clustering                               0.319059
global_aoa * rel_frequency                               -0.059104
global_aoa * rel_letters_count                            0.058611
global_aoa * rel_orthographic_density                     0.301389
global_aoa * rel_synonyms_count                           0.445837
global_clustering * global_frequency                     -0.467935
global_clustering * global_letters_count                  0.528076
global_clustering * global_orthographic_density           0.875320
global_clustering * global_synonyms_count                -1.459795
global_clustering * rel_aoa                               0.282307
global_clustering * rel_clustering                       -0.172484
global_clustering * rel_frequency                         0.435351
global_clustering * rel_letters_count                    -0.053595
global_clustering * rel_orthographic_density             -0.073343
global_clustering * rel_synonyms_count                    1.715966
global_frequency * global_letters_count                   0.088901
global_frequency * global_orthographic_density           -0.018424
global_frequency * global_synonyms_count                  0.090536
global_frequency * rel_aoa                                0.060498
global_frequency * rel_clustering                         0.356864
global_frequency * rel_frequency                         -0.009353
global_frequency * rel_letters_count                     -0.078486
global_frequency * rel_orthographic_density               0.011017
global_frequency * rel_synonyms_count                     0.122503
global_letters_count * global_orthographic_density        0.035167
global_letters_count * global_synonyms_count             -0.676605
global_letters_count * rel_aoa                            0.136636
global_letters_count * rel_clustering                    -0.262305
global_letters_count * rel_frequency                      0.053395
global_letters_count * rel_letters_count                  0.016018
global_letters_count * rel_orthographic_density          -0.134643
global_letters_count * rel_synonyms_count                 0.847585
global_orthographic_density * global_synonyms_count      -2.261486
global_orthographic_density * rel_aoa                     0.254520
global_orthographic_density * rel_clustering             -0.372257
global_orthographic_density * rel_frequency               0.033002
global_orthographic_density * rel_letters_count           0.071277
global_orthographic_density * rel_orthographic_density   -0.143946
global_orthographic_density * rel_synonyms_count          2.460770
global_synonyms_count * rel_aoa                           0.036357
global_synonyms_count * rel_clustering                    0.971276
global_synonyms_count * rel_frequency                    -0.387661
global_synonyms_count * rel_letters_count                 0.216100
global_synonyms_count * rel_orthographic_density          0.856287
global_synonyms_count * rel_synonyms_count                0.048821
rel_aoa * rel_clustering                                 -0.360171
rel_aoa * rel_frequency                                  -0.061717
rel_aoa * rel_letters_count                              -0.139358
rel_aoa * rel_orthographic_density                       -0.124162
rel_aoa * rel_synonyms_count                             -0.095514
rel_clustering * rel_frequency                           -0.480831
rel_clustering * rel_letters_count                       -0.164773
rel_clustering * rel_orthographic_density                -0.263440
rel_clustering * rel_synonyms_count                      -1.615279
rel_frequency * rel_letters_count                        -0.023817
rel_frequency * rel_orthographic_density                 -0.044039
rel_frequency * rel_synonyms_count                        0.162430
rel_letters_count * rel_orthographic_density             -0.011114
rel_letters_count * rel_synonyms_count                   -0.370679
rel_orthographic_density * rel_synonyms_count            -0.884100
dtype: float64

Regressing rel clustering with 184 measures, no interactions
           ^^^^^^^^^^^^^^
R^2 = 0.28211052223303734

intercept                      0.309383
global_aoa                    -0.014141
global_clustering             -0.538689
global_frequency              -0.253585
global_letters_count          -0.133065
global_orthographic_density    0.078185
global_synonyms_count         -0.070302
rel_aoa                        0.042328
rel_clustering                 0.779239
rel_frequency                  0.152621
rel_letters_count              0.077757
rel_orthographic_density      -0.063274
rel_synonyms_count            -0.069909
dtype: float64

Regressing rel clustering with 184 measures, with interactions
           ^^^^^^^^^^^^^^
R^2 = 0.5752339686898901

intercept                                                 14.342539
global_aoa                                                -0.367577
global_clustering                                          1.902338
global_frequency                                          -2.952400
global_letters_count                                       1.450891
global_orthographic_density                                6.335395
global_synonyms_count                                     -2.354765
rel_aoa                                                    0.276406
rel_clustering                                            -4.520784
rel_frequency                                              2.295780
rel_letters_count                                          0.039527
rel_orthographic_density                                  -2.304402
rel_synonyms_count                                        -0.430467
global_aoa * global_clustering                            -0.160833
global_aoa * global_frequency                             -0.038032
global_aoa * global_letters_count                          0.030678
global_aoa * global_orthographic_density                  -0.301842
global_aoa * global_synonyms_count                        -0.190806
global_aoa * rel_aoa                                      -0.009627
global_aoa * rel_clustering                                0.295368
global_aoa * rel_frequency                                -0.068553
global_aoa * rel_letters_count                             0.022365
global_aoa * rel_orthographic_density                      0.311535
global_aoa * rel_synonyms_count                            0.311070
global_clustering * global_frequency                      -0.405654
global_clustering * global_letters_count                   0.475960
global_clustering * global_orthographic_density            0.640580
global_clustering * global_synonyms_count                 -1.371441
global_clustering * rel_aoa                                0.230554
global_clustering * rel_clustering                        -0.200854
global_clustering * rel_frequency                          0.305281
global_clustering * rel_letters_count                     -0.088893
global_clustering * rel_orthographic_density              -0.016210
global_clustering * rel_synonyms_count                     1.488561
global_frequency * global_letters_count                    0.126102
global_frequency * global_orthographic_density            -0.005142
global_frequency * global_synonyms_count                   0.070934
global_frequency * rel_aoa                                 0.057105
global_frequency * rel_clustering                          0.335596
global_frequency * rel_frequency                          -0.006043
global_frequency * rel_letters_count                      -0.094264
global_frequency * rel_orthographic_density                0.031730
global_frequency * rel_synonyms_count                      0.109539
global_letters_count * global_orthographic_density         0.025214
global_letters_count * global_synonyms_count              -0.631853
global_letters_count * rel_aoa                             0.083016
global_letters_count * rel_clustering                     -0.262653
global_letters_count * rel_frequency                       0.037141
global_letters_count * rel_letters_count                   0.010341
global_letters_count * rel_orthographic_density           -0.089472
global_letters_count * rel_synonyms_count                  0.777717
global_orthographic_density * global_synonyms_count       -1.979912
global_orthographic_density * rel_aoa                      0.222310
global_orthographic_density * rel_clustering              -0.300612
global_orthographic_density * rel_frequency                0.057567
global_orthographic_density * rel_letters_count            0.069272
global_orthographic_density * rel_orthographic_density    -0.102657
global_orthographic_density * rel_synonyms_count           1.999247
global_synonyms_count * rel_aoa                           -0.032864
global_synonyms_count * rel_clustering                     1.060060
global_synonyms_count * rel_frequency                     -0.326125
global_synonyms_count * rel_letters_count                  0.230879
global_synonyms_count * rel_orthographic_density           0.738182
global_synonyms_count * rel_synonyms_count                 0.040334
rel_aoa * rel_clustering                                  -0.332371
rel_aoa * rel_frequency                                   -0.038892
rel_aoa * rel_letters_count                               -0.110004
rel_aoa * rel_orthographic_density                        -0.135699
rel_aoa * rel_synonyms_count                              -0.014267
rel_clustering * rel_frequency                            -0.396261
rel_clustering * rel_letters_count                        -0.113519
rel_clustering * rel_orthographic_density                 -0.263438
rel_clustering * rel_synonyms_count                       -1.505516
rel_frequency * rel_letters_count                         -0.022550
rel_frequency * rel_orthographic_density                  -0.082839
rel_frequency * rel_synonyms_count                         0.126833
rel_letters_count * rel_orthographic_density              -0.035115
rel_letters_count * rel_synonyms_count                    -0.354925
rel_orthographic_density * rel_synonyms_count             -0.682009
dtype: float64

----------------------------------------------------------------------
Regressing global letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.06966557464479661

intercept                      2.744672
global_aoa                     0.087522
global_clustering             -0.159349
global_frequency               0.004968
global_letters_count           0.280897
global_orthographic_density    0.162294
global_synonyms_count         -0.124507
dtype: float64

Regressing global letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1797683331359642

intercept                                             -18.530789
global_aoa                                              2.342506
global_clustering                                      -1.511208
global_frequency                                        1.698679
global_letters_count                                    1.426795
global_orthographic_density                            -2.658008
global_synonyms_count                                  -2.322934
global_aoa * global_clustering                          0.216180
global_aoa * global_frequency                           0.039097
global_aoa * global_letters_count                      -0.184001
global_aoa * global_orthographic_density               -0.207192
global_aoa * global_synonyms_count                      0.113554
global_clustering * global_frequency                    0.236508
global_clustering * global_letters_count               -0.253103
global_clustering * global_orthographic_density        -0.604864
global_clustering * global_synonyms_count              -0.150780
global_frequency * global_letters_count                -0.127081
global_frequency * global_orthographic_density          0.089081
global_frequency * global_synonyms_count                0.142589
global_letters_count * global_orthographic_density     -0.010109
global_letters_count * global_synonyms_count           -0.139837
global_orthographic_density * global_synonyms_count     0.119861
dtype: float64

Regressing rel letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.03750203244001138

intercept                      0.000617
global_aoa                     0.028428
global_clustering             -0.165249
global_frequency              -0.081788
global_letters_count           0.255982
global_orthographic_density    0.258768
global_synonyms_count         -0.204812
dtype: float64

Regressing rel letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.1332729932299349

intercept                                             -27.051097
global_aoa                                              2.615635
global_clustering                                      -3.640443
global_frequency                                        1.635868
global_letters_count                                    0.914511
global_orthographic_density                            -2.578941
global_synonyms_count                                  -1.379630
global_aoa * global_clustering                          0.281079
global_aoa * global_frequency                           0.029813
global_aoa * global_letters_count                      -0.171938
global_aoa * global_orthographic_density               -0.152941
global_aoa * global_synonyms_count                      0.045088
global_clustering * global_frequency                    0.347315
global_clustering * global_letters_count               -0.176572
global_clustering * global_orthographic_density        -0.450029
global_clustering * global_synonyms_count              -0.139345
global_frequency * global_letters_count                -0.032972
global_frequency * global_orthographic_density          0.179629
global_frequency * global_synonyms_count                0.073760
global_letters_count * global_orthographic_density     -0.066646
global_letters_count * global_synonyms_count           -0.106312
global_orthographic_density * global_synonyms_count     0.045342
dtype: float64

Regressing global letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.03785986166697608

intercept                   5.896102
rel_aoa                     0.034622
rel_clustering              0.119683
rel_frequency               0.061469
rel_letters_count           0.161211
rel_orthographic_density   -0.043370
rel_synonyms_count         -0.093798
dtype: float64

Regressing global letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.07389284245794736

intercept                                        5.992309
rel_aoa                                         -0.081326
rel_clustering                                   0.353747
rel_frequency                                    0.177159
rel_letters_count                                0.191825
rel_orthographic_density                         0.105434
rel_synonyms_count                               0.162640
rel_aoa * rel_clustering                        -0.036092
rel_aoa * rel_frequency                         -0.040693
rel_aoa * rel_letters_count                     -0.024326
rel_aoa * rel_orthographic_density              -0.047790
rel_aoa * rel_synonyms_count                     0.025236
rel_clustering * rel_frequency                   0.113235
rel_clustering * rel_letters_count              -0.006422
rel_clustering * rel_orthographic_density       -0.089116
rel_clustering * rel_synonyms_count             -0.133362
rel_frequency * rel_letters_count               -0.030221
rel_frequency * rel_orthographic_density         0.074735
rel_frequency * rel_synonyms_count              -0.035539
rel_letters_count * rel_orthographic_density     0.047222
rel_letters_count * rel_synonyms_count          -0.010669
rel_orthographic_density * rel_synonyms_count    0.290509
dtype: float64

Regressing rel letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.16672434431328387

intercept                   1.205948
rel_aoa                     0.023994
rel_clustering             -0.002930
rel_frequency              -0.185416
rel_letters_count           0.431341
rel_orthographic_density    0.399329
rel_synonyms_count         -0.091716
dtype: float64

Regressing rel letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.2037515571332852

intercept                                        1.241649
rel_aoa                                          0.015239
rel_clustering                                   0.410392
rel_frequency                                   -0.131792
rel_letters_count                                0.584239
rel_orthographic_density                         0.650392
rel_synonyms_count                               0.126438
rel_aoa * rel_clustering                         0.096748
rel_aoa * rel_frequency                          0.001308
rel_aoa * rel_letters_count                     -0.083663
rel_aoa * rel_orthographic_density              -0.157117
rel_aoa * rel_synonyms_count                     0.002190
rel_clustering * rel_frequency                   0.201026
rel_clustering * rel_letters_count              -0.035782
rel_clustering * rel_orthographic_density       -0.080044
rel_clustering * rel_synonyms_count             -0.107245
rel_frequency * rel_letters_count               -0.003250
rel_frequency * rel_orthographic_density         0.107653
rel_frequency * rel_synonyms_count               0.003780
rel_letters_count * rel_orthographic_density     0.055609
rel_letters_count * rel_synonyms_count           0.034753
rel_orthographic_density * rel_synonyms_count    0.228924
dtype: float64

Regressing global letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.09257681789488159

intercept                     -3.004451
global_aoa                     0.067000
global_clustering             -0.739190
global_frequency               0.163995
global_letters_count           0.530815
global_orthographic_density   -0.126756
global_synonyms_count         -0.053722
rel_aoa                        0.003495
rel_clustering                 0.693145
rel_frequency                 -0.183754
rel_letters_count             -0.257647
rel_orthographic_density       0.291969
rel_synonyms_count            -0.062388
dtype: float64

Regressing global letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^
R^2 = 0.382984292204945

intercept                                                 18.483034
global_aoa                                                 3.847504
global_clustering                                         10.477873
global_frequency                                           0.893873
global_letters_count                                       0.426155
global_orthographic_density                               -4.066484
global_synonyms_count                                    -11.949345
rel_aoa                                                   -4.025837
rel_clustering                                            -3.497886
rel_frequency                                              4.007698
rel_letters_count                                          4.169363
rel_orthographic_density                                  -0.997250
rel_synonyms_count                                        -1.385906
global_aoa * global_clustering                             0.323480
global_aoa * global_frequency                             -0.003393
global_aoa * global_letters_count                         -0.297147
global_aoa * global_orthographic_density                  -0.208963
global_aoa * global_synonyms_count                         0.510544
global_aoa * rel_aoa                                       0.037788
global_aoa * rel_clustering                               -0.012434
global_aoa * rel_frequency                                 0.042555
global_aoa * rel_letters_count                             0.001297
global_aoa * rel_orthographic_density                     -0.062371
global_aoa * rel_synonyms_count                           -0.169074
global_clustering * global_frequency                      -0.366209
global_clustering * global_letters_count                  -0.862253
global_clustering * global_orthographic_density           -2.062810
global_clustering * global_synonyms_count                 -1.723306
global_clustering * rel_aoa                                0.155876
global_clustering * rel_clustering                         0.156065
global_clustering * rel_frequency                          0.968698
global_clustering * rel_letters_count                      0.944404
global_clustering * rel_orthographic_density               1.242103
global_clustering * rel_synonyms_count                     0.763627
global_frequency * global_letters_count                   -0.247219
global_frequency * global_orthographic_density            -0.591247
global_frequency * global_synonyms_count                  -0.416588
global_frequency * rel_aoa                                 0.268579
global_frequency * rel_clustering                          0.146456
global_frequency * rel_frequency                           0.019438
global_frequency * rel_letters_count                       0.113946
global_frequency * rel_orthographic_density                0.713658
global_frequency * rel_synonyms_count                      0.890723
global_letters_count * global_orthographic_density        -0.093993
global_letters_count * global_synonyms_count              -0.148820
global_letters_count * rel_aoa                             0.439888
global_letters_count * rel_clustering                      0.445405
global_letters_count * rel_frequency                       0.069026
global_letters_count * rel_letters_count                   0.046607
global_letters_count * rel_orthographic_density            0.234028
global_letters_count * rel_synonyms_count                  0.208974
global_orthographic_density * global_synonyms_count        1.553737
global_orthographic_density * rel_aoa                      0.009513
global_orthographic_density * rel_clustering               0.473127
global_orthographic_density * rel_frequency                0.391854
global_orthographic_density * rel_letters_count           -0.027214
global_orthographic_density * rel_orthographic_density     0.304089
global_orthographic_density * rel_synonyms_count          -1.411874
global_synonyms_count * rel_aoa                           -0.525271
global_synonyms_count * rel_clustering                     1.727516
global_synonyms_count * rel_frequency                     -0.128491
global_synonyms_count * rel_letters_count                 -0.549823
global_synonyms_count * rel_orthographic_density          -1.547373
global_synonyms_count * rel_synonyms_count                -0.231759
rel_aoa * rel_clustering                                  -0.096490
rel_aoa * rel_frequency                                   -0.271226
rel_aoa * rel_letters_count                               -0.321224
rel_aoa * rel_orthographic_density                         0.244461
rel_aoa * rel_synonyms_count                               0.393802
rel_clustering * rel_frequency                            -0.545752
rel_clustering * rel_letters_count                        -0.547907
rel_clustering * rel_orthographic_density                  0.423908
rel_clustering * rel_synonyms_count                       -0.993522
rel_frequency * rel_letters_count                         -0.002559
rel_frequency * rel_orthographic_density                  -0.258670
rel_frequency * rel_synonyms_count                        -0.228397
rel_letters_count * rel_orthographic_density               0.105617
rel_letters_count * rel_synonyms_count                     0.292822
rel_orthographic_density * rel_synonyms_count              1.428830
dtype: float64

Regressing rel letters_count with 233 measures, no interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.219722540827619

intercept                     -2.795844
global_aoa                     0.037938
global_clustering             -0.617426
global_frequency               0.169978
global_letters_count          -0.372231
global_orthographic_density   -0.095871
global_synonyms_count          0.055144
rel_aoa                        0.001273
rel_clustering                 0.582634
rel_frequency                 -0.219301
rel_letters_count              0.668306
rel_orthographic_density       0.201347
rel_synonyms_count            -0.165776
dtype: float64

Regressing rel letters_count with 233 measures, with interactions
           ^^^^^^^^^^^^^^^^^
R^2 = 0.4518602555034998

intercept                                                  4.500467
global_aoa                                                 3.204886
global_clustering                                          6.531504
global_frequency                                           1.209303
global_letters_count                                       0.761924
global_orthographic_density                               -4.574826
global_synonyms_count                                    -11.669728
rel_aoa                                                   -3.085230
rel_clustering                                            -1.730486
rel_frequency                                              3.239566
rel_letters_count                                          3.766850
rel_orthographic_density                                  -0.600989
rel_synonyms_count                                         0.035483
global_aoa * global_clustering                             0.327561
global_aoa * global_frequency                              0.035610
global_aoa * global_letters_count                         -0.275462
global_aoa * global_orthographic_density                  -0.168921
global_aoa * global_synonyms_count                         0.577253
global_aoa * rel_aoa                                       0.022461
global_aoa * rel_clustering                               -0.009740
global_aoa * rel_frequency                                 0.010093
global_aoa * rel_letters_count                            -0.034641
global_aoa * rel_orthographic_density                     -0.143466
global_aoa * rel_synonyms_count                           -0.354815
global_clustering * global_frequency                      -0.188595
global_clustering * global_letters_count                  -0.554430
global_clustering * global_orthographic_density           -1.808201
global_clustering * global_synonyms_count                 -1.314508
global_clustering * rel_aoa                                0.116211
global_clustering * rel_clustering                         0.090228
global_clustering * rel_frequency                          0.762992
global_clustering * rel_letters_count                      0.576340
global_clustering * rel_orthographic_density               0.918662
global_clustering * rel_synonyms_count                     0.479578
global_frequency * global_letters_count                   -0.217382
global_frequency * global_orthographic_density            -0.434533
global_frequency * global_synonyms_count                  -0.291886
global_frequency * rel_aoa                                 0.187585
global_frequency * rel_clustering                          0.064400
global_frequency * rel_frequency                           0.014931
global_frequency * rel_letters_count                       0.080230
global_frequency * rel_orthographic_density                0.565211
global_frequency * rel_synonyms_count                      0.720741
global_letters_count * global_orthographic_density        -0.083690
global_letters_count * global_synonyms_count              -0.010851
global_letters_count * rel_aoa                             0.371440
global_letters_count * rel_clustering                      0.230791
global_letters_count * rel_frequency                       0.036945
global_letters_count * rel_letters_count                   0.035739
global_letters_count * rel_orthographic_density            0.164595
global_letters_count * rel_synonyms_count                  0.132741
global_orthographic_density * global_synonyms_count        1.653362
global_orthographic_density * rel_aoa                      0.042247
global_orthographic_density * rel_clustering               0.342923
global_orthographic_density * rel_frequency                0.317259
global_orthographic_density * rel_letters_count           -0.085259
global_orthographic_density * rel_orthographic_density     0.190251
global_orthographic_density * rel_synonyms_count          -1.481393
global_synonyms_count * rel_aoa                           -0.459342
global_synonyms_count * rel_clustering                     1.302850
global_synonyms_count * rel_frequency                     -0.175945
global_synonyms_count * rel_letters_count                 -0.647257
global_synonyms_count * rel_orthographic_density          -1.330977
global_synonyms_count * rel_synonyms_count                -0.267398
rel_aoa * rel_clustering                                  -0.045882
rel_aoa * rel_frequency                                   -0.199124
rel_aoa * rel_letters_count                               -0.230945
rel_aoa * rel_orthographic_density                         0.216515
rel_aoa * rel_synonyms_count                               0.443372
rel_clustering * rel_frequency                            -0.423057
rel_clustering * rel_letters_count                        -0.245125
rel_clustering * rel_orthographic_density                  0.664362
rel_clustering * rel_synonyms_count                       -0.694849
rel_frequency * rel_letters_count                          0.037011
rel_frequency * rel_orthographic_density                  -0.220142
rel_frequency * rel_synonyms_count                        -0.115831
rel_letters_count * rel_orthographic_density               0.186600
rel_letters_count * rel_synonyms_count                     0.349113
rel_orthographic_density * rel_synonyms_count              1.212159
dtype: float64
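
The blocks above print an R^2 and a coefficient series for each fit, with and without pairwise interaction terms. With 6 predictors the interaction fit has 15 product terms, and with 12 predictors it has 66. The code that produced these fits is not shown in this section, so the following is only a minimal sketch of how such a regression could be computed, assuming ordinary least squares with an intercept. The `regress` helper, the synthetic data, and the use of `numpy.linalg.lstsq` are illustrative assumptions, not the notebook's actual implementation.

```python
from itertools import combinations

import numpy as np


def regress(X, y, names, interactions=False):
    """Ordinary least squares of y on the columns of X (hypothetical helper).

    Returns (r2, coefficients), where coefficients maps each term name
    to its fitted value, with the intercept first.
    """
    columns = [np.ones(len(y))] + [X[:, i] for i in range(X.shape[1])]
    labels = ['intercept'] + list(names)
    if interactions:
        # One product column per unordered pair of distinct predictors.
        for i, j in combinations(range(X.shape[1]), 2):
            columns.append(X[:, i] * X[:, j])
            labels.append('{} * {}'.format(names[i], names[j]))
    design = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    # In-sample coefficient of determination.
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return r2, dict(zip(labels, beta))


# Synthetic data only: these values are NOT taken from the substitutions database.
rng = np.random.RandomState(0)
names = ['global_aoa', 'global_frequency', 'global_letters_count']
X = rng.normal(size=(233, 3))
y = (1.0 + 0.5 * X[:, 2] + 0.3 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.5, size=233))

r2_plain, coef_plain = regress(X, y, names)
r2_inter, coef_inter = regress(X, y, names, interactions=True)
print('R^2 = {} (no interactions)'.format(r2_plain))
print('R^2 = {} (with interactions)'.format(r2_inter))
```

Because the model without interactions is nested in the model with interactions, the in-sample R^2 of the second fit can never be lower than that of the first. The increase seen in every pair of blocks here is therefore expected, and on its own it does not show that the interaction terms generalize.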

----------------------------------------------------------------------
Regressing global synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1113648952187245

intercept                     -0.003555
global_aoa                     0.014992
global_clustering             -0.018239
global_frequency               0.025739
global_letters_count          -0.029728
global_orthographic_density    0.062103
global_synonyms_count          0.240667
dtype: float64

Regressing global synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.15245013955236386

intercept                                             -2.902750
global_aoa                                             0.450282
global_clustering                                      0.204889
global_frequency                                       0.045397
global_letters_count                                   0.391958
global_orthographic_density                            1.240935
global_synonyms_count                                 -0.124634
global_aoa * global_clustering                         0.009862
global_aoa * global_frequency                         -0.025036
global_aoa * global_letters_count                     -0.020845
global_aoa * global_orthographic_density              -0.035583
global_aoa * global_synonyms_count                     0.036391
global_clustering * global_frequency                  -0.051704
global_clustering * global_letters_count               0.013869
global_clustering * global_orthographic_density        0.079712
global_clustering * global_synonyms_count             -0.012190
global_frequency * global_letters_count               -0.018255
global_frequency * global_orthographic_density        -0.043564
global_frequency * global_synonyms_count               0.027530
global_letters_count * global_orthographic_density    -0.012497
global_letters_count * global_synonyms_count          -0.031971
global_orthographic_density * global_synonyms_count   -0.025513
dtype: float64

Regressing rel synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.0822996774353153

intercept                     -0.373987
global_aoa                     0.015866
global_clustering             -0.029422
global_frequency               0.028920
global_letters_count          -0.030371
global_orthographic_density    0.059281
global_synonyms_count          0.176126
dtype: float64

Regressing rel synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.11771330025921567

intercept                                             -2.635651
global_aoa                                             0.358145
global_clustering                                      0.214419
global_frequency                                      -0.036961
global_letters_count                                   0.416565
global_orthographic_density                            1.179671
global_synonyms_count                                 -0.183115
global_aoa * global_clustering                         0.000824
global_aoa * global_frequency                         -0.021066
global_aoa * global_letters_count                     -0.018926
global_aoa * global_orthographic_density              -0.029815
global_aoa * global_synonyms_count                     0.019876
global_clustering * global_frequency                  -0.058396
global_clustering * global_letters_count               0.025410
global_clustering * global_orthographic_density        0.093209
global_clustering * global_synonyms_count             -0.011433
global_frequency * global_letters_count               -0.016492
global_frequency * global_orthographic_density        -0.038826
global_frequency * global_synonyms_count               0.032996
global_letters_count * global_orthographic_density     0.003145
global_letters_count * global_synonyms_count          -0.019267
global_orthographic_density * global_synonyms_count   -0.035606
dtype: float64

Regressing global synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12445477019188222

intercept                   0.610270
rel_aoa                     0.049236
rel_clustering             -0.061784
rel_frequency               0.020408
rel_letters_count          -0.051462
rel_orthographic_density    0.048914
rel_synonyms_count          0.218561
dtype: float64

Regressing global synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.17987054072897923

intercept                                        0.647687
rel_aoa                                          0.002417
rel_clustering                                  -0.163200
rel_frequency                                    0.044767
rel_letters_count                               -0.076717
rel_orthographic_density                         0.059117
rel_synonyms_count                               0.121831
rel_aoa * rel_clustering                        -0.005224
rel_aoa * rel_frequency                         -0.018565
rel_aoa * rel_letters_count                      0.026014
rel_aoa * rel_orthographic_density               0.070252
rel_aoa * rel_synonyms_count                     0.042213
rel_clustering * rel_frequency                  -0.057997
rel_clustering * rel_letters_count               0.009596
rel_clustering * rel_orthographic_density        0.078106
rel_clustering * rel_synonyms_count             -0.004642
rel_frequency * rel_letters_count                0.003139
rel_frequency * rel_orthographic_density         0.013100
rel_frequency * rel_synonyms_count              -0.005415
rel_letters_count * rel_orthographic_density    -0.015631
rel_letters_count * rel_synonyms_count           0.025975
rel_orthographic_density * rel_synonyms_count    0.038618
dtype: float64

Regressing rel synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.181322866432861

intercept                   0.271811
rel_aoa                     0.026936
rel_clustering             -0.002363
rel_frequency               0.024004
rel_letters_count          -0.046817
rel_orthographic_density    0.022282
rel_synonyms_count          0.345036
dtype: float64

Regressing rel synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.2346793046341754

intercept                                        0.310802
rel_aoa                                         -0.031534
rel_clustering                                  -0.162524
rel_frequency                                    0.056363
rel_letters_count                               -0.075981
rel_orthographic_density                        -0.002400
rel_synonyms_count                               0.375681
rel_aoa * rel_clustering                         0.013428
rel_aoa * rel_frequency                         -0.014676
rel_aoa * rel_letters_count                      0.026947
rel_aoa * rel_orthographic_density               0.050619
rel_aoa * rel_synonyms_count                    -0.000903
rel_clustering * rel_frequency                  -0.067388
rel_clustering * rel_letters_count               0.024115
rel_clustering * rel_orthographic_density        0.079356
rel_clustering * rel_synonyms_count              0.031200
rel_frequency * rel_letters_count               -0.004083
rel_frequency * rel_orthographic_density         0.007474
rel_frequency * rel_synonyms_count               0.019143
rel_letters_count * rel_orthographic_density     0.003521
rel_letters_count * rel_synonyms_count           0.028195
rel_orthographic_density * rel_synonyms_count    0.065293
dtype: float64

Regressing global synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1457539135445446

intercept                      0.628980
global_aoa                    -0.027209
global_clustering              0.089888
global_frequency               0.014627
global_letters_count           0.043018
global_orthographic_density    0.147503
global_synonyms_count          0.155432
rel_aoa                        0.067015
rel_clustering                -0.120319
rel_frequency                  0.017436
rel_letters_count             -0.081637
rel_orthographic_density      -0.085692
rel_synonyms_count             0.084002
dtype: float64

Regressing global synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.447667968308838

intercept                                                 -1.853427
global_aoa                                                 1.658112
global_clustering                                          3.338512
global_frequency                                          -0.321843
global_letters_count                                       1.103115
global_orthographic_density                                6.561701
global_synonyms_count                                      6.902074
rel_aoa                                                   -0.039373
rel_clustering                                            -5.907922
rel_frequency                                              0.516397
rel_letters_count                                         -0.818330
rel_orthographic_density                                  -3.562703
rel_synonyms_count                                       -15.522914
global_aoa * global_clustering                             0.031543
global_aoa * global_frequency                             -0.037447
global_aoa * global_letters_count                         -0.141090
global_aoa * global_orthographic_density                  -0.207892
global_aoa * global_synonyms_count                        -0.096349
global_aoa * rel_aoa                                      -0.035439
global_aoa * rel_clustering                                0.039103
global_aoa * rel_frequency                                 0.050351
global_aoa * rel_letters_count                             0.094895
global_aoa * rel_orthographic_density                      0.121582
global_aoa * rel_synonyms_count                            0.145301
global_clustering * global_frequency                      -0.267264
global_clustering * global_letters_count                  -0.135739
global_clustering * global_orthographic_density            0.027712
global_clustering * global_synonyms_count                  0.447987
global_clustering * rel_aoa                               -0.022319
global_clustering * rel_clustering                        -0.112712
global_clustering * rel_frequency                          0.196476
global_clustering * rel_letters_count                      0.083482
global_clustering * rel_orthographic_density               0.019836
global_clustering * rel_synonyms_count                    -1.003211
global_frequency * global_letters_count                   -0.070030
global_frequency * global_orthographic_density            -0.391288
global_frequency * global_synonyms_count                   0.089128
global_frequency * rel_aoa                                -0.032432
global_frequency * rel_clustering                          0.220723
global_frequency * rel_frequency                          -0.025855
global_frequency * rel_letters_count                       0.017394
global_frequency * rel_orthographic_density                0.257821
global_frequency * rel_synonyms_count                      0.172394
global_letters_count * global_orthographic_density        -0.121488
global_letters_count * global_synonyms_count              -0.390457
global_letters_count * rel_aoa                             0.094063
global_letters_count * rel_clustering                      0.331117
global_letters_count * rel_frequency                       0.045010
global_letters_count * rel_letters_count                   0.001364
global_letters_count * rel_orthographic_density            0.004185
global_letters_count * rel_synonyms_count                  0.684861
global_orthographic_density * global_synonyms_count       -1.355932
global_orthographic_density * rel_aoa                      0.001844
global_orthographic_density * rel_clustering               0.406255
global_orthographic_density * rel_frequency                0.230524
global_orthographic_density * rel_letters_count            0.256065
global_orthographic_density * rel_orthographic_density     0.035743
global_orthographic_density * rel_synonyms_count           1.677925
global_synonyms_count * rel_aoa                           -0.039552
global_synonyms_count * rel_clustering                    -0.002950
global_synonyms_count * rel_frequency                     -0.039761
global_synonyms_count * rel_letters_count                  0.321026
global_synonyms_count * rel_orthographic_density           1.038234
global_synonyms_count * rel_synonyms_count                 0.235494
rel_aoa * rel_clustering                                  -0.012429
rel_aoa * rel_frequency                                   -0.045816
rel_aoa * rel_letters_count                               -0.015091
rel_aoa * rel_orthographic_density                         0.114624
rel_aoa * rel_synonyms_count                               0.033774
rel_clustering * rel_frequency                            -0.257446
rel_clustering * rel_letters_count                        -0.198641
rel_clustering * rel_orthographic_density                 -0.216083
rel_clustering * rel_synonyms_count                        0.513464
rel_frequency * rel_letters_count                          0.016546
rel_frequency * rel_orthographic_density                  -0.074993
rel_frequency * rel_synonyms_count                        -0.200232
rel_letters_count * rel_orthographic_density              -0.128804
rel_letters_count * rel_synonyms_count                    -0.499908
rel_orthographic_density * rel_synonyms_count             -1.287400
dtype: float64

Regressing rel synonyms_count with 227 measures, no interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.30047685585861683

intercept                      0.355031
global_aoa                    -0.026762
global_clustering              0.067002
global_frequency               0.009860
global_letters_count           0.055307
global_orthographic_density    0.160650
global_synonyms_count         -0.695342
rel_aoa                        0.056957
rel_clustering                -0.084228
rel_frequency                  0.013366
rel_letters_count             -0.079656
rel_orthographic_density      -0.100727
rel_synonyms_count             0.999400
dtype: float64

Regressing rel synonyms_count with 227 measures, with interactions
           ^^^^^^^^^^^^^^^^^^
R^2 = 0.5543428027403134

intercept                                                  2.576118
global_aoa                                                 1.408521
global_clustering                                          3.628022
global_frequency                                          -0.653560
global_letters_count                                       0.599557
global_orthographic_density                                6.819259
global_synonyms_count                                      4.049474
rel_aoa                                                    0.286723
rel_clustering                                            -5.973066
rel_frequency                                              0.622277
rel_letters_count                                         -0.696997
rel_orthographic_density                                  -3.797480
rel_synonyms_count                                       -11.981688
global_aoa * global_clustering                             0.021553
global_aoa * global_frequency                             -0.031963
global_aoa * global_letters_count                         -0.119890
global_aoa * global_orthographic_density                  -0.199083
global_aoa * global_synonyms_count                        -0.028420
global_aoa * rel_aoa                                      -0.032220
global_aoa * rel_clustering                                0.042159
global_aoa * rel_frequency                                 0.045170
global_aoa * rel_letters_count                             0.089424
global_aoa * rel_orthographic_density                      0.139525
global_aoa * rel_synonyms_count                            0.077619
global_clustering * global_frequency                      -0.292558
global_clustering * global_letters_count                  -0.172997
global_clustering * global_orthographic_density            0.140162
global_clustering * global_synonyms_count                  0.340165
global_clustering * rel_aoa                               -0.000595
global_clustering * rel_clustering                        -0.078716
global_clustering * rel_frequency                          0.204208
global_clustering * rel_letters_count                      0.104896
global_clustering * rel_orthographic_density              -0.024421
global_clustering * rel_synonyms_count                    -0.815340
global_frequency * global_letters_count                   -0.059121
global_frequency * global_orthographic_density            -0.364042
global_frequency * global_synonyms_count                   0.113276
global_frequency * rel_aoa                                -0.039228
global_frequency * rel_clustering                          0.254069
global_frequency * rel_frequency                          -0.027661
global_frequency * rel_letters_count                       0.026020
global_frequency * rel_orthographic_density                0.246832
global_frequency * rel_synonyms_count                      0.140477
global_letters_count * global_orthographic_density        -0.100427
global_letters_count * global_synonyms_count              -0.290331
global_letters_count * rel_aoa                             0.071015
global_letters_count * rel_clustering                      0.357303
global_letters_count * rel_frequency                       0.050326
global_letters_count * rel_letters_count                   0.000724
global_letters_count * rel_orthographic_density           -0.012254
global_letters_count * rel_synonyms_count                  0.583527
global_orthographic_density * global_synonyms_count       -1.188651
global_orthographic_density * rel_aoa                     -0.019123
global_orthographic_density * rel_clustering               0.302881
global_orthographic_density * rel_frequency                0.221533
global_orthographic_density * rel_letters_count            0.236033
global_orthographic_density * rel_orthographic_density     0.050694
global_orthographic_density * rel_synonyms_count           1.522033
global_synonyms_count * rel_aoa                           -0.062424
global_synonyms_count * rel_clustering                     0.020795
global_synonyms_count * rel_frequency                     -0.036515
global_synonyms_count * rel_letters_count                  0.258301
global_synonyms_count * rel_orthographic_density           0.966302
global_synonyms_count * rel_synonyms_count                 0.236372
rel_aoa * rel_clustering                                  -0.028811
rel_aoa * rel_frequency                                   -0.037900
rel_aoa * rel_letters_count                               -0.007390
rel_aoa * rel_orthographic_density                         0.107253
rel_aoa * rel_synonyms_count                               0.029506
rel_clustering * rel_frequency                            -0.264894
rel_clustering * rel_letters_count                        -0.223981
rel_clustering * rel_orthographic_density                 -0.199533
rel_clustering * rel_synonyms_count                        0.404104
rel_frequency * rel_letters_count                         -0.001339
rel_frequency * rel_orthographic_density                  -0.072534
rel_frequency * rel_synonyms_count                        -0.195759
rel_letters_count * rel_orthographic_density              -0.101749
rel_letters_count * rel_synonyms_count                    -0.438080
rel_orthographic_density * rel_synonyms_count             -1.214120
dtype: float64

----------------------------------------------------------------------
Regressing global orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.13650950518725347

intercept                      1.042500
global_aoa                     0.011662
global_clustering              0.099070
global_frequency               0.017447
global_letters_count           0.001370
global_orthographic_density    0.361489
global_synonyms_count          0.226540
dtype: float64

Regressing global orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.22704832745030643

intercept                                             -2.661345
global_aoa                                            -0.193773
global_clustering                                     -0.403481
global_frequency                                       0.320138
global_letters_count                                   0.093064
global_orthographic_density                            2.898605
global_synonyms_count                                  0.062320
global_aoa * global_clustering                        -0.032281
global_aoa * global_frequency                         -0.017466
global_aoa * global_letters_count                      0.022453
global_aoa * global_orthographic_density               0.036311
global_aoa * global_synonyms_count                    -0.020276
global_clustering * global_frequency                   0.012999
global_clustering * global_letters_count               0.052707
global_clustering * global_orthographic_density        0.261131
global_clustering * global_synonyms_count             -0.046718
global_frequency * global_letters_count                0.005025
global_frequency * global_orthographic_density        -0.129255
global_frequency * global_synonyms_count               0.053010
global_letters_count * global_orthographic_density    -0.003387
global_letters_count * global_synonyms_count          -0.038229
global_orthographic_density * global_synonyms_count   -0.195189
dtype: float64

Regressing rel orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.1335478418592111

intercept                     -1.850906
global_aoa                     0.029248
global_clustering              0.030767
global_frequency               0.045447
global_letters_count          -0.002283
global_orthographic_density    0.327780
global_synonyms_count          0.244050
dtype: float64

Regressing rel orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.2060170435559943

intercept                                             -0.138673
global_aoa                                            -0.296317
global_clustering                                      0.596089
global_frequency                                       0.272554
global_letters_count                                  -0.264679
global_orthographic_density                            2.106722
global_synonyms_count                                 -0.183045
global_aoa * global_clustering                        -0.039967
global_aoa * global_frequency                         -0.018500
global_aoa * global_letters_count                      0.038512
global_aoa * global_orthographic_density               0.022684
global_aoa * global_synonyms_count                    -0.033871
global_clustering * global_frequency                  -0.031504
global_clustering * global_letters_count              -0.022109
global_clustering * global_orthographic_density        0.151735
global_clustering * global_synonyms_count             -0.115838
global_frequency * global_letters_count               -0.021159
global_frequency * global_orthographic_density        -0.129185
global_frequency * global_synonyms_count              -0.003152
global_letters_count * global_orthographic_density     0.025816
global_letters_count * global_synonyms_count           0.020846
global_orthographic_density * global_synonyms_count   -0.103534
dtype: float64

Regressing global orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.12471876743527022

intercept                   1.443948
rel_aoa                    -0.020029
rel_clustering              0.095188
rel_frequency              -0.009204
rel_letters_count           0.062585
rel_orthographic_density    0.414605
rel_synonyms_count          0.218458
dtype: float64

Regressing global orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.180167105439908

intercept                                        1.313283
rel_aoa                                          0.002685
rel_clustering                                   0.267633
rel_frequency                                   -0.097018
rel_letters_count                                0.023299
rel_orthographic_density                         0.332535
rel_synonyms_count                               0.233279
rel_aoa * rel_clustering                         0.030645
rel_aoa * rel_frequency                         -0.002558
rel_aoa * rel_letters_count                     -0.000254
rel_aoa * rel_orthographic_density               0.019790
rel_aoa * rel_synonyms_count                    -0.049003
rel_clustering * rel_frequency                  -0.007847
rel_clustering * rel_letters_count              -0.018367
rel_clustering * rel_orthographic_density        0.190993
rel_clustering * rel_synonyms_count             -0.117974
rel_frequency * rel_letters_count                0.010207
rel_frequency * rel_orthographic_density        -0.071000
rel_frequency * rel_synonyms_count               0.043961
rel_letters_count * rel_orthographic_density    -0.061028
rel_letters_count * rel_synonyms_count           0.024516
rel_orthographic_density * rel_synonyms_count   -0.124361
dtype: float64

Regressing rel orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.21200395891513812

intercept                  -0.529789
rel_aoa                    -0.024761
rel_clustering              0.038182
rel_frequency               0.045638
rel_letters_count           0.068443
rel_orthographic_density    0.487484
rel_synonyms_count          0.195196
dtype: float64

Regressing rel orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.2467261896003298

intercept                                       -0.621360
rel_aoa                                         -0.061128
rel_clustering                                   0.114426
rel_frequency                                    0.004077
rel_letters_count                                0.027319
rel_orthographic_density                         0.368258
rel_synonyms_count                               0.122547
rel_aoa * rel_clustering                         0.020321
rel_aoa * rel_frequency                         -0.014810
rel_aoa * rel_letters_count                      0.015754
rel_aoa * rel_orthographic_density               0.030596
rel_aoa * rel_synonyms_count                    -0.049505
rel_clustering * rel_frequency                  -0.029552
rel_clustering * rel_letters_count              -0.022218
rel_clustering * rel_orthographic_density        0.130359
rel_clustering * rel_synonyms_count             -0.124581
rel_frequency * rel_letters_count               -0.005050
rel_frequency * rel_orthographic_density        -0.064447
rel_frequency * rel_synonyms_count               0.014734
rel_letters_count * rel_orthographic_density    -0.024267
rel_letters_count * rel_synonyms_count           0.027017
rel_orthographic_density * rel_synonyms_count   -0.123446
dtype: float64

Regressing global orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.17595776363209661

intercept                      3.575997
global_aoa                     0.070082
global_clustering              0.176527
global_frequency              -0.080275
global_letters_count          -0.257905
global_orthographic_density    0.249900
global_synonyms_count          0.245103
rel_aoa                       -0.076704
rel_clustering                -0.104257
rel_frequency                  0.104305
rel_letters_count              0.280610
rel_orthographic_density       0.132764
rel_synonyms_count            -0.017050
dtype: float64

Regressing global orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.4501460155598608

intercept                                                -26.210886
global_aoa                                                -0.286535
global_clustering                                         -4.687655
global_frequency                                           0.824886
global_letters_count                                       1.296383
global_orthographic_density                                6.675620
global_synonyms_count                                      9.202337
rel_aoa                                                   -0.790272
rel_clustering                                             2.429198
rel_frequency                                             -2.066068
rel_letters_count                                         -1.273293
rel_orthographic_density                                  -4.052206
rel_synonyms_count                                        -3.203522
global_aoa * global_clustering                            -0.015162
global_aoa * global_frequency                              0.003717
global_aoa * global_letters_count                          0.007322
global_aoa * global_orthographic_density                   0.099369
global_aoa * global_synonyms_count                        -0.370093
global_aoa * rel_aoa                                       0.013270
global_aoa * rel_clustering                               -0.094906
global_aoa * rel_frequency                                -0.046500
global_aoa * rel_letters_count                             0.025171
global_aoa * rel_orthographic_density                     -0.094085
global_aoa * rel_synonyms_count                            0.442807
global_clustering * global_frequency                       0.143241
global_clustering * global_letters_count                   0.157731
global_clustering * global_orthographic_density            0.953176
global_clustering * global_synonyms_count                  0.235217
global_clustering * rel_aoa                               -0.220004
global_clustering * rel_clustering                         0.082199
global_clustering * rel_frequency                         -0.462904
global_clustering * rel_letters_count                     -0.026875
global_clustering * rel_orthographic_density              -0.514667
global_clustering * rel_synonyms_count                    -0.164311
global_frequency * global_letters_count                   -0.010367
global_frequency * global_orthographic_density            -0.070369
global_frequency * global_synonyms_count                  -0.057611
global_frequency * rel_aoa                                 0.032792
global_frequency * rel_clustering                          0.081059
global_frequency * rel_frequency                          -0.022263
global_frequency * rel_letters_count                       0.027356
global_frequency * rel_orthographic_density                0.124243
global_frequency * rel_synonyms_count                     -0.139273
global_letters_count * global_orthographic_density        -0.250896
global_letters_count * global_synonyms_count              -0.401430
global_letters_count * rel_aoa                            -0.065249
global_letters_count * rel_clustering                      0.099923
global_letters_count * rel_frequency                       0.045557
global_letters_count * rel_letters_count                  -0.003542
global_letters_count * rel_orthographic_density            0.230430
global_letters_count * rel_synonyms_count                 -0.221056
global_orthographic_density * global_synonyms_count       -1.260806
global_orthographic_density * rel_aoa                     -0.388108
global_orthographic_density * rel_clustering              -0.962569
global_orthographic_density * rel_frequency               -0.240477
global_orthographic_density * rel_letters_count            0.273423
global_orthographic_density * rel_orthographic_density    -0.338169
global_orthographic_density * rel_synonyms_count           0.901810
global_synonyms_count * rel_aoa                            0.349465
global_synonyms_count * rel_clustering                    -0.074229
global_synonyms_count * rel_frequency                      0.378784
global_synonyms_count * rel_letters_count                  0.647340
global_synonyms_count * rel_orthographic_density           1.220115
global_synonyms_count * rel_synonyms_count                -0.239027
rel_aoa * rel_clustering                                   0.189464
rel_aoa * rel_frequency                                    0.003220
rel_aoa * rel_letters_count                                0.016374
rel_aoa * rel_orthographic_density                         0.333375
rel_aoa * rel_synonyms_count                              -0.442853
rel_clustering * rel_frequency                             0.209625
rel_clustering * rel_letters_count                        -0.131951
rel_clustering * rel_orthographic_density                  0.834254
rel_clustering * rel_synonyms_count                       -0.038013
rel_frequency * rel_letters_count                         -0.059735
rel_frequency * rel_orthographic_density                   0.064213
rel_frequency * rel_synonyms_count                        -0.111650
rel_letters_count * rel_orthographic_density              -0.453425
rel_letters_count * rel_synonyms_count                    -0.146430
rel_orthographic_density * rel_synonyms_count             -1.103019
dtype: float64

Regressing rel orthographic_density with 195 measures, no interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.27068420746221955

intercept                      2.020793
global_aoa                     0.089689
global_clustering              0.144500
global_frequency              -0.050760
global_letters_count          -0.202624
global_orthographic_density   -0.385943
global_synonyms_count          0.295158
rel_aoa                       -0.095186
rel_clustering                -0.091146
rel_frequency                  0.086397
rel_letters_count              0.208155
rel_orthographic_density       0.826848
rel_synonyms_count            -0.099491
dtype: float64

Regressing rel orthographic_density with 195 measures, with interactions
           ^^^^^^^^^^^^^^^^^^^^^^^^
R^2 = 0.5231819096458818

intercept                                                -16.416901
global_aoa                                                -0.043738
global_clustering                                         -3.274828
global_frequency                                           0.185432
global_letters_count                                       0.521026
global_orthographic_density                                3.755240
global_synonyms_count                                      8.833460
rel_aoa                                                   -1.068499
rel_clustering                                             0.926574
rel_frequency                                             -1.774470
rel_letters_count                                         -0.102774
rel_orthographic_density                                  -1.052371
rel_synonyms_count                                        -2.296872
global_aoa * global_clustering                             0.025569
global_aoa * global_frequency                              0.008353
global_aoa * global_letters_count                          0.034638
global_aoa * global_orthographic_density                   0.029315
global_aoa * global_synonyms_count                        -0.331871
global_aoa * rel_aoa                                       0.014774
global_aoa * rel_clustering                               -0.080466
global_aoa * rel_frequency                                -0.037859
global_aoa * rel_letters_count                             0.014585
global_aoa * rel_orthographic_density                      0.013980
global_aoa * rel_synonyms_count                            0.469447
global_clustering * global_frequency                       0.053226
global_clustering * global_letters_count                   0.091029
global_clustering * global_orthographic_density            0.757796
global_clustering * global_synonyms_count                  0.038494
global_clustering * rel_aoa                               -0.215269
global_clustering * rel_clustering                         0.021598
global_clustering * rel_frequency                         -0.352318
global_clustering * rel_letters_count                      0.136535
global_clustering * rel_orthographic_density              -0.265528
global_clustering * rel_synonyms_count                    -0.004833
global_frequency * global_letters_count                   -0.016773
global_frequency * global_orthographic_density             0.009881
global_frequency * global_synonyms_count                  -0.118147
global_frequency * rel_aoa                                 0.051513
global_frequency * rel_clustering                          0.090732
global_frequency * rel_frequency                          -0.004342
global_frequency * rel_letters_count                       0.034479
global_frequency * rel_orthographic_density                0.044157
global_frequency * rel_synonyms_count                     -0.160080
global_letters_count * global_orthographic_density        -0.096482
global_letters_count * global_synonyms_count              -0.573880
global_letters_count * rel_aoa                            -0.090796
global_letters_count * rel_clustering                      0.098249
global_letters_count * rel_frequency                       0.031563
global_letters_count * rel_letters_count                  -0.001208
global_letters_count * rel_orthographic_density            0.087453
global_letters_count * rel_synonyms_count                 -0.152087
global_orthographic_density * global_synonyms_count       -1.036730
global_orthographic_density * rel_aoa                     -0.297049
global_orthographic_density * rel_clustering              -0.534706
global_orthographic_density * rel_frequency               -0.148726
global_orthographic_density * rel_letters_count            0.165845
global_orthographic_density * rel_orthographic_density    -0.253846
global_orthographic_density * rel_synonyms_count           0.754509
global_synonyms_count * rel_aoa                            0.265440
global_synonyms_count * rel_clustering                     0.274769
global_synonyms_count * rel_frequency                      0.451192
global_synonyms_count * rel_letters_count                  0.641328
global_synonyms_count * rel_orthographic_density           0.804343
global_synonyms_count * rel_synonyms_count                -0.304402
rel_aoa * rel_clustering                                   0.156456
rel_aoa * rel_frequency                                   -0.012557
rel_aoa * rel_letters_count                                0.025929
rel_aoa * rel_orthographic_density                         0.197799
rel_aoa * rel_synonyms_count                              -0.424943
rel_clustering * rel_frequency                             0.173061
rel_clustering * rel_letters_count                        -0.238138
rel_clustering * rel_orthographic_density                  0.316317
rel_clustering * rel_synonyms_count                       -0.413106
rel_frequency * rel_letters_count                         -0.061489
rel_frequency * rel_orthographic_density                  -0.029418
rel_frequency * rel_synonyms_count                        -0.126733
rel_letters_count * rel_orthographic_density              -0.334528
rel_letters_count * rel_synonyms_count                    -0.051090
rel_orthographic_density * rel_synonyms_count             -0.816305
dtype: float64
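
Each printout above lists an intercept and one coefficient per predictor. The "with interactions" runs add one coefficient per pair of distinct predictors, labelled `a * b`, with no squared terms. The fitting code is not part of this output, so the following is only a minimal sketch of how such a table could be produced. It assumes ordinary least squares; the helper name `regress` and the synthetic data are hypothetical and are not taken from the notebook.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def regress(X, y, interactions=False):
    """Ordinary least squares of y on the columns of X (hypothetical helper).

    With interactions=True, the product of every pair of distinct columns
    is added as an extra predictor (no squared terms).
    Returns (r2, coefficients), coefficients being a labelled pandas Series.
    """
    names = list(X.columns)
    columns = [X[name].values.astype(float) for name in names]
    if interactions:
        for a, b in combinations(list(X.columns), 2):
            names.append('{} * {}'.format(a, b))
            columns.append(X[a].values * X[b].values)
    # Leading column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X))] + columns)
    y = np.asarray(y, dtype=float)
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design.dot(beta)
    r2 = 1 - residuals.dot(residuals) / ((y - y.mean()) ** 2).sum()
    return r2, pd.Series(beta, index=['intercept'] + names)


# Synthetic data, only to exercise the helper; not the notebook's data.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.normal(size=(200, 3)),
                 columns=['global_aoa', 'global_frequency', 'rel_aoa'])
y = 1.0 + 2.0 * X['global_aoa'] - 0.5 * X['global_aoa'] * X['rel_aoa']

r2_plain, coefs_plain = regress(X, y)
r2_inter, coefs_inter = regress(X, y, interactions=True)
print('R^2 = {}'.format(r2_inter))
print(coefs_inter)
```

When reading the R^2 values, keep in mind that for nested least-squares fits on the same data, R^2 cannot decrease when predictors are added. The gain from the interaction terms is therefore partly mechanical, especially with 78 predictors for roughly 200 measures.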

	aoa	betweenness	clustering	degree	frequency	letters_count	orthographic_density	pagerank	phonemes_count	phonological_density	syllables_count	synonyms_count
Component-0	-0.468881	0.324011	-0.100663	0.251706	0.275475	-0.423402	0.213411	0.275275	-0.366154	0.263802	-0.146755	-0.002217
Component-1	0.272132	-0.323434	0.098092	-0.267714	-0.302221	-0.428990	0.121751	-0.316236	-0.530175	0.214495	-0.147851	0.029822
Component-2	-0.672428	-0.106674	-0.005963	0.001734	-0.718328	0.082599	0.028255	-0.050776	0.083237	-0.031154	0.014735	0.045954
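
The table above has one row per principal component and one column per feature. Each entry is the weight of that feature in the component, and each row is a unit-length vector. A table of this shape could be built as sketched below, assuming the `PCA` class imported in the setup cell and three components (as in `N_COMPONENTS`). The helper name `component_loadings` and the random input are hypothetical, and the data the notebook actually fits the PCA on is not shown in this part of the output.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def component_loadings(features, n_components=3):
    """Fit a PCA and return its components as a labelled DataFrame.

    `features` is a DataFrame with one column per feature; the result has
    one row per component ('Component-0', ...) and the same columns.
    """
    pca = PCA(n_components=n_components)
    pca.fit(features.values)
    index = ['Component-{}'.format(i) for i in range(n_components)]
    return pd.DataFrame(pca.components_, index=index,
                        columns=features.columns)


# Random stand-in data, only to exercise the helper.
rng = np.random.RandomState(0)
features = pd.DataFrame(rng.normal(size=(100, 4)),
                        columns=['aoa', 'clustering', 'frequency',
                                 'letters_count'])
loadings = component_loadings(features)
print(loadings)
```

Because the rows are orthonormal, the squared weights in a row sum to 1, so a weight's square can be read as the share of that component carried by the feature.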