Differences between U.S. newspapers in their construal of risk

One specific area of interest is whether risk behaves differently in the six publications, irrespective of the year of a text's publication. In this investigation, we will focus mainly on dependency relationships with risk words, rather than constituencies. As such, most of our queries will search the dependency annotations in Stanford CoreNLP's XML structures rather than syntax trees; the main exception is the search for riskers, which uses Tregex.


In [1]:
# import module
from corpkit import *
# various wordlists
from dictionaries import *
# model corpora in data directory
corpora = Corpora('data')
# inline figures
%matplotlib inline


Corpus: /Volumes/extra/risk/data/CHT-parsed
Corpus: /Volumes/extra/risk/data/NYT-parsed
Corpus: /Volumes/extra/risk/data/TBT-parsed
Corpus: /Volumes/extra/risk/data/UST-parsed
Corpus: /Volumes/extra/risk/data/WAP-parsed
Corpus: /Volumes/extra/risk/data/WSJ-parsed

In [2]:
# load previous results
r = load_all_results()

# map results to some elaborated names
trans = {'adjrisk'        : 'Adjectives modifying risk',
         'risker'         : 'Subjects of risk',
         'riskers'        : 'Subjects of risk, run risk, take risk',
         'riskyx'         : 'Nouns modified by adjectival risk',
         'torisksomething': 'Objects of risk processes',
         'nommod'         : 'Nominal modifiers of risk participants',
         'risknoun'       : 'Nouns modified by nominal risk'}


17:51:32: adjrisk.p loaded as adjrisk.
17:52:49: entities.p loaded as entities.
17:52:51: nommod.p loaded as nommod.
17:53:06: nrelrisk.p loaded as nrelrisk.
17:53:30: nriskers.p loaded as nriskers.
17:53:46: prelrisk.p loaded as prelrisk.
17:54:13: priskers.p loaded as priskers.
17:54:31: relrisker.p loaded as relrisker.
17:54:33: riskers.p loaded as riskers.
17:54:58: risknoun.p loaded as risknoun.
17:55:00: riskyx.p loaded as riskyx.
17:55:03: torisksomething.p loaded as torisksomething.
17:55:03: 12 interrogations loaded from saved_interrogations.

In [3]:
# for moving results around and preserving order
from collections import OrderedDict

# table viewing parameters
import pandas as pd
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_columns', 30)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)
pd.set_option('expand_frame_repr', True)

# colouring -- requires seaborn
import seaborn as sns
# for colouring freq tables
cm = sns.light_palette("green", as_cmap=True)
# keyword tables
divergemap = sns.diverging_palette(10, 133, as_cmap=True)

sixcolours = sns.color_palette()

# display html table
from IPython.display import display, HTML

Our plan

What we're going to concentrate on are things that co-occur lexicogrammatically with risk as a participant and risk as a process. For example, we want to find common subjects and objects of risk processes, as well as how risk as a participant is modified.

In the sections below, we'll run all the investigations, and save the results, so that we can load them all as one item for editing and visualising. We'll first look at the data by feature, and then create sketches of the behaviour of risk in each publication.

The first interrogation, of subjects of risk processes, will feature a bit of commentary. The rest will follow the same structure, but will just be code.

Subjects of risk processes

Subjects of risk as process are best located with a Tregex query, as this can simultaneously locate those who risk, those who take risks, and those who run risks.


In [223]:
q = {T: r'/NN.?/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
        r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
        r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
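
To get a feel for the lexical part of that query, here is a minimal sketch in plain Python of the risk-word pattern on its own. It assumes the pattern is applied with search semantics, and the helper name is_risk_word is ours, not part of corpkit or Tregex:

```python
import re

# The lexical pattern used inside the Tregex query, applied with search
# semantics (an assumption about how it is matched against tokens).
RISK = re.compile(r'(?i).?\brisk.?')

def is_risk_word(token):
    """Return True if the token contains a word starting with 'risk'."""
    return RISK.search(token) is not None

# Hypothetical tokens, chosen to show what the word boundary excludes.
for token in ['risk', 'Risked', 'high-risk', 'brisk', 'asterisk']:
    print(token, is_risk_word(token))
```

The word boundary is what keeps words like brisk and asterisk out of the results, while hyphenated forms such as high-risk still match.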

We can then pass the query to an interrogation method, choosing to show the lemma form, and to save to disk as riskers.p:


In [ ]:
riskers = corpora.interrogate(q, show = L, quicksave = 'riskers')

In [ ]:
# this can take hours, depending on the corpora's size!
#risker = corpora.interrogate(q, show = L, save = 'risker')

The result is an Interrodict object, with attributes for each newspaper. If we wanted to quickly see the raw results, we could use topwords():


In [16]:
r.riskers.topwords()


CHT             %   NYT             %   TBT           %   UST           %   WAP             %   WSJ             %
person       4.12   person       3.22   person     3.78   person     3.48   person       3.67   investor     4.65
company      1.56   company      2.30   city       2.35   company    2.29   state        1.71   company      4.04
man          1.33   bank         1.36   man        1.33   investor   1.18   government   1.53   bank         2.85
investor     1.12   investor     1.15   ray        1.29   clinton    1.18   company      1.49   person       2.07
woman        1.07   state        1.15   county     1.29   state      1.18   president    1.07   firm         1.34
bush         0.99   government   1.14   state      0.98   bush       1.11   man          0.87   government   1.27
player       0.94   man          0.84   company    0.89   bank       0.96   investor     0.73   fund         1.05
bank         0.78   leader       0.83   team       0.84   woman      0.96   official     0.73   fed          1.05
government   0.76   president    0.77   american   0.80   driver     0.89   leader       0.73   u.s.         1.05
owner        0.73   woman        0.71   student    0.76   fed        0.74   bush         0.71   manager      0.78

Lots of pronouns are to be expected, as we didn't exclude them in our search. We can skip closed-class words when editing the results instead:


In [17]:
nopro = r.riskers.edit(skip_entries = wordlists.closedclass, print_info = False)
nopro.topwords(df = True).style.set_caption('Riskers')


Out[17]:
Riskers
     CHT                 NYT                 TBT                 UST                 WAP                 WSJ
     Result          %   Result          %   Result          %   Result          %   Result          %   Result          %
0    person       4.23   person        3.3   person        3.9   person       3.59   person       3.76   investor     4.74
1    company      1.61   company      2.36   city         2.43   company      2.37   state        1.75   company      4.12
2    man          1.37   bank         1.39   man          1.38   investor     1.22   government   1.57   bank          2.9
3    investor     1.15   investor     1.18   ray          1.33   clinton      1.22   company      1.53   person       2.11
4    woman         1.1   state        1.18   county       1.33   state        1.22   president    1.09   firm         1.36
5    bush         1.02   government   1.16   state        1.01   bush         1.15   man          0.89   government   1.29
6    player       0.96   man          0.86   company      0.92   woman        0.99   official     0.75   fund         1.07
7    bank          0.8   leader       0.85   team         0.87   bank         0.99   leader       0.75   fed          1.07
8    government   0.78   president    0.79   american     0.83   driver       0.92   investor     0.75   u.s.         1.07
9    owner        0.75   woman        0.72   student      0.78   fed          0.76   bush         0.73   manager      0.79
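
Conceptually, skipping entries just means dropping unwanted words before ranking. A minimal sketch with made-up counts and a tiny stand-in wordlist (the real wordlists.closedclass is much larger, and this is not corpkit's own implementation):

```python
from collections import Counter

# Hypothetical counts for one newspaper; real values come from the corpus.
counts = Counter({'he': 40, 'person': 30, 'they': 20, 'company': 10})

# A tiny stand-in for a closed-class wordlist.
closed_class = {'he', 'she', 'they', 'it', 'we'}

def skip_entries(counts, skip):
    """Return a new Counter without the entries listed in skip."""
    return Counter({w: c for w, c in counts.items() if w not in skip})

filtered = skip_entries(counts, closed_class)
print(filtered.most_common(2))
```

Because the skipped words no longer contribute to the total, the percentages of the remaining words go up slightly, which is what we see when comparing the two tables above.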

Another view is provided by the collapse() method:


In [18]:
# collapse rows (years)
nopro.collapse().results.top()
# top() avoids pandas' truncation of large results
# it responds to the number of max columns


Out[18]:
person company investor bank government state man woman bush president leader city player firm administration clinton team american fed official manager republican country owner fund member israel worker student democrat
CHT 158 60 43 30 29 22 51 41 38 17 27 25 36 12 21 20 20 17 13 15 18 9 13 28 12 9 14 12 12 10
NYT 187 134 67 79 66 67 49 41 38 45 48 19 38 28 38 30 27 30 18 32 29 28 17 18 23 28 26 26 19 27
TBT 85 20 8 10 10 22 30 11 9 11 11 53 12 6 8 6 19 18 1 9 3 5 8 13 3 11 7 10 17 4
UST 47 31 16 13 10 16 6 13 15 6 6 5 5 4 7 16 9 5 10 3 8 6 7 10 2 4 5 2 4 5
WAP 165 67 33 25 69 77 39 22 32 48 33 25 28 22 28 31 27 25 20 33 11 30 26 12 5 20 23 19 27 24
WSJ 85 166 191 117 52 20 18 26 21 17 18 6 9 55 25 22 9 14 43 11 32 21 27 9 43 14 10 13 2 9

In [8]:
# collapse columns (words)
nopro.collapse('x').results


Out[8]:
1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
CHT 151 145 129 136 154 135 144 166 146 133 107 127 184 156 161 147 183 167 146 126 158 110 89 70 81 85 114 85
NYT 202 190 198 217 194 198 180 179 167 177 190 218 219 210 212 235 232 213 202 216 179 238 229 237 228 206 188 114
TBT 113 110 113 119 120 87 83 93 65 73 58 86 76 77 77 75 71 78 56 77 74 83 112 42 45 32 35 47
UST 12 18 12 24 23 60 61 62 63 50 64 71 60 58 45 68 71 59 53 53 46 44 41 50 36 35 42 28
WAP 157 131 149 152 138 159 154 169 108 103 125 189 191 193 171 194 227 152 178 172 162 162 170 127 135 193 147 80
WSJ 114 99 134 119 96 118 108 125 109 98 118 124 133 117 103 137 144 162 168 140 189 206 202 172 200 208 193 196

In [9]:
# collapse dict key names (newspapers)
nopro.collapse('n').results.head(10).top()


Out[9]:
person company investor bank government state man woman bush president leader city player administration firm clinton team american fed official manager republican country owner fund member israel worker student democrat
1987 28 21 8 5 6 10 11 6 0 8 9 11 1 9 10 0 3 4 5 1 4 1 2 2 2 5 2 0 1 4
1988 22 11 8 5 7 9 4 4 5 5 3 8 6 3 4 0 6 1 2 3 1 0 1 3 0 2 3 3 1 4
1989 27 15 6 6 11 8 7 5 7 5 8 8 6 1 2 0 3 7 9 5 2 0 5 2 1 3 6 2 3 1
1990 26 14 4 7 16 16 7 5 18 4 4 7 3 8 10 0 2 6 2 2 6 0 8 5 0 2 8 5 7 1
1991 35 9 10 11 6 9 15 5 14 5 10 6 5 5 4 0 2 7 4 3 5 1 1 1 0 2 4 3 3 2
1992 26 16 16 11 3 10 4 12 18 8 4 6 4 2 2 11 1 6 6 0 3 1 5 5 0 0 0 4 5 6
1993 23 19 10 8 8 7 10 6 0 4 1 4 8 7 2 26 2 4 1 2 4 2 1 6 1 7 3 2 2 4
1994 29 14 19 7 8 8 4 12 1 5 2 5 4 8 1 18 7 3 5 5 5 1 4 7 10 7 1 3 4 0
1995 27 11 8 17 5 11 8 6 0 8 2 4 3 8 2 13 5 1 4 7 4 3 1 8 1 2 2 4 2 1
1996 13 20 14 6 6 10 6 11 0 1 2 6 3 4 1 10 3 3 4 1 4 4 3 3 1 1 3 3 2 0
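
The three views above each sum the data along a different axis. A minimal sketch of the idea, using nested dictionaries and invented counts rather than corpkit's Interrodict (the function names here are ours):

```python
from collections import Counter

# Hypothetical nested counts: newspaper -> year -> word -> frequency.
data = {
    'CHT': {1987: {'person': 3, 'bank': 1}, 1988: {'person': 2}},
    'WSJ': {1987: {'investor': 4}, 1988: {'bank': 5, 'person': 1}},
}

def collapse_years(data):
    """Sum over years: one row per newspaper, one column per word."""
    out = {}
    for paper, years in data.items():
        row = Counter()
        for words in years.values():
            row.update(words)
        out[paper] = dict(row)
    return out

def collapse_words(data):
    """Sum over words: one row per newspaper, one column per year."""
    return {paper: {year: sum(words.values()) for year, words in years.items()}
            for paper, years in data.items()}

def collapse_papers(data):
    """Sum over newspapers: one row per year, one column per word."""
    out = {}
    for years in data.values():
        for year, words in years.items():
            out.setdefault(year, Counter()).update(words)
    return {year: dict(row) for year, row in out.items()}

print(collapse_years(data))
print(collapse_words(data))
print(collapse_papers(data))
```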

Recent versions of pandas allow some highlighting of values, too:


In [21]:
nopro.collapse().results.iloc[:,:10].style.background_gradient(cmap=cm, axis = 0)


Out[21]:
person company investor bank government state man woman bush president
CHT 158.0 60.0 43.0 30.0 29.0 22.0 51.0 41.0 38.0 17.0
NYT 187.0 134.0 67.0 79.0 66.0 67.0 49.0 41.0 38.0 45.0
TBT 85.0 20.0 8.0 10.0 10.0 22.0 30.0 11.0 9.0 11.0
UST 47.0 31.0 16.0 13.0 10.0 16.0 6.0 13.0 15.0 6.0
WAP 165.0 67.0 33.0 25.0 69.0 77.0 39.0 22.0 32.0 48.0
WSJ 85.0 166.0 191.0 117.0 52.0 20.0 18.0 26.0 21.0 17.0

This works better, however, when we have relative, rather than absolute, frequencies:


In [22]:
rel = nopro.collapse().edit('%', SELF, print_info = False)
#.results * 100.0 / nopro.collapse().results.sum()
rel.results.T.head(10).style.background_gradient(cmap=cm, axis = 1).set_caption('Riskers')


Out[22]:
Riskers
CHT NYT TBT UST WAP WSJ
person 4.23 3.3 3.9 3.59 3.76 2.11
company 1.61 2.36 0.92 2.37 1.53 4.12
investor 1.15 1.18 0.37 1.22 0.75 4.74
bank 0.8 1.39 0.46 0.99 0.57 2.9
government 0.78 1.16 0.46 0.76 1.57 1.29
state 0.59 1.18 1.01 1.22 1.75 0.5
man 1.37 0.86 1.38 0.46 0.89 0.45
woman 1.1 0.72 0.51 0.99 0.5 0.64
bush 1.02 0.67 0.41 1.15 0.73 0.52
president 0.46 0.79 0.51 0.46 1.09 0.42
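
The relative frequencies here express each word as a percentage of all results for that newspaper, which makes papers of very different sizes comparable. A minimal sketch with invented counts (to_percentages is a hypothetical helper, not a corpkit function):

```python
def to_percentages(table):
    """Convert each newspaper's counts to percentages of its own total."""
    result = {}
    for paper, words in table.items():
        total = float(sum(words.values()))
        result[paper] = {
            word: round(100.0 * count / total, 2) if total else 0.0
            for word, count in words.items()
        }
    return result

# Hypothetical absolute frequencies for two newspapers.
absolute = {
    'CHT': {'person': 30, 'company': 10},
    'WSJ': {'person': 5, 'investor': 15},
}
print(to_percentages(absolute))
```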

We can already see some big differences between the Wall Street Journal and the other publications. We'll return to these later, after running the remaining interrogations.

Interrogating the corpus

Below, we collect a number of lexicogrammatical phenomena in clauses containing risk.


In [ ]:
# objects of risk: to risk LIFE
q = {F: '(dobj|.subjpass)', GL: r'\brisk'}
toriskx = corpora.interrogate(q, show = L, quicksave = 'toriskx')

In [ ]:
# nominal modifiers of nominal risk: a SECURITY risk
q = {F: '^(compound|nn)$', GL: r'\brisk'}
nommod = corpora.interrogate(q, show = L, quicksave = 'nommod')

In [ ]:
# adjectives modifying risk: a BIG risk
q = {F: 'amod', GL: r'\brisk'}
adjrisk = corpora.interrogate(q, show = L, quicksave = 'adjrisk')

In [ ]:
# nouns modified by adjectival risk: a risky DECISION
q = {F: 'amod', L: r'\brisk'}
riskyx = corpora.interrogate(q, show = GL, quicksave = 'riskyx')

In [ ]:
# objects of risk processes: to risk a LIFE
q = {F: '(dobj|.subjpass)', GL: r'\brisk'}
torisksomething = corpora.interrogate(q, show = L, quicksave = 'torisksomething')

In [ ]:
# nouns modified by nominal risk: risk MANAGEMENT
q = {F: '^(compound|nn)$', L: r'\brisk'}
risknoun = corpora.interrogate(q, show = GL, quicksave = 'risknoun')
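
Each of these queries constrains the dependency function and either the governor or the dependent, and then shows the other side of the relation. Here is a minimal sketch of that logic over a handful of invented dependency triples. The function match_dependencies is hypothetical, and we assume regular expressions are applied with search semantics; corpkit's actual behaviour may differ:

```python
import re
from collections import Counter

# Hypothetical dependency triples: (function, governor lemma, dependent lemma).
deps = [
    ('compound', 'risk', 'security'),    # a security risk
    ('amod', 'risk', 'big'),             # a big risk
    ('compound', 'management', 'risk'),  # risk management
    ('amod', 'decision', 'risky'),       # a risky decision
    ('dobj', 'risk', 'life'),            # to risk a life
    ('dobj', 'eat', 'apple'),            # unrelated clause
]

def match_dependencies(deps, function, governor=None, dependent=None,
                       show='dependent'):
    """Count the shown lemma in every triple matching the given regexes."""
    hits = Counter()
    for func, gov, dep in deps:
        if not re.search(function, func):
            continue
        if governor and not re.search(governor, gov):
            continue
        if dependent and not re.search(dependent, dep):
            continue
        hits[dep if show == 'dependent' else gov] += 1
    return hits

# nominal modifiers of nominal risk: a SECURITY risk
print(match_dependencies(deps, r'^(compound|nn)$', governor=r'\brisk'))
# nouns modified by adjectival risk: a risky DECISION
print(match_dependencies(deps, 'amod', dependent=r'\brisk', show='governor'))
```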

Charting behaviour of risk

With all of this data, we can produce a sketch of the behaviour of risk in each publication using some pandas objects, and visualise the results using corpkit's visualise() method.


In [19]:
%matplotlib inline
for name, data in r.items():
    # skip results we're not interested in
    if name not in trans.keys():
        continue
    # refresh class attributes
    data = Interrodict(data)
    data = data.collapse()
    data = data.edit('%', SELF, skip_entries = wordlists.closedclass, print_info = False)
    tab = data.results.T.head(15).style.background_gradient(cmap=cm, axis = 1).set_caption(trans[name])

    # note, log y axes are turned on, because of outlier results!
    plt = data.results.visualise(trans[name], kind='bar', grid = True, figsize = (11, 5), logy = True, ncol = 1,
                        x_label = 'Publication', num_to_plot = 15, legend_pos = 'outside right', fontsize = 16)
    plt.show() 
    
    plt = data.results.visualise(trans[name], kind='bar', grid = True, figsize = (11, 5), ncol = 1,
                                 x_label = 'Publication', num_to_plot = 15, stacked = True,
                                 legend_pos = 'outside right', fontsize = 16, reverse_legend = True)
    plt.show()  
    display(tab)


Nouns modified by nominal risk
CHT NYT TBT UST WAP WSJ
factor 27.66 19.47 20.08 28.67 23.22 8.94
management 6.02 6.94 9.62 3.14 5.95 8.4
assessment 3.05 3.89 3.44 2.55 6.64 2.22
premium 0.69 1.79 0.36 0.63 1.11 6.42
manager 1.43 1.83 5.28 1.0 1.14 2.28
taker 1.73 2.78 1.93 3.77 1.38 1.17
group 1.94 2.12 2.3 1.78 1.85 1.41
tolerance 2.84 1.1 0.52 2.72 0.92 2.35
profile 1.26 1.26 0.41 1.29 1.32 3.21
business 3.54 1.57 1.96 2.63 1.11 0.76
aversion 0.43 1.04 0.08 0.83 0.69 3.67
analysis 1.08 1.73 0.84 1.73 2.44 1.1
level 1.36 1.3 1.09 1.31 1.45 1.63
officer 0.5 1.68 0.23 0.8 0.84 2.09
pool 1.02 1.33 1.7 1.31 1.43 1.1
Nominal modifiers of risk participants
CHT NYT TBT UST WAP WSJ
health 21.27 17.04 25.94 15.34 19.78 8.36
cancer 7.75 6.39 6.25 10.92 6.59 3.14
security 4.67 6.55 6.54 3.97 7.96 2.48
credit 3.79 3.94 2.0 2.26 4.11 9.27
downside 1.97 2.57 0.59 1.76 1.94 5.49
heart 2.69 3.29 1.83 6.35 2.14 1.53
safety 2.93 2.6 3.53 2.48 3.29 1.67
flight 3.53 2.1 4.49 0.87 2.73 0.65
market 1.33 1.46 0.86 0.85 1.28 2.84
inflation 0.82 0.92 0.15 0.96 0.95 3.39
currency 0.66 1.52 0.19 0.81 0.47 3.01
breast 2.06 1.39 0.66 4.32 1.44 0.19
investment 1.3 1.09 0.8 0.97 1.22 1.88
disease 1.45 1.27 1.13 2.63 1.51 0.42
business 0.68 0.8 0.79 0.28 0.76 1.39
Nouns modified by adjectival risk
CHT NYT TBT UST WAP WSJ
investment 3.33 3.68 3.38 4.64 3.17 5.04
asset 0.59 1.95 0.17 1.64 0.87 6.89
business 3.43 2.83 2.92 2.28 2.57 2.15
behavior 3.66 2.95 2.5 4.01 3.37 1.11
loan 1.77 2.1 1.61 1.68 2.23 3.04
bond 1.18 1.66 0.98 1.23 1.25 4.19
group 2.55 2.2 2.36 1.64 2.5 0.87
child 2.32 0.8 4.64 1.24 2.92 0.22
strategy 1.36 2.05 0.83 1.8 1.47 2.13
student 1.94 0.74 5.68 1.11 2.58 0.16
youth 2.03 0.65 3.74 1.26 3.4 0.22
area 1.42 1.52 2.17 2.16 1.28 1.17
security 0.99 1.45 1.1 0.86 0.99 2.05
venture 1.53 1.58 1.51 0.9 1.27 0.95
patient 1.47 1.23 1.32 1.68 1.25 0.99
Adjectives modifying risk
CHT NYT TBT UST WAP WSJ
high 6.84 6.89 8.91 7.63 6.71 4.31
increase 7.02 5.05 5.17 6.52 5.92 4.37
greater 5.65 4.55 6.14 5.5 4.98 3.73
higher 5.51 3.23 4.27 6.51 3.92 4.18
potential 2.79 2.65 3.28 2.61 2.72 2.87
political 1.98 3.1 1.66 1.57 2.93 3.31
financial 2.24 2.41 2.86 1.49 2.17 2.99
significant 1.95 2.15 1.7 1.64 2.28 2.21
big 1.84 2.01 1.94 2.69 1.45 2.64
serious 2.11 2.08 1.97 1.58 2.28 1.29
lower 2.14 1.48 1.46 2.18 2.0 1.62
great 1.77 1.88 2.51 1.45 2.18 0.81
little 1.63 1.76 1.72 1.6 1.43 1.54
real 1.34 1.93 1.3 1.7 1.72 1.45
own 1.94 1.41 3.07 1.49 1.46 1.08
Objects of risk processes
CHT NYT TBT UST WAP WSJ
life 15.99 14.44 22.8 16.2 16.57 6.91
injury 3.0 2.13 3.96 3.27 2.44 0.43
loss 1.43 1.83 1.3 1.46 1.43 1.91
benefit 1.21 1.52 0.73 1.63 1.35 2.1
death 1.2 1.29 1.17 1.25 1.24 0.67
money 0.9 1.02 1.09 0.85 1.01 1.8
career 0.98 0.98 1.14 1.34 1.28 0.58
damage 1.13 0.95 0.98 1.25 1.13 0.75
wrath 1.06 0.98 0.8 1.21 0.98 0.96
health 1.06 0.95 1.27 1.04 0.82 0.49
fine 1.11 0.83 1.08 1.15 0.97 0.4
reward 0.74 0.75 0.54 1.27 0.74 1.42
reputation 0.8 0.91 0.8 0.93 0.79 0.72
arrest 0.87 0.91 0.72 0.74 0.99 0.36
risk 0.88 0.57 0.52 0.98 0.77 1.24
Subjects of risk, run risk, take risk
CHT NYT TBT UST WAP WSJ
person 4.23 3.3 3.9 3.59 3.76 2.11
company 1.61 2.36 0.92 2.37 1.53 4.12
investor 1.15 1.18 0.37 1.22 0.75 4.74
bank 0.8 1.39 0.46 0.99 0.57 2.9
government 0.78 1.16 0.46 0.76 1.57 1.29
state 0.59 1.18 1.01 1.22 1.75 0.5
man 1.37 0.86 1.38 0.46 0.89 0.45
woman 1.1 0.72 0.51 0.99 0.5 0.64
bush 1.02 0.67 0.41 1.15 0.73 0.52
president 0.46 0.79 0.51 0.46 1.09 0.42
leader 0.72 0.85 0.51 0.46 0.75 0.45
city 0.67 0.34 2.43 0.38 0.57 0.15
player 0.96 0.67 0.55 0.38 0.64 0.22
administration 0.56 0.67 0.37 0.53 0.64 0.62
firm 0.32 0.49 0.28 0.31 0.5 1.36

Keywording

We can also generate keywords for each newspaper, using the total dataset as the reference corpus. A diverging colour scheme shows at a glance what is key (positive scores) and unkey (negative scores).


In [24]:
for name, data in r.items():
    # skip results we're not interested in
    if name not in trans.keys():
        continue
    # refresh class attributes
    data = Interrodict(data)
    data = data.collapse()
    data = data.edit('k', SELF, skip_entries = wordlists.closedclass,
                     replace_names = r'[^a-zA-Z0-9-]', print_info = False)
    tab = data.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption(trans[name])
    display(tab)


Nouns modified by nominal risk
CHT NYT TBT UST WAP WSJ
factor 624.75 13.36 14.27 233.29 170.65 -1663.58
premium -258.55 -45.38 -201.48 -85.1 -140.4 1434.85
appetite -144.46 -140.64 -157.13 -35.33 -229.75 1296.5
aversion -141.43 -26.76 -161.93 -14.01 -73.38 775.49
business 263.07 -0.45 5.02 24.01 -26.6 -160.94
profile -16.98 -22.98 -96.79 -4.02 -12.87 329.44
forecast -32.55 -43.46 -14.36 -7.12 -32.21 233.35
officer -74.28 30.29 -84.23 -6.51 -18.97 152.59
agency 167.06 -6.94 -11.57 -7.9 -5.42 -33.72
consultants 158.45 -7.89 -0.45 -11.3 -18.29 -32.25
control -40.07 9.45 -28.39 -20.72 -57.97 176.71
provision -18.38 -7.59 -19.51 -16.32 -36.59 170.58
weighting -43.99 -2.04 -15.89 -4.86 -29.73 140.99
sentiment -23.08 -21.97 -11.99 -7.31 -15.84 139.08
child 76.95 -34.3 28.57 -1.04 8.43 -109.33
resident -24.71 -33.29 -6.96 -2.43 262.12 -60.45
reversal -20.92 -19.39 -10.86 -0.56 -20.73 116.38
exposure -17.7 -8.43 -0.54 -5.51 -13.65 123.58
asset -22.89 -8.98 -26.15 8.23 -19.97 107.52
taker -0.31 90.58 0.86 75.47 -15.38 -74.78
neglect -17.67 -15.57 -4.06 -5.59 154.13 -32.91
capital -12.35 -0.05 -10.72 -5.99 -5.99 76.37
disclosure -3.87 -6.13 -20.37 -1.53 -4.74 72.09
trade -8.79 -9.01 -14.05 0.17 -8.63 68.35
weight -14.06 -11.27 -10.86 -6.62 0.27 51.64
inc 12.75 -46.96 2.39 -28.43 -47.08 92.27
parity -12.98 -4.07 -6.74 -4.11 -12.87 64.15
mom 61.26 -7.86 -3.18 -1.94 -6.08 -11.42
manager -22.6 -2.03 290.93 -24.07 -57.54 10.35
arbitrager -21.84 12.14 -25.1 -15.3 -47.9 67.27
management -12.49 0.54 79.24 -101.66 -16.34 87.85
assessment -6.99 9.28 -0.0 -10.32 341.18 -151.42
firm 17.45 24.6 -20.34 -0.06 0.28 -24.23
regulator -38.95 9.43 -20.23 -6.64 20.48 0.45
perception -22.33 1.28 -15.65 -0.18 0.18 23.88
spectrum -9.04 0.01 -8.23 -3.7 -6.6 42.16
ltd -0.44 -6.47 -13.67 -8.33 -1.73 43.38
area 17.02 -10.45 39.47 1.12 0.03 -27.61
department -0.37 4.51 83.56 -15.51 -35.01 -2.66
rating -5.81 0.19 -17.37 0.0 -1.97 36.92
intelligence -9.02 -3.0 -4.68 -0.13 -3.8 33.39
tool 0.02 -3.25 -9.64 -1.55 -5.59 42.1
clinic 33.37 -9.24 0.76 -2.28 -0.67 -13.43
romance 36.04 -4.62 -1.87 -1.14 -3.57 -6.72
cos 36.04 -4.62 -1.87 -1.14 -3.57 -6.72
subc -6.85 -8.78 -3.56 -2.17 68.77 -12.76
seen -10.46 64.18 -5.43 -3.31 -4.93 -8.6
analysis -11.7 12.38 -19.15 3.23 90.24 -24.33
behavior 2.32 -1.08 5.86 3.84 31.48 -55.18
trading -0.91 -2.38 -1.7 -9.02 -16.11 47.12
Nominal modifiers of risk participants
CHT NYT TBT UST WAP WSJ
health 334.74 28.7 427.18 -0.46 176.93 -1130.47
credit -38.48 -30.97 -149.54 -119.28 -15.4 1059.84
inflation -51.04 -40.36 -134.54 -12.42 -28.71 663.79
downside -41.41 -1.75 -179.29 -28.61 -41.84 694.28
interest-rate -55.24 -31.65 -53.35 -36.12 -91.6 659.25
flight 163.41 0.26 164.02 -64.22 34.54 -352.71
currency -72.01 6.03 -110.04 -17.11 -116.81 521.68
cancer 105.69 12.48 2.28 266.86 17.29 -412.39
heart-attack -39.38 -70.34 -11.81 -6.33 -32.37 353.54
percent 65.75 39.19 0.44 -79.15 49.77 -313.37
breast 69.42 1.14 -28.83 335.68 2.36 -387.21
counterparty -39.67 -6.73 -41.52 -41.37 -45.11 320.89
security -1.35 118.97 39.46 -12.86 295.16 -392.26
default -5.03 -23.67 -57.17 -8.63 -12.07 269.3
market -7.1 -1.76 -28.25 -29.19 -9.79 257.18
title 190.44 -34.7 -0.45 -18.39 -0.36 -55.63
foreign-exchange -32.63 -8.88 -14.17 -4.96 -23.06 153.8
prepayment -4.16 -6.11 -20.22 -25.01 -30.04 171.06
stock-market -25.42 -21.82 -2.87 -5.52 -6.67 118.52
disease 13.05 2.83 -0.03 106.25 18.27 -167.8
escape -5.38 -14.22 347.76 -5.78 -3.16 -90.09
price -0.03 -19.8 -17.58 -3.11 -11.46 126.64
litigation -17.8 0.01 -23.97 -10.74 -5.53 117.0
trading -14.55 0.0 -19.21 -5.84 -38.49 135.56
liquidity -17.58 0.84 -16.12 -24.7 -19.08 121.68
national-security -22.39 -5.7 -9.72 -9.69 -7.13 98.61
execution -11.61 -0.0 -19.5 -6.02 -28.67 122.64
exchange-rate -14.82 -4.11 -9.18 -3.51 -10.56 98.2
attack 4.85 0.15 -0.13 217.98 0.0 -148.88
event -5.45 -0.1 -33.28 -2.73 -16.06 97.33
country -12.2 0.15 -16.12 -4.75 -11.55 89.08
business -5.47 -0.26 -0.15 -35.79 -1.11 96.07
heart 2.75 51.42 -14.42 318.29 -8.58 -120.9
bank -21.35 0.18 -4.57 -4.53 -15.99 75.99
investment 0.3 -4.19 -13.5 -5.13 -0.11 82.63
taxpayer -14.02 -17.41 -0.03 -1.32 -1.13 61.72
safety 15.86 2.1 31.19 0.03 45.02 -79.36
suicide -0.13 11.04 61.59 -1.3 6.38 -58.62
balance-sheet -3.7 -14.48 -5.44 -5.42 -6.08 64.46
own -0.43 34.05 14.06 -10.26 3.42 -50.06
specialty 60.9 -10.09 -3.79 -3.78 -1.15 -13.32
injury 4.8 -2.22 1.35 89.52 7.35 -86.49
stock -0.07 -12.91 -3.46 0.2 -21.64 68.76
sinkhole -19.35 -22.38 258.81 -8.37 -18.2 -29.54
fire 0.85 4.77 17.77 31.25 0.5 -60.29
real-estate -3.92 -10.53 -3.95 -3.94 -8.56 55.77
housing -11.76 -4.39 -5.11 -1.17 -1.15 42.04
foreign-currency -6.06 -4.39 -5.11 -1.17 -11.06 53.02
lifetime 9.63 0.1 1.93 17.32 17.32 -57.24
refinancing -6.4 -0.98 -7.08 -2.48 -9.11 50.26
Nouns modified by adjectival risk
CHT NYT TBT UST WAP WSJ
asset -479.53 -43.28 -488.15 -31.05 -389.04 2635.33
bond -88.76 -17.61 -86.68 -27.95 -89.21 866.27
student 18.62 -138.75 884.34 -11.7 148.74 -849.32
child 62.06 -133.39 500.18 -7.65 228.93 -790.5
debt -29.65 -27.25 -75.45 -38.0 -51.26 493.08
behavior 107.03 19.43 -0.09 61.78 69.09 -440.28
youth 31.72 -171.55 289.8 -4.74 465.38 -736.97
currency -55.02 -43.23 -32.99 -20.67 -48.55 357.99
bet -56.59 -0.27 -96.24 0.13 -32.85 349.78
group 60.91 20.49 17.94 -1.38 59.9 -276.64
kid 13.52 -145.19 314.72 4.51 105.06 -370.19
pregnancy 56.01 2.95 77.9 -10.77 3.61 -190.88
slice -38.17 -15.8 -22.89 -14.34 -25.88 206.08
offender -2.81 85.88 32.93 3.23 0.0 -174.47
treasurys -23.73 -27.59 -14.23 -8.92 -27.61 177.74
company -39.02 -0.5 -19.04 0.07 -8.74 143.77
market -4.4 -8.2 -31.65 -0.34 -17.99 169.24
family 45.67 -4.51 62.95 -8.26 4.28 -109.09
investment -10.3 -0.27 -5.93 16.71 -25.88 162.11
security -19.43 3.9 -5.43 -13.97 -22.75 154.25
loan -15.44 -0.22 -20.16 -8.49 0.72 136.73
lifestyle 134.59 -23.0 -0.0 -3.23 -2.9 -27.99
inmate 2.56 -6.07 166.38 0.72 -0.15 -116.34
youngster 47.31 -17.26 24.19 -2.28 11.91 -95.14
investor -6.31 0.11 -43.51 3.42 -47.62 135.18
reinsurance -11.34 -13.93 -10.72 -6.72 -20.81 113.13
capital -22.39 -3.61 2.34 -17.95 -25.25 112.47
sex 20.09 32.59 -4.41 7.75 -0.35 -79.54
c-fund -19.94 -23.2 -11.96 -7.49 198.06 -37.44
fund -6.87 -2.95 -29.3 4.94 -6.84 104.75
woman 45.73 -0.04 0.01 21.68 6.51 -78.23
situation 33.8 5.44 41.25 -2.49 -0.59 -68.74
firm -12.63 -3.45 -42.32 0.48 -1.52 88.83
business 68.72 9.74 6.54 -1.96 0.11 -34.28
borrower -0.85 -11.57 -55.38 -2.19 7.94 81.32
profile -2.08 -6.57 -24.23 -0.89 -6.59 89.25
teens 21.43 -44.05 108.48 1.68 0.1 -79.18
requirement -27.3 -1.89 -0.04 -9.24 -4.72 67.83
ratio -12.3 -0.0 -3.69 -0.27 -15.15 87.54
stock 0.35 -20.62 -27.17 23.01 -6.45 69.99
category 34.65 1.33 21.89 0.01 -0.29 -47.9
product -21.89 0.33 -22.14 6.77 -5.06 62.53
return -4.31 -0.01 -55.89 20.21 -32.35 85.85
neighborhood 16.43 0.06 8.39 2.26 1.64 -66.47
mortgage -15.05 -2.59 -2.78 -0.0 0.03 64.36
individual 19.07 -0.0 3.01 0.01 25.3 -70.07
assumption -0.9 93.57 -4.32 -5.14 -7.35 -14.55
type -4.63 -4.6 -5.85 -0.28 -5.36 68.99
sector -7.32 -8.47 -12.41 -4.45 0.18 58.94
trade -10.75 3.27 -39.15 1.46 -10.7 60.01
Adjectives modifying risk
CHT NYT TBT UST WAP WSJ
systemic -345.27 -5.71 -177.17 -95.21 -10.56 1051.43
high 28.22 41.85 189.21 47.12 18.78 -371.77
interest-rate -22.96 -34.5 -30.9 -30.39 -40.71 362.9
increase 227.24 -10.38 -0.69 40.46 32.58 -98.92
great 5.54 24.24 75.52 -2.89 85.2 -288.52
-58.72 -120.28 -29.77 323.43 -35.12 91.7
excessive -86.04 19.78 -87.25 -3.12 -1.7 162.25
regulatory -22.7 -2.77 -18.77 -22.22 -30.78 222.81
greater 92.51 -2.17 79.67 23.4 11.14 -110.19
genuine 43.71 21.73 3.27 16.03 -7.04 -145.87
sovereign -49.13 0.11 -30.51 -1.24 -34.48 155.28
geopolitical -30.23 -12.11 -41.02 28.47 -19.12 143.44
breast-cancer -4.72 -39.7 -3.51 -1.99 -31.95 173.15
own 49.14 -5.66 226.19 -0.05 -1.27 -79.0
greatest 13.53 6.46 20.83 13.84 38.16 -170.84
political -65.94 60.95 -68.28 -70.54 21.26 121.89
foreign-exchange -24.86 -11.85 -10.77 -4.13 -27.15 143.83
reputational -31.79 0.04 -38.5 -10.85 -2.91 114.75
biggest -3.25 -0.83 -21.59 14.26 -40.97 161.83
calculated 15.66 9.36 53.43 2.17 1.23 -116.76
highest 32.3 2.86 11.28 51.21 -0.0 -110.67
inflationary -20.99 -6.28 -13.12 -3.79 -1.8 108.44
economic -18.2 4.45 -17.86 -11.06 -3.8 107.2
cardiovascular -16.46 0.04 -43.49 12.31 -9.18 94.95
near-term -8.33 -4.32 -20.4 -1.27 -12.77 109.64
legal -24.07 2.65 -10.1 -2.38 -4.68 95.38
perceive -3.43 -2.73 -22.17 -14.4 -0.64 108.09
key -14.74 -20.02 -4.53 0.21 -1.44 87.16
coronary -0.37 104.61 0.25 -0.36 -33.23 -31.32
corporate -35.19 -1.17 -1.24 -6.94 -1.26 76.26
grave 0.41 71.17 -12.44 -2.61 25.18 -83.82
serious 19.77 21.24 2.06 -4.98 51.05 -97.68
new -21.82 7.11 -32.59 -0.24 -0.77 76.93
physical 4.21 17.52 2.26 -0.16 10.8 -89.49
emotional 39.99 10.5 -11.54 -2.28 0.31 -52.94
big -4.29 0.15 -0.1 35.89 -75.0 130.28
upside -5.86 -3.18 -24.14 -0.11 -13.03 89.23
financial -1.89 0.66 19.38 -54.17 -6.35 107.6
less -0.4 -0.07 -3.21 0.79 -17.21 97.64
rise -18.53 0.6 -12.28 -0.29 -11.62 74.92
artistic 28.96 29.58 -1.52 -2.64 -15.47 -30.18
grow -11.13 0.19 -25.24 -0.06 -1.02 66.6
provision -8.66 -12.37 -3.75 -3.22 -9.46 69.4
global -15.45 -1.07 -8.84 0.0 -0.9 60.55
public 8.21 -8.05 43.7 3.44 32.4 -84.77
additional -13.83 -0.02 3.1 -0.01 -11.59 68.33
operational -6.85 -3.4 -12.23 -4.85 -0.01 59.21
bigger -2.34 -2.35 -1.85 6.43 -15.84 71.89
higher 186.9 -143.79 0.74 181.63 -5.6 0.59
main -5.26 0.07 -10.95 -0.25 -7.96 59.65
Objects of risk processes
CHT NYT TBT UST WAP WSJ
life 26.81 -0.12 406.84 10.3 53.14 -599.24
injury 35.98 -3.18 100.17 18.77 1.37 -272.51
capital -7.92 -0.07 -22.23 -1.47 -14.45 182.2
management -8.29 -10.49 2.69 -12.74 -17.1 143.94
appetite -11.47 -9.49 -6.83 -3.49 -13.07 98.39
return -6.95 0.33 -13.64 0.13 -7.56 88.48
cost -5.66 -0.27 -0.28 0.18 -5.3 68.03
inflation -7.75 1.07 -29.18 0.07 -1.03 57.2
aversion -7.87 -0.98 -8.29 -4.24 -6.0 62.2
profit -8.68 0.66 -9.51 -3.76 -5.14 60.89
limb 33.77 -6.18 25.08 -0.82 0.0 -31.31
money -4.15 -0.29 0.13 -2.01 -0.46 55.96
premium -2.42 -2.69 -3.35 -2.98 -3.93 57.37
reward -0.97 -1.33 -10.25 11.59 -1.65 48.84
backlash -1.24 -11.16 -18.69 3.58 13.9 32.42
benefit -2.68 5.08 -33.93 2.9 -0.03 45.61
market -4.39 -1.45 -6.16 -1.22 -0.63 46.24
downgrade -5.28 -1.25 -10.97 0.03 -1.7 42.96
fine 12.33 -0.37 5.21 4.48 2.61 -37.19
price 0.64 -7.07 -5.62 -2.98 -3.93 45.83
debt -4.81 -14.19 -1.79 0.15 -0.01 33.26
decline -6.69 0.15 -1.98 -0.62 -5.46 40.08
ire -3.38 -0.28 -1.69 -1.21 -0.04 38.81
risk 3.77 -11.62 -8.35 3.12 0.02 33.37
safety 5.01 -3.28 65.77 -0.2 -1.44 -21.57
spiral -13.11 1.5 -7.8 0.0 -3.04 23.65
health 6.77 1.32 15.79 1.52 -1.0 -24.3
recession 0.63 -1.19 -17.26 -0.0 -0.16 31.28
volatility 0.31 -0.14 -16.1 -0.29 -8.75 35.14
arrest 2.23 6.57 -0.35 -0.04 11.55 -32.42
profile -0.85 -2.96 -7.8 0.0 -0.56 27.43
suspension 11.0 -2.27 1.37 6.35 0.04 -22.47
cancer 27.04 -1.56 -1.08 0.94 -2.41 -3.03
bone 24.31 -0.76 -2.29 0.06 -4.0 -3.38
tempo 26.98 -4.54 -1.95 -1.0 -3.73 -2.39
death 1.66 7.9 0.4 1.07 3.13 -23.99
investor -1.11 -0.59 -6.58 -3.37 -0.64 25.77
marriage 22.43 -1.61 -1.11 -0.29 -0.05 -4.13
control -9.57 8.38 -6.89 -3.64 -2.82 21.59
friendship 16.65 0.0 -1.08 -1.43 -0.19 -7.24
rate 0.77 -2.54 -0.09 -3.42 -4.33 26.23
deflation -1.24 -2.55 -3.17 -1.62 -0.25 20.72
sentence -0.0 0.2 38.33 -14.21 -0.03 -12.82
dilution -0.01 -1.79 -4.15 -2.12 -3.0 23.19
uncertainty -6.33 14.07 -12.36 -2.89 -4.39 25.73
sector -2.05 -2.84 -1.22 -0.62 -2.33 19.76
illiquidity -2.05 -2.84 -1.22 -0.62 -2.33 19.76
season 4.88 1.08 0.08 0.78 -0.7 -16.41
embarrassment 6.15 7.39 -5.5 2.71 -0.02 -13.13
technology -4.51 0.04 -2.68 -1.37 -5.13 19.63
Subjects of risk, run risk, take risk
CHT NYT TBT UST WAP WSJ
investor -8.12 -12.52 -35.54 -1.92 -34.3 213.49
cub 83.76 -14.9 -5.19 -3.05 -11.11 -9.99
bank -9.07 0.64 -16.83 -1.01 -26.74 82.59
bear 62.13 -3.15 -6.49 -3.81 -13.89 -6.65
company -8.7 0.44 -24.09 0.09 -14.11 67.6
firm -6.58 -1.45 -5.17 -2.38 -0.9 40.1
sox 34.7 -7.02 0.33 -0.08 -9.72 -8.74
village 35.21 -8.07 -2.81 -1.65 -1.66 -5.41
fed -2.05 -5.36 -15.76 1.79 -0.17 27.48
person 8.66 -0.35 1.57 0.13 1.8 -26.93
city 0.16 -11.88 79.86 -1.51 -0.29 -24.02
fund -0.95 -0.01 -5.89 -2.98 -15.8 40.56
maker -8.12 1.27 -8.01 -4.69 -17.13 33.65
national -6.14 -9.93 -3.46 0.0 33.02 -2.11
athlete 12.22 -0.02 -0.92 2.99 -2.09 -14.57
oriole -4.61 -7.45 -2.6 -1.52 37.85 -5.0
ecb -2.69 -4.35 -1.51 -0.89 -3.24 23.41
man 9.6 -0.17 5.15 -3.73 -0.02 -13.36
player 8.86 0.58 -0.11 -1.27 0.11 -14.49
redskin -2.03 -10.55 -3.68 -0.0 35.92 -7.08
pampg -2.3 -3.72 -1.3 -0.76 -2.78 20.06
portfolio -2.3 -3.72 -1.3 -0.76 -2.78 20.06
daley 20.94 -3.72 -1.3 -0.76 -2.78 -2.5
orchestra 15.66 -1.33 0.01 -1.14 -4.17 -3.75
morgan -4.22 -2.15 -2.38 -1.4 -0.04 15.11
stock -0.63 -6.83 -2.38 -1.4 -0.04 15.11
goldman -4.99 -0.09 -2.81 -1.65 -1.66 15.71
ray -11.52 -18.62 123.54 -3.81 -7.81 -12.49
hawk 19.68 -3.51 -3.03 0.02 -1.97 -1.55
loan -1.92 -3.1 -1.08 -0.63 -2.31 16.72
retailer -2.32 -0.99 -1.63 0.07 -3.62 18.06
alderman 17.45 -3.1 -1.08 -0.63 -2.31 -2.08
-3.84 -6.21 -2.16 5.41 -4.63 12.44
student -0.41 -0.44 8.05 -0.22 7.04 -20.82
option -7.68 -3.46 -0.73 -2.54 8.44 2.92
county -1.41 -11.96 76.98 -5.71 -1.63 -8.22
beijing -0.72 -0.32 -4.11 -2.41 -3.65 18.03
florida -2.84 -3.46 53.73 -2.54 -4.01 -8.33
yankee -5.76 23.81 -0.24 -1.9 -2.29 -6.24
child 0.06 3.65 3.07 -1.62 -0.02 -13.15
chrysler -2.3 -3.72 -1.3 -0.76 -0.06 11.73
customer -0.04 -8.46 -0.01 0.0 -0.41 11.2
trust -3.46 -0.1 -1.95 -1.14 -4.17 14.7
cooper -2.69 -4.35 -1.51 -0.89 22.08 -2.91
troop -3.11 -2.46 -2.31 0.0 18.52 -0.5
mets -6.53 28.56 -3.68 -2.16 -7.87 -0.62
owner 9.92 -2.18 1.54 3.14 -3.27 -5.3
stanley -1.54 -2.48 -0.87 -0.51 -1.85 13.37
inflation -1.54 -2.48 -0.87 -0.51 -1.85 13.37
cutler 13.96 -2.48 -0.87 -0.51 -1.85 -1.67
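
To give a rough idea of where scores like these can come from, here is a minimal sketch of a signed log-likelihood keyness calculation. We are assuming a log-likelihood measure with the sign flipped for under-represented words; this is an illustration of the general technique, not necessarily the exact calculation corpkit performs, and the function name and numbers are ours:

```python
import math

def signed_log_likelihood(target_freq, target_size, ref_freq, ref_size):
    """Log-likelihood keyness; negative when the word is under-represented
    in the target relative to the reference."""
    combined = target_freq + ref_freq
    total_size = float(target_size + ref_size)
    expected_target = target_size * combined / total_size
    expected_ref = ref_size * combined / total_size
    ll = 0.0
    # A zero frequency contributes nothing to the sum.
    if target_freq:
        ll += target_freq * math.log(target_freq / expected_target)
    if ref_freq:
        ll += ref_freq * math.log(ref_freq / expected_ref)
    ll *= 2
    under = target_freq / float(target_size) < ref_freq / float(ref_size)
    return -ll if under else ll

# Hypothetical counts: a word seen 50 times among 1,000 results in one paper,
# against 60 times among 10,000 results in the reference data.
print(signed_log_likelihood(50, 1000, 60, 10000))
```

A word with the same relative frequency in both datasets scores zero, and the score grows in either direction as the two relative frequencies diverge.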

Modelling each publication

We can also present our data by publication. To do that, let's reorder our data:


In [4]:
from corpkit import Interrodict
bypaper = OrderedDict([(x, {}) for x in ['CHT', 'NYT', 'TBT', 'UST', 'WAP', 'WSJ']])
# each search and its interrodict
for interro, data in r.items():
    if interro not in trans.keys():
        continue
    # each newspaper and its interrogation
    for paper, datum in data.items():
        bypaper[paper][interro] = datum
bypaper = Interrodict([(k, Interrodict(v)) for k, v in bypaper.items()])
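
The reordering is just an inversion of a two-level mapping, from feature -> newspaper to newspaper -> feature. A minimal sketch with plain dictionaries and placeholder values (regroup_by_paper is a hypothetical name, and the word lists are invented):

```python
from collections import OrderedDict

# Hypothetical results: feature -> newspaper -> data.
by_feature = {
    'adjrisk': {'CHT': ['high', 'big'], 'WSJ': ['systemic']},
    'riskers': {'CHT': ['person'], 'WSJ': ['investor']},
}

def regroup_by_paper(by_feature, papers):
    """Invert the nesting so that newspapers become the outer keys."""
    by_paper = OrderedDict((paper, OrderedDict()) for paper in papers)
    for feature, per_paper in by_feature.items():
        for paper, value in per_paper.items():
            if paper in by_paper:
                by_paper[paper][feature] = value
    return by_paper

regrouped = regroup_by_paper(by_feature, ['CHT', 'WSJ'])
print(regrouped['WSJ'])
```

Fixing the list of newspapers up front is what preserves their order in the later plots.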

Now we can create a sketch of risk behaviour in each publication:


In [6]:
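import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# display and HTML are needed to render the per-paper headings
from IPython.display import display, HTML
%matplotlib inline
plt.close('all')
for i, (name, data) in enumerate(bypaper.items()):
    # two plots per row; round up so an odd number of searches still fits
    dim = (len(data) + 1) // 2
    display(HTML('<h1>Features of <i>risk</i> in the %s:</h1><br>' % name))
    f, axarr = plt.subplots(dim, 2)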
    #f.suptitle('Features of risk in the %s' % name, fontsize = 20)
    for index, (interro, datum) in enumerate(data.items()):
        ax = axarr.reshape(-1)[index]
        #colour = colours[index]
        datum = datum.edit(skip_entries = wordlists.closedclass, print_info = False)
        datum = datum.results.sum() / datum.results.sum().sum()
        datum.visualise(trans[interro], kind='bar', ax=ax, figsize = (15, 7), colours = sixcolours[i],
                        num_to_plot = 10, grid = True, x_label = 'Word')
    plt.show()


Features of risk in the CHT:

Features of risk in the NYT:

Features of risk in the TBT:

Features of risk in the UST:

Features of risk in the WAP:

Features of risk in the WSJ:

Charting riskers

The final thing we'll do here is look at riskers in each corpus in a little more detail.


In [ ]:
# proper noun risker only
q = {T: r'/NNP.?/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
        r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
        r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
priskers = corpora.interrogate(q, show = L)
priskers.save('priskers')

In [ ]:
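# noun risker; Tregex regexes are unanchored, so /NN[^P]*/ also matches
# NNP and NNPS, and proper nouns are included in these results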
q = {T: r'/NN[^P]*/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
        r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
        r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
nriskers = corpora.interrogate(q, show = L)
nriskers.save('nriskers')

In [ ]:
# any entity
q = {T: r'/NN.?/ !< /(?i).?\brisk.?/ >># NP'}
entities = corpora.interrogate(q, show = L, quicksave = 'entities')
entities.save('entities')

In [ ]:
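# express each risker count as a percentage of all occurrences of that entity
prelrisk = priskers.collapse().edit('%', entities.collapse())
# saved under the same names that load_all_results() reports above
prelrisk.save('prelrisk')

nrelrisk = nriskers.collapse().edit('%', entities.collapse())
nrelrisk.save('nrelrisk')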
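
As we read it, the '%' operation above divides each risker count by the number of times the same word heads any noun phrase in the same newspaper, so the scores in the tables that follow are percentages per entity. A minimal sketch of that arithmetic with plain dictionaries follows. The function name relative_percentage and all counts are invented for illustration, and this is not corpkit's implementation:

```python
def relative_percentage(numerators, denominators):
    """Return {paper: {word: 100 * numerator / denominator}}.

    Words with a missing or zero denominator are skipped.
    """
    out = {}
    for paper, counts in numerators.items():
        out[paper] = {}
        for word, n in counts.items():
            total = denominators.get(paper, {}).get(word, 0)
            if total:
                out[paper][word] = round(100.0 * n / total, 2)
    return out

# hypothetical counts: how often each word is a risker vs. any NP head
toy_riskers = {'CHT': {'bush': 5, 'cub': 2, 'ghost': 1}}
toy_entities = {'CHT': {'bush': 200, 'cub': 40}}
toy_rel = relative_percentage(toy_riskers, toy_entities)
print(toy_rel)
```

Read this way, a rare name that is nearly always a risker can outrank a very frequent one, which is worth keeping in mind for low-frequency rows.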

In [27]:
r.prelrisk.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Proper noun riskers')


Out[27]:
Proper noun riskers
CHT NYT TBT UST WAP WSJ
bush 3.35 1.45 0.71 1.97 1.23 1.46
state 0.18 0.37 0.16 0.31 0.49 0.08
clinton 1.99 1.4 1.17 2.25 1.55 2.1
fed 2.27 1.2 0.92 1.67 1.57 0.94
american 0.6 0.83 0.92 0.32 0.55 0.4
israel 2.31 1.84 2.75 2.72 2.04 1.61
u.s. 1.22 0.12 0.0 0.66 0.25 0.53
obama 1.57 2.0 0.8 1.77 0.97 1.5
democrat 1.23 0.88 0.69 1.13 0.97 0.5
republican 0.88 0.91 0.49 0.95 0.85 0.84
congress 0.51 0.5 0.41 0.32 0.39 0.53
washington 0.62 0.44 0.0 0.51 0.63 0.3
america 0.23 0.67 0.33 1.14 0.45 0.09
president 0.06 0.39 0.0 0.0 0.02 0.03
ray 0.0 0.0 4.64 0.0 0.71 0.0
house 0.25 0.24 0.0 0.0 0.22 0.19
china 0.3 0.42 0.0 0.75 0.35 0.41
bear 2.77 1.19 0.0 0.0 0.0 0.31
gorbachev 2.82 3.0 4.26 0.0 2.46 1.96
hussein 1.29 1.85 2.35 1.02 1.22 0.51
cub 4.85 0.0 0.0 0.0 0.0 0.0
iraq 0.37 0.4 0.0 0.32 0.2 0.14
west 0.49 0.72 0.37 0.0 0.81 0.87
sox 3.31 1.03 6.52 3.7 0.0 0.0
iran 1.61 0.31 0.0 0.0 1.02 0.39
florida 0.34 0.41 0.4 0.0 0.26 0.0
beijing 2.0 1.11 0.0 0.0 0.38 1.85
nbc 1.92 2.12 4.62 1.68 2.94 2.61
arafat 4.31 2.55 6.45 4.88 2.34 0.0
gop 4.65 0.0 1.41 3.08 3.14 1.73
department 0.24 0.2 0.03 0.0 0.08 0.04
administration 0.0 0.34 0.08 0.0 0.02 0.06
russia 0.27 0.55 0.0 0.55 0.35 0.52
williams 1.31 1.02 0.45 0.0 0.82 0.0
musharraf 2.94 2.17 0.0 13.33 5.71 4.6
mets 0.0 3.25 0.0 0.0 0.0 3.03
netanyahu 1.37 4.46 0.0 0.0 3.94 6.15
mccain 1.95 1.35 1.63 0.0 1.25 0.75
council 0.13 0.28 0.42 0.0 0.38 0.22
cuban 5.48 4.12 1.59 0.0 4.9 1.64
gore 0.93 1.01 1.61 0.64 0.42 0.9
johnson 0.19 0.0 1.26 0.59 0.69 0.19
iraqi 0.79 1.29 2.56 2.15 1.43 0.62
reagan 0.88 1.5 0.59 0.0 0.99 1.69
bank 0.06 0.04 0.05 0.0 0.02 0.03
putin 3.8 0.53 5.0 0.0 2.9 2.12
yankee 0.0 2.65 1.54 0.0 1.49 0.0
japan 0.0 0.62 0.74 0.52 0.15 0.22
fox 2.3 1.84 0.83 0.0 0.6 1.0
sharon 7.46 4.2 0.0 5.26 0.69 2.63

In [28]:
r.nrelrisk.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Noun riskers')


Out[28]:
Noun riskers
CHT NYT TBT UST WAP WSJ
person 0.8 0.67 0.65 0.58 0.69 0.48
company 0.75 0.65 0.35 0.76 0.58 0.48
investor 0.88 0.7 0.48 0.54 0.65 0.66
bank 0.9 0.67 0.48 0.73 0.38 0.47
government 0.81 0.77 0.4 0.6 0.83 0.59
state 0.43 0.5 0.36 0.55 0.68 0.39
man 0.86 0.6 0.83 0.26 0.57 0.5
woman 0.31 0.31 0.19 0.28 0.19 0.4
bush 3.35 1.45 0.71 1.97 1.23 1.46
president 0.52 0.67 0.52 0.44 0.96 0.28
leader 1.54 1.19 0.89 0.82 0.97 0.76
city 0.87 0.35 1.49 0.32 0.76 0.29
player 1.52 1.15 0.76 0.4 1.13 0.53
administration 1.12 0.87 0.64 0.72 0.62 0.75
firm 0.51 0.58 0.6 0.38 0.58 0.44
clinton 1.99 1.4 1.17 2.25 1.55 2.1
team 0.74 0.84 0.82 0.66 0.85 0.48
american 0.73 0.86 1.28 0.4 0.65 0.63
fed 2.27 1.2 0.92 1.67 1.65 0.94
anyone 1.6 1.37 1.41 1.61 1.29 1.15
official 0.3 0.3 0.2 0.09 0.32 0.14
manager 0.96 0.86 0.21 1.04 0.62 0.49
republican 1.33 1.16 0.81 1.42 1.35 1.36
country 0.36 0.21 0.38 0.47 0.36 0.38
owner 2.01 1.0 0.94 1.85 0.86 0.63
fund 0.19 0.24 0.1 0.05 0.08 0.2
member 0.34 0.62 0.39 0.4 0.46 0.41
israel 2.31 1.84 2.75 2.72 2.04 1.61
worker 0.51 0.66 0.59 0.15 0.51 0.44
student 0.43 0.61 0.4 0.3 0.59 0.16
others 0.47 0.52 0.59 0.51 0.44 0.25
someone 0.67 0.9 0.88 1.73 0.83 0.9
democrat 1.37 0.99 0.69 1.13 1.1 0.5
u.s. 1.22 0.12 0.0 0.66 0.25 0.53
driver 1.13 0.95 1.21 1.55 0.74 0.74
party 0.73 0.84 0.58 0.71 0.87 0.43
obama 1.57 2.0 0.8 1.77 0.97 1.5
child 0.14 0.26 0.14 0.06 0.13 0.08
one 0.61 0.6 0.57 0.74 0.36 0.46
school 0.22 0.23 0.17 0.2 0.4 0.22
move 0.41 0.95 0.37 0.14 0.64 0.51
way 0.31 0.09 0.22 0.12 0.15 0.06
executive 0.58 0.42 0.38 1.23 0.95 0.25
soldier 1.08 0.97 3.61 1.22 1.71 1.33
group 0.16 0.17 0.22 0.09 0.16 0.13
congress 0.51 0.5 0.41 0.32 0.39 0.53
employee 0.6 0.38 0.43 0.28 0.57 0.58
network 0.78 1.71 1.16 2.21 0.31 0.51
time 0.1 0.12 0.11 0.07 0.06 0.06
life 0.09 0.14 0.11 0.25 0.17 0.12

Keyness of riskers


In [10]:
kpriskers = r.priskers.collapse().edit('k', SELF, print_info = False)
kpriskers.results.T.head(20).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Keyness of proper noun riskers')


Out[10]:
Keyness of proper noun riskers
CHT NYT TBT UST WAP WSJ
cub 82.52 -15.94 -5.22 -3.09 -11.24 -8.6
bear 64.27 -5.56 -6.31 -3.73 -13.59 -4.95
u.s. 12.82 -42.33 -16.75 -0.15 -22.85 61.95
fed -2.27 -6.78 -15.67 1.77 -0.44 36.27
sox 33.94 -7.77 0.33 -0.08 -9.84 -7.53
hawk 28.08 -2.47 -2.39 -1.41 -5.15 -3.94
president -3.5 41.65 -6.74 -3.99 -8.35 -2.79
oriole -4.74 -7.97 -2.61 -1.54 37.59 -4.3
redskin -4.34 -7.31 -2.39 -1.41 34.46 -3.94
p&g -2.37 -3.98 -1.31 -0.77 -2.81 21.69
morgan -4.34 -2.47 -2.39 -1.41 -0.05 17.11
goldman -5.13 -0.18 -2.83 -1.67 -1.7 17.92
daley 20.63 -3.98 -1.31 -0.77 -2.81 -2.15
beijing -0.8 -0.52 -4.13 -2.44 -3.73 20.88
national -3.95 -6.64 -2.18 0.21 22.16 -3.58
chrysler -2.37 -3.98 -1.31 -0.77 -0.07 13.03
ray -11.85 -19.92 123.26 -3.86 -7.95 -10.75
street -4.34 -0.6 -2.39 0.14 -1.12 12.32
stanley -1.58 -2.66 -0.87 -0.51 -1.87 14.46
cooper -2.76 -4.65 -1.52 -0.9 21.93 -2.51
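
Keyness scores like those above are commonly computed as a signed log-likelihood statistic, comparing a word's frequency in one publication with its frequency in a reference corpus. The sketch below assumes that measure, and assumes the reference is the remaining publications pooled. Whether corpkit's 'k' operation with SELF does exactly this is an assumption on our part, and the function name and counts are hypothetical:

```python
import math

def signed_log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness of one word.

    a: word count in the target corpus
    b: word count in the reference corpus
    c: total words in the target corpus (must be > 0)
    d: total words in the reference corpus (must be > 0)
    The sign is negative when the word is relatively rarer in the target.
    """
    a, b, c, d = float(a), float(b), float(c), float(d)
    # expected counts if the word were spread evenly over both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    if a / c < b / d:
        ll = -ll
    return ll

# hypothetical counts: positive when overused in the target, negative when underused
overused = signed_log_likelihood(20, 5, 100, 100)
underused = signed_log_likelihood(5, 20, 100, 100)
print(overused, underused)
```

Under this reading, large positive scores mark words that a newspaper uses as riskers more than the others do, and large negative scores mark words it uses less.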

In [9]:
knriskers = r.nriskers.collapse().edit('k', SELF, print_info = False)
knriskers.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Keyness of noun riskers')


Out[9]:
Keyness of noun riskers
CHT NYT TBT UST WAP WSJ
investor -8.37 -12.27 -35.81 -1.99 -33.89 213.5
cub 83.5 -14.84 -5.22 -3.06 -11.06 -9.99
bank -9.3 0.69 -17.0 -1.05 -26.42 82.59
bear 61.88 -3.11 -6.52 -3.83 -13.82 -6.65
company -9.01 0.5 -24.37 0.08 -13.79 67.61
u.s. 13.48 -38.97 -16.74 -0.14 -22.33 52.25
firm -6.71 -1.4 -5.24 -2.43 -0.86 40.1
sox 34.55 -6.98 0.33 -0.08 -9.68 -8.74
village 35.09 -8.04 -2.83 -1.66 -1.64 -5.41
fed -2.12 -5.27 -15.85 1.75 -0.15 27.49
person 8.27 -0.29 1.47 0.1 1.95 -26.93
city 0.14 -11.74 79.49 -1.55 -0.26 -24.02
fund -1.0 -0.01 -5.95 -3.02 -15.66 40.57
maker -8.19 1.3 -8.04 -4.72 -17.05 33.65
national -6.18 -9.89 -3.48 0.0 33.13 -2.11
athlete 12.11 -0.01 -0.93 2.96 -2.05 -14.57
oriole -4.63 -7.42 -2.61 -1.53 37.94 -4.99
ecb -2.7 -4.33 -1.52 -0.89 -3.23 23.41
man 9.38 -0.15 5.06 -3.8 -0.02 -13.36
player 8.68 0.61 -0.12 -1.3 0.13 -14.49
redskin -2.05 -10.51 -3.69 -0.0 36.04 -7.08
p&g -2.32 -3.71 -1.3 -0.77 -2.76 20.06
portfolio -2.32 -3.71 -1.3 -0.77 -2.76 20.06
daley 20.88 -3.71 -1.3 -0.77 -2.76 -2.5
orchestra 15.59 -1.32 0.01 -1.15 -4.15 -3.75
morgan -4.25 -2.13 -2.39 -1.4 -0.04 15.11
goldman -5.02 -0.09 -2.83 -1.66 -1.64 15.71
stock -0.64 -6.8 -2.39 -1.4 -0.04 15.11
ray -11.58 -18.55 123.3 -3.83 -7.76 -12.49
hawk 19.59 -3.48 -3.04 0.02 -1.95 -1.55
loan -1.93 -3.09 -1.09 -0.64 -2.3 16.72
retailer -2.35 -0.97 -1.64 0.07 -3.58 18.06
alderman 17.4 -3.09 -1.09 -0.64 -2.3 -2.08
% -3.86 -6.18 -2.17 5.38 -4.61 12.44
student -0.44 -0.42 7.97 -0.23 7.14 -20.81
option -7.72 -3.43 -0.74 -2.55 8.5 2.92
beijing -0.74 -0.31 -4.13 -2.42 -3.62 18.03
county -1.45 -11.88 76.75 -5.74 -1.6 -8.22
florida -2.88 -3.43 53.6 -2.55 -3.98 -8.32
yankee -5.79 23.89 -0.24 -1.91 -2.26 -6.24
child 0.05 3.72 3.02 -1.65 -0.01 -13.15
chrysler -2.32 -3.71 -1.3 -0.77 -0.06 11.73
customer -0.04 -8.4 -0.01 0.0 -0.4 11.21
trust -3.48 -0.09 -1.96 -1.15 -4.15 14.7
cooper -2.7 -4.33 -1.52 -0.89 22.13 -2.91
troop -3.16 -2.42 -2.33 0.0 18.65 -0.5
mets -6.56 28.65 -3.69 -2.17 -7.83 -0.62
owner 9.76 -2.13 1.51 3.09 -3.2 -5.29
stanley -1.54 -2.47 -0.87 -0.51 -1.84 13.38
inflation -1.54 -2.47 -0.87 -0.51 -1.84 13.38

Relative riskers


In [8]:
r.prelrisk.results.T.iloc[:10,:10].visualise(kind='bar', grid = True, 
                                            legend_pos = 'upper right', figsize = (12, 5),
                                            colours = sixcolours, x_label = 'Publication')
r.nrelrisk.visualise(kind='bar', grid = True, legend_pos = 'upper right', figsize = (12, 5),
                                            x_label = 'Publication')

