One specific area of interest is whether risk behaves differently in the six publications, irrespective of the year of a text's publication. In this investigation, we will focus on dependency relationships with risk words, rather than constituencies. As such, we won't be querying syntax trees, but rather Stanford CoreNLP's XML structures.
In [1]:
# import module
from corpkit import *
# various wordlists
from dictionaries import *
# model corpora in data directory
corpora = Corpora('data')
# inline figures
%matplotlib inline
Corpus: /Volumes/extra/risk/data/CHT-parsed
Corpus: /Volumes/extra/risk/data/NYT-parsed
Corpus: /Volumes/extra/risk/data/TBT-parsed
Corpus: /Volumes/extra/risk/data/UST-parsed
Corpus: /Volumes/extra/risk/data/WAP-parsed
Corpus: /Volumes/extra/risk/data/WSJ-parsed
In [2]:
# load previous results
r = load_all_results()
# map results to some elaborated names
trans = {'adjrisk'        : 'Adjectives modifying risk',
         'risker'         : 'Subjects of risk',
         'riskers'        : 'Subjects of risk, run risk, take risk',
         'riskyx'         : 'Nouns modified by adjectival risk',
         'torisksomething': 'Objects of risk processes',
         'nommod'         : 'Nominal modifiers of risk participants',
         'risknoun'       : 'Nouns modified by nominal risk'}
17:51:32: adjrisk.p loaded as adjrisk.
17:52:49: entities.p loaded as entities.
17:52:51: nommod.p loaded as nommod.
17:53:06: nrelrisk.p loaded as nrelrisk.
17:53:30: nriskers.p loaded as nriskers.
17:53:46: prelrisk.p loaded as prelrisk.
17:54:13: priskers.p loaded as priskers.
17:54:31: relrisker.p loaded as relrisker.
17:54:33: riskers.p loaded as riskers.
17:54:58: risknoun.p loaded as risknoun.
17:55:00: riskyx.p loaded as riskyx.
17:55:03: torisksomething.p loaded as torisksomething.
17:55:03: 12 interrogations loaded from saved_interrogations.
In [3]:
# for moving results around and preserving order
from collections import OrderedDict
# table viewing parameters
import pandas as pd
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_columns', 30)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)
pd.set_option('expand_frame_repr', True)
# colouring -- requires seaborn
import seaborn as sns
# for colouring freq tables
cm = sns.light_palette("green", as_cmap=True)
# keyword tables
divergemap = sns.diverging_palette(10, 133, as_cmap=True)
sixcolours = sns.color_palette()
# display html table
from IPython.display import display, HTML
What we're going to concentrate on are things that co-occur lexicogrammatically with risk as a participant and risk as a process. For example, we want to find common subjects and objects of risk processes, as well as how risk as a participant is modified.
In the sections below, we'll run all the investigations, and save the results, so that we can load them all as one item for editing and visualising. We'll first look at the data by feature, and then create sketches of the behaviour of risk in each publication.
The first interrogation, of subjects of risk processes, will feature a bit of commentary. The rest will follow the same structure, but will just be code.
Subjects of risk as process are best located with a Tregex query, as this can simultaneously locate those who risk, those who take risks, and those who run risks.
In [223]:
q = {T: r'/NN.?/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
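The word-level regular expression inside this query can be checked on its own. The sketch below is not part of corpkit: it assumes that Python's `re.search` behaves like Tregex's Java regex matching for this particular pattern, and the token list is made up for illustration.

```python
import re

# The word-level pattern used inside the Tregex query.
# Assumption: re.search is close enough to Tregex's regex matching here.
RISK = re.compile(r'(?i).?\brisk.?')

def is_risk_word(token):
    """Return True if the token contains a word-initial 'risk'."""
    return RISK.search(token) is not None

# Hypothetical tokens, chosen to probe the word boundary.
for token in ['risk', 'Risks', 'risky', 'at-risk', 'brisk', 'asterisk']:
    print(token, is_risk_word(token))
```

The `\b` word boundary is what keeps words such as brisk and asterisk out of the results, while hyphenated forms such as at-risk still match.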
We can then pass the query to an interrogation method, choosing to show the
lemma form, and to save to disk as riskers.p:
In [ ]:
riskers = corpora.interrogate(q, show = L, quicksave = 'riskers')
In [ ]:
# this can take hours, depending on the corpora's size!
#risker = corpora.interrogate(q, show = L, save = 'risker')
The result is an Interrodict object, with attributes for each newspaper. If we
wanted to quickly see the raw results, we could use topwords():
In [16]:
r.riskers.topwords()
CHT % NYT % TBT % UST % WAP % WSJ %
person 4.12 person 3.22 person 3.78 person 3.48 person 3.67 investor 4.65
company 1.56 company 2.30 city 2.35 company 2.29 state 1.71 company 4.04
man 1.33 bank 1.36 man 1.33 investor 1.18 government 1.53 bank 2.85
investor 1.12 investor 1.15 ray 1.29 clinton 1.18 company 1.49 person 2.07
woman 1.07 state 1.15 county 1.29 state 1.18 president 1.07 firm 1.34
bush 0.99 government 1.14 state 0.98 bush 1.11 man 0.87 government 1.27
player 0.94 man 0.84 company 0.89 bank 0.96 investor 0.73 fund 1.05
bank 0.78 leader 0.83 team 0.84 woman 0.96 official 0.73 fed 1.05
government 0.76 president 0.77 american 0.80 driver 0.89 leader 0.73 u.s. 1.05
owner 0.73 woman 0.71 student 0.76 fed 0.74 bush 0.71 manager 0.78
Plenty of pronouns are to be expected, as we didn't exclude them in our search. We can skip closed-class words like this instead:
In [17]:
nopro = r.riskers.edit(skip_entries = wordlists.closedclass, print_info = False)
nopro.topwords(df = True).style.set_caption('Riskers')
Out[17]:
Riskers
    CHT               NYT               TBT               UST               WAP               WSJ
    Result         %  Result         %  Result         %  Result         %  Result         %  Result         %
0   person      4.23  person       3.3  person       3.9  person      3.59  person      3.76  investor    4.74
1   company     1.61  company     2.36  city        2.43  company     2.37  state       1.75  company     4.12
2   man         1.37  bank        1.39  man         1.38  investor    1.22  government  1.57  bank         2.9
3   investor    1.15  investor    1.18  ray         1.33  clinton     1.22  company     1.53  person      2.11
4   woman        1.1  state       1.18  county      1.33  state       1.22  president   1.09  firm        1.36
5   bush        1.02  government  1.16  state       1.01  bush        1.15  man         0.89  government  1.29
6   player      0.96  man         0.86  company     0.92  woman       0.99  official    0.75  fund        1.07
7   bank         0.8  leader      0.85  team        0.87  bank        0.99  leader      0.75  fed         1.07
8   government  0.78  president   0.79  american    0.83  driver      0.92  investor    0.75  u.s.        1.07
9   owner       0.75  woman       0.72  student     0.78  fed         0.76  bush        0.73  manager     0.79
Another view is provided by the collapse() method:
In [18]:
# collapse rows (years)
nopro.collapse().results.top()
# top() avoids pandas' truncation of large results
# it responds to the number of max columns
Out[18]:
      person  company  investor  bank  government  state  man  woman  bush  president
CHT      158       60        43    30          29     22   51     41    38         17
NYT      187      134        67    79          66     67   49     41    38         45
TBT       85       20         8    10          10     22   30     11     9         11
UST       47       31        16    13          10     16    6     13    15          6
WAP      165       67        33    25          69     77   39     22    32         48
WSJ       85      166       191   117          52     20   18     26    21         17

      leader  city  player  firm  administration  clinton  team  american  fed  official
CHT       27    25      36    12              21       20    20        17   13        15
NYT       48    19      38    28              38       30    27        30   18        32
TBT       11    53      12     6               8        6    19        18    1         9
UST        6     5       5     4               7       16     9         5   10         3
WAP       33    25      28    22              28       31    27        25   20        33
WSJ       18     6       9    55              25       22     9        14   43        11

      manager  republican  country  owner  fund  member  israel  worker  student  democrat
CHT        18           9       13     28    12       9      14      12       12        10
NYT        29          28       17     18    23      28      26      26       19        27
TBT         3           5        8     13     3      11       7      10       17         4
UST         8           6        7     10     2       4       5       2        4         5
WAP        11          30       26     12     5      20      23      19       27        24
WSJ        32          21       27      9    43      14      10      13        2         9
In [8]:
# collapse columns (words)
nopro.collapse('x').results
Out[8]:
      1987  1988  1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
CHT    151   145   129   136   154   135   144   166   146   133   107   127   184   156
NYT    202   190   198   217   194   198   180   179   167   177   190   218   219   210
TBT    113   110   113   119   120    87    83    93    65    73    58    86    76    77
UST     12    18    12    24    23    60    61    62    63    50    64    71    60    58
WAP    157   131   149   152   138   159   154   169   108   103   125   189   191   193
WSJ    114    99   134   119    96   118   108   125   109    98   118   124   133   117

      2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014
CHT    161   147   183   167   146   126   158   110    89    70    81    85   114    85
NYT    212   235   232   213   202   216   179   238   229   237   228   206   188   114
TBT     77    75    71    78    56    77    74    83   112    42    45    32    35    47
UST     45    68    71    59    53    53    46    44    41    50    36    35    42    28
WAP    171   194   227   152   178   172   162   162   170   127   135   193   147    80
WSJ    103   137   144   162   168   140   189   206   202   172   200   208   193   196
In [9]:
# collapse dict key names (newspapers)
nopro.collapse('n').results.head(10).top()
Out[9]:
      person  company  investor  bank  government  state  man  woman  bush  president
1987      28       21         8     5           6     10   11      6     0          8
1988      22       11         8     5           7      9    4      4     5          5
1989      27       15         6     6          11      8    7      5     7          5
1990      26       14         4     7          16     16    7      5    18          4
1991      35        9        10    11           6      9   15      5    14          5
1992      26       16        16    11           3     10    4     12    18          8
1993      23       19        10     8           8      7   10      6     0          4
1994      29       14        19     7           8      8    4     12     1          5
1995      27       11         8    17           5     11    8      6     0          8
1996      13       20        14     6           6     10    6     11     0          1

      leader  city  player  administration  firm  clinton  team  american  fed  official
1987       9    11       1               9    10        0     3         4    5         1
1988       3     8       6               3     4        0     6         1    2         3
1989       8     8       6               1     2        0     3         7    9         5
1990       4     7       3               8    10        0     2         6    2         2
1991      10     6       5               5     4        0     2         7    4         3
1992       4     6       4               2     2       11     1         6    6         0
1993       1     4       8               7     2       26     2         4    1         2
1994       2     5       4               8     1       18     7         3    5         5
1995       2     4       3               8     2       13     5         1    4         7
1996       2     6       3               4     1       10     3         3    4         1

      manager  republican  country  owner  fund  member  israel  worker  student  democrat
1987        4           1        2      2     2       5       2       0        1         4
1988        1           0        1      3     0       2       3       3        1         4
1989        2           0        5      2     1       3       6       2        3         1
1990        6           0        8      5     0       2       8       5        7         1
1991        5           1        1      1     0       2       4       3        3         2
1992        3           1        5      5     0       0       0       4        5         6
1993        4           2        1      6     1       7       3       2        2         4
1994        5           1        4      7    10       7       1       3        4         0
1995        4           3        1      8     1       2       2       4        2         1
1996        4           4        3      3     1       1       3       3        2         0
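The three collapse() calls above each sum away one dimension of the data. The following is a rough pandas sketch of that idea, not corpkit's implementation: it assumes the Interrodict can be thought of as a dict with one year-by-word DataFrame per newspaper, and every count in it is invented.

```python
import pandas as pd

# Toy stand-in for an Interrodict: one DataFrame per newspaper,
# rows are years and columns are words. All counts are invented.
toy = {
    'CHT': pd.DataFrame({'person': [3, 1], 'bank': [0, 2]}, index=[1987, 1988]),
    'WSJ': pd.DataFrame({'person': [1, 1], 'bank': [4, 5]}, index=[1987, 1988]),
}

# Collapse years: one row per newspaper, one column per word.
by_word = pd.DataFrame({paper: df.sum(axis=0) for paper, df in toy.items()}).T

# Collapse words: one row per newspaper, one column per year.
by_year = pd.DataFrame({paper: df.sum(axis=1) for paper, df in toy.items()}).T

# Collapse newspapers: one row per year, one column per word.
by_corpus = sum(toy.values())

print(by_word)
print(by_year)
print(by_corpus)
```

Each result answers a different question: which words dominate in a paper, how many matches a paper has per year, and how a word's frequency moves over time across the whole collection.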
Recent versions of pandas allow some highlighting of values, too:
In [21]:
nopro.collapse().results.iloc[:,:10].style.background_gradient(cmap=cm, axis = 0)
Out[21]:
      person  company  investor   bank  government  state    man  woman   bush  president
CHT    158.0     60.0      43.0   30.0        29.0   22.0   51.0   41.0   38.0       17.0
NYT    187.0    134.0      67.0   79.0        66.0   67.0   49.0   41.0   38.0       45.0
TBT     85.0     20.0       8.0   10.0        10.0   22.0   30.0   11.0    9.0       11.0
UST     47.0     31.0      16.0   13.0        10.0   16.0    6.0   13.0   15.0        6.0
WAP    165.0     67.0      33.0   25.0        69.0   77.0   39.0   22.0   32.0       48.0
WSJ     85.0    166.0     191.0  117.0        52.0   20.0   18.0   26.0   21.0       17.0
This works better, however, when we have relative, rather than absolute, frequencies:
In [22]:
rel = nopro.collapse().edit('%', SELF, print_info = False)
#.results * 100.0 / nopro.collapse().results.sum()
rel.results.T.head(10).style.background_gradient(cmap=cm, axis = 1).set_caption('Riskers')
Out[22]:
Riskers
                        CHT      NYT      TBT      UST      WAP      WSJ
person                 4.23      3.3      3.9     3.59     3.76     2.11
company                1.61     2.36     0.92     2.37     1.53     4.12
investor               1.15     1.18     0.37     1.22     0.75     4.74
bank                    0.8     1.39     0.46     0.99     0.57      2.9
government             0.78     1.16     0.46     0.76     1.57     1.29
state                  0.59     1.18     1.01     1.22     1.75      0.5
man                    1.37     0.86     1.38     0.46     0.89     0.45
woman                   1.1     0.72     0.51     0.99      0.5     0.64
bush                   1.02     0.67     0.41     1.15     0.73     0.52
president              0.46     0.79     0.51     0.46     1.09     0.42
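As a rough illustration of what a relative frequency means here, the sketch below divides each newspaper's counts by that newspaper's own total. It is plain pandas rather than corpkit, it assumes that `edit('%', SELF)` normalises each row by its own sum, and the counts (including the catch-all `other` column) are invented.

```python
import pandas as pd

# Invented absolute counts: rows are newspapers, columns are words.
counts = pd.DataFrame(
    {'person': [30, 10], 'company': [10, 40], 'other': [60, 50]},
    index=['CHT', 'WSJ'],
)

# Divide each row by its own total, so newspapers of different sizes
# become comparable; values are percentages of that newspaper's matches.
relative = counts.div(counts.sum(axis=1), axis=0) * 100

print(relative.round(2))
```

Normalising by row rather than by column is what makes a small paper and a large one comparable: each value says how much of that paper's risker vocabulary a word accounts for.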
In [ ]:
# objects of risk: to risk LIFE
q = {F: '(dobj|.subjpass)', GL: r'\brisk'}
toriskx = corpora.interrogate(q, show = L, quicksave = 'toriskx')
In [ ]:
# nominal modifiers of nominal risk: a SECURITY risk
q = {F: '^(compound|nn)$', GL: r'\brisk'}
nommod = corpora.interrogate(q, show = L, quicksave = 'nommod')
In [ ]:
# adjectives modifying risk: a BIG risk
q = {F: 'amod', GL: r'\brisk'}
adjrisk = corpora.interrogate(q, show = L, quicksave = 'adjrisk')
In [ ]:
# nouns modified by adjectival risk: a risky DECISION
q = {F: 'amod', L: r'\brisk'}
riskyx = corpora.interrogate(q, show = GL, quicksave = 'riskyx')
In [ ]:
# objects of risk processes: to risk a LIFE
q = {F: '(dobj|.subjpass)', GL: r'\brisk'}
torisksomething = corpora.interrogate(q, show = L, quicksave = 'torisksomething')
In [ ]:
# nouns modified by nominal risk: risk MANAGEMENT
q = {F: '^(compound|nn)$', L: r'\brisk'}
risknoun = corpora.interrogate(q, show = GL, quicksave = 'risknoun')
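The dependency queries above all follow one pattern: restrict the dependency function, match a regex against either the token or its governor, and show the other end of the relation. The sketch below imitates that logic on hand-written data. It is not how corpkit works internally; the `Dep` record and `search` helper are hypothetical, and it assumes (going by the comments on each query) that `F` is the function, `GL` the governor's lemma and `L` the token's own lemma.

```python
import re
from collections import Counter, namedtuple

# Hypothetical, simplified dependency record: one row per dependent token.
Dep = namedtuple('Dep', ['function', 'governor_lemma', 'lemma'])

# Toy fragments for "a big risk", "a risky decision", "risk management", ...
toy = [
    Dep('amod', 'risk', 'big'),
    Dep('amod', 'decision', 'risky'),
    Dep('compound', 'management', 'risk'),
    Dep('dobj', 'risk', 'life'),
    Dep('amod', 'risk', 'serious'),
]

def search(deps, function, governor=None, lemma=None, show='lemma'):
    """Count the `show` field of every record matching all given regexes."""
    hits = Counter()
    for d in deps:
        if not re.search(function, d.function):
            continue
        if governor and not re.search(governor, d.governor_lemma):
            continue
        if lemma and not re.search(lemma, d.lemma):
            continue
        hits[getattr(d, show)] += 1
    return hits

# adjectives modifying risk: match on the governor, show the dependent
adj = search(toy, 'amod', governor=r'\brisk')
# nouns modified by adjectival risk: match on the dependent, show the governor
nouns = search(toy, 'amod', lemma=r'\brisk', show='governor_lemma')

print(adj)
print(nouns)
```

Swapping which side is matched and which side is shown is the only difference between the paired queries, such as adjrisk and riskyx.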
In [19]:
%matplotlib inline
for name, data in r.items():
    # skip results we're not interested in
    if name not in trans.keys():
        continue
    # refresh class attributes
    data = Interrodict(data)
    data = data.collapse()
    data = data.edit('%', SELF, skip_entries = wordlists.closedclass, print_info = False)
    tab = data.results.T.head(15).style.background_gradient(cmap=cm, axis = 1).set_caption(trans[name])
    # note, log y axes are turned on, because of outlier results!
    plt = data.results.visualise(trans[name], kind='bar', grid = True, figsize = (11, 5), logy = True, ncol = 1,
                                 x_label = 'Publication', num_to_plot = 15, legend_pos = 'outside right', fontsize = 16)
    plt.show()
    plt = data.results.visualise(trans[name], kind='bar', grid = True, figsize = (11, 5), ncol = 1,
                                 x_label = 'Publication', num_to_plot = 15, stacked = True,
                                 legend_pos = 'outside right', fontsize = 16, reverse_legend = True)
    plt.show()
    display(tab)
Nouns modified by nominal risk
                        CHT      NYT      TBT      UST      WAP      WSJ
factor                27.66    19.47    20.08    28.67    23.22     8.94
management             6.02     6.94     9.62     3.14     5.95      8.4
assessment             3.05     3.89     3.44     2.55     6.64     2.22
premium                0.69     1.79     0.36     0.63     1.11     6.42
manager                1.43     1.83     5.28      1.0     1.14     2.28
taker                  1.73     2.78     1.93     3.77     1.38     1.17
group                  1.94     2.12      2.3     1.78     1.85     1.41
tolerance              2.84      1.1     0.52     2.72     0.92     2.35
profile                1.26     1.26     0.41     1.29     1.32     3.21
business               3.54     1.57     1.96     2.63     1.11     0.76
aversion               0.43     1.04     0.08     0.83     0.69     3.67
analysis               1.08     1.73     0.84     1.73     2.44      1.1
level                  1.36      1.3     1.09     1.31     1.45     1.63
officer                 0.5     1.68     0.23      0.8     0.84     2.09
pool                   1.02     1.33      1.7     1.31     1.43      1.1
Nominal modifiers of risk participants
                        CHT      NYT      TBT      UST      WAP      WSJ
health                21.27    17.04    25.94    15.34    19.78     8.36
cancer                 7.75     6.39     6.25    10.92     6.59     3.14
security               4.67     6.55     6.54     3.97     7.96     2.48
credit                 3.79     3.94      2.0     2.26     4.11     9.27
downside               1.97     2.57     0.59     1.76     1.94     5.49
heart                  2.69     3.29     1.83     6.35     2.14     1.53
safety                 2.93      2.6     3.53     2.48     3.29     1.67
flight                 3.53      2.1     4.49     0.87     2.73     0.65
market                 1.33     1.46     0.86     0.85     1.28     2.84
inflation              0.82     0.92     0.15     0.96     0.95     3.39
currency               0.66     1.52     0.19     0.81     0.47     3.01
breast                 2.06     1.39     0.66     4.32     1.44     0.19
investment              1.3     1.09      0.8     0.97     1.22     1.88
disease                1.45     1.27     1.13     2.63     1.51     0.42
business               0.68      0.8     0.79     0.28     0.76     1.39
Nouns modified by adjectival risk
                        CHT      NYT      TBT      UST      WAP      WSJ
investment             3.33     3.68     3.38     4.64     3.17     5.04
asset                  0.59     1.95     0.17     1.64     0.87     6.89
business               3.43     2.83     2.92     2.28     2.57     2.15
behavior               3.66     2.95      2.5     4.01     3.37     1.11
loan                   1.77      2.1     1.61     1.68     2.23     3.04
bond                   1.18     1.66     0.98     1.23     1.25     4.19
group                  2.55      2.2     2.36     1.64      2.5     0.87
child                  2.32      0.8     4.64     1.24     2.92     0.22
strategy               1.36     2.05     0.83      1.8     1.47     2.13
student                1.94     0.74     5.68     1.11     2.58     0.16
youth                  2.03     0.65     3.74     1.26      3.4     0.22
area                   1.42     1.52     2.17     2.16     1.28     1.17
security               0.99     1.45      1.1     0.86     0.99     2.05
venture                1.53     1.58     1.51      0.9     1.27     0.95
patient                1.47     1.23     1.32     1.68     1.25     0.99
Adjectives modifying risk
                        CHT      NYT      TBT      UST      WAP      WSJ
high                   6.84     6.89     8.91     7.63     6.71     4.31
increase               7.02     5.05     5.17     6.52     5.92     4.37
greater                5.65     4.55     6.14      5.5     4.98     3.73
higher                 5.51     3.23     4.27     6.51     3.92     4.18
potential              2.79     2.65     3.28     2.61     2.72     2.87
political              1.98      3.1     1.66     1.57     2.93     3.31
financial              2.24     2.41     2.86     1.49     2.17     2.99
significant            1.95     2.15      1.7     1.64     2.28     2.21
big                    1.84     2.01     1.94     2.69     1.45     2.64
serious                2.11     2.08     1.97     1.58     2.28     1.29
lower                  2.14     1.48     1.46     2.18      2.0     1.62
great                  1.77     1.88     2.51     1.45     2.18     0.81
little                 1.63     1.76     1.72      1.6     1.43     1.54
real                   1.34     1.93      1.3      1.7     1.72     1.45
own                    1.94     1.41     3.07     1.49     1.46     1.08
Objects of risk processes
                        CHT      NYT      TBT      UST      WAP      WSJ
life                  15.99    14.44     22.8     16.2    16.57     6.91
injury                  3.0     2.13     3.96     3.27     2.44     0.43
loss                   1.43     1.83      1.3     1.46     1.43     1.91
benefit                1.21     1.52     0.73     1.63     1.35      2.1
death                   1.2     1.29     1.17     1.25     1.24     0.67
money                   0.9     1.02     1.09     0.85     1.01      1.8
career                 0.98     0.98     1.14     1.34     1.28     0.58
damage                 1.13     0.95     0.98     1.25     1.13     0.75
wrath                  1.06     0.98      0.8     1.21     0.98     0.96
health                 1.06     0.95     1.27     1.04     0.82     0.49
fine                   1.11     0.83     1.08     1.15     0.97      0.4
reward                 0.74     0.75     0.54     1.27     0.74     1.42
reputation              0.8     0.91      0.8     0.93     0.79     0.72
arrest                 0.87     0.91     0.72     0.74     0.99     0.36
risk                   0.88     0.57     0.52     0.98     0.77     1.24
Subjects of risk, run risk, take risk
                        CHT      NYT      TBT      UST      WAP      WSJ
person                 4.23      3.3      3.9     3.59     3.76     2.11
company                1.61     2.36     0.92     2.37     1.53     4.12
investor               1.15     1.18     0.37     1.22     0.75     4.74
bank                    0.8     1.39     0.46     0.99     0.57      2.9
government             0.78     1.16     0.46     0.76     1.57     1.29
state                  0.59     1.18     1.01     1.22     1.75      0.5
man                    1.37     0.86     1.38     0.46     0.89     0.45
woman                   1.1     0.72     0.51     0.99      0.5     0.64
bush                   1.02     0.67     0.41     1.15     0.73     0.52
president              0.46     0.79     0.51     0.46     1.09     0.42
leader                 0.72     0.85     0.51     0.46     0.75     0.45
city                   0.67     0.34     2.43     0.38     0.57     0.15
player                 0.96     0.67     0.55     0.38     0.64     0.22
administration         0.56     0.67     0.37     0.53     0.64     0.62
firm                   0.32     0.49     0.28     0.31      0.5     1.36
In [24]:
for name, data in r.items():
    # skip results we're not interested in
    if name not in trans.keys():
        continue
    # refresh class attributes
    data = Interrodict(data)
    data = data.collapse()
    data = data.edit('k', SELF, skip_entries = wordlists.closedclass,
                     replace_names = r'[^a-zA-Z0-9-]', print_info = False)
    tab = data.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption(trans[name])
    display(tab)
Nouns modified by nominal risk
                        CHT      NYT      TBT      UST      WAP      WSJ
factor               624.75    13.36    14.27   233.29   170.65 -1663.58
premium             -258.55   -45.38  -201.48    -85.1   -140.4  1434.85
appetite            -144.46  -140.64  -157.13   -35.33  -229.75   1296.5
aversion            -141.43   -26.76  -161.93   -14.01   -73.38   775.49
business             263.07    -0.45     5.02    24.01    -26.6  -160.94
profile              -16.98   -22.98   -96.79    -4.02   -12.87   329.44
forecast             -32.55   -43.46   -14.36    -7.12   -32.21   233.35
officer              -74.28    30.29   -84.23    -6.51   -18.97   152.59
agency               167.06    -6.94   -11.57     -7.9    -5.42   -33.72
consultants          158.45    -7.89    -0.45    -11.3   -18.29   -32.25
control              -40.07     9.45   -28.39   -20.72   -57.97   176.71
provision            -18.38    -7.59   -19.51   -16.32   -36.59   170.58
weighting            -43.99    -2.04   -15.89    -4.86   -29.73   140.99
sentiment            -23.08   -21.97   -11.99    -7.31   -15.84   139.08
child                 76.95    -34.3    28.57    -1.04     8.43  -109.33
resident             -24.71   -33.29    -6.96    -2.43   262.12   -60.45
reversal             -20.92   -19.39   -10.86    -0.56   -20.73   116.38
exposure              -17.7    -8.43    -0.54    -5.51   -13.65   123.58
asset                -22.89    -8.98   -26.15     8.23   -19.97   107.52
taker                 -0.31    90.58     0.86    75.47   -15.38   -74.78
neglect              -17.67   -15.57    -4.06    -5.59   154.13   -32.91
capital              -12.35    -0.05   -10.72    -5.99    -5.99    76.37
disclosure            -3.87    -6.13   -20.37    -1.53    -4.74    72.09
trade                 -8.79    -9.01   -14.05     0.17    -8.63    68.35
weight               -14.06   -11.27   -10.86    -6.62     0.27    51.64
inc                   12.75   -46.96     2.39   -28.43   -47.08    92.27
parity               -12.98    -4.07    -6.74    -4.11   -12.87    64.15
mom                   61.26    -7.86    -3.18    -1.94    -6.08   -11.42
manager               -22.6    -2.03   290.93   -24.07   -57.54    10.35
arbitrager           -21.84    12.14    -25.1    -15.3    -47.9    67.27
management           -12.49     0.54    79.24  -101.66   -16.34    87.85
assessment            -6.99     9.28     -0.0   -10.32   341.18  -151.42
firm                  17.45     24.6   -20.34    -0.06     0.28   -24.23
regulator            -38.95     9.43   -20.23    -6.64    20.48     0.45
perception           -22.33     1.28   -15.65    -0.18     0.18    23.88
spectrum              -9.04     0.01    -8.23     -3.7     -6.6    42.16
ltd                   -0.44    -6.47   -13.67    -8.33    -1.73    43.38
area                  17.02   -10.45    39.47     1.12     0.03   -27.61
department            -0.37     4.51    83.56   -15.51   -35.01    -2.66
rating                -5.81     0.19   -17.37      0.0    -1.97    36.92
intelligence          -9.02     -3.0    -4.68    -0.13     -3.8    33.39
tool                   0.02    -3.25    -9.64    -1.55    -5.59     42.1
clinic                33.37    -9.24     0.76    -2.28    -0.67   -13.43
romance               36.04    -4.62    -1.87    -1.14    -3.57    -6.72
cos                   36.04    -4.62    -1.87    -1.14    -3.57    -6.72
subc                  -6.85    -8.78    -3.56    -2.17    68.77   -12.76
seen                 -10.46    64.18    -5.43    -3.31    -4.93     -8.6
analysis              -11.7    12.38   -19.15     3.23    90.24   -24.33
behavior               2.32    -1.08     5.86     3.84    31.48   -55.18
trading               -0.91    -2.38     -1.7    -9.02   -16.11    47.12
Nominal modifiers of risk participants
                        CHT      NYT      TBT      UST      WAP      WSJ
health               334.74     28.7   427.18    -0.46   176.93 -1130.47
credit               -38.48   -30.97  -149.54  -119.28    -15.4  1059.84
inflation            -51.04   -40.36  -134.54   -12.42   -28.71   663.79
downside             -41.41    -1.75  -179.29   -28.61   -41.84   694.28
interest-rate        -55.24   -31.65   -53.35   -36.12    -91.6   659.25
flight               163.41     0.26   164.02   -64.22    34.54  -352.71
currency             -72.01     6.03  -110.04   -17.11  -116.81   521.68
cancer               105.69    12.48     2.28   266.86    17.29  -412.39
heart-attack         -39.38   -70.34   -11.81    -6.33   -32.37   353.54
percent               65.75    39.19     0.44   -79.15    49.77  -313.37
breast                69.42     1.14   -28.83   335.68     2.36  -387.21
counterparty         -39.67    -6.73   -41.52   -41.37   -45.11   320.89
security              -1.35   118.97    39.46   -12.86   295.16  -392.26
default               -5.03   -23.67   -57.17    -8.63   -12.07    269.3
market                 -7.1    -1.76   -28.25   -29.19    -9.79   257.18
title                190.44    -34.7    -0.45   -18.39    -0.36   -55.63
foreign-exchange     -32.63    -8.88   -14.17    -4.96   -23.06    153.8
prepayment            -4.16    -6.11   -20.22   -25.01   -30.04   171.06
stock-market         -25.42   -21.82    -2.87    -5.52    -6.67   118.52
disease               13.05     2.83    -0.03   106.25    18.27   -167.8
escape                -5.38   -14.22   347.76    -5.78    -3.16   -90.09
price                 -0.03    -19.8   -17.58    -3.11   -11.46   126.64
litigation            -17.8     0.01   -23.97   -10.74    -5.53    117.0
trading              -14.55      0.0   -19.21    -5.84   -38.49   135.56
liquidity            -17.58     0.84   -16.12    -24.7   -19.08   121.68
national-security    -22.39     -5.7    -9.72    -9.69    -7.13    98.61
execution            -11.61     -0.0    -19.5    -6.02   -28.67   122.64
exchange-rate        -14.82    -4.11    -9.18    -3.51   -10.56     98.2
attack                 4.85     0.15    -0.13   217.98      0.0  -148.88
event                 -5.45     -0.1   -33.28    -2.73   -16.06    97.33
country               -12.2     0.15   -16.12    -4.75   -11.55    89.08
business              -5.47    -0.26    -0.15   -35.79    -1.11    96.07
heart                  2.75    51.42   -14.42   318.29    -8.58   -120.9
bank                 -21.35     0.18    -4.57    -4.53   -15.99    75.99
investment              0.3    -4.19    -13.5    -5.13    -0.11    82.63
taxpayer             -14.02   -17.41    -0.03    -1.32    -1.13    61.72
safety                15.86      2.1    31.19     0.03    45.02   -79.36
suicide               -0.13    11.04    61.59     -1.3     6.38   -58.62
balance-sheet          -3.7   -14.48    -5.44    -5.42    -6.08    64.46
own                   -0.43    34.05    14.06   -10.26     3.42   -50.06
specialty              60.9   -10.09    -3.79    -3.78    -1.15   -13.32
injury                  4.8    -2.22     1.35    89.52     7.35   -86.49
stock                 -0.07   -12.91    -3.46      0.2   -21.64    68.76
sinkhole             -19.35   -22.38   258.81    -8.37    -18.2   -29.54
fire                   0.85     4.77    17.77    31.25      0.5   -60.29
real-estate           -3.92   -10.53    -3.95    -3.94    -8.56    55.77
housing              -11.76    -4.39    -5.11    -1.17    -1.15    42.04
foreign-currency      -6.06    -4.39    -5.11    -1.17   -11.06    53.02
lifetime               9.63      0.1     1.93    17.32    17.32   -57.24
refinancing            -6.4    -0.98    -7.08    -2.48    -9.11    50.26
Nouns modified by adjectival risk
                        CHT      NYT      TBT      UST      WAP      WSJ
asset               -479.53   -43.28  -488.15   -31.05  -389.04  2635.33
bond                 -88.76   -17.61   -86.68   -27.95   -89.21   866.27
student               18.62  -138.75   884.34    -11.7   148.74  -849.32
child                 62.06  -133.39   500.18    -7.65   228.93   -790.5
debt                 -29.65   -27.25   -75.45    -38.0   -51.26   493.08
behavior             107.03    19.43    -0.09    61.78    69.09  -440.28
youth                 31.72  -171.55    289.8    -4.74   465.38  -736.97
currency             -55.02   -43.23   -32.99   -20.67   -48.55   357.99
bet                  -56.59    -0.27   -96.24     0.13   -32.85   349.78
group                 60.91    20.49    17.94    -1.38     59.9  -276.64
kid                   13.52  -145.19   314.72     4.51   105.06  -370.19
pregnancy             56.01     2.95     77.9   -10.77     3.61  -190.88
slice                -38.17    -15.8   -22.89   -14.34   -25.88   206.08
offender              -2.81    85.88    32.93     3.23      0.0  -174.47
treasurys            -23.73   -27.59   -14.23    -8.92   -27.61   177.74
company              -39.02     -0.5   -19.04     0.07    -8.74   143.77
market                 -4.4     -8.2   -31.65    -0.34   -17.99   169.24
family                45.67    -4.51    62.95    -8.26     4.28  -109.09
investment            -10.3    -0.27    -5.93    16.71   -25.88   162.11
security             -19.43      3.9    -5.43   -13.97   -22.75   154.25
loan                 -15.44    -0.22   -20.16    -8.49     0.72   136.73
lifestyle            134.59    -23.0     -0.0    -3.23     -2.9   -27.99
inmate                 2.56    -6.07   166.38     0.72    -0.15  -116.34
youngster             47.31   -17.26    24.19    -2.28    11.91   -95.14
investor              -6.31     0.11   -43.51     3.42   -47.62   135.18
reinsurance          -11.34   -13.93   -10.72    -6.72   -20.81   113.13
capital              -22.39    -3.61     2.34   -17.95   -25.25   112.47
sex                   20.09    32.59    -4.41     7.75    -0.35   -79.54
c-fund               -19.94    -23.2   -11.96    -7.49   198.06   -37.44
fund                  -6.87    -2.95    -29.3     4.94    -6.84   104.75
woman                 45.73    -0.04     0.01    21.68     6.51   -78.23
situation              33.8     5.44    41.25    -2.49    -0.59   -68.74
firm                 -12.63    -3.45   -42.32     0.48    -1.52    88.83
business              68.72     9.74     6.54    -1.96     0.11   -34.28
borrower              -0.85   -11.57   -55.38    -2.19     7.94    81.32
profile               -2.08    -6.57   -24.23    -0.89    -6.59    89.25
teens                 21.43   -44.05   108.48     1.68      0.1   -79.18
requirement           -27.3    -1.89    -0.04    -9.24    -4.72    67.83
ratio                 -12.3     -0.0    -3.69    -0.27   -15.15    87.54
stock                  0.35   -20.62   -27.17    23.01    -6.45    69.99
category              34.65     1.33    21.89     0.01    -0.29    -47.9
product              -21.89     0.33   -22.14     6.77    -5.06    62.53
return                -4.31    -0.01   -55.89    20.21   -32.35    85.85
neighborhood          16.43     0.06     8.39     2.26     1.64   -66.47
mortgage             -15.05    -2.59    -2.78     -0.0     0.03    64.36
individual            19.07     -0.0     3.01     0.01     25.3   -70.07
assumption             -0.9    93.57    -4.32    -5.14    -7.35   -14.55
type                  -4.63     -4.6    -5.85    -0.28    -5.36    68.99
sector                -7.32    -8.47   -12.41    -4.45     0.18    58.94
trade                -10.75     3.27   -39.15     1.46    -10.7    60.01
Adjectives modifying risk
                        CHT      NYT      TBT      UST      WAP      WSJ
systemic            -345.27    -5.71  -177.17   -95.21   -10.56  1051.43
high                  28.22    41.85   189.21    47.12    18.78  -371.77
interest-rate        -22.96    -34.5    -30.9   -30.39   -40.71    362.9
increase             227.24   -10.38    -0.69    40.46    32.58   -98.92
great                  5.54    24.24    75.52    -2.89     85.2  -288.52
                     -58.72  -120.28   -29.77   323.43   -35.12     91.7
excessive            -86.04    19.78   -87.25    -3.12     -1.7   162.25
regulatory            -22.7    -2.77   -18.77   -22.22   -30.78   222.81
greater               92.51    -2.17    79.67     23.4    11.14  -110.19
genuine               43.71    21.73     3.27    16.03    -7.04  -145.87
sovereign            -49.13     0.11   -30.51    -1.24   -34.48   155.28
geopolitical         -30.23   -12.11   -41.02    28.47   -19.12   143.44
breast-cancer         -4.72    -39.7    -3.51    -1.99   -31.95   173.15
own                   49.14    -5.66   226.19    -0.05    -1.27    -79.0
greatest              13.53     6.46    20.83    13.84    38.16  -170.84
political            -65.94    60.95   -68.28   -70.54    21.26   121.89
foreign-exchange     -24.86   -11.85   -10.77    -4.13   -27.15   143.83
reputational         -31.79     0.04    -38.5   -10.85    -2.91   114.75
biggest               -3.25    -0.83   -21.59    14.26   -40.97   161.83
calculated            15.66     9.36    53.43     2.17     1.23  -116.76
highest                32.3     2.86    11.28    51.21     -0.0  -110.67
inflationary         -20.99    -6.28   -13.12    -3.79     -1.8   108.44
economic              -18.2     4.45   -17.86   -11.06     -3.8    107.2
cardiovascular       -16.46     0.04   -43.49    12.31    -9.18    94.95
near-term             -8.33    -4.32    -20.4    -1.27   -12.77   109.64
legal                -24.07     2.65    -10.1    -2.38    -4.68    95.38
perceive              -3.43    -2.73   -22.17    -14.4    -0.64   108.09
key                  -14.74   -20.02    -4.53     0.21    -1.44    87.16
coronary              -0.37   104.61     0.25    -0.36   -33.23   -31.32
corporate            -35.19    -1.17    -1.24    -6.94    -1.26    76.26
grave                  0.41    71.17   -12.44    -2.61    25.18   -83.82
serious               19.77    21.24     2.06    -4.98    51.05   -97.68
new                  -21.82     7.11   -32.59    -0.24    -0.77    76.93
physical               4.21    17.52     2.26    -0.16     10.8   -89.49
emotional             39.99     10.5   -11.54    -2.28     0.31   -52.94
big                   -4.29     0.15     -0.1    35.89    -75.0   130.28
upside                -5.86    -3.18   -24.14    -0.11   -13.03    89.23
financial             -1.89     0.66    19.38   -54.17    -6.35    107.6
less                   -0.4    -0.07    -3.21     0.79   -17.21    97.64
rise                 -18.53      0.6   -12.28    -0.29   -11.62    74.92
artistic              28.96    29.58    -1.52    -2.64   -15.47   -30.18
grow                 -11.13     0.19   -25.24    -0.06    -1.02     66.6
provision             -8.66   -12.37    -3.75    -3.22    -9.46     69.4
global               -15.45    -1.07    -8.84      0.0     -0.9    60.55
public                 8.21    -8.05     43.7     3.44     32.4   -84.77
additional           -13.83    -0.02      3.1    -0.01   -11.59    68.33
operational           -6.85     -3.4   -12.23    -4.85    -0.01    59.21
bigger                -2.34    -2.35    -1.85     6.43   -15.84    71.89
higher                186.9  -143.79     0.74   181.63     -5.6     0.59
main                  -5.26     0.07   -10.95    -0.25    -7.96    59.65
Objects of risk processes
                        CHT      NYT      TBT      UST      WAP      WSJ
life                  26.81    -0.12   406.84     10.3    53.14  -599.24
injury                35.98    -3.18   100.17    18.77     1.37  -272.51
capital               -7.92    -0.07   -22.23    -1.47   -14.45    182.2
management            -8.29   -10.49     2.69   -12.74    -17.1   143.94
appetite             -11.47    -9.49    -6.83    -3.49   -13.07    98.39
return                -6.95     0.33   -13.64     0.13    -7.56    88.48
cost                  -5.66    -0.27    -0.28     0.18     -5.3    68.03
inflation             -7.75     1.07   -29.18     0.07    -1.03     57.2
aversion              -7.87    -0.98    -8.29    -4.24     -6.0     62.2
profit                -8.68     0.66    -9.51    -3.76    -5.14    60.89
limb                  33.77    -6.18    25.08    -0.82      0.0   -31.31
money                 -4.15    -0.29     0.13    -2.01    -0.46    55.96
premium               -2.42    -2.69    -3.35    -2.98    -3.93    57.37
reward                -0.97    -1.33   -10.25    11.59    -1.65    48.84
backlash              -1.24   -11.16   -18.69     3.58     13.9    32.42
benefit               -2.68     5.08   -33.93      2.9    -0.03    45.61
market                -4.39    -1.45    -6.16    -1.22    -0.63    46.24
downgrade             -5.28    -1.25   -10.97     0.03     -1.7    42.96
fine                  12.33    -0.37     5.21     4.48     2.61   -37.19
price                  0.64    -7.07    -5.62    -2.98    -3.93    45.83
debt                  -4.81   -14.19    -1.79     0.15    -0.01    33.26
decline               -6.69     0.15    -1.98    -0.62    -5.46    40.08
ire                   -3.38    -0.28    -1.69    -1.21    -0.04    38.81
risk                   3.77   -11.62    -8.35     3.12     0.02    33.37
safety                 5.01    -3.28    65.77     -0.2    -1.44   -21.57
spiral               -13.11      1.5     -7.8      0.0    -3.04    23.65
health                 6.77     1.32    15.79     1.52     -1.0    -24.3
recession              0.63    -1.19   -17.26     -0.0    -0.16    31.28
volatility             0.31    -0.14    -16.1    -0.29    -8.75    35.14
arrest                 2.23     6.57    -0.35    -0.04    11.55   -32.42
profile               -0.85    -2.96     -7.8      0.0    -0.56    27.43
suspension             11.0    -2.27     1.37     6.35     0.04   -22.47
cancer                27.04    -1.56    -1.08     0.94    -2.41    -3.03
bone                  24.31    -0.76    -2.29     0.06     -4.0    -3.38
tempo                 26.98    -4.54    -1.95     -1.0    -3.73    -2.39
death                  1.66      7.9      0.4     1.07     3.13   -23.99
investor              -1.11    -0.59    -6.58    -3.37    -0.64    25.77
marriage              22.43    -1.61    -1.11    -0.29    -0.05    -4.13
control               -9.57     8.38    -6.89    -3.64    -2.82    21.59
friendship            16.65      0.0    -1.08    -1.43    -0.19    -7.24
rate                   0.77    -2.54    -0.09    -3.42    -4.33    26.23
deflation             -1.24    -2.55    -3.17    -1.62    -0.25    20.72
sentence               -0.0      0.2    38.33   -14.21    -0.03   -12.82
dilution              -0.01    -1.79    -4.15    -2.12     -3.0    23.19
uncertainty           -6.33    14.07   -12.36    -2.89    -4.39    25.73
sector                -2.05    -2.84    -1.22    -0.62    -2.33    19.76
illiquidity           -2.05    -2.84    -1.22    -0.62    -2.33    19.76
season                 4.88     1.08     0.08     0.78     -0.7   -16.41
embarrassment          6.15     7.39     -5.5     2.71    -0.02   -13.13
technology            -4.51     0.04    -2.68    -1.37    -5.13    19.63
Subjects of risk, run risk, take risk
                        CHT      NYT      TBT      UST      WAP      WSJ
investor              -8.12   -12.52   -35.54    -1.92    -34.3   213.49
cub                   83.76    -14.9    -5.19    -3.05   -11.11    -9.99
bank                  -9.07     0.64   -16.83    -1.01   -26.74    82.59
bear                  62.13    -3.15    -6.49    -3.81   -13.89    -6.65
company                -8.7     0.44   -24.09     0.09   -14.11     67.6
firm                  -6.58    -1.45    -5.17    -2.38     -0.9     40.1
sox                    34.7    -7.02     0.33    -0.08    -9.72    -8.74
village               35.21    -8.07    -2.81    -1.65    -1.66    -5.41
fed                   -2.05    -5.36   -15.76     1.79    -0.17    27.48
person                 8.66    -0.35     1.57     0.13      1.8   -26.93
city                   0.16   -11.88    79.86    -1.51    -0.29   -24.02
fund                  -0.95    -0.01    -5.89    -2.98    -15.8    40.56
maker                 -8.12     1.27    -8.01    -4.69   -17.13    33.65
national              -6.14    -9.93    -3.46      0.0    33.02    -2.11
athlete               12.22    -0.02    -0.92     2.99    -2.09   -14.57
oriole                -4.61    -7.45     -2.6    -1.52    37.85     -5.0
ecb                   -2.69    -4.35    -1.51    -0.89    -3.24    23.41
man                     9.6    -0.17     5.15    -3.73    -0.02   -13.36
player                 8.86     0.58    -0.11    -1.27     0.11   -14.49
redskin               -2.03   -10.55    -3.68     -0.0    35.92    -7.08
pampg                  -2.3    -3.72     -1.3    -0.76    -2.78    20.06
portfolio              -2.3    -3.72     -1.3    -0.76    -2.78    20.06
daley                 20.94    -3.72     -1.3    -0.76    -2.78     -2.5
orchestra             15.66    -1.33     0.01    -1.14    -4.17    -3.75
morgan                -4.22    -2.15    -2.38     -1.4    -0.04    15.11
stock                 -0.63    -6.83    -2.38     -1.4    -0.04    15.11
goldman               -4.99    -0.09    -2.81    -1.65    -1.66    15.71
ray                  -11.52   -18.62   123.54    -3.81    -7.81   -12.49
hawk                  19.68    -3.51    -3.03     0.02    -1.97    -1.55
loan                  -1.92     -3.1    -1.08    -0.63    -2.31    16.72
retailer              -2.32    -0.99    -1.63     0.07    -3.62    18.06
alderman              17.45     -3.1    -1.08    -0.63    -2.31    -2.08
                      -3.84    -6.21    -2.16     5.41    -4.63    12.44
student               -0.41    -0.44     8.05    -0.22     7.04   -20.82
option                -7.68    -3.46    -0.73    -2.54     8.44     2.92
county                -1.41   -11.96    76.98    -5.71    -1.63    -8.22
beijing               -0.72    -0.32    -4.11    -2.41    -3.65    18.03
florida               -2.84    -3.46    53.73    -2.54    -4.01    -8.33
yankee                -5.76    23.81    -0.24     -1.9    -2.29    -6.24
child                  0.06     3.65     3.07    -1.62    -0.02   -13.15
chrysler               -2.3    -3.72     -1.3    -0.76    -0.06    11.73
customer              -0.04    -8.46    -0.01      0.0    -0.41     11.2
trust                 -3.46     -0.1    -1.95    -1.14    -4.17     14.7
cooper                -2.69    -4.35    -1.51    -0.89    22.08    -2.91
troop                 -3.11    -2.46    -2.31      0.0    18.52     -0.5
mets                  -6.53    28.56    -3.68    -2.16    -7.87    -0.62
owner                  9.92    -2.18     1.54     3.14    -3.27     -5.3
stanley               -1.54    -2.48    -0.87    -0.51    -1.85    13.37
inflation             -1.54    -2.48    -0.87    -0.51    -1.85    13.37
cutler                13.96    -2.48    -0.87    -0.51    -1.85    -1.67
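The signed scores in the keyword tables above look like log-likelihood keyness values, where a positive score marks a word that is over-represented in a newspaper and a negative score one that is under-represented. The notebook does not state this, so the sketch below is an assumption about what `edit('k', SELF)` computes rather than corpkit's own code; the function name and the choice of reference corpus are hypothetical.

```python
import math

def signed_log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness of one word.

    a: count of the word in the target corpus
    b: count of the word in the reference corpus
    c: total words counted in the target corpus
    d: total words counted in the reference corpus
    """
    # expected counts if the word were spread evenly over both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    ll *= 2
    # negative when the word is under-represented in the target
    return ll if a / c >= b / d else -ll

# Hypothetical counts: a word seen 50 times in the target, 10 in the reference.
print(signed_log_likelihood(50, 10, 1000, 1000))
print(signed_log_likelihood(10, 50, 1000, 1000))
```

Because the score grows with raw frequency as well as with the size of the imbalance, frequent words such as factor or asset can reach very large values, which fits the range of scores seen in the tables.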
In [4]:
from corpkit import Interrodict
bypaper = OrderedDict([(x, {}) for x in ['CHT', 'NYT', 'TBT', 'UST', 'WAP', 'WSJ']])
# each search and its interrodict
for interro, data in r.items():
    if interro not in trans.keys():
        continue
    # each newspaper and its interrogation
    for paper, datum in data.items():
        bypaper[paper][interro] = datum
bypaper = Interrodict([(k, Interrodict(v)) for k, v in bypaper.items()])
Now we can create a sketch of risk behaviour in each publication:
In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.close('all')
for i, (name, data) in enumerate(bypaper.items()):
    # integer division, because plt.subplots needs a whole number of rows
    dim = len(data.values()) // 2
    display(HTML('<h1>Features of <i>risk</i> in the %s:</h1><br>' % name))
    f, axarr = plt.subplots(dim, 2)
    #f.suptitle('Features of risk in the %s' % name, fontsize = 20)
    for index, (interro, datum) in enumerate(data.items()):
        # flatten the grid of axes so each search gets its own subplot
        ax = axarr.reshape(-1)[index]
        #colour = colours[index]
        datum = datum.edit(skip_entries = wordlists.closedclass, print_info = False)
        # share of each word in this newspaper's total for this search
        datum = datum.results.sum() / datum.results.sum().sum()
        datum.visualise(trans[interro], kind='bar', ax=ax, figsize = (15, 7), colours = sixcolours[i],
                        num_to_plot = 10, grid = True, x_label = 'Word')
    plt.show()
Features of risk in the CHT:
Features of risk in the NYT:
Features of risk in the TBT:
Features of risk in the UST:
Features of risk in the WAP:
Features of risk in the WSJ:
In [ ]:
# proper noun risker only
q = {T: r'/NNP.?/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
priskers = corpora.interrogate(q, show = L)
priskers.save('priskers')
In [ ]:
# non-proper noun risker
q = {T: r'/NN[^P]*/ !< /(?i).?\brisk.?/ >># (@NP $ (VP <+(VP) (VP ( <<# (/VB.?/ < /(?i).?\brisk.?/) '\
r'| <<# (/VB.?/ < /(?i)(take|taking|takes|taken|took|run|running|runs|ran|put|putting|puts)/) '\
r'< (NP <<# (/NN.?/ < /(?i).?\brisk.?/))))))'}
nriskers = corpora.interrogate(q, show = L)
nriskers.save('nriskers')
In [ ]:
# any entity
q = {T: r'/NN.?/ !< /(?i).?\brisk.?/ >># NP'}
entities = corpora.interrogate(q, show = L, quicksave = 'entities')
entities.save('entities')
In [ ]:
# do maths
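# riskers as a percentage of all occurrences of the same noun
prelrisk = priskers.collapse().edit('%', entities.collapse())
# saved under the names that load_all_results() reports above
prelrisk.save('prelrisk')
nrelrisk = nriskers.collapse().edit('%', entities.collapse())
nrelrisk.save('nrelrisk')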
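As we read it, the `'%'` operation expresses each risker count as a percentage of how often the same noun heads any noun phrase in the same publication. The sketch below reproduces that arithmetic in plain pandas under that assumption. The numbers are invented, and `relative_percent` is a hypothetical helper, not corpkit's implementation:

```python
import pandas as pd

# Invented counts, one row per publication (not real corpus data)
risker_counts = pd.DataFrame({'bush': [12, 6], 'fed': [3, 9]},
                             index=['CHT', 'NYT'])
entity_counts = pd.DataFrame({'bush': [400, 300], 'fed': [150, 0], 'china': [200, 100]},
                             index=['CHT', 'NYT'])

def relative_percent(numer, denom):
    """Cell-wise percentage of numer over denom, aligned on shared labels."""
    denom = denom.reindex(index=numer.index, columns=numer.columns)
    pct = numer * 100.0 / denom
    # a zero or missing denominator has no meaningful percentage; report 0.0
    pct = pct.replace([float('inf'), float('-inf')], float('nan'))
    return pct.fillna(0.0)

rel = relative_percent(risker_counts, entity_counts)
print(rel.round(2))
```

Read this way, a value of 3.0 for `bush` in `CHT` would mean that 3% of that paper's `bush` noun phrases are the subject of a risk process.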
In [27]:
r.prelrisk.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Proper noun riskers')
Out[27]:
Proper noun riskers
                    CHT     NYT     TBT     UST     WAP     WSJ
bush               3.35    1.45    0.71    1.97    1.23    1.46
state              0.18    0.37    0.16    0.31    0.49    0.08
clinton            1.99     1.4    1.17    2.25    1.55     2.1
fed                2.27     1.2    0.92    1.67    1.57    0.94
american            0.6    0.83    0.92    0.32    0.55     0.4
israel             2.31    1.84    2.75    2.72    2.04    1.61
u.s.               1.22    0.12     0.0    0.66    0.25    0.53
obama              1.57     2.0     0.8    1.77    0.97     1.5
democrat           1.23    0.88    0.69    1.13    0.97     0.5
republican         0.88    0.91    0.49    0.95    0.85    0.84
congress           0.51     0.5    0.41    0.32    0.39    0.53
washington         0.62    0.44     0.0    0.51    0.63     0.3
america            0.23    0.67    0.33    1.14    0.45    0.09
president          0.06    0.39     0.0     0.0    0.02    0.03
ray                 0.0     0.0    4.64     0.0    0.71     0.0
house              0.25    0.24     0.0     0.0    0.22    0.19
china               0.3    0.42     0.0    0.75    0.35    0.41
bear               2.77    1.19     0.0     0.0     0.0    0.31
gorbachev          2.82     3.0    4.26     0.0    2.46    1.96
hussein            1.29    1.85    2.35    1.02    1.22    0.51
cub                4.85     0.0     0.0     0.0     0.0     0.0
iraq               0.37     0.4     0.0    0.32     0.2    0.14
west               0.49    0.72    0.37     0.0    0.81    0.87
sox                3.31    1.03    6.52     3.7     0.0     0.0
iran               1.61    0.31     0.0     0.0    1.02    0.39
florida            0.34    0.41     0.4     0.0    0.26     0.0
beijing             2.0    1.11     0.0     0.0    0.38    1.85
nbc                1.92    2.12    4.62    1.68    2.94    2.61
arafat             4.31    2.55    6.45    4.88    2.34     0.0
gop                4.65     0.0    1.41    3.08    3.14    1.73
department         0.24     0.2    0.03     0.0    0.08    0.04
administration      0.0    0.34    0.08     0.0    0.02    0.06
russia             0.27    0.55     0.0    0.55    0.35    0.52
williams           1.31    1.02    0.45     0.0    0.82     0.0
musharraf          2.94    2.17     0.0   13.33    5.71     4.6
mets                0.0    3.25     0.0     0.0     0.0    3.03
netanyahu          1.37    4.46     0.0     0.0    3.94    6.15
mccain             1.95    1.35    1.63     0.0    1.25    0.75
council            0.13    0.28    0.42     0.0    0.38    0.22
cuban              5.48    4.12    1.59     0.0     4.9    1.64
gore               0.93    1.01    1.61    0.64    0.42     0.9
johnson            0.19     0.0    1.26    0.59    0.69    0.19
iraqi              0.79    1.29    2.56    2.15    1.43    0.62
reagan             0.88     1.5    0.59     0.0    0.99    1.69
bank               0.06    0.04    0.05     0.0    0.02    0.03
putin               3.8    0.53     5.0     0.0     2.9    2.12
yankee              0.0    2.65    1.54     0.0    1.49     0.0
japan               0.0    0.62    0.74    0.52    0.15    0.22
fox                 2.3    1.84    0.83     0.0     0.6     1.0
sharon             7.46     4.2     0.0    5.26    0.69    2.63
In [28]:
r.nrelrisk.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Noun riskers')
Out[28]:
Noun riskers
                    CHT     NYT     TBT     UST     WAP     WSJ
person              0.8    0.67    0.65    0.58    0.69    0.48
company            0.75    0.65    0.35    0.76    0.58    0.48
investor           0.88     0.7    0.48    0.54    0.65    0.66
bank                0.9    0.67    0.48    0.73    0.38    0.47
government         0.81    0.77     0.4     0.6    0.83    0.59
state              0.43     0.5    0.36    0.55    0.68    0.39
man                0.86     0.6    0.83    0.26    0.57     0.5
woman              0.31    0.31    0.19    0.28    0.19     0.4
bush               3.35    1.45    0.71    1.97    1.23    1.46
president          0.52    0.67    0.52    0.44    0.96    0.28
leader             1.54    1.19    0.89    0.82    0.97    0.76
city               0.87    0.35    1.49    0.32    0.76    0.29
player             1.52    1.15    0.76     0.4    1.13    0.53
administration     1.12    0.87    0.64    0.72    0.62    0.75
firm               0.51    0.58     0.6    0.38    0.58    0.44
clinton            1.99     1.4    1.17    2.25    1.55     2.1
team               0.74    0.84    0.82    0.66    0.85    0.48
american           0.73    0.86    1.28     0.4    0.65    0.63
fed                2.27     1.2    0.92    1.67    1.65    0.94
anyone              1.6    1.37    1.41    1.61    1.29    1.15
official            0.3     0.3     0.2    0.09    0.32    0.14
manager            0.96    0.86    0.21    1.04    0.62    0.49
republican         1.33    1.16    0.81    1.42    1.35    1.36
country            0.36    0.21    0.38    0.47    0.36    0.38
owner              2.01     1.0    0.94    1.85    0.86    0.63
fund               0.19    0.24     0.1    0.05    0.08     0.2
member             0.34    0.62    0.39     0.4    0.46    0.41
israel             2.31    1.84    2.75    2.72    2.04    1.61
worker             0.51    0.66    0.59    0.15    0.51    0.44
student            0.43    0.61     0.4     0.3    0.59    0.16
others             0.47    0.52    0.59    0.51    0.44    0.25
someone            0.67     0.9    0.88    1.73    0.83     0.9
democrat           1.37    0.99    0.69    1.13     1.1     0.5
u.s.               1.22    0.12     0.0    0.66    0.25    0.53
driver             1.13    0.95    1.21    1.55    0.74    0.74
party              0.73    0.84    0.58    0.71    0.87    0.43
obama              1.57     2.0     0.8    1.77    0.97     1.5
child              0.14    0.26    0.14    0.06    0.13    0.08
one                0.61     0.6    0.57    0.74    0.36    0.46
school             0.22    0.23    0.17     0.2     0.4    0.22
move               0.41    0.95    0.37    0.14    0.64    0.51
way                0.31    0.09    0.22    0.12    0.15    0.06
executive          0.58    0.42    0.38    1.23    0.95    0.25
soldier            1.08    0.97    3.61    1.22    1.71    1.33
group              0.16    0.17    0.22    0.09    0.16    0.13
congress           0.51     0.5    0.41    0.32    0.39    0.53
employee            0.6    0.38    0.43    0.28    0.57    0.58
network            0.78    1.71    1.16    2.21    0.31    0.51
time                0.1    0.12    0.11    0.07    0.06    0.06
life               0.09    0.14    0.11    0.25    0.17    0.12
In [10]:
kpriskers = r.priskers.collapse().edit('k', SELF, print_info = False)
kpriskers.results.T.head(20).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Keyness of proper noun riskers')
Out[10]:
Keyness of proper noun riskers
                    CHT     NYT     TBT     UST     WAP     WSJ
cub               82.52  -15.94   -5.22   -3.09  -11.24    -8.6
bear              64.27   -5.56   -6.31   -3.73  -13.59   -4.95
u.s.              12.82  -42.33  -16.75   -0.15  -22.85   61.95
fed               -2.27   -6.78  -15.67    1.77   -0.44   36.27
sox               33.94   -7.77    0.33   -0.08   -9.84   -7.53
hawk              28.08   -2.47   -2.39   -1.41   -5.15   -3.94
president          -3.5   41.65   -6.74   -3.99   -8.35   -2.79
oriole            -4.74   -7.97   -2.61   -1.54   37.59    -4.3
redskin           -4.34   -7.31   -2.39   -1.41   34.46   -3.94
p&g               -2.37   -3.98   -1.31   -0.77   -2.81   21.69
morgan            -4.34   -2.47   -2.39   -1.41   -0.05   17.11
goldman           -5.13   -0.18   -2.83   -1.67    -1.7   17.92
daley             20.63   -3.98   -1.31   -0.77   -2.81   -2.15
beijing            -0.8   -0.52   -4.13   -2.44   -3.73   20.88
national          -3.95   -6.64   -2.18    0.21   22.16   -3.58
chrysler          -2.37   -3.98   -1.31   -0.77   -0.07   13.03
ray              -11.85  -19.92  123.26   -3.86   -7.95  -10.75
street            -4.34    -0.6   -2.39    0.14   -1.12   12.32
stanley           -1.58   -2.66   -0.87   -0.51   -1.87   14.46
cooper            -2.76   -4.65   -1.52    -0.9   21.93   -2.51
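The keyness tables compare each publication against the combined data (`SELF`), with positive scores marking words that are over-represented as riskers in a paper and negative scores marking under-represented ones. The notebook does not state which statistic produces the scores. The sketch below assumes a signed log-likelihood, a common choice for keyness, and is not taken from corpkit's source. The function name `signed_log_likelihood` and all counts are hypothetical:

```python
import math

def signed_log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness for one word.

    a: word count in the target corpus, b: word count in the reference corpus,
    c: total tokens in the target,      d: total tokens in the reference.
    """
    total = float(c + d)
    e1 = c * (a + b) / total   # expected count in target if usage were uniform
    e2 = d * (a + b) / total   # expected count in reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    # negative when the word is relatively rarer in the target
    if a / float(c) < b / float(d):
        ll = -ll
    return ll

over = signed_log_likelihood(30, 10, 100, 100)    # over-represented in target
under = signed_log_likelihood(10, 30, 100, 100)   # under-represented in target
same = signed_log_likelihood(10, 10, 100, 100)    # identical relative frequency
print(round(over, 2), round(under, 2), round(same, 2))
```

On this reading, a large positive score such as `cub` in CHT indicates a risker strongly tied to one paper, while scores near zero indicate riskers distributed evenly across the six.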
In [9]:
knriskers = r.nriskers.collapse().edit('k', SELF, print_info = False)
knriskers.results.T.head(50).style.background_gradient(cmap=divergemap, axis = 1).set_caption('Keyness of noun riskers')
Out[9]:
Keyness of noun riskers
                    CHT     NYT     TBT     UST     WAP     WSJ
investor          -8.37  -12.27  -35.81   -1.99  -33.89   213.5
cub                83.5  -14.84   -5.22   -3.06  -11.06   -9.99
bank               -9.3    0.69   -17.0   -1.05  -26.42   82.59
bear              61.88   -3.11   -6.52   -3.83  -13.82   -6.65
company           -9.01     0.5  -24.37    0.08  -13.79   67.61
u.s.              13.48  -38.97  -16.74   -0.14  -22.33   52.25
firm              -6.71    -1.4   -5.24   -2.43   -0.86    40.1
sox               34.55   -6.98    0.33   -0.08   -9.68   -8.74
village           35.09   -8.04   -2.83   -1.66   -1.64   -5.41
fed               -2.12   -5.27  -15.85    1.75   -0.15   27.49
person             8.27   -0.29    1.47     0.1    1.95  -26.93
city               0.14  -11.74   79.49   -1.55   -0.26  -24.02
fund               -1.0   -0.01   -5.95   -3.02  -15.66   40.57
maker             -8.19     1.3   -8.04   -4.72  -17.05   33.65
national          -6.18   -9.89   -3.48     0.0   33.13   -2.11
athlete           12.11   -0.01   -0.93    2.96   -2.05  -14.57
oriole            -4.63   -7.42   -2.61   -1.53   37.94   -4.99
ecb                -2.7   -4.33   -1.52   -0.89   -3.23   23.41
man                9.38   -0.15    5.06    -3.8   -0.02  -13.36
player             8.68    0.61   -0.12    -1.3    0.13  -14.49
redskin           -2.05  -10.51   -3.69    -0.0   36.04   -7.08
p&g               -2.32   -3.71    -1.3   -0.77   -2.76   20.06
portfolio         -2.32   -3.71    -1.3   -0.77   -2.76   20.06
daley             20.88   -3.71    -1.3   -0.77   -2.76    -2.5
orchestra         15.59   -1.32    0.01   -1.15   -4.15   -3.75
morgan            -4.25   -2.13   -2.39    -1.4   -0.04   15.11
goldman           -5.02   -0.09   -2.83   -1.66   -1.64   15.71
stock             -0.64    -6.8   -2.39    -1.4   -0.04   15.11
ray              -11.58  -18.55   123.3   -3.83   -7.76  -12.49
hawk              19.59   -3.48   -3.04    0.02   -1.95   -1.55
loan              -1.93   -3.09   -1.09   -0.64    -2.3   16.72
retailer          -2.35   -0.97   -1.64    0.07   -3.58   18.06
alderman           17.4   -3.09   -1.09   -0.64    -2.3   -2.08
%                 -3.86   -6.18   -2.17    5.38   -4.61   12.44
student           -0.44   -0.42    7.97   -0.23    7.14  -20.81
option            -7.72   -3.43   -0.74   -2.55     8.5    2.92
beijing           -0.74   -0.31   -4.13   -2.42   -3.62   18.03
county            -1.45  -11.88   76.75   -5.74    -1.6   -8.22
florida           -2.88   -3.43    53.6   -2.55   -3.98   -8.32
yankee            -5.79   23.89   -0.24   -1.91   -2.26   -6.24
child              0.05    3.72    3.02   -1.65   -0.01  -13.15
chrysler          -2.32   -3.71    -1.3   -0.77   -0.06   11.73
customer          -0.04    -8.4   -0.01     0.0    -0.4   11.21
trust             -3.48   -0.09   -1.96   -1.15   -4.15    14.7
cooper             -2.7   -4.33   -1.52   -0.89   22.13   -2.91
troop             -3.16   -2.42   -2.33     0.0   18.65    -0.5
mets              -6.56   28.65   -3.69   -2.17   -7.83   -0.62
owner              9.76   -2.13    1.51    3.09    -3.2   -5.29
stanley           -1.54   -2.47   -0.87   -0.51   -1.84   13.38
inflation         -1.54   -2.47   -0.87   -0.51   -1.84   13.38
In [8]:
r.prelrisk.results.T.iloc[:10,:10].visualise(kind='bar', grid = True,
legend_pos = 'upper right', figsize = (12, 5),
colours = sixcolours, x_label = 'Publication')
r.nrelrisk.visualise(kind='bar', grid = True, legend_pos = 'upper right', figsize = (12, 5),
x_label = 'Publication')
Out[8]:
<module 'matplotlib.pyplot' from '/Users/daniel/virtenvs/ssled/lib/python2.7/site-packages/matplotlib/pyplot.pyc'>
Content source: interrogator/risk