In [1]:
from sklearn.datasets import fetch_20newsgroups
categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]

def fetch_subset(subset):
    # Load one split of the four categories, dropping headers, signature
    # footers and quoted replies so that mostly message bodies remain.
    return fetch_20newsgroups(
        subset=subset, categories=categories,
        shuffle=True, random_state=42,
        remove=('headers', 'footers', 'quotes'))

train = fetch_subset('train')
test = fetch_subset('test')
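The `remove=('quotes', ...)` option only drops lines that match a fixed pattern, so some quoted text survives. For example, test document 7, explained further down, still has its quoted lines prefixed with `:`. The sketch below re-implements that line filter as I understand scikit-learn's private helper. The exact regular expression is an assumption and may differ between scikit-learn versions.

```python
import re

# Assumed to approximate the pattern scikit-learn applies for remove=('quotes',);
# not the library's definitive rule.
QUOTE_RE = re.compile(
    r"(writes in|writes:|wrote:|says:|said:"
    r"|^In article|^Quoted from|^\||^>)"
)

def strip_quoted_lines(text):
    """Drop lines that look like quoted replies or attribution lines."""
    kept = [line for line in text.split("\n") if not QUOTE_RE.search(line)]
    return "\n".join(kept)

post = "\n".join([
    "In article <1@x.edu> bob writes:",
    "> quoted with an angle bracket",
    ": quoted with a colon",
    "my own reply",
])
print(strip_quoted_lines(post))
```

Lines quoted with `:` pass through this filter unchanged. That may explain why the `░:░` n-gram carries sizeable weights in the explanations below.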
In [2]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
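# Character 3- and 4-grams taken within word boundaries ('char_wb'), tf-idf weighted.
vec = TfidfVectorizer(analyzer='char_wb', ngram_range=(3, 4))
# Linear classifier; n_jobs=-1 fits the one-vs-rest binary problems in parallel.
clf = SGDClassifier(n_jobs=-1)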
pipeline = Pipeline([('vec', vec), ('clf', clf)])
pipeline.fit(train['data'], train['target'])
Out[2]:
Pipeline(steps=[('vec', TfidfVectorizer(analyzer='char_wb', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(3, 4), norm='l2', preprocessor=None, smooth_idf=True,
... penalty='l2', power_t=0.5, random_state=None, shuffle=True,
verbose=0, warm_start=False))])
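With `analyzer='char_wb'`, character n-grams are built only from text inside word boundaries, and each word is padded with one space on each side. In the eli5 text output below, the `░` glyph appears to stand for that padding space (for example `░ath` and `ace░`). `build_analyzer()` shows which features a given string produces:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same settings as the pipeline above: word-bounded character 3- and 4-grams.
analyzer = TfidfVectorizer(analyzer='char_wb', ngram_range=(3, 4)).build_analyzer()

# The text is lowercased, each word is padded with spaces, then n-grams are taken.
grams = analyzer("Space")
print(grams)
```

This also explains why some features appear twice in the flattened HTML tables further down. For sci.space, `spa` is listed at +2.723 and again at +2.470, and the text output shows that the second one is `░spa`. Its leading padding space is not visible in the HTML rendering.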
In [9]:
from eli5 import explain_weights, explain_prediction
from eli5 import format_as_html, format_as_text, format_html_styles
print(format_as_text(explain_weights(clf, vec, target_names=train['target_names'])))
Explained as: linear model
Features with largest coefficients per class.
Caveats:
1. Be careful with features which are not
independent - weights don't show their importance.
2. If scale of input features is different then scale of coefficients
will also be different, making direct comparison between coefficient values
incorrect.
3. Depending on regularization, rare features sometimes may have high
coefficients; this doesn't mean they contribute much to the
classification result for most examples.
y='alt.atheism' top features
Weight Feature
------ -------
+2.761 heis
+2.240 eis
+2.136 eist
+1.953 ░ath
+1.915 thei
+1.881 ░pos
+1.872 hei
+1.821 nat
+1.748 sla
+1.709 post
+1.686 slam
+1.656 ish░
+1.646 rna
+1.633 athe
+1.596 lam
+1.548 it░
+1.519 ░is
… 20221 more positive …
… 31994 more negative …
-1.519 pac
-1.522 ░*░
-1.539 ░us
y='comp.graphics' top features
Weight Feature
------ -------
+2.089 file
+1.947 ░3d
+1.936 phi
+1.783 gra
+1.749 raph
+1.744 fil
+1.734 mage
+1.725 ima
+1.696 mag
+1.670 hics
+1.668 aphi
+1.630 phic
+1.620 aph
+1.560 imag
+1.538 grap
+1.493 rap
… 26012 more positive …
… 29226 more negative …
-1.553 ░spa
-1.638 ░na
-1.854 pace
-1.932 spac
y='sci.space' top features
Weight Feature
------ -------
+3.213 spac
+3.136 pace
+2.723 spa
+2.533 pac
+2.470 ░spa
+1.960 orb
+1.866 ace░
+1.862 rbit
+1.839 rbi
+1.830 orbi
+1.795 ░nas
+1.773 ░sp
+1.772 ░orb
+1.723 nas
+1.674 ace
+1.671 999
+1.592 9999
+1.468 ..░
… 25011 more positive …
… 43849 more negative …
-1.484 ░:░
-1.485 phi
y='talk.religion.misc' top features
Weight Feature
------ -------
+2.022 ░*░
+1.799 ░he░
+1.690 ian░
+1.673 us░
+1.564 ian
+1.503 ░de
+1.466 fbi
+1.466 rist
+1.393 ░fbi
+1.353 fbi░
+1.335 ritu
+1.331 ░fb
+1.307 bi░
+1.302 fire
… 21780 more positive …
… 30323 more negative …
-1.332 ░cou
-1.373 nat
-1.408 cou
-1.434 ░ge
-1.632 ░ath
-1.801 heis
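For a linear model, a listing like this essentially comes down to sorting each row of the coefficient matrix. Below is a rough numpy sketch of that step. It uses a made-up two-class coefficient matrix and hypothetical feature names instead of the fitted `clf`. With the real pipeline, one would pass `clf.coef_` and `vec.get_feature_names_out()` (available in recent scikit-learn). This is not eli5's own implementation.

```python
import numpy as np

def top_features(coef, feature_names, target_names, k=3):
    """Return {class: (top positive, top negative)} lists of (feature, weight)."""
    names = np.asarray(feature_names)
    result = {}
    for target, row in zip(target_names, coef):
        order = np.argsort(row)  # feature indices, ascending by weight
        pos = [(str(names[i]), float(row[i])) for i in order[::-1][:k] if row[i] > 0]
        neg = [(str(names[i]), float(row[i])) for i in order[:k] if row[i] < 0]
        result[target] = (pos, neg)
    return result

# Hypothetical coefficients for two classes over five n-gram features.
coef = np.array([[ 2.0, -1.0, 0.5, -3.0,  0.0],
                 [-0.5,  1.5, 2.5,  0.1, -2.0]])
feats = ["spac", "file", "orb", "heis", "gra"]
res = top_features(coef, feats, ["space", "graphics"], k=2)
for target, (pos, neg) in res.items():
    print(target, pos, neg)
```

eli5 also reports the intercept as a `<BIAS>` pseudo-feature, which is why `<BIAS>` shows up among the sci.space weights in the HTML table below.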
In [4]:
from IPython.display import display, HTML
show_html = lambda html: display(HTML(html))
# Styles are omitted per explanation and injected once by the last line of this cell.
show_html_expl = lambda expl, **kwargs: show_html(format_as_html(expl, include_styles=False, **kwargs))
show_html(format_html_styles())
In [5]:
show_html_expl(explain_weights(clf, vec, target_names=train['target_names'], top=100))
Explained as: linear model
Features with largest coefficients per class.
Caveats:
1. Be careful with features which are not
independent - weights don't show their importance.
2. If scale of input features is different then scale of coefficients
will also be different, making direct comparison between coefficient values
incorrect.
3. Depending on regularization, rare features sometimes may have high
coefficients; this doesn't mean they contribute much to the
classification result for most examples.
y=alt.atheism top features
y=comp.graphics top features
y=sci.space top features
y=talk.religion.misc top features
Weight
Feature
+2.761
heis
+2.240
eis
+2.136
eist
+1.953
ath
+1.915
thei
+1.881
pos
+1.872
hei
+1.821
nat
+1.748
sla
+1.709
post
+1.686
slam
+1.656
ish
+1.646
rna
+1.633
athe
+1.596
lam
+1.548
it
+1.519
is
+1.504
ish
+1.490
it
+1.479
isla
+1.415
up?
+1.407
ogi
+1.405
pos
+1.401
logi
+1.398
cout
+1.392
isl
+1.354
up?
+1.350
up?
+1.334
lai
+1.332
be
+1.324
/\/\
+1.324
/\/
+1.324
\/\
+1.323
ath
+1.300
stin
+1.299
ural
+1.295
tura
+1.289
mad
+1.283
\/\/
+1.274
oans
+1.267
tly?
+1.263
mu
+1.261
natu
+1.259
oh
+1.252
log
+1.238
up
+1.237
isl
+1.234
laim
+1.221
mott
+1.221
sh
+1.218
obb
+1.205
bobb
+1.204
p?
+1.200
bet.
+1.198
nat
+1.197
ain
+1.195
ent
+1.180
tex
+1.179
aim
+1.178
po
+1.174
our
+1.173
wom
+1.171
free
+1.163
ci
+1.159
nan
+1.148
tin
+1.141
clai
+1.139
nish
+1.138
mmmm
+1.138
****
+1.127
nci
+1.124
gion
+1.124
wom
+1.121
say
+1.120
nati
+1.116
muc
+1.111
lami
+1.107
n!
+1.104
much
+1.098
nan
… 20158 more positive …
… 31977 more negative …
-1.099
cult
-1.100
rist
-1.102
spa
-1.104
ure
-1.110
cu
-1.124
brea
-1.131
bre
-1.158
ian
-1.162
/(
-1.200
use
-1.205
ndi
-1.231
pace
-1.235
spac
-1.264
ture
-1.332
-
-1.434
et
-1.504
his
-1.519
pac
-1.522
*
-1.539
us
Weight
Feature
+2.089
file
+1.947
3d
+1.936
phi
+1.783
gra
+1.749
raph
+1.744
fil
+1.734
mage
+1.725
ima
+1.696
mag
+1.670
hics
+1.668
aphi
+1.630
phic
+1.620
aph
+1.560
imag
+1.538
grap
+1.493
rap
+1.469
fil
+1.468
omp
+1.457
card
+1.370
line
+1.360
ix
+1.304
|||
+1.295
680
+1.277
ine
+1.270
ile
+1.266
c
+1.264
ook
+1.263
cod
+1.260
680
+1.251
ode
+1.247
lin
+1.244
ima
+1.240
ips
+1.210
comp
+1.203
for
+1.191
3d
+1.191
ray
+1.189
vga
+1.187
68
+1.187
any
+1.183
42
+1.175
int
+1.166
uter
+1.162
code
+1.158
ode
+1.157
||||
+1.157
-
+1.151
lips
+1.130
gra
+1.116
___
+1.097
work
+1.080
hel
+1.074
co
+1.074
edg
+1.072
...
+1.067
!!
+1.061
copy
+1.059
ftp
+1.058
ork
+1.057
2.0
+1.055
llo,
+1.054
3do
+1.054
3do
+1.053
pli
+1.053
spl
+1.051
opy
+1.047
3d
+1.047
~~~
+1.041
ft
+1.040
orma
+1.038
:
+1.036
lin
+1.036
----
+1.034
****
+1.031
help
+1.025
ith
+1.024
lo,
+1.024
lo,
+1.024
~~~~
+1.019
42
+1.017
elp
+1.017
run
+1.013
hic
… 25945 more positive …
… 29213 more negative …
-1.025
ora
-1.040
nas
-1.066
astr
-1.067
nas
-1.068
eli
-1.099
orb
-1.125
ent
-1.161
orbi
-1.173
net
-1.189
orb
-1.191
as
-1.248
pac
-1.461
spa
-1.553
spa
-1.638
na
-1.854
pace
-1.932
spac
Weight
Feature
+3.213
spac
+3.136
pace
+2.723
spa
+2.533
pac
+2.470
spa
+1.960
orb
+1.866
ace
+1.862
rbit
+1.839
rbi
+1.830
orbi
+1.795
nas
+1.773
sp
+1.772
orb
+1.723
nas
+1.674
ace
+1.671
999
+1.592
9999
+1.468
..
+1.465
bit
+1.432
astr
+1.405
na
+1.387
bill
+1.372
lan
+1.361
get
+1.341
nasa
+1.299
utt
+1.280
air
+1.275
asa
+1.268
la
+1.231
flig
+1.227
et
+1.221
get
+1.201
rati
+1.196
net
+1.182
cos
+1.180
ry
+1.166
asa
+1.142
cra
+1.127
act
+1.122
aun
+1.115
oon
+1.107
!!!!
+1.100
ton
+1.096
bil
+1.096
bil
+1.094
ht
+1.094
ane
+1.093
air
+1.085
oni
+1.084
rich
+1.078
ght
+1.068
ligh
+1.067
anet
+1.066
mars
+1.065
lon
+1.062
aunc
+1.061
it.
+1.055
athi
+1.053
lan
+1.046
sr
+1.043
low
+1.043
cos
+1.043
ala
+1.041
erti
+1.038
min
+1.037
ge
+1.029
act
+1.020
unch
+1.018
ndin
+1.014
it.
+1.006
en
… 24958 more positive …
… 43822 more negative …
-0.991
grap
-0.992
im
-0.994
wron
-0.998
<BIAS>
-0.999
rap
-1.005
rong
-1.021
aph
-1.031
kor
-1.039
ou
-1.051
file
-1.052
cop
-1.072
me.
-1.094
heis
-1.101
raph
-1.105
vid
-1.105
orm
-1.123
ist
-1.138
eis
-1.179
igi
-1.198
bo
-1.207
i'
-1.228
hics
-1.233
god
-1.251
phic
-1.271
aphi
-1.285
3d
-1.310
god
-1.484
:
-1.485
phi
Weight
Feature
+2.022
*
+1.799
he
+1.690
ian
+1.673
us
+1.564
ian
+1.503
de
+1.466
fbi
+1.466
rist
+1.393
fbi
+1.353
fbi
+1.335
ritu
+1.331
fb
+1.307
bi
+1.302
fire
+1.293
cy!
+1.293
acy!
+1.293
cy!
+1.290
sa
+1.280
bl
+1.258
ans
+1.237
is
+1.229
bloo
+1.228
may
+1.225
nacy
+1.207
my
+1.195
may
+1.177
my
+1.163
amo
+1.153
itu
+1.153
amor
+1.153
ntal
+1.131
idn
+1.128
re,
+1.124
init
+1.123
re,
+1.121
ern
+1.115
vid
+1.113
ans
+1.108
didn
+1.106
ild
+1.102
eal
+1.097
/(
+1.097
me
+1.093
that
+1.087
idn'
+1.084
lood
+1.082
cul
+1.074
cult
+1.072
ians
+1.070
eria
+1.061
mor
+1.053
god
+1.049
and
+1.045
esh
+1.036
rn
+1.033
fir
+1.033
alit
+1.025
eati
+1.018
your
+1.017
he
+1.015
rit
+1.012
ici
+1.011
may
… 21731 more positive …
… 30292 more negative …
-1.025
est
-1.028
at
-1.031
free
-1.054
fre
-1.058
pac
-1.062
le
-1.063
coul
-1.079
lai
-1.079
it
-1.084
rna
-1.086
any
-1.090
co
-1.099
athe
-1.102
it
-1.118
late
-1.137
spa
-1.137
sig
-1.138
ile
-1.177
thei
-1.194
spac
-1.196
pace
-1.213
fre
-1.214
any
-1.219
hei
-1.240
ost
-1.246
it
-1.254
eis
-1.254
ti
-1.255
gra
-1.275
po
-1.290
eist
-1.332
cou
-1.373
nat
-1.408
cou
-1.434
ge
-1.632
ath
-1.801
heis
In [6]:
show_html_expl(explain_prediction(clf, test['data'][7], vec, target_names=train['target_names'], top=50), force_weights=True)
Explained as: linear model
y=alt.atheism (score -1.951) top features
y=comp.graphics (score -2.845) top features
y=sci.space (score 1.862) top features
y=talk.religion.misc (score -1.450) top features
Contribution
Feature
+0.294
:
+0.059
be
+0.048
tin
+0.048
the
+0.034
is
+0.033
up
+0.032
ill
+0.030
ing
+0.029
wh
+0.027
ght
+0.026
ting
+0.025
of
+0.025
of
+0.025
wha
+0.024
what
+0.024
se
+0.024
of
… 466 more positive …
… 591 more negative …
-0.024
der
-0.025
erv
-0.026
obse
-0.026
obs
-0.026
ure
-0.027
sky
-0.027
bser
-0.027
eld
-0.027
moo
-0.028
serv
-0.028
moon
-0.028
the
-0.029
fie
-0.029
bit
-0.029
moo
-0.030
spa
-0.030
fra
-0.031
astr
-0.031
his
-0.032
ht
-0.032
spa
-0.033
ght
-0.033
use
-0.033
ard
-0.034
fi
-0.036
lig
-0.038
pace
-0.038
spac
-0.043
ligh
-0.043
fra
-0.062
th
-0.065
pac
-0.961
<BIAS>
Contribution
Feature
+0.294
:
+0.038
fi
+0.033
ile
+0.031
bri
+0.030
co
+0.030
ase
+0.028
it
… 462 more positive …
… 584 more negative …
-0.028
ill
-0.028
str
-0.029
the
-0.029
tel
-0.029
of
-0.030
sate
-0.030
bil
-0.030
as
-0.031
hat
-0.031
of
-0.032
at
-0.032
rbi
-0.032
ono
-0.033
of
-0.033
rbit
-0.034
ligh
-0.034
tron
-0.034
nom
-0.035
rono
-0.038
be
-0.039
he
-0.042
lig
-0.043
onom
-0.043
spa
-0.044
orb
-0.045
orbi
-0.045
orb
-0.047
ast
-0.048
spa
-0.049
the
-0.052
stro
-0.053
pac
-0.055
ght
-0.055
ight
-0.056
ght
-0.057
ht
-0.057
pace
-0.059
igh
-0.060
spac
-0.069
astr
-0.091
as
-0.117
th
-0.964
<BIAS>
Contribution
Feature
+0.108
pac
+0.099
spac
+0.096
pace
+0.093
astr
+0.087
th
+0.080
spa
+0.077
spa
+0.075
orb
+0.073
igh
+0.071
orbi
+0.070
orb
+0.070
rbit
+0.067
rbi
+0.066
ht
+0.066
ght
+0.061
onom
+0.058
bill
+0.057
ligh
+0.053
bil
+0.053
ast
+0.052
stro
+0.049
nom
+0.048
ght
+0.048
ight
+0.047
the
+0.047
bil
+0.044
rono
+0.043
ono
+0.042
moon
+0.042
bit
+0.042
oon
+0.040
sp
+0.040
ace
+0.040
as
+0.039
tro
+0.037
tron
+0.036
sate
+0.036
omy
+0.036
moo
+0.034
ry
+0.034
omy
+0.033
moo
+0.032
the
+0.030
str
+0.030
ld
… 543 more positive …
… 511 more negative …
-0.030
der
-0.031
fi
-0.032
cop
-0.420
:
-0.998
<BIAS>
Contribution
Feature
+0.089
:
+0.059
is
+0.049
th
+0.047
fra
+0.040
sa
+0.035
his
+0.035
as
+0.032
fra
+0.030
my
+0.027
serv
+0.027
fi
+0.027
eld
+0.026
erv
+0.026
der
+0.025
that
+0.024
hat
+0.024
of
+0.024
cr
+0.024
br
… 509 more positive …
… 545 more negative …
-0.023
lli
-0.024
oul
-0.024
uld
-0.024
ould
-0.025
ing
-0.025
rbi
-0.025
far
-0.026
orbi
-0.026
ting
-0.026
what
-0.027
wha
-0.027
le
-0.028
ost
-0.028
rbit
-0.028
orb
-0.029
a
-0.030
ile
-0.031
co
-0.031
spa
-0.032
ti
-0.033
spa
-0.034
it
-0.035
be
-0.037
pace
-0.037
spac
-0.037
the
-0.039
ght
-0.039
tin
-0.045
pac
-0.053
the
-0.977
<BIAS>
y=alt.atheism (score -1.951) top features
Contribution
Feature
… 466 more positive …
… 591 more negative …
-0.243
Highlighted in text (sum)
-0.961
<BIAS>
: while i'm sure sagan considers it sacrilegious, that wouldn't be
: because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed
: orbiting billboards would upset) is already a dying field: the
: opacity and distortions caused by the atmosphere itself have
: driven most of the field to use radio, far infrared or space-based
: telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're
nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through
: the field doesn't ruin observations. if that were the case, the
: thousands of existing satellites would have already done so (satelliets
: might not seem so bright to the eyes, but as far as astronomy is concerned,
: they are extremely bright.) i believe that this orbiting space junk will be far brighter still;
more like the full moon. the moon upsets deep-sky observation all
over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are
ok. what happens when this billboard circles every 90 minutes? what
would be a good time then? : frank crary
: cu boulder
y=comp.graphics (score -2.845) top features
Contribution
Feature
… 462 more positive …
… 584 more negative …
-0.964
<BIAS>
-1.373
Highlighted in text (sum)
: while i'm sure sagan considers it sacrilegious, that wouldn't be
: because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed
: orbiting billboards would upset) is already a dying field: the
: opacity and distortions caused by the atmosphere itself have
: driven most of the field to use radio, far infrared or space-based
: telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're
nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through
: the field doesn't ruin observations. if that were the case, the
: thousands of existing satellites would have already done so (satelliets
: might not seem so bright to the eyes, but as far as astronomy is concerned,
: they are extremely bright.) i believe that this orbiting space junk will be far brighter still;
more like the full moon. the moon upsets deep-sky observation all
over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are
ok. what happens when this billboard circles every 90 minutes? what
would be a good time then? : frank crary
: cu boulder
y=sci.space (score 1.862) top features
Contribution
Feature
+1.968
Highlighted in text (sum)
… 543 more positive …
… 511 more negative …
-0.998
<BIAS>
: while i'm sure sagan considers it sacrilegious, that wouldn't be
: because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed
: orbiting billboards would upset) is already a dying field: the
: opacity and distortions caused by the atmosphere itself have
: driven most of the field to use radio, far infrared or space-based
: telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're
nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through
: the field doesn't ruin observations. if that were the case, the
: thousands of existing satellites would have already done so (satelliets
: might not seem so bright to the eyes, but as far as astronomy is concerned,
: they are extremely bright.) i believe that this orbiting space junk will be far brighter still;
more like the full moon. the moon upsets deep-sky observation all
over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are
ok. what happens when this billboard circles every 90 minutes? what
would be a good time then? : frank crary
: cu boulder
y=talk.religion.misc (score -1.450) top features
Contribution
Feature
… 509 more positive …
… 545 more negative …
-0.258
Highlighted in text (sum)
-0.977
<BIAS>
: while i'm sure sagan considers it sacrilegious, that wouldn't be
: because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed
: orbiting billboards would upset) is already a dying field: the
: opacity and distortions caused by the atmosphere itself have
: driven most of the field to use radio, far infrared or space-based
: telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're
nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through
: the field doesn't ruin observations. if that were the case, the
: thousands of existing satellites would have already done so (satelliets
: might not seem so bright to the eyes, but as far as astronomy is concerned,
: they are extremely bright.) i believe that this orbiting space junk will be far brighter still;
more like the full moon. the moon upsets deep-sky observation all
over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are
ok. what happens when this billboard circles every 90 minutes? what
would be a good time then? : frank crary
: cu boulder
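In a linear model, each contribution is the document's tf-idf value for a feature multiplied by that feature's coefficient. The class score is the sum of all contributions plus the intercept, which appears as `<BIAS>` above. The self-contained sketch below checks that identity against `decision_function` on a tiny made-up corpus. It is a toy stand-in for the newsgroups pipeline, not eli5's own code. The `toy_` names are hypothetical and are chosen so that the notebook's `vec` and `clf` are not overwritten.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy corpus with three classes (0: space, 1: graphics, 2: religion).
docs = ["orbit of the moon", "space shuttle launch", "render a 3d image",
        "image file format", "god and faith", "faith and belief"]
labels = [0, 0, 1, 1, 2, 2]

toy_vec = TfidfVectorizer(analyzer='char_wb', ngram_range=(3, 4))
toy_clf = SGDClassifier(random_state=0).fit(toy_vec.fit_transform(docs), labels)

def contributions(clf, vec, doc):
    """Per-class feature contributions (x_j * w_kj) and the resulting scores."""
    x = vec.transform([doc]).toarray().ravel()  # dense tf-idf vector of the doc
    contrib = clf.coef_ * x                     # shape (n_classes, n_features)
    scores = contrib.sum(axis=1) + clf.intercept_
    return contrib, scores

contrib, scores = contributions(toy_clf, toy_vec, "moon orbit photos")
print(scores)
print(toy_clf.decision_function(toy_vec.transform(["moon orbit photos"]))[0])
```

Both printed rows should agree. Summing the contributions of all features and adding the bias reproduces the score that the classifier uses to rank the classes.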
In [7]:
show_html_expl(explain_prediction(clf, test['data'][1], vec, target_names=train['target_names']))
Explained as: linear model
y=alt.atheism (score -2.503) top features
y=comp.graphics (score 1.733) top features
y=sci.space (score -1.000) top features
y=talk.religion.misc (score -2.112) top features
Contribution
Feature
+0.109
mad
+0.094
vat
+0.073
atic
+0.070
mad
+0.066
ican
+0.056
is
+0.052
ble.
+0.047
ade
+0.043
le.
+0.041
an
+0.041
ent
+0.039
le.
+0.039
fin
+0.036
wh
+0.033
in
+0.029
bra
+0.029
the
+0.029
ade
+0.027
ary
+0.026
made
+0.025
ly
+0.025
ing
+0.024
ctio
+0.023
in
+0.023
ary
+0.023
cti
+0.022
in
+0.022
thi
+0.021
tic
+0.021
me
+0.021
fin
+0.019
yon
+0.018
is
+0.018
one
+0.016
a
+0.016
ion
+0.016
thi
+0.016
of
+0.015
of
+0.015
of
+0.013
this
+0.011
tly
+0.010
llec
+0.010
te
+0.010
ca
+0.008
ece
+0.008
sit
+0.008
our
+0.007
yone
+0.006
tion
+0.006
tio
+0.005
an
+0.005
coll
+0.004
one
+0.004
find
+0.003
ing
+0.003
ng
+0.002
any
+0.002
e.
+0.002
ur
+0.002
ma
+0.002
lib
-0.001
lib
-0.001
can
-0.003
any
-0.004
he
-0.004
nyo
-0.004
anyo
-0.004
nyon
-0.004
rec
-0.005
ica
-0.005
ion
-0.005
the
-0.005
ati
-0.005
entl
-0.005
ry
-0.005
our
-0.006
re
-0.007
ble
-0.008
able
-0.009
ne
-0.009
ecti
-0.010
to
-0.010
sit
-0.010
he
-0.010
s.
-0.011
lect
-0.011
co
-0.011
va
-0.012
her
-0.013
ect
-0.013
abl
-0.013
lec
-0.013
re
-0.016
tica
-0.016
here
-0.016
ntl
-0.016
ere
-0.016
me
-0.017
ila
-0.017
the
-0.018
ere
-0.018
ite
-0.018
rar
-0.018
ind
-0.019
olle
-0.019
ftp
-0.019
ntly
-0.019
ft
-0.020
ecen
-0.020
ftp
-0.020
de
-0.020
can
-0.021
si
-0.021
can
-0.022
li
-0.022
cen
-0.022
tp
-0.022
ftp
-0.022
rec
-0.024
tly
-0.024
is
-0.025
lab
-0.025
his
-0.025
on
-0.025
me
-0.026
whe
-0.027
vati
-0.027
whe
-0.028
vat
-0.031
brar
-0.031
libr
-0.032
ibra
-0.032
ibr
-0.033
tour
-0.033
din
-0.034
lle
-0.034
wher
-0.035
cent
-0.036
ding
-0.036
help
-0.036
elp
-0.038
his
-0.038
th
-0.038
vail
-0.039
tou
-0.039
ite
-0.039
avai
-0.042
hel
-0.042
fi
-0.042
aila
-0.042
labl
-0.043
elp
-0.043
ndin
-0.043
lp
-0.044
site
-0.045
ava
-0.045
av
-0.046
tou
-0.046
vai
-0.047
rary
-0.047
ilab
-0.047
rece
-0.049
oll
-0.051
ava
-0.052
hel
-0.057
us.
-0.058
col
-0.058
us.
-0.059
us.
-0.059
ail
-0.074
indi
-0.082
us
-0.083
ndi
-0.084
col
-0.961
<BIAS>
Contribution
Feature
+0.117
ftp
+0.114
ft
+0.096
ftp
+0.086
help
+0.084
elp
+0.082
tp
+0.082
hel
+0.080
ftp
+0.079
lib
+0.078
lib
+0.076
can
+0.071
brar
+0.069
ibra
+0.068
libr
+0.068
ibr
+0.066
lp
+0.063
site
+0.060
any
+0.059
can
+0.058
elp
+0.055
hel
+0.054
rar
+0.051
anyo
+0.051
de
+0.050
nyo
+0.049
nyon
+0.048
rary
+0.047
any
+0.047
fi
+0.045
abl
+0.045
bra
+0.045
here
+0.045
rec
+0.044
yone
+0.040
yon
+0.040
col
+0.039
a
+0.039
si
+0.039
fin
+0.038
wher
+0.038
ere
+0.038
able
+0.037
co
+0.036
ble
+0.036
ne
+0.035
col
+0.034
li
+0.031
us
+0.029
can
+0.029
ail
+0.029
ect
+0.028
rec
+0.026
ite
+0.025
tour
+0.024
sit
+0.024
sit
+0.023
labl
+0.023
her
+0.023
fin
+0.023
ece
+0.022
va
+0.022
lab
+0.022
find
+0.022
is
+0.022
aila
+0.021
ila
+0.021
ere
+0.020
avai
+0.020
vat
+0.019
olle
+0.019
vail
+0.019
whe
+0.018
vai
+0.018
whe
+0.017
ican
+0.016
ca
+0.016
ilab
+0.015
this
+0.014
ite
+0.013
us.
+0.013
ava
+0.013
is
+0.013
lect
+0.013
oll
+0.012
an
+0.012
tly
+0.011
is
+0.011
cen
+0.011
ade
+0.011
ing
+0.010
on
+0.010
our
+0.010
one
+0.009
ind
+0.008
re
+0.008
ati
+0.008
one
+0.008
ntl
+0.007
re
+0.007
in
+0.007
ntly
+0.006
ade
+0.005
ng
+0.005
rece
+0.005
ica
+0.004
in
+0.004
entl
+0.003
ava
+0.003
tly
+0.003
indi
+0.002
lec
+0.001
atic
+0.001
ion
+0.001
in
+0.000
ble.
+0.000
to
-0.000
av
-0.002
ly
-0.002
tion
-0.002
me
-0.002
tio
-0.003
his
-0.003
he
-0.003
din
-0.004
his
-0.006
ndi
-0.006
cent
-0.006
ary
-0.006
ion
-0.006
ing
-0.007
s.
-0.007
ry
-0.007
tica
-0.008
ding
-0.008
thi
-0.009
tou
-0.009
te
-0.009
ctio
-0.011
me
-0.012
the
-0.012
le.
-0.012
coll
-0.012
thi
-0.013
le.
-0.014
e.
-0.015
wh
-0.015
us.
-0.015
our
-0.015
me
-0.016
llec
-0.016
ma
-0.017
ur
-0.017
the
-0.018
made
-0.018
of
-0.019
of
-0.020
ecen
-0.020
ary
-0.020
of
-0.022
an
-0.022
ndin
-0.022
tou
-0.023
ecti
-0.025
us.
-0.030
the
-0.030
lle
-0.032
vati
-0.032
he
-0.032
tic
-0.039
cti
-0.042
mad
-0.042
mad
-0.043
vat
-0.045
ent
-0.072
th
-0.964
<BIAS>
Contribution
Feature
+0.083
ndin
+0.067
ndi
+0.056
ry
+0.054
th
+0.051
lle
+0.048
tou
+0.044
oll
+0.043
col
+0.040
a
+0.038
coll
+0.035
tou
+0.035
vat
+0.033
tly
+0.032
col
+0.032
olle
+0.031
tica
+0.031
te
+0.030
wher
+0.030
av
+0.030
one
+0.029
ding
+0.029
the
+0.028
on
+0.027
ntly
+0.026
the
+0.023
tly
+0.023
ary
+0.021
din
+0.021
ntl
+0.021
nyon
+0.021
thi
+0.020
nyo
+0.019
can
+0.019
ing
+0.018
llec
+0.018
anyo
+0.018
li
+0.018
ing
+0.018
ite
+0.017
lab
+0.017
ail
+0.016
entl
+0.016
ng
+0.016
can
+0.014
one
+0.014
ava
+0.014
ilab
+0.012
ecen
+0.012
us.
+0.012
ma
+0.011
he
+0.011
his
+0.011
thi
+0.011
yone
+0.011
tic
+0.011
us.
+0.011
re
+0.011
yon
+0.010
ly
+0.009
the
+0.009
rece
+0.009
tio
+0.009
co
+0.009
her
+0.008
ary
+0.008
can
+0.008
ati
+0.008
us.
+0.008
tion
+0.008
ila
+0.008
ne
+0.008
s.
+0.007
ade
+0.007
cent
+0.007
va
+0.007
whe
+0.006
le.
+0.006
ca
+0.006
ava
+0.005
whe
+0.005
made
+0.005
lect
+0.004
aila
+0.004
sit
+0.004
ere
+0.003
to
+0.003
ion
+0.003
de
+0.003
labl
+0.003
his
+0.002
ion
+0.002
me
+0.002
avai
+0.001
vail
+0.001
any
+0.000
ade
+0.000
here
-0.000
vai
-0.000
any
-0.002
ur
-0.002
me
-0.002
lec
-0.003
this
-0.003
an
-0.003
sit
-0.003
in
-0.003
us
-0.003
ent
-0.003
ect
-0.004
ind
-0.005
rec
-0.005
re
-0.006
able
-0.007
ere
-0.007
ica
-0.008
tour
-0.008
ite
-0.009
le.
-0.009
abl
-0.009
e.
-0.009
indi
-0.011
ble.
-0.012
of
-0.012
of
-0.012
in
-0.012
mad
-0.013
of
-0.013
me
-0.013
is
-0.014
ece
-0.014
an
-0.015
si
-0.016
wh
-0.016
vati
-0.016
cti
-0.016
site
-0.016
ctio
-0.017
rec
-0.017
is
-0.018
is
-0.020
cen
-0.021
in
-0.021
ble
-0.023
our
-0.023
find
-0.024
ecti
-0.024
fin
-0.026
our
-0.026
mad
-0.028
rary
-0.029
help
-0.029
elp
-0.031
rar
-0.032
tp
-0.032
ftp
-0.033
hel
-0.033
fin
-0.035
ftp
-0.038
ican
-0.038
fi
-0.041
he
-0.041
ibr
-0.042
ibra
-0.043
libr
-0.044
brar
-0.046
lp
-0.047
bra
-0.047
elp
-0.048
hel
-0.049
atic
-0.050
lib
-0.052
ft
-0.056
ftp
-0.077
lib
-0.090
vat
-0.998
<BIAS>
Contribution
Feature
+0.083
is
+0.060
me
+0.057
us.
+0.053
indi
+0.051
me
+0.048
he
+0.045
ecen
+0.044
ite
+0.044
tou
+0.043
his
+0.042
whe
+0.041
whe
+0.040
tou
+0.039
us.
+0.034
ecti
+0.034
vati
+0.034
fi
+0.033
us
+0.033
ndi
+0.032
us.
+0.030
th
+0.029
rece
+0.026
an
+0.026
ite
+0.025
cent
+0.023
cen
+0.023
his
+0.021
ding
+0.021
e.
+0.020
hel
+0.019
din
+0.018
wher
+0.017
our
+0.015
vai
+0.015
of
+0.015
ent
+0.013
to
+0.013
s.
+0.012
lp
+0.012
lle
+0.011
is
+0.011
lec
+0.011
re
+0.011
elp
+0.011
of
+0.010
he
+0.010
rary
+0.009
ava
+0.009
rec
+0.008
ece
+0.008
tour
+0.008
of
+0.007
ere
+0.007
rec
+0.007
tica
+0.006
hel
+0.006
ndin
+0.005
ind
+0.004
sit
+0.004
cti
+0.004
va
+0.003
find
+0.003
our
+0.002
tion
+0.002
this
+0.002
wh
+0.002
col
+0.001
tio
-0.000
si
-0.003
ble
-0.003
ntl
-0.003
ati
-0.003
ntly
-0.003
ava
-0.004
me
-0.004
vail
-0.005
re
-0.005
avai
-0.005
the
-0.006
an
-0.006
ilab
-0.007
labl
-0.007
ctio
-0.008
aila
-0.008
ur
-0.008
abl
-0.009
ica
-0.009
ect
-0.010
ma
-0.010
her
-0.010
ail
-0.010
site
-0.011
mad
-0.011
entl
-0.011
te
-0.011
ng
-0.011
vat
-0.012
on
-0.012
lect
-0.013
llec
-0.013
ila
-0.014
made
-0.014
elp
-0.014
in
-0.015
oll
-0.015
lab
-0.015
help
-0.016
ion
-0.016
tly
-0.016
li
-0.017
ing
-0.018
thi
-0.018
av
-0.018
ary
-0.018
in
-0.018
thi
-0.019
ion
-0.020
ly
-0.020
fin
-0.020
ing
-0.021
ftp
-0.021
rar
-0.021
ere
-0.022
ary
-0.023
tp
-0.023
in
-0.023
sit
-0.023
here
-0.024
one
-0.024
able
-0.024
tic
-0.025
libr
-0.025
is
-0.025
fin
-0.025
bra
-0.026
brar
-0.026
ibr
-0.026
ry
-0.027
atic
-0.027
ibra
-0.027
de
-0.027
ne
-0.029
olle
-0.029
can
-0.031
the
-0.031
ca
-0.031
ftp
-0.032
ft
-0.033
the
-0.033
tly
-0.033
nyo
-0.034
can
-0.034
ade
-0.034
coll
-0.035
nyon
-0.035
anyo
-0.036
ftp
-0.038
ade
-0.038
co
-0.039
col
-0.042
one
-0.043
lib
-0.047
le.
-0.050
yone
-0.050
ican
-0.052
le.
-0.055
any
-0.057
a
-0.057
any
-0.062
yon
-0.063
mad
-0.068
lib
-0.070
vat
-0.075
ble.
-0.092
can
-0.977
<BIAS>
y=alt.atheism (score -2.503) top features
Contribution
Feature
-0.961
<BIAS>
-1.542
Highlighted in text (sum)
the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
y=comp.graphics (score 1.733) top features
Contribution
Feature
+2.698
Highlighted in text (sum)
-0.964
<BIAS>
the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
y=sci.space (score -1.000) top features
Contribution
Feature
-0.002
Highlighted in text (sum)
-0.998
<BIAS>
the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
y=talk.religion.misc (score -2.112) top features
Contribution
Feature
-0.977
<BIAS>
-1.135
Highlighted in text (sum)
the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
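The highlighted text is easiest to picture as each n-gram's contribution being spread over the characters it covers. The sketch below shows one such scheme: every character receives the sum of the weights of all n-gram occurrences that overlap it. It uses plain character n-grams (without the `char_wb` padding) and a hand-made table of hypothetical weights. It illustrates the idea only and is not eli5's exact algorithm.

```python
def char_weights(text, ngram_weights, ngram_range=(3, 4)):
    """Sum, for each character, the weights of all n-grams that cover it.

    `ngram_weights` maps an n-gram string to its contribution; n-grams
    missing from the table contribute nothing.
    """
    min_n, max_n = ngram_range
    weights = [0.0] * len(text)
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            w = ngram_weights.get(text[start:start + n], 0.0)
            for i in range(start, start + n):
                weights[i] += w
    return weights

# Hypothetical contributions for a few 'ftp'-related n-grams.
table = {"ftp": 0.12, " ft": 0.11, "tp ": 0.08}
text = "a ftp site"
for ch, w in zip(text, char_weights(text, table)):
    print(repr(ch), round(w, 3))
```

Characters covered by several positive n-grams (here the `t` in `ftp`) accumulate the largest weight and would be highlighted most strongly.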
In [8]:
# Explain only the highest-scoring class for each of the first ten test documents.
for doc in test['data'][:10]:
    expl = explain_prediction(clf, doc, vec, target_names=train['target_names'], top_targets=1)
    show_html_expl(expl, force_weights=False)
Explained as: linear model
y=sci.space (score 0.275) top features
Contribution
Feature
+1.273
Highlighted in text (sum)
-0.998
<BIAS>
trry the skywatch project in arizona.
Explained as: linear model
y=comp.graphics (score 1.733) top features
Contribution
Feature
+2.698
Highlighted in text (sum)
-0.964
<BIAS>
the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
Explained as: linear model
y=comp.graphics (score 1.655) top features
Contribution
Feature
+2.619
Highlighted in text (sum)
-0.964
<BIAS>
hi there, i am here looking for some help. my friend is a interior decor designer. he is from thailand. he is
trying to find some graphics software on pc. any suggestion on which
software to buy,where to buy and how much it costs ? he likes the most
sophisticated software(the more features it has,the better)
Explained as: linear model
y=comp.graphics (score 0.467) top features
Contribution
Feature
+1.431
Highlighted in text (sum)
-0.964
<BIAS>
rfd request for discussion for the open telematic group otg i have proposed the forming of a consortium/task force for the
promotion of naplps/jpeg, fif to openly discuss ways, method,
procedures,algorythms, applications, implementation, extensions of
naplps/jpeg standards. these standards should facilitate the creation
of real_time online applications that make use of voice, video,
telecommuting, hires graphics, conferencing, distant learning, online
order entry, fax,in addition these dicussion would assist all to
better understand how sgml, cals, oda, mime, oodbms, jpeg, mpeg,
fractals, sql, cdrom, cdromxa, kodak photocd, tcl, v.fast, and
eia/tia562, can best be incorporated and implemented to develop
telematic/multimedia applications. we want to be able to support dos, unix, mac, windows, nt, os/2
platforms. it is our hope that individuals, developers, corporations,
universities, r & d labs would join in in supporting such an endeavor. this would be a not_for_profit group with bylaws and charter. already
many corporations have decided to support otg (open telematic group) so
do not delay joining if you are a developer an rfd has been posted to form a usenet newsgroup and a faq will soon
be be composed to start promulgating what is known on the subject. if
you would like to be added to the maillist send email or mail to the
address below. this group would publish an electronic quarterly naplps/jpeg
newsletter as well as a hardcopy version. we urge all who wants to
see cmcs hires based applications & the naplps/jpeg g r o w, decide to
join and mutually benefit from this not-for_profit endeavor. note: telematic has been defined by mr. james martin as the marriage of voice, video, hi-res graphics, fax, ivr, music over telephone lines/lan. if you would like to get involve write to me at: img inter-multimedia group| internet: epimntl@world.std.com p.o. box 95901 | ed.pimentel@gisatl.fidonet.org atlanta, georgia, us | cis : 70611,3703 | fidonet : 1:133/407 | bbs : +1-404-985-1198 zyxel 14.4k
Explained as: linear model
y=comp.graphics (score 0.493) top features
Contribution
Feature
+1.457
Highlighted in text (sum)
-0.964
<BIAS>
i am interested in finding 3d animation programs for the mac.
i am especially interested in any programs that don't exist
in a pc port and are so good that they would make me go buy
a mac. do any such exist?
Explained as: linear model
y=comp.graphics (score 0.510) top features
Contribution
Feature
+1.475
Highlighted in text (sum)
-0.964
<BIAS>
i'm also interested in such a program. but most of all i'd like to know wich program is able to convert gif or pcx to dxf !!! when i have this program, i can scan pictures and frase (or something like that !) them.
this will be beyond the limit !!!
Explained as: linear model
y=comp.graphics (score -0.614) top features
Contribution
Feature
+0.351
Highlighted in text (sum)
-0.964
<BIAS>
or how about: "end light pollution now!!" your banner would have no effect on its subject, but my banner would.
Explained as: linear model
y=sci.space (score 1.862) top features
Contribution
Feature
+2.860
Highlighted in text (sum)
-0.998
<BIAS>
: while i'm sure sagan considers it sacrilegious, that wouldn't be
: because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed
: orbiting billboards would upset) is already a dying field: the
: opacity and distortions caused by the atmosphere itself have
: driven most of the field to use radio, far infrared or space-based
: telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're
nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through
: the field doesn't ruin observations. if that were the case, the
: thousands of existing satellites would have already done so (satelliets
: might not seem so bright to the eyes, but as far as astronomy is concerned,
: they are extremely bright.) i believe that this orbiting space junk will be far brighter still;
more like the full moon. the moon upsets deep-sky observation all
over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are
ok. what happens when this billboard circles every 90 minutes? what
would be a good time then? : frank crary
: cu boulder
Explained as: linear model
y=alt.atheism (score 1.878) top features
Contribution
Feature
+2.839
Highlighted in text (sum)
-0.961
<BIAS>
not if you show that these hypothetical atheists are gullible, excitable
and easily led from some concrete cause. in that case we would also
have to discuss if that concrete cause, rather than atheism, was the
factor that caused their subsequent behaviour.
Explained as: linear model
y=sci.space (score -0.048) top features
Contribution
Feature
+0.950
Highlighted in text (sum)
-0.998
<BIAS>
picture our universe floating like a log
in a river. as the log floats down the
river, it occasionally strikes rocks, the
bank, the bottom, other logs. when this collission
occurs, kinetic energy is translated into heat, the
log degrades, gets scraped up, and other energy translaions occur. the distribution of damage to
the log depends on the shape of the log. however, to a very small virus in a mite on the head of a
termite in the center of the log, the shock waves from the
collissions would appear uniformly random in direction. this is my theory for grb. they are evidence of our universe
interacting with other universes! why not! makes
just as much sense as the grb coming from the oort cloud! the log theory of universes can't be ruled out! of course, i'm a layman in the physics world. you physicists out there, tell me about this !!!!
Content source: TeamHG-Memex/eli5