In [1]:
from sklearn.datasets import fetch_20newsgroups
categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]
fetch_subset = lambda subset: fetch_20newsgroups(
    subset=subset, categories=categories,
    shuffle=True, random_state=42,
    remove=('headers', 'footers', 'quotes'))
train = fetch_subset('train')
test = fetch_subset('test')

In [2]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

vec = TfidfVectorizer(analyzer='char_wb', ngram_range=(3, 4))
clf = SGDClassifier(n_jobs=-1)
pipeline = Pipeline([('vec', vec), ('clf', clf)])
pipeline.fit(train['data'], train['target'])


Out[2]:
Pipeline(steps=[('vec', TfidfVectorizer(analyzer='char_wb', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(3, 4), norm='l2', preprocessor=None, smooth_idf=True,
...   penalty='l2', power_t=0.5, random_state=None, shuffle=True,
       verbose=0, warm_start=False))])
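
The vectorizer above uses analyzer='char_wb' with ngram_range=(3, 4), so its features are character 3- and 4-grams taken inside word boundaries. A minimal pure-Python sketch of that idea follows; `char_wb_ngrams` is a hypothetical helper written for illustration, not scikit-learn's own code, and it only approximates the tokenization (no TF-IDF weighting, no Unicode handling).

```python
def char_wb_ngrams(text, ngram_range=(3, 4)):
    """Approximate character n-grams that never cross word boundaries."""
    min_n, max_n = ngram_range
    ngrams = []
    # Lowercase first (the vectorizer above has lowercase=True), then split on whitespace.
    for word in text.lower().split():
        padded = " " + word + " "  # pad with spaces so n-grams can mark word edges
        for n in range(min_n, max_n + 1):
            if len(padded) <= n:
                # Padded word is no longer than n: emit it once and stop growing n.
                ngrams.append(padded)
                break
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


grams = char_wb_ngrams("3D space")
print(grams)
```

N-grams with a leading or trailing space, such as " 3d" or "ace ", appear to correspond to the features shown with the ░ marker in the weight tables.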

In [9]:
from eli5 import explain_weights, explain_prediction
from eli5 import format_as_html, format_as_text, format_html_styles

print(format_as_text(explain_weights(clf, vec, target_names=train['target_names'])))


Explained as: linear model

Features with largest coefficients per class.
Caveats:
1. Be careful with features which are not
   independent - weights don't show their importance.
2. If scale of input features is different then scale of coefficients
   will also be different, making direct comparison between coefficient values
   incorrect.
3. Depending on regularization, rare features sometimes may have high
   coefficients; this doesn't mean they contribute much to the
   classification result for most examples.

y='alt.atheism' top features
Weight  Feature
------  -------
+2.761  heis   
+2.240  eis    
+2.136  eist   
+1.953  ░ath   
+1.915  thei   
+1.881  ░pos   
+1.872  hei    
+1.821  nat    
+1.748  sla    
+1.709  post   
+1.686  slam   
+1.656  ish░   
+1.646  rna    
+1.633  athe   
+1.596  lam    
+1.548  it░    
+1.519  ░is    
… 20221 more positive …
… 31994 more negative …
-1.519  pac    
-1.522  ░*░    
-1.539  ░us    

y='comp.graphics' top features
Weight  Feature
------  -------
+2.089  file   
+1.947  ░3d    
+1.936  phi    
+1.783  gra    
+1.749  raph   
+1.744  fil    
+1.734  mage   
+1.725  ima    
+1.696  mag    
+1.670  hics   
+1.668  aphi   
+1.630  phic   
+1.620  aph    
+1.560  imag   
+1.538  grap   
+1.493  rap    
… 26012 more positive …
… 29226 more negative …
-1.553  ░spa   
-1.638  ░na    
-1.854  pace   
-1.932  spac   

y='sci.space' top features
Weight  Feature
------  -------
+3.213  spac   
+3.136  pace   
+2.723  spa    
+2.533  pac    
+2.470  ░spa   
+1.960  orb    
+1.866  ace░   
+1.862  rbit   
+1.839  rbi    
+1.830  orbi   
+1.795  ░nas   
+1.773  ░sp    
+1.772  ░orb   
+1.723  nas    
+1.674  ace    
+1.671  999    
+1.592  9999   
+1.468  ..░    
… 25011 more positive …
… 43849 more negative …
-1.484  ░:░    
-1.485  phi    

y='talk.religion.misc' top features
Weight  Feature
------  -------
+2.022  ░*░    
+1.799  ░he░   
+1.690  ian░   
+1.673  us░    
+1.564  ian    
+1.503  ░de    
+1.466  fbi    
+1.466  rist   
+1.393  ░fbi   
+1.353  fbi░   
+1.335  ritu   
+1.331  ░fb    
+1.307  bi░    
+1.302  fire   
… 21780 more positive …
… 30323 more negative …
-1.332  ░cou   
-1.373  nat    
-1.408  cou    
-1.434  ░ge    
-1.632  ░ath   
-1.801  heis   
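
Each per-class table above lists 20 rows in total (for example 17 positive and 3 negative for alt.atheism), which is consistent with picking the features with the largest absolute coefficients and then splitting them by sign. The sketch below illustrates that selection on a handful of weights copied from the alt.atheism table; `top_features` is a hypothetical helper, not eli5's implementation, and the default of 20 is an assumption read off the output.

```python
def top_features(weights, top=20):
    """Keep the `top` features by absolute weight, split into positive and negative."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)[:top]
    # Both groups are listed from the largest value to the smallest, as in the tables.
    positive = sorted((fw for fw in ranked if fw[1] > 0), key=lambda fw: -fw[1])
    negative = sorted((fw for fw in ranked if fw[1] <= 0), key=lambda fw: -fw[1])
    return positive, negative


# A few alt.atheism weights copied from the table above (not the full model).
sample = {"heis": 2.761, "eis": 2.240, "hei": 1.872, "it ": 1.548,
          "pac": -1.519, " * ": -1.522, " us": -1.539}
pos, neg = top_features(sample, top=5)
print(pos)
print(neg)
```

With top=5 only the strongest negative feature survives the cut, which mirrors how the tables show far fewer negative rows than positive ones.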


In [4]:
from IPython.display import display, HTML
show_html = lambda html: display(HTML(html))
show_html_expl = lambda expl, **kwargs: show_html(format_as_html(expl, include_styles=False, **kwargs))
show_html(format_html_styles())



In [5]:
show_html_expl(explain_weights(clf, vec, target_names=train['target_names'], top=100))


Explained as: linear model

Features with largest coefficients per class.
Caveats:
1. Be careful with features which are not
   independent - weights don't show their importance.
2. If scale of input features is different then scale of coefficients
   will also be different, making direct comparison between coefficient values
   incorrect.
3. Depending on regularization, rare features sometimes may have high
   coefficients; this doesn't mean they contribute much to the
   classification result for most examples.
y=alt.atheism top features y=comp.graphics top features y=sci.space top features y=talk.religion.misc top features
Weight? Feature
+2.761 heis
+2.240 eis
+2.136 eist
+1.953 ath
+1.915 thei
+1.881 pos
+1.872 hei
+1.821 nat
+1.748 sla
+1.709 post
+1.686 slam
+1.656 ish
+1.646 rna
+1.633 athe
+1.596 lam
+1.548 it
+1.519 is
+1.504 ish
+1.490 it
+1.479 isla
+1.415 up?
+1.407 ogi
+1.405 pos
+1.401 logi
+1.398 cout
+1.392 isl
+1.354 up?
+1.350 up?
+1.334 lai
+1.332 be
+1.324 /\/\
+1.324 /\/
+1.324 \/\
+1.323 ath
+1.300 stin
+1.299 ural
+1.295 tura
+1.289 mad
+1.283 \/\/
+1.274 oans
+1.267 tly?
+1.263 mu
+1.261 natu
+1.259 oh
+1.252 log
+1.238 up
+1.237 isl
+1.234 laim
+1.221 mott
+1.221 sh
+1.218 obb
+1.205 bobb
+1.204 p?
+1.200 bet.
+1.198 nat
+1.197 ain
+1.195 ent
+1.180 tex
+1.179 aim
+1.178 po
+1.174 our
+1.173 wom
+1.171 free
+1.163 ci
+1.159 nan
+1.148 tin
+1.141 clai
+1.139 nish
+1.138 mmmm
+1.138 ****
+1.127 nci
+1.124 gion
+1.124 wom
+1.121 say
+1.120 nati
+1.116 muc
+1.111 lami
+1.107 n!
+1.104 much
+1.098 nan
… 20158 more positive …
… 31977 more negative …
-1.099 cult
-1.100 rist
-1.102 spa
-1.104 ure
-1.110 cu
-1.124 brea
-1.131 bre
-1.158 ian
-1.162 /(
-1.200 use
-1.205 ndi
-1.231 pace
-1.235 spac
-1.264 ture
-1.332 -
-1.434 et
-1.504 his
-1.519 pac
-1.522 *
-1.539 us
y=comp.graphics top features
Weight? Feature
+2.089 file
+1.947 3d
+1.936 phi
+1.783 gra
+1.749 raph
+1.744 fil
+1.734 mage
+1.725 ima
+1.696 mag
+1.670 hics
+1.668 aphi
+1.630 phic
+1.620 aph
+1.560 imag
+1.538 grap
+1.493 rap
+1.469 fil
+1.468 omp
+1.457 card
+1.370 line
+1.360 ix
+1.304 |||
+1.295 680
+1.277 ine
+1.270 ile
+1.266 c
+1.264 ook
+1.263 cod
+1.260 680
+1.251 ode
+1.247 lin
+1.244 ima
+1.240 ips
+1.210 comp
+1.203 for
+1.191 3d
+1.191 ray
+1.189 vga
+1.187 68
+1.187 any
+1.183 42
+1.175 int
+1.166 uter
+1.162 code
+1.158 ode
+1.157 ||||
+1.157 -
+1.151 lips
+1.130 gra
+1.116 ___
+1.097 work
+1.080 hel
+1.074 co
+1.074 edg
+1.072 ...
+1.067 !!
+1.061 copy
+1.059 ftp
+1.058 ork
+1.057 2.0
+1.055 llo,
+1.054 3do
+1.054 3do
+1.053 pli
+1.053 spl
+1.051 opy
+1.047 3d
+1.047 ~~~
+1.041 ft
+1.040 orma
+1.038 :
+1.036 lin
+1.036 ----
+1.034 ****
+1.031 help
+1.025 ith
+1.024 lo,
+1.024 lo,
+1.024 ~~~~
+1.019 42
+1.017 elp
+1.017 run
+1.013 hic
… 25945 more positive …
… 29213 more negative …
-1.025 ora
-1.040 nas
-1.066 astr
-1.067 nas
-1.068 eli
-1.099 orb
-1.125 ent
-1.161 orbi
-1.173 net
-1.189 orb
-1.191 as
-1.248 pac
-1.461 spa
-1.553 spa
-1.638 na
-1.854 pace
-1.932 spac
y=sci.space top features
Weight? Feature
+3.213 spac
+3.136 pace
+2.723 spa
+2.533 pac
+2.470 spa
+1.960 orb
+1.866 ace
+1.862 rbit
+1.839 rbi
+1.830 orbi
+1.795 nas
+1.773 sp
+1.772 orb
+1.723 nas
+1.674 ace
+1.671 999
+1.592 9999
+1.468 ..
+1.465 bit
+1.432 astr
+1.405 na
+1.387 bill
+1.372 lan
+1.361 get
+1.341 nasa
+1.299 utt
+1.280 air
+1.275 asa
+1.268 la
+1.231 flig
+1.227 et
+1.221 get
+1.201 rati
+1.196 net
+1.182 cos
+1.180 ry
+1.166 asa
+1.142 cra
+1.127 act
+1.122 aun
+1.115 oon
+1.107 !!!!
+1.100 ton
+1.096 bil
+1.096 bil
+1.094 ht
+1.094 ane
+1.093 air
+1.085 oni
+1.084 rich
+1.078 ght
+1.068 ligh
+1.067 anet
+1.066 mars
+1.065 lon
+1.062 aunc
+1.061 it.
+1.055 athi
+1.053 lan
+1.046 sr
+1.043 low
+1.043 cos
+1.043 ala
+1.041 erti
+1.038 min
+1.037 ge
+1.029 act
+1.020 unch
+1.018 ndin
+1.014 it.
+1.006 en
… 24958 more positive …
… 43822 more negative …
-0.991 grap
-0.992 im
-0.994 wron
-0.998 <BIAS>
-0.999 rap
-1.005 rong
-1.021 aph
-1.031 kor
-1.039 ou
-1.051 file
-1.052 cop
-1.072 me.
-1.094 heis
-1.101 raph
-1.105 vid
-1.105 orm
-1.123 ist
-1.138 eis
-1.179 igi
-1.198 bo
-1.207 i'
-1.228 hics
-1.233 god
-1.251 phic
-1.271 aphi
-1.285 3d
-1.310 god
-1.484 :
-1.485 phi
y=talk.religion.misc top features
Weight? Feature
+2.022 *
+1.799 he
+1.690 ian
+1.673 us
+1.564 ian
+1.503 de
+1.466 fbi
+1.466 rist
+1.393 fbi
+1.353 fbi
+1.335 ritu
+1.331 fb
+1.307 bi
+1.302 fire
+1.293 cy!
+1.293 acy!
+1.293 cy!
+1.290 sa
+1.280 bl
+1.258 ans
+1.237 is
+1.229 bloo
+1.228 may
+1.225 nacy
+1.207 my
+1.195 may
+1.177 my
+1.163 amo
+1.153 itu
+1.153 amor
+1.153 ntal
+1.131 idn
+1.128 re,
+1.124 init
+1.123 re,
+1.121 ern
+1.115 vid
+1.113 ans
+1.108 didn
+1.106 ild
+1.102 eal
+1.097 /(
+1.097 me
+1.093 that
+1.087 idn'
+1.084 lood
+1.082 cul
+1.074 cult
+1.072 ians
+1.070 eria
+1.061 mor
+1.053 god
+1.049 and
+1.045 esh
+1.036 rn
+1.033 fir
+1.033 alit
+1.025 eati
+1.018 your
+1.017 he
+1.015 rit
+1.012 ici
+1.011 may
… 21731 more positive …
… 30292 more negative …
-1.025 est
-1.028 at
-1.031 free
-1.054 fre
-1.058 pac
-1.062 le
-1.063 coul
-1.079 lai
-1.079 it
-1.084 rna
-1.086 any
-1.090 co
-1.099 athe
-1.102 it
-1.118 late
-1.137 spa
-1.137 sig
-1.138 ile
-1.177 thei
-1.194 spac
-1.196 pace
-1.213 fre
-1.214 any
-1.219 hei
-1.240 ost
-1.246 it
-1.254 eis
-1.254 ti
-1.255 gra
-1.275 po
-1.290 eist
-1.332 cou
-1.373 nat
-1.408 cou
-1.434 ge
-1.632 ath
-1.801 heis

In [6]:
show_html_expl(explain_prediction(clf, test['data'][7], vec, target_names=train['target_names'], top=50), force_weights=True)


Explained as: linear model

y=alt.atheism (score -1.951) top features y=comp.graphics (score -2.845) top features y=sci.space (score 1.862) top features y=talk.religion.misc (score -1.450) top features
Contribution? Feature
+0.294 :
+0.059 be
+0.048 tin
+0.048 the
+0.034 is
+0.033 up
+0.032 ill
+0.030 ing
+0.029 wh
+0.027 ght
+0.026 ting
+0.025 of
+0.025 of
+0.025 wha
+0.024 what
+0.024 se
+0.024 of
… 466 more positive …
… 591 more negative …
-0.024 der
-0.025 erv
-0.026 obse
-0.026 obs
-0.026 ure
-0.027 sky
-0.027 bser
-0.027 eld
-0.027 moo
-0.028 serv
-0.028 moon
-0.028 the
-0.029 fie
-0.029 bit
-0.029 moo
-0.030 spa
-0.030 fra
-0.031 astr
-0.031 his
-0.032 ht
-0.032 spa
-0.033 ght
-0.033 use
-0.033 ard
-0.034 fi
-0.036 lig
-0.038 pace
-0.038 spac
-0.043 ligh
-0.043 fra
-0.062 th
-0.065 pac
-0.961 <BIAS>
y=comp.graphics (score -2.845) top features
Contribution? Feature
+0.294 :
+0.038 fi
+0.033 ile
+0.031 bri
+0.030 co
+0.030 ase
+0.028 it
… 462 more positive …
… 584 more negative …
-0.028 ill
-0.028 str
-0.029 the
-0.029 tel
-0.029 of
-0.030 sate
-0.030 bil
-0.030 as
-0.031 hat
-0.031 of
-0.032 at
-0.032 rbi
-0.032 ono
-0.033 of
-0.033 rbit
-0.034 ligh
-0.034 tron
-0.034 nom
-0.035 rono
-0.038 be
-0.039 he
-0.042 lig
-0.043 onom
-0.043 spa
-0.044 orb
-0.045 orbi
-0.045 orb
-0.047 ast
-0.048 spa
-0.049 the
-0.052 stro
-0.053 pac
-0.055 ght
-0.055 ight
-0.056 ght
-0.057 ht
-0.057 pace
-0.059 igh
-0.060 spac
-0.069 astr
-0.091 as
-0.117 th
-0.964 <BIAS>
y=sci.space (score 1.862) top features
Contribution? Feature
+0.108 pac
+0.099 spac
+0.096 pace
+0.093 astr
+0.087 th
+0.080 spa
+0.077 spa
+0.075 orb
+0.073 igh
+0.071 orbi
+0.070 orb
+0.070 rbit
+0.067 rbi
+0.066 ht
+0.066 ght
+0.061 onom
+0.058 bill
+0.057 ligh
+0.053 bil
+0.053 ast
+0.052 stro
+0.049 nom
+0.048 ght
+0.048 ight
+0.047 the
+0.047 bil
+0.044 rono
+0.043 ono
+0.042 moon
+0.042 bit
+0.042 oon
+0.040 sp
+0.040 ace
+0.040 as
+0.039 tro
+0.037 tron
+0.036 sate
+0.036 omy
+0.036 moo
+0.034 ry
+0.034 omy
+0.033 moo
+0.032 the
+0.030 str
+0.030 ld
… 543 more positive …
… 511 more negative …
-0.030 der
-0.031 fi
-0.032 cop
-0.420 :
-0.998 <BIAS>
y=talk.religion.misc (score -1.450) top features
Contribution? Feature
+0.089 :
+0.059 is
+0.049 th
+0.047 fra
+0.040 sa
+0.035 his
+0.035 as
+0.032 fra
+0.030 my
+0.027 serv
+0.027 fi
+0.027 eld
+0.026 erv
+0.026 der
+0.025 that
+0.024 hat
+0.024 of
+0.024 cr
+0.024 br
… 509 more positive …
… 545 more negative …
-0.023 lli
-0.024 oul
-0.024 uld
-0.024 ould
-0.025 ing
-0.025 rbi
-0.025 far
-0.026 orbi
-0.026 ting
-0.026 what
-0.027 wha
-0.027 le
-0.028 ost
-0.028 rbit
-0.028 orb
-0.029 a
-0.030 ile
-0.031 co
-0.031 spa
-0.032 ti
-0.033 spa
-0.034 it
-0.035 be
-0.037 pace
-0.037 spac
-0.037 the
-0.039 ght
-0.039 tin
-0.045 pac
-0.053 the
-0.977 <BIAS>

y=alt.atheism (score -1.951) top features

Contribution? Feature
… 466 more positive …
… 591 more negative …
-0.243 Highlighted in text (sum)
-0.961 <BIAS>

: while i'm sure sagan considers it sacrilegious, that wouldn't be : because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed : orbiting billboards would upset) is already a dying field: the : opacity and distortions caused by the atmosphere itself have : driven most of the field to use radio, far infrared or space-based : telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through : the field doesn't ruin observations. if that were the case, the : thousands of existing satellites would have already done so (satelliets : might not seem so bright to the eyes, but as far as astronomy is concerned, : they are extremely bright.) i believe that this orbiting space junk will be far brighter still; more like the full moon. the moon upsets deep-sky observation all over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are ok. what happens when this billboard circles every 90 minutes? what would be a good time then? : frank crary : cu boulder

y=comp.graphics (score -2.845) top features

Contribution? Feature
… 462 more positive …
… 584 more negative …
-0.964 <BIAS>
-1.373 Highlighted in text (sum)

: while i'm sure sagan considers it sacrilegious, that wouldn't be : because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed : orbiting billboards would upset) is already a dying field: the : opacity and distortions caused by the atmosphere itself have : driven most of the field to use radio, far infrared or space-based : telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through : the field doesn't ruin observations. if that were the case, the : thousands of existing satellites would have already done so (satelliets : might not seem so bright to the eyes, but as far as astronomy is concerned, : they are extremely bright.) i believe that this orbiting space junk will be far brighter still; more like the full moon. the moon upsets deep-sky observation all over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are ok. what happens when this billboard circles every 90 minutes? what would be a good time then? : frank crary : cu boulder

y=sci.space (score 1.862) top features

Contribution? Feature
+1.968 Highlighted in text (sum)
… 543 more positive …
… 511 more negative …
-0.998 <BIAS>

: while i'm sure sagan considers it sacrilegious, that wouldn't be : because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed : orbiting billboards would upset) is already a dying field: the : opacity and distortions caused by the atmosphere itself have : driven most of the field to use radio, far infrared or space-based : telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through : the field doesn't ruin observations. if that were the case, the : thousands of existing satellites would have already done so (satelliets : might not seem so bright to the eyes, but as far as astronomy is concerned, : they are extremely bright.) i believe that this orbiting space junk will be far brighter still; more like the full moon. the moon upsets deep-sky observation all over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are ok. what happens when this billboard circles every 90 minutes? what would be a good time then? : frank crary : cu boulder

y=talk.religion.misc (score -1.450) top features

Contribution? Feature
… 509 more positive …
… 545 more negative …
-0.258 Highlighted in text (sum)
-0.977 <BIAS>

: while i'm sure sagan considers it sacrilegious, that wouldn't be : because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed : orbiting billboards would upset) is already a dying field: the : opacity and distortions caused by the atmosphere itself have : driven most of the field to use radio, far infrared or space-based : telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through : the field doesn't ruin observations. if that were the case, the : thousands of existing satellites would have already done so (satelliets : might not seem so bright to the eyes, but as far as astronomy is concerned, : they are extremely bright.) i believe that this orbiting space junk will be far brighter still; more like the full moon. the moon upsets deep-sky observation all over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are ok. what happens when this billboard circles every 90 minutes? what would be a good time then? : frank crary : cu boulder


In [7]:
show_html_expl(explain_prediction(clf, test['data'][1], vec, target_names=train['target_names']))


Explained as: linear model

y=alt.atheism (score -2.503) top features y=comp.graphics (score 1.733) top features y=sci.space (score -1.000) top features y=talk.religion.misc (score -2.112) top features
Contribution? Feature
+0.109 mad
+0.094 vat
+0.073 atic
+0.070 mad
+0.066 ican
+0.056 is
+0.052 ble.
+0.047 ade
+0.043 le.
+0.041 an
+0.041 ent
+0.039 le.
+0.039 fin
+0.036 wh
+0.033 in
+0.029 bra
+0.029 the
+0.029 ade
+0.027 ary
+0.026 made
+0.025 ly
+0.025 ing
+0.024 ctio
+0.023 in
+0.023 ary
+0.023 cti
+0.022 in
+0.022 thi
+0.021 tic
+0.021 me
+0.021 fin
+0.019 yon
+0.018 is
+0.018 one
+0.016 a
+0.016 ion
+0.016 thi
+0.016 of
+0.015 of
+0.015 of
+0.013 this
+0.011 tly
+0.010 llec
+0.010 te
+0.010 ca
+0.008 ece
+0.008 sit
+0.008 our
+0.007 yone
+0.006 tion
+0.006 tio
+0.005 an
+0.005 coll
+0.004 one
+0.004 find
+0.003 ing
+0.003 ng
+0.002 any
+0.002 e.
+0.002 ur
+0.002 ma
+0.002 lib
-0.001 lib
-0.001 can
-0.003 any
-0.004 he
-0.004 nyo
-0.004 anyo
-0.004 nyon
-0.004 rec
-0.005 ica
-0.005 ion
-0.005 the
-0.005 ati
-0.005 entl
-0.005 ry
-0.005 our
-0.006 re
-0.007 ble
-0.008 able
-0.009 ne
-0.009 ecti
-0.010 to
-0.010 sit
-0.010 he
-0.010 s.
-0.011 lect
-0.011 co
-0.011 va
-0.012 her
-0.013 ect
-0.013 abl
-0.013 lec
-0.013 re
-0.016 tica
-0.016 here
-0.016 ntl
-0.016 ere
-0.016 me
-0.017 ila
-0.017 the
-0.018 ere
-0.018 ite
-0.018 rar
-0.018 ind
-0.019 olle
-0.019 ftp
-0.019 ntly
-0.019 ft
-0.020 ecen
-0.020 ftp
-0.020 de
-0.020 can
-0.021 si
-0.021 can
-0.022 li
-0.022 cen
-0.022 tp
-0.022 ftp
-0.022 rec
-0.024 tly
-0.024 is
-0.025 lab
-0.025 his
-0.025 on
-0.025 me
-0.026 whe
-0.027 vati
-0.027 whe
-0.028 vat
-0.031 brar
-0.031 libr
-0.032 ibra
-0.032 ibr
-0.033 tour
-0.033 din
-0.034 lle
-0.034 wher
-0.035 cent
-0.036 ding
-0.036 help
-0.036 elp
-0.038 his
-0.038 th
-0.038 vail
-0.039 tou
-0.039 ite
-0.039 avai
-0.042 hel
-0.042 fi
-0.042 aila
-0.042 labl
-0.043 elp
-0.043 ndin
-0.043 lp
-0.044 site
-0.045 ava
-0.045 av
-0.046 tou
-0.046 vai
-0.047 rary
-0.047 ilab
-0.047 rece
-0.049 oll
-0.051 ava
-0.052 hel
-0.057 us.
-0.058 col
-0.058 us.
-0.059 us.
-0.059 ail
-0.074 indi
-0.082 us
-0.083 ndi
-0.084 col
-0.961 <BIAS>
y=comp.graphics (score 1.733) top features
Contribution? Feature
+0.117 ftp
+0.114 ft
+0.096 ftp
+0.086 help
+0.084 elp
+0.082 tp
+0.082 hel
+0.080 ftp
+0.079 lib
+0.078 lib
+0.076 can
+0.071 brar
+0.069 ibra
+0.068 libr
+0.068 ibr
+0.066 lp
+0.063 site
+0.060 any
+0.059 can
+0.058 elp
+0.055 hel
+0.054 rar
+0.051 anyo
+0.051 de
+0.050 nyo
+0.049 nyon
+0.048 rary
+0.047 any
+0.047 fi
+0.045 abl
+0.045 bra
+0.045 here
+0.045 rec
+0.044 yone
+0.040 yon
+0.040 col
+0.039 a
+0.039 si
+0.039 fin
+0.038 wher
+0.038 ere
+0.038 able
+0.037 co
+0.036 ble
+0.036 ne
+0.035 col
+0.034 li
+0.031 us
+0.029 can
+0.029 ail
+0.029 ect
+0.028 rec
+0.026 ite
+0.025 tour
+0.024 sit
+0.024 sit
+0.023 labl
+0.023 her
+0.023 fin
+0.023 ece
+0.022 va
+0.022 lab
+0.022 find
+0.022 is
+0.022 aila
+0.021 ila
+0.021 ere
+0.020 avai
+0.020 vat
+0.019 olle
+0.019 vail
+0.019 whe
+0.018 vai
+0.018 whe
+0.017 ican
+0.016 ca
+0.016 ilab
+0.015 this
+0.014 ite
+0.013 us.
+0.013 ava
+0.013 is
+0.013 lect
+0.013 oll
+0.012 an
+0.012 tly
+0.011 is
+0.011 cen
+0.011 ade
+0.011 ing
+0.010 on
+0.010 our
+0.010 one
+0.009 ind
+0.008 re
+0.008 ati
+0.008 one
+0.008 ntl
+0.007 re
+0.007 in
+0.007 ntly
+0.006 ade
+0.005 ng
+0.005 rece
+0.005 ica
+0.004 in
+0.004 entl
+0.003 ava
+0.003 tly
+0.003 indi
+0.002 lec
+0.001 atic
+0.001 ion
+0.001 in
+0.000 ble.
+0.000 to
-0.000 av
-0.002 ly
-0.002 tion
-0.002 me
-0.002 tio
-0.003 his
-0.003 he
-0.003 din
-0.004 his
-0.006 ndi
-0.006 cent
-0.006 ary
-0.006 ion
-0.006 ing
-0.007 s.
-0.007 ry
-0.007 tica
-0.008 ding
-0.008 thi
-0.009 tou
-0.009 te
-0.009 ctio
-0.011 me
-0.012 the
-0.012 le.
-0.012 coll
-0.012 thi
-0.013 le.
-0.014 e.
-0.015 wh
-0.015 us.
-0.015 our
-0.015 me
-0.016 llec
-0.016 ma
-0.017 ur
-0.017 the
-0.018 made
-0.018 of
-0.019 of
-0.020 ecen
-0.020 ary
-0.020 of
-0.022 an
-0.022 ndin
-0.022 tou
-0.023 ecti
-0.025 us.
-0.030 the
-0.030 lle
-0.032 vati
-0.032 he
-0.032 tic
-0.039 cti
-0.042 mad
-0.042 mad
-0.043 vat
-0.045 ent
-0.072 th
-0.964 <BIAS>
y=sci.space (score -1.000) top features
Contribution? Feature
+0.083 ndin
+0.067 ndi
+0.056 ry
+0.054 th
+0.051 lle
+0.048 tou
+0.044 oll
+0.043 col
+0.040 a
+0.038 coll
+0.035 tou
+0.035 vat
+0.033 tly
+0.032 col
+0.032 olle
+0.031 tica
+0.031 te
+0.030 wher
+0.030 av
+0.030 one
+0.029 ding
+0.029 the
+0.028 on
+0.027 ntly
+0.026 the
+0.023 tly
+0.023 ary
+0.021 din
+0.021 ntl
+0.021 nyon
+0.021 thi
+0.020 nyo
+0.019 can
+0.019 ing
+0.018 llec
+0.018 anyo
+0.018 li
+0.018 ing
+0.018 ite
+0.017 lab
+0.017 ail
+0.016 entl
+0.016 ng
+0.016 can
+0.014 one
+0.014 ava
+0.014 ilab
+0.012 ecen
+0.012 us.
+0.012 ma
+0.011 he
+0.011 his
+0.011 thi
+0.011 yone
+0.011 tic
+0.011 us.
+0.011 re
+0.011 yon
+0.010 ly
+0.009 the
+0.009 rece
+0.009 tio
+0.009 co
+0.009 her
+0.008 ary
+0.008 can
+0.008 ati
+0.008 us.
+0.008 tion
+0.008 ila
+0.008 ne
+0.008 s.
+0.007 ade
+0.007 cent
+0.007 va
+0.007 whe
+0.006 le.
+0.006 ca
+0.006 ava
+0.005 whe
+0.005 made
+0.005 lect
+0.004 aila
+0.004 sit
+0.004 ere
+0.003 to
+0.003 ion
+0.003 de
+0.003 labl
+0.003 his
+0.002 ion
+0.002 me
+0.002 avai
+0.001 vail
+0.001 any
+0.000 ade
+0.000 here
-0.000 vai
-0.000 any
-0.002 ur
-0.002 me
-0.002 lec
-0.003 this
-0.003 an
-0.003 sit
-0.003 in
-0.003 us
-0.003 ent
-0.003 ect
-0.004 ind
-0.005 rec
-0.005 re
-0.006 able
-0.007 ere
-0.007 ica
-0.008 tour
-0.008 ite
-0.009 le.
-0.009 abl
-0.009 e.
-0.009 indi
-0.011 ble.
-0.012 of
-0.012 of
-0.012 in
-0.012 mad
-0.013 of
-0.013 me
-0.013 is
-0.014 ece
-0.014 an
-0.015 si
-0.016 wh
-0.016 vati
-0.016 cti
-0.016 site
-0.016 ctio
-0.017 rec
-0.017 is
-0.018 is
-0.020 cen
-0.021 in
-0.021 ble
-0.023 our
-0.023 find
-0.024 ecti
-0.024 fin
-0.026 our
-0.026 mad
-0.028 rary
-0.029 help
-0.029 elp
-0.031 rar
-0.032 tp
-0.032 ftp
-0.033 hel
-0.033 fin
-0.035 ftp
-0.038 ican
-0.038 fi
-0.041 he
-0.041 ibr
-0.042 ibra
-0.043 libr
-0.044 brar
-0.046 lp
-0.047 bra
-0.047 elp
-0.048 hel
-0.049 atic
-0.050 lib
-0.052 ft
-0.056 ftp
-0.077 lib
-0.090 vat
-0.998 <BIAS>
y=talk.religion.misc (score -2.112) top features
Contribution? Feature
+0.083 is
+0.060 me
+0.057 us.
+0.053 indi
+0.051 me
+0.048 he
+0.045 ecen
+0.044 ite
+0.044 tou
+0.043 his
+0.042 whe
+0.041 whe
+0.040 tou
+0.039 us.
+0.034 ecti
+0.034 vati
+0.034 fi
+0.033 us
+0.033 ndi
+0.032 us.
+0.030 th
+0.029 rece
+0.026 an
+0.026 ite
+0.025 cent
+0.023 cen
+0.023 his
+0.021 ding
+0.021 e.
+0.020 hel
+0.019 din
+0.018 wher
+0.017 our
+0.015 vai
+0.015 of
+0.015 ent
+0.013 to
+0.013 s.
+0.012 lp
+0.012 lle
+0.011 is
+0.011 lec
+0.011 re
+0.011 elp
+0.011 of
+0.010 he
+0.010 rary
+0.009 ava
+0.009 rec
+0.008 ece
+0.008 tour
+0.008 of
+0.007 ere
+0.007 rec
+0.007 tica
+0.006 hel
+0.006 ndin
+0.005 ind
+0.004 sit
+0.004 cti
+0.004 va
+0.003 find
+0.003 our
+0.002 tion
+0.002 this
+0.002 wh
+0.002 col
+0.001 tio
-0.000 si
-0.003 ble
-0.003 ntl
-0.003 ati
-0.003 ntly
-0.003 ava
-0.004 me
-0.004 vail
-0.005 re
-0.005 avai
-0.005 the
-0.006 an
-0.006 ilab
-0.007 labl
-0.007 ctio
-0.008 aila
-0.008 ur
-0.008 abl
-0.009 ica
-0.009 ect
-0.010 ma
-0.010 her
-0.010 ail
-0.010 site
-0.011 mad
-0.011 entl
-0.011 te
-0.011 ng
-0.011 vat
-0.012 on
-0.012 lect
-0.013 llec
-0.013 ila
-0.014 made
-0.014 elp
-0.014 in
-0.015 oll
-0.015 lab
-0.015 help
-0.016 ion
-0.016 tly
-0.016 li
-0.017 ing
-0.018 thi
-0.018 av
-0.018 ary
-0.018 in
-0.018 thi
-0.019 ion
-0.020 ly
-0.020 fin
-0.020 ing
-0.021 ftp
-0.021 rar
-0.021 ere
-0.022 ary
-0.023 tp
-0.023 in
-0.023 sit
-0.023 here
-0.024 one
-0.024 able
-0.024 tic
-0.025 libr
-0.025 is
-0.025 fin
-0.025 bra
-0.026 brar
-0.026 ibr
-0.026 ry
-0.027 atic
-0.027 ibra
-0.027 de
-0.027 ne
-0.029 olle
-0.029 can
-0.031 the
-0.031 ca
-0.031 ftp
-0.032 ft
-0.033 the
-0.033 tly
-0.033 nyo
-0.034 can
-0.034 ade
-0.034 coll
-0.035 nyon
-0.035 anyo
-0.036 ftp
-0.038 ade
-0.038 co
-0.039 col
-0.042 one
-0.043 lib
-0.047 le.
-0.050 yone
-0.050 ican
-0.052 le.
-0.055 any
-0.057 a
-0.057 any
-0.062 yon
-0.063 mad
-0.068 lib
-0.070 vat
-0.075 ble.
-0.092 can
-0.977 <BIAS>

y=alt.atheism (score -2.503) top features

Contribution? Feature
-0.961 <BIAS>
-1.542 Highlighted in text (sum)

the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.

y=comp.graphics (score 1.733) top features

Contribution? Feature
+2.698 Highlighted in text (sum)
-0.964 <BIAS>

the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.

y=sci.space (score -1.000) top features

Contribution? Feature
-0.002 Highlighted in text (sum)
-0.998 <BIAS>

the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.

y=talk.religion.misc (score -2.112) top features

Contribution? Feature
-0.977 <BIAS>
-1.135 Highlighted in text (sum)

the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.
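
For a linear model, each class score is the sum of the per-feature contributions plus the bias term. Since this cell shows every feature (no `top` limit), the "Highlighted in text (sum)" row plus `<BIAS>` should reproduce the reported score for each class. The short check below uses only numbers copied from the output above; the tolerance is an assumption to absorb the 3-decimal rounding of the printed values.

```python
# (highlighted-feature sum, bias, reported score) copied from the output above.
reported = {
    "alt.atheism":        (-1.542, -0.961, -2.503),
    "comp.graphics":      (2.698, -0.964, 1.733),
    "sci.space":          (-0.002, -0.998, -1.000),
    "talk.religion.misc": (-1.135, -0.977, -2.112),
}

# Score = sum of feature contributions + bias.
recomputed = {name: feats + bias for name, (feats, bias, _) in reported.items()}

for name, (_, _, score) in reported.items():
    # Printed values are rounded to 3 decimals, so allow a small tolerance.
    assert abs(recomputed[name] - score) < 0.002, name

# The predicted class is the one with the highest score.
predicted = max(recomputed, key=recomputed.get)
print(predicted)
```

The highest recomputed score belongs to the same class that the explanation reports with the only positive score.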


In [8]:
# Explain only the top-scoring class for each of the first 10 test documents.
for doc in test['data'][:10]:
    expl = explain_prediction(clf, doc, vec, target_names=train['target_names'], top_targets=1)
    show_html_expl(expl, force_weights=False)


Explained as: linear model

y=sci.space (score 0.275) top features

Contribution? Feature
+1.273 Highlighted in text (sum)
-0.998 <BIAS>

trry the skywatch project in arizona.

Explained as: linear model

y=comp.graphics (score 1.733) top features

Contribution? Feature
+2.698 Highlighted in text (sum)
-0.964 <BIAS>

the vatican library recently made a tour of the us. can anyone help me in finding a ftp site where this collection is available.

Explained as: linear model

y=comp.graphics (score 1.655) top features

Contribution? Feature
+2.619 Highlighted in text (sum)
-0.964 <BIAS>

hi there, i am here looking for some help. my friend is a interior decor designer. he is from thailand. he is trying to find some graphics software on pc. any suggestion on which software to buy,where to buy and how much it costs ? he likes the most sophisticated software(the more features it has,the better)

Explained as: linear model

y=comp.graphics (score 0.467) top features

Contribution? Feature
+1.431 Highlighted in text (sum)
-0.964 <BIAS>

rfd request for discussion for the open telematic group otg i have proposed the forming of a consortium/task force for the promotion of naplps/jpeg, fif to openly discuss ways, method, procedures,algorythms, applications, implementation, extensions of naplps/jpeg standards. these standards should facilitate the creation of real_time online applications that make use of voice, video, telecommuting, hires graphics, conferencing, distant learning, online order entry, fax,in addition these dicussion would assist all to better understand how sgml, cals, oda, mime, oodbms, jpeg, mpeg, fractals, sql, cdrom, cdromxa, kodak photocd, tcl, v.fast, and eia/tia562, can best be incorporated and implemented to develop telematic/multimedia applications. we want to be able to support dos, unix, mac, windows, nt, os/2 platforms. it is our hope that individuals, developers, corporations, universities, r & d labs would join in in supporting such an endeavor. this would be a not_for_profit group with bylaws and charter. already many corporations have decided to support otg (open telematic group) so do not delay joining if you are a developer an rfd has been posted to form a usenet newsgroup and a faq will soon be be composed to start promulgating what is known on the subject. if you would like to be added to the maillist send email or mail to the address below. this group would publish an electronic quarterly naplps/jpeg newsletter as well as a hardcopy version. we urge all who wants to see cmcs hires based applications & the naplps/jpeg g r o w, decide to join and mutually benefit from this not-for_profit endeavor. note: telematic has been defined by mr. james martin as the marriage of voice, video, hi-res graphics, fax, ivr, music over telephone lines/lan. if you would like to get involve write to me at: img inter-multimedia group| internet: epimntl@world.std.com p.o. 
box 95901 | ed.pimentel@gisatl.fidonet.org atlanta, georgia, us | cis : 70611,3703 | fidonet : 1:133/407 | bbs : +1-404-985-1198 zyxel 14.4k

Explained as: linear model

y=comp.graphics (score 0.493) top features

Contribution? Feature
+1.457 Highlighted in text (sum)
-0.964 <BIAS>

i am interested in finding 3d animation programs for the mac. i am especially interested in any programs that don't exist in a pc port and are so good that they would make me go buy a mac. do any such exist?

Explained as: linear model

y=comp.graphics (score 0.510) top features

Contribution? Feature
+1.475 Highlighted in text (sum)
-0.964 <BIAS>

i'm also interested in such a program. but most of all i'd like to know wich program is able to convert gif or pcx to dxf !!! when i have this program, i can scan pictures and frase (or something like that !) them. this will be beyond the limit !!!

Explained as: linear model

y=comp.graphics (score -0.614) top features

Contribution? Feature
+0.351 Highlighted in text (sum)
-0.964 <BIAS>

or how about: "end light pollution now!!" your banner would have no effect on its subject, but my banner would.

Explained as: linear model

y=sci.space (score 1.862) top features

Contribution? Feature
+2.860 Highlighted in text (sum)
-0.998 <BIAS>

: while i'm sure sagan considers it sacrilegious, that wouldn't be : because of his doubtfull credibility as an astronomer. modern, : ground-based, visible light astronomy (what these proposed : orbiting billboards would upset) is already a dying field: the : opacity and distortions caused by the atmosphere itself have : driven most of the field to use radio, far infrared or space-based : telescopes. hardly. the keck telescope in hawaii has taken its first pictures; they're nearly as good as hubble for a tiny fraction of the cost. : in any case, a bright point of light passing through : the field doesn't ruin observations. if that were the case, the : thousands of existing satellites would have already done so (satelliets : might not seem so bright to the eyes, but as far as astronomy is concerned, : they are extremely bright.) i believe that this orbiting space junk will be far brighter still; more like the full moon. the moon upsets deep-sky observation all over the sky (and not just looking at it) because of scattered light. this is a known problem, but of course two weeks out of every four are ok. what happens when this billboard circles every 90 minutes? what would be a good time then? : frank crary : cu boulder

Explained as: linear model

y=alt.atheism (score 1.878) top features

Contribution? Feature
+2.839 Highlighted in text (sum)
-0.961 <BIAS>

not if you show that these hypothetical atheists are gullible, excitable and easily led from some concrete cause. in that case we would also have to discuss if that concrete cause, rather than atheism, was the factor that caused their subsequent behaviour.

Explained as: linear model

y=sci.space (score -0.048) top features

Contribution? Feature
+0.950 Highlighted in text (sum)
-0.998 <BIAS>

picture our universe floating like a log in a river. as the log floats down the river, it occasionally strikes rocks, the bank, the bottom, other logs. when this collission occurs, kinetic energy is translated into heat, the log degrades, gets scraped up, and other energy translaions occur. the distribution of damage to the log depends on the shape of the log. however, to a very small virus in a mite on the head of a termite in the center of the log, the shock waves from the collissions would appear uniformly random in direction. this is my theory for grb. they are evidence of our universe interacting with other universes! why not! makes just as much sense as the grb coming from the oort cloud! the log theory of universes can't be ruled out! of course, i'm a layman in the physics world. you physicists out there, tell me about this !!!!