This code predicts which of 20 possible newsgroups a post belongs to. It is trained on the commonly used 20-newsgroups dataset, which is an "unusual" classification dataset in that each newsgroup is very distinctive, so model choices that do well here tend to favor this kind of data.
The code does the following: it loads the 20-newsgroups train and test splits, converts the posts to bag-of-words counts and TF-IDF weights, and compares a range of scikit-learn classifiers (multinomial naive Bayes, SGD-trained linear SVM and logistic regression, k-nearest neighbors, nearest centroid, SVC/LinearSVC, and a FeatureUnion pipeline that adds subject-line and text-statistics features), reporting accuracy, macro F1, and run time for each.
Models are optimized through: hand-tuning of the vectorizer and TF-IDF options (stop words, n-gram ranges, min_df/max_df, sublinear_tf, token patterns) and a randomized hyperparameter search with RandomizedSearchCV.
Code came from examples at: the scikit-learn documentation; the specific examples are cited inline below where they are adapted.
20 newsgroups dataset info is at http://scikit-learn.org/stable/datasets/index.html#the-20-newsgroups-text-dataset
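For orientation, here is a minimal sketch of the flow the notebook builds up (fetch the data, turn the text into TF-IDF features, fit a classifier, score on the held-out test set); it assumes scikit-learn is installed (install instructions follow below) and simply condenses the cells that come later:
from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
import numpy as np

twenty_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)
twenty_test = fetch_20newsgroups(subset='test', shuffle=True, random_state=42)
clf = Pipeline([('cvect', CountVectorizer()),     # text -> token counts
                ('tfidf', TfidfTransformer()),    # counts -> TF-IDF weights
                ('nb', MultinomialNB())])         # naive Bayes classifier
clf.fit(twenty_train.data, twenty_train.target)
predicted = clf.predict(twenty_test.data)
print('accuracy =', np.mean(predicted == twenty_test.target))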
Be sure to install the following (pip3 is the Python 3 pip; the plain pip command will also work):
pip3 install scikit-learn
pip3 install pandas
pip3 install scipy
If I missed an install and you get an import error, try doing a pip3 install <import name>. Note that the Jupyter kernel needs to be the same Python version/installation (Python 3) that you ran the pip3 installs against.
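As an optional quick check that the kernel sees the packages (matplotlib is also imported below, so install it too if it is missing):
import sys
import sklearn, pandas, scipy, matplotlib
print(sys.version)
print('sklearn', sklearn.__version__, '| pandas', pandas.__version__,
      '| scipy', scipy.__version__, '| matplotlib', matplotlib.__version__)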
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.core.display import display, HTML
from IPython.display import Audio
import os
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
import time
display(HTML("<style>.container { width:97% !important; }</style>")) #Set width of iPython cells
In [3]:
from sklearn.datasets import fetch_20newsgroups
# You can restrict the categories to simulate fewer classes
#categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
#categories = ['comp.graphics', 'sci.med']
#categories = ['alt.atheism', 'talk.religion.misc']
categories=None
twenty_train = fetch_20newsgroups(subset='train',
categories=categories, shuffle=True, random_state=42)
twenty_test = fetch_20newsgroups(subset='test',
categories=categories, shuffle=True, random_state=42)
In [4]:
twenty_train.target_names
Out[4]:
In [5]:
len(twenty_train.data)
Out[5]:
In [6]:
len(twenty_train.filenames)
Out[6]:
In [7]:
print(twenty_train.data[0])
In [8]:
twenty_train.target_names[twenty_train.target[0]]
Out[8]:
In [9]:
twenty_train.target
Out[9]:
In [10]:
len(twenty_train.target)
Out[10]:
In [11]:
twenty_train.target_names
Out[11]:
In [12]:
len(twenty_test.data)
Out[12]:
In [13]:
len(twenty_test.data) / len(twenty_train.data)
Out[13]:
In [14]:
print(twenty_test.data[10])
In [15]:
twenty_test.target_names[twenty_test.target[10]]
Out[15]:
In [16]:
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)
print('training examples = ' + str(len(twenty_train.data)))
print('vocabulary length = ' + str(len(count_vect.vocabulary_)))
print('transformed training text matrix shape = ' + str(X_train_counts.shape))
In [17]:
# vocabulary_ is dict of word string -> word index
list(count_vect.vocabulary_.items())[:50]
Out[17]:
In [18]:
text = ['The The rain in spain.', 'The brown brown fox.']
counts_matrix = count_vect.transform(text)
type(counts_matrix)
Out[18]:
In [19]:
counts_matrix.data
Out[19]:
In [20]:
counts_matrix.indptr
Out[20]:
In [21]:
counts_matrix.indices
Out[21]:
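data, indptr, and indices are the raw CSR (compressed sparse row) storage: indptr marks where each row's entries begin and end, indices holds the column (vocabulary) indices, and data holds the counts. A small sketch of walking them directly for the counts_matrix built above, before the COO conversion below does the same thing more conveniently:
for row in range(counts_matrix.shape[0]):
    start, end = counts_matrix.indptr[row], counts_matrix.indptr[row + 1]
    for col, count in zip(counts_matrix.indices[start:end], counts_matrix.data[start:end]):
        print(row, col, count)   # (row, vocabulary index, count)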
In [22]:
from scipy.sparse import coo_matrix
coo = coo_matrix(counts_matrix)
#print(np.stack((coo.row, coo.col, coo.data)))
df = pd.DataFrame({'row':coo.row, 'column':coo.col, 'count':coo.data},
columns=['row','column', 'count'])
df
Out[22]:
In [23]:
inverse_vocabulary=np.empty(len(count_vect.vocabulary_), dtype=object)
for key,value in count_vect.vocabulary_.items():
inverse_vocabulary[value] = key
for i in coo.col:
print(i, inverse_vocabulary[i])
In [24]:
words = [inverse_vocabulary[i] for i in coo.col]
df = pd.DataFrame({'row':coo.row, 'column':coo.col, 'count':coo.data, 'word':words})
df = df[ ['row','column', 'count', 'word'] ]
df
Out[24]:
In [25]:
tfidf = TfidfTransformer()
tfidf.fit(X_train_counts) # compute weights on whole training set
tfidf_matrix = tfidf.transform(counts_matrix) # transform examples
print( 'tfidf_matrix type = ' + str(type(tfidf_matrix)) )
print( 'tfidf_matrix shape = ' + str(tfidf_matrix.shape) )
coo_tfidf = coo_matrix(tfidf_matrix)
words_tfidf = [inverse_vocabulary[i] for i in coo_tfidf.col]
df = pd.DataFrame({'row':coo_tfidf.row, 'column':coo_tfidf.col,
'value':coo_tfidf.data, 'word':words_tfidf})
df = df[ ['row','column', 'value', 'word'] ]
df
Out[25]:
In [26]:
import scipy
scipy.sparse.linalg.norm(tfidf_matrix, axis=1)
Out[26]:
Notice in the above values that every row has an L2 norm of 1.0: TfidfTransformer normalizes each document vector to unit length by default (norm='l2'), regardless of how long the original text is.
In [27]:
tfidf.idf_.shape
Out[27]:
In [28]:
words = ['the', 'very', 'car', 'vector', 'africa']
for word in words:
word_index = count_vect.vocabulary_[word]
print(word + ' = ' + str(tfidf.idf_[word_index]))
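With TfidfTransformer's defaults (smooth_idf=True), the weight printed for each word is idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of training documents and df(t) is how many of them contain the word, so very common words like 'the' come out near 1 and rare words get larger weights. A sketch of checking one value by hand against tfidf.idf_, using the X_train_counts and count_vect built above:
n_docs = X_train_counts.shape[0]
word_index = count_vect.vocabulary_['africa']
df_t = (X_train_counts[:, word_index] > 0).sum()        # number of documents containing the word
idf_manual = np.log((1 + n_docs) / (1 + df_t)) + 1      # sklearn's smoothed idf formula
print(idf_manual, tfidf.idf_[word_index])               # the two values should match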
In [29]:
text_clf = Pipeline([('cvect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('sgdc', MultinomialNB()),  # note: the final step is named 'sgdc' throughout, whatever classifier it holds
                    ])
In [30]:
text_clf.fit(twenty_train.data, twenty_train.target)
Out[30]:
In [31]:
predicted = text_clf.predict(twenty_test.data)
np.mean(predicted == twenty_test.target)
Out[31]:
In [32]:
from sklearn import metrics
print(metrics.classification_report(twenty_test.target, predicted,
target_names=twenty_test.target_names))
In [33]:
df = pd.DataFrame(metrics.confusion_matrix(twenty_test.target, predicted))
df
Out[33]:
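The raw confusion-matrix DataFrame is easier to scan with class labels and a heatmap; a small matplotlib sketch using only what is already imported (predicted and twenty_test come from the cells above):
cm = metrics.confusion_matrix(twenty_test.target, predicted)
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(cm)                                            # bright cells off the diagonal are common confusions
ax.set_xticks(range(len(twenty_test.target_names)))
ax.set_yticks(range(len(twenty_test.target_names)))
ax.set_xticklabels(twenty_test.target_names, rotation=90)
ax.set_yticklabels(twenty_test.target_names)
ax.set_xlabel('Predicted')
ax.set_ylabel('Expected')
plt.show()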
In [34]:
class QAResults:
    """Collects misclassified test examples so they can be stepped through one at a time."""
    def init(self, Y_expected, Y_predicted, X, class_labels):
        # plain init() (not __init__) so an empty object can be created first and filled in by test_pipeline
        self.Y_expected = Y_expected
        self.Y_predicted = Y_predicted
        self.X = X
        self.class_labels = class_labels
        self.next_error_index = 0
        self.errors = np.nonzero(Y_expected - Y_predicted) # returns indices of non-zero (misclassified) elements
        print(self.errors)
    def display_next(self):
        if(self.next_error_index >= self.errors[0].shape[0]):
            self.next_error_index = 0 # cycle back around
        X_index = self.errors[0][self.next_error_index]
        print('index = ', X_index )
        print('Expected = ' + self.class_labels[self.Y_expected[X_index]])
        print('Predicted = ' + self.class_labels[self.Y_predicted[X_index]])
        print('\nX['+ str(X_index) +']')
        print( self.X[X_index] )
        self.next_error_index += 1
In [35]:
def header(str):
display(HTML('<h3>'+str+'</h3>'))
tests = {}
def test_pipeline(pipeline, name=None, verbose=True, qa_test = None):
start=time.time()
pipeline.fit(twenty_train.data, twenty_train.target)
predicted = pipeline.predict(twenty_test.data)
elapsed_time = (time.time() - start)
accuracy = np.mean(predicted == twenty_test.target)
f1 = metrics.f1_score(twenty_test.target, predicted, average='macro')
print( 'F1 = %.3f \nAccuracy = %.3f\ntime = %.3f sec.' % (f1, accuracy, elapsed_time))
if(verbose):
header('Classification Report')
print(metrics.classification_report(twenty_test.target, predicted,
target_names=twenty_test.target_names, digits=3))
header('Confusion Matrix (row=expected, col=predicted)')
df = pd.DataFrame(metrics.confusion_matrix(twenty_test.target, predicted))
df.columns = twenty_test.target_names
df['Expected']=twenty_test.target_names
df.set_index('Expected',inplace=True)
display(df)
if name is not None:
tests[name]={'Name':name, 'Accuracy':accuracy, 'F1':f1, 'Time':elapsed_time,
'Details':pipeline.get_params(deep=True)}
if qa_test is not None:
qa_test.init( twenty_test.target, predicted, twenty_test.data, twenty_test.target_names)
qa_test=QAResults()
test_pipeline(text_clf, qa_test=qa_test)
In [36]:
qa_test.display_next() # re-run this cell to see next error
In [37]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()), # <-- with weighting
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-5, random_state=42,
max_iter=40)),
]), verbose=False)
In [38]:
test_pipeline(Pipeline([('cvect', CountVectorizer()), # <-- no weighting
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-5, random_state=42,
max_iter=40)),
]), verbose=False)
In [39]:
test_pipeline(Pipeline([('tfidf_v', TfidfVectorizer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False)
In [40]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='hinge loss')
In [41]:
# hinge loss is a linear SVM
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='hinge loss')
In [42]:
# log loss is logistic regression
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='log', penalty='l2',
alpha=1e-6, random_state=42,
max_iter=10 )),
]), verbose=False, name='log loss')
In [43]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='log', penalty='none',
alpha=1e-6, random_state=42,
max_iter=10 )),
]), verbose=False, name='log loss no regularization')
In [44]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', MultinomialNB()),
]), verbose=False, name='MultinomialNB')
In [45]:
from sklearn.neighbors import KNeighborsClassifier
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('knn', KNeighborsClassifier(n_neighbors=5)),
]), verbose=False, name='KNN n=5')
In [46]:
from sklearn.neighbors import KNeighborsClassifier
for n in range(1,7):
print( '\nn = ' + str(n))
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('knn', KNeighborsClassifier(n_neighbors=n)),
]), verbose=False, name='KNN n=' + str(n))
In [47]:
from sklearn.neighbors import KNeighborsClassifier
for n in range(1,7):
print( '\nn = ' + str(n))
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('knn', KNeighborsClassifier(n_neighbors=n, weights='distance')),
]), verbose=False, name='KNN n=' + str(n) + ' distance weights')
In [48]:
from sklearn.neighbors.nearest_centroid import NearestCentroid
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', NearestCentroid(metric='euclidean')),
]), verbose=False, name='NearestCentroid')
In [49]:
from sklearn.linear_model import LogisticRegression
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', LogisticRegression(solver='sag', multi_class='multinomial', n_jobs=-1)),
]), verbose=False, name='LogisticRegression multinomial')
In [50]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', LogisticRegression(solver='sag', multi_class='ovr',n_jobs=-1)),
]), verbose=False, name='LogisticRegression ovr')
In [51]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', LogisticRegression(C=10, solver='sag', multi_class='multinomial', n_jobs=-1, max_iter=200)),
]), verbose=False, name='LogisticRegression multinomial C=10')
In [52]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', LogisticRegression(C=100, solver='sag', multi_class='multinomial', n_jobs=-1, max_iter=200)),
]), verbose=False, name='LogisticRegression multinomial C=100')
In [53]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', LogisticRegression(C=1000, solver='sag', multi_class='multinomial', n_jobs=-1, max_iter=200)),
]), verbose=False, name='LogisticRegression multinomial C=1000')
In [54]:
p = Pipeline([('cvect', CountVectorizer(stop_words='english', ngram_range=(1,2),
max_df = 0.88, min_df=1)),
('tfidf', TfidfTransformer(sublinear_tf=True)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=4e-4, random_state=42,
max_iter=40 )),
])
test_pipeline(p, verbose=False)
In [55]:
# Adapted from https://stackoverflow.com/questions/11116697/how-to-get-most-informative-features-for-scikit-learn-classifiers
def show_most_informative_features(vectorizer, clf, class_labels, n=50):
feature_names = vectorizer.get_feature_names()
for row in range(clf.coef_.shape[0]):
coefs_with_fns = sorted(zip(clf.coef_[row], feature_names))
top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
print( '\nclass = ' + class_labels[row])
l = [[fn_1, coef_1,fn_2,coef_2] for (coef_1, fn_1), (coef_2, fn_2) in top]
df = pd.DataFrame(l, columns=['Smallest Word', 'Smallest Weight', 'Largest Word', 'Largest Weight'])
display(df)
show_most_informative_features(p.named_steps['cvect'], p.named_steps['sgdc'], twenty_train.target_names)
In [56]:
p = Pipeline([('cvect', CountVectorizer( analyzer='char', ngram_range=(5,5),
max_df = 0.88, min_df=1)),
('tfidf', TfidfTransformer(sublinear_tf=True)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=4e-4, random_state=42,
max_iter=40 )),
])
test_pipeline(p, verbose=False)
show_most_informative_features(p.named_steps['cvect'], p.named_steps['sgdc'], twenty_train.target_names)
In [57]:
test_pipeline(Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer(use_idf=False)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='use_idf=False')
In [58]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='stopwords')
In [59]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', ngram_range=(1,2),
max_df = 0.8, min_df=2)),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='ngram_range=(1,2)')
In [60]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer(norm=None)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='norm = None')
In [61]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer(sublinear_tf=True)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='sublinear_tf=True')
In [62]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer(norm='l1')),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='norm=l1')
In [63]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', ngram_range=(1,3),
max_df = 0.8, min_df=2)),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name='ngram_range=(1,3)')
In [64]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', ngram_range=(1,2),
max_df = 0.8, min_df=2)),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 )),
]), verbose=False, name = 'ngram_range=(1,2), max_df = 0.8, min_df=2')
In [65]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', ngram_range=(1,2),
max_df = 0.8, min_df=2)),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 , n_jobs=-1)),
]), verbose=False, name='ngram_range=(1,2), max_df = 0.8, min_df=2')
In [66]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', token_pattern="[a-zA-Z]{3,}")),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 , n_jobs=-1)),
]), verbose=False, name='no numbers')
In [67]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english', token_pattern="[a-zA-Z0-9.-]{1,}")),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 , n_jobs=-1)),
]), verbose=False, name='dots in words')
In [68]:
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english',
ngram_range=(1,2), min_df=3,max_df=0.8)),
('tfidf', TfidfTransformer(sublinear_tf=True)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 , n_jobs=-1)),
]), verbose=False, name='ngram_range=(1,2), min_df=3,max_df=0.8, sublinear_tf')
In [69]:
for n in range(3,8,1):
print('\nN-grams = '+ str(n))
test_pipeline(Pipeline([('cvect', CountVectorizer(analyzer='char', ngram_range=(n,n),
min_df=2, max_df=0.9)),
('tfidf', TfidfTransformer(sublinear_tf=True)),
('sgdc', SGDClassifier(loss='hinge', penalty='l2',
alpha=1e-4, random_state=42,
max_iter=40 , n_jobs=-1)),
]), verbose=False, name='char ngram ' + str(n) + ' + sublinear_tf')
In [70]:
from sklearn.svm import SVC
from sklearn.decomposition import TruncatedSVD
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer()),
('svd', TruncatedSVD(n_components=300)),
('svc', SVC(kernel='linear', C=10)),
]), verbose=False, name='SVC + TruncatedSVD')
In [71]:
from sklearn.svm import SVC
from sklearn.decomposition import TruncatedSVD
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer()),
('svc', SVC(kernel='linear')),
]), verbose=False, name='SVC')
In [72]:
from sklearn.svm import LinearSVC
from sklearn.decomposition import TruncatedSVD
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer()),
('sgdc', LinearSVC(C=10)),
]), verbose=False, name='LinearSVC, C=10')
In [73]:
from sklearn.svm import LinearSVC
from sklearn.decomposition import TruncatedSVD
test_pipeline(Pipeline([('cvect', CountVectorizer(stop_words='english')),
('tfidf', TfidfTransformer()),
('sgdc', LinearSVC(C=1)),
]), verbose=False, name='LinearSVC, C=1')
Code adapted from: http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
In [74]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.datasets.twenty_newsgroups import strip_newsgroup_footer
from sklearn.datasets.twenty_newsgroups import strip_newsgroup_quoting
In [75]:
class ItemSelector(BaseEstimator, TransformerMixin):
"""For data grouped by feature, select subset of data at a provided key.
The data is expected to be stored in a 2D data structure, where the first
index is over features and the second is over samples. i.e.
>> len(data[key]) == n_samples
Please note that this is the opposite convention to scikit-learn feature
matrixes (where the first index corresponds to sample).
ItemSelector only requires that the collection implement getitem
(data[key]). Examples include: a dict of lists, 2D numpy array, Pandas
DataFrame, numpy record array, etc.
>> data = {'a': [1, 5, 2, 5, 2, 8],
'b': [9, 4, 1, 4, 1, 3]}
>> ds = ItemSelector(key='a')
>> data['a'] == ds.transform(data)
ItemSelector is not designed to handle data grouped by sample. (e.g. a
list of dicts). If your data is structured this way, consider a
transformer along the lines of `sklearn.feature_extraction.DictVectorizer`.
Parameters
----------
key : hashable, required
The key corresponding to the desired value in a mappable.
"""
def __init__(self, key):
self.key = key
def fit(self, x, y=None):
return self
def transform(self, data_dict):
return data_dict[self.key]
class TextStats(BaseEstimator, TransformerMixin):
"""Extract features from each document for DictVectorizer"""
def fit(self, x, y=None):
return self
def transform(self, posts):
return [{'length': len(text),
'num_sentences': text.count('.'),
'num_questions': text.count('?') ,
'num_dollars': text.count('$'),
'num_percent': text.count('%'),
'num_exclamations': text.count('!'),
}
for text in posts]
class SubjectBodyExtractor(BaseEstimator, TransformerMixin):
"""Extract the subject & body from a usenet post in a single pass.
Takes a sequence of strings and produces a dict of sequences. Keys are
`subject` and `body`.
"""
def fit(self, x, y=None):
return self
def transform(self, posts):
features = np.recarray(shape=(len(posts),),
dtype=[('subject', object), ('body', object)])
for i, text in enumerate(posts):
headers, _, bod = text.partition('\n\n')
bod = strip_newsgroup_footer(bod)
bod = strip_newsgroup_quoting(bod)
features['body'][i] = bod
prefix = 'Subject:'
sub = ''
for line in headers.split('\n'):
if line.startswith(prefix):
sub = line[len(prefix):]
break
features['subject'][i] = sub
return features
class Printer(BaseEstimator, TransformerMixin):
"""{Print inputs}"""
def __init__(self, count):
self.count = count
def fit(self, x, y=None):
return self
def transform(self, x):
if(self.count >0):
self.count-=1
print(x[0])
return x
pipeline = Pipeline([
# Extract the subject & body
('subjectbody', SubjectBodyExtractor()),
# Use FeatureUnion to combine the features from subject and body
('union', FeatureUnion(n_jobs=-1,
transformer_list=[
# Pipeline for pulling features from the post's subject line
('subject', Pipeline([
('selector', ItemSelector(key='subject')),
('tfidf', TfidfVectorizer(min_df=1)),
])),
# Pipeline for standard bag-of-words model for body
('body_bow', Pipeline([
('selector', ItemSelector(key='body')),
('tfidf', TfidfVectorizer()),
])),
# Pipeline for pulling ad hoc features from post's body
('body_stats', Pipeline([
('selector', ItemSelector(key='body')),
('stats', TextStats()), # returns a list of dicts
('cvect', DictVectorizer()), # list of dicts -> feature matrix
#('print',Printer(1)),
# scaling is needed so SGD model will have balanced feature gradients
('scale', StandardScaler(copy=False, with_mean=False, with_std=True) ),
#('print2',Printer(1)),
])),
],
# weight components in FeatureUnion
transformer_weights={
'subject': 1,
'body_bow': 1,
'body_stats': .1,
},
)),
#('print',Printer(1)),
# Use a SVC classifier on the combined features
#('svc', SVC(kernel='linear')),
('sgdc', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42, max_iter=5 )),
])
test_pipeline(pipeline, verbose=False, name='metadata')
In [76]:
from scipy.stats import expon as sp_expon
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
In [77]:
r = sp_uniform(loc=5,scale=2).rvs(size=1000*1000)
fig, ax = plt.subplots(1, 1, figsize=(12, 5))
ax.hist(r, bins=100)
plt.show()
In [78]:
def geometric_sample(power_min, power_max, sample_size):
dist = sp_uniform(loc=power_min, scale=power_max-power_min)
return np.power(10, dist.rvs(size=sample_size))
geometric_sample(1,6,50)
Out[78]:
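geometric_sample draws exponents uniformly and raises 10 to them, i.e. a log-uniform sample over [10^power_min, 10^power_max]. Newer SciPy releases (1.4 and later) ship this distribution directly as scipy.stats.loguniform, which RandomizedSearchCV can also sample from lazily; a rough equivalent of the call above:
from scipy.stats import loguniform   # requires SciPy >= 1.4
loguniform(1e1, 1e6).rvs(size=50, random_state=42)   # roughly geometric_sample(1, 6, 50)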
In [79]:
from sklearn.model_selection import RandomizedSearchCV
pipeline = Pipeline([('cvect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('sgdc', SGDClassifier( random_state=42 )),
])
#ngram_range=(1,2), max_df = 0.8, min_df=2
param_dist = {"cvect__stop_words": [None,'english'],
"cvect__ngram_range": [(1,1),(1,2)],
"cvect__min_df": sp_randint(1, 6),
"cvect__max_df": sp_uniform(loc=0.5, scale=0.5), # range is (loc, loc+scale)
"tfidf__sublinear_tf": [True,False],
"tfidf__norm": [None, 'l1', 'l2'],
"sgdc__max_iter": sp_randint(5, 40),
"sgdc__loss": ['hinge','log'],
"sgdc__alpha": geometric_sample(-8,-3,10000),
}
# n_iter - number of random models to evaluate
# n_jobs = -1 to run in parallel on all cores
# cv = 3 , 3-fold cross validation
# scoring='f1_macro' , averages the F1 for each target class
rs = RandomizedSearchCV(pipeline, param_distributions=param_dist,
n_iter=5, n_jobs=-1, cv=3, return_train_score=False,
verbose=1, scoring='f1_macro', random_state=42)
test_pipeline(rs, verbose=False, name='Random Parameter Search')
Audio(url='./Beep 2.wav', autoplay=True)
Out[79]:
In [80]:
#pd.get_option("display.max_columns")
pd.set_option("display.max_columns", 40)
header('Best')
display( pd.DataFrame.from_dict(rs.best_params_, orient= 'index') )
header('All Results')
df = pd.DataFrame(rs.cv_results_)
df = df.sort_values(['rank_test_score'])
display(df)
In [81]:
df = df.apply(pd.to_numeric, errors='ignore')
prefix = 'param_'
param_col = [col for col in df.columns if col.startswith(prefix) ]
for col in param_col:
name = col[len(prefix):]
header(name)
if(df[col].dtype == np.float64 or df[col].dtype == np.int64):
print( 'scatter')
df.plot(kind='scatter', x=col, y='mean_test_score', figsize=(15,10))
plt.show()
else:
mean = df[[col,'mean_test_score']].fillna(value='None').groupby(col).mean()
mean.plot(kind='bar', figsize=(10,10))
plt.show()
In [82]:
tests_df = pd.DataFrame.from_dict(tests, orient='index')
tests_df = tests_df.rename(columns={'Time': 'Time (sec.)'})
tests_df = tests_df[['F1', 'Accuracy', 'Time (sec.)', 'Details']]  # select/reorder columns by name (drops 'Name')
tests_df = tests_df.sort_values(by=['F1'], ascending=False)
display(tests_df)
header('Best Model')
display(tests_df.head(1))
print(tests_df['Details'].values[0])
In [84]:
plt.figure(figsize=(13,5))
tests_df['F1'].plot(kind='bar', ylim=(0.6,None))
Audio(url='./Beep 2.wav', autoplay=True)
Out[84]:
In [ ]: