0. Load Dataset
Warning: downloading dataset treclegal09_2k_subset (2.8 MB) !
File ../treclegal09_2k_subset.tar.gz downloaded!
Archive extracted!
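The download step above ends by unpacking a `.tar.gz` archive. Below is a minimal standard-library sketch of that unpacking step. It assumes the archive is already on disk. The helper name `extract_archive` is hypothetical, because the log shows neither the loader's code nor the download URL.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its member names.

    Hypothetical helper that mirrors the "Archive extracted!" step only;
    fetching the archive is assumed to have happened already.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members whose resolved path would land outside dest.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    print("Archive extracted!")
    return [m.name for m in members]
```

For example, `extract_archive("../treclegal09_2k_subset.tar.gz", "..")` would unpack the downloaded file next to itself. The explicit path check matters because, without an extraction filter, `tarfile` in older Python versions will write members named like `../x` outside the destination.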
1. Feature extraction (non hashed)
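"Non hashed" presumably means that every token gets its own column through an explicit vocabulary, rather than being folded into a fixed number of columns by the hashing trick. The following pure-Python TF-IDF sketch illustrates that style. Several choices are assumptions and were not read from this log: the simple tokenizer, the smoothed idf formula (the default in scikit-learn's `TfidfVectorizer`), and L2 row normalisation. `tokenize`, `fit_vocabulary` and `tfidf` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())


def fit_vocabulary(docs):
    """Map every distinct token to its own column index (no hashing, no collisions)."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def tfidf(docs, vocab):
    """Return one L2-normalised dense TF-IDF row per document.

    Uses the smoothed idf = ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    counts = [Counter(t for t in tokenize(d) if t in vocab) for d in docs]
    df = Counter(tok for c in counts for tok in c)  # document frequency per term
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for c in counts:
        row = [0.0] * len(vocab)
        for tok, tf in c.items():
            row[vocab[tok]] = tf * idf[tok]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in row])
    return rows
```

An explicit vocabulary keeps every column interpretable. That is presumably what lets the clustering steps print term names such as `enron` or `smtp` as cluster labels. The cost is that the vocabulary must be stored and grows with the corpus.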
2. Document Clustering (LSI + K-Means)
.. computed in 2.9s
   N_documents  cluster_names
9          520  [alias, enron, norm, ect, calo, changed]
2          405  [ect, hou, enron, group, recipients, administr...
3          330  [teneo, recipients, group, administrative, tes...
0          329  [enron, energy, company, trade, services, master]
8          298  [recipients, group, administrative, test, ect,...
1          160  [alias, enron, ect, norm, test, group]
5          129  [shall, party, agreement, transaction, period,...
7          121  [alias, enron, norm, ect, ena, corp]
6          113  [enron_development, ect, hou, enron, shackleto...
4           61  [rewrite, server, address, smtp, mail, virtual]
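Each row of the K-Means table gives a cluster id, its size, and the terms that best describe it. The sketch below shows one way to produce such a table: Lloyd's k-means, followed by naming each cluster after the heaviest terms in its centroid. It simplifies under stated assumptions. It clusters TF-IDF vectors directly and skips the LSI (truncated SVD) projection used in the real run. It seeds centroids with a deterministic farthest-point rule. Its labeling rule is also an assumption, not necessarily the method the original pipeline uses. `kmeans`, `cluster_names` and `summary` are hypothetical names.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding.

    Returns (labels, centroids).
    """
    centroids = [list(X[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(X, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in X]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def cluster_names(centroids, vocab, n_top=6):
    """Label each cluster with the n_top terms that weigh most in its centroid."""
    terms = sorted(vocab, key=vocab.get)  # column index -> term
    return [
        [terms[i] for i in sorted(range(len(c)), key=lambda i: -c[i])[:n_top]]
        for c in centroids
    ]


def summary(labels, names):
    """(cluster_id, N_documents, cluster_names) rows, largest cluster first."""
    sizes = Counter(labels)
    return sorted(
        ((cid, n, names[cid]) for cid, n in sizes.items()),
        key=lambda row: -row[1],
    )
```

With LSI in the loop, centroids live in the reduced space. They would first have to be mapped back to term space, for example through the SVD components, before the top terms can be read off.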
3. Document Clustering (LSI + Ward Hierarchical Clustering)
.. computed in 3.9s
   N_documents  cluster_names
0          545  [alias, enron, norm, ect, calo, changed]
1          469  [enron, ect, energy, company, trading, norm]
2          433  [ect, hou, enron, group, recipients, administr...
7          321  [teneo, recipients, group, administrative, tes...
6          238  [recipients, group, administrative, test, ect,...
8          135  [alias, ect, enron, test, group, recipients]
9          119  [enron_development, ect, hou, enron, group, sh...
3          101  [shall, party, agreement, transaction, confirm...
5           64  [rewrite, server, address, smtp, mail, virtual]
4           41  [berkeley, haas, edu, inflation, alias, growth]
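Ward clustering builds its hierarchy bottom-up. At each step it merges the two clusters whose union increases the total within-cluster variance the least. The Ward cost of merging clusters A and B is |A||B|/(|A|+|B|) times the squared Euclidean distance between their centroids. Below is a naive pure-Python sketch. It assumes the number of clusters `k` is fixed in advance and skips the LSI projection used in the real run. `ward` and `ward_cost` are hypothetical names. The sketch is O(n³) and only meant for small inputs.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2


def ward(X, k):
    """Naive agglomerative clustering with Ward linkage, stopped at k clusters.

    Labels are numbered by each cluster's lowest document index.
    """
    clusters = [{"members": [i], "centroid": list(x)} for i, x in enumerate(X)]
    while len(clusters) > k:
        # Merge the pair whose union raises the total variance the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        clusters[i] = {
            "members": a["members"] + b["members"],
            # The size-weighted mean of two centroids is the centroid of their union.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(X)
    for label, c in enumerate(sorted(clusters, key=lambda c: min(c["members"]))):
        for m in c["members"]:
            labels[m] = label
    return labels
```

Unlike k-means, this procedure is deterministic up to ties. Stopping the merges at a different `k` simply cuts the same hierarchy at another level.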