In [1]:
using JSON, ProgressMeter, JLD, LightGraphs

Community detection on a Twitter network using non-negative matrix factorization and graph regularization.

This implements the DualNMF algorithm from the paper "Community Detection in Political Twitter Networks using Nonnegative Matrix Factorization Methods".

Prerequisites:

  • the retweet graph as a LightGraphs object
  • the user/word matrix

In [18]:
user_word = JLD.load("/media/henripal/hd1/data/new_mat.jld", "new_mat");
graph = JLD.load("/media/henripal/hd1/data/graph.jld", "graph");

In [19]:
size(user_word,2)


Out[19]:
5132

Building a word/word similarity matrix from the user/word matrix

We iterate over columns because Julia stores sparse matrices column by column (CSC), which makes column access faster and more memory-friendly:


In [20]:
function sparse_similarity(m::SparseMatrixCSC)::SparseMatrixCSC
    normalized_user_word = spzeros(size(m)...)
    norms = [norm(m[:,i]) for i in 1:size(m,2)]
    @showprogress for (col,s) in enumerate(norms)
        s == 0 && continue # skip all-zero columns: their norm is zero, so they cannot be normalized
        normalized_user_word[:,col] = m[:,col]/s
    end
    return normalized_user_word' * normalized_user_word
end


# this builds a LightGraphs graph from a similarity matrix
function build_graph_from_similarity(similarity::Matrix, cutoff::Float64)::Graph
    n = size(similarity, 1)  # named n to avoid shadowing Base.length
    graph = Graph(n)
    for i in 1:n
        for j in 1:i-1
            # connect two words when their similarity exceeds the cutoff
            similarity[i, j] > cutoff && add_edge!(graph, i, j)
        end
    end
    graph
end


Out[20]:
sparse_similarity (generic function with 1 method)
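
The similarity built above is a cosine similarity between word columns: each column is scaled to unit Euclidean norm, and the product of the normalized matrix with itself then holds the pairwise cosines. A minimal sketch on a made-up 3×3 matrix follows. It is written for Julia 1.x (the notebook itself runs on an older release), and the names `m`, `scale` and `normalized` are illustrative only.

```julia
using LinearAlgebra, SparseArrays

# Toy user/word count matrix: 3 users (rows) x 3 words (columns).
m = sparse([1.0 0.0 2.0;
            1.0 0.0 2.0;
            0.0 3.0 0.0])

# Scale every nonzero column to unit Euclidean norm; a zero column stays zero.
col_norms = [norm(m[:, j]) for j in 1:size(m, 2)]
scale = [s == 0 ? 0.0 : 1 / s for s in col_norms]
normalized = m * Diagonal(scale)

# Entry (i, j) is the cosine similarity between word i and word j.
similarity = normalized' * normalized
```

Words 1 and 3 are used by the same users in the same proportions, so their similarity is 1; word 2 shares no users with either, so its similarity to both is 0.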

In [21]:
# this successively builds the similarity matrix, then
# builds the graph from the similarity matrix

cutoff = .4
similarity = sparse_similarity(user_word)
word_graph = build_graph_from_similarity(full(similarity), cutoff)


Progress: 100%|█████████████████████████████████████████| Time: 0:04:16
Out[21]:
5132×5132 sparse matrix with 24981772 Float64 nonzero entries:
	[1   ,    1]  =  1.0
	[2   ,    1]  =  0.104883
	[3   ,    1]  =  0.0077162
	[4   ,    1]  =  0.134559
	[5   ,    1]  =  0.0689679
	[6   ,    1]  =  0.0223928
	[7   ,    1]  =  0.00913363
	[8   ,    1]  =  0.146813
	[9   ,    1]  =  0.0788164
	[10  ,    1]  =  0.116557
	⋮
	[5122, 5132]  =  0.120978
	[5123, 5132]  =  0.177761
	[5124, 5132]  =  0.0172602
	[5125, 5132]  =  0.266282
	[5126, 5132]  =  0.195395
	[5127, 5132]  =  0.0338713
	[5128, 5132]  =  0.138883
	[5129, 5132]  =  0.127958
	[5130, 5132]  =  0.300903
	[5131, 5132]  =  0.229402
	[5132, 5132]  =  1.0

Algorithm

  • calculate the Laplacian matrices
  • define the update functions
  • define the cost function
  • iterate the update functions

Some remarks: the cost function is quite expensive, so we do not calculate it on every iteration. It might be useful to calculate it adaptively.

Some of the helper functions look a little strange; this is because the matrices are huge, some are not sparse, and memory usage can get out of control. Hence the column iterations, the precalculations, etc.
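
For reference, the objective and the multiplicative updates can be written out as below. This is transcribed from the code cells of this notebook rather than from the paper, and the symbols are notation assumed here: $X$ is the user/word matrix `user_word`, $L_c$ is the Laplacian of the retweet graph, $L_w$ is the Laplacian of the word graph, and $\odot$, the fraction bar and the square root all act elementwise.

```latex
J(U, W) = \lVert X - U W^{\top} \rVert_F^{2}
          + \alpha \, \operatorname{tr}\!\left(U^{\top} L_c U\right)
          + \beta \, \operatorname{tr}\!\left(W^{\top} L_w W\right)

L^{+} = \tfrac{1}{2}\left(|L| + L\right), \qquad
L^{-} = \tfrac{1}{2}\left(|L| - L\right)

U \leftarrow U \odot \sqrt{\frac{X W + \alpha L_c^{-} U}{U W^{\top} W + \alpha L_c^{+} U}}
\qquad
W \leftarrow W \odot \sqrt{\frac{X^{\top} U + \beta L_w^{-} W}{W U^{\top} U + \beta L_w^{+} W}}
```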


In [22]:
# calculating the Laplacian matrices and their positive and negative parts:
L_c = laplacian_matrix(graph)
word_laplacian = laplacian_matrix(word_graph)
wl_plus = (abs(word_laplacian)+word_laplacian)/2
wl_minus = (abs(word_laplacian)-word_laplacian)/2
gl_plus = (abs(L_c)+L_c)/2
gl_minus = (abs(L_c)-L_c)/2


Out[22]:
1205559×1205559 sparse matrix with 7484416 Int64 nonzero entries:
	[1      ,       1]  =  8
	[2      ,       1]  =  -1
	[3      ,       1]  =  -1
	[4      ,       1]  =  -1
	[5      ,       1]  =  -1
	[6      ,       1]  =  -1
	[7      ,       1]  =  -1
	[8      ,       1]  =  -1
	[9      ,       1]  =  -1
	[1      ,       2]  =  -1
	⋮
	[1205555, 1205555]  =  2
	[234    , 1205556]  =  -1
	[235    , 1205556]  =  -1
	[1205556, 1205556]  =  2
	[11575  , 1205557]  =  -1
	[1205557, 1205557]  =  1
	[94     , 1205558]  =  -1
	[1205558, 1205558]  =  2
	[1205559, 1205558]  =  -1
	[1205558, 1205559]  =  -1
	[1205559, 1205559]  =  1
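
To see what the plus/minus split does, here is a small sketch on a hand-written Laplacian of a three-node path graph. It is written for Julia 1.x, where elementwise `abs` needs the dot syntax, and the matrix is an invented example rather than data from this notebook.

```julia
using SparseArrays

# Laplacian of the path graph 1 - 2 - 3, written out by hand (L = D - A).
L = sparse([ 1 -1  0;
            -1  2 -1;
             0 -1  1])

# Positive part: keeps the positive entries of L (here the degrees on the diagonal).
L_plus = (abs.(L) + L) / 2

# Negative part: keeps the magnitudes of the negative entries (here the adjacency matrix).
L_minus = (abs.(L) - L) / 2
```

Both parts are non-negative and `L_plus - L_minus` recovers `L`, which is what lets the update rules keep every entry of `U` and `W` non-negative.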

In [28]:
# parameters and initializing W and U
clusters = 60
α = 10
β = 10
users = size(L_c,1)
words = size(word_laplacian ,1)
W = .5 * spones(sprand(words, clusters, .5))
U = .5 * spones(sprand(users, clusters, .5))


Out[28]:
1205559×60 sparse matrix with 36172295 Float64 nonzero entries:
	[2      ,       1]  =  0.5
	[4      ,       1]  =  0.5
	[6      ,       1]  =  0.5
	[9      ,       1]  =  0.5
	[10     ,       1]  =  0.5
	[13     ,       1]  =  0.5
	[14     ,       1]  =  0.5
	[15     ,       1]  =  0.5
	[16     ,       1]  =  0.5
	[19     ,       1]  =  0.5
	⋮
	[1205542,      60]  =  0.5
	[1205547,      60]  =  0.5
	[1205548,      60]  =  0.5
	[1205549,      60]  =  0.5
	[1205550,      60]  =  0.5
	[1205551,      60]  =  0.5
	[1205552,      60]  =  0.5
	[1205553,      60]  =  0.5
	[1205554,      60]  =  0.5
	[1205558,      60]  =  0.5
	[1205559,      60]  =  0.5

In [30]:
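# memory friendly multiplicative update functions
# (they read the globals user_word, α, β, gl_plus, gl_minus, wl_plus and wl_minus)

function update_U(U::SparseMatrixCSC, W::SparseMatrixCSC)::SparseMatrixCSC
    WpW = W' * W  # small clusters×clusters matrix, computed once
    return U .* sqrt((user_word * W + α * gl_minus * U) ./ (U * WpW + α * gl_plus * U))
end

function update_W(U::SparseMatrixCSC, W::SparseMatrixCSC)::SparseMatrixCSC
    UpU = U' * U  # small clusters×clusters matrix, computed once
    return W .* sqrt((user_word' * U + β * wl_minus * W) ./ (W * UpU + β * wl_plus * W))
end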


Out[30]:
update_U (generic function with 1 method)
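
The same update rules can be tried on a tiny dense problem to see their shape. The sketch below is written for Julia 1.x and uses invented data (4 users, 3 words, 2 clusters). The function `toy_dual_nmf`, the helpers `pos_part` and `neg_part`, and the small constant `ϵ` added to the denominators are assumptions of this sketch and do not appear in the notebook's own code.

```julia
# Toy data (hypothetical): 4 users, 3 words, 2 clusters.
X  = [1.0 0.0 2.0;
      1.0 0.0 1.0;
      0.0 3.0 0.0;
      0.0 2.0 1.0]               # user/word counts
Lu = [ 1.0 -1.0  0.0  0.0;
      -1.0  1.0  0.0  0.0;
       0.0  0.0  1.0 -1.0;
       0.0  0.0 -1.0  1.0]       # user graph with edges 1-2 and 3-4
Lw = [ 1.0  0.0 -1.0;
       0.0  0.0  0.0;
      -1.0  0.0  1.0]            # word graph with edge 1-3

pos_part(L) = (abs.(L) + L) / 2
neg_part(L) = (abs.(L) - L) / 2

function toy_dual_nmf(X, Lu, Lw; α = 1.0, β = 1.0, iters = 50, ϵ = 1e-12)
    # Fixed, strictly positive starting point so the run is reproducible.
    U = [0.9 0.1; 0.8 0.2; 0.1 0.9; 0.2 0.8]
    W = [0.9 0.1; 0.1 0.9; 0.8 0.2]
    for _ in 1:iters
        # ϵ guards against division by zero; it is an addition of this sketch.
        U = U .* sqrt.((X * W + α * neg_part(Lu) * U) ./
                       (U * (W' * W) + α * pos_part(Lu) * U .+ ϵ))
        W = W .* sqrt.((X' * U + β * neg_part(Lw) * W) ./
                       (W * (U' * U) + β * pos_part(Lw) * W .+ ϵ))
    end
    return U, W
end

U_toy, W_toy = toy_dual_nmf(X, Lu, Lw)
```

Because every factor in the update is non-negative, the entries of `U_toy` and `W_toy` stay non-negative throughout, with no projection step needed.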

In [32]:
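# memory friendly Frobenius norms and objective functions

# squared Frobenius norm of uw - U * W', accumulated column by column
function my_frobenius(uw::SparseMatrixCSC, U::SparseMatrixCSC, W::SparseMatrixCSC)::Float64
    words = size(uw, 2)
    wp = W'
    result = 0.0  # Float64 accumulator keeps the loop type-stable
    @showprogress for j in 1:words
        # column j of U * W', computed without forming the full product
        uwp_j = U*wp[:, j]
        result += norm(uw[:, j] - uwp_j)^2
    end
    result
end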

function obj(U::SparseMatrixCSC, W::SparseMatrixCSC)::Float64
    my_frobenius(user_word, U, W) + α * trace(U' * L_c * U) + β * trace(W' * word_laplacian * W)
end


Out[32]:
obj (generic function with 1 method)
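
The column-by-column accumulation in `my_frobenius` gives the same number as the squared Frobenius norm of the full residual; it only avoids materializing the dense users×words product. A small check on invented dense matrices, written for Julia 1.x:

```julia
using LinearAlgebra

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3 users x 2 words
U = [1.0 0.0; 0.0 1.0; 1.0 1.0]   # 3 users x 2 clusters
W = [1.0 2.0; 0.5 1.0]            # 2 words x 2 clusters

# Row j of W is column j of W', so U * W[j, :] is column j of U * W'.
columnwise = sum(norm(X[:, j] - U * W[j, :])^2 for j in 1:size(X, 2))

# Direct computation; `norm` of a matrix is the Frobenius norm in Julia 1.x.
direct = norm(X - U * W')^2
```

The two values agree up to floating-point rounding, while the column-wise version never holds more than one column of the product in memory.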

This is where we run the algorithm. It is somewhat time-intensive, but not excessively so.


In [47]:
tolerance = .05
delta = 1000
stride = 10

err = obj(U, W)

while delta > tolerance
    for i in 1:stride
        U = update_U(U, W);
        W = update_W(U, W);
    end
    newerr = obj(U,W)
    delta = abs(newerr - err)
    err = newerr
end


Progress: 100%|█████████████████████████████████████████| Time: 0:09:33
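
One possible variation on the loop above is to stop on a relative change in the objective and to cap the number of rounds, so that the stopping rule does not depend on the scale of the cost. The helper below is a hypothetical sketch for Julia 1.x; `iterate_until_converged`, `rtol` and `max_rounds` are names invented here, and the toy `step` and `cost` functions merely stand in for the update functions and `obj`.

```julia
# Hypothetical helper: `step` maps a state to the next state, `cost` scores a state.
function iterate_until_converged(step, cost, state; stride = 10, rtol = 1e-3, max_rounds = 100)
    err = cost(state)
    rounds = 0
    while rounds < max_rounds
        # Run `stride` cheap updates between two expensive cost evaluations.
        for _ in 1:stride
            state = step(state)
        end
        rounds += 1
        newerr = cost(state)
        converged = abs(newerr - err) <= rtol * max(abs(err), eps())
        err = newerr
        converged && break
    end
    return state, err, rounds
end

# Toy usage: halving x drives the cost 1 + x^2 towards 1.
state, err, rounds = iterate_until_converged(x -> x / 2, x -> 1 + x^2, 1.0; stride = 1)
```

In the notebook's setting, `state` would be the pair `(U, W)`, `step` one pass of `update_U` followed by `update_W`, and `cost` the `obj` function.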

In [49]:
JLD.save("/media/henripal/hd1/data/U_60.jld", "U_60", U)

In [51]:
# another helper function: assigns each user to the community with the highest weight (the largest entry in that user's row of U)

function assign_communities(u::SparseMatrixCSC)
    (n_user, n_cluster) = size(u)
    communities = Array{Int64,1}(n_user)
    @showprogress for user in 1:n_user
        communities[user] = indmax(u[user, :])
    end
    communities
end


Out[51]:
assign_communities (generic function with 1 method)
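
The assignment rule is a row-wise argmax. A small sketch on an invented 3×3 matrix, written for Julia 1.x where `indmax` is called `argmax`:

```julia
# Rows are users, columns are communities; entries are made-up weights.
U_toy = [0.1 0.7 0.2;
         0.6 0.3 0.1;
         0.0 0.2 0.8]

# Each user is assigned to the column holding the largest entry of its row.
communities = [argmax(U_toy[user, :]) for user in 1:size(U_toy, 1)]
# communities == [2, 1, 3]
```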

In [52]:
comm = assign_communities(U)


Progress: 100%|█████████████████████████████████████████| Time: 0:00:10
Out[52]:
1205559-element Array{Int64,1}:
 19
 27
 31
  7
 19
 19
 59
 22
 18
 31
 31
 25
 44
  ⋮
 41
 25
 41
 21
 51
 57
 10
 17
 31
  9
  2
 30

In [53]:
using Plots


WARNING: using Plots.density in module Main conflicts with an existing identifier.
WARNING: using Plots.translate in module Main conflicts with an existing identifier.
WARNING: using Plots.center in module Main conflicts with an existing identifier.

In [55]:
histogram(comm, nbins = 60)


Out[55]:
[Plot: histogram of the community assignments in comm, 60 bins; x-axis ticks from 0 to 50, y-axis ticks from 0 to 150000]

Some post-processing to visualize the data using projector, restricted to the 10k largest accounts.

This part is still a rough draft.


In [10]:
U_60 = JLD.load("/media/henripal/hd1/data/U_60.jld", "U_60")


Out[10]:
1205559×60 sparse matrix with 35334538 Float64 nonzero entries:
	[2      ,       1]  =  0.0495808
	[6      ,       1]  =  0.00918191
	[9      ,       1]  =  0.123767
	[10     ,       1]  =  0.000630212
	[13     ,       1]  =  0.00385952
	[14     ,       1]  =  0.00347653
	[15     ,       1]  =  0.00244331
	[16     ,       1]  =  0.000328876
	[19     ,       1]  =  0.00363582
	[20     ,       1]  =  0.00131454
	⋮
	[1205542,      60]  =  8.94298e-39
	[1205547,      60]  =  5.19441e-202
	[1205548,      60]  =  2.42452e-200
	[1205549,      60]  =  7.16569e-5
	[1205550,      60]  =  7.76278e-30
	[1205551,      60]  =  0.0097963
	[1205552,      60]  =  5.43669e-8
	[1205553,      60]  =  1.072e-5
	[1205554,      60]  =  6.15074e-75
	[1205558,      60]  =  0.000951541
	[1205559,      60]  =  0.000741547

In [12]:
using DataFrames

In [25]:
name_followers = readtable("/media/henripal/hd1/data/name_to_follower.csv", header = false);

In [26]:
rename!(name_followers,:x1,:name)
rename!(name_followers,:x2, :followers)


Out[26]:
namefollowers
1GavaironJ5
2bocchijoto1834
3cannabinolsen1
4angelman6132
5alex_latrice21199
6turnipkween242
7EveMorante747
8mwutley113
9LetsCllnk59
10positivelytaco173
11SachaStein171
12andino__20155
13Bonduran1598
14pretocaetano259
15TheLos967
16LaylaGerhart4
17stolethetart34
18aryalptara450
19asvpxstephaniex107
20Doreen5884
21ivysharIey274
22YSemerel8
23LVIaLondres549
24monsterfromars856
25SassyBroncoFan11
26karururo74
27thaecn11
28Raulggrc641
29ZireLLi_B285
30tamtinke2114
⋮

In [32]:
sort!(name_followers, cols= :followers, rev = true);

In [35]:
name_followers = name_followers[1:10000,:]


Out[35]:
     name              followers
1    MileyCyrus        31598990
2    TheEconomist      18303980
3    POTUS             14277895
4    funnyordie        13936034
5    TIME              12715214
6    ArvindKejriwal    10276363
7    SarahKSilverman   9776677
8    jk_rowling        9056612
9    HuffingtonPost    8868956
10   people            7546839
11   lemondefr         6668668
12   NPR               6584109
13   PerezHilton       6560967
14   guardian          6210222
15   EW                6031204
16   lilyallen         5905870
17   piersmorgan       5433143
18   RedHourBen        5225214
19   htTweets          4999547
20   TheFunnyTeens     4590964
21   billboard         4434723
22   dumbassgenius     4383367
23   hitRECordJoe      4164075
24   todonoticias      4037759
25   DannyDeVito       3967433
26   jack              3960340
27   SkyNews           3762095
28   MMFlint           3736617
29   IndiaToday        3687969
30   BritishVogue      3472942
⋮    ⋮                 ⋮

In [36]:
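# name_to_index is assumed to be defined elsewhere; it maps a screen name to that user's row index in U_60
name_followers[:ind] = [name_to_index[n] for n in name_followers[:name]]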


Out[36]:
10000-element Array{Int64,1}:
 1105228
  400654
  430428
  519328
   18442
 1024618
 1058073
  701067
  521599
  441624
  928554
   21126
  883751
       ⋮
 1056759
  389300
  861936
 1060834
  628955
  783077
  146505
  687415
  538132
  373521
  903695
  727673

In [38]:
user_vectors = Array{Float64,2}(10000, 60)


Out[38]:
10000×60 Array{Float64,2}:
 5.31633e-318  5.26693e-318  5.21753e-318  …  4.09076e-319  3.47501e-319
 5.31633e-318  5.26692e-318  5.21753e-318     4.09081e-319  3.47506e-319
 5.31632e-318  5.26692e-318  5.21751e-318     4.09086e-319  3.47511e-319
 5.31632e-318  5.26691e-318  5.21751e-318     4.09091e-319  3.47516e-319
 5.31631e-318  5.2669e-318   5.21748e-318     4.09096e-319  3.47521e-319
 5.31631e-318  5.26691e-318  5.21749e-318  …  4.09101e-319  3.47526e-319
 5.3163e-318   5.26689e-318  5.21749e-318     4.09106e-319  3.47531e-319
 5.3163e-318   5.2669e-318   5.2175e-318      4.09111e-319  3.47536e-319
 5.31629e-318  5.26689e-318  5.2175e-318      4.09116e-319  3.47541e-319
 5.31629e-318  5.26688e-318  5.21747e-318     4.09121e-319  3.47546e-319
 5.31628e-318  5.26688e-318  5.21747e-318  …  4.09126e-319  3.4755e-319 
 5.31628e-318  5.26687e-318  5.21748e-318     4.09131e-319  3.47555e-319
 5.31627e-318  5.26687e-318  5.21745e-318     4.09136e-319  3.4756e-319 
 ⋮                                         ⋱                            
 5.26699e-318  5.21759e-318  5.16818e-318     3.47442e-319  2.59236e-319
 5.26698e-318  5.21759e-318  5.16814e-318     3.47447e-319  2.59241e-319
 5.26698e-318  5.21757e-318  5.16814e-318  …  3.47452e-319  2.59246e-319
 5.26697e-318  5.21757e-318  5.16815e-318     3.47457e-319  2.59251e-319
 5.26696e-318  5.21756e-318  5.16815e-318     3.47462e-319  2.59256e-319
 5.26697e-318  5.21756e-318  5.16816e-318     3.47466e-319  2.59261e-319
 5.26696e-318  5.21755e-318  5.16816e-318     3.47471e-319  2.59266e-319
 5.26695e-318  5.21755e-318  5.16817e-318  …  3.47476e-319  2.59271e-319
 5.26695e-318  5.21754e-318  5.16812e-318     3.47481e-319  2.59276e-319
 5.26694e-318  5.21754e-318  5.16812e-318     3.47486e-319  2.59281e-319
 5.26694e-318  5.21752e-318  5.16813e-318     3.47491e-319  2.59286e-319
 5.26693e-318  5.21752e-318  5.16813e-318     3.47496e-319  2.59291e-319

In [41]:
for i in 1:10000, j in 1:60
    user_vectors[i, j] = U_60[name_followers[:ind][i], j]
end
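
The element-by-element copy above can also be expressed as a single row selection followed by a conversion to a dense matrix. A sketch on an invented sparse matrix, written for Julia 1.x (where `Matrix(...)` replaces the older `full(...)`); `U_toy` and `inds` are placeholder names standing in for `U_60` and the `ind` column:

```julia
using SparseArrays

# 4 users x 2 clusters, stored sparsely like U_60.
U_toy = sparse([0.0 0.5;
                0.2 0.0;
                0.0 0.9;
                0.4 0.1])

# Row indices of the selected accounts, in the order they should appear.
inds = [3, 1]

# Select those rows and densify the (small) result in one step.
user_vectors_toy = Matrix(U_toy[inds, :])
```

Only the selected rows are densified, so the large sparse matrix itself is never converted.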

In [43]:
user_cluster = DataFrame(user_vectors)


Out[43]:
10000×60 DataFrame with columns x1 through x60 (one column per cluster); row display omitted

In [62]:
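# hard-assign each of rows 1:10000 to the factor (column) with the largest weight
class = [indmax(user_vectors[i, :]) for i in 1:10000];
# count how many rows fall into each of the 60 factors
class_freq = zeros(Int, 60)
for class_no in class
    class_freq[class_no] += 1
end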
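
The cell above does a hard assignment (row-wise argmax) followed by a frequency count. A minimal sketch of the same two steps on a toy matrix is shown below; `toy_vectors`, `assign_clusters`, and `cluster_sizes` are hypothetical names introduced for illustration and are not part of the notebook. It uses `findmax(...)[2]` rather than `indmax` so that it should also run on Julia versions where `indmax` is no longer available.

```julia
# Toy stand-in for `user_vectors`: 5 users (rows) by 3 factors (columns).
toy_vectors = [0.1 0.7 0.2;
               0.0 0.0 0.9;
               0.5 0.4 0.1;
               0.2 0.6 0.2;
               0.0 0.1 0.8]

# Hard assignment: index of the largest weight in each row.
# findmax returns (value, index); we keep only the index.
function assign_clusters(vectors)
    return [findmax(vectors[i, :])[2] for i in 1:size(vectors, 1)]
end

# Count how many rows land in each of the k factors.
function cluster_sizes(labels, k)
    sizes = zeros(Int, k)
    for label in labels
        sizes[label] += 1
    end
    return sizes
end

toy_labels = assign_clusters(toy_vectors)                    # [2, 3, 1, 2, 3]
toy_sizes = cluster_sizes(toy_labels, size(toy_vectors, 2))  # [1, 2, 2]
```

Reading the dimensions from `size` instead of hard-coding them keeps the sketch valid if the number of rows or factors changes.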

In [74]:
# keep only the factors that were assigned more than 250 users
best_classes = [i for i in 1:60 if class_freq[i] > 250]


Out[74]:
10-element Array{Int64,1}:
  7
 10
 17
 21
 25
 29
 31
 34
 41
 44

In [78]:
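# re-assign each user among the retained factors only; the result is a
# position in best_classes (1 to 10), not one of the original 60 factor indices
new_class = [indmax(user_vectors[i, best_classes]) for i in 1:10000]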


Out[78]:
10000-element Array{Int64,1}:
  4
  1
 10
  9
  7
  3
  1
  9
  7
  7
  3
  7
  3
  ⋮
  7
  7
  5
  7
  9
  7
  5
  5
  8
  2
  6
  2
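
Note that the labels in `Out[78]` are positions within `best_classes`, so a label of 4 refers to `best_classes[4]`, which is factor 21 in `Out[74]`. The sketch below, on hypothetical toy data (`toy_weights`, `kept`, and `assign_within` are illustrative names, not from the notebook), shows how such positions could be mapped back to the original column numbers. It also shows that a row whose overall largest weight sits in a dropped column is reassigned to the best remaining one.

```julia
# Hypothetical toy data: 4 users (rows) by 5 factors (columns).
toy_weights = [0.9 0.1 0.3 0.0 0.2;
               0.0 0.2 0.1 0.0 0.6;
               0.8 0.0 0.5 0.0 0.1;
               0.0 0.0 0.4 0.3 0.0]
kept = [3, 5]  # stand-in for `best_classes`

# Position of the largest weight among the kept columns only.
function assign_within(vectors, columns)
    return [findmax(vectors[i, columns])[2] for i in 1:size(vectors, 1)]
end

positions = assign_within(toy_weights, kept)  # [1, 2, 1, 1], indexes into `kept`
original_ids = kept[positions]                # [3, 5, 3, 3], original column numbers
```

Row 1 has its largest weight in column 1, which is not in `kept`, so it ends up in column 3 instead.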

In [79]:
# attach the labels to the name_followers table as strings
name_followers[:class] = string.(new_class)


Out[79]:
10000-element Array{String,1}:
 "4" 
 "1" 
 "10"
 "9" 
 "7" 
 "3" 
 "1" 
 "9" 
 "7" 
 "7" 
 "3" 
 "7" 
 "3" 
 ⋮   
 "7" 
 "7" 
 "5" 
 "7" 
 "9" 
 "7" 
 "5" 
 "5" 
 "8" 
 "2" 
 "6" 
 "2" 

In [80]:
# export name, class label and follower count per user as a TSV file
writetable("/media/henripal/hd1/data/attributes.tsv", name_followers[:, [:name, :class, :followers]])

In [53]:
# export the user/cluster vectors as a TSV file without a header row
writetable("/media/henripal/hd1/data/vectors.tsv", user_cluster, header = false)

In [20]:
df


Out[20]:
Row  x1                     x2                    x3                      …  x60                     name
1    0.0                    0.0                   0.0                     …  0.025225820001461275    Deborah87958167
2    0.049580753988165444   0.2508836444918074    0.0                     …  0.0                     texasfarmgirl1836
3    0.0                    8.713427609010385e-9  0.0                     …  0.0                     Squatch
4    0.0                    0.0                   0.0                     …  0.00033363499990994886  Lu Who
5    0.0                    0.0                   2.978697870558589e-20   …  0.0                     SongsOfLaredo
6    0.009181913913791745   0.0                   1.1601476735579242e-5   …  0.006842609054683921    Diva
7    0.0                    0.0                   2.3967158997252667e-60  …  0.0                     Bishop Talbert Swan
8    0.0                    0.04518022359918355   9.901165630897684e-10   …  0.0                     NadelParis
9    0.12376719906865302    0.0                   0.0                     …  0.06096894985040057     Buster Brown
10   0.0006302122970406467  0.0                   0.0                     …  1.144927345117208e-23   AdolescentIdle
