A good description and comparison of different clustering algorithms can be found on the scikit-learn site: Clustering
Several libraries in different languages (R, Python, Julia) implement clustering algorithms. In Julia we mainly have:
In [1]:
using RDatasets
iris = dataset("datasets", "iris")
first(iris, 6)  # show the first rows; older DataFrames releases used head(iris)
Out[1]:
In [2]:
X = Matrix{Float64}(iris[:, 1:4])  # keep the four numeric columns, dropping Species
Out[2]:
In [6]:
using Distances
In [7]:
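# Euclidean distances between the columns of X (the four measured variables), giving a 4×4 matrix.
# Recent Distances.jl releases require the explicit `dims` keyword.
distancia = pairwise(Euclidean(), X, dims=2)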
Out[7]:
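To make explicit what the distance matrix above contains, here is a minimal sketch in plain Julia of Euclidean distances between the columns of a matrix. The helper name `pairwise_euclidean_cols` and the toy data are hypothetical, for illustration only; this is not how Distances.jl is implemented.

```julia
# Hypothetical helper: Euclidean distance between every pair of columns of M.
function pairwise_euclidean_cols(M::AbstractMatrix{<:Real})
    n = size(M, 2)
    D = zeros(Float64, n, n)          # the diagonal stays at zero
    for j in 1:n, i in (j + 1):n
        d = sqrt(sum(abs2, M[:, i] .- M[:, j]))
        D[i, j] = d
        D[j, i] = d                   # the matrix is symmetric
    end
    return D
end

# Toy data (made up): three points in the plane, one per column.
toy = [0.0 3.0 0.0;
       0.0 4.0 1.0]
D = pairwise_euclidean_cols(toy)
```

With a 150×4 matrix such as `X`, this column-wise convention yields a 4×4 matrix (distances between variables); passing the transpose yields a 150×150 matrix (distances between observations).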
In [8]:
using RCall
In [9]:
R"""
hc <- hclust(as.dist($distancia), method="complete")
"""
Out[9]:
In [10]:
R"""
plot(hc, labels=$(names(iris)[1:4]))
rect.hclust(hc, h=30, border="gray")
""";
In [8]:
R"""
hc <- hclust(as.dist($distancia), method="single")
plot(hc, labels=$(names(iris)[1:4]))
rect.hclust(hc, h=30, border="gray")
""";
In [11]:
# Transpose X so that each column is one flower: distances between the 150 observations.
distancia = pairwise(Euclidean(), X', dims=2)
Out[11]:
In [17]:
R"""
hcomplete <- hclust(as.dist($distancia), method="complete")
plot(hcomplete)
rect.hclust(hcomplete, h=1.0, border="gray")
""";
In [18]:
R"""
hsingle <- hclust(as.dist($distancia), method="single")
plot(hsingle)
rect.hclust(hsingle, h=1.0, border="gray")
""";
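The two dendrograms differ only in the linkage rule, that is, in how the distance between two clusters is derived from the distances between their members. A minimal sketch in plain Julia, assuming a precomputed distance matrix; the helper names and the toy positions are hypothetical:

```julia
# Hypothetical helpers: distance between clusters A and B (vectors of point indices)
# under each linkage rule, given a full distance matrix D.
single_linkage(D, A, B)   = minimum(D[a, b] for a in A, b in B)  # closest pair
complete_linkage(D, A, B) = maximum(D[a, b] for a in A, b in B)  # farthest pair

# Toy data (made up): four points on a line.
pos = [0.0, 1.0, 4.0, 6.0]
D = [abs(p - q) for p in pos, q in pos]

A = [1, 2]   # points at 0 and 1
B = [3, 4]   # points at 4 and 6

d_single   = single_linkage(D, A, B)    # |1 - 4| = 3
d_complete = complete_linkage(D, A, B)  # |0 - 6| = 6
```

Because complete linkage never reports a smaller distance than single linkage for the same pair of clusters, cutting both trees at the same height (`h=1.0` above) generally produces different partitions.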
In [19]:
hsingle = rcopy( R"cutree(hsingle, h=1.0)" )
Out[19]:
In [20]:
hcomplete = rcopy( R"cutree(hcomplete, h=1.0)" )
Out[20]:
In [21]:
using FreqTables
In [22]:
freqtable(hcomplete, hsingle)
Out[22]:
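The cross-tabulation above counts how many observations fall in each pair of clusters. A minimal sketch of the same idea in plain Julia, assuming labels are integers from 1 to k (as `cutree` returns); the function name `contingency` and the toy labels are hypothetical:

```julia
# Hypothetical helper: T[i, j] counts the items labelled i in `a` and j in `b`.
function contingency(a::AbstractVector{<:Integer}, b::AbstractVector{<:Integer})
    length(a) == length(b) || throw(ArgumentError("label vectors must have equal length"))
    T = zeros(Int, maximum(a), maximum(b))
    for (i, j) in zip(a, b)
        T[i, j] += 1
    end
    return T
end

# Toy labels (made up): five items, two partitions.
a = [1, 1, 2, 2, 3]
b = [1, 1, 1, 2, 2]
T = contingency(a, b)
```

If the two partitions agreed up to a relabelling, every row and every column of the table would contain a single nonzero entry.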
In [23]:
using Clustering
In [24]:
varinfo(maximum(hcomplete), hcomplete, maximum(hsingle), hsingle)
In [25]:
methods(varinfo)
Out[25]:
In [26]:
# Convert the labels returned by R to Int64 vectors before passing them to varinfo.
hcomplete = convert(Vector{Int64}, hcomplete)
hsingle = convert(Vector{Int64}, hsingle)
Out[26]:
In [27]:
varinfo(maximum(hcomplete), hcomplete, maximum(hsingle), hsingle)
Out[27]:
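The variation of information between two partitions A and B can be written as H(A) + H(B) - 2 I(A; B), which is zero exactly when the partitions coincide up to relabelling. The following is a minimal sketch in plain Julia, assuming integer labels from 1 to k and natural logarithms; the function name `variation_of_information` is hypothetical and this is not claimed to be the Clustering.jl implementation.

```julia
# Hypothetical helper: VI(a, b) = -sum_ij p_ij * (log(p_ij / p_i) + log(p_ij / q_j)),
# which equals H(a) + H(b) - 2 I(a; b).
function variation_of_information(a::AbstractVector{<:Integer}, b::AbstractVector{<:Integer})
    n = length(a)
    n == length(b) || throw(ArgumentError("label vectors must have equal length"))
    # Contingency table of the two labelings.
    T = zeros(Int, maximum(a), maximum(b))
    for (i, j) in zip(a, b)
        T[i, j] += 1
    end
    rows = sum(T, dims=2)   # cluster sizes in a
    cols = sum(T, dims=1)   # cluster sizes in b
    vi = 0.0
    for i in axes(T, 1), j in axes(T, 2)
        T[i, j] == 0 && continue          # empty cells contribute nothing
        p = T[i, j] / n
        vi -= p * (log(T[i, j] / rows[i]) + log(T[i, j] / cols[j]))
    end
    return vi
end

# Toy labels (made up).
vi_same  = variation_of_information([1, 1, 2, 2], [1, 1, 2, 2])  # identical partitions
vi_cross = variation_of_information([1, 1, 2, 2], [1, 2, 1, 2])  # independent partitions
```

For the two toy cases, identical partitions give 0 and the fully crossed partitions give 2 log 2, the sum of the two entropies, since their mutual information is zero.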
In [28]:
using Bootstrap
In [29]:
# Sanity check: the variation of information between a partition and itself should be zero.
( varinfo(maximum(hcomplete), hcomplete, maximum(hcomplete), hcomplete),
  varinfo(maximum(hsingle), hsingle, maximum(hsingle), hsingle) )
Out[29]:
In [30]:
asignaciones = hcat(hcomplete, hsingle)
Out[30]:
In [31]:
# Variation of information between the two partitions, restricted to the resampled rows.
function VI(indices)
    A = asignaciones[indices, 1]
    B = asignaciones[indices, 2]
    varinfo(maximum(A), A, maximum(B), B)
end
Out[31]:
In [36]:
VI_boot = bootstrap(collect(1:size(asignaciones,1)), VI, BasicSampling(10_000))
Out[36]:
In [39]:
ci(VI_boot, BCaConfInt(0.95))
Out[39]:
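The bootstrap above resamples row indices with replacement and recomputes the statistic on each resample. A minimal sketch of that idea in plain Julia follows. Note that it computes a simple percentile interval, not the BCa interval requested above, and that the function name `percentile_bootstrap`, its keyword arguments, and the toy data are all hypothetical.

```julia
using Random, Statistics

# Hypothetical helper: draw `nboot` resamples of the indices 1:n with replacement,
# evaluate `stat` on each one, and return the replicates with a percentile interval.
function percentile_bootstrap(stat, n::Integer, nboot::Integer;
                              level=0.95, rng=Random.default_rng())
    reps = [stat(rand(rng, 1:n, n)) for _ in 1:nboot]
    alpha = (1 - level) / 2
    return reps, (quantile(reps, alpha), quantile(reps, 1 - alpha))
end

# Toy data (made up): bootstrap the mean of a small sample.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
reps, interval = percentile_bootstrap(idx -> mean(x[idx]), length(x), 2_000;
                                      rng=MersenneTwister(1))
```

Passing a function of the indices, as `VI` does above, lets a single resample select the same rows from both columns of `asignaciones`, which keeps the two labelings paired.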
In [43]:
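using Plots, StatsPlots  # the StatPlots package was renamed to StatsPlots
pyplot(size=(300,300))
# Bootstrap distribution of the variation of information.
histogram(VI_boot.t1, linecolor=:grey, fillcolor=:grey, legend=false)
# Vertical lines at zero and at the value observed on the original data.
vline!([0, VI_boot.t0])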
Out[43]: