Based on the event list generated by the simulation, patients can be put into clusters.
There are five kinds of event in the simulation.
Build the clusters forward in time. Maintain a global list of cluster sets, a dictionary that records the index of each patient's cluster in that list, and a set holding the indices (into the global list) of the resistant clusters.
For event 1, 2, 3:
For event 4:
For event 5:
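The bookkeeping described above (a global cluster list, a patient-to-index dictionary and a set of resistant cluster indices) can be sketched as follows. This is only an illustration: the class name `ClusterBook` and its two operations are hypothetical, and how each of the five events maps onto them is not specified here.

```python
from typing import Dict, Hashable, List, Set


class ClusterBook:
    """Bookkeeping for clusters built while replaying events in time order."""

    def __init__(self) -> None:
        self.clusters: List[Set[Hashable]] = []    # global cluster set list
        self.cluster_of: Dict[Hashable, int] = {}  # patient -> index into `clusters`
        self.resistant: Set[int] = set()           # indices of resistant clusters

    def _detach(self, patient: Hashable) -> None:
        """Remove `patient` from its current cluster, if it has one."""
        old = self.cluster_of.pop(patient, None)
        if old is not None:
            self.clusters[old].discard(patient)

    def start_cluster(self, patient: Hashable, resistant: bool = False) -> int:
        """Open a new cluster that contains only `patient`."""
        self._detach(patient)
        index = len(self.clusters)
        self.clusters.append({patient})
        self.cluster_of[patient] = index
        if resistant:
            self.resistant.add(index)
        return index

    def join_cluster(self, patient: Hashable, source: Hashable) -> int:
        """Put `patient` into the cluster that already holds `source`."""
        self._detach(patient)
        index = self.cluster_of[source]
        self.clusters[index].add(patient)
        self.cluster_of[patient] = index
        return index


# Hypothetical replay: p0 founds a cluster, p1 joins it, p2 founds a resistant one.
book = ClusterBook()
book.start_cluster("p0")
book.join_cluster("p1", "p0")
book.start_cluster("p2", resistant=True)
```

Storing indices rather than set references keeps the dictionary and the resistant set cheap to update, because every lookup goes through the one global list.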
In reality, we have the patients' sequences, so we can cluster patients using the information in their sequences. The ground truths are:
From a patient's sequence, we can tell whether the patient is resistant, based on the resistance mutations given in the resi.vcf file. This means that resistant and unresistant patients are never in the same cluster.
In our ground-truth clusters, every cluster is the set of nodes of one transmission tree. This means that the sequence distances between nodes in the same cluster are relatively small. Given a sequence, we can find the most similar sequences from the distance profile and put them into one cluster.
Based on the simulation result, we can easily split all unremoved patients into resistant and unresistant patients.
For resistant patients, a distance profile is calculated. When both patients of a pair have resistance-related SNPs, the distance between them is the Hamming distance between their two sequences. Otherwise, their distance is infinite. Given a distance threshold, the patients can then be put into clusters.
For unresistant patients, the patients are put into clusters directly from a distance threshold.
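A minimal sketch of the distance profile described above. It assumes aligned sequences of equal length, and it reads "have resistance-related SNPs" as "share at least one resistance-related SNP"; that reading, the function names and the `(position, base)` encoding of an SNP are assumptions made for illustration.

```python
import math
from typing import FrozenSet, List, Optional, Sequence, Tuple

Snp = Tuple[int, str]  # hypothetical encoding: (position, alternative base)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_profile(
    sequences: Sequence[str],
    resistance_snps: Optional[Sequence[FrozenSet[Snp]]] = None,
) -> List[List[float]]:
    """Pairwise distance matrix.

    With `resistance_snps` (resistant patients), two patients that share no
    resistance-related SNP get an infinite distance. Without it (unresistant
    patients), every pair gets the plain Hamming distance.
    """
    n = len(sequences)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if resistance_snps is not None and not (resistance_snps[i] & resistance_snps[j]):
                d = math.inf
            else:
                d = float(hamming(sequences[i], sequences[j]))
            dist[i][j] = dist[j][i] = d
    return dist


seqs = ["ACGTAC", "ACGTTC", "TCGTAC"]
snps = [frozenset({(2, "G")}), frozenset({(2, "G")}), frozenset({(0, "T")})]
resistant_profile = distance_profile(seqs, snps)
unresistant_profile = distance_profile(seqs)
```

An infinite distance guarantees that such a pair is never merged directly, whatever threshold is chosen.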
In the picture above, X, O and spade each represent one group of documents, and the documents are clustered into 3 clusters.
We have:
In the above picture,
Therefore,
I ran several trials on simulations of 12 to 14 generations. Given an appropriate distance threshold, the precision, recall and F1 against the ground truth are all close to 1. A distance threshold that is too small leads to low recall, and one that is too large leads to low precision.
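Precision, recall and F1 for a clustering are commonly computed by counting pairs of patients: a pair placed in the same predicted cluster is a true positive if it also shares a ground-truth cluster. The sketch below assumes that this pair-counting definition is the one used here; the function names are illustrative, and returning 1.0 when there are no pairs to judge is a convention chosen for the sketch.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def n_pairs(k: int) -> int:
    """Number of unordered pairs among k items."""
    return k * (k - 1) // 2


def pair_scores(
    truth: Sequence[Hashable], predicted: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Pair-counting precision, recall and F1 of `predicted` against `truth`."""
    if len(truth) != len(predicted):
        raise ValueError("both labelings must cover the same patients")
    same_pred = sum(n_pairs(c) for c in Counter(predicted).values())  # TP + FP
    same_true = sum(n_pairs(c) for c in Counter(truth).values())      # TP + FN
    # Pairs that share both the true cluster and the predicted cluster.
    tp = sum(n_pairs(c) for c in Counter(zip(truth, predicted)).values())
    precision = tp / same_pred if same_pred else 1.0
    recall = tp / same_true if same_true else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
precision, recall, f1 = pair_scores(truth, predicted)
```

Under this definition a small threshold splits true clusters, which removes true-positive pairs and lowers recall, while a large threshold merges different true clusters, which adds false-positive pairs and lowers precision.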
Number of generations | Resistance group | Number of patients | Distance threshold | Recall | Precision | F1 |
---|---|---|---|---|---|---|
12 | unresi | 1041 | 40 | 0.81 | 1 | 0.9 |
12 | unresi | 1041 | 50 | 0.965 | 1 | 0.982 |
12 | unresi | 1041 | 65 | 0.999 | 1 | 1 |
12 | unresi | 1041 | 75 | 1 | 1 | 1 |
12 | unresi | 1041 | 85 | 1 | 1 | 1 |
12 | unresi | 1041 | 100 | 1 | 0.996 | 0.998 |
12 | unresi | 1041 | 110 | 0.9997 | 0.995 | 0.997 |
12 | resi | 149 | 20 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 30 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 40 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 50 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 60 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 70 | 0.917 | 1 | 0.957 |
12 | resi | 149 | 80 | 0.917 | 1 | 0.957 |
12 | unresi | 1034 | 40 | 0.924 | 1 | 0.96 |
12 | unresi | 1034 | 50 | 0.954 | 1 | 0.976 |
12 | unresi | 1034 | 65 | 0.992 | 1 | 0.996 |
12 | unresi | 1034 | 75 | 1 | 1 | 1 |
12 | unresi | 1034 | 85 | 1 | 1 | 1 |
12 | unresi | 1034 | 100 | 1 | 1 | 1 |
12 | unresi | 1034 | 110 | 0.988 | 0.981 | 0.985 |
12 | resi | 141 | 20 | 1 | 1 | 1 |
12 | resi | 141 | 30 | 1 | 1 | 1 |
12 | resi | 141 | 40 | 1 | 1 | 1 |
12 | resi | 141 | 50 | 1 | 1 | 1 |
12 | resi | 141 | 60 | 1 | 1 | 1 |
12 | resi | 141 | 70 | 1 | 1 | 1 |
12 | resi | 141 | 80 | 1 | 1 | 1 |
13 | unresi | 2098 | 40 | 0.8 | 1 | 0.889 |
13 | unresi | 2098 | 50 | 0.913 | 1 | 0.954 |
13 | unresi | 2098 | 65 | 0.987 | 1 | 0.994 |
13 | unresi | 2098 | 75 | 0.994 | 1 | 0.997 |
13 | unresi | 2098 | 85 | 1 | 0.999 | 1 |
13 | unresi | 2098 | 100 | 0.998 | 0.994 | 0.996 |
13 | unresi | 2098 | 110 | 0.999 | 0.97 | 0.985 |
13 | resi | 281 | 20 | 0.81 | 1 | 0.894 |
13 | resi | 281 | 30 | 0.957 | 1 | 0.978 |
13 | resi | 281 | 40 | 1 | 1 | 1 |
13 | resi | 281 | 50 | 1 | 1 | 1 |
13 | resi | 281 | 60 | 1 | 1 | 1 |
13 | resi | 281 | 70 | 1 | 1 | 1 |
13 | resi | 281 | 80 | 1 | 1 | 1 |
13 | unresi | 2113 | 40 | 0.89 | 1 | 0.942 |
13 | unresi | 2113 | 50 | 0.96 | 1 | 0.982 |
13 | unresi | 2113 | 65 | 0.9995 | 1 | 0.9997 |
13 | unresi | 2113 | 75 | 0.9997 | 1 | 0.9998 |
13 | unresi | 2113 | 85 | 1 | 1 | 1 |
13 | unresi | 2113 | 100 | 0.9995 | 0.9993 | 0.9996 |
13 | unresi | 2113 | 110 | 0.998 | 0.981 | 0.99 |
13 | resi | 293 | 20 | 0.732 | 1 | 0.845 |
13 | resi | 293 | 30 | 0.83 | 1 | 0.907 |
13 | resi | 293 | 40 | 0.83 | 1 | 0.907 |
13 | resi | 293 | 50 | 0.902 | 1 | 0.95 |
13 | resi | 293 | 60 | 0.902 | 1 | 0.95 |
13 | resi | 293 | 70 | 0.902 | 1 | 0.95 |
13 | resi | 293 | 80 | 0.902 | 1 | 0.95 |
14 | unresi | 4213 | 40 | 0.82 | 1 | 0.9 |
14 | unresi | 4213 | 50 | 0.927 | 1 | 0.962 |
14 | unresi | 4213 | 65 | 0.984 | 1 | 0.992 |
14 | unresi | 4213 | 75 | 0.994 | 1 | 0.997 |
14 | unresi | 4213 | 85 | 0.999 | 0.999 | 0.999 |
14 | unresi | 4213 | 100 | 0.999 | 0.996 | 0.998 |
14 | unresi | 4213 | 110 | 1 | 0.988 | 0.944 |
14 | resi | 560 | 20 | 0.899 | 1 | 0.947 |
14 | resi | 560 | 30 | 0.937 | 1 | 0.967 |
14 | resi | 560 | 40 | 0.949 | 1 | 0.974 |
14 | resi | 560 | 50 | 0.949 | 1 | 0.974 |
14 | resi | 560 | 60 | 0.949 | 1 | 0.974 |
14 | resi | 560 | 70 | 0.949 | 1 | 0.974 |
14 | resi | 560 | 80 | 0.949 | 1 | 0.974 |
14 | unresi | 4173 | 40 | 0.817 | 1 | 0.899 |
14 | unresi | 4173 | 50 | 0.906 | 1 | 0.951 |
14 | unresi | 4173 | 65 | 0.973 | 1 | 0.986 |
14 | unresi | 4173 | 75 | 0.989 | 1 | 0.995 |
14 | unresi | 4173 | 85 | 0.995 | 1 | 0.998 |
14 | unresi | 4173 | 100 | 0.999 | 0.995 | 0.997 |
14 | unresi | 4173 | 110 | 0.998 | 0.989 | 0.994 |
14 | resi | 545 | 20 | 0.965 | 1 | 0.982 |
14 | resi | 545 | 30 | 0.976 | 1 | 0.998 |
14 | resi | 545 | 40 | 0.976 | 1 | 0.998 |
14 | resi | 545 | 50 | 0.976 | 1 | 0.998 |
14 | resi | 545 | 60 | 0.976 | 1 | 0.998 |
14 | resi | 545 | 70 | 0.976 | 1 | 0.998 |
14 | resi | 545 | 80 | 0.976 | 1 | 0.998 |
In my clustering algorithm, for a given patient, all patients whose distance to it is below the threshold are put into that patient's cluster. The convergence condition is that, for every patient in a cluster, its closest patient is in the same cluster.
If the distance threshold is 100, B, C and D are clustered into one cluster. However, if the distance threshold increases to 200, A, B and C end up in the same cluster.
This algorithm is sensitive to the order of the patients, so in this experiment the order of the patients was shuffled 50 times and the mean precision, recall and F1 were calculated.
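The order sensitivity and the 50-shuffle averaging can be illustrated with a small harness. The clusterer below is a simplified stand-in, not the algorithm described above: each unassigned patient seeds a cluster and pulls in the unassigned patients closer than the threshold, and the nearest-neighbour convergence step is omitted. The function names and the `score` callback are hypothetical. The reported standard deviations look like sample standard deviations (they are `nan` for a single run), so `statistics.stdev` is used here; that choice is an inference.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def greedy_threshold_clusters(
    order: Sequence[int], dist: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Order-dependent stand-in clusterer; returns one cluster label per patient."""
    label = {}
    next_label = 0
    for seed in order:
        if seed in label:
            continue
        label[seed] = next_label
        for other in order:
            if other not in label and dist[seed][other] < threshold:
                label[other] = next_label
        next_label += 1
    return [label[i] for i in range(len(dist))]


def shuffled_runs(
    dist: Sequence[Sequence[float]],
    threshold: float,
    score: Callable[[List[int]], float],
    runs: int = 50,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and sample standard deviation of `score` over shuffled patient orders."""
    rng = random.Random(seed)
    order = list(range(len(dist)))
    results = []
    for _ in range(runs):
        rng.shuffle(order)
        results.append(score(greedy_threshold_clusters(order, dist, threshold)))
    return statistics.mean(results), statistics.stdev(results)


# Toy data: four patients on a line; the distance is the gap between positions.
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
mean_k, std_k = shuffled_runs(dist, 1.5, lambda labels: len(set(labels)))
```

In the toy data, seeding from the middle patient yields two clusters while seeding from an end patient yields three, which is the kind of order effect that the shuffling averages out.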
For resistant patients, the clustering algorithm achieves its best scores, with mean precision 1, mean recall 1 and mean F1 1, when the distance threshold is larger than 30.
Resistant patients
Distance_Threshold:10
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 0.595744680851 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.63829787234 0.63829787234 0.63829787234 0.63829787234 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.63829787234 0.63829787234 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.595744680851 0.63829787234 0.595744680851 0.63829787234 0.595744680851 0.595744680851 0.595744680851 0.595744680851
recall mean: 0.608510638298
recall std: 0.0196982999952
F1: 0.746666666667 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.779220779221 0.779220779221 0.779220779221 0.779220779221 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.779220779221 0.779220779221 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.746666666667 0.779220779221 0.746666666667 0.779220779221 0.746666666667 0.746666666667 0.746666666667 0.746666666667
F1 mean: 0.756432900433
F1 std: 0.0150696258664
cluster number: 261 261 261 260 261 261 260 261 261 261 261 261 261 260 260 260 260 261 261 261 261 261 261 260 260 261 261 260 261 261 261 261 260 261 260 261 261 260 261 260 261 261 261 260 261 260 261 261 261 261
cluster number mean: 260.7
cluster number std: 0.462910049886
Distance_Threshold:30
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255 0.978723404255
recall mean: 0.978723404255
recall std: 2.24298922669e-16
F1: 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828 0.989247311828
F1 mean: 0.989247311828
F1 std: 5.60747306673e-16
cluster number: 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252
cluster number mean: 252.0
cluster number std: 0.0
Distance_Threshold:50
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
recall mean: 1.0
recall std: 0.0
F1: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
F1 mean: 1.0
F1 std: 0.0
cluster number: 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251
cluster number mean: 251.0
cluster number std: 0.0
Distance_Threshold:70
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
recall mean: 1.0
recall std: 0.0
F1: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
F1 mean: 1.0
F1 std: 0.0
cluster number: 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251
cluster number mean: 251.0
cluster number std: 0.0
Unresistant patients
Distance_Threshold:30
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 0.683061079545 0.658025568182 0.692649147727 0.665660511364 0.679865056818 0.701171875 0.685901988636 0.6875 0.679154829545 0.67578125 0.678089488636 0.686967329545 0.682350852273 0.665838068182 0.684659090909 0.677911931818 0.691583806818 0.681818181818 0.669211647727 0.682883522727 0.680397727273 0.687855113636 0.673295454545 0.678799715909 0.679332386364 0.681640625 0.688920454545 0.678977272727 0.678622159091 0.697975852273 0.677024147727 0.683238636364 0.690340909091 0.679154829545 0.680575284091 0.692116477273 0.686789772727 0.674538352273 0.688565340909 0.695134943182 0.684659090909 0.671164772727 0.680397727273 0.6953125 0.687677556818 0.691228693182 0.682350852273 0.688920454545 0.702059659091 0.690518465909
recall mean: 0.683153409091
recall std: 0.00884838573748
F1: 0.81168899673 0.793745984151 0.818420224483 0.799275130583 0.809428178839 0.824339839265 0.813691416535 0.814814814815 0.808924606112 0.806526806527 0.808168447783 0.814440585202 0.811187335092 0.799403112343 0.81281618887 0.808042328042 0.817676078514 0.810810810811 0.801829592597 0.811563621017 0.809805579036 0.815064169998 0.804753820034 0.808672659968 0.809050539226 0.81068524971 0.815811606392 0.808798646362 0.808546646922 0.822126947611 0.807411328745 0.811814345992 0.816806722689 0.808924606112 0.809931325938 0.818048268625 0.814315789474 0.805640971265 0.81556256572 0.820152927621 0.81281618887 0.803229919252 0.809805579036 0.820276497696 0.814939505523 0.817427821522 0.811187335092 0.815811606392 0.824953056541 0.816930994643
F1 mean: 0.811721946406
F1 std: 0.00625717877204
cluster number: 691 701 693 700 692 687 691 693 696 695 692 692 694 700 694 693 694 692 697 693 699 694 700 700 696 698 695 698 698 689 695 695 691 692 698 689 689 697 693 688 694 700 696 688 690 689 693 689 685 690
cluster number mean: 693.76
cluster number std: 3.94120048012
Distance_Threshold:60
precision: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
precision mean: 1.0
precision std: 0.0
recall: 0.979580965909 0.972833806818 0.970703125 0.967862215909 0.979936079545 0.976029829545 0.986328125 0.955610795455 0.973721590909 0.983842329545 0.972123579545 0.974609375 0.971590909091 0.971768465909 0.976740056818 0.984552556818 0.976029829545 0.982066761364 0.969815340909 0.976384943182 0.978870738636 0.975142045455 0.967151988636 0.972833806818 0.965376420455 0.976384943182 0.978160511364 0.975319602273 0.971413352273 0.961825284091 0.985262784091 0.974076704545 0.973188920455 0.967862215909 0.980823863636 0.980113636364 0.974964488636 0.976384943182 0.969992897727 0.986150568182 0.979936079545 0.973721590909 0.977805397727 0.979048295455 0.967151988636 0.977095170455 0.982066761364 0.976029829545 0.968039772727 0.969105113636
recall mean: 0.974868607955
recall std: 0.00623172449967
F1: 0.989685173558 0.986229862299 0.985133795837 0.983668681765 0.989866379697 0.987869530057 0.993117010816 0.977301616125 0.986685858222 0.991855365614 0.985864769965 0.987141444115 0.985590778098 0.985682125169 0.988233180634 0.992216158182 0.987869530057 0.990952252979 0.984676401659 0.988051388015 0.989322566173 0.987414599065 0.983301742034 0.986229862299 0.982383232451 0.988051388015 0.988959698411 0.987505617978 0.985499414573 0.98054122545 0.992576692604 0.986868141752 0.986412309907 0.983668681765 0.990319110792 0.989956958393 0.987323563787 0.988051388015 0.984767913475 0.993026998033 0.989866379697 0.986685858222 0.988778166801 0.989413242419 0.983301742034 0.988414907948 0.990952252979 0.987869530057 0.983760375316 0.98431018936
F1 mean: 0.987264501053
F1 std: 0.00320042147178
cluster number: 538 540 540 540 537 539 536 545 541 538 540 541 542 540 540 537 540 538 541 540 539 542 540 540 542 539 538 540 540 543 538 541 539 543 540 539 541 540 541 538 538 541 539 540 540 539 539 539 542 541
cluster number mean: 539.88
cluster number std: 1.67380783039
Distance_Threshold:90
precision: 0.994336283186 0.994336283186 0.991531404375 0.991531404375 1.0 1.0 1.0 0.987582047188 1.0 0.982810033327 0.987582047188 0.998050682261 0.991531404375 0.988740323716 0.99434029006 0.988740323716 0.994336283186 0.991531404375 0.991531404375 0.98567132496 0.980501392758 0.994983876747 0.994336283186 0.994983876747 0.994336283186 0.988740323716 0.994983876747 0.994336283186 0.985658640227 0.988740323716 0.988740323716 0.988740323716 0.988740323716 0.991531404375 0.98961084698 0.991531404375 1.0 0.988740323716 0.994336283186 0.991531404375 1.0 1.0 1.0 0.991531404375 0.991531404375 1.0 0.988740323716 1.0 0.991531404375 0.99499195135
precision mean: 0.992752264608
precision std: 0.00488101356762
recall: 0.997514204545 0.997514204545 0.997869318182 0.997869318182 1.0 1.0 1.0 0.988458806818 1.0 0.994850852273 0.988458806818 1.0 0.997869318182 0.997869318182 0.998224431818 0.997869318182 0.997514204545 0.997869318182 0.997869318182 0.989346590909 1.0 0.986150568182 0.997514204545 0.986150568182 0.997514204545 0.997869318182 0.986150568182 0.997514204545 0.988458806818 0.997869318182 0.997869318182 0.997869318182 0.997869318182 0.997869318182 0.997869318182 0.997869318182 1.0 0.997869318182 0.997514204545 0.997869318182 1.0 1.0 1.0 0.997869318182 0.997869318182 1.0 0.997869318182 1.0 0.997869318182 0.987748579545
recall mean: 0.996637073864
recall std: 0.00414384975034
F1: 0.99592270874 0.99592270874 0.994690265487 0.994690265487 1.0 1.0 1.0 0.988020232496 1.0 0.988793788053 0.988020232496 0.999024390244 0.994690265487 0.993283845882 0.996278575226 0.993283845882 0.99592270874 0.994690265487 0.994690265487 0.987505538325 0.990154711674 0.990547529873 0.99592270874 0.990547529873 0.99592270874 0.993283845882 0.990547529873 0.99592270874 0.987056737589 0.993283845882 0.993283845882 0.993283845882 0.993283845882 0.994690265487 0.993722924587 0.994690265487 1.0 0.993283845882 0.99592270874 0.994690265487 1.0 1.0 1.0 0.994690265487 0.994690265487 1.0 0.993283845882 1.0 0.994690265487 0.991357034661
F1 mean: 0.994683664989
F1 std: 0.00364051726246
cluster number: 529 529 528 528 528 528 528 529 528 529 529 527 528 528 528 528 529 528 528 527 527 531 529 531 529 528 531 529 528 528 528 528 528 528 527 528 528 528 529 528 528 528 528 528 528 528 528 528 528 530
cluster number mean: 528.34
cluster number std: 0.894655332106
Distance_Threshold:120
precision: 0.953767993226 0.952170963365 0.95798605205 0.953350296862 0.955385920271 0.954576271186 0.958964753959 0.96080780421 0.949392712551 0.955385920271 0.957226629872 0.955385920271 0.9519133085 0.956477214542 0.955776482688 0.958964753959 0.948934731146 0.950531825089 0.955385920271 0.950531825089 0.95468483816 0.952981260647 0.955223880597 0.958964753959 0.951030057413 0.957366984993 0.955385920271 0.954576271186 0.959917780062 0.958311888832 0.956285082497 0.947332883187 0.95468483816 0.95468483816 0.956285082497 0.956285082497 0.951030057413 0.952089704383 0.954576271186 0.957366984993 0.953727506427 0.954970263381 0.951030057413 0.955385920271 0.957366984993 0.958964753959 0.953788651036 0.957334693184 0.959877070172 0.952461512434
precision mean: 0.954937783475
precision std: 0.00306086976452
recall: 1.0 0.996803977273 1.0 0.997869318182 1.0 1.0 1.0 0.996803977273 0.999289772727 1.0 0.985440340909 1.0 0.998224431818 0.995028409091 0.990056818182 1.0 0.996448863636 0.999644886364 1.0 0.999644886364 0.995028409091 0.993252840909 0.988636363636 1.0 1.0 0.996803977273 1.0 1.0 0.995028409091 0.991832386364 0.998224431818 0.996448863636 0.995028409091 0.995028409091 0.998224431818 0.998224431818 1.0 0.995028409091 1.0 0.996803977273 0.988103693182 0.997869318182 1.0 1.0 0.996803977273 1.0 0.996803977273 1.0 0.998224431818 0.999644886364
recall mean: 0.997325994318
recall std: 0.0034898043894
F1: 0.976337002687 0.973976405274 0.978542263921 0.975101934588 0.977184002776 0.976760319112 0.979052585832 0.978474945534 0.973702422145 0.977184002776 0.971128608924 0.977184002776 0.974518980759 0.975372030285 0.9726146869 0.979052585832 0.972111553785 0.974469926439 0.977184002776 0.974469926439 0.97443922796 0.97270039993 0.971642963092 0.979052585832 0.974900467371 0.976687543493 0.977184002776 0.976760319112 0.977157802964 0.974784050257 0.976804795413 0.971270335756 0.97443922796 0.97443922796 0.976804795413 0.976804795413 0.974900467371 0.97308560514 0.976760319112 0.976687543493 0.970611319438 0.975948597725 0.974900467371 0.977184002776 0.976687543493 0.979052585832 0.974822017711 0.978202344768 0.978675254591 0.975482976696
F1 mean: 0.975665915516
F1 std: 0.00222625183981
cluster number: 514 514 516 516 515 515 516 517 515 515 520 515 515 516 517 516 515 515 515 515 516 516 519 516 514 516 515 515 517 517 516 514 516 516 516 516 514 515 515 516 518 517 514 515 516 516 515 515 517 515
cluster number mean: 515.7
cluster number std: 1.21638474042
DBSCAN is a density-based clustering method.
DBSCAN has two main parameters, Eps and MinPts. Eps is the maximum radius of a neighbourhood. MinPts is the minimum number of points required in the Eps-neighbourhood of a point. In our experiment, MinPts is 1.
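With MinPts = 1 every point counts as a core point (its Eps-neighbourhood always contains itself), so DBSCAN reduces to finding the connected components of the graph that links patients at distance at most Eps. The sketch below implements that special case in plain Python with a union-find; it is not the library call used in the experiment, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def dbscan_minpts_one(dist: Sequence[Sequence[float]], eps: float) -> List[int]:
    """DBSCAN for the special case MinPts = 1, given a full distance matrix."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Find the component root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of patients that lie within Eps of each other.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel the component roots as 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    labels = []
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


# Toy data: four patients on a line; the distance is the gap between positions.
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan_minpts_one(dist, eps=1)
```

Because the components do not depend on the order in which patients are visited, a single run is enough, which is consistent with the `nan` standard deviations in the DBSCAN output.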
In the results below, the best performance of DBSCAN for unresistant patients is at Eps = 60, with precision, recall and F1 of 1, 0.995 and 0.998. This is slightly higher than my clustering method. For resistant patients, DBSCAN achieves precision, recall and F1 of 1, 1 and 1, the same as my clustering method.
Unresistant patients
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 0.779296875 recall mean: 0.779296875 recall std: nan
F1: 0.875960482986 F1 mean: 0.875960482986 F1 std: nan
Eps:60
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 0.995383522727 recall mean: 0.995383522727 recall std: nan
F1: 0.997686421071 F1 mean: 0.997686421071 F1 std: nan
Eps:90
precision: 0.978627280626 precision mean: 0.978627280626 precision std: nan
recall: 1.0 recall mean: 1.0 recall std: nan
F1: 0.989198208483 F1 mean: 0.989198208483 F1 std: nan
Eps:120
precision: 0.942121110739 precision mean: 0.942121110739 precision std: nan
recall: 1.0 recall mean: 1.0 recall std: nan
F1: 0.970198105082 F1 mean: 0.970198105082 F1 std: nan
Resistant patients
Eps:10
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 0.808510638298 recall mean: 0.808510638298 recall std: nan
F1: 0.894117647059 F1 mean: 0.894117647059 F1 std: nan
Eps:30
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 0.978723404255 recall mean: 0.978723404255 recall std: nan
F1: 0.989247311828 F1 mean: 0.989247311828 F1 std: nan
Eps:50
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 1.0 recall mean: 1.0 recall std: nan
F1: 1.0 F1 mean: 1.0 F1 std: nan
Eps:70
precision: 1.0 precision mean: 1.0 precision std: nan
recall: 1.0 recall mean: 1.0 recall std: nan
F1: 1.0 F1 mean: 1.0 F1 std: nan