Classifier performances for deriving replicates

For each disease, we derive replicates of the mapping of RCTs across diseases after simulating what would have been the mapping of RCTs within regions if the misclassification of RCTs towards groups of diseases was corrected, given the sensitivities and specificities of the classifier to identify each group of disease.

To estimate the performances of the classifier for each group of diseases, we dispose a test set with 2,763 trials manually classified towards the 27-class grouping of diseases used in this work. The test set is described at Atal et al. BMC Bioinformatics 2016.

This script is for calculating sensitivity and specificity of the classifier to identify the disease and other studies relevant to the burden of diseases, and the number of success and number of trials to derive beta distributions

1. Sensitivities and specificities based on test set


In [2]:
#test set, not included in the repo
test_set <- read.table("/media/igna/Elements/HotelDieu/Cochrane/MappingRCTs_vs_Burden/test_set_classified_to28cats.txt")
dim(test_set)


  1. 2763
  2. 8

In [3]:
#We supress injuries from trials concerning the burden of diseases (category nro 28)
test_set$GBDnp <- sapply(strsplit(as.character(test_set$GBDnp),"&&"),function(x){paste(x[x!="28"],collapse="&")})
test_set$GBD28 <- sapply(strsplit(as.character(test_set$GBD28),"&"),function(x){paste(x[x!="28"],collapse="&")})

In [4]:
tst <- strsplit(test_set$GBDnp,"&")
alg <- strsplit(test_set$GBD28,"&")
tst <- lapply(tst,as.numeric)
alg <- lapply(alg,as.numeric)

In [5]:
source('../utils/Evaluation_metrics.R')

In [6]:
dis <- 1:27
Mgbd <- read.table("../Data/27_gbd_groups.txt")

In [7]:
#For each category in 1:27, TP, TN, FP and FN of finding the disease and of finding another disease
set.seed(7212)

dis <- as.character(1:27)

PERF_F  <- data.frame()
for(i in dis){
    ALG <- lapply(alg,function(x){rs <- c()
                                  if(i%in%x) rs <- c(1)
                                  if(sum(setdiff(dis,i)%in%x)!=0) rs <- c(rs,2)
                                  return(rs)
                                      })

    DT <- lapply(tst,function(x){rs <- c()
                                if(i%in%x) rs <- c(1)
                                if(sum(setdiff(dis,i)%in%x)!=0) rs <- c(rs,2)
                                return(rs)
                                    })

    CM <- conf_matrix(ALG,DT,c(1,2))

    PERF <- c(CM[1,],CM[2,])
    PERF_F <- rbind(PERF_F,PERF)
}

In [8]:
#We add performances of classifier to identify trials relevant to the burden of diseases
    ALG <- lapply(alg,length)
    DT <- lapply(tst,length)
    CM <- conf_matrix(ALG,DT,1)
    PERF <- c(CM,rep(NA,4))
    PERF_F <- rbind(PERF_F,PERF)

In [9]:
PERF_F <- data.frame(PERF_F)
names(PERF_F) <- paste(rep(c("TP","FP","TN","FN"),2),rep(c("_Dis","_Oth"),each=4),sep="")

In [14]:
PERF_F$dis <- c(dis,0)
PERF_F$GBD <- c(as.character(Mgbd$x[-28]),"All")

In [15]:
PERF_F <- PERF_F[,c(9,10,1:8)]

In [16]:
Mgbd


x
1Tuberculosis
2HIV/AIDS
3Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases
4Malaria
5Neglected tropical diseases excluding malaria
6Maternal disorders
7Neonatal disorders
8Nutritional deficiencies
9Sexually transmitted diseases excluding HIV
10Hepatitis
11Leprosy
12Neoplasms
13Cardiovascular and circulatory diseases
14Chronic respiratory diseases
15Cirrhosis of the liver
16Digestive diseases (except cirrhosis)
17Neurological disorders
18Mental and behavioral disorders
19Diabetes, urinary diseases and male infertility
20Gynecological diseases
21Hemoglobinopathies and hemolytic anemias
22Musculoskeletal disorders
23Congenital anomalies
24Skin and subcutaneous diseases
25Sense organ diseases
26Oral disorders
27Sudden infant death syndrome

In [17]:
PERF_F


TN_OthFN_OthdisGBDTP_DisFP_DisTN_DisFN_DisTP_OthFP_Oth
1267 150 1 Tuberculosis14 2 2745 2 2142 204
2333 144 2 HIV/AIDS86 7 2659 11 2072 214
3299 144 3 Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases40 21 2693 9 2113 207
4267 150 4 Malaria14 1 2748 0 2142 204
5261 149 5 Neglected tropical diseases excluding malaria6 0 2756 1 2150 203
6289 134 6 Maternal disorders17 5 2715 26 2130 210
7262 148 7 Neonatal disorders4 7 2746 6 2148 205
8272 150 8 Nutritional deficiencies11 15 2732 5 2140 201
9255 150 9 Sexually transmitted diseases excluding HIV0 3 2759 1 2155 203
10262 152 10 Hepatitis14 4 2742 3 2141 208
11256 150 11 Leprosy2 1 2760 0 2154 203
121198 138 12 Neoplasms933 42 1763 25 1213 214
13466 129 13 Cardiovascular and circulatory diseases178 60 2468 57 1951 217
14328 152 14 Chronic respiratory diseases76 17 2665 5 2074 209
15267 152 15 Cirrhosis of the liver19 17 2723 4 2133 211
16289 146 16 Digestive diseases (except cirrhosis)24 28 2703 8 2129 199
17339 153 17 Neurological disorders79 40 2630 14 2060 211
18402 149 18 Mental and behavioral disorders134 33 2587 9 2014 198
19473 147 19 Diabetes, urinary diseases and male infertility196 63 2458 46 1930 213
20262 149 20 Gynecological diseases9 8 2744 2 2146 206
21270 147 21 Hemoglobinopathies and hemolytic anemias10 4 2743 6 2143 203
22382 147 22 Musculoskeletal disorders100 40 2610 13 2046 188
23275 162 23 Congenital anomalies22 34 2706 1 2121 205
24281 150 24 Skin and subcutaneous diseases18 24 2717 4 2134 198
25322 166 25 Sense organ diseases52 40 2667 4 2085 190
26258 148 26 Oral disorders3 4 2751 5 2150 207
27254 150 27 Sudden infant death syndrome0 0 2763 0 2156 203
28NA NA 0 All 2022165 314 262 NA NA

In [12]:
write.csv(PERF_F,'../Tables/Performances_per_27disease_data.csv')

In [ ]: