In [ ]:
library(ggplot2)
library(plyr)
library(reshape2)
In [ ]:
stats = read.csv("overall.csv")
stats$rep = as.factor(sort(rep(1:10, times=44)))
In [ ]:
summary(stats)
table(stats$scale)
table(stats$coverage)
We compare average genome coverage and the scale of varaition againsnt accuracy (i.e. Spearman's $\rho$) (over the 10 reps).
We compare the effects of coverage and scale independently.
In [ ]:
coverage = stats[stats$scale==0.01, ]
coverage$scale = NULL
summary(coverage)
In [ ]:
ggplot(coverage, aes(x=coverage, y=spearman, linetype=measure, color=rep)) +
geom_line(aes(linetype=measure, color=rep),size=1.5) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
Here we summarise the replicates to averages $\pm$ SD. Note that we exclude replicate 8 as it is an outlier for both IP and WIP metrics (see above).
In [ ]:
coverage = coverage[coverage$rep != 8,]
csumm = ddply(coverage, .(coverage, measure), summarise,
spearman_m=mean(spearman),
spearman_sd=sd(spearman))
summary(csumm)
You can see below that WIP marginally outperforms IP, at low coverage. Above about 20x, I would say that WIP and IP have equivalent performance.
The ribbon is 2 SD, so there is certainly no signficant difference.
In [ ]:
ggplot(csumm, aes(x=coverage, y=spearman_m, ymin=spearman_m-2*spearman_sd, ymax=spearman_m+2*spearman_sd, group=measure)) +
geom_line(aes(linetype=measure)) +
geom_ribbon(aes(fill=measure), alpha=0.2) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
The differnce between WIP and IP is calculated here
In [ ]:
cdiff = dcast(coverage, coverage * rep~ measure, value.var="spearman")
#cdiff = ddply(cdiff, .(coverage, rep), summarise, spearman_d=wip - ip)
cdiff = ddply(cdiff, .(coverage), summarise, diff_m=mean(wip - ip), diff_sd=sd(wip - ip))
summary(cdiff)
In [ ]:
ggplot(cdiff, aes(x=coverage, y=diff_m, ymin=diff_m-diff_sd, ymax=diff_m+diff_sd)) +
geom_line() +
geom_ribbon(alpha=0.4) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
scale_x_log10()+
theme_bw()
There is no signficant difference between the two methods (0 is never outside the confidence interval).
In [ ]:
coverage = stats[stats$scale==0.001, ]
coverage$scale = NULL
summary(coverage)
In [ ]:
ggplot(coverage, aes(x=coverage, y=spearman, linetype=measure, color=rep)) +
geom_line(aes(linetype=measure, color=rep),size=1.5) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
Here we summarise the replicates to averages $\pm$ SD.
In [ ]:
csumm = ddply(coverage, .(coverage, measure), summarise,
spearman_m=mean(spearman),
spearman_sd=sd(spearman))
summary(csumm)
You can see below that WIP marginally outperforms IP, at low coverage. Above about 20x, I would say that WIP and IP have equivalent performance.
The ribbon is 2 SD, so this time there could be signficant differences.
In [ ]:
p = ggplot(csumm, aes(x=coverage, y=spearman_m, ymin=spearman_m-spearman_sd, ymax=spearman_m+spearman_sd, group=measure)) +
geom_line(aes(linetype=measure)) +
geom_ribbon(aes(fill=measure), alpha=0.2) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
pdf("cov_vs_rho_1000.pdf")
print(p)
dev.off()
p
The differnce between WIP and IP is calculated here
In [ ]:
cdiff = dcast(coverage, coverage * rep~ measure, value.var="spearman")
#cdiff = ddply(cdiff, .(coverage, rep), summarise, spearman_d=wip - ip)
cdiff = ddply(cdiff, .(coverage), summarise, diff_m=mean(wip - ip), diff_sd=sd(wip - ip))
summary(cdiff)
In [ ]:
ggplot(cdiff, aes(x=coverage, y=diff_m, ymin=diff_m-2*diff_sd, ymax=diff_m+2*diff_sd)) +
geom_line() +
geom_ribbon(alpha=0.4) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
scale_x_log10()+
theme_bw()
In [ ]:
coverage = stats[stats$scale==0.001, ]
coverage$scale = NULL
summary(coverage)
In [ ]:
ggplot(coverage, aes(x=coverage, y=spearman, linetype=measure, color=rep)) +
geom_line(aes(linetype=measure, color=rep),size=1.5) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
Here we summarise the replicates to averages $\pm$ SD. Note that we exclude replicate 8 as it is an outlier for both IP and WIP metrics (see above).
In [ ]:
coverage = coverage[coverage$rep != 8,]
csumm = ddply(coverage, .(coverage, measure), summarise,
spearman_m=mean(spearman),
spearman_sd=sd(spearman))
summary(csumm)
You can see below that WIP marginally outperforms IP, at low coverage. Above about 20x, I would say that WIP and IP have equivalent performance.
The ribbon is 1 SD, so there is certainly no signficant difference.
In [ ]:
p <- ggplot(csumm, aes(x=coverage, y=spearman_m, ymin=spearman_m-spearman_sd, ymax=spearman_m+spearman_sd, group=measure)) +
geom_line(aes(linetype=measure)) +
geom_ribbon(aes(fill=measure), alpha=0.2) +
xlab('Genome Coverage') +
ylab(expression(paste("Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
p
In [ ]:
pdf('coverage-accuracy.pdf')
print(p)
dev.off()
The differnce between WIP and IP is calculated here
In [ ]:
cdiff = dcast(coverage, coverage * rep~ measure, value.var="spearman")
#cdiff = ddply(cdiff, .(coverage, rep), summarise, spearman_d=wip - ip)
cdiff = ddply(cdiff, .(coverage), summarise, diff_m=mean(wip - ip), diff_sd=sd(wip - ip))
summary(cdiff)
In [ ]:
p = ggplot(cdiff, aes(x=coverage, y=diff_m, ymin=diff_m-2*diff_sd, ymax=diff_m+2*diff_sd)) +
geom_line() +
geom_ribbon(alpha=0.2) +
xlab('Genome Coverage') +
ylab(expression(paste("Pair-wise difference in Spearman's ", rho))) +
#scale_x_log10()+
theme_bw()
p
In [ ]:
pdf('diff-coverage.pdf')
print(p)
dev.off()
So in terms of pairwise differences in performance between WIP and IP, there is a signfincant improvement above about 2x (below which they are both pretty crap)
As expected
In [ ]: