In [1]:
library("DESeq2")
setwd("~/relmapping")


Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Warning message:
“replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘S4Vectors’”Warning message:
“replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘S4Vectors’”Warning message:
“multiple methods tables found for ‘var’”Warning message:
“multiple methods tables found for ‘sd’”Warning message:
“multiple methods tables found for ‘rowSums’”Warning message:
“multiple methods tables found for ‘colSums’”Warning message:
“multiple methods tables found for ‘rowMeans’”Warning message:
“multiple methods tables found for ‘colMeans’”
Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:BiocGenerics’:

    colMeans, colSums, rowMeans, rowSums

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Warning message:
“replacing previous import ‘stats::sd’ by ‘BiocGenerics::sd’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘stats::var’ by ‘BiocGenerics::var’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘IRanges’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘IRanges’”Warning message:
“multiple methods tables found for ‘var’”Warning message:
“multiple methods tables found for ‘sd’”Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘GenomeInfoDb’”Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘GenomicRanges’”Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘XVector’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘XVector’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘XVector’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘XVector’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘XVector’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘XVector’”Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘SummarizedExperiment’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘IRanges::var’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘IRanges::sd’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘DESeq2’”Warning message:
“replacing previous import ‘BiocGenerics::rowSums’ by ‘S4Vectors::rowSums’ when loading ‘AnnotationDbi’”Warning message:
“replacing previous import ‘BiocGenerics::var’ by ‘S4Vectors::var’ when loading ‘AnnotationDbi’”Warning message:
“replacing previous import ‘BiocGenerics::rowMeans’ by ‘S4Vectors::rowMeans’ when loading ‘AnnotationDbi’”Warning message:
“replacing previous import ‘BiocGenerics::colSums’ by ‘S4Vectors::colSums’ when loading ‘AnnotationDbi’”Warning message:
“replacing previous import ‘BiocGenerics::sd’ by ‘S4Vectors::sd’ when loading ‘AnnotationDbi’”Warning message:
“replacing previous import ‘BiocGenerics::colMeans’ by ‘S4Vectors::colMeans’ when loading ‘AnnotationDbi’”

In [2]:
fp <- "~/relmapping/atac728/tg_se.bwa_se.rm_unmapped.rm_chrM.rm_blacklist.rm_q10.macs2_se_extsize150_shiftm75_keepdup_all_noSPMR.sfnorm/atac728_wt.tg_se.bwa_se.rm_unmapped.rm_chrM.rm_blacklist.rm_q10.macs2_se_extsize150_shiftm75_keepdup_all_noSPMR.sfnorm_counts.tsv"

In [3]:
countData <- read.table(fp, header=TRUE, sep="\t")#, row.names=1)

In [4]:
cond <- relevel(factor(c(
    "wt_emb", "wt_emb",
    "wt_l1", "wt_l1",
    "wt_l2", "wt_l2",
    "wt_l3", "wt_l3",
    "wt_l4", "wt_l4",
    "wt_ya", "wt_ya",
    "glp1_ya", "glp1_ya",
    "glp1_d3", "glp1_d3",
    "glp1_d7", "glp1_d7",
    "glp1_d10", "glp1_d10",
    "glp1_d14", "glp1_d14")), "wt_emb")

In [5]:
colData <- data.frame(row.names=colnames(countData), condition=cond)

In [6]:
dds <- DESeqDataSetFromMatrix(countData=countData, colData=colData, design=~condition)
dds <- estimateSizeFactors(dds)

In [7]:
sizeFactors(dds)


wt_emb_rep1
0.979444557074789
wt_emb_rep2
0.67992066981055
wt_l1_rep1
0.721623733625308
wt_l1_rep2
0.789945828887808
wt_l2_rep1
0.603746112907009
wt_l2_rep2
1.23242167090448
wt_l3_rep1
0.901252380272778
wt_l3_rep2
1.12410310329469
wt_l4_rep1
1.40255599391174
wt_l4_rep2
1.26968266945369
wt_ya_rep1
0.806781564274026
wt_ya_rep2
1.28634685862607
glp1_ya_rep1
1.01371822210262
glp1_ya_rep2
1.46954097274342
glp1_d3_rep1
0.949062486457647
glp1_d3_rep2
0.952572585187346
glp1_d7_rep1
0.891202979796336
glp1_d7_rep2
1.0732621541913
glp1_d10_rep1
0.96733820215278
glp1_d10_rep2
1.37192873702082
glp1_d14_rep1
0.824075243016692
glp1_d14_rep2
0.962952837126113

In [8]:
estimateSizeFactorsForMatrix(countData)


wt_emb_rep1
0.979444557074789
wt_emb_rep2
0.67992066981055
wt_l1_rep1
0.721623733625308
wt_l1_rep2
0.789945828887808
wt_l2_rep1
0.603746112907009
wt_l2_rep2
1.23242167090448
wt_l3_rep1
0.901252380272778
wt_l3_rep2
1.12410310329469
wt_l4_rep1
1.40255599391174
wt_l4_rep2
1.26968266945369
wt_ya_rep1
0.806781564274026
wt_ya_rep2
1.28634685862607
glp1_ya_rep1
1.01371822210262
glp1_ya_rep2
1.46954097274342
glp1_d3_rep1
0.949062486457647
glp1_d3_rep2
0.952572585187346
glp1_d7_rep1
0.891202979796336
glp1_d7_rep2
1.0732621541913
glp1_d10_rep1
0.96733820215278
glp1_d10_rep2
1.37192873702082
glp1_d14_rep1
0.824075243016692
glp1_d14_rep2
0.962952837126113

In [9]:
estimateSizeFactorsForMatrixMedianScaled <- function(counts, locfunc = stats::median, geoMeans, controlGenes) {
    if (missing(geoMeans)) {
        incomingGeoMeans <- FALSE
        loggeomeans <- rowMeans(log(counts))
        geomeansMedian <- exp(locfunc(loggeomeans[is.finite(loggeomeans)]))
        print(geomeansMedian)
    }
    #else {
    #    incomingGeoMeans <- TRUE
    #    if (length(geoMeans) != nrow(counts)) {
    #        stop("geoMeans should be as long as the number of rows of counts")
    #    }
    #    loggeomeans <- log(geoMeans)
    #}
    if (all(is.infinite(loggeomeans))) {
        stop("every gene contains at least one zero, cannot compute log geometric means")
    }
    sf <- if (missing(controlGenes)) {
        print("missing(controlGenes)")
        apply(counts, 2, function(cnts) {
            exp(locfunc((log(cnts) - loggeomeans)[is.finite(loggeomeans) & cnts > 0]))
        })
    }
    #else {
    #    print("not missing(controlGenes)")
    #    if (!(is.numeric(controlGenes) | is.logical(controlGenes))) {
    #        stop("controlGenes should be either a numeric or logical vector")
    #    }
    #    loggeomeansSub <- loggeomeans[controlGenes]
    #    apply(counts[controlGenes, , drop = FALSE], 2, function(cnts) {
    #        exp(locfunc((log(cnts) - loggeomeansSub)[is.finite(loggeomeansSub) & cnts > 0]))
    #    })
    #}
    #if (incomingGeoMeans) {
    #    print("incomingGeoMeans")
    #    sf <- sf/exp(mean(log(sf)))
    #}
    print(sf)
    print(1 / sf)
    print(1 / (sf * geomeansMedian))
}

estimateSizeFactorsForMatrixMedianScaled(countData)


[1] 97.75643
[1] "missing(controlGenes)"
  wt_emb_rep1   wt_emb_rep2    wt_l1_rep1    wt_l1_rep2    wt_l2_rep1 
    0.9794446     0.6799207     0.7216237     0.7899458     0.6037461 
   wt_l2_rep2    wt_l3_rep1    wt_l3_rep2    wt_l4_rep1    wt_l4_rep2 
    1.2324217     0.9012524     1.1241031     1.4025560     1.2696827 
   wt_ya_rep1    wt_ya_rep2  glp1_ya_rep1  glp1_ya_rep2  glp1_d3_rep1 
    0.8067816     1.2863469     1.0137182     1.4695410     0.9490625 
 glp1_d3_rep2  glp1_d7_rep1  glp1_d7_rep2 glp1_d10_rep1 glp1_d10_rep2 
    0.9525726     0.8912030     1.0732622     0.9673382     1.3719287 
glp1_d14_rep1 glp1_d14_rep2 
    0.8240752     0.9629528 
  wt_emb_rep1   wt_emb_rep2    wt_l1_rep1    wt_l1_rep2    wt_l2_rep1 
    1.0209868     1.4707598     1.3857637     1.2659096     1.6563254 
   wt_l2_rep2    wt_l3_rep1    wt_l3_rep2    wt_l4_rep1    wt_l4_rep2 
    0.8114106     1.1095671     0.8895981     0.7129840     0.7875984 
   wt_ya_rep1    wt_ya_rep2  glp1_ya_rep1  glp1_ya_rep2  glp1_d3_rep1 
    1.2394929     0.7773953     0.9864674     0.6804846     1.0536714 
 glp1_d3_rep2  glp1_d7_rep1  glp1_d7_rep2 glp1_d10_rep1 glp1_d10_rep2 
    1.0497888     1.1220788     0.9317388     1.0337646     0.7289008 
glp1_d14_rep1 glp1_d14_rep2 
    1.2134814     1.0384725 
  wt_emb_rep1   wt_emb_rep2    wt_l1_rep1    wt_l1_rep2    wt_l2_rep1 
  0.010444191   0.015045147   0.014175679   0.012949630   0.016943391 
   wt_l2_rep2    wt_l3_rep1    wt_l3_rep2    wt_l4_rep1    wt_l4_rep2 
  0.008300330   0.011350324   0.009100150   0.007293475   0.008056743 
   wt_ya_rep1    wt_ya_rep2  glp1_ya_rep1  glp1_ya_rep2  glp1_d3_rep1 
  0.012679400   0.007952370   0.010091075   0.006961022   0.010778538 
 glp1_d3_rep2  glp1_d7_rep1  glp1_d7_rep2 glp1_d10_rep1 glp1_d10_rep2 
  0.010738821   0.011478313   0.009531228   0.010574902   0.007456296 
glp1_d14_rep1 glp1_d14_rep2 
  0.012413316   0.010623061 

In [38]:
# sf <- raw DESeq sizefactors
sf <- sizeFactors(dds)
sf


wt_emb_rep1
1.02018089044883
wt_emb_rep2
0.706435654575078
wt_l1_rep1
0.712355985341904
wt_l1_rep2
0.791469777093167
wt_l2_rep1
0.628572668334735
wt_l2_rep2
1.25455533858885
wt_l3_rep1
0.912033787796305
wt_l3_rep2
1.17485029825242
wt_l4_rep1
1.52472215215913
wt_l4_rep2
1.35488927762581
wt_ya_rep1
0.825148736370067
wt_ya_rep2
1.40282635171228

In [40]:
# geomeansMedian <- median counts in pseudo-reference sample
loggeomeans <- rowMeans(log(countData))
geomeansMedian <- exp(stats::median(loggeomeans[is.finite(loggeomeans)]))
geomeansMedian


109.310187024283

In [42]:
# final ATAC-seq normalisation factors
1 / (sf * geomeansMedian)


wt_emb_rep1
0.00896730989236481
wt_emb_rep2
0.012949910061414
wt_l1_rep1
0.0128422844464942
wt_l1_rep2
0.0115585944728329
wt_l2_rep1
0.014554050234414
wt_l2_rep2
0.00729204835333416
wt_l3_rep1
0.0100306351730979
wt_l3_rep2
0.00778676075116235
wt_l4_rep1
0.00599996411016174
wt_l4_rep2
0.00675204855628792
wt_ya_rep1
0.0110868232449434
wt_ya_rep2
0.00652131903550074

In [ ]: