Introduction

This notebook serves as a tutorial for using gUtils functions.

gUtils is an R package developed to extend the functionality of GenomicRanges, the R package which stores genomic data and represents genomic locations in the Bioconductor project

Load the necessary packages and data

Although it is not a required dependency, we recommmend that users download a BSGenome, the structure used to represent full genomes within Bioconductor packages.

Please install with biocLite()

### try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("BSgenome.Hsapiens.UCSC.hg19")

See the BSgenome documentation for more details: https://bioconductor.org/packages/release/data/annotation/html/BSgenome.Hsapiens.UCSC.hg19.html

Load gUtils


In [1]:
suppressWarnings(suppressMessages(library(gUtils)))

Load BSgenome.Hsapiens.UCSC.hg19 and set with the variable DEFAULT_BSGENOME in your environment:


In [2]:
library(BSgenome.Hsapiens.UCSC.hg19)
Sys.setenv(DEFAULT_BSGENOME = 'BSgenome.Hsapiens.UCSC.hg19::Hsapiens')


Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

Loading required package: rtracklayer

Create GRanges objects for examples


In [3]:
## instantiate an example GRanges objects, gr

gr = GRanges(1, IRanges(c(3, 9, 13), c(9, 13, 20)), strand=c('+','-','-'), name=c("A","B","C"))

print(gr)


GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [ 3,  9]      + |           A
  [2]        1  [ 9, 13]      - |           B
  [3]        1  [13, 20]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

In [4]:
## instantiate multiple GRanges objects (to be used in examples below)

gr1 = suppressWarnings(GRanges(
    seqnames = Rle(c(1, 2, 1, 3), c(1, 3, 2, 4)),
    ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
    strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    score = 1:10,
    GC = seq(1, 0, length=10)))

gr2 = GRanges(1, IRanges(c(103, 107, 113), c(105, 109, 116)), 
    strand=c('+','-','-'), 
    name=c("A","B","C"))

print(gr1)


print(gr2)


GRanges object with 10 ranges and 2 metadata columns:
    seqnames     ranges strand |     score                GC
       <Rle>  <IRanges>  <Rle> | <integer>         <numeric>
  a        1 [101, 111]      - |         1                 1
  b        2 [102, 112]      + |         2 0.888888888888889
  c        2 [103, 113]      + |         3 0.777777777777778
  d        2 [104, 114]      * |         4 0.666666666666667
  e        1 [105, 115]      * |         5 0.555555555555556
  f        1 [106, 116]      + |         6 0.444444444444444
  g        3 [107, 117]      + |         7 0.333333333333333
  h        3 [108, 118]      + |         8 0.222222222222222
  i        3 [109, 119]      - |         9 0.111111111111111
  j        3 [110, 120]      - |        10                 0
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
GRanges object with 3 ranges and 1 metadata column:
      seqnames     ranges strand |        name
         <Rle>  <IRanges>  <Rle> | <character>
  [1]        1 [103, 105]      + |           A
  [2]        1 [107, 109]      - |           B
  [3]        1 [113, 116]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Functions %-% and %+%


In [5]:
## Motivation:
## Adding integer N to a GRanges object with IRanges(start, end) 
## will result in a GRanges with IRanges(start - N, end + N)
gr + 5

## Subtracting integer N to a GRanges object with IRanges(start, end) 
## will result in a GRanges with IRanges(start + N, end - N)
gr - 2

## The functions '%+%' and '%-%' are designed to easily manipulate a GRanges object to add/substract bases 
## to 'start' and 'end'


GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [-2, 14]      + |           A
  [2]        1  [ 4, 18]      - |           B
  [3]        1  [ 8, 25]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [ 5,  7]      + |           A
  [2]        1  [11, 11]      - |           B
  [3]        1  [15, 18]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%+% : shifts GRanges by N bases to the right


In [6]:
## Function: 
## '%+%' is an operator which shifts GRanges by N bases to the right, i.e. IRanges(start + N, end + N)
## Use:
## gr %+% N = original GRanges 'gr' shifted N bases to the right.

In [7]:
gr %+% 5


GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [ 8, 14]      + |           A
  [2]        1  [14, 18]      - |           B
  [3]        1  [18, 25]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%-% : shifts GRanges by N bases to the left


In [8]:
## Function: 
## '%-%' is an operator which shifts GRanges by N bases to the left, , i.e. IRanges(start - N, end - N)
## Use:
## gr %-% N = original GRanges 'gr' shifted N bases to the right.

In [9]:
gr %-% 5


GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [-2,  4]      + |           A
  [2]        1  [ 4,  8]      - |           B
  [3]        1  [ 8, 15]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%Q% : subsets the GRanges based on conditionals


In [10]:
## Function: 
## '%Q%' subsets the GRanges 'gr' based on conditionals of the metadata columns in 'gr'
## Use:
## gr %Q% (gene_names == 'KRAS')
## gr %Q% ((mapq >= 40) && (GC < 0.5))
## gr %Q% (width < 100)

In [11]:
gr %Q% (name != 'B')


GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |        name
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1  [ 3,  9]      + |           A
  [2]        1  [13, 20]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%*% : Performs "natural join" or merge of metadata columns of two GRanges


In [12]:
## Function: 
## '%*%' Performs "natural join" or merge of metadata columns of grA and grB 
## using interval overlap as a "primary key", outputs a new GRanges whose maximum length is length(grA)*length(grB). 
## (See please see documentation for function 'gr.findoverlaps()' for more complex queries)
## Use:
## grA %*% grB   # strand-agnostic merging
## grA %**% grB  # strand-specific merging

In [13]:
## grA %*% grB  # strand-agnostic merging

gr1 %*% gr2   ## gr2 %*% gr1 will have metadata columns in a different order


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 7 ranges and 5 metadata columns:
      seqnames     ranges strand |  query.id subject.id     score        GC
         <Rle>  <IRanges>  <Rle> | <integer>  <integer> <integer> <numeric>
  [1]        1 [103, 105]      * |         1          1         1 1.0000000
  [2]        1 [105, 105]      * |         5          1         5 0.5555556
  [3]        1 [107, 109]      * |         1          2         1 1.0000000
  [4]        1 [107, 109]      * |         5          2         5 0.5555556
  [5]        1 [107, 109]      * |         6          2         6 0.4444444
  [6]        1 [113, 115]      * |         5          3         5 0.5555556
  [7]        1 [113, 116]      * |         6          3         6 0.4444444
             name
      <character>
  [1]           A
  [2]           A
  [3]           B
  [4]           B
  [5]           B
  [6]           C
  [7]           C
  -------
  seqinfo: 25 sequences from an unspecified genome

In [14]:
## grA %**% grB # strand-specific merging

gr1 %**% gr2


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 4 ranges and 5 metadata columns:
      seqnames     ranges strand |  query.id subject.id     score        GC
         <Rle>  <IRanges>  <Rle> | <integer>  <integer> <integer> <numeric>
  [1]        1 [107, 109]      - |         1          2         1 1.0000000
  [2]        1 [105, 105]      * |         5          1         5 0.5555556
  [3]        1 [107, 109]      * |         5          2         5 0.5555556
  [4]        1 [113, 115]      * |         5          3         5 0.5555556
             name
      <character>
  [1]           B
  [2]           A
  [3]           B
  [4]           C
  -------
  seqinfo: 25 sequences from an unspecified genome

%\$% : aggreagates metadata across GRanges

grA %\$% grB aggregates the metadata in grB across the territory of each range in grA


In [15]:
## Function: 
## '%$%' aggregates the metadata in grB across the territory of each range in grA  
## Aggregates the metadata in grB across the territory of each range in grA. 
## This returns grA appended with additional metadata columns of grA with values aggregated over where grA and grB overlap. 
## For character or factor-valued metadata columns of grB, aggregation will return a comma collapsed character value of all grB values that overlap grA[i].
## For numeric columns of grB it will return the width-weighted mean value of that column across the grA[i] and grB overlap. 
## Use:
## grA %$% grB   # strand-agnostic aggregation
## grA %$$% grB  # strand-specific aggregation

In [16]:
## grA %$% grB  # strand-agnostic aggregation

gr1 %$% gr2


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 10 ranges and 3 metadata columns:
    seqnames     ranges strand |     score        GC        name
       <Rle>  <IRanges>  <Rle> | <integer> <numeric> <character>
  a        1 [101, 111]      - |         1 1.0000000        A, B
  b        2 [102, 112]      + |         2 0.8888889            
  c        2 [103, 113]      + |         3 0.7777778            
  d        2 [104, 114]      * |         4 0.6666667            
  e        1 [105, 115]      * |         5 0.5555556     A, B, C
  f        1 [106, 116]      + |         6 0.4444444        B, C
  g        3 [107, 117]      + |         7 0.3333333            
  h        3 [108, 118]      + |         8 0.2222222            
  i        3 [109, 119]      - |         9 0.1111111            
  j        3 [110, 120]      - |        10 0.0000000            
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

In [17]:
gr2 %$% gr1


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 3 ranges and 3 metadata columns:
      seqnames     ranges strand |        name     score        GC
         <Rle>  <IRanges>  <Rle> | <character> <numeric> <numeric>
  [1]        1 [103, 105]      + |           A  2.000000 0.8888889
  [2]        1 [107, 109]      - |           B  4.000000 0.6666667
  [3]        1 [113, 116]      - |           C  5.571429 0.4920635
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

In [18]:
## grA %$$% grB  # strand-specific aggregation

gr1 %$$% gr2


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 10 ranges and 3 metadata columns:
    seqnames     ranges strand |     score        GC        name
       <Rle>  <IRanges>  <Rle> | <integer> <numeric> <character>
  a        1 [101, 111]      - |         1 1.0000000           B
  b        2 [102, 112]      + |         2 0.8888889            
  c        2 [103, 113]      + |         3 0.7777778            
  d        2 [104, 114]      * |         4 0.6666667            
  e        1 [105, 115]      * |         5 0.5555556     A, B, C
  f        1 [106, 116]      + |         6 0.4444444            
  g        3 [107, 117]      + |         7 0.3333333            
  h        3 [108, 118]      + |         8 0.2222222            
  i        3 [109, 119]      - |         9 0.1111111            
  j        3 [110, 120]      - |        10 0.0000000            
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

In [19]:
gr2 %$$% gr1


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 3 ranges and 3 metadata columns:
      seqnames     ranges strand |        name     score        GC
         <Rle>  <IRanges>  <Rle> | <character> <numeric> <numeric>
  [1]        1 [103, 105]      + |           A         5 0.5555556
  [2]        1 [107, 109]      - |           B         3 0.7777778
  [3]        1 [113, 116]      - |           C         5 0.5555556
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%&% : returns subset of overlapping GRanges

grA %&% grB returns the subset of ranges in grA that overlap with at least one range in grB


In [20]:
## Function: 
## '%&%' return the subset of ranges in grA that overlap with at least one range in grB.
## Use:
## grA %&% grB   # strand-agnostic 
## grA %&&% grB  # strand-specific

In [21]:
## grA %&% grB  # strand-agnostic 

gr1 %&% gr2


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 3 ranges and 2 metadata columns:
    seqnames     ranges strand |     score                GC
       <Rle>  <IRanges>  <Rle> | <integer>         <numeric>
  a        1 [101, 111]      - |         1                 1
  e        1 [105, 115]      * |         5 0.555555555555556
  f        1 [106, 116]      + |         6 0.444444444444444
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

In [22]:
gr2 %&% gr1


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 3 ranges and 1 metadata column:
      seqnames     ranges strand |        name
         <Rle>  <IRanges>  <Rle> | <character>
  [1]        1 [103, 105]      + |           A
  [2]        1 [107, 109]      - |           B
  [3]        1 [113, 116]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

In [23]:
## grA %&&% grB  # strand-specific 

gr1 %&&% gr2


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 2 ranges and 2 metadata columns:
    seqnames     ranges strand |     score                GC
       <Rle>  <IRanges>  <Rle> | <integer>         <numeric>
  a        1 [101, 111]      - |         1                 1
  e        1 [105, 115]      * |         5 0.555555555555556
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

In [24]:
gr2 %&&% gr1


Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
GRanges object with 3 ranges and 1 metadata column:
      seqnames     ranges strand |        name
         <Rle>  <IRanges>  <Rle> | <character>
  [1]        1 [103, 105]      + |           A
  [2]        1 [107, 109]      - |           B
  [3]        1 [113, 116]      - |           C
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

%O% : returns number of bases in overlapping GRanges

grA %O% grB returns a length(grA) numeric vector whose item i is the fraction of the width of grA[i] which overlaps at least one range in grB.


In [25]:
## Function: 
## '%O%' 
## Returns a length(grA) numeric vector whose item i is the *fraction of the width* of grA[i] 
## which overlaps at least one range in grB.
## Use:
## grA %O% grB   # strand-agnostic
## grA %OO% grB  # strand-specific

In [26]:
## grA %O% grB   # strand-agnostic

print('gr1 %O% gr2 outputs')
print(gr1 %O% gr2)
print('')
print('gr2 %O% gr1 outputs')
print(gr2 %O% gr1)


[1] "gr1 %O% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 0.5454545 0.0000000 0.0000000 0.0000000 0.6363636 0.6363636 0.0000000
 [8] 0.0000000 0.0000000 0.0000000
[1] ""
[1] "gr2 %O% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 1.333333 3.000000 1.750000

In [27]:
## grA %OO% grB   # strand-specific

print('gr1 %OO% gr2 outputs')
print(gr1 %OO% gr2)
print('')
print('gr2 %OO% gr1 outputs')
print(gr2 %OO% gr1)


[1] "gr1 %OO% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 0.2727273 0.0000000 0.0000000 0.0000000 0.6363636 0.0000000 0.0000000
 [8] 0.0000000 0.0000000 0.0000000
[1] ""
[1] "gr2 %OO% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 0.3333333 2.0000000 0.7500000

%o% : returns width fraction in overlapping GRanges

grA %o% grB returns a length(grA) numeric vector whose item i is the number of bases in grA[i]


In [28]:
# %o%
## Function: 
## '%o%' 
## Returns a length(grA) numeric vector whose item i is the *number of bases* in grA[i] 
## which overlaps at least one range in grB. 
## Use:
## grA %o% grB   # strand-agnostic
## grA %oo% grB  # strand-specific

In [29]:
## grA %o% grB   # strand-agnostic

print('gr1 %o% gr2 outputs')
print(gr1 %o% gr2)
print('')
print('gr2 %o% gr1 outputs')
print(gr2 %o% gr1)


[1] "gr1 %o% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 6 0 0 0 7 7 0 0 0 0
[1] ""
[1] "gr2 %o% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 4 9 7

In [30]:
## grA %oo% grB  # strand-specific

print('gr1 %oo% gr2 outputs')
print(gr1 %oo% gr2)
print('')
print('gr2 %oo% gr1 outputs')
print(gr2 %oo% gr1)


[1] "gr1 %oo% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 3 0 0 0 7 0 0 0 0 0
[1] ""
[1] "gr2 %oo% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 1 6 3

%N% : returns total number of ranges in overlapping GRanges

grA %o% grB returns a length(grA) numeric vector whose item i is the total number of ranges in grB that overlap with grA[i]


In [31]:
## Function: 
## '%N%' 
## Returns a length(grA) numeric vector whose item i is the total number of ranges in grB that overlap with grA[i]
## Use:
## grA %N%  grB   # strand-agnostic
## grA %NN% grB  # strand-specific

In [32]:
## grA %N% grB   # strand-agnostic

print('gr1 %N% gr2 outputs')
print(gr1 %N% gr2)
print('')
print('gr2 %N% gr1 outputs')
print(gr2 %N% gr1)


[1] "gr1 %N% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 2 0 0 0 3 2 0 0 0 0
[1] ""
[1] "gr2 %N% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 2 3 2

In [33]:
## grA %NN% grB   # strand-agnostic

print('gr1 %NN% gr2 outputs')
print(gr1 %NN% gr2)
print('')
print('gr2 %N% gr1 outputs')
print(gr2 %NN% gr1)


[1] "gr1 %NN% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1] 1 0 0 0 3 0 0 0 0 0
[1] ""
[1] "gr2 %N% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] 1 2 1

%^% : returns boolean vector TRUE in overlapping GRanges

grA %^% grB returns a length(grA) logical vector whose item i TRUE if the grA[i] overlap at least on range in grB


In [34]:
## Function: 
## '%^%' 
## Returns a length(grA) logical vector whose item i TRUE if the grA[i] overlap at least on range in grB 
## Use:
## grA %^%  grB  # strand-agnostic
## grA %^^% grB  # strand-specific

In [35]:
## grA %^%  grB  # strand-agnostic

print('gr1 %^% gr2 outputs')
print(gr1 %^% gr2)
print('')
print('gr2 %^% gr1 outputs')
print(gr2 %^% gr1)


[1] "gr1 %^% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
[1] ""
[1] "gr2 %^% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] TRUE TRUE TRUE

In [36]:
## grA %^^% grB  # strand-specific

print('gr1 %^^% gr2 outputs')
print(gr1 %^^% gr2)
print('')
print('gr2 %^^% gr1 outputs')
print(gr2 %^^% gr1)


[1] "gr1 %^^% gr2 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[1] ""
[1] "gr2 %^^% gr1 outputs"
Warning message:
“"ranges" method for Hits objects is deprecated. Please use
  overlapsRanges() instead.”
[1] TRUE TRUE TRUE

%_% : returns GRanges representing setdiff in input GRanges


In [37]:
## Function: 
## '%_%' 
## Similar to the function BiocGenerics::setdiff(), but strand-agnostic (cf. https://github.com/Bioconductor/BiocGenerics)
## Use:
## grA %_% grB

In [38]:
print('gr1 %_% gr2 outputs')
print(suppressWarnings(gr1 %_% gr2))
print('')
print('gr2 %_% gr1 outputs')
print(suppressWarnings(gr2 %_% gr1))


[1] "gr1 %_% gr2 outputs"
GRanges object with 5 ranges and 0 metadata columns:
      seqnames     ranges strand
         <Rle>  <IRanges>  <Rle>
  [1]        1 [101, 102]      *
  [2]        1 [106, 106]      *
  [3]        1 [110, 112]      *
  [4]        2 [102, 114]      *
  [5]        3 [107, 120]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
[1] ""
[1] "gr2 %_% gr1 outputs"
GRanges object with 0 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

gUtils internal data and functions to load internal data

gUtils contains several internal data variable which load

hg_seqlengths()

returns standard hg19 seqlengths as an integer vector


In [39]:
# hg_seqlengths()
## Function: 'hg_seqlengths()'
## Details:
## Outputs standard hg19 seqlengths as an integer vector

In [40]:
hg_seqlengths()[1:6]


1
249250621
2
243199373
3
198022430
4
191154276
5
180915260
6
171115067

example_genes

returns GRanges of RefSeq genes with exon count and names, using hg19


In [41]:
# example_genes
## Data: 'example_genes', GRanges
## Details:
## GRanges of RefSeq genes with exon count and names, using hg19

In [42]:
head(example_genes)

length(example_genes)


GRanges object with 6 ranges and 2 metadata columns:
      seqnames           ranges strand |        name exonCount
         <Rle>        <IRanges>  <Rle> | <character> <integer>
  [1]        1 [ 69090,  70008]      + |       OR4F5         1
  [2]        1 [367658, 368597]      + |      OR4F16         1
  [3]        1 [367658, 368597]      + |       OR4F3         1
  [4]        1 [367658, 368597]      + |      OR4F29         1
  [5]        1 [861321, 879533]      + |      SAMD11        14
  [6]        1 [896073, 900571]      + |      KLHL17        12
  -------
  seqinfo: 25 sequences from hg19 genome
18812

example_dnase

returns DNAaseI hypersensitivity sites from UCSC Table Browser hg19, subsampled to 10,000 sites


In [43]:
# example_dnase
## Data: 'example_dnase', GRanges
## Details:
## DNAaseI hypersensitivity sites from UCSC Table Browser hg19,
## subsampled to 10,000 sites

In [44]:
head(example_dnase)

length(example_dnase)


GRanges object with 6 ranges and 2 metadata columns:
      seqnames             ranges strand | signalValue    pValue
         <Rle>          <IRanges>  <Rle> |   <numeric> <numeric>
  [1]        1 [ 567461,  567893]      * |     50.4687   282.177
  [2]        1 [ 848076,  848474]      * |     9.15261    12.211
  [3]        1 [ 860836,  861798]      * |     23.4447   53.1665
  [4]        1 [1172453, 1172732]      * |     2.74806   2.10225
  [5]        1 [1210156, 1210484]      * |     2.20956   1.60014
  [6]        1 [1221000, 1221366]      * |     5.70984   5.91024
  -------
  seqinfo: 25 sequences from hg19 genome
10000

grl1

returns SV rearrangements, first dataset


In [45]:
# grl1
## Data: 'grl1', GRangesList
## Details:
## simulated SV rearrangements, first dataset

In [46]:
head(grl1, 3)


GRangesList object of length 3:
$10 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand |       bin
         <Rle>              <IRanges>  <Rle> | <integer>
  [1]        5 [146981323, 146981323]      - |        10
  [2]       14 [104605059, 104605059]      + |        10

$26 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand | bin
  [1]        9 [140100229, 140100229]      - |  26
  [2]       19 [ 24309057,  24309057]      - |  26

$49 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand | bin
  [1]        5 [132216750, 132216750]      - |  49
  [2]       17 [  4803057,   4803057]      + |  49

-------
seqinfo: 25 sequences from an unspecified genome

grl2

returns SV rearrangements, second dataset


In [47]:
# grl2
## Data: 'grl2', GRangesList
## Details:
## simulated SV rearrangements, second dataset

In [48]:
head(grl2, 3)


GRangesList object of length 3:
$4678 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand |       bin
         <Rle>              <IRanges>  <Rle> | <integer>
  [1]        8 [110374809, 110374809]      - |      4678
  [2]       17 [ 72874459,  72874459]      + |      4678

$4713 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand |  bin
  [1]        7 [116166641, 116166641]      + | 4713
  [2]       22 [ 21738147,  21738147]      - | 4713

$4723 
GRanges object with 2 ranges and 1 metadata column:
      seqnames                 ranges strand |  bin
  [1]        6 [ 32822006,  32822006]      + | 4723
  [2]        9 [136529010, 136529010]      - | 4723

-------
seqinfo: 25 sequences from an unspecified genome

si

returns Seqinfo object for hg19


In [49]:
# si
## Data: si, Seqinfo object
## Details:
## Seqinfo object for hg19

si


Seqinfo object with 25 sequences from hg19 genome:
  seqnames seqlengths isCircular genome
  1         249250621      FALSE   hg19
  2         243199373      FALSE   hg19
  3         198022430      FALSE   hg19
  4         191154276      FALSE   hg19
  5         180915260      FALSE   hg19
  ...             ...        ...    ...
  21         48129895      FALSE   hg19
  22         51304566      FALSE   hg19
  X         155270560      FALSE   hg19
  Y          59373566      FALSE   hg19
  M             16571      FALSE   hg19

grl.hiC

HiC data for chr14 from Lieberman-Aiden 2009 (in hg19), subsampled to 10,000 interactions


In [50]:
# grl.hiC
## Data: 'grl.hiC', GRangesList
## Details:
## HiC data for chr14 from Lieberman-Aiden 2009 (in hg19), subsampled
## to 10,000 interactions

head(grl.hiC)


GRangesList object of length 6:
$462760 
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |        id
         <Rle>            <IRanges>  <Rle> | <integer>
  [1]       14 [72176266, 72176266]      - |    462760
  [2]       14 [72422406, 72422406]      - |    462760

$40284 
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |    id
  [1]       14 [74904243, 74904243]      + | 40284
  [2]       14 [74905303, 74905303]      - | 40284

$240518 
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |     id
  [1]       14 [23569934, 23569934]      + | 240518
  [2]       14 [96453342, 96453342]      + | 240518

...
<3 more elements>
-------
seqinfo: 25 sequences from an unspecified genome

In [ ]:


In [ ]: