We use rpy2 and its R magics in the IPython Notebook to query Ensembl through R's biomaRt package and pass the results back to Python.
Usage:
Ref:
In [1]:
import pandas as pd
%load_ext rpy2.ipython
In [16]:
%%R
library(biomaRt)
In [4]:
%load_ext version_information
%version_information pandas, rpy2
Out[4]:
List the marts for the current Ensembl build (this query is currently not working...):
In [15]:
%%R
marts = listMarts()
head(marts)
Sometimes you need a specific genome build. For example, GTEx v6 used GENCODE v19, which is based on GRCh37.p13 and corresponds to Ensembl release 74. In that case, point biomaRt at the matching archive host:
In [4]:
%%R
marts.v74 = listMarts(host="dec2013.archive.ensembl.org")
head(marts.v74)
In [4]:
%%R
datasets = listDatasets(useMart("ensembl"))
head(datasets)
In [7]:
%%R
mart.hsa = useMart("ensembl", "hsapiens_gene_ensembl")
For an archived release, you can pass the archive host directly when calling useMart, e.g.,
In [8]:
%%R
mart74.hsa = useMart("ENSEMBL_MART_ENSEMBL", "hsapiens_gene_ensembl", host="dec2013.archive.ensembl.org")
We will use the v74 mart for the rest of the examples:
In [9]:
%%R
mart.hsa = mart74.hsa
In [13]:
%%R
attributes <- listAttributes(mart.hsa)
head(attributes)
In [14]:
%%R
filters <- listFilters(mart.hsa)
head(filters)
You can search for specific attributes by running grep() on the name. For example, if you’re looking for Affymetrix microarray probeset IDs:
In [15]:
%%R
head(attributes[grep("affy", attributes$name),])
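The same search can also be done on the Python side once the table has been exported with `%%R -o attributes`. The sketch below assumes that export has happened; to stay self-contained it uses a small made-up stand-in for the `attributes` table rather than real query output.

```python
import pandas as pd

# Toy stand-in for the exported `attributes` table.
# The rows are illustrative only, not real listAttributes() output.
attributes = pd.DataFrame({
    "name": ["ensembl_gene_id", "affy_hg_u133_plus_2",
             "hgnc_symbol", "affy_hugene_1_0_st_v1"],
    "description": ["Ensembl Gene ID", "Affy HG U133-PLUS-2",
                    "HGNC symbol", "Affy HuGene 1_0 st v1"],
})

# Equivalent of attributes[grep("affy", attributes$name), ] in R:
# keep the rows whose name contains the literal substring "affy".
affy = attributes[attributes["name"].str.contains("affy", regex=False)]
print(affy)
```

Passing `regex=False` treats the pattern as a plain substring; drop it if you want regular-expression matching as in R's `grep()`.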
Query in R:
In [16]:
%%R -o df
df = getBM(attributes=c("ensembl_gene_id", "hgnc_symbol", "chromosome_name"),
filters="chromosome_name",
values="Y",
mart=mart.hsa)
head(df)
Accessible in Python:
In [17]:
df.head()
Out[17]:
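Once the result is a pandas DataFrame, it usually helps to normalize missing values before further analysis. The sketch below assumes that genes without an HGNC symbol come back with an empty string in `hgnc_symbol` (check your own output, since this can differ). The data frame `df_toy` and its values are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the data frame returned by getBM();
# the IDs and symbols are placeholders, not real annotations.
df_toy = pd.DataFrame({
    "ensembl_gene_id": ["GENE_A", "GENE_B", "GENE_C"],
    "hgnc_symbol": ["SYM_A", "", "SYM_C"],
    "chromosome_name": ["Y", "Y", "Y"],
})

# Turn empty symbols into NaN so pandas treats them as missing.
cleaned = df_toy.assign(
    hgnc_symbol=df_toy["hgnc_symbol"].mask(df_toy["hgnc_symbol"] == "")
)

# Count genes without a symbol and keep only the annotated ones.
n_missing = int(cleaned["hgnc_symbol"].isna().sum())
annotated = cleaned.dropna(subset=["hgnc_symbol"])
print(n_missing, len(annotated))
```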
In [18]:
genes = ["ENSG00000135245", "ENSG00000240758", "ENSG00000225490"]
In [22]:
%%R -i genes -o df
df = getBM(attributes=c("ensembl_gene_id", "hgnc_symbol", "external_gene_id", "chromosome_name", "gene_biotype", "description"),
filters="ensembl_gene_id",
values=genes,
mart=mart.hsa)
df
In [23]:
df
Out[23]:
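Note that getBM() does not promise to return rows in the order of the input IDs, and IDs with no match in the chosen release may be absent from the result. A minimal sketch of how one might restore the input order and spot missing IDs in pandas follows. The `result` frame is a made-up stand-in for the returned `df`, and its symbols are placeholders.

```python
import pandas as pd

genes = ["ENSG00000135245", "ENSG00000240758", "ENSG00000225490"]

# Made-up stand-in for the getBM() result: rows are out of order
# and one requested ID is absent. Symbols are placeholders.
result = pd.DataFrame({
    "ensembl_gene_id": ["ENSG00000225490", "ENSG00000135245"],
    "hgnc_symbol": ["SYM_B", "SYM_A"],
})

# IDs that were requested but not returned, in request order.
returned = set(result["ensembl_gene_id"])
missing = [g for g in genes if g not in returned]

# A left merge on the requested IDs keeps the request order and
# leaves NaN in the annotation columns for unmatched IDs.
ordered = pd.DataFrame({"ensembl_gene_id": genes}).merge(
    result, on="ensembl_gene_id", how="left"
)
print(missing)
print(ordered)
```

A merge is used here instead of reindexing because it still works when a query returns several rows for one gene ID.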