Jupyter notebooks provide a convenient interface for sharing data and functions between Python and R through use of the Python rpy2
module. By combining all of your code from across multiple languages inside a single notebook your workflow from analyses to plots is easy to follow and reproduce. In this notebook, I show an example of exporting data from an ipyrad JSON object into R so that we can create high quality plots using plotting libraries in R.
A large motivation for creating the JSON storage object for ipyrad Assemblies is that this object stores all of the information for an entire assembly, and thus provides a useful portable format. This way you can execute very large assemblies on a remote HPC cluster and simply import the small JSON file onto your laptop to analyze and compare its size and other stats. It is of course easiest to analyze the stats in Python since ipyrad has a built-in parser for these JSON objects. However, many users may prefer to use R for plotting, and so here we show how to easily transfer results from ipyrad to R.
In [2]:
## import ipyrad and give it a shorter name
import ipyrad as ip
In [3]:
## create a test assembly
data = ip.Assembly("data")
data.set_params('project_dir', 'test')
data.set_params('raw_fastq_path', 'ipsimdata/rad_example_R1_.fastq.gz')
data.set_params('barcodes_path', 'ipsimdata/rad_example_barcodes.txt')
## Assemble data set; runs steps 1-7
data.run('1')
In [4]:
## load the JSON file for this assembly
data = ip.load_json("test/data.json")
In [5]:
## Data can be accessed from the object's stats and stats_df attributes
print data.stats
In [6]:
## This requires that you have the Python module `rpy2` installed.
## If you do not, it can be installed in anaconda with:
## conda install rpy2
There are a few odd tricks to using this module. One being that you shouldn't try to transfer objects with '.' in their names. R doesn't like that. So simply rename these objects before passing them. Below I rename the stats data frame and then use the '-i' flag in R to import it into R namespace. The "%R" at the beginning of the line tells IPyhton to execute just that line as R code.
In [7]:
%load_ext rpy2.ipython
In [25]:
## rename data.stats as statsDF
statsDF = data.stats
## import statsDF into R namespace
%R -i statsDF
In [26]:
%%R
print(statsDF)
In [92]:
%%R -w 350 -h 350
## the dimensions above tell IPython how big to make the embedded figure
## alternatively you can adjust the size when you save the figure
plot(statsDF$reads_raw,
statsDF$reads_filtered,
pch=20, cex=3)
In [13]:
### Other stats from our assembly are also available.
### First store names and then import into R
s5 = data.stats_dfs.s5
s7L = data.stats_dfs.s7_loci
s7S = data.stats_dfs.s7_snps
s7N = data.stats_dfs.s7_samples
## no spaces allowed between comma-separated names when
## transferring multiple objects to R
%R -i s5,s7L,s7S,s7N
In [24]:
%%R -w 800 -h 320
##
barplot(s7N$sample_coverage,
col='grey30', names=rownames(s7N),
ylab="N loci",
xlab="Sample")
In [14]:
%%R -w 450 -h 400
print(s7S)
barplot(s7S$var,
col=rgb(0,0,1,1/4),
names=rownames(s7S),
ylab="N loci", ylim=c(0, 400),
xlab="N variable sites")
barplot(s7S$pis,
col=rgb(1,0,0,1/4),
names=rownames(s7S),
ylab="N loci", ylim=c(0, 400),
xlab="N variable sites",
add=TRUE)
You can of course make much more complex figures with the data for your own Assembly. And you can also create figures to compare stats among multiple Assemblies. See also the R package RADami which has some functions for analysis and plotting using the .loci output file from ipyrad.
If you're an R afficionado please contribute to the ipyrad cookbooks with more advanced plots or analyses, and add your notebooks to the cookbook collection by submitting them to the ipyrad github repo.