Install Misopy


In [1]:
!pip install --user --quiet misopy

Restart the kernel (Kernel -> Restart), so that IPython finds Misopy

Get Samtools


In [1]:
from urllib import urlretrieve
urlretrieve("http://depot.galaxyproject.org/package/linux/x86_64/samtools/samtools-0.1.19-Linux-x86_64.tgz","samtools.tgz")
!tar -xf samtools.tgz

The samtools binary should now be in bin/samtools

Generate Indices of the bam Files

  • Now the bam files from the Galaxy history can be imported with get(history_id)
  • Then index files of the bam files can be generated via samtools

In [2]:
get(1)
get(2)
!mv 1 461177.bam
!mv 2 461178.bam

!bin/samtools index 461177.bam
!bin/samtools index 461178.bam

If everything went well there should now be index files (*.bai) next to the bam files.


In [3]:
!ls *.ba*


461177.bam  461177.bam.bai  461178.bam	461178.bam.bai

Generate Index of an Annotation File

An annotation file is needed.


In [4]:
get(3)
!mv 3 dm3.ensGene.gtf

A script for the conversion of gtf(gff2) to gff3 is needed


In [5]:
urlretrieve("http://genes.mit.edu/burgelab/miso/scripts/gtf2gff3.pl","gtf2gff3.pl")


Out[5]:
('gtf2gff3.pl', <httplib.HTTPMessage instance at 0x7f1c95700bd8>)

... to convert the annotation file to the newer gff3 standard


In [6]:
!perl gtf2gff3.pl dm3.ensGene.gtf > dm3.ensGene.gff3


Config::Std not installed. Will use built-in defaults.

Now misopy is needed to generate an index for this annotation file

The index will be stored in the 'indexed' folder


In [7]:
import misopy
from misopy import index_gff
index_gff.index_gff("dm3.ensGene.gff3", "indexed")


Indexing GFF...
  - GFF: dm3.ensGene.gff3
  - Outputting to: indexed
Through 0 genes...
Through 5000 genes...
Through 10000 genes...
Loaded 14869 genes
  - Loading of genes from GFF took 9.59 seconds
Making directory: indexed/chr2RHet
  - Chromosome serialization took 0.05 seconds
Making directory: indexed/chr3RHet
  - Chromosome serialization took 0.00 seconds
Making directory: indexed/chrU
  - Chromosome serialization took 0.02 seconds
Making directory: indexed/chr4
  - Chromosome serialization took 0.04 seconds
Making directory: indexed/chrYHet
  - Chromosome serialization took 0.00 seconds
Making directory: indexed/chr3L
  - Chromosome serialization took 0.59 seconds
Making directory: indexed/chr2L
  - Chromosome serialization took 0.60 seconds
Making directory: indexed/chr2LHet
  - Chromosome serialization took 0.00 seconds
Making directory: indexed/chrX
  - Chromosome serialization took 0.54 seconds
Making directory: indexed/chrXHet
  - Chromosome serialization took 0.00 seconds
Making directory: indexed/chr2R
  - Chromosome serialization took 0.72 seconds
Making directory: indexed/chr3R
  - Chromosome serialization took 0.86 seconds
Making directory: indexed/chrM
  - Chromosome serialization took 0.00 seconds
Making directory: indexed/chr3LHet
  - Chromosome serialization took 0.01 seconds
Outputting gene records in GFF format...
  - Output file: indexed/genes.gff
  - Serialization of genes from GFF took 4.72 seconds
Indexing of GFF took 14.32 seconds.

Now everything that is needed should be available to create a Sashimi plot

Generate Sashimi Plot

The function for the sashimi plot is imported


In [8]:
from misopy.sashimi_plot import sashimi_plot as sash_plot
help(sash_plot.plot_event)


Help on function plot_event in module misopy.sashimi_plot.sashimi_plot:

plot_event(event_name, pickle_dir, settings_filename, output_dir, no_posteriors=False, plot_title=None, plot_label=None)
    Visualize read densities across the exons and junctions
    of a given MISO alternative RNA processing event.
    
    Also plots MISO estimates and Psi values.

/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py:1318: UserWarning:  This call to matplotlib.use() has no effect
because the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.

  warnings.warn(_use_error_msg)

A sashimi_settings file specifies aspects of the plot.


In [9]:
import ConfigParser

config = ConfigParser.RawConfigParser()
config.add_section('data')
config.set('data', 'bam_prefix', '/import')
config.set('data', 'bam_files', '["461177.bam", "461178.bam"]')
config.add_section('plotting')
config.set('plotting', 'fig_width', '7')
config.set('plotting', 'fig_height', '5')
config.set('plotting', 'intron_scale', '30')
config.set('plotting', 'exon_scale', '4')
config.set('plotting', 'logged', 'False')
config.set('plotting', 'font_size', '5')
config.set('plotting', 'ymax', '2500')
config.set('plotting', 'show_posteriors', 'False')
config.set('plotting', 'bar_posteriors', 'False')
config.set('plotting', 'number_junctions', 'True')
config.set('plotting', 'colors', '["#770022","#FF8800"]')


with open('sashimi_settings.txt', 'wb') as configfile:
    config.write(configfile)

An output directory is created and a specific region of interest is plotted.

The ID of the event we want plotted comes from a valid name from the indexed gff file. (See last column of 'indexed/genes.gff')


In [10]:
!mkdir output
sash_plot.plot_event("FBgn0039909","indexed","sashimi_settings.txt","output")


Reading settings from: sashimi_settings.txt
Parsing data:bam_prefix
Parsing data:bam_files
Parsing plotting:fig_width
Parsing plotting:fig_height
Parsing plotting:intron_scale
Parsing plotting:exon_scale
Parsing plotting:logged
Parsing plotting:font_size
Parsing plotting:ymax
Parsing plotting:show_posteriors
Parsing plotting:bar_posteriors
Parsing plotting:number_junctions
Parsing plotting:colors
Reading dimensions from settings...
 - Height: 5.00
 - Width: 7.00
Plotting read densities and MISO estimates along event...
  - Event: FBgn0039909
Setting up plot using dimensions:  [7.0, 5.0]
Using intron scale  30.0
Using exon scale  4.0
Reading sample label: 461177.bam
Processing BAM: /import/461177.bam
Skipping read with CIGAR 31M2I4M
Skipping read with CIGAR 11M1I25M
Skipping read with CIGAR 35M57N2M
Skipping read with CIGAR 2M57N35M
Skipping read with CIGAR 2M65N35M
Reading sample label: 461178.bam
Processing BAM: /import/461178.bam
Skipping read with CIGAR 2M57N35M
Skipping read with CIGAR 2M57N35M
Skipping read with CIGAR 2M57N35M
Skipping read with CIGAR 35M65N2M
Skipping read with CIGAR 35M65N2M
Skipping read with CIGAR 35M65N2M
Skipping read with CIGAR 35M65N2M
Skipping read with CIGAR 2M70N35M
Saving plot to: /import/output/FBgn0039909.pdf

Finally save the pdf back to the Galaxy history


In [20]:
put("output/FBgn0039909.pdf")

In [ ]: