Predictors for each method inherit from the Predictor class and all implement a predict method for scoring a single sequence. This method may wrap methods from other modules and/or call command-line predictors, and it should return a pandas DataFrame. For example, the TepitopePredictor uses the epitopepredict.tepitope module.

The predict_sequences method is used for multiple proteins contained in a dataframe of sequences in a standard format. This dataframe is created from a GenBank or FASTA file (see examples below). For large numbers of sequences, you should provide a path to predict_sequences so that the results are saved as each protein is completed. This avoids memory issues, since many alleles might be called for each protein. Results are saved in CSV format with one file per protein, and they can be loaded into the predictor individually or all together using the load method.
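The pattern described above (a base class, a per-sequence predict method that returns a DataFrame, and per-protein CSV files that are reloaded later) can be sketched with pandas alone. This is a minimal illustration, not the library's implementation: ToyPredictor, HydrophobicityPredictor, the scoring rule, and the column names locus_tag and translation are all assumptions made for the example. The method names only mirror the calls used in this notebook.

```python
import os
import tempfile

import pandas as pd


class ToyPredictor:
    """Hypothetical base class for illustration only (not the library's Predictor)."""

    def __init__(self):
        self.data = None

    def predict(self, sequence, allele, length=9, name=""):
        """Score a single sequence for one allele; subclasses must implement this."""
        raise NotImplementedError

    def predict_sequences(self, recs, alleles, path=None):
        """Run predict() on every protein in recs for every allele.

        If path is given, each protein's results are written to its own CSV
        file and not kept in memory; otherwise they are concatenated.
        """
        results = []
        if path is not None:
            os.makedirs(path, exist_ok=True)
        for name, seq in zip(recs["locus_tag"], recs["translation"]):
            res = pd.concat(
                [self.predict(seq, a, name=name) for a in alleles],
                ignore_index=True,
            )
            if path is not None:
                res.to_csv(os.path.join(path, name + ".csv"), index=False)
            else:
                results.append(res)
        if path is None:
            self.data = pd.concat(results, ignore_index=True)
        return self.data

    def load(self, path):
        """Load all per-protein CSV files found in path into one DataFrame."""
        files = sorted(f for f in os.listdir(path) if f.endswith(".csv"))
        frames = [pd.read_csv(os.path.join(path, f)) for f in files]
        self.data = pd.concat(frames, ignore_index=True)
        return self.data


class HydrophobicityPredictor(ToyPredictor):
    """Toy scorer: counts hydrophobic residues in each 9-mer (not a binding model)."""

    def predict(self, sequence, allele, length=9, name=""):
        rows = []
        for i in range(len(sequence) - length + 1):
            pep = sequence[i:i + length]
            score = sum(pep.count(aa) for aa in "AILMFVW")
            rows.append({"name": name, "allele": allele, "pos": i,
                         "peptide": pep, "score": score})
        return pd.DataFrame(rows, columns=["name", "allele", "pos", "peptide", "score"])


# Two made-up proteins in a made-up "standard format" table
recs = pd.DataFrame({
    "locus_tag": ["protA", "protB"],
    "translation": ["MKTAYIAKQRQISFVK", "MLLAVLYCLAVFALSS"],
})
outdir = tempfile.mkdtemp()
P_toy = HydrophobicityPredictor()
P_toy.predict_sequences(recs, alleles=["allele1", "allele2"], path=outdir)
loaded = P_toy.load(outdir)
```

Writing one file per protein keeps peak memory bounded by the largest single protein rather than by the whole proteome, at the cost of a separate load step afterwards.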
In [1]:
import os, math, time, pickle, subprocess
from importlib import reload
from collections import OrderedDict
import numpy as np
import pandas as pd
pd.set_option('display.width', 100)
import epitopepredict as ep
from epitopepredict import sequtils, tepitope, plotting, utilities, peptutils
from IPython.display import display, HTML, Image
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
In [2]:
# get preset allele lists for MHC class II and class I supertypes
m2_alleles = ep.get_preset_alleles('mhc2_supertypes')
m1_alleles = ep.get_preset_alleles('mhc1_supertypes')
print(m1_alleles)
print(m2_alleles)
In [3]:
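# create random peptide sequences and put them in a dataframe
seqs = peptutils.create_random_sequences(10)
print(seqs[:5])
df = pd.DataFrame(seqs, columns=['peptide'])
# score the peptides against the MHC class I alleles
P = ep.get_predictor('basicmhc1')
b = P.predict_peptides(df.peptide, alleles=m1_alleles, show_cmd=True, cpus=1)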
In [4]:
print(b[:5])
In [8]:
# load protein sequences into a dataframe
prots = ep.genbank_to_dataframe('../MTB-H37Rv.gb', cds=True)
prots[:5]
In [9]:
# run the tepitope predictor on the first 20 proteins for the MHC class II alleles
P = ep.get_predictor('tepitope')
mb_binders = P.predict_sequences(prots[:20], alleles=m2_alleles, cpus=1)
In [10]:
mb_binders.head()
In [5]:
pb = P.promiscuous_binders(n=3)
pb.shape
In [11]:
print(pb[:3])
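The idea behind the promiscuous_binders(n=3) call above is to keep peptides that are predicted binders in at least n alleles. A rough pandas sketch of that idea follows. It assumes a single global score cutoff and made-up column names (peptide, allele, score); the library's own cutoff handling and output columns may differ, and toy_promiscuous_binders is a hypothetical helper, not part of epitopepredict.

```python
import pandas as pd

# Made-up prediction table: one row per peptide/allele pair
preds = pd.DataFrame({
    "peptide": ["AAAAAAAAA", "AAAAAAAAA", "AAAAAAAAA",
                "CCCCCCCCC", "CCCCCCCCC", "DDDDDDDDD"],
    "allele": ["a1", "a2", "a3", "a1", "a2", "a1"],
    "score": [5.0, 4.0, 3.5, 6.0, 1.0, 7.0],
})


def toy_promiscuous_binders(df, cutoff, n):
    """Return peptides whose score reaches cutoff in at least n distinct alleles."""
    # keep only the peptide/allele pairs that count as binders
    binders = df[df["score"] >= cutoff]
    # count the distinct alleles each peptide binds
    counts = (binders.groupby("peptide")["allele"]
              .nunique()
              .rename("alleles")
              .reset_index())
    return counts[counts["alleles"] >= n].reset_index(drop=True)


result = toy_promiscuous_binders(preds, cutoff=3.0, n=3)
print(result)
```

Lowering n widens the selection: with n=1 every peptide that binds any allele at the chosen cutoff is returned.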
In [12]:
# get names of proteins stored in the results of the predictor
print(P.get_names())
# plot prediction tracks for one protein and save the figure
ax = plotting.plot_tracks([P], name='Rv0011c', cutoff=.94, n=2)
plt.tight_layout()
plt.savefig('mhc_rv0011c.png', dpi=150)
# plot a binder map for the same protein and save the figure
ax = plotting.plot_binder_map(P, name='Rv0011c', cutoff=10)
plt.savefig('mhc_rv0011c_map.png', dpi=150)
In [13]:
reload(plotting)
from bokeh.io import show, output_notebook
output_notebook()
p = plotting.bokeh_plot_tracks([P],name='Rv0011c',cutoff=.95,n=2,width=800)
show(p)