Flux Balance Analysis model predictions of essential genes

A quick look at how well the FBA model's gene-essentiality predictions agree with the essential-gene data from the Saccharomyces Genome Deletion Project.


In [1]:
import sys
from warnings import filterwarnings
from scipy import stats
import cobra
sys.path.append('../flux_balance_analysis')
from single_knockouts import single_knockout_loss_costs, single_knockout_modified_loss_cost
from double_knockouts import load_sd_minus_his

In [2]:
# Silence the warnings raised while loading the yeast 7.6 model.
filterwarnings('ignore', 'charge of s_[0-9][0-9][0-9][0-9] is not a number ()')
filterwarnings('ignore', 'uppercase AND/OR found in rule ')
model = load_sd_minus_his('../data/external/yeast_7.6/yeast_7.6.xml')
# Experimentally essential ORFs, one identifier per line.
with open('../data/processed/Essential_ORFs.txt', 'r') as f:
    essentialGenes = set([line.strip() for line in f])

In [3]:
# print(...) with a single formatted string runs unchanged on Python 2 and 3.
print('using gene-loss costs')
print('%d experimental essential genes' % len(essentialGenes))
genes = set([g.id for g in model.genes])
print('%d genes in model' % len(genes))
print('%d overlap' % len(genes.intersection(essentialGenes)))
_wtGrew, glc, flc = single_knockout_loss_costs(model)
modifiedCost = single_knockout_modified_loss_cost(model)


using gene-loss costs
1122 experimental essential genes
909 genes in model
139 overlap
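
The counts above mean the comparison can only cover the essential genes that the model actually contains. A minimal sketch of that coverage arithmetic follows. It uses only the three printed counts, and the variable names are illustrative rather than taken from the notebook.

```python
# Counts copied from the printed output above.
n_essential_experimental = 1122
n_model_genes = 909
n_overlap = 139

# Share of the experimental essential set that the model can be tested on.
frac_essential_covered = n_overlap / float(n_essential_experimental)
# Share of model genes that appear in the experimental essential list.
frac_model_essential = n_overlap / float(n_model_genes)

print('%.1f%% of experimental essential genes are in the model'
      % (100 * frac_essential_covered))
print('%.1f%% of model genes are experimentally essential'
      % (100 * frac_model_essential))
```

Only about one in eight experimentally essential genes is represented in the model, so any agreement statistic describes those 139 genes and the 909-gene model, not the full essential set.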

In [6]:
print('%d experimental essential genes' % len(essentialGenes))
genes = set([g.id for g in model.genes])
print('%d total genes in the FBA model' % len(genes))
print('%d overlap between the model genes and the essential genes.'
      % len(genes.intersection(essentialGenes)))
_wtGrew, glc, flc = single_knockout_loss_costs(model)
modifiedCost = single_knockout_modified_loss_cost(model)

def essential_gene_prediction_hypothesis_test(costs):
    # Genes whose knockout cost exceeds 0.999 are called essential by the model.
    modelPredictions = set([orfID for orfID, cost in costs.items() if cost > 0.999])
    print('%d predicted essential' % len(modelPredictions))
    tp = len(modelPredictions.intersection(essentialGenes))
    print('%d predicted correctly' % tp)
    fp = len(modelPredictions) - tp
    fn = len(genes.intersection(essentialGenes)) - tp
    tn = len(genes) - (tp + fp + fn)
    # 2x2 table: rows are experimental (non-essential, essential),
    # columns are predicted (non-essential, essential).
    odds, pval = stats.fisher_exact([[tn, fp], [fn, tp]])
    print('p-value %.1e' % pval)
    print('odds-ratio %.1f' % odds)


print('\nGene-loss')
essential_gene_prediction_hypothesis_test(glc)
print('\nFunction-loss')
essential_gene_prediction_hypothesis_test(flc)
print('\nGene-loss with non-redundant isoenzymes')
essential_gene_prediction_hypothesis_test(modifiedCost)


1122 experimental essential genes
909 total genes in the FBA model
139 overlap between the model genes and the essential genes.

Gene-loss
108 predicted essential
77 predicted correctly
p-value 1.3e-47
odds-ratio 29.6

Function-loss
144 predicted essential
64 predicted correctly
p-value 4.7e-21
odds-ratio 7.4

Gene-loss with non-redundant isoenzymes
165 predicted essential
80 predicted correctly
p-value 2.0e-31
odds-ratio 10.9
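
As a cross-check, the 2x2 tables behind these results can be rebuilt from the printed counts alone. The sketch below recomputes the sample odds ratio (tn * tp) / (fp * fn) for each cost measure and adds precision and recall, which the notebook does not report. The helper name `confusion_summary` and the constants are illustrative and not part of the original code. The p-values are not recomputed here, since that would need `scipy.stats.fisher_exact`.

```python
# Counts copied from the printed output above.
N_MODEL_GENES = 909
N_ESSENTIAL_IN_MODEL = 139

# (label, number predicted essential, number predicted correctly)
reported = [
    ('Gene-loss', 108, 77),
    ('Function-loss', 144, 64),
    ('Gene-loss with non-redundant isoenzymes', 165, 80),
]

def confusion_summary(n_predicted, tp,
                      n_genes=N_MODEL_GENES,
                      n_essential=N_ESSENTIAL_IN_MODEL):
    """Rebuild the 2x2 table from the printed counts (hypothetical helper)."""
    fp = n_predicted - tp            # predicted essential, not in the list
    fn = n_essential - tp            # essential in the list, missed by the model
    tn = n_genes - (tp + fp + fn)    # everything else in the model
    return {
        'tp': tp, 'fp': fp, 'fn': fn, 'tn': tn,
        'precision': tp / float(n_predicted),
        'recall': tp / float(n_essential),
        # Sample odds ratio of the table [[tn, fp], [fn, tp]].
        'odds_ratio': (tn * tp) / float(fp * fn),
    }

summaries = {}
for name, n_pred, tp in reported:
    s = confusion_summary(n_pred, tp)
    summaries[name] = s
    print('%s: precision %.2f, recall %.2f, odds ratio %.1f'
          % (name, s['precision'], s['recall'], s['odds_ratio']))
```

The recomputed odds ratios (29.6, 7.4 and 10.9) match the printed values. On these counts the gene-loss cost has the highest precision (about 0.71), while the non-redundant isoenzyme variant recovers slightly more of the essential genes (80 versus 77 of 139) at the price of more false positives.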