In [8]:
from msdas import *
%pylab inline
rcParams['savefig.dpi'] = 200
In [2]:
from cellnopt.core import *
from easydev import gsf
pkn = gsf("msdas", "data","PKN-yeastScaffold.sif")
print(pkn)
In [4]:
c = CNOGraph(pkn)
c._stimuli = ["a", "NaCl"]
c.plot()
Out[4]:
We will get rid of nodes DIG1, DIG2 and STE12 for now
In [5]:
y = yeast.YEAST2MIDAS(get_yeast_small_data(), get_yeast_raw_data(), verbose=False)
The data is made of 36 experients. We want to extract 3 time series into a MIDAS file. This is done internally to the export_pkn_midas function.
Another tricky part is that the nodes in the PKN are named after protein names. However, we know have protein_psites. In other words there are more than on node for each protein. Again, te export_pkn_and_midas function will take care of that.
Finally, we may be interested to know the errors, which is extracted from the replicates in the raw data.
The export_pkn_and_midas function returns
This cleaning is performed via the call to cleanup_june function
In [6]:
y.cleanup_june()
y.df.shape
Out[6]:
In [7]:
c,m,e = y.export_pkn_and_midas_june(pkn)
In [9]:
c.plot()
Out[9]:
In [27]:
rcParams['savefig.dpi'] = 150
m.plot()
In [10]:
e.create_empty_simulation()
e.plot()
Some errors are not defined (NA). Let us replace the NAs by a pessimistic value of 50% errors.
In [28]:
e.df.fillna(0.3, inplace=True) # can be called only once . Second call has no effect until you re-create e instance
e.save2midas("MD-YEAST_errors.csv") # valid irrespective of the data scaling
In [29]:
m.save2midas("MD-YEAST_raw.csv")
c.export2sif("PKN-YEAST.sif") # valid irrespective of the data scaling
In [30]:
m = y.get_midas()
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_min_max_across_exp.csv")
m.plot()
In [14]:
m = y.get_midas()
m.scale_max_across_experiments()
m.save2midas("MD-YEAST_max_across_exp.csv")
m.plot()
We will get rid of nodes dowstream of FUS3 (included) and downstream of HOG (excluded) to simplify the network. In the data itself, we will also ignore GPA1 and STE2, which are important but difficult to fit. Instead of having bad fits all over the place, we ignore them for now
In [15]:
# STE7 and STE5 are anyway not measured.
# FUS3 is ignored for now
# SKO1, downstream of HOG1
# y-STE18 and b-STE4 could be compressed anyway
# NAs in STE20, SSK1, PBS, STE50 so for now, ignored
c.collapse_nodes(["y-STE18", "STE7", "STE5", "FUS3_T173-S177-T180-Y182", "b-STE4", "SKO1_T215",
"STE20_T413-S418", "STE20_S192", "SSK1_S110",
"STE20_T546-S547", "STE20_T511"])
c.remove_nodes_from(["PBS2_S66-S68","STE50_S202"]) # some nodes cannot be collapsed
c.remove_edge("CDC42", "STE11_S323-S326")
In [16]:
#c.remove_edge("CDC42", "STE11_S323-S326")
c.remove_edge("YPD1", "SSK2_S53-S54~S57")
In [17]:
c.plot()
Out[17]:
In [42]:
c.export2sif("PKN-YEAST_simple.sif")
In [43]:
m.remove_species(["STE20_T413-S418", "STE20_S192", "SSK1_S110",
"GPA1_T189-S199-S200", "STE50_S202", "STE2_S331",
"PBS2_S66-S68", "STE20_T511", "STE20_T546-S547",
"FUS3_T173-S177-T180-Y182", "SKO1_T215"])
In [46]:
m.save2midas("MD-YEAST_simple_raw.csv")
m.plot()
In [47]:
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_simple_min_max_across_exp.csv")
m.plot()
In [49]:
e.df = e.df[m.df.columns]
e.df.fillna(0.5, inplace=True)
e.save2midas("MD-YEAST_simple_errors.csv")
Once you have the PKN and MIDAS (errors are optional), you can optimise the data to PKN. The model being large (from an ODE perspective), optimisation for 600 seconds at least is recommended. The more the better. There are different ways to optimise but you willl need CellNOptR/ CNORode packages.
From R, you can run this script
libraryCellNOptR) # version 1.10.1 pknmodel = readSIF("MD-YEAST_min_max_across_exp.csv") cnolist = CNOlist("MD-YEAST_min_max_across_exp.csv") cnolist = readErrors("MD-YEAST_simple_errors.csv") res = createLBodeContPars(pknmodel) res = parEstimationLBodeSSm(cnolist, pknmodel, ode_parameters=res, ndiverse=20, dim_refset=20, maxtime=60) # pdf("filename.pdf", width=30, height=20) # png("filename.png", width=1500, height=1000) sim = plotLBodeFitness(cnolist, pknmodel, res, plotParams=list(cex=0.5)) # dev.off() # to actually save/close the file
From python, you can use cellnopt.wrapper or cellnopt.pipeline; we will use the later for demonstration.
In [31]:
from cellnopt.pipeline import CNOode
In [32]:
p = CNOode("PKN-YEAST_simple.sif", "MD-YEAST_simple_min_max_across_exp.csv")
p.optimise(maxtime=10) # here we run just 10 minutes to show how to proceed
p.report() # a new window should open. We could use HTML (see next cell but
#buggy for now: image do not appear due to replative/absolute path issues)
In [14]:
from IPython.display import HTML
HTML("report_ode/index.html")
Out[14]:
Now, let us show the final results
In [15]:
from IPython.display import Image
In [ ]:
#img = WImage(filename='Fitness-YEAST_.pdf')
MSE = 0.075
In [82]:
Image(filename='Fitness-YEAST_min_max_across_exp.png', resolution=120)
Out[82]:
In [ ]:
MSE = 0.060
In [17]:
Image(filename='Fitness-YEAST_max_across_exp.png')
Out[17]:
In [94]:
m.df.columns
Out[94]:
In [18]:
Image(filename='Fitness-YEAST_simple_raw.png')
Out[18]:
Without SLN1 data, we get the same kind of results.
In [ ]:
In [ ]:
In [86]:
Image(filename='Fitness-YEAST_simple_min_max_across_exp.png', resolution=80)
Out[86]:
In [ ]: