Generation of the PKN, MIDAS and Errors (MIDAS format) from raw data (and small data set)


In [8]:
from msdas import *
%pylab inline
rcParams['savefig.dpi'] = 200


Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['e']
`%matplotlib` prevents importing * from pylab and numpy

1 . Inputs

1.1. PKN


In [2]:
from cellnopt.core import *
from easydev import gsf
pkn = gsf("msdas", "data","PKN-yeastScaffold.sif")
print(pkn)


/home/cokelaer/Work/svn-sysbiomed/trunk/cinapps/msdas/share/data/PKN-yeastScaffold.sif

Let us visualise it


In [4]:
c = CNOGraph(pkn)
c._stimuli = ["a", "NaCl"]
c.plot()


INFO:root:dot -Tpng /tmp/tmpz3Cd_8.dot -o /tmp/tmp8Hp1Lt.png
Out[4]:
<matplotlib.axes.Axes at 0x5b54390>

We will get rid of nodes DIG1, DIG2 and STE12 for now

2 . The data


In [5]:
y = yeast.YEAST2MIDAS(get_yeast_small_data(), get_yeast_raw_data(), verbose=False)


/home/cokelaer/Work/svn-sysbiomed/trunk/cinapps/msdas/src/msdas/readers.py:640: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  self.df['Sequence'] = list(self.df['Sequence_Phospho'].values)
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/pandas/core/indexing.py:344: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  self.obj[item] = s
WARNING:root:Rebuilding identifier in the dataframe. MERGED prefixes will be lost
WARNING:root:Identifiers are not unique. Have you called merge_peptides() ?
INFO:root:Filling Nas with zeros if any
INFO:root:data is in `df` attribute and full raw data with replicates in `replicates` attribute

2. Export expanded PKN and MIDAS data

The data is made of 36 experients. We want to extract 3 time series into a MIDAS file. This is done internally to the export_pkn_midas function.

Another tricky part is that the nodes in the PKN are named after protein names. However, we know have protein_psites. In other words there are more than on node for each protein. Again, te export_pkn_and_midas function will take care of that.

Finally, we may be interested to know the errors, which is extracted from the replicates in the raw data.

The export_pkn_and_midas function returns

  • expanded graph
  • midas
  • errors

However, we first want to

  • Get rid the row in the big data set
  • Use a mapping from raw to small data. Indeed identifier (protein_psites) are not the same. This ishard coded and would fail in another context of if the merge_peptides function changes.
  • replace small data (df) with the raw data (normalised and averaged), which means that we will now have NAs in the small data
  • Get rid of rows in the small data set that have no mapping.

This cleaning is performed via the call to cleanup_june function


In [6]:
y.cleanup_june()
y.df.shape


INFO:root:Filling Nas with zeros if any
57
Out[6]:
(57, 43)

In [7]:
c,m,e = y.export_pkn_and_midas_june(pkn)


reanming PTP2 -> PTP2_Y257-S258
reanming HOT1 -> HOT1_S153
reanming STE11 -> STE11_S323-S326-S326
reanming STE12 -> STE12_S400
reanming SLN1 -> SLN1_S833
reanming STE50 -> STE50_S202
reanming GPA1 -> GPA1_T189-S199-S200
reanming STE2 -> STE2_S331
reanming FAR1 -> FAR1_S114
reanming SSK2 -> SSK2_S53-S54~S57
building
2
3
4
building
2
3
4

The new PKN


In [9]:
c.plot()


INFO:root:dot -Tpng /tmp/tmpbX2FXQ.dot -o /tmp/tmpbEaata.png
Out[9]:
<matplotlib.axes.Axes at 0x8fd8910>

The MIDAS file


In [27]:
rcParams['savefig.dpi'] = 150
m.plot()


The Errors


In [10]:
e.create_empty_simulation()
e.plot()


Some errors are not defined (NA). Let us replace the NAs by a pessimistic value of 50% errors.

3 . Create the data

3.1 RAW case


In [28]:
e.df.fillna(0.3, inplace=True)  # can be called only once . Second call has no effect until you re-create e instance
e.save2midas("MD-YEAST_errors.csv") # valid irrespective of the data scaling

In [29]:
m.save2midas("MD-YEAST_raw.csv")
c.export2sif("PKN-YEAST.sif") # valid irrespective of the data scaling

3.2 scaled min/max acrosss experiments


In [30]:
m = y.get_midas()
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_min_max_across_exp.csv")
m.plot()


building
2
3
4

3.3 scaled max across experiments


In [14]:
m = y.get_midas()
m.scale_max_across_experiments()
m.save2midas("MD-YEAST_max_across_exp.csv")
m.plot()


building
2
3
4

4. Create a simple case (tagged "simple")

We will get rid of nodes dowstream of FUS3 (included) and downstream of HOG (excluded) to simplify the network. In the data itself, we will also ignore GPA1 and STE2, which are important but difficult to fit. Instead of having bad fits all over the place, we ignore them for now


In [15]:
# STE7 and STE5 are anyway not measured. 
# FUS3 is ignored for now
# SKO1, downstream of HOG1
# y-STE18 and b-STE4 could be compressed anyway
# NAs in STE20, SSK1, PBS, STE50 so for now, ignored
c.collapse_nodes(["y-STE18", "STE7", "STE5", "FUS3_T173-S177-T180-Y182", "b-STE4", "SKO1_T215",
   "STE20_T413-S418", "STE20_S192", "SSK1_S110",
  "STE20_T546-S547", "STE20_T511"])
c.remove_nodes_from(["PBS2_S66-S68","STE50_S202"]) # some nodes cannot be collapsed
c.remove_edge("CDC42", "STE11_S323-S326")


WARNING:root:unknown case (no output or input ?). Node FUS3_T173-S177-T180-Y182 removed
WARNING:root:unknown case (no output or input ?). Node SKO1_T215 removed

In [16]:
#c.remove_edge("CDC42", "STE11_S323-S326")
c.remove_edge("YPD1", "SSK2_S53-S54~S57")

In [17]:
c.plot()


INFO:root:dot -Tpng /tmp/tmpuOtdis.dot -o /tmp/tmpFGGwNu.png
Out[17]:
<matplotlib.axes.Axes at 0x5f4a250>

In [42]:
c.export2sif("PKN-YEAST_simple.sif")

now let us clean up the MIDAS file from species that have been removed + the GPA1 and STE2 , which are kept in the PKN


In [43]:
m.remove_species(["STE20_T413-S418", "STE20_S192", "SSK1_S110",
     "GPA1_T189-S199-S200", "STE50_S202", "STE2_S331", 
     "PBS2_S66-S68", "STE20_T511", "STE20_T546-S547", 
     "FUS3_T173-S177-T180-Y182", "SKO1_T215"])

In [46]:
m.save2midas("MD-YEAST_simple_raw.csv")
m.plot()



In [47]:
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_simple_min_max_across_exp.csv")
m.plot()



In [49]:
e.df = e.df[m.df.columns]
e.df.fillna(0.5, inplace=True)
e.save2midas("MD-YEAST_simple_errors.csv")

5. Optimisation

Once you have the PKN and MIDAS (errors are optional), you can optimise the data to PKN. The model being large (from an ODE perspective), optimisation for 600 seconds at least is recommended. The more the better. There are different ways to optimise but you willl need CellNOptR/ CNORode packages.

From R, you can run this script

libraryCellNOptR) # version 1.10.1
pknmodel = readSIF("MD-YEAST_min_max_across_exp.csv")
cnolist = CNOlist("MD-YEAST_min_max_across_exp.csv")
cnolist = readErrors("MD-YEAST_simple_errors.csv")
res = createLBodeContPars(pknmodel)
res = parEstimationLBodeSSm(cnolist, pknmodel, ode_parameters=res, ndiverse=20, dim_refset=20, maxtime=60)
# pdf("filename.pdf", width=30, height=20)
# png("filename.png", width=1500, height=1000)
sim = plotLBodeFitness(cnolist, pknmodel, res, plotParams=list(cex=0.5))
# dev.off() # to actually save/close the file

From python, you can use cellnopt.wrapper or cellnopt.pipeline; we will use the later for demonstration.


In [31]:
from cellnopt.pipeline import CNOode

In [32]:
p = CNOode("PKN-YEAST_simple.sif", "MD-YEAST_simple_min_max_across_exp.csv")
p.optimise(maxtime=10) # here we run just 10 minutes to show how to proceed
p.report()  # a new window should open. We could use HTML (see next cell but 
#buggy for now: image do not appear due to replative/absolute path issues)


INFO:root:dot -Tpng /tmp/tmphy5Bth.dot -o report_ode/pknmodel.png
INFO:root:dot -Tpng /tmp/tmppkTZbQ.dot -o report_ode/expmodel.png
INFO:root:dot -Tpng /tmp/tmpzf_Mp_.dot -o report_ode/ode_parameters_k.png
INFO:root:dot -Tpng /tmp/tmpSWBPyI.dot -o report_ode/ode_parameters_n.png
INFO:root:
        library(CNORode)
        pknmodel = readSIF("/tmp/tmp6HYRyi")
        cnolist = CNOlist("/tmp/tmpFBh4FR")
        load("ode_params.RData")
        sim_data = plotLBodeFitness(cnolist, pknmodel, ode_parameters=ode_params)

        sim = do.call(rbind, sim_data)

        signals = colnames(cnolist@signals$`0`)

        colnames(sim) = signals
        write.csv(sim, "sim.csv")

        
INFO:root:------------------
INFO:root:Running the R script. Make take a while depending on your parameters and size of the model/data sets
WARNING:root:Existing directory report_ode. Files may be overwritten
openning report_ode/index.html
Warning: over-writing report_ode/config.ini 

In [14]:
from IPython.display import HTML
HTML("report_ode/index.html")


Out[14]:
CellNOpt pipeline report

CNOode analysis summary

Report created with cellnopt.pipeline.cnoode.CNOode (version unkonwn)

See cellnopt homepage for downloads and documentation.

1 - Input data files

some basic stats about the pkn and data e.g. number of species ? or in the pkn section?

3 - Script used


        library(CNORode)
        pknmodel = readSIF("/tmp/tmpB6eBQF")
        cnolist = CNOlist("/tmp/tmpcrEHxj")
                ode_params = createLBodeContPars(pknmodel)

        ode_params = parEstimationLBodeSSm(cnolist, pknmodel, maxtime=10, 
            dim_refset=10, verbose=F, ndiverse=10, ode_parameters=ode_params)
        
        save(ode_params, file="ode_params.RData")
        res = ode_params
        
            score=res$ssm_results$f), "/tmp/tmpc5Xi2_.csv")

        o.report()

8 - Reproducibility

In a shell, go to the report directory (where is contained this report) and either execute the file called rerun.py or typoe these commands in a python shell

        from cellnopt.pipeline import *
        c = CNObool(config=config.ini)
        c.gaBinaryT1()
        c.report()
        

You will need the configuration file that was used in the report ( config.ini )

New results will be put in a sub directory so that your current report is not overwritten

9 - stats

Best Score 0.109648854488
Computation time 10.174
cellnopt.pipeline 0.0.2
cellnopt.core 0.6.0
easydev 0.6.12
docutils 0.11
pandas 0.13.0
MarkupSafe 0.19
browse 1.0.1
cellnopt.data 1.0.3
pygraphviz 1.2
Sphinx 1.2.2
six 1.5.2
Pygments 1.6
numpy 1.8.0
networkx 1.8.1
python-dateutil 2.2
Jinja2 2.7.2
beautifulsoup4 4.3.1
pytz 2013b
CNORode 1.5.1

Created on 2014-06-13 09:39:27.087983 by cokelaer

Now, let us show the final results


In [15]:
from IPython.display import Image

5.1 raw (normalised in cno)


In [ ]:
#img = WImage(filename='Fitness-YEAST_.pdf')

5.2 scaled min max across experiments case

MSE = 0.075


In [82]:
Image(filename='Fitness-YEAST_min_max_across_exp.png', resolution=120)


Out[82]:

In [ ]:

5.3 scaled max across experiments case

MSE = 0.060


In [17]:
Image(filename='Fitness-YEAST_max_across_exp.png')


Out[17]:

5.4 simple case (raw)


In [94]:
m.df.columns


Out[94]:
Index([u'HOG1_T174-Y176', u'HOG1_T174~Y176', u'PBS2_S248', u'PBS2_S269', u'PTP2_Y257-S258', u'SLN1_S833', u'SSK1_S193~S195', u'SSK1_S673', u'SSK2_S53-S54~S57', u'STE11_S323-S326', u'STE20_S169-T170~T172', u'STE20_S195-S196', u'STE20_T203~T207-T217-T218', u'STE20_T573-T575'], dtype='object')

In [18]:
Image(filename='Fitness-YEAST_simple_raw.png')


Out[18]:

Without SLN1 data, we get the same kind of results.

5.4 simple case (min max across experiments)


In [ ]:


In [ ]:


In [86]:
Image(filename='Fitness-YEAST_simple_min_max_across_exp.png', resolution=80)


Out[86]:

In [ ]: