Generation of the PKN, MIDAS and Errors (MIDAS format) from raw data (and small data set)



In [8]:

    
from msdas import *
%pylab inline
rcParams['savefig.dpi'] = 200









    



Populating the interactive namespace from numpy and matplotlib






    



WARNING: pylab import has clobbered these variables: ['e']
`%matplotlib` prevents importing * from pylab and numpy

1 . Inputs

1.1. PKN



In [2]:

    
from cellnopt.core import *
from easydev import gsf
pkn = gsf("msdas", "data","PKN-yeastScaffold.sif")
print(pkn)









    



/home/cokelaer/Work/svn-sysbiomed/trunk/cinapps/msdas/share/data/PKN-yeastScaffold.sif

Let us visualise it



In [4]:

    
c = CNOGraph(pkn)
c._stimuli = ["a", "NaCl"]
c.plot()









    



INFO:root:dot -Tpng /tmp/tmpz3Cd_8.dot -o /tmp/tmp8Hp1Lt.png






    Out[4]:





<matplotlib.axes.Axes at 0x5b54390>

We will get rid of nodes DIG1, DIG2 and STE12 for now

2 . The data



In [5]:

    
y = yeast.YEAST2MIDAS(get_yeast_small_data(), get_yeast_raw_data(), verbose=False)









    



/home/cokelaer/Work/svn-sysbiomed/trunk/cinapps/msdas/src/msdas/readers.py:640: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  self.df['Sequence'] = list(self.df['Sequence_Phospho'].values)
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/pandas/core/indexing.py:344: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  self.obj[item] = s
WARNING:root:Rebuilding identifier in the dataframe. MERGED prefixes will be lost
WARNING:root:Identifiers are not unique. Have you called merge_peptides() ?
INFO:root:Filling Nas with zeros if any
INFO:root:data is in `df` attribute and full raw data with replicates in `replicates` attribute

2. Export expanded PKN and MIDAS data

The data is made of 36 experients. We want to extract 3 time series into a MIDAS file. This is done internally to the export_pkn_midas function.

Another tricky part is that the nodes in the PKN are named after protein names. However, we know have protein_psites. In other words there are more than on node for each protein. Again, te export_pkn_and_midas function will take care of that.

Finally, we may be interested to know the errors, which is extracted from the replicates in the raw data.

The export_pkn_and_midas function returns

expanded graph
midas
errors

However, we first want to

Get rid the row in the big data set
Use a mapping from raw to small data. Indeed identifier (protein_psites) are not the same. This ishard coded and would fail in another context of if the merge_peptides function changes.
replace small data (df) with the raw data (normalised and averaged), which means that we will now have NAs in the small data
Get rid of rows in the small data set that have no mapping.

This cleaning is performed via the call to cleanup_june function



In [6]:

    
y.cleanup_june()
y.df.shape









    



INFO:root:Filling Nas with zeros if any






    



57






    Out[6]:





(57, 43)



In [7]:

    
c,m,e = y.export_pkn_and_midas_june(pkn)









    



reanming PTP2 -> PTP2_Y257-S258
reanming HOT1 -> HOT1_S153
reanming STE11 -> STE11_S323-S326-S326
reanming STE12 -> STE12_S400
reanming SLN1 -> SLN1_S833
reanming STE50 -> STE50_S202
reanming GPA1 -> GPA1_T189-S199-S200
reanming STE2 -> STE2_S331
reanming FAR1 -> FAR1_S114
reanming SSK2 -> SSK2_S53-S54~S57
building
2
3
4
building
2
3
4

The new PKN



In [9]:

    
c.plot()









    



INFO:root:dot -Tpng /tmp/tmpbX2FXQ.dot -o /tmp/tmpbEaata.png






    Out[9]:





<matplotlib.axes.Axes at 0x8fd8910>

The MIDAS file



In [27]:

    
rcParams['savefig.dpi'] = 150
m.plot()

The Errors



In [10]:

    
e.create_empty_simulation()
e.plot()

Some errors are not defined (NA). Let us replace the NAs by a pessimistic value of 50% errors.

3 . Create the data

3.1 RAW case



In [28]:

    
e.df.fillna(0.3, inplace=True)  # can be called only once . Second call has no effect until you re-create e instance
e.save2midas("MD-YEAST_errors.csv") # valid irrespective of the data scaling



In [29]:

    
m.save2midas("MD-YEAST_raw.csv")
c.export2sif("PKN-YEAST.sif") # valid irrespective of the data scaling

3.2 scaled min/max acrosss experiments



In [30]:

    
m = y.get_midas()
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_min_max_across_exp.csv")
m.plot()









    



building
2
3
4

3.3 scaled max across experiments



In [14]:

    
m = y.get_midas()
m.scale_max_across_experiments()
m.save2midas("MD-YEAST_max_across_exp.csv")
m.plot()









    



building
2
3
4

4. Create a simple case (tagged "simple")

We will get rid of nodes dowstream of FUS3 (included) and downstream of HOG (excluded) to simplify the network. In the data itself, we will also ignore GPA1 and STE2, which are important but difficult to fit. Instead of having bad fits all over the place, we ignore them for now



In [15]:

    
# STE7 and STE5 are anyway not measured. 
# FUS3 is ignored for now
# SKO1, downstream of HOG1
# y-STE18 and b-STE4 could be compressed anyway
# NAs in STE20, SSK1, PBS, STE50 so for now, ignored
c.collapse_nodes(["y-STE18", "STE7", "STE5", "FUS3_T173-S177-T180-Y182", "b-STE4", "SKO1_T215",
   "STE20_T413-S418", "STE20_S192", "SSK1_S110",
  "STE20_T546-S547", "STE20_T511"])
c.remove_nodes_from(["PBS2_S66-S68","STE50_S202"]) # some nodes cannot be collapsed
c.remove_edge("CDC42", "STE11_S323-S326")









    



WARNING:root:unknown case (no output or input ?). Node FUS3_T173-S177-T180-Y182 removed
WARNING:root:unknown case (no output or input ?). Node SKO1_T215 removed



In [16]:

    
#c.remove_edge("CDC42", "STE11_S323-S326")
c.remove_edge("YPD1", "SSK2_S53-S54~S57")



In [17]:

    
c.plot()









    



INFO:root:dot -Tpng /tmp/tmpuOtdis.dot -o /tmp/tmpFGGwNu.png






    Out[17]:





<matplotlib.axes.Axes at 0x5f4a250>



In [42]:

    
c.export2sif("PKN-YEAST_simple.sif")

now let us clean up the MIDAS file from species that have been removed + the GPA1 and STE2 , which are kept in the PKN



In [43]:

    
m.remove_species(["STE20_T413-S418", "STE20_S192", "SSK1_S110",
     "GPA1_T189-S199-S200", "STE50_S202", "STE2_S331", 
     "PBS2_S66-S68", "STE20_T511", "STE20_T546-S547", 
     "FUS3_T173-S177-T180-Y182", "SKO1_T215"])



In [46]:

    
m.save2midas("MD-YEAST_simple_raw.csv")
m.plot()



In [47]:

    
m.scale_min_max_across_experiments()
m.save2midas("MD-YEAST_simple_min_max_across_exp.csv")
m.plot()



In [49]:

    
e.df = e.df[m.df.columns]
e.df.fillna(0.5, inplace=True)
e.save2midas("MD-YEAST_simple_errors.csv")

5. Optimisation

Once you have the PKN and MIDAS (errors are optional), you can optimise the data to PKN. The model being large (from an ODE perspective), optimisation for 600 seconds at least is recommended. The more the better. There are different ways to optimise but you willl need CellNOptR/ CNORode packages.

From R, you can run this script

libraryCellNOptR) # version 1.10.1
pknmodel = readSIF("MD-YEAST_min_max_across_exp.csv")
cnolist = CNOlist("MD-YEAST_min_max_across_exp.csv")
cnolist = readErrors("MD-YEAST_simple_errors.csv")
res = createLBodeContPars(pknmodel)
res = parEstimationLBodeSSm(cnolist, pknmodel, ode_parameters=res, ndiverse=20, dim_refset=20, maxtime=60)
# pdf("filename.pdf", width=30, height=20)
# png("filename.png", width=1500, height=1000)
sim = plotLBodeFitness(cnolist, pknmodel, res, plotParams=list(cex=0.5))
# dev.off() # to actually save/close the file

From python, you can use cellnopt.wrapper or cellnopt.pipeline; we will use the later for demonstration.



In [31]:

    
from cellnopt.pipeline import CNOode



In [32]:

    
p = CNOode("PKN-YEAST_simple.sif", "MD-YEAST_simple_min_max_across_exp.csv")
p.optimise(maxtime=10) # here we run just 10 minutes to show how to proceed
p.report()  # a new window should open. We could use HTML (see next cell but 
#buggy for now: image do not appear due to replative/absolute path issues)









    



INFO:root:dot -Tpng /tmp/tmphy5Bth.dot -o report_ode/pknmodel.png
INFO:root:dot -Tpng /tmp/tmppkTZbQ.dot -o report_ode/expmodel.png
INFO:root:dot -Tpng /tmp/tmpzf_Mp_.dot -o report_ode/ode_parameters_k.png
INFO:root:dot -Tpng /tmp/tmpSWBPyI.dot -o report_ode/ode_parameters_n.png
INFO:root:
        library(CNORode)
        pknmodel = readSIF("/tmp/tmp6HYRyi")
        cnolist = CNOlist("/tmp/tmpFBh4FR")
        load("ode_params.RData")
        sim_data = plotLBodeFitness(cnolist, pknmodel, ode_parameters=ode_params)

        sim = do.call(rbind, sim_data)

        signals = colnames(cnolist@signals$`0`)

        colnames(sim) = signals
        write.csv(sim, "sim.csv")

        
INFO:root:------------------
INFO:root:Running the R script. Make take a while depending on your parameters and size of the model/data sets
WARNING:root:Existing directory report_ode. Files may be overwritten






    



openning report_ode/index.html
Warning: over-writing report_ode/config.ini



In [14]:

    
from IPython.display import HTML
HTML("report_ode/index.html")









    Out[14]:







 
  
  
   CellNOpt pipeline report
  
  
  
 
 
  
   
    CNOode analysis summary
   
   
    Report created with cellnopt.pipeline.cnoode.CNOode (version unkonwn)
   
   
    See
    
     cellnopt homepage
    
    for downloads and documentation.
   
   
    
     
      Contents
     
    
    
     
      
       1 -
       
        Input data files
       
      
      
       2 -
       
        Description
       
      
      
       3 -
       
        Script used
       
      
      
       4 -
       
        PKN graph
       
      
      
       5 -
       
        Expanded before optimisation
       
      
      
       6 -
       
        Optimised model
       
      
      
       7 -
       
        Errors
       
      
      
       8 -
       
        Reproducibility
       
      
      
       9 -
       
        stats
       
      
     
    
   
   
    
     1 -
     
      Input data files
     
    
    
     
      
       input model (PKN)
      
     
     
      
       input data (MIDAS)
      
     
     
      
       Config file
      
     
     
      
       Script
      
     
    
    
     some basic stats about the pkn and data e.g. number of species ? or in the pkn section?
    
   
   
    
     2 -
     
      Description
     
    
    
     
    
   
   
    
     3 -
     
      Script used
     
    
            library(CNORode)
        pknmodel = readSIF("/tmp/tmpB6eBQF")
        cnolist = CNOlist("/tmp/tmpcrEHxj")
                ode_params = createLBodeContPars(pknmodel)

        ode_params = parEstimationLBodeSSm(cnolist, pknmodel, maxtime=10, 
            dim_refset=10, verbose=F, ndiverse=10, ode_parameters=ode_params)
        
        save(ode_params, file="ode_params.RData")
        res = ode_params
        
            score=res$ssm_results$f), "/tmp/tmpc5Xi2_.csv")

        o.report()

   
   
    
     4 -
     
      PKN graph
     
    
    
     
      Documentation
     
     
      
       
        cnograph
       
      
     
    
    
     
    
    
     
    
   
   
    
     5 -
     
      Expanded before optimisation
     
    
    
    
   
   
    
     6 -
     
      Optimised model
     
    
    
     '
     
      
      
     
    
   
   
    
     7 -
     
      Errors
     
    
    
    
   
   
    
     8 -
     
      Reproducibility
     
    
    
     In a shell, go to the report directory (where is contained this report)
        and either execute the file called rerun.py or typoe these commands in 
        a python shell
    
            from cellnopt.pipeline import *
        c = CNObool(config=config.ini)
        c.gaBinaryT1()
        c.report()
        
    
     You will need the configuration file that was used in the report 
        (
     
      config.ini
     
     )
    
    
     New results will be put in a sub directory so that your current 
        report is not overwritten
    
   
   
    
     9 -
     
      stats
     
    
    
     
      
       Best Score
      
      
       0.109648854488
      
     
     
      
       Computation time
      
      
       10.174
      
     
    
    
    
   
   
    
     
      cellnopt.pipeline
     
     
      0.0.2
     
    
    
     
      cellnopt.core
     
     
      0.6.0
     
    
    
     
      easydev
     
     
      0.6.12
     
    
    
     
      docutils
     
     
      0.11
     
    
    
     
      pandas
     
     
      0.13.0
     
    
    
     
      MarkupSafe
     
     
      0.19
     
    
    
     
      browse
     
     
      1.0.1
     
    
    
     
      cellnopt.data
     
     
      1.0.3
     
    
    
     
      pygraphviz
     
     
      1.2
     
    
    
     
      Sphinx
     
     
      1.2.2
     
    
    
     
      six
     
     
      1.5.2
     
    
    
     
      Pygments
     
     
      1.6
     
    
    
     
      numpy
     
     
      1.8.0
     
    
    
     
      networkx
     
     
      1.8.1
     
    
    
     
      python-dateutil
     
     
      2.2
     
    
    
     
      Jinja2
     
     
      2.7.2
     
    
    
     
      beautifulsoup4
     
     
      4.3.1
     
    
    
     
      pytz
     
     
      2013b
     
    
    
     
      CNORode
     
     
      1.5.1
     
    
   
   
    
     Created on 2014-06-13 09:39:27.087983 by cokelaer

Now, let us show the final results



In [15]:

    
from IPython.display import Image

5.1 raw (normalised in cno)



In [ ]:

    
#img = WImage(filename='Fitness-YEAST_.pdf')

5.2 scaled min max across experiments case

MSE = 0.075



In [82]:

    
Image(filename='Fitness-YEAST_min_max_across_exp.png', resolution=120)









    Out[82]:



In [ ]:

5.3 scaled max across experiments case

MSE = 0.060



In [17]:

    
Image(filename='Fitness-YEAST_max_across_exp.png')









    Out[17]:

5.4 simple case (raw)



In [94]:

    
m.df.columns









    Out[94]:





Index([u'HOG1_T174-Y176', u'HOG1_T174~Y176', u'PBS2_S248', u'PBS2_S269', u'PTP2_Y257-S258', u'SLN1_S833', u'SSK1_S193~S195', u'SSK1_S673', u'SSK2_S53-S54~S57', u'STE11_S323-S326', u'STE20_S169-T170~T172', u'STE20_S195-S196', u'STE20_T203~T207-T217-T218', u'STE20_T573-T575'], dtype='object')



In [18]:

    
Image(filename='Fitness-YEAST_simple_raw.png')









    Out[18]:

Without SLN1 data, we get the same kind of results.

5.4 simple case (min max across experiments)



In [ ]:



In [ ]:



In [86]:

    
Image(filename='Fitness-YEAST_simple_min_max_across_exp.png', resolution=80)









    Out[86]:



In [ ]:

cellnopt.pipeline	0.0.2
cellnopt.core	0.6.0
easydev	0.6.12
docutils	0.11
pandas	0.13.0
MarkupSafe	0.19
browse	1.0.1
cellnopt.data	1.0.3
pygraphviz	1.2
Sphinx	1.2.2
six	1.5.2
Pygments	1.6
numpy	1.8.0
networkx	1.8.1
python-dateutil	2.2
Jinja2	2.7.2
beautifulsoup4	4.3.1
pytz	2013b
CNORode	1.5.1

Generation of the PKN, MIDAS and Errors (MIDAS format) from raw data (and small data set)

1 . Inputs

1.1. PKN

Let us visualise it

2 . The data

2. Export expanded PKN and MIDAS data

The new PKN

The MIDAS file

The Errors

3 . Create the data

3.1 RAW case

3.2 scaled min/max acrosss experiments

3.3 scaled max across experiments

4. Create a simple case (tagged "simple")

now let us clean up the MIDAS file from species that have been removed + the GPA1 and STE2 , which are kept in the PKN

5. Optimisation

CNOode analysis summary

Report created with cellnopt.pipeline.cnoode.CNOode (version unkonwn)

Contents

1 - Input data files

2 - Description

3 - Script used

4 - PKN graph

5 - Expanded before optimisation

6 - Optimised model

7 - Errors

8 - Reproducibility

9 - stats

5.1 raw (normalised in cno)

5.2 scaled min max across experiments case

5.3 scaled max across experiments case

5.4 simple case (raw)

5.4 simple case (min max across experiments)