Example application for NetworKIN-based analysis

Here, we briefly introduce the utility functions in kinact that enable the use of NetworKIN. As before, we start by loading the example data from de Graaf et al., which is included in the package.


In [2]:
import kinact
data_log2, data_p_value = kinact.get_example_data()
print(data_log2.head())


                 5min     10min     20min     30min     60min
ID                                                           
A0AVK6_S71  -0.319306 -0.484960 -0.798082 -0.856103 -0.928753
A0FGR8_S743 -0.856661 -0.981951 -1.500412 -1.441868 -0.861470
A0FGR8_S758 -1.445386 -2.397915 -2.692994 -2.794762 -1.553398
A0FGR8_S691  0.271458  0.264596  0.501685  0.461984  0.655501
A0JLT2_S226 -0.080786  1.069710  0.519780  0.520883 -0.296040

NetworKIN takes two files as input:

  • fasta_file: a file containing the sequences of all proteins of interest
  • site_file: a file listing all phosphosites, one per line, as tab-separated columns: ID, position, residue
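To make the two layouts concrete, here is a minimal sketch of writing both files by hand. The helper write_networkin_inputs is hypothetical (it is not part of kinact); it assumes that the IDs in the site file match the FASTA header identifiers, and it borrows the file names fasta_file.txt and site_file.txt from the NetworKIN command example further down.

```python
import os
import tempfile


def write_networkin_inputs(sequences, sites, output_dir):
    """Hypothetical helper (not part of kinact): write fasta_file.txt and site_file.txt.

    sequences: dict mapping protein ID -> amino-acid sequence.
    sites: iterable of (protein ID, position, residue) tuples.
    Assumes the site-file IDs match the FASTA header identifiers.
    """
    os.makedirs(output_dir, exist_ok=True)
    fasta_path = os.path.join(output_dir, 'fasta_file.txt')
    site_path = os.path.join(output_dir, 'site_file.txt')
    with open(fasta_path, 'w') as fh:
        for protein_id, sequence in sequences.items():
            fh.write('>{}\n'.format(protein_id))
            # Wrap at 60 residues per line (a common FASTA convention, not a requirement here).
            for start in range(0, len(sequence), 60):
                fh.write(sequence[start:start + 60] + '\n')
    with open(site_path, 'w') as fh:
        for protein_id, position, residue in sites:
            # One phosphosite per line: ID <tab> position <tab> residue
            fh.write('{}\t{}\t{}\n'.format(protein_id, position, residue))
    return fasta_path, site_path


# Toy usage with a made-up protein, written into a temporary directory.
fasta_path, site_path = write_networkin_inputs(
    {'PROT1': 'MKSPAAS'}, [('PROT1', 3, 'S')], tempfile.mkdtemp())
```

In practice prepare_networkin_files takes care of this, including fetching the sequences; the sketch only illustrates the file layout.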

The function prepare_networkin_files produces both files, in the required layout, in a specified directory. Its input is a list of phosphosites in the format UniprotAccession_ResiduePosition (e.g. A0AVK6_S71, as in the index of the example data).
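The mapping from a phosphosite ID to a site-file line can be sketched as follows. The regular expression and the helper names are hypothetical; the sketch assumes that everything before the last underscore is the accession, followed by a single residue letter and the position.

```python
import re

# Hypothetical parser for IDs such as 'A0AVK6_S71':
# everything before the last underscore is the accession,
# then one residue letter, then the position.
SITE_ID = re.compile(r'^(?P<accession>.+)_(?P<residue>[A-Z])(?P<position>\d+)$')


def parse_site_id(site_id):
    """Split 'ACCESSION_RESIDUEPOSITION' into (accession, position, residue)."""
    match = SITE_ID.match(site_id)
    if match is None:
        raise ValueError('unexpected phosphosite ID: {!r}'.format(site_id))
    return match.group('accession'), int(match.group('position')), match.group('residue')


def to_site_line(site_id):
    """Format one phosphosite as a tab-separated site-file line."""
    accession, position, residue = parse_site_id(site_id)
    return '{}\t{}\t{}'.format(accession, position, residue)


print(to_site_line('A0AVK6_S71'))
```

IDs that do not follow the pattern raise a ValueError instead of silently producing a malformed line.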


In [3]:
kinact.networkin.prepare_networkin_files(phospho_sites=data_log2.index.tolist(), 
                                         output_dir='./networkin_example_files/', 
                                         organism='human')


Files for NetworKIN analysis successfully saved in ./networkin_example_files/

Usage of NetworKIN

Web-Interface

NetworKIN can be used via the high-throughput version of its web interface. To do so, select 'Human - UniProt' or 'Yeast - Uniprot' from the drop-down menu and paste the contents of the file 'site_file.txt' into the dedicated field. Some phosphosites may not be matched correctly because of differences between UniProt database versions; these have to be removed manually. After you click 'Submit', NetworKIN tries to map the UniProt identifiers to STRING in order to integrate contextual information into the prediction. The next page displays any problems with the matching and prompts you to select isoforms or homologs. After you click 'Next' at the bottom of that page, NetworKIN predicts the likely upstream kinases. On the results page, click the 'Save' button, select 'Full Dataset' and save the file as output.txt in ./networkin_example_files/, where the code below expects it.
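If many sites have to be removed, a small filter can help. This is a hypothetical helper, not part of kinact or NetworKIN: it assumes you have collected the protein IDs flagged by the web interface into a set by hand, and it drops every site-file line whose first column is one of them.

```python
def remove_unmatched_sites(site_lines, unmatched_ids):
    """Hypothetical helper: drop site-file lines whose protein ID is in unmatched_ids.

    site_lines: lines in the 'ID<tab>position<tab>residue' layout.
    unmatched_ids: protein IDs collected by hand from the web interface.
    Blank lines are dropped as well.
    """
    kept = []
    for line in site_lines:
        if not line.strip():
            continue
        protein_id = line.split('\t', 1)[0]
        if protein_id not in unmatched_ids:
            kept.append(line)
    return kept


# Made-up IDs; pretend PROT2 was flagged as unmatched.
lines = ['PROT1\t71\tS', 'PROT2\t743\tS', 'PROT3\t226\tT']
print(remove_unmatched_sites(lines, {'PROT2'}))
```

Filtering whole proteins is a simplification; if only individual sites fail to match, the check would need to compare (ID, position) pairs instead.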

Locally

NetworKIN can also be run locally, which may be more practical depending on the number of phosphosites in your dataset. To do so, download the NetworKIN release, the NetPhorest release and BLAST from their respective websites (important: BLAST has to be version 2.2.17, which can be found here). NetPhorest then has to be compiled with a gcc compiler of version 3.x, for example:

cd "NetPhorest-directory"
cc -O3 -o netphorest netphorest.c -lm

The prediction can then be performed with the following command:

python "path to NetworKIN.py" -n "path to netphorest" -b "path to blast" "Taxon Identifier for organism of interest" fasta_file site_file

For example, for human (NCBI taxonomy ID 9606):

python ./NetworKIN.py -n ../netphorest/netphorest -b ../blast-2.2.17/bin/blastall 9606 ./fasta_file.txt ./site_file.txt > output.txt
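If you prefer to launch NetworKIN from Python, the same call can be assembled as an argument list and passed to subprocess. The helpers below are hypothetical; they only mirror the -n and -b options of the command shown above, and the python argument lets you point to whichever interpreter your NetworKIN release requires.

```python
import subprocess


def build_networkin_command(networkin_py, netphorest, blastall, taxon_id,
                            fasta_file, site_file, python='python'):
    """Hypothetical helper: argument list mirroring the command line shown above."""
    return [python, networkin_py, '-n', netphorest, '-b', blastall,
            str(taxon_id), fasta_file, site_file]


def run_networkin(command, output_path):
    """Run NetworKIN and write its stdout to output_path (like '> output.txt')."""
    with open(output_path, 'w') as out:
        subprocess.check_call(command, stdout=out)


cmd = build_networkin_command('./NetworKIN.py', '../netphorest/netphorest',
                              '../blast-2.2.17/bin/blastall', 9606,
                              './fasta_file.txt', './site_file.txt')
print(' '.join(cmd))
# run_networkin(cmd, 'output.txt')  # requires local NetworKIN, NetPhorest and BLAST installs
```

Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.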

The resulting output file can then be converted into an adjacency matrix of kinase-substrate relationships with the kinact function get_kinase_targets_from_networkin.
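Conceptually, this step turns a long list of (site, kinase, score) predictions into a sites-by-kinases matrix, discarding weak predictions. The sketch below illustrates the idea with pandas; the column names 'site', 'kinase' and 'score', the use of >= for the cut-off, and storing the score as the edge weight are all assumptions, so it should not be read as kinact's implementation.

```python
import pandas as pd


def predictions_to_adjacency(predictions, score_cut_off=1.0):
    """Hypothetical sketch: long-format predictions -> sites x kinases weight matrix.

    predictions: DataFrame with assumed columns 'site', 'kinase' and 'score'
    (the real NetworKIN output columns may be named differently).
    Pairs scoring below score_cut_off are dropped; kept pairs store their score,
    absent pairs are 0. Sites without any kept kinase do not appear as rows.
    """
    kept = predictions[predictions['score'] >= score_cut_off]
    return kept.pivot_table(index='site', columns='kinase', values='score',
                            aggfunc='max', fill_value=0)


toy = pd.DataFrame({'site': ['P1_S10', 'P1_S10', 'P2_T5'],
                    'kinase': ['KIN_A', 'KIN_B', 'KIN_A'],
                    'score': [2.5, 0.4, 1.2]})
adjacency = predictions_to_adjacency(toy, score_cut_off=1.0)
print(adjacency)
```

Using aggfunc='max' keeps the strongest prediction if the same site-kinase pair appears more than once.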


In [4]:
adjacency_matrix = kinact.networkin.get_kinase_targets_from_networkin('./networkin_example_files/output.txt', 
                                                                      add_omnipath=False, score_cut_off=1)

In [5]:
scores, p_values = kinact.networkin.weighted_mean(data_fc=data_log2['5min'], 
                                                  interactions=adjacency_matrix, 
                                                  mP=data_log2.values.mean(), 
                                                  delta=data_log2.values.std())
print(scores.sort_values(ascending=False).head())


PRKACA    0.734578
PDK4      0.627663
MAPK10    0.603200
MAPK13    0.534214
PAK6      0.514163
dtype: float64
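To give an intuition for the roles of mP and delta, here is a hedged sketch of a weighted-mean kinase score. It is not kinact's weighted_mean: it assumes the adjacency entries act as weights, scores each kinase by the weighted mean log2 fold change of its measured targets, and derives a two-sided p-value from a KSEA-style z-statistic (score - mP) * sqrt(m) / delta, where m is the number of targets. kinact may compute its scores and p-values differently.

```python
import math

import numpy as np
import pandas as pd


def weighted_mean_scores(data_fc, adjacency, mP, delta):
    """Hypothetical sketch, not kinact's implementation.

    data_fc: Series of log2 fold changes indexed by site ID.
    adjacency: DataFrame (sites x kinases) of non-negative weights, 0 = no edge.
    mP, delta: mean and standard deviation of the background fold changes.
    Returns (scores, p_values) as Series indexed by kinase.
    """
    scores, p_values = {}, {}
    measured = adjacency.index.intersection(data_fc.dropna().index)
    for kinase in adjacency.columns:
        weights = adjacency.loc[measured, kinase]
        weights = weights[weights > 0]
        if weights.empty:
            continue  # no measured targets for this kinase
        fold_changes = data_fc.loc[weights.index]
        score = float(np.average(fold_changes, weights=weights))
        # KSEA-style z-statistic: shift by mP, scale by delta / sqrt(m).
        z = (score - mP) * math.sqrt(len(weights)) / delta
        scores[kinase] = score
        p_values[kinase] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pd.Series(scores), pd.Series(p_values)


toy_fc = pd.Series({'s1': 1.0, 's2': 3.0, 's3': -2.0})
toy_adj = pd.DataFrame({'KIN_A': [1.0, 1.0, 0.0], 'KIN_B': [0.0, 0.0, 2.0]},
                       index=['s1', 's2', 's3'])
toy_scores, toy_p = weighted_mean_scores(toy_fc, toy_adj, mP=0.0, delta=1.0)
print(toy_scores.sort_values(ascending=False))
```

Because the z-statistic grows with sqrt(m), a kinase with many consistently regulated targets receives a smaller p-value than one with a single target of the same mean fold change.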
