In [ ]:

    
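import datetime as dt
# Aliased as object_module so the built-in `object` is not shadowed.
import doekbase.data_api.object as object_module

print(dt.datetime.now())  # timestamp before the metadata calls
# Service endpoints for the KBase CI environment.
services = {
    "workspace_service_url": "https://ci.kbase.us/services/ws/",
    "shock_service_url": "https://ci.kbase.us/services/shock-api/",
}
object_api = object_module.ObjectAPI(services, ref="PrototypeReferenceGenomes/kb|g.3899")
# Lightweight metadata calls; none of these fetch the full object data.
print(object_api.get_typestring())
print(object_api.get_id())
print(object_api.get_name())
print(object_api.get_info())
print(dt.datetime.now())  # timestamp after the metadata calls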



In [ ]:

    
# Time the retrieval of the full object data (reuses object_api from the previous cell).
print(dt.datetime.now())
data = object_api.get_data()
print(dt.datetime.now())
print(data)



In [ ]:

    
import datetime
import doekbase.data_api  # binds the name `doekbase`, which this cell uses directly

print(datetime.datetime.now())
ws = doekbase.data_api.browse('1013')
Ath = ws["kb|g.3899"]
print(Ath)
#CDS_List = Ath.object.get_feature_ids(type_list=['CDS'])
Proteins = Ath.object.get_proteins()
print(datetime.datetime.now())
# Print only the first protein as a quick check of the record layout.
for key, value in Proteins.items():
    print(value['protein_id'] + ' ' + value['function'] + ' ' + value['amino_acid_sequence'])
    break

Annotation

1) Retrieve ids and sequences of proteins
2) Feed into Annotation server
3) Returns ids and functional annotations of proteins
4) Update and save the functional annotation string in the function field of each protein

Input:
- Genome
- Transcriptome
- Protein Fasta (functional annotation could feasibly be stored in the "comment" field of the sequence header)
- FeatureSet (the RNA-Seq pipeline produces this, but at this point in time no sequences are directly associated with it, so sequences would have to be fetched from the reference genome object)

Output: Where possible, the output object type is the same as the input object type, albeit with updated functions
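The four annotation steps above can be sketched roughly as follows. This is only an illustration: it assumes protein records shaped like the ones printed in the notebook cell (protein_id, function, amino_acid_sequence), and both annotate_proteins and fake_annotation_server are hypothetical names, the latter standing in for whatever interface the real Annotation server exposes.

```python
def annotate_proteins(proteins, annotate):
    """Return a copy of `proteins` with updated 'function' fields.

    proteins: dict of records with 'protein_id', 'function' and
              'amino_acid_sequence' keys (layout assumed from the notebook).
    annotate: hypothetical callable taking {protein_id: sequence} and
              returning {protein_id: function_string}.
    """
    # 1) Retrieve ids and sequences of proteins.
    sequences = dict(
        (p['protein_id'], p['amino_acid_sequence']) for p in proteins.values()
    )
    # 2) + 3) Feed into the annotation step and collect ids with functions.
    functions = annotate(sequences)
    # 4) Update the function field, leaving the input records untouched.
    updated = {}
    for key, protein in proteins.items():
        record = dict(protein)
        new_function = functions.get(protein['protein_id'])
        if new_function is not None:
            record['function'] = new_function
        updated[key] = record
    return updated


def fake_annotation_server(sequences):
    # Hypothetical stand-in: labels each protein by sequence length only.
    return dict(
        (pid, 'hypothetical protein ({0} aa)'.format(len(seq)))
        for pid, seq in sequences.items()
    )


proteins = {
    'p1': {'protein_id': 'p1', 'function': '', 'amino_acid_sequence': 'MKV'},
}
annotated = annotate_proteins(proteins, fake_annotation_server)
print(annotated['p1']['function'])
```

Returning a new dict rather than editing in place keeps the "save" step separate, so the same function could serve any of the input types once their proteins are extracted.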

Modeling

1) Retrieve ids and functional annotations of proteins
2) Feed into model reconstruction process
3) Builds a set of reactions, where each reaction has a list of feature references
4) An Expression Matrix, with references to the same features, can be applied during Flux Balance Analysis

Input:
- Genome (currently the only one implemented)
- Transcriptome
- Protein Fasta
- FeatureSet

Output: Metabolic Reconstruction with references to features in input object

NB: The input object is not required to be annotated, but without annotations the resulting model will not have any reactions!
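As a rough sketch of steps 3 and 4, and of why an unannotated input yields an empty model: the lookup table, function names, and reaction ids below are all hypothetical placeholders, and build_reactions is not the actual reconstruction process, only an illustration of reactions carrying lists of feature references.

```python
def build_reactions(proteins, function_to_reactions):
    """Map each reaction id to the ids of the features that support it."""
    reactions = {}
    for protein in proteins.values():
        # An empty or unknown function matches nothing in the table,
        # so an unannotated protein contributes no reactions.
        for reaction_id in function_to_reactions.get(protein['function'], []):
            reactions.setdefault(reaction_id, []).append(protein['protein_id'])
    for feature_ids in reactions.values():
        feature_ids.sort()  # stable order regardless of dict iteration
    return reactions


# Hypothetical function-to-reaction table, for illustration only.
FUNCTION_TO_REACTIONS = {'example kinase': ['rxn_example_1']}

annotated = {
    'f1': {'protein_id': 'f1', 'function': 'example kinase'},
    'f2': {'protein_id': 'f2', 'function': 'example kinase'},
}
unannotated = {'f1': {'protein_id': 'f1', 'function': ''}}

model = build_reactions(annotated, FUNCTION_TO_REACTIONS)
empty_model = build_reactions(unannotated, FUNCTION_TO_REACTIONS)
print(model)
print(empty_model)

# Because reactions and expression values refer to the same feature ids,
# expression data can be looked up per reaction (hypothetical values).
expression = {'f1': 2.0, 'f2': 0.5}
reaction_expression = dict(
    (reaction_id, [expression[f] for f in feature_ids])
    for reaction_id, feature_ids in model.items()
)
print(reaction_expression)
```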

RNA-Seq Pipeline

1) Retrieves a set of raw reads
2) Assembles reads into a Transcriptome object using a reference Genome
3) Produces a FeatureSet/ExpressionMatrix with a list of references to features in the Transcriptome/Genome object

Input:
- Reads
- Reference Genome

Output:
- Transcriptome
- FeatureSet
- ExpressionMatrix

External References

Currently, in any workspace object that contains a reference to a feature in a genome object, the reference looks like this:

549/5/2/features/id/RSP_2777

When a workspace object is rendered in a narrative widget or on a landing page, these references become links that load the correct feature from the parent genome object. So whether a user is browsing a metabolic model, an expression matrix, or any other object, they should be able to click on a link and inspect the attributes of the linked feature.

I don't know how this would be implemented in the API, or how the reference string should be formulated so that the right feature is retrieved every time.
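Purely as a starting point for that discussion, here is a minimal sketch of how the current reference string could be split into an object reference and a feature id. It assumes the first three fields (549/5/2) identify the parent genome object and that the path is always features/id/<feature id>; parse_feature_ref and FeatureRef are hypothetical names, not part of the data API.

```python
from collections import namedtuple

# Hypothetical container for the two parts of a feature reference.
FeatureRef = namedtuple('FeatureRef', ['object_ref', 'feature_id'])


def parse_feature_ref(ref):
    """Split '<a>/<b>/<c>/features/id/<feature id>' into its two parts."""
    # Limit the split so a feature id containing '/' stays in one piece.
    parts = ref.split('/', 5)
    if len(parts) != 6 or parts[3:5] != ['features', 'id'] or not parts[5]:
        raise ValueError('not a feature reference: {0!r}'.format(ref))
    return FeatureRef('/'.join(parts[:3]), parts[5])


ref = parse_feature_ref('549/5/2/features/id/RSP_2777')
print(ref.object_ref)
print(ref.feature_id)
```

With the reference split this way, a widget or API call would first load the object named by object_ref and then look up feature_id within it; whether that lookup is always unambiguous is exactly the open question above.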