Gap-fill a model in PyFBA

by Daniel Cuevas

Introduction

In this notebook, we will present the steps to generate a genome-scale metabolic model from RAST annotations, gap-fill the model on rich LB type media, and save the model to hard disk.


The required files and information for this notebook:

  • List of functional roles from RAST (normally labeled 'assigned_functions' from the Genome Directory download).
  • Organism name
  • Organism ID
  • Media file
  • Close genomes functional roles file
  • Directory on hard disk to store model

In [1]:
import sys
import os
import PyFBA

Generate model

The first step shows how to build the model from RAST functional roles.


In [2]:
model_functions_file = "data/citrobacter.assigned_functions"
close_genomes_functions_file = "data/close_genomes_functions"
org_name = "Citrobacter sedlakii"
org_id = "Citrobacter sedlakii"

In [3]:
model = PyFBA.model.roles_to_model(model_functions_file, org_id, org_name)

The model has been generated and is now ready to use for flux-balance analysis simulations. Running flux-balance analysis will show the model does not contain all required metabolism to grow in the LB media.

Here are the LB media contents. For PyFBA media files are stored in directory indicated by environmental variable 'PYFBA_MEDIA_DIR'. This step is only to show file contents but is not required for gap-filling.


In [4]:
lb_media_file = os.path.join(os.environ["PYFBA_MEDIA_DIR"], "ArgonneLB.txt")
with open(lb_media_file) as f:
    for l in f:
        print(l, end="")


Compound	Name	Formula	Charge
cpd03424	Vitamin B12	C61H86CoN13O14PR	6
cpd00215	Pyridoxal	C8H9NO3	0
cpd00028	Heme	C34H30FeN4O4	-2
cpd10515	Fe2+	Fe	2
cpd00030	Mn2+	Mn	2
cpd00149	Co2+	Co	2
cpd00058	Cu2+	Cu	2
cpd00099	Cl-	Cl	-1
cpd00007	O2	O2	0
cpd00034	Zn2+	Zn	2
cpd00156	L-Valine	C5H11NO2	0
cpd00249	Uridine	C9H12N2O6	0
cpd00092	Uracil	C4H4N2O2	0
cpd00069	L-Tyrosine	C9H11NO3	0
cpd00065	L-Tryptophan	C11H12N2O2	0
cpd00184	Thymidine	C10H14N2O5	0
cpd00161	L-Threonine	C4H9NO3	0
cpd00048	Sulfate	O4S	-2
cpd00054	L-Serine	C3H7NO3	0
cpd00220	Riboflavin	C17H20N4O6	0
cpd00129	L-Proline	C5H8NO2	-1
cpd00644	PAN	C9H16NO5	-1
cpd00009	Phosphate	HO4P	-2
cpd00066	L-Phenylalanine	C9H11NO2	0
cpd00218	Niacin	C6H4NO2	-1
cpd00971	Na+	Na	1
cpd00254	Mg	Mg	2
cpd00060	L-Methionine	C5H11NO2S	0
cpd00039	L-Lysine	C6H15N2O2	1
cpd00107	L-Leucine	C6H13NO2	0
cpd00205	K+	K	1
cpd00246	Inosine	C10H12N4O5	0
cpd00322	L-Isoleucine	C6H13NO2	0
cpd00226	HYXN	C5H4N4O	0
cpd00119	L-Histidine	C6H9N3O2	0
cpd00531	Hg2+	Hg	2
cpd00001	H2O	H2O	0
cpd00067	H+	H	1
cpd00033	Glycine	C2H5NO2	0
cpd00023	L-Glutamate	C5H8NO4	-1
cpd00027	D-Glucose	C6H12O6	0
cpd00393	Folate	C19H17N7O6	-2
cpd10516	fe3	Fe	3
cpd00654	Deoxycytidine	C9H13N3O4	0
cpd00438	Deoxyadenosine	C10H13N5O3	0
cpd00381	L-Cystine	C6H12N2O4S2	0
cpd11595	chromate	H2O4Cr	-2
cpd01012	Cd2+	Cd	2
cpd00063	Ca2+	Ca	2
cpd00041	L-Aspartate	C4H6NO4	-1
cpd01048	Arsenate	HO4As	-2
cpd00051	L-Arginine	C6H15N4O2	1
cpd00035	L-Alanine	C3H7NO2	0
cpd00182	Adenosine	C10H13N5O4	0
cpd00311	Guanosine	C10H13N5O5	0
cpd00126	GMP	C10H13N5O8P	-1
cpd00018	AMP	C10H13N5O7P	-1
cpd00091	UMP	C9H12N2O9P	-1
cpd00046	CMP	C9H13N3O8P	-1
cpd00793	Thiamine phosphate	C12H17N4O4PS	0
cpd00541	Lipoate	C8H13O2S2	-1
cpd00239	H2S	H2S	0
cpd00013	NH3	H4N	1
cpd00244	Ni2+	Ni	2
cpd11574	Molybdate	H2MoO4	0

In [5]:
# status := optimization status of FBA simplex solver
# flux_value := biomass flux value (objective function)
# growth := boolean whether the model was able to grow or not
status, flux_value, growth = model.run_fba("ArgonneLB.txt")
print("Growth:", growth)


Growth: False

Gap-fill model on LB media

Each model object in PyFBA contains a gapfill() function that requires two arguments:

  1. Media file
  2. Close genomes functional roles file

The other two arguments here, use_flux and verbose, are optional.

  • use_flux is a boolean flag that will identify which reactions that were added during the first phase of gap-filling have a non-active or zero flux. These reactions are then removed before the second phase of gap-filling occurs. This lowers the number of reactions that must be tested during second phase, thus speeding up the gap-filling process.
  • verbose is an integer flag that will output status update to stderr.

In [6]:
success = model.gapfill("ArgonneLB.txt", close_genomes_functions_file, use_flux=True, verbose=1)
if not success:
    print("Model was unable to gap-fill!")


Current model contains 1445 reactions
Finding media import reactions
Found 139 reactions
New total: 1584 reactions
Finding essential reactions
Found 109 reactions
New total: 1630 reactions
Finding close organism reactions
Found 1875 reactions
New total: 2047 reactions
Finding subsystem reactions
Found 237 reactions
New total: 2284 reactions
Finding EC reactions
Found 0 reactions
New total: 2284 reactions
Finding compound-probability reactions
Found 2686 reactions
New total: 4970 reactions
Gap-fill was successful, now trimming model
Removed 2654 reactions based on flux value
Trimming probable group of reactions
At the beginning the base list has 1817  and the optional list has 1273 reactions
Trimming ec group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming subsystems group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming close genomes group of reactions
At the beginning the base list has 1520  and the optional list has 165 reactions
Trimming essential group of reactions
At the beginning the base list has 1514  and the optional list has 6 reactions
Trimming media group of reactions
At the beginning the base list has 1451  and the optional list has 54 reactions
Trimming complete.
Total gap-filled reactions: 8
The biomass reaction has a flux of 323.36325286093853

We can view the reactions that were gap-filled into the model.


In [7]:
for n, rid in enumerate(model.gf_reactions, start=1):
    print("({}) {}: {}".format(n, rid, model.reactions[rid].equation))


(1) rxn29117: (1) Pyridoxal[e] <=> (1) Pyridoxal[c]
(2) rxn13681: (1) H+[e] + (1) Co2+[c] <=> (1) H+[c] + (1) Co2+[e]
(3) rxn01513: (1) ATP[c] + (1) H+[c] + (1) dTMP[c] <=> (1) ADP[c] + (1) dTDP[c]
(4) rxn05645: (1) H+[e] + (1) Riboflavin[e] <=> (1) H+[c] + (1) Riboflavin[c]
(5) rxn10571: (1) H2O[c] + (1) ATP[c] + (1) Mg[e] <=> (1) ADP[c] + (1) Phosphate[c] + (1) H+[c] + (1) Mg[c]
(6) rxn00119: (1) ATP[c] + (1) H+[c] + (1) UMP[c] <=> (1) ADP[c] + (1) UDP[c]
(7) rxn03164: (1) ATP[c] + (1) Ala-Ala[c] + (1) UDP-N-acetylmuramoyl-L-alanyl-D-gamma-glutamyl-meso-2-6-diaminopimelate[c] <=> (1) ADP[c] + (1) Phosphate[c] + (1) H+[c] + (1) UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-6-carboxy-L-lysyl-D-alanyl- D-alanine[c]
(8) rxn02285: (1) NADP[c] + (1) UDP-MurNAc[c] <=> (1) NADPH[c] + (1) H+[c] + (1) UDP-N-acetylglucosamine enolpyruvate[c]

Save model

The second step shows how to save the model to hard disk.


In [8]:
model_directory = "save_citrobacter_sedlakii"
PyFBA.model.save_model(model, model_directory)

Model has been stored. Here is a directory listing of the files that were created.


In [9]:
for f in os.listdir(model_directory):
    fp = os.path.join(model_directory, f)
    print(f, ": ", os.path.getsize(fp), "B", sep="")


Citrobacter sedlakii.compounds: 27883B
Citrobacter sedlakii.gfmedia: 14B
Citrobacter sedlakii.gfreactions: 72B
Citrobacter sedlakii.info: 114B
Citrobacter sedlakii.reactions: 13077B
Citrobacter sedlakii.roles: 70146B