Task: Proteomics

For this study we want to identify protein families whose members are highly similar, then pick a pair of proteins and perform a structural alignment in PyMol (MMTK may also be used). For a general reference on protein similarity networks, see: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3031030/.

For the first part of the study we perform a BLAST alignment on the human proteome (note that BLAST computes local, not global, alignments). The peptide data is in your folder. Instructions for running the search are here: https://www.biostars.org/p/6541/

Then select a few high-scoring candidate pairs, choose one of them, and perform a structural alignment in Python/PyMol.
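The selection step can be sketched as follows. This is a minimal sketch, assuming the BLAST search was run with tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The function name `top_scoring_pairs` and the example identifiers `P1` to `P4` are hypothetical, not part of the provided data.

```python
import csv


def top_scoring_pairs(lines, n=5):
    """Return the n best non-self hits as (protein_a, protein_b, bitscore).

    `lines` is any iterable of tab-separated BLAST rows (assumed -outfmt 6).
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query == subject:
            continue  # a protein always matches itself; not informative
        # Treat A-vs-B and B-vs-A as the same pair and keep the best score.
        pair = tuple(sorted((query, subject)))
        if bitscore > best.get(pair, float("-inf")):
            best[pair] = bitscore
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(a, b, score) for (a, b), score in ranked[:n]]


# Made-up rows for illustration only.
example = [
    "P1\tP1\t100.0\t76\t0\t0\t1\t76\t1\t76\t1e-50\t155.0",
    "P1\tP2\t96.1\t76\t3\t0\t1\t76\t1\t76\t2e-45\t148.0",
    "P2\tP1\t96.1\t76\t3\t0\t1\t76\t1\t76\t2e-45\t147.0",
    "P3\tP4\t40.0\t60\t36\t0\t5\t64\t2\t61\t1e-08\t52.4",
]
print(top_scoring_pairs(example, n=2))
```

With a real results file, an open file handle can be passed in place of the list. Ranking by bit score is one reasonable choice; percent identity or E-value could be used instead.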

One way to perform structural alignment in BioPython is taken from this blog of a Danish computational chemist. It is neat, but I would prefer that you use PyMol instead and also produce a nice visualization of the aligned structures. Use the align command for this: http://www.pymolwiki.org/index.php/Align. Alternatively, you can open the BioPython output in any visualization program, preferably from Python.


In [ ]:
import Bio.PDB
 
# Select which residue numbers you wish to align
# and put them in a range
start_id = 1
end_id   = 70
atoms_to_be_aligned = range(start_id, end_id + 1)
 
# Start the parser
pdb_parser = Bio.PDB.PDBParser(QUIET = True)
 
# Get the structures
ref_structure = pdb_parser.get_structure("reference", "1D3Z.pdb")
sample_structure = pdb_parser.get_structure("sample", "1UBQ.pdb")
 
# Use the first model in the pdb-files for alignment
# Change the number 0 if you want to align to another model
ref_model    = ref_structure[0]
sample_model = sample_structure[0]
 
# Make a list of the atoms (in the structures) you wish to align.
# In this case we use CA atoms whose index is in the specified range
ref_atoms = []
sample_atoms = []
 
# Iterate over all chains in the model in order to find all residues
for ref_chain in ref_model:
  # Iterate over all residues in each chain in order to find proper atoms
  for ref_res in ref_chain:
    # Check if residue number ( .get_id() ) is in the list
    if ref_res.get_id()[1] in atoms_to_be_aligned:
      # Append CA atom to list
      ref_atoms.append(ref_res['CA'])
 
# Do the same for the sample structure
for sample_chain in sample_model:
  for sample_res in sample_chain:
    if sample_res.get_id()[1] in atoms_to_be_aligned:
      sample_atoms.append(sample_res['CA'])
 
# Now we initiate the superimposer:
super_imposer = Bio.PDB.Superimposer()
super_imposer.set_atoms(ref_atoms, sample_atoms)
super_imposer.apply(sample_model.get_atoms())
 
# Print RMSD:
print(super_imposer.rms)
 
# Save the aligned version of 1UBQ.pdb
io = Bio.PDB.PDBIO()
io.set_structure(sample_structure) 
io.save("1UBQ_aligned.pdb")
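
For the PyMol route preferred in the task description, a minimal sketch using PyMol's Python API might look like this. It assumes PyMol is installed as a Python module and reuses the same two files, 1D3Z.pdb and 1UBQ.pdb. The function name `align_and_render`, the object names, the colors and the output file name are illustrative choices, not part of the assignment.

```python
def align_and_render(ref_path, sample_path, image_path):
    """Align sample onto reference with PyMol and save a rendered image.

    Returns the RMSD reported by PyMol after its refinement cycles.
    """
    # Imported here so the function can be defined without PyMol present.
    from pymol import cmd

    # Load both structures under fixed object names.
    cmd.load(ref_path, "reference")
    cmd.load(sample_path, "sample")

    # cmd.align(mobile, target) superimposes the mobile selection onto the
    # target; the first element of the returned list is the refined RMSD.
    result = cmd.align("sample and name CA", "reference and name CA")
    rmsd = result[0]

    # Simple cartoon view with one color per structure.
    cmd.show_as("cartoon")
    cmd.color("marine", "reference")
    cmd.color("orange", "sample")
    cmd.orient()

    # Ray-trace and write the picture to disk.
    cmd.png(image_path, width=1200, height=900, dpi=300, ray=1)
    return rmsd
```

A call such as `align_and_render("1D3Z.pdb", "1UBQ.pdb", "aligned.png")` should return an RMSD that can be compared with the BioPython value above. The two numbers need not match exactly, because PyMol's align performs a sequence alignment and rejects outlier atoms during refinement.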