In [2]:
%autosave 0
from __future__ import print_function


Autosave disabled

Search for single-stranded RNA motifs

We will now search for single-stranded motifs within a structure/trajectory. This is performed by using the ss_motif function.

results = bb.ss_motif(query,target,threshold=0.6,out=None,bulges=0)
  • query is a PDB file with the structure you want to search for within the file target. If the keyword topology is specified, the query structure is searched in the target trajectory file.
  • threshold is the eRMSD threshold to consider a substructure in target to be significantly similar to query. Typical relevant hits have eRMSD in the 0.6-0.9 range.
  • If you specify the optional string keyword out, PDB structures below the threshold are written with the specified prefix.
    • It is possible to specify the maximum number of allowed inserted or bulged bases with the option bulges.
    • The search is performed not considering the sequence. It is possible to specify a sequence with the sequence option. Abbreviations (i.e N/R/Y) are accepted.

The function returns a list of hits. Each element in this list is in turn a list containing the following information:

  • element 0 is the frame index. This is relevant if the search is performed over a trajectory/multi model PDB.
  • element 1 is the eRMSD distance from the query
  • element 2 is the list of residues.

In the following example we search for structures similar to GNRA.pdb in a crystal structure of the H.Marismortui large ribosomal subunit (PDB 1S72).


In [3]:
import barnaba as bb

# find all GNRA tetraloops in H.Marismortui large ribosomal subunit (PDB 1S72)
query = "../test/data/GNRA.pdb" 
target = "../test/data/1S72.pdb" 

# call function. 
results = bb.ss_motif(query,target,threshold=0.6,out='gnra_loops',bulges=1)


# Loaded query ../test/data/GNRA.pdb 
# Loaded target ../test/data/1S72.pdb 
# Treating nucleotide 1MA628 as A 
# Treating nucleotide OMU2587 as U 
# Treating nucleotide OMG2588 as G 
# Treating nucleotide UR32619 as U 
# Treating nucleotide PSU2621 as U 

Now we print the fragment residues and their eRMSD distance from the query structure.


In [4]:
for j in range(len(results)):
    #seq = "".join([r.split("_")[0] for r in results[j][2]])
    print("%2d eRMSD:%5.3f " % (j,results[j][1]))
    print("     Sequence: %s" % ",".join(results[j][2]))
    print()


 0 eRMSD:0.436 
     Sequence: C_252_0,U_253_0,C_254_0,A_255_0,C_256_0,G_257_0

 1 eRMSD:0.448 
     Sequence: U_468_0,G_469_0,U_470_0,G_471_0,A_472_0,A_473_0

 2 eRMSD:0.441 
     Sequence: C_576_0,G_577_0,C_578_0,G_579_0,A_580_0,G_581_0

 3 eRMSD:0.461 
     Sequence: G_690_0,G_691_0,A_692_0,A_693_0,A_694_0,C_695_0

 4 eRMSD:0.352 
     Sequence: C_804_0,G_805_0,A_806_0,A_807_0,A_808_0,G_809_0

 5 eRMSD:0.555 
     Sequence: U_1326_0,G_1327_0,A_1328_0,A_1329_0,A_1330_0,A_1331_0

 6 eRMSD:0.429 
     Sequence: G_1628_0,G_1629_0,A_1630_0,A_1631_0,A_1632_0,C_1633_0

 7 eRMSD:0.276 
     Sequence: C_1862_0,G_1863_0,C_1864_0,A_1865_0,A_1866_0,G_1867_0

 8 eRMSD:0.210 
     Sequence: C_2248_0,G_2249_0,G_2250_0,G_2251_0,A_2252_0,G_2253_0

 9 eRMSD:0.314 
     Sequence: C_2411_0,G_2412_0,A_2413_0,A_2414_0,A_2415_0,G_2416_0

10 eRMSD:0.426 
     Sequence: C_2629_0,G_2630_0,U_2631_0,G_2632_0,A_2633_0,G_2634_0

11 eRMSD:0.350 
     Sequence: C_2695_0,G_2696_0,A_2697_0,G_2698_0,A_2699_0,G_2700_0

12 eRMSD:0.475 
     Sequence: G_2876_0,G_2877_0,U_2878_0,A_2879_0,A_2880_0,C_2881_0

13 eRMSD:0.239 
     Sequence: C_89_1,G_90_1,C_91_1,G_92_1,A_93_1,G_94_1

14 eRMSD:0.488 
     Sequence: C_1594_0,G_1595_0,A_1597_0,A_1598_0,U_1599_0,G_1600_0

15 eRMSD:0.508 
     Sequence: U_493_0,C_494_0,A_495_0,G_496_0,A_498_0,G_499_0

16 eRMSD:0.445 
     Sequence: G_1054_0,G_1055_0,U_1056_0,A_1057_0,A_1058_0,C_1060_0

17 eRMSD:0.302 
     Sequence: C_1793_0,G_1794_0,G_1795_0,A_1796_0,A_1797_0,G_1799_0

We can also calculate RMSD distances for the different hits


In [5]:
import glob

pdbs = glob.glob("gnra_loops*.pdb")
dists = [bb.rmsd(query,f)[0] for f in pdbs]

for j in range(len(results)):
    seq = "".join([r.split("_")[0] for r in results[j][2]])
    print("%2d eRMSD:%5.3f RMSD: %6.4f" % (j,results[j][1],10.*dists[j]), end="")
    print("     Sequence: %s" % seq)

    #print "%50s %6.4f AA" % (f,10.*dist[0])


# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
 0 eRMSD:0.436 RMSD: 0.7048     Sequence: CUCACG
 1 eRMSD:0.448 RMSD: 0.6999     Sequence: UGUGAA
 2 eRMSD:0.441 RMSD: 0.7403     Sequence: CGCGAG
 3 eRMSD:0.461 RMSD: 0.6419     Sequence: GGAAAC
 4 eRMSD:0.352 RMSD: 0.8012     Sequence: CGAAAG
 5 eRMSD:0.555 RMSD: 0.8040     Sequence: UGAAAA
 6 eRMSD:0.429 RMSD: 0.8208     Sequence: GGAAAC
 7 eRMSD:0.276 RMSD: 0.7992     Sequence: CGCAAG
 8 eRMSD:0.210 RMSD: 0.6920     Sequence: CGGGAG
 9 eRMSD:0.314 RMSD: 0.7170     Sequence: CGAAAG
10 eRMSD:0.426 RMSD: 0.6651     Sequence: CGUGAG
11 eRMSD:0.350 RMSD: 0.6953     Sequence: CGAGAG
12 eRMSD:0.475 RMSD: 0.9468     Sequence: GGUAAC
13 eRMSD:0.239 RMSD: 0.6417     Sequence: CGCGAG
14 eRMSD:0.488 RMSD: 2.2815     Sequence: CGAAUG
15 eRMSD:0.508 RMSD: 1.4962     Sequence: UCAGAG
16 eRMSD:0.445 RMSD: 1.0643     Sequence: GGUAAC
17 eRMSD:0.302 RMSD: 1.2767     Sequence: CGGAAG

Note that the first hit has a low eRMSD, but no GNRA sequence. Let's have a look at this structure:


In [6]:
import py3Dmol

query_s = open(query,'r').read()
hit_0 = open(pdbs[0],'r').read()

p = py3Dmol.view(width=900,height=600,viewergrid=(1,2))
p.addModel(query_s,'pdb',viewer=(0,0))
p.addModel(hit_0,'pdb',viewer=(0,1))
p.setStyle({'stick':{}})
p.setBackgroundColor('0xeeeeee')
p.zoomTo()
p.show()


Out[6]:

We can also check hit 14, that has low eRMSD but (relatively) high RMSD


In [7]:
hit_14 = open(pdbs[14],'r').read()

p = py3Dmol.view(width=900,height=600,viewergrid=(1,2))
p.addModel(query_s,'pdb',viewer=(0,0))
p.addModel(hit_14,'pdb',viewer=(0,1))
p.setStyle({'stick':{}})
p.setBackgroundColor('0xeeeeee')
p.zoomTo()
p.show()


Out[7]: