In [2]:

    
%autosave 0
from __future__ import print_function









    














    



Autosave disabled

Search for single-stranded RNA motifs

We will now search for single-stranded motifs within a structure/trajectory. This is performed by using the ss_motif function.

results = bb.ss_motif(query,target,threshold=0.6,out=None,bulges=0)

query is a PDB file with the structure you want to search for within the file target. If the keyword topology is specified, the query structure is searched in the target trajectory file.
threshold is the eRMSD threshold to consider a substructure in target to be significantly similar to query. Typical relevant hits have eRMSD in the 0.6-0.9 range.
If you specify the optional string keyword out, PDB structures below the threshold are written with the specified prefix.
- It is possible to specify the maximum number of allowed inserted or bulged bases with the option bulges.
- The search is performed not considering the sequence. It is possible to specify a sequence with the sequence option. Abbreviations (i.e N/R/Y) are accepted.

The function returns a list of hits. Each element in this list is in turn a list containing the following information:

element 0 is the frame index. This is relevant if the search is performed over a trajectory/multi model PDB.
element 1 is the eRMSD distance from the query
element 2 is the list of residues.

In the following example we search for structures similar to GNRA.pdb in a crystal structure of the H.Marismortui large ribosomal subunit (PDB 1S72).



In [3]:

    
import barnaba as bb

# find all GNRA tetraloops in H.Marismortui large ribosomal subunit (PDB 1S72)
query = "../test/data/GNRA.pdb" 
target = "../test/data/1S72.pdb" 

# call function. 
results = bb.ss_motif(query,target,threshold=0.6,out='gnra_loops',bulges=1)









    



# Loaded query ../test/data/GNRA.pdb 
# Loaded target ../test/data/1S72.pdb 
# Treating nucleotide 1MA628 as A 
# Treating nucleotide OMU2587 as U 
# Treating nucleotide OMG2588 as G 
# Treating nucleotide UR32619 as U 
# Treating nucleotide PSU2621 as U

Now we print the fragment residues and their eRMSD distance from the query structure.



In [4]:

    
for j in range(len(results)):
    #seq = "".join([r.split("_")[0] for r in results[j][2]])
    print("%2d eRMSD:%5.3f " % (j,results[j][1]))
    print("     Sequence: %s" % ",".join(results[j][2]))
    print()









    



 0 eRMSD:0.436 
     Sequence: C_252_0,U_253_0,C_254_0,A_255_0,C_256_0,G_257_0

 1 eRMSD:0.448 
     Sequence: U_468_0,G_469_0,U_470_0,G_471_0,A_472_0,A_473_0

 2 eRMSD:0.441 
     Sequence: C_576_0,G_577_0,C_578_0,G_579_0,A_580_0,G_581_0

 3 eRMSD:0.461 
     Sequence: G_690_0,G_691_0,A_692_0,A_693_0,A_694_0,C_695_0

 4 eRMSD:0.352 
     Sequence: C_804_0,G_805_0,A_806_0,A_807_0,A_808_0,G_809_0

 5 eRMSD:0.555 
     Sequence: U_1326_0,G_1327_0,A_1328_0,A_1329_0,A_1330_0,A_1331_0

 6 eRMSD:0.429 
     Sequence: G_1628_0,G_1629_0,A_1630_0,A_1631_0,A_1632_0,C_1633_0

 7 eRMSD:0.276 
     Sequence: C_1862_0,G_1863_0,C_1864_0,A_1865_0,A_1866_0,G_1867_0

 8 eRMSD:0.210 
     Sequence: C_2248_0,G_2249_0,G_2250_0,G_2251_0,A_2252_0,G_2253_0

 9 eRMSD:0.314 
     Sequence: C_2411_0,G_2412_0,A_2413_0,A_2414_0,A_2415_0,G_2416_0

10 eRMSD:0.426 
     Sequence: C_2629_0,G_2630_0,U_2631_0,G_2632_0,A_2633_0,G_2634_0

11 eRMSD:0.350 
     Sequence: C_2695_0,G_2696_0,A_2697_0,G_2698_0,A_2699_0,G_2700_0

12 eRMSD:0.475 
     Sequence: G_2876_0,G_2877_0,U_2878_0,A_2879_0,A_2880_0,C_2881_0

13 eRMSD:0.239 
     Sequence: C_89_1,G_90_1,C_91_1,G_92_1,A_93_1,G_94_1

14 eRMSD:0.488 
     Sequence: C_1594_0,G_1595_0,A_1597_0,A_1598_0,U_1599_0,G_1600_0

15 eRMSD:0.508 
     Sequence: U_493_0,C_494_0,A_495_0,G_496_0,A_498_0,G_499_0

16 eRMSD:0.445 
     Sequence: G_1054_0,G_1055_0,U_1056_0,A_1057_0,A_1058_0,C_1060_0

17 eRMSD:0.302 
     Sequence: C_1793_0,G_1794_0,G_1795_0,A_1796_0,A_1797_0,G_1799_0

We can also calculate RMSD distances for the different hits



In [5]:

    
import glob

pdbs = glob.glob("gnra_loops*.pdb")
dists = [bb.rmsd(query,f)[0] for f in pdbs]

for j in range(len(results)):
    seq = "".join([r.split("_")[0] for r in results[j][2]])
    print("%2d eRMSD:%5.3f RMSD: %6.4f" % (j,results[j][1],10.*dists[j]), end="")
    print("     Sequence: %s" % seq)

    #print "%50s %6.4f AA" % (f,10.*dist[0])









    



# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
# found  72 atoms in common
 0 eRMSD:0.436 RMSD: 0.7048     Sequence: CUCACG
 1 eRMSD:0.448 RMSD: 0.6999     Sequence: UGUGAA
 2 eRMSD:0.441 RMSD: 0.7403     Sequence: CGCGAG
 3 eRMSD:0.461 RMSD: 0.6419     Sequence: GGAAAC
 4 eRMSD:0.352 RMSD: 0.8012     Sequence: CGAAAG
 5 eRMSD:0.555 RMSD: 0.8040     Sequence: UGAAAA
 6 eRMSD:0.429 RMSD: 0.8208     Sequence: GGAAAC
 7 eRMSD:0.276 RMSD: 0.7992     Sequence: CGCAAG
 8 eRMSD:0.210 RMSD: 0.6920     Sequence: CGGGAG
 9 eRMSD:0.314 RMSD: 0.7170     Sequence: CGAAAG
10 eRMSD:0.426 RMSD: 0.6651     Sequence: CGUGAG
11 eRMSD:0.350 RMSD: 0.6953     Sequence: CGAGAG
12 eRMSD:0.475 RMSD: 0.9468     Sequence: GGUAAC
13 eRMSD:0.239 RMSD: 0.6417     Sequence: CGCGAG
14 eRMSD:0.488 RMSD: 2.2815     Sequence: CGAAUG
15 eRMSD:0.508 RMSD: 1.4962     Sequence: UCAGAG
16 eRMSD:0.445 RMSD: 1.0643     Sequence: GGUAAC
17 eRMSD:0.302 RMSD: 1.2767     Sequence: CGGAAG

Note that the first hit has a low eRMSD, but no GNRA sequence. Let's have a look at this structure:



In [6]:

    
import py3Dmol

query_s = open(query,'r').read()
hit_0 = open(pdbs[0],'r').read()

p = py3Dmol.view(width=900,height=600,viewergrid=(1,2))
p.addModel(query_s,'pdb',viewer=(0,0))
p.addModel(hit_0,'pdb',viewer=(0,1))
p.setStyle({'stick':{}})
p.setBackgroundColor('0xeeeeee')
p.zoomTo()
p.show()









    Out[6]:

We can also check hit 14, that has low eRMSD but (relatively) high RMSD



In [7]:

    
hit_14 = open(pdbs[14],'r').read()

p = py3Dmol.view(width=900,height=600,viewergrid=(1,2))
p.addModel(query_s,'pdb',viewer=(0,0))
p.addModel(hit_14,'pdb',viewer=(0,1))
p.setStyle({'stick':{}})
p.setBackgroundColor('0xeeeeee')
p.zoomTo()
p.show()









    Out[7]: