In [1]:
ls -lah ../data/
In [2]:
!head ../data/rose.fa
In [3]:
%%bash
cd ../data/
RNAfold -p -d2 --noPS --noLP -T 37 < rose.fa
cd -
In [4]:
ls -lah ../data/
In [5]:
!head -n 25 ../data/ROSE1_dp.ps
In [6]:
!tail -n 25 ../data/ROSE1_dp.ps
RNAfold manpage:
It also produces PostScript files with plots of the resulting secondary structure graph and a "dot plot" of the base pairing matrix. The dot plot shows a matrix of squares with area proportional to the pairing probability in the upper right half, and one square for each pair in the minimum free energy structure in the lower left half. For each pair i−j with probability p>10E−6 there is a line of the form
i j sqrt(p) ubox
in the PostScript file, so that the pair probabilities can be easily extracted.
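As a rough sketch of how the pair probabilities could be pulled out of those lines in Python: since each line stores sqrt(p), the value has to be squared to recover p. The function name `parse_ubox_lines` and the sample lines are made up for illustration and are not taken from `ROSE1_dp.ps`.

```python
import re

# Matches lines of the form "i j sqrt(p) ubox", e.g. "1 72 0.9987 ubox".
UBOX_LINE = re.compile(r"^(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox$")


def parse_ubox_lines(lines):
    """Return {(i, j): p} for every ubox line; p is the square of the stored value."""
    probs = {}
    for line in lines:
        match = UBOX_LINE.match(line.strip())
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            sqrt_p = float(match.group(3))
            probs[(i, j)] = sqrt_p ** 2
    return probs


# Hypothetical sample lines, only to show the format being parsed.
sample = [
    "1 10 0.5 ubox",
    "2 9 0.9 ubox",
    "1 10 0.95 lbox",      # lower-left (MFE) entry, skipped
    "%data starts here",   # any other PostScript line, skipped
]
pairs = parse_ubox_lines(sample)
```

Keying the result by `(i, j)` keeps it sparse, which seems reasonable given that only pairs above the probability cutoff are written to the file.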
In [7]:
%%bash
grep "^[0-9].*ubox$" ../data/ROSE1_dp.ps
In [16]:
%%bash
awk '/^>/' ../data/rose.fa | head -1
In [ ]:
%%bash
# Runs RNAfold for the given RNA sequence over the range of temperatures,
# extracts base pairing probabilities and saves them in .txt files
# for later analysis
# PostScript file generated by RNAfold ends with this
psext="_dp.ps"
# FASTA file with RNA sequence, rna.fa by default
rna_fa=${1:-rna.fa}
# Temperature interval limits
T1=${2:-37}
T2=${3:-43}
# Check the input file exists
if [[ ! -f "$rna_fa" ]]
then
    echo "Could not find $rna_fa ... Exiting."
    exit 1
fi
# Get the base_name from the first word of the FASTA header (the name used
# for the *_dp.ps file), or fall back to the filename without its extension
base_name=$(awk '/^>/ {print $1; exit}' "$rna_fa")
if [[ -z "$base_name" ]]
then
    base_name="${rna_fa%.*}"
else
    base_name=${base_name#>}
fi
# Iterate over the T range and save probabilities to .txt file
for T in $(seq "$T1" "$T2")
do
    echo "Running RNAfold for Temp=$T ..."
    RNAfold -p -d2 --noPS --noLP -T "$T" < "$rna_fa"
    # Use the expected dot plot name instead of listing the directory,
    # which breaks if other *_dp.ps files are present
    tmpf="${base_name}${psext}"
    grep "^[0-9].*ubox$" "$tmpf" > "${base_name}_${T}.txt"
done
# Cleanup
rm "${base_name}${psext}"
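For the later analysis mentioned in the script's header comment, the per-temperature files could be read back into Python along these lines. This is only a sketch: `load_probabilities` is a hypothetical helper, and it assumes the `<base_name>_<T>.txt` naming used by the script above and the `i j sqrt(p) ubox` line format.

```python
from pathlib import Path


def load_probabilities(directory, base_name, temperatures):
    """Return {T: {(i, j): p}} read from <base_name>_<T>.txt files."""
    result = {}
    for T in temperatures:
        path = Path(directory) / f"{base_name}_{T}.txt"
        pairs = {}
        for line in path.read_text().splitlines():
            fields = line.split()
            # Expect exactly: i, j, sqrt(p), "ubox"
            if len(fields) == 4 and fields[3] == "ubox":
                i, j = int(fields[0]), int(fields[1])
                pairs[(i, j)] = float(fields[2]) ** 2  # stored value is sqrt(p)
        result[T] = pairs
    return result
```

With the default temperature range of the script, a call might look like `load_probabilities(".", "ROSE1", range(37, 44))`, where `ROSE1` is the sequence name assumed from the dot plot file name seen earlier.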
In [ ]: