Author: Laura Gutierrez Funderburk
Date: June 2017
In this notebook, we define three functions:
* PatternToNumber takes a gene sequence as input and outputs the integer that encodes the sequence in base 4 (A=0, C=1, G=2, T=3).
* NumberToPattern takes such an integer and outputs the gene sequence it encodes.
* ComputingFrequencies takes a gene sequence and an integer k as input, and counts the frequency of each k-mer in the sequence.

In [2]:
import numpy as np

In [3]:
def PatternToNumber(Pattern):
    i = 0
    Number = 0
    while i<len(Pattern):
        if Pattern[i]=='C':
            Number = Number + 1*4**(len(Pattern)-i-1)
        elif Pattern[i]=='G':
            Number = Number + 2*4**(len(Pattern)-i-1)
        elif Pattern[i]=='T':
            Number = Number + 3*4**(len(Pattern)-i-1)
        i += 1
    return Number

In [4]:
num = PatternToNumber('ACGCGGCTCTGAAA')
print(num)


26900352
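
The positional sum in PatternToNumber can also be evaluated with Horner's rule, which avoids recomputing powers of 4 for every symbol. The sketch below is an alternative formulation rather than the notebook's own method. The names pattern_to_number_horner and SYMBOL_TO_DIGIT are introduced here for illustration, and the input is assumed to contain only A, C, G, and T.

```python
# Hypothetical lookup table (not in the original notebook): nucleotide -> base-4 digit.
SYMBOL_TO_DIGIT = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def pattern_to_number_horner(pattern):
    """Encode a DNA string as an integer by reading it as a base-4 numeral."""
    number = 0
    for symbol in pattern:
        # Shift the digits read so far one place to the left, then add the new digit.
        number = 4 * number + SYMBOL_TO_DIGIT[symbol]
    return number

print(pattern_to_number_horner('ACGCGGCTCTGAAA'))  # 26900352, same as the cell above
```

Unlike PatternToNumber, which silently treats any unrecognized character as A, this version raises a KeyError on symbols outside A, C, G, T.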

In [5]:
string=""
index = 9904
quotient = index
remainder = index%4
while quotient!=0:
    print(quotient, remainder)
    if remainder==0:
        letter='A'
    elif remainder==1:
        letter='C'
    elif remainder==2:
        letter='G'
    elif remainder==3:
        letter='T'
    string += letter
    quotient = quotient // 4  # floor division, so the quotient stays an integer in Python 3 as well
    remainder = quotient%4
reversed_str = string[::-1]
print(reversed_str)


(9904, 0)
(2476, 0)
(619, 3)
(154, 2)
(38, 2)
(9, 1)
(2, 2)
GCGGTAA

In [6]:
def NumberToPattern(index):
    string =""
    quotient = index
    remainder = index%4
    while quotient!=0:
        print(quotient, remainder)
        if remainder==0:
            letter='A'
        elif remainder==1:
            letter='C'
        elif remainder==2:
            letter='G'
        elif remainder==3:
            letter='T'
        string += letter
        quotient = quotient // 4  # floor division, so the quotient stays an integer in Python 3 as well
        remainder = quotient%4
    reversed_str = string[::-1]
    return reversed_str

In [7]:
resu = NumberToPattern(5161)
print(resu)

resu2 = NumberToPattern(7592)
print(resu2)


(5161, 1)
(1290, 2)
(322, 2)
(80, 0)
(20, 0)
(5, 1)
(1, 1)
CCAAGGC
(7592, 0)
(1898, 2)
(474, 2)
(118, 2)
(29, 1)
(7, 3)
(1, 1)
CTCGGGA
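
Because NumberToPattern stops as soon as the quotient reaches 0, it cannot recover leading A's: the index 1 decodes to 'C' whether the intended pattern was 'C', 'AC', or 'AAC'. One possible remedy, sketched below under the assumption that the pattern length k is known, is to extract exactly k base-4 digits. The name number_to_pattern_k and its k parameter are not part of the original notebook.

```python
def number_to_pattern_k(index, k):
    """Decode a non-negative integer into a DNA string of exactly k symbols."""
    digits = 'ACGT'  # position in this string is the base-4 digit value
    letters = []
    for _ in range(k):
        # divmod returns (quotient, remainder) in one step.
        index, remainder = divmod(index, 4)
        letters.append(digits[remainder])
    # Digits were produced least significant first, so reverse them.
    return ''.join(reversed(letters))

print(number_to_pattern_k(5161, 7))  # CCAAGGC, same as NumberToPattern(5161)
print(number_to_pattern_k(1, 2))     # AC, with the leading A preserved
```

Note that if index is 4**k or larger, this sketch silently discards the higher digits, so a caller may want to check the range first.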

In [12]:
def ComputingFrequencies(Text,k):
    FrequencyArray = []
    Pattern = []
    for i in range((4**k)):
        FrequencyArray.append(0)
    print(len(FrequencyArray))
    for i in range(len(Text)-k+1):
        Pattern.append(Text[i:i + k])
        j = PatternToNumber(Pattern[i])
        FrequencyArray[j] = FrequencyArray[j] + 1
    np_FrequencyArray = np.array(FrequencyArray)
    return np_FrequencyArray

In [13]:
result = ComputingFrequencies("ACTG",2)
print(result)


16
[0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0]
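
The frequency array is easier to read when each entry is labelled with its k-mer. The sketch below is one way to cross-check the output above: it lists all 4**k k-mers in lexicographic order, which coincides with the index order used by PatternToNumber, and counts occurrences in a dictionary. The function name kmer_frequencies is hypothetical, and the text is assumed to contain only A, C, G, and T (any other symbol raises a KeyError).

```python
from itertools import product

def kmer_frequencies(text, k):
    """Return a dict mapping every possible k-mer to its count in text."""
    # product('ACGT', repeat=k) yields k-mers in lexicographic order,
    # i.e. in increasing order of their base-4 index.
    counts = {''.join(p): 0 for p in product('ACGT', repeat=k)}
    for i in range(len(text) - k + 1):
        counts[text[i:i + k]] += 1
    return counts

counts = kmer_frequencies("ACTG", 2)
print(list(counts.values()))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print({kmer: n for kmer, n in counts.items() if n > 0})  # {'AC': 1, 'CT': 1, 'TG': 1}
```

The list of values relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.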

In [14]:
sample = ComputingFrequencies("CACTCCTGATTCCTGGGCGTCGCTCAGGCAGGCTAGTAATCGCTTAATGTTGACCTACGCAAGCAAGGCTGCGCAGAGCAGTCACCGCTAGGAGAGGACTGAGGGCGACGCATCGCTTTCCCAAAGGGGAATTTACACTGATTGAGATCGCTAGTTGGGGCTCGACCCTATTGTTTGTTAGTGAATCATGACCCTAGTGGAAGTGCACCGCCTCGTGAGCGTTTAGTACCTCGTCCCTACCTGTGGCCGCGTCAGGGGTATCGTCAGACAAAGGCACAAAGGTCTATTATTGGTATAGGCTTCTCTCTTCGACCTGCGTAACTAGATTACAAACTATTGACCTCTGCGGCCCGGAGGTCCCATTCACGGACACAGCGAGAAACGGTCATAACCATTAATTGTCTTTAACACCTAATCGGGGTACGACTTGAAACGCTGATACGACAACCGGCATGTGTTCCGTTCTAAGGCAATCCACGTTTATGCGTATGAAGTACCGAATATTGCATCTCGCGCATTTCTACGACTTCCCCCTTAGCTCCATGAGTGAACGGCTCATGAGTTCAGGTCATAACAATTTTCCTCAGGCATGTCAACTAGGCGTCCGACCATGGGTCCTTTTCATTCGCACCACCAATCCGATTTGAACTGGTGTAACTTTCTGCTATAGGTCCAGGCTTAGGAACTGTGCTCAGTCATCGAACTTTCAATTAAGTGCTGGAGGTGCAGAGAATGTGAACCTACGCGAGCTGGAAACCCCTTTCCATGAAGCTCC",5)

with open('/home/lgutierrezfunderburk/Documents/ComputingFrew.txt',"w") as myfile:
    for number in sample:
        myfile.write("%s " % number)


1024