Some remarks on regular expressions

Regular expressions are a large topic; here I only collect remarks on the examples I show during the lectures. You can have a look for example here


In [1]:
import re #Module with regular expression functionality

for match in re.findall("AC.T","TACCTACGTT"): #Look for AC then any character then T
    print(match)


ACCT
ACGT

In [2]:
for match in re.findall("AC[GA]T","TACCTACGTT"): #Look for AC then either G or A then T
    print(match)


ACGT

In [3]:
#look for "match" then space then as many characters as possible then space then "longest"
for match in re.findall("match .* longest", "match always the longest and I really mean longest string"):
    print(match)


match always the longest and I really mean longest
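
The greedy behaviour shown above can be contrasted with a non-greedy quantifier. The following is a minimal sketch on the same string: writing `.*?` instead of `.*` asks for as few characters as possible, so the match should stop at the first " longest" rather than the last. The variable names `text`, `greedy` and `lazy` are only for illustration.

```python
import re

text = "match always the longest and I really mean longest string"

# .* is greedy: it takes as many characters as possible
greedy = re.findall("match .* longest", text)

# .*? is non-greedy: it takes as few characters as possible
lazy = re.findall("match .*? longest", text)

print(greedy[0])  # match always the longest and I really mean longest
print(lazy[0])    # match always the longest
```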

In [4]:
for match in re.findall("[ACGT]","ACGTYHBCGT"): #match a single character A, C, G, or T
    print(match)


A
C
G
T
C
G
T

In [5]:
for match in re.findall("[^ACGT]","ACGTYHBCGT"): #match a single character which is not A, C, G, or T
    print(match)


Y
H
B
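
The single characters found above are adjacent in the sequence, so it can be handy to report them as one run. A small sketch, assuming we also want the position of the run: `+` means "one or more of the preceding item", and `re.finditer` yields match objects that carry the start and end positions. The names `seq` and `runs` are only for illustration.

```python
import re

seq = "ACGTYHBCGT"

# [^ACGT]+ matches one or more consecutive characters that are not A, C, G, or T
runs = [(m.start(), m.end(), m.group()) for m in re.finditer("[^ACGT]+", seq)]

print(runs)  # [(4, 7, 'YHB')]
```

The end position follows the usual Python slicing convention, so `seq[4:7]` gives back the matched run.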

In [6]:
print(re.sub("[^ACGT]","?","ACGTYHBCGT")) #Replace all characters which are not A,C,G,T with ?
print(re.sub("[^ACGT]","","ACGTYHBCGT")) #Remove all characters which are not A,C,G,T


ACGT???CGT
ACGTCGT
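
If you also want to know how many characters were replaced, `re.subn` works like `re.sub` but returns the new string together with the number of substitutions. A short sketch on the same sequence (the names `cleaned` and `n_replaced` are only for illustration):

```python
import re

seq = "ACGTYHBCGT"

# re.subn returns a tuple: (new string, number of substitutions made)
cleaned, n_replaced = re.subn("[^ACGT]", "", seq)

print(cleaned)     # ACGTCGT
print(n_replaced)  # 3
```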

In [7]:
rexp=r"jobid: (\S+)"
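
The parentheses in this pattern define a group: the part of the match inside them can be retrieved separately. Below is a minimal sketch of how such a pattern might be used. The log line is made up for the illustration, and it is an assumption that the job id is a run of non-whitespace characters directly after "jobid: ".

```python
import re

rexp = r"jobid: (\S+)"  # same pattern as in the cell above

# Hypothetical log line, invented for this example
line = "submitted jobid: 12345.cluster at 10:42"

m = re.search(rexp, line)  # returns None if the pattern is not found
if m is not None:
    jobid = m.group(1)     # group(1) is the text matched by (\S+)
    print(jobid)           # 12345.cluster

# With exactly one group in the pattern, findall returns only the group contents
all_ids = re.findall(rexp, line)
print(all_ids)             # ['12345.cluster']
```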
