Laboratory 03

1. Write regular expressions that match any number of digits greedily. Do not match anything else.


In [ ]:
import re

patt = re.compile( # TODO )
    
assert(patt.match("123"))
assert(patt.match("123ab") is None)

2. Match Hungarian phone numbers. The numbers do not include spaces or hyphens.


In [ ]:
patt = # TODO

assert(patt.match("+36301234567"))
assert(patt.match("+37000000000") is None)
assert(patt.match("+363012345678") is None)

3. Match Hungarian phone numbers that may be grouped in the following ways.


In [ ]:
patt = # TODO

assert(patt.match("+36301234567"))
assert(patt.match("+37000000000") is None)
assert(patt.match("+363012345678") is None)
assert(patt.match("+36 30 123 4566"))
assert(patt.match("+36-30-123-4566"))
assert(patt.match("+36-30-123-45667") is None)

4. Match any floating point numbers. Do not match invalid numbers such as 0.34.1


In [ ]:
patt = # TODO

assert(patt.match("1.9"))
assert(patt.match("1.9.2") is None)

5. Match email addresses.


In [ ]:
patt = # TODO

assert(patt.match("abc@example.com"))
assert(patt.match("abc") is None)
assert(patt.match("abc@example@example.com") is None)

6. Match any number in hexadecimal format. Hexadecimal numbers are prefixed with 0x.


In [ ]:
patt = # TODO

assert(patt.match("0xa"))
assert(patt.match("0x16FA"))
assert(patt.match("16FA") is None)
assert(patt.match("0x16FG") is None)

7. Remove everything between parentheses. Be careful about the next pair of parantheses.


In [ ]:
patt = # TODO

assert(patt.sub("",  "(a)bc") == "bc")
assert(patt.sub("",  "abc") == "abc")
assert(patt.sub("",  "a (bc) de (12)") == "a  de ")

8. Create a simple word tokenizer using regular expressions. What patterns should you split on? Add more tests.


In [ ]:
patt = # TODO

assert(patt.split("simple sentence.") == ["simple", "sentence", ""])
assert(patt.split("multiple \t whitespaces") == ["multiple", "whitespaces"])