Lab 6

1) Write a function that extracts the words from a text given as a parameter. A word is defined as a sequence of alphanumeric characters.


In [7]:
import re

def extract_words(txt):
    # a word is a run of letters and digits; the apostrophe is kept so that "I'm" stays one word
    # findall avoids the empty strings re.split returns when txt starts or ends with a separator
    return re.findall(r"[a-zA-Z0-9']+", txt)

extract_words("Today I'm having a python course")


Out[7]:
['Today', "I'm", 'having', 'a', 'python', 'course']

2) Write a function that takes a regex string, a text string and an integer x, and returns those substrings of length at most x that match the given regular expression.


In [20]:
import re

def match_substring(regexp, txt, length):
    regexp = re.compile(regexp)
    result = re.findall(regexp, txt)
    print result
    # "at most x" means <=, not ==; a list is returned even when nothing matches
    return [x for x in result if len(x) <= length]

match_substring(r'a', "Today I'm having a python course", 4)


['a', 'a', 'a']
Out[20]:
['a', 'a', 'a']

3) Write a function that takes a text string and a list of regular expressions, and returns a list of the strings that match at least one of the regular expressions given as a parameter.


In [28]:
import re

def match_regex_list(txt, regexp_list):
    print regexp_list
    # compile into a new list so the caller's list is not overwritten
    compiled = [re.compile(r) for r in regexp_list]
    print compiled

    result = []
    for regexp in compiled:
        # extend (not append) so the result is a flat list of strings
        result.extend(re.findall(regexp, txt))
    return result

match_regex_list("Today I'm having a python course", [r'ay', r'on'])


['ay', 'on']
[<_sre.SRE_Pattern object at 0x056661E0>, <_sre.SRE_Pattern object at 0x05666330>]
Out[28]:
['ay', 'on']

4) Write a function that takes the path to an XML document and a dictionary attrs, and returns those elements that have all the dictionary keys as attributes, with the corresponding values. For example, for the dictionary {"class": "url", "name": "url-form", "data-id": "item"}, the selected elements are those that have the attributes class="url", name="url-form" and data-id="item".


In [1]:
import os
import xml.etree.ElementTree as ET

def match_XML(file_path):
    file_path = os.path.realpath(file_path)
    # read the raw text with a context manager so the file handle is closed
    with open(file_path, 'r') as f:
        xml_file = f.read()
    tree = ET.parse(file_path)
    root = tree.getroot()
    
    print xml_file
    print '---'
    print tree
    print root.tag, ':: ', root.attrib
    
    for child in root:
        print child.tag, ':: ', child.attrib
    

match_XML("./lab5/game.xml")


<?xml version="1.0" encoding="UTF-8"?>
<game>
	<title>The Imitation Game</title>
	<platform>Android</platform>
	<platform min-version="8">iOS</platform>
	<platform min-version="10">Windows</platform>
	<url>...</url>
	<player>
		<identity>
			<first-name>Dan</first-name>
			<last-name>Alexandru</last-name>
			<!-- other info -->
		</identity>
		<points>1005</points>
	</player>
</game>
---
<xml.etree.ElementTree.ElementTree object at 0x05D44470>
game ::  {}
title ::  {}
platform ::  {}
platform ::  {'min-version': '8'}
platform ::  {'min-version': '10'}
url ::  {}
player ::  {}
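
The cell above only lists the root's direct children and their attributes. A minimal sketch of the filtering step the exercise asks for could look like the following. The names elements_with_all_attrs and match_xml_all are hypothetical, and the sketch assumes that nested elements (not only direct children) should be checked as well.

```python
import xml.etree.ElementTree as ET

def elements_with_all_attrs(root, attrs):
    """Return the elements under root (root included) that carry
    every key/value pair of attrs among their attributes."""
    return [elem for elem in root.iter()
            # elem.get returns None for a missing attribute, so it never equals a string value
            if all(elem.get(key) == value for key, value in attrs.items())]

def match_xml_all(file_path, attrs):
    """Hypothetical wrapper: parse the file, then filter its elements."""
    return elements_with_all_attrs(ET.parse(file_path).getroot(), attrs)
```

With the game.xml shown above, match_xml_all("./lab5/game.xml", {"min-version": "8"}) should select only the iOS platform element. Note that an empty dictionary selects every element, since there is no pair left to fail.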

5) Write another version of the function from the previous exercise that returns those elements that have at least one attribute matching a key-value pair from the dictionary.


In [ ]:
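
This cell was left empty. One possible sketch, assuming the same ElementTree approach as in exercise 4, swaps the "every pair" test for an "any pair" test. The names elements_with_any_attr and match_xml_any are hypothetical.

```python
import xml.etree.ElementTree as ET

def elements_with_any_attr(root, attrs):
    """Return the elements under root (root included) that have at least
    one attribute equal to a key/value pair of attrs."""
    return [elem for elem in root.iter()
            if any(elem.get(key) == value for key, value in attrs.items())]

def match_xml_any(file_path, attrs):
    """Hypothetical wrapper: parse the file, then filter its elements."""
    return elements_with_any_attr(ET.parse(file_path).getroot(), attrs)
```

Here an empty dictionary selects nothing, because no pair can match.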

6) Write a function that, for a text given as a parameter, censors the words that begin and end with a vowel. Censoring means replacing the characters at odd positions with the character * .


In [15]:
import re

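def censor(txt):
    # \b instead of ^...$ so that each word is tested, not the whole text;
    # a single vowel such as "a" also begins and ends with a vowel
    regexp = re.compile(r'\b[aeiou](?:\w*[aeiou])?\b', re.IGNORECASE)
    result = re.findall(regexp, txt)
    if result:
        print result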

censor("Today I'm having a python course")
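
The cell above only looks for the words and does not replace anything yet. A sketch of the replacement step, using re.sub with a callback, is given below. The name censor_text is hypothetical, and "odd positions" is assumed to mean the 0-based indices 1, 3, 5, ... inside each word; if the exercise counts positions from 1, the test on i has to be inverted.

```python
import re

# A word that begins and ends with a vowel; a lone vowel counts as both.
VOWEL_WORD = re.compile(r'\b[aeiou](?:\w*[aeiou])?\b', re.IGNORECASE)

def censor_text(txt):
    """Replace the characters at odd 0-based indices of every matching word with '*'."""
    def mask(match):
        word = match.group(0)
        # assumption: index 0 is kept, index 1 is masked, and so on
        return ''.join('*' if i % 2 else ch for i, ch in enumerate(word))
    return VOWEL_WORD.sub(mask, txt)

print(censor_text("Alice ate an apple on the arena"))
# prints: A*i*e a*e an a*p*e on the a*e*a
```

The sample sentence from the cell above comes back unchanged, because its only matching words ("I" and "a") have a single character.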

7) Check, using a regular expression, whether a string represents a valid CNP (Romanian personal numeric code).


In [79]:
import re

def is_valid_CNP(txt):
    #regexp = re.compile(r"[12]\d{12}") #brute form, with sex verification
    regexp = re.compile(r"([1-9])(\d{2})(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])(\d{6})\Z") #almost complete form
        #first digit encodes sex and century (5 and 6 are needed for births after 2000), (year can be anything),
        #valid month, valid day (without specific month days / leap year checks)
        #\Z rejects anything after the 13th digit, which re.match alone would accept
        #special checks for last 6 characters ...
    result = re.match(regexp, txt)
    if result:
        #print result
        print result.group(0)
        print result.groups()
        for elem in result.groups():
            print elem
        return True
    return False

is_valid_CNP("2950730225780")


2950730225780
('2', '95', '07', '30', '225780')
2
95
07
30
225780
Out[79]:
True
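
The regular expression checks only the shape of the code. The last digit of a CNP is a control digit, and that part cannot be expressed as a regex, so a complementary check could be sketched as below. The function name has_valid_control_digit is hypothetical. The rule used is the commonly documented one: multiply the first 12 digits by the weights 279146358279, take the sum modulo 11, and map a remainder of 10 to 1.

```python
import re

CNP_WEIGHTS = "279146358279"  # control-digit weights, one per digit of the first 12

def has_valid_control_digit(cnp):
    """Return True if cnp has 13 ASCII digits and its last digit matches the control rule."""
    if not re.match(r'[0-9]{13}\Z', cnp):
        return False
    total = sum(int(d) * int(w) for d, w in zip(cnp[:12], CNP_WEIGHTS))
    control = total % 11
    if control == 10:
        control = 1
    return control == int(cnp[12])
```

Under this rule the sample value 2950730225780 passes the regex but fails the control digit (the weighted sum is 315, so the last digit would have to be 7), which shows why the two checks are worth combining.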

8) Write a function that walks a directory recursively and prints those files whose name matches a regular expression given as a parameter, or that contain a string matching the same expression. Files that satisfy both conditions are printed prefixed with ">>".


In [ ]:
import os
import re

def find_regex(dir_path, regexp):
    print os.path.realpath(dir_path)
    regexp = re.compile(regexp)
    #os.walk ...
    #result = re.match(regexp, txt)

find_regex('./lab5', r'exe\d\.py')
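
The cell above stops before the directory walk. A possible sketch of the missing part, written in Python 3 syntax, is given below. The name find_matching_files is hypothetical. It assumes that "matches" means a search anywhere in the file name or content, that files are read as text with undecodable bytes ignored, and that unreadable files count as not matching. It returns the lines instead of printing them so the result is easy to inspect.

```python
import os
import re

def find_matching_files(dir_path, regexp):
    """Return one line per file whose name or content matches regexp;
    files that satisfy both conditions are prefixed with '>>'."""
    pattern = re.compile(regexp)
    lines = []
    for root, _dirs, files in os.walk(dir_path):
        for name in sorted(files):
            path = os.path.join(root, name)
            name_ok = pattern.search(name) is not None
            try:
                # errors='ignore' keeps binary files from raising decode errors
                with open(path, 'r', errors='ignore') as f:
                    content_ok = pattern.search(f.read()) is not None
            except OSError:
                content_ok = False  # assumption: unreadable files do not match
            if name_ok and content_ok:
                lines.append('>>' + path)
            elif name_ok or content_ok:
                lines.append(path)
    return lines
```

A call such as print('\n'.join(find_matching_files('./lab5', r'exe\d\.py'))) would then produce the listing the exercise asks for. Reading each file whole is simple but can be slow on large trees.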