Converting PDF tables to pandas. This is for PDFs that consist of images: photographs, scans, and the like. It also covers text recognition on plain image files (JPG, PNG). The conversion takes three steps:

  1. Convert the PDF pages to images
  2. Recognize the text in the images (known as OCR, optical character recognition)
  3. Convert the text to a table

Installs

ImageMagick and Ghostscript

ImageMagick and Ghostscript are required.

Poppler

Install Poppler, add to PATH

Go to this page and download the binary of your choice. In this example we will download and use poppler-0.68.0. Extract the archive file poppler-0.68.0_x86.7z into C:\Program Files\poppler. Thus the directory structure should look something like this:

C:
└ Program Files
  └ poppler
    └ poppler-0.68.0
      └ bin
      └ include
      └ lib
      └ share

Windows: Add C:\Program Files\poppler\poppler-0.68.0\bin to your system PATH.

Tesseract

Add C:\Program Files\Tesseract-OCR to your system PATH.

Restart Python

Then restart Anaconda, JupyterLab, and Python.
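
After the restart, it can help to confirm that the command-line tools are actually visible from Python. The following is a minimal sketch using only the standard library. The executable names are assumptions (`pdftoppm` is the Poppler tool that pdf2image calls, `tesseract` is the Tesseract binary), and `find_tools` is a helper name made up for this illustration.

```python
import shutil

# Assumed executable names: Poppler ships `pdftoppm`, Tesseract ships `tesseract`.
REQUIRED_TOOLS = ["pdftoppm", "tesseract"]

def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for tool, path in find_tools().items():
    print(tool, "->", path or "NOT FOUND on PATH")
```

If a tool is reported as not found, the PATH change has probably not reached the running Python process yet.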


In [1]:
!pip install Pillow


Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (6.2.1)

In [2]:
!pip install pdf2image


Requirement already satisfied: pdf2image in c:\programdata\anaconda3\lib\site-packages (1.10.0)
Requirement already satisfied: pillow in c:\programdata\anaconda3\lib\site-packages (from pdf2image) (6.2.1)

In [3]:
!pip install pytesseract


Requirement already satisfied: pytesseract in c:\programdata\anaconda3\lib\site-packages (0.3.0)
Requirement already satisfied: Pillow in c:\programdata\anaconda3\lib\site-packages (from pytesseract) (6.2.1)

In [4]:
!pip install opencv-python


Requirement already satisfied: opencv-python in c:\programdata\anaconda3\lib\site-packages (4.1.2.30)
Requirement already satisfied: numpy>=1.14.5 in c:\programdata\anaconda3\lib\site-packages (from opencv-python) (1.17.4)

1


In [5]:
from PIL import Image 
import sys 
from pdf2image import convert_from_path 
import os

In [9]:
# Path of the pdf 
PDF_file = "Lista candidatilor admisi - Licenta - sesiunea septembrie 2017.pdf"

In [10]:
if not os.path.exists('pdf/'+PDF_file+'/'):
    os.makedirs('pdf/'+PDF_file+'/')

We create a folder where the PDF pages will be exported as images.
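
The same folder creation can be sketched with `os.makedirs(..., exist_ok=True)`, which makes the existence check unnecessary. The helper name `page_folder` and its `root` parameter are assumptions introduced for this example only.

```python
import os

def page_folder(pdf_name, root="pdf"):
    """Create (if needed) and return the folder that holds the page images."""
    folder = os.path.join(root, pdf_name)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return folder
```

Calling `page_folder(PDF_file)` would give the same `pdf/<file name>` folder as the cell above, and calling it twice is harmless.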


In [11]:
# Store all the pages of the PDF in a variable 
# The second argument is the resolution (DPI); try values between 200 and 600
pages = convert_from_path(PDF_file, 300) 
  
# Counter to store images of each page of PDF to image 
image_counter = 1
  
# Iterate through all the pages stored above 
for page in pages: 
  
    # Declaring filename for each page of PDF as JPG 
    # For each page, filename will be: 
    # PDF page 1 -> page_1.jpg 
    # PDF page 2 -> page_2.jpg 
    # PDF page 3 -> page_3.jpg 
    # .... 
    # PDF page n -> page_n.jpg 
    filename = 'pdf/'+PDF_file+"/page_"+str(image_counter)+".jpg"
    print(image_counter,'oldal kész..')  # 'oldal kész' = 'page done'
      
    # Save the image of the page in system 
    page.save(filename, 'JPEG') 
  
    # Increment the counter to update filename 
    image_counter = image_counter + 1


1 oldal kész..
2 oldal kész..
3 oldal kész..
4 oldal kész..
5 oldal kész..
6 oldal kész..
7 oldal kész..
8 oldal kész..
9 oldal kész..
10 oldal kész..
11 oldal kész..
12 oldal kész..
13 oldal kész..
14 oldal kész..
15 oldal kész..
16 oldal kész..
17 oldal kész..
18 oldal kész..
19 oldal kész..
20 oldal kész..
21 oldal kész..
22 oldal kész..
23 oldal kész..
24 oldal kész..
25 oldal kész..
26 oldal kész..
27 oldal kész..
28 oldal kész..
29 oldal kész..
30 oldal kész..
31 oldal kész..
32 oldal kész..
33 oldal kész..
34 oldal kész..
35 oldal kész..
36 oldal kész..

2

Text recognition on the images


In [12]:
import pytesseract

In [13]:
# Variable to get count of total number of pages 
filelimit = image_counter-1
  
# Creating a text file to write the output 
outfile = 'pdf/'+PDF_file+"/text.txt"
  
# Open the file in append mode so that  
# all contents of all images are added to the same file 
# (note: re-running this cell appends the text again; delete text.txt first)
f = open(outfile, "a") 
  
# Iterate from 1 to total number of pages 
for i in range(1, filelimit + 1): 
  
    # Set filename to recognize text from 
    # Again, these files will be: 
    # page_1.jpg 
    # page_2.jpg 
    # .... 
    # page_n.jpg 
    filename = 'pdf/'+PDF_file+"/page_"+str(i)+".jpg"
          
    # Recognize the text in the image as a string using pytesseract 
    text = str(pytesseract.image_to_string(Image.open(filename)))
    print(i,'oldal kész..')  # 'oldal kész' = 'page done'
  
    # The recognized text is stored in variable text 
    # Any string processing may be applied on text 
    # Here, basic formatting has been done: 
    # In many PDFs, at line ending, if a word can't 
    # be written fully, a 'hyphen' is added. 
    # The rest of the word is written in the next line 
    # Eg: This is a sample text this word here GeeksF- 
    # orGeeks is half on first line, remaining on next. 
    # To remove this, we replace every '-\n' to ''. 
    text = text.replace('-\n', '')     
  
    # Finally, write the processed text to the file. 
    f.write(text) 

# Close the file after writing all the text. 
f.close()


1 oldal kész..
2 oldal kész..
3 oldal kész..
4 oldal kész..
5 oldal kész..
6 oldal kész..
7 oldal kész..
8 oldal kész..
9 oldal kész..
10 oldal kész..
11 oldal kész..
12 oldal kész..
13 oldal kész..
14 oldal kész..
15 oldal kész..
16 oldal kész..
17 oldal kész..
18 oldal kész..
19 oldal kész..
20 oldal kész..
21 oldal kész..
22 oldal kész..
23 oldal kész..
24 oldal kész..
25 oldal kész..
26 oldal kész..
27 oldal kész..
28 oldal kész..
29 oldal kész..
30 oldal kész..
31 oldal kész..
32 oldal kész..
33 oldal kész..
34 oldal kész..
35 oldal kész..
36 oldal kész..

3


In [14]:
import pandas as pd

Reading in the recognized text


In [15]:
with open(outfile,'r') as f:
    pages=f.read()

Splitting the text into lines at newline characters (\n)


In [16]:
lines=[i for i in pages.split('\n') if i]

We keep only the lines that start with a digit


In [17]:
good_lines=[line for line in lines if line[0].isdigit()]
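
The two steps above can be tried on a tiny input. In this sketch the two table rows are taken from the notebook output, while the header and noise lines around them are made up for illustration.

```python
# Illustrative OCR text: the two table rows appear in the notebook output,
# the other lines are invented stand-ins for headers and noise.
sample_text = (
    "Nr. Leg. Nume\n"
    "\n"
    "1 5501 MITITEL P PAVEL 9.23 10.00 8.00 ADMIS\n"
    "Facultatea\n"
    "2 5442 BIDIU D ALINA NICOLETA 9,03 9.40 8.90 ADMIS\n"
)

sample_lines = [line for line in sample_text.split('\n') if line]       # drop empty lines
sample_rows = [line for line in sample_lines if line[0].isdigit()]      # keep table rows

print(sample_rows)
```

The filter is deliberately crude: a noise line that happens to start with a digit passes, and a table row whose leading number was misread as a letter is lost. Both cases have to be caught in the later steps.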

Fixing typical misrecognized characters


In [18]:
good_lines=[line.replace('_',' ').replace('. ',' ').replace('-',' ').replace('—',' ')\
     .replace('~',' ').replace('=',' ').replace('  ',' ').replace('  ',' ')\
     .replace('»',' ') for line in good_lines]
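
A roughly equivalent cleanup can be sketched with regular expressions. This is an alternative, not the notebook's method: `clean_line` is a hypothetical helper, it collapses runs of spaces of any length, and it also strips leading and trailing spaces, which the chained `replace` calls do not do. The input in the usage note is invented.

```python
import re

def clean_line(line):
    """Replace stray rule and border characters with spaces, then squeeze spaces."""
    line = re.sub(r"[_\-—~=»]", " ", line)   # characters OCR tends to produce from table lines
    line = line.replace(". ", " ")           # a period followed by a space
    return re.sub(r" {2,}", " ", line).strip()
```

For example, `clean_line("17 5505 MARC_V DIANA — MARIA  8,06 = 8.65")` gives `'17 5505 MARC V DIANA MARIA 8,06 8.65'`.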

In [19]:
good_lines[:10]


Out[19]:
['1 5501 MITITEL P PAVEL 9.23 10.00 8.00 ADMIS',
 '2 5442 BIDIU D ALINA NICOLETA 9,03 9.40 8.90 ADMIS',
 '3 5168 BOGDAN G LUCIAN BOGDAN 8,61 8.95 7.00 ADMIS',
 '4 5166 POP O JULIA CRISTINA 8,41 8.65 7.60 ADMIS',
 '5 5101 KADAR R ADRIAN LUCIAN 8,36 8.00 8.65 ADMIS',
 '6 5023 KUI K. A ROBERT SZILARD 8,36 7.40 9.25 ADMIS',
 '7 5131 MITITELU I DIANA IOANA 8,23 9.15 6.35 ADMIS',
 '8 5572 RATIU C.M FLAVIA TEODORA 8,23 6.95 9.10 ADMIS',
 '9 5577 SFITLIC P DENIS PETRU 8,13 8.50 7.80 ADMIS',
 '10 5804 TROFIM V VALERIA 8,00 9.00 7.00 ADMIS']

Further manual filtering of bad lines


In [20]:
for i,l in enumerate(good_lines):
    print(str(i)+':::'+l)


0:::1 5501 MITITEL P PAVEL 9.23 10.00 8.00 ADMIS
1:::2 5442 BIDIU D ALINA NICOLETA 9,03 9.40 8.90 ADMIS
2:::3 5168 BOGDAN G LUCIAN BOGDAN 8,61 8.95 7.00 ADMIS
3:::4 5166 POP O JULIA CRISTINA 8,41 8.65 7.60 ADMIS
4:::5 5101 KADAR R ADRIAN LUCIAN 8,36 8.00 8.65 ADMIS
5:::6 5023 KUI K. A ROBERT SZILARD 8,36 7.40 9.25 ADMIS
6:::7 5131 MITITELU I DIANA IOANA 8,23 9.15 6.35 ADMIS
7:::8 5572 RATIU C.M FLAVIA TEODORA 8,23 6.95 9.10 ADMIS
8:::9 5577 SFITLIC P DENIS PETRU 8,13 8.50 7.80 ADMIS
9:::10 5804 TROFIM V VALERIA 8,00 9.00 7.00 ADMIS
10:::11 5527 CHERECHES A.M ALEXANDRU 7,93 7.30 7.30 ADMIS
11:::1 5425 BEDEA M.D MIHAELA ELENA 9,48 10.00 9.30 ADMIS
12:::2 5088 CSORVASI F CSILLA 9,46 8.95 9.25 ADMIS
13:::3 5046 VERESS I IZABELA 8,88 8.55 8.62 ADMIS
14:::4 5490 SALAJAN D SERGIU DANIEL 8,75 6.40 7.10 ADMIS
15:::5 5008 DOLHA D ANGEL MANUEL 8,58 8.70 8.25 ADMIS
16:::6 5152 SZOLOMAYER IG ROLAND 8,16 8.10 6.75 ADMIS
17:::7, 5505 MARC V DIANA MARIA 8,06 8.65 8.00 ADMIS
18:::8 5035 SERDEAN AL MARIUS STELIAN 7,30 335 6.50 ADMIS
19:::9 5056 MIHON M PETRU 6,85 5.85 7.00 ADMIS
20:::10 5468 BALAZS A TIMEA 6,66 7.60 6.00 ADMIS
21:::11 5030 CIOT P DRAGOS BOGDAN 6,63 7.70 5.00 ADMIS
22:::12 5036 MOLDOVAN F ANDREI AUREL 6,43 6.80 6.10 ADMIS
23:::13 5471 TOTH M ROBERT TAMAS 6,10 7.00 5.80 ADMIS
24:::14 5032 SANDU C NARCIS 6,05 5.70 5.50 ADMIS
25:::7 TH ET ERCELLENTIA
26:::1 5065 PALATKA A KAMILLA 8,88 9.50 8.20 ADMIS
27:::2 5506 HORVATH T CSABA TIBOR 8,01 8.95 8.30 ADMIS
28:::3 5074 LORINCZ I ISTVAN ZOLTAN 7,68 7.80 8.40 ADMIS
29:::4 5090 TOROK S FELIX 7,56 9.05 7.55 ADMIS
30:::1 5496 UTA M ANISOARA MARCELA 8,68 8.05 8.80 ADMIS
31:::2 5085 HARASTASAN D ANDREEA MARIA 8,23 8.20 9.00 ADMIS
32:::3 5414 TUDIC V ANA 7,88 FAS 9.50 ADMIS
33:::4 5822 GHILESCU G MARINA 7,75 8.00 9.00 ADMIS
34:::5 5452 ABRUDAN R ROXANA ALEXANDRA Tadd 7.40 6.90 ADMIS
35:::6 5112 ZALANYI CS.L BRIGITTA ZSUZSA 7,71 7.65 7.30 ADMIS
36:::7 5516 TAMAS P SIMINA MARIA 7,70 7.30 6.90 ADMIS
37:::8 5473 BARBUR D PAUL STELIAN 7,61 7.20 7.45 ADMIS
38:::9 5429 CALIN G VASILE 7,45 VAS 6.50 ADMIS
39:::10 5438 BALAN F DIANA GEORGIANA 7,36 6.40 8.75 ADMIS
40:::11 5031 TODORAN I KYRA 7,16 6.55 6.20 ADMIS
41:::12 5437 BULGAREAN E ANDA 7,11 7.10 7.20 ADMIS
42:::13 5543 FODOR CD DIANA IOANA 7,10 6.20 7.50 ADMIS
43:::14 5484 BUTUNOI I CARMEN DIANA 7,05 7.00 5.75 ADMIS
44:::15 5816 DINCA I PETRISOR GORE 7,00  ADMIS
45:::16 5406 CZULI S SORINA BIANCA 6,90 8.55 6.30 ADMIS
46:::17 5124 MIHALY ZS ROBERT 6,75 7.10 6.30 ADMIS
47:::18 5069 SIMA I HORIA OVIDU 6,71 6.45 5.65 ADMIS
48:::19 5169 SUCIU G SERGIU GABRIEL 6,71 5.00 6.40 ADMIS
49:::20 5079 HULEA N MADALIN 6,68 9.35 5.30 ADMIS
50:::21 5407 MURESAN D MIHAI CIPRIAN 6,60 6.65 6.15 ADMIS
51:::22 5573 CHIRA V ALEX SEBASTIAN 6,58 6.50 6.10 ADMIS
52:::23 5104 TELEPTEAN P MIHAI ANDREI 6,58 5.85 5.10 ADMIS
53:::24 5550 VASIU S.D ALEXANDRU MADALIN 6,46 5.00 5.00 ADMIS
54:::25 5037 SZEKELY V SZABOLCS VILMOS 6,43 6.50 7.10 ADMIS
55:::26 5094 MATIES V MARIA 6,43 6.00 7.25 ADMIS
56:::27 5409 BORLESCU G EDWARD GEORGE STEFAN 6,41 6.85 5.85 ADMIS
57:::28 5559 GHERMAN G JONUT CRISTIAN 6,31 6.35 6.20 ADMIS
58:::29 5132 OLTEAN C IOANA ANDREEA 6,30 5.75 5.75 ADMIS
59:::30 5178 TARMURE D IOSIF EMANUEL 6,30 5.00 7.20 ADMIS
60:::31 5162 ANDREICA G ALEXANDRU MIHAI 6,23 8.10 5.00 ADMIS
61:::32 5493 BOROS G GEORGIANA VERONICA 6,23 5.00 7.50 ADMIS
62:::33 5486 DANCIULESCU O.D BOGDAN CIPRIAN 6,16 6.30 5.20 ADMIS
63:::5158 POP GH. C LOREDANA EMANUELA
64:::5006 CHINDRIS M BOGDAN IOAN
65:::35.
66:::36,
67:::37 5135 FECHETE M DENISA IONELA
68:::38 5021 SERUNA O.S OVIDIU SORIN
69:::6,06 “ 5.00 6.90 ADMIS
70:::6,05 755 5.10 ADMIS
71:::1, 5081 IACUBOVICI F ILARIU CONSTANTIN 8,47 6.50 8.10 ADMIS
72:::2 5040 DARABAN I.S CARINA GIULIANA 8,30 8.60 7.60 ADMIS
73:::3 5464 DOBRA E EMILIAN ANGEL 8,05 8.90 7.40 ADMIS
74:::4 5443 POP A OANA ADRIANA 7,85 8.90 6.50 ADMIS
75:::5 5140 PADUREAN I.D ANDREEA MIHAELA 7,60 TAS 8.35 ADMIS
76:::6 5141 SIVU A.G ALEXANDRA GEORGIANA 713 7.70 5.00 ADMIS
77:::7 5502 ILENI 1 IOAN 6,35 6.93 6.00 ADMIS
78:::8 5526 LAZAR GA MARIA ALEXANDRA 6,28 7.00 5.75 ADMIS
79:::9 5575 MURESAN I ANA MARIA 6,10 5.10 6.30 ADMIS
80:::1 5921 LUKACS GH.E LORAND ISTVAN 8,48 7 9.62 7.05 ADMIS
81:::2 5924 GRAMA I ALEXANDRU NICU 7,69 8.20 6.25 ADMIS
82:::3 5931 KANYO SZ HANNA BARBARA 7,20 6.70 8.40 ADMIS
83:::4 5922 TEPES I FLORINA CRISTINA 6,75 8.10 6.60 ADMIS
84:::5 5927 TANASE Z MARIAN GHEORGHE 6,21 6.35 7.30 ADMIS
85:::6 5923 KAJCSA I LORAND 6,15 6.00 7.45 ADMIS
86:::7 5910 BALEA N..A ADINA 6,10 6.35 6.90 ADMIS
87:::1 5920 RUJAN A ADRIAN DORIN 6,38 8.60 5.55 ADMIS
88:::1 5919 VARZA Z ANDREA BOGLARKA 8,80 8.50 9.90 ADMIS
89:::2 5918 LORINCZ K LILLA TUNDE 8,53 8.75 9.60 ADMIS
90:::3 5913 SZAKACS I ZSUZSA 8,05 8.62 9.66 ADMIS
91:::4 5902 SIMAN N MARIANA IDA 7,14 7.00 6.25 ADMIS
92:::5 5904 HAAN DEAK B ESZTER 6,57 7.60 7.65 ADMIS
93:::6 5930 BALAZS Z RITA 6,11 8.25 5.50 ADMIS
94:::1 5914 FARKAS B ANDREA 6,30 8.90 ADMIS
95:::2 5903 DAVID I HUNOR 6,13 7.80 5.00 ADMIS
96:::3 5905 BARABAS J AKOS 6,02 6.20 7.80 ADMIS
97:::1 5911 DEAK S IBOLYA 9,05 10,00 9.00 ADMIS
98:::2 5928 ALBU A ATTILA LEVENTE 8,11 8.00 6.00 ADMIS
99:::3 5908 POPA C ANNAMARIA 7,93 7.10 7.55 ADMIS
100:::4 5901 BRANZEA F CRISTINA 7,86 8.85 7.35 ADMIS
101:::5 5907 GASPAR J HANNA BARBARA 7,61 7.00 8.60 ADMIS
102:::6 5932 ORDOG G GABORISTVAN 7,39 8.50 7.40 ADMIS
103:::7 5925 KEDVES L ERIKA 7,29 6.70 7.50 ADMIS
104:::8 5915 MEZEI B LENKE 7,29 5.16 9.00 ADMIS
105:::9 5906 TATAR N IMOLA 7,25 8.80 8.05 ADMIS
106:::10 5917 DEAK GY GYULA 7,18 9.00 7.00 ADMIS
107:::11 5926 BEKE L, CSONGOR ZSOLT 7,11 8.50 8.00 ADMIS
108:::12 5916 JANCSO G HUNOR ISTVAN 7,07 7.10 9.20 ADMIS
109:::13 5909 PAPP G GABRIELLA 6,44 8.00 6.20 ADMIS
110:::14 5929 SZAKACS V EVA 6,35 8.00 5.65 ADMIS
111:::15 5912 ZSIGMOND T ATTILA 6,05 5.00 6.90 ADMIS
112:::1 5497 TRIF M ALEXANDRA MARIA 7,83 7.20 7.75 ADMIS
113:::2 5179 BOGAN RUS D MARIUS 6,67 7.20 6.95 ADMIS
114:::1 5155 REZNICIUC E CATALIN MIHAI 8,23 8.20 7.15 ADMIS
115:::2 5491 PORIME I VASILE SERGIU 7,30 6.10 6.90 ADMIS
116:::3 5049 PAVEL D ADRIANA UAS 6.50 6.40 ADMIS
117:::4 5552 SINPETREAN D ALEXANDRA OANA 6,33 5.40 6.70 ADMIS
118:::5 5144 NEMES I ROLLAND 6,08 6.15 6.55 ADMIS
119:::1 5161 MAIDANCIUC IONEL D.P LOREDAN 8,64 6.25 8.30 ADMIS
120:::2 5444 DUNCA C CORNELIU 8,25 6.35 8.10 ADMIS
121:::3 5817 PASCARIUC | DOINA 8,00 8.00 8.00 ADMIS
122:::4 5128 MOLDOVAN D ANDREI FINEAS 7,85 6.20 9.05 ADMIS
123:::5 5076 VASALIUT D DANIELA MARIA 7,75 8.50 6.55 ADMIS
124:::6 5455 HOLBURA V VALENTINA 7,70 7.30 7.95 ADMIS
125:::7 5108 NEAGOIE HV DIANA MARIA 7,63 7.80 7.60 ADMIS
126:::8 5512 POP A DARIUS L335 6.70 5.45 8.65 ADMIS
127:::9 5465 GRETA I SERGIU CLAUDIU 7,48 7.50 8.30 ADMIS
128:::10 5098 FODOR H ALEXANDRU CRISTIAN 7,41 8.50 5.10 7.10 ADMIS
129:::12, 5445 OLTEANU P.O ERVIN ADRIAN 7,23 7.15 7.10 ADMIS
130:::13 5412 AVRAM M MARIA BIANCA 7,13 6.40 7.80 ADMIS
131:::14 5564 RUSU E GEANINA ALINA 7,08 §.15 7.30 ADMIS
132:::15 5802 FIRTOS R ROXANA CASSANDRA 7,00  ADMIS
133:::16 5077 KIS Z HENRIETTA 6,97 6.65 7.00 ADMIS
134:::17 5570 PRICHICI V PAUL 6,90 6.00 5.30 ADMIS
135:::18 5073 NASTASA V CAMELIA 6,80 7.95 7.10 ADMIS
136:::19 5153 MAKI BOJTE E GYOPAR 6,60 6.25 6.20 ADMIS
137:::21 5012 DARABAN D DARIUS DANIEL GHEORGHE 6,51 6.25 7.65 ADMIS
138:::22 5092 PETER G MIHAI GABRIEL 6,46 6.60 5.00 ADMIS
139:::23 5415 SZASZ AL DAVID SANDOR 6,45 5.50 6.75 ADMIS
140:::24 5115 ULICI N ADELA 6,38 5.50 6.60 ADMIS
141:::25, 5571 NICOARA D GILDA MARIA 6,30 6.10 5.20 ADMIS
142:::26 5043 CULCER I NARCISA MINODORA 6,23 5.95 5.00 ADMIS
143:::27 5537 KALLO M MIHAI ROBERT 6,13 6.80 6.15 ADMIS
144:::28 5563 MOLDOVAN F FLAVIUS ALEX 6,13 6.00 6.20 ADMIS
145:::1 5423 HAJA A ALEXANDRA 9,40 10.00 9.00 ADMIS
146:::2 5557 VAIDA C TUDOR MIRCEA 9,26 9.20 9.20 ADMIS
147:::3 5565 FRENT I SIMON FLORIN 8,75 9.45 9.80 ADMIS
148:::4 5403 ORIAN A.V ALIN SORIN 8,71 8.65 8.20 ADMIS
149:::5 5143 PAUN G GEORGETA PAULA 8,61 7.80 8.40 ADMIS
150:::6 5821 NECHITA F.V ALEXANDRU 8,60  ADMIS
151:::8 5419 SABO M VLAD 8,36 9.50 7.35 ADMIS
152:::9 5474 TODEAN D RAMONA PERSIDA 8,30 6.65 9.00 ADMIS
153:::10 5430 BOGDAN O.V ANDREEA EMILIA 8,25 9.15 7.00 ADMIS
154:::11 5091 PRISACARIU C CATALIN CORNELIU 8,06 7.80 6.70 ADMIS
155:::12 5080 BALEA L ANDREEA ADINA 7,96 7.85 8.10 ADMIS
156:::13 5420 CUC T OTILIA MARIA 7,95 6.55 8.20 ADMIS
157:::14 5103 SIPOS I ANDREEA MARIA 7,88 7.15 8.40 ADMIS
158:::16 5160 VLASA D SERGIU TIBERIU 7,83 5.00 5.30 ADMIS
159:::17 5548 MONE G GABRIELA ALINA 7,81 7.00 7.60 ADMIS
160:::18 5100 BOTISAN GN BIANCA LAURA VASILICA 7,81 6.20 9.00 ADMIS
161:::1 5553 NEAMT G RAUL DANUT 9,48 9.80 8.95 ADMIS
162:::2 5180 VASILE A MIHAI ADRIAN 8,75 9.15 8.30 ADMIS
163:::3 5509 ILIE C DIANA 8,50 9.25 7.15 ADMIS
164:::4 5044 OTEL I FLOAREA CORINA 8,44 6.45 7.80 ADMIS
165:::5 5020 DARABANT A VLAD MIHAI 8,29 6.15 7.50 ADMIS
166:::6 5418 BOTOS V GIORGIANA BIANCA 8,23 7.45 8.70 ADMIS
167:::7 5109 KUI I MELINDA IMOLA 8,19 6.70 9.10 ADMIS
168:::8 5019 GONCZI A NOEMI 8,01 5.10 9.75 ADMIS
169:::9 5448 CIORBA V MARIA DAIANA 7,88 7.60 8.20 ADMIS
170:::10 5538 STEFANICS AD RAFAEL DANIEL 7,63 8.35 7.90 ADMIS
171:::11 5475 GRECU P ELENA CRISTINA 7,30 6.00 9.60 ADMIS
172:::12 5544 POP S COSMIN Fld 7A5 6.20 ADMIS
173:::13 5469 STETCO GG IUSTIN RENE IONUT 6,46 5.00 6.80 ADMIS
174:::14 5151 RUS I MARIAAURA 6,06 5.80 5.45 ADMIS
175:::15 5114 SUATEAN A ALINAANDREEA 6,03 5.00 5.15 ADMIS
176:::1 5014 MARICA A.C ALESANDRA 9,61 10.00 9.40 ADMIS
177:::2 5810 CARDOS SIBEL DIANA 9,40  ADMIS
178:::1 5569 ROTAR S.I IOAN ADRIAN 9,71 9.85 9.50 ADMIS
179:::2 5428 KIRALY LL LOREDANA 9,06 855 9.10 ADMIS
180:::3 5562 DANCIU IS VLAD 9,00 9.00 ‘8.00 ADMIS
181:::4.5110 BALICA N BEATRICE COSSETTE 8,98 8.90 9.00 ADMIS
182:::5 5410 NAGY S BEATA 8,65 10.00 9.45 ADMIS
183:::0
184:::1, 5129 MURESAN V ANDRE] 9,56 9.10 9.65 ADMIS
185:::2 5041 BENTA O AMALIA LUCIA 9,51 9.85 8.80 ADMIS
186:::3 5038 DUHASCHI V ANCUTA FLORICA 9532 7.80 10.00 ADMIS
187:::4 5148 HANUSCHI V MIRIAM 9,30 9.50 9.25 ADMIS
188:::5 5449 LARGEAN D ANDREEA 9.21 9.05 9.25 ADMIS
189:::6 5120 RADU V RALUCA ALEXANDRA 9,13 9.50 8.40 ADMIS
190:::7, 5576 HEGHIES I ANDREEA CRISTINA 9,11 8.60 9.75 ADMIS
191:::8 5165 MAN I GEORGIANA 9,10 9.70 8.60 ADMIS
192:::9 5111 ANISOREAC I SONIA 8,95 8.80 9.50 ADMIS
193:::10 5520 ILIS G IOANA 8,83 9.50 8.10 ADMIS
194:::11 5561 RUS B ANDREEA 8,81 8.75 9.00 ADMIS
195:::12 5453 HANECZ I ROLAND ARTUR 8,81 7.80 9.55 ADMIS
196:::13 5045 PARASCHIVA C GEORGE 8,75 8.60 8.25 ADMIS
197:::14 5466 DIMA A ALEXANDRU 8,73 9.45 7.55 ADMIS
198:::15 5064 MORARU M ANDREEA MADALINA 8,61 8.65 7.45 ADMIS
199:::16 5519 TRIF T TEODOR VASILE 8,46 5.00 7.15 ADMIS
200:::17 5149 HANUSCHI V ELIZA TABITA 8,45 7.90 6.35 ADMIS
201:::18 5439 BUCUR V SIMINA 8,33 7.00 9.00 ADMIS
202:::19 5107 NAGHI V IOANA ANCUTA 8,31 5.40 7.30 ADMIS
203:::21 5013 ANITEI V ELENA IULIANA 8,16 9.35 7.15 ADMIS
204:::22 5130 MOLDOVAN D RADU BOGDAN 8,12 8.85 5.15 ADMIS
205:::23, 5123 CUCIUREAN I OANA CAMELIA 8,12 5.00 8.00 ADMIS
206:::24 5515 MOLDOVEANU C ALEXANDRA DIANA 8,10 Tadd 7.95 ADMIS
207:::25 5063 POP GH CRISTINA GABRIELA 8,02 5.20 7.20 ADMIS
208:::26 5411 STOICA O ANDREEA 8,00 7.70 5.00 ADMIS
209:::27 5542 IOSIF G GEORGIANA IOANA 7,96 7.50 6.25 ADMIS
210:::28 5555 BUDULAU I LOREDANAIOANA 7,80 7.50 8.00 ADMIS
211:::29 5173 OLTEAN L.V CRISTINA DOROTHEA 7,64 7.25 5.75 ADMIS
212:::30 5175 TIMIS GH SIMINA 7,48 9.05 8.00 ADMIS
213:::31 5560 COSTE N RADU FLORIN 7,46 7.70 8.30 ADMIS
214:::32 5489 BALINT I ALEXANDRA ANDREEA 7,41  8.10 ADMIS
215:::33 5102 MORAR M FLAVIU CRISTIAN 7,36 TAS 6.85 ADMIS
216:::34 5078 CIOBANU N NICOLAE 7,18 oot 5.00 7.25 ADMIS
217:::35 5477 CHIOREAN V VICTORITA AURELIA 7,007" 6.10 7.00 ADMIS
218:::36 5025 COVRIG P CARMEN LENUTA 679 a 8.30 5.00 ADMIS
219:::37 5535 POPA V DANIELA MARIA
220:::38 5440 POP A.O NICOLAE SIMION
221:::39 5432 PINTEA P ANA IBOLYA
222:::40 5142 MICU A SERGIU ALIN
223:::41 5525 SOCACIU LC OVIDIU DANIEL
224:::42 5119 PRISCORNITA C.I ALEXANDRU FLORIN
225:::43 5488 IAKAB F ANNA MARIA
226:::6,43
227:::6,36
228:::6,35
229:::6,31
230:::6,30
231:::6,13
232:::6,05
233:::3.12
234:::5.00
235:::5.90
236:::6.60
237:::5.00
238:::5.80
239:::6.65
240:::5.25 ADMIS
241:::8.10 ADMIS
242:::6.15 ADMIS
243:::5.05 ADMIS
244:::5.50 ADMIS
245:::5.60 ADMIS
246:::5.00 ADMIS
247:::1 5811 DONAT F LAVINIA DENISA 6,96 7.00 7.40 ADMIS
248:::1 5454 POPA C IOANA 8,63 7.60 9.00 ADMIS
249:::1 5164 ROTUND E CRISTINA VOICHITA 9,68 10.00 9.35 ADMIS
250:::2 5805 VRABIE A ANA CAROLINA 8,25 8.00 8.00 ADMIS
251:::3 5007 BOLOS FS CRISTINA DIANA 8,01 7.85 8.25 ADMIS
252:::4 5004 MURESAN P PAULA PATRICIA 7,95 9.05 5.90 ADMIS
253:::5 5436 PUSCAS V.S SORINA MARIA 7,73 8.45 6.25 ADMIS
254:::6 5508 BUCUR M PATRICIA 7,68 8.65 6.15 ADMIS
255:::7 5408 SZABO A.C KRISZTIAN ROLAND 6,35 6.90 5.80 ADMIS
256:::1, 5855 MIHALI S ALEXANDRA ROXANA 8,70 8.45 8.30 ADMIS
257:::2 5866 TIPLEA I FLORENTINA IOANA 8,33 7.10 6.80 ADMIS
258:::3 5860 GHEORGHEI D MARIANA 8,28 8.00 8.45 ADMIS
259:::4 5864 MOLNAR N KRISTINA 7,89 6.75 7.50 ADMIS
260:::5, 5865 STAN T MARIANA ADRIANA 7,33 9.10 7.90 ADMIS
261:::6 5868 PETRI I ANDREEA 7,11 8.20 5.15 ADMIS
262:::7 5863 BLEDEA V VASILE IONUT 6,26 6.35 6.20 ADMIS
263:::1 5087 ROZNICIUC I LAURENTIU COZMIN 8,56 8.90 8.15 ADMIS
264:::2 5857 HOLMU N CARMEN ADRIANA 8,18 8.40 8.25 ADMIS
265:::3 5854 IUSCO V FLOAREA IULIANA 7,42 7.90 6.80 ADMIS
266:::4 5867 DUBOVICI I.M IOAN ALEXANDRU 7,36 5.60 8.10 ADMIS
267:::5 5856 TIVADAR P PETRU DANUT 1.35 5.00 6.70 ADMIS
268:::6 5852 PETRAN G ANDREEA 6,95 5.60 6.00 ADMIS
269:::7 5861 TIFRAC I CARINA PETRUTA 6,91 6.85 6.10 ADMIS
270:::8 5853 CUCICEA I CRINA IONELA 6,56 7.25 7.00 ADMIS
271:::9 5862 SAS V IONELA CRISTINA 6,56 6.80 5.10 ADMIS
272:::10 5851 MAN FS GABRIEL ALIN 6,55 5.05 6.40 ADMIS
273:::11 5859 SAS PI ALEXANDRU PETRISOR 6,33 7.10 6.25 ADMIS
274:::12 5858 POP V IOANA FLOAREA 6,10 5.60 6.00 ADMIS
275:::1 5137 JECAN DF ALEXANDRU 9,38 8.60 9.90 ADMIS
276:::2 5138 TOADER S.A DIANA FLAVIA 9,36 10.00 8.65 ADMIS
277:::3 5002 POP S BIANCA MARIA 9,01 8.30 9.25 ADMIS
278:::1 5066 PUSCAS V ANDREEA 9,43 9.40 9.35 ADMIS
279:::2 5812 HONCIUC V DENISA 9,26 9.90 7.55 ADMIS
280:::4 5105 MACARIE GH GINA CARLA 9,11 9.60 8.25 ADMIS
281:::5 5150 VERES I SZILARD 9,11 9.20 9.00 ADMIS
282:::6 5113 PASZTOR R CARINA CEZARA 9,11 8.60 9.10 ADMIS
283:::7 5521 COJOCNEAN T TEODOR VASILE VLADUT 9,06 7.90 7.20 ADMIS
284:::8 5157 GRIGORE I MADALINA 8,92 8.55 6.90 ADMIS
285:::9 5401 CHIS V MADALINA OFELIA 8,85 8.80 8.65 ADMIS
286:::10 5116 NICA I STEFAN ANDREI 8,50 8.40 8.10 ADMIS
287:::11 5523 TIMAR A ANDREI DAN 8,46 8.00 8.60 ADMIS
288:::12 5039 LATCU I M IOAN MARIUS 8,35 9.25 8.35 ADMIS
289:::13 5482 CIMPEAN S.D CALIN DAN 8,26 7.75 7.60 ADMIS
290:::14 5547 SPATARU C COSMINA IOANA 8,21 8.05 7.70 ADMIS
291:::15 5172 SERBAN M CIPRIAN ANDREI 8,16 8.75 6.10 ADMIS
292:::16 5154 MERA I IOANA DANIELA 8,08 7.60 8.60 ADMIS
293:::17 5018 GEORZA I DRAGOS 7,68 5.75 5.95 6.00 ADMIS
294:::18 5554 MOCAN E DARIUS NICOLAE 7,61 7.15 8.50 ADMIS
295:::19 5495 BORDA M MIRCEA RAZVAN 7,55 8.05 7.60 ADMIS
296:::22 5807 MOLDOVAN T SABINA MIRELA 7,40 7.60 8.20 ADMIS
297:::23 5047 JINGAROIU C PATRICK LEONHARD Lod 6.80 8.35 ADMIS
298:::24 5072 DEUSAN D MIHAELA IULIA 7,36 7.45 7.35 ADMIS
299:::25 5067 JIMBOREAN M ANDA CATALINA 7,36 6.80 6.90 ADMIS
300:::26 5819 GHIURCAN V ADRIANA SIMONA 7,30 9.00 5.50 ADMIS
301:::27 5492 IONAS V HORATIU SEBASTIAN 7,28 9.15 5.10 ADMIS
302:::28 5540 ILEA S COSMIN EUGEN 7,22 6.00 5.00 ADMIS
303:::29 2451 OTVOS V SAMUEL VIOREL 7,20 6.60 7.50 ADMIS
304:::30 5053 JUCAN T EUGENIA ANCUTA 7,09 5.05 5.00 ADMIS
305:::31 5460 MOROSANU GD ROBERT DANIEL 7,05 5.70 7.00 ADMIS
306:::32 5417 SOPOREAN M MONELA 6,75 5.10 6.75 ADMIS
307:::33.
308:::5450 MIC D LUCIAN PAUL 6,71 sacl 8.45 6.25 ADMIS
309:::34 5534 PRICE C JOSIF SILVIU ON 5.00 8.90 ADMIS
310:::35 5167 TOMA N MARIUS DANIEL 661 5.00 7.50 ADMIS
311:::36 5405 POP I MARIA a 6.90 6.10 ADMIS
312:::38.
313:::39.
314:::40.
315:::41,
316:::42.
317:::43.
318:::44.
319:::45.
320:::5815
321:::5071
322:::5457
323:::5498
324:::5416
325:::5174
326:::5170
327:::6,50
328:::6,45
329:::6,38
330:::6,38
331:::6,30
332:::6,23
333:::6,10
334:::6,08
335:::6,05
336:::6.20
337:::6.10
338:::5.05
339:::5.75
340:::5.25
341:::5.00
342:::5.05
343:::5.45
344:::5.00
345:::7.10
346:::725
347:::139
348:::6.50
349:::6.00
350:::7.10
351:::1 5803 PROKOP GY TAMAS PETER 8,40  ADMIS
352:::2 5472 GERGELY E IMRE SZABOLCS 8,06 8.25 8.35 ADMIS
353:::3 5125 DEAK A NORBERT 7337 8.70 8.45 ADMIS
354:::4 5402 SIMONCA I GABRIEL IOAN 6,74 755 5.15 ADMIS
355:::5 5556 MEZEI A REKA 6,52 5.55 7.85 ADMIS
356:::6 5462 SZOCS D LEHEL DENES 6,42 7.50 5.00 ADMIS
357:::7 5057 BAZSA PATAKI Z JUDITIMOLA 6,02 8.25 5.30 ADMIS
358:::1 5121 BORTAS GH DIANA BEATRICE 10,00 a 10.00 10.00 ADMIS
359:::2 5467 ALBERT A LASZLO 8,90 9.75 9.10 ADMIS
360:::3 5016 RIT S DIANA VANESSA 7,98 7.70 8.45 ADMIS
361:::4, 5433 MANAILA I ALEXANDRA GABRIELA 7,50 5.45 8.65 ADMIS
362:::5 5424 STRAT N.D, STEFAN DORIN 6,30 5.05 6.95 ADMIS
363:::1 5117 LUPULEASA V ANDREI 9,06 10.00 7.40 ADMIS
364:::1 5070 GRAMADA C COSMIN VASILE 9,69 9.80 9.05 ADMIS
365:::2 5003 LUCA D VLAD 9,65 10.00 9.80 ADMIS
366:::3 5470 SUHAR V ALEXANDRU GEORGE 9,60 9.75 9.15 ADMIS
367:::4 5549 HOTIMA I OANA MARIA 9,58 9.70 9.65 ADMIS
368:::5 5536 POP J.C IOAN OCTAVIAN 9,51 9.70 7.40 ADMIS
369:::6 5558 DOBRITOIU GR ALEXANDRU GRIGORE 9,43 9.00 8.10 ADMIS
370:::7 5513 SUCIU O.L OANA ELENA 9,41 8.10 9.20 ADMIS
371:::8 5062 IVANOV R.E IOANA ANDREEA 9,40 9.40 8.55 ADMIS
372:::9, 5551 CERNEA C DARIUS 9,40 7.70 9.00 ADMIS
373:::10, 5518 IUGA V BOGDAN MIHAI 9,33 8.90 9.20 ADMIS
374:::11 5099 GROZA I IONUT 9,31 8.20 8.30 ADMIS
375:::12 5511 GURZO S TIMEA TUNDE 9,26 9.00 9.25 ADMIS
376:::13 5126 SERBANESCU R ROMULUS ANDREI 9,25 9.15 6.40 ADMIS
377:::14 5122 LUCA C RARES VASILE 9,25 1.35 9.30 ADMIS
378:::15 5485 DROBUT D MIHAELA RALUCA 0.21 8.40 8.05 ADMIS
379:::16 5806 FARAGAU N SERGIU NICOLAE 9,20  ADMIS
380:::17 5026 MURESAN I SERGIU RAZVAN 8,94 9.10 8.15 ADMIS
381:::18 5431 TOBIAS F VICTOR FERENCZ 8,88 9.00 8.40 ADMIS
382:::19 5145 CRASNEANU V MARIA EMANUELA 8,88 7.55 9.70 ADMIS
383:::21 5177 MARC O I OVIDIU 8,80 8.50 7.25 ADMIS
384:::22 5097 BALCAN E EUSEBIU 8,75 8.70 7.70 ADMIS
385:::23 5522 ONUTU AN.C CATALIN 8,73 7.75 8.75 ADMIS
386:::24 5171 MUSAT C MARIA MELINDA 8,72 8.50 6.75 ADMIS
387:::25 5441 POP M. S ALEXANDRU AUREL 8,46 9.10 7.50 ADMIS
388:::26 5507 NEGREA L.M GEORGIJANA DANA 8,41 8.20 8.35 ADMIS
389:::27 5458 MOLDOVAN V.A.A MIJHAI TUDOR 8,41 8.10 8.65 ADMIS
390:::28 5479 VASILE S DAVID STEFANEL 8,30 6.40 5.30 ADMIS
391:::1 5426 APETREI P ANCA RALUCA 9,15 10.00 8.80 ADMIS
392:::2 5034 BOGDANIUC C MIRCEA BOGDAN 7,43 715 6.55 ADMIS
393:::3, 5052 MANEA C IULIAN CONSTANTIN 7,30 8.00 6.35 ADMIS
394:::4, 5533 DANILA A ADRIAN IULIAN 7,20 5.10 8.80 ADMIS
395:::5 5808 CIORBA V DENISAIOANA 7,20  ADMIS
396:::6 5813 VESCA V WILLIAM 6,00 6.00 6.00 ADMIS
397:::1, 5051 BUCUR D LOUISA MARIA 9,58 10.00 9.35 ADMIS
398:::2 5093 ORDEAN N PATRIC 9,11 10.00 8.75 ADMIS
399:::3 5820 BENOHR M JAKOB 9,07 9.30 9.40 ADMIS
400:::4, 5427 VIJOLI M.N ADRIANA GEORGIANA 9,06 9.05 9.06 ADMIS
401:::5 5481 POP G MARIA 8,86 9.20 9.25 ADMIS
402:::6 5446 PIRJOL C OCTAVIAN CONSTANTIN 8,10 7.30 7.90 ADMIS
403:::7 5118 SOLOMON V VIOREL ALEXANDRU 7,88 9.10 6.30 ADMIS
404:::8 53476 HOCA D DOREL CATALIN 7,86 7.50 6.80 ADMIS
405:::9 5809 BABII V FELICIA 11>  ADMIS
406:::10 5459 ANTON V.C JULIANA CONSTANTINA 7,71 6.50 7.90 ADMIS
407:::11 5083 POP A DAN 7,68 6.30 6.85 ADMIS
408:::12 5028 MURESAN M MARIUS 7,66 6.60 7.10 ADMIS
409:::13 5106 JUCAN P CAMELIA TAT 5375 5.85 7.45 ADMIS
410:::14 5082 PINTEA N MIHAI ALEXANDRU 7,46 6.40 7.55 ADMIS
411:::15 5530 RUS Z I ANDREEA MARIA 7,41 TAS 7.70 ADMIS
412:::16 5139 SIPOS D.V DANIELA MARIA 7,40 9.30 5.10 ADMIS
413:::17 5005 POP I IONELA DENISA 7,38 5.30 7.85 ADMIS
414:::18 5461 NISTOR P DENISA FLORINELA 7,28 Tid 6.70 ADMIS
415:::19, 5541 BERCIU Z.C GEANINA 7,23 5.80 6.80 ADMIS
416:::21 5567 BIRGAOANU I LAURA STEFANIA TAS 6.05 7.05 ADMIS
417:::22 5514 SLEAM V ANDREI DUMITRU 713 8.10 6.20 ADMIS
418:::23 5456 MALANCA N NICOLETA FLORINA 7J1 7.35 7.00 ADMIS
419:::24 5818 PIRLEA E EMIL 7,00 8.00 7.00 ADMIS
420:::25 5001 DRAGOS I RALUCA MARIA 6,95 8.35 6.70 ADMIS
421:::26 5447 VASILUT V DIANA GABRIELA 6,95 9.15 6.05 ADMIS
422:::27 5463 SOPORAN N ADRIANA SIMONA 6,75 7.85 5.00 ADMIS
423:::28 5015 OLARU I IONUT 6,73 8.75 5.25 ADMIS
424:::29 5510 UNGUREANU V CIPRIAN 6,70 5.20 7.90 ADMIS
425:::30 5503 TAMAS R DANIEL 6,53 5.60 5.25 ADMIS
426:::31 5801 GREC I ILIUTA MIHAELA 6,50  ADMIS
427:::32 5568 CHIPIRLIU C IULIANA ELENA 6,48 6.20 5.85 ADMIS
428:::33 5156 LUCA L LOREDANA MARIANA 6.60 ADMIS
429:::1 5136 BALAJ F M MARIA FLAVIA 8,25 755 9.70 ADMIS
430:::2 5546 PANEA P SIMONA 8,21 7.25 8.20 ADMIS
431:::3 5531 NEDELEA E ANDREI CATALIN 7,76 7.30 9.10 ADMIS
432:::4 5146 CSEGOLDI I JANOS ZOLTAN 6,40 5.60 8.00 ADMIS
433:::5 5823 RAILEAN V JULIANA 6,25 7.00 7.00 ADMIS
434:::6 5504 MILAS G MELANIA SIMONA 6,05 5.00 5.70 ADMIS
435:::1 5528 GHEORGHE N.A ALEXANDRA 9,78 10.00 9.60 ADMIS
436:::2 5084 POP S.V ROBERT ANDREI 8,83 9.15 8.75 ADMIS
437:::3 5574 RUSU I.D ANDRA PAULA 7,61 5.85 8.60 ADMIS
438:::4 5422 AWA M MUHAMAD ZUHEIR 7,40 8.90 6.85 ADMIS
439:::5 5814 MALINA A ALEXANDRU 725 7.00 8.00 ADMIS
440:::6 5147 CZIKA A ARNOLD 6,30 5.00 6.80 ADMIS

Processing the data

The name does not have the same number of words in every row, so we normalize it by joining its words into a single field.
The second part of the name, the initial, is often recognized as 1, |, l, or I. We normalize it to I.


In [21]:
clean_lines=[]
for line in good_lines:
    values=line.split(' ')

    try:
        # Tokens 2-4 are name parts: map digits and symbols misread by OCR back to letters
        for k in [2,3,4]:
            values[k]=values[k].replace('1','I').replace('l','I')\
                .replace('|','I').replace('7','I').replace('0','O')

        # Find where the name starts (first token not starting with a digit)
        # and where it ends (the next token that starts with a digit)
        counter=0
        name_start=100
        name_end=0
        while counter<len(values)-1:
            counter+=1
            try:
                if not values[counter][0].isdigit():
                    name_start=min(name_start,counter)
                    if values[counter+1][0].isdigit():
                        name_end=counter+1
                        counter=len(values)
            except:
                print('HIBA: '+line)  # 'HIBA' = 'ERROR'

        # Join the name tokens into one field and drop empty tokens
        clean_values=values[:name_start]+[' '.join(values[name_start:name_end])]+values[name_end:]
        clean_values=[szo for szo in clean_values if szo]
        clean_lines.append(clean_values)

    except:
        print('HIBA: '+line)  # 'HIBA' = 'ERROR'


HIBA: 7 TH ET ERCELLENTIA
HIBA: 5158 POP GH. C LOREDANA EMANUELA
HIBA: 5006 CHINDRIS M BOGDAN IOAN
HIBA: 35.
HIBA: 36,
HIBA: 37 5135 FECHETE M DENISA IONELA
HIBA: 38 5021 SERUNA O.S OVIDIU SORIN
HIBA: 6,05 755 5.10 ADMIS
HIBA: 0
HIBA: 37 5535 POPA V DANIELA MARIA
HIBA: 38 5440 POP A.O NICOLAE SIMION
HIBA: 39 5432 PINTEA P ANA IBOLYA
HIBA: 40 5142 MICU A SERGIU ALIN
HIBA: 41 5525 SOCACIU LC OVIDIU DANIEL
HIBA: 42 5119 PRISCORNITA C.I ALEXANDRU FLORIN
HIBA: 43 5488 IAKAB F ANNA MARIA
HIBA: 6,43
HIBA: 6,36
HIBA: 6,35
HIBA: 6,31
HIBA: 6,30
HIBA: 6,13
HIBA: 6,05
HIBA: 3.12
HIBA: 5.00
HIBA: 5.90
HIBA: 6.60
HIBA: 5.00
HIBA: 5.80
HIBA: 6.65
HIBA: 5.25 ADMIS
HIBA: 8.10 ADMIS
HIBA: 6.15 ADMIS
HIBA: 5.05 ADMIS
HIBA: 5.50 ADMIS
HIBA: 5.60 ADMIS
HIBA: 5.00 ADMIS
HIBA: 33.
HIBA: 38.
HIBA: 39.
HIBA: 40.
HIBA: 41,
HIBA: 42.
HIBA: 43.
HIBA: 44.
HIBA: 45.
HIBA: 5815
HIBA: 5071
HIBA: 5457
HIBA: 5498
HIBA: 5416
HIBA: 5174
HIBA: 5170
HIBA: 6,50
HIBA: 6,45
HIBA: 6,38
HIBA: 6,38
HIBA: 6,30
HIBA: 6,23
HIBA: 6,10
HIBA: 6,08
HIBA: 6,05
HIBA: 6.20
HIBA: 6.10
HIBA: 5.05
HIBA: 5.75
HIBA: 5.25
HIBA: 5.00
HIBA: 5.05
HIBA: 5.45
HIBA: 5.00
HIBA: 7.10
HIBA: 725
HIBA: 139
HIBA: 6.50
HIBA: 6.00
HIBA: 7.10
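
A stricter way to split a row into fields is a single regular expression that accepts only complete rows. The sketch below is an alternative to the token loop above, not a replacement for it. It assumes a row number, a four-digit file number, a name without digits, three grades, and a status word. The group names and `parse_row` are made up for this example.

```python
import re

# Row number, 4-digit file number, name (no digits), three grades, status word.
ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<leg>\d{4})\s+(?P<nume>\D+?)\s+"
    r"(?P<admitere>\d+[.,]\d+)\s+(?P<bac>\d+[.,]\d+)\s+(?P<lb_mat>\d+[.,]\d+)\s+"
    r"(?P<status>[A-Z]+)$"
)

def parse_row(line):
    """Return the fields of a complete row as a dict, or None if the row is damaged."""
    match = ROW.match(line)
    return match.groupdict() if match else None

print(parse_row('1 5501 MITITEL P PAVEL 9.23 10.00 8.00 ADMIS'))
print(parse_row('5.00 ADMIS'))
```

The trade-off is that rows with a missing or garbled grade are rejected outright, whereas the loop above keeps them for later repair.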

In [22]:
df=pd.DataFrame(clean_lines)
df=df[range(6)]
df.columns=['ID','Leg','Nume','Admitere','Bac 2','Lb. Mat.']
df.head()


Out[22]:
   ID   Leg                    Nume  Admitere  Bac 2  Lb. Mat.
0   1  5501         MITITEL P PAVEL      9.23  10.00      8.00
1   2  5442  BIDIU D ALINA NICOLETA      9,03   9.40      8.90
2   3  5168  BOGDAN G LUCIAN BOGDAN      8,61   8.95      7.00
3   4  5166    POP O JULIA CRISTINA      8,41   8.65      7.60
4   5  5101   KADAR R ADRIAN LUCIAN      8,36   8.00      8.65

In [32]:
df.to_excel('pdf/'+PDF_file+'/data.xlsx')

Extra

The rest is additional cleanup, which could just as well be done in Excel.

In some grades the decimal comma is missing. We handle this by inserting a comma after the first character whenever the grade is shorter than four characters, then convert the grades to numbers.


In [23]:
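for line in clean_lines:
    try:
        # Columns 3-5 hold the three grades
        for i in [3,4,5]:
            jegy=line[i]   # 'jegy' = 'grade'
            jegy=jegy.replace('.',',')
            # Map letters and symbols misread by OCR back to digits
            jegy=jegy.replace('I','1').replace('l','1')\
                .replace('|','1').replace('A','4').replace('O','0')

            # Drop a stray leading or trailing comma
            if jegy[0]==',':jegy=jegy[1:]
            if jegy[-1]==',':jegy=jegy[:-1]
            if jegy in ('10,00','1000'):
                jegy=10
            else:
                # Fewer than four characters: the decimal comma is assumed missing
                if len(jegy)<4:
                    jegy=jegy[0]+','+jegy[1:]
                jegy=float(jegy.replace(',','.'))
            line[i]=jegy
    except Exception:
        # Rows that cannot be converted are printed for manual review
        print(line)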


['3', '5414', 'TUDIC V ANA', 7.88, 'FAS', '9.50', 'ADMIS']
['5', '5452', 'ABRUDAN R ROXANA ALEXANDRA Tadd', 7.4, 6.9, 'ADMIS']
['9', '5429', 'CALIN G VASILE', 7.45, 'VAS', '6.50', 'ADMIS']
['15', '5816', 'DINCA I PETRISOR GORE', 7.0, 'ADMIS']
['5158', '5158', 'POP', 'GH.', 'C', 'LOREDANA', 'EMANUELA']
['5006', '5006', 'CHINDRIS', 'M', 'BOGDAN', 'IOAN']
['37', '5135', '37', 5135.0, 'FECHETE', 'M', 'DENISA', 'IONELA']
['38', '5021', '38', 5021.0, 'SERUNA', 'O.S', 'OVIDIU', 'SORIN']
['6,06', '“', '5.OO', 6.9, 'ADMIS']
['5', '5140', 'PADUREAN I.D ANDREEA MIHAELA', 7.6, 'TAS', '8.35', 'ADMIS']
['1', '5914', 'FARKAS B ANDREA', 6.3, 8.9, 'ADMIS']
['3', '5049', 'PAVEL D ADRIANA UAS', 6.5, 6.4, 'ADMIS']
['14', '5564', 'RUSU E GEANINA ALINA', 7.08, '§.15', '7.30', 'ADMIS']
['15', '5802', 'FIRTOS R ROXANA CASSANDRA', 7.0, 'ADMIS']
['6', '5821', 'NECHITA F.V ALEXANDRU', 8.6, 'ADMIS']
['12', '5544', 'POP S COSMIN Fld', 7.45, 6.2, 'ADMIS']
['2', '5810', 'CARDOS SIBEL DIANA', 9.4, 'ADMIS']
['3', '5562', 'DANCIU IS VLAD', 9.0, 9.0, '‘8.00', 'ADMIS']
['4.5110', 'BALICA N BEATRICE COSSETTE', '8,98', 8.9, 9.0, 'ADMIS']
['24', '5515', 'MOLDOVEANU C ALEXANDRA DIANA', 8.1, 'Tadd', '7.95', 'ADMIS']
['32', '5489', 'BALINT I ALEXANDRA ANDREEA', 7.41, 8.1, 'ADMIS']
['33', '5102', 'MORAR M FLAVIU CRISTIAN', 7.36, 'TAS', '6.85', 'ADMIS']
['34', '5078', 'CIOBANU N NICOLAE', 7.18, 'oot', '5.00', '7.25', 'ADMIS']
['35', '5477', 'CHIOREAN V VICTORITA AURELIA', '7,007"', '6.10', '7.00', 'ADMIS']
['36', '5025', 'COVRIG P CARMEN LENUTA', 6.79, 'a', '8.30', '5.00', 'ADMIS']
['37', '5535', '37', 5535.0, 'POPA', 'V', 'DANIELA', 'MARIA']
['38', '5440', '38', 5440.0, 'POP', 'A.O', 'NICOLAE', 'SIMION']
['39', '5432', '39', 5432.0, 'PINTEA', 'P', 'ANA', 'IBOLYA']
['40', '5142', '40', 5142.0, 'MICU', 'A', 'SERGIU', 'ALIN']
['41', '5525', '41', 5525.0, 'SOCACIU', 'LC', 'OVIDIU', 'DANIEL']
['42', '5119', '42', 5119.0, 'PRISCORNITA', 'C.I', 'ALEXANDRU', 'FLORIN']
['43', '5488', '43', 5488.0, 'IAKAB', 'F', 'ANNA', 'MARIA']
['23', '5047', 'JINGAROIU C PATRICK LEONHARD Lod', 6.8, 8.35, 'ADMIS']
['5450', 'MIC D LUCIAN PAUL', '6,71', 'sacl', '8.45', '6.25', 'ADMIS']
['34', '5534', 'PRICE C JOSIF SILVIU ON', 5.0, 8.9, 'ADMIS']
['36', '5405', 'POP I MARIA a', 6.9, 6.1, 'ADMIS']
['1', '5803', 'PROKOP GY TAMAS PETER', 8.4, 'ADMIS']
['1', '5121', 'BORTAS GH DIANA BEATRICE', 10, 'a', '10.00', '10.00', 'ADMIS']
['16', '5806', 'FARAGAU N SERGIU NICOLAE', 9.2, 'ADMIS']
['5', '5808', 'CIORBA V DENISAIOANA', 7.2, 'ADMIS']
['9', '5809', 'BABII V FELICIA', '11>', 'ADMIS']
['15', '5530', 'RUS Z I ANDREEA MARIA', 7.41, 'TAS', '7.70', 'ADMIS']
['18', '5461', 'NISTOR P DENISA FLORINELA', 7.28, 'Tid', '6.70', 'ADMIS']
['21', '5567', 'BIRGAOANU I LAURA STEFANIA TAS', 6.05, 7.05, 'ADMIS']
['23', '5456', 'MALANCA N NICOLETA FLORINA', '7J1', '7.35', '7.00', 'ADMIS']
['31', '5801', 'GREC I ILIUTA MIHAELA', 6.5, 'ADMIS']
['33', '5156', 'LUCA L LOREDANA MARIANA', 6.6, 'ADMIS']

These errors have to be fixed by hand.
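
The printed rows above are partly converted: a row can already hold floats in some grade columns when a later column fails. One way to make the manual pass easier is to leave a row untouched unless all three grades parse, and to collect the failures in a list instead of printing them. The sketch below is only an illustration. `parse_grade`, `clean_row` and the sample rows are hypothetical and not part of the notebook, the parsing rule is a simplified variant of the loop above, and it assumes grades lie between 1 and 10 with at most two decimals.

```python
import re

# Hypothetical helpers, not part of the original notebook.
# Letter-to-digit fixes for common OCR confusions (same set as the loop above).
OCR_FIXES = str.maketrans({'I': '1', 'l': '1', '|': '1', 'A': '4', 'O': '0'})

def parse_grade(value):
    """Return the grade as a float, or None if it cannot be read."""
    if isinstance(value, (int, float)):
        return float(value)
    s = str(value).translate(OCR_FIXES).replace(',', '.').strip('.')
    if re.fullmatch(r'\d{3,4}', s):
        # Decimal separator missing: '725' -> '7.25', '1000' -> '10.00'
        s = s[:-2] + '.' + s[-2:]
    if not re.fullmatch(r'\d{1,2}\.\d{1,2}', s):
        return None
    number = float(s)
    # Assumption: valid grades lie between 1 and 10
    return number if 1.0 <= number <= 10.0 else None

def clean_row(row, grade_columns=(3, 4, 5)):
    """Return a cleaned copy of row, or None if any grade is unreadable."""
    if len(row) <= max(grade_columns):
        return None
    grades = [parse_grade(row[i]) for i in grade_columns]
    if any(g is None for g in grades):
        return None
    cleaned = list(row)
    for i, g in zip(grade_columns, grades):
        cleaned[i] = g
    return cleaned

# Made-up sample rows in the same layout as clean_lines
sample = [
    ['1', '5501', 'NAME ONE', '9.23', '10.00', '8.00', 'ADMIS'],
    ['2', '5442', 'NAME TWO', '9,03', '725', '8.9O', 'ADMIS'],
    ['3', '5414', 'NAME THREE', '7.88', 'FAS', '9.50', 'ADMIS'],
]
good, bad = [], []
for row in sample:
    cleaned = clean_row(row)
    if cleaned is None:
        bad.append(row)   # original row, unchanged, for manual fixing
    else:
        good.append(cleaned)
```

With this split, `bad` can be written to its own sheet and corrected by hand, while `good` goes straight into the DataFrame.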


In [30]:
df=pd.DataFrame(clean_lines)
df=df[range(6)]
df.columns=['ID','Leg','Nume','Admitere','Crit. 2','Crit. 3']
df.head()


Out[30]:
ID Leg Nume Admitere Crit. 2 Crit. 3
0 1 5501 MITITEL P PAVEL 9.23 10 8
1 2 5442 BIDIU D ALINA NICOLETA 9.03 9.4 8.9
2 3 5168 BOGDAN G LUCIAN BOGDAN 8.61 8.95 7
3 4 5166 POP O JULIA CRISTINA 8.41 8.65 7.6
4 5 5101 KADAR R ADRIAN LUCIAN 8.36 8 8.65

In [31]:
df['Mean']=df.mean(axis=1)
df


Out[31]:
ID Leg Nume Admitere Crit. 2 Crit. 3 Mean
0 1 5501 MITITEL P PAVEL 9.23 10 8 NaN
1 2 5442 BIDIU D ALINA NICOLETA 9.03 9.4 8.9 NaN
2 3 5168 BOGDAN G LUCIAN BOGDAN 8.61 8.95 7 NaN
3 4 5166 POP O JULIA CRISTINA 8.41 8.65 7.6 NaN
4 5 5101 KADAR R ADRIAN LUCIAN 8.36 8 8.65 NaN
... ... ... ... ... ... ... ...
370 2 5084 POP S.V ROBERT ANDREI 8.83 9.15 8.75 NaN
371 3 5574 RUSU I.D ANDRA PAULA 7.61 5.85 8.6 NaN
372 4 5422 AWA M MUHAMAD ZUHEIR 7.4 8.9 6.85 NaN
373 5 5814 MALINA A ALEXANDRU 7.25 7 8 NaN
374 6 5147 CZIKA A ARNOLD 6.3 5 6.8 NaN

375 rows × 7 columns
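
The `Mean` column is `NaN` in every row of the output above. The most likely reason is that the grade columns still have `object` dtype, because some rows keep unparsed strings, so pandas finds no numeric columns to average. A minimal sketch of one possible fix follows. It uses made-up data in a hypothetical `df_demo` and assumes the mean should cover only the three grade columns: coerce them with `pd.to_numeric` and average just those.

```python
import pandas as pd

# Made-up rows; the second one still contains an unparsed OCR string
df_demo = pd.DataFrame(
    [['1', '5501', 'NAME ONE', 9.0, 10, 8.0],
     ['2', '5414', 'NAME TWO', 7.5, 'FAS', 9.5]],
    columns=['ID', 'Leg', 'Nume', 'Admitere', 'Crit. 2', 'Crit. 3'])

grade_cols = ['Admitere', 'Crit. 2', 'Crit. 3']
# errors='coerce' turns anything that is not a number into NaN
df_demo[grade_cols] = df_demo[grade_cols].apply(pd.to_numeric, errors='coerce')
# mean() skips NaN by default, so a row with one missing grade averages the other two
df_demo['Mean'] = df_demo[grade_cols].mean(axis=1)
```

Whether a row with a missing grade should get a mean at all is a judgment call; passing `skipna=False` to `mean` would leave such rows as `NaN` instead.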

