Converting PDF tables into pandas DataFrames. This only works for PDFs in which the table is stored as actual text rather than as an image, which is typically the case for PDFs exported from _Excel_ or _Word_.


In [1]:
import pandas as pd

In [2]:
!pip install camelot-py


Requirement already satisfied: camelot-py in c:\programdata\anaconda3\lib\site-packages (0.7.3)
Requirement already satisfied: PyPDF2>=1.26.0 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (1.26.0)
Requirement already satisfied: chardet>=3.0.4 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (3.0.4)
Requirement already satisfied: pdfminer.six>=20170720 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (20191110)
Requirement already satisfied: pandas>=0.23.4 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (0.25.1)
Requirement already satisfied: numpy>=1.13.3 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (1.16.5)
Requirement already satisfied: click>=6.7 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (7.0)
Requirement already satisfied: openpyxl>=2.5.8 in c:\programdata\anaconda3\lib\site-packages (from camelot-py) (3.0.0)
Requirement already satisfied: pycryptodome in c:\programdata\anaconda3\lib\site-packages (from pdfminer.six>=20170720->camelot-py) (3.9.4)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from pdfminer.six>=20170720->camelot-py) (1.12.0)
Requirement already satisfied: sortedcontainers in c:\programdata\anaconda3\lib\site-packages (from pdfminer.six>=20170720->camelot-py) (2.1.0)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.23.4->camelot-py) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.23.4->camelot-py) (2019.3)
Requirement already satisfied: et-xmlfile in c:\programdata\anaconda3\lib\site-packages (from openpyxl>=2.5.8->camelot-py) (1.0.1)
Requirement already satisfied: jdcal in c:\programdata\anaconda3\lib\site-packages (from openpyxl>=2.5.8->camelot-py) (1.4.1)

In [3]:
# OpenCV is needed by Camelot's lattice parser
!pip install opencv-python


Requirement already satisfied: opencv-python in c:\programdata\anaconda3\lib\site-packages (4.1.2.30)
Requirement already satisfied: numpy>=1.14.5 in c:\programdata\anaconda3\lib\site-packages (from opencv-python) (1.16.5)

In [4]:
import camelot

In [9]:
tables = camelot.read_pdf('foo.pdf')
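By default `camelot.read_pdf` looks only at the first page and uses the `lattice` flavor, which relies on ruled lines between cells. The sketch below passes both options explicitly and collects each table's shape together with Camelot's `parsing_report`. The helper names `summarize_tables` and `read_and_summarize` are hypothetical and exist only for illustration.

```python
def summarize_tables(tables):
    """Return one dict per extracted table: DataFrame shape plus parsing report."""
    summary = []
    for table in tables:
        row = {"shape": table.df.shape}
        # parsing_report is a dict with accuracy, whitespace, order and page
        row.update(table.parsing_report)
        summary.append(row)
    return summary


def read_and_summarize(path, pages="1", flavor="lattice"):
    """Read tables from a PDF with explicit options and summarize them."""
    import camelot  # imported lazily so summarize_tables works without Camelot

    tables = camelot.read_pdf(path, pages=pages, flavor=flavor)
    return summarize_tables(tables)
```

In the notebook this would be called as `read_and_summarize('foo.pdf')`. If the tables in a PDF have no ruled lines, `flavor='stream'` may be worth trying instead.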

In [12]:
# Each extracted table exposes its contents as a pandas DataFrame via .df
tables[0].df


Out[12]:
              0            1                2                     3                  4                  5                 6
0  Cycle \nName  KI \n(1/km)  Distance \n(mi)  Percent Fuel Savings
1                                                  Improved \nSpeed  Decreased \nAccel  Eliminate \nStops  Decreased \nIdle
2        2012_2         3.30              1.3                  5.9%               9.5%              29.2%             17.4%
3        2145_1         0.68             11.2                  2.4%               0.1%               9.5%              2.7%
4        4234_1         0.59             58.7                  8.5%               1.3%               8.5%              3.3%
5        2032_2         0.17             57.8                 21.7%               0.3%               2.7%              1.2%
6        4171_1         0.07            173.9                 58.1%               1.6%               2.1%              0.5%
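
The extracted frame still carries the PDF's two-row header as data rows 0 and 1, has line breaks embedded in the cells, and stores every value as a string. Below is a minimal sketch of one way to tidy it, assuming the layout shown above (two header rows, numeric columns after the first). The frame is rebuilt by hand from the first two data rows so the cell runs without the PDF, and `tidy` is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for tables[0].df, copied from the output above (first two data rows only).
raw = pd.DataFrame([
    ["Cycle \nName", "KI \n(1/km)", "Distance \n(mi)", "Percent Fuel Savings", "", "", ""],
    ["", "", "", "Improved \nSpeed", "Decreased \nAccel", "Eliminate \nStops", "Decreased \nIdle"],
    ["2012_2", "3.30", "1.3", "5.9%", "9.5%", "29.2%", "17.4%"],
    ["2145_1", "0.68", "11.2", "2.4%", "0.1%", "9.5%", "2.7%"],
])


def tidy(df, header_rows=2):
    """Promote the first header_rows rows to column names and parse the numbers."""
    header = df.iloc[:header_rows]
    names = []
    for col in df.columns:
        # Use the lowest non-empty header cell (the sub-header when there is one).
        cells = [c for c in header[col] if c.strip()]
        label = cells[-1] if cells else str(col)
        # Collapse the embedded line break and repeated spaces into single spaces.
        names.append(" ".join(label.split()))
    body = df.iloc[header_rows:].reset_index(drop=True)
    body.columns = names
    for col in names[1:]:
        # Drop a trailing percent sign, then parse; percentages stay in percent units.
        body[col] = pd.to_numeric(body[col].str.rstrip("%"))
    return body


clean = tidy(raw)
print(clean.dtypes)
```

Taking the lowest non-empty header cell drops the spanning "Percent Fuel Savings" label. If that grouping matters, a `pandas.MultiIndex` built from both header rows would be an alternative.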