In [1]:
import pdfplumber
Here, we're using the USDA's "National Weekly Ag Energy Round-Up", a weekly one-page report.
In [2]:
report = pdfplumber.open("../pdfs/ag-energy-round-up-2017-02-24.pdf").pages[0]
In [3]:
im = report.to_image()
im
Out[3]:
In [4]:
len(report.curves)
Out[4]:
Here's what the first curve object looks like:
In [5]:
report.curves[0]
Out[5]:
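To get a quick feel for a curve object without reading the whole dictionary, a small helper along these lines may be useful. This is a minimal sketch, not part of pdfplumber: `summarize_curve` is a hypothetical name, and it assumes only that a curve is a dict with a `"points"` list of `(x, y)` tuples, as used later in this notebook. The `example_curve` below is made-up stand-in data, not a curve from the report.

```python
def summarize_curve(curve):
    """Return the point count and bounding box of a curve dict.

    Assumption: curve["points"] is a non-empty list of (x, y) tuples.
    """
    points = curve["points"]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "n_points": len(points),
        # (min x, min y, max x, max y) over the curve's points
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


# Hypothetical stand-in for an entry of report.curves
example_curve = {"points": [(10, 20), (30, 20), (30, 45), (10, 45)]}
summary = summarize_curve(example_curve)
print(summary)
```

On a real page, something like `[summarize_curve(c) for c in report.curves]` would give one summary per curve.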
In [6]:
im.draw_lines(report.curves, stroke="red", stroke_width=2)
Out[6]:
We can get a better sense of the curves by cycling through a four-color palette:
In [7]:
im.reset()
colors = [ "gray", "red", "blue", "green" ]
for i, curve in enumerate(report.curves):
    stroke = colors[i % len(colors)]
    im.draw_circles(curve["points"], radius=3, stroke=stroke, fill="white")
    im.draw_line(curve["points"], stroke=stroke, stroke_width=2)
im
Out[7]:
Note: Above, you'll notice the zig-zag pattern made by the curve that describes the gridlines. That's because pdfminer (and, hence, pdfplumber) currently only provides access to the points on a curve, and not the actual path of the curve. The actual path can, as with the gridlines, include both "lineto" and "moveto" commands. Drawing one continuous line through the points therefore also connects the end of each gridline to the start of the next.
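If a curve is known to consist of separate straight segments, one rough workaround is to pair up its points instead of joining them all. The sketch below is a heuristic, not something pdfplumber provides: `pair_points_as_segments` is a hypothetical helper, and it assumes the underlying path strictly alternates "moveto" and "lineto" commands, so that points 0 and 1 form one segment, points 2 and 3 the next, and so on. That assumption will not hold for every curve. The `grid_points` list is made-up illustration data.

```python
def pair_points_as_segments(points):
    """Group a flat list of points into (start, end) segment pairs.

    Assumption: the original path alternated moveto/lineto, so each
    consecutive pair of points is one independent straight segment.
    """
    if len(points) % 2 != 0:
        raise ValueError("expected an even number of points")
    return [(points[i], points[i + 1]) for i in range(0, len(points), 2)]


# Hypothetical points for three horizontal gridlines
grid_points = [(0, 0), (100, 0), (0, 10), (100, 10), (0, 20), (100, 20)]
segments = pair_points_as_segments(grid_points)
for start, end in segments:
    print(start, "->", end)
```

Each resulting pair could then be passed to `im.draw_line(...)` on its own, in place of the full `curve["points"]` list, which should avoid the diagonal connectors for curves that fit the assumption.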