texlive
Installing texlive and texlive-xetex on Linux distros should be straightforward: just use your package manager.
For example, on Ubuntu just do:
apt-get install texlive texlive-xetex
On macOS, there is a texlive port for MacPorts, so all you need to do is:
port install texlive texlive-xetex
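Once the packages are installed, a quick check that the xelatex binary is actually on your PATH can save a confusing failure later. This is a minimal sketch using only the Python standard library; the helper name find_missing is made up for this example.

```python
import shutil


def find_missing(tools):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# find_missing is a hypothetical helper, not part of texlive or pybrl.
missing = find_missing(["xelatex"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("xelatex is available.")
```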
LaTeX is preferred over building the PDF programmatically, because typesetting is exactly what LaTeX is for: get the content right and LaTeX will make it beautiful.
pybrl
pybrl already has basic PDF parsing and translation capabilities using pdfminer. More specifically, there is a pdf_utils submodule in the utils directory, which can parse a PDF file and provide some layout information.
Now that we know what tools are going to be used, we can dive into the code:
In [1]:
# Load our dependencies
import pybrl as brl
filename = "lorem_ipsum.pdf" # of course :P
pdf_password = None
language = 'english'
# Let's translate the PDF file.
translated = brl.translatePDF(filename, password = pdf_password, language = language) # Easy, right?
In [2]:
# Let's explore what this object looks like:
print(len(translated)) # = 2 (One for each page)
print(len(translated[0])) # = 1 group of text in the page.
# There might be more if, e.g., a box of text is in a corner.
print(translated[0][0].keys()) # type, text, layout
print(translated[0][0]['type']) # 'text'
print(translated[0][0]['layout']) # The bounding box of this group
print(translated[0][0]['text'][0]) # The first word: ['000001', '111000', '101010', '111010', '100010', '101100']
The translatePDF method parses the PDF, translates its text to braille, and returns the result page by page, as explored above. As of the time of writing, the layout handling is pretty basic: all the text of each page, including any separate groups of text on the page, is concatenated.
Since we are using LaTeX to create the PDF file, we don't really need to care about the layout: LaTeX will take care of it.
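Since the layout is ignored, each page boils down to a list of groups, and each group to a list of words. The sketch below walks a hand-built stand-in with the same shape as the object explored above (pages, then groups, then a 'text' list of words). The sample data and the helper name words_per_page are made up for illustration, and the 'layout' value is just a placeholder, since the real bounding-box format is not shown here.

```python
# Hand-built stand-in for the object returned by brl.translatePDF.
# Illustrative data only: the dot patterns are arbitrary and 'layout' is a placeholder.
sample = [
    [  # page 1: a single group holding two words
        {
            "type": "text",
            "layout": None,
            "text": [
                ["000001", "111000", "101010", "111010", "100010", "101100"],
                ["100000", "101100"],
            ],
        },
    ],
    [  # page 2: a single group holding one word
        {"type": "text", "layout": None, "text": [["100010"]]},
    ],
]


def words_per_page(translated):
    """Count the words on each page, summing over all groups of that page."""
    return [sum(len(group["text"]) for group in page) for page in translated]


print(words_per_page(sample))  # [2, 1]
```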
I will use the following template to generate my document:
\documentclass{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage[parfill]{parskip} % Begin paragraphs with an empty line (and not an indent)
\usepackage{fontspec}
\begin{document}
\setmainfont{LouisLouis.ttf}
%%% Content will go here %%%
\end{document}
In [3]:
tex = "" # Template contents and what will be edited.
output = "output.tex" # Output path to the tex file
TEMPLATE_PATH = "template.tex" # Path to the Template tex file
# Load the Template
with open(TEMPLATE_PATH, "r") as f:
    tex = f.read()
# Concatenate all the text.
content = ""
for page in translated:
    for group in page:
        grouptxt = group['text']
        # Convert to Unicode characters:
        unicode_brl = brl.toUnicodeSymbols(grouptxt, flatten=True)
        content += "\n\n" + unicode_brl
# Create the new TeX
output_tex = tex.replace("%%% Content will go here %%%", content)
# Save it
with open(output, "w") as f:
    f.write(output_tex)
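To give an idea of what a conversion to Unicode braille involves, here is a minimal standalone sketch. It assumes that each six-character string lists dots 1 to 6 from left to right, which is consistent with the first word printed earlier (a capital sign followed by the letters of "Lorem"). It is not pybrl's implementation of toUnicodeSymbols, and the function names are made up for this example. The only outside fact it relies on is that the Unicode braille block starts at U+2800 and that dot n sets bit n-1 of the offset.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode "Braille Patterns" block


def cell_to_unicode(cell):
    """Convert a six-character dot string such as '111000' to one braille character.

    Assumption: position i (counting from 1) in the string is dot i.
    """
    if len(cell) != 6 or set(cell) - {"0", "1"}:
        raise ValueError("expected six characters of '0' or '1', got %r" % (cell,))
    # Dot n contributes bit n-1 to the offset from U+2800.
    offset = sum(1 << i for i, bit in enumerate(cell) if bit == "1")
    return chr(BRAILLE_BASE + offset)


def word_to_unicode(word):
    """Convert a list of dot strings (one word) to a Unicode braille string."""
    return "".join(cell_to_unicode(cell) for cell in word)


first_word = ["000001", "111000", "101010", "111010", "100010", "101100"]
print(word_to_unicode(first_word))  # ⠠⠇⠕⠗⠑⠍
```

Under that assumption, the six cells map to U+2820, U+2807, U+2815, U+2817, U+2811 and U+280D.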
To compile the LaTeX document, we need to run the following:
xelatex output.tex
This will produce the output.pdf file, which we can now open with our PDF viewer. We need xelatex to use the braille font: the template loads it through fontspec, which requires XeLaTeX or LuaLaTeX rather than pdfLaTeX.
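If you would rather stay inside Python than switch to a terminal, the same compilation can be launched with the standard subprocess module. This is only a sketch: the helper names are made up for this example, and the two extra flags (-interaction=nonstopmode and -halt-on-error) are my own addition so that xelatex stops with an error instead of waiting for keyboard input.

```python
import subprocess
from pathlib import Path


def build_xelatex_command(tex_path):
    """Build the argument list for a non-interactive xelatex run."""
    return ["xelatex", "-interaction=nonstopmode", "-halt-on-error", str(tex_path)]


def compile_tex(tex_path):
    """Run xelatex on `tex_path` and return the expected PDF path.

    Raises subprocess.CalledProcessError if xelatex exits with an error.
    """
    subprocess.run(build_xelatex_command(tex_path), check=True)
    # By default xelatex writes the PDF into the current working directory.
    return Path(Path(tex_path).name).with_suffix(".pdf")


if __name__ == "__main__":
    if Path("output.tex").exists():
        print("Wrote", compile_tex("output.tex"))
    else:
        print("output.tex not found; generate it first.")
```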
That's it: the PDF file is now generated. Of course, you can change how the pages are formatted by changing the template, but I didn't want to focus on that in this notebook. I wanted to show how easy it is to do such tasks using pybrl.
I have made another notebook which will help you understand how braille is represented in pybrl. You can check it out here.