A simple Python application: universal translator

In this lab we will develop a universal text translator by using the Google translation webpage.

We will use the following libraries; please verify that they are available on your system:

* goslate: unofficial API access to Google Translate.

* BeautifulSoup: processing of HTML pages.

* urllib2: downloading of webpages.

DISCLAIMER: This code is intended only for academic purposes. Any professional or commercial use of the Google Translate page should be conducted under the terms and conditions of the Google Translate API (https://cloud.google.com/translate/docs). The authors of this code are not responsible for any unauthorized use.


In [ ]:
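import goslate  # pip install goslate
from bs4 import BeautifulSoup  # pip install beautifulsoup4
# The "html5lib" parser used later in this lab also needs: pip install html5lib
import urllib2  # part of the Python 2 standard library, no installation needed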

1.- Introduction to Python dictionaries

In this lab we will make extensive use of Python dictionaries. In this preliminary section we will learn to build, modify and use them.

Python dictionaries are hash tables (also known as associative memories, associative arrays), a data structure indexed by keys, which can be any immutable (hashable) type. It is best to think of a dictionary as an unordered set of (key, value) pairs, with the requirement that the keys are unique (within one dictionary). Let's see a simple example of dictionary creation and manipulation.


In [ ]:
inventary_dict = {'milk': 23, 'cookies': 12, 'chocolate': 26, 'yogourt': 5}

print "This is the original dictionary:"
print inventary_dict
print " "

print "This is the value associated to 'milk':"
print inventary_dict['milk']
print " "

print "We add a new element to the dictionary:"
inventary_dict.update({'sugar': 103})
print inventary_dict
print " "

print "We increment the value of one of the elements:"
inventary_dict['cookies'] += 10
print inventary_dict
print " "

Note: The order of the elements in a dictionary is not relevant, so the printed order may differ from the order in which the elements were defined.
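The two requirements on keys mentioned earlier (keys must be hashable, and unique within one dictionary) can be checked with a small sketch. The `stock` dictionary and its contents are hypothetical, and `print(...)` is called with a single argument so that the snippet runs on both Python 2 and Python 3.

```python
# Hypothetical example data, not part of the lab.
stock = {('milk', 'skimmed'): 4}      # a tuple is immutable, so it is a valid key
stock[('milk', 'whole')] = 7

try:
    stock[['milk', 'whole']] = 9      # a list is mutable, hence not hashable
except TypeError as error:
    list_key_error = str(error)
    print("A list cannot be used as a key: " + list_key_error)

# Assigning to an existing key replaces its value instead of adding a new pair.
stock[('milk', 'whole')] = 8
print(len(stock))
print(stock[('milk', 'whole')])
```

After the last assignment the dictionary still holds two pairs, because the key `('milk', 'whole')` already existed.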

Dictionaries have two fundamental methods: keys(), which returns the keys, and values(), which returns the values.


In [ ]:
keys = inventary_dict.keys()

print "These are the keys of the dictionary:"
print keys
print " "

values = inventary_dict.values()
print "These are the values of the dictionary:"
print values
print " "

print "The size of this dictionary is %d, and it stores the following elements:" % len(inventary_dict)

for key in keys:
    print key + ": " + str(inventary_dict[key])
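Looking up a key that is not in the dictionary with square brackets raises a `KeyError`. A minimal sketch of two common ways to handle a possibly missing key is shown below; the `stock` dictionary is hypothetical and the snippet runs on both Python 2 and Python 3.

```python
# Hypothetical inventory, used only for illustration.
stock = {'milk': 23, 'sugar': 103}

# Option 1: test membership with "in" before accessing the key.
if 'honey' in stock:
    honey_units = stock['honey']
else:
    honey_units = 0

# Option 2: let get() return a default value when the key is absent.
milk_units = stock.get('milk', 0)
tea_units = stock.get('tea', 0)

print("honey: %d, milk: %d, tea: %d" % (honey_units, milk_units, tea_units))
```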

EXERCISE: Define the following 'languages_dict' dictionary, which stores the languages our translator will work with. Note that both keys and values are strings:


en: Inglés
zh: Chino
de: Alemán
it: Italiano
es: Español

In [ ]:
languages_dict = <COMPLETAR>

print "Vamos a traducir de %s a %s." % (languages_dict['es'], languages_dict['it'])

EXERCISE: Define a function 'view_codes' that prints all language codes in the dictionary, such that view_codes(languages_dict) produces:


en: Inglés
zh: Chino
de: Alemán
it: Italiano
es: Español

Note: Dictionaries are not ordered structures, so the results may be presented in a different order.


In [ ]:
def view_codes(mydict):
    <COMPLETAR>

view_codes(languages_dict)
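When a reproducible order is needed, one option is to sort the keys (or the key/value pairs) before iterating. The sketch below uses a hypothetical `capitals` dictionary and runs on both Python 2 and Python 3.

```python
# Hypothetical dictionary, used only for illustration.
capitals = {'pt': 'Lisboa', 'fr': 'Paris', 'it': 'Roma'}

sorted_codes = sorted(capitals)            # sorted list of the keys
sorted_pairs = sorted(capitals.items())    # (key, value) tuples, sorted by key

print(sorted_codes)
for code, city in sorted_pairs:
    print("%s -> %s" % (code, city))
```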

2.- Downloading a webpage

The urllib2 library allows us to download web content. Other tools, such as Scrapy, are also available for building crawling bots for massive downloading.

We will use the urllib2 library to download a webpage.
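The code in this lab imports `urllib2`, which exists only in Python 2; in Python 3 the same functionality lives in `urllib.request`. The sketch below shows how a request with a custom User-Agent header can be prepared on either version without downloading anything. The alias `web` and the URL `https://www.example.com/` are placeholders chosen for this illustration.

```python
try:
    import urllib2 as web                 # Python 2
except ImportError:
    import urllib.request as web          # Python 3: urllib2 became urllib.request

agent = {'User-Agent': "Mozilla/4.0"}
request = web.Request("https://www.example.com/", headers=agent)

# Nothing has been downloaded yet: the Request object only stores the URL
# and the headers that will be sent.
print(request.get_full_url())
print(request.header_items())

# web.urlopen(request).read() would perform the actual download
# (network access required), so it is left out of this sketch.
```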

EXERCISE: Complete the code below to compute the number of downloaded characters and print the first 1000.


In [ ]:
agent = {'User-Agent':"Mozilla/4.0"}
url1 = "https://www.u-tad.com/conocenos/conoce-u-tad/"
request = urllib2.Request(url1, headers=agent)
page = urllib2.urlopen(request).read()

n_caracteres_descargados = <COMPLETAR>
print "La página descargada tiene %d caracteres." % n_caracteres_descargados

print "Estos son los primeros 1000 caracteres:"
print "=" * 100
print <COMPLETAR>
print "=" * 100

BeautifulSoup is a powerful library for postprocessing HTML code. Let's see one example, in which we extract the text and remove the HTML markup.


In [ ]:
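# Parse the downloaded page (the "html5lib" parser must be installed)
bs = BeautifulSoup(page, "html5lib")
# Remove the <script> and <style> elements, whose content is not visible text
for script in bs(["script", "style"]):
    script.extract()
# Extract the remaining text and collapse every run of whitespace into one space
text_utad = bs.get_text()
text_utad = ' '.join(text_utad.split())

print text_utad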
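The last step of the cell above, `' '.join(text_utad.split())`, is a common idiom for normalizing whitespace. A minimal sketch on a hypothetical string (standing in for what `get_text()` may return) shows what each half does; it runs on both Python 2 and Python 3.

```python
# Hypothetical text, standing in for the text extracted from a real page.
raw_text = "  Welcome\n\n\t to   the\n course  \n"

# split() without arguments cuts at any run of whitespace (spaces, tabs,
# newlines) and drops the empty pieces at both ends.
words = raw_text.split()

# join() glues the words back together with exactly one space between them.
clean_text = ' '.join(words)

print(words)
print(clean_text)
```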

3.- The Google Translate webpage syntax

Let's explore the behaviour of the Google Translate webpage. Execute the following code and open the resulting URL in a web browser (click on it, or copy and paste it into a browser if it does not open automatically). You should be able to read the translated text on that webpage. Try to identify the syntax of the query, and manually modify the destination language as well as the text to translate.


In [ ]:
url = "https://translate.google.com/m?hl=de&sl=auto&q=adiós+amigos"
print url
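Besides inspecting the URL by eye, the standard library can split it into its components. The sketch below parses a URL with the same structure as the one above; the text `buenos+dias` is a hypothetical ASCII-only replacement, and the import is written so that the snippet runs on both Python 2 and Python 3.

```python
try:
    from urlparse import urlparse, parse_qs       # Python 2
except ImportError:
    from urllib.parse import urlparse, parse_qs   # Python 3

# Same structure as the URL above, with a hypothetical ASCII-only text.
example_url = "https://translate.google.com/m?hl=de&sl=auto&q=buenos+dias"

parts = urlparse(example_url)
# parse_qs returns a dictionary; each value is a list because a parameter
# name may appear more than once in a query string.
params = parse_qs(parts.query)

print(parts.netloc)
print(parts.path)
for name in sorted(params):
    print("%s = %s" % (name, params[name][0]))
```

Note that `parse_qs` decodes the `+` sign in the query string back into a space.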

EXERCISE: Define a function that takes as arguments the destination language and the text to be translated, and returns the URL. Check the result by clicking on the printed link.


In [ ]:
destiny_language = 'it'
my_text = "Hola a todos mis amigos"

def url_translate(destiny_language, text):
    url = <COMPLETAR> % (destiny_language, "auto", text.replace(<COMPLETAR>))
    return url

url = url_translate(destiny_language, my_text)
print url
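Replacing characters by hand covers simple sentences, but texts containing accented letters or reserved characters such as `&` need a more general encoding. As a hedged alternative (not the approach the exercise asks for), the standard library function `quote_plus` can encode the text part of a query; the sample strings are for illustration only, and the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3

plain = quote_plus("Hola a todos mis amigos")

# Non-ASCII text is encoded to UTF-8 bytes first; u'adi\xf3s' is the
# Spanish word for "goodbye", written with an escaped accented o.
accented = quote_plus(u'adi\xf3s amigos'.encode('utf-8'))

# "&" separates parameters in a query string, so it must be escaped too.
reserved = quote_plus("rock & roll")

print(plain)
print(accented)
print(reserved)
```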

4.- Downloading the HTML code with the translation

EXERCISE: Write a function 'get_html' that takes as input the destination language and the text to be translated, and returns the HTML code of the page.


In [ ]:
def get_html(lang, text):
    agent = {'User-Agent':"Mozilla/4.0"}
    url = <COMPLETAR>
    request = urllib2.Request(url, headers=agent)
    html = urllib2.urlopen(request).read()
    return html

html = get_html(destiny_language, my_text)

n_caracteres_descargados = <COMPLETAR>
print "La página descargada tiene %d caracteres." % n_caracteres_descargados
print "=" * 100
print html
print "=" * 100

5.- Postprocessing the downloaded webpage

We will analyze the HTML content to obtain the desired translation.

EXERCISE: Complete the function 'translate' that takes as input the destination language and the text to be translated, and returns the translation.


In [ ]:
def translate(lang, text):
    html = <COMPLETAR>
    bs = BeautifulSoup(html, "html5lib")
    translation = bs.findAll('div')[2].text
    return translation

key = 'en'
print u"Traducción al " + unicode(languages_dict[key],'utf-8') + ":"
print translate(key, my_text)
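The call `unicode(languages_dict[key], 'utf-8')` converts a byte string into a unicode string, which matters for names with accented letters. The sketch below illustrates the difference between bytes and text on a hypothetical value; it uses `decode`, which is available on both Python 2 and Python 3.

```python
# UTF-8 byte sequence for the Spanish word for "English"
# (the accented e takes two bytes: 0xC3 0xA9).
raw = b'Ingl\xc3\xa9s'

# decode() turns bytes into text; in Python 2, unicode(raw, 'utf-8')
# gives the same result.
text = raw.decode('utf-8')

print(len(raw))    # number of bytes
print(len(text))   # number of characters
```

The same distinction applies to a downloaded page: `len()` applied to the raw result of `read()` counts bytes, which can be more than the number of characters when the page contains non-ASCII text.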

EXERCISE: Use the 'translate' function defined above to translate the text into all the languages in the dictionary.


In [ ]:
for <COMPLETAR>:
    print u"Traducción al " + unicode(languages_dict[key],'utf-8') + ":"
    print <COMPLETAR>
    print " "

6.- Adding more languages

We will increase the number of destination languages. Modify the dictionary so that the following new languages are included:


ru: Ruso
fr: Francés
hi: Hindi
ja: Japonés
eu: Vasco
gl: Gallego
ca: Catalán

In [ ]:
languages_dict.update(<COMPLETAR>)
languages_dict.<COMPLETAR>
languages_dict.<COMPLETAR>
languages_dict.<COMPLETAR>
languages_dict.<COMPLETAR>
languages_dict.<COMPLETAR>
languages_dict.<COMPLETAR>

view_codes(languages_dict)
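There are two common ways to add entries to an existing dictionary: `update()`, which takes a whole dictionary, and item assignment, which adds one pair at a time. A minimal sketch on a hypothetical `stock` dictionary follows; it runs on both Python 2 and Python 3.

```python
# Hypothetical inventory, used only for illustration.
stock = {'milk': 23, 'sugar': 103}

# update() accepts a whole dictionary, so several pairs can be added in one
# call; keys that already exist ('milk' here) get their value replaced.
stock.update({'flour': 40, 'rice': 15, 'milk': 20})

# Item assignment adds (or replaces) a single pair.
stock['salt'] = 8

print(sorted(stock.items()))
```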

EXERCISE: Translate the text into all the languages in the dictionary.


In [ ]:
<COMPLETAR>

7.- Translation to all available languages

We will use the goslate library to get a full dictionary of languages.

EXERCISE: Add the code to print the full list of codes and language names.

The answer should be:

gu: Gujarati
zh-TW: Chinese (Traditional)
gd: Scots Gaelic
ga: Irish
gl: Galician
lb: Luxembourgish
la: Latin
lo: Lao
(...)

In [ ]:
gs = goslate.Goslate()
all_languages_dict = gs.get_languages()

view_codes(<COMPLETAR>)

We will now call the translation function to translate a text to every one of the languages.

EXERCISE: Write the code necessary to translate the sentence "Ya hemos completado el curso introductorio".

The answer should be:

Traducción al Gujarati:
અમે પહેલાથી જ પ્રારંભિક અભ્યાસક્રમ પૂર્ણ કર્યા

Traducción al Chinese (Traditional):
我們已經完成了入門課程

Traducción al Scots Gaelic:
Tha sinn air crìoch a chur air mar-thà a 'chiad chùrsa

Traducción al Irish:
Táimid tar éis i gcrích cheana féin ar an gcúrsa tosaigh

(...)

In [ ]:
my_text = 'Ya hemos completado el curso introductorio'

for key in <COMPLETAR>:
    print u"\nTraducción al " + <COMPLETAR>  + ":"
    print <COMPLETAR>
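When a loop issues one web request per language, a single failed download would normally abort the whole run. The sketch below shows one possible way to skip failures and pause between requests. The function `fake_translate`, the language code `'xx'` and the 0.1 second pause are all hypothetical; the stand-in raises `IOError` because the URL errors of `urllib2` (and of `urllib.error` in Python 3) derive from it.

```python
import time

def fake_translate(lang, text):
    """Hypothetical stand-in for a function that downloads a translation."""
    if lang == 'xx':
        raise IOError("download failed for language '%s'" % lang)
    return "[%s] %s" % (lang, text)

results = {}
failed = []
for code in ['en', 'xx', 'it']:
    try:
        results[code] = fake_translate(code, "Hola")
    except IOError as error:
        # Record the failure and carry on with the remaining languages.
        failed.append(code)
        print("Skipping %s: %s" % (code, error))
    time.sleep(0.1)   # short pause between requests (arbitrary value)

print(sorted(results))
print(failed)
```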
