Languages

This tutorial will explain how to set the language property for various nodes and file objects when using the ricecooker framework.

Explore language objects and language codes

First we must import the le-utils package. The languages supported by Kolibri and the Content Curation Server are provided in le_utils.constants.languages.


In [1]:
from le_utils.constants import languages


# look up a language by its language code
language_obj = languages.getlang('en')
language_obj


Out[1]:
Language(native_name='English', primary_code='en', subcode=None, name='English', ka_name=None)

In [2]:
# look up a language by its name (the new le_utils version has not shipped yet)
language_obj = languages.getlang_by_name('English')
language_obj


Out[2]:
Language(native_name='English', primary_code='en', subcode=None, name='English', ka_name=None)

In [3]:
# all `language` attributes (channel, nodes, and files) need to use the language code
language_obj.code


Out[3]:
'en'

In [4]:
from le_utils.constants.languages import getlang_by_native_name

lang_obj = getlang_by_native_name('français')
print(lang_obj)
print(lang_obj.code)


Language(native_name='Français, langue française', primary_code='fr', subcode=None, name='French', ka_name='francais')
fr

The language code above is an internal representation that usually uses two-letter codes and sometimes includes locale information, e.g., pt-BR for Brazilian Portuguese. For some languages the internal code is the three-letter version, e.g., zul for Zulu.
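
As a rough illustration of how such a code can be composed, the sketch below joins a primary code and an optional subcode with a hyphen. The namedtuple is a hypothetical stand-in that only mirrors the fields printed in the outputs above, and internal_code is an illustrative helper, not part of le_utils; treat the joining rule as an assumption about how the code attribute behaves.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields shown in the outputs above;
# the real class lives in le_utils.constants.languages.
Language = namedtuple(
    "Language", ["native_name", "primary_code", "subcode", "name", "ka_name"]
)


def internal_code(lang):
    """Join primary_code and subcode with a hyphen when a subcode is present."""
    if lang.subcode:
        return "{}-{}".format(lang.primary_code, lang.subcode)
    return lang.primary_code


english = Language("English", "en", None, "English", None)
# Names in the next two entries are illustrative; only the codes matter here.
brazilian = Language("Português (Brasil)", "pt", "BR", "Portuguese, Brazil", None)
zulu = Language("isiZulu", "zul", None, "Zulu", None)

print(internal_code(english))    # en
print(internal_code(brazilian))  # pt-BR
print(internal_code(zulu))       # zul
```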


Create chef class

We now create a subclass of ricecooker.chefs.SushiChef and define its channel_info attribute and its construct_channel method (get_channel is inherited and builds the channel from channel_info).

For the purpose of this example, we'll create three topic nodes in different languages that contain one document in each.


In [ ]:
from ricecooker.chefs import SushiChef
from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNode
from ricecooker.classes.files import DocumentFile
from le_utils.constants import licenses

from le_utils.constants.languages import getlang



class MultipleLanguagesChef(SushiChef):
    """
    A sushi chef that creates a channel with content in EN, ES, and FR.
    """
    channel_info = {
        'CHANNEL_TITLE': 'Languages test channel',
        'CHANNEL_SOURCE_DOMAIN': '<yourdomain.org>',     # where you got the content
        'CHANNEL_SOURCE_ID': '<unique id for channel>',  # channel's unique id  CHANGE ME!!
        'CHANNEL_LANGUAGE': getlang('mul').code,         # set global language for channel
        'CHANNEL_DESCRIPTION': 'This channel contains nodes in multiple languages',
        'CHANNEL_THUMBNAIL': None,                       # (optional)
    }

    def construct_channel(self, **kwargs):
        # create channel
        channel = self.get_channel(**kwargs)

        # create the English topic, add a DocumentNode to it
        topic = TopicNode(
            source_id="<en_topic_id>",
            title="New Topic in English",
            language=getlang('en').code,
        )
        doc_node = DocumentNode(
            source_id="<en_doc_id>",
            title='Some doc in English',
            description='This is a sample document node in English',
            files=[DocumentFile(path='samplefiles/documents/doc_EN.pdf')],
            license=licenses.PUBLIC_DOMAIN,
            language=getlang('en').code,
        )
        topic.add_child(doc_node)
        channel.add_child(topic)

        # create the Spanish topic, add a DocumentNode to it
        topic = TopicNode(
            source_id="<es_topic_id>",
            title="Topic in Spanish",
            language=getlang('es-MX').code,
        )
        doc_node = DocumentNode(
            source_id="<es_doc_id>",
            title='Some doc in Spanish',
            description='This is a sample document node in Spanish',
            files=[DocumentFile(path='samplefiles/documents/doc_ES.pdf')],
            license=licenses.PUBLIC_DOMAIN,
            language=getlang('es-MX').code,
        )
        topic.add_child(doc_node)
        channel.add_child(topic)

        # create the French topic, add a DocumentNode to it
        topic = TopicNode(
            source_id="<fr_topic_id>",
            title="Topic in French",
            language=getlang('fr').code,
        )
        doc_node = DocumentNode(
            source_id="<fr_doc_id>",
            title='Some doc in French',
            description='This is a sample document node in French',
            files=[DocumentFile(path='samplefiles/documents/doc_FR.pdf')],
            license=licenses.PUBLIC_DOMAIN,
            language=getlang('fr').code,
        )
        topic.add_child(doc_node)
        channel.add_child(topic)

        return channel
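
The three topic blocks above differ only in language code, language name, and file name, so the per-language values could be generated in a loop. The sketch below is one possible way to do that in plain Python; build_node_kwargs is a hypothetical helper, and the "path" entry is meant for DocumentFile rather than DocumentNode.

```python
def build_node_kwargs(lang_code, lang_name):
    """Return (topic_kwargs, doc_kwargs) for one language (hypothetical helper)."""
    primary = lang_code.split("-")[0]  # 'es-MX' -> 'es'
    topic_kwargs = {
        "source_id": "<{}_topic_id>".format(primary),
        "title": "Topic in {}".format(lang_name),
        "language": lang_code,
    }
    doc_kwargs = {
        "source_id": "<{}_doc_id>".format(primary),
        "title": "Some doc in {}".format(lang_name),
        "description": "This is a sample document node in {}".format(lang_name),
        # Intended for DocumentFile(path=...), not for DocumentNode itself.
        "path": "samplefiles/documents/doc_{}.pdf".format(primary.upper()),
        "language": lang_code,
    }
    return topic_kwargs, doc_kwargs


for code, name in [("en", "English"), ("es-MX", "Spanish"), ("fr", "French")]:
    topic_kwargs, doc_kwargs = build_node_kwargs(code, name)
    print(topic_kwargs["language"], doc_kwargs["path"])
```

Inside construct_channel, these dictionaries could then be passed to TopicNode and DocumentNode, after moving "path" into a DocumentFile and adding the license.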

Run your chef by creating an instance of the chef class and calling its run method:


In [6]:
mychef = MultipleLanguagesChef()
args = {
    'command': 'dryrun',  # use  'uploadchannel'  for real run
    'verbose': True,
    'token': 'YOURTOKENHERE9139139f3a23232'
}
options = {}
mychef.run(args, options)


INFO     In SushiChef.run method. args={'command': 'dryrun', 'reset': True, 'verbose': True, 'token': 'YOURTO...'} options={}
INFO     

***** Starting channel build process *****


INFO     Calling construct_channel... 
INFO        Setting up initial channel structure... 
INFO        Validating channel structure...
INFO           Languages test channel (ChannelNode): 6 descendants
INFO              New Topic in English (TopicNode): 1 descendant
INFO                 Some doc in English (DocumentNode): 1 file
INFO              Topic in Spanish (TopicNode): 1 descendant
INFO                 Some doc in Spanish (DocumentNode): 1 file
INFO              Topic in French (TopicNode): 1 descendant
INFO                 Some doc in French (DocumentNode): 1 file
INFO        Tree is valid

INFO     Downloading files...
INFO     Processing content...
INFO     	--- Downloaded e8b1fe37ce3da500241b4af4e018a2d7.pdf
INFO     	--- Downloaded cef22cce0e1d3ba08861fc97476b8ccf.pdf
INFO     	--- Downloaded 6c8730e3e2554e6eac0ad79304bbcc68.pdf
INFO        All files were successfully downloaded
INFO     Command is dryrun so we are not uploading chanel.

Congratulations, you put three languages on the internet!
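
The run above was a dry run, so nothing was uploaded. A minimal sketch of switching between 'dryrun' and 'uploadchannel' while keeping the token out of the notebook could look like the following. The build_run_args helper and the STUDIO_TOKEN variable name are assumptions made for illustration; only the 'command', 'verbose', and 'token' keys come from the cell above.

```python
import os


def build_run_args(upload=False, token_env_var="STUDIO_TOKEN"):
    """Build the args dict for the chef's run method.

    STUDIO_TOKEN is a hypothetical environment variable name; the fallback
    is the placeholder token used in the cell above.
    """
    return {
        "command": "uploadchannel" if upload else "dryrun",
        "verbose": True,
        "token": os.environ.get(token_env_var, "YOURTOKENHERE9139139f3a23232"),
    }


args = build_run_args()  # dry run, as in the cell above
print(args["command"])
```

The resulting dictionary can be passed as the first argument of mychef.run in place of the hand-written args.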


Example 2: YouTube video with subtitles in multiple languages

You can use the youtube_dl library to get lots of useful metadata about videos and playlists, including which subtitle languages are available for a video.


In [7]:
import youtube_dl

ydl = youtube_dl.YoutubeDL({
    #'quiet': True,
    'no_warnings': True,
    'writesubtitles': True,
    'allsubtitles': True,
})


youtube_id = 'FN12ty5ztAs'

info = ydl.extract_info(youtube_id, download=False)
subtitle_languages = info["subtitles"].keys()

print(subtitle_languages)


[youtube] FN12ty5ztAs: Downloading webpage
[youtube] FN12ty5ztAs: Downloading MPD manifest
dict_keys(['en', 'fr', 'zu'])
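
The keys of info["subtitles"] are language codes, and each value is a list of available subtitle formats. The sketch below works on a hand-written sample with that shape (the URLs and formats are placeholders, not real data for this video) and lists the languages that offer a given format; languages_with_format is a hypothetical helper.

```python
# Hypothetical sample shaped like info["subtitles"]: each language code maps
# to a list of available formats. URLs are placeholders.
sample_subtitles = {
    "en": [{"ext": "vtt", "url": "https://example.com/en.vtt"}],
    "fr": [
        {"ext": "ttml", "url": "https://example.com/fr.ttml"},
        {"ext": "vtt", "url": "https://example.com/fr.vtt"},
    ],
    "zu": [{"ext": "ttml", "url": "https://example.com/zu.ttml"}],
}


def languages_with_format(subtitles, ext="vtt"):
    """Return sorted language codes that offer at least one track in `ext`."""
    return sorted(
        code
        for code, tracks in subtitles.items()
        if any(track.get("ext") == ext for track in tracks)
    )


print(languages_with_format(sample_subtitles))          # ['en', 'fr']
print(languages_with_format(sample_subtitles, "ttml"))  # ['fr', 'zu']
```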

Full sushi chef example

The YoutubeVideoWithSubtitlesSushiChef class below shows how to create a channel with a YouTube video and attach subtitle files in all available languages.


In [10]:
from ricecooker.chefs import SushiChef
from ricecooker.classes import licenses
from ricecooker.classes.nodes import ChannelNode, TopicNode, VideoNode
from ricecooker.classes.files import YouTubeVideoFile, YouTubeSubtitleFile
from ricecooker.classes.files import is_youtube_subtitle_file_supported_language


import youtube_dl
ydl = youtube_dl.YoutubeDL({
    'quiet': True,
    'no_warnings': True,
    'writesubtitles': True,
    'allsubtitles': True,
})


# Define the license object with necessary info
TE_LICENSE = licenses.SpecialPermissionsLicense(
    description='Permission granted by Touchable Earth to distribute through Kolibri.',
    copyright_holder='Touchable Earth Foundation (New Zealand)'
)


class YoutubeVideoWithSubtitlesSushiChef(SushiChef):
    """
    A sushi chef that creates a channel with a YouTube video and its subtitles.
    """
    channel_info = {
        'CHANNEL_SOURCE_DOMAIN': '<yourdomain.org>',     # where you got the content
        'CHANNEL_SOURCE_ID': '<unique id for channel>',  # channel's unique id  CHANGE ME!!
        'CHANNEL_TITLE': 'Youtube subtitles downloading chef',
        'CHANNEL_LANGUAGE': 'en',
        'CHANNEL_THUMBNAIL': 'https://edoc.coe.int/4115/postcard-47-flags.jpg',
        'CHANNEL_DESCRIPTION': 'This is a test channel to make sure youtube subtitle languages lookup works'
    }

    def construct_channel(self, **kwargs):
        # create channel
        channel = self.get_channel(**kwargs)

        # get all subtitles available for a sample video
        youtube_id = 'FN12ty5ztAs'
        info = ydl.extract_info(youtube_id, download=False)
        subtitle_languages = info["subtitles"].keys()
        print('Found subtitle_languages = ', subtitle_languages)
        
        # create video node
        video_node = VideoNode(
            source_id=youtube_id,
            title='Youtube video',
            license=TE_LICENSE,
            derive_thumbnail=True,
            files=[YouTubeVideoFile(youtube_id=youtube_id)],
        )

        # add subtitles in whichever languages are available.
        for lang_code in subtitle_languages:
            if is_youtube_subtitle_file_supported_language(lang_code):
                video_node.add_file(
                    YouTubeSubtitleFile(
                        youtube_id=youtube_id,
                        language=lang_code
                    )
                )
            else:
                print('Unsupported subtitle language code:', lang_code)

        channel.add_child(video_node)

        return channel

In [11]:
chef = YoutubeVideoWithSubtitlesSushiChef()
args = {
    'command': 'dryrun',  # use  'uploadchannel'  for real run
    'verbose': True,
    'token': 'YOURTOKENHERE9139139f3a23232'
}
options = {}
chef.run(args, options)


INFO     In SushiChef.run method. args={'command': 'dryrun', 'reset': True, 'verbose': True, 'token': 'YOURTO...'} options={}
INFO     

***** Starting channel build process *****


INFO     Calling construct_channel... 
INFO        Setting up initial channel structure... 
INFO        Validating channel structure...
INFO           Youtube subtitles downloading chef (ChannelNode): 1 descendant
INFO              Youtube video (VideoNode): 4 files
INFO        Tree is valid

INFO     Downloading files...
INFO     Processing content...
INFO     	--- Downloaded (YouTube) a144a6af6977247684d2a3977dc6f841.mp4
Found subtitle_languages =  dict_keys(['en', 'fr', 'zu'])
INFO     	--- Downloaded 5f22a71e53271eb2d2abe013457a625d.jpg
ERROR       3 file(s) have failed to download
WARNING  	Video FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs 
	   Subtitle with langauge en is not available for http://www.youtube.com/watch?v=FN12ty5ztAs
WARNING  	Video FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs 
	   Subtitle with langauge fr is not available for http://www.youtube.com/watch?v=FN12ty5ztAs
WARNING  	Video FN12ty5ztAs: http://www.youtube.com/watch?v=FN12ty5ztAs 
	   Subtitle with langauge zul is not available for http://www.youtube.com/watch?v=FN12ty5ztAs
INFO     Command is dryrun so we are not uploading chanel.
