Translation of products

This is the code for translation of products. In Prestashop, we have products in Thai and English description. We need to move the translation database to Woocommerce. Woocommerce uses "WPML Multilingual CMS" plugin. After activate, this plugin will create a database called "wp_icl_translation". The "trid" column is used to match the Thai and English version of the same product.

Our process is to create a English description product in "wp_posts" and "wp_postmeta" then match the "trid" of Thai and English product in "wp_icl_translations".

There are 3 parts.

  • Creating the "wp_posts" for English description products.

  • Creating the "wp_postmeta" for English description products.

  • Matching the "trid" of Thai and English product in the "wp_icl_translations"

Creating the "wp_posts"

Our process here is to create wp_posts for English description product. We create new "wp_posts" with contain English product by copy the Thai products and fill them with English descriptions. Change ID and index and concatenate it with the old "wp_posts". Upload new "wp_posts" to Woocommerce. Woocommerce will show you more products (because we copy from Thai product, backend will show 2 x number of products).

Filling the English "Description" and "Short description".


In [1]:
import pandas as pd
import numpy as np

We load the data. There are

1) "ps_product_lang.csv" the description and short description in Thai and English from Prestashop.

2) "wp_posts.csv" Woocommerce database consists of everything posting on the site (products, images and more).


In [2]:
#Description from Prestashop
df_description = pd.read_csv('sql_prestashop/ps_product_lang.csv', index_col=False)
#wp_posts from Woocommerce
wp_posts = pd.read_csv('sql_prestashop/wp_posts.csv', index_col=False)

We select only the English description from Prestashop database.


In [3]:
#Use only English "Description" & "Short description" from Prestashop.
df_description = df_description[df_description['id_lang'] == 1]

Next step, we fill the "Description" & "Short description" in "wp_posts".

Sorting the ID first.


In [4]:
eng_des = wp_posts.sort_values('ID')

"wp_posts" has many post types. We extract only "product" type.


In [5]:
eng_des = eng_des[eng_des['post_type'] == 'product']

Change column name for merge. Then we merge "eng_des" and "df_description" by name of a products. This will ensure that we match the right description to the right product.


In [6]:
#Change column name for merging dataframe.
eng_des = eng_des.rename(columns = {'post_title':'name'})

#merge.
eng_des = pd.merge(eng_des, df_description[['name', 'description', 'description_short']], how='left', on='name')

Fill wp_post new "Description and "Short description". Then drop the unuse


In [7]:
eng_des['post_excerpt'] = eng_des['description_short']
eng_des['post_content'] = eng_des['description']

#Drop unused column.
eng_des = eng_des.drop(['description', 'description_short'], axis=1)

There are some duplicate entry because some entry share the same name.


In [8]:
#Check for duplicate entry.
count = eng_des['name'].value_counts()
#Drop duplicate name of product.
eng_des = eng_des.drop_duplicates(subset='name')

#Create series for Thai ID.
th_id = eng_des['ID']

Changing "name" column back to "post_title". Protect the error when concatenating (after concatenate, it will have 24 column).


In [9]:
#Changing "name" column back to "post_title". Protect the error when concatenating
#(After concatenate, it will have 24 column).
eng_des = eng_des.rename(columns = {'name':'post_title'})

Change some error strings.

Some of the strings in English description are not in English. We need to change them.

First, we create empty value series and use it to collect new description data.


In [10]:
empty = np.empty(eng_des.shape[0])
description = pd.Series(empty)
description[0] = ''

Then we run a for loop to replace a string.


In [11]:
for i in range(0, eng_des.shape[0]):
    string = eng_des['post_content'].iloc[i]
    if pd.isnull(string) == False:
        string = string.replace("เรื่องย่อ", "Sypnosis")
        string = string.replace("ส่วนที่อยากบอก", "Artist said")   
        string = string.replace("ส่วนที่ผู้จัดทำอยากบอก", "Artist said")  
        string = string.replace("ช่องทางการติดต่อ", "Contact")   
        string = string.replace("ผู้จัดทำ", "Artist") 
        string = string.replace("จากเรื่อง", "Parody")  
        string = string.replace("อ้างอิงจากเรื่อง", "Parody") 
        string = string.replace("ลักษณะของสินค้า", "Details of the product") 
        string = string.replace("ออกขายครั้งแรก", "Publication date")  
        string = string.replace("ชื่อสินค้า", "Product") 
        string = string.replace("ชื่อผลงาน", "Product")
        description[i] = string

eng_des['post_content'] = description

Change ID and Index number.


In [12]:
#Find max "ID" of wp_posts. Use max ID + 1 to be the starting ID of English product.
max_id = wp_posts['ID'].max()
#Create Series of number as a new "ID".
eng_start = max_id + 1
eng_end = eng_start + eng_des.shape[0]
number = pd.Series(range(eng_start, eng_end))

#Reset eng_des index
eng_des = eng_des.reset_index()
#Drop old index column.
eng_des = eng_des.drop(['index'], axis=1)
eng_des['ID'] = number

#Create Series od number as a new index.
max_index = wp_posts.index.values.max()
eng_index_start = max_index + 1
eng_index_end = eng_index_start + eng_des.shape[0]
number = pd.Series(range(eng_index_start, eng_index_end))
eng_des['number'] = number
eng_des = eng_des.set_index(number)
#Drop unused column.
eng_des = eng_des.drop(['number'], axis=1)

Create "wp_posts" entry for images.

We have already made "wp_posts" entry for products. If we concatenate this to the old "wp_posts" and upload to Woocommerce. The new English description product can't find the images. We solve this problem by create the images entry for "wp_posts".

The process here is to copy the old images "wp_posts", change their "post_parent" to match image to the right product. Then concatenate to the old "wp_posts".

First, we extract only "attachment" post_type.


In [13]:
#Generate wp_posts for images
image = wp_posts[wp_posts['post_type'] == 'attachment']

Some images entry don't have a parent product. We select only the entries that have "post_parent" value.


In [14]:
image = image[image['post_parent'] != 0]

Let's check by counting "post_mime_type" and "post_parent". "post_mime_type" is a type of file such as ".jpg", ".png". Use value_counts() will show the type of each file in the data and how number of them in each category. Counting the "post_parent" show nymber of images in each ID.


In [15]:
#Check type of "post_mime_type"
count = image['post_mime_type'].value_counts()
#Check how many products.
count = image['post_parent'].value_counts()

Now we have to change "post_parent" to match the image to the right product. We use Python's dictionary to translate from old "post_parent" to a new one.

Extract old and new ID into variable.


In [16]:
new = eng_des['ID'].reset_index()
new = new.drop(['index'], axis=1)
new = new['ID']
old = th_id.reset_index()
old = old.drop(['index'], axis=1)
old = old['ID']

Some post_parent doesn't appear in dictionary. Pick only the one we have in Thai ID.


In [17]:
#reset index
image = image.reset_index()
image = image.drop(['index'], axis=1)

image['number'] = image['post_parent']
image = image.set_index('number')
image = image.loc[th_id]

Check for nan value and drop rows that have nan English ID.


In [18]:
#Check for nan value.
image['post_parent'].isnull().sum()
#Drop rows that have nan English ID.
image = image[np.isfinite(image['post_parent'])]

Create a dictionary.


In [19]:
dic = {}
for i in range(len(old)):
    dic[old[i]] = new[i]

Translate old to new "ID"


In [20]:
image['post_parent'] = [dic[x] for x in image['post_parent']]

Set new ID and index number.


In [21]:
max_id = eng_des['ID'].max()
#Create Series of number as a new "ID".
start = max_id + 1
end = start + image.shape[0]
number = pd.Series(range(start, end))

#Reset eng_des index
image = image.reset_index()
#Drop old index column.
image = image.drop(['number'], axis=1)
image['ID'] = number

#Create Series od number as a new index.
max_index = eng_des.index.values.max()
index_start = max_index + 1
index_end = index_start + image.shape[0]
number = pd.Series(range(index_start, index_end))
image['number'] = number
image = image.set_index(number)
#Drop unused column.
image = image.drop(['number'], axis=1)

Concatenate to "wp_posts"

Concatenate "wp_posts" and "eng_des" to the old wp_posts.


In [22]:
wp_posts_with_eng = pd.concat([wp_posts, eng_des, image], axis=0)

Arrange index.


In [23]:
wp_posts_with_eng = wp_posts_with_eng.sort_values('ID')

Export new raw_product and upload to Woocommerce.


In [ ]:
wp_posts_with_eng.to_csv('product_import_to_woo/wp_posts_with_eng.csv', encoding='utf-8', index=False)

Creating the "wp_postmeta"

Our process here is to create "wp_postmeta". "wp_post_meta" is a table collecting the detail of each post. In example "wp_postmeta" of product stores a SKU, price, weight and other information about product. Woocommerce isn't working properly without "wp_postmeta", so after we have already create "wp_posts", we have to generate "wp_postmeta".

We load "wp_postmeta.csv" that is the detail of each wp_post ID. We load wp_postmeta into 2 variable.

1) "wp_postmeta" is use in the end to concatenate with a new English product's "wp_postmeta" or "meta" variable.

2) "meta" is a new English product's "wp_postmeta"


In [25]:
wp_postmeta = pd.read_csv('sql_prestashop/wp_postmeta.csv', index_col=False)
meta = pd.read_csv('sql_prestashop/wp_postmeta.csv', index_col=False)

"meta" still have other posts. We need to extract only product, so we bring "meta" with the same ID as Thai products.


In [26]:
meta['number'] = meta['post_id']
meta = meta.set_index('number')
meta = meta.loc[th_id]

Translate old to new "ID".


In [27]:
meta['post_id'] = [dic[x] for x in meta['post_id']]

Set new ID and index number.


In [28]:
max_id = wp_postmeta['meta_id'].max()
#Create Series of number as a new "ID".
start = max_id + 1
end = start + meta.shape[0]
number = pd.Series(range(start, end))

#Reset eng_des index
meta = meta.reset_index()
#Drop old index column.
meta = meta.drop(['number'], axis=1)
meta['meta_id'] = number

#Set new index
#Create Series od number as a new index.
max_index = wp_postmeta.index.values.max()
index_start = max_index + 1
index_end = index_start + meta.shape[0]
number = pd.Series(range(index_start, index_end))
meta['number'] = number
meta = meta.set_index(number)
#Drop unused column.
meta = meta.drop(['number'], axis=1)

Concatenate with the old wp_posts.


In [29]:
wp_postmeta_with_eng = pd.concat([wp_postmeta, meta], axis=0)

Arrange index.


In [30]:
wp_postmeta_with_eng = wp_postmeta_with_eng.sort_values('post_id')

For a comfortable sorting the product in backend, We change the "SKU" into new format.


In [31]:
#Change SKU format.
for i in range(wp_postmeta_with_eng.shape[0]):
    if wp_postmeta_with_eng['meta_key'].iloc[i] == '_sku':
        wp_postmeta_with_eng['meta_value'].iloc[i] = 'A' + str(wp_postmeta_with_eng['meta_value'].iloc[i]).zfill(5)
        x = wp_postmeta_with_eng['meta_value'].iloc[i]


D:\Anaconda3\lib\site-packages\pandas\core\indexing.py:179: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Export to .csv


In [ ]:
wp_postmeta_with_eng.to_csv('product_import_to_woo/wp_postmeta_with_eng.csv', encoding='utf-8', index=False)

Creating the "wp_icl_translations"

Loading the "wp_icl_translations". Use wp_icl_translations to link Thai & Eng product description.


In [32]:
translations = pd.read_csv('sql_prestashop/wp_icl_translations.csv', index_col=False)

Merging Thai & English Description to wp_icl_translations to make sure we match the correct ID.


In [33]:
thai_des = wp_posts[wp_posts['post_type'] == 'product']
thai_des = thai_des[['ID', 'post_title']]
eng_des = eng_des[['ID', 'post_title']]
tr = translations[translations['element_type'] == 'post_product']

First, we merge Thai product to "wp_icl_translations".


In [36]:
#Change column name for merging dataframe.
tr = tr.rename(columns = {'element_id':'ID'})

#Merge thai_des first. The result dataframe will have new "post_title" column.
tr = pd.merge(tr, thai_des, how='left', on='ID')

Then we merge it with English product description by "post_title". The reason we use "post_title" as a key because Thai and English product don't share the same ID.


In [37]:
#Merge English description. Now we use "post_title" as a key.
tr = pd.merge(tr, eng_des, how='left', on='post_title')

#Rename ID_x and ID_y for comfortable use.
tr = tr.rename(columns = {'ID_x':'ID_th', 'ID_y':'ID_en'})

Drop rows that English ID has a nan value.


In [38]:
tr = tr[np.isfinite(tr['ID_en'])]

Using the value_counts() to find duplicate values. If there are duplicate values, we drop it.


In [39]:
#Check for duplicate entry.
count = tr['ID_en'].value_counts()
#Drop duplicate name of product.
tr = tr.drop_duplicates(subset='ID_en')

Create a series with the size of English products.


In [40]:
#Find max "ID" of wp_icl_translations. Use max ID + 1 to be the starting ID of English description.
max_id = translations['translation_id'].max()
#Create Series of number as a new "ID".
start = max_id + 1
end = start + tr.shape[0]
number = pd.Series(range(start, end))

Creating a new dataframe in the format of wp_icl_translations.


In [41]:
#Create new dataframe in wp_icl_translations form. Then collect the processed
#English id's wp_icl_translations data.
dataframe = pd.DataFrame({
'translation_id': number,
'element_type' : 'post_product',
'element_id': tr['ID_en'].values,
'trid': tr['trid'].values,
'language_code': 'en',
'source_language_code': 'th'}, 
columns=['translation_id', 'element_type', 'element_id', 'trid', 'language_code', 'source_language_code'])

Set new index for dataframe.


In [42]:
#Set new index, so it will continue from the last index in wp_icl_translations.
start = translations.shape[0]
end = start + dataframe.shape[0]
number = pd.Series(range(start,end))
dataframe = dataframe.set_index(number)

Concatenating dataframe to "translations" and name it as wp_icl_translations


In [43]:
wp_icl_translations = pd.concat([translations, dataframe])

"source_language_code" must fill with "NULL" in order to make Woocommerce working properly.


In [44]:
wp_icl_translations["source_language_code"] = wp_icl_translations["source_language_code"].fillna('NULL')

Export to .csv


In [ ]:
wp_icl_translations.to_csv('product_import_to_woo/wp_icl_translations_to_import.csv', encoding='utf-8', index=False)