Product migration from Prestashop to Woocommerce

This is the product migration of the book store from Prestashop to Woocommerce format. Woocommerce has a function to import/export product. We will use this advantage by export the form of Woocommerce, merge with other important data then import it back to Woocommerce.

*I can't attach the data file here because of privacy.


In [1]:
import pandas as pd
import numpy as np

Store the database in the folder "sqlprestashop" The data here are collected from 2 places. It is Prestashop backend (some information already join) and from the mysql database (other important information that aren't joined yet. I will save file from mysql with a name "ps....csv").

  • "presta_product_in_woo_form.xlsx" is the export product form of Woocommerce. Filling some information such as product name an id in Excel first. Use this dataframe to merge other information then import back to Woocommerce.
  • "ps_product_lang.csv" is the Prestashop database, store the description and short description field.
  • "ps_image.csv" is the Prestashop database, store image links.

In [2]:
df_woocommerce = pd.read_excel('sql_prestashop/presta_product_in_woo_form.xlsx')
df_description = pd.read_csv('sql_prestashop/ps_product_lang.csv', index_col=False)
df_img_link = pd.read_csv('sql_prestashop/ps_image.csv', index_col=False)

Description and Short description

The old Prestashop has 2 languages, Thai and English. We interest only description in Thai language ('2').


In [3]:
df_description = df_description[df_description['id_lang'] == 2]

Change the column name from 'id_product' to 'ID' in order to merge.


In [4]:
df_description = df_description.rename(columns = {'id_product':'ID'})
df_img_link = df_img_link.rename(columns = {'id_product':'ID'})

We pick only interest column from df_description. There are 'ID', 'description', 'description_short'. Merge df_woocommerce with df_description by ID. Then fill new merge column in the df_woocommerce 'Description' and 'Short description' column.


In [5]:
result = pd.merge(df_woocommerce, df_description[['ID', 'description', 'description_short']], how='left', on='ID')

#Fill new merge column to df_woocommerce 'description' and 'Short description' column.
result['Description'] = result['description']
result['Short description'] = result['description_short']

Drop the new merge column


In [6]:
result = result.drop(result.columns[37:40], axis=1)

Now we already have Woocommerce import/export format (37 column) with the 'Description' and 'Short description' filled.

Images Link

Next step we create a link for image of the products. We must view the data first by count number of the pic per "ID".


In [7]:
#count pic per id
df_img_link['ID'] = df_img_link['ID'].astype('category')
count_pic_per_id = pd.DataFrame(df_img_link['ID'].value_counts())
count_pic_per_id = count_pic_per_id.sort_index(axis=0)

Now let's take a look at how Prestashop and Woocommerce link the image.

Prestashop keep the product images in "img/p". The image of products id 1234 will be save in folder 1/2/3/4.

First, we copy all images from Prestashop (copy folder "p"). Images still separate by folder, we solve this by search for ".jpg" files in folder "p" on our desktop. Copy all of images to be in the same folder. We only interest in the main image (when upload product image, Prestashop will copy original image and resize it), so we search again and delete the image with a word "default".

Upload all images to Woocommerce. Woocommerce stores them by uploaded year and date. The picture folder is at "wp-content/uploads/2017/09/"


In [8]:
#Create a link to images
id_str = df_img_link['id_image'].astype(str)
df_img_link['id_image'] = 'http://aphrosbrand.com/wp-content/uploads/2017/09/' + id_str + '.jpg'

Many product have link to more than one images. Some images is the books front page. The others are the detail page. We must put them in the right order.


In [9]:
#Each product id image has a postion. Position 1 is the main image of product. 
#We sort the position column first.                         
df_img_link = df_img_link.sort_values('position')     
#Group df_img_link by "ID".
x = df_img_link.groupby('ID').agg(lambda x: x.tolist())
#Not use the "ID" column as index.
x.reset_index(inplace=True)

Merge dataframe. Fill image links in Woocommerce form (column 'Images') and then drop unused column.


In [10]:
#merge
result = pd.merge(result, x, how='left', on='ID')
#Set new images link (id_image) to 'Images'
result['Images'] = result['id_image']
#drop col ที่เพิ่มมาทิ้ง
result = result.drop(result.columns[37:41], axis=1)

Now we have a link to images in "Images" column but they are stored as list. We must remove [, [ and ' sign.


In [11]:
#Create Pandas Siries with nan values. Use it in for loop to remove [, ] and ' sign.
empty = np.empty(result.shape[0])
images = pd.Series(empty)
images[:] = np.nan

for x in range(result.shape[0]):
    text1 = str(result['Images'].loc[x]).replace("'", '')
    text2 = text1.replace('[', '')
    images.loc[x] = text2.replace("]", '')
    
result['Images'] = images

Weight

The weight of product can find in "ps_product.csv"


In [12]:
ps_product = pd.read_csv('sql_prestashop/ps_product.csv', index_col=False)

Create dataframe with column "id_product" and "weight"


In [13]:
weight = pd.DataFrame(ps_product['id_product'])
weight['weight'] = ps_product['weight']

Rename "id_product" to "ID" and merge with result dataframe


In [14]:
#Rename "id_product" to "ID"
weight = weight.rename(columns = {'id_product':'ID'})
#merge
result = pd.merge(result, weight, how='left', on='ID')

Woocommerce store weight of the product in gram. We must multiply weight from Prestashop by 1000. Fill data in "Weight (g)" then dropthe unused column.


In [15]:
result['Weight (g)'] = result['weight'] * 1000
result = result.drop(['weight'], axis=1)

Category

Loading the data from Prestashop database involve with the category.


In [16]:
ps_category_lang = pd.read_csv('sql_prestashop/ps_category_lang.csv', index_col=False)
ps_category_product = pd.read_csv('sql_prestashop/ps_category_product.csv', index_col=False)

In ps_category_lang, we don't have the unique id_category yet. We group the id_category first by select id_category that has maximum string in "name" column.


In [17]:
sum_categories = ps_category_lang.groupby('id_category', as_index=False)['name'].max()

After review the "sum_categories". We manually select the category for our Woocommerce store. Then change it to English (at first the category is written in Thai).


In [18]:
sum_categories.loc[2,'name'] = 'Doujinshi'                    
sum_categories.loc[3,'name'] = 'Doujinshi - Movie'                 
sum_categories.loc[4,'name'] = 'Doujinshi - Manga and Anime'
sum_categories.loc[5,'name'] = 'Others'
sum_categories.loc[7,'name'] = 'Doujinshi - Original'
sum_categories.loc[8,'name'] = 'Keychain'
sum_categories.loc[9,'name'] = 'Doujinshi - Game'
sum_categories.loc[10,'name'] = 'Note Book'
sum_categories.loc[11,'name'] = 'Pin'
sum_categories.loc[17,'name'] = 'Sticker'
sum_categories.loc[10,'name'] = 'Note Book'
sum_categories.loc[18,'name'] = 'Doujinshi - Fiction'
sum_categories.loc[21,'name'] = 'others'

We merge our new category to "ps_product" to match category name with id_product.


In [19]:
#merge
categories = pd.merge(ps_category_product, sum_categories, how='left', on='id_category')

One product maybe member of more than one category. We group other column to list by "id_product".


In [20]:
categories = categories.groupby('id_product', as_index=False).agg(lambda x: x.tolist())

Create Pandas Siries with nan values. This use to acquire the data in loop.


In [21]:
empty = np.empty(categories.shape[0])
cat1 = pd.Series(empty)
cat2 = pd.Series(empty)
cat3 = pd.Series(empty)

cat1[:] = np.nan
cat2[:] = np.nan
cat3[:] = np.nan

We have to change format of category in Woocommerce. The new category will have only 2 main categories "Doujinshi" and "Others" store in "cat1". We run for loop to check the first element of the list in "id_category" column. If it is 3, 4, 5, 9, 11, 20, we set it as "Doujinshi". The others set to "Others"

Then we use "cat2" to find sub category of "Doujinshi" and "Others" by checking the first and second element of the list.

Example: the list in categories['id_category'][0] is [3,4]. the result in cat1 is "Doujinshi" and cat2 is "Movie"


In [22]:
for x in range(categories.shape[0]):
    #Cat1
    #3 4 5 9 11 20 is the 'id_category' for 'Doujinshi'
    if categories['id_category'][x][0] in (3, 4, 5, 9, 11, 20):
        cat1[x] = 'Doujinshi'
    else:
        cat1[x] = 'Others'

    #Cat2
    #Sub Doujinshi
    if 4 in categories['id_category'][x][0:2]:
        cat2[x] = 'Movie'        
    if 5 in categories['id_category'][x][0:2]:
        cat2[x] = 'Manga and Anime'        
    if 9 in categories['id_category'][x][0:2]:
        cat2[x] = 'Original'        
    if 11 in categories['id_category'][x][0:2]:
        cat2[x] = 'Game'        
    if 17 in categories['id_category'][x][0:2]:
        cat2[x] = 'Fiction'        
    if 20 in categories['id_category'][x][0:2]:
        cat2[x] = 'Fiction'
        
    #Sub Others
    if 7 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'        
    if 8 in categories['id_category'][x][0:2]:
        cat2[x] = 'Keychain'        
    if 10 in categories['id_category'][x][0:2]:
        cat2[x] = 'Keychain'        
    if 12 in categories['id_category'][x][0:2]:
        cat2[x] = 'Notebook'       
    if 13 in categories['id_category'][x][0:2]:
        cat2[x] = 'Pin'        
    if 14 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'        
    if 15 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'       
    if 16 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'       
    if 18 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'        
    if 19 in categories['id_category'][x][0:2]:
        cat2[x] = 'Sticker'        
    if 21 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'   
    if 22 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'      
    if 23 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'        
    if 24 in categories['id_category'][x][0:2]:
        cat2[x] = 'others'

We concat "cat1" and "cat2" to be in Woocommerce category form. The final category will be like this "Doujinshi, Doujinshi > Movie".


In [23]:
cat3 = cat1 + ',' + ' ' + cat1 + ' > ' + cat2

Some "cat2" still have nan value. We fill it with "Doujinshi".


In [24]:
#Count nan value in "cat2"
cat2.isnull().sum()
#Fill nan value with "Doujinshi"
cat3[cat2.isnull()] = 'Doujinshi'

Fill "Categories" column of result dataframe with "cat3"


In [25]:
result['Categories'] = cat3

Tag

We have 2 types of tags.

1) New Woocommerce tag using the Prestashop's category. Changing category about the artist name and story name to be tag in Woocommerce.

2) The old tag from Prestashop.


In [26]:
#Create nan Series use to acquire tag
empty = np.empty(categories.shape[0])
tag = pd.Series(empty)
tag[:] = np.nan

We do the tag from 1) first.


In [27]:
#set id_category to be index for the use in for loop.
sum_categories = sum_categories.set_index('id_category')

#We collect string, so set the first element to be empty string.
tag.iloc[0] = ''
for x in range(categories.shape[0]):
    tag_list = []
    for y in categories['id_category'][x]:
        if y > 26:
            tag_list.append(sum_categories['name'].loc[y])
    tag.iloc[x] = tag_list

In [28]:
#Add 'tag' column to categories.
categories['tag'] = tag

Then collect the tag from prestashop database


In [29]:
#Still have to add some old tag. Loading from Prestashop database
ps_tag = pd.read_csv('sql_prestashop/ps_tag.csv', index_col=False)
ps_product_tag = pd.read_csv('sql_prestashop/ps_product_tag.csv', index_col=False)

In [30]:
#Drop unuse column
ps_tag = ps_tag.drop('id_lang', axis=1)

Now "ps_tag" has 2 column "id_tag" and "name". We must find the correlation with each "id_product", so we merge "ps_product_tag" which has "id_product" correspond with "id_tag" to "ps_tag".


In [31]:
#merge
old_presta_tag = pd.merge(ps_product_tag, ps_tag, how='left', on='id_tag')

We use only Thai language ("2").


In [32]:
#Use only lang = 2 (Thai)
old_presta_tag = old_presta_tag[old_presta_tag['id_lang'] == 2]

Next we group the "id_product" to be unique id. The sample in other column with same "id_product" will be aggregate into list.


In [33]:
#Groupby 'id_product' to list
old_presta_tag = old_presta_tag.groupby('id_product', as_index=False).agg(lambda x: x.tolist())

Change new column name for further use.


In [34]:
#Change column name
old_presta_tag = old_presta_tag.rename(columns = {'name':'from_ps_tag'})

Then merge tag "old_presta_tag" to "categories". We have tag from 1) and 2) in the same dataframe already.


In [35]:
#merge old_presta_tag to categories by 'id_product'
#This is to aggregate 'tag' and 'old_tag' with same 'id_product'
categories = pd.merge(categories, old_presta_tag[['id_product', 'from_ps_tag']], how='left', on='id_product')

Next we will aggregate tag from 1) and 2) into the same list.


In [36]:
#Before run the loop. We must find the 'nan' value that will cause the error.
categories['tag'].isnull().sum()
categories['from_ps_tag'].isnull().sum()


Out[36]:
251

If there is no old tag (tag from 2)), it will cause an error. We create "have_ps_tag" which is "categories" dataframe without nan value.


In [37]:
#Avoid categories['old_tag'] nan value that cause the error
#First we extract none nan sample to be new DataFrame called 'have_ps_tag' 
have_ps_tag = categories[categories['from_ps_tag'].isnull() == False]

We use python "extend" to extend the list of column "tag" with "from_ps_tag".


In [38]:
#Then we extend the tag with the one from ps_tag
#Although we run from have_ps_tag, it will change 'categories['tag']' too
#because have_ps_tag is build from categories DataFrame
for x in range(have_ps_tag['id_product'].shape[0]):
    have_ps_tag['tag'].iloc[x].extend(have_ps_tag['from_ps_tag'].iloc[x])

Now we have a long list in categories['tag']. Our problem is some tag is the same name but different in lower or uppercase letter.

Creating new Series to use in for loop. Initial the first index to '' in order to prevent the error.


In [39]:
#Create nan Series use to acquire a sum_tag and lowercase tag list
empty = np.empty(categories.shape[0])
sum_tag = pd.Series(empty)
lowercase = pd.Series(empty)
sum_tag[:] = np.nan
lowercase[:] = np.nan
#Fill null string to protect 'value error'
sum_tag[0] = ''
lowercase[0] = ''

Start a loop by first changing all element in each list to be lowercase letter. Then run a loop again in lowercase letter, pick tag only appear once.

Then set the final tag to categories['tag']


In [40]:
for x in range(categories.shape[0]):
    newlist = []
    lowercase.iloc[x] = [q.lower() for q in categories['tag'].iloc[x]]     
    for i in lowercase.iloc[x]:
        if i not in newlist:
            newlist.append(i)
    sum_tag.iloc[x] = newlist                        
                    
#Replace 'tag' with 'sum_tag' 
categories['tag'] = sum_tag

Merge to result dataframe.


In [41]:
#Change categories column name for merge
categories = categories.rename(columns = {'id_product':'ID'})
#merge to result
result = pd.merge(result, categories, how='left', on='ID')

Fill in 'Tags' and then drop unused column.


In [42]:
#Add to result['Categories'] & result['Tags']
result['Tags'] = result['tag']
#Drop unuse column
result = result.drop(['id_category', 'position', 'name', 'tag', 'from_ps_tag'], axis=1)

Tag still in list. We must remove [, ], ' sign.


In [43]:
#Remove [, ] and ' in 'tag' column
#Convert list to string first
#Declare nan Series use to get data in for loop
empty = np.empty(result.shape[0])
tag = pd.Series(empty)
tag[:] = np.nan

for x in range(result.shape[0]):
    text1 = str(result['Tags'].loc[x]).replace("'", '')
    text1 = text1[0].replace('[','') + text1[1:]
    text1 = text1[-1].replace(']','') + text1[:-1]
    tag.loc[x] = text1
    
result['Tags'] = tag

SKU

Woocommerce use "SKU" instead of product id. Set "SKU" equal to "id_product".


In [44]:
result['SKU'] = result['ID']

Export file

If Product is too large, it will difficult importing to Woocommerce. We separate into 10 part.


In [45]:
separate_no = round(result.shape[0]/10)

In [46]:
result_part1 = result.iloc[0:separate_no]
result_part2 = result.iloc[separate_no:2*separate_no]
result_part3 = result.iloc[2*separate_no:3*separate_no]
result_part4 = result.iloc[3*separate_no:4*separate_no]
result_part5 = result.iloc[4*separate_no:5*separate_no]
result_part6 = result.iloc[5*separate_no:6*separate_no]
result_part7 = result.iloc[6*separate_no:7*separate_no]
result_part8 = result.iloc[7*separate_no:8*separate_no]
result_part9 = result.iloc[8*separate_no:9*separate_no]
result_part10 = result.iloc[9*separate_no:result.shape[0]]

Create a short one for testing import product.


In [47]:
result_short10 = result.iloc[0:10]

Export them to .csv file with "utf-8" encoding.


In [ ]:
#save to .csv
result.to_csv('product_import_to_woo/raw_product.csv', encoding='utf-8')
result_part1.to_csv('product_import_to_woo/raw_product1.csv', encoding='utf-8')
result_part2.to_csv('product_import_to_woo/raw_product2.csv', encoding='utf-8')
result_part3.to_csv('product_import_to_woo/raw_product3.csv', encoding='utf-8')
result_part4.to_csv('product_import_to_woo/raw_product4.csv', encoding='utf-8')
result_part5.to_csv('product_import_to_woo/raw_product5.csv', encoding='utf-8')
result_part6.to_csv('product_import_to_woo/raw_product6.csv', encoding='utf-8')
result_part7.to_csv('product_import_to_woo/raw_product7.csv', encoding='utf-8')
result_part8.to_csv('product_import_to_woo/raw_product8.csv', encoding='utf-8')
result_part9.to_csv('product_import_to_woo/raw_product9.csv', encoding='utf-8')
result_part10.to_csv('product_import_to_woo/raw_product10.csv', encoding='utf-8')

result_short10.to_csv('product_import_to_woo/raw_product_short10.csv', encoding='utf-8')