This is the product migration of the book store from Prestashop to Woocommerce format. Woocommerce has a function to import/export product. We will use this advantage by export the form of Woocommerce, merge with other important data then import it back to Woocommerce.
*I can't attach the data file here because of privacy.
In [1]:
import pandas as pd
import numpy as np
Store the database in the folder "sqlprestashop" The data here are collected from 2 places. It is Prestashop backend (some information already join) and from the mysql database (other important information that aren't joined yet. I will save file from mysql with a name "ps....csv").
In [2]:
df_woocommerce = pd.read_excel('sql_prestashop/presta_product_in_woo_form.xlsx')
df_description = pd.read_csv('sql_prestashop/ps_product_lang.csv', index_col=False)
df_img_link = pd.read_csv('sql_prestashop/ps_image.csv', index_col=False)
The old Prestashop has 2 languages, Thai and English. We interest only description in Thai language ('2').
In [3]:
df_description = df_description[df_description['id_lang'] == 2]
Change the column name from 'id_product' to 'ID' in order to merge.
In [4]:
df_description = df_description.rename(columns = {'id_product':'ID'})
df_img_link = df_img_link.rename(columns = {'id_product':'ID'})
We pick only interest column from df_description. There are 'ID', 'description', 'description_short'. Merge df_woocommerce with df_description by ID. Then fill new merge column in the df_woocommerce 'Description' and 'Short description' column.
In [5]:
result = pd.merge(df_woocommerce, df_description[['ID', 'description', 'description_short']], how='left', on='ID')
#Fill new merge column to df_woocommerce 'description' and 'Short description' column.
result['Description'] = result['description']
result['Short description'] = result['description_short']
Drop the new merge column
In [6]:
result = result.drop(result.columns[37:40], axis=1)
Now we already have Woocommerce import/export format (37 column) with the 'Description' and 'Short description' filled.
Next step we create a link for image of the products. We must view the data first by count number of the pic per "ID".
In [7]:
#count pic per id
df_img_link['ID'] = df_img_link['ID'].astype('category')
count_pic_per_id = pd.DataFrame(df_img_link['ID'].value_counts())
count_pic_per_id = count_pic_per_id.sort_index(axis=0)
Now let's take a look at how Prestashop and Woocommerce link the image.
Prestashop keep the product images in "img/p". The image of products id 1234 will be save in folder 1/2/3/4.
First, we copy all images from Prestashop (copy folder "p"). Images still separate by folder, we solve this by search for ".jpg" files in folder "p" on our desktop. Copy all of images to be in the same folder. We only interest in the main image (when upload product image, Prestashop will copy original image and resize it), so we search again and delete the image with a word "default".
Upload all images to Woocommerce. Woocommerce stores them by uploaded year and date. The picture folder is at "wp-content/uploads/2017/09/"
In [8]:
#Create a link to images
id_str = df_img_link['id_image'].astype(str)
df_img_link['id_image'] = 'http://aphrosbrand.com/wp-content/uploads/2017/09/' + id_str + '.jpg'
Many product have link to more than one images. Some images is the books front page. The others are the detail page. We must put them in the right order.
In [9]:
#Each product id image has a postion. Position 1 is the main image of product.
#We sort the position column first.
df_img_link = df_img_link.sort_values('position')
#Group df_img_link by "ID".
x = df_img_link.groupby('ID').agg(lambda x: x.tolist())
#Not use the "ID" column as index.
x.reset_index(inplace=True)
Merge dataframe. Fill image links in Woocommerce form (column 'Images') and then drop unused column.
In [10]:
#merge
result = pd.merge(result, x, how='left', on='ID')
#Set new images link (id_image) to 'Images'
result['Images'] = result['id_image']
#drop col ที่เพิ่มมาทิ้ง
result = result.drop(result.columns[37:41], axis=1)
Now we have a link to images in "Images" column but they are stored as list. We must remove [, [ and ' sign.
In [11]:
#Create Pandas Siries with nan values. Use it in for loop to remove [, ] and ' sign.
empty = np.empty(result.shape[0])
images = pd.Series(empty)
images[:] = np.nan
for x in range(result.shape[0]):
text1 = str(result['Images'].loc[x]).replace("'", '')
text2 = text1.replace('[', '')
images.loc[x] = text2.replace("]", '')
result['Images'] = images
The weight of product can find in "ps_product.csv"
In [12]:
ps_product = pd.read_csv('sql_prestashop/ps_product.csv', index_col=False)
Create dataframe with column "id_product" and "weight"
In [13]:
weight = pd.DataFrame(ps_product['id_product'])
weight['weight'] = ps_product['weight']
Rename "id_product" to "ID" and merge with result dataframe
In [14]:
#Rename "id_product" to "ID"
weight = weight.rename(columns = {'id_product':'ID'})
#merge
result = pd.merge(result, weight, how='left', on='ID')
Woocommerce store weight of the product in gram. We must multiply weight from Prestashop by 1000. Fill data in "Weight (g)" then dropthe unused column.
In [15]:
result['Weight (g)'] = result['weight'] * 1000
result = result.drop(['weight'], axis=1)
Loading the data from Prestashop database involve with the category.
In [16]:
ps_category_lang = pd.read_csv('sql_prestashop/ps_category_lang.csv', index_col=False)
ps_category_product = pd.read_csv('sql_prestashop/ps_category_product.csv', index_col=False)
In ps_category_lang, we don't have the unique id_category yet. We group the id_category first by select id_category that has maximum string in "name" column.
In [17]:
sum_categories = ps_category_lang.groupby('id_category', as_index=False)['name'].max()
After review the "sum_categories". We manually select the category for our Woocommerce store. Then change it to English (at first the category is written in Thai).
In [18]:
sum_categories.loc[2,'name'] = 'Doujinshi'
sum_categories.loc[3,'name'] = 'Doujinshi - Movie'
sum_categories.loc[4,'name'] = 'Doujinshi - Manga and Anime'
sum_categories.loc[5,'name'] = 'Others'
sum_categories.loc[7,'name'] = 'Doujinshi - Original'
sum_categories.loc[8,'name'] = 'Keychain'
sum_categories.loc[9,'name'] = 'Doujinshi - Game'
sum_categories.loc[10,'name'] = 'Note Book'
sum_categories.loc[11,'name'] = 'Pin'
sum_categories.loc[17,'name'] = 'Sticker'
sum_categories.loc[10,'name'] = 'Note Book'
sum_categories.loc[18,'name'] = 'Doujinshi - Fiction'
sum_categories.loc[21,'name'] = 'others'
We merge our new category to "ps_product" to match category name with id_product.
In [19]:
#merge
categories = pd.merge(ps_category_product, sum_categories, how='left', on='id_category')
One product maybe member of more than one category. We group other column to list by "id_product".
In [20]:
categories = categories.groupby('id_product', as_index=False).agg(lambda x: x.tolist())
Create Pandas Siries with nan values. This use to acquire the data in loop.
In [21]:
empty = np.empty(categories.shape[0])
cat1 = pd.Series(empty)
cat2 = pd.Series(empty)
cat3 = pd.Series(empty)
cat1[:] = np.nan
cat2[:] = np.nan
cat3[:] = np.nan
We have to change format of category in Woocommerce. The new category will have only 2 main categories "Doujinshi" and "Others" store in "cat1". We run for loop to check the first element of the list in "id_category" column. If it is 3, 4, 5, 9, 11, 20, we set it as "Doujinshi". The others set to "Others"
Then we use "cat2" to find sub category of "Doujinshi" and "Others" by checking the first and second element of the list.
Example: the list in categories['id_category'][0] is [3,4]. the result in cat1 is "Doujinshi" and cat2 is "Movie"
In [22]:
for x in range(categories.shape[0]):
#Cat1
#3 4 5 9 11 20 is the 'id_category' for 'Doujinshi'
if categories['id_category'][x][0] in (3, 4, 5, 9, 11, 20):
cat1[x] = 'Doujinshi'
else:
cat1[x] = 'Others'
#Cat2
#Sub Doujinshi
if 4 in categories['id_category'][x][0:2]:
cat2[x] = 'Movie'
if 5 in categories['id_category'][x][0:2]:
cat2[x] = 'Manga and Anime'
if 9 in categories['id_category'][x][0:2]:
cat2[x] = 'Original'
if 11 in categories['id_category'][x][0:2]:
cat2[x] = 'Game'
if 17 in categories['id_category'][x][0:2]:
cat2[x] = 'Fiction'
if 20 in categories['id_category'][x][0:2]:
cat2[x] = 'Fiction'
#Sub Others
if 7 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 8 in categories['id_category'][x][0:2]:
cat2[x] = 'Keychain'
if 10 in categories['id_category'][x][0:2]:
cat2[x] = 'Keychain'
if 12 in categories['id_category'][x][0:2]:
cat2[x] = 'Notebook'
if 13 in categories['id_category'][x][0:2]:
cat2[x] = 'Pin'
if 14 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 15 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 16 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 18 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 19 in categories['id_category'][x][0:2]:
cat2[x] = 'Sticker'
if 21 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 22 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 23 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
if 24 in categories['id_category'][x][0:2]:
cat2[x] = 'others'
We concat "cat1" and "cat2" to be in Woocommerce category form. The final category will be like this "Doujinshi, Doujinshi > Movie".
In [23]:
cat3 = cat1 + ',' + ' ' + cat1 + ' > ' + cat2
Some "cat2" still have nan value. We fill it with "Doujinshi".
In [24]:
#Count nan value in "cat2"
cat2.isnull().sum()
#Fill nan value with "Doujinshi"
cat3[cat2.isnull()] = 'Doujinshi'
Fill "Categories" column of result dataframe with "cat3"
In [25]:
result['Categories'] = cat3
We have 2 types of tags.
1) New Woocommerce tag using the Prestashop's category. Changing category about the artist name and story name to be tag in Woocommerce.
2) The old tag from Prestashop.
In [26]:
#Create nan Series use to acquire tag
empty = np.empty(categories.shape[0])
tag = pd.Series(empty)
tag[:] = np.nan
We do the tag from 1) first.
In [27]:
#set id_category to be index for the use in for loop.
sum_categories = sum_categories.set_index('id_category')
#We collect string, so set the first element to be empty string.
tag.iloc[0] = ''
for x in range(categories.shape[0]):
tag_list = []
for y in categories['id_category'][x]:
if y > 26:
tag_list.append(sum_categories['name'].loc[y])
tag.iloc[x] = tag_list
In [28]:
#Add 'tag' column to categories.
categories['tag'] = tag
Then collect the tag from prestashop database
In [29]:
#Still have to add some old tag. Loading from Prestashop database
ps_tag = pd.read_csv('sql_prestashop/ps_tag.csv', index_col=False)
ps_product_tag = pd.read_csv('sql_prestashop/ps_product_tag.csv', index_col=False)
In [30]:
#Drop unuse column
ps_tag = ps_tag.drop('id_lang', axis=1)
Now "ps_tag" has 2 column "id_tag" and "name". We must find the correlation with each "id_product", so we merge "ps_product_tag" which has "id_product" correspond with "id_tag" to "ps_tag".
In [31]:
#merge
old_presta_tag = pd.merge(ps_product_tag, ps_tag, how='left', on='id_tag')
We use only Thai language ("2").
In [32]:
#Use only lang = 2 (Thai)
old_presta_tag = old_presta_tag[old_presta_tag['id_lang'] == 2]
Next we group the "id_product" to be unique id. The sample in other column with same "id_product" will be aggregate into list.
In [33]:
#Groupby 'id_product' to list
old_presta_tag = old_presta_tag.groupby('id_product', as_index=False).agg(lambda x: x.tolist())
Change new column name for further use.
In [34]:
#Change column name
old_presta_tag = old_presta_tag.rename(columns = {'name':'from_ps_tag'})
Then merge tag "old_presta_tag" to "categories". We have tag from 1) and 2) in the same dataframe already.
In [35]:
#merge old_presta_tag to categories by 'id_product'
#This is to aggregate 'tag' and 'old_tag' with same 'id_product'
categories = pd.merge(categories, old_presta_tag[['id_product', 'from_ps_tag']], how='left', on='id_product')
Next we will aggregate tag from 1) and 2) into the same list.
In [36]:
#Before run the loop. We must find the 'nan' value that will cause the error.
categories['tag'].isnull().sum()
categories['from_ps_tag'].isnull().sum()
Out[36]:
If there is no old tag (tag from 2)), it will cause an error. We create "have_ps_tag" which is "categories" dataframe without nan value.
In [37]:
#Avoid categories['old_tag'] nan value that cause the error
#First we extract none nan sample to be new DataFrame called 'have_ps_tag'
have_ps_tag = categories[categories['from_ps_tag'].isnull() == False]
We use python "extend" to extend the list of column "tag" with "from_ps_tag".
In [38]:
#Then we extend the tag with the one from ps_tag
#Although we run from have_ps_tag, it will change 'categories['tag']' too
#because have_ps_tag is build from categories DataFrame
for x in range(have_ps_tag['id_product'].shape[0]):
have_ps_tag['tag'].iloc[x].extend(have_ps_tag['from_ps_tag'].iloc[x])
Now we have a long list in categories['tag']. Our problem is some tag is the same name but different in lower or uppercase letter.
Creating new Series to use in for loop. Initial the first index to '' in order to prevent the error.
In [39]:
#Create nan Series use to acquire a sum_tag and lowercase tag list
empty = np.empty(categories.shape[0])
sum_tag = pd.Series(empty)
lowercase = pd.Series(empty)
sum_tag[:] = np.nan
lowercase[:] = np.nan
#Fill null string to protect 'value error'
sum_tag[0] = ''
lowercase[0] = ''
Start a loop by first changing all element in each list to be lowercase letter. Then run a loop again in lowercase letter, pick tag only appear once.
Then set the final tag to categories['tag']
In [40]:
for x in range(categories.shape[0]):
newlist = []
lowercase.iloc[x] = [q.lower() for q in categories['tag'].iloc[x]]
for i in lowercase.iloc[x]:
if i not in newlist:
newlist.append(i)
sum_tag.iloc[x] = newlist
#Replace 'tag' with 'sum_tag'
categories['tag'] = sum_tag
Merge to result dataframe.
In [41]:
#Change categories column name for merge
categories = categories.rename(columns = {'id_product':'ID'})
#merge to result
result = pd.merge(result, categories, how='left', on='ID')
Fill in 'Tags' and then drop unused column.
In [42]:
#Add to result['Categories'] & result['Tags']
result['Tags'] = result['tag']
#Drop unuse column
result = result.drop(['id_category', 'position', 'name', 'tag', 'from_ps_tag'], axis=1)
Tag still in list. We must remove [, ], ' sign.
In [43]:
#Remove [, ] and ' in 'tag' column
#Convert list to string first
#Declare nan Series use to get data in for loop
empty = np.empty(result.shape[0])
tag = pd.Series(empty)
tag[:] = np.nan
for x in range(result.shape[0]):
text1 = str(result['Tags'].loc[x]).replace("'", '')
text1 = text1[0].replace('[','') + text1[1:]
text1 = text1[-1].replace(']','') + text1[:-1]
tag.loc[x] = text1
result['Tags'] = tag
Woocommerce use "SKU" instead of product id. Set "SKU" equal to "id_product".
In [44]:
result['SKU'] = result['ID']
If Product is too large, it will difficult importing to Woocommerce. We separate into 10 part.
In [45]:
separate_no = round(result.shape[0]/10)
In [46]:
result_part1 = result.iloc[0:separate_no]
result_part2 = result.iloc[separate_no:2*separate_no]
result_part3 = result.iloc[2*separate_no:3*separate_no]
result_part4 = result.iloc[3*separate_no:4*separate_no]
result_part5 = result.iloc[4*separate_no:5*separate_no]
result_part6 = result.iloc[5*separate_no:6*separate_no]
result_part7 = result.iloc[6*separate_no:7*separate_no]
result_part8 = result.iloc[7*separate_no:8*separate_no]
result_part9 = result.iloc[8*separate_no:9*separate_no]
result_part10 = result.iloc[9*separate_no:result.shape[0]]
Create a short one for testing import product.
In [47]:
result_short10 = result.iloc[0:10]
Export them to .csv file with "utf-8" encoding.
In [ ]:
#save to .csv
result.to_csv('product_import_to_woo/raw_product.csv', encoding='utf-8')
result_part1.to_csv('product_import_to_woo/raw_product1.csv', encoding='utf-8')
result_part2.to_csv('product_import_to_woo/raw_product2.csv', encoding='utf-8')
result_part3.to_csv('product_import_to_woo/raw_product3.csv', encoding='utf-8')
result_part4.to_csv('product_import_to_woo/raw_product4.csv', encoding='utf-8')
result_part5.to_csv('product_import_to_woo/raw_product5.csv', encoding='utf-8')
result_part6.to_csv('product_import_to_woo/raw_product6.csv', encoding='utf-8')
result_part7.to_csv('product_import_to_woo/raw_product7.csv', encoding='utf-8')
result_part8.to_csv('product_import_to_woo/raw_product8.csv', encoding='utf-8')
result_part9.to_csv('product_import_to_woo/raw_product9.csv', encoding='utf-8')
result_part10.to_csv('product_import_to_woo/raw_product10.csv', encoding='utf-8')
result_short10.to_csv('product_import_to_woo/raw_product_short10.csv', encoding='utf-8')