Customer migration from Prestashop to Woocommerce part 1 : Gathering raw customer information from Prestashop

This is the product migration of the book store from Prestashop to Woocommerce format. Unlike the product, users (customers in Woocommerce are called users) don't have an import function. We must import the data via sql. Woocommerce users database structure consists of 2 sql table "wp_users" and "wp_usermeta".

1) "wp_users" is about the name, username and password of the users 2) "wp_usermeta" is about the detail of users such as billing address, group of users etc.

Our objective here is to collect the data from Prestashop, manipulate them in the form of Woocommerce.



In [1]:

    
import pandas as pd
import numpy as np
import csv

Loading the data from Prestashop backend and sql. Note that Prestashop backend can generate some table for you by a default function. We copy and paste into excel (the file will have .xlsx extension). For our comfortable, we use this table as a starter then merge with the other missing information from sql.



In [2]:

    
#Read data from excel
#data from prestashop backend
customer_from_backend = pd.read_excel('sql_prestashop/customer_detail.xlsx')
address_from_backend = pd.read_excel('sql_prestashop/customer_address.xlsx')
#data from mysql database
ps_address = pd.read_csv('sql_prestashop/ps_address.csv', index_col=False)
ps_customer_group = pd.read_csv('sql_prestashop/ps_customer_group.csv')

Select only interest column.



In [3]:

    
#Select only use column
ps_address = ps_address[['id_address', 'id_customer', 'other', 'phone', 'phone_mobile', 'vat_number', 'date_add', 'date_upd', 'active', 'deleted']]



In [4]:

    
address_from_backend = address_from_backend.drop(['firstname','lastname'], axis=1)

Sort id_customer ascending.



In [5]:

    
#customer_from_backend is not ascending
customer_from_backend = customer_from_backend.sort_values('id_customer')

In one customer may has many id_address. This depend on how many times they change the address on frontend. We create "link_id_and_address" dataframe to link "id_customer" and "id_address" in order to find the unique id_customer with max id_address first (max id is the lastest address available).



In [6]:

    
link_id_and_address = pd.DataFrame()
link_id_and_address['id_customer'] = ps_address['id_customer']
link_id_and_address['id_address'] = ps_address['id_address']
#sort "id_customer".
link_id_and_address = link_id_and_address.sort_values('id_customer')

We group "link_id_and_address" by "id_customer", so we have the unique "id_customer".



In [7]:

    
#Assume max id_address to be a lastest address update, so we use groupby to 
#find lastest id_address
link_id_and_address = link_id_and_address.groupby('id_customer', as_index=False).max()

Then we merge all dataframe into "information" dataframe



In [8]:

    
#merge
information = pd.merge(link_id_and_address, customer_from_backend, how='left', on='id_customer')
information = pd.merge(information, address_from_backend, how='left', on='id_address')
information = pd.merge(information, ps_address, how='left', on='id_address')

Change name of column in order to merge.



In [9]:

    
#Merge column with the same name will generate _x or _y follow the column name
#So rename it first for further use.
information = information.rename(columns = {'id_customer_x':'id_customer'})

Prestashop stores full name country. We must change to iso country format to use in Woocommerce. We view how many country first.



In [10]:

    
#Changing country in to woocommerce format
count_country = information['country'].value_counts()

Load iso country reference in to a dic.



In [11]:

    
#Load iso country reference data from Wikipedia
dic = {}
with open("sql_prestashop/wikipedia-iso-country-codes.csv") as f:
    file= csv.DictReader(f, delimiter=',')
    for line in file:
        dic[line['English short name lower case']] = line['Alpha-2 code']

If country column has nan value, it will cause an error. We view the country column first. Are there any nan value? Then temporary fill it with '-'.



In [12]:

    
#country column has nan value. change it to other country, avoid the error.
#count null country
information['country'].isnull().sum()









    Out[12]:





12



In [13]:

    
#temporary fill it with '-'
information['country'] = information['country'].fillna('-')

Changing some country names don't match with name in country dic.



In [14]:

    
#Add dic for '-'
dic['-'] = '-'
#Changing some dics that don't match the iso code
dic['Taiwan'] = dic['Taiwan, Province of China']
dic['South Korea'] = dic["Korea, Republic of"]
dic['HongKong'] = dic['Hong Kong']
dic['Vietnam'] = dic['Viet Nam']
dic['Korea, Dem. Republic of'] = dic["Korea, Democratic People's Republic of"]
dic['Macau'] = dic["Macao"]
dic['Brunei'] = dic["Brunei Darussalam"]

Changing to iso country code. Fill '-' back with nan value. Then check for nan value again. The number of nan value must equal to the previous check.



In [15]:

    
#Change to iso code
information['country'] = [dic[x] for x in information['country']]
#Replace '-' back to ba nan value
information['country'] = information['country'].replace('-',np.NaN)
#Check number of nan value again
information['country'].isnull().sum()









    Out[15]:





12

In new Woocommerce store, we have 3 groups of user. That is 18+, vendor and customer. The old Prestashop customers we have 7 group. The groups we want to keep are group 6 (vendors), 4 (18+) and the rest is 3 (customers).

For our comfortable running the for loop, we change the max group number (7) to 0.



In [ ]:

    
#Clean data for ps_customer group
#Group 6 is first priority then 4 
#First change group 7 to 0 in order to make group 6 is the max value
for x in range(ps_customer_group.shape[0]):
    if ps_customer_group['id_group'].iloc[x] == 7:
        ps_customer_group['id_group'].iloc[x] = 0

Now the maximum group number is 6. We group customer by "id_customer" by max group number.



In [16]:

    
#Find max value of group id       
group = ps_customer_group.groupby('id_customer', as_index=False).max()

Still have group number lower than 3. We set them to be 3.



In [17]:

    
#Still have group 2. Change it to group 3
for x in range(group.shape[0]):
    if group['id_group'].iloc[x] == 2:
        group['id_group'].iloc[x] = 3

Now we only have customer in 3 groups

1) Group 6 Vendors

2) Group 4 18+

3) Group 3 Customers

Then we merge "group" to infromation dataframe.



In [18]:

    
#merge again
information = pd.merge(information, group,\
                       how='left', on='id_customer')

Check for nan value in "id_group" and fill it with '3'.



In [19]:

    
#Check for nan value
information['id_group'].isnull().sum()
#Fill nan with 3
information['id_group'] = information['id_group'].fillna(3)



In [20]:

    
#save to .csv
information.to_csv('customer_import_to_woo/raw_information.csv', encoding='utf-8')









    



---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-20-7202c7eea910> in <module>()
      1 #save to .csv
----> 2 information.to_csv('customer_import_to_woo/raw_information.csv', encoding='utf-8')

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   1401                                      doublequote=doublequote,
   1402                                      escapechar=escapechar, decimal=decimal)
-> 1403         formatter.save()
   1404 
   1405         if path_or_buf is None:

D:\Anaconda3\lib\site-packages\pandas\io\formats\format.py in save(self)
   1575             f, handles = _get_handle(self.path_or_buf, self.mode,
   1576                                      encoding=self.encoding,
-> 1577                                      compression=self.compression)
   1578             close = True
   1579 

D:\Anaconda3\lib\site-packages\pandas\io\common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
    380         elif encoding:
    381             # Python 3 and encoding
--> 382             f = open(path_or_buf, mode, encoding=encoding)
    383         elif is_text:
    384             # Python 3 and no explicit encoding

FileNotFoundError: [Errno 2] No such file or directory: 'customer_import_to_woo/raw_information.csv'