The following script extracts the more helpful reviews from the Swiss reviews and saves them locally. It also saves a list of the ASIN identifiers of the extracted reviews.
The list of ASIN identifiers will later be used to find the average review rating of the respective products.
In [1]:
    
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import yaml
    
Load the Swiss reviews.
In [91]:
    
with open("data/swiss-reviews.txt", 'r') as fp:
    swiss_rev = fp.readlines()
    
In [92]:
    
len(swiss_rev)
    
    Out[92]:
In [93]:
    
swiss_rev[2]
    
    Out[93]:
The filter_helpful function keeps only the reviews that received at least 5 votes in the helpfulness field (the second entry of 'helpful'). This amounts to a subset of around 23,000 reviews. A smaller subset of around 10,000 reviews was also obtained by keeping only reviews with at least 10 votes. The smaller subset contains better-quality reviews, at the cost of reduced size.
In [94]:
    
def filter_helpful(line):
    l = line.rstrip('\n')
    # safe_load: yaml.load without a Loader argument fails on PyYAML 6 and later
    l = yaml.safe_load(l)
    if 'helpful' in l:
        # Keep the review if the second entry of 'helpful' is at least 5
        return l['helpful'][1] >= 5
    else:
        print("Review does not have helpful score key: "+line)
        return False
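
Both subsets described above differ only in the vote threshold, so the filter can take it as a parameter. The following is a minimal sketch, not the notebook's own code: the name `has_min_votes`, the `min_votes` parameter, and the sample lines are hypothetical, and it swaps in `ast.literal_eval` for YAML parsing on the assumption that each line is a Python-style dict literal.

```python
import ast

def has_min_votes(line, min_votes=5):
    """Return True if the review on this line received at least min_votes votes.

    Assumes each line is a Python-style dict literal and that the second
    entry of 'helpful' holds the vote count tested by filter_helpful.
    """
    review = ast.literal_eval(line.strip())
    votes = review.get('helpful')
    return votes is not None and votes[1] >= min_votes

# Hypothetical sample lines, made up for illustration only.
sample = [
    "{'asin': 'A1', 'helpful': [4, 6]}",
    "{'asin': 'A2', 'helpful': [1, 2]}",
    "{'asin': 'A3', 'helpful': [9, 12]}",
    "{'asin': 'A4'}",
]

bigger = [s for s in sample if has_min_votes(s, min_votes=5)]
smaller = [s for s in sample if has_min_votes(s, min_votes=10)]
print(len(bigger), len(smaller))
```

With a single parameterized function, the 5-vote and 10-vote subsets can be produced from the same pass over the data.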
    
Apply filter_helpful to each Swiss product review.
In [95]:
    
def get_helpful(data):
    res = []
    # i counts every review scanned, len(res) counts the reviews kept
    for i, line in enumerate(data, start=1):
        if filter_helpful(line):
            res.append(line)
            # Report progress every 1000 kept reviews
            if len(res) % 1000 == 0:
                print("Count "+str(len(res))+" / "+str(i))
    return res
    
In [96]:
    
swiss_reviews_helpful = get_helpful(swiss_rev)
    
    
In [97]:
    
len(swiss_reviews_helpful)
    
    Out[97]:
Save the subset of helpful Swiss product reviews.
In [99]:
    
# The with block closes the file even if a write fails
with open('data/swiss-reviews-helpful-correct-bigger.txt', 'w') as write_file:
    for item in swiss_reviews_helpful:
        write_file.write(item)
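
If the review file is too large to hold in memory, filtering and saving can be combined into one streaming pass. This is only a sketch under that assumption: `copy_matching_lines` and the throwaway file names are hypothetical, and the `startswith` predicate stands in for `filter_helpful`.

```python
import os
import tempfile

def copy_matching_lines(src_path, dst_path, keep):
    """Copy the lines of src_path for which keep(line) is true; return the count."""
    kept = 0
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:  # reads one line at a time
            if keep(line):
                dst.write(line)
                kept += 1
    return kept

# Demo on throwaway files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'w') as fp:
    fp.write("keep 1\nskip 2\nkeep 3\n")

n_kept = copy_matching_lines(src_path, dst_path,
                             lambda line: line.startswith('keep'))
with open(dst_path, 'r') as fp:
    kept_lines = fp.readlines()
```

Passing `filter_helpful` as `keep` would give the same output file as the cells above without building the intermediate list.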
    
In [2]:
    
with open('data/swiss-reviews-helpful-correct-bigger.txt', 'r') as fp:
    swiss_reviews_helpful = fp.readlines()
    
The following function simply extracts the 'asin' from each helpful review. Repeated ASINs are of no consequence, as the list is only meant as a check.
In [3]:
    
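def filter_asin(line):
    l = line.rstrip('\n')
    # safe_load: yaml.load without a Loader argument fails on PyYAML 6 and later
    l = yaml.safe_load(l)
    # Return the ASIN, or an empty string if the review has none
    if 'asin' in l:
        return l['asin']
    else:
        return ''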
    
In [4]:
    
helpful_asins = []
counter = 1
for item in swiss_reviews_helpful:
    if(counter%500 == 0):
        print(counter)
    counter += 1
    x = filter_asin(item)
    if(len(x) > 0):
        helpful_asins.append(x)
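
Repeated ASINs are harmless here, but if a later step needs each product only once, the list can be deduplicated while keeping first-seen order. A small sketch with made-up identifiers (`helpful_asins_demo` is hypothetical and not part of the notebook):

```python
# Hypothetical ASIN list with repetitions, for illustration only.
helpful_asins_demo = ['B001', 'B002', 'B001', 'B003', 'B002']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats without reordering the list.
unique_asins = list(dict.fromkeys(helpful_asins_demo))
print(unique_asins)
```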
    
    
Save the list of asins.
In [104]:
    
import pickle
with open('data/helpful_asins_bigger.pickle', 'wb') as fp:
    pickle.dump(helpful_asins, fp)
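
A later notebook can read the list back with `pickle.load`. The sketch below does a round trip on a temporary file with made-up identifiers, so the names `asins_demo` and `demo_path` are hypothetical; for the real data the path would be the one used above.

```python
import os
import pickle
import tempfile

# Hypothetical ASIN list, for illustration only.
asins_demo = ['B001', 'B002', 'B003']

demo_path = os.path.join(tempfile.mkdtemp(), 'asins_demo.pickle')

# Write in binary mode, as in the cell above.
with open(demo_path, 'wb') as fp:
    pickle.dump(asins_demo, fp)

# Read back in binary mode; the result is an equal Python list.
with open(demo_path, 'rb') as fp:
    restored = pickle.load(fp)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.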