2016.12.11 final version

In [6]:
from preprocess import util, preprocess
import pandas as pd
  1. Load the data and join it with the product metadata (meta)
  2. Run morphological analysis (POS tagging) sentence by sentence
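
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code inside `preprocess.Preprocess`, whose internals are not shown in this notebook. The toy rows, the `split_sentences` helper, and the regex-based splitter are all assumptions made for the example; only the column names (`asin`, `reviewText`, `title`, `brand`, `reviewSentence`) are taken from the `data.head()` output later in the notebook.

```python
import re
import pandas as pd

# Hypothetical toy data; column names follow the data.head() output in this notebook.
reviews = pd.DataFrame({
    "asin": ["A1", "A2"],
    "reviewText": ["Great sound. Battery lasts long!", "Stopped working after a week."],
})
meta = pd.DataFrame({
    "asin": ["A1", "A2"],
    "title": ["Speaker", "Remote"],
    "brand": ["BrandX", "BrandY"],
})

# Step 1: attach product metadata to each review via the shared product id.
joined = reviews.merge(meta, on="asin", how="left")

def split_sentences(text):
    """Naive splitter on ., ! or ? used as a stand-in for a real sentence tokenizer."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Step 2 (first half): one list of sentences per review, ready to be tagged.
joined["reviewSentence"] = joined["reviewText"].apply(split_sentences)
```

Each sentence would then be passed to a POS tagger. The `(word, tag)` pairs in the saved data use Penn Treebank-style tags such as `NNP` and `PRP`. With NLTK installed and its tokenizer and tagger models downloaded, `nltk.pos_tag(nltk.word_tokenize(sentence))` returns pairs in that shape, though whether the original pipeline uses NLTK is an assumption.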

In [14]:
data_path = "E:/dataset/Amazon/"
save_path = "E:/dataset/MasterThesis/FINAL/preprocess_data/"
category_list = ["Electronics","Beauty","Clothing_Shoes_and_Jewelry"]

In [4]:
%%time
pre = preprocess.Preprocess(data_path, save_path, category_list)
pre.preprocess()


Start extract sentences of Electronics
Completed extract sentences of Electronics, time : 488.42
Start extract samples from data
Completed extract samples of Electronics, time : 3.74
check shape ----------
Electronics shape after sampling : 250000, 10
Start pos tag of sentences in Electronics
Completed pos-tagging and save of Electronics, time : 1799.74
Start preprocessing in Electronics
Completed preprocess and save of Electronics, time : 88.07
Start extract sentences of Beauty
Completed extract sentences of Beauty, time : 202.66
Start extract samples from data
Completed extract samples of Beauty, time : 7.05
check shape ----------
Beauty shape after sampling : 202181, 10
Start pos tag of sentences in Beauty
Completed pos-tagging and save of Beauty, time : 792.45
Start preprocessing in Beauty
Completed preprocess and save of Beauty, time : 39.41
Start extract sentences of Clothing_Shoes_and_Jewelry
Completed extract sentences of Clothing_Shoes_and_Jewelry, time : 124.45
Start extract samples from data
Completed extract samples of Clothing_Shoes_and_Jewelry, time : 3.58
check shape ----------
Clothing_Shoes_and_Jewelry shape after sampling : 178026, 10
Start pos tag of sentences in Clothing_Shoes_and_Jewelry
Completed pos-tagging and save of Clothing_Shoes_and_Jewelry, time : 676.39
Start preprocessing in Clothing_Shoes_and_Jewelry
Completed preprocess and save of Clothing_Shoes_and_Jewelry, time : 31.93
Wall time: 1h 11min 37s
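
To see where the run time goes, the "Completed ..." lines of the log above can be summed per stage. The sketch below parses only the four Electronics lines, copied verbatim from the log; the regular expression and the `stage_times` helper are illustrative names, not part of the `preprocess` module.

```python
import re
from collections import defaultdict

# Log lines copied from the run above (Electronics only, to keep the sketch short).
log = """\
Completed extract sentences of Electronics, time : 488.42
Completed extract samples of Electronics, time : 3.74
Completed pos-tagging and save of Electronics, time : 1799.74
Completed preprocess and save of Electronics, time : 88.07
"""

# Captures the stage name, the category, and the elapsed seconds.
pattern = re.compile(
    r"^Completed (?P<stage>.+?) of (?P<category>\S+), time : (?P<secs>[\d.]+)$"
)

def stage_times(text):
    """Sum the logged seconds per stage across all matching lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = pattern.match(line.strip())
        if m:
            totals[m.group("stage")] += float(m.group("secs"))
    return dict(totals)

totals = stage_times(log)
# Fraction of the logged time spent in POS tagging.
share = totals["pos-tagging and save"] / sum(totals.values())
```

For these lines, POS tagging accounts for roughly three quarters of the logged time, so it is the first stage to look at if the run needs to be faster.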

In [8]:
category = "Electronics"
data = pd.read_csv(save_path + "preprocess_complete_" + category + ".csv")

In [15]:
data.head()


Out[15]:
| | reviewTime | asin | reviewerID | overall | helpful | reviewText | title | brand | reviewSentence | sent_length | reviewSentence_tagged | preprocessed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-07-21 | B00CM0XHNS | A372YX80GGM7DR | 5.0 | 576 | Ok, so I didn't buy this on Amazon, as I didn'... | Ultimate Ears BOOM Wireless Bluetooth Speaker ... | Logitech | ["Ok, so I didn't buy this on Amazon, as I did... | 58 | [[('Ok', 'NNP'), (',', ','), ('so', 'IN'), ('I... | [['ok', 'so', 'i', 'did', "n't", 'buy', 'this'... |
| 1 | 2013-05-19 | B00BQ5RY1G | A1BG2Z071TYO7P | 2.0 | 522 | I received a Harmony Ultimate from Logitech be... | Logitech Harmony Ultimate Remote with Customiz... | Logitech | ['I received a Harmony Ultimate from Logitech ... | 27 | [[('I', 'PRP'), ('received', 'VBD'), ('a', 'DT... | [['i', 'received', 'a', 'harmony', 'ultimate',... |
| 2 | 2013-12-16 | B00EZ9XG62 | AELAESM03451 | 1.0 | 290 | This review is for the iPad Air keyboard. I ha... | Logitech Ultrathin Keyboard Cover for iPad Air... | Logitech | ['This review is for the iPad Air keyboard.', ... | 23 | [[('This', 'DT'), ('review', 'NN'), ('is', 'VB... | [['this', 'review', 'is', 'for', 'the', 'ipad'... |
| 3 | 2013-01-21 | B0099SMFVQ | A36CMGR5ELUM34 | 5.0 | 283 | Design: Very well put together. Elegant and th... | Logitech Bluetooth Illuminated Keyboard K810 f... | Logitech | ['Design: Very well put together.', 'Elegant a... | 28 | [[('Design', 'NN'), (':', ':'), ('Very', 'RB')... | [['design', 'very', 'well', 'put', 'together']... |
| 4 | 2013-07-29 | B00CM0XHNS | A9TETE58A7JR3 | 3.0 | 260 | So, I've been testing a few bluetooth speakers... | Ultimate Ears BOOM Wireless Bluetooth Speaker ... | Logitech | ["So, I've been testing a few bluetooth speake... | 57 | [[('So', 'RB'), (',', ','), ('I', 'PRP'), ("'v... | [['so', 'i', 'been', 'testing', 'a', 'few', 'b... |
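
One practical point when reloading this file: `pd.read_csv` returns list-like columns such as `reviewSentence`, `reviewSentence_tagged`, and `preprocessed` as plain strings, not Python lists. Assuming the CSV stores the usual Python `repr` of those lists (which the quoting in the output above suggests), they can be converted back with `ast.literal_eval`. The cell values below are shortened, made-up strings in the same shape, and `parse_cell` is a hypothetical helper name.

```python
import ast

# Cell values as they would come back from pd.read_csv: strings, not lists.
# Shortened, made-up examples in the same shape as the columns above.
tagged_cell = "[[('Ok', 'NNP'), (',', ','), ('so', 'IN')]]"
tokens_cell = "[['ok', 'so', 'i', 'did', \"n't\", 'buy', 'this']]"

def parse_cell(cell):
    """Turn the string form of a Python literal back into nested lists/tuples."""
    return ast.literal_eval(cell)

tagged = parse_cell(tagged_cell)   # list of sentences, each a list of (word, tag) tuples
tokens = parse_cell(tokens_cell)   # list of sentences, each a list of lowercase tokens
```

On the loaded frame, the same conversion would be applied per column, for example `data["preprocessed"] = data["preprocessed"].apply(ast.literal_eval)`. `ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for data read from disk.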
