In [ ]:
%matplotlib inline
import datetime
from collections import Counter

import pandas as pd

from sentiment_classification import run_network
from utils import clean_text
from utils import calc_ratios

In [ ]:
from IPython.core.display import HTML
css = open('table.css').read() + open('notebook.css').read()
HTML('<style>{}</style>'.format(css))

In [ ]:
reviews = pd.read_csv("reviews.csv", encoding="utf-8")
reviews.head(5)

In [ ]:
ratings = pd.read_csv("ratings.csv", encoding="utf-8")
ratings.head(5)

How many reviews are there?


In [ ]:

Which reviewer has written the most reviews?
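One way to approach this is to count rows per reviewer. The sketch below uses a small made-up dataframe, and the `userID` column name is an assumption borrowed from the join step later in this notebook; the real `reviews.csv` may name it differently.

```python
import pandas as pd

# Toy stand-in for reviews.csv; "userID" is an assumed column name.
toy_reviews = pd.DataFrame({
    "userID": ["u1", "u2", "u1", "u3", "u1", "u2"],
    "reviewText": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() tallies how many rows each reviewer has.
review_counts = toy_reviews["userID"].value_counts()

# idxmax() gives the reviewer with the largest tally, max() the tally itself.
top_user = review_counts.idxmax()
top_count = int(review_counts.max())
```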


In [ ]:

Convert the unixReviewTime field to a date and add it to the dataframe as a column named date
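A minimal sketch of the conversion, assuming `unixReviewTime` holds whole seconds since the Unix epoch. The two timestamps are made-up values chosen to land on 1 January 2000 and 1 January 2011.

```python
import pandas as pd

# Toy data: seconds since 1970-01-01 UTC.
toy_reviews = pd.DataFrame({"unixReviewTime": [946684800, 1293840000]})

# unit="s" tells pandas the integers are seconds, not nanoseconds (the default).
toy_reviews["date"] = pd.to_datetime(toy_reviews["unixReviewTime"], unit="s")
```

Once the column is a real datetime, the `.dt` accessor (for example `toy_reviews["date"].dt.year`) makes the year-based questions below straightforward.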


In [ ]:

What is the very first reviewText?
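A possible approach is to locate the row with the smallest timestamp and read fields from it. The data is made up, and `reviewerName` is an assumed column name.

```python
import pandas as pd

toy_reviews = pd.DataFrame({
    "reviewerName": ["Ann", "Bob", "Cy"],  # assumed column name
    "reviewText": ["fun game", "too short", "great controls"],
    "unixReviewTime": [1293840000, 946684800, 1100000000],
})

# idxmin() returns the index label of the earliest timestamp.
first_idx = toy_reviews["unixReviewTime"].idxmin()

# Read any field of that row with .loc.
first_text = toy_reviews.loc[first_idx, "reviewText"]
first_reviewer = toy_reviews.loc[first_idx, "reviewerName"]
```

Swapping `idxmin()` for `idxmax()` finds the most recent review instead.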


In [ ]:

What is the name of the person who wrote the first review?


In [ ]:

What is the id of the video game that was reviewed last?


In [ ]:

Make the summaries lowercase and remove punctuation using the clean_text function
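The implementation of `clean_text` in `utils` is not shown here, so the function below is a hypothetical stand-in that illustrates the kind of transformation described: lowercasing and stripping ASCII punctuation.

```python
import string

def clean_text_sketch(text):
    """Hypothetical stand-in for utils.clean_text (not the real implementation)."""
    # Build a translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)

cleaned = [clean_text_sketch(s) for s in ["Great Game!!!", "Don't buy."]]
```

If the real `clean_text` also takes a single string, something like `reviews["summary"].apply(clean_text)` would apply it to the whole column.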


In [ ]:

What is the most frequent summary across all the data?


In [ ]:

What is the most frequent summary in 2011?
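A small sketch of both summary questions, assuming a datetime `date` column already exists. The rows are made up for illustration.

```python
import pandas as pd

toy_reviews = pd.DataFrame({
    "summary": ["great", "great", "meh", "meh", "meh"],
    "date": pd.to_datetime(
        ["2011-03-01", "2011-06-15", "2011-07-04", "2012-01-01", "2012-02-02"]
    ),
})

# Most frequent summary over every row.
top_overall = toy_reviews["summary"].value_counts().idxmax()

# Restrict to 2011 with a boolean mask on the year, then count again.
in_2011 = toy_reviews[toy_reviews["date"].dt.year == 2011]
top_2011 = in_2011["summary"].value_counts().idxmax()
```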


In [ ]:

What is the most frequent word in the summaries from 2000?
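Word-level counts can be gathered with the `Counter` imported at the top of the notebook. This sketch splits on whitespace, which assumes the text has already been cleaned; the data is made up.

```python
from collections import Counter

import pandas as pd

toy = pd.DataFrame({
    "summary": ["best game ever", "worst game", "fun game", "best ever"],
    "date": pd.to_datetime(
        ["2000-02-01", "2000-05-05", "2000-11-30", "2001-01-01"]
    ),
})

# Feed the words of every summary from the year 2000 into one Counter.
word_counts = Counter()
for summary in toy.loc[toy["date"].dt.year == 2000, "summary"]:
    word_counts.update(summary.split())

# most_common(1) returns a one-element list of (word, count) pairs.
top_word, top_n = word_counts.most_common(1)[0]
```

The same loop works for `reviewText`, and changing the mask to `toy["date"].dt.year < 2000` covers the "before 2000" variant.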


In [ ]:

What is the most frequent word in the reviewTexts from 2000?


In [ ]:

What is the most frequent word in the reviewTexts from before 2000?


In [ ]:

Filter out reviews that are older than 5 years
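The prompt does not say what "older than 5 years" is measured against. The sketch below assumes a fixed reference date so the result is reproducible; `pd.Timestamp.now()` would measure from today instead. The dates are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2008-01-01", "2012-06-01", "2014-03-15"]),
})

# Assumed reference point; replace with pd.Timestamp.now() to use today's date.
reference = pd.Timestamp("2014-07-01")

# DateOffset handles calendar years correctly, including leap years.
cutoff = reference - pd.DateOffset(years=5)

# Keep only reviews dated on or after the cutoff.
recent = toy[toy["date"] >= cutoff]
```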


In [ ]:

Join the reviews and ratings dataframes on productID and userID
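A sketch of the join using `DataFrame.merge` with both keys. The rows and the `rating` column name are made up, and an inner join is an assumption; `how="left"` would keep reviews that have no matching rating.

```python
import pandas as pd

toy_reviews = pd.DataFrame({
    "productID": ["p1", "p1", "p2"],
    "userID": ["u1", "u2", "u1"],
    "reviewText": ["fun", "boring", "great"],
})
toy_ratings = pd.DataFrame({
    "productID": ["p1", "p1", "p2"],
    "userID": ["u1", "u2", "u1"],
    "rating": [5, 2, 4],  # assumed column name
})

# Passing a list to on= matches rows only when both keys agree.
review_ratings = toy_reviews.merge(
    toy_ratings, on=["productID", "userID"], how="inner"
)
```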


In [ ]:

Create a column named binary_ratings; for each row, fill in POSITIVE if the rating is greater than 3, otherwise NEGATIVE
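One vectorized way to build the label column is `numpy.where`. The `rating` column name and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"rating": [5, 3, 4, 1]})  # assumed column name

# np.where picks the first label where the condition holds, the second otherwise.
# A rating of exactly 3 is not greater than 3, so it becomes NEGATIVE.
toy["binary_ratings"] = np.where(toy["rating"] > 3, "POSITIVE", "NEGATIVE")
```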


In [ ]:

Find the most frequent words in reviews with POSITIVE and NEGATIVE ratings
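A possible sketch keeps one `Counter` per label and updates it row by row. The text is made up and split on whitespace, which assumes it was cleaned beforehand.

```python
from collections import Counter

import pandas as pd

toy = pd.DataFrame({
    "reviewText": [
        "great fun great",
        "boring and broken",
        "great story",
        "broken controls",
    ],
    "binary_ratings": ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"],
})

# One word counter per label.
counts = {"POSITIVE": Counter(), "NEGATIVE": Counter()}
for text, label in zip(toy["reviewText"], toy["binary_ratings"]):
    counts[label].update(text.split())

top_positive = counts["POSITIVE"].most_common(1)[0]
top_negative = counts["NEGATIVE"].most_common(1)[0]
```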


In [ ]:

Calculate the ratios of counter values
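What `calc_ratios` from `utils` actually computes is not shown in this notebook. As an illustration only, the sketch below uses one common choice, a smoothed log ratio of positive to negative counts, which may differ from the real helper. The counts and the function name `smoothed_log_ratio` are made up.

```python
import math
from collections import Counter

# Made-up word counts for each label.
positive_counts = Counter({"great": 30, "broken": 2, "game": 50})
negative_counts = Counter({"great": 4, "broken": 19, "game": 49})

def smoothed_log_ratio(word):
    """Hypothetical helper, not utils.calc_ratios."""
    # Adding 1 to both counts avoids division by zero and log(0).
    ratio = (positive_counts[word] + 1) / (negative_counts[word] + 1)
    return math.log(ratio)

log_ratios = {w: smoothed_log_ratio(w) for w in positive_counts}
```

Under this scheme, words that lean positive get values above zero, words that lean negative get values below zero, and words used about equally in both sit near zero.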


In [ ]:


In [ ]:

Run the network on the review_ratings dataframe, passing the relevant fields


In [ ]:

Predict whether the word "good" is positive or negative


In [ ]:

Predict whether the word "Bad" is positive or negative


In [ ]:

Predict "unplayable"


In [ ]:

Predict "excelente"


In [ ]:

Predict "excel"


In [ ]:

Predict "playable"


In [ ]:


In [ ]:


In [ ]: