Ch14 Figure1


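%matplotlib inline
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('csv_output/ch14_fig2.csv')
sns.set_style("white")

# Treat rating 0 as "no rating" and drop those items before plotting
f, ax = plt.subplots(1, figsize=(4, 4))
sns.regplot(x='rating', y='sales', data=df[df.rating > 0], ax=ax);
ax.set_title('Correlation between sales and rating\nexcluding no ratings');
# Make sure the output folder exists, then save the figure as SVG
os.makedirs('svg_output', exist_ok=True)
f.savefig('svg_output/ch14_fig2.svg', format='svg')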
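The regression line drawn by `sns.regplot` can be backed up with a number. The sketch below computes the Pearson correlation between rating and sales twice, once over all items and once over rated items only. It assumes the DataFrame has the `rating` and `sales` columns used above and that a rating of 0 means "no rating"; the helper name `rating_sales_corr` is hypothetical, and the toy data is made up for illustration, not taken from the chapter's dataset.

```python
import pandas as pd


def rating_sales_corr(df: pd.DataFrame) -> dict:
    """Pearson correlation between rating and sales, with and without unrated items.

    Assumes rating == 0 means the item has no rating (hypothetical helper).
    """
    rated = df[df["rating"] > 0]
    return {
        # Unrated items (rating 0, near-zero sales) are included here
        "all_items": df["rating"].corr(df["sales"]),
        # Only items that actually received a rating
        "rated_only": rated["rating"].corr(rated["sales"]),
    }


# Toy data for illustration only: unrated items have near-zero sales,
# while among rated items sales do not move with the rating
toy = pd.DataFrame({
    "rating": [0, 0, 0, 3, 4, 5, 3, 4, 5],
    "sales":  [1, 2, 1, 40, 50, 40, 50, 40, 50],
})
print(rating_sales_corr(toy))
```

On this toy data, `all_items` comes out strongly positive while `rated_only` is close to 0: the apparent correlation is driven entirely by the unrated, low-sales items.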


We assume a rating of 0 means the item has no rating and that all other values are real ratings. Most items whose total sales dollars are close to 0 have no rating. The rest, however, do not indicate that higher ratings yield higher sales. In fact, the causation may run the other way: the more an item sells, the more people have actually bought it, and so the more buyers are willing to leave feedback on the website. At the bottom, if we ignore the items with no rating, there is almost no correlation at all.
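The first observation, that most items with near-zero sales have no rating, can be checked by comparing the share of unrated items below and above a sales cutoff. This is a minimal sketch, assuming `rating == 0` means no rating; the helper `unrated_share_by_sales`, the cutoff parameter `low_sales`, and the toy data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def unrated_share_by_sales(df: pd.DataFrame, low_sales: float) -> pd.Series:
    """Share of unrated items (rating == 0) among low-sales vs. other items.

    low_sales is a hypothetical cutoff for "sales close to 0".
    """
    # Label each item by sales group
    group = (df["sales"] < low_sales).map({True: "low_sales", False: "other"})
    # 1.0 for unrated items, 0.0 for rated ones; the group mean is the share
    unrated = (df["rating"] == 0).astype(float)
    return unrated.groupby(group).mean()


# Toy data for illustration only
toy = pd.DataFrame({
    "rating": [0, 0, 0, 4, 3, 4, 5, 0],
    "sales":  [1, 2, 1, 3, 40, 50, 45, 35],
})
print(unrated_share_by_sales(toy, low_sales=10))
```

If the `low_sales` share is much higher than the `other` share on the real data, that is consistent with the reading above that an item has to sell before buyers start rating it; the choice of cutoff is a judgment call and worth varying.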