In [1]:
# The knowledge explorer would work with the data analyst to break these down into reports. Maybe they could create reports on the customer’s income. They could also analyze social media platforms and create a word cloud of feedback from thousands of customers. For example, some of the largest words in the word cloud were "travel," "recipe," and "restaurant." The team could go back and ask more questions. Why do our customers like to travel? Where are they going?
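import numpy as np
import pandas as pd

# Simulate 5,000 customers: right-skewed income (gamma) and roughly normal spend
data = []
for i in range(5000):
    data.append([i, (np.random.gamma(2, 3) + 10) * 1000, (np.random.normal(100, 50) + 100)])
df = pd.DataFrame(data, columns=['id', 'income', 'spend'])
# df.to_csv('csv_output/ch11_fig1.csv', index=False)
# Load the saved copy of the data rather than the fresh random draw above
df = pd.read_csv('csv_output/ch11_fig1.csv')
df.head()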
Out[1]:
In [2]:
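import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
df = pd.read_csv('csv_output/ch11_fig1.csv')
# Split income and spend into 20 equal-width bins each, keeping the bin edges for tick labels
df['income_cut'], income_bins = pd.cut(df['income'], bins=20, labels=np.arange(1, 21), retbins=True)
df['spend_cut'], spend_bins = pd.cut(df['spend'], bins=20, labels=np.arange(1, 21), retbins=True)
# Count customers in every (income bin, spend bin) cell, including empty cells
gb = df.groupby(['income_cut', 'spend_cut'], observed=False).id.count().reset_index()
# Rows are spend bins, columns are income bins
counts = gb.pivot(index='spend_cut', columns='income_cut', values='id').fillna(0)
sns.set_style("whitegrid")
f, ax = plt.subplots(1, figsize=(8, 6))
sns.heatmap(counts, cmap="YlGnBu", ax=ax);
# Place ticks on the 21 cell edges so that each label is a bin boundary
ax.set_ylabel('spend');
ax.set_yticks(np.arange(len(spend_bins)));
ax.set_yticklabels(['%.2f' % x for x in spend_bins]);
ax.set_xlabel('income');
ax.set_xticks(np.arange(len(income_bins)));
ax.set_xticklabels(['%.2f' % x for x in income_bins]);
for tick in ax.get_xticklabels():
    tick.set_rotation(90);
f.savefig('svg_output/ch11_fig1.svg', format='svg')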
The x-axis is income and the y-axis is spend. People with higher income do not necessarily spend more; those with incomes of around 20k to 30k seem to have the highest spend of all.
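One way to check this reading of the heatmap numerically is to compute the correlation between income and spend, then compare the mean spend across a few income bands. The sketch below is only illustrative: it does not read ch11_fig1.csv but regenerates synthetic data with the same formulas as cell In [1], using a fixed seed, so the numbers will differ from the saved file. The names check, income_band, and summary are hypothetical and introduced only for this example.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in for ch11_fig1.csv, drawn with the same
# formulas as In [1] but with a fixed seed so the check is repeatable.
rng = np.random.default_rng(0)
n = 5000
check = pd.DataFrame({
    'id': np.arange(n),
    'income': (rng.gamma(2, 3, size=n) + 10) * 1000,
    'spend': rng.normal(100, 50, size=n) + 100,
})

# Pearson correlation between income and spend; a value near zero means
# higher income does not go hand in hand with higher spend.
corr = check['income'].corr(check['spend'])

# Customer count and mean spend in five equal-width income bands.
check['income_band'] = pd.cut(check['income'], bins=5)
summary = check.groupby('income_band', observed=True)['spend'].agg(['count', 'mean'])

print(round(corr, 3))
print(summary)
```

If the mean spend is roughly flat across bands while the counts vary a lot, the darkest cells in the heatmap mostly reflect where the customers are concentrated rather than who spends the most.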