Ch11 Figure3


In [1]:
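import random as rd  # assumed alias: rd.randint(a, b) includes both endpoints

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Maybe your data analyst created a report of where the customers were traveling.
# It turns out that a lot of your customers do travel internationally, but not
# enough to justify selling new products. So the team decides to drop it and
# explore other areas. Maybe next you'll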

countries = ['NONE' ,'US', 'CHILE', 'CANADA', 'MEXICO', 'SPAIN', 'ITALY', 'CHINA', 'THAILAND', 'UK', 'TURKEY', 'SOUTH KOREA']

data = []

for i in range(5000):
    country = rd.choice(countries)  # pick one destination uniformly at random
    if country == 'NONE':
        spend = rd.randint(60,1000)
    else:
        spend = rd.randint(20,100)
    data.append([i, country, spend])

df = pd.DataFrame(data, columns=['id', 'travel-dest', 'spend'])
# df.to_csv('csv_output/ch11_fig4.csv', index=False)
df = pd.read_csv('csv_output/ch11_fig4.csv')
df.head()


Out[1]:
   id  travel-dest  spend
0   0           US     77
1   1       MEXICO     26
2   2        CHINA     46
3   3           UK     54
4   4  SOUTH KOREA     52

In [2]:
countries = ['NONE' ,'US', 'CHILE', 'CANADA', 'MEXICO', 'SPAIN', 'ITALY', 'CHINA', 'THAILAND', 'UK', 'TURKEY', 'SOUTH KOREA']
df = pd.read_csv('csv_output/ch11_fig4.csv')

%matplotlib inline
sns.set_style("whitegrid")

d1 = df.groupby('travel-dest').spend.sum().reset_index()
d1 = d1.sort_values('spend', ascending=False)
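# First bar: combined spend of every traveling customer (all rows except 'NONE').
# Filtering by label avoids relying on 'NONE' being sorted into the first row.
travelers = d1[d1['travel-dest'] != 'NONE']
# Recent matplotlib versions take `x` rather than `left`; align='edge' keeps
# the bars left-aligned so the tick offset of .5 below still lands on them.
plt.bar(x=0, height=travelers.spend.sum(), align='edge');
plt.bar(x=np.arange(1, len(countries)+1), height=d1.spend, align='edge');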

labels = ['TRAVELS']
labels.extend(list(d1['travel-dest']))

plt.xticks(np.arange(len(countries)+1)+.5,labels,rotation='vertical');
plt.ylabel('total spend');
plt.title('customer total spend by traveling destination');
plt.xlim(0, len(countries)+1)

plt.savefig('svg_output/ch11_fig4.svg', format='svg')


In total, customers who travel spend more than customers who don't. Taken one destination at a time, however, the total spend for each destination is far lower than the total for customers who don't travel.
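
The comparison in the chart can also be checked numerically. The following is a minimal sketch using only the standard library. The helper `spend_summary` and the toy `rows` are hypothetical illustrations and are not taken from `ch11_fig4.csv`; the only thing assumed from the notebook is that non-travelers are labeled 'NONE'.

```python
from collections import defaultdict


def spend_summary(rows, no_travel_label="NONE"):
    """Return (total spend per destination, travelers' combined total,
    non-travelers' total) for an iterable of (destination, spend) pairs."""
    per_dest = defaultdict(int)
    for dest, spend in rows:
        per_dest[dest] += spend
    non_travel_total = per_dest.get(no_travel_label, 0)
    travel_total = sum(
        total for dest, total in per_dest.items() if dest != no_travel_label
    )
    return dict(per_dest), travel_total, non_travel_total


# Hypothetical toy data: (destination, spend) pairs made up for illustration.
rows = [
    ("NONE", 500), ("NONE", 300),
    ("US", 80), ("US", 90), ("UK", 70),
    ("CHILE", 60), ("CHILE", 95), ("ITALY", 85),
    ("SPAIN", 75), ("SPAIN", 65), ("CHINA", 90),
    ("MEXICO", 99), ("TURKEY", 98),
]

per_dest, travel_total, non_travel_total = spend_summary(rows)

# Travelers as a group outspend non-travelers ...
print(travel_total > non_travel_total)
# ... but no single destination comes close to the non-traveler total.
print(all(total < non_travel_total
          for dest, total in per_dest.items() if dest != "NONE"))
```

With the notebook's DataFrame, the same pairs could be passed in as `zip(df['travel-dest'], df['spend'])`.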