In this exercise you will work with fake data about purchases made through Amazon. Follow the directions and do your best to answer the questions and complete the tasks. Feel free to reference the solutions. Most tasks can be solved in more than one way, and for the most part the questions get progressively harder.
Please excuse anything in the DataFrame that doesn't make real-world sense; all of the data is fake and made up.
Also note that each of these questions can be answered with one line of code.
Import pandas, read in the Ecommerce Purchases CSV file, and store it in a DataFrame called data.
In [0]:
import numpy as np
import pandas as pd
import seaborn as sns
In [0]:
data = pd.read_csv('https://s3-ap-southeast-1.amazonaws.com/intro-to-ml-minhdh/EcommercePurchases.csv')
Check the head of the DataFrame.
In [0]:
data.head()
Out[0]:
How many rows and columns are there?
In [0]:
data.shape
Out[0]:
What is the average Purchase Price?
In [0]:
data["Purchase Price"].mean()
Out[0]:
What were the highest and lowest purchase prices?
In [0]:
data["Purchase Price"].max()
Out[0]:
In [0]:
data["Purchase Price"].min()
Out[0]:
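The mean, maximum, and minimum above each take a separate call. As a minimal sketch, the three statistics can also be requested together with `agg`. The `toy` DataFrame and its values are hypothetical stand-ins for the real dataset, used only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; the prices are made up.
toy = pd.DataFrame({"Purchase Price": [10.0, 55.5, 99.99, 0.5]})

# Passing a list of function names returns a Series indexed by those names.
summary = toy["Purchase Price"].agg(["min", "max", "mean"])
print(summary)
```

On the real data the same call would be `data["Purchase Price"].agg(["min", "max", "mean"])`.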
How many people have English 'en' as their Language of choice on the website?
In [0]:
# Summing the boolean mask counts the matching rows (each True counts as 1)
(data['Language'] == 'en').sum()
Out[0]:
How many people have the job title "Lawyer"?
In [0]:
# Summing the boolean mask counts the matching rows (each True counts as 1)
(data['Job'] == 'Lawyer').sum()
Out[0]:
How many people made their purchase during the AM, and how many during the PM?
(Hint: check out value_counts().)
In [0]:
data['AM or PM'].value_counts()
Out[0]:
What are the 5 most common Job Titles?
In [0]:
data['Job'].value_counts().head(5)
Out[0]:
Someone made a purchase that came from Lot "90 WT". What was the Purchase Price for this transaction?
In [0]:
# .loc selects rows and column in one step and avoids chained indexing
data.loc[data['Lot'] == '90 WT', 'Purchase Price']
Out[0]:
What is the email of the person with the following Credit Card Number: 4926535242672853?
In [0]:
# .loc selects rows and column in one step and avoids chained indexing
data.loc[data['Credit Card'] == 4926535242672853, 'Email']
Out[0]:
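The lookup above compares the Credit Card column against an integer, which only matches if pandas parsed that column as integers. As a hedged sketch, casting both sides to strings makes the comparison work either way. The `toy` DataFrame, its email addresses, and the second card number are hypothetical; only the first card number comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
toy = pd.DataFrame({
    "Email": ["a@example.com", "b@example.org"],
    "Credit Card": [4926535242672853, 30563299999999],
})

card = 4926535242672853

# Compare as strings so the lookup does not depend on how the column was parsed.
mask = toy["Credit Card"].astype(str) == str(card)
emails = toy.loc[mask, "Email"].tolist()
print(emails)
```

Checking `data['Credit Card'].dtype` first shows which kind of comparison the real column needs.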
How many people have American Express as their Credit Card Provider and made a purchase above $95?
In [0]:
# Combine both conditions with & (each wrapped in parentheses), then count the True values
((data['CC Provider'] == 'American Express') & (data['Purchase Price'] > 95)).sum()
Out[0]:
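Filtering on two conditions at once relies on combining boolean masks element-wise. The sketch below shows this on a small made-up table; the `toy` DataFrame and its values are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
toy = pd.DataFrame({
    "CC Provider": ["American Express", "VISA 16 digit",
                    "American Express", "American Express"],
    "Purchase Price": [96.5, 99.0, 95.0, 12.3],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than == and >.
mask = (toy["CC Provider"] == "American Express") & (toy["Purchase Price"] > 95)

# True counts as 1, so the sum is the number of matching rows.
n_matches = int(mask.sum())
print(n_matches)
```

Note that a price of exactly 95 does not satisfy `> 95`, so only one row matches here.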
Hard: How many people have a credit card that expires in 2025?
In [0]:
data[data['CC Exp Date'].str.contains('/25')].shape[0]
Out[0]:
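The substring match above works as long as the expiry dates are stored as text in a month/two-digit-year form. As an alternative sketch under that same assumption (the MM/YY format is inferred from the '/25' pattern, not confirmed here), the column can be parsed into real dates and filtered by year. The `toy` values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in; assumes expiry dates look like "MM/YY".
toy = pd.DataFrame({"CC Exp Date": ["02/25", "11/18", "12/25", "05/26"]})

# Parse month and two-digit year; the day defaults to the 1st of the month.
parsed = pd.to_datetime(toy["CC Exp Date"], format="%m/%y")

# Count the cards whose parsed expiry year is 2025.
n_2025 = int((parsed.dt.year == 2025).sum())
print(n_2025)
```

Parsing costs a little more than a string match, but it fails loudly on malformed dates instead of silently miscounting them.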
Hard: What are the top 5 most popular email providers/hosts (e.g. gmail.com, yahoo.com, etc.)?
In [0]:
# Split each address at '@', keep the host part, then count the hosts
data['Email'].str.split('@').str[1].value_counts().head(5)
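Extracting the host from an email address goes through the `.str` accessor, since a pandas Series has no `split` method of its own. The sketch below breaks the steps apart on a small made-up table so the intermediate result is visible. The `toy` addresses are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
toy = pd.DataFrame({"Email": [
    "a@gmail.com", "b@yahoo.com", "c@gmail.com",
    "d@hotmail.com", "e@gmail.com", "f@yahoo.com",
]})

# Step 1: split every address into [user, host] lists.
parts = toy["Email"].str.split("@")

# Step 2: take the second element of each list, i.e. the host.
hosts = parts.str[1]

# Step 3: count each host and keep the five most frequent.
top_hosts = hosts.value_counts().head(5)
print(top_hosts)
```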
Plot the distribution of Purchase Price.
In [0]:
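# distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest replacement
sns.histplot(data['Purchase Price'], kde=True)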
Out[0]:
Draw a count plot of Language.
In [0]:
# Name the column explicitly; recent seaborn versions expect keyword arguments here
sns.countplot(x='Language', data=data)
Out[0]:
Feel free to plot more graphs to dive deeper into the dataset.