Ecommerce Purchases Exercise

In this exercise you will be given some fake data about purchases made through Amazon. Follow the directions and try your best to answer the questions and complete the tasks. Feel free to reference the solutions. Most of the tasks can be solved in different ways, and for the most part the questions get progressively harder.

Please excuse anything that doesn't make real-world sense in the DataFrame; all the data is fake and made up.

Also note that all of these questions can be answered with one line of code.

Import pandas, read in the Ecommerce Purchases CSV file, and store it in a DataFrame called data.



In [0]:

    
import numpy as np
import pandas as pd
import seaborn as sns



In [0]:

    
data = pd.read_csv('https://s3-ap-southeast-1.amazonaws.com/intro-to-ml-minhdh/EcommercePurchases.csv')

Check the head of the DataFrame.



In [0]:

    
data.head()









    Out[0]:







  
    
      
	Address	Lot	AM or PM	Browser Info	Company	Credit Card	CC Exp Date	CC Security Code	CC Provider	Email	Job	IP Address	Language	Purchase Price
0	16629 Pace Camp Apt. 448\nAlexisborough, NE 77...	46 in	PM	Opera/9.56.(X11; Linux x86_64; sl-SI) Presto/2...	Martinez-Herman	6011929061123406	02/20	900	JCB 16 digit	pdunlap@yahoo.com	Scientist, product/process development	149.146.147.205	el	98.14
1	9374 Jasmine Spurs Suite 508\nSouth John, TN 8...	28 rn	PM	Opera/8.93.(Windows 98; Win 9x 4.90; en-US) Pr...	Fletcher, Richards and Whitaker	3337758169645356	11/18	561	Mastercard	anthony41@reed.com	Drilling engineer	15.160.41.51	fr	70.73
2	Unit 0065 Box 5052\nDPO AP 27450	94 vE	PM	Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...	Simpson, Williams and Pham	675957666125	08/19	699	JCB 16 digit	amymiller@morales-harrison.com	Customer service manager	132.207.160.22	de	0.95
3	7780 Julia Fords\nNew Stacy, WA 45798	36 vm	PM	Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0 ...	Williams, Marshall and Buchanan	6011578504430710	02/24	384	Discover	brent16@olson-robinson.info	Drilling engineer	30.250.74.19	es	78.04
4	23012 Munoz Drive Suite 337\nNew Cynthia, TX 5...	20 IE	AM	Opera/9.58.(X11; Linux x86_64; it-IT) Presto/2...	Brown, Watson and Andrews	6011456623207998	10/25	678	Diners Club / Carte Blanche	christopherwright@gmail.com	Fine artist	24.140.33.94	es	77.82

How many rows and columns are there?



In [0]:

    
data.shape









    Out[0]:





(10000, 14)

What is the average Purchase Price?



In [0]:

    
data["Purchase Price"].mean()









    Out[0]:





50.347302

What were the highest and lowest purchase prices?



In [0]:

    
data["Purchase Price"].max()









    Out[0]:





99.99



In [0]:

    
data["Purchase Price"].min()









    Out[0]:





0.0

How many people have English 'en' as their Language of choice on the website?



In [0]:

    
data[data['Language'] == 'en'].count()[0]









    Out[0]:





1098
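
As an alternative to `count()[0]`, a boolean mask can be summed directly, which avoids the positional lookup on the result of `count()`. The sketch below uses a small made-up DataFrame (`toy` is a stand-in, not the exercise data) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the exercise data; the values are made up.
toy = pd.DataFrame({"Language": ["en", "fr", "en", "de", "es", "en"]})

# Comparing a column to a value yields a boolean Series.
# True counts as 1, so summing the mask counts the matching rows.
n_english = int((toy["Language"] == "en").sum())
print(n_english)
```

The same pattern should apply to the real data by swapping `toy` for `data`.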

How many people have the job title of "Lawyer"?



In [0]:

    
data[data['Job'] == 'Lawyer'].count()[0]









    Out[0]:





30

How many people made the purchase during the AM and how many during the PM?

(Hint: Check out value_counts().)



In [0]:

    
data['AM or PM'].value_counts()









    Out[0]:





PM    5068
AM    4932
Name: AM or PM, dtype: int64

What are the 5 most common Job Titles?



In [0]:

    
data['Job'].value_counts().head()









    Out[0]:





Interior and spatial designer    31
Lawyer                           30
Social researcher                28
Purchasing manager               27
Designer, jewellery              27
Name: Job, dtype: int64

Someone made a purchase that came from Lot "90 WT". What was the Purchase Price for this transaction?



In [0]:

    
data['Purchase Price'][data['Lot'] == '90 WT']









    Out[0]:





513    75.1
Name: Purchase Price, dtype: float64
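
The same lookup can be written as a single `.loc` call that selects rows with a boolean mask and a column by label, which some readers find easier to follow than chained brackets. This is a minimal sketch on a three-row stand-in (`toy` is hypothetical, with values copied from the outputs shown in this notebook).

```python
import pandas as pd

# Small stand-in for the exercise data (hypothetical frame, three rows only).
toy = pd.DataFrame({
    "Lot": ["46 in", "90 WT", "28 rn"],
    "Purchase Price": [98.14, 75.10, 70.73],
})

# .loc[row_mask, column_label] filters rows and picks the column in one step.
price = toy.loc[toy["Lot"] == "90 WT", "Purchase Price"]
print(price.tolist())
```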

What is the email of the person with the following Credit Card Number: 4926535242672853?



In [0]:

    
data['Email'][data['Credit Card'] == 4926535242672853]









    Out[0]:





1234    bondellen@williams-garza.com
Name: Email, dtype: object

How many people have American Express as their Credit Card Provider and made a purchase above $95?



In [0]:

    
data2 = data[data['Purchase Price'] > 95]
data2[data2['CC Provider'] == 'American Express'].count()[0]









    Out[0]:





39
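
The two filtering steps can also be combined into one expression by joining the conditions with `&`. Each condition needs its own parentheses because `&` binds more tightly than `==` and `>`. The sketch below uses made-up rows (`toy` is a hypothetical stand-in for `data`).

```python
import pandas as pd

# Hypothetical stand-in for the exercise data; the values are made up.
toy = pd.DataFrame({
    "CC Provider": ["American Express", "Mastercard",
                    "American Express", "American Express"],
    "Purchase Price": [96.50, 99.10, 12.00, 95.00],
})

# Both conditions must hold; note that 95.00 is not "above 95".
mask = (toy["CC Provider"] == "American Express") & (toy["Purchase Price"] > 95)
n_matches = int(mask.sum())
print(n_matches)
```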

Hard: How many people have a credit card that expires in 2025?



In [0]:

    
data[data['CC Exp Date'].str.contains('/25')].shape[0]









    Out[0]:





1033
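
`str.contains('/25')` works here because the dates follow the MM/YY pattern seen in the head of the DataFrame. A slightly stricter variant, sketched below on made-up dates (`toy` is a hypothetical stand-in), splits on the slash and compares only the year part.

```python
import pandas as pd

# Hypothetical stand-in; assumes every date is formatted as MM/YY.
toy = pd.DataFrame({"CC Exp Date": ["02/20", "10/25", "11/18", "01/25"]})

# Keep the text after the slash, i.e. the two-digit year.
exp_year = toy["CC Exp Date"].str.split("/").str[1]
n_2025 = int((exp_year == "25").sum())
print(n_2025)
```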

Hard: What are the top 5 most popular email providers/hosts (e.g. gmail.com, yahoo.com, etc.)?



In [0]:

    
# String methods on a Series live under .str; take the part after '@', then count the hosts
data['Email'].str.split('@').str[1].value_counts().head()

Data Visualization

Implement a bar plot for the top 5 most popular email providers/hosts.



In [0]:
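
One possible approach, sketched here on a handful of made-up addresses rather than the exercise data: extract the host after the `@`, count the hosts with `value_counts()`, and draw the result with the pandas plotting interface. The `emails` Series and the axis labels are assumptions for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical addresses standing in for data['Email'].
emails = pd.Series([
    "a@gmail.com", "b@gmail.com", "c@yahoo.com",
    "d@hotmail.com", "e@gmail.com", "f@yahoo.com",
])

# Host = text after '@'; value_counts sorts by frequency, head(5) keeps the top five.
hosts = emails.str.split("@").str[1]
top_hosts = hosts.value_counts().head(5)

# Series.plot.bar draws one bar per host and returns a matplotlib Axes.
ax = top_hosts.plot.bar()
ax.set_xlabel("Email host")
ax.set_ylabel("Number of purchases")
```

With the real data, replacing `emails` with `data['Email']` should give the chart the task asks for.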

Plot the distribution of Purchase Price.



In [0]:

    
sns.distplot(data['Purchase Price'])









    Out[0]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1b64aad048>

Implement a count plot of Language.



In [0]:

    
sns.countplot(x='Language', data=data)









    









    Out[0]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1b645089e8>



Feel free to plot more graphs to dive deeper into the dataset.

Great Job!