Read these carefully.
In [1]:
# Run the following to import the necessary packages and load the four datasets. Do not use any additional plotting libraries.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
d1 = "dataset/sales1.csv"
d2 = "dataset/sales2.csv"
d3 = "dataset/sales3.csv"
d4 = "dataset/sales4.csv"
df1 = pd.read_csv(d1)
df2 = pd.read_csv(d2)
df3 = pd.read_csv(d3)
df4 = pd.read_csv(d4)
df1.head(n=5) # Display the first n rows of the dataset
Out[1]:
In [2]:
df1.describe()
Out[2]:
In [3]:
df2.describe()
Out[3]:
In [4]:
df3.describe()
Out[4]:
In [5]:
df4.describe()
Out[5]:
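Comparing four separate `describe()` tables by eye is error-prone. The sketch below gathers a few summary statistics for every dataset into one table. It assumes, as the plotting cell later in this notebook does, that each dataset has `time` and `avg_sales` columns; the helper name `compare_summaries` is hypothetical.

```python
import pandas as pd

def compare_summaries(dfs, x='time', y='avg_sales'):
    """Return one row of summary statistics per dataset (hypothetical helper)."""
    rows = {}
    for i, df in enumerate(dfs, start=1):
        rows[f'dataset {i}'] = {
            f'{x}_mean': df[x].mean(),
            f'{x}_std': df[x].std(),   # sample standard deviation (ddof=1), as in describe()
            f'{y}_mean': df[y].mean(),
            f'{y}_std': df[y].std(),
            'corr': df[x].corr(df[y]),  # Pearson correlation between x and y
        }
    # One row per dataset, indexed by its label
    return pd.DataFrame.from_dict(rows, orient='index')

# Usage with the notebook's data:
# compare_summaries([df1, df2, df3, df4]).round(2)
```

Summary statistics alone can hide very different shapes, so the plots further down remain the more reliable check.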
Can you identify a dataset that is least likely to represent a company's sales over time? Set the following variable to 'Yes' or 'No'.
In [4]:
least_rep_dataset_exists = 'Yes'
If you answered 'Yes', which dataset is least likely to represent a company's sales over time? Set the following variable to 1, 2, 3, or 4.
In [5]:
least_rep_dataset = 4
The next cell shows a clue. If it changes your answer, try again below. Otherwise, if you are confident in your answer above, leave the following cells untouched.
In [13]:
df1.head()
Out[13]:
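One way to follow up on this clue is to check how the `time` values are spread in each dataset. Sales recorded over time should cover many distinct time points, whereas a dataset whose rows nearly all share a single `time` value cannot describe a trend over time. The helper `time_coverage` below is a hypothetical sketch of that check, assuming a `time` column in every dataset.

```python
import pandas as pd

def time_coverage(dfs, time_col='time'):
    """Count distinct time values and the share of rows at the most common one."""
    records = []
    for i, df in enumerate(dfs, start=1):
        counts = df[time_col].value_counts()  # rows per time value, sorted descending
        records.append({
            'dataset': i,
            'n_unique_times': df[time_col].nunique(),
            'top_time_share': counts.iloc[0] / len(df),  # fraction of rows at the most common time
        })
    return pd.DataFrame(records).set_index('dataset')

# Usage with the notebook's data:
# time_coverage([df1, df2, df3, df4])
```

A `top_time_share` close to 1 means almost every row was recorded at the same time.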
In [16]:
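# Show your revised analysis below
# Scatter avg_sales against time for each dataset, one figure per dataset
df_all = [df1, df2, df3, df4]
for i, df in enumerate(df_all, start=1):
    df.plot.scatter('time', 'avg_sales', title=f'Dataset {i}')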
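Separate figures with independent axis ranges make the four shapes harder to compare. A possible alternative, still using only matplotlib through pandas as the first cell requires, draws every dataset in one 2 by 2 grid with shared axes. The function name `plot_grid` is hypothetical, and the sketch assumes four datasets with `time` and `avg_sales` columns.

```python
import matplotlib.pyplot as plt
import pandas as pd  # pandas supplies the DataFrame.plot accessor used below

def plot_grid(dfs, x='time', y='avg_sales'):
    """Scatter each dataset in its own panel of a 2 by 2 grid with shared axes."""
    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    for i, (df, ax) in enumerate(zip(dfs, axes.flat), start=1):
        # Draw into the given Axes so every panel uses the same scale
        df.plot.scatter(x, y, ax=ax, title=f'Dataset {i}')
    fig.tight_layout()
    return fig, axes

# Usage with the notebook's data:
# plot_grid([df1, df2, df3, df4])
```

Sharing both axes means a dataset whose points cluster at one `time` value stands out immediately against the others.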
In [18]:
least_rep_dataset_exists_clue = 'Yes'
In [19]:
least_rep_dataset_clue = 4