Ice Cream

Instructions / Notes:

Read these carefully:

- Read and execute each cell in order, without skipping ahead.
- You may create new Jupyter notebook cells for testing, debugging, exploring, etc. This is in fact encouraged! Just make sure that your final answer dataframes and answers use the set variables outlined below.
- Have fun!



In [1]:

    
# Run the following to import necessary packages and import dataset. Do not use any additional plotting libraries.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
datafile = "dataset/icecream.csv"
df = pd.read_csv(datafile)
df









    Out[1]:

    month  ice_cream_sales  temperature  deaths_drowning  humidity
0      12             4.75           40                2        30
1      11             4.78           50                3        20
2       1             4.82           55                4        70
3       2             4.83           58                4        70
4       3             4.84           60                5        20
5      10             4.88           55                6        30
6       5             4.91           68                9        20
7       9             4.92           70                9        10
8       4             4.93           75                8        50
9       7             4.93           80               11        10
10      6             4.94           83               12        90
11      8             4.95           88               11        50

The dataset above contains the ice cream sales, temperature, number of deaths by drowning, and humidity level in a city over a 12-month period.



In [19]:

    
# Here are the correlation coefficients between pairs of columns
corr = df.corr()
corr









    Out[19]:

                     month  ice_cream_sales  temperature  deaths_drowning   humidity
month             1.000000        -0.164441    -0.207638        -0.051340  -0.481519
ice_cream_sales  -0.164441         1.000000     0.937996         0.952490   0.083232
temperature      -0.207638         0.937996     1.000000         0.950942   0.191497
deaths_drowning  -0.051340         0.952490     0.950942         1.000000   0.080003
humidity         -0.481519         0.083232     0.191497         0.080003   1.000000
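
For readers who want to see what each entry of this matrix represents, here is a minimal sketch that computes the Pearson coefficient by hand for one pair of columns. The helper name `pearson_r` is an illustrative assumption and is not part of the original notebook; the two lists are copied from the table in Out[1]. The result should agree with the `ice_cream_sales` / `temperature` entry shown in Out[19].

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads.

    Hypothetical helper for illustration; pandas' DataFrame.corr() uses the
    same definition by default (method="pearson").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Values copied from the ice_cream_sales and temperature columns above.
sales = [4.75, 4.78, 4.82, 4.83, 4.84, 4.88, 4.91, 4.92, 4.93, 4.93, 4.94, 4.95]
temperature = [40, 50, 55, 58, 60, 55, 68, 70, 75, 80, 83, 88]

r_manual = pearson_r(sales, temperature)
r_pandas = pd.Series(sales).corr(pd.Series(temperature))
print(f"manual: {r_manual:.6f}, pandas: {r_pandas:.6f}")
```

A coefficient near +1 or -1 only says that the two columns move together linearly; it says nothing about why they do.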



In [60]:

    
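# Use absolute values so strong negative correlations are caught as well
abs_corr = np.abs(df.corr())

indices = abs_corr.index
corr_pairs = []

# Collect each pair once: i < j skips the diagonal and mirrored duplicates
for i, idx_i in enumerate(indices):
    for j, c in enumerate(abs_corr[idx_i]):
        if c > .9 and i < j:
            corr_pairs.append((idx_i, indices[j]))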



In [61]:

    
corr_pairs









    Out[61]:





[('ice_cream_sales', 'temperature'),
 ('ice_cream_sales', 'deaths_drowning'),
 ('temperature', 'deaths_drowning')]
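
The nested loop above can also be written without explicit index bookkeeping. The following is a sketch of one possible alternative, assuming the same 0.9 threshold; the function name `strong_pairs` is hypothetical, and the DataFrame is rebuilt inline from the table in Out[1] so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Data copied from the table in Out[1] so the example is self-contained.
df = pd.DataFrame({
    "month": [12, 11, 1, 2, 3, 10, 5, 9, 4, 7, 6, 8],
    "ice_cream_sales": [4.75, 4.78, 4.82, 4.83, 4.84, 4.88,
                        4.91, 4.92, 4.93, 4.93, 4.94, 4.95],
    "temperature": [40, 50, 55, 58, 60, 55, 68, 70, 75, 80, 83, 88],
    "deaths_drowning": [2, 3, 4, 4, 5, 6, 9, 9, 8, 11, 12, 11],
    "humidity": [30, 20, 70, 70, 20, 30, 20, 10, 50, 10, 90, 50],
})


def strong_pairs(frame, threshold=0.9):
    """Return column pairs whose absolute Pearson correlation exceeds threshold.

    Hypothetical helper, not part of the original notebook.
    """
    abs_corr = frame.corr().abs()
    # Boolean mask of entries strictly above the diagonal, so that each
    # pair is reported once and self-correlations are ignored.
    upper = np.triu(np.ones(abs_corr.shape, dtype=bool), k=1)
    rows, cols = np.where(upper & (abs_corr.to_numpy() > threshold))
    return [(abs_corr.index[r], abs_corr.columns[c]) for r, c in zip(rows, cols)]


pairs = strong_pairs(df)
print(pairs)
```

Either version only filters by strength; deciding which of the returned pairs are meaningful still requires reasoning about the variables themselves.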

Identify strong (i.e., correlation coefficient > 0.9) and meaningful correlations among pairs of columns in this dataset. Append each pair of correlated columns to the variable below in the form [column_x, column_y].



In [7]:

    
correlations = []
correlations.append(['ice_cream_sales', 'temperature'])

# do not touch
correlations.sort()
print(correlations)









    



[['ice_cream_sales', 'temperature']]

Clue

Some of the correlations you found above may be spurious (see https://en.wikipedia.org/wiki/Spurious_relationship). Only include meaningful correlations in the list!
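
One way to probe whether a strong correlation might be driven by a shared third variable is a partial correlation: regress the suspected common driver out of both columns and correlate what is left. The sketch below does this for `ice_cream_sales` and `deaths_drowning` with `temperature` as the control. Treating temperature as the common driver is an assumption made for illustration, the helper `partial_corr` is hypothetical, and the data is copied from Out[1]. With only 12 rows the result is noisy, so treat it as a hint rather than proof.

```python
import numpy as np
import pandas as pd

# Data copied from the table in Out[1] so the example is self-contained.
df = pd.DataFrame({
    "ice_cream_sales": [4.75, 4.78, 4.82, 4.83, 4.84, 4.88,
                        4.91, 4.92, 4.93, 4.93, 4.94, 4.95],
    "temperature": [40, 50, 55, 58, 60, 55, 68, 70, 75, 80, 83, 88],
    "deaths_drowning": [2, 3, 4, 4, 5, 6, 9, 9, 8, 11, 12, 11],
})


def partial_corr(frame, x, y, control):
    """Correlation of x and y after linearly regressing out `control`.

    Hypothetical helper, not part of the original notebook.
    """
    z = frame[control].to_numpy(dtype=float)
    design = np.column_stack([np.ones_like(z), z])  # intercept + control

    def residuals(col):
        target = frame[col].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return target - design @ coef  # part not explained by the control

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])


raw = df["ice_cream_sales"].corr(df["deaths_drowning"])
partial = partial_corr(df, "ice_cream_sales", "deaths_drowning", "temperature")
print(f"raw: {raw:.3f}, controlling for temperature: {partial:.3f}")
```

If the coefficient drops noticeably once temperature is held fixed, that is consistent with temperature explaining much of the link between the two columns. Whether a relationship is meaningful is ultimately a judgment about cause and effect, not something the numbers decide on their own.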

If this clue changes your answer, try again below. Otherwise, if you are confident in your answer above, leave the following untouched.



In [ ]:

    
correlations_clue = []
# correlations_clue.append(['column_x', 'column_y'])

# do not touch
correlations_clue.sort()
print(correlations_clue)

