Pandas Exercise

When working on real world data tasks, you'll quickly realize that a large portion of your time is spent manipulating raw data into a form that you can actually work with, a process often called data munging or data wrangling. Different programming langauges have different methods and packages to handle this task, with varying degrees of ease, and luckily for us, Python has an excellent one called Pandas which we will be using in this exercise.



In [1]:

    
import numpy as np
import pandas as pd

Importing data and working with Data Frames

The Data Frame is perhaps the most important object in Pandas and Data Science in Python, providing a plethora of functions for common data tasks. Using only Pandas, do the following exercises.

1 - Download the free1.csv from the R Data Repository and save it to the same directory as this notebook. Then import into your environment as a Data Frame. Now read free2.csv directly into a Data Frame from the URL.



In [2]:

    
# Import from file
free1 = pd.read_csv('free1.csv')

# Import from URL
free2 = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/Zelig/free2.csv')



In [3]:

    
print(free1.head())
print(free2.head())









    



   Unnamed: 0  sex   age  educ   country  y  v1  v2  v3  v4  v5  v6
0      109276  0.0  20.0   4.0   Eurasia  1   4   3   3   5   3   4
1       88178  1.0  25.0   4.0    Oceana  2   3   3   5   5   5   5
2      111063  1.0  56.0   2.0  Eastasia  2   3   2   4   5   5   4
3      161488  0.0  65.0   6.0  Eastasia  2   3   3   5   5   5   5
4       44532  1.0  50.0   5.0    Oceana  1   5   3   5   5   3   5
   Unnamed: 0  sex   age  educ   country  y  v1  v2  v3  v4  v5  v6
0      109276  0.0  20.0   4.0   Eurasia  1   4   3   3   5   3   4
1       88178  1.0  25.0   4.0    Oceana  2   3   3   5   5   5   5
2      111063  1.0  56.0   2.0  Eastasia  2   3   2   4   5   5   4
3      161488  0.0  65.0   6.0  Eastasia  2   3   3   5   5   5   5
4       44532  1.0  50.0   5.0    Oceana  1   5   3   5   5   3   5

2 - Combine your free1 Data Frame with free2 into a single Data Frame, named free_data, and print the first few rows to verify that it worked correctly. From here on out, this combined Data Frame is what we will be working with.



In [4]:

    
free_data = pd.concat([free1, free2], ignore_index=True)

3 - Print the last 10 rows.



In [5]:

    
free_data.tail(10)

4 - Rename the first column (currently unamed), to id. Print the column names to verify that it worked correctly.



In [6]:

    
new_col = np.array(free_data.columns)
new_col[0] = 'id'
free_data.columns = new_col
print(free_data.columns)

# solution: free_date.rename(columns={'Unnamed: 0': 'id}, inplace=True)









    



Index(['id', 'sex', 'age', 'educ', 'country', 'y', 'v1', 'v2', 'v3', 'v4',
       'v5', 'v6'],
      dtype='object')

5 - What are the number of rows and columns of the Data Frame?



In [7]:

    
free_data.shape









    Out[7]:





(900, 12)

6 - What are the data types of each column? Can quantities like the mean be calculated for each columm? If not, which one(s) and why?



In [8]:

    
free_data.dtypes









    Out[8]:





id           int64
sex        float64
age        float64
educ       float64
country     object
y            int64
v1           int64
v2           int64
v3           int64
v4           int64
v5           int64
v6           int64
dtype: object

7 - Print out the first 5 rows of the country column.



In [9]:

    
free_data['country'].head()









    Out[9]:





0     Eurasia
1      Oceana
2    Eastasia
3    Eastasia
4      Oceana
Name: country, dtype: object

8 - How many unique values are in the country column?



In [10]:

    
len(free_data['country'].unique())









    Out[10]:





3

9 - Print out the number of occurences of each unique value in the country column.



In [11]:

    
free_data.groupby('country')['id'].count()
# solution: free_data.country.value_count()









    Out[11]:





country
Eastasia    300
Eurasia     300
Oceana      300
Name: id, dtype: int64

10 - Summarize the dataframe.



In [12]:

    
free_data.describe()









    Out[12]:






  
    
      
      id
      sex
      age
      educ
      y
      v1
      v2
      v3
      v4
      v5
      v6
    
  
  
    
      count
      900.000000
      898.000000
      892.000000
      890.000000
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
    
    
      mean
      90665.368889
      0.556793
      40.744395
      2.941573
      3.520000
      2.648889
      2.535556
      3.664444
      4.084444
      3.866667
      4.380000
    
    
      std
      44234.598996
      0.497041
      16.743316
      1.600394
      1.293709
      1.151991
      1.267310
      1.016363
      0.955973
      0.984869
      0.989399
    
    
      min
      142.000000
      0.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
    
    
      25%
      52621.000000
      0.000000
      27.000000
      1.000000
      3.000000
      2.000000
      2.000000
      3.000000
      3.000000
      3.000000
      4.000000
    
    
      50%
      108699.000000
      1.000000
      39.000000
      3.000000
      4.000000
      3.000000
      2.000000
      4.000000
      4.000000
      4.000000
      5.000000
    
    
      75%
      119329.000000
      1.000000
      52.000000
      4.000000
      5.000000
      3.000000
      3.000000
      4.000000
      5.000000
      5.000000
      5.000000
    
    
      max
      171811.000000
      1.000000
      90.000000
      7.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000

11 - Were all columns included in the summary? If not, print the summary again, forcing this column to appear in the result.



In [13]:

    
#free_data['country'] = free_data['country'].astype('category')



In [14]:

    
free_data.describe(include='all')









    Out[14]:






  
    
      
      id
      sex
      age
      educ
      country
      y
      v1
      v2
      v3
      v4
      v5
      v6
    
  
  
    
      count
      900.000000
      898.000000
      892.000000
      890.000000
      900
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
      900.000000
    
    
      unique
      NaN
      NaN
      NaN
      NaN
      3
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      top
      NaN
      NaN
      NaN
      NaN
      Oceana
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      freq
      NaN
      NaN
      NaN
      NaN
      300
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      mean
      90665.368889
      0.556793
      40.744395
      2.941573
      NaN
      3.520000
      2.648889
      2.535556
      3.664444
      4.084444
      3.866667
      4.380000
    
    
      std
      44234.598996
      0.497041
      16.743316
      1.600394
      NaN
      1.293709
      1.151991
      1.267310
      1.016363
      0.955973
      0.984869
      0.989399
    
    
      min
      142.000000
      0.000000
      1.000000
      1.000000
      NaN
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
    
    
      25%
      52621.000000
      0.000000
      27.000000
      1.000000
      NaN
      3.000000
      2.000000
      2.000000
      3.000000
      3.000000
      3.000000
      4.000000
    
    
      50%
      108699.000000
      1.000000
      39.000000
      3.000000
      NaN
      4.000000
      3.000000
      2.000000
      4.000000
      4.000000
      4.000000
      5.000000
    
    
      75%
      119329.000000
      1.000000
      52.000000
      4.000000
      NaN
      5.000000
      3.000000
      3.000000
      4.000000
      5.000000
      5.000000
      5.000000
    
    
      max
      171811.000000
      1.000000
      90.000000
      7.000000
      NaN
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000

12 - Print rows 100 to 110 of the free1 Data Frame.



In [15]:

    
free1.iloc[99:110]

13 - Print rows 100 to 110 of only the first 3 columns in free1 using only indices.



In [16]:

    
free1.iloc[99:110, 0:3]

14 - Create and print a list containing the mean and the value counts of each column in the data frame except the country column.



In [17]:

    
# create a list containing all columns
col = list(free_data.columns)
# exclude the country column
col = col[0:4] + col[5:]
#calculate mean and count and put them in a list
[list(free_data[col].mean()), list(free_data[col].count())]

# solution:
# results = []
# for col in free_data.drop('country', axis=1).columns:
#    results.append((free_data[col].mean(), free_data[col].value_counts()))









    Out[17]:





[[90665.368888888886,
  0.55679287305122493,
  40.744394618834079,
  2.9415730337078654,
  3.52,
  2.6488888888888891,
  2.5355555555555553,
  3.6644444444444444,
  4.0844444444444443,
  3.8666666666666667,
  4.3799999999999999],
 [900, 898, 892, 890, 900, 900, 900, 900, 900, 900, 900]]

15 - Create a Data Frame, called demographics, using only the columns sex, age, and educ from the free1 Data Frame. Also create a Data Frame called scores, using only the columns v1, v2, v3, v4, v5, v6 from the free1 Data Frame



In [18]:

    
demographics = free1[['sex', 'age', 'educ']]
print(demographics.head())

scores = free1[['v1', 'v2', 'v3', 'v4', 'v5', 'v6']]
print(scores.head())









    



   sex   age  educ
0  0.0  20.0   4.0
1  1.0  25.0   4.0
2  1.0  56.0   2.0
3  0.0  65.0   6.0
4  1.0  50.0   5.0
   v1  v2  v3  v4  v5  v6
0   4   3   3   5   3   4
1   3   3   5   5   5   5
2   3   2   4   5   5   4
3   3   3   5   5   5   5
4   5   3   5   5   3   5

16 - Loop through each row in scores and grab the largest value, in the v_ columns, found in each row and store your results in two lists containing the value and column name it came from. For example, row 0 is

{'v1': 4, 'v2': 3, 'v3': 3, 'v4': 5, 'v5': 3, 'v6': 4}

the values

('v4', 5)

should be added to your two lists.

We can do this in two ways: vectorized or using a for loop.

In the vectorized version we calculate max and argmax along the columns axis and put the values obtained in two lists:



In [19]:

    
value = list(scores.max(axis=1))
col_name = list(scores.apply(np.argmax, axis=1))

# print the first ten values for checking
print(value[0:10])
print(col_name[0:10])









    



[5, 5, 5, 5, 5, 4, 5, 4, 5, 5]
['v4', 'v3', 'v4', 'v3', 'v1', 'v1', 'v6', 'v1', 'v3', 'v4']

In the for loop version we iterate through each row and append the max and argmax for every row in two lists:



In [20]:

    
value = []
col_name = []

for index, row in scores.iterrows():
    value.append(np.max(row))
    col_name.append(np.argmax(row))

# print the first ten values for checking
print(value[0:10])
print(col_name[0:10])









    



[5, 5, 5, 5, 5, 4, 5, 4, 5, 5]
['v4', 'v3', 'v4', 'v3', 'v1', 'v1', 'v6', 'v1', 'v3', 'v4']

17 - Create a new Data Frame with columns named cat and score from your results in part (16), for the column with the largest score and the actual score respectively.



In [21]:

    
cat_scores = pd.DataFrame({'cat':col_name,'score':value})
cat_scores.head()

18 - Using the Data Frame created in part (17), print the frequency of each column being the max score.



In [22]:

    
cat_scores.groupby('cat').count() / cat_scores.shape[0]

Sorting, Filtering, and Grouping data

Most of the time, we'll want to rearrange the data a bit, include only certain values in our analysis, or put the data into useful groups. Pandas provides syntax and many functions to do this.

Using only Pandas, do the following exercises.

1 - Using the free1.csv downloaded above, import it as a Data Frame named free_data, rename the first column to id, and print the first few rows.



In [23]:

    
free_data = pd.read_csv('free1.csv')
col = list(free_data.columns)
col[0] = 'id'
free_data.columns = col

free_data.head()

2 - Sort free_data by country, educ, and then by age in decending order, modifying the original Data Frame.



In [24]:

    
free_data.sort_values(['country', 'educ', 'age'], ascending=[True, True, False], inplace=True)

free_data.head()

3 - Create a new Data Frame called uni containing only rows from free_data which indicate that the person attended university or graduate school. Print the value counts for each country.

This is the dictionary for the educ column:

No formal education
Less than primary school education
Completed primary school
Completed secondary school
Completed high school
Completed college
Completed post-graduate degree



In [25]:

    
# I guess graduate school means high school, not certain though...
uni = free_data[free_data['educ'] >= 5]
uni.groupby('country')['id'].count()









    Out[25]:





country
Eastasia    33
Eurasia     27
Oceana      21
Name: id, dtype: int64

4 - Create a list of three Data Frames for those who are less than 25 years old, between 25 and 50 years old, and older than 50.



In [26]:

    
ages_df_list = [free_data[free_data['age'] < 25],
                free_data[(free_data['age'] >= 25) & (free_data['age'] <= 50)],
                free_data[free_data['age'] > 50]]

ages_df_list[1]









    Out[26]:






  
    
      
      id
      sex
      age
      educ
      country
      y
      v1
      v2
      v3
      v4
      v5
      v6
    
  
  
    
      261
      137663
      0.0
      50.0
      1.0
      Eastasia
      4
      2
      2
      3
      5
      4
      5
    
    
      223
      127868
      1.0
      46.0
      1.0
      Eastasia
      5
      1
      1
      5
      5
      5
      1
    
    
      103
      81406
      0.0
      45.0
      1.0
      Eastasia
      1
      1
      1
      1
      1
      1
      1
    
    
      360
      138535
      1.0
      45.0
      1.0
      Eastasia
      5
      2
      3
      4
      5
      4
      5
    
    
      319
      129590
      1.0
      43.0
      1.0
      Eastasia
      5
      1
      1
      4
      4
      3
      5
    
    
      348
      159708
      1.0
      43.0
      1.0
      Eastasia
      5
      4
      2
      4
      5
      3
      5
    
    
      199
      156041
      1.0
      40.0
      1.0
      Eastasia
      4
      2
      4
      4
      4
      3
      5
    
    
      243
      160339
      1.0
      39.0
      1.0
      Eastasia
      5
      4
      5
      5
      3
      4
      5
    
    
      74
      140966
      0.0
      35.0
      1.0
      Eastasia
      1
      3
      4
      5
      5
      3
      5
    
    
      29
      8524
      1.0
      32.0
      1.0
      Eastasia
      4
      5
      4
      5
      5
      5
      5
    
    
      204
      8296
      0.0
      32.0
      1.0
      Eastasia
      5
      3
      3
      5
      5
      5
      5
    
    
      122
      166412
      0.0
      31.0
      1.0
      Eastasia
      3
      3
      3
      4
      3
      4
      3
    
    
      263
      114086
      0.0
      30.0
      1.0
      Eastasia
      3
      1
      2
      2
      1
      3
      4
    
    
      302
      82600
      1.0
      30.0
      1.0
      Eastasia
      4
      2
      1
      3
      4
      3
      5
    
    
      198
      125327
      1.0
      28.0
      1.0
      Eastasia
      4
      3
      2
      5
      5
      3
      5
    
    
      220
      125656
      0.0
      27.0
      1.0
      Eastasia
      3
      1
      1
      3
      3
      3
      4
    
    
      259
      20075
      1.0
      25.0
      1.0
      Eastasia
      5
      1
      2
      5
      5
      3
      5
    
    
      344
      105904
      1.0
      25.0
      1.0
      Eastasia
      5
      2
      1
      3
      5
      5
      5
    
    
      374
      149219
      1.0
      46.0
      2.0
      Eastasia
      5
      4
      2
      4
      3
      3
      4
    
    
      421
      107534
      1.0
      41.0
      2.0
      Eastasia
      4
      2
      5
      3
      2
      4
      3
    
    
      242
      146950
      0.0
      38.0
      2.0
      Eastasia
      5
      3
      2
      5
      5
      4
      5
    
    
      367
      162577
      1.0
      36.0
      2.0
      Eastasia
      4
      3
      3
      3
      3
      3
      3
    
    
      411
      147495
      1.0
      33.0
      2.0
      Eastasia
      4
      3
      3
      3
      3
      3
      3
    
    
      303
      146857
      0.0
      31.0
      2.0
      Eastasia
      4
      3
      2
      3
      5
      3
      5
    
    
      162
      82247
      1.0
      30.0
      2.0
      Eastasia
      2
      1
      2
      3
      4
      5
      4
    
    
      276
      111809
      1.0
      27.0
      2.0
      Eastasia
      3
      1
      4
      1
      2
      4
      4
    
    
      150
      128252
      0.0
      48.0
      3.0
      Eastasia
      3
      2
      3
      5
      3
      3
      5
    
    
      258
      124356
      1.0
      48.0
      3.0
      Eastasia
      5
      2
      3
      5
      5
      5
      5
    
    
      435
      20416
      0.0
      48.0
      3.0
      Eastasia
      5
      2
      3
      3
      2
      3
      3
    
    
      286
      121943
      0.0
      47.0
      3.0
      Eastasia
      3
      1
      1
      1
      4
      5
      5
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      129
      27969
      0.0
      44.0
      4.0
      Oceana
      3
      3
      3
      4
      4
      4
      4
    
    
      85
      25324
      0.0
      43.0
      4.0
      Oceana
      3
      5
      4
      5
      4
      5
      3
    
    
      96
      28150
      0.0
      42.0
      4.0
      Oceana
      2
      2
      3
      4
      4
      4
      5
    
    
      146
      42669
      1.0
      40.0
      4.0
      Oceana
      2
      2
      2
      3
      5
      4
      5
    
    
      334
      3073
      0.0
      40.0
      4.0
      Oceana
      4
      3
      4
      3
      4
      4
      4
    
    
      120
      41762
      1.0
      35.0
      4.0
      Oceana
      2
      2
      2
      5
      4
      5
      5
    
    
      166
      42641
      1.0
      35.0
      4.0
      Oceana
      3
      1
      3
      4
      4
      5
      5
    
    
      254
      41344
      0.0
      34.0
      4.0
      Oceana
      3
      2
      2
      3
      5
      4
      5
    
    
      284
      42168
      0.0
      33.0
      4.0
      Oceana
      4
      3
      2
      3
      4
      4
      5
    
    
      241
      42743
      0.0
      30.0
      4.0
      Oceana
      5
      2
      2
      5
      5
      4
      5
    
    
      80
      79340
      0.0
      29.0
      4.0
      Oceana
      3
      3
      4
      3
      3
      4
      5
    
    
      358
      92330
      0.0
      29.0
      4.0
      Oceana
      5
      3
      3
      4
      4
      4
      5
    
    
      155
      89971
      1.0
      28.0
      4.0
      Oceana
      2
      1
      1
      5
      3
      5
      5
    
    
      61
      4949
      1.0
      26.0
      4.0
      Oceana
      2
      5
      5
      5
      4
      4
      4
    
    
      266
      43344
      0.0
      26.0
      4.0
      Oceana
      4
      2
      2
      3
      4
      4
      5
    
    
      1
      88178
      1.0
      25.0
      4.0
      Oceana
      2
      3
      3
      5
      5
      5
      5
    
    
      174
      44669
      0.0
      25.0
      4.0
      Oceana
      3
      1
      1
      4
      3
      4
      4
    
    
      4
      44532
      1.0
      50.0
      5.0
      Oceana
      1
      5
      3
      5
      5
      3
      5
    
    
      14
      30513
      0.0
      49.0
      5.0
      Oceana
      1
      3
      4
      4
      4
      5
      5
    
    
      19
      41371
      1.0
      41.0
      5.0
      Oceana
      1
      2
      1
      4
      4
      3
      5
    
    
      281
      92195
      1.0
      31.0
      5.0
      Oceana
      4
      2
      1
      3
      4
      5
      5
    
    
      23
      92556
      0.0
      30.0
      5.0
      Oceana
      1
      2
      1
      4
      5
      4
      5
    
    
      133
      88111
      0.0
      30.0
      5.0
      Oceana
      3
      3
      2
      4
      5
      5
      5
    
    
      175
      23450
      1.0
      30.0
      5.0
      Oceana
      3
      1
      1
      4
      4
      4
      5
    
    
      251
      91858
      1.0
      29.0
      5.0
      Oceana
      5
      2
      1
      5
      5
      3
      5
    
    
      196
      23272
      0.0
      37.0
      6.0
      Oceana
      3
      2
      2
      5
      5
      4
      5
    
    
      34
      25441
      0.0
      42.0
      7.0
      Oceana
      2
      4
      4
      4
      4
      4
      5
    
    
      401
      26614
      0.0
      33.0
      7.0
      Oceana
      5
      3
      5
      4
      5
      4
      3
    
    
      151
      88856
      0.0
      30.0
      7.0
      Oceana
      3
      2
      2
      5
      4
      4
      5
    
    
      58
      60125
      0.0
      41.0
      NaN
      Oceana
      1
      5
      2
      4
      5
      5
      5
    
  

236 rows × 12 columns

5 - Using a for loop, create a list of 3 Data Frames each containing only one of the 3 countries.



In [27]:

    
countries_df_list = []
for country in free_data['country'].unique():
    countries_df_list.append(free_data[free_data['country'] == country])

countries_df_list[2]









    Out[27]:






  
    
      
      id
      sex
      age
      educ
      country
      y
      v1
      v2
      v3
      v4
      v5
      v6
    
  
  
    
      44
      76803
      1.0
      71.0
      1.0
      Oceana
      2
      4
      4
      4
      5
      4
      5
    
    
      48
      60488
      1.0
      63.0
      1.0
      Oceana
      4
      5
      4
      5
      5
      5
      5
    
    
      136
      61252
      0.0
      62.0
      1.0
      Oceana
      5
      5
      5
      5
      5
      5
      5
    
    
      145
      603
      1.0
      61.0
      1.0
      Oceana
      2
      2
      2
      5
      5
      5
      5
    
    
      171
      42755
      1.0
      60.0
      1.0
      Oceana
      3
      2
      2
      5
      5
      4
      5
    
    
      309
      88267
      0.0
      58.0
      1.0
      Oceana
      4
      2
      2
      3
      5
      5
      5
    
    
      436
      60079
      0.0
      57.0
      1.0
      Oceana
      4
      1
      3
      1
      2
      3
      2
    
    
      89
      76784
      1.0
      55.0
      1.0
      Oceana
      1
      1
      1
      5
      5
      5
      5
    
    
      142
      463
      0.0
      55.0
      1.0
      Oceana
      5
      5
      5
      5
      5
      5
      4
    
    
      20
      60279
      0.0
      54.0
      1.0
      Oceana
      1
      5
      4
      5
      5
      4
      5
    
    
      140
      90873
      1.0
      53.0
      1.0
      Oceana
      2
      2
      2
      3
      4
      4
      4
    
    
      154
      78538
      1.0
      52.0
      1.0
      Oceana
      3
      1
      1
      4
      4
      3
      5
    
    
      252
      59708
      1.0
      51.0
      1.0
      Oceana
      4
      2
      2
      4
      3
      3
      5
    
    
      163
      4028
      0.0
      50.0
      1.0
      Oceana
      3
      1
      1
      4
      5
      5
      5
    
    
      312
      78421
      1.0
      48.0
      1.0
      Oceana
      4
      2
      1
      3
      3
      4
      5
    
    
      336
      78608
      0.0
      48.0
      1.0
      Oceana
      5
      2
      2
      3
      4
      4
      5
    
    
      167
      78558
      1.0
      47.0
      1.0
      Oceana
      3
      2
      1
      4
      5
      4
      5
    
    
      13
      57867
      0.0
      46.0
      1.0
      Oceana
      1
      5
      5
      5
      5
      5
      5
    
    
      67
      73821
      1.0
      45.0
      1.0
      Oceana
      2
      4
      2
      5
      3
      4
      5
    
    
      132
      61135
      0.0
      45.0
      1.0
      Oceana
      4
      4
      4
      4
      4
      4
      4
    
    
      408
      92438
      1.0
      44.0
      1.0
      Oceana
      4
      3
      5
      2
      4
      4
      1
    
    
      269
      58757
      1.0
      43.0
      1.0
      Oceana
      4
      3
      2
      3
      4
      3
      5
    
    
      440
      56676
      1.0
      42.0
      1.0
      Oceana
      5
      4
      1
      3
      3
      2
      4
    
    
      441
      58098
      0.0
      41.0
      1.0
      Oceana
      5
      3
      4
      3
      3
      4
      4
    
    
      118
      78703
      1.0
      40.0
      1.0
      Oceana
      3
      4
      2
      5
      5
      4
      3
    
    
      245
      78405
      0.0
      40.0
      1.0
      Oceana
      4
      3
      3
      4
      4
      4
      5
    
    
      170
      78731
      1.0
      38.0
      1.0
      Oceana
      3
      2
      2
      4
      5
      4
      5
    
    
      93
      60470
      0.0
      37.0
      1.0
      Oceana
      2
      2
      2
      3
      5
      5
      5
    
    
      116
      56613
      0.0
      35.0
      1.0
      Oceana
      5
      5
      4
      4
      5
      5
      5
    
    
      111
      61465
      0.0
      34.0
      1.0
      Oceana
      2
      2
      2
      2
      4
      4
      5
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      201
      3085
      0.0
      24.0
      4.0
      Oceana
      3
      1
      2
      3
      5
      3
      5
    
    
      149
      4360
      1.0
      23.0
      4.0
      Oceana
      3
      2
      1
      4
      5
      4
      5
    
    
      247
      78281
      1.0
      22.0
      4.0
      Oceana
      3
      1
      1
      3
      5
      5
      5
    
    
      60
      30266
      0.0
      19.0
      4.0
      Oceana
      1
      3
      4
      5
      5
      5
      5
    
    
      291
      28638
      1.0
      19.0
      4.0
      Oceana
      4
      3
      3
      3
      4
      4
      5
    
    
      45
      23208
      0.0
      18.0
      4.0
      Oceana
      2
      3
      1
      3
      3
      3
      3
    
    
      271
      42296
      0.0
      18.0
      4.0
      Oceana
      4
      2
      2
      3
      4
      3
      5
    
    
      192
      23288
      1.0
      62.0
      5.0
      Oceana
      3
      2
      2
      4
      3
      3
      5
    
    
      256
      23924
      0.0
      60.0
      5.0
      Oceana
      3
      1
      1
      3
      4
      4
      5
    
    
      72
      30741
      1.0
      55.0
      5.0
      Oceana
      4
      5
      5
      5
      5
      5
      5
    
    
      4
      44532
      1.0
      50.0
      5.0
      Oceana
      1
      5
      3
      5
      5
      3
      5
    
    
      14
      30513
      0.0
      49.0
      5.0
      Oceana
      1
      3
      4
      4
      4
      5
      5
    
    
      19
      41371
      1.0
      41.0
      5.0
      Oceana
      1
      2
      1
      4
      4
      3
      5
    
    
      281
      92195
      1.0
      31.0
      5.0
      Oceana
      4
      2
      1
      3
      4
      5
      5
    
    
      23
      92556
      0.0
      30.0
      5.0
      Oceana
      1
      2
      1
      4
      5
      4
      5
    
    
      133
      88111
      0.0
      30.0
      5.0
      Oceana
      3
      3
      2
      4
      5
      5
      5
    
    
      175
      23450
      1.0
      30.0
      5.0
      Oceana
      3
      1
      1
      4
      4
      4
      5
    
    
      251
      91858
      1.0
      29.0
      5.0
      Oceana
      5
      2
      1
      5
      5
      3
      5
    
    
      63
      90418
      1.0
      24.0
      5.0
      Oceana
      2
      3
      2
      5
      5
      3
      5
    
    
      43
      31052
      0.0
      22.0
      5.0
      Oceana
      1
      4
      3
      4
      5
      4
      5
    
    
      189
      90645
      0.0
      22.0
      5.0
      Oceana
      3
      2
      1
      4
      4
      4
      5
    
    
      238
      92750
      1.0
      19.0
      5.0
      Oceana
      4
      3
      3
      4
      4
      5
      5
    
    
      115
      24643
      0.0
      59.0
      6.0
      Oceana
      3
      3
      2
      4
      4
      3
      5
    
    
      196
      23272
      0.0
      37.0
      6.0
      Oceana
      3
      2
      2
      5
      5
      4
      5
    
    
      62
      30485
      0.0
      68.0
      7.0
      Oceana
      2
      3
      4
      3
      4
      5
      5
    
    
      34
      25441
      0.0
      42.0
      7.0
      Oceana
      2
      4
      4
      4
      4
      4
      5
    
    
      401
      26614
      0.0
      33.0
      7.0
      Oceana
      5
      3
      5
      4
      5
      4
      3
    
    
      151
      88856
      0.0
      30.0
      7.0
      Oceana
      3
      2
      2
      5
      4
      4
      5
    
    
      58
      60125
      0.0
      41.0
      NaN
      Oceana
      1
      5
      2
      4
      5
      5
      5
    
    
      313
      60617
      NaN
      1.0
      NaN
      Oceana
      4
      3
      4
      3
      4
      4
      4
    
  

150 rows × 12 columns

6 - Create a list of age categories, labled 0, 1, and 2 for each row for the three groups made in part (4). Attach this list to the free_data dataframe as a column named age_cat.



In [28]:

    
# create the list of categories
age_cat = []
for i in range(3):
    # concatenate a list of i's of the same length of the i-th dataframe created in part 4
    age_cat  = age_cat + [i] * ages_df_list[i].shape[0]

# sort values by age
free_data.sort_values('age', ascending=True, inplace=True)
# add the list as a new column
free_data.loc[pd.notnull(free_data['age']), 'age_cat'] = age_cat

free_data.head()

7 - Print the mean for all columns for each age_cat using groupby.



In [29]:

    
free_data.groupby('age_cat').mean()









    Out[29]:






  
    
      
      id
      sex
      age
      educ
      y
      v1
      v2
      v3
      v4
      v5
      v6
    
    
      age_cat
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      0.0
      97924.298851
      0.534884
      20.505747
      3.317647
      3.643678
      2.517241
      2.597701
      3.655172
      3.954023
      3.781609
      4.471264
    
    
      1.0
      92976.567797
      0.559322
      36.686441
      3.034335
      3.605932
      2.631356
      2.474576
      3.580508
      4.055085
      3.830508
      4.266949
    
    
      2.0
      81100.097561
      0.569106
      62.845528
      2.552846
      3.243902
      2.780488
      2.593496
      3.821138
      4.219512
      3.975610
      4.520325

8 - Print the mean education for each age_cat using groupby.



In [30]:

    
free_data.groupby('age_cat')['educ'].mean()









    Out[30]:





age_cat
0.0    3.317647
1.0    3.034335
2.0    2.552846
Name: educ, dtype: float64

9 - Print summary statistics for each column for those with an education greater than or equal to 5, grouped by age_cat.



In [31]:

    
free_data[free_data['educ'] >= 5].groupby('age_cat').describe()









    Out[31]:






  
    
      
      
      age
      educ
      id
      sex
      v1
      v2
      v3
      v4
      v5
      v6
      y
    
    
      age_cat
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      0.0
      count
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
      20.000000
    
    
      mean
      21.750000
      5.200000
      97745.450000
      0.500000
      2.800000
      3.100000
      3.750000
      4.200000
      4.000000
      4.650000
      3.150000
    
    
      std
      1.585294
      0.410391
      35834.636660
      0.512989
      1.151658
      1.483240
      0.786398
      0.695852
      0.858395
      0.587143
      1.386969
    
    
      min
      19.000000
      5.000000
      31052.000000
      0.000000
      1.000000
      1.000000
      3.000000
      3.000000
      3.000000
      3.000000
      1.000000
    
    
      25%
      20.000000
      5.000000
      90588.250000
      0.000000
      2.000000
      2.000000
      3.000000
      4.000000
      3.000000
      4.000000
      2.000000
    
    
      50%
      22.000000
      5.000000
      108682.000000
      0.500000
      3.000000
      3.000000
      4.000000
      4.000000
      4.000000
      5.000000
      3.000000
    
    
      75%
      23.000000
      5.000000
      116392.250000
      1.000000
      4.000000
      4.250000
      4.000000
      5.000000
      5.000000
      5.000000
      4.000000
    
    
      max
      24.000000
      6.000000
      171662.000000
      1.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
    
    
      1.0
      count
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
      43.000000
    
    
      mean
      36.767442
      5.488372
      89611.930233
      0.488372
      2.767442
      2.395349
      3.581395
      4.093023
      3.720930
      4.162791
      3.581395
    
    
      std
      7.057037
      0.702787
      39754.316402
      0.505781
      1.087531
      1.177963
      1.096153
      0.921023
      1.031079
      1.111203
      1.257976
    
    
      min
      25.000000
      5.000000
      11633.000000
      0.000000
      1.000000
      1.000000
      1.000000
      2.000000
      1.000000
      2.000000
      1.000000
    
    
      25%
      30.000000
      5.000000
      50499.500000
      0.000000
      2.000000
      2.000000
      3.000000
      4.000000
      3.000000
      3.000000
      3.000000
    
    
      50%
      37.000000
      5.000000
      108928.000000
      0.000000
      3.000000
      2.000000
      4.000000
      4.000000
      4.000000
      5.000000
      4.000000
    
    
      75%
      42.000000
      6.000000
      116362.500000
      1.000000
      3.500000
      3.000000
      4.000000
      5.000000
      4.000000
      5.000000
      5.000000
    
    
      max
      50.000000
      7.000000
      162131.000000
      1.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
    
    
      2.0
      count
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
      18.000000
    
    
      mean
      58.555556
      5.555556
      83721.611111
      0.666667
      2.888889
      2.833333
      4.111111
      4.166667
      4.222222
      4.611111
      3.222222
    
    
      std
      5.802862
      0.704792
      54036.236832
      0.485071
      1.182663
      1.465285
      0.676400
      1.098127
      0.808452
      0.849837
      1.060275
    
    
      min
      51.000000
      5.000000
      23288.000000
      0.000000
      1.000000
      1.000000
      3.000000
      1.000000
      3.000000
      2.000000
      1.000000
    
    
      25%
      54.250000
      5.000000
      34431.750000
      0.000000
      2.000000
      2.000000
      4.000000
      4.000000
      4.000000
      5.000000
      3.000000
    
    
      50%
      58.000000
      5.000000
      51936.500000
      1.000000
      3.000000
      2.000000
      4.000000
      4.500000
      4.000000
      5.000000
      3.000000
    
    
      75%
      63.500000
      6.000000
      132317.250000
      1.000000
      3.750000
      4.000000
      4.750000
      5.000000
      5.000000
      5.000000
      4.000000
    
    
      max
      68.000000
      7.000000
      164869.000000
      1.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000
      5.000000

10 - Which of the vignette has the largest mean score for each education level? What about the median?

In this survey six vignettes were showed to different people and they were asked to determine the degree of freedom of the situation depicted.

These are the possible votes:

Completely free
Very free
Moderately free
Slightly free
Not free at all

and these are the vignettes:

Kay does not like many of the government's policies. She frequently publishes her opinion in newspapers, criticizing decisions by officials and calling for change. She sees little reason these actions could lead to government reprisal.
Michael disagrees with many of the government's policies. Though he knows criticism is frowned upon, he doesn't believe the government would punish someone for expressing critical views. He makes his opinion known on most issues without regard to who is listening.
Bob has political views at odds with the government. He has heard of people occasionally being arrested for speaking out against the government, and government leaders sometimes make political speeches condemning those who criticize. He sometimes writes letters to newspapers about politics, but he is careful not to use his real name.
Connie does not like the government's stance on many issues. She has a friend who was arrested for being too openly critical of governmental leaders, and so she avoids voicing her opinions in public places.
Vito disagrees with many of the government's policies, and is very careful about whom he says this to, reserving his real opinions for family and close friends only. He knows several men who have been taken away by government officials for saying negative things in public.
Sonny lives in fear of being harassed for his political views. Everyone he knows who has spoken out against the government has been arrested or taken away. He never says a word about anything the government does, not even when he is at home alone with his family.

It seems (reasonably enough) that the sixth vignette has the highest mean and median score among the situations proposed in all but two cases:



In [34]:

    
print(free_data.groupby('educ')['v1', 'v2', 'v3', 'v4', 'v5', 'v6'].mean().apply(np.argmax, axis=1))
print(free_data.groupby('educ')['v1', 'v2', 'v3', 'v4', 'v5', 'v6'].median().apply(np.argmax, axis=1))









    



educ
1.0    v6
2.0    v6
3.0    v6
4.0    v6
5.0    v6
6.0    v6
7.0    v4
dtype: object
educ
1.0    v6
2.0    v6
3.0    v6
4.0    v6
5.0    v6
6.0    v4
7.0    v6
dtype: object

12 - Which country would you say has the most freedom of speech? Be sure to justify your answer quantitatively.

I would say Oceana has the most freedom of speech, because it has the higher mean for the vignettes, expecially vignettes 4 and 5.

It should be noted also that the vignette 5 has a surprisingly low mean in the other two countries, more so if compared to vignette 4 or 3.



In [33]:

    
free_data.groupby('country')['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'y'].describe()

13 - Is there a difference of opinion between men and women regarding freedom of speech? If any, does this difference manifest itself accross the different countries? Accross education levels? Be sure to justify your answers quantiatively.

We can say that women think of themselves as slightly more free than men:



In [34]:

    
print(free_data.groupby('sex')['y'].mean()) # men thing of themselves as slightly more free









    



sex
0.0    3.417085
1.0    3.600000
Name: y, dtype: float64

and that this difference is greater in Eastasia and almost none in Eurasia:



In [35]:

    
print(free_data.groupby(['sex', 'country'])['y'].mean().unstack())









    



country  Eastasia  Eurasia    Oceana
sex                                 
0.0       3.50000  4.00000  2.840000
1.0       3.76087  4.02381  2.918919

The difference is also present across education levels: curiously uneducated women and women with secondary school degree or higher feel freer than men but women with an intermediate education feel less free than men. We can also observe a tendency, aside from uneducated people, for women to feel freer the more they are educated when compared to men.

Lastly, it's pretty remarkable that the most educated men does not feel free at all, but this can be due to the small sample size.



In [36]:

    
print(free_data.groupby(['sex', 'educ'])['y'].mean().unstack())









    



educ       1.0       2.0       3.0       4.0       5.0       6.0  7.0
sex                                                                  
0.0   3.600000  3.708333  3.480000  3.250000  3.333333  2.777778  3.4
1.0   3.974359  3.312500  3.446429  3.282051  3.551724  3.333333  5.0

Merging, Indexes, and `Apply()`

Much of the power of Data Sciences comes from the ability to join together datasets from very different sources. One could be interested in seeing if there is a relationship between housing prices and prevalence of infectious disease in a given ZIP code for example. This task is often referred to as a merge or join.

Every Pandas Data Frame has an index. Indices in Pandas are a bit of a complex topic, but for the time being consider them to be a unique identifier for each row in a Data Frame. When performing joins and manipulating Data Frames, it is important to remember that your task may require the creation or change of the Data Frame's index. For more extensive reading on this topic, consult the Pandas Documentation.

And lastly, if you are coming from a programming background like C/C++ or Java, you are likely very accustomed to operating on arrays and lists using for loops. Often this is how you will want to work with Data Frames in Python, but Pandas also provides functionality for functional like programming by utilizing the Apply() function. This is similar to the apply family of functions in R and the Map() and related functions in Lisp. Making use of Apply() in Python can make your code more concise, readable, and faster when performing operations on an entire Data Frame.

Using on Pandas, perform the following exercises.

1 - Using the free1.csv downloaded above, import it as a Data Frame named free_data and rename the first column to id.



In [37]:

    
free_data = pd.read_csv('free1.csv')
col = list(free_data.columns)
col[0] = 'id'
free_data.columns = col

free_data.head()

2 - Create a dataframe named free_sub, consisting of the id, country, and y columns from free_data.



In [38]:

    
free_sub = free_data[['id', 'country', 'y']]
free_sub.head()

3 - Create a new Data Frame called ed_level, consisting of the id and three categories of education levels, labeled high, med, and low, for ranges of your choosing. Do this using a for loop.



In [39]:

    
# create a dataframe with one column
ed_level = pd.DataFrame(free_data['id'])
# add a blank column
ed_level['educ_cat'] = None
# dictionary of labels
educ_cat = {1:'low', 2:'low', 3:'med', 4:'med', 5:'high', 6:'high', 7:'high'}

# for every row in the dataframe if the corresponding educ column in the free data is not missing
# use the dictionary to set the corresponding label
for i in range(0, len(ed_level)):
    if pd.notnull(free_data.loc[i, 'educ']):
        ed_level.loc[i, 'educ_cat'] = educ_cat[free_data.loc[i, 'educ']]

ed_level.head(5)

4 - Merge free_sub and ed_level together. Which column should the merge be performed on? Do this using both the concat() and merge() functions.

The merge should be performed on the id column, which is used implicitly by the merge function.

The concat function returns a DataFrame that duplicates all the rows (and indexes) setting to NaN the missing values for each of the original datasets.

The merge function instead performs an inner join returning the rows from the first dataset enriched with the values of the second.



In [40]:

    
# Solution: the id is set as an index for both columns
pd.concat([free_sub, ed_level])









    Out[40]:






  
    
      
      country
      educ_cat
      id
      y
    
  
  
    
      0
      Eurasia
      NaN
      109276
      1.0
    
    
      1
      Oceana
      NaN
      88178
      2.0
    
    
      2
      Eastasia
      NaN
      111063
      2.0
    
    
      3
      Eastasia
      NaN
      161488
      2.0
    
    
      4
      Oceana
      NaN
      44532
      1.0
    
    
      5
      Eastasia
      NaN
      95503
      1.0
    
    
      6
      Oceana
      NaN
      26276
      2.0
    
    
      7
      Oceana
      NaN
      26299
      2.0
    
    
      8
      Oceana
      NaN
      89206
      2.0
    
    
      9
      Oceana
      NaN
      24342
      1.0
    
    
      10
      Eurasia
      NaN
      109960
      1.0
    
    
      11
      Eurasia
      NaN
      117509
      2.0
    
    
      12
      Eastasia
      NaN
      11633
      3.0
    
    
      13
      Oceana
      NaN
      57867
      1.0
    
    
      14
      Oceana
      NaN
      30513
      1.0
    
    
      15
      Oceana
      NaN
      91576
      2.0
    
    
      16
      Oceana
      NaN
      42161
      1.0
    
    
      17
      Oceana
      NaN
      74317
      1.0
    
    
      18
      Oceana
      NaN
      661
      2.0
    
    
      19
      Oceana
      NaN
      41371
      1.0
    
    
      20
      Oceana
      NaN
      60279
      1.0
    
    
      21
      Eurasia
      NaN
      110851
      2.0
    
    
      22
      Oceana
      NaN
      22764
      1.0
    
    
      23
      Oceana
      NaN
      92556
      1.0
    
    
      24
      Eastasia
      NaN
      38858
      4.0
    
    
      25
      Eastasia
      NaN
      71819
      4.0
    
    
      26
      Eurasia
      NaN
      108399
      2.0
    
    
      27
      Oceana
      NaN
      27707
      4.0
    
    
      28
      Eastasia
      NaN
      166386
      1.0
    
    
      29
      Eastasia
      NaN
      8524
      4.0
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      420
      NaN
      low
      116011
      NaN
    
    
      421
      NaN
      low
      107534
      NaN
    
    
      422
      NaN
      med
      111669
      NaN
    
    
      423
      NaN
      low
      118728
      NaN
    
    
      424
      NaN
      med
      121362
      NaN
    
    
      425
      NaN
      med
      108776
      NaN
    
    
      426
      NaN
      high
      109604
      NaN
    
    
      427
      NaN
      low
      120098
      NaN
    
    
      428
      NaN
      low
      116405
      NaN
    
    
      429
      NaN
      high
      141024
      NaN
    
    
      430
      NaN
      low
      108443
      NaN
    
    
      431
      NaN
      med
      110293
      NaN
    
    
      432
      NaN
      med
      117227
      NaN
    
    
      433
      NaN
      med
      117445
      NaN
    
    
      434
      NaN
      med
      110560
      NaN
    
    
      435
      NaN
      med
      20416
      NaN
    
    
      436
      NaN
      low
      60079
      NaN
    
    
      437
      NaN
      high
      52621
      NaN
    
    
      438
      NaN
      med
      108533
      NaN
    
    
      439
      NaN
      med
      109068
      NaN
    
    
      440
      NaN
      low
      56676
      NaN
    
    
      441
      NaN
      low
      58098
      NaN
    
    
      442
      NaN
      high
      117252
      NaN
    
    
      443
      NaN
      med
      110212
      NaN
    
    
      444
      NaN
      med
      168326
      NaN
    
    
      445
      NaN
      low
      95744
      NaN
    
    
      446
      NaN
      med
      109491
      NaN
    
    
      447
      NaN
      low
      65788
      NaN
    
    
      448
      NaN
      med
      147766
      NaN
    
    
      449
      NaN
      med
      116952
      NaN
    
  

900 rows × 4 columns



In [41]:

    
pd.merge(free_sub, ed_level)









    Out[41]:






  
    
      
      id
      country
      y
      educ_cat
    
  
  
    
      0
      109276
      Eurasia
      1
      med
    
    
      1
      88178
      Oceana
      2
      med
    
    
      2
      111063
      Eastasia
      2
      low
    
    
      3
      161488
      Eastasia
      2
      high
    
    
      4
      44532
      Oceana
      1
      high
    
    
      5
      95503
      Eastasia
      1
      high
    
    
      6
      26276
      Oceana
      2
      low
    
    
      7
      26299
      Oceana
      2
      med
    
    
      8
      89206
      Oceana
      2
      low
    
    
      9
      24342
      Oceana
      1
      med
    
    
      10
      109960
      Eurasia
      1
      med
    
    
      11
      117509
      Eurasia
      2
      high
    
    
      12
      11633
      Eastasia
      3
      high
    
    
      13
      57867
      Oceana
      1
      low
    
    
      14
      30513
      Oceana
      1
      high
    
    
      15
      91576
      Oceana
      2
      med
    
    
      16
      42161
      Oceana
      1
      med
    
    
      17
      74317
      Oceana
      1
      med
    
    
      18
      661
      Oceana
      2
      med
    
    
      19
      41371
      Oceana
      1
      high
    
    
      20
      60279
      Oceana
      1
      low
    
    
      21
      110851
      Eurasia
      2
      high
    
    
      22
      22764
      Oceana
      1
      med
    
    
      23
      92556
      Oceana
      1
      high
    
    
      24
      38858
      Eastasia
      4
      high
    
    
      25
      71819
      Eastasia
      4
      low
    
    
      26
      108399
      Eurasia
      2
      high
    
    
      27
      27707
      Oceana
      4
      med
    
    
      28
      166386
      Eastasia
      1
      med
    
    
      29
      8524
      Eastasia
      4
      low
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      420
      116011
      Eurasia
      5
      low
    
    
      421
      107534
      Eastasia
      4
      low
    
    
      422
      111669
      Eastasia
      5
      med
    
    
      423
      118728
      Eurasia
      5
      low
    
    
      424
      121362
      Eastasia
      5
      med
    
    
      425
      108776
      Eurasia
      5
      med
    
    
      426
      109604
      Eurasia
      4
      high
    
    
      427
      120098
      Eurasia
      4
      low
    
    
      428
      116405
      Eurasia
      4
      low
    
    
      429
      141024
      Eastasia
      5
      high
    
    
      430
      108443
      Eurasia
      5
      low
    
    
      431
      110293
      Eurasia
      5
      med
    
    
      432
      117227
      Eurasia
      5
      med
    
    
      433
      117445
      Eurasia
      5
      med
    
    
      434
      110560
      Eurasia
      4
      med
    
    
      435
      20416
      Eastasia
      5
      med
    
    
      436
      60079
      Oceana
      4
      low
    
    
      437
      52621
      Eastasia
      4
      high
    
    
      438
      108533
      Eurasia
      5
      med
    
    
      439
      109068
      Eurasia
      4
      med
    
    
      440
      56676
      Oceana
      5
      low
    
    
      441
      58098
      Oceana
      5
      low
    
    
      442
      117252
      Eurasia
      5
      high
    
    
      443
      110212
      Eurasia
      4
      med
    
    
      444
      168326
      Eastasia
      5
      med
    
    
      445
      95744
      Eastasia
      3
      low
    
    
      446
      109491
      Eurasia
      3
      med
    
    
      447
      65788
      Eastasia
      5
      low
    
    
      448
      147766
      Eastasia
      4
      med
    
    
      449
      116952
      Eurasia
      5
      med
    
  

450 rows × 4 columns

5 - Use the append() function to join together free_sub and ed_level. Are the results the same as in part (4)? If not, how could you reproduce the result append() by using concat() or merge()?



In [42]:

    
free_sub.append(ed_level)









    Out[42]:






  
    
      
      country
      educ_cat
      id
      y
    
  
  
    
      0
      Eurasia
      NaN
      109276
      1.0
    
    
      1
      Oceana
      NaN
      88178
      2.0
    
    
      2
      Eastasia
      NaN
      111063
      2.0
    
    
      3
      Eastasia
      NaN
      161488
      2.0
    
    
      4
      Oceana
      NaN
      44532
      1.0
    
    
      5
      Eastasia
      NaN
      95503
      1.0
    
    
      6
      Oceana
      NaN
      26276
      2.0
    
    
      7
      Oceana
      NaN
      26299
      2.0
    
    
      8
      Oceana
      NaN
      89206
      2.0
    
    
      9
      Oceana
      NaN
      24342
      1.0
    
    
      10
      Eurasia
      NaN
      109960
      1.0
    
    
      11
      Eurasia
      NaN
      117509
      2.0
    
    
      12
      Eastasia
      NaN
      11633
      3.0
    
    
      13
      Oceana
      NaN
      57867
      1.0
    
    
      14
      Oceana
      NaN
      30513
      1.0
    
    
      15
      Oceana
      NaN
      91576
      2.0
    
    
      16
      Oceana
      NaN
      42161
      1.0
    
    
      17
      Oceana
      NaN
      74317
      1.0
    
    
      18
      Oceana
      NaN
      661
      2.0
    
    
      19
      Oceana
      NaN
      41371
      1.0
    
    
      20
      Oceana
      NaN
      60279
      1.0
    
    
      21
      Eurasia
      NaN
      110851
      2.0
    
    
      22
      Oceana
      NaN
      22764
      1.0
    
    
      23
      Oceana
      NaN
      92556
      1.0
    
    
      24
      Eastasia
      NaN
      38858
      4.0
    
    
      25
      Eastasia
      NaN
      71819
      4.0
    
    
      26
      Eurasia
      NaN
      108399
      2.0
    
    
      27
      Oceana
      NaN
      27707
      4.0
    
    
      28
      Eastasia
      NaN
      166386
      1.0
    
    
      29
      Eastasia
      NaN
      8524
      4.0
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      420
      NaN
      low
      116011
      NaN
    
    
      421
      NaN
      low
      107534
      NaN
    
    
      422
      NaN
      med
      111669
      NaN
    
    
      423
      NaN
      low
      118728
      NaN
    
    
      424
      NaN
      med
      121362
      NaN
    
    
      425
      NaN
      med
      108776
      NaN
    
    
      426
      NaN
      high
      109604
      NaN
    
    
      427
      NaN
      low
      120098
      NaN
    
    
      428
      NaN
      low
      116405
      NaN
    
    
      429
      NaN
      high
      141024
      NaN
    
    
      430
      NaN
      low
      108443
      NaN
    
    
      431
      NaN
      med
      110293
      NaN
    
    
      432
      NaN
      med
      117227
      NaN
    
    
      433
      NaN
      med
      117445
      NaN
    
    
      434
      NaN
      med
      110560
      NaN
    
    
      435
      NaN
      med
      20416
      NaN
    
    
      436
      NaN
      low
      60079
      NaN
    
    
      437
      NaN
      high
      52621
      NaN
    
    
      438
      NaN
      med
      108533
      NaN
    
    
      439
      NaN
      med
      109068
      NaN
    
    
      440
      NaN
      low
      56676
      NaN
    
    
      441
      NaN
      low
      58098
      NaN
    
    
      442
      NaN
      high
      117252
      NaN
    
    
      443
      NaN
      med
      110212
      NaN
    
    
      444
      NaN
      med
      168326
      NaN
    
    
      445
      NaN
      low
      95744
      NaN
    
    
      446
      NaN
      med
      109491
      NaN
    
    
      447
      NaN
      low
      65788
      NaN
    
    
      448
      NaN
      med
      147766
      NaN
    
    
      449
      NaN
      med
      116952
      NaN
    
  

900 rows × 4 columns

The append function produces the same results of concat: it returns a DataFrame with duplicated rows and indices setting to NaN the missing values:

6 - Use numpy to generate two lists 100 random floats labeled y1 and y2. Now create a sequence of integers on the range 0-100 labeled x1 and a sequence of integers on the range 50-150 labeled x2. Create two DataFrames, dat1 and dat2 consisting of x1 and y1, and x2 and y2 respectively, but having labels x, y1, and x, y2. Use merge() to join these two Data Frames together, on x, using both an inner and outer join. What is the difference between the two joins?



In [39]:

    
np.random.seed(0)
# random floats
y1 = np.random.ranf(100)
y2 = np.random.ranf(100)
# random integers
x1 = np.arange(0, 100)
x2 = np.arange(50, 150)
# create the dataframes
dat1 = pd.DataFrame({'x':x1, 'y1':y1})
dat2 = pd.DataFrame({'x':x2, 'y2':y2})

The inner join mantains only the rows of the first dataset which have a counterpart in the second dataset, eventually duplicating them if there is more than one match. This leads, in general, to fewer rows returned:



In [40]:

    
pd.merge(dat1, dat2)

The outer join instead returns all the rows from both the datasets with the eventual matches between them (again duplicating the rows if there are multiple matches):



In [41]:

    
pd.merge(dat1, dat2, how='outer')









    Out[41]:






  
    
      
      x
      y1
      y2
    
  
  
    
      0
      0
      0.548814
      NaN
    
    
      1
      1
      0.715189
      NaN
    
    
      2
      2
      0.602763
      NaN
    
    
      3
      3
      0.544883
      NaN
    
    
      4
      4
      0.423655
      NaN
    
    
      5
      5
      0.645894
      NaN
    
    
      6
      6
      0.437587
      NaN
    
    
      7
      7
      0.891773
      NaN
    
    
      8
      8
      0.963663
      NaN
    
    
      9
      9
      0.383442
      NaN
    
    
      10
      10
      0.791725
      NaN
    
    
      11
      11
      0.528895
      NaN
    
    
      12
      12
      0.568045
      NaN
    
    
      13
      13
      0.925597
      NaN
    
    
      14
      14
      0.071036
      NaN
    
    
      15
      15
      0.087129
      NaN
    
    
      16
      16
      0.020218
      NaN
    
    
      17
      17
      0.832620
      NaN
    
    
      18
      18
      0.778157
      NaN
    
    
      19
      19
      0.870012
      NaN
    
    
      20
      20
      0.978618
      NaN
    
    
      21
      21
      0.799159
      NaN
    
    
      22
      22
      0.461479
      NaN
    
    
      23
      23
      0.780529
      NaN
    
    
      24
      24
      0.118274
      NaN
    
    
      25
      25
      0.639921
      NaN
    
    
      26
      26
      0.143353
      NaN
    
    
      27
      27
      0.944669
      NaN
    
    
      28
      28
      0.521848
      NaN
    
    
      29
      29
      0.414662
      NaN
    
    
      ...
      ...
      ...
      ...
    
    
      120
      120
      NaN
      0.521037
    
    
      121
      121
      NaN
      0.054338
    
    
      122
      122
      NaN
      0.199997
    
    
      123
      123
      NaN
      0.018522
    
    
      124
      124
      NaN
      0.793698
    
    
      125
      125
      NaN
      0.223925
    
    
      126
      126
      NaN
      0.345352
    
    
      127
      127
      NaN
      0.928081
    
    
      128
      128
      NaN
      0.704414
    
    
      129
      129
      NaN
      0.031839
    
    
      130
      130
      NaN
      0.164694
    
    
      131
      131
      NaN
      0.621478
    
    
      132
      132
      NaN
      0.577229
    
    
      133
      133
      NaN
      0.237893
    
    
      134
      134
      NaN
      0.934214
    
    
      135
      135
      NaN
      0.613966
    
    
      136
      136
      NaN
      0.535633
    
    
      137
      137
      NaN
      0.589910
    
    
      138
      138
      NaN
      0.730122
    
    
      139
      139
      NaN
      0.311945
    
    
      140
      140
      NaN
      0.398221
    
    
      141
      141
      NaN
      0.209844
    
    
      142
      142
      NaN
      0.186193
    
    
      143
      143
      NaN
      0.944372
    
    
      144
      144
      NaN
      0.739551
    
    
      145
      145
      NaN
      0.490459
    
    
      146
      146
      NaN
      0.227415
    
    
      147
      147
      NaN
      0.254356
    
    
      148
      148
      NaN
      0.058029
    
    
      149
      149
      NaN
      0.434417
    
  

150 rows × 3 columns

7 - Create a Data Frame, called scores consising of only the y and v_ columns from free_data.



In [46]:

    
scores = free_data[['y', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']]

8 - Using a for loop(s), compute the sum and mean for each column in scores.



In [47]:

    
# initialize an array of zeros to store the results
sums = np.zeros(7)
# calculate the sum with a for loop
for i in range(len(scores)):
    sums += scores.iloc[i]
# calculate the mean out the loop, dividing by the total number of scores
means = sums / len(scores)

print(sums)
print(means)









    



[ 1584.  1192.  1141.  1649.  1838.  1740.  1971.]
[ 3.52        2.64888889  2.53555556  3.66444444  4.08444444  3.86666667
  4.38      ]

9 - Using the apply() function, compute the sum and mean for each column in scores.



In [48]:

    
scores.apply(lambda x: [np.sum(x), np.mean(x)])









    Out[48]:





y                   [1584, 3.52]
v1     [1192, 2.648888888888889]
v2    [1141, 2.5355555555555553]
v3    [1649, 3.6644444444444444]
v4     [1838, 4.084444444444444]
v5    [1740, 3.8666666666666667]
v6                  [1971, 4.38]
dtype: object

10 - Using the apply() function, label each column in scores as either high, med, or low by first computing the mean for each column and assigning the categories at values of your choosing. Do this by writing a single function you can call with apply().



In [49]:

    
scores.apply(lambda x: 'low' if np.mean(x) <= 3 else 'med' if np.mean(x) <= 4 else 'high')









    Out[49]:





y      med
v1     low
v2     low
v3     med
v4    high
v5     med
v6    high
dtype: object

Time Series

In many situations you may not know the relationship between two variables but you do know that there ought to be one. Take for example the daily price of beef and grain. It is reasonable to assume that there exists some, perhaps even a causal, relationship between these two, but due to the complexity of the phenomenon, and the vast number of underlying latent variables involved (fuel price, politics, famine, etc...), you likely have little hope to uncover such a relationship in a reasonable amount of time. However, you do know that these two variables are related in time and may exibit some pattern that repeats itself in time. Identifying these types of patterns is called Time Series Analysis and sequencing your data such that each data point is represented as a unique point in time is called a Time Series. The canonical example of a Time Series is, of course, stock market data which is what we will be using for this exercise

Do the following exercises.

1 - Create a start and end datetime object, starting at a date of your choosing and ending today.



In [50]:

    
start = pd.datetime(2016, 1, 1)
end = pd.datetime.today()

print(start, end)









    



2016-01-01 00:00:00 2017-07-26 10:11:27.586984

2 - For three stocks of your choosing, put their symbols into a list and use pandas to retrieve their data from google for the time frame you created in part (1). Print the results.



In [51]:

    
import pandas_datareader.data as web

It turns out that the DataReader method return a Panel object if called passing a list of stocks, so we can transform it in a DataFrame and unstack it to obtain the derired result:



In [52]:

    
stock_code = ['GOOG', 'FB', 'MSFT']

stock_panel = web.DataReader(stock_code, start=start, end=end, data_source='google')

stock_prices = stock_panel.to_frame().unstack(level=1)

stock_prices









    Out[52]:






  
    
      
      Open
      High
      Low
      Close
      Volume
    
    
      minor
      FB
      GOOG
      MSFT
      FB
      GOOG
      MSFT
      FB
      GOOG
      MSFT
      FB
      GOOG
      MSFT
      FB
      GOOG
      MSFT
    
    
      Date
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      2016-01-04
      101.95
      743.00
      54.32
      102.24
      744.06
      54.80
      99.75
      731.26
      53.39
      102.22
      741.84
      54.80
      37717312.0
      3258199.0
      52843210.0
    
    
      2016-01-05
      102.89
      746.45
      54.93
      103.71
      752.00
      55.39
      101.66
      738.64
      54.54
      102.73
      742.58
      55.05
      23258238.0
      1950691.0
      34079674.0
    
    
      2016-01-06
      101.13
      730.00
      54.32
      103.77
      747.18
      54.40
      100.90
      728.92
      53.64
      102.97
      743.62
      54.05
      25096183.0
      1947034.0
      39518863.0
    
    
      2016-01-07
      100.50
      730.31
      52.70
      101.43
      738.50
      53.48
      97.30
      719.06
      52.07
      97.92
      726.39
      52.17
      45172906.0
      2963741.0
      56564852.0
    
    
      2016-01-08
      99.88
      731.45
      52.37
      100.50
      733.23
      53.28
      97.03
      713.00
      52.15
      97.33
      714.47
      52.33
      35402298.0
      2450857.0
      48753969.0
    
    
      2016-01-11
      97.91
      716.61
      52.51
      98.60
      718.86
      52.85
      95.39
      703.54
      51.46
      97.51
      716.03
      52.30
      29932385.0
      2090621.0
      36943756.0
    
    
      2016-01-12
      99.00
      721.68
      52.76
      99.96
      728.75
      53.10
      97.55
      717.32
      52.06
      99.37
      726.07
      52.78
      28395390.0
      2024509.0
      36095539.0
    
    
      2016-01-13
      100.58
      730.85
      53.80
      100.58
      734.74
      54.07
      95.21
      698.61
      51.30
      95.44
      700.56
      51.64
      33091576.0
      2468295.0
      66119018.0
    
    
      2016-01-14
      95.85
      705.38
      52.00
      98.87
      721.92
      53.42
      92.45
      689.10
      51.57
      98.37
      714.72
      53.11
      48488509.0
      2211853.0
      51429807.0
    
    
      2016-01-15
      93.98
      692.29
      51.31
      96.38
      706.74
      51.97
      93.54
      685.37
      50.34
      94.97
      694.45
      50.99
      45935550.0
      3592449.0
      70739137.0
    
    
      2016-01-19
      96.53
      703.30
      51.48
      97.42
      709.98
      51.68
      93.92
      693.41
      50.06
      95.26
      701.79
      50.56
      30620565.0
      2258479.0
      43128276.0
    
    
      2016-01-20
      92.83
      688.61
      49.98
      95.00
      706.85
      51.38
      89.37
      673.26
      49.10
      94.35
      698.45
      50.79
      58982367.0
      3439386.0
      63162940.0
    
    
      2016-01-21
      94.91
      702.18
      51.00
      95.99
      719.19
      51.58
      92.62
      694.46
      50.30
      94.16
      706.59
      50.48
      30468830.0
      2410263.0
      40169367.0
    
    
      2016-01-22
      96.41
      723.60
      51.41
      98.07
      728.13
      52.33
      95.49
      720.12
      51.26
      97.94
      725.25
      52.29
      30331442.0
      2006528.0
      36878402.0
    
    
      2016-01-25
      98.72
      723.58
      51.94
      99.48
      729.68
      52.65
      96.93
      710.01
      51.65
      97.01
      711.67
      51.79
      32319141.0
      1704641.0
      34497323.0
    
    
      2016-01-26
      97.76
      713.85
      51.79
      97.88
      718.28
      52.44
      95.66
      706.48
      51.55
      97.34
      713.04
      52.17
      26637412.0
      1324300.0
      28699484.0
    
    
      2016-01-27
      97.79
      713.67
      52.01
      97.85
      718.24
      52.20
      94.23
      694.39
      51.02
      94.45
      699.99
      51.22
      48311872.0
      2139970.0
      36266102.0
    
    
      2016-01-28
      107.20
      722.22
      51.86
      110.34
      733.69
      52.21
      104.81
      712.35
      51.25
      109.11
      730.96
      52.06
      107084072.0
      2658016.0
      58489190.0
    
    
      2016-01-29
      108.99
      731.53
      54.73
      112.84
      744.99
      55.09
      108.84
      726.80
      54.00
      112.21
      742.95
      55.09
      62357459.0
      3394935.0
      83137039.0
    
    
      2016-02-01
      112.27
      750.46
      54.88
      115.72
      757.86
      55.09
      112.01
      743.27
      54.50
      115.09
      752.00
      54.71
      45840867.0
      4801816.0
      43587105.0
    
    
      2016-02-02
      114.80
      784.50
      54.17
      117.59
      789.87
      54.26
      113.20
      764.65
      52.65
      114.61
      764.65
      53.00
      59644925.0
      6332431.0
      54453559.0
    
    
      2016-02-03
      115.27
      770.22
      53.25
      115.34
      774.50
      53.39
      109.75
      720.50
      51.26
      112.69
      726.95
      52.16
      56848084.0
      6162333.0
      57347519.0
    
    
      2016-02-04
      111.80
      722.81
      52.10
      111.94
      727.00
      52.81
      109.25
      701.86
      51.37
      110.49
      708.01
      52.00
      38648485.0
      5145855.0
      46803371.0
    
    
      2016-02-05
      109.51
      703.87
      51.94
      109.58
      703.99
      52.00
      103.18
      680.15
      49.56
      104.07
      683.57
      50.16
      76522716.0
      5069985.0
      60651115.0
    
    
      2016-02-08
      100.41
      667.85
      49.55
      102.68
      684.03
      49.57
      97.46
      663.06
      48.19
      99.75
      682.74
      49.41
      71016537.0
      4212541.0
      57006105.0
    
    
      2016-02-09
      97.14
      672.32
      49.02
      102.40
      699.90
      50.24
      96.82
      668.77
      48.67
      99.54
      678.11
      49.28
      62580050.0
      3604335.0
      45822192.0
    
    
      2016-02-10
      101.55
      686.86
      49.89
      103.25
      701.31
      50.39
      100.24
      682.13
      49.52
      101.00
      684.12
      49.71
      44943948.0
      2627379.0
      37899918.0
    
    
      2016-02-11
      99.60
      675.00
      48.68
      105.11
      689.35
      50.11
      98.88
      668.87
      48.51
      101.91
      683.11
      49.69
      43493102.0
      3007223.0
      48448154.0
    
    
      2016-02-12
      103.74
      690.26
      50.25
      104.24
      693.75
      50.68
      101.09
      678.60
      49.75
      102.01
      682.40
      50.50
      36038104.0
      2129831.0
      33292336.0
    
    
      2016-02-16
      103.80
      692.98
      50.90
      103.93
      698.00
      51.09
      100.24
      685.05
      50.13
      101.61
      691.00
      51.09
      45438324.0
      2497024.0
      35937137.0
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2017-06-13
      150.15
      951.91
      70.02
      151.18
      959.98
      70.82
      148.90
      944.09
      69.96
      150.68
      953.40
      70.65
      20483350.0
      2013337.0
      25258614.0
    
    
      2017-06-14
      151.26
      959.92
      70.91
      152.40
      961.15
      71.10
      149.05
      942.25
      69.43
      150.25
      950.76
      70.27
      20808759.0
      1489715.0
      25510685.0
    
    
      2017-06-15
      147.67
      933.97
      69.27
      150.04
      943.34
      70.21
      146.37
      924.44
      68.80
      149.80
      942.31
      69.90
      18994152.0
      2133050.0
      26068709.0
    
    
      2017-06-16
      149.59
      940.00
      69.73
      150.83
      942.04
      70.03
      148.60
      931.60
      69.22
      150.64
      939.78
      70.00
      22882417.0
      3094711.0
      48345085.0
    
    
      2017-06-19
      151.71
      949.96
      70.50
      153.57
      959.99
      70.94
      151.71
      949.05
      70.35
      152.87
      957.37
      70.87
      19026462.0
      1533336.0
      23798268.0
    
    
      2017-06-20
      152.88
      957.52
      70.82
      153.84
      961.62
      70.87
      152.21
      950.01
      69.87
      152.25
      950.63
      69.91
      14743853.0
      1125990.0
      21512231.0
    
    
      2017-06-21
      152.36
      953.64
      70.21
      154.08
      960.10
      70.62
      151.88
      950.76
      69.94
      153.91
      959.45
      70.27
      15039834.0
      1202233.0
      19891051.0
    
    
      2017-06-22
      153.01
      958.70
      70.54
      154.55
      960.72
      70.59
      152.91
      954.55
      69.71
      153.40
      957.09
      70.26
      13005535.0
      941958.0
      22965738.0
    
    
      2017-06-23
      152.72
      956.83
      70.09
      155.20
      966.00
      71.25
      152.65
      954.20
      69.92
      155.07
      965.59
      71.21
      18026977.0
      1527856.0
      27617291.0
    
    
      2017-06-26
      156.25
      969.90
      71.40
      156.50
      973.31
      71.71
      153.20
      950.79
      70.44
      153.59
      952.27
      70.53
      18049651.0
      1598355.0
      19606974.0
    
    
      2017-06-27
      152.84
      942.46
      70.11
      153.31
      948.29
      70.18
      150.39
      926.85
      69.18
      150.58
      927.33
      69.21
      19360766.0
      2579930.0
      25215128.0
    
    
      2017-06-28
      150.92
      929.00
      69.21
      153.47
      942.75
      69.84
      149.86
      916.00
      68.79
      153.24
      940.49
      69.80
      16719104.0
      2721406.0
      25806200.0
    
    
      2017-06-29
      152.28
      929.92
      69.38
      152.50
      931.26
      69.49
      148.92
      910.62
      68.09
      151.04
      917.79
      68.49
      23973482.0
      3299176.0
      28918715.0
    
    
      2017-06-30
      151.90
      926.05
      68.78
      151.92
      926.05
      69.38
      150.06
      908.31
      68.74
      150.98
      908.73
      68.93
      17666022.0
      2090226.0
      24161068.0
    
    
      2017-07-03
      151.72
      912.18
      69.33
      152.15
      913.94
      69.60
      147.80
      894.79
      68.02
      148.43
      898.70
      68.17
      13862735.0
      1710373.0
      16165538.0
    
    
      2017-07-05
      149.00
      901.76
      68.26
      150.85
      914.51
      69.44
      148.13
      898.50
      68.22
      150.34
      911.71
      69.08
      14334290.0
      1813884.0
      21176272.0
    
    
      2017-07-06
      149.03
      904.12
      68.27
      150.04
      914.94
      68.78
      148.00
      899.70
      68.12
      148.82
      906.69
      68.57
      14951802.0
      1424503.0
      21117572.0
    
    
      2017-07-07
      149.25
      908.85
      68.70
      151.99
      921.54
      69.84
      149.19
      908.85
      68.70
      151.44
      918.59
      69.46
      13615931.0
      1637785.0
      16878317.0
    
    
      2017-07-10
      151.69
      921.77
      69.46
      153.98
      930.38
      70.25
      151.51
      919.59
      69.20
      153.50
      928.80
      69.98
      13378672.0
      1192825.0
      15014503.0
    
    
      2017-07-11
      153.37
      929.54
      70.11
      155.42
      931.43
      70.68
      152.91
      922.00
      69.75
      155.27
      930.09
      69.99
      13667744.0
      1113235.0
      17460022.0
    
    
      2017-07-12
      156.49
      938.68
      70.69
      159.16
      946.30
      71.28
      156.20
      934.47
      70.55
      158.90
      943.83
      71.15
      22739388.0
      1532144.0
      17750925.0
    
    
      2017-07-13
      158.74
      946.29
      71.50
      159.78
      954.45
      72.04
      158.42
      943.01
      71.31
      159.26
      947.16
      71.77
      13929236.0
      1294687.0
      20269816.0
    
    
      2017-07-14
      160.13
      952.00
      72.24
      160.32
      956.91
      73.27
      159.33
      948.00
      71.96
      159.97
      955.99
      72.78
      16371207.0
      1053774.0
      25868086.0
    
    
      2017-07-17
      160.25
      957.00
      72.80
      160.78
      960.74
      73.45
      158.81
      949.24
      72.72
      159.73
      953.42
      73.35
      12858836.0
      1165537.0
      21803907.0
    
    
      2017-07-18
      159.66
      953.00
      73.09
      163.73
      968.04
      73.39
      159.42
      950.60
      72.66
      162.86
      965.40
      73.30
      23722910.0
      1153964.0
      26435296.0
    
    
      2017-07-19
      163.59
      967.84
      73.50
      165.70
      973.04
      74.04
      163.17
      964.03
      73.45
      164.14
      970.89
      73.86
      26273642.0
      1224540.0
      22416222.0
    
    
      2017-07-20
      164.80
      975.00
      74.18
      165.00
      975.90
      74.30
      162.81
      961.51
      73.28
      164.53
      968.15
      74.22
      18523107.0
      1624463.0
      42361025.0
    
    
      2017-07-21
      164.16
      962.25
      73.45
      165.05
      973.23
      74.29
      163.75
      960.15
      73.17
      164.43
      972.92
      73.79
      14744002.0
      1711000.0
      46717082.0
    
    
      2017-07-24
      164.64
      972.22
      73.53
      166.17
      986.20
      73.75
      164.31
      970.77
      73.13
      166.00
      980.34
      73.60
      17366971.0
      3248347.0
      21394826.0
    
    
      2017-07-25
      165.01
      953.81
      73.80
      165.54
      959.70
      74.31
      163.86
      945.40
      73.50
      165.28
      950.70
      74.19
      15383241.0
      4660979.0
      22018681.0
    
  

393 rows × 15 columns

3 - Create a Data Frame called stock_open for the open prices of the stocks you retrieved in part (2). Print the first few rows.



In [53]:

    
stock_open = stock_prices['Open']
stock_open.head(10)

4 - Compute the total, average, and maximum price for each stock weekly.

If by total we mean sum (it doesn't make too much sense to me though...):



In [54]:

    
stock_open.resample('W').aggregate([sum,np.mean, max])









    Out[54]:






  
    
      
      FB
      GOOG
      MSFT
    
    
      
      sum
      mean
      max
      sum
      mean
      max
      sum
      mean
      max
    
    
      Date
      
      
      
      
      
      
      
      
      
    
  
  
    
      2016-01-10
      506.35
      101.2700
      102.89
      3681.21
      736.2420
      746.45
      268.64
      53.7280
      54.93
    
    
      2016-01-17
      487.32
      97.4640
      100.58
      3566.81
      713.3620
      730.85
      262.38
      52.4760
      53.80
    
    
      2016-01-24
      380.68
      95.1700
      96.53
      2817.69
      704.4225
      723.60
      203.87
      50.9675
      51.48
    
    
      2016-01-31
      510.46
      102.0920
      108.99
      3604.85
      720.9700
      731.53
      262.33
      52.4660
      54.73
    
    
      2016-02-07
      563.65
      112.7300
      115.27
      3731.86
      746.3720
      784.50
      266.34
      53.2680
      54.88
    
    
      2016-02-14
      502.44
      100.4880
      103.74
      3392.29
      678.4580
      690.26
      247.39
      49.4780
      50.25
    
    
      2016-02-21
      414.09
      103.5225
      105.80
      2797.01
      699.2525
      710.00
      206.69
      51.6725
      52.33
    
    
      2016-02-28
      532.36
      106.4720
      108.70
      3506.41
      701.2820
      708.58
      259.64
      51.9280
      52.60
    
    
      2016-03-06
      545.41
      109.0820
      110.25
      3556.61
      711.3220
      719.00
      260.10
      52.0200
      52.97
    
    
      2016-03-13
      535.94
      107.1880
      108.48
      3522.08
      704.4160
      720.00
      260.18
      52.0360
      53.00
    
    
      2016-03-20
      552.59
      110.5180
      112.15
      3658.41
      731.6820
      741.86
      268.04
      53.6080
      54.92
    
    
      2016-03-27
      447.16
      111.7900
      112.13
      2948.33
      737.0825
      742.36
      214.81
      53.7025
      54.11
    
    
      2016-04-03
      572.33
      114.4660
      116.73
      3709.33
      741.8660
      750.10
      272.80
      54.5600
      55.05
    
    
      2016-04-10
      566.69
      113.3380
      114.25
      3713.17
      742.6340
      750.06
      274.52
      54.9040
      55.43
    
    
      2016-04-17
      553.68
      110.7360
      112.23
      3738.17
      747.6340
      754.01
      274.50
      54.9000
      55.30
    
    
      2016-04-24
      557.03
      111.4060
      112.55
      3769.65
      753.9300
      769.51
      276.12
      55.2240
      56.63
    
    
      2016-05-01
      564.70
      112.9400
      119.58
      3547.77
      709.5540
      725.42
      255.49
      51.0980
      52.26
    
    
      2016-05-08
      587.16
      117.4320
      118.04
      3481.07
      696.2140
      698.38
      249.97
      49.9940
      50.34
    
    
      2016-05-15
      599.93
      119.9860
      120.41
      3581.15
      716.2300
      723.41
      254.59
      50.9180
      51.44
    
    
      2016-05-22
      589.01
      117.8020
      119.38
      3532.77
      706.5540
      715.99
      253.95
      50.7900
      51.72
    
    
      2016-05-29
      589.59
      117.9180
      119.56
      3581.03
      716.2060
      724.01
      257.07
      51.4140
      51.93
    
    
      2016-06-05
      475.63
      118.9075
      119.46
      2928.04
      732.0100
      734.53
      209.72
      52.4300
      52.64
    
    
      2016-06-12
      591.29
      118.2580
      119.24
      3611.05
      722.2100
      724.91
      259.30
      51.8600
      52.24
    
    
      2016-06-19
      572.66
      114.5320
      115.30
      3575.55
      715.1100
      719.00
      249.19
      49.8380
      50.41
    
    
      2016-06-26
      567.92
      113.5840
      114.65
      3468.85
      693.7700
      699.06
      253.01
      50.6020
      51.28
    
    
      2016-07-03
      564.44
      112.8880
      114.67
      3410.64
      682.1280
      692.20
      249.78
      49.9560
      51.13
    
    
      2016-07-10
      460.36
      115.0900
      116.63
      2783.62
      695.9050
      699.50
      204.76
      51.1900
      51.73
    
    
      2016-07-17
      589.97
      117.9940
      118.63
      3598.10
      719.6200
      725.73
      266.79
      53.3580
      53.95
    
    
      2016-07-24
      599.38
      119.8760
      121.92
      3672.15
      734.4300
      741.86
      275.62
      55.1240
      56.15
    
    
      2016-07-31
      617.98
      123.5960
      127.52
      3737.74
      747.5480
      772.71
      281.86
      56.3720
      56.61
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2017-01-08
      473.42
      118.3550
      120.98
      3148.51
      787.1275
      795.26
      249.76
      62.4400
      62.79
    
    
      2017-01-15
      625.82
      125.1640
      127.49
      4033.88
      806.7760
      807.86
      313.78
      62.7560
      63.06
    
    
      2017-01-22
      512.78
      128.1950
      128.41
      3224.92
      806.2300
      807.08
      250.26
      62.5650
      62.68
    
    
      2017-01-29
      651.00
      130.2000
      132.68
      4131.69
      826.3380
      837.81
      319.36
      63.8720
      65.39
    
    
      2017-02-05
      658.46
      131.6920
      133.22
      4007.99
      801.5980
      814.66
      321.66
      64.3320
      65.69
    
    
      2017-02-12
      664.41
      132.8820
      134.49
      4031.90
      806.3800
      811.70
      318.58
      63.7160
      64.25
    
    
      2017-02-19
      668.82
      133.7640
      134.70
      4097.31
      819.4620
      823.02
      322.36
      64.4720
      64.74
    
    
      2017-02-26
      537.15
      134.2875
      135.89
      3315.17
      828.7925
      830.12
      257.89
      64.4725
      64.61
    
    
      2017-03-05
      682.24
      136.4480
      137.09
      4143.42
      828.6840
      833.85
      321.43
      64.2860
      64.69
    
    
      2017-03-12
      687.69
      137.5380
      138.91
      4167.14
      833.4280
      843.28
      322.72
      64.5440
      65.19
    
    
      2017-03-19
      698.00
      139.6000
      140.34
      4235.87
      847.1740
      851.61
      323.75
      64.7500
      65.01
    
    
      2017-03-26
      698.35
      139.6700
      141.15
      4174.40
      834.8800
      851.40
      324.52
      64.9040
      65.36
    
    
      2017-04-02
      706.11
      141.2220
      142.40
      4114.83
      822.9660
      833.50
      325.78
      65.1560
      65.65
    
    
      2017-04-09
      709.36
      141.8720
      142.26
      4156.45
      831.2900
      835.51
      328.95
      65.7900
      66.30
    
    
      2017-04-16
      561.14
      140.2850
      141.00
      3294.17
      823.5425
      825.39
      261.92
      65.4800
      65.61
    
    
      2017-04-23
      709.23
      141.8460
      143.90
      4183.34
      836.6680
      842.88
      327.15
      65.4300
      65.67
    
    
      2017-04-30
      734.01
      146.8020
      149.50
      4374.69
      874.9380
      910.66
      340.52
      68.1040
      68.91
    
    
      2017-05-07
      760.30
      152.0600
      153.60
      4586.03
      917.2060
      933.54
      345.70
      69.1400
      69.71
    
    
      2017-05-14
      753.14
      150.6280
      151.49
      4651.90
      930.3800
      936.95
      343.79
      68.7580
      68.99
    
    
      2017-05-21
      741.44
      148.2880
      150.17
      4661.09
      932.2180
      940.00
      340.16
      68.0320
      68.89
    
    
      2017-05-28
      747.64
      149.5280
      152.23
      4762.93
      952.5860
      969.70
      344.25
      68.8500
      69.80
    
    
      2017-06-04
      608.27
      152.0675
      152.70
      3883.74
      970.9350
      975.02
      281.00
      70.2500
      70.53
    
    
      2017-06-11
      769.17
      153.8340
      154.77
      4906.21
      981.2420
      984.50
      361.46
      72.2920
      72.64
    
    
      2017-06-18
      746.84
      149.3680
      151.26
      4725.36
      945.0720
      959.92
      349.18
      69.8360
      70.91
    
    
      2017-06-25
      762.68
      152.5360
      153.01
      4776.65
      955.3300
      958.70
      352.16
      70.4320
      70.82
    
    
      2017-07-02
      764.19
      152.8380
      156.25
      4697.33
      939.4660
      969.90
      348.88
      69.7760
      71.40
    
    
      2017-07-09
      599.00
      149.7500
      151.72
      3626.91
      906.7275
      912.18
      274.56
      68.6400
      69.33
    
    
      2017-07-16
      780.42
      156.0840
      160.13
      4688.28
      937.6560
      952.00
      354.00
      70.8000
      72.24
    
    
      2017-07-23
      812.46
      162.4920
      164.80
      4815.09
      963.0180
      975.00
      367.02
      73.4040
      74.18
    
    
      2017-07-30
      329.65
      164.8250
      165.01
      1926.03
      963.0150
      972.22
      147.33
      73.6650
      73.80
    
  

82 rows × 9 columns

Instead, if we mean last opening price of the week:



In [55]:

    
# calculate last value for each week
last = stock_open.resample('W').last()
# add a level to the column index for merging this data with mean and max dataframe created below
last.columns = pd.MultiIndex.from_product([stock_open.columns, ['last']])
# merge mean and max dataframe and last dataframe using index, then stack and unstack to have the columns in the right place
pd.merge(stock_open.resample('W').aggregate([np.mean, max]),
         last,
         left_index=True,
         right_index=True).stack().unstack()









    Out[55]:






  
    
      
      FB
      GOOG
      MSFT
    
    
      
      last
      max
      mean
      last
      max
      mean
      last
      max
      mean
    
    
      Date
      
      
      
      
      
      
      
      
      
    
  
  
    
      2016-01-10
      99.88
      102.89
      101.2700
      731.45
      746.45
      736.2420
      52.37
      54.93
      53.7280
    
    
      2016-01-17
      93.98
      100.58
      97.4640
      692.29
      730.85
      713.3620
      51.31
      53.80
      52.4760
    
    
      2016-01-24
      96.41
      96.53
      95.1700
      723.60
      723.60
      704.4225
      51.41
      51.48
      50.9675
    
    
      2016-01-31
      108.99
      108.99
      102.0920
      731.53
      731.53
      720.9700
      54.73
      54.73
      52.4660
    
    
      2016-02-07
      109.51
      115.27
      112.7300
      703.87
      784.50
      746.3720
      51.94
      54.88
      53.2680
    
    
      2016-02-14
      103.74
      103.74
      100.4880
      690.26
      690.26
      678.4580
      50.25
      50.25
      49.4780
    
    
      2016-02-21
      102.55
      105.80
      103.5225
      695.03
      710.00
      699.2525
      51.97
      52.33
      51.6725
    
    
      2016-02-28
      108.70
      108.70
      106.4720
      708.58
      708.58
      701.2820
      52.60
      52.60
      51.9280
    
    
      2016-03-06
      110.05
      110.25
      109.0820
      714.99
      719.00
      711.3220
      52.40
      52.97
      52.0200
    
    
      2016-03-13
      108.48
      108.48
      107.1880
      720.00
      720.00
      704.4160
      53.00
      53.00
      52.0360
    
    
      2016-03-20
      111.56
      112.15
      110.5180
      741.86
      741.86
      731.6820
      54.92
      54.92
      53.6080
    
    
      2016-03-27
      112.13
      112.13
      111.7900
      732.01
      742.36
      737.0825
      53.84
      54.11
      53.7025
    
    
      2016-04-03
      113.75
      116.73
      114.4660
      738.60
      750.10
      741.8660
      55.05
      55.05
      54.5600
    
    
      2016-04-10
      114.25
      114.25
      113.3380
      743.97
      750.06
      742.6340
      54.67
      55.43
      54.9040
    
    
      2016-04-17
      110.79
      112.23
      110.7360
      753.98
      754.01
      747.6340
      55.30
      55.30
      54.9000
    
    
      2016-04-24
      111.21
      112.55
      111.4060
      726.30
      769.51
      753.9300
      51.91
      56.63
      55.2240
    
    
      2016-05-01
      116.82
      119.58
      112.9400
      690.70
      725.42
      709.5540
      49.35
      52.26
      51.0980
    
    
      2016-05-08
      117.16
      118.04
      117.4320
      698.38
      698.38
      696.2140
      49.92
      50.34
      49.9940
    
    
      2016-05-15
      120.38
      120.41
      119.9860
      711.93
      723.41
      716.2300
      51.44
      51.44
      50.9180
    
    
      2016-05-22
      116.96
      119.38
      117.8020
      701.62
      715.99
      706.5540
      50.48
      51.72
      50.7900
    
    
      2016-05-29
      119.56
      119.56
      117.9180
      724.01
      724.01
      716.2060
      51.92
      51.93
      51.4140
    
    
      2016-06-05
      118.98
      119.46
      118.9075
      729.27
      734.53
      732.0100
      52.38
      52.64
      52.4300
    
    
      2016-06-12
      117.54
      119.24
      118.2580
      719.47
      724.91
      722.2100
      51.05
      52.24
      51.8600
    
    
      2016-06-19
      114.42
      115.30
      114.5320
      708.65
      719.00
      715.1100
      50.41
      50.41
      49.8380
    
    
      2016-06-26
      111.01
      114.65
      113.5840
      675.17
      699.06
      693.7700
      49.81
      51.28
      50.6020
    
    
      2016-07-03
      114.20
      114.67
      112.8880
      692.20
      692.20
      682.1280
      51.13
      51.13
      49.9560
    
    
      2016-07-10
      116.43
      116.63
      115.0900
      699.50
      699.50
      695.9050
      51.73
      51.73
      51.1900
    
    
      2016-07-17
      117.74
      118.63
      117.9940
      725.73
      725.73
      719.6200
      53.95
      53.95
      53.3580
    
    
      2016-07-24
      119.90
      121.92
      119.8760
      741.86
      741.86
      734.4300
      56.08
      56.15
      55.1240
    
    
      2016-07-31
      124.65
      127.52
      123.5960
      772.71
      772.71
      747.5480
      56.26
      56.61
      56.3720
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2017-01-08
      120.98
      120.98
      118.3550
      795.26
      795.26
      787.1275
      62.30
      62.79
      62.4400
    
    
      2017-01-15
      127.49
      127.49
      125.1640
      807.48
      807.86
      806.7760
      62.62
      63.06
      62.7560
    
    
      2017-01-22
      128.10
      128.41
      128.1950
      806.91
      807.08
      806.2300
      62.67
      62.68
      62.5650
    
    
      2017-01-29
      132.68
      132.68
      130.2000
      834.71
      837.81
      826.3380
      65.39
      65.39
      63.8720
    
    
      2017-02-05
      131.24
      133.22
      131.6920
      802.99
      814.66
      801.5980
      63.50
      65.69
      64.3320
    
    
      2017-02-12
      134.10
      134.49
      132.8820
      811.70
      811.70
      806.3800
      64.25
      64.25
      63.7160
    
    
      2017-02-19
      133.50
      134.70
      133.7640
      823.02
      823.02
      819.4620
      64.47
      64.74
      64.4720
    
    
      2017-02-26
      134.16
      135.89
      134.2875
      827.73
      830.12
      828.7925
      64.53
      64.61
      64.4725
    
    
      2017-03-05
      136.63
      137.09
      136.4480
      830.56
      833.85
      828.6840
      63.99
      64.69
      64.2860
    
    
      2017-03-12
      138.91
      138.91
      137.5380
      843.28
      843.28
      833.4280
      65.11
      65.19
      64.5440
    
    
      2017-03-19
      140.34
      140.34
      139.6000
      851.61
      851.61
      847.1740
      64.91
      65.01
      64.7500
    
    
      2017-03-26
      140.08
      141.15
      139.6700
      820.08
      851.40
      834.8800
      65.36
      65.36
      64.9040
    
    
      2017-04-02
      142.31
      142.40
      141.2220
      828.97
      833.50
      822.9660
      65.65
      65.65
      65.1560
    
    
      2017-04-09
      141.20
      142.26
      141.8720
      827.96
      835.51
      831.2900
      65.85
      66.30
      65.7900
    
    
      2017-04-16
      139.62
      141.00
      140.2850
      822.14
      825.39
      823.5425
      65.29
      65.61
      65.4800
    
    
      2017-04-23
      143.90
      143.90
      141.8460
      842.88
      842.88
      836.6680
      65.67
      65.67
      65.4300
    
    
      2017-04-30
      149.50
      149.50
      146.8020
      910.66
      910.66
      874.9380
      68.91
      68.91
      68.1040
    
    
      2017-05-07
      151.45
      153.60
      152.0600
      933.54
      933.54
      917.2060
      68.90
      69.71
      69.1400
    
    
      2017-05-14
      150.40
      151.49
      150.6280
      931.53
      936.95
      930.3800
      68.61
      68.99
      68.7580
    
    
      2017-05-21
      148.44
      150.17
      148.2880
      931.47
      940.00
      932.2180
      67.50
      68.89
      68.0320
    
    
      2017-05-28
      152.23
      152.23
      149.5280
      969.70
      969.70
      952.5860
      69.80
      69.80
      68.8500
    
    
      2017-06-04
      151.85
      152.70
      152.0675
      969.46
      975.02
      970.9350
      70.44
      70.53
      70.2500
    
    
      2017-06-11
      154.77
      154.77
      153.8340
      984.50
      984.50
      981.2420
      72.04
      72.64
      72.2920
    
    
      2017-06-18
      149.59
      151.26
      149.3680
      940.00
      959.92
      945.0720
      69.73
      70.91
      69.8360
    
    
      2017-06-25
      152.72
      153.01
      152.5360
      956.83
      958.70
      955.3300
      70.09
      70.82
      70.4320
    
    
      2017-07-02
      151.90
      156.25
      152.8380
      926.05
      969.90
      939.4660
      68.78
      71.40
      69.7760
    
    
      2017-07-09
      149.25
      151.72
      149.7500
      908.85
      912.18
      906.7275
      68.70
      69.33
      68.6400
    
    
      2017-07-16
      160.13
      160.13
      156.0840
      952.00
      952.00
      937.6560
      72.24
      72.24
      70.8000
    
    
      2017-07-23
      164.16
      164.80
      162.4920
      962.25
      975.00
      963.0180
      73.45
      74.18
      73.4040
    
    
      2017-07-30
      165.01
      165.01
      164.8250
      953.81
      972.22
      963.0150
      73.80
      73.80
      73.6650
    
  

82 rows × 9 columns

5 - For each stock, return the weeks for which the opening stock price was greater than the yearly daily average.



In [56]:

    
# create a dataframe containing the opening price for each week minus the mean over the year
# I'm appending start year resampling and en of year resampling in order to have all the weeks in the years considered
stock_difference = stock_open.resample('W').first() - stock_open.resample('AS').mean().append(stock_open.resample('A').mean()).resample('W').ffill()
stock_difference.head()



In [57]:

    
# print values for every stock:
for stock in stock_code:
    print('Weeks with opening price above yearly average for {}:\n{}\n'.format(stock,
                        # format the index where stock difference is greater than 0 and join the array using newline
                        '\n'.join(stock_difference[stock_difference[stock] > 0].index.strftime('Week %W of %Y'))))









    



Weeks with opening price above yearly average for GOOG:
Week 05 of 2016
Week 14 of 2016
Week 16 of 2016
Week 31 of 2016
Week 32 of 2016
Week 33 of 2016
Week 34 of 2016
Week 35 of 2016
Week 36 of 2016
Week 37 of 2016
Week 38 of 2016
Week 39 of 2016
Week 40 of 2016
Week 41 of 2016
Week 42 of 2016
Week 43 of 2016
Week 44 of 2016
Week 45 of 2016
Week 46 of 2016
Week 47 of 2016
Week 48 of 2016
Week 49 of 2016
Week 50 of 2016
Week 51 of 2016
Week 18 of 2017
Week 19 of 2017
Week 20 of 2017
Week 21 of 2017
Week 22 of 2017
Week 23 of 2017
Week 24 of 2017
Week 25 of 2017
Week 26 of 2017
Week 27 of 2017
Week 28 of 2017
Week 29 of 2017
Week 30 of 2017

Weeks with opening price above yearly average for FB:
Week 18 of 2016
Week 19 of 2016
Week 20 of 2016
Week 21 of 2016
Week 22 of 2016
Week 23 of 2016
Week 28 of 2016
Week 29 of 2016
Week 30 of 2016
Week 31 of 2016
Week 32 of 2016
Week 33 of 2016
Week 34 of 2016
Week 35 of 2016
Week 36 of 2016
Week 37 of 2016
Week 38 of 2016
Week 39 of 2016
Week 40 of 2016
Week 41 of 2016
Week 42 of 2016
Week 43 of 2016
Week 44 of 2016
Week 45 of 2016
Week 46 of 2016
Week 47 of 2016
Week 48 of 2016
Week 50 of 2016
Week 51 of 2016
Week 17 of 2017
Week 18 of 2017
Week 19 of 2017
Week 20 of 2017
Week 21 of 2017
Week 22 of 2017
Week 23 of 2017
Week 24 of 2017
Week 25 of 2017
Week 26 of 2017
Week 27 of 2017
Week 28 of 2017
Week 29 of 2017
Week 30 of 2017

Weeks with opening price above yearly average for MSFT:
Week 14 of 2016
Week 16 of 2016
Week 30 of 2016
Week 31 of 2016
Week 32 of 2016
Week 33 of 2016
Week 34 of 2016
Week 35 of 2016
Week 36 of 2016
Week 37 of 2016
Week 38 of 2016
Week 39 of 2016
Week 40 of 2016
Week 41 of 2016
Week 42 of 2016
Week 43 of 2016
Week 44 of 2016
Week 45 of 2016
Week 46 of 2016
Week 47 of 2016
Week 48 of 2016
Week 49 of 2016
Week 50 of 2016
Week 51 of 2016
Week 17 of 2017
Week 18 of 2017
Week 19 of 2017
Week 20 of 2017
Week 21 of 2017
Week 22 of 2017
Week 23 of 2017
Week 24 of 2017
Week 25 of 2017
Week 26 of 2017
Week 27 of 2017
Week 28 of 2017
Week 29 of 2017
Week 30 of 2017

	x	y1	y2
0	50	0.570197	0.677817
1	51	0.438602	0.270008
2	52	0.988374	0.735194
3	53	0.102045	0.962189
4	54	0.208877	0.248753
5	55	0.161310	0.576157
6	56	0.653108	0.592042
7	57	0.253292	0.572252
8	58	0.466311	0.223082
9	59	0.244426	0.952749
10	60	0.158970	0.447125
11	61	0.110375	0.846409
12	62	0.656330	0.699479
13	63	0.138183	0.297437
14	64	0.196582	0.813798
15	65	0.368725	0.396506
16	66	0.820993	0.881103
17	67	0.097101	0.581273
18	68	0.837945	0.881735
19	69	0.096098	0.692532
20	70	0.976459	0.725254
21	71	0.468651	0.501324
22	72	0.976761	0.956084
23	73	0.604846	0.643990
24	74	0.739264	0.423855
25	75	0.039188	0.606393
26	76	0.282807	0.019193
27	77	0.120197	0.301575
28	78	0.296140	0.660174
29	79	0.118728	0.290078
30	80	0.317983	0.618015
31	81	0.414263	0.428769
32	82	0.064147	0.135474
33	83	0.692472	0.298282
34	84	0.566601	0.569965
35	85	0.265389	0.590873
36	86	0.523248	0.574325
37	87	0.093941	0.653201
38	88	0.575946	0.652103
39	89	0.929296	0.431418
40	90	0.318569	0.896547
41	91	0.667410	0.367562
42	92	0.131798	0.435865
43	93	0.716327	0.891923
44	94	0.289406	0.806194
45	95	0.183191	0.703889
46	96	0.586513	0.100227
47	97	0.020108	0.919483
48	98	0.828940	0.714241
49	99	0.004695	0.998847

minor	FB	GOOG	MSFT
Date
2016-01-03	NaN	NaN	NaN
2016-01-10	-15.130754	-0.738532	-0.910476
2016-01-17	-19.170754	-27.128532	-2.720476
2016-01-24	-20.550754	-40.438532	-3.750476
2016-01-31	-18.360754	-20.158532	-3.290476

	Unnamed: 0	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
890	56676	1.0	42.0	1.0	Oceana	5	4	1	3	3	2	4
891	58098	0.0	41.0	1.0	Oceana	5	3	4	3	3	4	4
892	117252	1.0	41.0	6.0	Eurasia	5	4	2	1	4	4	3
893	110212	0.0	40.0	3.0	Eurasia	4	3	2	3	4	3	3
894	168326	0.0	24.0	4.0	Eastasia	5	3	4	3	3	3	4
895	95744	1.0	70.0	1.0	Eastasia	3	2	1	1	2	1	1
896	109491	1.0	18.0	4.0	Eurasia	3	1	1	1	1	1	2
897	65788	1.0	19.0	1.0	Eastasia	5	3	3	3	3	3	3
898	147766	0.0	53.0	4.0	Eastasia	4	3	3	3	3	3	3
899	116952	1.0	18.0	3.0	Eurasia	5	4	4	4	4	4	4

	id	sex	age	educ	y	v1	v2	v3	v4	v5	v6
count	900.000000	898.000000	892.000000	890.000000	900.000000	900.000000	900.000000	900.000000	900.000000	900.000000	900.000000
mean	90665.368889	0.556793	40.744395	2.941573	3.520000	2.648889	2.535556	3.664444	4.084444	3.866667	4.380000
std	44234.598996	0.497041	16.743316	1.600394	1.293709	1.151991	1.267310	1.016363	0.955973	0.984869	0.989399
min	142.000000	0.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
25%	52621.000000	0.000000	27.000000	1.000000	3.000000	2.000000	2.000000	3.000000	3.000000	3.000000	4.000000
50%	108699.000000	1.000000	39.000000	3.000000	4.000000	3.000000	2.000000	4.000000	4.000000	4.000000	5.000000
75%	119329.000000	1.000000	52.000000	4.000000	5.000000	3.000000	3.000000	4.000000	5.000000	5.000000	5.000000
max	171811.000000	1.000000	90.000000	7.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000

	Unnamed: 0	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
99	57136	1.0	28.0	1.0	Oceana	5	5	5	5	5	5	5
100	71010	1.0	51.0	1.0	Eastasia	5	5	5	5	5	5	5
101	145298	1.0	20.0	2.0	Eastasia	2	2	1	3	4	5	3
102	162131	1.0	43.0	5.0	Eastasia	3	3	3	3	3	3	3
103	81406	0.0	45.0	1.0	Eastasia	1	1	1	1	1	1	1
104	164869	1.0	61.0	6.0	Eastasia	3	3	2	4	5	5	5
105	110303	0.0	37.0	6.0	Eurasia	3	3	2	3	5	3	3
106	78048	1.0	28.0	1.0	Oceana	2	2	2	4	5	4	5
107	118281	1.0	23.0	3.0	Eurasia	3	3	4	3	3	3	3
108	24024	1.0	75.0	3.0	Oceana	2	2	2	3	4	3	5
109	101158	1.0	25.0	3.0	Eastasia	2	5	4	3	4	5	2

	score
cat
v1	0.186667
v2	0.102222
v3	0.202222
v4	0.220000
v5	0.062222
v6	0.226667

	id	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
0	109276	0.0	20.0	4.0	Eurasia	1	4	3	3	5	3	4
1	88178	1.0	25.0	4.0	Oceana	2	3	3	5	5	5	5
2	111063	1.0	56.0	2.0	Eastasia	2	3	2	4	5	5	4
3	161488	0.0	65.0	6.0	Eastasia	2	3	3	5	5	5	5
4	44532	1.0	50.0	5.0	Oceana	1	5	3	5	5	3	5

	id	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
260	129355	1.0	83.0	1.0	Eastasia	4	1	2	4	5	5	5
415	142546	0.0	70.0	1.0	Eastasia	5	2	3	3	4	4	3
445	95744	1.0	70.0	1.0	Eastasia	3	2	1	1	2	1	1
147	128180	0.0	68.0	1.0	Eastasia	3	2	2	5	3	3	5
268	93407	1.0	64.0	1.0	Eastasia	3	1	3	2	2	2	5

	id	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
261	137663	0.0	50.0	1.0	Eastasia	4	2	2	3	5	4	5
223	127868	1.0	46.0	1.0	Eastasia	5	1	1	5	5	5	1
103	81406	0.0	45.0	1.0	Eastasia	1	1	1	1	1	1	1
360	138535	1.0	45.0	1.0	Eastasia	5	2	3	4	5	4	5
319	129590	1.0	43.0	1.0	Eastasia	5	1	1	4	4	3	5
348	159708	1.0	43.0	1.0	Eastasia	5	4	2	4	5	3	5
199	156041	1.0	40.0	1.0	Eastasia	4	2	4	4	4	3	5
243	160339	1.0	39.0	1.0	Eastasia	5	4	5	5	3	4	5
74	140966	0.0	35.0	1.0	Eastasia	1	3	4	5	5	3	5
29	8524	1.0	32.0	1.0	Eastasia	4	5	4	5	5	5	5
204	8296	0.0	32.0	1.0	Eastasia	5	3	3	5	5	5	5
122	166412	0.0	31.0	1.0	Eastasia	3	3	3	4	3	4	3
263	114086	0.0	30.0	1.0	Eastasia	3	1	2	2	1	3	4
302	82600	1.0	30.0	1.0	Eastasia	4	2	1	3	4	3	5
198	125327	1.0	28.0	1.0	Eastasia	4	3	2	5	5	3	5
220	125656	0.0	27.0	1.0	Eastasia	3	1	1	3	3	3	4
259	20075	1.0	25.0	1.0	Eastasia	5	1	2	5	5	3	5
344	105904	1.0	25.0	1.0	Eastasia	5	2	1	3	5	5	5
374	149219	1.0	46.0	2.0	Eastasia	5	4	2	4	3	3	4
421	107534	1.0	41.0	2.0	Eastasia	4	2	5	3	2	4	3
242	146950	0.0	38.0	2.0	Eastasia	5	3	2	5	5	4	5
367	162577	1.0	36.0	2.0	Eastasia	4	3	3	3	3	3	3
411	147495	1.0	33.0	2.0	Eastasia	4	3	3	3	3	3	3
303	146857	0.0	31.0	2.0	Eastasia	4	3	2	3	5	3	5
162	82247	1.0	30.0	2.0	Eastasia	2	1	2	3	4	5	4
276	111809	1.0	27.0	2.0	Eastasia	3	1	4	1	2	4	4
150	128252	0.0	48.0	3.0	Eastasia	3	2	3	5	3	3	5
258	124356	1.0	48.0	3.0	Eastasia	5	2	3	5	5	5	5
435	20416	0.0	48.0	3.0	Eastasia	5	2	3	3	2	3	3
286	121943	0.0	47.0	3.0	Eastasia	3	1	1	1	4	5	5
...	...	...	...	...	...	...	...	...	...	...	...	...
129	27969	0.0	44.0	4.0	Oceana	3	3	3	4	4	4	4
85	25324	0.0	43.0	4.0	Oceana	3	5	4	5	4	5	3
96	28150	0.0	42.0	4.0	Oceana	2	2	3	4	4	4	5
146	42669	1.0	40.0	4.0	Oceana	2	2	2	3	5	4	5
334	3073	0.0	40.0	4.0	Oceana	4	3	4	3	4	4	4
120	41762	1.0	35.0	4.0	Oceana	2	2	2	5	4	5	5
166	42641	1.0	35.0	4.0	Oceana	3	1	3	4	4	5	5
254	41344	0.0	34.0	4.0	Oceana	3	2	2	3	5	4	5
284	42168	0.0	33.0	4.0	Oceana	4	3	2	3	4	4	5
241	42743	0.0	30.0	4.0	Oceana	5	2	2	5	5	4	5
80	79340	0.0	29.0	4.0	Oceana	3	3	4	3	3	4	5
358	92330	0.0	29.0	4.0	Oceana	5	3	3	4	4	4	5
155	89971	1.0	28.0	4.0	Oceana	2	1	1	5	3	5	5
61	4949	1.0	26.0	4.0	Oceana	2	5	5	5	4	4	4
266	43344	0.0	26.0	4.0	Oceana	4	2	2	3	4	4	5
1	88178	1.0	25.0	4.0	Oceana	2	3	3	5	5	5	5
174	44669	0.0	25.0	4.0	Oceana	3	1	1	4	3	4	4
4	44532	1.0	50.0	5.0	Oceana	1	5	3	5	5	3	5
14	30513	0.0	49.0	5.0	Oceana	1	3	4	4	4	5	5
19	41371	1.0	41.0	5.0	Oceana	1	2	1	4	4	3	5
281	92195	1.0	31.0	5.0	Oceana	4	2	1	3	4	5	5
23	92556	0.0	30.0	5.0	Oceana	1	2	1	4	5	4	5
133	88111	0.0	30.0	5.0	Oceana	3	3	2	4	5	5	5
175	23450	1.0	30.0	5.0	Oceana	3	1	1	4	4	4	5
251	91858	1.0	29.0	5.0	Oceana	5	2	1	5	5	3	5
196	23272	0.0	37.0	6.0	Oceana	3	2	2	5	5	4	5
34	25441	0.0	42.0	7.0	Oceana	2	4	4	4	4	4	5
401	26614	0.0	33.0	7.0	Oceana	5	3	5	4	5	4	3
151	88856	0.0	30.0	7.0	Oceana	3	2	2	5	4	4	5
58	60125	0.0	41.0	NaN	Oceana	1	5	2	4	5	5	5

	id	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
44	76803	1.0	71.0	1.0	Oceana	2	4	4	4	5	4	5
48	60488	1.0	63.0	1.0	Oceana	4	5	4	5	5	5	5
136	61252	0.0	62.0	1.0	Oceana	5	5	5	5	5	5	5
145	603	1.0	61.0	1.0	Oceana	2	2	2	5	5	5	5
171	42755	1.0	60.0	1.0	Oceana	3	2	2	5	5	4	5
309	88267	0.0	58.0	1.0	Oceana	4	2	2	3	5	5	5
436	60079	0.0	57.0	1.0	Oceana	4	1	3	1	2	3	2
89	76784	1.0	55.0	1.0	Oceana	1	1	1	5	5	5	5
142	463	0.0	55.0	1.0	Oceana	5	5	5	5	5	5	4
20	60279	0.0	54.0	1.0	Oceana	1	5	4	5	5	4	5
140	90873	1.0	53.0	1.0	Oceana	2	2	2	3	4	4	4
154	78538	1.0	52.0	1.0	Oceana	3	1	1	4	4	3	5
252	59708	1.0	51.0	1.0	Oceana	4	2	2	4	3	3	5
163	4028	0.0	50.0	1.0	Oceana	3	1	1	4	5	5	5
312	78421	1.0	48.0	1.0	Oceana	4	2	1	3	3	4	5
336	78608	0.0	48.0	1.0	Oceana	5	2	2	3	4	4	5
167	78558	1.0	47.0	1.0	Oceana	3	2	1	4	5	4	5
13	57867	0.0	46.0	1.0	Oceana	1	5	5	5	5	5	5
67	73821	1.0	45.0	1.0	Oceana	2	4	2	5	3	4	5
132	61135	0.0	45.0	1.0	Oceana	4	4	4	4	4	4	4
408	92438	1.0	44.0	1.0	Oceana	4	3	5	2	4	4	1
269	58757	1.0	43.0	1.0	Oceana	4	3	2	3	4	3	5
440	56676	1.0	42.0	1.0	Oceana	5	4	1	3	3	2	4
441	58098	0.0	41.0	1.0	Oceana	5	3	4	3	3	4	4
118	78703	1.0	40.0	1.0	Oceana	3	4	2	5	5	4	3
245	78405	0.0	40.0	1.0	Oceana	4	3	3	4	4	4	5
170	78731	1.0	38.0	1.0	Oceana	3	2	2	4	5	4	5
93	60470	0.0	37.0	1.0	Oceana	2	2	2	3	5	5	5
116	56613	0.0	35.0	1.0	Oceana	5	5	4	4	5	5	5
111	61465	0.0	34.0	1.0	Oceana	2	2	2	2	4	4	5
...	...	...	...	...	...	...	...	...	...	...	...	...
201	3085	0.0	24.0	4.0	Oceana	3	1	2	3	5	3	5
149	4360	1.0	23.0	4.0	Oceana	3	2	1	4	5	4	5
247	78281	1.0	22.0	4.0	Oceana	3	1	1	3	5	5	5
60	30266	0.0	19.0	4.0	Oceana	1	3	4	5	5	5	5
291	28638	1.0	19.0	4.0	Oceana	4	3	3	3	4	4	5
45	23208	0.0	18.0	4.0	Oceana	2	3	1	3	3	3	3
271	42296	0.0	18.0	4.0	Oceana	4	2	2	3	4	3	5
192	23288	1.0	62.0	5.0	Oceana	3	2	2	4	3	3	5
256	23924	0.0	60.0	5.0	Oceana	3	1	1	3	4	4	5
72	30741	1.0	55.0	5.0	Oceana	4	5	5	5	5	5	5
4	44532	1.0	50.0	5.0	Oceana	1	5	3	5	5	3	5
14	30513	0.0	49.0	5.0	Oceana	1	3	4	4	4	5	5
19	41371	1.0	41.0	5.0	Oceana	1	2	1	4	4	3	5
281	92195	1.0	31.0	5.0	Oceana	4	2	1	3	4	5	5
23	92556	0.0	30.0	5.0	Oceana	1	2	1	4	5	4	5
133	88111	0.0	30.0	5.0	Oceana	3	3	2	4	5	5	5
175	23450	1.0	30.0	5.0	Oceana	3	1	1	4	4	4	5
251	91858	1.0	29.0	5.0	Oceana	5	2	1	5	5	3	5
63	90418	1.0	24.0	5.0	Oceana	2	3	2	5	5	3	5
43	31052	0.0	22.0	5.0	Oceana	1	4	3	4	5	4	5
189	90645	0.0	22.0	5.0	Oceana	3	2	1	4	4	4	5
238	92750	1.0	19.0	5.0	Oceana	4	3	3	4	4	5	5
115	24643	0.0	59.0	6.0	Oceana	3	3	2	4	4	3	5
196	23272	0.0	37.0	6.0	Oceana	3	2	2	5	5	4	5
62	30485	0.0	68.0	7.0	Oceana	2	3	4	3	4	5	5
34	25441	0.0	42.0	7.0	Oceana	2	4	4	4	4	4	5
401	26614	0.0	33.0	7.0	Oceana	5	3	5	4	5	4	3
151	88856	0.0	30.0	7.0	Oceana	3	2	2	5	4	4	5
58	60125	0.0	41.0	NaN	Oceana	1	5	2	4	5	5	5
313	60617	NaN	1.0	NaN	Oceana	4	3	4	3	4	4	4

	id	sex	age	educ	country	y	v1	v2	v3	v4	v5	v6
313	60617	NaN	1.0	NaN	Oceana	4	3	4	3	4	4	4
316	60716	1.0	17.0	1.0	Oceana	5	2	1	4	4	5	5
158	76998	0.0	18.0	1.0	Oceana	3	2	2	5	5	5	5
232	58633	1.0	18.0	1.0	Oceana	5	4	3	5	3	3	5
449	116952	1.0	18.0	3.0	Eurasia	5	4	4	4	4	4	4

	id	sex	age	educ	y	v1	v2	v3	v4	v5	v6
age_cat
0.0	97924.298851	0.534884	20.505747	3.317647	3.643678	2.517241	2.597701	3.655172	3.954023	3.781609	4.471264
1.0	92976.567797	0.559322	36.686441	3.034335	3.605932	2.631356	2.474576	3.580508	4.055085	3.830508	4.266949
2.0	81100.097561	0.569106	62.845528	2.552846	3.243902	2.780488	2.593496	3.821138	4.219512	3.975610	4.520325

		age	educ	id	sex	v1	v2	v3	v4	v5	v6	y
age_cat
0.0	count	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000	20.000000
	mean	21.750000	5.200000	97745.450000	0.500000	2.800000	3.100000	3.750000	4.200000	4.000000	4.650000	3.150000
	std	1.585294	0.410391	35834.636660	0.512989	1.151658	1.483240	0.786398	0.695852	0.858395	0.587143	1.386969
	min	19.000000	5.000000	31052.000000	0.000000	1.000000	1.000000	3.000000	3.000000	3.000000	3.000000	1.000000
	25%	20.000000	5.000000	90588.250000	0.000000	2.000000	2.000000	3.000000	4.000000	3.000000	4.000000	2.000000
	50%	22.000000	5.000000	108682.000000	0.500000	3.000000	3.000000	4.000000	4.000000	4.000000	5.000000	3.000000
	75%	23.000000	5.000000	116392.250000	1.000000	4.000000	4.250000	4.000000	5.000000	5.000000	5.000000	4.000000
	max	24.000000	6.000000	171662.000000	1.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000
1.0	count	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000	43.000000
	mean	36.767442	5.488372	89611.930233	0.488372	2.767442	2.395349	3.581395	4.093023	3.720930	4.162791	3.581395
	std	7.057037	0.702787	39754.316402	0.505781	1.087531	1.177963	1.096153	0.921023	1.031079	1.111203	1.257976
	min	25.000000	5.000000	11633.000000	0.000000	1.000000	1.000000	1.000000	2.000000	1.000000	2.000000	1.000000
	25%	30.000000	5.000000	50499.500000	0.000000	2.000000	2.000000	3.000000	4.000000	3.000000	3.000000	3.000000
	50%	37.000000	5.000000	108928.000000	0.000000	3.000000	2.000000	4.000000	4.000000	4.000000	5.000000	4.000000
	75%	42.000000	6.000000	116362.500000	1.000000	3.500000	3.000000	4.000000	5.000000	4.000000	5.000000	5.000000
	max	50.000000	7.000000	162131.000000	1.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000
2.0	count	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000	18.000000
	mean	58.555556	5.555556	83721.611111	0.666667	2.888889	2.833333	4.111111	4.166667	4.222222	4.611111	3.222222
	std	5.802862	0.704792	54036.236832	0.485071	1.182663	1.465285	0.676400	1.098127	0.808452	0.849837	1.060275
	min	51.000000	5.000000	23288.000000	0.000000	1.000000	1.000000	3.000000	1.000000	3.000000	2.000000	1.000000
	25%	54.250000	5.000000	34431.750000	0.000000	2.000000	2.000000	4.000000	4.000000	4.000000	5.000000	3.000000
	50%	58.000000	5.000000	51936.500000	1.000000	3.000000	2.000000	4.000000	4.500000	4.000000	5.000000	3.000000
	75%	63.500000	6.000000	132317.250000	1.000000	3.750000	4.000000	4.750000	5.000000	5.000000	5.000000	4.000000
	max	68.000000	7.000000	164869.000000	1.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000

	x	y1	y2
0	0	0.548814	NaN
1	1	0.715189	NaN
2	2	0.602763	NaN
3	3	0.544883	NaN
4	4	0.423655	NaN
5	5	0.645894	NaN
6	6	0.437587	NaN
7	7	0.891773	NaN
8	8	0.963663	NaN
9	9	0.383442	NaN
10	10	0.791725	NaN
11	11	0.528895	NaN
12	12	0.568045	NaN
13	13	0.925597	NaN
14	14	0.071036	NaN
15	15	0.087129	NaN
16	16	0.020218	NaN
17	17	0.832620	NaN
18	18	0.778157	NaN
19	19	0.870012	NaN
20	20	0.978618	NaN
21	21	0.799159	NaN
22	22	0.461479	NaN
23	23	0.780529	NaN
24	24	0.118274	NaN
25	25	0.639921	NaN
26	26	0.143353	NaN
27	27	0.944669	NaN
28	28	0.521848	NaN
29	29	0.414662	NaN
...	...	...	...
120	120	NaN	0.521037
121	121	NaN	0.054338
122	122	NaN	0.199997
123	123	NaN	0.018522
124	124	NaN	0.793698
125	125	NaN	0.223925
126	126	NaN	0.345352
127	127	NaN	0.928081
128	128	NaN	0.704414
129	129	NaN	0.031839
130	130	NaN	0.164694
131	131	NaN	0.621478
132	132	NaN	0.577229
133	133	NaN	0.237893
134	134	NaN	0.934214
135	135	NaN	0.613966
136	136	NaN	0.535633
137	137	NaN	0.589910
138	138	NaN	0.730122
139	139	NaN	0.311945
140	140	NaN	0.398221
141	141	NaN	0.209844
142	142	NaN	0.186193
143	143	NaN	0.944372
144	144	NaN	0.739551
145	145	NaN	0.490459
146	146	NaN	0.227415
147	147	NaN	0.254356
148	148	NaN	0.058029
149	149	NaN	0.434417

minor	FB	GOOG	MSFT
Date
2016-01-04	101.95	743.00	54.32
2016-01-05	102.89	746.45	54.93
2016-01-06	101.13	730.00	54.32
2016-01-07	100.50	730.31	52.70
2016-01-08	99.88	731.45	52.37
2016-01-11	97.91	716.61	52.51
2016-01-12	99.00	721.68	52.76
2016-01-13	100.58	730.85	53.80
2016-01-14	95.85	705.38	52.00
2016-01-15	93.98	692.29	51.31

	FB			GOOG			MSFT
	last	max	mean	last	max	mean	last	max	mean
Date
2016-01-10	99.88	102.89	101.2700	731.45	746.45	736.2420	52.37	54.93	53.7280
2016-01-17	93.98	100.58	97.4640	692.29	730.85	713.3620	51.31	53.80	52.4760
2016-01-24	96.41	96.53	95.1700	723.60	723.60	704.4225	51.41	51.48	50.9675
2016-01-31	108.99	108.99	102.0920	731.53	731.53	720.9700	54.73	54.73	52.4660
2016-02-07	109.51	115.27	112.7300	703.87	784.50	746.3720	51.94	54.88	53.2680
2016-02-14	103.74	103.74	100.4880	690.26	690.26	678.4580	50.25	50.25	49.4780
2016-02-21	102.55	105.80	103.5225	695.03	710.00	699.2525	51.97	52.33	51.6725
2016-02-28	108.70	108.70	106.4720	708.58	708.58	701.2820	52.60	52.60	51.9280
2016-03-06	110.05	110.25	109.0820	714.99	719.00	711.3220	52.40	52.97	52.0200
2016-03-13	108.48	108.48	107.1880	720.00	720.00	704.4160	53.00	53.00	52.0360
2016-03-20	111.56	112.15	110.5180	741.86	741.86	731.6820	54.92	54.92	53.6080
2016-03-27	112.13	112.13	111.7900	732.01	742.36	737.0825	53.84	54.11	53.7025
2016-04-03	113.75	116.73	114.4660	738.60	750.10	741.8660	55.05	55.05	54.5600
2016-04-10	114.25	114.25	113.3380	743.97	750.06	742.6340	54.67	55.43	54.9040
2016-04-17	110.79	112.23	110.7360	753.98	754.01	747.6340	55.30	55.30	54.9000
2016-04-24	111.21	112.55	111.4060	726.30	769.51	753.9300	51.91	56.63	55.2240
2016-05-01	116.82	119.58	112.9400	690.70	725.42	709.5540	49.35	52.26	51.0980
2016-05-08	117.16	118.04	117.4320	698.38	698.38	696.2140	49.92	50.34	49.9940
2016-05-15	120.38	120.41	119.9860	711.93	723.41	716.2300	51.44	51.44	50.9180
2016-05-22	116.96	119.38	117.8020	701.62	715.99	706.5540	50.48	51.72	50.7900
2016-05-29	119.56	119.56	117.9180	724.01	724.01	716.2060	51.92	51.93	51.4140
2016-06-05	118.98	119.46	118.9075	729.27	734.53	732.0100	52.38	52.64	52.4300
2016-06-12	117.54	119.24	118.2580	719.47	724.91	722.2100	51.05	52.24	51.8600
2016-06-19	114.42	115.30	114.5320	708.65	719.00	715.1100	50.41	50.41	49.8380
2016-06-26	111.01	114.65	113.5840	675.17	699.06	693.7700	49.81	51.28	50.6020
2016-07-03	114.20	114.67	112.8880	692.20	692.20	682.1280	51.13	51.13	49.9560
2016-07-10	116.43	116.63	115.0900	699.50	699.50	695.9050	51.73	51.73	51.1900
2016-07-17	117.74	118.63	117.9940	725.73	725.73	719.6200	53.95	53.95	53.3580
2016-07-24	119.90	121.92	119.8760	741.86	741.86	734.4300	56.08	56.15	55.1240
2016-07-31	124.65	127.52	123.5960	772.71	772.71	747.5480	56.26	56.61	56.3720
...	...	...	...	...	...	...	...	...	...
2017-01-08	120.98	120.98	118.3550	795.26	795.26	787.1275	62.30	62.79	62.4400
2017-01-15	127.49	127.49	125.1640	807.48	807.86	806.7760	62.62	63.06	62.7560
2017-01-22	128.10	128.41	128.1950	806.91	807.08	806.2300	62.67	62.68	62.5650
2017-01-29	132.68	132.68	130.2000	834.71	837.81	826.3380	65.39	65.39	63.8720
2017-02-05	131.24	133.22	131.6920	802.99	814.66	801.5980	63.50	65.69	64.3320
2017-02-12	134.10	134.49	132.8820	811.70	811.70	806.3800	64.25	64.25	63.7160
2017-02-19	133.50	134.70	133.7640	823.02	823.02	819.4620	64.47	64.74	64.4720
2017-02-26	134.16	135.89	134.2875	827.73	830.12	828.7925	64.53	64.61	64.4725
2017-03-05	136.63	137.09	136.4480	830.56	833.85	828.6840	63.99	64.69	64.2860
2017-03-12	138.91	138.91	137.5380	843.28	843.28	833.4280	65.11	65.19	64.5440
2017-03-19	140.34	140.34	139.6000	851.61	851.61	847.1740	64.91	65.01	64.7500
2017-03-26	140.08	141.15	139.6700	820.08	851.40	834.8800	65.36	65.36	64.9040
2017-04-02	142.31	142.40	141.2220	828.97	833.50	822.9660	65.65	65.65	65.1560
2017-04-09	141.20	142.26	141.8720	827.96	835.51	831.2900	65.85	66.30	65.7900
2017-04-16	139.62	141.00	140.2850	822.14	825.39	823.5425	65.29	65.61	65.4800
2017-04-23	143.90	143.90	141.8460	842.88	842.88	836.6680	65.67	65.67	65.4300
2017-04-30	149.50	149.50	146.8020	910.66	910.66	874.9380	68.91	68.91	68.1040
2017-05-07	151.45	153.60	152.0600	933.54	933.54	917.2060	68.90	69.71	69.1400
2017-05-14	150.40	151.49	150.6280	931.53	936.95	930.3800	68.61	68.99	68.7580
2017-05-21	148.44	150.17	148.2880	931.47	940.00	932.2180	67.50	68.89	68.0320
2017-05-28	152.23	152.23	149.5280	969.70	969.70	952.5860	69.80	69.80	68.8500
2017-06-04	151.85	152.70	152.0675	969.46	975.02	970.9350	70.44	70.53	70.2500
2017-06-11	154.77	154.77	153.8340	984.50	984.50	981.2420	72.04	72.64	72.2920
2017-06-18	149.59	151.26	149.3680	940.00	959.92	945.0720	69.73	70.91	69.8360
2017-06-25	152.72	153.01	152.5360	956.83	958.70	955.3300	70.09	70.82	70.4320
2017-07-02	151.90	156.25	152.8380	926.05	969.90	939.4660	68.78	71.40	69.7760
2017-07-09	149.25	151.72	149.7500	908.85	912.18	906.7275	68.70	69.33	68.6400
2017-07-16	160.13	160.13	156.0840	952.00	952.00	937.6560	72.24	72.24	70.8000
2017-07-23	164.16	164.80	162.4920	962.25	975.00	963.0180	73.45	74.18	73.4040
2017-07-30	165.01	165.01	164.8250	953.81	972.22	963.0150	73.80	73.80	73.6650

Pandas Exercise

Importing data and working with Data Frames

Sorting, Filtering, and Grouping data

Merging, Indexes, and Apply()

Time Series

Merging, Indexes, and `Apply()`