get_Aggregate_Histogram


Structure of spf files

TARGET_PERIOD:

This column stands for the point (or period) in time to which the forecast refers to. Depending on the macroeconomic variable, the target period is given either as year (format “yyyy”), quarter (format “yyyyQq”) or month (format “yyyymmm”), where the month is a mixed-case three-character tag of calendar months in English.
1) a forecast for the current calendar year
2) a forecast for the next calendar year
3) a forecast for the calendar year after next
4) a longer term forecast (four calendar years ahead in the Q1 and Q2 rounds and five calendar years ahead in the Q3 and Q4 rounds)
5) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) one year ahead of the latest available observation (at the time of the survey)
6) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) two years ahead of the latest available observation (at the time of the survey).

FCT_SOURCE

This column stands for forecast source or forecaster ID, is the code number assigned to an individual forecaster. This number remains the same for a specific forecaster over all forecast rounds (only the subset of forecasters that were actually responding in a particular round is reported).

POINT

This column contains the point forecast of the forecaster for the macroeconomic indicator. For HICP it is year-on-year change in HICP

The following columns

These columns contain the probability assigned to each of the intervals of the forecasted variable. The headers of the columns specify the intervals

  • TN2_0: Inflation expectations; year-on-year change in HICP; probability for interval < -2.0 %
  • TN1_0: Inflation expectations; year-on-year change in HICP; probability for interval < -1.0 %
  • FN2_0TN1_6: Inflation expectations; year-on-year change in HICP; probability for interval from -2.0 to -1.6 %
  • FN1_5TN1_1: Inflation expectations; year-on-year change in HICP; probability for interval from -1.5 to -1.1 %
  • FN1_0TN0_6: Inflation expectations; year-on-year change in HICP; probability for interval from -1.0 to -0.6 %
  • FN0_5TN0_1: Inflation expectations; year-on-year change in HICP; probability for interval from -0.5 to -0.1 %
  • T0_0 Inflation expectations; year-on-year change in HICP; probability for interval < 0.0 %
  • F0_0T0_4: Inflation expectations; year-on-year change in HICP; probability for interval 0.0 - 0.4 %
  • F0_5T0_9: Inflation expectations; year-on-year change in HICP; probability for interval 0.5 - 0.9 %
  • F1_0T1_4: Inflation expectations; year-on-year change in HICP; probability for interval 1.0 - 1.4 %
  • F1_5T1_9: Inflation expectations; year-on-year change in HICP; probability for interval 1.5 - 1.9 %
  • F2_0T2_4: Inflation expectations; year-on-year change in HICP; probability for interval 2.0 - 2.4 %
  • F2_5T2_9: Inflation expectations; year-on-year change in HICP; probability for interval 2.5 - 2.9 %
  • F3_0T3_4: Inflation expectations; year-on-year change in HICP; probability for interval 3.0 - 3.4 %
  • F3_5T3_9: Inflation expectations; year-on-year change in HICP; probability for interval 3.5 - 3.9 %
  • F3_5 Inflation expectations; year-on-year change in HICP; probability for interval >= 3.5 %
  • F4_0: Inflation expectations; year-on-year change in HICP; probability for interval >= 4.0 %

TN1_0 FN1_0TN0_6 FN0_5TN0_1 F0_0T0_4 F0_5T0_9 F1_0T1_4 F1_5T1_9 F2_0T2_4 F2_5T2_9 F3_0T3_4 F3_5T3_9 F4_0

TN1_0 FN1_0TN0_6 FN0_5TN0_1 F0_0T0_4 F0_5T0_9 F1_0T1_4 F1_5T1_9 F2_0T2_4 F2_5T2_9 F3_0T3_4 F3_5T3_9 F4_0


In [22]:
import os
import pandas as pd
import scipy.stats as stats
import scipy.optimize as opt
import numpy as np

In [2]:
raw_data_path = 'C:\\Users\\EEU227\\Documents\\IPython Notebooks\\DEEdata\\SPF_individual_forecasts'

if not os.path.isdir(raw_data_path + '\\CleanData'):
    os.mkdir(raw_data_path + '\\CleanData')

out_data = raw_data_path + '\\CleanData\\'

In [103]:
writer = pd.ExcelWriter(out_data + 'SPF_aggregate_histogram.xlsx')
years = [2013,2014,2015,2016]
quarters = [1,2,3,4]
for year in years:
    for q in quarters:
        f = str(year) + 'Q' + str(q)
        fname = f + '.csv'
        if os.path.isfile(raw_data_path + '\\' + fname):
            raw_df = pd.read_csv(raw_data_path + '\\' + fname,header = 1)
            # find the row where the growth expectations start
            dum = raw_df[raw_df['TARGET_PERIOD'] == 'GROWTH EXPECTATIONS; YEAR-ON-YEAR CHANGE IN REAL GDP'].index[0]
            mask_columns = ~raw_df.columns.str.contains('Unnamed')
            df = raw_df.iloc[0:dum-1,mask_columns]    
            df['source'] = str(year) + '-Q' + str(q)
            df = df.rename(columns={'TARGET_PERIOD':'target','FCT_SOURCE':'id','POINT':'point',
                                   'TN1_0':'[-2.0,-1.1]','FN1_0TN0_6':'[-1.0,-0.6]',
                                   'FN0_5TN0_1':'[-0.5,-0.1]','F0_0T0_4':'[0.0,0.4]',
                                   'F0_5T0_9':'[0.5,0.9]','F1_0T1_4':'[1.0,1.4]',
                                   'F1_5T1_9':'[1.5,1.9]','F2_0T2_4':'[2.0,2.4]',
                                   'F2_5T2_9':'[2.5,2.9]','F3_0T3_4':'[3.0,3.4]',
                                   'F3_5T3_9':'[3.5,3.9]','F4_0':'[4.0,5.0]'})
            df = df[['source','target','id','point','[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']]
            # remove rows where point is missing
            maskNaN = df.point.isnull()
            df = df[~maskNaN]                    
            df.fillna(0,inplace = True)
            for colname in df.columns[3:]:
                df[colname] = df[colname].astype('float')
            mask = df.columns.str.contains(',')
            # remove rows which do not have a histogram
            maskHist = df.loc[:,mask].apply(np.sum, axis=1)!=0.0
            df = df.loc[maskHist,:]
            grouped = df.groupby(['source','target'])
            agg_hist =grouped[['[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']].agg(np.mean,axis=0)
            agg_hist.to_excel(writer,f)
writer.save()

In [99]:
writer = pd.ExcelWriter(out_data + 'SPF_aggregate_histogram.xlsx')

In [100]:
agg_hist.to_excel(writer,f)

In [102]:
writer.save()

In [ ]:


In [ ]:


In [91]:
grouped = df.groupby(['source','target'])

In [92]:
agg_hist =grouped[['[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']].agg(np.mean,axis=0)

In [93]:
agg_hist


Out[93]:
[-2.0,-1.1] [-1.0,-0.6] [-0.5,-0.1] [0.0,0.4] [0.5,0.9] [1.0,1.4] [1.5,1.9] [2.0,2.4] [2.5,2.9] [3.0,3.4] [3.5,3.9] [4.0,5.0]
source target
2016-Q1 2016 0.413101 1.398154 6.563698 20.755563 36.195539 24.786644 7.592633 1.621901 0.447736 0.153583 0.051039 0.020409
2016Dec 0.195111 0.806320 4.042026 13.342081 28.040684 29.923766 15.302813 5.559588 1.886260 0.725573 0.132911 0.042867
2017 0.156856 0.477735 1.518767 6.433687 15.147962 29.303042 32.341059 10.275199 3.194858 0.981854 0.120430 0.048552
2017Dec 0.185830 0.564403 1.435289 6.079712 14.309872 26.287529 31.065046 12.809149 4.601514 2.036121 0.469910 0.155625
2018 0.257009 0.444428 1.476107 5.034666 12.351745 23.639347 33.018521 16.313100 5.195082 1.649630 0.443358 0.177007
2020 0.242110 0.345077 1.214783 3.674173 9.197805 22.013439 32.670633 18.657048 7.707559 2.970538 0.967409 0.339427

In [98]:
agg_hist.xs('2020',level=1).values.sum()


Out[98]:
100.00000000000001

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]: