Structure of spf files

TARGET_PERIOD:

This column stands for the point (or period) in time to which the forecast refers to. Depending on the macroeconomic variable, the target period is given either as year (format “yyyy”), quarter (format “yyyyQq”) or month (format “yyyymmm”), where the month is a mixed-case three-character tag of calendar months in English.
1) a forecast for the current calendar year
2) a forecast for the next calendar year
3) a forecast for the calendar year after next
4) a longer term forecast (four calendar years ahead in the Q1 and Q2 rounds and five calendar years ahead in the Q3 and Q4 rounds)
5) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) one year ahead of the latest available observation (at the time of the survey)
6) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) two years ahead of the latest available observation (at the time of the survey).

FCT_SOURCE

This column stands for forecast source or forecaster ID, is the code number assigned to an individual forecaster. This number remains the same for a specific forecaster over all forecast rounds (only the subset of forecasters that were actually responding in a particular round is reported).

POINT

This column contains the point forecast of the forecaster for the macroeconomic indicator. For HICP it is year-on-year change in HICP

The following columns

These columns contain the probability assigned to each of the intervals of the forecasted variable. The headers of the columns specify the intervals

TN2_0: Inflation expectations; year-on-year change in HICP; probability for interval < -2.0 %
TN1_0: Inflation expectations; year-on-year change in HICP; probability for interval < -1.0 %
FN2_0TN1_6: Inflation expectations; year-on-year change in HICP; probability for interval from -2.0 to -1.6 %
FN1_5TN1_1: Inflation expectations; year-on-year change in HICP; probability for interval from -1.5 to -1.1 %
FN1_0TN0_6: Inflation expectations; year-on-year change in HICP; probability for interval from -1.0 to -0.6 %
FN0_5TN0_1: Inflation expectations; year-on-year change in HICP; probability for interval from -0.5 to -0.1 %
T0_0 Inflation expectations; year-on-year change in HICP; probability for interval < 0.0 %
F0_0T0_4: Inflation expectations; year-on-year change in HICP; probability for interval 0.0 - 0.4 %
F0_5T0_9: Inflation expectations; year-on-year change in HICP; probability for interval 0.5 - 0.9 %
F1_0T1_4: Inflation expectations; year-on-year change in HICP; probability for interval 1.0 - 1.4 %
F1_5T1_9: Inflation expectations; year-on-year change in HICP; probability for interval 1.5 - 1.9 %
F2_0T2_4: Inflation expectations; year-on-year change in HICP; probability for interval 2.0 - 2.4 %
F2_5T2_9: Inflation expectations; year-on-year change in HICP; probability for interval 2.5 - 2.9 %
F3_0T3_4: Inflation expectations; year-on-year change in HICP; probability for interval 3.0 - 3.4 %
F3_5T3_9: Inflation expectations; year-on-year change in HICP; probability for interval 3.5 - 3.9 %
F3_5 Inflation expectations; year-on-year change in HICP; probability for interval >= 3.5 %
F4_0: Inflation expectations; year-on-year change in HICP; probability for interval >= 4.0 %

TN1_0 FN1_0TN0_6 FN0_5TN0_1 F0_0T0_4 F0_5T0_9 F1_0T1_4 F1_5T1_9 F2_0T2_4 F2_5T2_9 F3_0T3_4 F3_5T3_9 F4_0



In [22]:

    
import os
import pandas as pd
import scipy.stats as stats
import scipy.optimize as opt
import numpy as np



In [2]:

    
raw_data_path = 'C:\\Users\\EEU227\\Documents\\IPython Notebooks\\DEEdata\\SPF_individual_forecasts'

if not os.path.isdir(raw_data_path + '\\CleanData'):
    os.mkdir(raw_data_path + '\\CleanData')

out_data = raw_data_path + '\\CleanData\\'



In [103]:

    
writer = pd.ExcelWriter(out_data + 'SPF_aggregate_histogram.xlsx')
years = [2013,2014,2015,2016]
quarters = [1,2,3,4]
for year in years:
    for q in quarters:
        f = str(year) + 'Q' + str(q)
        fname = f + '.csv'
        if os.path.isfile(raw_data_path + '\\' + fname):
            raw_df = pd.read_csv(raw_data_path + '\\' + fname,header = 1)
            # find the row where the growth expectations start
            dum = raw_df[raw_df['TARGET_PERIOD'] == 'GROWTH EXPECTATIONS; YEAR-ON-YEAR CHANGE IN REAL GDP'].index[0]
            mask_columns = ~raw_df.columns.str.contains('Unnamed')
            df = raw_df.iloc[0:dum-1,mask_columns]    
            df['source'] = str(year) + '-Q' + str(q)
            df = df.rename(columns={'TARGET_PERIOD':'target','FCT_SOURCE':'id','POINT':'point',
                                   'TN1_0':'[-2.0,-1.1]','FN1_0TN0_6':'[-1.0,-0.6]',
                                   'FN0_5TN0_1':'[-0.5,-0.1]','F0_0T0_4':'[0.0,0.4]',
                                   'F0_5T0_9':'[0.5,0.9]','F1_0T1_4':'[1.0,1.4]',
                                   'F1_5T1_9':'[1.5,1.9]','F2_0T2_4':'[2.0,2.4]',
                                   'F2_5T2_9':'[2.5,2.9]','F3_0T3_4':'[3.0,3.4]',
                                   'F3_5T3_9':'[3.5,3.9]','F4_0':'[4.0,5.0]'})
            df = df[['source','target','id','point','[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']]
            # remove rows where point is missing
            maskNaN = df.point.isnull()
            df = df[~maskNaN]                    
            df.fillna(0,inplace = True)
            for colname in df.columns[3:]:
                df[colname] = df[colname].astype('float')
            mask = df.columns.str.contains(',')
            # remove rows which do not have a histogram
            maskHist = df.loc[:,mask].apply(np.sum, axis=1)!=0.0
            df = df.loc[maskHist,:]
            grouped = df.groupby(['source','target'])
            agg_hist =grouped[['[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']].agg(np.mean,axis=0)
            agg_hist.to_excel(writer,f)
writer.save()



In [99]:

    
writer = pd.ExcelWriter(out_data + 'SPF_aggregate_histogram.xlsx')



In [100]:

    
agg_hist.to_excel(writer,f)



In [102]:

    
writer.save()



In [ ]:



In [ ]:



In [91]:

    
grouped = df.groupby(['source','target'])



In [92]:

    
agg_hist =grouped[['[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]', '[4.0,5.0]']].agg(np.mean,axis=0)



In [93]:

    
agg_hist









    Out[93]:






  
    
      
      
      [-2.0,-1.1]
      [-1.0,-0.6]
      [-0.5,-0.1]
      [0.0,0.4]
      [0.5,0.9]
      [1.0,1.4]
      [1.5,1.9]
      [2.0,2.4]
      [2.5,2.9]
      [3.0,3.4]
      [3.5,3.9]
      [4.0,5.0]
    
    
      source
      target
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      2016-Q1
      2016
      0.413101
      1.398154
      6.563698
      20.755563
      36.195539
      24.786644
      7.592633
      1.621901
      0.447736
      0.153583
      0.051039
      0.020409
    
    
      2016Dec
      0.195111
      0.806320
      4.042026
      13.342081
      28.040684
      29.923766
      15.302813
      5.559588
      1.886260
      0.725573
      0.132911
      0.042867
    
    
      2017
      0.156856
      0.477735
      1.518767
      6.433687
      15.147962
      29.303042
      32.341059
      10.275199
      3.194858
      0.981854
      0.120430
      0.048552
    
    
      2017Dec
      0.185830
      0.564403
      1.435289
      6.079712
      14.309872
      26.287529
      31.065046
      12.809149
      4.601514
      2.036121
      0.469910
      0.155625
    
    
      2018
      0.257009
      0.444428
      1.476107
      5.034666
      12.351745
      23.639347
      33.018521
      16.313100
      5.195082
      1.649630
      0.443358
      0.177007
    
    
      2020
      0.242110
      0.345077
      1.214783
      3.674173
      9.197805
      22.013439
      32.670633
      18.657048
      7.707559
      2.970538
      0.967409
      0.339427



In [98]:

    
agg_hist.xs('2020',level=1).values.sum()









    Out[98]:





100.00000000000001



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

		[-2.0,-1.1]	[-1.0,-0.6]	[-0.5,-0.1]	[0.0,0.4]	[0.5,0.9]	[1.0,1.4]	[1.5,1.9]	[2.0,2.4]	[2.5,2.9]	[3.0,3.4]	[3.5,3.9]	[4.0,5.0]
source	target
2016-Q1	2016	0.413101	1.398154	6.563698	20.755563	36.195539	24.786644	7.592633	1.621901	0.447736	0.153583	0.051039	0.020409
	2016Dec	0.195111	0.806320	4.042026	13.342081	28.040684	29.923766	15.302813	5.559588	1.886260	0.725573	0.132911	0.042867
	2017	0.156856	0.477735	1.518767	6.433687	15.147962	29.303042	32.341059	10.275199	3.194858	0.981854	0.120430	0.048552
	2017Dec	0.185830	0.564403	1.435289	6.079712	14.309872	26.287529	31.065046	12.809149	4.601514	2.036121	0.469910	0.155625
	2018	0.257009	0.444428	1.476107	5.034666	12.351745	23.639347	33.018521	16.313100	5.195082	1.649630	0.443358	0.177007
	2020	0.242110	0.345077	1.214783	3.674173	9.197805	22.013439	32.670633	18.657048	7.707559	2.970538	0.967409	0.339427