This column gives the point (or period) in time to which the forecast refers. Depending on the macroeconomic variable, the target period is given either as a year (format “yyyy”), a quarter (format “yyyyQq”) or a month (format “yyyymmm”), where the month is a mixed-case three-letter English abbreviation of the calendar month (e.g. “Dec”). For each variable, a survey round contains the following target periods (a short parsing sketch follows the list):
1) a forecast for the current calendar year
2) a forecast for the next calendar year
3) a forecast for the calendar year after next
4) a longer term forecast (four calendar years ahead in the Q1 and Q2 rounds and five calendar years ahead in the Q3 and Q4 rounds)
5) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) one year ahead of the latest available observation (at the time of the survey)
6) a “rolling horizon” forecast for the month (for HICP inflation and the unemployment rate) and quarter (for GDP growth) two years ahead of the latest available observation (at the time of the survey).
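As a rough illustration of these formats, the snippet below classifies a TARGET_PERIOD string as a yearly, quarterly or monthly target. The helper parse_target and its labels are purely illustrative assumptions, not part of the SPF files or of the cleaning code further down.
In [ ]:
import re

def parse_target(target):
    """Classify a TARGET_PERIOD string as 'year', 'quarter' or 'month' (illustrative helper)."""
    target = str(target).strip()
    if re.fullmatch(r'\d{4}', target):
        return 'year'      # e.g. '2016'
    if re.fullmatch(r'\d{4}Q[1-4]', target):
        return 'quarter'   # e.g. '2016Q3'
    if re.fullmatch(r'\d{4}[A-Z][a-z]{2}', target):
        return 'month'     # e.g. '2016Dec'
    return 'unknown'

[parse_target(t) for t in ['2016', '2016Q3', '2016Dec']]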
This column contains the forecast source, i.e. the forecaster ID: the code number assigned to an individual forecaster. The number remains the same for a given forecaster across all forecast rounds (only the subset of forecasters that actually responded in a particular round is reported).
This column contains the forecaster's point forecast for the macroeconomic indicator. For HICP it is the year-on-year change in the HICP.
These columns contain the probability assigned to each interval of the forecast variable. The column headers encode the intervals: “F” stands for “from”, “T” for “to”, “N” marks a negative bound and the underscore is a decimal point, so F0_5T0_9 is the interval from 0.5 to 0.9, TN1_0 the open interval below -1.0 and F4_0 the open interval from 4.0 upwards (a small decoding sketch follows the header list below).
TN1_0 FN1_0TN0_6 FN0_5TN0_1 F0_0T0_4 F0_5T0_9 F1_0T1_4 F1_5T1_9 F2_0T2_4 F2_5T2_9 F3_0T3_4 F3_5T3_9 F4_0
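The cleaning code below maps these header codes to interval labels by hand; as a sketch of the same naming convention, they can also be decoded programmatically. decode_bin is a hypothetical helper that only assumes the “F”/“T”/“N”/underscore scheme described above.
In [ ]:
import re

def decode_bin(code):
    """Turn a probability-bin header such as 'F0_5T0_9' into a readable label (illustrative helper)."""
    m = re.fullmatch(r'(?:F(N?\d+_\d+))?(?:T(N?\d+_\d+))?', code)
    if m is None:
        return code
    num = lambda tok: tok.replace('N', '-').replace('_', '.') if tok else None
    lo, hi = num(m.group(1)), num(m.group(2))
    if lo is None and hi is None:
        return code
    if lo is not None and hi is not None:
        return '[' + lo + ',' + hi + ']'
    if hi is not None:
        return '< ' + hi    # open lower bin, e.g. 'TN1_0'
    return '>= ' + lo       # open upper bin, e.g. 'F4_0'

[decode_bin(c) for c in ['TN1_0', 'FN1_0TN0_6', 'F0_0T0_4', 'F4_0']]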
In [22]:
import os
import pandas as pd
import scipy.stats as stats
import scipy.optimize as opt
import numpy as np
In [2]:
raw_data_path = 'C:\\Users\\EEU227\\Documents\\IPython Notebooks\\DEEdata\\SPF_individual_forecasts'
if not os.path.isdir(raw_data_path + '\\CleanData'):
    os.mkdir(raw_data_path + '\\CleanData')
out_data = raw_data_path + '\\CleanData\\'
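The path above is hard-coded for one particular Windows machine; a minimal, more portable sketch of the same directory setup (same folder names, purely illustrative) could build the paths with os.path.join instead of embedded backslashes.
In [ ]:
# portable variant of the directory setup above (same folder names, illustrative only)
raw_data_path = os.path.join('C:\\', 'Users', 'EEU227', 'Documents',
                             'IPython Notebooks', 'DEEdata', 'SPF_individual_forecasts')
clean_dir = os.path.join(raw_data_path, 'CleanData')
if not os.path.isdir(clean_dir):
    os.mkdir(clean_dir)
out_data = clean_dir + os.sep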
In [103]:
writer = pd.ExcelWriter(out_data + 'SPF_aggregate_histogram.xlsx')
years = [2013,2014,2015,2016]
quarters = [1,2,3,4]
for year in years:
    for q in quarters:
        f = str(year) + 'Q' + str(q)
        fname = f + '.csv'
        if os.path.isfile(raw_data_path + '\\' + fname):
            raw_df = pd.read_csv(raw_data_path + '\\' + fname, header=1)
            # find the row where the growth expectations start;
            # the HICP block sits above it and is the part we keep
            dum = raw_df[raw_df['TARGET_PERIOD'] == 'GROWTH EXPECTATIONS; YEAR-ON-YEAR CHANGE IN REAL GDP'].index[0]
            mask_columns = ~raw_df.columns.str.contains('Unnamed')
            df = raw_df.iloc[0:dum-1, mask_columns]
            df['source'] = str(year) + '-Q' + str(q)
            df = df.rename(columns={'TARGET_PERIOD':'target','FCT_SOURCE':'id','POINT':'point',
                                    'TN1_0':'[-2.0,-1.1]','FN1_0TN0_6':'[-1.0,-0.6]',
                                    'FN0_5TN0_1':'[-0.5,-0.1]','F0_0T0_4':'[0.0,0.4]',
                                    'F0_5T0_9':'[0.5,0.9]','F1_0T1_4':'[1.0,1.4]',
                                    'F1_5T1_9':'[1.5,1.9]','F2_0T2_4':'[2.0,2.4]',
                                    'F2_5T2_9':'[2.5,2.9]','F3_0T3_4':'[3.0,3.4]',
                                    'F3_5T3_9':'[3.5,3.9]','F4_0':'[4.0,5.0]'})
            df = df[['source','target','id','point','[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                     '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                     '[3.0,3.4]','[3.5,3.9]','[4.0,5.0]']]
            # remove rows where the point forecast is missing
            maskNaN = df.point.isnull()
            df = df[~maskNaN]
            df.fillna(0, inplace=True)
            # cast the point forecast and probability columns to float
            for colname in df.columns[3:]:
                df[colname] = df[colname].astype('float')
            mask = df.columns.str.contains(',')
            # remove rows which do not have a histogram (all probability bins zero)
            maskHist = df.loc[:, mask].apply(np.sum, axis=1) != 0.0
            df = df.loc[maskHist, :]
            # average the individual histograms for each target period
            grouped = df.groupby(['source','target'])
            agg_hist = grouped[['[-2.0,-1.1]','[-1.0,-0.6]','[-0.5,-0.1]',
                                '[0.0,0.4]','[0.5,0.9]','[1.0,1.4]','[1.5,1.9]','[2.0,2.4]','[2.5,2.9]',
                                '[3.0,3.4]','[3.5,3.9]','[4.0,5.0]']].agg(np.mean, axis=0)
            # write one sheet per survey round
            agg_hist.to_excel(writer, f)
writer.save()
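As a quick check on the workbook written above, one sheet can be read back and its row sums inspected: each aggregated histogram should sum to roughly 100, since the SPF probabilities are reported in percent. This is a minimal sketch, assuming the 2016Q1 round was among the processed files and a pandas version that accepts the sheet_name argument.
In [ ]:
# read one round back and check that each aggregated histogram sums to roughly 100
check = pd.read_excel(out_data + 'SPF_aggregate_histogram.xlsx',
                      sheet_name='2016Q1', index_col=[0, 1])
check.sum(axis=1)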
In [93]:
# inspect the aggregate histogram of the last processed round
agg_hist
In [98]:
# sanity check: the aggregated probabilities for the 2020 target should sum to roughly 100
agg_hist.xs('2020',level=1).values.sum()
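The same check can be run for every target period of the last processed round at once, rather than for a single target; a short sketch using the agg_hist frame from above:
In [ ]:
# row sums across all target periods of the last processed round (each should be close to 100)
agg_hist.sum(axis=1)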