DataImport
We want to import data directly from the ECB data warehouse, so for example rather than going to the series we want to download the csv data. In fact, the ECB provides three different download format (two csv's, one generic and one for Excel) and one XML download.
There is is also an sdmx query facility that allows more granula control over what data will be downloaded.
The URI's are as follows (most also allow https):
http://sdw.ecb.europa.eu/quickview.do?SERIES_KEY=-key-
http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=csv
http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=xls
http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmx
http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmxQuery
http://sdw-ws.ecb.europa.eu/
In [1]:
#!wget https://www.dropbox.com/s//DataImport.py -O DataImport.py
import DataImport as di
#help('DataImport')
In [2]:
#!wget https://www.dropbox.com/s//PDataFrame.py -O PDataFrame.py
import PDataFrame as pdf
#help('PDataFrame')
that's a little side project, which is to create a file containing bookmarks for interesting series in the ECB database; it uses the PDataFrame
class defined above.
Note that the following lines can generally be commented out: the whole idea here is that the bookmarks are kept in persistant storage (here the file ECB_DataSeries.csv
) so one only has to execute bm.set()
once to add a new bookmark (provided the csv file is being moved around with this note book)
In [3]:
#pdf.PDataFrame.create('DataImport.csv', ('key', 'description'))
#bm = pdf.PDataFrame('DataImport.csv')
#bm.set('deposit', ('ILM.W.U2.C.L022.U2.EUR', 'current usage of the deposit facility'))
#bm.set('lending', ('ILM.M.U2.C.A05B.U2.EUR', 'current aggregate usage of major lending facilities'))
#bm.set('lending_marg', ('ILM.W.U2.C.A055.U2.EUR', 'current usage of the marginal lending facility')
just to check what bookmarks we have defined...
In [4]:
bm = pdf.PDataFrame('DataImport.csv')
bm._df
Out[4]:
...and how to get the values back
In [5]:
bm.get('deposit', 'key')
Out[5]:
In [6]:
ei = di.ECBDataImport()
deposit = ei.fetch(bm.get('deposit', 'key'), skip_end=10)
lending = ei.fetch(bm.get('lending', 'key'))
lending_marg = ei.fetch(bm.get('lending_marg', 'key'))
the dataset returned contains a number of additonal info items, for example a description
In [7]:
deposit.keys()
Out[7]:
In [8]:
deposit['descr']
Out[8]:
The time information is in a funny format eg (eg, "2008w21"). So we then reformat the datatables into something that can be plotted, ie a float. For this we have the static method data_table
that takes the data and a reformatting function for the time. Normally it returns a 2-tuple, the first component being the time-tuple, the second component being the value-tuple
If desired, additionally an interpolation function can be return as the third component. This is necessary if we want to do operations on series that are not based on the same time values. We see this in the last line below: le[2]
is the interpolation function for the lending, and it is applied to dp[0]
which are the time values for the deposit function. Now the two series are on the same basis and can hence be substracted (note that in fetch()
we needed the skip_end
parameter, because the available deposit data series goes further than the available lending series, which makes the interpolation fail).
In [9]:
unit = 1000000
dp = ei.data_table(deposit, ei.time_reformat1, unit)
le = ei.data_table(lending, ei.time_reformat2, unit, True)
lm = ei.data_table(lending_marg, ei.time_reformat1, unit)
diff = le[2](dp[0]) - dp[1]
The functions for converting time are implemented as static methods on the object. For the time being there are two of them
In [10]:
ei.time_reformat1("2010w2")
Out[10]:
In [11]:
ei.time_reformat2("2010mar")
Out[11]:
We now can plot the data series. Note that that would not have been that trivial to do in Excel because one of the data series is monthly, the other one is weekly
In [12]:
plot(le[0], le[1])
plot(lm[0], lm[1])
plot(dp[0], dp[1])
Out[12]:
In [13]:
plot(dp[0], diff)
Out[13]: