Data Import - Testing

Class definitions

module DataImport

We want to import data directly from the ECB data warehouse: rather than going through the human-readable series page, we want to download the csv data itself. In fact, the ECB provides three different download formats (two csv variants, one generic and one for Excel) and one XML download.

There is also an sdmx query facility that allows more granular control over what data will be downloaded.

The URIs are as follows (most also allow https); a short sketch of assembling these URLs from a series key follows the list:

  • human readable series
    http://sdw.ecb.europa.eu/quickview.do?SERIES_KEY=-key-
  • csv file (generic and Excel format)
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=csv
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=xls
  • sdmx file
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmx
  • sdmx query and endpoint
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmxQuery
    http://sdw-ws.ecb.europa.eu/
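
As a minimal sketch, the generic csv export can be assembled and fetched directly from the URL pattern above; the helper name fetch_csv_sketch is illustrative only and not part of the DataImport module.

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

def fetch_csv_sketch(series_key):
    # assemble the generic csv export URL for the given series key
    # and return the raw csv content
    url = ('http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY='
           + series_key + '&type=csv')
    return urlopen(url).read()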

In [1]:
#!wget https://www.dropbox.com/s//DataImport.py -O DataImport.py
import DataImport as di
#help('DataImport')

module PDataFrame

that's a little side project that creates persistent data frames
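
To give an idea of the interface used further down (create / set / get / _df), here is a minimal sketch of what such a wrapper might look like; this is an assumption for illustration, and the actual PDataFrame module may well differ.

import pandas as pd

class PDataFrameSketch:
    # minimal persistent data frame: every change is written back to its csv file

    def __init__(self, filename):
        self._filename = filename
        self._df = pd.read_csv(filename, index_col=0)

    @staticmethod
    def create(filename, columns):
        # create an empty csv file with the given column names
        pd.DataFrame(columns=list(columns)).to_csv(filename)

    def set(self, key, values):
        # add or overwrite one row, then persist the frame to disk
        self._df.loc[key] = list(values)
        self._df.to_csv(self._filename)

    def get(self, key, column):
        # read a single field back from the frame
        return self._df.loc[key, column]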


In [2]:
#!wget https://www.dropbox.com/s//PDataFrame.py -O PDataFrame.py
import PDataFrame as pdf
#help('PDataFrame')

Testing

Bookmarks

that's another little side project: a file containing bookmarks for interesting series in the ECB database; it uses the PDataFrame class defined above.

Note that the following lines can generally be commented out: the whole idea here is that the bookmarks are kept in persistent storage (here the file DataImport.csv), so one only has to execute bm.set() once to add a new bookmark (provided the csv file is moved around together with this notebook).


In [3]:
#pdf.PDataFrame.create('DataImport.csv', ('key', 'description'))
#bm = pdf.PDataFrame('DataImport.csv')
#bm.set('deposit', ('ILM.W.U2.C.L022.U2.EUR', 'current usage of the deposit facility'))
#bm.set('lending', ('ILM.M.U2.C.A05B.U2.EUR', 'current aggregate usage of major lending facilities'))
#bm.set('lending_marg', ('ILM.W.U2.C.A055.U2.EUR', 'current usage of the marginal lending facility'))

just to check what bookmarks we have defined...


In [4]:
bm = pdf.PDataFrame('DataImport.csv')
bm._df


Out[4]:
              key                     description
deposit       ILM.W.U2.C.L022.U2.EUR  current usage of the deposit facility
lending       ILM.M.U2.C.A05B.U2.EUR  current aggregate usage of major lending facil...
lending_marg  ILM.W.U2.C.A055.U2.EUR  current usage of the marginal lending facility

3 rows × 2 columns

...and how to get the values back


In [5]:
bm.get('deposit', 'key')


Out[5]:
'ILM.W.U2.C.L022.U2.EUR'

Data

we fetch three data series, the ECB deposit facility, the ECB lending facility, and the ECB marginal lending facility, using the fetch method that takes the series key as its parameter (see below for an explanation of the skip_end parameter)


In [6]:
ei = di.ECBDataImport()
deposit = ei.fetch(bm.get('deposit', 'key'), skip_end=10)
lending = ei.fetch(bm.get('lending', 'key'))
lending_marg = ei.fetch(bm.get('lending_marg', 'key'))

the dataset returned contains a number of additional info items, for example a description


In [7]:
deposit.keys()


Out[7]:
dict_keys(['data', 'descr', 'head', 'descr_raw', 'key'])

In [8]:
deposit['descr']


Out[8]:
[['Dataset name', 'Internal Liquidity Management'],
 ['Frequency', 'Weekly'],
 ['Reference area', 'Euro area (changing composition)'],
 ['BS reference sector breakdown', 'Eurosystem'],
 ['Balance sheet item', 'Deposit facility'],
 ['Counterpart area', 'Euro area (changing composition)'],
 ['Currency of transaction', 'Euro']]

The time information is in a funny format (eg, "2008w21"), so we reformat the data tables into something that can be plotted, ie a float. For this we have the static method data_table, which takes the data and a reformatting function for the time. Normally it returns a 2-tuple, the first component being the time tuple and the second component being the value tuple.

If desired, an interpolation function can additionally be returned as the third component. This is necessary if we want to do operations on series that are not based on the same time values. We see this in the last line below: le[2] is the interpolation function for the lending series, and it is applied to dp[0], the time values of the deposit series. The two series are then on the same basis and can hence be subtracted (note that in fetch() we needed the skip_end parameter because the available deposit series extends further than the available lending series, which would make the interpolation fail).


In [9]:
unit = 1000000
dp = ei.data_table(deposit, ei.time_reformat1, unit)
le = ei.data_table(lending, ei.time_reformat2, unit, True)
lm = ei.data_table(lending_marg, ei.time_reformat1, unit)
diff = le[2](dp[0]) - dp[1]
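
The same interpolation step can also be reproduced with plain numpy, assuming the time and value components returned by data_table behave like float arrays; this is just a cross-check sketch, not part of DataImport.

import numpy as np

# evaluate the (monthly) lending series on the (weekly) deposit dates,
# then take the difference of the two series
lending_on_deposit_dates = np.interp(dp[0], le[0], le[1])
diff_check = lending_on_deposit_dates - dp[1]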

The functions for converting time are implemented as static methods on the object. For the time being there are two of them, one for weekly and one for monthly labels.


In [10]:
ei.time_reformat1("2010w2")


Out[10]:
(2010.0192307692307, ('w', 2010, 2))

In [11]:
ei.time_reformat2("2010mar")


Out[11]:
(2010.1666666666667, ('m', 2010, 3))
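
A possible implementation consistent with the two outputs above would map a weekly label to year + (week-1)/52 and a monthly label to year + (month-1)/12; the sketch below is an assumption for illustration, and the actual static methods in DataImport may differ.

def time_reformat1_sketch(s):
    # eg "2010w2" -> (2010.0192..., ('w', 2010, 2))
    year, week = s.split('w')
    year, week = int(year), int(week)
    return year + (week - 1) / 52.0, ('w', year, week)

def time_reformat2_sketch(s):
    # eg "2010mar" -> (2010.1666..., ('m', 2010, 3))
    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    year, month = int(s[:4]), months.index(s[4:].lower()) + 1
    return year + (month - 1) / 12.0, ('m', year, month)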

We can now plot the data series. Note that this would not have been trivial to do in Excel because one of the data series is monthly while the others are weekly.


In [12]:
plot(le[0], le[1])
plot(lm[0], lm[1])
plot(dp[0], dp[1])


Out[12]:
[<matplotlib.lines.Line2D at 0x7ff47adad9b0>]
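
The bare plot calls above assume the plotting function is available in the notebook's global namespace (presumably via matplotlib's pylab mode); with explicit imports the equivalent would look roughly like this, where the labels are simply guesses based on the variable names.

import matplotlib.pyplot as plt

plt.plot(le[0], le[1], label='lending')
plt.plot(lm[0], lm[1], label='marginal lending')
plt.plot(dp[0], dp[1], label='deposit')
plt.legend()
plt.show()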

In [13]:
plot(dp[0], diff)


Out[13]:
[<matplotlib.lines.Line2D at 0x7ff467022748>]