Data Import - Testing

Class definitions

module DataImport

We want to import data directly from the ECB data warehouse: rather than going through the human-readable series page, we want to download the csv data itself. In fact, the ECB provides three different download formats (two csv variants, one generic and one for Excel) and one XML download.

There is also an sdmx query facility that allows more granular control over what data will be downloaded.

The URIs are as follows (most also allow https); a short sketch of assembling these URLs from a series key follows the list:

  • human readable series
    http://sdw.ecb.europa.eu/quickview.do?SERIES_KEY=-key-
  • csv file (generic and Excel format)
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=csv
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=xls
  • sdmx file
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmx
  • sdmx query and endpoint
    http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=-key-&type=sdmxQuery
    http://sdw-ws.ecb.europa.eu/
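
As a minimal sketch, the generic csv export can be assembled and fetched directly from the URL pattern above; the helper name fetch_csv_sketch is illustrative only and not part of the DataImport module.

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

def fetch_csv_sketch(series_key):
    # assemble the generic csv export URL for the given series key
    # and return the raw csv content
    url = ('http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY='
           + series_key + '&type=csv')
    return urlopen(url).read()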

In [1]:
#!wget https://www.dropbox.com/s//DataImport.py -O DataImport.py
import DataImport as di
#help('DataImport')

module PDataFrame

that's a little side project that creates persistent data frames
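
To give an idea of the interface used further down (create / set / get / _df), here is a minimal sketch of what such a wrapper might look like; this is an assumption for illustration, and the actual PDataFrame module may well differ.

import pandas as pd

class PDataFrameSketch:
    # minimal persistent data frame: every change is written back to its csv file

    def __init__(self, filename):
        self._filename = filename
        self._df = pd.read_csv(filename, index_col=0)

    @staticmethod
    def create(filename, columns):
        # create an empty csv file with the given column names
        pd.DataFrame(columns=list(columns)).to_csv(filename)

    def set(self, key, values):
        # add or overwrite one row, then persist the frame to disk
        self._df.loc[key] = list(values)
        self._df.to_csv(self._filename)

    def get(self, key, column):
        # read a single field back from the frame
        return self._df.loc[key, column]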


In [2]:
#!wget https://www.dropbox.com/s//PDataFrame.py -O PDataFrame.py
import PDataFrame as pdf
#help('PDataFrame')

Testing

Bookmarks

that's another little side project: a file containing bookmarks for interesting series in the ECB database; it uses the PDataFrame class defined above.

Note that the following lines can generally be commented out: the whole idea here is that the bookmarks are kept in persistent storage (here the file DataImport.csv), so one only has to execute bm.set() once to add a new bookmark (provided the csv file is moved around together with this notebook).


In [3]:
#pdf.PDataFrame.create('DataImport.csv', ('key', 'description'))
#bm = pdf.PDataFrame('DataImport.csv')
#bm.set('deposit', ('ILM.W.U2.C.L022.U2.EUR', 'current usage of the deposit facility'))
#bm.set('lending', ('ILM.M.U2.C.A05B.U2.EUR', 'current aggregate usage of major lending facilities'))
#bm.set('lending_marg', ('ILM.W.U2.C.A055.U2.EUR', 'current usage of the marginal lending facility'))

just to check what bookmarks we have defined...


In [4]:
bm = pdf.PDataFrame('DataImport.csv')
bm._df


Out[4]:
              key                     description
deposit       ILM.W.U2.C.L022.U2.EUR  current usage of the deposit facility
lending       ILM.M.U2.C.A05B.U2.EUR  current aggregate usage of major lending facil...
lending_marg  ILM.W.U2.C.A055.U2.EUR  current usage of the marginal lending facility

3 rows × 2 columns

...and how to get the values back


In [5]:
bm.get('deposit', 'key')


Out[5]:
'ILM.W.U2.C.L022.U2.EUR'

Data

we fetch three data series, the ECB deposit facility, the ECB lending facility, and the ECB marginal lending facility, using the fetch method that takes the series key as its parameter (see below for an explanation of the skip_end parameter)


In [6]:
ei = di.ECBDataImport()
deposit = ei.fetch(bm.get('deposit', 'key'), skip_end=10)
lending = ei.fetch(bm.get('lending', 'key'))
lending_marg = ei.fetch(bm.get('lending_marg', 'key'))

the dataset returned contains a number of additional info items, for example a description


In [7]:
deposit.keys()


Out[7]:
dict_keys(['data', 'descr', 'head', 'descr_raw', 'key'])

In [8]:
deposit['descr']


Out[8]:
[['Dataset name', 'Internal Liquidity Management'],
 ['Frequency', 'Weekly'],
 ['Reference area', 'Euro area (changing composition)'],
 ['BS reference sector breakdown', 'Eurosystem'],
 ['Balance sheet item', 'Deposit facility'],
 ['Counterpart area', 'Euro area (changing composition)'],
 ['Currency of transaction', 'Euro']]

The time information is in a funny format (eg, "2008w21"), so we reformat the data tables into something that can be plotted, ie a float. For this we have the static method data_table, which takes the data and a reformatting function for the time. Normally it returns a 2-tuple, the first component being the time tuple and the second component being the value tuple.

If desired, an interpolation function can additionally be returned as the third component. This is necessary if we want to do operations on series that are not based on the same time values. We see this in the last line below: le[2] is the interpolation function for the lending series, and it is applied to dp[0], the time values of the deposit series. The two series are then on the same basis and can hence be subtracted (note that in fetch() we needed the skip_end parameter because the available deposit series extends further than the available lending series, which would make the interpolation fail).


In [9]:
unit = 1000000
dp = ei.data_table(deposit, ei.time_reformat1, unit)
le = ei.data_table(lending, ei.time_reformat2, unit, True)
lm = ei.data_table(lending_marg, ei.time_reformat1, unit)
diff = le[2](dp[0]) - dp[1]
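
The same interpolation step can also be reproduced with plain numpy, assuming the time and value components returned by data_table behave like float arrays; this is just a cross-check sketch, not part of DataImport.

import numpy as np

# evaluate the (monthly) lending series on the (weekly) deposit dates,
# then take the difference of the two series
lending_on_deposit_dates = np.interp(dp[0], le[0], le[1])
diff_check = lending_on_deposit_dates - dp[1]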

The functions for converting time are implemented as static methods on the object. For the time being there are two of them, one for weekly and one for monthly labels.


In [10]:
ei.time_reformat1("2010w2")


Out[10]:
(2010.0192307692307, ('w', 2010, 2))

In [11]:
ei.time_reformat2("2010mar")


Out[11]:
(2010.1666666666667, ('m', 2010, 3))
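
A possible implementation consistent with the two outputs above would map a weekly label to year + (week-1)/52 and a monthly label to year + (month-1)/12; the sketch below is an assumption for illustration, and the actual static methods in DataImport may differ.

def time_reformat1_sketch(s):
    # eg "2010w2" -> (2010.0192..., ('w', 2010, 2))
    year, week = s.split('w')
    year, week = int(year), int(week)
    return year + (week - 1) / 52.0, ('w', year, week)

def time_reformat2_sketch(s):
    # eg "2010mar" -> (2010.1666..., ('m', 2010, 3))
    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    year, month = int(s[:4]), months.index(s[4:].lower()) + 1
    return year + (month - 1) / 12.0, ('m', year, month)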

We can now plot the data series. Note that this would not have been trivial to do in Excel because one of the data series is monthly while the others are weekly.


In [12]:
plot(le[0], le[1])
plot(lm[0], lm[1])
plot(dp[0], dp[1])


Out[12]:
[<matplotlib.lines.Line2D at 0x7ff47adad9b0>]
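
The bare plot calls above assume the plotting function is available in the notebook's global namespace (presumably via matplotlib's pylab mode); with explicit imports the equivalent would look roughly like this, where the labels are simply guesses based on the variable names.

import matplotlib.pyplot as plt

plt.plot(le[0], le[1], label='lending')
plt.plot(lm[0], lm[1], label='marginal lending')
plt.plot(dp[0], dp[1], label='deposit')
plt.legend()
plt.show()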

In [13]:
plot(dp[0], diff)


Out[13]:
[<matplotlib.lines.Line2D at 0x7ff467022748>]