Aggregation helper function in the Country Converter

The country converter (coco) includes a helper function which provides country classification concordances. The concordances are available in dict, vector and matrix format. The helper function can be used independently of the country classifications included in coco (see 'Small example' below), but can also make use of the different classifications provided by coco (see 'WIOD example' below). The aggregation function can also be used in Matlab to build concordance matrices (see 'Use in Matlab' at the end).

Small example

This example is based on a small country set co:



In [1]:

    
co = ['c1', 'c2', 'c3', 'c4', 'c5']

Assume these countries can be classified in two ways (e.g. one could be a geographic classification, the other a political one):



In [2]:

    
geographic_reg = {'c1': 'geo1', 'c2': 'geo1', 'c3': 'geo2', 'c4': 'geo3', 'c5': 'geo3'}
political_reg = {'c1': 'pol1', 'c3': 'pol1', 'c5': 'pol2'}

Single classification aggregation

First, we want to aggregate the countries co to political regions, assigning all non-classified countries to a Rest of the World region RoW:



In [3]:

    
import country_converter as coco

vector representation



In [4]:

    
concordance_vec = coco.agg_conc(original_countries=co,
                                aggregates=political_reg,
                                missing_countries='RoW',
                                as_dataframe='sparse')

This results in a concordance which maps countries co to the regions given in political_reg:



In [5]:

    
concordance_vec
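To make the content of this sparse concordance concrete, the following sketch rebuilds the same country-to-region mapping with plain Python. The helper simple_concordance is hypothetical and not part of coco; it only illustrates the idea of looking up each country and falling back to the missing_countries name.

```python
# Hypothetical helper (not part of country_converter): map each country to
# its aggregate and fall back to a default name for unclassified countries.
def simple_concordance(countries, aggregates, missing="RoW"):
    return [(country, aggregates.get(country, missing)) for country in countries]


co = ['c1', 'c2', 'c3', 'c4', 'c5']
political_reg = {'c1': 'pol1', 'c3': 'pol1', 'c5': 'pol2'}

pairs = simple_concordance(co, political_reg)
for original, aggregated in pairs:
    print(original, aggregated)
```

Each pair corresponds to one row (original, aggregated) of the sparse representation.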

Two other output formats are available:

full matrix representation



In [6]:

    
concordance_matrix = coco.agg_conc(original_countries=co,
                                   aggregates=political_reg,
                                   missing_countries='RoW',
                                   as_dataframe='full')
concordance_matrix

The numerical values are available in the pandas.DataFrame attribute values:



In [7]:

    
concordance_matrix.values

    Out[7]:

array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
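As a rough illustration of how such a matrix can be used, the sketch below builds the same 0/1 matrix from the country-to-region mapping in plain Python and uses it to sum per-country values into regional totals. The values list is made up for illustration, and the column order (sorted region names) is an assumption chosen to match the output above.

```python
co = ['c1', 'c2', 'c3', 'c4', 'c5']
concordance = {'c1': 'pol1', 'c2': 'RoW', 'c3': 'pol1', 'c4': 'RoW', 'c5': 'pol2'}

# Columns: sorted region names -> ['RoW', 'pol1', 'pol2']
regions = sorted(set(concordance.values()))

# One row per country, with a single 1.0 in the column of its region.
matrix = [[1.0 if concordance[c] == r else 0.0 for r in regions] for c in co]

# Made-up per-country values, aggregated as values (row vector) times matrix.
values = [10.0, 20.0, 30.0, 40.0, 50.0]
aggregated = [
    sum(v * row[j] for v, row in zip(values, matrix))
    for j in range(len(regions))
]
print(dict(zip(regions, aggregated)))
```

Because every country is assigned to exactly one region, each row of the matrix sums to one and the regional totals add up to the total of the original values.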

dictionary representation



In [8]:

    
concordance_dict = coco.agg_conc(original_countries=co,
                                 aggregates=political_reg,
                                 missing_countries='RoW',
                                 as_dataframe=False)
concordance_dict

    Out[8]:

OrderedDict([('c1', 'pol1'),
             ('c2', 'RoW'),
             ('c3', 'pol1'),
             ('c4', 'RoW'),
             ('c5', 'pol2')])

Multiple classification aggregations

The aggregates parameter of coco.agg_conc accepts several classifications for building the concordance. This allows for a sequential aggregation.

Based on the two classifications above, geographic_reg and political_reg, we can build a concordance which first aggregates into political regions and subsequently assigns all countries not included in a political region to geographic regions:



In [9]:

    
conc_pol_geo = coco.agg_conc(original_countries=co,
                             aggregates=[political_reg, geographic_reg],
                             missing_countries='RoW',
                             as_dataframe='sparse')
conc_pol_geo

The order of the classifications is important here, as subsequent classifications are only applied to countries not already classified:



In [10]:

    
conc_geo_pol = coco.agg_conc(original_countries=co,
                             aggregates=[geographic_reg, political_reg],
                             missing_countries='RoW',
                             as_dataframe='sparse')
conc_geo_pol

This can be used to classify into different political regions (see the WIOD example below) and to exclude certain countries from the classification, for example:



In [11]:

    
conc_c2_pol_geo = coco.agg_conc(original_countries=co,
                                aggregates=[{'c2': 'c2'}, political_reg, geographic_reg],
                                missing_countries='RoW',
                                as_dataframe='sparse')
conc_c2_pol_geo
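The sequential logic described in this section can be sketched without coco as follows. The function sequential_concordance is a hypothetical stand-in for illustration only, not the library's implementation; it assumes that every classification is a plain dict and that the first classification containing a country wins.

```python
# Hypothetical sketch of sequential aggregation (not the coco implementation).
def sequential_concordance(countries, aggregates, missing="RoW"):
    result = {}
    for country in countries:
        result[country] = missing
        for classification in aggregates:
            if country in classification:
                # The first classification that knows the country wins.
                result[country] = classification[country]
                break
    return result


co = ['c1', 'c2', 'c3', 'c4', 'c5']
geographic_reg = {'c1': 'geo1', 'c2': 'geo1', 'c3': 'geo2', 'c4': 'geo3', 'c5': 'geo3'}
political_reg = {'c1': 'pol1', 'c3': 'pol1', 'c5': 'pol2'}

pol_geo = sequential_concordance(co, [political_reg, geographic_reg])
geo_pol = sequential_concordance(co, [geographic_reg, political_reg])
c2_pol_geo = sequential_concordance(co, [{'c2': 'c2'}, political_reg, geographic_reg])

print(pol_geo)
print(geo_pol)
print(c2_pol_geo)
```

Since geographic_reg covers all five countries, putting it first leaves nothing for political_reg to classify, whereas putting a single-country dict first keeps c2 as its own region.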

WIOD Example

Besides passing custom dictionaries with countries and classifications, coco.agg_conc also accepts strings corresponding to any classification available in coco. For example, to aggregate the WIOD countries into EU, OECD and 'OTHER' use:



In [12]:

    
conc_wiod_3reg = coco.agg_conc(original_countries='WIOD',
                               aggregates=['EU', 'OECD'],
                               missing_countries='OTHER',
                               as_dataframe='sparse')



In [13]:

    
conc_wiod_3reg.head(10)

WIOD already includes a RoW region, which consists of EU and non-EU countries. This leads to the compound name (None_&_EU) for this region in the output of conc_wiod_3reg.head(10). To set such many-to-many matches to None and pass them through until a unique match is found, set merge_multiple_string to None:



In [14]:

    
conc_wiod_3reg = coco.agg_conc(original_countries='WIOD',
                               aggregates=['EU', 'OECD'],
                               missing_countries='OTHER',
                               merge_multiple_string=None,
                               as_dataframe='sparse')
conc_wiod_3reg.head(10)
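The effect of merge_multiple_string can be illustrated with a small sketch. The function resolve_matches and the match list for RoW are assumptions made for illustration, not taken from the coco source; the sketch only shows the idea of joining several matches into one compound name or discarding them when None is passed.

```python
# Hypothetical sketch (not the coco implementation): reduce a list of
# matches for one original entry to a single aggregate name.
def resolve_matches(matches, merge_multiple_string="_&_"):
    unique = list(dict.fromkeys(matches))  # drop duplicates, keep order
    if not unique:
        return None
    if len(unique) == 1:
        return unique[0]
    if merge_multiple_string is None:
        # Ambiguous match: leave it unresolved for the next classification.
        return None
    return merge_multiple_string.join(str(m) for m in unique)


# Illustrative assumption: RoW spans countries outside the EU (None) and inside it.
row_matches = [None, 'EU']

print(resolve_matches(row_matches))
print(resolve_matches(row_matches, merge_multiple_string=None))
print(resolve_matches(['EU', 'EU']))
```

With the default string the ambiguous entry becomes a compound name, while passing None leaves it unresolved so that a later classification or the missing_countries name can fill it in.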

Internally, the dicts required for the aggregation are built by calling CountryConverter.get_correspondance_dict(original_class, target_class). If fine-grained control over the aggregation is necessary, call this low-level function directly and modify the resulting dict.

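As in the small example above, single countries can be separated by passing a custom dict. Note that in the output below AUS and DEU are still assigned to OECD: the keys 'Deu' and 'Aus' do not match the upper-case WIOD entries DEU and AUS, so the custom dict has no effect here. The keys presumably need to match the original entries exactly (e.g. 'DEU' and 'AUS'):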



In [15]:

    
conc_wiod_DE_Aus_OECD = coco.agg_conc(original_countries='WIOD',
                                      aggregates=[{'Deu': 'Deu', 'Aus': 'Aus'}, 'OECD'],
                                      missing_countries='Non_OECD',
                                      merge_multiple_string=None,
                                      as_dataframe='sparse')
conc_wiod_DE_Aus_OECD.head(16)

    Out[15]:

	original	aggregated
0	RoW	Non_OECD
1	AUS	OECD
2	AUT	OECD
3	BEL	OECD
4	BRA	Non_OECD
5	BGR	Non_OECD
6	CAN	OECD
7	CHN	Non_OECD
8	CYP	Non_OECD
9	CZE	OECD
10	DNK	OECD
11	EST	OECD
12	FIN	OECD
13	FRA	OECD
14	DEU	OECD
15	GRC	OECD

Use in Matlab

Python functions can be called directly from Matlab. However, Matlab follows its trademark and makes simple things complicated. To obtain the above concordance conc_wiod_3reg in Matlab, use:

conc_wiod_3reg_dataframe = py.country_converter.agg_conc(...
    pyargs('original_countries','WIOD', ...
           'aggregates',py.list({'EU', 'OECD'}), ...
           'missing_countries', 'OTHER', ...
           'merge_multiple_string', py.None, ...
           'as_dataframe', 'full'));

This returns the data as a pandas DataFrame. Since Matlab does not provide any function for converting pandas DataFrames or NumPy arrays to double, two additional steps are required to get a valid Matlab array:

temp = cellfun(@cell, cell(conc_wiod_3reg_dataframe.values.tolist), 'UniformOutput', false);
conc_matrix = cell2mat(vertcat(temp{:}));

Since Matlab only works with numerical values, all names provided in the original DataFrame are lost.

Additional dicts for the aggregates list can be built in Matlab, for example with:

spec_country_dict = py.dict(pyargs('Aus', 'Aus', 'Deu', 'Deu'))

Further details

For more details about agg_conc see the docstring:



In [16]:

    
help(coco.agg_conc)

Help on function agg_conc in module country_converter.country_converter:

agg_conc(original_countries, aggregates, missing_countries='test', merge_multiple_string='_&_', log_missing_countries=None, log_merge_multiple_strings=None, coco=None, as_dataframe='sparse', original_countries_class=None)
    Builds an aggregation concordance dict, vec or matrix
    
    Parameters
    ----------
    
    original_countries: list or str
        List of countries to aggregated, also accepts and valid column name of
        CountryConverter.data (e.g. name_short, WIOD, ...)
    
    aggregates: list of dict or str
        List of aggregation information. This can either be dict mapping the
        names of 'original_countries' to aggregates, or a valid column name of
        CountryConverter.data Aggregation happens in order given in this
        parameter.  Thus, country assigned to an aggregate are not re-assigned
        by the following aggregation information.
    
    missing_countries: str, boolean, None
        Entry to fill in for countries in 'original_countries' which do not
        appear in 'aggregates'.  str: Use the given name for all missing
        countries True: Use the name in original_countries for missing
        countries False: Skip these countries None: Use None for these
        countries
    
    merge_multiple_string: str or None, optional
        If multiple correspondance entries are given in one of the aggregates
        join them with the given string (default: '_&_'.  To skip these enries,
        pass None.
    
    log_missing_countries: function, optional
        This function is called with country is country is in
        'original_countries' but missing in all 'aggregates'.
        For example, pass
        lambda x: logging.error('Country {} missing'.format(x))
        to log errors for such countries.  Default: do nothing
    
    log_merge_multiple_strings: function, optional
        Function to call for logging multiple strings, see
        log_missing_countries Default: do nothing
    
    coco: instance of CountryConverter, optional
        CountryConverter instance used for the conversion.  Pass a custom one
        if additional data is needed in addition to the custom country
        converter file.  If None (default), the bare CountryConverter is used
    
    as_dataframe: boolean or st, optional
        If False, output as OrderedDict.  If True or str, output as pandas
        dataframe.  If str and 'full', output as a full matrix, otherwise only
        two collumns with the original and aggregated names are returned.
    
    original_countries_class: str, optional
        Valid column name of CountryConverter.data.  This parameter is needed
        if a list of countries is passed to 'orginal_countries' and strings
        corresponding to data in CountryConverter.data are used subsequently.
        Can be omitted otherwise.
    
    Returns
    -------
    
    OrderedDict or DataFrame (defined by 'as_dataframe')

Output of In [6] (concordance_matrix):

aggregated	RoW	pol1	pol2
original
c1	0.0	1.0	0.0
c2	1.0	0.0	0.0
c3	0.0	1.0	0.0
c4	1.0	0.0	0.0
c5	0.0	0.0	1.0

Output of In [13] (conc_wiod_3reg.head(10)):

	original	aggregated
0	RoW	None_&_EU
1	AUS	OECD
2	AUT	EU
3	BEL	EU
4	BRA	OTHER
5	BGR	EU
6	CAN	OECD
7	CHN	OTHER
8	CYP	EU
9	CZE	EU
