The country converter (coco) includes a helper function, agg_conc, which provides country classification concordances in dict, vector and matrix format. The helper function can be used independently of the country classifications included in coco (see 'Small example' below), but can also make use of the classifications provided by coco ('WIOD example' below). It can also be called from Matlab to build concordance matrices (see 'Use in Matlab' at the end).
This example is based on a small country set co:
In [1]:
co = ['c1', 'c2', 'c3', 'c4', 'c5']
Assume these countries can be classified in two ways (e.g. one geographic classification and one political classification):
In [2]:
geographic_reg = {'c1': 'geo1', 'c2': 'geo1', 'c3': 'geo2', 'c4': 'geo3', 'c5': 'geo3'}
political_reg = {'c1': 'pol1', 'c3': 'pol1', 'c5': 'pol2'}
First, we want to aggregate the countries co to political regions, assigning all unclassified countries to a Rest of the World region RoW:
In [3]:
import country_converter as coco
vector representation
In [4]:
concordance_vec = coco.agg_conc(original_countries=co,
aggregates=political_reg,
missing_countries='RoW',
as_dataframe='sparse')
This results in a concordance which maps countries co to the regions given in political_reg:
In [5]:
concordance_vec
Out[5]:
Two other output formats are available:
full matrix representation
In [6]:
concordance_matrix = coco.agg_conc(original_countries=co,
aggregates=political_reg,
missing_countries='RoW',
as_dataframe='full')
concordance_matrix
Out[6]:
The numerical values are available in the pandas.DataFrame attribute values:
In [7]:
concordance_matrix.values
Out[7]:
dictionary representation
In [8]:
concordance_dict = coco.agg_conc(original_countries=co,
aggregates=political_reg,
missing_countries='RoW',
as_dataframe=False)
concordance_dict
Out[8]:
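The three formats carry the same information. As a rough illustration in plain Python (this is not how coco builds its output; the helper name dict_to_matrix, the alphabetical column order and the data values are assumptions made for this sketch), a country-to-region dict can be expanded into a 0/1 matrix with one row per country and one column per region. Such a matrix can then be used to sum country-level data into regions:

```python
def dict_to_matrix(countries, conc):
    """Expand a country -> region dict into a 0/1 matrix (list of rows).

    Hypothetical helper for illustration only, not part of coco.
    """
    regions = sorted(set(conc.values()))
    matrix = [[1 if conc[c] == r else 0 for r in regions] for c in countries]
    return regions, matrix


co = ['c1', 'c2', 'c3', 'c4', 'c5']
# Country -> region mapping for the political example above (c2 and c4 fall into RoW)
conc = {'c1': 'pol1', 'c2': 'RoW', 'c3': 'pol1', 'c4': 'RoW', 'c5': 'pol2'}
regions, matrix = dict_to_matrix(co, conc)

# Made-up country-level data, in the same order as co
values = [10, 20, 30, 40, 50]

# Region totals: for each column, sum the values of the countries marked with 1
totals = [sum(row[j] * v for row, v in zip(matrix, values))
          for j in range(len(regions))]
region_totals = dict(zip(regions, totals))
print(region_totals)
```

Each row of the matrix contains exactly one 1, because every country belongs to exactly one region after the aggregation.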
The aggregates parameter of coco.agg_conc accepts several classifications for building the concordance. This allows for a sequential aggregation.
Based on the two classifications above, geographic_reg and political_reg, we can build a concordance which first aggregates into political regions and subsequently aggregates all countries not included in a political region into geographic regions:
In [9]:
conc_pol_geo = coco.agg_conc(original_countries=co,
aggregates=[political_reg, geographic_reg],
missing_countries='RoW',
as_dataframe='sparse')
conc_pol_geo
Out[9]:
The order of the classifications is important here, as subsequent classifications are only applied to countries not already classified:
In [10]:
conc_geo_pol = coco.agg_conc(original_countries=co,
aggregates=[geographic_reg, political_reg],
missing_countries='RoW',
as_dataframe='sparse')
conc_geo_pol
Out[10]:
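The effect of the order can be reproduced with a few lines of plain Python. The following is a simplified sketch of the rule described above, not coco's actual implementation; the function name sequential_agg is made up for this illustration:

```python
def sequential_agg(countries, aggregates, missing='RoW'):
    """Assign each country the region from the first classification containing it.

    Hypothetical helper for illustration only, not part of coco.
    """
    result = {}
    for country in countries:
        for classification in aggregates:
            if country in classification:
                result[country] = classification[country]
                break
        else:
            # No classification contains this country
            result[country] = missing
    return result


co = ['c1', 'c2', 'c3', 'c4', 'c5']
geographic_reg = {'c1': 'geo1', 'c2': 'geo1', 'c3': 'geo2', 'c4': 'geo3', 'c5': 'geo3'}
political_reg = {'c1': 'pol1', 'c3': 'pol1', 'c5': 'pol2'}

pol_first = sequential_agg(co, [political_reg, geographic_reg])
geo_first = sequential_agg(co, [geographic_reg, political_reg])
print(pol_first)
print(geo_first)
```

Since geographic_reg covers every country in co, placing it first means political_reg is never consulted, whereas placing political_reg first leaves only c2 and c4 for the geographic classification.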
This ordering can be used to classify into different political regions (see the WIOD example below) and to exclude certain countries from the classification, for example:
In [11]:
conc_c2_pol_geo = coco.agg_conc(original_countries=co,
aggregates=[{'c2': 'c2'}, political_reg, geographic_reg],
missing_countries='RoW',
as_dataframe='sparse')
conc_c2_pol_geo
Out[11]:
Besides custom dictionaries with countries and classifications, coco.agg_conc also accepts strings corresponding to any classification available in coco. For example, to aggregate the WIOD countries into EU, OECD and 'OTHER' use:
In [12]:
conc_wiod_3reg = coco.agg_conc(original_countries='WIOD',
aggregates=['EU', 'OECD'],
missing_countries='OTHER',
as_dataframe='sparse')
In [13]:
conc_wiod_3reg.head(10)
Out[13]:
WIOD already includes a ROW region, which consists of EU and non-EU countries. This leads to the compound name for the region shown above. To set such many-to-many matches to None and pass them through until a unique match is found, set merge_multiple_string to None:
In [14]:
conc_wiod_3reg = coco.agg_conc(original_countries='WIOD',
aggregates=['EU', 'OECD'],
missing_countries='OTHER',
merge_multiple_string=None,
as_dataframe='sparse')
conc_wiod_3reg.head(10)
Out[14]:
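The idea behind merge_multiple_string can be sketched in plain Python. This is only an illustration of the behaviour described above and not coco's code: the function name resolve, the separator '_&_', the sorted join order and the region labels are all assumptions made for this example.

```python
def resolve(matches, merge_multiple_string='_&_'):
    """Reduce a list of candidate regions to one label, or None.

    Hypothetical helper for illustration only, not part of coco.
    """
    unique = sorted(set(matches))
    if len(unique) == 1:
        # Unambiguous match
        return unique[0]
    if not unique or merge_multiple_string is None:
        # No match, or ambiguous match that should be passed on unresolved
        return None
    # Ambiguous match: build a compound name
    return merge_multiple_string.join(unique)


print(resolve(['regA', 'regA']))        # single region
print(resolve(['regA', 'regB']))        # compound name
print(resolve(['regA', 'regB'], None))  # left unresolved for the next classification
```

A None result marks the entry as not yet classified, so a later classification in the aggregates list, or finally missing_countries, can assign it.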
Internally, the dicts required for the aggregation are built by calling CountryConverter.get_correspondance_dict(original_class, target_class). If fine-grained control over the aggregation is necessary, call this low-level function directly and modify the resulting dict.
As in the small example given above, single countries can be separated by passing a custom dict:
In [15]:
conc_wiod_DE_Aus_OECD = coco.agg_conc(original_countries='WIOD',
aggregates=[{'Deu': 'Deu', 'Aus': 'Aus'}, 'OECD'],
missing_countries='Non_OECD',
merge_multiple_string=None,
as_dataframe='sparse')
conc_wiod_DE_Aus_OECD.head(16)
Out[15]:
Python functions can be called directly from Matlab. However, Matlab follows its trademark and makes simple things complicated. To build the above concordance conc_wiod_3reg in Matlab use:
conc_wiod_3reg_dataframe = py.country_converter.agg_conc(...
pyargs('original_countries','WIOD', ...
'aggregates',py.list({'EU', 'OECD'}), ...
'missing_countries', 'OTHER', ...
'merge_multiple_string', py.None, ...
'as_dataframe', 'full'));
This returns the data as a pandas DataFrame. Since Matlab does not provide any function for converting pandas DataFrames or NumPy arrays to double, two additional steps are required to get a valid Matlab array:
temp = cellfun(@cell,cell(conc_wiod_3reg_dataframe.values.tolist),'un',0);
conc_matrix = cell2mat(vertcat(temp {:}));
Since Matlab only works with numerical values, all names provided in the original DataFrame are lost.
Additional dicts for the aggregates list can be built in Matlab with, for example:
spec_country_dict = py.dict(pyargs('Aus', 'Aus', 'Deu', 'Deu'))
For more details about agg_conc see the docstring:
In [16]:
help(coco.agg_conc)