Building the country information files

The DARIAH app visualizes the number of contributions per member country on a map.

We show the map using Leaflet, which loads files containing the boundaries. These files are in geojson format.

Here we bundle all the necessary information for all European countries into one file.

Per country that is:

  • country code (ISO 2 letter)
  • latitude and longitude (where to place markers or other features)
  • geojson polygons, representing the boundaries

We have obtained the data from the GitHub repo mledoze/countries. We use these files:

  • dist/countries_unescaped.json
  • data/ccc.geo.json (where ccc is the three letter code of a country)

We have manually compiled a selection of European countries from

  • dist/countries.csv

and transformed it to the file

  • europe_countries.csv (with only the name, the 2 letter and 3 letter codes of the country)

The bundle we are producing will be a geojson file containing as little information as possible. We will also round the coordinates and weed out duplicate points, in order to reduce the file size.
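As an illustration of what one entry in the bundle could look like, here is a minimal sketch of a single feature. The property names (iso2, lat, lng) follow the cell that writes the output file; the polygon coordinates are made up for the example and are not real border data.

```python
import json

# One feature of the bundle. The coordinates below are invented for illustration.
feature = {
    'type': 'Feature',
    'properties': {'iso2': 'LU', 'lat': 49.8, 'lng': 6.2},
    'geometry': {
        'type': 'Polygon',
        # one linear ring of (longitude, latitude) pairs; last point equals first
        'coordinates': [[[6.0, 50.2], [6.5, 49.8], [5.8, 49.5], [6.0, 50.2]]],
    },
}

# The bundle is a FeatureCollection holding one such feature per country.
bundle = {'type': 'FeatureCollection', 'features': [feature]}
print(json.dumps(bundle))
```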

NB:

For Kosovo we have made manual adjustments:

  • we downloaded a geojson file from elsewhere
  • we used KOS as a temporary three letter code

In [1]:
EU_FILE = 'europe_countries.csv'
GEO_DIR = 'geojson'
COUNTRIES = 'all_countries.json'
OUTFILE = '../../../client/src/js/helpers/europe.geo.js'
CENTER_PRECISION = 1

In [2]:
import collections, json

Read the list of European countries


In [3]:
eu_countries = {}
with open(EU_FILE) as f:
    for line in f:
        if line[0] == '#': continue
        fields = line.strip().split(';')
        if len(fields) == 3:
            (name, iso2, iso3) = fields
            eu_countries[iso2] = dict(iso3=iso3, name=name)
for (i, (iso2, info)) in enumerate(sorted(eu_countries.items())):
    print('{:>2} {} {} {}'.format(i+1, iso2, info['iso3'], info['name']))


 1 AD AND Andorra
 2 AL ALB Albania
 3 AM ARM Armenia
 4 AT AUT Austria
 5 AZ AZE Azerbaijan
 6 BA BIH Bosnia and Herzegovina
 7 BE BEL Belgium
 8 BG BGR Bulgaria
 9 BY BLR Belarus
10 CH CHE Switzerland
11 CY CYP Cyprus
12 CZ CZE Czech Republic
13 DE DEU Germany
14 DK DNK Denmark
15 EE EST Estonia
16 ES ESP Spain
17 FI FIN Finland
18 FR FRA France
19 GB GBR United Kingdom
20 GE GEO Georgia
21 GR GRC Greece
22 HR HRV Croatia
23 HU HUN Hungary
24 IE IRL Ireland
25 IS ISL Iceland
26 IT ITA Italy
27 LI LIE Liechtenstein
28 LT LTU Lithuania
29 LU LUX Luxembourg
30 LV LVA Latvia
31 MC MCO Monaco
32 MD MDA Moldova
33 ME MNE Montenegro
34 MK MKD Macedonia
35 MT MLT Malta
36 NL NLD Netherlands
37 NO NOR Norway
38 PL POL Poland
39 PT PRT Portugal
40 RO ROU Romania
41 RS SRB Serbia
42 RU RUS Russia
43 SE SWE Sweden
44 SI SVN Slovenia
45 SK SVK Slovakia
46 SM SMR San Marino
47 TR TUR Turkey
48 UA UKR Ukraine
49 UZ UZB Uzbekistan
50 VA VAT Vatican City
51 XK KOS Kosovo

Read and filter the country file


In [4]:
with open(COUNTRIES) as f:
    countries = json.load(f)
print('Total number of countries: {}'.format(len(countries)))
i = 0
coord_fmt = '{{:>{}.{}f}}'.format(4+CENTER_PRECISION, CENTER_PRECISION)
pair_fmt = '({}, {})'.format(coord_fmt, coord_fmt)
line_fmt = '{{:>2}} {{}} {} {{}}'.format(pair_fmt)

for country in countries:
    iso2 = country['cca2']
    if iso2 in eu_countries:
        i += 1
        (lat, lng) = country['latlng']
        info = eu_countries[iso2]
        info['lat'] = round(lat, CENTER_PRECISION)
        info['lng'] = round(lng, CENTER_PRECISION)
print('Found info for {} European countries'.format(i))
for (i, (iso2, info)) in enumerate(sorted(eu_countries.items())):
    print(line_fmt.format(
        i+1, iso2,
        info['lat'], info['lng'],
        info['name'],
    ))


Total number of countries: 248
Found info for 51 European countries
 1 AD ( 42.5,   1.5) Andorra
 2 AL ( 41.0,  20.0) Albania
 3 AM ( 40.0,  45.0) Armenia
 4 AT ( 47.3,  13.3) Austria
 5 AZ ( 40.5,  47.5) Azerbaijan
 6 BA ( 44.0,  18.0) Bosnia and Herzegovina
 7 BE ( 50.8,   4.0) Belgium
 8 BG ( 43.0,  25.0) Bulgaria
 9 BY ( 53.0,  28.0) Belarus
10 CH ( 47.0,   8.0) Switzerland
11 CY ( 35.0,  33.0) Cyprus
12 CZ ( 49.8,  15.5) Czech Republic
13 DE ( 51.0,   9.0) Germany
14 DK ( 56.0,  10.0) Denmark
15 EE ( 59.0,  26.0) Estonia
16 ES ( 40.0,  -4.0) Spain
17 FI ( 64.0,  26.0) Finland
18 FR ( 46.0,   2.0) France
19 GB ( 54.0,  -2.0) United Kingdom
20 GE ( 42.0,  43.5) Georgia
21 GR ( 39.0,  22.0) Greece
22 HR ( 45.2,  15.5) Croatia
23 HU ( 47.0,  20.0) Hungary
24 IE ( 53.0,  -8.0) Ireland
25 IS ( 65.0, -18.0) Iceland
26 IT ( 42.8,  12.8) Italy
27 LI ( 47.3,   9.5) Liechtenstein
28 LT ( 56.0,  24.0) Lithuania
29 LU ( 49.8,   6.2) Luxembourg
30 LV ( 57.0,  25.0) Latvia
31 MC ( 43.7,   7.4) Monaco
32 MD ( 47.0,  29.0) Moldova
33 ME ( 42.5,  19.3) Montenegro
34 MK ( 41.8,  22.0) Macedonia
35 MT ( 35.8,  14.6) Malta
36 NL ( 52.5,   5.8) Netherlands
37 NO ( 62.0,  10.0) Norway
38 PL ( 52.0,  20.0) Poland
39 PT ( 39.5,  -8.0) Portugal
40 RO ( 46.0,  25.0) Romania
41 RS ( 44.0,  21.0) Serbia
42 RU ( 60.0, 100.0) Russia
43 SE ( 62.0,  15.0) Sweden
44 SI ( 46.1,  14.8) Slovenia
45 SK ( 48.7,  19.5) Slovakia
46 SM ( 43.8,  12.4) San Marino
47 TR ( 39.0,  35.0) Turkey
48 UA ( 49.0,  32.0) Ukraine
49 UZ ( 41.0,  64.0) Uzbekistan
50 VA ( 41.9,  12.4) Vatican City
51 XK ( 42.7,  21.2) Kosovo

Gather the boundary information


In [5]:
def n_points(tp, data):
    if tp == 'll': return len(data)
    if tp == 'Polygon': return sum(len(ll) for ll in data)
    if tp == 'MultiPolygon': return sum(sum(len(ll) for ll in poly) for poly in data)
    return -1

def n_ll(tp, data):
    if tp == 'Polygon': return len(data)
    if tp == 'MultiPolygon': return sum(len(poly) for poly in data)
    return -1

In [6]:
for iso2 in eu_countries:
    info = eu_countries[iso2]
    with open('{}/{}.geo.json'.format(GEO_DIR, info['iso3'])) as f:
        geoinfo = json.load(f)
        geometry = geoinfo['features'][0]['geometry']
        info['geometry'] = geometry

total_ng = 0
total_nl = 0
total_np = 0

for (i, (iso2, info)) in enumerate(sorted(eu_countries.items())):
    geo = info['geometry']
    shape = geo['type']
    data = geo['coordinates']
    ng = 1 if shape == 'Polygon' else len(data)
    np = n_points(shape, data)
    nl = n_ll(shape, data)
    total_ng += ng
    total_nl += nl
    total_np += np

    print('{:>2} {} {:<25} {:<15} {:>2} poly, {:>3} linear ring, {:>5} point'.format(
        i+1, iso2,
        info['name'],
        shape,
        ng, nl, np,
    ))  
print('{:<47}{:>2} poly, {:>3} linear ring, {:>5} point'.format(
    'TOTAL', total_ng, total_nl, total_np,
))


 1 AD Andorra                   Polygon          1 poly,   1 linear ring,    29 point
 2 AL Albania                   Polygon          1 poly,   1 linear ring,   337 point
 3 AM Armenia                   MultiPolygon     2 poly,   4 linear ring,   418 point
 4 AT Austria                   Polygon          1 poly,   1 linear ring,   596 point
 5 AZ Azerbaijan                MultiPolygon     4 poly,   5 linear ring,   871 point
 6 BA Bosnia and Herzegovina    Polygon          1 poly,   1 linear ring,   399 point
 7 BE Belgium                   Polygon          1 poly,   1 linear ring,   381 point
 8 BG Bulgaria                  Polygon          1 poly,   1 linear ring,   564 point
 9 BY Belarus                   Polygon          1 poly,   1 linear ring,   996 point
10 CH Switzerland               Polygon          1 poly,   2 linear ring,   545 point
11 CY Cyprus                    Polygon          1 poly,   1 linear ring,   187 point
12 CZ Czech Republic            Polygon          1 poly,   1 linear ring,   520 point
13 DE Germany                   MultiPolygon    23 poly,  23 linear ring,  2157 point
14 DK Denmark                   MultiPolygon    18 poly,  18 linear ring,  1608 point
15 EE Estonia                   MultiPolygon     6 poly,   6 linear ring,   735 point
16 ES Spain                     MultiPolygon    16 poly,  16 linear ring,  1655 point
17 FI Finland                   MultiPolygon    26 poly,  26 linear ring,  1968 point
18 FR France                    MultiPolygon    10 poly,  10 linear ring,  2007 point
19 GB United Kingdom            MultiPolygon    48 poly,  48 linear ring,  3898 point
20 GE Georgia                   Polygon          1 poly,   1 linear ring,   505 point
21 GR Greece                    MultiPolygon    68 poly,  68 linear ring,  3204 point
22 HR Croatia                   MultiPolygon    19 poly,  19 linear ring,  1365 point
23 HU Hungary                   Polygon          1 poly,   1 linear ring,   616 point
24 IE Ireland                   MultiPolygon     5 poly,   5 linear ring,  1028 point
25 IS Iceland                   Polygon          1 poly,   1 linear ring,  1466 point
26 IT Italy                     MultiPolygon    22 poly,  24 linear ring,  2317 point
27 LI Liechtenstein             Polygon          1 poly,   1 linear ring,    28 point
28 LT Lithuania                 MultiPolygon     2 poly,   2 linear ring,   565 point
29 LU Luxembourg                Polygon          1 poly,   1 linear ring,    84 point
30 LV Latvia                    Polygon          1 poly,   1 linear ring,   535 point
31 MC Monaco                    Polygon          1 poly,   1 linear ring,    15 point
32 MD Moldova                   Polygon          1 poly,   1 linear ring,   425 point
33 ME Montenegro                Polygon          1 poly,   1 linear ring,   268 point
34 MK Macedonia                 Polygon          1 poly,   1 linear ring,   219 point
35 MT Malta                     MultiPolygon     2 poly,   2 linear ring,    44 point
36 NL Netherlands               MultiPolygon     9 poly,   9 linear ring,  1054 point
37 NO Norway                    MultiPolygon    94 poly,  94 linear ring,  8396 point
38 PL Poland                    Polygon          1 poly,   1 linear ring,   924 point
39 PT Portugal                  MultiPolygon     8 poly,   8 linear ring,   768 point
40 RO Romania                   Polygon          1 poly,   1 linear ring,   975 point
41 RS Serbia                    Polygon          1 poly,   1 linear ring,   652 point
42 RU Russia                    MultiPolygon    228 poly, 228 linear ring, 35822 point
43 SE Sweden                    MultiPolygon    19 poly,  19 linear ring,  2638 point
44 SI Slovenia                  Polygon          1 poly,   1 linear ring,   301 point
45 SK Slovakia                  Polygon          1 poly,   1 linear ring,   403 point
46 SM San Marino                Polygon          1 poly,   1 linear ring,    24 point
47 TR Turkey                    MultiPolygon     5 poly,   5 linear ring,  2186 point
48 UA Ukraine                   MultiPolygon     4 poly,   4 linear ring,  2782 point
49 UZ Uzbekistan                Polygon          1 poly,   1 linear ring,  1428 point
50 VA Vatican City              Polygon          1 poly,   1 linear ring,     4 point
51 XK Kosovo                    Polygon          1 poly,   1 linear ring,    21 point
TOTAL                                          667 poly, 673 linear ring, 90933 point

Condense coordinates

We are going to reduce the information in the boundaries in a number of ways. A shape is organized as follows:

  • Multipolygon: a set of polygons
  • Polygon: a set of linear rings
  • Linear ring: a list of coordinates, of which the last is equal to the first
  • Coordinate: a longitude and a latitude
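To make the nesting concrete, here is a small sketch with made-up coordinates. The helper count_rings_and_points is a hypothetical name used only for this illustration; it is not one of the notebook's functions.

```python
# A linear ring: a closed list of [longitude, latitude] pairs (made-up values).
ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
hole = [[0.2, 0.1], [0.8, 0.1], [0.8, 0.6], [0.2, 0.1]]

polygon = [ring, hole]            # Polygon coordinates: a list of linear rings
multipolygon = [polygon, [ring]]  # MultiPolygon coordinates: a list of polygons

def count_rings_and_points(multi):
    """Return (number of rings, number of points) in MultiPolygon coordinates."""
    n_rings = sum(len(poly) for poly in multi)
    n_pts = sum(len(r) for poly in multi for r in poly)
    return n_rings, n_pts

print(count_rings_and_points(multipolygon))  # (3, 12)
```

In GeoJSON the first ring of a polygon is its outer boundary and any further rings are holes, which is why a polygon can have more than one linear ring.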

GEO_PRECISION

For coordinates we use a resolution of GEO_PRECISION digits after the decimal point. We round the coordinates. This may cause repetition of identical points in a shape. We weed those out. A linear ring is closed, so its last point repeats its first; we must take care that this closing point survives the weeding.
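The rounding and de-duplication step can be sketched as follows. This is an illustration under assumptions: the helper name round_and_dedupe is hypothetical, and it uses dict.fromkeys to keep the first occurrence of each point in order. Note that any repeated point is dropped, not only adjacent repeats.

```python
def round_and_dedupe(ring, precision):
    """Round each (lng, lat) pair and drop repeated points, keeping first occurrences."""
    rounded = ((round(lng, precision), round(lat, precision)) for (lng, lat) in ring)
    unique = tuple(dict.fromkeys(rounded))  # dicts preserve insertion order (Python 3.7+)
    return unique + (unique[0],)            # close the ring again

# Made-up ring; the first two points collapse into one after rounding.
ring = [[8.71, 47.69], [8.74, 47.71], [9.26, 47.70], [9.31, 48.12], [8.71, 47.69]]
print(round_and_dedupe(ring, 1))
# ((8.7, 47.7), (9.3, 47.7), (9.3, 48.1), (8.7, 47.7))
```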

MIN_POINTS

If a linear ring has too few points, we just ignore it. That is, a linear ring must still have at least MIN_POINTS points after weeding (counting the closing point) in order to be kept.

MAX_POINTS

If a linear ring has too many points, we thin it out by keeping only every k-th point, with k chosen so that at most MAX_POINTS points remain (not counting the closing point).
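A minimal sketch of thinning by stride, mirroring the slicing idea in weed_ll. The helper name thin is an assumption made for this illustration.

```python
def thin(points, max_points):
    """Keep every k-th point, with k chosen so that at most max_points remain."""
    if len(points) <= max_points:
        return points
    step = len(points) // max_points + 1
    return points[::step]

points = tuple(range(130))      # stand-ins for 130 coordinates
print(len(thin(points, 60)))    # 44
```

Because the stride is a whole number, the result can end up well below the limit: here 130 points with a limit of 60 give a stride of 3 and only 44 points.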

MAX_POLY

If a multipolygon has too many polygons, we retain only MAX_POLY of them. We order the polygons by the number of points they contain, and we retain the richest ones.
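Keeping only the richest polygons can be sketched like this. The names keep_richest and max_poly are hypothetical, and the dummy points only serve to give each polygon a size.

```python
def keep_richest(multi, max_poly):
    """Keep the max_poly polygons with the most points, largest first."""
    def size(poly):
        return sum(len(ring) for ring in poly)
    return sorted(multi, key=size, reverse=True)[:max_poly]

# Three polygons with 4, 25 (20 + 5) and 9 points respectively.
multi = [
    [[(0, 0)] * 4],
    [[(0, 0)] * 20, [(0, 0)] * 5],
    [[(0, 0)] * 9],
]
kept = keep_richest(multi, 2)
print([sum(len(r) for r in p) for p in kept])  # [25, 9]
```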


In [7]:
# maximal
GEO_PRECISION =  3 # number of digits in coordinates of shapes
MIN_POINTS    = 1 # minimum number of points in a linear ring
MAX_POINTS    = 500 # maximum number of points in a linear ring
MAX_POLY      = 100 # maximum number of polygons in a multipolygon

In [8]:
# minimal
GEO_PRECISION =  1 # number of digits in coordinates of shapes
MIN_POINTS    = 10 # minimum number of points in a linear ring
MAX_POINTS    = 12 # maximum number of points in a linear ring
MAX_POLY      = 5 # maximum number of polygons in a multipolygon

In [9]:
# medium (this cell runs last, so these are the settings in effect below)
GEO_PRECISION =  1 # number of digits in coordinates of shapes
MIN_POINTS    = 15 # minimum number of points in a linear ring
MAX_POINTS    = 60 # maximum number of points in a linear ring
MAX_POLY      = 7 # maximum number of polygons in a multipolygon

In [10]:
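def weed_ll(ll):
    # round the coordinates and drop repeated points, keeping the original order
    new_ll = tuple(collections.OrderedDict(
        ((round(lng, GEO_PRECISION), round(lat, GEO_PRECISION)), None) for (lng, lat) in ll
    ).keys())
    # thin out: keep every k-th point so that at most MAX_POINTS remain
    if len(new_ll) > MAX_POINTS:
        new_ll = new_ll[::(int(len(new_ll) / MAX_POINTS) + 1)]
    # close the ring again by repeating the first point
    return new_ll + (new_ll[0],)

def weed_poly(poly):
    # weed every linear ring and drop the rings that have become too small
    new_poly = tuple(weed_ll(ll) for ll in poly)
    return tuple(ll for ll in new_poly if len(ll) >= MIN_POINTS)

def weed_multi(multi):
    # weed every polygon and keep the MAX_POLY polygons with the most points
    new_multi = tuple(weed_poly(poly) for poly in multi)
    return tuple(sorted(new_multi, key=lambda poly: -n_points('Polygon', poly))[0:MAX_POLY])

def weed(tp, data):
    # dispatch on the kind of shape
    if tp == 'll': return weed_ll(data)
    if tp == 'Polygon': return weed_poly(data)
    if tp == 'MultiPolygon': return weed_multi(data)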

In [11]:
ll = [
    [8.710255,47.696808],
    [8.709721,47.70694],
    [8.708332,47.710548],
    [8.705,47.713051],
    [8.698889,47.713608],
    [8.675278,47.712494],
    [8.670555,47.711105],
    [8.670277,47.707497],
    [8.673298,47.701771],
    [8.675554,47.697495],
    [8.678595,47.693344],
    [8.710255,47.696808],
]
ll2 = [
    [8.710255,47.696808],
    [9.709721,47.70694],
    [10.708332,47.710548],
    [11.705,47.713051],
    [12.698889,47.713608],
    [13.675278,47.712494],
    [14.670555,47.711105],
    [15.670277,47.707497],
    [16.673298,47.701771],
    [17.675554,47.697495],
    [18.678595,47.693344],
    [19.710255,47.696808],
    [20.710255,47.696808],
    [8.710255,47.696808],
]

poly = [ll, ll2]

In [12]:
print(weed_ll(ll))
print('=====')
print(weed_ll(ll2))
print('=====')
print(weed_poly(poly))


((8.7, 47.7), (8.7, 47.7))
=====
((8.7, 47.7), (9.7, 47.7), (10.7, 47.7), (11.7, 47.7), (12.7, 47.7), (13.7, 47.7), (14.7, 47.7), (15.7, 47.7), (16.7, 47.7), (17.7, 47.7), (18.7, 47.7), (19.7, 47.7), (20.7, 47.7), (8.7, 47.7))
=====
()
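The empty result of weed_poly follows from the medium settings: after weeding, the first ring has 2 points and the second has 14, and both are below MIN_POINTS = 15. A small check of that arithmetic, with the point counts read off the output above:

```python
MIN_POINTS = 15                      # value from the medium settings
weeded_sizes = {'ll': 2, 'll2': 14}  # number of points in each weeded ring

# A ring survives only if it still has at least MIN_POINTS points.
survivors = [name for (name, n) in weeded_sizes.items() if n >= MIN_POINTS]
print(survivors)  # []
```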

In [13]:
wtotal_ng = 0
wtotal_nl = 0
wtotal_np = 0

for (i, (iso2, info)) in enumerate(sorted(eu_countries.items())):
    geo = info['geometry']
    shape = geo['type']
    data = geo['coordinates']
    new_data = weed(shape, data)
    geo['coordinates'] = new_data
    data = new_data
    ng = 1 if shape == 'Polygon' else len(data)
    np = n_points(shape, data)
    nl = n_ll(shape, data)
    wtotal_ng += ng
    wtotal_nl += nl
    wtotal_np += np

    print('{:>2} {} {:<25} {:<15} {:>2} poly, {:>3} linear ring, {:>5} point'.format(
        i+1, iso2,
        info['name'],
        shape,
        ng, nl, np,
    ))  
print('{:<47}{:>2} poly, {:>3} linear ring, {:>5} point'.format(
    'TOTAL after weeding', wtotal_ng, wtotal_nl, wtotal_np,
))
print('{:<47}{:>2} poly, {:>3} linear ring, {:>5} point'.format(
    'TOTAL', total_ng, total_nl, total_np,
))
print('{:<47}{:>2} poly, {:>3} linear ring, {:>5} point'.format(
    'IMPROVEMENT', total_ng - wtotal_ng, total_nl - wtotal_nl, total_np - wtotal_np,
))


 1 AD Andorra                   Polygon          1 poly,   0 linear ring,     0 point
 2 AL Albania                   Polygon          1 poly,   1 linear ring,    52 point
 3 AM Armenia                   MultiPolygon     2 poly,   1 linear ring,    55 point
 4 AT Austria                   Polygon          1 poly,   1 linear ring,    49 point
 5 AZ Azerbaijan                MultiPolygon     4 poly,   2 linear ring,    86 point
 6 BA Bosnia and Herzegovina    Polygon          1 poly,   1 linear ring,    42 point
 7 BE Belgium                   Polygon          1 poly,   1 linear ring,    61 point
 8 BG Bulgaria                  Polygon          1 poly,   1 linear ring,    50 point
 9 BY Belarus                   Polygon          1 poly,   1 linear ring,    60 point
10 CH Switzerland               Polygon          1 poly,   1 linear ring,    54 point
11 CY Cyprus                    Polygon          1 poly,   1 linear ring,    32 point
12 CZ Czech Republic            Polygon          1 poly,   1 linear ring,    60 point
13 DE Germany                   MultiPolygon     7 poly,   3 linear ring,   102 point
14 DK Denmark                   MultiPolygon     7 poly,   6 linear ring,   215 point
15 EE Estonia                   MultiPolygon     6 poly,   3 linear ring,   111 point
16 ES Spain                     MultiPolygon     7 poly,   6 linear ring,   164 point
17 FI Finland                   MultiPolygon     7 poly,   1 linear ring,    58 point
18 FR France                    MultiPolygon     7 poly,   2 linear ring,   111 point
19 GB United Kingdom            MultiPolygon     7 poly,   7 linear ring,   251 point
20 GE Georgia                   Polygon          1 poly,   1 linear ring,    59 point
21 GR Greece                    MultiPolygon     7 poly,   7 linear ring,   237 point
22 HR Croatia                   MultiPolygon     7 poly,   4 linear ring,   112 point
23 HU Hungary                   Polygon          1 poly,   1 linear ring,    47 point
24 IE Ireland                   MultiPolygon     5 poly,   1 linear ring,    59 point
25 IS Iceland                   Polygon          1 poly,   1 linear ring,    57 point
26 IT Italy                     MultiPolygon     7 poly,   3 linear ring,   144 point
27 LI Liechtenstein             Polygon          1 poly,   0 linear ring,     0 point
28 LT Lithuania                 MultiPolygon     2 poly,   1 linear ring,    53 point
29 LU Luxembourg                Polygon          1 poly,   1 linear ring,    27 point
30 LV Latvia                    Polygon          1 poly,   1 linear ring,    60 point
31 MC Monaco                    Polygon          1 poly,   0 linear ring,     0 point
32 MD Moldova                   Polygon          1 poly,   1 linear ring,    58 point
33 ME Montenegro                Polygon          1 poly,   1 linear ring,    37 point
34 MK Macedonia                 Polygon          1 poly,   1 linear ring,    35 point
35 MT Malta                     MultiPolygon     2 poly,   0 linear ring,     0 point
36 NL Netherlands               MultiPolygon     7 poly,   3 linear ring,    89 point
37 NO Norway                    MultiPolygon     7 poly,   7 linear ring,   285 point
38 PL Poland                    Polygon          1 poly,   1 linear ring,    54 point
39 PT Portugal                  MultiPolygon     7 poly,   2 linear ring,    67 point
40 RO Romania                   Polygon          1 poly,   1 linear ring,    57 point
41 RS Serbia                    Polygon          1 poly,   1 linear ring,    46 point
42 RU Russia                    MultiPolygon     7 poly,   7 linear ring,   410 point
43 SE Sweden                    MultiPolygon     7 poly,   3 linear ring,   115 point
44 SI Slovenia                  Polygon          1 poly,   1 linear ring,    44 point
45 SK Slovakia                  Polygon          1 poly,   1 linear ring,    48 point
46 SM San Marino                Polygon          1 poly,   0 linear ring,     0 point
47 TR Turkey                    MultiPolygon     5 poly,   2 linear ring,   109 point
48 UA Ukraine                   MultiPolygon     4 poly,   1 linear ring,    58 point
49 UZ Uzbekistan                Polygon          1 poly,   1 linear ring,    55 point
50 VA Vatican City              Polygon          1 poly,   0 linear ring,     0 point
51 XK Kosovo                    Polygon          1 poly,   1 linear ring,    21 point
TOTAL after weeding                            157 poly,  96 linear ring,  4056 point
TOTAL                                          667 poly, 673 linear ring, 90933 point
IMPROVEMENT                                    510 poly, 577 linear ring, 86877 point
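With the medium settings, six small countries (AD, LI, MC, MT, SM, VA) lose all their linear rings, so their geometry in the bundle is empty; their lat and lng values are still available. The sketch below shows one way to list such countries. The function empty_geometries and the sample data are assumptions for illustration; in the notebook one would pass eu_countries instead.

```python
def empty_geometries(countries):
    """Return the codes of countries whose geometry has no linear rings left."""
    empty = []
    for (iso2, info) in sorted(countries.items()):
        geo = info['geometry']
        coords = geo['coordinates']
        # treat a Polygon as a multipolygon with a single polygon
        polys = [coords] if geo['type'] == 'Polygon' else coords
        if not any(len(poly) > 0 for poly in polys):
            empty.append(iso2)
    return empty

# Made-up sample in the same shape as eu_countries after weeding.
sample = {
    'AD': {'geometry': {'type': 'Polygon', 'coordinates': ()}},
    'MT': {'geometry': {'type': 'MultiPolygon', 'coordinates': ((), ())}},
    'LU': {'geometry': {'type': 'Polygon',
                        'coordinates': (((6.0, 50.2), (6.5, 49.8), (6.0, 50.2)),)}},
}
print(empty_geometries(sample))  # ['AD', 'MT']
```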

Produce geojson file


In [16]:
features = dict(
    type='FeatureCollection',
    features=[],
)
for (iso2, info) in sorted(eu_countries.items()):
    feature = dict(
        type='Feature',
        properties=dict(
            iso2=iso2,
            lng=info['lng'],
            lat=info['lat'],
        ),
        geometry=info['geometry'],
    )
    features['features'].append(feature)

with open(OUTFILE, 'w') as f:
    f.write('''
/**
 * European country borders
 *
 * @module europe_geo_js
 */
/**
 * Contains low resolution geographical coordinates of borders of European countries.
 * These coordinates can be drawn on a map, e.g. by [Leaflet](http://leafletjs.com).
 *
 * More information, and the computation itself, is in
 * [countries.ipynb](/api/file/tools/country_compose/countries.html)
 * a Jupyter notebook that you can run for yourself, if you want to tweak the
 * resolution and precision of the border coordinates.
 */
''')
    f.write('export const countryBorders = ')
    json.dump(features, f)
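The generated file is a JavaScript module, not plain JSON, because of the comment header and the export statement. To verify the result from Python, one could strip everything up to the export prefix and parse the rest. This is a sketch: load_borders is a hypothetical helper, and the sample text stands in for the contents of the real output file.

```python
import json

PREFIX = 'export const countryBorders = '

def load_borders(js_text):
    """Extract the FeatureCollection from the text of the generated module."""
    start = js_text.index(PREFIX) + len(PREFIX)
    return json.loads(js_text[start:])

# Stand-in for the file contents: a comment header followed by the export.
sample_text = '/** header comment */\n' + PREFIX + json.dumps(
    {'type': 'FeatureCollection', 'features': []}
)
print(load_borders(sample_text)['type'])  # FeatureCollection
```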
