JSON exercise

Using data in file 'data/world_bank_projects.json'

  1. Find the 10 countries with most projects
  2. Find the top 10 major project themes (using column 'mjtheme_namecode')
  3. In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.

1) Find the 10 countries with most projects


In [1]:
# Import required packages
import pandas as pd
import json
from pandas.io.json import json_normalize

In [2]:
# Read JSON file as Pandas DataFrame object
world_bank_df = pd.read_json('data/world_bank_projects.json')
world_bank_df


Out[2]:
_id approvalfy board_approval_month boardapprovaldate borrower closingdate country_namecode countrycode countryname countryshortname ... sectorcode source status supplementprojectflg theme1 theme_namecode themecode totalamt totalcommamt url
0 {'$oid': '52b213b38594d8a2be17c780'} 1999 November 2013-11-12T00:00:00Z FEDERAL DEMOCRATIC REPUBLIC OF ETHIOPIA 2018-07-07T00:00:00Z Federal Democratic Republic of Ethiopia!$!ET ET Federal Democratic Republic of Ethiopia Ethiopia ... ET,BS,ES,EP IBRD Active N {'Percent': 100, 'Name': 'Education for all'} [{'code': '65', 'name': 'Education for all'}] 65 130000000 130000000 http://www.worldbank.org/projects/P129828/ethi...
1 {'$oid': '52b213b38594d8a2be17c781'} 2015 November 2013-11-04T00:00:00Z GOVERNMENT OF TUNISIA NaN Republic of Tunisia!$!TN TN Republic of Tunisia Tunisia ... BZ,BS IBRD Active N {'Percent': 30, 'Name': 'Other economic manage... [{'code': '24', 'name': 'Other economic manage... 54,24 0 4700000 http://www.worldbank.org/projects/P144674?lang=en
2 {'$oid': '52b213b38594d8a2be17c782'} 2014 November 2013-11-01T00:00:00Z MINISTRY OF FINANCE AND ECONOMIC DEVEL NaN Tuvalu!$!TV TV Tuvalu Tuvalu ... TI IBRD Active Y {'Percent': 46, 'Name': 'Regional integration'} [{'code': '47', 'name': 'Regional integration'... 52,81,25,47 6060000 6060000 http://www.worldbank.org/projects/P145310?lang=en
3 {'$oid': '52b213b38594d8a2be17c783'} 2014 October 2013-10-31T00:00:00Z MIN. OF PLANNING AND INT'L COOPERATION NaN Republic of Yemen!$!RY RY Republic of Yemen Yemen, Republic of ... JB IBRD Active N {'Percent': 50, 'Name': 'Participation and civ... [{'code': '57', 'name': 'Participation and civ... 59,57 0 1500000 http://www.worldbank.org/projects/P144665?lang=en
4 {'$oid': '52b213b38594d8a2be17c784'} 2014 October 2013-10-31T00:00:00Z MINISTRY OF FINANCE 2019-04-30T00:00:00Z Kingdom of Lesotho!$!LS LS Kingdom of Lesotho Lesotho ... FH,YW,YZ IBRD Active N {'Percent': 30, 'Name': 'Export development an... [{'code': '45', 'name': 'Export development an... 41,45 13100000 13100000 http://www.worldbank.org/projects/P144933/seco...
5 {'$oid': '52b213b38594d8a2be17c785'} 2014 October 2013-10-31T00:00:00Z REPUBLIC OF KENYA NaN Republic of Kenya!$!KE KE Republic of Kenya Kenya ... JB IBRD Active Y {'Percent': 100, 'Name': 'Social safety nets'} [{'code': '54', 'name': 'Social safety nets'}] 54 10000000 10000000 http://www.worldbank.org/projects/P146161?lang=en
6 {'$oid': '52b213b38594d8a2be17c786'} 2014 October 2013-10-29T00:00:00Z GOVERNMENT OF INDIA 2019-06-30T00:00:00Z Republic of India!$!IN IN Republic of India India ... TI IBRD Active N {'Percent': 20, 'Name': 'Administrative and ci... [{'code': '25', 'name': 'Administrative and ci... 39,25 500000000 500000000 http://www.worldbank.org/projects/P121185/firs...
7 {'$oid': '52b213b38594d8a2be17c787'} 2014 October 2013-10-29T00:00:00Z PEOPLE'S REPUBLIC OF CHINA NaN People's Republic of China!$!CN CN People's Republic of China China ... LR IBRD Active N {'Percent': 100, 'Name': 'Climate change'} [{'code': '81', 'name': 'Climate change'}] 81 0 27280000 http://www.worldbank.org/projects/P127033/chin...
8 {'$oid': '52b213b38594d8a2be17c788'} 2014 October 2013-10-29T00:00:00Z THE GOVERNMENT OF INDIA 2018-12-31T00:00:00Z Republic of India!$!IN IN Republic of India India ... TI IBRD Active N {'Percent': 87, 'Name': 'Other rural developme... [{'code': '79', 'name': 'Other rural developme... 79 160000000 160000000 http://www.worldbank.org/projects/P130164/raja...
9 {'$oid': '52b213b38594d8a2be17c789'} 2014 October 2013-10-29T00:00:00Z THE KINGDOM OF MOROCCO 2014-12-31T00:00:00Z Kingdom of Morocco!$!MA MA Kingdom of Morocco Morocco ... BM,BC,BZ IBRD Active N {'Percent': 33, 'Name': 'Other accountability/... [{'code': '29', 'name': 'Other accountability/... 27,30,29 200000000 200000000 http://www.worldbank.org/projects/P130903?lang=en
10 {'$oid': '52b213b38594d8a2be17c78a'} 2014 October 2013-10-25T00:00:00Z GOVERNMENT OF SOUTH SUDAN NaN Republic of South Sudan!$!SS SS Republic of South Sudan South Sudan ... AZ,JB,AH IBRD Active Y {'Percent': 100, 'Name': 'Global food crisis r... [{'code': '91', 'name': 'Global food crisis re... 91 0 7530000 http://www.worldbank.org/projects/P145339?lang=en
11 {'$oid': '52b213b38594d8a2be17c78b'} 2014 October 2013-10-25T00:00:00Z NaN 2017-12-31T00:00:00Z Republic of India!$!IN IN Republic of India India ... JB,YC,WD,TI IBRD Active N {'Percent': 60, 'Name': 'Rural services and in... [{'code': '78', 'name': 'Rural services and in... 81,87,52,78 250000000 250000000 http://www.worldbank.org/projects/P146653?lang=en
12 {'$oid': '52b213b38594d8a2be17c78c'} 2014 October 2013-10-24T00:00:00Z GOVERNMENT OF GHANA 2019-06-30T00:00:00Z Republic of Ghana!$!GH GH Republic of Ghana Ghana ... CZ IBRD Active N {'Percent': 0, 'Name': ''} NaN NaN 97000000 97000000 http://www.worldbank.org/projects/P144140/gh-e...
13 {'$oid': '52b213b38594d8a2be17c78d'} 2014 October 2013-10-22T00:00:00Z GOVERNMENT OF TIMOR LESTE NaN Democratic Republic of Timor-Leste!$!TP TP Democratic Republic of Timor-Leste Timor-Leste ... BV,TI IBRD Active Y {'Percent': 20, 'Name': 'Regional integration'} [{'code': '47', 'name': 'Regional integration'... 78,81,47 40000000 40000000 http://www.worldbank.org/projects/P130975/timo...
14 {'$oid': '52b213b38594d8a2be17c78e'} 2014 October 2013-10-22T00:00:00Z GOVERNMENT OF JORDAN NaN Hashemite Kingdom of Jordan!$!JO JO Hashemite Kingdom of Jordan Jordan ... JB IBRD Active N {'Percent': 50, 'Name': 'Social safety nets'} [{'code': '54', 'name': 'Social safety nets'},... 53,56,54 0 9500000 http://www.worldbank.org/projects/P144832?lang=en
15 {'$oid': '52b213b38594d8a2be17c78f'} 2014 October 2013-10-17T00:00:00Z MINISTRY OF FINANCE 2019-04-30T00:00:00Z Samoa!$!WS WS Samoa Samoa ... TI IBRD Active N {'Percent': 60, 'Name': 'Rural services and in... [{'code': '78', 'name': 'Rural services and in... 49,81,78 20000000 20000000 http://www.worldbank.org/projects/P145545?lang=en
16 {'$oid': '52b213b38594d8a2be17c790'} 2014 October 2013-10-17T00:00:00Z MINISTRY OF FINANCE 2015-12-31T00:00:00Z Samoa!$!WS WS Samoa Samoa ... AZ,AJ,AH IBRD Active N {'Percent': 100, 'Name': 'Other rural developm... [{'code': '79', 'name': 'Other rural developme... 79 5000000 5000000 http://www.worldbank.org/projects/P145938?lang=en
17 {'$oid': '52b213b38594d8a2be17c791'} 2014 October 2013-10-16T00:00:00Z MINISTRY OF FINANCE AND BUDGET (MFB) NaN Republic of Madagascar!$!MG MG Republic of Madagascar Madagascar ... EP IBRD Active N {'Percent': 100, 'Name': 'Education for all'} [{'code': '65', 'name': 'Education for all'}] 65 0 85400000 http://www.worldbank.org/projects/P132616?lang=en
18 {'$oid': '52b213b38594d8a2be17c792'} 2014 October 2013-10-16T00:00:00Z ROYAL GOVERNMENT OF CAMBODIA NaN Kingdom of Cambodia!$!KH KH Kingdom of Cambodia Cambodia ... BK,JB,BH,BC,JA IBRD Active Y {'Percent': 17, 'Name': 'Child health'} [{'code': '63', 'name': 'Child health'}, {'cod... 69,57,25,67,63 0 13450000 http://www.worldbank.org/projects/P146271?lang=en
19 {'$oid': '52b213b38594d8a2be17c793'} 2014 October 2013-10-10T00:00:00Z MINISTRY OF FINANCE NaN Kingdom of Morocco!$!MA MA Kingdom of Morocco Morocco ... BH,BC,BZ IBRD Active N {'Percent': 40, 'Name': 'Public expenditure, f... [{'code': '27', 'name': 'Public expenditure, f... 25,26,27 0 4350000 http://www.worldbank.org/projects/P143979?lang=en
20 {'$oid': '52b213b38594d8a2be17c794'} 2014 October 2013-10-09T00:00:00Z AGA KHAN DEVELOPMENT NETWORK (AKDN) NaN Kyrgyz Republic!$!KG KG Kyrgyz Republic Kyrgyz Republic ... JB IBRD Active N {'Percent': 50, 'Name': 'Conflict prevention a... [{'code': '58', 'name': 'Conflict prevention a... 57,58 0 2000000 http://www.worldbank.org/projects/P132577?lang=en
21 {'$oid': '52b213b38594d8a2be17c795'} 2014 October 2013-10-07T00:00:00Z NEPAL NaN Nepal!$!NP NP Nepal Nepal ... BH,YW,JB IBRD Active N {'Percent': 30, 'Name': 'Urban services and ho... [{'code': '71', 'name': 'Urban services and ho... 57,71 0 2750000 http://www.worldbank.org/projects/P145359?lang=en
22 {'$oid': '52b213b38594d8a2be17c796'} 2014 October 2013-10-07T00:00:00Z MINISTRY OF PLANNING AND INTERNATIONAL C NaN Hashemite Kingdom of Jordan!$!JO JO Hashemite Kingdom of Jordan Jordan ... WC,BS,BH,TZ,WB IBRD Active N {'Percent': 25, 'Name': 'Other social developm... [{'code': '62', 'name': 'Other social developm... 58,62 0 50000000 http://www.worldbank.org/projects/P147689?lang=en
23 {'$oid': '52b213b38594d8a2be17c797'} 2014 October 2013-10-03T00:00:00Z REPUBLIC OF TAJIKISTAN NaN Republic of Tajikistan!$!TJ TJ Republic of Tajikistan Tajikistan ... JA IBRD Active N {'Percent': 60, 'Name': 'Nutrition and food se... [{'code': '68', 'name': 'Nutrition and food se... 63,68 0 2800000 http://www.worldbank.org/projects/P146109?lang=en
24 {'$oid': '52b213b38594d8a2be17c798'} 2014 September 2013-09-30T00:00:00Z REPUBLIC OF AZERBAIJAN 2018-12-31T00:00:00Z Republic of Azerbaijan!$!AZ AZ Republic of Azerbaijan Azerbaijan ... AB,AZ,YA IBRD Active N {'Percent': 30, 'Name': 'Rural markets'} [{'code': '75', 'name': 'Rural markets'}, {'co... 56,78,76,75 34500000 34500000 http://www.worldbank.org/projects/P122812/thir...
25 {'$oid': '52b213b38594d8a2be17c799'} 2014 September 2013-09-30T00:00:00Z UNIVERSITY OF QUEENSLAND NaN East Asia and Pacific!$!4E 4E East Asia and Pacific East Asia and Pacific ... AB,AZ IBRD Active N {'Percent': 40, 'Name': 'Other environment and... [{'code': '86', 'name': 'Other environment and... 80,81,86 0 4500000 http://www.worldbank.org/projects/P123933/capt...
26 {'$oid': '52b213b38594d8a2be17c79a'} 2014 September 2013-09-30T00:00:00Z LAO PEOPLES DEMOCRATIC REPUBLIC 2014-03-31T00:00:00Z Lao People's Democratic Republic!$!LA LA Lao People's Democratic Republic Lao People's Democratic Republic ... LG,FB,EZ,YZ,BC IBRD Closed N {'Percent': 14, 'Name': 'Regulation and compet... [{'code': '40', 'name': 'Regulation and compet... 67,27,49,40 20000000 20000000 http://www.worldbank.org/projects/P143025/lao-...
27 {'$oid': '52b213b38594d8a2be17c79b'} 2014 September 2013-09-30T00:00:00Z PACIFIC AVIATION SECURITY OFFICE 2018-12-31T00:00:00Z Pacific Islands!$!4P 4P Pacific Islands Pacific Islands ... BV,TV IBRD Active N {'Percent': 5, 'Name': 'Climate change'} [{'code': '81', 'name': 'Climate change'}, {'c... 52,47,25,81 2150000 2150000 http://www.worldbank.org/projects/P145057/paci...
28 {'$oid': '52b213b38594d8a2be17c79c'} 2014 September 2013-09-30T00:00:00Z SOLOMON ISLANDS GOVERNMENT NaN Solomon Islands!$!SB SB Solomon Islands Solomon Islands ... BZ,AZ IBRD Active Y {'Percent': 30, 'Name': 'Rural policies and in... [{'code': '77', 'name': 'Rural policies and in... 57,78,77 3000000 3000000 http://www.worldbank.org/projects/P146021?lang=en
29 {'$oid': '52b213b38594d8a2be17c79d'} 2014 September 2013-09-27T00:00:00Z GOVERNMENT OF MOZAMBIQUE NaN Republic of Mozambique!$!MZ MZ Republic of Mozambique Mozambique ... JB,BW,WZ,WD IBRD Active Y {'Percent': 4, 'Name': 'Other social developme... [{'code': '62', 'name': 'Other social developm... 85,62 32000000 32000000 http://www.worldbank.org/projects/P146098?lang=en
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
470 {'$oid': '52b213b38594d8a2be17c956'} 2013 September 2012-09-20T00:00:00Z GOVERNMENT OF LIBERIA NaN Republic of Liberia!$!LR LR Republic of Liberia Liberia ... BV,TI IBRD Active Y {'Percent': 40, 'Name': 'Trade facilitation an... [{'code': '49', 'name': 'Trade facilitation an... 78,49 50000000 50000000 http://www.worldbank.org/projects/P129654/libe...
471 {'$oid': '52b213b38594d8a2be17c957'} 2013 September 2012-09-20T00:00:00Z GOVERNMENT OF BANGLADESH 2018-12-31T00:00:00Z People's Republic of Bangladesh!$!BD BD People's Republic of Bangladesh Bangladesh ... BU,LA,LR IBRD Active N {'Percent': 45, 'Name': 'Rural services and in... [{'code': '78', 'name': 'Rural services and in... 59,81,57,78 155000000 155000000 http://www.worldbank.org/projects/P131263/rura...
472 {'$oid': '52b213b38594d8a2be17c958'} 2013 September 2012-09-13T00:00:00Z MINISTRY OF NATURE PROTECTION 2014-09-30T00:00:00Z Republic of Armenia!$!AM AM Republic of Armenia Armenia ... BC IBRD Active N {'Percent': 100, 'Name': 'Other environment an... [{'code': '86', 'name': 'Other environment and... 86 0 150000 http://www.worldbank.org/projects/P131631/repo...
473 {'$oid': '52b213b38594d8a2be17c959'} 2013 September 2012-09-13T00:00:00Z MINISTRY OF ENVIRONMENT, FORESTS, WATER NaN Republic of Albania!$!AL AL Republic of Albania Albania ... BC IBRD Active N {'Percent': 100, 'Name': 'Other environment an... [{'code': '86', 'name': 'Other environment and... 86 0 150000 http://www.worldbank.org/projects/P132679/land...
474 {'$oid': '52b213b38594d8a2be17c95a'} 2013 September 2012-09-11T00:00:00Z GOVERNMENT OF PAKISTAN 2017-06-30T00:00:00Z Islamic Republic of Pakistan!$!PK PK Islamic Republic of Pakistan Pakistan ... BM,BH IBRD Active N {'Percent': 34, 'Name': 'Municipal finance'} [{'code': '72', 'name': 'Municipal finance'}, ... 73,72 150000000 150000000 http://www.worldbank.org/projects/P112901/pk-p...
475 {'$oid': '52b213b38594d8a2be17c95b'} 2013 September 2012-09-11T00:00:00Z GOVERNMENT OF INDONESIA 2018-03-31T00:00:00Z Republic of Indonesia!$!ID ID Republic of Indonesia Indonesia ... BO,BH,LZ,TZ,WZ IBRD Active N {'Percent': 25, 'Name': 'Infrastructure servic... [{'code': '39', 'name': 'Infrastructure servic... 72,39 29600000 29600000 http://www.worldbank.org/projects/P118916/infr...
476 {'$oid': '52b213b38594d8a2be17c95c'} 2013 September 2012-09-11T00:00:00Z SOCIALIST REPUBLIC OF VIETNAM 2018-12-31T00:00:00Z Socialist Republic of Vietnam!$!VN VN Socialist Republic of Vietnam Vietnam ... BU,LT,LA IBRD Active N {'Percent': 100, 'Name': 'Rural services and i... [{'code': '78', 'name': 'Rural services and in... 78 448900000 448900000 http://www.worldbank.org/projects/P125996/dist...
477 {'$oid': '52b213b38594d8a2be17c95d'} 2013 September 2012-09-11T00:00:00Z REPUBLIC OF UZBEKISTAN NaN Republic of Uzbekistan!$!UZ UZ Republic of Uzbekistan Uzbekistan ... FH IBRD Active Y {'Percent': 40, 'Name': 'Micro, Small and Medi... [{'code': '41', 'name': 'Micro, Small and Medi... 75,41 40000000 40000000 http://www.worldbank.org/projects/P126962/addi...
478 {'$oid': '52b213b38594d8a2be17c95e'} 2013 September 2012-09-11T00:00:00Z ISLAMIC REPUBLIC OF PAKISTAN NaN Islamic Republic of Pakistan!$!PK PK Islamic Republic of Pakistan Pakistan ... BH IBRD Active Y {'Percent': 10, 'Name': 'Administrative and ci... [{'code': '25', 'name': 'Administrative and ci... 83,36,25 70000000 70000000 http://www.worldbank.org/projects/P131266/punj...
479 {'$oid': '52b213b38594d8a2be17c95f'} 2013 September 2012-09-10T00:00:00Z UT'Z CHE 2016-05-07T00:00:00Z Republic of Guatemala!$!GT GT Republic of Guatemala Guatemala ... AZ IBRD Active N {'Percent': 100, 'Name': 'Other social develop... [{'code': '62', 'name': 'Other social developm... 62 0 2510000 http://www.worldbank.org/projects/P130412/stre...
480 {'$oid': '52b213b38594d8a2be17c960'} 2013 September 2012-09-10T00:00:00Z GOVERNMENT OF ZAMBIA 2014-12-31T00:00:00Z Republic of Zambia!$!ZM ZM Republic of Zambia Zambia ... LS IBRD Active N {'Percent': 100, 'Name': 'Other public sector ... [{'code': '30', 'name': 'Other public sector g... 30 0 350000 http://www.worldbank.org/projects/P131881/zamb...
481 {'$oid': '52b213b38594d8a2be17c961'} 2013 September 2012-09-06T00:00:00Z GOVERNMENT OF INDIA 2015-12-31T00:00:00Z Republic of India!$!IN IN Republic of India India ... BC,EC,BQ,BH IBRD Active N {'Percent': 23, 'Name': 'Child health'} [{'code': '63', 'name': 'Child health'}, {'cod... 68,59,57,63 106000000 106000000 http://www.worldbank.org/projects/P121731/icds...
482 {'$oid': '52b213b38594d8a2be17c962'} 2013 September 2012-09-06T00:00:00Z GOVERNMENT OF INDIA 2018-12-31T00:00:00Z Republic of India!$!IN IN Republic of India India ... AI,YA,AH,BL,AB IBRD Active N {'Percent': 10, 'Name': 'Rural services and in... [{'code': '78', 'name': 'Rural services and in... 79,85,86,78 60000000 60000000 http://www.worldbank.org/projects/P122486/karn...
483 {'$oid': '52b213b38594d8a2be17c963'} 2013 September 2012-09-06T00:00:00Z GOVERNMENT OF INDIA 2013-09-30T00:00:00Z Republic of India!$!IN IN Republic of India India ... CZ,AH,LA,AI,LH IBRD Closed N {'Percent': 19, 'Name': 'Pollution management ... [{'code': '84', 'name': 'Pollution management ... 82,78,81,84 100000000 100000000 http://www.worldbank.org/projects/P124041/hima...
484 {'$oid': '52b213b38594d8a2be17c964'} 2013 September 2012-09-06T00:00:00Z ILO, MINLAND & CFSI & BANK 2015-12-31T00:00:00Z Republic of the Philippines!$!PH PH Republic of the Philippines Philippines ... EV,FH,BZ,JB IBRD Active Y {'Percent': 9, 'Name': 'Rural non-farm income ... [{'code': '76', 'name': 'Rural non-farm income... 78,58,41,76 0 5570000 http://www.worldbank.org/projects/P132238/mult...
485 {'$oid': '52b213b38594d8a2be17c965'} 2013 September 2012-09-05T00:00:00Z GOVERNMENT OF MONGOLIA 2013-12-31T00:00:00Z Mongolia!$!MN MN Mongolia Mongolia ... BZ,JA,AJ IBRD Active N {'Percent': 100, 'Name': 'Other communicable d... [{'code': '64', 'name': 'Other communicable di... 64 0 2900000 http://www.worldbank.org/projects/P131204/capa...
486 {'$oid': '52b213b38594d8a2be17c966'} 2013 September 2012-09-01T00:00:00Z GOVERNMENT OF LEBANON & JORDAN 2015-01-31T00:00:00Z Middle East and North Africa!$!5M 5M Middle East and North Africa Middle East and North Africa ... BZ,BS,BQ,BN IBRD Active N {'Percent': 25, 'Name': 'Other public sector g... [{'code': '30', 'name': 'Other public sector g... 54,55,56,30 0 2400000 http://www.worldbank.org/projects/P132097/5m-d...
487 {'$oid': '52b213b38594d8a2be17c967'} 2013 August 2012-08-30T00:00:00Z UNITED MEXICAN STATES 2017-08-31T00:00:00Z United Mexican States!$!MX MX United Mexican States Mexico ... YA,BL,AB IBRD Active N {'Percent': 20, 'Name': 'Biodiversity'} [{'code': '80', 'name': 'Biodiversity'}, {'cod... 82,79,77,80 0 11690000 http://www.worldbank.org/projects/P121116/sust...
488 {'$oid': '52b213b38594d8a2be17c968'} 2013 August 2012-08-30T00:00:00Z THE STATE OF RIO DE JANEIRO 2014-01-31T00:00:00Z Federative Republic of Brazil!$!BR BR Federative Republic of Brazil Brazil ... JA,EZ,BH IBRD Closed N {'Percent': 20, 'Name': 'Health system perform... [{'code': '67', 'name': 'Health system perform... 65,28,27,67 300000000 300000000 http://www.worldbank.org/projects/P126465/rio-...
489 {'$oid': '52b213b38594d8a2be17c969'} 2013 August 2012-08-30T00:00:00Z UN-HABITAT 2015-06-30T00:00:00Z The Independent State of Papua New Guine!$!PG PG The Independent State of Papua New Guine Papua New Guinea ... BW IBRD Closed N {'Percent': 50, 'Name': 'Urban services and ho... [{'code': '71', 'name': 'Urban services and ho... 73,55,52,71 0 350000 http://www.worldbank.org/projects/P128763/papu...
490 {'$oid': '52b213b38594d8a2be17c96a'} 2013 August 2012-08-29T00:00:00Z GOVERNMENT OF NEPAL 2014-06-30T00:00:00Z Nepal!$!NP NP Nepal Nepal ... BZ IBRD Active N {'Percent': 80, 'Name': 'Public expenditure, f... [{'code': '27', 'name': 'Public expenditure, f... 29,27 0 800000 http://www.worldbank.org/projects/P131860/stre...
491 {'$oid': '52b213b38594d8a2be17c96b'} 2013 August 2012-08-24T00:00:00Z PALESTINIAN WATER AUTHORITY 2014-03-31T00:00:00Z West Bank and Gaza!$!GZ GZ West Bank and Gaza West Bank and Gaza ... WC,WA IBRD Active N {'Percent': 100, 'Name': 'Rural services and i... [{'code': '78', 'name': 'Rural services and in... 78 0 3650000 http://www.worldbank.org/projects/P123322/wate...
492 {'$oid': '52b213b38594d8a2be17c96c'} 2013 August 2012-08-21T00:00:00Z GOVERNMENT OF PAKISTAN 2015-06-30T00:00:00Z Islamic Republic of Pakistan!$!PK PK Islamic Republic of Pakistan Pakistan ... EP IBRD Active N {'Percent': 70, 'Name': 'Education for all'} [{'code': '65', 'name': 'Education for all'}, ... 59,65 0 10000000 http://www.worldbank.org/projects/P128096/paki...
493 {'$oid': '52b213b38594d8a2be17c96d'} 2013 August 2012-08-21T00:00:00Z GOVERNMENT OF BANGLADESH 2015-04-30T00:00:00Z People's Republic of Bangladesh!$!BD BD People's Republic of Bangladesh Bangladesh ... BC IBRD Active N {'Percent': 100, 'Name': 'Other environment an... [{'code': '86', 'name': 'Other environment and... 86 0 150000 http://www.worldbank.org/projects/P132138/revi...
494 {'$oid': '52b213b38594d8a2be17c96e'} 2013 August 2012-08-17T00:00:00Z MINISTRY OF EDUCATION 2014-06-30T00:00:00Z Nepal!$!NP NP Nepal Nepal ... EZ IBRD Active N {'Percent': 100, 'Name': 'Natural disaster man... [{'code': '52', 'name': 'Natural disaster mana... 52 0 1510000 http://www.worldbank.org/projects/P129177/nepa...
495 {'$oid': '52b213b38594d8a2be17c96f'} 2013 August 2012-08-10T00:00:00Z THE COMPETITIVENESS COMPANY 2013-08-31T00:00:00Z Jamaica!$!JM JM Jamaica Jamaica ... EV,AZ IBRD Closed N {'Percent': 50, 'Name': 'Regulation and compet... [{'code': '40', 'name': 'Regulation and compet... 62,40 0 50000 http://www.worldbank.org/projects/P127299/tech...
496 {'$oid': '52b213b38594d8a2be17c970'} 2013 August 2012-08-09T00:00:00Z LAO PEOPLES DEMOCRATIC REPUBLIC 2012-12-31T00:00:00Z Lao People's Democratic Republic!$!LA LA Lao People's Democratic Republic Lao People's Democratic Republic ... YZ,JA,EZ,FZ,BC IBRD Closed N {'Percent': 14, 'Name': 'Child health'} [{'code': '63', 'name': 'Child health'}, {'cod... 65,27,49,63 20000000 20000000 http://www.worldbank.org/projects/P125298/lao-...
497 {'$oid': '52b213b38594d8a2be17c971'} 2013 August 2012-08-03T00:00:00Z GOVERNMENT OF THE REPUBLIC OF GUINEA 2014-12-31T00:00:00Z Republic of Guinea!$!GN GN Republic of Guinea Guinea ... AB,AH,AI IBRD Active N {'Percent': 100, 'Name': 'Global food crisis r... [{'code': '91', 'name': 'Global food crisis re... 91 0 20000000 http://www.worldbank.org/projects/P128309/seco...
498 {'$oid': '52b213b38594d8a2be17c972'} 2013 August 2012-08-02T00:00:00Z REPUBLIC OF INDONESIA 2017-09-30T00:00:00Z Republic of Indonesia!$!ID ID Republic of Indonesia Indonesia ... YA,BL,AB IBRD Active N {'Percent': 85, 'Name': 'Rural services and in... [{'code': '78', 'name': 'Rural services and in... 77,91,78 80000000 80000000 http://www.worldbank.org/projects/P117243/sust...
499 {'$oid': '52b213b38594d8a2be17c973'} 2013 August 2012-08-02T00:00:00Z GOVERMENT OF KENYA 2018-12-31T00:00:00Z Republic of Kenya!$!KE KE Republic of Kenya Kenya ... BV,TC IBRD Active N {'Percent': 1, 'Name': 'Municipal governance a... [{'code': '73', 'name': 'Municipal governance ... 39,49,88,73 300000000 300000000 http://www.worldbank.org/projects/P126321/keny...

500 rows × 50 columns


In [3]:
# Check DataFrame info
world_bank_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 50 columns):
_id                         500 non-null object
approvalfy                  500 non-null int64
board_approval_month        500 non-null object
boardapprovaldate           500 non-null object
borrower                    485 non-null object
closingdate                 370 non-null object
country_namecode            500 non-null object
countrycode                 500 non-null object
countryname                 500 non-null object
countryshortname            500 non-null object
docty                       446 non-null object
envassesmentcategorycode    430 non-null object
grantamt                    500 non-null int64
ibrdcommamt                 500 non-null int64
id                          500 non-null object
idacommamt                  500 non-null int64
impagency                   472 non-null object
lendinginstr                495 non-null object
lendinginstrtype            495 non-null object
lendprojectcost             500 non-null int64
majorsector_percent         500 non-null object
mjsector_namecode           500 non-null object
mjtheme                     491 non-null object
mjtheme_namecode            500 non-null object
mjthemecode                 500 non-null object
prodline                    500 non-null object
prodlinetext                500 non-null object
productlinetype             500 non-null object
project_abstract            362 non-null object
project_name                500 non-null object
projectdocs                 446 non-null object
projectfinancialtype        500 non-null object
projectstatusdisplay        500 non-null object
regionname                  500 non-null object
sector                      500 non-null object
sector1                     500 non-null object
sector2                     380 non-null object
sector3                     265 non-null object
sector4                     174 non-null object
sector_namecode             500 non-null object
sectorcode                  500 non-null object
source                      500 non-null object
status                      500 non-null object
supplementprojectflg        498 non-null object
theme1                      500 non-null object
theme_namecode              491 non-null object
themecode                   491 non-null object
totalamt                    500 non-null int64
totalcommamt                500 non-null int64
url                         500 non-null object
dtypes: int64(7), object(43)
memory usage: 195.4+ KB

In [4]:
# List top 10 countries with the most projects
world_bank_df.groupby('countryname').size().sort_values(ascending=False)[:10]


Out[4]:
countryname
People's Republic of China         19
Republic of Indonesia              19
Socialist Republic of Vietnam      17
Republic of India                  16
Republic of Yemen                  13
Nepal                              12
People's Republic of Bangladesh    12
Kingdom of Morocco                 12
Africa                             11
Republic of Mozambique             11
dtype: int64

2) Find the top 10 major project themes (using column 'mjtheme_namecode')


In [5]:
# Load nested JSON column ('mjtheme_namecode') as Pandas DataFrame
data = json.load((open('data/world_bank_projects.json')))
project_themes_df = json_normalize(data, 'mjtheme_namecode')
project_themes_df


Out[5]:
code name
0 8 Human development
1 11
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
10 5 Trade and integration
11 4 Financial and private sector development
12 6 Social protection and risk management
13 6
14 2 Public sector governance
15 4 Financial and private sector development
16 11 Environment and natural resources management
17 8
18 10 Rural development
19 7
20 2 Public sector governance
21 2 Public sector governance
22 2 Public sector governance
23 10 Rural development
24 2
25 10 Rural development
26 6 Social protection and risk management
27 6 Social protection and risk management
28 11 Environment and natural resources management
29 4
... ... ...
1469 8 Human development
1470 9 Urban development
1471 6 Social protection and risk management
1472 6 Social protection and risk management
1473 9 Urban development
1474 2 Public sector governance
1475 2 Public sector governance
1476 10 Rural development
1477 11
1478 8 Human development
1479 7 Social dev/gender/inclusion
1480 11 Environment and natural resources management
1481 5
1482 6 Social protection and risk management
1483 8
1484 4 Financial and private sector development
1485 7 Social dev/gender/inclusion
1486 8 Human development
1487 5 Trade and integration
1488 2 Public sector governance
1489 8 Human development
1490 10 Rural development
1491 6
1492 10 Rural development
1493 10 Rural development
1494 10 Rural development
1495 9 Urban development
1496 8 Human development
1497 5 Trade and integration
1498 4 Financial and private sector development

1499 rows × 2 columns


In [6]:
# Check DataFrame info
project_themes_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1499 entries, 0 to 1498
Data columns (total 2 columns):
code    1499 non-null object
name    1499 non-null object
dtypes: object(2)
memory usage: 23.5+ KB

In [7]:
# List top 10 most occurring major project themes by name
project_themes_df.groupby('name').size().sort_values(ascending=False)[:10]
# Since there are missing values indicated by the missing name below this output is not accurate


Out[7]:
name
Environment and natural resources management    223
Rural development                               202
Human development                               197
Public sector governance                        184
Social protection and risk management           158
Financial and private sector development        130
                                                122
Social dev/gender/inclusion                     119
Trade and integration                            72
Urban development                                47
dtype: int64

In [8]:
# Convert code column to integers
# Setting errors='raise' will raise an alert if there are missing values
project_themes_df.code = pd.to_numeric(project_themes_df.code, errors='raise')
print(project_themes_df.code.dtype)


int64

In [9]:
# List top 10 most occurring major project themes by code
project_themes_df.groupby('code').size().sort_values(ascending=False)[:10]
# There are no missing values so this output is more accurate than above


Out[9]:
code
11    250
10    216
8     210
2     199
6     168
4     146
7     130
5      77
9      50
1      38
dtype: int64

3) In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.


In [10]:
# Change empty strings in the name column to nulls
project_themes_df.loc[project_themes_df['name'] == "", 'name'] = None
project_themes_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1499 entries, 0 to 1498
Data columns (total 2 columns):
code    1499 non-null int64
name    1377 non-null object
dtypes: int64(1), object(1)
memory usage: 23.5+ KB

In [11]:
# Create dictionary that maps project codes to names
themes_by_code = {}
for i in range(project_themes_df['code'].min(), project_themes_df['code'].max() + 1):
    code_filter = (project_themes_df.name.notnull()) & (project_themes_df.code == i)
    themes_by_code[i] = project_themes_df.name[code_filter].iloc[0]
themes_by_code


Out[11]:
{1: 'Economic management',
 2: 'Public sector governance',
 3: 'Rule of law',
 4: 'Financial and private sector development',
 5: 'Trade and integration',
 6: 'Social protection and risk management',
 7: 'Social dev/gender/inclusion',
 8: 'Human development',
 9: 'Urban development',
 10: 'Rural development',
 11: 'Environment and natural resources management'}

In [12]:
# Fill in missing name values in DataFrame using code values
project_themes_df.name = project_themes_df.code.map(themes_by_code)
project_themes_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1499 entries, 0 to 1498
Data columns (total 2 columns):
code    1499 non-null int64
name    1499 non-null object
dtypes: int64(1), object(1)
memory usage: 23.5+ KB

In [13]:
# Rerun code above from number 2 to list top 10 most occurring major project themes by name
project_themes_df.groupby('name').size().sort_values(ascending=False)[:10]


Out[13]:
name
Environment and natural resources management    250
Rural development                               216
Human development                               210
Public sector governance                        199
Social protection and risk management           168
Financial and private sector development        146
Social dev/gender/inclusion                     130
Trade and integration                            77
Urban development                                50
Economic management                              38
dtype: int64