JSON examples and exercise

get familiar with packages for dealing with JSON
study examples with JSON strings and files
work on exercise to be completed and submitted

reference: http://pandas.pydata.org/pandas-docs/stable/io.html#io-json-reader
data source: http://jsonstudio.com/resources/



In [3]:

imports for Python, Pandas



In [ ]:

JSON example, with string

demonstrates creation of normalized dataframes (tables) from nested json string
source: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization



In [ ]:



In [ ]:



In [ ]:

JSON example, with file

demonstrates reading in a json file as a string and as a table
uses small sample file containing data about projects funded by the World Bank
data source: http://jsonstudio.com/resources/



In [ ]:



In [ ]:

JSON exercise

Using data in file 'data/world_bank_projects.json' and the techniques demonstrated above,

Find the 10 countries with most projects
Find the top 10 major project themes (using column 'mjtheme_namecode')
In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.



In [2]:

    
# Answer to Exercise 1 (Find the 10 countries with most projects.)

import pandas as pd

import json from pandas.io.json 
import json_normalize

# load as Pandas dataframe
exercise_json_df = pd.read_json('data/world_bank_projects.json')
exercise_json_df

#Get frequency that the country name value occurs, this will signify the amount of projects each country has
country_project_counts = exercise_json_df.countryname.value_counts()

#Display only the top 10 countries with most projects
country_project_counts[:10]









    Out[2]:





People's Republic of China         19
Republic of Indonesia              19
Socialist Republic of Vietnam      17
Republic of India                  16
Republic of Yemen                  13
Nepal                              12
Kingdom of Morocco                 12
People's Republic of Bangladesh    12
Republic of Mozambique             11
Africa                             11
Name: countryname, dtype: int64



In [6]:

    
#Answer to Exercise 2 (Find the top 10 major project themes (using column 'mjtheme_namecode').)

import pandas as pd

import json
from pandas.io.json import json_normalize

# load json as string
wbprojects = json.load((open('data/world_bank_projects.json')))

# normalize 'mjtheme_namecode' JSON data into a flat table
# NOTE that there are blank names associated with the code.
mjtheme_normalized = json_normalize(wbprojects, 'mjtheme_namecode')

# Use both 'code' and 'name' as index but get a frequency count by the code,
# This way we won't run into the blank name issue.
mjtheme_normalized['Counts'] = mjtheme_normalized.groupby('code')['name'].transform('count')

# Now, sort by the frequency count field in a descending manner.
# Display only the first 10 most occuring project themes.
freq_sort = mjtheme_normalized.set_index(['Counts']).sort_index(axis=0,ascending=False)
freq_sort_no_dup = freq_sort.drop_duplicates('code',inplace = False)
freq_sort_no_dup[:10]









    Out[6]:






  
    
      
      code
      name
    
    
      Counts
      
      
    
  
  
    
      250
      11
      Environment and natural resources management
    
    
      216
      10
      Rural development
    
    
      210
      8
      Human development
    
    
      199
      2
      Public sector governance
    
    
      168
      6
      Social protection and risk management
    
    
      146
      4
      Financial and private sector development
    
    
      130
      7
      Social dev/gender/inclusion
    
    
      77
      5
      Trade and integration
    
    
      50
      9
      Urban development
    
    
      38
      1
      Economic management



In [7]:

    
# Answer to Exercise 3 (In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.)

import pandas as pd
import numpy as np

import json
from pandas.io.json import json_normalize

# load json as string
wbprojects = json.load((open('data/world_bank_projects.json')))

# normalize 'mjtheme_namecode' JSON data into a flat table
# NOTE that there are blank names associated with the code.
mjtheme_normalized = json_normalize(wbprojects, 'mjtheme_namecode')

# We want to create a reference table of all valid names and values in 'mjtheme_namecode'
# In order to replace the missing name values in the original dataframe.
# First replace the empty name(s) with a null qualifier so pandas can recognize it as such.
mjtheme_normalized['name'].replace('', np.nan, inplace = True)

# Drop the rows with null name values and then make a unique set of rows to make our proper reference table.
mjtheme_no_dup = mjtheme_normalized.dropna(subset = ['name']).drop_duplicates('name', inplace = False, keep = 'last')
mjtheme_no_dup

# Now, merge reference table with mjtheme_namecode dataframe to populate the missing blank names.
left = mjtheme_normalized
right = mjtheme_no_dup 
complete_mjtheme = pd.merge(left,right, on = ['code','name'], right_index= 'TRUE')

# Restore the sort
complete_mjtheme.sort_index()









    Out[7]:






  
    
      
      code
      name
    
  
  
    
      0
      8
      Human development
    
    
      2
      1
      Economic management
    
    
      3
      6
      Social protection and risk management
    
    
      4
      5
      Trade and integration
    
    
      5
      2
      Public sector governance
    
    
      6
      11
      Environment and natural resources management
    
    
      7
      6
      Social protection and risk management
    
    
      8
      7
      Social dev/gender/inclusion
    
    
      9
      7
      Social dev/gender/inclusion
    
    
      10
      5
      Trade and integration
    
    
      11
      4
      Financial and private sector development
    
    
      12
      6
      Social protection and risk management
    
    
      14
      2
      Public sector governance
    
    
      15
      4
      Financial and private sector development
    
    
      16
      11
      Environment and natural resources management
    
    
      18
      10
      Rural development
    
    
      20
      2
      Public sector governance
    
    
      21
      2
      Public sector governance
    
    
      22
      2
      Public sector governance
    
    
      23
      10
      Rural development
    
    
      25
      10
      Rural development
    
    
      26
      6
      Social protection and risk management
    
    
      27
      6
      Social protection and risk management
    
    
      28
      11
      Environment and natural resources management
    
    
      30
      5
      Trade and integration
    
    
      31
      11
      Environment and natural resources management
    
    
      32
      10
      Rural development
    
    
      33
      6
      Social protection and risk management
    
    
      34
      6
      Social protection and risk management
    
    
      35
      6
      Social protection and risk management
    
    
      ...
      ...
      ...
    
    
      1465
      11
      Environment and natural resources management
    
    
      1466
      8
      Human development
    
    
      1467
      2
      Public sector governance
    
    
      1468
      2
      Public sector governance
    
    
      1469
      8
      Human development
    
    
      1470
      9
      Urban development
    
    
      1471
      6
      Social protection and risk management
    
    
      1472
      6
      Social protection and risk management
    
    
      1473
      9
      Urban development
    
    
      1474
      2
      Public sector governance
    
    
      1475
      2
      Public sector governance
    
    
      1476
      10
      Rural development
    
    
      1478
      8
      Human development
    
    
      1479
      7
      Social dev/gender/inclusion
    
    
      1480
      11
      Environment and natural resources management
    
    
      1482
      6
      Social protection and risk management
    
    
      1484
      4
      Financial and private sector development
    
    
      1485
      7
      Social dev/gender/inclusion
    
    
      1486
      8
      Human development
    
    
      1487
      5
      Trade and integration
    
    
      1488
      2
      Public sector governance
    
    
      1489
      8
      Human development
    
    
      1490
      10
      Rural development
    
    
      1492
      10
      Rural development
    
    
      1493
      10
      Rural development
    
    
      1494
      10
      Rural development
    
    
      1495
      9
      Urban development
    
    
      1496
      8
      Human development
    
    
      1497
      5
      Trade and integration
    
    
      1498
      4
      Financial and private sector development
    
  

1377 rows × 2 columns



In [ ]:

	code	name
Counts
250	11	Environment and natural resources management
216	10	Rural development
210	8	Human development
199	2	Public sector governance
168	6	Social protection and risk management
146	4	Financial and private sector development
130	7	Social dev/gender/inclusion
77	5	Trade and integration
50	9	Urban development
38	1	Economic management