JSON examples and exercise

get familiar with packages for dealing with JSON
study examples with JSON strings and files
work on exercise to be completed and submitted

reference: http://pandas-docs.github.io/pandas-docs-travis/io.html#json
data source: http://jsonstudio.com/resources/



In [1]:

    
import pandas as pd



In [2]:

    
import json
from pandas import json_normalize  # pandas >= 1.0; older releases: from pandas.io.json import json_normalize



In [3]:

    
from numpy import nan

JSON exercise

Using data in file 'data/world_bank_projects.json' and the techniques demonstrated above,

1. Find the 10 countries with the most projects.
2. Find the top 10 major project themes (using column 'mjtheme_namecode').
3. In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.

Part 1



In [4]:

    
pd.read_json('data/world_bank_projects.json')['countryname'].value_counts()[:10]









    Out[4]:





People's Republic of China         19
Republic of Indonesia              19
Socialist Republic of Vietnam      17
Republic of India                  16
Republic of Yemen                  13
Kingdom of Morocco                 12
People's Republic of Bangladesh    12
Nepal                              12
Republic of Mozambique             11
Africa                             11
dtype: int64
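The output above has ties at several counts, including two entries at 11 in the last two places, so a strict cut at ten rows may drop other countries with the same count (whether any exist is not visible from this output). A minimal sketch on made-up data, assuming a pandas version where `Series.nlargest` accepts `keep="all"`; the names `toy_countries`, `top2_strict` and `top2_with_ties` are hypothetical:

```python
import pandas as pd

# Made-up country labels; "B" and "C" are tied for second place.
toy_countries = pd.Series(["A", "A", "A", "B", "B", "C", "C", "D"])

counts = toy_countries.value_counts()

# A strict cut keeps exactly two rows and drops one of the tied labels.
top2_strict = counts.head(2)

# keep="all" keeps every row tied with the last selected value.
top2_with_ties = counts.nlargest(2, keep="all")
```

Which of the two is the right answer depends on whether "top 10" is meant as exactly ten rows or as every country at or above the tenth-place count.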

Part 2



In [5]:

    
with open('data/world_bank_projects.json') as f:
    wb_projects_str=json.load(f)



In [6]:

    
themes=json_normalize(wb_projects_str,'mjtheme_namecode')
codecounts=themes['code'].value_counts()
codecounts.name='counts'
codecounts









    Out[6]:





11    250
10    216
8     210
2     199
6     168
4     146
7     130
5      77
9      50
1      38
3      15
Name: counts, dtype: int64
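To make the flattening step easier to follow, here is a small sketch of what passing a record path to `json_normalize` does. The two records are made up; only the nested `mjtheme_namecode` shape is taken from the exercise, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# Two made-up project records; each holds a nested list of theme entries.
toy_projects = [
    {"id": "P1", "mjtheme_namecode": [{"code": "8", "name": "Human development"},
                                      {"code": "11", "name": ""}]},
    {"id": "P2", "mjtheme_namecode": [{"code": "8", "name": ""}]},
]

# record_path names the nested list to unpack: one output row per list entry.
toy_themes = pd.json_normalize(toy_projects, record_path="mjtheme_namecode")

# Counting the 'code' column ranks themes by how often they occur,
# regardless of whether the name was filled in.
toy_counts = toy_themes["code"].value_counts()
```

Counting on `code` rather than `name` matters here because entries with a blank name would otherwise be grouped together under the empty string.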

Build a lookup table that tells us what each code means



In [7]:

    
codedict=themes.replace('',nan).dropna().groupby('code').last()
codedict









    Out[7]:






  
    
      
code  name
1     Economic management
10    Rural development
11    Environment and natural resources management
2     Public sector governance
3     Rule of law
4     Financial and private sector development
5     Trade and integration
6     Social protection and risk management
7     Social dev/gender/inclusion
8     Human development
9     Urban development
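The chain that builds this lookup can be traced step by step on a few made-up rows. This is only a sketch of the same pattern; `toy_themes` and `toy_lookup` are hypothetical names and the rows are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows: some names are blank, as in the real data.
toy_themes = pd.DataFrame({
    "code": ["8", "11", "8", "11", "1"],
    "name": ["Human development", "", "",
             "Environment and natural resources management",
             "Economic management"],
})

toy_lookup = (
    toy_themes.replace("", np.nan)  # blank names become missing values
              .dropna()             # drop the rows that have no name
              .groupby("code")
              .last()               # keep one surviving name per code
)
```

Taking `last()` assumes every named row for a given code carries the same name, so it does not matter which one survives.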

Answer:

Combine the two



In [8]:

    
pd.DataFrame([codedict['name'],codecounts]).transpose().sort_values('counts',ascending=False)









    Out[8]:






  
    
      
    name                                          counts
11  Environment and natural resources management  250
10  Rural development                             216
8   Human development                             210
2   Public sector governance                      199
6   Social protection and risk management         168
4   Financial and private sector development      146
7   Social dev/gender/inclusion                   130
5   Trade and integration                         77
9   Urban development                             50
1   Economic management                           38
3   Rule of law                                   15

Part 3

Basically we just copy the code column and then replace every code with the corresponding name. This disturbs my sensibilities, as it does more replacements than it has to (rows that already had a name get rewritten too), but it's faster than anything else I came up with.



In [9]:

    
%%time
# Assign the result back; inplace=True on a column selection may not reach the DataFrame in newer pandas.
themes['name']=themes['code'].replace(to_replace=codedict.index.values,value=codedict['name'].values)









    



CPU times: user 4.82 ms, sys: 32 µs, total: 4.85 ms
Wall time: 4.49 ms
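For comparison, the same code-to-name lookup can also be written with `Series.map`. This is a sketch on made-up rows, not the approach used in the cell above, and it was not timed against it; `toy_themes` and `toy_lookup` are hypothetical names.

```python
import pandas as pd

# Made-up rows with some blank names.
toy_themes = pd.DataFrame({
    "code": ["8", "11", "8", "1"],
    "name": ["Human development", "", "", "Economic management"],
})

# A code -> name lookup, standing in for the lookup table built earlier.
toy_lookup = pd.Series({
    "1": "Economic management",
    "8": "Human development",
    "11": "Environment and natural resources management",
})

# map() looks each code up in the index of toy_lookup and returns the name.
toy_themes["name"] = toy_themes["code"].map(toy_lookup)
```

One difference to be aware of: `map` yields a missing value for any code absent from the lookup, whereas `replace` would leave the code itself in place.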



In [10]:

    
themes









    Out[10]:






  
    
      
      code  name
0     8     Human development
1     11    Environment and natural resources management
2     1     Economic management
3     6     Social protection and risk management
4     5     Trade and integration
5     2     Public sector governance
6     11    Environment and natural resources management
7     6     Social protection and risk management
8     7     Social dev/gender/inclusion
9     7     Social dev/gender/inclusion
10    5     Trade and integration
11    4     Financial and private sector development
12    6     Social protection and risk management
13    6     Social protection and risk management
14    2     Public sector governance
15    4     Financial and private sector development
16    11    Environment and natural resources management
17    8     Human development
18    10    Rural development
19    7     Social dev/gender/inclusion
20    2     Public sector governance
21    2     Public sector governance
22    2     Public sector governance
23    10    Rural development
24    2     Public sector governance
25    10    Rural development
26    6     Social protection and risk management
27    6     Social protection and risk management
28    11    Environment and natural resources management
29    4     Financial and private sector development
...   ...   ...
1469  8     Human development
1470  9     Urban development
1471  6     Social protection and risk management
1472  6     Social protection and risk management
1473  9     Urban development
1474  2     Public sector governance
1475  2     Public sector governance
1476  10    Rural development
1477  11    Environment and natural resources management
1478  8     Human development
1479  7     Social dev/gender/inclusion
1480  11    Environment and natural resources management
1481  5     Trade and integration
1482  6     Social protection and risk management
1483  8     Human development
1484  4     Financial and private sector development
1485  7     Social dev/gender/inclusion
1486  8     Human development
1487  5     Trade and integration
1488  2     Public sector governance
1489  8     Human development
1490  10    Rural development
1491  6     Social protection and risk management
1492  10    Rural development
1493  10    Rural development
1494  10    Rural development
1495  9     Urban development
1496  8     Human development
1497  5     Trade and integration
1498  4     Financial and private sector development
    
  

1499 rows × 2 columns
