JSON examples and exercise

get familiar with packages for dealing with JSON
study examples with JSON strings and files
work on exercise to be completed and submitted

reference: http://pandas.pydata.org/pandas-docs/stable/io.html#io-json-reader
data source: http://jsonstudio.com/resources/



In [2]:

    
import pandas as pd

imports for Python, Pandas



In [3]:

    
import json
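# json_normalize is available at the top level since pandas 1.0;
# the older pandas.io.json import path is deprecated
from pandas import json_normalize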

JSON example, with string

demonstrates creation of normalized dataframes (tables) from nested json string
source: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization



In [3]:

    
# define nested JSON-like data (a Python list of dicts, not a string)
data = [{'state': 'Florida', 
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]
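
The list above is already a Python object rather than a JSON string. As a minimal sketch of the difference, the standard-library `json` module converts between the two. The `record`, `text`, and `restored` names are illustrative and not part of the original notebook.

```python
import json

# a small nested record in the same shape as the data above
record = {'state': 'Ohio', 'info': {'governor': 'John Kasich'}}

# Python object -> JSON string
text = json.dumps(record)

# JSON string -> Python object (dicts, lists, strings, numbers)
restored = json.loads(text)
```

`json.loads` and `json.dumps` work on strings, while `json.load` and `json.dump` work on open file objects.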



In [4]:

    
# use normalization to create tables from nested element
json_normalize(data, 'counties')

    Out[4]:

	name	population
0	Dade	12345
1	Broward	40000
2	Palm Beach	60000
3	Summit	1234
4	Cuyahoga	1337



In [5]:

    
# further populate tables created from nested element
json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])









    Out[5]:

	name	population	state	shortname	info.governor
0	Dade	12345	Florida	FL	Rick Scott
1	Broward	40000	Florida	FL	Rick Scott
2	Palm Beach	60000	Florida	FL	Rick Scott
3	Summit	1234	Ohio	OH	John Kasich
4	Cuyahoga	1337	Ohio	OH	John Kasich
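
To make the role of the record path and the metadata list clearer, here is a minimal sketch that normalizes one record in two ways. It assumes pandas 1.0 or later (for `pd.json_normalize`); the `records`, `flat`, and `counties` names are illustrative.

```python
import pandas as pd

# one record in the same shape as the data above
records = [{'state': 'Ohio',
            'shortname': 'OH',
            'info': {'governor': 'John Kasich'},
            'counties': [{'name': 'Summit', 'population': 1234},
                         {'name': 'Cuyahoga', 'population': 1337}]}]

# without a record path, nested dicts become dotted column names,
# while a nested list such as 'counties' stays in a single cell
flat = pd.json_normalize(records)

# with a record path, each element of 'counties' becomes a row;
# the metadata list copies parent fields onto every row
counties = pd.json_normalize(records, 'counties',
                             ['state', ['info', 'governor']])
```

The nested path `['info', 'governor']` is what produces the `info.governor` column name.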

JSON example, with file

demonstrates reading in a json file as Python objects (a list of dicts) and as a table
uses small sample file containing data about projects funded by the World Bank
data source: http://jsonstudio.com/resources/



In [4]:

    
# load json file into Python objects; the with-block closes the file afterwards
with open('data/world_bank_projects_less.json') as f:
    projects = json.load(f)
type(projects)









    Out[4]:





list



In [7]:

    
# load as Pandas dataframe
sample_json_df = pd.read_json('data/world_bank_projects_less.json')
sample_json_df









    Out[7]:

	_id	approvalfy	board_approval_month	boardapprovaldate	borrower	closingdate	country_namecode	countrycode	countryname	countryshortname	...	sectorcode	source	status	supplementprojectflg	theme1	theme_namecode	themecode	totalamt	totalcommamt	url
0	{'$oid': '52b213b38594d8a2be17c780'}	1999	November	2013-11-12T00:00:00Z	FEDERAL DEMOCRATIC REPUBLIC OF ETHIOPIA	2018-07-07T00:00:00Z	Federal Democratic Republic of Ethiopia!$!ET	ET	Federal Democratic Republic of Ethiopia	Ethiopia	...	ET,BS,ES,EP	IBRD	Active	N	{'Name': 'Education for all', 'Percent': 100}	[{'name': 'Education for all', 'code': '65'}]	65	130000000	130000000	http://www.worldbank.org/projects/P129828/ethi...
1	{'$oid': '52b213b38594d8a2be17c781'}	2015	November	2013-11-04T00:00:00Z	GOVERNMENT OF TUNISIA	NaN	Republic of Tunisia!$!TN	TN	Republic of Tunisia	Tunisia	...	BZ,BS	IBRD	Active	N	{'Name': 'Other economic management', 'Percent...	[{'name': 'Other economic management', 'code':...	54,24	0	4700000	http://www.worldbank.org/projects/P144674?lang=en

2 rows × 50 columns
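
Columns such as `_id` and `theme_namecode` still hold nested dicts and lists after `pd.read_json`. A minimal sketch of flattening them follows, using one abbreviated record built from fields visible in the output above. It assumes pandas 1.0 or later; the `raw`, `records`, `flat`, and `themes` names are illustrative.

```python
import io
import json

import pandas as pd

# one abbreviated record, using values visible in the output above
raw = '''[{"_id": {"$oid": "52b213b38594d8a2be17c780"},
           "countryshortname": "Ethiopia",
           "theme_namecode": [{"name": "Education for all", "code": "65"}]}]'''

# read_json builds one column per top-level key
df = pd.read_json(io.StringIO(raw))

# json_normalize expands nested dicts into dotted column names
records = json.loads(raw)
flat = pd.json_normalize(records)

# a nested list can be expanded into one row per element,
# carrying a parent field along as metadata
themes = pd.json_normalize(records, 'theme_namecode', ['countryshortname'])
```

Here `flat` gains an `_id.$oid` column, and `themes` has one row per theme with the country name attached.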

JSON exercise

Using data in file 'data/world_bank_projects.json' and the techniques demonstrated above,

1. Find the 10 countries with the most projects.
2. Find the top 10 major project themes (using column 'mjtheme_namecode').
3. In 2. above you will notice that some entries have only the code, and the name is missing. Create a dataframe with the missing names filled in.
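
As a starting point, the three steps might be approached along these lines. This is only a sketch on toy records with hypothetical values, not the exercise solution: the country and theme names are invented, and it assumes that a missing name appears as an empty string, which should be checked against the real file.

```python
import pandas as pd

# toy records (hypothetical values) shaped like the exercise data
projects = [
    {'countryname': 'Country A',
     'mjtheme_namecode': [{'code': '8', 'name': 'Theme X'},
                          {'code': '11', 'name': ''}]},
    {'countryname': 'Country B',
     'mjtheme_namecode': [{'code': '11', 'name': 'Theme Y'}]},
    {'countryname': 'Country A',
     'mjtheme_namecode': [{'code': '8', 'name': ''}]},
]

# step 1: count projects per country and keep the 10 largest
df = pd.DataFrame(projects)
top_countries = df['countryname'].value_counts().head(10)

# step 2: one row per (project, theme) pair, then count by code
themes = pd.json_normalize(projects, 'mjtheme_namecode')
top_codes = themes['code'].value_counts().head(10)

# step 3: build a code -> name lookup from rows that have a name,
# then use it to fill in every row
lookup = (themes[themes['name'] != '']
          .drop_duplicates('code')
          .set_index('code')['name'])
themes['name'] = themes['code'].map(lookup)
```

Counting by code in step 2 avoids splitting one theme across its named and unnamed entries.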



In [ ]:
