The ONS Register of Geographic Codes is a register maintained by the Office of National Statistics (ONS) that describes a range of coding schemes for various UK geographies.
The register is provided as an Excel spreadsheet cointing multiple worksheets, with each worksheet contianing the codes for a particular geography.
This notebook parses the codes into a set of unnormalised relational database tables in a SQLite3 database.
See also: GSS blog - Why Do We Need Another Register?, which describes the development of a new set of official registers for describing UK geographies.
The ONS Register of Geographic Codes is updated several times a year. Finf the most current version here.
In [172]:
import sqlite3
con = sqlite3.connect("onsgeocodes.sqlite")
In [173]:
import pandas as pd
In [174]:
#Create a function to grab a zip file from an online location and then grab a specified file from inside it
import requests, zipfile
#The following fudge copes with Python 2 and Python 3
try:
from StringIO import StringIO as zreader
except ImportError:
from io import BytesIO as zreader
def ziparchivereader(f):
return zipfile.ZipFile(f, 'r')
def zipgrabber(url):
''' Grab a zip file from a URL '''
r = requests.get(url)
z = ziparchivereader(zreader(r.content))
#z = zipfile.ZipFile(zreader(r.content))
return z
def zipgrabberfile(url, f):
''' Grab a file by name from a zip file grabbed from a URL '''
return zipgrabber(url).open(f)
def zipfilebyname(z,f):
''' Grab a file by name from an already grabbed zip file '''
return z.open(f)
def zipfilelist(z):
''' Return the names of files contained in a grabbed zip file '''
return z.namelist()
In [175]:
url="https://ons.maps.arcgis.com/sharing/rest/content/items/ef6d5454a6f44ea3a7387b3401e2de5e/data"
!mkdir -p downloads
f='downloads/tmp_register_ons_geo_codes.zip'
!rm {f}
!wget -O {f} {url}
In [176]:
zf=ziparchivereader(f)
zipfilelist(zf)
Out[176]:
The .csv file contains metadata describing the geographies listed in separate sheets in the .xlsx file.
In [177]:
metadata=pd.read_csv(zf.open('RGC_JUN_2017_UK.csv'))
metadata.head()
Out[177]:
In [178]:
xl=pd.read_excel(zf.open('RGC_JUN_2017_UK.xlsx'), sheetname=None)
xl.keys()
Out[178]:
The register appears to describe only a subset of the codesets listed in the metadata file - codes relating specifically to Scotland are not provided.
In [179]:
metadata["codeAbbrv"] = metadata["Entity code"].map(str) + '_' + metadata["Entity abbreviation"].map(str)
codes = metadata["codeAbbrv"].tolist()
print(set(codes) - set(xl.keys()))
Metadata for the spreadsheet is contained in the RGC sheet.
In [180]:
xl['RGC']["codeAbbrv"] = xl['RGC']["Entity code"].map(str) + '_' + xl['RGC']["Entity abbreviation"].map(str)
xl['RGC'].to_sql(con=con, name='metadata', index=False, if_exists='replace')
In [182]:
cols=['GEOGCD','GEOGNM','GEOGNMW','OPER_DATE','TERM_DATE','STATUS']
bigcodes=pd.DataFrame(columns=['sheet']+cols)
bigcodes.to_sql(con=con, name='codelist', index=False, if_exists='replace')
sheets= list(xl.keys())
sheets.remove('For_Scotland')
for sheet in sheets[2:]:
xl[sheet].to_sql(con=con, name=sheet, index=False, if_exists='replace')
xl[sheet]['sheet']=sheet
#Reorder the columns
xl[sheet][['sheet']+cols].to_sql(con=con, name='codelist', index=False, if_exists='append')
In [185]:
q='SELECT * FROM E02_MSOA LIMIT 5'
pd.read_sql_query(q, con)
Out[185]:
In [183]:
q='SELECT * FROM codelist WHERE "GEOGCD"="{code}"'.format(code='W40000004')
pd.read_sql_query(q, con)
Out[183]:
In [184]:
q='''
SELECT * FROM codelist JOIN metadata
WHERE "GEOGNM"="{name}" AND codeAbbrv=sheet AND codelist.STATUS="live"
'''.format(name='Isle of Wight')
pd.read_sql_query(q, con)
Out[184]:
In [ ]:
q='''
SELECT DISTINCT "Entity name", sheet FROM codelist JOIN metadata
WHERE "GEOGNM" LIKE "%{name}%" AND codeAbbrv=sheet AND codelist.STATUS="live"
'''.format(name='Isle of Wight')
pd.read_sql_query(q, con)
In [ ]:
#Lookup a code
q='''
SELECT * FROM codelist JOIN metadata
WHERE "GEOGCD" = "{name}" AND codeAbbrv=sheet
'''.format(name='E05008479')
pd.read_sql_query(q, con)
In [ ]: