Scripts for creating weather station, zipcode, and climate zone mappings

The scripts below are used to fetch, format, and process the raw data that form the mappings used by the eemeter internally.

Fetch primary sources of data

Note: The only manually created file is the climate zone file (climate_zone.csv, which was constructed from a set of references downloaded below when the top script is run; all others are primary sources


In [1]:
!mkdir -p data

# ZIP code shapefiles from Census.gov
!wget -N http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_zcta510_500k.zip -P data
!unzip -o data/cb_2014_us_zcta510_500k.zip -d data
!mapshaper -i data/cb_2014_us_zcta510_500k.shp -o format=geojson data/cb_2014_us_zcta510_500k.json
!rm data/cb_2014_us_zcta510_500k.cpg
!rm data/cb_2014_us_zcta510_500k.dbf
!rm data/cb_2014_us_zcta510_500k.prj
!rm data/cb_2014_us_zcta510_500k.shp
!rm data/cb_2014_us_zcta510_500k.shp.ea.iso.xml
!rm data/cb_2014_us_zcta510_500k.shp.iso.xml
!rm data/cb_2014_us_zcta510_500k.shp.xml
!rm data/cb_2014_us_zcta510_500k.shx
!rm data/cb_2014_us_zcta510_500k.zip

# County shapefiles from Census.gov
!wget -N http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_county_500k.zip -P data
!unzip -o data/cb_2013_us_county_500k.zip -d data
!mapshaper -i data/cb_2013_us_county_500k.shp -o format=geojson data/cb_2013_us_county_500k.json
!rm data/cb_2013_us_county_500k.dbf
!rm data/cb_2013_us_county_500k.prj
!rm data/cb_2013_us_county_500k.shp
!rm data/cb_2013_us_county_500k.shp.iso.xml
!rm data/cb_2013_us_county_500k.shp.xml
!rm data/cb_2013_us_county_500k.shx
!rm data/cb_2013_us_county_500k.zip

# CA climate zone division shapefiles and transform from CEC
!wget -N http://www.energy.ca.gov/maps/renewable/CA_Building_Standards_Climate_Zones.zip -P data
!unzip -o data/CA_Building_Standards_Climate_Zones.zip -d data
!ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:4326 data/CA_Building_Standards_Climate_Zones_reprojected.shp data/CA_Building_Standards_Climate_Zones.shp
!mapshaper -i data/CA_Building_Standards_Climate_Zones_reprojected.shp -o format=geojson data/CA_Building_Standards_Climate_Zones.json
!rm data/CA\ Building\ Standards\ Climate\ Zones.lyr
!rm data/CA_Building_Standards_Climate_Zones.cpg
!rm data/CA_Building_Standards_Climate_Zones.dbf
!rm data/CA_Building_Standards_Climate_Zones.prj
!rm data/CA_Building_Standards_Climate_Zones.sbn
!rm data/CA_Building_Standards_Climate_Zones.sbx
!rm data/CA_Building_Standards_Climate_Zones.shp
!rm data/CA_Building_Standards_Climate_Zones.shp.xml
!rm data/CA_Building_Standards_Climate_Zones.shx
!rm data/CA_Building_Standards_Climate_Zones.zip
!rm data/CA_Building_Standards_Climate_Zones_reprojected.dbf
!rm data/CA_Building_Standards_Climate_Zones_reprojected.prj
!rm data/CA_Building_Standards_Climate_Zones_reprojected.shp
!rm data/CA_Building_Standards_Climate_Zones_reprojected.shx

# NCDC weather data quality
!wget -N ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-inventory.csv -P data

# NCDC station lat lngs
!wget -N ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv -P data

# IECC/Building America climate zone csv. Custom; assembled and corrected from primary source
!wget -N https://gist.githubusercontent.com/philngo/d3e251040569dba67942/raw/d1d2e13d73cc50147be6c90d8232f2e4c3eeaffc/climate_zones.csv -P data

# County ids - for reference - used to derive climate_zones.csv
!wget -N http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt -P data

# Building america climate zone guide - for reference - used to derive file below.
!wget -N http://energy.gov/sites/prod/files/2015/02/f19/ba_climate_guide_2013.pdf -P data


--2016-06-23 16:22:19--  http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_zcta510_500k.zip
Resolving www2.census.gov... 2600:1406:2c:183::208c, 2600:1406:2c:18e::208c, 23.72.103.78
Connecting to www2.census.gov|2600:1406:2c:183::208c|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 67558284 (64M) [application/zip]
Saving to: 'data/cb_2014_us_zcta510_500k.zip'

cb_2014_us_zcta510_ 100%[=====================>]  64.43M  1.29MB/s   in 66s    

2016-06-23 16:23:26 (996 KB/s) - 'data/cb_2014_us_zcta510_500k.zip' saved [67558284/67558284]

Archive:  data/cb_2014_us_zcta510_500k.zip
  inflating: data/cb_2014_us_zcta510_500k.shp.ea.iso.xml  
  inflating: data/cb_2014_us_zcta510_500k.shp.iso.xml  
  inflating: data/cb_2014_us_zcta510_500k.shp.xml  
  inflating: data/cb_2014_us_zcta510_500k.shp  
  inflating: data/cb_2014_us_zcta510_500k.shx  
  inflating: data/cb_2014_us_zcta510_500k.dbf  
  inflating: data/cb_2014_us_zcta510_500k.prj  
 extracting: data/cb_2014_us_zcta510_500k.cpg  
Warning: Shapefile Z data will be lost.
Wrote data/cb_2014_us_zcta510_500k.json
--2016-06-23 16:23:55--  http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_county_500k.zip
Resolving www2.census.gov... 2600:1406:2c:183::208c, 2600:1406:2c:18e::208c, 23.72.103.78
Connecting to www2.census.gov|2600:1406:2c:183::208c|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9906021 (9.4M) [application/zip]
Saving to: 'data/cb_2013_us_county_500k.zip'

cb_2013_us_county_5 100%[=====================>]   9.45M  1.15MB/s   in 11s    

2016-06-23 16:24:07 (863 KB/s) - 'data/cb_2013_us_county_500k.zip' saved [9906021/9906021]

Archive:  data/cb_2013_us_county_500k.zip
  inflating: data/cb_2013_us_county_500k.shp  
  inflating: data/cb_2013_us_county_500k.shx  
  inflating: data/cb_2013_us_county_500k.shp.iso.xml  
  inflating: data/cb_2013_us_county_500k.shp.xml  
  inflating: data/cb_2013_us_county_500k.prj  
  inflating: data/cb_2013_us_county_500k.dbf  
[dbf] Detected encoding: latin1
[dbf] Sample text:
 Guánica       Mayagüez      Añasco        Bayamón       Canóvanas    
 Cataño        Las Marías    Manatí        Rincón        Río Grande   
 San Sebastián Juana Díaz    Doña Ana      Comerío       Loíza        
 San Germán    Peñuelas     
Wrote data/cb_2013_us_county_500k.json
--2016-06-23 16:24:11--  http://www.energy.ca.gov/maps/renewable/CA_Building_Standards_Climate_Zones.zip
Resolving www.energy.ca.gov... 134.186.116.98
Connecting to www.energy.ca.gov|134.186.116.98|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2213027 (2.1M) [application/zip]
Saving to: 'data/CA_Building_Standards_Climate_Zones.zip'

CA_Building_Standar 100%[=====================>]   2.11M   238KB/s   in 9.0s   

2016-06-23 16:24:20 (240 KB/s) - 'data/CA_Building_Standards_Climate_Zones.zip' saved [2213027/2213027]

Archive:  data/CA_Building_Standards_Climate_Zones.zip
  inflating: data/CA Building Standards Climate Zones.lyr  
 extracting: data/CA_Building_Standards_Climate_Zones.cpg  
  inflating: data/CA_Building_Standards_Climate_Zones.dbf  
  inflating: data/CA_Building_Standards_Climate_Zones.prj  
  inflating: data/CA_Building_Standards_Climate_Zones.sbn  
  inflating: data/CA_Building_Standards_Climate_Zones.sbx  
  inflating: data/CA_Building_Standards_Climate_Zones.shp  
  inflating: data/CA_Building_Standards_Climate_Zones.shp.xml  
  inflating: data/CA_Building_Standards_Climate_Zones.shx  
  inflating: data/GIS Disclaimer.docx  
Warning: Shapefile Z data will be lost.
Wrote data/CA_Building_Standards_Climate_Zones.json
--2016-06-23 16:24:25--  ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-inventory.csv
           => 'data/.listing'
Resolving ftp.ncdc.noaa.gov... 2610:20:8040:2::101, 205.167.25.101
Connecting to ftp.ncdc.noaa.gov|2610:20:8040:2::101|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/data/noaa ... done.
==> EPSV ... done.    ==> LIST ... done.

.listing                [ <=>                  ]   9.50K  --.-KB/s   in 0.02s  

2016-06-23 16:24:27 (567 KB/s) - 'data/.listing' saved [9730]

Removed 'data/.listing'.
--2016-06-23 16:24:27--  ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-inventory.csv
           => 'data/isd-inventory.csv'
==> CWD not required.
==> EPSV ... done.    ==> RETR isd-inventory.csv ... done.
Length: 54294056 (52M)

isd-inventory.csv   100%[=====================>]  51.78M   614KB/s   in 62s    

2016-06-23 16:25:29 (861 KB/s) - 'data/isd-inventory.csv' saved [54294056]

--2016-06-23 16:25:29--  ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv
           => 'data/.listing'
Resolving ftp.ncdc.noaa.gov... 205.167.25.101, 2610:20:8040:2::101
Connecting to ftp.ncdc.noaa.gov|205.167.25.101|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/data/noaa ... done.
==> PASV ... done.    ==> LIST ... done.

.listing                [ <=>                  ]   9.50K  --.-KB/s   in 0.06s  

2016-06-23 16:25:31 (169 KB/s) - 'data/.listing' saved [9730]

Removed 'data/.listing'.
--2016-06-23 16:25:31--  ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv
           => 'data/isd-history.csv'
==> CWD not required.
==> PASV ... done.    ==> RETR isd-history.csv ... done.
Length: 2883379 (2.7M)

isd-history.csv     100%[=====================>]   2.75M   761KB/s   in 4.2s   

2016-06-23 16:25:36 (670 KB/s) - 'data/isd-history.csv' saved [2883379]

--2016-06-23 16:25:36--  https://gist.githubusercontent.com/philngo/d3e251040569dba67942/raw/d1d2e13d73cc50147be6c90d8232f2e4c3eeaffc/climate_zones.csv
Resolving gist.githubusercontent.com... 151.101.40.133
Connecting to gist.githubusercontent.com|151.101.40.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100071 (98K) [text/plain]
Saving to: 'data/climate_zones.csv'

climate_zones.csv   100%[=====================>]  97.73K   563KB/s   in 0.2s   

Last-modified header missing -- time-stamps turned off.
2016-06-23 16:25:38 (563 KB/s) - 'data/climate_zones.csv' saved [100071/100071]

--2016-06-23 16:25:38--  http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt
Resolving www2.census.gov... 2600:1406:2c:18e::208c, 2600:1406:2c:183::208c, 23.72.103.78
Connecting to www2.census.gov|2600:1406:2c:18e::208c|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94210 (92K) [text/plain]
Saving to: 'data/national_county.txt'

national_county.txt 100%[=====================>]  92.00K  --.-KB/s   in 0.1s   

2016-06-23 16:25:38 (805 KB/s) - 'data/national_county.txt' saved [94210/94210]

--2016-06-23 16:25:39--  http://energy.gov/sites/prod/files/2015/02/f19/ba_climate_guide_2013.pdf
Resolving energy.gov... 2605:9f00:0:3::3, 199.167.76.13
Connecting to energy.gov|2605:9f00:0:3::3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2504717 (2.4M) [application/pdf]
Saving to: 'data/ba_climate_guide_2013.pdf'

ba_climate_guide_20 100%[=====================>]   2.39M   892KB/s   in 2.7s   

2016-06-23 16:25:42 (892 KB/s) - 'data/ba_climate_guide_2013.pdf' saved [2504717/2504717]

Load python packages


In [2]:
%matplotlib inline

from datetime import datetime, timedelta
from collections import defaultdict
import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap # only for showing maps - can be skipped.
from shapely.geometry import shape, Point, asShape
import pyproj
import requests
from bs4 import BeautifulSoup

Gather and inspect NOAA station metadata

Gather the lat/long coordinates for all USAF weather stations in the isd-history.csv index file, storing them by USAF ID.


In [3]:
isd_history = pd.read_csv('data/isd-history.csv', dtype={"USAF": str, "WBAN": str}, parse_dates=["BEGIN", "END"])
hasGEO = (
    isd_history.LAT.notnull() & 
    isd_history.LON.notnull() & 
    (isd_history.LAT != 0)
)

# RQ = Peurto Rico
# GQ = Guam
isUS = (((isd_history.CTRY == "US") & (isd_history.STATE.notnull())) 
        | (isd_history.CTRY == "RQ") | (isd_history.CTRY == "GQ"))
isd_history = isd_history[hasGEO & isUS]

isd_history["hasUSAF"] = isd_history.USAF != "999999"

usaf_station_lat_lngs = {}
for usaf_station, rows in isd_history[isd_history.hasUSAF].groupby("USAF"):
    
    # take most recent as most accurate lat long
    row = rows.ix[rows.END.argmax()]
    usaf_station_lat_lngs[usaf_station] = (row.LAT, row.LON)

usaf_station_ids = sorted(list(usaf_station_lat_lngs.keys()))
len(usaf_station_ids)


Out[3]:
3792

In [4]:
def map_points(lats, lons, title):
    plt.figure(figsize=(15,7))

    m = Basemap(
        width=15000000,
        height=7000000,
        resolution='l',
        projection='laea',\
        lat_ts=50,
        lat_0=50,
        lon_0=-107.
    )

    #m.drawcoastlines()
    #m.drawcountries()

    m.fillcontinents(color='coral', lake_color='aqua')
    m.drawmapboundary(fill_color='aqua')

    m.drawparallels(np.arange(0., 90, 10.), labels=[1, 0, 0, 0], fontsize=10)
    m.drawmeridians(np.arange(150., 360., 10.), labels=[0, 0, 0, 1], fontsize=10)

    x, y = m(lons, lats)
    m.scatter(x, y, 3, marker='o', color='k', zorder=10)

    plt.title(title)
    plt.show()
    
lats, lons = zip(*list(usaf_station_lat_lngs.values()))
title = "Available US USAF weather stations (n={})".format(len(lats))
map_points(lats, lons, title)


/usr/local/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):

Filter out stations which don't have enough data

The above gave us a list of available US stations - now we want to see which we should actually use based on missing data criteria


In [5]:
# load up station inventory - data points per station per month
isd_inventory = pd.read_csv('data/isd-inventory.csv', dtype={"USAF": str, "WBAN": str, "YEAR": str})

In [6]:
isd_index = defaultdict(set)
for usaf, full in zip(isd_inventory.USAF, isd_inventory.USAF + '-' + isd_inventory.WBAN):
    if usaf is not '999999':
        isd_index[usaf].add(full)
isd_index = {k: list(v) for k, v in isd_index.items()}

In [7]:
isd_inventory.head()


Out[7]:
USAF WBAN YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
0 007005 99999 2012 18 0 0 0 0 0 0 0 0 0 0 0
1 007011 99999 2011 0 0 0 0 0 0 0 0 0 21 0 2791
2 007011 99999 2012 771 0 183 0 0 0 142 13 9 0 4 0
3 007018 99999 2011 0 0 2104 2797 2543 2614 382 0 0 0 0 0
4 007018 99999 2013 0 0 0 0 0 0 710 0 0 0 0 0

In [8]:
n_years = 5
months = {
    i + 1: month
    for i, month in enumerate([
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    ])
}

dates = pd.date_range((datetime.now() - timedelta(days=365*n_years + 31)).date(), freq='M', periods=12 * n_years + 1)
selectors = [(str(i.year), months[i.month]) for i in dates]
print(selectors)


[('2011', 'MAY'), ('2011', 'JUN'), ('2011', 'JUL'), ('2011', 'AUG'), ('2011', 'SEP'), ('2011', 'OCT'), ('2011', 'NOV'), ('2011', 'DEC'), ('2012', 'JAN'), ('2012', 'FEB'), ('2012', 'MAR'), ('2012', 'APR'), ('2012', 'MAY'), ('2012', 'JUN'), ('2012', 'JUL'), ('2012', 'AUG'), ('2012', 'SEP'), ('2012', 'OCT'), ('2012', 'NOV'), ('2012', 'DEC'), ('2013', 'JAN'), ('2013', 'FEB'), ('2013', 'MAR'), ('2013', 'APR'), ('2013', 'MAY'), ('2013', 'JUN'), ('2013', 'JUL'), ('2013', 'AUG'), ('2013', 'SEP'), ('2013', 'OCT'), ('2013', 'NOV'), ('2013', 'DEC'), ('2014', 'JAN'), ('2014', 'FEB'), ('2014', 'MAR'), ('2014', 'APR'), ('2014', 'MAY'), ('2014', 'JUN'), ('2014', 'JUL'), ('2014', 'AUG'), ('2014', 'SEP'), ('2014', 'OCT'), ('2014', 'NOV'), ('2014', 'DEC'), ('2015', 'JAN'), ('2015', 'FEB'), ('2015', 'MAR'), ('2015', 'APR'), ('2015', 'MAY'), ('2015', 'JUN'), ('2015', 'JUL'), ('2015', 'AUG'), ('2015', 'SEP'), ('2015', 'OCT'), ('2015', 'NOV'), ('2015', 'DEC'), ('2016', 'JAN'), ('2016', 'FEB'), ('2016', 'MAR'), ('2016', 'APR'), ('2016', 'MAY')]

In [9]:
# format the data nicely, grabbing the data from each of the months listed above ^^
usaf_station_inventory_data = {}
for station, group in isd_inventory.groupby("USAF"):
    if station in usaf_station_ids:
        group = group.sort_values(by='YEAR')
        group = group.drop(['WBAN', 'USAF'], 1)
        
        # put months in a single column
        group = group.reset_index(drop=True).set_index(['YEAR']) \
            .unstack(level='YEAR').T.swaplevel(0, 1).sortlevel(0)
        
        # set index names
        group.index = group.index.set_names(["YEAR", "MONTH"])
        
        # combine duplicate years
        group = group.groupby(level=["YEAR", "MONTH"]).sum()
        
        # get n for each relevant month
        data = np.array([group.get(selector, 0) for selector in selectors])
        usaf_station_inventory_data[station] = data

In [10]:
# explore a bit to see where we should draw the missing data cutoff
means = []
quantiles = []
for station, data in list(usaf_station_inventory_data.items()):
    means.append(data.mean())
    quantiles.append(np.percentile(data, 5))
    
plt.title("Histogram of means of monthly data point counts - all stations")
plt.hist(means, bins=50)
plt.show()

plt.title("Histogram of 5th percentiles of monthly data point counts - all stations")
plt.hist(quantiles, bins=50)
plt.show()


Accept stations that have greater than 600 points at their 5th percentile data-intensity month.


In [11]:
accepted_usaf_stations = set()
for station, data in list(usaf_station_inventory_data.items()):
    if np.percentile(data, 5) > 600:
        accepted_usaf_stations.add(station)
print len(accepted_usaf_stations)


2069

In [12]:
accepted_usaf_station_lat_lngs = {}
rejected_usaf_station_lat_lngs = {}
for station, data in usaf_station_lat_lngs.items():
    if station in accepted_usaf_stations:
        accepted_usaf_station_lat_lngs[station] = data
    else:
        rejected_usaf_station_lat_lngs[station] = data

accepted_usaf_station_points = {s: Point(lng, lat) for s, (lat, lng) in accepted_usaf_station_lat_lngs.items()}

lats, lons = zip(*list(accepted_usaf_station_lat_lngs.values()))
title = "Accepted US USAF weather stations (n={})".format(len(lats))
map_points(lats, lons, title)

lats, lons = zip(*list(rejected_usaf_station_lat_lngs.values()))
title = "Rejected US USAF weather stations (n={})".format(len(lats))
map_points(lats, lons, title)



In [13]:
print(list(accepted_usaf_station_lat_lngs.items())[:2])


[('725512', (40.893999999999998, -97.626000000000005)), ('726546', (44.905000000000001, -97.149000000000001))]

In [14]:
len(usaf_station_inventory_data.keys())


Out[14]:
3792

Match ZIP Codes (really ZCTAs) and Weather Stations to Climate Zones.

  1. Load geometric data - zipcode, county polygons
  2. Match zipcode centroids to counties
  3. Match counties to climate zones
  4. Match zipcodes to climate zones through counties (except for CA, which is direct zipcode -> climate zone poly)
  5. Match stations to climate zones the same way
  6. Take stations falling outside of counties and match to zipcode by distance, discarding stations that are > 25km from zip codes (usually because in ocean - see map)
  7. Update accepted station list to discard far oceanic stations

In [15]:
# load zipcode geojson
with open('data/cb_2014_us_zcta510_500k.json', 'r') as f:
    zip_js = json.load(f)
    
zipcode_polygons = {}
for zip_feature in zip_js['features']:
    zipcode = zip_feature['properties']['GEOID10']
    polygon = shape(zip_feature['geometry'])
    zipcode_polygons[zipcode] = polygon

In [16]:
zipcode_polygons["11542"]


Out[16]:

In [17]:
# load county geojson
with open('data/cb_2013_us_county_500k.json', 'r') as f:
    county_js = json.load(f)
    
county_polygons = {}
for county_feature in county_js['features']:
    county = county_feature['properties']['GEOID']
    polygon = shape(county_feature['geometry'])
    county_polygons[county] = polygon

In [18]:
county_polygons["16079"]


Out[18]:

Load climate zone metadata


In [19]:
# load county climate zones:
climate_zones = pd.read_csv('data/climate_zones.csv',
        dtype={"State FIPS": str, "County FIPS": str},
        usecols=["State FIPS", "County FIPS", "IECC Climate Zone", "IECC Moisture Regime", "BA Climate Zone"])

In [20]:
climate_zones.head()


Out[20]:
State FIPS County FIPS IECC Climate Zone IECC Moisture Regime BA Climate Zone
0 02 013 7 NaN Very Cold
1 02 016 7 NaN Very Cold
2 02 020 7 NaN Very Cold
3 02 050 8 NaN Subarctic
4 02 060 7 NaN Very Cold

In [21]:
# gather a list of counties (not including california)
counties_dict = {}
for i, row in climate_zones.iterrows():
    # if not in california
    if row["State FIPS"] != "06":
        county_id = row["State FIPS"] + row["County FIPS"]
        county_polygon = county_polygons.get(county_id)
        if county_polygon is not None:
            counties_dict[county_id] = {
                "climate_zone": "{}|{}|{}".format(
                    row["IECC Climate Zone"],
                    row["IECC Moisture Regime"] if not pd.isnull( row["IECC Moisture Regime"]) else "NA",
                    row["BA Climate Zone"]),
                "polygon": county_polygon,
                "centroid": county_polygon.centroid
            }
        else:
            print("Could not find county {}, skipping.".format(county_id))


Could not find county 74300, skipping.

In [22]:
counties_dict["16079"]


Out[22]:
{'centroid': <shapely.geometry.point.Point at 0x10829a090>,
 'climate_zone': '5|B|Cold',
 'polygon': <shapely.geometry.polygon.Polygon at 0x11f079a10>}

In [23]:
# treat CA climate zones specially:
with open('data/CA_Building_Standards_Climate_Zones.json', 'r') as f:
    ca_js = json.load(f)

california_climate_zone_polygons = {}
california_climate_zone_centroids = {}
for ca_feature in ca_js['features']:
    zone = "CA_{:02d}".format(int(ca_feature['properties']['Zone']))
    polygon = shape(ca_feature['geometry'])
    california_climate_zone_polygons[zone] = polygon
    california_climate_zone_centroids[zone] = polygon.centroid

In [24]:
california_climate_zone_polygons['CA_02']


Out[24]:

Make centroids

Note that centroids are not always within the polygon,or on land, or within the county. They are used here as a rough "good enough" approximation of location.


In [25]:
n_zipcodes = len(zipcode_polygons)
zipcode_centroid_points = {}
zipcode_centroid_lat_lngs = {}

for i, (zipcode, zipcode_poly) in enumerate(zipcode_polygons.items()):
    zipcode_centroid = zipcode_poly.centroid
    zipcode_centroid_points[zipcode] = zipcode_centroid
    zipcode_centroid_lat_lngs[zipcode] = (zipcode_centroid.coords[0][1], zipcode_centroid.coords[0][0])

In [26]:
[(z, p.x, p.y) for z, p in zipcode_centroid_points.items()][:10]


Out[26]:
[(u'11542', -73.62889780791781, 40.870932001722714),
 (u'54548', -89.82102639934573, 45.859817510232425),
 (u'11547', -73.6445076194238, 40.83072033793459),
 (u'11545', -73.58899991887439, 40.827048361579266),
 (u'11548', -73.60680753957794, 40.81464188131546),
 (u'11549', -73.60280114007139, 40.71729775313752),
 (u'22749', -78.18177930211002, 38.620227836872665),
 (u'31738', -83.87306418376528, 31.00955756861446),
 (u'19390', -75.84007888524287, 39.832897242595976),
 (u'82443', -108.47145713272253, 43.72508975529363)]

In [27]:
lats, lons = zip(*list(zipcode_centroid_lat_lngs.values()))
title = "ZCTAs".format(len(lats))
map_points(lats, lons, title)


Map zipcodes to climate zones

  1. If ZIP code centroid falls within non-CA county, use that county's climate zone
  2. If ZIP code falls in CA climate zone poly use that
  3. Otherwise fail with None
  4. For failed, find nearest zipcode that did match, and use that climate zone.
  5. Update missing None zipcodes

In [28]:
zipcode_to_climate_zone = {}

# some optimizations for the loop.
counties_dict_items = list(counties_dict.items())
california_climate_zone_polygons_items = list(california_climate_zone_polygons.items())

for i, zipcode in enumerate(zipcode_polygons.keys()):
    print '\r{} of {}'.format(i + 1, n_zipcodes),
    
    zipcode_centroid = zipcode_centroid_points[zipcode]
    
    # check to see if non-CA counties contain centroid
    for county, county_dict in counties_dict_items:
        county_poly = county_dict["polygon"]
        if county_poly.contains(zipcode_centroid):
            zipcode_to_climate_zone[zipcode] = county_dict["climate_zone"]
            break
    else:  # for--else - executes if no break encountered in loop
        # check to see if a CA climate zone contains centroid
        for ca_cz, ca_cz_poly in california_climate_zone_polygons_items:
            if ca_cz_poly.contains(zipcode_centroid):
                zipcode_to_climate_zone[zipcode] = ca_cz
                break
        else:  # for--else - executes if no break encountered in loop
            zipcode_to_climate_zone[zipcode] = None


33144 of 33144

In [29]:
np.unique([cz for z, cz in zipcode_to_climate_zone.items()])


Out[29]:
array([None, '1|A|Hot-Humid', '2|A|Hot-Humid', '2|A|Mixed-Humid',
       '2|B|Hot-Dry', '2|B|Hot-Humid', '3|A|Hot-Humid', '3|A|Mixed-Humid',
       '3|B|Hot-Dry', '4|A|Mixed-Humid', '4|B|Cold', '4|B|Mixed-Dry',
       '4|C|Cold', '4|C|Marine', '5|A|Cold', '5|B|Cold', '6|A|Cold',
       '6|B|Cold', '7|NA|Very Cold', '8|NA|Subarctic', 'CA_01', 'CA_02',
       'CA_03', 'CA_04', 'CA_05', 'CA_06', 'CA_07', 'CA_08', 'CA_09',
       'CA_10', 'CA_11', 'CA_12', 'CA_13', 'CA_14', 'CA_15', 'CA_16'], dtype=object)

In [30]:
list(zipcode_to_climate_zone.items())[:2]


Out[30]:
[(u'11542', '4|A|Mixed-Humid'), (u'54548', '7|NA|Very Cold')]

In [31]:
# non-matches ?
len([z for z, cz in zipcode_to_climate_zone.items() if cz is None])


Out[31]:
57

In [32]:
no_climate_zone_zipcode_lat_lngs = [zipcode_centroid_lat_lngs[z] for z, cz in zipcode_to_climate_zone.items() if cz is None]
lats, lons = zip(*no_climate_zone_zipcode_lat_lngs)
title = "ZIP codes not inside climate zone (n={})".format(len(lats))
map_points(lats, lons, title)



In [33]:
geod = pyproj.Geod(ellps='WGS84')
cutoff = 25  # km
near_zipcode_zipcodes = {}
near_zipcode_lat_lngs = {}

zipcode_centroid_lat_lngs_items = [(z, ll) for z, ll in zipcode_centroid_lat_lngs.items() if zipcode_to_climate_zone[z] is not None]
lats2, lngs2 = zip(*[ll for z, ll in zipcode_centroid_lat_lngs_items])
lats2, lngs2 = np.array(lats2), np.array(lngs2)
        
for z, cz in zipcode_to_climate_zone.items():
    if cz is None:
        
        lat1, lng1 = zipcode_centroid_lat_lngs[z]
        lats1, lngs1 = np.tile(lat1, lats2.shape), np.tile(lng1, lngs2.shape)
        
        dists = geod.inv(lngs1, lats1, lngs2, lats2)[2] / 1000. # km
        min_dist_i = np.argmin(dists)
        min_dist = dists[min_dist_i]
        min_zipcode = zipcode_centroid_lat_lngs_items[min_dist_i][0]
        near_zipcode_zipcodes[z] = min_zipcode
        near_zipcode_lat_lngs[z] = lat1, lng1

In [34]:
zipcode_climate_zone_update = {z1: zipcode_to_climate_zone[z2] for z1, z2 in near_zipcode_zipcodes.items()}

In [35]:
list(zipcode_climate_zone_update.items())[:2]


Out[35]:
[(u'20674', '4|A|Mixed-Humid'), (u'49670', '6|A|Cold')]

In [36]:
len(zipcode_climate_zone_update)


Out[36]:
57

In [37]:
# patch zipcode -> climate zone matching
zipcode_to_climate_zone.update(zipcode_climate_zone_update)

In [38]:
# Make sure no zipcodes map to None
np.unique([cz for z, cz in zipcode_to_climate_zone.items()])


Out[38]:
array(['1|A|Hot-Humid', '2|A|Hot-Humid', '2|A|Mixed-Humid', '2|B|Hot-Dry',
       '2|B|Hot-Humid', '3|A|Hot-Humid', '3|A|Mixed-Humid', '3|B|Hot-Dry',
       '4|A|Mixed-Humid', '4|B|Cold', '4|B|Mixed-Dry', '4|C|Cold',
       '4|C|Marine', '5|A|Cold', '5|B|Cold', '6|A|Cold', '6|B|Cold',
       '7|NA|Very Cold', '8|NA|Subarctic', 'CA_01', 'CA_02', 'CA_03',
       'CA_04', 'CA_05', 'CA_06', 'CA_07', 'CA_08', 'CA_09', 'CA_10',
       'CA_11', 'CA_12', 'CA_13', 'CA_14', 'CA_15', 'CA_16'], 
      dtype='|S15')

Map weather stations to climate zones - same method as for zipcodes


In [39]:
# map weather stations to climate zones

station_to_climate_zone = {}

n_stations = len(accepted_usaf_station_points)

for i, (station, station_point) in enumerate(accepted_usaf_station_points.items()):
    print '\r{} of {}'.format(i+1, n_stations),
    
    # Is the station in a non-CA county?
    for county, county_dict in counties_dict_items:
        county_poly = county_dict["polygon"]
        if county_poly.contains(station_point):
            station_to_climate_zone[station] = county_dict["climate_zone"]
            break
    else:  # if no break
        
        # is the station in a california climate zone?
        for ca_cz, ca_cz_poly in california_climate_zone_polygons_items:
            if ca_cz_poly.contains(station_point):
                station_to_climate_zone[station] = ca_cz
                break
        else:  # if no break
            station_to_climate_zone[station] = None


2069 of 2069

In [40]:
list(station_to_climate_zone.items())[:2]


Out[40]:
[('725512', '5|A|Cold'), ('726546', '6|A|Cold')]

In [41]:
# non-matches ?
len([s for s, cz in station_to_climate_zone.items() if cz is None])


Out[41]:
123

In [42]:
no_climate_zone_station_lat_lngs = [usaf_station_lat_lngs[s] for s, cz in station_to_climate_zone.items() if cz is None]
lats, lons = zip(*no_climate_zone_station_lat_lngs)
title = "US USAF weather stations not inside climate zone (n={})".format(len(lats))
map_points(lats, lons, title)


Clean up stations that didn't fall within a climate zone

  1. Find nearest zipcode to station
  2. If station is within 25km of nearest zipcode, keep it and use it to find the station's climate zone, otherwise, discard. These discards are usually oceanic.

In [43]:
geod = pyproj.Geod(ellps='WGS84')
cutoff = 25  # km
near_station_zipcodes = {}
far_station_zipcodes = {}
near_station_lat_lngs = {}
far_station_lat_lngs = {}
min_dists = []
zipcode_centroid_lat_lngs_items = zipcode_centroid_lat_lngs.items()
lats2, lngs2 = zip(*[p for z, p in zipcode_centroid_lat_lngs_items])
lats2, lngs2 = np.array(lats2), np.array(lngs2)

for s, cz in station_to_climate_zone.items():
    if cz is None:
        
        lat1, lng1 = accepted_usaf_station_lat_lngs[s]
        lats1, lngs1 = np.tile(lat1, lats2.shape), np.tile(lng1, lngs2.shape)
        dists = geod.inv(lngs1, lats1, lngs2, lats2)[2] / 1000. # km
        min_dist_i = np.argmin(dists)
        min_dist = dists[min_dist_i]
        min_dists.append(min_dist)
        min_zipcode = zipcode_centroid_lat_lngs_items[min_dist_i][0]
        
        if min_dist < cutoff:
            near_station_zipcodes[s] = min_zipcode
            near_station_lat_lngs[s] = lat1, lng1
        else:
            far_station_zipcodes[s] = min_zipcode
            far_station_lat_lngs[s] = lat1, lng1
plt.hist(min_dists, bins=30)
plt.title("Histogram of distances to nearest zipcodes: cutoff={}km, n={}/{}".format(cutoff, len(near_station_zipcodes), len(min_dists)))
plt.show()



In [44]:
lats, lons = zip(*far_station_lat_lngs.values())
title = "US USAF weather stations >25km from nearest ZIP code (n={}) -> DISCARD".format(len(lats))
map_points(lats, lons, title)



In [45]:
# correct station_to_climate_zone mapping
accepted_near_usaf_station_lat_lngs = {s: ll for s, ll in accepted_usaf_station_lat_lngs.items() if s not in far_station_lat_lngs}
rejected_far_usaf_station_lat_lngs = {s: ll for s, ll in rejected_usaf_station_lat_lngs.items()}
rejected_far_usaf_station_lat_lngs.update(far_station_lat_lngs)

In [46]:
filtered_station_to_climate_zone = {}
for s, cz in station_to_climate_zone.items():
    if cz is None:
        zipcode = near_station_zipcodes.get(s)
        if zipcode is not None:
            filtered_station_to_climate_zone[s] = zipcode_to_climate_zone[zipcode]
    else:
        filtered_station_to_climate_zone[s] = cz

In [47]:
len(filtered_station_to_climate_zone)


Out[47]:
2041

In [48]:
len([s for s, cz in filtered_station_to_climate_zone.items() if cz is None])


Out[48]:
0

Match to TMY3 data.

Load TMY3 station metadata by scraping the NREL NSRDB site.


In [49]:
tmy3_index_page_url = (
    "http://rredc.nrel.gov/solar/old_data/"
    "nsrdb/1991-2005/tmy3/by_USAFN.html"
)

soup = BeautifulSoup(requests.get(tmy3_index_page_url).text)
tmy3_station_elements = soup.select('td .hide')

tmy3_station_data = {}
for station_el in tmy3_station_elements:
    station_name_el = station_el.findNext('td').findNext('td')
    station_class_el = station_name_el.findNext('td')
    tmy3_station_data[station_el.text.strip()] = {
        "name": (
            "".join(station_name_el.text.split(",")[:-1])
            .replace("\n", "").replace("\t", "").strip()
        ),
        "state": station_name_el.text.split(",")[-1].strip(),
        "class": station_class_el.text.split()[1].strip(),
    }

In [50]:
list(tmy3_station_data.items())[:2]


Out[50]:
[(u'726917',
  {'class': u'II', 'name': u'North Bend Muni Airport', 'state': u'OR'}),
 (u'745700',
  {'class': u'II', 'name': u'Dayton Wright Patterson AFB', 'state': u'OH'})]

In [51]:
accepted_near_tmy3_station_lat_lngs = {}
rejected_far_tmy3_station_lat_lngs = {}
unknown_tmy3_stations = set()
for station, data in tmy3_station_data.items():
    if station in usaf_station_lat_lngs:
        if station in accepted_near_usaf_station_lat_lngs:
            accepted_near_tmy3_station_lat_lngs[station] = usaf_station_lat_lngs[station]
        else:
            rejected_far_tmy3_station_lat_lngs[station] = usaf_station_lat_lngs[station]
    else:
        unknown_tmy3_stations.add(station)

In [52]:
# Take only TMY3 stations that match to USAF stations - 0% unknown
n_unknown = len(unknown_tmy3_stations)
n_rejected = len(rejected_far_tmy3_station_lat_lngs)
n_accepted = len(accepted_near_tmy3_station_lat_lngs)
n_total = len(tmy3_station_data)
print "Unknown: {} ({:.1f}%)".format(n_unknown, float(n_unknown) / n_total * 100.)
print "Accepted: {} ({:.1f}%)".format(n_accepted, float(n_accepted) / n_total * 100.)
print "Rejected: {} ({:.1f}%)".format(n_rejected, float(n_rejected) / n_total * 100.)
print "Total: {}".format(n_total)


Unknown: 0 (0.0%)
Accepted: 912 (89.4%)
Rejected: 108 (10.6%)
Total: 1020

In [53]:
lats, lons = zip(*accepted_near_tmy3_station_lat_lngs.values())
title = "Accepted TMY3 weather stations".format(len(lats))
map_points(lats, lons, title)



In [54]:
lats, lons = zip(*rejected_far_tmy3_station_lat_lngs.values())
title = "Rejected TMY3 weather stations".format(len(lats))
map_points(lats, lons, title)



In [55]:
# climate zone to station list mapping
climate_zone_to_usaf_stations = defaultdict(list)
for station, climate_zone in filtered_station_to_climate_zone.items():
    assert climate_zone is not None
    climate_zone_to_usaf_stations[climate_zone].append(station)
climate_zone_to_usaf_stations = dict(climate_zone_to_usaf_stations)
print list(climate_zone_to_usaf_stations.items())[:2]


[('3|B|Hot-Dry', ['722122', '722630', '722637', '720281', '722700', '746141', '723660', '747335', '723865', '723860', '722650', '722656', '722654', '723700', '722096', '690190', '747320', '720659', '722640', '722648', '722676', '722670', '722695', '722693', '720741', '722324', '722660', '724846', '722192', '722687', '722680', '722537', '720771', '723788', '722169', '720151', '722747', '747400', '722107', '722616', '722720', '722725', '722728', '722730', '720271']), ('2|B|Hot-Dry', ['720644', '699604', '722789', '722784', '722785', '722786', '722780', '722740', '722745', '722748'])]

In [56]:
for cz, stations in sorted(climate_zone_to_usaf_stations.items(), key=lambda x: x[0]):
    lats, lons = zip(*[usaf_station_lat_lngs[s] for s in stations])
    title = "Climate Zone {} USAF stations (n={})".format(cz, len(lats))
    map_points(lats, lons, title)



In [57]:
# climate zone to station list mapping
climate_zone_to_tmy3_stations = defaultdict(list)
for station, climate_zone in filtered_station_to_climate_zone.items():
    if station in accepted_near_tmy3_station_lat_lngs:
        assert climate_zone is not None
        climate_zone_to_tmy3_stations[climate_zone].append(station)
climate_zone_to_tmy3_stations = dict(climate_zone_to_tmy3_stations)
print list(climate_zone_to_tmy3_stations.items())[:2]


[('3|B|Hot-Dry', ['722630', '722700', '723865', '723860', '722650', '722656', '723700', '690190', '747320', '722640', '722670', '722695', '722660', '722687', '722680', '722747', '722725']), ('2|B|Hot-Dry', ['699604', '722789', '722784', '722785', '722780', '722740', '722745', '722748'])]

In [58]:
empty_climate_zones = [cz for cz in climate_zone_to_usaf_stations.keys() if climate_zone_to_tmy3_stations.get(cz) is None]
empty_climate_zones


Out[58]:
['4|C|Cold', '4|B|Cold']

Patch up empty climate zones

  1. For each USAF station in the climate zone, find nearest TMY3 station
  2. Take single best TMY3 station (in this case, all agree, so just take first)
  3. Update TMY3 Climate zone mapping.

In [59]:
geod = pyproj.Geod(ellps='WGS84')
cutoff = 25  # km
climate_zone_tmy3_station_updates = {}
min_dists = []
accepted_near_tmy3_station_lat_lngs_items = accepted_near_tmy3_station_lat_lngs.items()
lats2, lngs2 = zip(*[p for z, p in accepted_near_tmy3_station_lat_lngs_items])
lats2, lngs2 = np.array(lats2), np.array(lngs2)
for empty_climate_zone in empty_climate_zones:
    
    min_dists = []
    min_stations = []
    for usaf_station in climate_zone_to_usaf_stations[empty_climate_zone]:
        lat1, lng2 = usaf_station_lat_lngs[usaf_station]
        lats1, lngs1 = np.tile(lat1, lats2.shape), np.tile(lng1, lngs2.shape)
        dists = geod.inv(lngs1, lats1, lngs2, lats2)[2] / 1000. # km
        min_dist_i = np.argmin(dists)
        min_dist = dists[min_dist_i]
        min_dists.append(min_dist)
        min_station = accepted_near_tmy3_station_lat_lngs_items[min_dist_i][0]
        min_stations.append(min_station)
    if not all(min_stations[0] == s for s in min_stations):
        print "Not all close to single TMY3 station"
    climate_zone_tmy3_station_updates[empty_climate_zone] = [min_stations[0]]

In [60]:
climate_zone_tmy3_station_updates


Out[60]:
{'4|B|Cold': [u'911650'], '4|C|Cold': [u'704890']}

In [61]:
# remove if exist somewhere else:
for cz, stations in climate_zone_to_tmy3_stations.items():
    for cz_update, station_update in climate_zone_tmy3_station_updates.items():
        if station_update[0] in stations:
            stations.remove(station_update[0])

In [62]:
climate_zone_to_tmy3_stations.update(climate_zone_tmy3_station_updates)

In [63]:
empty_climate_zones = [cz for cz in climate_zone_to_usaf_stations.keys() if climate_zone_to_tmy3_stations.get(cz) is None]
empty_climate_zones


Out[63]:
[]

In [64]:
for cz, stations in sorted(climate_zone_to_tmy3_stations.items(), key=lambda x: x[0]):
    lats, lons = zip(*[usaf_station_lat_lngs[s] for s in stations])
    title = "Climate Zone {} TMY3 stations (n={})".format(cz, len(lats))
    map_points(lats, lons, title)


Map stations to zipcodes by looking, if possible, for the closest weather station within the same climate zone. If not within a climate zone, just pick the station which is closest overall.


In [65]:
# zipcode to usaf station mapping
zipcode_to_usaf_station = {}
all_stations = accepted_near_usaf_station_lat_lngs.keys()
for i, (zipcode, (lat1, lng1)) in enumerate(zipcode_centroid_lat_lngs.items()):
    print("\r{} of {}".format(i + 1, n_zipcodes)),
    
    # get set of stations to compare for distance.
    climate_zone = zipcode_to_climate_zone[zipcode]
    assert climate_zone is not None
    stations = climate_zone_to_usaf_stations[climate_zone]
    
    # find minimum distance
    lats2, lngs2 = zip(*[usaf_station_lat_lngs[s] for s in stations])
    lats2, lngs2 = np.array(lats2), np.array(lngs2)
    lats1, lngs1 = np.tile(lat1, lats2.shape), np.tile(lng1, lngs2.shape)
    dists = geod.inv(lngs1, lats1, lngs2, lats2)[2]  # meters
    min_station = stations[np.argmin(dists)]
    
    zipcode_to_usaf_station[zipcode] = min_station


33144 of 33144

In [66]:
# zipcode to tmy3 station mapping
zipcode_to_tmy3_station = {}
all_stations = accepted_near_tmy3_station_lat_lngs.keys()
for i, (zipcode, (lat1, lng1)) in enumerate(zipcode_centroid_lat_lngs.items()):
    print("\r{} of {}".format(i + 1, n_zipcodes)),
    
    # get set of stations to compare for distance.
    climate_zone = zipcode_to_climate_zone[zipcode]
    assert climate_zone is not None
    stations = climate_zone_to_tmy3_stations[climate_zone]
    
    # find minimum distance
    lats2, lngs2 = zip(*[usaf_station_lat_lngs[s] for s in stations])
    lats2, lngs2 = np.array(lats2), np.array(lngs2)
    lats1, lngs1 = np.tile(lat1, lats2.shape), np.tile(lng1, lngs2.shape)
    dists = geod.inv(lngs1, lats1, lngs2, lats2)[2]  # meters
    min_station = stations[np.argmin(dists)]
    
    zipcode_to_tmy3_station[zipcode] = min_station


33144 of 33144

Make some other JSON products that may be useful.


In [67]:
# climate zone to zipcode list mapping
climate_zone_to_zipcodes = defaultdict(list)
for zipcode, climate_zone in zipcode_to_climate_zone.items():
    if climate_zone is not None:
        climate_zone_to_zipcodes[climate_zone].append(zipcode)
climate_zone_to_zipcodes = dict(climate_zone_to_zipcodes)

In [68]:
# usaf station to zipcode list mapping
usaf_station_to_zipcodes = defaultdict(list)
for zipcode, station in zipcode_to_usaf_station.items():
    if station is not None:
        usaf_station_to_zipcodes[station].append(zipcode)
usaf_station_to_zipcodes = dict(usaf_station_to_zipcodes)

In [69]:
# tmy3 station to zipcode list mapping
tmy3_station_to_zipcodes = defaultdict(list)
for zipcode, station in zipcode_to_tmy3_station.items():
    if station is not None:
        tmy3_station_to_zipcodes[station].append(zipcode)
tmy3_station_to_zipcodes = dict(tmy3_station_to_zipcodes)

In [70]:
# usaf station to climate zone mapping
usaf_station_to_climate_zone = {}
for climate_zone, stations in climate_zone_to_usaf_stations.items():
    for station in stations:
        usaf_station_to_climate_zone[station] = climate_zone

In [71]:
# tmy3 station to climate zone mapping
tmy3_station_to_climate_zone = {}
for climate_zone, stations in climate_zone_to_tmy3_stations.items():
    for station in stations:
        tmy3_station_to_climate_zone[station] = climate_zone

In [72]:
# write all outputs:
!mkdir -p outputs


### USAF station -> X
# USAF station -> lat lng
with open('outputs/usaf_station_lat_lngs.json', 'w') as f:
    json.dump(accepted_near_usaf_station_lat_lngs, f)

# USAF station -> zipcodes
with open('outputs/usaf_station_zipcodes.json', 'w') as f:
    json.dump(usaf_station_to_zipcodes, f)

# USAF station -> climate_zone
with open('outputs/usaf_station_climate_zone.json', 'w') as f:
    json.dump(usaf_station_to_climate_zone, f)

# supported USAF stations
with open('outputs/supported_usaf_stations.json', 'w') as f:
    json.dump(sorted(accepted_near_usaf_station_lat_lngs.keys()), f)
    
### TMY3 station -> X
# TMY3 station -> lat lng
with open('outputs/tmy3_station_lat_lngs.json', 'w') as f:
    json.dump(accepted_near_tmy3_station_lat_lngs, f)

# TMY3 station -> zipcodes
with open('outputs/tmy3_station_zipcodes.json', 'w') as f:
    json.dump(tmy3_station_to_zipcodes, f)

# TMY3 station -> climate_zone
with open('outputs/tmy3_station_climate_zone.json', 'w') as f:
    json.dump(tmy3_station_to_climate_zone, f)


# supported TMY3 stations
with open('outputs/supported_tmy3_stations.json', 'w') as f:
    json.dump(sorted(accepted_near_tmy3_station_lat_lngs.keys()), f)


### Zipcode -> X
# zipcode -> lat long
with open('outputs/zipcode_centroid_lat_lngs.json', 'w') as f:
    json.dump(zipcode_centroid_lat_lngs, f)
    
# zipcode -> climate_zone
with open('outputs/zipcode_climate_zone.json', 'w') as f:
    json.dump(zipcode_to_climate_zone, f)

# zipcode -> USAF station
with open('outputs/zipcode_usaf_station.json', 'w') as f:
    json.dump(zipcode_to_usaf_station, f)
    
# zipcode -> TMY3 station
with open('outputs/zipcode_tmy3_station.json', 'w') as f:
    json.dump(zipcode_to_tmy3_station, f)

# supported ZIP codes (ZCTA only)
with open('outputs/supported_zipcodes.json', 'w') as f:
    json.dump(sorted(zipcode_centroid_lat_lngs.keys()), f)

    
### climate zone -> X
# climate_zone -> USAF stations
with open('outputs/climate_zone_usaf_stations.json', 'w') as f:
    json.dump(climate_zone_to_usaf_stations, f)
    
# climate_zone -> stations
with open('outputs/climate_zone_tmy3_stations.json', 'w') as f:
    json.dump(climate_zone_to_tmy3_stations, f)

# climate_zone -> zipcodes
with open('outputs/climate_zone_zipcodes.json', 'w') as f:
    json.dump(climate_zone_to_zipcodes, f)

# supported Climate Zones
with open('outputs/supported_climate_zones.json', 'w') as f:
    json.dump(sorted(climate_zone_to_zipcodes.keys()), f)

# OTHER
with open('outputs/usaf_station_inventory_data.json', 'w') as f:
    json.dump({s: list(d) for s, d in usaf_station_inventory_data.items()}, f)

with open('outputs/tmy3_station_metadata.json', 'w') as f:
    json.dump(tmy3_station_data, f)

with open('outputs/GSOD-ISD_station_index.json', 'w') as f:
    json.dump(isd_index, f)