Figure 3: Cluster-level consumptions

This notebook generates individual panels of Figure 3 in "Combining satellite imagery and machine learning to predict poverty".


In [13]:
from fig_utils import *
import matplotlib.pyplot as plt
import time

%matplotlib inline

Predicting consumption expeditures

The parameters needed to produce the plots are as follows:

  • country: Name of country being evaluated as a lower-case string
  • country_path: Path of directory containing LSMS data corresponding to the specified country
  • dimension: Number of dimensions to reduce image features to using PCA. Defaults to None, which represents no dimensionality reduction.
  • k: Number of cross validation folds
  • k_inner: Number of inner cross validation folds for selection of regularization parameter
  • points: Number of regularization parameters to try
  • alpha_low: Log of smallest regularization parameter to try
  • alpha_high: Log of largest regularization parameter to try
  • margin: Adjusts margins of output plot

The data directory should contain the following 5 files for each country:

  • conv_features.npy: (n, 4096) array containing image features corresponding to n clusters
  • consumptions.npy: (n,) vector containing average cluster consumption expenditures
  • nightlights.npy: (n,) vector containing the average nightlights value for each cluster
  • households.npy: (n,) vector containing the number of households for each cluster
  • image_counts.npy: (n,) vector containing the number of images available for each cluster

Exact results may differ slightly with each run due to randomly splitting data into training and test sets.

Panel A


In [14]:
# Plot parameters
country = 'nigeria'
country_path = '../data/LSMS/nigeria/'
dimension = None
k = 5
k_inner = 5
points = 10
alpha_low = 1
alpha_high = 5
margin = 0.25

# Plot single panel
t0 = time.time()
X, y, y_hat, r_squareds_test = predict_consumption(country, country_path,
                                dimension, k, k_inner, points, alpha_low,
                                alpha_high, margin)
t1 = time.time()
print 'Finished in {} seconds'.format(t1-t0)


Finished in 11.3024020195 seconds

Panel B


In [15]:
# Plot parameters
country = 'tanzania'
country_path = '../data/LSMS/tanzania/'
dimension = None
k = 5
k_inner = 5
points = 10
alpha_low = 1
alpha_high = 5
margin = 0.25

# Plot single panel
t0 = time.time()
X, y, y_hat, r_squareds_test = predict_consumption(country, country_path,
                                dimension, k, k_inner, points, alpha_low,
                                alpha_high, margin)
t1 = time.time()
print 'Finished in {} seconds'.format(t1-t0)


Finished in 13.0215759277 seconds

Panel C


In [16]:
# Plot parameters
country = 'uganda'
country_path = '../data/LSMS/uganda/'
dimension = None
k = 5
k_inner = 5
points = 10
alpha_low = 1
alpha_high = 5
margin = 0.25

# Plot single panel
t0 = time.time()
X, y, y_hat, r_squareds_test = predict_consumption(country, country_path,
                                dimension, k, k_inner, points, alpha_low,
                                alpha_high, margin)
t1 = time.time()
print 'Finished in {} seconds'.format(t1-t0)


Finished in 9.72697901726 seconds

Panel D


In [17]:
# Plot parameters
country = 'malawi'
country_path = '../data/LSMS/malawi/'
dimension = None
k = 5
k_inner = 5
points = 10
alpha_low = 1
alpha_high = 5
margin = 0.25

# Plot single panel
t0 = time.time()
X, y, y_hat, r_squareds_test = predict_consumption(country, country_path,
                                dimension, k, k_inner, points, alpha_low,
                                alpha_high, margin)
t1 = time.time()
print 'Finished in {} seconds'.format(t1-t0)


Finished in 6.32752799988 seconds

In [ ]: