I will explore whether a network-theory-driven approach, shown to improve the efficiency of an agricultural extension program, is sensitive to the models and parameters originally used.
Social networks have been shown to be important vehicles for the transmission of new agricultural methods, or 'technologies' (Bandiera and Rasul 2006, Conley and Udry 2010). These diffusion dynamics, and the time-varying behavior of the agents involved, are naturally captured through network modeling.
My project is based on a recent paper that used network modeling in conjunction with a large-scale field experiment (Beaman et al. 2014). I wish to test the robustness of their model's findings, so I will employ a similar network modeling method.
Beaman and co-authors aimed to improve the rollout of an agricultural extension program by using predictions from network theory to select 'seed farmers' optimally. 'Seed farmers' are the farmers in a village whom the extension program trains directly. Because this training is costly, it is most efficient to pick seed farmers whose adoption of the technology will lead to the greatest spread of the technology throughout the village.
Beaman and coauthors first elicit the social networks of a set of rural villages. Then, given that the extension program trains only two farmers per village, they take every possible pair of nodes in a village network and simulate an information diffusion process for 4 periods. They measure information diffusion at the end of each simulation, and the pair of nodes that gives the greatest diffusion is the optimal seeding pair.
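The exhaustive pair search described above can be sketched as follows. This is a minimal stdlib-only illustration, not Beaman et al.'s implementation: the village is a plain adjacency dict, and the diffusion rule (a household adopts as soon as any neighbor has adopted) is a hypothetical deterministic stand-in chosen so that the example has a single correct answer.

```python
import itertools

def simulate_spread(adj, seeds, periods):
    """Toy deterministic diffusion (hypothetical rule): each period, every
    non-adopter with at least one adopted neighbor adopts.
    Returns the share of households adopted at the end."""
    adopted = set(seeds)
    for _ in range(periods):
        # Decide from the start-of-period state, then update everyone at once.
        new = {v for v in adj
               if v not in adopted and any(u in adopted for u in adj[v])}
        adopted |= new
    return len(adopted) / len(adj)

def best_seed_pair(adj, periods=4):
    """Simulate every pair of nodes and return (pair, score) with the largest spread."""
    best_pair, best_score = None, -1.0
    for pair in itertools.combinations(sorted(adj), 2):
        score = simulate_spread(adj, pair, periods)
        if score > best_score:  # strict '>' keeps the first pair found on ties
            best_pair, best_score = pair, score
    return best_pair, best_score

# Example: a path of 10 households, 0-1-2-...-9.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(best_seed_pair(path, periods=2))  # ((2, 7), 1.0)
```

With two periods each seed reaches at most two households on either side, so only the pair (2, 7) covers the whole path. Note that the number of pairs grows quadratically with village size, which is what makes the full search expensive once it is combined with repeated stochastic runs.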
Their findings are then used in a field experiment in which a randomly chosen half of the villages are seeded according to the simulated optimal seeds, while the other half are seeded according to the extension program's default procedure, usually based on a field officer's own knowledge of the village and its influential farmers. They find evidence that network-theory-informed seeding leads to greater technology adoption than the default seeding procedure.
I wish to recreate and expand upon their simulations in the following ways:
The original paper studies rural villages in Malawi. I do not have access to their network data, but I have a dataset of social graphs from 74 villages in South India. Though there may be differences in network structure between villages in these two locations, I will assume they are reasonably comparable.
First, I will recreate the results of Beaman et al. by selecting all combinations of node pairs in a subset of 25 villages. For each pair, I will run an information diffusion simulation for {3, 4, 5, 6} steps, and I will also sweep an alpha parameter through the values {1, 2, 3}. Each household has an adoption threshold, T, which determines whether it adopts the new technology: if X of its connections have adopted the technology and X ≥ T, the household adopts the technology in the next period. Each household independently draws its threshold from a normal distribution N(alpha, 0.5), with standard deviation 0.5, truncated to positive values, so sweeping alpha shifts the whole distribution of household thresholds up or down.
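The threshold draw and the X ≥ T rule can be sketched as below. This is a stdlib-only sketch that uses simple rejection sampling with Python's `random` module as a stand-in for `scipy.stats.truncnorm`; the function names `draw_threshold` and `will_adopt` are hypothetical.

```python
import random

def draw_threshold(alpha, sd=0.5, rng=random):
    """Draw T ~ N(alpha, sd) truncated to T > 0 by rejection sampling:
    redraw until the value is positive."""
    while True:
        t = rng.gauss(alpha, sd)
        if t > 0:
            return t

def will_adopt(num_adopted_neighbors, threshold):
    """Adoption rule: a household adopts next period if X >= T."""
    return num_adopted_neighbors >= threshold

rng = random.Random(0)  # fixed seed so the draws are reproducible
thresholds = [draw_threshold(2, rng=rng) for _ in range(10000)]
print(min(thresholds) > 0, round(sum(thresholds) / len(thresholds), 1))
```

Because thresholds are continuous while neighbor counts are integers, a household with T = 1.3 needs two adopted connections: `will_adopt(1, 1.3)` is `False` and `will_adopt(2, 1.3)` is `True`. Raising alpha therefore raises the number of adopted neighbors a typical household needs.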
To mitigate stochasticity, I will repeat each simulation 2000 times and take the average information diffusion, measured as the percentage of households that have adopted at the last step. The pair of nodes that gives the greatest average diffusion is my set of theory-driven seed farmers, equivalent to those found in Beaman et al. I will examine whether the choice of these optimal seed farmers depends on the number of steps and on the alpha parameter. Then I will run the same simulations using the extended information diffusion process described above, to see whether seed farmers selected this way differ from those selected by Beaman et al.'s process. For the midterm, I will concentrate on coding the re-creation of the Beaman et al. method.
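The averaging step amounts to a Monte Carlo estimate: run the stochastic simulation many times for one seed pair and average the final adoption share. The sketch below is a stdlib-only illustration under stated assumptions: villages are adjacency dicts, thresholds are redrawn on every repetition, and the helper names `run_once` and `mean_adoption` are hypothetical.

```python
import random

def draw_threshold(alpha, rng, sd=0.5):
    # N(alpha, sd) truncated to positive values, via rejection sampling.
    while True:
        t = rng.gauss(alpha, sd)
        if t > 0:
            return t

def run_once(adj, seeds, alpha, num_steps, rng):
    """One stochastic run; returns the share of households adopted at the last step."""
    threshold = {v: draw_threshold(alpha, rng) for v in adj}
    adopted = set(seeds)
    for _ in range(num_steps):
        # Synchronous update: count adopted neighbors from the start-of-step state.
        new = {v for v in adj if v not in adopted
               and sum(u in adopted for u in adj[v]) >= threshold[v]}
        adopted |= new
    return len(adopted) / len(adj)

def mean_adoption(adj, seeds, alpha, num_steps, num_samples=2000, seed=0):
    """Average final adoption share over num_samples independent runs."""
    rng = random.Random(seed)
    total = sum(run_once(adj, seeds, alpha, num_steps, rng)
                for _ in range(num_samples))
    return total / num_samples

# Example: a 4-household cycle seeded at opposite corners 0 and 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(mean_adoption(cycle, (0, 2), alpha=1, num_steps=3))
```

In the cycle example each unseeded household has exactly two adopted neighbors, so it adopts unless its threshold happens to exceed 2; with a very large alpha nobody beyond the seeds adopts and the average is exactly 0.5.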
I will model space with an undirected social network. Each node represents a rural household and each edge represents a social connection.
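Since each village is stored as an adjacency matrix in a CSV file, building the undirected network can be sketched with the standard library alone. This is an assumption-laden sketch: it treats any nonzero entry as a tie, counts a tie reported in either direction (i to j or j to i) as one undirected edge, ignores self-loops, and the function name `read_undirected_network` is hypothetical.

```python
import csv
import io

def read_undirected_network(f):
    """Read a square adjacency matrix from CSV text into an adjacency dict.
    A nonzero entry in either direction becomes one undirected edge."""
    rows = [[float(x) for x in row] for row in csv.reader(f) if row]
    n = len(rows)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(n):
            # Skip self-loops; symmetrize by checking both directions.
            if i != j and (rows[i][j] != 0 or rows[j][i] != 0):
                adj[i].add(j)
    return {i: sorted(nbrs) for i, nbrs in adj.items()}

sample = io.StringIO("0,1,0\n0,0,1\n0,0,0\n")
print(read_undirected_network(sample))  # {0: [1], 1: [0, 2], 2: [1]}
```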
Households are modeled simply and have only a few properties:
In each step, each unadopted household counts the number of its connections that have adopted the new technology. If this count meets or exceeds the household's adoption threshold, the household adopts the technology in the next period.
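One period of this rule can be sketched as a synchronous update: every unadopted household counts its adopted connections using the state at the start of the period, and all new adopters switch together, so adoption takes effect in the next period. The adjacency-dict representation and the name `adoption_step` are assumptions for illustration.

```python
def adoption_step(adj, adopted, thresholds):
    """Return the new set of adopters after one synchronous period.

    adj: dict mapping each household to a list of its connections
    adopted: set of households that have already adopted
    thresholds: dict mapping each household to its threshold T
    """
    will_adopt = set()
    for v in adj:
        if v in adopted:
            continue
        # Count connections that had adopted at the start of this period.
        count = sum(1 for u in adj[v] if u in adopted)
        if count >= thresholds[v]:
            will_adopt.add(v)
    # Apply all decisions together, after every household has decided.
    return adopted | will_adopt

# Path 0-1-2-3 seeded at 0, every threshold 1: one new adopter per period.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
state = {0}
for _ in range(2):
    state = adoption_step(adj, state, {v: 1 for v in adj})
print(sorted(state))  # [0, 1, 2]
```

If adopters were instead marked in place during the loop, household 2 could adopt in the same period as household 1, and the speed of diffusion would depend on the order in which households are visited.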
I will wrap my model in a function that loops through each village and, within each village, through every possible pair of nodes. For each pair, I will sweep through my parameters: the number of steps and alpha. I will repeat this under the alternate information diffusion process, and the wrapper will also determine and collect the optimal seeds.
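Determining and collecting the optimal seeds can be sketched as a reduction over the averaged results: for each village and each (alpha, num_steps) setting, keep the seed pair with the highest mean adoption, then check whether that pair changes across settings. The results layout (a dict keyed by village, pair, alpha, and num_steps) and the function names are assumptions, and the numbers in the example are toy placeholders, not simulation output.

```python
from collections import defaultdict

def optimal_seeds(mean_adoption):
    """Pick the best seed pair for every (village, alpha, num_steps) setting.

    mean_adoption: dict mapping (village, pair, alpha, num_steps) -> mean share adopted.
    Returns a dict mapping (village, alpha, num_steps) -> (best pair, its mean share).
    """
    best = {}
    for (village, pair, alpha, num_steps), score in mean_adoption.items():
        key = (village, alpha, num_steps)
        # Strict '>' keeps the first pair seen when two pairs tie.
        if key not in best or score > best[key][1]:
            best[key] = (pair, score)
    return best

def pair_changes_with_parameters(best):
    """Per village, report whether the optimal pair differs across settings."""
    pairs_by_village = defaultdict(set)
    for (village, alpha, num_steps), (pair, _) in best.items():
        pairs_by_village[village].add(pair)
    return {v: len(pairs) > 1 for v, pairs in pairs_by_village.items()}

# Toy input (illustrative numbers only, not simulation output).
toy = {
    ("v1", (0, 1), 1, 3): 0.40, ("v1", (2, 5), 1, 3): 0.55,
    ("v1", (0, 1), 2, 3): 0.30, ("v1", (2, 5), 2, 3): 0.25,
}
best = optimal_seeds(toy)
print(best[("v1", 1, 3)], best[("v1", 2, 3)])  # ((2, 5), 0.55) ((0, 1), 0.3)
print(pair_changes_with_parameters(best))  # {'v1': True}
```

Keeping the winning score next to each pair makes it easy to see how close the runner-up settings are, which matters when judging whether a change in the optimal pair is meaningful or just Monte Carlo noise.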
Each model run starts with a list of adopted households. At the first step, this list contains only the seed households, which are passed in by the wrapper.
My model will have the following parameters:
In [1]:
#Imports
%matplotlib inline
# Standard imports
import copy
import itertools
# Scientific computing imports (aliased as np/nx because the code below uses those names)
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import pandas
import seaborn; seaborn.set()
import scipy.stats as stats
# Import widget methods (IPython.html.widgets has moved to the ipywidgets package)
from ipywidgets import *
In [ ]:
class Household(object):
    """
    Household class, which encapsulates the entire behavior of a household.
    """
    def __init__(self, model, household_id, adopted=False, threshold=1):
        """
        Constructor for HH class. By default,
        * not adopted
        * threshold = 1
        Must "link" the Household to their "parent" Model object.
        """
        # Set model link and ID
        self.model = model
        self.household_id = household_id
        # Set HH parameters.
        self.adopted = adopted
        self.threshold = threshold

    def __repr__(self):
        '''
        Return string representation.
        '''
        skip_none = True
        repr_string = type(self).__name__ + " ["
        except_list = ["model"]
        for e in dir(self):
            # Only display "public" data fields: skip private names (_*), the
            # parent model link, and anything that is a method, function, or module.
            if e.startswith("_") or e in except_list:
                continue
            value = getattr(self, e)
            if callable(value) or type(value).__name__ in ['DataFrame', 'module']:
                continue
            if value is None and skip_none:
                continue
            repr_string += "{0}={1}, ".format(e, value)
        # Clean up trailing space and comma.
        return repr_string.strip(" ").strip(",") + "]"
Below, we will define our model class. This can be broken up as follows:
In [ ]:
class Model(object):
    """
    Model class, which encapsulates the entire behavior of a single "run" in network model.
    """
    def __init__(self, network, alpha, HH_adopted, HH_not_adopted):
        """
        Class constructor.
        * network: undirected networkx graph whose nodes are household IDs
        * alpha: mean of the household threshold distribution
        * HH_adopted, HH_not_adopted: node IDs of seed and non-seed households
        """
        # Set our model parameters. Copy the household lists so that repeated
        # runs never share (and mutate) the same list objects.
        self.network = network
        self.alpha = alpha
        self.HH_adopted = list(HH_adopted)
        self.HH_not_adopted = list(HH_not_adopted)
        # Set our state variables
        self.t = 0
        self.num_households = 0
        self.households = {}
        # Setup our history variables.
        self.history_adopted = []
        self.history_not_adopted = []
        self.percent_adopted = 0.0
        # Call our setup methods
        self.setup_network()
        self.setup_households()

    def setup_network(self):
        """
        Method to setup network. The village graph is built by the wrapper
        and passed in, so here we only record its size.
        """
        self.num_households = self.network.number_of_nodes()

    def setup_households(self):
        """
        Method to setup households: one Household per network node, keyed by
        node ID. Seed households start out adopted.
        """
        nodes = list(self.network.nodes())
        seeds = set(self.HH_adopted)
        # Thresholds ~ N(alpha, 0.5) truncated to [0, 2*alpha]; truncnorm takes
        # its bounds in standard-deviation units relative to loc.
        a, b = (0 - self.alpha) / 0.5, self.alpha / 0.5
        thresholds = stats.truncnorm.rvs(a, b, loc=self.alpha, scale=0.5, size=len(nodes))
        # Create all households.
        for node, threshold in zip(nodes, thresholds):
            self.households[node] = Household(model=self,
                                              household_id=node,
                                              adopted=node in seeds,
                                              threshold=threshold)

    def get_neighborhood(self, x):
        """
        Get a list of the node IDs connected to node x.
        """
        return list(self.network.neighbors(x))

    def step_adopt_decision(self):
        """
        Model each unadopted household evaluating its connections and making an
        adopt/not adopt decision. Returns the node IDs that will adopt.
        """
        will_adopt = []
        for i in self.HH_not_adopted:
            adopt_count = 0
            for j in self.get_neighborhood(i):
                if self.households[j].adopted:
                    adopt_count += 1
            if adopt_count >= self.households[i].threshold:
                will_adopt.append(i)
        return will_adopt

    def step(self):
        """
        Model step function.
        """
        # Adoption decision, based on who had adopted at the start of this step
        will_adopt = self.step_adopt_decision()
        # Apply all decisions together so adoption takes effect in the next period.
        for i in will_adopt:
            self.households[i].adopted = True
            self.HH_not_adopted.remove(i)
        self.HH_adopted.extend(will_adopt)
        # Increment steps and track history (copies, so later steps don't overwrite earlier ones).
        self.t += 1
        self.history_adopted.append(list(self.HH_adopted))
        self.history_not_adopted.append(list(self.HH_not_adopted))
        self.percent_adopted = float(len(self.HH_adopted)) / self.num_households

    def __repr__(self):
        '''
        Return string representation.
        '''
        skip_none = True
        repr_string = type(self).__name__ + " ["
        for e in dir(self):
            # Only display "public" data fields; skip private names (_*), methods, and modules.
            if e.startswith("_"):
                continue
            value = getattr(self, e)
            e_type = type(value).__name__
            if callable(value) or e_type in ['DataFrame', 'module']:
                continue
            if value is None and skip_none:
                continue
            if e_type in ['list', 'set', 'tuple', 'dict']:
                repr_string += "\n\n\t{0}={1},\n\n".format(e, value)
            elif e_type in ['ndarray']:
                repr_string += "\n\n\t{0}=\t\n{1},\n\n".format(e, value)
            else:
                repr_string += "{0}={1}, ".format(e, value)
        # Clean up trailing space and comma.
        return repr_string.strip(" ").strip(",") + "]"
Below is the wrapper code that runs the model. It does the following:
In [ ]:
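## cycle through villages:
## (village_list still needs to be filled with one adjacency-matrix CSV path per village)
village_list = []
num_samples = 2000
# mean_adoption[(village file, seed pair, alpha, num_steps)] = average share adopted after the final step
mean_adoption = {}
for fn in village_list:
    village = np.genfromtxt(fn, delimiter=",")
    network = nx.from_numpy_array(village)  # from_numpy_matrix was removed in networkx 3.0
    for HH_adopted in itertools.combinations(network.nodes(), 2):
        HH_not_adopted = [node for node in network.nodes() if node not in HH_adopted]
        for alpha in [1, 2, 3]:
            for num_steps in [3, 4, 5, 6]:
                final_rates = []
                for n in range(num_samples):
                    # Model copies the seed lists, so every sample starts from the same state
                    m = Model(network, alpha, HH_adopted, HH_not_adopted)
                    for t in range(num_steps):
                        m.step()
                    # Collect the adoption rate at the final step of this sample
                    final_rates.append(m.percent_adopted)
                mean_adoption[(fn, HH_adopted, alpha, num_steps)] = np.mean(final_rates)
## I also need to write a function which determines the optimal seed pairing
## for each (village, alpha, num_steps) from mean_adoption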
I hope to present charts listing the optimal seed pairing for each village at each parameter level and under each information diffusion process. That is, an optimal pair will be given for every combination of alpha ∈ {1, 2, 3} and num_steps ∈ {3, 4, 5, 6}, under both the original and the extended information diffusion process.
I expect the optimal seeds to depend on the alpha parameter, though I suspect they may be less sensitive to the number of steps.
I am more interested in whether the optimal seeds given by the extended information diffusion process differ from those given by the original process. I am not certain, but I suspect they will. If so, they could provide new predictions to test in the field, which may lead to even more efficient targeting of seed farmers.