Midterm

Goal

I will explore whether a network-theory driven approach shown to improve the efficiency of an agricultural extension program is sensitive to the models and parameters originally used.

Justification

Social networks have been shown to be important vehicles for the transmission of new agricultural methods or 'technologies' (Bandiera and Rasul 2006, Conley and Udry 2010). These dynamics, together with time-varying agent behavior, are best captured through network modeling.

My project is based on a recent paper which used network modeling in conjunction with a large-scale field experiment (Beaman et al. 2014). I wish to test the robustness of the findings of their model and so will employ a similar network modeling method.

Background on base paper

Beaman and co-authors aimed to improve the rollout of an agricultural extension program using predictions from network theory to optimally select 'seed farmers'. 'Seed farmers' are the select farmers in a village that the agricultural extension program trains. Because it is costly to train farmers in this way, it is most efficient to pick seed farmers such that their adoption of the agricultural technology will lead to the greatest spread of the technology throughout the village.

Beaman and coauthors first elicit the social networks of various rural villages. Then under the condition that the extension program only trains two farmers in each village, they take every possible combination of two nodes in a village network and simulate an information diffusion process for 4 periods. They take a measure of information diffusion at the end of each simulation and the pair of nodes which gives the greatest diffusion is their optimal seeding pair.

Their findings are then used in a field experiment where a random half of the villages are seeded according to the simulated optimal seeds while the other half are seeded according to the extension program's default procedure, usually based on a field officer's own knowledge of the village and its influential farmers. They find evidence that network-theory-informed seeding leads to increased technology adoption over baseline seeding procedures.

My extensions and measures of interest

I wish to recreate and expand upon their simulations in the following ways:

  • I will compare optimal seeds found with their method against optimal seeds found with an extended process of information diffusion. The extended process will include the possibility that households can reject a new technology even after being exposed to it by multiple connections. The original process assumes that a household will automatically adopt a technology once the number of connections who have adopted it reaches a certain threshold.
  • I will also sweep across the number of periods simulated and alpha, the mean of the normal distribution from which adoption thresholds are drawn, to see whether this produces different optimal seeds.
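
One way the extended process could work is sketched below. The proposal does not yet specify the rejection mechanism, so everything here is an assumption: a household whose threshold is met still declines with a fixed probability p_reject, and the decision is redrawn every period rather than being permanent. The names step_extended and p_reject are hypothetical.

```python
import random

def step_extended(adjacency, adopted, thresholds, p_reject, rng):
    """One synchronous period of the extended process.

    A household whose count of adopting connections reaches its threshold
    still declines with probability p_reject (an assumed mechanism).
    """
    new_adopters = set()
    for node, neighbors in adjacency.items():
        if node in adopted:
            continue
        exposure = sum(nb in adopted for nb in neighbors)
        if exposure >= thresholds[node] and rng.random() >= p_reject:
            new_adopters.add(node)
    return adopted | new_adopters

# Hypothetical 4-household line network, every threshold set to 1.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
thresholds = {node: 1 for node in line}
rng = random.Random(0)

baseline = step_extended(line, {0}, thresholds, p_reject=0.0, rng=rng)
never = step_extended(line, {0}, thresholds, p_reject=1.0, rng=rng)
print(baseline, never)
```

Setting p_reject to 0 recovers the original threshold process, which makes it easy to check that both processes agree before comparing their optimal seeds.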

Outline

The original paper looks at rural villages in Malawi. I do not have access to their network data, but I have a dataset of social graphs from 74 villages in South India. Though there may be differences in network structure between villages in these two locations, I will assume they are reasonably comparable.

First, I will recreate results from Beaman et al. by selecting all combinations of node pairs in a subset of 25 villages. I will run each pair through an information diffusion simulation for {3,4,5,6} steps. I will also sweep through values {1,2,3} for an alpha parameter. Each household has an adoption threshold, T, which determines whether it adopts the new technology. If X of its connections have adopted the technology and X >= T, the household will adopt the new technology in the next period. Each household independently draws its threshold from a normal distribution N(alpha, 0.5) bounded to be positive, so sweeping through alpha shifts the distribution of household thresholds T up and down.
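
The threshold draw and the adoption rule can be illustrated with the standard library alone. This is a sketch under one reading of "bounded positive", namely redrawing until the value is positive. The function names draw_threshold and adopts are hypothetical.

```python
import random

def draw_threshold(alpha, sd=0.5, rng=random):
    """Draw from N(alpha, sd) and redraw until the value is positive."""
    while True:
        value = rng.gauss(alpha, sd)
        if value > 0:
            return value

def adopts(num_adopted_connections, threshold):
    """Adoption rule from the text: adopt next period when X >= T."""
    return num_adopted_connections >= threshold

rng = random.Random(42)
for alpha in (1, 2, 3):
    draws = [draw_threshold(alpha, rng=rng) for _ in range(10000)]
    mean_threshold = sum(draws) / len(draws)
    # Share of households that a single adopting connection would convert.
    share_one = sum(adopts(1, t) for t in draws) / float(len(draws))
    print(alpha, round(mean_threshold, 2), round(share_one, 3))
```

Because X is an integer count, a threshold T behaves like its ceiling: any T in (1, 2] requires two adopting connections. Sweeping alpha over {1,2,3} therefore mostly changes how many households need one, two, three or more adopting connections.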

To mitigate stochasticity, I will repeat each simulation 2000 times and take the average of the diffusion measure (the percent of households that have adopted at the last step). The pair of nodes which gives the greatest information diffusion is my pair of theory-driven seed farmers, equivalent to those found in Beaman et al. I will examine whether the choice of these optimal seed farmers depends on the number of steps run and the alpha parameter used. Then, I will run the same simulations using the extended information diffusion process described above. I want to see whether seed farmers selected through this method are different from those selected by Beaman et al.'s process. For the midterm, I will concentrate on coding the re-creation of the method from Beaman et al.
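
The averaging step can be sketched as follows, together with a standard error that indicates how precisely the average is estimated. The ring network, the helper names final_share and mean_and_se, and the redraw-until-positive threshold rule are assumptions made for this illustration.

```python
import math
import random

def final_share(adjacency, seeds, alpha, num_steps, rng):
    """One stochastic run: redraw thresholds, diffuse, return share adopted."""
    thresholds = {}
    for node in adjacency:
        t = rng.gauss(alpha, 0.5)
        while t <= 0:  # keep thresholds positive
            t = rng.gauss(alpha, 0.5)
        thresholds[node] = t
    adopted = set(seeds)
    for _ in range(num_steps):
        new = {n for n in adjacency if n not in adopted
               and sum(nb in adopted for nb in adjacency[n]) >= thresholds[n]}
        adopted |= new
    return len(adopted) / float(len(adjacency))

def mean_and_se(adjacency, seeds, alpha, num_steps, repeats, seed=0):
    """Average the final share over repeats and report its standard error."""
    rng = random.Random(seed)
    shares = [final_share(adjacency, seeds, alpha, num_steps, rng)
              for _ in range(repeats)]
    mean = sum(shares) / repeats
    var = sum((s - mean) ** 2 for s in shares) / (repeats - 1)
    return mean, math.sqrt(var / repeats)

# Hypothetical 5-household ring.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
mean, se = mean_and_se(ring, seeds=(0, 2), alpha=1, num_steps=4, repeats=2000)
print(round(mean, 3), round(se, 4))
```

If the gap between the best and second-best pair is smaller than a couple of standard errors, 2000 repeats may not be enough to tell them apart, and the "optimal" pair could change from one batch of runs to the next.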

I. Space

I will model space with an undirected social network. Each node represents a rural household and each edge represents a social connection.

II. Actors

Each node in my network is a household. They are modeled simply and have only a few properties:

  • id: household id
  • adopted: whether they have adopted the new technology or not
  • threshold: the threshold at or above which they will adopt the new technology in the next period. This threshold will be drawn from a normal distribution with mean alpha and standard deviation 0.5, bounded to be positive.

In each step, each household that has not yet adopted will count the number of its connections who have adopted the new technology. If this count meets or exceeds the household's adoption threshold, it will adopt the technology in the next period.

III. Model Wrapper

I will wrap my model in a function which loops through each village, and in each village, loops through every possible pair of nodes. Then, I will sweep through my parameters, number of steps and alpha. I will repeat this under the alternate information diffusion process. I will also determine and collect optimal seeds here.

IV. Initial Conditions

Each model will start with a list of adopted households. In the first step, this list contains only the seed households, which are passed in by the wrapper.

V. Model Parameters

My model will have the following parameters:

  • network: the village social network, which the wrapper builds from an adjacency matrix and passes in
  • alpha: parameter determining distribution of adoption threshold
  • HH_adopted: list of adopted households, in first step these are seed households given by wrapper
  • HH_not_adopted: list of all not adopted households

In [1]:
#Imports

%matplotlib inline

# Standard imports
import copy
import itertools

# Scientific computing imports
import numpy
import matplotlib.pyplot as plt
import networkx
import pandas
import seaborn; seaborn.set()
import scipy.stats as stats


# Import widget methods (the widgets live in the ipywidgets package as of IPython 4)
from ipywidgets import *

Household class

Below is a rough draft of the household class. It has two components:

  • constructor: class constructor, which "initializes" or "creates" the household when we call Household(). This is the __init__ method.
  • __repr__: returns a string representation of the household's public fields.

In [ ]:
class Household(object):
    """
    Household class, which encapsulates the entire behavior of a household.
    """
    
    def __init__(self, model, household_id, adopted=False, threshold=1):
        """
        Constructor for HH class.  By default,
          * not adopted
          * threshold = 1
          
        Must "link" the Household to their "parent" Model object.
        """
        # Set model link and ID
        self.model = model
        self.household_id = household_id
        
        # Set HH parameters.
        self.adopted = adopted
        self.threshold = threshold

    def __repr__(self):
        '''
        Return string representation.
        '''
        skip_none = True
        repr_string = type(self).__name__ + " ["
        except_list = "model"

        elements = [e for e in dir(self) if str(e) not in except_list]
        for e in elements:
            # Make sure we only display "public" fields; skip anything private (_*), that is a method/function, or that is a module.
            if not e.startswith("_") and eval('type(self.{0}).__name__'.format(e)) not in ['DataFrame', 'function', 'method', 'builtin_function_or_method', 'module', 'instancemethod']:
                    value = eval("self." + e)
                    if value != None and skip_none == True:
                        repr_string += "{0}={1}, ".format(e, value)

        # Clean up trailing space and comma.
        return repr_string.strip(" ").strip(",") + "]"

Model class

Below, we will define our model class. This can be broken up as follows:

  • constructor: class constructor, which "initializes" or "creates" the model when we call Model(). This is the __init__ method.
  • setup_network: sets up graph
  • setup_households: sets up households
  • get_neighborhood: defines a function to get a list of connected nodes
  • step_adopt_decision: method to step through household decision
  • step: main step method

In [ ]:
class Model(object):
    """
    Model class, which encapsulates the entire behavior of a single "run" in network model.
    """
    
    def __init__(self, network, alpha, HH_adopted, HH_not_adopted):
        """
        Class constructor.
        """
        # Set our model parameters; copy the household lists so every run starts from the same seeds
        self.network = network
        self.alpha = alpha
        self.HH_adopted = list(HH_adopted)
        self.HH_not_adopted = list(HH_not_adopted)
       
        # Set our state variables; households are stored by node id
        self.t = 0
        self.households = {}
        
        # Setup our history variables.
        self.history_adopted = []
        self.history_not_adopted = []
        self.percent_adopted = 0
        
        # Call our setup methods
        self.setup_network()
        self.setup_households()
        
    def setup_network(self):
        """
        Method to setup network.
        """
        # The wrapper passes in a networkx graph, so we only need to store it.
        self.g = self.network
        
    def setup_households(self):
        """
        Method to setup households.
        """
        # Create one household per node; seed households start as adopted.
        # Thresholds are drawn from N(alpha, 0.5) truncated to [0, 2*alpha].
        for i in self.g.nodes():
            threshold = stats.truncnorm.rvs((0 - self.alpha) / 0.5, self.alpha / 0.5,
                                            loc=self.alpha, scale=0.5)
            self.households[i] = Household(model=self,
                                           household_id=i,
                                           adopted=(i in self.HH_adopted),
                                           threshold=threshold)
        self.percent_adopted = float(len(self.HH_adopted)) / len(self.households)

    def get_neighborhood(self, x):
        """
        Get a list of connected nodes.
        """
        return list(self.g.neighbors(x))
    
    def step_adopt_decision(self):
        """
        Model a household evaluating their connections and making an adopt/not adopt decision
        """
        will_adopt = []
        for i in self.HH_not_adopted:
            # Count the connections of household i that have already adopted.
            adopt_count = 0
            for j in self.get_neighborhood(i):
                if self.households[j].adopted:
                    adopt_count += 1
            if adopt_count >= self.households[i].threshold:
                will_adopt.append(i)
        return will_adopt
    
    def step(self):
        """
        Model step function.
        """
        
        # Adoption decision, based on who had adopted at the start of this period
        will_adopt = self.step_adopt_decision()
        
        # Households that met their threshold adopt in the next period.
        for i in will_adopt:
            self.households[i].adopted = True
            self.HH_adopted.append(i)
            self.HH_not_adopted.remove(i)
        
        # Increment steps and track history (store copies so later steps do not overwrite them).
        self.t += 1
        self.history_adopted.append(list(self.HH_adopted))
        self.history_not_adopted.append(list(self.HH_not_adopted))
        self.percent_adopted = float(len(self.HH_adopted)) / len(self.households)


    def __repr__(self):
        '''
        Return string representation.
        '''
        skip_none = True
        repr_string = type(self).__name__ + " ["

        elements = dir(self)
        for e in elements:
            # Make sure we only display "public" fields; skip anything private (_*), that is a method/function, or that is a module.
            e_type = eval('type(self.{0}).__name__'.format(e))
            if not e.startswith("_") and e_type not in ['DataFrame', 'function', 'method', 'builtin_function_or_method', 'module', 'instancemethod']:
                    value = eval("self." + e)
                    if value != None and skip_none == True:
                        if e_type in ['list', 'set', 'tuple']:
                            repr_string += "\n\n\t{0}={1},\n\n".format(e, value)
                        elif e_type in ['ndarray']:
                            repr_string += "\n\n\t{0}=\t\n{1},\n\n".format(e, value)
                        else:
                            repr_string += "{0}={1}, ".format(e, value)

        # Clean up trailing space and comma.
        return repr_string.strip(" ").strip(",") + "]"

Wrapper with parameter sweep

Below is the code which wraps around the model. It does the following:

  • Loops through all villages we wish to examine
    • Pulls network data from a csv and puts in the appropriate format
  • Loops through all possible pairs of nodes within each village
  • Sweeps through alpha and number of steps parameters
  • Runs 2000 samples

In [ ]:
## cycle through villages:
## (need to create village list where each item points to a different csv file)
num_samples = 2000

## average final adoption rate for each (village, seed pair, alpha, num_steps)
results = {}

for fn in village_list:
    village = numpy.genfromtxt(fn, delimiter=",")
    ## from_numpy_array is named from_numpy_matrix in networkx 1.x
    network = networkx.from_numpy_array(village)
    for HH_adopted in itertools.combinations(network.nodes(), 2):
        HH_not_adopted = [node for node in network.nodes() if node not in HH_adopted]
        for alpha in [1,2,3]:
            for num_steps in [3,4,5,6]:
                total_adopted = 0.0
                for n in range(num_samples):
                    ## each sample is a fresh model with newly drawn thresholds
                    m = Model(network, alpha, HH_adopted, HH_not_adopted)
                    for t in range(num_steps):
                        m.step()
                    ## collect adoption rate at the final step
                    total_adopted += m.percent_adopted
                ## average over all samples
                results[(fn, HH_adopted, alpha, num_steps)] = total_adopted / num_samples


## I also need to write a function which determines optimal seed pairing

#######
#######

Results

I hope to present charts which list optimal seed pairings at each parameter level and information diffusion process. This means optimal pairing will be given for alpha = {1,2,3} for each of num_steps = {3,4,5,6} and this will be done for both the original information diffusion process and the extended process.
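
A possible way to build such a listing is sketched below. It assumes the averaged adoption shares are stored in a dictionary keyed by (village, seed pair, alpha, num_steps). That layout and the function name optimal_seeds are hypothetical, and the numbers are made up purely to show the mechanics.

```python
def optimal_seeds(results):
    """Map each (village, alpha, num_steps) to its best seed pair.

    results: dict keyed by (village, pair, alpha, num_steps) whose values
    are average final adoption shares.
    """
    best = {}
    for (village, pair, alpha, num_steps), share in results.items():
        key = (village, alpha, num_steps)
        if key not in best or share > best[key][1]:
            best[key] = (pair, share)
    return best

# Made-up averages for one hypothetical village, for illustration only.
results = {
    ("village_1", (0, 1), 1, 3): 0.40,
    ("village_1", (0, 2), 1, 3): 0.55,
    ("village_1", (0, 1), 2, 3): 0.30,
    ("village_1", (0, 2), 2, 3): 0.25,
}
best = optimal_seeds(results)
for (village, alpha, num_steps), (pair, share) in sorted(best.items()):
    print(village, alpha, num_steps, pair, share)
```

Running the same function on the results of the extended process would give a second table, and the two tables could then be compared cell by cell.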

Hypothetical results

I expect that the optimal seeding is dependent upon the alpha parameter though I suspect it may not be as dependent upon the number of steps parameter.

I am more interested in whether optimal seeds given by the extended information diffusion process are different from those given by the original process. I am not sure, but I suspect that they will be. If so, they could provide improved predictions to test in the field, which may lead to even more efficient seed farmer targeting.