Rand 2011 Bayesian Analysis

This notebook outlines how to begin replicating the analysis of the Rand et al. 2011 study "Dynamic social networks promote cooperation in experiments with humans" (PNAS).

This notebook focuses on a Bayesian approach, and only one example regression is shown. Refer to the companion Cooperation Analysis notebook for the remaining regression formulas needed for a full replication.

This notebook also requires that bedrock-core be installed locally into the Python kernel running this notebook. It can be installed from the command line with:

pip install git+https://github.com/Bedrock-py/bedrock-core.git

The other requirements to run this notebook (requests, pandas, and pprint) are imported in the cells below.

Step 1: Check Environment

First, check that Bedrock is installed locally. If the following cell does not run without error, review the install procedure above and try again. Also ensure that the selected kernel is the same kernel where bedrock-core is installed.


In [ ]:
from bedrock.client.client import BedrockAPI

Test Connection to Bedrock Server

This code assumes a local Bedrock server is hosted at localhost on port 81. Change the SERVER variable to match your server's URL and port.


In [ ]:
import requests
import pandas
import pprint
SERVER = "http://localhost:81/"
api = BedrockAPI(SERVER)

Check for Spreadsheet Opal

The following code block checks the Bedrock server for the Spreadsheet Opal. This Opal is used to load .csv, .xls, and other such files into a Bedrock matrix format. The code below calls the Bedrock /dataloaders/ingest endpoint to check if the opals.spreadsheet.Spreadsheet.Spreadsheet opal is installed.

If the code below shows the Opal is not installed, there are two options:

  1. If you are running a local Bedrock or are the administrator of the Bedrock server, install the Spreadsheet Opal on the server with pip
  2. If you are not the administrator of the Bedrock server, e-mail the Bedrock administrator to request that the Opal be installed

In [ ]:
resp = api.ingest("opals.spreadsheet.Spreadsheet.Spreadsheet")
if resp.json():
    print("Spreadsheet Opal Installed!")
else:
    print("Spreadsheet Opal Not Installed!")

Check for STAN GLM Opal

The following code block checks the Bedrock server for the STAN GLM Opal.

If the code below shows the Opal is not installed, there are two options:

  1. If you are running a local Bedrock or are the administrator of the Bedrock server, install the Stan GLM Opal on the server with pip
  2. If you are not the administrator of the Bedrock server, e-mail the Bedrock administrator to request that the Opal be installed

In [ ]:
resp = api.analytic('opals.stan.Stan.Stan_GLM')
if resp.json():
    print("Stan_GLM Opal Installed!")
else:
    print("Stan_GLM Opal Not Installed!")

Check for select-from-dataframe Opal

The following code block checks the Bedrock server for the select-from-dataframe Opal. This allows you to filter by row and reduce the columns in a dataframe loaded by the server.

If the code below shows the Opal is not installed, there are two options:

  1. If you are running a local Bedrock or are the administrator of the Bedrock server, install the select-from-dataframe Opal on the server with pip
  2. If you are not the administrator of the Bedrock server, e-mail the Bedrock administrator to request that the Opal be installed

In [ ]:
resp = api.analytic('opals.select-from-dataframe.SelectByCondition.SelectByCondition')
if resp.json():
    print("Select-from-dataframe Opal Installed!")
else:
    print("Select-from-dataframe Opal Not Installed!")

Check for summarize Opal

The following code block checks the Bedrock server for the summarize Opal. This allows you to summarize a matrix with an optional groupby clause.

If the code below shows the Opal is not installed, there are two options:

  1. If you are running a local Bedrock or are the administrator of the Bedrock server, install the summarize Opal on the server with pip
  2. If you are not the administrator of the Bedrock server, e-mail the Bedrock administrator to request that the Opal be installed

In [ ]:
resp = api.analytic('opals.summarize.Summarize.Summarize')
if resp.json():
    print("Summarize Opal Installed!")
else:
    print("Summarize Opal Not Installed!")

Step 2: Upload Data to Bedrock and Create Matrix

Now that everything is installed, begin the workflow by uploading the csv data and creating a matrix. To understand this fully, it is useful to understand how a data loading workflow occurs in Bedrock.

  1. Create a data source that points to the original source file
  2. Generate a matrix from the data source (filters can be applied during this step to pre-filter the data source on load)
  3. Run analytics on the generated matrix

Note: Each time a matrix is generated from a data source, Bedrock creates a new copy with a new UUID to represent that matrix.

Check for csv file locally

The following code opens the file and prints the first few rows. The file must be a comma-delimited csv with a header row that labels each column.


In [ ]:
filepath = 'Rand2011PNAS_cooperation_data.csv'
datafile = pandas.read_csv(filepath)
datafile.head(10)
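Before uploading, it can also help to confirm locally that the columns used later in this notebook are present. A minimal pandas sketch; the column names (`condition`, `round_num`, `decision0d1c`) are the ones referenced in the analysis cells of this notebook, while the sample rows here are made up purely for illustration:

```python
import pandas

# Tiny stand-in for Rand2011PNAS_cooperation_data.csv; the values are illustrative only
df = pandas.DataFrame({
    'condition': ['Static', 'Fluid', 'Static'],
    'round_num': [1, 1, 2],
    'decision0d1c': [1, 0, 1],
})

# Columns the later filtering and regression cells rely on
expected = {'condition', 'round_num', 'decision0d1c'}
missing = expected - set(df.columns)
print("Missing columns:", missing or "none")
```

Running the same check against the real `datafile` (rather than this toy frame) will catch header problems before the upload step.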

Now Upload the source file to the Bedrock Server

This code block uses the Spreadsheet ingest module to upload the source file to Bedrock. Note: this simply copies the file to the server; it does not yet create a Bedrock Matrix.

If the following fails to upload, check that the csv file is in the correct comma-delimited format with headers.


In [ ]:
ingest_id = 'opals.spreadsheet.Spreadsheet.Spreadsheet'
resp = api.put_source('Rand2011', ingest_id, 'default', {'file': open(filepath, "rb")})

if resp.status_code == 201:
    source_id = resp.json()['src_id']
    print('Source {0} successfully uploaded'.format(filepath))
else:
    try:
        print("Error in Upload: {}".format(resp.json()['msg']))
    except Exception:
        pass
    
    try:
        source_id = resp.json()['src_id']
        print("Using existing source.  If this is not the desired behavior, upload with a different name.")
    except Exception:
        print("No existing source id provided")

Check available data sources for the CSV file

Call the Bedrock sources list to see the available data sources. Note that the Rand2011 data source should now be available.


In [ ]:
available_sources = api.list("dataloader", "sources").json()
s = next(filter(lambda source: source['src_id'] == source_id, available_sources), None)
if s is not None:
    pp = pprint.PrettyPrinter()
    pp.pprint(s)
else:
    print("Could not find source")

Create a Bedrock Matrix from the CSV Source

In order to use the data, the data source must be converted to a Bedrock matrix. The following code steps through that process. Here we do a simple csv-to-matrix transform; there are also options to apply filters (such as renaming or excluding columns) during this step.


In [ ]:
resp = api.create_matrix(source_id, 'rand_mtx')
mtx = resp[0]
matrix_id = mtx['id']
print(mtx)
resp

Look at basic statistics on the source data

Here we can see that Bedrock has computed some basic statistics on the source data.

For numeric data

The quartiles, max, mean, min, and standard deviation are provided

For non-numeric data

The label values and counts for each label are provided.

For both types

The proposed tags and data type that Bedrock is suggesting are provided


In [ ]:
analytic_id = "opals.summarize.Summarize.Summarize"
inputData = {
    'matrix.csv': mtx,
    'features.txt': mtx
}

paramsData = []

summary_mtx = api.run_analytic(analytic_id, mtx, 'rand_mtx_summary', input_data=inputData, parameter_data=paramsData)
output = api.download_results_matrix(matrix_id, summary_mtx['id'], 'matrix.csv')
output
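As a cross-check, a similar summary can be computed locally with pandas. This is not the Summarize Opal itself, just a rough equivalent run on a made-up frame: `describe()` covers the numeric statistics (quartiles, max, mean, min, standard deviation) and `value_counts()` covers the label counts for non-numeric columns.

```python
import pandas

df = pandas.DataFrame({
    'round_num': [1, 2, 3, 4],                            # numeric column
    'condition': ['Static', 'Static', 'Fluid', 'Fluid'],  # non-numeric column
})

numeric_summary = df.describe()                 # quartiles, max, mean, min, std
label_counts = df['condition'].value_counts()   # label values and counts
print(numeric_summary)
print(label_counts)
```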

Step 3: Filter the data based on a condition

Filter the data to only the Static Condition


In [ ]:
analytic_id = "opals.select-from-dataframe.SelectByCondition.SelectByCondition"
inputData = {
    'matrix.csv': mtx,
    'features.txt': mtx
}

paramsData = [
    {"attrname":"colname","value":"condition"},
    {"attrname":"comparator","value":"=="},
    {"attrname":"value","value":"Static"}
]

filtered_mtx = api.run_analytic(analytic_id, mtx, 'rand_static_only', input_data=inputData, parameter_data=paramsData)

filtered_mtx
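The SelectByCondition parameters above amount to a row filter of the form `df[df[colname] <comparator> value]`. A local pandas sketch of the same `condition == "Static"` filter, on made-up sample rows:

```python
import pandas

df = pandas.DataFrame({
    'condition': ['Static', 'Fluid', 'Static', 'Random'],
    'decision0d1c': [1, 0, 1, 0],
})

# Equivalent of colname="condition", comparator="==", value="Static"
static_only = df[df['condition'] == 'Static']
print(static_only)
```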

Check that Matrix is filtered


In [ ]:
output = api.download_results_matrix('rand_mtx', 'rand_static_only', 'matrix.csv', remote_header_file='features.txt')
output

Step 4: Run Bayesian Logistic Regression

This uses Stan to perform a Bayesian logistic regression, estimating the effect of the round number on the cooperation decision.


In [ ]:
analytic_id = "opals.stan.Stan.Stan_GLM"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}

paramsData = [
    {"attrname":"formula","value":"decision0d1c ~ round_num"},
    {"attrname":"family","value":'logit'},
    {"attrname":"chains","value":"3"},
    {"attrname":"iter","value":"3000"}
]

result_mtx = api.run_analytic(analytic_id, mtx, 'rand_bayesian1', input_data=inputData, parameter_data=paramsData)

result_mtx
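Under the `logit` family, the formula `decision0d1c ~ round_num` models the probability of cooperating in a given round as P(decision = 1) = 1 / (1 + exp(-(b0 + b1 * round_num))). A small numpy sketch of that inverse-logit relationship; the coefficients b0 and b1 here are hypothetical illustrations, not the fitted posterior values:

```python
import numpy as np

b0, b1 = 1.0, -0.2  # hypothetical intercept and slope, NOT fitted values

def p_cooperate(round_num):
    """Inverse-logit of the linear predictor b0 + b1 * round_num."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * round_num)))

rounds = np.arange(1, 11)
probs = p_cooperate(rounds)
print(dict(zip(rounds.tolist(), probs.round(3).tolist())))
```

With a negative slope, the sketch shows cooperation probability declining over rounds; the posterior summary downloaded below gives the actual estimated coefficients.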

Visualize the output of the analysis

Here the output of the analysis is downloaded; from there it can be visualized and exported.


In [ ]:
summary_table = api.download_results_matrix('rand_mtx', 'rand_bayesian1', 'matrix.csv')
summary_table

In [ ]:
prior_summary = api.download_results_matrix('rand_mtx', 'rand_bayesian1', 'prior_summary.txt')
print(prior_summary)