This notebook outlines how to recreate the analysis of the Rand et al. 2011 study "Dynamic social networks promote cooperation in experiments with humans" (PNAS).
It steps through re-creating the analysis using the publicly available data published with the paper. This requires either a local or remote copy of Bedrock with the following Opals installed: Spreadsheet, Logit2, select-from-dataframe, and Summarize.
This notebook also requires that bedrock-core be installed locally into the python kernel running this notebook. This can be installed via command line using:
pip install git+https://github.com/Bedrock-py/bedrock-core.git
The other requirements to run this notebook are the requests and pandas Python packages, imported in the cells below.
First, check that Bedrock is installed locally. If the following cell does not run without error, revisit the install procedure above and try again. Also ensure that the selected kernel is the same kernel where bedrock-core is installed.
In [ ]:
from bedrock.client.client import BedrockAPI
In [ ]:
import requests
import pandas
import pprint
SERVER = "http://localhost:81/"
api = BedrockAPI(SERVER)
The following code block checks the Bedrock server for the Spreadsheet Opal. This Opal is used to load .csv, .xls, and other such files into a Bedrock matrix format. The code below calls the Bedrock /dataloaders/ingest endpoint to check whether the opals.spreadsheet.Spreadsheet.Spreadsheet Opal is installed.
If the code below shows the Opal is not installed, there are two options:
In [ ]:
resp = api.ingest("opals.spreadsheet.Spreadsheet.Spreadsheet")
if resp.json():
    print("Spreadsheet Opal Installed!")
else:
    print("Spreadsheet Opal Not Installed!")
The following code block checks the Bedrock server for the logit2 Opal.
If the code below shows the Opal is not installed, there are two options:
In [ ]:
resp = api.analytic('opals.logit2.Logit2.Logit2')
if resp.json():
    print("Logit2 Opal Installed!")
else:
    print("Logit2 Opal Not Installed!")
The following code block checks the Bedrock server for the select-from-dataframe Opal. This allows you to filter by row and reduce the columns in a dataframe loaded by the server.
If the code below shows the Opal is not installed, there are two options:
In [ ]:
resp = api.analytic('opals.select-from-dataframe.SelectByCondition.SelectByCondition')
if resp.json():
    print("Select-from-dataframe Opal Installed!")
else:
    print("Select-from-dataframe Opal Not Installed!")
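As a point of reference, the row filtering this Opal performs corresponds to a simple boolean mask in pandas. The sketch below is a local analogue, not the Opal itself; the toy values are made up, with column names mirroring the Rand2011 dataset:

```python
# Rough pandas equivalent of the select-from-dataframe Opal's row filter.
# The colname/comparator/value parameter triple maps to a boolean mask.
import pandas

df = pandas.DataFrame({"round_num": [1, 2, 1, 3],
                       "decision0d1c": [1, 0, 0, 1]})

# colname="round_num", comparator="==", value="1"
filtered = df[df["round_num"] == 1]
print(len(filtered))   # → 2
```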
The following code block checks the Bedrock server for the summarize Opal. This allows you to summarize a matrix with an optional groupby clause.
If the code below shows the Opal is not installed, there are two options:
In [ ]:
resp = api.analytic('opals.summarize.Summarize.Summarize')
if resp.json():
    print("Summarize Opal Installed!")
else:
    print("Summarize Opal Not Installed!")
Now that everything is installed, begin the workflow by uploading the csv data and creating a matrix. It helps to first understand how a data-loading workflow proceeds in Bedrock.
Note: Each time a matrix is generated from a data source it will create a new copy with a new UUID to represent that matrix
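The workflow used throughout this notebook has four stages: upload the raw file as a source, convert the source into a matrix, run an analytic on the matrix, and download the results. The helper below is purely illustrative (run_workflow is not part of the Bedrock API); it simply chains the client calls that appear in the cells that follow:

```python
# Hypothetical helper sketching the four-stage Bedrock workflow, assuming
# a BedrockAPI client has already been constructed as shown above.
def run_workflow(api, filepath, ingest_id, analytic_id):
    # 1. Upload the raw file to the server (no matrix is created yet)
    with open(filepath, "rb") as f:
        resp = api.put_source("Rand2011", ingest_id, "default", {"file": f})
    source_id = resp.json()["src_id"]

    # 2. Convert the uploaded source into a Bedrock matrix (new UUID each time)
    mtx = api.create_matrix(source_id, "rand_mtx")[0]

    # 3. Run an analytic; inputs reference the matrix, parameters configure it
    result = api.run_analytic(analytic_id, mtx, "result_name",
                              input_data={"matrix.csv": mtx, "features.txt": mtx},
                              parameter_data=[])

    # 4. Download the output matrix for local inspection
    return api.download_results_matrix(mtx["id"], result["id"], "matrix.csv")
```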
In [ ]:
filepath = 'Rand2011PNAS_cooperation_data.csv'
datafile = pandas.read_csv(filepath)
datafile.head(10)
This code block uses the Spreadsheet ingest module to upload the source file to Bedrock. Note: this simply copies the file to the server; it does not yet create a Bedrock Matrix.
If the following fails to upload, check that the csv file is in the correct comma-delimited format with headers.
In [ ]:
ingest_id = 'opals.spreadsheet.Spreadsheet.Spreadsheet'
resp = api.put_source('Rand2011', ingest_id, 'default', {'file': open(filepath, "rb")})
if resp.status_code == 201:
    source_id = resp.json()['src_id']
    print('Source {0} successfully uploaded'.format(filepath))
else:
    try:
        print("Error in Upload: {}".format(resp.json()['msg']))
    except Exception:
        pass
    try:
        source_id = resp.json()['src_id']
        print("Using existing source. If this is not the desired behavior, upload with a different name.")
    except Exception:
        print("No existing source id provided")
In [ ]:
available_sources = api.list("dataloader", "sources").json()
s = next(filter(lambda source: source['src_id'] == source_id, available_sources), None)
if s is not None:
    pp = pprint.PrettyPrinter()
    pp.pprint(s)
else:
    print("Could not find source")
In order to use the data, the data source must be converted to a Bedrock matrix. The following code steps through that process. Here we perform a simple transform of csv to matrix; there are also options to apply filters (such as renaming or excluding columns).
In [ ]:
resp = api.create_matrix(source_id, 'rand_mtx')
mtx = resp[0]
matrix_id = mtx['id']
print(mtx)
resp
Here we can see that Bedrock has computed some basic statistics on the source data: the quartiles, max, mean, min, and standard deviation; the label values and counts for each label; and the proposed tags and data types that Bedrock suggests.
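For a quick local sanity check, the same class of statistics can be approximated with pandas. The frame below is toy data (not results from the paper), with column names mirroring the Rand2011 dataset:

```python
# Approximating Bedrock's summary statistics locally with pandas.
import pandas

df = pandas.DataFrame({"round_num": [1, 1, 2, 2],
                       "decision0d1c": [0, 1, 1, 1]})

stats = df.describe()                        # quartiles, max, mean, min, std
counts = df["decision0d1c"].value_counts()   # label values and their counts
print(stats.loc["mean", "decision0d1c"])     # → 0.75
```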
In [ ]:
analytic_id = "opals.summarize.Summarize.Summarize"
inputData = {
    'matrix.csv': mtx,
    'features.txt': mtx
}
paramsData = []
summary_mtx = api.run_analytic(analytic_id, mtx, 'rand_mtx_summary', input_data=inputData, parameter_data=paramsData)
output = api.download_results_matrix(matrix_id, summary_mtx['id'], 'matrix.csv')
output
In [ ]:
analytic_id = "opals.select-from-dataframe.SelectByCondition.SelectByCondition"
inputData = {
    'matrix.csv': mtx,
    'features.txt': mtx
}
paramsData = [
    {"attrname": "colname", "value": "round_num"},
    {"attrname": "comparator", "value": "=="},
    {"attrname": "value", "value": "1"}
]
filtered_mtx = api.run_analytic(analytic_id, mtx, 'rand_round1_only', input_data=inputData, parameter_data=paramsData)
filtered_mtx
In [ ]:
output = api.download_results_matrix('rand_mtx', 'rand_round1_only', 'matrix.csv', remote_header_file='features.txt')
output
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ condition"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, mtx, 'rand_logit2_step3', input_data=inputData, parameter_data=paramsData)
result_mtx
In [ ]:
coef_table = api.download_results_matrix('rand_mtx', 'rand_logit2_step3', 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
The output of this analysis shows how the game condition interacts with the decision to either defect or cooperate. The coefficients provide the log-odds along with the Pr(z) scores to show the statistical significance. This is filtered only on round_num==1.
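A brief aside on interpreting these coefficients: since logit coefficients are log-odds, exponentiating them yields odds ratios, which are often easier to read. The coefficient value below is made up for illustration and is not a result from the paper:

```python
# Converting a log-odds coefficient to an odds ratio.
import math

log_odds = -0.4                    # hypothetical coefficient for a condition
odds_ratio = math.exp(log_odds)    # odds ratio = exp(log-odds)
print(round(odds_ratio, 3))        # → 0.67
```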
The referenced paper used several other comparisons to evaluate different interactions. The following code repeats the procedure above for the remaining analyses.
In [ ]:
analytic_id = "opals.summarize.Summarize.Summarize"
inputData = {
    'matrix.csv': mtx,
    'features.txt': mtx
}
paramsData = [
    {"attrname": "groupby", "value": "condition,round_num"},
    {"attrname": "columns", "value": "decision0d1c"}
]
base_mtx = api.get_matrix_metadata('Rand2011','rand_mtx')
summary_mtx = api.run_analytic(analytic_id, base_mtx,'summarize_grouped', input_data=inputData, parameter_data=paramsData)
output = api.download_results_matrix(base_mtx['id'], summary_mtx['id'], 'matrix.csv')
output
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ round_num"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_logit2_step1', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.select-from-dataframe.SelectByCondition.SelectByCondition"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "colname", "value": "num_neighbors"},
    {"attrname": "comparator", "value": ">"},
    {"attrname": "value", "value": "0"}
]
filtered_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_has_neighbors', input_data=inputData, parameter_data=paramsData)
In [ ]:
analytic_id = "opals.summarize.Summarize.Summarize"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "groupby", "value": "condition,round_num"},
    {"attrname": "columns", "value": "decision0d1c"}
]
summary_mtx = api.run_analytic(analytic_id, filtered_mtx,'summarize_grouped', input_data=inputData, parameter_data=paramsData)
output = api.download_results_matrix(base_mtx['id'], summary_mtx['id'], 'matrix.csv')
output
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ round_num"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step2', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ fluid_dummy*round_num"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_logit2_step4', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.select-from-dataframe.SelectByCondition.SelectByCondition"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "colname", "value": "round_num"},
    {"attrname": "comparator", "value": ">="},
    {"attrname": "value", "value": "7"}
]
filtered_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_round7', input_data=inputData, parameter_data=paramsData)
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ condition"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step5', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(fluid_dummy)"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step6', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(condition, Treatment(reference='Random'))"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_logit2_step7', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
pandas.set_option('display.max_colwidth', None)
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(condition, Treatment(reference='Static'))"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_logit2_step8', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
pandas.set_option('display.max_colwidth', None)
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(condition, Treatment(reference='Random'))"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step9', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(condition, Treatment(reference='Static'))"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step10', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table
In [ ]:
analytic_id = "opals.select-from-dataframe.SelectByCondition.SelectByCondition"
inputData = {
    'matrix.csv': base_mtx,
    'features.txt': base_mtx
}
paramsData = [
    {"attrname": "colname", "value": "condition"},
    {"attrname": "comparator", "value": "=="},
    {"attrname": "value", "value": "Fluid"}
]
filtered_mtx = api.run_analytic(analytic_id, base_mtx, 'rand_fluid_only', input_data=inputData, parameter_data=paramsData)
In [ ]:
analytic_id = "opals.logit2.Logit2.Logit2"
inputData = {
    'matrix.csv': filtered_mtx,
    'features.txt': filtered_mtx
}
paramsData = [
    {"attrname": "formula", "value": "decision0d1c ~ C(num_neighbors)"},
    {"attrname": "family", "value": "binomial"},
    {"attrname": "clustered_rse", "value": "sessionnum,playerid"}
]
result_mtx = api.run_analytic(analytic_id, filtered_mtx, 'rand_logit2_step11', input_data=inputData, parameter_data=paramsData)
coef_table = api.download_results_matrix(base_mtx['id'], result_mtx['id'], 'matrix.csv')
coef_table
In [ ]:
summary_table = api.download_results_matrix(result_mtx['src_id'], result_mtx['id'], 'summary.csv')
summary_table