C. Cité des Sciences experiment: Results
What can games bring to synthetic biology education for University students in terms of efficiency and motivation?
What can games bring to the popularization of synthetic biology to citizens, in terms of interest and basic comprehension?
a) the simplified link between genotype and phenotype,
b) BioBricks as genes' subcomponents,
c) the BioBrick simplified grammar: Promoter - RBS - Coding Sequence - Terminator,
d) the simplified role of each kind of brick: condition - quantity - function - end,
e) advanced notions: inducible promoters
Does gender/age/... correlate with synthetic biology game-based learning efficiency in terms of knowledge acquisition?
Does gender/age/... correlate with motivation in synthetic biology game-based learning?
Does interest in biology correlate with playing duration?
Does interest in games correlate with playing duration?
Does gender/age/... correlate with playing duration?
Is implicitly-taught content less well assimilated than explicitly-taught content?
Does it depend on demographics - gender/age/... or students/citizens or gamers/non-gamers?
Influence of repeated play
Threshold effect: is there a point after which everything is understood and the rest comes easily ("downhill")?
Results of tests spaced out over time
Effect of priming with pre-test
Can quiz-based assessment be replaced by automated tracking?
This part is based on data gathered from May 2017 to March 2018, on game version 1.52.
Online data gathering after March 2018, for versions 1.52.2 (March 23rd 2018 to April 26th 2018) and 1.60, yielded too few completed posttests, even though 103 and 169 people respectively played the 1.52.2 and 1.60 versions.
Purpose: monitor the evolution of the number and type of answers, to detect technical issues - GF/RM before/after mismatches - and to roughly estimate the statistical power of this study.
category | count |
---|---|
surveys | 545 |
unique users | 474 |
RM before | 38 |
GF before | 435 |
RM after | 229 |
GF after | 110 |
unique biologists | 95 |
unique gamers | 281 |
unique perfect users | 46 |
(2018-03-23)
Purpose: find out which questions can be clustered together
H1e Some answers to some questions are expected to be correlated in the pretest but not in the posttest.
H3a A correlation is expected between age and score. No correlation is expected between gender and score.
H3b A correlation is expected between age and motivation. No correlation is expected between gender and motivation.
H4a A correlation is expected between interest in biology and play duration.
H4b A correlation is expected between interest in games and play duration.
H4c A correlation is expected between age and play duration. No correlation is expected between gender and play duration.
H5b A correlation is expected between age, education, gaming profile and the understanding of implicitly taught content. No correlation is expected between gender and the understanding of implicitly taught content.
H6a A correlation is expected between the chapters reached and the score.
Note: the overlaid numbers indicate the absolute number of people who answered both the vertical and the horizontal question correctly.
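The overlay counts described in the note can be computed with a single boolean matrix product. A minimal sketch on made-up data (the answer matrix below is illustrative, not the study's dataset):

```python
import numpy as np

# Hypothetical binary answer matrix (rows = respondents, columns = questions);
# 1 = correct, 0 = incorrect. Values are illustrative, not the study data.
answers = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])

# Entry (i, j) of A^T @ A counts the respondents who answered both question i
# and question j correctly -- exactly the overlay numbers described above.
co_correct = answers.T @ answers
print(co_correct)  # diagonal entries are the per-question correct counts
```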
Purpose:
Purpose: correlate different types of knowledge, using the two clusters above:
Cluster 2: questions 19-21 concern aided device function; questions 26-27 concern general cell biology.
Hypothesis: cluster 2 is very easy for engaged, biology-proficient players, whereas cluster 1 is what the average player gets.
Purpose:
Purpose:
Linked Hypotheses:
"biologists" are respondents who answered positively to at least one of the following questions:
"gamers" are respondents who answered positively to at least one of the following questions:
category | p-value |
---|---|
all respondents | 1.43e-28 |
female | 2.25e-14 |
male | 2.64e-14 |
biologists | 5.16e-23 |
gamers | 3.72e-18 |
(2018-03-14)
Questions:
Can the users be clustered?
What are the most meaningful questions of the survey?
Purpose: same as for the correlation matrices
Conclusion: Accuracy is around 85%. Not bad but we expected better (17/01/2018)
Conclusion: Accuracy is around 80%. Not bad but we expected better (19/12/2017)
Conclusion: Score cannot be predicted by the table of RedMetrics data (30/01/2018)
Conclusion: Score cannot be predicted by the table of RedMetrics data + second degree polynomial (30/01/2018)
Conclusion: Tried different combinations, but cannot find any interesting regression (02/02/2018)
Conclusion: No (30/01/2018)
Conclusion: No (30/01/2018)
Conclusion: RedMetrics data can be used to predict answers to certain scientific questions (30/01/2018)
Conclusion: Low quality prediction (1/02/2018)
Conclusion: No apparent possible prediction (1/02/2018)
Conclusion: Inconclusive (01/02/2018)
Conclusion: No (01/02/2018)
Conclusion: No (01/02/2018)
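The prediction attempts above cannot be reproduced from this notebook alone, but their general shape can be sketched on synthetic data (the features, sample sizes, and classifier choice here are assumptions, not the study's actual pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey matrix: 100 "before" respondents answer
# each of 10 questions correctly with probability 0.2, 100 "after" respondents
# with probability 0.8. Entirely hypothetical numbers.
before = (rng.random((100, 10)) < 0.2).astype(int)
after = (rng.random((100, 10)) < 0.8).astype(int)
X = np.vstack([before, after])
y = np.array([0] * 100 + [1] * 100)  # 0 = before, 1 = after

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)  # training accuracy on easily separable toy data
print(f"training accuracy: {acc:.2f}")
```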
Conclusion: No interesting clustering (30/11/2017)
Conclusion: Two clusters: one small cluster of highly interested subjects with a very high level of correct answers (and high score), and one big cluster of average interest with a low level of correct answers (and low score). (30/01/2018)
Conclusion: No interesting clustering (30/11/2017)
Conclusion: No interesting clustering (16/01/2018)
Conclusion: The data could be clustered into two groups. Note: the silhouette coefficient probably never goes very high because most of the data is binary (30/11/2017)
Hypothesis: The two groups identified by the clustering algorithm correspond to the "before" and "after" questionnaires. Note: The temporality feature was not included in the clustering algorithm
Conclusion: Hypothesis verified. The parallel coordinates plot is not very informative because of the high number of features and the high proportion of binary features; use it only for data exploration (30/12/2017). It would be interesting to see whether the subjects predicted "before" while actually "after" share specific characteristics. (16/01/2018)
Conclusion: No interesting clustering (16/01/2018)
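The clustering runs above can be sketched with k-means plus the silhouette coefficient; the binary toy data below is an assumption standing in for the survey answers:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Two synthetic groups of binary answer vectors, stand-ins for low-scoring
# and high-scoring respondents (hypothetical probabilities).
low = (rng.random((60, 12)) < 0.15).astype(int)
high = (rng.random((60, 12)) < 0.85).astype(int)
X = np.vstack([low, high])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
sil = silhouette_score(X, km.labels_)
print(f"silhouette coefficient: {sil:.2f}")
# As noted above, mostly-binary features keep the silhouette well below 1.
```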
 | Predicted before | Predicted after |
---|---|---|
Actual undefined | 31 | 74 |
Actual after | 34 | 60 |
Actual before | 54 | 3 |
Conclusion: Compared to previous test, the undefined class is too big. (16/01/2018)
Conclusion: The data could be clustered into two groups, and the clustering is slightly better than with scientific questions coded by answers. Note: the silhouette coefficient probably never goes very high because most of the data is binary (01/12/2017)
Hypothesis: The two groups identified by the clustering algorithm correspond to the "before" and "after" questionnaires. Note: The temporality feature was not included in the clustering algorithm
 | Predicted after | Predicted before |
---|---|---|
Actual after | 68 | 26 |
Actual before | 6 | 51 |
Conclusion: Hypothesis verified. The parallel coordinates plot is not very informative because of the high proportion of binary features; use it only for data exploration. Better than with scientific questions coded by answers (16/01/2018)
Conclusion: The data could be clustered into two groups. Three groups could be interesting, but there are not enough data points in the third cluster to conclude. (30/11/2017)
 | Predicted after | Predicted before |
---|---|---|
Actual undefined | 84 | 21 |
Actual after | 68 | 26 |
Actual before | 6 | 51 |
Conclusion: Compared to the previous test, questionnaires completed neither just before nor just after the play test are not detected, but this does not impact the prediction of the before and after temporalities (01/12/2017)
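The before/after confusion tables above can be produced with `sklearn.metrics.confusion_matrix`; the labels below are made up for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual vs predicted temporality labels for six surveys.
actual = ["after", "after", "before", "before", "after", "before"]
predicted = ["after", "before", "before", "before", "after", "after"]

# Rows are actual classes, columns are predicted classes, both in the
# order given by labels=.
cm = confusion_matrix(actual, predicted, labels=["after", "before"])
print(cm)
```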
Conclusion: No interesting clustering (30/11/2017)
Conclusion: The data could be clustered in two groups (01/12/2017)
Conclusion: Could be clustered in two groups (17/01/2018)
Conclusion: No difference in score between groups, but a difference in behaviours. Perhaps the small group did not play much?
Conclusion: No interesting clustering (19/12/2017)
Conclusion: No interesting clustering (19/12/2017)
This experiment took place from April 10th to April 28th, 2018, at the Cité des Sciences in Paris.
Subjects were strongly encouraged to follow the protocol as far as they could, but some museum guests left before completing it.
In a first phase, from April 10th to April 25th, subjects were invited to fill in a survey, play a version of the game (labelled 1.52.2) for at least 20 minutes, and then fill in the survey again.
In a second phase, on April 27th and 28th, subjects were invited to follow the same protocol but with a slightly different version of the game, labelled 1.60, to test a hypothesis according to which players learn better when the game puzzles make more sense and when they get more feedback on their actions.
category | count |
---|---|
players | 193 |
respondents | 193 |
female respondents | 54 |
male respondents | 112 |
exploitable respondents | 181 |
female exploitable respondents | 51 |
male exploitable respondents | 105 |
twice respondents | 126 |
female twice respondents | 36 |
male twice respondents | 88 |
volunteers | 90 |
female volunteers | 24 |
male volunteers | 65 |
Conclusion: this analysis shows that 90 subjects out of the 193 participants could be used in the first phase of the study. Similarly, on the second phase,
This shows that, for various reasons, only a portion of the cohort can be used. This reduces the significance of the results and may even prevent the analysis from being conclusive on a set of questions.
Those reasons were identified as most likely being:
category | count |
---|---|
surveys | 180 |
unique users | 90 |
pretests | 90 |
posttests | 90 |
unique biologists | 0 |
unique gamers | 63 |
unique perfect users | 90 |
(2018-06-07)
Conclusion: diverse sample. The 10-25 age group and males are overrepresented compared to the French population. For in-class use, though, the age overrepresentation is not an issue. For online use, it matches
WIP
Completion time: minimal time taken for players to go from checkpoint n to checkpoint n+1.
Total time: total time spent by players in checkpoint n.
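These two metrics can be derived from a RedMetrics-style event log. The following sketch (column names and event values are assumptions, not the actual RedMetrics schema) computes, for each player, the time between first reaches of consecutive checkpoints:

```python
import pandas as pd

# Hypothetical checkpoint-reach events for two players.
events = pd.DataFrame({
    "player": ["p1", "p1", "p1", "p2", "p2"],
    "checkpoint": [0, 1, 2, 0, 1],
    "timestamp": pd.to_datetime([
        "2018-04-10 10:00", "2018-04-10 10:05", "2018-04-10 10:12",
        "2018-04-10 11:00", "2018-04-10 11:03",
    ]),
})

# First time each player reached each checkpoint.
firsts = (events.sort_values("timestamp")
                .groupby(["player", "checkpoint"], as_index=False)["timestamp"]
                .min())

# Minutes from first reaching checkpoint n-1 to first reaching checkpoint n,
# stored on the row of checkpoint n (NaN on each player's first checkpoint).
firsts["minutes_from_prev"] = (firsts.groupby("player")["timestamp"]
                                     .diff().dt.total_seconds() / 60)
print(firsts)
```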
Negative correlations between completion times and answers mean that users who take longer to solve a checkpoint tend to have trouble answering some questions.
The correlated groups inside the checkpoint-vs-checkpoint region can be explained by the fact that, up to the version of the game used in this experiment, two blocking puzzles prevented many players from finishing the game: the first between checkpoints 1 and 2, the second between checkpoints 4 and 5.
Purpose:
Linked Hypotheses:
"biologists" are respondents who answered positively to at least one of the following questions:
"gamers" are respondents who answered positively to at least one of the following questions:
score metric | posttest | pretest | progress |
---|---|---|---|
mean | 11.04 | 1.37 | 9.68 |
median | 12.00 | 1.00 | 11.00 |
std | 6.35 | 1.72 | 4.63 |
t test: statistic = -13.97, p-value = 2.08e-30
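This is a paired (related-samples) t-test on pretest vs posttest scores. A sketch with `scipy.stats.ttest_rel` on synthetic scores whose means roughly mimic the table above (the generated data is an assumption, so the exact statistic will differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic paired scores for 90 respondents, loosely mimicking the observed
# pretest (mean ~1.4) and posttest (mean ~11.0) distributions.
pretest = np.clip(rng.normal(1.4, 1.7, 90), 0, None)
posttest = np.clip(rng.normal(11.0, 6.3, 90), 0, None)

# Paired test: each respondent contributes one (pretest, posttest) couple.
result = stats.ttest_rel(pretest, posttest)
print(f"statistic={result.statistic:.2f}, pvalue={result.pvalue:.2e}")
```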
category | count | p-value |
---|---|---|
all respondents | 90 | 2.08e-30 |
females | 24 | 1.61e-07 |
males | 66 | 1.34e-24 |
biologists | 0 | - |
gamers | 63 | 5.28e-24 |
(2018-06-07)
Questions:
Can the users be clustered?
What are the most meaningful questions of the survey?
Purpose: same as for the correlation matrices
WIP
Cohort: all 90 respondents who answered the pretest and posttest exactly once and volunteered to answer the optional pretest questions. Each dot on these graphs is one survey answer; there are therefore 180 dots on each graph.
Observation: inconsistent results between the "score" graph and the pretest vs posttest graph. According to the score gradient, pretests should be grouped on the right of the graph and posttests on the left.
Moreover, no clustering appears on any question, contrary to preliminary results. There may be a bug in the representation of the data.
All the graphs are in this Google drive folder.
Cohort: respondents who filled in the survey before playing
Does not apply in this experiment's context.
What can games bring to the popularization of synthetic biology to citizens, in terms of interest and basic comprehension?
In this experiment, curiosity remained stable but, more importantly, the cohort ended up slightly more polarized.
a) the simplified link between genotype and phenotype,
b) BioBricks as genes' subcomponents,
c) the BioBrick simplified grammar: Promoter - RBS - Coding Sequence - Terminator,
d) the simplified role of each kind of brick: condition - quantity - function - end,
e) advanced notions: inducible promoters
a) 1 question: improvement, statistically significant, even with a strict grading policy.
b) does not apply: no question asked related to genes
c) 8 questions on devices: improvement, statistically significant, even with a strict grading policy.
d) 5 questions on BioBricks: icon-function association: improvement, statistically significant, but not with a strict grading policy. name-function association: not computed
e) 1, 3 or 9 questions: slight improvement, statistically significant, but not with a strict grading policy, in which case a strong decrease is observed, revealing that misconceptions were introduced.
Does gender/age/... correlate with synthetic biology game-based learning efficiency in terms of knowledge acquisition?
Does gender/age/... correlate with motivation in synthetic biology game-based learning?
Does interest in biology correlate with playing duration?
Does interest in games correlate with playing duration?
Does gender/age/... correlate with playing duration?
Is implicitly-taught content less well assimilated than explicitly-taught content?
Does it depend on demographics - gender/age/... or students/citizens or gamers/non-gamers?
How comparable are learning metrics computed from questionnaires and from automated remote tracking data?