Data


In [1]:
import pandas as pd

In [2]:
path = '../.server/media/exports/robomission-2017-11-17/{entity}.csv'
def load_entity(name):
    # Load one exported entity table, indexed by its id column.
    return pd.read_csv(path.format(entity=name), index_col='id')

Tasks


In [4]:
load_entity('tasks').head()


Out[4]:
name level setting solution
id
1 diamonds-in-meteoroid-cloud repeat {"fields": [[["b", []], ["b", []], ["b", []], ... R4{sr}f
2 direction-change while {"fields": [[["b", []], ["b", []], ["b", []], ... W!y{f}W!b{l}
3 arrow loops {"fields": [[["b", []], ["b", []], ["b", []], ... sR3{l}slR2{f}R4{r}R4{l}f
4 diamond-lines comparing {"fields": [[["b", []], ["b", []], ["b", []], ... W!b{Ix=3{f}r}
5 ladder repeat {"fields": [[["b", []], ["b", ["A"]], ["b", ["... W!b{fs}

Levels


In [4]:
load_entity('levels')


Out[4]:
level name credits toolbox tasks
id
1 1 moves 6 fly ['turning-right-and-left', 'three-steps-forwar...
2 2 world 25 shoot ['dont-forget-shot', 'wormhole-demo', 'shootin...
3 3 repeat 40 repeat ['find-the-path', 'diamonds-in-meteoroid-cloud...
4 4 while 60 while ['zig-zag', 'yellow-hint', 'direct-flight-ahea...
5 5 loops 100 loops ['zig-zag-plus', 'big-slalom', 'color-slalom',...
6 6 if 150 loops+if ['diamonds-with-signals', 'two-diamonds', 'fol...
7 7 comparing 200 loops+if+position ['diamond-lines', 'slalom-position-testing', '...
8 8 if-else 300 loops+if+else ['colorful-flowers', 'narrow-passage', 'diamon...
9 9 final-challenge 1000 complete ['triple-slalom', 'cross-2', 'diagonal-diamond...

Toolboxes


In [5]:
load_entity('toolboxes')


Out[5]:
name blocks
id
1 fly ['fly']
2 shoot ['fly', 'shoot']
3 repeat ['fly', 'shoot', 'repeat']
4 while ['fly', 'shoot', 'while', 'color']
5 loops ['fly', 'shoot', 'repeat', 'while', 'color']
6 loops+if ['fly', 'shoot', 'repeat', 'while', 'color', '...
7 loops+if+position ['fly', 'shoot', 'repeat', 'while', 'color', '...
8 loops+if+else ['fly', 'shoot', 'repeat', 'while', 'color', '...
9 complete ['fly', 'shoot', 'repeat', 'while', 'color', '...

Students


In [6]:
load_entity('students').head()


Out[6]:
credits seen_instructions
id
1 35 ['env.space-world', 'env.toolbox', 'env.snappi...
2 0 []
3 39 ['env.space-world', 'env.toolbox', 'env.snappi...
4 4 ['env.space-world', 'env.toolbox', 'env.snappi...

Task Sessions


In [7]:
load_entity('task_sessions').head()


Out[7]:
student task solved start end time_spent
id
5 1 36 False 2017-11-03T14:18:33.507352Z 2017-11-03T14:18:33.507363Z 0
6 1 21 False 2017-11-03T14:41:10.792866Z 2017-11-03T15:04:01.947104Z 1371
7 1 22 True 2017-11-03T15:04:38.763391Z 2017-11-03T16:08:13.435480Z 3814
8 1 33 True 2017-11-03T16:24:48.621123Z 2017-11-03T16:25:02.587917Z 13
9 1 12 True 2017-11-03T16:38:50.718301Z 2017-11-03T16:39:09.866528Z 19
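
In the five rows shown, time_spent looks like the gap between start and end in whole seconds, with the fractional part dropped. This is an inference from the sample, not a documented definition. A minimal sketch that recomputes it with the standard library (the helper name seconds_between is hypothetical):

```python
from datetime import datetime

# Timestamp layout as displayed in the export (UTC, microsecond precision).
FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def seconds_between(start, end):
    """Whole seconds elapsed between two timestamp strings (fraction dropped)."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return int(delta.total_seconds())

# Task sessions 6 and 8 from the table above.
session_6 = seconds_between('2017-11-03T14:41:10.792866Z', '2017-11-03T15:04:01.947104Z')
session_8 = seconds_between('2017-11-03T16:24:48.621123Z', '2017-11-03T16:25:02.587917Z')
print(session_6, session_8)
```

Both values agree with the time_spent column for those rows; other rows of the export may follow a different rule.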

Program Snapshots


In [8]:
snapshots = load_entity('program_snapshots')
snapshots.head(14)


Out[8]:
task_session time program granularity order correct time_from_start time_delta
id
31 6 2017-11-03T15:03:52.665539Z f edit 1 NaN 1361 1361
32 6 2017-11-03T15:04:02.015296Z fr edit 2 NaN 1371 10
33 7 2017-11-03T15:04:41.870822Z f edit 1 NaN 3 3
34 7 2017-11-03T15:04:43.445636Z ff edit 2 NaN 4 1
35 7 2017-11-03T16:03:06.944434Z f edit 3 NaN 3508 3504
36 7 2017-11-03T16:04:48.070260Z f edit 4 NaN 3609 101
37 7 2017-11-03T16:04:49.945154Z f edit 5 NaN 3611 2
38 7 2017-11-03T16:06:07.973780Z f edit 6 NaN 3689 78
39 7 2017-11-03T16:06:10.865586Z f edit 7 NaN 3692 3
40 7 2017-11-03T16:07:35.284635Z f edit 8 NaN 3776 84
41 7 2017-11-03T16:07:37.450566Z f execution 1 False 3778 3778
42 7 2017-11-03T16:08:02.683180Z ff edit 9 NaN 3803 27
43 7 2017-11-03T16:08:04.182221Z fff edit 10 NaN 3805 2
44 7 2017-11-03T16:08:13.565203Z fff execution 2 True 3814 36
  • there are 2 granularity levels: {'edit', 'execution'}
  • correct field is only set for execution snapshots
  • order and time_delta are computed per granularity within each task session
  • time_delta = number of seconds since the previous snapshot of the same granularity (for the first such snapshot, since the start of the task session)
  • time_from_start = number of seconds from the start of the task session
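
The rules above can be sketched with a pandas groupby. This is an illustration of how order and time_delta could be recomputed from time_from_start, not necessarily how the export computes them. It assumes rows are already sorted by time, and because time_from_start is in whole seconds, recomputed deltas may be off by a second on other rows.

```python
import pandas as pd

# Rows copied from the sample above (task sessions 6 and 7).
snapshots = pd.DataFrame({
    'task_session': [6, 6] + [7] * 12,
    'granularity': ['edit'] * 10 + ['execution', 'edit', 'edit', 'execution'],
    'time_from_start': [1361, 1371, 3, 4, 3508, 3609, 3611, 3689,
                        3692, 3776, 3778, 3803, 3805, 3814],
})

keys = ['task_session', 'granularity']

# order: 1-based position within each (task session, granularity) group.
snapshots['order'] = snapshots.groupby(keys).cumcount() + 1

# time_delta: seconds since the previous snapshot in the same group;
# the first snapshot of a group is measured from the session start (0).
previous = snapshots.groupby(keys)['time_from_start'].shift()
snapshots['time_delta'] = (snapshots['time_from_start'] - previous.fillna(0)).astype(int)

print(snapshots)
```

For these 14 rows the recomputed columns match the exported ones.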

Actions

The actions time series records every action we model. The state of all other entities is determined by the initial state (static data, such as tasks and levels) together with the actions. For most analyses, it's easier to work with derived data (such as task sessions) than with raw actions.

Actions are semi-structured: there are some common fields (e.g. name, time, student, task) and an unstructured, action-specific dictionary in the data column.
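
A minimal sketch of working with the data column, assuming it is stored in the CSV as the string form of a Python dict (as the single quotes in the displayed output suggest). The rows are copied from the sample, and parse_data is a hypothetical helper name.

```python
import ast
from collections import Counter

# Three actions shaped like the sample output; `data` is still a string here.
rows = [
    {'name': 'start-task', 'data': "{'task_session_id': 7}"},
    {'name': 'edit-program', 'data': "{'program': 'f', 'task_session_id': 7}"},
    {'name': 'edit-program', 'data': "{'program': 'ff', 'task_session_id': 7}"},
]

def parse_data(raw):
    """Turn the string form of a dict into a real dict without using eval()."""
    return ast.literal_eval(raw)

parsed = [dict(row, data=parse_data(row['data'])) for row in rows]

# Common fields are always present; action-specific keys must be checked.
programs = [row['data']['program'] for row in parsed if 'program' in row['data']]

# Count actions per (task session, action name).
actions_per_session = Counter(
    (row['data']['task_session_id'], row['name']) for row in parsed
)
print(programs, actions_per_session)
```

On the loaded DataFrame the same parsing could be applied with actions['data'].apply(ast.literal_eval). If the column turns out to hold JSON instead, json.loads would be the appropriate parser.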

Note: Although task was chosen as a common field, not all actions have it (namely, watch-instruction actions don't). As a result, there are some NaNs in this column, and pandas stores it as floats (a NumPy integer array cannot hold NaN).
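
The effect can be reproduced on a small, made-up column. The sketch below also shows pandas' opt-in nullable integer dtype ('Int64', available since pandas 0.24), which is one way to keep whole-number task ids. The notebook itself does not use it.

```python
import pandas as pd

# Hypothetical miniature of the `task` column: the last action has no task.
task = pd.Series([22, 22, None], name='task')
print(task.dtype)  # float64, because the missing value becomes NaN

# The nullable integer dtype keeps integers and marks the gap as <NA>.
task_int = task.astype('Int64')
print(task_int)
```

The conversion only succeeds when every non-missing value is a whole number, which holds for ids.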


In [9]:
actions = load_entity('actions')
actions[20:30]


Out[9]:
name student task time randomness data
id
65 start-task 1 22.0 2017-11-03T15:04:38.837156Z 846556146 {'task_session_id': 7}
66 edit-program 1 22.0 2017-11-03T15:04:41.929466Z 973306676 {'program': 'f', 'task_session_id': 7}
67 edit-program 1 22.0 2017-11-03T15:04:43.504214Z 256778077 {'program': 'ff', 'task_session_id': 7}
68 edit-program 1 22.0 2017-11-03T16:03:07.001961Z 809371847 {'program': 'f', 'task_session_id': 7}
69 edit-program 1 22.0 2017-11-03T16:04:48.129308Z 284165005 {'program': 'f', 'task_session_id': 7}
70 edit-program 1 22.0 2017-11-03T16:04:50.003349Z 892026422 {'program': 'f', 'task_session_id': 7}
71 edit-program 1 22.0 2017-11-03T16:06:08.031688Z 273533843 {'program': 'f', 'task_session_id': 7}
72 edit-program 1 22.0 2017-11-03T16:06:10.923623Z 141552777 {'program': 'f', 'task_session_id': 7}
73 edit-program 1 22.0 2017-11-03T16:07:35.343301Z 1039088847 {'program': 'f', 'task_session_id': 7}
74 run-program 1 22.0 2017-11-03T16:07:37.508914Z 867039629 {'task_session_id': 7, 'program': 'f', 'correc...

In [10]:
# Currently, there are only 4 types of action:
set(actions.name)


Out[10]:
{'edit-program', 'run-program', 'start-task', 'watch-instruction'}