bandicoot is an open-source Python toolbox to analyze mobile phone metadata. For more information, see http://bandicoot.mit.edu/
The source code of this notebook is available as demo.ipynb and as a plain Python script, demo.py. You can download both from our repository on GitHub at https://github.com/yvesalexandre/bandicoot/tree/master/demo
Try bandicoot on your phone!
If you want to try bandicoot with your own data, download our Android app at bandicoot.mit.edu/android
In [1]:
# Records for the user 'ego'
!head -n 5 data/ego.csv
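The records file is a plain CSV with one interaction per row. As a rough sketch of its shape, assuming the column layout interaction, direction, correspondent_id, datetime, call_duration, antenna_id, the standard library can parse such a file into typed dictionaries. The two rows below are hypothetical and are not taken from data/ego.csv; parse_records is an illustrative helper, not part of bandicoot.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the assumed column layout (not real data).
SAMPLE = """interaction,direction,correspondent_id,datetime,call_duration,antenna_id
call,out,contact_1,2014-03-02 07:13:30,120,tower_1
text,in,contact_2,2014-03-02 09:50:10,,tower_2
"""

def parse_records(text):
    """Parse CSV text into a list of dicts with typed fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row['datetime'] = datetime.strptime(row['datetime'], '%Y-%m-%d %H:%M:%S')
        # Texts have no duration: store None instead of an empty string.
        row['call_duration'] = int(row['call_duration']) if row['call_duration'] else None
        records.append(row)
    return records

records = parse_records(SAMPLE)
print(len(records), records[0]['interaction'], records[1]['call_duration'])
```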
In [2]:
# GPS locations of cell towers
!head -n 5 data/antennas.csv
In [3]:
import bandicoot as bc
U = bc.read_csv('ego', 'data/', 'data/antennas.csv')
In [4]:
import os
# Export the visualization to a 'viz' folder in the notebook's working directory
viz_path = os.path.join(os.getcwd(), 'viz')
bc.visualization.export(U, viz_path)
Out[4]:
In [5]:
from IPython.display import IFrame
IFrame("/files/viz/index.html", "100%", 700)
Out[5]:
In [6]:
bc.individual.percent_initiated_conversations(U)
Out[6]:
In [7]:
bc.spatial.number_of_antennas(U)
Out[7]:
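Conceptually, this indicator counts the distinct antennas that appear in the user's records. A minimal sketch over a hypothetical list of antenna identifiers, assuming records without a known antenna are skipped; bandicoot's own implementation may differ in such details.

```python
def number_of_antennas(antenna_ids):
    """Count distinct antennas, ignoring records without a location."""
    return len({a for a in antenna_ids if a is not None})

print(number_of_antennas(['t1', 't2', 't1', None, 't3']))  # → 3
```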
In [8]:
bc.spatial.radius_of_gyration(U)
Out[8]:
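The radius of gyration measures the typical distance between a user's locations and their center of mass: r_g = sqrt((1/N) * sum over i of d(r_i, r_cm)^2). A minimal sketch, assuming one (latitude, longitude) point per record, a center of mass taken as the plain mean of the coordinates, and great-circle (haversine) distances in kilometers; bandicoot's exact weighting and distance function may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def radius_of_gyration(points):
    """Root-mean-square distance (km) of points from their mean position."""
    if not points:
        return None
    center = (sum(p[0] for p in points) / len(points),
              sum(p[1] for p in points) / len(points))
    return math.sqrt(sum(haversine_km(p, center) ** 2 for p in points) / len(points))

# Two points one degree of longitude away from their center, at the equator.
print(round(radius_of_gyration([(0.0, 0.0), (0.0, 2.0)]), 1))  # → 111.2
```

Averaging raw coordinates is only a reasonable center of mass for points that are close together and away from the poles and the antimeridian, which is usually the case for one person's cell towers.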
The signature of the active_days indicator is:
bc.individual.active_days(user, groupby='week', interaction='callandtext', summary='default', split_week=False, split_day=False, filter_empty=True, datatype=None)
What does that mean?
Weekly aggregation
By default, bandicoot computes the indicators on a weekly basis and returns, in a nested dictionary, the average (mean) and the standard deviation (std) of the values over all available weeks.
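In other words, the indicator is computed once per week, and the weekly values are then summarized. A minimal sketch of that summary step over hypothetical per-week values; whether bandicoot uses the population or the sample standard deviation is not stated here, so this sketch assumes the population form (statistics.pstdev).

```python
import statistics

def summarize_weeks(weekly_values):
    """Return the mean and std of per-week indicator values."""
    return {'mean': statistics.mean(weekly_values),
            'std': statistics.pstdev(weekly_values)}

# Hypothetical number of active days in four consecutive weeks.
print(summarize_weeks([7, 5, 6, 6]))
```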
In [9]:
bc.individual.active_days(U)
Out[9]:
The groupby keyword controls the aggregation:
- groupby='week' to divide by week (the default),
- groupby='month' to divide by month,
- groupby=None to aggregate all values.
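The grouping can be pictured as bucketing records by a key derived from their timestamp. A minimal sketch, assuming weeks are keyed by ISO (year, week) and months by (year, month); group_records is a hypothetical helper, not part of bandicoot, and it works on bare timestamps rather than full records.

```python
from datetime import datetime

def group_records(timestamps, groupby='week'):
    """Bucket timestamps by ISO week, by month, or into one group (None)."""
    if groupby is None:
        return [sorted(timestamps)]
    if groupby == 'week':
        key = lambda t: t.isocalendar()[:2]   # (ISO year, ISO week)
    elif groupby == 'month':
        key = lambda t: (t.year, t.month)
    else:
        raise ValueError("groupby must be 'week', 'month' or None")
    groups = {}
    for t in sorted(timestamps):
        groups.setdefault(key(t), []).append(t)
    return list(groups.values())

stamps = [datetime(2014, 3, 3), datetime(2014, 3, 5),
          datetime(2014, 3, 12), datetime(2014, 4, 1)]
print([len(g) for g in group_records(stamps, 'week')])   # → [2, 1, 1]
print([len(g) for g in group_records(stamps, 'month')])  # → [3, 1]
print([len(g) for g in group_records(stamps, None)])     # → [4]
```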
In [10]:
bc.individual.active_days(U, groupby='week')
Out[10]:
In [11]:
bc.individual.active_days(U, groupby='month')
Out[11]:
In [12]:
bc.individual.active_days(U, groupby=None)
Out[12]:
Some indicators, such as active_days, return a single number per group. Others, such as call_duration, return a distribution.
The summary keyword can take three values:
- summary='default' to return the mean and standard deviation,
- summary='extended' to return, for the second type of indicators, the mean, sem, median, skewness and std of the distribution,
- summary=None to return the full distribution.
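A rough sketch of the three summary levels applied to one distribution, using only the standard library. The key names follow the list above, but the formulas (population standard deviation, population skewness, and sem computed as std / sqrt(n)) are assumptions for illustration and may not match bandicoot's exact definitions.

```python
import math
import statistics

def summarize(values, summary='default'):
    """Summarize a distribution: mean/std, an extended set, or the raw values."""
    if summary is None:
        return list(values)
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if summary == 'default':
        return {'mean': mean, 'std': std}
    if summary == 'extended':
        n = len(values)
        # Population skewness: third central moment divided by std cubed.
        skew = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std else 0.0
        return {'mean': mean, 'sem': std / math.sqrt(n),
                'median': statistics.median(values),
                'skewness': skew, 'std': std}
    raise ValueError("summary must be 'default', 'extended' or None")

durations = [30, 60, 60, 240]   # hypothetical call durations, in seconds
print(summarize(durations))
print(summarize(durations, 'extended'))
print(summarize(durations, None))
```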
In [13]:
bc.individual.call_duration(U)
Out[13]:
In [14]:
bc.individual.call_duration(U, summary='extended')
Out[14]:
In [15]:
bc.individual.call_duration(U, summary=None)
Out[15]:
In [16]:
bc.individual.active_days(U, split_week=True, split_day=True)
Out[16]:
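With split_week=True and split_day=True, the indicator is also broken down by weekday versus weekend and by day versus night. A minimal sketch of how a single timestamp could be assigned to those parts, assuming the weekend is Saturday and Sunday and the night runs from 19:00 to 07:00; these boundaries are assumptions of this sketch, and week_day_part is a hypothetical helper.

```python
from datetime import datetime, time

WEEKEND = {6, 7}                             # ISO weekdays: Saturday, Sunday (assumed)
NIGHT_START, NIGHT_END = time(19), time(7)   # assumed night boundaries

def week_day_part(t):
    """Return ('weekday' or 'weekend', 'day' or 'night') for a datetime."""
    week_part = 'weekend' if t.isoweekday() in WEEKEND else 'weekday'
    clock = t.time()
    # The night interval wraps around midnight.
    is_night = clock >= NIGHT_START or clock < NIGHT_END
    return week_part, 'night' if is_night else 'day'

print(week_day_part(datetime(2014, 3, 3, 12, 0)))   # → ('weekday', 'day')
print(week_day_part(datetime(2014, 3, 8, 23, 30)))  # → ('weekend', 'night')
```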
In [17]:
features = bc.utils.all(U, groupby=None)
In [18]:
features
Out[18]:
In [19]:
bc.to_csv(features, 'demo_export_user.csv')
bc.to_json(features, 'demo_export_user.json')
In [20]:
!head demo_export_user.csv
In [21]:
!head -n 15 demo_export_user.json
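The indicators returned by bc.utils.all are nested dictionaries, whereas a CSV row is flat. One common way to bridge the two, sketched here as an illustration of the general idea rather than bandicoot's own export code, is to join nested keys into a single column name. The '__' separator, the flatten helper and the sample values are hypothetical.

```python
import csv
import io

def flatten(d, parent_key='', sep='__'):
    """Flatten a nested dict, joining nested keys with sep."""
    items = {}
    for key, value in d.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# Hypothetical nested indicator output.
features = {'name': 'ego', 'active_days': {'mean': 5.0, 'std': 1.0}}
row = flatten(features)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```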
You can easily develop your own indicator using the @grouping decorator. You only need to write a function that takes a list of records as input and returns an integer, or a list of integers for a distribution. The @grouping decorator wraps the function and calls it for each group of records (by default, each week).
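To make the mechanism concrete, here is a stripped-down, hypothetical imitation of such a decorator in plain Python. It is not bandicoot's implementation: it assumes records are simple objects with interaction, datetime and call_duration attributes, keeps only the requested interaction type, groups the records by ISO week, calls the wrapped function once per week, and summarizes the weekly results with a mean and a population standard deviation.

```python
import statistics
from collections import namedtuple
from datetime import datetime
from functools import wraps

# Hypothetical minimal record type for this sketch.
Record = namedtuple('Record', 'interaction datetime call_duration')

def toy_grouping(interaction='call'):
    """Hypothetical decorator: call f once per ISO week, then summarize."""
    def decorator(f):
        @wraps(f)
        def wrapper(records):
            weeks = {}
            for rec in sorted(records, key=lambda r: r.datetime):
                if rec.interaction == interaction:
                    weeks.setdefault(rec.datetime.isocalendar()[:2], []).append(rec)
            values = [f(group) for group in weeks.values()]
            if not values:
                return {'mean': None, 'std': None}
            return {'mean': statistics.mean(values),
                    'std': statistics.pstdev(values)}
        return wrapper
    return decorator

@toy_grouping(interaction='call')
def toy_shortest_call(records):
    return min(r.call_duration for r in records)

records = [Record('call', datetime(2014, 3, 3, 9), 40),
           Record('call', datetime(2014, 3, 4, 9), 20),
           Record('text', datetime(2014, 3, 5, 9), None),
           Record('call', datetime(2014, 3, 11, 9), 60)]
print(toy_shortest_call(records))  # → {'mean': 40, 'std': 20.0}
```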
In [22]:
from bandicoot.helper.group import grouping

@grouping(interaction='call')
def shortest_call(records):
    # Duration of every call in the current group of records
    durations = (r.call_duration for r in records)
    return min(durations)
In [23]:
shortest_call(U)
Out[23]:
In [24]:
shortest_call(U, split_day=True)
Out[24]: