By the end of this notebook, you will be expected to:
- Manually perform exploratory data analysis on call data;
- Leverage the Bandicoot module to automate analysis; and
- Know resources for building your own Funf applications.
- Exercise 1: Using Bandicoot for analysis.
- Exercise 2: Interpreting calls of zero duration in call records.
This notebook introduces two tools that will be discussed in detail in upcoming video content. You can complete the exercise using the sample dataset or generate your own using the instructions below, if you have access to an Android device.
To demonstrate the different lengths of time it takes to gain insights when performing an analysis, you will start to explore the provided dataset (or your own) through a manual analysis cycle, before switching to automated analysis using the Bandicoot framework. You will be introduced to this tool in more detail in Module 5.
In [ ]:
import pandas as pd
import numpy as np
import matplotlib
import os
import bandicoot as bc
from IPython.display import IFrame
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10, 8)
This notebook begins with an example dataset, on which you will perform similar activities to those demonstrated in Section 1 of Module 2's Notebook 2. If you are comfortable doing so, you are welcome to share your dataset with fellow students who do not have access to Android devices.
Building applications is a separate topic, and you will begin by using another open-source project from MIT to process your data into a format that can be utilized for analysis.
Bandicoot is an open-source Python toolbox, which analyzes mobile phone metadata. This section demonstrates how it can be used to collect your own data. Additional examples, as well as how Bandicoot is used to analyze mobile phone data, will be demonstrated in Module 5.
Important:
The demonstration below requires the use of an Android phone. If you do not have access to an Android phone, you can use the provided file "metadata_sample.csv", located in the "data" directory under "module_2", for your analysis. Bandicoot is not available on Apple phones due to restrictions in the operating system.
If you have an Android phone, you can export your own metadata by following these steps:
Note:
You can upload files from the directory view in your Jupyter notebook. Ensure that you select the file and then click "upload" to start the upload process.
First, load the supplied CSV file using additional options in the Pandas read_csv function. It is possible to set the index, and instruct the function to parse the datetime column when loading the file. You can read more about the function in the Pandas documentation.
In [ ]:
# Load the dataset.
# You can change the filename to reflect the name of your generated dataset,
# if you downloaded the application in the previous step. The default filename
# is "metadata.csv".
calls = pd.read_csv("data/metadata_sample.csv", parse_dates=['datetime'],
                    index_col=['datetime'])
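To see what the `parse_dates` and `index_col` options do without touching the course file, here is a minimal sketch that reads a few made-up rows from an in-memory string. The column names and values are assumptions for illustration, modeled on the record layout that Bandicoot works with, and may not match the sample file exactly.

```python
import io
import pandas as pd

# Hypothetical records; the columns are assumed, not copied from the sample file.
raw_csv = io.StringIO(
    "interaction,direction,correspondent_id,datetime,call_duration\n"
    "call,in,A,2014-03-02 07:13:30,211\n"
    "text,out,B,2014-03-02 09:45:00,\n"
    "call,out,A,2014-03-03 18:02:10,0\n"
)

# parse_dates converts the column to timestamps; index_col makes it the index.
sample = pd.read_csv(raw_csv, parse_dates=['datetime'],
                     index_col=['datetime'])

# The index is now a DatetimeIndex, so time-based attributes are available.
print(type(sample.index).__name__)
print(sample.index.min(), sample.index.max())
```

With a datetime index in place, attributes such as the week number can be derived directly from each timestamp.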
Review the data.
In [ ]:
calls.head(5)
In [ ]:
# Add a column where the week is derived from the datetime column.
calls['week'] = calls.index.map(
    lambda observation_timestamp: observation_timestamp.week)
# Display the head of the new dataset.
calls.head(5)
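The same week number can also be derived without a per-row lambda. The sketch below uses a small made-up index rather than the course data, and assumes pandas 1.1 or later, where `DatetimeIndex.isocalendar()` is available.

```python
import pandas as pd

# Hypothetical timestamps standing in for the index of `calls`.
demo = pd.DataFrame(
    {"interaction": ["call", "text", "call"]},
    index=pd.to_datetime(["2014-03-02 07:13:30",
                          "2014-03-03 18:02:10",
                          "2014-03-10 12:00:00"]),
)

# isocalendar() returns year, week, and day columns aligned with the index.
demo["week"] = demo.index.isocalendar()["week"]

print(demo)
```

Note that ISO weeks start on a Monday, so a Sunday and the following Monday fall in different weeks.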
In [ ]:
calls.interaction.unique()
In [ ]:
vis = calls.hist()
While libraries such as Pandas are great at general data wrangling and analysis, for bespoke applications, this method of analysis can be somewhat tedious. In many cases you have to define what it is that you would like to visualize, and then manually complete the steps. This is where Bandicoot comes in. Using a module that has been created specifically to look at a certain type of data (in this example, mobile phone data) can save you a significant amount of time.
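To make the manual effort concrete, here is a sketch of computing just three weekly indicators by hand with Pandas. The records and column names (`interaction`, `correspondent_id`, `call_duration`, `week`) are assumptions for illustration; adjust them to match your dataset.

```python
import pandas as pd

# Hypothetical records; replace with your own `calls` DataFrame.
records = pd.DataFrame(
    {
        "interaction": ["call", "call", "text", "call", "text"],
        "correspondent_id": ["A", "B", "A", "A", "C"],
        "call_duration": [211.0, 35.0, None, 120.0, None],
        "week": [9, 9, 9, 10, 10],
    }
)

# Each indicator has to be named, defined, and aggregated explicitly.
weekly = records.groupby("week").agg(
    interactions=("interaction", "count"),
    unique_contacts=("correspondent_id", "nunique"),
    mean_call_duration=("call_duration", "mean"),
)

print(weekly)
```

Every additional indicator means another hand-written aggregation, which is the repetitive work a domain-specific library is meant to take over.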
More content about Bandicoot will be provided in Module 5. However, the following section will give you an idea of how powerful these tools are when used correctly.
In [ ]:
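# Load the input file.
# bc.read_csv takes a user ID (the CSV filename without its ".csv" extension),
# followed by the directory that holds the records. Here the path is folded
# into the user ID, and the records directory is left empty.
U = bc.read_csv("data/metadata_sample", "")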
# Export the visualization to a new directory, "viz".
bc.visualization.export(U, "viz")
Note:
Please note that this question has been changed and is not covered in the supplied video overview. The original question has been removed.
Application- and domain-specific libraries are used to accelerate development and take care of repetitive tasks. List three points that you think are important to consider when using libraries such as Pandas, Funf, Bandicoot, or any other library. The points raised can be advantages, disadvantages, or observations, and the description should be limited to one sentence per point addressed.
Your markdown answer here.
Exercise complete:
This is a good time to "Save and Checkpoint".
In Section 1.5.1, the function automatically removes calls with a zero duration. Consider use cases where a zero-length call would have a specific meaning, and where you would be interested in retaining these records.
Note: In the majority of cases, these records would be removed. However, think about the contents of the data set and what calls of zero duration may signify, if the records were based on your own behavior.
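Before deciding whether to keep or drop such records, it can help to look at them directly. The sketch below isolates zero-duration calls in a small made-up table. The column names are assumptions modeled on the Bandicoot record layout, and the code does not reproduce Bandicoot's own filtering.

```python
import pandas as pd

# Hypothetical records; replace with your own `calls` DataFrame.
records = pd.DataFrame(
    {
        "interaction": ["call", "call", "text", "call", "call"],
        "direction": ["in", "out", "in", "in", "out"],
        "call_duration": [211.0, 0.0, None, 0.0, 35.0],
    }
)

# Keep only calls (not texts) whose recorded duration is exactly zero.
is_call = records["interaction"] == "call"
zero_calls = records[is_call & (records["call_duration"] == 0)]

# Count the zero-duration calls by direction.
by_direction = zero_calls["direction"].value_counts()
print(by_direction)
```

Splitting the counts by direction is one starting point for reasoning about what these records might represent in your own data.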
Your answer here.
Exercise complete:
This is a good time to "Save and Checkpoint".
In Module 5, you will be introduced to Bandicoot in more detail, and explore and use additional features of this library.
Funf was introduced in the video content of this module. You are welcome to review the code on GitHub, download it and create your own application. Funf is a fork of the popular Funf Open Sensing Framework. You can visit the original project's wiki for more detail about architecture design, documentation, and scripts for processing collected data.
Students interested in building their own applications can obtain the source from the supplied links. The documentation provides instructions on how to run and modify the source. You can create an application to collect your own data, but you will require access to the following additional components: