Cerebral Cortex Data Analysis Algorithms

Cerebral Cortex contains a library of algorithms that are useful for processing data and converting it into features or biomarkers. This page demonstrates a simple GPS clustering algorithm. For more details about the algorithms that are available, please see our documentation. These algorithms are constantly being developed and improved through our own work and the work of other researchers.

Initialize the system


In [ ]:
%reload_ext autoreload
from util.dependencies import *
from settings import USER_ID

CC = Kernel("/home/md2k/cc_conf/")

Generate some sample location data

This example uses a data generator, which protects the privacy of real participants and lets anyone using this system explore the data without institutional review board (IRB) approval. The generator call is commented out in this demonstration to avoid creating too much data at once.
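The generator's internals are not shown on this page. As a rough, hypothetical illustration of the idea, the sketch below fabricates GPS-like samples by jittering points around a few made-up anchor locations. The function name `synthetic_gps_samples`, the anchor coordinates, the one-minute sampling interval, and the value ranges are all assumptions for the example, not the behavior of `gen_location_datastream`.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical anchor "places" as (latitude, longitude); not real participant data.
ANCHORS = [(35.1495, -90.0490), (35.1175, -89.9711)]


def synthetic_gps_samples(n, seed=0, jitter_deg=0.0005):
    """Return n fake GPS samples scattered around the hypothetical anchors.

    Illustrative sketch only; this is NOT Cerebral Cortex's gen_location_datastream.
    A fixed seed makes the output reproducible.
    """
    rng = random.Random(seed)
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    samples = []
    for i in range(n):
        lat0, lon0 = rng.choice(ANCHORS)
        samples.append({
            "timestamp": start + timedelta(seconds=60 * i),  # one sample per minute (assumed)
            "latitude": lat0 + rng.uniform(-jitter_deg, jitter_deg),
            "longitude": lon0 + rng.uniform(-jitter_deg, jitter_deg),
            "altitude": rng.uniform(60.0, 90.0),
            "speed": rng.uniform(0.0, 1.5),
            "bearing": rng.uniform(0.0, 360.0),
            "accuracy": rng.uniform(3.0, 20.0),
        })
    return samples
```

Because every value is drawn from a seeded generator, the same seed always yields the same "participant", which makes demos repeatable without exposing anyone's real movements.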


In [ ]:
# gen_location_datastream(CC, user_id=USER_ID, stream_name="GPS--org.md2k.phonesensor--PHONE")

Get stream data

Read the demo GPS stream and show some example values. A typical GPS sample contains values for latitude, longitude, altitude, speed, bearing, and accuracy.
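To make the field list concrete, here is a hypothetical record type for one sample with a simple range check. The class name `GpsSample`, the units, and the `is_plausible` rules are assumptions for illustration, not the actual schema of the Cerebral Cortex stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpsSample:
    """One GPS reading with the fields listed above.

    Units are assumptions: degrees for latitude, longitude, and bearing;
    metres for altitude and accuracy; metres per second for speed.
    """
    latitude: float
    longitude: float
    altitude: float
    speed: float
    bearing: float
    accuracy: float

    def is_plausible(self) -> bool:
        """Return True if every checked field falls in a physically sensible range."""
        return (
            -90.0 <= self.latitude <= 90.0
            and -180.0 <= self.longitude <= 180.0
            and self.speed >= 0.0
            and 0.0 <= self.bearing < 360.0
            and self.accuracy > 0.0
        )
```

Dropping implausible readings like these before clustering is a common cleaning step, since a single corrupt coordinate can pull a centroid far from where a participant actually was.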


In [ ]:
gps_stream = CC.get_stream("GPS--org.md2k.phonesensor--PHONE")
gps_stream.show(3)
gps_stream.summary()

Cluster the location data

Cerebral Cortex makes it easy to apply built-in algorithms to data streams. In this case, gps_clusters is imported from the algorithm library, and the stream's compute method runs it on gps_stream to generate a set of centroids. This is the general pattern for applying an algorithm to a datastream, and it lets researchers apply validated and tested algorithms to their own data without becoming experts in the particular set of transformations involved.
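This page does not say which clustering method gps_clusters uses. Purely to illustrate what such an algorithm computes, the sketch below implements a minimal DBSCAN over (latitude, longitude) pairs using haversine distance and returns the mean coordinate of each cluster. DBSCAN, the function names `haversine_m` and `dbscan_centroids`, and the `eps_m`/`min_samples` defaults are assumptions chosen for the example, not the library's implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan_centroids(points, eps_m=50.0, min_samples=3):
    """Cluster (lat, lon) points with a minimal DBSCAN; return one centroid per cluster.

    Hypothetical illustration, not Cerebral Cortex's gps_clusters.
    Noise points are discarded. Centroids are the arithmetic mean of member
    coordinates, adequate for clusters spanning tens of metres.
    The O(n^2) neighbour search is fine for a demo, not for big data.
    """
    n = len(points)
    # Each point's neighbourhood includes itself (distance 0).
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
        cluster_id += 1
    centroids = []
    for k in range(cluster_id):
        members = [points[i] for i in range(n) if labels[i] == k]
        lat = sum(p[0] for p in members) / len(members)
        lon = sum(p[1] for p in members) / len(members)
        centroids.append((lat, lon))
    return centroids
```

A density-based method is a natural fit for location data because the number of frequently visited places is not known in advance, and isolated fixes recorded while travelling are treated as noise rather than forced into a cluster.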

Note: the compute method engages the parallel computation capabilities of Cerebral Cortex: all the data is read from the data storage layer and processed across every computational core available to the system. This lets the computation run as quickly as possible and take advantage of powerful clusters through a relatively simple interface. This capability is critical for mobile sensor big data, where a single datastream can exceed hundreds of gigabytes in larger studies.
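Cerebral Cortex's own parallel engine is not described on this page. To illustrate the underlying idea of splitting a stream into partitions and processing each on a separate core, here is a standard-library sketch. The names `partition_by_key` and `compute_per_partition`, and partitioning by participant, are hypothetical choices for the example, not the library's API.

```python
from concurrent.futures import ProcessPoolExecutor


def partition_by_key(records, key):
    """Group records into {key(record): [records...]}, preserving input order."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return groups


def compute_per_partition(partitions, fn, max_workers=None):
    """Apply fn to each partition's records and return {partition_key: result}.

    Hypothetical sketch of partition-parallel processing.
    With max_workers == 1 the work runs serially in this process; otherwise
    partitions are sent to a ProcessPoolExecutor so each CPU core handles a
    share (max_workers=None lets the pool use one worker per core).
    fn and the records must be picklable, e.g. fn is a top-level function.
    """
    keys = list(partitions)
    if max_workers == 1:
        results = map(fn, (partitions[k] for k in keys))
    else:
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            # Materialize results before the pool shuts down.
            results = list(pool.map(fn, (partitions[k] for k in keys)))
    return dict(zip(keys, results))
```

For example, grouping GPS records by participant and passing a top-level clustering function as fn would cluster each participant's data on its own worker. When using the process pool from a script, call it under an `if __name__ == "__main__":` guard, which platforms that spawn worker processes require.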


In [ ]:
from cerebralcortex.algorithms import gps_clusters
centroids = gps_stream.compute(gps_clusters)
centroids.show(truncate=False)

Visualize GPS Data

GPS Stream Plot

GPS visualization requires dedicated plotting capabilities, so Cerebral Cortex includes a library for interactive exploration. In this plot, drag the map with your mouse and zoom in to explore specific data points.


In [ ]:
gps_stream.plot_gps_cords(zoom=8)

Centroids Stream Plot

This plot shows only the centroid locations from the clustering algorithm.


In [ ]:
centroids.plot_gps_cords(zoom=12)