Sparkle Data Tutorial

The purpose of this document is to walk you through how you can access data collected with SPARKLE (or Batlab!) offline, with your own code.

For the purpose of this tutorial, we will be a small sample data file generated by Sparkle.

Because the online doc for the API is not able to build completely, we are going to create a local build of the documentation to use as a reference. The directions on how to build a local copy are on the online doc.

This data uses a sample data file, which can be found in the sparkle source under test/sample/tutorial_data.hdf5.

Loading Data Files

The doc for sparkle.data.open says we can use open_acqdata to load our data.

So let's import that, along with some of the normal packages we use to manipulate and plot data:



In [1]:

    
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sparkle.data.open import open_acqdata
from test import sample









    



Using default binding PyQt4



In [2]:

    
# this function reutrns the absolute filepath to some sample data that is stored in the sparkle package
data_file_path = sample.tutorialdata()
print 'opening data file', data_file_path
data = open_acqdata(data_file_path, filemode='r')
print 'data object', data









    



opening data file /home/leeloo/src/wsu/sparkle/test/sample/tutorial_data.hdf5
data object <sparkle.data.hdf5data.HDF5Data object at 0x366a390>

We can look at what is in the data file by using methods of AcquisitionData, as described in the documentation. Two such methods are keys, which gets all the high level groups of the data file, and dataset_names, which searches the data file for all datasets and returns the full keys:



In [3]:

    
print 'Keys:', data.keys()
print 'Data sets:', data.dataset_names()









    



Keys: [u'calibration_1', u'segment_1', u'segment_2']
Data sets: [u'calibration_1/calibration_intensities', u'calibration_1/reference_tone', u'calibration_1/signal', u'segment_1/test_1', u'segment_2/test_2']

Notice that all of the datasets are actually nested under other groupings, we will need to refer to their full path to retrieve them.

All data gathered on SPARKLE, will be nested under a segement. a segment is a group of tests that were run at the same time. The test numbers still increment regardless of segment, so that no two tests should have the same name.

Data gathered on Batlab does not have segments, and will just be a collection of tests.

Retrieving Metadata

We can get metadata for our groups and data sets by using the get_info method:



In [4]:

    
print 'segment group info:', data.get_info('segment_1')









    



segment group info: {u'comment': '', u'calibration_range': array([   4000.,  110000.]), u'calibration_used': 'calibration_1', u'samplerate_ad': 50000.0, u'cellid': 1}

A test segment holds information that is common to all tests in the group, such as recording sample rate. There is a ton of info saved with each test. All the information needed to recreate the stimuli are here, one for each trace. Since the get_info method returns a dict, we can use the keys method to see what info is there, without spitting all of the values to output.



In [5]:

    
print data.get_info('segment_1/test_1').keys()









    



[u'stim', u'user_tag', u'reps', u'calv', u'caldb', u'start', u'mode', u'testtype']

We can look at any of these values one at a time, if we like:



In [6]:

    
test_info = data.get_info('segment_1/test_1')
print 'reps', test_info['reps']
print 'testtype', test_info['testtype']









    



reps 3
testtype Custom

The stim key here is where all the info for the stimuli are stored. It is a JSON formatted string, and instead of parsing it ourselves, there is a special function for sparsing the stimulus info into a dict:



In [7]:

    
from pprint import pprint # pretty print for better formatting of dicts

stim_info = data.get_trace_stim('segment_1/test_1')
print 'number of traces', len(stim_info)
# the first trace is always the control with silence, so lets look at the first two
pprint(stim_info[1:4])









    



number of traces 4
[{u'components': [{u'browsedir': u'',
                   u'duration': 0.036,
                   u'filename': u'C:/shared/MouseVocalizations/IC pharm/A4-09-sylb63.call1',
                   u'index': [0, 0],
                   u'intensity': 65.0,
                   u'risefall': 0.003,
                   u'start_s': 0,
                   u'stim_type': u'Vocalization'}],
  u'overloaded_attenuation': 0,
  u'samplerate_da': 333333,
  u'time_stamps': [1432364421.256, 1432364421.505, 1432364421.756]},
 {u'components': [{u'browsedir': u'',
                   u'duration': 0.036,
                   u'filename': u'C:/shared/MouseVocalizations/IC pharm/A4-11-sylb10.call1',
                   u'index': [0, 0],
                   u'intensity': 65.0,
                   u'risefall': 0.003,
                   u'start_s': 0,
                   u'stim_type': u'Vocalization'}],
  u'overloaded_attenuation': 0,
  u'samplerate_da': 333333,
  u'time_stamps': [1432364446.274, 1432364446.524, 1432364446.773]},
 {u'components': [{u'browsedir': u'',
                   u'duration': 0.036,
                   u'filename': u'C:/shared/MouseVocalizations/IC pharm/E2-02-sylb68.call1',
                   u'index': [0, 0],
                   u'intensity': 65.0,
                   u'risefall': 0.003,
                   u'start_s': 0,
                   u'stim_type': u'Vocalization'}],
  u'overloaded_attenuation': 0,
  u'samplerate_da': 333333,
  u'time_stamps': [1432364471.286, 1432364471.533, 1432364471.786]}]



In [8]:

    
# get_info with an empty string gets the file metadata
print data.get_info('')









    



{u'date': '2015-05-22', u'computername': 'van-d12zmtw1', u'who': 'unknown', u'total cells': 1}

Getting Data sets and Plotting Data

To get the actual data out of our file, we need to know the name of the test we want. We saw earlier that we can get the names of all datasets with the dataset_names method, all of the return values from this method are valid keys to retrieve data. We can use the get_data method using one of these keys to pull out a data set from out file. Datasets are numpy arrays, and thus you can do all the same operations to them that you would do to any other numpy array



In [9]:

    
test1 = data.get_data('segment_1/test_1')
print 'class of test1', type(test1)
print test1.shape









    



class of test1 <type 'numpy.ndarray'>
(4, 3, 2, 10000)

For Sparkle data, the shape represents (# of traces, # of reps, # of channels, # of samples). If the data was gathered before 05/22/2015 then it will not include the channel dimension, and all data is single channel. Since this is a regular numpy array, we can plot it just like we would any numpy. This sample data is a selection of actual experimental data, and I have already found a trace with spikes in it for you



In [10]:

    
# this will plot a slice of the array for the 2nd trace, 4th rep, 1st channel, and every sample
plt.plot(test1[3,1,0,:])









    Out[10]:





[<matplotlib.lines.Line2D at 0x3809150>]

If we want our x-axis to appropriately repesent the data units (time), we need to go back to the metadata to find the recording sample rate and create a matching x-axis array that will represent a time point for each y value.

the samplerate for each test in a segment will always be the same, so it is stored as metadata for the segment



In [11]:

    
# samplerate_ad is for analog-to-digial, i.e. the recording sample rate
fs = data.get_info('segment_1')['samplerate_ad']
print 'Recording samplerate', fs, 'Hz'

# create x axis time points for each sample
x = np.arange(test1.shape[-1])/fs

plt.plot(x, test1[3,1,0,:])
plt.xlabel('time (s)') 
plt.title('Leeloo Dallas. Multipass.')









    



Recording samplerate 50000.0 Hz






    Out[11]:





<matplotlib.text.Text at 0x3835810>

A Step further

So now that we know how to extract metadata and plot data sets, lets take this one step further and do some spike detection. We can import another sparkle module, sparkle.tools.spikestats, for this purpose.



In [12]:

    
from sparkle.tools import spikestats

# get_data takes an optional index parameter, if you only want part of the dataset.
# this will get us the 2nd trace, 4th rep, 1st channel
trace_data = data.get_data('segment_1/test_1', (3,1,0))
print 'trace shape', trace_data.shape

# get the recording sample rate
fs = data.get_info('segment_1')['samplerate_ad']

spikes = spikestats.spike_times(trace_data, threshold=0.035, fs=fs)
print 'number of spikes', len(spikes)

# plot data with spike detection super imposed.
x = np.arange(trace_data.shape[0])/fs
plt.plot(x, trace_data)
# plot indicators for spikes over threshold at 0.8 y-value
plt.plot(spikes, np.ones_like(spikes)*0.045, 'o')
plt.xlabel('time (s)')









    



trace shape (10000,)
number of spikes 2






    Out[12]:





<matplotlib.text.Text at 0x3828c90>



In [13]:

    
# good practice to close data when we are done with it
data.close()