Introduction to visualizing data in the eeghdf files

Getting started

The EEG is stored in hierachical data format (HDF5). This format is widely used, open, and supported in many languages, e.g., matlab, R, python, C, etc.

Here, I will use the h5py library in python



In [1]:

    
# import libraries
from __future__ import print_function, division, unicode_literals
%matplotlib inline
# %matplotlib notebook

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import h5py
from pprint import pprint

import stacklineplot # local copy

# matplotlib.rcParams['figure.figsize'] = (18.0, 12.0)
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)



In [2]:

    
hdf = h5py.File('./archive/YA2741G2_1-1+.eeghdf')

The data is stored hierachically in an hdf5 file as a tree of keys and values. It is possible to inspect the file using standard hdf5 tools. Below we show the keys and values associated with the root of the tree. This shows that there is a "patient" group and a group "record-0"



In [3]:

    
list(hdf.items())









    Out[3]:





[('patient', <HDF5 group "/patient" (0 members)>),
 ('record-0', <HDF5 group "/record-0" (10 members)>)]

We can focus on the patient group and access it via hdf['patient'] as if it was a python dictionary. Here are the key,value pairs in that group. Note that the patient information has been anonymized. Everyone is given the same set of birthdays. This shows that this file is for Subject 2619, who is male.



In [4]:

    
list(hdf['patient'].attrs.items())









    Out[4]:





[('patient_name', '2619, Subject'),
 ('patientcode', '3fe61c07d97e5b5595d647f9c1dc469e'),
 ('gender', 'Male'),
 ('birthdate', '1990-01-01'),
 ('patient_additional', ''),
 ('gestatational_age_at_birth_days', -1.0),
 ('born_premature', 'unknown')]

Now we look at how the waveform data is stored. By convention, the first record is called "record-0" and it contains the waveform data as well as the approximate time (relative to the birthdate)at which the study was done, as well as technical information like the number of channels, electrode names and sample rate.



In [9]:

    
rec = hdf['record-0']
list(rec.attrs.items())









    Out[9]:





[('start_isodatetime', '2000-01-17 08:56:46'),
 ('end_isodatetime', '2000-01-17 09:26:57'),
 ('number_channels', 36),
 ('number_samples_per_channel', 362200),
 ('sample_frequency', 200.00000000000003),
 ('bits_per_sample', 16),
 ('technician', ''),
 ('patient_age_days', 3668.3727546296295)]



In [10]:

    
# here is the list of data arrays stored in the record
list(rec.items())









    Out[10]:





[('edf_annotations', <HDF5 group "/record-0/edf_annotations" (3 members)>),
 ('physical_dimensions',
  <HDF5 dataset "physical_dimensions": shape (36,), type "|O">),
 ('prefilters', <HDF5 dataset "prefilters": shape (36,), type "|O">),
 ('signal_digital_maxs',
  <HDF5 dataset "signal_digital_maxs": shape (36,), type "<i4">),
 ('signal_digital_mins',
  <HDF5 dataset "signal_digital_mins": shape (36,), type "<i4">),
 ('signal_labels', <HDF5 dataset "signal_labels": shape (36,), type "|O">),
 ('signal_physical_maxs',
  <HDF5 dataset "signal_physical_maxs": shape (36,), type "<f8">),
 ('signal_physical_mins',
  <HDF5 dataset "signal_physical_mins": shape (36,), type "<f8">),
 ('signals', <HDF5 dataset "signals": shape (36, 362200), type "<i2">),
 ('transducers', <HDF5 dataset "transducers": shape (36,), type "|O">)]



In [13]:

    
rec['physical_dimensions'][:]









    Out[13]:





array([b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV',
       b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV',
       b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'uV',
       b'uV', b'uV', b'uV', b'uV', b'uV', b'uV', b'mV', b'mV', b''], dtype=object)



In [15]:

    
rec['prefilters'][:]









    Out[15]:





array([b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'',
       b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'', b'',
       b'', b'', b'', b'', b'', b'', b'', b'', b'', b''], dtype=object)



In [16]:

    
rec['signal_digital_maxs'][:]









    Out[16]:





array([32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767,
       32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767,
       32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767,
       32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767])



In [18]:

    
rec['signal_digital_mins'][:]









    Out[18]:





array([-32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768,
       -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768,
       -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768,
       -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768,
       -32768, -32768, -32768, -32768])



In [19]:

    
rec['signal_physical_maxs'][:]









    Out[19]:





array([  3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         3.19990200e+03,   3.19990200e+03,   3.19990200e+03,
         1.20025600e+04,   1.20025600e+04,   1.00000000e+00])



In [ ]:



In [ ]:



In [ ]:

We can then grab the actual waveform data and visualize it.



In [6]:

    
signals = rec['signals']
labels = rec['signal_labels']
electrode_labels = [str(s,'ascii') for s in labels]
numbered_electrode_labels = ["%d:%s" % (ii, str(labels[ii], 'ascii')) for ii in range(len(labels))]

Simple visualization of EEG (brief absence seizure)



In [7]:

    
# search identified spasms at 1836, 1871, 1901, 1939
stacklineplot.show_epoch_centered(signals, 1476,epoch_width_sec=15,chstart=0, chstop=19, fs=rec.attrs['sample_frequency'], ylabels=electrode_labels, yscale=3.0)
plt.title('Absence Seizure');

Annotations

It was not a coincidence that I chose this time in the record. I used the annotations to focus on portion of the record which was marked as having a seizure.

You can access the clinical annotations via rec['edf_annotations']



In [8]:

    
annot = rec['edf_annotations']



In [9]:

    
antext = [s.decode('utf-8') for s in annot['texts'][:]]
starts100ns = [xx for xx in annot['starts_100ns'][:]]  # process the bytes into text and lists of start times



In [10]:

    
df = pd.DataFrame(data=antext, columns=['text'])  # load into a pandas data frame
df['starts100ns'] = starts100ns
df['starts_sec'] = df['starts100ns']/10**7
del df['starts100ns']

It is easy then to find the annotations related to seizures



In [11]:

    
df[df.text.str.contains('sz',case=False)]



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

	text	starts_sec
86	SZ START	1380.512
88	SZ END	1384.246
91	SZ START	1416.897
93	SZ END	1422.843
99	SZ START	1476.933
103	SZ END	1483.269