For an online introduction please see
This notebook is set up to be used with Python 3 (3.6+). To run it properly, the following libraries need to be installed:
pip install numpy
pip install matplotlib
pip install nixio==1.5.0b3
Note: nixio 1.5.0b3 is a beta release with many exciting new NIX features. At the time of the presentation (24.07.2019), these features had not yet made it into the main NIX release. If you are using this notebook at a later point in time, installing via pip install nixio should be enough.
When storing data, we have two main requirements:
Considering the simple plot above, we can list all the information it shows and, by extension, all the information that needs to be stored in order to reproduce it.
In this case, as in most cases, it would be inefficient to store the x- and y-position of each data point. The voltage was measured at regular (time) intervals, so we only need to store the measured values and a definition of the x-axis consisting of an offset, the sampling interval, a label, and a unit.
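To make the saving concrete, here is a minimal pure-Python sketch that rebuilds a regularly sampled time axis from an offset and a sampling interval. The helper `sampled_axis` is hypothetical and not part of nixio; it only mirrors the idea behind a sampled dimension (label and unit are plain strings and are left out).

```python
def sampled_axis(count, sampling_interval, offset=0.0):
    """Hypothetical helper: return `count` axis values offset + i * sampling_interval."""
    return [offset + i * sampling_interval for i in range(count)]


samples = 1000
sampling_interval = 0.001  # seconds
offset = 0.0

# Storing every time stamp explicitly costs one number per sample ...
explicit_time = [i * sampling_interval for i in range(samples)]
# ... while the sampled description only needs the offset and the interval;
# the count is simply the length of the stored data.
rebuilt_time = sampled_axis(samples, sampling_interval, offset)

print(rebuilt_time == explicit_time, len(explicit_time))
```

For 1000 samples the explicit approach keeps 1000 numbers, whereas the sampled description keeps two, independent of the recording length.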
This is exactly the approach chosen in NIX: for each dimension of the data, a dimension descriptor must be given. NIX defines three (and a half) dimension descriptors:
Before we can store any data, we need to have it lying around somewhere. Let's re-create the example data for the figure we saw above and then see how we can store it in a NIX file.
In [ ]:
# Let's create some example data:
import numpy as np
freq = 5.0  # Hz
samples = 1000
sample_interval = 0.001  # s
time = np.arange(samples)  # sample indices 0 ... samples-1
voltage = np.sin(2 * np.pi * time * freq / samples)
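The cell above builds the sine from the sample index `time` rather than from time in seconds. The following standard-library check (an illustration, not part of the original workflow) confirms that this equals sin(2πft) with t = i · sample_interval, which holds only because samples · sample_interval = 1 s.

```python
import math

freq = 5.0               # Hz
samples = 1000
sample_interval = 0.001  # s

# Compare the index-based expression with the time-based one at a few samples.
for i in (0, 37, 250, 999):
    by_index = math.sin(2 * math.pi * i * freq / samples)
    by_time = math.sin(2 * math.pi * freq * (i * sample_interval))
    assert math.isclose(by_index, by_time, abs_tol=1e-9)

duration = samples * sample_interval
print("duration:", duration, "s; periods:", freq * duration)
```

If you change `samples` while keeping `sample_interval` fixed, the index-based expression still produces 5 periods in total, so the frequency in Hz changes with the recording length.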
In [ ]:
# Let's quickly check what the data we will store actually looks like
# The next line is Jupyter notebook specific (an IPython magic) and enables interactive plots; it does not work in a plain Python script.
%matplotlib notebook
import matplotlib.pyplot as plot
plot.plot(time*sample_interval, voltage)
plot.xlabel('Time [s]')
plot.ylabel('Voltage [mV]')
plot.show()
This is perfect data; we would like to keep it and store it in a file. So let's persist this wonderful data in a NIX file.
The DataArray is the central entity of the NIX data model. Like almost all other NIX entities, it requires a name and a type. Neither is restricted, but names must be unique within a Block. The type information can be used to introduce semantic meaning and domain specificity. Upon creation, a unique ID is assigned to the DataArray.
The DataArray stores the actual data together with a label and a unit. In addition, the DataArray needs a dimension descriptor for each dimension of the data. The following snippets show how to create a DataArray and store data in it.
In [ ]:
import nixio
# First create a file we'll use to work with
# Files can be opened in FileMode "ReadOnly", "ReadWrite" and "Overwrite"
# ReadOnly ... Opens an existing file for reading
# ReadWrite ... Opens an existing file for editing or creates a new file
# Overwrite ... Truncates and opens an existing file or creates a new file
f = nixio.File.open('Tutorial.nix', nixio.FileMode.Overwrite)
# Please note that NIX works on the open file and reads from and writes to it directly.
# Always close the file with 'f.close()' when you are done.
As you can see in the NIX data model, NIX files are hierarchically structured: data is stored in DataArrays, and DataArrays are contained in Blocks. Before we can create a DataArray, we therefore need to create at least one Block to contain it.
In [ ]:
# Let's check which blocks are currently defined in our file; it should be empty
f.blocks
In [ ]:
# Let's see how we can create a block in our file; we'll use Python's handy help function to get more information
help(f.create_block)
In [ ]:
# "name" and "type" of a block can be used to filter and find our blocks later on when the file contains more content
block = f.create_block(name="basic_examples", type_="examples")
# Please note that the 'name' of any NIX entity, e.g. a Block or a DataArray, has to be unique within its
# parent container, since it can be used to find and return exactly this entity.
# The 'type' can also be used to find entities, but it does not need to be unique. Use 'name' to uniquely
# identify a particular entity and 'type' to find groups of related entities.
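The name/type distinction can be sketched with a toy container. `Entity` and `Container` below are hypothetical stand-ins, not nixio classes: names act as unique keys, while types are an attribute you filter on.

```python
from collections import namedtuple

# Hypothetical stand-in for a NIX entity; only name and type matter here.
Entity = namedtuple("Entity", ["name", "type"])


class Container:
    """Toy container: names must be unique, types may repeat."""

    def __init__(self):
        self._by_name = {}

    def add(self, name, type_):
        if name in self._by_name:
            raise ValueError("name %r already exists" % name)
        entity = Entity(name, type_)
        self._by_name[name] = entity
        return entity

    def __getitem__(self, name):
        # A name identifies exactly one entity (or raises KeyError).
        return self._by_name[name]

    def of_type(self, type_):
        # A type can match any number of entities.
        return [e for e in self._by_name.values() if e.type == type_]


blocks = Container()
blocks.add("basic_examples", "examples")
blocks.add("tag_examples", "examples")
print(blocks["basic_examples"])
print([e.name for e in blocks.of_type("examples")])
```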
In [ ]:
# Great, we have an empty block
block
In [ ]:
# And this block resides within our file
f.blocks
Now we are finally set up to put our data in our file!
In [ ]:
# First let's check how we can actually create a DataArray
help(block.create_data_array)
In [ ]:
# Now we create the DataArray with our data within the Block created above
# We also set its label and unit right away.
da = block.create_data_array(name="data_regular", array_type="sine", data=voltage)
da.label = "voltage"
da.unit = "mV"
In [ ]:
# Now we also add the appropriate dimension descriptor to this DataArray, so that it can be interpreted
# correctly, e.g. for plotting later on. We will look into the different dimension types in a moment.
# Dimensions should always be added in the order of the data's axes (x, y, z, ... in plot terms);
# this is necessary to interpret the data later without knowing the actual structure of the DataArray.
# First we check how to create the dimension we need
help(da.append_sampled_dimension)
In [ ]:
# And let's add the dimension of our x-axis to the DataArray:
dim = da.append_sampled_dimension(sample_interval)
dim.label = "time"
dim.unit = "s"
In [ ]:
# No descriptor is needed for the y-axis: the values are described by da.label and da.unit.
# voltage is one-dimensional, so the single sampled dimension above describes it completely;
# the number of dimension descriptors should match the dimensionality of the data.
len(da.dimensions) == len(da.shape)
In the example above, the NIX library infers the dimensionality, the shape, and the data type of the data. The data type and the dimensionality (i.e. the number of dimensions) are fixed once the DataArray has been created; the actual size of the DataArray can change during the lifetime of the entity.
If you need more control, DataArrays can also be created empty and filled later, e.g. during data acquisition.
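The rule stated above (type and dimensionality fixed at creation, size free to change) can be illustrated with a small toy class. `GrowableTrace` is hypothetical and only models the constraint for one-dimensional data; it is not the nixio API, which handles this for you.

```python
class GrowableTrace:
    """Toy model of a one-dimensional data array (hypothetical, not nixio):
    the element type and the number of dimensions are fixed at creation,
    only the length may change."""

    def __init__(self, dtype):
        self.dtype = dtype
        self._values = []

    @property
    def shape(self):
        # Always exactly one dimension; only its length can grow.
        return (len(self._values),)

    def append(self, chunk):
        # Check the whole chunk first, so a bad value leaves the data untouched.
        for value in chunk:
            if type(value) is not self.dtype:
                raise TypeError("expected %s, got %r" % (self.dtype.__name__, value))
        self._values.extend(chunk)


# Simulated acquisition: data arrives in chunks of varying size.
trace = GrowableTrace(float)
for chunk in ([0.1, 0.2], [0.3]):
    trace.append(chunk)
print(trace.shape)
```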
In [ ]:
# Now let's see if we can access our data and do something useful with it, e.g. plot it:
plot_data = f.blocks['basic_examples'].data_arrays['data_regular']
plot_data
In [ ]:
plot_data[:5]
In [ ]:
# Let's check the dimensions of our data
plot_data.dimensions
In [ ]:
# Since the first dimension only stores the sampling interval instead of every time point, we save quite a bit of space
dim = plot_data.dimensions[0]
dim.sampling_interval
In [ ]:
# Compared to the original time array:
time
In [ ]:
# Let's plot all the data from the file, using all the information the file provides
y = plot_data[:]
# axis(n) of a sampled dimension returns n axis values computed from its offset and sampling interval
x = plot_data.dimensions[0].axis(y.shape[0])
plot.figure(figsize=(10,5))
plot.plot(x, y, '-')
plot.xlabel("%s [%s]" % (dim.label, dim.unit))
plot.ylabel("%s [%s]" % (plot_data.label, plot_data.unit))
plot.title("%s/%s" % (plot_data.name, plot_data.type))
That was already quite nice. As you have seen, the example dealt with regularly sampled data. What do we do if we have data that is not regularly sampled? As mentioned at the beginning, NIX supports
In [ ]:
# Let's create some irregularly sampled data and store it
duration = 1.0  # s
interval = 0.02  # mean interval between samples in s
# Poisson-distributed intervals in ms, accumulated and converted to s
time_points = np.around(np.cumsum(np.random.poisson(interval*1000, int(1.5*duration/interval)))/1000., 3)
time_points = time_points[time_points <= duration]
# Sample a 5 Hz sine at these irregular time points
data_points = np.sin(2 * np.pi * 5 * time_points)
In [ ]:
# Check the block we want to save this data in:
block
In [ ]:
data_irr = block.create_data_array(name="data_irregular", array_type="sine", data=data_points)
data_irr.label = "Voltage"
data_irr.unit = "mV"
In [ ]:
# Let's add our x dimension
dim = data_irr.append_range_dimension(time_points)
dim.label = "time"
dim.unit = "s"
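A range dimension stores every tick explicitly, so mapping a time to a data index needs a search instead of the simple division used for a sampled dimension. A minimal sketch, assuming the ticks are sorted in ascending order and that we want the last tick at or before the requested position; `range_index_of` is a hypothetical helper, and nixio's own lookup rules may differ.

```python
from bisect import bisect_right


def range_index_of(ticks, position):
    """Hypothetical helper: index of the last tick <= position.

    `ticks` must be sorted in ascending order, as for a range dimension.
    """
    if not ticks or position < ticks[0]:
        raise IndexError("position %r lies before the first tick" % position)
    # bisect_right counts the ticks <= position; subtract 1 to get an index.
    return bisect_right(ticks, position) - 1


ticks = [0.0, 0.021, 0.038, 0.061, 0.079]  # irregular sample times in s
print(range_index_of(ticks, 0.05))
print(range_index_of(ticks, 0.061))
```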
In [ ]:
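# No y dimension: the data is one-dimensional, so the range dimension is its only descriptor;
# the values (y-axis) are described by the label and unit of the DataArray
len(data_irr.dimensions) == len(data_irr.shape)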
In [ ]:
# Let's plot our data again
plot_data = f.blocks['basic_examples'].data_arrays['data_irregular']
x_dim = plot_data.dimensions[0]
x = list(x_dim.ticks)
y = plot_data[:]
plot.figure(figsize=(10,5))
plot.plot(x, y, '-o')
plot.xlabel("%s [%s]" % (x_dim.label, x_dim.unit))
plot.ylabel("%s [%s]" % (plot_data.label, plot_data.unit))
plot.title("%s/%s" % (plot_data.name, plot_data.type))
In [ ]:
# Next we will store some basic set or "event" data
data_points = [281, 293, 271, 300, 285, 150]
data_event = block.create_data_array(name="data_event", array_type="event", data=data_points)
data_event.label = "temperature"
data_event.unit = "K"
# Add the only dimension: each value belongs to one labelled response
dim = data_event.append_set_dimension()
dim.labels = ["Response A", "Response B", "Response C", "Response D", "Response E", "Response F"]
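A set dimension simply pairs label i with entry i along its axis, so the number of labels has to match the length of that axis. The hypothetical helper `label_rows` below sketches this check with plain Python lists; it is not a nixio function.

```python
def label_rows(labels, rows):
    """Hypothetical helper: pair each set-dimension label with the entry at the same index."""
    if len(labels) != len(rows):
        raise ValueError("%d labels for %d entries" % (len(labels), len(rows)))
    return dict(zip(labels, rows))


temperatures = [281, 293, 271, 300, 285, 150]  # K, as stored above
responses = label_rows(
    ["Response A", "Response B", "Response C", "Response D", "Response E", "Response F"],
    temperatures,
)
print(responses["Response F"])

# The same rule applies to stacked traces: label i names row i.
traces = label_rows(["Trace_A", "Trace_B"], [[0.0, 0.1], [1.0, 0.9]])
```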
In [ ]:
# And let's see how we can plot this
plot_data = f.blocks['basic_examples'].data_arrays['data_event']
x_dim = plot_data.dimensions[0]
y = plot_data[:]
index = np.arange(len(y))
plot.figure(figsize=(10,5))
plot.bar(index, y)
plot.xticks(index, x_dim.labels)
plot.ylabel("%s [%s]" % (plot_data.label, plot_data.unit))
plot.title("%s/%s" % (plot_data.name, plot_data.type))
Now we know how to store one-dimensional data in a DataArray. Because we can add further dimension descriptors, NIX also supports multidimensional data and is able to describe it properly. For example, one could store a 2D image including its different color channels in one DataArray.
Another use case is storing several related time series together in one DataArray.
In [ ]:
# Let's create data for two related time series and store them together
# ---- MOCK DATA; the code can be safely ignored --------
freq = 5.0  # Hz
samples = 1000
sample_interval = 0.001  # s
time = np.arange(samples)  # sample indices
voltage_trace_A = np.sin(2 * np.pi * time * freq / samples)
voltage_trace_B = np.cos(2 * np.pi * time * freq / samples)
# We use a numpy function that stacks both signals into a 2 x 1000 array
voltage_stacked = np.vstack((voltage_trace_A, voltage_trace_B))
# ---- MOCK DATA end --------
# Let's create a new DataArray with our multidimensional data
data_related = block.create_data_array(name="data_multi_dimension", array_type="multi", data=voltage_stacked)
data_related.label = "voltage"
data_related.unit = "mV"
In [ ]:
# To properly describe the DataArray we need to add two dimensions
# First we describe the first axis, which indexes the stacked traces
dim_set = data_related.append_set_dimension()
# Take care to add the labels in the order in which the arrays were stacked above.
dim_set.labels = ["Trace_A", "Trace_B"]
In [ ]:
# Second we add the dimension that is common to both stacked arrays and describes time
dim_sample = data_related.append_sampled_dimension(sample_interval)
dim_sample.label = "time"
dim_sample.unit = "s"
In [ ]:
# Let's harvest the fruits of our labour
plot_data = f.blocks['basic_examples'].data_arrays['data_multi_dimension']
In [ ]:
dim_set = plot_data.dimensions[0]
dim_sampled = plot_data.dimensions[1]
# To recreate the x-axis we compute the time points from the length
# of one of the stored arrays and the sampling interval
data_points_A = plot_data[0, :]  # here we access the first of the stacked arrays
time_points = dim_sampled.axis(data_points_A.shape[0])
plot.figure(figsize=(10,5))
# Now we add one line for each label of the set dimension
for i, label in enumerate(dim_set.labels):
    plot.plot(time_points, plot_data[i, :], label=label)
plot.xlabel("%s [%s]" % (dim_sampled.label, dim_sampled.unit))
plot.ylabel("%s [%s]" % (plot_data.label, plot_data.unit))
plot.title("%s/%s" % (plot_data.name, plot_data.type))
plot.legend()
What we have seen so far:
Storing data together with its annotations is something you can easily do in Matlab, and with some additional work in Python as well. NIX does that too, and it provides additional features to keep working with this initial data and to store the analysed data in a meaningful relation to it.
DataArrays store data, but that is not all that is needed to store scientific data. We may want to highlight points or regions in the data and link them to further information.
This is done using the Tag and the MultiTag, for tagging single or multiple points or regions, respectively.
The basic idea is that the Tag defines a position (and an extent) through which it refers to points (or regions) in the data. A tag can point to several DataArrays at once; these are mere links, stored in its list of references. The following figure illustrates how a MultiTag links two DataArrays to create a new construct.
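How a position and an extent select a region can be sketched for a regularly sampled axis. `tag_to_slice` is a hypothetical helper, assuming the position and extent are given in the units of the sampled dimension (seconds here) and rounding to the nearest sample; the rounding rule nixio applies may differ.

```python
def tag_to_slice(position, extent, sampling_interval, offset=0.0):
    """Hypothetical helper: convert a tag's position/extent (in axis units)
    into start/stop indices on a regularly sampled axis."""
    start = int(round((position - offset) / sampling_interval))
    stop = int(round((position + extent - offset) / sampling_interval))
    return start, stop


voltage = [float(i) for i in range(1000)]  # 1 s of dummy data at 1 ms resolution
start, stop = tag_to_slice(0.25, 0.1, 0.001)
region = voltage[start:stop]  # the 100 samples between 0.25 s and 0.35 s
print(start, stop, len(region))
```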
In [ ]:
# Let us create a new block to illustrate tagged data
block_tag = f.create_block(name="tag_examples", type_="examples")
To reference only a single point or region, we can use a NIX Tag. The Tag is a simpler form of the MultiTag, which we will use in a moment.
In [ ]:
# We will create some more elaborate example data to make a point
# For this we need some equally elaborate code, which can be safely ignored.
# This code will create some mock membrane voltage traces for us
# ---- MOCK CODE AND DATA; the code can be safely ignored --------
class LIF:
    def __init__(self, stepsize=0.0001, offset=1.6, tau_m=0.025, tau_a=0.02, da=0.0, D=3.5):
        self.stepsize = stepsize  # simulation stepsize [s]
        self.offset = offset  # offset current [nA]
        self.tau_m = tau_m  # membrane time constant [s]
        self.tau_a = tau_a  # adaptation time constant [s]
        self.da = da  # increment in adaptation current [nA]
        self.D = D  # noise intensity
        self.v_threshold = 1.0  # spiking threshold
        self.v_reset = 0.0  # reset voltage after spiking
        self.i_a = 0.0  # current adaptation current
        self.v = self.v_reset  # current membrane voltage
        self.t = 0.0  # current time [s]
        self.membrane_voltage = []
        self.spike_times = []
    def _reset(self):
        self.i_a = 0.0
        self.v = self.v_reset
        self.t = 0.0
        self.membrane_voltage = []
        self.spike_times = []
    def _lif(self, stimulus, noise):
        """
        Euler solution of the membrane equation with adaptation current and noise
        """
        # exponential decay of the adaptation current
        self.i_a -= self.stepsize / self.tau_a * self.i_a
        self.v += self.stepsize * (-self.v + stimulus + noise + self.offset - self.i_a) / self.tau_m
        self.membrane_voltage.append(self.v)
    def _next(self, stimulus):
        """
        Workhorse: performs one Euler step and records spike times
        """
        # uniform noise in [-D/2, D/2)
        noise = self.D * (np.random.rand() - 0.5)
        self._lif(stimulus, noise)
        self.t += self.stepsize
        if self.v > self.v_threshold and len(self.membrane_voltage) > 1:
            self.v = self.v_reset
            self.membrane_voltage[-1] = 2.0  # mark the spike in the trace
            self.spike_times.append(self.t)
            self.i_a += self.da
    def run_const_stim(self, steps, stimulus):
        """
        LIF simulation with a constant stimulus.
        """
        self._reset()
        for _ in range(steps):
            self._next(stimulus)
        time = np.arange(len(self.membrane_voltage)) * self.stepsize
        return time, np.array(self.membrane_voltage), np.array(self.spike_times)
    def run_stimulus(self, stimulus):
        """
        LIF simulation with a predefined stimulus trace.
        """
        self._reset()
        for s in stimulus:
            self._next(s)
        time = np.arange(len(self.membrane_voltage)) * self.stepsize
        return time, np.array(self.membrane_voltage), np.array(self.spike_times)
    def __str__(self):
        out = '\n'.join(["stepsize: \t" + str(self.stepsize),
                         "offset:\t\t" + str(self.offset),
                         "tau_m:\t\t" + str(self.tau_m),
                         "tau_a:\t\t" + str(self.tau_a),
                         "da:\t\t" + str(self.da),
                         "D:\t\t" + str(self.D),
                         "v_threshold:\t" + str(self.v_threshold),
                         "v_reset:\t" + str(self.v_reset)])
        return out
    def __repr__(self):
        return self.__str__()
lif_model = LIF()
time, voltage, spike_times = lif_model.run_const_stim(10000, 0.005)
# ---- MOCK CODE AND DATA end --------
In [ ]:
# This is what our time data looks like:
time[:10]
In [ ]:
# This is what our voltage data looks like:
voltage[:10]
In [ ]:
# Let's assume we analysed the voltage trace and identified the times at which the neuron spiked.
# These spike times are the third value returned by the mock simulation and look like this:
spike_times
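In this tutorial the mock model hands us the spike times directly. In a real analysis they would have to be detected; one simple approach, sketched here as an assumption rather than the method used above, is to record every upward crossing of a voltage threshold. `threshold_crossings` is a hypothetical helper.

```python
def threshold_crossings(trace, sampling_interval, threshold, offset=0.0):
    """Hypothetical helper: times at which `trace` rises from below `threshold`
    to a value at or above it."""
    times = []
    for i in range(1, len(trace)):
        if trace[i - 1] < threshold <= trace[i]:
            times.append(offset + i * sampling_interval)
    return times


# Toy trace in which spikes are marked by a value of 2.0, as in the mock model.
toy_trace = [0.2, 0.6, 2.0, 0.0, 0.5, 0.9, 2.0, 0.1]
print(threshold_crossings(toy_trace, 0.0001, 1.0))
```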
In [ ]:
# With the mock membrane voltage trace we can now create a new DataArray on our Block
data = block_tag.create_data_array(name="membrane_voltage_A", array_type="regular_sampled", data=voltage)
data.label = "membrane voltage"
data.unit = "mV"
# As we are used to by now, we add the time dimension as a sampled dimension with the sample interval
dim = data.append_sampled_dimension(time[1]-time[0])
dim.label = "time"
dim.unit = "s"
In [ ]:
# Now we want to store the result of our analysis step, the identified spike times. We store them in a separate
# DataArray on the same Block, right next to our initial data.
spike_data = block_tag.create_data_array(name="spike_times_A", array_type="set", data=spike_times)
# The spike times are one-dimensional, so they get a single (set) dimension. Each value is a position
# on the time axis of the initial data, which is what allows the MultiTag below to link the two.
spike_data.append_set_dimension()
In [ ]:
# We want to make sure, that anyone opening this file will know, that "spike_times_A"
# were derived from the DataArray "membrane_voltage_A".
# We can do that by connecting them via a "MultiTag"
# We first create the MultiTag on the same Block, right next to our two DataArrays. Let's see how we can do that:
help(block_tag.create_multi_tag)
In [ ]:
# We create the MultiTag with the derived spike times as its positions
multi_tag = block_tag.create_multi_tag(name="tag_A", type_="spike_times", positions=spike_data)
# Now we hook the spike data up to the original data
multi_tag.references.append(data)
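Conceptually, each position stored in the MultiTag maps to an index on the time axis of every referenced DataArray. A hypothetical sketch, assuming a regularly sampled reference and nearest-sample rounding, that cuts a short window of data around each position (`snippets_around` is not a nixio function):

```python
def snippets_around(positions, trace, sampling_interval, half_width, offset=0.0):
    """Hypothetical helper: cut trace[i - half_width : i + half_width + 1] around
    the sample nearest to each position; windows are clipped at the edges."""
    snippets = []
    for pos in positions:
        i = int(round((pos - offset) / sampling_interval))
        start = max(i - half_width, 0)
        stop = min(i + half_width + 1, len(trace))
        snippets.append(trace[start:stop])
    return snippets


trace = [float(k) for k in range(10)]  # 10 samples at 0.1 ms resolution
print(snippets_around([0.0003, 0.0009], trace, 0.0001, 1))
```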
In [ ]:
# And now we see how these two data sets can be plotted together
# To interpret and plot tagged data we only need the tag; we do not even need to know the DataArrays themselves.
plot_tag = f.blocks['tag_examples'].multi_tags['tag_A']
# We read in the initial data from the multi tag
init_data = plot_tag.references[0]
# Note that "plot_tag.references" returns a list since a tag could reference multiple original DataArrays
# We read in the spike times from the multi tag
spike_times = plot_tag.positions
In [ ]:
# Now we prepare both initial and analysed data for plotting
dim_sampled = init_data.dimensions[0] # We again reconstruct the time axis
time_points = dim_sampled.axis(init_data.shape[0])
plot.figure(figsize=(10,5))
plot.plot(time_points, init_data[:], label=init_data.name)
plot.scatter(spike_times[:], np.ones(spike_times[:].shape) * np.max(init_data[:]), color='red', label=spike_times.name)
plot.xlabel("%s [%s]" % (dim_sampled.label, dim_sampled.unit))
plot.ylabel("%s [%s]" % (init_data.label, init_data.unit))
plot.title("%s/%s" % (plot_tag.name, plot_tag.type))
plot.legend()
We can retrieve information from multiple analysis steps, and we can plot both the data and the analysed data without having to know or directly access the DataArrays that contain the actual data.
NIX not only allows storing initial and analysed data in the same file. It also allows creating structured annotations of the experiments that were conducted and connects this information directly to the data.
Metadata in NIX files is stored in the odML format. It is kept in a "MetadataTree" parallel to the actual "DataTree", but it can easily be connected to data in the DataTree.
odML is a hierarchically structured data format that groups information in nestable 'Sections' and stores it as 'Property'-'Value' pairs.
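The Section/Property hierarchy can be pictured as nested dictionaries. The sketch below is a hypothetical stand-in for an odML tree, not the nixio or odML API; property values are kept in lists because a property can hold more than one value.

```python
# Hypothetical nested-dict stand-in for an odML tree: "sections" maps section
# names to subsections, "props" maps property names to lists of values.
metadata = {
    "sections": {
        "tag_examples": {
            "sections": {
                "subject": {
                    "sections": {},
                    "props": {"species": ["Mus musculus"], "age": [4]},
                },
            },
            "props": {},
        },
    },
    "props": {},
}


def find_property(tree, path):
    """Follow 'section/.../property' through the nested sections."""
    *section_names, prop_name = path.split("/")
    node = tree
    for name in section_names:
        node = node["sections"][name]
    return node["props"][prop_name]


print(find_property(metadata, "tag_examples/subject/species"))
```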
In [ ]:
# Let us annotate the DataArray in our last example.
# As we can see, we have not stored any metadata in our current file yet.
f.sections
In [ ]:
# Let's check how we can create a new section:
help(f.create_section)
In [ ]:
# First we need to create a Section that can contain our annotations
section = f.create_section(name="tag_examples", type_="general_section")
f.sections
In [ ]:
# This section can contain further sections as well as properties:
section.sections
In [ ]:
section.props
In [ ]:
# Let's store additional information about the initial data of our tag example.
sub_sec = section.create_section(name="subject", type_="experiment_A")
In [ ]:
# Let's add some properties to this section
help(sub_sec.create_property)
In [ ]:
prop = sub_sec.create_property(name="species", values_or_dtype="Mus musculus")
prop = sub_sec.create_property(name="age", values_or_dtype=4)
prop.unit = "weeks"
prop = sub_sec.create_property(name="subjectID", values_or_dtype="78376446-f096-47b9-8bfe-ce1eb43a48dc")
In [ ]:
# Lets check what we have so far:
f.sections
In [ ]:
# One section that will describe our tag_examples
f.sections['tag_examples'].sections
In [ ]:
# A subsection that contains subject related information
f.sections['tag_examples'].sections['subject'].props
In [ ]:
# We can now connect the section describing our tag_example directly to the MultiTag that references both
# the initial as well as the analysed data.
multi_tag = f.blocks['tag_examples'].multi_tags['tag_A']
multi_tag.metadata = f.sections['tag_examples']
In [ ]:
# Now when we look at the data via a MultiTag we can directly access all metadata that has been attached to it.
# E.g. get information about the subject the experiment was conducted with
multi_tag.metadata.sections['subject']
In [ ]:
# We can also attach the same section to the initial DataArray
init_data = f.blocks['tag_examples'].data_arrays['membrane_voltage_A']
init_data.metadata = f.sections['tag_examples']
In [ ]:
# We can also search in reverse: select a section and find all data that is connected to it
sec = f.sections['tag_examples']
sec.referring_data_arrays
In [ ]:
sec.referring_multi_tags
In [ ]:
# And finally we close our file.
f.close()