In [1]:
import os
import sys
import logging
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
In [2]:
import hurraypy as hurray
import numpy as np
In [3]:
hurray.__version__
Out[3]:
First, make sure all logging messages are sent to stdout:
In [4]:
logger = logging.getLogger('hurraypy')
# console = logging.StreamHandler()
# console.setLevel(logging.DEBUG)
# console.setFormatter(logging.Formatter('%(levelname)s --- %(message)s'))
# logger.addHandler(console)
# logger.setLevel(logging.DEBUG)
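The handler setup above is left commented out; if you do want hurraypy's debug messages printed in the notebook, a minimal sketch mirroring those commented lines (with sys.stdout made explicit) looks like this:
# Attach a stream handler so hurraypy log messages show up in the notebook output
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.DEBUG)
console.setFormatter(logging.Formatter('%(levelname)s --- %(message)s'))
logger.addHandler(console)
logger.setLevel(logging.DEBUG)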
In [5]:
logger.handlers
Out[5]:
In [6]:
hurray.log.log.debug("bla")
hurray.log.log.info("bla")
The hurray server was started like this (in a separate terminal):
$ hurray --logging=debug --debug=1 --socket=~/hurray.sock
[I 170620 09:46:50 __main__:180] Listening on localhost:2222
[I 170620 09:46:50 __main__:184] Listening on /home/rg/hurray.sock
[I 170619 11:16:50 process:132] Starting 8 processes
In [7]:
# conn = hurray.connect('localhost:2222')
conn = hurray.connect('~/hurray.sock')
conn
Out[7]:
Let's create a file test.h5 (overwrite=True replaces the file if it already exists):
In [8]:
f = conn.create_file("test.h5", overwrite=True)
Note that Hurray objects (files, datasets, groups) display nicely in Jupyter notebooks.
In [9]:
f
Out[9]:
Opening an existing file works like this:
In [10]:
f = conn.File("test.h5")
print(f)
with conn.File("test.h5") as f:
    print(f)
Deleting and renaming files is also possible:
In [11]:
f.delete()
Note that the object referenced by f becomes unusable after deleting the file.
Let's create another file and rename it to test.h5:
In [12]:
f2 = conn.create_file("test2.h5", overwrite=True)
In [13]:
f2
Out[13]:
In [14]:
f = f2.rename("test.h5")
In [15]:
f
Out[15]:
Note that rename() is not "in place". We must (re-)assign its return value.
In [16]:
f3 = conn.create_file("test3.h5", overwrite=True)
In [17]:
try:
    f3.rename("test.h5")
except hurray.exceptions.DatabaseError as e:
    print(e)
Files can be in subdirectories:
In [18]:
f4 = conn.create_file("project1/data.h5", overwrite=True)
f4
Out[18]:
In [19]:
conn.list_files("project1/")
Out[19]:
In [21]:
conn.list_files("")
Out[21]:
A file can contain two kinds of objects: groups and datasets. Essentially, groups work like Python dictionaries and datasets work like NumPy arrays.
Every group and dataset has a name. First, let's try to create a dataset. We must specify the dataset either by passing a NumPy array or by passing a shape and a datatype:
In [30]:
dst = f.create_dataset("mydata", shape=(400, 300), dtype=np.float64)
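Alternatively, a dataset can be created directly from an existing array by passing data= (the same keyword is used further below for the random dataset). A small sketch, with an illustrative name:
# Create a dataset from an existing NumPy array; shape and dtype are taken from the array
small = np.arange(12, dtype=np.float64).reshape(3, 4)
dst_small = f.create_dataset("smalldata", data=small)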
In [31]:
dst
Out[31]:
A dataset has a shape and a dtype, just like NumPy arrays:
In [32]:
dst.shape, dst.dtype
Out[32]:
It also has a path, which is the name of the dataset, prefixed by the names of the containing groups. Our dataset is not contained in a group. It therefore appears under the root node / (actually, it is in a group: the file itself is the root group).
In [33]:
dst.path
Out[33]:
Let's check what data our dataset contains. NumPy-style indexing allows reading from and writing to a dataset. A [:] index reads the whole dataset into memory. Apparently, our dataset has been initialized with zeros:
In [34]:
dst[:]
Out[34]:
Let's overwrite this dataset with increasing floating point numbers:
In [35]:
arr = np.linspace(0, 1, num=dst.shape[0] * dst.shape[1]).reshape(dst.shape)
arr.shape == dst.shape
Out[35]:
In [36]:
dst[:] = arr
In [37]:
dst[:]
Out[37]:
Creating a dataset has increased the file size:
In [38]:
f
Out[38]:
Fancy indexing allows reading and writing only portions of a dataset. In the following example, only columns 50 to 54 of rows 10 and 11 are sent over the wire:
In [39]:
dst[10:12, 50:55]
Out[39]:
We can also overwrite the above cells using the same notation:
In [40]:
dst[10:12, 50:55] = 999
dst[9:13, 50:55]
Out[40]:
require_dataset() opens a dataset if it already exists and creates it otherwise. With exact=True, the existing dataset must match the requested shape and dtype exactly:
In [41]:
dst = f.require_dataset("mydata", shape=(400, 300), dtype=np.float64, exact=True)
In [42]:
dst[9:13, 50:55]
Out[42]:
This should result in an error because the dtypes do not match:
In [43]:
f.require_dataset("mydata", shape=(400, 300), dtype=np.int16, exact=True)
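The error output is not shown here, and neither is the exact exception class raised by require_dataset(); a hedged way to inspect it is to catch a generic Exception:
# Inspect the dtype-mismatch error without assuming a specific exception class
try:
    f.require_dataset("mydata", shape=(400, 300), dtype=np.int16, exact=True)
except Exception as e:
    print(type(e).__name__, e)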
Datasets can be organised in groups (and subgroups). A group is like a folder and acts like a Python dictionary. Let's create a group named "mygroup":
In [44]:
f.create_group("mygroup")
Out[44]:
Recall that every file object is also a group and therefore acts like a dictionary. Its keys() method now lists our newly created group:
In [45]:
f.keys()
Out[45]:
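Because groups act like dictionaries, they can also be accessed with bracket indexing (the same syntax is used below to fetch the subgroup):
# Dictionary-style access to a group by name
grp = f["mygroup"]
grp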
Let's create a subgroup (note that groups follow POSIX filesystem conventions):
In [46]:
f.create_group("mygroup/subgroup")
Out[46]:
In [47]:
subgrp = f["mygroup/subgroup"]
subgrp
Out[47]:
Now let's put a dataset in our subgroup:
In [48]:
data = np.random.random((600, 400))
In [49]:
dst = subgrp.create_dataset("randomdata", data=data)
In [50]:
dst
Out[50]:
Every group has a tree() method that displays subgroups and datasets as a tree.
In [51]:
f.tree()
Out[51]:
If you're not in a notebook or IPython console, tree() will give you a text-based representation:
In [52]:
print(f.tree())
Every group and dataset can be assigned a number of key/value pairs, so-called attributes:
In [53]:
dst = f["mygroup/subgroup/randomdata"]
dst.attrs["unit"] = "celsius"
dst.attrs["max_value"] = 50
Objects that have attributes get a red "A":
In [54]:
dst
Out[54]:
In [55]:
dst.attrs.keys()
Out[55]:
In [56]:
dst.attrs["unit"], dst.attrs["max_value"]
Out[56]:
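Since attrs supports keys() and item access, all attributes of an object can be listed with a simple loop; a quick sketch:
# Print every attribute key/value pair of the dataset
for key in dst.attrs.keys():
    print(key, dst.attrs[key])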
In [57]:
f.tree()
Out[57]:
In [ ]: