In image_io we already learned how to pack images into the standard RecordIO format and load them with ImageRecordIter. This tutorial walks through the Python interface for reading and writing RecordIO files. It is useful when you need more control over the details of the data pipeline, for example when you need to augment an image and its label together for detection and segmentation, or when you need a custom data iterator for triplet sampling and negative sampling.
Set up the environment first:
In [26]:
%matplotlib inline
from __future__ import print_function
import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt
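mx.recordio.MXRecordIO reads and writes records sequentially. First we write a few records to a file opened with 'w':
In [14]: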
record = mx.recordio.MXRecordIO('tmp.rec', 'w')
for i in range(5):
    # records are raw bytes; a bytes literal works on Python 2 and on Python 3.5+
    record.write(b'record_%d' % i)
record.close()
Then we can read it back by opening the same file with 'r':
In [15]:
record = mx.recordio.MXRecordIO('tmp.rec', 'r')
while True:
    item = record.read()
    if not item:
        break
    print(item)
record.close()
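The read loop above can be wrapped in a small generator so that records can be consumed with a plain for loop. This is only a sketch: iter_records and FakeReader are hypothetical names introduced here, not part of mx.recordio. The generator stops on the same condition as the loop above (read() returning an empty value), and FakeReader is a stand-in so the example runs without MXNet installed.

```python
def iter_records(reader):
    """Yield records from `reader` until read() returns an empty value."""
    while True:
        item = reader.read()
        if not item:
            break
        yield item


class FakeReader(object):
    """Hypothetical stand-in exposing only read(), so the sketch runs without MXNet."""

    def __init__(self, items):
        self._items = list(items)

    def read(self):
        # Return the next stored record, or None once everything has been read.
        return self._items.pop(0) if self._items else None


for item in iter_records(FakeReader([b'record_0', b'record_1'])):
    print(item)
```

Any object with a compatible read() method, such as an MXRecordIO opened with 'r', should be usable in place of FakeReader.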
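mx.recordio.MXIndexedRecordIO additionally writes an index file, so that records can later be accessed by key:
In [16]: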
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
for i in range(5):
    # records are raw bytes; a bytes literal works on Python 2 and on Python 3.5+
    record.write_idx(i, b'record_%d' % i)
record.close()
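As a rough mental model of why an index file enables random access, the sketch below keeps a dictionary from key to byte offset and seeks to that offset when reading. It is an illustration only and does not reproduce MXNet's on-disk format: the 4-byte length prefix and the helper names write_indexed and read_at are assumptions made for this example.

```python
import io
import struct


def write_indexed(records):
    """Write (key, payload) pairs to an in-memory buffer; return (buffer, index)."""
    buf = io.BytesIO()
    index = {}
    for key, payload in records:
        index[key] = buf.tell()                      # offset where this record starts
        buf.write(struct.pack('<I', len(payload)))   # illustrative 4-byte length prefix
        buf.write(payload)
    return buf, index


def read_at(buf, index, key):
    """Seek straight to the record stored under `key` and return its payload."""
    buf.seek(index[key])
    (length,) = struct.unpack('<I', buf.read(4))
    return buf.read(length)


buf, index = write_indexed((i, b'record_%d' % i) for i in range(5))
print(read_at(buf, index, 3))
```

The point of the design is that a lookup touches only one record, no matter how many records precede it in the file.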
We can then access records with keys:
In [17]:
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')
record.read_idx(3)
You can list all keys with:
In [18]:
record.keys
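Key-based access is what makes custom sampling schemes such as triplet or negative sampling practical. The sketch below draws a few records at random using only the keys attribute and read_idx method shown above. sample_records and FakeIndexedRecord are hypothetical names; the fake class stands in for an MXIndexedRecordIO opened with 'r' so the example runs without MXNet.

```python
import random


def sample_records(record, k, seed=None):
    """Return k (key, payload) pairs drawn without replacement from `record`."""
    rng = random.Random(seed)
    keys = rng.sample(list(record.keys), k)
    return [(key, record.read_idx(key)) for key in keys]


class FakeIndexedRecord(object):
    """Hypothetical stand-in exposing only `keys` and `read_idx`."""

    def __init__(self, items):
        self._items = dict(items)
        self.keys = list(self._items)

    def read_idx(self, key):
        return self._items[key]


fake = FakeIndexedRecord((i, b'record_%d' % i) for i in range(5))
picked = sample_records(fake, k=2, seed=0)
for key, payload in picked:
    print(key, payload)
```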
Each record in a .rec file can contain arbitrary binary data, but machine learning data typically has a label/data structure. mx.recordio also contains a few utility functions for packing such data, namely: pack, unpack, pack_img, and unpack_img.
pack and unpack are used for storing float (or 1d array of float) label and binary data:
In [27]:
# pack
data = b'data'  # the payload is raw bytes
label1 = 1.0
header1 = mx.recordio.IRHeader(flag=0, label=label1, id=1, id2=0)
s1 = mx.recordio.pack(header1, data)
print('float label:', repr(s1))
label2 = [1.0, 2.0, 3.0]
header2 = mx.recordio.IRHeader(flag=0, label=label2, id=2, id2=0)
s2 = mx.recordio.pack(header2, data)
print('array label:', repr(s2))
In [30]:
# unpack
print(*mx.recordio.unpack(s1))
print(*mx.recordio.unpack(s2))
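To make the difference between a float label and an array label more concrete, here is a standalone sketch of one way such a header could be laid out with the struct module. The layout (uint32 flag, float32 label, two uint64 ids, with an array label stored as float32 values in front of the payload and its length kept in flag) is our reading of how pack behaves and should be treated as an assumption, not as MXNet's implementation. The names Header, sketch_pack and sketch_unpack are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed layout: uint32 flag, float32 label, uint64 id, uint64 id2.
HEADER_FORMAT = 'IfQQ'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
Header = namedtuple('Header', ['flag', 'label', 'id', 'id2'])


def sketch_pack(header, payload):
    """Prepend a fixed-size header (and any array label) to a bytes payload."""
    header = Header(*header)
    if isinstance(header.label, (int, float)):
        header = header._replace(flag=0)
    else:
        labels = [float(x) for x in header.label]
        # Array label: store its length in flag and the values before the payload.
        header = header._replace(flag=len(labels), label=0.0)
        payload = struct.pack('%df' % len(labels), *labels) + payload
    return struct.pack(HEADER_FORMAT, *header) + payload


def sketch_unpack(buf):
    """Split a buffer produced by sketch_pack back into (header, payload)."""
    header = Header(*struct.unpack(HEADER_FORMAT, buf[:HEADER_SIZE]))
    body = buf[HEADER_SIZE:]
    if header.flag > 0:
        n = header.flag
        labels = list(struct.unpack('%df' % n, body[:4 * n]))
        header = header._replace(label=labels)
        body = body[4 * n:]
    return header, body


h1, body1 = sketch_unpack(sketch_pack((0, 1.0, 1, 0), b'data'))
h2, body2 = sketch_unpack(sketch_pack((0, [1.0, 2.0, 3.0], 2, 0), b'data'))
print(h1, body1)
print(h2, body2)
```

Under this assumed layout, a scalar label lives entirely in the header, while an array label grows the record by four bytes per value.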
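pack_img and unpack_img do the same for image data, encoding the image when packing and decoding it when unpacking:
In [36]: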
# pack_img
data = np.ones((3,3,1), dtype=np.uint8)
label = 1.0
header = mx.recordio.IRHeader(flag=0, label=label, id=0, id2=0)
s = mx.recordio.pack_img(header, data, quality=100, img_fmt='.jpg')
print(repr(s))
In [37]:
# unpack_img
print(*mx.recordio.unpack_img(s))