Protobuf Serialisation

This notebook documents how Acton serialises protobufs.

Protobufs can be serialised and deserialised individually using the built-in methods SerializeToString and ParseFromString:


In [ ]:
# Serialising.
with open(path, 'wb') as proto_file:
    proto_file.write(proto.SerializeToString())

In [ ]:
# Deserialising. (from acton.proto.io)
proto = Proto()
with open(path, 'rb') as proto_file:
    proto.ParseFromString(proto_file.read())

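To serialise multiple protobufs into one file, we serialise each to a bytestring, write the length of this bytestring to the file, then write the bytestring itself. The length prefix is needed because protobufs are not self-delimiting: without it, a reader cannot tell where one message ends and the next begins. We store the length with the struct library as a little-endian unsigned long long (format '<Q'), which always occupies 8 bytes.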


In [ ]:
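import struct

# proto_file is assumed to be open in binary write mode ('wb').
for proto in protos:
    # Serialise first so the length prefix matches the payload exactly.
    serialised = proto.SerializeToString()
    # '<Q' = little-endian unsigned long long (8 bytes).
    proto_file.write(struct.pack('<Q', len(serialised)))
    proto_file.write(serialised)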

We also want to store metadata in the resulting file. This is achieved by encoding the metadata as a bytestring and writing it before we write any protobufs. As with protobufs, we must store the length of the metadata before the metadata itself, and we again use an unsigned long long.
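The layout described above can be sketched as follows. This is a minimal illustration, not Acton's actual implementation: the function name write_records and the example metadata are hypothetical, and plain bytestrings stand in for the output of SerializeToString so that the sketch runs without any generated protobuf classes.

```python
import io
import struct


def write_records(proto_file, metadata, payloads):
    """Write length-prefixed metadata, then length-prefixed payloads.

    Hypothetical helper: `metadata` and each item of `payloads` are
    bytestrings. In practice each payload would be the result of
    proto.SerializeToString().
    """
    # The metadata block comes first, preceded by its 8-byte length.
    proto_file.write(struct.pack('<Q', len(metadata)))
    proto_file.write(metadata)
    # Each payload follows in the same length-then-data layout.
    for payload in payloads:
        proto_file.write(struct.pack('<Q', len(payload)))
        proto_file.write(payload)


# An in-memory buffer stands in for a file opened with mode 'wb'.
buffer = io.BytesIO()
write_records(buffer, b'{"version": 1}', [b'first', b'second'])
```

Because the metadata uses the same length-prefixed layout as the protobufs, a reader can handle both with the same logic and only needs to know that the first block is metadata.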

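Reading the files back in is the inverse of the above: we read 8 bytes, unpack them to recover the length, read that many bytes, and pass them to ParseFromString. Since the metadata is the first block in the file, it must be read (or skipped) before looping over the protobufs.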


In [ ]:
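import struct

# proto_file is assumed to be open in binary read mode ('rb') and
# positioned just after the metadata block.
header = proto_file.read(8)  # 8 bytes = one '<Q' (unsigned long long)
while header:
    length, = struct.unpack('<Q', header)
    proto = Proto()
    proto.ParseFromString(proto_file.read(length))
    header = proto_file.read(8)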
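A fuller reader might also recover the metadata block and guard against truncated files. The sketch below is one way to do that under stated assumptions: read_block and read_records are hypothetical names rather than Acton functions, and the payloads are returned as raw bytestrings, each of which would then be passed to ParseFromString.

```python
import io
import struct

LENGTH_SIZE = struct.calcsize('<Q')  # size of the length prefix: 8 bytes


def read_block(proto_file):
    """Read one length-prefixed block; return None at a clean end of file."""
    header = proto_file.read(LENGTH_SIZE)
    if not header:
        return None
    if len(header) < LENGTH_SIZE:
        raise EOFError('truncated length prefix')
    length, = struct.unpack('<Q', header)
    data = proto_file.read(length)
    if len(data) < length:
        raise EOFError('truncated block')
    return data


def read_records(proto_file):
    """Return (metadata, payloads) from a file in the layout above."""
    metadata = read_block(proto_file)
    if metadata is None:
        raise EOFError('missing metadata block')
    payloads = []
    while True:
        payload = read_block(proto_file)
        if payload is None:  # compare with None: b'' is a valid payload
            break
        payloads.append(payload)
    return metadata, payloads


# Build a small in-memory file, including one empty payload.
demo = io.BytesIO()
for block in (b'meta', b'first', b''):
    demo.write(struct.pack('<Q', len(block)))
    demo.write(block)
demo.seek(0)
metadata, payloads = read_records(demo)
```

The end-of-file test is made on the length prefix rather than on the payload because a message whose fields are all at their default values serialises to zero bytes, so an empty payload is legitimate and must not end the loop.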