PySchema is a library for declaring Python classes with typed fields that can be introspected and have data contracts associated with them. This allows for better data integrity checks when serializing/deserializing data and for safe interaction with external tools that require typed data.
The foremost design principle when creating the library was to keep the definitions very concise and easy to read. Inspiration was taken from Django's ORM, and the main use cases in mind have been database interaction (Postgres) and Apache Avro schema/datum generation.
It has been tested on Python 2.6 and Python 2.7.
In [1]:
from pyschema import Record, dumps, loads
from pyschema.types import *
In [2]:
class MyRecord(Record):
    foo = Text()
    bar = Integer()
In [3]:
r = MyRecord(foo="hej", bar=3)
In [4]:
r.foo
In [5]:
print r
dumps creates a JSON-compatible string representing the object. A special $schema field is added to the JSON so that the record can be parsed without prior knowledge of which schema to use. The name of this special field can be changed using the pyschema.core.set_schema_name_field function.
In [6]:
s = dumps(r)
print s
In [7]:
o = loads(s)
print o.bar
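The following minimal, self-contained sketch in plain Python (no pyschema import) shows how the $schema field lets a loader work without being told the schema. It is not pyschema's implementation. The registry, the register decorator, the Point class and the dumps_sketch/loads_sketch functions are hypothetical names used only for illustration. The serializer writes the class name under the special key, and the loader reads that key to choose the class.

```python
import json

SCHEMA_FIELD = "$schema"  # default name of the special field

# Hypothetical registry mapping schema names to classes.
SCHEMA_REGISTRY = {}


def register(cls):
    """Class decorator: make cls discoverable by name at load time."""
    SCHEMA_REGISTRY[cls.__name__] = cls
    return cls


@register
class Point(object):
    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y


def dumps_sketch(obj):
    """Serialize obj's attributes plus an entry naming its class."""
    data = dict(vars(obj))
    data[SCHEMA_FIELD] = type(obj).__name__
    return json.dumps(data, sort_keys=True)


def loads_sketch(s):
    """Look up the class named in the special field, then rebuild the object."""
    data = json.loads(s)
    cls = SCHEMA_REGISTRY[data.pop(SCHEMA_FIELD)]
    return cls(**data)


s = dumps_sketch(Point(x=1, y=2))
print(s)  # {"$schema": "Point", "x": 1, "y": 2}
print(loads_sketch(s).y)  # 2
```

Renaming the special field, as set_schema_name_field allows, would amount to changing SCHEMA_FIELD in this sketch.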
PySchema comes with a standard set of field types that can be used to represent the most commonly used data types:
Text
Integer
Float
Bytes
- for binary data, the equivalent of Python < 3 str or Python 3 bytes
Boolean
- True or False
Date
- datetime.date objects
DateTime
- datetime.datetime objects
Enum
- only allows a preset of text values (specified as an argument to the constructor)
List
Map
SubRecord
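The Enum restriction means a value is valid only if it belongs to a fixed set chosen when the field is declared. The sketch below illustrates that check in plain Python. EnumSketch is a hypothetical class, and passing the allowed values as an iterable is an assumption, since the text above does not show Enum's exact constructor signature. The dump/load method names mirror the custom field example later in this notebook.

```python
class EnumSketch(object):
    """Hypothetical Enum-like field: only a preset of text values is valid."""

    def __init__(self, values):
        # The allowed values are fixed once, when the field is declared.
        self.values = frozenset(values)

    def dump(self, obj):
        # Reject anything outside the preset; valid text passes through unchanged.
        if obj not in self.values:
            raise ValueError("%r is not one of %r" % (obj, sorted(self.values)))
        return obj

    def load(self, text):
        # Apply the same check to incoming data.
        return self.dump(text)


color = EnumSketch(["red", "green", "blue"])
print(color.dump("red"))  # red
```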
In [8]:
class RecordWithList(Record):
    foo = List(Integer())
RecordWithList(foo=[1, 2, 3])
In [9]:
class RecordWithMap(Record):
    foo = Map(Boolean())
RecordWithMap(foo={u"word": True})
SubRecord allows records to be nested, i.e. records can be stored as fields of other records. SubRecord takes a single argument: the schema (i.e. the Record class) of the record to be stored. For recursive nesting, pass pyschema.SELF as the schema to SubRecord. The field then accepts records of the enclosing record type.
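A sentinel is needed because the class name is not yet bound while a class body executes. Writing SubRecord(NestedSelfRecord) inside NestedSelfRecord's own body would therefore raise a NameError. One way to resolve such a sentinel is to have a metaclass replace it once the class object exists. The sketch below does this with hypothetical names (SELF here is a local stand-in, SubRecordSketch, SchemaMetaSketch, TreeNode). It illustrates the idea and is not pyschema's actual code.

```python
SELF = object()  # hypothetical sentinel meaning "the record class being declared"


class SubRecordSketch(object):
    """Hypothetical field holding a nested record of the given schema."""

    def __init__(self, schema):
        self.schema = schema


class SchemaMetaSketch(type):
    def __init__(cls, name, bases, attrs):
        super(SchemaMetaSketch, cls).__init__(name, bases, attrs)
        # Only now does the class object exist, so SELF can be swapped for it.
        for value in attrs.values():
            if isinstance(value, SubRecordSketch) and value.schema is SELF:
                value.schema = cls


# Calling the metaclass directly sidesteps the Python 2/3 metaclass syntax difference.
RecordSketch = SchemaMetaSketch("RecordSketch", (object,), {})


class TreeNode(RecordSketch):
    child = SubRecordSketch(SELF)


print(TreeNode.child.schema is TreeNode)  # True
```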
In [10]:
class NestedRecord(Record):
    foo = SubRecord(MyRecord)  # MyRecord is defined above...
NestedRecord(foo=MyRecord(foo="foo", bar=5))
In [11]:
class NestedSelfRecord(Record):
    foo = SubRecord(SELF)
    bar = Text()
NestedSelfRecord(foo=NestedSelfRecord(foo=None, bar="Second"), bar="First")
Complex types are field types just like any other, so they can be nested and combined to create complex data structures.
In [12]:
class Part(Record):
    value = Integer()
    good = Boolean()
    attributes = List(Text())
class AdvancedRecord(Record):
    name = Text()
    parts = Map(SubRecord(Part))
AdvancedRecord(
    name=u"tool_1",
    parts={
        u"moo": Part(
            value=u"buzz",  # note: Part.value is declared Integer(), so dumps() would reject this record
            good=False,
            attributes=["something", "other"]
        )
    }
)
Out[12]:
MyRecord(bar=10)
In [13]:
class OtherRecord(Record):
    bar = Map(Float())
    baz = List(Integer())
In [14]:
OtherRecord()
Type checking happens at serialization time: dumps fails when a field's value doesn't match its declared type.
In [15]:
broken_record = MyRecord(foo=5)  # object creation accepts values of any type (to allow temporarily invalid values)
In [16]:
print broken_record # repr format also still works
In [17]:
print dumps(broken_record)  # raises an exception because 5 isn't text
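The "accept anything at construction, validate on dump" behavior can be sketched in plain Python. This illustrates the idea under stated assumptions and is not pyschema code. TypedFieldSketch and LazyRecord are hypothetical. The sketch raises TypeError, whereas the text above only says that an exception is raised, and it treats None as an unset value.

```python
class TypedFieldSketch(object):
    """Hypothetical field that checks the value's type only when dumping."""

    def __init__(self, python_type):
        self.python_type = python_type

    def dump(self, value):
        # None stands for "unset" and is let through in this sketch.
        if value is not None and not isinstance(value, self.python_type):
            raise TypeError("expected %s, got %r" % (self.python_type.__name__, value))
        return value


class LazyRecord(object):
    _fields = {"foo": TypedFieldSketch(str), "bar": TypedFieldSketch(int)}

    def __init__(self, **kwargs):
        # Construction stores whatever it is given: no validation here.
        for name in self._fields:
            setattr(self, name, kwargs.get(name))

    def to_dict(self):
        # Validation happens here, one field at a time.
        return dict((name, field.dump(getattr(self, name)))
                    for name, field in self._fields.items())


broken = LazyRecord(foo=5)  # accepted
print(broken.foo)  # 5
```

Deferring the check keeps construction cheap and lets a record pass through invalid intermediate states, at the cost of reporting errors later than the point where the bad value was set.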
In [18]:
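import datetime
class Date(Field):
    # Note: this shadows any Date imported from pyschema.types above
    def dump(self, obj):
        # datetime.date -> "YYYY-MM-DD"
        return obj.strftime("%Y-%m-%d")
    def load(self, text):
        # "YYYY-MM-DD" -> datetime.date
        return datetime.date(*(int(part) for part in text.split('-')))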
In [19]:
class MyOtherRecord(Record):
    date = Date()
In [20]:
s = dumps(MyOtherRecord(date=datetime.date(2013, 10, 7)))
print "Serialized:", s
print "Reloaded:", repr(loads(s).date)
In [21]:
Text.postgres_type = "TEXT"
Integer.postgres_type = "INTEGER"
@List.mixin
class ListPostgresMixin:
    @property
    def postgres_type(self):
        return self.field_type.postgres_type + " ARRAY"
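A class decorator such as List.mixin could work by copying the mixin's attributes onto the target field class. The sketch below shows that copying approach in plain Python. FieldBase.mixin, IntegerSketch, ListSketch and ListPostgresSketch are all hypothetical, and pyschema's own mixin mechanism may be implemented differently (for example, by altering base classes).

```python
class FieldBase(object):
    @classmethod
    def mixin(cls, mixin_cls):
        """Hypothetical decorator: copy mixin_cls's public attributes onto cls."""
        for name, value in vars(mixin_cls).items():
            if not name.startswith("__"):
                setattr(cls, name, value)
        return mixin_cls


class IntegerSketch(FieldBase):
    postgres_type = "INTEGER"  # plain class attribute, like Integer.postgres_type above


class ListSketch(FieldBase):
    def __init__(self, field_type):
        self.field_type = field_type  # the element type, e.g. IntegerSketch()


@ListSketch.mixin
class ListPostgresSketch(object):
    @property
    def postgres_type(self):
        # Derived from the element type, so any list of mapped types works.
        return self.field_type.postgres_type + " ARRAY"


print(ListSketch(IntegerSketch()).postgres_type)  # INTEGER ARRAY
```

Copying a property object onto a class makes it behave as a property there too, because properties are descriptors looked up on the class.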
In [22]:
def create_table_from_record(schema):
    parts = []
    for name, field_type in schema._fields.iteritems():
        parts.append("%s %s" % (name, field_type.postgres_type))
    return "CREATE TABLE %s (" % (schema._schema_name,) + ", ".join(parts) + ")"
In [23]:
class MyTable(Record):
    list_name = Text()
    numbers = List(Integer())
create_table_from_record(MyTable)
The following will trigger an error, since we haven't mixed in a postgres_type property for the Map field type in this example.
In [24]:
class Impossibru(Record):
    numbers = Map(Integer())
create_table_from_record(Impossibru)
PySchema uses a Schema metaclass for the Record class that hooks into the class declaration logic of the Python interpreter.
When a subclass of Record is declared, the metaclass goes through the class attributes and creates some helper variables needed for schema introspection and general setup. To preserve the ordering of fields, a counter is incremented every time a Field is declared, and this counter is used as the sorting key in the ordered schema.
The metaclass is responsible for setting up the following magic variables on the schema class:
_fields
- contains an OrderedDict of (name, field) mappings, where name is the field name and field is the Field instance, i.e. the type definition instance for the field. E.g. ("foo", Integer(size=4))
_schema_name
- the name of the schema. Typically the same as the class name.
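The counter technique and the two magic variables can be sketched in plain Python. In Python 2, a class body's namespace is a plain dict, so declaration order is lost by the time a metaclass sees it. A global counter stamped on each field instance recovers that order. OrderedFieldSketch, OrderingMetaSketch, RecordSketch and Example are hypothetical names. This is an illustration, not pyschema's source.

```python
import itertools
from collections import OrderedDict

_declaration_counter = itertools.count()


class OrderedFieldSketch(object):
    def __init__(self):
        # Every field instance remembers when it was created (i.e. declared).
        self._order = next(_declaration_counter)


class OrderingMetaSketch(type):
    def __new__(mcs, name, bases, attrs):
        fields = [(key, value) for key, value in attrs.items()
                  if isinstance(value, OrderedFieldSketch)]
        # Sort by the counter to recover declaration order.
        fields.sort(key=lambda item: item[1]._order)
        attrs["_fields"] = OrderedDict(fields)
        attrs["_schema_name"] = name
        return super(OrderingMetaSketch, mcs).__new__(mcs, name, bases, attrs)


RecordSketch = OrderingMetaSketch("RecordSketch", (object,), {})


class Example(RecordSketch):
    zeta = OrderedFieldSketch()
    alpha = OrderedFieldSketch()
    middle = OrderedFieldSketch()


print(list(Example._fields))  # ['zeta', 'alpha', 'middle']
print(Example._schema_name)  # Example
```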