I've been playing with the toolz library recently and it's pretty great, especially its implementation of the curry and memoize decorators. However, applying these to classes creates a problem: inheritance. A quick example before we delve into solving this problem:
In [1]:
from toolz import curry, memoize

@curry
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return "Person(name={!r}, age={!r})".format(self.name, self.age)

p = Person(name='alec')
p(age=26)
Out[1]:
Currying a class we don't expect to inherit from is easy. However, if someone comes along and says, "I'd like to create a subclass," then there's an issue...
In [2]:
class PersonWithHobby(Person):
    def __init__(self, name, age, hobby):
        super(PersonWithHobby, self).__init__(name, age)
        self.hobby = hobby
Oh...that's a problem. With this decorator, we instead have to inherit from Person.func, just like if we had partialled the class manually:
In [3]:
class PersonWithHobby(Person.func):
    def __init__(self, name, age, hobby):
        super(PersonWithHobby, self).__init__(name, age)
        self.hobby = hobby
But if you're anything like me, that inheritance line is...bothersome. Because we're locking up the original class inside the curry decorator, there's no clean way to get it out and just inherit from it other than reaching into the decorator's attributes. I tried for about six hours and ended up roping some folks at /r/learnpython into that mess as well.
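To make that concrete, here's a small sketch of my own (Widget is a made-up stand-in, not from the original post) showing that curry keeps the wrapped class on its .func attribute and otherwise behaves much like a manual partial:

from functools import partial
from toolz import curry

class Widget(object):
    def __init__(self, size):
        self.size = size

CurriedWidget = curry(Widget)

# the original class is tucked away on the curry object's .func attribute
assert CurriedWidget.func is Widget

# once enough arguments are supplied, both routes build the same kind of object
assert CurriedWidget(size=3).size == partial(Widget, size=3)().size == 3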
Using the memoize decorator presents the same issue. Instead, what we'd probably like to do is not only inherit from the class, but retain its currying or memoizing characteristics.
Ooooh, the scary "M" word. Tim Peters once said,
Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why).
That's a pretty big warning to attach to something. Metaclasses are deep magic, but it's relatively straightforward magic. If you're unsure about what a metaclass is, check out Eli Bendersky's Python metaclasses by example for an overview. But the short of it is this:
type is our magic that takes a class body and makes it an object. This is the default metaclass for all of our classes. However, we don't want vanilla Python classes. We'd like to have classes that are curryable or can be memoized.
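If that's still fuzzy, here's a tiny illustration of my own (Point is an invented name) of type doing its default job:

# type() is the default metaclass: give it a name, a tuple of bases,
# and a namespace dictionary and it hands back a class
Point = type('Point', (object,), {'x': 0, 'y': 0})

assert isinstance(Point, type)
assert Point().x == 0

# a normal class statement is just sugar for that same call
class SamePoint(object):
    x = 0
    y = 0

assert type(SamePoint) is type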
The problem with using a decorator is that it'll happily apply to the class it decorates, but it won't carry down an entire inheritance chain. A metaclass, however, will.
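As a quick sketch of my own (Tracked, Base, and Child are invented names), a metaclass rides along with every subclass without any re-decoration:

class Tracked(type):
    registry = []

    def __new__(mcls, name, bases, attrs):
        cls = super(Tracked, mcls).__new__(mcls, name, bases, attrs)
        Tracked.registry.append(name)
        return cls

class Base(metaclass=Tracked):
    pass

class Child(Base):  # no decorator or metaclass declaration needed here
    pass

assert Tracked.registry == ['Base', 'Child']
assert type(Child) is Tracked  # the subclass inherited the metaclass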
Both of the things I'm about to show could be achieved by overriding __new__ on a regular class. However, that's no fun. And since metaclasses are more about creating classes than instances, I couldn't blame you if you created CurryableMixin and MemoizedMixin classes and called it a day. However, I find cooperative multiple inheritance that uses __new__ difficult to manage, especially because object.__new__ accepts no arguments other than the class it's instantiating. Some sort of sink would be needed to strip off any extra kwargs passed along, then you have to consider whether an immutable base comes later in the chain and whether it needs any of those kwargs, and it quickly turns into a mess if you try to capture all the corner cases.
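For the curious, here's roughly what I imagine that route looking like; this is my own sketch (KwargSink, MemoizedMixin, and Frobber are invented names), not code from the original post:

# object.__new__ refuses extra arguments once __new__ is overridden,
# so something in the MRO has to act as a sink for them
class KwargSink(object):
    def __new__(cls, *args, **kwargs):
        # swallow everything before it reaches object.__new__
        return super(KwargSink, cls).__new__(cls)

class MemoizedMixin(KwargSink):
    _cache = {}

    def __new__(cls, *args, **kwargs):
        key = (cls, args, frozenset(kwargs.items()))
        if key not in cls._cache:
            cls._cache[key] = super(MemoizedMixin, cls).__new__(cls, *args, **kwargs)
        return cls._cache[key]

class Frobber(MemoizedMixin):
    def __init__(self, frob):
        self.frob = frob

assert Frobber(1) is Frobber(1)
# without KwargSink in the chain, object.__new__ would be handed the extra
# argument and raise a TypeError

It works, but every extra base in the chain has to agree on who strips what, which is exactly the mess described above.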
This is actually the simpler of the two metaclasses to write, surprisingly. Instead of doing any magic in __new__ or __init__, we simply override __call__. Here, __call__ is analogous to __new__ in a regular class: it handles instantiation of an actual object rather than class creation.
In [4]:
from functools import wraps

class Curryable(type):
    # one level up from classes
    # cls here is the actual class we've created already
    def __call__(cls, *args, **kwargs):
        # we'd like to preserve metadata but not migrate
        # the underlying dictionary
        @wraps(cls, updated=[])
        # distinguish between what was passed to __call__
        # and what was passed to currier
        def currier(*a, **k):
            return super(Curryable, cls).__call__(*a, **k)
        # there's sometimes odd behavior if this isn't done
        return curry(currier)(*args, **kwargs)
In [5]:
class Person(metaclass=Curryable):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return "Person(name={!r}, age={!r})".format(self.name, self.age)
In [6]:
p = Person(name='alec')
In [7]:
p(age=26)
Out[7]:
curry guards against type errors: it catches the TypeError from an incomplete call and returns a partially applied callable instead, letting us keep applying arguments until the call succeeds. This also allows us to build inheritance chains where we simply pass kwargs up to the next class:
In [8]:
class PersonWithHobby(Person):
    # as an example only; it's still best practice to declare required parameters
    def __init__(self, hobby, **kwargs):
        super(PersonWithHobby, self).__init__(**kwargs)
        self.hobby = hobby

    def __repr__(self):
        return "Person(name={!r}, age={!r}, hobby={!r})".format(self.name, self.age, self.hobby)

p = PersonWithHobby(hobby='coding')
In [9]:
p(name='alec', age=26)
Out[9]:
By default, memoize will attempt to do the right thing. However, it uses the inspect module to determine whether the function takes keyword arguments, and this can act oddly sometimes: if memoize doesn't detect keyword arguments, it'll only memoize positional arguments. Instead, we'd like to always memoize both. We could go further and attempt to bind positional arguments to their parameter names, but for now, this will suffice.
In [10]:
def default_cache_key(args, kwargs):
    return (args or None, frozenset(kwargs.items()) or None)
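A couple of illustrative calls (mine, not from the original notebook) show the shape of the keys it produces:

# positional-only call: the kwargs half collapses to None
assert default_cache_key((1, 2), {}) == ((1, 2), None)

# keyword-only call: the args half collapses to None
assert default_cache_key((), {'name': 'alec'}) == (None, frozenset({('name', 'alec')}))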
If you're not familiar with descriptors, I recommend Chris Beaumont's Python Descriptors Demystified and Simeon Franklin's descriptor talk.
We'll also have to centralize our cache so we can control it. However, this presents a problem: if we have two memoized classes, they shouldn't be able to poke at each other's caches, so simply setting a dictionary on the metaclass won't work. Rather, we need each class to access only its particular cache, and actual instances of the class probably shouldn't have direct access to the cache either, since their only business with it is existing in it. Overriding a class's cache should also affect the master cache so the two remain consistent, and deleting a class's cache should simply pop it from the master cache.
With that in mind, we can write a descriptor that wraps any key-value store and returns either the whole store, if it's the metaclass accessing it, or just that class's cache, if it's a memoized class accessing it. Since we're one level up from classes and instances, I've commented which parameters correspond to the class and metaclass.
In [11]:
class HybridValueStore(object):
    def __init__(self, valuestore):
        self.valuestore = valuestore

    #           |+-----------------> The Descriptor Instance
    #           ||    |+-----------> The Memoized Class
    #           ||    ||    |+-----> The Metaclass
    def __get__(self, inst, cls):
        if inst is None:
            return self.valuestore
        else:
            return self.valuestore[inst]

    def __set__(self, inst, value):
        self.valuestore[inst] = value

    def __delete__(self, inst):
        self.valuestore.pop(inst, None)
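Before wiring it into a metaclass, here's a standalone sanity check of my own; Owner is just a plain class standing in for the metaclass, so its instance plays the role the memoized class will play later:

class Owner(object):
    store = HybridValueStore({})

o = Owner()
o.store = 'mine'                   # __set__ keys the value by the accessing object
assert o.store == 'mine'           # __get__ with an instance returns just its slice
assert Owner.store == {o: 'mine'}  # __get__ with inst=None returns the whole store

del o.store                        # __delete__ pops just this object's entry
assert Owner.store == {}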
In [12]:
from toolz import memoize

class Memoized(type):
    cache = HybridValueStore({})
    cache_key = HybridValueStore({})

    def __new__(mcls, name, bases, attrs, **kwargs):
        return super(Memoized, mcls).__new__(mcls, name, bases, attrs)

    def __init__(cls, name, bases, attrs, key=default_cache_key, cache=None):
        if cache is None:
            cache = {}
        cls.cache = cache
        cls.cache_key = key
        super(Memoized, cls).__init__(name, bases, attrs)

    def __call__(cls, *args, **kwargs):
        @memoize(cache=cls.cache, key=cls.cache_key)
        def memoizer(*a, **k):
            return super(Memoized, cls).__call__(*a, **k)
        return memoizer(*args, **kwargs)
The master cache is implemented with HybridValueStore wrapping a regular dictionary that we add further mappings to. Since we've provided a __set__ method, we can use a normal dictionary rather than something like defaultdict, which provides just-in-time access to keys. We use the same approach for the cache keys. Originally, I had planned on storing the key function on the class's cache itself but, seeing as dict can't host arbitrary attributes, that plan fell through. Rather, storing the key alongside the cache as a separate attribute seems to function just fine.
__new__ is where things start to get strange. In addition to the normal parameters it accepts, there's also **kwargs. This is to allow passing keyword arguments to the metaclass, which we'll see in a moment; class keywords are handed to both __new__ and __init__, so __new__ needs to accept them and strip them off before delegating to type. __init__ is where the extra keywords come into play:

- key is the function we'll use to create cache keys; it defaults to the function described above.
- cache is the mapping for storing instances. If it's not provided, it simply defaults to a regular dictionary. However, this allows using something like weakref.WeakValueDictionary or another specialized mapping as the container instead.

Both of these are simply stored on the instance of the metaclass (which is the created class) but, interestingly, they aren't available to the instances created from the class.
And finally, __call__ is where the memoization actually happens. A memoizing wrapper is created with the class's cache and cache key function, and the actual object instantiation is delegated to the next metaclass in the MRO (typically type).
After all that, let's see this bad boy do its work. Two simple classes will do.
In [13]:
class Frob(metaclass=Memoized):
    def __init__(self, frob):
        self.frob = frob

    def __repr__(self):
        return "Frob({})".format(self.frob)

# simply here to show HybridValueStore's fine-grained access
class Dummy(metaclass=Memoized):
    def __init__(self, *args, **kwargs):
        pass

    def __repr__(self):
        return "Dummy"

f = Frob(1)
d = Dummy()

assert f is Frob(1), "guess it didn't work"
That went well. Let's see some other parts in action:
In [14]:
print("Master Cache: ", Memoized.cache)
print("Frob Cache: ", Frob.cache)
print("Dummy Cache: ", Dummy.cache)
Good to see the fine-grained access to the cache attribute is working. How about if we reset the cache for Frob?
In [15]:
Frob.cache = {}
print("Master Cache: ", Memoized.cache)
print("Frob Cache: ", Frob.cache)
print("Dummy Cache: ", Dummy.cache)
Awesome. Now, about those curious keyword arguments that we can pass to the metaclass...but how? It's simple: we pass them the same way the metaclass itself is declared (at least in Python 3):
In [16]:
from collections import OrderedDict

def make_string_key(args, kwargs):
    return str(args) + str(kwargs)

class KeywordTest(metaclass=Memoized, key=make_string_key, cache=OrderedDict()):
    def __init__(self, *args, **kwargs):
        pass

kwt1 = KeywordTest(1, 2, 3)
kwt2 = KeywordTest(4, 5, 6)
In [17]:
print(KeywordTest.cache)
Now we have a cache that keeps track of the order in which its values were created.
Something curious about this setup is that instances of the memoized class can't access the cache.
In [18]:
f.cache
Which is very handy, considering we probably don't want instances accidentally mucking about and overwriting the cache. This works because attribute lookup on an instance only consults the instance itself and its class's MRO; it never climbs up to the metaclass, which is where the cache actually lives.
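A quick check of my own, reusing Frob and f from above, makes the point:

assert hasattr(Frob, 'cache')        # the class sees it via the metaclass descriptor
assert 'cache' not in Frob.__dict__  # __set__ routed the value into the descriptor
assert not hasattr(f, 'cache')       # instances never look up to the metaclass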
What if we wanted currying and memoization on the same class? That seems impossible, since Python only allows a class one metaclass (more precisely, a class's metaclass must be a subclass of the metaclasses of all its bases). However, since metaclasses are just regular classes, we can compose them together to form much more complex metaclasses.
Notice how I was using super to call things like __new__, __init__, and __call__ above, rather than explicitly writing type.__new__? That was to allow for this exact thing. With that already in place, all we need to do to create a curried and memoized class is place the two metaclasses together.
However, there is one thing that needs to be noted: order matters. Watch Raymond Hettinger's PyCon 2015 talk Super Considered Super to see why. If we want to curry then memoize, we simply do this:
In [19]:
class CurriedMemoized(Curryable, Memoized):
    pass

class CMTester(metaclass=CurriedMemoized):
    def __init__(self, *args, **kwargs):
        pass
So far so good. Let's test it out...
In [20]:
CMTester(1, 2, 3)
print(CMTester.cache)
What about taking advantage of Memoized's keyword arguments?
In [21]:
class CMKeywordTest(metaclass=CurriedMemoized, key=make_string_key, cache=OrderedDict()):
    def __init__(self, *args, **kwargs):
        pass

CMKeywordTest(1, 2, 3)
CMKeywordTest(4, 5, 6)
print(CMKeywordTest.cache)
Now, if we had swapped Memoized and Curryable around in the MRO, we'd get completely different behavior:
In [22]:
class MemoizedCurry(Memoized, Curryable):
    pass

class MCTest(metaclass=MemoizedCurry):
    def __init__(self, name, frob):
        pass

m = MCTest(name='default frob')
m(frob=1)
print(MCTest.cache)
In this case, we're memoizing the partial application itself rather than the fully constructed instance. Here that's probably undesired behavior, but with other metaclasses, this might be the intended order of operations.
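If you want to see the ordering explicitly, the two composites' MROs (my own check, not output from the original notebook) tell the story; whichever metaclass comes first gets its __call__ run first and delegates the rest via super:

# CurriedMemoized curries first, so memoization sees the fully applied call
print([klass.__name__ for klass in CurriedMemoized.__mro__])
# ['CurriedMemoized', 'Curryable', 'Memoized', 'type', 'object']

# MemoizedCurry memoizes first, so it caches the partial application
print([klass.__name__ for klass in MemoizedCurry.__mro__])
# ['MemoizedCurry', 'Memoized', 'Curryable', 'type', 'object']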