I've been playing with the toolz library recently and it's pretty great, especially their implementation of the curry and memoize decorators. However, applying these to classes creates a problem: inheritance. A quick example before we delve into solving this problem:


In [1]:
from toolz import curry, memoize

@curry
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return "Person(name={!r}, age={!r})".format(self.name, self.age)
        
p = Person(name='alec')
p(age=26)


Out[1]:
Person(name='alec', age=26)

Currying a class we don't expect to inherit from is easy. However, if someone comes along and says, "I'd like to create a subclass" then there's an issue...


In [2]:
class PersonWithHobby(Person):
    def __init__(self, name, age, hobby):
        super(PersonWithHobby, self).__init__(name, age)
        self.hobby = hobby


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-12f89c179297> in <module>()
----> 1 class PersonWithHobby(Person):
      2     def __init__(self, name, age, hobby):
      3         super(PersonWithHobby, self).__init__(name, age)
      4         self.hobby = hobby

/home/justanr/contrib/toolz/toolz/functoolz.py in __init__(self, func, *args, **kwargs)
    155     def __init__(self, func, *args, **kwargs):
    156         if not callable(func):
--> 157             raise TypeError("Input must be callable")
    158 
    159         # curry- or functools.partial-like object?  Unpack and merge arguments

TypeError: Input must be callable

Oh...that's a problem. Instead, we have to inherit from Person.func in the case of this decorator. Just like if we had partialled the class manually:


In [3]:
class PersonWithHobby(Person.func):
    def __init__(self, name, age, hobby):
        super(PersonWithHobby, self).__init__(name, age)
        self.hobby = hobby

But if you're anything like me that inheritance line is...bothersome. Because we're locking up the original class in the curry decorator, there's no clean way to get it out and just inherit from it other than accessing the decorator's attributes themselves. I tried for about six hours and ended up roping some folks at /r/learnpython into that mess as well.

Using the memoize decorator presents the same issue. Instead, what we'd probably like to do is not only inherit from the class, but retain its currying or memoizing characteristics.

Metaclasses are to classes as decorators are to functions

Ooooh, the scary "M" word. Tim Peters once said,

Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why).

That's a pretty big warning to attach to something. Metaclasses are deep magic, but it's relatively straightforward magic. If you're unsure about what a metaclass is, check out Eli Bendersky's Python metaclasses by example for an overview. But the short of it is this:

  • Everything in Python is an Object (including: functions, classes and even modules)
  • Classes make objects.
  • If classes are objects, too, it stands to reason that there's a class that makes them
  • That class-making-class is called type.

type is our magic that takes a class body and makes it an object. This is the default metaclass for all of our classes. However, we don't want vanilla Python classes. We'd like to have classes that are curryable or can be memoized.
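
To make the list above concrete, here is a minimal sketch of type acting as the class-making class. The names Plain and Dynamic are made up for illustration; the three-argument form type(name, bases, namespace) is the standard way to build a class by hand.

```python
# A plain class, created the usual way with a class statement.
class Plain(object):
    greeting = "hello"

# An equivalent class built by calling type directly:
# type(name, bases, namespace) returns a brand new class object.
Dynamic = type("Dynamic", (object,), {"greeting": "hello"})

print(type(Plain()) is Plain)  # True: instances are made by their class
print(type(Plain) is type)     # True: classes are made by type
print(type(Dynamic) is type)   # True
print(Dynamic().greeting)      # hello
```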

The problem with using a decorator is that it'll happily apply to the immediate object, but it generally won't apply to an entire inheritance chain. But a metaclass will.
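
As a rough illustration of that last point, the sketch below uses a hypothetical metaclass (Recording, not part of toolz or of the code later in this post) that simply notes every class it creates. The subclass never mentions the metaclass, yet it is still built by it.

```python
class Recording(type):
    """Hypothetical metaclass that records the name of every class it creates."""
    created = []

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        Recording.created.append(name)

class Base(metaclass=Recording):
    pass

class Child(Base):  # no metaclass mentioned here
    pass

print(type(Child) is Recording)  # True: the metaclass followed the inheritance chain
print(Recording.created)         # ['Base', 'Child']
```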

An Aside: __new__ vs a Metaclass

Both of what I'm about to show could be achieved by overriding __new__ on a regular class. However, that's no fun. Though, since metaclasses are more about creating classes than instances, I couldn't blame you if you created CurryableMixin and MemoizedMixin classes and called it a day.

However, I find cooperative multiple inheritance that uses __new__ difficult to manage, especially because object.__new__ accepts no arguments other than the class it makes the object out of. Some sort of sink would be needed to strip off any extra kwargs passed along. Then you have to consider whether an immutable base class comes later in the chain and whether it needs any of those kwargs. It quickly turns into a mess if you try to capture all the corner cases.
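
A small sketch of the problem, using made-up class names (Forwarding, KwargSink, Sunk). It only shows the simplest case: a __new__ that forwards everything to object.__new__ fails, while a sink that drops the extras works. It does not attempt the immutable-base corner cases mentioned above.

```python
class Forwarding(object):
    """Naively forwards every argument up the chain from __new__."""
    def __new__(cls, *args, **kwargs):
        # object.__new__ refuses extra arguments when __new__ is overridden
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, name):
        self.name = name

class KwargSink(object):
    """Hypothetical sink: swallows the extras before they reach object.__new__."""
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

class Sunk(KwargSink):
    def __init__(self, name):
        self.name = name

try:
    Forwarding(name="alec")
except TypeError as exc:
    failure = exc
else:
    failure = None

sunk = Sunk(name="alec")
print(type(failure).__name__)  # TypeError
print(sunk.name)               # alec
```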

Curryable Metaclass

This is actually the simpler of the two metaclasses to write, surprisingly. Instead of doing any magic in __new__ or __init__, we simply override __call__. In this case __call__ is analogous to __new__ in a regular class: it handles instantiation of an actual object, rather than class creation.
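
Before the real thing, a minimal sketch of where a metaclass's __call__ sits in the instantiation process. Tracing and Widget are hypothetical names used only here; the point is that the metaclass's __call__ runs first and that type.__call__ is what goes on to invoke the class's __new__ and __init__.

```python
calls = []

class Tracing(type):
    """Hypothetical metaclass that records each step of instantiation."""
    def __call__(cls, *args, **kwargs):
        calls.append("metaclass __call__")
        # type.__call__ runs cls.__new__ and then cls.__init__
        return super().__call__(*args, **kwargs)

class Widget(metaclass=Tracing):
    def __new__(cls, *args, **kwargs):
        calls.append("class __new__")
        return super().__new__(cls)

    def __init__(self, size):
        calls.append("class __init__")
        self.size = size

w = Widget(3)
print(calls)  # ['metaclass __call__', 'class __new__', 'class __init__']
```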


In [4]:
from functools import wraps

class Curryable(type):
    # one level up from classes
    # cls here is the actual class we've created already
    def __call__(cls, *args, **kwargs):
        # we'd like to preserve metadata but not migrate
        # the underlying dictionary
        @wraps(cls, updated=[])
        # distinguish between what was passed to __call__
        # and what was passed to currier
        def currier(*a, **k):
            return super(Curryable, cls).__call__(*a, **k)
        # there's sometimes odd behavior if this isn't done
        return curry(currier)(*args, **kwargs)

In [5]:
class Person(metaclass=Curryable):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return "Person(name={!r}, age={!r})".format(self.name, self.age)

In [6]:
p = Person(name='alec')

In [7]:
p(age=26)


Out[7]:
Person(name='alec', age=26)

curry guards against type errors, allowing us to repeatedly apply arguments until we get something that doesn't throw a TypeError. This also allows us to build inheritance chains where we simply pass up kwargs to the next class:


In [8]:
class PersonWithHobby(Person):
    # as an example only; it's still best practice to declare required parameters
    def __init__(self, hobby, **kwargs):
        super(PersonWithHobby, self).__init__(**kwargs)
        self.hobby = hobby

    def __repr__(self):
        return "Person(name={!r}, age={!r}, hobby={!r})".format(self.name, self.age, self.hobby)
        
p = PersonWithHobby(hobby='coding')

In [9]:
p(name='alec', age=26)


Out[9]:
Person(name='alec', age=26, hobby='coding')
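
The "keep applying arguments until the call stops raising TypeError" idea can be sketched without toolz. The function below, simple_curry, is a deliberately simplified stand-in written for this post and is not how toolz.curry is actually implemented. In particular, it treats every TypeError as "not enough arguments yet", including ones raised inside the function body.

```python
def simple_curry(func, *args, **kwargs):
    """Hypothetical, simplified stand-in for toolz.curry."""
    def attempt(*more_args, **more_kwargs):
        merged_args = args + more_args
        merged_kwargs = dict(kwargs, **more_kwargs)
        try:
            return func(*merged_args, **merged_kwargs)
        except TypeError:
            # Assume arguments are still missing: remember what we
            # have so far and wait for the rest.
            return simple_curry(func, *merged_args, **merged_kwargs)
    return attempt

def make_person(name, age):
    return (name, age)

step = simple_curry(make_person)(name='alec')
print(callable(step))  # True: still waiting for age
print(step(age=26))    # ('alec', 26)
```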

Memoizing Metaclass

This one is quite a bit more difficult to write. Instead of just overriding __call__, we need to override __init__ as well and provide a key-value store. Instead of throwing a bunch of code at you all at once, I'd rather dissect it bit by bit:

default_cache_key

By default, memoize will attempt to do the right thing. However, it uses the inspect module to determine if there are keyword arguments. This can act oddly sometimes, and if memoize doesn't detect keyword arguments, it'll only memoize positional arguments. Instead, we'd like to always memoize both. We could go further and attempt to bind positional arguments to their actual parameter names, but for now, this will suffice.


In [10]:
def default_cache_key(args, kwargs):
    return (args or None, frozenset(kwargs.items()) or None)
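
A quick look at the keys this function produces may help. The function is repeated here so the snippet runs on its own. Note two consequences: keyword order doesn't matter, but the same value passed positionally and by keyword yields two different keys, which is the name-binding limitation mentioned above.

```python
def default_cache_key(args, kwargs):
    return (args or None, frozenset(kwargs.items()) or None)

print(default_cache_key((1,), {}))  # ((1,), None)
print(default_cache_key((), {}))    # (None, None)

# Keyword order is irrelevant because the items end up in a frozenset.
same = (default_cache_key((), {'a': 1, 'b': 2})
        == default_cache_key((), {'b': 2, 'a': 1}))
print(same)  # True

# Positional and keyword spellings of the same call are different keys.
different = default_cache_key((1,), {}) != default_cache_key((), {'frob': 1})
print(different)  # True
```

Because the key is used as a dictionary key, every argument value has to be hashable.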

HybridValueStore

If you're not familiar with descriptors, I recommend Chris Beaumont's Python Descriptors Demystified and Simeon Franklin's Descriptor talk.

We'll also have to centralize our cache so we can control it. However, this presents a problem. If we have two memoized classes, they shouldn't be able to poke at each other's caches, so simply setting a dictionary on the metaclass won't work. Rather, we need to allow each class to access only its particular cache. Actual instances of the class probably shouldn't have access to the cache directly either, since their only business with it is existing in it.

And overriding a class's cache should also affect the master cache as well so the two remain consistent. And deleting a class's cache simply pops it from the master cache.

With that in mind, we can write a descriptor that wraps any key-value store and returns the whole store if it's accessed through the metaclass or, if it's accessed through a memoized class, just that class's cache. Since we're one level up from classes and instances, I've commented which parameters correspond to the class and metaclass.


In [11]:
class HybridValueStore(object):
    def __init__(self, valuestore):
        self.valuestore = valuestore
        
            #   |+------------------> The Descriptor Instance
            #   |     |+------------> The Memoized Class
            #   |     |     |+------> The Metaclass
    def __get__(self, inst, cls):
        if inst is None:
            return self.valuestore
        else:
            return self.valuestore[inst]
    
    def __set__(self, inst, value):
        self.valuestore[inst] = value
    
    def __delete__(self, inst):
        self.valuestore.pop(inst, None)
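
To see the descriptor on its own, here is a small sketch that attaches it to a throwaway metaclass. Registry, A and B are hypothetical names used only for this demonstration, and the descriptor is repeated so the snippet is self-contained.

```python
class HybridValueStore(object):
    def __init__(self, valuestore):
        self.valuestore = valuestore

    def __get__(self, inst, cls):
        if inst is None:
            return self.valuestore
        return self.valuestore[inst]

    def __set__(self, inst, value):
        self.valuestore[inst] = value

    def __delete__(self, inst):
        self.valuestore.pop(inst, None)

class Registry(type):
    """Hypothetical metaclass used only to exercise the descriptor."""
    notes = HybridValueStore({})

class A(metaclass=Registry):
    pass

class B(metaclass=Registry):
    pass

A.notes = ["first"]    # routed to __set__, stored under the key A
B.notes = ["second"]

print(A.notes)                # ['first']
print(len(Registry.notes))    # 2: the metaclass sees the whole store
del A.notes                   # routed to __delete__
print(A in Registry.notes)    # False
print(B.notes)                # ['second']
```

One thing to be aware of with this design: after the deletion, reading A.notes raises KeyError rather than AttributeError, since __get__ indexes straight into the underlying dictionary.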

Actual Metaclass

Now, with those two out of the way, we can actually put the pieces together.


In [12]:
from toolz import memoize

class Memoized(type):
    cache = HybridValueStore({})
    cache_key = HybridValueStore({})
    
    def __new__(mcls, name, bases, attrs, **kwargs):
        return super(Memoized, mcls).__new__(mcls, name, bases, attrs)
   
    def __init__(cls, name, bases, attrs, key=default_cache_key, cache=None):
        if cache is None:
            cache = {}
        cls.cache = cache
        cls.cache_key = key
        super(Memoized, cls).__init__(name, bases, attrs)
    
    def __call__(cls, *args, **kwargs):
        @memoize(cache=cls.cache, key=cls.cache_key)
        def memoizer(*a, **k):
            return super(Memoized, cls).__call__(*a, **k)
        return memoizer(*args, **kwargs)

The master cache is implemented with HybridValueStore wrapping a regular dictionary, to which we add further mappings. Since we've provided a __set__ method, we can use a normal dictionary rather than something like defaultdict, which provides just-in-time access to keys.

We use the same approach for the cache keys. Originally, I had planned on storing the key on the class's cache, but seeing as dict can't host arbitrary attributes, that plan fell through. Rather, storing the key alongside the cache as a separate attribute seems to function just fine.

__new__ is where things start to get strange. In addition to the normal parameters it accepts, there's also **kwargs. This is to allow passing keyword arguments to the metaclass, which we'll see in a moment. In __init__ is where the extra keywords come into play:

  • key is the function we'll use to create cache keys and defaults to the function described above,
  • cache is the mapping for storing instances. If it's not provided, it simply defaults to a regular dictionary. However, this allows using things like weakref.WeakValueDictionary or another specialized mapping as the container rather than a regular dictionary.
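
The keyword-argument mechanics are easier to see on a stripped-down metaclass. Tagged and its tag keyword are invented for this sketch. It follows the same pattern as Memoized: __new__ accepts and drops the extra keywords, and __init__ consumes them.

```python
class Tagged(type):
    """Hypothetical metaclass that accepts a `tag` keyword."""
    def __new__(mcls, name, bases, attrs, **kwargs):
        # Don't forward the extra keywords; they would be rejected
        # further up the chain.
        return super().__new__(mcls, name, bases, attrs)

    def __init__(cls, name, bases, attrs, tag="untagged"):
        super().__init__(name, bases, attrs)
        cls.tag = tag

class Default(metaclass=Tagged):
    pass

class Custom(metaclass=Tagged, tag="special"):
    pass

class Child(Custom):  # passes no keywords of its own
    pass

print(Default.tag)  # untagged
print(Custom.tag)   # special
print(Child.tag)    # untagged
```

The last line is worth noting: keywords belong to the class statement that supplies them, so a subclass that passes none gets the defaults again. Applied to Memoized, that means a subclass gets a fresh cache and the default key function unless it passes its own.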

Both of these are simply stored on the instance of the metaclass (which is the created class) but, interestingly, they aren't available to the instances created from the class.

And finally, __call__ is where the memoization actually happens. A wrapper is created to memoize and provided with the class's cache and cache key function and the actual object instantiation is delegated to the next metaclass in the MRO (typically type).

In Action

After all that, let's see this bad boy do its work. Two simple classes will do.


In [13]:
class Frob(metaclass=Memoized):
    def __init__(self, frob):
        self.frob = frob
    
    def __repr__(self):
        return "Frob({})".format(self.frob)

# simply here to show HybridValueStore's fine grained access
class Dummy(metaclass=Memoized):
    def __init__(self, *args, **kwargs):
        pass
    
    def __repr__(self):
        return "Dummy"
    
f = Frob(1)
d = Dummy()
assert f is Frob(1), "guess it didn't work"

That went well. Let's see some other parts in action:


In [14]:
print("Master Cache: ", Memoized.cache)
print("Frob   Cache: ", Frob.cache)
print("Dummy  Cache: ", Dummy.cache)


Master Cache:  {<class '__main__.Dummy'>: {(None, None): Dummy}, <class '__main__.Frob'>: {((1,), None): Frob(1)}}
Frob   Cache:  {((1,), None): Frob(1)}
Dummy  Cache:  {(None, None): Dummy}

Good to see the fine-grained access to the cache attribute is working. How about if we reset the cache for Frob?


In [15]:
Frob.cache = {}
print("Master Cache: ", Memoized.cache)
print("Frob   Cache: ", Frob.cache)
print("Dummy  Cache: ", Dummy.cache)


Master Cache:  {<class '__main__.Dummy'>: {(None, None): Dummy}, <class '__main__.Frob'>: {}}
Frob   Cache:  {}
Dummy  Cache:  {(None, None): Dummy}

Awesome. Now, there were those curious keyword arguments that we can pass to the metaclass...but how? It's simple: we pass them the same way metaclasses are declared (at least in Python 3):


In [16]:
from collections import OrderedDict

def make_string_key(args, kwargs):
    return str(args) + str(kwargs)

class KeywordTest(metaclass=Memoized, key=make_string_key, cache=OrderedDict()):
    def __init__(self, *args, **kwargs):
        pass

kwt1 = KeywordTest(1, 2, 3)
kwt2 = KeywordTest(4, 5, 6)

In [17]:
print(KeywordTest.cache)


OrderedDict([('(1, 2, 3){}', <__main__.KeywordTest object at 0x7fcdb018f358>), ('(4, 5, 6){}', <__main__.KeywordTest object at 0x7fcdb018f908>)])

Now we have a cache that keeps track of the order in which its values were created.

Something curious about this setup is that instances of the memoized class can't access the cache.


In [18]:
f.cache


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-e5f688beb613> in <module>()
----> 1 f.cache

AttributeError: 'Frob' object has no attribute 'cache'

Which is very handy considering we probably don't want instances accidentally mucking about and overwriting the cache.
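
The reason is attribute lookup: an instance searches its own class and that class's bases, but never the metaclass. A minimal sketch with made-up names (WithLabel, Thing):

```python
class WithLabel(type):
    """Hypothetical metaclass carrying an attribute of its own."""
    label = "set on the metaclass"

class Thing(metaclass=WithLabel):
    pass

print(Thing.label)                # set on the metaclass
print("label" in vars(Thing))     # False: nothing stored on the class itself
print(hasattr(Thing(), "label"))  # False: instances never consult the metaclass
```

In Memoized the same thing happens with cache. Because HybridValueStore defines __set__, the assignment cls.cache = cache in __init__ is routed to the descriptor and lands in the shared store rather than in the class's own namespace, so there is nothing for an instance to find.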

Currying AND Memoization Metaclass

What if we wanted currying and memoization on the same class? Seems impossible since Python imposes a restriction of one metaclass per inheritance chain. However, since metaclasses are just regular classes, we can compose them together to form much more complex metaclasses.

Notice how I was using super to call to things like __new__, __init__ and __call__ above rather than explicitly saying, type.__new__? This was to allow for this exact thing. With that already in place, all we need to do to create a curried and memoized class is to just place those two metaclasses together.

However, there is one thing that needs to be noted: order matters. See Raymond Hettinger's PyCon 2015 talk Super Considered Super to see why. If we want to curry then memoize, we simply do this:


In [19]:
class CurriedMemoized(Curryable, Memoized):
    pass

class CMTester(metaclass=CurriedMemoized):
    def __init__(self, *args, **kwargs):
        pass

So far so good. Let's test it out...


In [20]:
CMTester(1, 2, 3)
print(CMTester.cache)


{((1, 2, 3), None): <__main__.CMTester object at 0x7fcdb011c438>}

What about taking advantage of Memoized keyword arguments?


In [21]:
class CMKeywordTest(metaclass=CurriedMemoized, key=make_string_key, cache=OrderedDict()):
    def __init__(self, *args, **kwargs):
        pass
    
CMKeywordTest(1, 2, 3)
CMKeywordTest(4, 5, 6)
print(CMKeywordTest.cache)


OrderedDict([('(1, 2, 3){}', <__main__.CMKeywordTest object at 0x7fcdb018fdd8>), ('(4, 5, 6){}', <__main__.CMKeywordTest object at 0x7fcdb018fd68>)])

Now, if we had swapped Memoized and Curryable around in the MRO, we'd get completely different behavior:


In [22]:
class MemoizedCurry(Memoized, Curryable):
    pass

class MCTest(metaclass=MemoizedCurry):
    def __init__(self, name, frob):
        pass
    
m = MCTest(name='default frob')
m(frob=1)
print(MCTest.cache)


{(None, frozenset({('name', 'default frob')})): <function MCTest at 0x7fcdb01836a8>}

In this case, we're memoizing just what's partially applied rather than the actual instance. In this particular case, it's probably undesired behavior, but with other metaclasses, this might be the intended order of operations.
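
The ordering effect can be isolated with two hypothetical metaclasses, First and Second, that do nothing but record when their __call__ runs. Whichever comes first in the bases is the outermost layer, and super hands off to the next one in the MRO.

```python
order = []

class First(type):
    def __call__(cls, *args, **kwargs):
        order.append("First")
        return super().__call__(*args, **kwargs)

class Second(type):
    def __call__(cls, *args, **kwargs):
        order.append("Second")
        return super().__call__(*args, **kwargs)

class FirstSecond(First, Second):
    pass

class SecondFirst(Second, First):
    pass

class A(metaclass=FirstSecond):
    pass

class B(metaclass=SecondFirst):
    pass

A()
first_run = list(order)
order.clear()
B()
second_run = list(order)

print(first_run)   # ['First', 'Second']
print(second_run)  # ['Second', 'First']
```

Read against the examples above: with Curryable listed first, currying wraps the memoized instantiation; with Memoized listed first, memoization wraps the curried call and so caches whatever currying returns, including partial applications.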

Parting Thoughts

Hopefully this has been a nice introduction to metaclasses and has shown some practical applications of them rather than some silly examples. If you're still curious about writing your own metaclasses or learning more about them, here are some resources I recommend: