Liza Daly, May 2017
Using PyTorch, I took a pretrained ResNet152 model and further trained it on a collection of 40,000 images of works of art from the openly licensed collections at Rijksmuseum and The Metropolitan Museum of Art.
Each image was paired with its best-known year of origin (not all works have exact dates; where the museum metadata gave a range, I took the earliest year). This simplified dataset is available in the metadata directory of this repository.
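Museum date fields are free-form ("ca. 1650", "1600-1650", and so on), so each work had to be reduced to a single year. A minimal sketch of the "earliest year in the range" rule (the formats here are illustrative; the museums' actual metadata schemas differ):

```python
import re

def earliest_year(date_field):
    """Reduce a free-form museum date string to its earliest four-digit year."""
    # Matches years 1000-1999, which covers the 1250-1930 corpus
    years = [int(y) for y in re.findall(r'\b(1[0-9]{3})\b', date_field)]
    return min(years) if years else None

earliest_year('1600-1650')  # 1600
earliest_year('ca. 1872')   # 1872
```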
Note that I haven't included the images themselves—the full collection is more than 150GB. Use the IDs provided in the metadata along with the instructions and terms of use for the museum APIs to acquire them.
(I've never worked with convolutional neural networks before, so take this as an example of a beginner's exploration rather than any sort of best practice.)
After 60 epochs, training loss using mean-squared error was 0.64 and validation loss was 2.98 (on 10% of the total corpus).
My hand-wavy goal was that the network would make reasonable guesses and, on average, would do better than I would if I were just standing in a museum. Across the whole dataset of images from the year 1250 to 1930, the model was off by an average of 66 years. See below for some details about where it did better and worse.
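That "off by an average of 66 years" figure is just the mean absolute difference between predicted and true years, which is (as I read it) what the `diff` column in `results.csv` holds. A sketch on a toy frame, since the real numbers come from the full results file:

```python
import pandas as pd

# Toy stand-in for results.csv: true year vs. model prediction
df = pd.DataFrame({'year': [1650, 1875, 1510],
                   'pred': [1630, 1905, 1610]})

# Absolute error in years, then its mean across the set
df['diff'] = (df['pred'] - df['year']).abs()
mean_error = df['diff'].mean()  # (20 + 30 + 100) / 3 = 50 years
```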
In [1]:
%matplotlib inline
import random
import pandas as pd
import seaborn as sns
sns.set(style="white")
import matplotlib.pyplot as plt
import math
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display
from IPython.core.display import HTML
from guess import predict_by_url
In [247]:
df = pd.read_csv('results.csv')
df = df.loc[df.year > 1500]
years = df.year.max() - df.year.min()
# Round down to the nearest 50 years for display
df['rounded'] = df.year.apply(lambda x: int(math.floor(x / 50) * 50))
In [248]:
"Average error for this time slice, in years: %d" % int(df['diff'].mean())
Out[248]:
These collections skew a little recent, so we might expect the model to disproportionately guess between 1750 and 1900 when it's unsure. There's a bump around 1600 because the Rijksmuseum has so many Dutch masters from that period.
(This sample is from the validation set, but the proportions should be the same as training.)
In [249]:
plt.figure(figsize=(15,8))
sns.set_context("notebook", font_scale=1.5)
pt = sns.countplot(x='rounded', data=df, palette=sns.color_palette("Set1", n_colors=11, desat=.5))
pt.set(xlabel="Year of work", ylabel="Number of works")
sns.despine()
This violin plot compares the age of the work against the model's error—how far was the guess off from the true year? (The wider the blob, the more guesses fell into that range.) Unsurprisingly, it's almost an inverse of the graph above: the more data we had, the more the guesses pool at the bottom of the error range. In other words, the model is more accurate where it had more examples to work with, and the edges of the date range have the most error.
In [250]:
plt.figure(figsize=(15,8))
sns.set_context("notebook", font_scale=1.5)
pt = sns.violinplot(x='rounded', y='diff', data=df, linewidth=1, palette=sns.color_palette("Set1", n_colors=11, desat=.5))
pt.set(xlabel="Year of work", ylabel="Prediction error (years)")
sns.despine()
In [251]:
df = df.sort_values('diff')
hits = df.iloc[0:50]
misses = df.tail(100)
In [252]:
def generate_image_set(df, num_images=5):
    # Rows either carry a path to an image on disk or an already-loaded image
    if hasattr(df, 'path'):
        images = [(Image.open(row.path).convert('RGB'), row.year, row.pred) for row in df.itertuples()]
    else:
        images = [(row.img, row.year, row.pred) for row in df.itertuples()]
    [img.thumbnail((400, 400)) for img, _, _ in images]
    width = 250; height = 250
    # Crop a centered width x height tile from each thumbnail
    images = [(img.crop((img.size[0] / 2 - width/2, img.size[1] / 2 - height/2,
                         img.size[0] / 2 + width/2, img.size[1] / 2 + height/2)), year, pred) for img, year, pred in images]
    images = random.sample(images, min(len(images), num_images))
    fnt = ImageFont.truetype('/media/liza/ext-drive/liza/fonts/StagSans-Book.otf', 30)
    im = Image.new('RGB', (num_images * width, height), (255, 255, 255))
    x_offset = 0
    for img, year, pred in images:
        # Caption each tile with the true year (left) and the prediction (right)
        draw = ImageDraw.Draw(img)
        draw.rectangle((0, height - 35, 5 * width, height), fill=(50, 50, 50))
        draw.line((width, 0, width, height), width=10, fill=(255, 255, 255))
        draw.text((10, height - 30), str(year), fill=(200, 200, 100), font=fnt)
        draw.text((180, height - 30), str(pred), fill=(100, 200, 200), font=fnt)
        im.paste(img, (x_offset, 0))
        x_offset += 250
    return im
First, let's take a sample from the top 50 by accuracy.
It tends to do well with drawings, I think in part because museum drawing collections tend to come in series—lots of sketches by the same artist on the same paper with similar aging properties. But that's legit; those are all qualities that an expert would use to evaluate a date.
The model is also pretty good at Dutch masters, of which it saw a lot of examples, and etchings, which tend to fall within a narrow 1600-1700 year band.
In [8]:
generate_image_set(hits)
Out[8]:
Sampling from the bottom of the range gets us guesses that are hundreds of years off from the true value. These tend to be examples that don't necessarily fit the training data well—including some bad metadata where images of sculpture snuck in. It's also lousy at non-European works, but neural networks are only as culturally sensitive as their datasets. It also tends to do poorly with very recent (1890+) works; there are fewer examples to train from, and they're more stylistically varied.
First, examples where the model thought the work was much older than it really was:
In [260]:
generate_image_set(misses.loc[misses.pred < misses.year])
Out[260]:
Examples where it thought the work was much more recent than it was tend to be early drawings. I think there's just not enough visual information for the model to work with, so it gravitates towards the most-represented years.
In [265]:
generate_image_set(misses.loc[misses.pred > misses.year])
Out[265]:
Just looking at the works after 1900, without regard for performance, demonstrates what a tough task this is—the stuff is all over the place, stylistically. We'd need a much better set of training examples of 20th-century art to expect good results here. Alas, copyright.
In [31]:
generate_image_set(df.loc[df['year'] > 1900])
Out[31]:
It does seem to get impressionism, at least—the recent works where it guessed within 40 years of truth tend to be impressionistic.
In [41]:
generate_image_set(df.loc[(df['diff'] < 40) & (df['year'] > 1900)])
Out[41]:
In [266]:
famous_paintings = [{'year': 1892, 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Paul_Gauguin%2C_Nafea_Faa_Ipoipo%3F_%28When_Will_You_Marry%3F%29_1892%2C_oil_on_canvas%2C_101_x_77_cm.jpg/901px-Paul_Gauguin%2C_Nafea_Faa_Ipoipo%3F_%28When_Will_You_Marry%3F%29_1892%2C_oil_on_canvas%2C_101_x_77_cm.jpg'},
{'year': 1907, 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Gustav_Klimt_046.jpg/1197px-Gustav_Klimt_046.jpg'},
{'year': 1526, 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Darmstadtmadonna.jpg/846px-Darmstadtmadonna.jpg'},
{'year': 1664, 'url': 'https://upload.wikimedia.org/wikipedia/commons/c/c8/Vermeer_The_concert.JPG'},
{'year': 1633, 'url': 'https://upload.wikimedia.org/wikipedia/commons/f/f3/Rembrandt_Christ_in_the_Storm_on_the_Lake_of_Galilee.jpg'},
{'year': 1886, 'url': 'https://upload.wikimedia.org/wikipedia/commons/e/ee/Van_Gogh_-_Vase_mit_Pechnelken.jpeg'},
{'year': 1888, 'url': 'https://upload.wikimedia.org/wikipedia/commons/b/bc/Girl_in_Front_of_Open_Window.jpg'},
{'year': 1514, 'url': 'https://upload.wikimedia.org/wikipedia/commons/9/95/Raphael_missing.jpg'},
{'year': 1503, 'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg/804px-Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg'},
{'year': 1607, 'url': 'https://upload.wikimedia.org/wikipedia/commons/b/b1/CaravaggioJeromeValletta.jpg'}]
In [267]:
results = []
for p in famous_paintings:
pred, img = predict_by_url(p['url'])
results.append({'pred': pred, 'img': img, 'year': p['year']})
Lastly, a bit of a torture test: I asked it to evaluate 10 ultra-famous paintings that have at one point been stolen. Almost by definition, these works defied the conventions of their eras and ought to be tough to place on style alone.
Still, it does OK on the two works that tend towards impressionism. But my favorite failure is the Klimt—the model's guess is wildly old, something it rarely does given the scarcity of pre-1500 examples. I can only assume it mistook the color scheme for gold leaf.
In [268]:
generate_image_set(pd.DataFrame(results[0:5]))
Out[268]:
In [269]:
generate_image_set(pd.DataFrame(results[5:]))
Out[269]: