One feature I would like to be added to the MusicBrainz web interface is the possibility to display timelines, e.g. showing when members arrived or left on a band page, or sorting concerts, albums releases, and so on.
Before suggesting this feature to the MusicBrainz developers (or programing it myself), I want to try and display this kind of timelines in a Jupyter notebook and see how it would look like.
Several JavaScript libraries can do the job; I decided to test the timesheet-advanced.js and visjs libraries.
In [1]:
%load_ext watermark
%watermark --python -r
In [2]:
%watermark --date --updated
The setup required to repeat these operations is explained in the introduction notebook. In case graphs do not appear in this page you can refer to the static version.
The next commands may be needed or not depending on your setup (i.e. if you use my docker setup):
In [3]:
import os
from pprint import pprint
import pandas
import sqlalchemy
# your postgres server IP
IP = 'localhost'
def sql(query, **kwargs):
"""helper function for SQL queries using the %(...) syntax
Parameters defined globally are replaced implicitely"""
params = globals().copy()
params.update(kwargs)
# define DB connection parameters if needed
PGHOST = os.environ.get('PGHOST', IP)
PGDATABASE = os.environ.get('PGDATABASE', 'musicbrainz')
PGUSER = os.environ.get('PGUSER', 'musicbrainz')
PGPASSWORD = os.environ.get('PGPASSWORD', 'musicbrainz')
engine = sqlalchemy.create_engine(
'postgresql+psycopg2://%(PGUSER)s:%(PGPASSWORD)s@%(PGHOST)s/%(PGDATABASE)s' % locals(),
isolation_level='READ UNCOMMITTED')
return pandas.read_sql(query, engine, params=params)
# helper functions to generate an HTML link to an entity MusicBrainz URL
def _mb_link(type, mbid):
return '<a href="https://musicbrainz.org/%(type)s/%(mbid)s">%(mbid)s</a>' % locals()
mb_artist_link = lambda mbid: _mb_link('artist', mbid)
Now we can extract the information for the band I want. The SQL query will look for:
Let's start with some band you probably already know:
In [4]:
band_name = 'The Beatles'
The SQL query is a bit complicated because it uses a lot of different tables, I won't go into details. We store the result in a data structure called a PanDas DataFrame (df).
In [5]:
df = sql("""
SELECT b.name AS band,
m.name AS member,
m.gid AS mbid,
lat.name AS role,
to_date(to_char(l.begin_date_year, '9999') || '0101', 'YYYYMMDD') AS start,
to_date(to_char(l.end_date_year, '9999') || '0101', 'YYYYMMDD') AS end
FROM artist AS b
JOIN l_artist_artist AS laa ON laa.entity1 = b.id
JOIN artist AS m ON laa.entity0 = m.id
JOIN link AS l ON l.id = laa.link
JOIN link_attribute AS la ON la.link = l.id
JOIN link_attribute_type AS lat ON la.attribute_type = lat.id
JOIN link_type AS lt ON l.link_type = lt.id
WHERE lt.name = 'member of band'
AND b.name = %(band_name)s
AND lat.name != 'original';
""")
df
Out[5]:
The data is here, we just want to set a start date for Lennon's roles since it is not in the database.
In [6]:
import datetime
df['start'] = df['start'].fillna(datetime.date(1957, 1, 1))
df['mbid'] = df['mbid'].astype(str) # otherwise PanDas uses the UUID data type which will cause problems later.
df
Out[6]:
The timesheet-advanced package requires the input data for the timeline to be inserted slightly differently from what we have in our dataframe df. Let us first copy our data in a new variable ts and simplify the dates to years.
In [7]:
ts = df.copy()
ts['start'] = ts['start'].apply(lambda date: date.year).astype(str)
ts['end'] = ts['end'].apply(lambda date: date.year).astype(str)
We need a 'label' field (we'll choose the band member name + instrument) and we need a 'type' which is a color. We choose colors to represent all possible roles (vocals, guitar, drums....)
In [8]:
ts['label'] = df['member'] + ' (' + df['role'] + ')'
ts
Out[8]:
In [9]:
colors = dict(zip(sorted(set(ts['role'])), ['red', 'blue', 'yellow', 'green']))
print('Correspondance between colors and roles: {}'.format(colors))
ts['type'] = ts['role'].apply(lambda role: colors[role])
ts
Out[9]:
We can also add a 'link' columns containing URLs to the MusicBrainz website:
In [10]:
ts['link'] = 'https://musicbrainz.org/artist/' + ts['mbid']
ts.drop('mbid', axis=1, inplace=True)
ts
Out[10]:
The last preparation step is to transform this Python data structure into a Javascript one that the timesheet library can read. We're going to use the fact that a Python list and a Javascript array are very close (we could also use JSON format to transform our data into smething JavaScript-compatible).
In [11]:
bubbles = [ts.loc[i].to_dict() for i in range(len(ts))]
print('First bubble:')
pprint(bubbles[0])
Perfect, bubbles contains our data. Time to do some javascript. The Jupyter notebook can display javascript code in an output cell by using the element.append magic.
To display the timeline inside this notebook we need to load the JS/CSS source of the timesheet-advanced package...
In [12]:
from IPython.display import HTML
HTML("""
<link rel="stylesheet" type="text/css" href="https://cdn.rawgit.com/ntucakovic/timesheet-advanced.js/ea3ee1ad/dist/timesheet.min.css" />
<script type="text/javascript" src="https://cdn.rawgit.com/ntucakovic/timesheet-advanced.js/ea3ee1ad/dist/timesheet-advanced.min.js"></script>
""")
Out[12]:
... and to create an output container for our timeline. This cell be filled when the next cell code (new Timesheet(...)) will be executed.
In [13]:
%%javascript
// this must be executed before the "from IPython.display import Javascript" block
element.append('<div id="timesheet-container" style="width: 100%;height: 100%;"></div>');
Last step: we call the Timesheet javascript command using the CSS/JS libraries loaded above, our input data (bubbles), the cell where we want our graph, and the timeline limit (min and max date). Executing the next cell will fill the output cell just above this block automatically.
In [14]:
from IPython.display import Javascript
Javascript("""
var bubbles = %s;
new Timesheet(bubbles, {
container: 'timesheet-container',
type: 'parallel',
timesheetYearMin: %s,
timesheetYearMax: %s,
theme: 'light'
});
""" % (bubbles, ts['start'].min(), ts['end'].max()))
Out[14]:
We have our timeline now! As you can see, the same color is used for the same role consistently. The items on the timeline are clickable links bringing you to the artist page on MusicBrainz. If you can't see the timeline above you can find a static version on github.io
We can try to display the same data with another JavaScript library, vis.js. Again we will need to prepare the data.
In [15]:
v = df.copy()
v['start'] = v['start'].apply(lambda date: date.isoformat())
v['end'] = v['end'].apply(lambda date: date.isoformat())
v.drop('mbid', axis=1, inplace=True)
v['type'] = v['role'].apply(lambda role: colors[role])
v['label'] = v['member'] + ' (' + v['role'] + ')'
v
Out[15]:
This time we are not going to inject the data inside a javascript string executed by the notebook, we are going to attach the data as JSON to the webpage itself (window) so that vis.js can find it.
In [16]:
# Transform into JSON
data = [{'start': line.start,
'end': line.end,
'content': line.label,
'className': line.type
} for _, line in ts.iterrows()]
# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""window.bandData={};""".format(json.dumps(data, indent=4)))
Out[16]:
We need to load the default CSS (from cdnjs.cloudflare.com) and add our custom CSS on top:
In [17]:
%%html
<link rel="stylesheet" type="text/css" href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis-timeline-graph2d.min.css" />
In [18]:
%%html
<style type="text/css">
/* custom styles for individual items, load this after vis.css/vis-timeline-graph2d.min.css */
.vis-item.red {
background-color: red;
}
.vis-item.blue {
background-color: blue;
}
.vis-item.yellow {
background-color: yellow;
}
.vis-item.green {
background-color: greenyellow;
}
.vis-item.vis-selected {
background-color: white;
border-color: black;
color: black;
box-shadow: 0 0 10px gray;
}
</style>
In order to load the JS library itself, we can use the require mechanism inside the notebook:
In [19]:
%%javascript
element.append('<div id="vis-container" style="width: 100%;height: 100%;"></div>');
requirejs.config({
paths: {
vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
}
});
require(['vis'], function(vis){
var data = new vis.DataSet(window.bandData);
var options = {
editable: false
};
// create the timeline
var container = document.getElementById('vis-container');
var timeline = new vis.Timeline(container, data, options);
})
And we have our timeline. Note that this time we can zoom/unzoom (with the mouse wheel) thanks to the vis.js library. You can also change the custom CSS above and see the timeline updated automatically.
In this notebook we used two different JS libraries to display the same data, extracted from the MusicBrainz DB. I hope I did not make things look too complicated and I convinced some of you to try and play with the MusicBrainz database :)