In this notebook and the next ones we are going to explore datasets centered around a MusicBrainz entity (an artist, a work, a recording...) and try to display visualisations around certain themes (work composition dates, tonalities, types, languages...). We will organize our exploration around simple questions we want answered; some would be simpler by using directly the MusicBrainz website, others would be impossible to easily there.
The setup required to reproduce these operations is explained in the introduction notebook. In case graphs do not appear in this page you can refer to the static version of this notebook.
In order to simplify this notebook the required Python commands to set everything up are located in a startup.ipy file in the same repository.
In [1]:
%run startup.ipy
Data visualization can be performed directly with the PanDas library we already showed in previous notebooks. For this one I decided to introduce the plot.ly library that provides interactive graphs with a Python or JavaScript interface. We need to import a few commands (these commands are also included in startup.ipy) in order to use this library with our PanDas dataframes:
In [2]:
import plotly.graph_objs as go
import plotly.figure_factory as ff
# offline mode for plot.ly, no need to store the graphs in their cloud
from plotly.offline import iplot
# special mode to display plotly graphs inside Jupyter notebooks
from plotly.offline import init_notebook_mode
init_notebook_mode()
For this example I chose the French composer Gabriel Fauré:
In [3]:
composer_name = 'Gabriel Fauré'
In [4]:
query_composer = """
SELECT name,
to_date(to_char(begin_date_year, '9999') ||
to_char(begin_date_month, '99') ||
to_char(begin_date_day, '99'), 'YYYY MM DD') AS date_of_birth,
to_date(to_char(end_date_year, '9999') ||
to_char(end_date_month, '99') ||
to_char(end_date_day, '99'), 'YYYY MM DD') AS date_of_death,
gid AS mbid
FROM artist
WHERE name = %(composer_name)s;
"""
composer = sql(query_composer, composer_name=composer_name)
composer_mbid = composer.mbid[0]
composer['url'] = composer.mbid.apply(mb_artist_link)
composer.drop('mbid', axis=1, inplace=True)
composer
Out[4]:
The output above uses the default view of a PanDas dataframe. plot.ly can also display data as tables with the create_table command... one advantage compared to the method above is that the HTML link defined in column url will be a real link to the composer page on MusicBrainz website.
In [5]:
iplot(ff.create_table(composer))
Note that I extracted the composer unique identifier (composer_mbid) that is needed future queries.
Let's try to answer a few questions about Fauré now.
The answer could be obtained by looking at the work page or the relationships page of the composer, or we can directly query the work and artist tables of the database:
In [6]:
works = sql("""
SELECT w.name,
w.gid AS mbid
FROM work AS w
JOIN l_artist_work AS law ON w.id = law.entity1
JOIN artist AS a ON a.id = law.entity0
WHERE a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)
print('Answer: %d works' % len(works))
print('First works (alphabetical order): ')
works['url'] = works.mbid.apply(mb_work_link)
iplot(ff.create_table(works[['name', 'url']].sort_values(by='name').head(15)))
Of course this list contains parent works as well as subworks/movements. If we reduce the selection to the "main" works (i.e. the ones not "part of" another work):
In [7]:
main_works = sql("""
SELECT w.name,
w.gid AS mbid
FROM work AS w
JOIN l_artist_work AS law ON w.id = law.entity1
JOIN artist AS a ON a.id = law.entity0
LEFT OUTER JOIN l_work_work AS lww ON w.id = lww.entity1
WHERE a.gid = %(composer_mbid)s
AND lww.entity1 IS NULL;
""", composer_mbid=composer_mbid)
print('Answer: %d works' % len(main_works))
print('First works (alphabetical order): ')
main_works['url'] = main_works.mbid.apply(mb_work_link)
iplot(ff.create_table(main_works[['name', 'url']].sort_values(by='name').head(10)))
So 168 works in MusicBrainz. Some of them might be transcriptions/arrangements of Fauré's works and we might want to filter them out, but let's leave that question aside for the moment.
Same question, but we we want to fetch the composition dates (attribute of the composer-work relation) this time. For simplicity we keep only the year of the composition date:
In [8]:
works_with_dates = sql("""
SELECT w.name,
to_date(to_char(l.begin_date_year, '9999') || '0101', 'YYYYMMDD') AS start,
w.gid AS mbid
FROM work AS w
JOIN l_artist_work AS law ON w.id = law.entity1
JOIN artist AS a ON a.id = law.entity0
JOIN link AS l ON l.id = law.link
LEFT OUTER JOIN l_work_work AS lww ON w.id = lww.entity1
WHERE a.gid = %(composer_mbid)s
AND lww.entity1 IS NULL
ORDER BY start;
""", composer_mbid=composer_mbid)
works_with_dates['url'] = works_with_dates.mbid.apply(mb_work_link)
iplot(ff.create_table(works_with_dates.drop('mbid', 1).sort_values(by='start').head(10)))
We can display in a bar chart how many works were composed each year (works without date do not appear), using PanDas groupby capabilities:
In [9]:
df = works_with_dates.groupby('start').count()['name']
# concatenate work names composed on the same year
text = works_with_dates[['start', 'name']].groupby('start').agg(lambda s: '<br>'.join(s)).name
iplot(go.Figure(
data=[go.Bar(x=df.index, y=df.values, text=text)],
layout=go.Layout(title="Fauré's compositions by year")
))
MuscBrainz stores composers catalogue using the Series data type, an easy way to see only the main works of a composer (without subworks or arrangements).
In [10]:
catalogue = sql("""
SELECT s.name,
s.gid AS mbid
FROM series AS s
JOIN series_type AS st ON s.type = st.id
JOIN l_artist_series AS las ON s.id = las.entity1
JOIN artist AS a ON a.id = las.entity0
WHERE st.name = 'Catalogue'
AND a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)
catalogue_mbid = catalogue.mbid[0]
catalogue['url'] = catalogue.mbid.apply(mb_series_link)
iplot(ff.create_table(catalogue[['name', 'url']]))
So the answer is yes.
In [11]:
works_in_catalogue = sql("""
SELECT lt.text_value AS catalog_no,
w.name AS work_name,
w.gid AS mbid
FROM work AS w
JOIN l_series_work AS lsw ON w.id = lsw.entity1
JOIN series AS s ON s.id = lsw.entity0
JOIN link_attribute_text_value AS lt ON lt.link = lsw.link
WHERE s.gid = %(catalogue_mbid)s
ORDER BY lsw.link_order ASC;
""", catalogue_mbid=catalogue_mbid)
print('Answer: %d works' % len(works_in_catalogue))
works_in_catalogue['url'] = works_in_catalogue.mbid.apply(mb_work_link)
iplot(ff.create_table(works_in_catalogue.drop('mbid', 1).head(10)))
So it looks like 88 of the ~121 opus numbers are correctly associated in MusicBrainz. Other works are either missing or not linked properly, so there is still work to do for contributors willing to complete data.
We can add to the previous query the work languages (restricted to works present in the catalogue)
In [12]:
works_with_language = sql("""
SELECT w.name AS work,
l.name AS language
FROM work AS w
JOIN l_series_work AS lsw ON w.id = lsw.entity1
JOIN series AS s ON s.id = lsw.entity0
JOIN link_attribute_text_value AS lt ON lt.link = lsw.link
JOIN work_language AS wl ON wl.work = w.id
JOIN language AS l ON l.id = wl.language
WHERE s.gid = %(catalogue_mbid)s
ORDER BY lsw.link_order ASC;
""", catalogue_mbid=catalogue_mbid)
lang = works_with_language.groupby('language').count()
lang
Out[12]:
As could be expected, a large majority of songs in french and some (religious works) in latin. We can visualize the results as a pie chart with plotly:
In [13]:
iplot([go.Pie(labels=lang.index, values=lang.work,
direction="clockwise")])
Let us move to something more subtle: some composers have a special affinity with certain tonalities (Beethoven and C minor, anyone?). Is it the case for Fauré?
In [14]:
works_with_key = sql("""
SELECT w.name,
wat.value
FROM work AS w
JOIN work_attribute AS wa ON wa.work = w.id
JOIN work_attribute_type_allowed_value AS wat ON wat.id = wa.work_attribute_type_allowed_value
JOIN l_artist_work AS law ON w.id = law.entity1
JOIN artist AS a ON a.id = law.entity0
WHERE a.gid = %(composer_mbid)s
AND wa.work_attribute_type = 1;
""", composer_mbid=composer_mbid)
tonalities = works_with_key.groupby('value').count().sort_values(by='name', ascending=False)
tonalities.head(10)
Out[14]:
In [15]:
iplot([go.Pie(labels=tonalities.index,
values=tonalities.name,
direction="clockwise")])
D-flat major, A-flat major, D minor and E minor seem to be Fauré's favorite tonalities.
That one is tricky since MuscBrainz doesn't give a lot of choices for work types. Another possibility would be to look at tags (that are not restricted to predefined values).
In [16]:
works_with_type = sql("""
SELECT w.name,
wt.name AS type
FROM work AS w
JOIN work_type AS wt ON w.type = wt.id
JOIN l_artist_work AS law ON w.id = law.entity1
JOIN artist AS a ON a.id = law.entity0
WHERE a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)
types = works_with_type.groupby('type').count().sort_values(by='name', ascending=False)
types
Out[16]:
In [17]:
iplot([go.Pie(labels=types.index,
values=types.name,
direction="clockwise")])
So in the standard work types, the Song seems to be a large majority. Note that specific chamber music work types (quartet, quintet...) do not exist in MusicBrainz, so most chamber music will not have a specific type and not appear in this visualization.
Of course the same treatment could be applied easily to other composers. I created another ready-for-use notebook called report-composer to combine all queries above into one. If you have the necessary setup to run the notebook, just fill the name of the composer you want to study and run all cells!
I'd love to have some feedback: did I miss something interesting? are some steps missing in the workflow shown above? Tell me!