Greek General Elections 2015

As a simple data analysis task, I want to quantify the amount of collective attention devoted to each candidate and see whether there are substantial differences between global consumption patterns and local (i.e. Greek) ones. Finally, I want to see to what degree attention correlates with electoral performance, as measured by the popular vote.


In [125]:
titles = [
(u'Alexis_Tsipras', u'Αλέξης_Τσίπρας'),
(u'Antonis_Samaras', u'Αντώνης_Σαμαράς'),
(u'Dimitris_Koutsoumpas', u'Δημήτρης_Κουτσούμπας'),
(u'Evangelos_Venizelos', u'Ευάγγελος_Βενιζέλος'),
(u'Nikolaos_Michaloliakos', u'Νίκος_Μιχαλολιάκος'),
(u'Panos_Kammenos', u'Πάνος_Καμμένος'),
(u'Stavros_Theodorakis', u'Σταύρος_Θεοδωράκης'),
(u'Greek_legislative_election,_2015', u'Ελληνικές_βουλευτικές_εκλογές_2015')
]

Traffic

I downloaded data about the daily traffic to the Wikipedia articles of the major competitors from both the English and Hellenic editions of Wikipedia. To quantify the general level of attention toward the elections as a whole, I also downloaded the traffic volume to the articles about the elections themselves.


In [151]:
import json
import pandas as pd

en_titles, el_titles = zip(*titles)

# one JSON record per line: collect the daily views of each article by title
l = {}
with open('data/traffic.json') as f:
    for line in f:
        jsobj = json.loads(line)
        df = pd.DataFrame(jsobj)
        l[df['title'].iloc[0]] = df['daily_views']

fulldf = pd.DataFrame(l)

# per-language data frames (candidates plus the election article), indexed by date
endf = fulldf[list(en_titles)].copy()
endf.index = pd.to_datetime(endf.index)
eldf = fulldf[list(el_titles)].copy()
eldf.index = pd.to_datetime(eldf.index)
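
The loading step can be tried out without the real file. The sketch below assumes that each line of `data/traffic.json` is a JSON object with a `title` string and a `daily_views` mapping from date to view count. That layout is inferred from the loading code, not confirmed. The helper `load_traffic`, the article names and the numbers are all hypothetical.

```python
import io
import json

import pandas as pd

# Hypothetical sample: two lines in the assumed format of data/traffic.json.
# The titles are placeholders and the view counts are made up.
SAMPLE = u"\n".join([
    json.dumps({"title": "Article_A",
                "daily_views": {"2015-01-24": 10, "2015-01-25": 30}}),
    json.dumps({"title": "Article_B",
                "daily_views": {"2015-01-24": 5, "2015-01-25": 7}}),
])


def load_traffic(lines):
    """Build a (date x title) frame from an iterable of JSON lines."""
    columns = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        views = pd.Series(record["daily_views"], dtype="float64")
        views.index = pd.to_datetime(views.index)
        columns[record["title"]] = views.sort_index()
    return pd.DataFrame(columns)


sample_df = load_traffic(io.StringIO(SAMPLE))
print(sample_df)
```

Building one Series per record and letting `pd.DataFrame` align them on the date index means that articles with different date coverage end up with `NaN` on the days they lack, rather than raising an error.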

Votes

I also downloaded the data about the popular vote from the English Wikipedia article on the elections.


In [127]:
with open('data/votes.json') as f:
    votes = pd.Series(json.load(f))
print(votes)


Alexis_Tsipras           2246064
Antonis_Samaras          1718815
Dimitris_Koutsoumpas      338138
Evangelos_Venizelos       289482
Nikolas_Michaloliakos     388447
Panos_Kammenos            293371
Stavros_Theodorakis       373868
dtype: int64
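
One possible way to relate attention to the popular vote is a rank correlation between total page views and votes. The sketch below is an assumption about how that comparison could be set up, not the method used in this notebook; the function name and the toy numbers are made up. Note that the title list spells `Nikolaos_Michaloliakos` while the vote table uses `Nikolas_Michaloliakos`, so a join on names would silently drop that candidate unless the keys are reconciled first.

```python
import pandas as pd


def attention_vote_rank_correlation(traffic, votes):
    """Spearman correlation between total page views and votes.

    traffic: DataFrame with one column per candidate (daily views).
    votes:   Series indexed by the same candidate names.
    """
    attention = traffic.sum()  # total views per candidate; NaN days are skipped
    # Keep only candidates present in both inputs.
    both = pd.concat([attention, votes], axis=1, join="inner",
                     keys=["views", "votes"])
    # Spearman's rho is Pearson's r computed on the ranks.
    return both["views"].rank().corr(both["votes"].rank())


# Toy input with made-up numbers, only to show the call.
toy_traffic = pd.DataFrame({"A": [10, 20], "B": [5, 5], "C": [1, 2]})
toy_votes = pd.Series({"A": 300, "B": 200, "C": 100})
rho = attention_vote_rank_correlation(toy_traffic, toy_votes)
print(rho)
```

A rank correlation is used here because traffic and votes both span orders of magnitude, and with only seven candidates a few large values would dominate a linear correlation.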

The raw data

The raw time series of traffic to the candidates' articles are shown below. In absolute terms, the amount of attention is comparable across the two language editions of Wikipedia. Data are missing for the English article on Stavros Theodorakis between January 26 and 28.

To put these values in context, I also show the traffic to the articles about the elections themselves, which can be taken as the general level of attention toward the elections. It is interesting to note that in the domestic case the article about Alexis Tsipras receives more traffic than the article about the elections.


In [166]:
fig, axs = subplots(ncols=2, sharex=True, sharey=True, figsize=(6, 4))

ax = axs[0]
endf[list(en_titles[:-1])].plot(ax=ax)
endf[[en_titles[-1]]].plot(ax=ax, lw=3, c='gray', alpha=.4)
ax.grid(False)
ax.legend(fontsize='xx-small', loc='upper left', frameon=False)

ax.set_yscale('log')
ax.set_ylabel('Daily traffic')

ax = axs[1]
eldf[list(el_titles[:-1])].plot(ax=ax)
eldf[[el_titles[-1]]].plot(ax=ax, lw=3, c='gray', alpha=.4)
ax.grid(False)
ax.legend(fontsize='xx-small', loc='upper left', frameon=False)
ax.set_yscale('log')

ylim(10, 1e7)

tight_layout(w_pad=.1)
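
Gaps such as the one noted for the Stavros Theodorakis article show up as breaks in the plotted lines, and they can also be listed explicitly. The following is a minimal sketch, assuming missing days are stored as `NaN` in a date-indexed frame. The helper `missing_days` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_days(df):
    """Map each column name to the list of index dates where it is NaN."""
    gaps = {}
    for name in df.columns:
        mask = df[name].isnull()
        if mask.any():
            gaps[name] = list(df.index[mask.values])
    return gaps


# Toy frame with made-up values and one artificial gap.
idx = pd.date_range("2015-01-25", periods=4, freq="D")
toy = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0],
                    "B": [1.0, 2.0, 3.0, 4.0]}, index=idx)
print(missing_days(toy))
```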


The traffic volume to the English and Hellenic articles about the elections can be used as a general measure of interest in the elections. International and domestic traffic seem to correlate fairly well.


In [114]:
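# entot, eltot and elections are not defined in the cells shown; assumed here to be the election-article series and titles
elections = titles[-1]
entot, eltot = endf[elections[0]], eldf[elections[1]]
loglog(entot, eltot, 'o--')
xlabel(u'Daily traffic to\n“{}”'.format(elections[0]))
ylabel(u'Daily traffic to\n“{}”'.format(elections[1]))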


Out[114]:
<matplotlib.text.Text at 0x7f11ec921650>
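
The visual impression from the log-log plot can be backed by a number. A minimal sketch, assuming the two traffic series share a date index: it computes the Pearson correlation of the log-transformed daily views, which matches what the log-log axes display. The function name and the toy series are hypothetical, and this is one reasonable choice of statistic rather than the one used in the original analysis.

```python
import numpy as np
import pandas as pd


def log_correlation(x, y):
    """Pearson correlation of log10(x) and log10(y).

    Only days where both series are present and positive are used.
    """
    both = pd.concat([x, y], axis=1, keys=["x", "y"]).dropna()
    both = both[(both["x"] > 0) & (both["y"] > 0)]
    logs = np.log10(both)
    return logs["x"].corr(logs["y"])


# Toy series with made-up values: y is exactly x squared,
# so the two log series are perfectly collinear.
idx = pd.date_range("2015-01-20", periods=4, freq="D")
x = pd.Series([10.0, 100.0, 1000.0, 10000.0], index=idx)
y = x ** 2
r = log_correlation(x, y)
print(r)
```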

In [202]:
import datetime
# fixed epoch, so the day count does not depend on the local time zone
EPOCH = datetime.datetime(1970, 1, 1)
election_day = datetime.datetime(2015, 1, 26)
election_day_ts = election_day - EPOCH
print(election_day_ts.days)


16461

In [208]:
fig, axs = subplots(ncols=2, sharex=True, sharey=True, figsize=(6, 4))

ax = axs[0]
# divide each candidate column by the election-article series, row by row
endf[list(en_titles[:-1])].div(endf[en_titles[-1]], axis=0).plot(ax=ax)
ax.grid(False)
ax.legend(fontsize='xx-small', loc='upper left', frameon=False)

ax.set_ylabel('Daily traffic relative to elections article')

ax = axs[1]
# divide each candidate column by the election-article series, row by row
eldf[list(el_titles[:-1])].div(eldf[el_titles[-1]], axis=0).plot(ax=ax)
ax.grid(False)
ax.legend(fontsize='xx-small', loc='upper left', frameon=False)

tight_layout(w_pad=.1)