Risk-limiting audit code for Proportional Representation via Highest Averages

Code and demo from Verifiable European Elections: Risk-limiting Audits for D’Hondt and its relatives by Philip B. Stark and Vanessa Teague, March 26, 2015

Routines:

  • dHondt(partyTotals, seats, divisors)
  • uMax(win, lose)
  • minSampleSize(ballots, u, gamma=0.95, alpha=0.1)

Demonstrated on Denmark's 2014 European Union Parliamentary election which uses an open list proportional representation voting method, with seats allocated via the highest averages method.

Parties can form coalitions, in which case first seats are allocated across the coalitions, and then from the the seats for each coalition, the parties within the coalition are allocated seats.

Works with both Python 2.7 and Python 3

How the ballots look and are marked (for party, for candidate), from Altinget.dk: This is how the European Parliament will change EU elections


In [1]:
from __future__ import division
from __future__ import print_function

import math
import numpy as np

def dHondt(partyTotals, seats, divisors):
    '''
    allocate <seats> seats to parties according to <partyTotals> votes,
    using D'Hondt proportional allocation with <weights> divisors
    
    Input:
        partyTotals: list of total votes by party
        seats:       total number of seats to allocate
        divisors:    divisors for proportional allocation. For d'Hondt, divisors are 1, 2, 3, ...
    
    Returns:
        partySeats:  list of number of seats for each party
        seated:      list of tuples--parties with at least one seat, 
                     number of votes that party got, 
                     and divisor for last seated in the party
        notSeated:   list of tuples--parties with at least one lost seat, 
                     number of votes that party got,
                     and divisor for the first non-seated in the party
        pseudoCandidates: matrix of votes for each pseudocandidate
    '''
    pseudoCandidates = np.array([partyTotals,]*seats, ).T/divisors.astype(float)
    sortedPC = np.sort(np.ravel(pseudoCandidates))
    lastSeated = sortedPC[-seats]
    theSeated = np.where(pseudoCandidates >= lastSeated)
    partySeats = np.bincount(theSeated[0], minlength=len(partyTotals)) # number of seats for each party
    inx = np.nonzero(partySeats)[0]  # only those with at least one seat
    seated = list(zip(inx, partyTotals[inx], divisors[partySeats[inx]-1]))
                                           # parties with at least one seat, 
                                           # number of votes that party got,
                                           # and divisor for last seated in the party
    theNotSeated = np.where(pseudoCandidates < lastSeated)
    partyNotSeats = np.bincount(theNotSeated[0], minlength=len(partyTotals)) # number of non-seats for each party
    inx = np.nonzero(partyNotSeats)[0]
    notSeated = list(zip(inx, partyTotals[inx], divisors[partySeats[inx]]))
                                           # parties with at least one unseated, 
                                           # number of votes that party got,
                                           # and divisor for the first non-seated in the party
    if (lastSeated == sortedPC[-(seats+1)]):
        raise ValueError("Tied contest for the last seat!")
    else:
        return partySeats, seated, notSeated, lastSeated, pseudoCandidates
    
def uMax(win, lose):  
    ''' 
    finds the upper bound u on the MICRO for the contest
    win and lose are lists of triples: [party, tally(party), divisor]
    the divisor for win is the largest divisor for any seat the party won
    the divisor for lose is the smallest divisor for any seat the party lost
    See Stark and Teague, 2014, equation 7.
    
    Input: 
        win:  list of triples--party, tally(party), divisor
        lose: list of triples--party, tally(party), divisor
        
    Returns:
        maximum possible relative overstatement for any ballot
    '''
    new_u = 0.0
    u = new_u
    for w in win:
        for ell in lose:
            if w[0] != ell[0]:
                new_u = (ell[2] + w[2]) / (ell[2]*w[1] - w[2]*ell[1])
                u = max(u, new_u)
                # print "%s," % ((round(u, 7), round(new_u, 7), w, ell),)  # u: %.4g, new_u: %.4g, winner: %s, loser: %s
    return u

def minSampleSize(ballots, u, gamma=0.95, alpha=0.1):
    '''
    find smallest sample size for ballot-level comparison audit
    using risk-limit alpha, and cushion gamma \in (0,1)

    1/alpha = (gamma/(1-1/(ballots*u))+1-gamma)**n
    Input: 
        ballots: number of ballots cast in the contest
        u:       upper bound on overstatement per ballot
        gamma:   hedge against finding a ballot that attains the upper bound. Larger values give
                 less protection
        alpha:   risk limit
    '''
    return math.ceil(math.log(1.0/alpha) / math.log(gamma/(1.0-1.0/(ballots*u)) + 1.0 - gamma))

Re-tally election results


In [2]:
# Final 2014 Danish EU Parliamentary election results from http://www.dst.dk/valg/Valg1475795/valgopg/valgopgHL.htm
# There were two coalitions: (A,B,F) and (C,V)
#
#  Official results by party
#
A = 435245
B = 148949
C = 208262
F = 249305
I = 65480
N = 183724
O = 605889
V = 379840
Ballots = 2332217  # includes invalid and blank ballots
nSeats = 13  # seats to allocate
#
# allocate seats to coalitions
#
coalitionTotals = np.array([A+B+F, C+V, I, N, O])  # for coalitions
coalitionSeats, coalitionSeated, coalitionNotSeated, coalitionLastSeated, coalitionPCs = dHondt(coalitionTotals, nSeats, np.arange(1, nSeats+1))
print('A+B+F, C+V, I, N, O:', coalitionSeats)
#
# allocate seats within coalitions
#
nABFSeats = coalitionSeats[0]
nCVSeats = coalitionSeats[1]
ABFSeats, ABFSeated, ABFNotSeated, ABFLastSeated, ABFPCs = dHondt(np.array([A, B, F]), nABFSeats, np.arange(1, nABFSeats+1))
CVSeats, CVSeated, CVNotSeated, CVLastSeated, CVPCs = dHondt(np.array([C, V]), nCVSeats, np.arange(1, nCVSeats+1))
#
print('A, B, F:', ABFSeats, ';  C, V:', CVSeats)
#
ASeats = ABFSeats[0]
BSeats = ABFSeats[1]
CSeats = CVSeats[0]
FSeats = ABFSeats[2]
ISeats = coalitionSeats[2]
NSeats = coalitionSeats[3]
OSeats = coalitionSeats[4]
VSeats = CVSeats[1]
allSeats = [ASeats, BSeats, CSeats, FSeats, ISeats, NSeats, OSeats, VSeats]

print('---------------\nSeats to parties A, B, C, F, I, N, O, V: ', allSeats)
print('Seated coalitions, votes, divisor:', coalitionSeated)
print('Non-Seated coalitions, votes, divisor:', coalitionNotSeated)


A+B+F, C+V, I, N, O: [5 3 0 1 4]
A, B, F: [3 1 1] ;  C, V: [1 2]
---------------
Seats to parties A, B, C, F, I, N, O, V:  [3, 1, 1, 1, 0, 1, 4, 2]
Seated coalitions, votes, divisor: [(0, 833499, 5), (1, 588102, 3), (3, 183724, 1), (4, 605889, 4)]
Non-Seated coalitions, votes, divisor: [(0, 833499, 6), (1, 588102, 4), (2, 65480, 1), (3, 183724, 2), (4, 605889, 5)]

Audit: initial sample size


In [3]:
gamma = 0.95  # tuning constant in the Kaplan-Wald method
alpha = 0.001 # risk limit

u = uMax(coalitionSeated, coalitionNotSeated)

print("Minimum ballot-level comparison sample size = %d\n for max total overstatement = %.2f, u = %.4g, gamma = %.2f, alpha = %.3f" %
      (minSampleSize(Ballots, u, gamma, alpha), Ballots*u, u, gamma, alpha))


Minimum ballot-level comparison sample size = 1903
 for max total overstatement = 262.24, u = 0.0001124, gamma = 0.95, alpha = 0.001